The NAB molecular manipulation language

Generation of Models for "Unusual" DNA and RNA:
A Computer Language for Structural Exploration

Tom Macke, W.A. Svrcek-Seiler, Russell A. Brown, Istvan Kolossvary, Yannick Bomble and David A. Case

Summary

NAB was originally designed as a small modeling language (a "molecular awk"), with a principal focus on constructing models for non-helical nucleic acids. It has been used to construct models of helical and non-helical nucleic acids from a few dozen to a few hundred nucleotides in size, and provides a combination of rigid body transformations and distance geometry to create candidate structures that match input criteria. We have applied NAB to duplex-, triplex- and tetraplex DNA, to RNA hairpins and pseudo-knots, to closed-circular DNA, and to models of the small subunit of the ribosome and of recombination sites. [1]

As the code developed, an implementation of the AMBER force field was added, which includes the AMBER implementation of the generalized Born model for solvation effects. Version 5 includes analytical second derivatives, opening the way to new types of simulations. Force-field calculations can be carried out on proteins and small molecules, as well as nucleic acids, making NAB a useful platform for a variety of modeling tasks. For example, NAB code is incorporated into both AutoDock and Dock to provide a mechanism to carry out force-field calculations on protein-ligand and nucleic-acid-ligand complexes.

NAB consists of a language specification (constructed using lex and yacc) that has special support for macromolecules and their components, along with more general-purpose constructs such as strings, regular expressions and hashed arrays. This language has a C-like syntax, and is compiled to C code at an intermediate stage. There is also a support library (primarily coded in C) that implements rigid-body transformations, distance geometry, energy minimization, molecular dynamics and normal mode analysis.

[1] T. Macke and D.A. Case. Modeling unusual nucleic acid structures. In Molecular Modeling of Nucleic Acids, N.B. Leontis and J. SantaLucia, Jr., eds. (Washington, DC: American Chemical Society, 1998), pp. 379-393.

The NAB language

NAB (Nucleic Acid Builder) was developed by Tom Macke as a part of his graduate research at The Scripps Research Institute. It is a computer language (specified through lex and yacc) that allows nucleic acid structures to be described in a hierarchical fashion, using a language similar to C or awk, but designed especially for the manipulation of nucleic acid structures. NAB manipulates molecules through three principal techniques:

  1. First are base transformations, which are useful in helical or near-helical situations in which the geometric relation of one basepair (or triple) can be specified relative to others in the helix. Under these circumstances, the bases are laid out first to achieve the desired helical and base-pairing configurations, and the sugar-phosphate backbone (or a derivative thereof) is added and optimized in a separate step using molecular mechanics energy minimization procedures or distance geometry. Bases can be laid out along arbitrary curves in space.

  2. The second pillar of NAB functionality is distance geometry, which allows molecular structures to be built that satisfy sets of distance constraints. Such constraints often form a natural way of describing neighbor relationships, cross-linking or footprinting results, or hydrogen bond and helical constraints in nucleic acids. By systematically exploring databases of known nucleic acid structures, we have been able to derive sets of correlated distance constraints that significantly improve the performance of distance geometry techniques as applied to unusual nucleic acid structures. These techniques are especially useful in laying out non-helical regions of structures, such as hairpins or loops in pseudo-knot RNA structures.

  3. Once initial models have been constructed, they may be optimized or modified through energy minimization or molecular dynamics simulations. A full (non-periodic) implementation of the Amber force fields is provided, which includes the generalized Born solvation model, and its first and second derivatives. The second derivative facility allows accurate minimization and normal-mode analyses for quite large systems, using the generalized Born implicit solvent model [2].

    [2] R.A. Brown and D.A. Case. Second derivatives in generalized Born theory. J. Comput. Chem. 27, 1662-1675 (2006).
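The base-transformation idea in item 1 can be illustrated with a minimal Python sketch. It is not NAB code and does not use the NAB library: the function names, the two-point template, and the idealized B-form values for rise and twist are all assumptions chosen for illustration. Each base-pair step is treated as a rigid-body transform (a rotation about the helix axis followed by a translation along it), so the internal geometry of the template is preserved at every position.

```python
import math

# Assumed idealized B-form parameters (illustrative values, not read from NAB).
RISE = 3.38    # Angstroms of translation along the helix axis per step
TWIST = 36.0   # degrees of rotation about the helix axis per step


def rotate_z(point, angle_deg):
    """Rotate a 3D point about the z axis (taken here as the helix axis)."""
    a = math.radians(angle_deg)
    x, y, z = point
    return (x * math.cos(a) - y * math.sin(a),
            x * math.sin(a) + y * math.cos(a),
            z)


def place_step(point, step, rise=RISE, twist=TWIST):
    """Apply the rigid-body transform for base-pair number `step` (0-based)."""
    x, y, z = rotate_z(point, step * twist)
    return (x, y, z + step * rise)


def layout_helix(template, n_steps, rise=RISE, twist=TWIST):
    """Copy a template list of points to n_steps successive helical positions."""
    return [[place_step(p, i, rise, twist) for p in template]
            for i in range(n_steps)]


def distance(p, q):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


if __name__ == "__main__":
    # Hypothetical template: two reference points standing in for a base pair.
    template = [(5.0, 1.0, 0.0), (-5.0, 1.0, 0.0)]
    for i, pair in enumerate(layout_helix(template, 4)):
        print(i, pair, round(distance(pair[0], pair[1]), 3))
```

In a real model the template would hold the atoms of a base pair, and the backbone would be added and optimized afterwards, as described in item 1. Laying bases out along an arbitrary curve would replace the fixed axis with a local frame at each step.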
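For the distance-geometry approach in item 2, one standard preprocessing step is triangle-inequality bound smoothing, which tightens loose distance bounds using the tighter ones around them. The sketch below is a generic textbook version in plain Python, not the NAB implementation; the function name and the three-atom example are hypothetical. It runs one Floyd-Warshall-style pass over the upper bounds and one over the lower bounds.

```python
def smooth_bounds(lower, upper):
    """Tighten symmetric distance bounds with the triangle inequality.

    lower and upper are n x n lists of lists with zero diagonals.
    Returns new (lower, upper) matrices; the inputs are not modified.
    """
    n = len(upper)
    lo = [row[:] for row in lower]
    up = [row[:] for row in upper]

    # Upper bounds: d(i,j) can be no longer than the path through k.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = up[i][k] + up[k][j]
                if via_k < up[i][j]:
                    up[i][j] = via_k

    # Lower bounds: d(i,j) >= d(i,k) - d(k,j), using the smoothed upper bounds.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                candidate = max(lo[i][k] - up[k][j], lo[k][j] - up[i][k])
                if candidate > lo[i][j]:
                    lo[i][j] = candidate
    return lo, up


if __name__ == "__main__":
    # Hypothetical three-atom example: distances 0-1 and 1-2 are known exactly,
    # while 0-2 starts with an uninformative range of 0 to 100 Angstroms.
    lower = [[0.0, 5.0, 0.0],
             [5.0, 0.0, 1.5],
             [0.0, 1.5, 0.0]]
    upper = [[0.0, 5.0, 100.0],
             [5.0, 0.0, 1.5],
             [100.0, 1.5, 0.0]]
    lo, up = smooth_bounds(lower, upper)
    print("0-2 range after smoothing:", lo[0][2], "to", up[0][2])
```

A full distance-geometry calculation would then pick trial distances within the smoothed bounds, embed them as three-dimensional coordinates, and refine the result against the constraints. Those steps are omitted here.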
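Item 3 mentions that analytical second derivatives allow accurate minimization and normal-mode analysis. As a toy illustration only, the Python sketch below applies Newton-Raphson minimization to a one-dimensional Lennard-Jones pair energy in reduced units. It is not the Amber force field or the generalized Born model, and the function names and starting distance are assumptions. The point is only how an analytical first and second derivative combine into a Newton step.

```python
def lj_energy(r, eps=1.0, sigma=1.0):
    """Lennard-Jones pair energy 4*eps*((sigma/r)**12 - (sigma/r)**6)."""
    s6 = (sigma / r) ** 6
    return 4.0 * eps * (s6 * s6 - s6)


def lj_gradient(r, eps=1.0, sigma=1.0):
    """Analytical first derivative dE/dr."""
    s6 = (sigma / r) ** 6
    return 4.0 * eps * (-12.0 * s6 * s6 + 6.0 * s6) / r


def lj_hessian(r, eps=1.0, sigma=1.0):
    """Analytical second derivative d2E/dr2."""
    s6 = (sigma / r) ** 6
    return 4.0 * eps * (156.0 * s6 * s6 - 42.0 * s6) / (r * r)


def newton_minimize(r0, tol=1e-10, max_iter=50):
    """Newton-Raphson iteration r <- r - E'(r)/E''(r) from starting point r0."""
    r = r0
    for _ in range(max_iter):
        g = lj_gradient(r)
        h = lj_hessian(r)
        if h <= 0.0:
            # Outside the convex region a Newton step need not lower the energy.
            raise ValueError("non-positive curvature at r = %g" % r)
        step = g / h
        r -= step
        if abs(step) < tol:
            break
    return r


if __name__ == "__main__":
    r_min = newton_minimize(1.15)  # assumed starting distance near the minimum
    print("r_min =", r_min)
    print("E(r_min) =", lj_energy(r_min))
    print("curvature at minimum =", lj_hessian(r_min))
```

For a molecule the same scheme uses the gradient vector and the Hessian matrix of the full energy. The curvature at the minimum is also the quantity that normal-mode analysis works from: diagonalizing the mass-weighted Hessian gives the vibrational modes and frequencies.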

How to obtain NAB

Our latest release incorporates the "low-mode" (lmod) codes from Istvan Kolossvary, and Langevin modes from Yannick Bomble. This is now bundled as a part of AmberTools, and the current version is 12. AmberTools (and hence, NAB) is distributed as source code under the GNU General Public License (GPL). It runs on Linux, Mac OSX, Windows (under cygwin), and on most flavors of UNIX. The code (including a Users' Manual) may be downloaded from the Amber web site.

One use of NAB

James Stroud has used this code to prepare a web server that constructs nucleic acid helices from sequences. The target audience is primarily crystallographers, but others might find this site useful as well:

http://structure.usc.edu/make-na/

A simple code that does most of what this site does is available here:

fd_helix.c

This is a single C file, and compilation and usage instructions are in the comments at the top of the file.

The future(?) of NAB

NAB was originally written in the 1980s, in K&R C, and is beginning to show its age. In particular, the NAB language supports only a subset of the features of the underlying C language, and the unsupported features are not well documented. Further, because the language hides all pointer usage, any new built-in functions have to be registered in the symbol.c file, so that the way references are handled can be established. This makes it increasingly onerous to extend NAB to the modern era. Furthermore, alternatives like C++ and Python, which were not viable languages at the time NAB was begun, support much of what NAB was designed to do, but in a much more extensible way.

In the mid-2000s, we began the process of splitting the molecular-mechanics parts of NAB off into a separate C library, called "sff" (originally meaning Simple Force Field, although it's not so simple anymore). Users can see this code, which is still under active development, in the AmberTools/src/sff folder.

The NAB language itself is being deprecated. Beginning with the AmberTools18 release, we removed the documentation of NAB from the Reference Manual, since it had not changed in many years. The NAB compiler has had only bug-fix updates in the last decade. We will continue to distribute and support the language (since other programs depend on it), but current efforts are directed towards "nabc", which is a port of the basic ideas to C. Sample driver programs and tests can be viewed in the AmberTools/test/nabc directory, and more extensive developments are planned for future versions of AmberTools. In particular, drivers for pbsa and 3D-RISM calculations, and for non-periodic implicit solvent simulations, are available there, duplicating much of the functionality of the original NAB versions.



Updated on April 15, 2019. Comments to case@biomaps.rutgers.edu