The NAB molecular manipulation language

Generation of Models for "Unusual" DNA and RNA:
A Computer Language for Structural Exploration

Tom Macke, W.A. Svrcek-Seiler, Russell A. Brown, Istvan Kolossvary, Yannick Bomble and David A. Case

Summary

NAB was originally designed as a small modeling language (a "molecular awk"), with a principal focus on constructing models for non-helical nucleic acids. It has been used to construct models of helical and non-helical nucleic acids from a few dozen to a few hundred nucleotides in size, and provides a combination of rigid body transformations and distance geometry to create candidate structures that match input criteria. We have applied NAB to duplex-, triplex- and tetraplex DNA, to RNA hairpins and pseudo-knots, to closed-circular DNA, and to models of the small subunit of the ribosome and of recombination sites. [1]

As the code developed, an implementation of the AMBER force field was added, which includes the AMBER implementation of the generalized Born model for solvation effects. Version 5 includes analytical second derivatives, opening the way to new types of simulations. Force-field calculations can be carried out on proteins and small molecules, as well as nucleic acids, making NAB a useful platform for a variety of modeling tasks. For example, NAB code is incorporated into both AutoDock and Dock to provide a mechanism to carry out force-field calculations on protein-ligand and nucleic-acid-ligand complexes.

NAB consists of a language specification (constructed using lex and yacc) that has special support for macromolecules and their components, along with more general-purpose constructs such as strings, regular expressions and hashed arrays. This language has a C-like syntax, and is compiled to C code at an intermediate stage. There is also a support library (primarily coded in C) that implements rigid-body transformations, distance geometry, energy minimization, molecular dynamics and normal mode analysis.

[1] T. Macke and D.A. Case. Modeling unusual nucleic acid structures. In Molecular Modeling of Nucleic Acids, N.B. Leontis and J. SantaLucia, Jr., eds. (Washington, DC: American Chemical Society, 1998), pp. 379-393.

The NAB language

NAB (Nucleic Acid Builder) was developed by Tom Macke as a part of his graduate research at The Scripps Research Institute. It is a computer language (specified through lex and yacc) that allows nucleic acid structures to be described in a hierarchical fashion, using a language similar to C or awk, but designed especially for the manipulation of nucleic acid structures. NAB manipulates molecules through three principal techniques:

  1. First are base transformations, which are useful in helical or near-helical situations in which the geometric relation of one basepair (or triple) can be specified relative to others in the helix. Under these circumstances, the bases are laid out first to achieve the desired helical and base-pairing configurations, and the sugar-phosphate backbone (or a derivative thereof) is added and optimized in a separate step using molecular mechanics energy minimization procedures or distance geometry. Bases can be laid out along arbitrary curves in space.

  2. The second pillar of NAB functionality is distance geometry, which allows molecular structures to be built that satisfy sets of distance constraints. Such constraints often form a natural way of describing neighbor relationships, cross-linking or footprinting results, or hydrogen bond and helical constraints in nucleic acids. By systematically exploring databases of known nucleic acid structures, we have been able to derive sets of correlated distance constraints that significantly improve the performance of distance geometry techniques as applied to unusual nucleic acid structures. These techniques are especially useful in laying out non-helical regions of structures, such as hairpins or loops in pseudo-knot RNA structures.

  3. Once initial models have been constructed, they may be optimized or modified through energy minimization or molecular dynamics simulations. A full (non-periodic) implementation of the Amber force fields is provided, which includes the generalized Born solvation model, and its first and second derivatives. The second derivative facility allows accurate minimization and normal-mode analyses for quite large systems, using the generalized Born implicit solvent model [2].

    [2] R.A. Brown and D.A. Case. Second derivatives in generalized Born theory. J. Comput. Chem. 27, 1662-1675 (2006).
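As a rough illustration of the distance-geometry idea in item 2, the sketch below shows one generic preprocessing step used in distance geometry: tightening upper distance bounds with the triangle inequality before candidate coordinates are generated. It is written in Python rather than NAB, and it is not NAB's implementation. The function names (smooth_upper_bounds, bound_violations), the four-atom chain, and the bound values are all hypothetical and chosen only for the example.

```python
import math


def smooth_upper_bounds(n, upper):
    """Tighten upper distance bounds using the triangle inequality.

    n     -- number of atoms in the (hypothetical) toy system
    upper -- dict mapping index pairs (i, j) to an upper bound in angstroms
    Returns a symmetric n x n matrix of smoothed upper bounds.
    """
    # Pairs with no explicit constraint start out unbounded.
    u = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for (i, j), d in upper.items():
        u[i][j] = u[j][i] = min(u[i][j], d)

    # Floyd-Warshall style pass: d(i, j) can never exceed d(i, k) + d(k, j).
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = u[i][k] + u[k][j]
                if via_k < u[i][j]:
                    u[i][j] = via_k
    return u


def bound_violations(coords, u, tol=1e-9):
    """List (i, j, distance) for atom pairs that exceed their upper bound."""
    bad = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            d = math.dist(coords[i], coords[j])
            if d > u[i][j] + tol:
                bad.append((i, j, d))
    return bad


if __name__ == "__main__":
    # Illustrative constraints: three consecutive neighbors at most 1.5 A
    # apart, plus a loose 10 A bound between the two ends of the chain.
    bounds = {(0, 1): 1.5, (1, 2): 1.5, (2, 3): 1.5, (0, 3): 10.0}
    smoothed = smooth_upper_bounds(4, bounds)
    print(smoothed[0][3])  # 4.5: the loose end-to-end bound is tightened

    # A straight chain with 1.4 A spacing satisfies every smoothed bound.
    chain = [(0.0, 0.0, 0.0), (1.4, 0.0, 0.0), (2.8, 0.0, 0.0), (4.2, 0.0, 0.0)]
    print(bound_violations(chain, smoothed))  # []
```

The point of the example is that a few local constraints imply bounds on pairs that were never constrained directly, which narrows the space a structure generator has to search. A full distance-geometry treatment would also handle lower bounds and the embedding of coordinates, which this sketch leaves out.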

How to obtain NAB

Our latest release incorporates the "low-mode" (lmod) codes from Istvan Kolossvary, and Langevin modes from Yannick Bomble. This is now bundled as a part of AmberTools, and the current version is 12. AmberTools (and hence, NAB) is distributed as source code under the GNU General Public License (GPL). It runs on Linux, Mac OS X, Windows (under Cygwin), and on most flavors of UNIX. The code (including a Users' Manual) may be downloaded from the Amber web site.

One use of NAB

James Stroud has used this code to prepare a web server that constructs nucleic acid helices from sequences. The target audience is primarily crystallographers, but others might find this site useful as well:

http://structure.usc.edu/make-na/



Updated on July 12, 2016. Comments to case@biomaps.rutgers.edu