The NAB molecular manipulation language

Generation of Models for "Unusual" DNA and RNA:
A Computer Language for Structural Exploration

Tom Macke, W.A. Svrcek-Seiler, Russell A. Brown, Istvan Kolossvary, Yannick Bomble and David A. Case

Summary

NAB was originally designed as a small modeling language (a "molecular awk"), with a principal focus on constructing models for non-helical nucleic acids. It has been used to construct models of helical and non-helical nucleic acids from a few dozen to a few hundred nucleotides in size, and provides a combination of rigid body transformations and distance geometry to create candidate structures that match input criteria. We have applied NAB to duplex-, triplex- and tetraplex DNA, to RNA hairpins and pseudo-knots, to closed-circular DNA, and to models of the small subunit of the ribosome and of recombination sites. [1]

As the code developed, an implementation of the AMBER force field was added, which includes the AMBER implementation of the generalized Born model for solvation effects. Version 5 includes analytical second derivatives, opening the way to new types of simulations such as normal-mode analysis of large systems. Force-field calculations can be carried out on proteins and small molecules, as well as nucleic acids, making NAB a useful platform for a variety of modeling tasks. For example, NAB code is incorporated into both AutoDock and DOCK to provide a mechanism for carrying out force-field calculations on protein-ligand and nucleic-acid-ligand complexes.
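To illustrate what the generalized Born (GB) model contributes to a force-field energy, here is a minimal Python sketch of the GB polarization energy using the widely used Still-style function f_GB. This is not NAB code and not the libsff API. The function names gb_energy and f_gb are hypothetical, the effective Born radii are simply supplied as inputs (a real GB implementation computes them from the molecular geometry), salt screening is omitted, and 332.0522 kcal·Å/(mol·e²) is assumed as the Coulomb constant.

```python
import math

# Coulomb constant in kcal*Angstrom/(mol*e^2); a commonly used value,
# assumed here rather than taken from NAB's source.
COULOMB_CONST = 332.0522


def f_gb(r2, ri, rj):
    """Still-style GB function (hypothetical name): equals the Born radius
    at r = 0 and approaches the plain distance r at large separation."""
    rirj = ri * rj
    return math.sqrt(r2 + rirj * math.exp(-r2 / (4.0 * rirj)))


def gb_energy(coords, charges, born_radii, eps_in=1.0, eps_out=78.5):
    """GB polarization energy in kcal/mol (hypothetical name), summed over
    all atom pairs i, j, including the self terms i == j."""
    n = len(charges)
    total = 0.0
    for i in range(n):
        for j in range(n):
            r2 = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
            total += charges[i] * charges[j] / f_gb(r2, born_radii[i], born_radii[j])
    return -0.5 * (1.0 / eps_in - 1.0 / eps_out) * COULOMB_CONST * total


# A single monovalent ion with a 2 Angstrom Born radius in water-like solvent.
print(gb_energy([(0.0, 0.0, 0.0)], [1.0], [2.0]))
```

For a single ion the double sum contains only the self term, so the result reduces to the classic Born expression, which makes a convenient sanity check.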

NAB consists of a language specification (constructed using lex and yacc) that has special support for macromolecules and their components, along with more general-purpose constructs such as strings, regular expressions and hashed arrays. The language has a C-like syntax and is translated into C code as an intermediate stage. There is also a support library (primarily coded in C) that implements rigid-body transformations, distance geometry, energy minimization, molecular dynamics and normal-mode analysis.
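Rigid-body transformations of this kind are conveniently expressed as 4x4 homogeneous matrices. The Python sketch below (hypothetical names, not NAB built-ins) builds a uniform helix in the spirit of NAB's base transformations: one "screw" step, a twist about the helix axis combined with a rise along it, is applied repeatedly so that each base-pair frame is defined relative to the previous one. The 36° twist and 3.38 Å rise are illustrative B-DNA-like values, assumed here, and the single "base atom" is a hypothetical stand-in for a full base.

```python
import math


def screw_matrix(twist_deg, rise):
    """4x4 homogeneous matrix: rotate by twist about z, translate by rise along z."""
    t = math.radians(twist_deg)
    c, s = math.cos(t), math.sin(t)
    return [
        [c, -s, 0.0, 0.0],
        [s, c, 0.0, 0.0],
        [0.0, 0.0, 1.0, rise],
        [0.0, 0.0, 0.0, 1.0],
    ]


def matmul(a, b):
    """Compose two 4x4 transforms (b is applied first, then a)."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def transform_point(m, p):
    """Apply a 4x4 transform to a 3D point p = (x, y, z)."""
    v = (p[0], p[1], p[2], 1.0)
    return tuple(sum(m[i][k] * v[k] for k in range(4)) for i in range(3))


def stack_frames(n, twist_deg, rise):
    """Return n transforms for successive base pairs; frame k is the screw step applied k times."""
    step = screw_matrix(twist_deg, rise)
    frames = [[[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]]
    for _ in range(n - 1):
        frames.append(matmul(step, frames[-1]))
    return frames


# Place a hypothetical base atom 5 Angstrom from the helix axis in each of 11 base pairs.
frames = stack_frames(11, 36.0, 3.38)
atom = (5.0, 0.0, 0.0)
for k, m in enumerate(frames):
    x, y, z = transform_point(m, atom)
    print(f"bp {k:2d}: x={x:6.2f} y={y:6.2f} z={z:6.2f}")
```

Because a rotation about the z axis commutes with a translation along it, ten 36° steps bring the atom back to its starting azimuth, 33.8 Å further up the axis. A non-uniform or curved helix would simply use a different matrix for each step.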

[1] T. Macke and D.A. Case. Modeling unusual nucleic acid structures. In Molecular Modeling of Nucleic Acids, N.B. Leontis and J. SantaLucia, Jr., eds. (Washington, DC: American Chemical Society, 1998), pp. 379-393.

The NAB language

NAB (Nucleic Acid Builder) was developed by Tom Macke as part of his graduate research at The Scripps Research Institute. It is a computer language (specified through lex and yacc) that allows nucleic acid structures to be described in a hierarchical fashion, with a syntax similar to C or awk but designed especially for the manipulation of nucleic acid structures. NAB manipulates molecules through three principal techniques:

  1. The first is base transformations, which are useful in helical or near-helical situations in which the geometric relation of one base pair (or triple) can be specified relative to others in the helix. Under these circumstances, the bases are laid out first to achieve the desired helical and base-pairing configurations, and the sugar-phosphate backbone (or derivatives thereof) is then added and optimized in a separate step, using molecular mechanics energy minimization or distance geometry. Bases can be laid out along arbitrary curves in space.

  2. The second pillar of NAB functionality is distance geometry, which allows molecular structures to be built that satisfy sets of distance constraints. Such constraints are often a natural way to describe neighbor relationships, cross-linking or footprinting results, or hydrogen-bond and helical constraints in nucleic acids. By systematically exploring databases of known nucleic acid structures, we have been able to derive sets of correlated distance constraints that significantly improve the performance of distance geometry techniques as applied to unusual nucleic acid structures. These techniques are especially useful for laying out non-helical regions of structures, such as hairpins or loops in pseudo-knot RNA structures.

  3. Once initial models have been constructed, they may be optimized or modified through energy minimization or molecular dynamics simulations. A full (non-periodic) implementation of the Amber force fields is provided, which includes the generalized Born solvation model, and its first and second derivatives. The second derivative facility allows accurate minimization and normal-mode analyses for quite large systems, using the generalized Born implicit solvent model [2].

    [2] R.A. Brown and D.A. Case. Second derivatives in generalized Born theory. J. Comput. Chem. 27, 1662-1675 (2006).
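Distance geometry starts from lower and upper bounds on interatomic distances and tightens them before any coordinates are generated. As a minimal sketch of that first step, assuming bounds stored as plain nested lists, the Python code below performs triangle-inequality bound smoothing. The function names are hypothetical, not NAB built-ins, and this is a textbook version of the technique rather than NAB's own implementation.

```python
def make_bounds(n, pairs, default_upper=100.0):
    """Build symmetric lower/upper bound matrices (hypothetical helper).
    pairs maps (i, j) -> (lower, upper); unspecified pairs get [0, default_upper]."""
    lower = [[0.0] * n for _ in range(n)]
    upper = [[0.0 if i == j else default_upper for j in range(n)] for i in range(n)]
    for (i, j), (lo, hi) in pairs.items():
        lower[i][j] = lower[j][i] = lo
        upper[i][j] = upper[j][i] = hi
    return lower, upper


def triangle_smooth(lower, upper):
    """Tighten bounds in place with the triangle inequality (Floyd-style
    triple loop); raises ValueError if the bounds are inconsistent."""
    n = len(upper)
    for k in range(n):
        for i in range(n):
            for j in range(i + 1, n):
                if k == i or k == j:
                    continue
                # Upper bound: d_ij <= d_ik + d_kj (a shortest-path relaxation).
                u = min(upper[i][j], upper[i][k] + upper[k][j])
                # Lower bound: d_ij >= d_ik - d_kj and d_ij >= d_jk - d_ki.
                lo = max(lower[i][j], lower[i][k] - upper[k][j], lower[j][k] - upper[k][i])
                if lo > u:
                    raise ValueError(f"inconsistent bounds for pair ({i}, {j})")
                upper[i][j] = upper[j][i] = u
                lower[i][j] = lower[j][i] = lo
    return lower, upper


# Hypothetical four-atom chain: three 1.5 Angstrom "bonds" plus one
# footprinting-style restraint saying atoms 0 and 3 are at least 4 Angstrom apart.
lower, upper = make_bounds(4, {(0, 1): (1.5, 1.5), (1, 2): (1.5, 1.5),
                               (2, 3): (1.5, 1.5), (0, 3): (4.0, 100.0)})
triangle_smooth(lower, upper)
print(lower[0][2], upper[0][2])  # 2.5 3.0
```

The smoothed bounds are what a distance geometry code would sample trial distances from before embedding them into Cartesian coordinates and refining the result; those later steps are not shown here.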

How to obtain NAB

Our latest release incorporates the "low-mode" (lmod) codes from Istvan Kolossvary, and Langevin modes from Yannick Bomble. NAB is distributed as source code under the GNU General Public License (GPL). It runs on Linux, macOS, Windows (under WSL), and on most flavors of UNIX. The code (including a Users' Manual) may be downloaded from:

github.com/dacase/nabc.

This repository includes the NAB language compiler, the libsff library of force field routines, a C-language interface to libsff (called nabc), and the shifts package for estimating NMR chemical shifts (since it is built on top of NAB).

Note: NAB used to be included in AmberTools, but was split off from that larger package in 2023. However, many (but by no means all) applications of NAB will also require installation of AmberTools; see ambermd.org/GetAmber.php for more information about AmberTools.

The future(?) of NAB

NAB was originally written in the 1980s, in K&R C, and is beginning to show its age. In particular, the NAB language supports only a subset of the features of the underlying C language, and the unsupported features are not well documented. Further, because the language hides all pointer usage, every new built-in function has to be registered in the symbol.c file so that the way its references are handled can be established. This makes it increasingly onerous to extend NAB and keep it current. Moreover, alternatives like C++ and Python, which were not viable languages when NAB was begun, support much of what NAB was designed to do, but in a much more extensible way.

In the mid-2000s, we began splitting the molecular-mechanics parts of NAB off into a separate C library, called "sff" (originally meaning Simple Force Field, although it is not so simple anymore). Users can see this code, which is still under active development, in the src/sff folder.

The NAB language itself is being deprecated, but it still receives maintenance releases. Among other uses, it forms the basis for the SHIFTS package for estimating NMR chemical shifts in proteins and nucleic acids (also part of the nabc package).



Updated on October 3, 2024. Comments to david.case@rutgers.edu