Crystallography & NMR System

A protein monomer + DNA fragment + ytterbium ion + water molecules

This tutorial shows how to generate a molecular topology file (.mtf) and a CNS coordinate file (.pdb) using the generate_easy.inp task file, starting with a standard coordinate file as obtained from the PDB.

The structure with the PDB ID code 2bop is a protein monomer with a DNA fragment, a ytterbium ion, and water molecules. The nomenclature used in this file for the DNA residue names and some atom names differs significantly from that used in CNS. The utility program fix_dna_rna converts a PDB-format DNA/RNA file to something more suitable for CNS:

      fix_dna_rna < 2bop.pdb > 2bop_fix1.pdb [< 1 second]

For the methyl carbon of thymine, the atom name used in the original PDB file is C5M, while CNS expects the name C5A. The atom name is changed with the UNIX command:

      sed 's/ C5M THY / C5A THY /' 2bop_fix1.pdb > 2bop_fix2.pdb  [< 1 second]
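A quick check that the substitution did what was intended can be sketched as follows. This is only an illustration: the two-record sample file and its coordinates are made up and stand in for 2bop_fix1.pdb, and the file names sample_fix1.pdb and sample_fix2.pdb are assumptions. Because the old and new strings have the same length, the fixed column layout of the PDB records is not disturbed.

```shell
# Hypothetical two-record sample standing in for 2bop_fix1.pdb
# (coordinates are invented for illustration only).
tmp=$(mktemp -d)
cat > "$tmp/sample_fix1.pdb" <<'EOF'
ATOM   2001  C5M THY B   5      10.000  20.000  30.000  1.00 25.00
ATOM   2002  C6  THY B   5      11.000  21.000  31.000  1.00 25.00
EOF

# Same substitution as in the tutorial; both strings are equally long,
# so every field stays in its original column.
sed 's/ C5M THY / C5A THY /' "$tmp/sample_fix1.pdb" > "$tmp/sample_fix2.pdb"

# Count records that still carry the old name and records with the new name.
# grep -c exits non-zero when the count is 0, hence the "|| true".
old_left=$(grep -c ' C5M THY ' "$tmp/sample_fix2.pdb" || true)
new_found=$(grep -c ' C5A THY ' "$tmp/sample_fix2.pdb" || true)
echo "C5M remaining: $old_left, C5A present: $new_found"
```

Run against the real 2bop_fix2.pdb, the same two grep commands should report zero remaining C5M records.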

It would also be possible to manually change the atom names by using a standard text editor.

In the original PDB file, the chain identifier used for the water molecules is identical to that of the protein. This can lead to complicated selection statements in other task files and is also generally confusing. It is recommended to change the chain identifiers, for example with the UNIX command:

      sed 's/ HOH A / HOH W /' 2bop_fix2.pdb > 2bop_fix3.pdb  [< 1 second]
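To confirm that only the waters were relabelled, the chain identifiers can be tallied before moving on. The sketch below is an illustration under stated assumptions: the three sample records (residue names, numbers, and coordinates) are invented, and it relies on the standard PDB column layout with the residue name in columns 18-20 and the chain identifier in column 22.

```shell
# Hypothetical sample standing in for 2bop_fix2.pdb (all values invented).
tmp=$(mktemp -d)
cat > "$tmp/sample_fix2.pdb" <<'EOF'
ATOM      1  N   MET A   1      10.000  20.000  30.000  1.00 20.00
HETATM 3001  O   HOH A 401      12.000  22.000  32.000  1.00 30.00
HETATM 3002  O   HOH A 402      13.000  23.000  33.000  1.00 30.00
EOF

# Same substitution as in the tutorial.
sed 's/ HOH A / HOH W /' "$tmp/sample_fix2.pdb" > "$tmp/sample_fix3.pdb"

# Tally chain identifiers (column 22) separately for waters and other records.
chain_summary=$(awk '
  { kind = (substr($0, 18, 3) == "HOH") ? "water" : "other"
    count[kind " " substr($0, 22, 1)]++ }
  END { for (k in count) print k, count[k] }
' "$tmp/sample_fix3.pdb" | sort)
echo "$chain_summary"
```

Note that the sed pattern only matches waters in chain A; waters carrying any other chain identifier would need their own substitution.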

The generate_easy.inp task file also needs to be modified:

The correct name for the protein coordinate file must be defined (2bop_fix3.pdb).
All nucleic acid residues initially have ribose sugars (rather than deoxyribose). A patch must be applied to convert the ribose to deoxyribose for DNA residues. In this example, the selection statement for the RNA to DNA conversion must therefore be changed to (segid B).

The command to generate the files generate_easy.mtf and generate_easy.pdb is:

      cns_solve < generate_easy.inp > generate_easy.out  [3 seconds]

Inspection of the generate_easy.pdb output file shows that the first character of the CNS segment identifier is also used as the chain identifier. In CNS, segment identifiers can be up to four characters long. However, many other programs (including the graphics program O) can only handle one-character chain identifiers and ignore the segment identifiers. For compatibility, it is therefore usually more convenient to use only one-character segment identifiers.
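The relationship between the two identifiers can be inspected by listing each distinct pair of chain identifier and segment identifier. The following is a minimal sketch, assuming the segment identifier is written in columns 73-76 and the chain identifier in column 22; the helper pdb_line and the sample records are hypothetical and only serve to produce correctly aligned input.

```shell
# Hypothetical helper: print one fixed-column PDB record with a segment
# identifier in columns 73-76 (coordinates and B-factor are placeholders).
pdb_line() {  # args: record serial atom resname chain resseq segid
  printf '%-6s%5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f%6.2f%6.2f      %-4s\n' \
    "$1" "$2" "$3" "$4" "$5" "$6" 10 20 30 1 20 "$7"
}

# Invented two-record sample standing in for generate_easy.pdb.
tmp=$(mktemp -d)
{
  pdb_line ATOM   1 " N  " MET A   1 A
  pdb_line HETATM 2 " O  " HOH W 401 W
} > "$tmp/sample_out.pdb"

# List each distinct (chain identifier, segment identifier) pair.
id_pairs=$(awk '/^(ATOM|HETATM)/ {
    seg = substr($0, 73, 4); gsub(/ /, "", seg)
    print substr($0, 22, 1), seg
}' "$tmp/sample_out.pdb" | sort -u)
echo "$id_pairs"
```

With one-character segment identifiers, both fields of every pair are identical, which is the situation that programs reading only the chain identifier handle without ambiguity.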

Script to run this tutorial