
There are two ways of introducing capping groups in CNS. One way is to apply a patch to a terminus to add the atoms of the capping group. In this case the terminal residue, including the capping group, is treated as one unit, and the residue name is still that of the standard amino acid. This solution works well as long as all the work is done in CNS. However, if the CNS PDB file is used in another program (for example O), the additional atoms can cause problems. Alternatively, a new residue type can be defined for each capping group. This approach causes fewer problems when the CNS PDB file is used in other programs. The following outlines the creation of the three types of files that define new residue types:
| capping.top | This topology file defines the atoms in each capping group, the bonds, angles, dihedrals and impropers, and methods for linking each group to a protein chain. |
| capping.param | This parameter file defines the target values and weights for the bonds, angles, dihedrals and impropers defined in the topology file. |
| protein_capping.link | This linkage file defines how the linkage methods defined in the topology file are used. |
mass NHHE 16.02270 ! extended atom nitrogen with two hydrogen
residue NHH
group
atom N type=NHHE charge=0.00 end
end
In the standard topology file protein.top used for refinement
against X-ray data, only some hydrogens are explicitly included.
Following this approach, the hydrogens of the amide group are not
specified explicitly. Instead, an extended atom type
NHHE is defined with the sum of the masses of one nitrogen and
two hydrogens. The name of the new residue type is chosen as NHH.
When a structure is generated from a PDB file, the name of the
nitrogen in the amide group must be N. No charge is specified,
because charges are not normally used in crystallographic refinement.
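The mass given for NHHE can be checked by hand. The following is a minimal sketch in Python, not CNS input; the atomic masses for nitrogen and hydrogen are assumed values (commonly tabulated, rounded numbers) and are not quoted from protein.top.

```python
# Assumed atomic masses in atomic mass units (not taken from protein.top).
MASS_N = 14.00670
MASS_H = 1.00800


def extended_atom_mass(heavy_mass, n_hydrogens, hydrogen_mass=MASS_H):
    """Mass of an extended atom: the heavy atom plus its implicit hydrogens."""
    return heavy_mass + n_hydrogens * hydrogen_mass


# One nitrogen plus two implicit hydrogens, as described for NHHE.
nhhe_mass = extended_atom_mass(MASS_N, 2)
print(f"mass NHHE {nhhe_mass:.5f}")
```

With these assumed masses the result agrees with the value 16.02270 used in the mass statement for NHHE.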
The second part of the topology definition for the amide capping group specifies methods for linking the group to a protein chain. Two methods are required, one for proline, and another for all the other amino acids.
presidue NHHL { link for all amino acids except proline }
add bond -C +N
add angle -CA -C +N
add angle -O -C +N
add improper -C -CA +N -O {planar -C}
end
presidue NHHP { link for proline }
add bond -C +N
add angle -CA -C +N
add angle -O -C +N
add dihedral -N -CA -C +N
add improper -C -CA +N -O {planar -C}
end
The two presidue definitions are based on the PEPT
and PEPP definitions in the standard topology file
protein.top. The plus and minus signs are used to distinguish
the atoms in the two residues to be linked. The add bond -C +N
statement defines a bond from the atom with the name C in the
"previous" residue to the atom with the name N in
the "next" residue in the peptide chain. The angle, dihedral
and improper definitions are a subset of the PEPT and
PEPP definitions. All statements involving atoms in the
"next" residue other than N are omitted.
bond C NHHE 1342.404 1.328 ! same as C NH2
angle CH1E C NHHE 863.744 116.900 ! same as CH1E C N
angle CH2G C NHHE 440.686 118.200 ! same as CH2G C N
angle O C NHHE 991.543 122.000 ! same as N C O
evaluate ($vdw_radius_N= 3.0) {-0.1}
evaluate ($vdw_radius_N = $vdw_radius_N / 2^(1/6))
evaluate ($vdw_radius14_N = $vdw_radius_N -0.3/ 2^(1/6))
evaluate ($vdw_eps=0.1)
nonbonded NHHE $vdw_eps $vdw_radius_N $vdw_eps $vdw_radius14_N ! same as NH2
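The evaluate statements above divide by 2^(1/6). For a 12-6 Lennard-Jones potential the energy minimum lies at 2^(1/6) times sigma, so this division converts a minimum-energy distance into sigma. The sketch below repeats the arithmetic in Python; it assumes that the values passed to the nonbonded statement are epsilon and sigma for the regular and the 1-4 interactions, which should be checked against the CNS syntax documentation.

```python
SIXTH_ROOT_OF_TWO = 2 ** (1 / 6)


def rmin_to_sigma(r_min):
    """Convert a Lennard-Jones minimum-energy distance to sigma."""
    return r_min / SIXTH_ROOT_OF_TWO


# Mirrors: evaluate ($vdw_radius_N = 3.0), then division by 2^(1/6).
vdw_radius_n = rmin_to_sigma(3.0)

# Mirrors: $vdw_radius14_N = $vdw_radius_N - 0.3 / 2^(1/6).
# Division binds tighter than subtraction, so this equals (3.0 - 0.3) / 2^(1/6).
vdw_radius14_n = vdw_radius_n - 0.3 / SIXTH_ROOT_OF_TWO

vdw_eps = 0.1
print(f"nonbonded NHHE {vdw_eps} {vdw_radius_n:.4f} {vdw_eps} {vdw_radius14_n:.4f}")
```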
For each topology definition, there must be a corresponding parameter
definition. However, in contrast to the topology definitions, which are
based on atom names, the parameter definitions are type based.
Therefore, one topology statement can require several parameter
statements. This is, for example, the case for the topology statement
"add angle -CA -C +N". In glycine, the CA
is defined with type CH2G, and in all other amino-acid
residues with type CH1E. Therefore there are two corresponding
parameter angle statements.
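The relation between one name-based topology statement and several type-based parameter statements can be sketched as a small lookup. This is an illustration in Python, not a description of how CNS works internally. The CA types come from the text above; the residue selection and the type C for the carbonyl carbon are assumptions made for the example.

```python
# Atom name -> atom type for the "previous" residue (assumed excerpt).
PREVIOUS_RESIDUE_TYPES = {
    "GLY": {"CA": "CH2G", "C": "C"},
    "ALA": {"CA": "CH1E", "C": "C"},
    "SER": {"CA": "CH1E", "C": "C"},
}
# Atom name -> atom type for the NHH capping group.
NHH_TYPES = {"N": "NHHE"}


def angle_parameters_needed(residue_names):
    """Distinct type triples required by 'add angle -CA -C +N'."""
    needed = set()
    for name in residue_names:
        previous = PREVIOUS_RESIDUE_TYPES[name]
        needed.add((previous["CA"], previous["C"], NHH_TYPES["N"]))
    return sorted(needed)


for types in angle_parameters_needed(["GLY", "ALA", "SER"]):
    print("angle", *types)
```

Three residues yield only two distinct type triples, which correspond to the two angle statements involving CH1E and CH2G in the parameter file.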
One might notice that there are no parameter definitions for the two dihedral and improper definitions above (add dihedral -N -CA -C +N, add improper -C -CA +N -O). This is because the matching parameter statements in protein_rep.param use wildcards, and therefore the type of the nitrogen is not important. The matching parameter statements are:
dihe X C CH1E X 0.0 3 0.0 ! psi angle
evaluate ($kimpr_strong=750.0)
impr C X X O $kimpr_strong 0 0.0
It would be possible to override the wildcards with specific dihedral and improper parameter statements. More details can be found in the reference section.
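How a wildcard entry can cover the new nitrogen type may be illustrated with a short Python sketch. It assumes that X matches any atom type and that a dihedral pattern may be read in either direction; both points are assumptions of this sketch and should be checked against the reference section. The type N for the proline nitrogen and C for the carbonyl carbon are likewise assumed.

```python
WILDCARD = "X"


def matches(pattern, types):
    """True if every non-wildcard entry of the pattern equals the atom type."""
    return all(p == WILDCARD or p == t for p, t in zip(pattern, types))


def matches_either_direction(pattern, types):
    """Try the atom types in the given order and in reversed order."""
    return matches(pattern, types) or matches(pattern, tuple(reversed(types)))


# add dihedral -N -CA -C +N, for a proline followed by NHH (assumed types).
dihedral_types = ("N", "CH1E", "C", "NHHE")
# add improper -C -CA +N -O
improper_types = ("C", "CH1E", "NHHE", "O")

print(matches_either_direction(("X", "C", "CH1E", "X"), dihedral_types))
print(matches(("C", "X", "X", "O"), improper_types))
```

In both cases the position occupied by NHHE is a wildcard, so no NHHE-specific dihedral or improper statement is needed.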
residue ACE
group
atom CA type=CH3E charge=0.00 end
atom C type=C charge=0.00 end
atom O type=O charge=0.00 end
bond C CA
bond C O
end
For the sake of simplicity, pre-existing atom type definitions
from protein.top are used.
The second part of the topology definition for the acetyl capping group specifies methods for linking the group to a protein chain. As before for NHH, two methods are required, one for proline, and another for all the other amino acids.
presidue ACEL { link for all amino acids except proline }
add bond -C +N
add angle -CA -C +N
add angle -O -C +N
add angle -C +N +CA
add angle -C +N +H
add dihedral -C +N +CA +C
add dihedral -CA -C +N +CA
add improper -C -CA +N -O {planar -C}
add improper +N -C +CA +H {planar +N}
end
presidue ACEP { link for proline }
add bond -C +N
add angle -CA -C +N
add angle -O -C +N
add angle -C +N +CA
add angle -C +N +CD
add dihedral -C +N +CA +C
add dihedral -CA -C +N +CA
add improper -C -CA +N -O {planar -C}
add improper +N +CA +CD -C {planar +N}
end
As for NHH, the two presidue definitions are based on
the PEPT and PEPP definitions in the standard
topology file protein.top. The only difference is that the
dihedral statements involving the nitrogen atom in the
"previous" residue are omitted.
bond C CH3E 1000.000 1.507 ! distance from Corina
angle CH3E C O 672.465 120.800 ! same as CH1E C O
angle CH3E C NH1 485.856 116.200 ! same as CH1E C NH1
angle CH3E C N 863.744 116.900 ! same as CH1E C N
evaluate ($kdih_rigid=1250.0)
dihe CH3E C N CH1E $kdih_rigid 2 180.0 ! same as H2E C N CH1E
link ACEP head - ACE tail + PRO end
link ACEP head - ACE tail + CPR end
link ACEL head - ACE tail + * end
link NHHP head - PRO tail + NHH end
link NHHP head - CPR tail + NHH end
link NHHL head - * tail + NHH end
The syntax for the link statements is:
link <presidue-label> head <patch-character> <residue-name>
tail <patch-character> <residue-name>
end
In all the examples, the <patch-character> for the head is a
minus sign and the <patch-character> for the tail is a plus sign.
Other characters could be used, but for chains minus and plus signs are
the most intuitive.
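The link statements of this section can be read as a rule table that maps a pair of neighbouring residue names to a presidue label. The Python sketch below is only an illustration. It assumes that the rules are tried in the listed order, that the first match wins, and that * matches any residue name; this is not a statement about how CNS evaluates the linkage file.

```python
# (presidue label, head residue, tail residue), in the order listed above.
LINK_RULES = [
    ("ACEP", "ACE", "PRO"),
    ("ACEP", "ACE", "CPR"),
    ("ACEL", "ACE", "*"),
    ("NHHP", "PRO", "NHH"),
    ("NHHP", "CPR", "NHH"),
    ("NHHL", "*", "NHH"),
]


def select_link(head_residue, tail_residue, rules=LINK_RULES):
    """Return the label of the first rule matching the residue pair, or None."""
    for label, head, tail in rules:
        if head in ("*", head_residue) and tail in ("*", tail_residue):
            return label
    return None


for pair in [("ACE", "PRO"), ("ACE", "ALA"), ("CPR", "NHH"), ("GLY", "NHH")]:
    print(pair, "->", select_link(*pair))
```

Under these assumptions the proline-specific rules must precede the wildcard rules, otherwise ACEL and NHHL would also capture the proline cases.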
cns_solve < generate.inp > generate.out [< 1 second]
To demonstrate this, the files capping.pdb and capping.param are copied to incomplete.pdb and incomplete.param, respectively. The file incomplete.inp is a copy of the previously used file generate.inp with the new file names for the PDB and the parameter file.
Arbitrarily, a dihedral definition is deleted from incomplete.param, and a gamma-carbon from the cis-proline in incomplete.pdb. The generate task file incomplete.inp will now attempt to build this missing carbon atom. For this it needs all the parameter definitions, and will therefore produce error messages if some are missing.
cns_solve < incomplete.inp > incomplete.out [< 1 second]
The error message at the end of incomplete.out is:
%CODDIH-ERR: missing dihedral parameters %%%%%%%%%%%%%%%%%%%%%
dihedral energy constant missing.
target dihedral value missing.
periodicity missing.
ATOM1: SEGId="C ", RESId="1 ", NAME="CA ", CHEMical="CH3E"
ATOM2: SEGId="C ", RESId="1 ", NAME="C ", CHEMical="C "
ATOM3: SEGId="C ", RESId="2 ", NAME="N ", CHEMical="N "
ATOM4: SEGId="C ", RESId="2 ", NAME="CA ", CHEMical="CH1E"
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%CODDIH-ERR: missing dihedral parameters %%%%%%%%%%%%%%%%%%%%%
dihedral energy constant missing.
target dihedral value missing.
periodicity missing.
ATOM1: SEGId="D ", RESId="1 ", NAME="CA ", CHEMical="CH3E"
ATOM2: SEGId="D ", RESId="1 ", NAME="C ", CHEMical="C "
ATOM3: SEGId="D ", RESId="2 ", NAME="N ", CHEMical="N "
ATOM4: SEGId="D ", RESId="2 ", NAME="CA ", CHEMical="CH1E"
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%CODDIH error encountered: program will be aborted.
In general, there will only be error messages for one type of parameter definition and it can be necessary to run the generate task file several times before all the missing parameter definitions are found. For example, first, only the missing bond parameters are shown. If they are supplied and the generate task file is run again, the missing angle parameters are listed, and so on for the dihedral, improper and non-bonded parameter definitions.
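The CHEMical fields of a %CODDIH-ERR block name the atom types of the missing dihedral parameter, so the type list for the required dihe statement can be read directly from the output. The following Python sketch collects the distinct type combinations. The function name is hypothetical and the sample text is a shortened copy of the message shown above; only the layout of that message is assumed.

```python
import re


def missing_dihedral_types(log_text):
    """Return the distinct 4-tuples of atom types named in %CODDIH-ERR blocks."""
    found = []
    # Everything before the first marker is not part of an error block.
    for block in log_text.split("%CODDIH-ERR:")[1:]:
        types = tuple(t.strip() for t in re.findall(r'CHEMical="([^"]*)"', block))
        if len(types) >= 4 and types[:4] not in found:
            found.append(types[:4])
    return found


sample = '''%CODDIH-ERR: missing dihedral parameters
ATOM1: SEGId="C ", RESId="1 ", NAME="CA ", CHEMical="CH3E"
ATOM2: SEGId="C ", RESId="1 ", NAME="C ", CHEMical="C "
ATOM3: SEGId="C ", RESId="2 ", NAME="N ", CHEMical="N "
ATOM4: SEGId="C ", RESId="2 ", NAME="CA ", CHEMical="CH1E"
%CODDIH-ERR: missing dihedral parameters
ATOM1: SEGId="D ", RESId="1 ", NAME="CA ", CHEMical="CH3E"
ATOM2: SEGId="D ", RESId="1 ", NAME="C ", CHEMical="C "
ATOM3: SEGId="D ", RESId="2 ", NAME="N ", CHEMical="N "
ATOM4: SEGId="D ", RESId="2 ", NAME="CA ", CHEMical="CH1E"
%CODDIH error encountered: program will be aborted.
'''

for types in missing_dihedral_types(sample):
    print("dihe", *types)
```

Both error blocks (segments C and D) refer to the same type combination, so only a single dihe statement for CH3E C N CH1E has to be restored in incomplete.param.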
It should be noted that a more elegant and rigorous way of generating parameter definitions is to use the LEARN statement. However, this requires well-defined coordinates of several example molecules. More details about the LEARN statement can be found in another tutorial section.