There are two ways of introducing capping groups in CNS. One way is to apply a patch to a terminus to add the atoms of the capping group. In this case the terminal residue including the capping group is treated as one unit. The residue name is still that of the standard amino acid. This solution works well as long as all the work is done in CNS. However, if the CNS PDB file is used in another program (for example O), the additional atoms can cause problems. Alternatively, a new residue type can be defined for each capping group. This approach causes fewer problems when the CNS PDB file is used in other programs. In the following, the creation of three types of files for the definition of new residue types is outlined:
capping.top | This topology file defines the atoms in each capping group, the bonds, angles, dihedrals and impropers, and methods for linking each group to a protein chain. |
capping.param | This parameter file defines the target values and weights for the bonds, angles, dihedrals and impropers defined in the topology file. |
protein_capping.link | This linkage file defines how the linkage methods defined in the topology file are used. |
In the standard topology file protein.top used for refinement against X-ray data, only some hydrogens are explicitly included. Following this approach, the hydrogens of the amide group are not specified explicitly. Instead, an extended atom type NHHE is defined with the sum of the masses of one nitrogen and two hydrogens. The name of the new residue type is chosen as NHH. When a structure is generated from a PDB file, the name of the nitrogen in the amide group must be N. No charge is specified, because charges are not normally used in crystallographic refinement.

    mass NHHE 16.02270 ! extended atom nitrogen with two hydrogen

    residue NHH
      group
        atom N type=NHHE charge=0.00 end
    end
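The mass of the extended atom type can be checked with a few lines of Python. This is only an illustrative sketch: the two atomic masses and the function name extended_atom_mass are assumptions made for the example and are not quoted from protein.top.

```python
# Atomic masses in atomic mass units. These values are assumptions chosen
# for illustration; they are not copied from the CNS topology files.
MASS_N = 14.0067
MASS_H = 1.0080


def extended_atom_mass(heavy_mass: float, n_hydrogens: int,
                       hydrogen_mass: float = MASS_H) -> float:
    """Mass of an extended atom: the heavy atom plus its implicit hydrogens."""
    return heavy_mass + n_hydrogens * hydrogen_mass


# NHHE is one nitrogen with two implicit hydrogens.
nhhe_mass = extended_atom_mass(MASS_N, 2)
print(f"mass NHHE {nhhe_mass:.5f}")
```

With these assumed masses the result agrees with the value 16.02270 used in the mass statement.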
The second part of the topology definition for the amide capping group specifies methods for linking the group to a protein chain. Two methods are required, one for proline, and another for all the other amino acids.
The two presidue definitions are based on the PEPT and PEPP definitions in the standard topology file protein.top. The plus and minus signs are used to distinguish the atoms in the two residues to be linked. The add bond -C +N statement defines a bond from the atom with the name C in the "previous" residue to the atom with the name N in the "next" residue in the peptide chain. The angle, dihedral and improper definitions are a subset of the PEPT and PEPP definitions. All statements involving atoms in the "next" residue other than N are omitted.

    presidue NHHL { link for all amino acids except proline }
      add bond -C +N
      add angle -CA -C +N
      add angle -O -C +N
      add improper -C -CA +N -O {planar -C}
    end

    presidue NHHP { link for proline }
      add bond -C +N
      add angle -CA -C +N
      add angle -O -C +N
      add dihedral -N -CA -C +N
      add improper -C -CA +N -O {planar -C}
    end
For each topology definition, there must be a corresponding parameter definition. However, in contrast to the topology definitions which are based on atom names, the parameter definitions are type based. Therefore one topology statement can require several parameter statements. This is, for example, the case for the topology statement "add angle -CA -C +N". In glycine, the CA is defined with type CH2G, and in all other amino-acid residues with type CH1E. Therefore there are two corresponding parameter angle statements.

    bond C NHHE 1342.404 1.328 ! same as C NH2

    angle CH1E C NHHE 863.744 116.900 ! same as CH1E C N
    angle CH2G C NHHE 440.686 118.200 ! same as CH2G C N
    angle O C NHHE 991.543 122.000 ! same as N C O

    evaluate ($vdw_radius_N= 3.0) {-0.1}
    evaluate ($vdw_radius_N = $vdw_radius_N / 2^(1/6))
    evaluate ($vdw_radius14_N = $vdw_radius_N -0.3/ 2^(1/6))
    evaluate ($vdw_eps=0.1)
    nonbonded NHHE $vdw_eps $vdw_radius_N $vdw_eps $vdw_radius14_N ! same as NH2
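The step from one name-based topology statement to several type-based parameter statements can be made explicit with a short Python sketch. The table TYPES_BY_NAME and the function required_parameter_keys are hypothetical helpers written for this illustration; they only contain the atom types mentioned in the text and are not part of CNS.

```python
from itertools import product

# Atom types that each named atom can carry, as stated in the text:
# CA has type CH2G in glycine and CH1E in the other amino-acid residues.
# (Hypothetical lookup table for illustration only.)
TYPES_BY_NAME = {
    "-CA": ["CH1E", "CH2G"],
    "-C": ["C"],
    "+N": ["NHHE"],
}


def required_parameter_keys(atom_names):
    """List every atom-type combination that one name-based statement can produce."""
    choices = [TYPES_BY_NAME[name] for name in atom_names]
    return [tuple(combo) for combo in product(*choices)]


# The topology statement "add angle -CA -C +N" needs one parameter per combination.
angle_keys = required_parameter_keys(["-CA", "-C", "+N"])
for key in angle_keys:
    print("angle", *key)
```

The two combinations listed by the sketch correspond to the first two angle statements in the parameter file.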
One might notice that there are no parameter definitions for the two dihedral and improper definitions above (add dihedral -N -CA -C +N, add improper -C -CA +N -O). This is because the matching parameter statements in protein_rep.param use wildcards, and therefore the type of the nitrogen is not important. The matching parameter statements are:

    dihe X C CH1E X 0.0 3 0.0 ! psi angle

    evaluate ($kimpr_strong=750.0)
    impr C X X O $kimpr_strong 0 0.0

It would be possible to override the wildcards with specific dihedral and improper parameter statements. More details can be found in the reference section.
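The effect of the wildcards can be illustrated with a small Python sketch. It is not the lookup code used by CNS: the function names are hypothetical, and the sketch assumes that a dihedral parameter may be matched in either direction (A B C D is treated like D C B A).

```python
def matches(pattern, types):
    """True if every pattern entry is the wildcard X or equals the atom type."""
    return len(pattern) == len(types) and all(
        p in ("X", t) for p, t in zip(pattern, types)
    )


def matches_either_direction(pattern, types):
    """Assumption: the reversed atom order is an equivalent description."""
    return matches(pattern, types) or matches(pattern, types[::-1])


# Types for "add dihedral -N -CA -C +N"; the previous residue is proline,
# whose nitrogen is assumed here to have type N.
psi_types = ("N", "CH1E", "C", "NHHE")
# Types for "add improper -C -CA +N -O".
planar_c_types = ("C", "CH1E", "NHHE", "O")

psi_ok = matches_either_direction(("X", "C", "CH1E", "X"), psi_types)
impr_ok = matches(("C", "X", "X", "O"), planar_c_types)
print(psi_ok, impr_ok)
```

In both cases the new type NHHE falls on a wildcard position, which is why no additional dihedral or improper parameters are needed for NHH.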
For the sake of simplicity, pre-existing atom type definitions from protein.top are used.

    residue ACE
      group
        atom CA type=CH3E charge=0.00 end
        atom C type=C charge=0.00 end
        atom O type=O charge=0.00 end
      bond C CA
      bond C O
    end
The second part of the topology definition for the acetyl capping group specifies methods for linking the group to a protein chain. As before for NHH, two methods are required, one for proline, and another for all the other amino acids.
As for NHH, the two presidue definitions are based on the PEPT and PEPP definitions in the standard topology file protein.top. The only difference is that the dihedral statements involving the nitrogen atom in the "previous" residue are omitted.

    presidue ACEL { link for all amino acids except proline }
      add bond -C +N
      add angle -CA -C +N
      add angle -O -C +N
      add angle -C +N +CA
      add angle -C +N +H
      add dihedral -C +N +CA +C
      add dihedral -CA -C +N +CA
      add improper -C -CA +N -O {planar -C}
      add improper +N -C +CA +H {planar +N}
    end

    presidue ACEP { link for proline }
      add bond -C +N
      add angle -CA -C +N
      add angle -O -C +N
      add angle -C +N +CA
      add angle -C +N +CD
      add dihedral -C +N +CA +C
      add dihedral -CA -C +N +CA
      add improper -C -CA +N -O {planar -C}
      add improper +N +CA +CD -C {planar +N}
    end
    bond C CH3E 1000.000 1.507 ! distance from Corina

    angle CH3E C O 672.465 120.800 ! same as CH1E C O
    angle CH3E C NH1 485.856 116.200 ! same as CH1E C NH1
    angle CH3E C N 863.744 116.900 ! same as CH1E C N

    evaluate ($kdih_rigid=1250.0)
    dihe CH3E C N CH1E $kdih_rigid 2 180.0 ! same as H2E C N CH1E
The syntax for the link statements is:

    link <presidue-label> head <patch-character> <residue-name> tail <patch-character> <residue-name> end

For the two capping groups, the link statements are:

    link ACEP head - ACE tail + PRO end
    link ACEP head - ACE tail + CPR end
    link ACEL head - ACE tail + * end

    link NHHP head - PRO tail + NHH end
    link NHHP head - CPR tail + NHH end
    link NHHL head - * tail + NHH end

In all the examples, the <patch-character> for the head is a minus sign, and the <patch-character> for the tail is a plus sign. Other characters could be used, but for chains minus and plus signs are most intuitive.
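How a pair of neighbouring residues might be matched against such link statements can be sketched in Python. This is a hedged illustration, not a description of the CNS implementation: parse_links and pick_link are hypothetical names, and the rule that the first matching statement is used is an assumption suggested by the ordering of the statements (specific residue names before the wildcard).

```python
import re

# One link statement: label, head character and residue, tail character and residue.
LINK_RE = re.compile(
    r"link\s+(\S+)\s+head\s+(\S)\s+(\S+)\s+tail\s+(\S)\s+(\S+)\s+end",
    re.IGNORECASE,
)

LINK_TEXT = """
link ACEP head - ACE tail + PRO end
link ACEP head - ACE tail + CPR end
link ACEL head - ACE tail + * end
link NHHP head - PRO tail + NHH end
link NHHP head - CPR tail + NHH end
link NHHL head - * tail + NHH end
"""


def parse_links(text):
    """Return (presidue label, head residue, tail residue) for each statement."""
    return [(m.group(1), m.group(3), m.group(5)) for m in LINK_RE.finditer(text)]


def pick_link(links, head_residue, tail_residue):
    """ASSUMPTION: the first statement whose residue names match is applied."""
    for label, head, tail in links:
        if head in ("*", head_residue) and tail in ("*", tail_residue):
            return label
    return None  # no capping link applies to this pair


links = parse_links(LINK_TEXT)
print(pick_link(links, "ACE", "PRO"))
print(pick_link(links, "GLY", "NHH"))
```

Under this assumption an ACE followed by PRO or CPR is linked with ACEP, and any other residue after ACE falls through to the wildcard statement ACEL.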
cns_solve < generate.inp > generate.out [< 1 second]
To demonstrate this, the files capping.pdb and capping.param are copied to the files incomplete.pdb and incomplete.param, respectively. The file incomplete.inp is a copy of the previously used file generate.inp with the new file names for the PDB and the parameter file.
Arbitrarily, a dihedral definition is deleted from incomplete.param, and a gamma-carbon from the cis-proline in incomplete.pdb. The generate task file incomplete.inp will now attempt to build this missing carbon atom. For this it needs all the parameter definitions, and will therefore produce error messages if some are missing.
cns_solve < incomplete.inp > incomplete.out [< 1 second]

The error message at the end of incomplete.out is:
    %CODDIH-ERR: missing dihedral parameters %%%%%%%%%%%%%%%%%%%%%
     dihedral energy constant missing.
     target dihedral value missing.
     periodicity missing.
     ATOM1: SEGId="C ", RESId="1 ", NAME="CA ", CHEMical="CH3E"
     ATOM2: SEGId="C ", RESId="1 ", NAME="C ", CHEMical="C "
     ATOM3: SEGId="C ", RESId="2 ", NAME="N ", CHEMical="N "
     ATOM4: SEGId="C ", RESId="2 ", NAME="CA ", CHEMical="CH1E"
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    %CODDIH-ERR: missing dihedral parameters %%%%%%%%%%%%%%%%%%%%%
     dihedral energy constant missing.
     target dihedral value missing.
     periodicity missing.
     ATOM1: SEGId="D ", RESId="1 ", NAME="CA ", CHEMical="CH3E"
     ATOM2: SEGId="D ", RESId="1 ", NAME="C ", CHEMical="C "
     ATOM3: SEGId="D ", RESId="2 ", NAME="N ", CHEMical="N "
     ATOM4: SEGId="D ", RESId="2 ", NAME="CA ", CHEMical="CH1E"
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    %CODDIH error encountered: program will be aborted.

In general, there will only be error messages for one type of parameter definition and it can be necessary to run the generate task file several times before all the missing parameter definitions are found. For example, first, only the missing bond parameters are shown. If they are supplied and the generate task file is run again, the missing angle parameters are listed, and so on for the dihedral, improper and non-bonded parameter definitions.
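Because several runs may be needed, it can help to condense the error messages of each run into the list of missing type combinations. The following Python sketch does this for the dihedral messages; it assumes the message layout shown above, and the function name missing_dihedral_types and the embedded sample text are hypothetical and used only for illustration.

```python
import re
from collections import Counter

# One ATOMn line of the error message; only the chemical type is used below.
ATOM_RE = re.compile(
    r'ATOM(\d):\s+SEGId="([^"]*)",\s+RESId="([^"]*)",'
    r'\s+NAME="([^"]*)",\s+CHEMical="([^"]*)"'
)

# Shortened sample in the layout of the messages above (assumed format).
SAMPLE_LOG = """
%CODDIH-ERR: missing dihedral parameters %%%%%%%%%%%%%%%%%%%%%
 ATOM1: SEGId="C ", RESId="1 ", NAME="CA ", CHEMical="CH3E"
 ATOM2: SEGId="C ", RESId="1 ", NAME="C ", CHEMical="C "
 ATOM3: SEGId="C ", RESId="2 ", NAME="N ", CHEMical="N "
 ATOM4: SEGId="C ", RESId="2 ", NAME="CA ", CHEMical="CH1E"
%CODDIH-ERR: missing dihedral parameters %%%%%%%%%%%%%%%%%%%%%
 ATOM1: SEGId="D ", RESId="1 ", NAME="CA ", CHEMical="CH3E"
 ATOM2: SEGId="D ", RESId="1 ", NAME="C ", CHEMical="C "
 ATOM3: SEGId="D ", RESId="2 ", NAME="N ", CHEMical="N "
 ATOM4: SEGId="D ", RESId="2 ", NAME="CA ", CHEMical="CH1E"
%CODDIH error encountered: program will be aborted.
"""


def missing_dihedral_types(log_text):
    """Count each four-type combination reported under a %CODDIH-ERR header."""
    counts = Counter()
    for block in log_text.split("%CODDIH-ERR:")[1:]:
        types = [m.group(5).strip() for m in ATOM_RE.finditer(block)]
        if len(types) == 4:
            counts[tuple(types)] += 1
    return counts


missing = missing_dihedral_types(SAMPLE_LOG)
for types, count in missing.items():
    print("dihe", *types, f"! reported {count} time(s)")
```

For the sample, both messages reduce to the single type combination CH3E C N CH1E, so only one dihedral parameter statement has to be restored.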
It should be noted that a more elegant and rigorous way of generating parameter definitions is to use the LEARN statement. However, this requires well-defined coordinates of several example molecules. More details about the LEARN statement can be found in another tutorial section.