Creation of topology and parameter files for N-terminal acetyl and C-terminal amide capping groups


Some proteins have capping groups at the termini. At the N-terminus, this can be an acetyl group, and at the C-terminus an amide group.

There are two ways of introducing capping groups in CNS. One way is to apply a patch to a terminal residue that adds the atoms of the capping group. In this case the terminal residue and its capping group are treated as one unit, and the residue keeps the name of the standard amino acid. This solution works well as long as all the work is done in CNS. However, if the CNS PDB file is used in another program (for example O), the additional atoms can cause problems. Alternatively, a new residue type can be defined for each capping group, which causes fewer problems when the CNS PDB file is used in other programs. The following sections outline the creation of three files that define the new residue types:

capping.top This topology file defines the atoms in each capping group, the bonds, angles, dihedrals and impropers, and methods for linking each group to a protein chain.
capping.param This parameter file defines the target values and weights for the bonds, angles, dihedrals and impropers defined in the topology file.
protein_capping.link This linkage file defines how the linkage methods defined in the topology file are used.


Important remark

This tutorial outlines how to create topology and parameter files for two very simple chemical groups. The topology definitions given here are more or less unambiguous and complete. However, the corresponding parameter definitions are only syntactically correct. The specific target values and weights are copied from corresponding definitions in the standard parameter file protein_rep.param and are therefore certainly not ideal. One reason for this shortcoming is that obtaining a consistent set of parameters is, in general, very time-consuming. Another reason is that the ideal set of parameters also depends on the context in which it is used: parameters used in refinement against X-ray data tend to differ from parameters used in refinement against NMR data or in free molecular dynamics simulations. The parameters shown in the examples below are probably sufficient for refinement against good X-ray data, but not suitable for free molecular dynamics simulations.

Definition of the topology for the amide capping group

The topology definition for the amide group involves just one atom:
mass  NHHE  16.02270 ! extended atom nitrogen with two hydrogens

residue NHH
  group
    atom N   type=NHHE  charge=0.00  end
end
In the standard topology file protein.top, which is used for refinement against X-ray data, only some hydrogen atoms are included explicitly. Following this approach, the hydrogen atoms of the amide group are not specified explicitly. Instead, an extended atom type NHHE is defined whose mass is the sum of the masses of one nitrogen and two hydrogen atoms. The new residue type is named NHH. When a structure is generated from a PDB file, the nitrogen of the amide group must be named N. The charge is set to zero, because charges are not normally used in crystallographic refinement.
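As a quick check of the value in the MASS statement, the following Python sketch adds the masses of one nitrogen and two hydrogen atoms. The element masses (14.0067 and 1.008) are assumptions chosen to reproduce 16.02270; compare them with the MASS statements in your own protein.top. The helper name extended_atom_mass is hypothetical.

```python
# Hypothetical helper: mass of an extended (united) atom, i.e. a heavy atom
# whose hydrogens are not modeled explicitly but folded into its mass.
# Assumed element masses; check them against the MASS statements in protein.top.
ELEMENT_MASS = {"N": 14.0067, "H": 1.008}


def extended_atom_mass(heavy_atom: str, n_hydrogens: int) -> float:
    """Return the mass of a heavy atom plus n_hydrogens implicit hydrogens."""
    return ELEMENT_MASS[heavy_atom] + n_hydrogens * ELEMENT_MASS["H"]


mass_nhhe = extended_atom_mass("N", 2)
print(f"mass  NHHE  {mass_nhhe:.5f}")  # mass  NHHE  16.02270
```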

The second part of the topology definition for the amide capping group specifies methods for linking the group to a protein chain. Two methods are required, one for proline, and another for all the other amino acids.

presidue NHHL { link for all amino acids except proline }
  add bond -C +N

  add angle -CA -C +N
  add angle -O  -C +N

  add improper  -C -CA +N -O  {planar -C}
end


presidue NHHP  { link for proline }
 add bond -C +N

 add angle -CA -C +N
 add angle -O  -C +N

 add dihedral  -N -CA -C +N

 add improper  -C -CA +N -O  {planar -C}
end
The two presidue definitions are based on the PEPT and PEPP definitions in the standard topology file protein.top. The plus and minus signs are used to distinguish the atoms in the two residues to be linked. The add bond -C +N statement defines a bond from the atom with the name C in the "previous" residue to the atom with the name N in the "next" residue in the peptide chain. The angle, dihedral and improper definitions are a subset of the PEPT and PEPP definitions. All statements involving atoms in the "next" residue other than N are omitted.
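The following Python sketch illustrates how the prefixes select atoms. It is not CNS code: the Residue class, the resolve function and the atom lists used for ALA and NHH are hypothetical stand-ins that only show that "-" refers to the previous residue and "+" to the next one.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Residue:
    """Hypothetical stand-in for one residue of a generated chain."""
    name: str
    atoms: FrozenSet[str]


def resolve(ref: str, previous: Residue, following: Residue) -> Tuple[str, str]:
    """Map an atom reference such as '-C' or '+N' to (residue name, atom name)."""
    prefix, atom = ref[0], ref[1:]
    if prefix == "-":
        residue = previous
    elif prefix == "+":
        residue = following
    else:
        raise ValueError(f"{ref}: expected a '-' or '+' prefix")
    if atom not in residue.atoms:
        raise KeyError(f"{ref}: residue {residue.name} has no atom {atom}")
    return residue.name, atom


# Illustrative atom lists (assumption): ALA with its polar H, and the
# one-atom NHH residue defined above.
ala = Residue("ALA", frozenset({"N", "H", "CA", "CB", "C", "O"}))
nhh = Residue("NHH", frozenset({"N"}))

# "add angle -CA -C +N" from NHHL, applied to the link ALA -> NHH
print([resolve(r, ala, nhh) for r in ("-CA", "-C", "+N")])
# [('ALA', 'CA'), ('ALA', 'C'), ('NHH', 'N')]
```

A reference such as +CA raises KeyError in this sketch, because NHH has no CA; that is the situation in which the corresponding PEPT and PEPP statements are left out of NHHL and NHHP.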

Definition of the parameters for the amide capping group

All the topology definitions for NHH involve a new atom type, NHHE. Therefore new parameters must be specified for all interactions where NHHE takes part.
bond C    NHHE 1342.404  1.328 ! same as C    NH2

angle CH1E C    NHHE  863.744  116.900 ! same as CH1E C    N
angle CH2G C    NHHE  440.686  118.200 ! same as CH2G C    N
angle O    C    NHHE  991.543  122.000 ! same as N    C    O

evaluate ($vdw_radius_N=        3.0)                             {-0.1}
evaluate ($vdw_radius_N   = $vdw_radius_N    / 2^(1/6))
evaluate ($vdw_radius14_N   = $vdw_radius_N    -0.3/ 2^(1/6))
evaluate ($vdw_eps=0.1)
nonbonded  NHHE  $vdw_eps $vdw_radius_N  $vdw_eps $vdw_radius14_N ! same as NH2
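The EVALUATE statements are plain arithmetic and can be reproduced outside CNS. A minimal Python sketch, assuming that the NONBonded statement expects the Lennard-Jones sigma (the distance of the energy minimum divided by 2^(1/6)) and that CNS applies standard operator precedence:

```python
import math

# 2^(1/6): ratio between the Lennard-Jones energy minimum and sigma
SIXTH_ROOT_OF_TWO = 2 ** (1 / 6)

vdw_radius_n = 3.0                               # evaluate ($vdw_radius_N = 3.0)
vdw_radius_n = vdw_radius_n / SIXTH_ROOT_OF_TWO  # converted to sigma
# Assuming standard precedence: the division is done before the subtraction
vdw_radius14_n = vdw_radius_n - 0.3 / SIXTH_ROOT_OF_TWO
vdw_eps = 0.1

# The 1-4 sigma equals a 0.3 shorter minimum distance, converted the same way
assert math.isclose(vdw_radius14_n, (3.0 - 0.3) / SIXTH_ROOT_OF_TWO)

print(f"nonbonded  NHHE  {vdw_eps} {vdw_radius_n:.4f}  {vdw_eps} {vdw_radius14_n:.4f}")
# nonbonded  NHHE  0.1 2.6727  0.1 2.4054
```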
For each topology definition, there must be a corresponding parameter definition. However, in contrast to the topology definitions, which are based on atom names, the parameter definitions are based on atom types. Therefore one topology statement can require several parameter statements. This is, for example, the case for the topology statement "add angle -CA -C +N". In glycine the CA has type CH2G, and in all other amino-acid residues type CH1E. Therefore there are two corresponding angle parameter statements.
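The name-to-type expansion can be sketched in a few lines of Python. The CA_TYPE table is a small illustrative excerpt (an assumption) following the rule just stated; ANGLE_PARAMS holds the two angle statements from the parameter block above.

```python
# Illustrative excerpt (assumption): CA atom type per residue name, following
# the rule stated above (CH2G for glycine, CH1E for all other residues).
CA_TYPE = {"GLY": "CH2G", "ALA": "CH1E", "SER": "CH1E", "PRO": "CH1E"}

# The two angle parameter statements for -CA -C +N with +N of type NHHE
ANGLE_PARAMS = {
    ("CH1E", "C", "NHHE"): (863.744, 116.900),
    ("CH2G", "C", "NHHE"): (440.686, 118.200),
}


def angle_types(previous_residue: str) -> tuple:
    """Type triple of the topology angle '-CA -C +N' when the next residue is NHH."""
    return (CA_TYPE[previous_residue], "C", "NHHE")


# One name-based topology statement, but two distinct type-based lookups
required = sorted({angle_types(res) for res in CA_TYPE})
print(required)  # [('CH1E', 'C', 'NHHE'), ('CH2G', 'C', 'NHHE')]
print(ANGLE_PARAMS[angle_types("GLY")])  # (440.686, 118.2)
```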

One might notice that there are no new parameter definitions for the dihedral and improper statements above (add dihedral -N -CA -C +N, add improper -C -CA +N -O). This is because the matching parameter statements in protein_rep.param use wildcards, so the type of the nitrogen does not matter. The matching parameter statements are:

dihe X    C    CH1E X       0.0       3       0.0 ! psi angle
evaluate ($kimpr_strong=750.0)
impr C    X    X    O     $kimpr_strong       0       0.0
It would be possible to override the wildcards with specific dihedral and improper parameter statements. More details can be found in the reference section.
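A Python sketch of wildcard matching, under stated assumptions: X matches any atom type, a pattern may also match the four types read in reverse order, and when several patterns match, the one with the fewest wildcards wins. The function names are hypothetical; the two patterns are the statements quoted above.

```python
from typing import Dict, Optional, Sequence, Tuple

Types = Tuple[str, str, str, str]

# The two wildcard statements quoted above (protein_rep.param)
DIHEDRALS: Dict[Types, tuple] = {("X", "C", "CH1E", "X"): (0.0, 3, 0.0)}
IMPROPERS: Dict[Types, tuple] = {("C", "X", "X", "O"): (750.0, 0, 0.0)}


def matches(pattern: Types, types: Sequence[str]) -> bool:
    """True if pattern matches the types read forwards or backwards; X matches anything."""
    def forward(seq: Sequence[str]) -> bool:
        return all(p == "X" or p == t for p, t in zip(pattern, seq))
    return forward(types) or forward(list(reversed(types)))


def find_parameter(table: Dict[Types, tuple], types: Sequence[str]) -> Optional[tuple]:
    """Return the parameters of the most specific matching pattern (fewest X)."""
    candidates = [p for p in table if matches(p, types)]
    if not candidates:
        return None
    return table[min(candidates, key=lambda p: p.count("X"))]


# Dihedral -N -CA -C +N for PRO followed by NHH: types N, CH1E, C, NHHE
print(find_parameter(DIHEDRALS, ("N", "CH1E", "C", "NHHE")))  # (0.0, 3, 0.0)
# Improper -C -CA +N -O for GLY followed by NHH: types C, CH2G, NHHE, O
print(find_parameter(IMPROPERS, ("C", "CH2G", "NHHE", "O")))  # (750.0, 0, 0.0)
```

Note that the dihedral only matches when read in reverse (NHHE C CH1E N), which is why the sketch checks both directions. Adding a more specific entry such as ("N", "CH1E", "C", "NHHE") to DIHEDRALS would take precedence over the wildcard in this sketch.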

Definition of the topology for the acetyl capping group

The topology definition for the acetyl group involves the addition of three atoms and two bonds.
residue ACE
  group
    atom CA   type=CH3E  charge=0.00  end
    atom C    type=C     charge=0.00  end
    atom O    type=O     charge=0.00  end

  bond C   CA
  bond C   O
end
For the sake of simplicity, pre-existing atom type definitions from protein.top are used.

The second part of the topology definition for the acetyl capping group specifies methods for linking the group to a protein chain. As before for NHH, two methods are required, one for proline, and another for all the other amino acids.

presidue ACEL { link for all amino acids except proline }
  add bond -C +N

  add angle -CA -C +N
  add angle -O  -C +N
  add angle -C  +N +CA
  add angle -C  +N +H

  add dihedral  -C +N +CA +C
  add dihedral  -CA -C +N +CA

  add improper  -C -CA +N -O  {planar -C}
  add improper  +N -C +CA +H  {planar +N}
end

presidue ACEP { link for proline }
 add bond -C +N

 add angle -CA -C +N
 add angle -O  -C +N
 add angle -C  +N +CA
 add angle -C  +N +CD

 add dihedral  -C +N +CA +C
 add dihedral  -CA -C +N +CA

 add improper  -C -CA +N -O  {planar -C}
 add improper  +N +CA +CD -C  {planar +N}
end
As for NHH, the two presidue definitions are based on the PEPT and PEPP definitions in the standard topology file protein.top. The only difference is that the dihedral statements involving the nitrogen atom of the "previous" residue are omitted, because ACE has no N atom.
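The omission can be reproduced mechanically by keeping only the statements whose "-" atoms exist in ACE. In the following Python sketch the candidate list is hypothetical: it is the ACEL statements from above plus the dihedral "-N -CA -C +N" that appears in NHHP (taken over from PEPP); the helper name previous_atoms is also hypothetical.

```python
# Hypothetical candidate list: the ACEL statements above plus the dihedral
# "-N -CA -C +N" that NHHP carries over from PEPP.
CANDIDATES = [
    "bond -C +N",
    "angle -CA -C +N",
    "angle -O -C +N",
    "angle -C +N +CA",
    "angle -C +N +H",
    "dihedral -C +N +CA +C",
    "dihedral -N -CA -C +N",
    "dihedral -CA -C +N +CA",
    "improper -C -CA +N -O",
    "improper +N -C +CA +H",
]

ACE_ATOMS = {"CA", "C", "O"}  # atoms of residue ACE defined above


def previous_atoms(statement: str) -> set:
    """Names of the '-' (previous residue) atoms referenced by a statement."""
    return {token[1:] for token in statement.split()[1:] if token.startswith("-")}


kept = [s for s in CANDIDATES if previous_atoms(s) <= ACE_ATOMS]
dropped = [s for s in CANDIDATES if s not in kept]
print(dropped)  # ['dihedral -N -CA -C +N']
```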

Definition of the parameters for the acetyl capping group

Since standard atom types were assigned to the carbon and oxygen of the acetyl group, most parameters are already defined. It only remains to define a new bond parameter, three angle parameters, and one dihedral parameter.
bond C    CH3E 1000.000  1.507 ! distance from Corina

angle CH3E C    O     672.465  120.800 ! same as CH1E C    O
angle CH3E C    NH1   485.856  116.200 ! same as CH1E C    NH1
angle CH3E C    N     863.744  116.900 ! same as CH1E C    N

evaluate ($kdih_rigid=1250.0)
dihe CH3E C    N    CH1E  $kdih_rigid  2  180.0 ! same as H2E C    N    CH1E
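The three new angle parameters can be derived mechanically: list every angle type triple centred on the acetyl C and subtract those already covered. In this Python sketch the contents of known are a hypothetical stand-in for the relevant protein_rep.param entries, and angle parameters are assumed to match in either direction (A C B and B C A).

```python
from itertools import combinations


def canonical(angle: tuple) -> tuple:
    """Treat A-B-C and C-B-A as the same angle (assumed direction-independent)."""
    a, b, c = angle
    return (a, b, c) if a <= c else (c, b, a)


def angles_around(center: str, neighbours: list) -> set:
    """All angle type triples centred on one atom, given its bonded neighbours' types."""
    return {canonical((a, center, b)) for a, b in combinations(neighbours, 2)}


# Neighbours of the acetyl C: CH3E (its CA), O, and the next residue's N,
# which is NH1 for most residues and N for proline (types as used above).
required = angles_around("C", ["CH3E", "O", "NH1"]) | angles_around("C", ["CH3E", "O", "N"])

# Hypothetical stand-in for the angles already present in protein_rep.param
known = {canonical(("O", "C", "NH1")), canonical(("N", "C", "O"))}

missing = sorted(required - known)
print(missing)  # [('CH3E', 'C', 'N'), ('CH3E', 'C', 'NH1'), ('CH3E', 'C', 'O')]
```

The result corresponds to the three angle statements above.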

Creation of the linkage file

For proteins, the rules for applying the linking methods defined in the topology files are listed in the standard library file protein.link. To include the new rules for the acetyl and amide groups, protein.link is copied and modified. At the top of the file, a new block is inserted:
link ACEP    head - ACE   tail + PRO     end
link ACEP    head - ACE   tail + CPR     end
link ACEL    head - ACE   tail + *       end

link NHHP    head - PRO   tail + NHH     end
link NHHP    head - CPR   tail + NHH     end
link NHHL    head - *     tail + NHH     end
The syntax for the link statements is:
link <presidue-label>  head <patch-character> <residue-name>
                       tail <patch-character> <residue-name>
In all the examples, the <patch-character> for the head is a minus sign and the <patch-character> for the tail is a plus sign. Other characters could be used, but for chains the minus and plus signs are the most intuitive.
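A Python sketch of how these rules could be applied to consecutive residues, assuming that the rules are tried in file order and the first match wins, with "*" matching any residue name. Only the new capping rules are included; the regular peptide rules of protein.link are left out, and the function names are hypothetical.

```python
from typing import List, Optional

# The capping rules from the new block, in file order: (presidue, head, tail)
LINK_RULES = [
    ("ACEP", "ACE", "PRO"),
    ("ACEP", "ACE", "CPR"),
    ("ACEL", "ACE", "*"),
    ("NHHP", "PRO", "NHH"),
    ("NHHP", "CPR", "NHH"),
    ("NHHL", "*", "NHH"),
]


def link_for(head: str, tail: str) -> Optional[str]:
    """Return the presidue of the first rule matching head -> tail ('*' matches any name)."""
    for presidue, head_name, tail_name in LINK_RULES:
        if head_name in ("*", head) and tail_name in ("*", tail):
            return presidue
    return None


def links(chain: List[str]) -> List[Optional[str]]:
    """Presidue applied between each pair of consecutive residues."""
    return [link_for(a, b) for a, b in zip(chain, chain[1:])]


print(links(["ACE", "ALA", "NHH"]))  # ['ACEL', 'NHHL']
print(links(["ACE", "CPR", "NHH"]))  # ['ACEP', 'NHHP']
```

Under the first-match assumption, the order matters: if the ACEL rule with its "*" came before the ACEP rules, ACE followed by PRO would receive ACEL, which references +H, an atom that proline does not have.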

Using the new topology and parameter files

The use of the new topology and parameter files is demonstrated by generating the molecular topology based on a simple PDB file. The example file capping.pdb contains four "chains", each consisting of one amino-acid residue with the two capping groups. To generate the molecular topology, the task file generate.inp is modified. The protein input file name is set to capping.pdb, the protein linkage file name is redefined as protein_capping.link, and the prosthetic group topology and parameter files are defined as capping.top and capping.param, respectively. All other parameters are unchanged.
      cns_solve < generate.inp > generate.out  [< 1 second]

(Mis)using generate.inp error messages

When creating new topology and parameter files, it is often difficult to figure out which additional bond, angle, dihedral and improper parameters have to be defined. However, once the topology is defined correctly, the error diagnostics produced when running generate.inp can be (mis)used to find the missing parameter definitions.

To demonstrate this, the files capping.pdb and capping.param are copied to the files incomplete.pdb, and incomplete.param, respectively. incomplete.inp is a copy of the previously used file generate.inp with the new file names for the PDB and the parameter file.

Arbitrarily, a dihedral definition is deleted from incomplete.param, and the gamma carbon is deleted from the cis-proline in incomplete.pdb. The generate task file incomplete.inp will now attempt to build this missing carbon atom. To do so, it needs all the parameter definitions and will therefore produce error messages if some are missing.

      cns_solve < incomplete.inp > incomplete.out  [< 1 second]
The error message at the end of incomplete.out is:
 %CODDIH-ERR: missing dihedral parameters %%%%%%%%%%%%%%%%%%%%%
  dihedral energy constant missing.
  target dihedral value missing.
  periodicity missing.
  ATOM1: SEGId="C   ",  RESId="1   ",  NAME="CA  ",  CHEMical="CH3E"
  ATOM2: SEGId="C   ",  RESId="1   ",  NAME="C   ",  CHEMical="C   "
  ATOM3: SEGId="C   ",  RESId="2   ",  NAME="N   ",  CHEMical="N   "
  ATOM4: SEGId="C   ",  RESId="2   ",  NAME="CA  ",  CHEMical="CH1E"
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 %CODDIH-ERR: missing dihedral parameters %%%%%%%%%%%%%%%%%%%%%
  dihedral energy constant missing.
  target dihedral value missing.
  periodicity missing.
  ATOM1: SEGId="D   ",  RESId="1   ",  NAME="CA  ",  CHEMical="CH3E"
  ATOM2: SEGId="D   ",  RESId="1   ",  NAME="C   ",  CHEMical="C   "
  ATOM3: SEGId="D   ",  RESId="2   ",  NAME="N   ",  CHEMical="N   "
  ATOM4: SEGId="D   ",  RESId="2   ",  NAME="CA  ",  CHEMical="CH1E"
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 %CODDIH error encountered: program will be aborted.
In general, error messages are reported for only one type of parameter definition at a time, so it can be necessary to run the generate task file several times before all the missing parameter definitions are found. For example, at first only the missing bond parameters are listed. Once they are supplied and the generate task file is run again, the missing angle parameters are listed, and so on for the dihedral, improper and nonbonded parameter definitions.
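With several chains, the same missing combination is often reported more than once, as for segments C and D above. The following Python sketch collects the unique CHEMical type quadruples from the CODDIH error blocks. The function name and the printed placeholder template are hypothetical, and the sketch only handles the dihedral message format shown above.

```python
import re

# Excerpt of incomplete.out as shown above
LOG = """\
 %CODDIH-ERR: missing dihedral parameters %%%%%%%%%%%%%%%%%%%%%
  dihedral energy constant missing.
  target dihedral value missing.
  periodicity missing.
  ATOM1: SEGId="C   ",  RESId="1   ",  NAME="CA  ",  CHEMical="CH3E"
  ATOM2: SEGId="C   ",  RESId="1   ",  NAME="C   ",  CHEMical="C   "
  ATOM3: SEGId="C   ",  RESId="2   ",  NAME="N   ",  CHEMical="N   "
  ATOM4: SEGId="C   ",  RESId="2   ",  NAME="CA  ",  CHEMical="CH1E"
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 %CODDIH-ERR: missing dihedral parameters %%%%%%%%%%%%%%%%%%%%%
  dihedral energy constant missing.
  target dihedral value missing.
  periodicity missing.
  ATOM1: SEGId="D   ",  RESId="1   ",  NAME="CA  ",  CHEMical="CH3E"
  ATOM2: SEGId="D   ",  RESId="1   ",  NAME="C   ",  CHEMical="C   "
  ATOM3: SEGId="D   ",  RESId="2   ",  NAME="N   ",  CHEMical="N   "
  ATOM4: SEGId="D   ",  RESId="2   ",  NAME="CA  ",  CHEMical="CH1E"
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 %CODDIH error encountered: program will be aborted.
"""

HEADER = "%CODDIH-ERR: missing dihedral parameters"
CHEMICAL = re.compile(r'CHEMical="([^"]*)"')


def missing_dihedral_types(log: str) -> list:
    """Unique type quadruples from the CODDIH error blocks, in order of appearance."""
    found = []
    for block in log.split(HEADER)[1:]:
        types = tuple(t.strip() for t in CHEMICAL.findall(block))
        if types not in found:
            found.append(types)
    return found


# Print a template line to be completed by hand with constant, periodicity, phase
for types in missing_dihedral_types(LOG):
    print("dihe " + " ".join(types) + "  <constant> <periodicity> <phase>")
```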

It should be noted that a more elegant and rigorous way of generating parameter definitions is to use the LEARN statement. However, this requires well-defined coordinates of several example molecules. More details about the LEARN statement can be found in another tutorial section.


