Learning parameters from coordinates


This facility can be used to obtain bond, bond angle, dihedral angle, or improper angle equilibrium parameters and energy constants from selected atoms of Cartesian coordinate sets. The coordinates are specified in the main coordinate set. The statement learns parameters only for those interaction terms that are turned on by the flags statement. The learned parameters will take precedence over the type-based parameters.

The equilibrium geometry parameters can be obtained directly from a single coordinate set or averaged over successive coordinate sets. If only a single coordinate set is available, one can learn only the equilibrium geometry, not the energy constants. If an ensemble of coordinates is available, energy constants can be derived by assuming equipartition of energy among the different internal coordinates. This assumption holds only approximately in a real system, because the internal coordinates are coupled.

  k_bond     = (kT)/( 2 <(r-<r>)^2> )
  k_angle    = (kT)/( 2 <(theta-<theta>)^2> )
  k_dihedral = (kT)/( 2 <(phi-<phi>)^2> )

The brackets denote an average over the ensemble of coordinate sets, and kT/2 is the mean thermal energy per harmonic degree of freedom at T=298K. The last expression assumes that all dihedral and improper angles are represented by a harmonic functional form with periodicity n set to zero; accordingly, the learn facility sets the periodicity of all "learned" dihedral and improper angles to zero. If one of the variances in the denominators is zero, the corresponding energy constant is set to 999999. Nonbonded parameters cannot be learned.
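The equipartition estimate can be illustrated numerically. The following Python sketch is not CNS code and does not describe how CNS implements the learn statement; it only mirrors the initiate/accumulate/terminate stages for a single internal coordinate. The class name `LearnAccumulator`, the Boltzmann constant in kcal/(mol K), and the sample bond lengths are assumptions made for illustration.

```python
K_BOLTZMANN = 0.0019872041   # kcal/(mol K); unit system is an assumption
ZERO_VARIANCE_CONSTANT = 999999.0  # value quoted in the text for zero variance


class LearnAccumulator:
    """Hypothetical helper: running statistics for one internal coordinate."""

    def __init__(self):
        # Corresponds conceptually to "learn initiate": reset the statistics.
        self.count = 0
        self.mean = 0.0
        self.sum_sq_dev = 0.0

    def accumulate(self, value):
        # Corresponds conceptually to "learn accumulate": add one coordinate set.
        # Welford's update keeps the running mean and squared deviations stable.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.sum_sq_dev += delta * (value - self.mean)

    def terminate(self, temperature=298.0):
        # Corresponds conceptually to "learn terminate": return (mean, constant).
        if self.count == 0:
            raise ValueError("no coordinate sets were accumulated")
        variance = self.sum_sq_dev / self.count  # ensemble average <(x-<x>)^2>
        if variance == 0.0:
            return self.mean, ZERO_VARIANCE_CONSTANT
        return self.mean, K_BOLTZMANN * temperature / (2.0 * variance)


bond_lengths = [1.50, 1.52, 1.48, 1.51, 1.49]  # made-up values in Angstrom
stats = LearnAccumulator()
for r in bond_lengths:
    stats.accumulate(r)
r_eq, k_bond = stats.terminate()
print(f"r_eq = {r_eq:.3f}, k_bond = {k_bond:.1f}")
```

A narrow distribution gives a large energy constant and a broad one gives a small constant, which is why a single coordinate set (zero variance) cannot yield a meaningful energy constant.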

The possible learn statements are as follows:

The possible learn options are as follows:

It is important that the learn options be specified in the initialization stage:

  learn initiate selection=( name c* ) MODE=STATistics end 
  learn accumulate end 
  learn terminate  end 

Requirements

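The atom selection is fragile: it does not survive changes to the molecular structure, so the learn statement has to be reissued after any such change.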

Example: Learning Unknown Equilibrium Parameters from Coordinates

In the following example, a protein and a ligand are considered. The molecular structures of the protein and the ligand have to be generated as outlined previously. The ligand requires a topology definition:

topology 

   autogenerate angles=true dihedrals=false end 

   residue LIGA 
      atom A type=C end 
      atom B type=C end 
      atom C type=C end 
      atom D type=O end 

      bond A B 
      bond B C 
      bond C D 

      improper A B C D 

   end 
end 

segment 
   name=LIGA 
   molecule number=1 name=LIGA end 
end 

Note that mass statements may be required if the atom types of the ligand are non-standard.

The protein parameters can be obtained from one of CNS's protein parameter files. In general, the ligand parameters will be unknown. Suppose that the ligand coordinates are known from an appropriate crystal structure. We can learn the unknown ligand parameters from the known Cartesian coordinates. For purposes of structure determination, it is usually sufficient to set the energy constants to a uniform value.

The following statements define the unknown ligand parameters. They should be inserted in all CNS protocols at any place after the molecular structure and coordinate files have been read and before the first energy evaluation is performed.

flags exclude * include bonds angles impropers end 

parameters 
    learn initiate sele=(segid LIGA) mode=nostatistics end 
    learn accumulate end 
    learn terminate end 
end 

parameters 
   BOND   (segid LIGA) (segid LIGA) 400. TOKEN 
   ANGLE  (segid LIGA) (segid LIGA) (segid LIGA) 60. TOKEN 
   IMPR  (segid LIGA) (segid LIGA) (segid LIGA) (segid LIGA) 50. TOKEN TOKEN 

   {* Set the nonbonded parameters (only if required). *}

   NBON ( (name A or name B or name C) and segid "LIGA" )  0.1 3.5 0.1 3.5 
   NBON ( name D and segid "LIGA" )  0.1 3.4 0.1 3.4 
end 

flags include vdw elec pvdw pele end 

Note that the learn statement automatically sets the periodicity of all learned dihedral and improper angles to zero. Also note that the user has to specify improper and dihedral angles in the topology definition of the ligand in order to maintain planarity and chirality in certain parts of the ligand. Nonbonded parameters may have to be set by appropriate parameter statements unless they are already defined through type-based parameters. Finally, one has to activate the nonbonded energy terms and any other energy terms that might be needed, using the flags statement.
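To make concrete what "learning the equilibrium geometry from a single coordinate set" means, the following Python sketch measures a bond length, a bond angle, and a dihedral (or improper) angle from Cartesian coordinates. It is an illustration only, not CNS code: the function names, the ligand coordinates for atoms A to D, and the sign convention of the dihedral angle are assumptions.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _norm(a):
    return math.sqrt(_dot(a, a))


def bond_length(p1, p2):
    """Distance between two atoms."""
    return _norm(_sub(p2, p1))


def bond_angle(p1, p2, p3):
    """Angle p1-p2-p3 in degrees, with p2 at the vertex."""
    u, v = _sub(p1, p2), _sub(p3, p2)
    cos_angle = _dot(u, v) / (_norm(u) * _norm(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def dihedral_angle(p1, p2, p3, p4):
    """Signed angle in degrees between the planes (p1,p2,p3) and (p2,p3,p4)."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    x = _dot(n1, n2)
    y = _dot(_cross(n1, n2), b2) / _norm(b2)
    return math.degrees(math.atan2(y, x))


# Made-up coordinates (Angstrom) for the four ligand atoms of residue LIGA.
ligand = {"A": (0.0, 0.0, 0.0), "B": (1.5, 0.0, 0.0),
          "C": (2.0, 1.4, 0.0), "D": (3.4, 1.6, 0.5)}

print("bond A-B    :", bond_length(ligand["A"], ligand["B"]))
print("angle A-B-C :", bond_angle(ligand["A"], ligand["B"], ligand["C"]))
print("improper    :", dihedral_angle(*(ligand[n] for n in "ABCD")))
```

Values measured this way play the role of the equilibrium parameters, while the energy constants (400, 60, and 50 in the example above) are supplied separately as uniform values.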

Example: Learning Atom-based Parameters from an Ensemble of Structures

The learn statement is used to derive equilibrium geometries and energy constants simultaneously from a thermal ensemble of ten coordinate files:

{* Only the active energy terms are affected by the learn statement. *} 

flags exclude * include bonds angles dihedrals impropers end 

{* Initiate the learning process. *} 

parameters 
   learn initiate sele=(all) mode=statistics end 
end 

{* Loop through the ensemble of coordinates. *} 

for $filename in ( "a1.pdb" "a2.pdb" "a3.pdb" "a4.pdb" "a5.pdb" 
                   "a6.pdb" "a7.pdb" "a8.pdb" "a9.pdb" "a10.pdb" ) loop main 

    coordinates @@$filename 

    parameters 
       learn accumulate end 
    end 

end loop main   

{* Now we terminate the learning process. *} 

parameters 
   learn terminate end 
end 

{* One could now compute energies with the learned parameters or  *} 
{* reduce them to type-based parameters and write them to a file. *} 
