This facility derives equilibrium parameters and energy constants for bonds, bond angles, dihedral angles, and improper angles from the Cartesian coordinates of selected atoms. The coordinates are taken from the main coordinate set. The learn statement learns parameters only for those interaction terms that are turned on by the flags statement. The learned parameters take precedence over the type-based parameters.
The equilibrium geometry parameters can be obtained directly from a single coordinate set or averaged over successive coordinate sets. If only a single coordinate set is available, one can learn only the equilibrium geometry, not the energy constants. If an ensemble of coordinates is available, energy constants can be derived by assuming equipartition of energy among the different internal coordinates. This assumption holds only approximately in a real system, since the internal coordinates are coupled.
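\[
\begin{aligned}
k_{\mathrm{bond}} &= \frac{kT}{2\,\langle (r - \langle r \rangle)^2 \rangle} \\
k_{\mathrm{angle}} &= \frac{kT}{2\,\langle (\theta - \langle \theta \rangle)^2 \rangle} \\
k_{\mathrm{dihedral}} &= \frac{kT}{2\,\langle (\phi - \langle \phi \rangle)^2 \rangle}
\end{aligned}
\]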
The angle brackets denote an average over the ensemble of coordinate sets. kT/2 is the mean thermal energy per harmonic degree of freedom at T = 298 K. The last expression assumes that all dihedral angles and improper angles are represented by a harmonic functional form with periodicity n set to zero. In fact, the learn facility sets the periodicity of all learned dihedral and improper angles to zero. If one of the variances in the denominators is zero, the corresponding energy constant is set to 999999. Nonbonded parameters cannot be learned.
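The expressions above can be illustrated with a short Python sketch. It is not part of the program; it only evaluates the stated formula for one internal coordinate. The energy unit (kcal/mol), the value used for the Boltzmann constant, the function name `learn_harmonic`, and the sample bond lengths are assumptions made for this illustration.

```python
import statistics

K_BOLTZMANN = 0.0019872041         # kcal/(mol K); assumes kcal/mol energy units
TEMPERATURE = 298.0                # K, as stated in the text
ZERO_VARIANCE_CONSTANT = 999999.0  # value the text gives for a zero variance


def learn_harmonic(values):
    """Return (equilibrium value, energy constant) for one internal coordinate.

    Hypothetical helper: the equilibrium value is the ensemble mean and the
    energy constant is kT / (2 * variance), following the expressions above.
    """
    mean = statistics.fmean(values)
    # Population variance <(x - <x>)^2> over the ensemble.
    variance = statistics.pvariance(values)
    if variance == 0.0:
        return mean, ZERO_VARIANCE_CONSTANT
    k_t = K_BOLTZMANN * TEMPERATURE
    return mean, k_t / (2.0 * variance)


# Hypothetical bond lengths in Angstrom, one per coordinate set.
bond_lengths = [1.52, 1.54, 1.53, 1.55, 1.51]
r0, k_bond = learn_harmonic(bond_lengths)
print(f"r0 = {r0:.3f} A, k_bond = {k_bond:.1f} kcal/(mol A^2)")
```

Note that a narrow distribution gives a large energy constant, and that angular coordinates would additionally need care with periodic wrap-around, which this sketch does not handle.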
The possible learn statements are as follows:
The possible learn options are as follows:
It is important that the learn options be specified in the initialization stage:
    learn initiate selection=( name c* ) MODE=STATistics end
    learn accumulate end
    learn terminate end

Requirements
The atom selection is fragile.
Example: Learning Unknown Equilibrium Parameters from Coordinates

In the following example, a protein and a ligand are considered. The molecular structures of the protein and the ligand have to be generated as outlined previously. The ligand requires the definition of the topology:
    topology
      autogenerate angles=true dihedrals=false end
      residue LIGA
        atom A type=C end
        atom B type=C end
        atom C type=C end
        atom D type=O end
        bond A B
        bond B C
        bond C D
        improper A B C D
      end
    end

    segment
      name=LIGA
      molecule number=1 name=LIGA end
    end
Note that mass statements may be required if the atom types of the ligand are non-standard.
The protein parameters can be obtained from one of CNS's protein parameter files. In general, the ligand parameters will be unknown. Suppose that the ligand coordinates are known from an appropriate crystal structure. We can learn the unknown ligand parameters from the known Cartesian coordinates. For purposes of structure determination, it is usually sufficient to set the energy constants to a uniform value.
The following statements define the unknown ligand parameters. They should be inserted in all CNS protocols at any place after the molecular structure and coordinate files have been read and before the first energy evaluation is performed.
    flags exclude * include bonds angles impropers end

    parameters
      learn initiate sele=(segid LIGA) mode=nostatistics end
      learn accumulate end
      learn terminate end
    end

    parameters
      BOND (segid LIGA) (segid LIGA) 400. TOKEN
      ANGLE (segid LIGA) (segid LIGA) (segid LIGA) 60. TOKEN
      IMPR (segid LIGA) (segid LIGA) (segid LIGA) (segid LIGA) 50. TOKEN TOKEN

      {* Set the nonbonded parameters (only if required). *}
      NBON ( (name A or name B or name C) and segid "LIGA" ) 0.1 3.5 0.1 3.5
      NBON ( name D and segid "LIGA" ) 0.1 3.4 0.1 3.4
    end

    flags include vdw elec pvdw pele end
Note that the learn statement automatically sets the periodicity of all learned dihedral and improper angles to zero. Also note that the user has to specify improper and dihedral angles in the topology definition of the ligand in order to maintain planarity and chirality in certain parts of the ligand. Nonbonded parameters may have to be set by appropriate parameter statements unless they are already defined through type-based parameters. Finally, one has to activate the nonbonded energy terms and any other energy terms that might be needed, using the flags statement.
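To make the single-coordinate-set case concrete, the following Python sketch measures the bond lengths, bond angles, and the improper angle of the four ligand atoms from one set of Cartesian coordinates. It illustrates the geometry only and does not reproduce the program's internals. The coordinates, the helper names, and the sign convention of the torsion angle are assumptions of this sketch.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def norm(a):
    return math.sqrt(dot(a, a))


def bond_length(p1, p2):
    return norm(sub(p2, p1))


def bond_angle(p1, p2, p3):
    """Angle at p2, in degrees."""
    u, v = sub(p1, p2), sub(p3, p2)
    cos_t = dot(u, v) / (norm(u) * norm(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def torsion_angle(p1, p2, p3, p4):
    """Signed angle between planes (p1,p2,p3) and (p2,p3,p4), in degrees."""
    b1, b2, b3 = sub(p2, p1), sub(p3, p2), sub(p4, p3)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    y = dot(cross(n1, n2), b2) / norm(b2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Hypothetical coordinates (Angstrom) for ligand atoms A, B, C, D.
xyz = {"A": (1.5, 0.0, 0.0), "B": (0.0, 0.0, 0.0),
       "C": (0.0, 0.0, 1.5), "D": (0.0, 1.2, 1.5)}

# Terms from the topology: three bonds, two autogenerated angles, one improper.
learned = {
    "bond A-B": bond_length(xyz["A"], xyz["B"]),
    "bond B-C": bond_length(xyz["B"], xyz["C"]),
    "bond C-D": bond_length(xyz["C"], xyz["D"]),
    "angle A-B-C": bond_angle(xyz["A"], xyz["B"], xyz["C"]),
    "angle B-C-D": bond_angle(xyz["B"], xyz["C"], xyz["D"]),
    "improper A-B-C-D": torsion_angle(xyz["A"], xyz["B"], xyz["C"], xyz["D"]),
}
for term, value in learned.items():
    print(f"{term}: {value:.2f}")
```

With only one coordinate set there is no variance to evaluate, which is why the energy constants in the example are assigned uniform values by explicit parameter statements.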
Example: Learning Atom-based Parameters from an Ensemble of Structures

The learn statement is used to derive equilibrium geometries and energy constants simultaneously from a thermal ensemble of ten coordinate files:
    {* Only the active energy terms are affected by the learn statement. *}
    flags exclude * include bonds angles dihedrals impropers end

    {* Initiate the learning process. *}
    parameters
      learn initiate sele=(all) mode=statistics end
    end

    {* Loop through the ensemble of coordinates. *}
    for $filename in ( "a1.pdb" "a2.pdb" "a3.pdb" "a4.pdb" "a5.pdb"
                       "a6.pdb" "a7.pdb" "a8.pdb" "a9.pdb" "a10.pdb" ) loop main
      coordinates @@$filename
      parameters
        learn accumulate end
      end
    end loop main

    {* Now we terminate the learning process. *}
    parameters
      learn terminate end
    end

    {* One could now compute energies with the learned parameters or *}
    {* reduce them to type-based parameters and write them to a file. *}
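The initiate, accumulate, and terminate stages of this protocol can be mirrored in a small Python sketch that processes one coordinate set at a time. How the program accumulates its statistics internally is not described here, so the sketch swaps in Welford's online algorithm as one plausible way to keep a running mean and variance. The class name `TermLearner`, the kcal/mol energy unit, and the sample values are hypothetical.

```python
K_T = 0.0019872041 * 298.0         # kT in kcal/mol (assumed unit) at T = 298 K
ZERO_VARIANCE_CONSTANT = 999999.0  # value the text gives for a zero variance


class TermLearner:
    """Running statistics for one interaction term (hypothetical helper)."""

    def __init__(self):
        # "initiate": start with empty statistics.
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def accumulate(self, value):
        # "accumulate": Welford's online update for mean and variance.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)

    def terminate(self):
        # "terminate": convert the statistics into learned parameters.
        if self.count == 0:
            raise ValueError("no coordinate sets were accumulated")
        variance = self.m2 / self.count
        if variance == 0.0:
            return self.mean, ZERO_VARIANCE_CONSTANT
        return self.mean, K_T / (2.0 * variance)


# Hypothetical bond lengths (Angstrom) for one bond, one value per file.
frames = [1.52, 1.54, 1.53, 1.55, 1.51]

learner = TermLearner()            # initiate
for r in frames:                   # loop over the ensemble
    learner.accumulate(r)          # accumulate
r0, k_bond = learner.terminate()   # terminate
print(f"r0 = {r0:.3f} A, k_bond = {k_bond:.1f} kcal/(mol A^2)")
```

The streaming form needs only three numbers per term, so the ensemble never has to be held in memory at once, which matches the one-file-per-iteration structure of the loop above.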