Checking the initial model


Although the model was obtained by molecular replacement, and should therefore have reasonable geometry, it is sensible to check it before starting refinement. In this example the CNS task file model_stats_start.inp is used to analyse the model geometry and diffraction statistics. The results will show whether the model has poor geometry, which can arise during manual rebuilding or from an error at the generate stage (see previous section). Locating any problems before refinement will save time later on.

      cns_solve < model_stats_start.inp > model_stats_start.out [2 minutes]

The output listing file (model_stats_start.list) contains a variety of information, most of which is self-explanatory. Important things to note are:

Initial R-values:

=================================== summary ==================================

resolution range: 500 - 1.8 A
  R-values:
  initial                                        r= 0.5112 free_r= 0.4890
  after B-factor and/or bulk solvent correction  r= 0.4306 free_r= 0.4424

Initial R-values that differ greatly from those expected are usually the result of a simple mistake, such as an incorrect space group, incorrect unit cell dimensions, or the wrong input diffraction data or input model.
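When several models are being checked it can be convenient to pull these numbers out of the listing automatically. The following is a minimal Python sketch, not part of CNS, that assumes the summary lines have exactly the layout shown above. The names SUMMARY, R_LINE and parse_r_values are illustrative only.

```python
import re

# Summary lines copied from the listing shown above.
SUMMARY = """\
resolution range: 500 - 1.8 A
  R-values:
  initial                                        r= 0.5112 free_r= 0.4890
  after B-factor and/or bulk solvent correction  r= 0.4306 free_r= 0.4424
"""

# Assumed layout: a free-text label, then "r= <value> free_r= <value>".
R_LINE = re.compile(
    r"^\s*(?P<label>.*?)\s+r=\s*(?P<r>[\d.]+)\s+free_r=\s*(?P<free>[\d.]+)\s*$"
)


def parse_r_values(text):
    """Return {label: (r, free_r)} for every line that carries both values."""
    values = {}
    for line in text.splitlines():
        match = R_LINE.match(line)
        if match:
            values[match.group("label")] = (
                float(match.group("r")),
                float(match.group("free")),
            )
    return values


r_values = parse_r_values(SUMMARY)
for label, (r, free_r) in r_values.items():
    # The gap between free R and R is printed only for convenience.
    print(f"{label}: R={r:.4f} Rfree={free_r:.4f} gap={free_r - r:+.4f}")
```

In practice the text would be read from model_stats_start.list rather than from an embedded string.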

R-values versus resolution:

================================= R-values ===================================


=======> R-values with |Fobs|/sigma cutoff= 0.0

 Test set (test = 1):

 #bin | resolution range | #refl |
    1   3.88  500.01        354      0.3723
    2   3.08    3.88        354      0.4350
    3   2.69    3.08        375      0.4688
    4   2.44    2.69        394      0.4178
    5   2.27    2.44        349      0.4616
    6   2.13    2.27        384      0.4603
    7   2.03    2.13        349      0.4796
    8   1.94    2.03        348      0.5447
    9   1.86    1.94        374      0.4543
   10   1.80    1.86        393      0.4623

 Working set:

 #bin | resolution range | #refl |
    1   3.88  500.01       3463      0.3662
    2   3.08    3.88       3468      0.4065
    3   2.69    3.08       3422      0.4435
    4   2.44    2.69       3431      0.4527
    5   2.27    2.44       3444      0.4629
    6   2.13    2.27       3416      0.4555
    7   2.03    2.13       3438      0.4647
    8   1.94    2.03       3443      0.4569
    9   1.86    1.94       3400      0.4575
   10   1.80    1.86       3323      0.4782

This distribution of R-values is reasonable: there is no dramatic increase in R-value (in particular the free R-value) towards higher resolution. Resolution shells with R-values significantly higher than the rest might indicate problems with the data processing (ice rings, for example). However, the R-values measure the fit to the experimental data and should not be manipulated by removing data, in particular by using sigma cutoffs to exclude weak data.
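The check for shells that stand out from the rest can be sketched in Python. This is only an illustration: it assumes the five-column row layout shown above, and the tolerance of 0.10 above the median R-value is an arbitrary choice made for this example, not a CNS criterion. The names WORKING_SET, parse_bins and flag_shells are hypothetical.

```python
import statistics

# Working-set rows copied from the listing shown above.
WORKING_SET = """\
 #bin | resolution range | #refl |
    1   3.88  500.01       3463      0.3662
    2   3.08    3.88       3468      0.4065
    3   2.69    3.08       3422      0.4435
    4   2.44    2.69       3431      0.4527
    5   2.27    2.44       3444      0.4629
    6   2.13    2.27       3416      0.4555
    7   2.03    2.13       3438      0.4647
    8   1.94    2.03       3443      0.4569
    9   1.86    1.94       3400      0.4575
   10   1.80    1.86       3323      0.4782
"""


def parse_bins(text):
    """Return (bin, d_high, d_low, n_refl, r_value) for each data row."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue  # header or blank line
        try:
            rows.append((int(fields[0]), float(fields[1]), float(fields[2]),
                         int(fields[3]), float(fields[4])))
        except ValueError:
            continue  # not a numeric data row
    return rows


def flag_shells(rows, tolerance=0.10):
    """Return rows whose R-value exceeds the median by more than tolerance.

    The tolerance is an assumption made for this sketch.
    """
    median_r = statistics.median(row[4] for row in rows)
    return [row for row in rows if row[4] - median_r > tolerance]


rows = parse_bins(WORKING_SET)
flagged = flag_shells(rows)
print("shells needing a closer look:", flagged)
```

For the working set above no shell is flagged, in line with the assessment that the distribution is reasonable. Any flagged shell should prompt a look at the data processing, not the removal of data.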

The overall geometry:

rmsd bonds= 0.009109 with 2 bond violations > 0.05
rmsd angles=  1.57655 with 9 angle violations >  8.0
rmsd dihedrals= 24.18230 with 0 angle violations >  60.0
rmsd improper=  1.36747 with 37 angle violations >  3.0

If the model has been through extensive manual rebuilding the initial geometry may have significant deviations from ideality. Any major problem can be detected by the detailed geometry analysis (below).

The geometry in detail:

================================= geometry ===================================

=======> bond violations

 (atom-i        |atom-j        )    dist.   equil.   delta    energy   const.

 (A    159  CG1 |A    159  CD1 )    1.569    1.513    0.056    1.222  389.218
 (B    159  CG1 |B    159  CD1 )    1.568    1.513    0.055    1.181  389.218

=======> angle violations

 (atom-i        |atom-j        |atom-k        )  angle    equil.     delta    energy  const.

 (A    111  N   |A    111  CA  |A    111  C   )  103.152  111.200   -8.048    4.891  247.886
 (A    112  N   |A    112  CA  |A    112  C   )  103.195  111.200   -8.005    4.839  247.886
 (A    182  N   |A    182  CA  |A    182  C   )  101.879  111.200   -9.321    6.561  247.886
 (A    198  N   |A    198  CA  |A    198  C   )   99.371  111.200  -11.829   10.566  247.886
 (B    111  N   |B    111  CA  |B    111  C   )  103.139  111.200   -8.061    4.907  247.886
 (B    112  N   |B    112  CA  |B    112  C   )  103.134  111.200   -8.066    4.913  247.886
 (B    182  N   |B    182  CA  |B    182  C   )  101.841  111.200   -9.359    6.614  247.886
 (B    198  N   |B    198  CA  |B    198  C   )   99.325  111.200  -11.875   10.648  247.886
 (B    206  N   |B    206  CA  |B    206  C   )  103.168  111.200   -8.032    4.872  247.886

=======> improper angle violations

 (atom-i        |atom-j        |atom-k        |atom-L        )    angle    equil.   delta    energy   const.   period

 (A    114  C   |A    114  CA  |A    115  N   |A    114  O   )    3.997    0.000   -3.997    3.649  750.000   0
 (A    117  CA  |A    117  N   |A    117  C   |A    117  CB  )   31.678   35.264    3.586    2.938  750.000   0
 (A    120  CA  |A    120  N   |A    120  C   |A    120  CB  )   38.700   35.264   -3.435    2.696  750.000   0
 (B    115  CA  |B    115  N   |B    115  C   |B    115  CB  )   32.248   35.264    3.017    2.079  750.000   0
 (B    114  C   |B    114  CA  |B    115  N   |B    114  O   )    4.034    0.000   -4.034    3.717  750.000   0
 (B    117  CA  |B    117  N   |B    117  C   |B    117  CB  )   31.640   35.264    3.624    3.001  750.000   0
 (B    120  CA  |B    120  N   |B    120  C   |B    120  CB  )   38.645   35.264   -3.380    2.611  750.000   0

=======> dihedral angle violations

 (atom-i        |atom-j        |atom-k        |atom-L        )    angle    equil.   delta    energy   const.   period

Specific problems with the model geometry can be identified here. In particular, the bond length deviations should be checked, and the model inspected wherever a very long bond is reported. Such bonds can result from a mistake at the generate stage, in particular forgetting to include a TER or BREAK card between separate chains. If present, these unphysical bonds will cause serious problems in refinement.
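A search for very long bonds in the bond violation list can be sketched in Python as follows. The sketch assumes the line layout shown above, and the 0.5 A limit on the deviation is an arbitrary value chosen for illustration, not a CNS default. The names BOND_VIOLATIONS, BOND_LINE, parse_bond_violations and suspicious_bonds are hypothetical.

```python
import re

# Bond-violation lines copied from the listing shown above.
BOND_VIOLATIONS = """\
 (A    159  CG1 |A    159  CD1 )    1.569    1.513    0.056    1.222  389.218
 (B    159  CG1 |B    159  CD1 )    1.568    1.513    0.055    1.181  389.218
"""

# Assumed layout: "(atom-i |atom-j ) dist. equil. delta ...".
# Angle and improper lines contain more than one "|" and do not match.
BOND_LINE = re.compile(
    r"^\s*\(([^|)]+)\|([^|)]+)\)\s+([\d.]+)\s+([\d.]+)\s+(-?[\d.]+)"
)


def parse_bond_violations(text):
    """Return (atom_i, atom_j, distance, equilibrium, delta) per bond line."""
    bonds = []
    for line in text.splitlines():
        match = BOND_LINE.match(line)
        if match:
            atom_i = " ".join(match.group(1).split())
            atom_j = " ".join(match.group(2).split())
            bonds.append((atom_i, atom_j, float(match.group(3)),
                          float(match.group(4)), float(match.group(5))))
    return bonds


def suspicious_bonds(bonds, max_delta=0.5):
    """Return bonds deviating from ideal by more than max_delta (assumed limit)."""
    return [bond for bond in bonds if abs(bond[4]) > max_delta]


bonds = parse_bond_violations(BOND_VIOLATIONS)
long_bonds = suspicious_bonds(bonds)
print("bonds to inspect:", long_bonds)
```

Neither of the two violations listed above comes close to the limit. A bond spanning two chains because of a missing TER or BREAK card would typically deviate by many angstroms and would be reported by a check of this kind.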

Non-trans peptide bonds:

============================ non-trans peptides ==============================

cis-peptide: segid=A resid=186 resname=PRO
             current dihedral value=   -0.443

cis-peptide: segid=B resid=186 resname=PRO
             current dihedral value=   -0.262

The presence of non-trans peptides, unless they are proline residues, will cause problems in refinement. They can be identified, and an appropriate parameter file created, using the CNS task file general/cis_peptide.inp. The generated parameter file is then read into subsequent refinement task files.
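To pick out the non-proline cases quickly, the section of the listing can be scanned with a short Python sketch such as the one below. It assumes the two-line "cis-peptide:" layout shown above and does not replace general/cis_peptide.inp. The names NON_TRANS, CIS_ENTRY and parse_cis_peptides are hypothetical.

```python
import re

# Entries copied from the non-trans peptide section shown above.
NON_TRANS = """\
cis-peptide: segid=A resid=186 resname=PRO
             current dihedral value=   -0.443

cis-peptide: segid=B resid=186 resname=PRO
             current dihedral value=   -0.262
"""

# Assumed layout: the header line is followed by the dihedral value line.
CIS_ENTRY = re.compile(
    r"cis-peptide:\s*segid=(\S*)\s+resid=(\S+)\s+resname=(\S+)"
    r"\s+current dihedral value=\s*(-?[\d.]+)"
)


def parse_cis_peptides(text):
    """Return (segid, resid, resname, dihedral) for each cis-peptide entry."""
    return [(segid, resid, resname, float(value))
            for segid, resid, resname, value in CIS_ENTRY.findall(text)]


peptides = parse_cis_peptides(NON_TRANS)
# Entries at residues other than proline deserve a closer look.
non_proline = [entry for entry in peptides if entry[2] != "PRO"]
print("cis-peptides found:", len(peptides))
print("non-proline cis-peptides:", non_proline)
```

Both entries in this example are prolines, so the second list is empty.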


