Checking the initial model


The structure used in this tutorial is that of a double mutant (D97A,E99A) of the general diffusion porin from Rhodopseudomonas blastica (B.Schmid, L.Maveyraud, M.Kromer, G.E.Schulz "Porin mutants with new channel properties", Protein Science 7, 1603-1611). The deposited refined model (7prn) and the structure factors (r7prnsf) were obtained directly from the Protein Data Bank. The twinning of the data had previously been identified (T.O.Yeates, B.C.Fam "Protein crystals and their evil twins", Structure 7, 25-29). Comparing the original model with the model obtained after refinement with twinning taken into account shows negligible changes in the protein structure.

Although the model has previously been refined against the data (without taking twinning into account), and should therefore have reasonable geometry, it is sensible to check the model before starting refinement. In this example the CNS task file model_stats_twin.inp is used to analyse the model geometry and diffraction statistics. The results will show whether the model has poor geometry, which can arise during manual rebuilding or from an error at the generate stage (see previous section). Locating any problems before refinement will save time later on.

      cns_solve < model_stats_twin.inp > model_stats_twin.out [2 minutes]

The output listing file (model_stats_twin.list) contains a variety of information which is self-explanatory. Important things to note are:

Initial R-values:

=================================== summary ==================================

resolution range: 500.0 - 2.25 A
  Twinned R-values:
  initial                                        r= 0.2737 free_r= 0.2462
  after B-factor and/or bulk solvent correction  r= 0.1981 free_r= 0.2061

Initial R-values that are very different from expected are usually the result of a simple mistake such as incorrect space group, unit cell dimensions, input diffraction data or input model.

R-values versus resolution:

============================= twinned R-values ===============================


=======> R-values with |Fobs|/sigma cutoff= 0.0

 Test set (test = 1):

 #bin | resolution range | #refl |
    1   4.85  500.01        196      0.1872
    2   3.85    4.85        187      0.1717
    3   3.36    3.85        217      0.1881
    4   3.05    3.36        211      0.2140
    5   2.83    3.05        223      0.2117
    6   2.67    2.83        239      0.2629
    7   2.53    2.67        233      0.2254
    8   2.42    2.53        215      0.2324
    9   2.33    2.42        242      0.2253
   10   2.25    2.33        232      0.2413

 Working set:

 #bin | resolution range | #refl |
    1   4.85  500.01       2128      0.1751
    2   3.85    4.85       2179      0.1591
    3   3.36    3.85       2188      0.1844
    4   3.05    3.36       2188      0.1962
    5   2.83    3.05       2153      0.2181
    6   2.67    2.83       2192      0.2243
    7   2.53    2.67       2132      0.2422
    8   2.42    2.53       2179      0.2382
    9   2.33    2.42       2182      0.2628
   10   2.25    2.33       2174      0.2510

This distribution of R-values is reasonable - there is no dramatic increase in R-value (in particular free R-value) as resolution increases. If there were resolution shells with R-values significantly higher than the rest this might indicate possible problems with the data processing (ice rings for example). However, the R-values indicate the fit to the experimental data and should not be manipulated by removing data - in particular by the use of sigma cutoffs to exclude weak data.
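The visual check described above can also be sketched in a few lines of Python. This is only an illustration, not part of CNS: the function name outlier_shells and the cutoff of 1.5 times the median R-value are arbitrary assumptions, and the bins are copied by hand from the listing above.

```python
from statistics import median

# (d_min, d_max, R-value) per resolution bin, copied from the listing above
TEST_BINS = [
    (4.85, 500.01, 0.1872), (3.85, 4.85, 0.1717), (3.36, 3.85, 0.1881),
    (3.05, 3.36, 0.2140), (2.83, 3.05, 0.2117), (2.67, 2.83, 0.2629),
    (2.53, 2.67, 0.2254), (2.42, 2.53, 0.2324), (2.33, 2.42, 0.2253),
    (2.25, 2.33, 0.2413),
]
WORK_BINS = [
    (4.85, 500.01, 0.1751), (3.85, 4.85, 0.1591), (3.36, 3.85, 0.1844),
    (3.05, 3.36, 0.1962), (2.83, 3.05, 0.2181), (2.67, 2.83, 0.2243),
    (2.53, 2.67, 0.2422), (2.42, 2.53, 0.2382), (2.33, 2.42, 0.2628),
    (2.25, 2.33, 0.2510),
]


def outlier_shells(bins, factor=1.5):
    """Return the bins whose R-value exceeds factor * median R-value.

    The factor is an arbitrary illustrative cutoff, not a CNS criterion.
    """
    cutoff = factor * median(r for _, _, r in bins)
    return [b for b in bins if b[2] > cutoff]


# Both lists are empty for this data set: no shell stands out.
print("test set outliers:   ", outlier_shells(TEST_BINS))
print("working set outliers:", outlier_shells(WORK_BINS))
```

A flagged shell would be a prompt to look at the data processing for that resolution range, not a reason to discard the reflections.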

The overall geometry:

rmsd bonds= 0.020606 with 42 bond violations > 0.05
rmsd angles=  2.78530 with 14 angle violations >  8.0
rmsd dihedrals= 28.86228 with 2 angle violations >  60.0
rmsd improper=  2.32829 with 215 angle violations >  3.0

If the model has been through extensive manual rebuilding the initial geometry may have significant deviations from ideality. Any major problem can be detected by the detailed geometry analysis (below).

The geometry in detail:

================================= geometry ===================================

=======> bond violations

 (atom-i        |atom-j        )    dist.   equil.   delta    energy   const.

 (A    2    C   |A    2    O   )    1.291    1.231    0.060    5.253 1480.000
 (A    6    C   |A    7    N   )    1.405    1.329    0.076   17.535 3020.408
 (A    10   CD1 |A    10   CE1 )    1.432    1.382    0.050    1.668  657.778
 (A    31   C   |A    31   O   )    1.301    1.231    0.070    7.234 1480.000

....

 (A    289  CE2 |A    289  CD2 |A    289  CG  |A    289  CD1 )   -4.262    0.000    4.262    4.149  750.000   0
 (A    288  C   |A    288  CA  |A    289  N   |A    288  O   )    8.214    0.000   -8.214   15.414  750.000   0
 (A    289  C   |A    289  CA  |A    289  OXT |A    289  O   )   14.885    0.000  -14.885   50.622  750.000   0

=======> dihedral angle violations

 (atom-i        |atom-j        |atom-k        |atom-L        )    angle    equil.   delta    energy   const.   period

 (A    1    CB  |A    1    CG  |A    1    SD  |A    1    CE  )  -11.417  -90.000  -78.583    9.608    5.000   2
 (A    174  CB  |A    174  CG  |A    174  SD  |A    174  CE  ) -179.700  -90.000   89.700   10.000    5.000   2

Specific problems with the model geometry can be identified here. In particular, the deviations for the bond lengths should be checked: if any bonds are very long, the model should be inspected. Such bonds can result from a mistake at the generate stage, in particular forgetting to include a TER or BREAK card between separate chains. If present, these unphysical bonds will cause serious problems in refinement.
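For a long listing, the search for very long bonds can be automated. The following Python sketch is a hypothetical helper, not a CNS tool: the regular expression assumes rows laid out exactly like the bond violations shown above (with a non-empty chain identifier), and the 0.5 A cutoff for a suspicious bond is an arbitrary assumption.

```python
import re

# Matches one row of the "bond violations" table: two atoms, then
# dist., equil. and delta. Rows with four atoms (dihedrals, impropers)
# do not match because a ")" must follow the second atom.
BOND_ROW = re.compile(
    r"\(\s*(\S+)\s+(\d+)\s+(\S+)\s*\|\s*(\S+)\s+(\d+)\s+(\S+)\s*\)"
    r"\s+([-\d.]+)\s+([-\d.]+)\s+([-\d.]+)"
)


def long_bonds(listing, max_delta=0.5):
    """Return (atom_i, atom_j, dist, delta) for bonds with |delta| > max_delta.

    max_delta is an illustrative cutoff in Angstrom, not a CNS default.
    """
    hits = []
    for line in listing.splitlines():
        m = BOND_ROW.search(line)
        if m is None:
            continue  # header, blank line or a non-bond row
        dist, _equil, delta = (float(x) for x in m.group(7, 8, 9))
        if abs(delta) > max_delta:
            atom_i = " ".join(m.group(1, 2, 3))
            atom_j = " ".join(m.group(4, 5, 6))
            hits.append((atom_i, atom_j, dist, delta))
    return hits


# Rows copied from the listing above; none of them is unphysically long.
SAMPLE = """
 (A    2    C   |A    2    O   )    1.291    1.231    0.060    5.253 1480.000
 (A    6    C   |A    7    N   )    1.405    1.329    0.076   17.535 3020.408
 (A    10   CD1 |A    10   CE1 )    1.432    1.382    0.050    1.668  657.778
 (A    31   C   |A    31   O   )    1.301    1.231    0.070    7.234 1480.000
"""
print(long_bonds(SAMPLE))
```

A bond between the last residue of one chain and the first residue of the next would appear in the result with a delta of several Angstrom.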

Non-trans peptide bonds:

============================ non-trans peptides ==============================

there are no distorted or cis- peptide planes

The presence of non-trans peptides, unless they are proline residues, will cause problems in refinement. They can be identified, and an appropriate parameter file created, using the CNS task file general/cis_peptide.inp. The generated parameter file is then read into subsequent refinement task files.

Statistics about detwinning of the data:

================================ detwinning ==================================

data detwinned with: F_detwin   = (Fo^2 - alpha*[Fo^2 + Fo'^2])/(1 - 2*alpha)
                     Fo'[h,k,l] = Fo[h,-h-k,-l]
                     alpha      = 0.304

reflections rejected (I_detwin <= 0): 683
                         working set: 626
                            test set: 57

The detwinning algorithm depends on whether the twinning is perfect or partial. Some reflections are rejected during the detwinning procedure because the resulting detwinned intensity is zero or negative (683 reflections in this example).
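The expression printed in the listing can be tried out with a minimal Python sketch. Only the formula and the value alpha = 0.304 come from the listing; the function name detwin and the example intensities are made up for illustration, and this is not the CNS implementation.

```python
def detwin(i_obs, i_obs_twin, alpha):
    """Detwin one pair of twin-related intensities (Fo^2 and Fo'^2).

    Returns the detwinned intensity, or None when the result is <= 0,
    mirroring the rejection criterion shown in the listing.
    """
    if not 0.0 <= alpha < 0.5:
        # alpha = 0.5 (perfect twinning) would make the denominator zero
        raise ValueError("this expression needs a twin fraction in [0, 0.5)")
    i_detwin = (i_obs - alpha * (i_obs + i_obs_twin)) / (1.0 - 2.0 * alpha)
    return i_detwin if i_detwin > 0.0 else None


ALPHA = 0.304  # twin fraction reported in the listing

# Made-up intensities, for illustration only
strong = detwin(1000.0, 400.0, ALPHA)  # stays positive and is kept
weak = detwin(100.0, 400.0, ALPHA)     # goes negative and is rejected

print("strong reflection:", strong)
print("weak reflection:  ", weak)
```

The weak reflection is rejected because its twin-related mate is much stronger, so subtracting the twin contribution leaves a negative intensity. The denominator (1 - 2*alpha) also shows why this partial-twinning expression cannot be used for perfect twinning, where alpha is 0.5.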

Detwinned R-values:

============================ detwinned R-values ==============================

resolution range: 500.0 - 2.25 A
  R-values:
  after detwinning  r= 0.2637 free_r= 0.2764

The R-values after detwinning (r= 0.2637, free_r= 0.2764) are higher than the twinned R-values after B-factor and/or bulk solvent correction (r= 0.1981, free_r= 0.2061).


