The structure used in this tutorial is that of a double mutant (D97A,E99A) of the general diffusion porin from Rhodopseudomonas blastica (B.Schmid, L.Maveyraud, M.Kromer, G.E.Schulz "Porin mutants with new channel properties", Protein Science 7, 1603-1611). The deposited refined model (7prn) and the structure factors (r7prnsf) were obtained directly from the Protein Data Bank. The twinning of the data had previously been identified (T.O.Yeates, B.C.Fam "Protein crystals and their evil twins", Structure 7, 25-29). Comparison of the original model and that after refinement with the twinning included shows negligible changes in the protein structure.
Although the model has been previously refined against the data (without twinning taken into account), and should therefore have reasonable geometry, it is sensible to check the model prior to starting refinement. In this example the CNS task file model_stats_twin.inp is used to analyse the model geometry and diffraction statistics. The results will indicate if the model has poor geometry which can occur in the course of manual rebuilding or also due to an error at the generate stage (see previous section). Locating any possible problems prior to refinement will save time later on.
cns_solve < model_stats_twin.inp > model_stats_twin.out [2 minutes]
The output listing file (model_stats_twin.list) contains a variety of information which is self-explanatory. Important things to note are:
Initial R-values:
=================================== summary ================================== resolution range: 500.0 - 2.25 A Twinned R-values: initial r= 0.2737 free_r= 0.2462 after B-factor and/or bulk solvent correction r= 0.1981 free_r= 0.2061
Initial R-values that are very different from expected are usually the result of a simple mistake such as incorrect space group, unit cell dimensions, input diffraction data or input model.
R-values versus resolution:
============================= twinned R-values =============================== =======> R-values with |Fobs|/sigma cutoff= 0.0 Test set (test = 1): #bin | resolution range | #refl | 1 4.85 500.01 196 0.1872 2 3.85 4.85 187 0.1717 3 3.36 3.85 217 0.1881 4 3.05 3.36 211 0.2140 5 2.83 3.05 223 0.2117 6 2.67 2.83 239 0.2629 7 2.53 2.67 233 0.2254 8 2.42 2.53 215 0.2324 9 2.33 2.42 242 0.2253 10 2.25 2.33 232 0.2413 Working set: #bin | resolution range | #refl | 1 4.85 500.01 2128 0.1751 2 3.85 4.85 2179 0.1591 3 3.36 3.85 2188 0.1844 4 3.05 3.36 2188 0.1962 5 2.83 3.05 2153 0.2181 6 2.67 2.83 2192 0.2243 7 2.53 2.67 2132 0.2422 8 2.42 2.53 2179 0.2382 9 2.33 2.42 2182 0.2628 10 2.25 2.33 2174 0.2510
This distribution of R-values is reasonable - there is no dramatic increase in R-value (in particular free R-value) as resolution increases. If there were resolution shells with R-values significantly higher than the rest this might indicate possible problems with the data processing (ice rings for example). However, the R-values indicate the fit to the experimental data and should not be manipulated by removing data - in particular by the use of sigma cutoffs to exclude weak data.
The overall geometry:
rmsd bonds= 0.020606 with 42 bond violations > 0.05 rmsd angles= 2.78530 with 14 angle violations > 8.0 rmsd dihedrals= 28.86228 with 2 angle violations > 60.0 rmsd improper= 2.32829 with 215 angle violations > 3.0
If the model has been through extensive manual rebuilding the initial geometry may have significant deviations from ideality. Any major problem can be detected by the detailed geometry analysis (below).
The geometry in detail:
================================= geometry =================================== =======> bond violations (atom-i |atom-j ) dist. equil. delta energy const. (A 2 C |A 2 O ) 1.291 1.231 0.060 5.253 1480.000 (A 6 C |A 7 N ) 1.405 1.329 0.076 17.535 3020.408 (A 10 CD1 |A 10 CE1 ) 1.432 1.382 0.050 1.668 657.778 (A 31 C |A 31 O ) 1.301 1.231 0.070 7.234 1480.000 .... (A 289 CE2 |A 289 CD2 |A 289 CG |A 289 CD1 ) -4.262 0.000 4.262 4.149 750.000 0 (A 288 C |A 288 CA |A 289 N |A 288 O ) 8.214 0.000 -8.214 15.414 750.000 0 (A 289 C |A 289 CA |A 289 OXT |A 289 O ) 14.885 0.000 -14.885 50.622 750.000 0 =======> dihedral angle violations (atom-i |atom-j |atom-k |atom-L ) angle equil. delta energy const. period (A 1 CB |A 1 CG |A 1 SD |A 1 CE ) -11.417 -90.000 -78.583 9.608 5.000 2 (A 174 CB |A 174 CG |A 174 SD |A 174 CE ) -179.700 -90.000 89.700 10.000 5.000 2
Specific problems with the model geometry can be identified here. In particular the deviations for the bond lengths should be checked. If there are very long bond lengths the model should be checked. This can occur as a result of a mistake at the generate stage, in particular forgetting to include a TER or BREAK card between separate chains. If present, these unphysical bonds will cause serious problems in refinement.
Non-trans peptide bonds:
============================ non-trans peptides ============================== there are no distorted or cis- peptide planes
The presence of non-trans peptides, unless they are proline residues, will cause problems in refinement. They can be identified, and an appropriate parameter file created using the CNS task file general/cis_peptide.inp. The parameter file generated is read into subsequent refinement task files.
Statistics about detwinning of the data:
================================ detwinning ================================== data detwinned with: F_detwin = (Fo^2 - alpha*[Fo^2 + Fo'^2])/(1 - 2*alpha) Fo'[h,k,l] = Fo[h,-h-k,-l] alpha = 0.304 reflections rejected (I_detwin <= 0): 683 working set: 626 test set: 57
The detwinning algorithm depends on whether the twinning is perfect or partial. Some reflections are rejected during the detwinning procedure because the resultant intensity is less than zero.
Detwinned R-values:
============================ detwinned R-values ============================== resolution range: 500.0 - 2.25 A R-values: after detwinning r= 0.2637 free_r= 0.2764
The R-values after detwinning are in greater in magnitude than the twinned R-values.