DANSSR V1.0

Program for Database Analysis of Sequence Structure Relationships in Proteins

Raj Srinivasan

Johns Hopkins University Medical School

Baltimore, MD 21205.

raj@grserv.med.jhmi.edu

Table of Contents

  1. Introduction
  2. Command Reference
  3. Files & Things
  4. Utilities
  5. Tutorial


The information supplied in this document is believed to be true, but no liability is assumed for its use or for any infringement of the rights of others resulting from its use.

This package is distributed without any conditions. It may be lent, re-sold, hired out or otherwise circulated without the supplier's prior consent, in any form of packaging or cover. Any part of this manual or accompanying software may be reproduced, stored in a retrieval system on optical or magnetic disk, tape or any other medium, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, for any purpose.


Introduction

DANSSR (Database Analysis of protein Structure Sequence Relationships) is an effort to develop a program in which properties of interest pertaining to a given sequence can be investigated using a database of known structures. The properties in the current incarnation are solely structural, i.e. distances, angles, torsions, backbone/sidechain conformations and solvent accessible surfaces. The program is suited for analysis of contiguous sequences only. It is written in SGI Fortran and uses the common do/enddo extension and the getarg() unix function. It should compile without any modification on most unix machines.


Command Reference

DANSSR allows the execution of interactive commands typed at the "Danssr" prompt in the terminal window. Each command must be given on a separate line. Keywords are case insensitive and may be entered in either upper or lower case letters. All whitespace characters are ignored except to separate keywords and their arguments.

The commands/keywords currently recognised by DANSSR are given below.

    Sequence          Code          Restrain
    Measure           Print         Output
    Coords            Unix          Source
    Search            Quit           

Sequence

Syntax: sequence {<"string">}

The DANSSR sequence command permits the input of a polypeptide sequence as a series of one letter codes. The codes representing each amino acid are read from a file called match.sets. The file should exist in one of three places:

  a) the directory from which the program was launched
  b) the user's home directory
  c) the directory pointed to by the variable $DANSSR_DIR

A typical example of the file is as follows:

A	 A
C	 C
D	 D
E	 E
F	 F
G	 G
H	 H
I	 I
K	 K
L	 L
M	 M
N	 N
P	 P
Q	 Q
R	 R
S	 S
T	 T
V	 V
W	 W
Y	 Y
0	 ARNDCQEGHILKMFPSTWYV              
1	 ARNDCQEGHILKMFSTWYV
2	 ARNDCQEHILKMFSTWYV                      		
3	 CILMFWV
4	 STNDEQ
5	 AKRHY
6	 STNDEQKRHY
a	 STND

The first twenty codes represent the 20 natural amino acids. The code 0 represents any amino acid, 1 represents any amino acid but proline, and so on. Users may edit this file to suit their own requirements. Each code must be exactly one character long. It is advisable not to alter the first twenty codes in the above file. Some examples follow. The command:

sequence "AAAA"

searches for a four residue sequence where all positions are A.

sequence "A[STND]AAA"

searches for a five residue sequence where the second position is one of S, T, N or D while the rest are A.

sequence "0aa0"

searches for a four residue sequence where the second and third positions are one of S,T,N or D while the rest are any residue.

Note the quotes around the query sequence; they are required.
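The matching rules described above can be sketched in Python. This is only an illustration, not the program's Fortran implementation: the function names are hypothetical, the sample file is abbreviated, and it is assumed that characters inside square brackets are looked up as codes in the same way as characters outside them.

```python
import re

# Abbreviated stand-in for match.sets: the twenty one-letter codes plus two
# of the group codes shown above (assumption: code and set are separated by
# whitespace).
SAMPLE = "\n".join(f"{aa}\t {aa}" for aa in "ACDEFGHIKLMNPQRSTVWY") + """
0\t ARNDCQEGHILKMFPSTWYV
a\t STND
"""

def load_match_sets(text):
    """Map each one-character code to the set of residues it stands for."""
    sets = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and len(parts[0]) == 1:
            sets[parts[0]] = set(parts[1])
    return sets

def expand_query(query, sets):
    """Turn a query such as 'A[STND]AA' into one allowed-residue set per position."""
    positions = []
    for bracket, code in re.findall(r"\[([^\]]+)\]|(.)", query):
        allowed = set()
        for ch in (bracket or code):
            # Unknown characters are treated as literal residues (assumption).
            allowed |= sets.get(ch, {ch})
        positions.append(allowed)
    return positions

def find_matches(chain, positions):
    """Return the 0-based start index of every window of chain that matches."""
    width = len(positions)
    return [
        start
        for start in range(len(chain) - width + 1)
        if all(chain[start + i] in positions[i] for i in range(width))
    ]
```

For instance, find_matches("GASTNDK", expand_query("0aa0", load_match_sets(SAMPLE))) reports every four residue window whose two central residues are S, T, N or D.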


Code

Syntax: code {<"string">}

The DANSSR code command is used to limit the phi,psi range of each residue in the query sequence to a specific region of the Ramachandran map. Depending on the database being used, various code schemes may be employed.

Three different databases are supplied. In the first case, the phi,psi region of the residue is marked as H, T, E or X depending on whether the residue is in a helix, turn, sheet or coil. The secondary structure was assigned from an automated examination of backbone-backbone hydrogen bonding patterns. The second flavor of the database assigns a single letter mnemonic to every region of the phi,psi map following the nomenclature of Zimmermann and Scheraga (PNAS .....). A complete description of the codes is as follows:

CODE    Phi-Range       Psi-Range
 A      -110 to  -40     -90 to  -10
 B      -110 to  -40     -10 to   50
 C      -110 to  -40      50 to  130
 F      -110 to  -40     130 to  180 & -180 to -140
 E      -180 to -110    -180 to -140 &  110 to  180
 G      -180 to -110     -90 to  -40
 B      -180 to -110     -40 to   10
 D      -180 to -110      10 to  110
 a       110 to   40      90 to   10
 b       110 to   40      10 to  -50
 c       110 to   40     -50 to -130
 f       110 to   40    -130 to -180 &  180 to  140
 e       180 to  110     180 to  140 & -110 to -180
 g       180 to  110      90 to   40
 b       180 to  110      40 to  -10
 d       180 to  110     -10 to -110
 H      anything not above.

The third flavor is related to the second method. In this case the phi,psi map is divided into 42 equal 60 deg. x 60 deg. bins centered on, for example, -180,180; -180,120; -180,60; -180,0 and so on, with each bin having an associated code. The description of the codes is:

CODE    Phi-Range       Psi-Range
 A      -180 +/- 30     -180 +/- 30
 B      -180 +/- 30     -120 +/- 30
 C      -180 +/- 30      -60 +/- 30
 D      -180 +/- 30        0 +/- 30
 E      -180 +/- 30       60 +/- 30
 F      -180 +/- 30      120 +/- 30
 A      -180 +/- 30      180 +/- 30
 G      -120 +/- 30     -180 +/- 30
 H      -120 +/- 30     -120 +/- 30
 I      -120 +/- 30      -60 +/- 30
 J      -120 +/- 30        0 +/- 30
 K      -120 +/- 30       60 +/- 30
 L      -120 +/- 30      120 +/- 30
 G      -120 +/- 30      180 +/- 30
 M       -60 +/- 30     -180 +/- 30
 N       -60 +/- 30     -120 +/- 30
 O       -60 +/- 30      -60 +/- 30
 P       -60 +/- 30        0 +/- 30
 Q       -60 +/- 30       60 +/- 30
 R       -60 +/- 30      120 +/- 30
 M       -60 +/- 30      180 +/- 30
 S         0 +/- 30     -180 +/- 30
 T         0 +/- 30     -120 +/- 30
 U         0 +/- 30      -60 +/- 30
 V         0 +/- 30        0 +/- 30
 W         0 +/- 30       60 +/- 30
 X         0 +/- 30      120 +/- 30
 S         0 +/- 30      180 +/- 30
 m        60 +/- 30     -180 +/- 30
 r        60 +/- 30     -120 +/- 30
 q        60 +/- 30      -60 +/- 30
 p        60 +/- 30        0 +/- 30
 o        60 +/- 30       60 +/- 30
 n        60 +/- 30      120 +/- 30
 m        60 +/- 30      180 +/- 30
 g       120 +/- 30     -180 +/- 30
 l       120 +/- 30     -120 +/- 30
 k       120 +/- 30      -60 +/- 30
 j       120 +/- 30        0 +/- 30
 i       120 +/- 30       60 +/- 30
 h       120 +/- 30      120 +/- 30
 g       120 +/- 30      180 +/- 30
 a       180 +/- 30     -180 +/- 30
 f       180 +/- 30     -120 +/- 30
 e       180 +/- 30      -60 +/- 30
 d       180 +/- 30        0 +/- 30
 c       180 +/- 30       60 +/- 30
 b       180 +/- 30      120 +/- 30
 a       180 +/- 30      180 +/- 30

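
The 60 degree binning in the table above can be expressed as a small lookup. The following Python sketch is illustrative only: rational_code and nearest_center are hypothetical names, and the treatment of angles lying exactly on a bin boundary (e.g. -150 or 30) is an assumption, since the manual does not say which bin such values fall into.

```python
import math

# One string per phi center; characters are ordered by psi center
# -180, -120, -60, 0, 60, 120, 180, exactly as listed in the table.
PSI_CENTERS = (-180, -120, -60, 0, 60, 120, 180)
CODES_BY_PHI = {
    -180: "ABCDEFA",
    -120: "GHIJKLG",
    -60: "MNOPQRM",
    0: "STUVWXS",
    60: "mrqponm",
    120: "glkjihg",
    180: "afedcba",
}

def nearest_center(angle):
    """Round an angle in [-180, 180] to the nearest multiple of 60.

    Boundary values are rounded upward (assumption).
    """
    return 60 * math.floor(angle / 60 + 0.5)

def rational_code(phi, psi):
    """Look up the code for a phi,psi pair given in degrees."""
    row = CODES_BY_PHI[nearest_center(phi)]
    return row[PSI_CENTERS.index(nearest_center(psi))]
```

With this lookup a typical alpha-helical pair such as phi = -65, psi = -40 falls in bin O, and each lower case code is the mirror image (phi and psi both negated) of its upper case counterpart.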
Specifying the code for a query sequence is analogous to specifying the sequence itself. For example,

code "HHHH"

constrains all four residues to have the code H.

code "H0HH"

constrains the first, third and fourth residues to be in the H conformation while the second residue may adopt any conformation.

code "H[HE]HH"

constrains the first, third and fourth residues to be in the H conformation while the second residue may adopt either the H or E conformation, and so on.


Restrain

Syntax: restrain [option] [target] [tolerance]

The DANSSR restrain command is used to specify various distance, angle and torsion restraints in the search. In addition, the following pre-defined torsional restraints are available:

phi,psi,omega

These latter restraints are specified as follows:

restrain option residue-number target-value tolerance

where option is one of phi, psi or omega,

target-value is the desired value (e.g. 180.0)

and tolerance is the permissible variation from the target value (e.g. 30.0).

Thus, the command

restrain phi 2 -60 30

constrains the phi value of the 2nd residue in the input sequence to be -60 +/- 30 degrees.

The command

restrain phi all -60 30

constrains the phi value of all residues in the input sequence to be -60 +/- 30 degrees.
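
A target/tolerance test of this kind can be sketched as follows. The function names are hypothetical, and the wrap-around at +/-180 degrees is an assumption: the manual does not state how the program treats a target such as 180 when the measured value is -170.

```python
def angle_difference(a, b):
    """Smallest signed difference a - b in degrees, in the range [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0

def satisfies_torsion(value, target, tolerance):
    """True if value lies within target +/- tolerance, treating angles as periodic."""
    return abs(angle_difference(value, target)) <= tolerance
```

Under this reading, a phi of -75 satisfies "restrain phi 2 -60 30", while a phi of -95 does not.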

In addition, atom-based restraints may also be specified. Specifically, the distance between any two atoms, the angle between any three atoms, and the torsion between any four atoms may be defined.

Thus the command

restrain distance 1 O 5 N 2.0 3.5

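constrains the distance between atom O of residue 1 and atom N of residue 5 to be between 2.0 and 3.5 Angstroms. Note that a distance restraint takes a lower and an upper bound rather than a target value and a tolerance. The atom names stored in the database strictly follow the Brookhaven convention.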

The command

restrain angle 1 O 5 N 5 CA 120 30

constrains the angle between atom O of residue 1, atom N of residue 5 and atom CA of residue 5 to be 120 +/- 30 degrees.

Similarly, the command

restrain torsion 1 O 5 N 5 CA 4 C 180 30

constrains the dihedral angle between atom O of residue 1, atom N of residue 5, atom CA of residue 5 and atom C of residue 4 to be 180 +/- 30 degrees.

A query may combine any number of these restraints, up to a current limit of 20 each for torsions, distances and angles. For example, specifying all three atom-based restraints listed above will recover sequences containing a hydrogen bond between residues 1 and 5.


Measure

Syntax: measure [option]

The DANSSR measure command is used to measure distances, angles and torsions between atoms. Measurements of phi, psi, omega, chi1, chi2, chi3, chi4 and chi5 should not be specified with this command. These are accessible through the print command.

Thus, the command

measure distance 1 O 5 N

measures the distance between atom O of residue 1 and atom N of residue 5.

The command

measure angle 1 O 5 N 5 CA

measures the angle between atom O of residue 1, atom N of residue 5 and atom CA of residue 5.

Similarly, the command

measure torsion 1 O 5 N 5 CA 4 C

measures the dihedral angle between atom O of residue 1, atom N of residue 5, atom CA of residue 5 and atom C of residue 4.

The number of distances, angles and torsions that can be measured is currently limited to 20 of each. All measurements are written to the output file.
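
For readers who want to check such measurements independently, the three quantities can be computed from atomic coordinates as in the Python sketch below. This is a standard textbook formulation and not necessarily the code used inside DANSSR; in particular it assumes that the angle is taken at the middle atom and that the torsion follows the usual IUPAC sign convention.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def distance(a, b):
    """Distance between two points, in the units of the coordinates."""
    d = _sub(a, b)
    return math.sqrt(_dot(d, d))

def angle(a, b, c):
    """Angle a-b-c in degrees, with b as the vertex (assumption)."""
    u, v = _sub(a, b), _sub(c, b)
    cosine = _dot(u, v) / (math.sqrt(_dot(u, u)) * math.sqrt(_dot(v, v)))
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))

def torsion(a, b, c, d):
    """Dihedral angle a-b-c-d in degrees, in the range -180 to 180."""
    b1, b2, b3 = _sub(b, a), _sub(c, b), _sub(d, c)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

Each function takes (x, y, z) tuples, such as the atom coordinates written by the coord print option.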


Print

Syntax: print [options]

The DANSSR print command is used to control the level of printing to the output file. By default only the matched sequence is printed. When the measure command is part of the query, all measurements are printed as well. Additional information, if required, can be requested through the print command. The following options are available:

phi - prints the phi torsion value of each residue

psi - prints the psi torsion value of each residue

omega - prints the omega torsion value of each residue

chi1 - prints the chi1 torsion value of each residue

chi2 - prints the chi2 torsion value of each residue

chi3 - prints the chi3 torsion value of each residue

chi4 - prints the chi4 torsion value of each residue

chi5 - prints the chi5 torsion value of each residue

code - prints the conformational code of each residue

coord - prints the coordinates of all atoms in each residue to a file in pdb format

resarea - prints the solvent accessible surface areas and the normalized areas of each residue

For example, the command

print phi psi omega code

writes out the phi, psi and omega values and the conformational code of each residue. Options may be given on the same line or on separate lines, in any combination.


Output

Syntax: output to filename

The DANSSR output option is used to specify the file to which results are to be written. The filename may not be longer than 128 characters. Any file in the current directory with the same name will be overwritten.

e.g. output to test1.dat


Coords

Syntax: coords to filename

The DANSSR coords option is used to specify the file to which coordinates are to be written. The filename may not be longer than 128 characters. Any file in the current directory with the same name will be overwritten.

e.g. coords to test1.pdb

This command is necessary only if printing of coordinates has been requested.


Unix

Syntax: unix command

The DANSSR unix option is used to pass a command to the system. Any valid unix command may be passed. Some examples are

unix more myfile

unix /bin/sh

Remember that user-defined aliases are not recognized. This can have disastrous consequences with commands such as rm, mv etc.


Source

Syntax: source filename

The DANSSR source command is used to read in commands from a file. Each valid command in the file is processed in turn. Each session also produces a log file called danssr.log, which can be edited, renamed and subsequently sourced.


Search

Syntax: search

The DANSSR search command performs the actual search. If a sequence has not been defined before issuing this command, the user will be prompted for a sequence. Similarly, if output filename(s) have not been specified, the user is prompted for them.


Quit

Syntax: quit

The DANSSR quit command ends the session.


Files & Things

    DANSSR uses two files - match.sets and danssr.db. The layout of the match.sets file is described under the Sequence command.

    The main data for the program, such as sequences and coordinates, come from a file called danssr.db. Currently files for two different datasets are supplied, one containing 43 chains and the other 274 chains. The files are database.43 and database.274 respectively. There are also files called database_zs.43 and database_rc.43, which are identical to database.43 except that the secondary structure column (field 3 of the residue description line) contains codes assigned using the Zimmermann & Scheraga nomenclature and the Rational Codes formalism respectively (see the Code command for explanations). Similarly suffixed files are available for the 274 chain database.

    On startup of the program, a search is made for the files in the following three locations, in order:

    i) the current working directory

    ii) the user's HOME directory

    iii) the directory pointed to by the variable $DANSSR_DIR

    If the search is successful the files are used. If not, the user is prompted for the file to use. Supply the filename with full path information.
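
    The lookup order described above can be mimicked with a few lines of Python, for example to check in advance which copy of a file the program would pick up. The function name find_data_file is hypothetical, and the sketch assumes that the first existing file in the order given wins.

```python
import os

def find_data_file(name, env_var="DANSSR_DIR"):
    """Return the first existing path for name, or None if it is not found.

    Search order: current working directory, the user's home directory,
    then the directory named by the environment variable (if it is set).
    """
    candidates = [os.getcwd(), os.path.expanduser("~")]
    extra = os.environ.get(env_var)
    if extra:
        candidates.append(extra)
    for directory in candidates:
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            return path
    return None
```

    A return value of None corresponds to the case in which DANSSR would prompt for the file.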

    A brief description of the database is as follows. For each protein the first line is a five letter code (the four letter pdb code and the chain name, if any).

    For each residue in each protein there is:

      i) a line containing its name, number, conformational code, single letter code, phi, psi, omega, chi1, chi2, chi3, chi4, chi5, accessible surface area, normalized fractional buried area, and number of atoms in the residue. A value of 999.999 in any of these columns indicates that the value is not available, either because it is not appropriate or because the data were missing (missing atoms, etc.)

      ii) one line for each atom in the residue containing its name, x, y & z coordinates, and accessible area.

    A value of XXX in the residue name column indicates a chain break.
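
    As an illustration of the residue description line, the Python sketch below splits one such line into named fields. The exact column layout of the database files is not given here, so the sketch assumes that the fields are separated by whitespace and appear in the order listed above; the function and field names are hypothetical.

```python
MISSING = 999.999  # marker for "value not available"

# Numeric fields that follow the four leading text fields, in the order listed.
NUMERIC_FIELDS = ("phi", "psi", "omega", "chi1", "chi2", "chi3", "chi4",
                  "chi5", "area", "buried_fraction")

def _value(token):
    """Convert a token to float, mapping the 999.999 marker to None."""
    number = float(token)
    return None if abs(number - MISSING) < 1e-6 else number

def parse_residue_line(line):
    """Split a residue description line into a dictionary of named fields."""
    tokens = line.split()
    record = {
        "name": tokens[0],
        "number": tokens[1],   # kept as text in case of insertion codes
        "code": tokens[2],     # conformational code (field 3)
        "letter": tokens[3],
    }
    for key, token in zip(NUMERIC_FIELDS, tokens[4:14]):
        record[key] = _value(token)
    record["n_atoms"] = int(tokens[14])
    record["chain_break"] = tokens[0] == "XXX"
    return record
```

    The n_atoms field tells a reader how many atom lines follow before the next residue line.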


Utilities

    The program dbmake can be used to create a database for use with DANSSR. To run this program use the command

    dbmake "lstfil" "dbname"

    where lstfil contains the list of files corresponding to each protein to be added to the database. List each file on a separate line, and give the full path.

    dbname is the name of the database file to create.

    Since these files are text files, different databases may be combined (e.g. using the cat command in unix) to produce a single larger database. The programs makerc and makezs read in a database produced by dbmake and replace the secondary structure column with rational codes and Zimmermann-Scheraga codes respectively.


Tutorial

Searching for a Sequence

    To search for the sequence Ser-X-X-Glu where X is any amino acid, and to print out the phi, psi and omega values and the conformational codes, the following commands are used. The output is saved to the file sxxe.seq.

    sequence "S00E"

    print phi psi omega code

    output to sxxe.seq

    search

Searching for a Sequence with a Specified Backbone Conformation

    There are two different ways of specifying the backbone conformation.

    Let us search for the sequence Ser-X-X-Glu where X is any amino acid and specify that the residues X,X,Glu should be in an alpha-helical conformation.

    Method I:

    Let us assume that the alpha-helical conformation is characterized by a phi value of -65 +/- 20 degrees and a psi value of -40 +/- 20 degrees.

    sequence "S00E"

    print phi psi omega code

    output to sxxe.seq

    restrain phi 2 -65 20

    restrain phi 3 -65 20

    restrain phi 4 -65 20

    restrain psi 2 -40 20

    restrain psi 3 -40 20

    restrain psi 4 -40 20

    search
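
    Because the restrain lines in Method I follow a regular pattern, a command file for longer sequences can be generated rather than typed. The Python sketch below is one way of doing so; the function names are hypothetical, and the resulting text is intended to be saved to a file and read with the source command.

```python
def helix_restraints(residues, phi=(-65, 20), psi=(-40, 20)):
    """Return restrain commands placing the given residues in a helical region."""
    lines = []
    for name, (target, tolerance) in (("phi", phi), ("psi", psi)):
        for residue in residues:
            lines.append(f"restrain {name} {residue} {target} {tolerance}")
    return lines

def build_script(sequence, residues, output):
    """Assemble the full command sequence used in Method I as one string."""
    lines = [
        f'sequence "{sequence}"',
        "print phi psi omega code",
        f"output to {output}",
    ]
    lines += helix_restraints(residues)
    lines.append("search")
    return "\n".join(lines) + "\n"
```

    Calling build_script("S00E", [2, 3, 4], "sxxe.seq") reproduces the ten commands listed above.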

    Method II:

    In this case we will specify the conformational codes for the residues. The codes will depend on the database being used. In this example it is assumed that the database being used is the one with the rational codes (O corresponds to the region -60 +/- 30, -60 +/- 30).

    sequence "S00E"

    print phi psi omega code

    output to sxxe.seq

    code "0OOO"

    search