Organization in the Protein Coil Library
The files stored on the coil library FTP site, or returned after a batch search, are organized hierarchically by PDB ID. This reduces filesystem access times and makes searches with the UNIX find utility easier. At the lowest level of the hierarchy, files are further sorted by fragment length. As a result, a given directory generally holds fewer than 50 files, which keeps access relatively fast on UNIX/Linux filesystems.
The hierarchy is based on the middle two characters of the PDB ID. For example, hen egg lysozyme, which has the PDB ID 1HEL, is located in the directory h/he/. The first level is named after the second character of the ID, and the second level after the second and third characters together. At the final level, fragments are stored in directories named after their fragment length. Again using lysozyme as an example, any seven-residue fragments, if they exist, reside in the directory h/he/7/. Seven-residue fragments from 2HEX and 1HE0 are stored in the same location.
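The mapping from a PDB ID and fragment length to a directory can be sketched in Python. This is a minimal sketch, assuming a local copy of the library rooted at a directory you supply and lowercase directory and file names as in the examples on this page; `fragment_directory` and `find_fragments` are hypothetical helper names, not part of the library distribution.

```python
from pathlib import Path


def fragment_directory(pdb_id: str, length: int) -> str:
    """Relative directory for `length`-residue fragments of `pdb_id`.

    The first level is the second character of the ID and the second level
    is the middle two characters, so ('1HEL', 7) -> 'h/he/7/'.
    """
    if len(pdb_id) != 4:
        raise ValueError(f"expected a four-character PDB ID, got {pdb_id!r}")
    middle = pdb_id.lower()[1:3]
    return f"{middle[0]}/{middle}/{length}/"


def find_fragments(root: str, pdb_id: str) -> list[Path]:
    """All fragment files for `pdb_id`, of any length, under a local copy.

    Assumes lowercase names (hypothetical local layout). Only the single
    h/he/-style branch for this ID is scanned.
    """
    pid = pdb_id.lower()
    branch = Path(root) / pid[1] / pid[1:3]
    if not branch.is_dir():
        return []
    # '*' matches the fragment-length subdirectories (7/, 8/, ...).
    return sorted(branch.glob(f"*/{pid}*.gz"))
```

Because every fragment of 1HEL lives under h/he/, the search touches only one small branch of the tree; the glob plays roughly the same role as running find h/he -name '1hel*' from the library root.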
Filenames are constructed in a straightforward way. The first four characters of the file name are the PDB ID, and the fifth character is the chain identifier (case sensitive; "_" is used when there is no chain identifier). The PDB ID and chain are followed by a period and then the fragment length. The fragment length is followed by another period and then the residue number of the first residue in the fragment. The last character before the next period is that residue's insertion code; if the insertion code is a space, an underscore is used instead, just as with the chain identifier. The file name ends with the extension ".tor.gz" or ".pdb.gz", which indicates whether the file is a gzipped torsion angle file or a gzipped PDB structure file. Continuing the example above, the structure of a seven-residue fragment from 1HEL starting at residue 102 would be stored as:
h/he/7/1hel_.7.102_.pdb.gz
while its corresponding torsion angle file would be named:
h/he/7/1hel_.7.102_.tor.gz
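The naming scheme described above can be captured by a small parser and builder. The sketch below is hypothetical: the `Fragment` type and both function names are illustrative, not the code on the utilities page. Beyond what this page states, it assumes that residue numbers may carry a leading minus sign and that the PDB ID is written in lowercase, as in the examples.

```python
import re
from typing import NamedTuple

# 4-char PDB ID, chain, '.', length, '.', start residue number, insertion
# code, '.', 'tor' or 'pdb', '.gz'. The optional '-' on the residue number is
# an assumption, allowed because PDB residue numbers can be negative.
_NAME_RE = re.compile(
    r"(?P<pdb>[0-9A-Za-z]{4})(?P<chain>[0-9A-Za-z_])\.(?P<length>\d+)\."
    r"(?P<resnum>-?\d+)(?P<icode>[A-Za-z_])\.(?P<kind>tor|pdb)\.gz"
)


class Fragment(NamedTuple):
    pdb_id: str        # e.g. '1hel'
    chain: str         # chain identifier, ' ' if none ('_' in the name)
    length: int        # number of residues in the fragment
    start_resnum: int  # residue number of the first residue
    icode: str         # insertion code of the first residue, ' ' if none
    kind: str          # 'tor' (torsion angles) or 'pdb' (structure)


def _from_name(c: str) -> str:
    return " " if c == "_" else c


def _to_name(c: str) -> str:
    return "_" if c == " " else c


def parse_fragment_filename(name: str) -> Fragment:
    """Parse a fragment file name; any leading directories are ignored."""
    base = name.rsplit("/", 1)[-1]
    m = _NAME_RE.fullmatch(base)
    if m is None:
        raise ValueError(f"not a coil library fragment file name: {name!r}")
    return Fragment(
        pdb_id=m["pdb"],
        chain=_from_name(m["chain"]),
        length=int(m["length"]),
        start_resnum=int(m["resnum"]),
        icode=_from_name(m["icode"]),
        kind=m["kind"],
    )


def build_fragment_filename(frag: Fragment) -> str:
    """Inverse of parse_fragment_filename (PDB ID written in lowercase)."""
    return (
        f"{frag.pdb_id.lower()}{_to_name(frag.chain)}.{frag.length}."
        f"{frag.start_resnum}{_to_name(frag.icode)}.{frag.kind}.gz"
    )
```

For example, parse_fragment_filename("h/he/7/1hel_.7.102_.tor.gz") returns a Fragment with pdb_id '1hel', chain ' ', length 7, start_resnum 102, icode ' ' and kind 'tor'. Mapping "_" to a space on parsing, and back on building, mirrors the convention that an underscore stands in for a blank chain identifier or insertion code.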
Only torsion angle files are included in the results of a batch PDB ID list search.
Windows and Macintosh users may use either WinZip or StuffIt to decompress the gzipped files. Example code for searching and parsing file names is also provided on the utilities page.
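On any platform with Python installed, the standard-library gzip module can also decompress these files. The sketch below is a minimal illustration; `read_fragment_lines` and `decompress_fragment_file` are hypothetical names, and decoding the contents as UTF-8 text is an assumption about the file format, which this page does not describe.

```python
import gzip
import shutil
from pathlib import Path


def read_fragment_lines(path: str) -> list[str]:
    """Return the decompressed contents of a .tor.gz or .pdb.gz file as lines.

    Assumes the payload is plain text (decoded as UTF-8).
    """
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return fh.read().splitlines()


def decompress_fragment_file(path: str) -> Path:
    """Write an uncompressed copy beside `path`, dropping the '.gz' suffix."""
    src = Path(path)
    if src.suffix != ".gz":
        raise ValueError(f"expected a .gz file, got {path!r}")
    dst = src.with_suffix("")  # 1hel_.7.102_.tor.gz -> 1hel_.7.102_.tor
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # stream in chunks, not all at once
    return dst
```

Reading in text mode suits quick inspection, while the byte-for-byte copy in decompress_fragment_file leaves the contents untouched for other tools.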