STRIDE: Protein secondary structure assignment from atomic coordinates

Dmitrij Frishman & Patrick Argos

European Molecular Biology Laboratory
Postfach 102209, Meyerhofstr. 1
69012 Heidelberg
Germany

FRISHMAN@EMBL-HEIDELBERG.DE
ARGOS@EMBL-HEIDELBERG.DE

CONTENTS

1. About the method
2. Copyright notice
3. Availability
4. Installation
5. Using STRIDE
6. Output format
7. Bug reports and user feedback
8. References

----------------------------------------------------------------------

1. About the method

STRIDE [1] is a program to recognize secondary structural elements in
proteins from their atomic coordinates. It performs the same task as
DSSP by Kabsch and Sander [2] but utilizes both hydrogen bond energy
and mainchain dihedral angles rather than hydrogen bonds alone. It
relies on database-derived recognition parameters with the
crystallographers' secondary structure definitions as a
standard-of-truth. Please see Frishman and Argos [1] for a detailed
description of the algorithm.

2. Copyright notice

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

For calculation of the residue solvent accessible area the program NSC
[3,4] is used; it was kindly provided by Dr. F. Eisenhaber
(EISENHABER@EMBL-HEIDELBERG.DE). Please direct all questions
specifically concerning accessibility calculations to him.

3. Availability

Executables of STRIDE for several UNIX platforms, VAX/VMS, OpenVMS,
DOS and Mac, together with documentation and source code, are
available by anonymous FTP from ftp.ebi.ac.uk (directories
/pub/software/unix/stride, /pub/software/dos/stride,
/pub/software/vms/stride, /pub/software/mac/stride). We are willing to
compile the program for other architectures if an interested user
grants us temporary access to them.

Data files with STRIDE secondary structure assignments for the current
release of the PDB [5] databank are in the directory
/pub/databases/stride of the same site.

Atomic coordinate sets can be submitted for secondary structure
assignment through electronic mail to stride@embl-heidelberg.de. A
mail message containing HELP in the first line will be answered with
appropriate instructions.

See also the WWW page
http://www.embl-heidelberg.de/stride/stride_info.html.

4. Installation

For UNIX, DOS and Mac no installation is needed. Just download the
executable corresponding to your platform, and you are all set. For
VAX and OpenVMS you need only to link the executable with a logical
name; for example:

    yourlogicalname:= $ $yourdiskname:[your.directory.name]stride.exe

and then use yourlogicalname as the program name.

5. Using STRIDE

The only required parameter for STRIDE is the name of the file
containing a set of atomic coordinates in PDB [5] format. By default
STRIDE writes to standard output, i.e. your screen.
On systems that allow output redirection, you can redirect the output
to create a disk file. Help is available if you just type STRIDE
without parameters.

The following options are accepted:

  -fFilename    Write output to the file "Filename" rather than to
                stdout.

  -h            Report hydrogen bonds. By default no hydrogen bond
                information is included in the output.

  -o            Report secondary structure summary only.

  -rId1Id2..    Read only chains Id1, Id2 etc. of the PDB file *).
                All other chains will be ignored. By default all
                valid protein chains are read.

  -cId1Id2..    Process only chains Id1, Id2 etc. *). Secondary
                structure assignment will be produced only for these
                chains, but other chains that are present will be
                taken into account while calculating residue
                accessible surface and detecting inter-chain hydrogen
                bonds and, possibly, inter-chain beta-sheets. By
                default all protein chains read are processed.

  -mFilename    Generate a Molscript [6] file. Using the program
                Molscript by Per Kraulis you can create a PostScript
                picture of your structure. You can manually edit the
                Molscript file produced by STRIDE to achieve the
                desired orientation and to include additional
                details.

  -q[Filename]  Generate a sequence file in FASTA [7] format and
                exit. Filename is optional. If no file name is
                specified, standard output is used.

All options are case- and position-insensitive.

Examples:

1. Calculate secondary structure assignment for 1ACP including
   hydrogen bond information:

       stride 1acp.brk -h

2. Calculate secondary structure assignment for 4RUB and write the
   output to the file 4rub.str:

       stride 4rub.brk -f4rub.str

3. Calculate secondary structure assignment for chain B of 4RUB.
   Ignore all other chains. Generate a Molscript file 4rub.mol:

       stride 4rub.brk -rb -m4rub.mol

4. Calculate secondary structure assignment for chain C of 2GLS in
   the presence of chains A and B. Report secondary structure summary
   only:

       stride 2gls.brk -rabc -cc -o

6. Output format

STRIDE produces output that is easily readable both visually and with
computer programs. The side effect of this convenience is the larger
file size of individual STRIDE entries. Every record is 79 symbols
long and has the following general format:

  Position   Description

  1-3        Record code
  4-5        Not used
  6-73       Data
  74-75      Not used
  76-79      Four letter PDB code (if available)

Below follows the description of each record type.

  Code  Description and format of data

  REM   Remarks and blank lines

        Format: free

  HDR   Header. Protein name, date of file creation and PDB code

        Format: free

  CMP   Compound. Full name of the molecule and identifying
        information

        Format: free

  SRC   Species, organ, tissue, and mutant from which the molecule
        has been obtained

        Format: free

  AUT   Names of the structure authors

        Format: free

  CHN   File name and PDB chain identifier *)

        Format: File name beginning from position 6 followed by one
                space and one-letter chain identifier

  SEQ   Amino acid sequence

        Format:  6-9    First residue PDB number
                 11-60  Sequence
                 62-65  Last residue PDB number

  STR   Secondary structure summary

        Format:  11-60  Secondary structure assignment **)

  LOC   Location of secondary structure elements

        Format:  6-17   Element name
                 19-21  First residue name
                 23-26  First residue PDB number
                 28-28  First residue chain identifier
                 36-38  Last residue name
                 42-45  Last residue PDB number
                 47-47  Last residue chain identifier

  ASG   Detailed secondary structure assignment

        Format:  6-8    Residue name
                 10-10  Protein chain identifier
                 12-15  PDB residue number
                 17-20  Ordinal residue number
                 25-25  One letter secondary structure code **)
                 27-39  Full secondary structure name
                 43-49  Phi angle
                 53-59  Psi angle
                 65-69  Residue solvent accessible area

  DNR   Donor residue

        Format:  6-8    Donor residue name
                 10-10  Protein chain identifier
                 12-15  PDB residue number
                 17-20  Ordinal residue number
                 26-28  Acceptor residue name
                 30-30  Protein chain identifier
                 32-35  PDB residue number
                 37-40  Ordinal residue number
                 42-45  N..O distance
                 47-52  N..O=C angle
                 54-59  O..N-C angle
                 61-66  Angle between the planes of donor complex
                        and O..N-C
                 68-73  Angle between the planes of acceptor
                        complex and N..O=C

  ACC   Acceptor residue

        Format:  6-8    Acceptor residue name
                 10-10  Protein chain identifier
                 12-15  PDB residue number
                 17-20  Ordinal residue number
                 26-28  Donor residue name
                 30-30  Protein chain identifier
                 32-35  PDB residue number
                 37-40  Ordinal residue number
                 42-45  N..O distance
                 47-52  N..O=C angle
                 54-59  O..N-C angle
                 61-66  Angle between the planes of donor complex
                        and O..N-C
                 68-73  Angle between the planes of acceptor
                        complex and N..O=C

HDR, CMP, SRC and AUT records are directly copied from the PDB file,
if supplied by the authors.

If only the secondary structure summary is requested, only CHN, SEQ,
STR and LOC records will be output.

Hydrogen bond information (records DNR and ACC) is deliberately
redundant to facilitate human reading and is not reported by default.

*)  IMPORTANT NOTE: if the protein chain identifier is ' ' (space),
    it will be substituted by '-' (dash) everywhere in the STRIDE
    output. The same is true for command line parameters involving
    chain identifiers, where you have to specify '-' instead of ' '.

**) The one-letter secondary structure code is nearly the same as
    used in DSSP [2] (see Frishman and Argos [1] for details):

        H        Alpha helix
        G        3-10 helix
        I        PI-helix
        E        Extended conformation
        B or b   Isolated bridge
        T        Turn
        C        Coil (none of the above)

For every record (data line) except those with codes REM and STR, the
number of fields is consistent, so the output is readily processed
with external tools such as awk or perl.

7. Bug reports and user feedback

Please send your suggestions, questions and bug reports to
FRISHMAN@EMBL-HEIDELBERG.DE. Send your contact address to get
information on updates and new features.

8. References

1. Frishman, D. & Argos, P. (1995) Knowledge-based secondary
   structure assignment. Proteins: structure, function and genetics
   23, 566-579.

2. Kabsch, W. & Sander, C.
   (1983) Dictionary of protein secondary structure: pattern
   recognition of hydrogen-bonded and geometrical features.
   Biopolymers 22, 2577-2637.

3. Eisenhaber, F. & Argos, P. (1993) Improved strategy in analytic
   surface calculation for molecular systems: handling of
   singularities and computational efficiency. J. Comput. Chem. 14,
   1272-1280.

4. Eisenhaber, F., Lijnzaad, P., Argos, P., Sander, C. & Scharf, M.
   (1995) The double cubic lattice method: efficient approaches to
   numerical integration of surface area and volume and to dot
   surface contouring of molecular assemblies. J. Comput. Chem. 16,
   273-284.

5. Bernstein, F.C., Koetzle, T.F., Williams, G.J., Meyer, E.F.,
   Brice, M.D., Rodgers, J.R., Kennard, O., Shimanouchi, T. &
   Tasumi, M. (1977) The protein data bank: a computer-based archival
   file for macromolecular structures. J. Mol. Biol. 112, 535-542.

6. Kraulis, P.J. (1991) MOLSCRIPT: a program to produce both detailed
   and schematic plots of protein structures. J. Appl. Cryst. 24,
   946-950.

7. Pearson, W.R. (1990) Rapid and sensitive sequence comparison with
   FASTP and FASTA. Methods Enzymol. 183, 63-98.
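
As an illustration of the fixed-column layout described in section 6,
the following Python sketch reads ASG records by slicing each line at
the documented positions. It is not part of STRIDE: the names
ASG_COLUMNS, parse_asg and count_ss_codes are hypothetical, and the
sketch assumes that the output follows the listed column ranges
exactly. Splitting each line on whitespace, as one would with awk, may
serve equally well.

```python
# 1-based, inclusive column ranges copied from the ASG entry of section 6.
ASG_COLUMNS = {
    "residue_name": (6, 8),
    "chain_id": (10, 10),
    "pdb_residue_number": (12, 15),
    "ordinal_number": (17, 20),
    "ss_code": (25, 25),
    "ss_name": (27, 39),
    "phi": (43, 49),
    "psi": (53, 59),
    "accessible_area": (65, 69),
}


def parse_asg(line):
    """Parse one ASG record into a dict; return None for other records.

    Hypothetical helper for illustration, not part of STRIDE.
    """
    if not line.startswith("ASG"):
        return None
    # Convert each documented 1-based range into a Python slice.
    record = {
        name: line[start - 1:end].strip()
        for name, (start, end) in ASG_COLUMNS.items()
    }
    # The PDB residue number is kept as text; the ordinal number and the
    # three measurements are converted to numbers.
    record["ordinal_number"] = int(record["ordinal_number"])
    for key in ("phi", "psi", "accessible_area"):
        record[key] = float(record[key])
    return record


def count_ss_codes(lines):
    """Count residues per one-letter secondary structure code."""
    counts = {}
    for line in lines:
        record = parse_asg(line)
        if record is not None:
            code = record["ss_code"]
            counts[code] = counts.get(code, 0) + 1
    return counts
```

For a file written with the -f option, passing the open file to
count_ss_codes would tally the one-letter codes over all residues. In
this sketch an ASG line shorter than the documented width raises
ValueError rather than being skipped.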