STRIDE: Protein secondary structure assignment
from atomic coordinates
Dmitrij Frishman & Patrick Argos
European Molecular Biology Laboratory
Postfach 102209, Meyerhofstr. 1
69012 Heidelberg
Germany
FRISHMAN@EMBL-HEIDELBERG.DE
ARGOS@EMBL-HEIDELBERG.DE
CONTENTS
1. About the method
2. Copyright notice
3. Availability
4. Installation
5. Using STRIDE
6. Output format
7. Bug reports and user feedback
8. References
----------------------------------------------------------------------
1. About the method
STRIDE [1] is a program to recognize secondary structural elements in
proteins from their atomic coordinates. It performs the same task as
DSSP by Kabsch and Sander [2] but utilizes both hydrogen bond energy
and mainchain dihedral angles rather than hydrogen bonds alone. It
relies on database-derived recognition parameters, with the
crystallographers' secondary structure definitions serving as the
standard-of-truth. Please see Frishman and Argos [1] for a detailed
description of the algorithm.
2. Copyright notice
Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
"Software"), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:
The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
The residue solvent accessible area is calculated with the program NSC
[3,4], which was kindly provided by Dr. F. Eisenhaber
(EISENHABER@EMBL-HEIDELBERG.DE). Please direct all questions that
specifically concern accessibility calculations to him.
3. Availability
Executables of STRIDE for several UNIX platforms, VAX/VMS, OpenVMS,
DOS and Mac, together with documentation and source code, are available
by anonymous FTP from ftp.ebi.ac.uk (directories
/pub/software/unix/stride, /pub/software/dos/stride,
/pub/software/vms/stride, /pub/software/mac/stride). We are willing to
compile the program for other architectures if an interested user
grants us temporary access to them.
Data files with STRIDE secondary structure assignments for the current
release of the PDB [5] databank are in the directory
/pub/databases/stride of the same site. Atomic coordinate sets can be
submitted for secondary structure assignment through electronic mail
to stride@embl-heidelberg.de. A mail message containing HELP in the
first line will be answered with appropriate instructions. See also the
WWW page http://www.embl-heidelberg.de/stride/stride_info.html.
4. Installation
For UNIX, DOS and Mac no installation is needed. Just download the
executable corresponding to your platform, and you are all set. For
VAX and OpenVMS you only need to link the executable with a logical
name, for example:
yourlogicalname:= $ $yourdiskname:[your.directory.name]stride.exe
and then use yourlogicalname as the program name.
5. Using STRIDE
The only required parameter for STRIDE is the name of the file
containing a set of atomic coordinates in PDB [5] format. By default
STRIDE writes to standard output, i.e. your screen. On systems that
support output redirection, you can redirect it to create a disk file.
Help is available if you just type STRIDE without parameters. The
following options are accepted:
  -fFilename    Write output to the file "Filename" rather than to
                stdout.
  -h            Report hydrogen bonds. By default no hydrogen bond
                information is included in the output.
  -o            Report secondary structure summary only.
  -rId1Id2..    Read only chains Id1, Id2 etc. of the PDB file *). All
                other chains will be ignored. By default all valid
                protein chains are read.
  -cId1Id2..    Process only chains Id1, Id2 etc. *). Secondary
                structure assignment will be produced only for these
                chains, but other chains that are present will be taken
                into account while calculating residue accessible
                surface and detecting inter-chain hydrogen bonds and,
                possibly, inter-chain beta-sheets. By default all
                protein chains read are processed.
  -mFilename    Generate a Molscript [6] file. Using the program
                Molscript by Per Kraulis you can create a PostScript
                picture of your structure. You can manually edit the
                Molscript file produced by STRIDE to achieve the
                desired orientation and to include additional details.
  -q[Filename]  Generate a sequence file in FASTA [7] format and exit.
                Filename is optional. If no file name is specified,
                standard output is used.
All options are case- and position-insensitive.
Examples:
1. Calculate secondary structure assignment for 1ACP including
hydrogen bond information:
stride 1acp.brk -h
2. Calculate secondary structure assignment for 4RUB and write the
output to the file 4rub.str:
stride 4rub.brk -f4rub.str
3. Calculate secondary structure assignment for chain B of 4RUB.
Ignore all other chains. Generate a Molscript file 4rub.mol.
stride 4rub.brk -rb -m4rub.mol
4. Calculate secondary structure assignment for chain C of 2GLS in
the presence of chains A and B. Report secondary structure
summary only.
stride 2gls.brk -rabc -cc -o
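
When STRIDE is called from a script, command lines like those above can be assembled programmatically. The following Python sketch is only an illustration: the names `stride_command` and `run_stride` and their parameters are hypothetical, and an executable called `stride` is assumed to be on the search path. It uses only the options listed above and writes a blank chain identifier as '-', as required by the note on chain identifiers in the section on the output format.

```python
import subprocess


def stride_command(pdb_file, read_chains="", process_chains="",
                   hbonds=False, summary_only=False, out_file=None):
    """Build an argument list for STRIDE from the documented options."""
    def chain_ids(ids):
        # A blank chain identifier is written as '-' on the command line.
        return ids.replace(" ", "-")

    args = ["stride", pdb_file]
    if read_chains:
        args.append("-r" + chain_ids(read_chains))
    if process_chains:
        args.append("-c" + chain_ids(process_chains))
    if hbonds:
        args.append("-h")
    if summary_only:
        args.append("-o")
    if out_file:
        args.append("-f" + out_file)
    return args


def run_stride(args):
    """Run STRIDE and return whatever it wrote to standard output.

    The exit status is deliberately not checked here, because this
    document does not say which status STRIDE returns on success.
    """
    return subprocess.run(args, capture_output=True, text=True).stdout


# Example 4 above: chain C of 2GLS in the presence of chains A and B.
print(stride_command("2gls.brk", read_chains="abc", process_chains="c",
                     summary_only=True))
```

Passing the arguments as a list avoids shell quoting problems with file names. Since the options are position-insensitive, the order in which they are appended does not matter.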
6. Output format
STRIDE produces output that is easily readable both visually and with
computer programs. The side effect of this convenience is a larger file
size for individual STRIDE entries. Every record is 79 symbols long and
has the following general format:
Position Description
1-3 Record code
4-5 Not used
6-73 Data
74-75 Not used
76-79 Four letter PDB code (if available)
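
The general layout can be read with plain string slicing. The Python sketch below is a minimal illustration with a hypothetical function name, `split_record`. It takes the record code from positions 1-3 and the data from positions 6-73, and treats whatever non-blank text follows position 73 as the PDB code. The sample line is synthetic, built only to exercise the code, and is not actual STRIDE output.

```python
def split_record(line):
    """Split one STRIDE output line into (record code, data, PDB code)."""
    line = line.rstrip("\n").ljust(79)   # pad short lines to the full record width
    code = line[0:3]                     # positions 1-3
    data = line[5:73]                    # positions 6-73, kept unstripped
    pdb_code = line[73:79].strip()       # trailing positions; empty if unavailable
    return code, data, pdb_code


# Synthetic record built for illustration (not actual STRIDE output).
sample = "REM".ljust(5) + "Example remark".ljust(68) + "  " + "1ACP"
code, data, pdb_code = split_record(sample)
print(code, repr(data.strip()), pdb_code)
```

The data field is returned unstripped so that its width is preserved. Note that the column positions given for the individual record types are counted from the start of the line, not from the start of the data field.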
Below follows the description of each record type.
Code Description and format of data
REM Remarks and blank lines
Format: free
HDR Header. Protein name, date of file creation and PDB code
Format: free
CMP Compound. Full name of the molecule and identifying
information
Format: free
SRC Species, organ, tissue, and mutant from which the molecule
has been obtained
Format: free
AUT Names of the structure authors
Format: free
CHN File name and PDB chain identifier*).
Format: File name beginning from position 6 followed
by one space and one-letter chain identifier
SEQ Amino acid sequence
Format: 6-9 First residue PDB number
11-60 Sequence
62-65 Last residue PDB number
STR Secondary structure summary
Format: 11-60 Secondary structure assignment **)
LOC Location of secondary structure elements
Format: 6-17 Element name
19-21 First residue name
23-26 First residue PDB number
28-28 First residue chain identifier
36-38 Last residue name
42-45 Last residue PDB number
47-47 Last residue chain identifier
ASG Detailed secondary structure assignment
Format: 6-8 Residue name
10-10 Protein chain identifier
12-15 PDB residue number
17-20 Ordinal residue number
25-25 One letter secondary structure code **)
27-39 Full secondary structure name
43-49 Phi angle
53-59 Psi angle
65-69 Residue solvent accessible area
DNR Donor residue
Format: 6-8 Donor residue name
10-10 Protein chain identifier
12-15 PDB residue number
17-20 Ordinal residue number
26-28 Acceptor residue name
30-30 Protein chain identifier
32-35 PDB residue number
37-40 Ordinal residue number
42-45 N..O distance
47-52 N..O=C angle
54-59 O..N-C angle
61-66 Angle between the planes of donor
complex and O..N-C
68-73 Angle between the planes of acceptor
complex and N..O=C
ACC Acceptor residue
Format: 6-8 Acceptor residue name
10-10 Protein chain identifier
12-15 PDB residue number
17-20 Ordinal residue number
26-28 Donor residue name
30-30 Protein chain identifier
32-35 PDB residue number
37-40 Ordinal residue number
42-45 N..O distance
47-52 N..O=C angle
54-59 O..N-C angle
61-66 Angle between the planes of donor
complex and O..N-C
68-73 Angle between the planes of acceptor
complex and N..O=C
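
As an illustration of how the column ranges translate into code, the following Python sketch parses an ASG record by position. The names `Assignment`, `parse_asg` and `make_line` are hypothetical, and the sample record is synthetic: it is assembled by `make_line` from the ASG column ranges listed above and is not copied from real STRIDE output.

```python
from typing import NamedTuple


class Assignment(NamedTuple):
    residue: str
    chain: str
    pdb_number: str   # kept as text; PDB numbering is not always a plain integer
    ordinal: int
    code: str
    name: str
    phi: float
    psi: float
    area: float


def parse_asg(line):
    """Parse one ASG record using the 1-based column ranges of the ASG format."""
    def col(first, last):
        return line[first - 1:last].strip()

    if col(1, 3) != "ASG":
        raise ValueError("not an ASG record")
    return Assignment(
        residue=col(6, 8),
        chain=col(10, 10),
        pdb_number=col(12, 15),
        ordinal=int(col(17, 20)),
        code=col(25, 25),
        name=col(27, 39),
        phi=float(col(43, 49)),
        psi=float(col(53, 59)),
        area=float(col(65, 69)),
    )


def make_line(code, fields):
    """Demo helper: build a synthetic 79-column record from {(first, last): text}."""
    chars = list(code.ljust(79))
    for (first, last), text in fields.items():
        chars[first - 1:last] = text.rjust(last - first + 1)
    return "".join(chars)


sample = make_line("ASG", {
    (6, 8): "THR", (10, 10): "A", (12, 15): "1", (17, 20): "1",
    (25, 25): "C", (27, 39): "Coil", (43, 49): "360.00",
    (53, 59): "-67.84", (65, 69): "133.8",
})
print(parse_asg(sample))
```

The same `col` helper could be reused for LOC, DNR and ACC records by substituting their column ranges.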
HDR, CMP, SRC and AUT records are copied directly from the PDB file,
if supplied by the authors. If only the secondary structure summary is
requested, only CHN, SEQ, STR and LOC records will be output.
Hydrogen bond information (records DNR and ACC) is deliberately
redundant to facilitate human reading and is not reported by
default.
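
A program reading the hydrogen bond records may want to remove this redundancy. The Python sketch below assumes that the redundancy means the same bond can appear both in a DNR record (written for the donor residue) and in an ACC record (written for the acceptor residue); this document does not spell that out, so treat it as an assumption. The function names are hypothetical and the two sample records are synthetic, built from the DNR and ACC column ranges.

```python
def parse_hbond(line):
    """Return (donor, acceptor, N..O distance) for a DNR or ACC record.

    Donor and acceptor are (chain identifier, PDB residue number) pairs.
    """
    def col(first, last):
        return line[first - 1:last].strip()

    code = col(1, 3)
    if code not in ("DNR", "ACC"):
        raise ValueError("not a hydrogen bond record")
    own = (col(10, 10), col(12, 15))       # residue the record is written for
    partner = (col(30, 30), col(32, 35))   # its hydrogen bond partner
    donor, acceptor = (own, partner) if code == "DNR" else (partner, own)
    return donor, acceptor, float(col(42, 45))


def unique_hbonds(lines):
    """Collapse DNR and ACC records describing the same bond into one entry."""
    bonds = {}
    for line in lines:
        if line[:3] in ("DNR", "ACC"):
            donor, acceptor, distance = parse_hbond(line)
            bonds[(donor, acceptor)] = distance
    return bonds


def make_line(code, fields):
    """Demo helper: build a synthetic 79-column record from {(first, last): text}."""
    chars = list(code.ljust(79))
    for (first, last), text in fields.items():
        chars[first - 1:last] = text.rjust(last - first + 1)
    return "".join(chars)


# The same synthetic bond, seen once from the donor and once from the acceptor.
lines = [
    make_line("DNR", {(6, 8): "LEU", (10, 10): "A", (12, 15): "5", (17, 20): "5",
                      (26, 28): "THR", (30, 30): "A", (32, 35): "1", (37, 40): "1",
                      (42, 45): "2.9"}),
    make_line("ACC", {(6, 8): "THR", (10, 10): "A", (12, 15): "1", (17, 20): "1",
                      (26, 28): "LEU", (30, 30): "A", (32, 35): "5", (37, 40): "5",
                      (42, 45): "2.9"}),
]
print(unique_hbonds(lines))
```

Keying the dictionary on the (donor, acceptor) pair means a bond reported from both sides is stored only once.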
*) IMPORTANT NOTE: if the protein chain identifier is ' ' (space), it
will be substituted by '-' (dash) everywhere in the STRIDE output.
The same is true for command line parameters involving chain
identifiers where you have to specify '-' instead of ' '.
**) The one-letter secondary structure codes are nearly the same as
those used in DSSP [2] (see Frishman and Argos [1] for details):
H Alpha helix
G 3-10 helix
I PI-helix
E Extended conformation
B or b Isolated bridge
T Turn
C Coil (none of the above)
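
The code table above maps directly onto a lookup dictionary. The Python sketch below (hypothetical names `SS_NAMES` and `composition`) uses it to compute the fraction of residues in each state from a sequence of one-letter codes, for example those collected from position 25 of the ASG records. The input string in the example is made up for illustration.

```python
from collections import Counter

# One-letter codes and their meanings, as listed in the table above.
SS_NAMES = {
    "H": "Alpha helix",
    "G": "3-10 helix",
    "I": "PI-helix",
    "E": "Extended conformation",
    "B": "Isolated bridge",
    "b": "Isolated bridge",
    "T": "Turn",
    "C": "Coil",
}


def composition(codes):
    """Return the fraction of residues in each secondary structure state.

    An unknown code raises KeyError rather than being silently ignored.
    """
    counts = Counter(SS_NAMES[code] for code in codes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: n / total for name, n in counts.items()}


# Made-up assignment string, for illustration only.
print(composition("HHHHTTCC"))
```

Both `B` and `b` are counted under the same name, since the table lists them as one state.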
For every record (data line) except those with codes REM and STR, the
number of fields is constant, so the output is readily suitable for
processing with external tools such as awk or perl.
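
The same field-oriented approach works in any scripting language. As a minimal sketch, the Python generator below (hypothetical name `asg_fields`) splits ASG records on whitespace, much as awk would. It assumes that no field is blank and that neighbouring fields never run together; fixed column positions are the safer choice where that cannot be guaranteed. The sample lines are synthetic.

```python
def asg_fields(lines):
    """Yield the whitespace-separated fields of each ASG record."""
    for line in lines:
        fields = line.split()
        if fields and fields[0] == "ASG":
            yield fields


# Synthetic lines for illustration (not actual STRIDE output).
sample = [
    "REM  synthetic example",
    "ASG  THR A    1    1    C          Coil    360.00    -67.84     133.8      1ABC",
]
for fields in asg_fields(sample):
    # Field order follows the ASG format: record code, residue name, chain,
    # PDB number, ordinal number, one-letter code, full name, phi, psi, area.
    residue, chain, code, area = fields[1], fields[2], fields[5], float(fields[9])
    print(residue, chain, code, area)
```

Fields are indexed from the front because the trailing PDB code is present only if available. The substitution of '-' for a blank chain identifier keeps the chain field from disappearing when the line is split.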
7. Bug reports and user feedback
Please send your suggestions, questions and bug reports to
FRISHMAN@EMBL-HEIDELBERG.DE. Send your contact address to get
information on updates and new features.
8. References
1. Frishman, D. and Argos, P. (1995) Knowledge-based secondary
structure assignment. Proteins: Structure, Function, and Genetics 23,
566-579.
2. Kabsch, W. and Sander, C. (1983) Dictionary of protein secondary
structure: pattern recognition of hydrogen-bonded and
geometrical features. Biopolymers 22, 2577-2637.
3. Eisenhaber, F. and Argos, P. (1993) Improved strategy in
analytic surface calculation for molecular systems: handling of
singularities and computational efficiency. J. Comput. Chem. 14,
1272-1280.
4. Eisenhaber, F., Lijnzaad, P., Argos, P., Sander, C., and Scharf,
M. (1995) The double cubic lattice method: efficient approaches
to numerical integration of surface area and volume and to dot
surface contouring of molecular assemblies. J. Comput. Chem. 16,
273-284.
5. Bernstein, F.C., Koetzle, T.F., Williams, G.J., Meyer, E.F.,
Brice, M.D., Rodgers, J.R., Kennard, O., Shimanouchi, T., and
Tasumi, M. (1977) The protein data bank: a computer-based
archival file for macromolecular structures. J. Mol. Biol. 112,
535-542.
6. Kraulis, P.J. (1991) MOLSCRIPT: a program to produce both
detailed and schematic plots of protein structures. J. Appl.
Cryst. 24, 946-950.
7. Pearson, W.R. (1990) Rapid and sensitive sequence comparison
with FASTP and FASTA. Methods Enzymol. 183, 63-98.