


		  STRIDE: Protein secondary structure assignment
			      from atomic coordinates

			 Dmitrij Frishman & Patrick Argos

		       European	Molecular Biology Laboratory
			  Postfach 102209, Meyerhofstr.	1
				  69012	Heidelberg
				      Germany

			    FRISHMAN@EMBL-HEIDELBERG.DE
			      ARGOS@EMBL-HEIDELBERG.DE


				      CONTENTS


       1.  About the method

       2.  Copyright notice

       3.  Availability

       4.  Installation

       5.  Using STRIDE

       6.  Output format

       7.  Bug reports and user	feedback

       8.  References


       ----------------------------------------------------------------------


       1.  About the method


       STRIDE [1] is a program that recognizes secondary structural elements
       in proteins from their atomic coordinates. It performs the same task
       as DSSP by Kabsch and Sander [2] but uses both hydrogen bond energy
       and mainchain dihedral angles rather than hydrogen bonds alone. It
       relies on database-derived recognition parameters, with the
       crystallographers' secondary structure definitions as the standard
       of truth. Please see Frishman and Argos [1] for a detailed
       description of the algorithm.


       2.  Copyright notice

       Permission is hereby granted, free of charge, to any person  obtaining
       a copy of  this  software  and  associated  documentation  files  (the
       "Software"), to deal in the Software  without  restriction,  including
       without limitation the rights to use, copy,  modify,  merge,  publish,
       distribute, sublicense, and/or sell copies of  the  Software,  and  to
       permit persons to whom the Software is furnished to do so, subject  to
       the following conditions:

       The above  copyright  notice  and  this  permission  notice  shall  be
       included in all  copies  or  substantial  portions  of  the  Software.

       THE SOFTWARE IS PROVIDED  "AS  IS",  WITHOUT  WARRANTY  OF  ANY  KIND,
       EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED  TO  THE  WARRANTIES  OF
       MERCHANTABILITY,   FITNESS    FOR    A    PARTICULAR    PURPOSE    AND
       NONINFRINGEMENT.  IN NO EVENT SHALL THE AUTHORS OR  COPYRIGHT  HOLDERS
       BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER  LIABILITY,  WHETHER  IN  AN
       ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING  FROM,  OUT  OF  OR  IN
       CONNECTION WITH THE SOFTWARE OR THE  USE  OR  OTHER  DEALINGS  IN  THE
       SOFTWARE.

       For calculation of the residue solvent accessible area the program NSC
       [3,4] is used; it was kindly provided by Dr. F. Eisenhaber
       (EISENHABER@EMBL-HEIDELBERG.DE). Please direct all questions that
       specifically concern accessibility calculations to him.

       3.  Availability


       Executables of STRIDE for several UNIX platforms, VAX/VMS, OpenVMS,
       DOS and Mac, together with documentation and source code, are
       available by anonymous FTP from ftp.ebi.ac.uk (directories
       /pub/software/unix/stride, /pub/software/dos/stride,
       /pub/software/vms/stride, /pub/software/mac/stride). We are willing to
       compile the program for other architectures if an interested user
       grants us temporary access to them.

       Data files with STRIDE secondary structure assignments for the current
       release of the PDB [5] databank are in the directory
       /pub/databases/stride of the same site. Atomic coordinate sets can be
       submitted for secondary structure assignment by electronic mail to
       stride@embl-heidelberg.de. A mail message containing HELP in the
       first line will be answered with appropriate instructions. See also
       the WWW page http://www.embl-heidelberg.de/stride/stride_info.html.


       4.  Installation


       For UNIX, DOS and Mac no	installation is	 needed.  Just	download  the
       executable  corresponding  to  your platform, and you are all set. For
       VAX and OpenVMS you need	only to	link the executable  with  a  logical
       name; for example:

	 yourlogicalname:= $ $yourdiskname:[your.directory.name]stride.exe

       and then	use yourlogicalname as the program name.


       5.  Using STRIDE


       The only required parameter for STRIDE is the name of the file
       containing a set of atomic coordinates in PDB [5] format. By default
       STRIDE writes to standard output, i.e. your screen. On systems that
       support output redirection you can redirect it to create a disk file.
       Help is available if you just type STRIDE without parameters. The
       following options are accepted:


       -fFilename     Write output to the  file	 "Filename"  rather  than  to
		      stdout.


       -h	      Report hydrogen bonds.  By  default  no  hydrogen	 bond
		      information is included in the output.

       -o	      Report secondary structure summary only.

       -rId1Id2..     Read only	chains Id1, Id2	etc. of	the PDB	file *).  All
		      other  chains  will  be  ignored.	 By default all	valid
		      protein chains are read.

       -cId1Id2..     Process only chains Id1, Id2 etc. *). Secondary
		      structure	 assignment  will  be produced only for	these
		      chains, but other	chains that are	present	will be	taken
		      into   account  while  calculating  residue  accessible
		      surface and detecting inter-chain	hydrogen  bonds	 and,
		      possibly,	  interchain   beta-sheets.  By	 default  all
		      protein chains read are processed.

       -mFilename     Generate a Molscript [6] file. Using the program
                      Molscript by Per Kraulis you can create a PostScript
                      picture of your structure. You can manually edit the
                      Molscript file produced by STRIDE to achieve the
                      desired orientation and to include additional details.

       -q[Filename]   Generate a sequence file in FASTA [7] format and exit.
                      Filename is optional. If no file name is specified,
                      standard output is used.

       All options are case- and position-insensitive.

       Examples:


	 1.  Calculate secondary  structure  assignment	 for  1ACP  including
	     hydrogen bond information:

				   stride  1acp.brk  -h

         2.  Calculate secondary structure assignment for 4RUB and write the
             output to the file 4rub.str:

				stride 4rub.brk	-f4rub.str

	 3.  Calculate secondary structure assignment for chain	 B  of	4RUB.
	     Ignore all	other chains. Generate a Molscript file	4rub.mol.

			      stride 4rub.brk -rb -m4rub.mol

	 4.  Calculate secondary structure assignment for chain	C of 2GLS  in
	     the  presence  of	chains	A  and	B. Report secondary structure
	     summary only.

			       stride 2gls.brk -rabc -cc -o
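
       When STRIDE is driven from a script, the command lines shown in the
       examples above can be assembled programmatically. The following is a
       minimal Python sketch, not part of STRIDE itself; the function name
       build_stride_args and its keyword arguments are hypothetical, and
       only the options documented in this section are used.

```python
def build_stride_args(pdb_file, read_chains="", process_chains="",
                      hbonds=False, summary_only=False, out_file=None,
                      program="stride"):
    """Assemble a STRIDE command line as an argument list.

    Only options described in the documentation are emitted:
    -r, -c, -h, -o and -f. The program name is an assumption and
    may differ on your system.
    """
    args = [program, pdb_file]
    if read_chains:
        args.append("-r" + read_chains)      # read only these chains
    if process_chains:
        args.append("-c" + process_chains)   # assign only these chains
    if hbonds:
        args.append("-h")                    # report hydrogen bonds
    if summary_only:
        args.append("-o")                    # summary records only
    if out_file:
        args.append("-f" + out_file)         # write to file, not stdout
    return args


# Example 4 from the text: chain C of 2GLS in the presence of A and B.
print(build_stride_args("2gls.brk", read_chains="abc",
                        process_chains="c", summary_only=True))
```

       The resulting list can be passed to subprocess.run() from the Python
       standard library, provided a STRIDE executable is installed and
       reachable under the assumed name.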


       6.  Output format


       STRIDE produces output that is easily readable both visually and with
       computer programs. The side effect of this convenience is a larger
       file size for individual STRIDE entries. Every record is 79 symbols
       long and has the following general format:

       Position	       Description

       1-3	       Record code
       4-5	       Not used
       6-73	       Data
       74-75	       Not used
       76-79           Four-letter PDB code (if available)
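
       The general layout can be read with plain string slicing. The sketch
       below is illustrative only: the function name split_record is
       hypothetical, the sample record is constructed for the demonstration,
       and the PDB code is assumed to occupy the last four columns (76-79)
       of the 79-symbol record.

```python
def split_record(line):
    """Split one STRIDE output line into (code, data, pdb_code)."""
    line = line.rstrip("\n").ljust(79)   # pad short lines to full width
    code = line[0:3]                     # columns 1-3: record code
    data = line[5:73].rstrip()           # columns 6-73: data
    pdb_code = line[75:79].strip()       # columns 76-79 (assumed), may be empty
    return code, data, pdb_code


# Hypothetical record built only to illustrate the layout.
record = "REM".ljust(5) + "Example remark".ljust(68) + "  " + "1ACP"
print(split_record(record))
```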

       Below follows the description of	each record type.


       Code   Description and format of	data

       REM    Remarks and blank	lines

	      Format: free

       HDR    Header. Protein name, date of file creation and PDB code

	      Format: free

       CMP    Compound. Full name of the molecule and identifying
              information

	      Format: free

       SRC    Species, organ, tissue, and mutant from which  the  molecule
	      has been obtained

	      Format: free

       AUT    Names of the structure authors

	      Format: free

       CHN    File name	and PDB	chain identifier*).

	      Format: File name	beginning from position	6 followed
		      by one space and one-letter chain	identifier

       SEQ    Amino acid sequence

	      Format:  6-9  First residue PDB number
		      11-60 Sequence
		      62-65 Last residue PDB number

       STR    Secondary	structure summary

	      Format: 11-60 Secondary structure	assignment **)

       LOC    Location of secondary structure elements

	      Format:  6-17 Element name
		      19-21 First residue name
                      23-26 First residue PDB number
		      28-28 First residue chain	identifier
		      36-38 Last residue name
		      42-45 Last residue PDB number
		      47-47 Last residue chain identifier

       ASG    Detailed secondary structure assignment

	      Format:  6-8  Residue name
		      10-10 Protein chain identifier
		      12-15 PDB	residue	number
		      17-20 Ordinal residue number
		      25-25 One	letter secondary structure code	**)
		      27-39 Full secondary structure name
		      43-49 Phi	angle
		      53-59 Psi	angle
		      65-69 Residue solvent accessible area

       DNR    Donor residue

	      Format:  6-8  Donor residue name
		      10-10 Protein chain identifier
		      12-15 PDB	residue	number
		      17-20 Ordinal residue number
		      26-28 Acceptor residue name
		      30-30 Protein chain identifier
		      32-35 PDB	residue	number
		      37-40 Ordinal residue number
                      42-45 N..O distance
                      47-52 N..O=C angle
                      54-59 O..N-C angle
                      61-66 Angle between the planes of donor
                            complex and O..N-C
                      68-73 Angle between the planes of acceptor
                            complex and N..O=C

       ACC    Acceptor residue

	      Format:  6-8  Acceptor residue name
		      10-10 Protein chain identifier
		      12-15 PDB	residue	number
		      17-20 Ordinal residue number
		      26-28 Donor residue name
		      30-30 Protein chain identifier
		      32-35 PDB	residue	number
		      37-40 Ordinal residue number
                      42-45 N..O distance
                      47-52 N..O=C angle
                      54-59 O..N-C angle
                      61-66 Angle between the planes of donor
                            complex and O..N-C
                      68-73 Angle between the planes of acceptor
			    complex and	N..O=C
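
       Because SEQ and STR records both carry their payload in columns
       11-60, the full sequence and the matching secondary structure string
       can be rebuilt by concatenating those columns. The following Python
       sketch assumes that every STR record directly follows the SEQ record
       it annotates and that the input holds a single chain; the function
       name and the sample lines are hypothetical.

```python
def sequence_and_structure(lines):
    """Concatenate columns 11-60 of SEQ and STR records.

    Assumes each STR record follows its SEQ record, so the STR chunk
    can be trimmed to the length of the preceding sequence chunk.
    """
    sequence, structure = "", ""
    seq_len = 0
    for line in lines:
        chunk = line.rstrip("\n").ljust(60)[10:60]   # columns 11-60
        if line.startswith("SEQ"):
            residues = chunk.rstrip()
            seq_len = len(residues)
            sequence += residues
        elif line.startswith("STR"):
            structure += chunk[:seq_len]
    return sequence, structure


# Hypothetical records laid out on the documented columns.
seq = "MKVLAAGIVG"
ss = "CHHHHHHTTC"
lines = [
    "SEQ  " + "1".ljust(4) + " " + seq.ljust(50) + " " + "10".rjust(4),
    "STR".ljust(10) + ss,
]
print(sequence_and_structure(lines))
```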


       HDR, CMP, SRC and AUT records are copied directly from the PDB file,
       if supplied by the authors. If only the secondary structure summary is
       requested, only CHN, SEQ, STR and LOC records will be output.
       Hydrogen bond information (records DNR and ACC) is deliberately
       redundant to facilitate human reading and is not reported by
       default.
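
       The fixed column positions given for the ASG record translate
       directly into a slicing table. The sketch below is one possible way
       to read such a record in Python; the names ASG_FIELDS, parse_asg and
       make_asg are hypothetical, and the sample values (including the
       structure name "AlphaHelix") are invented for the demonstration.

```python
# 1-based inclusive column ranges, as listed for the ASG record.
ASG_FIELDS = {
    "residue": (6, 8),
    "chain":   (10, 10),
    "pdb_num": (12, 15),
    "ordinal": (17, 20),
    "ss_code": (25, 25),
    "ss_name": (27, 39),
    "phi":     (43, 49),
    "psi":     (53, 59),
    "area":    (65, 69),
}


def parse_asg(line):
    """Parse one ASG record into a dict using the documented columns."""
    if not line.startswith("ASG"):
        raise ValueError("not an ASG record")
    line = line.rstrip("\n").ljust(79)
    rec = {name: line[start - 1:end].strip()
           for name, (start, end) in ASG_FIELDS.items()}
    rec["ordinal"] = int(rec["ordinal"])
    for key in ("phi", "psi", "area"):
        rec[key] = float(rec[key])
    return rec          # pdb_num stays a string (kept as printed)


def make_asg(values):
    """Lay out hypothetical values on the ASG columns (demo helper)."""
    buf = list("ASG".ljust(79))
    for name, (start, end) in ASG_FIELDS.items():
        width = end - start + 1
        buf[start - 1:end] = str(values[name]).rjust(width)[:width]
    return "".join(buf)


sample = make_asg({"residue": "ALA", "chain": "A", "pdb_num": "12",
                   "ordinal": 12, "ss_code": "H", "ss_name": "AlphaHelix",
                   "phi": "-60.00", "psi": "-45.00", "area": "35.2"})
rec = parse_asg(sample)
print(rec["residue"], rec["chain"], rec["ss_code"], rec["phi"], rec["area"])
```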


       *)  IMPORTANT NOTE: if the protein chain	identifier is '	' (space), it
	   will	be substituted by '-' (dash) everywhere	in the STRIDE output.
	   The same is true  for  command  line	 parameters  involving	chain
	   identifiers where you have to specify '-' instead of	' '.
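
       The substitution rule in the note above is easy to overlook when
       chain identifiers are taken straight from a PDB file. A small Python
       sketch of the mapping follows; the function names are hypothetical.

```python
def stride_chain_id(pdb_chain_id):
    """Return the chain identifier in the form STRIDE prints and expects.

    A blank PDB chain identifier is written as a dash.
    """
    return "-" if pdb_chain_id == " " else pdb_chain_id


def chain_option(flag, pdb_chain_ids):
    """Build a -r or -c option from a list of PDB chain identifiers."""
    return flag + "".join(stride_chain_id(c) for c in pdb_chain_ids)


print(chain_option("-r", [" "]))        # blank chain is given as '-'
print(chain_option("-c", ["A", "B"]))
```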

       **) The one-letter secondary structure codes are nearly the same as
           those used in DSSP [2] (see Frishman and Argos [1] for details):

	   H	    Alpha helix
	   G	    3-10 helix
	   I	    PI-helix
	   E	    Extended conformation
	   B or	b   Isolated bridge
	   T	    Turn
	   C	    Coil (none of the above)
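
       The code table above maps naturally onto a lookup dictionary, which
       in turn allows a quick secondary structure composition to be
       computed. This Python sketch assumes the input is a string of
       one-letter codes, for instance collected from column 25 of the ASG
       records; the names SS_NAMES and composition and the sample string
       are hypothetical.

```python
from collections import Counter

# One-letter codes and their names, as listed in the table.
SS_NAMES = {
    "H": "Alpha helix",
    "G": "3-10 helix",
    "I": "PI-helix",
    "E": "Extended conformation",
    "B": "Isolated bridge",
    "b": "Isolated bridge",
    "T": "Turn",
    "C": "Coil (none of the above)",
}


def composition(codes):
    """Return the fraction of residues assigned to each code."""
    counts = Counter(codes)
    unknown = set(counts) - set(SS_NAMES)
    if unknown:
        raise ValueError("unknown codes: " + "".join(sorted(unknown)))
    total = len(codes)
    return {code: n / total for code, n in counts.items()}


# Invented assignment string, 16 residues long.
for code, fraction in sorted(composition("CCHHHHHHTTEEEECC").items()):
    print(code, SS_NAMES[code], fraction)
```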


       For each	record (data line) except those	with codes REM	and  STR  the
       number  of fields is consistent and is readily suitable for processing
       with external tools, such as awk, perl, etc.
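
       The same field-oriented processing that awk or perl would perform
       can be sketched in Python by splitting each line on whitespace. The
       example relies on the statement above that the number of fields per
       record is consistent; the field order follows the ASG description,
       while the function name asg_rows and the sample lines (which are not
       column-exact and use XXXX as a placeholder PDB code) are
       hypothetical.

```python
def asg_rows(lines):
    """Extract selected whitespace-separated fields from ASG records."""
    rows = []
    for line in lines:
        fields = line.split()
        if fields and fields[0] == "ASG":
            rows.append({
                "residue": fields[1],
                "chain":   fields[2],
                "pdb_num": fields[3],
                "ss_code": fields[5],
                "area":    float(fields[9]),
            })
    return rows


# Invented lines for illustration; other record types are skipped.
lines = [
    "REM  an example remark",
    "ASG  ALA A   12   12    H    AlphaHelix  -60.00  -45.00   35.2  XXXX",
    "ASG  GLY A   13   13    T          Turn   80.00   10.00   60.1  XXXX",
]
for row in asg_rows(lines):
    print(row["residue"], row["chain"], row["pdb_num"],
          row["ss_code"], row["area"])
```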


       7.  Bug reports and user	feedback


       Please  send  your  suggestions,	 questions   and   bug	 reports   to
       FRISHMAN@EMBL-HEIDELBERG.DE.   Send   your   contact  address  to  get
       information on updates and new features.


       8.  References


         1.  Frishman, D. & Argos, P. (1995) Knowledge-based protein
             secondary structure assignment. Proteins: Structure, Function,
             and Genetics 23, 566-579.

         2.  Kabsch, W. & Sander, C. (1983) Dictionary of protein secondary
             structure: pattern recognition of hydrogen-bonded and
             geometrical features. Biopolymers 22, 2577-2637.

	 3.  Eisenhaber,  F.  and  Argos,  P.  (1993)  Improved	 strategy  in
	     analytic  surface calculation for molecular systems: handling of
	     singularities and computational efficiency. J. comput. Chem. 14,
	     1272-1280.

	 4.  Eisenhaber, F., Lijnzaad, P., Argos, P., Sander, C., and Scharf,
	     M.	 (1995)	The double cubic lattice method: efficient approaches
	     to	numerical integration of surface area and volume and  to  dot
	     surface contouring	of molecular assemblies. J. comput. Chem. 16,
	     273-284.

	 5.  Bernstein,	F.C., Koetzle, T.F.,  Williams,	 G.J.,	Meyer,	E.F.,
	     Brice,  M.D.,  Rodgers,  J.R., Kennard, O., Shimanouchi, T., and
	     Tasumi, M.	 (1977)	 The  protein  data  bank:  a  computer-based
	     archival  file for	macromolecular structures. J. Mol. Biol. 112,
	     535-542.

	 6.  Kraulis, P.J.  (1991)  MOLSCRIPT:	a  program  to	produce	 both
	     detailed  and  schematic  plots  of protein structures. J.	Appl.
	     Cryst. 24,	946-950.

         7.  Pearson, W.R. (1990) Rapid and sensitive sequence comparison
             with FASTP and FASTA. Methods Enzymol. 183, 63-98.