The installation instructions assume you are using a bash shell (you can find out by typing echo $SHELL). If you aren't, first run /bin/bash before proceeding. You should have Python 2.4 and a compiler installed. FEATURE has been tested and compiles on Solaris, Mac OS X, Linux, and Cygwin. It does not compile (yet) on SGI Irix.
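Before starting, it can help to confirm that the prerequisites are on your PATH. The sketch below is not part of FEATURE: check_tool is a hypothetical helper, and the tool names it looks for (python, cc, make) are assumptions about what your system calls them.

```shell
# check_tool NAME: report whether NAME can be found on the PATH.
# Hypothetical helper, not part of the FEATURE distribution.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
    fi
}

# Show the login shell, then look for an interpreter and build tools.
# These names are assumptions; your compiler may be called gcc rather than cc.
echo "login shell: ${SHELL:-unknown}"
for tool in python cc make; do
    check_tool "$tool"
done
```

A "missing" line only means the name was not found on the PATH; the tool may still be installed under another name.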
I recommend installing the subversion client with SSL support to retrieve the latest packages. SSL matters because SimTK.org uses the https URL scheme, which requires it. TortoiseSVN is a fantastic client for Windows, seamlessly integrated into Explorer. Subversion binaries are also available for Mac OS X, or you can use Fink to install the svn-client-ssl package, though you may run into dependency issues. For other operating systems, check whether a pre-compiled binary is available; otherwise you'll have to build it yourself from source.
mkdir $HOME/Projects
cd $HOME/Projects
This creates a directory called Projects in your home directory and changes into it. The commands that follow are run from this directory.
FEATURE is available through subversion using the svn client. The following commands create a feature directory, retrieve the packages from the current release of FEATURE, and change to the feature directory once svn has finished.
svn checkout https://simtk.org/svn/feature/releases feature
cd feature
If you don't have subversion (svn), or can't install it, then fetch the packages directly from the release (using wget, curl, or some web browser):
mkdir feature
cd feature
wget https://simtk.org/frs/download.php/33/feature-1.9.tar.gz
wget https://simtk.org/frs/download.php/32/tools-0.1.tar.gz
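If wget is not installed, curl can do the same job. The following is a minimal sketch of a wrapper that uses whichever of the two is present; fetch is a hypothetical helper name, not something shipped with FEATURE.

```shell
# fetch URL: download URL into the current directory with wget or curl.
# Hypothetical helper for illustration; not part of FEATURE.
fetch() {
    if command -v wget >/dev/null 2>&1; then
        wget "$1"
    elif command -v curl >/dev/null 2>&1; then
        # -L follows redirects, -O saves under the remote file name
        curl -L -O "$1"
    else
        echo "neither wget nor curl found; use a web browser instead" >&2
        return 1
    fi
}

# Example use:
# fetch https://simtk.org/frs/download.php/33/feature-1.9.tar.gz
```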
The feature-1.9 package contains the source code for FEATURE and will be built in the next step. The tools-0.1 package is a collection of scripts that will be useful for working with FEATURE. After retrieving the packages, unpack them:
gunzip -c feature-1.9.tar.gz | tar xvf -
gunzip -c tools-0.1.tar.gz | tar xvf -
Each package is archived with tar and compressed with gzip, so the commands above unpack the packages into their respective directories.
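To see what an archive contains before unpacking it, the same pipeline works with tar's t (list) option in place of x (extract). The sketch below builds a small throwaway archive so that it can be run anywhere; demo-0.1 is a made-up package name used only for illustration, not a FEATURE package.

```shell
# Work in a scratch directory so nothing in your Projects directory is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Build a tiny gzip-compressed tar archive to experiment with.
# demo-0.1 is a hypothetical package name used only for this illustration.
mkdir demo-0.1
echo "hello" > demo-0.1/README
tar cf - demo-0.1 | gzip -c > demo-0.1.tar.gz
rm -r demo-0.1

# List the contents without extracting anything (t = list).
gunzip -c demo-0.1.tar.gz | tar tf -

# Extract (x = extract, v = verbose), as in the commands above.
gunzip -c demo-0.1.tar.gz | tar xvf -
```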
Because FEATURE is written in C, it needs to be compiled. The following commands change to the FEATURE directory, configure it for your system, and build the binaries:
cd feature-1.9
./configure
make
cd ..
After FEATURE is built, you need to set a few environment variables so that the programs can find their data files. First, set FEATURE_DIR so that FEATURE can find its data files. Then, add the location of FEATURE and the tools/bin directory to your PATH. Finally, add the tools/lib directory to your PYTHONPATH so that the Python scripts can find the modules they depend on. If you unpacked the packages directly under $HOME/Projects, the following will configure your environment (for bash); adjust the paths if you unpacked them elsewhere, such as inside the feature directory created earlier.
export FEATURE_DIR=$HOME/Projects/feature-1.9
export PATH=${PATH}:$HOME/Projects/feature-1.9:$HOME/Projects/tools-0.1/bin
export PYTHONPATH=${PYTHONPATH}:$HOME/Projects/tools-0.1/lib
If you are using csh or tcsh as a shell, you should do the following:
setenv FEATURE_DIR $HOME/Projects/feature-1.9
setenv PATH ${PATH}:$HOME/Projects/feature-1.9:$HOME/Projects/tools-0.1/bin
setenv PYTHONPATH ${PYTHONPATH}:$HOME/Projects/tools-0.1/lib
If you want this environment change to persist across logins, append the lines to your $HOME/.bash_profile (if you are using bash), or $HOME/.cshrc (if you are using csh or tcsh).
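When appending to a startup file, it is easy to add the same line twice if you repeat the step. A minimal sketch of one way to avoid that in bash follows; append_once is a hypothetical helper, not something shipped with FEATURE, and it assumes a POSIX grep.

```shell
# append_once FILE LINE: add LINE to FILE unless an identical line is already there.
# Hypothetical helper for illustration; not part of FEATURE.
append_once() {
    file=$1
    line=$2
    touch "$file"
    # -F: fixed string, -x: match the whole line, -q: quiet
    if ! grep -Fxq -- "$line" "$file"; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Example use (single quotes keep $HOME unexpanded until the file is read at login):
# append_once "$HOME/.bash_profile" 'export FEATURE_DIR=$HOME/Projects/feature-1.9'
```

Running the example twice leaves a single copy of the line in the file.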
FEATURE works on protein structure files. The atomic coordinates of the proteins are available from the Protein Data Bank (PDB), http://www.rcsb.org. You can download the full repository from their FTP site, or you can search for an individual file using their query form and download the PDB file. FEATURE also uses data derived from DSSP (http://www.cmbi.kun.nl/gv/dssp/). The DSSP files can be created from the PDB files using dsspcmbi.
Sample PDB and DSSP databases for the following example are available in the structure package. If you used subversion to fetch the release, the packages are already downloaded. Otherwise you can fetch them with:
wget https://simtk.org/frs/download.php/35/pdb.tar.gz
wget https://simtk.org/frs/download.php/34/dssp.tar.gz
After downloading the sample databases, unpack them with:
gunzip -c pdb.tar.gz | tar xvf -
gunzip -c dssp.tar.gz | tar xvf -
Finally, set up the environment so that FEATURE knows where to find the files (for bash):
export PDB_DIR=$HOME/Projects/pdb
export DSSP_DIR=$HOME/Projects/dssp
If you are using csh/tcsh you should use:
setenv PDB_DIR $HOME/Projects/pdb
setenv DSSP_DIR $HOME/Projects/dssp
If you want this environment change to persist across logins, append the lines to your $HOME/.bash_profile (if you are using bash), or $HOME/.cshrc (if you are using csh or tcsh).
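A quick check that the variables point at real directories can save confusion later. The sketch below is an illustration only; check_dir_var is a hypothetical helper name and is not part of FEATURE.

```shell
# check_dir_var NAME: report whether the environment variable NAME
# is set and points at an existing directory.
# Hypothetical helper for illustration; not part of FEATURE.
check_dir_var() {
    name=$1
    # Indirect lookup: copy the value of the variable named by $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
        echo "$name is not set"
    elif [ -d "$value" ]; then
        echo "$name ok: $value"
    else
        echo "$name points to a missing directory: $value"
    fi
}

for var in FEATURE_DIR PDB_DIR DSSP_DIR; do
    check_dir_var "$var"
done
```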
Example for serine protease
The example data files are already retrieved if you used subversion to get the release earlier. Otherwise, you can get the examples with:
wget https://simtk.org/frs/download.php/31/examples-0.2.tar.gz
After getting the examples, unpack them and change to the serine protease example directory:
gunzip -c examples-0.2.tar.gz | tar xvf -
cd examples-0.2/serprot/
The example directory contains data files for building a serine protease model: the point file for the positive training set, trypsin_ser_og.pos.ptf, and the point file for the negative training set, trypsin_ser_og.neg.ptf. To create a model, the microenvironments of these points need to be characterized using featurize:
featurize -P trypsin_ser_og.pos.ptf > trypsin_ser_og.pos.ff
featurize -P trypsin_ser_og.neg.ptf > trypsin_ser_og.neg.ff
After characterizing the environments, create a model using buildmodel:
buildmodel trypsin_ser_og.pos.ff trypsin_ser_og.neg.ff > trypsin_ser_og.model
Once a model is built, you can use it to identify locations with similar environments on another protein structure. For instance, 1bqy is a serine protease that was not in the original training set. You can also try 1ufo, an unidentified structural genomics target (but you'll have to get the PDB and DSSP files yourself as an exercise).
To scan a protein structure, you have to first select the residues and atoms that were used in the model. In this case, it is the SER residue and the OG atom:
atomselector.py -r ser -a og 1bqy > 1bqy_ser_og.ptf
That will create a point file which can then be fed into featurize to characterize the microenvironments:
featurize -P 1bqy_ser_og.ptf > 1bqy_ser_og.ff
You can then use the model you built earlier to score all the environments and then sort the results in ascending order.
scoreit -a trypsin_ser_og.model 1bqy_ser_og.ff > 1bqy_ser_og.hits
sort -k2 -n 1bqy_ser_og.hits > 1bqy_ser_og.hits.sorted
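If you plan to scan several structures, the steps so far (select atoms, featurize, score, sort) can be collected into one shell function. This is only a sketch: scan_structure is a hypothetical wrapper name, it uses no options beyond those shown in the commands above, and it assumes atomselector.py, featurize, and scoreit are on your PATH.

```shell
# scan_structure PDBID RESIDUE ATOM MODEL
# Hypothetical wrapper (not part of FEATURE) around the commands shown above.
# Each step runs only if the previous one succeeded.
scan_structure() {
    if [ "$#" -ne 4 ]; then
        echo "usage: scan_structure PDBID RESIDUE ATOM MODEL" >&2
        return 2
    fi
    pdbid=$1
    residue=$2
    atom=$3
    model=$4
    prefix="${pdbid}_${residue}_${atom}"

    # 1. Select the atoms of interest and write a point file.
    atomselector.py -r "$residue" -a "$atom" "$pdbid" > "$prefix.ptf" &&
    # 2. Characterize the microenvironment around each point.
    featurize -P "$prefix.ptf" > "$prefix.ff" &&
    # 3. Score each environment against the model.
    scoreit -a "$model" "$prefix.ff" > "$prefix.hits" &&
    # 4. Sort by score (second field), lowest first.
    sort -k2 -n "$prefix.hits" > "$prefix.hits.sorted"
}

# Example, equivalent to the commands above:
# scan_structure 1bqy ser og trypsin_ser_og.model
```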
You can see the top 5 scoring hits in 1bqy by using:
tail -5 1bqy_ser_og.hits.sorted
You should see the highest scoring hits. The first field is just a unique label for the site. The second field is the score for the site. The next three fields are the x,y,z coordinates for the hit. The last field shows the residue. SER195 is the catalytic serine for 1bqy.
Env_1bqy_0 -110.586184 11.959 25.312 62.893 # SER29:A@OG
Env_1bqy_32 -36.637977 37.557 8.635 139.522 # SER214:B@OG
Env_1bqy_15 -28.324812 22.436 37.11 53.493 # SER214:A@OG
Env_1bqy_31 149.060025 35.096 9.444 145.234 # SER195:B@OG
Env_1bqy_14 176.033218 19.642 39.631 59.232 # SER195:A@OG
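Because the score is always the second field and the residue the last, a short pipeline can reduce the listing to just those two columns, best hit first. The sketch below copies the five lines of output shown above into a scratch file so that it runs without FEATURE installed; the file name sample.hits and the helper top_hits are made up for this illustration.

```shell
# Recreate the sample output above in a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/sample.hits" <<'EOF'
Env_1bqy_0 -110.586184 11.959 25.312 62.893 # SER29:A@OG
Env_1bqy_32 -36.637977 37.557 8.635 139.522 # SER214:B@OG
Env_1bqy_15 -28.324812 22.436 37.11 53.493 # SER214:A@OG
Env_1bqy_31 149.060025 35.096 9.444 145.234 # SER195:B@OG
Env_1bqy_14 176.033218 19.642 39.631 59.232 # SER195:A@OG
EOF

# top_hits FILE N: print "residue score" for the N highest-scoring sites.
# Hypothetical helper; not part of FEATURE.
top_hits() {
    # Sort on the score column only (-k2,2), numerically (-n), highest first (-r),
    # keep the first N lines, then print the last field and the score.
    sort -k2,2 -n -r "$1" | head -n "$2" | awk '{ print $NF, $2 }'
}

top_hits "$workdir/sample.hits" 2
# prints:
# SER195:A@OG 176.033218
# SER195:B@OG 149.060025
```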