FEATURE Documentation

Installation

The installation instructions assume you are using a bash shell (you can find out by typing echo $SHELL). If you aren't, run /bin/bash before proceeding. You should have Python 2.4 and a C compiler installed. FEATURE has been tested and compiles on Solaris, Mac OS X, Linux, and Cygwin. It does not compile (yet) on SGI Irix.
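Before starting, it can help to confirm that the prerequisites are on your PATH. The following is a minimal sketch, not part of FEATURE: the helper name check_tool is hypothetical, and the executable names (python, cc, make) are assumptions about how the tools are named on your system.

```shell
# Hypothetical helper: report whether a command is available on the PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
    fi
}

# Executable names are assumptions; your compiler may be called gcc instead of cc.
for tool in python cc make; do
    check_tool "$tool"
done
```

A "missing" line only means that the name was not found on the PATH; the tool may still be installed under a different name.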

I recommend installing the subversion client with SSL to retrieve the latest packages. The SSL part is important because SimTK.org uses the https URL scheme, which requires SSL. TortoiseSVN is a fantastic client for Windows, seamlessly integrated into Explorer. For Mac OS X, you can download subversion binaries, or use Fink to install the svn-client-ssl package (though you may run into dependency issues). For other operating systems, check whether a pre-compiled binary is available; otherwise you'll have to build it yourself from source.

Optional step: Create project directory, e.g.:

mkdir $HOME/Projects
cd $HOME/Projects

This will create a directory called Projects in your home directory and change to it. The commands that follow are run from this directory.

Step 1: Get FEATURE from repository:

FEATURE is available through the subversion interface using the svn client. The following will create a feature directory and retrieve the packages from the current release of FEATURE. Change to the feature directory after svn has finished.

svn checkout https://simtk.org/svn/feature/releases feature
cd feature

If you don't have subversion (svn), or can't install it, fetch the packages directly from the release (using wget, curl, or a web browser):

mkdir feature
cd feature
wget https://simtk.org/frs/download.php/33/feature-1.9.tar.gz
wget https://simtk.org/frs/download.php/32/tools-0.1.tar.gz
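If wget is not installed, curl can fetch the same files. The sketch below wraps that choice in a helper; the function name fetch is hypothetical and not part of FEATURE. It assumes that curl's -L (follow redirects) and -O (save under the remote file name) options are suitable for these download URLs.

```shell
# Hypothetical helper: download a URL into the current directory with
# whichever of wget or curl is installed.
fetch() {
    if command -v wget >/dev/null 2>&1; then
        wget "$1"
    elif command -v curl >/dev/null 2>&1; then
        # -L follows redirects; -O keeps the remote file name.
        curl -L -O "$1"
    else
        echo "neither wget nor curl found" >&2
        return 1
    fi
}

# Example (uncomment to download):
# fetch https://simtk.org/frs/download.php/33/feature-1.9.tar.gz
```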

The feature-1.9 package contains the source code for FEATURE and will be built in the next step. The tools-0.1 package is a collection of scripts that will be useful for working with FEATURE. After retrieving the packages, unpack them:

gunzip -c feature-1.9.tar.gz | tar xvf -
gunzip -c tools-0.1.tar.gz | tar xvf -

Each package is archived with tar and compressed with gzip, so the commands above unpack each one into its own directory.
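If you have several archives to unpack, the same gunzip and tar pipeline can be applied in a loop. This is a sketch only; the function name unpack_all is hypothetical, and the -v flag is dropped to keep the output short.

```shell
# Hypothetical helper: unpack every gzip-compressed tar archive found in a
# directory (default: the current one), using the gunzip | tar pipeline.
unpack_all() {
    dir=${1:-.}
    for archive in "$dir"/*.tar.gz; do
        # If nothing matches, the glob pattern stays literal; skip it.
        [ -f "$archive" ] || continue
        echo "Unpacking $archive"
        gunzip -c "$archive" | (cd "$dir" && tar xf -) || return 1
    done
}
```

Running unpack_all with no argument in the feature directory would unpack every downloaded package there in one go.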

Step 2: Build feature:

Because FEATURE is written in C, it needs to be compiled. The following commands change to the FEATURE directory, configure it for your system, build the binaries, and return to the parent directory:

cd feature-1.9
./configure
make
cd ..

Step 3: Configure environment:

After FEATURE is built, you need to set a few environment variables. First, set FEATURE_DIR so that FEATURE can find its data files. Then, add the location of FEATURE and tools/bin to your PATH. Finally, add tools/lib to your PYTHONPATH so that the Python scripts can find the modules they import. If you unpacked the packages under $HOME/Projects, the following will configure your environment (for bash). If you unpacked them elsewhere, for example inside the feature directory created in Step 1, adjust the paths accordingly.

export FEATURE_DIR=$HOME/Projects/feature-1.9
export PATH=${PATH}:$HOME/Projects/feature-1.9:$HOME/Projects/tools-0.1/bin
export PYTHONPATH=${PYTHONPATH}:$HOME/Projects/tools-0.1/lib

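If you are using csh or tcsh as a shell, use the following instead. Note that csh reports an undefined-variable error if PYTHONPATH is not already set; in that case, omit the ${PYTHONPATH}: prefix on the last line.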

setenv FEATURE_DIR $HOME/Projects/feature-1.9
setenv PATH ${PATH}:$HOME/Projects/feature-1.9:$HOME/Projects/tools-0.1/bin
setenv PYTHONPATH ${PYTHONPATH}:$HOME/Projects/tools-0.1/lib

If you want this environment change to persist across logins, append the lines to your $HOME/.bash_profile (if you are using bash), or $HOME/.cshrc (if you are using csh or tcsh).
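If you script this step, it is worth avoiding duplicate entries when the script is run more than once. The following sketch (for bash or sh) appends a line only if the file does not already contain it; the function name append_once is hypothetical.

```shell
# Hypothetical helper: append a line to a startup file only if an identical
# line is not already present, so repeated runs do not duplicate it.
append_once() {
    file=$1
    line=$2
    touch "$file"
    # -x matches whole lines, -F treats the line as a fixed string.
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example (uncomment to apply to your own profile):
# append_once "$HOME/.bash_profile" 'export FEATURE_DIR=$HOME/Projects/feature-1.9'
```

The single quotes in the example keep $HOME unexpanded, so the line is written to the profile exactly as shown in this step.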

Step 4: Create the structure databases:

FEATURE works on protein structure files. The atomic coordinates of the proteins are available from the Protein Data Bank (PDB), http://www.rcsb.org. You can download the full repository from their FTP site, or you can search for an individual file using their query form, and download the PDB file. FEATURE also uses data derived from DSSP (http://www.cmbi.kun.nl/gv/dssp/). The DSSP files can be created from the PDB files using dsspcmbi.

Sample PDB and DSSP databases for the following example are available in the structure package. If you used subversion to fetch the release, the packages are already downloaded. Otherwise you can fetch them with:

wget https://simtk.org/frs/download.php/35/pdb.tar.gz
wget https://simtk.org/frs/download.php/34/dssp.tar.gz

When you have downloaded the sample databases, unpack them with:

gunzip -c pdb.tar.gz | tar xvf -
gunzip -c dssp.tar.gz | tar xvf -

Finally, set up the environment so that FEATURE knows where to find the files (for bash):

export PDB_DIR=$HOME/Projects/pdb
export DSSP_DIR=$HOME/Projects/dssp

If you are using csh/tcsh you should use:

setenv PDB_DIR $HOME/Projects/pdb
setenv DSSP_DIR $HOME/Projects/dssp

If you want this environment change to persist across logins, append the lines to your $HOME/.bash_profile (if you are using bash), or $HOME/.cshrc (if you are using csh or tcsh).
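At this point three directory variables should be set: FEATURE_DIR, PDB_DIR, and DSSP_DIR. A quick sanity check, sketched below for bash or sh, confirms that each one is set and names an existing directory. The function name check_dir_var is hypothetical and not part of FEATURE.

```shell
# Hypothetical helper: check that an environment variable is set and names
# an existing directory.
check_dir_var() {
    name=$1
    # Indirect lookup: copy the value of the variable called $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
        echo "$name is not set"
        return 1
    elif [ ! -d "$value" ]; then
        echo "$name points to a missing directory: $value"
        return 1
    fi
    echo "$name=$value"
}

for var in FEATURE_DIR PDB_DIR DSSP_DIR; do
    check_dir_var "$var" || true
done
```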

Examples

Example for serine protease

Step 0: Get the examples

The example data files are already retrieved if you used subversion to get the release earlier. Otherwise, you can get the examples with:

wget https://simtk.org/frs/download.php/31/examples-0.2.tar.gz

After getting the examples, unpack them and change to the serine protease example directory:

gunzip -c examples-0.2.tar.gz | tar xvf -
cd examples-0.2/serprot/

Step 1: Build model

The example directory contains data files for building a serine protease model: the point file trypsin_ser_og.pos.ptf, which holds the positive training set, and the point file trypsin_ser_og.neg.ptf, which holds the negative training set. To create a model, the microenvironments of these points need to be characterized using featurize:

featurize -P trypsin_ser_og.pos.ptf > trypsin_ser_og.pos.ff
featurize -P trypsin_ser_og.neg.ptf > trypsin_ser_og.neg.ff

After characterizing the environments, create a model using buildmodel:

buildmodel trypsin_ser_og.pos.ff trypsin_ser_og.neg.ff > trypsin_ser_og.model

Step 2: Test model

Once a model is built, you can use it to identify locations with similar environments on another protein structure. For instance, 1bqy is a serine protease that was not in the original training set. You can also try 1ufo, an unidentified structural genomics target (but you'll have to get the PDB and DSSP files yourself as an exercise).

To scan a protein structure, you have to first select the residues and atoms that were used in the model. In this case, it is the SER residue and the OG atom:

atomselector.py -r ser -a og 1bqy > 1bqy_ser_og.ptf

That will create a point file which can then be fed into featurize to characterize the microenvironments:

featurize -P 1bqy_ser_og.ptf > 1bqy_ser_og.ff

You can then use the model you built earlier to score all the environments and then sort the results in ascending order.

scoreit -a trypsin_ser_og.model 1bqy_ser_og.ff > 1bqy_ser_og.hits
sort -k2 -n 1bqy_ser_og.hits > 1bqy_ser_og.hits.sorted
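To repeat the scan for another structure such as 1ufo, only the PDB ID changes in the commands above. The sketch below prints the same four commands for a given ID so they can be reviewed first; the function name scan_commands is hypothetical, and it assumes the SER/OG selection and the trypsin_ser_og.model file from this example.

```shell
# Hypothetical helper: print the scan commands from this step for a given
# PDB ID, so they can be reviewed before running.
scan_commands() {
    id=$1
    model=${2:-trypsin_ser_og.model}
    base=${id}_ser_og
    echo "atomselector.py -r ser -a og $id > $base.ptf"
    echo "featurize -P $base.ptf > $base.ff"
    echo "scoreit -a $model $base.ff > $base.hits"
    echo "sort -k2 -n $base.hits > $base.hits.sorted"
}

scan_commands 1ufo
```

Once the printed commands look right, piping them to the shell (scan_commands 1ufo | sh) would run them, provided the FEATURE programs are on your PATH and the PDB and DSSP files for that structure are in place.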

You can see the top 5 scoring hits in 1bqy by using:

tail -5 1bqy_ser_og.hits.sorted

You should see the five highest-scoring hits. Because the file is sorted in ascending order, the best hit is on the last line. The first field is a unique label for the site, the second is the score for the site, and the next three are the x, y, z coordinates of the hit. The last field, after the # separator, identifies the residue (with its chain and atom). SER195 is the catalytic serine for 1bqy.

Env_1bqy_0      -110.586184     11.959  25.312  62.893  #       SER29:A@OG
Env_1bqy_32     -36.637977      37.557  8.635   139.522 #       SER214:B@OG
Env_1bqy_15     -28.324812      22.436  37.11   53.493  #       SER214:A@OG
Env_1bqy_31     149.060025      35.096  9.444   145.234 #       SER195:B@OG
Env_1bqy_14     176.033218      19.642  39.631  59.232  #       SER195:A@OG
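Since the fields are separated by whitespace, standard tools can filter the hits further. The sketch below uses the five sample lines above (with whitespace collapsed to single spaces) and keeps only hits that score above a cutoff, printing the residue and score with the highest score first. The cutoff of 0 is an arbitrary value chosen for illustration, not a threshold recommended by FEATURE.

```shell
# Sample lines copied from the output above (whitespace-separated fields).
hits='Env_1bqy_0 -110.586184 11.959 25.312 62.893 # SER29:A@OG
Env_1bqy_32 -36.637977 37.557 8.635 139.522 # SER214:B@OG
Env_1bqy_15 -28.324812 22.436 37.11 53.493 # SER214:A@OG
Env_1bqy_31 149.060025 35.096 9.444 145.234 # SER195:B@OG
Env_1bqy_14 176.033218 19.642 39.631 59.232 # SER195:A@OG'

# Field 2 is the score and field 7 is the residue. Keep scores above the
# illustrative cutoff of 0, then sort numerically on the score, descending.
printf '%s\n' "$hits" | awk '$2 > 0 { print $7, $2 }' | sort -k2,2nr
```

For the sample data this prints SER195:A@OG 176.033218 followed by SER195:B@OG 149.060025, the two catalytic serines. To apply the same filter to your own results, replace the printf with cat 1bqy_ser_og.hits.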