The CleavPredict webserver, available at http://cleavpredict.sanfordburnham.org/, predicts protease substrates and their cleavage sites online from a UniProt ID, a FASTA sequence, a PDB file or a PDB ID. At present, CleavPredict can predict substrate cleavage sites for 11 different matrix metalloproteinases (MMPs), using position weight matrices (PWMs). The predicted cleavage sites are integrated with structural features (secondary structure, disordered regions, etc.) and with information on SNPs, PTMs, co-localization, co-expression and more.


Query type

The CleavPredict webserver accepts four query types for predicting substrate cleavage sites. First select one of the matrix metalloproteinases, then enter a query of one of the four types (UniProt ID, FASTA sequence, PDB file or PDB ID). Make sure the input is valid for the chosen query type: a valid UniProt ID, a sequence in FASTA or plain format, a correctly formatted PDB file, or a four-character PDB ID code.
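
A client-side check of these input rules might look like the sketch below. The helper name detect_query_type is hypothetical and not part of CleavPredict. It assumes that "UniProt ID" means a UniProt accession, matched with the accession pattern published in UniProt's documentation (entry names such as HBA_HUMAN are not recognized), that a PDB ID is a classic four-character code starting with a digit, and that sequences use the 20 standard amino acids plus X.

```python
import re

# UniProt accession pattern as published in UniProt's documentation
# (6- or 10-character accessions). Entry names like HBA_HUMAN are not matched.
UNIPROT_AC = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"
)
# Assumption: a classic PDB ID, i.e. a digit 1-9 followed by three alphanumerics.
PDB_ID = re.compile(r"[1-9][A-Za-z0-9]{3}")
# Assumption: the 20 standard amino acids plus X for unknown residues.
AA_SEQ = re.compile(r"[ACDEFGHIKLMNPQRSTVWYX]+")


def detect_query_type(text: str) -> str:
    """Hypothetical helper: classify a query string into one of four input types."""
    stripped = text.strip()
    lines = stripped.splitlines()
    # PDB files contain coordinate records.
    if any(line.startswith(("ATOM", "HETATM")) for line in lines):
        return "pdb_file"
    if PDB_ID.fullmatch(stripped):
        return "pdb_id"
    if UNIPROT_AC.fullmatch(stripped.upper()):
        return "uniprot_id"
    # Plain or FASTA sequence: drop '>' header lines and check the residues.
    residues = "".join(line.strip() for line in lines if not line.startswith(">"))
    if residues and AA_SEQ.fullmatch(residues.upper()):
        return "sequence"
    raise ValueError("not a valid UniProt ID, PDB ID, PDB file or sequence")


print(detect_query_type("P69905"))  # uniprot_id
```

The PDB ID test runs before the UniProt test, but the two patterns cannot collide: classic PDB IDs start with a digit, UniProt accessions with a letter.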

CleavPredict also offers a batch mode, in which users can submit a file of FASTA-format sequences, or multiple PDB files, against a specific protease. An email address must be provided, to which the results are sent. Batch mode predicts substrate cleavage sites together with structural features. Examples of batch mode input files:

Fasta input file: test.fasta
>fasta1
MVKVYAPASSANMSVGFDVLGAAVTPVDGALLGDVVTVEAAETFSLNNLGRFADKLPSEP
RENIVYQCWERFCQELGKQIPVAMTLEKNMPIGSGLGSSACSVVAALMAMNEHCGKPLND
TRLLALMGELEGRISGSIHYDNVAPCFLGGMQLMIEENDIISQQVPGFDEWLWVLAYPGI
>fasta2
MKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDHGWWKQH
YEWRGNRWHLHGPPPPPRHHKKAPHDHHGGHGPGKHHR


PDB input files: 3N85.pdb, 4J72.pdb, etc.
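
A batch FASTA file like test.fasta above can be split into named records with a few lines of code. This is a minimal sketch with a hypothetical parse_fasta function, not CleavPredict's own parser; it assumes the record name is the first word of each header line and that a later record with the same name overwrites an earlier one.

```python
def parse_fasta(text: str) -> dict:
    """Hypothetical helper: split multi-record FASTA text into {name: sequence}."""
    records = {}
    name = None
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines between records
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            words = line[1:].split()
            name = words[0] if words else ""  # record name = first word of header
            chunks = []
        elif name is None:
            raise ValueError("sequence data before the first '>' header")
        else:
            chunks.append(line.upper())  # sequence lines are joined and upper-cased
    if name is not None:
        records[name] = "".join(chunks)
    return records


print(parse_fasta(">a desc\nMVK\nvy\n>b\nMKK"))  # {'a': 'MVKVY', 'b': 'MKK'}
```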


Primary recognition of the cleavage sites

Primary recognition of cleavage sites is predicted with position weight matrices (PWMs). The PWM for each protease was derived by statistical analysis of large data sets from high-throughput substrate phage display experiments [1,2]. In 10-fold cross-validation on phage substrates, the method achieved high accuracy (>90%), a high true positive rate (>80%) and a low false positive rate (<5%).
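
The general PWM technique can be sketched as follows: a log-odds matrix is built from aligned cleavage-site windows, then slid along a sequence, summing one column score per residue. This is a generic illustration, not CleavPredict's scoring code. The helper names, the uniform background frequency, the pseudocount, the window (here P3-P1') and the threshold are all assumptions, and the training sites are toy examples rather than phage display data.

```python
import math
from typing import Dict, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pwm(sites: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Log2-odds PWM from equal-length aligned cleavage-site windows.

    Assumes a uniform background frequency of 1/20 for every amino acid."""
    background = 1.0 / len(AMINO_ACIDS)
    pwm = []
    for pos in range(len(sites[0])):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({aa: math.log2(counts[aa] / total / background) for aa in AMINO_ACIDS})
    return pwm


def scan(sequence: str, pwm: List[Dict[str, float]], unprimed: int,
         threshold: float) -> List[Tuple[int, float]]:
    """Score each window; return (1-based P1 position, score) for scores >= threshold.

    `unprimed` is the number of window positions before the scissile bond
    (3 for a P3-P1' window), so the cut falls after residue P1."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(aa not in AMINO_ACIDS for aa in window):
            continue  # skip windows with non-standard residues
        score = sum(column[aa] for column, aa in zip(pwm, window))
        if score >= threshold:
            hits.append((start + unprimed, score))
    return hits


toy_sites = ["PLGL", "PAGL", "PLGI", "PQGL"]  # toy P3-P1' windows, not real data
pwm = build_pwm(toy_sites)
print(scan("AAAPLGLAAA", pwm, unprimed=3, threshold=5.0))  # one hit: P1 = G at position 6
```

The pseudocount keeps residues never seen at a position from producing a log of zero; the threshold trades true positives against false positives.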

Prediction of structural features

In addition to cleavage sites, CleavPredict determines structural features (secondary structure, disordered regions, solvent accessibility and transmembrane domains), which are known to influence proteolysis [3]; cleavage sites and structural features are integrated and shown together. For FASTA and UniProt queries, secondary structure and disorder are calculated with Jnet and DISOPRED, respectively, and their confidence scores (on a 0-9 scale) are shown in the results. For PDB queries, secondary structure and solvent accessibility are calculated with DSSP, and the DSSP secondary structure assignments are reduced to three classes: helix (G, H and I), strand (E and B) and loop (all others). For both sequence and PDB queries, transmembrane domains are predicted with TMHMM.
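
The three-class reduction of DSSP output described above is a simple per-residue lookup. A minimal sketch, with a hypothetical reduce_dssp function that takes the DSSP secondary structure column as a string (blank or other characters for unassigned residues):

```python
# Three-class reduction of DSSP's eight states:
# helix = G, H, I; strand = E, B; loop = everything else (T, S, blank, ...).
HELIX = frozenset("GHI")
STRAND = frozenset("EB")


def reduce_dssp(states: str) -> str:
    """Hypothetical helper: map each DSSP state character to H, E or L."""
    return "".join(
        "H" if s in HELIX else "E" if s in STRAND else "L" for s in states
    )


print(reduce_dssp("HHHGGGTTEEEEBS "))  # HHHHHHLLEEEEELL
```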

Signal peptide prediction and links to co-expression, co-localization, subcellular location, SNP and PTM databases

The presence of a signal peptide in the query is determined with SignalP [4]. The results show the discrimination score (D-score), which SignalP uses to distinguish signal peptides from non-signal peptides and which combines the outputs of its signal peptide and cleavage site prediction networks, together with the D-score cut-off (Dmaxcut). Co-expression of the protease and the substrate is determined from the COXPRESdb database [5]; the results show the average rank (Avg Rank), based on the correlation score, and the average co-expression score (Avg coexp. score), the average correlation between the gene expression patterns of the substrate and the protease. Interactions between the protease and the substrate are retrieved from the mentha database [6], which assigns each interaction a reliability score (mentha score) that takes into account all supporting evidence. Subcellular locations and single nucleotide polymorphisms (SNPs) are retrieved from UniProt. Experimentally verified post-translational modifications (PTMs) of the substrate are obtained from the curated dbPTM database [7].

Results

UniProt query results: All predicted potential cleavage sites are listed in a table together with disorder, secondary structure and transmembrane domains. The VMS (virtual mass spectrometry) button beneath the cleavage site table generates a virtual mass spectrum displaying all possible mass fragments after proteolysis. Other information on the chosen protease and the given substrate, such as signal peptide, subcellular locations, co-expression, interactions, SNPs and PTMs, is shown in subsequent tables. All predicted cleavage sites, SNPs and PTMs are marked on the sequence and, if a structure is available, on the PDB structure in the GLmol viewer.
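
How the VMS spectrum is computed is not described here. As an illustration only, the sketch below enumerates the fragments produced by complete and partial cleavage at predicted sites and computes their neutral monoisotopic masses from standard residue masses. The function names are hypothetical, cleavage positions are assumed to be 1-based P1 residues (cut after P1, within the sequence), and modifications and ion charges are ignored.

```python
from itertools import combinations

# Standard monoisotopic residue masses (Da) of the 20 amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # monoisotopic mass of H2O added for the free termini


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def all_fragments(sequence, p1_sites):
    """Fragments from complete and partial cleavage after each 1-based P1 site.

    Returns (first residue, last residue, peptide, mass) tuples, 1-based."""
    cuts = sorted({0, len(sequence), *p1_sites})
    fragments = []
    for start, end in combinations(cuts, 2):
        if start == 0 and end == len(sequence):
            continue  # skip the uncleaved full-length protein
        piece = sequence[start:end]
        fragments.append((start + 1, end, piece, round(peptide_mass(piece), 4)))
    return fragments


for fragment in all_fragments("MVKVYAPASSANMSVG", [6, 10]):
    print(fragment)
```

Every pair of cut points yields one fragment, so n predicted sites give (n + 2)(n + 1)/2 - 1 fragments, which is why partial digestion quickly dominates the list.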

FASTA query results: For a FASTA sequence query, BLAST results against the Swiss-Prot database are shown in a table, from which an appropriate UniProt ID can be selected for predicting cleavage sites. Alternatively, users can continue without selecting a UniProt ID, in which case other information such as subcellular locations, co-expression, interactions, SNPs and PTMs is not calculated. Only the cleavage sites are then marked on the given sequence, and the corresponding PDB structures found by BLAST, if any, are displayed in the GLmol viewer with the cleavage sites marked.

PDB file query results: For an uploaded PDB file, all predicted potential cleavage sites are listed in a table together with secondary structure and solvent accessibility calculated by DSSP. The presence or absence of a signal peptide is shown in a subsequent table. All cleavage sites are marked on the PDB sequence, and the structure is displayed with the cleavage sites marked. Below this, BLAST results are shown with the corresponding UniProt IDs; using one of these UniProt IDs, users can predict cleavage sites together with other information such as subcellular locations, co-expression, interactions, SNPs and PTMs.

PDB ID query results: For a given PDB ID, all predicted potential cleavage sites are listed in a table together with secondary structure and solvent accessibility calculated by DSSP. Other information on the chosen protease and the given PDB ID, such as signal peptide, subcellular locations, co-expression, interactions, SNPs and PTMs, is shown in subsequent tables. All cleavage sites and any SNPs and PTMs found are marked on the UniProt sequence corresponding to the PDB ID; the UniProt sequence is used for display because the SNP and PTM information comes from UniProt. The PDB structure is also displayed in the GLmol viewer, marked with the cleavage sites and with any SNPs and PTMs whose positions fall within the PDB structure.

References

1. Ratnikov, B.I., Cieplak, P. and Smith, J.W. (2009) High throughput substrate phage display for protease profiling. Methods Mol Biol, 539, 93-114.

2. Ratnikov, B.I., Cieplak, P., Gramatikoff, K., Pierce, J., Eroshkin, A., Igarashi, Y., Sun, Q., Godzik, A., Osterman, A.L., Stec, B. et al. (2013) Basis for Substrate Recognition and Distinction by Matrix Metalloproteinases. Proc Natl Acad Sci USA, submitted.

3. Kazanov, M.D., Igarashi, Y., Eroshkin, A.M., Cieplak, P., Ratnikov, B., Zhang, Y., Li, Z., Godzik, A., Osterman, A.L. and Smith, J.W. (2011) Structural determinants of limited proteolysis. J Proteome Res, 10, 3642-3651.

4. Petersen, T.N., Brunak, S., von Heijne, G. and Nielsen, H. (2011) SignalP 4.0: discriminating signal peptides from transmembrane regions. Nat Methods, 8, 785-786.

5. Obayashi, T., Okamura, Y., Ito, S., Tadaka, S., Motoike, I.N. and Kinoshita, K. (2013) COXPRESdb: a database of comparative gene coexpression networks of eleven species for mammals. Nucleic Acids Res, 41.

6. Calderone, A., Castagnoli, L. and Cesareni, G. (2013) mentha: a resource for browsing integrated protein-interaction networks. Nat Methods, 10, 690.

7. Lu, C.T., Huang, K.Y., Su, M.G., Lee, T.Y., Bretana, N.A., Chang, W.C., Chen, Y.J., Chen, Y.J. and Huang, H.D. (2013) dbPTM 3.0: an informative resource for investigating substrate site specificity and functional association of protein post-translational modifications. Nucleic Acids Res, 41, D295-305.