Citation

If you use funscoR, please cite:

Ochoa et al. The functional landscape of the human phosphoproteome. bioRxiv (2019). https://doi.org/10.1101/541656

Installation

First, install funscoR from GitHub. This requires the devtools package to be installed.

devtools::install_github("evocellnet/funscoR")

Getting started

To get started, load the funscoR package; the sample datasets described below are bundled with it.

Datasets

Phosphoproteome

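A reference human phosphoproteome is provided in the phosphoproteome object. This data frame lists phosphorylated residues, one per row, with the protein accession (acc), the position of the residue in the protein sequence (position) and the residue type (residue):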

acc position residue
A0A075B6Q4 24 S
A0A075B6Q4 35 S
A0A075B6Q4 57 S
A0A075B6Q4 68 S
A0A075B6Q4 71 S
A0A075B6Q4 72 S

A different reference phosphoproteome can be used as the starting point, provided it follows the same format. Beware that equivalent performance may require computing the same annotations for the new set of sites.
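When supplying your own reference set, it may help to check that it follows the format above before annotating it. The following is a minimal base R sketch; check_phosphoproteome() is a hypothetical helper, not part of funscoR, and the example rows are copied from the table above.

```r
# Hypothetical helper (not part of funscoR): checks that a data frame follows
# the acc / position / residue layout of the reference phosphoproteome.
check_phosphoproteome <- function(df) {
  required <- c("acc", "position", "residue")
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("missing columns: ", paste(missing_cols, collapse = ", "))
  }
  # Positions must be positive whole numbers (sequence coordinates)
  if (any(is.na(df$position)) || any(df$position < 1) ||
      any(df$position != round(df$position))) {
    stop("positions must be positive integers")
  }
  # Only serine, threonine and tyrosine phosphosites are expected
  if (!all(df$residue %in% c("S", "T", "Y"))) {
    stop("residues must be one of S, T or Y")
  }
  # Duplicated sites would otherwise be annotated and scored twice
  if (anyDuplicated(df[, c("acc", "position")]) > 0) {
    stop("duplicated acc/position pairs")
  }
  invisible(TRUE)
}

my_sites <- data.frame(
  acc      = rep("A0A075B6Q4", 6),
  position = c(24L, 35L, 57L, 68L, 71L, 72L),
  residue  = rep("S", 6),
  stringsAsFactors = FALSE
)
check_phosphoproteome(my_sites)
```

A failed check stops with a message naming the problem; for instance, a data frame without a residue column is reported as "missing columns: residue".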

Gold standard

A gold standard of known regulatory sites is required to train the model. Parsed annotations from PhosphoSitePlus (see its license) are provided in this package as the psp object.

acc position
O95786 854
Q8TD46 302
P60484 380
Q9UPY3 1016
P35367 142
O43684 19

A different gold standard can be used in the downstream analysis as long as the provided data frame contains the acc and position columns.
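The sketch below shows, in base R, how a gold standard with only acc and position columns can be matched against a phosphoproteome to flag known regulatory sites. The gold standard here is hypothetical: the A0A075B6Q4 57 entry is made up for illustration, and the known_functional column name is an assumption, not funscoR's internal representation.

```r
# Toy phosphoproteome (rows taken from the reference table above)
phospho <- data.frame(
  acc      = rep("A0A075B6Q4", 6),
  position = c(24L, 35L, 57L, 68L, 71L, 72L),
  residue  = rep("S", 6),
  stringsAsFactors = FALSE
)

# Hypothetical gold standard: only acc and position are needed.
# A0A075B6Q4 57 is invented for illustration; P60484 380 comes from psp.
gold <- data.frame(
  acc      = c("A0A075B6Q4", "P60484"),
  position = c(57L, 380L),
  stringsAsFactors = FALSE
)

# Build one key per site and flag sites whose acc/position pair
# appears in the gold standard
site_key <- function(df) paste(df$acc, df$position, sep = "_")
phospho$known_functional <- site_key(phospho) %in% site_key(gold)

phospho[phospho$known_functional, ]
```

Gold-standard entries that are absent from the phosphoproteome, such as P60484 380 here, are simply left unmatched.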

Phosphoproteome annotation

The package also provides an extensive functional annotation of the human phosphoproteome. All available annotation objects are listed below.

Annotation object
feature_disopred
feature_disprot
feature_domains
feature_elm
feature_evmut
feature_exac
feature_foldx
feature_hotspots
feature_interfaces
feature_ms_pride
feature_neighPTMs
feature_netphorest
feature_paxdb
feature_proteinlength
feature_ptmdb_age
feature_ptmdb_coregulation
feature_ptmdb_counts
feature_ptmdb_regulation
feature_pwm_match
feature_scratch1Dfeatures
feature_sift_scores
feature_spectral_counts
feature_topology
feature_transitpeptides

Each annotation object can also be used independently. For example, feature_ptmdb_age contains the ancestral reconstruction of all available phosphosites. The w0_mya column contains the inferred age, in millions of years, of the last common ancestor for the phosphosite, and w0_ancestor_name names that ancestor. The w3_mya and w3_ancestor_name columns contain the equivalent information when conservation is assessed over a window of +/-3 residues. More information about the dataset is available via ?feature_ptmdb_age.

acc residue position w0_mya w0_ancestor_name w3_mya w3_ancestor_name
P84085 T 4 96 Boreoeutheria 96 Boreoeutheria
P84085 S 6 96 Boreoeutheria 96 Boreoeutheria
P84085 S 10 96 Boreoeutheria 96 Boreoeutheria
P84085 S 103 96 Boreoeutheria 824 Bilateria
P84085 S 137 96 Boreoeutheria 0 Homo sapiens
P84085 S 150 96 Boreoeutheria 96 Boreoeutheria
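As an example of working with a single annotation object, the base R snippet below rebuilds the rows shown above as a small data frame and lists the sites whose age estimate changes between the single-residue (w0) and windowed (w3) reconstructions. In practice you would query feature_ptmdb_age directly; the copy here only keeps the example self-contained, and the delta_mya column is an illustrative addition.

```r
# First rows of feature_ptmdb_age, copied from the table above
age <- data.frame(
  acc      = rep("P84085", 6),
  residue  = c("T", "S", "S", "S", "S", "S"),
  position = c(4L, 6L, 10L, 103L, 137L, 150L),
  w0_mya   = c(96, 96, 96, 96, 96, 96),
  w0_ancestor_name = rep("Boreoeutheria", 6),
  w3_mya   = c(96, 96, 96, 824, 0, 96),
  w3_ancestor_name = c("Boreoeutheria", "Boreoeutheria", "Boreoeutheria",
                       "Bilateria", "Homo sapiens", "Boreoeutheria"),
  stringsAsFactors = FALSE
)

# Sites whose inferred age differs when conservation is assessed over
# the +/-3 residue window instead of the single residue
changed <- age[age$w0_mya != age$w3_mya,
               c("position", "w0_mya", "w3_mya", "w3_ancestor_name")]
# Positive values: the window looks older; negative: younger
changed$delta_mya <- changed$w3_mya - changed$w0_mya
changed
```

Only S103 (older, Bilateria) and S137 (younger, Homo sapiens) change, which shows how the window can push the estimate in either direction.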

To train a model, first annotate the phosphoproteome with all the available features using the annotate_sites function.
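Conceptually, annotating a set of sites amounts to joining each feature table onto the sites by their shared identifiers. The base R sketch below illustrates that idea on toy data; it is not the implementation of annotate_sites, and the accessions, feature tables and columns (PROT_A, feature_a, score_a, and so on) are hypothetical.

```r
# Toy list of sites (hypothetical accessions)
sites <- data.frame(
  acc      = c("PROT_A", "PROT_A", "PROT_B"),
  position = c(10L, 25L, 7L),
  residue  = c("S", "T", "Y"),
  stringsAsFactors = FALSE
)

# Two toy feature tables keyed by acc and position (hypothetical values)
feature_a <- data.frame(acc = c("PROT_A", "PROT_A"), position = c(10L, 25L),
                        score_a = c(0.2, 0.9), stringsAsFactors = FALSE)
feature_b <- data.frame(acc = "PROT_B", position = 7L,
                        score_b = 3.5, stringsAsFactors = FALSE)

# Left-join every feature table onto the sites; sites without a value
# for a feature get NA in that column
annotated <- Reduce(
  function(left, feat) merge(left, feat, by = c("acc", "position"), all.x = TRUE),
  list(feature_a, feature_b),
  init = sites
)
annotated
```

Missing values introduced this way have to be handled by a later step, for instance by imputation.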

Model training

A preprocessing step must be run so that the features are passed to the model in a suitable form. By default, preprocess_features applies a standard series of preprocessing methods, which can be tuned with the methods= and features_to_exclude= arguments. Serine/threonine (“ST”) and tyrosine (“Y”) sites must be preprocessed separately, because some of the features are exclusive to one of the two sets.
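The following base R sketch illustrates the kind of work such a step does: it keeps one residue group, drops excluded features, imputes missing values with the median, and centres and scales each feature. toy_preprocess() is hypothetical; its features_to_exclude argument only mirrors the name of the funscoR argument, and median imputation plus scaling is an assumption, not a description of the defaults of preprocess_features.

```r
# Hypothetical helper (not funscoR's preprocess_features)
toy_preprocess <- function(df, residues, features_to_exclude = character(0)) {
  # Keep only the sites of the requested residue group (e.g. S and T)
  df <- df[df$residue %in% residues, , drop = FALSE]
  id_cols <- c("acc", "position", "residue")
  feats <- setdiff(names(df), c(id_cols, features_to_exclude))
  for (f in feats) {
    x <- df[[f]]
    # Median imputation (a feature that is entirely NA would stay NA)
    x[is.na(x)] <- median(x, na.rm = TRUE)
    s <- sd(x)
    # Centre and scale; constant or single-value features are only centred
    df[[f]] <- if (is.na(s) || s == 0) x - mean(x) else (x - mean(x)) / s
  }
  df[, c(id_cols, feats), drop = FALSE]
}

# Toy annotated sites; tyr_only is a hypothetical Y-specific feature
annotated <- data.frame(
  acc      = c("PROT_A", "PROT_A", "PROT_A", "PROT_B"),
  position = c(10L, 25L, 40L, 7L),
  residue  = c("S", "T", "S", "Y"),
  disorder = c(0.1, NA, 0.7, 0.5),
  tyr_only = c(NA, NA, NA, 1),
  stringsAsFactors = FALSE
)

st <- toy_preprocess(annotated, c("S", "T"), features_to_exclude = "tyr_only")
y  <- toy_preprocess(annotated, "Y", features_to_exclude = "disorder")
```

Because tyr_only is only defined for tyrosines, it is excluded from the ST set, which is the same reason ST and Y sites are processed separately.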

Once the features are ready, a model can be trained against a gold standard. The default algorithm is a Gradient Boosting Machine (GBM) with a series of hyperparameters optimized for the default dataset. Other algorithms and settings can be supplied through the parameters= argument. When ncores is greater than 1, training is parallelized with the doParallel package.
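funscoR's default learner is a GBM; to stay dependency-free, the sketch below swaps in a plain logistic regression (glm with a binomial family) on simulated data, only to show the shape of the task: sites labelled by gold-standard membership are used to fit a classifier on preprocessed features. The feature names and the simulated labels are hypothetical.

```r
set.seed(1)
n <- 500
# Simulated, already preprocessed features (hypothetical names)
train <- data.frame(disorder = rnorm(n), conservation = rnorm(n))

# Simulated labels: 1 = site in the gold standard, 0 = any other site.
# Conserved, less disordered sites are made more likely to be functional.
p <- plogis(-1 + 1.5 * train$conservation - 1 * train$disorder)
train$functional <- rbinom(n, 1, p)

# Stand-in classifier: logistic regression, NOT the GBM that funscoR uses
fit <- glm(functional ~ disorder + conservation, data = train,
           family = binomial)

# Predicted probability of being functional for each training site
train$score <- predict(fit, type = "response")
coef(fit)
```

A GBM can capture non-linear interactions between features that this linear model cannot, which is one reason to prefer it on a rich annotation set.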

Predicting functional scores

Given an annotated phosphoproteome with preprocessed features and a trained model, functional scores can be predicted separately for “ST” and “Y” sites.
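Because ST and Y sites have different feature sets, each group is scored with its own model and the results can then be stacked into a single table. The base R sketch below illustrates that pattern with logistic regression models standing in for funscoR's GBMs; all data, site labels and column names are simulated or hypothetical.

```r
set.seed(2)
# Simulated labelled training set with a single hypothetical feature
make_set <- function(n) {
  d <- data.frame(feature = rnorm(n))
  d$functional <- rbinom(n, 1, plogis(2 * d$feature))
  d
}
st_train <- make_set(300)
y_train  <- make_set(100)

# One model per residue group, since the ST and Y feature sets differ
st_model <- glm(functional ~ feature, data = st_train, family = binomial)
y_model  <- glm(functional ~ feature, data = y_train, family = binomial)

# New, unlabelled sites for each group (hypothetical identifiers)
st_new <- data.frame(site = c("PROT_A_S10", "PROT_A_T25"),
                     feature = c(-1, 1.5), stringsAsFactors = FALSE)
y_new  <- data.frame(site = "PROT_B_Y7", feature = 0.3,
                     stringsAsFactors = FALSE)

# Score each group with its own model, then stack the results
scores <- rbind(
  data.frame(site = st_new$site, group = "ST",
             score = predict(st_model, newdata = st_new, type = "response"),
             stringsAsFactors = FALSE),
  data.frame(site = y_new$site, group = "Y",
             score = predict(y_model, newdata = y_new, type = "response"),
             stringsAsFactors = FALSE)
)
scores
```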