Welcome to TeaProt!

The online proteomics/transcriptomics analysis pipeline featuring novel underrepresented PTM genesets.

Introduction

TeaProt is an online Shiny tool that integrates upstream transcription factor enrichment analysis with downstream pathway analysis through an easy-to-use interactive interface. TeaProt maps users' omics data to online databases to provide a collection of annotations on drug-gene interactions, subcellular localizations, phenotypic functions, gene-disease associations and enzyme-gene interactions, which are useful for further analyses. Combining TeaProt with urPTMdb gives an online proteomics/transcriptomics analysis pipeline featuring novel underrepresented PTM gene-sets, allowing the discovery of downstream cellular processes, upstream transcriptional regulation and classes of PTMs potentially regulated by a user's intervention.

Tutorial

1. Uploading your data

Convert the file to the right format
- accepted formats include '.csv', '.txt', '.xls', '.xlsx'
Make sure your file contains the following types of columns:
- Identifiers (gene names/ UniProt ID/ ENSEMBL ID)
- P-values
- Fold change values (log2)
Click “Download demo data” to see an example of the expected format
Once the above is checked, press “Browse” to upload your data
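
The column requirements above can be checked before uploading. The following is a minimal sketch in Python, not part of TeaProt itself: the column names `gene`, `pvalue` and `log2fc` are hypothetical (TeaProt lets you pick the actual columns after upload), and the check only covers '.csv' input.

```python
import csv
import io


def check_table(text, id_col="gene", p_col="pvalue", fc_col="log2fc"):
    """Return a list of problems found in a CSV table supplied as text.

    The default column names are illustrative assumptions only.
    """
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    problems = [f"missing column: {c}"
                for c in (id_col, p_col, fc_col) if c not in header]
    if problems:
        return problems
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        try:
            p_value = float(row[p_col])
            float(row[fc_col])
        except (TypeError, ValueError):
            problems.append(f"line {line_no}: non-numeric value")
            continue
        if not 0.0 <= p_value <= 1.0:
            problems.append(f"line {line_no}: p-value outside [0, 1]")
    return problems


example = "gene,pvalue,log2fc\nAkt1,0.01,1.5\nTp53,0.20,-0.3\n"
print(check_table(example))
```

An empty list means the table has the three required column types and numeric values where they are expected.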

2. Preparing for analysis

Select the identifier column from the drop-down box
Select the p-value column from the drop-down box
Select the fold change column from the drop-down box
Choose the species from which your data is sourced
Choose a p-value cut off as a determinant for significance
Choose a (log2) fold change cutoff as a determinant for significance
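
To illustrate how the two cutoffs work together, here is a small Python sketch. It is an assumption about the usual convention, not TeaProt's documented rule: the default cutoff values, the strict `<` on the p-value and the inclusive `>=` on the absolute log2 fold change are all illustrative choices.

```python
def classify(p_value, log2_fc, p_cutoff=0.05, fc_cutoff=1.0):
    """Label one measurement as 'up', 'down' or 'NS' (non-significant).

    Assumed rule: significant when the p-value is below the p-value cutoff
    AND the absolute log2 fold change reaches the fold-change cutoff.
    """
    if p_value < p_cutoff and abs(log2_fc) >= fc_cutoff:
        return "up" if log2_fc > 0 else "down"
    return "NS"


rows = [("Akt1", 0.01, 1.5), ("Tp53", 0.01, -2.0), ("Myc", 0.30, 3.0)]
for gene, p_value, log2_fc in rows:
    print(gene, classify(p_value, log2_fc))
```

Lowering the p-value cutoff or raising the fold-change cutoff moves more entries into the NS group.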

3. Start the analysis

Press “Start” to initiate analysis

4. View analysis

Press “Analysis” on the sidebar to view the results and annotated datasets

Browser compatibility

OS	version	Chrome	Firefox	Microsoft Edge	Safari
Linux	Ubuntu 20.04.1 LTS	87.0.4280.88	78.0.1	n/a	n/a
MacOS	10.13.6	87.0.4280.67	83.0	n/a	13.1.2
Windows	10	87.0.4280.88	83.0	87.0.664.55	n/a

Contact

For technical support, please email support@coffeeprot.com. To contact the Parker lab, please email ben.parker@unimelb.edu.au.

Citation

Molendijk J, Yip R, Parker BL. urPTMdb/TeaProt: Upstream and Downstream Proteomics Analysis. J Proteome Res. 2022 Jun 27. doi: 10.1021/acs.jproteome.2c00048. Epub ahead of print. PMID: 35759515.

Acknowledgements

This research was supported by use of the Nectar Research Cloud and by the University of Melbourne Research Platform Services. The Nectar Research Cloud is a collaborative Australian research platform supported by the National Collaborative Research Infrastructure Strategy. This work was funded by an Australian National Health and Medical Research Council Ideas Grant (APP1184363) and The University of Melbourne Driving Research Momentum program.

Human Protein Atlas subcellular localization data was obtained from http://www.proteinatlas.org and has previously been described in Thul PJ et al., A subcellular map of the human proteome. Science. (2017). Drug-gene interaction data was obtained from DGIdb (https://www.dgidb.org/downloads). Genotype-phenotype associations were downloaded from the International Mouse Phenotyping Consortium (IMPC, www.mousephenotype.org). Enzymatic annotations were retrieved from BRENDA (https://www.brenda-enzymes.org/). Disease-Gene annotations were retrieved from DisGeNet (https://www.disgenet.org/). Transcription factor data was downloaded from CHEA3 (maayanlab.cloud/chea3/). The DNA vector image used in the TeaProt banner on the Welcome page was obtained from Vecteezy (Human dna design Vectors by Vecteezy).

Demo data

Inputs

Choose CSV File

Browse...

Choose ID column

Choose p-value column

Choose log2 fold-change column

Choose Species

Choose a p-value cutoff:

Choose a (log2) fold-change cutoff:

urPTMdb

The underrepresented PTM gene-set database.

urPTMdb

urPTMdb is a database of gene-sets covering currently underrepresented post-translational modifications (PTMs). Previously published studies and datasets (PRIDE / MassIVE) are analyzed to identify substrates or interactions relating to PTMs. We have analyzed the results of 58 studies, generating 141 gene-sets covering 18 underrepresented PTMs. Additionally, we generated pathway gene-sets of the primary enzymes involved in the PTMs, as well as consensus gene-sets where replicate studies were available.

Citation

Code access

The code to generate urPTMdb is accessible at github.com/JeffreyMolendijk/urPTMdb.

Using urPTMdb

urPTMdb is included as an option in the fgsea tab of TeaProt for analysis of your uploaded dataset. Alternatively, urPTMdb can be downloaded for use in external tools by clicking the download button on the right. urPTMdb is provided in ‘.gmt’ format.
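
For use in external tools, a '.gmt' file can be read with a few lines of code. The sketch below assumes the standard GMT layout (one gene-set per line, tab-separated: set name, description, then the genes); the set names and genes in the example are made up and are not taken from urPTMdb.

```python
def parse_gmt(text):
    """Parse GMT-formatted text into a dict of {set name: set of genes}."""
    gene_sets = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, _description, *genes = fields
        gene_sets[name] = {g for g in genes if g}
    return gene_sets


# Hypothetical two-line GMT content, for illustration only.
example = (
    "SET_A\texample description\tAKT1\tTP53\tMYC\n"
    "SET_B\texample description\tTP53\tEGFR\n"
)
for name, genes in parse_gmt(example).items():
    print(name, sorted(genes))
```

To read a downloaded file, pass its contents to the function, for example `parse_gmt(open(path).read())` where `path` is wherever you saved the download.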

Download urPTMdb

Number of studies:	58
Number of PTMs:	18
Number of gene-sets:	141
Filesize:	1,188 KB

urPTMdb is generated by analyzing the genes reported by many studies to create novel PTM-related gene-sets. urPTMdb is provided in three versions: one containing the original identifiers, and two in which genes from other species have been converted to the species of interest. It is recommended to download the version for the species you plan to analyze. In TeaProt, the version used is determined by the species selected at the start of the analysis.

urPTMdb Original - Contains the gene identifiers as reported in the original studies
urPTMdb Human - All mouse genes have been converted to human homologs using homologene
urPTMdb Mouse - All human genes have been converted to mouse homologs using homologene

Download urPTMdb - Original
Download urPTMdb - Human
Download urPTMdb - Mouse

Browse geneset

Select a geneset

Jaccard
Szymkiewicz–Simpson
Geneset network

Gene-set Jaccard index network of urPTMdb. The Jaccard index indicates the similarity between two gene-sets, where the connected nodes have an index > 0.15. For more information regarding this metric, please visit https://en.wikipedia.org/wiki/Jaccard_index.

Gene-set Szymkiewicz–Simpson coefficient network of urPTMdb. The Szymkiewicz–Simpson coefficient indicates the similarity between two gene-sets, where the connected nodes have a coefficient > 0.6. For more information regarding this metric, please visit https://en.wikipedia.org/wiki/Overlap_coefficient.
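
Both similarity metrics are simple to compute. The Jaccard index is |A ∩ B| / |A ∪ B| and the Szymkiewicz–Simpson (overlap) coefficient is |A ∩ B| / min(|A|, |B|). The Python sketch below applies the thresholds quoted above (0.15 and 0.6) to decide which gene-sets are connected; the gene-sets are made up, and this is not the code used to draw the networks.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index: shared genes divided by all distinct genes."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_coefficient(a, b):
    """Szymkiewicz-Simpson coefficient: shared genes over the smaller set."""
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


def network_edges(gene_sets, metric, threshold):
    """Return pairs of gene-set names whose similarity exceeds the threshold."""
    return [(x, y) for x, y in combinations(sorted(gene_sets), 2)
            if metric(gene_sets[x], gene_sets[y]) > threshold]


# Hypothetical gene-sets, for illustration only.
sets = {
    "A": {"AKT1", "TP53", "MYC", "EGFR"},
    "B": {"MYC", "EGFR", "KRAS", "PTEN"},
    "C": {"MYC", "EGFR"},
}
print(network_edges(sets, jaccard, 0.15))
print(network_edges(sets, overlap_coefficient, 0.6))
```

The overlap coefficient reaches 1 whenever one gene-set is fully contained in another, which is why the two networks can differ for gene-sets of very different sizes.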

About the Table

User-uploaded input data is annotated with information from various sources. The annotated table contains information on:

Drug Interaction
Cell ontology
Associated disease

Export options are available at the bottom of the table

Annotated table

About the Analysis

Analyses are performed on the p-values and fold-changes of your data.

Bar graphs that show the distribution of p-values and fold-changes in the data
Volcano plot that shows the fold-changes and corresponding p-values of each data point

(Hover over each data point to view the exact values)
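
As a sketch of how a volcano plot places each data point, the Python snippet below converts a p-value and a log2 fold change into plot coordinates. Using -log10(p) on the y-axis is the common convention and is assumed here; the page does not state the exact transformation TeaProt uses.

```python
import math


def volcano_point(p_value, log2_fc):
    """Return (x, y) coordinates: log2 fold change and -log10 of the p-value."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p-value must be in (0, 1]")
    return log2_fc, -math.log10(p_value)


for p_value, log2_fc in [(0.001, 2.0), (0.5, -0.2)]:
    print(volcano_point(p_value, log2_fc))
```

Smaller p-values therefore appear higher on the plot, and larger absolute fold changes appear further from the center.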

Distributions

Volcano plot

About the Analysis

Your data is mapped to online databases to provide annotations. For each set of graphs below, your data is mapped to a different database. The first graph of each section displays the number of genes that could be annotated by the mapped database. The second graph displays the annotation

Drug-gene interaction

Subcellular localization

IMPC procedure

DisGeNet disease

BRENDA enzymatic reactions

About the Analysis

Analyses are performed to relate changes in gene expression to several annotations, including (1) subcellular localization, (2) DisGeNet, (3) drug-gene interactions and (4) International Mouse Phenotyping Consortium (IMPC) genotype-phenotype associations. A Pearson’s Chi-squared test based on protein annotations (subcellular localization) indicates whether specific annotations are primarily found in upregulated, downregulated or non-significant (NS) proteins. Only localizations with positive residuals in the upregulated group are shown. The data in the figure are colored by Pearson residuals and sized by the absolute Pearson residuals.
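
Pearson residuals come from a contingency table of annotation by regulation group: for each cell, residual = (observed - expected) / sqrt(expected), where expected = row total × column total / grand total, and the Chi-squared statistic is the sum of the squared residuals. The Python sketch below uses made-up counts; how TeaProt builds its table is not described on this page, so the layout (one row per localization, columns for up, down and NS) is an assumption.

```python
import math


def pearson_residuals(table):
    """Compute Pearson residuals and the Chi-squared statistic.

    `table` maps a row label to a list of observed counts, one per column.
    """
    rows = list(table)
    n_cols = len(table[rows[0]])
    row_totals = {r: sum(table[r]) for r in rows}
    col_totals = [sum(table[r][j] for r in rows) for j in range(n_cols)]
    grand = sum(row_totals.values())
    residuals = {}
    chi_squared = 0.0
    for r in rows:
        residuals[r] = []
        for j in range(n_cols):
            expected = row_totals[r] * col_totals[j] / grand
            # A cell with zero expected count carries no information.
            res = (table[r][j] - expected) / math.sqrt(expected) if expected else 0.0
            residuals[r].append(res)
            chi_squared += res ** 2
    return residuals, chi_squared


# Hypothetical counts per localization: [up, down, NS].
counts = {"Nucleus": [30, 10, 60], "Mitochondria": [10, 30, 60]}
residuals, chi_squared = pearson_residuals(counts)
print(residuals, chi_squared)
```

A positive residual in the first column means the annotation occurs among upregulated proteins more often than expected by chance.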

Subcellular Localizations

DisGeNet

Drug-gene interactions

IMPC genotype-phenotype Associations

About the Analysis

This analysis depends on the fold-change values in your data. The graph displays the most enriched biological pathways associated with differential expression. In the input section, choose the gene-set collection on which to base the analysis. After running the analysis, the results are displayed in the following tabs:

panel: Image showing the top x positively and negatively enriched pathways
table: Table showing all fgsea results
volcano: Volcano plot showing the p-value and NES of each tested geneset
single: Tab showing fgsea enrichment and coloured volcano plot for a single geneset of interest
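
fgsea is an R package for preranked gene-set enrichment analysis. To give a feel for what the enrichment plot shows, the Python sketch below computes a classic GSEA-style running-sum enrichment score for one gene-set over genes ranked by fold change. It is a simplified illustration, not the fgsea implementation: it does not compute the NES or p-values, and the gene names and values are made up.

```python
def enrichment_score(ranked, gene_set, weight=1.0):
    """Running-sum enrichment score over (gene, statistic) pairs.

    Genes are sorted by statistic (highest first). Each gene in the set
    raises the running sum in proportion to |statistic| ** weight; each
    gene outside the set lowers it by a constant step. The score is the
    running-sum value furthest from zero.
    """
    ranked = sorted(ranked, key=lambda pair: pair[1], reverse=True)
    hit_weights = [abs(s) ** weight for g, s in ranked if g in gene_set]
    total_hit = sum(hit_weights)
    n_miss = len(ranked) - len(hit_weights)
    if total_hit == 0 or n_miss == 0:
        return 0.0  # score is undefined without both hits and misses
    running, best = 0.0, 0.0
    for gene, stat in ranked:
        if gene in gene_set:
            running += abs(stat) ** weight / total_hit
        else:
            running -= 1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical log2 fold changes, for illustration only.
ranked = [("A", 3.0), ("B", 2.0), ("C", 1.0),
          ("D", -1.0), ("E", -2.0), ("F", -3.0)]
print(enrichment_score(ranked, {"A", "B"}))
print(enrichment_score(ranked, {"E", "F"}))
```

A gene-set concentrated at the top of the ranking gets a positive score, and one concentrated at the bottom gets a negative score, which corresponds to the positively and negatively enriched pathways in the panel tab.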

input

Choose gene-sets

Choose Number of Pathways to display