From OpenTutorials

This tutorial presents an analysis workflow for combining proteomics and transcriptomics data in GenMAPP-CS. It shows how two datasets that correlate poorly at the level of individual genes can be integrated and analyzed in the context of pathways. The data used in this tutorial comes from a study that measured both protein abundance (mass spectrometry) and transcript abundance (expression arrays) during maturation of human dendritic cells. The data is described in detail in the associated publication.

To download GenMAPP-CS, please visit our website.

Loading data

GenMAPP-CS can import any number of datasets, and each dataset can use its own identifier type and contain its own kinds of data. This tutorial uses two datasets, both annotated with Entrez gene identifiers:

Transcriptomics data

The transcript-level data comes from Affymetrix expression arrays (HG U133 Plus 2.0). It has been summarized at the gene level and annotated with Entrez gene IDs. The data looks like this:

Entrez_ID   mRNA ratio mDC/iDC
1           1.47
2           0.31
9           0.64
10          1

The expression data is available for download here: File:Buschow mRNA.txt.

Proteomics data

Protein abundance data was collected using mass spectrometry, and after peptide identification the data was converted to IPI and Entrez gene identifiers using the biomaRt Bioconductor package. The data looks like this:

Entrez_ID  ensembl_Id       ipi_id         empai_immature  empai_mature  protein ratio mDC/iDC  result_type  Detected in:
2          ENSG00000175899  IPI00478003.1  0.57            0.03          0.05                   estimated    iDC
9          ENSG00000171428  IPI00644361.2  0.13            0.15          1.22                   estimated    iDC
16         ENSG00000090861  IPI00027442.4  0.73            0.93          1.27                   detected     both
22         ENSG00000131269  IPI00306748.1  0.06            0.12          1.91                   estimated    mDC

The proteomics data is available for download here: File:Buschow protein.txt.
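Because both files carry an Entrez_ID column, genes can be matched across the two datasets. As a rough illustration outside GenMAPP-CS, the Python sketch below parses the two tables and pairs up the ratios of genes measured in both. It assumes the downloaded files are tab-delimited with a single header row, as the excerpts suggest; the function names are hypothetical, and the sample rows are just the excerpts shown above.

```python
import csv
from typing import Dict, Iterable, Tuple

# Excerpts copied from the tables above. The real files are ASSUMED to be
# tab-delimited with one header row.
MRNA_EXCERPT = [
    "Entrez_ID\tmRNA ratio mDC/iDC",
    "1\t1.47",
    "2\t0.31",
    "9\t0.64",
    "10\t1",
]
PROTEIN_EXCERPT = [
    "\t".join(["Entrez_ID", "ensembl_Id", "ipi_id", "empai_immature", "empai_mature",
               "protein ratio mDC/iDC", "result_type", "Detected in:"]),
    "\t".join(["2", "ENSG00000175899", "IPI00478003.1", "0.57", "0.03", "0.05", "estimated", "iDC"]),
    "\t".join(["9", "ENSG00000171428", "IPI00644361.2", "0.13", "0.15", "1.22", "estimated", "iDC"]),
    "\t".join(["16", "ENSG00000090861", "IPI00027442.4", "0.73", "0.93", "1.27", "detected", "both"]),
    "\t".join(["22", "ENSG00000131269", "IPI00306748.1", "0.06", "0.12", "1.91", "estimated", "mDC"]),
]


def read_table(lines: Iterable[str], key: str = "Entrez_ID") -> Dict[str, dict]:
    """Parse a tab-delimited table into {Entrez ID: row as a dict} (hypothetical helper)."""
    reader = csv.DictReader(lines, delimiter="\t")
    return {row[key].strip(): row for row in reader}


def join_on_gene(mrna: Dict[str, dict], protein: Dict[str, dict]) -> Dict[str, Tuple[float, float]]:
    """Pair the mRNA and protein ratios of genes present in both tables, ordered by Entrez ID."""
    shared = sorted(mrna.keys() & protein.keys(), key=int)
    return {
        gene: (float(mrna[gene]["mRNA ratio mDC/iDC"]),
               float(protein[gene]["protein ratio mDC/iDC"]))
        for gene in shared
    }


if __name__ == "__main__":
    pairs = join_on_gene(read_table(MRNA_EXCERPT), read_table(PROTEIN_EXCERPT))
    print(pairs)  # {'2': (0.31, 0.05), '9': (0.64, 1.22)}
```

For the real files, pass an open file handle instead of an excerpt, for example `with open("Buschow_mRNA.txt", newline="") as f: mrna = read_table(f)`; the exact file name on disk is an assumption.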

  • In the Database panel of Workspaces, select the human database, since the data for this tutorial is from human dendritic cells. Detailed instructions on how to select a database are available in the Expression Analysis tutorial.
  • Import the two data files one after the other with Import dataset from table... in the Actions menu. For detailed instructions on using the GenMAPP-CS Dataset Import, refer to the Expression Analysis tutorial.

Creating coloring criteria

  • Create two Criteria Sets, one for the expression data and one for the proteomics data, using the same ratio cutoffs for both data types:
      Expression data
        • mRNA up 2-fold: mRNA ratio mDC/iDC > 2 → orange
        • mRNA down 2-fold: mRNA ratio mDC/iDC < 0.5 → light blue
      Proteomics data
        • protein up 2-fold: protein ratio mDC/iDC > 2 → red
        • protein down 2-fold: protein ratio mDC/iDC < 0.5 → blue
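Each criterion is a simple threshold on one ratio column. The sketch below restates these cutoffs in Python to make the logic explicit; it is not how GenMAPP-CS evaluates criteria internally, and the dictionary and function names are hypothetical. Because the up (> 2) and down (< 0.5) cutoffs cannot both be met, a ratio matches at most one criterion per set.

```python
import operator
from typing import Optional, Tuple

# Cutoffs and colors from the two criteria sets above; colors are plain labels here.
CRITERIA_SETS = {
    "mRNA": [
        ("mRNA up 2-fold", operator.gt, 2.0, "orange"),
        ("mRNA down 2-fold", operator.lt, 0.5, "light blue"),
    ],
    "protein": [
        ("protein up 2-fold", operator.gt, 2.0, "red"),
        ("protein down 2-fold", operator.lt, 0.5, "blue"),
    ],
}


def matching_criterion(ratio: float, criteria_set: str) -> Optional[Tuple[str, str]]:
    """Return (criterion name, color) for the criterion a ratio meets, or None if it meets none."""
    for name, compare, cutoff, color in CRITERIA_SETS[criteria_set]:
        if compare(ratio, cutoff):
            return name, color
    return None
```

Applied to the excerpts, `matching_criterion(0.05, "protein")` gives `("protein down 2-fold", "blue")` for Entrez gene 2, while `matching_criterion(1.22, "protein")` returns `None` for gene 9. Note that a ratio of exactly 2 does not count as up-regulated, because the cutoffs are strict inequalities.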

The two criteria sets should look like this:

(Figure: the protein and mRNA criteria sets)

Over-representation analysis with GO-Elite

  • Set up and run GO-Elite for the "protein up 2-fold" criterion, making sure to select "EntrezGene" as the Primary ID System. Note: GO-Elite analysis currently works on one criterion at a time; support for selecting all criteria for analysis is being added. The GO-Elite interface should look like this:

(Figure: GO-Elite interface)

  • Repeat the GO-Elite analysis for the "protein down 2-fold" criterion, and then for both criteria in the "mRNA" criteria set.

For details on how to run GO-Elite, please refer to the GO-Elite tutorial.
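GO-Elite performs the statistics for you. To show the question an over-representation test asks, here is a minimal sketch of a one-sided hypergeometric test for a single pathway: given how many measured genes meet a criterion overall, do more genes in the pathway meet it than chance would predict? This is a generic textbook statistic, not a reimplementation of GO-Elite, whose reported scores may be computed differently; the function name is hypothetical.

```python
from math import comb


def overrepresentation_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric p-value, P(X >= k).

    N: genes measured in the whole dataset
    K: measured genes that meet the criterion (e.g. "protein up 2-fold")
    n: measured genes annotated to the pathway or GO term
    k: pathway genes that meet the criterion
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("counts must satisfy 0 <= k <= n <= N and 0 <= K <= N")
    # math.comb returns 0 when asked to choose more items than exist,
    # so impossible overlaps contribute nothing to the sum.
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / comb(N, n)
```

For example, if 5 of 10 measured genes meet a criterion and all 5 genes of a 5-gene pathway are among them, `overrepresentation_p(5, 5, 5, 10)` is 1/252, about 0.004. A small p-value suggests that the pathway contains more criterion-meeting genes than would be expected by chance.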

Once the GO-Elite analysis is complete, the Results Panel will have four tabs, one for each criterion in the two criteria sets. The pathway results for the 2-fold up criteria in the protein and mRNA datasets look like this:

(Figures: GO-Elite results for the protein data and for the mRNA data)

Comparing the lists of pathways and GO terms between the two datasets, we can see the following:

  1. For the 2-fold up criteria, several of the top pathway hits are shared between the two datasets.
  2. Several of the up-regulated pathways represent the same processes as those identified in the publication, and these pathways include several known DC maturation markers (CD86, HLA class I and II).
  3. The GO results for the up-regulated transcripts and proteins reveal many immune-related processes, such as cytokine- and inflammation-related processes.
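The first observation can be quantified by comparing the top of the two ranked result lists. A minimal sketch, assuming you already have the pathway names from each Results Panel tab as Python lists in rank order (however you obtain them); the cutoff `k` and the use of the Jaccard index are illustrative choices, not part of GO-Elite.

```python
from typing import List, Sequence, Tuple


def top_k_overlap(protein_hits: Sequence[str], mrna_hits: Sequence[str],
                  k: int = 10) -> Tuple[List[str], float]:
    """Return the terms shared by the top-k of two ranked lists and their Jaccard index."""
    top_protein, top_mrna = set(protein_hits[:k]), set(mrna_hits[:k])
    shared = top_protein & top_mrna
    union = top_protein | top_mrna
    # Jaccard index: shared terms divided by all distinct terms in either top-k list.
    jaccard = len(shared) / len(union) if union else 0.0
    return sorted(shared), jaccard
```

With placeholder names, `top_k_overlap(["pathway A", "pathway B", "pathway C"], ["pathway B", "pathway D", "pathway C"], k=3)` returns `(["pathway B", "pathway C"], 0.5)`.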

Explore data on networks

  • Open the "Toll-like receptor signaling pathway" by clicking on it in the GO-Elite results list.
  • In the Criteria Set panel of Workspaces, right-click either of the two criteria sets and select Combine All Criteria.

(Figure: Toll-like receptor signaling pathway)
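Combining the criteria lets a single pathway view show both data types at once. The sketch below models that idea at the gene level under a stated assumption: a gene is tagged with every criterion it meets in either set, and a missing measurement (`None`) contributes no tag. It does not model how GenMAPP-CS actually draws the combined colors on a node, and the function name is hypothetical.

```python
from typing import List, Optional


def combined_criteria(mrna_ratio: Optional[float], protein_ratio: Optional[float]) -> List[str]:
    """List every 2-fold criterion a gene meets across the mRNA and protein criteria sets."""
    labels = []
    for prefix, ratio in (("mRNA", mrna_ratio), ("protein", protein_ratio)):
        if ratio is None:
            continue  # gene not measured in this dataset
        if ratio > 2:
            labels.append(f"{prefix} up 2-fold")
        elif ratio < 0.5:
            labels.append(f"{prefix} down 2-fold")
    return labels
```

With the excerpt values, Entrez gene 2 (mRNA 0.31, protein 0.05) gives `["mRNA down 2-fold", "protein down 2-fold"]`, while gene 1, which has mRNA data only, gives `[]`.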

As noted in the related publication, there is little overall correlation between differential mRNA and protein expression at the level of individual genes and proteins. However, the two data types are complementary when analyzed in the context of pathways.