Tutorial:ExpressionAnalysisGenMAPP-CS

From OpenTutorials

This tutorial walks through a basic expression data analysis workflow in GenMAPP-CS. The example data is a subset of a microarray comparison of mouse embryonic stem (ES) cells and differentiated embryoid bodies (EB). The data file is available here: File:MusMusculus ES-EB 25000.txt.

To download GenMAPP-CS, please visit our website.

Loading data

GenMAPP-CS accepts data in any spreadsheet-like format. The data can be annotated with identifiers from any of the ID systems GenMAPP-CS supports, which include gene and protein identifiers from all major databases. The data may include one or more columns of identifiers and any number of columns of experimental data. To see the expected format, open the file MusMusculus_ES-EB_25000.txt. The first few lines of the file are as follows:

probeset id	ES avg	        EB avg	        log Fold	Fold	        p-value
1429654_at	9.482740455	2.473083185	-7.00965727	-128.8596946	2.83E-08
1418362_at	11.56708277	5.098371531	-6.468711242	-88.56785302	2.20E-08
1429701_at	10.09899575	4.205961367	-5.893034379	-59.42649463	3.36E-07
1420085_at	7.880958624	1.989590081	-5.891368542	-59.35791625	3.37E-10

  1. The first line contains column headers.
  2. The first column contains Affymetrix probe set identifiers.
  3. The remaining columns contain the average expression values for the ES and EB groups, the log fold change, the linear fold change and the p-value for the EB vs ES comparison.
  4. All columns are separated by a single tab character.
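The column layout can be checked programmatically. The Python sketch below reads the rows shown above with the standard csv module and confirms how the derived columns relate to the averages. It assumes, based on the numbers in the excerpt rather than on any statement in this tutorial, that the averages are log2 expression values, that log Fold is EB avg minus ES avg, and that Fold is the signed linear fold change (2 raised to the absolute log fold, negative for decreases). The helper names are hypothetical.

```python
import csv
import io
import math

# First rows of MusMusculus_ES-EB_25000.txt, copied from the excerpt above.
SAMPLE = (
    "probeset id\tES avg\tEB avg\tlog Fold\tFold\tp-value\n"
    "1429654_at\t9.482740455\t2.473083185\t-7.00965727\t-128.8596946\t2.83E-08\n"
    "1418362_at\t11.56708277\t5.098371531\t-6.468711242\t-88.56785302\t2.20E-08\n"
    "1429701_at\t10.09899575\t4.205961367\t-5.893034379\t-59.42649463\t3.36E-07\n"
    "1420085_at\t7.880958624\t1.989590081\t-5.891368542\t-59.35791625\t3.37E-10\n"
)

def read_dataset(handle):
    """Read a tab-delimited table whose first line holds the column headers."""
    return list(csv.DictReader(handle, delimiter="\t"))

def signed_fold(log_fold):
    """Convert a log2 fold change to a signed linear fold change (assumed convention)."""
    magnitude = 2 ** abs(log_fold)
    return magnitude if log_fold >= 0 else -magnitude

rows = read_dataset(io.StringIO(SAMPLE))
for row in rows:
    es, eb = float(row["ES avg"]), float(row["EB avg"])
    log_fold = eb - es  # assumes both averages are log2 expression values
    assert math.isclose(log_fold, float(row["log Fold"]), abs_tol=1e-6)
    assert math.isclose(signed_fold(log_fold), float(row["Fold"]), rel_tol=1e-6)
    print(row["probeset id"], round(log_fold, 3), round(signed_fold(log_fold), 1))
```

Under these assumptions a negative log Fold means lower expression in EB than in ES, which is how the coloring criteria later in this tutorial interpret it.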

Select a database

GenMAPP-CS uses species-specific databases, either downloaded as local files or accessed remotely through BridgeDb web services. Before importing data, select the appropriate database for your data. For best performance, we recommend downloading the relevant species databases as local files.

  • In the Database section of the Workspaces panel, select the appropriate species for your data.

database

  • If you don't have a local copy of the database, a status message indicates that GenMAPP-CS is connected to the database through web services.
  • To download a local copy of the database, click the download arrow button.
  • Once the download is complete, the status message displays the name of the local database file.

database

  • If you are working with an outdated local version of the database, the file name appears in red and the download arrow appears green. Click the download button to download the latest version.
  • If you are working offline with a local database file, the download button is replaced by a reload button, indicating that there is no connection to web services. If your internet connection status changes, click the reload button to retry the connection.
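When no local database is present, identifier mapping goes through BridgeDb web services. As a rough illustration of what such a request looks like, the Python sketch below builds a query for the public BridgeDb REST service and parses its tab-delimited reply. The base URL, the /{organism}/xrefs/{system code}/{identifier} path, the system code X for Affymetrix and the "identifier, tab, data source" reply format are assumptions based on the public BridgeDb service, not on GenMAPP-CS internals; the helper names are hypothetical.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Assumed base URL of the public BridgeDb REST service.
BASE_URL = "https://webservice.bridgedb.org"

def xrefs_url(organism, system_code, identifier):
    """Build an 'xrefs' request URL for one identifier (assumed path layout)."""
    return f"{BASE_URL}/{quote(organism)}/xrefs/{quote(system_code)}/{quote(identifier)}"

def parse_xrefs(text):
    """Parse reply lines of the form 'identifier<TAB>data source'."""
    pairs = []
    for line in text.splitlines():
        if "\t" in line:
            ident, source = line.split("\t", 1)
            pairs.append((ident, source))
    return pairs

def fetch_xrefs(organism, system_code, identifier, timeout=10):
    """Query the live service; requires network access."""
    with urlopen(xrefs_url(organism, system_code, identifier), timeout=timeout) as resp:
        return parse_xrefs(resp.read().decode("utf-8"))

# Offline usage: build the URL for the first probe set in the example file.
print(xrefs_url("Mouse", "X", "1429654_at"))
# Online usage (not run here): fetch_xrefs("Mouse", "X", "1429654_at")
```

A local database avoids one such round trip per identifier, which is one reason local files are recommended for performance.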

Configure database resources

You can customize the database resources that GenMAPP-CS uses, for example by adding mapping relationships from text files. Configuring database resources is described in the manual.

Import your data

  • In the Actions drop-down menu, select Import dataset from table....

import data

  • In the GenMAPP-CS Dataset Import manager, click the Select File(s) button and select the file you want to import.
  • Your data file will be displayed in the Preview section of the import manager.

GenMAPP-CS guesses the type of identifier in the first column of your data (the ID column) and uses it as the basis for mapping your data to the database during import. However, you control which column is used for mapping, as well as several other aspects of the import:

  1. To use a different column for identifiers, select it from the ID column drop-down. GenMAPP-CS will automatically guess the type of identifier in the selected column. Always double-check the type drop-down to make sure that the correct identifier type is selected.
  2. To change the identifier type for the column selected in the ID column drop-down, select the appropriate type from the type drop-down.
  3. To exclude specific columns from import, click the column header in the Preview. The check mark icon in the column header changes to an x, indicating that the column will not be imported.
  4. Additional advanced import options are available under Show additional options. These let you skip rows by starting the import on a specific line or by excluding rows that begin with a specific pattern. You can also change the delimiter for your file, for example if the Preview displays your data incorrectly because the wrong delimiter was assigned.
  • Click Import to start the import. A status bar appears while the import is in progress; once the import is complete, it reports how many rows of data were successfully imported and whether there were any problems. Your dataset will appear in the Dataset panel in Workspaces.

dataset
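The identifier guessing and import options described above can be approximated in a few lines of Python. This is only a sketch: the regular expressions in ID_PATTERNS, the helper names and the order in which the options are applied are hypothetical and are not taken from GenMAPP-CS, whose detection logic is likely more thorough.

```python
import csv
import io
import re

# Hypothetical patterns for a few ID systems (illustration only).
ID_PATTERNS = {
    "Affymetrix": re.compile(r"^\d+(_[a-z])?_at$"),
    "Ensembl Mouse": re.compile(r"^ENSMUSG\d{11}$"),
    "Entrez Gene": re.compile(r"^\d+$"),
}

def guess_id_type(values):
    """Return the ID system whose pattern matches the most values, or None."""
    scores = {name: sum(bool(p.match(v)) for v in values)
              for name, p in ID_PATTERNS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

def import_table(text, delimiter="\t", start_line=1, skip_prefix=None,
                 excluded_columns=()):
    """Approximate the import options: start line, row-exclusion prefix,
    delimiter and excluded columns (hypothetical semantics)."""
    lines = text.splitlines()[start_line - 1:]
    if skip_prefix:
        lines = [ln for ln in lines if not ln.startswith(skip_prefix)]
    reader = csv.DictReader(io.StringIO("\n".join(lines)), delimiter=delimiter)
    return [{k: v for k, v in row.items() if k not in excluded_columns}
            for row in reader]

# The first line is a made-up comment line, excluded via skip_prefix.
SAMPLE = (
    "# hypothetical comment line\n"
    "probeset id\tES avg\tEB avg\tlog Fold\tFold\tp-value\n"
    "1429654_at\t9.482740455\t2.473083185\t-7.00965727\t-128.8596946\t2.83E-08\n"
    "1418362_at\t11.56708277\t5.098371531\t-6.468711242\t-88.56785302\t2.20E-08\n"
)

rows = import_table(SAMPLE, skip_prefix="#", excluded_columns=("Fold",))
id_type = guess_id_type([row["probeset id"] for row in rows])
print(id_type, sorted(rows[0]))
```

Scoring every candidate pattern against the whole column, rather than a single value, is what makes it worth double-checking the guessed type: a column of plain numbers, for example, would match the Entrez Gene pattern whether or not it really holds Entrez IDs.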

Creating coloring criteria

Before loading networks, create custom coloring criteria for viewing your data on networks. Coloring criteria are organized into criteria sets, each containing one or more logical rules for one or more columns of data. We are going to create a criteria set named "EB vs ES" with two criteria for the EB vs ES comparison: up-regulated more than two-fold and down-regulated more than two-fold (a two-fold change corresponds to a log2 fold change of 1):

up-regulated: log fold > 1 → green
down-regulated: log fold < -1 → blue
  • Select Create new criteria set... from the Actions drop-down. You will be prompted to name the new criteria set:

criteria set

  • Type in a name for the criteria set and press Enter to open the Criteria Builder.
  • In the Criteria section, double-click the first row of the Label column, where it says "Label1", and replace it with the label for the first criterion.
  • Click in the first row of the Expression column. This activates the Expression Editor.
  • To create the logical rule, click to select the appropriate attribute in the Attributes list and select logical operators from the Operators list. For numerical criteria, type in the threshold values.
  • Back in the Criteria section, double-click the first row of the Value column and select a color from the color chooser. The first criterion is now ready.
  • In the Criteria section, click the Add button to add a new row to the criteria list. Repeat the steps above to create any additional criteria.
  • Click Save to save the criteria set, then either close the Criteria Builder or continue with the next criteria set.

The Criteria Builder should now look like this:

criteria builder
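The logic of the "EB vs ES" criteria set can be expressed as a small Python function. The sketch assumes that a row takes the color of the first criterion it satisfies and stays uncolored otherwise; because the two rules here cannot both be true, the order does not matter for this set. The function and variable names are hypothetical, and every row other than 1429654_at is made up for illustration.

```python
from typing import Callable, Optional

# (label, rule on a data row, color); mirrors the "EB vs ES" criteria set.
Criterion = tuple[str, Callable[[dict], bool], str]

EB_VS_ES: list[Criterion] = [
    ("up-regulated", lambda row: row["log Fold"] > 1, "green"),
    ("down-regulated", lambda row: row["log Fold"] < -1, "blue"),
]

def color_for(row: dict, criteria: list[Criterion]) -> Optional[str]:
    """Return the color of the first criterion the row satisfies (assumed
    first-match behavior), or None if no criterion applies."""
    for _label, rule, color in criteria:
        if rule(row):
            return color
    return None

rows = [
    {"probeset id": "1429654_at", "log Fold": -7.00965727},  # from the example file
    {"probeset id": "hypothetical_up", "log Fold": 1.5},     # made-up row
    {"probeset id": "hypothetical_flat", "log Fold": 0.4},   # made-up row
]
for row in rows:
    print(row["probeset id"], color_for(row, EB_VS_ES))
```

The rules use strict inequalities, so a value of exactly 1 or -1 (exactly two-fold) is left uncolored.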

Loading networks from WikiPathways

GenMAPP-CS can access pathways directly from WikiPathways through a search interface where you can search for pathway names or node names. For the "EB vs ES" dataset, we are going to look for pathways involved in pluripotency, using "pluripotency" as the search term. The instructions below describe loading networks from WikiPathways. GenMAPP-CS can also load networks in other formats, described here.

  • To load a network from WikiPathways, select Load network from web... from the Actions drop-down. This brings up an interface for importing networks from the WikiPathways web service.
  • Select the appropriate species in the organism drop-down.
  • Type in your search term in the Search field.
  • Search results will be displayed as a list. Simply double-click on any entry to open the network.
  • To make it easy to load additional networks, the WikiPathways web service interface will remain open until you decide to close it.

load network

  • In the Criteria set panel of Workspaces, right-click the criteria set and select Apply to All Networks. This applies the coloring criteria to both loaded networks.

GenMAPP-CS should now look like this:

colored networks

Explore data on networks

Looking at the "Wnt Signaling and Pluripotency" network, there are a couple of interesting observations:

  1. The lower left section of the network involving Nanog is heavily down-regulated in embryoid bodies, indicating a shift away from pluripotency processes.
  2. The lower right section of the network, containing Jun, Ccnd2 and other factors that drive differentiation, is up-regulated.

wnt network zoom

Backpage

GenMAPP-CS gives you access to detailed information about each network node in the Backpage.

  • Click on any node in the network to select it. In the right-side Results Panel, the corresponding Backpage is displayed:

backpage

Starting at the top, the information displayed in the Backpage is:

  1. The node label.
  2. The identifier assigned to the node and the corresponding database name.
  3. All linked identifiers from other databases, including gene and protein identifiers, microarray probes, GO terms and WikiGenes articles.
  4. A table showing the coloring for the node based on the currently selected criteria set. If multiple probes link to a node, the table displays one row for each probe.
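The per-probe table in item 4 can be pictured as follows: each probe linked to a node gets its own row, and each row is colored independently by the active criteria set. In this Python sketch the node name, probe names and values are hypothetical placeholders; in GenMAPP-CS the probe-to-node links come from the selected species database, and first-match coloring is an assumption.

```python
from typing import Callable, Optional

# (label, rule on the log fold value, color), mirroring the "EB vs ES" set.
CRITERIA: list[tuple[str, Callable[[float], bool], str]] = [
    ("up-regulated", lambda v: v > 1, "green"),
    ("down-regulated", lambda v: v < -1, "blue"),
]

# Hypothetical links and made-up values; real links come from the database.
PROBES_FOR_NODE = {"GeneX": ["probe_1", "probe_2"]}
LOG_FOLD = {"probe_1": -3.2, "probe_2": -0.4}

def backpage_rows(node: str) -> list[tuple[str, Optional[float], Optional[str]]]:
    """Return one (probe, log fold, color) row per probe linked to the node."""
    rows = []
    for probe in PROBES_FOR_NODE.get(node, []):
        value = LOG_FOLD.get(probe)
        color = None
        if value is not None:
            # First matching criterion wins (assumption).
            color = next((c for _, rule, c in CRITERIA if rule(value)), None)
        rows.append((probe, value, color))
    return rows

for row in backpage_rows("GeneX"):
    print(row)
```

Keeping one row per probe, rather than averaging, shows when probes for the same gene disagree, as probe_1 and probe_2 do here.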

Data Panel

The Data Panel at the bottom of the GenMAPP-CS window displays node, edge and network attributes. For network nodes, the Data Panel can display all or selected columns of any imported dataset, for whichever nodes are selected in the network.

  • Select a few nodes in the network by clicking and dragging.
  • At the top of the Data Panel, click the Select All Attributes button. This displays all available attributes for the selected nodes.
  • To display a specific set of data columns instead, first click the Unselect All Attributes button.
  • Click the Select Attributes button. Use the check boxes to select the attributes EB avg, ES avg, log Fold and p-value.

The fold changes for the EB vs ES comparison confirm that the selected nodes in this branch of the pathway are all significantly down-regulated in the EB group.

data panel
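Selecting nodes and attributes in the Data Panel amounts to picking rows and columns from the imported dataset. The Python sketch below mimics that with the first two rows of the example file, using probe set IDs as stand-ins for selected nodes (a simplification: network nodes are genes mapped to probes through the database). The data_panel helper is hypothetical, and the p < 0.05 cutoff used to call a change significant is an assumption; this tutorial does not state a threshold.

```python
from typing import Optional

# First two rows of MusMusculus_ES-EB_25000.txt, keyed by probe set ID.
DATASET = {
    "1429654_at": {"ES avg": 9.482740455, "EB avg": 2.473083185,
                   "log Fold": -7.00965727, "Fold": -128.8596946, "p-value": 2.83e-08},
    "1418362_at": {"ES avg": 11.56708277, "EB avg": 5.098371531,
                   "log Fold": -6.468711242, "Fold": -88.56785302, "p-value": 2.20e-08},
}

def data_panel(selected: list[str], attributes: Optional[list[str]] = None) -> dict:
    """Return the chosen attributes (all of them if None) for each selected node."""
    view = {}
    for node in selected:
        row = DATASET.get(node, {})
        if attributes is None:
            view[node] = dict(row)
        else:
            view[node] = {a: row[a] for a in attributes if a in row}
    return view

selected = ["1429654_at", "1418362_at"]
view = data_panel(selected, ["EB avg", "ES avg", "log Fold", "p-value"])
# Assumed cutoffs: more than two-fold down (log Fold < -1) and p < 0.05.
all_down = all(r["log Fold"] < -1 and r["p-value"] < 0.05 for r in view.values())
print(all_down)
```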