Tutorial:Comparative Analysis of Fitness and Expression Datasets
Slideshow Comparative_Analysis_of_Fitness_and_Expression_Datasets (about 40 minutes)
Handout Comparative_Analysis_of_Fitness_and_Expression_Datasets.pdf (6 pages)
Tutorial Curators Anna Kuchinsky
Cells respond to fatty acid exposure by metabolic reorganization (including upregulation of genes involved in fatty acid metabolism) and structural changes (including proliferation of peroxisomes, which involves proteins called peroxins). You have generated two datasets to measure the response of yeast cells to fatty acids:
- Fitness Dataset: You have applied a genome-wide screen to identify nonessential yeast genes necessary for efficient metabolism of fatty acids.
- Expression Dataset: You have identified nonessential yeast genes that are transcriptionally responsive to fatty acid exposure by microarray expression analysis in the presence and absence of fatty acids.
You are interested in integrating the datasets for systems‐level analyses to understand the response of yeast to fatty acid exposure. Data integration is an important aspect of these types of analyses. Many integration approaches expect that datasets measuring phenotypes in response to the same stimulus should identify the same molecular players and therefore differences between the datasets are due to false positives and negatives. You wish to comparatively analyse the datasets to help you decide how they should be integrated. Are the datasets similar or complementary? How are they similar or different? What does each dataset measure? How should they be integrated? For further reading, see companion article.
Cytoscape 2.6 network visualization and analysis software will be used to analyze the data and identify potential trends that can be further investigated by statistical analysis of the datasets. Specifically, the fitness and expression datasets will each be integrated with a protein‐metabolite interaction network (Prinz et al., 2004) and other datasets in the form of gene attributes, and will then be comparatively analyzed with each other. The Cerebral v.2 plug‐in (Barsky et al., 2007), a cell‐region based rendering and layout tool developed at the University of British Columbia, will be used to obtain a more advanced spatial organization of the data. Each network will be organized based on the cellular compartment of the genes and the biological process they are involved in.
Description of Datasets (all tab delimited text files)
Metabolite_gene_interactions.sif: We consider an interaction between a gene and a metabolite to exist if the gene product is known to directly use or produce the metabolite. This is a network file that has all yeast gene‐metabolite interactions (in the format gene name [TAB] pm [TAB] metabolite name). It also contains the names of all yeast genes in column 1 (even if they do not have an interaction with a metabolite).
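The file layout described above (gene name, interaction type, metabolite name, with unconnected genes listed alone) can be read with a few lines of Python. This is only a sketch of the format: the function name parse_sif and the node names GENE_A, met_1 and so on are hypothetical and do not come from the tutorial files.

```python
from collections import defaultdict

def parse_sif(lines):
    """Parse 'source<TAB>type<TAB>target' lines.

    A line holding only one field is treated as an unconnected node,
    which is how genes without metabolite interactions are listed.
    """
    nodes = set()
    neighbors = defaultdict(set)
    for raw in lines:
        fields = [f.strip() for f in raw.rstrip("\n").split("\t") if f.strip()]
        if not fields:
            continue  # skip blank lines
        source = fields[0]
        nodes.add(source)
        if len(fields) >= 3:
            # fields[1] is the interaction type ("pm" in this tutorial);
            # any further fields are targets of that interaction.
            for target in fields[2:]:
                nodes.add(target)
                neighbors[source].add(target)
                neighbors[target].add(source)
    return nodes, dict(neighbors)

# Hypothetical example lines, not taken from the real network file.
sample = [
    "GENE_A\tpm\tmet_1\n",
    "GENE_B\tpm\tmet_1\n",
    "GENE_B\tpm\tmet_2\n",
    "GENE_C\n",  # gene with no metabolite interaction
]
nodes, neighbors = parse_sif(sample)
print(len(nodes), "nodes")
print(sorted(neighbors["met_1"]))
```

Keeping the neighbors in both directions makes it easy to go from a gene to its metabolites and from a metabolite back to the genes that use or produce it.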
numerical_attributes.txt: This file contains the gene names in column one and attributes of the gene in successive columns. Attributes can be either categorical variables (e.g. GO terms) or numerical variables, which can be either discrete (e.g. 0 or 1) or continuous (e.g. expression ratios). This file can be used in numerous ways to manipulate or add visual information to the network.
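As an illustration of this layout, the sketch below reads a tab-delimited table whose first line holds the attribute names and whose first column holds the node names. The attribute names metab, expression_oleate and biological_process appear in this tutorial; the function names, the rows and the values are made up for the example.

```python
import csv
import io

def to_value(text):
    """Return a float for numerical cells, None for empty cells, else the text."""
    if text == "":
        return None
    try:
        return float(text)
    except ValueError:
        return text  # categorical attribute, e.g. a GO term

def read_attribute_table(handle):
    """Map each node name (column 1) to a dict of its attributes."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)          # first line carries the attribute names
    attribute_names = header[1:]
    table = {}
    for row in reader:
        if not row:
            continue
        table[row[0]] = {
            name: to_value(value)
            for name, value in zip(attribute_names, row[1:])
        }
    return table

# Hypothetical rows; the real file has one row per gene or metabolite.
sample = io.StringIO(
    "gene\tmetab\texpression_oleate\tbiological_process\n"
    "GENE_A\t0\t0.42\tfatty acid metabolism\n"
    "met_1\t1\t\t\n"
)
attributes = read_attribute_table(sample)
print(attributes["GENE_A"])
```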
tp12s23cr2.mtx: This file is a multi‐experiment gene expression dataset. Column 1 contains gene names and column 2 contains gene descriptions. The next 8 columns are log10 expression ratios for 8 microarray experiments, 6 of which form a time course dataset (half, one, three, six, nine, and o_g, corresponding to ½ h, 1 h, 3 h, 6 h, 9 h and 26 h after growth in oleate compared to growth in glycerol medium). The following 8 columns are lambda significance values reflecting the significance of differential expression. Data are given for the ~3000 genes that had significant differential expression in at least one of the 8 experiments.
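Because the ratios are stored as log10 values, a value of 0 means no change, 1 means a 10-fold increase and -1 means a 10-fold decrease. The short sketch below converts a log10 ratio back to a fold change and finds the time point with the largest change for one gene. The time point labels are those of the file; the function names and the ratio values are invented for illustration.

```python
TIME_POINTS = ["half", "one", "three", "six", "nine", "o_g"]

def fold_change(log10_ratio):
    """Convert a log10 expression ratio back to a plain ratio."""
    return 10 ** log10_ratio

def peak_time_point(log10_ratios):
    """Return the label of the time point with the largest absolute change."""
    return max(log10_ratios, key=lambda label: abs(log10_ratios[label]))

# Hypothetical log10 ratios for a single gene across the time course.
example = dict(zip(TIME_POINTS, [0.0, 0.3, 1.0, 0.7, -0.2, 0.5]))

peak = peak_time_point(example)
print(peak, fold_change(example[peak]))
```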
fitness.sif: This is a list of 216 genes that are necessary for robust growth of yeast on fatty acids as a sole carbon source.
expression.sif: This is a list of 169 genes that change in expression in response to fatty acid exposure, based on analysis of two time course gene expression datasets measuring the response of yeast to fatty acid.
Part A: Generation and analysis of two sub networks using Cytoscape
- Open Cytoscape 2.6.2: If the Cytoscape icon is on the desktop, double click it to open. If not, go to the start\my computer\local disk\program files\cytoscape_v2.6.2 directory and double click cytoscape.bat. Make sure you open version 2.6.2 and not an older version.
- Load the metabolite‐gene network into Cytoscape: From the text tool bar at the top, choose File/import/network (multiple file types), then select Metabolite_gene_interactions.sif from the C:\cytoscape_demo_files directory. The network name and size (6962 nodes, 2092 edges) appear in the control panel (left). The network is highlighted red because it is very large and therefore not displayed. To display the network, right click the network name and choose “create view”. Nodes in the new network can be laid out by clicking “layout” on the text tool bar at the top; /yfiles/organic is a nice layout.
- If you wish to see details, zoom in using the zoom tool on the graphic tool bar at the top. (In this version of Cytoscape, large networks are not shown in detail until you zoom in. This allows for larger datasets to be loaded into the program). Then restore the view by clicking the restore 1:1 magnifying glass icon in the graphic tool bar.
- Load node attribute table: Select File/import/attribute from table (text/MS excel), then click “select file(s)” next to input file and choose “numerical_attributes.txt” from the C:\cytoscape_demo_files directory. Before importing, we want to specify the file type and attribute names. To do this, first click “show text file import options” under the heading “advanced”. Delimiter should be “tab” and attribute names should be “transfer first line as attribute names”. Now click “import” in the lower right corner. The table contains 3595 entries.
- To view the attributes, go to the data panel at the bottom of the screen. Click on the first “select attributes” icon (see image below). Then, select attributes to view by clicking the box next to the attribute. After this is done, genes that are selected/highlighted in the network will appear in this data panel with attributes.
- Visualize expression data: You can visualize different node attributes in the network using the “set visual style” icon or by clicking the VizMapper tab in the control panel at the left. For example, to visualize gene expression data as node color, first set the default node color to white by double clicking the display nodes under defaults in the VizMapper in the control panel. Choose node_fill_color and set it to white. Expand node visual mapping (by clicking the adjacent + sign) and then expand node color (click the adjacent + sign). Choose expression_oleate as node color, choose continuous mapping as mapping type and then choose display colors for expression ratios. Here we will set minimum to red, maximum to green and 0 (no change in expression) to white.
- Make a sub network of genes with fitness defect and interacting metabolites: To do this, choose the parent network (metabolite_gene_interaction) in the control panel and then choose select/nodes/from file from the text toolbar along the top of the screen. Choose “fitness.sif”. The description of the network at the left shows that 216 nodes are selected. To choose interacting metabolites, choose select/nodes/first neighbors of selected nodes from the text toolbar along the top of the screen. There should now be 264 nodes selected from the network (at the left). To make a new network from this selection, choose file/new/network/from selected nodes, all edges. Rename this network to “fitness” by right clicking the network name in the control panel at the left and then choosing edit network title.
- Customize the visual display of the new network: Nodes in the new network can be laid out by clicking “layout” on the text tool bar at the top; /yfiles/organic is a nice layout. To differentiate between metabolites and genes, change the node shape of the metabolites using the node attribute file: choose VizMapper in the control panel at the left and, under unused properties, find node shape (unused properties may need to be expanded first by clicking the adjacent + sign). Double click beside node shape to see the available properties (each property is a column of the node attribute or expression file). Choose metab (a column that has 0 for genes and 1 for metabolites), then choose discrete mapping for mapping type. Set 0 to Ellipse and 1 to diamond. You can zoom in to see gene names.
- Make a sub network of genes of the expression dataset and interacting metabolites: To do this, choose the parent network (metabolite_gene_interaction) in the control panel and deselect all selected nodes by left clicking a blank space in the network. Next, choose select/nodes/from file from the text toolbar along the top of the screen and choose "expression.sif". The description of the network at the left shows that 169 nodes are selected. To choose interacting metabolites, choose select/nodes/first neighbors of selected nodes from text toolbar along the top of the screen. There should now be 284 nodes selected from the network (at the left). To make a new network from this selection, choose file/new/network/from selected nodes, all edges. Rename this network to "expression" by right clicking the network name in the control panel at the left then choosing edit network title. Nodes in new network can be laid out by clicking Layout on the text tool bar at the top and then /yFiles/organic.
- Visually compare the fitness and expression networks: To do this, first hide the parent network by right clicking the parent network in the control panel at the left and choosing destroy view. Then, arrange the fitness and expression networks in tile conformation by choosing View in the text toolbar at the top of the screen and choosing /arrange network windows/tiled. What can you say about the overlap between the two datasets? (two noticeable differences)
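The selection steps in Part A (select genes from a list, add their first neighbors, build a new network from the selected nodes with all edges among them, then compare the two results) can also be expressed as a small script. The sketch below uses a made-up toy network, so its node names and counts are hypothetical and say nothing about the real datasets. It only mirrors the logic of the menu commands and does not use the Cytoscape API.

```python
def first_neighbor_subnetwork(nodes, edges, seed_genes):
    """Return (selected_nodes, sub_edges) for the seeds plus their first neighbors."""
    seeds = set(seed_genes) & set(nodes)  # only nodes present in the network can be selected
    selected = set(seeds)
    for a, b in edges:
        if a in seeds:
            selected.add(b)
        if b in seeds:
            selected.add(a)
    # "from selected nodes, all edges": keep every edge with both ends selected
    sub_edges = [(a, b) for a, b in edges if a in selected and b in selected]
    return selected, sub_edges

# Toy parent network (hypothetical names).
nodes = {"GENE_A", "GENE_B", "GENE_C", "GENE_D", "met_1", "met_2"}
edges = [("GENE_A", "met_1"), ("GENE_B", "met_1"), ("GENE_C", "met_2")]

# Toy gene lists standing in for fitness.sif and expression.sif.
fitness_genes = {"GENE_A", "GENE_D"}
expression_genes = {"GENE_B", "GENE_C"}

fit_nodes, fit_edges = first_neighbor_subnetwork(nodes, edges, fitness_genes)
exp_nodes, exp_edges = first_neighbor_subnetwork(nodes, edges, expression_genes)

shared_genes = fitness_genes & expression_genes
shared_nodes = fit_nodes & exp_nodes

print("fitness sub network:", sorted(fit_nodes))
print("expression sub network:", sorted(exp_nodes))
print("genes in both lists:", sorted(shared_genes))
print("nodes in both sub networks:", sorted(shared_nodes))
```

Comparing the gene lists and the full node sets separately is deliberate: two sub networks can share metabolites even when they share no genes.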
Part B: Using cerebral plugin to spatially organize the two subnetworks
- Create a Cerebral view of the expression network: With the expression network selected, choose plugin and then "create cerebral view" from the text toolbar at the top of the screen. Select the localization data that will be used to organize the network by choosing Cellular component general from the pulldown menu of node attributes we have loaded. Compartments are shown in the table below. Change the order of compartments in the display by selecting the row and moving it with the up and down arrows at the right. An order that reflects the shape of the cell can work well. I will use the order: other, cell wall, plasma membrane, ER/golgi, nucleus, mitochondrion, cytoplasm, peroxisomes. Different variations of this order are fine, but peroxisome should appear last because the last compartment can be spatially arranged by a second attribute (see below), and peroxisomes are most closely linked to fat metabolism.
- Set up sub clustering of the peroxisome compartment for the expression network: In the downstream nodes pull-down menu of available attributes, choose pxml. This attribute annotates genes with peroxisomal products with 1 and all other genes with 0. In the next window, choose genes annotated with 1 to include in the sub clustering analysis. In the next window, choose the label for this cluster to be peroxisome. In the bottom window, choose the annotation that will be used for sub clustering from the pull-down menu (biological_process) and then select Layout. Edge curviness, label density and group label size can be edited under the cerebral tab in the control panel at the left.
- Set up cerebral display of fitness network: This can be done by repeating the two preceding steps (steps 11 and 12) with the fitness sub network selected.
- Does the cerebral display provide further biological insight?
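The grouping idea behind the steps in Part B can be sketched in plain Python: nodes are placed in layers by compartment, in a chosen order, and the nodes flagged by pxml are then split further by biological_process. This is not how Cerebral itself is implemented; it only illustrates the grouping. The attribute names are those used in this tutorial, while the function names, genes and annotation values are hypothetical.

```python
from collections import defaultdict

# Compartment order used in this tutorial; peroxisomes come last.
COMPARTMENT_ORDER = ["other", "cell wall", "plasma membrane", "ER/golgi",
                     "nucleus", "mitochondrion", "cytoplasm", "peroxisomes"]

def group_by_compartment(node_attrs, order=COMPARTMENT_ORDER):
    """Place each node in its compartment layer; unknown compartments go to 'other'."""
    layers = {name: [] for name in order}
    for node in sorted(node_attrs):
        compartment = node_attrs[node].get("Cellular component general", "other")
        key = compartment if compartment in layers else "other"
        layers.setdefault(key, []).append(node)
    return layers

def subcluster(node_attrs, flag_attr="pxml", by_attr="biological_process"):
    """Group the nodes flagged with 1 by a second annotation."""
    clusters = defaultdict(list)
    for node in sorted(node_attrs):
        attrs = node_attrs[node]
        if attrs.get(flag_attr) == 1:
            clusters[attrs.get(by_attr, "unannotated")].append(node)
    return dict(clusters)

# Hypothetical annotations for four genes.
node_attrs = {
    "GENE_A": {"Cellular component general": "peroxisomes", "pxml": 1,
               "biological_process": "fatty acid metabolism"},
    "GENE_B": {"Cellular component general": "peroxisomes", "pxml": 1,
               "biological_process": "peroxisome biogenesis"},
    "GENE_C": {"Cellular component general": "nucleus", "pxml": 0,
               "biological_process": "transcription"},
    "GENE_D": {"Cellular component general": "vacuole", "pxml": 0,
               "biological_process": "transport"},
}

layers = group_by_compartment(node_attrs)
clusters = subcluster(node_attrs)
for name in COMPARTMENT_ORDER:
    print(name, layers[name])
print(clusters)
```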
Independent study after tutorial
Cerebral has a multi-experiment comparison tool. With this functionality, users can now upload, compare, contrast and cluster quantitative data from multiple experiments. A time course gene expression data file, tp12s23cr2.mtx, that can be used with the multi-experiment comparison tool is in the Cytoscape_data_files directory and described in the "Description of data sets" section above.
References and websites
- Companion article: Jennifer J Smith, Yaroslav Sydorskyy, Marcello Marelli, Daehee Hwang, Hamid Bolouri, Richard A Rachubinski and John D Aitchison (2006) Expression and functional profiling reveal distinct gene classes involved in fatty acid metabolism. Molecular Systems Biology 2 doi:10.1038/msb4100051
- Cytoscape v 2.6.0
Download from http://www.cytoscape.org/ Paul Shannon, Andrew Markiel, Owen Ozier, Nitin S. Baliga, Jonathan T. Wang, Daniel Ramage, Nada Amin, Benno Schwikowski and Trey Ideker (2003) Cytoscape: A Software Environment for Integrated Models of Biomolecular Interaction Networks. Genome Res. 13:2498-2504.
- Metabolite-protein interactions are from Prinz et al. (Prinz S, Avila-Campillo I, Aldridge C, Srinivasan A, Dimitrov K, Siegel AF, Galitski T (2004) Control of yeast filamentous-form growth by modules in an integrated molecular network. Genome Res 14:380-390), which is a modified version of interactions compiled previously (Forster J, Famili I, Fu P, Palsson BO, Nielsen J (2003) Genome-scale reconstruction of the Saccharomyces cerevisiae metabolic network. Genome Res 13:244-253).
- Cerebral v.2 Cytoscape plug-in
http://www.pathogenomics.ca/cerebral/ Barsky A, Gardy JL, Hancock REW, and Munzner T. (2007) Cerebral: a Cytoscape plugin for layout of and interaction with biological networks using subcellular localization annotation. Bioinformatics 23(8):1040-2.
- A comprehensive set of physical interactions amongst yeast proteins and genetic interactions among yeast genes can be downloaded from Saccharomyces Genome Database (SGD)
- Teacher contact info: firstname.lastname@example.org; email@example.com