Tutorial:Basic Expression Analysis in Cytoscape
Slideshow: Basic Expression Analysis in Cytoscape (30 min)
Handout: Basic_Expression_Analysis_in_Cytoscape.pdf (10 pages)
Tutorial Sources: Cytoscape Wiki (Natalie Yeung)
Tutorial Curators: Kristina Hanspers, Alex Pico, Mike Smoot
Cytoscape is an open source software platform for integrating, visualizing, and analyzing measurement data in the context of networks. This tutorial will introduce you to:
- Visualizing networks using expression data.
- Filtering networks based on expression data.
- Assessing expression data in the context of a biological network.
- Start Cytoscape and load the network galFiltered.sif.
- Apply the force-directed layout to organize the nodes. Select the menu Layout → Cytoscape Layouts → Force-Directed Layout.
- Cytoscape should now look similar to this:
Loading expression data
- Using your favorite text editor, open the file galExpData.csv. The first few lines of the file are as follows:
GENE,COMMON,gal1RGexp,gal4RGexp,gal80Rexp,gal1RGsig,gal4RGsig,gal80Rsig
YHR051W,COX6,-0.034,0.111,-0.304,3.75720e-01,1.56240e-02,7.91340e-06
YHR124W,NDT80,-0.090,0.007,-0.348,2.71460e-01,9.64330e-01,3.44760e-01
YKL181W,PRS1,-0.167,-0.233,0.112,6.27120e-03,7.89400e-04,1.44060e-01
You should note the following information about the file:
- The first line consists of labels.
- All columns are separated by a single comma character.
- The first column contains node names, and must match the names of the nodes in your network exactly!
- The second column contains common locus names. This column is optional, and the data is not currently used by Cytoscape, but including this column makes the format consistent with the output of many microarray analysis packages, and makes the file easier to read.
- The remaining columns contain experimental data, two columns per experiment (one column represents the expression measurement and the second represents the significance value for that measurement), and one line per node. In this case, there are three expression results per node.
- Under the File menu, select Import → Attribute from Table (Text/MS Excel).
- Click "Node" for the type of attribute to import.
- Select the file galExpData.csv.
- Click the "Text File Import Options" check box.
- Click the "Tab" check box in the "Delimiter" section to unselect Tab and click "Comma" instead. The preview should now indicate that it is importing multiple columns of data.
- Click the "Transfer first line as attribute names" check box in the "Attribute Names" section. The preview should now use the first row of the input file as column names, and the import window should look like the image below.
- Click the "Import" button to import the attribute data.
- Select a node on the Cytoscape canvas by clicking on it.
- In the Node Attribute Browser, click the Select Attributes button and select the attributes gal1RGexp, gal4RGexp, and gal80Rexp by left-clicking on them. Right-click to close the menu.
- Under the Node Attribute Browser, you should see your node listed with its expression values, as shown.
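The file layout described in this section can also be checked outside Cytoscape before importing. The following is a minimal Python sketch, not part of Cytoscape itself: it parses the sample lines quoted above and pairs each expression column with its significance column. The function names `load_expression` and `experiments` are hypothetical, and the pairing relies on the assumption that column names end in "exp" and "sig" as they do in this file.

```python
import csv
import io

# First lines of galExpData.csv as quoted in this tutorial.
SAMPLE = """\
GENE,COMMON,gal1RGexp,gal4RGexp,gal80Rexp,gal1RGsig,gal4RGsig,gal80Rsig
YHR051W,COX6,-0.034,0.111,-0.304,3.75720e-01,1.56240e-02,7.91340e-06
YHR124W,NDT80,-0.090,0.007,-0.348,2.71460e-01,9.64330e-01,3.44760e-01
YKL181W,PRS1,-0.167,-0.233,0.112,6.27120e-03,7.89400e-04,1.44060e-01
"""


def load_expression(text):
    """Return {node_name: {column: value}}, with numeric columns as floats."""
    table = {}
    for row in csv.DictReader(io.StringIO(text)):
        node = row.pop("GENE")      # first column: must match the network node names
        common = row.pop("COMMON")  # second column: optional common locus name
        values = {name: float(value) for name, value in row.items()}
        values["COMMON"] = common
        table[node] = values
    return table


def experiments(text):
    """Pair each '<name>exp' column with its '<name>sig' column (naming assumption)."""
    header = next(csv.reader(io.StringIO(text)))
    exp_columns = [column for column in header if column.endswith("exp")]
    return [(c, c[:-3] + "sig") for c in exp_columns if c[:-3] + "sig" in header]


data = load_expression(SAMPLE)
print(experiments(SAMPLE))
print(data["YHR051W"]["COMMON"], data["YHR051W"]["gal80Rexp"])
```

A check like this makes it easy to confirm that every name in the first column matches a node name in the network before running the import.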
Visualizing Expression Data
Probably the most common use of expression data in Cytoscape is to set the visual attributes of the nodes in a network according to expression data. This creates a powerful visualization, portraying functional relation and experimental response at the same time. Here, we will walk through the steps for doing this.
Label the Nodes
- Open the VizMapper by selecting its tab or by clicking on its icon:
- Use the "Common" name attribute to give the nodes useful names.
- Zoom in on the network so that node labels are visible.
- Click the second column of the "Node Label" row in the Visual Mapping Browser.
- This should produce a drop-down menu of available attribute names. Select "Common".
- Verify that the node labels on the network have changed to their common names.
- Click the Triangle in the "Node Label" row of the "Visual Mapping Browser" to see the other visual properties.
Color the nodes
Define the node color of this visual style:
- Double-Click the Node Color row in the Visual Mapping Browser in the Unused Visual Properties Section.
- This action will move Node Color to the top of the Visual Mapping Browser.
- Click the "Please select a value!" cell in the Node Color section.
- This will produce a drop-down menu of available attribute names. Select "gal80Rexp".
- Click the "Please select a mapping" cell in the Node Color section.
- This will produce a drop-down menu of available mapping types. Select "Continuous Mapping".
- This action will produce a basic black to white color gradient.
- Click on the color gradient to change the colors. This will pop up a gradient editing dialog.
- Drag the left-most (red) handle along the top of the gradient to an Attribute Value of approx. -1.2.
- Drag the white handle to approx. 0.5. You can type the Attribute Value in the Handle Settings section to be more precise.
- You can also change the color of each handle by double-clicking or using the Node Color selector button in the Handle Settings section.
- This should produce a Red-White-Green Color gradient like the image below, with min and max extremes colored black and blue, respectively.
- Click 'OK' to save the gradient adjustment dialog and verify that the nodes in the network reflect the new coloring scheme.
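To make the idea of a continuous color mapping concrete, here is a small Python sketch of a three-point gradient with separate colors for out-of-range values. It is only an illustration of linear interpolation, not Cytoscape's actual implementation. The red and white handle positions (-1.2 and 0.5) come from the steps above; the green handle position of 2.0 is a hypothetical value chosen for this example, since the tutorial does not state it, and `node_color` is a hypothetical function name.

```python
RED, WHITE, GREEN = (255, 0, 0), (255, 255, 255), (0, 255, 0)
BLACK, BLUE = (0, 0, 0), (0, 0, 255)  # colors for values below / above the handles


def lerp_color(start, end, t):
    """Linearly interpolate between two RGB colors, with t in [0, 1]."""
    return tuple(round(a + (b - a) * t) for a, b in zip(start, end))


def node_color(value, low=-1.2, mid=0.5, high=2.0):
    """Map an expression value to an RGB color.

    low and mid follow the handle positions used in this tutorial;
    high=2.0 is an assumed position for the green handle.
    """
    if value < low:
        return BLACK
    if value > high:
        return BLUE
    if value <= mid:
        return lerp_color(RED, WHITE, (value - low) / (mid - low))
    return lerp_color(WHITE, GREEN, (value - mid) / (high - mid))


for expression in (-2.0, -1.2, -0.304, 0.5, 2.0, 3.0):
    print(expression, node_color(expression))
```

Values between two handles receive a blend of the neighboring handle colors, which is why mildly repressed nodes appear in a light shade of red rather than pure red.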
Set the default node color
Note that the default node color of pink falls within this spectrum. A useful trick is to choose a default color outside this spectrum, so that nodes with no defined expression value can be distinguished from those with slight repression.
- Click the Defaults network icon in the VizMapper panel.
- Click the Node Color entry and choose a dark gray color.
- Zoom out on the network view to verify that a few nodes have been colored gray.
Set the Node Shape
We imported both expression measurement values and significance values for those measurements. We can use the significance values to change the shape of the nodes so that measurements we have confidence in appear as squares while potentially bad measurements appear as circles.
- Double-Click the Node Shape row in the Visual Mapping Browser in the Unused Visual Properties Section.
- This action will move Node Shape to the top of the Visual Mapping Browser.
- Click the "Please select a value!" cell in the Node Shape section.
- This will produce a drop-down menu of available attribute names. Select "gal80Rsig".
- Click the "Please select a mapping" cell in the Node Shape section.
- This will produce a drop-down menu of available mapping types. Select "Continuous Mapping".
- This will create an empty icon in the "Graphical View" row of the Node Shape section. Click on this icon.
- This action will pop up a continuous shape selection dialog.
- Click the Add button.
- This action will split the range of values with a slider down the middle with a node shape icon to either side of the slider.
- Double-Click on the left node icon (a circle).
- This will pop up a node shape selection dialog.
- Choose the Rectangle shape and click the Apply button.
- The continuous shape selection dialog should now show both a square and a circle node shape icon.
- Click on the black triangle and move the slider to the left, to slightly lower than 0.05, our threshold for significance.
- Close the continuous shape selection dialog and verify that some nodes have a square shape and some nodes have a circular shape.
The network should now look like this:
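The shape mapping configured above amounts to a simple threshold rule on the significance value. The Python sketch below illustrates that rule using the gal80Rsig values from the sample lines of galExpData.csv. The function name `node_shape` and the shape labels are hypothetical plain strings for this example, not Cytoscape identifiers, and the cutoff is taken as exactly 0.05 for simplicity.

```python
SIGNIFICANCE_THRESHOLD = 0.05  # threshold for significance used in this tutorial


def node_shape(p_value, threshold=SIGNIFICANCE_THRESHOLD):
    """Return a square for significant measurements and a circle otherwise."""
    return "RECTANGLE" if p_value < threshold else "ELLIPSE"


# gal80Rsig values taken from the sample lines of galExpData.csv.
gal80Rsig = {"COX6": 7.91340e-06, "NDT80": 3.44760e-01, "PRS1": 1.44060e-01}

shapes = {gene: node_shape(p) for gene, p in gal80Rsig.items()}
print(shapes)
```

Of these three genes, only COX6 falls below the threshold and would be drawn as a square.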
A biological analysis scenario
This section presents one scenario of how expression data can be combined with network data to tell a biological story.
First, here is some background on your data. You are working with yeast, and the genes Gal1, Gal4, and Gal80 are all yeast transcription factors. Your expression experiments all involve some perturbation of these transcription factor genes. Gal1, Gal4, and Gal80 are also represented in your interaction network, where they are labeled according to yeast locus tags: Gal1 corresponds to YBR020W, Gal4 to YPL248C, and Gal80 to YML051W.
Your network contains a combination of protein-protein (pp) and protein-DNA (pd) interactions. Here, we shall filter out the protein-protein interactions to focus on the protein-DNA interactions.
- Click the Filters tab in the Control Panel.
- Click the Attribute/Filter chooser in the Filter Definition and choose "edge.interaction".
- Click the Add button in the Filter Definition section to add the selected attribute to the filter.
- This action will create a text search box entry in the filter.
- Type the letters "pp" into the text search box. This indicates that we're searching for all edge interaction attributes that match the string "pp".
- Click the Apply Filter button at the bottom of the Filters panel.
- You should now see many edges in the network selected (i.e. colored red).
- Since we're only interested in the protein-DNA edges, we can delete the protein-protein edges we've just selected.
- Select the menu Edit → Delete Selected Nodes and Edges. You should now see many unconnected nodes in the network.
- Select the menu Layout → Cytoscape Layouts → Force-Directed Layout to clean up the network visualization.
- The final filtered and cleaned up network should look like this:
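The same filtering step can be sketched in Python for readers who want to see what happens to the edge list. This is an illustration under stated assumptions, not a description of Cytoscape's internals: it assumes the whitespace-delimited SIF layout of "source interaction target [target ...]", the function names are hypothetical, and the three sample edges are illustrative only and are not copied from galFiltered.sif.

```python
# Illustrative edges only: these lines are NOT copied from galFiltered.sif.
SAMPLE_SIF = """\
YPL248C pd YBR020W
YPL248C pp YML051W
YOL051W pd YBR020W
"""


def parse_sif(text):
    """Parse SIF lines into (source, interaction, target) tuples."""
    edges = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank lines and isolated-node lines
        source, interaction, *targets = parts
        edges.extend((source, interaction, target) for target in targets)
    return edges


def drop_interaction(edges, interaction="pp"):
    """Remove every edge whose interaction type equals the given string."""
    return [edge for edge in edges if edge[1] != interaction]


def unconnected_nodes(all_edges, kept_edges):
    """Return nodes that had an edge before filtering but have none afterwards."""
    def nodes(edges):
        return {node for source, _, target in edges for node in (source, target)}
    return nodes(all_edges) - nodes(kept_edges)


edges = parse_sif(SAMPLE_SIF)
pd_only = drop_interaction(edges)
print(pd_only)
print(unconnected_nodes(edges, pd_only))
```

Nodes whose only edges were protein-protein interactions end up with no edges at all, which is why many unconnected nodes appear after the deletion.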
Observe the Network
Notice that three bright green (highly induced) nodes are in the same region of the graph. Zoom into the graph to see more details.
- Notice that there are two nodes that interact with all three green nodes: GAL4 (YPL248C) and GAL11 (YOL051W).
- Select these two nodes and their immediate neighbors by selecting the menu Select → Nodes → First Neighbors of Selected Nodes.
- It is sometimes useful to create a new network from selected nodes. Do this by selecting the menu File → New → Network → From Selected Nodes, All Edges.
- With some layout and zooming, this new network should appear similar to the one shown:
- Right click on the node GAL4.
- Select the menu LinkOut → Entrez → Gene.
- This action will pop up a browser window and search the Entrez Gene database for the term "YPL248C", the id of the node.
- In the results in the browser the first entry should be labeled GAL4. Click on this entry.
- The description of GAL4 tells us that it is repressed by GAL80.
- Our data show precisely this:
- Both nodes (GAL4 and GAL11) show fairly small changes in expression, and neither change is statistically significant: they are rendered as light-colored circles. These slight changes in expression suggest that the critical change affecting the green nodes might be somewhere else in the network, and not in either of these nodes.
- GAL4 interacts with GAL80 (YML051W), which shows a significant level of repression: it is depicted as a red square.
- Note that while GAL80 shows evidence of significant repression, most nodes interacting with GAL4 show significant levels of induction: they are rendered as green squares.
- GAL11 is a general transcription co-factor with many interactions.
- Putting all of this together, we see that the transcriptional activation activity of Gal4 is repressed by Gal80, so repression of Gal80 increases the transcriptional activation activity of Gal4. Even though the expression of Gal4 itself did not change much, the Gal4 protein was much more likely to be active as a transcription factor when Gal80 was repressed. This explains why there is so much up-regulation in the vicinity of Gal4.
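As a companion to the Select → Nodes → First Neighbors of Selected Nodes and File → New → Network → From Selected Nodes, All Edges steps used earlier in this section, the following Python sketch shows one way to compute the same kind of selection from an edge list. The function names are hypothetical, and the edges (including the placeholder nodes NODE_A and NODE_B) are illustrative only, not taken from the tutorial network.

```python
# Illustrative edges only; NODE_A and NODE_B are placeholder node names.
EDGES = [
    ("YPL248C", "YBR020W"),
    ("YPL248C", "YML051W"),
    ("YOL051W", "YBR020W"),
    ("YBR020W", "NODE_A"),
    ("NODE_A", "NODE_B"),
]


def first_neighbors(edges, selected):
    """Return the selected nodes plus every node sharing an edge with them."""
    selected = set(selected)
    result = set(selected)
    for source, target in edges:
        if source in selected:
            result.add(target)
        if target in selected:
            result.add(source)
    return result


def induced_edges(edges, nodes):
    """Keep only the edges whose two endpoints are both in the node set."""
    return [(s, t) for s, t in edges if s in nodes and t in nodes]


selection = first_neighbors(EDGES, {"YPL248C", "YOL051W"})
print(sorted(selection))
print(induced_edges(EDGES, selection))
```

The neighbor expansion uses only the originally selected nodes, so nodes two steps away (here NODE_A and NODE_B) are not pulled into the new network.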