clusterMaker is a Cytoscape plugin that unifies different clustering techniques and displays in a single interface. Current clustering algorithms include hierarchical, AutoSOME, k-medoid, and k-means for clustering expression or genetic data; and MCL, AutoSOME, TransClust, SCPS, Affinity Propagation, and MCODE for clustering similarity networks to look for protein families. A recent BMC Bioinformatics publication, "clusterMaker: A Multi-algorithm Clustering Plugin for Cytoscape", describes the capabilities of clusterMaker through three scenarios. The second scenario uses clusterMaker to analyze protein-protein interaction data and genetic interaction data.
Biological Use Case: Finding complexes in proteomic and genetic interaction data
- Download CollinsPlus.cys
- Go to File → Open and select CollinsPlus.cys from where you downloaded it.
This session file includes three networks: combined_scores_good.txt, DNA and Tran 07-21-06b.csv, and RNAPuberNov2+Meg6c.csv. Nodes in all three networks refer to yeast genes/proteins. The first network is a high-quality MS/TAP data set derived from combining three large-scale MS/TAP data sets that includes 2,401 proteins. The other two are genetic interaction experiments with 743 genes and 552 genes, respectively. We're only going to use the first two of these data sets in this tutorial.
Run MCL Clustering
MCL is a useful clustering tool for finding complexes in MS/TAP data sets. It is relatively fast and parallelizes nicely. We'll use MCL to cluster the proteomics data set to find putative complexes.
- Select the combined_scores_good.txt network in the Network panel.
- Select Plugins → Cluster → MCL cluster to bring up the MCL cluster Settings dialog.
- The original authors of this study used a weight called PE Score to indicate the strength of the association, which we can use with MCL. Select PE Score in the Array Sources menu.
- Click Create Clusters. This is a large network, so the clustering may take some time to complete. Iteration 3, in particular, takes longer (5-10 minutes) because it is the densest point in the clustering run. For comparison, on a MacBook Pro with 4GB of RAM and a 2.66GHz Core i7, MCL clustering of this network takes 12 minutes.
- After the algorithm has finished, MCL will display a dialog with the summary results.
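To give a feel for what MCL does with the edge weights, here is a minimal pure-Python sketch of the expansion/inflation loop. It is not clusterMaker's implementation: the function name, the inflation value of 2.0, the self-loop weight of 1.0, the convergence tolerance, and the toy network are all assumptions made for illustration.

```python
def mcl(weights, inflation=2.0, iterations=50, tol=1e-6):
    """Toy MCL. weights maps (node_a, node_b) -> positive edge weight (undirected)."""
    nodes = sorted({n for edge in weights for n in edge})
    index = {n: i for i, n in enumerate(nodes)}
    size = len(nodes)

    # Symmetric weight matrix; self-loops of 1.0 are an assumption of this sketch.
    m = [[0.0] * size for _ in range(size)]
    for (a, b), w in weights.items():
        m[index[a]][index[b]] = w
        m[index[b]][index[a]] = w
    for i in range(size):
        m[i][i] = 1.0

    def normalize_columns(mat):
        # Rescale every column so it sums to 1 (a column-stochastic matrix).
        for j in range(size):
            total = sum(mat[i][j] for i in range(size))
            for i in range(size):
                mat[i][j] /= total
        return mat

    m = normalize_columns(m)
    for _ in range(iterations):
        # Expansion: square the matrix so flow spreads along longer paths.
        expanded = [[sum(m[i][k] * m[k][j] for k in range(size))
                     for j in range(size)] for i in range(size)]
        # Inflation: raise entries to a power and renormalize, which
        # strengthens strong flows and weakens weak ones.
        inflated = normalize_columns([[v ** inflation for v in row]
                                      for row in expanded])
        change = max(abs(inflated[i][j] - m[i][j])
                     for i in range(size) for j in range(size))
        m = inflated
        if change < tol:
            break

    # Each row that still carries flow is an attractor; its nonzero
    # columns are the members of one cluster.
    clusters = []
    for i in range(size):
        members = frozenset(nodes[j] for j in range(size) if m[i][j] > 1e-6)
        if members and members not in clusters:
            clusters.append(members)
    return clusters


# Hypothetical toy network: two tight triangles joined by one weak edge.
toy_edges = {
    ("A", "B"): 1.0, ("A", "C"): 1.0, ("B", "C"): 1.0,
    ("D", "E"): 1.0, ("D", "F"): 1.0, ("E", "F"): 1.0,
    ("C", "D"): 0.1,
}
toy_clusters = mcl(toy_edges)
```

In the tutorial network the edge weight would be the PE Score attribute. The dense matrix squaring in the expansion step also hints at why the early, densest iterations are the slow ones on a large network.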
Visualize MCL Clusters
- To see the clusters in a new network, click on Visualize Clusters.
- clusterMaker adds a new attribute (0_MCL_cluster in this case) to the network. Each cluster has a unique number for this attribute that may be used to change the graphics attributes in the VizMapper.
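The idea of driving graphics attributes from the cluster number can be sketched outside Cytoscape as a discrete mapping from cluster number to color. This is only an illustration of the concept, not the VizMapper API; the function name, node names, and cluster numbers below are hypothetical.

```python
import colorsys


def cluster_colors(cluster_of):
    """Give every cluster number its own hex color (a discrete mapping).

    cluster_of maps node name -> cluster number, as a cluster attribute
    such as 0_MCL_cluster would.
    """
    cluster_ids = sorted(set(cluster_of.values()))
    palette = {}
    for rank, cluster_id in enumerate(cluster_ids):
        # Spread hues evenly around the color wheel, one per cluster.
        r, g, b = colorsys.hsv_to_rgb(rank / len(cluster_ids), 0.6, 0.9)
        palette[cluster_id] = "#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255))
    return {node: palette[cid] for node, cid in cluster_of.items()}


# Hypothetical assignments: node1 and node2 share a cluster, node3 does not.
colors = cluster_colors({"node1": 1, "node2": 1, "node3": 2})
```

Nodes that share a cluster number get the same color, which is the effect a discrete mapping on the cluster attribute would have on node fill color.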
Run Hierarchical Clustering on Genetic Interaction Data
At this point, you should have a Cytoscape network that is partitioned by the various clusters (putative complexes). What is often useful is to combine other, orthogonal, sources of data that might be used to increase the confidence of the complex determination. In this case, we'll use one of the genetic interaction data sets to accomplish that.
- Select the DNA and Tran 07-21-06b.csv network. There is no view for this network, so selecting it will not change the currently displayed network.
- Select Plugins → Cluster → Hierarchical cluster.
- Under Array Sources, select edge.DNA Strength. Because this is an edge attribute, the data form a symmetric gene-by-gene matrix, which gives us a diagonally symmetric clustering.
- Deselect Create groups from clusters.
- Click Create Clusters.
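Clustering on an edge attribute such as edge.DNA Strength can be pictured as follows: the edges are laid out as a symmetric gene-by-gene matrix, and the rows of that matrix are then clustered. The sketch below assumes missing edges count as 0.0 and uses Euclidean distance with average linkage; those choices, the function names, and the gene names and strengths are illustrative assumptions, not clusterMaker's actual settings or data.

```python
import math


def symmetric_matrix(edges, missing=0.0):
    """Lay out an edge attribute as a symmetric gene-by-gene matrix.

    edges maps (gene_a, gene_b) -> interaction strength. Pairs without
    an edge get the value of `missing` (an assumption of this sketch).
    """
    genes = sorted({g for pair in edges for g in pair})
    pos = {g: i for i, g in enumerate(genes)}
    matrix = [[missing] * len(genes) for _ in genes]
    for (a, b), strength in edges.items():
        matrix[pos[a]][pos[b]] = strength
        matrix[pos[b]][pos[a]] = strength  # mirror across the diagonal
    return genes, matrix


def euclidean(u, v):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def average_linkage_order(genes, matrix):
    """Agglomerative clustering of the rows; returns the leaf order."""
    clusters = [[i] for i in range(len(genes))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average pairwise distance between the two clusters' rows.
                total = sum(euclidean(matrix[i], matrix[j])
                            for i in clusters[a] for j in clusters[b])
                dist = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        _, a, b = best
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return [genes[i] for i in clusters[0]]


# Hypothetical genes: geneA and geneB share the partner geneX,
# geneC and geneD share the partner geneY.
toy_edges = {
    ("geneA", "geneX"): -5.0, ("geneB", "geneX"): -5.0,
    ("geneC", "geneY"): -4.0, ("geneD", "geneY"): -4.0,
}
genes, matrix = symmetric_matrix(toy_edges)
order = average_linkage_order(genes, matrix)
```

Because the same genes label both rows and columns, one leaf order serves both axes, which is why the resulting heat map is symmetric about its diagonal. Note that genes end up next to each other when their interaction profiles are similar, not merely because they interact with each other.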
Visualize Genetic Interaction Clusters and Interact with MCL Clusters
- Select Visualize Clusters to display the diagonally symmetrical dendrogram. Slide this treeview a little off to the side so that you can see both the treeview and the MCL clustered network at the same time.
- Play with this representation a little. Note that the cells in the heat map represent edges rather than nodes. Typically, selection in this kind of dendrogram involves selecting a set of edges by holding down the shift key and the left mouse button and sweeping out a group of cells.
- Select the combined_scores_good.txt--clustered network that you created after the MCL clustering.
- Type GIM3 into the Search: box in the Cytoscape toolbar and press return. This should focus on the node that represents the GIM3 protein in the MCL clustered network. Zoom out a little so that you can see the entire cluster, which includes GIM3, GIM4, GIM5, PAC10, PFD1, YKE2, and BUD27.
- Select all of the nodes in that cluster.
- Note that those same genes are selected in the treeview, but the selection does not represent a tight cluster.
- Deselect the edge that connects the tighter cluster to BUD27.
- Note how this tightens up the cluster. All of the proteins in this cluster are known to be part of the prefoldin complex, although BUD27 is annotated as an "Unconventional prefoldin protein" and its molecular function is unknown.
This session file also has some expression data from an old yeast heat shock experiment loaded that can be used to explore the same data set. When multiple treeviews or other heat map views are up, selections are reflected in all of them.