Commit 5ff719de authored by Ayse Berceste Dincer

Update README.md

parent 98356dbe
@@ -2,11 +2,33 @@
### Nicasia Beebe-Wang, Ayse B. Dincer, and Su-In Lee
##### Paul G. Allen School of Computer Science & Engineering, University of Washington, Seattle
Although biological pathways are essential forinterpreting results from computational biologystudies, the growing number of pathway databasesmakes it difficult to perform pathway analysis.Our study seeks to reconcile pathways from dif-ferent databases and reduce pathway redundancyby revealing informative groups with distinct bio-logical functions. Uniquely applying the Louvaincommunity detection algorithm to a network of4,847 pathways from KEGG, REACTOME andGene Ontology databases, we identify 35 distinctcommunities of pathways and show that thesecommunities are consistent with expert-curatedpathway categories. Further, we develop an algo-rithm to automatically annotate each communitybased on member pathways’ names. By learn-ing informative categories, we progress towards atool that computational biologists can use to moreefficiently interpret their biological findings.
Although biological pathways are essential for interpreting results from computational biology studies, the growing number of pathway databases makes it difficult to perform pathway analysis. Our study seeks to reconcile pathways from different databases and reduce pathway redundancy by revealing informative groups with distinct biological functions. Uniquely applying the Louvain community detection algorithm to a network of 4,847 pathways from KEGG, REACTOME and Gene Ontology databases, we identify 35 distinct communities of pathways and show that these communities are consistent with expert-curated pathway categories. Further, we develop an algorithm to automatically annotate each community based on member pathways’ names. By learning informative categories, we progress towards a tool that computational biologists can use to more efficiently interpret their biological findings.
<img align="center" src="Concept_Figure.png" width="60%">.
<img align="center" src="Concept_Figure.jpg" width="60%">.
---
---
### Pipeline
**pathways_raw** folder contains raw pathway files from MSigDB.
**curated_hierarchies** folder contains the hierarchies and high-level categories for KEGG, REACTOME, and GO databases.
**pipeline** folder contains all the scripts in our pipeline.
#### 1. Generating adjacency matrices and curated hierarchies
**gmts_to_adj_matrices.py** and **gmts_to_adj_matrices_offdiagonal.py** define adjacency matrices for the MSigDB pathways by measuring the pairwise similarities between pathways within each database and across databases, respectively.
**save_true_labels.ipynb** script records the true curated labels for each pathway category.
**adj_matrices** and **curated_labels** folders contain the pathway adjacency matrices and curated labels for each pathway category, respectively.
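
As a rough illustration of the adjacency step, the sketch below parses GMT-formatted lines (tab-separated: pathway name, description, then gene symbols) and fills a symmetric matrix with pairwise similarities. The Jaccard index is an assumption made for illustration, since the similarity measure used by the scripts is not stated here, and the function names and toy pathways are hypothetical.

```python
from itertools import combinations


def parse_gmt_lines(lines):
    """Parse GMT-formatted lines into {pathway_name: set_of_genes}."""
    pathways = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, genes = fields[0], fields[2:]
        pathways[name] = {g for g in genes if g}
    return pathways


def jaccard(a, b):
    """Jaccard index of two gene sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def adjacency_matrix(pathways):
    """Return sorted pathway names and a symmetric similarity matrix."""
    names = sorted(pathways)
    n = len(names)
    adj = [[0.0] * n for _ in range(n)]  # diagonal left at 0 (no self-loops)
    for i, j in combinations(range(n), 2):
        w = jaccard(pathways[names[i]], pathways[names[j]])
        adj[i][j] = adj[j][i] = w
    return names, adj


# Toy input, not taken from MSigDB
toy_gmt = [
    "PATHWAY_A\tna\tG1\tG2\tG3",
    "PATHWAY_B\tna\tG2\tG3\tG4",
    "PATHWAY_C\tna\tG5",
]
names, adj = adjacency_matrix(parse_gmt_lines(toy_gmt))
```

In this toy case PATHWAY_A and PATHWAY_B share two of four distinct genes, so their edge weight is 0.5, while PATHWAY_C stays disconnected.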
#### 2. Comparison of clustering algorithms
**algorithm_helpers.py** is the helper script to run various clustering and community detection algorithms.
**CNM_networkx.py** is the modified version of the CNM algorithm.
**select_resolution_for_Louvain.ipynb** script selects the best resolution for the Louvain algorithm for each database category.
**comparison_of_algorithms.ipynb** script executes all the clustering algorithms and compares them across all pathway databases.
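
To make the role of the resolution parameter concrete, here is a minimal sketch of weighted modularity with a resolution term, the quantity that Louvain-style methods optimize. It is not the selection procedure from the notebook, whose criterion is not described here; the function name and toy graph are hypothetical.

```python
def modularity(adj, labels, resolution=1.0):
    """Weighted modularity of a partition with a resolution parameter.

    adj: symmetric adjacency matrix as a list of lists.
    labels: community id for each node, in matrix order.
    """
    n = len(adj)
    degree = [sum(row) for row in adj]
    two_m = sum(degree)  # twice the total edge weight
    if two_m == 0:
        return 0.0
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                # observed weight minus resolution-scaled expected weight
                q += adj[i][j] - resolution * degree[i] * degree[j] / two_m
    return q / two_m


# Toy graph: two disconnected edges, 0-1 and 2-3
toy_adj = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]
q_split = modularity(toy_adj, [0, 0, 1, 1])
q_merged = modularity(toy_adj, [0, 0, 0, 0])
q_split_high_res = modularity(toy_adj, [0, 0, 1, 1], resolution=2.0)
```

Raising the resolution increases the penalty on each community's expected internal weight, which tends to favor smaller communities. That is why the value needs tuning per database category.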
#### 3. Generating combined pathway network and learning communities
**combined_graph_louvain_with_weights.ipynb** defines the combined pathway network and applies the Louvain algorithm to learn pathway communities.
**Full_graph_louvain_with_weights_community_labels** includes the community labels learned at different resolutions.
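
One plausible way to assemble the combined network is to place the within-database matrices on the diagonal blocks and the across-database matrices on the off-diagonal blocks. The sketch below assumes that layout; the function name, argument shapes, and toy values are hypothetical and may differ from what the notebook does.

```python
def combine_blocks(within, across, order):
    """Assemble one symmetric matrix from per-database blocks.

    within: {db: square matrix} of within-database similarities.
    across: {(db_a, db_b): matrix} with len(within[db_a]) rows
            and len(within[db_b]) columns.
    order:  list of database names fixing the block order.
    """
    offsets, total = {}, 0
    for db in order:
        offsets[db] = total
        total += len(within[db])
    full = [[0.0] * total for _ in range(total)]
    # Diagonal blocks: similarities inside each database
    for db in order:
        o = offsets[db]
        for i, row in enumerate(within[db]):
            for j, w in enumerate(row):
                full[o + i][o + j] = w
    # Off-diagonal blocks: mirrored so the result stays symmetric
    for (a, b), block in across.items():
        oa, ob = offsets[a], offsets[b]
        for i, row in enumerate(block):
            for j, w in enumerate(row):
                full[oa + i][ob + j] = w
                full[ob + j][oa + i] = w
    return full


# Toy blocks, not real pathway similarities
within = {"KEGG": [[0.0, 0.5], [0.5, 0.0]], "GO": [[0.0]]}
across = {("KEGG", "GO"): [[0.2], [0.0]]}
full = combine_blocks(within, across, ["KEGG", "GO"])
```

The resulting matrix can then be handed to any weighted community detection routine.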
#### 4. Analysis of combined pathway network
**combined_graph_community_sizes_and_distributions.ipynb** investigates the size and pathway distribution for each pathway community.
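
The kind of summary this analysis produces can be sketched with two counters: one for community sizes and one for the per-database composition of each community. The input layout (pathway, database, community) and the toy rows are assumptions for illustration only.

```python
from collections import Counter, defaultdict


def community_summary(assignments):
    """Summarize community sizes and database composition.

    assignments: iterable of (pathway, database, community) tuples.
    Returns (sizes, by_db): sizes maps community -> pathway count,
    by_db maps community -> Counter of database names.
    """
    sizes = Counter()
    by_db = defaultdict(Counter)
    for _pathway, database, community in assignments:
        sizes[community] += 1
        by_db[community][database] += 1
    return sizes, by_db


# Toy assignments, not results from the study
toy = [
    ("P1", "KEGG", 0),
    ("P2", "REACTOME", 0),
    ("P3", "GO", 0),
    ("P4", "GO", 1),
    ("P5", "GO", 1),
]
sizes, by_db = community_summary(toy)
```

A community that draws pathways from all three databases, like community 0 in the toy data, is the kind of cross-database grouping the combined network is meant to reveal.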
\ No newline at end of file