Commit dca17bf5 authored by Nicasia Beebe-Wang

Update README.md

parent 3cb29d6c
@@ -12,33 +12,33 @@ Although biological pathways are essential for interpreting results from computa
### Resources
**Community_Members.xlsx** and **Community_Members.csv** files contain the list of all pathways included in each of the 35 pathway communities we learned.
**Community_kmers.xlsx** and **Community_kmers.csv** files contain the list of k-mers for each pathway community, along with the number of occurrences and the hubness of each pathway (within the community's subgraph).
- **Community_Members.xlsx** and **Community_Members.csv** files contain the list of all pathways included in each of the 35 pathway communities we learned.
- **Community_kmers.xlsx** and **Community_kmers.csv** files contain the list of k-mers for each pathway community, along with the number of occurrences and the hubness of each pathway (within the community's subgraph).
---
### Pipeline
**pathways_raw** folder contains raw pathway files from MSigDB.
**curated_hierarchies** folder contains the hierarchies and high-level categories for KEGG, REACTOME, and GO databases.
**pipeline** folder contains all the scripts in our pipeline.
- **pathways_raw** folder contains raw pathway files from MSigDB.
- **curated_hierarchies** folder contains the hierarchies and high-level categories for KEGG, REACTOME, and GO databases.
- **pipeline** folder contains all the scripts in our pipeline.
#### 1. Generating adjacency matrices and curated hierarchies
**gmts_to_adj_matrices.py** and **gmts_to_adj_matrices_offdiagonal.py** define adjacency matrices for the MSigDB pathways by measuring the pairwise similarities between pathways from each database and across databases, respectively.
**save_true_labels.ipynb** script records the true curated labels for each pathway category.
**adj_matrices** and **curated_labels** folders contain the pathway adjacency matrices and curated labels for each pathway category, respectively.
- **gmts_to_adj_matrices.py** and **gmts_to_adj_matrices_offdiagonal.py** define adjacency matrices for the MSigDB pathways by measuring the pairwise similarities between pathways from each database and across databases, respectively.
- **save_true_labels.ipynb** script records the true curated labels for each pathway category.
- **adj_matrices** and **curated_labels** folders contain the pathway adjacency matrices and curated labels for each pathway category, respectively.
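
As a rough illustration of the step above, pairwise pathway similarity can be sketched in a few lines of Python. This is not the code in **gmts_to_adj_matrices.py**: the Jaccard index is an assumed similarity measure (the README does not name one), and `parse_gmt`, `jaccard`, `adjacency_matrix`, and the example gene sets are hypothetical names and data chosen for illustration. The only format detail relied on is that a GMT line is tab-separated: set name, description, then gene symbols.

```python
from itertools import combinations


def parse_gmt(lines):
    """Parse GMT lines (name <TAB> description <TAB> gene1 <TAB> gene2 ...)."""
    pathways = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, genes = fields[0], fields[2:]
        pathways[name] = {g for g in genes if g}
    return pathways


def jaccard(a, b):
    """Jaccard index of two gene sets (assumed measure, see lead-in)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def adjacency_matrix(pathways):
    """Return sorted pathway names and a symmetric similarity matrix.

    The diagonal is left at 0.0, i.e. no self-loops.
    """
    names = sorted(pathways)
    n = len(names)
    matrix = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        w = jaccard(pathways[names[i]], pathways[names[j]])
        matrix[i][j] = matrix[j][i] = w
    return names, matrix


# Hypothetical toy input, not taken from MSigDB.
example_gmt = [
    "PATHWAY_A\tna\tTP53\tMDM2\tCDKN1A",
    "PATHWAY_B\tna\tTP53\tMDM2\tATM\tCHEK2",
    "PATHWAY_C\tna\tINS\tIRS1",
]
names, adj = adjacency_matrix(parse_gmt(example_gmt))
```

Here `PATHWAY_A` and `PATHWAY_B` share two of five distinct genes, so their edge weight is 0.4, while `PATHWAY_C` shares none and stays disconnected. An off-diagonal (cross-database) block could be built the same way by pairing pathways from two different GMT files instead of one.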
#### 2. Comparison of clustering algorithms
**algorithm_helpers.py** is the helper script to run various clustering and community detection algorithms.
**CNM_networkx.py** is a modified version of the CNM algorithm (which allows us to select the number of communities to generate) originally from [NetworkX](https://networkx.github.io/documentation/stable/_modules/networkx/algorithms/community/modularity_max.html).
**select_resolution_for_Louvain.ipynb** script selects the best resolution for the Louvain algorithm for each database category.
**comparison_of_algorithms.ipynb** script executes all the clustering algorithms and compares them across all pathway databases.
- **algorithm_helpers.py** is the helper script to run various clustering and community detection algorithms.
- **CNM_networkx.py** is a modified version of the CNM algorithm (which allows us to select the number of communities to generate) originally from [NetworkX](https://networkx.github.io/documentation/stable/_modules/networkx/algorithms/community/modularity_max.html).
- **select_resolution_for_Louvain.ipynb** script selects the best resolution for the Louvain algorithm for each database category.
- **comparison_of_algorithms.ipynb** script executes all the clustering algorithms and compares them across all pathway databases.
#### 3. Generating combined pathway network and learning communities
**combined_graph_louvain_with_weights.ipynb** defines the combined pathway network and applies the Louvain algorithm to learn pathway communities.
**Full_graph_louvain_with_weights_community_labels** includes the community labels learned using different resolutions.
- **combined_graph_louvain_with_weights.ipynb** defines the combined pathway network and applies the Louvain algorithm to learn pathway communities.
- **Full_graph_louvain_with_weights_community_labels** includes the community labels learned using different resolutions.
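
To make the role of the resolution concrete, the sketch below computes weighted modularity with a resolution parameter, the quantity that Louvain-style methods typically try to maximize. It is an assumption that the notebooks use this standard formulation; the function name `modularity`, the toy graph, and the two partitions are hypothetical and are not taken from the repository.

```python
from collections import defaultdict


def modularity(edges, communities, resolution=1.0):
    """Weighted modularity of a partition of an undirected graph.

    edges: iterable of (u, v, weight), each undirected edge listed once,
           no self-loops assumed.
    communities: dict mapping node -> community label.
    Q = sum over communities c of  L_c / m - resolution * (d_c / (2 m))^2,
    where m is the total edge weight, L_c the weight inside c, and d_c the
    summed weighted degree of the nodes in c.
    """
    total_weight = 0.0
    degree = defaultdict(float)
    internal = defaultdict(float)
    for u, v, w in edges:
        total_weight += w
        degree[u] += w
        degree[v] += w
        if communities[u] == communities[v]:
            internal[communities[u]] += w
    if total_weight == 0:
        return 0.0
    community_degree = defaultdict(float)
    for node, d in degree.items():
        community_degree[communities[node]] += d
    q = 0.0
    for c, d_c in community_degree.items():
        q += internal[c] / total_weight
        q -= resolution * (d_c / (2 * total_weight)) ** 2
    return q


# Toy graph: two triangles joined by a single bridge edge (c, d).
edges = [
    ("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 1.0),
    ("d", "e", 1.0), ("e", "f", 1.0), ("d", "f", 1.0),
    ("c", "d", 1.0),
]
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
merged = {node: 0 for node in split}

q_split_default = modularity(edges, split)            # resolution = 1.0
q_merged_default = modularity(edges, merged)
q_split_low = modularity(edges, split, resolution=0.1)
q_merged_low = modularity(edges, merged, resolution=0.1)
```

At the default resolution the two-triangle split scores higher than the single merged community, while at a low resolution such as 0.1 the merged partition wins. This is why sweeping the resolution yields community labelings of different granularity.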
#### 4. Analysis of combined pathway network
**combined_graph_analyses/community_sizes_and_distributions.ipynb** investigates the size and pathway distribution for each pathway community.
**combined_graph_analyses/curated_category_distributions_clustermaps.ipynb** generates cluster maps showing distributions of curated categories as they relate to our communities.
**combined_graph_analyses/generate_kmer_labels.ipynb** automatically generates labels for each community based on their members' names.
\ No newline at end of file
- **combined_graph_analyses/community_sizes_and_distributions.ipynb** investigates the size and pathway distribution for each pathway community.
- **combined_graph_analyses/curated_category_distributions_clustermaps.ipynb** generates cluster maps showing distributions of curated categories as they relate to our communities.
- **combined_graph_analyses/generate_kmer_labels.ipynb** automatically generates labels for each community based on their members' names.
\ No newline at end of file
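
One way to picture the label generation described for **generate_kmer_labels.ipynb** is to count word k-mers across the names of a community's members. The sketch below is only a guess at the general idea: it assumes a "k-mer" means k consecutive underscore-separated words of a pathway name, it ignores hubness entirely, and `word_kmers`, `top_kmers`, the prefix list, and the example names are hypothetical.

```python
from collections import Counter


def word_kmers(words, k):
    """Return all runs of k consecutive words, joined with underscores."""
    return ["_".join(words[i:i + k]) for i in range(len(words) - k + 1)]


def top_kmers(member_names, k=2, drop_prefixes=("KEGG", "REACTOME", "GO"), top=3):
    """Rank word k-mers by the number of member pathways that contain them."""
    counts = Counter()
    for name in member_names:
        words = [w for w in name.upper().split("_") if w]
        if words and words[0] in drop_prefixes:
            words = words[1:]  # drop the database prefix (assumed convention)
        # Use a set so each k-mer is counted at most once per pathway.
        counts.update(set(word_kmers(words, k)))
    # Sort by descending count, then alphabetically, for a deterministic order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top]


# Illustrative member names for one hypothetical community.
members = [
    "KEGG_CELL_CYCLE",
    "REACTOME_CELL_CYCLE_MITOTIC",
    "GO_CELL_CYCLE_CHECKPOINT",
    "REACTOME_MITOTIC_G1_PHASE",
]
best = top_kmers(members, k=2, top=1)
```

In this toy community, `CELL_CYCLE` appears in three of the four member names, so it would be the candidate label for k = 2.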