---
license: mit
tags:
- cancer-genomics
- bioinformatics
- graph-database
- neo4j
- distributed-computing
- boinc
- healthcare
- genomics
- fastq
- blast
- variant-calling
- gdc-portal
- tcga
library_name: cancer-at-home-v2
pipeline_tag: other
metrics:
- accuracy
- bleu
- bleurt
---
# Cancer@Home v2
A distributed computing platform for cancer genomics research, combining BOINC distributed computing, GDC cancer data analysis, sequence processing (FASTQ/BLAST), and Neo4j graph visualization.
## 🚀 Quick Start (5 minutes)
### Prerequisites
- Python 3.8+
- Docker Desktop
- 8GB RAM minimum
### Installation
1. **Clone and setup**
```bash
cd CancerAtHome2
python -m venv venv
venv\Scripts\activate          # Windows
# source venv/bin/activate     # macOS/Linux
pip install -r requirements.txt
```
2. **Start Neo4j Database**
```bash
docker-compose up -d
```
3. **Run the application**
```bash
python run.py
```
4. **Open your browser**
- Application: http://localhost:5000
- Neo4j Browser: http://localhost:7474 (username: neo4j, password: cancer123)
## 🎯 Features
### 1. **Distributed Computing (BOINC Integration)**
- Submit cancer research computational tasks
- Monitor distributed workload processing
- Real-time task status tracking
### 2. **GDC Data Integration**
- Download cancer genomics data from GDC Portal
- Support for multiple cancer types across GDC programs such as TCGA and TARGET
- Automatic data parsing and normalization
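As a rough illustration of what a GDC download step might look like, the sketch below builds a query for the public GDC `files` endpoint. It only constructs the request parameters and URL and does not contact the network. The function names `build_files_params` and `build_files_url`, the choice of `MAF` as the default data format, and the selected fields are assumptions for illustration, not this project's actual GDC client.

```python
import json
from urllib.parse import urlencode

# Public GDC API endpoint for file metadata.
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def build_files_params(project_id, data_format="MAF", size=10):
    """Return query parameters selecting files of one format from one project.

    Hypothetical helper: the project's own GDC client may use other filters.
    """
    filters = {
        "op": "and",
        "content": [
            {
                "op": "in",
                "content": {
                    "field": "cases.project.project_id",
                    "value": [project_id],
                },
            },
            {
                "op": "in",
                "content": {"field": "files.data_format", "value": [data_format]},
            },
        ],
    }
    return {
        "filters": json.dumps(filters),  # GDC expects the filter as a JSON string
        "fields": "file_id,file_name,data_format",
        "format": "JSON",
        "size": str(size),
    }


def build_files_url(project_id, data_format="MAF", size=10):
    """Return a GET URL for the query above (the request itself is not sent)."""
    params = build_files_params(project_id, data_format, size)
    return f"{GDC_FILES_ENDPOINT}?{urlencode(params)}"
```

For example, `build_files_url("TCGA-BRCA")` yields a URL that could be fetched with any HTTP client; controlled-access files would additionally require a GDC token.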
### 3. **Sequence Analysis Pipeline**
- FASTQ file processing
- BLAST sequence alignment
- Variant calling and annotation
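To make the FASTQ processing step concrete, here is a minimal standard-library sketch of a FASTQ reader. It assumes the common four-line record layout (header, sequence, `+` separator, quality) and Phred+33 quality encoding. `FastqRecord`, `parse_fastq`, and `mean_phred` are hypothetical names and not necessarily what the project's `pipeline` module exposes.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    read_id: str
    sequence: str
    quality: str


def parse_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:  # end of file
            return
        header = header.rstrip("\r\n")
        if not header:  # tolerate blank lines between records
            continue
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        fields = header[1:].split()
        yield FastqRecord(fields[0] if fields else "", sequence, quality)


def mean_phred(quality: str, offset: int = 33) -> float:
    """Average Phred score of a quality string (Phred+33 by default)."""
    if not quality:
        return 0.0
    return sum(ord(char) - offset for char in quality) / len(quality)
```

A file on disk can be read the same way with `with open(path) as handle: ...`; wrapping a string in `io.StringIO` is handy for quick checks.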
### 4. **Neo4j Graph Database**
- Graph-based cancer data modeling
- Relationships: Gene → Mutation → Patient → Cancer Type
- Interactive graph visualization
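The Gene → Mutation → Patient → Cancer Type chain could be written to Neo4j with a parameterized Cypher statement along these lines. The node labels mirror the names above, but the property keys and the relationship types (`HAS_MUTATION`, `FOUND_IN`, `DIAGNOSED_WITH`) are assumptions made for this sketch; the project's real schema may differ.

```python
# Cypher that creates (or reuses) one path through the graph model.
# Relationship types and property names are hypothetical.
MERGE_PATH_QUERY = (
    "MERGE (g:Gene {symbol: $gene}) "
    "MERGE (m:Mutation {id: $mutation_id}) "
    "MERGE (p:Patient {id: $patient_id}) "
    "MERGE (c:CancerType {name: $cancer_type}) "
    "MERGE (g)-[:HAS_MUTATION]->(m) "
    "MERGE (m)-[:FOUND_IN]->(p) "
    "MERGE (p)-[:DIAGNOSED_WITH]->(c)"
)


def build_merge_statement(gene, mutation_id, patient_id, cancer_type):
    """Return (query, parameters) ready to hand to a Neo4j session."""
    parameters = {
        "gene": gene,
        "mutation_id": mutation_id,
        "patient_id": patient_id,
        "cancer_type": cancer_type,
    }
    return MERGE_PATH_QUERY, parameters
```

With the official `neo4j` Python driver, the returned pair could be passed to `session.run(query, parameters)`. Using `MERGE` rather than `CREATE` keeps repeated imports from duplicating nodes, and passing values as parameters avoids string interpolation into the query.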
### 5. **GraphQL API**
- Query cancer data flexibly
- Filter by gene, mutation, patient cohort
- Aggregate statistics
### 6. **Interactive Dashboard**
- Real-time data visualization
- Network graphs for gene interactions
- Mutation frequency charts
- Patient cohort analysis
## 📊 Architecture
```
Cancer@Home v2
│
├── Frontend (React + D3.js)
│   ├── Dashboard
│   ├── Neo4j Visualization
│   └── Task Monitor
│
├── Backend (FastAPI)
│   ├── REST API
│   ├── GraphQL Endpoint
│   └── WebSocket (real-time updates)
│
├── Data Layer
│   ├── Neo4j (Graph Database)
│   ├── BOINC Client
│   └── GDC API Client
│
└── Analysis Pipeline
    ├── FASTQ Parser
    ├── BLAST Wrapper
    └── Variant Annotator
```
## 🗂️ Project Structure
```
CancerAtHome2/
├── backend/
│   ├── api/              # FastAPI routes
│   ├── boinc/            # BOINC integration
│   ├── gdc/              # GDC data fetcher
│   ├── neo4j/            # Neo4j database layer
│   ├── pipeline/         # Bioinformatics pipeline
│   └── graphql/          # GraphQL schema
├── frontend/
│   ├── public/
│   └── src/
│       ├── components/   # React components
│       ├── views/        # Page views
│       └── api/          # API client
├── data/                 # Downloaded datasets
├── docker-compose.yml    # Neo4j container
├── requirements.txt      # Python dependencies
└── run.py                # Main entry point
```
## 🧬 Data Flow
1. **Data Ingestion**: Download cancer genomics data from GDC Portal
2. **Processing**: Run FASTQ/BLAST analysis on the distributed BOINC network
3. **Storage**: Store results in Neo4j graph database
4. **Visualization**: Query and visualize via web dashboard
## 🔧 Configuration
Edit `config.yml` to customize:
- Neo4j connection settings
- GDC API parameters
- BOINC project URL
- Analysis pipeline options
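One way such settings could be layered is to keep defaults in code and overlay whatever `config.yml` provides. The sketch below shows only the merge step. Every key name is hypothetical, the Neo4j URI assumes the default Bolt port, and parsing the YAML file itself (for example with PyYAML's `yaml.safe_load`) is left out so the example stays standard-library only.

```python
from copy import deepcopy

# Hypothetical defaults mirroring the four configuration areas above.
DEFAULTS = {
    "neo4j": {
        "uri": "bolt://localhost:7687",  # assumption: default Bolt port
        "user": "neo4j",
        "password": "cancer123",
    },
    "gdc": {"api_url": "https://api.gdc.cancer.gov"},
    "boinc": {"project_url": ""},  # no default project URL is given in this README
    "pipeline": {},
}


def merge_config(defaults, overrides):
    """Recursively overlay `overrides` on `defaults` without mutating either."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged
```

Calling `merge_config(DEFAULTS, loaded)` with the dictionary parsed from `config.yml` would then override only the keys the file actually sets.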
## 📖 Usage Examples
### Query Mutations by Gene
```graphql
query {
  mutations(gene: "TP53") {
    id
    position
    consequence
    patients {
      cancerType
      stage
    }
  }
}
```
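A query like this is typically sent as a JSON `POST` body. The sketch below prepares such a request with the standard library and does not send it. The `/graphql` path on the application URL, the `String!` type of the `gene` argument, and the helper name `build_graphql_request` are assumptions; check the backend's schema and routes for the real values.

```python
import json
import urllib.request

# Assumption: the GraphQL endpoint is mounted at /graphql on the app server.
GRAPHQL_URL = "http://localhost:5000/graphql"

# Same selection as a literal query, with the gene passed as a variable.
MUTATIONS_QUERY = """
query MutationsByGene($gene: String!) {
  mutations(gene: $gene) {
    id
    position
    consequence
  }
}
"""


def build_graphql_request(gene, url=GRAPHQL_URL):
    """Build (but do not send) a POST request for mutations of one gene."""
    body = json.dumps(
        {"query": MUTATIONS_QUERY, "variables": {"gene": gene}}
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With the server running, the request could be executed with `urllib.request.urlopen(build_graphql_request("TP53"))` and the response decoded with `json.load`. Passing the gene as a variable avoids building query strings by hand.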
### Submit Analysis Task
```python
from backend.boinc import BOINCClient

# Create a client and queue a variant-calling work unit for a FASTQ file
client = BOINCClient()
task_id = client.submit_task(
    workunit_type="variant_calling",
    input_file="sample.fastq",
)
```
## 🤝 Inspired By
- [Cancer@Home v1](https://www.herox.com/DCx/round/516/entry/23285) - Distributed cancer research
- [Neo4j Cancer Visualization](https://medium.com/neo4j/visualize-cancer-1c80a95f5bb4) - Graph-based cancer data modeling
## 📄 License
MIT License
## 🛟 Support
For issues or questions, please open an issue on Hugging Face or GitHub.