Cancer@Home v2
A distributed computing platform for cancer genomics research, combining BOINC distributed computing, GDC cancer data analysis, sequence processing (FASTQ/BLAST), and Neo4j graph visualization.
🚀 Quick Start (5 minutes)
Prerequisites
- Python 3.8+
- Docker Desktop
- 8GB RAM minimum
Installation
- Clone the repository and set up a virtual environment
cd CancerAtHome2
python -m venv venv
venv\Scripts\activate # Windows
source venv/bin/activate # macOS/Linux (use instead of the Windows line)
pip install -r requirements.txt
- Start Neo4j Database
docker-compose up -d
- Run the application
python run.py
- Open your browser
- Application: http://localhost:5000
- Neo4j Browser: http://localhost:7474 (username: neo4j, password: cancer123)
🎯 Features
1. Distributed Computing (BOINC Integration)
- Submit cancer research computational tasks
- Monitor distributed workload processing
- Real-time task status tracking
2. GDC Data Integration
- Download cancer genomics data from GDC Portal
- Support for various cancer types (TCGA, TARGET projects)
- Automatic data parsing and normalization
3. Sequence Analysis Pipeline
- FASTQ file processing
- BLAST sequence alignment
- Variant calling and annotation
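To illustrate the FASTQ processing step, here is a minimal sketch of a parser for the standard four-line FASTQ record layout. It is not the project's pipeline code: the names `FastqRecord`, `parse_fastq`, and `mean_quality` are hypothetical, and Phred+33 quality encoding is assumed.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class FastqRecord:
    """One read: identifier, bases, and per-base quality string."""
    read_id: str
    sequence: str
    quality: str


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from four-line FASTQ entries (no multi-line sequences)."""
    it = (line.rstrip("\r\n") for line in lines)
    for header in it:
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it, None)
        plus = next(it, None)
        qual = next(it, None)
        # qual is None means the file ended in the middle of a record.
        if qual is None or not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], seq, qual)


def mean_quality(record: FastqRecord, offset: int = 33) -> float:
    """Mean Phred score of a read, assuming Phred+33 encoding by default."""
    if not record.quality:
        return 0.0
    return sum(ord(c) - offset for c in record.quality) / len(record.quality)
```

Because `parse_fastq` accepts any iterable of lines, an open file handle can be passed directly, so reads are streamed rather than loaded into memory at once.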
4. Neo4j Graph Database
- Graph-based cancer data modeling
- Relationships: Gene → Mutation → Patient → Cancer Type
- Interactive graph visualization
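The Gene → Mutation → Patient → Cancer Type chain could be queried with a Cypher pattern along the following lines. This is a sketch only: the node labels, relationship types (`HAS_MUTATION`, `FOUND_IN`, `DIAGNOSED_WITH`), property names, and the helper `build_gene_query` are assumptions for illustration, and the project's real schema may differ.

```python
from typing import Any, Dict, Tuple

# Hypothetical labels, relationship types, and properties.
# Values are passed as Cypher parameters ($symbol, $limit) rather than
# being interpolated into the query string.
GENE_TO_CANCER_QUERY = (
    "MATCH (g:Gene {symbol: $symbol})"
    "-[:HAS_MUTATION]->(m:Mutation)"
    "-[:FOUND_IN]->(p:Patient)"
    "-[:DIAGNOSED_WITH]->(c:CancerType) "
    "RETURN m.id AS mutation_id, p.id AS patient_id, c.name AS cancer_type "
    "LIMIT $limit"
)


def build_gene_query(symbol: str, limit: int = 25) -> Tuple[str, Dict[str, Any]]:
    """Return the Cypher text and its parameter map for one gene symbol."""
    if not symbol:
        raise ValueError("gene symbol must be non-empty")
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return GENE_TO_CANCER_QUERY, {"symbol": symbol.upper(), "limit": limit}
```

The returned pair can be pasted into the Neo4j Browser (after setting the parameters) or handed to a driver session; keeping values in parameters avoids building queries by string concatenation.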
5. GraphQL API
- Query cancer data flexibly
- Filter by gene, mutation, patient cohort
- Aggregate statistics
6. Interactive Dashboard
- Real-time data visualization
- Network graphs for gene interactions
- Mutation frequency charts
- Patient cohort analysis
📊 Architecture
Cancer@Home v2
│
├── Frontend (React + D3.js)
│ ├── Dashboard
│ ├── Neo4j Visualization
│ └── Task Monitor
│
├── Backend (FastAPI)
│ ├── REST API
│ ├── GraphQL Endpoint
│ └── WebSocket (real-time updates)
│
├── Data Layer
│ ├── Neo4j (Graph Database)
│ ├── BOINC Client
│ └── GDC API Client
│
└── Analysis Pipeline
  ├── FASTQ Parser
  ├── BLAST Wrapper
  └── Variant Annotator
🗂️ Project Structure
CancerAtHome2/
├── backend/
│ ├── api/ # FastAPI routes
│ ├── boinc/ # BOINC integration
│ ├── gdc/ # GDC data fetcher
│ ├── neo4j/ # Neo4j database layer
│ ├── pipeline/ # Bioinformatics pipeline
│ └── graphql/ # GraphQL schema
├── frontend/
│ ├── public/
│ └── src/
│   ├── components/ # React components
│   ├── views/ # Page views
│   └── api/ # API client
├── data/ # Downloaded datasets
├── docker-compose.yml # Neo4j container
├── requirements.txt # Python dependencies
└── run.py # Main entry point
🧬 Data Flow
- Data Ingestion: Download cancer genomics data from GDC Portal
- Processing: Run FASTQ/BLAST analysis on distributed BOINC network
- Storage: Store results in Neo4j graph database
- Visualization: Query and visualize via web dashboard
🔧 Configuration
Edit config.yml to customize:
- Neo4j connection settings
- GDC API parameters
- BOINC project URL
- Analysis pipeline options
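The layout of config.yml is not documented here, so the following is only a sketch of how the four groups of settings might be represented and combined with user overrides. All key names, the `merge_config` helper, and the `min_mean_quality` option are hypothetical; the Neo4j credentials come from the Quick Start section, the Bolt URI assumes Neo4j's default port, and the BOINC project URL is left blank because it is not given.

```python
import copy
from typing import Any, Dict

# Hypothetical defaults mirroring the four configurable areas.
DEFAULTS: Dict[str, Any] = {
    "neo4j": {
        "uri": "bolt://localhost:7687",  # assumes the default Bolt port
        "user": "neo4j",
        "password": "cancer123",
    },
    "gdc": {"base_url": "https://api.gdc.cancer.gov", "page_size": 100},
    "boinc": {"project_url": ""},  # not specified in this README
    "pipeline": {"min_mean_quality": 20},
}


def merge_config(defaults: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively overlay overrides on defaults without mutating either."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged
```

In practice the `overrides` dictionary would come from parsing config.yml with a YAML library; that step is omitted so the sketch stays within the standard library.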
📖 Usage Examples
Query Mutations by Gene
query {
  mutations(gene: "TP53") {
    id
    position
    consequence
    patients {
      cancerType
      stage
    }
  }
}
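A query like the one above could be sent from Python over HTTP. The sketch below uses only the standard library and rests on several assumptions: that the GraphQL endpoint is served at `/graphql` on the application port, that it accepts the conventional JSON POST body with `query` and `variables`, and that the `gene` argument is typed `String!`. The helper names `build_request` and `run_query` are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

GRAPHQL_URL = "http://localhost:5000/graphql"  # assumed endpoint path

# Same fields as the README query, with the gene passed as a variable.
MUTATIONS_BY_GENE = """
query MutationsByGene($gene: String!) {
  mutations(gene: $gene) {
    id
    position
    consequence
    patients { cancerType stage }
  }
}
"""


def build_request(
    query: str,
    variables: Optional[Dict[str, Any]] = None,
    url: str = GRAPHQL_URL,
) -> urllib.request.Request:
    """Build a JSON POST request carrying the query and its variables."""
    body = json.dumps({"query": query, "variables": variables or {}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(
    query: str,
    variables: Optional[Dict[str, Any]] = None,
    url: str = GRAPHQL_URL,
    timeout: float = 10.0,
) -> Dict[str, Any]:
    """Send the query and return the `data` part of the response."""
    request = build_request(query, variables, url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        payload = json.loads(resp.read().decode("utf-8"))
    if payload.get("errors"):
        raise RuntimeError(f"GraphQL errors: {payload['errors']}")
    return payload.get("data") or {}
```

With the application running, `run_query(MUTATIONS_BY_GENE, {"gene": "TP53"})` would return the decoded result, provided the assumptions about the endpoint hold.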
Submit Analysis Task
from backend.boinc import BOINCClient

# Submit a variant-calling work unit for a local FASTQ file
client = BOINCClient()
task_id = client.submit_task(
    workunit_type="variant_calling",
    input_file="sample.fastq",
)
🤝 Inspired By
- Cancer@Home v1 - Distributed cancer research
- Neo4j Cancer Visualization - Graph-based cancer data modeling
📄 License
MIT License
🛟 Support
For issues or questions, please open a Hugging Face discussion or a GitHub issue.