Cancer@Home v2

A distributed computing platform for cancer genomics research, combining BOINC distributed computing, GDC cancer data analysis, sequence processing (FASTQ/BLAST), and Neo4j graph visualization.

🚀 Quick Start (5 minutes)

Prerequisites

Python 3.8+
Docker Desktop
8GB RAM minimum

Installation

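Clone the repository, then set up the environment

cd CancerAtHome2
python -m venv venv
venv\Scripts\activate        # Windows
source venv/bin/activate     # macOS/Linux (use instead of the line above)
pip install -r requirements.txt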

Start Neo4j Database

docker-compose up -d

Run the application

python run.py

Open your browser

Application: http://localhost:5000
Neo4j Browser: http://localhost:7474 (username: neo4j, password: cancer123)
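
To confirm that both services came up, a small standard-library check can be run from the same virtual environment. This is a minimal sketch, not part of the project: the helper name `is_reachable` is hypothetical, and the only inputs taken from this README are the two URLs listed above.

```python
import urllib.error
import urllib.request

# URLs taken from the Quick Start above.
SERVICES = {
    "Application": "http://localhost:5000",
    "Neo4j Browser": "http://localhost:7474",
}


def is_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if a server at `url` answers an HTTP request at all."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered with a non-2xx status, so it is still running.
        return True
    except OSError:
        # Connection refused, timeout, DNS failure and similar errors.
        return False


if __name__ == "__main__":
    for name, url in SERVICES.items():
        status = "up" if is_reachable(url) else "not reachable"
        print(f"{name}: {url} is {status}")
```

If Neo4j is reported as not reachable right after `docker-compose up -d`, waiting a few seconds and retrying is usually enough, since the database takes a moment to start.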

🎯 Features

1. Distributed Computing (BOINC Integration)

Submit cancer research computational tasks
Monitor distributed workload processing
Real-time task status tracking

2. GDC Data Integration

Download cancer genomics data from GDC Portal
Support for various cancer types (TCGA, TARGET projects)
Automatic data parsing and normalization
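
As an illustration of what a GDC download request can look like, the sketch below builds a query URL for the public GDC REST API (`https://api.gdc.cancer.gov/files`) that lists files belonging to one project. The function name `build_gdc_files_url` and the chosen fields are assumptions made for this example; the project's own `backend/gdc` module may work differently.

```python
import json
from urllib.parse import urlencode

GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def build_gdc_files_url(project_id: str, size: int = 10) -> str:
    """Build a GDC /files query URL restricted to a single project."""
    # GDC filters are JSON objects made of an operator and its content.
    filters = {
        "op": "in",
        "content": {
            "field": "cases.project.project_id",
            "value": [project_id],
        },
    }
    params = {
        "filters": json.dumps(filters),
        "fields": "file_id,file_name,data_format",
        "format": "JSON",
        "size": str(size),
    }
    return f"{GDC_FILES_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    # TCGA-BRCA is used here only as an example project identifier.
    print(build_gdc_files_url("TCGA-BRCA", size=5))
```

The resulting URL can be opened with any HTTP client. Building the URL separately from sending it keeps the query logic easy to test without network access.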

3. Sequence Analysis Pipeline

FASTQ file processing
BLAST sequence alignment
Variant calling and annotation
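
The FASTQ step can be pictured with a minimal pure-Python reader. This is a sketch under stated assumptions, not the project's parser: it handles only the common four-lines-per-record layout, assumes Phred+33 quality encoding, and the names `FastqRecord`, `parse_fastq`, and `mean_phred` are hypothetical.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    identifier: str
    sequence: str
    quality: str


def parse_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a four-lines-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        if not header.strip():
            continue  # tolerate blank lines between records
        sequence = handle.readline().strip()
        separator = handle.readline().strip()
        quality = handle.readline().strip()
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"Malformed FASTQ record near {header.strip()!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"Length mismatch in record {header.strip()!r}")
        yield FastqRecord(header[1:].strip(), sequence, quality)


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 by default)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


if __name__ == "__main__":
    sample = io.StringIO("@read1\nACGT\n+\nIIII\n")
    for record in parse_fastq(sample):
        print(record.identifier, record.sequence, mean_phred(record.quality))
```

Reading records lazily with a generator keeps memory use flat, which matters because real FASTQ files are often several gigabytes.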

4. Neo4j Graph Database

Graph-based cancer data modeling
Relationships: Gene → Mutation → Patient → Cancer Type
Interactive graph visualization
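
The Gene → Mutation → Patient → Cancer Type chain could be written to Neo4j with a parameterized Cypher statement like the one sketched below. The node labels follow the relationship line above, but the relationship types (`HAS_MUTATION`, `FOUND_IN`, `DIAGNOSED_WITH`), the property names, and the helper `mutation_statement` are assumptions for illustration; the actual schema in `backend/neo4j` may differ.

```python
from typing import Any, Dict, Tuple

# Hypothetical schema: labels mirror the README, relationship types are assumed.
MERGE_MUTATION_CYPHER = """
MERGE (g:Gene {symbol: $gene})
MERGE (m:Mutation {id: $mutation_id})
MERGE (p:Patient {id: $patient_id})
MERGE (c:CancerType {name: $cancer_type})
MERGE (g)-[:HAS_MUTATION]->(m)
MERGE (m)-[:FOUND_IN]->(p)
MERGE (p)-[:DIAGNOSED_WITH]->(c)
""".strip()


def mutation_statement(
    gene: str, mutation_id: str, patient_id: str, cancer_type: str
) -> Tuple[str, Dict[str, Any]]:
    """Return a (query, parameters) pair describing one observed mutation."""
    parameters = {
        "gene": gene,
        "mutation_id": mutation_id,
        "patient_id": patient_id,
        "cancer_type": cancer_type,
    }
    return MERGE_MUTATION_CYPHER, parameters


if __name__ == "__main__":
    # Placeholder identifiers, not real data.
    query, parameters = mutation_statement(
        "TP53", "example-mutation", "example-patient", "example-cancer-type"
    )
    print(query)
    print(parameters)
```

With the official `neo4j` Python driver, such a pair would typically be passed to `session.run(query, parameters)`. Using `MERGE` rather than `CREATE` keeps repeated imports from duplicating nodes.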

5. GraphQL API

Query cancer data flexibly
Filter by gene, mutation, patient cohort
Aggregate statistics

6. Interactive Dashboard

Real-time data visualization
Network graphs for gene interactions
Mutation frequency charts
Patient cohort analysis
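
A mutation frequency chart needs, for each gene, the fraction of the cohort carrying at least one mutation in that gene. The sketch below shows one way to compute that from (patient, gene) pairs. The function name `mutation_frequency` is hypothetical and the sample pairs are made-up placeholders, not results.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def mutation_frequency(
    observations: Iterable[Tuple[str, str]], cohort_size: int
) -> Dict[str, float]:
    """Fraction of the cohort with at least one mutation in each gene.

    `observations` holds (patient_id, gene_symbol) pairs; a patient with
    several mutations in the same gene is counted once for that gene.
    """
    if cohort_size <= 0:
        raise ValueError("cohort_size must be positive")
    unique_pairs = set(observations)
    patients_per_gene = Counter(gene for _patient, gene in unique_pairs)
    return {gene: count / cohort_size for gene, count in patients_per_gene.items()}


if __name__ == "__main__":
    # Made-up placeholder pairs for a cohort of four patients.
    toy_observations = [
        ("P1", "TP53"),
        ("P1", "TP53"),  # second TP53 mutation in the same patient
        ("P2", "TP53"),
        ("P3", "KRAS"),
    ]
    print(mutation_frequency(toy_observations, cohort_size=4))
```

Passing the cohort size explicitly matters: patients without any recorded mutation never appear in the pairs, so deriving the denominator from the observations alone would overstate every frequency.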

📊 Architecture

Cancer@Home v2
│
├── Frontend (React + D3.js)
│   ├── Dashboard
│   ├── Neo4j Visualization
│   └── Task Monitor
│
├── Backend (FastAPI)
│   ├── REST API
│   ├── GraphQL Endpoint
│   └── WebSocket (real-time updates)
│
├── Data Layer
│   ├── Neo4j (Graph Database)
│   ├── BOINC Client
│   └── GDC API Client
│
└── Analysis Pipeline
    ├── FASTQ Parser
    ├── BLAST Wrapper
    └── Variant Annotator

🗂️ Project Structure

CancerAtHome2/
├── backend/
│   ├── api/              # FastAPI routes
│   ├── boinc/            # BOINC integration
│   ├── gdc/              # GDC data fetcher
│   ├── neo4j/            # Neo4j database layer
│   ├── pipeline/         # Bioinformatics pipeline
│   └── graphql/          # GraphQL schema
├── frontend/
│   ├── public/
│   └── src/
│       ├── components/   # React components
│       ├── views/        # Page views
│       └── api/          # API client
├── data/                 # Downloaded datasets
├── docker-compose.yml    # Neo4j container
├── requirements.txt      # Python dependencies
└── run.py                # Main entry point

🧬 Data Flow

1. Data Ingestion: Download cancer genomics data from GDC Portal
2. Processing: Run FASTQ/BLAST analysis on distributed BOINC network
3. Storage: Store results in Neo4j graph database
4. Visualization: Query and visualize via web dashboard

🔧 Configuration

Edit config.yml to customize:

Neo4j connection settings
GDC API parameters
BOINC project URL
Analysis pipeline options
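
The actual keys in config.yml are not documented here, so the following is only a guess at how the four groups above might be represented once the file is parsed (for example with PyYAML's `yaml.safe_load`). Every section name, key, and class name in this sketch is hypothetical, except the Neo4j credentials, which come from the Quick Start, and the GDC base URL, which is the public API address.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, Mapping


@dataclass
class Neo4jSettings:
    uri: str = "bolt://localhost:7687"  # Neo4j's default Bolt port
    user: str = "neo4j"
    password: str = "cancer123"  # matches the Quick Start credentials


@dataclass
class AppConfig:
    neo4j: Neo4jSettings = field(default_factory=Neo4jSettings)
    gdc: Dict[str, Any] = field(
        default_factory=lambda: {"base_url": "https://api.gdc.cancer.gov"}
    )
    boinc: Dict[str, Any] = field(default_factory=lambda: {"project_url": ""})
    pipeline: Dict[str, Any] = field(default_factory=dict)


def config_from_mapping(raw: Mapping[str, Any]) -> AppConfig:
    """Overlay values from a parsed config file onto the defaults."""
    config = AppConfig()
    # Unknown Neo4j keys raise TypeError, which surfaces typos early.
    neo4j_values = {**asdict(config.neo4j), **(raw.get("neo4j") or {})}
    config.neo4j = Neo4jSettings(**neo4j_values)
    for section in ("gdc", "boinc", "pipeline"):
        getattr(config, section).update(raw.get(section) or {})
    return config


if __name__ == "__main__":
    # Stand-in for the dictionary a YAML parser would return.
    parsed = {"neo4j": {"password": "change-me"}}
    print(config_from_mapping(parsed))
```

Keeping defaults in code and overlaying the file on top means a config.yml only has to list the settings that differ from the defaults.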

📖 Usage Examples

Query Mutations by Gene

query {
  mutations(gene: "TP53") {
    id
    position
    consequence
    patients {
      cancerType
      stage
    }
  }
}
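
The query above can also be sent from a script. The sketch below uses only the standard library and follows the common GraphQL-over-HTTP convention of POSTing a JSON body with `query` and `variables`. The endpoint path `/graphql`, the variable type `String!`, and the helper names are assumptions; check the running application for the real endpoint.

```python
import json
import urllib.request

# Assumed endpoint: the application URL from the Quick Start plus /graphql.
GRAPHQL_URL = "http://localhost:5000/graphql"

# Same selection as the query above, with the gene passed as a variable.
MUTATIONS_BY_GENE = """
query MutationsByGene($gene: String!) {
  mutations(gene: $gene) {
    id
    position
    consequence
    patients {
      cancerType
      stage
    }
  }
}
"""


def build_graphql_request(
    query: str, variables: dict, url: str = GRAPHQL_URL
) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a GraphQL query."""
    body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(request: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    request = build_graphql_request(MUTATIONS_BY_GENE, {"gene": "TP53"})
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Once the application is running, `run_query(request)` would return the decoded response. Passing the gene as a variable instead of formatting it into the query string avoids quoting mistakes.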

Submit Analysis Task

from backend.boinc import BOINCClient

# Submit a variant-calling work unit for a FASTQ input file
client = BOINCClient()
task_id = client.submit_task(
    workunit_type="variant_calling",
    input_file="sample.fastq",
)
print(f"Submitted task: {task_id}")

🤝 Inspired By

Cancer@Home v1 - Distributed cancer research
Neo4j Cancer Visualization - Graph-based cancer data modeling

📄 License

MIT License

🛟 Support

For issues or questions, please open a discussion on Hugging Face or an issue on GitHub.
