Cancer@Home v2

A distributed computing platform for cancer genomics research. It combines BOINC volunteer computing, cancer data from the NCI Genomic Data Commons (GDC), sequence processing (FASTQ/BLAST), and Neo4j graph visualization.

🚀 Quick Start (5 minutes)

Prerequisites

  • Python 3.8+
  • Docker Desktop
  • 8GB RAM minimum

Installation

  1. Clone and set up
cd CancerAtHome2
python -m venv venv
venv\Scripts\activate        # Windows
source venv/bin/activate     # macOS/Linux
pip install -r requirements.txt
  2. Start the Neo4j database
docker-compose up -d
  3. Run the application
python run.py
  4. Open your browser

🎯 Features

1. Distributed Computing (BOINC Integration)

  • Submit cancer research computational tasks
  • Monitor distributed workload processing
  • Real-time task status tracking

2. GDC Data Integration

  • Download cancer genomics data from GDC Portal
  • Support for multiple cancer types across GDC programs (e.g., TCGA, TARGET)
  • Automatic data parsing and normalization
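
As a rough illustration of the download step, the sketch below builds a query URL for the public GDC API `files` endpoint, filtered by project and data type. It is not this project's `backend/gdc` module; the function name `build_gdc_files_query` and the example project and data type are assumptions chosen for illustration. No network request is made.

```python
import json
from urllib.parse import urlencode

# Public GDC API endpoint for file metadata.
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def build_gdc_files_query(project_id, data_type, size=10):
    """Return a GDC /files URL listing files of one data type in one project.

    Hypothetical helper for illustration; not part of the project's code.
    """
    filters = {
        "op": "and",
        "content": [
            {"op": "in",
             "content": {"field": "cases.project.project_id",
                         "value": [project_id]}},
            {"op": "in",
             "content": {"field": "data_type", "value": [data_type]}},
        ],
    }
    params = {
        "filters": json.dumps(filters),          # filters are passed as JSON
        "fields": "file_id,file_name,data_type",  # columns to return
        "format": "JSON",
        "size": str(size),                        # max number of hits
    }
    return f"{GDC_FILES_ENDPOINT}?{urlencode(params)}"


# Example values are assumptions; substitute the project you need.
url = build_gdc_files_query("TCGA-BRCA", "Masked Somatic Mutation")
print(url)
```

The resulting URL can be fetched with any HTTP client; the response lists matching file IDs, which can then be downloaded through the GDC data endpoint or the GDC Data Transfer Tool.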

3. Sequence Analysis Pipeline

  • FASTQ file processing
  • BLAST sequence alignment
  • Variant calling and annotation
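
To make the FASTQ processing step concrete, here is a minimal sketch of a FASTQ reader, assuming the common four-line record layout (header, sequence, `+` separator, quality) and Phred+33 quality encoding. The names `FastqRecord`, `parse_fastq`, and `mean_phred` are hypothetical and do not come from the project's `backend/pipeline` code.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    read_id: str
    sequence: str
    quality: str


def parse_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"Malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"Sequence/quality length mismatch in {header!r}")
        # The read ID is the header text up to the first space.
        yield FastqRecord(header[1:].split(" ", 1)[0], sequence, quality)


def mean_phred(record: FastqRecord, offset: int = 33) -> float:
    """Average Phred score of a read, assuming Phred+33 encoding."""
    if not record.quality:
        return 0.0
    return sum(ord(ch) - offset for ch in record.quality) / len(record.quality)


sample = io.StringIO("@read1 sample\nACGT\n+\nIIII\n@read2\nGG\n+\n!!\n")
records = list(parse_fastq(sample))
for rec in records:
    print(rec.read_id, rec.sequence, mean_phred(rec))
```

Files with sequences wrapped across multiple lines are not handled by this sketch; an established parser such as Biopython's `SeqIO` is the safer choice for production data.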

4. Neo4j Graph Database

  • Graph-based cancer data modeling
  • Relationships: Gene → Mutation → Patient → Cancer Type
  • Interactive graph visualization
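
The Gene → Mutation → Patient → Cancer Type chain could be written to Neo4j with a parameterized Cypher statement along the following lines. This is a sketch only: the node labels, property names, and relationship types (`HAS_MUTATION`, `FOUND_IN`, `DIAGNOSED_WITH`) are assumptions, since the actual schema lives in the project's `backend/neo4j` layer.

```python
from typing import Any, Dict, Tuple

# Hypothetical schema: labels, properties, and relationship types are
# illustrative guesses, not the project's confirmed graph model.
MERGE_CHAIN = """
MERGE (g:Gene {symbol: $gene})
MERGE (m:Mutation {id: $mutation_id})
MERGE (p:Patient {id: $patient_id})
MERGE (c:CancerType {name: $cancer_type})
MERGE (g)-[:HAS_MUTATION]->(m)
MERGE (m)-[:FOUND_IN]->(p)
MERGE (p)-[:DIAGNOSED_WITH]->(c)
""".strip()


def build_merge(gene: str, mutation_id: str, patient_id: str,
                cancer_type: str) -> Tuple[str, Dict[str, Any]]:
    """Return the Cypher text and its parameter map for one observation."""
    params = {
        "gene": gene,
        "mutation_id": mutation_id,
        "patient_id": patient_id,
        "cancer_type": cancer_type,
    }
    return MERGE_CHAIN, params


# Placeholder identifiers, not real data.
query, params = build_merge("TP53", "mut-001", "patient-001", "Breast")
print(query)
print(params)
```

Using `MERGE` rather than `CREATE` keeps repeated imports from duplicating nodes, and passing values as parameters avoids string interpolation into Cypher. With the official Neo4j Python driver, the pair can be passed to `session.run(query, params)`.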

5. GraphQL API

  • Query cancer data flexibly
  • Filter by gene, mutation, patient cohort
  • Aggregate statistics

6. Interactive Dashboard

  • Real-time data visualization
  • Network graphs for gene interactions
  • Mutation frequency charts
  • Patient cohort analysis

📊 Architecture

Cancer@Home v2
│
├── Frontend (React + D3.js)
│   ├── Dashboard
│   ├── Neo4j Visualization
│   └── Task Monitor
│
├── Backend (FastAPI)
│   ├── REST API
│   ├── GraphQL Endpoint
│   └── WebSocket (real-time updates)
│
├── Data Layer
│   ├── Neo4j (Graph Database)
│   ├── BOINC Client
│   └── GDC API Client
│
└── Analysis Pipeline
    ├── FASTQ Parser
    ├── BLAST Wrapper
    └── Variant Annotator

🗂️ Project Structure

CancerAtHome2/
├── backend/
│   ├── api/              # FastAPI routes
│   ├── boinc/            # BOINC integration
│   ├── gdc/              # GDC data fetcher
│   ├── neo4j/            # Neo4j database layer
│   ├── pipeline/         # Bioinformatics pipeline
│   └── graphql/          # GraphQL schema
├── frontend/
│   ├── public/
│   └── src/
│       ├── components/   # React components
│       ├── views/        # Page views
│       └── api/          # API client
├── data/                 # Downloaded datasets
├── docker-compose.yml    # Neo4j container
├── requirements.txt      # Python dependencies
└── run.py                # Main entry point

🧬 Data Flow

  1. Data Ingestion: Download cancer genomics data from GDC Portal
  2. Processing: Run FASTQ/BLAST analysis on the distributed BOINC network
  3. Storage: Store results in Neo4j graph database
  4. Visualization: Query and visualize via web dashboard

🔧 Configuration

Edit config.yml to customize:

  • Neo4j connection settings
  • GDC API parameters
  • BOINC project URL
  • Analysis pipeline options

📖 Usage Examples

Query Mutations by Gene

query {
  mutations(gene: "TP53") {
    id
    position
    consequence
    patients {
      cancerType
      stage
    }
  }
}
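
The same query can be sent from Python as a standard GraphQL-over-HTTP POST. The sketch below only builds the request and uses the standard library; the endpoint URL is a hypothetical placeholder, and the `String!` variable type is an assumption about the schema.

```python
import json
from urllib.request import Request

# Same selection as the query above, with the gene passed as a variable.
# The String! type is an assumption about the schema.
MUTATIONS_QUERY = """
query MutationsByGene($gene: String!) {
  mutations(gene: $gene) {
    id
    position
    consequence
    patients {
      cancerType
      stage
    }
  }
}
"""


def build_graphql_request(endpoint: str, gene: str) -> Request:
    """Build (but do not send) a GraphQL POST request for one gene."""
    payload = json.dumps(
        {"query": MUTATIONS_QUERY, "variables": {"gene": gene}}
    ).encode("utf-8")
    return Request(
        endpoint,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical endpoint; replace with the URL your deployment exposes.
request = build_graphql_request("http://localhost:8000/graphql", "TP53")
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request), then json.load() the response.
```

Passing the gene as a GraphQL variable instead of formatting it into the query string keeps the query text constant and avoids quoting problems.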

Submit Analysis Task

from backend.boinc import BOINCClient

client = BOINCClient()
task_id = client.submit_task(
    workunit_type="variant_calling",
    input_file="sample.fastq"
)

🤝 Inspired By

📄 License

MIT License

🛟 Support

For issues or questions, please open an issue on Hugging Face or GitHub.
