Histopathologic Cancer Detection CNN

This model is a deep learning-based classifier designed to detect metastatic tissue in histopathologic scans of lymph node sections. It takes small pathology image patches as input and predicts whether the central 32×32 pixel region contains tumor tissue.

The architecture consists of four convolutional layers, each followed by ReLU activation and max pooling, and two fully connected layers with dropout regularization to improve generalization. The model outputs log-probabilities via a log-softmax activation and is trained with Negative Log-Likelihood Loss (NLLLoss) and the Adam optimizer.
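For concreteness, the sketch below shows one way such a network can be written in PyTorch. The exact layer widths are not specified in this card, so the channel counts and hidden size are illustrative assumptions; only the overall structure (four conv/ReLU/max-pool blocks, dropout, two fully connected layers, and a log-softmax output) follows the description above.

# Illustrative PyTorch sketch of the described architecture (layer widths are assumptions)
import torch
import torch.nn as nn
import torch.nn.functional as F

class HistoCNN(nn.Module):
    def __init__(self, dropout: float = 0.25):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.conv4 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)              # halves H and W: 96 -> 48 -> 24 -> 12 -> 6
        self.dropout = nn.Dropout(dropout)
        self.fc1 = nn.Linear(128 * 6 * 6, 256)
        self.fc2 = nn.Linear(256, 2)                # two classes: No Tumor / Tumor

    def forward(self, x):
        for conv in (self.conv1, self.conv2, self.conv3, self.conv4):
            x = self.pool(F.relu(conv(x)))          # conv -> ReLU -> max pool
        x = torch.flatten(x, 1)
        x = self.dropout(F.relu(self.fc1(x)))
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)              # log-probabilities, paired with NLLLoss

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = HistoCNN().to(device)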

Model Details

Model Description

  • Developed by: Lorenzo Maiuri
  • Funded by: No external funding
  • Shared by: Lorenzo Maiuri
  • Model type: Binary Image Classification (Tumor / No Tumor)
  • License: MIT

Model Sources

Uses

Try It Out

Coming soon...

Direct Use

Downstream Use

  • Research and Development: The model can be employed as a baseline for digital pathology research and to develop improved algorithms for tumor detection.
  • Educational Purposes: It serves as a tool for teaching deep learning techniques applied to medical imaging and histopathology.
  • Pre-Screening Tool: In a controlled research environment, the model may assist in pre-screening histopathologic slides to flag potential tumor regions for further expert analysis.
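As a hedged illustration of the pre-screening use case, the snippet below loads saved weights and scores a single 96×96 patch. It reuses the HistoCNN sketch from the Model Description above; the weight-file name, patch file name, and the plain ToTensor preprocessing are placeholders to be adapted to the actual repository contents.

# Hypothetical pre-screening snippet; "model.pth" and "patch_0001.png" are placeholders
import torch
from PIL import Image
from torchvision import transforms

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = HistoCNN().to(device)                                # architecture sketch shown earlier
model.load_state_dict(torch.load("model.pth", map_location=device))
model.eval()

patch = Image.open("patch_0001.png").convert("RGB")          # a 96x96 histology patch
x = transforms.ToTensor()(patch).unsqueeze(0).to(device)     # shape (1, 3, 96, 96)

with torch.no_grad():
    log_probs = model(x)                                     # log-softmax output, shape (1, 2)
    prob_tumor = log_probs.exp()[0, 1].item()                # probability of the "Tumor" class

print(f"P(tumor in the central 32x32 region) = {prob_tumor:.3f}")

Any patch flagged this way should still be routed to expert review, consistent with the Out-of-Scope Use and Recommendations sections.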

Out-of-Scope Use

  • Clinical Diagnosis: The model is not validated for direct clinical decision-making and should not be used as a standalone diagnostic tool.
  • Regulatory or Legal Decisions: It should not be applied in scenarios that require compliance with regulatory standards for medical devices or inform legal judgments.
  • Generalization Beyond Histopathology: The model is specifically trained on a curated histopathologic dataset and may not perform adequately on images from other domains or imaging modalities.

Bias, Risks, and Limitations

  • Dataset Bias: The training data may not represent the full spectrum of histopathologic variability across different patient populations or institutions. This can lead to reduced performance when applied to unseen or diverse data.
  • Model Generalization: Due to its training on a specific dataset, the model might exhibit decreased accuracy on images with different acquisition protocols or quality.
  • False Positives/Negatives: Misclassifications can occur, potentially leading to overdiagnosis (false positives) or missed detections (false negatives), which can have significant clinical implications if not interpreted with caution.
  • Ethical and Regulatory Risks: Using AI for medical image analysis without thorough validation and regulatory approval carries inherent risks, including patient safety and data privacy concerns.

Recommendations

  • Human Oversight: Always use the model as a supplementary tool alongside expert clinical review rather than a replacement for professional diagnosis.
  • Local Validation: Before deployment in a new clinical or research setting, conduct extensive local validation to assess performance on your specific dataset.
  • Continuous Monitoring and Updating: Implement a system for continuous monitoring of model performance and update the model as new data and insights become available.
  • Further Research: Investigate and mitigate potential biases by incorporating more diverse data sources and performing subgroup analyses.
  • Regulatory Compliance: Ensure that any deployment in a clinical context complies with applicable regulatory standards and guidelines, and consider obtaining necessary certifications.

Training Details

Training Data

  • Dataset: PatchCamelyon (PCam)
  • Image Types: high-resolution 96×96 pixel images with corresponding labels indicating the presence of metastatic tissue in the central 32×32 pixel region
  • Classes: Binary classification (No Tumor, Tumor)
  • Labeling Criteria: A label of 1 (Tumor) is assigned if at least one pixel in the 32×32 px center region contains tumor tissue. Otherwise, the label is 0 (No Tumor).
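For reference, torchvision ships a PCAM dataset wrapper that exposes the same 96×96 patches and binary labels (it requires h5py and downloads HDF5 files). Training for this model was run on Kaggle and may have used the competition's image files instead, so treat this purely as a sketch of the data format.

# Data-format sketch using torchvision's PCAM wrapper (requires h5py)
from torch.utils.data import DataLoader
from torchvision import transforms
from torchvision.datasets import PCAM

transform = transforms.ToTensor()

train_set = PCAM(root="data", split="train", transform=transform, download=True)
val_set = PCAM(root="data", split="val", transform=transform, download=True)

train_loader = DataLoader(train_set, batch_size=32, shuffle=True, num_workers=2)
val_loader = DataLoader(val_set, batch_size=32, shuffle=False, num_workers=2)

images, labels = next(iter(train_loader))
print(images.shape)   # torch.Size([32, 3, 96, 96])
print(labels[:8])     # 1 = tumor in the central 32x32 region, 0 = no tumor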

Training Procedure

  • Batch Size: 32
  • Epochs: 50
  • Optimizer: Adam
  • Loss Function: Negative Log-Likelihood Loss (NLLLoss)
  • Callbacks: Early Stopping, ReduceLROnPlateau(factor=0.5, patience=20)
  • Dropout Rate: 0.25
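The loop below is a condensed sketch of this procedure (Adam with NLLLoss, ReduceLROnPlateau on the validation loss, and simple early stopping). It reuses model, device, and the data loaders from the earlier sketches; note that the Training Hyperparameters section further down lists slightly different values, and the early-stopping patience used here is an assumption.

# Condensed training-loop sketch; model, device, train_loader and val_loader
# are assumed from the sketches above, and the early-stopping patience is assumed
import copy
import torch
import torch.nn as nn

criterion = nn.NLLLoss()                             # expects log-probabilities
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=20
)

best_val, patience, bad_epochs = float("inf"), 5, 0
best_state = copy.deepcopy(model.state_dict())

for epoch in range(50):
    model.train()
    for x, y in train_loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()

    model.eval()
    val_loss, n = 0.0, 0
    with torch.no_grad():
        for x, y in val_loader:
            x, y = x.to(device), y.to(device)
            val_loss += criterion(model(x), y).item() * x.size(0)
            n += x.size(0)
    val_loss /= n
    scheduler.step(val_loss)                         # halve the LR when validation loss plateaus

    if val_loss < best_val:                          # keep the best checkpoint
        best_val, bad_epochs = val_loss, 0
        best_state = copy.deepcopy(model.state_dict())
    else:
        bad_epochs += 1
        if bad_epochs >= patience:                   # early stopping
            break

model.load_state_dict(best_state)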

Model Architecture

The model follows a Convolutional Neural Network (CNN) architecture with the following key components:

  • 4 Convolutional Layers with ReLU activation
  • Max Pooling Layers to reduce spatial dimensions
  • Dropout Layers (0.25) to prevent overfitting
  • Fully Connected (FC) Layers for classification
  • Log-softmax activation in the final layer, paired with NLLLoss, for binary classification

Preprocessing

  • Augmentation: Data augmentation was applied to improve generalization, including random horizontal and vertical flips and random rotations.
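A minimal torchvision version of this augmentation pipeline might look as follows; the rotation range is an assumption, as the card does not specify it.

# Augmentation sketch with torchvision transforms (rotation range assumed)
from torchvision import transforms

train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.5),
    transforms.RandomRotation(degrees=15),
    transforms.ToTensor(),
])

eval_transform = transforms.ToTensor()   # validation/test patches are only converted to tensors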

Training Hyperparameters

  • Epochs: 20
  • Batch Size: 75
  • Learning Rate: 0.001
  • Optimizer: Adam
  • Dropout Rate: 0.4

Speeds, Sizes, Times

  • Total Training Time: 33m
  • Hardware Used: Tesla P100

Testing Data, Factors & Metrics

Testing Data

The model was evaluated on an independent test set consisting of images from the same distribution as the training set but held out during training.

Metrics

The performance of the model was assessed using the following evaluation metrics:

  • Accuracy: Proportion of correctly classified images.
  • Precision: Proportion of patches predicted as tumor-positive that truly contain tumor, reflecting how well false positives are controlled.
  • Recall (Sensitivity): Proportion of truly tumor-positive patches that the model detects, reflecting how well false negatives are controlled.
  • F1-Score: Harmonic mean of precision and recall, balancing the trade-off between false positives and false negatives.
  • AUC-ROC: Measures the model’s ability to distinguish between classes across different classification thresholds.
  • Confusion Matrix: Provides insight into true positives, true negatives, false positives, and false negatives.
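These metrics can be reproduced with scikit-learn as sketched below; model and device come from the earlier sketches, and test_loader is assumed to be a DataLoader over the held-out test split.

# Evaluation sketch; model, device and test_loader are assumed from earlier sketches
import numpy as np
import torch
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, confusion_matrix)

model.eval()
all_probs, all_labels = [], []
with torch.no_grad():
    for x, y in test_loader:
        log_probs = model(x.to(device))
        all_probs.append(log_probs.exp()[:, 1].cpu().numpy())   # P(tumor)
        all_labels.append(y.numpy())

probs = np.concatenate(all_probs)
labels = np.concatenate(all_labels)
preds = (probs >= 0.5).astype(int)

print("Accuracy :", accuracy_score(labels, preds))
print("Precision:", precision_score(labels, preds))
print("Recall   :", recall_score(labels, preds))
print("F1-score :", f1_score(labels, preds))
print("AUC-ROC  :", roc_auc_score(labels, probs))
print("Confusion matrix:\n", confusion_matrix(labels, preds))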

Results

Summary

The trained model demonstrated strong classification performance on the test set. A confusion matrix analysis showed a low false positive rate, indicating strong specificity. However, some false negatives were observed, suggesting that recall should be improved further to reduce missed tumor cases.

  • The model effectively distinguishes between tumor and non-tumor patches, making it a valuable tool for assisting pathologists in screening histopathology slides.
  • Future improvements could focus on enhancing recall, incorporating additional domain-specific augmentations, and testing on more diverse datasets to improve generalization across different histopathology labs and imaging setups.

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

  • Hardware Type: Tesla P100
  • Hours used: 0.33
  • Cloud Provider: Kaggle
  • Carbon Emitted: ~0.04 kg CO₂eq

Citation

If you use this model, please cite it as follows:

@misc{maiurilorenzo/histoplastic-cancer-CNN-classifier,
  author = {Lorenzo Maiuri},
  title = {maiurilorenzo/histoplastic-cancer-CNN-classifier},
  year = {2025},
  publisher = {Hugging Face Hub},
  license = {MIT}
}