PULSE-7B Handler Deployment Guide

🚀 Deployment Rehberi

Gereksinimler

Python 3.8+
CUDA 11.8+ (GPU kullanımı için)
Minimum 16GB RAM (CPU), 8GB VRAM (GPU)

Kurulum

Bağımlılıkları yükleyin:

pip install -r requirements.txt

PULSE LLaVA Installation (PULSE-7B için kritik):

# PULSE-7B için PULSE'un kendi LLaVA implementasyonu gerekli:
pip install git+https://github.com/AIMedLab/PULSE.git#subdirectory=LLaVA

# Bu otomatik olarak transformers==4.37.2 yükleyecektir

Flash Attention (isteğe bağlı, performans için):

pip install flash-attn --no-build-isolation

HuggingFace Inference Deployment

1. Model Repository Yapısı

your-model-repo/
├── handler.py
├── config.json
├── generation_config.json
├── requirements.txt
├── model.safetensors.index.json
├── tokenizer_config.json
├── special_tokens_map.json
└── tokenizer.model

2. Endpoint Oluşturma

# HuggingFace CLI ile deploy
huggingface-cli login
huggingface-cli repo create your-pulse-endpoint --type=space

3. Test Requests

Image URL ile test:

curl -X POST "YOUR_ENDPOINT_URL" \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": {
      "query": "Analyze this ECG image",
      "image": "https://i.imgur.com/7uuejqO.jpeg"
    },
    "parameters": {
      "temperature": 0.2,
      "max_new_tokens": 512
    }
  }'

Base64 ile test:

curl -X POST "YOUR_ENDPOINT_URL" \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": {
      "query": "What do you see in this ECG?",
      "image": "data:image/jpeg;base64,/9j/4AAQ..."
    },
    "parameters": {
      "temperature": 0.2
    }
  }'

Performans Optimizasyonları

GPU Memory Optimizasyonu

torch_dtype=torch.bfloat16 kullanın
low_cpu_mem_usage=True ayarlayın
device_map="auto" ile otomatik dağıtım

CPU Optimizasyonu

torch_dtype=torch.float32 kullanın
Thread sayısını ayarlayın: torch.set_num_threads(4)

Monitoring ve Debugging

Log Seviyeleri

import logging
logging.basicConfig(level=logging.INFO)

Memory Usage

import torch
print(f"GPU Memory: {torch.cuda.memory_allocated()/1024**3:.2f}GB")

Troubleshooting

Common Issues:

"llava_llama architecture not recognized" Error

# PULSE-7B Solution: Install PULSE's LLaVA implementation
pip install git+https://github.com/AIMedLab/PULSE.git#subdirectory=LLaVA

# Also install development transformers
pip install git+https://github.com/huggingface/transformers.git

# Or add both to requirements.txt:
git+https://github.com/huggingface/transformers.git
git+https://github.com/AIMedLab/PULSE.git#subdirectory=LLaVA

CUDA Out of Memory
- Batch size'ı azaltın
- max_new_tokens değerini düşürün
- Gradient checkpointing kullanın
Slow Image Processing
- Image timeout değerini artırın
- Image resize threshold ayarlayın
Model Loading Issues
- HuggingFace token'ını kontrol edin
- Network bağlantısını doğrulayın
- Cache dizinini temizleyin
- Transformers sürümünü kontrol edin

Security Best Practices

Image URL'leri validate edin
Base64 boyut limitlerini ayarlayın
Rate limiting uygulayın
Input sanitization yapın

Monitoring Metrics

Response time
Memory usage
Error rates
Image processing success rate
Token generation speed