# Multi-Model Orchestrator

A multi-model orchestration system that manages parent-child LLM relationships, integrating a CLIP-GPT2 image captioner and a Flickr30k text-to-image model.
## 🚀 Features

### Parent Orchestrator

- Intelligent Task Routing: Automatically routes each task to the appropriate child model (a toy routing sketch follows this list)
- Model Management: Handles loading, caching, and lifecycle of child models
- Error Handling: Robust error handling and recovery mechanisms
- Task History: Comprehensive logging and monitoring of all operations
- Async Support: Both synchronous and asynchronous processing modes
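The orchestrator's internals aren't documented in this README, but the routing, error-handling, and history features above boil down to a dispatch table plus some bookkeeping. The following is a minimal illustrative sketch, not the package's actual API; every name in it is invented for the example.

```python
# Illustrative sketch only: names and structure are assumptions,
# not the actual multi_model_orchestrator API.
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

@dataclass
class TaskRecord:
    task_type: str
    succeeded: bool
    result: Any = None

class ToyOrchestrator:
    """Routes each task to the child model registered for its type."""

    def __init__(self) -> None:
        self._children: Dict[str, Callable[[Any], Any]] = {}
        self.history: List[TaskRecord] = []

    def register(self, task_type: str, handler: Callable[[Any], Any]) -> None:
        self._children[task_type] = handler

    def run(self, task_type: str, payload: Any) -> Any:
        handler = self._children.get(task_type)
        if handler is None:
            raise ValueError(f"No child model registered for {task_type!r}")
        try:
            result = handler(payload)
            self.history.append(TaskRecord(task_type, True, result))
            return result
        except Exception:
            # Record the failure so monitoring can surface it, then re-raise.
            self.history.append(TaskRecord(task_type, False))
            raise
```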
### Child Models

- CLIP-GPT2 Image Captioner: Converts images to descriptive text captions
- Flickr30k Text-to-Image: Generates images from text descriptions
- Extensible Architecture: Easy to add new child models (see the registration example below)
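Continuing the toy skeleton above (still illustrative, not the package's documented API), adding a new child model amounts to registering a callable under a new task type:

```python
# Continues the illustrative ToyOrchestrator sketch above.
def translate_text(text: str) -> str:
    # Stand-in for a real model call, e.g. a translation pipeline.
    return text.upper()

orchestrator = ToyOrchestrator()
orchestrator.register("translation", translate_text)
print(orchestrator.run("translation", "hello world"))  # HELLO WORLD
```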
## 📦 Installation

```bash
pip install git+https://huggingface.co/kunaliitkgp09/multi-model-orchestrator
```
## 🎯 Quick Start

```python
from multi_model_orchestrator import SimpleMultiModelOrchestrator

# Initialize the orchestrator and load the child models
orchestrator = SimpleMultiModelOrchestrator()
orchestrator.initialize_models()

# Generate a caption from an image
caption = orchestrator.generate_caption("sample_image.jpg")
print(f"Caption: {caption}")

# Generate an image from text
image_path = orchestrator.generate_image("A beautiful sunset over mountains")
print(f"Generated image: {image_path}")
```
## 🔗 Model Integration

### Child Model 1: CLIP-GPT2 Image Captioner

- Model: `kunaliitkgp09/clip-gpt2-image-captioner`
- Task: Image-to-text captioning
- Performance: ~40% accuracy on test samples

### Child Model 2: Flickr30k Text-to-Image

- Model: `kunaliitkgp09/flickr30k-text-to-image`
- Task: Text-to-image generation
- Training: Fine-tuned on the Flickr30k dataset
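To call the child models directly, outside the orchestrator, something like the following should work. The model cards aren't reproduced here, so treating the captioner as a standard VisionEncoderDecoder checkpoint (CLIP encoder, GPT-2 decoder) and the text-to-image model as a Stable Diffusion fine-tune (as the acknowledgments suggest) are assumptions to verify before relying on this:

```python
# Assumption: the captioner follows the common VisionEncoderDecoder
# (CLIP encoder + GPT-2 decoder) pattern and the text-to-image model
# is a Stable Diffusion fine-tune; check the model cards to confirm.
from PIL import Image
from transformers import AutoImageProcessor, AutoTokenizer, VisionEncoderDecoderModel
from diffusers import StableDiffusionPipeline

repo = "kunaliitkgp09/clip-gpt2-image-captioner"
captioner = VisionEncoderDecoderModel.from_pretrained(repo)
processor = AutoImageProcessor.from_pretrained(repo)
tokenizer = AutoTokenizer.from_pretrained(repo)

# Caption a single image
pixel_values = processor(images=Image.open("sample_image.jpg"), return_tensors="pt").pixel_values
caption_ids = captioner.generate(pixel_values, max_new_tokens=32)
print(tokenizer.decode(caption_ids[0], skip_special_tokens=True))

# Generate an image from text
pipe = StableDiffusionPipeline.from_pretrained("kunaliitkgp09/flickr30k-text-to-image")
pipe("A serene landscape with mountains").images[0].save("generated.png")
```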
## 📊 Usage Examples

### Multimodal Processing

```python
# Process an image and a text prompt together
results = orchestrator.process_multimodal_task(
    image_path="sample_image.jpg",
    text_prompt="A serene landscape with mountains",
)
print("Caption:", results["caption"])
print("Generated Image:", results["generated_image"])
```
### Async Processing

```python
import asyncio

from multi_model_orchestrator import AsyncMultiModelOrchestrator

async def async_example():
    orchestrator = AsyncMultiModelOrchestrator()
    orchestrator.initialize_models()
    results = await orchestrator.process_multimodal_async(
        image_path="sample_image.jpg",
        text_prompt="A futuristic cityscape",
    )
    return results

asyncio.run(async_example())
```
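The payoff of the async mode is overlapping independent tasks rather than running them one after another. A brief sketch, assuming `process_multimodal_async` behaves as in the example above:

```python
# Sketch: fan several prompts out concurrently with asyncio.gather.
# Assumes process_multimodal_async works as in the example above.
import asyncio

async def batch_example(orchestrator, image_path, prompts):
    tasks = [
        orchestrator.process_multimodal_async(image_path=image_path, text_prompt=p)
        for p in prompts
    ]
    # gather preserves input order, so results[i] matches prompts[i]
    return await asyncio.gather(*tasks)
```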
## 🎯 Use Cases
- Content Creation: Generate captions and images for social media
- Research and Development: Model performance comparison and prototyping
- Production Systems: Automated content generation pipelines
- Educational Applications: AI model demonstration and learning
## 📈 Performance Metrics

- Processing Time: Optimized for real-time applications (see the timing sketch below)
- Memory Usage: Efficient GPU/CPU memory management
- Success Rate: Robust error handling and recovery
- Extensibility: Easy integration of new child models
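The README doesn't publish benchmark figures, so if latency matters for your deployment it's worth measuring on your own hardware. A minimal sketch using the Quick Start API:

```python
# Time a single captioning call on your own hardware.
import time

start = time.perf_counter()
caption = orchestrator.generate_caption("sample_image.jpg")
elapsed = time.perf_counter() - start
print(f"Caption: {caption!r} ({elapsed:.2f}s)")
```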
## 🤝 Contributing
Contributions are welcome! Please feel free to submit pull requests or open issues for:
- New child model integrations
- Performance improvements
- Bug fixes
- Documentation enhancements
## 📄 License
This project is licensed under the MIT License.
## 🙏 Acknowledgments

- CLIP-GPT2 Model: `kunaliitkgp09/clip-gpt2-image-captioner`
- Stable Diffusion Model: `kunaliitkgp09/flickr30k-text-to-image`
- Hugging Face: For providing the model hosting platform
- PyTorch: For the deep learning framework
- Transformers: For the model loading and processing utilities

Happy Orchestrating! 🚀