---
license: creativeml-openrail-m
base_model: runwayml/stable-diffusion-v1-5
library_name: onnx
tags:
- stable-diffusion
- text-to-image
- diffusion
- webgpu
- browser-ai
- onnx
- zhare-ai
- client-side
- privacy-preserving
pipeline_tag: text-to-image
inference: false
widget:
- text: "A beautiful sunset over mountains, digital art style"
  example_title: "Mountain Sunset"
- text: "A futuristic cityscape with flying cars at night, cyberpunk"
  example_title: "Cyberpunk City"
- text: "A serene lake surrounded by autumn trees, oil painting"
  example_title: "Autumn Lake"
- text: "Portrait of a wise elderly person, studio lighting, photorealistic"
  example_title: "Portrait"
model-index:
- name: sd-1-5-webgpu
  results:
  - task:
      type: text-to-image
      name: Text-to-Image Generation
    dataset:
      name: Browser Performance Benchmark
      type: webgpu-inference
    metrics:
    - type: generation-time
      value: 3-45
      name: Generation Time (seconds)
      config: 512x512, 20 steps, various hardware
    - type: memory-usage
      value: 4-6
      name: VRAM Usage (GB)
      config: WebGPU acceleration
    - type: model-size
      value: 4.8
      name: Total Model Size (GB)
      config: All ONNX components
---

<div align="center">
  <img src="zhare-logo.png" alt="Zhare-AI Logo" width="200" height="auto" style="margin-bottom: 20px;">
</div>

# Stable Diffusion 1.5 WebGPU by Zhare-AI

<div align="center">
  
![License](https://img.shields.io/badge/License-CreativeML_OpenRAIL--M-blue.svg)
![WebGPU](https://img.shields.io/badge/WebGPU-Ready-green)
![Privacy](https://img.shields.io/badge/Privacy-First-purple)
![Production](https://img.shields.io/badge/Production-Ready-brightgreen)

**Privacy-preserving text-to-image generation in your browser with WebGPU acceleration**

</div>

This repository contains Stable Diffusion v1.5 converted to ONNX and optimized for client-side inference with WebGPU acceleration. Developed by **Zhare-AI**, it generates images directly in the web browser without server infrastructure, so prompts and generated images stay on the user's device.

<div align="center">
  <img src="zhare-logo.png" alt="Zhare-AI - Democratizing AI" width="150" height="auto">
  <p><em>Democratizing AI through distributed computing and privacy-preserving technology</em></p>
</div>

## 🌟 Key Features

- 🌐 **Fully Client-Side**: Complete image generation in the browser, no data leaves your device
- ⚡ **WebGPU Accelerated**: Hardware-accelerated inference with automatic WebAssembly fallback  
- 🔒 **Privacy-First**: All processing happens locally, protecting user prompts and generated content
- 📱 **Cross-Platform**: Compatible with desktop and mobile browsers
- 🛠️ **Production-Ready**: Optimized for real-world web applications

## 🚀 Quick Start

### Installation & Setup

```bash
# Clone or download the model
git lfs install
git clone https://huggingface.co/Zhare-AI/sd-1-5-webgpu
```

## 📊 Performance Specifications

### Model Architecture

| Component | Description | Approximate Size |
|-----------|-------------|------------------|
| **Text Encoder** | CLIP ViT-L/14 for text understanding | ~500MB |
| **UNet** | Core diffusion model for image generation | ~3.4GB |
| **VAE Decoder** | Converts latents to final images | ~160MB |
| **VAE Encoder** | Encodes images to latent space | ~160MB |
| **Safety Checker** | Content filtering (optional) | ~600MB |

**Total Model Size**: ~4.8GB (without safety checker: ~4.2GB)
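
The totals follow from the table by simple addition. As a quick check, the sketch below sums the approximate component sizes; the `ComponentSize` type and the function name are illustrative and not part of this repository.

```typescript
// Approximate component sizes in MB, copied from the table above.
interface ComponentSize {
  name: string;
  sizeMB: number;
  optional: boolean;
}

const components: ComponentSize[] = [
  { name: "Text Encoder", sizeMB: 500, optional: false },
  { name: "UNet", sizeMB: 3400, optional: false },
  { name: "VAE Decoder", sizeMB: 160, optional: false },
  { name: "VAE Encoder", sizeMB: 160, optional: false },
  { name: "Safety Checker", sizeMB: 600, optional: true },
];

// Sum the sizes, optionally skipping optional components,
// and convert to GB (1 GB = 1000 MB in this sketch).
function totalSizeGB(parts: ComponentSize[], includeOptional: boolean): number {
  const totalMB = parts
    .filter((p) => includeOptional || !p.optional)
    .reduce((sum, p) => sum + p.sizeMB, 0);
  return totalMB / 1000;
}

console.log(totalSizeGB(components, true).toFixed(1)); // "4.8"
console.log(totalSizeGB(components, false).toFixed(1)); // "4.2"
```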

### Browser Performance Benchmarks

*Generation time for 512×512 images with 20 inference steps:*

| Hardware Category | Example Device | Typical Performance |
|------------------|----------------|-------------------|
| **High-End Desktop** | RTX 4090, RTX 4080 | 3-8 seconds |
| **Gaming Desktop** | RTX 3080, RTX 3070 | 8-15 seconds |
| **Intel Arc GPUs** | Arc A750, Arc A770 | 8-15 seconds |
| **AMD High-End** | RX 7900 XT/XTX | 6-12 seconds |
| **Apple Silicon** | M2 Max, M1 Ultra | 10-20 seconds |
| **Integrated GPUs** | Intel Iris Xe | 25-50 seconds |
| **WebAssembly Fallback** | CPU-only devices | 2-10 minutes |

### System Requirements

- **Minimum VRAM**: 4GB (recommended: 6GB+)
- **System RAM**: 8GB minimum, 16GB recommended
- **Storage**: 5GB free space for model files
- **Browser**: Chrome 113+, Edge 113+ (WebGPU), or any modern browser (WebAssembly fallback)
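
An application can check the storage requirement before downloading anything. The sketch below is a minimal example, assuming the model files are kept in browser-managed storage; `hasEnoughStorage` and `StorageEstimateLike` are hypothetical names. Note that `navigator.storage.estimate()` reports the quota granted to the page's origin, which may be smaller than the free disk space.

```typescript
// Shape of the object returned by navigator.storage.estimate() in browsers.
interface StorageEstimateLike {
  quota?: number; // bytes available to this origin
  usage?: number; // bytes already used by this origin
}

// 5 GB, taken from the storage requirement listed above.
const REQUIRED_BYTES = 5 * 1000 ** 3;

// Returns true when the remaining quota can hold the model files.
function hasEnoughStorage(
  estimate: StorageEstimateLike,
  requiredBytes: number = REQUIRED_BYTES,
): boolean {
  if (estimate.quota === undefined) return false; // unknown quota: be conservative
  const free = estimate.quota - (estimate.usage ?? 0);
  return free >= requiredBytes;
}

// Browser usage:
// const ok = hasEnoughStorage(await navigator.storage.estimate());
```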

## 🌐 Browser Compatibility

| Browser | WebGPU Support | Performance Level | Notes |
|---------|---------------|------------------|-------|
| **Chrome 113+** | ✅ Full Support | Excellent | Primary recommendation |
| **Microsoft Edge 113+** | ✅ Full Support | Excellent | Primary recommendation |
| **Firefox 141+** | ✅ Stable Support | Very Good | Recent WebGPU implementation; enabled by default on Windows first |
| **Safari 17.4+** | 🔶 Experimental | Good | Behind feature flag |
| **Mobile Chrome 121+** | 🔶 Limited | Fair | Android only, limited memory |

*All listed browsers can fall back to WebAssembly when WebGPU is unavailable.*
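
The fallback decision can be made at runtime through feature detection. The following is a minimal sketch, not the loader shipped with this model: `pickBackend` and `GpuLike` are hypothetical names, and only the standard `navigator.gpu.requestAdapter()` call is assumed.

```typescript
type Backend = "webgpu" | "wasm";

// Minimal subset of the WebGPU entry point (navigator.gpu) used here.
interface GpuLike {
  requestAdapter(): Promise<unknown>;
}

// Prefer WebGPU when an adapter is actually available; otherwise use WebAssembly.
async function pickBackend(gpu: GpuLike | undefined): Promise<Backend> {
  if (!gpu) return "wasm"; // the browser does not expose the WebGPU API
  try {
    const adapter = await gpu.requestAdapter();
    return adapter ? "webgpu" : "wasm"; // null means no suitable GPU was found
  } catch {
    return "wasm"; // treat adapter errors as "WebGPU unavailable"
  }
}

// Read navigator.gpu without requiring WebGPU type definitions.
const browserGpu = (globalThis as any).navigator?.gpu as GpuLike | undefined;
pickBackend(browserGpu).then((backend) => console.log(`Using backend: ${backend}`));
```

The returned strings match the execution provider names used by ONNX Runtime Web (`webgpu`, `wasm`), in case the application loads the ONNX files with that runtime.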

## 📝 Model Details

### Training Information

This model is based on Stable Diffusion v1.5 with the following training characteristics:

- **Base Dataset**: LAION-5B filtered subset (~590M image-text pairs)
- **Training Resolution**: 512×512 pixels
- **Architecture**: Latent Diffusion Model with CLIP ViT-L/14 text encoder
- **Precision**: Originally trained in FP32, converted to FP16 for browser deployment
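
In a latent diffusion model the UNet operates on a compressed latent tensor and never on pixels directly. For Stable Diffusion v1.x the VAE downsamples each spatial dimension by a factor of 8 and uses 4 latent channels, so a 512×512 image corresponds to a 4×64×64 latent. The sketch below shows the arithmetic; the function name is illustrative.

```typescript
const VAE_SCALE_FACTOR = 8; // spatial downsampling of the SD v1.x VAE
const LATENT_CHANNELS = 4; // channels in the SD v1.x latent space

// Returns the [batch, channels, height, width] latent shape for an image size.
function latentShape(width: number, height: number): [number, number, number, number] {
  if (width % VAE_SCALE_FACTOR !== 0 || height % VAE_SCALE_FACTOR !== 0) {
    throw new RangeError(`Image size must be a multiple of ${VAE_SCALE_FACTOR}`);
  }
  return [1, LATENT_CHANNELS, height / VAE_SCALE_FACTOR, width / VAE_SCALE_FACTOR];
}

console.log(latentShape(512, 512).join(" x ")); // "1 x 4 x 64 x 64"
```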

### Optimization for Web Deployment

- **ONNX Conversion**: Optimized computational graph for web inference
- **WebGPU Kernels**: Custom compute shaders for GPU acceleration
- **Memory Efficiency**: Attention slicing and dynamic memory management
- **Cross-Platform**: WebAssembly fallback ensures universal browser support
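
Attention slicing lowers peak memory by computing attention for a few rows at a time and never materializing the full score matrix. The sketch below illustrates the idea on plain arrays. It slices along the query dimension, which is an assumption made for illustration (implementations may slice along attention heads instead), and none of the function names come from this repository's kernels.

```typescript
type Matrix = number[][];

// Numerically stable softmax over one row of attention scores.
function softmax(row: number[]): number[] {
  const max = Math.max(...row);
  const exps = row.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
// Materializes a score matrix of size q.length x k.length.
function attention(q: Matrix, k: Matrix, v: Matrix): Matrix {
  const scale = 1 / Math.sqrt(k[0].length);
  const scores = q.map((qRow) =>
    k.map((kRow) => scale * qRow.reduce((acc, x, i) => acc + x * kRow[i], 0)),
  );
  const weights = scores.map(softmax);
  return weights.map((wRow) =>
    v[0].map((_, col) => wRow.reduce((acc, w, j) => acc + w * v[j][col], 0)),
  );
}

// Sliced variant: process sliceSize query rows at a time, so the largest
// score matrix alive at once is sliceSize x k.length.
function slicedAttention(q: Matrix, k: Matrix, v: Matrix, sliceSize: number): Matrix {
  if (!Number.isInteger(sliceSize) || sliceSize < 1) {
    throw new RangeError("sliceSize must be a positive integer");
  }
  const out: Matrix = [];
  for (let start = 0; start < q.length; start += sliceSize) {
    out.push(...attention(q.slice(start, start + sliceSize), k, v));
  }
  return out;
}
```

Because each output row depends only on its own query row, the sliced result equals the unsliced one; only the peak size of the intermediate score matrix changes.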

## 🛡️ Ethical Use and Safety

### Built-in Safety Features

- **Content Filter**: Optional NSFW detection and filtering
- **Prompt Sanitization**: Basic filtering of potentially harmful prompts
- **Local Processing**: Prompts and images are never transmitted, so they stay private

### Responsible Use Guidelines

✅ **Encouraged Uses:**
- Creative art and design projects
- Educational demonstrations of AI capabilities
- Rapid prototyping for applications
- Personal creative exploration
- Research and development

❌ **Prohibited Uses:**
- Creating harmful, offensive, or illegal content
- Generating misleading information or deepfakes
- Violating copyright or intellectual property rights
- Any use that violates the CreativeML OpenRAIL-M license terms

### Privacy and Data Protection

- **Zero Data Collection**: All processing occurs locally in your browser
- **No Server Communication**: Model runs entirely offline after initial download
- **User Control**: Complete control over generated content and prompts
- **GDPR-Friendly**: No personal data is transmitted to or stored on a server
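
Running offline after the initial download requires the model files to be cached locally. One way to do this in a browser is the Cache API; the sketch below is a hedged illustration, not this project's loader. `fetchModelFile`, `CacheLike`, and `ResponseLike` are hypothetical names, and the cache name and file path in the usage comment are placeholders.

```typescript
// Minimal subset of the fetch Response interface used below.
interface ResponseLike {
  ok: boolean;
  status: number;
  clone(): ResponseLike;
  arrayBuffer(): Promise<ArrayBuffer>;
}

// Minimal subset of the browser Cache interface used below.
interface CacheLike {
  match(url: string): Promise<ResponseLike | undefined>;
  put(url: string, response: ResponseLike): Promise<void>;
}

// Return the cached response if present; otherwise download once and store a copy.
async function fetchModelFile(
  url: string,
  cache: CacheLike,
  download: (url: string) => Promise<ResponseLike>,
): Promise<ResponseLike> {
  const cached = await cache.match(url);
  if (cached) return cached; // served locally, no network request
  const response = await download(url);
  if (!response.ok) throw new Error(`Download failed (${response.status}): ${url}`);
  await cache.put(url, response.clone()); // clone: a body can only be read once
  return response;
}

// Browser usage (cache name and path are placeholders):
// const cache = await caches.open("sd-1-5-webgpu");
// const unet = await fetchModelFile("unet/model.onnx", cache, (u) => fetch(u));
```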

## ⚠️ Limitations and Considerations

### Technical Limitations

- **Resolution**: Optimized for 512×512 (other resolutions may reduce quality)
- **Batch Size**: Single image generation only in browser environment
- **Memory Constraints**: Limited by browser and device VRAM/RAM
- **Generation Speed**: Slower than dedicated server hardware

### Content Limitations

- **Language Bias**: Best performance with English prompts
- **Cultural Representation**: Training data may reflect Western/English-speaking biases
- **Artistic Style**: Tendency toward photorealistic and digital art styles
- **Consistency**: Multiple generations from the same prompt may vary significantly
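
Variation between runs comes largely from the random noise that diffusion sampling starts from. A common way to make results repeatable is to derive that noise from a fixed seed. Whether this repository's pipeline exposes a seed parameter is not stated here, so the sketch below only shows the general technique with hypothetical helper names, using a 4×64×64 latent as for 512×512 images.

```typescript
// mulberry32: a small deterministic 32-bit PRNG returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Standard-normal samples via the Box-Muller transform, seeded for repeatability.
function gaussianNoise(length: number, seed: number): Float32Array {
  const rand = mulberry32(seed);
  const out = new Float32Array(length);
  for (let i = 0; i < length; i += 2) {
    const u1 = 1 - rand(); // in (0, 1], avoids log(0)
    const u2 = rand();
    const r = Math.sqrt(-2 * Math.log(u1));
    out[i] = r * Math.cos(2 * Math.PI * u2);
    if (i + 1 < length) out[i + 1] = r * Math.sin(2 * Math.PI * u2);
  }
  return out;
}

const noiseA = gaussianNoise(4 * 64 * 64, 42);
const noiseB = gaussianNoise(4 * 64 * 64, 42);
console.log(noiseA.every((x, i) => x === noiseB[i])); // true: same seed, same starting noise
```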

### Browser-Specific Considerations

- **WebGPU Availability**: Limited to supporting browsers and devices
- **Memory Management**: Browser security limits may affect large model loading
- **Performance Variance**: Significant variation across different devices and browsers

## 📜 License: CreativeML OpenRAIL-M

This model is released under the **CreativeML OpenRAIL-M** license, which allows for:

✅ **Permitted:**
- Commercial and non-commercial use
- Distribution and modification
- Creation of derivative works
- Integration into applications and services

🚫 **Restrictions:**
- Must not be used to generate harmful content
- Cannot be used for illegal activities
- Must include license terms in any distribution
- Derivative works must maintain the same license restrictions

**Full License Text**: Available at [CreativeML OpenRAIL-M License](https://huggingface.co/spaces/CompVis/stable-diffusion-license)

### License Compliance

When using this model:
1. **Include License**: Provide license terms to end users
2. **Respect Restrictions**: Ensure use cases comply with content restrictions
3. **Derivative Works**: Carry the same use-based restrictions into modified versions
4. **Attribution**: Credit original Stable Diffusion creators and Zhare-AI adaptation

## 🏢 About Zhare-AI

<div align="center">
  <img src="zhare-logo.png" alt="Zhare-AI" width="120" height="auto" style="margin: 20px 0;">
</div>

**Zhare-AI** is focused on democratizing AI technology by making powerful models accessible directly in web browsers. Our mission is to enable privacy-preserving AI applications that put users in control of their data and creative processes.

- **Website**: [zhare.ai](https://zhare.ai)
- **Focus**: Distributed AI computing and browser-based AI applications
- **Philosophy**: Privacy-first, user-controlled AI experiences
- **Vision**: Making AI accessible, private, and distributed

### Our Mission

We believe AI should be:
- **Accessible** to everyone, regardless of infrastructure
- **Private** with complete user data control
- **Distributed** across devices rather than centralized servers
- **Transparent** with open-source implementations

## 📚 Citation and References

### Cite This Work

```bibtex
@misc{zhare-ai-sd15-webgpu-2025,
  title={Stable Diffusion 1.5 WebGPU: Browser-Optimized Text-to-Image Generation},
  author={Zhare-AI},
  year={2025},
  howpublished={\url{https://huggingface.co/Zhare-AI/sd-1-5-webgpu}},
  note={WebGPU-optimized implementation for privacy-preserving browser-based image generation}
}
```

### Original Stable Diffusion Citation

```bibtex
@InProceedings{Rombach_2022_CVPR,
  author    = {Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Björn},
  title     = {High-Resolution Image Synthesis With Latent Diffusion Models},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  month     = {June},
  year      = {2022},
  pages     = {10684-10695}
}
```

## 🤝 Community and Support

### Getting Help

- **Issues**: Report technical problems via the repository issues
- **Discussions**: Join the community discussion for tips and examples
- **Documentation**: Comprehensive guides available in the repository

### Contributing

We welcome contributions to improve browser compatibility, performance, and user experience:

- Performance optimizations for different hardware
- Browser compatibility improvements  
- Documentation enhancements
- Example applications and tutorials

---

<div align="center">
  <img src="zhare-logo.png" alt="Zhare-AI" width="100" height="auto">
  
**🚀 Ready to create amazing images directly in your browser?**

*This model brings the power of Stable Diffusion to web applications while keeping your data completely private and secure.*

**Developed with ❤️ by Zhare-AI for the open-source community**

[🌐 Visit Zhare.ai](https://zhare.ai) | [💬 Join Discussion](https://github.com/Zhare-AI)

</div>