---
license: creativeml-openrail-m
base_model: runwayml/stable-diffusion-v1-5
library_name: onnx
tags:
- stable-diffusion
- text-to-image
- diffusion
- webgpu
- browser-ai
- onnx
- zhare-ai
- client-side
- privacy-preserving
pipeline_tag: text-to-image
inference: false
widget:
- text: "A beautiful sunset over mountains, digital art style"
  example_title: "Mountain Sunset"
- text: "A futuristic cityscape with flying cars at night, cyberpunk"
  example_title: "Cyberpunk City"
- text: "A serene lake surrounded by autumn trees, oil painting"
  example_title: "Autumn Lake"
- text: "Portrait of a wise elderly person, studio lighting, photorealistic"
  example_title: "Portrait"
model-index:
- name: sd-1-5-webgpu
  results:
  - task:
      type: text-to-image
      name: Text-to-Image Generation
    dataset:
      name: Browser Performance Benchmark
      type: webgpu-inference
    metrics:
    - type: generation-time
      value: 3-45
      name: Generation Time (seconds)
      config: 512x512, 20 steps, various hardware
    - type: memory-usage
      value: 4-6
      name: VRAM Usage (GB)
      config: WebGPU acceleration
    - type: model-size
      value: 3.5
      name: Total Model Size (GB)
      config: All ONNX components
---
---
<div align="center">
<img src="zhare-logo.png" alt="Zhare-AI Logo" width="200" height="auto" style="margin-bottom: 20px;">
</div>

# Stable Diffusion 1.5 WebGPU by Zhare-AI
<div align="center">




**Privacy-preserving text-to-image generation in your browser with WebGPU acceleration**
</div>

This is a browser-optimized implementation of Stable Diffusion v1.5, specifically converted and optimized for client-side deployment using WebGPU acceleration. Developed by **Zhare-AI**, this model enables high-quality image generation directly in web browsers without requiring server infrastructure, ensuring complete user privacy and data sovereignty.
<div align="center">
<img src="zhare-logo.png" alt="Zhare-AI - Democratizing AI" width="150" height="auto">
<p><em>Democratizing AI through distributed computing and privacy-preserving technology</em></p>
</div>

## 🌟 Key Features
- 🌐 **Fully Client-Side**: Complete image generation in the browser, no data leaves your device
- ⚡ **WebGPU Accelerated**: Hardware-accelerated inference with automatic WebAssembly fallback
- 🔒 **Privacy-First**: All processing happens locally, protecting user prompts and generated content
- 📱 **Cross-Platform**: Runs in desktop and mobile browsers (mobile WebGPU is currently limited to Chrome 121+ on Android; other devices use the WebAssembly fallback)
- 🛠️ **Production-Ready**: Optimized for real-world web applications
## 🚀 Quick Start
### Installation & Setup
```bash
# Clone or download the model
git lfs install
git clone https://huggingface.co/Zhare-AI/sd-1-5-webgpu
```
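
The card does not name the JavaScript runtime that executes the ONNX files. Assuming ONNX Runtime Web (`onnxruntime-web`), whose `InferenceSession.create` accepts an ordered `executionProviders` list, the WebGPU-first, WebAssembly-fallback behaviour can be sketched as below. `chooseExecutionProviders` and `GpuLike` are hypothetical names. Checking only for the presence of `navigator.gpu` is not enough, because `requestAdapter()` can still resolve to `null` when no suitable GPU is available.

```typescript
// Minimal structural type for navigator.gpu so the sketch compiles without
// the @webgpu/types package; only requestAdapter() is used.
interface GpuLike {
  requestAdapter(): Promise<unknown>;
}

// Hypothetical helper: returns execution providers to try, in priority order.
// "webgpu" and "wasm" are the provider names used by onnxruntime-web.
async function chooseExecutionProviders(gpu?: GpuLike): Promise<string[]> {
  if (!gpu) {
    return ["wasm"]; // No WebGPU API at all: WebAssembly (CPU) only.
  }
  try {
    const adapter = await gpu.requestAdapter();
    // A null adapter means the API exists but no usable GPU was found.
    return adapter ? ["webgpu", "wasm"] : ["wasm"];
  } catch {
    return ["wasm"]; // Treat adapter errors the same as "no GPU".
  }
}
```

In a page, the result would be passed along as `ort.InferenceSession.create(modelUrl, { executionProviders: await chooseExecutionProviders((navigator as any).gpu) })`, where `modelUrl` is a placeholder for one of the repository's `.onnx` files.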
## 📊 Performance Specifications
### Model Architecture
| Component | Description | Approximate Size |
|-----------|-------------|------------------|
| **Text Encoder** | CLIP ViT-L/14 for text understanding | ~500MB |
| **UNet** | Core diffusion model for image generation | ~3.4GB |
| **VAE Decoder** | Converts latents to final images | ~160MB |
| **VAE Encoder** | Encodes images to latent space | ~160MB |
| **Safety Checker** | Content filtering (optional) | ~600MB |

**Total Model Size**: ~4.8GB (without safety checker: ~4.2GB)
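
The component sizes follow from parameter count multiplied by bytes per stored weight. Stable Diffusion v1 uses an 860M-parameter UNet and a 123M-parameter CLIP ViT-L/14 text encoder, so a quick calculation shows which precision the sizes in the table correspond to. `weightSizeGB` is a hypothetical helper, and the estimate covers weights only, not the graph structure an ONNX file stores alongside them.

```typescript
// Bytes per stored weight for the precisions mentioned in this card.
const BYTES_PER_WEIGHT = { fp32: 4, fp16: 2 } as const;
type Precision = keyof typeof BYTES_PER_WEIGHT;

// Hypothetical helper: approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes).
function weightSizeGB(parameters: number, precision: Precision): number {
  return (parameters * BYTES_PER_WEIGHT[precision]) / 1e9;
}

// Published Stable Diffusion v1 parameter counts.
const UNET_PARAMS = 860e6;
const TEXT_ENCODER_PARAMS = 123e6;

for (const p of ["fp32", "fp16"] as Precision[]) {
  console.log(
    `${p}: UNet ≈ ${weightSizeGB(UNET_PARAMS, p).toFixed(2)} GB, ` +
      `text encoder ≈ ${weightSizeGB(TEXT_ENCODER_PARAMS, p).toFixed(2)} GB`,
  );
}
```

This gives 3.44 GB (FP32) versus 1.72 GB (FP16) for the UNet, and 0.49 GB versus 0.25 GB for the text encoder, so the ~3.4GB and ~500MB figures above match 4-byte weights; FP16 files would be roughly half that size.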
### Browser Performance Benchmarks
*Generation time for 512×512 images with 20 inference steps:*
| Hardware Category | Example Device | Typical Performance |
|------------------|----------------|-------------------|
| **High-End Desktop** | RTX 4090, RTX 4080 | 3-8 seconds |
| **Gaming Desktop** | RTX 3080, RTX 3070 | 8-15 seconds |
| **Intel Arc GPUs** | Arc A750, Arc A770 | 8-15 seconds |
| **AMD High-End** | RX 7900 XT/XTX | 6-12 seconds |
| **Apple Silicon** | M2 Max, M1 Ultra | 10-20 seconds |
| **Integrated GPUs** | Intel Iris Xe | 25-50 seconds |
| **WebAssembly Fallback** | CPU-only devices | 2-10 minutes |
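
Because these figures vary with drivers, browser version, and thermals, it is worth measuring on the target device. The sketch below assumes the denoising loop can be expressed as an async per-step callback; `timeDenoisingLoop` and `runStep` are hypothetical names, with `runStep` standing in for one UNet evaluation plus the scheduler update. It reports total and per-step wall-clock time, for example over the 20 steps used in the table.

```typescript
interface LoopTiming {
  totalMs: number;
  perStepMs: number[];
}

// Prefer the high-resolution clock when available (browsers, Node 16+).
const now = (): number =>
  (globalThis as any).performance?.now?.() ?? Date.now();

// Hypothetical helper: times `steps` sequential calls of an async step function.
async function timeDenoisingLoop(
  steps: number,
  runStep: (step: number) => Promise<void>,
): Promise<LoopTiming> {
  const perStepMs: number[] = [];
  const start = now();
  for (let i = 0; i < steps; i++) {
    const t0 = now();
    await runStep(i); // e.g. one UNet call plus one scheduler update
    perStepMs.push(now() - t0);
  }
  return { totalMs: now() - start, perStepMs };
}
```

WebGPU work is queued asynchronously, so per-step times are only meaningful if `runStep` awaits its results (for example by reading the output tensor back); otherwise unfinished GPU work is billed to a later step or to the final decode.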
### System Requirements
- **Minimum VRAM**: 4GB (recommended: 6GB+)
- **System RAM**: 8GB minimum, 16GB recommended
- **Storage**: 5GB free space for model files
- **Browser**: Chrome 113+ or Edge 113+ for WebGPU (see Browser Compatibility for Firefox and Safari); any modern browser via the WebAssembly fallback
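
Some of these requirements can be checked before the multi-gigabyte download starts. Browsers do not expose VRAM size, but `navigator.storage.estimate()` reports the origin's storage `quota` and `usage`, and Chromium-based browsers expose an approximate `navigator.deviceMemory` (rounded and capped at 8, so it cannot confirm the 16GB recommendation). The sketch below is a hypothetical `preflight` helper that takes those two values as inputs so it can run without a browser; the thresholds are the ones listed above.

```typescript
// Shape of the object returned by navigator.storage.estimate().
interface StorageEstimateLike {
  quota?: number; // bytes the origin may use
  usage?: number; // bytes already used
}

interface PreflightResult {
  ok: boolean;
  warnings: string[];
}

const GB = 1e9;
const REQUIRED_STORAGE_BYTES = 5 * GB; // "Storage: 5GB free space"
const MIN_RAM_GB = 8; // "System RAM: 8GB minimum"

// Hypothetical helper. deviceMemoryGB is navigator.deviceMemory,
// which is undefined outside Chromium-based browsers.
function preflight(estimate: StorageEstimateLike, deviceMemoryGB?: number): PreflightResult {
  const warnings: string[] = [];
  let ok = true;
  if (estimate.quota === undefined) {
    warnings.push("Storage quota unknown; the download may fail");
  } else {
    const freeBytes = estimate.quota - (estimate.usage ?? 0);
    if (freeBytes < REQUIRED_STORAGE_BYTES) {
      ok = false; // not enough room to store the model files
      warnings.push(`Only ${(freeBytes / GB).toFixed(1)} GB of storage quota left`);
    }
  }
  if (deviceMemoryGB !== undefined && deviceMemoryGB < MIN_RAM_GB) {
    ok = false;
    warnings.push(`Device reports ~${deviceMemoryGB} GB RAM (minimum ${MIN_RAM_GB} GB)`);
  }
  return { ok, warnings };
}
```

In a page this would be called as `preflight(await navigator.storage.estimate(), (navigator as any).deviceMemory)`. The storage check reflects the origin's quota rather than raw disk space, which is the relevant limit when the browser stores the files.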
## 🌐 Browser Compatibility
| Browser | WebGPU Support | Performance Level | Notes |
|---------|---------------|------------------|-------|
| **Chrome 113+** | ✅ Full Support | Excellent | Primary recommendation (Windows, macOS, ChromeOS) |
| **Microsoft Edge 113+** | ✅ Full Support | Excellent | Primary recommendation |
| **Firefox 141+** | ✅ Stable Support | Very Good | Enabled by default on Windows; other platforms pending |
| **Safari 17.4+** | 🔶 Experimental | Good | Behind feature flag |
| **Mobile Chrome 121+** | 🔶 Limited | Fair | Android only, limited memory |

*All listed browsers can use the WebAssembly fallback when WebGPU is unavailable.*
## 📝 Model Details
### Training Information
This model is based on Stable Diffusion v1.5 with the following training characteristics:
- **Base Dataset**: LAION-5B filtered subset (~590M image-text pairs)
- **Training Resolution**: 512×512 pixels
- **Architecture**: Latent Diffusion Model with CLIP ViT-L/14 text encoder
- **Precision**: Originally trained in FP32, optimized to FP16 for browser deployment
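
In Stable Diffusion v1.5 the diffusion runs in the VAE's latent space, which has 4 channels and is downsampled 8× in each spatial dimension, so a 512×512 image corresponds to a 1×4×64×64 latent tensor (16,384 values). The hypothetical helper below computes the latent shape for a requested size and rejects sizes that are not multiples of 8, the same check the Hugging Face diffusers pipeline performs. Whether the exported ONNX graphs accept sizes other than 512×512 depends on how they were exported, which this card does not state.

```typescript
const VAE_SCALE_FACTOR = 8; // spatial downsampling of the SD v1 VAE
const LATENT_CHANNELS = 4;

// Hypothetical helper: NCHW shape of the latent tensor fed to the UNet.
function latentShape(width: number, height: number, batch = 1): [number, number, number, number] {
  if (width % VAE_SCALE_FACTOR !== 0 || height % VAE_SCALE_FACTOR !== 0) {
    throw new Error(`width and height must be multiples of ${VAE_SCALE_FACTOR}`);
  }
  return [batch, LATENT_CHANNELS, height / VAE_SCALE_FACTOR, width / VAE_SCALE_FACTOR];
}
```

For example, `latentShape(768, 512)` gives `[1, 4, 64, 96]`.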
### Optimization for Web Deployment
- **ONNX Conversion**: Optimized computational graph for web inference
- **WebGPU Kernels**: Custom compute shaders for GPU acceleration
- **Memory Efficiency**: Attention slicing and dynamic memory management
- **Cross-Platform**: WebAssembly fallback ensures universal browser support
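
Attention slicing trades speed for peak memory: instead of materialising the full query-by-key score matrix at once, queries are processed in chunks, so only a `sliceSize × keys` block of scores exists at any time, and the output is identical to unsliced attention. For scale, SD v1.5's highest-resolution UNet blocks at 512×512 attend over 64×64 = 4,096 latent positions, so one full self-attention score matrix has about 16.8 million entries per head. The sketch below is a plain-array, single-head illustration of the idea; it is not this repository's WebGPU kernel, and `attention` and `sliceSize` are illustrative names.

```typescript
type Matrix = number[][]; // rows x columns

// Numerically stable softmax over one row of scores.
function softmax(row: number[]): number[] {
  const max = Math.max(...row);
  const exps = row.map((x) => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / sum);
}

// Scaled dot-product attention (single head) that processes `sliceSize`
// query rows at a time. Only sliceSize x K.length scores exist at once,
// instead of Q.length x K.length for the unsliced version.
function attention(Q: Matrix, K: Matrix, V: Matrix, sliceSize = Q.length): Matrix {
  if (Q.length === 0) return [];
  if (sliceSize < 1) throw new Error("sliceSize must be at least 1");
  const scale = 1 / Math.sqrt(Q[0].length);
  const out: Matrix = [];
  for (let start = 0; start < Q.length; start += sliceSize) {
    const qSlice = Q.slice(start, start + sliceSize);
    // Score block for this slice only.
    const scores = qSlice.map((q) =>
      K.map((k) => q.reduce((acc, qi, i) => acc + qi * k[i], 0) * scale),
    );
    for (const row of scores) {
      const weights = softmax(row);
      // Output row = attention-weighted sum of the value rows.
      const o = new Array<number>(V[0].length).fill(0);
      weights.forEach((w, j) => V[j].forEach((v, c) => (o[c] += w * v)));
      out.push(o);
    }
  }
  return out;
}
```

Calling `attention(Q, K, V, 1)` and `attention(Q, K, V)` returns the same matrix; only the peak size of the intermediate score block differs.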
## 🛡️ Ethical Use and Safety
### Built-in Safety Features
- **Content Filter**: Optional NSFW detection and filtering
- **Prompt Sanitization**: Basic filtering of potentially harmful prompts
- **Local Processing**: No data transmission ensures privacy protection
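
The card does not describe how its prompt filter works. One common basic approach, assumed here, is a normalised whole-word blocklist; `BLOCKED_TERMS` holds placeholder entries, not the list this model uses, and `findBlockedTerms` is a hypothetical helper. Keyword lists are easy to evade with misspellings or synonyms, which is why they are usually paired with an output-side check such as the optional safety checker.

```typescript
// Placeholder entries only; a real deployment maintains its own list.
const BLOCKED_TERMS = ["blockedterm", "another blocked phrase"];

// Lower-case, strip accents and punctuation, collapse whitespace.
// ASCII-oriented: characters outside a-z and 0-9 are replaced by spaces.
function normalize(text: string): string {
  return text
    .toLowerCase()
    .normalize("NFKD")
    .replace(/[\u0300-\u036f]/g, "") // drop combining accent marks
    .replace(/[^a-z0-9\s]/g, " ")
    .replace(/\s+/g, " ")
    .trim();
}

// Hypothetical helper: returns the blocked terms found as whole words or phrases,
// so "blockedtermination" does not match "blockedterm".
function findBlockedTerms(prompt: string, terms: string[] = BLOCKED_TERMS): string[] {
  const padded = ` ${normalize(prompt)} `;
  return terms.filter((t) => padded.includes(` ${normalize(t)} `));
}
```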
### Responsible Use Guidelines
✅ **Encouraged Uses:**
- Creative art and design projects
- Educational demonstrations of AI capabilities
- Rapid prototyping for applications
- Personal creative exploration
- Research and development

❌ **Prohibited Uses:**
- Creating harmful, offensive, or illegal content
- Generating misleading information or deepfakes
- Violating copyright or intellectual property rights
- Any use that violates the CreativeML OpenRAIL-M license terms
### Privacy and Data Protection
- **Zero Data Collection**: All processing occurs locally in your browser
- **No Server Communication**: Model runs entirely offline after initial download
- **User Control**: Complete control over generated content and prompts
- **GDPR Compliant**: No personal data processing or storage
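
Running offline after the initial download implies the browser persists the model files, but the card does not say which storage mechanism is used. One common pattern, assumed here, is the Cache API (`caches.open`, `cache.match`, `cache.put`). The sketch below is a hypothetical `fetchWithCache` helper written against small structural interfaces so it can run without a browser; the response is cloned before caching because a `Response` body can only be read once.

```typescript
// Minimal subset of Response used here (ok flag and clone()).
interface Cloneable<T> {
  ok: boolean;
  clone(): T;
}

// Minimal subset of the Cache API; the object from caches.open(...) fits it.
interface CacheLike<R> {
  match(url: string): Promise<R | undefined>;
  put(url: string, response: R): Promise<void>;
}

// Hypothetical helper: serve from cache when possible, otherwise download once and store.
async function fetchWithCache<R extends Cloneable<R>>(
  url: string,
  cache: CacheLike<R>,
  fetcher: (url: string) => Promise<R>,
): Promise<R> {
  const cached = await cache.match(url);
  if (cached) return cached; // served locally, no network request
  const response = await fetcher(url);
  if (!response.ok) throw new Error(`Download failed: ${url}`);
  await cache.put(url, response.clone()); // keep a copy for offline use
  return response;
}
```

In a page this would be called as `fetchWithCache(url, await caches.open("sd15-models"), fetch)`, where the cache name is illustrative. Browsers may evict cached data under storage pressure; `navigator.storage.persist()` can request that the origin's storage be kept.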
## ⚠️ Limitations and Considerations
### Technical Limitations
- **Resolution**: Optimized for 512×512 (other resolutions may reduce quality)
- **Batch Size**: Single image generation only in browser environment
- **Memory Constraints**: Limited by browser and device VRAM/RAM
- **Generation Speed**: Slower than dedicated server hardware
### Content Limitations
- **Language Bias**: Best performance with English prompts
- **Cultural Representation**: Training data may reflect Western/English-speaking biases
- **Artistic Style**: Tendency toward photorealistic and digital art styles
- **Consistency**: Multiple generations from same prompt may vary significantly
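
With a deterministic sampler (for example DDIM), the random initial latent is the only source of randomness; ancestral samplers also inject noise at each step, which would need to come from the same seeded generator. Fixing a seed therefore makes runs repeatable on the same device and build. The sketch below generates a seeded 1×4×64×64 Gaussian latent (the shape for 512×512 output) using the mulberry32 PRNG and the Box-Muller transform. Both are illustrative choices, so the same seed will not reproduce images from PyTorch or diffusers, and floating-point differences between GPUs can still cause small drift. `seededLatents` is a hypothetical helper.

```typescript
// mulberry32: small, fast 32-bit seeded PRNG returning floats in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical helper: standard-normal latents via the Box-Muller transform.
// Default shape [1, 4, 64, 64] is the SD v1.5 latent for a 512x512 image.
function seededLatents(seed: number, shape: number[] = [1, 4, 64, 64]): Float32Array {
  const size = shape.reduce((a, b) => a * b, 1);
  const out = new Float32Array(size);
  const rand = mulberry32(seed);
  for (let i = 0; i < size; i += 2) {
    const u1 = 1 - rand(); // in (0, 1], so Math.log(u1) is finite
    const u2 = rand();
    const r = Math.sqrt(-2 * Math.log(u1));
    out[i] = r * Math.cos(2 * Math.PI * u2);
    if (i + 1 < size) out[i + 1] = r * Math.sin(2 * Math.PI * u2);
  }
  return out;
}
```

With ONNX Runtime Web the array could be wrapped as `new ort.Tensor("float32", latents, [1, 4, 64, 64])`; depending on the scheduler, the noise may also need scaling by its initial noise sigma before the first step.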
### Browser-Specific Considerations
- **WebGPU Availability**: Limited to supporting browsers and devices
- **Memory Management**: Browser security limits may affect large model loading
- **Performance Variance**: Significant variation across different devices and browsers
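
One concrete limit behind large-model loading problems is WebGPU's per-buffer size caps. A device created without explicit `requiredLimits` gets the specification defaults (256 MiB for `maxBufferSize`, 128 MiB for `maxStorageBufferBindingSize`) even if the adapter supports more, which matters if you write your own WebGPU code or configure the device yourself. For example, a full 4,096×4,096 FP32 attention score matrix for one head is exactly 64 MiB. The sketch below is a hypothetical `planBufferLimits` helper that, given `adapter.limits` and the size of the largest storage buffer needed, builds a `requiredLimits` object for `adapter.requestDevice()` or reports that the adapter cannot hold that buffer.

```typescript
// Subset of GPUSupportedLimits (adapter.limits) used here.
interface BufferLimits {
  maxBufferSize: number;
  maxStorageBufferBindingSize: number;
}

// WebGPU defaults a device gets when no requiredLimits are requested.
const DEFAULT_LIMITS: BufferLimits = {
  maxBufferSize: 256 * 1024 * 1024, // 256 MiB
  maxStorageBufferBindingSize: 128 * 1024 * 1024, // 128 MiB
};

type LimitsPlan =
  | { ok: true; requiredLimits: Partial<BufferLimits> }
  | { ok: false; reason: string };

// Hypothetical helper: limits to request for a storage buffer of `bytes`.
function planBufferLimits(adapterLimits: BufferLimits, bytes: number): LimitsPlan {
  if (bytes > adapterLimits.maxBufferSize || bytes > adapterLimits.maxStorageBufferBindingSize) {
    return { ok: false, reason: `adapter cannot bind a ${bytes}-byte storage buffer` };
  }
  const requiredLimits: Partial<BufferLimits> = {};
  // Only raise limits above the defaults when actually needed.
  if (bytes > DEFAULT_LIMITS.maxBufferSize) requiredLimits.maxBufferSize = bytes;
  if (bytes > DEFAULT_LIMITS.maxStorageBufferBindingSize) requiredLimits.maxStorageBufferBindingSize = bytes;
  return { ok: true, requiredLimits };
}
```

Usage in a page: `const plan = planBufferLimits(adapter.limits, bytes); if (plan.ok) device = await adapter.requestDevice({ requiredLimits: plan.requiredLimits });`.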
## 📜 License: CreativeML OpenRAIL-M
This model is released under the **CreativeML OpenRAIL-M** license, which allows for:

✅ **Permitted:**
- Commercial and non-commercial use
- Distribution and modification
- Creation of derivative works
- Integration into applications and services

🚫 **Restrictions:**
- Must not be used to generate harmful content
- Cannot be used for illegal activities
- Must include license terms in any distribution
- Derivative works must carry forward the license's use-based restrictions

**Full License Text**: Available at [CreativeML OpenRAIL-M License](https://huggingface.co/spaces/CompVis/stable-diffusion-license)
### License Compliance
When using this model:
1. **Include License**: Provide license terms to end users
2. **Respect Restrictions**: Ensure use cases comply with content restrictions
3. **Derivative Works**: Include the license's use-based restrictions in the terms for any modified versions
4. **Attribution**: Credit original Stable Diffusion creators and Zhare-AI adaptation
## 🏢 About Zhare-AI
<div align="center">
<img src="zhare-logo.png" alt="Zhare-AI" width="120" height="auto" style="margin: 20px 0;">
</div>

**Zhare-AI** is focused on democratizing AI technology by making powerful models accessible directly in web browsers. Our mission is to enable privacy-preserving AI applications that put users in control of their data and creative processes.
- **Website**: [zhare.ai](https://zhare.ai)
- **Focus**: Distributed AI computing and browser-based AI applications
- **Philosophy**: Privacy-first, user-controlled AI experiences
- **Vision**: Making AI accessible, private, and distributed
### Our Mission
We believe AI should be:
- **Accessible** to everyone, regardless of infrastructure
- **Private** with complete user data control
- **Distributed** across devices rather than centralized servers
- **Transparent** with open-source implementations
## 📚 Citation and References
### Cite This Work
```bibtex
@misc{zhare-ai-sd15-webgpu-2025,
title={Stable Diffusion 1.5 WebGPU: Browser-Optimized Text-to-Image Generation},
author={Zhare-AI},
year={2025},
howpublished={\url{https://huggingface.co/Zhare-AI/sd-1-5-webgpu}},
note={WebGPU-optimized implementation for privacy-preserving browser-based image generation}
}
```
### Original Stable Diffusion Citation
```bibtex
@InProceedings{Rombach_2022_CVPR,
author = {Rombach, Robin and Blattmann, Andreas and Lorenz, Dominik and Esser, Patrick and Ommer, Björn},
title = {High-Resolution Image Synthesis With Latent Diffusion Models},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2022},
pages = {10684-10695}
}
```
## 🤝 Community and Support
### Getting Help
- **Issues**: Report technical problems via the repository issues
- **Discussions**: Join the community discussion for tips and examples
- **Documentation**: Comprehensive guides available in the repository
### Contributing
We welcome contributions to improve browser compatibility, performance, and user experience:
- Performance optimizations for different hardware
- Browser compatibility improvements
- Documentation enhancements
- Example applications and tutorials
---
<div align="center">

<img src="zhare-logo.png" alt="Zhare-AI" width="100" height="auto">

**🚀 Ready to create amazing images directly in your browser?**

*This model brings the power of Stable Diffusion to web applications while keeping your data completely private and secure.*

**Developed with ❤️ by Zhare-AI for the open-source community**

[🌐 Visit Zhare.ai](https://zhare.ai) | [📧 Contact Us](mailto:[email protected]) | [💬 Join Discussion](https://github.com/Zhare-AI)

</div>