DFloat11/Qwen-Image-DF11

lossless compression

70% size, 100% accuracy


LeanQuant committed 15 days ago

Commit e78d713 · verified · 1 parent: 195a01d

Update README.md

Files changed (1)

README.md +7 -1

README.md CHANGED

@@ -49,6 +49,7 @@ This is a **DFloat11 losslessly compressed** version of the original `Qwen/Qwen-
     def parse_args():
         parser = argparse.ArgumentParser(description='Generate images using Qwen-Image model')
         parser.add_argument('--cpu_offload', action='store_true', help='Enable CPU offloading')
+        parser.add_argument('--cpu_offload_blocks', type=int, default=None, help='Number of transformer blocks to offload to CPU')
         parser.add_argument('--no_pin_memory', action='store_true', help='Disable memory pinning')
         parser.add_argument('--prompt', type=str, default='A coffee shop entrance features a chalkboard sign reading "Qwen Coffee 😊 $2 per cup," with a neon light beside it displaying "通义千问". Next to it hangs a poster showing a beautiful Chinese woman, and beneath the poster is written "π≈3.1415926-53589793-23846264-33832795-02384197".',
                             help='Text prompt for image generation')
@@ -83,6 +84,7 @@ This is a **DFloat11 losslessly compressed** version of the original `Qwen/Qwen-
         "DFloat11/Qwen-Image-DF11",
         device="cpu",
         cpu_offload=args.cpu_offload,
+        cpu_offload_blocks=args.cpu_offload_blocks,
         pin_memory=not args.no_pin_memory,
         bfloat16_model=transformer,
     )
@@ -136,8 +138,12 @@ This is a **DFloat11 losslessly compressed** version of the original `Qwen/Qwen-
     python qwen_image.py --cpu_offload
     ```
-    If you are getting out-of-memory errors, try disabling memory-pinning:
+    If you are getting out-of-CPU-memory errors, try limiting the number of offloaded blocks or disabling memory-pinning:
     ```bash
+    # Offload only 16 blocks (offloading more blocks uses less GPU memory and more CPU memory; offloading less blocks is faster):
+    python qwen_image.py --cpu_offload --cpu_offload_blocks 16
+    # Disable memory-pinning (the most memory efficient way, but could be slower):
     python qwen_image.py --cpu_offload --no_pin_memory
     ```
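The following minimal, runnable sketch shows how the flags touched by this commit are handled, which may help when reading the diff above. It rebuilds only the three memory-related flags and leaves out `--prompt`. It then gathers the keyword arguments that the README forwards to the model-loading call. `loader_kwargs` is a hypothetical helper written for illustration. The loading call itself is not reproduced, because its name falls outside the lines shown in this diff.

```python
import argparse


def parse_args(argv=None):
    # Mirrors the memory-related flags from the README diff (--prompt omitted for brevity).
    parser = argparse.ArgumentParser(description='Generate images using Qwen-Image model')
    parser.add_argument('--cpu_offload', action='store_true', help='Enable CPU offloading')
    parser.add_argument('--cpu_offload_blocks', type=int, default=None,
                        help='Number of transformer blocks to offload to CPU')
    parser.add_argument('--no_pin_memory', action='store_true', help='Disable memory pinning')
    return parser.parse_args(argv)


def loader_kwargs(args):
    # Hypothetical helper: collects the keyword arguments the README passes to the
    # DFloat11 loading call. Pinning is ON unless --no_pin_memory is given.
    return {
        "cpu_offload": args.cpu_offload,
        # None leaves the choice to the library; its default is not shown in this diff.
        "cpu_offload_blocks": args.cpu_offload_blocks,
        "pin_memory": not args.no_pin_memory,
    }


if __name__ == "__main__":
    examples = (
        [],
        ["--cpu_offload", "--cpu_offload_blocks", "16"],
        ["--cpu_offload", "--no_pin_memory"],
    )
    for argv in examples:
        print(argv, "->", loader_kwargs(parse_args(argv)))
```

Running it prints one dictionary per example command line. `--cpu_offload_blocks 16` arrives as the integer `16` (because of `type=int`), and omitting the flag leaves `None`. `--no_pin_memory` flips `pin_memory` to `False`, matching the `pin_memory=not args.no_pin_memory` line in the diff.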
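The new README comment on `--cpu_offload_blocks` describes a trade-off. Offloading more blocks uses less GPU memory and more CPU memory, while offloading fewer blocks is faster (presumably because each offloaded block has to be copied back to the GPU when it runs). The sketch below makes that bookkeeping explicit under two simplifying assumptions: every transformer block has the same size, and some memory (`resident_gib`) always stays on the GPU. `memory_split` and `min_blocks_to_offload` are hypothetical helpers, not part of the DFloat11 package. The numbers in the example are illustrative and are not measurements of Qwen-Image-DF11.

```python
import math


def memory_split(num_blocks, block_gib, offloaded, resident_gib=0.0):
    """Return (gpu_gib, cpu_gib) when `offloaded` of `num_blocks` blocks live on the CPU.

    Hypothetical simplification: all blocks have the same size `block_gib`, and
    `resident_gib` covers everything that always stays on the GPU.
    """
    if not 0 <= offloaded <= num_blocks:
        raise ValueError("offloaded must be between 0 and num_blocks")
    gpu = resident_gib + (num_blocks - offloaded) * block_gib  # blocks kept on the GPU
    cpu = offloaded * block_gib                                # blocks held in CPU memory
    return gpu, cpu


def min_blocks_to_offload(num_blocks, block_gib, gpu_budget_gib, resident_gib=0.0):
    """Smallest number of blocks to offload so the GPU share fits `gpu_budget_gib`.

    Offloading as few blocks as possible keeps the (assumed) transfer overhead low.
    """
    excess = resident_gib + num_blocks * block_gib - gpu_budget_gib
    if excess <= 0:
        return 0  # everything already fits on the GPU
    needed = math.ceil(excess / block_gib)
    if needed > num_blocks:
        raise ValueError("budget too small even with every block offloaded")
    return needed


if __name__ == "__main__":
    # Illustrative numbers only: 40 blocks of 0.5 GiB each, 2 GiB always resident.
    for k in (0, 16, 40):
        print(k, memory_split(40, 0.5, k, resident_gib=2.0))
    print(min_blocks_to_offload(40, 0.5, gpu_budget_gib=12.0, resident_gib=2.0))
```

With these made-up numbers, offloading 16 of 40 blocks would leave 14 GiB on the GPU and move 8 GiB to the CPU. `min_blocks_to_offload` picks the smallest count that fits a 12 GiB budget (20 here), in line with the README's note that offloading fewer blocks is faster.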