metadata
license: mit
language:
  - en
  - zh
tags:
  - YOLO World
pipeline_tag: zero-shot-object-detection

YOLOWorld

This SDK provides efficient open-vocabulary object detection with YOLO-World-V2 Large, optimized for Axera’s NPU-based SoC platforms, including the AX650, AX630C, and AX8850 series, as well as Axera's dedicated AI accelerator.

Reference links:

For those who are interested in model conversion, you can try to export axmodel through

Supported Platforms

Performance

| Model | Input Shape | Latency (ms) | CMM Usage (MB) |
|---|---|---|---|
| yolo_u16_ax650.axmodel | 1 x 640 x 640 x 3 | 9.522 | 21 |
| clip_b1_u16_ax650.axmodel | 1 x 77 | 2.997 | 137 |
| yolo_u16_ax630c.axmodel | 1 x 640 x 640 x 3 | 43.450 | 31 |
| clip_b1_u16_ax630c.axmodel | 1 x 77 | 10.703 | 134 |
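
To put the latencies in the table above into perspective, the short sketch below converts them into an upper bound on inferences per second. It assumes the listed latency covers only the NPU inference itself; pre-processing, post-processing, and data transfer are not included, so real end-to-end throughput will be lower. The helper name `max_fps` is illustrative and not part of the SDK.

```python
# Latencies in milliseconds, copied from the performance table.
LATENCY_MS = {
    "yolo_u16_ax650.axmodel": 9.522,
    "clip_b1_u16_ax650.axmodel": 2.997,
    "yolo_u16_ax630c.axmodel": 43.450,
    "clip_b1_u16_ax630c.axmodel": 10.703,
}


def max_fps(latency_ms: float) -> float:
    """Upper bound on inferences per second for back-to-back runs."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


if __name__ == "__main__":
    for name, ms in LATENCY_MS.items():
        print(f"{name}: {ms:.3f} ms -> at most {max_fps(ms):.1f} inferences/s")
```

By this estimate the detector alone tops out at roughly 105 inferences per second on AX650 and roughly 23 on AX630C.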

How to use

Download all files from this repository to the device

(py312) axera@raspberrypi:~/samples/yoloworldv2 $ tree
.
├── config.json
├── football.jpg
├── install
│   ├── bin
│   │   ├── axcl_aarch64
│   │   │   └── test_detect_by_text
│   │   ├── axcl_x86
│   │   │   └── test_detect_by_text
│   │   └── host_650
│   │       └── test_detect_by_text
│   └── lib
│       ├── axcl_aarch64
│       │   └── libyoloworld.so
│       ├── axcl_x86
│       │   └── libyoloworld.so
│       └── host_650
│           └── libyoloworld.so
├── models
│   ├── clip_b1_u16_ax630c.axmodel
│   ├── clip_b1_u16_ax650.axmodel
│   ├── yolo_u16_ax630c.axmodel
│   └── yolo_u16_ax650.axmodel
├── pyyoloworld
│   ├── example.py
│   ├── gardio_example.jpg
│   ├── gradio_example.py
│   ├── libyoloworld.so
│   ├── pyaxdev.py
│   ├── __pycache__
│   │   ├── pyaxdev.cpython-312.pyc
│   │   └── pyyoloworld.cpython-312.pyc
│   ├── pyyoloworld.py
│   └── requirements.txt
├── README.md
└── vocab.txt

13 directories, 23 files
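
Before running the demo, it can help to confirm that the download is complete. The sketch below checks for a handful of files from the listing above; the `REQUIRED` list and the `missing_files` helper are illustrative choices (here the AX650 models and the Gradio demo), not something shipped with the repository.

```python
from pathlib import Path

# Relative paths taken from the tree listing (AX650 models plus the Python demo).
REQUIRED = [
    "models/yolo_u16_ax650.axmodel",
    "models/clip_b1_u16_ax650.axmodel",
    "vocab.txt",
    "pyyoloworld/gradio_example.py",
    "pyyoloworld/requirements.txt",
]


def missing_files(root, required=REQUIRED):
    """Return the entries of `required` that are not regular files under `root`."""
    root = Path(root)
    return [rel for rel in required if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files(".")
    if missing:
        print("Missing files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All expected files are present.")
```

Run it from the repository root; swap in the `ax630c` model names if that is the target platform.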

Python environment requirements

pip install -r pyyoloworld/requirements.txt

Inference with an AX650 host, such as the M4N-Dock (爱芯派Pro, AXera-Pi Pro)

TODO

Inference with M.2 Accelerator card

What is the M.2 Accelerator card? This demo runs on a Raspberry Pi 5.
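
The session below copies the `axcl_aarch64` build of `libyoloworld.so` next to the Python scripts. As a rough sketch of how the matching build could be chosen on other machines, the snippet maps the CPU architecture to one of the three directories under `install/lib`. The mapping is an assumption inferred from the directory names (`axcl_*` for hosts driving an accelerator card, `host_650` for running directly on an AX650 board), and `lib_subdir` is a hypothetical helper, not part of the SDK.

```python
import platform


def lib_subdir(machine: str, on_ax650_host: bool = False) -> str:
    """Pick the install/lib subdirectory for this machine (assumed mapping)."""
    if on_ax650_host:
        # Assumption: host_650 is the build for running on the AX650 SoC itself.
        return "host_650"
    if machine in ("aarch64", "arm64"):
        return "axcl_aarch64"
    if machine in ("x86_64", "AMD64"):
        return "axcl_x86"
    raise ValueError(f"no prebuilt library for architecture: {machine}")


if __name__ == "__main__":
    sub = lib_subdir(platform.machine())
    print(f"install/lib/{sub}/libyoloworld.so")
```

On the Raspberry Pi 5 used here, `platform.machine()` reports `aarch64`, which matches the `cp` command in the session.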

(py312) axera@raspberrypi:~/samples/yoloworldv2-new.hg $ export LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libstdc++.so.6
(py312) axera@raspberrypi:~/samples/yoloworldv2-new.hg $ cp install/lib/axcl_aarch64/libyoloworld.so pyyoloworld/
(py312) axera@raspberrypi:~/samples/yoloworldv2-new.hg $ cd pyyoloworld/
(py312) axera@raspberrypi:~/samples/yoloworldv2-new.hg/pyyoloworld $ python gradio_example.py --yoloworld ../models/yolo_u16_ax650.axmodel --tenc ../models/clip_b1_u16_ax650.axmodel --vocab ../vocab.txt
Trying to load: /home/axera/samples/yoloworldv2-new.hg/pyyoloworld/aarch64/libyoloworld.so
✅ Successfully loaded: /home/axera/samples/yoloworldv2-new.hg/pyyoloworld/libyoloworld.so
[I][                             run][  31]: AXCLWorker start with devid 0

input size: 2
    name:   images [unknown] [unknown]
        1 x 640 x 640 x 3   size: 1228800

    name: txt_feats [unknown] [unknown]
        1 x 4 x 512   size: 8192


output size: 3
    name:  stride8
        1 x 80 x 80 x 68   size: 1740800

    name: stride16
        1 x 40 x 40 x 68   size: 435200

    name: stride32
        1 x 20 x 20 x 68   size: 108800

[I][                       yw_create][ 408]: num_classes: 4, num_features: 512, input w: 640, h: 640
is_output_nhwc: 1

input size: 1
    name: text_token [unknown] [unknown]
        1 x 77   size: 308


output size: 1
    name:     2202
        1 x 1 x 512   size: 2048

[I][               load_text_encoder][  44]: text feature len 512
[I][                  load_tokenizer][  60]: text token len 77
* Running on local URL:  http://0.0.0.0:7860
* To create a public link, set `share=True` in `launch()`.
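
The shapes and byte sizes printed in the log above can be cross-checked with a little arithmetic. The sketch below divides each reported size by the element count of its shape; the result suggests the image input uses 1 byte per element and every other tensor uses 4 bytes per element. The actual data types are not stated in the log (they show as `[unknown]`), so treating these as, for example, uint8 and float32 is an assumption. The feature-map sides 80, 40, and 20 also equal 640 divided by the strides 8, 16, and 32.

```python
from math import prod

# (shape, size in bytes) copied from the model-loading log.
TENSORS = {
    "images": ((1, 640, 640, 3), 1228800),
    "txt_feats": ((1, 4, 512), 8192),
    "stride8": ((1, 80, 80, 68), 1740800),
    "stride16": ((1, 40, 40, 68), 435200),
    "stride32": ((1, 20, 20, 68), 108800),
    "text_token": ((1, 77), 308),
    "2202": ((1, 1, 512), 2048),
}


def bytes_per_element(shape, size_bytes):
    """Return size_bytes / number of elements, requiring an exact division."""
    count = prod(shape)
    if size_bytes % count != 0:
        raise ValueError("size is not a whole multiple of the element count")
    return size_bytes // count


if __name__ == "__main__":
    for name, (shape, size) in TENSORS.items():
        print(f"{name}: {bytes_per_element(shape, size)} byte(s) per element")
```

The `txt_feats` shape `1 x 4 x 512` lines up with `num_classes: 4` and `num_features: 512` in the log, so the text-feature input grows with the number of prompt classes the model was built for.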

If your Raspberry Pi 5's IP address is 192.168.1.100, open http://192.168.1.100:7860 in a web browser on another machine in the same network.
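
If the page does not load, a quick reachability check from the client machine can tell whether the Gradio port is open at all. This is a minimal sketch using only the Python standard library; the helper names `demo_url` and `is_port_open` are illustrative, and the address 192.168.1.100 is the example value from the text, to be replaced with the device's real IP.

```python
import socket


def demo_url(host: str, port: int = 7860) -> str:
    """Build the URL of the Gradio demo for a given host."""
    return f"http://{host}:{port}"


def is_port_open(host: str, port: int = 7860, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    host = "192.168.1.100"  # example address; replace with your device's IP
    state = "reachable" if is_port_open(host) else "not reachable"
    print(f"{demo_url(host)} is {state}")
```

A "not reachable" result usually points to a wrong IP address, a firewall, or the demo not running, rather than to a problem with the model itself.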

Input: the text prompts "man, shoes, ball, person" and the test image

Result: