2025-08-18 22:25:05 - INFO - Loading model: Qwen/Qwen2-VL-2B-Instruct-AWQ
2025-08-18 22:25:08 - INFO - We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk).
2025-08-18 22:25:15 - INFO - Model loaded in 10.72 seconds
2025-08-18 22:25:15 - INFO - GPU Memory Usage after model load: 2.31 GB
2025-08-18 22:25:32 - INFO - [a4aa5634-1f05-4a10-a409-f6f99576382b] Received new video inference request. Prompt: 'Please describe the video.', Video: 'messi_part_001.mp4'
2025-08-18 22:25:32 - INFO - [a4aa5634-1f05-4a10-a409-f6f99576382b] Video saved to temporary file: temp_videos/a4aa5634-1f05-4a10-a409-f6f99576382b.mp4
2025-08-18 22:25:32 - INFO - [a4aa5634-1f05-4a10-a409-f6f99576382b] Extracting frames using method: uniform, rate/threshold: 30
2025-08-18 22:25:36 - INFO - [a4aa5634-1f05-4a10-a409-f6f99576382b] Extracted 30 frames successfully. Saving to temporary files...
2025-08-18 22:25:36 - INFO - [a4aa5634-1f05-4a10-a409-f6f99576382b] 30 frames saved to temp_videos/a4aa5634-1f05-4a10-a409-f6f99576382b
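The "uniform" extraction method named above presumably samples a fixed number of frames evenly spaced across the clip; the actual implementation in infer.py is not shown in this log. A minimal sketch of what such a method might look like, assuming an OpenCV-based pipeline (both function names are hypothetical):

```python
def uniform_indices(total_frames: int, num_frames: int) -> list:
    # Evenly spaced frame indices across the clip, matching the
    # "uniform" method named in the log (an assumption, not the real code).
    step = total_frames / num_frames
    return [int(i * step) for i in range(num_frames)]

def extract_uniform_frames(video_path: str, num_frames: int = 30) -> list:
    import cv2  # lazy import; OpenCV is an assumed dependency

    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    frames = []
    for idx in uniform_indices(total, num_frames):
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
        ok, frame = cap.read()
        if ok:
            frames.append(frame)
    cap.release()
    return frames
```

With `num_frames=30` this yields the 30 frames reported at 22:25:36 regardless of clip length, which keeps the prompt token count bounded.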
2025-08-18 22:25:37 - INFO - Prompt token length: 2276
2025-08-18 22:25:47 - ERROR - [a4aa5634-1f05-4a10-a409-f6f99576382b] An error occurred during processing: 'avg_gpu_memory_mb'
Traceback (most recent call last):
  File "/mnt/data/xiuying/Code/local_deploy/infer.py", line 109, in video_inference
    logging.info(f"Tokens per second: {output['tokens_per_second']}, Avg GPU memory MB: {output['avg_gpu_memory_mb']}")
                                                                                         ~~~~~~^^^^^^^^^^^^^^^^^^^^^
KeyError: 'avg_gpu_memory_mb'
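The KeyError comes from the f-string at infer.py line 109, which indexes `output['avg_gpu_memory_mb']` on a result dict that does not always contain that key. A minimal defensive sketch of that logging call, assuming `output` is a plain dict (the helper name `format_inference_stats` is hypothetical):

```python
import logging

def format_inference_stats(output: dict) -> str:
    # Use dict.get so a missing key degrades to "n/a" instead of
    # raising the KeyError seen in the traceback above.
    tokens_per_second = output.get("tokens_per_second", "n/a")
    avg_gpu_memory_mb = output.get("avg_gpu_memory_mb", "n/a")
    return (
        f"Tokens per second: {tokens_per_second}, "
        f"Avg GPU memory MB: {avg_gpu_memory_mb}"
    )

# Replacement for the failing call at infer.py:109:
# logging.info(format_inference_stats(output))
```

Note that the cleanup lines below still ran, so the exception was caught by a surrounding handler; the guard above only prevents the stats log line itself from aborting the request.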
2025-08-18 22:25:47 - INFO - [a4aa5634-1f05-4a10-a409-f6f99576382b] Cleaned up temporary file: temp_videos/a4aa5634-1f05-4a10-a409-f6f99576382b.mp4
2025-08-18 22:25:47 - INFO - [a4aa5634-1f05-4a10-a409-f6f99576382b] Cleaned up temporary frame directory: temp_videos/a4aa5634-1f05-4a10-a409-f6f99576382b