2025-08-21 00:42:53 - INFO - Loading model: Qwen/Qwen2.5-VL-3B-Instruct-AWQ
2025-08-21 00:42:56 - INFO - We will use 90% of the memory on device 0 for storing the model, and 10% for the buffer to avoid OOM. You can set `max_memory` in to a higher value to use more memory (at your own risk).
2025-08-21 00:43:05 - INFO - Model loaded in 11.91 seconds
2025-08-21 00:43:05 - INFO - GPU Memory Usage after model load: 3250.55 MB
2025-08-21 00:44:34 - INFO - [85d08818-6d68-43fa-a772-626d83ea5d11] Received new video inference request. Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_001.mp4'
2025-08-21 00:44:34 - INFO - [85d08818-6d68-43fa-a772-626d83ea5d11] Video saved to temporary file: temp_videos/85d08818-6d68-43fa-a772-626d83ea5d11.mp4
2025-08-21 00:44:34 - INFO - [85d08818-6d68-43fa-a772-626d83ea5d11] Extracting frames using method: uniform, rate/threshold: 30
2025-08-21 00:44:41 - INFO - [85d08818-6d68-43fa-a772-626d83ea5d11] Extracted 30 frames successfully. Saving to temporary files...
2025-08-21 00:44:41 - INFO - [85d08818-6d68-43fa-a772-626d83ea5d11] 30 frames saved to temp_videos/85d08818-6d68-43fa-a772-626d83ea5d11
2025-08-21 00:44:41 - INFO - Prompt token length: 2306
2025-08-21 00:44:47 - INFO - Tokens per second: 11.896679103804114, Peak GPU memory MB: 5348.375
2025-08-21 00:44:47 - INFO - [85d08818-6d68-43fa-a772-626d83ea5d11] Inference time: 12.73 seconds, CPU usage: 20.1%, CPU core utilization: [17.7, 19.0, 21.6, 22.0]
2025-08-21 00:44:47 - INFO - [85d08818-6d68-43fa-a772-626d83ea5d11] Cleaned up temporary frame directory: temp_videos/85d08818-6d68-43fa-a772-626d83ea5d11
2025-08-21 00:44:47 - INFO - [6f9278db-56d7-44a9-b7f0-7200571a0979] Received new video inference request. Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_002.mp4'
2025-08-21 00:44:47 - INFO - [6f9278db-56d7-44a9-b7f0-7200571a0979] Video saved to temporary file: temp_videos/6f9278db-56d7-44a9-b7f0-7200571a0979.mp4
2025-08-21 00:44:47 - INFO - [6f9278db-56d7-44a9-b7f0-7200571a0979] Extracting frames using method: uniform, rate/threshold: 30
2025-08-21 00:44:52 - INFO - [6f9278db-56d7-44a9-b7f0-7200571a0979] Extracted 30 frames successfully. Saving to temporary files...
2025-08-21 00:44:52 - INFO - [6f9278db-56d7-44a9-b7f0-7200571a0979] 30 frames saved to temp_videos/6f9278db-56d7-44a9-b7f0-7200571a0979
2025-08-21 00:44:52 - INFO - Prompt token length: 2306
2025-08-21 00:44:57 - INFO - Tokens per second: 12.02869428489415, Peak GPU memory MB: 5348.375
2025-08-21 00:44:57 - INFO - [6f9278db-56d7-44a9-b7f0-7200571a0979] Inference time: 10.58 seconds, CPU usage: 55.1%, CPU core utilization: [42.1, 41.3, 93.8, 43.2]
2025-08-21 00:44:57 - INFO - [6f9278db-56d7-44a9-b7f0-7200571a0979] Cleaned up temporary frame directory: temp_videos/6f9278db-56d7-44a9-b7f0-7200571a0979
2025-08-21 00:44:57 - INFO - [2835a505-ec18-45e1-9b43-393c4eb0c79a] Received new video inference request. Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_003.mp4'
2025-08-21 00:44:57 - INFO - [2835a505-ec18-45e1-9b43-393c4eb0c79a] Video saved to temporary file: temp_videos/2835a505-ec18-45e1-9b43-393c4eb0c79a.mp4
2025-08-21 00:44:57 - INFO - [2835a505-ec18-45e1-9b43-393c4eb0c79a] Extracting frames using method: uniform, rate/threshold: 30
2025-08-21 00:45:02 - INFO - [2835a505-ec18-45e1-9b43-393c4eb0c79a] Extracted 30 frames successfully. Saving to temporary files...
2025-08-21 00:45:02 - INFO - [2835a505-ec18-45e1-9b43-393c4eb0c79a] 30 frames saved to temp_videos/2835a505-ec18-45e1-9b43-393c4eb0c79a
2025-08-21 00:45:02 - INFO - Prompt token length: 2306
2025-08-21 00:45:08 - INFO - Tokens per second: 11.82593643667435, Peak GPU memory MB: 5348.375
2025-08-21 00:45:08 - INFO - [2835a505-ec18-45e1-9b43-393c4eb0c79a] Inference time: 10.16 seconds, CPU usage: 56.2%, CPU core utilization: [90.9, 44.3, 46.7, 42.8]
2025-08-21 00:45:08 - INFO - [2835a505-ec18-45e1-9b43-393c4eb0c79a] Cleaned up temporary frame directory: temp_videos/2835a505-ec18-45e1-9b43-393c4eb0c79a
2025-08-21 00:45:08 - INFO - [9ad1595c-b1c3-409e-99bb-050a41cf9e9e] Received new video inference request. Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_004.mp4'
2025-08-21 00:45:08 - INFO - [9ad1595c-b1c3-409e-99bb-050a41cf9e9e] Video saved to temporary file: temp_videos/9ad1595c-b1c3-409e-99bb-050a41cf9e9e.mp4
2025-08-21 00:45:08 - INFO - [9ad1595c-b1c3-409e-99bb-050a41cf9e9e] Extracting frames using method: uniform, rate/threshold: 30
2025-08-21 00:45:13 - INFO - [9ad1595c-b1c3-409e-99bb-050a41cf9e9e] Extracted 30 frames successfully. Saving to temporary files...
2025-08-21 00:45:13 - INFO - [9ad1595c-b1c3-409e-99bb-050a41cf9e9e] 30 frames saved to temp_videos/9ad1595c-b1c3-409e-99bb-050a41cf9e9e
2025-08-21 00:45:13 - INFO - Prompt token length: 2306
2025-08-21 00:45:19 - INFO - Tokens per second: 11.785621023429538, Peak GPU memory MB: 5348.375
2025-08-21 00:45:19 - INFO - [9ad1595c-b1c3-409e-99bb-050a41cf9e9e] Inference time: 11.90 seconds, CPU usage: 53.0%, CPU core utilization: [38.8, 90.1, 41.1, 42.3]
2025-08-21 00:45:19 - INFO - [9ad1595c-b1c3-409e-99bb-050a41cf9e9e] Cleaned up temporary frame directory: temp_videos/9ad1595c-b1c3-409e-99bb-050a41cf9e9e
2025-08-21 00:45:19 - INFO - [83ee3b32-7870-4d00-b3f0-d1ec1167d45e] Received new video inference request. Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_005.mp4'
2025-08-21 00:45:19 - INFO - [83ee3b32-7870-4d00-b3f0-d1ec1167d45e] Video saved to temporary file: temp_videos/83ee3b32-7870-4d00-b3f0-d1ec1167d45e.mp4
2025-08-21 00:45:19 - INFO - [83ee3b32-7870-4d00-b3f0-d1ec1167d45e] Extracting frames using method: uniform, rate/threshold: 30
2025-08-21 00:45:24 - INFO - [83ee3b32-7870-4d00-b3f0-d1ec1167d45e] Extracted 30 frames successfully. Saving to temporary files...
2025-08-21 00:45:24 - INFO - [83ee3b32-7870-4d00-b3f0-d1ec1167d45e] 30 frames saved to temp_videos/83ee3b32-7870-4d00-b3f0-d1ec1167d45e
2025-08-21 00:45:25 - INFO - Prompt token length: 2306
2025-08-21 00:45:32 - INFO - Tokens per second: 9.017638706034026, Peak GPU memory MB: 5348.375
2025-08-21 00:45:32 - INFO - [83ee3b32-7870-4d00-b3f0-d1ec1167d45e] Inference time: 12.17 seconds, CPU usage: 75.1%, CPU core utilization: [69.4, 92.0, 68.0, 70.7]
2025-08-21 00:45:32 - INFO - [83ee3b32-7870-4d00-b3f0-d1ec1167d45e] Cleaned up temporary frame directory: temp_videos/83ee3b32-7870-4d00-b3f0-d1ec1167d45e
2025-08-21 00:45:50 - INFO - [91458b58-07b8-4a0e-bbec-63fde300aebc] Received new video inference request. Prompt: 'Please describe the video in detail, only focus on customer and staff behavior and activities and do not overly describe the static scene.', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_001.mp4'
2025-08-21 00:45:50 - INFO - [91458b58-07b8-4a0e-bbec-63fde300aebc] Video saved to temporary file: temp_videos/91458b58-07b8-4a0e-bbec-63fde300aebc.mp4
2025-08-21 00:45:50 - INFO - [91458b58-07b8-4a0e-bbec-63fde300aebc] Extracting frames using method: uniform, rate/threshold: 30
2025-08-21 00:45:57 - INFO - [91458b58-07b8-4a0e-bbec-63fde300aebc] Extracted 30 frames successfully. Saving to temporary files...
2025-08-21 00:45:57 - INFO - [91458b58-07b8-4a0e-bbec-63fde300aebc] 30 frames saved to temp_videos/91458b58-07b8-4a0e-bbec-63fde300aebc
2025-08-21 00:45:57 - INFO - Prompt token length: 2296
2025-08-21 00:46:18 - INFO - Tokens per second: 11.854063880552362, Peak GPU memory MB: 5348.375
2025-08-21 00:46:18 - INFO - [91458b58-07b8-4a0e-bbec-63fde300aebc] Inference time: 28.38 seconds, CPU usage: 43.3%, CPU core utilization: [34.4, 74.4, 32.5, 31.9]
2025-08-21 00:46:18 - INFO - [91458b58-07b8-4a0e-bbec-63fde300aebc] Cleaned up temporary frame directory: temp_videos/91458b58-07b8-4a0e-bbec-63fde300aebc
2025-08-21 00:46:18 - INFO - [65b42141-20bf-4cf1-92b1-f29d846146ab] Received new video inference request. Prompt: 'Please describe the video in detail, only focus on customer and staff behavior and activities and do not overly describe the static scene.', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_002.mp4'
2025-08-21 00:46:18 - INFO - [65b42141-20bf-4cf1-92b1-f29d846146ab] Video saved to temporary file: temp_videos/65b42141-20bf-4cf1-92b1-f29d846146ab.mp4
2025-08-21 00:46:18 - INFO - [65b42141-20bf-4cf1-92b1-f29d846146ab] Extracting frames using method: uniform, rate/threshold: 30
2025-08-21 00:46:23 - INFO - [65b42141-20bf-4cf1-92b1-f29d846146ab] Extracted 30 frames successfully. Saving to temporary files...
2025-08-21 00:46:23 - INFO - [65b42141-20bf-4cf1-92b1-f29d846146ab] 30 frames saved to temp_videos/65b42141-20bf-4cf1-92b1-f29d846146ab
2025-08-21 00:46:23 - INFO - Prompt token length: 2296
2025-08-21 00:46:47 - INFO - Tokens per second: 11.997021386458192, Peak GPU memory MB: 5348.375
2025-08-21 00:46:47 - INFO - [65b42141-20bf-4cf1-92b1-f29d846146ab] Inference time: 29.08 seconds, CPU usage: 37.0%, CPU core utilization: [33.9, 16.1, 80.3, 17.7]
2025-08-21 00:46:47 - INFO - [65b42141-20bf-4cf1-92b1-f29d846146ab] Cleaned up temporary frame directory: temp_videos/65b42141-20bf-4cf1-92b1-f29d846146ab
2025-08-21 00:46:47 - INFO - [2ff4de72-4fa0-4759-9211-626a4f60c683] Received new video inference request. Prompt: 'Please describe the video in detail, only focus on customer and staff behavior and activities and do not overly describe the static scene.', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_003.mp4'
2025-08-21 00:46:47 - INFO - [2ff4de72-4fa0-4759-9211-626a4f60c683] Video saved to temporary file: temp_videos/2ff4de72-4fa0-4759-9211-626a4f60c683.mp4
2025-08-21 00:46:47 - INFO - [2ff4de72-4fa0-4759-9211-626a4f60c683] Extracting frames using method: uniform, rate/threshold: 30
2025-08-21 00:46:52 - INFO - [2ff4de72-4fa0-4759-9211-626a4f60c683] Extracted 30 frames successfully. Saving to temporary files...
2025-08-21 00:46:52 - INFO - [2ff4de72-4fa0-4759-9211-626a4f60c683] 30 frames saved to temp_videos/2ff4de72-4fa0-4759-9211-626a4f60c683
2025-08-21 00:46:52 - INFO - Prompt token length: 2296
2025-08-21 00:47:16 - INFO - Tokens per second: 12.037390307990146, Peak GPU memory MB: 5348.375
2025-08-21 00:47:16 - INFO - [2ff4de72-4fa0-4759-9211-626a4f60c683] Inference time: 29.02 seconds, CPU usage: 37.2%, CPU core utilization: [48.4, 16.9, 65.1, 18.1]
2025-08-21 00:47:16 - INFO - [2ff4de72-4fa0-4759-9211-626a4f60c683] Cleaned up temporary frame directory: temp_videos/2ff4de72-4fa0-4759-9211-626a4f60c683
2025-08-21 00:47:16 - INFO - [68a0b698-fcf0-4e8b-b0cb-e03797f97561] Received new video inference request. Prompt: 'Please describe the video in detail, only focus on customer and staff behavior and activities and do not overly describe the static scene.', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_004.mp4'
2025-08-21 00:47:16 - INFO - [68a0b698-fcf0-4e8b-b0cb-e03797f97561] Video saved to temporary file: temp_videos/68a0b698-fcf0-4e8b-b0cb-e03797f97561.mp4
2025-08-21 00:47:16 - INFO - [68a0b698-fcf0-4e8b-b0cb-e03797f97561] Extracting frames using method: uniform, rate/threshold: 30
2025-08-21 00:47:21 - INFO - [68a0b698-fcf0-4e8b-b0cb-e03797f97561] Extracted 30 frames successfully. Saving to temporary files...
2025-08-21 00:47:21 - INFO - [68a0b698-fcf0-4e8b-b0cb-e03797f97561] 30 frames saved to temp_videos/68a0b698-fcf0-4e8b-b0cb-e03797f97561
2025-08-21 00:47:21 - INFO - Prompt token length: 2296
2025-08-21 00:47:45 - INFO - Tokens per second: 12.027123899562989, Peak GPU memory MB: 5348.375
2025-08-21 00:47:45 - INFO - [68a0b698-fcf0-4e8b-b0cb-e03797f97561] Inference time: 29.08 seconds, CPU usage: 36.9%, CPU core utilization: [74.2, 17.8, 15.5, 40.0]
2025-08-21 00:47:45 - INFO - [68a0b698-fcf0-4e8b-b0cb-e03797f97561] Cleaned up temporary frame directory: temp_videos/68a0b698-fcf0-4e8b-b0cb-e03797f97561
2025-08-21 00:47:45 - INFO - [6b5a1c52-b835-40af-b34e-b1b24b36ca95] Received new video inference request. Prompt: 'Please describe the video in detail, only focus on customer and staff behavior and activities and do not overly describe the static scene.', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_005.mp4'
2025-08-21 00:47:45 - INFO - [6b5a1c52-b835-40af-b34e-b1b24b36ca95] Video saved to temporary file: temp_videos/6b5a1c52-b835-40af-b34e-b1b24b36ca95.mp4
2025-08-21 00:47:45 - INFO - [6b5a1c52-b835-40af-b34e-b1b24b36ca95] Extracting frames using method: uniform, rate/threshold: 30
2025-08-21 00:47:50 - INFO - [6b5a1c52-b835-40af-b34e-b1b24b36ca95] Extracted 30 frames successfully. Saving to temporary files...
2025-08-21 00:47:50 - INFO - [6b5a1c52-b835-40af-b34e-b1b24b36ca95] 30 frames saved to temp_videos/6b5a1c52-b835-40af-b34e-b1b24b36ca95
2025-08-21 00:47:50 - INFO - Prompt token length: 2296
2025-08-21 00:48:07 - INFO - Tokens per second: 11.998806395924422, Peak GPU memory MB: 5348.375
2025-08-21 00:48:07 - INFO - [6b5a1c52-b835-40af-b34e-b1b24b36ca95] Inference time: 21.52 seconds, CPU usage: 40.1%, CPU core utilization: [93.9, 22.7, 21.5, 22.1]
2025-08-21 00:48:07 - INFO - [6b5a1c52-b835-40af-b34e-b1b24b36ca95] Cleaned up temporary frame directory: temp_videos/6b5a1c52-b835-40af-b34e-b1b24b36ca95
2025-08-21 00:48:07 - INFO - [f6c17199-243f-477c-8b92-175e7d81c801] Received new video inference request. Prompt: 'Please describe the video in detail, only focus on customer and staff behavior and activities and do not overly describe the static scene.', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_006.mp4'
2025-08-21 00:48:07 - INFO - [f6c17199-243f-477c-8b92-175e7d81c801] Video saved to temporary file: temp_videos/f6c17199-243f-477c-8b92-175e7d81c801.mp4
2025-08-21 00:48:07 - INFO - [f6c17199-243f-477c-8b92-175e7d81c801] Extracting frames using method: uniform, rate/threshold: 30
2025-08-21 00:48:12 - INFO - [f6c17199-243f-477c-8b92-175e7d81c801] Extracted 30 frames successfully. Saving to temporary files...
2025-08-21 00:48:12 - INFO - [f6c17199-243f-477c-8b92-175e7d81c801] 30 frames saved to temp_videos/f6c17199-243f-477c-8b92-175e7d81c801
2025-08-21 00:48:12 - INFO - Prompt token length: 2296
2025-08-21 00:48:36 - INFO - Tokens per second: 12.045229786817497, Peak GPU memory MB: 5348.375
2025-08-21 00:48:36 - INFO - [f6c17199-243f-477c-8b92-175e7d81c801] Inference time: 29.18 seconds, CPU usage: 37.2%, CPU core utilization: [44.5, 28.8, 58.7, 16.6]
2025-08-21 00:48:36 - INFO - [f6c17199-243f-477c-8b92-175e7d81c801] Cleaned up temporary frame directory: temp_videos/f6c17199-243f-477c-8b92-175e7d81c801