2025-08-20 23:34:55 - INFO - Loading model: openbmb/MiniCPM-V-4
2025-08-20 23:34:56 - INFO - vision_config is None, using default vision config
2025-08-20 23:36:01 - INFO - Model loaded in 65.64 seconds
2025-08-20 23:36:01 - INFO - GPU Memory Usage after model load: 7802.99 MB
2025-08-20 23:36:30 - INFO - [5373d18b-66ad-4a38-b02b-dbac9e400b89] Received new video inference request. Prompt: 'Summarize the key events in this convenience store video. Focus only on the actions and interactions of the people. Avoid repetitive descriptions of the store's layout or shelves.', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_001.mp4'
2025-08-20 23:36:30 - INFO - [5373d18b-66ad-4a38-b02b-dbac9e400b89] Video saved to temporary file: temp_videos/5373d18b-66ad-4a38-b02b-dbac9e400b89.mp4
2025-08-20 23:36:30 - INFO - [5373d18b-66ad-4a38-b02b-dbac9e400b89] Extracting frames using method: uniform, rate/threshold: 30
2025-08-20 23:36:36 - INFO - [5373d18b-66ad-4a38-b02b-dbac9e400b89] Extracted 30 frames successfully. Saving to temporary files...
2025-08-20 23:36:36 - INFO - [5373d18b-66ad-4a38-b02b-dbac9e400b89] 30 frames saved to temp_videos/5373d18b-66ad-4a38-b02b-dbac9e400b89
2025-08-20 23:36:53 - INFO - vision_config is None, using default vision config
2025-08-20 23:37:12 - INFO - Tokens per second: 8.779788779248046, Peak GPU memory MB: 11824.375
2025-08-20 23:37:12 - INFO - [5373d18b-66ad-4a38-b02b-dbac9e400b89] Inference time: 41.20 seconds, CPU usage: 20.4%, CPU core utilization: [14.8, 25.6, 19.6, 21.6]
2025-08-20 23:37:12 - INFO - [5373d18b-66ad-4a38-b02b-dbac9e400b89] Cleaned up temporary frame directory: temp_videos/5373d18b-66ad-4a38-b02b-dbac9e400b89
2025-08-20 23:37:12 - INFO - [cbcd04dd-9447-4344-812e-e4387f036fea] Received new video inference request. Prompt: 'Summarize the key events in this convenience store video. Focus only on the actions and interactions of the people. Avoid repetitive descriptions of the store's layout or shelves.', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_002.mp4'
2025-08-20 23:37:12 - INFO - [cbcd04dd-9447-4344-812e-e4387f036fea] Video saved to temporary file: temp_videos/cbcd04dd-9447-4344-812e-e4387f036fea.mp4
2025-08-20 23:37:12 - INFO - [cbcd04dd-9447-4344-812e-e4387f036fea] Extracting frames using method: uniform, rate/threshold: 30
2025-08-20 23:37:17 - INFO - [cbcd04dd-9447-4344-812e-e4387f036fea] Extracted 30 frames successfully. Saving to temporary files...
2025-08-20 23:37:17 - INFO - [cbcd04dd-9447-4344-812e-e4387f036fea] 30 frames saved to temp_videos/cbcd04dd-9447-4344-812e-e4387f036fea
2025-08-20 23:37:29 - INFO - vision_config is None, using default vision config
2025-08-20 23:37:45 - INFO - Tokens per second: 8.238054802466655, Peak GPU memory MB: 11824.375
2025-08-20 23:37:45 - INFO - [cbcd04dd-9447-4344-812e-e4387f036fea] Inference time: 33.71 seconds, CPU usage: 35.8%, CPU core utilization: [31.6, 59.3, 28.2, 24.1]
2025-08-20 23:37:45 - INFO - [cbcd04dd-9447-4344-812e-e4387f036fea] Cleaned up temporary frame directory: temp_videos/cbcd04dd-9447-4344-812e-e4387f036fea
2025-08-20 23:37:45 - INFO - [4ca8090e-ea72-431d-aa92-756575a05665] Received new video inference request. Prompt: 'Summarize the key events in this convenience store video. Focus only on the actions and interactions of the people. Avoid repetitive descriptions of the store's layout or shelves.', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_003.mp4'
2025-08-20 23:37:45 - INFO - [4ca8090e-ea72-431d-aa92-756575a05665] Video saved to temporary file: temp_videos/4ca8090e-ea72-431d-aa92-756575a05665.mp4
2025-08-20 23:37:45 - INFO - [4ca8090e-ea72-431d-aa92-756575a05665] Extracting frames using method: uniform, rate/threshold: 30
2025-08-20 23:37:50 - INFO - [4ca8090e-ea72-431d-aa92-756575a05665] Extracted 30 frames successfully. Saving to temporary files...
2025-08-20 23:37:50 - INFO - [4ca8090e-ea72-431d-aa92-756575a05665] 30 frames saved to temp_videos/4ca8090e-ea72-431d-aa92-756575a05665
2025-08-20 23:38:03 - INFO - vision_config is None, using default vision config
2025-08-20 23:38:26 - INFO - Tokens per second: 9.842206491452542, Peak GPU memory MB: 11824.375
2025-08-20 23:38:26 - INFO - [4ca8090e-ea72-431d-aa92-756575a05665] Inference time: 40.18 seconds, CPU usage: 34.2%, CPU core utilization: [42.5, 14.8, 64.8, 14.6]
2025-08-20 23:38:26 - INFO - [4ca8090e-ea72-431d-aa92-756575a05665] Cleaned up temporary frame directory: temp_videos/4ca8090e-ea72-431d-aa92-756575a05665
2025-08-20 23:38:26 - INFO - [e7c9965e-02c8-4472-bc73-adb425fc488d] Received new video inference request. Prompt: 'Summarize the key events in this convenience store video. Focus only on the actions and interactions of the people. Avoid repetitive descriptions of the store's layout or shelves.', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_004.mp4'
2025-08-20 23:38:26 - INFO - [e7c9965e-02c8-4472-bc73-adb425fc488d] Video saved to temporary file: temp_videos/e7c9965e-02c8-4472-bc73-adb425fc488d.mp4
2025-08-20 23:38:26 - INFO - [e7c9965e-02c8-4472-bc73-adb425fc488d] Extracting frames using method: uniform, rate/threshold: 30
2025-08-20 23:38:30 - INFO - [e7c9965e-02c8-4472-bc73-adb425fc488d] Extracted 30 frames successfully. Saving to temporary files...
2025-08-20 23:38:30 - INFO - [e7c9965e-02c8-4472-bc73-adb425fc488d] 30 frames saved to temp_videos/e7c9965e-02c8-4472-bc73-adb425fc488d
2025-08-20 23:38:43 - INFO - vision_config is None, using default vision config
2025-08-20 23:38:57 - INFO - Tokens per second: 7.223886604016651, Peak GPU memory MB: 11824.375
2025-08-20 23:38:57 - INFO - [e7c9965e-02c8-4472-bc73-adb425fc488d] Inference time: 31.57 seconds, CPU usage: 36.1%, CPU core utilization: [35.9, 19.2, 72.5, 16.5]
2025-08-20 23:38:57 - INFO - [e7c9965e-02c8-4472-bc73-adb425fc488d] Cleaned up temporary frame directory: temp_videos/e7c9965e-02c8-4472-bc73-adb425fc488d
2025-08-20 23:38:57 - INFO - [42ed726f-5d5b-4a66-bebd-387f3916eea3] Received new video inference request. Prompt: 'Summarize the key events in this convenience store video. Focus only on the actions and interactions of the people. Avoid repetitive descriptions of the store's layout or shelves.', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_005.mp4'
2025-08-20 23:38:57 - INFO - [42ed726f-5d5b-4a66-bebd-387f3916eea3] Video saved to temporary file: temp_videos/42ed726f-5d5b-4a66-bebd-387f3916eea3.mp4
2025-08-20 23:38:57 - INFO - [42ed726f-5d5b-4a66-bebd-387f3916eea3] Extracting frames using method: uniform, rate/threshold: 30
2025-08-20 23:39:02 - INFO - [42ed726f-5d5b-4a66-bebd-387f3916eea3] Extracted 30 frames successfully. Saving to temporary files...
2025-08-20 23:39:02 - INFO - [42ed726f-5d5b-4a66-bebd-387f3916eea3] 30 frames saved to temp_videos/42ed726f-5d5b-4a66-bebd-387f3916eea3
2025-08-20 23:39:15 - INFO - vision_config is None, using default vision config
2025-08-20 23:39:35 - INFO - Tokens per second: 9.414676082668533, Peak GPU memory MB: 11824.375
2025-08-20 23:39:35 - INFO - [42ed726f-5d5b-4a66-bebd-387f3916eea3] Inference time: 38.23 seconds, CPU usage: 34.6%, CPU core utilization: [34.4, 30.3, 14.1, 59.5]
2025-08-20 23:39:35 - INFO - [42ed726f-5d5b-4a66-bebd-387f3916eea3] Cleaned up temporary frame directory: temp_videos/42ed726f-5d5b-4a66-bebd-387f3916eea3
2025-08-20 23:39:35 - INFO - [770b60c1-2ad5-4fc8-a042-36f7397d63c8] Received new video inference request. Prompt: 'Summarize the key events in this convenience store video. Focus only on the actions and interactions of the people. Avoid repetitive descriptions of the store's layout or shelves.', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_006.mp4'
2025-08-20 23:39:35 - INFO - [770b60c1-2ad5-4fc8-a042-36f7397d63c8] Video saved to temporary file: temp_videos/770b60c1-2ad5-4fc8-a042-36f7397d63c8.mp4
2025-08-20 23:39:35 - INFO - [770b60c1-2ad5-4fc8-a042-36f7397d63c8] Extracting frames using method: uniform, rate/threshold: 30
2025-08-20 23:39:40 - INFO - [770b60c1-2ad5-4fc8-a042-36f7397d63c8] Extracted 30 frames successfully. Saving to temporary files...
2025-08-20 23:39:40 - INFO - [770b60c1-2ad5-4fc8-a042-36f7397d63c8] 30 frames saved to temp_videos/770b60c1-2ad5-4fc8-a042-36f7397d63c8
2025-08-20 23:39:53 - INFO - vision_config is None, using default vision config
2025-08-20 23:40:12 - INFO - Tokens per second: 9.018116223758131, Peak GPU memory MB: 11824.375
2025-08-20 23:40:12 - INFO - [770b60c1-2ad5-4fc8-a042-36f7397d63c8] Inference time: 36.60 seconds, CPU usage: 35.1%, CPU core utilization: [47.3, 19.3, 54.1, 19.6]
2025-08-20 23:40:12 - INFO - [770b60c1-2ad5-4fc8-a042-36f7397d63c8] Cleaned up temporary frame directory: temp_videos/770b60c1-2ad5-4fc8-a042-36f7397d63c8
2025-08-20 23:40:12 - INFO - [979c0d57-545b-43cd-8601-b2ae280d1197] Received new video inference request. Prompt: 'Summarize the key events in this convenience store video. Focus only on the actions and interactions of the people. Avoid repetitive descriptions of the store's layout or shelves.', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_007.mp4'
2025-08-20 23:40:12 - INFO - [979c0d57-545b-43cd-8601-b2ae280d1197] Video saved to temporary file: temp_videos/979c0d57-545b-43cd-8601-b2ae280d1197.mp4
2025-08-20 23:40:12 - INFO - [979c0d57-545b-43cd-8601-b2ae280d1197] Extracting frames using method: uniform, rate/threshold: 30
2025-08-20 23:40:17 - INFO - [979c0d57-545b-43cd-8601-b2ae280d1197] Extracted 30 frames successfully. Saving to temporary files...
2025-08-20 23:40:17 - INFO - [979c0d57-545b-43cd-8601-b2ae280d1197] 30 frames saved to temp_videos/979c0d57-545b-43cd-8601-b2ae280d1197
2025-08-20 23:40:30 - INFO - vision_config is None, using default vision config
2025-08-20 23:40:46 - INFO - Tokens per second: 8.128884269772154, Peak GPU memory MB: 11824.375
2025-08-20 23:40:46 - INFO - [979c0d57-545b-43cd-8601-b2ae280d1197] Inference time: 33.74 seconds, CPU usage: 35.5%, CPU core utilization: [29.4, 48.5, 49.0, 15.1]
2025-08-20 23:40:46 - INFO - [979c0d57-545b-43cd-8601-b2ae280d1197] Cleaned up temporary frame directory: temp_videos/979c0d57-545b-43cd-8601-b2ae280d1197
2025-08-20 23:40:46 - INFO - [09341564-481c-4f01-b9ae-fd64f197af41] Received new video inference request. Prompt: 'Summarize the key events in this convenience store video. Focus only on the actions and interactions of the people. Avoid repetitive descriptions of the store's layout or shelves.', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_008.mp4'
2025-08-20 23:40:46 - INFO - [09341564-481c-4f01-b9ae-fd64f197af41] Video saved to temporary file: temp_videos/09341564-481c-4f01-b9ae-fd64f197af41.mp4
2025-08-20 23:40:46 - INFO - [09341564-481c-4f01-b9ae-fd64f197af41] Extracting frames using method: uniform, rate/threshold: 30
2025-08-20 23:40:51 - INFO - [09341564-481c-4f01-b9ae-fd64f197af41] Extracted 30 frames successfully. Saving to temporary files...
2025-08-20 23:40:51 - INFO - [09341564-481c-4f01-b9ae-fd64f197af41] 30 frames saved to temp_videos/09341564-481c-4f01-b9ae-fd64f197af41
2025-08-20 23:41:04 - INFO - vision_config is None, using default vision config
2025-08-20 23:41:20 - INFO - Tokens per second: 8.104748907769563, Peak GPU memory MB: 11824.375
2025-08-20 23:41:20 - INFO - [09341564-481c-4f01-b9ae-fd64f197af41] Inference time: 33.69 seconds, CPU usage: 35.7%, CPU core utilization: [22.5, 24.9, 45.2, 50.3]
2025-08-20 23:41:20 - INFO - [09341564-481c-4f01-b9ae-fd64f197af41] Cleaned up temporary frame directory: temp_videos/09341564-481c-4f01-b9ae-fd64f197af41
2025-08-20 23:41:20 - INFO - [152baa13-549f-4cd5-a9fd-b9caa193427c] Received new video inference request. Prompt: 'Summarize the key events in this convenience store video. Focus only on the actions and interactions of the people. Avoid repetitive descriptions of the store's layout or shelves.', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_009.mp4'
2025-08-20 23:41:20 - INFO - [152baa13-549f-4cd5-a9fd-b9caa193427c] Video saved to temporary file: temp_videos/152baa13-549f-4cd5-a9fd-b9caa193427c.mp4
2025-08-20 23:41:20 - INFO - [152baa13-549f-4cd5-a9fd-b9caa193427c] Extracting frames using method: uniform, rate/threshold: 30
2025-08-20 23:41:24 - INFO - [152baa13-549f-4cd5-a9fd-b9caa193427c] Extracted 30 frames successfully. Saving to temporary files...
2025-08-20 23:41:24 - INFO - [152baa13-549f-4cd5-a9fd-b9caa193427c] 30 frames saved to temp_videos/152baa13-549f-4cd5-a9fd-b9caa193427c
2025-08-20 23:41:37 - INFO - vision_config is None, using default vision config
2025-08-20 23:41:54 - INFO - Tokens per second: 8.260490846413186, Peak GPU memory MB: 11824.375
2025-08-20 23:41:54 - INFO - [152baa13-549f-4cd5-a9fd-b9caa193427c] Inference time: 34.09 seconds, CPU usage: 35.2%, CPU core utilization: [44.4, 20.5, 41.4, 34.6]
2025-08-20 23:41:54 - INFO - [152baa13-549f-4cd5-a9fd-b9caa193427c] Cleaned up temporary frame directory: temp_videos/152baa13-549f-4cd5-a9fd-b9caa193427c
2025-08-20 23:41:54 - INFO - [f16dda49-2af2-4b53-b80c-e152a2424314] Received new video inference request. Prompt: 'Summarize the key events in this convenience store video. Focus only on the actions and interactions of the people. Avoid repetitive descriptions of the store's layout or shelves.', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_010.mp4'
2025-08-20 23:41:54 - INFO - [f16dda49-2af2-4b53-b80c-e152a2424314] Video saved to temporary file: temp_videos/f16dda49-2af2-4b53-b80c-e152a2424314.mp4
2025-08-20 23:41:54 - INFO - [f16dda49-2af2-4b53-b80c-e152a2424314] Extracting frames using method: uniform, rate/threshold: 30
2025-08-20 23:41:58 - INFO - [f16dda49-2af2-4b53-b80c-e152a2424314] Extracted 30 frames successfully. Saving to temporary files...
2025-08-20 23:41:58 - INFO - [f16dda49-2af2-4b53-b80c-e152a2424314] 30 frames saved to temp_videos/f16dda49-2af2-4b53-b80c-e152a2424314
2025-08-20 23:42:11 - INFO - vision_config is None, using default vision config
2025-08-20 23:42:28 - INFO - Tokens per second: 8.474433175082025, Peak GPU memory MB: 11824.375
2025-08-20 23:42:28 - INFO - [f16dda49-2af2-4b53-b80c-e152a2424314] Inference time: 34.75 seconds, CPU usage: 35.3%, CPU core utilization: [40.2, 15.6, 70.7, 14.8]
2025-08-20 23:42:28 - INFO - [f16dda49-2af2-4b53-b80c-e152a2424314] Cleaned up temporary frame directory: temp_videos/f16dda49-2af2-4b53-b80c-e152a2424314
2025-08-20 23:42:28 - INFO - [4ad4941a-d5b6-4fef-bf4e-4ba8bf29ac55] Received new video inference request. Prompt: 'Summarize the key events in this convenience store video. Focus only on the actions and interactions of the people. Avoid repetitive descriptions of the store's layout or shelves.', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_011.mp4'
2025-08-20 23:42:28 - INFO - [4ad4941a-d5b6-4fef-bf4e-4ba8bf29ac55] Video saved to temporary file: temp_videos/4ad4941a-d5b6-4fef-bf4e-4ba8bf29ac55.mp4
2025-08-20 23:42:28 - INFO - [4ad4941a-d5b6-4fef-bf4e-4ba8bf29ac55] Extracting frames using method: uniform, rate/threshold: 30
2025-08-20 23:42:33 - INFO - [4ad4941a-d5b6-4fef-bf4e-4ba8bf29ac55] Extracted 30 frames successfully. Saving to temporary files...
2025-08-20 23:42:33 - INFO - [4ad4941a-d5b6-4fef-bf4e-4ba8bf29ac55] 30 frames saved to temp_videos/4ad4941a-d5b6-4fef-bf4e-4ba8bf29ac55
2025-08-20 23:42:46 - INFO - vision_config is None, using default vision config
2025-08-20 23:43:00 - INFO - Tokens per second: 7.281456041328239, Peak GPU memory MB: 11824.375
2025-08-20 23:43:00 - INFO - [4ad4941a-d5b6-4fef-bf4e-4ba8bf29ac55] Inference time: 31.78 seconds, CPU usage: 35.9%, CPU core utilization: [41.3, 59.3, 27.5, 15.5]
2025-08-20 23:43:00 - INFO - [4ad4941a-d5b6-4fef-bf4e-4ba8bf29ac55] Cleaned up temporary frame directory: temp_videos/4ad4941a-d5b6-4fef-bf4e-4ba8bf29ac55
2025-08-20 23:43:00 - INFO - [b91fc444-5706-4fc6-b7e3-97045044f1ad] Received new video inference request. Prompt: 'Summarize the key events in this convenience store video. Focus only on the actions and interactions of the people. Avoid repetitive descriptions of the store's layout or shelves.', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_012.mp4'
2025-08-20 23:43:00 - INFO - [b91fc444-5706-4fc6-b7e3-97045044f1ad] Video saved to temporary file: temp_videos/b91fc444-5706-4fc6-b7e3-97045044f1ad.mp4
2025-08-20 23:43:00 - INFO - [b91fc444-5706-4fc6-b7e3-97045044f1ad] Extracting frames using method: uniform, rate/threshold: 30
2025-08-20 23:43:05 - INFO - [b91fc444-5706-4fc6-b7e3-97045044f1ad] Extracted 30 frames successfully. Saving to temporary files...
2025-08-20 23:43:05 - INFO - [b91fc444-5706-4fc6-b7e3-97045044f1ad] 30 frames saved to temp_videos/b91fc444-5706-4fc6-b7e3-97045044f1ad
2025-08-20 23:43:18 - INFO - vision_config is None, using default vision config
2025-08-20 23:43:34 - INFO - Tokens per second: 8.229193657182282, Peak GPU memory MB: 11824.375
2025-08-20 23:43:34 - INFO - [b91fc444-5706-4fc6-b7e3-97045044f1ad] Inference time: 34.07 seconds, CPU usage: 35.4%, CPU core utilization: [17.3, 58.4, 14.4, 51.7]
2025-08-20 23:43:34 - INFO - [b91fc444-5706-4fc6-b7e3-97045044f1ad] Cleaned up temporary frame directory: temp_videos/b91fc444-5706-4fc6-b7e3-97045044f1ad
2025-08-20 23:43:34 - INFO - [8470205f-d443-4e1d-9371-99a2ab5f76b6] Received new video inference request. Prompt: 'Summarize the key events in this convenience store video. Focus only on the actions and interactions of the people. Avoid repetitive descriptions of the store's layout or shelves.', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_013.mp4'
2025-08-20 23:43:34 - INFO - [8470205f-d443-4e1d-9371-99a2ab5f76b6] Video saved to temporary file: temp_videos/8470205f-d443-4e1d-9371-99a2ab5f76b6.mp4
2025-08-20 23:43:34 - INFO - [8470205f-d443-4e1d-9371-99a2ab5f76b6] Extracting frames using method: uniform, rate/threshold: 30
2025-08-20 23:43:39 - INFO - [8470205f-d443-4e1d-9371-99a2ab5f76b6] Extracted 30 frames successfully. Saving to temporary files...
2025-08-20 23:43:39 - INFO - [8470205f-d443-4e1d-9371-99a2ab5f76b6] 30 frames saved to temp_videos/8470205f-d443-4e1d-9371-99a2ab5f76b6
2025-08-20 23:43:52 - INFO - vision_config is None, using default vision config
2025-08-20 23:44:08 - INFO - Tokens per second: 7.933309315527597, Peak GPU memory MB: 11824.375
2025-08-20 23:44:08 - INFO - [8470205f-d443-4e1d-9371-99a2ab5f76b6] Inference time: 33.32 seconds, CPU usage: 35.5%, CPU core utilization: [59.0, 27.1, 15.4, 40.4]
2025-08-20 23:44:08 - INFO - [8470205f-d443-4e1d-9371-99a2ab5f76b6] Cleaned up temporary frame directory: temp_videos/8470205f-d443-4e1d-9371-99a2ab5f76b6
2025-08-20 23:44:08 - INFO - [87989d50-7108-4eea-a2e2-3ac68ce850b0] Received new video inference request. Prompt: 'Summarize the key events in this convenience store video. Focus only on the actions and interactions of the people. Avoid repetitive descriptions of the store's layout or shelves.', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_014.mp4'
2025-08-20 23:44:08 - INFO - [87989d50-7108-4eea-a2e2-3ac68ce850b0] Video saved to temporary file: temp_videos/87989d50-7108-4eea-a2e2-3ac68ce850b0.mp4
2025-08-20 23:44:08 - INFO - [87989d50-7108-4eea-a2e2-3ac68ce850b0] Extracting frames using method: uniform, rate/threshold: 30
2025-08-20 23:44:13 - INFO - [87989d50-7108-4eea-a2e2-3ac68ce850b0] Extracted 30 frames successfully. Saving to temporary files...
2025-08-20 23:44:13 - INFO - [87989d50-7108-4eea-a2e2-3ac68ce850b0] 30 frames saved to temp_videos/87989d50-7108-4eea-a2e2-3ac68ce850b0
2025-08-20 23:44:25 - INFO - vision_config is None, using default vision config
2025-08-20 23:44:47 - INFO - Tokens per second: 9.6318698479325, Peak GPU memory MB: 11824.375
2025-08-20 23:44:47 - INFO - [87989d50-7108-4eea-a2e2-3ac68ce850b0] Inference time: 39.46 seconds, CPU usage: 34.5%, CPU core utilization: [46.5, 21.2, 28.6, 41.7]
2025-08-20 23:44:47 - INFO - [87989d50-7108-4eea-a2e2-3ac68ce850b0] Cleaned up temporary frame directory: temp_videos/87989d50-7108-4eea-a2e2-3ac68ce850b0
2025-08-20 23:44:47 - INFO - [2e1a26e8-2e97-4635-9479-94580fac857d] Received new video inference request. Prompt: 'Summarize the key events in this convenience store video. Focus only on the actions and interactions of the people. Avoid repetitive descriptions of the store's layout or shelves.', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_015.mp4'
2025-08-20 23:44:47 - INFO - [2e1a26e8-2e97-4635-9479-94580fac857d] Video saved to temporary file: temp_videos/2e1a26e8-2e97-4635-9479-94580fac857d.mp4
2025-08-20 23:44:47 - INFO - [2e1a26e8-2e97-4635-9479-94580fac857d] Extracting frames using method: uniform, rate/threshold: 30
2025-08-20 23:44:52 - INFO - [2e1a26e8-2e97-4635-9479-94580fac857d] Extracted 30 frames successfully. Saving to temporary files...
2025-08-20 23:44:52 - INFO - [2e1a26e8-2e97-4635-9479-94580fac857d] 30 frames saved to temp_videos/2e1a26e8-2e97-4635-9479-94580fac857d
2025-08-20 23:45:05 - INFO - vision_config is None, using default vision config
2025-08-20 23:45:19 - INFO - Tokens per second: 7.41501152980709, Peak GPU memory MB: 11824.375
2025-08-20 23:45:19 - INFO - [2e1a26e8-2e97-4635-9479-94580fac857d] Inference time: 32.04 seconds, CPU usage: 35.9%, CPU core utilization: [44.7, 31.3, 28.4, 39.1]
2025-08-20 23:45:19 - INFO - [2e1a26e8-2e97-4635-9479-94580fac857d] Cleaned up temporary frame directory: temp_videos/2e1a26e8-2e97-4635-9479-94580fac857d
2025-08-20 23:45:19 - INFO - [e846a73a-a898-4957-8799-5b00b759bd1c] Received new video inference request. Prompt: 'Summarize the key events in this convenience store video. Focus only on the actions and interactions of the people. Avoid repetitive descriptions of the store's layout or shelves.', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_016.mp4'
2025-08-20 23:45:19 - INFO - [e846a73a-a898-4957-8799-5b00b759bd1c] Video saved to temporary file: temp_videos/e846a73a-a898-4957-8799-5b00b759bd1c.mp4
2025-08-20 23:45:19 - INFO - [e846a73a-a898-4957-8799-5b00b759bd1c] Extracting frames using method: uniform, rate/threshold: 30
2025-08-20 23:45:24 - INFO - [e846a73a-a898-4957-8799-5b00b759bd1c] Extracted 30 frames successfully. Saving to temporary files...
2025-08-20 23:45:24 - INFO - [e846a73a-a898-4957-8799-5b00b759bd1c] 30 frames saved to temp_videos/e846a73a-a898-4957-8799-5b00b759bd1c
2025-08-20 23:45:37 - INFO - vision_config is None, using default vision config
2025-08-20 23:45:57 - INFO - Tokens per second: 9.31005578568323, Peak GPU memory MB: 11824.375
2025-08-20 23:45:57 - INFO - [e846a73a-a898-4957-8799-5b00b759bd1c] Inference time: 37.84 seconds, CPU usage: 34.4%, CPU core utilization: [26.5, 43.5, 51.5, 16.2]
2025-08-20 23:45:57 - INFO - [e846a73a-a898-4957-8799-5b00b759bd1c] Cleaned up temporary frame directory: temp_videos/e846a73a-a898-4957-8799-5b00b759bd1c
2025-08-20 23:45:57 - INFO - [e95f483f-4174-421b-b85c-177358e58486] Received new video inference request. Prompt: 'Summarize the key events in this convenience store video. Focus only on the actions and interactions of the people. Avoid repetitive descriptions of the store's layout or shelves.', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_017.mp4'
2025-08-20 23:45:57 - INFO - [e95f483f-4174-421b-b85c-177358e58486] Video saved to temporary file: temp_videos/e95f483f-4174-421b-b85c-177358e58486.mp4
2025-08-20 23:45:57 - INFO - [e95f483f-4174-421b-b85c-177358e58486] Extracting frames using method: uniform, rate/threshold: 30
2025-08-20 23:46:02 - INFO - [e95f483f-4174-421b-b85c-177358e58486] Extracted 30 frames successfully. Saving to temporary files...
2025-08-20 23:46:02 - INFO - [e95f483f-4174-421b-b85c-177358e58486] 30 frames saved to temp_videos/e95f483f-4174-421b-b85c-177358e58486
2025-08-20 23:46:15 - INFO - vision_config is None, using default vision config
2025-08-20 23:46:29 - INFO - Tokens per second: 7.603712717977961, Peak GPU memory MB: 11824.375
2025-08-20 23:46:29 - INFO - [e95f483f-4174-421b-b85c-177358e58486] Inference time: 32.46 seconds, CPU usage: 36.2%, CPU core utilization: [34.5, 20.6, 74.3, 15.2]
2025-08-20 23:46:29 - INFO - [e95f483f-4174-421b-b85c-177358e58486] Cleaned up temporary frame directory: temp_videos/e95f483f-4174-421b-b85c-177358e58486
2025-08-20 23:46:29 - INFO - [0fae1f1a-6dfb-4b0e-aaa9-b36a4e6ab08b] Received new video inference request. Prompt: 'Summarize the key events in this convenience store video. Focus only on the actions and interactions of the people. Avoid repetitive descriptions of the store's layout or shelves.', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_018.mp4'
2025-08-20 23:46:29 - INFO - [0fae1f1a-6dfb-4b0e-aaa9-b36a4e6ab08b] Video saved to temporary file: temp_videos/0fae1f1a-6dfb-4b0e-aaa9-b36a4e6ab08b.mp4
2025-08-20 23:46:29 - INFO - [0fae1f1a-6dfb-4b0e-aaa9-b36a4e6ab08b] Extracting frames using method: uniform, rate/threshold: 30
2025-08-20 23:46:37 - INFO - [0fae1f1a-6dfb-4b0e-aaa9-b36a4e6ab08b] Extracted 30 frames successfully. Saving to temporary files...
2025-08-20 23:46:37 - INFO - [0fae1f1a-6dfb-4b0e-aaa9-b36a4e6ab08b] 30 frames saved to temp_videos/0fae1f1a-6dfb-4b0e-aaa9-b36a4e6ab08b
2025-08-20 23:46:50 - INFO - vision_config is None, using default vision config
2025-08-20 23:47:11 - INFO - Tokens per second: 9.605931622305029, Peak GPU memory MB: 11824.375
2025-08-20 23:47:11 - INFO - [0fae1f1a-6dfb-4b0e-aaa9-b36a4e6ab08b] Inference time: 41.81 seconds, CPU usage: 53.0%, CPU core utilization: [47.2, 51.0, 44.8, 69.2]
2025-08-20 23:47:11 - INFO - [0fae1f1a-6dfb-4b0e-aaa9-b36a4e6ab08b] Cleaned up temporary frame directory: temp_videos/0fae1f1a-6dfb-4b0e-aaa9-b36a4e6ab08b
2025-08-20 23:47:11 - INFO - [1317b652-49c6-45bf-bbb5-329fa9fd9572] Received new video inference request. Prompt: 'Summarize the key events in this convenience store video. Focus only on the actions and interactions of the people. Avoid repetitive descriptions of the store's layout or shelves.', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_019.mp4'
2025-08-20 23:47:11 - INFO - [1317b652-49c6-45bf-bbb5-329fa9fd9572] Video saved to temporary file: temp_videos/1317b652-49c6-45bf-bbb5-329fa9fd9572.mp4
2025-08-20 23:47:11 - INFO - [1317b652-49c6-45bf-bbb5-329fa9fd9572] Extracting frames using method: uniform, rate/threshold: 30
2025-08-20 23:47:16 - INFO - [1317b652-49c6-45bf-bbb5-329fa9fd9572] Extracted 30 frames successfully. Saving to temporary files...
2025-08-20 23:47:16 - INFO - [1317b652-49c6-45bf-bbb5-329fa9fd9572] 30 frames saved to temp_videos/1317b652-49c6-45bf-bbb5-329fa9fd9572
2025-08-20 23:47:29 - INFO - vision_config is None, using default vision config
2025-08-20 23:47:42 - INFO - Tokens per second: 6.553943613195835, Peak GPU memory MB: 11824.375
2025-08-20 23:47:42 - INFO - [1317b652-49c6-45bf-bbb5-329fa9fd9572] Inference time: 30.42 seconds, CPU usage: 36.3%, CPU core utilization: [28.7, 56.6, 44.1, 15.8]
2025-08-20 23:47:42 - INFO - [1317b652-49c6-45bf-bbb5-329fa9fd9572] Cleaned up temporary frame directory: temp_videos/1317b652-49c6-45bf-bbb5-329fa9fd9572
2025-08-20 23:47:42 - INFO - [dc27d7c0-3dad-49d5-aa0b-79cc072fe0ef] Received new video inference request. Prompt: 'Summarize the key events in this convenience store video. Focus only on the actions and interactions of the people. Avoid repetitive descriptions of the store's layout or shelves.', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_020.mp4'
2025-08-20 23:47:42 - INFO - [dc27d7c0-3dad-49d5-aa0b-79cc072fe0ef] Video saved to temporary file: temp_videos/dc27d7c0-3dad-49d5-aa0b-79cc072fe0ef.mp4
2025-08-20 23:47:42 - INFO - [dc27d7c0-3dad-49d5-aa0b-79cc072fe0ef] Extracting frames using method: uniform, rate/threshold: 30
2025-08-20 23:47:47 - INFO - [dc27d7c0-3dad-49d5-aa0b-79cc072fe0ef] Extracted 30 frames successfully. Saving to temporary files...
2025-08-20 23:47:47 - INFO - [dc27d7c0-3dad-49d5-aa0b-79cc072fe0ef] 30 frames saved to temp_videos/dc27d7c0-3dad-49d5-aa0b-79cc072fe0ef
2025-08-20 23:48:00 - INFO - vision_config is None, using default vision config
2025-08-20 23:48:15 - INFO - Tokens per second: 7.993319182461253, Peak GPU memory MB: 11824.375
2025-08-20 23:48:15 - INFO - [dc27d7c0-3dad-49d5-aa0b-79cc072fe0ef] Inference time: 33.46 seconds, CPU usage: 35.9%, CPU core utilization: [55.1, 38.2, 24.9, 25.4]
2025-08-20 23:48:15 - INFO - [dc27d7c0-3dad-49d5-aa0b-79cc072fe0ef] Cleaned up temporary frame directory: temp_videos/dc27d7c0-3dad-49d5-aa0b-79cc072fe0ef
2025-08-20 23:48:15 - INFO - [2d2b338a-5383-48df-a50c-a5469d4bf2a2] Received new video inference request. Prompt: 'Summarize the key events in this convenience store video. Focus only on the actions and interactions of the people. Avoid repetitive descriptions of the store's layout or shelves.', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_021.mp4'
2025-08-20 23:48:15 - INFO - [2d2b338a-5383-48df-a50c-a5469d4bf2a2] Video saved to temporary file: temp_videos/2d2b338a-5383-48df-a50c-a5469d4bf2a2.mp4
2025-08-20 23:48:15 - INFO - [2d2b338a-5383-48df-a50c-a5469d4bf2a2] Extracting frames using method: uniform, rate/threshold: 30
2025-08-20 23:48:20 - INFO - [2d2b338a-5383-48df-a50c-a5469d4bf2a2] Extracted 30 frames successfully. Saving to temporary files...
2025-08-20 23:48:20 - INFO - [2d2b338a-5383-48df-a50c-a5469d4bf2a2] 30 frames saved to temp_videos/2d2b338a-5383-48df-a50c-a5469d4bf2a2
2025-08-20 23:48:33 - INFO - vision_config is None, using default vision config
2025-08-20 23:48:50 - INFO - Tokens per second: 8.475114910409703, Peak GPU memory MB: 11824.375
2025-08-20 23:48:50 - INFO - [2d2b338a-5383-48df-a50c-a5469d4bf2a2] Inference time: 34.69 seconds, CPU usage: 35.2%, CPU core utilization: [23.3, 27.9, 41.5, 48.1]
2025-08-20 23:48:50 - INFO - [2d2b338a-5383-48df-a50c-a5469d4bf2a2] Cleaned up temporary frame directory: temp_videos/2d2b338a-5383-48df-a50c-a5469d4bf2a2
2025-08-20 23:48:50 - INFO - [91c7b79b-21ef-4745-a10e-22a03a1abd0c] Received new video inference request. Prompt: 'Summarize the key events in this convenience store video. Focus only on the actions and interactions of the people. Avoid repetitive descriptions of the store's layout or shelves.', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_022.mp4'
2025-08-20 23:48:50 - INFO - [91c7b79b-21ef-4745-a10e-22a03a1abd0c] Video saved to temporary file: temp_videos/91c7b79b-21ef-4745-a10e-22a03a1abd0c.mp4
2025-08-20 23:48:50 - INFO - [91c7b79b-21ef-4745-a10e-22a03a1abd0c] Extracting frames using method: uniform, rate/threshold: 30
2025-08-20 23:48:55 - INFO - [91c7b79b-21ef-4745-a10e-22a03a1abd0c] Extracted 30 frames successfully. Saving to temporary files...
2025-08-20 23:48:55 - INFO - [91c7b79b-21ef-4745-a10e-22a03a1abd0c] 30 frames saved to temp_videos/91c7b79b-21ef-4745-a10e-22a03a1abd0c
2025-08-20 23:49:08 - INFO - vision_config is None, using default vision config
2025-08-20 23:49:23 - INFO - Tokens per second: 7.968927639160867, Peak GPU memory MB: 11824.375
2025-08-20 23:49:23 - INFO - [91c7b79b-21ef-4745-a10e-22a03a1abd0c] Inference time: 33.28 seconds, CPU usage: 35.7%, CPU core utilization: [63.9, 25.4, 38.6, 14.6]
2025-08-20 23:49:23 - INFO - [91c7b79b-21ef-4745-a10e-22a03a1abd0c] Cleaned up temporary frame directory: temp_videos/91c7b79b-21ef-4745-a10e-22a03a1abd0c
2025-08-20 23:49:23 - INFO - [5139cda6-0da8-43e9-8f70-5adf1590238b] Received new video inference request. Prompt: 'Summarize the key events in this convenience store video. Focus only on the actions and interactions of the people. Avoid repetitive descriptions of the store's layout or shelves.', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_023.mp4'
2025-08-20 23:49:23 - INFO - [5139cda6-0da8-43e9-8f70-5adf1590238b] Video saved to temporary file: temp_videos/5139cda6-0da8-43e9-8f70-5adf1590238b.mp4
2025-08-20 23:49:23 - INFO - [5139cda6-0da8-43e9-8f70-5adf1590238b] Extracting frames using method: uniform, rate/threshold: 30
2025-08-20 23:49:28 - INFO - [5139cda6-0da8-43e9-8f70-5adf1590238b] Extracted 30 frames successfully. Saving to temporary files...
2025-08-20 23:49:28 - INFO - [5139cda6-0da8-43e9-8f70-5adf1590238b] 30 frames saved to temp_videos/5139cda6-0da8-43e9-8f70-5adf1590238b
2025-08-20 23:49:41 - INFO - vision_config is None, using default vision config
2025-08-20 23:49:55 - INFO - Tokens per second: 7.454720470818721, Peak GPU memory MB: 11824.375
2025-08-20 23:49:55 - INFO - [5139cda6-0da8-43e9-8f70-5adf1590238b] Inference time: 32.14 seconds, CPU usage: 35.9%, CPU core utilization: [54.3, 22.4, 50.0, 17.0]
2025-08-20 23:49:55 - INFO - [5139cda6-0da8-43e9-8f70-5adf1590238b] Cleaned up temporary frame directory: temp_videos/5139cda6-0da8-43e9-8f70-5adf1590238b
2025-08-20 23:49:55 - INFO - [b92496ee-5a36-490d-aadb-2f06439d7995] Received new video inference request. Prompt: 'Summarize the key events in this convenience store video. Focus only on the actions and interactions of the people. Avoid repetitive descriptions of the store's layout or shelves.', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_024.mp4'
2025-08-20 23:49:55 - INFO - [b92496ee-5a36-490d-aadb-2f06439d7995] Video saved to temporary file: temp_videos/b92496ee-5a36-490d-aadb-2f06439d7995.mp4
2025-08-20 23:49:55 - INFO - [b92496ee-5a36-490d-aadb-2f06439d7995] Extracting frames using method: uniform, rate/threshold: 30
2025-08-20 23:50:00 - INFO - [b92496ee-5a36-490d-aadb-2f06439d7995] Extracted 30 frames successfully. Saving to temporary files...
2025-08-20 23:50:00 - INFO - [b92496ee-5a36-490d-aadb-2f06439d7995] 30 frames saved to temp_videos/b92496ee-5a36-490d-aadb-2f06439d7995
2025-08-20 23:50:13 - INFO - vision_config is None, using default vision config
2025-08-20 23:50:29 - INFO - Tokens per second: 8.25678903581856, Peak GPU memory MB: 11824.375
2025-08-20 23:50:29 - INFO - [b92496ee-5a36-490d-aadb-2f06439d7995] Inference time: 34.09 seconds, CPU usage: 35.5%, CPU core utilization: [51.5, 45.5, 16.6, 28.4]
2025-08-20 23:50:29 - INFO - [b92496ee-5a36-490d-aadb-2f06439d7995] Cleaned up temporary frame directory: temp_videos/b92496ee-5a36-490d-aadb-2f06439d7995
2025-08-20 23:50:29 - INFO - [4ecf6faf-4096-491c-8670-2c39714858dc] Received new video inference request. Prompt: 'Summarize the key events in this convenience store video. Focus only on the actions and interactions of the people. Avoid repetitive descriptions of the store's layout or shelves.', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_025.mp4'
2025-08-20 23:50:29 - INFO - [4ecf6faf-4096-491c-8670-2c39714858dc] Video saved to temporary file: temp_videos/4ecf6faf-4096-491c-8670-2c39714858dc.mp4
2025-08-20 23:50:29 - INFO - [4ecf6faf-4096-491c-8670-2c39714858dc] Extracting frames using method: uniform, rate/threshold: 30
2025-08-20 23:50:34 - INFO - [4ecf6faf-4096-491c-8670-2c39714858dc] Extracted 30 frames successfully. Saving to temporary files...
2025-08-20 23:50:34 - INFO - [4ecf6faf-4096-491c-8670-2c39714858dc] 30 frames saved to temp_videos/4ecf6faf-4096-491c-8670-2c39714858dc
2025-08-20 23:50:47 - INFO - vision_config is None, using default vision config
2025-08-20 23:51:17 - INFO - Tokens per second: 10.736432842240825, Peak GPU memory MB: 11824.375
2025-08-20 23:51:17 - INFO - [4ecf6faf-4096-491c-8670-2c39714858dc] Inference time: 47.95 seconds, CPU usage: 33.5%, CPU core utilization: [38.6, 16.2, 63.7, 15.3]
2025-08-20 23:51:17 - INFO - [4ecf6faf-4096-491c-8670-2c39714858dc] Cleaned up temporary frame directory: temp_videos/4ecf6faf-4096-491c-8670-2c39714858dc
2025-08-20 23:54:07 - INFO - [380eb612-eca5-49dd-b2d1-bbdeb5ecf54c] Received new video inference request. Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_001.mp4'
2025-08-20 23:54:07 - INFO - [380eb612-eca5-49dd-b2d1-bbdeb5ecf54c] Video saved to temporary file: temp_videos/380eb612-eca5-49dd-b2d1-bbdeb5ecf54c.mp4
2025-08-20 23:54:07 - INFO - [380eb612-eca5-49dd-b2d1-bbdeb5ecf54c] Extracting frames using method: uniform, rate/threshold: 30
2025-08-20 23:54:12 - INFO - [380eb612-eca5-49dd-b2d1-bbdeb5ecf54c] Extracted 30 frames successfully. Saving to temporary files...
2025-08-20 23:54:12 - INFO - [380eb612-eca5-49dd-b2d1-bbdeb5ecf54c] 30 frames saved to temp_videos/380eb612-eca5-49dd-b2d1-bbdeb5ecf54c
2025-08-20 23:54:25 - INFO - vision_config is None, using default vision config
2025-08-20 23:54:37 - INFO - Tokens per second: 6.4951065299349615, Peak GPU memory MB: 11824.375
2025-08-20 23:54:37 - INFO - [380eb612-eca5-49dd-b2d1-bbdeb5ecf54c] Inference time: 29.85 seconds, CPU usage: 6.8%, CPU core utilization: [6.3, 7.3, 10.0, 3.6]
2025-08-20 23:54:37 - INFO - [380eb612-eca5-49dd-b2d1-bbdeb5ecf54c] Cleaned up temporary frame directory: temp_videos/380eb612-eca5-49dd-b2d1-bbdeb5ecf54c
2025-08-20 23:54:37 - INFO - [aee27771-15f4-4e39-89c7-953aaf2d6435] Received new video inference request. Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_002.mp4'
2025-08-20 23:54:37 - INFO - [aee27771-15f4-4e39-89c7-953aaf2d6435] Video saved to temporary file: temp_videos/aee27771-15f4-4e39-89c7-953aaf2d6435.mp4
2025-08-20 23:54:37 - INFO - [aee27771-15f4-4e39-89c7-953aaf2d6435] Extracting frames using method: uniform, rate/threshold: 30
2025-08-20 23:54:42 - INFO - [aee27771-15f4-4e39-89c7-953aaf2d6435] Extracted 30 frames successfully. Saving to temporary files...
2025-08-20 23:54:42 - INFO - [aee27771-15f4-4e39-89c7-953aaf2d6435] 30 frames saved to temp_videos/aee27771-15f4-4e39-89c7-953aaf2d6435
2025-08-20 23:54:55 - INFO - vision_config is None, using default vision config
2025-08-20 23:55:04 - INFO - Tokens per second: 4.620365146287069, Peak GPU memory MB: 11824.375
2025-08-20 23:55:04 - INFO - [aee27771-15f4-4e39-89c7-953aaf2d6435] Inference time: 27.55 seconds, CPU usage: 37.7%, CPU core utilization: [23.5, 33.0, 45.6, 48.7]
2025-08-20 23:55:04 - INFO - [aee27771-15f4-4e39-89c7-953aaf2d6435] Cleaned up temporary frame directory: temp_videos/aee27771-15f4-4e39-89c7-953aaf2d6435
2025-08-20 23:55:04 - INFO - [33034474-6cc6-44ea-bf4e-595b33e0a842] Received new video inference request. Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_003.mp4'
2025-08-20 23:55:04 - INFO - [33034474-6cc6-44ea-bf4e-595b33e0a842] Video saved to temporary file: temp_videos/33034474-6cc6-44ea-bf4e-595b33e0a842.mp4
2025-08-20 23:55:04 - INFO - [33034474-6cc6-44ea-bf4e-595b33e0a842] Extracting frames using method: uniform, rate/threshold: 30
2025-08-20 23:55:09 - INFO - [33034474-6cc6-44ea-bf4e-595b33e0a842] Extracted 30 frames successfully. Saving to temporary files...
2025-08-20 23:55:09 - INFO - [33034474-6cc6-44ea-bf4e-595b33e0a842] 30 frames saved to temp_videos/33034474-6cc6-44ea-bf4e-595b33e0a842
2025-08-20 23:55:22 - INFO - vision_config is None, using default vision config
2025-08-20 23:55:35 - INFO - Tokens per second: 6.8004162288880075, Peak GPU memory MB: 11824.375
2025-08-20 23:55:35 - INFO - [33034474-6cc6-44ea-bf4e-595b33e0a842] Inference time: 30.57 seconds, CPU usage: 36.6%, CPU core utilization: [23.5, 34.8, 48.2, 39.9]
2025-08-20 23:55:35 - INFO - [33034474-6cc6-44ea-bf4e-595b33e0a842] Cleaned up temporary frame directory: temp_videos/33034474-6cc6-44ea-bf4e-595b33e0a842
2025-08-20 23:55:35 - INFO - [08e94b4d-f36d-44c5-b12a-d657bad86595] Received new video inference request. Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_004.mp4'
2025-08-20 23:55:35 - INFO - [08e94b4d-f36d-44c5-b12a-d657bad86595] Video saved to temporary file: temp_videos/08e94b4d-f36d-44c5-b12a-d657bad86595.mp4
2025-08-20 23:55:35 - INFO - [08e94b4d-f36d-44c5-b12a-d657bad86595] Extracting frames using method: uniform, rate/threshold: 30
2025-08-20 23:55:40 - INFO - [08e94b4d-f36d-44c5-b12a-d657bad86595] Extracted 30 frames successfully. Saving to temporary files...
2025-08-20 23:55:40 - INFO - [08e94b4d-f36d-44c5-b12a-d657bad86595] 30 frames saved to temp_videos/08e94b4d-f36d-44c5-b12a-d657bad86595
2025-08-20 23:55:53 - INFO - vision_config is None, using default vision config
2025-08-20 23:56:04 - INFO - Tokens per second: 5.72115634822065, Peak GPU memory MB: 11824.375
2025-08-20 23:56:04 - INFO - [08e94b4d-f36d-44c5-b12a-d657bad86595] Inference time: 29.03 seconds, CPU usage: 36.6%, CPU core utilization: [18.2, 59.9, 52.6, 16.0]
2025-08-20 23:56:04 - INFO - [08e94b4d-f36d-44c5-b12a-d657bad86595] Cleaned up temporary frame directory: temp_videos/08e94b4d-f36d-44c5-b12a-d657bad86595
2025-08-20 23:56:04 - INFO - [aeba9bb8-25b3-44c0-8827-e08f9da881f5] Received new video inference request. Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_005.mp4'
2025-08-20 23:56:04 - INFO - [aeba9bb8-25b3-44c0-8827-e08f9da881f5] Video saved to temporary file: temp_videos/aeba9bb8-25b3-44c0-8827-e08f9da881f5.mp4
2025-08-20 23:56:04 - INFO - [aeba9bb8-25b3-44c0-8827-e08f9da881f5] Extracting frames using method: uniform, rate/threshold: 30
2025-08-20 23:56:09 - INFO - [aeba9bb8-25b3-44c0-8827-e08f9da881f5] Extracted 30 frames successfully. Saving to temporary files...
2025-08-20 23:56:09 - INFO - [aeba9bb8-25b3-44c0-8827-e08f9da881f5] 30 frames saved to temp_videos/aeba9bb8-25b3-44c0-8827-e08f9da881f5
2025-08-20 23:56:22 - INFO - vision_config is None, using default vision config
2025-08-20 23:56:32 - INFO - Tokens per second: 4.8389257523545774, Peak GPU memory MB: 11824.375
2025-08-20 23:56:32 - INFO - [aeba9bb8-25b3-44c0-8827-e08f9da881f5] Inference time: 28.02 seconds, CPU usage: 37.2%, CPU core utilization: [18.8, 20.2, 51.9, 57.8]
2025-08-20 23:56:32 - INFO - [aeba9bb8-25b3-44c0-8827-e08f9da881f5] Cleaned up temporary frame directory: temp_videos/aeba9bb8-25b3-44c0-8827-e08f9da881f5
2025-08-20 23:56:32 - INFO - [f59d2020-42d3-4106-ba4a-822cd7a913b9] Received new video inference request. Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_006.mp4'
2025-08-20 23:56:32 - INFO - [f59d2020-42d3-4106-ba4a-822cd7a913b9] Video saved to temporary file: temp_videos/f59d2020-42d3-4106-ba4a-822cd7a913b9.mp4
2025-08-20 23:56:32 - INFO - [f59d2020-42d3-4106-ba4a-822cd7a913b9] Extracting frames using method: uniform, rate/threshold: 30
2025-08-20 23:56:37 - INFO - [f59d2020-42d3-4106-ba4a-822cd7a913b9] Extracted 30 frames successfully. Saving to temporary files...
2025-08-20 23:56:37 - INFO - [f59d2020-42d3-4106-ba4a-822cd7a913b9] 30 frames saved to temp_videos/f59d2020-42d3-4106-ba4a-822cd7a913b9
2025-08-20 23:56:50 - INFO - vision_config is None, using default vision config
2025-08-20 23:57:01 - INFO - Tokens per second: 5.491475730408376, Peak GPU memory MB: 11824.375
2025-08-20 23:57:01 - INFO - [f59d2020-42d3-4106-ba4a-822cd7a913b9] Inference time: 28.85 seconds, CPU usage: 36.8%, CPU core utilization: [19.5, 63.2, 31.2, 33.4]
2025-08-20 23:57:01 - INFO - [f59d2020-42d3-4106-ba4a-822cd7a913b9] Cleaned up temporary frame directory: temp_videos/f59d2020-42d3-4106-ba4a-822cd7a913b9
2025-08-20 23:57:01 - INFO - [10f4ed0e-10c6-4a0e-baef-cbc5a15a174c] Received new video inference request. Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_007.mp4'
2025-08-20 23:57:01 - INFO - [10f4ed0e-10c6-4a0e-baef-cbc5a15a174c] Video saved to temporary file: temp_videos/10f4ed0e-10c6-4a0e-baef-cbc5a15a174c.mp4
2025-08-20 23:57:01 - INFO - [10f4ed0e-10c6-4a0e-baef-cbc5a15a174c] Extracting frames using method: uniform, rate/threshold: 30
2025-08-20 23:57:06 - INFO - [10f4ed0e-10c6-4a0e-baef-cbc5a15a174c] Extracted 30 frames successfully. Saving to temporary files...
2025-08-20 23:57:06 - INFO - [10f4ed0e-10c6-4a0e-baef-cbc5a15a174c] 30 frames saved to temp_videos/10f4ed0e-10c6-4a0e-baef-cbc5a15a174c
2025-08-20 23:57:19 - INFO - vision_config is None, using default vision config
2025-08-20 23:57:30 - INFO - Tokens per second: 5.435457465325606, Peak GPU memory MB: 11824.375
2025-08-20 23:57:30 - INFO - [10f4ed0e-10c6-4a0e-baef-cbc5a15a174c] Inference time: 28.75 seconds, CPU usage: 37.6%, CPU core utilization: [28.1, 30.6, 71.4, 20.3]
2025-08-20 23:57:30 - INFO - [10f4ed0e-10c6-4a0e-baef-cbc5a15a174c] Cleaned up temporary frame directory: temp_videos/10f4ed0e-10c6-4a0e-baef-cbc5a15a174c
2025-08-20 23:57:30 - INFO - [5a373da4-23c9-4001-93b0-2b33e1830f66] Received new video inference request. Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_008.mp4'
2025-08-20 23:57:30 - INFO - [5a373da4-23c9-4001-93b0-2b33e1830f66] Video saved to temporary file: temp_videos/5a373da4-23c9-4001-93b0-2b33e1830f66.mp4
2025-08-20 23:57:30 - INFO - [5a373da4-23c9-4001-93b0-2b33e1830f66] Extracting frames using method: uniform, rate/threshold: 30
2025-08-20 23:57:35 - INFO - [5a373da4-23c9-4001-93b0-2b33e1830f66] Extracted 30 frames successfully. Saving to temporary files...
2025-08-20 23:57:35 - INFO - [5a373da4-23c9-4001-93b0-2b33e1830f66] 30 frames saved to temp_videos/5a373da4-23c9-4001-93b0-2b33e1830f66
2025-08-20 23:57:48 - INFO - vision_config is None, using default vision config
2025-08-20 23:57:58 - INFO - Tokens per second: 4.895421462414148, Peak GPU memory MB: 11824.375
2025-08-20 23:57:58 - INFO - [5a373da4-23c9-4001-93b0-2b33e1830f66] Inference time: 28.19 seconds, CPU usage: 37.4%, CPU core utilization: [27.1, 35.4, 25.4, 61.8]
2025-08-20 23:57:58 - INFO - [5a373da4-23c9-4001-93b0-2b33e1830f66] Cleaned up temporary frame directory: temp_videos/5a373da4-23c9-4001-93b0-2b33e1830f66
2025-08-20 23:57:58 - INFO - [3b4f5fd8-cb88-4655-b977-6866088b99b9] Received new video inference request. Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_009.mp4'
2025-08-20 23:57:58 - INFO - [3b4f5fd8-cb88-4655-b977-6866088b99b9] Video saved to temporary file: temp_videos/3b4f5fd8-cb88-4655-b977-6866088b99b9.mp4
2025-08-20 23:57:58 - INFO - [3b4f5fd8-cb88-4655-b977-6866088b99b9] Extracting frames using method: uniform, rate/threshold: 30
2025-08-20 23:58:03 - INFO - [3b4f5fd8-cb88-4655-b977-6866088b99b9] Extracted 30 frames successfully. Saving to temporary files...
2025-08-20 23:58:03 - INFO - [3b4f5fd8-cb88-4655-b977-6866088b99b9] 30 frames saved to temp_videos/3b4f5fd8-cb88-4655-b977-6866088b99b9
2025-08-20 23:58:16 - INFO - vision_config is None, using default vision config
2025-08-20 23:58:28 - INFO - Tokens per second: 6.246345549948331, Peak GPU memory MB: 11824.375
2025-08-20 23:58:28 - INFO - [3b4f5fd8-cb88-4655-b977-6866088b99b9] Inference time: 29.96 seconds, CPU usage: 36.8%, CPU core utilization: [36.0, 58.0, 37.8, 15.2]
2025-08-20 23:58:28 - INFO - [3b4f5fd8-cb88-4655-b977-6866088b99b9] Cleaned up temporary frame directory: temp_videos/3b4f5fd8-cb88-4655-b977-6866088b99b9
2025-08-20 23:58:28 - INFO - [db892fe2-bfd0-4e5d-9afe-9740e936d083] Received new video inference request. Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_010.mp4'
2025-08-20 23:58:28 - INFO - [db892fe2-bfd0-4e5d-9afe-9740e936d083] Video saved to temporary file: temp_videos/db892fe2-bfd0-4e5d-9afe-9740e936d083.mp4
2025-08-20 23:58:28 - INFO - [db892fe2-bfd0-4e5d-9afe-9740e936d083] Extracting frames using method: uniform, rate/threshold: 30
2025-08-20 23:58:33 - INFO - [db892fe2-bfd0-4e5d-9afe-9740e936d083] Extracted 30 frames successfully. Saving to temporary files...
2025-08-20 23:58:33 - INFO - [db892fe2-bfd0-4e5d-9afe-9740e936d083] 30 frames saved to temp_videos/db892fe2-bfd0-4e5d-9afe-9740e936d083
2025-08-20 23:58:46 - INFO - vision_config is None, using default vision config
2025-08-20 23:59:01 - INFO - Tokens per second: 7.823522467010842, Peak GPU memory MB: 11824.375
2025-08-20 23:59:01 - INFO - [db892fe2-bfd0-4e5d-9afe-9740e936d083] Inference time: 32.94 seconds, CPU usage: 35.7%, CPU core utilization: [52.5, 23.9, 22.6, 43.7]
2025-08-20 23:59:01 - INFO - [db892fe2-bfd0-4e5d-9afe-9740e936d083] Cleaned up temporary frame directory: temp_videos/db892fe2-bfd0-4e5d-9afe-9740e936d083
2025-08-20 23:59:01 - INFO - [2c42d84f-5110-43d0-8b9a-70073da685d1] Received new video inference request. Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_011.mp4'
2025-08-20 23:59:01 - INFO - [2c42d84f-5110-43d0-8b9a-70073da685d1] Video saved to temporary file: temp_videos/2c42d84f-5110-43d0-8b9a-70073da685d1.mp4
2025-08-20 23:59:01 - INFO - [2c42d84f-5110-43d0-8b9a-70073da685d1] Extracting frames using method: uniform, rate/threshold: 30
2025-08-20 23:59:06 - INFO - [2c42d84f-5110-43d0-8b9a-70073da685d1] Extracted 30 frames successfully. Saving to temporary files...
2025-08-20 23:59:06 - INFO - [2c42d84f-5110-43d0-8b9a-70073da685d1] 30 frames saved to temp_videos/2c42d84f-5110-43d0-8b9a-70073da685d1
2025-08-20 23:59:19 - INFO - vision_config is None, using default vision config
2025-08-20 23:59:30 - INFO - Tokens per second: 5.483772695220153, Peak GPU memory MB: 11824.375
2025-08-20 23:59:30 - INFO - [2c42d84f-5110-43d0-8b9a-70073da685d1] Inference time: 28.90 seconds, CPU usage: 37.2%, CPU core utilization: [18.5, 41.5, 21.7, 67.3]
2025-08-20 23:59:30 - INFO - [2c42d84f-5110-43d0-8b9a-70073da685d1] Cleaned up temporary frame directory: temp_videos/2c42d84f-5110-43d0-8b9a-70073da685d1
2025-08-20 23:59:30 - INFO - [2872bdf2-bc5e-418d-8cae-07911540d748] Received new video inference request. Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_012.mp4'
2025-08-20 23:59:30 - INFO - [2872bdf2-bc5e-418d-8cae-07911540d748] Video saved to temporary file: temp_videos/2872bdf2-bc5e-418d-8cae-07911540d748.mp4
2025-08-20 23:59:30 - INFO - [2872bdf2-bc5e-418d-8cae-07911540d748] Extracting frames using method: uniform, rate/threshold: 30
2025-08-20 23:59:35 - INFO - [2872bdf2-bc5e-418d-8cae-07911540d748] Extracted 30 frames successfully. Saving to temporary files...
2025-08-20 23:59:35 - INFO - [2872bdf2-bc5e-418d-8cae-07911540d748] 30 frames saved to temp_videos/2872bdf2-bc5e-418d-8cae-07911540d748
2025-08-20 23:59:48 - INFO - vision_config is None, using default vision config
2025-08-21 00:00:00 - INFO - Tokens per second: 6.375310581123721, Peak GPU memory MB: 11824.375
2025-08-21 00:00:00 - INFO - [2872bdf2-bc5e-418d-8cae-07911540d748] Inference time: 30.13 seconds, CPU usage: 37.0%, CPU core utilization: [32.4, 49.7, 41.4, 24.4]
2025-08-21 00:00:00 - INFO - [2872bdf2-bc5e-418d-8cae-07911540d748] Cleaned up temporary frame directory: temp_videos/2872bdf2-bc5e-418d-8cae-07911540d748
2025-08-21 00:00:00 - INFO - [32bd674d-1546-4df6-9cb2-910789260e3b] Received new video inference request. Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_013.mp4'
2025-08-21 00:00:00 - INFO - [32bd674d-1546-4df6-9cb2-910789260e3b] Video saved to temporary file: temp_videos/32bd674d-1546-4df6-9cb2-910789260e3b.mp4
2025-08-21 00:00:00 - INFO - [32bd674d-1546-4df6-9cb2-910789260e3b] Extracting frames using method: uniform, rate/threshold: 30
2025-08-21 00:00:05 - INFO - [32bd674d-1546-4df6-9cb2-910789260e3b] Extracted 30 frames successfully. Saving to temporary files...
2025-08-21 00:00:05 - INFO - [32bd674d-1546-4df6-9cb2-910789260e3b] 30 frames saved to temp_videos/32bd674d-1546-4df6-9cb2-910789260e3b
2025-08-21 00:00:18 - INFO - vision_config is None, using default vision config
2025-08-21 00:00:30 - INFO - Tokens per second: 6.371663181689808, Peak GPU memory MB: 11824.375
2025-08-21 00:00:30 - INFO - [32bd674d-1546-4df6-9cb2-910789260e3b] Inference time: 30.19 seconds, CPU usage: 38.7%, CPU core utilization: [25.3, 24.5, 51.5, 53.1]
2025-08-21 00:00:30 - INFO - [32bd674d-1546-4df6-9cb2-910789260e3b] Cleaned up temporary frame directory: temp_videos/32bd674d-1546-4df6-9cb2-910789260e3b
2025-08-21 00:00:30 - INFO - [b8254b3e-b84d-4015-b0f1-46653b45a403] Received new video inference request. Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_014.mp4'
2025-08-21 00:00:30 - INFO - [b8254b3e-b84d-4015-b0f1-46653b45a403] Video saved to temporary file: temp_videos/b8254b3e-b84d-4015-b0f1-46653b45a403.mp4
2025-08-21 00:00:30 - INFO - [b8254b3e-b84d-4015-b0f1-46653b45a403] Extracting frames using method: uniform, rate/threshold: 30
2025-08-21 00:00:35 - INFO - [b8254b3e-b84d-4015-b0f1-46653b45a403] Extracted 30 frames successfully. Saving to temporary files...
2025-08-21 00:00:35 - INFO - [b8254b3e-b84d-4015-b0f1-46653b45a403] 30 frames saved to temp_videos/b8254b3e-b84d-4015-b0f1-46653b45a403
2025-08-21 00:00:48 - INFO - vision_config is None, using default vision config
2025-08-21 00:01:00 - INFO - Tokens per second: 6.242465092882211, Peak GPU memory MB: 11824.375
2025-08-21 00:01:00 - INFO - [b8254b3e-b84d-4015-b0f1-46653b45a403] Inference time: 29.93 seconds, CPU usage: 36.6%, CPU core utilization: [43.4, 16.4, 27.8, 58.6]
2025-08-21 00:01:00 - INFO - [b8254b3e-b84d-4015-b0f1-46653b45a403] Cleaned up temporary frame directory: temp_videos/b8254b3e-b84d-4015-b0f1-46653b45a403
2025-08-21 00:01:00 - INFO - [b9ea4590-07b7-4d5d-934d-ee43d847cbe4] Received new video inference request. Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_015.mp4'
2025-08-21 00:01:00 - INFO - [b9ea4590-07b7-4d5d-934d-ee43d847cbe4] Video saved to temporary file: temp_videos/b9ea4590-07b7-4d5d-934d-ee43d847cbe4.mp4
2025-08-21 00:01:00 - INFO - [b9ea4590-07b7-4d5d-934d-ee43d847cbe4] Extracting frames using method: uniform, rate/threshold: 30
2025-08-21 00:01:05 - INFO - [b9ea4590-07b7-4d5d-934d-ee43d847cbe4] Extracted 30 frames successfully. Saving to temporary files...
2025-08-21 00:01:05 - INFO - [b9ea4590-07b7-4d5d-934d-ee43d847cbe4] 30 frames saved to temp_videos/b9ea4590-07b7-4d5d-934d-ee43d847cbe4
2025-08-21 00:01:18 - INFO - vision_config is None, using default vision config
2025-08-21 00:01:31 - INFO - Tokens per second: 6.8003448292571615, Peak GPU memory MB: 11824.375
2025-08-21 00:01:31 - INFO - [b9ea4590-07b7-4d5d-934d-ee43d847cbe4] Inference time: 30.82 seconds, CPU usage: 36.4%, CPU core utilization: [76.2, 16.3, 36.4, 16.7]
2025-08-21 00:01:31 - INFO - [b9ea4590-07b7-4d5d-934d-ee43d847cbe4] Cleaned up temporary frame directory: temp_videos/b9ea4590-07b7-4d5d-934d-ee43d847cbe4
2025-08-21 00:01:31 - INFO - [8570af4d-c6dc-4f35-9db8-610d6f2e46d8] Received new video inference request. Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_016.mp4'
2025-08-21 00:01:31 - INFO - [8570af4d-c6dc-4f35-9db8-610d6f2e46d8] Video saved to temporary file: temp_videos/8570af4d-c6dc-4f35-9db8-610d6f2e46d8.mp4
2025-08-21 00:01:31 - INFO - [8570af4d-c6dc-4f35-9db8-610d6f2e46d8] Extracting frames using method: uniform, rate/threshold: 30
2025-08-21 00:01:36 - INFO - [8570af4d-c6dc-4f35-9db8-610d6f2e46d8] Extracted 30 frames successfully. Saving to temporary files...
2025-08-21 00:01:36 - INFO - [8570af4d-c6dc-4f35-9db8-610d6f2e46d8] 30 frames saved to temp_videos/8570af4d-c6dc-4f35-9db8-610d6f2e46d8
2025-08-21 00:01:49 - INFO - vision_config is None, using default vision config
2025-08-21 00:02:01 - INFO - Tokens per second: 6.328218691218011, Peak GPU memory MB: 11824.375
2025-08-21 00:02:01 - INFO - [8570af4d-c6dc-4f35-9db8-610d6f2e46d8] Inference time: 30.10 seconds, CPU usage: 36.5%, CPU core utilization: [24.4, 22.4, 15.9, 83.2]
2025-08-21 00:02:01 - INFO - [8570af4d-c6dc-4f35-9db8-610d6f2e46d8] Cleaned up temporary frame directory: temp_videos/8570af4d-c6dc-4f35-9db8-610d6f2e46d8
2025-08-21 00:02:01 - INFO - [576c0d47-06ea-44af-8aa5-2463af7c8cf7] Received new video inference request. Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_017.mp4'
2025-08-21 00:02:01 - INFO - [576c0d47-06ea-44af-8aa5-2463af7c8cf7] Video saved to temporary file: temp_videos/576c0d47-06ea-44af-8aa5-2463af7c8cf7.mp4
2025-08-21 00:02:01 - INFO - [576c0d47-06ea-44af-8aa5-2463af7c8cf7] Extracting frames using method: uniform, rate/threshold: 30
2025-08-21 00:02:06 - INFO - [576c0d47-06ea-44af-8aa5-2463af7c8cf7] Extracted 30 frames successfully. Saving to temporary files...
2025-08-21 00:02:06 - INFO - [576c0d47-06ea-44af-8aa5-2463af7c8cf7] 30 frames saved to temp_videos/576c0d47-06ea-44af-8aa5-2463af7c8cf7
2025-08-21 00:02:19 - INFO - vision_config is None, using default vision config
2025-08-21 00:02:30 - INFO - Tokens per second: 5.366089856961263, Peak GPU memory MB: 11824.375
2025-08-21 00:02:30 - INFO - [576c0d47-06ea-44af-8aa5-2463af7c8cf7] Inference time: 28.73 seconds, CPU usage: 37.3%, CPU core utilization: [35.5, 35.8, 41.0, 36.9]
2025-08-21 00:02:30 - INFO - [576c0d47-06ea-44af-8aa5-2463af7c8cf7] Cleaned up temporary frame directory: temp_videos/576c0d47-06ea-44af-8aa5-2463af7c8cf7
2025-08-21 00:02:30 - INFO - [247d4e66-1ef1-4d9e-9fa5-4fa5c66aa016] Received new video inference request. Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_018.mp4'
2025-08-21 00:02:30 - INFO - [247d4e66-1ef1-4d9e-9fa5-4fa5c66aa016] Video saved to temporary file: temp_videos/247d4e66-1ef1-4d9e-9fa5-4fa5c66aa016.mp4
2025-08-21 00:02:30 - INFO - [247d4e66-1ef1-4d9e-9fa5-4fa5c66aa016] Extracting frames using method: uniform, rate/threshold: 30
2025-08-21 00:02:35 - INFO - [247d4e66-1ef1-4d9e-9fa5-4fa5c66aa016] Extracted 30 frames successfully. Saving to temporary files...
2025-08-21 00:02:35 - INFO - [247d4e66-1ef1-4d9e-9fa5-4fa5c66aa016] 30 frames saved to temp_videos/247d4e66-1ef1-4d9e-9fa5-4fa5c66aa016
2025-08-21 00:02:48 - INFO - vision_config is None, using default vision config
2025-08-21 00:03:01 - INFO - Tokens per second: 7.211379440337794, Peak GPU memory MB: 11824.375
2025-08-21 00:03:01 - INFO - [247d4e66-1ef1-4d9e-9fa5-4fa5c66aa016] Inference time: 31.65 seconds, CPU usage: 36.4%, CPU core utilization: [47.5, 23.3, 51.7, 23.0]
2025-08-21 00:03:01 - INFO - [247d4e66-1ef1-4d9e-9fa5-4fa5c66aa016] Cleaned up temporary frame directory: temp_videos/247d4e66-1ef1-4d9e-9fa5-4fa5c66aa016
2025-08-21 00:03:02 - INFO - [84e77116-c87b-4b39-8001-7954bd84f3fd] Received new video inference request. Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_019.mp4'
2025-08-21 00:03:02 - INFO - [84e77116-c87b-4b39-8001-7954bd84f3fd] Video saved to temporary file: temp_videos/84e77116-c87b-4b39-8001-7954bd84f3fd.mp4
2025-08-21 00:03:02 - INFO - [84e77116-c87b-4b39-8001-7954bd84f3fd] Extracting frames using method: uniform, rate/threshold: 30
2025-08-21 00:03:06 - INFO - [84e77116-c87b-4b39-8001-7954bd84f3fd] Extracted 30 frames successfully. Saving to temporary files...
2025-08-21 00:03:06 - INFO - [84e77116-c87b-4b39-8001-7954bd84f3fd] 30 frames saved to temp_videos/84e77116-c87b-4b39-8001-7954bd84f3fd
2025-08-21 00:03:19 - INFO - vision_config is None, using default vision config
2025-08-21 00:03:30 - INFO - Tokens per second: 5.540220507070647, Peak GPU memory MB: 11824.375
2025-08-21 00:03:30 - INFO - [84e77116-c87b-4b39-8001-7954bd84f3fd] Inference time: 28.94 seconds, CPU usage: 37.0%, CPU core utilization: [33.0, 54.7, 43.2, 17.3]
2025-08-21 00:03:30 - INFO - [84e77116-c87b-4b39-8001-7954bd84f3fd] Cleaned up temporary frame directory: temp_videos/84e77116-c87b-4b39-8001-7954bd84f3fd
2025-08-21 00:03:30 - INFO - [988d9b75-c51b-4171-9ced-1e7ec41af950] Received new video inference request. Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_020.mp4'
2025-08-21 00:03:30 - INFO - [988d9b75-c51b-4171-9ced-1e7ec41af950] Video saved to temporary file: temp_videos/988d9b75-c51b-4171-9ced-1e7ec41af950.mp4
2025-08-21 00:03:30 - INFO - [988d9b75-c51b-4171-9ced-1e7ec41af950] Extracting frames using method: uniform, rate/threshold: 30
2025-08-21 00:03:35 - INFO - [988d9b75-c51b-4171-9ced-1e7ec41af950] Extracted 30 frames successfully. Saving to temporary files...
2025-08-21 00:03:35 - INFO - [988d9b75-c51b-4171-9ced-1e7ec41af950] 30 frames saved to temp_videos/988d9b75-c51b-4171-9ced-1e7ec41af950
2025-08-21 00:03:48 - INFO - vision_config is None, using default vision config
2025-08-21 00:04:00 - INFO - Tokens per second: 6.098932177607684, Peak GPU memory MB: 11824.375
2025-08-21 00:04:00 - INFO - [988d9b75-c51b-4171-9ced-1e7ec41af950] Inference time: 29.66 seconds, CPU usage: 36.9%, CPU core utilization: [19.0, 31.2, 54.7, 42.7]
2025-08-21 00:04:00 - INFO - [988d9b75-c51b-4171-9ced-1e7ec41af950] Cleaned up temporary frame directory: temp_videos/988d9b75-c51b-4171-9ced-1e7ec41af950
2025-08-21 00:04:00 - INFO - [ad9e17b9-f6e0-44d4-a882-cc7bde714ced] Received new video inference request. Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_021.mp4'
2025-08-21 00:04:00 - INFO - [ad9e17b9-f6e0-44d4-a882-cc7bde714ced] Video saved to temporary file: temp_videos/ad9e17b9-f6e0-44d4-a882-cc7bde714ced.mp4
2025-08-21 00:04:00 - INFO - [ad9e17b9-f6e0-44d4-a882-cc7bde714ced] Extracting frames using method: uniform, rate/threshold: 30
2025-08-21 00:04:05 - INFO - [ad9e17b9-f6e0-44d4-a882-cc7bde714ced] Extracted 30 frames successfully. Saving to temporary files...
2025-08-21 00:04:05 - INFO - [ad9e17b9-f6e0-44d4-a882-cc7bde714ced] 30 frames saved to temp_videos/ad9e17b9-f6e0-44d4-a882-cc7bde714ced
2025-08-21 00:04:18 - INFO - vision_config is None, using default vision config
2025-08-21 00:04:29 - INFO - Tokens per second: 5.256702476105206, Peak GPU memory MB: 11824.375
2025-08-21 00:04:29 - INFO - [ad9e17b9-f6e0-44d4-a882-cc7bde714ced] Inference time: 28.58 seconds, CPU usage: 37.3%, CPU core utilization: [41.7, 37.5, 52.9, 17.2]
2025-08-21 00:04:29 - INFO - [ad9e17b9-f6e0-44d4-a882-cc7bde714ced] Cleaned up temporary frame directory: temp_videos/ad9e17b9-f6e0-44d4-a882-cc7bde714ced
2025-08-21 00:04:29 - INFO - [55b659d8-42d1-481b-b2f4-d9ac78a772cd] Received new video inference request. Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_022.mp4'
2025-08-21 00:04:29 - INFO - [55b659d8-42d1-481b-b2f4-d9ac78a772cd] Video saved to temporary file: temp_videos/55b659d8-42d1-481b-b2f4-d9ac78a772cd.mp4
2025-08-21 00:04:29 - INFO - [55b659d8-42d1-481b-b2f4-d9ac78a772cd] Extracting frames using method: uniform, rate/threshold: 30
2025-08-21 00:04:33 - INFO - [55b659d8-42d1-481b-b2f4-d9ac78a772cd] Extracted 30 frames successfully. Saving to temporary files...
2025-08-21 00:04:33 - INFO - [55b659d8-42d1-481b-b2f4-d9ac78a772cd] 30 frames saved to temp_videos/55b659d8-42d1-481b-b2f4-d9ac78a772cd
2025-08-21 00:04:46 - INFO - vision_config is None, using default vision config
2025-08-21 00:04:55 - INFO - Tokens per second: 2.3146754530784004, Peak GPU memory MB: 11824.375
2025-08-21 00:04:55 - INFO - [55b659d8-42d1-481b-b2f4-d9ac78a772cd] Inference time: 25.85 seconds, CPU usage: 38.1%, CPU core utilization: [69.0, 33.5, 30.6, 19.4]
2025-08-21 00:04:55 - INFO - [55b659d8-42d1-481b-b2f4-d9ac78a772cd] Cleaned up temporary frame directory: temp_videos/55b659d8-42d1-481b-b2f4-d9ac78a772cd
2025-08-21 00:04:55 - INFO - [e8440f1e-8228-4173-8d4e-75e3bd0655f0] Received new video inference request. Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_023.mp4'
2025-08-21 00:04:55 - INFO - [e8440f1e-8228-4173-8d4e-75e3bd0655f0] Video saved to temporary file: temp_videos/e8440f1e-8228-4173-8d4e-75e3bd0655f0.mp4
2025-08-21 00:04:55 - INFO - [e8440f1e-8228-4173-8d4e-75e3bd0655f0] Extracting frames using method: uniform, rate/threshold: 30
2025-08-21 00:04:59 - INFO - [e8440f1e-8228-4173-8d4e-75e3bd0655f0] Extracted 30 frames successfully. Saving to temporary files...
2025-08-21 00:04:59 - INFO - [e8440f1e-8228-4173-8d4e-75e3bd0655f0] 30 frames saved to temp_videos/e8440f1e-8228-4173-8d4e-75e3bd0655f0
2025-08-21 00:05:12 - INFO - vision_config is None, using default vision config
2025-08-21 00:05:23 - INFO - Tokens per second: 4.957384579566463, Peak GPU memory MB: 11824.375
2025-08-21 00:05:23 - INFO - [e8440f1e-8228-4173-8d4e-75e3bd0655f0] Inference time: 28.24 seconds, CPU usage: 37.5%, CPU core utilization: [33.0, 50.8, 47.3, 18.9]
2025-08-21 00:05:23 - INFO - [e8440f1e-8228-4173-8d4e-75e3bd0655f0] Cleaned up temporary frame directory: temp_videos/e8440f1e-8228-4173-8d4e-75e3bd0655f0
2025-08-21 00:05:23 - INFO - [1f996954-f0cf-4ed2-813a-0fe738be6d01] Received new video inference request. Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_024.mp4'
2025-08-21 00:05:23 - INFO - [1f996954-f0cf-4ed2-813a-0fe738be6d01] Video saved to temporary file: temp_videos/1f996954-f0cf-4ed2-813a-0fe738be6d01.mp4
2025-08-21 00:05:23 - INFO - [1f996954-f0cf-4ed2-813a-0fe738be6d01] Extracting frames using method: uniform, rate/threshold: 30
2025-08-21 00:05:28 - INFO - [1f996954-f0cf-4ed2-813a-0fe738be6d01] Extracted 30 frames successfully. Saving to temporary files...
2025-08-21 00:05:28 - INFO - [1f996954-f0cf-4ed2-813a-0fe738be6d01] 30 frames saved to temp_videos/1f996954-f0cf-4ed2-813a-0fe738be6d01
2025-08-21 00:05:41 - INFO - vision_config is None, using default vision config
2025-08-21 00:05:55 - INFO - Tokens per second: 7.516056826941947, Peak GPU memory MB: 11824.375
2025-08-21 00:05:55 - INFO - [1f996954-f0cf-4ed2-813a-0fe738be6d01] Inference time: 32.20 seconds, CPU usage: 35.6%, CPU core utilization: [33.4, 36.1, 32.6, 40.1]
2025-08-21 00:05:55 - INFO - [1f996954-f0cf-4ed2-813a-0fe738be6d01] Cleaned up temporary frame directory: temp_videos/1f996954-f0cf-4ed2-813a-0fe738be6d01
2025-08-21 00:05:55 - INFO - [0750de05-37d6-46f0-bd2c-9e1e5a7140f8] Received new video inference request. Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_025.mp4'
2025-08-21 00:05:55 - INFO - [0750de05-37d6-46f0-bd2c-9e1e5a7140f8] Video saved to temporary file: temp_videos/0750de05-37d6-46f0-bd2c-9e1e5a7140f8.mp4
2025-08-21 00:05:55 - INFO - [0750de05-37d6-46f0-bd2c-9e1e5a7140f8] Extracting frames using method: uniform, rate/threshold: 30
2025-08-21 00:06:00 - INFO - [0750de05-37d6-46f0-bd2c-9e1e5a7140f8] Extracted 30 frames successfully. Saving to temporary files...
2025-08-21 00:06:00 - INFO - [0750de05-37d6-46f0-bd2c-9e1e5a7140f8] 30 frames saved to temp_videos/0750de05-37d6-46f0-bd2c-9e1e5a7140f8
2025-08-21 00:06:13 - INFO - vision_config is None, using default vision config
2025-08-21 00:06:23 - INFO - Tokens per second: 4.695294352922656, Peak GPU memory MB: 11824.375
2025-08-21 00:06:23 - INFO - [0750de05-37d6-46f0-bd2c-9e1e5a7140f8] Inference time: 27.94 seconds, CPU usage: 37.5%, CPU core utilization: [52.4, 21.4, 57.3, 18.7]
2025-08-21 00:06:23 - INFO - [0750de05-37d6-46f0-bd2c-9e1e5a7140f8] Cleaned up temporary frame directory: temp_videos/0750de05-37d6-46f0-bd2c-9e1e5a7140f8
2025-08-21 00:06:23 - INFO - [01029caa-6289-4c40-9cbc-a4ae5a937946] Received new video inference request. Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_026.mp4'
2025-08-21 00:06:23 - INFO - [01029caa-6289-4c40-9cbc-a4ae5a937946] Video saved to temporary file: temp_videos/01029caa-6289-4c40-9cbc-a4ae5a937946.mp4
2025-08-21 00:06:23 - INFO - [01029caa-6289-4c40-9cbc-a4ae5a937946] Extracting frames using method: uniform, rate/threshold: 30
2025-08-21 00:06:28 - INFO - [01029caa-6289-4c40-9cbc-a4ae5a937946] Extracted 30 frames successfully. Saving to temporary files...
2025-08-21 00:06:28 - INFO - [01029caa-6289-4c40-9cbc-a4ae5a937946] 30 frames saved to temp_videos/01029caa-6289-4c40-9cbc-a4ae5a937946
2025-08-21 00:06:41 - INFO - vision_config is None, using default vision config
2025-08-21 00:06:54 - INFO - Tokens per second: 6.726536746978091, Peak GPU memory MB: 11824.375
2025-08-21 00:06:54 - INFO - [01029caa-6289-4c40-9cbc-a4ae5a937946] Inference time: 30.64 seconds, CPU usage: 36.4%, CPU core utilization: [24.0, 59.4, 17.0, 45.3]
2025-08-21 00:06:54 - INFO - [01029caa-6289-4c40-9cbc-a4ae5a937946] Cleaned up temporary frame directory: temp_videos/01029caa-6289-4c40-9cbc-a4ae5a937946
2025-08-21 00:06:54 - INFO - [f210db90-ebcd-4bb7-bbf4-c28a7a20457b] Received new video inference request. Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_027.mp4'
2025-08-21 00:06:54 - INFO - [f210db90-ebcd-4bb7-bbf4-c28a7a20457b] Video saved to temporary file: temp_videos/f210db90-ebcd-4bb7-bbf4-c28a7a20457b.mp4
2025-08-21 00:06:54 - INFO - [f210db90-ebcd-4bb7-bbf4-c28a7a20457b] Extracting frames using method: uniform, rate/threshold: 30
2025-08-21 00:06:58 - INFO - [f210db90-ebcd-4bb7-bbf4-c28a7a20457b] Extracted 30 frames successfully. Saving to temporary files...
2025-08-21 00:06:58 - INFO - [f210db90-ebcd-4bb7-bbf4-c28a7a20457b] 30 frames saved to temp_videos/f210db90-ebcd-4bb7-bbf4-c28a7a20457b
2025-08-21 00:07:11 - INFO - vision_config is None, using default vision config
2025-08-21 00:07:24 - INFO - Tokens per second: 6.553781610815721, Peak GPU memory MB: 11824.375
2025-08-21 00:07:24 - INFO - [f210db90-ebcd-4bb7-bbf4-c28a7a20457b] Inference time: 30.34 seconds, CPU usage: 36.5%, CPU core utilization: [32.0, 56.7, 38.4, 18.8]
2025-08-21 00:07:24 - INFO - [f210db90-ebcd-4bb7-bbf4-c28a7a20457b] Cleaned up temporary frame directory: temp_videos/f210db90-ebcd-4bb7-bbf4-c28a7a20457b
2025-08-21 00:07:24 - INFO - [c32e0c9f-bf3d-4cb5-a0da-af1e3e029c5c] Received new video inference request. Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_028.mp4'
2025-08-21 00:07:24 - INFO - [c32e0c9f-bf3d-4cb5-a0da-af1e3e029c5c] Video saved to temporary file: temp_videos/c32e0c9f-bf3d-4cb5-a0da-af1e3e029c5c.mp4
2025-08-21 00:07:24 - INFO - [c32e0c9f-bf3d-4cb5-a0da-af1e3e029c5c] Extracting frames using method: uniform, rate/threshold: 30
2025-08-21 00:07:29 - INFO - [c32e0c9f-bf3d-4cb5-a0da-af1e3e029c5c] Extracted 30 frames successfully. Saving to temporary files...
2025-08-21 00:07:29 - INFO - [c32e0c9f-bf3d-4cb5-a0da-af1e3e029c5c] 30 frames saved to temp_videos/c32e0c9f-bf3d-4cb5-a0da-af1e3e029c5c
2025-08-21 00:07:42 - INFO - vision_config is None, using default vision config
2025-08-21 00:07:55 - INFO - Tokens per second: 6.759111802459959, Peak GPU memory MB: 11824.375
2025-08-21 00:07:55 - INFO - [c32e0c9f-bf3d-4cb5-a0da-af1e3e029c5c] Inference time: 30.70 seconds, CPU usage: 36.3%, CPU core utilization: [31.2, 20.1, 76.9, 17.1]
2025-08-21 00:07:55 - INFO - [c32e0c9f-bf3d-4cb5-a0da-af1e3e029c5c] Cleaned up temporary frame directory: temp_videos/c32e0c9f-bf3d-4cb5-a0da-af1e3e029c5c
2025-08-21 00:07:55 - INFO - [9ca4ad09-6652-4495-9660-8e536730d426] Received new video inference request. Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_029.mp4'
2025-08-21 00:07:55 - INFO - [9ca4ad09-6652-4495-9660-8e536730d426] Video saved to temporary file: temp_videos/9ca4ad09-6652-4495-9660-8e536730d426.mp4
2025-08-21 00:07:55 - INFO - [9ca4ad09-6652-4495-9660-8e536730d426] Extracting frames using method: uniform, rate/threshold: 30
2025-08-21 00:08:00 - INFO - [9ca4ad09-6652-4495-9660-8e536730d426] Extracted 30 frames successfully. Saving to temporary files...
2025-08-21 00:08:00 - INFO - [9ca4ad09-6652-4495-9660-8e536730d426] 30 frames saved to temp_videos/9ca4ad09-6652-4495-9660-8e536730d426
2025-08-21 00:08:13 - INFO - vision_config is None, using default vision config
2025-08-21 00:08:32 - INFO - Tokens per second: 9.263214649215994, Peak GPU memory MB: 11824.375
2025-08-21 00:08:32 - INFO - [9ca4ad09-6652-4495-9660-8e536730d426] Inference time: 37.69 seconds, CPU usage: 34.5%, CPU core utilization: [53.9, 19.4, 24.3, 40.4]
2025-08-21 00:08:32 - INFO - [9ca4ad09-6652-4495-9660-8e536730d426] Cleaned up temporary frame directory: temp_videos/9ca4ad09-6652-4495-9660-8e536730d426
2025-08-21 00:08:32 - INFO - [df092fec-bdf9-43d3-bdc0-e3addd960939] Received new video inference request. Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_030.mp4'
2025-08-21 00:08:32 - INFO - [df092fec-bdf9-43d3-bdc0-e3addd960939] Video saved to temporary file: temp_videos/df092fec-bdf9-43d3-bdc0-e3addd960939.mp4
2025-08-21 00:08:32 - INFO - [df092fec-bdf9-43d3-bdc0-e3addd960939] Extracting frames using method: uniform, rate/threshold: 30
2025-08-21 00:08:37 - INFO - [df092fec-bdf9-43d3-bdc0-e3addd960939] Extracted 30 frames successfully. Saving to temporary files...
2025-08-21 00:08:37 - INFO - [df092fec-bdf9-43d3-bdc0-e3addd960939] 30 frames saved to temp_videos/df092fec-bdf9-43d3-bdc0-e3addd960939
2025-08-21 00:08:50 - INFO - vision_config is None, using default vision config
2025-08-21 00:09:00 - INFO - Tokens per second: 4.063290188399236, Peak GPU memory MB: 11824.375
2025-08-21 00:09:00 - INFO - [df092fec-bdf9-43d3-bdc0-e3addd960939] Inference time: 27.34 seconds, CPU usage: 37.5%, CPU core utilization: [50.1, 20.2, 61.0, 18.7]
2025-08-21 00:09:00 - INFO - [df092fec-bdf9-43d3-bdc0-e3addd960939] Cleaned up temporary frame directory: temp_videos/df092fec-bdf9-43d3-bdc0-e3addd960939
2025-08-21 00:09:00 - INFO - [ca418df5-894b-4902-ba9d-0e23d1dd86ab] Received new video inference request. Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_031.mp4'
2025-08-21 00:09:00 - INFO - [ca418df5-894b-4902-ba9d-0e23d1dd86ab] Video saved to temporary file: temp_videos/ca418df5-894b-4902-ba9d-0e23d1dd86ab.mp4
2025-08-21 00:09:00 - INFO - [ca418df5-894b-4902-ba9d-0e23d1dd86ab] Extracting frames using method: uniform, rate/threshold: 30
2025-08-21 00:09:05 - INFO - [ca418df5-894b-4902-ba9d-0e23d1dd86ab] Extracted 30 frames successfully. Saving to temporary files...
2025-08-21 00:09:05 - INFO - [ca418df5-894b-4902-ba9d-0e23d1dd86ab] 30 frames saved to temp_videos/ca418df5-894b-4902-ba9d-0e23d1dd86ab
2025-08-21 00:09:17 - INFO - vision_config is None, using default vision config
2025-08-21 00:09:31 - INFO - Tokens per second: 7.1731396320969125, Peak GPU memory MB: 11824.375
2025-08-21 00:09:31 - INFO - [ca418df5-894b-4902-ba9d-0e23d1dd86ab] Inference time: 31.49 seconds, CPU usage: 36.2%, CPU core utilization: [35.7, 20.6, 19.6, 68.6]
2025-08-21 00:09:31 - INFO - [ca418df5-894b-4902-ba9d-0e23d1dd86ab] Cleaned up temporary frame directory: temp_videos/ca418df5-894b-4902-ba9d-0e23d1dd86ab
2025-08-21 00:09:31 - INFO - [e88d0f9b-c698-414c-858d-003fae28cdf3] Received new video inference request. Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_032.mp4'
2025-08-21 00:09:31 - INFO - [e88d0f9b-c698-414c-858d-003fae28cdf3] Video saved to temporary file: temp_videos/e88d0f9b-c698-414c-858d-003fae28cdf3.mp4
2025-08-21 00:09:31 - INFO - [e88d0f9b-c698-414c-858d-003fae28cdf3] Extracting frames using method: uniform, rate/threshold: 30
2025-08-21 00:09:36 - INFO - [e88d0f9b-c698-414c-858d-003fae28cdf3] Extracted 30 frames successfully. Saving to temporary files...
2025-08-21 00:09:36 - INFO - [e88d0f9b-c698-414c-858d-003fae28cdf3] 30 frames saved to temp_videos/e88d0f9b-c698-414c-858d-003fae28cdf3
2025-08-21 00:09:49 - INFO - vision_config is None, using default vision config
2025-08-21 00:10:02 - INFO - Tokens per second: 6.459775499170554, Peak GPU memory MB: 11824.375
2025-08-21 00:10:02 - INFO - [e88d0f9b-c698-414c-858d-003fae28cdf3] Inference time: 30.22 seconds, CPU usage: 36.5%, CPU core utilization: [40.0, 21.9, 46.6, 37.5]
2025-08-21 00:10:02 - INFO - [e88d0f9b-c698-414c-858d-003fae28cdf3] Cleaned up temporary frame directory: temp_videos/e88d0f9b-c698-414c-858d-003fae28cdf3
2025-08-21 00:10:02 - INFO - [53a4ed5c-13d3-4b71-b5f6-ce4d784c05ae] Received new video inference request. Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_033.mp4'
2025-08-21 00:10:02 - INFO - [53a4ed5c-13d3-4b71-b5f6-ce4d784c05ae] Video saved to temporary file: temp_videos/53a4ed5c-13d3-4b71-b5f6-ce4d784c05ae.mp4
2025-08-21 00:10:02 - INFO - [53a4ed5c-13d3-4b71-b5f6-ce4d784c05ae] Extracting frames using method: uniform, rate/threshold: 30
2025-08-21 00:10:06 - INFO - [53a4ed5c-13d3-4b71-b5f6-ce4d784c05ae] Extracted 30 frames successfully. Saving to temporary files...
2025-08-21 00:10:06 - INFO - [53a4ed5c-13d3-4b71-b5f6-ce4d784c05ae] 30 frames saved to temp_videos/53a4ed5c-13d3-4b71-b5f6-ce4d784c05ae
2025-08-21 00:10:19 - INFO - vision_config is None, using default vision config
2025-08-21 00:10:37 - INFO - Tokens per second: 8.532561247112476, Peak GPU memory MB: 11824.375
2025-08-21 00:10:37 - INFO - [53a4ed5c-13d3-4b71-b5f6-ce4d784c05ae] Inference time: 34.97 seconds, CPU usage: 35.2%, CPU core utilization: [16.2, 20.7, 47.4, 56.6]
2025-08-21 00:10:37 - INFO - [53a4ed5c-13d3-4b71-b5f6-ce4d784c05ae] Cleaned up temporary frame directory: temp_videos/53a4ed5c-13d3-4b71-b5f6-ce4d784c05ae
2025-08-21 00:10:37 - INFO - [ccd7f7c1-6c22-4ee5-876a-1c19b01e0607] Received new video inference request. Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_034.mp4'
2025-08-21 00:10:37 - INFO - [ccd7f7c1-6c22-4ee5-876a-1c19b01e0607] Video saved to temporary file: temp_videos/ccd7f7c1-6c22-4ee5-876a-1c19b01e0607.mp4
2025-08-21 00:10:37 - INFO - [ccd7f7c1-6c22-4ee5-876a-1c19b01e0607] Extracting frames using method: uniform, rate/threshold: 30
2025-08-21 00:10:41 - INFO - [ccd7f7c1-6c22-4ee5-876a-1c19b01e0607] Extracted 30 frames successfully. Saving to temporary files...
2025-08-21 00:10:41 - INFO - [ccd7f7c1-6c22-4ee5-876a-1c19b01e0607] 30 frames saved to temp_videos/ccd7f7c1-6c22-4ee5-876a-1c19b01e0607
2025-08-21 00:10:54 - INFO - vision_config is None, using default vision config
2025-08-21 00:11:06 - INFO - Tokens per second: 6.093386425880963, Peak GPU memory MB: 11824.375
2025-08-21 00:11:06 - INFO - [ccd7f7c1-6c22-4ee5-876a-1c19b01e0607] Inference time: 29.67 seconds, CPU usage: 36.6%, CPU core utilization: [47.7, 65.3, 17.1, 16.5]
2025-08-21 00:11:06 - INFO - [ccd7f7c1-6c22-4ee5-876a-1c19b01e0607] Cleaned up temporary frame directory: temp_videos/ccd7f7c1-6c22-4ee5-876a-1c19b01e0607
2025-08-21 00:11:06 - INFO - [496ad230-f9e4-468b-9097-aae0d083dd17] Received new video inference request. Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_035.mp4'
2025-08-21 00:11:06 - INFO - [496ad230-f9e4-468b-9097-aae0d083dd17] Video saved to temporary file: temp_videos/496ad230-f9e4-468b-9097-aae0d083dd17.mp4
2025-08-21 00:11:06 - INFO - [496ad230-f9e4-468b-9097-aae0d083dd17] Extracting frames using method: uniform, rate/threshold: 30
2025-08-21 00:11:11 - INFO - [496ad230-f9e4-468b-9097-aae0d083dd17] Extracted 30 frames successfully. Saving to temporary files...
2025-08-21 00:11:11 - INFO - [496ad230-f9e4-468b-9097-aae0d083dd17] 30 frames saved to temp_videos/496ad230-f9e4-468b-9097-aae0d083dd17
2025-08-21 00:11:24 - INFO - vision_config is None, using default vision config
2025-08-21 00:11:39 - INFO - Tokens per second: 7.846984662180754, Peak GPU memory MB: 11824.375
2025-08-21 00:11:39 - INFO - [496ad230-f9e4-468b-9097-aae0d083dd17] Inference time: 33.09 seconds, CPU usage: 35.7%, CPU core utilization: [14.8, 15.5, 59.9, 52.6]
2025-08-21 00:11:39 - INFO - [496ad230-f9e4-468b-9097-aae0d083dd17] Cleaned up temporary frame directory: temp_videos/496ad230-f9e4-468b-9097-aae0d083dd17
2025-08-21 00:11:39 - INFO - [124ab1d5-65f5-4f6f-8641-57c4c85808d4] Received new video inference request. Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_036.mp4'
2025-08-21 00:11:39 - INFO - [124ab1d5-65f5-4f6f-8641-57c4c85808d4] Video saved to temporary file: temp_videos/124ab1d5-65f5-4f6f-8641-57c4c85808d4.mp4
2025-08-21 00:11:39 - INFO - [124ab1d5-65f5-4f6f-8641-57c4c85808d4] Extracting frames using method: uniform, rate/threshold: 30
2025-08-21 00:11:44 - INFO - [124ab1d5-65f5-4f6f-8641-57c4c85808d4] Extracted 30 frames successfully. Saving to temporary files...
2025-08-21 00:11:44 - INFO - [124ab1d5-65f5-4f6f-8641-57c4c85808d4] 30 frames saved to temp_videos/124ab1d5-65f5-4f6f-8641-57c4c85808d4
2025-08-21 00:11:57 - INFO - vision_config is None, using default vision config
2025-08-21 00:12:07 - INFO - Tokens per second: 4.356472577241336, Peak GPU memory MB: 11824.375
2025-08-21 00:12:07 - INFO - [124ab1d5-65f5-4f6f-8641-57c4c85808d4] Inference time: 27.59 seconds, CPU usage: 37.4%, CPU core utilization: [20.9, 70.7, 18.3, 40.0]
2025-08-21 00:12:07 - INFO - [124ab1d5-65f5-4f6f-8641-57c4c85808d4] Cleaned up temporary frame directory: temp_videos/124ab1d5-65f5-4f6f-8641-57c4c85808d4
2025-08-21 00:12:07 - INFO - [e52f096d-c451-4930-83cf-bb8210f55a92] Received new video inference request. Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_037.mp4'
2025-08-21 00:12:07 - INFO - [e52f096d-c451-4930-83cf-bb8210f55a92] Video saved to temporary file: temp_videos/e52f096d-c451-4930-83cf-bb8210f55a92.mp4
2025-08-21 00:12:07 - INFO - [e52f096d-c451-4930-83cf-bb8210f55a92] Extracting frames using method: uniform, rate/threshold: 30
2025-08-21 00:12:12 - INFO - [e52f096d-c451-4930-83cf-bb8210f55a92] Extracted 30 frames successfully. Saving to temporary files...
2025-08-21 00:12:12 - INFO - [e52f096d-c451-4930-83cf-bb8210f55a92] 30 frames saved to temp_videos/e52f096d-c451-4930-83cf-bb8210f55a92
2025-08-21 00:12:25 - INFO - vision_config is None, using default vision config
2025-08-21 00:12:35 - INFO - Tokens per second: 5.138801059166309, Peak GPU memory MB: 11824.375
2025-08-21 00:12:35 - INFO - [e52f096d-c451-4930-83cf-bb8210f55a92] Inference time: 28.53 seconds, CPU usage: 37.3%, CPU core utilization: [47.6, 38.0, 17.4, 46.2]
2025-08-21 00:12:35 - INFO - [e52f096d-c451-4930-83cf-bb8210f55a92] Cleaned up temporary frame directory: temp_videos/e52f096d-c451-4930-83cf-bb8210f55a92
2025-08-21 00:12:35 - INFO - [00c8bc41-e1e0-4ab7-8072-cd4bc78fe18a] Received new video inference request. Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_038.mp4'
2025-08-21 00:12:35 - INFO - [00c8bc41-e1e0-4ab7-8072-cd4bc78fe18a] Video saved to temporary file: temp_videos/00c8bc41-e1e0-4ab7-8072-cd4bc78fe18a.mp4
2025-08-21 00:12:35 - INFO - [00c8bc41-e1e0-4ab7-8072-cd4bc78fe18a] Extracting frames using method: uniform, rate/threshold: 30
2025-08-21 00:12:40 - INFO - [00c8bc41-e1e0-4ab7-8072-cd4bc78fe18a] Extracted 30 frames successfully. Saving to temporary files...
2025-08-21 00:12:40 - INFO - [00c8bc41-e1e0-4ab7-8072-cd4bc78fe18a] 30 frames saved to temp_videos/00c8bc41-e1e0-4ab7-8072-cd4bc78fe18a
2025-08-21 00:12:53 - INFO - vision_config is None, using default vision config
2025-08-21 00:13:02 - INFO - Tokens per second: 3.425754264592105, Peak GPU memory MB: 11824.375
2025-08-21 00:13:02 - INFO - [00c8bc41-e1e0-4ab7-8072-cd4bc78fe18a] Inference time: 26.76 seconds, CPU usage: 37.7%, CPU core utilization: [23.2, 51.9, 42.6, 33.3]
2025-08-21 00:13:02 - INFO - [00c8bc41-e1e0-4ab7-8072-cd4bc78fe18a] Cleaned up temporary frame directory: temp_videos/00c8bc41-e1e0-4ab7-8072-cd4bc78fe18a
2025-08-21 00:13:02 - INFO - [a840eac1-a5b6-435f-aceb-589aa77afa45] Received new video inference request. Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_039.mp4'
2025-08-21 00:13:02 - INFO - [a840eac1-a5b6-435f-aceb-589aa77afa45] Video saved to temporary file: temp_videos/a840eac1-a5b6-435f-aceb-589aa77afa45.mp4
2025-08-21 00:13:02 - INFO - [a840eac1-a5b6-435f-aceb-589aa77afa45] Extracting frames using method: uniform, rate/threshold: 30
2025-08-21 00:13:07 - INFO - [a840eac1-a5b6-435f-aceb-589aa77afa45] Extracted 30 frames successfully. Saving to temporary files...
2025-08-21 00:13:07 - INFO - [a840eac1-a5b6-435f-aceb-589aa77afa45] 30 frames saved to temp_videos/a840eac1-a5b6-435f-aceb-589aa77afa45
2025-08-21 00:13:20 - INFO - vision_config is None, using default vision config
2025-08-21 00:13:30 - INFO - Tokens per second: 4.69602386319334, Peak GPU memory MB: 11824.375
2025-08-21 00:13:30 - INFO - [a840eac1-a5b6-435f-aceb-589aa77afa45] Inference time: 27.88 seconds, CPU usage: 37.3%, CPU core utilization: [51.3, 17.6, 19.3, 60.6]
2025-08-21 00:13:30 - INFO - [a840eac1-a5b6-435f-aceb-589aa77afa45] Cleaned up temporary frame directory: temp_videos/a840eac1-a5b6-435f-aceb-589aa77afa45
2025-08-21 00:13:30 - INFO - [79fff7f5-155c-4aab-a7fd-d432cf110dde] Received new video inference request. Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_040.mp4'
2025-08-21 00:13:30 - INFO - [79fff7f5-155c-4aab-a7fd-d432cf110dde] Video saved to temporary file: temp_videos/79fff7f5-155c-4aab-a7fd-d432cf110dde.mp4
2025-08-21 00:13:30 - INFO - [79fff7f5-155c-4aab-a7fd-d432cf110dde] Extracting frames using method: uniform, rate/threshold: 30
2025-08-21 00:13:35 - INFO - [79fff7f5-155c-4aab-a7fd-d432cf110dde] Extracted 30 frames successfully. Saving to temporary files...
2025-08-21 00:13:35 - INFO - [79fff7f5-155c-4aab-a7fd-d432cf110dde] 30 frames saved to temp_videos/79fff7f5-155c-4aab-a7fd-d432cf110dde
2025-08-21 00:13:48 - INFO - vision_config is None, using default vision config
2025-08-21 00:14:06 - INFO - Tokens per second: 8.923322929883941, Peak GPU memory MB: 11824.375
2025-08-21 00:14:06 - INFO - [79fff7f5-155c-4aab-a7fd-d432cf110dde] Inference time: 36.23 seconds, CPU usage: 34.9%, CPU core utilization: [26.5, 17.6, 53.3, 42.2]
2025-08-21 00:14:06 - INFO - [79fff7f5-155c-4aab-a7fd-d432cf110dde] Cleaned up temporary frame directory: temp_videos/79fff7f5-155c-4aab-a7fd-d432cf110dde
2025-08-21 00:14:06 - INFO - [5a62181d-1328-4203-92fb-95497b40e1cf] Received new video inference request. Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_041.mp4'
2025-08-21 00:14:06 - INFO - [5a62181d-1328-4203-92fb-95497b40e1cf] Video saved to temporary file: temp_videos/5a62181d-1328-4203-92fb-95497b40e1cf.mp4
2025-08-21 00:14:06 - INFO - [5a62181d-1328-4203-92fb-95497b40e1cf] Extracting frames using method: uniform, rate/threshold: 30
2025-08-21 00:14:11 - INFO - [5a62181d-1328-4203-92fb-95497b40e1cf] Extracted 30 frames successfully. Saving to temporary files...
2025-08-21 00:14:11 - INFO - [5a62181d-1328-4203-92fb-95497b40e1cf] 30 frames saved to temp_videos/5a62181d-1328-4203-92fb-95497b40e1cf
2025-08-21 00:14:24 - INFO - vision_config is None, using default vision config
2025-08-21 00:14:37 - INFO - Tokens per second: 6.836469983622892, Peak GPU memory MB: 11824.375
2025-08-21 00:14:37 - INFO - [5a62181d-1328-4203-92fb-95497b40e1cf] Inference time: 31.03 seconds, CPU usage: 36.3%, CPU core utilization: [30.2, 49.8, 43.0, 22.1]
2025-08-21 00:14:37 - INFO - [5a62181d-1328-4203-92fb-95497b40e1cf] Cleaned up temporary frame directory: temp_videos/5a62181d-1328-4203-92fb-95497b40e1cf
2025-08-21 00:14:37 - INFO - [b8f2e591-45db-423c-9af0-ed74004baae0] Received new video inference request. Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_042.mp4'
2025-08-21 00:14:37 - INFO - [b8f2e591-45db-423c-9af0-ed74004baae0] Video saved to temporary file: temp_videos/b8f2e591-45db-423c-9af0-ed74004baae0.mp4
2025-08-21 00:14:37 - INFO - [b8f2e591-45db-423c-9af0-ed74004baae0] Extracting frames using method: uniform, rate/threshold: 30
2025-08-21 00:14:42 - INFO - [b8f2e591-45db-423c-9af0-ed74004baae0] Extracted 30 frames successfully. Saving to temporary files...
2025-08-21 00:14:42 - INFO - [b8f2e591-45db-423c-9af0-ed74004baae0] 30 frames saved to temp_videos/b8f2e591-45db-423c-9af0-ed74004baae0
2025-08-21 00:14:55 - INFO - vision_config is None, using default vision config
2025-08-21 00:15:07 - INFO - Tokens per second: 5.799046418435553, Peak GPU memory MB: 11824.375
2025-08-21 00:15:07 - INFO - [b8f2e591-45db-423c-9af0-ed74004baae0] Inference time: 29.23 seconds, CPU usage: 36.6%, CPU core utilization: [52.7, 42.9, 16.9, 34.0]
2025-08-21 00:15:07 - INFO - [b8f2e591-45db-423c-9af0-ed74004baae0] Cleaned up temporary frame directory: temp_videos/b8f2e591-45db-423c-9af0-ed74004baae0
2025-08-21 00:15:07 - INFO - [d32f2792-3f75-4bb2-803e-e3d8f5ab1a76] Received new video inference request. Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_043.mp4'
2025-08-21 00:15:07 - INFO - [d32f2792-3f75-4bb2-803e-e3d8f5ab1a76] Video saved to temporary file: temp_videos/d32f2792-3f75-4bb2-803e-e3d8f5ab1a76.mp4
2025-08-21 00:15:07 - INFO - [d32f2792-3f75-4bb2-803e-e3d8f5ab1a76] Extracting frames using method: uniform, rate/threshold: 30
2025-08-21 00:15:12 - INFO - [d32f2792-3f75-4bb2-803e-e3d8f5ab1a76] Extracted 30 frames successfully. Saving to temporary files...
2025-08-21 00:15:12 - INFO - [d32f2792-3f75-4bb2-803e-e3d8f5ab1a76] 30 frames saved to temp_videos/d32f2792-3f75-4bb2-803e-e3d8f5ab1a76
2025-08-21 00:15:25 - INFO - vision_config is None, using default vision config
2025-08-21 00:15:40 - INFO - Tokens per second: 7.766284842271324, Peak GPU memory MB: 11824.375
2025-08-21 00:15:40 - INFO - [d32f2792-3f75-4bb2-803e-e3d8f5ab1a76] Inference time: 32.92 seconds, CPU usage: 35.7%, CPU core utilization: [16.6, 43.3, 58.1, 24.8]
2025-08-21 00:15:40 - INFO - [d32f2792-3f75-4bb2-803e-e3d8f5ab1a76] Cleaned up temporary frame directory: temp_videos/d32f2792-3f75-4bb2-803e-e3d8f5ab1a76
2025-08-21 00:15:40 - INFO - [978e0423-475a-455d-aa66-ca499da86744] Received new video inference request. Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_044.mp4'
2025-08-21 00:15:40 - INFO - [978e0423-475a-455d-aa66-ca499da86744] Video saved to temporary file: temp_videos/978e0423-475a-455d-aa66-ca499da86744.mp4
2025-08-21 00:15:40 - INFO - [978e0423-475a-455d-aa66-ca499da86744] Extracting frames using method: uniform, rate/threshold: 30
2025-08-21 00:15:44 - INFO - [978e0423-475a-455d-aa66-ca499da86744] Extracted 30 frames successfully. Saving to temporary files...
2025-08-21 00:15:44 - INFO - [978e0423-475a-455d-aa66-ca499da86744] 30 frames saved to temp_videos/978e0423-475a-455d-aa66-ca499da86744
2025-08-21 00:15:57 - INFO - vision_config is None, using default vision config
2025-08-21 00:16:07 - INFO - Tokens per second: 4.5631272946545245, Peak GPU memory MB: 11824.375
2025-08-21 00:16:07 - INFO - [978e0423-475a-455d-aa66-ca499da86744] Inference time: 27.84 seconds, CPU usage: 37.3%, CPU core utilization: [25.7, 32.6, 18.2, 72.5]
2025-08-21 00:16:07 - INFO - [978e0423-475a-455d-aa66-ca499da86744] Cleaned up temporary frame directory: temp_videos/978e0423-475a-455d-aa66-ca499da86744
2025-08-21 00:16:07 - INFO - [e4056e7f-abce-447f-9556-fc5a853b32aa] Received new video inference request. Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_045.mp4'
2025-08-21 00:16:07 - INFO - [e4056e7f-abce-447f-9556-fc5a853b32aa] Video saved to temporary file: temp_videos/e4056e7f-abce-447f-9556-fc5a853b32aa.mp4
2025-08-21 00:16:07 - INFO - [e4056e7f-abce-447f-9556-fc5a853b32aa] Extracting frames using method: uniform, rate/threshold: 30
2025-08-21 00:16:12 - INFO - [e4056e7f-abce-447f-9556-fc5a853b32aa] Extracted 30 frames successfully. Saving to temporary files...
2025-08-21 00:16:12 - INFO - [e4056e7f-abce-447f-9556-fc5a853b32aa] 30 frames saved to temp_videos/e4056e7f-abce-447f-9556-fc5a853b32aa
2025-08-21 00:16:25 - INFO - vision_config is None, using default vision config
2025-08-21 00:16:39 - INFO - Tokens per second: 7.282653773730579, Peak GPU memory MB: 11824.375
2025-08-21 00:16:39 - INFO - [e4056e7f-abce-447f-9556-fc5a853b32aa] Inference time: 31.87 seconds, CPU usage: 36.1%, CPU core utilization: [40.4, 17.4, 32.9, 53.5]
2025-08-21 00:16:39 - INFO - [e4056e7f-abce-447f-9556-fc5a853b32aa] Cleaned up temporary frame directory: temp_videos/e4056e7f-abce-447f-9556-fc5a853b32aa
2025-08-21 00:16:39 - INFO - [5d08d2de-911a-498a-9b3a-f89597376f02] Received new video inference request. Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_046.mp4'
2025-08-21 00:16:39 - INFO - [5d08d2de-911a-498a-9b3a-f89597376f02] Video saved to temporary file: temp_videos/5d08d2de-911a-498a-9b3a-f89597376f02.mp4
2025-08-21 00:16:39 - INFO - [5d08d2de-911a-498a-9b3a-f89597376f02] Extracting frames using method: uniform, rate/threshold: 30
2025-08-21 00:16:44 - INFO - [5d08d2de-911a-498a-9b3a-f89597376f02] Extracted 30 frames successfully. Saving to temporary files...
2025-08-21 00:16:44 - INFO - [5d08d2de-911a-498a-9b3a-f89597376f02] 30 frames saved to temp_videos/5d08d2de-911a-498a-9b3a-f89597376f02
2025-08-21 00:16:57 - INFO - vision_config is None, using default vision config
2025-08-21 00:17:11 - INFO - Tokens per second: 7.413608779975096, Peak GPU memory MB: 11824.375
2025-08-21 00:17:11 - INFO - [5d08d2de-911a-498a-9b3a-f89597376f02] Inference time: 32.04 seconds, CPU usage: 36.2%, CPU core utilization: [15.1, 15.9, 55.1, 58.5]
2025-08-21 00:17:11 - INFO - [5d08d2de-911a-498a-9b3a-f89597376f02] Cleaned up temporary frame directory: temp_videos/5d08d2de-911a-498a-9b3a-f89597376f02
2025-08-21 00:17:11 - INFO - [72154fe0-fabb-4695-a74c-07b13b4d70ee] Received new video inference request. Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_047.mp4'
2025-08-21 00:17:11 - INFO - [72154fe0-fabb-4695-a74c-07b13b4d70ee] Video saved to temporary file: temp_videos/72154fe0-fabb-4695-a74c-07b13b4d70ee.mp4
2025-08-21 00:17:11 - INFO - [72154fe0-fabb-4695-a74c-07b13b4d70ee] Extracting frames using method: uniform, rate/threshold: 30
2025-08-21 00:17:16 - INFO - [72154fe0-fabb-4695-a74c-07b13b4d70ee] Extracted 30 frames successfully. Saving to temporary files...
2025-08-21 00:17:16 - INFO - [72154fe0-fabb-4695-a74c-07b13b4d70ee] 30 frames saved to temp_videos/72154fe0-fabb-4695-a74c-07b13b4d70ee
2025-08-21 00:17:29 - INFO - vision_config is None, using default vision config
2025-08-21 00:17:52 - INFO - Tokens per second: 9.744964420585076, Peak GPU memory MB: 11824.375
2025-08-21 00:17:52 - INFO - [72154fe0-fabb-4695-a74c-07b13b4d70ee] Inference time: 40.16 seconds, CPU usage: 34.8%, CPU core utilization: [16.9, 39.2, 15.3, 67.8]
2025-08-21 00:17:52 - INFO - [72154fe0-fabb-4695-a74c-07b13b4d70ee] Cleaned up temporary frame directory: temp_videos/72154fe0-fabb-4695-a74c-07b13b4d70ee
2025-08-21 00:17:52 - INFO - [66d33b91-3d66-4c03-8dce-a78ba0c64f20] Received new video inference request. Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_048.mp4'
2025-08-21 00:17:52 - INFO - [66d33b91-3d66-4c03-8dce-a78ba0c64f20] Video saved to temporary file: temp_videos/66d33b91-3d66-4c03-8dce-a78ba0c64f20.mp4
2025-08-21 00:17:52 - INFO - [66d33b91-3d66-4c03-8dce-a78ba0c64f20] Extracting frames using method: uniform, rate/threshold: 30
2025-08-21 00:17:56 - INFO - [66d33b91-3d66-4c03-8dce-a78ba0c64f20] Extracted 30 frames successfully. Saving to temporary files...
2025-08-21 00:17:56 - INFO - [66d33b91-3d66-4c03-8dce-a78ba0c64f20] 30 frames saved to temp_videos/66d33b91-3d66-4c03-8dce-a78ba0c64f20
2025-08-21 00:18:09 - INFO - vision_config is None, using default vision config
2025-08-21 00:18:21 - INFO - Tokens per second: 5.534022820709767, Peak GPU memory MB: 11824.375
2025-08-21 00:18:21 - INFO - [66d33b91-3d66-4c03-8dce-a78ba0c64f20] Inference time: 28.96 seconds, CPU usage: 36.9%, CPU core utilization: [64.7, 22.2, 29.9, 30.5]
2025-08-21 00:18:21 - INFO - [66d33b91-3d66-4c03-8dce-a78ba0c64f20] Cleaned up temporary frame directory: temp_videos/66d33b91-3d66-4c03-8dce-a78ba0c64f20
2025-08-21 00:18:21 - INFO - [f92833f5-5c53-4d98-bf8a-1686007f24e8] Received new video inference request. Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_049.mp4'
2025-08-21 00:18:21 - INFO - [f92833f5-5c53-4d98-bf8a-1686007f24e8] Video saved to temporary file: temp_videos/f92833f5-5c53-4d98-bf8a-1686007f24e8.mp4
2025-08-21 00:18:21 - INFO - [f92833f5-5c53-4d98-bf8a-1686007f24e8] Extracting frames using method: uniform, rate/threshold: 30
2025-08-21 00:18:25 - INFO - [f92833f5-5c53-4d98-bf8a-1686007f24e8] Extracted 30 frames successfully. Saving to temporary files...
2025-08-21 00:18:25 - INFO - [f92833f5-5c53-4d98-bf8a-1686007f24e8] 30 frames saved to temp_videos/f92833f5-5c53-4d98-bf8a-1686007f24e8
2025-08-21 00:18:38 - INFO - vision_config is None, using default vision config
2025-08-21 00:18:51 - INFO - Tokens per second: 6.507467905304537, Peak GPU memory MB: 11824.375
2025-08-21 00:18:51 - INFO - [f92833f5-5c53-4d98-bf8a-1686007f24e8] Inference time: 30.29 seconds, CPU usage: 36.6%, CPU core utilization: [53.7, 28.6, 33.4, 30.8]
2025-08-21 00:18:51 - INFO - [f92833f5-5c53-4d98-bf8a-1686007f24e8] Cleaned up temporary frame directory: temp_videos/f92833f5-5c53-4d98-bf8a-1686007f24e8
2025-08-21 00:18:51 - INFO - [08e97c3f-0eee-4b5d-a9f5-51984817e6f9] Received new video inference request. Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_050.mp4'
2025-08-21 00:18:51 - INFO - [08e97c3f-0eee-4b5d-a9f5-51984817e6f9] Video saved to temporary file: temp_videos/08e97c3f-0eee-4b5d-a9f5-51984817e6f9.mp4
2025-08-21 00:18:51 - INFO - [08e97c3f-0eee-4b5d-a9f5-51984817e6f9] Extracting frames using method: uniform, rate/threshold: 30
2025-08-21 00:18:56 - INFO - [08e97c3f-0eee-4b5d-a9f5-51984817e6f9] Extracted 30 frames successfully. Saving to temporary files...
2025-08-21 00:18:56 - INFO - [08e97c3f-0eee-4b5d-a9f5-51984817e6f9] 30 frames saved to temp_videos/08e97c3f-0eee-4b5d-a9f5-51984817e6f9
2025-08-21 00:19:09 - INFO - vision_config is None, using default vision config
2025-08-21 00:19:18 - INFO - Tokens per second: 3.2541179562626197, Peak GPU memory MB: 11824.375
2025-08-21 00:19:18 - INFO - [08e97c3f-0eee-4b5d-a9f5-51984817e6f9] Inference time: 26.65 seconds, CPU usage: 37.9%, CPU core utilization: [48.0, 17.9, 20.3, 65.1]
2025-08-21 00:19:18 - INFO - [08e97c3f-0eee-4b5d-a9f5-51984817e6f9] Cleaned up temporary frame directory: temp_videos/08e97c3f-0eee-4b5d-a9f5-51984817e6f9
2025-08-21 00:19:18 - INFO - [fa21ed16-4a4b-4252-a232-b1be5c853f7d] Received new video inference request. Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_051.mp4'
2025-08-21 00:19:18 - INFO - [fa21ed16-4a4b-4252-a232-b1be5c853f7d] Video saved to temporary file: temp_videos/fa21ed16-4a4b-4252-a232-b1be5c853f7d.mp4
2025-08-21 00:19:18 - INFO - [fa21ed16-4a4b-4252-a232-b1be5c853f7d] Extracting frames using method: uniform, rate/threshold: 30
2025-08-21 00:19:22 - INFO - [fa21ed16-4a4b-4252-a232-b1be5c853f7d] Extracted 30 frames successfully. Saving to temporary files...
2025-08-21 00:19:22 - INFO - [fa21ed16-4a4b-4252-a232-b1be5c853f7d] 30 frames saved to temp_videos/fa21ed16-4a4b-4252-a232-b1be5c853f7d
2025-08-21 00:19:35 - INFO - vision_config is None, using default vision config
2025-08-21 00:19:45 - INFO - Tokens per second: 3.912398850108666, Peak GPU memory MB: 11824.375
2025-08-21 00:19:45 - INFO - [fa21ed16-4a4b-4252-a232-b1be5c853f7d] Inference time: 27.18 seconds, CPU usage: 37.2%, CPU core utilization: [62.7, 17.9, 17.4, 50.5]
2025-08-21 00:19:45 - INFO - [fa21ed16-4a4b-4252-a232-b1be5c853f7d] Cleaned up temporary frame directory: temp_videos/fa21ed16-4a4b-4252-a232-b1be5c853f7d
2025-08-21 00:19:45 - INFO - [5e7d1173-00c2-46a2-81a0-1dc1223bd7c2] Received new video inference request. Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_052.mp4'
2025-08-21 00:19:45 - INFO - [5e7d1173-00c2-46a2-81a0-1dc1223bd7c2] Video saved to temporary file: temp_videos/5e7d1173-00c2-46a2-81a0-1dc1223bd7c2.mp4
2025-08-21 00:19:45 - INFO - [5e7d1173-00c2-46a2-81a0-1dc1223bd7c2] Extracting frames using method: uniform, rate/threshold: 30
2025-08-21 00:19:50 - INFO - [5e7d1173-00c2-46a2-81a0-1dc1223bd7c2] Extracted 30 frames successfully. Saving to temporary files...
2025-08-21 00:19:50 - INFO - [5e7d1173-00c2-46a2-81a0-1dc1223bd7c2] 30 frames saved to temp_videos/5e7d1173-00c2-46a2-81a0-1dc1223bd7c2
2025-08-21 00:20:02 - INFO - vision_config is None, using default vision config
2025-08-21 00:20:16 - INFO - Tokens per second: 7.248168780638238, Peak GPU memory MB: 11824.375
2025-08-21 00:20:16 - INFO - [5e7d1173-00c2-46a2-81a0-1dc1223bd7c2] Inference time: 31.69 seconds, CPU usage: 36.1%, CPU core utilization: [60.1, 41.3, 16.2, 26.8]
2025-08-21 00:20:16 - INFO - [5e7d1173-00c2-46a2-81a0-1dc1223bd7c2] Cleaned up temporary frame directory: temp_videos/5e7d1173-00c2-46a2-81a0-1dc1223bd7c2
2025-08-21 00:20:16 - INFO - [1456aa86-ac67-48b3-8739-8f5901221155] Received new video inference request. Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_053.mp4'
2025-08-21 00:20:16 - INFO - [1456aa86-ac67-48b3-8739-8f5901221155] Video saved to temporary file: temp_videos/1456aa86-ac67-48b3-8739-8f5901221155.mp4
2025-08-21 00:20:16 - INFO - [1456aa86-ac67-48b3-8739-8f5901221155] Extracting frames using method: uniform, rate/threshold: 30
2025-08-21 00:20:21 - INFO - [1456aa86-ac67-48b3-8739-8f5901221155] Extracted 30 frames successfully. Saving to temporary files...
2025-08-21 00:20:21 - INFO - [1456aa86-ac67-48b3-8739-8f5901221155] 30 frames saved to temp_videos/1456aa86-ac67-48b3-8739-8f5901221155
2025-08-21 00:20:34 - INFO - vision_config is None, using default vision config
2025-08-21 00:20:47 - INFO - Tokens per second: 6.420566489349132, Peak GPU memory MB: 11824.375
2025-08-21 00:20:47 - INFO - [1456aa86-ac67-48b3-8739-8f5901221155] Inference time: 30.22 seconds, CPU usage: 36.5%, CPU core utilization: [46.5, 21.6, 59.6, 18.1]
2025-08-21 00:20:47 - INFO - [1456aa86-ac67-48b3-8739-8f5901221155] Cleaned up temporary frame directory: temp_videos/1456aa86-ac67-48b3-8739-8f5901221155
2025-08-21 00:20:47 - INFO - [b6dffc6c-f438-4737-809c-69a392f5e9c0] Received new video inference request. Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_054.mp4'
2025-08-21 00:20:47 - INFO - [b6dffc6c-f438-4737-809c-69a392f5e9c0] Video saved to temporary file: temp_videos/b6dffc6c-f438-4737-809c-69a392f5e9c0.mp4
2025-08-21 00:20:47 - INFO - [b6dffc6c-f438-4737-809c-69a392f5e9c0] Extracting frames using method: uniform, rate/threshold: 30
2025-08-21 00:20:51 - INFO - [b6dffc6c-f438-4737-809c-69a392f5e9c0] Extracted 30 frames successfully. Saving to temporary files...
2025-08-21 00:20:51 - INFO - [b6dffc6c-f438-4737-809c-69a392f5e9c0] 30 frames saved to temp_videos/b6dffc6c-f438-4737-809c-69a392f5e9c0
2025-08-21 00:21:04 - INFO - vision_config is None, using default vision config
2025-08-21 00:21:15 - INFO - Tokens per second: 5.19902103533599, Peak GPU memory MB: 11824.375
2025-08-21 00:21:15 - INFO - [b6dffc6c-f438-4737-809c-69a392f5e9c0] Inference time: 28.49 seconds, CPU usage: 37.6%, CPU core utilization: [49.0, 34.0, 48.4, 19.0]
2025-08-21 00:21:15 - INFO - [b6dffc6c-f438-4737-809c-69a392f5e9c0] Cleaned up temporary frame directory: temp_videos/b6dffc6c-f438-4737-809c-69a392f5e9c0
2025-08-21 00:21:15 - INFO - [41c34e88-b17f-4653-a2d5-652b09643d82] Received new video inference request. Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_055.mp4'
2025-08-21 00:21:15 - INFO - [41c34e88-b17f-4653-a2d5-652b09643d82] Video saved to temporary file: temp_videos/41c34e88-b17f-4653-a2d5-652b09643d82.mp4
2025-08-21 00:21:15 - INFO - [41c34e88-b17f-4653-a2d5-652b09643d82] Extracting frames using method: uniform, rate/threshold: 30
2025-08-21 00:21:20 - INFO - [41c34e88-b17f-4653-a2d5-652b09643d82] Extracted 30 frames successfully. Saving to temporary files...
2025-08-21 00:21:20 - INFO - [41c34e88-b17f-4653-a2d5-652b09643d82] 30 frames saved to temp_videos/41c34e88-b17f-4653-a2d5-652b09643d82
2025-08-21 00:21:33 - INFO - vision_config is None, using default vision config
2025-08-21 00:21:46 - INFO - Tokens per second: 6.72473013041542, Peak GPU memory MB: 11824.375
2025-08-21 00:21:46 - INFO - [41c34e88-b17f-4653-a2d5-652b09643d82] Inference time: 30.70 seconds, CPU usage: 36.5%, CPU core utilization: [35.2, 45.0, 37.4, 28.4]
2025-08-21 00:21:46 - INFO - [41c34e88-b17f-4653-a2d5-652b09643d82] Cleaned up temporary frame directory: temp_videos/41c34e88-b17f-4653-a2d5-652b09643d82
2025-08-21 00:21:46 - INFO - [1d8dced7-a51d-4146-b969-9cf087ebf060] Received new video inference request. Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_056.mp4'
2025-08-21 00:21:46 - INFO - [1d8dced7-a51d-4146-b969-9cf087ebf060] Video saved to temporary file: temp_videos/1d8dced7-a51d-4146-b969-9cf087ebf060.mp4
2025-08-21 00:21:46 - INFO - [1d8dced7-a51d-4146-b969-9cf087ebf060] Extracting frames using method: uniform, rate/threshold: 30
2025-08-21 00:21:51 - INFO - [1d8dced7-a51d-4146-b969-9cf087ebf060] Extracted 30 frames successfully. Saving to temporary files...
2025-08-21 00:21:51 - INFO - [1d8dced7-a51d-4146-b969-9cf087ebf060] 30 frames saved to temp_videos/1d8dced7-a51d-4146-b969-9cf087ebf060
2025-08-21 00:22:04 - INFO - vision_config is None, using default vision config
2025-08-21 00:22:15 - INFO - Tokens per second: 5.483718629411901, Peak GPU memory MB: 11824.375
2025-08-21 00:22:15 - INFO - [1d8dced7-a51d-4146-b969-9cf087ebf060] Inference time: 28.89 seconds, CPU usage: 37.1%, CPU core utilization: [27.1, 19.3, 43.8, 58.1]
2025-08-21 00:22:15 - INFO - [1d8dced7-a51d-4146-b969-9cf087ebf060] Cleaned up temporary frame directory: temp_videos/1d8dced7-a51d-4146-b969-9cf087ebf060
2025-08-21 00:22:15 - INFO - [ccc3c8c3-225b-4c96-acfe-9c195f572e4a] Received new video inference request. Prompt: 'Summarize the key observable events in this 1-minute convenience store video clip. Focus strictly on the physical actions and interactions of the people. Describe only what you can see', Video: '/mnt/data/xiuying/Code/local_deploy/video/Clips_60s/sample_part_057.mp4'
2025-08-21 00:22:15 - INFO - [ccc3c8c3-225b-4c96-acfe-9c195f572e4a] Video saved to temporary file: temp_videos/ccc3c8c3-225b-4c96-acfe-9c195f572e4a.mp4
2025-08-21 00:22:15 - INFO - [ccc3c8c3-225b-4c96-acfe-9c195f572e4a] Extracting frames using method: uniform, rate/threshold: 30
2025-08-21 00:22:20 - INFO - [ccc3c8c3-225b-4c96-acfe-9c195f572e4a] Extracted 30 frames successfully. Saving to temporary files...
2025-08-21 00:22:20 - INFO - [ccc3c8c3-225b-4c96-acfe-9c195f572e4a] 30 frames saved to temp_videos/ccc3c8c3-225b-4c96-acfe-9c195f572e4a
2025-08-21 00:22:33 - INFO - vision_config is None, using default vision config
2025-08-21 00:22:49 - INFO - Tokens per second: 8.073006063593622, Peak GPU memory MB: 11824.375
2025-08-21 00:22:49 - INFO - [ccc3c8c3-225b-4c96-acfe-9c195f572e4a] Inference time: 33.72 seconds, CPU usage: 35.9%, CPU core utilization: [27.8, 59.7, 19.4, 37.0]
2025-08-21 00:22:49 - INFO - [ccc3c8c3-225b-4c96-acfe-9c195f572e4a] Cleaned up temporary frame directory: temp_videos/ccc3c8c3-225b-4c96-acfe-9c195f572e4a