360+x Dataset

For more information, please feel free to check our project page.

Overview

The 360+x dataset introduces a panoptic perspective to scene understanding. Unlike traditional datasets, it offers multiple viewpoints and multiple modalities, captured across a variety of scenes.

Key Features:

  • Multi-viewpoint Captures: Includes 360° panoramic video, third-person front view video, egocentric monocular video, and egocentric binocular video.
  • Rich Audio Modalities: Features normal audio and directional binaural delay.
  • Scale: 2,152 multi-modal videos captured by 360 cameras and Spectacles cameras (8,579k frames in total).
  • Diversity: Captured in 17 cities across 5 countries, covering 28 scenes ranging from Artistic Spaces to Natural Landscapes.
  • Action Temporal Segmentation: Provides labels for 38 action instances for each video pair.

About This Repo

This repository stores the pretrained models for the 360+x dataset. For the accompanying code, please see our official code repository.

Dataset Details

Project Description

  • Developed by: Hao Chen, Yuqi Hou, Chenyuan Qu, Irene Testini, Xiaohan Hong, Jianbo Jiao
  • Funded by: the Ramsay Research Fund, and the Royal Society Short Industry Fellowship
  • License: Creative Commons Attribution-NonCommercial-ShareAlike 4.0

Dataset Statistics

  • Total Videos: 2,152, split between 464 videos captured using 360 cameras and 1,688 with Spectacles cameras.
  • Scenes: 15 indoor and 13 outdoor, totaling 28 scene categories.
  • Short Clips: The videos have been segmented into 1,380 shorter clips, each approximately 10 seconds long, totaling around 67.78 hours.
  • Frames: 8,579k frames across all clips.
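The counts above can be cross-checked with a few lines of Python. This is only an illustrative sketch: the `DatasetStats` class and its field names are hypothetical and not part of any 360+x tooling; the numbers are copied from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetStats:
    """Hypothetical container for the headline numbers quoted in this card."""
    videos_360: int          # videos captured with 360 cameras
    videos_spectacles: int   # videos captured with Spectacles cameras
    indoor_scenes: int
    outdoor_scenes: int
    total_frames: int        # "8,579k frames" written out in full

    @property
    def total_videos(self) -> int:
        return self.videos_360 + self.videos_spectacles

    @property
    def total_scenes(self) -> int:
        return self.indoor_scenes + self.outdoor_scenes


STATS = DatasetStats(
    videos_360=464,
    videos_spectacles=1688,
    indoor_scenes=15,
    outdoor_scenes=13,
    total_frames=8_579_000,
)

# The per-device and per-setting counts should add up to the stated totals.
assert STATS.total_videos == 2152
assert STATS.total_scenes == 28
```

Keeping the figures in one place like this makes it easy to compare them against a local copy of the data after download.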

Dataset Structure

Our dataset offers a comprehensive collection of panoramic videos, binocular videos, and third-person videos, each pair of videos accompanied by annotations. Additionally, it includes features extracted using I3D, VGGish, and ResNet-18. Given the high-resolution nature of our dataset (5760x2880 for panoramic and binocular videos, 1920x1080 for third-person front view videos), the overall size is considerably large. To accommodate diverse research needs and computational resources, we also provide a lower-resolution version of the dataset (640x320 for panoramic and binocular videos, 569x320 for third-person front view videos) available for download.
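The two resolution tiers appear to be related by a fixed target height of 320 pixels with the aspect ratio preserved. The card does not state how the downscaling was performed, so the sketch below is an assumption, and `scale_to_height` is a hypothetical helper rather than part of the dataset tooling.

```python
def scale_to_height(width: int, height: int, target_height: int) -> tuple[int, int]:
    """Scale (width, height) to target_height, keeping the aspect ratio.

    Assumption: the width is rounded to the nearest integer pixel.
    """
    new_width = round(width * target_height / height)
    return new_width, target_height


# Panoramic and binocular videos: 5760x2880 at full resolution.
panoramic_low = scale_to_height(5760, 2880, 320)

# Third-person front view videos: 1920x1080 at full resolution.
front_view_low = scale_to_height(1920, 1080, 320)

# Both results match the low-resolution sizes quoted above.
assert panoramic_low == (640, 320)
assert front_view_low == (569, 320)
```

Under this assumption the panoramic and binocular videos shrink by exactly 9x per side, while the front view width of 569 comes from rounding 568.9.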

In this repo, we provide the lower-resolution version of the dataset. To access the high-resolution version, please visit the official website.

BibTeX

@inproceedings{chen2024x360,
  title={360+x: A Panoptic Multi-modal Scene Understanding Dataset},
  author={Chen, Hao and Hou, Yuqi and Qu, Chenyuan and Testini, Irene and Hong, Xiaohan and Jiao, Jianbo},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2024}
}