new

Get trending papers in your email inbox!

Subscribe

Daily Papers

byAK and the research community

Feb 17

PointRAFT: 3D deep learning for high-throughput prediction of potato tuber weight from partial point clouds

Potato yield is a key indicator for optimizing cultivation practices in agriculture. Potato yield can be estimated on harvesters using RGB-D cameras, which capture three-dimensional (3D) information of individual tubers moving along the conveyor belt. However, point clouds reconstructed from RGB-D images are incomplete due to self-occlusion, leading to systematic underestimation of tuber weight. To address this, we introduce PointRAFT, a high-throughput point cloud regression network that directly predicts continuous 3D shape properties, such as tuber weight, from partial point clouds. Rather than reconstructing full 3D geometry, PointRAFT infers target values directly from raw 3D data. Its key architectural novelty is an object height embedding that incorporates tuber height as an additional geometric cue, improving weight prediction under practical harvesting conditions. PointRAFT was trained and evaluated on 26,688 partial point clouds collected from 859 potato tubers across four cultivars and three growing seasons on an operational harvester in Japan. On a test set of 5,254 point clouds from 172 tubers, PointRAFT achieved a mean absolute error of 12.0 g and a root mean squared error of 17.2 g, substantially outperforming a linear regression baseline and a standard PointNet++ regression network. With an average inference time of 6.3 ms per point cloud, PointRAFT supports processing rates of up to 150 tubers per second, meeting the high-throughput requirements of commercial potato harvesters. Beyond potato weight estimation, PointRAFT provides a versatile regression network applicable to a wide range of 3D phenotyping and robotic perception tasks. The code, network weights, and a subset of the dataset are publicly available at https://github.com/pieterblok/pointraft.git.

  • 6 authors
·
Dec 30, 2025

HiPoNet: A Multi-View Simplicial Complex Network for High Dimensional Point-Cloud and Single-Cell Data

In this paper, we propose HiPoNet, an end-to-end differentiable neural network for regression, classification, and representation learning on high-dimensional point clouds. Our work is motivated by single-cell data which can have very high-dimensionality --exceeding the capabilities of existing methods for point clouds which are mostly tailored for 3D data. Moreover, modern single-cell and spatial experiments now yield entire cohorts of datasets (i.e., one data set for every patient), necessitating models that can process large, high-dimensional point-clouds at scale. Most current approaches build a single nearest-neighbor graph, discarding important geometric and topological information. In contrast, HiPoNet models the point-cloud as a set of higher-order simplicial complexes, with each particular complex being created using a reweighting of features. This method thus generates multiple constructs corresponding to different views of high-dimensional data, which in biology offers the possibility of disentangling distinct cellular processes. It then employs simplicial wavelet transforms to extract multiscale features, capturing both local and global topology from each view. We show that geometric and topological information is preserved in this framework both theoretically and empirically. We showcase the utility of HiPoNet on point-cloud level tasks, involving classification and regression of entire point-clouds in data cohorts. Experimentally, we find that HiPoNet outperforms other point-cloud and graph-based models on single-cell data. We also apply HiPoNet to spatial transcriptomics datasets using spatial coordinates as one of the views. Overall, HiPoNet offers a robust and scalable solution for high-dimensional data analysis.

  • 10 authors
·
Feb 11, 2025

Enhancing Sampling Protocol for Point Cloud Classification Against Corruptions

Established sampling protocols for 3D point cloud learning, such as Farthest Point Sampling (FPS) and Fixed Sample Size (FSS), have long been relied upon. However, real-world data often suffer from corruptions, such as sensor noise, which violates the benign data assumption in current protocols. As a result, these protocols are highly vulnerable to noise, posing significant safety risks in critical applications like autonomous driving. To address these issues, we propose an enhanced point cloud sampling protocol, PointSP, designed to improve robustness against point cloud corruptions. PointSP incorporates key point reweighting to mitigate outlier sensitivity and ensure the selection of representative points. It also introduces a local-global balanced downsampling strategy, which allows for scalable and adaptive sampling while maintaining geometric consistency. Additionally, a lightweight tangent plane interpolation method is used to preserve local geometry while enhancing the density of the point cloud. Unlike learning-based approaches that require additional model training, PointSP is architecture-agnostic, requiring no extra learning or modification to the network. This enables seamless integration into existing pipelines. Extensive experiments on synthetic and real-world corrupted datasets show that PointSP significantly improves the robustness and accuracy of point cloud classification, outperforming state-of-the-art methods across multiple benchmarks.

  • 5 authors
·
Aug 21, 2024

AutoSynth: Learning to Generate 3D Training Data for Object Point Cloud Registration

In the current deep learning paradigm, the amount and quality of training data are as critical as the network architecture and its training details. However, collecting, processing, and annotating real data at scale is difficult, expensive, and time-consuming, particularly for tasks such as 3D object registration. While synthetic datasets can be created, they require expertise to design and include a limited number of categories. In this paper, we introduce a new approach called AutoSynth, which automatically generates 3D training data for point cloud registration. Specifically, AutoSynth automatically curates an optimal dataset by exploring a search space encompassing millions of potential datasets with diverse 3D shapes at a low cost.To achieve this, we generate synthetic 3D datasets by assembling shape primitives, and develop a meta-learning strategy to search for the best training data for 3D registration on real point clouds. For this search to remain tractable, we replace the point cloud registration network with a much smaller surrogate network, leading to a 4056.43 times speedup. We demonstrate the generality of our approach by implementing it with two different point cloud registration networks, BPNet and IDAM. Our results on TUD-L, LINEMOD and Occluded-LINEMOD evidence that a neural network trained on our searched dataset yields consistently better performance than the same one trained on the widely used ModelNet40 dataset.

  • 2 authors
·
Sep 20, 2023

SCOOP: Self-Supervised Correspondence and Optimization-Based Scene Flow

Scene flow estimation is a long-standing problem in computer vision, where the goal is to find the 3D motion of a scene from its consecutive observations. Recently, there have been efforts to compute the scene flow from 3D point clouds. A common approach is to train a regression model that consumes source and target point clouds and outputs the per-point translation vector. An alternative is to learn point matches between the point clouds concurrently with regressing a refinement of the initial correspondence flow. In both cases, the learning task is very challenging since the flow regression is done in the free 3D space, and a typical solution is to resort to a large annotated synthetic dataset. We introduce SCOOP, a new method for scene flow estimation that can be learned on a small amount of data without employing ground-truth flow supervision. In contrast to previous work, we train a pure correspondence model focused on learning point feature representation and initialize the flow as the difference between a source point and its softly corresponding target point. Then, in the run-time phase, we directly optimize a flow refinement component with a self-supervised objective, which leads to a coherent and accurate flow field between the point clouds. Experiments on widespread datasets demonstrate the performance gains achieved by our method compared to existing leading techniques while using a fraction of the training data. Our code is publicly available at https://github.com/itailang/SCOOP.

  • 5 authors
·
Nov 25, 2022

Parameter-Efficient Fine-Tuning in Spectral Domain for Point Cloud Learning

Recently, leveraging pre-training techniques to enhance point cloud models has become a prominent research topic. However, existing approaches typically require full fine-tuning of pre-trained models to achieve satisfactory performance on downstream tasks, which is storage-intensive and computationally demanding. To address this issue, we propose a novel Parameter-Efficient Fine-Tuning (PEFT) method for point cloud, called PointGST (Point cloud Graph Spectral Tuning). PointGST freezes the pre-trained model and introduces a lightweight, trainable Point Cloud Spectral Adapter (PCSA) for fine-tuning parameters in the spectral domain. The core idea is built on two observations: 1) The inner tokens from frozen models might present confusion in the spatial domain; 2) Task-specific intrinsic information is important for transferring the general knowledge to the downstream task. Specifically, PointGST transfers the point tokens from the spatial domain to the spectral domain, effectively de-correlating confusion among tokens by using orthogonal components for separation. Moreover, the generated spectral basis involves intrinsic information about the downstream point clouds, enabling more targeted tuning. As a result, PointGST facilitates the efficient transfer of general knowledge to downstream tasks while significantly reducing training costs. Extensive experiments on challenging point cloud datasets across various tasks demonstrate that PointGST not only outperforms its fully fine-tuning counterpart but also significantly reduces trainable parameters, making it a promising solution for efficient point cloud learning. The code will be made available at https://github.com/jerryfeng2003/PointGST

  • 6 authors
·
Oct 10, 2024

PointNSP: Autoregressive 3D Point Cloud Generation with Next-Scale Level-of-Detail Prediction

Autoregressive point cloud generation has long lagged behind diffusion-based approaches in quality. The performance gap stems from the fact that autoregressive models impose an artificial ordering on inherently unordered point sets, forcing shape generation to proceed as a sequence of local predictions. This sequential bias emphasizes short-range continuity but undermines the model's capacity to capture long-range dependencies, hindering its ability to enforce global structural properties such as symmetry, consistent topology, and large-scale geometric regularities. Inspired by the level-of-detail (LOD) principle in shape modeling, we propose PointNSP, a coarse-to-fine generative framework that preserves global shape structure at low resolutions and progressively refines fine-grained geometry at higher scales through a next-scale prediction paradigm. This multi-scale factorization aligns the autoregressive objective with the permutation-invariant nature of point sets, enabling rich intra-scale interactions while avoiding brittle fixed orderings. Experiments on ShapeNet show that PointNSP establishes state-of-the-art (SOTA) generation quality for the first time within the autoregressive paradigm. In addition, it surpasses strong diffusion-based baselines in parameter, training, and inference efficiency. Finally, in dense generation with 8,192 points, PointNSP's advantages become even more pronounced, underscoring its scalability potential.

  • 7 authors
·
Mar 11, 2025

RAR: Region-Aware Point Cloud Registration

This paper concerns the research problem of point cloud registration to find the rigid transformation to optimally align the source point set with the target one. Learning robust point cloud registration models with deep neural networks has emerged as a powerful paradigm, offering promising performance in predicting the global geometric transformation for a pair of point sets. Existing methods firstly leverage an encoder to regress a latent shape embedding, which is then decoded into a shape-conditioned transformation via concatenation-based conditioning. However, different regions of a 3D shape vary in their geometric structures which makes it more sense that we have a region-conditioned transformation instead of the shape-conditioned one. In this paper we present a Region-Aware point cloud Registration, denoted as RAR, to predict transformation for pairwise point sets in the self-supervised learning fashion. More specifically, we develop a novel region-aware decoder (RAD) module that is formed with an implicit neural region representation parameterized by neural networks. The implicit neural region representation is learned with a self-supervised 3D shape reconstruction loss without the need for region labels. Consequently, the region-aware decoder (RAD) module guides the training of the region-aware transformation (RAT) module and region-aware weight (RAW) module, which predict the transforms and weights for different regions respectively. The global geometric transformation from source point set to target one is then formed by the weighted fusion of region-aware transforms. Compared to the state-of-the-art approaches, our experiments show that our RAR achieves superior registration performance over various benchmark datasets (e.g. ModelNet40).

  • 2 authors
·
Oct 7, 2021

GeoMAE: Masked Geometric Target Prediction for Self-supervised Point Cloud Pre-Training

This paper tries to address a fundamental question in point cloud self-supervised learning: what is a good signal we should leverage to learn features from point clouds without annotations? To answer that, we introduce a point cloud representation learning framework, based on geometric feature reconstruction. In contrast to recent papers that directly adopt masked autoencoder (MAE) and only predict original coordinates or occupancy from masked point clouds, our method revisits differences between images and point clouds and identifies three self-supervised learning objectives peculiar to point clouds, namely centroid prediction, normal estimation, and curvature prediction. Combined with occupancy prediction, these four objectives yield an nontrivial self-supervised learning task and mutually facilitate models to better reason fine-grained geometry of point clouds. Our pipeline is conceptually simple and it consists of two major steps: first, it randomly masks out groups of points, followed by a Transformer-based point cloud encoder; second, a lightweight Transformer decoder predicts centroid, normal, and curvature for points in each voxel. We transfer the pre-trained Transformer encoder to a downstream peception model. On the nuScene Datset, our model achieves 3.38 mAP improvment for object detection, 2.1 mIoU gain for segmentation, and 1.7 AMOTA gain for multi-object tracking. We also conduct experiments on the Waymo Open Dataset and achieve significant performance improvements over baselines as well.

  • 4 authors
·
May 15, 2023

P2P: Tuning Pre-trained Image Models for Point Cloud Analysis with Point-to-Pixel Prompting

Nowadays, pre-training big models on large-scale datasets has become a crucial topic in deep learning. The pre-trained models with high representation ability and transferability achieve a great success and dominate many downstream tasks in natural language processing and 2D vision. However, it is non-trivial to promote such a pretraining-tuning paradigm to the 3D vision, given the limited training data that are relatively inconvenient to collect. In this paper, we provide a new perspective of leveraging pre-trained 2D knowledge in 3D domain to tackle this problem, tuning pre-trained image models with the novel Point-to-Pixel prompting for point cloud analysis at a minor parameter cost. Following the principle of prompting engineering, we transform point clouds into colorful images with geometry-preserved projection and geometry-aware coloring to adapt to pre-trained image models, whose weights are kept frozen during the end-to-end optimization of point cloud analysis tasks. We conduct extensive experiments to demonstrate that cooperating with our proposed Point-to-Pixel Prompting, better pre-trained image model will lead to consistently better performance in 3D vision. Enjoying prosperous development from image pre-training field, our method attains 89.3% accuracy on the hardest setting of ScanObjectNN, surpassing conventional point cloud models with much fewer trainable parameters. Our framework also exhibits very competitive performance on ModelNet classification and ShapeNet Part Segmentation. Code is available at https://github.com/wangzy22/P2P.

  • 5 authors
·
Aug 4, 2022

Pushing Auto-regressive Models for 3D Shape Generation at Capacity and Scalability

Auto-regressive models have achieved impressive results in 2D image generation by modeling joint distributions in grid space. In this paper, we extend auto-regressive models to 3D domains, and seek a stronger ability of 3D shape generation by improving auto-regressive models at capacity and scalability simultaneously. Firstly, we leverage an ensemble of publicly available 3D datasets to facilitate the training of large-scale models. It consists of a comprehensive collection of approximately 900,000 objects, with multiple properties of meshes, points, voxels, rendered images, and text captions. This diverse labeled dataset, termed Objaverse-Mix, empowers our model to learn from a wide range of object variations. However, directly applying 3D auto-regression encounters critical challenges of high computational demands on volumetric grids and ambiguous auto-regressive order along grid dimensions, resulting in inferior quality of 3D shapes. To this end, we then present a novel framework Argus3D in terms of capacity. Concretely, our approach introduces discrete representation learning based on a latent vector instead of volumetric grids, which not only reduces computational costs but also preserves essential geometric details by learning the joint distributions in a more tractable order. The capacity of conditional generation can thus be realized by simply concatenating various conditioning inputs to the latent vector, such as point clouds, categories, images, and texts. In addition, thanks to the simplicity of our model architecture, we naturally scale up our approach to a larger model with an impressive 3.6 billion parameters, further enhancing the quality of versatile 3D generation. Extensive experiments on four generation tasks demonstrate that Argus3D can synthesize diverse and faithful shapes across multiple categories, achieving remarkable performance.

  • 12 authors
·
Feb 19, 2024 1

GDRNPP: A Geometry-guided and Fully Learning-based Object Pose Estimator

6D pose estimation of rigid objects is a long-standing and challenging task in computer vision. Recently, the emergence of deep learning reveals the potential of Convolutional Neural Networks (CNNs) to predict reliable 6D poses. Given that direct pose regression networks currently exhibit suboptimal performance, most methods still resort to traditional techniques to varying degrees. For example, top-performing methods often adopt an indirect strategy by first establishing 2D-3D or 3D-3D correspondences followed by applying the RANSAC-based PnP or Kabsch algorithms, and further employing ICP for refinement. Despite the performance enhancement, the integration of traditional techniques makes the networks time-consuming and not end-to-end trainable. Orthogonal to them, this paper introduces a fully learning-based object pose estimator. In this work, we first perform an in-depth investigation of both direct and indirect methods and propose a simple yet effective Geometry-guided Direct Regression Network (GDRN) to learn the 6D pose from monocular images in an end-to-end manner. Afterwards, we introduce a geometry-guided pose refinement module, enhancing pose accuracy when extra depth data is available. Guided by the predicted coordinate map, we build an end-to-end differentiable architecture that establishes robust and accurate 3D-3D correspondences between the observed and rendered RGB-D images to refine the pose. Our enhanced pose estimation pipeline GDRNPP (GDRN Plus Plus) conquered the leaderboard of the BOP Challenge for two consecutive years, becoming the first to surpass all prior methods that relied on traditional techniques in both accuracy and speed. The code and models are available at https://github.com/shanice-l/gdrnpp_bop2022.

  • 7 authors
·
Feb 24, 2021

Revisiting Transformation Invariant Geometric Deep Learning: Are Initial Representations All You Need?

Geometric deep learning, i.e., designing neural networks to handle the ubiquitous geometric data such as point clouds and graphs, have achieved great successes in the last decade. One critical inductive bias is that the model can maintain invariance towards various transformations such as translation, rotation, and scaling. The existing graph neural network (GNN) approaches can only maintain permutation-invariance, failing to guarantee invariance with respect to other transformations. Besides GNNs, other works design sophisticated transformation-invariant layers, which are computationally expensive and difficult to be extended. To solve this problem, we revisit why the existing neural networks cannot maintain transformation invariance when handling geometric data. Our findings show that transformation-invariant and distance-preserving initial representations are sufficient to achieve transformation invariance rather than needing sophisticated neural layer designs. Motivated by these findings, we propose Transformation Invariant Neural Networks (TinvNN), a straightforward and general framework for geometric data. Specifically, we realize transformation-invariant and distance-preserving initial point representations by modifying multi-dimensional scaling before feeding the representations into neural networks. We prove that TinvNN can strictly guarantee transformation invariance, being general and flexible enough to be combined with the existing neural networks. Extensive experimental results on point cloud analysis and combinatorial optimization demonstrate the effectiveness and general applicability of our proposed method. Based on the experimental results, we advocate that TinvNN should be considered a new starting point and an essential baseline for further studies of transformation-invariant geometric deep learning.

  • 5 authors
·
Dec 22, 2021

Clustering based Point Cloud Representation Learning for 3D Analysis

Point cloud analysis (such as 3D segmentation and detection) is a challenging task, because of not only the irregular geometries of many millions of unordered points, but also the great variations caused by depth, viewpoint, occlusion, etc. Current studies put much focus on the adaption of neural networks to the complex geometries of point clouds, but are blind to a fundamental question: how to learn an appropriate point embedding space that is aware of both discriminative semantics and challenging variations? As a response, we propose a clustering based supervised learning scheme for point cloud analysis. Unlike current de-facto, scene-wise training paradigm, our algorithm conducts within-class clustering on the point embedding space for automatically discovering subclass patterns which are latent yet representative across scenes. The mined patterns are, in turn, used to repaint the embedding space, so as to respect the underlying distribution of the entire training dataset and improve the robustness to the variations. Our algorithm is principled and readily pluggable to modern point cloud segmentation networks during training, without extra overhead during testing. With various 3D network architectures (i.e., voxel-based, point-based, Transformer-based, automatically searched), our algorithm shows notable improvements on famous point cloud segmentation datasets (i.e.,2.0-2.6% on single-scan and 2.0-2.2% multi-scan of SemanticKITTI, 1.8-1.9% on S3DIS, in terms of mIoU). Our algorithm also demonstrates utility in 3D detection, showing 2.0-3.4% mAP gains on KITTI.

  • 5 authors
·
Jul 26, 2023

SHS-Net: Learning Signed Hyper Surfaces for Oriented Normal Estimation of Point Clouds

We propose a novel method called SHS-Net for oriented normal estimation of point clouds by learning signed hyper surfaces, which can accurately predict normals with global consistent orientation from various point clouds. Almost all existing methods estimate oriented normals through a two-stage pipeline, i.e., unoriented normal estimation and normal orientation, and each step is implemented by a separate algorithm. However, previous methods are sensitive to parameter settings, resulting in poor results from point clouds with noise, density variations and complex geometries. In this work, we introduce signed hyper surfaces (SHS), which are parameterized by multi-layer perceptron (MLP) layers, to learn to estimate oriented normals from point clouds in an end-to-end manner. The signed hyper surfaces are implicitly learned in a high-dimensional feature space where the local and global information is aggregated. Specifically, we introduce a patch encoding module and a shape encoding module to encode a 3D point cloud into a local latent code and a global latent code, respectively. Then, an attention-weighted normal prediction module is proposed as a decoder, which takes the local and global latent codes as input to predict oriented normals. Experimental results show that our SHS-Net outperforms the state-of-the-art methods in both unoriented and oriented normal estimation on the widely used benchmarks. The code, data and pretrained models are publicly available.

  • 7 authors
·
May 9, 2023

Point-MoE: Towards Cross-Domain Generalization in 3D Semantic Segmentation via Mixture-of-Experts

While scaling laws have transformed natural language processing and computer vision, 3D point cloud understanding has yet to reach that stage. This can be attributed to both the comparatively smaller scale of 3D datasets, as well as the disparate sources of the data itself. Point clouds are captured by diverse sensors (e.g., depth cameras, LiDAR) across varied domains (e.g., indoor, outdoor), each introducing unique scanning patterns, sampling densities, and semantic biases. Such domain heterogeneity poses a major barrier towards training unified models at scale, especially under the realistic constraint that domain labels are typically inaccessible at inference time. In this work, we propose Point-MoE, a Mixture-of-Experts architecture designed to enable large-scale, cross-domain generalization in 3D perception. We show that standard point cloud backbones degrade significantly in performance when trained on mixed-domain data, whereas Point-MoE with a simple top-k routing strategy can automatically specialize experts, even without access to domain labels. Our experiments demonstrate that Point-MoE not only outperforms strong multi-domain baselines but also generalizes better to unseen domains. This work highlights a scalable path forward for 3D understanding: letting the model discover structure in diverse 3D data, rather than imposing it via manual curation or domain supervision.

  • 4 authors
·
May 29, 2025 2

HyMamba: Mamba with Hybrid Geometry-Feature Coupling for Efficient Point Cloud Classification

Point cloud classification is one of the essential technologies for achieving intelligent perception of 3D environments by machines, its core challenge is to efficiently extract local and global features. Mamba leverages state space models (SSMs) for global point cloud modeling. Although prior Mamba-based point cloud processing methods pay attention to the limitation of its flattened sequence modeling mechanism in fusing local and global features, the critical issue of weakened local geometric relevance caused by decoupling geometric structures and features in the input patches remains not fully revealed, and both jointly limit local feature extraction. Therefore, we propose HyMamba, a geometry and feature coupled Mamba framework featuring: (1) Geometry-Feature Coupled Pooling (GFCP), which achieves physically interpretable geometric information coupling by dynamically aggregating adjacent geometric information into local features; (2) Collaborative Feature Enhancer (CoFE), which enhances sparse signal capture through cross-path feature hybridization while effectively integrating global and local contexts. We conducted extensive experiments on ModelNet40 and ScanObjectNN datasets. The results demonstrate that the proposed model achieves superior classification performance, particularly on the ModelNet40, where it elevates accuracy to 95.99% with merely 0.03M additional parameters. Furthermore, it attains 98.9% accuracy on the ModelNetFewShot dataset, validating its robust generalization capabilities under sparse samples. Our code and weights are available at https://github.com/L1277471578/HyMamba

  • 5 authors
·
May 16, 2025

MMGP: a Mesh Morphing Gaussian Process-based machine learning method for regression of physical problems under non-parameterized geometrical variability

When learning simulations for modeling physical phenomena in industrial designs, geometrical variabilities are of prime interest. While classical regression techniques prove effective for parameterized geometries, practical scenarios often involve the absence of shape parametrization during the inference stage, leaving us with only mesh discretizations as available data. Learning simulations from such mesh-based representations poses significant challenges, with recent advances relying heavily on deep graph neural networks to overcome the limitations of conventional machine learning approaches. Despite their promising results, graph neural networks exhibit certain drawbacks, including their dependency on extensive datasets and limitations in providing built-in predictive uncertainties or handling large meshes. In this work, we propose a machine learning method that do not rely on graph neural networks. Complex geometrical shapes and variations with fixed topology are dealt with using well-known mesh morphing onto a common support, combined with classical dimensionality reduction techniques and Gaussian processes. The proposed methodology can easily deal with large meshes without the need for explicit shape parameterization and provides crucial predictive uncertainties, which are essential for informed decision-making. In the considered numerical experiments, the proposed method is competitive with respect to existing graph neural networks, regarding training efficiency and accuracy of the predictions.

  • 3 authors
·
May 22, 2023

Gaussian2Scene: 3D Scene Representation Learning via Self-supervised Learning with 3D Gaussian Splatting

Self-supervised learning (SSL) for point cloud pre-training has become a cornerstone for many 3D vision tasks, enabling effective learning from large-scale unannotated data. At the scene level, existing SSL methods often incorporate volume rendering into the pre-training framework, using RGB-D images as reconstruction signals to facilitate cross-modal learning. This strategy promotes alignment between 2D and 3D modalities and enables the model to benefit from rich visual cues in the RGB-D inputs. However, these approaches are limited by their reliance on implicit scene representations and high memory demands. Furthermore, since their reconstruction objectives are applied only in 2D space, they often fail to capture underlying 3D geometric structures. To address these challenges, we propose Gaussian2Scene, a novel scene-level SSL framework that leverages the efficiency and explicit nature of 3D Gaussian Splatting (3DGS) for pre-training. The use of 3DGS not only alleviates the computational burden associated with volume rendering but also supports direct 3D scene reconstruction, thereby enhancing the geometric understanding of the backbone network. Our approach follows a progressive two-stage training strategy. In the first stage, a dual-branch masked autoencoder learns both 2D and 3D scene representations. In the second stage, we initialize training with reconstructed point clouds and further supervise learning using the geometric locations of Gaussian primitives and rendered RGB images. This process reinforces both geometric and cross-modal learning. We demonstrate the effectiveness of Gaussian2Scene across several downstream 3D object detection tasks, showing consistent improvements over existing pre-training methods.

  • 4 authors
·
Jun 10, 2025

Hierarchical Point-based Active Learning for Semi-supervised Point Cloud Semantic Segmentation

Impressive performance on point cloud semantic segmentation has been achieved by fully-supervised methods with large amounts of labelled data. As it is labour-intensive to acquire large-scale point cloud data with point-wise labels, many attempts have been made to explore learning 3D point cloud segmentation with limited annotations. Active learning is one of the effective strategies to achieve this purpose but is still under-explored. The most recent methods of this kind measure the uncertainty of each pre-divided region for manual labelling but they suffer from redundant information and require additional efforts for region division. This paper aims at addressing this issue by developing a hierarchical point-based active learning strategy. Specifically, we measure the uncertainty for each point by a hierarchical minimum margin uncertainty module which considers the contextual information at multiple levels. Then, a feature-distance suppression strategy is designed to select important and representative points for manual labelling. Besides, to better exploit the unlabelled data, we build a semi-supervised segmentation framework based on our active strategy. Extensive experiments on the S3DIS and ScanNetV2 datasets demonstrate that the proposed framework achieves 96.5% and 100% performance of fully-supervised baseline with only 0.07% and 0.1% training data, respectively, outperforming the state-of-the-art weakly-supervised and active learning methods. The code will be available at https://github.com/SmiletoE/HPAL.

  • 5 authors
·
Aug 21, 2023

Transformers Can Do Bayesian Inference

Currently, it is hard to reap the benefits of deep learning for Bayesian methods, which allow the explicit specification of prior knowledge and accurately capture model uncertainty. We present Prior-Data Fitted Networks (PFNs). PFNs leverage large-scale machine learning techniques to approximate a large set of posteriors. The only requirement for PFNs to work is the ability to sample from a prior distribution over supervised learning tasks (or functions). Our method restates the objective of posterior approximation as a supervised classification problem with a set-valued input: it repeatedly draws a task (or function) from the prior, draws a set of data points and their labels from it, masks one of the labels and learns to make probabilistic predictions for it based on the set-valued input of the rest of the data points. Presented with a set of samples from a new supervised learning task as input, PFNs make probabilistic predictions for arbitrary other data points in a single forward propagation, having learned to approximate Bayesian inference. We demonstrate that PFNs can near-perfectly mimic Gaussian processes and also enable efficient Bayesian inference for intractable problems, with over 200-fold speedups in multiple setups compared to current methods. We obtain strong results in very diverse areas such as Gaussian process regression, Bayesian neural networks, classification for small tabular data sets, and few-shot image classification, demonstrating the generality of PFNs. Code and trained PFNs are released at https://github.com/automl/TransformersCanDoBayesianInference.

  • 5 authors
·
Dec 20, 2021

Point Cloud Mamba: Point Cloud Learning via State Space Model

Recently, state space models have exhibited strong global modeling capabilities and linear computational complexity in contrast to transformers. This research focuses on applying such architecture to more efficiently and effectively model point cloud data globally with linear computational complexity. In particular, for the first time, we demonstrate that Mamba-based point cloud methods can outperform previous methods based on transformer or multi-layer perceptrons (MLPs). To enable Mamba to process 3-D point cloud data more effectively, we propose a novel Consistent Traverse Serialization method to convert point clouds into 1-D point sequences while ensuring that neighboring points in the sequence are also spatially adjacent. Consistent Traverse Serialization yields six variants by permuting the order of x, y, and z coordinates, and the synergistic use of these variants aids Mamba in comprehensively observing point cloud data. Furthermore, to assist Mamba in handling point sequences with different orders more effectively, we introduce point prompts to inform Mamba of the sequence's arrangement rules. Finally, we propose positional encoding based on spatial coordinate mapping to inject positional information into point cloud sequences more effectively. Point Cloud Mamba surpasses the state-of-the-art (SOTA) point-based method PointNeXt and achieves new SOTA performance on the ScanObjectNN, ModelNet40, ShapeNetPart, and S3DIS datasets. It is worth mentioning that when using a more powerful local feature extraction module, our PCM achieves 79.6 mIoU on S3DIS, significantly surpassing the previous SOTA models, DeLA and PTv3, by 5.5 mIoU and 4.9 mIoU, respectively.

  • 8 authors
·
Mar 1, 2024

Distillation with Contrast is All You Need for Self-Supervised Point Cloud Representation Learning

In this paper, we propose a simple and general framework for self-supervised point cloud representation learning. Human beings understand the 3D world by extracting two levels of information and establishing the relationship between them. One is the global shape of an object, and the other is the local structures of it. However, few existing studies in point cloud representation learning explored how to learn both global shapes and local-to-global relationships without a specified network architecture. Inspired by how human beings understand the world, we utilize knowledge distillation to learn both global shape information and the relationship between global shape and local structures. At the same time, we combine contrastive learning with knowledge distillation to make the teacher network be better updated. Our method achieves the state-of-the-art performance on linear classification and multiple other downstream tasks. Especially, we develop a variant of ViT for 3D point cloud feature extraction, which also achieves comparable results with existing backbones when combined with our framework, and visualization of the attention maps show that our model does understand the point cloud by combining the global shape information and multiple local structural information, which is consistent with the inspiration of our representation learning method. Our code will be released soon.

  • 6 authors
·
Feb 8, 2022

Towards More Diverse and Challenging Pre-training for Point Cloud Learning: Self-Supervised Cross Reconstruction with Decoupled Views

Point cloud learning, especially in a self-supervised way without manual labels, has gained growing attention in both vision and learning communities due to its potential utility in a wide range of applications. Most existing generative approaches for point cloud self-supervised learning focus on recovering masked points from visible ones within a single view. Recognizing that a two-view pre-training paradigm inherently introduces greater diversity and variance, it may thus enable more challenging and informative pre-training. Inspired by this, we explore the potential of two-view learning in this domain. In this paper, we propose Point-PQAE, a cross-reconstruction generative paradigm that first generates two decoupled point clouds/views and then reconstructs one from the other. To achieve this goal, we develop a crop mechanism for point cloud view generation for the first time and further propose a novel positional encoding to represent the 3D relative position between the two decoupled views. The cross-reconstruction significantly increases the difficulty of pre-training compared to self-reconstruction, which enables our method to surpass previous single-modal self-reconstruction methods in 3D self-supervised learning. Specifically, it outperforms the self-reconstruction baseline (Point-MAE) by 6.5%, 7.0%, and 6.7% in three variants of ScanObjectNN with the Mlp-Linear evaluation protocol. The code is available at https://github.com/aHapBean/Point-PQAE.

  • 3 authors
·
Sep 1, 2025 2

Parallel Neural Computing for Scene Understanding from LiDAR Perception in Autonomous Racing

Autonomous driving in high-speed racing, as opposed to urban environments, presents significant challenges in scene understanding due to rapid changes in the track environment. Traditional sequential network approaches may struggle to meet the real-time knowledge and decision-making demands of an autonomous agent covering large displacements in a short time. This paper proposes a novel baseline architecture for developing sophisticated models capable of true hardware-enabled parallelism, achieving neural processing speeds that mirror the agent's high velocity. The proposed model (Parallel Perception Network (PPN)) consists of two independent neural networks, segmentation and reconstruction networks, running parallelly on separate accelerated hardware. The model takes raw 3D point cloud data from the LiDAR sensor as input and converts it into a 2D Bird's Eye View Map on both devices. Each network independently extracts its input features along space and time dimensions and produces outputs parallelly. The proposed method's model is trained on a system with two NVIDIA T4 GPUs, using a combination of loss functions, including edge preservation, and demonstrates a 2x speedup in model inference time compared to a sequential configuration. Implementation is available at: https://github.com/suwesh/Parallel-Perception-Network. Learned parameters of the trained networks are provided at: https://huggingface.co/suwesh/ParallelPerceptionNetwork.

  • 1 authors
·
Dec 23, 2024

Accelerating Neural Architecture Search using Performance Prediction

Methods for neural network hyperparameter optimization and meta-modeling are computationally expensive due to the need to train a large number of model configurations. In this paper, we show that standard frequentist regression models can predict the final performance of partially trained model configurations using features based on network architectures, hyperparameters, and time-series validation performance data. We empirically show that our performance prediction models are much more effective than prominent Bayesian counterparts, are simpler to implement, and are faster to train. Our models can predict final performance in both visual classification and language modeling domains, are effective for predicting performance of drastically varying model architectures, and can even generalize between model classes. Using these prediction models, we also propose an early stopping method for hyperparameter optimization and meta-modeling, which obtains a speedup of a factor up to 6x in both hyperparameter optimization and meta-modeling. Finally, we empirically show that our early stopping method can be seamlessly incorporated into both reinforcement learning-based architecture selection algorithms and bandit based search methods. Through extensive experimentation, we empirically show our performance prediction models and early stopping algorithm are state-of-the-art in terms of prediction accuracy and speedup achieved while still identifying the optimal model configurations.

  • 4 authors
·
May 30, 2017

CurbNet: Curb Detection Framework Based on LiDAR Point Cloud Segmentation

Curb detection is a crucial function in intelligent driving, essential for determining drivable areas on the road. However, the complexity of road environments makes curb detection challenging. This paper introduces CurbNet, a novel framework for curb detection utilizing point cloud segmentation. To address the lack of comprehensive curb datasets with 3D annotations, we have developed the 3D-Curb dataset based on SemanticKITTI, currently the largest and most diverse collection of curb point clouds. Recognizing that the primary characteristic of curbs is height variation, our approach leverages spatially rich 3D point clouds for training. To tackle the challenges posed by the uneven distribution of curb features on the xy-plane and their dependence on high-frequency features along the z-axis, we introduce the Multi-Scale and Channel Attention (MSCA) module, a customized solution designed to optimize detection performance. Additionally, we propose an adaptive weighted loss function group specifically formulated to counteract the imbalance in the distribution of curb point clouds relative to other categories. Extensive experiments conducted on 2 major datasets demonstrate that our method surpasses existing benchmarks set by leading curb detection and point cloud segmentation models. Through the post-processing refinement of the detection results, we have significantly reduced noise in curb detection, thereby improving precision by 4.5 points. Similarly, our tolerance experiments also achieve state-of-the-art results. Furthermore, real-world experiments and dataset analyses mutually validate each other, reinforcing CurbNet's superior detection capability and robust generalizability. The project website is available at: https://github.com/guoyangzhao/CurbNet/.

  • 6 authors
·
Mar 25, 2024

Point2SSM: Learning Morphological Variations of Anatomies from Point Cloud

We present Point2SSM, a novel unsupervised learning approach for constructing correspondence-based statistical shape models (SSMs) directly from raw point clouds. SSM is crucial in clinical research, enabling population-level analysis of morphological variation in bones and organs. Traditional methods of SSM construction have limitations, including the requirement of noise-free surface meshes or binary volumes, reliance on assumptions or templates, and prolonged inference times due to simultaneous optimization of the entire cohort. Point2SSM overcomes these barriers by providing a data-driven solution that infers SSMs directly from raw point clouds, reducing inference burdens and increasing applicability as point clouds are more easily acquired. While deep learning on 3D point clouds has seen success in unsupervised representation learning and shape correspondence, its application to anatomical SSM construction is largely unexplored. We conduct a benchmark of state-of-the-art point cloud deep networks on the SSM task, revealing their limited robustness to clinical challenges such as noisy, sparse, or incomplete input and limited training data. Point2SSM addresses these issues through an attention-based module, providing effective correspondence mappings from learned point features. Our results demonstrate that the proposed method significantly outperforms existing networks in terms of accurate surface sampling and correspondence, better capturing population-level statistics.

  • 2 authors
·
May 23, 2023

FourCastNet 3: A geometric approach to probabilistic machine-learning weather forecasting at scale

FourCastNet 3 advances global weather modeling by implementing a scalable, geometric machine learning (ML) approach to probabilistic ensemble forecasting. The approach is designed to respect spherical geometry and to accurately model the spatially correlated probabilistic nature of the problem, resulting in stable spectra and realistic dynamics across multiple scales. FourCastNet 3 delivers forecasting accuracy that surpasses leading conventional ensemble models and rivals the best diffusion-based methods, while producing forecasts 8 to 60 times faster than these approaches. In contrast to other ML approaches, FourCastNet 3 demonstrates excellent probabilistic calibration and retains realistic spectra, even at extended lead times of up to 60 days. All of these advances are realized using a purely convolutional neural network architecture tailored for spherical geometry. Scalable and efficient large-scale training on 1024 GPUs and more is enabled by a novel training paradigm for combined model- and data-parallelism, inspired by domain decomposition methods in classical numerical models. Additionally, FourCastNet 3 enables rapid inference on a single GPU, producing a 60-day global forecast at 0.25{\deg}, 6-hourly resolution in under 4 minutes. Its computational efficiency, medium-range probabilistic skill, spectral fidelity, and rollout stability at subseasonal timescales make it a strong candidate for improving meteorological forecasting and early warning systems through large ensemble predictions.

  • 10 authors
·
Jul 16, 2025

Instance-aware Dynamic Prompt Tuning for Pre-trained Point Cloud Models

Pre-trained point cloud models have found extensive applications in 3D understanding tasks like object classification and part segmentation. However, the prevailing strategy of full fine-tuning in downstream tasks leads to large per-task storage overhead for model parameters, which limits the efficiency when applying large-scale pre-trained models. Inspired by the recent success of visual prompt tuning (VPT), this paper attempts to explore prompt tuning on pre-trained point cloud models, to pursue an elegant balance between performance and parameter efficiency. We find while instance-agnostic static prompting, e.g. VPT, shows some efficacy in downstream transfer, it is vulnerable to the distribution diversity caused by various types of noises in real-world point cloud data. To conquer this limitation, we propose a novel Instance-aware Dynamic Prompt Tuning (IDPT) strategy for pre-trained point cloud models. The essence of IDPT is to develop a dynamic prompt generation module to perceive semantic prior features of each point cloud instance and generate adaptive prompt tokens to enhance the model's robustness. Notably, extensive experiments demonstrate that IDPT outperforms full fine-tuning in most tasks with a mere 7% of the trainable parameters, providing a promising solution to parameter-efficient learning for pre-trained point cloud models. Code is available at https://github.com/zyh16143998882/ICCV23-IDPT.

  • 6 authors
·
Apr 14, 2023

KeyMatchNet: Zero-Shot Pose Estimation in 3D Point Clouds by Generalized Keypoint Matching

In this paper, we present KeyMatchNet, a novel network for zero-shot pose estimation in 3D point clouds. Our method uses only depth information, making it more applicable for many industrial use cases, as color information is seldom available. The network is composed of two parallel components for computing object and scene features. The features are then combined to create matches used for pose estimation. The parallel structure allows for pre-processing of the individual parts, which decreases the run-time. Using a zero-shot network allows for a very short set-up time, as it is not necessary to train models for new objects. However, as the network is not trained for the specific object, zero-shot pose estimation methods generally have lower accuracy compared with conventional methods. To address this, we reduce the complexity of the task by including the scenario information during training. This is typically not feasible as collecting real data for new tasks drastically increases the cost. However, for zero-shot pose estimation, training for new objects is not necessary and the expensive data collection can thus be performed only once. Our method is trained on 1,500 objects and is only tested on unseen objects. We demonstrate that the trained network can not only accurately estimate poses for novel objects, but also demonstrate the ability of the network on objects outside of the trained class. Test results are also shown on real data. We believe that the presented method is valuable for many real-world scenarios. Project page available at keymatchnet.github.io

  • 2 authors
·
Mar 28, 2023

Hierarchical Feature Learning for Medical Point Clouds via State Space Model

Deep learning-based point cloud modeling has been widely investigated as an indispensable component of general shape analysis. Recently, transformer and state space model (SSM) have shown promising capacities in point cloud learning. However, limited research has been conducted on medical point clouds, which have great potential in disease diagnosis and treatment. This paper presents an SSM-based hierarchical feature learning framework for medical point cloud understanding. Specifically, we down-sample input into multiple levels through the farthest point sampling. At each level, we perform a series of k-nearest neighbor (KNN) queries to aggregate multi-scale structural information. To assist SSM in processing point clouds, we introduce coordinate-order and inside-out scanning strategies for efficient serialization of irregular points. Point features are calculated progressively from short neighbor sequences and long point sequences through vanilla and group Point SSM blocks, to capture both local patterns and long-range dependencies. To evaluate the proposed method, we build a large-scale medical point cloud dataset named MedPointS for anatomy classification, completion, and segmentation. Extensive experiments conducted on MedPointS demonstrate that our method achieves superior performance across all tasks. The dataset is available at https://flemme-docs.readthedocs.io/en/latest/medpoints.html. Code is merged to a public medical imaging platform: https://github.com/wlsdzyzl/flemme.

  • 3 authors
·
Apr 17, 2025

NeuMap: Neural Coordinate Mapping by Auto-Transdecoder for Camera Localization

This paper presents an end-to-end neural mapping method for camera localization, dubbed NeuMap, encoding a whole scene into a grid of latent codes, with which a Transformer-based auto-decoder regresses 3D coordinates of query pixels. State-of-the-art feature matching methods require each scene to be stored as a 3D point cloud with per-point features, consuming several gigabytes of storage per scene. While compression is possible, performance drops significantly at high compression rates. Conversely, coordinate regression methods achieve high compression by storing scene information in a neural network but suffer from reduced robustness. NeuMap combines the advantages of both approaches by utilizing 1) learnable latent codes for efficient scene representation and 2) a scene-agnostic Transformer-based auto-decoder to infer coordinates for query pixels. This scene-agnostic network design learns robust matching priors from large-scale data and enables rapid optimization of codes for new scenes while keeping the network weights fixed. Extensive evaluations on five benchmarks show that NeuMap significantly outperforms other coordinate regression methods and achieves comparable performance to feature matching methods while requiring a much smaller scene representation size. For example, NeuMap achieves 39.1% accuracy in the Aachen night benchmark with only 6MB of data, whereas alternative methods require 100MB or several gigabytes and fail completely under high compression settings. The codes are available at https://github.com/Tangshitao/NeuMap

  • 5 authors
·
Nov 20, 2022

PARE-Net: Position-Aware Rotation-Equivariant Networks for Robust Point Cloud Registration

Learning rotation-invariant distinctive features is a fundamental requirement for point cloud registration. Existing methods often use rotation-sensitive networks to extract features, while employing rotation augmentation to learn an approximate invariant mapping rudely. This makes networks fragile to rotations, overweight, and hinders the distinctiveness of features. To tackle these problems, we propose a novel position-aware rotation-equivariant network, for efficient, light-weighted, and robust registration. The network can provide a strong model inductive bias to learn rotation-equivariant/invariant features, thus addressing the aforementioned limitations. To further improve the distinctiveness of descriptors, we propose a position-aware convolution, which can better learn spatial information of local structures. Moreover, we also propose a feature-based hypothesis proposer. It leverages rotation-equivariant features that encode fine-grained structure orientations to generate reliable model hypotheses. Each correspondence can generate a hypothesis, thus it is more efficient than classic estimators that require multiple reliable correspondences. Accordingly, a contrastive rotation loss is presented to enhance the robustness of rotation-equivariant features against data degradation. Extensive experiments on indoor and outdoor datasets demonstrate that our method significantly outperforms the SOTA methods in terms of registration recall while being lightweight and keeping a fast speed. Moreover, experiments on rotated datasets demonstrate its robustness against rotation variations. Code is available at https://github.com/yaorz97/PARENet.

  • 5 authors
·
Jul 14, 2024

Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling

We present Point-BERT, a new paradigm for learning Transformers to generalize the concept of BERT to 3D point cloud. Inspired by BERT, we devise a Masked Point Modeling (MPM) task to pre-train point cloud Transformers. Specifically, we first divide a point cloud into several local point patches, and a point cloud Tokenizer with a discrete Variational AutoEncoder (dVAE) is designed to generate discrete point tokens containing meaningful local information. Then, we randomly mask out some patches of input point clouds and feed them into the backbone Transformers. The pre-training objective is to recover the original point tokens at the masked locations under the supervision of point tokens obtained by the Tokenizer. Extensive experiments demonstrate that the proposed BERT-style pre-training strategy significantly improves the performance of standard point cloud Transformers. Equipped with our pre-training strategy, we show that a pure Transformer architecture attains 93.8% accuracy on ModelNet40 and 83.1% accuracy on the hardest setting of ScanObjectNN, surpassing carefully designed point cloud models with much fewer hand-made designs. We also demonstrate that the representations learned by Point-BERT transfer well to new tasks and domains, where our models largely advance the state-of-the-art of few-shot point cloud classification task. The code and pre-trained models are available at https://github.com/lulutang0608/Point-BERT

  • 6 authors
·
Nov 29, 2021

Transductive Few-Shot Learning: Clustering is All You Need?

We investigate a general formulation for clustering and transductive few-shot learning, which integrates prototype-based objectives, Laplacian regularization and supervision constraints from a few labeled data points. We propose a concave-convex relaxation of the problem, and derive a computationally efficient block-coordinate bound optimizer, with convergence guarantee. At each iteration,our optimizer computes independent (parallel) updates for each point-to-cluster assignment. Therefore, it could be trivially distributed for large-scale clustering and few-shot tasks. Furthermore, we provides a thorough convergence analysis based on point-to-set maps. Were port comprehensive clustering and few-shot learning experiments over various data sets, showing that our method yields competitive performances, in term of accuracy and optimization quality, while scaling up to large problems. Using standard training on the base classes, without resorting to complex meta-learning and episodic-training strategies, our approach outperforms state-of-the-art few-shot methods by significant margins, across various models, settings and data sets. Surprisingly, we found that even standard clustering procedures (e.g., K-means), which correspond to particular, non-regularized cases of our general model, already achieve competitive performances in comparison to the state-of-the-art in few-shot learning. These surprising results point to the limitations of the current few-shot benchmarks, and question the viability of a large body of convoluted few-shot learning techniques in the recent literature.

  • 5 authors
·
Jun 16, 2021

PointDistiller: Structured Knowledge Distillation Towards Efficient and Compact 3D Detection

The remarkable breakthroughs in point cloud representation learning have boosted their usage in real-world applications such as self-driving cars and virtual reality. However, these applications usually have an urgent requirement for not only accurate but also efficient 3D object detection. Recently, knowledge distillation has been proposed as an effective model compression technique, which transfers the knowledge from an over-parameterized teacher to a lightweight student and achieves consistent effectiveness in 2D vision. However, due to point clouds' sparsity and irregularity, directly applying previous image-based knowledge distillation methods to point cloud detectors usually leads to unsatisfactory performance. To fill the gap, this paper proposes PointDistiller, a structured knowledge distillation framework for point clouds-based 3D detection. Concretely, PointDistiller includes local distillation which extracts and distills the local geometric structure of point clouds with dynamic graph convolution and reweighted learning strategy, which highlights student learning on the crucial points or voxels to improve knowledge distillation efficiency. Extensive experiments on both voxels-based and raw points-based detectors have demonstrated the effectiveness of our method over seven previous knowledge distillation methods. For instance, our 4X compressed PointPillars student achieves 2.8 and 3.4 mAP improvements on BEV and 3D object detection, outperforming its teacher by 0.9 and 1.8 mAP, respectively. Codes have been released at https://github.com/RunpeiDong/PointDistiller.

  • 4 authors
·
May 23, 2022

Reduced-Order Neural Operators: Learning Lagrangian Dynamics on Highly Sparse Graphs

We present a neural operator architecture to simulate Lagrangian dynamics, such as fluid flow, granular flows, and elastoplasticity. Traditional numerical methods, such as the finite element method (FEM), suffer from long run times and large memory consumption. On the other hand, approaches based on graph neural networks are faster but still suffer from long computation times on dense graphs, which are often required for high-fidelity simulations. Our model, GIOROM or Graph Interaction Operator for Reduced-Order Modeling, learns temporal dynamics within a reduced-order setting, capturing spatial features from a highly sparse graph representation of the input and generalizing to arbitrary spatial locations during inference. The model is geometry-aware and discretization-agnostic and can generalize to different initial conditions, velocities, and geometries after training. We show that point clouds of the order of 100,000 points can be inferred from sparse graphs with sim1000 points, with negligible change in computation time. We empirically evaluate our model on elastic solids, Newtonian fluids, Non-Newtonian fluids, Drucker-Prager granular flows, and von Mises elastoplasticity. On these benchmarks, our approach results in a 25times speedup compared to other neural network-based physics simulators while delivering high-fidelity predictions of complex physical systems and showing better performance on most benchmarks. The code and the demos are provided at https://github.com/HrishikeshVish/GIOROM.

  • 5 authors
·
Jul 4, 2024

KECOR: Kernel Coding Rate Maximization for Active 3D Object Detection

Achieving a reliable LiDAR-based object detector in autonomous driving is paramount, but its success hinges on obtaining large amounts of precise 3D annotations. Active learning (AL) seeks to mitigate the annotation burden through algorithms that use fewer labels and can attain performance comparable to fully supervised learning. Although AL has shown promise, current approaches prioritize the selection of unlabeled point clouds with high uncertainty and/or diversity, leading to the selection of more instances for labeling and reduced computational efficiency. In this paper, we resort to a novel kernel coding rate maximization (KECOR) strategy which aims to identify the most informative point clouds to acquire labels through the lens of information theory. Greedy search is applied to seek desired point clouds that can maximize the minimal number of bits required to encode the latent features. To determine the uniqueness and informativeness of the selected samples from the model perspective, we construct a proxy network of the 3D detector head and compute the outer product of Jacobians from all proxy layers to form the empirical neural tangent kernel (NTK) matrix. To accommodate both one-stage (i.e., SECOND) and two-stage detectors (i.e., PVRCNN), we further incorporate the classification entropy maximization and well trade-off between detection performance and the total number of bounding boxes selected for annotation. Extensive experiments conducted on two 3D benchmarks and a 2D detection dataset evidence the superiority and versatility of the proposed approach. Our results show that approximately 44% box-level annotation costs and 26% computational time are reduced compared to the state-of-the-art AL method, without compromising detection performance.

  • 6 authors
·
Jul 16, 2023