SeDiR resolves Inter-Category Entanglement in unified 3D anomaly detection
by learning category-aware semantics before reconstruction and guiding reconstruction with geometric evidence.
Teaser: an overview of SeDiR. Category-specific models require a separate model for each class, while naive unified models often suffer from inter-category feature entanglement. SeDiR addresses this problem by aggregating coarse-to-fine geometric cues into category-aware global features, disentangling them via Category-Conditioned Contrastive Learning, and guiding reconstruction with input geometry.
Abstract
3D anomaly detection aims to detect and localize defects in 3D point clouds using models trained solely on normal data. While a unified model improves scalability by learning across multiple categories, it often suffers from Inter-Category Entanglement (ICE)—where latent features from different categories overlap, causing the model to adopt incorrect semantic priors during reconstruction and ultimately yielding unreliable anomaly scores. To address this issue, we propose the Semantically Disentangled Unified Model for 3D Anomaly Detection, which reconstructs features conditioned on disentangled semantic representations. Our framework consists of three key components: (i) Coarse-to-Fine Global Tokenization for forming instance-level semantic identity, (ii) Category-Conditioned Contrastive Learning for disentangling category semantics, and (iii) a Geometry-Guided Decoder for semantically consistent reconstruction. Extensive experiments on Real3D-AD and Anomaly-ShapeNet demonstrate that our method achieves state-of-the-art performance in both the unified and category-specific settings, improving object-level AUROC by 2.8% and 9.1%, respectively, while enhancing the reliability of unified 3D anomaly detection.
Core Contributions
We identify Inter-Category Entanglement (ICE) as a fundamental bottleneck of unified 3D anomaly detection and reformulate unified 3D-AD as semantically conditioned reconstruction.
We propose Coarse-to-Fine Global Tokenization (CFGT), which aggregates multi-resolution geometric cues into a category-aware global representation for reliable semantic identity formation.
We introduce Category-Conditioned Contrastive Learning (C3L) and a Geometry-Guided Decoder (GGD) to disentangle category semantics and reconstruct features with both semantic and geometric consistency.
Extensive experiments on Real3D-AD and Anomaly-ShapeNet demonstrate strong gains over prior methods and validate the effectiveness of semantically disentangled reconstruction in unified 3D anomaly detection.
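To make the contrastive component concrete, the sketch below implements a supervised, category-conditioned contrastive loss in the spirit of C3L: global tokens of the same category are pulled together while different categories are pushed apart. The SupCon-style formulation, the temperature value, and the batching are placeholder assumptions for illustration, not the paper's exact loss.

```python
import numpy as np

def _logsumexp(x, axis=None, keepdims=False):
    """Numerically stable log-sum-exp."""
    m = np.max(x, axis=axis, keepdims=True)
    out = m + np.log(np.sum(np.exp(x - m), axis=axis, keepdims=True))
    return out if keepdims else np.squeeze(out, axis=axis)

def c3l_loss(tokens, labels, temperature=0.1):
    """Category-conditioned contrastive loss sketch (SupCon-style).

    tokens: (N, D) global tokens, one per object instance.
    labels: (N,) integer category labels.
    Same-category tokens act as positives; all others as negatives,
    which encourages disentangled, category-aware manifolds.
    """
    labels = np.asarray(labels)
    z = tokens / np.linalg.norm(tokens, axis=1, keepdims=True)  # L2-normalize
    sim = (z @ z.T) / temperature                               # pairwise similarity
    n = len(labels)
    self_mask = np.eye(n, dtype=bool)
    sim = np.where(self_mask, -np.inf, sim)                     # exclude self-pairs
    log_prob = sim - _logsumexp(sim, axis=1, keepdims=True)     # row-wise log-softmax
    loss, count = 0.0, 0
    for i in range(n):
        pos = (labels == labels[i]) & ~self_mask[i]             # same-category positives
        if pos.any():
            loss -= log_prob[i, pos].mean()
            count += 1
    return loss / max(count, 1)
```

With well-separated category clusters and correct labels the loss is low; mixing categories under one label drives it up, which is exactly the behavior that penalizes entangled latents.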
Why Semantically Disentangled Unified 3D-AD?
Prior 3D anomaly detection methods are largely category-specific, requiring a separate model for each object class. Although recent unified methods improve scalability by training a single model across multiple categories, they often fail to preserve semantic identity in the latent space. As a result, the model may reconstruct an object under the wrong category prior, even when the input is normal. SeDiR addresses this limitation by explicitly disentangling category-level semantics before reconstruction and by conditioning reconstruction on both semantic and geometric priors.
Key point: the central failure mode is not reconstruction itself, but reconstructing before understanding what is being reconstructed. SeDiR resolves this by establishing semantic identity first and reconstructing second.
We analyze the latent space of a unified baseline and observe that semantically different categories form heavily overlapping feature clusters rather than distinct semantic manifolds. In particular, categories such as chicken, duck, and gemstone exhibit clear entanglement in the latent space. Furthermore, normal samples with lower category classification scores tend to show higher reconstruction errors, indicating that incorrect semantic understanding leads directly to incorrect reconstruction. These observations confirm that ICE is not a random side effect, but a systematic bottleneck in unified 3D anomaly detection.
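The confidence-versus-error observation above can be quantified with a simple statistic: the Pearson correlation between per-sample category classification scores and reconstruction errors, which should be clearly negative under ICE. The helper below is a generic sketch; the input arrays are hypothetical placeholders, not measurements from the paper.

```python
import numpy as np

def confidence_error_correlation(conf, err):
    """Pearson correlation between category classification confidence
    and reconstruction error. A strongly negative value indicates that
    incorrect semantic understanding drives reconstruction failures,
    i.e., that ICE is systematic rather than random noise."""
    conf = np.asarray(conf, dtype=float)
    err = np.asarray(err, dtype=float)
    return float(np.corrcoef(conf, err)[0, 1])
```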
t-SNE visualization and quantitative analysis of semantic disentanglement. Compared with MC3D-AD, SeDiR forms better-separated category-aware manifolds, improves category classification confidence, and reduces normal reconstruction error.
Proposed Method
SeDiR follows a simple principle: understand what to reconstruct before deciding how to reconstruct. Given an input point cloud, SeDiR first extracts multi-resolution geometric features and aggregates them into a category-aware global token. It then disentangles category semantics in the latent space and reconstructs features using both semantic priors and geometry-guided cues. This design enables semantically aligned and geometrically consistent reconstruction in the unified multi-category setting.
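As a rough illustration of the first step of this pipeline, the sketch below pools per-point features at several resolutions and fuses them into a single global token, mirroring the coarse-to-fine aggregation idea. The uniform-stride subsampling (a stand-in for farthest point sampling) and the random linear projection (a stand-in for a learned fusion layer) are placeholder assumptions, not the paper's actual CFGT module.

```python
import numpy as np

def coarse_to_fine_global_token(point_feats, resolutions=(1024, 256, 64), seed=0):
    """Aggregate multi-resolution geometric cues into one global token.

    point_feats: (N, D) per-point features of a single object.
    For each resolution we subsample points (uniform striding here, as a
    stand-in for FPS), mean-pool the subset, then fuse the pooled vectors
    with a linear projection (random here, learned in practice).
    Returns a (D,) global token summarizing the instance.
    """
    n, d = point_feats.shape
    pooled = []
    for r in resolutions:
        step = max(n // min(r, n), 1)
        subset = point_feats[::step][:r]        # coarse-to-fine subsampling
        pooled.append(subset.mean(axis=0))      # per-resolution pooling
    fused = np.concatenate(pooled)              # (len(resolutions) * D,)
    rng = np.random.default_rng(seed)
    proj = rng.normal(size=(fused.size, d)) / np.sqrt(fused.size)
    return fused @ proj                         # (D,) global token
```

In the full model this token is what C3L disentangles and what conditions the decoder; here it simply demonstrates how cues from multiple resolutions collapse into one instance-level representation.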
Overall architecture of SeDiR. Our framework consists of two stages: Semantically Disentangled Representation Learning and Semantically Disentangled Reconstruction. CFGT forms a category-aware global token from multi-resolution geometric features, C3L disentangles latent semantics, and GGD reconstructs the object conditioned on both semantic and geometric priors.
GGD: Geometry-Guided Decoder
The Geometry-Guided Decoder reconstructs features using both disentangled semantic priors and local geometric evidence. Instead of decoding blindly from latent features, the decoder is softly biased toward category-consistent reconstruction pathways. This helps ensure that geometry is reconstructed according to the correct semantic identity, leading to more stable anomaly scores and more precise localization.
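A minimal sketch of the decoding step described above, assuming single-head cross-attention in which latent queries attend to geometric evidence, with an optional additive bias standing in for the soft semantic steering. The single head and additive-bias formulation are illustrative simplifications, not the paper's exact decoder.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    m = x.max(axis=axis, keepdims=True)
    e = np.exp(x - m)
    return e / e.sum(axis=axis, keepdims=True)

def geometry_guided_decode(queries, geom_keys, geom_vals, sem_bias=None):
    """Cross-attention from latent queries to geometric evidence.

    queries: (Nq, D) latent features to reconstruct.
    geom_keys, geom_vals: (Nk, D) keys/values from the input geometry.
    sem_bias: optional (Nq, Nk) additive bias that softly steers
    attention toward category-consistent reconstruction pathways.
    Returns (Nq, D) reconstructed features.
    """
    d = queries.shape[1]
    scores = queries @ geom_keys.T / np.sqrt(d)   # scaled dot-product
    if sem_bias is not None:
        scores = scores + sem_bias                # semantic prior as soft bias
    attn = softmax(scores, axis=1)
    return attn @ geom_vals
```

Because the bias only shifts attention logits rather than hard-masking them, geometric evidence can still override a weak semantic prior, which matches the "softly biased" behavior described above.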
Qualitative Results
SeDiR produces more semantically consistent and spatially precise anomaly localization across diverse categories. Compared with a previous unified model (MC3D-AD), it better highlights true anomalous regions while reducing false responses on normal surfaces. These qualitative results support the claim that disentangling semantic identity before reconstruction leads to more reliable anomaly detection and localization.
Qualitative comparison between MC3D-AD and SeDiR. SeDiR provides more accurate and complete localization by reconstructing features under the correct semantic prior.
Citation
@article{kim2026semantically,
  title={A Semantically Disentangled Unified Model for Multi-category 3D Anomaly Detection},
  author={Kim, SuYeon and Lee, Wongyu and Cho, MyeongAh},
  journal={arXiv preprint arXiv:2603.25159},
  year={2026}
}