Introducing PartNet: The First Large-Scale Dataset with Fine-Grained, Hierarchical, Instance-Level Part Annotations

Identifying objects and their parts is critical to how humans understand and interact with the world. For example, using a stove requires not only identifying the stove itself, but also its subcomponents: burners, control knobs, etc. This same capability is essential to many AI vision, graphics, and robotics applications, including predicting object functionality, human-object interaction, simulation, shape editing, and shape generation.

This wide range of applications has spurred great demand for large 3D datasets with part annotations. However, existing 3D shape datasets provide part annotations either for a relatively small number of object instances or only as coarse, non-hierarchical labels, making these datasets unsuitable for applications involving part-level object understanding. In other words, if we want AI to be able to make us a cup of tea, large new datasets are needed to better support the training of visual AI applications to parse and understand objects with many small details or with important subcomponents.

Given these shortcomings, in a paper presented at the 2019 Conference on Computer Vision and Pattern Recognition (CVPR 2019), we make the following contributions:

  • We introduce the PartNet dataset, which consists of 573,585 fine-grained part annotations (visually and semantically identified subcomponents) for 26,671 shapes (that is, 3D point clouds of objects) across 24 object categories (e.g., lamp, door, table, chair). We believe PartNet is the first large-scale dataset with fine-grained, hierarchical, instance-level part annotations.
    • Example: In the case of a digital image containing a table lamp, the hierarchical part annotations could include “lamp head” with more fine-grained subcomponents such as “lamp shade,” “light bulbs,” and “pull chain.” Segmented “instances” would distinguish similar components from one another: “left light bulb” and “right light bulb,” for example.
  • We propose three part-level understanding tasks to demonstrate the usefulness of PartNet:
    • Fine-grained semantic segmentation, or the partitioning of an object into coherent parts. For example, classifying each point that belongs to a back rest, arm rest, or any other entity in a chair.
    • Instance segmentation, or distinguishing the left arm rest from the right arm rest or each foot of the chair as a separate part.
    • Hierarchical segmentation, or denoting coarser to finer segmentation with increasing level of hierarchy. For example, finer parts such as a leg or a bar stretcher are segmented as separate entities at a higher level of hierarchy, whereas at a lower level of hierarchy, recognizing these two as the generic type “chair base” is sufficient.
  • Using PartNet, we benchmark four state-of-the-art algorithms for semantic segmentation and three baseline methods for hierarchical segmentation.
  • We propose the use of part instance segmentation with PartNet and describe a new method for part instance segmentation that significantly outperforms the current baseline method.
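
To make the three tasks concrete, here is a minimal sketch of how one annotated shape could be represented and queried. The `Part` class, the `instance_id` field, and the “lamp base” part are assumptions made for illustration; this is not PartNet's actual file format or API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Part:
    """One node in a hypothetical part hierarchy (illustrative only)."""
    name: str                           # semantic label, e.g. "light bulb"
    instance_id: Optional[int] = None   # set on leaf parts only
    children: List["Part"] = field(default_factory=list)


def leaves(part: Part) -> List[Part]:
    """Return the leaf parts under `part`, left to right."""
    if not part.children:
        return [part]
    found: List[Part] = []
    for child in part.children:
        found.extend(leaves(child))
    return found


def labels_at_depth(root: Part, depth: int) -> Dict[int, str]:
    """Map each leaf instance id to the name of its ancestor at `depth`
    (the root is depth 0). Leaves shallower than `depth` keep their own name."""
    result: Dict[int, str] = {}

    def visit(node: Part, level: int, label: str) -> None:
        if level <= depth:
            label = node.name
        if not node.children:
            result[node.instance_id] = label
            return
        for child in node.children:
            visit(child, level + 1, label)

    visit(root, 0, root.name)
    return result


# Table lamp from the example above; "lamp base" is an added, assumed part.
lamp = Part("lamp", children=[
    Part("lamp head", children=[
        Part("lamp shade", instance_id=0),
        Part("light bulb", instance_id=1),   # left bulb
        Part("light bulb", instance_id=2),   # right bulb
        Part("pull chain", instance_id=3),
    ]),
    Part("lamp base", instance_id=4),
])

# Semantic segmentation: the set of fine-grained labels.
semantic_labels = sorted({p.name for p in leaves(lamp)})
# Instance segmentation: every leaf is its own instance, even with a shared label.
num_instances = len(leaves(lamp))
# Hierarchical segmentation: the same leaves, labeled at a coarser level.
coarse = labels_at_depth(lamp, 1)
fine = labels_at_depth(lamp, 2)
```

In this sketch the two light bulbs share one semantic label but remain two instances, and `coarse` groups both of them (along with the shade and the pull chain) under “lamp head.”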

Annotating PartNet

We guided PartNet’s data annotation via expert-defined hierarchical part templates, which we provided to increase annotation consistency between the 66 trained, professional annotators with whom we worked on this project. To ensure accuracy, annotators additionally performed at least one verification pass on each annotation.

The lack of well-acknowledged rules of thumb for defining good templates and the need for the final template to cover all variations of shapes and parts made the creation of part templates a challenging task. The following principles guided the development of our hierarchical templates:

  • Part concepts must be well-defined, or delineated with enough clarity that parts are identifiable to multiple annotators.
  • Part concepts should be consistent and capable of being shared and reused across multiple parts, shapes, and object categories.
  • The set of part concepts should be compact, with no unnecessary concepts and the reuse of concepts wherever possible.
  • The set of part concepts must be hierarchical and organized into a taxonomy that covers coarse and fine-grained parts.
  • Leaf nodes in the part taxonomy should be atomic, consisting of only primitive, non-decomposable shapes.
  • The part taxonomy should be complete, or capable of covering a heterogeneous variety of shapes as comprehensively as possible.

Following these criteria, experts defined templates after examining a broad variety of objects in each category. Each template is hierarchical, running from coarse semantic parts down to fine-grained, primitive-level components. Because no template can cover all cases, however comprehensive it is, annotators were able to improve upon these templates and annotate parts that fell outside the scope of the existing definition.

Figure 1. The expert-defined hierarchical template for “lamp” and the instantiations for a table lamp (left) and ceiling lamp (right).

Figure 1 shows the template for “lamp.” “And” nodes break a part down into its subcomponents, while “Or” nodes list the subcategories of the current part. Table lamps and ceiling lamps (as well as other varieties) can be explained with the same template through the first-level “Or” node for lamp type.
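
As a rough sketch of the idea, an And/Or template can be modeled as a small tree that is expanded into a concrete part hierarchy once a subtype is chosen at each “Or” node. The `TemplateNode` class, the `instantiate` function, and the part names under each lamp type are hypothetical and are not taken from the actual PartNet templates. The sketch also simplifies by keeping every child of an “And” node, whereas a real shape may lack some of them.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TemplateNode:
    """A node in a hypothetical And/Or part template."""
    name: str
    kind: str = "leaf"   # "and", "or", or "leaf"
    children: List["TemplateNode"] = field(default_factory=list)


def instantiate(node: TemplateNode, choices: Dict[str, str]) -> dict:
    """Expand a template into a concrete part tree.

    "and" nodes keep every child; "or" nodes are replaced by the single
    child whose name is given in choices[node.name].
    """
    if node.kind == "leaf":
        return {"name": node.name, "children": []}
    if node.kind == "or":
        picked = next(c for c in node.children if c.name == choices[node.name])
        return instantiate(picked, choices)
    return {
        "name": node.name,
        "children": [instantiate(c, choices) for c in node.children],
    }


# Assumed, heavily simplified template; the part names are illustrative only.
lamp_template = TemplateNode("lamp", "or", [
    TemplateNode("table lamp", "and", [
        TemplateNode("lamp base"),
        TemplateNode("lamp body"),
        TemplateNode("lamp head"),
    ]),
    TemplateNode("ceiling lamp", "and", [
        TemplateNode("chain"),
        TemplateNode("lamp head"),
    ]),
])

table_lamp = instantiate(lamp_template, {"lamp": "table lamp"})
ceiling_lamp = instantiate(lamp_template, {"lamp": "ceiling lamp"})
```

Both instantiations come from the same template and share the “lamp head” concept, which reflects the reuse that the template principles above ask for.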

Figure 2. From left to right: the annotation interface and its subsections, the proposed question-answering human annotation workflow, and examples from the mesh cutting interface.

Figure 2 shows our web-based annotation interface, which presents the annotation process as a single-thread question-answering workflow whose answers automatically construct the final hierarchical segmentation for the current shape instance.

The result of these efforts is the PartNet dataset, which provides fine-grained and hierarchical instance-level part segmentation annotation for 26,671 shapes with 573,585 part instances from 24 object categories. Shapes and object categories are based on ShapeNetCore, with three supplemental object categories. Additionally, seven existing categories have been supplemented with additional 3D models from the SketchUp* 3D Warehouse.

Benchmarking with PartNet

In addition to developing PartNet itself, we established benchmarks for three part-level object understanding tasks using PartNet: fine-grained semantic segmentation, hierarchical semantic segmentation, and instance segmentation. For details on the results of these experiments, please review our full paper.

  • Fine-grained semantic segmentation. We benchmarked four state-of-the-art semantic segmentation algorithms on the fine-grained PartNet segmentation: PointNet, PointNet++, SpiderCNN, and PointCNN.
  • Hierarchical semantic segmentation. We propose three baseline methods to tackle hierarchical segmentation based on PointNet++ architecture: bottom-up (considering only the leaf-node parts during training and grouping the prediction of the child nodes to the parent nodes in the hierarchy), top-down (classifying coarser nodes first and then finer-level ones), and ensemble (training flat segmentation at multiple levels).
  • Instance segmentation. We propose a novel detection-by-segmentation network to address instance segmentation. By taking advantage of rich shape structures, this method significantly outperforms the existing baseline method for this task.
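
The bottom-up baseline described above can be illustrated with a small sketch: a model predicts scores only for leaf parts, and coarser labels are obtained by grouping those predictions under their parent nodes. Summing the leaf probabilities per parent is one plausible way to do the grouping and is an assumption here, as are the `bottom_up` function and the “chair back” and “chair arm” parent names; see the full paper for the method actually used.

```python
from typing import Dict, List

# Hypothetical chair hierarchy: leaf part -> parent part.
PARENT_OF: Dict[str, str] = {
    "leg": "chair base",
    "bar stretcher": "chair base",
    "back rest": "chair back",
    "arm rest": "chair arm",
}


def bottom_up(leaf_probs: List[Dict[str, float]],
              parent_of: Dict[str, str]) -> List[str]:
    """For each point, sum the leaf probabilities under each parent
    and return the parent with the highest total."""
    predictions: List[str] = []
    for probs in leaf_probs:
        parent_scores: Dict[str, float] = {}
        for leaf, p in probs.items():
            parent = parent_of[leaf]
            parent_scores[parent] = parent_scores.get(parent, 0.0) + p
        predictions.append(max(parent_scores, key=parent_scores.get))
    return predictions


# Made-up per-point leaf probabilities for two points of a chair.
points = [
    {"leg": 0.3, "bar stretcher": 0.3, "back rest": 0.4, "arm rest": 0.0},
    {"leg": 0.1, "bar stretcher": 0.1, "back rest": 0.7, "arm rest": 0.1},
]

fine_labels = [max(p, key=p.get) for p in points]   # leaf-level prediction
coarse_labels = bottom_up(points, PARENT_OF)        # grouped to parent level
```

For the first point, the single most likely leaf is “back rest,” yet the combined evidence for “leg” and “bar stretcher” makes “chair base” the most likely coarse label. Under this grouping, a fine-level error does not necessarily carry over to the coarse level.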

PartNet contains highly structured, fine-grained, and heterogeneous parts. Our experiments suggest that existing algorithms developed for coarse and homogeneous part understanding do not work well on PartNet, likely for these reasons:

  • Small and fine-grained parts, such as door handles and keyboard buttons, are abundant and present new challenges for part recognition.
  • Parts that have similar geometric shapes and different uses require more global shape context to accurately distinguish.
  • The wide variation in the types and shapes of different objects and parts necessitates hierarchical understanding.

Ultimately, these experiments indicate that PartNet could serve as a better platform for part-level object understanding in the next few years.

Implications for Robotics

With the PartNet dataset, we can now build a large-scale simulated environment full of objects and all of their parts. We can then use this virtual world to teach robots about objects, their parts, and how to interact with them. For example, in the photo below, a robot is learning that pushing a button on a microwave will open the microwave door. This will allow us to train robots to complete everyday tasks as humans do, by understanding all of the parts and steps involved.

Conclusion

If you would like to try PartNet yourself, we have provided a sample dataset, sample results, and a summary video (also embedded above). We hope you find PartNet a useful resource for future visual AI applications and look forward to your comments and feedback on our work.
Please tune in to @IntelAI and @IntelAIResearch for continuing updates on Intel hardware, software, research, and resources furthering the field of AI.