Identifying objects and their parts is critical to how humans understand and interact with the world. For example, using a stove requires not only identifying the stove itself, but also its subcomponents: burners, control knobs, etc. This same capability is essential to many AI vision, graphics, and robotics applications, including predicting object functionality, human-object interaction, simulation, shape editing, and shape generation.
This wide range of applications has spurred great demand for large 3D datasets with part annotations. However, existing 3D shape datasets either provide part annotations for only a relatively small number of object instances or offer only coarse, non-hierarchical part annotations, making them unsuitable for applications that require part-level object understanding. In other words, if we want AI to be able to make us a cup of tea, large new datasets are needed to better support the training of visual AI systems that parse and understand objects with many small details or important subcomponents.
Given these shortcomings, in a paper presented at the 2019 Conference on Computer Vision and Pattern Recognition (CVPR 2019), we make several contributions, described below.
We guided PartNet’s data annotation via expert-defined hierarchical part templates, which we provided to increase annotation consistency among the 66 trained, professional annotators with whom we worked on this project. To ensure accuracy, annotators additionally performed at least one verification pass on each annotation.
Creating the part templates was challenging: there are no well-established rules of thumb for defining good templates, and the final templates had to cover the wide variation in shapes and parts. A set of guiding principles shaped the development of our hierarchical templates.
Based on these principles, experts defined templates after examining a broad variety of objects in each category. Each template is hierarchical, ranging from coarse semantic parts down to fine-grained, primitive-level components. Because no template, however comprehensive, can cover every case, annotators were also able to extend the templates and annotate parts that fell outside the existing definitions.
Figure 1 shows the template for “Lamp.” “And” nodes break a part down into its subcomponents, while “Or” nodes list subcategories of the current part. Table lamps and ceiling lamps (as well as other varieties) can be described by the same template through the first-level “Or” node for lamp type.
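To make the And/Or structure concrete, here is a minimal sketch in Python of how such a template could be represented in code. The node names and fields below are illustrative assumptions, not the official PartNet data format.

```python
# Hypothetical representation of an And/Or part template (not the official format).
from dataclasses import dataclass, field
from typing import List

@dataclass
class TemplateNode:
    name: str
    kind: str                       # "and": decompose into subparts; "or": choose a subcategory
    children: List["TemplateNode"] = field(default_factory=list)

# "Lamp" first branches by lamp type (an "Or" node); each type then
# decomposes into its semantic parts ("And" nodes). Part names are illustrative.
lamp_template = TemplateNode("lamp", "or", [
    TemplateNode("table_lamp", "and", [
        TemplateNode("lamp_base", "and"),
        TemplateNode("lamp_body", "and"),
        TemplateNode("lamp_shade", "and"),
    ]),
    TemplateNode("ceiling_lamp", "and", [
        TemplateNode("lamp_fixture", "and"),
        TemplateNode("lamp_shade", "and"),
    ]),
])
```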
Figure 2 shows our web-based annotation interface, which presents the annotation process as a single-thread question-answering workflow whose answers automatically construct the final hierarchical segmentation for the current shape instance.
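As a rough illustration of that workflow, building on the hypothetical TemplateNode sketch above (this is not the actual interface code), the process can be thought of as a depth-first walk over the template in which each node becomes a question and the annotator’s answers assemble the hierarchy:

```python
def annotate(node, ask):
    """Walk the template depth-first, turning each node into a question.

    `ask(question, options)` stands in for the annotator answering in the
    web interface; the nested dict returned mirrors the resulting
    hierarchical segmentation for the current shape instance.
    """
    if node.kind == "or":
        # "Or" node: ask which subcategory applies, then recurse into it.
        choice = ask(f"Which kind of {node.name} is this?",
                     [c.name for c in node.children])
        child = next(c for c in node.children if c.name == choice)
        return {node.name: annotate(child, ask)}
    # "And" node: ask which subcomponents are present, then recurse into each.
    present = [c for c in node.children
               if ask(f"Does this {node.name} have a {c.name}?", ["yes", "no"]) == "yes"]
    return {node.name: [annotate(c, ask) for c in present]}
```

Calling `annotate(lamp_template, ask)` with an `ask` function backed by the web UI would yield one nested annotation per shape.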
The result of these efforts is the PartNet dataset, which provides fine-grained, hierarchical, instance-level part segmentation annotations for 26,671 shapes with 573,585 part instances across 24 object categories. Shapes and object categories are based on ShapeNetCore, with three supplemental object categories; seven existing categories have also been augmented with additional 3D models from the SketchUp* 3D Warehouse.
In addition to developing PartNet itself, we established benchmarks for three part-level object understanding tasks using PartNet: fine-grained semantic segmentation, hierarchical semantic segmentation, and instance segmentation. For details on the results of these experiments, please see our full paper.
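For readers curious about how segmentation quality is scored on such benchmarks, the semantic segmentation results are typically summarized as a mean IoU over part categories. The sketch below shows one plausible way to compute such a metric from per-point part labels; treat the details (label conventions, aggregation order) as assumptions rather than the exact PartNet evaluation protocol.

```python
import numpy as np

def part_category_miou(pred, gt, num_parts):
    """Mean IoU over part categories within one object category.

    Simplified sketch: `pred` and `gt` are integer part labels per point
    (label 0 assumed to mean "unlabeled"), concatenated over all test
    shapes of the category. The official evaluation may differ in detail.
    """
    ious = []
    for part in range(1, num_parts + 1):
        inter = np.sum((pred == part) & (gt == part))
        union = np.sum((pred == part) | (gt == part))
        if union > 0:                       # ignore parts absent from both
            ious.append(inter / union)
    return float(np.mean(ious)) if ious else 0.0
```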
PartNet contains highly structured, fine-grained, and heterogeneous parts. Our experiments suggest that existing algorithms developed for coarse, homogeneous part understanding do not work well on PartNet, likely because they were not designed to handle this level of structural detail and part diversity.
Ultimately, these experiments indicate that PartNet can serve as a stronger platform than existing datasets for part-level object understanding in the coming years.
With the PartNet dataset, we can now build a large-scale simulated environment populated with objects and all of their parts. We can then use this virtual world to teach robots about objects, their parts, and how to interact with them. For example, in the image below, a robot is learning that pushing a button on a microwave opens the microwave door. This will allow us to train robots to complete everyday tasks the way humans do: by understanding all of the parts and steps involved.
If you would like to try PartNet yourself, we have provided a sample dataset, sample results, and a summary video (also embedded above). We hope you find PartNet a useful resource for future visual AI applications and look forward to your comments and feedback on our work.
Please tune in to @IntelAI and @IntelAIResearch for continuing updates on Intel hardware, software, research, and resources furthering the field of AI.
Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at intel.com.
Intel and the Intel logo are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries.
*Other names and brands may be claimed as the property of others.
© Intel Corporation