IIIT Hyderabad and Intel Release World’s First Dataset for Driving in India

Everyone believes that their city has the worst drivers, but it’s safe to say that few western drivers face as many challenges on the road as drivers in India. With a population of over one billion, India has roads with unique conditions that require drivers to react to a variety of potential hazards in a split second. Testing of self-driving cars has begun in U.S. states like Arizona, California, Ohio, and Michigan, but the autonomous driving services being developed are applicable only to the environments in which their algorithms have been trained.

So far, the training of autonomous driving algorithms in the United States has been mostly based on road conditions with well-defined lanes and relatively few extraneous objects. In India, however, drivers often merge between lanes or ignore them altogether, pedestrians cross at random points and at random intervals, bicycles and rickshaws occupy overlapping space, and animals enter traffic on a whim.

Unique Dataset

Earlier this fall, Intel announced the availability of the world’s first open dataset of driving conditions in India to facilitate autonomous vehicle training. The International Institute of Information Technology (IIIT) in Hyderabad and Intel worked for over a year to create this dataset. The dataset is based on 50,000 image frames at 1080p and 720p resolution, captured from a front-facing camera attached to a car driven around Hyderabad and Bangalore. It contains 10,000 segmentation annotations marking drivable and non-drivable areas (Fig. 1), along with vehicles and other objects.

Figure 1: Pixel count and four-level label hierarchy, color-coded for prediction
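
To make the idea of a label hierarchy and per-label pixel counts more concrete, here is a minimal Python sketch. It is an illustration only: the label names, the simplified two-level grouping, and the `pixel_counts` helper are assumptions made for this example, not the dataset's actual labels or tooling.

```python
from collections import Counter

# Hypothetical slice of a label hierarchy: fine label -> parent category.
# The dataset's real label names and levels are not listed in this article.
HIERARCHY = {
    "road": "drivable",
    "sidewalk": "non-drivable",
    "autorickshaw": "vehicle",
    "car": "vehicle",
    "animal": "living thing",
}


def pixel_counts(mask, hierarchy):
    """Count pixels per fine label and per parent category in a 2D mask."""
    fine = Counter(label for row in mask for label in row)
    coarse = Counter()
    for label, count in fine.items():
        coarse[hierarchy[label]] += count
    return fine, coarse


# Toy 3x4 "segmentation mask": each cell holds the label of one pixel.
mask = [
    ["sidewalk", "road", "road", "car"],
    ["sidewalk", "road", "road", "autorickshaw"],
    ["animal", "road", "road", "road"],
]

fine, coarse = pixel_counts(mask, HIERARCHY)
print(dict(fine))
print(dict(coarse))
```

Rolling fine labels up to a parent category is what lets a model be evaluated at different levels of detail, for example drivable versus non-drivable area at the coarsest level.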

Since releasing the dataset, we’ve seen high interest among researchers across the world. During our international launch competition, we saw over 160 downloads of the data, with 15 teams competing in the challenge. (Congratulations to the winners – Mapillary Research for the segmentation challenge and the TUTU team from the University of Hong Kong for the instance segmentation challenge.) Open for non-commercial research use around the world, the dataset contains several new types of objects that will push self-driving algorithms forward. Researchers in China are also working on their own dataset, just as cities in Europe have created datasets specific to their regions.

Training Algorithms to Be Safer

Algorithms are only as good as the data you train them on, and current datasets aren’t detailed enough to properly address road safety in some of the world’s most crowded and unstructured driving environments. Humans can react to irregular objects, like animals on the road or jaywalkers, but machines have to become experts at identification (Fig. 2).

Figure 2: Example input images for a baseline trained on the dataset

It will be many years before autonomous driving is fully deployed in India and other regions because of the complexity of the road conditions. As in the U.S., driver assistance systems are available today and will continue to evolve to include foundational elements for autonomous functionality, which will be followed by commercial autonomous services, and then individual use. We are excited to see how this first dataset of its kind will propel greater innovation across the world.

To learn more, watch this video about the world’s first open dataset of driving conditions in India and discover how Intel® technologies are powering automated driving experiences around the world.