Recently, my teammate Weimin Wang and I competed in Kaggle’s Statoil/C-CORE Iceberg Classifier Challenge. The competition challenged participants to classify images acquired from C-band radar, and it drew more participants than any other image classification competition Kaggle has ever hosted, so I’m very excited to announce that we won 1st place out of 3,343 teams! Now we’d like to share our winning solution with you.
The objective of this competition was to create an algorithm that automatically classifies images of ships vs. icebergs. Icebergs can pose a threat to ships out at sea, and in some remote areas with harsh conditions the most effective method of monitoring is through satellite imagery.
The overall solution architecture is presented in the figure below. After initial exploratory data analysis, a few key findings guided the choice of architecture. First, a strong correlation between the image label and its incidence angle led us to build models based on incidence angle grouping. These groups were identified through unsupervised learning.
Two primary model families were chosen: custom CNNs and VGG. Over 100 different custom CNN architectures were designed in conv-conv-pool and conv-conv-conv-pool styles, using 2 to 4 conv layers, 1 to 3 fully connected layers, and fully connected sizes from 16 to 512. Custom image pre-processing filters were also applied. Random search was used to find the top 10 architectures; each of these was then trained using four-fold cross-validation and used as part of the overall model ensemble. A very similar approach was taken with the VGG architecture, using VGG16 pre-trained on ImageNet as the base model. Various fully connected architectures were selected after incorporating a third channel into the two-channel images.
For each model type, models were trained on the full dataset as well as on a separate subset of the data identified through incidence angle clustering. Ensembles of the models were then performed using both greedy blending and two-level stacking. Blending and stacking combine the results of multiple models while reducing prediction error on holdout and validation sets.
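To make the random search step a little more concrete, here is a minimal sketch in Python. It is not our competition code: the search space only mirrors the ranges quoted above, the dictionary keys are illustrative, and `evaluate` is a hypothetical callback standing in for "train this architecture and return its validation log loss".

```python
import random

# Search space built from the ranges quoted in the text. The key names and
# the discrete fc_size grid are assumptions made for this illustration.
SEARCH_SPACE = {
    "block_style": ["conv-conv-pool", "conv-conv-conv-pool"],
    "n_conv_layers": [2, 3, 4],
    "n_fc_layers": [1, 2, 3],
    "fc_size": [16, 32, 64, 128, 256, 512],
}


def sample_architecture(rng):
    """Draw one candidate architecture uniformly from the search space."""
    return {name: rng.choice(options) for name, options in SEARCH_SPACE.items()}


def random_search(evaluate, n_trials=100, top_k=10, seed=0):
    """Score n_trials random candidates and keep the top_k with lowest loss.

    `evaluate` is a hypothetical callback: it takes a config dict and
    returns a validation log loss (lower is better).
    """
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        config = sample_architecture(rng)
        trials.append((evaluate(config), config))
    # Sort on the loss only, so config dicts are never compared.
    trials.sort(key=lambda pair: pair[0])
    return trials[:top_k]
```

In practice each surviving configuration would then be retrained with cross-validation before joining the ensemble, as described above.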
Greedy blending allowed a model into the ensemble only if adding it improved the overall median score. Two-level stacking combined the CNN and VGG predictions with additionally computed image statistics as new features. A final CNN result was then computed by averaging the blended and stacked results.
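The greedy blending idea can be sketched in a few lines of plain Python. This is an illustration under stated assumptions rather than our exact procedure: the blend is taken to be the element-wise median of the selected models' predictions (one possible reading of "median score"), the metric is assumed to be log loss on a holdout set, and the names `log_loss`, `blend`, and `greedy_blend` are hypothetical.

```python
import math
from statistics import median


def log_loss(y_true, y_prob, eps=1e-15):
    """Binary log loss, with probabilities clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


def blend(pred_lists):
    """Element-wise median across models (an assumed blending rule)."""
    return [median(column) for column in zip(*pred_lists)]


def greedy_blend(y_true, model_preds):
    """Add models one at a time, keeping each only if the blend improves.

    model_preds maps a model name to its holdout probabilities.
    Returns the selected model names and the blended log loss.
    """
    # Visit models from best to worst individual holdout score.
    ranked = sorted(model_preds, key=lambda n: log_loss(y_true, model_preds[n]))
    selected = [ranked[0]]
    best_score = log_loss(y_true, model_preds[ranked[0]])
    for name in ranked[1:]:
        candidate = blend([model_preds[n] for n in selected + [name]])
        score = log_loss(y_true, candidate)
        if score < best_score:  # include only if the blend gets better
            selected.append(name)
            best_score = score
    return selected, best_score
```

Because selection is driven by the holdout score, a blend chosen this way can overfit that holdout set, which is one reason to pair it with a second ensembling method such as stacking.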
A final set of custom post-processing algorithms was then derived from the unsupervised learning observations made during the exploratory data analysis. The post-processing steps grouped the data by incidence angle using k-nearest neighbors (KNN) and updated the confidence of each prediction based on the results of its nearest neighbors.
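As a rough illustration of this kind of neighbor-based adjustment, the sketch below pulls each prediction toward the mean prediction of its nearest neighbors in incidence angle. The update rule, the `k` and `weight` parameters, and the function name `knn_adjust` are all assumptions made for the example; the actual post-processing used several hand-crafted steps that are not reproduced here.

```python
def knn_adjust(angles, probs, k=3, weight=0.5):
    """Blend each prediction with the mean of its k nearest neighbors.

    Neighbors are found by absolute difference in incidence angle.
    `weight` is the share given to the neighbor mean (assumed value).
    """
    n = len(angles)
    adjusted = []
    for i in range(n):
        # Rank every other sample by how close its angle is to sample i.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: abs(angles[j] - angles[i]),
        )
        neighbors = others[:k]
        if not neighbors:
            # A lone sample has nothing to borrow from; keep it unchanged.
            adjusted.append(probs[i])
            continue
        neighbor_mean = sum(probs[j] for j in neighbors) / len(neighbors)
        adjusted.append((1 - weight) * probs[i] + weight * neighbor_mean)
    return adjusted
```

With this rule, an uncertain prediction surrounded by confident neighbors at nearly the same angle moves toward their consensus, while predictions that already agree with their neighbors barely change.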
Several key observations led to the winning solution. First, initial exploratory data analysis using unsupervised learning techniques helped segment the problem by natural groupings of incidence angle. Building separate CNN models for these groupings reduced overall log loss. Second, carefully selecting a diverse set of CNN architectures and ensembling many weaker learners, rather than searching for a smaller set of stronger learners, was also key to reducing log loss, largely because of the limited size of the dataset. Finally, a post-processing pipeline that used several hand-crafted techniques to fully exploit the unsupervised learning observations was the last step in creating the winning solution. If you’d like to learn more, please join me at the Intel AI DevCon event in May where I will be delivering a technical presentation on this solution.