This summer marked the 4th annual summer research program that brings together NASA’s Frontier Development Lab (FDL), the SETI Institute, and public and private research partners to tackle some of the most challenging problems on our home planet and beyond, including space weather, flood detection, and lunar resources. Intel, Google Cloud, and other corporate partners have answered the call to provide NASA with artificial intelligence experts who advise research teams on state-of-the-art methods in data science. This year, Intel provided 20 AI mentors to work with six teams in the US and three teams in Europe. Here’s a glimpse of the contributions that three of our Intel mentors made during the eight-week program.
I was an Intel AI mentor on the Enhanced Predictability of Global Navigation Satellite Systems (GNSS) Disturbances challenge. As part of this challenge, the team wanted to better understand the components that contribute to space weather. More specifically, knowledge of conditions in the ionosphere (the upper atmosphere) can help predict disturbances in GPS signals. Ionospheric scintillations are rapid fluctuations in the amplitude and phase of radio signals, caused by small-scale structures in the ionosphere. Severe scintillation conditions can prevent a GPS receiver from locking on to a signal, making it impossible to calculate a latitude/longitude position. Less severe scintillation conditions can reduce the accuracy and the confidence of positioning results. The ability to predict such disturbances can enable the prefetching, calculation, or storage of necessary information, which matters especially for time-sensitive, real-time applications in aviation, navigation, and financial markets. In a society highly reliant on GPS, the impact of this research is crucial and immediate. We can all relate to the frustration of losing navigation instructions, only to be re-routed a few minutes later.
To help make these predictions as accurate as possible, the team leveraged public data that measures the behavior of the sun and of geospace weather, such as solar wind density and velocity, proton fluxes, and magnetometer readings. A key data source was a network of ground instruments in the Canadian Arctic called the Canadian High Arctic Ionospheric Network (CHAIN). These ground receivers record information about the Total Electron Content (TEC) of the ionosphere. All of these factors contribute to the scintillations we wanted to predict. The domain experts on our team believed there is a strong correlation between bright features in the aurora and the scintillation index. My role as an AI mentor on the team was to help verify this correlation by providing guidance on the experiments and models.
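As a toy illustration of the kind of correlation check involved, the sketch below computes a Pearson correlation coefficient between two synthetic series standing in for a per-image aurora brightness summary and the scintillation index; the data here is fabricated for demonstration and is not the team's actual measurements.

```python
import numpy as np

# Synthetic stand-ins: "brightness" mimics a per-image summary of aurora
# imagery, and "scintillation" is built with an injected dependence on it,
# plus noise. Real values would come from aurora imagery and CHAIN receivers.
rng = np.random.default_rng(0)
brightness = rng.random(200)
scintillation = 0.8 * brightness + 0.2 * rng.random(200)

# Pearson correlation coefficient between the two series; values near 1
# indicate a strong positive linear relationship.
r = np.corrcoef(brightness, scintillation)[0, 1]
print(f"Pearson r = {r:.2f}")
```

A correlation check like this is only a first step; the team's models go further by learning predictive structure from the images themselves.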
In my career as a data scientist, I’ve applied machine learning to multiple domains, including computer vision, speech, and natural language processing. Data science comes down to leveraging data to help answer predefined questions. Drawing on that experience, I framed this challenge as a time-series problem: classify the auroral images and predict the scintillations while ascertaining a correlation between the two. The more complex the data, the more exciting the problem! It was crucial to handle the interplay of multiple data sources, understand the importance of different features, correctly normalize the data, and ensure the model could be trained to extract insights. In the first few weeks, my role, along with the other AI mentors, was to brief the team on a narrowed-down set of deep learning approaches that might be applicable to this challenge and to help scope their initial experiments. As the summer progressed, I worked with the team on data ingestion and helped define the model to be run. Most recently, the team faced an issue where the input data had many missing features; training a model on this data was producing inaccurate results. I suggested a few ways to tackle the issue, such as adding masks to the missing entries so they could be handled properly during training, which improved the results. Every now and then I have a sense of amazement at what I’ve been so fortunate to be a part of: meaningful science amidst some of the best researchers in their fields!
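One common way to handle missing entries of the kind described above is to zero-fill the gaps and append a binary mask so the model can distinguish "missing" from a genuine zero. This is a minimal numpy sketch of that idea, not the team's exact implementation:

```python
import numpy as np

def mask_missing(X):
    """Replace missing (NaN) entries with zero and append a binary mask
    so a downstream model can learn which values were actually observed.

    X: (n_samples, n_features) array that may contain NaNs.
    Returns an (n_samples, 2 * n_features) array: imputed values + mask.
    """
    mask = ~np.isnan(X)                # True where a value was observed
    X_filled = np.where(mask, X, 0.0)  # zero-fill the gaps
    return np.hstack([X_filled, mask.astype(X.dtype)])

# Tiny example: two samples, three features, one missing value each.
X = np.array([[1.0, np.nan, 3.0],
              [np.nan, 5.0, 6.0]])
X_masked = mask_missing(X)
print(X_masked.shape)  # (2, 6)
```

The mask columns let the model down-weight imputed values instead of treating them as real observations, which is one reason masking tends to outperform naive imputation on data with systematic gaps.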
I worked with Anahita in mentoring the GNSS challenge team. Although my specialty is healthcare AI, data scientists really just try to tell stories with data, whether that data comes from MRI images in a hospital or telescope images in an observatory. I also help coordinate our 20 Intel AI mentors, which gives me a unique perspective on how FDL evolves over the eight-week sprint.
With so many mentors from Intel, Google Cloud, NASA, SETI, and others, we’ve designed a “tag team” effort to help the research teams. For example, during the initial two-week sprint I helped set up the research team on Google Cloud Platform instances and introduced them to the Google Cloud Deep Learning VMs, which are optimized for Intel Xeon Scalable processors. We then worked through the initial deep learning topology design, merging the previous year’s multilayer perceptron approach with a convolutional network that accepts ground-based telescope images of the aurora borealis, and packaging both into a long short-term memory (LSTM) recurrent neural network. The preliminary work showed that adding the aurora images improved the algorithm’s power to predict GPS interruptions.
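A fused topology of this kind can be sketched roughly as follows; all layer sizes, input shapes, and the class name here are illustrative assumptions, not the team's actual configuration.

```python
import torch
import torch.nn as nn

class AuroraScintillationNet(nn.Module):
    """Hypothetical sketch: a small CNN encodes each aurora image, an MLP
    encodes the tabular space-weather features, and an LSTM models the
    fused sequence to predict a scintillation-related target."""
    def __init__(self, n_tabular=8, hidden=32):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv2d(8, 16, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())      # -> 16 features/image
        self.mlp = nn.Sequential(nn.Linear(n_tabular, 16), nn.ReLU())
        self.lstm = nn.LSTM(16 + 16, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)                # scalar prediction

    def forward(self, images, tabular):
        # images: (batch, time, 1, H, W); tabular: (batch, time, n_tabular)
        b, t = images.shape[:2]
        img_feat = self.cnn(images.reshape(b * t, *images.shape[2:]))
        img_feat = img_feat.reshape(b, t, -1)
        fused = torch.cat([img_feat, self.mlp(tabular)], dim=-1)
        out, _ = self.lstm(fused)
        return self.head(out[:, -1])  # predict from the last time step

model = AuroraScintillationNet()
pred = model(torch.randn(2, 5, 1, 32, 32), torch.randn(2, 5, 8))
print(pred.shape)
```

The design choice worth noting is that the CNN runs per time step (by folding the time axis into the batch) before the LSTM sees the sequence, which is how image sequences are typically fed to recurrent layers.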
I was also able to provide mentoring to other FDL teams. The Disaster Prevention, Progress, and Response team is trying to improve our ability to predict and respond to floods using satellite imagery, ground observations, and social platform data. Interestingly, a European Space Agency satellite scheduled for launch this year carries an Intel® Movidius™ Myriad™ Vision Processing Unit (VPU), which allows the researchers to run AI models on the satellite itself without the costly and time-consuming step of transmitting the data back to Earth. I had previously used the Intel® Neural Compute Stick and the Intel® Distribution of OpenVINO™ toolkit in medical applications, and I showed the FDL team how easy it is to convert their model to perform “orbital edge AI” on a VPU.
Even before the eight-week sprint began, the Disaster Prevention team set up Slack channels for constant communication and knowledge sharing among the team researchers and NASA FDL mentors. Through these channels I learned more about the existing methodologies used by the United States Geological Survey (USGS) and the National Oceanic and Atmospheric Administration (NOAA). Flooding happens when a river’s discharge exceeds its channel’s capacity, causing the river to overflow its banks. One of the most prevalent ways to measure flood susceptibility today is to use river height at stream gauge locations (as shown in Figure 1) to build physical, non-linear hydrological models of rivers. This method depends on hydrological parameters that require meticulous calibration and are customized to a single basin/watershed region; it is time-consuming, costly, and unsuitable for short-term flood prediction. That’s why it is crucial to find alternatives that yield faster, more reliable flood prediction models applicable to wider areas; artificial intelligence is well-suited to this challenge. By the end of the first week of the bootcamp, the NASA FDL team, with input from the USGS, had already brainstormed problems they could solve in the subsequent seven weeks that would be both feasible to accomplish and meaningful to the USGS in the long term.
By week four, the team had formulated the non-trivial problem they wanted to solve: can a generalizable machine learning model predict flood occurrence for a given month, and the time to peak river height after peak rainfall? Rainfall, soil imperviousness, and various basin characteristics were selected as feature attributes. The data was spatially distributed over six states and temporally distributed over 10 years with a year’s gap. The goal of this flood susceptibility model was to help prioritize areas for flood prevention and management investments by analyzing the data with machine learning techniques on Google Cloud compute engines using Intel® Xeon® CPUs.
The team evaluated random forests, gradient boosted decision trees, and neural networks for predicting flood susceptibility, comparing them with precision-recall curves. With gradient boosted trees, the model could accurately predict flood events in a given month 70% of the time. As a machine learning mentor, I strove to make sure the team did not overlook preprocessing steps like feature normalization, correlation analysis and feature ranking, handling a highly imbalanced dataset, and regularization techniques. My job was to provide the team with relevant resources and techniques to remove roadblocks along the way. For instance, when the team fused VGG features extracted from an elevation depth map with the tabular data, the flood susceptibility prediction degraded, though in theory it should have improved. So I reached out to Alexei Bastidas at the Intel AI Lab for his expertise on extracting features from satellite imagery and fusing them with tabular data. Though not all of his suggestions could be implemented in the time available, the team will leverage them when expanding the project later on.
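A minimal sketch of this kind of evaluation appears below, with synthetic data standing in for the team's rainfall, imperviousness, and basin features, and a rare positive class mimicking the imbalanced flood/no-flood labels; the model choice (scikit-learn's gradient boosting) and all numbers are illustrative, not the team's pipeline.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import auc, precision_recall_curve
from sklearn.model_selection import train_test_split

# Synthetic tabular data: ~12% positive "flood" months, so accuracy alone
# would be misleading and a precision-recall curve is more informative.
rng = np.random.default_rng(42)
X = rng.normal(size=(2000, 6))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=1.5, size=2000) > 2.2).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

# Evaluate with the precision-recall curve over predicted probabilities;
# a no-skill classifier's PR-AUC equals the positive-class prevalence.
probs = clf.predict_proba(X_te)[:, 1]
precision, recall, _ = precision_recall_curve(y_te, probs)
pr_auc = auc(recall, precision)
print(f"PR-AUC: {pr_auc:.2f}")
```

Stratifying the train/test split preserves the rare-class proportion in both sets, which matters when positives are this scarce.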
Being a mentor on this “Cheetah”-like project was a different kind of learning experience for me. Mentoring here was about steering the researchers’ thought process in the right direction when needed. It was not about imposing particular algorithms or techniques, but about helping the team focus on promising solutions to try. The eight-week project was about learning to formulate a feasible solution, try different techniques quickly, and fail early in order to recover sooner and move on. As a mentor, I learned that one of my main jobs was to motivate the team. For example, I shared that a failed experiment is not a waste of time; in fact, it yields more insight into model design and data characteristics.
Working with this highly talented group of researchers was a great experience, and I can’t wait to be an AI mentor again next year. An incredible outcome of this summer’s work: research geologist Jonathon Stock, director of the USGS National Innovation Center, invited the team to present their work in order to engage more domain experts and gain access to more data, so that the solution could be integrated with the USGS’s systems.
We invite you to check out the final presentations by the disaster prevention and GNSS teams. Intel AI researchers and data scientists continue to support the NASA FDL challenge and appreciate the opportunity to take part in projects that apply deep learning techniques to challenges in space exploration. For more updates, including information on next year’s challenge, you can follow Intel AI research and NASA FDL on Twitter.
Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No product or component can be absolutely secure. Check with your system manufacturer or retailer or learn more at intel.com.
© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.