Complex simulations drive discovery in a huge range of scientific fields, including seismology, astrophysics, economics, and medicine. As simulations grow more complex to better model phenomena and processes, __how__ a simulation generated a result can be just as important as, or even more important than, __what__ result was generated. In other words, given an output from a simulation, what inputs and processes generated that output? Scientists have used various approaches for this kind of analysis in the past, but these have generally been too computationally costly to use efficiently with highly complex simulations.

Etalumis (the name is “simulate” spelled backward), an AI/HPC research project with contributors from the University of Oxford, CERN, Lawrence Berkeley National Lab, Intel, the University of British Columbia, the University of Liege, and New York University, proposes a new probabilistic programming approach to this problem in a paper that was __selected as a finalist for best paper at SC19__ and will be __presented at NeurIPS 2019__. In the first application of Etalumis’ approach to simulation data from CERN’s Large Hadron Collider, the Etalumis team reduced the training time for their probabilistic network from 87 days to just 9 minutes using an improved version of PyTorch and more than 1,000 nodes of NERSC’s Cori supercomputer. The team also achieved the largest-scale posterior inference to date in a Turing-complete probabilistic programming language (PPL), involving approximately 24,000 latent variables expressed by the Sherpa simulator’s nearly one million lines of C++ code.

Etalumis’ approach is highly portable and designed to interface with existing simulations, which could lead to many significant scientific opportunities in fields beyond high-energy physics.

HPC simulations generally solve only the “forward” problem: for given parameters, the simulation generates certain outputs. It is equally important, if not more so, to solve the “inverse” problem, i.e., to reverse the simulation to learn how an output was generated. Past approaches to this inverse problem have been extremely computationally costly.

Etalumis solves this challenging inverse problem for existing scientific simulators in a highly efficient and scalable way, using a deep-learning-based probabilistic programming inference engine (what we call “inference compilation” technology), and it provides highly interpretable posterior results. Given an existing simulator’s output, it lets scientists predict both the input parameters and the execution path that led to that output. This research has potential impact across many scientific research areas.

Our team demonstrated this inverse capability on a Large Hadron Collider (LHC) particle physics use case with the state-of-the-art Sherpa simulator. Our deep-learning-based inference compilation (IC) technology supports both distributed training and distributed inference. Intel contributed by helping to overcome the scaling and performance optimization challenges of the dynamic 3DCNN-LSTM network architecture, and by scaling the probabilistic programming system, implemented with the PyTorch deep learning framework, to run on large HPC platforms. We improved PyTorch so that training of this complex dynamic neural network scales to 1,024 Intel Xeon processor-based nodes of the Cori supercomputer at NERSC, with a global mini-batch size of up to 128k. This approach reduced training time from 87 days to 9 minutes, enabling scientists to apply AI in their work.

Until now, scientists have often addressed this inverse problem by running exhaustive simulations to find the set of parameters that matches the output of the experiment, which is inefficient and extremely costly in computation. Alternative approaches using existing PPLs required complex scientific simulation models to be rewritten from scratch in the chosen language or system, and they could not handle non-differentiable scientific simulation code, which limited models built on existing PPLs to demos or simplified simulations. More recently, Markov chain Monte Carlo (MCMC) approaches have been applied, but because they are sequential in nature they have high computational cost and low scalability. Moreover, MCMC methods lose efficiency to their “burn-in” period and to autocorrelation between samples.
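To make the MCMC limitations concrete, here is a minimal sketch of random-walk Metropolis-Hastings on a toy inverse problem. It is not Etalumis code: the one-variable Gaussian “simulator”, the function names, and the step size are all assumptions chosen for illustration. Each step depends on the previous one, so the chain cannot be parallelized across steps, the early samples must be discarded as burn-in, and neighboring samples are correlated.

```python
import math
import random


def log_joint(x, y_obs, noise_std=0.5):
    """Unnormalized log p(x, y_obs) for a toy simulator (an assumption):
    latent x ~ Normal(0, 1), observation y ~ Normal(x, noise_std)."""
    log_prior = -0.5 * x * x
    log_likelihood = -0.5 * ((y_obs - x) / noise_std) ** 2
    return log_prior + log_likelihood


def random_walk_mh(y_obs, n_steps, step_size=0.5, x0=-10.0, seed=0):
    """Random-walk Metropolis-Hastings; deliberately started far from the posterior."""
    rng = random.Random(seed)
    x, lp = x0, log_joint(x0, y_obs)
    chain = []
    for _ in range(n_steps):
        candidate = x + rng.gauss(0.0, step_size)
        lp_candidate = log_joint(candidate, y_obs)
        # Accept with probability min(1, p(candidate) / p(x)).
        if rng.random() < math.exp(min(0.0, lp_candidate - lp)):
            x, lp = candidate, lp_candidate
        chain.append(x)  # a rejected move repeats the current state
    return chain


def lag1_autocorrelation(xs):
    """Correlation between each sample and the next one in the chain."""
    mean = sum(xs) / len(xs)
    num = sum((a - mean) * (b - mean) for a, b in zip(xs, xs[1:]))
    den = sum((a - mean) ** 2 for a in xs)
    return num / den


if __name__ == "__main__":
    chain = random_walk_mh(y_obs=2.0, n_steps=20000)
    kept = chain[2000:]  # discard the burn-in portion
    print("first sample:", chain[0])
    print("posterior mean estimate:", sum(kept) / len(kept))
    print("lag-1 autocorrelation:", lag1_autocorrelation(kept))
```

For this toy model the exact posterior mean is `y_obs / (1 + noise_std**2)`, which is 1.6 for the values above, so the estimate can be checked by hand.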

We sought to solve the inverse problem using deep learning inference compilation. Because we ultimately wanted to interface our probabilistic programming language with existing simulator codes, we needed to solve issues pertaining to the model-PPL interface, handling of priors with long tails, rejection sampling routines, addressing schemes, and the IC network architecture.

Etalumis is efficient, scalable, and highly interpretable because it uses a deep-learning-based probabilistic programming method.

- It is efficient because it uses a trained neural network to provide proposals that guide amortized inference (based on importance sampling), and the neural network needs to be trained only once for a given simulation model. For a given effective sample size, IC inference needs only a fraction of the computation cost of the random-walk Metropolis-Hastings (RMH) MCMC baseline.
- It is scalable, as we can enable both distributed training and distributed inference. IC inference is embarrassingly parallel after the proposal neural network is trained to convergence. The ability to retrain a model quickly is transformative to research.
- It is highly interpretable, as existing simulation code is interpreted/executed as probabilistic programs and inference is done within the simulation execution space in the structured model defined by the simulator code base. The ability to control existing simulators to generate interpretable posteriors is relevant to scientific domains where interpretability in model inference is critical.
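The efficiency and scalability points above can be illustrated with a small importance-sampling sketch. This is not the Etalumis implementation: the toy Gaussian model is an assumption, and the hand-picked “guided” proposal parameters merely stand in for what a trained proposal network would output for a given observation. Every sample is drawn and weighted independently, which is why this style of inference parallelizes so easily, and a proposal closer to the posterior yields a larger effective sample size for the same number of simulator runs.

```python
import math
import random

NOISE_STD = 0.5  # observation noise of the toy simulator (an assumption)


def log_normal_pdf(x, mean, std):
    """Log density of Normal(mean, std) evaluated at x."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std) - 0.5 * math.log(2 * math.pi)


def importance_sample(y_obs, prop_mean, prop_std, n, seed=0):
    """Self-normalized importance sampling for the toy model
    x ~ Normal(0, 1), y ~ Normal(x, NOISE_STD), using proposal
    Normal(prop_mean, prop_std). Returns (posterior mean, effective sample size)."""
    rng = random.Random(seed)
    xs, log_ws = [], []
    for _ in range(n):  # iterations are independent of one another
        x = rng.gauss(prop_mean, prop_std)
        log_w = (log_normal_pdf(x, 0.0, 1.0)            # prior
                 + log_normal_pdf(y_obs, x, NOISE_STD)  # likelihood
                 - log_normal_pdf(x, prop_mean, prop_std))  # proposal
        xs.append(x)
        log_ws.append(log_w)
    shift = max(log_ws)  # subtract the max for numerical stability
    ws = [math.exp(lw - shift) for lw in log_ws]
    total = sum(ws)
    post_mean = sum(w * x for w, x in zip(ws, xs)) / total
    ess = total ** 2 / sum(w * w for w in ws)
    return post_mean, ess


if __name__ == "__main__":
    n = 5000
    # Baseline: propose from the prior, ignoring the observation.
    print("prior proposal:", importance_sample(2.0, 0.0, 1.0, n))
    # Stand-in for a trained proposal: hand-set near the true posterior.
    print("guided proposal:", importance_sample(2.0, 1.6, 0.5, n))
```

In a real inference-compilation setup the proposal parameters would be produced by the trained network at each random choice in the simulator rather than fixed by hand as they are here.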

Etalumis will impact multiple scientific applications and give scientists the ability to solve the simulation inverse problem in a highly efficient, scalable, and interpretable way. The current PPL system can be reused: existing simulators plug in through a common interface, and simulators written in multiple languages are supported.

Additionally, we think Etalumis is an important proof point for HPC + AI in scientific applications. AI/HPC convergence is becoming the driving force for HPC growth. We demonstrated the ability to use large-scale Intel® Xeon® Scalable processor-based systems to produce real scientific impact.

We hope to apply this method to more scientific applications, including epidemiology modeling (such as disease transmission and prevention models), autonomous vehicle and reinforcement learning environments, cosmology, and climate science. The PPL researchers on the team are working to apply the system to a composite manufacturing simulator, a process simulation of composite materials. In essence, it simulates the “cooking” of an airplane wing. The inference problem is to determine the wing’s internal temperature from various observations, including its observed surface temperature.

For more on Etalumis, __please review our paper__, “Efficient Probabilistic Inference in the Quest for Physics Beyond the Standard Model,” __look for us at the 2019 NeurIPS conference__, and stay tuned to __@IntelAIResearch__ on Twitter.

© Intel Corporation. Intel, the Intel logo, and Xeon are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries. Other names and brands may be claimed as the property of others.