Introducing Nauta: A Distributed Deep Learning Platform for Kubernetes*

Artificial intelligence (AI) continues to evolve and expand as enterprises explore use cases to augment their business models. According to a recent study from Gartner, AI deployments are adding real value, with AI-derived business value expected to reach nearly $4 trillion by 2022. One specific area of AI that has experienced massive growth is deep learning (DL). A 2018 survey from Deloitte showed that nearly 50% of respondents had used DL.[1] But while the business value continues to grow and enterprise interest in DL is palpable, integrating, validating, and optimizing deep learning solutions remains a complex, risky, and time-consuming effort. This is why we are introducing Nauta, an open source platform for distributed DL using Kubernetes*.

What is Nauta?

Nauta provides a multi-user, distributed computing environment for running DL model training experiments on Intel® Xeon® Scalable processor-based systems. Results can be viewed and monitored using a command line interface, a web UI, and/or TensorBoard*. Developers can use existing data sets, proprietary data, or data downloaded from online sources, and can create public or private folders to make collaboration among teams easier. For scalability and ease of management, Nauta builds on components from the industry-leading Kubernetes* orchestration system, leveraging Kubeflow* and Docker* for containerized machine learning at scale. DL model templates are available (and customizable) on the platform, removing the complexities associated with creating and running single-node and multi-node deep learning training experiments. For model testing, Nauta also supports both batch and streaming inference, all in a single platform.

Figure 1: Training and deploying a deep neural network with Nauta

By Developers, for Developers

We’ve created Nauta with the workflow of developers and data scientists in mind. Nauta is an enterprise-grade stack for teams who need to run DL workloads to train models that will be deployed in production. With Nauta, users can define and schedule containerized deep learning experiments using Kubernetes on single or multiple worker nodes, check the status and results of those experiments, and then adjust and run additional experiments or prepare the trained model for deployment.

Why Should I Use Nauta?

Nauta gives users the ability to leverage shared best practices from seasoned machine learning developers and operators without sacrificing flexibility. At every level of abstraction, developers still have the opportunity to fall back to Kubernetes and use its primitives directly. Nauta gives newcomers to Kubernetes the ability to experiment while maintaining guard rails. Carefully selected components and an intuitive UX reduce concerns about the production readiness, configuration, and interoperability of open source DL services.

Nauta also facilitates collaboration, as it was designed from the start as a multi-user platform. Job inputs and outputs can be shared between team members and used to help debug issues by launching TensorBoard against another user’s job checkpoints.
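For example, a reviewer might pull up TensorBoard against a teammate's experiment to compare training curves. The command below is an illustrative sketch of that step; the exact syntax and the experiment name are assumptions that may vary by release, so check the project documentation.

```shell
# Illustrative: launch TensorBoard against the checkpoints of an
# experiment -- including one owned by a teammate -- to inspect and
# compare runs ("mnist-run" is a hypothetical experiment name).
nctl launch tb mnist-run
```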

Figure 2: Monitoring jobs using TensorBoard and the Nauta WebUI

Figure 2: Monitoring jobs using TensorBoard and the Nauta WebUI

What’s Next?

We’re excited to continue developing and bringing new features to Nauta throughout 2019. Look for more updates in Q1 and beyond. We’ll keep the development community updated through our landing page, and we encourage developers and data scientists to try Nauta on their own stack. For the most up-to-date technical information, including installation guides, user documentation, and how to get involved with the project, please visit our GitHub repo.