Introducing the Aeon Dataloader and Other Enhancements in Nervana Cloud 1.5.0

Nervana Cloud 1.5.0 brings extensive under-the-hood changes and improvements. We’ve revamped and updated much of the core code, split the application components into separate microservices, rewritten our job launcher, added support for a new container orchestration service, squashed more than 75 bugs, and greatly expanded our test coverage. The biggest changes visible to the end user are the aeon dataloader and auxiliary file volume support.

Aeon dataloader – The aeon dataloader enables fast and flexible access to training data sets that are too large to load directly into memory. Data is first loaded in chunks called “macrobatches”, which are then split further into minibatches to feed the model. A simple interface lets you configure the dataloader for custom datasets and load data from disk with minimal latency. A manifest file lists your local input and target paths. On the backend we’ve built an entirely new data service to handle fetching, caching, and serving requests for dataset batches. See the Aeon User Guide for more information.
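The macrobatch-to-minibatch flow described above can be illustrated with a minimal Python sketch. This is not aeon's implementation or API: the function names `iter_macrobatches` and `iter_minibatches`, the size parameters, and the idea of treating each record as an (input path, target path) pair are all assumptions made for illustration.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_macrobatches(records: Iterable[T], macrobatch_size: int) -> Iterator[List[T]]:
    """Yield successive chunks ("macrobatches") without loading every record at once.

    Hypothetical helper for illustration; not part of aeon.
    """
    if macrobatch_size <= 0:
        raise ValueError("macrobatch_size must be positive")
    chunk: List[T] = []
    for record in records:
        chunk.append(record)
        if len(chunk) == macrobatch_size:
            yield chunk
            chunk = []
    if chunk:
        # The final macrobatch may be smaller than macrobatch_size.
        yield chunk


def iter_minibatches(
    records: Iterable[T], macrobatch_size: int, minibatch_size: int
) -> Iterator[List[T]]:
    """Split each macrobatch further into minibatches to feed a model."""
    if minibatch_size <= 0:
        raise ValueError("minibatch_size must be positive")
    for macrobatch in iter_macrobatches(records, macrobatch_size):
        for start in range(0, len(macrobatch), minibatch_size):
            yield macrobatch[start:start + minibatch_size]


if __name__ == "__main__":
    # Assumed record shape: (input path, target path) pairs, as a manifest might list them.
    pairs = [(f"images/{i}.jpg", f"labels/{i}.txt") for i in range(10)]
    for minibatch in iter_minibatches(pairs, macrobatch_size=4, minibatch_size=2):
        print(minibatch)
```

Only one macrobatch is held in memory at a time in this sketch, which is the property that makes the two-level scheme useful for data sets larger than memory. A real dataloader would also prefetch and cache macrobatches.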

Auxiliary file volume support – We’ve also introduced support for arbitrary file volumes to handle data that is not in aeon format, such as older datasets, vocabulary files, and output data. The volumes are mounted read/write during model training and inference jobs. You can append new files to existing volumes or download their contents. See Attaching Data for details on how to use this feature.

This release of Nervana Cloud includes a number of additional features and fixes:

  • New ncloud commands and API endpoints for retrieving command history, getting machine information, and revoking access tokens.
  • Automatic command retry and other enhancements to improve stability when uploading large individual files and directories of many small files. In addition, batch sizes are now configurable to better cope with network lag and disruptions. We also cap the number of simultaneous open file descriptors.
  • Enhancements such as automatic scaling and load balancing to improve streaming inference performance.
  • Revamped administration of users, groups, and tenants to improve the content displayed and fix removal operations in certain scenarios. Administration is now also supported through the web user interface.
  • Nervana Cloud now defaults to neon v1.9.0: all training jobs, interactive Jupyter sessions, and model deployment jobs will use neon v1.9.0 unless you explicitly select a different version.
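To illustrate the idea behind the upload improvements listed above (automatic retry plus configurable batch sizes), here is a minimal Python sketch. It does not show how ncloud implements these features: `with_retry`, `upload_in_batches`, the `send_batch` callback, and every parameter name are hypothetical, and the exponential backoff between attempts is an assumption rather than documented behavior.

```python
import time
from typing import Callable, Sequence, TypeVar

R = TypeVar("R")


def with_retry(
    operation: Callable[[], R],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> R:
    """Run operation, retrying on OSError with exponential backoff (hypothetical helper)."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except OSError:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error to the caller.
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


def upload_in_batches(
    paths: Sequence[str],
    send_batch: Callable[[Sequence[str]], None],
    batch_size: int = 2,
    **retry_options,
) -> int:
    """Send paths in fixed-size batches, retrying each batch independently.

    send_batch is a caller-supplied function (an assumption of this sketch)
    that uploads one batch and raises OSError on a network failure.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    sent = 0
    for start in range(0, len(paths), batch_size):
        batch = list(paths[start:start + batch_size])
        with_retry(lambda b=batch: send_batch(b), **retry_options)
        sent += len(batch)
    return sent
```

Smaller batches mean less work is repeated when a single attempt fails, at the cost of more round trips, which is one reason a configurable batch size helps on unreliable networks.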