AWS beefs up SageMaker machine learning


Amazon Web Services has expanded the capabilities of its machine learning toolkit to address a number of challenges that enterprises confront when trying to operationalize machine learning, from model organization, training, and optimization to monitoring the performance of models in production.

Launched in 2017, SageMaker aims to make machine learning adoption simpler for customers by bringing together a hosted environment for Jupyter notebooks with built-in model management, automated spin-up of training environments that pull data from Amazon S3, and HTTPS endpoints for hosting models on EC2 instances.
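For orientation, the pieces just described (S3-backed training data, on-demand training compute, hosted endpoints) correspond to requests against SageMaker's low-level API. The sketch below builds a `create_training_job` request payload using boto3-style field names; the bucket, role ARN, and container image URI are placeholders, and the actual AWS call is left commented out so the snippet stays self-contained.

```python
# Sketch of the SageMaker training workflow described above, using the
# request shape of boto3's low-level SageMaker API. Bucket, role ARN,
# and image URI are placeholders; this only builds the payload and
# does not call AWS.
training_job = {
    "TrainingJobName": "demo-training-job",
    "AlgorithmSpecification": {
        "TrainingImage": "<ecr-image-uri>",  # built-in or custom algorithm image
        "TrainingInputMode": "File",
    },
    "RoleArn": "arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder
    "InputDataConfig": [{
        # Training data is read from S3, as described above.
        "ChannelName": "train",
        "DataSource": {"S3DataSource": {
            "S3DataType": "S3Prefix",
            "S3Uri": "s3://my-bucket/train/",  # placeholder bucket
        }},
    }],
    "OutputDataConfig": {"S3OutputPath": "s3://my-bucket/output/"},
    # The training environment is spun up on EC2-based ML instances.
    "ResourceConfig": {
        "InstanceType": "ml.m5.xlarge",
        "InstanceCount": 1,
        "VolumeSizeInGB": 10,
    },
    "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
}

# With credentials configured, the job would be submitted like so, and the
# resulting model hosted behind an HTTPS endpoint:
#   import boto3
#   sm = boto3.client("sagemaker")
#   sm.create_training_job(**training_job)
#   ... create_model / create_endpoint_config / create_endpoint ...
```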


As CEO Andy Jassy presents it, AWS—like rivals Google Cloud and Microsoft Azure—wants to become the leading, full-service environment for data scientists, data engineers, and non-specialist developers to run all of their machine learning workloads.

For AWS this means a triple-layered stack of services, starting with the basic building blocks used by experienced technical practitioners who want to tweak every part of their modeling process, whether with PyTorch, MXNet, or another framework. SageMaker sits above this layer, promising to simplify key elements of the process, and the stack is topped off with off-the-shelf cognitive services such as Amazon Translate, Amazon Transcribe, and image and voice recognition capabilities.

Introducing SageMaker Studio

Now Amazon is expanding this sandbox with what it calls SageMaker Studio, finally giving customers a fully integrated development environment (IDE) in which to store and manage all of the source code, notebooks, documentation, data sets, and project folders needed to run machine learning models at enterprise scale, including collaboration capabilities.

Comparable data science platforms are also provided by the likes of Domino Data Lab and Dataiku.

SageMaker Experiments and Model Monitor

Among the new capabilities AWS has announced, let’s start with notebooks. AWS wants to let users spin up a Jupyter notebook in one click, with the underlying compute provisioned automatically, and to automate the tricky process of transferring content between notebooks.

Next comes SageMaker Autopilot, which automates the selection, training, and optimization of machine learning models within SageMaker for classification and linear regression problems.

Jassy said that customers have asked for greater visibility into these models, and AWS has responded with SageMaker Autopilot.

The rough end-to-end workflow with SageMaker Autopilot: customers provide a CSV file or a link to the S3 location of the data they want to build a model on; SageMaker then trains up to 50 different models on that data, gives customers access to each of them as a notebook, and presents them on a leaderboard within SageMaker Studio. The entire process, from data cleaning and preprocessing to algorithm choice to instance and cluster sizing, is handled automatically.
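That workflow maps onto a single API call. A minimal sketch, assuming boto3's SageMaker `create_auto_ml_job` operation; the bucket names, role ARN, and target column are placeholders, and the AWS call itself is left commented out so the snippet stays self-contained.

```python
# Sketch of kicking off the Autopilot workflow described above via the
# request shape of boto3's create_auto_ml_job operation. Bucket, role,
# and target column are placeholders; no AWS call is made here.
automl_request = {
    "AutoMLJobName": "demo-autopilot-job",
    "InputDataConfig": [{
        # CSV training data in S3, as described in the workflow above.
        "DataSource": {"S3DataSource": {
            "S3DataType": "S3Prefix",
            "S3Uri": "s3://my-bucket/input/train.csv",  # placeholder
        }},
        "TargetAttributeName": "label",  # column to predict (placeholder)
    }],
    "OutputDataConfig": {"S3OutputPath": "s3://my-bucket/autopilot-output/"},
    # Matches the problem types Autopilot handles: classification or regression.
    "ProblemType": "BinaryClassification",
    "AutoMLJobObjective": {"MetricName": "F1"},
    "RoleArn": "arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder
}

# With credentials configured:
#   import boto3
#   sm = boto3.client("sagemaker")
#   sm.create_auto_ml_job(**automl_request)
```

The candidate models Jassy describes can then be inspected programmatically (for example via boto3's `list_candidates_for_auto_ml_job`) or browsed as notebooks on the SageMaker Studio leaderboard.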

“So when you open the notebook the recipe of that model is there, from the algorithm to the parameters, so you can evolve it if you want,” Jassy said during his re:Invent keynote today.

In theory this allows companies to level up their models as they go with AWS, starting with classification and regression algorithms, but giving them the ability to track, measure, and customize these as they accumulate more data and grow the data science and engineering skills in their business.

SageMaker Studio is available immediately from the AWS US East (Ohio) region, while SageMaker Experiments and SageMaker Model Monitor are available immediately for all SageMaker customers.