Saturday, November 17 • 3:45pm - 4:20pm
Street-fighting techniques for multi-tenant machine learning and big data workloads on Kubernetes

We like to think that Machine Learning and Big Data are all about harnessing our creativity and intelligence to solve extremely difficult problems. Yet we spend only a small part of our time and budget actually analyzing our data or building powerful models. Instead, we are forced to scale and optimize complex infrastructure so that it can handle the jobs we create. Unlike web servers or other stateless services, Big Data and Machine Learning workloads force us to be very conscious of things like GPU availability and data locality.

How can we spend more time on the interesting part of the problem when our infrastructure is so complex? How can we spend most of our time in Spark analyzing our data instead of allocating executors? How can we spend most of our time in TensorFlow creating the most interesting deep network rather than allocating workers to optimize GPU usage?

At MapR, we have seen that Kubernetes can change things radically. We believe that Kubernetes is set to revolutionize how we create and run Big Data and Machine Learning applications. With recent enhancements, Kubernetes has reached a point where much of the traditional pain in creating these complex applications can be avoided.

In this talk, we will demonstrate two examples of how to use Kubernetes to simplify your workload. The first is Spark-on-Kubernetes. We will describe the traditional challenges of building an analytics application using Spark, then demonstrate dynamically building a large Spark cluster for a specific analytics job. We will discuss best practices for handling data locality in the Kubernetes world, and explore launching many Spark clusters in a shared environment along with the various tricks for making this work on Kubernetes at scale.

The second demonstration will use Kubeflow, a machine learning toolkit for Kubernetes that can run TensorFlow, to train and serve deep learning models. We will explore topics like GPU reservation and scheduling, examine the challenges of building an application that uses machine learning at scale on Kubernetes, and demonstrate running Kubeflow in a multi-tenant, cloud-based environment.

Sky Thomas

Engineer/Architect, MapR

Saturday November 17, 2018 3:45pm - 4:20pm PST