Wednesday, November 14
 

9:00am PST

Advanced Software Engineering with Cliff Click at Capital One, 201 3rd St., San Francisco
Cliff Click is a legend in the world of compilers and distributed systems, and a software engineer's engineer. He is known as a lifelong developer, founder, and brilliant speaker. Now, for the first time, he delivers a full-day workshop that can teach every developer something new and, most importantly, share the insights of a leading practitioner who built some of the things we use daily.

This training will consist of three 2-hour workshops, with breakfast, lunch, and coffee breaks included.

High Performance from Understanding the Low Levels
A deep dive into modern X86 hardware. We look at caches and caching behavior, data-races (and how they show up on an X86), Spectre and Meltdown, the Java Memory Model, CPU performance details (e.g. wide and O-O-O issue, hit-under-miss caches, branch prediction) and memory bandwidth, and relate them to writing performant code. We then tear down a simple Big Data analytics processing loop, make some small changes and get a 5x speedup.
The (Java) Virtual Machine
A look at Virtual Machines far and wide, with a deep dive into the Java Virtual Machine. We'll cover JIT'ing and GC'ing; bytecode cost models & class loading; deoptimization (and re-opt); safepoints; virtual calls & dynamic dispatch; threading and memory models; fast locks & faster locks; OS support (priorities, files, mmap, time) and much, much more.
Parallel and Distributed Computing and Debugging
Parallel computing is everywhere and distributed computing is not far behind. Both bring serious challenges, including data-races, consistency and timing, "Heisen-Bugs", testing, parallel-design thinking, performance, profiling and bottlenecks. Note that this session is not about micro-services and deployment, but about coding and achieving correctness in a parallel & distributed environment.


Speakers

Cliff Click

CEO, Rocket Realtime School
Cliff Click was the CTO of Neurensic (now successfully exited), and CTO and Co-Founder of h2o.ai (formerly 0xdata), a firm dedicated to creating a new way to think about web-scale math and real-time analytics. He wrote his first compiler when he was 15 (Pascal to TRS Z-80!), although his most famous compiler is the HotSpot Server Compiler (the Sea of Nodes IR). Cliff helped A...


Wednesday November 14, 2018 9:00am - 5:00pm PST
Capital One 201 3rd St., 5th Floor, San Francisco CA 94103
 
Thursday, November 15
 

8:00am PST

Breakfast and Welcome
Hot breakfast and uninterruptible coffee!

Thursday November 15, 2018 8:00am - 8:45am PST
Commons

8:45am PST

Grand Welcome and Opening Remarks
Dr. Alexy Khrabrov, the Founder and Organizer of the communities By the Bay, welcomes speakers and attendees, and outlines the structure and agenda of the conference.

Moderators

Alexy Khrabrov

Program Chair, Reactive Summit

Thursday November 15, 2018 8:45am - 9:00am PST
functional

9:00am PST

Opening Keynote: New Functional Constructs in Scala 3
Scala 3 is shaping up, with feature freeze planned for 2019. In this talk I will discuss four new constructs that are likely to be part of Scala 3 and that will affect functional programming style in profound ways. They are: enums, implicit function types, opaque types, and extension methods. I'll present each feature in detail, motivate why it makes sense to add it, and discuss use cases.

Speakers

Martin Odersky

Chief Architect, Lightbend


Thursday November 15, 2018 9:00am - 9:40am PST
functional

9:50am PST

Privacy-aware data science in Scala with monads and type level programming
In order to extract value from datasets, data science and machine learning experts require access to the data itself. However, organizations increasingly have stronger requirements for finer-grained controls over the processing and analysis of potentially sensitive data, for reasons such as regulatory compliance or general privacy policies. In machine learning applications, it may also be desirable to restrict data flow in order to avoid leakage or contamination via side channel information (e.g., see Oscar Boykin's talk from last year's SBTB). We therefore seek a general mechanism to assist users in encoding and enforcing information flow policies in their software, including interactive (i.e., notebook) analyses. In this talk we develop a Scala approach to this problem based on PL and security research whereby illegal data accesses can be rejected at compile-time.

Speakers

David Andrzejewski

Engineering, Sumo Logic
David Andrzejewski is a Senior Engineering Manager at Sumo Logic, where he works on applying statistical modeling and analysis techniques to machine data such as logs and metrics. He also co-organizes the SF Bay Area Machine Learning meetup group. David holds a PhD in Computer Sciences...


Thursday November 15, 2018 9:50am - 10:30am PST
data

9:50am PST

ArrayDeques and How to Contribute to Scala 2.13 Collections
In this talk, we will introduce a new data structure, mutable.ArrayDeque, that outperforms most current mutable Scala collections like Lists, Buffers, Stacks and Queues. We will also go over implementation details needed for this to be part of the new Scala 2.13 collections library. We will then encourage the audience to contribute other useful data structures like Ropes, Zippers, Disjoint Sets etc. to Scala.
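The idea behind an array-backed deque can be sketched as a circular buffer: elements live in a single array, and a head index plus a size wrap around its ends, so appending and prepending are amortized O(1) and indexed access is O(1). This is a minimal Python illustration of that general technique under our own assumptions (the class and method names are hypothetical); it is not the Scala 2.13 `mutable.ArrayDeque` source.

```python
class RingDeque:
    """Array-backed double-ended queue built on a circular buffer (illustrative)."""

    def __init__(self, capacity=4):
        self._buf = [None] * max(1, capacity)
        self._head = 0  # physical index of the logical first element
        self._size = 0  # number of stored elements

    def __len__(self):
        return self._size

    def _grow(self):
        # Copy elements in logical order into an array twice as large.
        n = len(self._buf)
        items = [self._buf[(self._head + i) % n] for i in range(self._size)]
        self._buf = items + [None] * (2 * n - self._size)
        self._head = 0

    def append(self, x):
        if self._size == len(self._buf):
            self._grow()
        self._buf[(self._head + self._size) % len(self._buf)] = x
        self._size += 1

    def prepend(self, x):
        if self._size == len(self._buf):
            self._grow()
        self._head = (self._head - 1) % len(self._buf)  # wraps to the array's end
        self._buf[self._head] = x
        self._size += 1

    def pop_front(self):
        if self._size == 0:
            raise IndexError("pop from empty deque")
        x = self._buf[self._head]
        self._buf[self._head] = None  # drop the reference
        self._head = (self._head + 1) % len(self._buf)
        self._size -= 1
        return x

    def pop_back(self):
        if self._size == 0:
            raise IndexError("pop from empty deque")
        idx = (self._head + self._size - 1) % len(self._buf)
        x = self._buf[idx]
        self._buf[idx] = None
        self._size -= 1
        return x

    def __getitem__(self, i):
        # O(1) random access: translate the logical index to a physical slot.
        if not 0 <= i < self._size:
            raise IndexError(i)
        return self._buf[(self._head + i) % len(self._buf)]
```

Doubling the array on overflow keeps the amortized cost of each append or prepend constant; a fuller version would also shrink the buffer and support insertion in the middle.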

Speakers

Pathikrit Bhowmick

Head of Engineering, Coatue Management
Pathikrit writes Scala full-time at a hedge fund. He is also the author of many widely used Scala libraries (https://github.com/pathikrit), such as better-files and metarest, and is a committee member of the Scala Platform.


Thursday November 15, 2018 9:50am - 10:30am PST
functional

9:50am PST

Reactive Java Programming: a new Asynchronous Database Access API
Reactive Applications require non-blocking database access; the existing JDBC API leads to blocked threads, thread scheduling overhead, and contention. For high throughput and large-scale deployment, the Java community needs a standard asynchronous API for database access where user threads never block. This session presents a new Java standard proposal for accessing SQL databases. This new API is completely non-blocking. It is not intended to be an extension to, or a replacement for, JDBC but, rather, an entirely separate API that provides completely non-blocking access to the same databases as JDBC. This presentation examines the API, its execution model, code samples, a demo of a prototype, and the next steps.

Speakers

Kuassi Mensah

Director Product Management, Oracle Corporation
Kuassi is Director of Product Management at Oracle. He looks after the following product areas (i) Java connectivity to DB (Cloud, on-premises), in-place processing with DB embedded JVM (ii) MicroServices and DB connectivity, and related topics (Data & Tx models, Kubernetes, SAGAs...


Thursday November 15, 2018 9:50am - 10:30am PST
reactive

10:40am PST

FP for Data Science: For-Loops Considered Harmful
Data science, machine learning and probabilistic programming are ready to reap the benefits of functional programming (FP). FP is well known for its benefits in correctness, expressiveness, composability, parallelism, and more. While functional languages have taken off in many different domains, they have not yet deeply penetrated modeling and data science, where Python is most common. In this talk, I will demonstrate in both code and design how functional programming patterns are a major benefit to data science practitioners, even when you work with Python. You will come away with a survey of FP techniques and how they relate to modeling and data science, how they fix the pitfalls of imperative languages and design, and a new perspective on why industry should move toward more FP techniques in the domain of data science.
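As a small illustration of the contrast the abstract draws (not material from the talk; the toy dataset and function names are hypothetical), here is the same aggregation, the mean of positive values per group, written first as a for-loop over mutable accumulators and then as a composition of pure functions:

```python
from functools import reduce
from typing import Dict, List

# Hypothetical toy dataset: one dict per observation.
rows: List[Dict] = [
    {"group": "a", "value": 1.0},
    {"group": "b", "value": 4.0},
    {"group": "a", "value": 3.0},
    {"group": "b", "value": -2.0},
]

def mean_positive_by_group_loop(data):
    # Imperative style: mutable accumulators updated inside a loop.
    sums, counts = {}, {}
    for r in data:
        if r["value"] > 0:
            sums[r["group"]] = sums.get(r["group"], 0.0) + r["value"]
            counts[r["group"]] = counts.get(r["group"], 0) + 1
    return {g: sums[g] / counts[g] for g in sums}

def mean_positive_by_group_fp(data):
    # Functional style: filter, then fold into a fresh mapping of (sum, count).
    positive = filter(lambda r: r["value"] > 0, data)

    def step(acc, r):
        s, c = acc.get(r["group"], (0.0, 0))
        return {**acc, r["group"]: (s + r["value"], c + 1)}  # no in-place mutation

    totals = reduce(step, positive, {})
    return {g: s / c for g, (s, c) in totals.items()}
```

The functional version copies the accumulator on every step to keep `step` pure; that is fine for illustration, while real workloads would lean on vectorized or persistent data structures.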

Speakers

Aris Vlasakakis

Machine Learning Engineer, Credit Karma
My interests are in machine learning, functional programming, and distributed systems, amongst others.


Thursday November 15, 2018 10:40am - 11:00am PST
data

10:40am PST

Immutable APIs and mutable internals: a Scala design case study
We love immutability for creating clear APIs that are easy to understand. Unfortunately, many immutable designs are slower than comparable mutable designs. In this tutorial-style talk we will consider the case of parser combinators. We will first create a simple immutable API for our parser combinator library. Then we will consider an alternate implementation that keeps the immutable API but composes using hidden mutable state. The result is an immutable public API with twice the performance of the original. This design approach can generalize to other use cases in Scala to retain a safe and clean API but dramatically improve performance.
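The shape of this design can be sketched in Python under our own assumptions (all names are hypothetical, and this is not the speaker's Scala code): `Parser` values are immutable and combinators return new parsers, while each `parse` call threads one private, mutable cursor through the whole run instead of every step returning a fresh immutable input state.

```python
class _Cursor:
    """Mutable parsing state, private to a single parse run."""
    __slots__ = ("text", "pos")

    def __init__(self, text):
        self.text = text
        self.pos = 0


class Parser:
    """Immutable public API: combinators build new Parser values."""
    __slots__ = ("_run",)

    def __init__(self, run):
        # run(cursor) -> (ok, value); on success it advances cursor.pos.
        object.__setattr__(self, "_run", run)

    def __setattr__(self, name, value):
        raise AttributeError("Parser is immutable")

    def parse(self, text):
        cur = _Cursor(text)  # fresh hidden state per call
        ok, value = self._run(cur)
        if not ok or cur.pos != len(text):
            raise ValueError(f"parse failed at position {cur.pos}")
        return value

    def map(self, f):
        def run(cur):
            ok, v = self._run(cur)
            return (True, f(v)) if ok else (False, None)
        return Parser(run)

    def then(self, other):
        # Sequence two parsers; on failure, rewind the shared cursor.
        def run(cur):
            start = cur.pos
            ok, a = self._run(cur)
            if ok:
                ok, b = other._run(cur)
                if ok:
                    return True, (a, b)
            cur.pos = start
            return False, None
        return Parser(run)

    def many(self):
        # Zero or more repetitions; stops on failure or on a match that
        # consumed nothing (which would otherwise loop forever).
        def run(cur):
            out = []
            while True:
                start = cur.pos
                ok, v = self._run(cur)
                if not ok or cur.pos == start:
                    cur.pos = start
                    return True, out
                out.append(v)
        return Parser(run)


def char_in(chars):
    """Parser matching a single character from `chars`."""
    def run(cur):
        if cur.pos < len(cur.text) and cur.text[cur.pos] in chars:
            cur.pos += 1
            return True, cur.text[cur.pos - 1]
        return False, None
    return Parser(run)


digit = char_in("0123456789")
number = digit.then(digit.many()).map(lambda p: int(p[0] + "".join(p[1])))
```

Callers only ever see immutable `Parser` values, so the mutation stays an implementation detail. The speedup the abstract reports comes from the talk's own Scala design and is not measured by this sketch.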

Speakers

Oscar Boykin

Machine Learning Infrastructure, Stripe
Oscar is the creator of Scalding, Summingbird, and Algebird, and a professor and mathematician turned software magician.


Thursday November 15, 2018 10:40am - 11:00am PST
functional

10:40am PST

Monitoring Reactive Streams
Reactive Streams are the key to building asynchronous, data-intensive applications with no predetermined data volumes. By enabling non-blocking backpressure, they boost the resiliency of your systems by design. But how do you tune and debug such applications? When productionizing Reactive Streams, the same backpressure that preserves the safety of your pipeline can get in the way of effectively monitoring its status. In this talk we’ll present a line of action to:
  1. measure the throughput of your pipeline
  2. identify its bottlenecks and look at possible tuning counteractions
  3. diagnose liveness issues.
Examples will be in Scala and Akka Streams; however, these patterns are generic and applicable to any Reactive Streams implementation out there.
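To illustrate why backpressure complicates monitoring, here is a language-neutral sketch in plain Python generators rather than Akka Streams code (the stage names and the slow stage are hypothetical). In a pull-based pipeline every stage runs at the sink's pace, so the end-to-end throughput looks the same at each stage. Measuring the time spent inside each stage is one way to locate the bottleneck.

```python
import time
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, Tuple

def timed_stage(name: str, f: Callable, source: Iterable,
                busy: Dict[str, float]) -> Iterator:
    """Lazily map f over source, adding the time spent inside f to busy[name]."""
    for item in source:
        start = time.perf_counter()
        result = f(item)
        busy[name] += time.perf_counter() - start
        yield result

def slow_enrich(x: int) -> int:
    # Hypothetical slow stage standing in for, e.g., a remote lookup.
    time.sleep(0.002)
    return x + 1

def run_pipeline(items: Iterable[int]) -> Tuple[float, Dict[str, float]]:
    busy: Dict[str, float] = defaultdict(float)
    started = time.perf_counter()
    parsed = timed_stage("parse", lambda x: x * 2, items, busy)
    enriched = timed_stage("enrich", slow_enrich, parsed, busy)
    # The sink pulls one element at a time, so upstream stages only work on
    # demand: a crude analogue of Reactive Streams backpressure.
    count = sum(1 for _ in enriched)
    throughput = count / (time.perf_counter() - started)  # elements per second
    return throughput, dict(busy)

throughput, busy = run_pipeline(range(50))
bottleneck = max(busy, key=busy.get)  # stage with the most busy time
```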

Speakers

Stefano Bonetti

Software Engineer
Stefano has been developing large scale backend systems within the cozy boundaries of the JVM for a few years, and he has recently become passionate about the Scala ecosystem - especially all things Akka. He has contributed to Akka, Akka HTTP and Alpakka codebases. Since 2017 he presented...



Thursday November 15, 2018 10:40am - 11:00am PST
reactive

11:10am PST

Making Spark ML Models Portable - Know Your Options
After successfully training an ML model with Apache Spark, the next important task is serving it. One way is to keep using Spark for serving as well, but sometimes that is not desirable or possible, for instance if one would like to expose the model as an HTTP service, run it in a Docker container or use it on a mobile device. This talk explores various approaches for making models portable outside Spark to achieve this.

Speakers

Matthew Tovbin

Software Architect, Salesforce
Matthew Tovbin is a Software Architect at Salesforce, engineering Salesforce Einstein AI Platform, which powers the world’s smartest CRM. He is a co-author of TransmogrifAI (https://transmogrif.ai), an open-source AutoML library for structured data on Apache Spark. Before joining...


Thursday November 15, 2018 11:10am - 11:30am PST
data

11:10am PST

Transpiling GraphQL instead of writing customized server code
GraphQL is an excellent query language for clients because it specifies *what* data and response shape is needed without worrying about *how* to get and reshape that data. At Twitter, we take the next step and automatically compile GraphQL queries into code at runtime that efficiently specifies *how* to retrieve the data as well! Developers are able to expose new or existing data through our GraphQL API without writing code or deploying new software. Come see how we transpile our GraphQL queries into code that retrieves and composes data distributed across many services and databases to exactly satisfy each query while generically handling batching, errors, access control, and operational concerns, without our engineers writing a single resolver. We'll discuss how we leverage information from our existing distributed data access layer to power our predictable and uniform API that lets product developers easily get the data they need.
Outline:
  1. Intro: GraphQL is *what*, not *how*
  2. Resolvers: specifying *how* by hand
  3. Problems with resolvers
  4. Generating resolvers
  5. Resolving with a non-GraphQL query
  6. Leveraging existing data access systems
  7. Transpiling GraphQL
  8. Exposing new data
  9. Generating a GraphQL API and implementation
  10. A sketch of what this enabled for us
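To make the "generic resolver" idea concrete, here is a small Python sketch. It is not Twitter's implementation, and several things are assumed: the query arrives already parsed into nested dicts, a hypothetical schema (`REFS`) records which fields point at other types, and a hypothetical batched data-access function (`fetch`) stands in for the real data layer. A single function then resolves any selection and issues one batched fetch per level, rather than one per object.

```python
from typing import Any, Dict, List

# Hypothetical schema: for each type, the fields that reference another type.
REFS: Dict[str, Dict[str, str]] = {"Tweet": {"author": "User"}, "User": {}}

# Hypothetical backing store behind the data access layer.
DATA: Dict[str, Dict[int, dict]] = {
    "Tweet": {1: {"id": 1, "text": "hi", "author": 10},
              2: {"id": 2, "text": "yo", "author": 10}},
    "User": {10: {"id": 10, "name": "ada"}},
}
CALLS: List[tuple] = []  # records each batched fetch, for inspection

def fetch(kind: str, ids: List[int]) -> Dict[int, dict]:
    """Batched lookup by id (stand-in for a real data access layer)."""
    CALLS.append((kind, tuple(ids)))
    return {i: DATA[kind][i] for i in ids}

def resolve(kind: str, ids: List[int], selection: Dict[str, Any]) -> List[dict]:
    """Resolve `selection` for objects of `kind`, one batched fetch per level."""
    records = fetch(kind, sorted(set(ids)))  # deduplicate before fetching
    out: List[dict] = [{} for _ in ids]
    for field, sub in selection.items():
        target = REFS[kind].get(field)
        if target is None:
            # Scalar field: copy it straight from the fetched record.
            for o, i in zip(out, ids):
                o[field] = records[i][field]
        else:
            # Reference field: gather every child id, then resolve them together.
            child_ids = [records[i][field] for i in ids]
            for o, child in zip(out, resolve(target, child_ids, sub)):
                o[field] = child
    return out

# Equivalent to the GraphQL selection { text author { name } } on tweets 1 and 2.
result = resolve("Tweet", [1, 2], {"text": None, "author": {"name": None}})
```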

Speakers

Michael Solomon

Software Engineer, Twitter
Mike Solomon is a software engineer on Twitter's Strato team where he uses Scala to generate uniform GraphQL, REST, and Scala APIs, and tries to make building new API services unnecessary. In his spare time he makes an audio-based choose-your-own adventure mobile game called Road Trip...


Thursday November 15, 2018 11:10am - 11:30am PST
functional

11:10am PST

The Danger of Implicit Blocking in Finagle
It is really bad to block Finagle’s event loop. The most common way this happens is when code explicitly blocks by calling Await.result or Await.ready. But this is not the only way the event loop can get blocked: it can also be blocked when long-running computations are called within the loop. If you create a service that directly makes a long-running computation, it will work fine when you test it under no load. But in a production environment it will grind to a halt, even though the server does not appear to be overloaded and should be fully capable of handling all the requests quickly. What is actually happening is that you are exhausting the Finagle event loop thread pool: while the time taken to run each computation is acceptable, the amount of time each request waits for a network I/O thread to become available quickly goes up. In my talk I will discuss how to identify when this problem is occurring and how to correct it so your latency returns to expected values.
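The same effect can be reproduced outside Finagle. As an analogy only (this is Python's asyncio, not Finagle's API, and the workload is hypothetical), a heartbeat task counts how often the event loop gets to run other work. A long computation is called either directly on the loop or offloaded to a worker thread with `run_in_executor`.

```python
import asyncio
import time

def long_computation(seconds: float) -> str:
    # Hypothetical stand-in for CPU-heavy or blocking work.
    time.sleep(seconds)
    return "done"

async def heartbeat(ticks: list, stop: asyncio.Event) -> None:
    # Records a timestamp every time the event loop lets this task run.
    while not stop.is_set():
        ticks.append(time.perf_counter())
        await asyncio.sleep(0.01)

async def handle(offload: bool) -> int:
    ticks: list = []
    stop = asyncio.Event()
    hb = asyncio.create_task(heartbeat(ticks, stop))
    await asyncio.sleep(0)  # let the heartbeat start
    if offload:
        loop = asyncio.get_running_loop()
        # Runs on the default thread pool; the event loop stays responsive.
        await loop.run_in_executor(None, long_computation, 0.2)
    else:
        # Runs on the event loop thread itself and stalls every other task.
        long_computation(0.2)
    stop.set()
    await hb
    return len(ticks)  # how many times the loop served other work

if __name__ == "__main__":
    print("blocked:", asyncio.run(handle(offload=False)))
    print("offloaded:", asyncio.run(handle(offload=True)))
```

In the blocked run the heartbeat barely ticks, which is the asyncio counterpart of requests queueing behind a busy I/O thread.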

Speakers

Michael Armella

Senior Software Engineer, Credit Karma
I currently work on recommender systems at Credit Karma. I focus on high scale, high availability systems for processing large amounts of data to produce simple answers to the question "what ad should we show this user?"


Thursday November 15, 2018 11:10am - 11:30am PST
reactive

11:40am PST

Continuous ML Applications in Production
Traditional machine learning pipelines end with lifeless models sitting on disk in the research lab. These traditional models are typically trained on stale, offline, historical batch data.

Static models and stale data are not sufficient to power today's modern, AI-first Enterprises that require continuous model training, continuous model optimizations, and lightning-fast model experiments directly in production.

Through a series of open source, hands-on demos and exercises, we will use PipelineAI to breathe life into these models using 4 new techniques that we’ve pioneered:

* Continuous Validation (V)
* Continuous Optimizing (O)
* Continuous Training (T)
* Continuous Explainability (E)

The Continuous "VOTE" techniques have proven to maximize pipeline efficiency, minimize pipeline costs, and increase pipeline insight at every stage from continuous model training (offline) to live model serving (online).

Speakers

Chris Fregly

Founder, PipelineAI
Chris Fregly is Founder at PipelineAI, a Real-Time Machine Learning and Artificial Intelligence Startup based in San Francisco. He is also an Apache Spark Contributor, a Netflix Open Source Committer, founder of the Global Advanced Spark and TensorFlow Meetup, author of the O’Reilly...


Thursday November 15, 2018 11:40am - 12:20pm PST
data

11:40am PST

From Scala to ByteCode: a view of how Scala is implemented on top of the JVM.
Scala runs on top of the JVM and offers us cool and useful features such as Traits, Objects, Lazy definitions, higher-order functions, currying and more, but how are they really implemented under the hood? Understanding what's going on underneath the covers of your code can be very beneficial and can lead to insights that may affect the way you write and debug your code. In this session we will take a deep look into the JVM to show how Scala does its magic by examining the bytecode and classes that are being generated.

Speakers

Alon Muchnick

backend team lead, WIX.COM
Alon Muchnick is a software engineer with a background in networking security and Unix systems. For the last two years he has been working for Wix.com, developing Wix Stores, a robust microservices-based eCommerce platform, using a Scala stack and CQRS with event sourcing.



Thursday November 15, 2018 11:40am - 12:20pm PST
functional

11:40am PST

Practical Reactive Streams with Monix
Stream processing is a hot topic today and it’s easy to get lost among all the possibilities. In this live coding session we will explore the Reactive Streams approach used by the Monix project, one of the Scala implementations of the Reactive Streams concepts. Using an almost real-life example, we’re going to walk through both the basics and some more advanced usages of the library.

Speakers

Jacek Kunicki

Passionate Software Engineer, SoftwareMill
I'm a passionate software engineer living mainly, but not exclusively, in the JVM land. I also tend to play with electronics and hardware. When sharing my knowledge, I always keep in mind that a working example is worth a thousand words.


Thursday November 15, 2018 11:40am - 12:20pm PST
reactive

12:20pm PST

Lunch
Excellent lunch and networking!

Thursday November 15, 2018 12:20pm - 1:10pm PST
Commons

12:30pm PST

Unconference
Sign up at https://chief.sc/unconference2018
Follow #scalesf for updates!

We begin at lunchtime

Speakers

Jon Pretty

Software Engineer, Propensive


Thursday November 15, 2018 12:30pm - 5:00pm PST
unconference

1:10pm PST

Monitoring AI with AI
The best-performing offline algorithm can lose in production. The most accurate model does not always improve business metrics. Environment misconfiguration or upstream data pipeline inconsistency can silently kill model performance. None of the prodops, data science or engineering teams is equipped to detect, monitor and debug these types of incidents. Could Microsoft have tested the Tay chatbot in advance, and then monitored and adjusted it continuously in production to prevent its unexpected behaviour? Real mission-critical AI systems require an advanced monitoring and testing ecosystem that enables continuous and reliable delivery of machine learning models and data pipelines into production. Common production incidents include:
- Data drifts, new data, wrong features
- Vulnerability issues, adversarial attacks
- Concept drifts, new concepts, expected model degradation
- Dramatic unexpected drifts
- Biased training set / training issues
- Performance issues
In this demo-based talk we discuss a solution, tooling and architecture that allow machine learning engineers to be involved in the delivery phase and take ownership of the deployment and monitoring of machine learning pipelines. It allows data scientists to safely deploy early results as end-to-end AI applications in a self-serve mode without assistance from engineering and operations teams. It shifts experimentation and even training phases from offline datasets to live production and closes the feedback loop between research and production. The technical part of the talk will cover the following topics:
- Automatic data profiling
- Anomaly detection
- Deep autoencoders
- GANs
- Density-based clustering of inputs and outputs of the model
- Service mesh, Envoy Proxy, traffic shadowing
The demo part of the talk will simulate a real-life concept drift as well as new concepts for the model, and different algorithms that will catch those drifts in an operational environment.
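For the "data drift" incidents listed above, one simple and widely used check is the Population Stability Index (PSI). It compares how a feature's values fall into fixed bins in the training data against live traffic. This is our illustrative choice, not necessarily what Hydrosphere uses; the talk itself mentions autoencoders, GANs and clustering. The bin edges and samples below are hypothetical.

```python
import math
from typing import List, Sequence

def histogram(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Fraction of values per bin; n+1 sorted edges give n bins, last bin closed."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(edges) - 1):
            last = i == len(edges) - 2
            if edges[i] <= v < edges[i + 1] or (last and v == edges[-1]):
                counts[i] += 1
                break  # values outside all bins are simply not counted
    total = max(len(values), 1)
    return [c / total for c in counts]

def psi(expected: Sequence[float], actual: Sequence[float],
        edges: Sequence[float], eps: float = 1e-4) -> float:
    """Population Stability Index: sum over bins of (a - e) * ln(a / e)."""
    e = histogram(expected, edges)
    a = histogram(actual, edges)
    # eps smooths empty bins so the logarithm stays finite.
    return sum((ai - ei) * math.log((ai + eps) / (ei + eps)) for ei, ai in zip(e, a))

edges = [0.0, 0.25, 0.5, 0.75, 1.0]
train = [i / 100 for i in range(100)]      # roughly uniform on [0, 1)
live = [0.5 + i / 200 for i in range(100)]  # shifted into [0.5, 1)
```

Each term is non-negative, so PSI is 0 for identical distributions and grows with the shift. A commonly quoted rule of thumb treats values above about 0.25 as a significant shift, though thresholds should be tuned per feature.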

Speakers

Stepan Pushkarev

CTO, Hydrosphere.io
hydrosphere.io CTO. Automation of AI/ML operations: deployment, serving, monitoring, subsampling, retraining.


Thursday November 15, 2018 1:10pm - 1:30pm PST
data

1:10pm PST

Production Haskell Demystified
Core components of the software platform at Symbiont.IO are written in Haskell, which may come as a surprise to people who have only heard of it in the context of academic programming languages.

In this talk, I'll walk attendees through a small demonstration showing off how Haskell enables programmers to easily model their solutions in such a way that the compiler can derive arbitrary data samples, JSON codecs, property testing, and even whole web APIs almost entirely "for free".

Link to the demonstration code: https://github.com/jkachmar/scale-by-the-bay-2018


Speakers

Joe Kachmar

Software Engineer, Symbiont
Joe is a Software Engineer at Symbiont working on developer and client tooling in Haskell, as well as an active contributor to some of Haskell's package infrastructure. His primary professional interests tend towards mathematics and strongly typed, functional programming paradigms...


Thursday November 15, 2018 1:10pm - 1:30pm PST
functional

1:10pm PST

Creating a Data Fabric for IoT
The move to data-driven decisions is dominating every industry, with significant implications for data processing. The emergence of IoT and edge computing promises to add a new dimension to data-driven insights, but it has also added new considerations for modern data processing. This has created the need for new approaches and technologies that can bring IoT data and edge computing into the data processing fold. Such an approach needs to:
- Capture data across edge devices and intelligently transform it so that it can be used at the edge, in the cloud and in the data center
- Manage and facilitate the flow of data from the edge to the data center or cloud to support real-time processing
- Trigger data processing applications as soon as data arrives at the edge, in the data center and in the cloud
- Process data centrally and push intelligence to the edge
- Provide support to quickly react to insights to mitigate risk, reduce costs and ensure security
- Make it easy to deploy and manage in an increasingly complex IT environment
In this talk, we will discuss how to build an IoT data fabric using Apache Pulsar that satisfies all these requirements and works both at the edge and in the data center. Pulsar has a small footprint and can run at the edge, where it captures, filters and transforms data and then pushes it to the data center or cloud. Another Apache Pulsar instance running in the data center captures this data from several edge devices, processes it and stores it indefinitely for later analysis. It provides sophisticated analytical capabilities, using Pulsar Functions and SQL to process the data as it flows.

Speakers

Karthik Ramasamy

Co-founder and Chief Product Officer, Streamlio


Thursday November 15, 2018 1:10pm - 1:30pm PST
reactive

1:40pm PST

Video Access-Log Processing with Apache Flink
Learn about the streaming data pipeline that supports the Mux Video service to process CDN access logs to support performance monitoring, dynamic CDN selection, log enrichment, and utility billing. The talk will cover the architecture & technologies (Go, Java, Kafka, Flink) and some of the challenges we’ve faced, including variable log delivery schedules that are addressed with Flink’s powerful windowing features, and dynamic CDN selection based on CDN performance in a geographic locale. This talk will appeal to developers faced with processing large volumes of data with minimal latency (a requirement that has become increasingly common) and supporting a growing number of business applications.
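Variable log delivery schedules are the classic case for event-time windowing with watermarks. The following sketch is a plain Python illustration of the concept, not Flink's API. It counts log lines per key in tumbling event-time windows. The watermark trails the largest event time seen by a fixed delay, and events for a window that has already been emitted are reported as dropped. The CDN names and timings are hypothetical. Flink can additionally keep windows open for an "allowed lateness" and re-fire them, which this sketch omits.

```python
from typing import Dict, Iterable, List, Tuple

def tumbling_counts(events: Iterable[Tuple[int, str]], size: int, max_delay: int):
    """Per-key counts in event-time tumbling windows [start, start + size).

    events are (event_time, key) pairs in arrival order. A window is emitted
    once the watermark (max event time seen minus max_delay) reaches its end.
    """
    pending: Dict[int, Dict[str, int]] = {}
    emitted: List[Tuple[int, Dict[str, int]]] = []
    dropped: List[Tuple[int, str]] = []
    watermark = float("-inf")
    max_ts = float("-inf")

    def flush(up_to: float) -> None:
        # Emit, in start order, every pending window whose end is <= up_to.
        for start in sorted(w for w in pending if w + size <= up_to):
            emitted.append((start, pending.pop(start)))

    for ts, key in events:
        start = ts - ts % size  # window this event belongs to
        if start + size <= watermark:
            dropped.append((ts, key))  # too late: its window was already emitted
            continue
        counts = pending.setdefault(start, {})
        counts[key] = counts.get(key, 0) + 1
        max_ts = max(max_ts, ts)
        watermark = max_ts - max_delay
        flush(watermark)
    flush(float("inf"))  # end of input: emit everything still open
    return emitted, dropped

# (8, "cdnA") arrives out of order but within the delay; (4, "cdnA") is too late.
events = [(1, "cdnA"), (3, "cdnB"), (12, "cdnA"), (8, "cdnA"), (25, "cdnB"), (4, "cdnA")]
emitted, dropped = tumbling_counts(events, size=10, max_delay=5)
```

A larger `max_delay` tolerates later logs at the cost of emitting results later; that is the latency/completeness trade-off a windowed pipeline has to tune.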

Speakers

Scott Kidder

Software Engineer, Mux
I've been working on video encoding & delivery platforms for over 10 years (MobiTV, Brightcove/Zencoder, and now Mux). I'm currently working on Mux's API for Video, making it easy for developers to build amazing applications that include video without needing to be video experts...


Thursday November 15, 2018 1:40pm - 2:00pm PST
data

1:40pm PST

Towards Parallelizing Scala Compilations
Rsc is an experimental Scala compiler focused on compilation speed. Our research goal is to achieve dramatic compilation speedups for typical Scala codebases.
Recently, we've been experimenting with outlining, i.e. computing type signatures of public and protected definitions in the program. Outlines represent dependencies between different elements of the program, so if we compute outlines, we can compile all files and perhaps even all methods of the program in parallel.
With a few language restrictions, Rsc can compute outlines very quickly. On Twitter Util, a foundational library of the Twitter monorepo, Rsc performs outlining roughly 10x faster than Scalac performs compilation. Having obtained the outlines, we can then partition the sources and launch multiple Scala compiler instances in parallel.
Join our talk to learn how outlining works, how well it performs in practice and what we have planned for the future.
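The abstract does not say how the sources are partitioned once outlines are available. Purely as an assumption, the sketch below balances files across compiler instances with the classic greedy longest-processing-time heuristic. It uses file size as a hypothetical stand-in for compile cost and leaves out launching the actual compiler processes.

```python
import heapq
from typing import Dict, List

def partition_sources(sizes: Dict[str, int], workers: int) -> List[List[str]]:
    """Assign files, largest first, to the currently least-loaded worker."""
    heap = [(0, i) for i in range(workers)]  # (total assigned size, worker index)
    groups: List[List[str]] = [[] for _ in range(workers)]
    for name in sorted(sizes, key=lambda n: (-sizes[n], n)):
        load, i = heapq.heappop(heap)  # least-loaded worker so far
        groups[i].append(name)
        heapq.heappush(heap, (load + sizes[name], i))
    return groups

# Hypothetical file sizes in lines of code.
sizes = {"A.scala": 50, "B.scala": 40, "C.scala": 30, "D.scala": 20}
groups = partition_sources(sizes, workers=2)
```

Each group could then be handed to its own compiler instance; the heuristic only balances the estimated cost, and real compile times may differ from file sizes.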

Speakers

Eugene Burmako

Language tools lead, Twitter
Language tools lead at Twitter, member of the Scala language committee, founder of Rsc, Scalameta and Scala Macros.


Thursday November 15, 2018 1:40pm - 2:00pm PST
functional

1:40pm PST

Connected Car Ecosystem: An Architectural Overview
This talk covers how we ingest telemetry data from our awesome cars into our data platform and join this data with different static data sources, while not losing the most valuable asset: time.

Speakers

Sonam Kanungo

Senior Software Engineer, Mercedes Benz Research & Development North America


Thursday November 15, 2018 1:40pm - 2:00pm PST
reactive

2:10pm PST

Programming the worldwide elastic supercomputer with Unison
Unison is a new open source language for building distributed systems. It starts with a premise that no matter the scale of a computation, no matter how many nodes it occupies, it should be expressible via a single program, not thousands of separate programs. A logical endpoint of this perspective is that there is only one computer, the global supercomputer, consisting of all the internet-connected compute nodes of civilization, and our programming languages should let us program this supercomputer simply and directly, even when describing computations that occupy thousands or millions of nodes! This talk introduces the Unison language, showing what it can be like to program systems of any size with this model of computing.

Speakers

Rúnar Bjarnason

Cofounder, Unison
My name is Rúnar. I’m a software engineer in Boston, an author of a book, Functional Programming in Scala, and cofounder of Unison Computing. We're making a distributed programming language called Unison. Talk to me about functional programming, relational database theory, compilers...

Paul Chiusano

Cofounder, Unison Computing
I've been doing functional programming in a mix of Scala and Haskell for 10 years and enjoy thinking about (and building) new programming technology. I like programming tech that is simple, delightful, and helps programmers focus on the fun parts of software creation. I currently...


Thursday November 15, 2018 2:10pm - 2:30pm PST
reactive

2:10pm PST

Evolution of GoPro's Data Platform
In this talk, we will discuss the evolution of the data platform at GoPro from fixed-size Hadoop clusters to Cloud-based Spark Clusters with a Centralized Hive Metastore + S3. We will also share our experience in data architecture transformation, batch and streaming frameworks transformation, data democratization via Slack, data portal & visualization, and machine learning features visualization via Google Facets + Spark.

Speakers

David Winters

Big Data Architect, GoPro
David is an Architect in the Data Science and Engineering team at GoPro and the creator of their Spark-Kafka streaming data ingestion pipeline. He has been developing scalable data processing pipelines and eCommerce systems for over 20 years in Silicon Valley. David's current big...



Thursday November 15, 2018 2:10pm - 2:50pm PST
data

2:10pm PST

Rage Against the Ecosystem
Scala's open-source ecosystem is broken: writing and maintaining build configurations is too difficult, and publishing is even harder, coming with the additional friction of having to support an increasing multiplicity of binary targets. But worse, this workflow puts a burden on a few key people in the Scala community to publish their libraries quickly so that their downstream users can publish theirs, and it can take months for some projects to be published. How is it that the multi-billion-dollar Scala software industry is so dependent on so few people? I will introduce Fury, a fast, source-based dependency manager and build tool for Scala which aspires to radically disrupt the ecosystem for the better. Fury defines builds as static data, not code, making viewing them instantaneous and understanding them easy. Fury facilitates a new, distributed, version-controlled and trust-based ecosystem where publishing is as simple as tagging a signed commit and telling users about it. Builds can be external to projects, so there's no need to impose Fury upon any existing developers who are happy using sbt. The utopia we are striving for is a new, fluid and versatile ecosystem in which developers are liberated to publish more easily and frequently, and where it becomes easier for anyone to make contributions to open-source projects.

Speakers

Jon Pretty

Software Engineer, Propensive


Thursday November 15, 2018 2:10pm - 2:50pm PST
functional

2:30pm PST

A Reactive Fraud Monitoring Engine for Instant Payments
2018 is a challenging year for European banks and their fraud management capabilities. In November, the introduction of SEPA Instant Credit Transfers will mean that the time allowed for transferring money to payment beneficiaries will be reduced from one business day, as it is currently, to a maximum of five seconds. This dramatically reduces the time available for fraud detection. Around the same time, the Revised Payment Services Directive (PSD2) will oblige banks to give third-party providers of financial services access to customers' accounts through open APIs. This introduces new fraud risks and the possibility for fraudsters to come up with new modus operandi. All this takes place in a context where the General Data Protection Regulation (GDPR) is adding new constraints on the way customers' data may be accessed and used. This presentation describes the architecture of a new fraud monitoring engine (FraME) for BNP Paribas Fortis bank to detect and react to fraud attempts in real time under the new rules and constraints described above. The presentation discusses the main challenges met during the construction of the engine and the solutions adopted to overcome them. FraME is a typical reactive system as defined in the Reactive Manifesto: it is elastic and resilient in order to detect fraud within a few hundred milliseconds, without any downtime, while being available 24/7. Fraud detection is performed using machine learning models that are regularly retrained, tested, and deployed. These models are complemented by a set of detection rules that permit quick reaction to new fraudulent modus operandi. FraME is implemented using Apache Kafka, Apache Flink and a combination of micro-services.

Speakers

David Massart

Solution Architect, D.E.Solution


Thursday November 15, 2018 2:30pm - 2:50pm PST
reactive

3:00pm PST

Down the Wabbit Hole
Designing flexible machine learning pipelines can be tricky, especially when you are constrained to using off-the-shelf ML frameworks. Formation found itself in this situation last year, when it needed to build an MVP pipeline for solving contextual bandit problems. In this experience report I'll describe several useful design patterns from the Haskell world and show how we used them to create safe and flexible interfaces that support both TensorFlow and Vowpal Wabbit models. Along the way, we’ll survey a variety of Haskell's language features, from the very abstract to the very low-level, and see how they come together in a real-world application.
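
The talk works in Haskell; as a rough illustration of the underlying idea only, here is a minimal Python sketch of a contextual bandit policy written against a backend-agnostic scoring interface. `ArmScorer`, `LinearScorer`, and `choose_arm` are hypothetical names (not TensorFlow or Vowpal Wabbit APIs), and epsilon-greedy is an assumed policy that may differ from what Formation uses.

```python
import random
from typing import Protocol, Sequence


class ArmScorer(Protocol):
    """Hypothetical backend-agnostic interface: anything that predicts the reward
    of pulling `arm` in a given context (a wrapped TensorFlow or Vowpal Wabbit
    model, or a plain function)."""

    def score(self, context: Sequence[float], arm: int) -> float: ...


class LinearScorer:
    """Toy stand-in backend: one weight vector per arm, reward = dot product."""

    def __init__(self, weights: dict[int, list[float]]) -> None:
        self.weights = weights

    def score(self, context: Sequence[float], arm: int) -> float:
        return sum(w * x for w, x in zip(self.weights[arm], context))


def choose_arm(scorer: ArmScorer, context: Sequence[float], arms: Sequence[int],
               epsilon: float, rng: random.Random) -> int:
    """Epsilon-greedy: explore a uniformly random arm with probability epsilon,
    otherwise exploit the arm with the highest predicted reward."""
    if rng.random() < epsilon:
        return rng.choice(list(arms))
    return max(arms, key=lambda arm: scorer.score(context, arm))


scorer = LinearScorer({0: [1.0, 0.0], 1: [0.0, 1.0]})
rng = random.Random(42)
print(choose_arm(scorer, [0.2, 0.9], [0, 1], epsilon=0.0, rng=rng))  # 1
```

Because the policy only depends on the `score` method, swapping the toy linear model for another backend does not touch the policy code, which is the kind of separation the abstract describes.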

Speakers

Chris McKinlay

Principal Software Engineer, Formation


Thursday November 15, 2018 3:00pm - 3:40pm PST
data

3:00pm PST

Graal: How to use the new JVM JIT compiler in real life
With JEP 317: Experimental Java-Based JIT Compiler in JDK 10, Graal is now part of OpenJDK. In fact, Graal is already available in JDK 9 due to JEP 243: Java-Level JVM Compiler Interface. Graal is itself written in Java and that brings some new properties and behavior to the table which we haven’t seen with existing HotSpot JIT compilers. This talk will show how to use Graal with JDK 10, how to compile an upstream Graal version and what to look out for when using it for benchmarking or even in production.

Speakers

Chris Thalinger

Staff Software Engineer, Twitter
Chris Thalinger is a software engineer who has been working on Java Virtual Machines for more than 14 years. His main expertise is in compiler technology, Just-In-Time compilation in particular. He was initially involved with the CACAO and GNU Classpath projects before his focus shifted to OpenJDK...


Thursday November 15, 2018 3:00pm - 3:40pm PST
functional

3:00pm PST

Consensus Algorithms in Distributed Systems
Many messaging systems widely used in industry, e.g., Kafka, rely on centralized distributed-systems services to achieve reliability and consensus between servers. Many companies use these services; however, only a few of them understand the details of the underlying protocols. This talk brings principles from academia to industry by introducing the common distributed-systems protocols implemented underneath these popular services. In addition, it compares how the protocols are used in academia and in industry. It explains how the protocols, specifically Paxos and Raft, work, including how they elect leaders among servers, how they achieve consensus between machines, and how they reliably process and execute client commands. It thereby shows how the systems and services that use these protocols achieve fault tolerance, as well as confidentiality, integrity, authenticity, availability, etc. From the reliability and security points of view, the talk discusses how the protocols deal with machine failures, including leader failures and replica failures. It shows the vulnerabilities and potential security issues that exist in the protocols. Last but not least, we'll look at what we can do to avoid these vulnerabilities when applying academic theory in industry.
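
As a rough illustration of the leader-election step mentioned above, here is a simplified sketch of Raft's vote-granting rule, assuming purely in-memory state with no persistence, RPC layer, or election timers. `NodeState`, `handle_request_vote`, and `has_majority` are hypothetical names; the sketch follows the published Raft rules as we understand them and is not taken from the talk.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodeState:
    current_term: int
    voted_for: Optional[str]
    last_log_term: int
    last_log_index: int


def handle_request_vote(node: NodeState, candidate_id: str, term: int,
                        cand_last_log_term: int, cand_last_log_index: int) -> bool:
    """Simplified Raft RequestVote receiver: returns True if the vote is granted."""
    if term < node.current_term:
        return False  # candidate is from a stale term
    if term > node.current_term:
        # A newer term: adopt it and forget any vote cast in the old term.
        node.current_term = term
        node.voted_for = None
    # Election restriction: the candidate's log must be at least as up to date,
    # comparing the term of the last entry first, then the log length.
    up_to_date = ((cand_last_log_term, cand_last_log_index)
                  >= (node.last_log_term, node.last_log_index))
    if node.voted_for in (None, candidate_id) and up_to_date:
        node.voted_for = candidate_id
        return True
    return False


def has_majority(votes: int, cluster_size: int) -> bool:
    """A candidate wins once a strict majority of the cluster (itself included) votes for it."""
    return votes >= cluster_size // 2 + 1


node = NodeState(current_term=3, voted_for=None, last_log_term=3, last_log_index=10)
print(handle_request_vote(node, "b", 4, 3, 12))  # True
print(handle_request_vote(node, "c", 4, 3, 12))  # False: already voted for "b" in term 4
print(has_majority(3, 5))  # True
```

The one-vote-per-term rule plus the strict-majority threshold is what prevents two leaders from being elected in the same term.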

Speakers

Yifan Xing

Software engineer, Apple
distributed systems


Thursday November 15, 2018 3:00pm - 3:40pm PST
reactive

3:45pm PST

Machine Learning on Source Code
Machine Learning is definitely what the cool kids are doing nowadays. Deep Learning in particular has powered a revolution in many fields of research, including Computer Vision and Natural Language Processing, but also self-driving cars and strategy games like Go. What not many people are talking about is how to use those techniques to improve our developer routines. Machine Learning on Source Code (MLonCode) is a very interesting field because it sits at the frontier of Natural Language Processing, Graph-Based Machine Learning, and Static Analysis, and it can even draw in other fields such as Dynamic Analysis of programs. The amount of data available for this problem is almost overwhelming, and given that data is the fuel of Machine Learning, we are excited for an amazing ride! This talk will cover the basics of which Machine Learning techniques can be applied to source code. Specifically, we will discover:
* embeddings over identifiers,
* structural embeddings over source code, answering the question of how similar two fragments of code are,
* recurrent neural networks for code completion,
* future directions of the research.
While the topic is advanced, the level of mathematics required for this talk will be kept to a minimum. Rather than getting stuck in the details, we'll discuss the advantages and limitations of these techniques and their possible implications for our developer lives.
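
Learned embeddings over identifiers usually start from identifiers broken into subtokens. The sketch below is a deliberately simple, non-learned baseline, assuming camelCase/snake_case naming conventions: it splits identifiers into subtokens and compares two code fragments by the cosine similarity of their subtoken counts. It is not one of the techniques presented in the talk, and the function names are illustrative.

```python
import math
import re
from collections import Counter


def split_identifier(name: str) -> list[str]:
    """Split snake_case, camelCase and PascalCase identifiers into lowercase
    subtokens, keeping acronyms ("HTTP") and digit runs together."""
    tokens: list[str] = []
    for part in re.split(r"[_\W]+", name):
        tokens.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [t.lower() for t in tokens]


def bag_of_subtokens(identifiers: list[str]) -> Counter:
    """Count subtokens over all identifiers appearing in a code fragment."""
    return Counter(t for ident in identifiers for t in split_identifier(ident))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


print(split_identifier("parseHTTPResponse"))  # ['parse', 'http', 'response']
a = bag_of_subtokens(["getUserName", "user_id"])
b = bag_of_subtokens(["fetchUserName", "userId"])
print(round(cosine(a, b), 3))  # 0.857
```

A baseline like this only sees names, not structure; the structural embeddings listed above aim to capture similarity that identical names would miss.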

Speakers

Francesc Campoy Flores

VP of Product and DevRel, source{d}
Francesc Campoy Flores is the VP of Product and Developer Relations at source{d}, the company enabling Machine Learning for large scale code analysis and building the platform for the future of developer tooling. Previously, he worked at Google as a Developer Advocate for Google Cloud...


Thursday November 15, 2018 3:45pm - 4:20pm PST
data

3:45pm PST

Your Type System Working For You!
In this session we will explore ways of making the Scala type system work harder for you to improve correctness in your code. With examples ranging from compile time verification of geographical coordinate reference systems through to enforcing constraint rules between domain models, this talk will show you ways to harness the power of the Scala compiler to catch errors sooner, and make writing correct code easier.

Speakers

Richard Wall

CEO, Escalate Software
Long time Scala developer, trainer and enthusiast. Started possibly the first Scala user group - Bay Area Scala Enthusiasts. Winner of the inaugural Phil Bagwell award for Scala community work. Scalawag and Java Posse podcast co-host. Hiker, biker, music lover, love to travel.


Thursday November 15, 2018 3:45pm - 4:20pm PST
functional

3:45pm PST

2 Fast 2 Furious: migrating Medium's architecture without slowing down
We’re shifting gears to leverage new technologies created since we built Medium 5 years ago, but we need to gain benefits from the new system incrementally along the way, and we can’t afford to let it hinder feature development. By taking advantage of GraphQL’s flexibility and our existing infrastructure, we’re able to make widespread yet gradual architectural changes! Come see how Medium is changing lanes without slowing down. Anyone thinking about moving to GraphQL (or about migrating an existing architecture in general) can benefit from this talk, especially anyone who is building their own GraphQL server or needs practical advice on how to migrate a legacy system to GraphQL without “stopping the world,” getting defunded partway through, or building a system no one uses.

Abstract
Migrating an entire system to new tools and frameworks isn’t an easy task. And doing that without impacting feature development? That’s even harder. We’ll walk through how Medium is migrating off our existing system without hindering product development, while incrementally gaining the benefits of the new system along the way. We’ll go over the design of our new architecture, our phased migration approach, and how the layered structure of our GraphQL server (written in Scala with Sangria) was integral to the success of both.

- Goals of the migration
- Design of the new system
- Phased approach
  - Phase 1: developer experience
    - IDLs (protobuf) + GraphQL
  - Phase 2: services + gRPC
- GraphQL server layers
  - Fetchers
  - Repos
  - Schema (derivation)
- Putting it all together

Speakers

Sasha Solomon

Platforms Team Tech Lead, Medium
I'm the Tech Lead on the Platforms Team, helping architect the next-generation infrastructure at Medium in San Francisco. Scala + GraphQL 4 Lyfe. Player Character. Potato Compatible. Follow me on Twitter @sachee for tweets about GraphQL, Dungeons and Dragons, and an excellent RT ga...


Thursday November 15, 2018 3:45pm - 4:20pm PST
reactive

4:25pm PST

Inside NVIDIA’s AI infrastructure for self-driving cars
In this talk, we'll discuss Project MagLev, NVIDIA's internal end-to-end AI platform that enables the development of NVIDIA's self-driving car software, DRIVE. We'll explore the platform that supports continuous data ingest from multiple cars (each producing TBs of data per hour) and enables autonomous AI designers to iterate on training new neural network designs across thousands of GPU systems and to validate their behavior over multi-PB-scale datasets. We will talk about our overall architecture, from data center deployment to AI pipeline automation, large-scale AI dataset management, and AI training & testing. We will also touch on RAPIDS (http://rapids.ai), NVIDIA's new platform to accelerate data science.

Speakers

Clément Farabet

VP, AI Infrastructure, NVIDIA
Clément Farabet is VP of AI Infrastructure at NVIDIA. He received a PhD from Université Paris-Est in 2013, while at NYU, co-advised by Laurent Najman and Yann LeCun. His thesis focused on real-time image understanding, introducing multi-scale convolutional neural networks and a...


Thursday November 15, 2018 4:25pm - 5:00pm PST
data

4:25pm PST

Channeling the Inner Complexity
An essential requirement for writing programs that scale is having constructs that model concurrency in an understandable, safe, and efficient manner. This talk presents an overview of various such models available in Scala and their impact on program structure and complexity. It then explores a way to model concurrency with less complexity through an implementation of Communicating Sequential Processes (CSP), heavily inspired by Goroutines, scala-async, and Clojure's core.async.
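
For readers new to CSP, the core idea is independent processes that communicate only by passing values through channels. Below is a minimal Python sketch of a Go-style bounded channel, assuming a single receiver; `Channel` and `producer` are hypothetical names built on the standard `queue.Queue`, not the Scala implementation discussed in the talk.

```python
import queue
import threading

_CLOSED = object()  # sentinel that marks the end of the stream


class Channel:
    """Minimal Go-style channel sketch: a bounded buffer with blocking send and
    receive, where close() ends iteration. Assumes a single receiver, because
    close() enqueues exactly one sentinel."""

    def __init__(self, capacity: int = 1) -> None:
        self._q: queue.Queue = queue.Queue(maxsize=capacity)

    def send(self, value) -> None:
        self._q.put(value)  # blocks while the buffer is full (backpressure)

    def close(self) -> None:
        self._q.put(_CLOSED)

    def __iter__(self):
        while True:
            item = self._q.get()  # blocks until a value (or the sentinel) arrives
            if item is _CLOSED:
                return
            yield item


def producer(out: Channel, n: int) -> None:
    """A 'process' that communicates only through its output channel."""
    for i in range(n):
        out.send(i * i)
    out.close()


ch = Channel(capacity=2)
worker = threading.Thread(target=producer, args=(ch, 5))
worker.start()
print(list(ch))  # [0, 1, 4, 9, 16]
worker.join()
```

One caveat of this sketch: `queue.Queue(maxsize=0)` means an unbounded queue in Python, so it cannot model Go's unbuffered (rendezvous) channels; capacity should be at least 1 here.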

Speakers

Thursday November 15, 2018 4:25pm - 5:00pm PST
functional

4:25pm PST

H2O internals
H2O does in-memory analytics on clusters with distributed, parallelized, state-of-the-art Machine Learning algorithms. However, the platform is very generic, and very, very fast. H2O.ai builds Machine Learning tools with it, but the platform can do much more. H2O includes a K/V store with exact semantics and typical read and write speeds of ~200ns; highly compressed in-memory Big Data storage, typically better than 2x to 4x gzip-on-disk size, which can read and decompress the data at C/Fortran speed; a clean and simple pure-Java coding style for writing parallel & distributed code; a generic serializer that is much faster than protobuf or Kryo and needs no special registration or markup language; a large set of building blocks for common math operations; and, of course, a library of state-of-the-art ML algorithms. This is a low-level systems implementation talk on H2O's design.
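
One common way columnar in-memory stores save space is to encode each numeric value as a small integer offset from a bias, in fixed steps. The sketch below illustrates that generic idea with 1-byte codes; the abstract does not say how H2O's compression works, so this is not H2O's format, and `compress_column`/`decompress_column` are hypothetical names.

```python
def compress_column(values: list[float], scale: float) -> tuple[float, bytes]:
    """Encode each value as an unsigned byte code = round((v - bias) / scale).
    Assumes values lie on a grid of step `scale` spanning at most 255 steps;
    values off the grid would be rounded (lossy)."""
    bias = min(values)
    codes = [round((v - bias) / scale) for v in values]
    if max(codes) > 255:
        raise ValueError("range too wide for a 1-byte encoding")
    return bias, bytes(codes)


def decompress_column(bias: float, scale: float, data: bytes) -> list[float]:
    """Invert the encoding: v = bias + code * scale."""
    return [bias + code * scale for code in data]


prices = [19.99, 20.05, 21.50, 19.99]
bias, data = compress_column(prices, scale=0.01)
print(len(data))  # 4 bytes, versus 32 bytes as 64-bit floats
print(decompress_column(bias, 0.01, data))
```

Decoding is a single multiply-add per element over a contiguous byte array, which is the kind of tight loop that compiles to fast machine code on the JVM.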

Speakers

Cliff Click

CEO, Rocket Realtime School
Cliff Click was the CTO of Neurensic (now successfully exited), and CTO and Co-Founder of h2o.ai (formerly 0xdata), a firm dedicated to creating a new way to think about web-scale math and real-time analytics. He wrote his first compiler when he was 15 (Pascal to TRS Z-80!), although his most famous compiler is the HotSpot Server Compiler (the Sea of Nodes IR). Cliff helped A...


Thursday November 15, 2018 4:25pm - 5:00pm PST
reactive

5:10pm PST

Panel I: Thoughtful Software Engineering
The distinguishing quality of SBTB is that authors step back from the problem at hand and think about how it can be abstracted and composed from reusable abstractions. Over the years, the community has evolved numerous approaches that went mainstream, such as type-level programming, reactive systems, and more. What are the best software engineering practices worth adopting? We'll invite some of the best-positioned folks in the community to share their views and experiences.

Speakers

Rúnar Bjarnason

Cofounder, Unison
My name is Rúnar. I’m a software engineer in Boston, an author of a book, Functional Programming in Scala, and cofounder of Unison Computing. We're making a distributed programming language called Unison. Talk to me about functional programming, relational database theory, compilers...

Bryan Cantrill

CTO, Joyent
Bryan Cantrill is the CTO at Joyent, where he oversees worldwide development of the SmartOS and SmartDataCenter platforms, and the Node.js platform. Prior to joining Joyent, Bryan served as a Distinguished Engineer at Sun Microsystems, where he spent over a decade working on system...

Cliff Click

CEO, Rocket Realtime School
Cliff Click was the CTO of Neurensic (now successfully exited), and CTO and Co-Founder of h2o.ai (formerly 0xdata), a firm dedicated to creating a new way to think about web-scale math and real-time analytics. He wrote his first compiler when he was 15 (Pascal to TRS Z-80!), although his most famous compiler is the HotSpot Server Compiler (the Sea of Nodes IR). Cliff helped A...

Marius Eriksen

infrastructure-infrastructure engineer, Grail
Marius is the author of such ideas as Finagle, Zipkin, Your Server is a Function, and many others.

Martin Odersky

Chief Architect, Lightbend

Julie Pitt

Director, Machine Learning Infrastructure, Netflix
Julie leads the Machine Learning Infrastructure at Netflix, with the goal of scaling Data Science while increasing innovation. She previously built streaming infrastructure behind the "play" button while Netflix was transitioning from domestic DVD-by-mail service to international...


Thursday November 15, 2018 5:10pm - 6:00pm PST
functional

6:00pm PST

Happy Hour I
Our famous happy hour caps the day with excellent food and drinks and great conversation.

Thursday November 15, 2018 6:00pm - 8:00pm PST
Commons
 
Friday, November 16
 

8:00am PST

Breakfast and Welcome
Hot breakfast and uninterruptible coffee!

Friday November 16, 2018 8:00am - 9:00am PST
Commons

9:00am PST

Keynote II: Kafka and the Rise of Event-Driven Microservices
Events that happen in a business are so important because they are a universal language for continuously evolving data. But, where do they come from? In this talk, Neha Narkhede will show you how to consider data differently and identify events in your apps and streams of data.

Friday November 16, 2018 9:00am - 9:40am PST
functional

9:50am PST

Building a Contacts Graph from activity data
In the customer age, being able to extract relevant communications information in real time and cross-reference it with context is key. Salesforce is using data science and engineering to enable salespeople to monitor their emails in real time, surfacing insights and recommendations through a graph that models contextual data. In this presentation, Alexis will explain how Salesforce AI Inbox builds and uses an activity graph (based on emails, meetings, ...) to offer services such as recommended connections and to provide context for real-time insights and recommended actions. Alexis will go over use cases, technical architecture, and best practices, show how the graph and services are built, and give a live demonstration.
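
To make "recommended connections from an activity graph" concrete, here is a minimal sketch, assuming each email thread or meeting is reduced to the set of its participants and that candidates are ranked by a weighted common-neighbours score. The scoring rule and function names are illustrative assumptions, not Salesforce's method.

```python
from collections import defaultdict
from itertools import combinations


def build_activity_graph(activities: list[set[str]]) -> dict[str, dict[str, int]]:
    """Each activity (an email thread or a meeting) is the set of people involved;
    every pair of co-participants gets +1 on their undirected edge weight."""
    graph: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for people in activities:
        for a, b in combinations(sorted(people), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def recommend_connections(graph: dict[str, dict[str, int]], person: str, k: int = 3) -> list[str]:
    """Rank people not yet connected to `person` by the summed strength of the
    paths through shared contacts (weakest edge on each two-hop path)."""
    scores: dict[str, int] = defaultdict(int)
    for friend, w1 in graph[person].items():
        for candidate, w2 in graph[friend].items():
            if candidate != person and candidate not in graph[person]:
                scores[candidate] += min(w1, w2)
    return sorted(scores, key=lambda c: (-scores[c], c))[:k]


activities = [{"ana", "ben"}, {"ana", "ben", "cy"}, {"ben", "dee"},
              {"cy", "dee"}, {"ben", "eve"}]
graph = build_activity_graph(activities)
print(recommend_connections(graph, "ana"))  # ['dee', 'eve']
```

Here "dee" outranks "eve" because "ana" reaches "dee" through two shared contacts ("ben" and "cy") but reaches "eve" only through "ben".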

Speakers

Brad Powley

Sr. Machine Learning Engineer, Salesforce
Brad is a machine learning engineer at Salesforce, focusing on contextual services for the Einstein platform. Brad has over a decade of engineering experience, and a half-decade of strategy consulting, sandwiched around a PhD in management science. Now he is an engineer again! Brad...

Alexis Roos

Director Data Science, Salesforce
Alexis is director of data science and machine learning at Salesforce, where he is leading a team of data scientists and engineers delivering intelligent services for the Einstein platform. Alexis has over twenty years of engineering and management experience, including 13 years at Sun...


Friday November 16, 2018 9:50am - 10:30am PST
data

9:50am PST

Declarative distributed concurrency in Scala
I present the Distributed Chemical Machine (DCM) - a purely functional, fully declarative framework for parallel, concurrent, and distributed computing in Scala. The DCM can automatically run multi-core concurrent code on any number of machines connected to (one or more) Zookeeper instances. Zookeeper provides data coordination, persistence, and fault tolerance, allowing the programmer to focus on the distributed application logic in a peer-to-peer architecture. The DCM builds upon the Chemical Machine (http://chemist.io), a data-driven, message-passing concurrency paradigm that significantly improves upon the Actor Model, achieving automatic parallelism and a higher level of declarative expressiveness for concurrency. Previously I implemented the (single-JVM, multi-core) Chemical Machine as an embedded DSL in Scala. With very few code changes and little configuration, a Chemical Machine-based application can now be ported to the DCM and run on a cluster, achieving automatic distribution.

Speakers

Sergei Winitzki

Senior Software Engineer, Workday Inc.
Theoretical physicist turned software engineer, passionate about functional programming, functional type theory, and declarative domain-specific languages.


Friday November 16, 2018 9:50am - 10:30am PST
functional

9:50am PST

Fast Data pipelines with Akka Streams and Alpakka Kafka Connector

The Alpakka Kafka connector (formerly known as reactive-kafka) is a component of the Alpakka project. It provides a diverse streaming toolkit, but it can sometimes be a challenge to design these systems without a lot of experience with Akka Streams and Akka. By combining Akka Streams with Kafka using Alpakka Kafka, we can build rich-domain, low-latency, and stateful streaming applications with very little infrastructure.

This talk will discuss solutions to common Kafka and streaming problems such as consumer group partition rebalancing, exactly-once/transactional message delivery, stateful stages, state durability/persistence, and common production concerns like job failover and deployment.

The Alpakka project is an open source initiative managed by Lightbend to implement stream-aware, reactive, integration pipelines for Java and Scala. It is built on top of Akka Streams, and has been designed from the ground up to understand streaming natively and provide a DSL for reactive and stream-oriented programming, with built-in support for backpressure.


Speakers

Sean Glover

Software Engineer, Fast Data Platform Team, Lightbend
Sean is a Senior Software Engineer on the Fast Data Platform team at Lightbend.  He specializes in Apache Kafka and its ecosystem.  He has experience consulting with Global 5000 companies on how to build Fast Data platforms using technologies such as Scala, Kafka, Spark, and Akka...


Friday November 16, 2018 9:50am - 10:30am PST
reactive

10:00am PST

Unconference
Sign up at https://chief.sc/unconference2018
Follow #scalesf for updates!

We begin at lunchtime

Speakers

Jon Pretty

Software Engineer, Propensive


Friday November 16, 2018 10:00am - 5:00pm PST
unconference

10:40am PST

Graph-First Services Using GraphQL
Authors: Julien Delange & Adam Crane
Today, many applications expose their data with limited ability to query or filter fields. Application developers often use multiple endpoints to implement behavior variability or data filtering, which is inefficient from an engineering perspective. GraphQL, a new query language, addresses this issue by allowing the end user to query the data available on the server and select only the fields of interest. Originally developed at Facebook, the technology has gained traction over the years, to the point that several companies have started to implement GraphQL endpoints. In this talk, we present a methodology for building a modern, data-rich application centered around GraphQL schemas. We cover how the schema informs decisions for the rest of the application layers and unlocks new query patterns and possibilities compared to a standard REST-based or IDL-based approach. We also cover how this affects the design of data storage using traditional storage backends (key-value, SQL). To support these themes, we present a novel application (Network Health Visualizer), its data pipeline, and how the implementation of its GraphQL interface guided the rest of our development process. Additionally, we cover the Twitter GraphQL API and how it fits into the data layer of Twitter's infrastructure as a whole.
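
The key GraphQL property described above is that the client names exactly the fields it wants. The sketch below mimics only that selection step by resolving a nested selection against in-memory data; it assumes the query has already been parsed into a dict mapping field names to sub-selections, and `resolve` is a hypothetical helper, not part of any GraphQL library.

```python
from typing import Any


def resolve(data: Any, selection: dict[str, dict]) -> Any:
    """Return only the fields named in `selection`, recursing into nested objects
    and lists. `selection` maps field name -> sub-selection ({} for a leaf field)."""
    if isinstance(data, list):
        return [resolve(item, selection) for item in data]
    return {
        field: resolve(data[field], sub) if sub else data[field]
        for field, sub in selection.items()
    }


user = {
    "id": 1, "name": "Ada", "email": "ada@example.com",
    "followers": [{"id": 2, "name": "Bob", "email": "bob@example.com"}],
}
# Roughly the GraphQL query: { name followers { name } }
print(resolve(user, {"name": {}, "followers": {"name": {}}}))
# {'name': 'Ada', 'followers': [{'name': 'Bob'}]}
```

A real GraphQL server also validates the selection against the schema and calls a resolver per field, which is where the storage-design questions in the abstract come in.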

Speakers

Adam Crane

SRE, Twitter

Julien Delange

Software Engineer, Twitter
Former rocket-scientist at the European Space Agency in the Netherlands, Julien was previously a senior staff researcher at Carnegie Mellon University and a senior software engineer at AWS. He is now a staff software engineer at Twitter, where he is working on improving Twitter infrastructure...


Friday November 16, 2018 10:40am - 11:00am PST
data

10:40am PST

Adopting GraalVM
After many years of development, Oracle finally published GraalVM and sparked a lot of interest in the community. GraalVM is a high-performance polyglot VM with a number of potentially interesting traits we can take advantage of, like increased performance and lowered cost. It can also tackle shortcomings of the JVM/Scala that we have been struggling with for years, such as slow startup times or large JARs. Lastly, thanks to its polyglot nature, it can open interesting doors we may want to explore. On the other hand, GraalVM may still be bleeding-edge technology that has a hard time delivering the promised features. In this talk, I’d like to discuss the advantages and disadvantages of adopting GraalVM, provide guidance if you decide to do so, and share our story in this area, including various samples and recommendations. This talk is focused on the JVM and Scala but should be beneficial for everyone interested in this topic.

Speakers

Petr Zapletal

Tech Lead, Disney Streaming Services
My name is Petr and I work for Disney Streaming Services (ex. Bamtech Media, ex. Cake Solutions). I'm interested in Reactive and Distributed Systems, Streaming, and of course Scala and the JVM.


Friday November 16, 2018 10:40am - 11:00am PST
functional

10:40am PST

Swimming in the stream: A simple data analytics pipeline with Akka Streams
Streams! Streams offer an interesting conceptual model for processing pipelines that is very functional-programming oriented. The streaming paradigm is very well suited to dealing with a constant flow of data, and Akka Streams is a powerful implementation of it. It offers a set of composable building blocks for creating asynchronous and scalable data streaming applications. In this talk we will live-code our way from a very basic stream to a data aggregation pipeline that interacts with multiple services.

Speakers

Gabriel Claramunt

CTO, Scalents


Friday November 16, 2018 10:40am - 11:00am PST
reactive

11:10am PST

5 tips to build long-lasting Scala OSS
Seven years ago, I started ScalikeJDBC, one of the popular database libraries in Scala, and I am still working on it today. In the meantime, the Scala community has grown sharply. Despite that growth, we haven't yet seen a corresponding increase in long-lasting OSS projects. In this talk, I will share what I have learned through my experience with OSS.
 
The 5 tips I will share in this talk are
 
1) Find your lifework project
2) Be careful about adding Scala dependencies
3) Stick with binary compatibility
4) Provide cross builds
5) Have effective CI builds

If you'd like to know more about the tips, this talk is for you.

* sbt project template including all tips: https://github.com/seratch/long-lasting-scala.g8

Speakers

Kazuhiro Sera

Senior Software Engineer, Salesforce
Scala enthusiast in Tokyo, Japan. Creator of ScalikeJDBC and Skinny Framework. One of the active maintainers of json4s and Scalate.



Friday November 16, 2018 11:10am - 11:30am PST
reactive

11:10am PST

Towards Typesafe Deep Learning in Scala
The preferred language of current deep learning frameworks (TensorFlow, PyTorch, MXNet, DyNet, etc.) is Python, a type-unsafe language. Inspired by the type safety of Scala, we present Nexus, a prototype typesafe deep learning engine in Scala. Being extraordinarily expressive in types, Nexus offers unforeseen type safety (the axes of tensors are typed statically) and succinctness to deep learning developers through extensive use of type-level computation via the popular library Shapeless. In this talk I'll introduce the design of a deep learning framework and show how Scala's type-level computation abilities can make it safer, easier to write, and more expressive. Ideas include generalized algebraic data types (GADTs), heterogeneous lists (HLists), program verification (compiling-as-proofs with Scala implicits), and introductory machine learning.

Speakers

Tongfei Chen

PhD candidate, Johns Hopkins University
Natural language processing researcher; programming language aficionado. Likes to talk about NLP/ML/AI/type systems/functional programming.


Friday November 16, 2018 11:10am - 11:45am PST
data

11:10am PST

Rust and Other Interesting Things
Bryan, CTO of Joyent, a core contributor to Solaris, ZFS, and DTrace, and formerly a Distinguished Engineer at Sun, has recently picked up Rust. He'll share his experience with us.

Speakers

Bryan Cantrill

CTO, Joyent
Bryan Cantrill is the CTO at Joyent, where he oversees worldwide development of the SmartOS and SmartDataCenter platforms, and the Node.js platform. Prior to joining Joyent, Bryan served as a Distinguished Engineer at Sun Microsystems, where he spent over a decade working on system...


Friday November 16, 2018 11:10am - 11:45am PST
functional

11:30am PST

Using Akka Streams for Web-scale Data Ingestion
Iterable uses Akka Streams to manage streams of hundreds of millions of events daily for some of the world's largest consumer companies. In this talk we'll discuss how we use Akka Streams to ingest streams of event data into Elasticsearch using Kafka. We'll show how Akka Streams makes writing streaming-related code more elegant, flexible, and performant. We'll discuss the various stream operations such as `mapAsync` and `groupedWithin`. We'll look at how Akka Streams deals with common streaming data issues such as idempotency, ordering of updates, and backpressure. Finally, we'll discuss practical issues such as measuring performance, testing, and tuning an Akka Streams-based system in the context of massive data ingestion into a distributed, eventually consistent database.
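
`groupedWithin` batches elements until either a maximum count is reached or a time window elapses, which makes it a natural fit for batching bulk writes. Below is an offline Python approximation over timestamped events, with `grouped_within` as a hypothetical name. It assumes the window starts at each batch's first element and that an expired batch is flushed only when the next element arrives, whereas Akka's operator uses a real timer, so this sketches the semantics rather than porting the operator.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def grouped_within(events: Iterable[tuple[float, T]], n: int,
                   window: float) -> Iterator[list[T]]:
    """Offline simulation of size-or-time batching over (timestamp, item) pairs:
    a batch is emitted when it reaches n items, or when an item arrives at least
    `window` seconds after the batch's first item."""
    batch: list[T] = []
    started = 0.0
    for ts, item in events:
        if batch and ts - started >= window:
            yield batch  # time window expired: flush before adding the new item
            batch = []
        if not batch:
            started = ts  # the new batch's window opens at its first element
        batch.append(item)
        if len(batch) == n:
            yield batch  # size limit reached
            batch = []
    if batch:
        yield batch  # flush whatever is left at end of stream


events = [(0.0, "a"), (0.1, "b"), (0.2, "c"), (0.3, "d"), (2.0, "e"), (2.1, "f")]
print(list(grouped_within(events, n=3, window=1.0)))  # [['a', 'b', 'c'], ['d'], ['e', 'f']]
```

The size bound caps the cost of each bulk request, while the time bound caps how long a lone event can wait under light traffic.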


Speakers

Greg Methvin

Staff Software Engineer, Iterable
Greg Methvin is probably best known for his work as a maintainer for Play Framework and a contributor to Scala, Akka, and other open source libraries in the Scala community. He currently works on Iterable's data infrastructure team, building a platform to help companies better understand...

Jie Ren

Senior Software Engineer - Data Infrastructure, Iterable
I'm a software engineer currently focused on data infrastructure at Iterable. I am passionate about solving complex technical challenges, as I’ve done throughout my career. I've been working with the Scala, Elastic, Akka Streams, Kafka stack for the last 4 years. Prior to Iterable...


Friday November 16, 2018 11:30am - 12:00pm PST
reactive

11:50am PST

Enabling Big Data and Machine Learning for the Masses: Creating a Spark Platform for the Uninitiated
Medium is expanding its use of big data and machine learning to support its product teams. In doing so, it needs to find a way to leverage both the existing technical stack in which it has invested and the knowledge of its engineering team. Unfortunately, these are somewhat at odds. Medium has heavily invested in Scala and Spark for its ETL pipelines. And while Spark certainly provides functionality to support big data analysis and machine learning, its learning curve is steep and only a few Medium engineers have experience with it. To combat this, Medium is actively developing a platform that eases the learning curve for both big data and machine learning operations. This is helping engineers not only get to machine learning results faster but also write and maintain ETL pipelines more efficiently. The platform includes tools for development, online and offline testing, machine learning experimentation, and monitoring.

Speakers

Tim Kral

Team Lead of Data Engineering, Medium


Friday November 16, 2018 11:50am - 12:30pm PST
data

11:50am PST

You Are a Scala Contributor
Scala is a community-based language. A few people at Lightbend and the Scala Center are paid to facilitate, but ultimately Scala succeeds or doesn't because of you. This talk is about how to participate effectively in open-source work happening in the scala/* repositories on GitHub. You'll learn the overall lay of the land as well as advice on contributing in specific areas such as websites and documentation, issue reporting, Scala modules, the Scala standard library, and even the Scala compiler.

Speakers

Seth Tisue

Scala team, Lightbend
I like: compilers and interpreters, functional programming, and open-source software. I've been active in the Scala community since 2008. Before joining Lightbend in 2015, I used Scala to build the compiler and other tools for NetLogo, an open-source programming language for kids...


Friday November 16, 2018 11:50am - 12:30pm PST
functional

12:00pm PST

Scaling Bayesian Experimentation

In the past few years, companies across the industry have noted the limitations of traditional null hypothesis significance testing (NHST) for online experimentation. In particular, statistical problems like multiple comparisons and peeking have been difficult to solve while still making fast business and product decisions. Bayesian methods provide an alternative that overcomes these problems, but they are often avoided because of worries about their complexity and computational intensity.

We will talk about three challenges with Bayesian statistics for experimentation and how big data, tools like Spark, and a little statistical ingenuity can help us address them. The three challenges we will discuss are (1) coming up with priors for experimentation in a world of big data, (2) building a fast Bayesian computation pipeline that is generalizable to all of the metrics your organization cares about, and (3) overcoming computational inefficiencies when using these statistical methods in a real-time experimentation environment. 

To accomplish (2), we use bootstrapping, and for (3), we will talk about some of the challenges involved in making it computationally efficient and the solutions we adopted.

In the literature on Bayesian statistics, and especially in criticisms of it, you will often run across the difficulty of choosing priors. We will show how we arrived at a general approach to generating priors.

The other criticism of Bayesian statistics, and a potential roadblock to implementing it in a big data pipeline, is that it is computationally expensive. This is especially true for more complex models, such as a typical revenue distribution, which is multimodal, with one peak at zero and another near the average receipt. Under a Bayesian methodology, such distributions require multiple parameters to be estimated and do not have analytic (conjugate) priors. The standard approach of Markov Chain Monte Carlo (MCMC) simulation can be too slow, is difficult to parallelize, and requires modeling each metric. We will discuss how we use Spark to apply a statistical method called bootstrapping efficiently, handling these computational problems and providing a generalizable solution to Bayesian updating.
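
As a rough illustration only: the abstract does not name the bootstrap variant or show the Spark code. The sketch below therefore assumes Rubin's Bayesian bootstrap, run on a single machine in plain Python. Each posterior draw of mean revenue reweights the observed values with Dirichlet(1, ..., 1) weights, so no parametric model, conjugate prior, or MCMC chain is needed. The function names and sample data are hypothetical.

```python
import random
from typing import List, Sequence


def bayesian_bootstrap_means(values: Sequence[float], draws: int = 2000,
                             seed: int = 0) -> List[float]:
    """Posterior draws of the mean via a Bayesian bootstrap (Rubin-style).

    Each draw weights the observations with Dirichlet(1, ..., 1) weights,
    generated as independent Exponential(1) variates divided by their sum.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(draws):
        weights = [rng.expovariate(1.0) for _ in values]
        total = sum(weights)
        means.append(sum(w * v for w, v in zip(weights, values)) / total)
    return means


def prob_variant_beats_control(control: Sequence[float],
                               variant: Sequence[float],
                               draws: int = 2000, seed: int = 0) -> float:
    """Estimate P(mean_variant > mean_control) from paired posterior draws."""
    a = bayesian_bootstrap_means(control, draws, seed)
    b = bayesian_bootstrap_means(variant, draws, seed + 1)
    return sum(vb > va for va, vb in zip(a, b)) / draws


# Hypothetical revenue per user: most users spend nothing, some spend 20.
control = [0.0] * 80 + [20.0] * 20
variant = [0.0] * 70 + [20.0] * 30
print(prob_variant_beats_control(control, variant))
```

Each draw is independent of the others, so in principle the draws can be split across workers. How the speakers distribute the work on Spark is not described here.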

 

Lastly, we often want to run our experimentation analysis in real time so that we can make fast decisions or inform an n-armed bandit algorithm. Although bootstrapping is more efficient than MCMC, it is still more expensive than analytic methods and can be prohibitively costly in real time. We will talk about a couple of methods we have developed to update bootstrapped data, reducing the computation needed in a real-time experimentation analysis environment, and compare their performance with a naive method.
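
The abstract does not describe the speakers' update methods. One generic way to avoid recomputing a bootstrap from scratch is sketched below as an assumption, not as their approach. It relies on the fact that Bayesian-bootstrap weights can be generated one observation at a time. Each replicate keeps a running weighted sum and weight total, and a new observation adds only one Exponential(1) weight per replicate. Normalising at query time gives the same distribution of Dirichlet weights as a full recomputation over all data seen so far. The class and method names are hypothetical.

```python
import random
from typing import Iterable, List


class StreamingBayesianBootstrap:
    """Incrementally updated Bayesian-bootstrap posterior for a mean.

    Each replicate stores sum(w * x) and sum(w) over all observations seen so
    far, where every observation gets its own Exponential(1) weight per
    replicate. Historical observations never need to be revisited.
    """

    def __init__(self, replicates: int = 500, seed: int = 0) -> None:
        self._rng = random.Random(seed)
        self._weighted_sums = [0.0] * replicates
        self._weight_totals = [0.0] * replicates

    def update(self, batch: Iterable[float]) -> None:
        """Fold a batch of new observations into every replicate.

        Cost is O(replicates * len(batch)), independent of history size.
        """
        values = list(batch)  # materialise once so iterators work too
        for i in range(len(self._weighted_sums)):
            for value in values:
                w = self._rng.expovariate(1.0)
                self._weighted_sums[i] += w * value
                self._weight_totals[i] += w

    def posterior_means(self) -> List[float]:
        """Current posterior draws of the mean, one per replicate."""
        return [s / t for s, t in zip(self._weighted_sums, self._weight_totals)
                if t > 0]


# Usage: fold in events as they arrive, query the posterior at any time.
tracker = StreamingBayesianBootstrap(replicates=200, seed=1)
tracker.update([0.0, 0.0, 20.0])
tracker.update([0.0, 35.0])
draws = tracker.posterior_means()
print(min(draws), max(draws))
```

The per-update cost grows with the number of replicates rather than with the amount of history. That is the property a real-time setting needs, though the speakers' methods may trade it off differently.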


Speakers

Paul Cho

Data Engineer, Udemy

Robert J. Neal

Principal Software Engineer, Udemy
Software engineer who prefers Scala. Primarily working in experimentation, statistics, and reinforcement learning.


Friday November 16, 2018 12:00pm - 12:30pm PST
reactive

12:30pm PST

Lunch
Lunch and meeting new friends!

Friday November 16, 2018 12:30pm - 1:10pm PST
Commons

1:10pm PST

Getting started with EitherT
Everything you've ever wanted to know about EitherT in 20 minutes or less! Learn the basics of using Either in your microservices and how it can help improve error handling in your code. With that foundation, we will then discuss the Either monad transformer, EitherT, and how you can make use of it.

Speakers

Julie Laver

Sr. Software Engineer, Twilio
Julie Laver is a Tech Lead on the Messaging Channels team at Twilio. She has been working in Scala for the last 4 years. She is passionate about helping to make Scala approachable for all engineers. Julie is a graduate of the University of Waterloo. She spends her free time traveling...


Friday November 16, 2018 1:10pm - 1:30pm PST
functional

1:10pm PST

Optimizing network topologies with monadic execution contexts
Constellation's goal is to horizontally scale blockchain protocols using a DAG (directed acyclic graph) and a reputation/trust model. We’re using Scala, Akka, Cats, Algebird and Kubernetes to build an asynchronous consensus service. We’re building on HoneyBadger ACS and CHECO, with inspiration from GraphX, Pregel, and PageRank, to scale consensus with a reactive execution graph. Conventional validation is split into granular monadic execution contexts to distribute the workload, and the network topology undergoes dynamic rebalancing. Reputation is measured both deterministically from chain data and through node-to-node labels capturing ‘influence’ on the network, in a fashion similar to Twitter. This contrasts with a typical blockchain, which relies on linear state transitions or sharding, and on proof of work or proof of stake.

Speakers

Ryle Goehausen

VP of Engineering, Constellation Labs
Scala / Python / Spark / Machine Learning / Performance engineering. Early Databricks and Spark adopter. Working on high performance cryptocurrency and reputation modeling.

Wyatt Meldman-Floch

CTO, Constellation Labs
Software engineer focused on machine learning, distributed systems and functional programming. Founded Constellation Labs, applying methods from distributed graph processing to distributed ledger technology. Applying combinatorial models of distributed computing.



Friday November 16, 2018 1:10pm - 1:30pm PST
reactive

1:10pm PST

High-performance functional bayesian inference in Scala
This talk will present Rainier, a high-performance open source library from Stripe for developing Bayesian inference models in Scala. The talk will focus on two aspects of the library: first, the underlying computational graph, which allows mathematical functions written in idiomatic Scala to be compiled to very fast, zero-allocation JVM bytecode, along with automatic derivation of their gradients; second, the high-level Bayesian inference API built on top of this graph, which provides a familiar, immutable and composable monadic interface for specifying model priors and conditioning them on observed data.

Speakers

Avi Bryant

Engineer, Stripe
Avi has led product, engineering, and data science teams at Stripe, Etsy, Twitter and Dabble DB. He's known for his open source work on projects such as Seaside, Scalding, and Algebird. Avi currently works on Stripe's deep learning and probabilistic programming models.


Friday November 16, 2018 1:10pm - 1:50pm PST
data

1:40pm PST

How Twitter teaches Scala
Circa 2009, Twitter’s scalability issues with its Ruby on Rails platform prompted a migration to a Scala-based stack. Today, Twitter is one of the largest organizations using Scala as its main programming language to power its platform. In fact, more and more organizations are steadily adopting Scala in their business-critical applications, putting Scala engineers in high demand. However, this demand hasn’t yet translated into a large pool of Scala engineers that Twitter or other companies can readily hire. Because many software engineers joining Twitter are unfamiliar with Scala, this talk will focus on how the company redesigned, and keeps updating, its onboarding process to best ramp up its new engineering hires on Scala. We will also discuss the role that Scala plays in our continuing engineering education.

Speakers

Ivan Corneillet

Instructor, Twitter
Ivan is a technologist, thinker, and tinkerer, having worked on hardware design, software engineering, and everything in between. Today, Ivan is focusing on leveling up Twitter's engineering talent. Previously, he was teaching machine learning to highly motivated aspiring data science...


Friday November 16, 2018 1:40pm - 2:00pm PST
functional

1:40pm PST

Distinguishing features of production-quality data pipelines
It's now easier than ever to execute SQL queries or a few lines of code in a notebook against massive datasets and produce exciting results on a whim. Sometimes, though, you might be asked to take your one-off proof of concept and "productionize" it. Simply putting your query or script on a schedule might be all that's needed for the problem at hand, but if you're building for the long term, there are costly pitfalls you might face in the future with new data and changing logic. What are the foundations to build on, and what are the nice-to-have qualities and utilities, so that you can avoid the big data equivalent of emailing spreadsheets to each other? You may have heard of "lambda architecture," "immutable append-only data sources," "reproducible deterministic outputs," and "atomic deployments," and how nice it is for your data pipeline to have these qualities, but what are the specific benefits, and in what situations are they important or not? This talk will detail various practices and principles around data pipelines that can help you avoid costly mistakes and hours lost to debugging mysteries. For the most part, we'll focus on why you might put effort into certain goals that don't directly affect your immediate results. This talk is geared mainly towards Scala/Spark data pipelines but aims to be relevant to other kinds of data pipelines as well.

Speakers

Nimbus Goehausen

Principal Data Engineer, Demandbase


Friday November 16, 2018 1:40pm - 2:00pm PST
reactive

2:10pm PST

FiloDB: Real-time, In-Memory Time Series at Massive SMACK Scale
Time series and event data is becoming huge for every business, and ingesting millions of series reliably while answering many concurrent queries from users is a huge challenge. In this talk I share the story of developing and productionizing FiloDB, an open source, in-memory time series solution built with the Scala, Akka, Kafka, Cassandra, Mesos (SMACK) stack. FiloDB is able to reliably ingest monitoring/time series data and answer tons of low-latency queries at massive scale.
  • Why we developed our own solution after looking at Prometheus, OpenTSDB, Cassandra, etc.
  • Time series data model and low-latency distributed querying at scale
  • The benefits and challenges of off-heap, in-memory data processing at scale
  • Building a database for modern container environments
  • Challenges with scaling the Prometheus data model while remaining compatible
  • Persistent, recoverable data at scale with Kafka and Cassandra
  • Key lessons in building massively scalable, real-time, low-latency data systems

Speakers

Evan Chan

Senior Data Engineer, UrbanLogiq
Evan is currently Senior Data Engineer at UrbanLogiq, where he is using Rust, among other tools, in building robust data platforms to help public servants build better communities. Evan has been a distributed systems / data / software engineer for twenty years. He led a team developing...


Friday November 16, 2018 2:10pm - 2:50pm PST
data

2:10pm PST

Scalaz Stream: Rebirth
Scalaz-stream introduced the Scala ecosystem to purely functional streams. Although widely deployed in production at Verizon, SlamData, and elsewhere, the last release of scalaz-stream was October 23rd, 2016, and newer projects such as FS2, Monix Iterant, and others have risen to fill the gap, each addressing different use cases in the functional programming community. In this talk, John A. De Goes, architect of the Scalaz 8 effect system, unveils a new design for a next-generation version of scalaz-stream. Designed to embrace the powerful features of the Scalaz 8 effect system (features not available in any other effect type), this design offers higher performance, stronger guarantees, and far less code than previous incarnations. You are invited to discover the practicality of functional programming, as well as its elegance, beauty, and composability, in this unique presentation available exclusively at Scale By the Bay.

Speakers

John A. De Goes

Solution Architect, De Goes Consulting
John A. De Goes has been writing Scala software for more than eight years at multiple companies, and has assembled world-renowned Scala engineering teams, trained new developers in Scala, and developed several successful open source Scala projects. Known for his ability to take very...

Itamar Ravid

Software Consultant, Independent
Itamar is a freelance software engineer. He’s been working with all facets of software development for over a decade, from data infrastructure through CI/CD processes to backend development. His current interests include microservice architectures and stream processing systems...


Friday November 16, 2018 2:10pm - 2:50pm PST
functional

2:10pm PST

Nelson: Functional programming in system design
As functional programmers we work hard to keep our code compositional and modular. Unfortunately, these noble pursuits often stop with our code, the resulting program flung over the wall to be deployed in complex and prescriptive systems. In this talk we will look at Nelson, a deployment orchestration system that applies functional programming not only in its implementation, but also in its system behavior in managing the messy world of deployment infrastructure. Where Free algebras and streams allow Nelson to be extended to support different VCS platforms, schedulers, and health checkers, Nelson's emphasis on immutable deployments informs system-level decisions such as deployment workflows and service discovery. Having been run successfully at different companies under different configurations, Nelson serves as a prime example of how functional programming can help not just with the software we write, but also with the systems we maintain.

Speakers

Adelbert Chang

Lead Data Engineer, Target
Adelbert Chang is a Lead Data Engineer at Target where he works on infrastructure systems for the Data Science and Optimization team. Previously he worked at U.C. Santa Barbara doing research in large-scale graph querying and modeling, and in industry on machine learning systems...


Friday November 16, 2018 2:10pm - 2:50pm PST
reactive

3:00pm PST

Labels to Inference: A Continuous Sentiment Pipeline
At Zignal Labs we track brand health for major brands and organizations such as Nvidia, Airbnb and IBM. A critical aspect of this is understanding the polarity of conversations as they unfold in real time across social and traditional media sources. We will discuss how we pulled together AWS services including Mechanical Turk, CodePipeline, Lambda, and SageMaker to deliver a complete sentiment solution that better fits our customer use case and provides transparency into our sentiment quality. We will dive into how these services can work together to enable continuous delivery and continuous model retraining, providing both system reliability and rapid inclusion of new labeled data to preserve model quality as conversations shift.

Speakers

Jeff Fenchel

Software Engineer, Zignal Labs
Jeff is a software engineer and data enthusiast passionate about enablement through continuous pipelines, developing and taming distributed systems, streaming data pipelines, and NLP through continuous measurement and testing. He recently started applying his platform and devops...


Friday November 16, 2018 3:00pm - 3:20pm PST
data

3:00pm PST

Scala the Cloud Native Way: Lessons Learned from Two Years of Linkerd in Production
Linkerd, an open source service mesh built on Scala, Finagle, and Netty, has seen steady adoption since its inception in 2016, and is now used in production by companies like Salesforce, Credit Karma, and FOX. In this talk, we describe lessons learned in adapting these technologies to the cloud native ecosystem. We discuss how some of Finagle's core components, like Vars and Activities, are used to provide robust and dynamic behavior in Linkerd. Finally, we explore some of the operational challenges of using the JVM in cloud native environments, whose constraints differ radically from those the underlying technologies were designed for.

Speakers

Dennis Adjei-Baah

Software Engineer, Buoyant Inc
Dennis is a software engineer at Buoyant Inc, where he contributes to and provides support for the open source project Linkerd, an L7 proxy that provides a dedicated service mesh layer in microservice environments. Linkerd enables applications to take advantage of features like circuit-breaking...


Friday November 16, 2018 3:00pm - 3:20pm PST
reactive

3:00pm PST

Play on Dotty: Design Patterns unlocked by Dotty in a Play look-alike demo project
The Dotty compiler currently supports some really awesome features of Scala 3.0, such as implicit functions and trait parameters. In this talk I will discuss what I think are the best parts of Scala's functional programming and object-oriented features, and how Dotty brings better cohesion and simplicity to these paradigms with implicits. FP brings the notion that functions are just objects, but Scala brings to FP the notion that classes are just higher-order functions that produce modules. With this insight, we can see how to write extremely modular code without exposing too much detail or hard-coding your modules. I will compare the design patterns of implicit parameters/functions and constructor injection, and discuss when to use one or the other. I will also explore how this could look in a strawman iteration on the Play Framework's architecture (assuming it took these design patterns to heart). This will be an interesting talk if you are interested in Scala 3, implicit functions, ideas for how the Play Framework could evolve, or the sweet spot between OOP and FP that Scala enables.

Speakers

Jeff May

Principal Software Engineer, Rally Health
Veteran Scala Programmer, Rust Enthusiast, Activist, and Drummer. I love talking about philosophy, science, and politics. My passion is building systems that expand our capacity as humans to serve each other and the planet.


Friday November 16, 2018 3:00pm - 3:35pm PST
functional

3:30pm PST

Scio data processing nirvana at Spotify
Two years ago, Neville introduced Scio, an open-source Scala framework for developing data pipelines and deploying them on Google Dataflow. In this talk, we will discuss the evolution of Scio and share the highlights of running Scio in production for two years. We will showcase several interesting data processing workflows run at Spotify, what we learned from running them in production, and how we leveraged that knowledge to make Scio faster, safer, and easier to use.

Speakers

Bram Leenders

Infrastructure Engineer, Spotify
Bram is an infrastructure engineer at Spotify, working on some of the core backend services and event delivery pipelines.

Julien Tournay

Data Engineer, Spotify


Friday November 16, 2018 3:30pm - 3:50pm PST
data

3:30pm PST

Orchestrating Microservices with GraphQL
Intuit is at the halfway mark of its multi-year journey to decompose its QuickBooks Online platforms into microservices connected through a single graph. API access is kept simple by decomposing and orchestrating complex requests at the entry point, which has resulted in improved developer productivity.

Speakers

Greg Kesler

Principal Software Engineer, Intuit
Greg is a tech leader at Intuit Developer’s Group focusing on building APIs, SDKs and tools for Intuit and partner developers. Since Intuit started its journey to decompose monoliths into microservices, Greg has been a game changer, helping the company build request orchestration...


Friday November 16, 2018 3:30pm - 3:50pm PST
reactive

3:40pm PST

Concurrency with Cats-effect
The Cats Effect library recently reached its 1.0 release, providing powerful data types and type classes for purely functional, effectful programming. In this talk, we’ll focus on building programs using fiber-based concurrency and the synchronization primitives provided by Cats Effect. We’ll see how FS2 (Functional Streams for Scala) uses such primitives to build more advanced concurrent data types like bounded and unbounded queues. Finally, we’ll see how to apply these techniques to application design.

Speakers

Michael Pilquist

Distinguished Engineer, Comcast
Michael Pilquist is the author of Scodec, a suite of open source Scala libraries for working with binary data, and Simulacrum, a library that simplifies working with type classes. He is a committer/maintainer on a number of other projects in the Scala ecosystem, including Cats and...


Friday November 16, 2018 3:40pm - 4:15pm PST
functional

3:55pm PST

Structured Deep Learning with Probabilistic Neural Programs
Machine learning problems with structured output spaces, such as generating from a context-free grammar, are difficult to represent in current deep learning frameworks. These frameworks let a user specify the neural network architecture for scoring a single output, but not the output space itself, which makes structured prediction difficult to express. In this talk, we describe probabilistic neural programs, an open-source Scala library for structured deep learning that lets a user specify both the architecture and output space in a simple, elegant form. This framework lets users rapidly implement, train, and run a variety of state-of-the-art structured prediction models that would be difficult or impossible to implement with other tools. We’ll demonstrate how the framework can be used to tackle an example structured prediction problem.

Speakers

Jayant Krishnamurthy

Semantic Machines
Jayant is a scientist whose research focuses on machine learning techniques for natural language understanding and dialogue. He develops complex structured prediction models and the infrastructure for representing them effectively. He received his Ph.D. in Computer Science from Carnegie...


Friday November 16, 2018 3:55pm - 4:25pm PST
data

3:55pm PST

Data Consistency Patterns in Cloud Native Applications
Cloud Native Architectures require each service to be loosely coupled and not share data. This makes ensuring data consistency at the data layer a difficult challenge. An entire generation of NoSQL databases advocated eventual consistency, which greatly compounded the problem. To overcome these problems, a number of patterns have emerged, the most popular of which is the Distributed Saga pattern. I will explore the complications with these patterns and show alternative patterns that instead rely on the database to enforce consistency, providing a new solution to this problem that ensures application invariants.
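
For readers unfamiliar with the Saga pattern named above, here is a minimal single-process sketch. The names are hypothetical. Real sagas span services, communicate through messages, and need idempotent, retryable compensations. Each step pairs an action with a compensating action, and on failure the completed steps are undone in reverse order. Between steps, partial results are visible to other readers, which is one commonly cited consistency complication of the pattern.

```python
from typing import Callable, List, Tuple

# A step is (name, action, compensation); all names here are hypothetical.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> bool:
    """Run steps in order. If a step raises, run the compensations of the
    already-completed steps in reverse order and report failure."""
    completed: List[Step] = []
    for step in steps:
        action = step[1]
        try:
            action()
        except Exception:
            for _name, _action, compensate in reversed(completed):
                compensate()
            return False
        completed.append(step)
    return True


# Usage: the payment step fails, so the inventory reservation is released.
log: List[str] = []


def charge_card() -> None:
    raise RuntimeError("card declined")


ok = run_saga([
    ("reserve", lambda: log.append("reserve"), lambda: log.append("release")),
    ("charge", charge_card, lambda: log.append("refund")),
    ("ship", lambda: log.append("ship"), lambda: log.append("cancel")),
])
print(ok, log)  # False ['reserve', 'release']
```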
 

Speakers

Ryan Knight

Principal Software Architect / CEO, Grand Cloud
Ryan Knight is Principal Solution Architect at Grand Cloud. He is a passionate technologist with extensive experience in large scale distributed systems and data pipelines. He first started Java Consulting at the Sun Java Center and has since worked at a wide variety of companies...


Friday November 16, 2018 3:55pm - 4:25pm PST
reactive

4:30pm PST

Distributed Deep Learning with Horovod
Abstract: Learn how to scale distributed training of TensorFlow and PyTorch models with Horovod.
Frameworks like TensorFlow and PyTorch make it easy to design and train deep learning models. However, when it comes to scaling models to multiple GPUs in a server, or multiple servers in a cluster, difficulties usually arise.
In this talk, you will learn about Horovod, a library designed to make distributed training fast and easy to use, and will see how to train a model designed on a single GPU on a cluster of GPU servers. 

Speakers

Alex Sergeev

Sr Software Engineer II, Uber Technologies, Inc.
Deep Learning Infrastructure @Uber. Author of http://horovod.ai


Friday November 16, 2018 4:30pm - 5:00pm PST
data

4:30pm PST

Fireside Chat with Richard Socher, Chief Scientist, Salesforce
Speakers

Alexy Khrabrov

Program Chair, Reactive Summit

Richard Socher

Chief Scientist, Salesforce


Friday November 16, 2018 4:30pm - 5:00pm PST
functional

4:30pm PST

Stream Processing at Lyft with Flink and Beam
Real-time access to data has become increasingly important to a wide range of companies. At Lyft, it's crucial for balancing supply and demand, setting prices, detecting fraud, and many other use cases. In this talk we will cover the basics of stream processing and how we're using Apache Flink and Apache Beam to let any engineer write real-time processing pipelines.

Speakers

Sherin Thomas

Software Engineer, Lyft
Machine Learning Infra

Micah Wylde

Software Engineer, Lyft


Friday November 16, 2018 4:30pm - 5:00pm PST
reactive

5:10pm PST

Panel II: Data Engineering and AI
AI and machine learning are increasingly integral components of data pipelines, with model deployment emerging as a new area of devops/engineering/analytics. The tooling for AI development in production is just emerging, and there are exciting startups and established companies leading the way. This panel will cover the field.

Moderators

Lukas Biewald

Founder, Weights and Biases

Speakers

Michelle Casbon

Michelle Casbon is a Senior Engineer on the Google Cloud Platform Developer Relations team, where she focuses on open source contributions and community engagement for machine learning and big data tools. Prior to joining Google, she was at several San Francisco-based startups as...

Chris Fregly

Founder, PipelineAI
Chris Fregly is Founder at PipelineAI, a Real-Time Machine Learning and Artificial Intelligence Startup based in San Francisco. He is also an Apache Spark Contributor, a Netflix Open Source Committer, founder of the Global Advanced Spark and TensorFlow Meetup, author of the O’Reilly...

Stepan Pushkarev

CTO, Hydrosphere.io
hydrosphere.io CTO. Automation of AI/ML Operations: deployment, serving, monitoring, subsampling, retraining.

Pete Skomoroch

Head of Data Products, Workday
Peter is Co-Founder and CEO of SkipFlag, which was acquired by Workday in 2018. SkipFlag's technology uses your existing conversations, support tickets, and other communication to automatically build and update an enterprise knowledge base. It understands the people, topics, and facts...

Richard Socher

Chief Scientist, Salesforce


Friday November 16, 2018 5:10pm - 6:00pm PST
functional

6:00pm PST

Happy Hour II
Our famous happy hour caps the day with excellent food and drinks and great conversation.

Friday November 16, 2018 6:00pm - 8:00pm PST
Commons
 
Saturday, November 17
 

8:00am PST

Breakfast and Welcome
Hot breakfast and uninterruptible coffee!

Saturday November 17, 2018 8:00am - 9:00am PST
Commons

9:00am PST

Keynote III: Mind Your State for Your State of Mind

Abstract: Applications have had an interesting evolution as we've moved into the distributed and scalable world. Similarly, storage and its cousin databases have changed side-by-side with applications. Many times, the semantics, performance, and failure models of the storage and applications do a subtle dance as they change in support of changing business requirements and environmental challenges. Adding scale to the mix has really stirred things up. This talk and paper look at some of these issues and their impact on our systems. 

Bio: Pat Helland has been building databases, distributed systems, messaging systems, transactional systems, application platforms, big data systems, and multiprocessors since 1978. His employers have included Tandem Computers, Microsoft, and Amazon. Pat attended UC Irvine and was a recipient of the UCI Information and Computer Science Hall of Fame Award (even though he dropped out). For recreation, Pat writes regular articles for the Communications of the ACM. He is employed by Salesforce.


Speakers

Pat Helland

Software Architect, Salesforce


Saturday November 17, 2018 9:00am - 9:40am PST
functional

9:50am PST

Introduction to Apache Spark with Frameless
If you're interested in Spark, but you're certain you'll hate it because it's not as type-safe as you'd like, let's see if Frameless can change your mind.

Speakers

Brian Clapper

Principal Instructor and Application Engineer, Databricks, Inc.
Brian has more than 30 years' software development experience in a variety of languages and application domains. Lately, as a Databricks employee, he has been concentrating on Apache Spark.


Saturday November 17, 2018 9:50am - 10:30am PST
data

9:50am PST

Effective Scala
Scala is a flexible language that enables many programming styles. While its unopinionated design fosters innovation and experimentation in the community, the choices it offers place a burden on its users to figure out how best to use the language. This talk will be an opinionated recommendation of how to apply Scala in real-world projects. We will claim that the most effective way to use Scala is not as a better Java, nor as Haskell on the JVM, but as a "third way" that best fits Scala's design. We'll give you practical guidelines that will make you a more effective Scala programmer.

Speakers

Frank Sommers

President, Autospaces, Inc
Frank is founder and president of Autospaces, Inc., a company specializing in providing workflow and decision support systems for the auto finance industry.

Bill Venners

Principal, Artima
Bill Venners is president of Artima, Inc., publisher of Scala consulting, training, books, and developer tools. He is the lead developer and designer of ScalaTest, an open source testing tool for Scala and Java developers, and Scalactic, a library of utilities related to quality...


Saturday November 17, 2018 9:50am - 10:30am PST
functional

9:50am PST

Quantum Computing and You
Quantum Computing exploits quantum phenomena such as superposition and entanglement to realize a form of parallelism that is not available to traditional computing. It offers the potential of significant computational speed-ups in quantum chemistry, materials science, cryptography, and machine learning. With no prior knowledge of condensed matter physics or advanced mathematics, it's hard to separate hype from reality. Come learn the fundamentals of Quantum Computing, get a sense of where it applies and what it can do, and the challenges being faced to make it a reality. Oh, and we'll write a little quantum computing code too! John works on the Quantum Computing team at Microsoft Research, where he leads the development of Q# - a new language for Quantum Computation.
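
The talk's own examples are in Q#, which is not shown here. As a language-neutral illustration of superposition and entanglement, the sketch below simulates a two-qubit state vector in plain Python, with no quantum SDK. The helper names are hypothetical. A Hadamard gate puts the first qubit into superposition, and a CNOT then entangles it with the second. The result is a Bell state whose two measurement outcomes, 00 and 11, are perfectly correlated.

```python
import math
from typing import List

State = List[complex]  # amplitudes indexed by basis state; qubit 0 is the MSB


def apply_hadamard(state: State, qubit: int, n_qubits: int) -> State:
    """Apply a Hadamard gate to one qubit of an n-qubit state vector."""
    bit = 1 << (n_qubits - 1 - qubit)
    s = 1 / math.sqrt(2)
    new = list(state)
    for i in range(len(state)):
        if i & bit == 0:  # visit each (qubit=0, qubit=1) amplitude pair once
            a, b = state[i], state[i | bit]
            new[i] = s * (a + b)
            new[i | bit] = s * (a - b)
    return new


def apply_cnot(state: State, control: int, target: int, n_qubits: int) -> State:
    """Flip the target qubit wherever the control qubit is 1."""
    cbit = 1 << (n_qubits - 1 - control)
    tbit = 1 << (n_qubits - 1 - target)
    new = list(state)
    for i in range(len(state)):
        if i & cbit:
            new[i] = state[i ^ tbit]
    return new


def probabilities(state: State) -> List[float]:
    """Measurement probabilities (Born rule): |amplitude|^2."""
    return [abs(a) ** 2 for a in state]


# Start in |00>, superpose qubit 0, then entangle it with qubit 1.
state = [1 + 0j, 0j, 0j, 0j]
state = apply_hadamard(state, 0, 2)
state = apply_cnot(state, 0, 1, 2)
print([round(p, 3) for p in probabilities(state)])  # [0.5, 0.0, 0.0, 0.5]
```

A real device cannot read amplitudes directly. A simulator like this one grows exponentially with the number of qubits, which is part of why quantum hardware is interesting.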

Speakers

John Azariah

Principal Architect, Microsoft


Saturday November 17, 2018 9:50am - 10:30am PST
reactive

10:40am PST

Motivating Probabilistic Programming
proml: Probabilistic ML in Scala

Deep Learning is so yesterday. Probabilistic Programming is the new kid on the block.
  • Come and learn what probabilistic programming is all about
  • How to do type-safe probabilistic programming in Scala
  • Live Demo


Speakers

Rahul Chitturi

Principal Software Engineer, Coatue


Saturday November 17, 2018 10:40am - 11:00am PST
data

10:40am PST

Shapeless Party Tricks in the Enterprise
Shapeless is a powerful library for strongly-typed generic programming in Scala. Numerous Scala libraries are using Shapeless to provide type class derivation and elegant functional APIs for JSON, binary protocols, test data generation, Spark Datasets, and even JDBC (an API which is now of legal drinking age). While Shapeless is a solid foundation for library authors to build upon, it can also provide type-safe solutions for one-off use-cases. Without deep-diving into the details of Shapeless, I'll share a few bite-size examples of Shapeless-based solutions that feel a bit like type-level party tricks but were used to solve real problems while working on data engineering and WebSocket-based protocols at Fortune 500 companies.

Speakers
Cody Allen

Machine Learning Engineer, Salesforce
I'm generally interested in functional programming. Most of my FP experience is in Scala, and I'm always happy to talk about Cats and Typelevel projects.


Saturday November 17, 2018 10:40am - 11:00am PST
functional

10:40am PST

Structure and Interpretation of Stream Processing
In recent years, stream processing systems have become the de facto standard for processing large and growing volumes of data from diverse sources and for computing interesting insights from them. Stream processing has consequently become a widely researched area, as it poses challenges with respect to performance, programming abstractions, consistency guarantees, fault tolerance, resiliency, and so on. Different stream processing libraries (e.g., Iteratee, Pipes, Monix, Scalaz-Stream, Akka Streams) have evolved to address these issues in different ways and with varying priorities. A wide range of industry-scale distributed stream processing systems, such as Flink, Heron, Spark Streaming, and Samza, has also been introduced.

In this comparative case study, we focus on state-of-the-art distributed stream processing systems (such as Flink, Heron, Spark Streaming, and Samza) and present their different approaches. Starting from the commonalities among these approaches and their primary constructs, we build common ground for reasoning about their semantics. We then analyze how the approaches differ, where each excels, and where each falls short compared to the others in addressing the core issues of stream processing. Finally, we look at how the streaming landscape has evolved over the last two decades, its notable trends, and future research directions.

Attendees will get a comparative overview of approaches to (distributed) stream processing, including their programming models, execution models, and notable properties. This can help in differentiating the systems and selecting the right tool, while taking the unique characteristics and trade-offs of each system into account.

Speakers
Adil Akhter

ML Engineering Lead, ING
Adil Akhter is a Functional Programmer and ML Engineering Lead at ING, building an ML Platform. He is passionate about technology and loves functional programming, mathematics, machine learning, etc.


Saturday November 17, 2018 10:40am - 11:20am PST
reactive

11:10am PST

Applied Machine Learning: a Netflix production
Applied Machine Learning is about as mature as Software Engineering circa 1998. For Data Scientists, it’s hard to collaborate, hard to be productive and hard to deploy to production. In the last 20 years, Software Engineers have become far more collaborative thanks to tools like git, far more productive thanks to cloud computing and far more effective at delivering quality software thanks to CI/CD and agile development practices. At Netflix, I get to work on problems like: how do we scale Data Science innovation by making collaboration effortless? How do we enable Data Scientists to single-handedly and reliably introduce their models to production? How do we make it easy to develop ML models that humans trust? More importantly, how do we use ML to make humans BETTER? In this talk, we’ll explore how Netflix is approaching these problems to further our mission of creating joy for our 125 Million+ members worldwide!

Speakers
Julie Pitt

Director, Machine Learning Infrastructure, Netflix
Julie leads Machine Learning Infrastructure at Netflix, with the goal of scaling Data Science while increasing innovation. She previously built streaming infrastructure behind the "play" button while Netflix was transitioning from domestic DVD-by-mail service to international...


Saturday November 17, 2018 11:10am - 11:30am PST
data

11:10am PST

Duality and How to Delete Half (minus ε) of Your Code
There’s a prefix that shows up a lot in Haskell: “co-”. There are “comonads” and “coalgebras” and “covariant functors” … wait a second, that last one means something different than the others. But what, and how? I’ll explain the concept of duality (at least in category theory, you’re on your own for metaphysics) and distinguish it from often confused concepts like variance and isomorphisms. Duality often seems too abstract to be useful, but it can help us in a variety of ways, and we can take advantage of it to simplify (and even eliminate) writing certain kinds of code. While not all the components exist in Scala yet, we’ll discuss what is there and how it can be used to automate building some useful constructions. I am porting a tool for programmatically generating dual constructions that I wrote in Haskell (https://github.com/sellout/dualizer) to Scala. It can hopefully both reduce the amount of code you write and give you a new way to explore category theory.
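As a small taste of duality (not the dualizer tool itself), the sketch below pairs the universal arrow of the product with its dual for the coproduct: reversing every arrow in `fanout` and swapping the pair `(B, C)` for `Either[B, C]` yields `fanin`. The `Duality` object and both function names are illustrative assumptions.

```scala
// Illustrative sketch of one categorical duality: product vs. coproduct.
object Duality {
  // Product: two arrows out of A combine into one arrow into the pair (B, C).
  def fanout[A, B, C](f: A => B, g: A => C): A => (B, C) =
    a => (f(a), g(a))

  // Coproduct (the dual): reverse the arrows, and two arrows into A
  // combine into one arrow out of Either[B, C].
  def fanin[A, B, C](f: B => A, g: C => A): Either[B, C] => A = {
    case Left(b)  => f(b)
    case Right(c) => g(c)
  }
}
```

The two definitions mirror each other line for line, which is the kind of mechanical correspondence a dualizing tool can exploit to generate one construction from the other.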

Speakers
Greg Pfeil

Senior Software Engineer, Formation
Greg has been working full-time with pure FP in Haskell and Scala for over six years. He currently abuses laziness for Formation, to extract efficient evaluation from exponential algorithms. He’s also known for inflicting recursion schemes on everyone and designing languages that...


Saturday November 17, 2018 11:10am - 11:30am PST
functional

11:40am PST

Hadoop Future in AI World
Paraphrasing Yogi Berra, “The future of Hadoop is not what it used to be. The times are different. Not necessarily worse or better. They are just different.” In this talk, I'll give a brief survey of the future of Hadoop as it was envisioned from 2005 to 2015, based on the roadmaps of several developers, evangelists (including yours truly), vendors, and analysts over those years, to give a flavor of how that envisioned future changed. I'll analyze the reasons behind this change, primarily the emergence of the public cloud and the rapid response of the traditional analytics ecosystem. As the hype around “Big Data” subsides, it is being replaced by yet another hype about “AI”. I'll prognosticate about how the Hadoop ecosystem will evolve to power current and future AI use cases.

Speakers
Milind Bhandarkar

Founder, Ampool
Milind Bhandarkar was the founding member of the team at Yahoo! that took Apache Hadoop from a 20-node prototype to a datacenter-scale production system. Parallel programming languages and paradigms have been his area of focus for over 20 years. He worked at several HPC companies, Yahoo...


Saturday November 17, 2018 11:40am - 12:20pm PST
data

11:40am PST

Radix Trees: How IntMap Works
The Radix Tree (aka PATRICIA Trie) is an efficient data structure for key-value maps with integral keys. Used in a range of applications (including the Linux kernel), the Radix Tree is particularly relevant to functional programmers because it has an efficient persistent version for use in purely functional code. Haskell's widely used IntMap type is a Radix Tree under the hood. With code and diagrams, I'll walk you through how Radix Trees work, what properties they have and how they can be used effectively. I'll finish with a look towards the future and the Adaptive Radix Tree, a recently published variation on the normal Radix Tree with some real potential. While the code for this talk will be Haskell-flavored, the ideas are language-agnostic.
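The abstract describes the structure without code, so here is a minimal Scala sketch of a big-endian PATRICIA tree keyed by `Int`, in the style of Okasaki and Gill's "Fast Mergeable Integer Maps", the paper on which Haskell's `Data.IntMap` is based. It is not the speaker's code: it supports only `insert` and `lookup`, the `IntTrie` names are hypothetical, and since it branches on raw bits, negative keys end up to the right of non-negative ones, so it does not keep keys in signed order.

```scala
import scala.annotation.tailrec

// Hypothetical sketch of a big-endian PATRICIA tree keyed by Int.
sealed trait IntTrie[+V]
case object Empty extends IntTrie[Nothing]
final case class Leaf[V](key: Int, value: V) extends IntTrie[V]
// prefix: the high bits shared by every key below; mask: the single bit where the subtrees split.
final case class Branch[V](prefix: Int, mask: Int, left: IntTrie[V], right: IntTrie[V]) extends IntTrie[V]

object IntTrie {
  // Keep only the bits of k strictly above the mask bit.
  private def maskKey(k: Int, m: Int): Int = k & (~(m - 1) ^ m)
  private def zeroBit(k: Int, m: Int): Boolean = (k & m) == 0
  private def matchPrefix(k: Int, p: Int, m: Int): Boolean = maskKey(k, m) == p

  // Combine two subtrees whose prefixes disagree, branching on the highest differing bit.
  private def join[V](p1: Int, t1: IntTrie[V], p2: Int, t2: IntTrie[V]): IntTrie[V] = {
    val m = Integer.highestOneBit(p1 ^ p2)
    val p = maskKey(p1, m)
    if (zeroBit(p1, m)) Branch(p, m, t1, t2) else Branch(p, m, t2, t1)
  }

  // Persistent insert: only the nodes on the path to the key are rebuilt.
  def insert[V](t: IntTrie[V], k: Int, v: V): IntTrie[V] = t match {
    case Empty => Leaf(k, v)
    case Leaf(k2, _) if k2 == k => Leaf(k, v)
    case Leaf(k2, _) => join(k, Leaf(k, v), k2, t)
    case Branch(p, m, l, r) =>
      if (!matchPrefix(k, p, m)) join(k, Leaf(k, v), p, t)
      else if (zeroBit(k, m)) Branch(p, m, insert[V](l, k, v), r)
      else Branch(p, m, l, insert[V](r, k, v))
  }

  // Follow the bits of the key down the tree; stop early on a prefix mismatch.
  @tailrec
  def lookup[V](t: IntTrie[V], k: Int): Option[V] = t match {
    case Empty => None
    case Leaf(k2, v) => if (k2 == k) Some(v) else None
    case Branch(p, m, l, r) =>
      if (!matchPrefix(k, p, m)) None
      else if (zeroBit(k, m)) lookup[V](l, k)
      else lookup[V](r, k)
  }

  def fromList[V](kvs: Seq[(Int, V)]): IntTrie[V] =
    kvs.foldLeft(Empty: IntTrie[V]) { case (t, (k, v)) => insert(t, k, v) }
}
```

Each `insert` rebuilds only the nodes between the root and the key and shares every other subtree, which is what makes the structure cheap to use persistently in purely functional code.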

Speakers
Tikhon Jelvis

Principal AI Scientist, Target
I picked up Haskell as my first functional language on a whim, and it's stuck with me ever since. I've worked with other functional languages too (a compiler in Racket, a backend service in OCaml), but now I'm back in the Haskell world, working on Target's supply chain optimization...


Saturday November 17, 2018 11:40am - 12:20pm PST
functional

11:40am PST

Leveraging Scala to Build Hardware at Scale
The hardware industry needs a fundamentally different approach to keep up with the compute needs of new applications such as IoT, edge computing, machine learning, and artificial intelligence. In an era where transistor scaling has stopped, the world will need lots of custom hardware to fulfill these new compute requirements. Unfortunately, developer productivity has taken a back seat in the hardware industry. In this talk, I'll show how we're leveraging Scala to build hardware productively at a high level. I'll present our full chip development stack, which is written entirely in Scala. The stack consists of the Chisel hardware construction domain-specific language, the Diplomacy framework for parameter negotiation, and the FIRRTL compiler that turns Chisel circuits into Verilog netlists. With our full chip stack, a hardware engineer can express a complex modern SoC (system-on-chip) in fewer than 30 lines of statically type-checked Scala code!

Speakers
Yunsup Lee

CTO, SiFive
Yunsup is SiFive’s Chief Technology Officer and co-founder. Yunsup received his PhD from UC Berkeley, where he co-designed the RISC-V ISA and the first RISC-V microprocessors with Andrew Waterman, and led the development of the Hwacha decoupled vector-fetch extension. Yunsup also...


Saturday November 17, 2018 11:40am - 12:20pm PST
reactive

12:20pm PST

Lunch
Lunch and meeting friends

Saturday November 17, 2018 12:20pm - 1:10pm PST
Commons

12:30pm PST

Unconference
Sign up at https://chief.sc/unconference2018
Follow #scalesf for updates!

We begin at lunchtime

Speakers
Jon Pretty

Software Engineer, Propensive


Saturday November 17, 2018 12:30pm - 5:00pm PST
unconference

1:10pm PST

Quantum Computing Modeling in Scala

Looking at it with a computing mindset, quantum state is not that different from classical or probabilistic state. In this presentation we will show a common abstraction that captures the similarities and differences in representing and evolving classical, probabilistic and quantum state. Concrete scenarios, like managing account balances, portfolio allocation, Bayesian inference and quantum computing simulation are used as examples, with running code. A particular type of monadic transformation ties all these use-cases together.

The presentation touches upon the implementation of all 4 quantum postulates (state representation, evolution, measurement and composition) and visualizes them using biased coins, dice and complex histograms. We will also show a simple application of quantum computing: counting the number of binary words of a fixed length with no consecutive ones. This is, of course, a typical interview question about Fibonacci numbers. 
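As a classical cross-check of the counting problem mentioned above, this sketch computes the number of length-n binary words with no two adjacent 1s, once by brute force and once by the Fibonacci-style recurrence. It does not reproduce the quantum approach from the talk; `NoConsecutiveOnes` and its methods are hypothetical names.

```scala
// Hypothetical sketch: classical counts of binary words with no consecutive 1s.
object NoConsecutiveOnes {
  // Brute force: enumerate all 2^n bit patterns and reject any with adjacent 1s.
  def bruteForce(n: Int): Int =
    (0 until (1 << n)).count(w => (w & (w >> 1)) == 0)

  // Recurrence: a valid word of length n ends in 0 (any valid prefix of length n-1)
  // or in "01" (any valid prefix of length n-2), so c(n) = c(n-1) + c(n-2),
  // with c(0) = 1 and c(1) = 2.
  def count(n: Int): Long = {
    @annotation.tailrec
    def go(i: Int, a: Long, b: Long): Long = if (i == 0) a else go(i - 1, b, a + b)
    go(n, 1L, 2L)
  }
}
```

For example, `count(5)` is 13, the Fibonacci number F(7) with F(1) = F(2) = 1; in general the count for length n is F(n + 2).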

https://github.com/logicalguess/quantum-scale/blob/master/docs/QuantumScala.pdf



Speakers
Constantin Gonciulea

Distinguished Engineer, JPMorgan Chase
After doing math for most of my youth, at some point I switched to computing. I have mostly enjoyed doing application architecture, concurrent programming and distributed computing. Most recently I have turned back to math, or rather to a combination of math and computing by working...



Saturday November 17, 2018 1:10pm - 1:30pm PST
reactive

1:10pm PST

Tensorflow and Swift
We will discuss the history of neural network acceleration libraries (TensorFlow and PyTorch in particular), the current state of hardware/software integration (TPU and Volta in particular), and then look at where the industry is headed (LLVM + functional programming/Swift in particular).

Speakers
brett koonce

cto, quarkworks
brettkoonce.com


Saturday November 17, 2018 1:10pm - 2:00pm PST
data

1:10pm PST

Fork It Harder Make It Better
The case for forking the Scala development toolchain, and the case against it. Scala has a reputation for being hard to write tooling for, yet tooling is a vital part of the development experience. I want to present an overview of existing tools for writing, building, testing, and deploying code, where they fall short, where better solutions exist outside of Scala Land, and how we can improve them.

Speakers
Justin Kaeser

Software Developer, JetBrains
Justin believes in "Tools before Rules": automating the development toolchain to remove the pain of dealing with institutional processes. By day he works toward this goal as part of the IntelliJ Scala plugin team. At night he goofs off.


Saturday November 17, 2018 1:10pm - 2:00pm PST
functional

1:40pm PST

Journey of Building a Modern Data Prep Tool on Top of Apache Spark
Apache Spark is designed to be extensible and pluggable, offering much flexibility in how the system can be used. In this talk, we show how we utilize Spark to build the data preparation engine that powers Workday Prism Analytics. Our data prep engine runs two types of Spark applications: one that is “always on” to serve interactive data prep queries, and another that is “on demand” to perform batch processing of data pipelines. We demonstrate how Spark and Catalyst made it possible to have these two types of applications share much of the same code, differing only in sampling, caching, and result extraction. Further, we illustrate how our engine today takes advantage of Spark SQL and Catalyst to generate DataFrames/Datasets optimized for our use cases, and relies on Tungsten to facilitate codegen on 100+ custom library functions we expose to our users. We also describe how we leverage the Data Sources API to implement partition elimination and incremental data analysis on top of various file formats.

Speakers
Jianneng Li

Software Engineer, Workday
Jianneng is a software engineer specializing in distributed systems and data processing. He works at Workday on Prism Analytics, leveraging Apache Spark to build an end-to-end data analytics solution that helps businesses better understand their financial and HR data.


Saturday November 17, 2018 1:40pm - 2:00pm PST
reactive

2:10pm PST

Adding Custom Optimizations to Catalyst by Example with the DSE Spark Connector
Learn how the DataStax Spark Connector wires directly into Spark internals to bring even more speed to users automatically! Find out how Catalyst actually interacts with data sources and which key locations need modification to introduce custom behavior. Find out how writing a new Strategy and new execution nodes for Catalyst works in practice! Come learn about our most recent optimizations and how they can directly benefit you, or pick up some tips for writing your own custom optimizations!

Speakers
Russell Spitzer

Software Engineer, DataStax
Spark, Cassandra, or Dogs.


Saturday November 17, 2018 2:10pm - 2:50pm PST
data

2:10pm PST

Scala.js in production
Unlike with simple apps, where technology choices rarely matter, for complex apps those choices can be crucial to achieving the desired maintainability, agility, and performance. Our startup helps students learn Indian Classical Music with a modern approach by offering tools such as a composition editor, a music transcriber, and real-time pitch/beat accuracy feedback, right in the web browser. We love Scala and use it for both backend and frontend development. With some effort, we were able to keep the code simple, make development a joyful experience, and extract high performance. In this talk, we'll share our experiences in choosing frontend frameworks, client-server communication, using WebAssembly for performance-critical portions, and so on.

Speakers
Ramnivas Laddad

CEO and Co-founder, Paya Labs
Ramnivas is a technologist, author, and presenter who is passionate about doing software right. He has been leading innovation in Spring Framework and Cloud Foundry since their beginning. Ramnivas has led a group in Cloud Foundry and started the Spring Cloud project. Ramnivas is the...


Saturday November 17, 2018 2:10pm - 2:50pm PST
functional

2:10pm PST

Deploying Kafka Streams Applications with Docker and Kubernetes
Kafka Streams, Apache Kafka’s stream processing library, allows developers to build sophisticated stateful stream processing applications that can be deployed in an environment of their choice. Kafka Streams is not only scalable but also fully elastic, allowing dynamic scale-in and scale-out, as the library handles state migration transparently in the background. By running Kafka Streams applications on Kubernetes, you can use Kubernetes' powerful control plane to standardize and simplify application management, from deployment to dynamic scaling.

In this technical deep dive, we’ll explain the internals of dynamic scaling and state migration in Kafka Streams. We’ll then show, with a live demo, a Kafka Streams application running in a Docker container on Kubernetes, and the dynamic scaling of that application.

Speakers
Matthias Sax

Software Engineer, Confluent Inc.
Matthias is a Kafka committer and software engineer at Confluent working on Kafka’s Streams API. Prior to Confluent, he was a Ph.D. student at Humboldt University of Berlin, conducting research on data stream processing systems. Matthias is also a committer at Apache Flink and Apache...
Gwen Shapira

Confluent


Saturday November 17, 2018 2:10pm - 2:50pm PST
reactive

3:00pm PST

MLflow: An open platform to simplify the machine learning lifecycle
Successfully building and deploying a machine learning model can be difficult to do even once. Enabling other data scientists (or yourself, one month later) to reproduce your pipeline, to compare the results of different versions, to track what's running where, and to redeploy and roll back updated models is much harder. In this talk, I'll introduce MLflow, a new open source project from Databricks that simplifies this process. MLflow provides APIs for tracking experiment runs between multiple users within a reproducible environment, and for managing the deployment of models to production.

Speakers
Aaron Davidson

Software Engineer, Databricks


Saturday November 17, 2018 3:00pm - 3:40pm PST
data

3:00pm PST

FP Scala Meat & Potatoes: HTTP, JSON, & SQL with http4s, Circe, & Doobie
It's great to talk about free monads vs. finally-tagless, monads vs. applicatives, fixpoints and recursion schemes, category theory, etc. but does pure FP in Scala have anything to offer those of us who just need to receive some JSON, talk to a SQL database, and return some JSON? Come for an (almost) jargon-free look at why pure FP matters even in such meat-and-potatoes tasks, and for some hints as to how this relates to some of the more esoteric uses of FP some of your colleagues may be up to, or you may want to pursue later.

Speakers
Paul Snively

Sr. Software Engineer, Formation
I've been a language nut my whole life. Common Lisp, Scheme, Oz, OCaml, Haskell, and Scala all have a home in my heart for different reasons. I've been fortunate enough to have worked with Apple, AOL, Virgin, VMware, Intel, Verizon, and Formation, among others. I've spoken at Strange...


Saturday November 17, 2018 3:00pm - 3:40pm PST
functional

3:00pm PST

Simplicity for Programmable Money
Imagine taking your favourite functional programming language and removing recursion, recursive types, and even function types. What do you have left? Everything you need for programmable money! Simplicity is a new typed, combinator-based, functional language without loops and recursion, designed to be used within crypto-currencies and blockchain applications. Simplicity comes with formal semantics defined using Coq, a popular, general-purpose software proof assistant based on dependent type theory. In this presentation I will describe the Simplicity language. I will demonstrate how to write some small programs in Simplicity and, with some live coding, show how to prove that they behave correctly.

For more information on Simplicity see https://github.com/ElementsProject/simplicity.
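To give a feel for a language with no loops, recursion, or function types, here is an untyped Scala evaluator for the core combinators described in the Simplicity paper (`iden`, `comp`, `unit`, `injl`, `injr`, `case`, `pair`, `take`, `drop`). It is a hypothetical sketch, not the project's implementation: real Simplicity is statically typed, so the ill-typed fallback that throws here cannot arise, and everything beyond the core combinators is left out. All Scala names below are illustrative.

```scala
// Hypothetical sketch: an untyped evaluator for Simplicity-style core combinators.
object MiniSimplicity {
  // Values: the unit value, sums (L/R injections), and products (pairs).
  sealed trait Value
  case object UnitV extends Value
  final case class L(v: Value) extends Value
  final case class R(v: Value) extends Value
  final case class P(a: Value, b: Value) extends Value

  // Terms: each combinator denotes a total function from values to values.
  sealed trait Term
  case object Iden extends Term
  case object UnitT extends Term
  final case class Comp(s: Term, t: Term) extends Term
  final case class InjL(t: Term) extends Term
  final case class InjR(t: Term) extends Term
  final case class Case(s: Term, t: Term) extends Term
  final case class Pair(s: Term, t: Term) extends Term
  final case class Take(t: Term) extends Term
  final case class Drop(t: Term) extends Term

  def eval(term: Term, v: Value): Value = (term, v) match {
    case (Iden, a)                => a
    case (UnitT, _)               => UnitV
    case (Comp(s, t), a)          => eval(t, eval(s, a)) // run s, then t
    case (InjL(t), a)             => L(eval(t, a))
    case (InjR(t), a)             => R(eval(t, a))
    case (Case(s, _), P(L(a), c)) => eval(s, P(a, c))    // branch on the sum tag
    case (Case(_, t), P(R(b), c)) => eval(t, P(b, c))
    case (Pair(s, t), a)          => P(eval(s, a), eval(t, a))
    case (Take(t), P(a, _))       => eval(t, a)          // use the first component
    case (Drop(t), P(_, b))       => eval(t, b)          // use the second component
    case _ => throw new IllegalArgumentException(s"ill-typed: $term applied to $v")
  }

  // A bit is the sum type 1 + 1.
  val zero: Value = L(UnitV)
  val one: Value = R(UnitV)

  // not: pair the bit with unit, then branch on it and inject the opposite tag.
  val notBit: Term = Comp(Pair(Iden, UnitT), Case(InjR(UnitT), InjL(UnitT)))
  // xor on (a, b): if a = 0 return b, otherwise return not b.
  val xorBits: Term = Case(Drop(Iden), Drop(notBit))
}
```

Every program is a finite tree of combinators and `eval` recurses only on its subterms, so evaluation always terminates, which is the property that makes the language attractive for on-chain use.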

Speakers
Russell O'Connor

Software Developer, Blockstream
Lazy functional programming a la Haskell. Developed the lens-family and mezzolens Haskell libraries. Dependently typed programming and proofs a la Coq. Worked on Galois theory proofs for the verification of the Feit-Thompson theorem. Running NixOS on my laptop since 2010. Find my secret blog...


Saturday November 17, 2018 3:00pm - 3:40pm PST
reactive

3:45pm PST

Understanding World food economy with satellite images
It has become possible to observe crop growth from satellites daily and at a global scale. Based on this, we can identify and share agriculture-specific signals (insights) such as the presence of farming activity, the presence of irrigation systems, crop classification, and productivity assessment. Our pipeline starts with a set of images specifically designed for daily monitoring of the growth of commodity crops: corn, soybean, rice, and wheat. To process this data, we use our processing and delivery system, with ML (boosting) to understand vegetation patterns and AI to scale the models to other climate zones.

Speakers
Aleksandra Kudriashova

Senior Software Engineer, Astro Digital
MS in Computer Science from MIPT and Skoltech (Russia) and MIT (US). Head of Data at Astro Digital, a Satellite Mission as a Service company. Aleksandra has designed and developed the high-level product vision, data processing infrastructure, and data workflows. Previously she...


Saturday November 17, 2018 3:45pm - 4:20pm PST
data

3:45pm PST

Effects types in Scala - how to choose one
The Scala community has long been interested in representing asynchronous computations through the type system. While Scala provides a well-supported Future implementation, both the Typelevel and Scalaz communities are working on their own implementations of an IO type. The friendly competition between these libraries has driven major improvements in their performance. This talk will present the current options for building programs and managing effects in the type system, their differences from a developer's perspective, their usage in the wider community, and how to decide which implementation to use.
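For readers new to the idea behind the IO types the talk compares, here is a deliberately tiny sketch in which an effect is a value that runs only when asked. `MiniIO`, `delay`, `unsafeRun`, and `EffectDemo` are hypothetical names; this is not the API of cats-effect, Scalaz/ZIO, or any other library, and it omits stack safety, asynchrony, and error handling.

```scala
// Hypothetical sketch: an effect described as a value and run only on demand.
// Not any library's IO: no stack safety, no async, no error handling.
final case class MiniIO[A](unsafeRun: () => A) {
  def map[B](f: A => B): MiniIO[B] = MiniIO(() => f(unsafeRun()))
  def flatMap[B](f: A => MiniIO[B]): MiniIO[B] = MiniIO(() => f(unsafeRun()).unsafeRun())
}

object MiniIO {
  // Suspend a side effect without performing it.
  def delay[A](a: => A): MiniIO[A] = MiniIO(() => a)
}

object EffectDemo {
  var runs = 0
  // Building this value performs no effect...
  val tick: MiniIO[Int] = MiniIO.delay { runs += 1; runs }
  // ...and neither does composing it; the effect happens once per use, when run.
  val program: MiniIO[(Int, Int)] = for {
    a <- tick
    b <- tick
  } yield (a, b)
}
```

By contrast, a `scala.concurrent.Future` starts running as soon as it is created and memoizes its result, so using the same `Future` twice does not repeat the effect; that difference in evaluation model is one of the trade-offs worth weighing when choosing an effect type.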

Speakers
Alexandre Bergeron

Software Engineer


Saturday November 17, 2018 3:45pm - 4:20pm PST
functional

3:45pm PST

Street-fighting techniques for multi-tenant machine learning and big data workloads on Kubernetes
We like to think that Machine Learning and Big Data are all about harnessing our creativity and intelligence to solve extremely difficult problems. Yet we spend only a small part of our time and budget actually analyzing our data or building powerful models. Instead, we are forced to scale and optimize complex infrastructure so that it can handle the jobs we create. Unlike scaling web servers or stateless services, Big Data and Machine Learning workloads force us to be very conscious of things like GPU availability and data locality. How can we spend more time on the interesting part of the problem when our infrastructure is so complex? How can we spend most of our time in Spark analyzing our data, instead of allocating our executors? How can we spend most of our time in TensorFlow creating the most interesting deep network, rather than allocating workers to optimize GPU usage? At MapR, we have seen that Kubernetes can change things radically. We believe that Kubernetes is set to revolutionize how we create and run Big Data and Machine Learning applications. With recent enhancements, Kubernetes has reached a point where much of the traditional pain in creating these complex applications can be avoided.

In this talk, we will demonstrate two examples of how to use Kubernetes to simplify your workload. The first is Spark-on-Kubernetes. We will describe the traditional challenges of building an analytics application using Spark, and demonstrate dynamically building a large Spark cluster for a specific analytics job. We will discuss best practices for handling data locality in the Kubernetes world, and explore launching many Spark clusters in a shared environment and the various tricks for making this work on Kubernetes at scale. The second demonstration will use Kubeflow, a project for running TensorFlow on Kubernetes, to train and serve deep learning models. We will explore topics like GPU reservation and scheduling.
We will discuss and examine the various challenges we face in building an application that uses machine learning at scale on Kubernetes. We will demonstrate running Kubeflow in a multi-tenant, cloud-based environment.

Speakers
Sky Thomas

Engineer/Architect, MapR


Saturday November 17, 2018 3:45pm - 4:20pm PST
reactive

4:25pm PST

Edge ain't your gramp's IoT: design an implementation of a modern Edge Computing Platform
Connecting IoT devices to the Internet is not new, but deploying and running real-time edge apps at hyperscale on these devices is. IoT is making the world cyber-physical, making computing ubiquitous, and letting cloud-native apps live life on the edge, forever unshackled from the confines of a datacenter. Edge Computing evolves Cloud Computing by keeping what's great about the Cloud model (developer-friendly APIs and software-defined everything) while applying it in the harsh physical and security environment of sensors and ruggedized industrial PCs. In this talk we will cover the design and implementation of a novel Edge Computing platform created at ZEDEDA Inc. We will focus on a new, special-purpose, open source operating environment that has to run securely on billions of ARM and x86 devices. Based on LinuxKit, this operating environment removes the need to build on embedded Linux or a special-purpose RTOS, and instead allows them to run side by side. We will give hands-on examples of how anyone can start using this operating environment, or even port it to the device of their choosing.



Speakers
Roman Shaposhnik

Founder, ZEDEDA Inc.
A member of the lost tribes of Sun microsystems (still wöndering in the valley) Co-founder & CHO @ZEDEDAEdge VP Legal, Board @TheASF & @LF_Edge AKA 谢罗文 @ 阿帕奇 Roman is a well known and acknowledged expert and visionary in Open Source strategy and execution. When co-founding...


Saturday November 17, 2018 4:25pm - 5:00pm PST
data

4:25pm PST

Classical Category Theory in Plain Scala
This is a full implementation of small categories and constructs based on them, like diagrams, cones, cocones, limits, colimits, etc. As an illustration, a model of Zermelo-Fraenkel set theory is implemented. Choice Axiom included.

Speakers
avatar for Vlad

Vlad Patryshev

Contributor
Software developer with experience in categories and toposes. Teaching logic and formal methods at Santa Clara University. Working as a data engineer at Salesforce.


Saturday November 17, 2018 4:25pm - 5:00pm PST
functional

4:25pm PST

Reactive Microservice framework for Realtime Model Executions
It is essential for modern applications to be fast, efficient, and scalable, and to react quickly to change. In this talk, I'll discuss the reactive programming paradigm we have developed at Capital One and how we leverage reactive and microservice frameworks like Akka and Docker to score language-agnostic ML models in real time.
 
Quantum Real Time (QRT) is an inner-sourced framework built on Scala/Akka, enabling advanced and reusable capabilities for dynamic workflow orchestration, microservice development with easy creation of API endpoints, and language agnostic machine learning model execution through the use of open source technologies. It provides out of the box orchestration functionality that is configuration driven for rapid reuse during development while ensuring a high performance, fault tolerant and distributed microservice.
 
This reactive architecture has transformed the way we design and develop scalable, efficient applications that integrate machine learning models with real-time scoring capabilities to deliver an enhanced customer experience. It also extends our machine learning capabilities by using big data technologies to run computationally intense models that satisfy real-time latency requirements.


Speakers
Phani Srikar Ganti

Data Engineer, Capital One
I'm a Data Engineer at Capital One, currently working within the Credit Card Marketing & Decisioning organization. My work involves the design and development of microservices, distributed systems and big data applications. I hold a Master's degree in computer science from Johns Hopkins...
Erin Kavanaugh

Software Engineer, Capital One
Erin Kavanaugh is a full-stack Software Engineer currently working to deliver a Scala application which executes credit models when a customer applies for a Capital One credit card. She is a California native, and in her free time when she is not taking algorithms courses or going...


Saturday November 17, 2018 4:25pm - 5:00pm PST
reactive

5:10pm PST

Panel III: Cloud, Edge, and Silver Lining
In this panel, we'll consider the emerging architectures on the edge, including IoT, and enterprise stacks, including blockchain approaches, that make this next phase of the Internet and its brave new world possible.

Moderators
Alexy Khrabrov

Program Chair, Reactive Summit

Speakers
Holden Karau

Developer Advocate, Google
Holden Karau is a transgender Canadian open source developer advocate at Google focusing on Apache Spark, Beam, and related big data tools. Previously, she worked at IBM, Alpine, Databricks, Google (yes, this is her second time), Foursquare, and Amazon. Holden is the coauthor of Learning...
Haoyuan (H.Y.) Li

Founder and CEO, Alluxio
Haoyuan (H.Y.) Li is the Founder, Chairman, and CEO of Alluxio. He holds a Ph.D. in computer science from UC Berkeley’s AMPLab, where he created the Alluxio (formerly Tachyon) open source data orchestration system, and co-created Apache Spark Streaming as an Apache Spark founding...
Anoop Nannra

Global Leader and Head of DLT Product, Cisco
Anoop Nannra is a Global Leader and Head of Cisco’s Blockchain Organization. He is focused on identifying disruptive technologies and accelerating their adoption through business incubation via co-development, co-innovation, partnerships and acquisitions. Anoop has a proven track...
Roman Shaposhnik

Founder, ZEDEDA Inc.
A member of the lost tribes of Sun microsystems (still wöndering in the valley) Co-founder & CHO @ZEDEDAEdge VP Legal, Board @TheASF & @LF_Edge AKA 谢罗文 @ 阿帕奇 Roman is a well known and acknowledged expert and visionary in Open Source strategy and execution. When co-founding...


Saturday November 17, 2018 5:10pm - 6:00pm PST
functional
 

