
Wednesday, November 14
 

9:00am

Advanced Software Engineering with Cliff Click
Cliff Click is a legend in the world of compilers and distributed systems, and a software engineer's engineer. He is known as a life-long developer, founder, and brilliant speaker. Now, for the first time, he delivers a full-day workshop that can teach every developer something new and, most importantly, share the insights of a leading practitioner who built some of the things we use daily.

This training will comprise three 2-hour workshops, with breakfast, lunch, and coffee breaks included.

High Performance from Understanding the Low Levels
A deep dive into modern X86 hardware. We look at caches and caching behavior, data-races (and how they show up on an X86), Spectre and Meltdown, the Java Memory Model, CPU performance details (e.g. wide and O-O-O issue, hit-under-miss caches, branch prediction) and memory bandwidth, and relate them to writing performant code. We then tear down a simple Big Data analytics processing loop, make some small changes and get a 5x speedup.

The (Java) Virtual Machine
A look at Virtual Machines far and wide, with a deep dive into the Java Virtual Machine. We'll cover JIT'ing and GC'ing; bytecode cost models & class loading; deoptimization (and re-opt); safepoints; virtual calls & dynamic dispatch; threading and memory models; fast locks & faster locks; OS support (priorities, files, mmap, time) and much much more.

Parallel and Distributed Computing and Debugging
Parallel computing is everywhere and distributed computing is not far behind. Both bring serious challenges, including data-races, consistency and timing, "Heisen-Bugs", testing, parallel-design thinking, performance, profiling and bottlenecks. Note this session is not about micro-services and deployment, but about coding and getting correctness in a parallel & distributed environment.


Speakers

Cliff Click

CEO, Rocket Realtime School
Cliff Click was the CTO of Neurensic (now successfully exited), and CTO and Co-Founder of h2o.ai (formerly 0xdata), a firm dedicated to creating a new way to think about web-scale math and real-time analytics. He wrote his first compiler when he was 15 (Pascal to TRS Z-80!), although his most famous compiler is the HotSpot Server Compiler (the Sea of Nodes IR). Cliff helped A...


Wednesday November 14, 2018 9:00am - 5:00pm
TBA
 
Thursday, November 15
 

8:00am

Breakfast and Welcome
Hot breakfast and uninterruptible coffee!

Thursday November 15, 2018 8:00am - 8:45am
Commons

8:45am

Grand Welcome and Opening Remarks
Dr. Alexy Khrabrov, the Founder and Organizer of the By the Bay communities, welcomes speakers and attendees, and outlines the structure and agenda of the conference.

Moderators

Alexy Khrabrov

Chief Scientist and Founder, By the Bay

Thursday November 15, 2018 8:45am - 9:00am
functional

9:00am

Opening Keynote
Opening Keynote by the creator of Scala.

Speakers

Martin Odersky

Chief Architect, Lightbend


Thursday November 15, 2018 9:00am - 9:40am
functional

9:50am

Privacy-aware data science in Scala with monads and type level programming
To extract value from datasets, data science and machine learning experts require access to the data itself. However, organizations increasingly have stronger requirements for finer-grained controls over the processing and analysis of potentially sensitive data, for reasons such as regulatory compliance or general privacy policies. In machine learning applications, it may also be desirable to restrict data flow in order to avoid leakage or contamination via side channel information (e.g., see Oscar Boykin's talk from last year's SBTB). We therefore seek a general mechanism to assist users in encoding and enforcing information flow policies in their software, including interactive (i.e., notebook) analyses. In this talk we develop a Scala approach to this problem, based on PL and security research, whereby illegal data accesses can be rejected at compile-time.

Speakers

David Andrzejewski

Senior Engineering Manager, Sumo Logic
David Andrzejewski is a Senior Engineering Manager at Sumo Logic, where he works on applying statistical modeling and analysis techniques to machine data such as logs and metrics. He also co-organizes the SF Bay Area Machine Learning meetup group. David holds a PhD in Computer Sciences...


Thursday November 15, 2018 9:50am - 10:30am
data

9:50am

ArrayDeques and How to Contribute to Scala 2.13 Collections
In this talk, we will introduce a new data structure, mutable.ArrayDeque, that outperforms most current mutable Scala collections like Lists, Buffers, Stacks and Queues. We will also go over implementation details needed for this to be part of the new Scala 2.13 collections library. We will then encourage the audience to contribute other useful data structures like Ropes, Zippers, Disjoint Sets etc. to Scala.

Speakers

Pathikrit Bhowmick

Head of Engineering, Coatue Management
Pathikrit writes Scala full-time at a hedge fund. He is also the author of many widely used Scala libraries (https://github.com/pathikrit), such as better-files and metarest, and is a committee member of the Scala Platform.


Thursday November 15, 2018 9:50am - 10:30am
functional

9:50am

Structure and Interpretation of Stream Processing
Stream processing is a widely researched area, as it poses various challenges (such as performance, compositionality, CPU and memory usage, and referential transparency). Different frameworks (e.g., Iteratee, Pipes, Conduit, FS2, Akka Streams) across a variety of languages (e.g., Haskell, Scala) have evolved to address these issues differently and with varying priorities. In this comparative case study, we present different approaches to incremental stream processing. In particular, building on the commonalities among these approaches and their primary constructs, we establish a common ground and language for reasoning about their semantics. Afterward, we analyse and underscore the differences among the approaches, where they excel and where they fall short compared to the others in addressing the core issues of stream processing. Attendees will get a comparative overview of the varying approaches to incremental stream processing available in different programming languages, which can help in differentiating them and selecting the right tool to get the job done.

Speakers

Adil Akhter

Lead Engineer, ING


Thursday November 15, 2018 9:50am - 10:30am
reactive

10:40am

FP for Data Science: For-Loops Considered Harmful
Data science, machine learning and probabilistic programming are ready to reap the benefits of functional programming (FP). FP is well known for the benefits of correctness, expressiveness, composability, parallelism, and more. While functional languages have taken off in many different domains, they have not yet deeply penetrated modeling and data science, where Python is most common. In this talk, I will demonstrate in both code and design how functional programming patterns are a major benefit to data science practitioners, even when you work with Python. You will come away with a survey of FP techniques and how they relate to modeling and data science, how they fix the pitfalls of imperative languages and design, and a new perspective on why industry should move toward more FP techniques in the domain of data science.

Speakers

Aris Vlasakakis

Machine Learning Engineer, Credit Karma
My interests are in machine learning, functional programming, distributed systems amongst others.


Thursday November 15, 2018 10:40am - 11:00am
data

10:40am

Immutable APIs and mutable internals: a Scala design case study
Abstract: We love immutability for creating clear APIs that are easy to understand. Unfortunately, many immutable designs are slower than comparable mutable designs. In this tutorial style talk we will consider the case of parser combinators. We will first create a simple immutable API for our parser combinator library. Then we will consider an alternate implementation that keeps the immutable API but composes using hidden mutable state. The result is an immutable public API with twice the performance of the original. This design approach can generalize to other use cases in Scala to retain a safe and clean API but dramatically improve performance.

Speakers

Oscar Boykin

Software Engineer, Stripe
Oscar is the creator of Scalding, Summingbird, and Algebird, and a professor and mathematician turned software magician.


Thursday November 15, 2018 10:40am - 11:00am
functional

10:40am

Monitoring Reactive Streams
Reactive Streams are the key to building asynchronous, data-intensive applications with no predetermined data volumes. By enabling non-blocking backpressure, they boost the resiliency of your systems by design. But how do you tune and debug such applications? When productionizing Reactive Streams, the same backpressure that preserves the safety of your pipeline can get in the way of effectively monitoring its status. In this talk we’ll present a line of action to:
  1. measure the throughput of your pipeline
  2. identify its bottlenecks and look at possible tuning counteractions
  3. diagnose liveness issues.
Examples will be in Scala and Akka Streams; however, these patterns are generic and applicable to any Reactive Streams implementation out there.

Speakers

Stefano Bonetti

Software Engineer
Stefano has been developing large scale backend systems within the cozy boundaries of the JVM for a few years, and he has recently become passionate about the Scala ecosystem - especially all things Akka. He has contributed to Akka, Akka HTTP and Alpakka codebases. Since 2017 he presented...


Thursday November 15, 2018 10:40am - 11:00am
reactive

11:10am

Making Spark ML Models Portable - Know Your Options
After successfully training an ML model with Apache Spark, the next task becomes important: how to serve it? One way is to keep using Spark for serving as well, but sometimes that is not desired or possible, for instance if one would like to expose the model as an HTTP service, run it in a Docker container, or use it on a mobile device. This talk explores various approaches to making models portable outside Spark.

Speakers

Matthew Tovbin

Principal Engineer, Einstein, Salesforce


Thursday November 15, 2018 11:10am - 11:30am
data

11:10am

Transpiling GraphQL instead of writing customized server code
GraphQL is an excellent query language for clients because it specifies *what* data and response shape is needed without worrying about *how* to get and reshape that data. At Twitter, we take the next step and automatically compile GraphQL queries into code at runtime that efficiently specifies *how* to retrieve the data as well! Developers are able to expose new or existing data through our GraphQL API without writing code or deploying new software. Come see how we transpile our GraphQL queries into code that retrieves and composes data distributed across many services and databases to exactly satisfy each query while generically handling batching, errors, access control, and operational concerns, without our engineers writing a single resolver. We'll discuss how we leverage information from our existing distributed data access layer to power our predictable and uniform API that lets product developers easily get the data they need.

Outline:
- Intro: GraphQL is *what* not *how*
- Resolvers: Specifying *how* by hand
- Problems with resolvers
- Generating resolvers
- Resolving with a non-GraphQL query
- Leveraging existing data access systems
- Transpiling GraphQL
- Exposing new data
- Generating a GraphQL API and implementation
- A sketch of what this enabled for us

Speakers

Michael Solomon

Software Engineer, Twitter
Mike Solomon is a software engineer on Twitter's Strato team where he uses Scala to generate uniform GraphQL, REST, and Scala APIs, and tries to make building new API services unnecessary. In his spare time he makes an audio-based choose-your-own adventure mobile game called...


Thursday November 15, 2018 11:10am - 11:30am
functional

11:10am

The Danger of Implicit Blocking in Finagle
It is really bad to block Finagle’s event loop. The most common way this happens is when code explicitly blocks by calling Await.result or Await.ready. But this is not the only way the event loop can get blocked. The event loop can also be blocked when long-running computations are called within the loop. If you create a service that directly makes a long-running computation, it will work fine when you test it under no load. But in a production environment it will grind to a halt, even though the server does not appear to be overloaded and should be fully capable of handling all the requests quickly. What you will find is that you are exhausting the Finagle event loop thread pool: while the time taken to run each computation is acceptable, the amount of time each request waits for a network I/O thread to become available quickly goes up. In my talk I will discuss how to identify when this problem is occurring and how to correct it so your latency returns to expected values.

Speakers

Michael Armella

Senior Software Engineer, Credit Karma
I currently work on recommender systems at Credit Karma. I focus on high scale, high availability systems for processing large amounts of data to produce simple answers to the question "what ad should we show this user?"


Thursday November 15, 2018 11:10am - 11:30am
reactive

11:40am

Down the Wabbit Hole
Vowpal Wabbit is a fast, out-of-core C++ implementation of sparse gradient descent. It is sponsored by Microsoft Research and is one of the few open-source libraries capable of solving contextual bandit problems at scale—unfortunately it is also a minefield to use properly. In this talk, we’ll explore how to use Haskell to guarantee safe and performant usage of an otherwise unsafe machine learning library. Along the way, we’ll learn about some of Haskell’s lower-level language features, including the foreign function interface, inline-c, quasi-quoting & anti-quoting, synchronous & asynchronous exceptions, the bracket pattern, and some simple type-level programming.

Speakers

Chris McKinlay

Formation


Thursday November 15, 2018 11:40am - 12:20pm
data

11:40am

From Scala to ByteCode — a view of how Scala is implemented on top the JVM.
Scala runs on top of the JVM and offers us cool and useful features such as Traits, Objects, Lazy definitions, higher-order functions, currying and more, but how are they really implemented under the hood? Understanding what's going on underneath the covers of your code can be very beneficial and can lead to insights that may affect the way you write and debug your code. In this session we will take a deep look into the JVM to show how Scala does its magic by examining the bytecode and classes that are being generated. We will also see how debug symbols can help IDEs debug our code in an easy and friendly way.

Speakers

Alon Muchnick

backend team lead, WIX.COM
Alon Muchnick is a software engineer with a background in networking security and Unix systems. For the last two years he has been working for Wix.com, developing Wix Stores, a robust microservices-based eCommerce platform, using a Scala stack and CQRS with event sourcing.


Thursday November 15, 2018 11:40am - 12:20pm
functional

11:40am

Practical Reactive Streams with Monix
Stream processing is a hot topic today and it’s easy to get lost among all the possibilities. In this live coding session we will explore the Reactive Streams approach used by the Monix project, one of the Scala implementations of the Reactive Streams concepts. Using an almost real-life example, we’re going to walk through both the basics and some more advanced usages of the library.

Speakers

Jacek Kunicki

Passionate Software Engineer, SoftwareMill
I'm a passionate software engineer living in the JVM land - mainly, but not limited to. I also tend to play with electronics and hardware. When sharing my knowledge, I always keep in mind that a working example is worth a thousand words.


Thursday November 15, 2018 11:40am - 12:20pm
reactive

12:20pm

Lunch
Excellent lunch and networking!

Thursday November 15, 2018 12:20pm - 1:10pm
Commons

1:10pm

Monitoring AI with AI
The best performing offline algorithm can lose in production. The most accurate model does not always improve business metrics. Environment misconfiguration or upstream data pipeline inconsistency can silently kill the model performance. Neither prodops, data science nor engineering teams are equipped to detect, monitor and debug such types of incidents. Was it possible for Microsoft to test the Tay chatbot in advance and then monitor and adjust it continuously in production to prevent its unexpected behaviour? Real mission-critical AI systems require an advanced monitoring and testing ecosystem that enables continuous and reliable delivery of machine learning models and data pipelines into production.

Common production incidents include:
- Data drifts, new data, wrong features
- Vulnerability issues, adversarial attacks
- Concept drifts, new concepts, expected model degradation
- Dramatic unexpected drifts
- Biased training set / training issue
- Performance issue

In this demo-based talk we discuss a solution, tooling and architecture that allows machine learning engineers to be involved in the delivery phase and take ownership over deployment and monitoring of machine learning pipelines. It allows data scientists to safely deploy early results as end-to-end AI applications in a self-serve mode without assistance from engineering and operations teams. It shifts experimentation and even training phases from offline datasets to live production and closes a feedback loop between research and production.

The technical part of the talk will cover the following topics:
- Automatic Data Profiling
- Anomaly Detection
- Deep Autoencoders
- GANs
- Density-based Clustering of inputs and outputs of the model
- Service Mesh, Envoy Proxy, traffic shadowing

The demo part of the talk will simulate a real-life concept drift as well as new concepts for the model, and different algorithms that will catch those drifts in an operational environment.

Speakers

Stepan Pushkarev

CTO, Hydrosphere.io
hydrosphere.io CTO. Automation of AI/ML Operations: deployment, serving, monitoring, subsampling, retraining.


Thursday November 15, 2018 1:10pm - 1:30pm
data

1:10pm

Haskell in production
Core components of our blockchain at Symbiont.IO are written in Haskell. People are sometimes skeptical of Haskell's readiness for production. Here is how we use it successfully. We'll talk about relevant tools, detailed code design decisions, high-level architecture design decisions and how we hire skilled developers to work on the projects. We will get into a lot of technical details, but also take a bird's-eye view. We'll mention great things that "just worked", as well as problems and how we dealt with them.

Speakers

Jan Christopher Vogt

Software Engineer, Symbiont.IO


Thursday November 15, 2018 1:10pm - 1:30pm
functional

1:40pm

Video Access-Log Processing with Apache Flink
Learn about the streaming data pipeline behind the Mux Video service, which processes CDN access logs to support performance monitoring, dynamic CDN selection, log enrichment, and utility billing. The talk will cover the architecture & technologies (Go, Java, Kafka, Flink) and some of the challenges we’ve faced, including variable log delivery schedules that are addressed with Flink’s powerful windowing features, and dynamic CDN selection based on CDN performance in a geographic locale. This talk will appeal to developers faced with processing large volumes of data with minimal latency (a requirement that has become increasingly common) and supporting a growing number of business applications.

Speakers

Scott Kidder

Software Engineer, Mux
I've been working on video encoding & delivery platforms for over 10 years (MobiTV, Brightcove/Zencoder, and now Mux). I'm currently working on Mux's API for Video, making it easy for developers to build amazing applications that include video without needing to be video experts...


Thursday November 15, 2018 1:40pm - 2:00pm
data

1:40pm

Pijul, a purely functional version control system

Rsc is an experimental Scala compiler focused on compilation speed. Our research goal is to achieve dramatic compilation speedups for typical Scala codebases.

Recently, we've been experimenting with outlining, i.e. computing type signatures of public and protected definitions in the program. Outlines represent dependencies between different elements of the program, so if we compute outlines, we can compile all files and perhaps even all methods of the program in parallel.

With a few language restrictions, Rsc can compute outlines very quickly. On Twitter Util, a foundational library of the Twitter monorepo, Rsc performs outlining roughly 10x faster than Scalac performs compilation. Having obtained the outlines, we can then partition the sources and launch multiple Scala compiler instances in parallel.

Join our talk to learn how outlining works, how well it performs in practice and what we have planned for the future.


Speakers

Eugene Burmako

Language tools lead, Twitter
Language tools lead at Twitter, member of the Scala language committee, founder of Rsc, Scalameta and Scala Macros.


Thursday November 15, 2018 1:40pm - 2:00pm
functional

1:40pm

Connected Car Telemetry Data - Join and what not!
This talk covers how we ingest telemetry data from our awesome cars into our data platform and join this data with different static data sources, while not losing the most valuable asset: time.

Speakers

Sonam Kanungo

Senior Software Engineer, Mercedes Benz Research & Development North America


Thursday November 15, 2018 1:40pm - 2:00pm
reactive

2:10pm

Evolution of GoPro's Data Platform
In this talk, we will discuss the evolution of the data platform at GoPro from fixed-size Hadoop clusters to a cloud-based Spark cluster with a centralized Hive Metastore + S3. We will share our experience in data architecture transformation; batch and streaming frameworks transformation; data democratization via Slack, data portal & visualization; and machine learning features visualization via Google Facets + Spark.

Speakers

Chester Chen

Head of Data Science & Engineering, GoPro

David Winters

Big Data Architect, GoPro
David is an Architect in the Data Science and Engineering team at GoPro and the creator of their Spark-Kafka streaming data ingestion pipeline. He has been developing scalable data processing pipelines and eCommerce systems for over 20 years in Silicon Valley. David's current big...


Thursday November 15, 2018 2:10pm - 2:50pm
data

2:10pm

Rage Against the Ecosystem
Scala's open-source ecosystem is broken: writing and maintaining build configurations is too difficult, and publishing is even harder, coming with the additional friction of having to support an increasing multiplicity of binary targets. But worse, this workflow puts a burden on a few key people in the Scala community to publish their libraries quickly so that their downstream users can publish theirs, and it can take months for some projects to be published. How is it that the multi-billion-dollar Scala software industry is so dependent on so few people? I will introduce Fury, a fast, source-based dependency manager and build tool for Scala which aspires to radically disrupt the ecosystem for the better. Fury defines builds as static data, not code, making viewing them instantaneous and understanding them easy. Fury facilitates a new, distributed, version-controlled and trust-based ecosystem where publishing is as simple as tagging a signed commit and telling users about it. Builds can be external to projects, so there's no need to impose Fury upon any existing developers who are happy using sbt. The utopia we are striving for is a new, fluid and versatile ecosystem in which developers are liberated to publish more easily and frequently, and where it becomes easier for anyone to make contributions to open-source projects.

Speakers

Jon Pretty

Developer, Consultant, Entrepreneur and Evangelist, Propensive Ltd
Jon Pretty is an international man of Scala mystery.


Thursday November 15, 2018 2:10pm - 2:50pm
functional

2:10pm

Reactive Java Programming: a new Asynchronous Database Access API
Reactive applications require non-blocking database access; the existing JDBC API leads to blocked threads, thread scheduling, and contention. For high throughput and large-scale deployment, the Java community needs a standard asynchronous API for database access where user threads never block. This session presents a new Java standard proposal for accessing SQL databases. This new API is completely non-blocking. It is not intended to be an extension to, or a replacement for, JDBC but, rather, an entirely separate API that provides completely non-blocking access to the same databases as JDBC. This presentation examines the API, its execution model, code samples, a demo of a prototype, and the next steps.

Speakers

Kuassi Mensah

Director Product Management, Oracle Corporation
Kuassi Mensah is Director of Product Management at Oracle; his scope includes: (i) Java performance, scalability, HA, and security with the Oracle database; (ii) Hadoop and Spark integration with the Oracle database; (iii) Java & JavaScript integration with the Oracle database (OJVM, Nashorn...


Thursday November 15, 2018 2:10pm - 2:50pm
reactive

3:00pm

Distributed Systems Protocols and their Vulnerabilities
Many messaging systems that are widely used in the industry, e.g., Kafka, use centralized distributed systems services to achieve reliability and consensus between servers. Companies in the industry use these services; however, only a few of them understand the details of the protocols. This talk brings the principles used in academia to the industry by introducing the common distributed systems protocols implemented underneath the popular services. In addition, this talk will compare how the protocols are used in academia and in the industry. It provides details of how the protocols, specifically Paxos and Raft, work, including how they elect leaders among servers, how they achieve consensus between machines, and how they reliably process and execute client commands. It thereby shows how the systems and services that use the protocols achieve fault-tolerance, confidentiality, integrity, authenticity, availability, etc. From the reliability and security point of view, the talk discusses how the protocols deal with machine failures, including leader failures and replica failures. It shows the vulnerabilities and potential security issues that exist in the protocols. Last but not least, we'll take a look at what we can do to avoid the vulnerabilities when applying the academic theories in the industry.

Speakers

Yifan Xing

CCIS Graduate Portal Tech Lead, Northeastern University


Thursday November 15, 2018 3:00pm - 3:20pm
reactive

3:00pm

Continuous ML Applications in Production
Traditional machine learning pipelines end with lifeless models sitting on disk in the research lab. These traditional models are typically trained on stale, offline, historical batch data.

Static models and stale data are not sufficient to power today's modern, AI-first Enterprises that require continuous model training, continuous model optimizations, and lightning-fast model experiments directly in production.

Through a series of open source, hands-on demos and exercises, we will use PipelineAI to breathe life into these models using 4 new techniques that we’ve pioneered:

* Continuous Validation (V)
* Continuous Optimizing (O)
* Continuous Training (T)
* Continuous Explainability (E)

The Continuous "VOTE" techniques have proven to maximize pipeline efficiency, minimize pipeline costs, and increase pipeline insight at every stage from continuous model training (offline) to live model serving (online).

Speakers

Chris Fregly

Founder, Pipeline AI
Chris Fregly is Founder at PipelineAI, a Real-Time Machine Learning and Artificial Intelligence Startup based in San Francisco. He is also an Apache Spark Contributor, a Netflix Open Source Committer, founder of the Global Advanced Spark and TensorFlow Meetup, author of the O’Reilly...


Thursday November 15, 2018 3:00pm - 3:40pm
data

3:00pm

Graal: How to use the new JVM JIT compiler in real life
With JEP 317: Experimental Java-Based JIT Compiler in JDK 10, Graal is now part of OpenJDK. In fact, Graal is already available in JDK 9 due to JEP 243: Java-Level JVM Compiler Interface. Graal is itself written in Java and that brings some new properties and behavior to the table which we haven’t seen with existing HotSpot JIT compilers. This talk will show how to use Graal with JDK 10, how to compile an upstream Graal version and what to look out for when using it for benchmarking or even in production.

Speakers

Chris Thalinger

Staff Software Engineer, Twitter
Chris Thalinger is a software engineer who has been working on Java Virtual Machines for more than 14 years. His main expertise is in compiler technology, with Just-In-Time compilation in particular. Initially being involved with the CACAO and GNU Classpath projects, the focus shifted to OpenJDK...


Thursday November 15, 2018 3:00pm - 3:40pm
functional

3:30pm

2 Fast 2 Furious: migrating Medium's architecture without slowing down
We’re shifting gears to leverage new technologies created since we built Medium 5 years ago, but we need to incrementally gain benefits from the new system along the way and we can’t afford to let it hinder feature development. By taking advantage of GraphQL’s flexibility and our existing infrastructure, we’re able to make widespread yet gradual architectural changes! Come see how Medium is changing lanes without slowing down. Anyone thinking about moving to GraphQL (or thinking about migrating an existing architecture in general) can benefit from this talk, but especially anyone who is building their own GraphQL server or needs practical advice on how to successfully migrate a legacy system to GraphQL without “stopping the world,” getting defunded partway through, or building a system no one uses.

Abstract
Migrating an entire system to new tools and frameworks isn’t an easy task. And doing that while not impacting feature development? That’s even harder. We’ll walk through how Medium is migrating off of our existing system, without hindering product development, and while also incrementally gaining the benefits of a new system along the way. We’ll go over the design of our new architecture, our phased migration approach, and how the layered structure of our GraphQL server (written in Scala with Sangria) was integral to the success of both.
- Goals of the migration
- Design of the new system
- Phased approach
- Phase 1: developer experience
- IDLs (protobuf) + GraphQL
- Phase 2: services + gRPC
- GraphQL server layers
- Fetchers
- Repos
- Schema (derivation)
- Putting it all together

Speakers

Sasha Solomon

Platforms Team Tech Lead, Medium
I'm the Tech Lead on the Platforms Team helping architect the next generation infrastructure at Medium in San Francisco. Scala + GraphQL 4 Lyfe. Player Character. Potato Compatible. Follow me on Twitter @sachee for tweets about GraphQL, Dungeons and Dragons, and an excellent...


Thursday November 15, 2018 3:30pm - 4:10pm
reactive

3:50pm

Machine Learning on Source Code
Machine Learning is definitely what the cool kids are doing nowadays. Deep Learning specifically powered a revolution in many fields of research, including Computer Vision and Natural Language Processing, but also self-driving cars and strategy games like Go. What not many are talking about is how to use those techniques to improve our developer routines. Machine Learning on Source Code (MLonCode) is a very interesting field because it is at the frontier of Natural Language Processing, Graph-Based Machine Learning, and Static Analysis, and has the power to bring in other fields such as Dynamic Analysis of programs. The amount of data available for this problem is almost overwhelming, and given that data is the fuel of Machine Learning, we are excited for an amazing ride!

This talk will cover the basics of what Machine Learning techniques can be applied to source code. Specifically, we will discover:

* embeddings over identifiers,
* structural embeddings over source code, answering the question of how similar two fragments of code are,
* recurrent neural networks for code completion,
* future direction of the research.

While the topic is advanced, the level of mathematics required for this talk will be kept to a minimum. Rather than getting stuck in the details, we'll discuss the advantages and limitations of these techniques, and their possible implications for our developer lives.

Speakers

Francesc Campoy Flores

VP of Developer Relations, source{d}
Francesc Campoy Flores is the VP of Developer Relations at source{d}, a startup applying ML to source code and building the platform for the future of developer tooling. Previously, he worked at Google as a Developer Advocate for Google Cloud Platform and the Go team. He’s...


Thursday November 15, 2018 3:50pm - 4:25pm
data

3:50pm

Your Type System Working For You!
In this session we will explore ways of making the Scala type system work harder for you to improve correctness in your code. With examples ranging from compile time verification of geographical coordinate reference systems through to enforcing constraint rules between domain models, this talk will show you ways to harness the power of the Scala compiler to catch errors sooner, and make writing correct code easier.

Speakers

Richard Wall

CEO, Escalate Software
Long time Scala developer, trainer and enthusiast. Started possibly the first Scala user group - Bay Area Scala Enthusiasts. Winner of the inaugural Phil Bagwell award for Scala community work. Scalawag and Java Posse podcast co-host. Hiker, biker, music lover, love to t...


Thursday November 15, 2018 3:50pm - 4:25pm
functional

4:20pm

H2O internals
H2O does in-memory analytics on clusters with distributed parallelized state-of-the-art Machine Learning algorithms.  However, the platform is very generic, and very, very fast.  H2O.ai builds Machine Learning tools with it, but the platform can do much more.  H2O includes a K/V store with exact semantics and typical read and write speeds of ~200ns; a highly compressed Big Data in-memory storage, typically 2x to 4x better than gzip-on-disk size, which can read and decompress the data at C/Fortran speed; a pure-Java clean and simple coding style to write parallel & distributed code; a generic serializer that's much faster than protobuf or kryo and does not need any special registration or markup language; a large set of building blocks for common math operations; and of course a library of state-of-the-art ML algorithms.  This is a low-level systems implementation talk on H2O's design.

Speakers

Cliff Click

CEO, Rocket Realtime School
Cliff Click was the CTO of Neurensic (now successfully exited), and CTO and Co-Founder of h2o.ai (formerly 0xdata), a firm dedicated to creating a new way to think about web-scale math and real-time analytics. He wrote his first compiler when he was 15 (Pascal to TRS Z-80!), although his most famous compiler is the HotSpot Server Compiler (the Sea of Nodes IR). Cliff helped A...


Thursday November 15, 2018 4:20pm - 5:00pm
reactive

4:30pm

Apache Spark + Rapids.ai: the Future of Enterprise Data
Speakers

Clément Farabet

VP, AI Infrastructure, NVIDIA
Clément Farabet is VP of AI Infrastructure at NVIDIA. He received a PhD from Université Paris-Est in 2013, while at NYU, co-advised by Laurent Najman and Yann LeCun. His thesis focused on real-time image understanding, introducing multi-scale convolutional neural networks and a...


Thursday November 15, 2018 4:30pm - 5:00pm
data

4:30pm

Channeling the Inner Complexity
An essential requirement for writing programs that scale is to have constructs to model concurrency in an understandable, safe, and efficient manner. This talk presents an overview of various such models available in Scala, and their impact on program structure and complexity. It then explores a way to model concurrency without complexity with an implementation of Communicating Sequential Processes (CSP), heavily inspired by Goroutines, scala-async and Clojure's core.async.

Speakers

Jakob Odersky

Software Engineer, Driver, Inc


Thursday November 15, 2018 4:30pm - 5:00pm
functional

5:10pm

Panel I: Thoughtful Software Engineering
The distinguishing quality of SBTB is that authors step back from the problem at hand and think how it can be abstracted and composed from reusable abstractions.  Over the years the community evolved numerous approaches that went mainstream, such as type-level programming, reactive systems, and more.  What are the best software engineering practices worth adopting?  We'll invite some of the best positioned folks in the community to share their views and experiences.

Speakers

Rúnar Bjarnason

Co-founder, Unison Computing
My name is Rúnar. I’m a software engineer in Boston, an author of a book, Functional Programming in Scala, and cofounder of Unison Computing. We're making a distributed programming language called Unison. Talk to me about functional programming, relational database theory...

Bryan Cantrill

CTO, Joyent
Bryan Cantrill is the CTO at Joyent, where he oversees worldwide development of the SmartOS and SmartDataCenter platforms, and the Node.js platform. Prior to joining Joyent, Bryan served as a Distinguished Engineer at Sun Microsystems, where he spent over a decade working on system...

Cliff Click

CEO, Rocket Realtime School
Cliff Click was the CTO of Neurensic (now successfully exited), and CTO and Co-Founder of h2o.ai (formerly 0xdata), a firm dedicated to creating a new way to think about web-scale math and real-time analytics. He wrote his first compiler when he was 15 (Pascal to TRS Z-80!), although his most famous compiler is the HotSpot Server Compiler (the Sea of Nodes IR). Cliff helped A...

Marius Eriksen

infrastructure-infrastructure engineer, Grail
Marius is the author of such ideas as Finagle, Zipkin, Your Server is a Function, and many others.

Martin Odersky

Chief Architect, Lightbend

Julie Pitt

Director, Machine Learning Infrastructure, Netflix
Julie leads the Machine Learning Infrastructure at Netflix, with the goal of scaling Data Science while increasing innovation. She previously built streaming infrastructure behind the "play" button while Netflix was transitioning from domestic DVD-by-mail service to international...


Thursday November 15, 2018 5:10pm - 6:00pm
functional

6:00pm

Happy Hour I
Our famous happy hour caps the day with excellent food and drinks and great conversation.

Thursday November 15, 2018 6:00pm - 8:00pm
Commons
 
Friday, November 16
 

8:00am

Breakfast and Welcome
Hot breakfast and uninterruptible coffee!

Friday November 16, 2018 8:00am - 9:00am
Commons

9:00am

Keynote II: Neha Narkhede
Neha Narkhede is the co-creator of Apache Kafka and the cofounder of Confluent.

Friday November 16, 2018 9:00am - 9:40am
functional

9:50am

Building a Contacts Graph from activity data
In the customer age, being able to extract relevant communications information in real-time and cross reference it with context is key. Salesforce is using data science and engineering to enable salespeople to monitor their emails in real-time to surface insights and recommendations using a graph modeling contextual data. In this presentation, Alexis will explain how Salesforce AI Inbox builds and uses an activity graph (based on emails, meetings, ...) to offer services such as recommended connections and provide context for real time insights and recommended actions. Alexis will go over use cases, technical architecture, best practices to show how the Graph and services are built and do a live demonstration.

Speakers

Alexis Roos

Director Data Science, Salesforce
Alexis is director of data science and machine learning at Salesforce, where he is leading a team of data scientists and engineers delivering intelligent services for the Einstein platform. Alexis has over twenty years of engineering and management experience including 13 years at Sun...


Friday November 16, 2018 9:50am - 10:30am
data

9:50am

Declarative distributed concurrency in Scala
I present the Distributed Chemical Machine (DCM) - a purely functional, fully declarative framework for parallel, concurrent, and distributed computing in Scala. The DCM can automatically run multi-core concurrent code on any number of machines connected to (one or more) Zookeeper instances. Zookeeper provides data coordination, persistence, and fault tolerance, allowing the programmer to focus on the distributed application logic in a peer-to-peer architecture. The DCM builds upon the Chemical Machine (http://chemist.io), a data-driven, message-passing concurrency paradigm that significantly improves upon the Actor Model, achieving automatic parallelism and a higher level of declarative expressiveness for concurrency. Previously I implemented the (single-JVM, multi-core) Chemical Machine as an embedded DSL in Scala. With very few code changes and little configuration, a Chemical Machine-based application can now be ported to the DCM and run on a cluster, achieving automatic distribution.

Speakers

Sergei Winitzki

Senior Software Engineer, Workday Inc.
Theoretical physicist turned software engineer, passionate for functional programming, functional type theory, and declarative domain-specific languages.


Friday November 16, 2018 9:50am - 10:30am
functional

9:50am

Fast Data pipelines with Akka Streams and Alpakka Kafka Connector

The Alpakka Kafka connector (formerly known as reactive-kafka) is a component of the Alpakka project.  It provides a diverse streaming toolkit, but sometimes it can be a challenge to design these systems without a lot of experience with Akka Streams and Akka.  By combining Akka Streams with Kafka using Alpakka Kafka, we can build rich domain, low latency, and stateful streaming applications with very little infrastructure.

This talk will discuss solutions to common Kafka and streaming problems such as consumer group partition rebalancing, exactly-once/transactional message delivery, stateful stages, state durability/persistence, and common production concerns like job failover and deployment.

The Alpakka project is an open source initiative managed by Lightbend to implement stream-aware, reactive, integration pipelines for Java and Scala. It is built on top of Akka Streams, and has been designed from the ground up to understand streaming natively and provide a DSL for reactive and stream-oriented programming, with built-in support for backpressure.


Speakers

Sean Glover

Software Engineer, Fast Data Platform Team, Lightbend
Sean is a Senior Software Engineer on the Fast Data Platform team at Lightbend.  He specializes in Apache Kafka and its ecosystem.  He has experience consulting with Global 5000 companies on how to build Fast Data platforms using technologies such as Scala, Kafka, Spark, and Akka...


Friday November 16, 2018 9:50am - 10:30am
reactive

10:40am

Graph-First Services Using GraphQL
Authors: Julien Delange & Adam Crane

As of today, many applications expose their data with limited ability to query or filter fields. Often, application developers use multiple endpoints to implement behavior variability or data filtering, which is inefficient from an engineering perspective. GraphQL, a new query language, addresses these issues by allowing the end user to query the data available on the server and select only the fields of interest. Initially started by Facebook, the technology gained traction over the years to the point that several companies started to implement GraphQL endpoints. In this talk, we present a methodology for building a modern data-rich application centered around GraphQL schemas. We cover how the schema informs decisions for the rest of the application layers and unlocks new query patterns and possibilities compared to a standard REST-based or IDL-based approach. We also cover how this affects the design of data storage, using traditional storage backends (key-value, SQL). To support these themes we present a novel application (Network Health Visualizer), its data pipeline and how the implementation of its GraphQL interface guided the rest of our development process. Additionally, we cover the Twitter GraphQL API and how it fits into the data layer of Twitter infrastructure as a whole.

Speakers

Adam Crane

SRE, Twitter

Julien Delange

Software Engineer, Twitter
Former rocket-scientist at the European Space Agency in the Netherlands, Julien was previously a senior staff researcher at Carnegie Mellon University and a senior software engineer at AWS. He is now a staff software engineer at Twitter, where he is working on improving Twitter infrastructure...


Friday November 16, 2018 10:40am - 11:00am
data

10:40am

Adopting GraalVM
After many years of development, Oracle finally published GraalVM and sparked a lot of interest in the community. GraalVM is a high-performance polyglot VM with a number of potentially interesting traits we can take advantage of, like increased performance and lowered cost. It can also tackle shortcomings of JVM/Scala we have been struggling with for years, like slow startup times or large jars. Lastly, thanks to its polyglot nature it can open interesting doors we may want to discover. On the other hand, GraalVM may still be bleeding edge technology and having a hard time delivering the promised features. In this talk, I’d like to discuss advantages and disadvantages of adopting GraalVM, provide you guidance if you decide to do so, and also share our story in this area including various samples and recommendations. This talk is focused on JVM and Scala but should be beneficial for everyone interested in this topic.

Speakers

Petr Zapletal

Tech Lead, Disney Streaming Services
My name is Petr and I work for Disney Streaming Services (ex. Bamtech Media ex. Cake Solutions). I'm interested in Reactive and Distributed Systems, Streaming and ofc Scala and JVM.


Friday November 16, 2018 10:40am - 11:00am
functional

10:40am

Swimming in the stream: A simple data analytics pipeline with Akka Streams
Streams! Streams offer an interesting conceptual model for processing pipelines that is very functional programming oriented. The streaming paradigm is very well suited to deal with a constant flow of data and Akka Streams is a powerful implementation of it. It offers a set of composable building blocks for creating asynchronous and scalable data streaming applications. In this talk we will live code from a very basic stream to a data aggregation pipeline that interacts with multiple services.

Speakers

Gabriel Claramunt

CTO, Scalents


Friday November 16, 2018 10:40am - 11:00am
reactive

11:10am

Understanding World food economy with satellite images
It has become possible to observe the growing process from satellites daily at a global scale. Based on it we can identify and share agriculture-specific signals (insights) such as the presence of farming activity, the presence of irrigation systems, crop classification and productivity assessment. A pipeline starts with a set of images specifically designed for daily monitoring of the growth of commodity crops: corn, soybean, rice and wheat. To process this data we use our processing and delivery system with ML (boosting) used for understanding vegetation patterns and AI for scaling the models on other climate zones.

Speakers

Aleksandra Kudriashova

Head of Product, Astro Digital
My name is Alex and I love working with satellite images at scale. It's really cool to enable access to a huge dataset of global observations that helps us know what the world looks like and how it is changing.


Friday November 16, 2018 11:10am - 11:30am
data

11:10am

Effects types in Scala - how to choose one
The Scala community has been interested in representing asynchronous computations through the type system for a long time. While Scala provides you with a well-supported Future implementation, both the Typelevel and Scalaz communities are working towards their own implementation of an IO. This leads to a lot of friendly competition between these libraries, leading to major improvements of their performance. This talk will present the current options available to build programs and manage effects in the type system, their difference from a developer perspective, their usage in the wider community, and how one can decide which implementation to use.

Speakers

Alexandre Bergeron

Software Engineer


Friday November 16, 2018 11:10am - 11:30am
functional

11:10am

A Reactive Fraud Monitoring Engine for Instant Payments
2018 is a challenging year for European banks and their fraud management capabilities. In November, the introduction of SEPA Instant Credit Transfers will mean that the time allowed for transferring money to payment beneficiaries will have to be reduced from one business day, as it is currently, to a maximum of five seconds.  This dramatically reduces time for fraud detection.  Around the same time, the Revised Payment Service Directive (PSD2) will oblige banks to give access to third-party providers of financial services to customers' accounts through open APIs.  This introduces new fraud risks and the possibility for fraudsters to come up with new modus operandi.  All this takes place in a context where the General Data Protection Regulation (GDPR) is adding new constraints to the way customers' data may be accessed and used.  This presentation describes the architecture of a new fraud monitoring engine (FraME) for BNP Paribas Fortis bank to detect and react to fraud attempts in real-time given the above-mentioned new rules and constraints. The presentation discusses the main challenges met during the construction of the engine and the solutions adopted to overcome them. FraME is a typical reactive system as defined in the Reactive Manifesto. It is elastic and resilient in order to detect fraud within a few hundred milliseconds and without any downtime while being available 24/7.  Fraud detection is obtained using machine learning models that are regularly retrained, tested, and deployed.  These models are complemented by a set of detection rules that permit quick reaction to new fraudulent modus operandi. FraME is implemented using Apache Kafka, Apache Flink and a combination of micro-services.

Speakers

David Massart

Principal Consultant, D.E.Solution


Friday November 16, 2018 11:10am - 11:30am
reactive

11:30am

Towards Typesafe Deep Learning in Scala
The preferred language of current deep learning frameworks (TensorFlow, PyTorch, MXNet, DyNet, etc.) is Python, a type-unsafe language. Inspired by the typesafety of Scala, we present Nexus, a prototypical typesafe deep learning engine in Scala. Being extraordinarily expressive in types, Nexus offers unforeseen typesafety (axes of tensors are typed statically) and succinctness to deep learning developers by extensive use of typelevel computation through the popular library Shapeless. In this talk I'll introduce the design of a deep learning framework, and how Scala's type-level computation abilities could make it safer, easier to write and more expressive. Ideas include generalized algebraic data types (GADTs), heterogeneous lists (HLists), program verification (compiling-as-proofs with Scala implicits), and introductory machine learning.

Speakers

Tongfei Chen

PhD student, Johns Hopkins University
Natural language processing researcher; programming language aficionado. Likes to talk about NLP/ML/AI/type systems/functional programming.


Friday November 16, 2018 11:30am - 12:00pm
data

11:30am

Rust and Other Interesting Things
Bryan, the CTO of Joyent, and a core contributor to Solaris, ZFS, and DTrace, formerly a Distinguished Engineer at Sun, has recently picked up Rust.  He'll share his experience with us.

Speakers

Bryan Cantrill

CTO, Joyent
Bryan Cantrill is the CTO at Joyent, where he oversees worldwide development of the SmartOS and SmartDataCenter platforms, and the Node.js platform. Prior to joining Joyent, Bryan served as a Distinguished Engineer at Sun Microsystems, where he spent over a decade working on system...


Friday November 16, 2018 11:30am - 12:00pm
functional

11:30am

Using Akka Streams for Web-scale Data Ingestion
Iterable uses Akka Streams to manage streams of tens of millions of events daily for some of the world's largest consumer companies. In this talk we'll discuss how we use Akka Streams to ingest streams of event data into Elasticsearch. We'll show how Akka Streams makes writing streaming-related code more elegant, flexible, and performant. We'll discuss the various stream operations such as `mapAsync` and `groupedWithin`, as well as more complex patterns such as MergeHub and PartitionHub. We'll look at how Akka Streams deals with common streaming data issues such as idempotency, ordering of updates, and backpressure. Finally, we'll discuss practical issues such as measuring performance, testing, and tuning an Akka Streams-based system in the context of massive data ingestion into a distributed, eventually consistent database.


Speakers

Greg Methvin

Software Engineer, Iterable


Friday November 16, 2018 11:30am - 12:00pm
reactive

12:00pm

Enabling Big Data and Machine Learning for the Masses: Creating a Spark Platform for the Uninitiated
Medium is expanding its use of big data and machine learning to support its product teams. In doing so, it needs to find a way to leverage both the existing technical stack in which it has invested and the knowledge of its engineering team. Unfortunately, these are somewhat at odds. Medium has heavily invested in Scala and Spark for its ETL pipelines. And while Spark certainly provides functionality to support big data analysis and machine learning, its learning curve is very steep and only a few Medium engineers have experience with it. To combat this, Medium is actively developing a platform that eases the learning curve for both big data and machine learning operations. This is helping teams not only get to machine learning results faster, but also write and maintain ETL pipelines more efficiently. The platform includes tools for development, online and offline testing, machine learning experimentation, and monitoring.

Speakers

Tim Kral

Team Lead of Data Engineering, Medium


Friday November 16, 2018 12:00pm - 12:30pm
data

12:00pm

You Are a Scala Contributor
Scala is a community-based language. A few people at Lightbend and the Scala Center are paid to facilitate, but ultimately Scala succeeds or doesn't because of you. This talk is about how to participate effectively in open-source work happening in the scala/* repositories on GitHub. You'll learn the overall lay of the land as well as advice on contributing in specific areas such as websites and documentation, issue reporting, Scala modules, the Scala standard library, and even the Scala compiler.

Speakers

Seth Tisue

Scala team, Lightbend
I like: compilers and interpreters, functional programming, and open-source software. I've been active in the Scala community since 2008. Before joining Lightbend in 2015, I used Scala to build the compiler and other tools for NetLogo, an open-source programming language for kids...


Friday November 16, 2018 12:00pm - 12:30pm
functional

12:00pm

Scaling Bayesian Experimentation
In the past few years companies like Optimizely, Google, and Uber have noted the limitations of null hypothesis significance testing (NHST) as the framework for experimentation. In particular, statistical problems like multiple comparisons and peeking have been difficult to solve. While some organizations have gone the route of sequential testing, other Bayesian methods provide an alternative to overcome these problems, but are often avoided because of worries about their complexity and computational intensity. We will talk about three challenges with Bayesian statistics for experimentation and how big data, tools like Spark, and a little statistical ingenuity can help us address them. The three challenges we will discuss are (1) coming up with priors for experimentation in a world of big data, (2) building a fast Bayesian computation pipeline that is generalizable to all of the metrics your organization cares about, and (3) overcoming computational inefficiencies when using these statistical methods in a real-time experimentation environment. In the literature on Bayesian statistics, and especially in criticisms of it, you will often run across the difficulty of coming up with priors for statistics. We will show how we were able to automate the computation of priors using data we already had lying around and how we were able to improve on that by slightly changing how we log certain events. The other criticism of Bayesian statistics, and a potential roadblock for implementing it in a big data pipeline, is that it is computationally expensive. This is especially true for more complex models such as a standard revenue distribution which is typically multimodal with a peak at zero and then another near the average receipt. Under a Bayesian methodology, such distributions require multiple parameters to be estimated and do not have analytic (conjugate) priors. 
The standard approach of using Markov Chain Monte Carlo (MCMC) simulations can be too slow, is not parallelizable, and requires modeling of each metric. We will discuss how we use Spark to efficiently use a statistical method called bootstrapping to handle these computational problems and provide a generalizable solution to Bayesian computation. Lastly, we often want to run our experimentation analysis in real-time so that we can make fast decisions or to inform an n-armed bandit algorithm. We will talk about some approaches we use to decrease the computation needed in a real-time experimentation analysis environment. Although bootstrapping is more efficient than MCMC, it is still more expensive than analytic methods and can be prohibitively costly in real-time. We will talk about a couple of methods we have developed to update bootstrapped data and compare their performance with a naive method.
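
To make the bootstrapping idea in the abstract above concrete, here is a minimal sketch in plain Python rather than Spark. It is not the speakers' implementation: the function names (`bootstrap_means`, `prob_treatment_beats_control`, `synthetic_revenue`) and the synthetic zero-inflated revenue data are assumptions made purely for illustration. It resamples each variant with replacement and estimates how often the treatment mean exceeds the control mean, without fitting a per-metric model.

```python
import random
from statistics import mean


def bootstrap_means(values, n_resamples, rng):
    """Resample `values` with replacement and return the mean of each resample."""
    n = len(values)
    return [mean(rng.choices(values, k=n)) for _ in range(n_resamples)]


def prob_treatment_beats_control(control, treatment, n_resamples=1000, seed=0):
    """Estimate the share of bootstrap resamples in which the treatment mean is higher.

    Hypothetical helper: the resamples are independent of each other, so in a
    real pipeline they could be distributed across workers.
    """
    rng = random.Random(seed)
    control_means = bootstrap_means(control, n_resamples, rng)
    treatment_means = bootstrap_means(treatment, n_resamples, rng)
    wins = sum(1 for c, t in zip(control_means, treatment_means) if t > c)
    return wins / n_resamples


def synthetic_revenue(n, buy_rate, avg_receipt, rng):
    """Made-up revenue data: mostly zeros, plus a second peak near `avg_receipt`."""
    return [
        rng.gauss(avg_receipt, 5.0) if rng.random() < buy_rate else 0.0
        for _ in range(n)
    ]


if __name__ == "__main__":
    rng = random.Random(42)
    control = synthetic_revenue(2000, 0.10, 30.0, rng)
    treatment = synthetic_revenue(2000, 0.12, 30.0, rng)
    p = prob_treatment_beats_control(control, treatment)
    print(f"Estimated P(treatment mean > control mean): {p:.2f}")
```

Because the same resampling routine works for any numeric metric, nothing here depends on the shape of the revenue distribution, which is the generalizability the abstract refers to. The cost is one pass over the data per resample, which is why the abstract treats real-time updates as a separate problem.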

Speakers

Paul Cho

Data Engineer, Udemy

Robert J. Neal

Principal Software Engineer, Udemy
Software engineer who prefers Scala. Primarily working in experimentation, statistics, and reinforcement learning.


Friday November 16, 2018 12:00pm - 12:30pm
reactive

12:30pm

Lunch
Lunch and meeting new friends!

Friday November 16, 2018 12:30pm - 1:10pm
Commons

1:10pm

Getting started with EitherT
Everything you've ever wanted to know about EitherT in 20 minutes or less! Learn the basics of using Either in your microservices and how it can help to improve error handling in your code. With an understanding of how to use those, we will then discuss the Either transformer known as EitherT, and how you can make use of it.

Speakers

Julie Laver

Sr. Software Engineer, Twilio


Friday November 16, 2018 1:10pm - 1:30pm
functional

1:10pm

Optimizing network topologies with monadic execution contexts
Constellation's goal is to horizontally scale blockchain protocols using a DAG (directed acyclic graph) and reputation/trust model. We’re using Scala, Akka, Cats, Algebird and Kubernetes to build an asynchronous consensus service. We’re building off HoneyBadger ACS and CHECO with inspirations from GraphX, Pregel, and PageRank to scale consensus with a reactive execution graph. Conventional validation is split into granular monadic execution contexts to distribute the workload, and network topology undergoes dynamic rebalancing. Reputation is measured both deterministically from chain data as well as through node to node labels capturing ‘influence’ on the network in a fashion similar to Twitter. This is in contrast with a typical blockchain which relies on either linear state transitions / sharding, and proof of work / proof of stake.

Speakers

Ryle Goehausen

VP of Engineering, Constellation Labs
Scala / Python / Spark / Machine Learning / Performance engineering. Early Databricks and Spark adopter. Working on high performance cryptocurrency and reputation modeling.

Wyatt Meldman-Floch

CTO & Senior Developer, Constellation Labs
Software engineer experienced in machine learning, distributed systems and functional programming. Founded Constellation Labs, applying distributed graph processing methods to problems in distributed ledger technology. Focused on combinatorial models of distributed computing.


Friday November 16, 2018 1:10pm - 1:30pm
reactive

1:10pm

High-performance functional Bayesian inference in Scala
This talk will present Rainier, a high-performance open source library from Stripe for developing Bayesian inference models in Scala. The talk will focus on two aspects of the library: first, the underlying computational graph, which allows mathematical functions written in idiomatic Scala to be compiled to very fast, zero-allocation JVM bytecode, along with automatic derivation of their gradients. Second, the high-level Bayesian inference API built on top of this graph, which provides a familiar, immutable and composable monadic interface for specifying model priors and conditioning them on observed data.

Speakers

Avi Bryant

Engineer, Stripe
Avi has led product, engineering, and data science teams at Stripe, Etsy, Twitter and Dabble DB. He's known for his open source work on projects such as Seaside, Scalding, and Algebird. Avi currently works on Stripe's deep learning and probabilistic programming models.


Friday November 16, 2018 1:10pm - 1:50pm
data

1:40pm

How Twitter teaches Scala
Circa 2009, Twitter’s scalability issues with its Ruby on Rails platform prompted a migration to a Scala-based stack. Today, Twitter is one of the largest organizations using Scala as its main programming language to power its platform. In fact, more and more organizations are steadily adopting Scala in their business-critical applications, putting Scala engineers in high demand. However, this demand hasn’t yet translated into a large pool of Scala engineers that Twitter or other companies can readily hire. Because many software engineers joining Twitter are unfamiliar with Scala, this talk will focus on how the company redesigned and updated its onboarding process so as to best ramp up its new engineering hires with regard to Scala. We will also discuss the role that Scala plays in our continuing engineering education.

Speakers

Ivan Corneillet

Instructor, Twitter
Ivan is a technologist, thinker, and tinkerer, having worked on hardware design, software engineering, and everything in between. Today, Ivan is focusing on leveling up Twitter's engineering talent. Previously, he was teaching machine learning to highly motivated aspiring data science...


Friday November 16, 2018 1:40pm - 2:00pm
functional

1:40pm

Distinguishing features of production-quality data pipelines
It's now easier than ever before to execute SQL queries or a few lines of code in a notebook against massive datasets and produce exciting results on a whim. Sometimes, though, you might be asked to take your one-off proof of concept and "productionize" it. Simply putting your query or script on a schedule might be all that's needed for the problem at hand, but if you're building for the long term, there are costly pitfalls you might face in the future with new data and changing logic. What are the foundations to build on, and what are the nice-to-have qualities and utilities, so that you can avoid the big data equivalent of emailing spreadsheets to each other? You may have heard of "lambda architecture," "immutable append-only data sources," "reproducible deterministic outputs," and "atomic deployments," and how nice it is for your data pipeline to have these qualities, but what are the specific benefits, and in what situations are they important or not? This talk will detail various practices and principles around data pipelines which can help to avoid costly mistakes and hours lost to debugging mysteries. For the most part we'll focus on why you might put effort into certain goals that don't directly affect your immediate results. This talk is geared mainly towards Scala/Spark data pipelines but aims to be relevant to other kinds of data pipelines as well.

Speakers

Nimbus Goehausen

Principal Data Engineer, Demandbase


Friday November 16, 2018 1:40pm - 2:00pm
reactive

2:10pm

FiloDB: Real-time, In-Memory Time Series at Massive SMACK Scale
Time series and event data is becoming huge for every business, and ingesting millions of series reliably while answering many concurrent queries from users is a huge challenge. In this talk I share the story of developing and productionizing FiloDB, an open source, in-memory time series solution built with the Scala, Akka, Kafka, Cassandra, Mesos (SMACK) stack. FiloDB is able to reliably ingest monitoring/time series data and answer tons of low latency queries at massive scale.
* Why we developed our own solution after looking at Prometheus, OpenTSDB, Cassandra, etc.
* Time series data model and low-latency distributed querying at scale
* The benefits and challenges of off-heap, in-memory data processing at scale
* Building a database for modern container environments
* Challenges with scaling the Prometheus data model while remaining compatible
* Persistent, recoverable data at scale with Kafka and Cassandra
* Key lessons in building massively scalable, real-time, low-latency data systems

Speakers

Evan Chan

Senior Software Engineer, Apple
Evan loves to design, build, and improve bleeding edge distributed data and backend systems using the latest in open source technologies. He is the creator of the FiloDB open-source distributed time-series database, as well as the Spark Job Server. He has led the design and implementation...


Friday November 16, 2018 2:10pm - 2:50pm
data

2:10pm

Scalaz Stream: Rebirth
Scalaz-stream introduced the Scala ecosystem to purely functional streams. Although widely deployed in production at Verizon, SlamData, and elsewhere, the last release of scalaz-stream was October 23rd, 2016, and newer projects such as FS2, Monix Iterant, and others have risen to fill the gap, each addressing different use cases in the functional programming community. In this talk, John A. De Goes, architect of the Scalaz 8 effect system, unveils a new design for a next-generation version of scalaz-stream. Designed to embrace the powerful features of the Scalaz 8 effect system (features not available in any other effect type), this design offers higher performance, stronger guarantees, and far less code than previous incarnations. You are invited to discover the practicality of functional programming, as well as its elegance, beauty, and composability, in this unique presentation available exclusively at Scale By the Bay.

Speakers

John A. De Goes

Chief Prophet of Functional Programming, Stealth Startup
John A. De Goes has been writing Scala software for more than eight years at multiple companies, and has assembled world-renowned Scala engineering teams, trained new developers in Scala, and developed several successful open source Scala projects. Known for his ability to take...


Friday November 16, 2018 2:10pm - 2:50pm
functional

2:10pm

Nelson: Functional programming in system design
As functional programmers we work hard to keep our code compositional and modular. Unfortunately, these noble pursuits often stop with our code, the resulting program flung over the wall to be deployed in complex and prescriptive systems. In this talk we will look at Nelson, a deployment orchestration system that applies functional programming not only in its implementation, but also in its system behavior in managing the messy world of deployment infrastructure. Where Free algebras and streams allow Nelson to be extended to support different VCS platforms, schedulers, and health checkers, Nelson's emphasis on immutable deployments informs system-level decisions such as deployment workflows and service discovery. Having been run successfully at different companies under different configurations, Nelson serves as a prime example of how functional programming can help not just with the software we write, but also with the systems we maintain.

Speakers

Adelbert Chang

Lead Data Engineer, Target
Adelbert Chang is a Lead Data Engineer at Target where he works on infrastructure systems for the Data Science and Optimization team. Previously he worked at U.C. Santa Barbara doing research in large-scale graph querying and modeling, and in industry on machine learning systems...


Friday November 16, 2018 2:10pm - 2:50pm
reactive

3:00pm

Labels to Inference: A Continuous Sentiment Pipeline
At Zignal Labs we track brand health for major brands and organizations such as Nvidia, Airbnb and IBM. A critical aspect of this is understanding the polarity of conversations as they unfold in real time across social and traditional media sources. We will discuss how we pulled together AWS services including Mechanical Turk, CodePipeline, Lambda, and SageMaker to deliver a complete sentiment solution that better fits our customer use case and provides transparency into our sentiment quality. We will dive into how these services can work together to enable continuous delivery and continuous model retraining, providing both system reliability and rapid inclusion of new labeled data to preserve model quality as conversations shift.

Speakers

Jeff Fenchel

Software Engineer, Zignal Labs
Jeff is a software engineer and data enthusiast passionate about enablement through continuous pipelines, developing and taming distributed systems, streaming data pipelines, and NLP through continuous measurement and testing. He recently started applying his platform and devops...


Friday November 16, 2018 3:00pm - 3:20pm
data

3:00pm

Scala the Cloud Native Way: Lessons Learned from Two Years of Linkerd in Production
Linkerd, an open source service mesh built on Scala, Finagle, and Netty, has seen steady adoption since its inception in 2016, and is now used in production by companies like Salesforce, CreditKarma, and FOX. In this talk, we describe lessons learned in adapting these technologies to the cloud native ecosystem. We discuss how some of Finagle's core components, like Vars and Activities, are used to provide robust and dynamic behavior in Linkerd. Finally, we explore some of the operational challenges of using the JVM in cloud native environments, which have radically different constraints than those which the underlying technologies were designed for.

Speakers

Dennis Adjei-Baah

Software Engineer, Buoyant Inc
Dennis is a software engineer at Buoyant Inc, where he contributes to and provides support for the open source project Linkerd, an L7 proxy that provides a dedicated service mesh layer in microservice environments. Linkerd enables applications to take advantage of features like circuit-breaking...


Friday November 16, 2018 3:00pm - 3:20pm
reactive

3:00pm

Play on Dotty: Design Patterns unlocked by Dotty in a Play look-alike demo project
The Dotty compiler currently supports some really awesome features of Scala 3.0 such as implicit functions and trait parameters. In this talk I will discuss what I think are the best parts of Scala's functional programming and object-oriented features and how Dotty emphasizes better cohesion and simplification of these paradigms with implicits. FP brings the notion that functions are just objects, but Scala brings to FP the notion that classes are just higher-order functions that produce modules. With this insight, we can see how you can write extremely modular code without exposing too much detail or hard-coding your modules. I will explore the design patterns of implicit parameters/functions versus constructor injection and when to use one or the other. I will also explore how this could look in a strawman iteration on the Play Framework's architecture (assuming it took these design patterns to heart). This will be an interesting talk if you are interested in Scala 3, implicit functions, ideas for how the Play Framework could evolve, or the sweet spot between OOP and FP that Scala enables.

Speakers

Jeff May

Principal Software Engineer, Rally Health
Veteran Scala Programmer, Rust Enthusiast, Activist, and Drummer. I love talking about philosophy, science, and politics. My passion is building systems that expand our capacity as humans to serve each other and the planet.


Friday November 16, 2018 3:00pm - 3:40pm
functional

3:30pm

Scio data processing nirvana at Spotify
Two years ago, Neville introduced Scio, an open-source Scala framework to develop data pipelines and deploy them on Google Dataflow. In this talk, we will discuss the evolution of Scio, and share the highlights of running Scio in production for two years. We will showcase several interesting data processing workflows run at Spotify, what we learned from running them in production, and how we leveraged that knowledge to make Scio faster, safer, and easier to use.

Speakers

Bram Leenders

Infrastructure Engineer, Spotify
Bram is an infrastructure engineer at Spotify, working on some of the core backend services and event delivery pipelines.

Julien Tournay

Data Engineer, Spotify


Friday November 16, 2018 3:30pm - 3:50pm
data

3:30pm

Orchestrating Microservices with GraphQL
Intuit is at the halfway mark on its multi-year journey to decompose its QuickBooks Online platforms into microservices connected through a single Graph. API access is kept simple by decomposing and orchestrating complex requests at the entry point, resulting in improved developer productivity.

Speakers

Greg Kesler

Principal Software Engineer, Intuit, Inc.
Greg is a tech leader at Intuit Developer’s Group focusing on building APIs, SDKs and tools for Intuit and partner developers. Since Intuit started its journey to decompose monoliths into microservices, Greg has been a game changer in helping the company build request orchestration...


Friday November 16, 2018 3:30pm - 3:50pm
reactive

4:00pm

Structured Deep Learning with Probabilistic Neural Programs
Machine learning problems with structured output spaces, such as generating from a context-free grammar, are difficult to represent in current deep learning frameworks. These frameworks let a user specify the neural network architecture for scoring a single output, but not the output space itself, which makes structured prediction difficult to express. In this talk, we describe probabilistic neural programs, an open-source Scala library for structured deep learning that lets a user specify both the architecture and output space in a simple, elegant form. This framework lets users rapidly implement, train, and run a variety of state-of-the-art structured prediction models that would be difficult or impossible to implement with other tools. We’ll demonstrate how the framework can be used to tackle an example structured prediction problem.

Speakers

Jayant Krishnamurthy

Semantic Machines
Jayant is a scientist whose research focuses on machine learning techniques for natural language understanding and dialogue. He develops complex structured prediction models and the infrastructure for representing them effectively. He received his Ph.D. in Computer Science from Carnegie...


Friday November 16, 2018 4:00pm - 4:30pm
data

4:00pm

Real-Time Data Pipelines with Cloud Native Databases
Dealing with real-time data in distributed applications presents a host of challenges, such as reliability, operational complexity, security and strong consistency. These challenges will continue to grow as companies deal with exponential growth in data. This talk will look at how cloud native databases fit into real-time data pipelines and help solve these challenges by simplifying the overall architecture and greatly reducing the complexity of more traditional data pipelines.

Speakers

Ryan Knight

Senior Solution Architect, Fauna
Ryan Knight (Grand Cloud) is Principal Architect at Grand Cloud. He is a passionate technologist with extensive experience in large-scale distributed systems and data pipelines. He first started Java Consulting at the Sun Java Center and has since worked at a wide variety of companies...


Friday November 16, 2018 4:00pm - 4:30pm
reactive

4:20pm

Fireside Chat with Richard Socher, Chief Scientist, Salesforce
Speakers

Alexy Khrabrov

Chief Scientist and Founder, By the Bay

Richard Socher

Chief Scientist, Salesforce


Friday November 16, 2018 4:20pm - 5:00pm
functional

4:30pm

Distributed Deep Learning with Horovod
Abstract: Learn how to scale distributed training of TensorFlow and PyTorch models with Horovod.
Frameworks like TensorFlow and PyTorch make it easy to design and train deep learning models. However, when it comes to scaling models to multiple GPUs in a server, or multiple servers in a cluster, difficulties usually arise.
In this talk, you will learn about Horovod, a library designed to make distributed training fast and easy to use, and will see how to train a model designed on a single GPU on a cluster of GPU servers. 

Speakers

Alex Sergeev

Sr Software Engineer II, Uber Technologies, Inc.
Deep Learning Infrastructure @Uber. Author of http://horovod.ai


Friday November 16, 2018 4:30pm - 5:00pm
data

5:10pm

Panel II: Data Engineering and AI
AI and Machine Learning are increasingly an integral component of data pipelines, with model deployment emerging as a new area of devops/engineering/analytics. The tooling for AI development in production is just emerging, and there are exciting startups and established companies leading the way. This panel will cover the field.

Moderators

Alexy Khrabrov

Chief Scientist and Founder, By the Bay

Speakers

Peter Bailis

Founder and Assistant Professor, Sisu and Stanford CS
Founder and CEO of Sisu (http://sisu.ai), and assistant professor of computer science at Stanford University (http://bailis.org/).

Lukas Biewald

Founder, Weights and Biases

Michelle Casbon

Michelle Casbon is a Senior Engineer on the Google Cloud Platform Developer Relations team, where she focuses on open source contributions and community engagement for machine learning and big data tools. Prior to joining Google, she was at several San Francisco-based startups as...

Pete Skomoroch

Head of Data Products, Workday
Peter is Co-Founder and CEO of SkipFlag, which was acquired by Workday in 2018. SkipFlag's technology uses your existing conversations, support tickets, and other communication to automatically build and update an enterprise knowledge base. It understands the people, topics, and facts...

Richard Socher

Chief Scientist, Salesforce


Friday November 16, 2018 5:10pm - 6:00pm
functional

6:00pm

Happy Hour II
Our famous happy hour caps the day with excellent food and drinks and great conversation.

Friday November 16, 2018 6:00pm - 8:00pm
Commons
 
Saturday, November 17
 

8:00am

Breakfast and Welcome
Hot breakfast and uninterruptible coffee!

Saturday November 17, 2018 8:00am - 9:00am
Commons

9:00am

Keynote III
Saturday November 17, 2018 9:00am - 9:40am
functional

9:50am

Introduction to Apache Spark with Frameless
If you're interested in Spark, but you're certain you'll hate it because it's not as type-safe as you'd like, let's see if Frameless can change your mind.

Speakers

Brian Clapper

Principal Instructor and Application Engineer, Databricks, Inc.
Brian has more than 30 years' software development experience in a variety of languages and application domains. Lately, as a Databricks employee, he has been concentrating on Apache Spark.


Saturday November 17, 2018 9:50am - 10:30am
data

9:50am

Effective Scala
Scala is a flexible language that enables many programming styles. While its un-opinionated design fosters innovation and experimentation in the community, the choices it offers place a burden on its users to figure out how best to use the language. This talk will be an opinionated recommendation of how to apply Scala in real world projects. We will claim that the most effective way to use Scala is not as a better Java, nor as Haskell on the JVM, but as a "third way" that best fits Scala's design. We'll give you practical guidelines that will make you a more effective Scala programmer.

Speakers

Bill Venners

President, Artima, Inc.
Bill Venners is president of Artima, Inc., publisher of Scala consulting, training, books, and developer tools. He is the lead developer and designer of ScalaTest, an open source testing tool for Scala and Java developers, and Scalactic, a library of utilities related to quality...


Saturday November 17, 2018 9:50am - 10:30am
functional

9:50am

Street-fighting techniques for multi-tenant machine learning and big data workloads on Kubernetes
We like to think that Machine Learning and Big Data are all about harnessing our creativity and intelligence to solve extremely difficult problems. Yet, we spend only a small part of our time and budget on actually analyzing our data or building powerful models. Instead, we are forced to scale and optimize complex infrastructure, so that it can handle the jobs we will create. Unlike scaling web servers or stateless services, Big Data and Machine Learning workloads force us to be very conscious of things like GPU availability and data locality. How can we spend more time on the interesting part of the problem when our infrastructure is so complex? How can we spend most of our time in Spark analyzing our data, instead of allocating our executors? How can we spend most of our time in TensorFlow creating the most interesting deep network, rather than allocating workers to optimize GPU usage? At MapR, we have seen that Kubernetes can change things radically. We believe that Kubernetes is set to revolutionize how we create and run Big Data and Machine Learning applications. With recent enhancements, Kubernetes has reached a point where much of the traditional pain in creating these complex applications can be avoided. In this talk, we will demonstrate two examples of how to use Kubernetes to simplify your workload. The first is Spark-on-Kubernetes. We will describe the traditional challenges of building an analytics application using Spark. We will demonstrate dynamically building a large Spark cluster for a specific analytics job. We will discuss best practices for handling data locality in the Kubernetes world. We will explore launching many Spark clusters in a shared environment and the various tricks for making this work on Kubernetes at scale. The second demonstration will use Kubeflow, an implementation of TensorFlow for Kubernetes, to train and serve deep learning models. We will explore topics like GPU reservation and scheduling.
We will discuss and examine the various challenges we have in building an application that uses machine learning at scale in Kubernetes. We will demonstrate running Kubeflow in a multi-tenant cloud-based environment.

Speakers

Sky Thomas

Engineer/Architect, MapR


Saturday November 17, 2018 9:50am - 10:30am
reactive

10:40am

Motivating Probabilistic Programming
Probabilistic Programming is a well-established field within statistics and machine learning, and over time researchers have noticed a few things:
- it’s quite easy to construct a generative model; and
- these models have the same kind of structure; but
- writing an inference algorithm for each particular model is slow and error-prone.
Wouldn’t it be great if we could derive an inference algorithm given a model description? We can think of this as analogous to writing in assembly versus a high-level language. When we write assembly we optimize everything by hand, implementing our own control structures and so on tailored to our specific problem. This is the situation with creating a custom inference algorithm for a particular generative model. When we write in a high-level language we use predefined constructs that the compiler translates into assembly for us. We gain a lot of productivity by giving up a little bit of performance (which we often don’t miss). This is the goal with probabilistic programming.
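The separation the abstract describes, between a model description and a reusable inference routine, can be sketched in a few lines of Python. This is an illustration only and is not taken from the talk: the function names (`uniform_prior`, `bernoulli_log_likelihood`, `posterior_mean`) are hypothetical, and plain importance sampling stands in for whatever inference algorithm a real probabilistic programming system would use.

```python
import math
import random


def uniform_prior(rng):
    """Model part 1 (assumed example): a coin's bias drawn uniformly from (0, 1).

    The endpoints are excluded so that log() in the likelihood stays finite.
    """
    return rng.uniform(1e-9, 1.0 - 1e-9)


def bernoulli_log_likelihood(theta, flips):
    """Model part 2: log-probability of the observed flips (1 = heads) given bias theta."""
    return sum(math.log(theta if flip else 1.0 - theta) for flip in flips)


def posterior_mean(prior_sample, log_likelihood, data, n=20000, seed=0):
    """Generic inference: importance sampling with the prior as the proposal.

    Nothing here is specific to coins; any prior sampler and log-likelihood
    over a single numeric parameter can be plugged in.
    """
    rng = random.Random(seed)
    draws = [prior_sample(rng) for _ in range(n)]
    log_weights = [log_likelihood(theta, data) for theta in draws]
    shift = max(log_weights)  # subtract the max before exponentiating, for stability
    weights = [math.exp(lw - shift) for lw in log_weights]
    total = sum(weights)
    return sum(theta * w for theta, w in zip(draws, weights)) / total


flips = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]  # 7 heads, 3 tails
estimate = posterior_mean(uniform_prior, bernoulli_log_likelihood, flips)
```

For this particular model the exact posterior mean is known in closed form, (7 + 1) / (10 + 2), so the estimate can be checked against it. The generic routine gives up efficiency compared with a hand-derived solution, which mirrors the assembly versus high-level language trade-off in the abstract.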

Speakers

Rahul Chitturi

Principal Engineer, Coatue


Saturday November 17, 2018 10:40am - 11:00am
data

10:40am

Shapeless Party Tricks in the Enterprise
Shapeless is a powerful library for strongly-typed generic programming in Scala. Numerous Scala libraries are using Shapeless to provide type class derivation and elegant functional APIs for JSON, binary protocols, test data generation, Spark Datasets, and even JDBC (an API which is now of legal drinking age). While Shapeless is a solid foundation for library authors to build upon, it can also provide type-safe solutions for one-off use-cases. Without deep-diving into the details of Shapeless, I'll share a few bite-size examples of Shapeless-based solutions that feel a bit like type-level party tricks but were used to solve real problems while working on data engineering and WebSocket-based protocols at Fortune 500 companies.

Speakers

Cody Allen

Machine Learning Engineer, Salesforce
I'm generally interested in functional programming. Most of my FP experience is in Scala, and I'm always happy to talk about Cats and Typelevel projects.


Saturday November 17, 2018 10:40am - 11:00am
functional

10:40am

Programming the worldwide elastic supercomputer with Unison
Unison is a new open source language for building distributed systems. It starts with a premise that no matter the scale of a computation, no matter how many nodes it occupies, it should be expressible via a single program, not thousands of separate programs. A logical endpoint of this perspective is that there is only one computer, the global supercomputer, consisting of all the internet-connected compute nodes of civilization, and our programming languages should let us program this supercomputer simply and directly, even when describing computations that occupy thousands or millions of nodes! This talk introduces the Unison language, showing what it can be like to program systems of any size with this model of computing.

Speakers

Paul Chiusano

Cofounder, Unison Computing, a public benefit corp
I've been doing functional programming in a mix of Scala and Haskell for 10 years and enjoy thinking about (and building) new programming technology. I like programming tech that is simple, delightful, and helps programmers focus on the fun parts of software creation. I currently...


Saturday November 17, 2018 10:40am - 11:00am
reactive

11:10am

Applied Machine Learning: a Netflix production
Applied Machine Learning is about as mature as Software Engineering circa 1998. For Data Scientists, it’s hard to collaborate, hard to be productive and hard to deploy to production. In the last 20 years, Software Engineers have become far more collaborative thanks to tools like git, far more productive thanks to cloud computing and far more effective at delivering quality software thanks to CI/CD and agile development practices. At Netflix, I get to work on problems like: how do we scale Data Science innovation by making collaboration effortless? How do we enable Data Scientists to single-handedly and reliably introduce their models to production? How do we make it easy to develop ML models that humans trust? More importantly, how do we use ML to make humans BETTER? In this talk, we’ll explore how Netflix is approaching these problems to further our mission of creating joy for our 125 Million+ members worldwide!

Speakers

Julie Pitt

Director, Machine Learning Infrastructure, Netflix
Julie leads the Machine Learning Infrastructure at Netflix, with the goal of scaling Data Science while increasing innovation. She previously built streaming infrastructure behind the "play" button while Netflix was transitioning from domestic DVD-by-mail service to international...


Saturday November 17, 2018 11:10am - 11:30am
data

11:10am

Duality and How to Delete Half (minus ε) of Your Code
There’s a prefix that shows up a lot in Haskell: “co-”. There are “comonads” and “coalgebras” and “covariant functors” … wait a second, that last one means something different than the others. But what, and how? I’ll explain the concept of duality (at least in category theory, you’re on your own for metaphysics) and distinguish it from often confused concepts like variance and isomorphisms. Duality often seems too abstract to be useful, but it can help us in a variety of ways, and we can take advantage of it to simplify (and even eliminate) writing certain kinds of code. While not all the components exist in Scala yet, we’ll discuss what is there and how it can be used to automate building some useful constructions. I am porting a tool for programmatically generating dual constructions that I wrote in Haskell (https://github.com/sellout/dualizer) to Scala. It can hopefully both reduce the amount of code you write and give you a new way to explore category theory.

Speakers

Greg Pfeil

Senior Software Engineer, Formation
Greg has been working full-time with pure FP in Haskell and Scala for over six years. He currently abuses laziness for Formation, to extract efficient evaluation from exponential algorithms. He’s also known for inflicting recursion schemes on everyone and designing languages that...


Saturday November 17, 2018 11:10am - 11:30am
functional

11:10am

rpclib: Robust, traceable microservice communication library using Akka Streams, gRPC, and Envoy
Many services benefit from being built with a streaming architecture. For example, you may have a search microservice that accepts a query and returns a sequence of results, or an event ingestion service that processes streams of user events. Within each service, it’s relatively straightforward to create such a pipeline using a single uniform streaming library such as Akka Streams; but how about at the edges of your service, where it needs to communicate with other services? Frequently Akka Streams is used within an application, but other streaming APIs need to be used at the service interface. In this talk, I’ll present rpclib, a streaming RPC library built on top of gRPC that allows you to use Akka Streams at the RPC interfaces of your services, enabling you to write services that seamlessly abstract over gRPC as Akka Stream flows.

Speakers

Dan Li

Senior Scala Engineer, Tubi


Saturday November 17, 2018 11:10am - 11:30am
reactive

11:40am

Hadoop Future in AI World
Paraphrasing Yogi Berra, “The future of Hadoop is not what it used to be. The times are different. Not necessarily worse or better. They are just different.” In this talk, I will give a brief survey of the future of Hadoop as it used to be from 2005 to 2015, based on the roadmaps of several developers, evangelists (including yours truly), vendors, and analysts over those years, to give a flavor of how the future of Hadoop has changed over the years. I will try to analyze the reasons behind this change, primarily the emergence of the public cloud and the rapid response by the traditional analytics ecosystem. As hype around “Big Data” seems to subside, it is being replaced by yet another hype about “AI”. I will prognosticate about how the Hadoop ecosystem will evolve to power current and future AI use cases.

Speakers

Milind Bhandarkar

Founder, Ampool
Milind Bhandarkar was the founding member of the team at Yahoo! that took Apache Hadoop from 20-node prototype to datacenter-scale production system. Parallel programming languages and paradigms have been his area of focus for over 20 years. He worked at several HPC companies, Yahoo...


Saturday November 17, 2018 11:40am - 12:20pm
data

11:40am

The unhappy path matters: designing a typechecker for Ruby with user-friendliness in mind
While I believe that types are very useful, especially for big teams, I can totally relate to people not liking types after seeing inscrutable type error messages. At Stripe, we are developing Sorbet, a type system for Ruby. We put a lot of effort into making Sorbet useful for code that does not yet typecheck. We consider this the common case:
- you might be typing a new piece of untyped Ruby;
- you might be in the process of refactoring;
- or you might be writing some new code that may not even parse.
This talk will cover some of the techniques we employ in Sorbet to provide a good user experience while being fast.

Speakers

Dmitry Petrashko

Developer Productivity, Stripe
Dmitry works on developer productivity at Stripe, making it easy to confidently write maintainable, fast, and reliable code by improving language, core abstractions, tools and educational materials. Before this, Dmitry defended his PhD thesis on the architecture of Dotty.


Saturday November 17, 2018 11:40am - 12:20pm
functional

11:40am

Leveraging Scala to Build Hardware at Scale
The hardware industry needs a fundamentally different approach to keep up with the compute needs of new applications such as IoT, edge computing, machine learning, and artificial intelligence. In an era where transistor scaling has stopped, the world will need lots of custom hardware to fulfill these new compute requirements. Unfortunately, increasing developer productivity has taken a back seat in the hardware industry. In this talk, I'll show how we're leveraging Scala to build hardware productively at a high level. I'll present our full chip development stack, which is all written in Scala. The full chip stack consists of the Chisel hardware construction domain-specific language, the Diplomacy framework for parameter negotiation, and the FIRRTL compiler that turns Chisel circuits into Verilog netlists. With our full chip stack, a hardware engineer can express a complex modern SoC (system-on-chip) with fewer than 30 lines of statically type-checked Scala code!

Speakers

Yunsup Lee

Co-Founder and CTO, SiFive, Inc.
Yunsup is SiFive’s Chief Technology Officer and co-founder. Yunsup received his PhD from UC Berkeley, where he co-designed the RISC-V ISA and the first RISC-V microprocessors with Andrew Waterman, and led the development of the Hwacha decoupled vector-fetch extension. Yunsup also...


Saturday November 17, 2018 11:40am - 12:20pm
reactive

12:20pm

Lunch
Lunch and meeting friends

Saturday November 17, 2018 12:20pm - 1:10pm
Commons

1:10pm

convolutional neural networks, swift and iOS 12
TensorFlow and Swift

We will discuss the history of neural network acceleration libraries (TensorFlow/PyTorch in particular), the current state of hardware/software integration (TPU/Volta in particular) and then look at where the industry is headed (LLVM+Functional Programming/Swift in particular).

Speakers

brett koonce

cto, quarkworks
brettkoonce.com


Saturday November 17, 2018 1:10pm - 1:30pm
data

1:10pm

Functional Programming For People That Hate Math

Speakers

Joe Karlsson

Senior Software Engineer, Best Buy
Joe Karlsson is a Minneapolis-based JavaScript engineer at Best Buy and an international technology speaker and educator. He is the creator of weird software, including bechdel.io, which tells you whether a movie script passes the Bechdel Test. Joe is interested in the Digital Humanities...


Saturday November 17, 2018 1:10pm - 1:30pm
functional

1:10pm

Quantum Computing Modeling in Scala
Quantum computing, Bayesian inference and accounting have a lot in common: they use stateful computations that can be modeled in the same way using a monadic map. Scala is the perfect language to unify these computing models.

Speakers

Constantin Gonciulea

Distinguished Engineer, JPMorgan Chase
After doing math for most of my youth, at some point I switched to computing. I have mostly enjoyed doing application architecture, concurrent programming and distributed computing. Most recently I have turned back to math, or rather to a combination of math and computing by working...


Saturday November 17, 2018 1:10pm - 1:30pm
reactive

1:40pm

Creating a Data Fabric for IoT
The move to data-driven decisions is dominating every industry, with significant implications for data processing. The emergence of IoT and edge computing promises to add a new dimension to data-driven insights, but has also added new considerations for modern data processing. This has created the need for new approaches and technologies that can bring IoT data and edge computing into the data processing fold. Such an approach needs to satisfy the following:
- Capture data across edge devices and intelligently transform it so that it can be used at the edge, in the cloud and in the data center
- Manage and facilitate the flow of data from the edge to the data center or cloud to support real-time processing
- Trigger data processing applications as soon as data arrives at the edge, in the data center and in the cloud
- Process data centrally and push intelligence to the edge
- Provide support to quickly react to insights to mitigate risk, reduce costs and ensure security
- Make it easy to deploy and manage in an increasingly complex IT environment
In this talk, we will show how to build an IoT data fabric with Apache Pulsar that satisfies all these requirements and works both at the edge and in the data center. Pulsar has a small footprint and can run at the edge, where it captures, filters and transforms data and then pushes it to the data center or cloud. Another instance of Apache Pulsar running in the data center captures this data from several edge devices, processes it and stores it indefinitely for later analysis. Pulsar Functions and SQL provide sophisticated analytical capabilities to process the data as it flows.

Speakers

Karthik Ramasamy

Co-founder and Chief Product Officer, Streamlio


Saturday November 17, 2018 1:40pm - 2:00pm
data

1:40pm

Fork It Harder Make It Better
The case for forking the Scala development toolchain - and the case against it. Scala has the reputation of being hard to write tooling for, yet tooling is a vital part of the development experience. I want to present an overview of existing tools, from writing code to building, testing and deploying it: how they are lacking, where better solutions exist outside of Scala Land, and how we can improve them.

Speakers

Justin Kaeser

Software Developer, JetBrains
Justin believes in "Tools before Rules": automating the development toolchain to remove the pain of dealing with institutional processes. By day he works on this goal as part of the IntelliJ Scala plugin team. By night he goofs off.


Saturday November 17, 2018 1:40pm - 2:00pm
functional

1:40pm

Journey of Building a Modern Data Prep Tool on Top of Apache Spark
Apache Spark is designed to be extensible and pluggable, offering much flexibility in how the system can be used. In this talk, we show how we utilize Spark to build the data preparation engine that powers Workday Prism Analytics. Our data prep engine runs two types of Spark applications: one that is “always on” to serve interactive data prep queries, and another that is “on demand” to perform batch processing of data pipelines. We demonstrate how Spark and Catalyst made it possible to have these two types of applications share much of the same code, differing only in sampling, caching, and result extraction. Further, we illustrate how our engine today takes advantage of Spark SQL and Catalyst to generate DataFrames/Datasets optimized for our use cases, and relies on Tungsten to facilitate codegen on 100+ custom library functions we expose to our users. We also describe how we leverage the Data Sources API to implement partition elimination and incremental data analysis on top of various file formats.

Speakers

Jianneng Li

Software Engineer, Workday
Jianneng is a software engineer specializing in distributed systems and data processing. He works at Workday on Prism Analytics, leveraging Apache Spark to build an end-to-end data analytics solution that helps businesses better understand their financial and HR data.


Saturday November 17, 2018 1:40pm - 2:00pm
reactive

2:10pm

Adding Custom Optimizations to Catalyst by Example with the DSE Spark Connector
Learn how the DataStax Spark Connector wires directly into Spark internals to bring even more speed to users automatically! Find out how Catalyst actually interacts with Data Sources and the key locations that require modification in order to introduce custom behavior. Find out how writing a new Strategy and execution nodes for Catalyst actually works in practice! Come learn about our most recent optimizations and how they can directly benefit you, or pick up some tips about writing your own custom optimizations!

Speakers

Russell Spitzer

Software Engineer, DataStax
Spark, Cassandra, or Dogs.


Saturday November 17, 2018 2:10pm - 2:50pm
data

2:10pm

Scala.js in production
Unlike simple apps, where technology choices rarely matter, for complex apps, those choices can be crucial in getting the desired maintainability, agility, and performance. Our startup helps students learn Indian Classical Music with a modern approach by offering tools such as a composition editor, music transcriber, and real-time pitch/beat accuracy feedback provider right in the web browser. We love Scala and use it for backend and frontend development. With some effort, we were able to keep the code simple, make development a joyful experience, and extract high performance. In this talk, we share our experiences in choosing frontend frameworks, client-server communication, use of WebAssembly for performance-critical portions, and so on.

Speakers

Ramnivas Laddad

CEO and Co-founder, Paya Labs
Ramnivas is a technologist, author, and presenter who is passionate about doing software right. He has been leading innovation in Spring Framework and Cloud Foundry since their beginning. Ramnivas has led a group in Cloud Foundry and started the Spring Cloud project. Ramnivas is the...


Saturday November 17, 2018 2:10pm - 2:50pm
functional

2:10pm

Deploying Kafka Streams Applications with Docker and Kubernetes
Kafka Streams, Apache Kafka’s stream processing library, allows developers to build sophisticated stateful stream processing applications which you can deploy in an environment of your choice. Kafka Streams is not only scalable but fully elastic, allowing for dynamic scale-in and scale-out as the library handles state migration transparently in the background. By running Kafka Streams applications on Kubernetes, you will be able to use Kubernetes' powerful control plane to standardize and simplify application management, from deployment to dynamic scaling.

In this technical deep dive, we’ll explain the internals of dynamic scaling and state migration in Kafka Streams. We’ll then show, with a live demo, how a Kafka Streams application can run in a Docker container on Kubernetes, and how an application running in Kubernetes scales dynamically.

Speakers

Gwen Shapira

Confluent


Saturday November 17, 2018 2:10pm - 2:50pm
reactive

3:00pm

MLflow: An open platform to simplify the machine learning lifecycle
Successfully building and deploying a machine learning model can be difficult to do once. Enabling other data scientists (or yourself, one month later) to reproduce your pipeline, to compare the results of different versions, to track what's running where, and to redeploy and roll back updated models is much harder. In this talk, I'll introduce MLflow, a new open source project from Databricks that simplifies this process. MLflow provides APIs for tracking experiment runs between multiple users within a reproducible environment, and for managing the deployment of models to production.

Speakers

Aaron Davidson

Software Engineer, Databricks


Saturday November 17, 2018 3:00pm - 3:30pm
data

3:00pm

FP Scala Meat & Potatoes: HTTP, JSON, & SQL with http4s, Circe, & Doobie
It's great to talk about free monads vs. finally-tagless, monads vs. applicatives, fixpoints and recursion schemes, category theory, etc., but does pure FP in Scala have anything to offer those of us who just need to receive some JSON, talk to a SQL database, and return some JSON? Come for an (almost) jargon-free look at why pure FP matters even in such meat-and-potatoes tasks, and for some hints as to how this relates to some of the more esoteric uses of FP that some of your colleagues may be up to, or that you may want to pursue later.

Speakers

Paul Snively

Sr. Software Engineer, Formation
I've been a language nut my whole life. Common Lisp, Scheme, Oz, OCaml, Haskell, and Scala all have a home in my heart for different reasons. I've been fortunate enough to have worked with Apple, AOL, Virgin, VMware, Intel, Verizon, and Formation, among others. I've spoken at Strange...


Saturday November 17, 2018 3:00pm - 3:40pm
functional

3:00pm

Simplicity for Programmable Money
Imagine taking your favourite functional programming language and removing recursion, recursive types, and even removing function types. What do you have left? Everything you need for programmable money! Simplicity is a new typed, combinator-based, functional language without loops and recursion, designed to be used within crypto-currencies and blockchain applications. Simplicity comes with formal semantics defined using Coq, a popular, general purpose software proof assistant based on dependent type theory. In this presentation I will describe the Simplicity language. I will demonstrate how to write some small programs in Simplicity and show how to use Coq to prove that they behave correctly.

Speakers

Russell O'Connor

Software Developer, Blockstream
Lazy functional programming a la Haskell. Developed lens-family and mezzolens Haskell libraries. Dependently typed programming and proofs a la Coq. Worked on Galois theory proofs for the verification of the Feit-Thompson theorem. Running NixOS on my laptop since 2010...


Saturday November 17, 2018 3:00pm - 3:40pm
reactive

3:30pm

Something for Nothing: Bootstrapping Text Classification
The hardest part of building a text classifier is finding labelled data to train the model. The next hardest part is making sure that data is fair and representative. In this talk we will discuss some approaches to rapidly generating corpora suitable for supervised training from public data and with open-source tools. This talk will include some practical tips as well as some less-obvious pitfalls, and is suitable for both novices and more experienced Natural Language Processing Practitioners. At the end of the talk you will be able to give a convincing answer to the eternal question: how do I build a text classifier for a product that doesn't exist yet?

Speakers

Alizishaan Khatri

Machine Learning Engineer, Pivotus
Alizishaan's professional passions revolve around two things: using technology to solve real-world problems and sharing solutions with the community. He is currently employed as a Machine Learning Engineer with Pivotus where he works on problems in the Natural Language Processing...

Alexander O'Connor

Director of Research / Machine Learning Engineer, Pivotus
Alexander O'Connor is Director of Research with Pivotus, a banking startup bringing human contact back to banking. At Pivotus, Alex is responsible for data wrangling and machine learning. Previously, he worked as a researcher and academic in universities in Dublin, Ireland (DCU & Trinity...


Saturday November 17, 2018 3:30pm - 3:50pm
data

3:50pm

Quantum Computing and You
Quantum Computing exploits quantum phenomena such as superposition and entanglement to realize a form of parallelism that is not available to traditional computing. It offers the potential of significant computational speed-ups in quantum chemistry, materials science, cryptography, and machine learning. With no prior knowledge of condensed matter physics or advanced mathematics, it's hard to separate hype from reality. Come learn the fundamentals of Quantum Computing, get a sense of where it applies and what it can do, and the challenges being faced to make it a reality. Oh, and we'll write a little quantum computing code too! John works on the Quantum Computing team at Microsoft Research, where he leads the development of Q# - a new language for Quantum Computation.

Speakers

John Azariah

Q# Lead, Microsoft


Saturday November 17, 2018 3:50pm - 4:25pm
reactive

4:00pm

Concurrency with Cats-effect
The Cats Effect library recently reached its 1.0 release, providing powerful data types and type classes for purely functional effectful programming. In this talk, we’ll focus on building programs using fiber-based concurrency and the synchronization primitives provided by Cats Effect. We’ll see how FS2 (Functional Streams for Scala) uses such primitives to build more advanced concurrent data types like bounded and unbounded queues. Finally, we’ll see how to apply these techniques to application design.

Speakers

Michael Pilquist

Distinguished Engineer, Comcast
Michael Pilquist is the author of Scodec, a suite of open source Scala libraries for working with binary data, and Simulacrum, a library that simplifies working with type classes. He is a committer/maintainer on a number of other projects in the Scala ecosystem, including Cats and...


Saturday November 17, 2018 4:00pm - 4:30pm
functional

4:30pm

Classical Category Theory in Plain Scala
This is a full implementation of small categories and constructs based on them, like diagrams, cones, cocones, limits, colimits, etc. As an illustration, a model of Zermelo-Fraenkel set theory is implemented. Choice Axiom included.

Speakers

Vlad Patryshev

contributor, Salesforce
Software developer with experience in categories and toposes. Teaching logic and formal methods at Santa Clara University. Working as a data engineer at Salesforce.


Saturday November 17, 2018 4:30pm - 5:00pm
functional

4:30pm

Edge ain't your gramp's IoT: design and implementation of a modern Edge Computing Platform
Connecting IoT devices to the Internet is not new, but deploying and running real-time edge apps at hyperscale using these devices is. IoT is making the world cyber-physical, making computing ubiquitous, and making cloud-native apps live life on the edge, forever unshackled from the confines of a datacenter. Edge Computing evolves Cloud Computing by keeping what's great about the Cloud model (developer-friendly APIs and software-defined everything) yet applying it in the harsh physical and security environment of sensors and ruggedized industrial PCs. In this talk we will cover the design and implementation of a novel Edge Computing platform created at ZEDEDA Inc. We will focus on a new, special-purpose, open source operating environment that has to run securely on billions of ARM and x86 devices. Based on LinuxKit, this operating environment completely replaces the need for embedded Linux or any special-purpose RTOS systems and instead allows them to run side-by-side. We will give hands-on examples of how anyone can start using this operating environment or, perhaps, even port it to the device of their choosing.



Speakers

Roman Shaposhnik

Co-Founder and VP Product & Strategy, Zededa
VP Product & Strategy, Co-founder @ZEDEDA Inc. Roman is a renowned expert and consultant in corporate Open Source and Digital Transformation. Prior to ZEDEDA, Roman played a key role in shaping the Open Source collaboration under the umbrella of Linux Foundation and held leadership...


Saturday November 17, 2018 4:30pm - 5:00pm
reactive

5:10pm

Panel III: Cloud, Edge, and Silver Lining
In this panel, we'll consider the emerging architectures on the edge, including IoT, and enterprise stacks, including blockchain approaches, that make this next phase of the Internet and its brave new world possible.

Moderators

Derrick Harris

Founder and Editor, ARCHITECHT

Speakers

Holden Karau

Developer Advocate, Google
Holden is a transgender Canadian open source developer advocate @ Google with a focus on Apache Spark, Airflow, and related "big data" tools. She is the co-author of Learning Spark, High Performance Spark, and another Spark book that's a bit more out of date. She is a committer and...

Anoop Nannra

Head of Blockchain/DLT Initiative, Cisco

Roman Shaposhnik

Co-Founder and VP Product & Strategy, Zededa
VP Product & Strategy, Co-founder @ZEDEDA Inc. Roman is a renowned expert and consultant in corporate Open Source and Digital Transformation. Prior to ZEDEDA, Roman played a key role in shaping the Open Source collaboration under the umbrella of Linux Foundation and held leadership...


Saturday November 17, 2018 5:10pm - 6:00pm
functional
 

