The Alpakka Kafka connector (formerly known as reactive-kafka) is a component of the Alpakka project. It provides a diverse streaming toolkit, but it can be a challenge to design these systems without deep experience with Akka Streams and Akka. By combining Akka Streams with Kafka using Alpakka Kafka, we can build rich-domain, low-latency, stateful streaming applications with very little infrastructure.
This talk will discuss solutions to common Kafka and streaming problems such as consumer group partition rebalancing, exactly-once/transactional message delivery, stateful stages, state durability/persistence, and common production concerns like job failover and deployment.
The Alpakka project is an open source initiative managed by Lightbend to implement stream-aware, reactive integration pipelines for Java and Scala. It is built on top of Akka Streams and has been designed from the ground up to understand streaming natively, providing a DSL for reactive and stream-oriented programming with built-in support for backpressure.
In the past few years Internet-based companies have noted the limitations of traditional null hypothesis significance testing (NHST) for large-scale, online experimentation. In particular, statistical problems like multiple comparisons and peeking have been difficult to solve. Bayesian methods provide an alternative to overcome these problems, but are often avoided because of worries about their complexity and computational intensity.
We will talk about three challenges with Bayesian statistics for experimentation and how big data, tools like Spark, and a little statistical ingenuity can help us address them. The three challenges we will discuss are (1) coming up with priors for experimentation in a world of big data, (2) building a fast Bayesian computation pipeline that is generalizable to all of the metrics your organization cares about, and (3) overcoming computational inefficiencies when using these statistical methods in a real-time experimentation environment.
In the literature on Bayesian statistics, and especially in criticisms of it, you will often run across the difficulty of choosing priors. We will show how we were able to come up with a general approach to generating priors for experimentation.
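One common way to derive priors from data, sketched below under stated assumptions (this is a generic empirical-Bayes illustration, not necessarily the speakers' method, and the historical rates are made up), is to fit a Beta prior to historical per-experiment conversion rates by matching moments:

```python
import statistics

# Empirical-Bayes sketch: fit a Beta(alpha, beta) prior to historical
# per-experiment conversion rates by matching the sample mean and variance.
# The rates below are made-up illustrative data.
historical_rates = [0.021, 0.025, 0.019, 0.030, 0.022, 0.027, 0.024]

mean = statistics.mean(historical_rates)
variance = statistics.variance(historical_rates)  # sample variance

# Method-of-moments estimates for the Beta parameters.
common = mean * (1 - mean) / variance - 1
alpha_prior = mean * common
beta_prior = (1 - mean) * common

def posterior(conversions, trials):
    """Conjugate update: Beta prior + binomial data -> Beta posterior."""
    return alpha_prior + conversions, beta_prior + (trials - conversions)
```

With a conjugate Beta prior the update is just addition of counts, which is what makes this case cheap; the harder, non-conjugate metrics are the subject of the next section.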
The other criticism of Bayesian statistics, and a potential roadblock for implementing it in a big data pipeline, is that it is computationally expensive. This is especially true for more complex models such as a standard revenue distribution which is typically multimodal with a peak at zero and then another near the average receipt. Under a Bayesian methodology, such distributions require multiple parameters to be estimated and do not have analytic (conjugate) priors. The standard approach of using Markov Chain Monte Carlo (MCMC) simulations can be too slow, cannot be parallelized, and requires modeling of each metric. We will discuss how we use Spark to efficiently use a statistical method called bootstrapping to handle these computational problems and provide a generalizable solution to Bayesian updating.
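The core of the bootstrap idea can be sketched in a few lines. This is a minimal single-machine illustration with made-up zero-inflated revenue data (mostly zeros plus a bump near an average receipt); in the setting described above, each resample is independent of the others, which is exactly why the computation maps naturally onto Spark partitions:

```python
import random

# Minimal bootstrap sketch in plain Python. The revenue data is made up to
# imitate a zero-inflated, multimodal revenue metric: mostly zeros, plus a
# bump near the average receipt.
rng = random.Random(42)
revenues = [0.0 if rng.random() < 0.8 else rng.gauss(20, 5) for _ in range(1000)]

def resample_mean(data, rng):
    """Mean of one bootstrap resample (sampling with replacement)."""
    n = len(data)
    return sum(data[rng.randrange(n)] for _ in range(n)) / n

# The bootstrap distribution of the mean approximates its sampling
# distribution without any parametric model of the metric.
bootstrap_means = sorted(resample_mean(revenues, rng) for _ in range(2000))
ci_95 = (bootstrap_means[50], bootstrap_means[1950])  # ~2.5% / 97.5% quantiles
```

Because no model of the metric's distribution is assumed, the same code works for revenue, counts, or any other metric, which is what makes the approach generalizable.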
Lastly, we often want to run our experimentation analysis in real-time so that we can make fast decisions or inform an n-armed bandit algorithm. We will talk about some approaches we use to decrease the computation needed in a real-time experimentation analysis environment. Although bootstrapping is more efficient than MCMC, it is still more expensive than analytic methods and can be prohibitively costly in real-time. We will talk about a couple of methods we have developed to update bootstrapped data and compare their performance with a naive method.
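One known streaming-friendly variant of this idea (sketched here for illustration; it is not necessarily one of the talk's methods) is the Poisson bootstrap: each incoming observation receives an independent Poisson(1) weight per replicate, so every replicate can be updated one record at a time without revisiting old data:

```python
import math
import random

# Poisson bootstrap sketch: incremental per-record updates to all replicates.
NUM_REPLICATES = 500
rng = random.Random(7)

sums = [0.0] * NUM_REPLICATES   # running weighted sum per replicate
weights = [0] * NUM_REPLICATES  # running weight total per replicate

def poisson1(rng):
    """Knuth's algorithm for a Poisson(1) draw."""
    l = math.exp(-1.0)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= l:
            return k
        k += 1

def observe(x):
    """Fold one new observation into all bootstrap replicates."""
    for r in range(NUM_REPLICATES):
        w = poisson1(rng)
        sums[r] += w * x
        weights[r] += w

# Feed a stream of observations; the replicate means then approximate the
# bootstrap distribution of the metric's mean.
for _ in range(2000):
    observe(0.0 if rng.random() < 0.5 else 1.0)

replicate_means = [s / max(w, 1) for s, w in zip(sums, weights)]
```

The appeal in a real-time setting is that the per-record cost is constant and no history needs to be stored, at the price of replicate sizes that only approximate the fixed-size resamples of the classical bootstrap.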
Abstract: Applications have had an interesting evolution as we've moved into the distributed and scalable world. Similarly, storage and its cousin databases have changed side-by-side with applications. Many times, the semantics, performance, and failure models of the storage and applications do a subtle dance as they change in support of changing business requirements and environmental challenges. Adding scale to the mix has really stirred things up. This talk and paper look at some of these issues and their impact on our systems.
Bio: Pat Helland has been building databases, distributed systems, messaging systems, transactional systems, application platforms, big data systems, and multiprocessors since 1978. His employers have included Tandem Computers, Microsoft, and Amazon. Pat attended UC Irvine and was a recipient of the UCI Information and Computer Science Hall of Fame Award (even though he dropped out). For recreation, Pat writes regular articles for the Communications of the ACM. He is employed by Salesforce.
Viewed with a computing mindset, quantum state is not that different from classical or probabilistic state. In this presentation we will show a common abstraction that captures the similarities and differences in representing and evolving classical, probabilistic and quantum state. Concrete scenarios, like managing account balances, portfolio allocation, Bayesian inference and quantum computing simulation are used as examples, with running code. A particular type of monadic transformation ties all these use-cases together.
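The probabilistic case of such an abstraction can be sketched in a few lines (the names here are illustrative, not taken from the talk's code): a distribution is a mapping from values to probabilities, and the monadic bind is just the law of total probability. The quantum analogue replaces probabilities with complex amplitudes, after which measurement squares their magnitudes.

```python
from collections import defaultdict
from fractions import Fraction

class Dist:
    """A finite distribution: value -> probability, duplicates merged."""
    def __init__(self, pairs):
        buckets = defaultdict(Fraction)
        for value, p in pairs:
            buckets[value] += p  # merge duplicate outcomes
        self.probs = dict(buckets)

    def bind(self, f):
        """Monadic bind: evolve each outcome into a new distribution."""
        return Dist(
            (value2, p * q)
            for value, p in self.probs.items()
            for value2, q in f(value).probs.items()
        )

def uniform(values):
    values = list(values)
    p = Fraction(1, len(values))
    return Dist((v, p) for v in values)

# Example: the distribution of the sum of two fair dice.
die = uniform(range(1, 7))
two_dice = die.bind(lambda a: die.bind(lambda b: uniform([a + b])))
```

The same `bind` structure works whether the weights are booleans (classical state), probabilities, or amplitudes, which is the sense in which one monadic transformation ties the use-cases together.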
The presentation touches upon the implementation of all four quantum postulates (state representation, evolution, measurement and composition) and visualizes them using biased coins, dice and complex histograms. We will also show a simple application of quantum computing: counting the number of binary words of a fixed length with no consecutive ones. This is, of course, a typical interview question about Fibonacci numbers.
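The classical counting result mentioned above is easy to check directly: the number of binary words of length n with no two consecutive ones follows the Fibonacci recurrence, since a valid word either ends in 0 (preceded by any valid word of length n-1) or ends in 01 (preceded by any valid word of length n-2).

```python
def count_brute_force(n):
    """Count n-bit words with no adjacent ones by scanning every pattern."""
    return sum(1 for w in range(1 << n) if (w & (w >> 1)) == 0)

def count_fib(n):
    """count(n) = count(n-1) + count(n-2), with count(0) = 1, count(1) = 2."""
    a, b = 1, 2
    for _ in range(2, n + 1):
        a, b = b, a + b
    return 1 if n == 0 else b
```

The counts 1, 2, 3, 5, 8, 13, ... are the Fibonacci numbers shifted by two, matching the interview-question framing.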
https://github.com/logicalguess/quantum-scale/blob/master/docs/QuantumScala.pdf