
Parameter Exploration at Lyft

source link: https://eng.lyft.com/parameter-exploration-at-lyft-b9d2a1483c82

What is Parameter Exploration?

At Lyft, the experimentation team’s purpose is to drive data-driven decision making. We primarily do so by facilitating different experimentation techniques, including A/B testing, time-split experiments, and region-split experiments. These experiments help teams measure the effect and impact of their changes, decide whether to ship them, and inspire future innovation.

A/B testing can be viewed as a special case of a more general problem, in which we wish to optimize over a space of possible parameters or configurations with respect to certain objective metrics. These parameters can be integer, float, or discrete variables that control some aspect of the system being tested. As an example, one parameter that we may want to tune is the amount of time a rider should expect to wait in our new wait-and-save mode. We may want to optimize this parameter to maximize user conversion for this mode.
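
To make this concrete, here is a minimal sketch of how such a tuning problem might be framed, with a bounded parameter and an objective metric to maximize. All of the names below are hypothetical and are not Lyft's actual API.

```python
# Hypothetical framing of a tuning problem: one bounded parameter and the
# objective metric we want to maximize for it.
from dataclasses import dataclass

@dataclass
class TunableParameter:
    name: str
    lower: float   # smallest value we are willing to serve
    upper: float   # largest value we are willing to serve

@dataclass
class TuningProblem:
    parameter: TunableParameter
    objective_metric: str  # e.g. a conversion metric computed downstream

# Example: tune the expected wait time (in minutes) for a savings-oriented mode,
# optimizing for user conversion.
problem = TuningProblem(
    parameter=TunableParameter(name="expected_wait_minutes", lower=5.0, upper=25.0),
    objective_metric="wait_and_save_conversion_rate",
)
```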

For this problem space, the fact that we must test a range of values complicates the experimentation process. In the past, we tackled these problems by running multivariate user-split experiments or a series of time-split experiments; in these experiments, every variant represents a candidate value for the parameter, and we would keep testing values until we were satisfied that we had covered enough of the parameter space. This, however, was not a perfect solution, for three big reasons:

  • First, testing specific values of a parameter does not account for the fact that the parameter values are often sampled from a continuous function, meaning we can extrapolate to unobserved values of the parameter space. For example, as we decrease the price of a ride, we expect to see the number of rides requested increase smoothly. Thus, if we already know how many rides we get when decreasing prices by 10% and 20%, we should be able to estimate the change in ride volume for a 15% decrease.
  • Second, running multiple experiments to tune parameters is tedious and requires a lot of human intervention. Setting up experiments, tearing down experiments, monitoring metrics, and reporting on experiments are time consuming. Not only does this eat into science/engineering time, but it also slows down iteration and decision making.
  • Third, the optimal value for parameters could shift over time depending on a variety of external factors, some of which we cannot control at all. In order to keep these parameters up to date, we would need to spend even more time and resources on running additional experiments to tune these parameters. The underlying market conditions could even change between experiments which further complicates the process.

Bayesian Optimization over Gaussian Processes

For these three reasons, we built a separate parameter exploration/tuning tool based on Bayesian optimization and Gaussian process models. In order to explain what this means, we need to understand Bayesian optimization and Gaussian processes (GPs) at a high level.

Gaussian processes are a flexible class of models for continuous functions, which we use to model our continuous parameter space. To show how this works, we can refer to the previous example. Suppose we gather data on 10% and 20% price changes. Then we can estimate the change in ride volume for 15% and 19% price changes as well, but the 19% estimate will likely have lower variance because we have already examined a nearby data point. To be clear, the GP is used to model our understanding of reality; it does not optimize anything on its own.
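
As an illustration, the sketch below fits a GP to two made-up observations using scikit-learn and queries it at unobserved points. The data and kernel hyperparameters are invented for the example; this is not Lyft's production model.

```python
# Illustrative GP fit on made-up data: price decrease (%) vs. lift in rides requested.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

X_obs = np.array([[10.0], [20.0]])   # observed price decreases (%)
y_obs = np.array([0.04, 0.09])       # hypothetical lift in rides requested

kernel = ConstantKernel(1.0) * RBF(length_scale=5.0)
gp = GaussianProcessRegressor(kernel=kernel, alpha=1e-4, optimizer=None, normalize_y=True)
gp.fit(X_obs, y_obs)

# The GP returns both a mean estimate and an uncertainty at unobserved points.
X_new = np.array([[15.0], [19.0]])
mean, std = gp.predict(X_new, return_std=True)
for x, m, s in zip(X_new.ravel(), mean, std):
    print(f"{x:.0f}% decrease -> predicted lift {m:.3f} +/- {s:.3f}")
# The 19% prediction comes with a smaller standard deviation than the 15% one,
# because it sits next to an observed point (20%).
```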

The second piece of the science equation is Bayesian optimization, which is the process by which we discover the optimal value. The procedure is straightforward: starting from a Gaussian process model that represents our current estimate, we generate a new batch of configurations that tells us how we should gather data next, and then we update the existing model with the new points to refresh our state of reality. As we sample, we come closer and closer to finding the optimal value for the parameter. Bayesian optimization chooses the next points to sample using an acquisition function, which tries to manage a tradeoff between exploration and exploitation. This tradeoff is key; whenever we try to optimize anything, we need to be cognizant of the fact that the more data points we explore, the more suboptimal results we will serve to customers. The reverse is true as well: the less willing we are to explore, the less likely we are to find the optimal value. Therefore, whenever we set up a Bayesian optimization problem, we need to configure how much exploration versus exploitation we want to do.
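
The sketch below shows one step of this loop with an upper confidence bound (UCB) acquisition function, where a single kappa knob controls the exploration/exploitation tradeoff. It is a toy illustration under those assumptions, not Lyft's implementation, and the observations are invented.

```python
# Toy Bayesian optimization step: fit a GP to past observations, score a grid of
# candidate parameter values with UCB, and return the highest-scoring batch.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def ucb(gp, candidates, kappa):
    """Larger kappa rewards uncertainty (exploration); smaller kappa rewards the mean (exploitation)."""
    mean, std = gp.predict(candidates, return_std=True)
    return mean + kappa * std

def propose_next_batch(X_obs, y_obs, bounds, batch_size=3, kappa=2.0):
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=2.0), normalize_y=True)
    gp.fit(X_obs, y_obs)
    candidates = np.linspace(bounds[0], bounds[1], 200).reshape(-1, 1)
    scores = ucb(gp, candidates, kappa)
    # A production system would also enforce diversity within the batch.
    return candidates[np.argsort(scores)[-batch_size:]]

# Hypothetical usage: parameter values tried so far and the metric observed for each.
X_obs = np.array([[5.0], [12.0], [20.0]])
y_obs = np.array([0.10, 0.14, 0.11])
print(propose_next_batch(X_obs, y_obs, bounds=(5.0, 25.0)))
```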

Running Parameter Exploration at Scale

Utilizing Bayesian optimization, we were able to build a reinforcement learning system that evaluates and adjusts parameter values periodically. The architecture of the system has two principal components: the service and the model update worker. The service’s responsibility is to deliver the parameter values that the model wants to test, while the worker updates the existing model with new data on a configured cadence. This decoupled architecture maintains abstractions similar to our A/B testing architecture, which clearly delineates the implementation of the experiment (the service) from the measurement of metrics (the model update worker). When running a parameter tuning system, software engineers simply write code to request parameter values; they need not concern themselves with tracking users down for metric values. The system is highly configurable, and engineers are able to set the following: the cadence at which the model updates, the amount of exploitation/exploration we want to do, the bounds of the parameter, and the kind of metrics we want to optimize for.
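
Concretely, a configuration for such a system might look something like the sketch below; the field names are invented for illustration and are not the tool's real schema.

```python
# Hypothetical configuration showing the knobs described above.
parameter_tuning_config = {
    "parameter_name": "expected_wait_minutes",
    "bounds": {"lower": 5.0, "upper": 25.0},        # range of values the service may serve
    "model_update_cadence": "6h",                   # how often the worker refits the model
    "exploration_weight": 2.0,                      # how aggressively to explore vs. exploit
    "objective_metrics": ["mode_conversion_rate"],  # metrics the optimizer tries to improve
}
```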

While the core concepts of this system seem simple, we quickly ran into issues. One issue stemmed from the fact that the service was too configurable. Users were able to specify expensive metrics and also set short periods for the model update, which meant that one model update job might not finish before the next job was queued up. While this did not crash the actual service, because the service can fall back to older models, the clogged pipeline was a big issue because the workers that update models come from a shared pool of resources. This means one poorly configured parameter tuning system can affect another. To tackle this problem, we plan to institute limits on the configurations to guard against similar errors. We also have some long-term ideas on re-architecting the model update to run in a more robust environment, perhaps in an ETL pipeline.
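
One possible shape for such a guard is sketched below, under the assumption that we know a typical job runtime. This is a hypothetical illustration of the kind of limit we plan to add, not shipped code.

```python
# Hypothetical guard: reject model update cadences that are shorter than jobs
# can realistically finish, so one tuning system cannot clog the shared worker pool.
from datetime import timedelta

MIN_UPDATE_CADENCE = timedelta(hours=1)  # assumed floor, for illustration

def validate_update_cadence(cadence: timedelta, typical_job_runtime: timedelta) -> None:
    if cadence < MIN_UPDATE_CADENCE or cadence < 2 * typical_job_runtime:
        raise ValueError(
            f"Model update cadence {cadence} is too short; update jobs would be "
            f"queued faster than they finish (typical runtime: {typical_job_runtime})."
        )
```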

Another issue arose when we tried to apply parameter tuning to time-split experimentation. We often use switchback tests, which randomize over time buckets, in order to measure marketplace changes and mitigate bias from network effects. The downside to this approach is that it drastically reduces the sample size, which is a much bigger problem in the parameter tuning setting, where we are trying to evaluate a large number of configurations. In order to tackle this issue, we had to develop a novel methodology that combines traditional variance reduction techniques with GP model fitting.
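
For readers unfamiliar with switchback tests, the sketch below shows the basic idea of randomizing over time buckets: each bucket is deterministically mapped to one candidate parameter value. It illustrates only the randomization, not the variance reduction methodology, and the details are invented for the example.

```python
# Simplified switchback assignment: hash each time bucket to a candidate value.
import hashlib
from datetime import datetime, timezone

BUCKET_MINUTES = 60  # assumed bucket length, for illustration

def assign_value(ts: datetime, candidate_values, salt: str = "wait-time-exploration"):
    """Deterministically map the time bucket containing `ts` to one candidate value."""
    bucket = int(ts.timestamp() // (BUCKET_MINUTES * 60))
    digest = hashlib.sha256(f"{salt}:{bucket}".encode()).hexdigest()
    return candidate_values[int(digest, 16) % len(candidate_values)]

print(assign_value(datetime.now(timezone.utc), [5.0, 10.0, 15.0, 20.0]))
```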

Despite these issues, we will continue to invest in our parameter tuning capabilities because we believe that the problem space is promising and there are many levers that could be subject to tuning. Beyond parameter tuning, there is a larger space of adaptive experimentation that we are trying to tackle with multi-armed bandits and early stopping. All of these sophisticated tools allow us to test our product decisions with greater flexibility, speed, and nuance, and they democratize the ability to derive statistical insights from complicated methodologies for everyone at Lyft.

Before closing, I want to call out the experimentation and science teams for being awesome and for their continued support. Special thanks to John Kirn, Alex Chin, Yufeng Chen, and Mohan Konduri for working closely with me to make this project happen.

Interested in experiment design, applying science at scale, or working at Lyft in general? Lyft is hiring! Drop me an email at [email protected].

