
Data Science at Scale: A Conversation with Uber’s Fran Bell

source link: https://www.tuicool.com/articles/hit/Rryua2z


Fran Bell has always been a scientist: theorizing, modeling, and testing how the world works. An ever-curious child, she was fascinated by the natural world, poring over biology and chemistry books, but was never satisfied with just knowing; she wanted to put her knowledge into practice, too. So, she translated her love of the physical sciences into a degree in quantum chemistry, and later, a career in data science at Uber.

While Uber’s data-driven approach to research and its unprecedented scale drew her to the company, Fran says her greatest fulfillment stems from being a leader. As a lead on Uber’s Data Science Platform team, Fran leverages her passion for applied machine learning to strategically determine which products and services will be improved through artificial intelligence. From building solutions that detect system outages to machine learning models that assist customer obsession agents in answering support tickets, Fran’s team of specialists is responsible for tackling some of Uber’s most important technical challenges. Fran also co-leads Science Identity, an initiative to strengthen the data science community at Uber and increase its external visibility, for example through science symposia.

We sat down with Fran to discuss her lifelong love of science, her first big project at Uber, and what she envisions for the future of machine learning:

When and how did you first get interested in science?

When I first started reading as a small child in Austria, there were two books that really stood out to me. One was a biology book that described the basics of how birds fly and how their aerodynamics work. The second one was about how everyday technologies work. The book basically took apart things like washing machines and explained how all the different parts worked together. Biology and technology were particularly interesting because I wanted to know how and why everything worked.  

When I went to high school, I discovered Chemistry Olympiads, and this was something that I really dove into. I participated in these competitions and was able to make it to the national and, subsequently, international competitions as well. This experience is what really brought me to love chemistry, physics, and mathematics.

What did you study at UC Berkeley?

I pursued my Ph.D. in quantum chemistry at UC Berkeley. Quantum chemistry is basically an intersection between applied mathematics and high performance computing that is then applied to fields like biology, chemistry, and physics. The research group that I was a part of developed very accurate, computationally efficient algorithms. While we might have known the exact solution for a given problem on paper, these calculations were so computationally slow they could not be completed in the lifetime of the earth.  We developed much faster methods which approximated the answer. It’s all about balancing accuracy and computational speed.

Specifically, my lab applied these methodologies to things like non-silicon based solar cells, which had been around for decades but very little was known about them. These are organic molecules that have been shown in a lab situation to be potentially twice as efficient even as silicon-based solar cells. They are very flexible and very easy to produce, but nobody really understood what their underlying mechanism was. By developing new methods for electronic structure theory we were able to unravel how the solar cells worked; this paved a path for optimizing the performance and cost of the non-silicon based solar cells.  

When and why did you decide to join Uber?

One of the things I was very fascinated by was the Uber product itself. Before I joined the company, I had used the app and thought it was brilliant how you could actually see the cars moving on the map towards you. You didn’t have to worry about whether somebody was ever going to pick you up and take you to a specific location.

The second factor that pushed me to join Uber was the immediate real world impact my work could have, which is significantly delayed in academia. In tech, there is a faster turnaround pace as cross-functional teams rally together to get a product or process across the finish line. Moreover, Uber tackles some of the world’s most challenging data science problems at scale and in real time. It was very, very compelling to me and four years later, I still feel super excited to get up every single day and work on these problems with my amazing teammates.  

Was it a big jump from studying quantum chemistry in an academic setting to practicing data science at a technology company?  

It didn’t actually seem like much of a jump. After doing my Ph.D. at Berkeley, I did a postdoc at the California Institute of Technology in a related field, approximate quantum dynamics. During my postdoc, I joined a lab that was developing new mathematical models applied to things like enzymatic reactions, with a particular focus on systems that evolve with light, like photosynthesis. Many of these models share methodologies with those applied in statistical modeling, machine learning, and data science in general, including advanced statistical methods and high-dimensional optimization. These methodologies have very different names when you go into the machine learning world, but the underlying mathematics is very similar, sometimes even identical.

What was your first major project at Uber?

I joined Uber in late 2014 as the first data scientist embedded within our Infrastructure team. The initial problem my team was solving involved developing a tool that could detect whenever a system outage occurred, a project we called Argos. Argos notified us whenever a user could not open the app, request a trip, or sign up on the platform, giving us the power to ensure that our services were up and running at all times. The interesting component here was that outage detection at scale is actually still an open research question. Within a very short amount of time, we were able to make progress in this field and then also bring something into production. Now, we have a whole group that is working on this problem and is still pushing the boundaries, actively publishing in the space and going to conferences. We have a couple of patents as well.
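As a generic illustration of what detecting an outage from a metric can look like, the sketch below flags minutes where a request count falls far below its recent baseline using a rolling z-score. This is a textbook technique and not a description of Argos; the function name, the sample counts, and the `window`, `threshold`, and `min_sigma` values are all assumptions chosen for the example.

```python
from statistics import mean, stdev


def detect_drops(counts, window=5, threshold=3.0, min_sigma=1.0):
    """Return indices where a count falls far below its recent baseline.

    Hypothetical helper for illustration only: each point is compared with
    the mean and standard deviation of the `window` points before it.
    """
    flagged = []
    for i in range(window, len(counts)):
        baseline = counts[i - window:i]
        mu = mean(baseline)
        # Floor the spread so a perfectly flat baseline does not divide by zero.
        sigma = max(stdev(baseline), min_sigma)
        z = (counts[i] - mu) / sigma
        # Only sharp drops are flagged; spikes above the baseline are ignored.
        if z < -threshold:
            flagged.append(i)
    return flagged


# Made-up per-minute request counts with one sudden drop at index 6.
requests_per_minute = [100, 102, 98, 101, 99, 100, 20, 97, 103]
print(detect_drops(requests_per_minute))
```

Note that the dropped value enters the baseline of the following points and inflates their spread, which is one of many reasons a simple detector like this falls short at production scale.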

What most surprised you about working at Uber?

I think one really amazing thing is how data science-focused Uber is; by that, I mean we use data intelligently to build better experiences for our users and solve problems at scale.

In fact, data science is so important at Uber that one of Uber’s first ten employees was a data scientist. Now, seeing how many data scientists and data analysts we have at Uber is really fantastic. Our focus on building out this team really speaks to our commitment to machine learning and data science, but our data-drivenness doesn’t stop there; it permeates every part of the company, from engineering to product management and design. This culture and mentality around data as ground truth is really fascinating to me.

The other surprising thing is that there are still so many unsolved problems in our space that are also unique to Uber, for example, spatio-temporal forecasting. Marrying this fast-paced environment with the need to come up with innovative solutions is something that is very fascinating.

You lead Uber’s Data Science Platform team. What is your team responsible for and why is their work critical to Uber’s business?

Uber’s Data Science Platform program has two components. One is our specialist group. As the name suggests, this organization has a very high density of data science experts. Examples of these groups include forecasting, anomaly detection, experimentation, conversational AI, computer vision, and behavioral science data science teams. These specialists leverage their deep data science domain expertise and, in collaboration with engineering and product, build tools and platforms that can be used by teams across the entire company, regardless of their skill level. For example, the forecasting platform powers forecasts across marketing, hardware capacity planning, and operations, and the team collaborates with our Finance and Marketplace teams. These tools make it easy for our specialists to scale their expertise to anyone within the company. My team finds it very exciting that we’re like, basically, these ninjas, and that we can multiply ourselves and spread our expertise throughout the company.

The other teams that I’m involved in are the targeted solutions teams. These are responsible for leveraging data science to build scalable tools and platforms for particular product areas, such as customer support and growth.

What is most challenging about doing data science at Uber?

From a technical perspective, I would say we are still a very young company and we’ve obviously experienced—and are still experiencing—hypergrowth. On the data science front, this growth actually becomes both very interesting and challenging at the same time.

To give you a concrete example, my teams are responsible for conducting Uber’s hardware capacity planning, especially for high-traffic events on our platform, such as Halloween, New Year’s Eve, and other major events. However, there are lots of additional variables we need to consider to determine demand on that specific date, including day of the week and how much user growth has occurred since last year’s Halloween. We sometimes have very little data to work with. In the beginning, we didn’t have historical data to compare against for these upcoming holidays and events because Uber wasn’t in business the last time they fell on that day of the week; combined with our rapid growth in existing markets and expansion to new ones, this can make forecasting trip demand very difficult.
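To see how the two variables named above interact, here is a deliberately naive sketch that scales last year's holiday volume by year-over-year growth and a day-of-week correction. It is an assumption-laden illustration, not Uber's forecasting method: the function name, the multiplicative form, and every number are made up for the example.

```python
def forecast_holiday_trips(last_year_trips, yoy_growth,
                           weekday_factor_last, weekday_factor_this):
    """Naive holiday forecast (hypothetical helper, illustration only).

    last_year_trips: trips observed on the same holiday last year.
    yoy_growth: fractional growth since then, e.g. 0.5 for +50%.
    weekday_factor_*: relative demand of the weekday the holiday fell on
        last year and falls on this year (1.0 = an average day).
    """
    # Remove last year's day-of-week effect, then apply this year's.
    weekday_adjustment = weekday_factor_this / weekday_factor_last
    return last_year_trips * (1.0 + yoy_growth) * weekday_adjustment


# Made-up inputs: last year's holiday fell on an average weekday,
# this year's falls on a day that is assumed to run 25% busier.
estimate = forecast_holiday_trips(
    last_year_trips=10_000,
    yoy_growth=0.5,
    weekday_factor_last=1.0,
    weekday_factor_this=1.25,
)
print(estimate)
```

The sketch also shows why the problem is hard: with only a few years of history there may be no earlier holiday on the same weekday from which to estimate the weekday factors, and a single growth rate breaks down when new markets are launching.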

Beyond this, we tackle a lot of open-ended problems that have yet to be addressed at this scale, if they’ve been worked on at all, whether it’s in academia or the private sector. For example, real-time anomaly detection is something that hadn’t really been done before for a platform of Uber’s scale. In areas where there is very little literature out there right now and still a lot of room for improvement, Uber has a great opportunity to use our resources and expertise to make a lasting impact.

What do you think is most rewarding about your work?

The thing that I really enjoy the most about my job is helping people grow. I really love individual contributor work, but having had the opportunity to help the team members in my group excel in their roles is what gets me excited to come to work every day. Managing teams also enables me to dig into the strategic aspects of machine learning and science identity at Uber by assessing how these various components work together.

What excites you about the future of machine learning?

I believe we have only just started to scratch the surface of what data science can accomplish. Among other areas, I’m particularly excited about developments in natural language processing and conversational AI. There have been great strides in machine learning models that can translate and understand human language. At Uber, we have a natural language processing team that is building an Uber-wide platform to make it easier for customer obsession agents at our Greenlight Hubs and on the app to address customer support tickets, thereby facilitating an improved experience for both our users and employees. Other areas where machine learning will play a big role, of course, relate to our work on autonomous vehicles. However, I think the holy grail of machine learning will be training accurate models with less and less data, similar to how humans learn. This is still an open research area and is very interesting to me.

What advice would you give data scientists who are debating whether they’ll make the biggest impact in the private sector or academia?

I think there are a lot of interesting research areas in both industry and academia, and nowadays, they increasingly overlap. Bringing machine learning, artificial intelligence, and other fields into the private sector is important for the advancement of this research, and the latest developments on this front have really unlocked a lot of business use cases. At the same time, a lot of the larger companies have also started to invest in their own research branches. For example, we have AI Labs, Uber’s applied machine learning and research arm out of Engineering. So, even within industry, there is the opportunity to do research work that is long term and has impact beyond the scope of that individual company.

The private sector and academia are working more closely together than, say, 10 years ago. I think the reason is that data science in itself is still a very new field, so people have discovered a lot of opportunities to leverage each other’s methodologies and techniques. The other exciting thing is that the private sector often has a lot more data to work with than the academic world. In fact, there are a lot of tenured professors here at Uber who are doing a sabbatical or a summer visit, and some of them even choose to stay on full-time because they find the problem space so exciting and the data we have so incredibly useful for their work.

What drives you outside of your work at Uber?

There are a lot of things I’m passionate about. I love expanding my technical horizons. Some of what I enjoy learning about is obviously related to the work I’m doing, but I also really like reading papers from very different fields in machine learning, and listening to recordings of tech talks from interesting conferences. I always try to push myself and my boundaries, so working with amazing people at Uber who are so deep in their domain really gives me this opportunity.

If you’re interested in working on data science problems that push the limits of scale, consider applying for a role on Uber’s Data Science team!


