
You don’t need “Big Data” to apply deep learning

source link: https://towardsdatascience.com/you-dont-need-big-data-to-apply-deep-learning-3ca613585583?gi=e1c69b6ca0e5

You don’t always have to train new models from scratch

Mar 21 · 4 min read


Source: Pexels

Disclaimer: The following is based on my observations of machine learning teams, not an academic survey of the industry. For context, I’m a contributor to Cortex, an open source platform for deploying models in production.

For years, the biggest bottleneck to production deep learning was simple: we needed models that worked. And over the last decade, thanks to companies with access to unprecedented amounts of data and compute power, as well as new model architectures, we’ve largely cleared that hurdle.

We may not have fully autonomous vehicles or Blade Runner-esque AI, but when you call an Uber, you get an accurate ETA prediction. When you open an email in Gmail, you get a contextually appropriate suggestion from Smart Compose. When you turn on Netflix, you get a personalized curation of movies and shows.

As an unintended consequence of this development, people commonly believe that to use deep learning, you need the same resources as Google or Netflix: teams of researchers, endless funding, and tons of data.

This isn’t true. Even in niches like natural language processing and computer vision, there are often ways to make a small amount of data work.

You don’t need to train a better model than Google

Let me be clear: in pretty much all domains, training a state-of-the-art deep learning model from scratch requires a massive amount of data. For example, state-of-the-art language models like OpenAI’s GPT-2 require tens of gigabytes of data and weeks of training.

What many miss, however, is that training a model from scratch is often unnecessary. A few months ago, an ML-driven choose-your-own-adventure game called AI Dungeon went viral:

The game generates state-of-the-art responses because it is built on a state-of-the-art model. But instead of being built by a team of researchers with gigabytes of data, the AI Dungeon model is the product of one engineer who trained it on 30 MB of text for about 12 hours.

Nick Walton, the creator of AI Dungeon, was able to create such a powerful model precisely because he didn’t build it from scratch. Instead, he took OpenAI’s GPT-2 and fine-tuned it with his data.

This process, transfer learning, is one of several techniques for leveraging the “knowledge” of an existing neural network to train a new one more efficiently. Transfer learning works because the lower layers of a neural network are responsible for identifying more primitive features; in a computer vision model, for example, they recognize things like edges, colors, and contours. That knowledge is typically applicable to many domains, not just the one the model was originally trained for.
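To make that concrete, here is a minimal sketch of the idea, assuming PyTorch and torchvision (the article doesn’t prescribe a framework): the pretrained convolutional layers that already detect edges, colors, and contours are frozen, and only a small new classification head is trained on your own, hypothetical five-class dataset.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a ResNet-18 pretrained on ImageNet; its early layers already
# detect generic features such as edges, colors, and contours.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze all pretrained parameters so that knowledge is preserved.
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer with a new head sized for your own
# (hypothetical) task, e.g. 5 classes.
num_classes = 5
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Only the new head is trained, so a small dataset and modest
# compute are usually enough.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
```

Because only the head’s weights change, a few hundred labeled examples and a single GPU are often enough to get a usable model.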

In the case of AI Dungeon, this meant that Walton could take GPT-2’s general understanding of English and fine-tune it to the choose-your-own-adventure genre with a comparatively small dataset.
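As a rough sketch of that kind of fine-tuning, assuming the Hugging Face transformers and datasets libraries (the article doesn’t say which tools Walton used), where adventures.txt stands in for a small domain-specific corpus:

```python
from datasets import load_dataset
from transformers import (DataCollatorForLanguageModeling, GPT2LMHeadModel,
                          GPT2TokenizerFast, Trainer, TrainingArguments)

# Start from the pretrained GPT-2 weights instead of training from scratch.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = GPT2LMHeadModel.from_pretrained("gpt2")

# "adventures.txt" is a placeholder for a small domain-specific corpus.
dataset = load_dataset("text", data_files={"train": "adventures.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

# Causal language modeling: the collator builds labels from the inputs.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="gpt2-finetuned",
    num_train_epochs=1,
    per_device_train_batch_size=2,
)

Trainer(model=model, args=args, train_dataset=tokenized,
        data_collator=collator).train()
```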

Anecdotally, many Cortex users who deploy deep learning models in their products train them this same way. Robert Lucian, who recently built a popular DIY license plate identifier, took a similar approach: he deployed a computer vision model (YOLOv3) that had been fine-tuned on a small number of license plate images, and it worked.
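The same head-swapping pattern applies to detection models. As an illustration only, the sketch below uses torchvision’s Faster R-CNN rather than YOLOv3 (whose training setup is framework-specific), replacing the pretrained predictor with a two-class “background vs. license plate” head before fine-tuning on a small labeled set:

```python
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Load a detector pretrained on COCO; its backbone already encodes
# general visual features, so only the head needs much new training.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")

# Two classes for the hypothetical task: background and license plate.
num_classes = 2
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

# From here, fine-tune on a small set of labeled plate images using a
# standard torchvision detection training loop.
```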

A whole wave of new ML-native products is being launched on top of deep learning models that were not originally designed or trained by the companies using them. Instead, as in most areas of software, companies are building on top of open source technologies to create things they do not have the resources to build from scratch.

You can build entire products with deep learning and “small data”

If your knee-jerk reaction to deep learning is “We don’t have enough data,” I invite you to reconsider. So many applications of deep learning—recommendation engines, image parsers, conversational agents, sentiment analyzers, and more—can be built on top of open source, pre-trained models with a small amount of data.
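In many of those cases you can start from a pretrained model with no training data of your own at all. For example, a minimal sketch using the Hugging Face transformers pipeline API (one option among many; the default model it downloads is chosen by the library):

```python
from transformers import pipeline

# Downloads a pretrained sentiment model on first use; no labeled
# data or training of your own is needed to get started.
classifier = pipeline("sentiment-analysis")

print(classifier("The fine-tuned model shipped ahead of schedule."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99}]
```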

As a matter of fact, there’s now an entire industry of transfer-learning-as-a-service platforms that let you upload data and fine-tune models:

  • Rasa creates contextual AI assistants by fine-tuning language models.
  • Owkin allows doctors to fine-tune models with their own medical images.
  • TwentyBN is a computer vision-focused platform that allows users to fine-tune models to their own domains.

As the field continues to mature, it is only going to get easier to use deep learning with small amounts of data. At the same time, expertise will always be needed—but you can learn about deep learning (or hire someone who is already experienced) much more easily than you can build a massive proprietary dataset.

