
Machines that can see and hear

source link: https://towardsdatascience.com/machines-that-can-see-and-hear-443d70852f19?gi=102553f02bed


Editor’s note: The Towards Data Science podcast’s “Climbing the Data Science Ladder” series is hosted by Jeremie Harris. Jeremie helps run a data science mentorship startup called SharpestMinds. You can listen to the podcast below:

One of the most interesting recent trends in machine learning has been combining different types of data to unlock new use cases for deep learning. If the 2010s were the decade of computer vision and voice recognition, the 2020s may well be the decade we finally figure out how to make machines that can see and hear the world around them, making them that much more context-aware and potentially even humanlike.
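To make the idea of combining modalities concrete, here is a minimal sketch of late fusion, one common way to join vision and audio in a single network: each modality is encoded separately, and the embeddings are concatenated before classification. All names, dimensions, and the linear stand-in encoders are hypothetical illustrations, not anything from Twenty Billion Neurons’ actual systems.

```python
# Minimal late-fusion sketch (hypothetical architecture, PyTorch).
import torch
import torch.nn as nn

class AudioVideoFusion(nn.Module):
    def __init__(self, video_dim=512, audio_dim=128, num_classes=10):
        super().__init__()
        # Stand-ins for real pretrained encoders (e.g. a 3D CNN for video,
        # a spectrogram CNN for audio); plain projections keep this runnable.
        self.video_encoder = nn.Linear(video_dim, 256)
        self.audio_encoder = nn.Linear(audio_dim, 256)
        self.classifier = nn.Linear(256 + 256, num_classes)

    def forward(self, video_feats, audio_feats):
        v = torch.relu(self.video_encoder(video_feats))
        a = torch.relu(self.audio_encoder(audio_feats))
        fused = torch.cat([v, a], dim=-1)  # late fusion by concatenation
        return self.classifier(fused)

# Usage with random stand-in features for a batch of 4 clips:
model = AudioVideoFusion()
logits = model(torch.randn(4, 512), torch.randn(4, 128))
print(logits.shape)  # torch.Size([4, 10])
```

Concatenation is the simplest fusion strategy; the same skeleton extends to attention-based fusion when the modalities need to interact earlier in the network.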

The push towards integrating diverse data sources has received a lot of attention from academics and companies alike. One of those companies is Twenty Billion Neurons, whose founder, Roland Memisevic, is our guest for this latest episode of the Towards Data Science podcast. Roland is a former academic who’s been knee-deep in deep learning since well before the hype sparked by AlexNet in 2012. His company has been working on deep learning-powered developer tools, as well as an automated fitness coach that combines video and audio data to keep users engaged throughout their workout routines.

Here are some of my favourite take-homes from today’s episode:

  • Academics who started down the deep learning path prior to 2012 were often ridiculed. The world of the 2000s was dominated by tabular data that simple models like decision trees and support vector machines were well suited for, so most people incorrectly generalized from this and assumed that the tools of classical, statistical machine learning were more promising than neural networks. What kept deep learning buffs moving despite all that pushback was the belief that deep learning should have the potential to process a type of information that humans consume all the time, but that machines rarely encountered, especially back then: video and audio data.
  • The computational constraints imposed by mobile devices are a big consideration for companies developing new consumer-facing machine learning applications. When Twenty Billion Neurons got started, mobile devices couldn’t handle the on-device machine learning their automated fitness trainer needed, so they faced a choice: find a way to compress their models so they could run on-device (see the sketch after this list), or wait for the hardware to catch up with their software. Ultimately, Twenty Billion went with option 2, and that paid off: in 2018, Apple phones started carrying a chip that unlocked the on-device processing they needed.
  • If you’re interested in experimenting with datasets that contain multiple data types, Roland recommends checking out Twenty Billion Neurons’ publicly available “something something” dataset.
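On the compression route mentioned in the second take-home above, here is a minimal sketch of one widely used option: post-training dynamic quantization in PyTorch, which stores Linear-layer weights as int8 to shrink a model for on-device inference. The toy model is a hypothetical stand-in, and nothing here is Twenty Billion’s actual pipeline.

```python
# Post-training dynamic quantization sketch (toy model, PyTorch).
import torch
import torch.nn as nn

# A stand-in for a much larger trained network.
model = nn.Sequential(
    nn.Linear(512, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

# Convert Linear weights to int8; activations are quantized on the fly.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
print(quantized(x).shape)  # torch.Size([1, 10])
```

Dynamic quantization trades a little accuracy for a roughly 4x smaller weight footprint, which is exactly the kind of trade-off a team weighs when deciding whether to compress now or wait for faster mobile hardware.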

You can follow Twenty Billion Neurons on Twitter or LinkedIn, and you can follow me on Twitter.

If you’re curious about their upcoming fitness app launch, you can also give them a follow on Instagram.

