
Machines that can see and hear

source link: https://towardsdatascience.com/machines-that-can-see-and-hear-443d70852f19?gi=102553f02bed


Editor’s note: The Towards Data Science podcast’s “Climbing the Data Science Ladder” series is hosted by Jeremie Harris. Jeremie helps run a data science mentorship startup called SharpestMinds. You can listen to the podcast below:

One of the most interesting recent trends in machine learning has been combining different types of data to unlock new use cases for deep learning. If the 2010s were the decade of computer vision and voice recognition, the 2020s may well be the decade we finally figure out how to make machines that can see and hear the world around them, making them that much more context-aware and potentially even humanlike.
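To make the idea of combining modalities concrete, here is a minimal sketch of late fusion, one common way to join vision and audio in a single network: each modality is encoded separately, and the embeddings are concatenated before classification. All names, dimensions, and the linear stand-in encoders are hypothetical illustrations, not anything from Twenty Billion Neurons’ actual systems.

```python
# Minimal late-fusion sketch (hypothetical architecture, PyTorch).
import torch
import torch.nn as nn

class AudioVideoFusion(nn.Module):
    def __init__(self, video_dim=512, audio_dim=128, num_classes=10):
        super().__init__()
        # Stand-ins for real pretrained encoders (e.g. a 3D CNN for video,
        # a spectrogram CNN for audio); plain projections keep this runnable.
        self.video_encoder = nn.Linear(video_dim, 256)
        self.audio_encoder = nn.Linear(audio_dim, 256)
        self.classifier = nn.Linear(256 + 256, num_classes)

    def forward(self, video_feats, audio_feats):
        v = torch.relu(self.video_encoder(video_feats))
        a = torch.relu(self.audio_encoder(audio_feats))
        fused = torch.cat([v, a], dim=-1)  # late fusion by concatenation
        return self.classifier(fused)

# Usage with random stand-in features for a batch of 4 clips:
model = AudioVideoFusion()
logits = model(torch.randn(4, 512), torch.randn(4, 128))
print(logits.shape)  # torch.Size([4, 10])
```

Concatenation is the simplest fusion strategy; the same skeleton extends to attention-based fusion when the modalities need to interact earlier in the network.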

The push towards integrating diverse data sources has received a lot of attention from academics and companies alike. One of those companies is Twenty Billion Neurons, whose founder, Roland Memisevic, is our guest for this latest episode of the Towards Data Science podcast. Roland is a former academic who’s been knee-deep in deep learning since well before the hype sparked by AlexNet in 2012. His company has been working on deep learning-powered developer tools, as well as an automated fitness coach that combines video and audio data to keep users engaged throughout their workout routines.

Here are some of my favourite take-homes from today’s episode:

  • Academics who started down the deep learning path prior to 2012 were often ridiculed. The world of the 2000s was dominated by tabular data that simple models like decision trees and support vector machines were well suited for, so most people incorrectly generalized from this and assumed that the tools of classical, statistical machine learning were more promising than neural networks. What kept deep learning buffs moving despite all that pushback was the belief that deep learning should have the potential to process a type of information that humans consume all the time, but that machines rarely encountered, especially back then: video and audio data.
  • The computational constraints imposed by mobile devices are a big consideration for companies developing new consumer-facing machine learning applications. When Twenty Billion Neurons got started, mobile devices couldn’t handle the on-device machine learning their automated fitness trainer needed, so they faced a choice: find a way to compress their models so they could run on-device (see the sketch after this list), or wait for the hardware to catch up with their software. Ultimately, Twenty Billion went with option 2, and that paid off: in 2018, Apple phones started carrying a chip that unlocked the on-device processing they needed.
  • If you’re interested in experimenting with datasets that contain multiple data types, Roland recommends checking out Twenty Billion Neurons’ publicly available “something something” dataset.
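On the compression route mentioned in the second take-home above, here is a minimal sketch of one widely used option: post-training dynamic quantization in PyTorch, which stores Linear-layer weights as int8 to shrink a model for on-device inference. The toy model is a hypothetical stand-in, and nothing here is Twenty Billion’s actual pipeline.

```python
# Post-training dynamic quantization sketch (toy model, PyTorch).
import torch
import torch.nn as nn

# A stand-in for a much larger trained network.
model = nn.Sequential(
    nn.Linear(512, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)

# Convert Linear weights to int8; activations are quantized on the fly.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
print(quantized(x).shape)  # torch.Size([1, 10])
```

Dynamic quantization trades a little accuracy for a roughly 4x smaller weight footprint, which is exactly the kind of trade-off a team weighs when deciding whether to compress now or wait for faster mobile hardware.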

You can follow Twenty Billion Neurons on Twitter or LinkedIn, and you can follow me on Twitter.

If you’re curious about their upcoming fitness app launch, you can also give them a follow on Instagram.

