What’s the origin of the phrase “Big Data Doesn’t Fit In Excel”?

source link: https://shkspr.mobi/blog/2021/04/whats-the-origin-of-the-phrase-big-data-doesnt-fit-in-excel/
Terence Eden’s Blog

Welcome to Yak Shaving School! As part of my MSc I’m reading a book about Data Analytics. So I’ve been chasing down quotes to find their origin.

One paper had this popular quote in it (emphasis added):

As with many rapidly emerging concepts, Big Data has been variously defined and operationalized, ranging from trite proclamations that Big Data consists of data-sets too large to fit in an Excel spreadsheet or be stored on a single machine (Strom, 2012)

Kitchin, Rob Big Data, new epistemologies and paradigm shifts (2014) SAGE Publications. Big Data & Society. Page: 205395171452848. DOI: https://doi.org/10.1177/2053951714528481

I keep seeing that damned Excel quote. But who originally said it? The “Big Data” paper above references “Strom”. Well, here’s what Strom has to say for themselves:

Big Data is everywhere. As Bit.ly’s chief scientist Hilary Mason likes to say: “Big Data usually refers to a dataset that is too big to fit into your available memory, or too big to store on your own hard drive, or too big to fit into an Excel spreadsheet.”
Big Data Makes Things Better – Slashdot.org August 3rd 2012

Aha! It’s a blogpost from Slashdot. And Strom is quoting someone else: Hilary Mason. I’ve seen Mason being quoted saying this before. Here’s the earliest Tweet I could find attributing it to Mason, from April 2013:

"People think big data is too big to fit in Excel" not really says @hmason #boomconf

— Atypic (@ATYPIC) April 11, 2013

But I couldn’t find the original quote. I want to be able to cite who originally said it, and where & when they said it. Not a second-hand transcription.

Googling around, I found this definition of Big Data from July 2013:

“Big Data” is “it doesn’t fit in Excel”
Stéphane Hamel – one of what are now more than 30 definitions of Big Data!
Data Science – de toekomst van webanalisten? (Data Science: the future of web analysts?)

Interesting! That’s Stéphane Hamel, not Hilary Mason. Searching for Hamel’s name led me to this 2017 article:

The simplest definition of “Big Data” is “it doesn’t fit in Excel”
Stephane Hamel comment 8/2012 Big Data – What It Means For The Digital Analyst.
Definitions of Big Data

The “What it means for the digital analyst” page has since disappeared – but is available in the Wayback Machine. Here’s the quote in full:

I have joked that the simplest definition of “Big Data” is “it doesn’t fit in Excel” – and when you think of it, it’s true for most people who wonder how to make the shift from a traditional approach to a Big Data one. Shifting away from Excel forces the analyst to change his approach, view the data differently, and explore new solutions.
And that’s a whole lot of fun to do! 🙂
August 2nd, 2012

There’s also a slideshow from March 2013 in which Hamel uses the phrase:

[Image: a slide from the presentation containing the phrase]

A bit more digging and I found this document from July 2012:

“Simplest definition of #BigData ever: ‘it doesn’t fit in Excel’ :)”
Stephane Hamel @SHamelCP 3 Jul 2012
How Big is Big Data (2013) Columbia University. DOI: https://doi.org/10.7916/d82v2qkb

The @SHamelCP Twitter account doesn’t exist any more. And while some of its Tweets are in the Internet Archive, that one is missing. But there are contemporary Tweets which suggest that it was Tweeted at about that time:

lol RT @SHamelCP
Simplest definition of #BigData ever: "it doesn't fit in Excel" 🙂 #measure #analytics

— jwindz (@jwindz) July 3, 2012

Back in 2012, the quote-tweet function didn’t exist, hence the slightly weird “lol RT” syntax. Here’s a link to a bunch of people quote tweeting it in July 2012.

The reason @SHamelCP doesn’t exist is that at some point it was renamed to @SHamel67. Which means the original Tweet still exists! And here it is:

Simplest definition of #BigData ever: "it doesn't fit in Excel" 🙂 #measure #analytics

— Stéphane Hamel (@SHamel67) July 3, 2012

I reckon that’s the earliest directly citable Tweet of the phrase. But there is some evidence of it being used earlier. Here’s a report from the BigDataWeek Community meetup in London:

The panel started off with Edd asking, So what is big data? The answers ranged from correct but slightly silly:
   lots of 0s and 1s
to
   too big to fit in x (where x is your usual tool – excel, SQL, memory etc) – Hilary
“Big data, ready or not” 25th April 2012

Here’s the video – with the quote at ~15 minutes 30 seconds in:

And, slightly earlier:

“Big Data usually refers to a data set that is too big to fit into your available memory, or too big to store on your own hard drive, or too big to fit into an Excel spreadsheet,” says Mason
Hilary Mason Wants To Get You Started With Big Data 26th December 2011
(Although possibly originally published in September 2011)

Prior to that, things start getting a little fuzzy. In April 2011, Mike Driscoll wrote a blog post about a presentation he gave with Hilary Mason and Joe Adler:

  1. Choose The Right-Sized Tool
    Or, as I like to say, you don’t need a chainsaw to cut butter.
    If you’ve got 600 lines of CSV data that you need to work with on a one-time basis, paste it into Excel or Emacs and just do it

    When your data gets very large, so big it can’t fit reasonably on your laptop (in 2010, that’s north of a terabyte), then you’re in Hadoop, parallelized database, or overpriced Big Iron territory.
    the seven secrets of successful data scientists 19th April 2011

So the proto-phrase seems to have appeared between April 2011 and April 2012. By July 2012 it had become much pithier, and from there it became endlessly quotable.

Before April 2011, and for a while afterwards, the idea was expressed much more fuzzily. A McKinsey report from May 2011 says:

In some cases, decisions will not necessarily be automated but augmented by analyzing huge, entire datasets using big data techniques and technologies rather than just smaller samples that individuals with spreadsheets can handle and understand.
Big data: The next frontier for innovation, competition, and productivity

And, even further back, here’s what RedMonk’s Stephen O’Grady had to say in 2009:

Excel has been used on big data for years, it’s true. But not directly on big data. With a row limit of around 65,000, it certainly can’t be used as a direct window into data warehouses or marts
What’s After Excel? Big Data and the Future of Spreadsheets 19th November 2009

Please don’t think I’m picking on any of the people mentioned in this blog post. I’ve seen the quote attributed to a dozen other people, and to none. It is a catchy little slogan with huge memetic potential, and I think it has now become a standard truism.

But this was a great reminder to me that it is always worth following the trail of a quote to see where it leads.
