
Q&A: Vast Data’s cofounder talks affordable SSDs replacing magnetic disks

source link: https://venturebeat.com/2021/12/20/qa-vast-datas-cofounder-talks-affordable-ssds-replacing-magnetic-disks/



Computer rooms were once filled with spinning magnetic media because mechanical drives were the cheapest way to store permanent data. However, the moment when IT shops retire their whirling hard disk platters may soon be upon us. Lately, flash memory has been taking over the role of persistent storage for running machines. It’s faster, though more expensive. At first, it was an easy decision to pair flash memory with working CPUs, but no one wanted to pay that premium for archival data.

To combat the higher costs of flash memory, startup Vast Data is looking to build SSD (solid-state drive) storage units that will be cheap enough to push aside magnetic disks. Last week, the company announced a new model that it feels will be so cost-competitive that data architects will want to say farewell to mechanical storage.

The company’s cofounder and chief marketing officer, Jeff Denworth, explained Vast Data’s pricing model and how the company intends to deliver machines that crush disk drives in an exclusive interview with VentureBeat.

VentureBeat: So your plan is to replace mechanical drives everywhere? 

Jeff Denworth: Yes. The origin story of Vast goes back to when we launched the company. We basically declared ourselves as an extinction-level event for the enterprise hard drive.

VB: How’s that working out? Are you getting there? 

Denworth: What’s propelling our success, which in turn, propels valuation increases, is, fundamentally, the fact that we are close to that inflection point. The customers are buying our products in mass quantities. They have made the decision to kind of turn the page on their investment in hard drives.

VB: How does your new model change the equation?

Denworth: This announcement from last week is our next-generation hardware enclosure. It’s a flash enclosure that is ultimately doubling data center efficiency. When you combine the advancements that come with the kind of hyperscale-oriented drives that we now support with the software efficiencies we bring to the table, we’ve concluded that from a data center space perspective, you can save roughly 5x more versus your alternatives.

VB: And it’s not just space efficiency that’s driving the equation, right? 

Denworth: From an energy perspective, our solution is far more efficient. You can save roughly 10x on what customers would otherwise have to provision if they had hard-drive-based infrastructure. This infrastructure density always creates cost savings. When you add up the efficiencies, the power savings, the data center space savings, and the cost savings, we believe we have finally achieved cost parity with hard-drive-based infrastructure, essentially eliminating the last argument for mechanical media in the enterprise.

VB: A big part of the equation is not just the cost but also the availability of power, right? No one wants to rewire old data centers. 

Denworth:  When we work with customers to configure infrastructure, no longer are we worried about anything other than power per rack. Some older data centers only have like five kilowatts per rack. You can’t even put modern GPUs in a single rack. There’s just not enough power.

What’s happened is that Moore’s law has started to erode the old model. The microprocessors just need so much more power to deliver incremental performance improvements, and so power is becoming one of the most precious commodities in any data center, whether cloud-based or on-premises. It’s the number one thing that people are designing around. So having a solution that draws a factor of 10 less power than what you would otherwise have to provision is received very well by customers.

VB: Not all of the game is just lowering the power. You’ve put plenty of thought into the software layer and orchestrating how and when the work is done. You’re working to schedule CPU computations to avoid peak issues, right? 

Denworth: Sure. The interesting thing about flash is that electrically the reads are free. So that’s great for environments where you’re doing large-scale analytics. If you’re doing AI, it doesn’t really cost you anything from a power perspective. The power utilization on a flash system increases very significantly once you start writing to these devices. It goes up by a factor of four. If you can schedule and control how the system writes down to these drives, then you have a very interesting solution.
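To make that asymmetry concrete, here is a toy power model in Python. The 4x write factor comes from Denworth’s figure above; the per-drive read wattage and the workload mixes are invented placeholders, not Vast specifications.

```python
# Toy model of flash power draw as a function of read/write mix.
# Assumption: writes draw roughly 4x the power of reads (figure from the
# interview); the 10 W per-drive read baseline is a made-up placeholder.
READ_WATTS_PER_DRIVE = 10.0
WRITE_POWER_FACTOR = 4.0

def estimated_power(drives: int, write_fraction: float) -> float:
    """Blend read and write power by the fraction of I/O that is writes."""
    read_w = READ_WATTS_PER_DRIVE * (1.0 - write_fraction)
    write_w = READ_WATTS_PER_DRIVE * WRITE_POWER_FACTOR * write_fraction
    return drives * (read_w + write_w)

# A read-heavy analytics workload vs. an ingest-heavy one on 100 drives.
print(estimated_power(100, 0.05))   # ~1150 W
print(estimated_power(100, 0.50))   # ~2500 W
```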

You know, our architecture is something that we call a disaggregated, “shared everything” storage architecture. We believe we’ve solved some of the scaling and infrastructure limitations that come from a “shared nothing” architecture. We figured out a way to decouple the CPUs from the storage media in a distributed storage cluster and have them both scale arbitrarily and independently. You don’t have to buy microprocessors anymore to buy storage, which is kind of a nice luxury with our architecture. Fundamentally, we can start to control how much the system writes just by limiting the number of CPUs that you allocate into a system. We can manage that dynamically now.
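A minimal sketch of that control knob, with hypothetical node names and per-node rates: if each stateless compute node contributes a bounded write rate, then capping how many nodes are allocated as writers caps the aggregate write load, and therefore write power, on the flash enclosures.

```python
# Sketch: cap aggregate write throughput by limiting how many stateless
# compute nodes may act as writers at once. Names and per-node figures
# are hypothetical, not Vast's actual scheduling logic.
from dataclasses import dataclass

@dataclass
class ComputeNode:
    node_id: str
    write_gbps: float = 2.0  # assumed per-node write ceiling

def pick_writers(nodes: list[ComputeNode], write_budget_gbps: float) -> list[ComputeNode]:
    """Allocate just enough nodes as writers to stay under the write budget."""
    writers: list[ComputeNode] = []
    used = 0.0
    for node in nodes:
        if used + node.write_gbps > write_budget_gbps:
            break
        writers.append(node)
        used += node.write_gbps
    return writers

cluster = [ComputeNode(f"cnode-{i}") for i in range(16)]
# During a power-constrained window, allow at most 6 GB/s of writes.
print([n.node_id for n in pick_writers(cluster, 6.0)])  # cnode-0 .. cnode-2
```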

VB: It strikes me that a lot of your secret sauce is just very smart software that will make smart decisions about where to write so you don’t, you know, wear out some flash cells while also making smart decisions about deduplicating the data.  

Denworth: Oh, it’s not deduplication. That’s a bad word around here. It’s not deduplication. Nor is it compression. Essentially, our approach is quite novel. What it does is it tries to combine the best of both of those approaches.

If you took your laptop and just compressed your entire laptop, then you just have one big zip file. You wouldn’t be able to do anything with that data unless you went and decompressed everything. So that doesn’t work across data sets.

Deduplication can go across files but has always been really coarse in terms of how you manage fine patterns, right? As opposed to compression, which is granular. Blocks are typically about 32 kilobytes in size. And if you have just one byte that’s different between two blocks that may otherwise have all the same data, they won’t cryptographically hash to the same hash ID.

What we said is that we’ll run a fingerprinting algorithm that looks and feels, mechanically, kind of like a deduplication engine. As opposed to using an SHA hash, however, we invented a new hashing function that’s most similar to what you might find in Google Image Search. We’re not looking for an exact-match block. We’re just looking to do a distance calculation between the new block that’s coming into the system and all of the other blocks that are already in the cluster.

Once we find that two blocks are close enough to each other, they don’t have to be the same. We start to put them into a logical grouping of data that we call a similarity cluster. Every new block that goes into that cluster gets delta-compressed against the first block in it.
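A highly simplified sketch of that flow, using a crude byte-histogram fingerprint and zlib as stand-ins for Vast’s proprietary similarity hash and delta coder: incoming blocks are grouped by approximate fingerprint, and each new block is delta-compressed against the first (reference) block in its similarity cluster.

```python
# Sketch of similarity-cluster reduction: group blocks by an approximate
# fingerprint, then delta-compress each block against the cluster's first
# ("reference") block. Fingerprint and delta coder are crude stand-ins,
# not Vast's actual algorithms; assumes fixed-size blocks.
import zlib
from collections import defaultdict

def similarity_fingerprint(block: bytes, buckets: int = 1 << 16) -> int:
    """Approximate fingerprint: similar blocks should often collide.
    Here: a coarsened byte-value histogram folded into a small bucket space."""
    hist = [0] * 8
    for b in block:
        hist[b >> 5] += 1
    return hash(tuple(v // 64 for v in hist)) % buckets

clusters: dict[int, list[bytes]] = defaultdict(list)

def store_block(block: bytes) -> bytes:
    """Return the bytes that would actually be written for this block."""
    fp = similarity_fingerprint(block)
    cluster = clusters[fp]
    if not cluster:
        cluster.append(block)              # reference block is stored whole
        return zlib.compress(block)
    reference = cluster[0]
    # XOR against the reference: near-identical blocks yield a mostly-zero
    # delta, which compresses far better than the raw block.
    delta = bytes(a ^ b for a, b in zip(block, reference))
    cluster.append(block)
    return zlib.compress(delta)
```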

When we work with technologies that, for example, have deduplication and compression native to them already, and then people take that data and store it in our system, you typically find an additional 2:1 to 3:1 data reduction. And so we’re reducing pre-reduced data. We can even reduce encrypted data. And those savings, by the way, multiply. If I see 5:1 in Rubrik and you get an additional 3:1 with Vast, well, you just reduced your backup environment by 15:1.
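The multiplication he describes is simple composition of ratios: a 5:1 reduction upstream followed by a further 3:1 on the already-reduced data is 15:1 end to end.

```python
# Successive data-reduction ratios compose multiplicatively.
upstream = 5.0    # e.g., 5:1 from a backup product's own dedupe/compression
additional = 3.0  # additional 3:1 reported on the already-reduced data
print(f"effective reduction: {upstream * additional:.0f}:1")  # 15:1
```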

VB: Factors like that must tip the balance for many customers, right? 

Denworth: We can add a lot more savings versus classic infrastructure, but the secret sauce, I would argue, is not that. It’s that it lends itself to extreme levels of scale without creating complexity. Our customers are building 100 petabyte or 200 petabyte clusters now, and they just work. We can get to like these crazy levels of capacity scalability and performance scalability. We can do that because it’s just a far simpler architecture, a far more robust architecture. Historically, if you’re building that big, people have to babysit the systems.

That’s the key: It has to work, it has to solve a unique problem, and indeed, it can’t cost a king’s ransom. We think we’ve done all three of those.
