
A developer's toolkit for SOTA AI

source link: https://changelog.com/practicalai/231

Transcript

Welcome to another edition of the Practical AI podcast. My name is Chris Benson, I'm your co-host today. Normally we would have Daniel Whitenack joining us, but Daniel has just gotten off a plane - he flew halfway around the world - and we decided to give him a break today. He was more lucid than I would be in the same situation.

Today - I wanted to dive right in… We have a super-cool topic. It is not dissimilar from some of the other general things we've been talking about, but I have two guests today. I'd like to introduce Varun, who is the CEO and co-founder of Codeium, and Anshul, who leads their enterprise and partnerships efforts. Welcome to the show, guys.

Thanks for having us.

Thanks for having us, Chris.

You're welcome. I'm really interested in learning more about Codeium. When Daniel lined you guys up, he sent me this thing saying "You've got to look at this. This is really cool", and everything. And I'm like "Get him on the show." He's like "I'm already doing that." So really glad to have you guys on, and he's going to be bummed that he missed the conversation, because he was pretty excited about it…
So I guess I wanted to, before we even dive into Codeium and the problems it's trying to solve and such, ask if you guys can each just tell me a little bit about how you found yourself arriving at this moment - a little bit about your background, how you got into AI, and how this became the thing. Varun, if you want to kick off, and then Anshul afterwards.

So maybe I can get started… It actually starts in 2017; I started working at this company called Nuro, that does autonomous goods delivery. So it's an AV company. There I worked on large-scale offline deep learning workloads. As you can imagine, an autonomous vehicle company needs to run large-scale simulation; they need to be able to test their ML models at scale before they can actually deploy them on a car.

In 2021 I left Nuro and started Exafunction, which is the company that is building out this product Codeium. And Exafunction started out building GPU virtualization software. So you can imagine, for these large-scale deep learning applications, one big problem is GPUs are scarce, they're expensive, and also hard to program. And what Exafunction started building was software to make it so that applications running on GPUs used the GPU hardware more effectively. We realized that our software at Exafunction was best applicable to generative AI tech, and started building out Codeium around a year ago.

Very cool. And before I dive in, because I have several questions for you… But I want to give Anshul a chance to introduce himself here. Go ahead, Anshul.

Surprisingly, my story is actually quite similar. I was also working at Nuro. So Varun and I used to work together back in the day. I was not actually working on the ML infrastructure side of things - that was something Varun was hands-on with. But I decided to also join the team at Exafunction.

And yeah, as Varun mentioned, about a year ago I think we noticed - I think three things kind of happened at the same time that led us to Codeium. The first one is that we're engineers, all of us are engineers, and we had all tried GitHub Copilot and all these cool AI tools for code in their beta, and we're like "Wow, this is absolutely gonna be the future of software development." But at the same time, it's still scratching the surface of potentially everything that we do as engineers. So that was, I think, number one that we realized then.

Number two was, you know, talking to a lot of our friends at these bigger companies, a lot of them were just saying "Oh yeah, it's cool. I've tried it for my personal project, but I can't use it at work. My work is not allowing me to use that." So that was the second thing we heard.

The third thing was exactly what Varun alluded to - we were building ML infrastructure at scale for really large workloads. When this entire generative AI wave started coming, we're like "Wow, we're actually kind of sitting on the perfect infrastructure for this." So I think all those three things combined for us to be like "You know what, let's build out an application ourselves - an application where we as engineers are the customers", and that ended up becoming Codeium.

As you were getting into GPU software, what in general were some of the challenges you were seeing? NVIDIA has their various software supporting things like that… Clearly, you saw that there was a need for something beyond it. Can you talk a little bit about the landscape you saw before you got to all the generative stuff, and the fact that you had infrastructure? What positioned you for that, and what was the thing you decided you needed to address?

Maybe I can take a step back on why these GPU workloads are just a little bit annoying compared to CPU workloads…

[05:33] One of the really unique things about GPUs is that, unlike CPUs, they're kind of tricky to virtualize. One common thing we do with CPUs is put a bunch of containers on a single VM, and then you can make use of the CPU compute effectively. You can basically dump 10 applications onto a CPU and it's perfectly fine. For a GPU, it's a little bit more messy, because the GPU doesn't have a ton of memory. So you can't just load up infinitely many models on there. Let's imagine you have a GPU with 16 gigs of memory, and each of these models takes like 10 gigs. You can't really even put two applications on there. So that already becomes a big issue. And that's what a lot of these large deep learning workloads were struggling with.
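As a rough, illustrative sketch of the memory arithmetic Varun is describing - the card size, model sizes, and overhead figure below are made-up numbers, not anything specific to Nuro or Exafunction:

```python
# A GPU can only co-host models whose weights (plus some working space) fit in
# its memory at the same time, unlike CPUs, where containers can be stacked
# much more freely.

GPU_MEMORY_GIB = 16  # e.g. a 16 GiB card


def fits_on_gpu(model_sizes_gib, overhead_gib=1.0):
    """Return True if all models fit at once, leaving an assumed buffer for
    activations and workspace (overhead_gib is a rough, illustrative number)."""
    return sum(model_sizes_gib) + overhead_gib <= GPU_MEMORY_GIB


print(fits_on_gpu([10.0]))        # True  -> one 10 GiB model fits
print(fits_on_gpu([10.0, 10.0]))  # False -> two of them do not
```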

When I was at Nuro, one big problem was that we had tens of models, but we had workloads that needed hundreds of GPUs - some of them even thousands of GPUs. And we struggled to even use the hardware properly. And then you can imagine the complexity stacks, since now we're in a state where companies have trouble even getting access to 10 GPUs, because of NVIDIA scarcity issues. And then also the cost of a GPU is not like a CPU; it's significantly more expensive. The cost of a single H100 chip is well over $30k. So these aren't cheap chips. So there was a big need at the time to figure out how do we leverage the hardware properly? …and that's what we had to build software for.

And just to clarify for me - was that while you were still at Nuro, or was that after you started Exafunction?

Yeah, so while I was at Nuro I led a team that built software that fixed these problems… But Exafunction was focused on, generically, how do we make sure deep learning-based applications can best leverage GPUs? That's what we started out building, actually. And then Codeium came out of that.

Gotcha. Tell me a little bit about - as you have been right in the middle of this progression, just to frame it for a second: if you look at the last couple of years in particular, the pace of change has been so fast… And you were right there, starting at Nuro, and then creating Exafunction, seeing some of the challenges… Could you talk a little bit about how the industry was evolving and changing as you were seeing it, so that we can get a sense of how you moved toward Codeium? To give a little bit of the history, instead of just starting from where that is. Can you talk a little bit about the itches you were scratching, and why it led in that direction? What did this AI industry look like to you?

Yeah, so when we started, you can just imagine, everything was a lot smaller-scale, right? The hyperscalers or the cloud providers just didn't have nearly as many GPUs. If you asked them what fraction of cloud spend was GPU spend, it was probably a very small, single-digit percentage, maybe even less than that at the time. So this was a very small workload for them when we started. Both Anshul and I started at Nuro in like 2018. But then over time, this grew a ton. We could see it from the training workloads. These were no longer even single-node training workloads. Back in the day, a single GPU node that had maybe eight V100s was considered a lot of compute. And suddenly we were able to witness that this was slowly becoming eight-A100 nodes, and then more than eight of those nodes were necessary to train these models.

And similarly, to prove out that these models were capable in an actual production setting, you needed to run offline testing at massive scale - on the order of 5,000 to 10,000 T4s, which is kind of incredible in terms of raw flops. So we were able to see this hockey stick happen in front of us, and that's what made us want to start Exafunction in the first place. We realized that there were going to be large deep learning workloads.

One interesting fact: for just the Exafunction GPU virtualization software that we ended up selling to enterprises, we ended up managing over 10,000 GPUs on GCP, in a single GCP region. So we ended up managing more than 20% of the GPUs in that region. And we realized that "Hey, this is only going to keep growing." When we talked to the cloud providers, they were only going to keep growing the number of GPUs, and we realized - I guess the interesting thing was that in the future, generative AI was going to be potentially the largest GPU workload. That was the big thing we realized once GPT-3 came out, which was I guess in 2021 now.

Gotcha. But at that point were you already at Exafunction? Had it already started?

Yeah, it had already started, and we were sort of selling GPU virtualization software to large autonomous vehicle and robotics companies.

[10:09] Gotcha. And so basically, if I'm understanding you correctly, the whole generative tsunami just kind of landed on you when you were already sitting in that space, doing GPU virtualization. So you just managed to land right in front of the wave, it sounds like.

Yeah. So we started working on Codeium maybe four or five months before ChatGPT. It was interesting, just because we realized that an application like GitHub Copilot was going to be one of the largest GPU workloads, period. I don't know if - you've probably tried the product out. Every time you do a key press, you're going out to the cloud and doing trillions of computations. So it's a massive workload. And we had, as Anshul said, the perfect infrastructure to run this at enormous scale. Not to mention we were in love with the product from day one. We were early users of the product the moment it came out in 2021.

Very cool. So as generative AI is starting to take off, with ChatGPT hitting the world and really changing things quite rapidly… I think people are still shocked at how fast things have moved. You had started Codeium already… What kind of synergy were you starting to see there, in terms of knowing that you had one of presumably many, many GPTs coming, and other similar generative models? You had just gotten into Codeium… Can you talk a little bit about what that was, and what you were putting together in your minds to recognize the opportunity?

Yeah, so I think one of the great things about the entire ChatGPT wave is that everyone was using it. This is a thing where literally every individual is using AI. And so it helped us, in general. A big wave raises all ships kind of thing. It really helped us. We weren't really going out and telling people "Hey, a tool like Codeium can help productivity", because that was kind of just assumed by everybody now. Like "Oh yeah, if I do any kind of knowledge work, then there's potential for AI to help." So in that sense, when this entire ChatGPT wave really came about, it overall just helped us in terms of convincing people to even try the product.

The other thing that we recognized is that we were positioning ourselves very specifically from the beginning when it comes to code. Code is actually a very interesting modality. It's not like your standard ChatGPT, where you have a long context that a user puts in, and then it produces content coming out. Code is interesting in the sense that, as we mentioned, autocomplete is a passive AI, rather than an AI that you're actually instructing to do something. It's happening every keystroke, so it has to be a relatively small model. You can't have these hundreds-of-billions-of-parameter models being used. It has to be relatively low latency.

And then code itself is interesting, right? If your cursor is ever in the middle of a code block, the context both before and after your cursor really matters. It's not just what comes before. So there are all these interesting situational constraints about code - you put all these things together and realize that, okay, all these ChatGPT waves and conversational AIs are happening, that's great, but we're still not going to get rolled over by that, because we're focusing on a very specific application and modality, through a lens that was pretty unique in many ways.

Break: [13:31]

Could you take a moment, as we're diving into Codeium and generative AI and its unique capabilities, and differentiate a little bit - you know, so many people have tried Copilot, so it's kind of inevitable that you're gonna get that comparison to some degree… Can you talk a little bit about what Copilot's not doing for generative AI, or how you're approaching it in a way that allows you to show people this is a better way forward, from your perspective?

I mean, we have tons of respect for the Copilot team. I'm just gonna start there. As Varun said, we were all early users of it…

Definitely not putting you into conflict with them. It's just a starting point for people…

[14:24] Absolutely, yeah. The way we view this, like I alluded to earlier - writing brand new code with autocomplete is really just one small task that we do as engineers. We refactor code, we ask for help, we write documentation, we do PR reviews… And so our general approach has always been "Let's try to build an AI toolkit, rather than an AI autocomplete tool."

So we can get more into the weeds here, but autocomplete is just one of the functionalities we provide. We provide an in-IDE chat, something like ChatGPT, except integrated with the IDE… Natural language search over your codebase, using embeddings and vector stores in the background… So we're really trying to expand how we can address the entire software development lifecycle. I think that's probably the most obvious difference with a tool like Copilot, from an individual developer point of view.
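As a generic sketch of what "natural language search over your codebase using embeddings and a vector store" can look like - the `embed` function below is a stand-in (a real system would call an embedding model), and nothing here reflects Codeium's actual internals:

```python
import numpy as np


def embed(text: str) -> np.ndarray:
    # Stand-in embedding: a pseudo-random unit vector per string (consistent
    # within one run). A real system would use a learned embedding model here.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(256)
    return v / np.linalg.norm(v)


# A tiny in-memory "vector store": one embedding per code chunk.
code_chunks = {
    chunk: embed(chunk)
    for chunk in [
        "def parse_config(path): ...",
        "class RetryPolicy: ...",
    ]
}


def search(query: str, top_k: int = 1):
    # Rank chunks by cosine similarity (vectors are already normalized).
    q = embed(query)
    scored = sorted(((float(q @ v), chunk) for chunk, v in code_chunks.items()), reverse=True)
    return scored[:top_k]


print(search("where do we read configuration files?"))
```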

But then the other thing, which really builds off of all the infrastructure that Varun was mentioning earlier, is that we were already deploying ML infrastructure in our previous customers' private clouds. We already had all this expertise of "How can we take actual ML infra and deploy it for a customer in a way that they can fully trust the solution, because we're not getting any of their data?"
And so another really big differentiator for us was - okay, this might actually be a tool that enterprises can use confidently and safely, because we have the infrastructure to do the deployment in a manner that they would be open to using. So I think that was the other differentiator when it came specifically to enterprises. But we can dive more into that later.

No, that sounds good. I want you to connect one more thing for me… Going from being able to deploy the infrastructure and helping your customers in that way, to Codeium as a tool - what's the leap there that got you from one to the other? How did you get from infra-focused to Codeium-focused?

Oh yeah, I think we had to do a full 180 when we started. We went from a full inference service company to "Let's create a product for consumers." It was a full 180 in terms of product…

Yeah, to some degree a pivot, because we knew that eventually we'd deploy to customers' VPCs. That sounds great. But if we're going to ship something to a customer, we need to be super-confident that it's a product that works well… Because we're getting no feedback from their developers. And so we actually first focused, for the first six or seven months of Codeium, on just building out an individual tier. Any developer can go try it, we can see how they like it, try our new capabilities, get feedback from an actual community… Do all these community-building things that we hadn't really done as an infra-as-a-service company. That was a really huge focus for us, and we've grown our Codeium individual plan to over 100,000 active developers using us for many hours a day, because you code for that long if you're a developer. That's plenty of feedback for us. Plenty of people actually using the tool, telling us "Yeah, this is good. This isn't good. Oh, you tried pushing a new model? That's worse." All those things we actually learned, so that we can get a product that's good. So that was the intermediate period - really learning from actual developers what is a good product and what is not. I think that's always going to be a key part of our development cycle.

You're coming into this with rich knowledge of infrastructure for customers… That's a huge area of expertise. And even though you're moving forward into kind of the Codeium era, if you will, in my words, it's a skill set and level of expertise that very few organizations have deeply. How did that inform Codeium, and differentiation against - whether it be Copilot, or other tools that are out there, or just developers throwing things into ChatGPT? What did that background give you that provided that differentiation in the marketplace?

[18:13] Yeah. So I think when we started, the thing we started with is "No one cares if we have better infrastructure once you're a product. If we have better infrastructure, that's great, but if that makes a product that's the same, no one should care."

They'd just assume that you should.

Yeah. So what we started with is we set a very high bar for ourselves. Codeium is an entirely free product. So for the individual user, it's something they can install and use immediately for free. There are no limits at all. So when it comes to autocomplete, you can use it as much as you want. And this, by the way, has forced us to make the infrastructure as efficient as possible.

Just to give you a sense of the numbers we're talking about here, we process over 10 billion tokens of code a day. That might just sound like a large number, but that's over a billion lines of code a day that we process for our own developers, and we're forced to do this entirely for free. And then on top of that, we probably have one of the world's largest chat applications also, because it's in-IDE as well. And all of this put together has allowed us to build a very, very scalable piece of infrastructure, such that we're the largest users of our own product. We are the largest users of our own product, we learn the most from our users, and we can then take those learnings and deploy them in a very cost-effective, very efficient and optimized way to our own enterprise users. It's one of those things where we force ourselves to learn a lot from the individual plan, and then take all those learnings and actually bring them over to the enterprise. And a lot of the learnings we were only able to make because we placed very - I would say annoying - infrastructure constraints on ourselves, by saying "Hey, you guys have got to do this entirely for free, basically." And we're committed to that - Codeium is going to be a free product forever, actually. The individual plan will always be free. And it's one of those things where our users are just always like "How are these guys even doing it? What are they even doing to make this happen?" And most of our users, by the way, are users that have churned off of Copilot. We have spent very little, if anything, on marketing. So it's one of those things where our users are like "How do they make this free?" We take the approach of - we think some of the best products in the world are free. Products like Google are entirely free. Google doesn't tell you all the time that they have the best infrastructure, but they do have the best infrastructure. It just so happens that it shows itself off in the best product. And we could talk a little bit more about how we take our focus on infrastructure and make a much better enterprise product as well, but that's the way we look at it… It's like, how do we deliver materially better experiences with our infrastructure? …and our users shouldn't care that we actually did that.

You've brought it up, you've got to go there now, man… Go ahead and dive right into it.

I guess one of the interesting things - just going into how we run one of the world's largest LLM applications - is that that focus forced us to make sure that, given a single piece of compute, let's say a single node or a single box of GPUs, we can host the largest number of users on it. So let's say a large company comes to us - they can be confident that whether they're on-prem or in their VPC, we can give them a solution where the cost of the hardware is not going to dominate the cost of the software itself. Because right now, there's kind of this misunderstanding that GPUs are really expensive. Which is true, they are. But the trade-off is they have a lot of compute. Modern GPUs like A100s can do 300 teraflops of compute, which is some ungodly number, right? That's a crazy number compared to what a modern CPU can do. And we can leverage that the best. And we've been forced to do that. If we didn't do it properly, we'd have outages with our service all the time. Because of that, enterprises trust us to be the best solution to run in their own tenant, in an air-gapped way… Which is fantastic, because that's the way we can build the most trust and deploy these pieces of technology to them most effectively, because they don't want to ship their code outside the company.

Anshul can talk a little bit more about how we leverage things like fine-tuning as well. That's a purely infrastructure problem that's very unique to us, versus any other company. Anshul, do you want to take that?

[21:59] Yeah, I think - as Varun said, there's a lot that we do from the individual infrastructure point of view, so that we can do crazy things like make it all free for all of our individual users… But once you actually self-host, there's a lot of things you can do that any other tool just can't do without being self-hosted. And what Varun just mentioned is personalization. If you're fully hosted in a company's tenant, you can use all of their knowledge bases to create a substantially better product.

I think the way we generally think about it is that you have a generic model that's good - it's learned from trillions of tokens of code in the public corpus… But if you think about any individual company, they themselves have hundreds of millions of tokens of code that has never seen the light of day. And that's actually the code that's the most relevant for them if they want to write any new code. Think of all the internal syntax, semantics, utility functions, libraries, DSLs, whatever it might be. A model like Copilot's or Codeium's, by the nature of it having to be low-latency, can only take about 150 or so lines of code as context. So this is not one of those ChatGPTs or GPT-4s where you're putting in files and files of context. What you can put in is really small, and so there's really no way for a single inference to have full context of your codebase without actually fine-tuning the base model that we ship to them on all their local code.
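To make that context constraint concrete, here is a minimal, hypothetical illustration of trimming a file down to a small window around the cursor before a completion request; the line budget and the prefix/suffix split are illustrative, not how Codeium actually does it:

```python
def build_context(lines, cursor_line, budget_lines=150):
    """Keep roughly two thirds of the budget before the cursor and one third after;
    everything outside this window is simply invisible to a single inference."""
    before = lines[max(0, cursor_line - (2 * budget_lines) // 3):cursor_line]
    after = lines[cursor_line:cursor_line + budget_lines // 3]
    return before, after


file_lines = [f"line {i}" for i in range(10_000)]  # a large file
prefix, suffix = build_context(file_lines, cursor_line=5_000)
print(len(prefix), len(suffix))  # ~100 and ~50 lines; the other ~9,850 are dropped
```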

So we've actually done a bunch of studies on how this massively reduces hallucinations, and all these other things that you always hear coming up with LLMs. But things like this, things like providing more in-depth analytics - all of them come from being self-hosted. And as Varun mentioned, these are all, at the core, to some degree an infra problem. How do you actually do fine-tuning locally, in a company's tenant? That's an infra problem we're happy to talk more about, but maybe I'll just… I'll pass it back to you, Chris.

Actually, I'm about to ask a follow-up about that, because you've got me really thinking about some of the use cases in my own life. So with the self-hosting model - kind of like with OpenAI and ChatGPT-4, there's only so far we're gonna go, because we've used the public corpus of knowledge out there on the internet, so there's only so much more vertical scaling you can do on the model learning… And you're touching on the fact that there's so much hidden IP in code, hidden information in code that is of huge value, particularly to the company that it's in, because it represents their business model, and the way their business has evolved over time. And so if I'm understanding you correctly, you're basically saying that your solution can take advantage of that on their behalf, and really hone against it.

What are some of the limits on privacy? Are they able to do that? Because that's a big topic. We've actually talked about it on the show before - in this generative AI age, with IP concerns and privacy concerns, and getting the lawyers involved… Are you able to do the training on their site, and keep it entirely with the customer? Or do they have to let their IP out? How do you approach that problem?

Yeah, so for any question like "Does any IP leave the enterprise and go to Codeium?", the answer is always no. So in pretty much every part of the system, our guarantee is to be able to deploy this whole thing fully air-gapped. We've even deployed in places like AWS GovCloud, which is an "it doesn't even have a connection to the internet" kind of scenario. So nothing ever leaves, to address some of the points you brought up there, Chris. And we're not the only ones saying "Oh no, the data that a company has privately is super-important" - it's potentially even more important than the size of the model.

[25:46] I think a good example of this is actually Meta. Instead of using a GitHub Copilot, or any generic system, they decided - I guess in classic Meta fashion - to train their own autocomplete model internally, using all of their code. And they actually published a paper, I think, a few weeks back. And their model was, in terms of size, I think 1.3 billion parameters - small with respect to the LLM world. And it just massively outperformed GitHub Copilot on pretty much every task. There's now corroborating evidence for what we're saying about fine-tuning - that doing this actually does lead to materially better performance for the user in question.

Now, is that Meta model going to be good for everyone else's code? Probably not. But that's also not the point. And in terms of being able to fine-tune locally - yeah, we're able to do this completely locally. And again, it comes down to scale of data. Our base model has been trained on trillions of tokens of code. That's a lot. That's why we need this multi-node GPU setup to do all that training. But an actual company - if they have, say, even 10 million lines of code, that's about 100 million or so tokens. There's still a huge, orders-of-magnitude difference between this pre-training and the fine-tuning, which is why we can do this locally, on - actually, surprisingly - whichever hardware they choose to provision for serving their developers.
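The back-of-the-envelope arithmetic behind that gap, assuming roughly 10 tokens per line of code (the ratio implied by "10 million lines is about 100 million tokens") and taking "trillions of tokens" as 1e12 for illustration:

```python
import math

TOKENS_PER_LINE = 10                                  # assumed ratio, implied by the numbers above

company_lines = 10_000_000                            # 10 million lines of company code
fine_tune_tokens = company_lines * TOKENS_PER_LINE    # ~1e8 tokens
pretrain_tokens = 1_000_000_000_000                   # "trillions", taken as 1e12

gap = math.log10(pretrain_tokens / fine_tune_tokens)
print(f"fine-tuning corpus:  ~{fine_tune_tokens:.0e} tokens")
print(f"pre-training corpus: ~{pretrain_tokens:.0e} tokens")
print(f"difference: ~{gap:.0f} orders of magnitude")  # -> 4
```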

So again, this comes back to some of our infra background and all the stuff we know how to do - we can actually do fine-tuning and inference on that same piece of hardware. So we don't ask companies to provision more hardware. And even more critically, we are able to do fine-tuning during any idle time on that GPU. So whenever that GPU is not being used for inference, it's actually doing backprop steps to continuously improve the model.
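A minimal sketch of that scheduling idea - inference requests always win, and a fine-tuning step only runs when nothing is waiting. `run_inference` and `fine_tune_step` are placeholders, not real Codeium APIs:

```python
import queue
import time

requests: "queue.Queue[str]" = queue.Queue()


def run_inference(prompt: str) -> str:
    return f"completion for {prompt!r}"   # stand-in for serving the model


def fine_tune_step() -> None:
    time.sleep(0.01)                      # stand-in for one backprop step


def serve_loop(iterations: int = 10) -> None:
    for _ in range(iterations):
        try:
            prompt = requests.get(timeout=0.05)  # serve developers first
            run_inference(prompt)
        except queue.Empty:
            fine_tune_step()                     # GPU would be idle: train instead


requests.put("def parse_config(")
serve_loop()
```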

Fine-tuning is just one aspect of a larger personalization system… But we've instrumented all of this on their hardware, using our inference infrastructure, to create a system that is relatively easy to manage; it's not a crazy amount of overhead for any company to manage or use Codeium… But they still get the maximum possible wins from these AI tools.

Okay, so that is super-cool. And you mentioned things like GovCloud, which I have actually worked in quite a bit in my day job, and I can think of a whole bunch of other use cases for me personally… Which begs the question - going back for a moment, because we are Practical AI, and we always like to give people some practical routes in… So if we go back toward the beginning of the conversation for a moment - we have some folks listening right now who have been using Copilot for a while; they're probably putting code into ChatGPT and trying to accelerate there, with varying degrees of success… They've been experimenting with Bard, and Bard has gotten better on code lately, obviously… So many people that I talk to are still very frustrated with the workflow of the whole thing… And recognizing that you've outlined these differentiators from Copilot and other competition out there, in a friendly-competition kind of way… Talk a little bit about some of the specific generative AI use cases that would be good if someone is in that position where they're like "Yeah, I'm using this stuff, but I'm a little bit frustrated with it. I don't have it down." If they were to give Codeium that chance and dive in on it, can you give me several - kind of lay out the use cases of what they're going to get when they move in, from a very practical, for-me-now-as-the-coder perspective? What does that look like? What are they gaining? And maybe give me a couple of different ones, because I'm really curious. And selfishly, I'm probably gonna try each of these that you're telling me… So I'm scratching my own itch by asking the question.

I think you pointed out - yeah, workflows and the user experience for a lot of AI tools… Everyone's still kind of trying to figure it out. We're still in the very early days of these AI applications. And this is one of our learnings as a product company now - we're actually taking the UX quite seriously, and this is actually what the individual plan was created to get feedback on.

[29:53] Very concretely, I think a lot of people have that frustration of having to copy a code block over to ChatGPT, write out a full prompt, remember the exact prompt they typed in before that gave them a good result, then copy the answer back in, and then make modifications… That workflow is clearly kind of broken. So when we built our chat functionality into the IDE, we're like "Okay, what are all the parts here that can get totally streamlined?" So we did things like, on top of every function block, there are little code lenses - these small buttons that someone can click, like "Explain this function." And it'll automatically pull in all the relevant context and open it up in the window; you're not copying anything over… And it's like writing [unintelligible 00:30:33.07]

Or if you, say, refactor a function, or add docstrings, or write a unit test - these are all just small little buttons, or preset prompts, that you can click, and it'll do its generation on the side. And then we even have a way of clicking "apply diff". And because we know where we pulled the context in from, we can apply a diff right back into that context. So you're not copying things back and trying to resolve merge conflicts. All of these things are done automatically.

So there's a lot of really cool things you can do when you start bringing these things into the IDE where developers are, and we've spent a lot of time really thinking, as you said, from a workflow point of view - how do you make this super-smooth?

Varun, could you talk a little bit about maybe some specific tasks that you're seeing people doing? When we talk about generative AI, it has expanded beyond LLMs - we're doing things in video, we're doing things in natural language… All of the different modalities are gradually being addressed with these different models, and different tools being built around them. Could you talk a little bit about what people are trying to code right now, and what specifically Codeium is helping them with - not just about Codeium, but the actual use cases themselves, so that they go "Ah, I can see a path forward. I can do that. I know how to generate this or that or the other with generative AI and coding"? Can you talk a little bit about those at a somewhat specific level?

So interestingly, just a little bit about multi-modality - I think we're maybe a little bit far from leveraging other modes beyond text for code. I think maybe that will happen, but there's not enough evidence right now yet. Just to be open about the functionality we have - we have autocomplete, we have search, and we have codebase-aware chat. And we recognize right now that autocomplete accounts for 90% to 95% of the usage of the product. It's because chatting is not something people do every day, potentially. They might open it up once every couple of days, but autocomplete is something that's always on, very passively helpful, and people get the most value out of it, which is kind of counterintuitive. I think people don't recognize that immediately. But when people are using autocomplete, we've recognized there are two modalities to the way people type code. There's a modality of accelerating the developer, which is "Hey, I kind of know what I'm going to type, and I just want to tab-complete the result", and then there's also an exploration phase, which is "I don't even know what I'm trying to do."

Based on that - this is a classic thing where my behavior writing code has materially changed because of tools like Codeium: I'll write a comment, and I kind of just hope and pray that it pulls in the right context, so that it gives me the best generation possible. So in my mind, for the acceleration case, Codeium is very helpful. It can autocomplete a bunch of code. But the exploration case - that's where the true magical moment comes in, where I had no clue how I was going to use a bunch of these APIs… And that's what we're focused on trying to make really better, whether that be in chat or with autocomplete - how do we make it so that we can build the most knowledgeable AI, that is maximally helpful and also minimally annoying?

The interesting thing about Codeium as a product, or these autocomplete products generally, is they take a little bit of getting used to; but even when they write the wrong thing, it's not very annoying, because you can very easily just say, "I don't want this completion." It didn't write an entire file out that you need to go and correct a bunch of functions in. It was a couple of lines, or maybe 10 lines of code; you can very easily validate that it's correct.

[34:02] That comes back to what Anshul was saying, which is "How do we make sure we can always provide the maximally helpful AI agent?" The answer is "Have the best context possible." And a couple of nitty-gritty details: currently, our context - and we'll write a blog post about this - is double what Copilot's is. We allow double the amount of context for autocomplete compared to what they do.

The second thing is, we're able to pull context from throughout the codebase. And it's that same piece of technology - pulling context from throughout the codebase for search and all these other functionalities - that gets used as part of chat, for codebase-aware chat, which is something Copilot doesn't even have today.

The third piece is - finally, for large enterprises - how do we make it so that these models actually semantically understand your code? …which is where fine-tuning comes in. For us, context gets us a lot of the way, but it doesn't get us all the way. Because you can imagine, even with double the context - let's say we can pass in 1,000 lines of code - for a company with 10 million lines, we're still scratching four orders of magnitude less code than the company actually has. So our vision is we want to continually ramp up the amount of knowledge these models have, and the ways in which they can be helpful. I don't know if that answered the question there…

It did, actually. Your acceleration versus exploration analogy - for me personally (different people get different things), that really clarified where I might be using Copilot, or where I would go and use Codeium… Because I do struggle on the exploration side myself. It's a lot easier on the acceleration at the end of the line [unintelligible 00:35:32.09] and crank through that fast, which I've been able to do with these other tools… But I have struggled on the exploration side… Because I kind of want to do a thing, and I'm kind of trying to figure it out, and I'm just going to see where my fingers lead on that… And having the ability to support that in the way you described - that gave me a very clear understanding from my standpoint.

So I'd like to ask each of you where this is going, both in the large and in your specific concern with Codeium. Things have never moved faster than they're moving right now in terms of how fast these technologies are progressing… And Daniel and I have a habit - we were commenting on our last episode about this - of saying "Yeah, we recently mentioned this thing, and that we'd get to it", but then we turn around and end up talking about how we got there way faster than we ever anticipated.

With the speed of generative AI, and you're already creating these amazing tools and having to stay out front - where is your brain taking you at night, when you stop and chill out and have a glass of wine or whatever you do, and you're just pondering "What does the future look like?" I'd like to know both from your own specific personal standpoints, in terms of your product, but also the generative AI world in general - how do you see it going forward? I'd love your insights.

Yeah, I think the classic question in the grand scheme of things is "Oh my God, is generative AI just gonna totally get rid of my job, or completely invalidate it?" And I think for us, we will be the first people to say that we do think AI will just be the next step in a series of - at least in code - tools that have made developers more productive; that have let them focus on the more interesting parts of software development… And be an assistant, right? All these tools are called AI assistant tools, I think, for a reason.

We're definitely not at a place yet - and I don't think we will be for a while - where there isn't going to be a human in the loop, in control, guiding the AI on what to do. So in that respect, the doomsday scenario - and I don't want to speak for Varun, but I think we're pretty far from that mentality. But we do think - I mean, we wouldn't have gotten into Codeium if we didn't genuinely think that there are just so many things we do day to day as engineers that are a little frustrating, boring, take us out of the flow state, slow us down… Those all seem like very prime, ripe things to try to address with AI. And I think that's kind of our general goal.

[38:04] I think there's a lot more capabilities to build. I don't think search and chat are going to be the last building blocks we build; we have more capabilities coming up that we're super-excited about. But yeah, it's also going to be a thing where, as you said, this is moving super-quickly. We have research, open source, and applications all developing at the same time, at breakneck speed… And so part of what we're also looking forward to is how we can educate all these software developers on the best way to use these AI tools. How do you make the most use of them, so that developers are part of the wave, and they also get a lot of value?

Well said. Varun?

Yeah, maybe if I was to just say - you were asking me what the big worry is. For me, the big worry is there's going to be a lot of exciting new demos that people end up building… And obviously, for us as a company, we need to make strategic bets on "Hey, this is a worthwhile thing for us to invest in."

For instance, I think a couple of months ago there was an entire craze about agents being able to write entire pieces of code for you, and all these other things. For us though, we had lots of enterprise companies using the product at the time, and we recognized that the technology just wasn't there yet. Take a codebase that's 100 million lines of code, or 10 million lines of code. It's gonna be hard to write C++ that spans five files, that compiles perfectly, and that also uses all the other libraries, when your context is only five files. It's not going to be the easiest problem. And I think that's maybe an example… But for us - just a pat on the back - over the last eight months we've iterated significantly faster than every other company in this space, just in terms of functionality… But we need to make strategic bets on what the next thing to work on is at any given point. And we need to be very careful about - hey, this is a very exciting area, but is it actually useful to our users? A great example is, given a PR, we generate a summary. I think Copilot has tried building something like this. And we tried using the product Copilot had, and it was just wrong a lot of the time. And I think that would have been an interesting idea for us to pursue and keep trying to make work… But then there are diminishing returns, and I think Anshul and I have seen this very clearly in autonomous vehicles, where we had a piece of technology that was just not there yet. Like, it needs a couple more breakthroughs in machine learning to get there… And the idea of building it five years in advance - you shouldn't be doing that. You just 100% shouldn't be building a tool when the technology isn't there yet. And that is what keeps me up at night - "What are the next things we need to build?", while keeping in mind what the technological capability set is today, if that makes sense.

It does, and it's a very Practical AI perspective, if you will. So very fitting final words for the show today. Well, Varun and Anshul, thank you very, very much for coming on the show. It's fascinating. I got a lot of insight and a lot of new things to go explore from what you've just taught me, and I appreciate your time. Thank you for coming on.

Thanks a lot, Chris.


Our transcripts are open source on GitHub. Improvements are welcome. 💚

