A developer's toolkit for SOTA AI
source link: https://changelog.com/practicalai/231
Transcript
Changelog
Welcome to another edition of the Practical AI podcast. My name is Chris Benson, I'm your co-host today. Normally we would have Daniel Whitenack joining us, but Daniel has just gotten off a plane - he flew halfway around the world - and we decided to give him a break today. He was more lucid than I would be in the same situation.
Today I wanted to dive right in… We have a super-cool topic. It is not dissimilar from some of the other general things we've been talking about, but I have two guests today. I'd like to introduce Varun, who is the CEO and co-founder of Codeium, and Anshul, who leads their enterprise and partnerships work. Welcome to the show, guys.
Thanks for having us.
Thanks for having us, Chris.
You're welcome. I'm really interested in learning more about Codeium. When Daniel lined you guys up, he sent me this thing saying "You've got to look at this. This is really cool", and everything. And I'm like "Get him on the show." He's like "I'm already doing that." So really glad to have you guys on, and he's going to be bumming that he missed the conversation, because he was pretty excited about it…
So I guess I wanted to, before we even dive into Codeium and the problems it's trying to solve and such, if you guys can each just tell me a little bit about how you've found yourself arriving at this moment - kind of a little bit about your background, how you got into AI, and how this became the thing. Varun, if you want to kick off, and then Anshul afterwards.
So maybe I can get started… It actually starts in 2017; I started working at this company called Nuro, that does autonomous goods delivery. So it's an AV company. There I worked on large-scale offline deep learning workloads. As you can imagine, an autonomous vehicle company needs to run large-scale simulation; they need to basically be able to test their ML models at scale before they can actually deploy them on a car.
In 2021 I left Nuro and started Exafunction, which is the company that is building out this product Codeium. And Exafunction started out building GPU virtualization software. So you can imagine, for these large-scale deep learning applications, one big problem is GPUs are scarce, they're expensive, and also hard to program. What Exafunction started building was solutions and software to make it so that applications that ran on GPUs were more effectively using the GPU hardware. And we realized that our software at Exafunction was best applicable to generative AI tech, and started building out Codeium around a year ago.
Very cool. And before I dive in, because I have several questions for you… But I want to give Anshul a chance to introduce himself here. Go ahead, Anshul.
Surprisingly, my story is actually quite similar. I was also working at Nuro, so Varun and I used to work together back in the day. I was not actually working on the ML infrastructure side of things - that was something Varun was hands-on with. But I decided to also join the team at Exafunction.
And yeah, as Varun mentioned, about a year ago three things kind of happened at the same time that we noticed, that led us to Codeium. The first one is that we're engineers - all of us are engineers - and we had all tried the GitHub Copilots, and all these cool AI tools for code in their betas, and we're like "Wow, this is absolutely gonna be the future of software development." But at the same time, it's still scratching the surface of potentially everything that we do as engineers. So that was number one that we realized then.
Number two was, you know, talking to a lot of our friends at these bigger companies, a lot of them were just saying "Oh yeah, it's cool. I've tried it for my personal projects, but I can't use it at work. My work is not allowing me to use that." So that was the second thing we heard.
The third thing was exactly what Varun alluded to - we were building ML infrastructure at scale for really large workloads. When this entire generative AI wave started coming, we're like "Wow, we're actually kind of sitting on the perfect infrastructure for this." So all those three things combined for us to be like "You know what, let's build out an application ourselves - an application where we as engineers are the customers", and that ended up becoming Codeium.
As you were getting into GPU software, what in general were some of the challenges you were seeing? NVIDIA has their various software supporting things like that… Clearly, you saw that there was a need for something beyond that. Can you talk a little bit about the layout of the environment as you saw it before you got to all the generative stuff, and the fact that you had infrastructure? What positioned you for that, and what was the thing you decided you needed to address?
Maybe I can take a step back on why these GPU workloads are just a little bit annoying compared to CPU workloads…
[05:33] One of the really unique things about GPUs is that, unlike CPUs, they're kind of tricky to virtualize. One common thing we do with CPUs is put a bunch of containers on a single VM, and then you can make use of the CPU compute effectively. You can basically dump 10 applications onto a CPU and it's perfectly fine. For GPUs, it's a little bit more messy, because the GPU doesn't have a ton of memory. So you can't just load up infinitely many models on there. Let's imagine you have a GPU with 16 gigs of memory, and each of these models takes like 10 gigs. You can't really even put two applications on there. So that already becomes a big issue. And that's what a lot of these large deep learning workloads were struggling with.
When I was at Nuro, one big problem we had was we had tens of models, but we had workloads that needed hundreds of GPUs - some of them even thousands of GPUs. And we struggled to make it so that we were even able to use the hardware properly. And you can imagine the complexity then stacks: now we're in a state where companies have trouble even getting access to 10 GPUs, because of NVIDIA scarcity issues. And the cost of a GPU is not like a CPU; it's significantly more expensive. The cost of a single H100 chip is well over $30k. So these aren't cheap chips. So there was a big need at the time to figure out how to leverage the hardware properly… and that's what we had to build software for.
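The packing constraint Varun describes - GPU memory, not compute, limits how many models you can co-locate - can be sketched in a few lines. This is an illustrative back-of-the-envelope helper, not anything from Codeium; the runtime-overhead figure is an assumption.

```python
# Illustrative sketch of the GPU packing constraint described above:
# memory, not compute, often caps how many models one GPU can host.
def models_that_fit(gpu_mem_gb: float, model_mem_gb: float,
                    runtime_overhead_gb: float = 1.0) -> int:
    """How many copies of a model fit in GPU memory, after assumed runtime overhead."""
    usable = gpu_mem_gb - runtime_overhead_gb
    return max(int(usable // model_mem_gb), 0)

# The example from the conversation: a 16 GB GPU and 10 GB models.
print(models_that_fit(16, 10))  # only one model fits, so half the memory sits idle
print(models_that_fit(16, 3))   # smaller models pack far better
```

This is why naive one-model-per-GPU deployment wastes so much of an expensive chip, and why virtualization software that time-shares or right-sizes models pays for itself.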
And just to clarify for me - was that while you were still at Nuro, or was that after you started Exafunction?
Yeah, so while I was at Nuro, I led a team that built software that fixed these problems… But Exafunction was focused generically on how to make sure deep learning-based applications could best leverage GPUs. That's what we started out building, actually. And then Codeium came out of that.
Gotcha. Tell me a little bit about - as you have been right in the middle of this progression… Just to frame it for a second: if you look at the last couple of years in particular, the pace of change has been so fast… And you were right there, starting at Nuro, and then creating Exafunction, seeing some of the challenges… Could you talk a little bit about how the industry was evolving and changing as you were seeing it, so that we can get a sense of how you moved toward Codeium? To give a little bit of the history, instead of just starting from where that is. Can you talk a little bit about the itches you were scratching, and why it led in that direction? What did this AI industry look like to you?
Yeah, so when we started, you can just imagine, everything was a lot smaller-scale, right? The hyperscalers or the cloud providers just didn't have nearly as many GPUs. If you asked them what fraction of cloud spend was GPU spend, it was probably a very small, single-digit percentage - maybe even less than that at the time. So this was a very small workload for them when we started. Both Anshul and I started at Nuro in like 2018. But then over time, this grew a ton. We could see it from the training workloads. These were no longer even single-node training workloads. Back in the day, a single GPU node with maybe eight V100s was considered a lot of compute. And suddenly we were able to witness this slowly becoming eight-A100 nodes, and then more than eight of those nodes being necessary to train these models.
And similarly, to prove out that these models were capable in an actual production setting, you needed to run offline testing at massive scale - on the order of 5,000 to 10,000 T4s, which is kind of incredible in terms of raw flops. So we were able to see this hockey stick happen in front of us, and that's what made us want to start Exafunction in the first place. We realized that there were going to be large deep learning workloads.
One interesting fact: with just the Exafunction GPU virtualization software that we ended up selling to enterprises, we ended up managing over 10,000 GPUs on GCP, in a single GCP region - more than 20% of the GPUs in that region. And we realized that "Hey, this is only going to keep growing." When we talked to the cloud providers, they were only going to keep growing the number of GPUs, and we realized - I guess the interesting thing was that in the future, generative AI was going to be potentially the largest GPU workload. That was the big thing we realized once GPT-3 came out, which was, I guess, in 2021 now.
Gotcha. But at that point you were already at Exafunction - it had already started?
Yeah, it had already started, and we were sort of selling GPU virtualization software to large autonomous vehicle and robotics companies.
[10:09] Gotcha. And so basically, if I'm understanding you correctly, the whole generative tsunami just kind of landed on you when you were already sitting in that space, doing GPU virtualization. So you managed to land right in front of the wave, it sounds like.
Yeah. So we started working on Codeium maybe four or five months before ChatGPT. It was interesting just because we realized that an application like GitHub Copilot was going to be one of the largest GPU workloads, period. I don't know if - you've probably tried the product out. Every time you do a key press, you're going out to the cloud and doing trillions of computations. So it's a massive workload. And we had, as Anshul said, the perfect infrastructure to run this at enormous scale. Not to mention we were in love with the product from day one. We were early users of the product the moment it came out in 2021.
Very cool. So as generative is starting to take off, with ChatGPT hitting the world and really changing things quite rapidly… I think people are still shocked at how fast things have moved. You had started Codeium already… What kind of synergy were you starting to see there, in terms of knowing that you had one of presumably many, many GPTs coming, and other similar generative models? You had just gotten into Codeium… Can you talk a little bit about what that was, and what you were putting together in your minds to recognize the opportunity?
Yeah, so I think one of the great things about the entire ChatGPT wave is that everyone was using it. This is a thing where literally every individual is using AI. And so it helped us in general - a big wave raises all ships kind of thing. It really helped us. We weren't really going out and telling people "Hey, a tool like Codeium can help productivity", because that was kind of just assumed by everybody now. Like, "Oh yeah, if I do any kind of knowledge work, then there's potential for AI to help." So in that sense, when this entire ChatGPT wave really came about, it helped us in terms of convincing people to even try the product.
The other thing that we recognized is that we were positioning ourselves very specifically from the beginning when it comes to code. Code is actually a very interesting modality. It's not like your standard ChatGPT, where you have a long context that a user puts in, and then it produces content coming out. Code is interesting in the sense that, as we mentioned, it's autocomplete - that's a passive AI, rather than an AI that you're actually instructing to do something. It's happening on every keystroke, so it has to be a relatively small model. You can't have these hundreds-of-billions-of-parameter models being used. It has to be relatively low latency.
And then code itself is interesting, right? If your cursor is ever in the middle of a code block, the context both before and after your cursor really matters. It's not just what comes before. So there are all these interesting situational constraints about code. You put all these things together and realize that, okay, all these ChatGPT waves and conversational AIs are happening, that's great, but we're still not going to be rolled over by that, because we're focusing on a very specific application and modality, one that was pretty unique in many ways.
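The "context both before and after your cursor" point is what's commonly called fill-in-the-middle (FIM) prompting. As a hedged sketch of the idea - the sentinel strings below are placeholders, since each FIM-trained model defines its own special tokens, and this is not Codeium's actual prompt format:

```python
# Hypothetical sketch of a fill-in-the-middle (FIM) prompt: the prefix and
# suffix around the cursor are both given, and the model generates the middle.
# The sentinel tokens are made up for illustration; real models define their own.
PREFIX_TOK, SUFFIX_TOK, MIDDLE_TOK = "<fim_prefix>", "<fim_suffix>", "<fim_middle>"

def build_fim_prompt(code_before_cursor: str, code_after_cursor: str) -> str:
    """Arrange prefix and suffix so the model completes the missing middle."""
    return f"{PREFIX_TOK}{code_before_cursor}{SUFFIX_TOK}{code_after_cursor}{MIDDLE_TOK}"

prompt = build_fim_prompt(
    "def add(a, b):\n    return ",      # code before the cursor
    "\n\nprint(add(2, 3))",             # code after the cursor
)
print(prompt.startswith(PREFIX_TOK))
```

This ordering lets a left-to-right model condition on code that appears after the cursor - the key difference from plain left-context autocomplete.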
Break: [13:31]
Could you take a moment, as we're diving into Codeium and generative AI and its unique capabilities there, and just differentiate a little bit - you know, so many people have tried Copilot, so it's kind of inevitable that you're gonna get that comparison, to some degree… Can you talk a little bit about what Copilot's not doing for generative AI, or how you're approaching it, that allows you to show people this as a better way forward, from your perspective?
I mean, we have tons of respect for the Copilot team. I'm just gonna start there. As Varun said, we were all early users of it…
Definitely not putting you into conflict with them. It's just a starting point for people…
[14:24] Absolutely, yeah. The way we kind of view this - and like I alluded to earlier - writing brand new code with autocomplete is really just one small task that we do as engineers. We refactor code, we ask for help, we write documentation, we do PR reviews… And so our general approach has always been "Let's try to build an AI toolkit, rather than an AI autocomplete tool."
So we can get more into the weeds here, but autocomplete is just one of the functionalities we provide. We provide an in-IDE chat - something like ChatGPT, except integrated with the IDE… Natural language search over your codebase, using embeddings and vector stores in the background… So we're really trying to expand: how can we address the entire software development lifecycle? I think that's probably the most obvious difference with a tool like Copilot, from an individual developer point of view.
But then the other thing, which really builds off of all the infrastructure Varun was mentioning earlier, is that we were already deploying ML infrastructure in our previous customers' private clouds. We already had all this expertise of "How can we take actual ML infra and deploy it for a customer in a way that they can fully trust the solution, because we're not getting any of their data?"
And so another really big differentiator for us was - okay, I think this might actually be a tool that enterprises can use confidently and safely, because we have the infrastructure to do the deployment in a manner they would be open to using. So that was the other differentiator when it came specifically to enterprises. But we can dive more into that later.
No, that sounds good. I want you to connect one more thing for me… Going from being able to deploy the infrastructure and helping your customers in that way, to Codeium as a tool - what's the leap there that got you from one to the other? How did you get from infra-focused to Codeium-focused?
Oh yeah, I think we had to do a full 180 when we started. We went from a full inference service company to "Let's create a product for consumers." It was a full 180 in terms of product…
Yeah, to some degree a pivot, because we knew that eventually we'd deploy to customers' VPCs. That sounds great. But if we're going to ship something to a customer, we need to be super-confident that it's a product that works well… Because we're getting no feedback from their developers. And so for the first six or seven months of Codeium we focused on just building out an individual tier. Any developer can go try it, we can see how they like it, they try our new capabilities, we get feedback from an actual community… Do all these community-building things that we hadn't really done as an infra-as-a-service company. That was a really huge focus for us, and we've grown our Codeium individual plan to over 100,000 active developers using us for many hours a day - because you code for that long if you're a developer. That's plenty of feedback for us. Plenty of people actually using the tool, telling us "Yeah, this is good. This isn't good. Oh, you tried pushing a new model? That's worse." All those things we actually learned, so that we could get a product that's good. So that was the intermediate period - really learning from actual developers what is a good product and what is not. I think that's always going to be a key part of our development cycle.
You're coming into this with rich knowledge in infrastructure for customers… That's a huge area of expertise, and even though you're moving forward into kind of the Codeium era, if you will, in my words, it's a skill set and level of expertise that very few organizations have deeply. How did that inform Codeium, and differentiation against - whether it be Copilot or other tools that are out there, or just developers throwing things into ChatGPT? What did that background give you that gave you that differentiation in the marketplace?
[18:13] Yeah. So I think when we started, the thing we started with is "No one cares if we have better infrastructure once you have a product. If we have better infrastructure, that's great, but if that makes a product that's the same, no one should care."
They'd just assume that you should.
Yeah. So what we started with is we set a very high bar for ourselves. Codeium is an entirely free product. So for the individual user, it's something they can install and use immediately, for free. There are no limits at all. So when it comes to autocomplete, you can use it as much as you want. And this, by the way, has forced us to make the infrastructure as efficient as possible.
Just to give you a sense of the numbers we're talking about here: we process over 10 billion tokens of code a day. That might just sound like a large number, but that's over a billion lines of code a day that we process for our own developers - entirely for free. And then on top of that, we probably have one of the world's largest chat applications as well, because it's in the IDE too. All of this put together has allowed us to build a very, very scalable piece of infrastructure, such that we're the largest users of our own product.

We are the largest users of our own product, we learn the most from our users, and we can then take those learnings and deploy in a very cost-effective, very efficient and optimized way to our enterprise users. It's one of those things where we forced ourselves to learn a lot from the individual plan, and then take all those learnings and actually bring them over to the enterprise. And a lot of those learnings we were only able to make because we placed very - I would say annoying - infrastructure constraints on ourselves, by saying "Hey, you guys have got to do this entirely for free, basically."

And we're committed to Codeium being a free product forever, actually. The individual plan will always be free. And it's one of those things where our users are just always like "How are these guys even doing it? What are they even doing to make this happen?" And most of our users, by the way, are users that have churned off of Copilot. We have spent very little, if anything, on marketing. So it's just one of those things where our users are like "How do they make this free?" We take the approach that some of the best products in the world are free. Products like Google are entirely free. Google doesn't tell you all the time that they have the best infrastructure, but they do have the best infrastructure. It just so happens that that shows itself off in the best product.
And we could talk a little bit more about how we take our focus on infrastructure and make a much better enterprise product as well, but that's the way we look at it… It's like, how do we deliver materially better experiences with our infrastructure? …And our users shouldn't care that we actually did that.
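As a back-of-the-envelope check on the volume figures quoted above - assuming roughly 10 tokens per line of source code, which is an assumption on our part (real tokenizers vary by language and style):

```python
# Sanity check: does "10 billion tokens a day" line up with
# "over a billion lines of code a day"? Assumes ~10 tokens per line.
tokens_per_day = 10_000_000_000
tokens_per_line = 10                         # assumed average for source code

lines_per_day = tokens_per_day // tokens_per_line
tokens_per_second = tokens_per_day / 86_400  # seconds in a day

print(f"{lines_per_day:,} lines/day")        # 1,000,000,000 lines/day
print(f"{tokens_per_second:,.0f} tokens/sec sustained")
```

At that assumed ratio the two figures are consistent, and the sustained rate (over 100k tokens per second, around the clock) gives a sense of why serving this for free forces aggressive infrastructure efficiency.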
You've brought it up, you've got to go there now, man… Go ahead and dive right into it.
I guess one of the interesting things - just going into how we run one of the world's largest LLM applications - is what that focus forced us to do: given a single piece of compute, let's say a single node or a single box of GPUs, we can host the most users on there. So let's say a large company comes to us - they can be confident that whether they're on-prem or in a VPC, we can give them a solution where the cost of the hardware is not going to dominate the cost of the software itself. Because right now there's kind of this misunderstanding that GPUs are really expensive. Which is true, they are. But the trade-off is they have a lot of compute. Modern GPUs like A100s can do over 300 teraflops of compute, which is some ungodly number, right? That's a crazy number compared to what a modern CPU can do. And we can leverage that the best. We've been forced to - if we didn't do it properly, we'd have outages with our service all the time. Because of that, enterprises trust us to be the best solution to run in their own tenant, in an air-gapped way… Which is fantastic, because that's the way we can build the most trust and deploy these pieces of technology to them the most effectively, because they don't want to ship their code outside of the company.
Anshul can talk a little bit more about how we leverage things like fine-tuning as well. That's a purely infrastructure problem that's very unique to us, versus any other company. Anshul, do you want to take that?
[21:59] Yeah, I think - as Varun said, there's a lot we do from the individual infrastructure point of view, so that we can do crazy things like make it all free for all of our individual users… But once you actually self-host, there's a lot you can do that any other tool can't do without being self-hosted. And what Varun just mentioned is personalization. If you're fully hosted in a company's tenant, you can use all of their knowledge bases to create a substantially better product.
I think the way we generally think about it is that you have a generic model that's good - it's learned from trillions of tokens of code in the public corpus… But any individual company has hundreds of millions of tokens of their own code that has never seen the light of day. And that's actually the code that's the most relevant for them if they want to write any new code. Think of all the internal syntax, semantics, utility functions, libraries, DSLs - whatever it might be. A model like a Copilot or a Codeium, by the nature of it having to be low-latency, can only take about 150 or so lines of code as context. So this is not like one of those ChatGPTs or GPT-4s where you're putting in files and files of context. What you can put in is really small, so there's really no way for a single inference to have full context of your codebase without actually fine-tuning the base model that we ship to them on all their local code.
So we've actually done a bunch of studies on how this massively reduces hallucinations, and all these other things that you always hear coming up with LLMs. But things like this, things like providing more in-depth analytics - all these things come from being self-hosted. And as Varun mentioned, these are all, at the core, to some degree an infra problem. How do you actually do fine-tuning locally, in a company's tenant? That's an infra problem that we're happy to talk more about, but maybe I'll just… I'll pass it back to you, Chris.
Actually, I'm about to ask a follow-up about that, because you've got me really thinking about some of the use cases in my own life. So with the self-hosting model - kind of like OpenAI with GPT-4, there's only so far we're gonna go, because we've used the public corpus of knowledge out there on the internet, so there's only so much more vertical scaling you can do on the model learning… And so you're touching on the fact that there's so much hidden IP in code, hidden information in code that is of huge value, particularly to the company that it's in, because it's representing their business model and the way their business has evolved over time. And so if I'm understanding you correctly, you're basically saying that your solution can take advantage of that on their behalf, and really hone against it.
What are some of the limits on privacy? Are they able to do that? Because that's a big topic. We've actually talked about it on the show before - in this generative AI age, with IP concerns and privacy concerns, and getting the lawyers involved… Are you able to do the training on their site, and keep it with the customer entirely? Or do they have to let their IP out? How do you approach that problem?
Yeah, so the answer to any question of "Does any IP leave Codeium for enterprises?" is always no. In pretty much every part of the system, our guarantee is to be able to deploy this whole thing fully air-gapped. We've even deployed in places like AWS GovCloud, which is an "it doesn't even have a connection to the internet" kind of scenario. So nothing ever leaves, to address some of the points you brought up there, Chris. And we're not the only ones saying "Oh no, the data that a company has privately is super-important" - potentially even more important than the size of the model.
[25:46] I think a good example of this is actually Meta. Instead of using a GitHub Copilot or any generic system, they decided - I guess in classic Meta fashion - to train their own autocomplete model internally, using all of their code. And they actually published a paper, I think a few weeks back. And their model was, in terms of size, I think 1.3 billion parameters. Small with respect to the LLM world. And it just massively outperformed GitHub Copilot on pretty much every task. So there's now corroborating evidence for what we're saying about fine-tuning - that doing this actually does lead to materially better performance for the user in question.
Now, is that Meta model going to be good for everyone else's code? Probably not. But that's also not the whole point. And in terms of being able to fine-tune locally - yeah, we're able to do this completely locally. And again, it comes down to scale of data. Our base model has been trained on trillions of tokens of code. That's a lot. That's why we need a multi-node GPU setup to do all that training. But an actual company - if they have, say, even 10 million lines of code, that's about 100 million or so tokens. There's still a huge order-of-magnitude difference between this pre-training and the fine-tuning, which is why we can do it locally, on - actually, surprisingly - whichever hardware they choose to provision for serving their developers.
So again, this comes back to our infra background and all the stuff we know how to do - we can actually do fine-tuning and inference on that same piece of hardware. So we don't ask companies to provision more hardware. And even more critically, we are able to do fine-tuning during any idle time on that GPU. So whenever that GPU is not busy performing inference, it's actually doing backprop steps to continuously improve the model.
Fine-tuning is just one aspect of a larger personalization system… But we've instrumented all of this on their hardware, using our inference stack, to create a system that is relatively easy to manage; it's not a crazy amount of overhead for any company to manage or use Codeium… But they still get the maximum possible wins from these AI tools.
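The idle-time scheduling idea described above - serve latency-sensitive inference first, spend any idle GPU time on training steps - can be sketched as a simple loop. This is not Codeium's implementation, just the shape of the technique; the model-call and backprop-step functions are stand-ins:

```python
# Minimal sketch of interleaving inference with fine-tuning on one GPU:
# serve requests when they arrive, and use idle time for training steps.
import queue

requests: "queue.Queue[str]" = queue.Queue()

def run_inference(prompt: str) -> str:
    return f"completion for {prompt!r}"        # stand-in for a model forward pass

def fine_tune_step() -> None:
    pass                                       # stand-in for one backprop step on local code

def serve_loop(idle_poll_s: float = 0.01, max_iters: int = 100) -> int:
    """Alternate inference and tuning; returns the number of tuning steps taken."""
    tuning_steps = 0
    for _ in range(max_iters):
        try:
            prompt = requests.get(timeout=idle_poll_s)
            run_inference(prompt)              # latency-sensitive work always wins
        except queue.Empty:
            fine_tune_step()                   # GPU is idle: take one training step
            tuning_steps += 1
    return tuning_steps
```

A real system would also need to checkpoint the model, bound memory shared between the training and serving copies, and preempt a training step the moment a request arrives, but the scheduling core is this alternation.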
Okay, so that is super-cool. And you mentioned things like GovCloud, which I have actually worked in quite a bit in my day job, and I can think of a whole bunch of other use cases for me personally… Which begs the question - going back toward the beginning of the conversation for a moment, because we are Practical AI, and we like to always give people some practical routes in… We have some folks listening right now who have been using Copilot for a while; they're probably putting code into ChatGPT, trying to accelerate there, with varying degrees of success… They've been experimenting with Bard, and Bard has gotten better on code lately, obviously… So many people that I talk to are still very frustrated with the workflow of the whole thing… And recognizing that you've outlined these differentiators from Copilot and other competition out there, in a friendly-competition kind of way… Talk a little bit about some of the specific generative AI use cases that would be good for someone in that position, who's like "Yeah, I'm using this stuff, but I'm a little bit frustrated with it. I don't have it down." If they were to give Codeium a chance and dive in on it, can you lay out the use cases of what they're going to get when they move in, from a very practical, me-now-as-the-coder perspective? What does that look like? What are they gaining? And maybe give me a couple of different ones, because I'm really curious. And selfishly, I'm probably gonna try each of these that you're telling me… So I'm scratching my own itch by asking the question.
I think you pointed out - like, yeah, workflows and the user experience for a lot of AI toolsā¦ Everyoneās still kind of trying to figure it out. Weāre still in the very early days of these AI applications. And this is our learnings of kind of the current product company. Weāre actually taking the UX quite seriously, and this is actually what the individual plan is created to get feedback on.
[29:53] Very concretely, I think a lot of people have that frustration of like having to copy a codeblock over to ChatGPT, write out a full prompt, and remember the exact prompt that he typed in before that gave them a good result, and then copying the answers back in, and then making modificationsā¦ That workflow is clearly kind of broken. So when we actually built our chat functionality into the IDE, weāre like āOkay, what are all the parts here that can get totally streamlined?ā So we actually did things like on top of every function block thereās like little code lenses, that are just these small buttons that someone can click, like āExplain this function.ā And itāll automatically pull in all that relevant context, opened up in the window; youāre not copying anything overā¦ And itās like writing [unintelligible 00:30:33.07]
Or if you, say, refactor a function, or add docstrings, or write a unit test - these are all just small buttons, or preset prompts, that you can click, and it'll do the generation on the side. And then we even have a way of clicking "Apply diff." And because we know where we pulled the context in from, we can apply a diff right back into that context. So you're not copying things back and trying to resolve merge conflicts. All of these things are done kind of automatically.
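The preset-prompt-plus-apply-diff idea can be sketched roughly like this. Everything here is a hypothetical illustration, not Codeium's actual API: `PRESET_PROMPTS` and `fake_model` are stand-ins, with the model stubbed out so the example runs on its own. The key point is the last step - because the tool knows exactly which lines it extracted as context, it can splice the generated result back into the same spot, which is what spares you manual copy-paste and merge-conflict cleanup.

```python
# Hypothetical sketch of "preset prompt" buttons plus "apply diff";
# names and behavior are illustrative, not Codeium's real interface.

PRESET_PROMPTS = {
    "explain": "Explain what this function does:\n\n{code}",
    "docstring": "Add a docstring to this function:\n\n{code}",
    "unit_test": "Write a unit test for this function:\n\n{code}",
}

def fake_model(prompt: str) -> str:
    # Stand-in for a real completion model: echoes the code back
    # with a docstring inserted, so the example is self-contained.
    code = prompt.split("\n\n", 1)[1]
    lines = code.splitlines()
    lines.insert(1, '    """Return the sum of a and b."""')
    return "\n".join(lines)

def run_preset(action: str, source: str, start: int, end: int) -> str:
    """Extract lines start..end, run a preset prompt, splice the result back."""
    lines = source.splitlines()
    snippet = "\n".join(lines[start:end])
    prompt = PRESET_PROMPTS[action].format(code=snippet)
    replacement = fake_model(prompt)
    # "Apply diff": the result lands exactly where the context came from
    return "\n".join(lines[:start] + replacement.splitlines() + lines[end:])

source = "def add(a, b):\n    return a + b"
print(run_preset("docstring", source, 0, 2))
```

Because the extraction range (`start`, `end`) is tracked, there is nothing for the user to reconcile by hand afterwards.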
So there are a lot of really cool things you can do when you start bringing these things into the IDE, where developers are, and we've spent a lot of time really thinking, as you said, from a workflow point of view: how do you make this super-smooth?
Varun, could you talk a little bit about maybe some specific tasks that you're seeing people doing? When we talk about generative AI, it's expanded beyond LLMs - we're doing things in video, we're doing things in natural language… All of the different modalities are gradually being addressed with these different models, and different tools being built around them. Could you talk a little bit about what people are trying to code right now, what specifically Codeium is helping them with - not just about Codeium, but the actual use cases themselves, so that they go "Ah, I can see a path forward. I can do that. I know how to generate this or that or the other with generative AI and coding"? Can you talk a little bit about those at something of a specific level?
So interestingly, just a little bit about multi-modality: I think we're maybe a little far from leveraging other modes beyond text for code. I think maybe that will happen, but there's not enough evidence right now yet. Just to be open about the functionality we have - we have autocomplete, we have search, and we have codebase-aware chat. And we recognize that right now autocomplete accounts for 90% to 95% of the usage of the product. It's because chatting is not something people do even every day, potentially. They might open it up once every couple of days, but autocomplete is always on, very passively helpful, and people get the most value out of it, which is kind of counterintuitive. I think people don't recognize that immediately. But when people are doing autocomplete, we've recognized there are two modalities to the way people type code. There's a modality of accelerating the developer, which is "Hey, I kind of know what I'm going to type, and I just want to tab-complete the result", and then there's also an exploration phase, which is "I don't even know what I'm trying to do."
Based on that - I'll write a comment… This is a classic thing where my behavior writing code has materially changed because of tools like Codeium: I'll write a comment, and I kind of just hope and pray that it pulls in the right context, so that it gives me the best generation possible. So in my mind, for the acceleration case, Codeium is very helpful. It can autocomplete a bunch of code. But the exploration case is where the true magical moment comes in, where I had no clue how I was going to use a bunch of these APIs… And that's what we're focused on trying to make really good, whether that be in chat or with autocomplete - how do we make it so that we can build the most knowledgeable AI, that is maximally helpful, and also just minimally annoying?
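The comment-driven workflow Varun describes looks roughly like this in practice: the developer types only the comment, and the assistant proposes a body based on that intent plus surrounding context. The completed function below is an illustrative example of what such a completion might look like, not an actual Codeium generation:

```python
# The developer writes only the two comment lines; an autocomplete
# assistant would propose the function body from that intent.
# (This particular body is illustrative, not a real Codeium output.)

# parse a log line like "2023-07-01 12:00:00 ERROR disk full"
# into (timestamp, level, message)
def parse_log_line(line: str) -> tuple[str, str, str]:
    date, time, level, message = line.split(" ", 3)
    return (f"{date} {time}", level, message)

print(parse_log_line("2023-07-01 12:00:00 ERROR disk full"))
```

The better the context the tool pulls in (nearby code, existing log formats, helper functions), the more likely the generated body matches what you actually meant by the comment.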
The interesting thing about Codeium as a product, or these autocomplete products generally, is that they take a little bit of getting used to, but even when they write wrong things, it's not very annoying, because you can very easily just say "I don't want this completion." It didn't write an entire file out that you then need to go correct a bunch of functions in. It was a couple of lines, or maybe 10 lines of code; you can very easily validate that it's correct.
[34:02] That comes back to what Anshul was saying, which is "How do we make sure we can always provide the maximally helpful AI agent?" The answer is "Have the best context possible." And one of the nitty-gritty details - we'll write a blog post about this - is that our context is currently double what Copilot's is. We allow double the amount of context for autocomplete compared to what they do.
The second thing is, we're able to pull context from throughout the codebase. And that same piece of technology that pulls context throughout the codebase for search and all these other functionalities is getting used as part of chat, for codebase-aware chat, which is something that Copilot doesn't even have today yet.
The third piece - finally, for large enterprises - is how do we make it so that these models actually semantically understand your code? …which is where fine-tuning comes in. For us, context gets us a lot of the way, but it doesn't get us all the way. Because you can imagine, even with double the context - so let's say we can pass in 1,000 lines of code - for a company with 10 million lines, we're passing in four orders of magnitude less code than the company actually has. So our vision is to continually ramp up the amount of knowledge these models have, and the ways in which they can be helpful. I don't know if that answered the question there…
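The back-of-the-envelope math here checks out (using the two figures Varun picks purely for illustration): a 1,000-line context window against a 10-million-line codebase covers one ten-thousandth of the code, i.e. a four-orders-of-magnitude gap.

```python
import math

context_lines = 1_000        # lines that fit in the (doubled) context window
codebase_lines = 10_000_000  # a large enterprise codebase, per the example

ratio = codebase_lines / context_lines
print(ratio)                                          # 10000.0
print(math.log10(ratio))                              # 4.0 -> four orders of magnitude
print(f"{context_lines / codebase_lines:.4%} of the codebase fits")
```

That gap is the argument for fine-tuning: no feasible context window closes four orders of magnitude, so some codebase knowledge has to live in the model weights themselves.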
It did, actually. Your acceleration versus exploration analogy - for me personally (different people get different things), that really clarified where I might be using Copilot, or where I would go and use Codeium… Because I do struggle on the exploration side myself. It's a lot easier on the acceleration side at the end of the line [unintelligible 00:35:32.09] and crank through that fast, which I've been able to do with these other tools… But I have struggled on the exploration side… Because I kind of want to do a thing, and I'm kind of trying to figure it out, and I'm just going to see where my fingers lead on that… And having that supported in the way you described - that gave me a very clear understanding from my standpoint.
So I'd like to ask each of you where this is going, both in the large and in your specific concern with Codeium. Things have never moved faster than they're moving right now in terms of how fast these technologies are progressing… And Daniel and I have a habit - we were commenting on our last episode about this - of saying "Yeah, we recently mentioned this thing, and said we'd get to it", but then we turn around and end up talking about how we got there way faster than we ever anticipated.
With the speed of generative AI - you're already creating these amazing tools, and you're having to stay out front - where is your brain taking you at night, when you stop and chill out and have a glass of wine or whatever you do, and you're just pondering "What does the future look like?" I'd like to know both from your own personal standpoints, in terms of your product, and about the generative AI world in general - how do you see it going forward? I'd love your insights.
Yeah, I think the classic question in the grand scheme of things is "Oh my God, is generative AI just gonna totally get rid of my job, or completely invalidate it?" And we will be the first people to say that we do think AI will just be the next step in a series of tools - at least in code - that have made developers more productive; that have let them focus on the more interesting parts of software development… And be an assistant, right? All these tools are called AI assistant tools for a reason, I think.
We're definitely not at a place yet - and I don't think we will be for a while - where there isn't going to be a human in the loop, in control, guiding the AI on what to do. So in that respect, the doomsday scenario - and I don't want to speak for Varun, but I think we're pretty far from that mentality. But we wouldn't have gotten into Codeium if we didn't genuinely think that there were just so many things we do day to day as engineers that are a little frustrating, boring, take us out of the flow state, slow us down… Those all seem like very prime, ripe things to try to address with AI. And I think that's kind of our general goal.
[38:04] I think there are a lot more capabilities to build. I don't think search and chat are going to be the last building blocks we build; we have more capabilities coming up that we're super-excited about. But yeah, it's also going to be a thing where, as you said, this is moving super-quickly. We have research, open source, and applications all developing at the same time, at breakneck speed… And so part of what we're also looking forward to is: how can we educate all these software developers on the best way to use these AI tools? How do you make the most of them, so that developers are part of the wave, and they also get a lot of value?
Well said. Varun?
Yeah, maybe if I was to just say - you were asking me what the big worry is. For me, the big worry is that there are going to be a lot of exciting new demos that people end up building… And obviously, for us as a company, we need to make strategic bets on "Hey, this is a worthwhile thing for us to invest in."
For instance, a couple of months ago there was an entire craze about agents being able to write entire pieces of code for you, and all these other things. For us though, we had lots of enterprise companies using the product at the time, and we recognized that the technology just wasn't there yet. Take a codebase that's 100 million lines of code, or 10 million lines of code. It's going to be hard to write C++ that spans five files, that compiles perfectly, and that also uses all the other libraries, when your context is only five files. It's not going to be the easiest problem. And I think that's maybe an example… But for us - just a pat on the back - over the last eight months we've iterated significantly faster than every other company in this space, just in terms of functionality… Still, we need to make strategic bets on what the next thing to work on is at any given point. And we need to be very careful about "Hey, this is a very exciting area, but is it actually useful to our users?" A great example is: given a PR, generate a summary. I think Copilot has tried building something like this, and we tried using the product Copilot had, and it was just wrong a lot of the time. That would have been an interesting idea for us to pursue and keep trying to make work… But there are diminishing returns, and Anshul and I have seen this very clearly in autonomous vehicles, where we had a piece of technology that was just not there yet. It needed a couple more breakthroughs in machine learning to get there… And the idea of building it five years in advance - you shouldn't be doing that. You just 100% shouldn't be building a tool when the technology isn't there yet.
And that is something that keeps me up at night: "What are the next things we need to build?", while keeping in mind what the technological capability set is like today, if that makes sense.
It does, and it's a very Practical AI perspective, if you will. So very fitting final words for the show today. Well, Varun and Anshul, thank you very, very much for coming on the show. It's fascinating. I got a lot of insight and a lot of new things to go explore from what you've just taught me, and I appreciate your time. Thank you for coming on.
Thanks a lot, Chris.