
A developer's toolkit for SOTA AI

source link: https://changelog.com/practicalai/231

Transcript

Welcome to another edition of the Practical AI podcast. My name is Chris Benson, I'm your co-host today. Normally we would have Daniel Whitenack joining us, but Daniel has just gotten off a plane - he flew halfway around the world - and we decided to give him a break today. He was more lucid than I would be in the same situation.

Today - I wanted to dive right in… We have a super-cool topic. It is not dissimilar from some of the other general things we've been talking about, but I have two guests today. I'd like to introduce Varun, who is the CEO and co-founder of Codeium, and Anshul, who leads their enterprise and partnerships efforts. Welcome to the show, guys.

Thanks for having us.

Thanks for having us, Chris.

You're welcome. I'm really interested in learning more about Codeium. When Daniel lined you guys up, he sent me this thing saying "You've got to look at this. This is really cool", and everything. And I'm like "Get him on the show." He's like "I'm already doing that." So really glad to have you guys on, and he's going to be bummed that he missed the conversation, because he was pretty excited about it…
So I guess I wanted to, before we even dive into Codeium and the problems it's trying to solve and such, ask if you guys can each just tell me a little bit about how you found yourself arriving at this moment - a little bit about your background, how you got into AI, and how this became the thing. Varun, if you want to kick off, and then Anshul afterwards.

So maybe I can get started… It actually starts in 2017; I started working at this company called Nuro, that does autonomous goods delivery. So it's an AV company. There I worked on large-scale offline deep learning workloads. As you can imagine, an autonomous vehicle company needs to run large-scale simulation; they need to be able to test their ML models at scale before they can actually deploy them on a car.

In 2021 I left Nuro and started Exafunction, which is the company that is building out this product Codeium. And Exafunction started out building GPU virtualization software. So you can imagine, for these large-scale deep learning applications, one big problem is GPUs are scarce, they're expensive, and also hard to program. And what Exafunction started building was software to make it so that applications running on GPUs used the GPU hardware more effectively. We realized that our software at Exafunction was best applicable to generative AI tech, and started building out Codeium around a year ago.

Very cool. And before I dive in, because I have several questions for you… But I want to give Anshul a chance to introduce himself here. Go ahead, Anshul.

Surprisingly, my story is actually quite similar. I was also working at Nuro. So Varun and I used to work together back in the day. I was not actually working on the ML infrastructure side of things - that was something Varun was hands-on with. But I decided to also join the team at Exafunction.

And yeah, as Varun mentioned, about a year ago I think we noticed - I think three things kind of happened at the same time that led us to Codeium. The first one is that we're engineers, all of us are engineers, and we had all tried GitHub Copilot and all these cool AI tools for code in their beta, and we're like "Wow, this is absolutely gonna be the future of software development." But at the same time, it's still scratching the surface of potentially everything that we do as engineers. So that was, I think, number one that we realized then.

Number two was, you know, talking to a lot of our friends at these bigger companies, a lot of them were just saying "Oh yeah, it's cool. I've tried it for my personal project, but I can't use it at work. My work is not allowing me to use that." So that was the second thing we heard.

The third thing was exactly what Varun alluded to - we were building ML infrastructure at scale for really large workloads. When this entire generative AI wave started coming, we're like "Wow, we're actually kind of sitting on the perfect infrastructure for this." So I think all those three things combined for us to be like "You know what, let's build out an application ourselves - an application where we as engineers are the customers", and that ended up becoming Codeium.

As you were getting into GPU software, what in general were some of the challenges you were seeing? NVIDIA has their various software supporting things like that… Clearly, you saw that there was a need for something beyond it. Can you talk a little bit about the landscape you saw before you got to all the generative stuff, and the fact that you had infrastructure? What positioned you for that, and what was the thing you decided you needed to address?

Maybe I can take a step back on why these GPU workloads are just a little bit annoying compared to CPU workloads…

[05:33] One of the really unique things about GPUs is that, unlike CPUs, they're kind of tricky to virtualize. One common thing we do with CPUs is put a bunch of containers on a single VM, and then you can make use of the CPU compute effectively. You can basically dump 10 applications onto a CPU and it's perfectly fine. For a GPU, it's a little bit more messy, because the GPU doesn't have a ton of memory. So you can't just load up infinitely many models on there. Let's imagine you have a GPU with 16 gigs of memory, and each of these models takes like 10 gigs. You can't really even put two applications on there. So that already becomes a big issue. And that's what a lot of these large deep learning workloads were struggling with.
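As a rough, illustrative sketch of the memory arithmetic Varun is describing - the card size, model sizes, and overhead figure below are made-up numbers, not anything specific to Nuro or Exafunction:

```python
# A GPU can only co-host models whose weights (plus some working space) fit in
# its memory at the same time, unlike CPUs, where containers can be stacked
# much more freely.

GPU_MEMORY_GIB = 16  # e.g. a 16 GiB card


def fits_on_gpu(model_sizes_gib, overhead_gib=1.0):
    """Return True if all models fit at once, leaving an assumed buffer for
    activations and workspace (overhead_gib is a rough, illustrative number)."""
    return sum(model_sizes_gib) + overhead_gib <= GPU_MEMORY_GIB


print(fits_on_gpu([10.0]))        # True  -> one 10 GiB model fits
print(fits_on_gpu([10.0, 10.0]))  # False -> two of them do not
```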

When I was at Nuro, one big problem was that we had tens of models, but we had workloads that needed hundreds of GPUs - some of them even thousands of GPUs. And we struggled to even use the hardware properly. And then you can imagine the complexity stacks, since now we're in a state where companies have trouble even getting access to 10 GPUs, because of NVIDIA scarcity issues. And then also the cost of a GPU is not like a CPU; it's significantly more expensive. The cost of a single H100 chip is well over $30k. So these aren't cheap chips. So there was a big need at the time to figure out how do we leverage the hardware properly? …and that's what we had to build software for.

And just to clarify for me - was that while you were still at Nuro, or was that after you started Exafunction?

Yeah, so while I was at Nuro I led a team that built software that fixed these problems… But Exafunction was focused on, generically, how do we make sure deep learning-based applications can best leverage GPUs? That's what we started out building, actually. And then Codeium came out of that.

Gotcha. Tell me a little bit about - as you have been right in the middle of this progression, just to frame it for a second: if you look at the last couple of years in particular, the pace of change has been so fast… And you were right there, starting at Nuro, and then creating Exafunction, seeing some of the challenges… Could you talk a little bit about how the industry was evolving and changing as you were seeing it, so that we can get a sense of how you moved toward Codeium? To give a little bit of the history, instead of just starting from where that is. Can you talk a little bit about the itches you were scratching, and why it led in that direction? What did this AI industry look like to you?

Yeah, so when we started, you can just imagine, everything was a lot smaller-scale, right? The hyperscalers or the cloud providers just didn't have nearly as many GPUs. If you asked them what fraction of cloud spend was GPU spend, it was probably a very small, single-digit percentage, maybe even less than that at the time. So this was a very small workload for them when we started. Both Anshul and I started at Nuro in like 2018. But then over time, this grew a ton. We could see it from the training workloads. These were no longer even single-node training workloads. Back in the day, a single GPU node that had maybe eight V100s was considered a lot of compute. And suddenly we were able to witness that this was slowly becoming eight-A100 nodes, and then more than eight of those nodes were necessary to train these models.

And similarly, to prove out that these models were capable in an actual production setting, you needed to run offline testing at massive scale - on the order of 5,000 to 10,000 T4s, which is kind of incredible in terms of raw flops. So we were able to see this hockey stick happen in front of us, and that's what made us want to start Exafunction in the first place. We realized that there were going to be large deep learning workloads.

One interesting fact: for just the Exafunction GPU virtualization software that we ended up selling to enterprises, we ended up managing over 10,000 GPUs on GCP, in a single GCP region. So we ended up managing more than 20% of the GPUs in that region. And we realized that "Hey, this is only going to keep growing." When we talked to the cloud providers, they were only going to keep growing the number of GPUs, and we realized - I guess the interesting thing was that in the future, generative AI was going to be potentially the largest GPU workload. That was the big thing we realized once GPT-3 came out, which was I guess in 2021 now.

Gotcha. But at that point were you already at Exafunction? Had it already started?

Yeah, it had already started, and we were sort of selling GPU virtualization software to large autonomous vehicle and robotics companies.

[10:09] Gotcha. And so basically, if I'm understanding you correctly, the whole generative tsunami just kind of landed on you when you were already sitting in that space, doing GPU virtualization. So you just managed to land right in front of the wave, it sounds like.

Yeah. So we started working on Codeium maybe four or five months before ChatGPT. It was interesting, just because we realized that an application like GitHub Copilot was going to be one of the largest GPU workloads, period. I don't know if - you've probably tried the product out. Every time you do a key press, you're going out to the cloud and doing trillions of computations. So it's a massive workload. And we had, as Anshul said, the perfect infrastructure to run this at enormous scale. Not to mention we were in love with the product from day one. We were early users of the product the moment it came out in 2021.

Very cool. So as generative AI is starting to take off, with ChatGPT hitting the world and really changing things quite rapidly… I think people are still shocked at how fast things have moved. You had started Codeium already… What kind of synergy were you starting to see there, in terms of knowing that you had one of presumably many, many GPTs coming, and other similar generative models? You had just gotten into Codeium… Can you talk a little bit about what that was, and what you were putting together in your minds to recognize the opportunity?

Yeah, so I think one of the great things about the entire ChatGPT wave is that everyone was using it. This is a thing where literally every individual is using AI. And so it helped us, in general. A big wave raises all ships kind of thing. It really helped us. We weren't really going out and telling people "Hey, a tool like Codeium can help productivity", because that was kind of just assumed by everybody now. Like "Oh yeah, if I do any kind of knowledge work, then there's potential for AI to help." So in that sense, when this entire ChatGPT wave really came about, it overall just helped us in terms of convincing people to even try the product.

The other thing that we recognized is that we were positioning ourselves very specifically from the beginning when it comes to code. Code is actually a very interesting modality. It's not like your standard ChatGPT, where you have a long context that a user puts in, and then it produces content coming out. Code is interesting in the sense that, as we mentioned, autocomplete is a passive AI, rather than an AI that you're actually instructing to do something. It's happening every keystroke, so it has to be a relatively small model. You can't have these hundreds-of-billions-of-parameter models being used. It has to be relatively low latency.

And then code itself is interesting, right? If your cursor is ever in the middle of a code block, the context both before and after your cursor really matters. It's not just what comes before. So there are all these interesting situational constraints about code - you put all these things together and realize that, okay, all these ChatGPT waves and conversational AIs are happening, that's great, but we're still not going to get rolled over by that, because we're focusing on a very specific application and modality, through a lens that was pretty unique in many ways.

Break: [13:31]

Could you take a moment, as we're diving into Codeium and generative AI and its unique capabilities, and differentiate a little bit - you know, so many people have tried Copilot, so it's kind of inevitable that you're gonna get that comparison to some degree… Can you talk a little bit about what Copilot's not doing for generative AI, or how you're approaching it in a way that allows you to show people this is a better way forward, from your perspective?

I mean, we have tons of respect for the Copilot team. I'm just gonna start there. As Varun said, we were all early users of it…

Definitely not putting you into conflict with them. It's just a starting point for people…

[14:24] Absolutely, yeah. The way we view this, like I alluded to earlier - writing brand new code with autocomplete is really just one small task that we do as engineers. We refactor code, we ask for help, we write documentation, we do PR reviews… And so our general approach has always been "Let's try to build an AI toolkit, rather than an AI autocomplete tool."

So we can get more into the weeds here, but autocomplete is just one of the functionalities we provide. We provide an in-IDE chat, something like ChatGPT, except integrated with the IDE… Natural language search over your codebase, using embeddings and vector stores in the background… So we're really trying to expand how we can address the entire software development lifecycle. I think that's probably the most obvious difference with a tool like Copilot, from an individual developer point of view.
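As a generic sketch of what "natural language search over your codebase using embeddings and a vector store" can look like - the `embed` function below is a stand-in (a real system would call an embedding model), and nothing here reflects Codeium's actual internals:

```python
import numpy as np


def embed(text: str) -> np.ndarray:
    # Stand-in embedding: a pseudo-random unit vector per string (consistent
    # within one run). A real system would use a learned embedding model here.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(256)
    return v / np.linalg.norm(v)


# A tiny in-memory "vector store": one embedding per code chunk.
code_chunks = {
    chunk: embed(chunk)
    for chunk in [
        "def parse_config(path): ...",
        "class RetryPolicy: ...",
    ]
}


def search(query: str, top_k: int = 1):
    # Rank chunks by cosine similarity (vectors are already normalized).
    q = embed(query)
    scored = sorted(((float(q @ v), chunk) for chunk, v in code_chunks.items()), reverse=True)
    return scored[:top_k]


print(search("where do we read configuration files?"))
```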

But then the other thing, which really builds off of all the infrastructure that Varun was mentioning earlier, is that we were already deploying ML infrastructure in our previous customers' private clouds. We already had all this expertise of "How can we take actual ML infra and deploy it for a customer in a way that they can fully trust the solution, because we're not getting any of their data?"
And so another really big differentiator for us was - okay, this might actually be a tool that enterprises can use confidently and safely, because we have the infrastructure to do the deployment in a manner that they would be open to using. So I think that was the other differentiator when it came specifically to enterprises. But we can dive more into that later.

No, that sounds good. I want you to connect one more thing for me… Going from being able to deploy the infrastructure and helping your customers in that way, to Codeium as a tool - what's the leap there that got you from one to the other? How did you get from infra-focused to Codeium-focused?

Oh yeah, I think we had to do a full 180 when we started. We went from a full inference service company to "Let's create a product for consumers." It was a full 180 in terms of product…

Yeah, to some degree a pivot, because we knew that eventually we'd deploy to customers' VPCs. That sounds great. But if we're going to ship something to a customer, we need to be super-confident that it's a product that works well… Because we're getting no feedback from their developers. And so we actually first focused, for the first six or seven months of Codeium, on just building out an individual tier. Any developer can go try it, we can see how they like it, try our new capabilities, get feedback from an actual community… Do all these community-building things that we hadn't really done as an infra-as-a-service company. That was a really huge focus for us, and we've grown our Codeium individual plan to over 100,000 active developers using us for many hours a day, because you code for that long if you're a developer. That's plenty of feedback for us. Plenty of people actually using the tool, telling us "Yeah, this is good. This isn't good. Oh, you tried pushing a new model? That's worse." All those things we actually learned, so that we can get a product that's good. So that was the intermediate period - really learning from actual developers what is a good product and what is not. I think that's always going to be a key part of our development cycle.

You're coming into this with rich knowledge of infrastructure for customers… That's a huge area of expertise. And even though you're moving forward into kind of the Codeium era, if you will, in my words, it's a skill set and level of expertise that very few organizations have deeply. How did that inform Codeium, and differentiation against - whether it be Copilot, or other tools that are out there, or just developers throwing things into ChatGPT? What did that background give you that provided that differentiation in the marketplace?

[18:13] Yeah. So I think when we started, the thing we started with is "No one cares if we have better infrastructure once you're a product. If we have better infrastructure, that's great, but if that makes a product that's the same, no one should care."

They'd just assume that you should.

Yeah. So what we started with is we set a very high bar for ourselves. Codeium is an entirely free product. So for the individual user, it's something they can install and use immediately for free. There are no limits at all. So when it comes to autocomplete, you can use it as much as you want. And this, by the way, has forced us to make the infrastructure as efficient as possible.

Just to give you a sense of the numbers we're talking about here, we process over 10 billion tokens of code a day. That might just sound like a large number, but that's over a billion lines of code a day that we process for our own developers, and we're forced to do this entirely for free. And then on top of that, we probably have one of the world's largest chat applications also, because it's in-IDE as well. And all of this put together has allowed us to build a very, very scalable piece of infrastructure, such that we're the largest users of our own product. We are the largest users of our own product, we learn the most from our users, and we can then take those learnings and deploy them in a very cost-effective, very efficient and optimized way to our own enterprise users. It's one of those things where we force ourselves to learn a lot from the individual plan, and then take all those learnings and actually bring them over to the enterprise. And a lot of the learnings we were only able to make because we placed very - I would say annoying - infrastructure constraints on ourselves, by saying "Hey, you guys have got to do this entirely for free, basically." And we're committed to that - Codeium is going to be a free product forever, actually. The individual plan will always be free. And it's one of those things where our users are just always like "How are these guys even doing it? What are they even doing to make this happen?" And most of our users, by the way, are users that have churned off of Copilot. We have spent very little, if anything, on marketing. So it's one of those things where our users are like "How do they make this free?" We take the approach of - we think some of the best products in the world are free. Products like Google are entirely free. Google doesn't tell you all the time that they have the best infrastructure, but they do have the best infrastructure. It just so happens that it shows itself off in the best product. And we could talk a little bit more about how we take our focus on infrastructure and make a much better enterprise product as well, but that's the way we look at it… It's like, how do we deliver materially better experiences with our infrastructure? …and our users shouldn't care that we actually did that.

You've brought it up, you've got to go there now, man… Go ahead and dive right into it.

I guess one of the interesting things - just going into how we run one of the world's largest LLM applications - is that that focus forced us to make sure that, given a single piece of compute, let's say a single node or a single box of GPUs, we can host the largest number of users on it. So let's say a large company comes to us - they can be confident that whether they're on-prem or in their VPC, we can give them a solution where the cost of the hardware is not going to dominate the cost of the software itself. Because right now, there's kind of this misunderstanding that GPUs are really expensive. Which is true, they are. But the trade-off is they have a lot of compute. Modern GPUs like A100s can do 300 teraflops of compute, which is some ungodly number, right? That's a crazy number compared to what a modern CPU can do. And we can leverage that the best. And we've been forced to do that. If we didn't do it properly, we'd have outages with our service all the time. Because of that, enterprises trust us to be the best solution to run in their own tenant, in an air-gapped way… Which is fantastic, because that's the way we can build the most trust and deploy these pieces of technology to them most effectively, because they don't want to ship their code outside the company.

Anshul can talk a little bit more about how we leverage things like fine-tuning as well. That's a purely infrastructure problem that's very unique to us, versus any other company. Anshul, do you want to take that?

[21:59] Yeah, I think - as Varun said, there's a lot that we do from the individual infrastructure point of view, so that we can do crazy things like make it all free for all of our individual users… But once you actually self-host, there's a lot of things you can do that any other tool just can't do without being self-hosted. And what Varun just mentioned is personalization. If you're fully hosted in a company's tenant, you can use all of their knowledge bases to create a substantially better product.

I think the way we generally think about it is that you have a generic model that's good - it's learned from trillions of tokens of code in the public corpus… But if you think about any individual company, they themselves have hundreds of millions of tokens of code that has never seen the light of day. And that's actually the code that's the most relevant for them if they want to write any new code. Think of all the internal syntax, semantics, utility functions, libraries, DSLs, whatever it might be. A model like Copilot's or Codeium's, by the nature of it having to be low-latency, can only take about 150 or so lines of code as context. So this is not one of those ChatGPTs or GPT-4s where you're putting in files and files of context. What you can put in is really small, and so there's really no way for a single inference to have full context of your codebase without actually fine-tuning the base model that we ship to them on all their local code.
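To make that context constraint concrete, here is a minimal, hypothetical illustration of trimming a file down to a small window around the cursor before a completion request; the line budget and the prefix/suffix split are illustrative, not how Codeium actually does it:

```python
def build_context(lines, cursor_line, budget_lines=150):
    """Keep roughly two thirds of the budget before the cursor and one third after;
    everything outside this window is simply invisible to a single inference."""
    before = lines[max(0, cursor_line - (2 * budget_lines) // 3):cursor_line]
    after = lines[cursor_line:cursor_line + budget_lines // 3]
    return before, after


file_lines = [f"line {i}" for i in range(10_000)]  # a large file
prefix, suffix = build_context(file_lines, cursor_line=5_000)
print(len(prefix), len(suffix))  # ~100 and ~50 lines; the other ~9,850 are dropped
```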

So we've actually done a bunch of studies on how this massively reduces hallucinations, and all these other things that you always hear coming up with LLMs. But things like this, things like providing more in-depth analytics - all of them come from being self-hosted. And as Varun mentioned, these are all, at the core, to some degree an infra problem. How do you actually do fine-tuning locally, in a company's tenant? That's an infra problem we're happy to talk more about, but maybe I'll just… I'll pass it back to you, Chris.

Actually, I'm about to ask a follow-up about that, because you've got me really thinking about some of the use cases in my own life. So with the self-hosting model - kind of like with OpenAI and ChatGPT-4, there's only so far we're gonna go, because we've used the public corpus of knowledge out there on the internet, so there's only so much more vertical scaling you can do on the model learning… And you're touching on the fact that there's so much hidden IP in code, hidden information in code that is of huge value, particularly to the company that it's in, because it represents their business model, and the way their business has evolved over time. And so if I'm understanding you correctly, you're basically saying that your solution can take advantage of that on their behalf, and really hone against it.

What are some of the limits on privacy? Are they able to do that? Because that's a big topic. We've actually talked about it on the show before - in this generative AI age, with IP concerns and privacy concerns, and getting the lawyers involved… Are you able to do the training on their site, and keep it entirely with the customer? Or do they have to let their IP out? How do you approach that problem?

Yeah, so for any question like "Does any IP leave the enterprise and go to Codeium?", the answer is always no. So in pretty much every part of the system, our guarantee is to be able to deploy this whole thing fully air-gapped. We've even deployed in places like AWS GovCloud, which is an "it doesn't even have a connection to the internet" kind of scenario. So nothing ever leaves, to address some of the points you brought up there, Chris. And we're not the only ones saying "Oh no, the data that a company has privately is super-important" - it's potentially even more important than the size of the model.

[25:46] I think a good example of this is actually Meta. Instead of using a GitHub Copilot, or any generic system, they decided - I guess in classic Meta fashion - to train their own autocomplete model internally, using all of their code. And they actually published a paper, I think, a few weeks back. And their model was, in terms of size, I think 1.3 billion parameters - small with respect to the LLM world. And it just massively outperformed GitHub Copilot on pretty much every task. There's now corroborating evidence for what we're saying about fine-tuning - that doing this actually does lead to materially better performance for the user in question.

Now, is that Meta model going to be good for everyone else's code? Probably not. But that's also not the point. And in terms of being able to fine-tune locally - yeah, we're able to do this completely locally. And again, it comes down to scale of data. Our base model has been trained on trillions of tokens of code. That's a lot. That's why we need this multi-node GPU setup to do all that training. But an actual company - if they have, say, even 10 million lines of code, that's about 100 million or so tokens. There's still a huge, orders-of-magnitude difference between this pre-training and the fine-tuning, which is why we can do this locally, on - actually, surprisingly - whichever hardware they choose to provision for serving their developers.
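The back-of-the-envelope arithmetic behind that gap, assuming roughly 10 tokens per line of code (the ratio implied by "10 million lines is about 100 million tokens") and taking "trillions of tokens" as 1e12 for illustration:

```python
import math

TOKENS_PER_LINE = 10                                  # assumed ratio, implied by the numbers above

company_lines = 10_000_000                            # 10 million lines of company code
fine_tune_tokens = company_lines * TOKENS_PER_LINE    # ~1e8 tokens
pretrain_tokens = 1_000_000_000_000                   # "trillions", taken as 1e12

gap = math.log10(pretrain_tokens / fine_tune_tokens)
print(f"fine-tuning corpus:  ~{fine_tune_tokens:.0e} tokens")
print(f"pre-training corpus: ~{pretrain_tokens:.0e} tokens")
print(f"difference: ~{gap:.0f} orders of magnitude")  # -> 4
```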

So again, this comes back to some of our infra background and all the stuff we know how to do - we can actually do fine-tuning and inference on that same piece of hardware. So we don't ask companies to provision more hardware. And even more critically, we are able to do fine-tuning during any idle time on that GPU. So whenever that GPU is not being used for inference, it's actually doing backprop steps to continuously improve the model.
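A minimal sketch of that scheduling idea - inference requests always win, and a fine-tuning step only runs when nothing is waiting. `run_inference` and `fine_tune_step` are placeholders, not real Codeium APIs:

```python
import queue
import time

requests: "queue.Queue[str]" = queue.Queue()


def run_inference(prompt: str) -> str:
    return f"completion for {prompt!r}"   # stand-in for serving the model


def fine_tune_step() -> None:
    time.sleep(0.01)                      # stand-in for one backprop step


def serve_loop(iterations: int = 10) -> None:
    for _ in range(iterations):
        try:
            prompt = requests.get(timeout=0.05)  # serve developers first
            run_inference(prompt)
        except queue.Empty:
            fine_tune_step()                     # GPU would be idle: train instead


requests.put("def parse_config(")
serve_loop()
```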

Fine-tuning is just one aspect of a larger personalization system… But we've instrumented all of this on their hardware, using our inference infrastructure, to create a system that is relatively easy to manage; it's not a crazy amount of overhead for any company to manage or use Codeium… But they still get the maximum possible wins from these AI tools.

Okay, so that is super-cool. And you mentioned things like GovCloud, which I have actually worked in quite a bit in my day job, and I can think of a whole bunch of other use cases for me personally… Which begs the question - going back for a moment, because we are Practical AI, and we always like to give people some practical routes in… So if we go back toward the beginning of the conversation for a moment - we have some folks listening right now who have been using Copilot for a while; they're probably putting code into ChatGPT and trying to accelerate there, with varying degrees of success… They've been experimenting with Bard, and Bard has gotten better on code lately, obviously… So many people that I talk to are still very frustrated with the workflow of the whole thing… And recognizing that you've outlined these differentiators from Copilot and other competition out there, in a friendly-competition kind of way… Talk a little bit about some of the specific generative AI use cases that would be good if someone is in that position where they're like "Yeah, I'm using this stuff, but I'm a little bit frustrated with it. I don't have it down." If they were to give Codeium that chance and dive in on it, can you give me several - kind of lay out the use cases of what they're going to get when they move in, from a very practical, for-me-now-as-the-coder perspective? What does that look like? What are they gaining? And maybe give me a couple of different ones, because I'm really curious. And selfishly, I'm probably gonna try each of these that you're telling me… So I'm scratching my own itch by asking the question.

I think you pointed out - yeah, workflows and the user experience for a lot of AI tools… Everyone's still kind of trying to figure it out. We're still in the very early days of these AI applications. And this is one of our learnings as a product company now - we're actually taking the UX quite seriously, and this is actually what the individual plan was created to get feedback on.

[29:53] Very concretely, I think a lot of people have that frustration of having to copy a code block over to ChatGPT, write out a full prompt, remember the exact prompt they typed in before that gave them a good result, then copy the answer back in, and then make modifications… That workflow is clearly kind of broken. So when we built our chat functionality into the IDE, we're like "Okay, what are all the parts here that can get totally streamlined?" So we did things like, on top of every function block, there are little code lenses - these small buttons that someone can click, like "Explain this function." And it'll automatically pull in all the relevant context and open it up in the window; you're not copying anything over… And it's like writing [unintelligible 00:30:33.07]

Or if you, say, refactor a function, or add docstrings, or write a unit test - these are all just small little buttons, or preset prompts, that you can click, and it'll do its generation on the side. And then we even have a way of clicking "apply diff". And because we know where we pulled the context in from, we can apply a diff right back into that context. So you're not copying things back and trying to resolve merge conflicts. All of these things are done automatically.

So there's a lot of really cool things you can do when you start bringing these things into the IDE where developers are, and we've spent a lot of time really thinking, as you said, from a workflow point of view - how do you make this super-smooth?

Varun, could you talk a little bit about maybe some specific tasks that you're seeing people doing? When we talk about generative AI, it has expanded beyond LLMs - we're doing things in video, we're doing things in natural language… All of the different modalities are gradually being addressed with these different models, and different tools being built around them. Could you talk a little bit about what people are trying to code right now, and what specifically Codeium is helping them with - not just about Codeium, but the actual use cases themselves, so that they go "Ah, I can see a path forward. I can do that. I know how to generate this or that or the other with generative AI and coding"? Can you talk a little bit about those at a somewhat specific level?

So interestingly, just a little bit about multi-modality - I think we're maybe a little bit far from leveraging other modes beyond text for code. I think maybe that will happen, but there's not enough evidence right now yet. Just to be open about the functionality we have - we have autocomplete, we have search, and we have codebase-aware chat. And we recognize right now that autocomplete accounts for 90% to 95% of the usage of the product. It's because chatting is not something people do every day, potentially. They might open it up once every couple of days, but autocomplete is something that's always on, very passively helpful, and people get the most value out of it, which is kind of counterintuitive. I think people don't recognize that immediately. But when people are using autocomplete, we've recognized there are two modalities to the way people type code. There's a modality of accelerating the developer, which is "Hey, I kind of know what I'm going to type, and I just want to tab-complete the result", and then there's also an exploration phase, which is "I don't even know what I'm trying to do."

Based on that - this is a classic thing where my behavior writing code has materially changed because of tools like Codeium: I'll write a comment, and I kind of just hope and pray that it pulls in the right context, so that it gives me the best generation possible. So in my mind, for the acceleration case, Codeium is very helpful. It can autocomplete a bunch of code. But the exploration case - that's where the true magical moment comes in, where I had no clue how I was going to use a bunch of these APIs… And that's what we're focused on trying to make really better, whether that be in chat or with autocomplete - how do we make it so that we can build the most knowledgeable AI, that is maximally helpful and also minimally annoying?

The interesting thing about Codeium as a product, or these autocomplete products generally, is they take a little bit of getting used to; but even when they write the wrong thing, it's not very annoying, because you can very easily just say, "I don't want this completion." It didn't write an entire file out that you need to go and correct a bunch of functions in. It was a couple of lines, or maybe 10 lines of code; you can very easily validate that it's correct.

[34:02] That comes back to what Anshul was saying, which is "How do we make sure we can always provide the maximally helpful AI agent?" The answer is "Have the best context possible." And a couple of nitty-gritty details: currently, our context - and we'll write a blog post about this - is double what Copilot's is. We allow double the amount of context for autocomplete compared to what they do.

The second thing is, we're able to pull context from throughout the codebase. And it's that same piece of technology - pulling context from throughout the codebase for search and all these other functionalities - that gets used as part of chat, for codebase-aware chat, which is something Copilot doesn't even have today.

The third piece is - finally, for large enterprises - how do we make it so that these models actually semantically understand your code? …which is where fine-tuning comes in. For us, context gets us a lot of the way, but it doesn't get us all the way. Because you can imagine, even with double the context - let's say we can pass in 1,000 lines of code - for a company with 10 million lines, we're still scratching four orders of magnitude less code than the company actually has. So our vision is we want to continually ramp up the amount of knowledge these models have, and the ways in which they can be helpful. I don't know if that answered the question there…

It did, actually. Your acceleration versus exploration analogy - for me personally (different people get different things), that really clarified where I might be using Copilot, or where I would go and use Codeium… Because I do struggle on the exploration side myself. It's a lot easier on the acceleration at the end of the line [unintelligible 00:35:32.09] and crank through that fast, which I've been able to do with these other tools… But I have struggled on the exploration side… Because I kind of want to do a thing, and I'm kind of trying to figure it out, and I'm just going to see where my fingers lead on that… And having the ability to support that in the way you described - that gave me a very clear understanding from my standpoint.

So I'd like to ask each of you where this is going, both in the large and in your specific concern with Codeium. Things have never moved faster than they're moving right now in terms of how fast these technologies are progressing… And Daniel and I have a habit - we were commenting on our last episode about this - of saying "Yeah, we recently mentioned this thing, and that we'd get to it", but then we turn around and end up talking about how we got there way faster than we ever anticipated.

With the speed of generative AI, and you're already creating these amazing tools and having to stay out front - where is your brain taking you at night, when you stop and chill out and have a glass of wine or whatever you do, and you're just pondering "What does the future look like?" I'd like to know both from your own specific personal standpoints, in terms of your product, but also the generative AI world in general - how do you see it going forward? I'd love your insights.

Yeah, I think the classic question in the grand scheme of things is "Oh my God, is generative AI just gonna totally get rid of my job, or completely invalidate it?" And I think for us, we will be the first people to say that we do think AI will just be the next step in a series of - at least in code - tools that have made developers more productive; that have let them focus on the more interesting parts of software development… And be an assistant, right? All these tools are called AI assistant tools, I think, for a reason.

We're definitely not at a place yet - and I don't think we will be for a while - where there isn't going to be a human in the loop, in control, guiding the AI on what to do. So in that respect, the doomsday scenario - and I don't want to speak for Varun, but I think we're pretty far from that mentality. But we do think - I mean, we wouldn't have gotten into Codeium if we didn't genuinely think that there are just so many things we do day to day as engineers that are a little frustrating, boring, take us out of the flow state, slow us down… Those all seem like very prime, ripe things to try to address with AI. And I think that's kind of our general goal.

[38:04] I think there's a lot more capabilities to build. I don't think search and chat are going to be the last building blocks we build; we have more capabilities coming up that we're super-excited about. But yeah, it's also going to be a thing where, as you said, this is moving super-quickly. We have research, open source, and applications all developing at the same time, at breakneck speed… And so part of what we're also looking forward to is how we can educate all these software developers on the best way to use these AI tools. How do you make the most use of them, so that developers are part of the wave, and they also get a lot of value?

Well said. Varun?

Yeah, maybe if I was to just say - you were asking me what the big worry is. For me, the big worry is there's going to be a lot of exciting new demos that people end up building… And obviously, for us as a company, we need to make strategic bets on "Hey, this is a worthwhile thing for us to invest in."

For instance, I think a couple of months ago there was an entire craze about agents being able to write entire pieces of code for you, and all these other things. For us though, we had lots of enterprise companies using the product at the time, and we recognized that the technology just wasn't there yet. Take a codebase that's 100 million lines of code, or 10 million lines of code. It's gonna be hard to write C++ that spans five files, that compiles perfectly, and that also uses all the other libraries, when your context is only five files. It's not going to be the easiest problem. And I think that's maybe an example… But for us - just a pat on the back - over the last eight months we've iterated significantly faster than every other company in this space, just in terms of functionality… But we need to make strategic bets on what the next thing to work on is at any given point. And we need to be very careful about - hey, this is a very exciting area, but is it actually useful to our users? A great example is, given a PR, we generate a summary. I think Copilot has tried building something like this. And we tried using the product Copilot had, and it was just wrong a lot of the time. And I think that would have been an interesting idea for us to pursue and keep trying to make work… But then there are diminishing returns, and I think Anshul and I have seen this very clearly in autonomous vehicles, where we had a piece of technology that was just not there yet. Like, it needs a couple more breakthroughs in machine learning to get there… And the idea of building it five years in advance - you shouldn't be doing that. You just 100% shouldn't be building a tool when the technology isn't there yet. And that is what keeps me up at night - "What are the next things we need to build?", while keeping in mind what the technological capability set is today, if that makes sense.

It does, and it's a very Practical AI perspective, if you will. So very fitting final words for the show today. Well, Varun and Anshul, thank you very, very much for coming on the show. It's fascinating. I got a lot of insight and a lot of new things to go explore from what you've just taught me, and I appreciate your time. Thank you for coming on.

Thanks a lot, Chris.


Our transcripts are open source on GitHub. Improvements are welcome. 💚

