There's a new Llama in town
source link: https://changelog.com/practicalai/233
Transcript
Changelog
Welcome to another Fully Connected episode of Practical AI. In these episodes Chris and I keep you fully connected with everything that's happening in the AI community. We're gonna take some time to discuss the latest AI news, and then we'll share some learning resources to help you level up your machine learning game. This is Daniel Whitenack. I'm a founder and data scientist at Prediction Guard, and I'm joined as always by my co-host, Chris Benson, who is a tech strategist at Lockheed Martin. How are you doing, Chris?
Doing cool. I'm trying to figure out how did we survive before all these great new models, and stuff? Like, it's changed my --
Yeah, it's been crazy. I've just created a post for LinkedIn, and I was grabbing text, putting it into ChatGPT, getting nice rephrasing, and then I'm like "Oh, I need an image." And in particular - we'll talk about it a little bit in this episode, but I was like "Oh, there's this FreeWilly model from Stability AI, which is like whale-themed", and then I've got the LLaMA thing… So I just went to Stable Diffusion XL on Clipdrop and said, "Hey, generate me an image with a whale and a llama together"… And you know, how did I even post to LinkedIn before without these things? It's like a different world.
Yeah. 2023 versus 2022 is totally different. The content generation, the way you code… It's a different world.
Yeah. And this week, as most weeks are, it seems like, in 2023, had some pretty groundbreaking announcements and releases, and we're going to dive into a bunch of those things. There's just a huge amount to update on, and I think it's a good time for one of these episodes between you and I to just parse through some of the new stuff that is hitting our feeds.
Well, I mentioned LLaMA… One of the big things this week was LLaMA 2, but before we jump into LLaMA 2, which I think was maybe the main thing dominating at least my world this week, it might be worth just taking a little bit of time to highlight something outside of this stream of large language models, which also crossed my desk this week, which I thought was really cool… It's this latest version of NeRF. This is work from Google, presented at ICCV 2023, and it's called Zip-NeRF: Anti-Aliased Grid-Based Neural Radiance Fields.
That's quite a name right there.
It is quite a name. It stands for neural radiance field. So NeRF, it's like camel-cased, capital N, small e, and then capital RF, NeRF… These are fully-connected neural networks that create unique, novel views of complicated 3D scenes based on a set of images that are input. So I don't know if you've seen that video yet…
I'm looking at it as we are talking… And when you say "the video", I know which video you're talking about, because it's amazing. I've just left it on.
It's pretty spectacular. This is a podcast, so it's hard to express some of this for people… If you just search for Zip-NeRF, you can go to the page for this paper, which is a great summary. But there's a video on the page, and just to describe what it is - imagine this kind of complicated house, with a bunch of different rooms, and an outdoor patio, sort of garden area… And the video is actually this kind of almost like a drone flythrough of the house and then the outdoor area. If you imagine a drone flying through a house - there's hats, and coats, and toys, and couches, and plants, and all sorts of things everywhere… But the video is extremely seamless, and it's not generated by a drone. It's actually just generated by interpolating between a whole bunch of 2D images, and inferring from those the 3D scene. So yeah, I don't know, what are your impressions, Chris?
First of all, from the perspective - the drone flight, if you will, that you have as a perspective viewing it - it's like the best drone operator in the history of the world.
Yeah, it would probably be hard to get one to do that.
Yeah, you're not gonna get a real drone operator that could fly that amazingly, and get those things. It's just phenomenal. And the house is like -- for a moment, you look at it, and I mean, it looks real. But I have noticed, it's cluttery, but it's immaculately clean at the same time as well. The clutter is cleanly distributed, and stuff. I wish when my house was cluttered, it looked as beautiful as this house. It doesn't.
But yeah, I mean, just, if you didn't know, if you weren't listening to the Practical AI podcast to go look at it or something like that, and you just stumbled upon it, you'd think it was a drone video, if you didn't have the education. You'd go, "Oh my God, this is just really cool. I wonder what they're doing here." But it's indistinguishable from real life, for all practical purposes.
[00:06:09.20] Yeah. So it's based on 2D images, and then there are these generated interpolations, which maybe gets to -- there was something that we were talking about prior to hitting the Record button, which was this whole field of generative AI is sometimes conflated with large language models, or ChatGPT… But there's a whole lot going on in generative AI that's not language-related, or maybe even based on language-related prompts. So I mentioned that image that I generated for my LinkedIn post… That was still a text prompt into a model that generated an image. But here, what we're seeing is we've got static 2D images that are input to a model that's actually generating a whole bunch of different perspectives that are synthesized into a 3D scene. So this is, I would say, still fitting into our current landscape and world of generative AI, but it's not a text in/text out, or text in/image out model.
Right. And I think people -- there's so much coming at people right now. We keep talking about that this year - in the five years we've been doing this podcast, we've never had a stretch like the last few months, where new things have been coming at people so fast. New terms, new models, and people are trying to distinguish… So I think it's pretty fair that people are trying to make sense of how they relate together. And the idea of generative models and the idea of large language models overlap in a lot of areas. You have models that are both, and you have models that are just one. But I think it's a brave new world right now in terms of the amount; every show, we're just trying to figure out what matters right now. Because there's a lot we're not hitting.
Yeah. And this side of things, maybe like the 3D or video or image-based side of things, I know has its own set of kind of transformative use cases that are popping out. I even remember a little while ago there was some technology, I think from Shopify, but others have done this as well, where maybe you have a room in your house, and you want to see how you can transform it with new furniture or something, that of course you could buy… This is a real kind of e-commerce or retail sort of use case for some of this scene technology of a different kind. If you think of this sort of technology that can take 2D things and create these 3D scenes, certainly there's use cases within game development, for example, but even other cases where maybe AI has never impacted the process as much - like in real estate, for example… You know, how expensive is it to literally have a person come out with specialized camera gear… I know that we've had this in the past, where it takes a special person to come out, with special camera gear, to capture the kind of 3D walkthrough, essentially the Street View walkthrough of your house, and map that onto an actual schematic of your house… And here, if you imagine someone - maybe I'm now selling my house myself, without a real estate agent, and I can take an app potentially and go through my house just taking 2D images and create this really cool flyaround 3D view that's interactive. That's really, I think, a powerful, transformative change for a number of different industries.
I came across a company called Luma AI in one of the posts about this technology… I don't know exactly how much of the - if they're even using the Zip-NeRF stuff, but certainly some things related to NeRF to take these 2D images, and they have an app that will create 3D views… It's pretty cool to see some of this kind of hit actual real users.
[00:10:16.08] We keep talking about the fact that we've hit this inflection point where it's hitting all the - you don't have to be in the AI world for this to have a big impact. So it's very easy looking at the Zip-NeRF video to imagine walking around with your cell phone on an app… You're just kind of walking around, and the app takes care of whether it's video, or whether it's still images or what, and it just uploads it, and produces this amazing… So it's not your walkaround that it's doing. It takes that as raw video, but then it produces this super-high quality thing. So yeah, I mean, I think this is another case where there's this one technology with thousands of use case possibilities, where it just changes everything.
Yeah. And maybe also in the -- I'd be curious to know your reaction to this also, with respect to kind of the industrial use cases, where --
Oh, I've been thinking about it…
Of course, capturing 3D scenes is very important, for example for simulated environments, where you're trying to maybe train an agent, or even kind of an industrial training-for-humans sort of scenario, where you want to kind of take someone into an environment that it's physically hard to bring a lot of people into…
Yeah. Or there could be safety issues, and such.
Yeah, safety issues… I don't know if that sparks things in your mind. I think in the industrial sense, this could have a more B2B sort of impact than just a consumer app.
Sure. I mean, a simple thing - and I'm making something up in the next thing I'll say. It's very easy for me to imagine intelligence agencies that are -- if you go back some years to when Osama bin Laden was found, and they had various imagery and stuff, but with stuff like this they might take all those images that they're getting from various sources and produce a high --
Yeah, a flyover, and very photorealistic, of certain parts of the compound with that kind of imagery… And that can be used in a military operation subsequently. Now, I'm making that up, so nobody should take that as a thing. But it's not hard to imagine that. It's not hard to imagine a lot of factory uses and other industrial things where you have safety issues, you have limited-access kinds of concerns, where you're trying to convey that… But there's a lot of mundane things, there's a lot of home-based things and small business things; as you pointed out, the real estate one earlier. So this is just one technology that we're talking about so far.
Yeah. And I think what you're saying - it illustrates how this is impacting very large organizations, all the way down to small organizations.
Yeah, sole proprietorships.
Yeah. And it's interesting how - like, if we just take this use case, for example, these kind of 3D scenes, and kind of large-scale organizations that maybe their bread and butter was either the compute associated with rendering videos and 3D scenes, or they're hardware providers that are creating specialized kind of 3D equipment… Like, their whole business model - they've got to be thinking, similar to other organizations that are dealing with maybe language-related problems and are thinking about these things with respect to LLMs, that there's a fundamental shift in maybe how their businesses will operate. But then, at the same time, it provides an opportunity for the kinds of small to medium businesses to embrace this technology very quickly and actually make innovative products that can be widely adopted very quickly, and actually be competitors within an established market. So there's an established market for 3D things; that has been quite expensive over time, in terms of access to that technology… So now that whole market is going to change, and I think a lot of these players will be these kinds of small to medium-sized businesses.
I agree. I think there's a moment here, kind of ironically, because people are so worried about the impact on human creativity because of all these models and stuff like that… But on a more positive note, there's this huge opportunity that you're just now alluding to, for people, that if you can connect the dots as things are coming out, and you can stay on top of it, it's a great equalizer. And so it will clearly change many, many markets that are out there, and many, many industries. And so there's huge opportunities for those who want to surge ahead at this moment and take advantage of that. And so I think that the message we tend to see in the media tends to be a little bit doom-and-gloom on that, but it kind of discounts the fact that change isn't always a bad thing. People are afraid of it, but there's huge opportunities here as well if people choose to go find them.
Break: [00:15:22.13]
Well, Chris, there is a new LLaMA in town.
LLaMA 2. Basically, it destroyed all of my feeds and concentration this week when it was released, because it is quite - to me an encouraging thing, but also another transformative step in what we're doing. So LLaMA 2, for those that maybe lack the context here… Meta, or Facebook, or however you want to refer to it - Meta had released a large language model called LLaMA, which was extremely useful. It was a model where you could host it yourself, as opposed to like OpenAI; you could get the weights and host it yourself. But the original LLaMA had a very restrictive licensing and access sort of pattern. Even though you could kind of download the weights from maybe like a BitTorrent link or something like that, and those propagated, technically if you got those weights you were still restricted by a license that prevented commercial use cases specifically.
And now with LLaMA 2, Meta has released the kind of follow-on to LLaMA, and we can talk through some of what the differences are, and what it is, and some of what went into it. But I think one of the biggest things, which is going to create this huge ripple effect throughout the industry, is that they've released it with a commercial license. As long as on the day that LLaMA 2 was released you as a commercial entity didn't have greater than 700 million monthly active users, you can use it for commercial purposes. So maybe if my company later on has 700 million monthly active users - which would be great; probably never, but…
There'll be something past LLaMA 2 by then though.
Yes. It does, though - I could still actually use it, because it's only based on the release date. So on the release date, which was this week, as long as you didn't have greater than 700 million monthly active users, you can use this in your business for commercial use cases, and I think that's going to have a huge ripple effect downstream. And we can talk about the model itself here in a second, but maybe just -- I'll pause there to get your reaction on that, Chris.
It made me smile when I heard that, because it's kind of like saying, "So long as you don't compete with us at Meta, you can use this commercially."
Oh, it's totally true. Yeah. Like, who is that? So that's Snapchat?
TikTok… You can think of who this is. And I guess one way to put this is it's not totally open source, quote-unquote. We wouldn't call this maybe open source in the kind of official definition of open source. But it's certainly commercially available to a very wide set of people.
Yup. You know, one of the first things I noticed when this came out on their page - and I'm diving into the specifics of the model here - is we had an episode not too long ago, and you were describing kind of the… I believe it was the 7 billion limit in terms of hardware usage, and stuff. And having been taught that by you, I immediately locked in on the smallest being 7 billion there, and I thought, "Ah, this is what Daniel has taught all of us about that limitation on accessibility and who can do it." So it has the 13 billion and the 70 billion size, but I definitely picked up on the 7 billion, which I'm assuming is going back to what you were teaching us a few episodes back.
Yeah. And so just to fill in a little bit on that… So the LLaMA 2 release includes three sizes. So again, thinking back to the kind of characteristics of large language models that matter as you're considering using them - one is license. We've already talked about that a little bit here. We might revisit it here in a second. Another is size, because that influences both the hardware that you need to run it, and then also its kind of ease of deployment.
[00:20:03.20] So LLaMA 2 was released in 7 billion parameter, 13 billion parameter and 70 billion parameter sizes. And then there's also, of course, the training data and that sort of thing that's related to this, and how it's fine-tuned or instruction-tuned. So LLaMA 2 was released in these three sizes, both as a base large language model, and as a chat fine-tuned model. So there's the 7 billion, 13, and 70 billion LLaMA 2s, and then there's the 7, 13 and 70 billion LLaMA 2 chat models… Which we can talk about that fine-tuning here in a second.
But yes, you're right, Chris, in that 7 billion - I could reasonably pull that into a Colab notebook. And maybe with a few tricks, but certainly with the great tooling from Hugging Face, including ways to load it in even 4-bit, or other quantizations, I can run that on a T4, for example, in Google Colab, with some of the great tooling that's out there. So not needing to have a huge cluster.
The 70 billion - even with that, that's kind of another limit where, using some of these tricks, I've definitely seen people running the 70-billion parameter model on an A100; again, loading in 4-bit, with some of the quantization stuff and all that. But 70 billion is certainly going to be more difficult to run; it might require multiple GPUs. But that's kind of the sizing range for people to have in mind in how accessible things are.
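As a rough back-of-envelope check on those hardware limits, you can estimate a model's weight footprint from its parameter count and quantization level. This sketch counts only the weights; it ignores activation memory, the KV cache, and framework overhead, so real requirements run somewhat higher:

```python
def weight_footprint_gb(n_params: int, bits_per_param: int) -> float:
    """Approximate GPU memory needed just for the model weights, in GB."""
    return n_params * bits_per_param / 8 / 1e9

# LLaMA 2 7B in 4-bit: ~3.5 GB, comfortable on a 16 GB T4
print(round(weight_footprint_gb(7_000_000_000, 4), 1))    # 3.5
# LLaMA 2 70B in 4-bit: ~35 GB, tight even on a 40 GB A100
print(round(weight_footprint_gb(70_000_000_000, 4), 1))   # 35.0
# LLaMA 2 70B in fp16: ~140 GB, which is multiple-GPU territory
print(round(weight_footprint_gb(70_000_000_000, 16), 1))  # 140.0
```

This is why 4-bit loading is the trick that moves the 7B model into free-tier Colab range and the 70B model onto a single high-end card.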
I'm just curious, if you're looking at these, and you're a business out there, or a data scientist… Can you make up a couple of use cases that you might target with each of these, where you might say, "Oh, I want to go 13 on this. Not 7, not 70 for something like this." Can you imagine something like this? I'm putting you on the spot.
Yeah, I think -- I mean, there's certainly innumerable use cases… But I think maybe two distinctions that people could have in their mind is if you want like your own private ChatGPT… Or another way to think about it is a very general-purpose model. You could do anything with this model. Any specific prompt, whatever. You're probably going to look towards that higher end, the 70-billion parameter model, for that kind of almost ChatGPT-like performance; you're going to have to go much higher.
But as we've talked about on the show before, most businesses don't need a general-purpose model. They need a model to do a thing, or a task, or a set of tasks. And so in that case, because this is open and commercially-licensed, businesses could take those 7 and 13-billion parameter models and fine-tune them for a task in their business, which also increasingly has amazing tooling around it - again, from Hugging Face and others, with the PEFT library (parameter-efficient fine-tuning) and the LoRA technique, which is the low-rank adaptation technique, which basically only adapts an existing model - it's kind of an adapter technique, rather than retraining a bunch of the original model… This opens up fine-tuning possibilities in these smaller models, where that fine-tune for an organization is going to perform probably better than any general-purpose model out there. And because it's that smaller size, you can run it on a reasonable set of hardware that's not going to require you to buy your own GPU cluster to host the thing. So that's kind of a range of use cases that people could have in mind.
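The arithmetic behind why LoRA makes those fine-tunes so cheap is easy to sketch. Instead of updating a full weight matrix W, LoRA learns a low-rank update B @ A on top of it. The 4096 x 4096 projection and rank 8 below are illustrative sizes, not LLaMA 2's exact configuration:

```python
import numpy as np

d_out, d_in, rank, alpha = 4096, 4096, 8, 16  # illustrative, hypothetical sizes

W = np.zeros((d_out, d_in))      # frozen pretrained weight (stays untouched)
A = np.random.randn(rank, d_in)  # trainable, rank x d_in
B = np.zeros((d_out, rank))      # trainable, zero-initialized so training starts at W

# The adapted weight actually used at inference: W plus a scaled low-rank update
W_eff = W + (alpha / rank) * (B @ A)

full = W.size            # 16,777,216 params to fine-tune this matrix directly
lora = A.size + B.size   # 65,536 params with the adapter
print(full, lora, f"{full / lora:.0f}x fewer")  # 16777216 65536 256x fewer
```

The adapter trains roughly 256x fewer parameters per matrix here, which is what lets a modest GPU fine-tune a 7B or 13B model.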
[00:24:00.00] I have one more question for you before we abandon this. 7 billion to 70 billion being an order of magnitude jump - why would you have something fairly close to the bottom end, at 13 billion parameters? What's the difference in 7 and 13, when the next step is all the way up to 70? What's the rationale, do you think?
Yeah, so it is interesting, actually… If I'm understanding right from some of the sources that I've been reading, there was actually a -- I forget if it was a 30 or 34-billion parameter model that they also had in pre-release, and were tuning… So there was another one that would have fit in that slot, that gap you're talking about… If you think of MPT, MPT has a 30-billion parameter model; that fits in that kind of gap.
My understanding - and our listeners can correct me if I'm wrong; please do - but my understanding is that they actually did test that size of model and found it to not pass their kind of safety parameters around harmful, potentially harmful output, or not-truthful output, that sort of thing. So they decided actually to hold that back.
So it could be possible, as they instruction-tune and run more iterations of reinforcement learning from human feedback, that there may be a model that they release in that parameter range. So that was one thing that happened, I think.
It is interesting - several things here are unique about this model specifically, or maybe the release as well, other than the license. One is that they were fairly vague on the data that went into the pre-training. So they talked specifically about some very intense data cleaning and filtering that they did on public datasets. And it was trained on more data than the original LLaMA, but they were fairly vague on the mix of that data, and all of that. So that may be related to feedback they got on the datasets that were used in the first LLaMA, I don't know, but the technical paper was mostly related to the modeling and fine-tuning trickery and methodologies that they used, which was interesting.
And one of those interesting elements of the way that they fine-tuned this model was, I think, the reward modeling. So if you remember, the GPT family of models, the MPT, Falcon, these different models - one of the things that is often done with these models is this process of reinforcement learning from human feedback… And we covered this on a previous episode, which we can link in the show notes… But it's actually using human preferences to score the output of a model, and then using reinforcement learning to correct the model to better align with human preferences, or human feedback.
They actually used two separate reward models in this fine-tuning of the chat-based model - one that was related to helpfulness, and the other that was related to safety. And one of the interesting things that they talked about in the paper was how sometimes those things can kind of work against each other, if you're trying to do both of them at the same time. So they actually separated out the reward models that they used for the chat fine-tuning into these two reward models, one for helpfulness and one for safety, which is quite interesting, I think.
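A toy sketch of why you'd keep two reward models separate rather than averaging them: the LLaMA 2 paper describes gating on the safety score, so a simplified, hypothetical version of that idea (the 0.15 threshold and the exact gating rule here are illustrative, not the paper's full formulation) looks like:

```python
def combined_reward(helpfulness: float, safety: float, threshold: float = 0.15) -> float:
    """Gate between two reward models instead of averaging them.

    If the safety reward model scores the response as risky (low score),
    its signal dominates the training objective; otherwise the response
    is optimized for helpfulness. A plain average would let a very
    helpful answer mask an unsafe one.
    """
    if safety < threshold:
        return safety
    return helpfulness

# A helpful but unsafe completion is penalized on safety alone...
print(combined_reward(helpfulness=0.9, safety=0.05))  # 0.05
# ...while a safe completion is scored purely on helpfulness.
print(combined_reward(helpfulness=0.9, safety=0.8))   # 0.9
```

That tension - a single scalar reward can't express "be helpful, but never at the cost of safety" - is exactly the trade-off Daniel describes the paper working around.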
Break: [00:27:47.20]
So Chris, maybe just a couple other things related to LLaMA, and then I want to get your feedback on the code interpreter as well, because we haven't talked about that yet on the show. And maybe Claude 2, if we can get to it.
Yeah, we've got to mention Claude 2 as well, because they were both big releases.
Yeah. So just one maybe other note, which I find quite interesting - and actually, I'd love our previous guest Damien's thoughts on this, who was in our last episode about the legal implications of generative AI… But one of the interesting things about the LLaMA license, in addition to it allowing this commercial usage, is that there is technically a restriction in the LLaMA license that says "You will not use the LLaMA materials" - which includes the model weights, etc. - "or any output or results of the LLaMA materials to improve any other large language model, excluding LLaMA 2 or derivative works thereof." So essentially, what this means is if you're using LLaMA 2 and you want to fine-tune a model, or you're fine-tuning a model off of LLaMA 2 outputs, you're stuck with LLaMA 2. Basically, LLaMA 2 is your model, and you're going to stick with LLaMA 2. So you couldn't, for example, technically take outputs from LLaMA 2 and fine-tune, say, Dolly 3 billion. That would not be allowed by the license, and of course, that's something that people are doing all over the place. They're taking outputs from GPT-4 and fine-tuning a different model, or taking outputs from a large model, like maybe LLaMA 2 70 billion now, and fine-tuning another model that's smaller, based on a certain type of prompt or something. So this is restricting the family of models that you're allowed to do that sort of thing with, which is the first time I've seen that, and I think it's kind of interesting.
Yes, it strikes me as another Mark Zuckerberg anti-competitiveness thing… Which he's fairly famous for. I mean, that's kind of -- even before this.
Yeah. And how could you enforce such a thing? [laughs]
That was my next question to you - is there any possible way that you could conceive of to actually know that from an enforceability standpoint?
I don't either. So it seems like it's a license thing, and it will concern the lawyers… But it's hard to imagine. I mean, going back to our conversation last week, once you have output, and that output is input to more output, there's a point where it becomes very, very, very difficult to know what the sourcing really was.
Yeah. And the fine-tunes are already appearing off of LLaMA 2. The most notable probably is FreeWilly, which is from Stability AI, and is a fine-tune of the largest, 70-billion model. But there's other ones coming out as well. And so I think we're about to see just a huge explosion of these LLaMA 2-based models for a whole variety of purposes. And who knows how they will fit into that licensing restriction, or how open people will be about that… But it's about to start. The fine-tunes are already coming.
Yeah. Well, to your point earlier, they weren't terribly clear about the data that they were sourcing from their own standpoint… And I find it interesting, a little ironic.
It's a bit of a double standard maybe…
Yeah, a little bit of a double standard right there, in terms of "We're not going to tell you everything about how we're doing input, but by the way, you'd better not use our output."
So yeah, a little interesting. Do you think there's any risk of a walled garden kind of concept happening in large language models, if others were to follow this lead on anti-competitiveness?
[00:32:03.04] Yeah, it will be interesting… I think it is a notable trend that the first LLaMA from Meta was not open for commercial use at all, and now they're opening it up for commercial purposes. And maybe there's a separate trend that will happen with some of these use-based restrictions that people are importing into their licenses, and how useful those things are over time; that may shift, and we'll see those things die off. Or maybe if they're enforced, and there's precedent, maybe we'll see things go the other way. I'm not sure.
But speaking of models where you might get their output and use it to train other models - that is, these large-scale proprietary closed models from people like OpenAI, and Anthropic, and others - we've got a couple of things that we haven't talked about on the show yet, which people should probably have on their radar. One of those is Claude 2. What do you think about Claude 2, from Anthropic?
Yeah, I've been playing around with it a lot in the last week, and I kind of have a set of things that I try over and over again; they're kind of my standard tasks as new models come out. And some of them are coding, and some of them are content generation, which are kind of the two big things that I use most often. It was interesting - the input size for Claude 2 is much larger than the others.
Like, much, much larger.
Much, much, much larger.
So 100,000 tokens.
Yeah. And so it's had me kind of change the way I'm approaching it, in that, by contrast with ChatGPT, you're trying to figure out, with the limits that you have both on input and output, how do you kind of prompt-engineer your way to get where you're trying to go… Which has become this whole skill set we've been talking about in recent months. And yet Claude 2 almost kind of wipes that out a little bit - in some ways, not in all ways - in that you can hit it with a much larger input space… And so it's changing how I'm thinking about kind of getting to the output that I want. And the output is a bit different. It's not the same. I'm getting different outputs from all the models. They're not all the same, definitely.
I think my biggest thing is, with all these new releases - I'm trying to figure out how do I use each one. I'm trying to develop my own strategy on "When do I go to ChatGPT by default? When is that the right thing?" And that's changing, as we'll talk about, with things like plugins and stuff; that's evolving. But then Claude 2 comes out, and then you have on the open source side, as we just talked about, LLaMA 2.
So I think trying to understand all the tools in the toolbox in relation to each other has been interesting. So with Claude 2 I'm really focused right now primarily on large content output; that's kind of where I've landed on that.
And the 100k context length of Claude 2 is something I find really compelling as well. There was also a significant paper that came out, that caused a lot of waves in terms of context length and thinking about that, which showed that as you increase context length, the model tends to lose track of the middle of that context. The beginning and end matter more in terms of what shapes the quality of the model's output, in terms of how you would measure that. So we'll link to that paper maybe in the show notes as well.
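One practical response to that "lost in the middle" finding, if you're stuffing many retrieved documents into a long context: reorder them so the highest-ranked material sits at the edges of the prompt, where models attend best. A minimal sketch of that idea, with the ranking step assumed to have already happened:

```python
def reorder_for_long_context(docs_best_first: list) -> list:
    """Place top-ranked docs at the start and end of the prompt,
    pushing the weakest ones into the middle, where a long-context
    model is most likely to overlook them."""
    front, back = [], []
    for i, doc in enumerate(docs_best_first):
        # Alternate: even ranks fill the front, odd ranks fill the back
        (front if i % 2 == 0 else back).append(doc)
    return front + back[::-1]

ranked = ["best", "second", "third", "fourth", "worst"]
print(reorder_for_long_context(ranked))
# ['best', 'third', 'worst', 'fourth', 'second']
```

Note how the two strongest documents end up first and last, while the weakest lands dead center - the position the paper suggests contributes least to output quality.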
But I've tried some things… I mean, I don't know exactly all of the details… Again, Claude is one of these closed models, so I don't know all the details of how they're doing things. And because it's sitting behind an API, it's hard to know how those things evolve over time. But for example, one of the things with Claude 2 is I just took one of our complete podcast transcripts, a full episode, so 45 minutes of audio transcript… I took episode 225, which I really enjoyed, talking a lot about the things that I'm working on right now with Prediction Guard… And I just asked it to give me a summary of the main takeaways. I pasted in the whole thing, and it's a fairly good, comprehensive set of takeaways, like "Many companies banned usage of certain LLMs", blah, blah, blah. Prediction Guard is trying to provide easy access, structuring, validation, compliance features for LLMs. Making LLM usage easier, blah, blah, and it gives these great takeaways…
[00:36:28.11] And then I asked, "Hey, suggest a few future episodes that we could do, that maybe cover related topics, but things that weren't covered in this episode." Pretty good. Some of them are kind of generic… A look at the current state of AI agents and automation, how close are we to no-code AI app generation, blah, blah, blah. So all of that, off of this large context of the transcript input, was quite interesting.
I'm curious - I'm gonna put you on the spot also. As someone who's working on your own product - and I know this is not a Prediction Guard episode, but I'm asking on my own behalf and on behalf of the listener… As someone who is looking at these different models, how do you think of them? How do you structure them in your mind in terms of what you're offering? You've been evolving rapidly over the last few months, and I'm always curious to see where your head's at on this now, as you're looking at them.
Yeah, the thing I'm consistently seeing - and I made a post on LinkedIn about this as well - is that even in my own LLM-based applications, having access to multiple models, rather than a single model, is a really nice usage pattern. The easier we can make that, the better - and there are other people doing this as well. In Prediction Guard you can query a whole bunch of models at the same time, concurrently… There are other systems that will let you look at that output as well - nat.dev, and some of the toolbar stuff that Swyx is doing… We had a collaboration with him on the Latent Space podcast…
So the more you can tie these things together and look at - or automatically analyze - the output of multiple models at the same time, the better; I think that's really useful. Because it's hard to evaluate these models in general until you start evaluating them for your use case, and building intuition about them for your own use case. So the pitfall people fall into is saying, "Oh, I'm going to use this model" before they've even tested it for their use case.
Try creating a set of evaluation examples for your own use case, and then try out a bunch of different models against it. Also try out the things that are becoming standard operating procedure for building LLM applications: looking at the consistency of outputs, running a post-generation validity or factuality check on the output - so checking a language model with a language model - doing input filtering, and all these sorts of more engineering-related things. Those are some of the things that I'm seeing… But having access to a bunch of models at the same time is something that can really boost your productivity.
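The multi-model evaluation pattern Daniel describes can be sketched in a few lines. This is a minimal illustration, not anything from Prediction Guard's actual API: the model functions here are stand-in stubs, and in practice each would wrap a real completion call (OpenAI, Anthropic, a hosted LLaMA 2 endpoint, and so on).

```python
def model_a(prompt):
    # Stub standing in for a real completion call (e.g. an OpenAI request).
    return "model-a answer to: " + prompt

def model_b(prompt):
    # Stub standing in for a second provider (e.g. Claude or LLaMA 2).
    return "model-b answer to: " + prompt

# The set of models you want to compare, queried side by side.
MODELS = {"model-a": model_a, "model-b": model_b}

# A small, fixed set of use-case examples - the "evaluation set" above.
EVAL_EXAMPLES = [
    "Summarize the main takeaways of episode 225.",
    "Suggest future episode topics not covered in this one.",
]

def run_eval(models, examples):
    """Collect every model's output for every example, side by side."""
    return {name: [fn(ex) for ex in examples] for name, fn in models.items()}

results = run_eval(MODELS, EVAL_EXAMPLES)
for name, outputs in results.items():
    print(f"{name}: {len(outputs)} outputs")
```

The point of the pattern is the shape of `results`: every model's answer to every example, ready for side-by-side inspection or an automated consistency check.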
I appreciate that. And to our listeners, we're not making it a Prediction Guard show or episode, but as a co-host, Daniel's excursion through this in his professional career has made him, in my view, one of the world's true experts in how to look at all of these together. And since we have the benefit of him co-hosting the podcast, I'm going to continue to take advantage of that expertise for all of us.
Sorry about that, Daniel. Sorry for putting you on the spot.
Yeah, no worries. The other thing maybe to highlight with Claude 2 - and something you were talking about in the chat before we jumped into this episode - is Claude 2, or maybe Anthropic and their offerings, versus OpenAI. How do we understand that? How do we categorize these things? One of the interesting things is that we've seen both Anthropic with their Claude models and OpenAI with their GPT models increase context size over time. The GPT models haven't gone quite as far as Claude, but both have increased.
[00:40:28.09] They've also both added in some of this functionality, which I think is very interesting… Claude 2 was first, if I'm not wrong, with the ability to add in your own data. In Claude 2 there's a little attachment button, and you can upload PDFs or text files or CSVs and have that inserted into the context of your prompt… Which is, of course, extremely powerful. We've talked in the past about adding external data into generative models and grounding models; it's very powerful.
Now, OpenAI is doing this in a slightly different way, and I think this is worth calling out on the podcast. With their new code interpreter beta feature within ChatGPT you can upload data, but it's processed through the code interpreter differently than what Claude is doing. We all know that ChatGPT and the GPT models can generate really good code, and specifically good Python code… So what OpenAI has done for their data processing agent within ChatGPT is say, "Well, let's just have our model generate Python code, hook up the ChatGPT interface to a Python interpreter, go ahead and execute that code for you over your data, and then give you the output." So this is maybe a distinction people can keep in mind - with Claude 2 you can upload a huge amount of context, you can upload files, and have them inserted into the prompt. As far as I know, they're not running any kind of code interpreter under the hood.
ChatGPT might not be inserting all of that into the prompt. Instead, they're saying, "Well, what if we decompose what you want to do with this external data into something that can be executed by a sort of agent workflow?" You upload your data and ask for some analysis over it, the language model generates some code, that code is actually executed in the background, it returns a result, and that result is fed back through a model to give you generated output in the interface. So it's actually a multi-stage thing happening in the code interpreter in OpenAI.
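The multi-stage flow described here - generate code, execute it, summarize the result - can be sketched roughly like this. OpenAI's actual implementation is not public, so this is only an illustration of the shape of the loop; `fake_model` stands in for the code-generation step, and a real system would sandbox the execution rather than use bare `exec`.

```python
import contextlib
import io

def fake_model(request):
    # Stand-in for the LLM's code-generation step: turn the user's
    # request into Python code. A real system would call a model here.
    return "import statistics\nprint(statistics.mean([12, 15, 14, 13]))"

def run_generated_code(code):
    """Execute model-generated code and capture what it prints."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, {})  # a real interpreter would sandbox this
    return buffer.getvalue().strip()

def answer(request):
    code = fake_model(request)           # stage 1: generate code
    result = run_generated_code(code)    # stage 2: execute it
    return f"The computed answer is {result}."  # stage 3: summarize

print(answer("What is the average transcript length?"))
# → The computed answer is 13.5.
```

The key design point is that the model's output is treated as a program, not as an answer: the natural-language reply the user sees is built from the program's result.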
It effectively produces a no-code solution, where you just get an output… Instead of using the language model to generate your own code, to be your code assist, and all that - where you're still doing the work yourself - it skips that whole step.
Yeah. And I can give an example I actually ran prior to this show. I had Claude and the OpenAI code interpreter open side by side, and I uploaded a file with a bunch of Yorùbá - which is a language spoken in Africa - transcriptions of audio, from the Bible TTS project that we worked on with Coqui and Masakhane… So I uploaded this file, which includes this Yorùbá text, in CSV format. OpenAI said, "Great, you've uploaded this file. Let's start by loading and examining the contents." And then it has this sort of Show Work button, and you can see the actual code that it generated - Pandas code to import the CSV and then output some examples. So you can expand that and actually see the code it ran under the hood, and the conclusions the agent came to.
[00:44:05.06] Then I asked it, "Okay, plot the distribution of the transcript lengths. Are there any anomalies?" And again, there's Show Work, and you can see it's importing matplotlib, taking in the CSV, and actually creating the plot - it generates an image out of the transcripts, and says, "I didn't find any anomalies. They're all within the same distribution." Then I asked it, "Can you translate all the Yorùbá to English?", and that's where it ended up stopping, because it said, "No, I'm not good at doing that." And Claude actually stopped there as well and said, "No, I'm not going to do that."
I also uploaded the Yorùbá alignments to Claude, and it said, "Sure, let me analyze these transcripts", and it just output some general takeaways, like "There are 50 audio links. The transcript links…" There's no Python code there; it just gave me some takeaways. Then I said, "Are there any anomalies?" And it said, "I checked and I can't find any." And "Could you translate it?", and it said, "Unfortunately, I can't." So it's all still a chat-based thing.
So you can see different approaches to this complicated workflow: having an assistant agent executing code for you, versus putting more context into the language model and having it reason over that context.
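The anomaly check the interpreter ran on the transcript lengths can be approximated with a simple length-distribution pass. The CSV schema, column names, and z-score threshold below are all assumptions for illustration - the real file's structure wasn't shown in the episode - and standard-library modules stand in for the Pandas/matplotlib code the interpreter generated.

```python
import csv
import io
import statistics

# Stand-in for the uploaded Yorùbá CSV (schema is hypothetical).
SAMPLE_CSV = """audio_link,transcript
a1.wav,foo bar baz
a2.wav,foo bar
a3.wav,foo bar baz qux
"""

def transcript_lengths(csv_text):
    """Read the CSV and return the length of each transcript."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [len(row["transcript"]) for row in reader]

def find_anomalies(lengths, z_threshold=3.0):
    """Flag lengths more than z_threshold standard deviations from the mean."""
    mean = statistics.mean(lengths)
    stdev = statistics.pstdev(lengths) or 1.0  # avoid dividing by zero
    return [x for x in lengths if abs(x - mean) / stdev > z_threshold]

lengths = transcript_lengths(SAMPLE_CSV)
print(find_anomalies(lengths))  # → [] (all lengths within the distribution)
```

This mirrors the interpreter's conclusion in the example above: with all lengths inside the distribution, no anomalies are reported.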
So they each almost have their own strengths, with different types of approaches to problems. Would that be fair?
So that's another way of thinking about it - you start understanding how the different large language models approach a problem, and which tooling might be better or worse for a given use case. That also helps you pick which way you want to go, in addition to maybe just using multiple models, as you talked about earlier.
Yeah, exactly. And there's so much to dive into on all these topics that we've covered today… I am going to make sure that we include some really good learning resources for people in the show notes, so make sure to click on some of those. There's a guide from DataGen on the Neural Radiance Field stuff, the NeRF stuff, where you can learn a bit more about that… There's a Hugging Face post and a Philipp Schmid post on LLaMA 2 that are both really practical - how do you run it, how do you fine-tune it, what does it mean?
And then there's a nice post from One Useful Thing, Ethan Mollick's blog/newsletter, about Code Interpreter - how to get it set up, and some things to try. So we'll link that in our show notes, and I think people should dig in. Get hands-on with this stuff. Things are updating quickly, and the only way to really get that intuition is to dive in and get hands-on.
It is. It's the most interesting moment we've had in the AI revolution of recent years. Just so much cool stuff right now. Anyway, thank you for taking us through all the understanding and explanation of these things.
Yeah, definitely. It was a good time. Hopefully people enjoy the rest of their week, and maybe go see Oppenheimer, or Barbie, depending on which of those is most interesting to you… But we'll see you next time, Chris.
See you later. Thanks.
Our transcripts are open source on GitHub. Improvements are welcome.