
There's a new Llama in town

Source: https://changelog.com/practicalai/233

Transcript

šŸ“ Edit Transcript

Changelog

Click here to listen along while you enjoy the transcript. šŸŽ§

Welcome to another Fully Connected episode of Practical AI. In these episodes Chris and I keep you fully connected with everything thatā€™s happening in the AI community. Weā€™re gonna take some time to discuss the latest AI news, and then weā€™ll share some learning resources to help you level up your machine learning game. This is Daniel Whitenack. Iā€™m a founder and data scientist at Prediction Guard, and Iā€™m joined as always by my co-host, Chris Benson, who is a tech strategist at Lockheed Martin. How are you doing, Chris?

Doing cool. Iā€™m trying to figure out how did we survive before all these great new models, and stuff? Like, itā€™s changed my ā€“

Yeah, itā€™s been crazy. Iā€™ve just created a post for LinkedIn, and I was grabbing text, putting it into ChatGPT, getting nice rephrasing, and then Iā€™m like ā€œOh, I need an image.ā€ And in particular - weā€™ll talk about it a little bit in this episode, but I was like ā€œOh, thereā€™s this FreeWilly model from Stability AI, which is like whale-themedā€, and then Iā€™ve got the LLaMA thingā€¦ So I just went to Stable Diffusion XL on Clipdrop and said, ā€œHey, generate me an image with a whale and a LLaMA 2getherā€¦ And you know, how did I even post to LinkedIn before without these things? Itā€™s like a different world.

Yeah. 2023 versus 2022 is totally different. The content generation, the way you codeā€¦ Itā€™s a different world.

Yeah. And this week, as most weeks are, it seems like, in 2023, had some pretty groundbreaking announcements and releases, which weā€™re going to dive into a bunch of those things. Thereā€™s just a huge amount to update on, and I think itā€™s a good time for one of these episodes between you and I to just parse through some of the new stuff that is hitting our feeds.

Well, I mentioned LLaMA… One of the big things this week was LLaMA 2, but before we jump into LLaMA 2, which I think was maybe the main thing dominating at least my world this week, it might be worth just taking a little bit of time to highlight something outside of this stream of large language models, which also crossed my desk this week, which I thought was really cool… It's this latest version of NeRF. This is work from Google, presented at ICCV 2023, called Zip-NeRF: Anti-Aliased Grid-Based Neural Radiance Fields.

Thatā€™s quite a name right there.

It is quite a name. It stands for neural radiance field. So NeRF, itā€™s like camel-cased, capital N, small e, and then capital RF, NeRFā€¦ These are fully-connected neural networks that create unique, novel views of complicated 3D scenes based on a set of images that are input. So I donā€™t know if youā€™ve seen that video yetā€¦

Iā€™m looking at it as we are talkingā€¦ And when you say ā€œthe videoā€, I know which video youā€™re talking about, because itā€™s amazing. Iā€™ve just left it on.

Itā€™s pretty spectacular. This is a podcast, so itā€™s hard to express some of this for peopleā€¦ If you just search for ZIP NeRF, you can go to the page for this paper, which is a great summary. But thereā€™s a video on the page, and just to describe what it is - imagine this kind of complicated house, with a bunch of different rooms, and an outdoor patio, sort of garden areaā€¦ And the video is actually this kind of almost like a drone flythrough of the house and then the outdoor area. If you imagine a drone flying through a house - thereā€™s hats, and coats, and toys, and couches, and plants, and all sorts of things everywhereā€¦ But the video is extremely seamless, and itā€™s not generated by a drone. Itā€™s actually just generated by interpolating between a whole bunch of 2D images, and then interpolating from that the 3D scene. So yeah, I donā€™t know, what are your impressions, Chris?

First of all, from the perspective - the drone flight, if you will, that you have as a perspective viewing it, itā€™s like the best drone operator in the history of the world.

Yeah, it would probably be hard to get one to do that.

Yeah, youā€™re not gonna get a real drone operator that could fly that amazingly, and get those things. Itā€™s just phenomenal. And the house is like ā€“ for a moment, you look at it, and I mean, it looks real. But I have noticed, itā€™s cluttery, but itā€™s immaculately clean at the same time as well. The clutter is cleanly distributed, and stuff. I wish when my house was cluttered, it looked as beautiful as this house. It doesnā€™t.

But yeah, I mean, just, if you didnā€™t know, if you werenā€™t listening to the Practical AI podcast to go look at it or something like that, and you just stumbled upon it, youā€™d think it was a drone video, if you didnā€™t have the education. Youā€™d go, ā€œOh my God, this is just really cool. I wonder what theyā€™re doing here.ā€ But itā€™s indistinguishable from real life, for all practical purposes.

[00:06:09.20] Yeah. So itā€™s based on 2D images, and then there are these generated interpolations, which maybe gets to ā€“ there was something that we were talking about prior to hitting the Record button, which was this whole field of generative AI is sometimes conflated with large language models, or ChatGPTā€¦ But thereā€™s a whole lot going on in generative AI thatā€™s not language-related, or maybe even based on language-related prompts. So I mentioned that image that I generated for my LinkedIn postā€¦ That was still in a text prompt into a model that generated an image. But here, what weā€™re seeing is weā€™ve got static 2D images that are input to a model thatā€™s actually generating a whole bunch of different perspectives that are synthesized in a 3D scene. So this is, I would say, still fitting into our current landscape and world of generative AI, but itā€™s not a text in/text out, or text in/image out model.

Right. And I think people - there's so much coming at people right now. We keep talking about that this year - in the five years we've been doing this podcast, we've never had a stretch like the last few months, where new things have been coming at people so fast. New terms, new models, and people are trying to distinguish them… So I think it's pretty fair that people are trying to make sense of how they relate together. And the idea of generative AI and the idea of large language models overlap in a lot of areas. You have models that are both, and you have models that are just one. But I think it's a brave new world right now in terms of the amount; every show, we're just trying to figure out what matters right now. Because there's a lot we're not hitting.

Yeah. And this side of things, maybe like the 3D or video or image-based side of things I know has its own set of kind of transformative use cases that are popping out. I even remember a little while ago there was some technology, I think from Shopify, but others have done this as well, where maybe you have a room in your house, and you want to see how you can transform it with new furniture or something, that of course you could buyā€¦ This is a real kind of e-commerce or retail sort of use case for some of this scene technology of a different kind. If you think of this sort of technology that can take 2D things and create these 3D scenes, certainly thereā€™s use cases within game development, for example, but even other cases where maybe AI has never impacted the process as much like in real estate, for exampleā€¦ You know, how expensive is it to literally have a person come out with specialized camera gearā€¦ I know that weā€™ve had this in the past, where it takes a special person to come out, with special camera gear, to capture the kind of 3D walkthrough, essentially the Street View walkthrough of your house, and map that onto an actual schematic of your houseā€¦ And here, if you imagine someone ā€“ maybe Iā€™m now selling my house myself, without a real estate agent, and I can take an app potentially and go through my house just taking 2D images and create this really cool flyaround 3D view thatā€™s interactive. Thatā€™s really, I think, a powerful, transformative change for a number of different industries.

I came across a company called Luma AI in one of the posts about this technology… I don't know exactly how much of the - if they're even using the Zip-NeRF stuff, but certainly something related to NeRF to take these 2D images, and they have an app that will create 3D views… It's pretty cool to see some of this kind of thing hit actual real users.

[00:10:16.08] We keep talking about the fact that weā€™ve hit this inflection point where itā€™s hitting all the ā€“ you donā€™t have to be in the AI world for this to have a big impact. So itā€™s very easy looking at the ZIP NeRF video to imagine walking around with your cell phone on an appā€¦ Youā€™re just kind of like walking around and the app takes care of whether itā€™s video, or whether itā€™s still images or what, and it just uploads it to this, and produces this amazingā€¦ So itā€™s not your walkaround that itā€™s doing. It takes that as raw video, but then it produces this super-high quality thing. So yeah, I mean, I think this is another case where thereā€™s this one technology with thousands of use case possibilities, where it just changes everything.

Yeah. And maybe also in the ā€“ Iā€™d be curious to know your reaction to this also, with respect to kind of the industrial use cases, where ā€“

Oh, Iā€™ve been thinking about itā€¦

Of course, capturing 3D scenes is very important, for example for simulated environments, where youā€™re trying to maybe train an agent, or even kind of an industrial training for human sort of scenario, where you want to kind of take someone into an environment that itā€™s physically hard to bring a lot of people intoā€¦

Yeah. Or there could be safety issues, and such.

Yeah, safety issuesā€¦ I donā€™t know if that sparks things in your mind. I think in the industrial sense, this could have a more B2B sort of impact than just a consumer app.

Sure. I mean, a simple thing - and Iā€™m making something up in the next thing Iā€™ll say. Itā€™s very easy for me to imagine intelligence agencies that are ā€“ if you go back some years to when Osama bin Laden was found, and they had various imagery and stuff, but with stuff like this they might take all those images that theyā€™re getting from various sources and produce a high ā€“

Yeah, a flyover, and very photorealistic, of certain parts of the compound with that kind of imageryā€¦ And that can be used in a military operation subsequently. Now, Iā€™m making that up, so nobody should take that as a thing. But itā€™s not hard to imagine that. Itā€™s not hard to imagine a lot of factory uses and other industrial things where you have safety issues, you have limited access kind of concerns, where youā€™re trying to convey thatā€¦ But thereā€™s a lot of mundane things, thereā€™s a lot of home-based things and small business things; as you pointed out, the real estate one earlier. So this is just one technology that weā€™re talking about so far.

Yeah. And I think what youā€™re saying - it illustrates how this is impacting very large organizations, all the way down to small organizations.

Yeah, sole proprietorships.

Yeah. And itā€™s interesting how - like, if we just take this use case, for example, these kind of 3D scenes, and kind of large-scale organizations that maybe their bread and butter was either the compute associated with like rendering videos and 3D scenes, or theyā€™re hardware providers that are creating specialized kind of 3D type of equipmentā€¦ Like, their whole business model, theyā€™ve got to be thinking, similar to other organizations that are dealing with maybe language-related problems that are thinking about these things with respect to LLMs - thereā€™s a fundamental shift in maybe how their businesses will operate. But then, at the same time, it provides an opportunity for the kinds of small to medium businesses, to embrace this technology very quickly and actually make innovative products that can be widely adopted very quickly, and actually be competitors within an established market. So thereā€™s an established market for 3D things; that has been quite expensive over time, in terms of access to that technologyā€¦ So now that whole market is going to change, and I think a lot of these players will be these kinds of small to medium-sized businesses.

I agree. I think there's a moment here, kind of ironically, because people are so worried about the impact on human creativity because of all these models and stuff like that… But on a more positive note, there's this huge opportunity that you're just now alluding to, for people - that if you can connect the dots as things are coming out, and you can stay on top of it, it's a great equalizer. And so it will clearly change many, many markets that are out there, and many, many industries. And so there's huge opportunities for those who want to surge ahead at this moment and take advantage of that. And I think the message we see in the media tends to be a little bit doom and gloom on that, but it kind of discounts the fact that change isn't always a bad thing. People are afraid of it, but there's huge opportunities here as well if people choose to go find them.

Break: [00:15:22.13]

Well, Chris, there is a new LLaMA in town.

LLaMA 2. Basically, it destroyed all of my feeds and concentration this week when it was released, because it is quite - to me an encouraging thing, but also another transformative step in what weā€™re doing. So LLaMA 2, for those that maybe lack the context hereā€¦ Meta, or Facebook, or however you want to refer to it - Meta had released a large language model called LLaMA, which was extremely useful. It was a model where you could host it yourself, as opposed to like OpenAI; you could get the weights and host it yourself. But the original LLaMA had a very restrictive licensing and access sort of pattern. Even though you could kind of download the weights from maybe like a BitTorrent link or something like that, and those propagated, technically if you got those weights you were still restricted by a license that prevented commercial use cases specifically.

And now with LLaMA 2, Meta has released the kind of follow-on to LLaMA, and we can talk through some of what the differences are, and what it is, and some of what went into it. But I think one of the biggest things, which is I think going to create this huge ripple effect throughout the industry is that theyā€™ve released it with a commercial license. As long as on the day that LLaMA 2 was released you as a commercial entity donā€™t have greater than 700 million monthly active users, you can use it for commercial purposes. So maybe if my company maybe later on has 700 million monthly active users - which would be great; probably never, butā€¦

Thereā€™ll be something past LLaMA 2 by then though.

Yes. Even then, though, I could still actually use it, because the threshold only applies on the release date. So on the release date, which was this week, as long as you didn't have greater than 700 million monthly active users, you can use this in your business for commercial use cases, and I think that's going to have a huge ripple effect downstream. And we can talk about the model itself here in a second, but maybe just - I'll pause there to get your reaction on that, Chris.

It made me smile when I heard that, because itā€™s kind of like saying, ā€œSo long as you donā€™t compete with us at Meta, you can use this for commercial.ā€

Oh, itā€™s totally true. Yeah. Like, who is that? So thatā€™s Snapchat?

TikTokā€¦ You can think of who this is. And I guess one way to put this is itā€™s not totally open source, quote-unquote. We wouldnā€™t call this maybe open source in the kind of official definition of open source. But itā€™s certainly commercially available to a very wide set of people.

Yup. You know, one of the first things I noticed when this came out on their page - and Iā€™m diving into the specifics of the model here - is we had an episode not too long ago, and you were describing about kind of theā€¦ I believe it was the 7 billion limit in terms of hardware usage, and stuff. And having been taught that by you, I immediately locked in on the smallest being 7 billion there, and I thought, ā€œAh, this is what Daniel has taught all of us about that limitation on accessibility and who can do it.ā€ So it has the 13 billion, and the 70 billion size, but I definitely picked up on the 7 billion, which Iā€™m assuming is going back to what you were teaching us a few episodes back.

Yeah. And so just to fill in a little bit on thatā€¦ So the LLaMA 2 release includes three sizes. So again, thinking back to what are the kind of characteristics of large language models that kind of matter as youā€™re considering using them. One is license. Weā€™ve already talked about that a little bit here. We might revisit it here in a second. Another is size, because that influences both the hardware that you need to run it, and then also its kind of ease of deployment.

[00:20:03.20] So LLaMA 2 was released in 7 billion parameter, 13 billion parameter and 70 billion parameter sizes. And then thereā€™s also, of course, the training data and that sort of thing thatā€™s related to this, and how itā€™s fine-tuned or instruction-tuned. So LLaMA 2 was released in these three sizes, both as a base large language model, and a chat fine-tuned model. So thereā€™s the 7 billion, 13, and 70 billion LLaMA 2s, and then thereā€™s the 7, 13 and 70 billion LLaMA 2 chat modelsā€¦ Which we can talk about that fine-tuning here in a second.

But yes, youā€™re right, Chris, in that 7 billion - I could reasonably pull that into a Colab notebook. And maybe with a few tricks, but certainly with the great tooling from Hugging Face, including ways to load it in even 4-bit, or other quantizations, I can run that on a T4, for example, in Google Colab, with some of the great tooling thatā€™s out there. So not needing to have a huge cluster.

The 70 billion - even with that, thatā€™s kind of another limit where using some of these tricks, Iā€™ve definitely seen people running the 70-billion parameter model on an A100; again, loading in 4-bit, with some of the quantization stuff and all that. But 70 billion is certainly going to be more difficult to run; it might require multiple GPUs. But thatā€™s kind of that sizing range for people to have in mind in how accessible things are.
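For listeners who want to try that, here's a minimal sketch of the kind of 4-bit loading Daniel is describing, using the Hugging Face transformers, accelerate, and bitsandbytes stack. It assumes your Hugging Face account has been granted access to the gated meta-llama/Llama-2-7b-chat-hf checkpoint; the model ID, quantization settings, and prompt are illustrative, not a prescribed recipe.

```python
# Minimal sketch: load LLaMA 2 7B chat in 4-bit on a single T4-class GPU.
# Assumes: pip install transformers accelerate bitsandbytes, and access to
# the gated Llama 2 weights on the Hugging Face Hub.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-7b-chat-hf"  # gated repo; request access first

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # quantize weights to 4-bit at load time
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,   # do the matmuls in fp16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",                      # let accelerate place layers on the GPU
)

prompt = "Explain what a neural radiance field is in two sentences."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```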

Iā€™m just curious, if youā€™re looking at these, youā€™re a business out there, or a data scientistā€¦ Can you make up a couple of use cases that you might target with each of these, where you might say, ā€œOh, I want to go 13 on this. Not 7, not 70 for something like this.ā€ Can you imagine something like this? Iā€™m putting you on the spot.

Yeah, I think ā€“ I mean, thereā€™s certainly innumerable use casesā€¦ But I think maybe two distinctions that people could have in their mind is if you want like your own private ChatGPTā€¦ Or another way to think about it is a very general-purpose model. You could do anything with this model. Any specific prompt, whatever. Youā€™re probably going to look towards that higher end, the 70-billion parameter model for that kind of almost ChatGPT-like performance; youā€™re going to have to go much higher.

But as we've talked about on the show before, most businesses don't need a general-purpose model. They need a model to do a thing, or a task, or a set of tasks. And in that case, because this is open and commercially-licensed, businesses could take those 7 and 13-billion parameter models and fine-tune them for a task in their business, which also increasingly has amazing tooling around it, again, from Hugging Face and others, with the PEFT library, parameter-efficient fine-tuning, and the LoRA technique, which is the low-rank adaptation technique - it basically only adapts an existing model, it's kind of an adapter technique, rather than retraining a bunch of the original model… This opens up fine-tuning possibilities in these smaller models where that fine-tune for an organization is probably going to perform better than any general-purpose model out there. And because it's that smaller size, you can run it on a reasonable set of hardware that's not going to require you to buy your own GPU cluster to host the thing. So that's maybe a range of use cases that people could have in mind.
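As a rough illustration of that adapter-style approach, here's a sketch that wraps a LLaMA 2 base model with LoRA adapters via the Hugging Face PEFT library. The target modules and hyperparameters are plausible defaults chosen for illustration, not values taken from the LLaMA 2 paper.

```python
# Sketch: attach LoRA adapters to a LLaMA 2 base model with the PEFT library,
# so only a small set of adapter weights is trained instead of the full model.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

lora_config = LoraConfig(
    r=8,                                   # rank of the low-rank update matrices
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt (illustrative)
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # typically well under 1% of the 7B parameters
# From here you'd run your usual supervised fine-tuning loop on task-specific data.
```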

[00:24:00.00] I have one more question for you before we abandon this. 7 billion to 70 billion being an order of magnitude jump on that - why would you have something fairly close to that, at 13 billion parameters? Whatā€™s the difference in 7 and 13, when the next step is all the way up to 70? Whatā€™s the rationale, do you think?

Yeah, so it is interesting, actually… If I'm understanding right from some of the sources that I've been reading, there was actually a - I forget if it was a 30 or 34-billion parameter model that they also had in pre-release, and were tuning… So there was another one that kind of fit in that slot, in that gap you're talking about… If you think of MPT, MPT has a 30-billion parameter model; that fits in that kind of gap.

My understanding - and if our listeners can correct me if Iā€™m wrong; please do. But my understanding is that they actually did test that size of model and found it to not pass their kind of safety parameters around harmful, potentially harmful output, or not truthful output, that sort of thing. So they decided actually to hold that back.

So it could be possible as they instruction-tune and get human feedback potentially more iterations of reinforcement learning from human feedback, there may be a model that they release in that parameter range. So that was one thing that happened, I think.

It is interesting - several different things here that are unique about this model specifically, or maybe the release as well, other than the license, is they were fairly vague on the data that went into the pre-training. So they talked specifically about some very intense data cleaning and filtering that they did on public datasets. And it was trained on more data than the original LLaMA, but they were fairly vague on the mix of that data, and all of that. So that may be related to feedback they got on the datasets that were used in the first LLaMA, I donā€™t know, but the technical paper was mostly related to the modeling and fine-tuning trickery and methodologies that they used, which was interesting.

And one of the interesting elements of the way that they fine-tuned this model was, I think, the reward modeling. So if you remember, with the GPT family of models, MPT, Falcon, these different models - one of the things that is often done with these models is this process of reinforcement learning from human feedback… And we covered this on a previous episode, which we can link in the show notes… But it's actually using human preferences to score the output of a model, and then using reinforcement learning to correct the model to better align with human preferences, or human feedback.

They actually used two separate reward models in this fine-tuning of the chat-based model. One that was related to helpfulness, and then the other one which was related to safety. And one of the interesting things that they talked about in the paper was how sometimes those things can kind of work against each other, if you're trying to do both of them at the same time. So they actually separated out the reward modeling that they used for the chat fine-tuning into these two reward models, one for helpfulness and one for safety, which is quite interesting, I think.

Break: [00:27:47.20]

So Chris, maybe just a couple other things related to LLaMA, and then I want to see your feedback on the code interpreter as well, because we havenā€™t talked about that yet on the show. And maybe Claude 2, if we can get to it.

Yeah, weā€™ve got to mention Claude 2 as well, because they were both big releases.

Yeah. So just one other note, which I find quite interesting - and actually, I'd love our previous guest Damien's thoughts on this, who was in our last episode about the legal implications of generative AI… One of the interesting things about the LLaMA license, in addition to it allowing this commercial usage, is that there is technically a restriction in the LLaMA license that says you will not use the LLaMA materials, which includes the model weights etc., "or any output or results of the LLaMA Materials to improve any other large language model, excluding LLaMA 2 or derivative works thereof." So essentially, what this means is if you're using LLaMA 2 and you want to fine-tune a model, or you're fine-tuning a model off of LLaMA 2 outputs, you're stuck with LLaMA 2. Basically, LLaMA 2 is your model, and you're going to stick with LLaMA 2. So you couldn't, for example, technically take outputs from LLaMA 2 and fine-tune, say, Dolly 3 billion. That would not be allowed by the license, and of course, that's something that people are doing all over the place. They're taking outputs from GPT-4 and fine-tuning a different model, or taking outputs from a large model, like maybe LLaMA 2 70 billion now, and fine-tuning another model that's smaller, based on a certain type of prompt or something. So this is restricting the family of models that you're allowed to do that sort of thing with, which is the first time I've seen that, and I think it's kind of interesting.

Yes, it strikes me as another Mark Zuckerberg anti-competitiveness thingā€¦ Which heā€™s fairly famous for. I mean, thatā€™s kind of ā€“ even before this.

Yeah. And how could you enforce such a thing? [laughs]

That was my next question to you - is there any possible way that you could conceive of to actually know that from an enforceability standpoint?

I donā€™t either. So it seems like itā€™s a license thing, and it will concern the lawyersā€¦ But itā€™s hard to imagine. I mean, going back to our conversation last week, once you have output, and that output is input to more output, thereā€™s a point where it becomes very, very, very difficult to know what the sourcing really was.

Yeah. And the fine-tunes are already appearing off of LLaMA 2. The most notable probably is FreeWilly, which is from Stability AI, and is a fine-tune of the largest, 70-billion model. But thereā€™s other ones coming out as well. And so I think weā€™re about to see just a huge explosion of these LLaMA 2-based models for a whole variety of purposes. And who knows how they will fit into that licensing restriction, or how open people will be about thatā€¦ But itā€™s about to start. The fine-tunes are already coming.

Yeah. Well, to your point earlier, they werenā€™t terribly clear about the data that they were sourcing from their own standpointā€¦ And I find it interesting, a little ironic.

Itā€™s a bit of a double standard maybeā€¦

Yeah, a little bit of a double standard right there, in terms of like ā€œWeā€™re not going to tell you everything about how weā€™re doing input, but by the way, youā€™d better not use our output.ā€

So yeah, a little interesting. Do you think thereā€™s any risk of a walled garden kind of concept happening in large language models, if others were to follow this lead on anti-competitiveness?

[00:32:03.04] Yeah, it will be interestingā€¦ I think it is a notable trend that the first LLaMA from Meta was not open for commercial at all, and now theyā€™re opening it up for commercial purposes. And maybe thereā€™s a separate trend that will happen with some of these use-based restrictions that people are importing into their licenses, and how useful those things are over time; that may shift, and weā€™ll see those things die off. Or maybe if theyā€™re enforced, and thereā€™s precedent, maybe weā€™ll see something go the other way. Iā€™m not sure.

But speaking of models that you might get their output and use it to train other models, that is these large-scale proprietary closed models from people like OpenAI, and Anthropic, and others - weā€™ve got a couple of things that we havenā€™t talked about on the show yet, which people should probably have on their radar. One of those is Claude 2. What do you think about Claude 2, from Anthropic?

Yeah, Iā€™ve been playing around with it a lot in the last week, and I kind of have a set of things that I try over and over again; theyā€™re kind of my standard tasks as new models come out. And some of them are coding, and some of them are content generation, which are kind of the two big things that I use most often. It was interesting, the input size for Claude 2 is much larger than the others.

Like, much, much larger.

Much, much, much larger.

So 100,000 tokens.

Yeah. And so itā€™s had me kind of change the way Iā€™m approaching it, in that, by contrast with ChatGPT, and youā€™re trying to figure out with the limits that you have both on input and output how do you kind of prompt-engineer your way to get where youā€™re trying to goā€¦ Which has become this whole skill set weā€™ve been talking about in recent months. And yet Claude 2 almost kind of wipes that out a little bit - in some ways, not in all ways - in that you can hit it with a much larger input spaceā€¦ And so itā€™s changing how Iā€™m thinking about kind of getting to the output that I want. And the output is a bit different. Itā€™s not the same. Iā€™m getting different outputs from all the models. Theyā€™re not all the same, definitely.

I think my biggest thing is with all these new releases - Iā€™m trying to figure out how do I use each one. Iā€™m trying to develop my own strategy on ā€œWhen do I go to ChatGPT by default? When is that the right thing?ā€ And thatā€™s changing as weā€™ll talk about with things like plugins and stuff; thatā€™s evolving. But then Claude 2 comes out, and then you have on the open source side, as we just talked about, LLaMA 2.

So I think trying to understand all the tools in the toolbox in relation to each other has been interesting. So Claude 2 Iā€™m really focused right now primarily on large content output, is kind of where Iā€™ve landed on that.

And the 100k context length of Claude 2 is something I find really compelling as well. There was also a significant paper that came out, that caused a lot of waves in terms of context length and thinking about that, which showed kind of, as you increase context length, you lose any significance of the middle bit of that context. So the beginning and end is more important in terms of what makes the output of the model quality or not in terms of how you would measure that. So weā€™ll link to that paper maybe in the show notes as well.

But Iā€™ve tried some thingsā€¦ I mean, I donā€™t know exactly all of the detailsā€¦ Again, Claude is one of these closed models, so I donā€™t know all the details of how theyā€™re doing things. And because itā€™s sitting behind an API, itā€™s hard to know how those things evolve over time. But for example, I took ā€“ one of the things with Claude 2 is I just took one of our complete podcast transcripts, so a full episode, so 45 minutes of audio transcriptā€¦ I took episode 225, which I really enjoyed, talking a lot about the things that Iā€™m working on right now with Prediction Guardā€¦ And I just asked it to give me a summary of the main takeaways. I pasted in the whole thing, and itā€™s like a fairly good, comprehensive takeaways, like ā€œMany companies banned usage of certain LLMsā€, blah, blah, blah. Prediction Guard is trying to provide easy access, structuring, validation, compliance features for LLMs. Making LLM usage easier, blah, blah, and it gives these great takeawaysā€¦

[00:36:28.11] And then I asked, ā€œHey, suggest a few future episodes that we could do, that maybe cover related topics, but things that werenā€™t covered in this episode.ā€ Pretty good. Some of them are kind of genericā€¦ A look at current state of AI agents, and automation, how close are we to no code AI app generation, blah, blah, blah. So that all kind of off of this large context of the transcript input was quite interesting.

Iā€™m curious - Iā€™m gonna put you on the spot also. As someone whoā€™s working on your own product - and I know this is not a Prediction Guard episode, but Iā€™m asking on my own behalf and on behalf of the listenerā€¦ How do you as someone who is looking at these different models, how do you think of those different models? How do you kind of structure them in your mind in terms of what youā€™re offering? Youā€™ve been evolving rapidly over the last few months, and Iā€™m always curious to see kind of where your headā€™s at on this now, as youā€™re looking at them?

Yeah, I think the thing I'm consistently seeing is that - I made a post on LinkedIn about this as well; even in my own applications that I'm building, LLM-based applications, having access to multiple models, rather than a single model, I think is a really nice usage pattern. The easier we can make that, the better - and there's other people that are doing this as well. In Prediction Guard you can query a whole bunch of models at the same time, concurrently… There's other systems that will let you look at that output as well. Nat.dev, and some of the toolbar stuff that Swyx is doing… We had a collaboration with him on the Latent Space podcast…

So the more you can tie these things together and look at the output or automatically analyze the output of multiple models at the same time, I think thatā€™s really useful. Because itā€™s hard to generally evaluate these models until you start evaluating them for your use case, and building intuition about them for your own use case. So I think the pitfall that people maybe fall into is saying, ā€œOh, Iā€™m going to use this modelā€, before theyā€™ve even tested that for their use case.

Try creating a set of evaluation examples for your own use case, and then try out a bunch of different models for that. And also try out the things that are becoming more standard kind of operating procedures for building LLM applications, like looking at the consistency of outputs, running a post-generation validity or factuality check on the output. So checking a language model with a language model. Doing input filtering, and all these sorts of more engineering-related things. So those are some of the things that Iā€™m seeingā€¦ But having access to a bunch of models at the same time I think is something that can really boost your productivity.
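To make that pattern concrete, here's a hypothetical sketch of fanning one prompt out to several models concurrently and collecting the outputs side by side. The query_model helper and the model names are placeholders; wire them up to whatever hosted or self-hosted endpoints you actually use.

```python
# Hypothetical sketch: send one prompt to several models at once and collect
# the outputs side by side for comparison. query_model is a placeholder you
# would replace with real calls to your provider(s) or self-hosted endpoints.
from concurrent.futures import ThreadPoolExecutor

MODELS = ["llama-2-13b-chat", "llama-2-70b-chat", "some-hosted-model"]  # placeholder names

def query_model(model_name: str, prompt: str) -> str:
    # Placeholder: swap in a real API or HTTP call for each model.
    return f"[{model_name}] response to: {prompt!r}"

def fan_out(prompt: str) -> dict:
    # Query every model concurrently and return {model_name: output}.
    with ThreadPoolExecutor(max_workers=len(MODELS)) as pool:
        futures = {name: pool.submit(query_model, name, prompt) for name in MODELS}
        return {name: fut.result() for name, fut in futures.items()}

if __name__ == "__main__":
    eval_prompts = ["Summarize our refund policy in one sentence."]  # your own eval set
    for p in eval_prompts:
        for model_name, output in fan_out(p).items():
            print(model_name, "->", output)
```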

I appreciate that. And to our listeners, weā€™re not making it a Prediction Guard show or episode, but as a co-host, Danielā€™s excursion through this in his professional career has made him, in my view, one of the worldā€™s true experts in how to look at all these together. And since we have the benefit of him co-hosting the podcast, Iā€™m going to continue to take advantage of that expertise for all of us.

Sorry about that, Daniel. Sorry for putting you on the spot.

Yeah, no worries. I think the other thing maybe to highlight with Claude 2, and something that you were talking about in chat before we jumped into this episode was Claude 2, or maybe Anthropic and their offerings, versus Open AI. How do we understand that? How do we categorize these things? I think one of the interesting things with Claude 2 ā€“ so weā€™ve seen both Anthropic and their Claude models, and OpenAI and their GPT models increase context size over time. GPT models not quite as far as Claude, but both have increased.

[00:40:28.09] Theyā€™ve also both added in some of this functionality, which I think is very interestingā€¦ Claude 2, I think, first, if Iā€™m not wrong - the ability to add in your own data. So in Claude 2 thereā€™s a little attachment button, and you can upload PDFs or text files or CSVs and have that inserted into the context of your promptā€¦ Which I think is, of course, extremely powerful. Weā€™ve talked about adding in external data into generative models and grounding models in the past; itā€™s very powerful.

Now, OpenAI is doing this in a slightly different way, and I think this is something worth calling out on the podcast, is with their new code interpreter beta feature within ChatGPT you can upload data, but itā€™s processed through the code interpreter in a different way than what Claude is doing. So we all know that ChatGPT and GPT models can generate really good code, and specifically good Python codeā€¦ And so what OpenAI has done for their kind of data processing agent within ChatGPT is ā€œWell, letā€™s just have our model generate Python code, and then weā€™ll hook up the ChatGPT interface to a Python interpreter, and just go ahead and execute that code for you over your data, and then give you the output.ā€ So this is maybe a distinction that people can have in their mind - Claude 2, you can upload this huge amount of context, you can upload files, insert it into the prompt. As far as I know, theyā€™re not running any kind of code interpreter type thing under the hood.

ChatGPT might not be inserting all of that into the prompt, but theyā€™re actually saying, ā€œWell, what if we decompose what youā€™re wanting me to do with this external data into something that can be executed by a sort of agent type of workflow, where you upload your data and ask me to like do some analysis over it? Iā€™m going to generate some codeā€, so the language model generates some code, and then that code is actually executed in the background, it returns a result, which is then fed back through a model to give you generated output back in the interface. So itā€™s actually a multi-stage thing happening in a code interpreter in Open AI.
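Conceptually, that multi-stage flow looks something like the sketch below: the model writes Python, a tool executes it, and the captured output goes back to the model for a final answer. Everything here, including the fake_llm stub and the prompts, is hypothetical; it illustrates the shape of the loop, not OpenAI's actual implementation.

```python
# Hypothetical sketch of a code-interpreter-style loop: the language model
# produces Python, we execute it, and the result is fed back to the model.
# fake_llm stands in for a real model call and just returns canned strings.
import io
import contextlib

def fake_llm(prompt: str) -> str:
    if "write python" in prompt.lower():
        return "print(sum(len(t) for t in ['abc', 'de', 'fghi']))"  # canned "generated" code
    return f"Final answer based on tool output: {prompt}"

def run_code(code: str) -> str:
    # Execute the generated code and capture whatever it prints.
    # (A real system would sandbox this; exec on untrusted code is unsafe.)
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, {})
    return buffer.getvalue().strip()

user_request = "Write Python to compute the total transcript length."
generated_code = fake_llm(user_request)      # stage 1: the model writes code
tool_output = run_code(generated_code)       # stage 2: the interpreter runs it
final_answer = fake_llm(f"The code printed {tool_output}. Summarize for the user.")
print(final_answer)                          # stage 3: the model explains the result
```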

It effectively produces a no code solution, where you get an output, and youā€™re just kind of skipping the whole thingā€¦ Instead of using the language model to generate your own code, and to be your code assist, and all that, and then youā€™re still doing itā€¦ Itā€™s kind of skipping that whole step right there.

Yeah. And I can give an example I actually ran prior to this show. So I have Claude and the OpenAI code interpreter open side by side; I uploaded a file with a bunch of YorĆ¹bĆ” - which is a language spoken in Africa - transcriptions of audio, which are from the Bible TTS project that we worked with Coqui and Masakhane on… And so I uploaded this file, which includes this YorĆ¹bĆ” text, in a CSV format. OpenAI said "Great, you've uploaded this file. Let's start by loading and examining the contents." And then it has this sort of Show Work button, and you can see the actual code that it generated, which is Pandas code to import the CSV, and then output some examples. So you can expand that and actually see the code that it ran under the hood, and the conclusions that the agent came to.

[00:44:05.06] Then I asked it, "Okay, well, plot the distribution of the transcript lengths. Are there any anomalies?" And then again, it says, "Hey, Show Work." And you can see it's importing matplotlib, it's taking in the CSV, it's actually creating the plot, and it actually generates an image out of the transcripts, and says "I didn't find any anomalies. They're all kind of within the same distribution. There's not any anomalies." Then I asked it "Can you translate all the YorĆ¹bĆ” to English?" and that's where it ended up stopping, because it said "No, I'm not good at doing that." And Claude actually stopped there as well and said, "No, I'm not going to do that."
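For reference, the code behind that Show Work button was roughly this shape. The filename, column names, and plotting choices below are assumptions for illustration; the real alignment file may be laid out differently.

```python
# Sketch of the kind of code the Code Interpreter showed: load the CSV,
# peek at a few rows, then plot the distribution of transcript lengths.
# The filename and column names ("audio_link", "transcript") are assumed.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("yoruba_alignments.csv")   # hypothetical filename
print(df.head())                            # examine a few example rows

df["transcript_length"] = df["transcript"].str.len()

plt.hist(df["transcript_length"], bins=30)
plt.xlabel("Transcript length (characters)")
plt.ylabel("Count")
plt.title("Distribution of YorùbÔ transcript lengths")
plt.show()
```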

I also uploaded the YorĆ¹bĆ” alignments to Claude, and it said, "Hey, sure, let me analyze these transcripts", and it just output some general observations, like "There are 50 audio links. The transcript links-" There's no Python code there. It just gave me some takeaways. And then I said, "Are there any anomalies?" And it said, "I checked and I can't find any." And "Could you translate it?" and it said, "Unfortunately, I can't." So it's all still a chat-based thing.

So you can see kind of different approaches to this complicated workflow of having almost an assistant agent executing code for you, versus putting more context in the language model and having it reason over that context.

So they're each almost showing their own strengths with different types of approaches to problems. Would that be fair?

So thatā€™s another way of thinking about it, is you start understanding how the different large language models approach a problem, and the tooling that might be better or worse for a given use case; that also will help you kind of pick which way you want to go, in addition to maybe just using multiple models, as youā€™ve talked about earlier.

Yeah, exactly. And there's so much to dive into on all these topics that we've covered today… I am going to make sure that we include some really good learning resources for people in the show notes, so make sure and click on some of those. There's a guide from DataGen on the Neural Radiance Field stuff, the NeRF stuff, where you can learn a bit more about that… There's a Hugging Face post and a Philipp Schmid post on LLaMA 2, that are both really practical; kind of like how do you run it, how do you fine-tune it? What does it mean?

And then there's a nice post from One Useful Thing, Ethan Mollick's blog/newsletter, about Code Interpreter, and how to get it set up, and some things to try. So we'll link that in our show notes, and I think people should dig in. Get hands-on with this stuff. Things are updating quickly, and the only way to really get that intuition about things is to dive in and get hands-on.

It is. Itā€™s the most interesting moment weā€™ve had in the AI revolution of recent years. Just so much cool stuff right now. Anyway, thank you for taking us through all the understanding and explanation of these things.

Yeah, definitely. It was a good time. Hopefully, people enjoy the rest of their week, and maybe go see Oppenheimer, or Barbie, depending on which of those is most interesting to youā€¦ But weā€™ll see you next time, Chris.

See you later. Thanks.


