GPT-4.5 = Big Model Energy

GPT-4.5 is here, and it has major big model energy. OpenAI's largest model to date, it excels at natural conversation, creative tasks, and complex planning.

Transcript

Speaker 0:

GPT-4.5 is finally here, and it's OpenAI's largest and most human-like model to date. It's really the next step…

Speaker 1:

…in scaling up unsupervised learning. It has a much deeper understanding of the world and of human experience.

Speaker 0:

4.5 excels at natural conversation, creative tasks, and complex planning. It also hallucinates far less than previous models. However, reactions have been somewhat muted, largely because its benchmark improvements over GPT-4o are, relatively speaking, incremental. Still, this launch is an important milestone and could be the foundation of the next generation of models.

So let's decode what sets 4.5 apart and what it tells us about the future of AI. After GPT-4 was released in early 2023, anticipation started building about what would come next. But months came and went, and the widely anticipated GPT-5 never arrived.

Through 2024, rumors swirled around the Internet about mysterious internal projects named Strawberry and Orion, fueling immense speculation throughout the tech world.

And finally, in December, OpenAI revealed the first of these new models, o1, a model capable of reasoning, or systematically thinking through complex problems step by step, clearly surpassing GPT-4 in maths, coding, and logical tasks. Anticipation then shifted towards another rumored OpenAI project, Orion, which some speculated to be GPT-5.

Recently, though, Sam Altman confirmed that Orion would actually be released as GPT-4.5. And now it's finally here. 4.5 is by far OpenAI's largest model yet, potentially more than 10 times the size of GPT-4, and it represents a step forward in scaling up pretraining and post-training. The final result is the model that users can interact with today.

But if it's not a frontier reasoning model, what is GPT-4.5 actually for? According to OpenAI researchers, 4.5 stands out for its emotional intelligence.

Speaker 1:

You can have much deeper conversations about maybe more curious facts that 4o or even o1 or o3-mini don't really know about. But also, we think that it has a much better understanding of what humans want, that it really gets what you mean when you ask for something. And that's been really the, like, magical experience for people at OpenAI, and for myself, working with the model.

Speaker 0:

GPT-4.5 achieves around 61.9% accuracy on benchmarks like SimpleQA, which evaluates how effectively models can answer single-relation factoid questions. That compares to GPT-4o's 38.4%. It also cuts hallucination rates dramatically, down to roughly 37% from GPT-4o's 61.2%. Practically speaking, this means 4.5 is more trustworthy for general inquiries than 4o.

On the creative side, GPT-4.5 seems to really shine. Whether you're drafting emails, generating imaginative stories, telling jokes, or brainstorming new ideas, GPT-4.5 produces distinctly more human-like prose than 4o. On the two benchmarks that evaluate a model's persuasive power, MakeMePay and MakeMeSay, 4.5 easily surpasses models like 4o and o1. Early high-taste testers on Twitter and other social media also pointed out that 4.5 is often capable of actually being quite funny and seems to understand irony in a way that other models fail to grasp.

In our own testing of 4.5 before its public release, we found the model to be far better on these sorts of softer, subjective tasks than other models. Unlike o1 or o3, which are measured more in hard metrics, researchers relied in part on "vibes" testing when measuring 4.5's outputs.
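The benchmark deltas above work out as follows. This is a quick sketch using the figures stated in the transcript (the 4.5 hallucination rate is only given as "roughly 37%", so 37.0 is an approximation):

```python
# SimpleQA figures as quoted in the transcript (percentages).
gpt45 = {"accuracy": 61.9, "hallucination_rate": 37.0}
gpt4o = {"accuracy": 38.4, "hallucination_rate": 61.2}

# Absolute improvements, in percentage points.
acc_gain = gpt45["accuracy"] - gpt4o["accuracy"]
hall_drop = gpt4o["hallucination_rate"] - gpt45["hallucination_rate"]

# Relative reduction in hallucination rate versus GPT-4o.
rel_hall_drop = hall_drop / gpt4o["hallucination_rate"]

print(f"Accuracy gain: {acc_gain:.1f} pts")
print(f"Hallucination drop: {hall_drop:.1f} pts ({rel_hall_drop:.0%} relative)")
```

So on these quoted numbers, 4.5 answers about 23.5 points more factoid questions correctly and hallucinates roughly 40% less often, relative to 4o.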

Speaker 1:

One of our key evaluations is working with humans who try out the models and give us feedback on, you know, is this better than GPT-4? Where is it better? Where is it worse? And that's something that we can adapt to. We do a lot of work with these trainers, that's what we call them, to kind of align on what it means for something to be good, and then we use their feedback to improve.

Of course, the problem with that is that it's quite hard to come up with specific evaluations of subjective areas like writing quality, emotional intelligence, and model feel. What is good writing? Right? Like, what is good writing to you? What is good writing to me? I think that really depends on context. It depends on the audience.

We're definitely trying to capture some of these things in evaluations, but it's just much more subjective, which is why we're putting it out there. We really want people to kind of try it out and tell us if they have the same experience that we are having. But GPT-4.5 is not without limitations.

Speaker 0:

For one, it's far more expensive than any other OpenAI model right now. Per input token, 4.5 is 30x more expensive than 4o, and per output token, it's 15x more. Such high costs mean GPT-4.5 is likely not yet a suitable option for those looking to deploy the model at scale. In terms of capabilities, as expected, when compared to specialized reasoning-first models like o1, GPT-4.5 falls notably short in structured reasoning domains, including complex STEM tasks, advanced math problems, and tough coding challenges.

So what's the bigger picture here? GPT-4.5 shows that scaling unsupervised learning continues to yield valuable improvements in accuracy, emotional intelligence, and creativity, even if these gains are now perhaps more incremental than what we've previously seen. The era of scaling pretraining may not be completely over, but reasoning now appears to offer the most potential for squeezing gains out of scaling compute.

That is to say, investing more at inference time rather than at training time. Looking ahead, Sam Altman has suggested that the two paradigms, unsupervised pretraining models like GPT-4.5 and specialized reasoning-focused models like o3, will converge into a unified architecture that we might see in GPT-5.
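The 30x/15x cost multiples mentioned earlier can be checked with a quick back-of-the-envelope calculation. The per-million-token prices below are not stated in the transcript; they are the launch-time API list prices as an assumption for illustration:

```python
# Assumed per-million-token API list prices at launch, in USD
# (not stated in the transcript; treat as an illustration).
PRICES = {
    "gpt-4.5": {"input": 75.00, "output": 150.00},
    "gpt-4o":  {"input": 2.50,  "output": 10.00},
}

def cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request under the assumed price table."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# The multiples quoted in the transcript fall out directly:
input_ratio = PRICES["gpt-4.5"]["input"] / PRICES["gpt-4o"]["input"]
output_ratio = PRICES["gpt-4.5"]["output"] / PRICES["gpt-4o"]["output"]

print(f"input {input_ratio:.0f}x, output {output_ratio:.0f}x")
print(f"1M in + 1M out on GPT-4.5: ${cost('gpt-4.5', 10**6, 10**6):,.2f}")
```

Under these assumed prices, pushing a million tokens in and out costs $225 on GPT-4.5 versus $12.50 on GPT-4o, which is why at-scale deployment is hard to justify today.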

Speaker 1:

We do think that reasoning is going to be a core capability of future models, but these two paradigms, they're not exclusive. They actually complement each other really well. So you could imagine a model that has the knowledge and the intuition of GPT-4.5 but combined with reasoning; that would be a really strong model.

Speaker 0:

So GPT-4.5 is a crucial bridge towards that future. Models could soon blend vast world knowledge, creative fluency, emotional nuance, and advanced reasoning all in one model. The implications are incredibly exciting. The era of choosing between broad understanding and powerful reasoning may soon come to an end. GPT-4.5 provides a glimpse into that future, one where AI systems combine the best of both paradigms.

Speaker 2:

I have news for you guys. YC is throwing our first-ever AI Startup School in San Francisco this June. Elon Musk, Satya Nadella, Sam Altman, Andrej Karpathy, Andrew Ng, and Fei-Fei Li are just a few of those confirmed, the world's top AI experts and founders who will teach you how to build the future.

It's a free conference just for computer science grad students, undergrads, and new grads in AI and AI research, and we'll even cover your travel to SF. But you have to apply, and space is limited. Link in the description to apply for a spot.

