
Building AI Models Faster And Cheaper Than You Think [Lightcone Podcast Ep. 4]

Garry, Diana, Harj, and Jared discuss the strategies to build a foundational AI model with examples of YC companies doing just that. We also get an exclusive look at OpenAI's Sora!

Transcript

Speaker 0:

A lot of the sci-fi stuff is actually now becoming possible. What happens when you have a model that's capable of simulating real-world physics? Wouldn't it be cool if this podcast were actually an Infinity AI video? One thing I noticed is, like, the lip syncing is, like, extremely

Speaker 2:

accurate. Like, it really looks like he's actually speaking Hindi. How do YC companies build

Speaker 1:

foundation models during the batch with just 500,000?

Speaker 2:

This is literally built by 21-year-old new college grads, and they built this thing in two months. I think he, like, locked himself in his apartment for a month and just read AI papers. You can actually be on the cutting edge in relatively

Speaker 3:

short order, and that's an incredible blessing. Welcome back to another episode of The Light Cone. Today, we're talking about generative AI. First there was GPT-4, then there was Midjourney for image generation, and now we're making the leap into video. Harj, we got access to Sora

Speaker 0:

and we're about to take a look at some clips that they generated just for us. Yeah. Should we take a look? Okay. So here's the first one. The prompt is: it's the year 2050, a humanoid robot acting as a household helper walks someone's golden retriever down a pretty tree-lined suburban street.

Speaker 2:

What do we think? I like how it actually spells out helper. It's like a flex. Yeah. Like, I can spell now. Yeah. Which was not true with the image models. You'd always scrub the text in the image.

Stable Diffusion and DALL-E were notoriously bad at spelling text, so that is a major advance that no one's really talked about yet. I mean, it's wild how high definition it is. Like, that's almost

Speaker 0:

realistic.

Speaker 1:

And the other really cool thing is the physics.

Speaker 2:

The way the robot walks, for the most part, is Yeah. Very accurate. You do notice a little kinda, like, shuffle that's a little bit off, but for the most part, it's believable. And the way the golden retriever moves, I have a golden retriever, so I can personally vouch that, like, they perfectly modeled the Yeah. You have one. Right? Like your dog. Right?

Yes. Well, it's a perfect representation of how a golden retriever walks. I also like that with DALL-E and Stable Diffusion, as you made your prompts longer and longer, it would just start ignoring it and not actually doing exactly what you told it to do.

And, like, we gave it a very specific prompt here, and it did exactly the thing that we told it to. You can see it's still not exactly perfect. So I think towards the end, you see, like, a floating dog, something in there. Okay. I was gonna call out a couple other imperfections here, which is that, like, the street is not a street, guys. Like It's a carless society. Yeah. And, like, what?

What's up with that? It's like a weird it's like not quite a sidewalk, not quite a street. Yeah. And in the future, we won't need cars anymore. And then, like, only one side is jumping. There's this Oh, yeah. Floating object thing. Here's a floating object on the right if you if you watch carefully.

Which looks like a little dog or something. I'm not sure.

Speaker 3:

This is still a real breakthrough. You if you look at incredible. You know, some of the stuff that Meta put out I mean, I always think about, what is it? Will Smith trying to eat a plate of spaghetti. Yeah. And that looks insane. And it's sort of just what you would do if you fed the previous frame into the same model to try to generate the next frame, and it just wasn't durable.

Speaker 2:

And that wasn't too long ago. Yeah. The other thing that I find really impressive about the Sora videos is that they have long-term visual consistency. So it's like a minute long, and, like, all the houses are similar architectural style. There's no, like, discontinuity. All the trees look similar. It clearly

Speaker 0:

all takes place in the same world. Next one's a drone camera circles around the Golden Gate Bridge. The view showcases the magnificent cliffs and ocean waves with views of San Francisco in the background. The view is stunning, captured

Speaker 1:

with beautiful photography. That is the Golden Gate. That was the Golden Gate Bridge. Like, it knows what the Golden Gate Bridge looks like. And I think you can see Alcatraz there a little bit too. Yeah. I'm so excited. Like, the high definition

Speaker 0:

is amazing.

Speaker 2:

And you can't see the city in the background as we asked for. It's definitely not geographically accurate. But Yeah. Like, the terrain is not quite actually the way it is in the real world, but it looks visually kinda similar. Yeah. And you can see it's not quite perfect because early

Speaker 0:

on in the clip, if you look at sort of one of the columns of the bridge from a particular angle, it looks disjointed. Like, can you see that one? Oh, yeah. At the back? Yeah. It it And then it sort of lines up when we get to this angle.

Speaker 2:

Also, if you go back to the beginning of the clip and you look at the cars driving on the bridge, they're driving on the wrong side of the road. Like Yeah. Yeah. Like, that one's about to cause a traffic accident. Maybe there's some data from the UK, maybe. Yeah.

Speaker 1:

Hilarious. I guess the other detail is that in computer graphics, it's incredibly difficult to simulate fluid. And Mhmm. It's still a little bit wonky with it. With the waves, they're a little bit static. Yeah. I've seen other Sora clips where it captures the motion of water just incredibly.

Speaker 0:

One thing I'm really curious about is just how Sora works under the hood and just how they're generating these videos. So, Diana, can you give us a brief, like, a primer on just, like, what's actually going on? And one thing I was particularly curious about is, like, is this, like, a new model? Like, or is this, like, an extension of the transformer model that we all know about as powering ChatGPT?

Speaker 1:

I think the TL;DR and the really cool thing here is that it is really a combination of a transformer model, which typically has been mostly used for text, and a diffusion model, which is a lot of the tech behind DALL-E and Midjourney to generate images. So it's combining these two and then adding a temporal component so you can see the consistency between frames over time.

And I think the key thing that OpenAI did was to train this with videos and with what they call spacetime patches. So it is like this, basically, three by three matrix of pixels. So you have the spatial part and then patches of temporal, which is, like, multiple frames that create a video. And the way they do it, they have a variation of the sizes of these patches.

It could be a certain size, smaller to bigger, in x, y, z, basically. Right? And then they basically train all this in this giant architecture, which is expensive.
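A minimal sketch of the spacetime-patch idea described above, assuming a video stored as a NumPy array of shape (frames, height, width, channels). OpenAI has not published Sora's implementation, so the function name `spacetime_patches`, the patch sizes, and the flattening order are illustrative assumptions only.

```python
import numpy as np

def spacetime_patches(video, pt, ph, pw):
    """Split a (T, H, W, C) video into non-overlapping spacetime patches.

    Each patch spans pt frames, ph rows, and pw columns, and is flattened
    into one vector, so the result has shape (num_patches, pt * ph * pw * C).
    The patch sizes here are hypothetical, not Sora's.
    """
    T, H, W, C = video.shape
    assert T % pt == 0 and H % ph == 0 and W % pw == 0, "sizes must divide evenly"
    # Split each axis into (number of blocks, block size).
    v = video.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
    # Move the three block indices to the front, the within-block axes after.
    v = v.transpose(0, 2, 4, 1, 3, 5, 6)
    return v.reshape(-1, pt * ph * pw * C)

# A toy 16-frame, 64x64 RGB clip filled with increasing values.
video = np.arange(16 * 64 * 64 * 3, dtype=np.float32).reshape(16, 64, 64, 3)
tokens = spacetime_patches(video, pt=4, ph=16, pw=16)
print(tokens.shape)  # (64, 3072)
```

Each row of `tokens` plays roughly the role a token plays for text: a 4-frame, 16-by-16-pixel block of the clip flattened into a single vector.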

Speaker 0:

And so are these sort of spacetime patches the video equivalent of tokens?

Speaker 1:

Sort of. Because I think there's a lot of prior work behind Sora because the first thing is transformers have been mostly applied to text. And one piece of the prior art was Google's work on demonstrating that you could do transformer models not just for English text, but for images. So that was a foundational paper that I think they published in 2020.

And the paper was called "An Image Is Worth 16x16 Words." So they call it a Vision Transformer. So they demonstrated that you could create and use transformer models for image recognition tasks because the state of the art up to then was convolutional neural networks, which were very expensive to compute. So that was one piece of the puzzle. The other piece of the puzzle was kinda the spacetime concept.

And I think that some of that comes from stitching together some different work from the past. There's this other paper, World Models, that came out in 2018, it's for robotics, actually, that separates the detection piece. So that's kinda the perception of the visual part. And then the other piece is the memory model for the temporal aspect.

And the temporal aspect in the World Models paper uses RNNs, and then there's a controller model that combines it. So what I mean, they don't explain too much, OpenAI. This is just a bit of just me looking at it. I don't know. This is one of these things that OpenAI is a bit cagey about.

But we can only speculate it's a combination of, like, robotics papers plus transformer plus text.

Speaker 0:

And then how much more expensive is it to generate one of these videos compared to generating the text? Like, curious, like, how do we even think about that?

Speaker 1:

Oh, man. So imagine that GPT-4 is, like, a trillion parameters, and that, imagine, is only two dimensions. Right? Text is just the matrix of two by two. Now this is, like, an order of magnitude, so I can imagine it's, like, at least one order of magnitude, 10 trillion. Okay. That's amazing. 10 times the amount of GPUs.

I could only imagine. I think it was about 20, 30,000. I forget exactly the number of GPUs that it took for GPT-4. Okay. Well, what's crazy is that

Speaker 0:

we have companies within YC that have also been able to achieve similar types of functionality, and they clearly have way less resources than OpenAI does. And so I'm curious how they managed to do that. And the way I kind of think about this is that there's the components of building one of these, like, foundational models: data, compute, and expertise.

Should we talk through some of the YC companies and how they've managed to, like, hack

Speaker 1:

each of those things? Basically, how do YC companies build foundation models during the batch with just 500,000?

Speaker 2:

Yeah. I think it's an important topic because I think because people know how much money OpenAI is spending on GPUs, there's this meme going around that in order to do this, you need to, like, have raised, like, billions of dollars and have, like, a data center full of GPUs. And we've actually seen that it's not true.

There's actually a bunch of companies in the current batch of Winter 2024 right now that just in the time of the batch, with just the 500K that YC gives them, have actually built really awesome foundational models that are producing, like, magical results. Should we look at some of these demos and Yeah. Talk about how they've managed to get this to work? Yeah.

Let's start with Infinity AI. Infinity AI is a company in the current batch, and what they do is they make deepfake videos of a particular person. So for example, they have an AI replica of Elon Musk, and you can just tell Infinity AI what you want Elon Musk to say, and they will produce a video of Elon Musk saying exactly that thing. You wanna watch a demo? Yeah. Let's see a demo.

Let's watch the demo. Speaking of YC companies training their own models, did you guys see the Infinity AI demo last week? Yeah. They're a company in my group. Infinity allows people to make videos by just typing out a script. Wouldn't it be cool if this podcast were actually an Infinity AI video? That'd be super cool. You think they'd be up for that?

Well, guys, I have a surprise for you.

Speaker 0:

There we are. That was pretty good. So

Speaker 2:

special thanks to the Infinity AI team who made a model of the Light Cone podcast. And the way that they did this is they literally just downloaded our YouTube videos from the first three episodes, and they trained their model on that. And the cool thing about these models now is, like, you don't need that much data once you've trained the foundation model to adapt it to learn a new person.

So just the, like, hour or so of YouTube video that we had was enough for them to get a really accurate representation.

Speaker 1:

I could talk about another company. So I'll explain what Sync Labs is. Sync Labs is an API for creating real-time lip syncing. And the crazy thing about this team is that they trained the models on a single A100, and it's generating these kinds of results. So let's take a look at it.

Speaker 2:

I'm guessing this guy doesn't actually speak Hindi? No. No. Okay.

Speaker 1:

One thing I noticed is, like, the lip syncing is, like, extremely accurate. Like, it really looks like he's actually speaking Hindi. Yeah. And if we put it in this framework that you were mentioning, Harj, with how YC companies do this, there's different vectors. There's computation, dataset, and speed. So they kinda hacked all of those.

So for the dataset, the clever thing they've done, and it's unheard of to train a video-audio model with so few resources, is they compress a lot of the data and use low-res video. So you don't need the high-res video because if you have a high res of 1080p versus, let's say, the 240p version, that's like a quadratic factor less because it's two dimensions. Right?
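As a rough worked example of that quadratic saving, assuming the two resolutions being compared are 1080p and 240p at the same aspect ratio (the helper name `pixel_ratio` is made up for illustration):

```python
def pixel_ratio(high_height, low_height):
    """Ratio of pixel counts for two frames with the same aspect ratio.

    Width and height both shrink by the same linear factor, so the
    pixel count shrinks by that factor squared.
    """
    scale = high_height / low_height
    return scale ** 2

ratio = pixel_ratio(1080, 240)
print(ratio)  # 20.25
```

So each low-res frame carries about 20 times fewer pixels, before any additional compression is applied.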

So they've done that. The other thing that enabled them to really move a lot faster is the deal that we did with Azure where we have a dedicated GPU cluster for companies in the batch. They've been able to iterate a hundred times faster.

Speaker 3:

than they were before in the batch. So a lot of companies out there, they decide they need to do fine tuning. They need access to GPUs, they just can't get it. Or you've gotta pay an arm and a leg and prepay for a year in advance, and maybe you'll get it in 2025. But if you're in the YC batch, turns out you can get them. Yeah. You get over half a million in credits,

Speaker 2:

and there's no contention for resources. You actually get instant access within twenty-four hours for a GPU cluster. Which is pretty cool because YC invests half a million dollars, but I think all the companies in the YC batch to train these models, I think they literally didn't have to touch the YC money to train the models. Like, that was all extra free money.

It's, like, unrelated to the YC investment. Should we talk about Yeah. Sonauto? So Sonauto is another company in the Winter 2024 batch, and they have built a text-to-song model. So you can give their model lyrics to a song and tell it who you want to perform the song. Like, you can tell it, I want Taylor Swift to sing a birthday song for my dog, and it will make exactly that song.

There's only two or three models in the world that have ever been trained that actually do this, and I think Sonauto is actually the best one. Oh, wow. The really cool thing is that the founders of Sonauto are literally 21 years old. Harj, to your point about expertise, this was not built by PhD machine learning researchers who have been working in machine learning for ten years or something.

This is literally built by 21-year-old new college grads. Yep. And they built this thing in two months, and they did it basically, they just taught themselves. That's amazing. They just went online and they figured out how to do it. That is very impressive. Should we take a look at it? Yeah.

So this is a song that they made for the YC batch, and it's like a power march about Y Combinator.

Speaker 0:

Is this how we're gonna open the batch? Yeah. That's a good idea. We need big.

Speaker 3:

orange banners behind us and we have to wear military garb though.

Speaker 1:

With orange armory.

Speaker 2:

Garry, we could do our own song for demo day. Oh my god. AI generated. We have to now. We have to. This is very impressive. One thing I really like about this is, like, you can actually understand the lyrics. Like, it really does do the lyrics, but it really does sound like someone is singing it. This is the first time I've heard AI vocals like that.

Yep. And to your point, Jared, there's another company.

Speaker 1:

that also didn't have the expertise of a PhD in machine learning. It is called Metalware. They're building a Copilot for hardware, and these were founders who used to work as hardware engineers at SpaceX, and they had to build all these hardware designs. So they were familiar with building hardware.

And when they came into the batch, they decided to build basically a Copilot for hardware design. And they didn't have much AI background, and they figured it out. And one of the cool things about them is they also trained a foundation model for this because there's no model available for this, and they were able to do it during the batch.

And in that same framework or the things that they hack with data and computation, in terms of the data, they got away with using less data but more high quality. What they did is they took a bunch of figures and information from textbooks on hardware, and they scanned all of that and used that as input, which is clever. Right?

The other thing, because they didn't need as much data, then they could choose to work with a model that's less computationally intensive. So they actually use GPT 2.5, which seems counterintuitive because the 2.5 GPT only has, like, 1 billion plus parameters, I think. I think it's 1 billion. Right. Yeah. Versus GPT-4 is, like, a trillion.

Yeah. And they were able to get away to use less computational resources because they use a smaller model and better data, and then they could do all these hardware design copilot tasks, which is really cool. So when you kinda constrain a lot of your task and you're very specific and the dataset is very high quality, that's another way you could hack and build a foundation model during the batch.

And there are different kinds of applications, not just generating video and text. There's one that I'm really excited about in the current batch called Guide Labs. They're building an explainable foundation model because one of the things with all these foundation models and deep learning is it's kinda this black box magic. Nobody knows what's going on when you put in the data.

It kinda predicts the label, and you have no idea how that happened. Prior to deep learning, you could, because you could have the weights and understand which feature indicated and gave the weight for the label. So this team is building a foundation model that can explain the outputs, and they

Speaker 0:

trained a model during the batch. Nice. As a founder, like, when is it the right call to invest in building your own model versus just using one of the existing open source models and

Speaker 1:

fine tuning and tweaking it to fit what you need? Well, I guess it depends. Right? Depends on what you're really looking to build. If you're in a very specific area, and it can be niche based, you can get away with training your own foundation model like the Metalware guys. But if you're, let's say, doing something more with language,

Speaker 0:

GPT four gets you quite further along. So it depends on the task too. Right? So is it so if we're thinking about it as, like, a data compute expertise, like, we're basically saying expertise is maybe overrated. Like, we've, like, proving that if you're just, like, smart and, like, willing to read the papers, you can figure it out.

Compute, there are ways, like, being in YC is one way to get around that. Like, you can get credits and you can take some of that cost off. And so then, is it like the data piece is sort of where all the edge is? Like, if you can find high quality sorry. Say it again. Like, high quality but not like giant datasets, that's the

Speaker 1:

the hack. Oh, yes. Let's talk about Phind. So Phind is this company that's building a Copilot for software. The answers that they're generating are even better than Stack Overflow. Interesting. And these were also kids out of college with not a lot of, like, AI background. And they've done a very clever hack to build their own model for the data.

They created a bunch of synthetic data for programming competitions. So they would have a bunch of those datasets generated, and that got, like, a lot higher quality. Imagine that. It's, like, basically infinite if it's synthetic. It's interesting because I feel like synthetic data has been looked down on. It was controversial.
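One way to picture synthetic data for programming tasks is a generator that emits a prompt, a reference solution, and test cases, and keeps an example only if the solution passes its own tests. This is a toy sketch under that assumption, not the pipeline the company actually used; `TEMPLATES` and `make_example` are hypothetical names.

```python
import random

# Hypothetical task templates: (prompt, solution source, trusted reference function).
TEMPLATES = [
    ("Return the sum of the integers in xs.", "def solve(xs):\n    return sum(xs)", sum),
    ("Return the largest integer in xs.", "def solve(xs):\n    return max(xs)", max),
    ("Return xs sorted in ascending order.", "def solve(xs):\n    return sorted(xs)", sorted),
]

def make_example(rng):
    """Build one synthetic training example with self-checked test cases."""
    prompt, solution_src, reference = rng.choice(TEMPLATES)
    tests = []
    for _ in range(3):
        xs = [rng.randint(-50, 50) for _ in range(rng.randint(1, 6))]
        tests.append((xs, reference(xs)))
    # Run the solution text itself and reject the example if any test fails.
    namespace = {}
    exec(solution_src, namespace)
    if not all(namespace["solve"](list(xs)) == out for xs, out in tests):
        raise ValueError("solution does not pass its own tests")
    return {"prompt": prompt, "solution": solution_src, "tests": tests}

rng = random.Random(0)  # fixed seed so the dataset is reproducible
dataset = [make_example(rng) for _ in range(5)]
print(len(dataset))  # 5
```

Because the generator can be run as many times as needed, the dataset size is limited by compute rather than by how much human-written data exists, which is the sense in which it is "basically infinite."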

Speaker 0:

Yeah. Initially. Yeah. Why, like, why was it originally controversial, and why does it actually seem to be working? It seemed like

Speaker 2:

circular. It seemed like it would be impossible for a model to generate its own data. And how, like, how can you learn from the data that you generated yourself? Yeah. It wasn't obvious that such a thing could be possible. It seemed to, like, violate some, like, conservation of energy.

I remember it was, like, the meme that was going around on Twitter was, like, the mosquito drinking its own blood, and, like, this is how synthetic data works. Yeah. But then it turns out it actually works. Interesting.

Speaker 3:

Think maybe this is related to the idea that, you know, some of these, you know, LLMs are actually capable of reasoning. And once you can reason, maybe that's the part that sort of spins up the flywheel and makes it possible. And, you know, there are other interesting analogs that I think there's a healthy debate out there whether or not this will come together.

But you could look at self driving car models are often trained on massive amounts of simulation data instead of actually real drive time, you know, sometimes by a factor of 10 to one or more. And that might end up being true for some of the generative AI models too. Is it possible that Sora will do that as well? Like, Sora generate its own.

Speaker 1:

video to continue training and improving its own model? Probably. I know OpenAI doesn't share much about their data sources because that's part of the secret sauce. But 100%, they're using video footage that's generated from Unreal Engine probably or Unity, one of these game engines, because they have a full physics simulator.

So you could create multiple scenes of the same, let's say, if you have the example of the car driving on the cliff,

Speaker 0:

they could generate it from multiple camera angles because what the game engine does, you can position the camera anywhere, and you could basically generate all the footage on all possible camera views. The physics part of this is really interesting. Okay.
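To make the multiple-camera-angle idea concrete, here is a small sketch that computes camera positions on a circle around a scene. It does not call any real game-engine API; the function `orbit_cameras` and its pose dictionary format are assumptions for illustration, and a renderer would consume each pose to produce one clip of the same scene.

```python
import math

def orbit_cameras(center, radius, height, n_views):
    """Place n_views cameras evenly on a circle around center, all looking at it."""
    cx, cy, cz = center
    poses = []
    for i in range(n_views):
        angle = 2 * math.pi * i / n_views
        position = (
            cx + radius * math.cos(angle),
            cy + radius * math.sin(angle),
            cz + height,
        )
        poses.append({"position": position, "look_at": center})
    return poses

poses = orbit_cameras(center=(0.0, 0.0, 0.0), radius=10.0, height=2.0, n_views=8)
print(len(poses))  # 8
```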

I feel most people when they have seen these Sora demos or just generally get this concept, your mind goes to, oh, this will be cool for generating films or video games that like like entertainment.

Speaker 2:

But if what you're saying is it can actually, like, simulate the real world, there's probably gonna be lots of, like, further reaching implications for that. Like, what happens when you have a model that's capable of, like, simulating real-world physics, and where does that apply? Well, I have a good example.

This company, Atmo, which we funded in 2020, they built their own foundational model for weather prediction. The way they did it is they trained a model on, like, I think, 90 terabytes of weather data. They've programmed in a physics model of the world by starting with, like, actual, like, equations of physics. A giant polynomial. Yeah. It's, yeah.

It's effectively a giant polynomial, and it's so expensive to run. It has to run on a cluster of supercomputers. There's actually only one place in the world that actually runs this model, which is NOAA, the US government agency. They're the only ones with the supercomputer cluster that runs the physics model.

And so every weather app that you go to, every weather channel, they're actually not predicting their own weather. They're just downloading the government prediction data and wrapping, like, a nice UI around it. There's only one actual physics simulation for weather, like, in America.

And, like, no commercial company has been able to create their own version because it's too expensive to do it the old school physics-based way. And so what's really cool about Atmo is instead of using the old school physics way, they've trained a foundational model. And using machine learning, it's, like, a million times more efficient to run the same calculation or something like that.

And because of that,

Speaker 0:

this startup, which has only raised a seed round, is actually able to make a weather prediction model that is more accurate than the NOAA-funded one that cost over a billion dollars. Interesting. What's really surprising about the text to video is, like, just how far reaching, like, the implications are. So you can go way beyond just generating, like, video games. Like, we can do weather.

Like, what are other examples of cool things that we could do if we can, like, have a physics simulator of the real world? Well, there's a bunch of companies that are applying it to biology. Diana, do you wanna talk about a couple of those? Yeah. So

Speaker 1:

it turns out all these foundation models are great function approximators for anything. So Any function. They're general purpose learning algorithms. And the human body can be simulated with functions too. So one of the companies that we funded as well is called Diffuse Bio. They're building generative AI for proteins.

So what they're doing is building these big models to be able to create new molecules for new types of drugs and new kinds of gene therapies. And in order to hack this aspect of how do they make progress with not as much resources, they had a lot of expertise. This is different than the set of founders we talked about that don't come from an AI background.

The founder, she published some very legit papers in Nature before this. She had a lot of expertise in terms of how to short circuit the computation loop. What she did is build custom kernels on the models so that the whole process of building the foundation models is a lot faster with less resources. So that's one. The other company in the current batch is Piramidal.

Do you wanna talk about them? They're building a foundation model for the human brain, which turns out they're predicting EEG signals, which could be used for all sorts of applications from predicting stroke to, at some point, your brain could be read. Yeah. Yeah. Same. And what EEG signals are, they're also temporal, so sort of like Sora.

Sora has, like, the images plus images over a time stamp, so there's video. So EEG is the same thing. It's just an electrical impulse, but over a time period. So they kinda do something similar with chunking, spacetime chunks, but for EEG. So they're able to train this model. And the way they were able to train and iterate during the batch, they were experts in the space.

So they also did a lot of hacks around the computation where they found a way to divide a lot of the sequential data into chunks, sort of like what Sora has done. And that actually reduced the runtime complexity by a quadratic factor, which is, like, impressive. And they could get a single run of an iteration

Speaker 2:

of an initial model with just 800 hours of GPU compute, which is really cool. One thing that's really cool about that is, like, if people sat down and tried to think of different applications for foundational models, EEG data would not be the one that would, like, immediately come to mind.
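The claim that chunking sequential data cuts the runtime can be illustrated with a back-of-the-envelope count. This sketch assumes a model whose cost grows with the square of the sequence length, as with full self-attention, and assumes each chunk is processed independently; the transcript does not say which architecture or chunk sizes the team used, so the numbers are hypothetical.

```python
def attention_pairs(seq_len):
    """Pairwise interactions for one sequence under full self-attention."""
    return seq_len * seq_len

def chunked_attention_pairs(seq_len, chunk_len):
    """Pairwise interactions when the sequence is split into independent chunks."""
    assert seq_len % chunk_len == 0, "chunk_len must divide seq_len"
    n_chunks = seq_len // chunk_len
    return n_chunks * attention_pairs(chunk_len)

full = attention_pairs(4096)                  # 4096 * 4096
chunked = chunked_attention_pairs(4096, 256)  # 16 chunks of 256 * 256
print(full // chunked)  # 16
```

In general, splitting a length-n sequence into k equal chunks reduces the pairwise cost from n squared to n squared divided by k, at the price of losing direct interactions across chunk boundaries.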

Speaker 1:

And to me, that suggests that there's probably a lot of other application areas like EEG data that just people haven't thought of yet. Yeah. It's like, who would have thought that EEG is sort of like videos? It's just this whole concept with spacetime. You could spacetime lots of things. It's also possible that

Speaker 0:

applications of AI that people thought would exist will now exist. Like robotics, I think, is a good one. That's a huge one. You remember I think we talked about this on a previous episode about how when Sam was starting OpenAI, he talked about they originally thought that, you know, AI in robots and AI in the real world would be, like, the first application.

Speaker 2:

And I remember I went over to the OpenAI office in, like, the first year or two, and they had all these robots trying to, like, learn how to solve the Rubik's cube by, like, reinforcement learning. Yep.

Which is also kind of an interesting side note because, like, OpenAI is so wildly successful right now that it's easy to think that they knew that, like, they they had this, like, straight line path to get there, but it was definitely not that. It was like a meandering path. They pursued a bunch of dead end ideas like the reinforcement learning robots that didn't work.

But even the researcher working on transformer

Speaker 0:

architecture at OpenAI was, like, off in the corner, I think, at the start. Like, it wasn't clear within OpenAI that that was gonna be the The thing. The right thread to pull on. Right? But so, like, Sora and just, like, text to video is potentially interesting.

Because, again, if we have a real physics simulator for the world, like, that potentially getting plugged into robots is, like, a breakthrough to make the AI robot a reality. We actually have a company in the current YC batch, K-Scale Labs, that's working on consumer humanoid robots. And yeah. That's cool. Yeah. And they have a pretty cool demo.

It's very early, but, like, a lot of the sci-fi stuff is actually now becoming possible. The cool thing about Ben, who's the founder of K-Scale,

Speaker 1:

he was the guy that built the foundation robotics model for Tesla.

Speaker 0:

Yep. Oh, cool. He put it into the Optimus prime robot as well. Oh, awesome. The thing about

Speaker 1:

the real world is that it's governed by the laws of physics, and it turns out we have a bunch of equations that can describe it for different things like weather. There's also the space, for example, there's this company that we funded called Draftaid that is building AI models for CAD design. So CAD follows a lot of the laws of physics with Newton, right, with force, shear, etcetera.

And a lot of software behind SolidWorks and AutoCAD runs on these really old kernels that basically, again, solve these giant polynomials of lots of equations so that when you do a design of a structure and you wanna calculate the force and the tolerances, it's accurate because you don't want a building to just flop. Right? And it's very expensive.

I mean, whenever you build all these models in CAD, these kernels are super old, and at the end of the day, they run on these equations that compile, like, I don't know, to some wild thing like Fortran because they have not been updated. What Draftaid is doing, they are short circuiting some of these with AI models that can do some of the predictions.

So they're a lot faster and cheaper in terms of computation. There's a lot of computational geometry behind the scenes.

Speaker 0:

It's really cool. That's a perfect example of just, like, a valuable problem.

Speaker 3:

to solve that the general purpose models just aren't gonna get around to specializing it. That's a great point. And and there's a lot of startups that are very worried that if they, like, go into AI, they're gonna get run over by OpenAI or other foundational model companies. And so one solution to that is, like, train your own model that's doing something else. Yeah. Great point.

There's actually a YC company called Playground run by our friend, Suhail Doshi, that is a good example of actually, you probably can go up against people who are really well funded and come up with something that is far better. What we're looking at here is the newest version, Playground v2.5.

And they're hot on the heels of Midjourney, but at the same time, like, the models that they've actually even released to open source go toe to toe against the, you know, the latest versions of Stable Diffusion, and in a lot of cases, way outperform that. And they've done it on far less money than Stability AI and other teams in the space.

So I think Suhail and Playground are really one to watch to sort of, you know, go toe to toe with Midjourney and in the long run potentially beat it because I would never bet against Suhail Doshi. That guy is a beast. The image quality is super impressive. That looks so cool. And maybe

Speaker 1:

some of the audience would have thought that Suhail comes from an AI background, but he doesn't. Yeah. He started Mixpanel before when he was 19.

Speaker 2:

And Playground is also an interesting game on something that Harj was talking about last night, which is the phenomenon of companies pivoting into AI, because Playground actually did not start with this idea. When it started, it was a completely different idea. Yeah.

And a couple of years in, after raising a bunch of money, Suhail hard pivoted the thing into AI, and he literally just taught himself AI. I think he, like, locked himself in his apartment for a month and just read AI papers, and then he built Playground.

Speaker 3:

So don't be afraid. I mean, I think that that's one of the most interesting things that we've seen across many of these different examples, that if you're looking for a reason why you can't succeed, guess what, you're right.

But on the other hand, the field itself is so new, so brand new that if you spend six or nine months literally reading every paper and then meeting all the people who are in the space and they'll meet you, you can actually be on the cutting edge in relatively short order.

Speaker 0:

And that's an incredible blessing. Totally. It's a really important message actually. Right? Because we're all we're all grateful to Sam and OpenAI for, like, bringing this field forward and making all of this stuff possible.

But at the same time, all of the news headlines tend to be around the companies that are raising, like, huge amounts of money or about, you know, like, Sam, like, himself as a a sort of world celebrity at this point.

But you can actually, like, compete with OpenAI for very valuable, like, verticals and use cases by training your own model without having to be Sam Altman or having a hundred million dollars.

Speaker 3:

So we're out of time for today, but we could talk for hours about the crazy things that we're seeing in AI being built by people who are probably not that different than you who's watching right now. A lot of the world right now is looking at people like Sam Altman and Dario Amodei and some of the luminary figures who have really pushed forward the whole space.

But remember, all of these people started someplace. And we hope that Y Combinator might actually be the place for you to start just like it was for Sam Altman back in the day. That's it. Catch you next time.
