Dalton & Michael: Things that don't scale, the software edition
Dalton Caldwell and Michael Seibel on software hacks that don't scale. Companies discussed include Google, Facebook, Twitch, and imeem.
Transcript
We'll get a founder that's like, oh, how do I, like, test my product before I launch to make sure it's gonna work? And I always come back and tell the founder the same thing.
Like, if you have a house and it's full of pipes and you know some of the pipes are broken and they're gonna leak, you can spend a lot of time trying to check every pipe, guess whether it's broken or not, and repair it, or you can turn the water on. And, like, you'll know. Like, you'll know exactly the work to be done. Hey. This is Michael Seibel with Dalton Caldwell.
Today, we're gonna talk about what it means to do things that don't scale, the software edition. In this episode, we're gonna go through a number of great software and product hacks that software companies used to figure out how to make their product work when perhaps they didn't have time to really build the right thing. Now, Dalton, probably the master of this
is a person we work with, a guy named Paul Buchheit, who invented this term, the 90/10 solution. He always says something like, how can you get 90% of the benefit for 10% of the work? Always. This is what he always pushes on people when they tell him it's really hard to build something or it'll take too long to code. He'll just always push on this point. And Yep.
You know, founders don't love it. Right? Would you say that's a fair assessment, Michael? That's a fair assessment. Yes. Founders hate it. But tell the audience why it's worth listening to the guy. Like, why does he have the credibility to say that to people?
Well, PB is the inventor of Gmail. And as kind of a side project at Google, he invented something that 1,500,000,000 people on earth actively use. And he literally did it doing things that don't scale. So I'll start the story, and then please take it over. So as I remember it, PB was pissed about the email product he was using at Google.
And so Google had this Google Groups newsletter product. For the first version of Gmail, he basically figured out how to put his own email into this Google Groups UI. And as he tells the story, kind of his eureka moment was when he could start reading his own email in this UI. And from that point on, he stopped using his old email client.
And what I loved about this is that, as he tells the story, every email feature that any human would wanna use, he just started building from that point. And so, you know, as he tells it, he's like, and then I wanted to write an email. And so I built writing emails. And if you know PB, like, he could have gone a couple days reading emails without replying at all.
So, like, he didn't need writing emails to start. I remember him telling me about the first time he got his coworker, like, literally his deskmate or something, to try to use it. And his deskmate is like, this thing's pretty good. It loads really fast. It's really great. The only problem is, PB, it has your email in it, and I want it to have my email. And PB was like, oh, shit. Okay.
Well, I gotta build that. He's like, I gotta build that. I forgot. Perfect 90/10 solution. And so then it started spreading throughout Google. And do you remember when it broke? No. What happened?
Oh, so he told a story where, like, one day, PB came in late to work, which, you know, knowing PB, is every day. You know? And everyone was looking at him really weird, and they were all, like, a little pissed. And he got to his desk, and someone came over to him and was like, don't you realize that Gmail's been down, like, all morning? And PB was like, no. I just got to work. I didn't know.
And so he's, like, trying to fix it, trying to fix it. And then his coworkers see him, like, grab a screwdriver and go to the server room. And they were like, oh god. Why did we trust PB with our email? Like, we're totally screwed. And I think he figured out, like, there was a corrupted hard drive.
And I remember at that point of the story, he says, and that day I learned that people really think email is important, and it's gotta always work. Perfect. What I love about this is, I think the reason
he did it, man, is because he liked to run Linux on the desktop, and he didn't wanna run Outlook. Like, the Google, like, suits were trying to get him to run Outlook on Windows, and he was like, I don't really wanna run Windows. But, yeah, it was the dirtiest hack.
And as I recall, in this, you know, final part of the story, it was hard for him to get Google to release it because they were afraid it was gonna take up too much hardware. And so there were all these issues where there was a decent chance, I think, it never would have been released. Well, the best part was that everyone thought Gmail's, like, invite
system was, like, some cool, like, growth hack. Virality hack. Yeah. Like, virality hack. It's like, oh, you got access to Gmail. You got, I think, four invites to give someone else, and these were, like, precious commodities. And it was just another version of things that don't scale.
They didn't have enough
server space for everyone. They did not have enough servers, so they had to build an invite system. Yes. Basically, there was no option other than building an invite system. It was not, like, genius PM growth hacking. It was like, yeah. Well, we saturated the hard drives. They're full, so I guess we can't invite anyone else to Gmail today.
That's it. That's it. So you had another story about Facebook early days that is similar in this light. So let me paint the picture. Back when you started a startup a long time ago, you had to buy servers and put them in a data center, which is a special room that's air conditioned that just has other servers in it. And you plug them in, and they have fast Internet access.
And so being a startup founder until AWS took off, part of the job was to drive to the suburbs or whatever, drive to some data center, which is an anonymous warehouse building somewhere, go in there, and, like, plug things in. And what was funny is when your site crashed, it wasn't just depressing that your site crashed. It actually entailed getting in your car. Yeah.
Like, part of being a startup founder was waking up at 2AM and getting in your car and driving to, like, Santa Clara Yep. Because your code wedged the server, and you had to physically reboot it. And your site was down until you physically rebooted the server. So, anyway, I'm just trying to set the stage for people. So this was what our life was like. Okay?
And so my company, imeem, we had a data center in Santa Clara, and there were a bunch of other startups there as well. And so something that I liked to do was to look at who my neighbors were, so to speak. There were never people there, just their servers, and there'd be a label at the top of the rack. And you could see their servers, and you could see the lights blinking on the switch. Okay?
So this is what it was like. And so our company was in this data center in Santa Clara. And then one day, there's a new tenant and, oh, new neighbors. So I look at it, and the label at the top of the cage next to ours, you know, three feet away, the label said thefacebook.com. And I remember being like, oh, yeah. I've heard of this. Like, cool.
Like, sounds good. And they had these super janky servers. I think there were maybe eight of them when they first moved in. And they were, like, super cheap. They're, like, Supermicro servers. You know? Like, the wires were hanging out. Like, you know, I'm like, cool.
But the lights were blinking really fast. Okay? And so what I remember was that there were labels on every server, and the labels were the names of universities. And so at the time, one of the servers was named Stanford. One of them was named Harvard. You know?
Like and it made sense because I was familiar with the Facebook product at the time, which was like a college social network that was at, like, eight colleges. Okay? So then I watched. Every time we would go back to the data center, they would have more servers in the rack with more colleges.
And it became increasingly obvious to me that the way they scaled Facebook was to have a completely separate PHP instance running for every school that they copy and pasted the code to. They would have a separate MySQL server for every school, and they would have, like, a Memcache instance for every school.
And so you'd see, like, the University of Oklahoma, you know, you'd see the three servers next to each other. And the way that they managed to scale Facebook was to just keep buying these crappy servers. They would launch each school, and it would only talk to a single school database. And they never had to worry about scaling a database across all the schools at once.
Because, again, at the time, hardware was bad. Okay? MySQL was bad. Like, the technology was not great. If they had to scale a single database, a single users table, to hundreds of millions of people, it would have been impossible. And so their hack was the 90/10 solution, like PB used for Gmail, which is, like, just don't do it.
And so at the time, if you were, like, a Harvard student and you wanted to log in, it was hard coded. The URL was harvard.thefacebook.com. Right, man? Mhmm. Like and so if you tried to go to stanford.thefacebook.com, it'd be like, you know, error.
Like, that was just a separate database. And so then they wrote code so you could bounce between schools, and it actually took them years to build a global users table, as I recall, and retire this hack. And so, anyway, the thing they did that didn't scale was to copy and paste their code a lot and have completely separate database instances that didn't talk to each other.
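A rough sketch of the per-school setup described above, routing each subdomain to its own database. All names and connection strings here are invented for illustration; this is not Facebook's actual code, just the shape of the hack:

```python
# One completely separate database per school; the subdomain picks the shard.
# Shard names and connection strings are hypothetical.
SHARDS = {
    "harvard": "mysql://db-harvard.internal/thefacebook",
    "stanford": "mysql://db-stanford.internal/thefacebook",
}

def shard_for(host):
    """Map e.g. 'harvard.thefacebook.com' to that school's database."""
    school = host.split(".")[0]
    if school not in SHARDS:
        # A Stanford student simply could not see Harvard's data:
        # un-launched schools were just an error page.
        raise LookupError(f"no such school: {school}")
    return SHARDS[school]
```

The point of the design is what's missing: there's no global users table, so nothing ever has to scale past one school's worth of traffic.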
And I'm sure people that work at Facebook today, I bet a lot of them don't even know the story. But, like, that's what it took. That's the real story behind how you start something big like that versus what it looks like today.
So in the case of Twitch, most, if not all, of the examples of this came from this core problem, and it's why I tell people to not create a live video site. A normal website, even a video site, on a normal day will basically have peaks and troughs of traffic. And the largest peaks will be two to four x the steady state traffic.
So you can engineer your whole product such that, if we can support two to four x the steady state traffic and our site doesn't go down, we're good. On a live video product, our peaks were 20 x. Now you can't even really test 20 x peaks. You just experience them and fix what happens when 20 x more people than normal show up on your website because some pop star is streaming something.
And so two things kinda happened that were really fun about this. So the first hack we had was, if suddenly some famous person was streaming, on their channel there'd be a bunch of dynamic things that could load.
Like, your username would load up on the page of their channel, and the view count would load up, and a whole bunch of other things that would basically hit our application servers and destroy them if a hundred thousand people were trying to request the page at the same time. So we actually had a button that could make any page on Justin.tv a static page. All those features would stop working.
Your name wouldn't appear. The view count wouldn't update. Like, literally a static page that loaded our video player, and you couldn't touch us. We could just cache that static page, and as many people as wanted could look at it. Now to them, certain things might not work right. But they were watching the video. The chat worked because that was a different system. The video worked.
That was a different system, and we didn't have to figure out the harder problems until later. Later, actually, Kyle and Emmett worked together to figure out how to cache parts of the page while making other parts of the page dynamic, but that happened way, way later. Dude, that reminds me of let me give you a quick anecdote.
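A minimal sketch of that kill-switch button, assuming a per-channel flag and a simple cache. Every name here is made up for illustration; the real system obviously involved real web servers and CDN-style caching:

```python
STATIC_MODE = set()   # channels an admin has flipped to static with "the button"
_cache = {}           # channel -> one pre-rendered page shared by everyone

def render_dynamic(channel, username):
    # Expensive path: per-user data that hits the application servers.
    return f"<html>{channel} | hi {username} | viewers: live</html>"

def render_static(channel):
    # Cheap path: one cached page for all viewers, just the video player.
    if channel not in _cache:
        _cache[channel] = f"<html>{channel} | [video player] | viewers: --</html>"
    return _cache[channel]

def serve(channel, username):
    if channel in STATIC_MODE:
        # No username, no live view count, but the site stays up.
        return render_static(channel)
    return render_dynamic(channel, username)
```

Flipping a channel into `STATIC_MODE` means every viewer gets the identical cached page, so the application servers see roughly one request instead of a hundred thousand.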
Yes. Remember Friendster before Myspace? Yeah. Of course. Yeah. Every time you would log in, it would calculate how many people were two degrees of separation from you. It would fire off a MySQL thread when you logged in. It would look at your friends, and it would calculate your friends of friends and show you a live number of how big your extended network was.
And the founder, you know, Jonathan Abrams, he thought this was, like, a really important feature. I remember talking to him about it. Guess what Myspace's do-things-that-don't-scale solution was? If someone was in your friends list, it would say, you know, so and so is in your friends list. And if they weren't, it would say, so and so is in your extended network. There it is.
That was it. That was the feature. And so Friendster was, like, trying to, like, hire engineers and scale MySQL, and they're running into, like, too-many-threads-on-Linux issues and, like, updating kernels. And MySpace was like, oh, so and so is in your extended network. That's our solution. Anyway, carry on. But it's the same deal.
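To make the contrast concrete, here is a toy sketch of the two approaches, invented for illustration and obviously not either company's real code: Friendster walked two hops of the friend graph on every login; MySpace printed a constant string and computed nothing.

```python
def friendster_extended_count(graph, user):
    # Expensive: compute friends-of-friends live, per login.
    friends = set(graph.get(user, []))
    fof = set()
    for f in friends:
        fof.update(graph.get(f, []))
    return len((friends | fof) - {user})

def myspace_label(in_friends_list):
    # The 90/10 version: a fixed label, zero database work.
    return "in your friends list" if in_friends_list else "in your extended network"
```

The first function's cost grows with the graph; the second is O(1) forever, which is the whole joke.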
So our second one was, it always happened with popular streamers. So our second one was, if you imagine, if someone is really popular and there's a hundred thousand people who wanna watch their stream, we actually need multiple video servers to serve all of those viewers.
So we'd basically propagate the original stream coming from the person streaming across multiple video servers until there were enough video servers to serve all the people who were viewing. The challenge is that we never had a good way of figuring out how many video servers we should propagate this stream to.
And if a stream would slowly grow in traffic over time, we had a little algorithm that could work and, like, spin up more video servers and be fine. But what actually happened was that a major celebrity would announce they were going on and all their fans would descend on that page. And so the second they started streaming, a hundred thousand people would be requesting a livestream. Bam.
Video server dies. And so we were trying to figure out solutions, solutions, solutions and, like, how do we model this? Like, there were all kinds of overly complicated solutions we came up with.
And then once again, Kyle and Emmett got together, and they said, well, the video system doesn't know how many people are sitting on the website before the stream starts trying to serve video.
All the website has to do is communicate that information to the video system, and then it could pre-populate the stream to as many video servers as it would need and then turn the stream on for users. So what happened now in this setup is that some celebrity would start streaming. They would think they were live.
No one was seeing their stream while we were propagating their stream to all the video servers that were needed. And then suddenly, the stream would appear for everyone, and it would look like it worked well. And, like, the delay was a couple seconds. It wasn't that bad. Right? But, like, dirty, super dirty, but it worked. And, honestly, that's gonna be kind of the theme of this whole episode. Right?
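The fix described above can be sketched in a few lines. The per-server capacity and server names are invented for illustration; the real system's numbers were whatever Twitch's hardware could do:

```python
import math

VIEWERS_PER_SERVER = 5_000  # hypothetical capacity of one video relay

def servers_needed(waiting_viewers):
    # The website reports how many people are already sitting on the page.
    return max(1, math.ceil(waiting_viewers / VIEWERS_PER_SERVER))

def go_live(waiting_viewers):
    n = servers_needed(waiting_viewers)
    # Fan the source stream out to enough relays BEFORE anyone sees video...
    relays = [f"video-{i}" for i in range(n)]
    # ...and only then flip the stream on for viewers.
    return relays
```

The streamer eats a couple seconds of delay while the fan-out happens, instead of the first relay eating a hundred thousand simultaneous requests.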
Super dirty, but it worked. You had a couple of these at imeem. Right?
Yeah. There were a couple that we had at imeem. So one of them so at the time, again, like, to set the stage, the innovation was showing video in a browser without launching RealPlayer. No one here probably knows what that is. But it used to be that to launch a video, it would launch another application in the browser that sucked, and it would, like, crash your browser, and you hated your life. Okay?
So one of the cool innovations that YouTube, the startup YouTube, had before it was acquired by Google was to play video in Flash in the browser that required no external dependencies. It would just play right in the browser. At the time, that was, like, awesome. Like, it was, like, it was a major product innovation to do that. Yes.
And so we wanted to do that for music at imeem, and we were looking at the tools available to do it. And we saw all this great tooling to do it for video. And so rather than rolling our own music-specific tools, we just took all of the open source video stuff and hacked the video code that we had so that every music file played on imeem was actually a video file.
It was a .flv back in the day. Yeah. And it was actually a Flash video player. We were basically playing video files that had, like, zero bits in the video field, and that was just audio. And we actually were transcoding uploads into video files. You know what I'm saying? Like, the entire thing was a video site with no video.
I don't know how else to explain it. And it worked. And I do think this is a recurring theme: a lot of the best product decisions are ones made kind of fast and kind of under duress. I don't know what that means. But it's like, when it's 8PM in the office and the site's down, you tend to come up with good decisions on this stuff.
So we had two more at Twitch that were really funny. The first one, talking about duress, was our free peering hack. So streaming live video is really expensive. Back then, it was really expensive, and we were very bad fundraisers. That was mostly my fault. And so we were always in the situation
where we didn't have enough money to stream as much video as we needed, and we had this global audience of people who wanted to watch content. And so we actually hired one of the network ops guys from YouTube who had figured out how to kinda scale a lot of YouTube's early usage. And he taught us that you could have free peering relationships Yep.
With different ISPs around the world so that you wouldn't have to pay a middleman to, say, serve video to folks in Sweden. You connect your servers directly. You go to I forgot what they're called. It saves you money, and it saves them money. That's what they wanted. Yeah.
And there were these massive, like, switches where you could basically, like, run some wires to the switch and bam, you could connect to the Swedish ISP. Now the problem is that some ISPs wanted to do this free peering relationship where basically you can send them traffic for free. They can send you traffic for free. Others didn't.
They didn't wanna do that, or, like, they weren't kind of with it. And so I think it was Sweden, but I don't remember. Some ISP was basically not allowing us to do free peering, and we were spending so much money sending video to this country, and we were generating no revenue from it. Like, we couldn't make a dollar in advertising.
And so what we did is, after ten minutes of people watching free live video, we just put up a big thing that blocked the video that said, your ISP is not doing a free peering relationship with us, so we can no longer serve you video. If you'd like to complain, here's a phone number and email address. And that worked. How fast did it take for that to work? I don't remember how fast.
I just remember it worked. And I remember thinking to myself, it's almost like, that ISP was a real company. Like, we were just, like, a website in San Francisco. And, hey, that worked. And then the second one was translation.
So we had this global audience, and we would, like, call these translation companies and we'd ask them, like, how much would it cost to translate our site into these, like, 40 different languages? And they were, like, infinite money. And we're, like, we don't have infinite money. And so I think we stole the solution from Reddit.
We were like, what happens if we just build a little website where our community translates everything? And so, basically, it would just, like, serve up every string in English, and it was, like, served to anyone who came to the site who wasn't from an English-speaking country, and it was like, you wanna volunteer to translate this string into your local language?
And, of course, you know, people were like, well, what if they do a bad job translating? I was like, well, the alternative is it's not in their language at all. So, like, let's not make the perfect the enemy of the good. And I think we had something where, like, we'd get three different people to translate it and, like, match them, but, like, that happened later.
But we basically got translation for a whole product for free. Maybe to end, because I think this might be, like, maybe the funniest of them all,
tell the Google story. Because I think this one's, like Yeah. Really? So, look. For the Facebook story, that was firsthand, where I personally witnessed the servers with my own eyes. So I'm a hundred percent confident that is what happened. Because it was me. Right?
This Google story is secondhand, and so I may get some of the details wrong. I apologize in advance. But I'll tell you, this was relayed to me by someone that was there. Alright? You ready? So, look, the original Google algorithm was based on a paper that they wrote, which you can go read: PageRank. It worked really well. It was a different way to do search.
Okay? It worked, but they never had enough hardware to scale it because, remember, there was no cloud back then. You had to run your own servers. And so as the Internet grew, it was harder and harder to scale Google. You still with me? Like, there were just more web pages on the Internet. So it worked great when the web was small, but then they kept getting more web pages really fast.
And so Google had to run as fast as they could just to stay in the same place. Just to run a crawl and reindex the web was a lot of work. And the way they worked at the time is they weren't reindexing the web in real time constantly. They had to do it in one big batch process back in the day. Okay? And so there was some critical point. This was probably in the 2001 era.
Again, this is secondhand. I don't know exactly when it was. But there was some critical point where this big batch process to index the web started failing. And it would take three weeks to run the batch process. It was like the, you know, reindexweb.sh. You know? It was, like, one script that was, like, Google. You know?
And it started failing. And so they tried to fix the bug, they restarted it, and then it failed again. And so the story that I heard is that there was some point where, for maybe three months, maybe four months, I don't remember the exact details, there was no new index of Google. They had stale results. And any user of Google, they didn't know this. You know, the user didn't know this.
But any user of Google was seeing stale results, and no new websites were in the index for quite some time. Okay? Yes. So, obviously, they were freaking out inside of Google. And this was the genesis for them to create MapReduce, which they wrote a paper about, which was a way to parallelize and break into pieces all the little bits of crawling and reindexing the web.
And, you know, Hadoop was created off of MapReduce. There's a bunch of different software based on it. And I would argue every big Internet company now uses the descendants of this particular piece of software. And it was created under duress, when Google secretly was completely broken for an extended period of time because the web grew too fast.
But I think this is the most fun part about this story. When the index started getting stale, did Google shut down the search engine? Nope. Like, that's the coolest part. Like, people just didn't
realize. They didn't know. And did they build this first? Again, in terms of do things that don't scale, did they build MapReduce before they had any users? No. Like, they basically made it this far by just building a monolithic product, and they only dealt with this issue when they had to.
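The idea of breaking the reindex into "little bits" has a classic shape: map each page to (word, url) pairs, group by word, and reduce each group into an inverted-index entry. Here is a toy, serial sketch of that shape (not Google's actual MapReduce, whose point was running the map and reduce calls across many machines in parallel):

```python
from collections import defaultdict

def map_page(url, text):
    # "Map" step: one page at a time, independent of every other page.
    return [(word, url) for word in set(text.lower().split())]

def shuffle(pairs):
    # "Shuffle" step: group all (word, url) pairs by word.
    groups = defaultdict(list)
    for word, url in pairs:
        groups[word].append(url)
    return groups

def reduce_index(groups):
    # "Reduce" step: each word's group collapses to one index entry.
    return {word: sorted(urls) for word, urls in groups.items()}

def build_index(pages):
    pairs = []
    for url, text in pages.items():
        pairs.extend(map_page(url, text))
    return reduce_index(shuffle(pairs))
```

Because each `map_page` call and each word's reduce touch only their own slice of data, a failure can be retried on one piece instead of restarting a three-week monolithic script.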
You know, I think this is, like, such a common thing that comes up when we give startup advice. You know? We'll get a founder that's like, oh, how do I, like, test my product before I launch to make sure it's gonna work? And I always come back and tell the founder the same thing.
Like, if you have a house and it's full of pipes and you know some of the pipes are broken and they're gonna leak, you can spend a lot of time trying to check every pipe, guess whether it's broken or not, and repair it, or you can turn the water on. And, like, you'll know. Like, you'll know exactly the work to be done when you turn the water on.
And I think people are always surprised that that's basically all startups do: just turn the water on, fix what's broken, rinse and repeat. And, like, that's how big companies get built. It's never taught that way, though. Right? It's always taught in, like, oh, somebody had a plan, and they wrote it all down. It's like, never. Never. And you earn the privilege
to work on scalable things by making something people want first. You know what I think about sometimes with Apple is picture, like, Wozniak hand soldering the original Apple computer and, like, those techniques compared to, like, whoever it is at Apple that designs AirPods. Yeah. Like, it's the same company, but, like, Wozniak hand soldering is not scalable. Yep.
But, you know, because that worked, they earned the privilege to be able to make AirPods now. And because Google search was so good, they earned the privilege to be able to create super scalable stuff like MapReduce and all these other awesome internal tools they built. Right? Yes. But if they hadn't shipped that stuff first, it wouldn't be Google, man.
And so to wrap up, kind of what I love about things that don't scale is that it works in the real world. Right? The Airbnb founders taking photos, the DoorDash folks doing deliveries. It also works in the software world. Right? Like, don't make the perfect the enemy of the good.
Just try to figure out any kind of way to give someone something that they really want, and then solve all the problems that happen afterwards. And you're doing way better. Alright. Thanks so much for watching the video.