#009 - Scaling MySQL with Brian Morrison from PlanetScale

**Brian:** [00:00:00] Vitess was originally created to scale YouTube. Way back in the early 2010s, YouTube was running out of resources. They were pushing the absolute limits of MySQL, to the point where even the amount of data being stored in a replica was exceeding what MySQL was able to handle. So that's where Vitess was birthed. It was created to handle the scaling issues of YouTube back in that timeframe.

**Lane:** Brian, I'm so excited to have you on the show. Do you wanna take just a second and introduce yourself to the listeners?

**Brian:** Yeah. My name is Brian Morrison. I am a full stack developer. I've worked in a lot of different industries. I've done front end, I've done back end professionally. I've done React and Angular; on the back end, I do Go and C#. Currently I'm a developer educator at PlanetScale. So a lot of what I do is managing [00:01:00] the blog, managing the YouTube channel, and working on the documentation, making sure that's all up to date based on feedback we get from our customers as well as new things we might be building internally.

**Lane:** I actually didn't know until just this moment that you're a Go developer. That's always really exciting to me. At Boot.dev we mostly do Python and Go, but with a really big emphasis on Go. It's like a 30/70 split.

**Brian:** Nice. Very cool.

**Lane:** Cool. So I obviously brought you on primarily to talk about PlanetScale, because PlanetScale is a key part of this CI/CD course that I just finished writing, which will be released by the time this podcast episode comes out. Tell us what PlanetScale is, from 50,000 feet.


**Brian:** Yeah, so at 50,000 feet: we are a hosted, MySQL-compatible, serverless database platform. That's kind of it in a nutshell. Key emphasis on the platform bit, because we offer a lot more than just MySQL, even though that's [00:02:00] the main offering. We offer database branching that's very similar to what you'd work with in a Git environment, which most developers, I believe, are familiar with.

We have features that enable zero-downtime database migrations. So if you're making changes to your database in that branching setup, you can merge changes without having to take down production or the database itself. We offer data caching. We have an API, we have a CLI. We offer integrations with a bunch of different partners as well.

So that's why the emphasis is on the platform: MySQL is the tip of the iceberg, and then it goes so much deeper.

**Lane:** So the core of what your users are interacting with is MySQL, but all of the features that PlanetScale builds are stuff that you would have to do manually if you deployed bare-bones MySQL, is the way I'm taking that.

**Brian:** Yep. Pretty much.

**Lane:** I wanna give some context. I recently wrote this Learn CI/CD course for the Boot.dev platform. It's gonna be one of the last courses in the backend learning path. [00:03:00] Throughout the course, students use GitHub Actions to test their code, right? They're doing automated testing and automated formatting, all very much within GitHub Actions. And the last step of the course is to connect the web server that they've been doing all of this CI and CD with to a production database.

I've used many different databases over the course of my career, but I'd never specifically shopped for a database that would be great for students to spin up a development instance of in the cloud, because usually you're spinning up local databases to test with, and that's what students would typically do. But the whole point of a continuous integration and continuous deployment course is that at some point there is a deployment. And I was just really impressed with how easy it was to get started with PlanetScale, and also that you have a free trial that doesn't even require a credit card.

You can get this very ephemeral, very small database. That to me is just [00:04:00] perfect for education purposes, and I don't know why everyone else isn't doing this. Can you speak to how PlanetScale thinks about scaling databases, or I guess what the value proposition of PlanetScale is when it comes to scaling databases?

Because I was impressed both with how large it can go, because that's the main thing, right? With PlanetScale, you can scale up your database to very large sizes. But also with how small it can scale.

**Brian:** Yeah, so just reiterating the question to make sure I understand it correctly: you want some insight into how we scale at PlanetScale, or the design philosophy around it. Is that correct?

**Lane:** I guess for my listeners, what does it mean to scale a database? A lot of the people listening to this podcast might be very new to databases in general and just think of a database as somewhere to put data. What does it even mean to scale a database up or down, and what business ramifications does that have for your project?

**Brian:** I see. [00:05:00] Okay. So I'm gonna go with the very academic answer and say that there are two primary ways of scaling. I think this is just architecture in general, completely agnostic of databases, but you can either scale out or you can scale up.

Most databases support the scaling up method, which means if you're running your database on a server, you just toss more resources at it: more hard drive space, more memory. Then it gets faster because the database has those additional resources to utilize.

But there's also the scale out method, or horizontal scaling, and that's where you create additional nodes, we'll call them. Those nodes, however many you have, all work in tandem to make the overall environment operate more efficiently and more smoothly.

At PlanetScale we really home in on the scale out aspect. Underneath everything we use a platform called Vitess, and Vitess's claim to fame is that it's an open source, horizontally scaling [00:06:00] MySQL platform. So that's really how we do it. Now, that's not to say that if you're on one of our enterprise tiers we can't fine-tune some of the underlying architecture to scale up and out at the same time.

But for most people that are gonna be using the platform, we're looking at more of a scale out implementation.

**Lane:** Got it. So scaling up, or scaling vertically, it sounds like, is when you just add more resources to the same machine. So you've got one physical server, and we're going to shove more sticks of RAM in it if it's running out of memory, or add more CPU cores if it's processing slowly, or add more disk space if we've filled up the database with data, right? That's scaling vertically. Why can't I just do that? That seems easy.

**Brian:** Because it gets expensive, and at some point you run out of the ability to bump your servers as the pocketbook starts diminishing as well. In theory you can toss as many resources at a physical machine as possible, until you can't. And [00:07:00] if you imagine a hockey stick kind of graph, the cost of the resources you're applying to a physical machine is gonna start growing exponentially relative to the actual return that you're gonna get on it.

**Lane:** So the performance benefits that I get from scaling vertically start to fall off even as I'm spending more money. I'm spending more money and getting less return on performance, is what I'm hearing.

**Brian:** Yep.

**Lane:** Whereas scaling horizontally is adding more machines. So rather than having one machine, now I have two machines.

**Brian:** Basically, yeah. So within a PlanetScale database, anytime you spin up a production-grade database, you automatically get at least one additional node for that database. That primarily acts as a method of keeping your database online, because nobody's perfect. Things happen, especially when it comes to working with computers.

I'm sure anybody who's worked on a computer can tell you that stuff happens. So the additional nodes that we spin up for you give you [00:08:00] that high availability, but also give you an additional replica to read from, which in turn increases the performance of your application. Instead of reading and writing from one node, you have multiple nodes available to split and even out that load.

**Lane:** One benefit is that if I'm all on one machine and I just scale it up vertically, I have, say, a $50,000 computer, right? But if something goes wrong, my whole system is now unavailable and borked. Whereas if I have three nodes in a cluster and one node goes down, my users can still get at the data they need through the other two nodes.

So reliability is better when you scale horizontally. The other thing is the cost. Am I correct in estimating that by adding new nodes to the cluster, the performance gains scale roughly linearly? Like, I add a second node and get roughly twice as much read performance, add a third node and get three times as much read performance? Again, I know [00:09:00] this is super hand-wavy, but does it tend to work that way?

**Brian:** For the most part, yeah. Obviously it depends on your code and how things are configured, but like you said, roughly speaking, yes. It scales more linearly.

**Lane:** Cool. I remember reading a paper when I was in school. I think Google had published it. I'd love to cite it, but this was many years ago, so I can't remember. The paper went something like this: in the early days of Google, the way tech companies scaled up their systems was to buy really expensive, state-of-the-art supercomputers, right? The other big tech companies at the time were spending gobs of money on large, vertically scaled systems. And Google just started buying commodity PCs that had been discarded, connecting them all together, and building distributed systems software, right?

So they could take advantage of all of these commodity machines and put them to work indexing the internet, and it ended [00:10:00] up being way more cost effective. Again, because they were utilizing all this hardware that in a lot of cases people didn't even want anymore, right? Nobody wants a PC that's seven years out of date. But if you connect them all together, all of a sudden it starts to get really powerful.

**Brian:** Yeah, for sure. When you start doing stuff like that, you run into more complexities, because now you have to figure out how to make all those machines talk the right way. And I think that's why a lot of companies traditionally will scale up. Well, I think the tides are turning at this point. But back in my day, people used to just throw money at hardware to scale up because it's easy to think about, right? More power, more faster. But trying to network all these machines together and make them work cohesively can be a challenge.

And just bringing it back to PlanetScale, that's one of the cool things that we do: all of this complexity is thought through and handled under the hood for you. It's not something you really need to think about.

**Lane:** Right. I mean, that's something that, as a small tech [00:11:00] startup, you have access to these days. The reason Google was able to do this back in the day was that they had amazing engineering resources, right? They had some of the best engineers in the world, who could write the really complex software that handled all of those network connections and distributed load properly.

Those are not easy or trivial algorithms to write. So generally speaking, that's why small software startups have always opted to scale vertically until they can't anymore, because it's simple, right? You can just deploy your code as a monolith and add more hardware, and your problems are solved. But these days we do have tools. I mean, PlanetScale is one, right? For databases. Kubernetes is one that I use more on the application server side of things. There are definitely ways to scale your tech horizontally now without having to write all of the distributed systems algorithms from scratch every single time.

Let's talk a little bit more about Vitess. You mentioned it very quickly, offhandedly. What is Vitess? Is it part of MySQL? Is it outside of it? [00:12:00]

**Brian:** It's outside of it. It is a layer on top of MySQL. First off, it's interesting that you're mentioning papers from Google and their approach, because Vitess was originally created to scale YouTube. Way back in the early 2010s, YouTube was running out of resources. They were pushing the absolute limits of MySQL, to the point where even the amount of data being stored in a replica was exceeding what MySQL was able to handle. So that's where Vitess was birthed. It was created to handle the scaling issues of YouTube back in that timeframe.

And essentially what Vitess is, and I'm not a Vitess expert, so I may butcher some of these terms. Vitess.io is where you can go to get all this information if you really want to dive deep into it. But essentially it is a gateway or load balancer that sits in front of multiple MySQL nodes: multiple MySQL instances, the actual [00:13:00] daemon itself, running on virtual machines or containers of some sort. Each of those MySQL instances has a sidecar process that communicates with that central load balancer in order to distribute the load evenly across all of these different nodes.

So it's an abstraction layer on top of MySQL.

**Lane:** I wanna dive in a little bit more into how it scales out with Vitess. So let me give some examples. MySQL and Postgres are the poster children in my mind of open source relational databases, right? And specifically talking about MySQL, Vitess, as you mentioned, is this orchestration layer that allows us to add multiple nodes to a cluster. That's great. I always think about MySQL and Postgres as where you would start for most web apps, like CRUD apps, right? Create, read, update, delete. If you have a very traditional setup, a users table and [00:14:00] a posts table, right? If your application fits this fairly standard model, then these relational databases are a great place to start. Where looking outside of the traditional relational databases starts to make sense is when you have different data. An example might be website analytics. You're tracking every click on a page. You're just trying to dump as much data as you can someplace on disk.

With website analytics, you might have an insane amount of writes, right? Every time someone does something on your website, you're writing all of these event logs to some data store, and you would need some very special database to be able to efficiently do all those writes and then run some big aggregation query on the data. What I'm interested in is, as we scale up MySQL using Vitess, are there certain use cases that you want to stay within when you decide to use MySQL and Vitess and PlanetScale, and certain use cases where you'd want to look to a [00:15:00] Redis or an Elasticsearch or some of these other, more domain-specific databases?

**Brian:** Ah, that is a very, very good question. Let's tackle that last bit you mentioned, about Redis, or I guess in-memory key-value stores. If you have data that is relatively consistent and is predictably read back, I think that's where these caching tools make sense.

Like Redis, or even our own Boost. We have our own internal data caching mechanism that's still in beta, but it more or less front-loads that data into an in-memory store and makes accessing it significantly quicker for you. If you had asked me this question, say, two years ago, I would've said that typically, if you have unstructured data with very common read patterns and you know how you're gonna access it, then some of the NoSQL databases would make sense. Whereas if you have highly structured data and you know exactly how you're gonna query it, then your typical relational [00:16:00] database system makes sense.

I think in the last couple of years those lines have begun to blur a little bit, because tools like MySQL now support JSON data structures in columns. So you can very easily dump unstructured data into a MySQL database. Where you might run into issues in a traditional configuration is the number of rows, or the amount of data, being stored in an individual table.

One of the cool things about Vitess is that, because we support horizontal sharding, you can actually break up the data. Say you have one large table that you're storing all of your analytics data in, right? It's just growing exponentially, and it's eventually gonna hit a hard limit where the MySQL engine simply cannot handle reading back that data.

With horizontal sharding implemented, Vitess can create a logical table that spans multiple MySQL instances, and it will know, based on its knowledge of the topology and configuration of everything, which tables to access in order to grab the specific data that you're asking it to pull [00:17:00] back.
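To make the "logical table spanning multiple instances" idea concrete, here is a minimal Python sketch of hash-based sharding. It is not how Vitess itself is implemented or configured; the shard names, the `shard_for` helper, and the `ShardedTable` class are all hypothetical, and plain dictionaries stand in for MySQL instances. The point is only that a router can compute, from the key alone, which node holds a given row.

```python
import hashlib
from typing import Dict, List, Optional

# Hypothetical shard names; in a real cluster these would be separate MySQL instances.
SHARDS = ["shard-0", "shard-1", "shard-2", "shard-3"]


def shard_for(key: str, shards: List[str] = SHARDS) -> str:
    """Hash the sharding key so the same key always maps to the same shard."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


class ShardedTable:
    """One logical table whose rows are spread across in-memory stand-ins for nodes."""

    def __init__(self, shards: List[str] = SHARDS) -> None:
        self.shards = shards
        self.storage: Dict[str, Dict[str, dict]] = {name: {} for name in shards}

    def insert(self, key: str, row: dict) -> None:
        self.storage[shard_for(key, self.shards)][key] = row

    def get(self, key: str) -> Optional[dict]:
        # Only the single shard that owns the key is consulted.
        return self.storage[shard_for(key, self.shards)].get(key)


table = ShardedTable()
for user_id in range(1000):
    table.insert(f"user-{user_id}", {"clicks": user_id})

print({name: len(rows) for name, rows in table.storage.items()})
print(table.get("user-42"))
```

The application only ever talks to `table`; which shard a row lives on is a detail of the routing layer, which is the role the conversation assigns to Vitess.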

**Lane:** Oh, okay.

**Brian:** So when you start adding different layers on top of the traditional implementations of things, these lines start to blur a little bit. I guess that's probably my answer now.

**Lane:** Yeah.

**Brian:** MySQL for everything.

**Lane:** That's pretty cool. So let me read that back, 'cause we may have used a term that might lose some people. We talked about sharding the data. So let's use a really concrete example.

Let's say you're a really bad person and you log every keystroke that someone makes while they're visiting your site, right? So as someone is typing on your website, you are logging a new record in the database for every single key press on their keyboard. You can imagine you have a thousand concurrent users, all typing. You've immediately got millions of rows of data that you need to write to your database. And if I'm hearing you properly, it sounds like an individual MySQL node has some limit. I don't know what that limit is. A billion records, maybe, for an individual node? 10 billion? [00:18:00] Something like that.

**Brian:** Off the top of my head, I couldn't answer that.

**Lane:** Okay, that's fine. Let's pretend it's a...

**Brian:** It's some crazy, crazy big number to us, but I'm sure a database that's getting that much data could hit it pretty quickly, depending on how fast the data's being pumped into it.

**Lane:** Okay, but there's some finite limit on any individual node, and what you're telling me is that Vitess, acting as a load balancer that you're actually sending the data to first, will split that data up. Let's say it round-robins it between five nodes in the background. So you've got five actual MySQL instances, each with their own tables. And it's saying, okay, you get one, and now you get one, and now you get one. They're each storing a row at a time, so that as you're writing data they all fill up consistently, in sync.
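The round-robin scheme in this example can be sketched in a few lines of Python. This is only an illustration of the hypothetical "you get one, now you get one" distribution, not a claim about how Vitess distributes rows; the five lists stand in for five MySQL instances, and `write_row` is a made-up helper.

```python
from itertools import cycle

NODE_COUNT = 5  # five hypothetical MySQL instances, as in the example

nodes = [[] for _ in range(NODE_COUNT)]
next_node = cycle(range(NODE_COUNT))


def write_row(row):
    """Append the row to whichever node is next in the rotation."""
    index = next(next_node)
    nodes[index].append(row)
    return index


# Log one row per keystroke, as in the keystroke-logging example.
for keystroke in "hello world!":
    write_row({"key": keystroke})

print([len(node) for node in nodes])  # -> [3, 3, 2, 2, 2]
```

Round-robin keeps the nodes evenly filled, but nothing about a row tells you which node it landed on, so a lookup has to ask every node. Key-based schemes trade a little evenness for the ability to route a read to exactly one node.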


**Brian:** I think we're getting a little bit into the nitty gritty of Vitess, where I'm not even confident in my answers. I believe that configuration is possible, but I know there are several sharding configurations [00:19:00] available in Vitess that, depending on your use case, you might be able to take advantage of.

**Lane:** Sure. And sorry, I didn't mean to say that's exactly how it works, but that's the general idea: you're splitting the data up between nodes in the background, whether the algorithm is literally one row here, one row there, or maybe something different. Yeah, for sure.

There's tons of different sharding algorithms out there. But okay, just so we understand from a high level, Vitess is allowing us to split that data up across multiple nodes. And the reason I think this matters is that if you go Google "MySQL scaling problems", like maybe you're trying to decide on a database and you Google a little bit about MySQL, you might read that it has all these limits, and so you don't want to use it. But it's useful to know that a tool like PlanetScale uses Vitess under the hood. You have to understand that there are additional capabilities added when you stack technologies on top of each other.

**Brian:** Yeah, and I also think it's worth mentioning that if you are in PlanetScale and you're getting to the level where [00:20:00] you need to worry about sharding and application performance and how your data's being split across multiple nodes, we literally have a whole team dedicated to helping people with that. That is not something we would expect anybody who's just logging into the hobby tier of PlanetScale to pull off and be able to set up.

Unless you already have a database where you have those concerns, that's a little bit of a down-the-road consideration.

**Lane:** Yeah, I completely agree. The thing that's interesting is that the listeners to this podcast won't need to know how to do this stuff in order to get their own hobby database up and running and connect to it. But they are interested in getting backend development jobs at large companies, and being able to at least conceptually understand the sorts of limits that larger projects start to run into can be super useful, I think.

Let's talk about MySQL versus Postgres. Most of my career has been spent using Postgres. I think I [00:21:00] used MySQL at one of my very first jobs, and then I quickly started using Postgres because my next job used it, and I have just used Postgres ever since. And to be honest, for a long time I didn't even really understand the differences, because they are so similar in many ways. The differences tend to be quite subtle. I understand that the primary reason you guys use MySQL is Vitess, which is built specifically for MySQL. But what are some of the issues that a student who's maybe done a bunch of projects in Postgres might run into when migrating to MySQL for the first time?

**Brian:** That is an excellent question that I don't think I have a great answer to, because prior to PlanetScale my professional path was actually in the Microsoft stack. So the vast majority of my database knowledge is in SQL Server. That said, I do know that Postgres has a number of additional data types on top of what you would typically consider, like a varchar or an integer, or many of the [00:22:00] standard data types that are common across all databases.

And I imagine they're stored somewhat differently under the hood, depending on what data you're throwing at it. However, just speaking from my personal experience going from SQL Server over to MySQL, since this is really my first job where I'm professionally working with MySQL on a regular basis, the amount of knowledge that translated from SQL Server to MySQL was something like, I'm gonna ballpark, 95%.

One of the main differences that has tripped me up, and still does to this day after working with SQL Server for so long, is the difference between LIMIT and TOP for paginating data. But beyond that, I would say anyone who's interested in migrating from Postgres to MySQL should just check the data types they're using inside their database.

If there's overlap, great. If there's something different, there are plenty of strategies to convert that data into a data type that MySQL can handle. And it might not even be necessary to store it in a specialized data type. So there are definitely avenues [00:23:00] to do that.

It's just gonna be unique for everybody, I suppose.
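As a small illustration of the LIMIT versus TOP difference mentioned here, the sketch below pages through a table using `LIMIT ... OFFSET`, the form MySQL uses. It runs against an in-memory SQLite database purely as a stand-in, since SQLite accepts the same clause; the `posts` table is invented for the example, and the SQL Server equivalents appear only as comments and are not executed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO posts (id, title) VALUES (?, ?)",
    [(i, f"Post {i}") for i in range(1, 11)],
)

# MySQL-style pagination: skip (page - 1) * page_size rows, then take page_size rows.
page_size, page = 3, 2
rows = conn.execute(
    "SELECT id, title FROM posts ORDER BY id LIMIT ? OFFSET ?",
    (page_size, (page - 1) * page_size),
).fetchall()
print(rows)  # -> [(4, 'Post 4'), (5, 'Post 5'), (6, 'Post 6')]

# SQL Server, shown for comparison only:
#   First page:   SELECT TOP 3 id, title FROM posts ORDER BY id;
#   Later pages:  SELECT id, title FROM posts ORDER BY id
#                 OFFSET 3 ROWS FETCH NEXT 3 ROWS ONLY;
```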

**Lane:** Yeah. One analogy I like to think about, for any of the listeners familiar with JavaScript: JavaScript is technically one language, but depending on where you run it, you get access to different things, right? If you run your JavaScript in the browser, you'll have access to certain DOM APIs.

If you run it in Node, you'll have other APIs. You run it in Deno, whatever. It changes depending on where you run it. And I think SQL is basically the same way, right? SQL is a language and, by and large, if a database supports SQL, almost everything's going to work. But there are certain APIs that different databases support.

So for example, the only thing that I really had issues with when migrating from Postgres to MySQL as I was writing this course was that Postgres has a native UUID type, right? It stores the binary format of a UUID under the hood. That's a universally unique identifier, for anyone who's not familiar, [00:24:00] which I often use for primary keys and IDs within a database.

MySQL doesn't have that built in natively. So you store it as a BINARY(16) or something like that, right? Kind of raw bytes, or as a string. So there's definitely a corollary. You can do both things in both databases. The syntax changes just a little bit depending on which database you're using.
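The two storage options for a UUID in MySQL can be sketched with Python's standard `uuid` module. This is an illustration of the conversion only, with no database involved; the variable names are made up for the example.

```python
import uuid

user_id = uuid.uuid4()

as_text = str(user_id)     # 36-character form, e.g. for a CHAR(36) column
as_binary = user_id.bytes  # 16 raw bytes, suitable for a BINARY(16) column

print(len(as_text), len(as_binary))  # -> 36 16

# Round-trip: rebuild the UUID from the bytes read back out of the database.
restored = uuid.UUID(bytes=as_binary)
print(restored == user_id)  # -> True
```

The binary form is less than half the size of the text form, which matters for primary keys because they are repeated in indexes. MySQL 8.0 also provides `UUID_TO_BIN()` and `BIN_TO_UUID()` for doing the same conversion in SQL.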

**Brian:** Yeah, very true. I don't think it's enough that it's like you're going from language to language. Certainly, going from Postgres to MySQL, I'm sure 95% of even that knowledge is gonna transfer over just the same, which is great: we have one common language that spans most of our relational database platforms.


**Lane:** And to be clear, when I was doing this migration, I was writing raw SQL, and I still only had to change the types of a couple of fields to get it to work.

If you're using an ORM that maps your programming language's code, so Go or Python or whatever, into SQL for you, it's very likely you won't have to change anything, [00:25:00] because under the hood the ORM will make those translations.

Now, if you're actually migrating a production database, you might have to manually make some changes on the database side, but it's unlikely you'll have to change your code, I guess is the way I would phrase that.

One more question I wanna ask regarding the scale of MySQL and PlanetScale: how do you think about writes versus reads? Do they both scale up equally well as you add nodes to a PlanetScale cluster? Or is the horizontal scaling mechanism within Vitess optimized for many reads over many writes, for example?

**Brian:** Again, I would certainly double check the docs, but it's based on configuration. The different nodes, which are called tablets in Vitess lingo, can be set up with different attributes that flag a specific tablet as read-only, as writable, or even as solely dedicated to backup.

[00:26:00] It's pretty cool what they've put together. That said, I think the vast majority of applications are pretty heavy on reads and a little bit less so on writes. Which is why, when you spin up a database in PlanetScale, we'll set up these replicas that are flagged as read-only. This way you can use them for your reads.

And then you can hit the main node for writes if you need to. This is definitely not necessary for everyone, but you have that capability to bounce back and forth between the two. We also offer read-only regions. If you want to bring your data closer to your users, that's another feature, where you get a completely separate cluster of your database in a different geographical region, basically anywhere you want around the world.

We support a number of regions in AWS and GCP at this time.
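The split between a writable primary and read-only replicas can be sketched as a tiny router in Python. This is a toy model under stated assumptions, not PlanetScale's or Vitess's API: `Node` and `Router` are hypothetical classes, dictionaries stand in for MySQL instances, and replication is done synchronously inside `write` to keep the example short.

```python
from itertools import cycle


class Node:
    """In-memory stand-in for one MySQL instance."""

    def __init__(self, name):
        self.name = name
        self.rows = {}


class Router:
    """Sends writes to the primary and spreads reads across read-only replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas
        self._next_replica = cycle(replicas)

    def write(self, key, value):
        self.primary.rows[key] = value
        # Simplification: copy to every replica immediately, so reads are never stale.
        for replica in self.replicas:
            replica.rows[key] = value
        return self.primary.name

    def read(self, key):
        replica = next(self._next_replica)
        return replica.name, replica.rows.get(key)


router = Router(Node("primary"), [Node("replica-1"), Node("replica-2")])
router.write("user:1", "Lane")
print(router.read("user:1"))  # -> ('replica-1', 'Lane')
print(router.read("user:1"))  # -> ('replica-2', 'Lane')
```

Adding a replica to the list increases read capacity without touching the write path, which is why this layout suits read-heavy applications.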

**Lane:** Cool. Now, I want to put in all the caveats that people should really go check the docs on what I'm about to say. But I do like discussing this stuff from a high level, so we'll just make that [00:27:00] disclaimer. For anyone listening, there are databases I'm familiar with where the idea behind the distributed architecture is that there is not necessarily a master node. You can read from or write to any node, and then the database becomes, quote unquote, eventually consistent, right? So you give up some consistency in your database, in the sense that if you write to one node and then read from another at basically the same time, they might not be perfectly in sync at that moment.

So you give up some of that consistency, but it means you can scale up better in both directions, in the sense that you can read and write to any node as you add nodes to the cluster. My understanding is that Vitess might not take that approach.

It takes a more consistent approach, in the sense that you won't have this consistency issue, but you have to run all of your writes through one node, maybe, and then read from other nodes. Am I accurate in guessing that? [00:28:00]
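The "write to one node, read from another at basically the same time" situation can be sketched in Python as a node that applies changes from a log after a delay. This is a generic toy model of asynchronous replication, not a description of any particular database; the `Primary` and `Replica` classes and the change log are assumptions made for the example.

```python
class Primary:
    """Accepts writes and records them in an ordered change log."""

    def __init__(self):
        self.rows = {}
        self.log = []

    def write(self, key, value):
        self.rows[key] = value
        self.log.append((key, value))


class Replica:
    """Applies the primary's log later, so it can briefly lag behind."""

    def __init__(self):
        self.rows = {}
        self.applied = 0  # number of log entries already applied

    def catch_up(self, primary):
        for key, value in primary.log[self.applied:]:
            self.rows[key] = value
        self.applied = len(primary.log)


primary, replica = Primary(), Replica()
primary.write("tweet:1", "hello")

stale = replica.rows.get("tweet:1")  # the replica has not applied the change yet
replica.catch_up(primary)
fresh = replica.rows.get("tweet:1")  # once caught up, the read sees the write

print(stale, fresh)  # -> None hello
```

The window between the write and `catch_up` is the moment where the two nodes "might not be perfectly in sync".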

**Brian:** Yep, that's pretty accurate. This was before my time at PlanetScale, so I'm just repeating some things that I've heard. But when it was explored whether we wanted to create these configurations where there would be multiple read-and-write nodes, the added complexity didn't really match the benefits.

Most use cases would realistically only need one node to write to, and then the read-only nodes would essentially act as failover. So if something did happen to that write node, another one would come up and pick up the slack.
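The failover behavior described here, where a read-only node takes over when the write node is lost, can be sketched as follows. This is a deliberately simplified Python model: the `Cluster` class is hypothetical, nodes are just names, and it promotes the first replica in the list, whereas a real system would need to choose a replica that is sufficiently up to date.

```python
class Cluster:
    """One write node plus read-only replicas that can be promoted."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = list(replicas)

    def fail_primary(self):
        """Simulate losing the primary by promoting the first available replica."""
        if not self.replicas:
            raise RuntimeError("no replica available to promote")
        failed = self.primary
        self.primary = self.replicas.pop(0)
        return failed


cluster = Cluster("node-a", ["node-b", "node-c"])
cluster.fail_primary()
print(cluster.primary, cluster.replicas)  # -> node-b ['node-c']
```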

**Lane:** Yeah. So one piece of advice I would give to anyone listening to this podcast, if you're new to backend development: I think there's a tendency, especially among new backend developers, to use very generic terms, like "Oh, Mongo scales really well and Postgres doesn't scale very well." Those two statements are not true as stated. You can't think about it in those vague, general terms. You need to think about your application and how it accesses data. Like you mentioned, most applications are read heavy. If you think of [00:29:00] any website, it becomes very clear why that is, right? When you load the page, you're probably reading a bunch of different rows from the database. Anytime you navigate between pages, you're doing reads, right? Anytime you open a dropdown, you might be doing another read from the database. Whereas really the only time you write something is when you create a new thing on the website. So imagine Twitter. If you log onto Twitter and doomscroll for an hour, you've probably done thousands of reads from the database, and if you don't tweet anything, you haven't even done a single write, right? Thinking about that kind of stuff whenever you're confronted with the problem of scale is helpful, I think. And it makes sense why PlanetScale, with the use case you guys have, I'm guessing primarily web apps and websites, would optimize really heavily for reads.

**Brian Mo:** Yeah, and my advice to some of your listeners who are beginner devs: I would avoid, what's the word I'm looking for, over-optimizing too early on, right? If you're starting a new project, this is probably not something you [00:30:00] need to be concerned with.

And even if you get a job at a big enterprise company, having the general knowledge that this functionality and this technology exists is gonna be enough to get you in the door, and there's probably gonna be a senior engineer there who's gonna be able to show you the ropes.

I'm a fully self-taught developer. I don't have a professional degree or anything, and all of my experience just comes from mentors in the field who have walked me through the way that things work. And I've just kind of accumulated knowledge over time of a lot of this stuff too.

Just something else to keep in the back of your head as you move into the tech or dev space.

**Lane:** I think that's really good advice. In my experience, as a senior developer, you might get some of these more hairy scalability-type questions in interviews. But as a junior developer, I think it's much more likely that you'll just get questions like, "Have you used X technology before?" The questions will be simpler, but I do think as a junior dev it can be really helpful to, at least from a high level, understand what all these words are, so that when you're sitting [00:31:00] in an interview and a term gets tossed out, like "Oh, we horizontally scale up our database cluster," you don't have an immediate look of fear on your face. You get what the heck just came outta their mouth.

**Brian:** Yeah, don't get the deer in the headlights.

**Lane:** Yeah. Yeah. You mentioned edge or geographic data distribution, and I just glossed over it until now. I think what you said was that you can basically have your database cluster somewhere geographically, let's just pick a location, say New York City, and you can have a read replica elsewhere in the world.

Why would you do that?

**Brian:** The reason you would do that is if you have plenty of users in a very specific part of the world. So, taking your example, I'm gonna go with Virginia because I know that Virginia is like the main hub for AWS. Let's say you have

**Lane:** of this is, yeah.

**Brian:** let's say you have your main database cluster in [00:32:00] us-east-1, which is on AWS and in Virginia.

And all of a sudden you notice that in Europe, you're getting a huge spike of users. In that scenario, everyone who's trying to access your application or website is actually going across the globe to hit wherever your application and database are hosted, in Virginia. Now, in order to optimize this, what you'd ideally want to do is set up your architecture, your application, your database and whatnot, closer to wherever your users are.

I think there's a data center in Ireland or somewhere over there. AWS has so many data centers, I don't even know where all of them are right now, but let's just assume Ireland. You can put your application in Ireland and host it there, and then put the read-only version of your PlanetScale database in that same area.

So now in that scenario, your writes are still gonna take a little bit longer than they do for the people who are accessing it from the States. But at least if your application is configured like most are, where the vast majority [00:33:00] of queries are gonna be reads, that's gonna take care of 90% of the traffic from people who are trying to access your application from Europe, as opposed to having to come across the entire world.

They can access the data more locally, which in turn makes your application a little snappier.
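A rough back-of-the-envelope model shows why a regional read replica helps even though writes still cross the ocean. Everything here is an assumption for illustration: the region names, the round-trip times in `RTT_MS`, and the function names are hypothetical, and the 90% read share simply echoes the figure mentioned above.

```python
# Hypothetical round-trip times in milliseconds between app region and database region.
RTT_MS = {
    ("eu-west", "eu-west"): 5,
    ("eu-west", "us-east"): 80,
    ("us-east", "us-east"): 5,
    ("us-east", "eu-west"): 80,
}

PRIMARY_REGION = "us-east"                 # the only region that accepts writes
REPLICA_REGIONS = ["us-east", "eu-west"]   # regions that can serve reads


def query_latency(app_region, is_write):
    """Writes go to the primary's region; reads go to the nearest copy."""
    if is_write:
        return RTT_MS[(app_region, PRIMARY_REGION)]
    return min(RTT_MS[(app_region, region)] for region in REPLICA_REGIONS)


def average_latency(app_region, read_fraction):
    """Average database round trip for a workload with the given share of reads."""
    read_ms = query_latency(app_region, is_write=False)
    write_ms = query_latency(app_region, is_write=True)
    return read_fraction * read_ms + (1 - read_fraction) * write_ms


# An app hosted in Europe with 90% reads: most queries stay local.
print(average_latency("eu-west", read_fraction=0.9))
```

With these made-up numbers, the European app averages roughly 12.5 ms per query instead of 80 ms, because only the 10% of queries that are writes still travel to the primary.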

**Lane:** That makes a lot of sense. So Boot.dev is a web application. I've done the Boot.dev experience through a VPN in India, and it's a lot slower than connecting close to the data center. I think I actually host my website in Salt Lake, and I'm just outside of Salt Lake, so it's always super snappy for me. But at some point, as we scale up the company and the project, we'll probably want to do some sort of geographic distribution. Right now, we have 40,000 total registered users. We probably don't have enough users in any given geographic area to warrant that complexity, but I think it definitely makes sense at some point for us to explore those sorts of things.

**Brian:** Yeah, what a lot of people realize, especially [00:34:00] as they get into dev, is that it's just this evolving beast where things come up and you just gotta figure out how to tackle 'em and then knock 'em back down. And I mean, if you ask me, that's the fun part of this field: these little things kinda pop up, and you have to go and figure out a good solution to engineer in order to address those users' issues.

**Lane:** Yeah. Yeah. For most applications that latency isn't going to be a deal breaker. I have users in India using Boot.dev, and they're happily using Boot.dev, and it's a little slower than it would be if they were using it in the States, right next to the data center. But it's not unbearably slow or anything like that.

It's probably like a second of latency rather than a hundred milliseconds, give or take. But I could definitely think of applications where, right outta the gate, you might need to think more about geographic distribution. In gaming, it could be really bad if you have a ping time of a second and a half.

So it just depends on what you're building.

Earlier you mentioned that PlanetScale, in addition to all these scaling things that it does for you [00:35:00] automatically, has some additional features that you've added. I think you mentioned CI/CD or branching. Could you speak to that a little bit?

**Brian:** Yeah, this is one of the coolest features of PlanetScale, and when I was going through the interview process, it made me super excited for the future of the company. Using the power of Vitess, we can create isolated copies of your database schema, which in the user interface we call branching.

What this allows us to do is, because these branches are isolated, you can create a branch, create a connection string to that branch, and then experiment with building features or testing things on that branch with zero impact on your production database. Which is really neat.

Now, we've developed the feature to be very similar to the way you would do code branching and code merges in Git and GitHub. So a branch in PlanetScale is akin to a branch in GitHub, but we also have this concept of deploy requests. So once you're finished making your changes to your [00:36:00] development branch, you can open up a deploy request, which actually lets other developers on your team comment on the changes and review the changes, just like you would with a pull request inside of GitHub.

And then once everything is checked off and ready to go, you can merge those changes in. And then we do some cool magic behind the scenes, which lets you merge those changes in without any downtime to your application. So you can merge them in immediately once the changes are done.

Or you can actually hold off on the merge. Say you kick off a deploy request and start merging at 5:00 PM, right when you're getting ready to leave the office for the day. You don't want things to all of a sudden go into production after hours.

So you can actually just say, "Hey, hold on, let's just wait until the morning. We'll come back and check it, make sure there's no issues and whatnot, and then cut over." And the cutover is near real time, which is pretty cool.

We also, on top of that, offer a back-out feature too, where you have a 30-minute window once those changes have been merged in to effectively just say, "Never mind, something went south." Making [00:37:00] changes to databases in general is hard, and we try to make those as simple as possible, but it's inevitable that things might go wrong.

So we have another feature which lets you quickly undo those changes, so you can get your application back up and running as quickly as possible.
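The deploy request lifecycle described above (open, review, deploy, and a 30-minute window to back out) can be pictured as a small state machine. This is a hedged illustration only: the `DeployRequest` class, its method names, and its states are hypothetical and are not PlanetScale's actual API; only the 30-minute revert window comes from the conversation.

```python
from datetime import datetime, timedelta

REVERT_WINDOW = timedelta(minutes=30)  # the back-out window mentioned above


class DeployRequest:
    """Toy state machine for a schema change (hypothetical, not a real API)."""

    def __init__(self, branch):
        self.branch = branch
        self.state = "open"
        self.deployed_at = None

    def approve(self):
        """A teammate has reviewed the schema change."""
        if self.state != "open":
            raise ValueError(f"cannot approve from state {self.state!r}")
        self.state = "approved"

    def deploy(self, now):
        """Merge the change into production and start the revert clock."""
        if self.state != "approved":
            raise ValueError(f"cannot deploy from state {self.state!r}")
        self.state = "deployed"
        self.deployed_at = now

    def can_revert(self, now):
        """True only while the deployed change is still inside the window."""
        return self.state == "deployed" and now - self.deployed_at <= REVERT_WINDOW

    def revert(self, now):
        """Back the change out, if the window is still open."""
        if not self.can_revert(now):
            raise ValueError("revert window has closed or nothing is deployed")
        self.state = "reverted"


request = DeployRequest("add-user-index")
request.approve()
start = datetime(2023, 1, 1, 9, 0)
request.deploy(now=start)
print(request.can_revert(now=start + timedelta(minutes=10)))  # True
print(request.can_revert(now=start + timedelta(minutes=45)))  # False
```

The point of the model is the ordering: nothing reaches production without a review step, and undoing a change is only possible for a limited time after it ships.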

**Lane:** Nice, a little control Z for your database.

**Brian Mo:** Yes.

**Lane:** Cool. Okay. I have a question about the branching, cuz it sounds really, really neat. A big problem that I had at a previous company was that we had different staging environments. Obviously we weren't just making changes and rolling straight to production.

We would roll out to staging environments. Then we had QA teams that would check the code, or the application, on the staging environments. And one of the big pain points was, I mean, schema, as you mentioned, was one thing, but another big pain point was the data itself. You need copies of the data in the database.

Does the branching address the data as well as the schema, or is it just schema changes?

**Brian Mo:** In our base tiers, it's just the schema changes. So you're kinda left to yourself to seed it with data if you want to, which can easily [00:38:00] be done using our CLI. You can connect to any branch using our CLI and get yourself a MySQL CLI connection to it, to run some of the same commands you would if you were using the MySQL CLI on a regular MySQL database.

Now, once you start getting into some of the higher tiers of our offerings, we offer a feature called data branching, which essentially takes your most recent backup of the source branch and just restores it directly to the database. Takes a little bit longer because obviously you're, you're piping data to it.

So it depends on how big your database is. But using the combination of branching and then the data branching portion of it, you can actually get a complete replica of your production database that's completely isolated from your production database in order to build on and test and bang away at, and do all the things that most developers do when they're extending a database.

**Lane:** That sounds exciting to me, cuz it sounds like basically an out-of-the-box solution for the kind of manual crap that we built at that company to get things working, right? We had all these scripts that would clone the database, spin up a new one. And it's just a [00:39:00] lot of work.

I haven't tried the feature yet. Sounds pretty cool. Very excited for that. Alright, where can people find more about PlanetScale? Let's start with PlanetScale, and then I also want to plug your stuff, cuz I know you do a lot of stuff on Twitter and YouTube, but where can people find PlanetScale?

What are the resources they should go look at first?

**Brian Mo:** Yeah. To get started, go to planetscale.com/docs. That's where a lot of my work lives. We document all of our features pretty thoroughly, so if you're looking to understand how certain features work, that's the first place to check. Our blog is also an excellent resource for finding out how to build certain applications on top of PlanetScale. Just a sneak preview of something that's coming out, well,

I guess by the time this podcast launches, it will be out: how to build a Laravel application on top of PlanetScale. We have a new blog post that's coming out that's gonna show you how to do that and build something that's not just a simple Hello World application. You can click around, you can do some things within the application.

And we have several [00:40:00] of those on top of our normal, highly in-depth technical blog articles. I would say those are some of the best resources. Our YouTube channel, go look at that too. You'll see my face sprinkled all over there, where I'm trying to show you how to use certain features of the platform as well.

Our hobby plan is free. You get a five gigabyte database. Our tiers are all usage-based now. If memory serves me, it's 1 billion rows read and 10 million rows written in a given month for the free tier, which is super generous in the grand landscape of database offerings.

Yeah. And then, hit us up, let us know. Submit a contact request, or @planetscale on Twitter is one of the best ways to get in touch with us. I'm one of the people that actually manage that Twitter communication too, so you might even end up chatting with me behind the scenes and not know it.

**Lane:** That's awesome. And where can people find you specifically, on Twitter or anywhere else that you hang out?

**Brian Mo:** I'm a little bit less on Twitter these days, to be entirely honest. If you wanna follow me, I'm @brianmmdev. [00:41:00] If you wanna follow any of my other work, I'm pretty much @brianmmdev everywhere, including YouTube. My website is brianmorrison.me. I blog on there. You can find all my past work and everything I've done in my career on there as well.

I like to write about my past projects, cause I've worked on some, what I think are, pretty interesting things along the way. Yeah, I think that kind of sums it up. Oh, and then one other personal plug: next month I will be at THAT Conference up in Wisconsin Dells. I'll actually be giving my first presentation.

And this actually lines up really well with your CI/CD course. I didn't even do this on purpose, but I'll be breaking down a full pipeline and mimicking something like what Netlify does, where it pushes your code to production. I'll be deconstructing that and showing you how you can build your own pipeline using a bunch of different kinds of tools, and give you some starting points and whatnot.

So come say hi.

**Lane:** Awesome. Yeah.

**Brian Mo:** meet people.

**Lane:** Congratulations. First presentation of the conference.

If any listeners are at the conference, obviously go watch the talk. I'm guessing the talk will probably also go up on YouTube. Conferences usually do that, [00:42:00] but,

**Brian Mo:** I don't know. It's not one of the main talks, so I'm not entirely sure. THAT Conference runs twice a year. In the summer it's in Wisconsin, in January it's in Texas, and they didn't record some of the smaller sessions in Texas. So I don't know, it'll eventually be out there, even if I gotta record it myself in front of my

**Lane:** Yeah, cool. Sounds great, man. And in case anyone's confused, it's actually called THAT Conference, like that's the name of the conference.

We're not like, just being facetious, like everyone should know the conference we're talking about,


**Brian Mo:** It's creative, but can be a little confusing. that.us is the website, though. The guy who runs it is really great. It's a fantastic conference. I had more fun than I've had at any previous conference I've ever been to.

**Lane:** That's cool. I need to go to more conferences maybe. Maybe that's gonna be one of the next ones I hit up.

Thanks so much for coming on the show, man. I'll talk to you later.

**Brian Mo:** Yeah, it was great chatting with you. Thanks for having me Lane. Bye.

