English System Design

Part I: VPS and host –> vertical scaling


Good, get more RAM, more processor



Part II:vertical scaling –> Load Balancer –> Distributed –> The catch of this method


Firstly analyze the catch of the vertical scaling,
Then talk about if we have aisles of servers, how to distribute it:

Finally give a plan: use black box — load balancer and return the load balancer's IP address
Draw back of the plan: it is a little hard to inquiry the working state of the backend server
Before continue talking, axle told the drawback of the previous method

So the teacher try to lead students to develop some other solutions, and he use the story from the lecture 0 (about DNS to lead) and one of the student comes up with do DNS with some tricks


Could you do some DNS tricks and return a different IP address based on what the user requested



Part III Round Robin –> round robin drawback –> state server –> using RAID to improve the robustness of the state server


First talk about Round Robin and the catch of the Round Robin (since it is rounded some machine will always be allocated with some really heavy work)

The previous drawback is proposed by the student, another drawback is if it is not queried (since the browser can cache it) then the round robin will not work

Cache definition: open session otherwise it is really useless to query the DNS server for the same thing over and over agin

The cache is determined by the TTL of the DNS server (the DNS server will determine when to change the return IP) so this mechanism will also leads to a pitfall that same machine get too much work

so axle's proposal is actually very good, since it asks the load balancer to handle all of them, and even we don't use the method to inquiry the backend server about how busy are you, we can also use some method like: randomness and round robin — perhaps still put too much work on the same server but the cache will not impair anything

Then give the example that if the php session break and our backend server is php based, how this load balancing model impact us

Major Reason: sessions recall, tend to be specific to the given machine.

Big problem occurs: larger than prompt password but you cannot check out all the staff in you shop cart

one solution: have a specific php server drawback: no redundancy and if you are popular, you will get lots of work for this specific server

Teacher analyze the catch: There is no round robin, because you have to send alice to that machine again and again

Continue the idea of factorization: we don't have the file server but we have the server to store all the state

(Some students proposed put them in the load balancer but it is not good, and the teacher proposed the questions that how to improve the robustness, how to increase the redundancy)

The RAID come in


You run to the store put in a new one and you haven’t lost any data. RAID6 is even better, what does RAID6 do, do you think? Axle?



Part III: First continue the talk of the RAID 6 –> then propose some general method to do with the redundancy to increase the stability –> the drawback is it still cannot handle with the downtime if your single server go down –> talk about the load balancer itself –> PHP acceleration –> static website drawback –> mysql query cache

Although Axle and the teacher's solutions is fairly good, but it still cannot handle the downtime where all the shared states go down

So we need replication

Then the teacher talk more about the load balancer it self (it is quite expensive)

Then after the break, the teacher come back to the sticky session issue. That is if you want to preserve your session even though there are lots of backend server, there is still a way to achieve it

the cookie will give you a solution

But the cookie still cannot provide a perfect solution, NOT THE EXPIRATION BUT THE IP CHANGES

So the teacher has proposed a method to deal with the expiration issue in this case. That is we don't store the ip address (the state might have changed) but we store a big random number like what php has done

Then the teacher talk about how to compile the php code to get some boost in efficiency

And it comes back to the cache of the dynamic website. The key is to use the CraigList as an Example

Craiglist is a dynamic website
the key of the dynamic website is you can submit your form and you will found the change in the website dynamically.
CraigList has used a way to speed thing up, that is it will not store at the data base and reinterpret it, but store the html file directly

Then the teacher begin to analyze that why we use the html file can speed things up

we have done in project 0: regenerate it on the fly CraigList: don't have to regenerate it

upside of the html is: the apache is really quick about processing the static content

Downside of this file based caching:

  1. Space
  2. redundancy — same body tag, same tail tag, same pre-process tag (a better way is to have some template)
  3. Another big cache: CANNOT USE THE PATTERN TO CHANGE THE TEMPLATE (color css) — regenerate all the pages

And the teacher summary that it is not a good solution. And very few people in the internet will do this (but actually the CraigList has done both)

And the teacher begin to talk about MySql's query cache.

Memory cache: a piece of software, save result to the RAM
If use is not existed, you store to the caching memory


You’re storing it in the caching memory.



Part IV From Cache –> Cache Size (cache garbage collection)–> Two Type Data Base –> data base backup

Firstly the lecturer talk about the cache size will make some content be evicted since the size of our RAM should be finite. Then the face book is a ready heavy website so it can benefit a lot from this mechanism (So that an opportunity for optimization).

Then the conclusion is the memory cache is a good mechanism for the database, so the teacher begin to talk about the data base. The kind of the data base can be divided into my ISAM and NODB.

The conclusion is NODB supports transaction. MyISAM uses locks which are full table locks. But the RAM will lose every thing if the power is off. So Archive Storage Engine.

archive engine doesn't cache but store, every time you will need a inquiry. But the foot notes will be on the other storage engine, so it is compressed by default. As to the replication we can use the NDB to achieve it. [anytime a query is executed on the master, that same query is copied down to one or more slaves and they do exact the same thing]

Then the teacher ask about he advantage of using this mechanism which will get some replication of the data base. The first one is you will not suffer from the down time. The second one is a load balance for the data base.

Then the teacher analyze how the load balance help the performance. You could just write your code in such a way that any select statements go to data base two, three or four and any inserts will finally go to the server one. so it is code wise.

But the previous method will have some drawbacks if we only have one writing(master) server, so the solution is we will get two servers, that is if one go down the other one can be used. The mysql also supports that.

Question is we still have the tracking issue, about how to connect the master mysql to the salve mysql.


For reads with a load balancer.



Part V From the route between the Mysql server and Load Balancer –> talk about why two load balancer needed –> then the teacher begin to build its own network on the board –> then big switches comes and we get the conclusion that every thing can fail.

Firstly the professor continue the questions about how to deal with the tracking issue, and it is a bottle neck to ruin every thing. So one solution is to use two load balancer.

And the teacher continue this complicated topic. At the first period phase of constructing face book, it uses a really silly method. And if you are in Harvard but want to send some message to MIT, you have to cross the bound and in the early on some features is restricted.

And the solution is to use the partition, for example you can put users who's last names start with a m or etc. It is not a bad idea, and it is common in data base because you can still have redundancy whole bunch of slaves in this case here. And you can balance the load on some high level information.

And through so many wired up thing, the teacher begin to build his own network including the firewall. And the server becomes two, how to connect the internet and implement the sticky session.

One solution is to use the shared state, but it is a little expensive, so our question is how to stay shard state without using the shared state saving server.

The Axle provide a solution that is to let us load balancer to listen to all the http session. (load balancer store the cookie like big random number)

But to let the load balancer to store the cookie is sometimes not a good idea, since the when the user uses other computer the cache will expire no profile is available. (since when the data base in the load balancer when you log in using different computer you data cannot be accessed) The solution to this issue is we need to partition our user and let the load balancer take the key feature (like the last name of the user) to some properly data base. This solution may solve some corner cases but will introduce new problem. Also the single point of failure.

And to avoid the data base fail, we can attach slave data base to it, but just use the slave data base, we will have the situation where have to promote the slave data base to the master database. So we will use two master data base.

But when we have two master data base, it is not a good idea to connect to each other. Since we will have some cross connect, and using such method we will end up letting our developer to understand the network topology. This is not a good layer of abstraction.

To solve the multiple database's connection issue, one of our solutions is insert a load balancer here. And let the load balancer to handle something intelligently, but the failure is between the mysql database, they use byte code to communicate so not easy to implement.

so not a solution, we use the switches. The big switch can handle connection, but the big switch can also fail because the switch algorithm. So we take care of this and we put the data center in a room, still a question that is: the building burns down.

So Amazon EC2 has the redundancy.

So the board has two data center, the question is to distribute your data center. The answer is you can do it at DNS level since different will not in the same building but even in different countries, so we need to take the geography into account. And for Google sometimes, you session will stay in the same building for a little while since sharing my session across entirely from different continent can be really expensive. (potent downtime, building out and your cache leads you to the same building.)

So the ISAC has give us a pretty good hint that is to avoid the failure is very hard, all the thing will becomes even more replicated.


So a word on security then.



Part VI Start Firstly what internet traffic can comes in if I am hosting a website.

First talk the port number, it includes 80 443 and 22 for ssh.

Then how about the load balancer, what traffic can I allow from the load balancer to my web server. To be more specific: it is really a mess to keep it encrypted because once it is inside the data center, nobody else is going to listen people inside the datacenter.

The solution is to offload the SSL at the load balancer. So everything is un-encrypted and you don't need to put your SSL certificate on all of your web servers.

Then the teacher propose a question about what kind of traffic between the web server and database. It is TCP 3306.

Then how to use the firewall to implement that, one of the silly method is we can plug them to different switch, and the switch can be the firewall by itself, so we can put some privilege for different switch only allows part of the port number. But that could be the problem and induce someone else to use it intentionally.

A better solution is if the only thing this server can do is talk via MySQL to this server and cannot instance, suddenly SHH to this server or poke around or execute any command on your network other than MySQL.

Finally our conclusion is: when you solve some problems, some new problem will occurs.


System Design CS-75

[ Silence ] Welcome back to Computer Science S-75. This is Lecture Nine, our very last [laughter]. It’s been a pleasure having everyone in the course this semester. So, tonight we talk about “Scalability.” So, we try to revisit some of the topics that we’ve looked at earlier in the semester and think about how we can deploy applications not just on say virtual machine, on your laptop, or desktop, as we’ve been doing with the appliance. But how you can scale two servers on the Internet and indeed multiple servers on the Internet so that you can handle hundreds, or thousands, or tens of thousands, or even more than that in theory. So, some of the issues that we’ll inevitably encounter is how to go about doing this. So, when it comes time to put something on the Internet, recall from Lecture Zero, that we talked about web hosts. So, this is by no means a list of recommendations per se, it’s just some representative ones that we happen to recommend. If only because the teaching fellows and I have had prior experiences with these particular vendors. But you can Google around and you can see that there’s many, many, many different options these days. However, among the takeaways hopefully from the summer thus far has been what kinds of features should you be looking for or expecting minimally in any web hosting company that you might choose. And in fact, not all of these even have those features necessarily. [ Inaudible Speaker ] Interesting, okay good. So, if your country or your work or really any network that you happen to be on or people that you know happen to be on and block access to certain IP ranges, among them for instance, GoDaddy in this case. YouTube is a popular thing to block, Facebook is a popular thing to block. That can be a sticking point, so doing a bit of due diligence or testing first could be a good thing. What else should you look for in a hosting company? Isaac?


Good, SFTP in contrast with what?


FTP and why?

Because FT — SFTP is secure.

Okay, good. So, the “S” literally stands for “Secure” and what that means is that all of your traffic is encrypted. And this maybe isn’t a big deal for the files that you’re transferring, because after all, if you’re uploading like gifs, and jpegs, and video files that are meant to be downloaded by people on the web, well, then who really cares of you encrypt those between your machine and the server. But it is important to have what data encrypted? Jack?

Usernames and passwords [inaudible].

Exactly [laughter], usernames and passwords, right? I mean that’s one of the biggest failings of something like FTP which granted is a fairly dated or older protocol, is that it sends also your user name and password in the clear. Which means anyone sniffing wirelessly around you, anyone sniffing the wire network between point A and B can see in the clear what your user name and password are. Yeah.

[Inaudible] posting companies that offer unlimited everything for a really low price.

Okay, good.

That would be due to a virtual hosting –


— that you wanted to implement a system that can grow by itself, maybe you don’t want to share the same computers, the same server with many others.

Good, so Dreamhost in particular, I think we pulled up their feature list and it was ridiculous how many features they give you, unlimited tri-bandwidth, unlimited storage space, unlimited BRAM or something like that and that just can’t possibly be real if you’re only paying like $9.95 a month. So there’s some catch and in general, the catch is that they’re making that offer to you and to a 100, to hundreds of other customers all of whom might be on that same machine, so you’re contending now for resources. And the reality is they’re probably banking on the fact that 90 something percent of their customers don’t need that many resources but for the one person or two persons that do, probably going the way of a shared host is not necessarily in your best interest, certainly not something to try to build a bigger business on. And so, there’s alternatives to thing like web posting companies, there’s VPSs, a virtual private server that you can essentially rent for yourself and what’s the fundamental distinction between a VPS and a shared web post? Axle?

The VPS is like borrowed [inaudible] Linux. It’s a virtual O machine running on cost but it’s completely its own system.

Okay, good. Well and so to be clear, the operating system is largely irrelevant in stream host and shared web posts could also be running Fedora or any operating system but what’s key is you get your own copy of Fedora or Ubuntu or whatever operating system they happen to be running. Because in the world of VPSs, what they do is they take generally a super-fast server with lots of RAM, lots of CPU, lots of disc space, and they chop it up into the illusion of multiple servers using something called the hypervisor or something like a product from VMware, or Citrix, or other companies, and even open source providers have platforms that allow you to do this, run multiple – a virtual machine on a physical machine and so, in this world, you’re still sharing resources but different — in a different way. You’re instead, getting some slice of hardware to yourself and no one else has user accounts on that particular virtual machine. Now, with that said, the system administrator, the owners of the VPS company, they themselves depending on what hypervisor they’re using, they might actually have access to your virtual machine and to your files and frankly, if you have physical access to the machine, you undoubtedly have access to your files because they can always reboot the virtual machine, for instance, in what’s called “single-user mode” or “diagnostic mode” and at that point, it’s not — they’re not even prompted for a root password. So, realize that even when you’re using a VPS, your data might be more secure, more private, from other customers but definitely not the web posting company itself. If you want even more privacy than that, you’re probably going to have to operate your own servers that only you or your colleagues have physical access to. So, here’s just a list of some of the popular VPS companies. There is one catch – in order to get these additional features or these properties in a VPS, you generally pay more. So, instead of $10 a month, $20 a month, you’re probably starting at $50 a month or something like that, maybe even in the hundreds depending on how many — how much you want in the way of resources. And toward the end of today, we’ll talk about one particular vendor of VPSs, namely “Amazon Web Services, Amazon EC2,” their elastic compute cloud that essentially lets you self-service and spawn as many virtual machines as you want so long as you’re willing to pay some number of cents per minute to have that virtual machine up and running. It’s actually a wonderful way to plan for unexpected growth because you can even automate the process of spawning more web servers, more database servers, if you happen to suddenly get popular, even over night or because you’ve been slash dotted or posted on reedit or the like, and then you can have those machines automatically power off when interest in your product or website has started to subside. All right, so how — suppose you are the fortunate sufferer of a good problem which is that your website all of a sudden is super popular and this website has maybe some static web content, HTML files, gifs, and the like, dynamic content like PHP code, maybe even some database stuff, maybe some database stuff like MySQL. Well, how do you go about scaling your site so that it can handle more users? Well, the most straightforward approach is generally what’s called “Vertical Scaling”, vertical in the sense that — well, if you’re running low on RAM or you’re kind of exhausting your available CPU cycles or you’re running low on disc space, what’s the easiest, most obvious solution to that problem? Axle?

Get a better processor and more RAM.

Good, get more RAM, more processor, more disc space and just throw resources or equivalently money at the problem. Unfortunately, there is a catch here. There’s sort of a ceiling on what you can do why? Why is vertical scaling not necessarily a full solution?

Well, you can only operate one machine so much after a while you can’t [inaudible].

Yeah, exactly. There’s some real world constraints here where you can only buy a machine that’s maybe three gigahertz these days, and maybe only has a few – a handful of – maybe a couple dozen CPUs or cores but at some point, you’re either going to exhaust your own financial resources or just the state of the art in technology because just the world hasn’t made a machine that has as many resources as you need. So, you need to get a little smarter, but at least within here, you have some discretion. So, in terms of CPUs these days, most servers have at least two CPUs, sometimes three or four or more and in turn, each of those CPUs typically has multiple cores. In fact, most of the laptops you guys have here these days are generally at least duo core, sometimes even quad core which means that you effectively have the equivalent of four CPUs or four brains inside of your computer even though they’re all inside of the same chip essentially. What does that mean concretely? It means if you have a quad core machine, you can — your computer can literally do four things at once. Whereas, in yesteryear, when you had single cores, single CPU machines, they could only do one thing at a time and even though we humans seem to think that you’re simultaneously printing something, you’re pulling up a Google map, you’re getting email and all this stuff seems to be happening simultaneously, the reality is the operating system is scheduling each of those programs to get just a split second of CPU time before giving another program, then another program a split second of CPU time, and we humans are just so slow relative to today’s processors, that we don’t even notice that things are actually happening serially as opposed to in parallel. But when you actually have quad cores, especially in a server, that means whereas in yesteryear with a single core machine, you could handle one web request at a time. Now, for instance, you can handle at least four at a time, truly in parallel and even then, a server will typically spawn what are called multiple processes or multiple threads. So, in reality, you can at least give the impression that you’re handling many more than even four requests per second. So in short, machines these days have gotten more and more and more CPUs as well as more cores and yet the funny thing here is that we humans also aren’t very good at figuring out what to do with all of this available hardware. Right? Most of you, even if you have a dual core machine, you don’t really need a dual core machine to check your mail or to like write an essay for school. You were able to do that five, 10 years ago with far fewer computational resources. Now, in fairness, there’s bloat in software and MAC OS and Windows and Office are just getting bigger and bigger, so we’re using those resources. But one of the really nice results of this trend toward more and more computational resources is that, the world has been able all the more easily to start chopping up bigger servers into smaller VPSs. And indeed, that’s how Amazon and other cloud [inaudible] so to speak are able to provide people with the self-service capability as we’ll discuss a bit later. So, within these things there are a few things you have discretion over. If you’ve ever built a computer, you might be familiar with parallel ATA, or IDE, or SATA, or SAS anyone? Axle, what do these refer to?

SATA has to do with hard drives.

Okay, good. So, SATA has to do with hard drives — in fact, all three of those have to do with hard drives. Years ago, parallel ATA or IDE hard drives were very much in vogue. You might still have them in older computers these days, desktop computers pretty much, but you wouldn’t buy a new parallel ATA drive these days. Instead you’d most likely get a SATA drive where they’re 3.5 inch for a desktop or two and a half inch for a laptop and if you have servers or you have lots of money and a fancy desktop computer, you can go with the SAS drive. SAS is Serial Attached SCSI and this really just boils down to faster. For instance, whereas parallel ATA and SATA drives typically ran at 7200 rpms per minute, revolutions per minute, SAS drives, anyone know what they typically spin at? And for those unfamiliar, inside of a mechanical hard drive, there is a one or more metal platters that literally spins, much like an old-school record player where the bits are now stored. What speed does a SAS drive spin? It’s more than 7200 rpm. Axle?

Is it 15,000?

Yeah, so 15,000 is where they would typically perform, sometimes 10,000 but 15,000. So, just twice as fast so that alone gives you a bit of a speed bump. Of course, it comes at a price literally more money but that’s one way of speeding things up. So, oftentimes what people will do is for a given website that they’re creating, if it has a database, databases tend to write to disc quite a bit. Right? Every Facebook update requires writing to disc and then reading back out some number of times. So, really where you might be touching disc a whole lot, people can throw things like SAS drives in their database so that their data can be read or written even more quickly and what’s even faster than mechanical drives these days? Axle?

Solid state drives?

Yeah, solid state drives, SSDs, which have no moving parts and as a result, electrically perform much better than mechanical drives. But those too, cost more money and they tend to be smaller in size so whereas you can buy like a four terabyte SATA drive these days, three and a half inch for your desktop, you can buy maximally a 768 gigabyte SSD these days for a lot more money typically. All right. So, let’s skip — let’s skip RAID for now. So, horizontal scaling — so this is in contrast with what we just discussed, throwing money and throwing more of everything at a problem. Horizontal scaling is sort of accepting the fact that there’s going to be a ceiling eventually, so why don’t we instead architect our system in such a way that we’re not going to hit that. Rather, we can kind of stay below it by even using not state of the art hardware and the most expensive stuff we can buy, but cheaper hardware. Servers that might be a few years old or at least are not the top of the line, so that they’ll be less expensive. So, rather than get few or one really good machine, why don’t we get a bunch of slower or at least cheaper machines instead, plural number of machines? So, this is just a picture of a data center which is just meant to conjure up the idea of scaling horizontally and actually using multiple servers to build out your topology but what does this actually mean? Well, if you have a whole bunch of servers now, instead of just one, how – what’s the relationship now with Lecture Zero where we talked about HTTP and DNS? All right. The world was very simple a few weeks ago when you had a server and it had an IP address and that IP address might have a domain name or host name associated with it and we told that story of what happens when you type in something dot com enter on your laptop and you get back the pages on that single server. But now, we have a problem if we have a whole aisle’s worth of web servers. Axle?

Well, you’re going to have to have a way to treat each request so it doesn’t end up at one machine.


So, the request can be distributed over all machines.

Okay, good. So now, if we get an inbound HTTP request, we somehow want to distribute that request over all of the various webservers that we might have whether it’s two webservers or 200. The problem really is still the same, if it’s more than one; we have to somehow figure that out. So, let me put up a fairly generic picture here – let me flip past those to this guy here. So, if we have a whole bunch of servers on the bottom here, server one, two, dot dot dot — and on the top, we have some number of clients, just random people on the Internet. We need to interpose now, some kind of black box that’s generally called a “load balancer” depicted here as a rectangle so that the traffic coming from people on the Internet is somehow distributed or balanced across our various “back-end” servers so to speak. So, it might still be the case that server one and two and so forth, have unique IP addresses but now, when a user types in something dot com and hits enter, what IP address should we return? How do we go about achieving the equivalent of this man in the middle who can somehow balance load across all end servers?

Well, let – return the IP address from the balancer? Send the request to the load balancer and let the load balancer handle which computer to send it to.

Okay, good. So, instead of DNS, returning the IP address of server one or server two or server dot dot dot, you could instead return the equivalent of the IP address of this black box, the load balancer, and then let the load balancer figure out how to actually route data to those back-end servers. So, that’s actually pretty clean. So now, if the load balancer has a public IP address, the back-end server’s now technically — you don’t even need public IP addresses. They can instead have private addresses and what was the key distinction between public and private IPs back in Lecture Zero? Anyone — anyone over here? Yeah? Louis?

The rest of the world can’t see private IPs.

Exactly, so the rest of the world by definition of private can’t see private IP addresses. Then — so that seems to have some nice privacy properties and that if no one else can see them, they can’t address them. So those servers, just by nature of this privacy can’t be contacted at least directly by random adversaries, bad guys, on the Internet. So, that’s a plus. Moreover, the world has been running out of version four IP addresses, the 32-bit IP address is become into scarcity for some time now so it’s just hard or sometimes expensive to get enough IP addresses to handle the various servers that you might buy. So this alleviates that pressure. Now, when we need one IP and not multiple for our servers because we can give these back-end servers like an odd number like 192.168 which most of you have in your home routers probably or 10 dot something or 172.16 something, all of those to mark the start of a private IP address. So, that works. All right, so the load balancer has its own IP address now. It gets a request from some client on the Internet. How, using jargon from Lecture Zero onward, can the load balancer decide or get that data to one of the back-end servers? How could we go about implementing that? Axle?

Well, probably first I’d want to figure out which server to send it to, so you’d want to check if somebody has available CPU cycles that they’re not using.


So — and once you see that there’s one server with enough CPU cycles to handle that request, [inaudible] inside your server network —


— to that machine to get back whatever it is that the client requested and then the load balancer sends it [inaudible].

Excellent. So, this request arrives then at the load balancer, the load balancer decides to whom he wants to send this packet, server one or two or dot dot dot, and you can make that decision based on any number of factors. So, Axle proposed doing it based on load like who is the busiest versus the least busy. Odds are I should send my request to the least busy server in the interest of optimizing performance all around. So, let’s assume that there’s some way as demarcated by those black arrows, like talking to those back-end servers and saying, “Hey, how busy are you? Let me know.” So, now the load balancer figures out it wants to send this particular request to server one, so it sends that request to server one using similar mechanisms, TCP IP much like the packet, how it traveled to the load balancer in the first place. The server then gets the packet, does its thing and says, “Oh, they want some HTML file, here it is.” The response goes to the load balancer, the load balancer then responds to the client, and voila. So, that works. What are some alternatives to load balancing based on the actual load on the server? So, load in general refers to how busy a server is. So, what’s an even simpler approach than that, because that — frankly, that sounds a little complex? We’ve not talked at all about how one device can query another for characteristics like, “How busy are you?” Even though it’s possible. Axle?

Well, first let me point out a downside with that, you [inaudible] every file — every file the website has on every server. So, if you would instead say have one server containing all the HTML and one running and containing all the PHP, you would then see — well, okay, bad example. One example of one server with all the images [inaudible] HTML —


— [inaudible] to request an image. It sends it to the image server —


— and if it’s an HTML request, to the HTML server.

Okay, good. So, the implication of the previous story that Axle told is that under this model, server one, two, and so forth, all need to be identical, have the same content which is nice and then it doesn’t matter to whom you send the request. The downside is now, you’re using N times as much disc space as you might otherwise need to, but that’s perhaps the price you pay for having this redundancy or to having this horizontal scalability. Or instead, you could have dedicated servers — these are for HTML, these are for gifs, these are for movie files and the like, and you could do that just by having different URLs, different host names. This is, for instance, “Images dot something dot com.” “This is videos dot something dot com.” And then the load balancer could take into account the host HTTP header to decide which direction it should go in. So, that could work for us. All right, so what’s an even simpler puristic than asking a back-end server, “How busy are you right now?” Like, if you have no idea how to do that? How instead, could we balance load across an arbitrary number of servers? Think again back to Lecture Zero. You can do all of this with only Lecture Zero under our belt. So, let’s quickly tell the story. I type in something dot com into a browser; I hit “Enter,” what happens? Jack?

We create a packet to send.

Okay, packet to send — to whom do we send it?

We send it to someplace that will determine the IP address of where we’re sending it to.

Okay, good — something that will determine the IP address of where we’re sending it to — what’s that thing called? Isaac, what’s that called?

A router?

Not router — the routers get involved but —

The DNS.

The DNS server, domain name system server, so that server in the world, whole bunches of them that whose purpose in life is to translate host names to IPs and vice versa. So, I’m going to pause the story there. That seems to be an opportunity now for us to return something. Yeah?

Could you do some DNS tricks and return a different IP address based on what the user requested?

Good, so this black box, this load balancer, maybe it’s just a fancy DNS setup. Whereby instead of returning the IP address of the load balancer itself, maybe instead the DNS server just returns the IP address of server one the first time someone asks for something dot com. And the next time someone requests something dot com, it returns the IP address of server two followed by server three followed by dot dot dot and the wrapping around eventually to server one again. So, this is actually generally called “Round Robin” and you can do this fairly easily. This is just a snippet of a popular DNS server called BIND, Berkeley Internet Name Demon, I believe is the D. And this just suggests that if you want to have multiple IP addresses for host name called “dub dub dub” you mention “A” which denotes A record and just refers to an end pointer here. But A is the same as in Lecture Zero and then you just enumerate the IP address one after the other and by default, this particular DNS server, very popular one, BIND, will return a different IP address for each request. So, that’s nice. It’s simple. Again, it uses only some knowledge from Lecture Zero even though granted, you have to know how to configure the DNS server, but you don’t need any fancy bi-directional communication with the back-end servers in this model. So, that’s nice. But there’s a price we pay for the simplicity. If we only do Round Robin, where again, we just spit out a different IP address each time and let me make this more concrete just so this isn’t quite as abstract. Let me open up a terminal program here and do NS Lookup for name server lookup of Google.com. This is exactly what Google does, at least in part. Their load balancing solution is more sophisticated than this list suggests, but indeed Google’s DNS server returns multiple IP addresses each time. So, if this is so simple to implement, what’s the catch? Axle?

It’s not a very smart solution because just — there could be — the case could be that one server gets all of the really tough and hard requests —


— that take a lot of processing power. The other ones just get the [inaudible]. There’s no way to know that.

Good, so just by bad luck, one of the servers could just get a really — a real power user, someone who is really doing something computationally difficult. Like, I don’t know, what’s a good — sending lots of mail or someone else is just kind of logging in and poking around at a much slower rate and we could come up with even more sophisticated examples than that. But over time, server one might just happen to get more heavy weight users than other servers. So, what’s the implication? Well, Round Robin is still going to keep sending more and more users to that server one-Nth of the time just by nature of Round Robin. So, that’s not so good. What else causes problems here or what else breaks potentially? So, back to the Lecture Zero story, I type something dot com, I hit “Enter”, my browser sends a request to the DNS — or my operating systems sends a request to the DNS server, it gets the IP address and in this model, it’s — the IP address of one of these servers, then I send my packet as Jack proposed to that particular server, get back a response, story ends. But then a few seconds later, I visit another link on something dot com and hit “Enter”, which part of the story now changes? Jack?

Is it now going to send to the same server [inaudible] or does it send it to a new server?

Good question.

Exactly where it’s coming from?

So, how does the story change and that’ll give us the answer? Axle?

Well, it has to send it to a new server otherwise Round Robin wouldn’t work.

Oh, ideally, yes. If you want a truly uniform distribution across all end servers, then the DNS server has to return another response and I’d argue, the DNS server will return a different response the next time it is queried. But —

Oh, it’s not queried.


Because it’s saved.

Saved or —

No, but it’s — the — the IP address is stored —


— [inaudible] open session otherwise it’s really useless to query the DNS server for the same thing over and over again.

Good. So, recall these caches back in Lecture Zero. We talked about the implications of the good parts of caching whereby as Axle’s proposing, there’s no reason for Chrome or IE to send the same DNS request every single time you click a link on something dot com. That would just be a waste of time. You’re going to lose some number of milliseconds every time that happens or worse yet, a second or two. So, instead your operating system typically caches these responses. Your browser typically caches these responses as well, and so you just don’t need to do those lookups. So, if you do happen to be that power user that’s doing a heck of a lot of work of whatever sort on server one, the next guy is going to be sent to server two, not you with your subsequent request. So, caching too can contribute to a disproportionate amount of load on certain servers largely due to bad luck. Indeed in DNS, we didn’t spend much time on this particular detail but there’s typically expiration times, TTLs, Time to Live values associated with an answer from a DNS server and that’s typically an hour, or five minutes, or a day. It totally depends on who controls the DNS server, what the value is. But that suggests too, that if you are this power user on server one, it might be a few minutes, or hours, or even days, until you get assigned to some other server simply because your TTL has expired by then. So, it’s nice and simple. We can do it with a simple configuration change but it doesn’t necessarily solve all of our problems. So, in fact, the approach Axle proposed first is actually pretty good whereby you don’t use DNS space to Round Robin. Rather, more sophisticated approach would be to let the load balancer to decide to whom to send you in the back-end and the load balancer can make that decision using any number of puristics. It could even use Round Robin or randomness because at that point, you don’t have to worry about caching issues because the DNS server has only returned one IP. So — but that still leads you to the risk that you’ll be sent — putting too much load on some server so we could take, for instance, server load into account at that point. But there is something else that breaks. If we fast forward mid-semester to when we started talking about cookies and HTTP and sessions and PHP. To spark discussion, I propose that sessions have just broken in PHP. If our back-end servers are PHP-based websites and they are using the session super global, load balancing seems to break this model. Why, Jack?

Because now, the different servers have different people sessions so although one might have my session and by then, I’m redirecting from server one to server two, server two might not have my session.


Therefore, everything’s lost.

So, sessions recall, tend to be specific to the given machine. We saw examples involving slash temp which is a temporary directory on the Linux system where sessions were typically saved as text files, serialized text files. So, that means though, that your session might be sitting on the hard drive of server one and yet if by random chance you are sent by a Round Robin to server two or server three instead of server one, in the worst case, you’re going to see the same website but you’re going to be told to log in again for instance, because that server doesn’t know that you’ve logged in. Okay, find — you kind of bite your tongue and you type in your user name and password again and hit “Enter” and suppose you’re a really good sport and you do this for all end servers. You have no idea why something dot com keeps prompting you to log in but eventually, you will have a session cookie on all of those servers. The catch then though is that if something dot com is an E-Commerce site and you’re adding things to your shopping cart, now you literally have put a book in your cart over here, a different book in this cart, a different book in this cart and when you check out, you can’t check out the aggregate. So, this is a very non-trivial problem now. Axle?

But this wouldn’t happen if you — if you had dedicated machines [inaudible] one machine running PHP and then one machine serving all the images.

Very true. So, if we have horizontally scaled in the sense that we’ve factored out disparate services, this is our PHP server, this is our gif server, this is our video server. Then indeed, this problem would not arise because presumably all the PHP traffic would get routed to the PHP server. But an obvious push-back to that solution is what? Isaac?

Well, if one of them crashes and you lose all images or —

Okay, good. There’s no redundancy which is not good for uptime if anything breaks. Axle?

And also, well at some point in time, if you get popular enough, that one PHP server’s not going to be able to handle everything —


— it need to [inaudible].

The end of story’s the same. As soon as you get popular, you get too much load for a single PHP server, then we have to solve this problem anyway. So, how do we go about solving this problem? This seems to be a real pain, this one and to be clear; the problem now is that inasmuch as sessions are typically implemented per server in the form of like a text file like we saw in slash temp, then you can’t really use Round Robin. You can’t really use load — true load balancing, taking into account each server’s load because you need to make sure that Alice, if she’s initially sent to server one, subsequently gets sent to server one again, and again, and again, for at least, you know, an hour or a day or some amount of time so that her session is useful. Jack?

Is there a way to create a server that’s especially dedicated to store everyone’s session and all the sessions are started there and —

Excellent. Yes, so absolutely. We could just continue this idea of factorization and factor out not the various types of files, but a service like session state. So, if we instead had a file server, you know, like a big external hard drive, so to speak, that is connected to all of the servers, one and two and three, so that anytime they store session data, they store it there instead of on their own hard drive. Then this way, we could share state. So indeed, that could be a solution here. Axle?

What — I don’t know if the load balancer has that function, but [inaudible] instead of having an extra server, that all the other servers can query the load balancer because all traffic goes through that anyhow. What if the load balancer stores the session?

Okay, so that’s not bad at all. So, we already have a man in the middle here. It’s a black box but there’s no reason it couldn’t be a server with hard disc space. So, why not put the sessions on the load balancer? That could absolutely work. So, let me difficult then and whether we put the sessions on the load balancer or whatever — that’s no longer a load balancer then, it’s obviously doing more. It’s more of a server that happens to be balancing load and storing sessions. Whether we put sessions there in that black box or elsewhere in a new box on the screen, we seem to have introduced a weakness now in our network topology because what if that machine breaks? It would seem to be the case that even though we have N servers which in theory, those guys are never all going to die at once, assuming that it’s not the power, electricity, or something stupid like that that’s somehow related to all of them. But odds are they’re not all just going to up and die simultaneously so we have really good redundancy in our server model right now. But as soon as we introduce just a database or file server for our sessions, if that guy dies, then what was the point of spending all this money on all these back-end servers? Our whole site goes down because we have no ability to remember that people are logged in if we can’t remember they’re logged in. No one can buy anything. So, how do we fix that problem? So, we’ve solved one problem but if you think of that sort of old visual where you have like a garden hose with lots of leaks in it and you plug one of them with one hand, all of a sudden a new leak springs up elsewhere. That’s kind of what’s happened here. We solved the problem of shared state but now, we’ve sacrificed some robustness, some redundancy. How do we now fix the latter? Axle?

It’s probably not the solution you’re looking for, that the hardware on your little data center could be built such that things like the hard drive [inaudible] missing RAID type.

Okay good so we could just use sort of different approach to storing our data and rather than just store it on the hard disk as usual, we could use something called RAID. So actually this is good way to tie in the thing we skipped a moment ago. Let me just pull up something to write on here. So Redundant Array of Independent Disks is a technology more succinctly known as RAID. RAID can actually be used in desktop computers these days even though it’s not all that common. Some companies like Dell and Apple make it relatively easy to use RAID on your system, and what does this mean? Well RAID can come in a few different forms. There’s something called RAID zero, there’s something called RAID1, there’s something RAID5, there’s something called RAID6, there’s something called RAID10 and there’s even more, but there are some of the simplest ones to talk about. So all of these technog — all of these variance of RAID assume that you have multiple hard drives in your computer for different purposes potentially. So in the world of RAID1 — RAID zero you typically have two hard drives that are of identical size. Terabyte, two terabytes, 512 gigabytes whatever it is, two identical hard drives and you do what’s called stripe data across them. Whereby every time the Operating System wants to save a file especially big files, it will first write to this drive a bit then to this one, then to this one, then to this one. The motivation being these hard drives are typically large and mechanical with spinning platters like we discussed earlier, and so it might take this guy a little while to write out some number of bits. Now it’s going to be split second in reality but that’s a split second we don’t really have when we’re trying to service lots of users. So striping allows me to write some data here then here, then here then here, then here then here effectively doubling the speed at which I can write files especially large ones to disks. So RAID zero is nice for performance. However, RAID1 gives you a very different property. With RAID1 you still have two hard drives but you mirror data so to speak across them, so that anytime you write out a file you store it both places simultaneously. There’s a bit of performance overhead to writing it in two places albeit in parallel, but the upside now is that either of these drives can die and your data is still perfectly intact, and it’s actually an amazing technology because if even you just have this in your desktop computer, you have two drives. One of them dies just because of bad luck, there’s a defect or its multiple years old and it just up and died, so long as the other one is still working. The theory behind RAID is that you can then run to the store, buy another hard drive that’s at the same size or bigger, plug it into your computer, boot back up, and most typically automatically, the RAID array will rebuild itself whereby all of the data that’s on the remaining drive will copy itself automatically over to the new one and after a few minutes or hours you’re back to a safer place. Whereby now even the other drive can up and die. Sometimes you have to run a command or choose a menu option to induce that but typically it’s automatic. You can do it even sometimes in some machines while the computer is still on so you don’t even have to suffer any downtime, so that’s great. RAID10 is essentially the combination of those two. You typically use four drives and you have both striping and redundancy so you sort of get the best of both worlds, but costs you twice as much cause you need twice as many hard disks. RAID5 and RAID6 are kind of nice middle grounds with RAID1 or nice variance of RAID1 whereby RAID1 is kind of pricy, right? Rather than buy two — one hard drive I literally have to spend twice as much and get two hard drives. RAID5 is a little more versatile whereby I typically have say three drives, four drives, five drives, but only one of them is used for redundancy. So if I get five, one terabyte drives, I have four terabytes of usable space. So I’m only sacrificing one-fifth in that case of my available disk capacity whereas in RAID1 you’re sacrificing one half so 50% of it. So RAID5, you just get better economy at scale whereby you can grow bigger and you still have some redundancy. So in RAID5 if you have three or four or five hard drives in the array, one of them can die, any of them. You run to the store put in a new one and you haven’t lost any data. RAID6 is even better, what does RAID6 do, do you think? Axle?

Well I think two drives can die.

Exactly in RAID6 any two drives can die you still won’t have lost any data, and so long as you’ve run to the store fast enough and put in one or both drives again, you’ll be good to go. Of course the price you pay with RAID6 is literally another hard drive but at least now you can maybe sleep a bit better at night knowing that man, two of my hard drives has to die before I have to really worry about this. So these are really nice technologies and so as Axle proposes here, the upside of using something like that in whatever file server were storing our shared sessions is we can at least decrease the probability of downtime at least related to hard disks. Unfortunately it still has a power cord that someone can trip over or the power supply could die. It still has RAM that could go on the fritz, a motherboard that could die. Any number of things could still happen, but at least we can throw redundancy inside of the confines of a single server and this can definitely help with our uptime and with our robustness. And indeed with actual servers that you would buy for a data center, not so much of a home. It’s very common for computers to have not only multiple hard drives and lots of multiple banks of RAM, and they would often have multiple power supplies as well. And it’s actually a really cool technology there too. If one of your power supplies dies, you can literally pull it out. The machine keeps running, you put in a new one and then it spreads the amperage across two power supplies once both are back up and running, all very hot swappable. Amazing technology these days and as an aside, if you still own a desktop computer, there is no reason you shouldn’t use RAID these days. It is just very good practice since it will allow you to avoid downtime and data loss with higher probability. Okay, but someone tripped over the power code, someone tripped over both power codes in the case of redundant power supplies. So Axle’s solution and even mine with redundant power supplies hasn’t solved the problem of shared storage becoming all of a sudden a single point of failure. So what else could we do to still get the property of shared state. So it doesn’t matter what backend server I end up on but I instead get — I still get to the ability to suffer some downtime. Well shared storage can come in a bunch of different ways, so we talked really about things as a file server but this can be incarnated with very specific technology. And just to rattle them off even though we won’t talk about them in much technical detail, Fiber Channel, FC, is a very fast, a very expensive technology that you can use in offices, in data centers not so much the home to provide very fast shared storage across server. So that’s just one type of file server if you will. ISCSI is another technology that uses IP, Internet Protocol, and uses Ethernet cables to exchange data with servers so that’s a nice somewhat cheaper way of exchange — of having a shared file server that can be used by multiple — actually in the case of ISCSI you typically use it with it single servers. So let me re-track that, that is not a solution to our current cookie problem. But what about MySQL? All right, we use that for a couple of weeks, MySQL seems to be a nice candidate because it’s already a separate server potentially. Could not the backend servers just write their session objects to a database, they definitely could. So just because you’re storing things in — just because we usually store things like user data and user generated and a database, doesn’t mean we can’t store metadata like our cookie information as well, or that too comes from users though. NFS, Network File System, this is just a protocol that you can use to implement the idea that Axle proposed of a shared file system. It just means you’ve got one server and you’re exposing your hard disk to multiple other computers. But again, we haven’t really solved the problem of downtime, so what’s the most obvious way of mitigating the risk that your single file server will go down? Axle?

Keep your copy of the sessions in another location.

Good, right? If it’s not — if you don’t have — if you’re worried about the one file server going down well the obvious solution even though money and some technical complexity will just get two. Now somehow you have to figure out how to sink the two so that one has a copy of the other’s data and vice versa. So let’s actually come back to that issue, it’s generally known as replication but it is something we can potentially achieve. But before we segue to distribution of things, let’s finish out this Load Balancer question. So how do you go about implementing this black box? Well these days you actually have a bunch of options. In software, you can do things relatively easily with a browser, pointing and clicking using things like amazons, elastic load balancer, or a scenario for which we’ll talk about a bit later. HAProxy, High Availability Proxy, is free open source software that you can run on a server that can do a load balancing as well using any — either of the characteristics we discussed earlier around Robin or actually taking load into account somehow. Linux Virtual Server, LVS is another free software, a piece of software you can use and the world of hardware people have made big business out of Load Balancers, Barracuda, Cisco, Citrix, F5 are some of the most popular vendors here. Most of whom are atrociously overpriced for what they do. So case and point like Citrix is a popular company that sells Load Balancers take a guess as to what a Load Balancer might cost you these days? It’s a highly variable range but there’s different models, but take a guess? How much of that black buck cost? Isaac?


Definitely in the thousands, indeed. In fact, we have a small one, so to speak — small one relatively speaking on campus that was $20,000 and guess what? That one is cheap. So you can literally spend on these kind of things granted not, this is what the per cost the weight you’ve read after the semester ends today. But a $100,000 for a Load Balancer or even generally a pair of Load Balancers so that either of them can die and the other one can stay alive. So in the world of enterprise hardware, these ideas we’re talking about are ridiculously priced typically because of support contracts and the like. So just realize software is number one on the list because there are other ways to achieve this much more inexpensively. Indeed for years one of the courses I teach we used HAProxy to balance load because it was so relatively easy to set up and 100$ free. So realize the same ideas can be both bought and set up on your own quite readily these days. All right let’s pause here and when we come back we’ll take a look at some issues of like caching, of replication in databases and also how we can speed up PHP a bit. Let’s take our five minute break here. All right we’re back and I almost forgot, we have one other solution to this problem of the need for sticky sessions. Sticky sessions meaning that when you visit a website multiple times, your session is somehow preserved even if there’re multiple backend servers. So shared storage was the idea we really vetted quite a bit and we didn’t quite get to a perfect solution since even though we factored out the storage, and put everyone’s cookies or session objects on the same server, it feels like we need some redundancy but we’ll come back to that in the context of MySQL in just a bit. But what about cookies? I propose that cookies themselves could offer a solution to the problem of sticky sessions and again sticky sessions means even if you visit a website multiple times, you’re still going to end up with the same session object, or more specifically, you’re still going to end up on the same backend server, Axle?

Either of the — store which server you want to go to in a cookie.


Or store everything in cookies but that’s not a good solution.

Store or — okay, yes, so storing everything in cookies is probably bad because one then it’s really starting to violate privacy because rather than store a big a key, you’re going to store like the ISPNs of all the books in your shopping cart and that might be fine but feels like your roommates and family members don’t need to know what is in your cookies. Moreover cookies typically have a finite size of a few kilobytes, so there’s definitely going to be circumstances in which you just can’t fit everything you want to in the cookie. So you know an interesting idea but probably not the best, so you could store the ID of a server in a cookie so that the user, the second, and third, and fourth times, they visit your website as by following links or coming back some other time, they are going to present the equivalent of a hand stamp saying, “Hey I was on backend server 1, send me there again.” So that’s a pretty nice idea, there is one, at least one downside here, what do you like — what do you not like potentially about this idea of storing inner cookie that gets put on user’s browser, that they subsequently transmit back to you the number of, or the ID of the server to which they should be sent. Axle?

Well from runner expiration.

Expiration in what sense?

All of cookie expires after [inaudible]

Okay, so eventually cookie is going to expire though, we saw a couple lectures just ago, we couldn’t make it expire in 10 years if we really wanted to and frankly, we’re never going to avoid that even if we had a single server, cookies could eventually expire. So at least that’s not a new problem, so I’m not too worried about expiration now, because that’s not a problem new to us simply because of load balancing. Does anything not feel right about storing the idea of the server in the cookie?

What if the IP changes?

Yeah, so if we just put like the backend IP so the private IP address in the cookie, you know what of the IP changes, what if that — what if the IP changes, so that’s a little problematic and it’s also one of these principle things like, you don’t really need to reveal to the world what your IP address scheme is, it’s not necessarily something they could exploit but it’s just the whole world doesn’t need to know that. Moreover we can implement the same idea by still storing a cookie on the user’s computer but why don’t we take the PHP approach of let’s just store a big random number and then have the load balancer remember that that big random number belongs to server one. And this other big random number belongs to server two and so forth. So a little more work on the load balancer but in this way then, we’re really not putting any states that might change or might be a little privacy revealing on the actual user’s computer. Moreover we also take away the ability for them to spoof that cookie just to get access to some other server. Now whether or not they could anything with that trick is unclear but at least we take away that ability altogether so there’s no surprises. All right, so cookies indeed are something that these black boxes of — so load balancers tend to do whereby you can configure them to insert a cookie themselves, it doesn’t just have to be a backend web server that generates cookies, the load balancer similarly could be inserting a cookie with the set cookie header that the end user then subsequently sends back so that we can remember what backend server to send the user to. Now if the user has cookies disabled well then this whole system breaks down but again so does a lot of functionality we’ve discussed this far this semester but there are sometimes some workarounds. So a word on PHP, PHP in an interpreted languages is in general tend to get a bad rep for performance because they tend not to be as a high as performing as a compiled language like C plus plus or C or the like. However there are some ways to mitigate this, there is notion of PHP Acceleration whereby you can run a PHP program, the source code through php.exe, the interpreter on the system and it turns out that PHP does typically compile that file in a sense, down to something that’s more efficiently executed, much like Java complied something down to something called Bi code. But you typically — PHP throws the results of that compilation away. Whereby it does it again and again for every subsequent request, however with very simple — with relatively straight — forward and free the available software, you can install a PHP Accelerator here in just four possibilities that essentially eliminate that discarding of the PHP Opcodes and instead keep them around. So in other words the first time someone visits your site, the PHP file is going to be interpreted and some Opcodes for performance generated but they are not going to be thrown away because the next time you or someone else visits the site, that PHP file is not going have to be reparsed and reinterpreted, the Opcodes are just going to be executed, so you get the benefit of some added performance. Now the only got you is if you ever change any of your dot php files, you have to throw away the cached Opcodes but these various tools typically do that for you. Python has a similar mechanism where you’ll get dot py files or your source code files but dot pyc files are the complied versions that can be executed more quickly, so the same idea is at play here. So this one of these things that is relatively easy and free to enable, and gives you all the more performance, specifically the ability to handle all the more requests per seconds in the context of a PHP based website. So what about caching too? Caching in general is a great thing, it solves some of our DNS concerns early on but it introduced others because caching can be a bad thing if some value has changed but you have the old one. But Caching can be implemented in the context of dynamic websites in a few different ways, so I propose that through HTML, through MySQL, and through something called memcached, we can achieve some caching benefits here. So this is a screen shot of one of the most 1990s websites out there and this was not even taken in the 1990s, this was taken a couple of years ago and I just visited out of curiosity craigslist today, it still looks the same. So what’s interesting about Craigslist is that it is a dynamic website and that you can fill out a form and post a for sale ad or roommate ad or of the like, and the website does actually change but it’s — if we zoom in on this and it’s going to be a little blurry because of the screen shot, the URL that’s up there is actually dot html even though it’s barely readable at this resolution. Which is to suggest that craigslist is apparently accepting user input through forms for instance whoever wrote up this job advertisement sometime ago but then Craigslist it’s speeding it out as a dot html file as opposed to storing it where or in what? Axle?

I suppose to storing the data that [inaudible] page in the database then printing it in PHP.

Yeah so this is in stark contrast of what we’ve done for project zero, project one, we’re using PHP as the backend whereby you store data like this, like server side and maybe an XML file or more realistically in a MySQL database or similar, and then you generate a page like this dynamically. So why is Craigslist doing this apparently? Could just be they are stuck in the ’90s but there’s compelling reason too, Axle?

Well if they store the actual HTML file then they don’t have to regenerate it every time they it’s visited.

Yeah exactly, if they are storing the html file they just don’t have to regenerate it every time it’s re visited, so this itself is caching, it’s not caching in any particularly fancy way, you’re just generating the html and saving it in something called like something.html and storing it on disk. And the upside of this is that web servers like Apache, you’re really, really, really good and fast at just spitting out bits, just spitting raw static content like a GIF, a JPEG, and HTML file but the performance optimization these days generally relate to the languages like PHP, and PYTHON, and RUBY where you’re trying to fine-tune performance, but if all you have to do is respond to a TCPIP request with a bunch of bytes from disk. That’s relatively straight — forward these days and so they’re taking advantage of the performance presumably of serving up static content but this comes at a cost. What’s the downside of this file based caching approach? Someone else? Nothing we’ve this far is sort of a complete win, there’s always a got you, Louis?


Okay so space, so we’re storing it on disk and internet, if you either post it on Craigslist, they are also storing it somewhere in a database because they do let go back and edit it, it’s just Craigslist is one of the sites where reads are probably much more common than writes. Indeed when people visit craigslist, they are probably flipping through reading pages as opposed to posting lots, and lots, and lots of ads all at the same time, so there some redundancy there that’s unfortunate. Axle?

Yeah that to — just to build upon that, you will be storing lots of code that you use on every page over again, plus it’s not a very elegant solution to have it fixed on your server containing 10,000 files.

Okay good so there’s redundancy while actually all these thousands of files, there’s redundancy too just in the basic stuff. Like you have the same HTML tag, the same body tag, the same link tag, the same grip tag in every single page if they’re indeed just static HTML files. So whereas, you get some benefits of using something like PHP and recall MVC discussion where we factored out template code, like the header and the footer, so that we stored it in one place and not thousands of places. Craigslist is kind of sacrificing that feature and going with this instead, so in the end it’s probably a calculated trade off, you get much better performance presumably from just serving up the static content, but the price you pay is more disk space but at the same time, you know for a few hundred dollars you can typically get even bigger hard drives these days, so maybe that’s actually the lesser of the evils. But there is one — there is another big got you here, if you’ve generated tens of thousands of craigslist pages that look like this, what’s the implication now and maybe why are they stuck in the ’90s?

Well I decide to add — well change the background color.


Some change the design part of it which is necessary but you can’t do that without editing each one of those ten thousand files and that could be automatically but it’s much harder than just editing the generic template that generates it.

Exactly, if you want to change the aesthetics of the page and add a background color, change the CSS or make it the font other than Times New Roman, it’s nontrivial now because assuming this is a fully intact HTML file with no server side include mechanism, no like require mechanism like you have in PHP. You have to now change the background color in tens of thousands of files, unless maybe you at least put in a CSS file but even then if it’s a less trivial change than color. Suppose you want to restructure the HTML of the page, then you really have to do a massive find and replace or more realistically probably regenerate all ten thousand plus pages and I’m — we latch on to ten thousand arbitrarily but there’s a lot of pages in this case. So upsides and downsides, they’re one of the few people in the internet who do this particular approach but it does have some value. And I think the last time I read up on statics they get by with relatively little hardware as a result which is definitely compelling. So MySQL query cache, this is a mechanism that we didn’t use, but it’s so easily enabled on a typical server with MySQL, there’s a file called my.cnf for your configuration file. And you can simply add a directive like query cached type equals 1, and then re start the server to enable the query cache which pretty much does what it says. The — if you execute a command like select foo from bars where bars equal 1, 2, 3, that would be — could be slow the first time you execute it if you don’t have an index, or if you have a really huge table. But the next time you execute it, if the query cache is on and if that row hasn’t changed, the response is going to come back much more quickly. So MySQL provides this kind of caching for identically executed queries which might certainly happen a lot if the user is navigating your website going forward and back quite a bit. Memcached is an even more powerful mechanism; Facebook has made great use of this over the years especially initially. Memcached is a memory cache, so it is a piece of software, a server that you run on a server, it can be on the same server as your web server, it can be on a different box altogether, but it essentially is a mechanism that stores whatever you want in RAM. And it does this with — in the PHP context with codes like this, so this Memcached can be used by all sorts of languages. Here is PHP zone interface to it and you use Memcached as follows, you first connect to the Memcached server using Memcached connect which is very similar in spirit to MySQL connect which you might recall from a few lectures back. Then we try in this example to get a user, so the context here is, it’s pretty expensive to do select star from users on my database table because I’ve got millions of users in this table and I’d really rather not execute that query more often than I have to. I’d rather execute it once, save the results in RAM and the next time I need that user, go into the RAM, go into the cache to get that user rather than touching the database. So there’s this sort of tier performance objectives, disk is slow, right spinning disks are specially slow, but faster serve up so generally you might want to store something instead of on disk, you want to store it in a table that has indexes so that you can search it more quickly. For instance think back to project zero, the XML file is relatively slow but at the same time, anytime you wanted to search it, you had to load it from disk, build a dome thanks to the simple XML API then search it, kind of annoying. It would be nice if we could skip the disk step so that things would just be faster and thus, was born MySQL on project one, MySQL is a server which means it’s always running, it’s using some RAM so in that case you have the ability to execute queries on a data that’s hopefully in RAM but if it isn’t you at least have the opportunity to define indexes, primary keys, unique keys, index fields, so that at least you can search that data more readily than you can with say XPath in XML. So the next step is not even to use a database because database queries can be expensive relative to just a cache which is just a key value store. I want to give you x equals y and the next time I ask for x you give me y and I want it quick, much faster than a database would return it. So here we’ve gone and connected to the memory cache daemon, the sever, and the second line I am trying to get something from the cache, the arguments the Memcached get take the first argument is a reference to the cache that you want to grab something from, and then $ ID just represents something like 1, 2, 3, the ID of the user that I want to get. If the user is null, what’s the implication apparently, Isaac?

Well you do the query all over again.

Okay good but in what case would user be null do you think?

When they don’t exist.

Good when they’re not in the cache, when user 1, 2, 3 or whoever I’m looking for is not in the cache, that variable is going to be null and so we do this IF condition as Isaac said, and here there are somewhat familiar code, PDO which relates to MySQL in our case. We connect to the database using that user and path, we then execute the query function in PDO, in this case select star from users where ID equals ID. I’m not using my — I’m not escaping ID because in this case, I’m assuming I know it’s an integer so it’s not a dangerous string just to be clear. Then I’m calling fetch to get back an associative array of my data, the users name, email address, ID, whatever else I’ve got in my database but then the last thing I do before apparently nothing else because this is out of context, before actually using that user for anything, what am I doing with him? Axle?

You’re storing it in the caching memory.

Exactly I’m storing a key value pair in the cache whereby the key is the user’s ID, so this implies that there’s an ID field in the user object that came back from the database, from this line here and the value is the user object itself. So again a Memcache in this case is a key value storage mechanism and the next time I want to look up this user, I want to look him up by his ID and case in point, that’s what I did in line 2 up top. Now caches are finite because RAM is finite and even disk is finite, so what could happen eventually with my cache? Just by nature of that — those constraints, Axle?

It gets so big you can’t keep it on the machine.

Good so eventually the cache could get so big you can’t it keep on the machines, so what would be a reasonable thing to do at that point, when you’ve run out of RAM or disk space for cache? You’re the person implementing Memcached itself now, what do you do in that case, you could just kind of quit unexpected error but would kind of be bad and completely unnecessary. What could you do? Isaac? [ Inaudible Speaker ] I’m sorry.

Garbage collection.

Yeah so some kind of garbage collection and which things would you collect? What things would you remove from memory? [ Inaudible Speaker ] Good, so we can essentially expire objects based on when they were put in. So if I put in user 1, 2, 3 yesterday and I haven’t touched him since or needed him since and I need more space, well out goes user 1, 2,3 and I can reuse that space, that memory for user 4, 5, 6 if user 4, 5, 6 is the next person I’m trying to insert into the cache. So indeed this is a very common mechanism whereby the first one is the first one out if that object has not been needed since. By contrast if 1, 2, 3 is just one of these power users who’s logging in quite a bit and he — he or she is logging and again, and again, and again, well I should remember every time and we don’t see it in the code here. But every time I get a cache hit and I actually find user 1, 2, 3 in the cache, I could somehow execute another Memcache function that just touches the user object so to speak. Thereby updating his timestamp to be this moment in time so that you remember that he was just selected and hopefully Memcache get itself would do that for us and indeed it does. I don’t need to this manually, the cache software would remember, Oh, you asked for user 1, 2, 3 I should move — probably move him back to the front of the line so that the person at the end of the line is first one to get evicted next time around. So it’s a wonderfully useful mechanism and Facebook is very read heavy or very write heavy if you’re a user. It’s kind of both these days, you know earlier on it was much more read heavy than write heavy because there were no like status updates and you would just have your profile and that was about it. So these days there’s definitely more writes but I’m going to guess that reads are still more common than not, right? When you log — if you’re a Facebook user and you log into your accounts and you see your newsfeed, you know, you might have 10, 20 whatever friends show up in that newsfeed, that’s potentially like 10 or 20 queries of some sort and yet you ‘re probably not going to update your status 30 times at — in that same unit of time. So odds are Facebook is still a little more read heavy which makes caches is all the more compelling because if your own profile isn’t changing all that often at least you might get 10 page used, 100 page used by friends or random strangers before you actually update your status or your profile again. That’s an opportunity for optimization, so early on and to this day, Facebook uses things like Memcashe quite a bit so that they’re not hitting their various databases just to generate your profile. They’re instead just getting the results of some previous lookup unless it has since expired. Well onto my MySQL optimization so that you can squeeze all the more performance out of your set up. So this table’s a little more overwhelming right now than it needs to be but recall our discussion about MySQL storage engine some time ago. And we talked briefly about my ISAM and NODB; does anyone remember at least one of the distinguishing characteristics of those two storage engines? And again the storage engine was just like the underline format that was used to store your database data.

I think NODB supported transactions into [inaudible].

Good, so NODB which is the default these days so you haven’t really needed to think about this much since project one. NODB supports transactions whereas MyISAM does not. MyISAM uses locks which are full table locks but does tend to have some other properties and these — this list here is a very long list of various distinctions among these several storage engines. Transactions is one of them but there’s a few other storage engines here that I thought I would just draw our attention to. So one, you have a memory engine otherwise known as a heap engine. This is a table that’s intentionally only store in RAM which means if you lose power, server dies or what not the entire contents of these memory tables will be lost. But still kind of a nice feature if you yourself want to implement a cache relatively easily by writing keys and values into your database and to two columns maybe, you yourself can implement some kind of cache to avoid having to touch maybe much larger tables that you yourself have. So that’s an option to you. Archive Storage Engine, haven’t had to use this but take a guess as to what it does besides archiving something. What does this engine do for you, do you think?

It doesn’t seem to be doing very much, it probably just stores [inaudible].

And what — what was the last sentence, the last part of your comment?

It doesn’t store anything in cache [inaudible].

Oh, it doesn’t store in anything in cache, you have to query it all the time. Not quite. So the property you’re actually getting and you can kind of see it here, but there’s some footnotes on the other storage engines, is it’s compressed by default. So archive tables are actually slower to query, but they’re automatically compressed for you, so they take up much, much less space. So a common case — common use case for archive tables might be log files, where you want to keep the data around and you want to write out a whole bunch of key values in a row. Every time someone hits your web server or anytime something buys someone, but suppose you rarely query that data, you’re keeping it for posterity, for research purposes, for diagnostic purposes, but you’re not going to do any selects on it anytime soon. So it would just be a waste to keep — use more disk space than you need to, so you’re willing to sacrifice some future performance when you do need to query it for some long-term disk saving. So, the archive format allows you to do that. NDB is a network storage engine, which is used for clustering. So that actually there is a way of redressing the issue of single points of failure that we discussed earlier with shared storage, but we’ll see a simpler approach in just a moment. So, in the world of databases, like MySQL, they typically offer this replication feature that I mentioned earlier. So replication is all about making automatic copies of something and the terminology generally goes as follows. You generally have a master database, which is where you read data from and write data to, but just for good measure, that master has one or more slave databases attached to it by a network connection, and their purpose in life is to get a copy of row that’s in the master database. You can think of it rather simply as anytime a query is executed on the master, that same query is copied down to one or more slaves and they do the exact same thing. So that in theory, the master in all of these slaves, are identical to one another. So what’s the upside now of having databases one, two, three and four, all of which are copies of one another apparently? What problems does this solve for us, if any? Axle?

Well, database one dies because somebody trips over the cord, but you still have three copies.

Good, so if something — if database one dies because of human error, you trip over the cord, hard drive dies, RAM fizzles out, whatever the case may be, you have literally three backups that are identical, so there’s no tapes involved, there’s no backup server. I mean, these are full-fledged databases and in the simplest case, you could just unplug the master, plug in the slave and viola, you now have a new master. And you might have to do a bit of reconfiguration in the databases to make — to promote him to master, so to speak and then leave servers three and four as the new slaves while you fix server number one. But that would be one approach here, so you have some redundancy, even though you might have a little bit of downtime, at least you can get back up and running quickly. And indeed, you can automate this process if you notice that the master is down, you could take him offline completely, promote the slave and reconfigure them all just by writing a script. What else — how else can we take advantage of this topology? Let me ask a more leading question. In the context of Facebook, especially early days, how might they in particular have made good use of this topology? Axle?

Well, maybe if you get a lot of the — a lot of queries, you can outsource them to different slaves.

Okay, so if you’re getting a lot of queries, you could, you know, maybe you could just load balance across database servers, and absolutely you could. The load balancers don’t have to be used for HDP alone. You could use it for MySQL traffic, but why do I say Facebook in particular? Early on they didn’t get that many queries, but it — this was still a good paradigm for them. [ Inaudible Speaker ] Yeah, why is this good, perhaps? So, back to my hypothesis that they’re more read heavy than write heavy. How can you adapt that — that reality to this particular topology effectively? Or put another way, why is this a good topology for a website that is very read heavy and less write heavy? Ben? [ Inaudible Speaker ] Okay.

But if you’re not writing [inaudible].

Okay, good. So reading can be expedited, so if we kind of combine Ben and Axles’ proposals here, for a read heavy website like Facebook, certainly in the early days, you could just write your code in such a way that any select statements go to databases two, three or four and any inserts, updates or deletes have to go apparently to server one, which even though that query than has to propagate to servers two, three and four, it is less common. And that happens automatically, so the code wise, you don’t have to worry about it too much and if you’re suffering a bit of performance there, well you can just throw more servers at it and have even more read servers to lighten the load further. So this approach of having slaves that can typically be used either for redundancy, so you just have a hot spare ready to go or so that you can balance read requests across them is a very nice solution, but of course, every time we solve one problem, we’ve introduced another or at least we haven’t fixed yet another here. What is a — what is a fault in this layout still? Be paranoid. Kind of talked about it earlier, but like what if one dies, right? Unless this is – there’s got to be some blip on the radar here because we have to like promote a slave and, right so, you still have a single point of failure here, at least for writes. We could keep Facebook alive by letting people browse profiles and read profiles, but status updates for instance, could be offline for as long as it takes us to promote a slave to a master. Feels like it’d be nicer or at least our probability would be better of uptime if we instead had not just a single master, but again, let’s just throw hardware at the problem. So another common paradigm is actually to have a master-master setup whereby as the labels imply and as the arrows suggest, this time you could write to either server one or two, and if you happen to write to server one, that query gets replicated on server two and vice versa. So now, you could keep it simple. You could always write to one, but then the query goes to number two automatically or you could write to either, thereby load balancing across the two and they’ll propagate between each other. But in this case here, if you’ve laid out your network connections properly, either one or two can go down, and you still have a master that you could read from, and you could even implement this in code. Recall very simply, we had the MySQL connect function weeks ago, or even the PDO constructor function, which tries to connect to a database. You can implement this in PHP code if MySQL connect fails when connecting to server one, then just try server two. So you yourself could build in some redundancy so that now we can lose server one or two and not have to intervene as humans just yet because we at least still have a second master that we can continue writing to even though server one is now offline for some amount of time. All right, but we still have to route traffic there, so in pursuit of this idea of load balancing, here’s a more complex picture that starts to unite some of our web ideas and some of our database ideas. So at the top there we have some kind of network. We have a load balancer in between and then we have this front-end tier. So web servers are typically called a tier, a service tier, this is a multi-tiered architecture, would be the jargon here. And those web servers now apparently are routing their requests through what in order to reach some MySQL slaves for reads? Who’s the man in the middle here? Yeah, Axle?

For reads with a load balancer.

Okay, so for reads, yeah, we have a second load balancer depicted here, frankly in reality they could be one in the same. They could be the same device, but just listening for different types of connections. But for now, they’re drawn more simply as separate. Now we have one MySQL master, so we also have wires or arrows pointing from the web servers to the master. And the master meanwhile, has some kind of connection to the slaves. So not bad, no frankly, this is starting to hurt my brain because now what was a very simple class where you have a nice self contained appliance on your machine — on your laptop, does everything, web, database, caching, anything you want it to do. My God, look at all the things we have to wire up now and it’s still now perfect. What could die here? Where — what are our single points of failure? Jack? [ Inaudible Speaker ] Oh, Isaac, sure.

All right, MySQL master.

Okay, so the MySQL master. We haven’t really solved that problem, so kind of be nice to steal part of the previous picture and maybe insert it into here. Jack?

Oh, I was [inaudible].

Same thing. Axle?

[Inaudible] load balancer.

Load balancers, right. So single point of failure is very well defined, like single point of failure, just look for any — any bottlenecks here, whereby things point in and then go out. Load balancer is one here, load balancer is one there, so it turns out with load balancers for your $100,000, you can get two of them typically in the package. And what they tend to do is operate also on what’s called, similar in spirit to master-master mode, but in the context of load balancers, it’s typically called active-active, as opposed to active-passive. And the idea here is with active-active, you have a pair of load balancers that are constantly listening for connections. Either one of which can receive packets from the outside world and then relay them to backend servers. And what they typically do is they send heartbeats from left to right and right to left so that if this guy ever stops hearing a heart beat from this guy, so to speak and a heartbeat is just like a packet that gets sent every second or something like that. If this guy stops hearing a heartbeat from this guy, he automatically assumes that this guy must have gone offline so he’s completely in charge now and he continues to send traffic from the outside world in, or if you instead have active-passive mode, if this is the active guy at the moment, rather if this is the active guy at the moment and he dies, this guy similarly detects, “Ooh, no more heartbeat.” And what the passive guy will do is promote himself to active, which essentially just means he takes over the other guys IP address so that all traffic now comes to him. So in short, we definitely need another load balancer in the picture, how it’s implemented is — is not as important to us right now, but having a single load balancer is probably a bad thing, right. And this is the tragedy, you can throw money at, you can throw a lot of brain power at various tiers here, but if you have a lot of web servers, a lot of MySQL servers, but you have one load balancer just because it was really expensive or you didn’t know how to configure it properly, like the rest of it is pretty much for naught because you still have things that can die and take down your entire website. So let’s make this more complex still. So let’s now introduce two load balancers and let’s actually introduce an idea of partitioning. And this was actually something that Facebook coincidentally did make good use of early on. Back in the day, there was Harvard.facebook. — Harvard.thefacebook.com, there was mit.thefacebook.com and the earliest partitioning that they used was to essentially have a different server, as best outsiders could tell for each school. So they literally just like copied the database, copied the files over to another server and then viola, thus was born MIT’s copy of Facebook. But this is actually, even though this would get kind of messy for 800 million users and thousands and thousands of universities and networks, it’s pretty clean early on because it’s just to leverage this idea of partitioning, right, it’s kind of — we didn’t have a — Facebook didn’t have a big enough server to handle Harvard and MIT, so why not just get two and say Harvard users go here, MIT users go here and now we’ve kind of avoided that problem. Now unfortunately, when BU comes on, we need a third server, but at least we can scale horizontally. Now there is a catch with partitioning, as soon as you wanted to be able to poke someone at MIT or vice versa, you had to somehow cross that Harvard/MIT boundary at which point it’s kind of a bad thing that they’re all in separate databases. So early on, there were some features that you could only do within your own network, not until there was more share state could you send messages and the like. So partitioning though could be used even more simply. Suppose that you just had a whole bunch of users, well you need to scale your architecture horizontally, why don’t I just put users who’s last names start with a to m on half of my servers and then n through z on the others, right. And when they log in, I just send them to one or the other server based on that. So in general, partitioning is not such a bad idea, it’s very common in databases because you can still have redundancy, whole bunch of slaves in this case here, whole bunch of slaves over here, but you can balance load based on some high level user information, not based on load, not round robin, you can actually take into account what someone’s name is and then send them to this particular server. So partitioning is a very common paradigm. And then lastly just to slap a word on it, high availability refers to what we described in the context of load balancers, but it can apply to databases as well. Whereby high availability or HA is the buzzword, simply refers to some kind of relationship between a pair or more of servers that are somehow checking each other’s heartbeats, so that if one of them dies, the other takes on the entire burden of the — of the service that’s being provided, whether a database or whether a load balancer. So, even though we finally got the iPad working, it’s a little small to draw on, so what I wanted to do as our final example here, is let me raise the screen here [inaudible] one of these buttons will do it. All right, we’re going old school now. Our first and last piece of chalk in the class, let’s see. We’ll start with the middle one. All right, let’s build ourselves a network here. So, we have a need for one or more web servers, one or more databases, maybe some load balancers, but we’re also going to try to tie together last week’s conversation about security, so we’ll have to think about firewalling things out now. So in very simple form, we have here a web server, which I’ll draw as www. All right, so that’s our web server. And now my website’s doing so well that I need a second web server, so I’m going to draw it like this and now we need to revisit the issue of balancing load. So what felt like one of our best options here? How do I still have the internet, which I’ll draw as a cloud here, connected to both of these servers somehow, but I want the property of sticky sessions. So what are my options or what was my best option? How do I implement sticky sessions? Axle?

Use a load balancer and keep all the sessions [inaudible].

Okay, good. So use a load balancer and store all the sessions in one place. And that’s not — okay, so we can actually do one, but not necessarily both of those. Let me interpose now the thing we started calling black box. So this is some kind load balancer, now I still have my backend servers. And here’s another one here. Okay. And now this is connected here, but I still want sticky sessions, but you know what? Shared states, that sounded expensive, fiber channel [inaudible] sounded complicated, there’s simpler way. How do I ensure that I get sticky sessions using only a load balancer and no shared state yet? How can I ensure that when Alice comes in and she’s sent to this server the first time, that the next time she comes in, she’s sent to the same one? Axle?

Remember which server she visits either for [inaudible].

Okay, good. So why don’t we have the load balancer listen at the HDP level and when the response comes back from the first web server, let’s give them numbers. So let’s call this one, and this guy two — the load balancer can insert some kind of cookie that allows it to remember that this user belongs on server one. And how it does that, I don’t know, it’s a big random number and it’s got a table like PHP does for its sessions and figures out which server to send her to in this case. All right, so now I have a database. The easiest way I know how to set up a database is to put it on the web server itself. So much like the CS50 appliance, you have a database in the same box as a web server. If I now have a database here and here on the same boxes as our web servers, what’s the most obvious problem that now arises? Yeah?

Well, Alice’s shopping cart is — oh, well if she does something to her profile or whatever —


— it’s just going to be on server one and not on server two since she just visited server one.

Exactly. So if Alice just happens to end up on server one and she updates her profile, her credit card information or something persistent. Not the shopping cart thing because that involves the session, but she does something persistent, it’s going to persist on this database and that’s fine because sticky sessions are solving all my problems now. But she comes back in a week or she logs in from a different computer, cookie expires, whatever, and she ends up here — what happened to my credit card information? What happened to my profile? I now have no profile because I’m on a different database. So clearly, this is not going to fly unless we partition our users and have the load balancer actually take into account who is this user and then send the user based on Alice’s last name always to the same servers, so that could be one approach. But for now, let’s instead factor out the database and say that it’s not on the web servers, it’s separate and it’s got some kind of internet connection here. Of course we’ve solved one problem, but introduced a new one, which is what? Isaac?

Single point of failure [inaudible].

Yeah. So single point of failure again. So how can we mitigate this? Well, we can do a couple of things. We can attach slave databases off of this, and that’s kind of nice. But it then involves somehow promoting a slave to a master in the even one dies. So maybe the cleanest approach would be something like two master databases, so we’ll call this db1, this will be db2. And now, how do I want to do this, connect these like this. Isaac, you shook your head.

Well, you could connect the two databases to each other.

Okay, so we should probably do this for master-master replication. So that’s good. But what about these lines — good, bad? Answer is bad. Why bad, Jack?

Well, you can connect each one to both of the [inaudible].

Right, so the problem we just identified was the database on same server is web server bad because then it’s talking only to it and if Alice ends up on the other servers, she has no data that you’re expecting. Well, functionally this is equivalent. I’ve just drawn a line, but they’re still only connected to each other and the traffic should probably not go like that, so we at least need to have some kind of cross connect. So okay, so I can do this, but now what do I do? So now my load balancing has to be done in code, if those are the only components in my systems right now, the line suggests that dub, dub, dub one has a network connection to db1 and db2, but that means now I have to do something like an if condition only in my PHP code to say if this database is up right here, else if this database is up right there. And that’s not bad, but now you’re developers have to know something about the topology. If you ever introduce a third master or something like that, although MySQL though wouldn’t play nicely with that, now you have to change your code. This is not a nice layer of abstraction, so how else could we solve this, Axle? I don’t like the idea of connecting each of my web servers to the database because frankly, you know what, this is going to get really ugly if it starts looking like this, right. Very quickly this degrades into a mess. Yeah?

Get another machine, load balancer [inaudible].

Okay, good.

[Inaudible] all of that, so connect your — your dub, dub, dub servers to a load balancer, then let it handle the request. And then you can implement all kinds of features like say, they are both masters, right?


They have the same data.


What if only users with last name that starts with M logs in, well, you’re going to have a load on one particular server, but then the load balancer can take all those features we talked about —


— CPU cycles and all of that, and actually distribute MySQL queries across databases.

Okay, good. So we insert a load balancer here, which is connected to both the dub, dub, dub machines and also the database servers. And then he can be responsible for load balancing across the two masters. It’s actually harder for the database to do any kind of intelligence load balancing based on last names at this point, since the MySQL traffic is going to operate with binary messages, not with http style textual messages. The load balancer up here can look at HDP headers and make intelligent decisions. It’s harder, maybe not impossible, but it wouldn’t be very common to do load balancing based on application layer intelligence here. You would probably push that to the PHP code again in that case. But this isn’t bad, but Isaac doesn’t like this picture now because of what?

It still fails [inaudible].

Yeah, so we still have the single point of failure, so you just cost me even more money or more complexity, even if I’m using free software. This just takes more time now. So now we have load balancer one, load balancer two, I need to do something like this. And even though this looks a little ridiculous, well, actually it’s a little elegant. That’s pretty sexy. So, you would do this with switches or some kind of Ethernet cables all going into some central source. So suppose instead we actually did that? If you’ve ever plugged in a computer into a network jack, which most of you probably have even if you have a laptop, you don’t connect these computers all to themselves. You instead connect them to like a big switch that has lots of Ethernet ports that you can plug into. But now Isaac, what — what do you not like about this idea if I’m plugging everything into a switch?

It still fails.

Yeah, so welcome to the world of like, network redundancy. So really the right way to do this is to have two switches, so almost everyone of your servers, database and web alike, as well as you load balancers, would typically have at least two Ethernet jacks in them. And one cable would go to one switch, and another cable would go to the other switch. You have to be super careful not to create loops of some sort. So switches have to be somewhat intelligent typically so that you don’t create this crazy mess where traffic is just bouncing and bouncing around in your internal network and nothing’s getting in or out. So there’s — there’s some care that has to be taken, but in general, this is really the theme in ensuring that you have not only scalability, but redundancy and higher probabilities of uptime and resilience against failure. You really do start cross connecting many different things. But let’s push harder. Isaac, what — suppose I fix the switch issue, suppose I also make this two load balancers and fix that issue. What’s something else that could fail now? Can’t do this on an iPad very well. So, this is your datacenter. Here is the door to your datacenter. Jack?

The building burns down.

[Inaudible] good, more extreme that I had in mind. I was thinking the power goes out, but that works too. So the building itself burns down or goes offline. You have some kind of network disconnect between you and your ISP, the whole building or the power indeed does go out. And this has happened. In fact, one of the things that happens every time Amazon goes out, is the whole world starts to think that Cloud’s cloud computing, so to speak, is a — the — a bad thing because, “Oh, my God. Look, you can’t keep the cloud up.” But the tragedy here is in this perception, that cloud computing really just refers to outsourcing of services and sharing resources like power and networking and security and so forth across multiple customers. So Amazon services EC2, elastic compute cloud, is kind of this picture here whereby you don’t own the servers, but you do rent space on them, because they give you VPSes that happen to be housed inside of this building. Amazon offers things called availability zones, whereby this might be an availability zone called US East one, so this is a building in Virginia, in that particular case. And what they offer though is US East two and three and four, though they call them A and B and C and D. And what that simply means in theory is that there’s another building like this, drawn over there that does not share the same power source, does not share the same networking cables. And so, even if something goes wrong in one building, in theory the other shouldn’t be affected. However, Amazon has suffered outages in multiple availability zones, multiple datacenters, so in addition to having servers in Virginia, guess where else they have servers?

Anywhere else.

Okay, anywhere else, Yeah, that’s actually a pretty hard question, the world’s a big place. So the West Coast and in Asia, and in South America and in Europe, as well. They have different regions as they call them inside of which are different datacenters or availability zones. But this just means that you can really drive yourself crazy thinking through all the possible failure scenarios, because even though Jack’s building burning down is a little extreme, things like that do happen, right. If you have a massive storm, like a tornado or a hurricane that just knocks out power, absolutely could a whole building goes down — go down. So what do you do in that case? Well, we have to have a second datacenter or availability zone. I’ll draw it much smaller this time, even though it might be this — physically the same. So here’s another one. Suppose that inside of this building is exactly that same topology, so now really what we have is the internet outside these boxes connecting to both buildings. So — so internet is no longer inside the building. So, once you have two datacenters, how do you now distribute your load across two datacenters? Axle?

Well, then you can use the DNS [inaudible].

Yeah, so we — we didn’t really spend much time on it, but recall that you can do load balancing at the DNS level. And this is indeed how you can do geography based geolocate — geography based load balancing, whereby now when someone on the internet requests the IP address of something.com, they might get the IP address really of this building or more specifically, of the load balancer in this building. Or they might get the IP address of load balancer in this building, right. When we did the [inaudible] lookup on Google, we got a whole bunch of results. That’s not because they have one building with lots of load balancers inside of it. That’s because they probably have lots of separate buildings or datacenters, different countries even, that themselves have different entry points, different IP addresses. So you have global load balancing, as it’s typically called. Then the request comes into a building, and you still have the issue of somehow making sure that subsequent traffic gets to the same place, because odds are Google is not sharing your session across entirely — entirely different continents. Even though, they could be, but that would probably be expensive or slow to do. So odds are you’re going to stay in that building for some amount of time, but again, these ideas we’ve been talking about just get magnified the bigger and bigger you start to think. And even then, you have potential downtime because if a whole building goes offline and your browser or your computer happens to of cached the IP address of that building, that datacenter, could take some minutes or some hours for your TTL to expire at which point you get rerouted to something else. Not too long ago, just a few weeks, I think Core [phonetic] was offline for several hours one night because they Amazon. A bunch of other popular websites too, who use Amazon services were down altogether because they were in a building or a set of buildings that suffered this kind of downtime. And it’s hard. Like if you are having the fortunate problem of way too many users and lots of revenue, it gets harder and harder to actually scale things out globally. So, typically people do what they can, but Isaac has gotten very good at pointing out, you can at least avoid as best as possible these kinds of single points of failure. Questions? So a word on security then. Let’s focus only on this picture, not so much on the building. What kind of traffic now needs to be allowed in and out of the building? So let me go ahead and just give myself some internet here, connecting to the load balancers somehow. What type of internet traffic should be coming from the outside world in if I’m hosting a website, a LAMP based website? Yeah?

Well, you would want a firewall that allows only port 80 connections.

Okay, good. So I want TCP, recall which is one of the transport protocols 80 on the way in. That’s good, but you just compromised by ability to have certain security, why? You’re not blocking a very useful type of traffic.

Oh, yeah, the encrypted [inaudible].

Good, so we also want 443 —


— which is the default port that’s used for SSL, for HTTPS based URL’s. So that’s good. This means now that the only traffic allowed into my datacenter is TCCP80 and 443. Now those familiar with SSH, you’ve also just locked yourself out of your datacenter, because you cannot now SSH into your datacenter, you might want to allow something like port 22, for SSH or you might want to have an SSL based VPN so that you can connect somehow to your datacenter remotely. And again, it doesn’t have to be a datacenter, this can just be some webhosting or some VPS hosting company that you’re using. And okay, so we might need one or more other ports for our VPN, but for now that’s pretty good. How about the load balancers? What kind of traffic needs to go from the load balancer to my web servers? Well, it — it’s really a mess to keep it encrypted because once it’s inside the datacenter, nobody else is going to listen [inaudible] people inside the datacenter.


So you would want to drop the encryption and do 80.

Good, and that’s actually very common to offload your SSL to the load balancer or some special device and then keep everything else unencrypted because if you control this, it’s at least safer. Not 100%, because if someone compromises this now, they’re going to see your traffic unencrypted. But if you’re okay with that, doing the SSL termination here, so everything’s encrypted from the internet down to here, but then everything else goes over normal unencrypted http. The upside of that is remember the whole certificate thing? You don’t need to put your SSL certificate on all of your web servers, you can just put it in the load balancer or the load balancers. You can get expensive load balancers to handle the cryptography and the computational cost thereof. You can get cheaper web servers because they don’t need to worry as much about that kind of overhead. So that’s one option. So TCP 80 here and here. How about the traffic between the web server and the databases, perhaps through these load balancers? This is a more of a trivia question. But what kind of — what kind of traffic is it, even if you don’t know the port number? Yeah?

It’s a — it’s query or execute.

Yeah, query actually — or more specifically, it’s the SQL queries like select and insert and delete and so forth. So this is generally TCP 3306, which is the port number that MySQL uses by default. So what does this mean? If you do have firewalling capabilities and we haven’t drawn any firewalls per say, so we do need to insert some hardware into this picture that would allow us to actually make these kinds of configuration changes, but if we assume we have that ability in large part because all of these things are plugged in as we said to some kind of switch, well the switch could be a firewall itself and we could make these configuration changes. We can further lock things down, why? I mean, everything just works if I don’t firewall things. Why would I want to bother tightening things so that only 80 and 443 are allowed here, and 3306 is allowed here? And in fact, notice there’s no line between these guys.

Well, it would be really stupid to keep, for example, 3306 open in the first firewall because then people — they might not be able to because of other security measures, but in theory, they — they are allowed by the firewall to query your database and do SQL commands.

Good, exactly. There’s just no need for people to be able to potentially even execute SQL queries coming in or make MySQL connections. And even if you’re not even listening for MySQL connections, it again is sort of the principle of the thing. You should really have the — the principle of least privilege whereby you only open those doors that people actually have to go through, otherwise you’re just inviting unexpected behavior because you left the door ajar, so to speak. You left the port open and it’s not clear whether someone might in fact, take advantage of that. Case and point, if somehow you screw up or Apache screws up or PHP screws up, and this server is compromised, it’d be kind of nice if the only thing this server can do is talk via MySQL to this server and cannot for instance, suddenly SSH to this server or poke around or execute any commands on your network other than MySQL. So at least if the bad guy takes this machine over, you really can’t leave this rectangle here that I’ve drawn. So again, beyond the scope of things we’ve done in the class, and even though the appliance itself actually does have a firewall that allows certain ports in and out, all of the ones you need, we haven’t had to fine tune it for any of the projects, realize that you — that even on something like a LINUX based operating system. So in short, as soon as you have the happy problem of having way too many users for your own good, lots of new problems arise even though thus far be focused pretty much entirely on the software side of things. So that is it for computer science S75. I’ll stick around for questions one on one. We still have a final section tonight, for those of you who would like to dive into some related topics, otherwise realize that the final project two, it’s deadline is coming up. You should’ve gotten feedback by — from your TF’s about project one, if not just drop him or her a note or me. Otherwise, it’s been a pleasure having everyone in the class. We will see you online after tonight [applause]. Oh, that’s okay. Thanks, I wasn’t trying to build up to that there, sorry.