English System Design

Part I: VPS and host –> vertical scaling


The first answer: get more RAM and more processing power.



Part II: vertical scaling –> Load Balancer –> Distributed –> The catch of this method


First, analyze the catch of vertical scaling.
Then discuss how to distribute requests if we have aisles of servers.

Finally, give a plan: use a black box, the load balancer, and return the load balancer's IP address to clients.
Drawback of the plan: it is somewhat hard to query the working state of the backend servers.
Before continuing, Axle pointed out the drawback of the previous method.

So the teacher tries to lead the students toward other solutions. He uses the DNS story from lecture 0 as a hint, and one of the students comes up with the idea of doing some tricks with DNS.


Could you do some DNS tricks and return a different IP address based on what the user requested?



Part III Round Robin –> round robin drawback –> state server –> using RAID to improve the robustness of the state server


First, talk about round robin and its catch: because requests are handed out in strict rotation regardless of load, some machine can keep being allocated the really heavy work.
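The catch can be made concrete with a minimal sketch. The backend addresses and the request "costs" below are hypothetical, chosen only so that every third request is heavy; the point is that a strict rotation never looks at how busy a machine already is.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out backends in a fixed rotating order, ignoring their load."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("need at least one backend")
        self._rotation = cycle(list(backends))

    def pick(self):
        # Return the next backend in the rotation.
        return next(self._rotation)


# Hypothetical backend addresses, for illustration only.
BACKENDS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]


def assign(request_costs, backends=BACKENDS):
    """Return the total work each backend receives under round robin."""
    balancer = RoundRobinBalancer(backends)
    load = {backend: 0 for backend in backends}
    for cost in request_costs:
        load[balancer.pick()] += cost
    return load
```

With costs such as `[100, 1, 1, 100, 1, 1]`, every heavy request lands on the first backend while the other two stay almost idle.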

That drawback was proposed by a student. Another drawback is that if DNS is not queried again (since the browser can cache the answer), round robin will not take effect.

Cache definition: the answer is kept while the session is open, since otherwise it would be wasteful to query the DNS server for the same thing over and over again.

How long the answer is cached is determined by the TTL set by the DNS server (the DNS server decides when the returned IP may change), so this mechanism also leads to the pitfall that the same machine gets too much work.
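A small sketch of why caching defeats DNS round robin, under the assumption that the client keeps an answer until its TTL runs out. The class and function names, the addresses, and the explicit `now` argument (used instead of a real clock so the behaviour is easy to follow) are all illustrative.

```python
class CachingResolver:
    """Client-side cache that reuses a DNS answer until its TTL expires."""

    def __init__(self, lookup, ttl_seconds):
        self._lookup = lookup          # function: hostname -> IP address
        self._ttl = ttl_seconds
        self._cache = {}               # hostname -> (ip, expires_at)

    def resolve(self, hostname, now):
        entry = self._cache.get(hostname)
        if entry is not None and now < entry[1]:
            return entry[0]            # still fresh: the DNS server is not asked
        ip = self._lookup(hostname)
        self._cache[hostname] = (ip, now + self._ttl)
        return ip


def make_round_robin_dns(ips):
    """Return a lookup function that rotates through the given addresses."""
    state = {"next": 0}

    def lookup(hostname):
        ip = ips[state["next"] % len(ips)]
        state["next"] += 1
        return ip

    return lookup
```

As long as `now` stays inside the TTL window, every request from this client goes to the same address, no matter how the DNS server would have rotated.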

So Axle's proposal is actually very good, since it asks the load balancer to handle all of the requests. Even if we do not query the backend servers about how busy they are, we can still use methods like randomness or round robin. These may still put too much work on the same server, but DNS caching will no longer impair anything.

Then give an example: if our backend servers are PHP based and PHP sessions break, how does this load-balancing model affect us?

Major reason: sessions, recall, tend to be specific to a given machine.

A bigger problem than being prompted for a password again: you cannot check out all the stuff in your shopping cart.

One solution: have a dedicated PHP server. Drawback: there is no redundancy, and if you are popular, this one server gets a lot of work.

The teacher analyzes the catch: there is no more round robin, because you have to send Alice to that same machine again and again.

Continue the idea of factoring out: we do not have a file server here, but we have a server that stores all the session state.
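The difference between per-machine sessions and a factored-out state server can be sketched as follows. This is only an illustration: a plain dict stands in for the state server, and the `Backend` class and its method names are made up for the example.

```python
class Backend:
    """A web server that keeps shopping carts in a session store."""

    def __init__(self, name, session_store=None):
        self.name = name
        # Each machine gets its own private dict unless a shared store is
        # passed in (the dict stands in for a separate state server).
        self.sessions = session_store if session_store is not None else {}

    def add_to_cart(self, session_id, item):
        self.sessions.setdefault(session_id, []).append(item)

    def cart(self, session_id):
        return list(self.sessions.get(session_id, []))
```

With private stores, an item Alice adds on one backend is invisible on the next backend she is sent to. When both backends are given the same store, the cart follows her, at the price of that store becoming a single point of failure.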

(Some students proposed putting the state in the load balancer, but that is not good. The teacher then asked how to improve the robustness and how to increase the redundancy.)

This is where RAID comes in.


You run to the store put in a new one and you haven’t lost any data. RAID6 is even better, what does RAID6 do, do you think? Axle?



Part III: First continue the talk about RAID 6 –> then propose some general methods of using redundancy to increase stability –> the drawback is that it still cannot handle the downtime if your single server goes down –> talk about the load balancer itself –> PHP acceleration –> static website drawback –> MySQL query cache

Although Axle's and the teacher's solutions are fairly good, they still cannot handle the downtime when the server holding all the shared state goes down.

So we need replication.

Then the teacher talks more about the load balancer itself (it is quite expensive).

Then, after the break, the teacher comes back to the sticky session issue: if you want to preserve your session even though there are lots of backend servers, there is still a way to achieve it.

A cookie gives you a solution.

But the cookie still cannot provide a perfect solution. The problem is NOT THE EXPIRATION BUT THE IP CHANGES.

So the teacher proposes a method to deal with this issue: we do not store the IP address in the cookie (it might have changed), but a big random number, like PHP does for its sessions.
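One way to picture the random-number cookie is a load balancer that keeps a private table from token to backend. The sketch below is an assumption about how such a table might look, not a description of any particular product; the class name, the token length, and the rotation used for new visitors are illustrative choices.

```python
import secrets
from itertools import cycle


class StickyBalancer:
    """Pin each visitor to one backend using an opaque random cookie value."""

    def __init__(self, backends):
        self._rotation = cycle(list(backends))
        self._token_to_backend = {}    # kept inside the balancer only

    def route(self, cookie_token=None):
        """Return (token, backend); unknown or missing tokens get a new pair."""
        backend = self._token_to_backend.get(cookie_token)
        if backend is None:
            cookie_token = secrets.token_hex(16)   # big random number
            backend = next(self._rotation)
            self._token_to_backend[cookie_token] = backend
        return cookie_token, backend
```

The cookie reveals nothing about the internal network, and if a backend's IP address changes, only the balancer's table has to be updated.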

Then the teacher talks about how to compile PHP code to gain some efficiency.

And it comes back to caching for dynamic websites. The key example is Craigslist.

Craigslist is a dynamic website.
The key property of a dynamic website is that you can submit a form and see the website change as a result.
Craigslist uses a way to speed things up: it does not store a post only in the database and re-render it on each request, but stores the HTML file directly.

Then the teacher analyzes why serving the HTML file speeds things up.

What we did in project 0: regenerate the page on the fly. Craigslist: does not have to regenerate it.

The upside of static HTML: Apache is really quick at serving static content.

Downside of this file-based caching:

  1. Space
  2. Redundancy: same body tag, same tail tag, same pre-process tag in every file (a better way is to use a template)
  3. Another big catch: YOU CANNOT CHANGE THE TEMPLATE IN ONE PLACE (for example a CSS color); you have to regenerate all the pages

The teacher concludes that it is not a good solution, and that very few people on the internet do this (though Craigslist actually does both).

Then the teacher begins to talk about MySQL's query cache.

Memory cache: a piece of software that saves results in RAM.
If the user is not already in the cache, you look it up and store it in the cache.
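The lookup pattern described here can be sketched in a few lines. The dict plays the role of the in-RAM cache and `load_from_db` is a hypothetical function standing in for the real database query; neither is taken from the lecture.

```python
class UserCache:
    """Check RAM first; on a miss, ask the database and remember the answer."""

    def __init__(self, load_from_db):
        self._load_from_db = load_from_db   # hypothetical: user_id -> user record
        self._ram = {}                      # stands in for the memory cache
        self.db_hits = 0                    # counts how often the database is used

    def get_user(self, user_id):
        user = self._ram.get(user_id)
        if user is None:
            # Cache miss: do the expensive lookup once, then store the result.
            user = self._load_from_db(user_id)
            self.db_hits += 1
            self._ram[user_id] = user
        return user
```

Repeated reads of the same user touch the database only once, which is where the speed-up for read-heavy sites comes from.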


You’re storing it in the caching memory.



Part IV From Cache –> Cache Size (cache garbage collection) –> Two Types of Database –> database backup

First the lecturer explains that because RAM is finite, the cache has a size limit and some content will be evicted. Facebook is a read-heavy website, so it can benefit a lot from this mechanism (an opportunity for optimization).
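The notes do not say which eviction policy is used, so the following is only a sketch of one common choice, least recently used (LRU): when the cache is full, the entry that has gone longest without being read or written is dropped. The class name and capacity are illustrative.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-size cache that evicts the least recently used entry when full."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        self._items = OrderedDict()    # oldest entry first, newest last

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)   # a read marks the entry as recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)   # drop the oldest entry

    def __contains__(self, key):
        return key in self._items
```

On a read-heavy site, the popular entries keep getting touched and therefore stay in RAM, while rarely requested ones fall out.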

The conclusion is that a memory cache is a good mechanism in front of the database, so the teacher begins to talk about the database. The kinds of database (storage engines) covered are MyISAM and InnoDB.

The conclusion is that InnoDB supports transactions, while MyISAM uses locks, which are full-table locks. Data kept only in RAM is lost if the power goes off. Then there is the Archive storage engine.

The Archive engine does not cache but stores: every read needs a query. Its footprint is smaller than with the other storage engines, since it is compressed by default. As for replication, we can use NDB to achieve it. [Anytime a query is executed on the master, that same query is copied down to one or more slaves and they do exactly the same thing.]
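The bracketed description of replication can be illustrated with a toy model. It uses in-memory SQLite databases from Python's standard library as stand-ins for MySQL servers and replays each statement immediately on every replica; real MySQL replication is not implemented this way, and the class name is invented for the example.

```python
import sqlite3


class ReplicatedMaster:
    """Toy master that replays every statement on each of its replicas."""

    def __init__(self, replica_count):
        # In-memory SQLite databases stand in for separate MySQL servers.
        self.master = sqlite3.connect(":memory:")
        self.replicas = [sqlite3.connect(":memory:") for _ in range(replica_count)]

    def execute(self, sql, params=()):
        # Run the statement on the master, then the same statement on each replica.
        for db in [self.master, *self.replicas]:
            db.execute(sql, params)
            db.commit()
```

After a write goes through `execute`, any replica can answer a read with the same data, which is what makes the replicas useful both as backups and as extra read capacity.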

Then the teacher asks about the advantages of this mechanism, which gives us replicas of the database. The first is that you do not suffer from downtime. The second is load balancing for the database.

Then the teacher analyzes how load balancing helps performance. You could write your code in such a way that any SELECT statements go to database two, three or four and any INSERTs go to server one. So the balancing is done in code.
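A minimal sketch of doing this split in application code, assuming the statement type can be read from its first keyword. The server names are placeholders, and the rotation among the read servers is an illustrative choice rather than something the lecture prescribes.

```python
from itertools import cycle


class QueryRouter:
    """Send SELECTs to the read replicas in turn and everything else to the master."""

    def __init__(self, master, replicas):
        self._master = master
        self._replicas = cycle(list(replicas)) if replicas else None

    def route(self, sql):
        words = sql.split(None, 1)
        first_word = words[0].upper() if words else ""
        if first_word == "SELECT" and self._replicas is not None:
            return next(self._replicas)
        # INSERT, UPDATE, DELETE and anything unrecognised go to the master.
        return self._master
```

For example, `QueryRouter("db1", ["db2", "db3", "db4"])` sends successive SELECT statements to `db2`, `db3`, `db4` and every INSERT to `db1`.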

But the previous method has a drawback if we only have one writing (master) server, so the solution is to have two masters: if one goes down, the other can be used. MySQL also supports that.

The question is that we still have the tracking issue of how to connect the master MySQL to the slave MySQL.


For reads with a load balancer.



Part V From the route between the MySQL server and Load Balancer –> talk about why two load balancers are needed –> then the teacher begins to build his own network on the board –> then big switches come in and we get the conclusion that everything can fail.

First the professor continues with the question of how to deal with the tracking issue: the load balancer is a bottleneck whose failure ruins everything. So one solution is to use two load balancers.

The teacher continues this complicated topic. In the early phase of building Facebook, a really silly method was used. If you were at Harvard but wanted to send a message to someone at MIT, you had to cross a boundary, and early on some features were restricted.

The solution is to use partitioning; for example, you can group users by the letter their last name starts with, such as M. It is not a bad idea, and it is common for databases, because you can still have redundancy with a whole bunch of slaves for each partition. And you can balance the load based on some high-level information.
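Partitioning by last name might be sketched like this. The two letter ranges and the shard names are hypothetical; the notes only say that users can be grouped by the first letter of their last name.

```python
# Hypothetical partitions: inclusive letter ranges mapped to database names.
SHARDS = [
    ("A", "M", "db-a-m"),
    ("N", "Z", "db-n-z"),
]


def shard_for(last_name, shards=SHARDS):
    """Return the database responsible for a user, based on the last name's initial."""
    if not last_name or not last_name[0].isalpha():
        raise ValueError("last name must start with a letter")
    initial = last_name[0].upper()
    for low, high, name in shards:
        if low <= initial <= high:
            return name
    raise LookupError(f"no shard covers initial {initial!r}")
```

Each shard can still have its own slaves for redundancy. The split is only as even as the distribution of last names, so real ranges would need tuning.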

After drawing so many wired-up things, the teacher begins to build his own network, including the firewall. With two servers, the questions are how to connect them to the internet and how to implement sticky sessions.

One solution is to use shared state, but it is a little expensive, so the question is how to keep sessions sticky without a dedicated shared-state server.

Axle provides a solution: let the load balancer listen to all the HTTP sessions (the load balancer stores a cookie, such as a big random number).

But letting the load balancer store the cookie is sometimes not a good idea: when the user switches to another computer, the cookie is not there, so no profile is available. (If access to your data depends on state held in the load balancer, your data cannot be reached when you log in from a different computer.) The solution is to partition our users and let the load balancer route on a key feature (like the user's last name) to the proper database. This may solve some corner cases but introduces new problems, including a single point of failure.

To survive a database failure, we can attach slave databases to the master. But with slaves only, we end up in the situation where a slave has to be promoted to master. So we use two master databases.

But with two master databases, it is not a good idea to wire everything to both of them directly. We would have cross connections, and our developers would end up needing to understand the network topology. This is not a good layer of abstraction.

To solve the connection issue with multiple databases, one option is to insert a load balancer here and let it handle things intelligently. The problem is that MySQL databases communicate in a binary format, so this is not easy to implement.

So that is not a solution; we use switches. A big switch can handle the connections, but the big switch can also fail, for instance because of its switching algorithm. So we take care of this too and put the data center in one room, which leaves a question: what if the building burns down?

This is why Amazon EC2 offers redundancy.

So the board now has two data centers, and the question is how to distribute traffic between them. The answer is that you can do it at the DNS level. Since different data centers will not be in the same building, and may even be in different countries, we need to take geography into account. With Google, for instance, your session will stay in the same building for a little while, since sharing a session across different continents can be really expensive. (Potential downtime: if a building goes out, your cache still leads you to the same building.)

So ISAC has given us a pretty good hint: avoiding failure is very hard, and everything has to become even more replicated.


So a word on security then.



Part VI: First, what internet traffic can come in if I am hosting a website?

First talk about port numbers: 80, 443, and 22 for SSH.

Then what about the load balancer: what traffic should I allow from the load balancer to my web servers? To be more specific: it is really a hassle to keep traffic encrypted there, and once it is inside the data center, nobody else is going to be listening in.

The solution is to offload SSL at the load balancer. Everything behind it is unencrypted, and you do not need to put your SSL certificate on all of your web servers.

Then the teacher asks what kind of traffic flows between the web servers and the database. It is TCP 3306.

Then how do we use a firewall to implement that? One crude method is to plug the tiers into different switches, where a switch can act as a firewall itself, so each switch only allows a subset of port numbers. But a loosely configured switch could itself be the problem and invite someone to abuse it intentionally.

A better setup is one where the only thing this server can do is talk via MySQL to that server; it cannot, for instance, suddenly SSH to that server, poke around, or execute any command on your network other than MySQL.
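The policy from this part can be written down as an allow-list: anything not listed is denied. The tier names are labels invented for this sketch, and port 80 between the load balancer and the web servers is an assumption that follows from offloading SSL at the load balancer; the other ports are the ones named in the notes.

```python
# (source tier, destination tier, TCP port); everything else is denied.
ALLOWED = {
    ("internet", "load_balancer", 80),    # HTTP
    ("internet", "load_balancer", 443),   # HTTPS, terminated at the load balancer
    ("internet", "load_balancer", 22),    # SSH, as listed in the notes
    ("load_balancer", "web", 80),         # assumed: plain HTTP after SSL offload
    ("web", "database", 3306),            # MySQL
}


def is_allowed(src, dst, port, rules=ALLOWED):
    """Return True only if this exact flow appears in the allow-list."""
    return (src, dst, port) in rules
```

Under these rules a compromised web server can still speak MySQL to the database, but `is_allowed("web", "database", 22)` is false, so it cannot SSH in and poke around.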

Finally, our conclusion is: when you solve some problems, new problems will occur.

