How do large Web sites handle the load of millions of visitors a day?

A woman checks a shopping website on her laptop while holding her phone. In many cases, a single small machine connected to the internet can handle all of a website's users. Wera Rodsawang / Getty Images

One of the surprising things about Web sites is that, in certain cases, a very small machine can handle a huge number of visitors. For example, imagine that you have a simple Web site containing a number of static pages (in this case, "static" means that everybody sees the same version of any page when they view it). If you took a normal 500-MHz Celeron machine running Windows NT or Linux, loaded the Apache Web server on it, and connected this machine to the Internet with a T3 line (45 million bits per second), you could handle hundreds of thousands of visitors per day. Many ISPs will rent you a dedicated-machine configuration like this for $1,000 or less per month. This configuration will work great unless:

  • You need to handle millions of visitors per day.
  • The single machine fails (in which case your site will be down until a new machine is installed and configured).
  • The pages are extremely large or complicated.
  • The pages need to change dynamically on a per-user basis.
  • Any back-end processing needs to be performed to create the contents of the page or to process a request on the page.
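
The capacity claim above is easy to sanity-check with back-of-envelope arithmetic. The sketch below divides the T3 line's bandwidth by an assumed average page size (the 50 KB figure is an illustrative assumption, not from the article) to estimate how many static pages the link could serve per day at full saturation:

```python
# Rough capacity estimate for the T3 setup described above.
# The 50 KB average page size is an assumption for illustration.
T3_BITS_PER_SECOND = 45_000_000     # T3 line: 45 million bits per second
AVG_PAGE_BYTES = 50 * 1024          # assumed average static page size
SECONDS_PER_DAY = 86_400

page_bits = AVG_PAGE_BYTES * 8
pages_per_second = T3_BITS_PER_SECOND / page_bits
pages_per_day = pages_per_second * SECONDS_PER_DAY

print(f"{pages_per_second:.0f} pages/second")
print(f"{pages_per_day / 1e6:.1f} million pages/day")
```

Even with generous page sizes, the line itself can move millions of pages a day, so "hundreds of thousands of visitors" is well within reach for static content; the bottlenecks listed above are what break this simple setup.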

Because most large Web sites meet one or more of these conditions, they need significantly larger infrastructures.


There are three main strategies for handling the load:

  1. The site can invest in a single huge machine with lots of processing power, memory, disk space and redundancy.
  2. The site can distribute the load across a number of machines.
  3. The site can use some combination of the first two options.

When you visit a site that has a different URL every time you visit (for example www1.xyz.com, www2.xyz.com, www3.xyz.com, etc.), then you know that the site is using the second approach at the front end. Typically the site will have an array of stand-alone machines that are each running Web server software. They all have access to an identical copy of the pages for the site. The incoming requests for pages are spread across all of the machines in one of two ways:

  • The Domain Name Server (DNS) for the site can distribute the load. DNS is an Internet service that translates domain names into IP addresses. Each time a request is made for the Web server, DNS rotates through the available IP addresses in a circular way to share the load. The individual servers would have common access to the same set of Web pages for the site.
  • Load-balancing switches can distribute the load. All requests for the Web site arrive at a machine that then passes the request to one of the available servers. The switch can find out from the servers which one is least loaded, so all of them are doing an equal amount of work. This is the approach that HowStuffWorks uses with its servers. The load balancer spreads the load among three different Web servers. One of the three can fail with no effect on the site.
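
The first approach, round-robin DNS, can be sketched in a few lines: the name server simply hands out the next address in a circular list on each lookup. This is a simplified illustration (the class name and IP addresses are made up; real DNS servers also deal with caching and TTLs):

```python
from itertools import cycle

# Round-robin DNS sketch: each lookup returns the next server IP in a
# circular rotation, spreading visitors across identical machines.
class RoundRobinDNS:
    def __init__(self, ips):
        self._pool = cycle(ips)          # endless circular iterator

    def resolve(self, hostname):
        # A real resolver would consult zone records; here we just rotate.
        return next(self._pool)

dns = RoundRobinDNS(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
print([dns.resolve("www.xyz.com") for _ in range(4)])
# Four lookups cycle through: 10.0.0.1, 10.0.0.2, 10.0.0.3, 10.0.0.1
```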

The advantage of this redundant approach is that the failure of any one machine does not cause a problem -- the other machines pick up the load. It is also easy to add capacity in an incremental way. The disadvantage is that these machines will still have to talk to some sort of centralized database if there is any transaction processing going on.
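
The second approach, a load-balancing switch that favors the least-loaded server and routes around failures, can be sketched like this (server names and the health-flag mechanism are hypothetical simplifications; real balancers use health checks and connection tracking):

```python
# Least-loaded balancer sketch: each request goes to the healthy server
# with the fewest active connections, so a failed machine is simply
# skipped and the others pick up its load.
class Server:
    def __init__(self, name):
        self.name = name
        self.active = 0        # connections currently being served
        self.healthy = True    # set False when the machine fails

class LoadBalancer:
    def __init__(self, servers):
        self.servers = servers

    def route(self):
        candidates = [s for s in self.servers if s.healthy]
        if not candidates:
            raise RuntimeError("no healthy servers")
        target = min(candidates, key=lambda s: s.active)
        target.active += 1
        return target

lb = LoadBalancer([Server("web1"), Server("web2"), Server("web3")])
lb.servers[0].healthy = False           # simulate one machine failing
print({lb.route().name for _ in range(6)})
```

With "web1" down, all six requests land on "web2" and "web3", mirroring how one of HowStuffWorks' three servers could fail with no effect on the site.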

Microsoft's TerraServer takes the "single large machine" approach. TerraServer stores several terabytes of satellite imagery data and handles millions of requests for this information. The site uses huge enterprise-class machines to handle the load. For example, a single Digital AlphaServer 8400 used at TerraServer has eight 440-MHz 64-bit processors and 10 GB of error-checked and corrected RAM. See the technology description for some truly impressive specifications!
