Google is a multi-billion-dollar company. It's one of the big power players on the World Wide Web and beyond. The company relies on a distributed computing system to provide users with the infrastructure they need to access, create and alter data. Surely Google buys state-of-the-art computers and servers to keep things running smoothly, right?
Wrong. The machines that power Google's operations aren't cutting-edge, high-powered computers with lots of bells and whistles. In fact, they're relatively inexpensive machines running Linux operating systems. How can one of the most influential companies on the Web rely on cheap hardware? The answer is the Google File System (GFS), which capitalizes on the strengths of off-the-shelf servers while compensating for their hardware weaknesses. It's all in the design.
Google uses the GFS to organize and manipulate huge files and to give application developers the research and development resources they need. The GFS is unique to Google and isn't for sale, but it could serve as a model for file systems at organizations with similar needs.
Some GFS details remain a mystery to anyone outside Google. For example, Google doesn't reveal how many computers it uses to operate the GFS. In official Google papers, the company says only that there are "thousands" of computers in the system (source: Google). Despite this veil of secrecy, Google has made much of the GFS's structure and operation public knowledge.
So what exactly does the GFS do, and why is it important? Find out in the next section.