Google platform

Google data centers are the large data center facilities Google uses to provide its services, which combine large drives, computer nodes organized in aisles of racks, internal and external networking, environmental controls (mainly cooling and humidification control), and operations software (especially as concerns load balancing and fault tolerance). There is no official data on how many servers are in Google data centers, but Gartner estimated in a July 2016 report that Google at the time had 2.5 million servers. This number is changing as the company expands capacity and refreshes its hardware.


Locations

The locations of Google's various data centers by continent are as follows:


North America

# Berkeley County, South Carolina: since 2007, expanded in 2013, 150 employees
# Council Bluffs, Iowa: announced 2007, first phase completed 2009, expanded 2013 and 2014, 130 employees
# Douglas County, Georgia: since 2003, 350 employees
# Bridgeport, Alabama: broke ground in 2018
# Lenoir, North Carolina: announced 2007, completed 2009, over 110 employees
# Montgomery County, Tennessee: announced 2015
# Mayes County, Oklahoma, at MidAmerica Industrial Park: announced 2007, expanded 2012, over 400 employees
# The Dalles, Oregon: since 2006, 80 full-time employees
# Reno, Nevada: announced in 2018; 1,210 acres of land bought in 2017 in the Tahoe Reno Industrial Center; project approved by the state of Nevada in November 2018
# Henderson, Nevada: announced in 2019; 64 acres; $1.2B building costs
# Loudoun County, Virginia: announced in 2019
# Northland, Kansas City: announced in 2019, under construction
# Midlothian, Texas: announced in 2019; 375 acres; $600M building costs
# New Albany, Ohio: announced in 2019; 400 acres; $600M building costs
# Papillion, Nebraska: announced in 2019; 275 acres; $600M building costs
# Beauharnois, Québec: announced in 2021; 62.4 hectares; $600M building costs
# Salt Lake City, Utah: announced in 2020


South America

# Quilicura, Chile: announced 2012, online since 2015, up to 20 employees expected. A million investment plan to increase capacity at Quilicura was announced in 2018.
# Cerrillos, Chile: announced for 2020
# Colonia Nicolich, Uruguay: announced 2019


Europe

# Saint-Ghislain, Belgium: announced 2007, completed 2010, 12 employees
# Hamina, Finland: announced 2009, first phase completed 2011, expansion ongoing; as of 2022, 6 buildings and 400 employees
# Dublin, Ireland: announced 2011, completed 2012, 150 employees
# Eemshaven, Netherlands: announced 2014, completed 2016, 200 employees, €500 million expansion announced in 2018
# Hollands Kroon (Agriport), Netherlands: announced 2019
# Fredericia, Denmark: announced 2018, €600M building costs, completed in November 2020
# Zürich, Switzerland: announced in 2018, completed 2019
# Warsaw, Poland: announced in 2019, completed in 2021


Asia

# Jurong West, Singapore: announced 2011, completed 2013
# Changhua County, Taiwan: announced 2011, completed 2013, 60 employees
# Mumbai, India: announced 2017, completed 2019
# Tainan City, Taiwan: announced September 2019
# Yunlin County, Taiwan: announced September 2020
# Jakarta, Indonesia: announced in 2020, opened in September 2021
# New Delhi, India: announced in 2020, completed in July 2021


Hardware


Original hardware

The original hardware (circa 1998) that was used by Google when it was located at Stanford University included:
* Sun Microsystems Ultra II with dual 200 MHz processors and 256 MB of RAM. This was the main machine for the original Backrub system.
* 2 × 300 MHz dual Pentium II servers donated by Intel, with 512 MB of RAM and 10 × 9 GB hard drives between the two. The main search ran on these.
* F50 IBM RS/6000 donated by IBM, with 4 processors, 512 MB of memory and 8 × 9 GB hard disk drives.
* Two additional boxes with 3 × 9 GB hard drives and 6 × 4 GB hard disk drives respectively (the original storage for Backrub). These were attached to the Sun Ultra II.
* SDD disk expansion box with another 8 × 9 GB hard disk drives donated by IBM.
* Homemade disk box which contained 10 × 9 GB SCSI hard disk drives.


Production hardware

As of 2014, Google has used a heavily customized version of Debian Linux. They migrated from a Red Hat-based system incrementally in 2013.

The customization goal is to purchase CPU generations that offer the best performance per dollar, not absolute performance. How this is measured is unclear, but it is likely to incorporate running costs of the entire server, and CPU power consumption could be a significant factor. Servers as of 2009–2010 consisted of custom-made open-top systems containing two processors (each with several cores), a considerable amount of RAM spread over 8 DIMM slots housing double-height DIMMs, and at least two SATA hard disk drives connected through a non-standard ATX-sized power supply unit. The servers were open top so more servers could fit into a rack. According to CNET and a book by John Hennessy, each server had a novel 12-volt battery to reduce costs and improve power efficiency (Computer Architecture: A Quantitative Approach, Fifth Edition, Chapter Six, 6.7 "A Google Warehouse-Scale Computer", page 471: "Designing motherboards that only need a single 12-volt supply so that the UPS function could be supplied by standard batteries associated with each server").

According to Google, their global data center operation electrical power ranges between 500 and 681 megawatts. The combined processing power of these servers might have reached from 20 to 100 petaflops in 2008.
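
The "performance per dollar" criterion mentioned in this section can be illustrated with a small sketch. Since how Google actually measures it is unclear, everything here is an assumption: the `CpuOption` fields, the three-year service life, the electricity price and the sample figures are hypothetical and chosen only to show how running costs can change which option wins.

```python
from dataclasses import dataclass


@dataclass
class CpuOption:
    """One candidate server configuration (all values are hypothetical)."""
    name: str
    relative_performance: float  # arbitrary benchmark units
    purchase_cost_usd: float     # up-front price of the whole server
    power_draw_watts: float      # average draw of the whole server


def lifetime_cost_usd(option, years=3.0, usd_per_kwh=0.10):
    """Purchase price plus electricity over the assumed service life."""
    hours = years * 365 * 24
    energy_kwh = option.power_draw_watts / 1000.0 * hours
    return option.purchase_cost_usd + energy_kwh * usd_per_kwh


def performance_per_dollar(option, years=3.0, usd_per_kwh=0.10):
    """Benchmark units delivered per dollar of lifetime cost."""
    return option.relative_performance / lifetime_cost_usd(option, years, usd_per_kwh)


def best_option(options, years=3.0, usd_per_kwh=0.10):
    """Pick the option with the highest performance per lifetime dollar."""
    return max(options, key=lambda o: performance_per_dollar(o, years, usd_per_kwh))


if __name__ == "__main__":
    candidates = [
        CpuOption("fastest", 100.0, 4000.0, 400.0),
        CpuOption("midrange", 80.0, 2500.0, 250.0),
    ]
    for c in candidates:
        print(c.name, round(performance_per_dollar(c), 5))
    print("best:", best_option(candidates).name)
```

With these made-up figures the slower, cheaper and less power-hungry option scores higher, which is the kind of trade-off the paragraph describes.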


Network topology

Details of the Google worldwide private networks are not publicly available, but Google publications make references to the "Atlas Top 10" report that ranks Google as the third largest ISP behind Level 3.

In order to run such a large network, with direct connections to as many ISPs as possible at the lowest possible cost, Google has a very open peering policy. Its public peering listing shows that the Google network can be accessed from 67 public exchange points and 69 different locations across the world. As of May 2012, Google had 882 Gbit/s of public connectivity (not counting private peering agreements that Google has with the largest ISPs). This public network is used to distribute content to Google users as well as to crawl the internet to build its search indexes. The private side of the network is a secret, but a disclosure from Google indicates that they use custom-built high-radix switch-routers (with a capacity of 128 × 10 Gigabit Ethernet ports) for the wide area network. With no fewer than two routers per datacenter (for redundancy), the Google network scales in the terabit per second range (with two fully loaded routers the bisection bandwidth amounts to 1,280 Gbit/s). These custom switch-routers are connected to DWDM devices to interconnect data centers and points of presence (PoP) via dark fiber.

From a datacenter view, the network starts at the rack level, where 19-inch racks are custom-made and contain 40 to 80 servers (20 to 40 1U servers on either side, while new servers are 2U rackmount systems) (Luiz André Barroso, Jeffrey Dean, Urs Hölzle, "Web Search for a Planet: The Google Cluster Architecture"). Each rack has an Ethernet switch. Servers are connected via a 1 Gbit/s Ethernet link to the top of rack switch (TOR). TOR switches are then connected to a gigabit cluster switch using multiple gigabit or ten gigabit uplinks. The cluster switches themselves are interconnected and form the datacenter interconnect fabric (most likely using a dragonfly design rather than a classic butterfly or flattened butterfly layout).

From an operation standpoint, when a client computer attempts to connect to Google, several DNS servers resolve www.google.com into multiple IP addresses via a round-robin policy. This acts as the first level of load balancing and directs the client to different Google clusters. A Google cluster has thousands of servers, and once the client has connected to the cluster, additional load balancing is done to send the queries to the least loaded web server. This makes Google one of the largest and most complex content delivery networks.

Google has numerous data centers scattered around the world. At least 12 significant Google data center installations are located in the United States. The largest known centers are located in The Dalles, Oregon; Atlanta, Georgia; Reston, Virginia; Lenoir, North Carolina; and Moncks Corner, South Carolina. In Europe, the largest known centers are in Eemshaven and Groningen in the Netherlands and Mons, Belgium. Google's Oceania Data Center is located in Sydney, Australia.
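
The two levels of load balancing described in this section (round-robin DNS across clusters, then least-loaded selection inside a cluster) can be sketched roughly as follows. This is a minimal illustration, not Google's implementation: the class names, the in-memory load counters and the addresses (taken from the ranges reserved for documentation) are all assumptions.

```python
import itertools


class RoundRobinDns:
    """First level: hand out cluster addresses in strict rotation."""

    def __init__(self, cluster_addresses):
        addresses = list(cluster_addresses)
        if not addresses:
            raise ValueError("need at least one cluster address")
        self._cycle = itertools.cycle(addresses)

    def resolve(self):
        """Return the next cluster address in the rotation."""
        return next(self._cycle)


class Cluster:
    """Second level: track in-flight queries and pick the least loaded server."""

    def __init__(self, server_names):
        self.load = {name: 0 for name in server_names}

    def dispatch(self):
        """Assign one query to the server with the fewest in-flight queries."""
        server = min(self.load, key=self.load.get)
        self.load[server] += 1
        return server

    def finish(self, server):
        """Mark one query on the given server as completed."""
        self.load[server] -= 1


def route_query(dns, clusters):
    """Resolve a cluster via DNS rotation, then pick a server inside it."""
    address = dns.resolve()
    return address, clusters[address].dispatch()


if __name__ == "__main__":
    # Hypothetical clusters keyed by documentation-only IP addresses.
    clusters = {
        "192.0.2.1": Cluster(["a1", "a2"]),
        "198.51.100.1": Cluster(["b1"]),
    }
    dns = RoundRobinDns(clusters)
    for _ in range(4):
        print(route_query(dns, clusters))
```

Real round-robin DNS rotates the order of the returned address list rather than returning a single address, and clients may cache answers, so the first level spreads load only approximately; the second level is what evens things out inside a cluster.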


Data center network topology

To support fault tolerance, increase the scale of data centers and accommodate low-radix switches, Google has adopted various modified Clos topologies in the past.
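
To show why Clos topologies help a fabric scale when only low-radix switches are available, the sketch below sizes two textbook folded-Clos layouts built from identical switches: a two-tier leaf-spine fabric and a three-tier k-ary fat-tree. These are generic designs, not Google's modified variants, and the function names are hypothetical.

```python
def two_tier_clos(radix):
    """Size a non-blocking leaf-spine fabric built from radix-port switches."""
    if radix < 2 or radix % 2:
        raise ValueError("radix must be an even number >= 2")
    half = radix // 2
    return {
        "spines": half,          # each leaf has one uplink to every spine
        "leaves": radix,         # each spine port connects to one leaf
        "hosts": radix * half,   # each leaf serves radix/2 hosts
    }


def three_tier_fat_tree(radix):
    """Size a k-ary fat-tree (three-tier folded Clos) with k = radix."""
    if radix < 2 or radix % 2:
        raise ValueError("radix must be an even number >= 2")
    half = radix // 2
    return {
        "pods": radix,
        "edge_switches": radix * half,         # k/2 per pod
        "aggregation_switches": radix * half,  # k/2 per pod
        "core_switches": half * half,          # (k/2)^2
        "hosts": radix * half * half,          # k^3 / 4
    }


if __name__ == "__main__":
    for k in (4, 48):
        print(k, two_tier_clos(k)["hosts"], three_tier_fat_tree(k)["hosts"])
```

Adding a tier multiplies the number of attachable hosts without requiring larger switches, and because every leaf reaches every spine over a separate link, losing one spine removes only a fraction of the capacity instead of disconnecting hosts.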


Project 02

One of the largest Google data centers is located in the town of The Dalles, Oregon, on the Columbia River, approximately 80 miles (129 km) from Portland. Codenamed "Project 02", the complex was built in 2006 and is approximately the size of two American football fields, with cooling towers four stories high (Markoff, John; Hansell, Saul. "Hiding in Plain Sight, Google Seeks More Power". New York Times. June 14, 2006. Retrieved on October 15, 2008). The site was chosen to take advantage of inexpensive hydroelectric power, and to tap into the region's large surplus of fiber optic cable, a remnant of the dot-com boom. A blueprint of the site appeared in 2008.


Summa papermill

In February 2009, Stora Enso announced that they had sold the Summa paper mill in Hamina, Finland to Google for 40 million euros. Google invested 200 million euros in the site to build a data center and announced an additional 150 million euro investment in 2012. Google chose this location due to the availability and proximity of renewable energy sources.


Modular container data centers

In 2005, Google was researching a containerized modular data center. Google filed a patent application for this technology in 2003.


Floating data centers

In 2013, the press revealed the existence of Google's floating data centers along the coasts of the states of California (Treasure Island's Building 3) and Maine. The development project was maintained under tight secrecy. The data centers are 250 feet long, 72 feet wide and 16 feet deep. The patent for an in-ocean data center cooling technology was bought by Google in 2009 (along with a wave-powered ship-based data center patent in 2008). Shortly thereafter, Google declared that the two massive and secretly built infrastructures were merely "interactive learning centers, ... a space where people can learn about new technology." Google halted work on the barges in late 2013 and began selling off the barges in 2014.


Software

Most of the software stack that Google uses on its servers was developed in-house. According to a well-known Google employee, C++, Java, Python and (more recently) Go are favored over other programming languages. For example, the back end of Gmail is written in Java and the back end of Google Search is written in C++. Google has acknowledged that Python has played an important role from the beginning, and that it continues to do so as the system grows and evolves.

The software that runs the Google infrastructure includes:
* Google Web Server (GWS): custom Linux-based web server that Google uses for its online services.
* Storage systems:
** Google File System and its successor, Colossus
** Bigtable: structured storage built upon GFS/Colossus
** Spanner: planet-scale database, supporting externally consistent distributed transactions
** Google F1: a distributed, quasi-SQL DBMS based on Spanner, replacing a custom version of MySQL
* Chubby lock service
* MapReduce and the Sawzall programming language
* Indexing/search systems:
** TeraGoogle: Google's large search index (launched in early 2006)
** Caffeine (Percolator): continuous indexing system (launched in 2010)
** Hummingbird: major search index update, including complex search and voice search
* Borg: declarative process scheduling software

Google has developed several abstractions which it uses for storing most of its data:
* Protocol Buffers: "Google's lingua franca for data", a binary serialization format which is widely used within the company.
* SSTable (Sorted Strings Table): a persistent, ordered, immutable map from keys to values, where both keys and values are arbitrary byte strings. It is also used as one of the building blocks of Bigtable.
* RecordIO: a sequence of variable-sized records.

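The SSTable abstraction listed above (an ordered, immutable map from byte-string keys to byte-string values) can be illustrated with a minimal Python sketch. This is not Google's implementation or file format: the class name `SortedStringTable` and its `get` and `scan` methods are hypothetical, and the table lives in memory rather than being persisted to disk.

```python
import bisect
from typing import Iterable, Iterator, Optional, Tuple


class SortedStringTable:
    """Toy in-memory stand-in for an SSTable (hypothetical API)."""

    def __init__(self, items: Iterable[Tuple[bytes, bytes]]) -> None:
        # Sort once at build time; if a key repeats, the last value wins.
        merged = dict(items)
        self._keys = tuple(sorted(merged))
        self._values = tuple(merged[k] for k in self._keys)
        # Tuples are used so the table cannot be modified after construction.

    def get(self, key: bytes) -> Optional[bytes]:
        # Binary search over the sorted keys.
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

    def scan(self, start: bytes, end: bytes) -> Iterator[Tuple[bytes, bytes]]:
        # Yield entries with start <= key < end, in key order.
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        for i in range(lo, hi):
            yield self._keys[i], self._values[i]


table = SortedStringTable([
    (b"com.example/b", b"2"),
    (b"com.example/a", b"1"),
    (b"org.demo/x", b"9"),
])
```

Because the keys are kept sorted, both point lookups and range scans reduce to binary searches, which is one reason an ordered immutable map is a convenient building block for a larger store.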

Software development practices

Most operations are read-only. When an update is required, queries are redirected to other servers, so as to simplify consistency issues. Queries are divided into sub-queries, which may be sent to different servers in parallel, thus reducing latency. To lessen the effects of unavoidable hardware failure, software is designed to be fault tolerant. Thus, when a system goes down, data is still available on other servers, which increases reliability.
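The two ideas in this section, sending sub-queries out in parallel and falling back to another server when one is down, can be sketched as follows. This is only an illustration under stated assumptions: the names `ask_with_failover` and `fan_out`, the modelling of a replica as a plain callable, and the use of `ConnectionError` to signal a failed server are all hypothetical and not taken from Google's systems.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence

# Assumption: a replica is modelled as a callable that answers a sub-query or raises.
Replica = Callable[[str], List[str]]


def ask_with_failover(replicas: Sequence[Replica], sub_query: str) -> List[str]:
    """Try each replica in turn and return the first successful answer."""
    last_error: Exception = RuntimeError("no replicas configured")
    for replica in replicas:
        try:
            return replica(sub_query)
        except ConnectionError as exc:
            last_error = exc  # this replica is down; try the next one
    raise last_error


def fan_out(partitions: Dict[str, Sequence[Replica]], query: str) -> List[str]:
    """Send one sub-query per partition in parallel and merge the answers."""
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        futures = [pool.submit(ask_with_failover, reps, query)
                   for reps in partitions.values()]
        merged: List[str] = []
        for future in futures:
            merged.extend(future.result())
    return sorted(merged)


def healthy(docs: List[str]) -> Replica:
    # Hypothetical replica that matches the query as a substring.
    return lambda q: [d for d in docs if q in d]


def down(q: str) -> List[str]:
    # Hypothetical replica that is always unavailable.
    raise ConnectionError("replica unavailable")


partitions = {
    "p0": [down, healthy(["cat video", "dog photo"])],
    "p1": [healthy(["cat facts"])],
}
results = fan_out(partitions, "cat")
```

In this toy setup the first replica of partition `p0` fails, the second one answers, and the caller still receives a complete merged result.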


Search infrastructure


Index

Like most search engines, Google indexes documents by building a data structure known as an inverted index. Such an index maps a query word to the list of documents that contain it. The index is very large due to the number of documents stored on the servers. The index is partitioned by document IDs into many pieces called shards. Each shard is replicated onto multiple servers.

Initially, the index was served from hard disk drives, as is done in traditional information retrieval (IR) systems. Google dealt with the increasing query volume by increasing the number of replicas of each shard and thus the number of servers. Soon they found that they had enough servers to keep a copy of the whole index in main memory (although with low replication or no replication at all), and in early 2001 Google switched to an ''in-memory index'' system. This switch "radically changed many design parameters" of their search system, and allowed for a significant increase in throughput and a large decrease in query latency.

In June 2010, Google rolled out a next-generation indexing and serving system called "Caffeine", which can continuously crawl and update the search index. Previously, Google updated its search index in batches using a series of MapReduce jobs. The index was separated into several layers, some of which were updated faster than the others, and the main layer would not be updated for as long as two weeks. With Caffeine, the entire index is updated incrementally on a continuous basis. Later Google revealed a distributed data processing system called "Percolator", which is said to be the basis of the Caffeine indexing system (The Register, "Google Caffeine jolts worldwide search machine"; The Register, "Google Percolator – global search jolt sans MapReduce comedown").
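An inverted index partitioned by document ID can be sketched in a few lines of Python. This is a toy model, not Google's design: the function names `build_shards` and `lookup`, the whitespace tokenizer, and the `doc_id % num_shards` partitioning rule are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_shards(docs: Dict[int, str], num_shards: int) -> List[Dict[str, Set[int]]]:
    """Build one inverted index (word -> doc IDs) per shard."""
    shards: List[Dict[str, Set[int]]] = [defaultdict(set) for _ in range(num_shards)]
    for doc_id, text in docs.items():
        shard = shards[doc_id % num_shards]  # assumed partitioning rule
        for word in text.lower().split():
            shard[word].add(doc_id)
    return shards


def lookup(shards: List[Dict[str, Set[int]]], word: str) -> List[int]:
    """Ask every shard for the word and merge the doc IDs."""
    hits: Set[int] = set()
    for shard in shards:
        hits |= shard.get(word.lower(), set())
    return sorted(hits)


docs = {
    1: "data center cooling",
    2: "search index shards",
    3: "inverted index data",
    4: "cooling towers",
}
shards = build_shards(docs, num_shards=2)
```

Since each document lives in exactly one shard, every shard must be consulted for a query word, but each shard only has to search its own slice of the collection.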


Server types

Google's server infrastructure is divided into several types, each assigned to a different purpose:
* Web servers coordinate the execution of queries sent by users, then format the result into an HTML page. The execution consists of sending queries to index servers, merging the results, computing their rank, retrieving a summary for each hit (using the document server), asking for suggestions from the spelling servers, and finally getting a list of advertisements from the ad server.
* Data-gathering servers are permanently dedicated to spidering the Web. Google's web crawler is known as Googlebot. These servers update the index and document databases and apply Google's algorithms to assign ranks to pages.
* Index servers each contain a set of index shards. They return a list of document IDs ("docid") for documents that contain the query word. These servers need less disk space, but suffer the greatest CPU workload.
* Document servers store documents. Each document is stored on dozens of document servers. When performing a search, a document server returns a summary for the document based on query words. They can also fetch the complete document when asked. These servers need more disk space.
* Ad servers manage advertisements offered by services like AdWords and AdSense.
* Spelling servers make suggestions about the spelling of queries.
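The coordination role of the web server described above can be sketched as a single function that consults each back end in turn. Everything here is a hypothetical stand-in: the dictionaries `INDEX`, `DOCUMENTS`, `SCORES`, `SPELLING` and `ADS` replace real servers, the rank values are made up, and the final HTML formatting step is omitted.

```python
from typing import Dict, List

# Hypothetical in-memory stand-ins for the back-end servers.
INDEX: Dict[str, List[int]] = {"barge": [7, 3], "data": [3]}
DOCUMENTS: Dict[int, str] = {
    3: "Floating data barge project halted",
    7: "Barge sold in 2014",
}
SCORES: Dict[int, float] = {3: 0.9, 7: 0.4}  # made-up rank values
SPELLING: Dict[str, str] = {"barg": "barge"}
ADS: Dict[str, List[str]] = {"barge": ["Ad: boat rentals"]}


def handle_query(word: str) -> Dict[str, object]:
    """Mimic the web server's coordination steps for a one-word query."""
    doc_ids = INDEX.get(word, [])                                    # index servers
    ranked = sorted(doc_ids, key=lambda d: SCORES[d], reverse=True)  # rank the hits
    summaries = [DOCUMENTS[d] for d in ranked]                       # document servers
    return {
        "results": summaries,
        "suggestion": SPELLING.get(word),                            # spelling servers
        "ads": ADS.get(word, []),                                    # ad servers
    }


page = handle_query("barge")
typo = handle_query("barg")
```

A misspelled query such as `"barg"` returns no results here but carries a suggestion, mirroring how the spelling servers contribute to the response independently of the index lookup.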


Security

In October 2013, ''The Washington Post'' reported that the U.S. National Security Agency intercepted communications between Google's data centers, as part of a program named MUSCULAR. This wiretapping was made possible because, at the time, Google did not encrypt data passed inside its own network. This was rectified when Google began encrypting data sent between data centers in 2013.


Environmental impact

Google's most efficient data center uses only fresh air cooling, requiring no electrically powered air conditioning. In December 2016, Google announced that, starting in 2017, it would purchase enough renewable energy to match 100% of the energy usage of its data centers and offices. The commitment will make Google "the world's largest corporate buyer of renewable power, with commitments reaching 2.6 gigawatts (2,600 megawatts) of wind and solar energy".


Further reading

* Shankland, Stephen. "Google uncloaks once-secret server." CNET News, April 1, 2009.


External links




* Web Search for a Planet: The Google Cluster Architecture (Luiz André Barroso, Jeffrey Dean, Urs Hölzle)