Autoscaling, also spelled auto scaling or auto-scaling, and sometimes also called automatic scaling, is a method used in
cloud computing
Cloud computing is the on-demand availability of computer system resources, especially data storage ( cloud storage) and computing power, without direct active management by the user. Large clouds often have functions distributed over mul ...
that dynamically adjusts the amount of computational resources in a server farm - typically measured by the number of active servers - automatically based on the load on the farm. For example, the number of servers running behind a web application may be increased or decreased automatically based on the number of active users on the site. Since such metrics may change dramatically throughout the course of the day, and servers are a limited resource that cost money to run even while idle, there is often an incentive to run "just enough" servers to support the current load while still being able to support sudden and large spikes in activity. Autoscaling is helpful for such needs, as it can reduce the number of active servers when activity is low, and launch new servers when activity is high. Autoscaling is closely related to, and builds upon, the idea of
load balancing.
Advantages
Autoscaling offers the following advantages:
* For companies running their own web server infrastructure, autoscaling typically means allowing some servers to go to sleep during times of low load, saving on electricity costs (as well as water costs if water is being used to cool the machines).
[
* For companies using infrastructure hosted in the cloud, autoscaling can mean lower bills, because most cloud providers charge based on total usage rather than maximum capacity.][
* Even for companies that cannot reduce the total compute capacity they run or pay for at any given time, autoscaling can help by allowing the company to run less time-sensitive workloads on machines that get freed up by autoscaling during times of low traffic.
* Autoscaling solutions, such as the one offered by Amazon Web Services, can also take care of replacing unhealthy instances and therefore protecting somewhat against hardware, network, and application failures.][
* Autoscaling can offer greater uptime and more availability in cases where production workloads are variable and unpredictable.
Autoscaling differs from having a fixed daily, weekly, or yearly cycle of server use in that it is responsive to actual usage patterns, and thus reduces the potential downside of having too few or too many servers for the traffic load. For instance, if traffic is usually lower at midnight, then a static scaling solution might schedule some servers to sleep at night, but this might result in downtime on a night where people happen to use the Internet more (for instance, due to a viral news event). Autoscaling, on the other hand, can handle unexpected traffic spikes better.][
]
Terminology
In the list below, we use the terminology used by Amazon Web Services
Amazon Web Services, Inc. (AWS) is a subsidiary of Amazon that provides on-demand cloud computing platforms and APIs to individuals, companies, and governments, on a metered pay-as-you-go basis. These cloud computing web services provide d ...
(AWS). However, alternative names are noted and terminology that is specific to the names of Amazon services is not used for the names.
Practice
Amazon Web Services (AWS)
Amazon Web Services launched the Amazon Elastic Compute Cloud
Amazon Elastic Compute Cloud (EC2) is a part of Amazon.com's cloud-computing platform, Amazon Web Services (AWS), that allows users to rent virtual computers on which to run their own computer applications. EC2 encourages scalable deployment o ...
(EC2) service in August 2006, that allowed developers to programmatically create and terminate instances (machines). At the time of initial launch, AWS did not offer autoscaling, but the ability to programmatically create and terminate instances gave developers the flexibility to write their own code for autoscaling.
Third-party autoscaling software for AWS began appearing around April 2008. These included tools by Scalr and RightScale. RightScale was used by Animoto, which was able to handle Facebook traffic by adopting autoscaling.
On May 18, 2009, Amazon launched its own autoscaling feature along with Elastic Load Balancing, as part of Amazon Elastic Compute Cloud
Amazon Elastic Compute Cloud (EC2) is a part of Amazon.com's cloud-computing platform, Amazon Web Services (AWS), that allows users to rent virtual computers on which to run their own computer applications. EC2 encourages scalable deployment o ...
. Autoscaling is now an integral component of Amazon's EC2 offering. Autoscaling on Amazon Web Services is done through a web browser or the command line tool. In May 2016 Autoscaling was also offered in AWS ECS Service.
On-demand video provider Netflix
Netflix, Inc. is an American subscription video on-demand over-the-top streaming service and production company based in Los Gatos, California. Founded in 1997 by Reed Hastings and Marc Randolph in Scotts Valley, California, it offers a fil ...
documented their use of autoscaling with Amazon Web Services to meet their highly variable consumer needs. They found that aggressive scaling up and delayed and cautious scaling down served their goals of uptime and responsiveness best.
In an article for ''TechCrunch
TechCrunch is an American online newspaper focusing on high tech and startup companies. It was founded in June 2005 by Archimedes Ventures, led by partners Michael Arrington and Keith Teare.
In 2010, AOL acquired the company for approximately ...
'', Zev Laderman, the co-founder and CEO of Newvem, a service that helps optimize AWS cloud infrastructure, recommended that startups use autoscaling in order to keep their Amazon Web Services costs low.
Various best practice guides for AWS use suggest using its autoscaling feature even in cases where the load is not variable. That is because autoscaling offers two other advantages: automatic replacement of any instances that become unhealthy for any reason (such as hardware failure, network failure, or application error), and automatic replacement of spot instances that get interrupted for price or capacity reasons, making it more feasible to use spot instances for production purposes. Netflix's internal best practices require every instance to be in an autoscaling group, and its conformity monkey terminates any instance not in an autoscaling group in order to enforce this best practice.
Microsoft's Windows Azure
On June 27, 2013, Microsoft
Microsoft Corporation is an American multinational technology corporation producing computer software, consumer electronics, personal computers, and related services headquartered at the Microsoft Redmond campus located in Redmond, Washing ...
announced that it was adding autoscaling support to its Windows Azure
Microsoft Azure, often referred to as Azure ( , ), is a cloud computing platform operated by Microsoft for application management via around the world-distributed data centers. Microsoft Azure has multiple capabilities such as software as a ...
cloud computing platform. Documentation for the feature is available on the Microsoft Developer Network
Microsoft Developer Network (MSDN) was the division of Microsoft responsible for managing the firm's relationship with developers and testers, such as hardware developers interested in the operating system (OS), and software developers developing ...
.
Oracle Cloud
Oracle Cloud Platform
Oracle Cloud Platform refers to a Platform as a Service (PaaS) offerings by Oracle Corporation as part of Oracle Cloud Infrastructure. These offerings are used to build, deploy, integrate and extend applications in the cloud. The offerings supp ...
allows server instances to automatically scale a cluster in or out by defining an auto-scaling rule. These rules are based on CPU and/or memory utilization and determine when to add or remove nodes.
Google Cloud Platform
On November 17, 2014, the Google Compute Engine
Google Compute Engine (GCE) is the Infrastructure as a Service (IaaS) component of Google Cloud Platform which is built on the global infrastructure that runs Google's search engine, Gmail, YouTube and other services. Google Compute Engine en ...
announced a public beta of its autoscaling feature for use in Google Cloud Platform applications. As of March 2015, the autoscaling tool is still in Beta.
Facebook
In a blog post in August 2014, a Facebook engineer disclosed that the company had started using autoscaling to bring down its energy costs. The blog post reported a 27% decline in energy use for low traffic hours (around midnight) and a 10-15% decline in energy use over the typical 24-hour cycle.[
]
Kubernetes Horizontal Pod Autoscaler
Kubernetes
Kubernetes (, commonly stylized as K8s) is an open-source container orchestration system for automating software deployment, scaling, and management. Google originally designed Kubernetes, but the Cloud Native Computing Foundation now maintains ...
Horizontal Pod Autoscaler automatically scales the number of pods in
replication controller
deployment
o
replicaset
based on observed CPU utilization (or, with beta support, on some other
application-provided metrics
Alternative autoscaling decision approaches
Autoscaling by default uses ''reactive'' decision approach for dealing with traffic scaling: scaling only happens in response to real-time changes in metrics. In some cases, particularly when the changes occur very quickly, this reactive approach to scaling is insufficient. Two other kinds of autoscaling decision approaches are described below.
Scheduled autoscaling approach
This is an approach to autoscaling where changes are made to the minimum size, maximum size, or desired capacity of the autoscaling group at specific times of day. Scheduled scaling is useful, for instance, if there is a known traffic load increase or decrease at specific times of the day, but the change is too sudden for reactive approach based autoscaling to respond fast enough. AWS autoscaling groups support scheduled scaling.
Predictive autoscaling
This approach to autoscaling uses predictive analytics
Predictive analytics encompasses a variety of statistical techniques from data mining, predictive modeling, and machine learning that analyze current and historical facts to make predictions about future or otherwise unknown events.
In busine ...
. The idea is to combine recent usage trends with historical usage data as well as other kinds of data to predict usage in the future, and autoscale based on these predictions.
For parts of their infrastructure and specific workloads, Netflix found that Scryer, their predictive analytics engine, gave better results than Amazon's reactive autoscaling approach. In particular, it was better for:
* Identifying huge spikes in demand in the near future and getting capacity ready a little in advance
* Dealing with large-scale outages, such as failure of entire availability zones and regions
* Dealing with variable traffic patterns, providing more flexibility on the rate of scaling out or in based on the typical level and rate of change in demand at various times of day
On November 20, 2018, AWS announced that predictive scaling would be available as part of its autoscaling offering.
See also
* Cloud computing
Cloud computing is the on-demand availability of computer system resources, especially data storage ( cloud storage) and computing power, without direct active management by the user. Large clouds often have functions distributed over mul ...
* Load balancing
* Amazon Web Services
Amazon Web Services, Inc. (AWS) is a subsidiary of Amazon that provides on-demand cloud computing platforms and APIs to individuals, companies, and governments, on a metered pay-as-you-go basis. These cloud computing web services provide d ...
Auto Scaling with Docker
References
{{reflist, 30em
Network management
Servers (computing)
Cloud computing