SCALING WORKLOADS (Chapter 3.2 of “All AWS Data Analytics Services”)

Cloud Scaling: Up & Out

Scalability is the capability of a system, network, or process to handle a growing amount of work or application traffic. The goal of being scalable is to remain available to your customers as demand for your application grows.

AWS provides a scalable architecture that supports growth in users, traffic or data without a drop in performance, both vertically and horizontally, and allows for distributed processing.

AWS makes fast, gigabyte-to-petabyte-scale analytics affordable to nearly anyone through its broad range of storage, compute, and analytics options.

Manually Scaling with EC2 Instance Types

Amazon EC2 provides a broad range of instance types optimized for different use cases. Instance types are composed of varying combinations of CPU, memory, storage, and networking capacity, allowing you to choose the appropriate mix of resources for your application. Examples include the “compute optimized”, “memory optimized”, “accelerated computing” (GPU-optimized), “storage optimized”, and “dense-storage optimized” families. Within each family there are several instance sizes, so you can scale up to a more performant instance in the family, or scale down to a less performant one, without having to migrate to a new instance type. This means you can maintain performance during spikes in demand and scale down to save money when demand falls.
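If a compatible size exists within the same family, resizing is a stop/modify/start operation. Here is a minimal boto3 sketch of that flow; the instance ID and the m4.xlarge-to-m4.2xlarge sizes are placeholders for illustration:

    import boto3

    ec2 = boto3.client("ec2")
    instance_id = "i-0123456789abcdef0"  # hypothetical instance

    # An EBS-backed instance must be stopped before its type can change.
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

    # Scale up within the family (e.g., m4.xlarge -> m4.2xlarge).
    ec2.modify_instance_attribute(
        InstanceId=instance_id,
        InstanceType={"Value": "m4.2xlarge"},
    )

    ec2.start_instances(InstanceIds=[instance_id])

Scaling back down later is the same flow with a smaller size.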

When you resize an instance, you must select an instance type that is compatible with the configuration of the instance. If the instance type that you want is not compatible with the instance configuration you have, then you must migrate your application to a new instance with the instance type that you want.

Dynamically Scaling

You can also scale dynamically using EC2 Auto Scaling. Auto Scaling helps ensure that you have the correct number of Amazon EC2 instances available to handle the load for your application. You create collections of EC2 instances, called Auto Scaling groups, and specify a minimum and maximum number of instances for each group; Auto Scaling ensures the group never goes below the minimum or above the maximum. If you specify a desired capacity, either when you create the group or at any time thereafter, Auto Scaling ensures that your group has this many instances. If you specify scaling policies, then Auto Scaling can automatically launch or terminate instances as demand on your application increases or decreases.
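As a minimal sketch, creating such a group with boto3 might look like this; the group and launch configuration names are hypothetical, and the launch configuration is assumed to exist already:

    import boto3

    autoscaling = boto3.client("autoscaling")

    autoscaling.create_auto_scaling_group(
        AutoScalingGroupName="analytics-workers",        # hypothetical name
        LaunchConfigurationName="analytics-workers-lc",  # assumed to exist
        MinSize=2,          # the group never shrinks below 2 instances
        MaxSize=20,         # the group never grows beyond 20 instances
        DesiredCapacity=4,  # Auto Scaling maintains 4 until policies adjust it
        AvailabilityZones=["us-east-1a", "us-east-1b"],
    )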

As Auto Scaling adds and removes EC2 instances, you must ensure that traffic for your application is distributed across all of them. The Elastic Load Balancing service automatically routes incoming web traffic across this dynamically changing set of EC2 instances. Your load balancer acts as a single point of contact for all incoming traffic to the instances in your Auto Scaling group. Elastic Load Balancing can also detect issues with a particular instance and automatically reroute traffic to other instances until the issues are resolved and the original instance is restored.
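Attaching a load balancer to the group is a single call; a sketch, assuming a Classic Load Balancer named "analytics-elb" already exists:

    import boto3

    autoscaling = boto3.client("autoscaling")

    # Instances launched into the group are registered with the load
    # balancer automatically, and deregistered when they terminate.
    autoscaling.attach_load_balancers(
        AutoScalingGroupName="analytics-workers",
        LoadBalancerNames=["analytics-elb"],
    )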

Auto Scaling and Elastic Load Balancing can both be driven by the Amazon CloudWatch monitoring system. CloudWatch lets you monitor what you’re running in Amazon’s cloud: collecting and tracking metrics, monitoring log files, setting and displaying alarms, and triggering actions such as the execution of Auto Scaling policies.
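For example, a CloudWatch alarm that fires a scaling policy when average CPU across the group breaches 50% could be set up roughly like this; the alarm name, group name, and policy ARN are placeholders:

    import boto3

    cloudwatch = boto3.client("cloudwatch")

    cloudwatch.put_metric_alarm(
        AlarmName="analytics-workers-high-cpu",
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "AutoScalingGroupName",
                     "Value": "analytics-workers"}],
        Statistic="Average",
        Period=300,          # evaluate 5-minute averages
        EvaluationPeriods=1,
        Threshold=50.0,
        ComparisonOperator="GreaterThanOrEqualToThreshold",
        # placeholder ARN of a previously created scaling policy
        AlarmActions=["arn:aws:autoscaling:us-east-1:123456789012:scalingPolicy:placeholder"],
    )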

On-Premises Scaling Up

In an on-premises or data-center IT environment, “scaling up” meant purchasing more and more hardware, guessing how many servers would be needed at peak capacity. IT departments typically provisioned enough capacity to handle highest-case scenarios, and those servers usually ran 100% of the time. This approach to “scalability” can leave a significant amount of underutilized resources in the data center most of the time, an inefficiency that impacts overall costs in many ways.

Elasticity

Elasticity is defined as the degree to which a system is able to adapt to workload changes by provisioning & de-provisioning resources in an autonomic manner, so that at each point in time the available resources match the current demand as closely as possible.

Scaling Out or In

Scaling out, or horizontal scaling, means adding more instances rather than bigger ones. On AWS, scaling out also benefits from massive economies of scale: because hundreds of thousands of customers are aggregated in the cloud, you achieve a lower variable cost than you could on your own, reflected in much lower pay-as-you-go prices. You can scale out and in automatically with “step scaling policies”.

Auto Scaling applies the aggregation type to the metric data points from all instances, then compares the aggregated metric value against the upper and lower bounds defined by the step adjustments to determine which step adjustment to perform. For example, suppose that you have an alarm with a breach threshold of 50 and a scaling adjustment type of “PercentChangeInCapacity”. You also have scale-out and scale-in policies with the following step adjustments:

Scale-out policy

    Lower bound   Upper bound   Adjustment   Metric value
    0             10            0            50 <= value < 60
    10            20            10           60 <= value < 70
    20            null          30           70 <= value < +infinity

Scale-in policy

    Lower bound   Upper bound   Adjustment   Metric value
    -10           0             0            40 < value <= 50
    -20           -10           -10          30 < value <= 40
    null          -20           -30          -infinity < value <= 30
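The scale-out policy in the table could be created with boto3 roughly as follows. Note that in the API the bounds are expressed as offsets from the alarm’s breach threshold (50 here), so 0, 10, and 20 correspond to metric values of 50, 60, and 70. The group and policy names are hypothetical:

    import boto3

    autoscaling = boto3.client("autoscaling")

    autoscaling.put_scaling_policy(
        AutoScalingGroupName="analytics-workers",  # hypothetical group
        PolicyName="cpu-step-scale-out",
        PolicyType="StepScaling",
        AdjustmentType="PercentChangeInCapacity",
        StepAdjustments=[
            # 50 <= value < 60: no change
            {"MetricIntervalLowerBound": 0.0,
             "MetricIntervalUpperBound": 10.0, "ScalingAdjustment": 0},
            # 60 <= value < 70: add 10 percent
            {"MetricIntervalLowerBound": 10.0,
             "MetricIntervalUpperBound": 20.0, "ScalingAdjustment": 10},
            # 70 <= value: add 30 percent (no upper bound)
            {"MetricIntervalLowerBound": 20.0, "ScalingAdjustment": 30},
        ],
    )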

Your group has both a current capacity and a desired capacity of 10 instances. The group maintains its current and desired capacity while the aggregated metric value is greater than 40 and less than 60.

If the metric value gets to 60, Auto Scaling increases the desired capacity of the group by 1 instance, to 11 instances, based on the second step adjustment of the scale-out policy (add 10 percent of 10 instances). After the new instance is running and its specified warm-up time has expired, Auto Scaling increases the current capacity of the group to 11 instances. If the metric value rises to 70 even after this increase in capacity, Auto Scaling increases the desired capacity of the group by another 3 instances, to 14 instances, based on the third step adjustment of the scale-out policy (add 30 percent of 11 instances, 3.3 instances, rounded down to 3 instances).

If the metric value gets to 40, Auto Scaling decreases the desired capacity of the group by 1 instance, to 13 instances, based on the second step adjustment of the scale-in policy (remove 10 percent of 14 instances, 1.4 instances, rounded down to 1 instance). If the metric value falls to 30 even after this decrease in capacity, Auto Scaling decreases the desired capacity of the group by another 3 instances, to 10 instances, based on the third step adjustment of the scale-in policy (remove 30 percent of 13 instances, 3.9 instances, rounded down to 3 instances).
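The arithmetic in this walkthrough can be reproduced in a few lines of Python; truncating toward zero matches the “rounded down” behavior described above:

    import math

    def percent_change(capacity: int, percent: int) -> int:
        """Apply a PercentChangeInCapacity adjustment, truncating toward zero."""
        return math.trunc(capacity * percent / 100.0)

    capacity = 10
    capacity += percent_change(capacity, 10)   # metric hits 60:     10 -> 11
    capacity += percent_change(capacity, 30)   # metric hits 70:     11 -> 14 (3.3 -> 3)
    capacity += percent_change(capacity, -10)  # metric falls to 40: 14 -> 13 (1.4 -> 1)
    capacity += percent_change(capacity, -30)  # metric falls to 30: 13 -> 10 (3.9 -> 3)
    print(capacity)  # 10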

Instance Warmup

With step scaling policies, you can specify the number of seconds that it takes for a newly launched instance to warm up. Until its specified warm-up time has expired, an instance is not counted toward the aggregated metrics of the Auto Scaling Group.
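The warm-up window is a single parameter on the policy; a sketch, reusing the hypothetical names from earlier:

    import boto3

    autoscaling = boto3.client("autoscaling")

    autoscaling.put_scaling_policy(
        AutoScalingGroupName="analytics-workers",
        PolicyName="cpu-step-scale-out",
        PolicyType="StepScaling",
        AdjustmentType="PercentChangeInCapacity",
        EstimatedInstanceWarmup=300,  # seconds before a new instance counts
        StepAdjustments=[
            {"MetricIntervalLowerBound": 0.0, "ScalingAdjustment": 10},
        ],
    )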

While scaling out, Auto Scaling does not consider instances that are warming up as part of the current capacity of the group. Therefore, multiple alarm breaches that fall in the range of the same step adjustment result in a single scaling activity. This ensures that Auto Scaling doesn’t add more instances than you need. Using the example in the previous section, suppose that the metric gets to 60, and then it gets to 62 while the new instance is still warming up. The current capacity is still 10 instances, so Auto Scaling should add 1 instance (10 percent of 10 instances), but the desired capacity of the group is already 11 instances, so Auto Scaling does not increase the desired capacity further. However, if the metric gets to 70 while the new instance is still warming up, Auto Scaling should add 3 instances (30 percent of 10 instances), but the desired capacity of the group is already 11, so Auto Scaling adds only 2 instances, for a new desired capacity of 13 instances.

While scaling in, Auto Scaling considers instances that are terminating as part of the current capacity of the group. Therefore, AWS won’t remove more instances from the Auto Scaling Group than necessary.

Note that a “scale in” activity can’t start while a “scale out” activity is in progress.

On-Premises Horizontal Scaling

Typically, on-premises horizontal scaling lowered costs by using commodity hardware and software. But this solution is operationally complex and time consuming, requiring long hours from your IT staff to continually ensure that enough capacity is provisioned at all times, no matter what.

"Scaling Out" Guessing Done On-Prem Requires Constant Monitoring and Configuration

“Scaling Out” Guessing Done On-Prem Requires Constant Monitoring and Configuration

Scaling with AWS Lambda

Sooner than you think, servers will be obsolete – or at minimum, “old school.” AWS Lambda is part of another new buzzword: “serverless computing.”

AWS Lambda runs your code, written in Java, Node.js, or Python, without requiring you to provision or manage servers. Lambda runs and scales your code with high availability, and you pay only for the compute time you consume, billed in 100-millisecond increments. With Lambda, you can run code for virtually any type of application or backend service, all with zero administration. Just upload your code, and Lambda takes care of everything required to run and scale it with high availability. You can set up your code to be triggered automatically by other AWS services, or call it directly from any web or mobile app.
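At its simplest, a Lambda function is just a handler; here is a minimal Python sketch (the "name" field is a hypothetical input):

    def lambda_handler(event, context):
        # 'event' carries the trigger's payload; 'context' carries runtime info.
        name = event.get("name", "world")  # hypothetical input field
        return {"message": "Hello, %s!" % name}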

Below are diagrams of three different Lambda-based architectures; after each diagram, a minimal handler sketch shows what the corresponding function might look like.

1. Real-Time File Processing: You can use Amazon S3 to trigger AWS Lambda to process data immediately after an upload. For example, you can use Lambda to thumbnail images, transcode videos, index files, process logs, validate content, and aggregate and filter data in real time:
Diagram of Lambda in Real-Time File Processing (image courtesy of AWS properties)
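A minimal handler sketch for this pattern; the event shape is what S3 delivers to Lambda, and the processing step is left as a stub:

    import boto3

    s3 = boto3.client("s3")

    def lambda_handler(event, context):
        for record in event["Records"]:
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
            # ... thumbnail, transcode, index, or validate the object here
            print("processed %d bytes from s3://%s/%s" % (len(body), bucket, key))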

2. Real-time Stream Processing: You can use AWS Lambda and Amazon Kinesis to process real-time streaming data for application activity tracking, transaction order processing, click stream analysis, data cleansing, metrics generation, log filtering, indexing, social media analysis, and IoT device data telemetry and metering:

Diagram of Lambda Real-Time Stream Processing (image courtesy of AWS properties)
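A minimal handler sketch for this pattern; Kinesis delivers record data base64-encoded, and the JSON payload format is an assumption here:

    import base64
    import json

    def lambda_handler(event, context):
        for record in event["Records"]:
            payload = base64.b64decode(record["kinesis"]["data"])
            message = json.loads(payload)  # assumes producers send JSON
            # ... filter, aggregate, or index the message here
            print(message)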

3. Extract, Transform, & Load: You can use AWS Lambda to perform data validation, filtering, sorting, or other transformations for every data change in a DynamoDB table and load the transformed data to another data store:

Diagram of Lambda Real-Time Retail Data Warehouse ETL Processing (image courtesy of AWS properties)
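A minimal handler sketch for this pattern, driven by a DynamoDB stream; the transformation and the target data store are left as stubs:

    def lambda_handler(event, context):
        for record in event["Records"]:
            if record["eventName"] in ("INSERT", "MODIFY"):
                new_image = record["dynamodb"]["NewImage"]  # DynamoDB-typed attrs
                # ... validate/transform new_image, then load it into the
                # downstream data store (e.g., a data warehouse)
                print(new_image)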


#gottaluvAWS! #gottaluvAWSMarketplace!
