DISCOVER, MIGRATE & DEPLOY PRE-CONFIGURED BIG DATA BI & ADVANCED ANALYTIC SOLUTIONS IN MINUTES – AND PAY ONLY FOR WHAT YOU USE BY THE HOUR (Chapter 3.7 in “All AWS Data Analytics Services”)

3.7  DISCOVER, MIGRATE & DEPLOY PRE-CONFIGURED BIG DATA BI & ADVANCED ANALYTIC SOLUTIONS IN MINUTES – AND PAY ONLY FOR WHAT YOU USE BY THE HOUR

Talking about AWS Marketplace is a passion of mine. I view it as AWS’ gift to the business world. No other cloud provider has the authority and influence to attract over a thousand technology partners and independent software vendors (ISVs) that have licensed and packaged their software to run on AWS, integrated it with AWS capabilities, or built add-on services that benefit their customers as much as AWS has through the AWS Marketplace. AWS Marketplace is the largest “app store” in the world, despite being strictly a B2B app store!

This means the best and most popular software vendors go to great lengths to adapt their software so that it integrates with other AWS services and runs seamlessly on the AWS cloud. Only AWS has the prominence, and the much-deserved reputation as a totally customer-centric company, needed to attract such renowned ISVs and to convince each one to invest the time required to be offered in AWS Marketplace.

The AWS Marketplace facilitates the discovery, purchase, and deployment of BI and big data solutions (and many other categories) on AWS, letting you migrate to or stand up the business intelligence and data analytics solutions you want in minutes… and pay only for what you consume.
For those of you who haven’t heard of AWS Marketplace, or who dismiss it (for any number of preconceived reasons) with a “That’s how they get you!” reaction, let me lay out the facts, the benefits, and how to navigate AWS Marketplace. Please read on: AWS Marketplace has more than 100,000 active customers who use 300M compute hours/month deployed on Amazon EC2, with more than 3,000 listings from over 1,000 popular software vendors (not counting the new SaaS offerings launched in late November 2016).

Since AWS resources can be instantiated in seconds, you can treat them as “disposable” resources – not hardware or software you’ve spent months choosing and committed a significant up-front expenditure to without knowing whether it will solve your problems. The “Services not Servers” mantra of AWS provides many ways to increase developer productivity and operational efficiency, and lets you “try on” the various solutions available on AWS Marketplace to find the perfect fit for your business needs without committing to long-term contracts.

1-Click, on-demand infrastructure through software solutions on AWS Marketplace allows iterative, experimental deployment and usage to take advantage of advanced analytics and emerging technologies within minutes, paying only for what you consume, by the hour or by the month.

The vast majority of big data use cases deployed in the cloud today run on AWS, with unique customer references for big data analytics, 67 of which are household names. AWS has over 50 services and hundreds of features to support virtually any big data application and workload. When you combine the managed AWS services with software solutions available on AWS Marketplace, you can get the precise business intelligence and big data analytical solutions you want, augmenting and enhancing your project beyond what the services themselves provide. There are over 290 big data solutions in AWS Marketplace. You therefore get to data-driven results faster by decreasing the time it takes to plan, forecast, and make software provisioning decisions. This greatly improves the way you build business analytics solutions and run your business.

You can read the whitepaper on AWS Big Data Analytics Leveraging the AWS Marketplace, of which I’m an author, by going to https://aws.amazon.com/mp/bi/, scrolling to the bottom under “Additional Resources”, and clicking “Download Solution Overview”. Below is a screenshot of the first page:

I’m a Contributor to the “Business Intelligence & Big Data on AWS, Leveraging ISV AWS Marketplace Solutions” Whitepaper

Below is just a fraction of what you can achieve when combining AWS Marketplace software solutions with AWS big data services:

You can:

  • Launch pre-configured and pre-tested experimentation platforms for big data analysis
  • Query your data where it sits (in-datasource analysis) without moving or storing your data on an intermediate server while directly accessing the most powerful functions of the underlying database
  • Perform “ELT” (extract, load, and transform) instead of “ETL” (extract, transform, and load) by loading your data into the Amazon Redshift data warehouse in its original form, giving you the ability to perform multiple transforms on the same data inside the warehouse
  • Have long-term connectivity among many different databases
  • Ensure your data is clean and complete prior to analysis
  • Visualize millions of data points on a map
  • Develop route planning and geographic customer targeting
  • Embed visualizations in existing applications or deliver them as stand-alone applications
  • Visualize billions of rows in seconds
  • Graph data and drill into areas of concern
  • Have built-in data science
  • Export information into any format
  • Deploy machine-learning algorithms for data mining and predictive analytics
  • Meet the needs of specialized data connector requirements
  • Create real-time geospatial visualization and interactive analytics
  • Have both OLAP and OLTP analytical processing
  • Map disparate data sources (cloud, social, Google Analytics, mobile, on-prem, big data or relational data) using high-performance massively parallel processing (MPP) with easy-to-use wizards
  • Fine-tune the type of analytical result (location, prescriptive, statistical, text, predictive, behavior, machine learning models and so on)
  • Customize the visualizations in countless views with different levels of interactivity
  • Integrate with existing SAP products
  • Deploy a new data warehouse or extend your existing one

Amazon EC2 provides an ideal platform for operating your own self-managed big data analytics applications on AWS infrastructure. Almost any software you can install on Linux or Windows virtualized environments can be run on Amazon EC2 with a pay-as-you-go pricing model through a solution available on AWS Marketplace. The solution’s architecture distributes computing power across parallel EC2 instances so that its algorithms execute as efficiently as possible.

Some examples of self-managed big data analytics that run on Amazon EC2 include the following (a minimal launch sketch follows the list):

  • A Splunk Enterprise Platform, the leading software platform for real-time Operational Intelligence. Splunk software and cloud services enable organizations to search, monitor, analyze and visualize machine-generated big data coming from websites, applications, servers, networks, sensors and mobile devices. A Splunk Analytics for Hadoop solution, called Hunk, is also available on AWS Marketplace; it enables interactive exploration, analysis, and data visualization for data stored in Amazon EMR and Amazon S3
  • A Tableau Server Data Visualization Instance, for users to interact with pre-built data visualizations created using Tableau Desktop. Tableau server allows for ad-hoc querying and data discovery, supports high-volume data visualization and historical analysis, and enables the creation of reports and dashboards
  • A SAP HANA One Instance, a single-tenant SAP HANA database instance that has SAP HANA’s in-memory platform, to do transactional processing, operational reporting, online analytical processing, predictive and text analysis
  • A Geospatial AMI such as MapLarge, that brings high-performance, real-time geospatial visualization and interactive analytics. MapLarge’s visualization results are useful for plotting addresses on a map to determine demographics, analyzing law enforcement and intelligence data, delivering insight to public health information, and visualizing distances such as roads and pipelines
  • An Advanced Analytics Zementis ADAPA Decision Engine Instance, a platform and scoring engine for data science predictive models that integrates with models built in R, Python, KNIME, SAS, SPSS, SAP, FICO and more. Zementis ADAPA Decision Engine can score data in real time using web services or in batch mode from local files or data in Amazon S3 buckets. It provides predictive analytics through many predictive algorithms, sensor data processing (IoT), behavior analysis, and machine learning models
  • A Matillion Data Integration Instance, an ELT service natively built for Amazon Redshift that uses Amazon Redshift’s own processing for data transformations to take advantage of its blazing speed and scalability. Matillion gives you the ability to orchestrate and/or transform data upon ingestion, or simply load the data so it can be transformed multiple times as your business requires
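To show how little ceremony a launch involves, here is a minimal boto3 sketch of spinning up a Marketplace-delivered AMI on EC2. The AMI ID, key pair, subnet, and security group below are placeholders – in practice you would accept the product’s subscription in AWS Marketplace first and copy the real AMI ID from its listing:

```python
# A minimal sketch of launching a Marketplace-delivered AMI on EC2 with boto3.
# The AMI ID, key pair, subnet, and security group are placeholders -- copy the
# real AMI ID from the product's Marketplace listing after accepting its terms.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder Marketplace AMI ID
    InstanceType="m4.xlarge",          # size it to the vendor's recommendation
    MinCount=1,
    MaxCount=1,
    KeyName="my-key-pair",             # placeholder key pair
    SubnetId="subnet-0123456789abcdef0",
    SecurityGroupIds=["sg-0123456789abcdef0"],
    TagSpecifications=[{
        "ResourceType": "instance",
        "Tags": [{"Key": "Name", "Value": "marketplace-bi-trial"}],
    }],
)

instance_id = response["Instances"][0]["InstanceId"]
print("Launched", instance_id)
```

When you’re done experimenting, terminate the instance and the hourly software and infrastructure charges stop with it.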

Below is an AWS Marketplace brochure explaining the benefits of using Marketplace solutions for big data analytics (it can also be found on the “BI & Big Data Landing Page” if you scroll to the bottom of the page and click “Download PDF Poster”).

Poster on the Benefits of AWS Marketplace in Analytical Solutions

The Main Categories on AWS Marketplace

AWS Marketplace has solutions for big data analytics, but listed below are all of the main sections, with links to each topic’s respective landing page:

Breaking Down the Main AWS Marketplace Categories to Specific Functionalities:

Security Solutions:

Network Infrastructure Solutions:

Storage Solutions:

BI and Big Data Solutions:

Database Solutions:

Application Development Solutions:

Content Delivery Solutions:

Mobile Solutions:

Microsoft Solutions (note: the list of “Third-Party Software Products” is a small fraction of the AWS Marketplace solutions that run on Microsoft Servers):

  • Microsoft Workloads:
    • Windows Server (many editions, type “Windows Server” into AWS Marketplace search bar)
    • Exchange Server
    • Microsoft Dynamics (many editions, type “Microsoft Dynamics” into AWS Marketplace search bar)
    • Microsoft SQL Server  (many editions, type “SQL Server” into AWS Marketplace search bar)
    • SharePoint (many editions, type “SharePoint” into AWS Marketplace search bar)
  • Third-Party Software Products:

Migration Solutions:

I hope you read through the entire post, and that you now realize how much time, frustration, configuration, and money you can save by using the preconfigured software solutions available at AWS Marketplace, only paying for what you use!

Why do it any other way?

Using Pre-Configured Software Solutions from AWS Marketplace with 1-Click Deployments & Paying by the Hour – Why Do It Any Other Way???

Read the previous post here.

#gottaluvAWS! #gottaluvAWSMarketplace!

Posted in 1-Click to Deploy Software Solutions for Your Choosing Paid for by the Hour, Amazon EC2 On-Demand Instances, Amazon Web Services, Amazon Web Services Analytic Services, AWS Analytic Services, AWS Analytics, AWS BI, AWS Business Value, AWS Data Collection, AWS Marketplace, AWS Marketplace FAQs, AWS Marketplace Security Solutions, Cloud Computing, Faster Time to Data-Driven Results, Faster Time to ROI, How to Find AWS Marketplace Category Landing Pages, How to Find AWS Marketplace Preferred Vendors, List of Main Vendors by Category AWS Marketplace, Making Your IT Life Simpler with AWS Marketplace

TRADITIONAL RELATIONAL DATABASE MANAGEMENT SYSTEMS (Chapter 3.6 in “All AWS Data Analytics Services”)

A Traditional Relational Database Schema Showing Tables, Relations, & Keys

3.6  TRADITIONAL RELATIONAL DATABASE MANAGEMENT SYSTEMS

A relational database management system (RDBMS) is a database management system (DBMS) that is based on the relational model.

In 1970, Edgar F. Codd, a British computer scientist with IBM, published “A Relational Model of Data for Large Shared Data Banks.” At the time, the renowned paper attracted little interest, and few understood how Codd’s groundbreaking work would define the basic rules for relational data storage for decades to come, which can be simplified as:

  1. Data must be stored and presented as relations, i.e., tables that have relationships with each other, e.g., primary/foreign keys.
  2. To manipulate the data stored in tables, a system should provide relational operators – code that enables a relationship to be tested between two entities. A good example is the WHERE clause of a SELECT statement: the SQL statement SELECT * FROM CUSTOMER_MASTER WHERE CUSTOMER_SURNAME = 'Smith' will query the CUSTOMER_MASTER table and return all customers with a surname of Smith (a small runnable sketch of this follows below).
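To make rule 2 concrete, here is a tiny sketch using Python’s built-in sqlite3 module (any RDBMS would behave the same way); the sample rows are made up:

```python
# Tiny illustration of a relational operator (the WHERE clause) using
# Python's built-in sqlite3 module; the sample rows are made up.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE CUSTOMER_MASTER (
        CUSTOMER_ID       INTEGER PRIMARY KEY,
        CUSTOMER_SURNAME  TEXT,
        CUSTOMER_FORENAME TEXT
    )
""")
conn.executemany(
    "INSERT INTO CUSTOMER_MASTER VALUES (?, ?, ?)",
    [(1, "Smith", "Anna"), (2, "Jones", "Bob"), (3, "Smith", "Carlos")],
)

# The relational operator in action: test each row against the predicate.
rows = conn.execute(
    "SELECT * FROM CUSTOMER_MASTER WHERE CUSTOMER_SURNAME = ?", ("Smith",)
).fetchall()
print(rows)   # -> [(1, 'Smith', 'Anna'), (3, 'Smith', 'Carlos')]
```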

RDBMSs have been a common choice for the storage of information in databases used for financial records, manufacturing and logistical information, personnel data, and other applications using historical, transactional data since the 1980s.

However, relational databases were challenged, unsuccessfully, by object database management systems in the 1980s and 1990s (which were introduced to address the so-called object-relational impedance mismatch between relational databases and object-oriented application programs) and by XML database management systems in the 1990s.

Despite such attempts, RDBMSs keep most of the market share, but that share is declining because of their limited ability to scale, concurrency issues, and the high network bandwidth required when queries must traverse many tables that have been architected to be highly normalized. Database normalization is the technique used to organize the data in an RDBMS. It is a systematic approach of decomposing tables to eliminate data redundancy and improve data integrity.

Two examples of traditional relational databases are Microsoft SQL Server & Oracle Databases.

In Chapter 22, the second section will compare traditional relational databases with Amazon Aurora (part of Amazon RDS), a new RDBMS built from the ground up for the cloud that recently surpassed Amazon Redshift as AWS’ fastest-growing service.

Read the previous post here.

Read the next post here.

#gottaluvAWS! #gottaluvAWSMarketplace!

 

Posted in Amazon Aurora, Amazon Web Services, AWS BI, Microsoft SQL Server, Oracle Database, RDBMS, Traditional Relational Database Systems

TRADITIONAL DATA WAREHOUSES (Chapter 3.5 in “All AWS Data Analytics Services”)

Schematic of an OLAP Cube Used in Traditional Data Warehouses

3.5  TRADITIONAL DATA WAREHOUSES

A traditional data warehouse (DW or DWH), also known as an enterprise data warehouse (EDW), is a system used for reporting and data analysis, and is considered a core component of business intelligence.

Online analytical processing (OLAP) cubes are multi-dimensional data structures that traditional data warehouses use to contain the data that you import. The cubes divide the data into subsets that are defined by dimensions.

In a dimensional approach, transaction data are partitioned into “facts”, which are generally numeric transaction data, and “dimensions“, which are the reference information that gives context to the facts. For example, a sales transaction can be broken up into facts such as the number of products ordered and the total price paid for the products, and into dimensions such as order date, customer name, product number, order ship-to and bill-to locations, and salesperson responsible for receiving the order.
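To make the facts-vs-dimensions split concrete, here is a small, made-up pandas sketch that joins a sales fact table to a product dimension and rolls the measure up by dimension attributes – essentially slicing a cube:

```python
# A made-up facts/dimensions example: join a sales fact table to a product
# dimension, then roll the measure up by dimension attributes (a cube slice).
import pandas as pd

facts = pd.DataFrame({
    "product_id":  [1, 2, 1, 3],
    "order_date":  ["2017-01-05", "2017-01-06", "2017-02-01", "2017-02-03"],
    "quantity":    [2, 1, 5, 3],
    "total_price": [40.0, 25.0, 100.0, 90.0],
})
product_dim = pd.DataFrame({
    "product_id": [1, 2, 3],
    "category":   ["Books", "Toys", "Books"],
})

cube_slice = (
    facts.merge(product_dim, on="product_id")
         .assign(order_month=lambda df: pd.to_datetime(df["order_date"]).dt.to_period("M"))
         .pivot_table(values="total_price", index="category",
                      columns="order_month", aggfunc="sum")
)
print(cube_slice)   # total sales by product category per month
```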

A key advantage of a dimensional approach is that the data warehouse is easier for the user to understand and to use, and retrieval of data from it tends to be very fast. Dimensional structures are easy for business users to understand because the structure is divided into measurements/facts and context/dimensions: facts relate to the organization’s business processes and operational systems, whereas the surrounding dimensions contain the context of each measurement. Another advantage of the dimensional model is that it does not necessarily involve a relational database, which makes this modeling technique very useful for end-user queries in the data warehouse.

The main disadvantages of the dimensional approach are the following:

  1. In order to maintain the integrity of facts and dimensions, loading the data warehouse with data from different operational systems is extremely complicated and time-consuming.
  2. Mapping the disparate data sources requires very complex, column-by-column mapping, done through the “T” portion of ETL (Extract, Transform & Load). Depending on how much data has to be mapped, the types of disparate data sources, and data completeness and cleanliness, this process often takes many, many months of work by highly-skilled, highly-paid specialists.
  3. It is difficult to modify the data warehouse structure if the organization adopting the dimensional approach changes the way in which it does business. Normally, a new cube would have to be made to answer a different set of analytical questions.

In Chapter 9, “Amazon Redshift Data Warehouse”, Amazon Redshift’s modern approach to data warehousing is compared with this traditional approach.

Read the previous post here.

Read the next post here.

#gottaluvAWS! #gottaluvAWSMarketplace!

Posted in Dimensions & Measures, OLAP Cubes, Traditional Data Warehousing

CLOUD & DATA SECURITY (Chapter 3.4 of “All AWS Data Analytics Services”)

You Don’t Mess with the AWS Cloud – The Most Secure Cloud Platform – OR VITO & ROTTWEILERS WILL COME OUT OF NOWHERE!

3.4  AWS CLOUD & DATA SECURITY

AWS provides security capabilities across all of your locations, networks, software and business processes that meet the strictest security requirements and are continually audited for the broadest range of security certifications.

Some of AWS’ Strict Security Compliance and Privacy Certifications

Security at AWS is the highest priority. As an AWS customer, you benefit from a data center and network architecture built to meet the requirements of the most security-sensitive customers. Your data and applications are far more secure on AWS than in your own office.

Government, education and nonprofit organizations face unique challenges in accomplishing complex missions with limited resources. Public sector leaders engaged in true cloud computing projects overwhelmingly turn to the power and speed of AWS when they want to serve citizens more effectively, achieve scientific breakthroughs, reach broader constituents and put more of their time and resources into their core missions – while still meeting all mandatory regulatory, compliance, and security requirements.

The AWS cloud provides governance capabilities that enable continuous monitoring of configuration changes to your IT resources, and lets you leverage multiple native AWS security and encryption features for a higher level of data protection and compliance – security at every level, up to the most stringent government requirements, no matter what your industry. AWS now serves more than 2,300 government, 7,000 education and 22,000 nonprofit organizations worldwide, including the U.S. Government, the U.S. Intelligence Community, the U.S. Department of Defense, and NASA/JPL.

AWS provides several security capabilities and services to increase privacy and control network access, including network firewalls built into Amazon VPC, data encryption in Amazon S3, connectivity options that enable private or dedicated connections from your on-premises environment, and data encryption both in transit and at rest.

AWS uses a “Shared Responsibility Model” when it comes to security, because not every customer wants everything locked down in the same manner. While AWS manages security of the cloud, security in the cloud is the responsibility of the customer. Customers retain control of what security they choose to implement to protect their own content, platform, applications, systems and networks, no differently than they would for applications in an on-site datacenter.
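As one small example of a control that falls on the customer’s side of that line, here is a hedged boto3 sketch that turns on default server-side encryption for an S3 bucket (the bucket name is a placeholder). AWS secures the infrastructure underneath, but flipping switches like this one is up to you:

```python
# One example of "security in the cloud" being the customer's job:
# enable default server-side encryption (SSE-S3 / AES-256) on a bucket.
# The bucket name is a placeholder.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_encryption(
    Bucket="my-analytics-data-bucket",
    ServerSideEncryptionConfiguration={
        "Rules": [
            {"ApplyServerSideEncryptionByDefault": {"SSEAlgorithm": "AES256"}}
        ]
    },
)

# Verify the configuration took effect.
print(s3.get_bucket_encryption(Bucket="my-analytics-data-bucket"))
```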

AWS Shared Security Model Schematic (image courtesy of AWS properties)

To read AWS Security Best Practices, read this.

AWS has a tiered competency-badged network of partners that provide application development expertise, managed services and professional services such as data migration. This ecosystem, along with AWS’s training and certification programs, makes it easy to adopt and operate AWS in a best-practice fashion.

Some of the Types of AWS Security Solutions Available in AWS Marketplace

Recommended AWS Marketplace security solutions are presented in overview form below. For more detail, visit this page.

Below I’ll give an overview of some of the recommended ISVs for specific security solutions in AWS Marketplace:

You can read the AWS Marketplace “Security Solutions on AWS” whitepaper here.

Access comprehensive developer documents on AWS Security Resources here.

Read the previous post here.

Read the next post here.

#gottaluvAWS! #gottaluvAWSMarketplace!

 

Posted in Amazon Web Services, AWS Cloud & Data Security, AWS Marketplace, AWS Marketplace Security Solutions, AWS Shared Responsibility Model, Cloud Computing

PROCESSING POWER (Chapter 3.3 of “All AWS Data Analytics”)

The Massive, Massive Processing Power of AWS Cloud

3.3  PROCESSING POWER

When “big data” became the norm, it was so large that it became difficult to process using traditional database and software techniques, and it normally exceeds the processing capabilities available on-premises. AWS has computational power that’s second to none.

Amazon EC2 provides a wide selection of instance types optimized to fit different use cases. Instance types comprise varying combinations of CPU, memory, storage, and networking capacity, with compute capacity measured by developers in “vCPUs” (virtual CPUs) – versus the legacy “ECU” (Elastic Compute Unit) measure of EC2 compute power, which you’ll still see at times today.

Each EC2 compute instance is optimized with varying combinations of CPU, memory, storage and networking capacity to meet the need of any big data analytics use case.

Each instance type includes one or more instance sizes, allowing you to scale your resources to the requirements of your target analytical workload. To read more about the differences between Amazon EC2-Classic and Amazon EC2-VPC, read this.
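If you would rather pull those vCPU and memory combinations programmatically than from a pricing page, the EC2 DescribeInstanceTypes API exposes them; here is a quick boto3 sketch (the instance types listed are arbitrary):

```python
# Query vCPU and memory for a few instance types via the EC2
# DescribeInstanceTypes API (the list of types here is arbitrary).
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

resp = ec2.describe_instance_types(
    InstanceTypes=["m4.xlarge", "c4.2xlarge", "r4.4xlarge", "p2.16xlarge"]
)
for it in resp["InstanceTypes"]:
    vcpus = it["VCpuInfo"]["DefaultVCpus"]
    mem_gib = it["MemoryInfo"]["SizeInMiB"] / 1024
    print(f'{it["InstanceType"]}: {vcpus} vCPUs, {mem_gib:.0f} GiB memory')
```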

Massive Amounts of Processing Power – Practically Unlimited – is Available Using AWS

Amazon EC2 Instance Types

Performance is based on the Amazon EC2 instance type you choose. There are many instance types, which you can read about here.

General-purpose instances (e.g., M4, M3) provide a balance of compute, network and memory for many applications (with Intel E5 v3 processors, up to 64 vCPUs). M-family instances are ideal for small and mid-size databases, data processing tasks that require additional memory, caching fleets, and for running backend servers for SAP, Microsoft SharePoint, cluster computing, and other enterprise applications.

Compute Optimized instances (e.g., C3, C4) feature the highest-performing processors (including custom CPUs optimized for EC2) and are recommended for batch processing, distributed analytics, high performance science and engineering applications, ad serving, MMO gaming, and video encoding.

Accelerated Computing instances (P2) are for GPU-optimized scenarios. They have high-performance NVIDIA K80 GPUs, each with 2,496 parallel processing cores and 12 GB of GPU memory, and they support GPUDirect™ (peer-to-peer GPU communication). On a p2.16xlarge instance in this family, you get 16 GPUs, 64 vCPUs, 732 GB of memory and 192 GB of graphics memory! These instances are optimized for machine learning, high performance databases, computational fluid dynamics, computational finance, seismic analysis, molecular modeling, genomics, rendering, and other server-side GPU compute workloads.

Memory Optimized instances (e.g., R4, X1) are for memory-intensive applications, with up to 64 vCPUs, up to 2 TB of RAM and SSD storage. X1 instances are recommended for running in-memory databases like SAP HANA, big data processing engines like Apache Spark or Presto, and high performance computing (HPC) applications. X1 instances are certified by SAP to run Business Warehouse on HANA (BW), Data Mart Solutions on HANA, Business Suite on HANA (SoH), and the next-generation Business Suite S/4HANA in a production environment on the AWS cloud. R4 instances are recommended for high performance databases, data mining & analysis, in-memory databases, distributed web scale in-memory caches, applications performing real-time processing of unstructured big data, Hadoop/Spark clusters, and other enterprise applications.

Storage Optimized instances (e.g., I3), with very fast SSD-backed instance storage optimized for high-IOPS applications, are perfect for massively parallel data warehousing applications, Hadoop, NoSQL databases like Cassandra, MongoDB and Redis, in-memory databases such as Aerospike, scale-out transactional databases, Elasticsearch, and analytics workloads. Dense Storage instances (e.g., D2) feature up to 48 TB of HDD-based local storage, deliver high throughput, and offer the lowest price per disk throughput performance on EC2. This instance type is ideal for Massively Parallel Processing (MPP) data warehousing, MapReduce and Hadoop distributed computing, distributed file systems, network file systems, and log or data-processing applications.

AWS Provides Virtually Unlimited Capacity for Massive Datasets with Blazingly Fast Processing Power

Most of these instances support hardware virtualization, AVX, AVX2, Turbo, enhanced networking performance and cluster networking placement for low-latency communication between instances, and run inside a Virtual Private Cloud, giving customers complete control over network architecture. Instances can also be dedicated to an individual customer to help meet regulatory and compliance requirements (such as HIPAA).

Applications that need to respond to high throughput real time streaming data, such as large scale distributed apps or IoT platforms, plus data intensive analytics applications or large scale web and mobile apps, can also run on AWS Lambda, a simple, scalable, low cost, reliable and low latency compute service, without having to provision or manage underlying compute resources.

Read the last post here.

Read the next post here.

#gottaluvAWS! #gottaluvAWSMarketplace!

Posted in Amazon EC2 On-Demand Instances, Amazon Web Services, Amazon Web Services Analytic Services, AWS Lambda, Compute Instance Types, ECU vs vCPU, Practically Unlimited Processing Power on AWS

SCALING WORKLOADS (Chapter 3.2 of “All AWS Data Analytics Services”)

Cloud Scaling: Up & Out

3.2  SCALING WORKLOADS

Scalability is the capability of a system, network or process to handle a growing amount of work or application traffic. The goal of being scalable is to be able to be available to your customers as demand for your application grows.

AWS provides a scalable architecture that supports growth in users, traffic or data without a drop in performance, both vertically and horizontally, and allows for distributed processing.

AWS makes fast, scalable, gigabyte-to-petabyte scale analytics affordable to anyone via their broad range of storage, compute and analytical options, guaranteed!

Manually Scaling with EC2 Instance Types

Amazon EC2 provides a broad range of instance types optimized to fit different use cases. Instance types are composed of varying combinations of CPU, memory, storage, and networking capacity, allowing you to choose the appropriate mix of resources required by your application. For example, there are “compute optimized“, “memory optimized“, “accelerated computing” (GPU-optimized), “storage optimized“, and “dense-storage optimized” families. Within each family of EC2 instance types there are several instance sizes that allow you to scale your resources to the requirements of your target workload, giving you the ability to scale up to a more performant instance in the family, or scale down to a less performant one, without having to migrate to a new instance type. This means you can maintain performance during spikes in demand and also scale down to save money when there is less demand.

When you resize an instance, you must select an instance type that is compatible with the configuration of the instance. If the instance type that you want is not compatible with the instance configuration you have, then you must migrate your application to a new instance with the instance type that you want.
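Here is a rough boto3 sketch of that manual “scale up” for a single EBS-backed instance – stop it, change the instance type, start it again. The instance ID and target type are placeholders:

```python
# A rough sketch of "scaling up" a single EBS-backed instance by changing
# its instance type; the instance ID and target type are placeholders.
# The instance must be stopped before its type can be changed.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
instance_id = "i-0123456789abcdef0"

ec2.stop_instances(InstanceIds=[instance_id])
ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

ec2.modify_instance_attribute(
    InstanceId=instance_id,
    InstanceType={"Value": "m4.2xlarge"},   # move up within the same family
)

ec2.start_instances(InstanceIds=[instance_id])
ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
print("Instance resized and running again")
```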

Dynamically Scaling

You can also scale up or down dynamically using EC2 Auto Scaling. Auto Scaling helps you ensure that you have the correct number of Amazon EC2 instances available to handle the load for your application. You create collections of EC2 instances, called Auto Scaling Groups. You can specify the minimum number of instances in each Auto Scaling Group, and Auto Scaling ensures that your group never goes below this size. You can specify the maximum number of instances in each Auto Scaling Group, and Auto Scaling ensures that your group never goes above this size. If you specify the desired capacity, either when you create the group or at any time thereafter, Auto Scaling ensures that your group has this many instances. If you specify scaling policies, then Auto Scaling can launch or terminate instances as demand on your application increases or decreases automatically.
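A minimal boto3 sketch of such an Auto Scaling Group, with the minimum, maximum, and desired sizes described above (the launch configuration name and subnet IDs are placeholders):

```python
# A minimal sketch of an Auto Scaling Group with min/max/desired capacity;
# the launch configuration name and subnet IDs are placeholders.
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")

autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="analytics-web-asg",
    LaunchConfigurationName="analytics-web-lc",   # created beforehand
    MinSize=2,           # the group never shrinks below this
    MaxSize=20,          # ...and never grows above this
    DesiredCapacity=10,  # Auto Scaling keeps the group at this size
    VPCZoneIdentifier="subnet-0123456789abcdef0,subnet-0fedcba9876543210",
)
```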

As Auto Scaling adds and removes EC2 instances, you must ensure that the traffic for your application is distributed across all of your EC2 instances. The Elastic Load Balancing service automatically routes incoming web traffic across such a dynamically changing number of EC2 instances. Your load balancer acts as a single point of contact for all incoming traffic to the instances in your Auto Scaling Group. Elastic Load Balancing can detect issues with a particular instance and automatically reroute traffic to other instances until the issues have been resolved and the original instance restored.

Auto Scaling and Elastic Load Balancing can both be triggered through the Amazon CloudWatch monitoring system. CloudWatch allows you to monitor what you’re running in Amazon’s cloud — collecting and tracking metrics, monitoring log files, setting and displaying alarms, and triggering actions like Auto Scaling and Elastic Load Balancing.
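For example, here is a hedged sketch of a CloudWatch alarm that fires when average CPU across the group reaches 50% and invokes an Auto Scaling policy; the policy ARN is a placeholder (it is what put_scaling_policy returns, as sketched further below):

```python
# Sketch of a CloudWatch alarm that triggers an Auto Scaling policy when
# average CPU across the group breaches 50%; the policy ARN is a placeholder.
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

cloudwatch.put_metric_alarm(
    AlarmName="analytics-web-cpu-high",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "AutoScalingGroupName", "Value": "analytics-web-asg"}],
    Statistic="Average",
    Period=60,
    EvaluationPeriods=2,
    Threshold=50.0,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    AlarmActions=["arn:aws:autoscaling:us-east-1:123456789012:scalingPolicy:..."],
)
```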

On-Premise Scaling Up

In an on-premises or data center IT environment, “scaling up” meant purchasing more and more hardware, guessing at how many servers and so on would be needed at peak capacity. IT departments typically provisioned enough capacity to manage highest-case capacity scenarios, and these servers usually run 100% of the time. This approach to “scalability” can leave a significant amount of underutilized resources in the data center most of the time – an inefficiency that can impact overall costs in many ways.

Elasticity

Elasticity is defined as the degree to which a system is able to adapt to workload changes by provisioning & de-provisioning resources in an autonomic manner, so that at each point in time the available resources match the current demand as closely as possible.

Scaling Out or In

Scaling out, or horizontal scaling, adds more instances to handle increased load. On AWS it also comes with the benefit of massive economies of scale: because hundreds of thousands of customers are aggregated in the cloud, you achieve a lower variable cost than you could get on your own, which translates to much lower pay-as-you-go prices. You can also scale out and in with “step scaling policies“.

Auto Scaling applies the aggregation type to the metric data points from all instances and compares the aggregated metric value against the upper and lower bounds defined by the step adjustments to determine which step adjustment to perform. For example, suppose that you have an alarm with a breach threshold of 50 and a scaling adjustment type of “PercentChangeInCapacity“. You also have scale out and scale in policies with the following step adjustments:

Scale out policy

  Lower bound    Upper bound    Adjustment    Metric value
  0              10             0             50 <= value < 60
  10             20             10            60 <= value < 70
  20             null           30            70 <= value < +infinity

Scale in policy

  Lower bound    Upper bound    Adjustment    Metric value
  -10            0              0             40 < value <= 50
  -20            -10            -10           30 < value <= 40
  null           -20            -30           -infinity < value <= 30

Your group has both a current capacity and a desired capacity of 10 instances. The group maintains its current and desired capacity while the aggregated metric value is greater than 40 and less than 60.

If the metric value gets to 60, Auto Scaling increases the desired capacity of the group by 1 instance, to 11 instances, based on the second step adjustment of the scale-out policy (add 10 percent of 10 instances). After the new instance is running and its specified warm-up time has expired, Auto Scaling increases the current capacity of the group to 11 instances. If the metric value rises to 70 even after this increase in capacity, Auto Scaling increases the desired capacity of the group by another 3 instances, to 14 instances, based on the third step adjustment of the scale-out policy (add 30 percent of 11 instances, 3.3 instances, rounded down to 3 instances).

If the metric value gets to 40, Auto Scaling decreases the desired capacity of the group by 1 instance, to 13 instances, based on the second step adjustment of the scale-in policy (remove 10 percent of 14 instances, 1.4 instances, rounded down to 1 instance). If the metric value falls to 30 even after this decrease in capacity, Auto Scaling decreases the desired capacity of the group by another 3 instances, to 10 instances, based on the third step adjustment of the scale-in policy (remove 30 percent of 13 instances, 3.9 instances, rounded down to 3 instances), etc.
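Expressed in code, the scale-out side of the tables above might look like the following boto3 sketch (the group name is a placeholder; the bounds are offsets from the alarm’s breach threshold of 50):

```python
# A sketch of the scale-out policy from the tables above as a step scaling
# policy (the group name is a placeholder). The bounds are offsets from the
# alarm's breach threshold of 50, so 0/10/20 correspond to metric values
# of 50/60/70.
import boto3

autoscaling = boto3.client("autoscaling", region_name="us-east-1")

response = autoscaling.put_scaling_policy(
    AutoScalingGroupName="analytics-web-asg",
    PolicyName="cpu-step-scale-out",
    PolicyType="StepScaling",
    AdjustmentType="PercentChangeInCapacity",
    EstimatedInstanceWarmup=300,   # seconds before a new instance counts
    StepAdjustments=[
        {"MetricIntervalLowerBound": 0, "MetricIntervalUpperBound": 10,
         "ScalingAdjustment": 0},    # 50 <= value < 60: no change
        {"MetricIntervalLowerBound": 10, "MetricIntervalUpperBound": 20,
         "ScalingAdjustment": 10},   # 60 <= value < 70: add 10 percent
        {"MetricIntervalLowerBound": 20,
         "ScalingAdjustment": 30},   # 70 <= value: add 30 percent
    ],
)
print(response["PolicyARN"])   # wire this ARN to a CloudWatch alarm
```

A matching scale-in policy would mirror the negative adjustments in the second table, and the EstimatedInstanceWarmup setting ties into the warm-up behavior described next.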

Instance Warmup

With step scaling policies, you can specify the number of seconds that it takes for a newly launched instance to warm up. Until its specified warm-up time has expired, an instance is not counted toward the aggregated metrics of the Auto Scaling Group.

While scaling out, Auto Scaling does not consider instances that are warming up as part of the current capacity of the group. Therefore, multiple alarm breaches that fall in the range of the same step adjustment result in a single scaling activity. This ensures that Auto Scaling doesn’t add more instances than you need. Using the example in the previous section, suppose that the metric gets to 60, and then it gets to 62 while the new instance is still warming up. The current capacity is still 10 instances, so Auto Scaling should add 1 instance (10 percent of 10 instances), but the desired capacity of the group is already 11 instances, so Auto Scaling does not increase the desired capacity further. However, if the metric gets to 70 while the new instance is still warming up, Auto Scaling should add 3 instances (30 percent of 10 instances), but the desired capacity of the group is already 11, so Auto Scaling adds only 2 instances, for a new desired capacity of 13 instances.

While scaling in, Auto Scaling considers instances that are terminating as part of the current capacity of the group. Therefore, AWS won’t remove more instances from the Auto Scaling Group than necessary.

Note that a “scale in” activity can’t start while a “scale out” activity is in progress.

On-Premise Horizontal Scaling

Typically, on-premises horizontal scaling lowered costs by using commodity hardware and software, but it is operationally complex and time consuming, and requires your IT staff to continually put in long hours making sure enough capacity is provisioned at all times, no matter what.

"Scaling Out" Guessing Done On-Prem Requires Constant Monitoring and Configuration


Scaling with AWS Lambda

Sooner than you think, servers will be obsolete – or at minimum, “old school.” AWS Lambda is part of another new buzzword: “Serverless Computing.”

AWS Lambda runs your code written in Java, Node.js, and Python without requiring you to provision or manage servers. Lambda will run and scale your code with high availability, and you pay only for the compute time you consume in increments of 100 milliseconds. With Lambda, you can run code for virtually any type of application or backend service – all with zero administration. Just upload your code and Lambda takes care of everything required to run and scale your code with high availability. You can set up your code to automatically trigger from other AWS services or call it directly from any web or mobile app.
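A Lambda function is literally just a handler; here is a minimal, generic Python sketch. The event shape depends on the trigger – S3, Kinesis, and DynamoDB events all arrive under a "Records" key:

```python
# A minimal Python Lambda handler sketch. The event shape depends on the
# trigger; this one just counts whatever records the trigger delivered
# (S3, Kinesis, and DynamoDB events all arrive under a "Records" key).
import json

def lambda_handler(event, context):
    records = event.get("Records", [])
    print(json.dumps({"received_records": len(records)}))

    for record in records:
        # Real processing (thumbnailing, transcoding, filtering, ...) goes here.
        print(record.get("eventSource", "unknown source"))

    return {"processed": len(records)}
```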

Below are some diagrams on the architecture of 3 different Lambda processes.

  1. Real-time File Processing: You can use Amazon S3 to trigger AWS Lambda to process data immediately after an upload. For example, you can use Lambda to thumbnail images, transcode videos, index files, process logs, validate content, and aggregate and filter data in real-time:
Diagram of Lambda in Real-Time File Processing (image courtesy of AWS properties)

2. Real-time Stream Processing: You can use AWS Lambda and Amazon Kinesis to process real-time streaming data for application activity tracking, transaction order processing, click stream analysis, data cleansing, metrics generation, log filtering, indexing, social media analysis, and IoT device data telemetry and metering:

Diagram of Lambda Real-time Stream Processing (image courtesy of AWS properties)

3. Extract, Transform, & Load: You can use AWS Lambda to perform data validation, filtering, sorting, or other transformations for every data change in a DynamoDB table and load the transformed data to another data store:

Diagram of Lambda Real-time Retail Data Warehouse ETL Processing (image courtesy of AWS properties)

Read the previous post here.

Read the next post here.

#gottaluvAWS! #gottaluvAWSMarketplace!

 

Posted in Amazon Web Services, Auto Scaling, AWS Lambda, Cloud Computing, EC2, EC2 Instance Warm Up, Elasticity, Scaling In and Out, Scaling Up or Down

OVERCOMING KEY CHALLENGES IN DATA ANALYTICS (Chapter 3.1 of “All AWS Data Analytics Services”)

Stop Guessing at Capacity!

3.1  CAPACITY GUESSING

In the pre-cloud days, many different types of servers were located on-premises or in a data center. The servers were expensive, each software package that needed to run on each server needed very expensive licenses which had to be continually renewed or updated, and you needed highly-paid staff to provision, configure, and maintain the servers.

As an ever-increasing proliferation of data is emitted from new and previously unforeseen sources, traditional in-house IT solutions are unable to keep up with the pace. Heavily investing in data centers and servers by “best guess” is a waste of time and money, and a never-ending job.

AWS eliminates over-purchasing of servers and infrastructure capacity. Before the cloud, when you made a capacity decision prior to deploying an application, you often over-purchased “just in case” your app became the next killer app. Oftentimes you ended up with expensive idle resources or, even worse, dealing with limited capacity and losing customers.

On AWS, you can access as much or as little capacity as you need, and scale horizontally or vertically as required within minutes, or even automate capacity. This lowers the cost of ownership and reduces management overhead costs, freeing up your business for more strategic and business-focused tasks.

AWS Eliminates Capacity Guessing in Their Massive Secure Data Centers

One benefit of being on AWS is that you trade “Capital Expense” for “Variable Expense“. Rather than having to invest heavily in data centers & servers before you know how you’re going to use them, you only pay when you consume computing resources, and only for what is actually consumed. This translates into a dramatic decrease in IT costs. It’s much smarter to focus on the projects that differentiate your business vs. the heavy lifting of racking, stacking, & powering servers.

The Benefits of Using an AWS Fully-Managed Service

Today’s need for speed and agility in analyzing data differently and efficiently requires complex architectures, which are available and ready for use with the click of a button on AWS and the AWS Marketplace – eliminating the need to concern yourself with the underlying mechanisms and configurations you’d have to manage on premises.

In addition, AWS offers “AWS Trusted Advisor“, an online resource to help you reduce cost, increase performance, and improve security by optimizing your AWS environment. It provides real-time guidance to help you provision resources following AWS best practices.

A Diagram of How AWS Trusted Advisor Works

Every AWS customer gets access to four categories in Trusted Advisor:

  • Cost Optimization
  • Performance
  • Security
  • Fault Tolerance

If you have Business or Enterprise support, you then have access to the full set of Trusted Advisor categories.

Administrators can apply the Trusted Advisor suggestions at their own pace and adopt regular use of Trusted Advisor recommendations as a significant part of an ongoing, day-to-day capacity optimization plan.
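If you are on Business or Enterprise support, you can even pull Trusted Advisor checks programmatically through the AWS Support API; here is a small boto3 sketch:

```python
# Sketch of pulling Trusted Advisor checks through the AWS Support API.
# This only works on accounts with Business or Enterprise support, and the
# Support API endpoint lives in us-east-1.
import boto3

support = boto3.client("support", region_name="us-east-1")

checks = support.describe_trusted_advisor_checks(language="en")["checks"]
for check in checks:
    result = support.describe_trusted_advisor_check_result(
        checkId=check["id"], language="en"
    )["result"]
    print(f'{check["category"]:25} {check["name"]}: {result["status"]}')
```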

You can read the previous post here.

You can read the next post here.

#gottaluvAWS! #gottaluvAWSMarketplace!

 

 

 

Posted in #savingcostonAWS, Architecture, AWS Fully-Managed Service, AWS Marketplace, AWS Trusted Advisor, Cloud Computing, Eliminate Capacity Guessing