3.3 PROCESSING POWER
As “big data” became the norm, datasets grew too large to process with traditional database and software techniques, and they often exceed the processing capacity available on-premises. AWS provides on-demand compute capacity that can scale to these workloads.
Amazon EC2 provides a wide selection of instance types optimized to fit different use cases. Instance types comprise varying combinations of CPU, memory, storage, and networking capacity, so you can choose the right mix for any big data analytics use case. Compute capacity is described in vCPUs (virtual CPUs); the legacy measure, the ECU (EC2 Compute Unit), still appears at times today.
Each instance type includes one or more instance sizes, allowing you to scale your resources to the requirements of your target analytical workload. The differences between Amazon EC2-Classic and Amazon EC2-VPC are covered in the AWS documentation.
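To make the type-plus-size naming concrete, here is a minimal sketch that splits a name such as p2.16xlarge into its parts. The function name parse_instance_type and the regular expression are assumptions made for illustration; the pattern only covers names of the form used in this post (letters, a generation number, a dot, then a size) and is not an official AWS grammar.

```python
import re
from typing import NamedTuple


class InstanceType(NamedTuple):
    family: str      # letter(s) naming the family, e.g. "m", "c", "p"
    generation: int  # number following the family, e.g. 4 in "m4"
    size: str        # size within the type, e.g. "large", "16xlarge"


# Assumed pattern for names like those in this post (m4.large, p2.16xlarge).
_NAME = re.compile(r"^([a-z]+)(\d+)\.([a-z0-9]+)$")


def parse_instance_type(name: str) -> InstanceType:
    """Split an instance type name into family, generation, and size."""
    match = _NAME.match(name.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized instance type name: {name!r}")
    family, generation, size = match.groups()
    return InstanceType(family, int(generation), size)


if __name__ == "__main__":
    print(parse_instance_type("p2.16xlarge"))
```

Names that carry extra suffixes or hyphens would need a broader pattern than the one assumed here.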
Amazon EC2 Instance Types
Performance depends on the Amazon EC2 instance type you choose. There are many instance types; the main families are:
General-purpose instances (e.g., M4, M3) provide a balance of compute, network, and memory resources for many applications (with Intel Xeon E5 v3 processors and up to 64 vCPUs). M-series instances are ideal for small and mid-size databases, data processing tasks that require additional memory, caching fleets, and for running backend servers for SAP, Microsoft SharePoint, cluster computing, and other enterprise applications.
Compute Optimized instances (e.g., C3, C4) feature the highest-performing processors (including custom CPUs optimized for EC2) and are recommended for graphics-optimized batch processing, distributed analytics, high performance science and engineering applications, ad serving, MMO gaming, and video encoding.
Accelerated Computing instances (P2) are for GPU-optimized scenarios. They have high-performance NVIDIA K80 GPUs, each with 2,496 parallel processing cores and 12 GB of GPU memory, and they support GPUDirect™ (peer-to-peer GPU communication). A p2.16xlarge instance in this family has 16 GPUs, 64 vCPUs, 732 GB of memory, and 192 GB of graphics memory. These instances are optimized for machine learning, high performance databases, computational fluid dynamics, computational finance, seismic analysis, molecular modeling, genomics, rendering, and other server-side GPU compute workloads.
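The p2.16xlarge totals follow directly from the per-GPU figures quoted above. As a quick check, the arithmetic can be written out as below; the variable names are chosen here for illustration only.

```python
# Per-GPU figures quoted in the text for the NVIDIA K80 GPUs in P2 instances.
GPUS_PER_INSTANCE = 16      # p2.16xlarge
CORES_PER_GPU = 2496        # parallel processing cores per GPU
GPU_MEMORY_GB_PER_GPU = 12  # GB of GPU memory per GPU

# Totals for the whole instance.
total_cores = GPUS_PER_INSTANCE * CORES_PER_GPU
total_gpu_memory_gb = GPUS_PER_INSTANCE * GPU_MEMORY_GB_PER_GPU

if __name__ == "__main__":
    print(total_cores)          # 39936
    print(total_gpu_memory_gb)  # 192
```

The 192 GB result matches the graphics memory figure given for the p2.16xlarge.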
Memory Optimized instances (e.g., R4, X1) are for memory-intensive applications, with up to 128 vCPUs and nearly 2 TB of RAM on the largest X1 size, plus SSD storage. X1 instances are recommended for running in-memory databases like SAP HANA, big data processing engines like Apache Spark or Presto, and high performance computing (HPC) applications. X1 instances are certified by SAP to run Business Warehouse on HANA (BW), Data Mart Solutions on HANA, Business Suite on HANA (SoH), and the next-generation Business Suite S/4HANA in a production environment on the AWS cloud. R4 instances are recommended for high performance databases, data mining and analysis, in-memory databases, distributed web-scale in-memory caches, applications performing real-time processing of unstructured big data, Hadoop/Spark clusters, and other enterprise applications.
Storage Optimized instances (e.g., I3) offer very fast SSD-backed instance storage optimized for high-IOPS applications. They are well suited to massively parallel data warehousing applications, Hadoop, NoSQL databases like Cassandra, MongoDB, and Redis, in-memory databases such as Aerospike, scale-out transactional databases, Elasticsearch, and analytics workloads.
Dense Storage instances (e.g., D2) feature up to 48 TB of HDD-based local storage, deliver high disk throughput, and offer the lowest price per disk throughput performance on EC2. This instance type is ideal for Massively Parallel Processing (MPP) data warehousing, MapReduce and Hadoop distributed computing, distributed file systems, network file systems, and log or data-processing applications.
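The recommendations above can be collected into a small lookup table. The sketch below is a hypothetical helper, not an AWS tool: the name suggest_instance_types and the keyword lists are assumptions, with the keywords taken as a sample from the descriptions in this post rather than from any complete catalog.

```python
from typing import Dict, List

# A sample of workload keywords from the descriptions above (lowercased).
RECOMMENDED_FOR: Dict[str, List[str]] = {
    "M3/M4 (general purpose)": ["small and mid-size databases", "caching fleets", "sap", "sharepoint"],
    "C3/C4 (compute optimized)": ["distributed analytics", "ad serving", "mmo gaming", "video encoding"],
    "P2 (accelerated computing)": ["machine learning", "genomics", "rendering", "seismic analysis"],
    "X1 (memory optimized)": ["sap hana", "apache spark", "presto", "hpc"],
    "R4 (memory optimized)": ["data mining", "in-memory caches", "hadoop", "spark"],
    "I3 (storage optimized)": ["nosql", "cassandra", "mongodb", "elasticsearch", "hadoop"],
    "D2 (dense storage)": ["mpp data warehousing", "mapreduce", "hadoop", "distributed file systems"],
}


def suggest_instance_types(workload: str) -> List[str]:
    """Return the families whose keyword list mentions the given workload."""
    needle = workload.strip().lower()
    if not needle:
        return []
    return [
        family
        for family, keywords in RECOMMENDED_FOR.items()
        if any(needle in keyword for keyword in keywords)
    ]


if __name__ == "__main__":
    print(suggest_instance_types("Hadoop"))
```

A workload such as Hadoop matches several families, which reflects the overlap in the descriptions: the final choice still depends on whether memory, IOPS, or disk throughput is the bottleneck.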
Most of these instances support hardware virtualization, AVX, AVX2, Intel Turbo Boost, enhanced networking, and cluster placement groups for low-latency communication between instances. They run inside a Virtual Private Cloud, giving customers complete control over network architecture. Instances can also be dedicated to an individual customer to help meet regulatory and compliance requirements (such as HIPAA).
Applications that must respond to high-throughput, real-time streaming data, such as large-scale distributed apps or IoT platforms, as well as data-intensive analytics applications and large-scale web and mobile apps, can also run on AWS Lambda. Lambda is a simple, scalable, low-cost, reliable, and low-latency compute service that runs your code without requiring you to provision or manage the underlying compute resources.
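As a rough illustration of the Lambda model, the sketch below shows a Python handler that processes a batch of streaming records. It assumes the function is triggered by an Amazon Kinesis stream, which this post does not specify; in that case each record carries its payload base64-encoded under kinesis.data. The handler name, the JSON payload format, and the device_id field are hypothetical.

```python
import base64
import json


def handler(event, context):
    """Entry point that Lambda would invoke; the name is chosen here."""
    readings = []
    for record in event.get("Records", []):
        # Kinesis delivers each payload base64-encoded under kinesis.data.
        payload = base64.b64decode(record["kinesis"]["data"])
        # Assumption: producers send one JSON object per record.
        readings.append(json.loads(payload))

    # Hypothetical processing step: count the readings per device.
    counts = {}
    for reading in readings:
        device = reading.get("device_id", "unknown")
        counts[device] = counts.get(device, 0) + 1

    return {"processed": len(readings), "per_device": counts}
```

There is no server or instance type to choose in this model: the service invokes the handler with each batch of records and scales the number of concurrent invocations for you.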