Determine the Tools and Techniques Required for Analysis

Amazon Athena

Well integrated with the AWS Glue Data Catalog.
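
As a rough illustration of the Athena and Glue Data Catalog integration, the sketch below builds the arguments for the boto3 Athena client's `start_query_execution` call. The database, table, and bucket names are hypothetical placeholders, and `AwsDataCatalog` is assumed to be the catalog name in use.

```python
def build_athena_query_request(sql, database, output_location, catalog="AwsDataCatalog"):
    """Return keyword arguments for the boto3 Athena start_query_execution call."""
    return {
        "QueryString": sql,
        # The database is looked up in the Glue Data Catalog.
        "QueryExecutionContext": {"Catalog": catalog, "Database": database},
        # Athena writes query results to this S3 location.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(request, region_name="us-east-1"):
    """Submit the request to Athena; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3 installed

    client = boto3.client("athena", region_name=region_name)
    return client.start_query_execution(**request)["QueryExecutionId"]


request = build_athena_query_request(
    sql="SELECT status, COUNT(*) AS hits FROM web_logs GROUP BY status",  # hypothetical table
    database="analytics_db",                          # hypothetical Glue database
    output_location="s3://example-athena-results/",   # hypothetical bucket
)
```

Calling `run_query(request)` would submit the query and return its execution ID, which can then be polled for completion.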

Amazon EMR

It is a managed Hadoop framework
Simplifies running big data frameworks – Apache Hadoop, Apache Spark, HBase, Presto, and Flink on AWS
Process and analyze vast amounts of data.
Uses Apache Hive and Apache Pig to process data for analytics and BI.
Used to transform and move large amounts of data into and out of other AWS data stores and databases.
Can interact with data in other AWS data stores like S3, DynamoDB.
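
One common way to run such a transformation is to submit a Spark step to a running cluster. The sketch below builds a step definition for the boto3 EMR client's `add_job_flow_steps` call; the script location, arguments, and cluster ID are hypothetical, and the cluster is assumed to have Spark installed.

```python
def build_spark_step(name, script_uri, script_args=()):
    """Return an EMR step that runs a PySpark script through command-runner.jar."""
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",  # keep the cluster running if the step fails
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "--deploy-mode", "cluster", script_uri, *script_args],
        },
    }


def submit_steps(cluster_id, steps, region_name="us-east-1"):
    """Submit steps to an existing cluster; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3 installed

    emr = boto3.client("emr", region_name=region_name)
    return emr.add_job_flow_steps(JobFlowId=cluster_id, Steps=steps)["StepIds"]


step = build_spark_step(
    "transform-raw-to-curated",
    "s3://example-bucket/jobs/transform.py",  # hypothetical script location
    ["--input", "s3://example-bucket/raw/", "--output", "s3://example-bucket/curated/"],
)
```

Passing `[step]` to `submit_steps` with a real cluster ID would queue the job on that cluster.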

EMR Notebooks

Based on the popular Jupyter Notebook
Provides a development and collaboration environment for ad hoc querying and exploratory analysis.

Amazon CloudSearch

Amazon Elasticsearch Service

Used to deploy, secure, operate, and scale Elasticsearch
Elasticsearch is used to search, analyze, and visualize data in real-time.
Provides direct access to the Elasticsearch APIs and real-time analytics capabilities
Useful for
- log analytics
- full-text search
- application monitoring
- clickstream analytics
Integrations with Kibana and Logstash
Integrates with other AWS services such as Amazon VPC, AWS KMS, Amazon Kinesis Data Firehose, AWS Lambda, AWS IAM, Amazon Cognito, and Amazon CloudWatch.
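
To make the log analytics and full-text search use cases concrete, here is a minimal sketch of an Elasticsearch query body that combines a full-text match with a time-range filter. The field names `message` and `@timestamp` are assumptions (they follow common Logstash conventions); the body would be sent to an index's `_search` endpoint.

```python
import json


def build_log_search(text, field="message", since="now-15m", size=20):
    """Return a query body: full-text match on `field`, limited to recent documents."""
    return {
        "size": size,
        "sort": [{"@timestamp": {"order": "desc"}}],  # newest entries first
        "query": {
            "bool": {
                "must": [{"match": {field: text}}],  # scored full-text match
                "filter": [{"range": {"@timestamp": {"gte": since}}}],  # unscored time filter
            }
        },
    }


body = build_log_search("connection timeout")
print(json.dumps(body, indent=2))
```

Putting the time range under `filter` rather than `must` keeps it out of relevance scoring, which is the usual choice for log queries.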

Amazon Kinesis

Used to collect, process, and analyze real-time, streaming data
Easily get timely insights and react quickly to new information.
Offers flexibility to choose tools.
Ingest real-time data such
Can process and analyze data as it arrives and respond instantly instead of waiting
Currently offers four services
- Kinesis Data Firehose
- Kinesis Data Analytics
- Kinesis Data Streams
- Kinesis Video Streams
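
As an illustration of ingesting streaming data, the sketch below builds the arguments for the boto3 Kinesis client's `put_record` call on a Kinesis Data Streams stream. The stream name and event fields are hypothetical, and using `user_id` as the partition key is only one possible choice.

```python
import json


def build_put_record(stream_name, event, partition_key_field="user_id"):
    """Return keyword arguments for the boto3 Kinesis put_record call."""
    return {
        "StreamName": stream_name,
        "Data": json.dumps(event).encode("utf-8"),        # payload must be bytes
        "PartitionKey": str(event[partition_key_field]),  # determines the target shard
    }


def send(record, region_name="us-east-1"):
    """Write one record to the stream; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3 installed

    kinesis = boto3.client("kinesis", region_name=region_name)
    return kinesis.put_record(**record)["SequenceNumber"]


record = build_put_record(
    "clickstream-events",  # hypothetical stream name
    {"user_id": 42, "page": "/pricing", "action": "view"},
)
```

Records that share a partition key land on the same shard, so ordering is preserved per user in this sketch.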

Amazon Redshift

It is a fast, scalable data warehouse
Used to analyze all data across your data warehouse and data lake.
Uses machine learning, massively parallel query execution, and columnar storage to speed up queries.
Set up and deploy a new data warehouse in minutes
Run queries across petabytes of data in Redshift and exabytes of data in a data lake.
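
A query that spans both the warehouse and the data lake might look like the sketch below, which builds the arguments for the boto3 Redshift Data API's `execute_statement` call. The external schema, table names, cluster ID, and user are all hypothetical, and an external schema over the data lake is assumed to exist already.

```python
def build_spectrum_query_request(cluster_id, database, db_user):
    """Return keyword arguments for the boto3 redshift-data execute_statement call."""
    sql = (
        "SELECT c.segment, SUM(e.amount) AS total "
        "FROM spectrum_schema.events e "                    # hypothetical external (data lake) table
        "JOIN public.customers c ON c.id = e.customer_id "  # hypothetical local Redshift table
        "GROUP BY c.segment"
    )
    return {
        "ClusterIdentifier": cluster_id,
        "Database": database,
        "DbUser": db_user,
        "Sql": sql,
    }


def run_statement(request, region_name="us-east-1"):
    """Submit the statement; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3 installed

    client = boto3.client("redshift-data", region_name=region_name)
    return client.execute_statement(**request)["Id"]


request = build_spectrum_query_request("example-cluster", "dev", "analyst")
```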

Amazon QuickSight

AWS Data Pipeline

It is a web service
Used to reliably process and move data
Moves data between different AWS services, as well as on-premises data sources, at specified intervals.
Regularly access data where it’s stored, transform and process it at scale
Transfer the results to AWS services.
Easily create complex data processing workloads that are
- fault tolerant
- repeatable
- highly available

AWS Glue

Fully managed ETL service
Easily prepare and load data for analytics.
Create and run an ETL job in AWS Management Console.
Point Glue at data stored on AWS; it discovers the data and stores the associated metadata in the Glue Data Catalog.
Once cataloged, data is immediately searchable, queryable, and available for ETL.
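
The discovery step can also be driven from code. The sketch below builds the arguments for the boto3 Glue client's `create_crawler` call and then starts the crawler; the crawler name, IAM role ARN, database, and S3 path are hypothetical placeholders.

```python
def build_crawler_request(name, role_arn, database, s3_path):
    """Return keyword arguments for the boto3 Glue create_crawler call."""
    return {
        "Name": name,
        "Role": role_arn,                               # IAM role Glue assumes to read the data
        "DatabaseName": database,                       # Data Catalog database for discovered tables
        "Targets": {"S3Targets": [{"Path": s3_path}]},  # where the crawler looks for data
    }


def create_and_start_crawler(request, region_name="us-east-1"):
    """Create the crawler and start it; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3 installed

    glue = boto3.client("glue", region_name=region_name)
    glue.create_crawler(**request)
    glue.start_crawler(Name=request["Name"])


crawler = build_crawler_request(
    name="raw-sales-crawler",                                      # hypothetical crawler name
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",     # hypothetical role
    database="analytics_db",                                       # hypothetical database
    s3_path="s3://example-bucket/raw/sales/",                      # hypothetical data location
)
```

Once the crawler finishes, the tables it registers in the Data Catalog become available to ETL jobs and to query services that read the catalog.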