Amazon Athena

  • It is an interactive query service
  • Easily analyze data in S3 using standard SQL.
  • It is serverless
  • No infrastructure to manage
  • Pay only for the queries that you run.

Running Amazon Athena

  • Point to data in S3
  • Define the schema
  • Start querying using standard SQL.
  • Most results are delivered within seconds.
  • No need for complex ETL jobs to prepare data for analysis.
  • Anyone with SQL skills can quickly analyze large-scale datasets.
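The three steps above (point to data in S3, define the schema, query with SQL) can be sketched in Python. This is a minimal sketch under stated assumptions, not a definitive setup: the table `access_logs`, its columns, the bucket `example-bucket`, and the helper names are hypothetical. The only AWS call used is boto3's Athena `start_query_execution`, which needs boto3 installed and AWS credentials configured.

```python
def build_external_table_ddl(table, columns, s3_location):
    """Return a CREATE EXTERNAL TABLE statement for comma-delimited data in S3.

    `columns` is a list of (name, type) pairs. All names are illustrative.
    """
    cols = ",\n  ".join(f"{name} {col_type}" for name, col_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        f"LOCATION '{s3_location}'"
    )


def run_query(sql, database, output_location):
    """Submit SQL to Athena and return the query execution ID.

    boto3 is imported lazily so the DDL helper works without it.
    """
    import boto3

    athena = boto3.client("athena")
    response = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]


# Steps 1 and 2: point to the data and define the schema (hypothetical names).
ddl = build_external_table_ddl(
    "access_logs",
    [("request_time", "string"), ("status", "int"), ("bytes_sent", "bigint")],
    "s3://example-bucket/logs/",
)

# Step 3: a standard SQL query against the new table.
query = "SELECT status, COUNT(*) AS hits FROM access_logs GROUP BY status"
```

With credentials in place, `run_query(ddl, "default", "s3://example-bucket/athena-results/")` followed by `run_query(query, ...)` would submit both statements; Athena writes the results to the given output location.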

Well integrated with the AWS Glue Data Catalog, which can be used to:

  • Create a unified metadata repository across various services
  • Crawl data sources to discover schemas
  • Populate the Catalog with new and modified table definitions
  • Maintain schema versioning
  • Glue’s ETL capabilities can also be used.
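As a rough illustration of crawling a data source to populate the Catalog, the sketch below builds the arguments for a Glue crawler over an S3 path. The crawler name, IAM role ARN, database name, and bucket are all hypothetical, and the helper functions are not part of any AWS library. The boto3 Glue calls used are `create_crawler` and `start_crawler`, which need boto3 and AWS credentials.

```python
def crawler_config(name, role_arn, database, s3_path):
    """Build keyword arguments for the Glue create_crawler call."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def create_and_start_crawler(config):
    """Create the crawler and start a run (requires boto3 and credentials)."""
    import boto3

    glue = boto3.client("glue")
    glue.create_crawler(**config)
    glue.start_crawler(Name=config["Name"])


# Hypothetical values, for illustration only.
config = crawler_config(
    name="example-logs-crawler",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",
    database="example_db",
    s3_path="s3://example-bucket/logs/",
)
```

Once a crawler run finishes, the tables it discovers appear in the named Catalog database and can be queried from Athena.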

Amazon EMR

  • It is a managed Hadoop framework
  • Simplifies running big data frameworks such as Apache Hadoop, Apache Spark, HBase, Presto, and Flink on AWS
  • Process and analyze vast amounts of data.
  • Uses Apache Hive and Apache Pig to process data for analytics and BI.
  • Used to transform and move large amounts of data into and out of other AWS data stores and databases.
  • Can interact with data in other AWS data stores like S3, DynamoDB.

EMR Notebooks

  • Are based on the popular Jupyter Notebook
  • Provide a development and collaboration environment for ad hoc querying and exploratory analysis.

Amazon CloudSearch

  • It is a managed service
  • Used to set up, manage, and scale a search solution for a website or application.
  • Supports 34 languages
  • Supported search features
    • Highlighting
    • Autocomplete
    • Geospatial search

Amazon Elasticsearch Service

  • Used to deploy, secure, operate, and scale Elasticsearch
  • Elasticsearch is used to search, analyze, and visualize data in real-time.
  • Provides direct access to the Elasticsearch APIs and real-time analytics capabilities
  • Useful for
    • log analytics
    • full-text search
    • application monitoring
    • clickstream analytics
  • Integrates with Kibana and Logstash
  • Integrates with other AWS services such as Amazon VPC, AWS KMS, Amazon Kinesis Data Firehose, AWS Lambda, AWS IAM, Amazon Cognito, and Amazon CloudWatch.
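To make the full-text search use case concrete, here is a minimal sketch of an Elasticsearch query body using the standard `match` query. The field name `message` and the search text are hypothetical, and the helper is for illustration only. Sending the request (a JSON body to an index's `_search` endpoint on the domain) is left out because it depends on how access to the domain is configured.

```python
import json


def full_text_query(field, text, size=10):
    """Build a query body that runs a full-text match on one field."""
    return {"size": size, "query": {"match": {field: text}}}


# Hypothetical log search: up to 10 documents whose `message` field matches.
body = full_text_query("message", "connection timeout")
payload = json.dumps(body)
```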

Amazon Kinesis

  • Used to collect, process, and analyze real-time, streaming data
  • Easily get timely insights and react quickly to new information.
  • Offers flexibility to choose tools.
  • Ingest real-time data
  • Can process and analyze data as it arrives and respond instantly instead of waiting
  • Currently offers four services
    • Kinesis Data Firehose
    • Kinesis Data Analytics
    • Kinesis Data Streams
    • Kinesis Video Streams
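As an illustration of ingesting data into Kinesis Data Streams, the sketch below packages one event as a record and sends it with boto3's `put_record`. The stream name, event fields, and helper names are hypothetical; the send step needs boto3, AWS credentials, and an existing stream.

```python
import json


def build_record(stream_name, event, partition_key):
    """Build keyword arguments for the Kinesis put_record call.

    The event is serialized to JSON bytes. The partition key decides which
    shard receives the record.
    """
    return {
        "StreamName": stream_name,
        "Data": json.dumps(event).encode("utf-8"),
        "PartitionKey": partition_key,
    }


def send(record):
    """Send one record to the stream (requires boto3 and credentials)."""
    import boto3

    kinesis = boto3.client("kinesis")
    return kinesis.put_record(**record)


# Hypothetical clickstream event, keyed by user ID.
record = build_record(
    "example-clickstream",
    {"user": "u-123", "page": "/home"},
    partition_key="u-123",
)
```

Using the user ID as the partition key is one possible design choice: it keeps a given user's events on the same shard, which preserves their order.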

Amazon Redshift

  • It is a fast, scalable data warehouse
  • Used to analyze data across the data warehouse and the data lake.
  • Uses machine learning, massively parallel query execution, and columnar storage for fast performance.
  • Set up and deploy a new data warehouse in minutes
  • Run queries across petabytes of data in Redshift and exabytes of data in the data lake.

Amazon QuickSight

  • It is a fast, cloud-powered business intelligence (BI) service
  • Used to deliver insights.
  • Create and publish interactive dashboards
  • Dashboards accessible from browsers or mobile devices.
  • Embed dashboards into applications for self-service analytics
  • Easily scales without any software to install or infrastructure to manage.

AWS Data Pipeline

  • It is a web service
  • Used to reliably process and move data
  • Moves data between different AWS services and on-premises data sources at specified intervals.
  • Regularly access data where it’s stored, transform and process it at scale, and transfer the results to AWS services.
  • Easily create complex data processing workloads that are
    • fault tolerant
    • repeatable
    • highly available

AWS Glue

  • Fully managed ETL service
  • Easily prepare and load data for analytics.
  • Create and run an ETL job in AWS Management Console.
  • Point Glue to data stored on AWS; Glue discovers the data and stores the associated metadata in the Glue Data Catalog.
  • Once cataloged, data is immediately searchable, queryable, and available for ETL.