CloudWatch Introduction

This section introduces Amazon CloudWatch.

The SOA-C01 exam has been updated to AWS Certified SysOps Administrator – Associate (SOA-C02).

Amazon CloudWatch monitors

AWS resources
applications running on AWS

CloudWatch

Collects and tracks metrics for AWS resources and applications.
The CloudWatch home page displays metrics about every AWS service in use.
You can create custom dashboards to display metrics.
You can create alarms that watch metrics and send notifications.
Alarms can also automatically make changes to the monitored resources when a threshold is breached.

Access CloudWatch by

Amazon CloudWatch console – https://console.aws.amazon.com/cloudwatch/
AWS CLI
CloudWatch API
AWS SDKs

CloudWatch Namespaces

A CloudWatch namespace

Is a container for CloudWatch metrics.
Metrics in different namespaces are isolated from each other.
There is no default namespace.
You must specify a namespace for each data point published to CloudWatch.
You can specify a namespace name when you create a metric.
Namespace names must contain valid XML characters and be fewer than 256 characters in length.
Allowed characters are alphanumeric characters (0-9A-Za-z), period (.), hyphen (-), underscore (_), forward slash (/), hash (#), and colon (:).
AWS namespaces follow the naming convention AWS/service, for example AWS/EC2.
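
The naming rules above can be checked before publishing data. The following is a minimal sketch, not an official validator: the function names are hypothetical, and the pattern only encodes the characters and length limit listed in this section.

```python
import re

# Characters listed above: alphanumerics, period, hyphen, underscore,
# forward slash, hash, and colon.
_NAMESPACE_PATTERN = re.compile(r"[0-9A-Za-z.\-_/#:]+")
MAX_NAMESPACE_LENGTH = 255  # "fewer than 256 characters"


def is_valid_namespace(name: str) -> bool:
    """Return True if `name` satisfies the naming rules listed above."""
    if not name or len(name) > MAX_NAMESPACE_LENGTH:
        return False
    return _NAMESPACE_PATTERN.fullmatch(name) is not None


def is_aws_namespace(name: str) -> bool:
    """Return True if `name` follows the AWS/service convention."""
    return is_valid_namespace(name) and name.startswith("AWS/") and len(name) > 4
```

For example, `is_valid_namespace("MyApp/Orders")` accepts a custom namespace (the name is made up), while a name containing a space is rejected.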

CloudWatch Metrics

A CloudWatch metric

Represents a time-ordered set of data points published to CloudWatch.
It is similar to a variable to monitor, with data points as the values of that variable over time.
AWS services send metrics to CloudWatch.
You can also send custom metrics to CloudWatch.
You can retrieve statistics about those data points as an ordered set of time-series data.
Metrics exist only in the Region in which they were created and cannot be deleted.
They automatically expire after 15 months if no new data is published to them.
Data points expire on a rolling basis; as new data points come in, data older than 15 months is dropped.
Metrics are uniquely defined by a name, a namespace, and zero or more dimensions.
Each data point in a metric has a time stamp and, optionally, a unit of measure.
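
To illustrate how a name, a namespace, and dimensions together identify a metric, here is a small sketch. `MetricIdentity` and `metric_identity` are hypothetical helpers written for this example, not part of any AWS SDK, and the instance IDs are placeholders.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class MetricIdentity:
    """Hypothetical helper: the three parts that uniquely define a metric."""
    namespace: str
    name: str
    dimensions: FrozenSet[Tuple[str, str]] = frozenset()


def metric_identity(namespace: str, name: str, **dimensions: str) -> MetricIdentity:
    """Build an identity; dimension order does not matter."""
    return MetricIdentity(namespace, name, frozenset(dimensions.items()))


# Same namespace and name, different dimension value: two distinct metrics.
first = metric_identity("AWS/EC2", "CPUUtilization", InstanceId="i-1")
second = metric_identity("AWS/EC2", "CPUUtilization", InstanceId="i-2")
```

Because the dataclass is frozen and hashable, identities can be used as dictionary keys when grouping data points per metric.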

CloudWatch Metrics Time Stamps

Each metric data point must be associated with a time stamp.
The time stamp can be up to two weeks in the past and up to two hours into the future.
If no time stamp is given, CloudWatch creates one based on the time the data point was received.
Time stamps are dateTime objects.
Coordinated Universal Time (UTC) is recommended.
When you retrieve statistics from CloudWatch, all times are in UTC.
CloudWatch alarms check metrics based on the current time in UTC.
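
The time stamp rules above can be mirrored on the client side before a data point is sent. This is a sketch under the stated limits only; `resolve_timestamp` is a hypothetical helper, and the `now` parameter exists just to make the check easy to test.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_PAST = timedelta(weeks=2)    # "up to two weeks in the past"
MAX_FUTURE = timedelta(hours=2)  # "up to two hours into the future"


def resolve_timestamp(timestamp: Optional[datetime] = None,
                      now: Optional[datetime] = None) -> datetime:
    """Return a UTC time stamp that falls inside the accepted window."""
    now = now or datetime.now(timezone.utc)
    if timestamp is None:
        # Mirrors the default: the time the data point is received.
        return now
    if timestamp.tzinfo is None:
        raise ValueError("use a timezone-aware datetime, preferably UTC")
    timestamp = timestamp.astimezone(timezone.utc)
    if timestamp < now - MAX_PAST:
        raise ValueError("time stamp is more than two weeks in the past")
    if timestamp > now + MAX_FUTURE:
        raise ValueError("time stamp is more than two hours in the future")
    return timestamp
```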

CloudWatch Metrics Retention

CloudWatch retains metric data as follows:

Data points with a period of less than 60 seconds are available for 3 hours. These are also called high-resolution custom metrics.
Data points with a period of 60 seconds (1 minute) are available for 15 days.
Data points with a period of 300 seconds (5 minutes) are available for 63 days.
Data points with a period of 3600 seconds (1 hour) are available for 455 days (15 months).
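
The retention tiers above can be expressed as a simple lookup. In this sketch, `retention_for_period` is a hypothetical helper, and treating in-between periods (for example 120 seconds) as belonging to the next-lower listed tier is an assumption, since the list only names four cases.

```python
from datetime import timedelta


def retention_for_period(period_seconds: int) -> timedelta:
    """Map a data point period to the retention listed above."""
    if period_seconds <= 0:
        raise ValueError("period must be a positive number of seconds")
    if period_seconds < 60:
        return timedelta(hours=3)    # high-resolution data
    if period_seconds < 300:         # assumption: 60 to 299 share the 1-minute tier
        return timedelta(days=15)
    if period_seconds < 3600:        # assumption: 300 to 3599 share the 5-minute tier
        return timedelta(days=63)
    return timedelta(days=455)       # 15 months
```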

CloudWatch Dimensions

A dimension

Is a name/value pair.
Is part of the identity of a metric.
You can assign up to 30 dimensions to a metric (the limit was 10 in older versions of the service).
Used to describe a characteristic of a metric.
Also used to filter the results that CloudWatch returns.
For a few AWS services, such as EC2, CloudWatch can aggregate data across dimensions.
Example – Server=Production, Domain=City01
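
As an illustration of dimensions as name/value pairs and as filters, the sketch below parses the example string into a list of `Name`/`Value` entries, which is the shape the CloudWatch API uses for a dimension. The helpers `parse_dimensions` and `matches` are hypothetical and exist only for this example.

```python
from typing import Dict, List


def parse_dimensions(text: str) -> List[Dict[str, str]]:
    """Turn 'Name=Value,Name=Value' into a list of Name/Value entries."""
    dimensions = []
    for pair in text.split(","):
        name, sep, value = pair.strip().partition("=")
        if not sep or not name.strip() or not value.strip():
            raise ValueError(f"expected Name=Value, got {pair!r}")
        dimensions.append({"Name": name.strip(), "Value": value.strip()})
    return dimensions


def matches(metric_dimensions: List[Dict[str, str]],
            wanted: List[Dict[str, str]]) -> bool:
    """Filter check: every wanted pair must appear on the metric."""
    present = {(d["Name"], d["Value"]) for d in metric_dimensions}
    return all((w["Name"], w["Value"]) in present for w in wanted)


example = parse_dimensions("Server=Production, Domain=City01")
```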

CloudWatch Statistics

Statistics are metric data aggregations over specified periods of time.
Aggregations are computed using the namespace, metric name, dimensions, and data point unit of measure, within the specified time period.

Available statistics

Minimum – The lowest value observed during the specified period. Indicates periods of low activity.
Maximum – The highest value observed during the specified period. Indicates periods of high activity.
Sum – All values submitted for the matching metric added together. Indicates total activity.
Average – The value of Sum / SampleCount during the specified period.
SampleCount – The count (number) of data points used for the statistical calculation.
pNN.NN – The value of the specified percentile, with up to two decimal places, such as p95.45. Not available for metrics that include negative values.
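
The basic statistics above are simple to reproduce for the data points of one period. The following sketch computes them locally; `basic_statistics` is a hypothetical helper, and the keys reuse the statistic names from the list.

```python
from typing import Dict, Sequence


def basic_statistics(values: Sequence[float]) -> Dict[str, float]:
    """Compute the statistics listed above for one period's data points."""
    if not values:
        raise ValueError("at least one data point is required")
    total = sum(values)
    count = len(values)
    return {
        "Minimum": min(values),
        "Maximum": max(values),
        "Sum": total,
        "Average": total / count,  # Sum / SampleCount
        "SampleCount": count,
    }


stats = basic_statistics([2.0, 4.0, 9.0])
```

Here `stats["Average"]` is 5.0, which is Sum (15.0) divided by SampleCount (3).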

CloudWatch Metrics Units

Each statistic has a unit of measure.
Example units – Bytes, Seconds, Count, and Percent.
You can specify a unit when you create a custom metric.
If no unit is specified, CloudWatch uses None as the unit.
CloudWatch attaches no significance to a unit internally.
Metric data points that specify different units of measure are aggregated separately.
When you get statistics without specifying a unit, CloudWatch aggregates all data points of the same unit together.
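
To make the per-unit aggregation concrete, the sketch below sums data points grouped by unit, so values with different units never mix. `sum_by_unit` is a hypothetical helper, and representing a missing unit with the string "None" follows the default described above.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def sum_by_unit(data_points: Iterable[Tuple[float, Optional[str]]]) -> Dict[str, float]:
    """Sum values separately for each unit of measure."""
    totals: Dict[str, float] = defaultdict(float)
    for value, unit in data_points:
        totals[unit or "None"] += value  # unspecified unit is stored as None
    return dict(totals)


totals = sum_by_unit([(10, "Bytes"), (5, "Bytes"), (2, "Seconds"), (1, None)])
```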

CloudWatch Metrics Periods

Period is the length of time associated with a specific Amazon CloudWatch statistic.
Periods are defined in seconds; valid values are 1, 5, 10, 30, or any multiple of 60.
For example, for a period of six minutes, use 360 as the period value.
You can adjust how the data is aggregated by varying the length of the period.
Only custom metrics that you define with a storage resolution of 1 second support sub-minute periods.
To retrieve statistics, specify a period, start time, and end time.
The default values for the start time and end time get you the last hour’s worth of statistics.
For statistics aggregated over the entire hour, specify a period of 3600.
Aggregated statistics are stamped with the time corresponding to the beginning of the period.
Periods are also important for CloudWatch alarms.
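
The period rules above can be sketched as two small helpers: one that checks whether a period value is valid, and one that finds the start of the period a time stamp falls into. Both names are hypothetical, and aligning periods to multiples of the period length counted from the Unix epoch is an assumption made for this example.

```python
from datetime import datetime, timezone


def is_valid_period(period_seconds: int) -> bool:
    """Valid values are 1, 5, 10, 30, or any multiple of 60."""
    if period_seconds in (1, 5, 10, 30):
        return True
    return period_seconds > 0 and period_seconds % 60 == 0


def period_start(timestamp: datetime, period_seconds: int) -> datetime:
    """Return the beginning of the period that contains `timestamp`."""
    if not is_valid_period(period_seconds):
        raise ValueError(f"invalid period: {period_seconds}")
    if timestamp.tzinfo is None:
        raise ValueError("use a timezone-aware datetime, preferably UTC")
    seconds = int(timestamp.timestamp())
    # Assumption: periods are aligned to the Unix epoch.
    aligned = seconds - seconds % period_seconds
    return datetime.fromtimestamp(aligned, tz=timezone.utc)
```

With a 300-second period, a data point stamped 12:07:30 UTC would be reported under 12:05:00 UTC in this model.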

CloudWatch Metrics Aggregation

Amazon CloudWatch aggregates statistics according to the period length that you specify when retrieving statistics. You can publish as many data points as you want with the same or similar time stamps. CloudWatch aggregates them according to the specified period length. CloudWatch does not aggregate data across Regions.

You can publish data points for a metric that share not only the same time stamp, but also the same namespace and dimensions. CloudWatch returns aggregated statistics for those data points. You can also publish multiple data points for the same or different metrics, with any timestamp.

For large datasets, you can insert a pre-aggregated dataset called a statistic set. With statistic sets, you give CloudWatch the Min, Max, Sum, and SampleCount for a number of data points. This is commonly used when you need to collect data many times in a minute.
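
A statistic set can be built locally from raw samples before publishing. The sketch below is an illustration only: the helper names are hypothetical, while the keys `SampleCount`, `Sum`, `Minimum`, and `Maximum` follow the field names of the statistic values structure in the CloudWatch PutMetricData API.

```python
from typing import Dict, Sequence


def make_statistic_set(values: Sequence[float]) -> Dict[str, float]:
    """Pre-aggregate raw samples into Min, Max, Sum, and SampleCount."""
    if not values:
        raise ValueError("at least one sample is required")
    return {
        "SampleCount": float(len(values)),
        "Sum": float(sum(values)),
        "Minimum": float(min(values)),
        "Maximum": float(max(values)),
    }


def merge_statistic_sets(a: Dict[str, float], b: Dict[str, float]) -> Dict[str, float]:
    """Combine two statistic sets without needing the raw samples."""
    return {
        "SampleCount": a["SampleCount"] + b["SampleCount"],
        "Sum": a["Sum"] + b["Sum"],
        "Minimum": min(a["Minimum"], b["Minimum"]),
        "Maximum": max(a["Maximum"], b["Maximum"]),
    }
```

Collecting samples for a minute and sending one statistic set replaces many individual data points with a single request entry.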

CloudWatch Percentiles

A percentile indicates the relative standing of a value in a dataset.
For example, the 95th percentile means that 95 percent of the data is lower than this value and 5 percent of the data is higher than this value.
Percentiles are often used to isolate anomalies.
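
To show why percentiles help isolate anomalies, the sketch below computes a percentile with the nearest-rank method. That method is an assumption chosen for simplicity; the way CloudWatch itself computes percentiles may differ, and `percentile` is a hypothetical helper.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: smallest value with at least p% of data at or below it."""
    if not values:
        raise ValueError("at least one value is required")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Nineteen typical latencies and one spike (made-up numbers).
latencies = [10] * 19 + [500]
typical = percentile(latencies, 95)
worst = max(latencies)
```

In this made-up sample the maximum is dominated by the single spike, while the 95th percentile still reflects the typical value.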

CloudWatch Alarms

An alarm watches a single metric over a specified time period and performs one or more specified actions.
It initiates actions on your behalf.
Actions are based on the value of the metric relative to a threshold over time.
An action can be a notification sent to an SNS topic or an Auto Scaling policy.
You can add alarms to dashboards.
Alarms invoke actions for sustained state changes only.
Always select a period greater than or equal to the frequency of the metric being monitored.
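
The idea of acting only on sustained state changes can be modeled in a few lines. This is a deliberately simplified sketch, not how CloudWatch evaluates alarms internally: `alarm_state` is a hypothetical helper, it only supports a "greater than threshold" comparison, and it requires every one of the most recent periods to breach.

```python
from typing import Sequence


def alarm_state(period_values: Sequence[float],
                threshold: float,
                evaluation_periods: int) -> str:
    """Return ALARM only if the last `evaluation_periods` values all breach."""
    if evaluation_periods < 1:
        raise ValueError("evaluation_periods must be at least 1")
    if len(period_values) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    recent = period_values[-evaluation_periods:]
    return "ALARM" if all(value > threshold for value in recent) else "OK"


# CPU averages per period (made-up numbers), threshold 80, 3 periods.
sustained = alarm_state([50, 85, 90, 95], threshold=80, evaluation_periods=3)
single_spike = alarm_state([50, 85, 70, 95], threshold=80, evaluation_periods=3)
```

The first series breaches the threshold for three consecutive periods, so it alarms; the second contains isolated spikes and stays OK.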

CloudWatch Monitoring

CloudWatch can monitor EC2 instances, Auto Scaling groups, ELBs, Route 53 health checks, EBS volumes, Storage Gateways, CloudFront, DynamoDB, ElastiCache nodes, RDS instances, EMR job flows, Redshift, SNS topics, SQS queues, OpsWorks, CloudWatch Logs, estimated charges on your AWS bill, and custom metrics and logs generated by your applications and services.
By default, EC2 reports instance metrics at 5-minute intervals.
EC2 reports instance metrics at 1-minute intervals if the ‘detailed monitoring’ option is enabled on the instance.
By default, CloudWatch monitors CPU, network, disk, and status checks.
RAM utilization is a custom metric and must be added manually to EC2 instances in order to be tracked.
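
Because RAM utilization has to be published as a custom metric, a data point for it might be assembled as in the sketch below. The function only builds the keyword arguments for the `put_metric_data` call of the boto3 CloudWatch client and does not send anything. The namespace `Custom/System`, the metric name `MemoryUtilization`, and the helper name are assumptions made for this example.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def memory_metric_request(instance_id: str,
                          used_percent: float,
                          timestamp: Optional[datetime] = None) -> Dict[str, Any]:
    """Build keyword arguments for a put_metric_data call (nothing is sent)."""
    if not 0 <= used_percent <= 100:
        raise ValueError("used_percent must be between 0 and 100")
    return {
        "Namespace": "Custom/System",  # hypothetical custom namespace
        "MetricData": [
            {
                "MetricName": "MemoryUtilization",  # hypothetical metric name
                "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
                "Timestamp": timestamp or datetime.now(timezone.utc),
                "Value": used_percent,
                "Unit": "Percent",
            }
        ],
    }
```

With boto3 installed and credentials configured, the result could be passed on as `boto3.client("cloudwatch").put_metric_data(**request)`; measuring the memory percentage itself is left to the caller.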

There are two types of status checks:

System Status Checks (Physical Host):

Checks the underlying physical host
Checks for loss of network connectivity
Checks for loss of system power
Checks for software issues on the physical host
Checks for hardware issues on the physical host
The best way to resolve issues is to stop the instance and start it again (which moves it to a different physical host)

Instance Status Checks

Checks the VM itself
Checks for failed system status checks
Checks for misconfigured networking or startup configuration
Checks for exhausted memory
Checks for corrupted file systems
Checks for an incompatible kernel
