Determine and Optimize the Operational Characteristics of the Storage Solution

The following questions can help you segment data within each of your workloads and determine your storage requirements:

  • How often and how quickly do you need to access your data? AWS offers storage options and pricing tiers for frequently accessed, infrequently accessed, and rarely accessed (archival) data.
  • Does your data store require high IOPS or throughput? AWS provides categories of storage that are optimized for performance and throughput. Understanding IOPS and throughput requirements will help you provision the right amount of storage and avoid overpaying.
  • How critical is your data, and how durable must it be? Critical or regulated data needs to be retained at almost any expense and tends to be stored for a long time.
  • How sensitive is your data? Highly sensitive data needs to be protected from accidental and malicious changes, not just data loss or corruption. Durability, cost, and security are equally important to consider.
  • How large is your data set? Knowing the total size of the data set helps in estimating storage capacity and cost.
  • How transient is your data? Transient data is short-lived and typically does not require high durability. (Note: Durability refers to average annual expected data loss.) Clickstream and Twitter data are good examples of transient data.
  • How much are you prepared to pay to store the data? Setting a budget for data storage will inform your decisions about storage options.

Optimize Amazon S3 Storage

Amazon S3 lets you analyze data access patterns, create inventory lists, and configure lifecycle policies. You can set up rules to automatically move data objects to cheaper S3 storage tiers as objects are accessed less frequently, or to automatically delete objects after an expiration date. To manage storage most effectively, you can use tagging to categorize your S3 objects and filter on these tags in your lifecycle policies.
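
As a minimal boto3 sketch of such a rule (the bucket name and tag values are hypothetical, and the transition/expiration days are a policy choice, not an AWS requirement):

```python
import boto3

s3 = boto3.client("s3")

# Objects tagged tier=archive move to Standard-IA after 30 days,
# then to Glacier after 90 days, and expire after a year.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-data-lake-bucket",  # hypothetical bucket
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-cold-data",
                "Filter": {"Tag": {"Key": "tier", "Value": "archive"}},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": 365},
            }
        ]
    },
)
```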

To determine when to transition data to another storage class, you can use Amazon S3 analytics storage class analysis to analyze storage access patterns. Analyze all the objects in a bucket or use an object tag or common prefix to filter objects for analysis. If you observe infrequent access patterns of a filtered data set over time, you can use the information to choose a more appropriate storage class, improve lifecycle policies, and make predictions around future usage and growth.
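
A storage class analysis configuration can also be created programmatically. The sketch below (bucket names and the prefix are hypothetical) analyzes objects under one prefix and exports the daily results as CSV to a separate reporting bucket:

```python
import boto3

s3 = boto3.client("s3")

# Analyze access patterns for objects under the hypothetical "logs/" prefix
# and export the results for downstream analysis.
s3.put_bucket_analytics_configuration(
    Bucket="my-data-lake-bucket",
    Id="logs-access-analysis",
    AnalyticsConfiguration={
        "Id": "logs-access-analysis",
        "Filter": {"Prefix": "logs/"},
        "StorageClassAnalysis": {
            "DataExport": {
                "OutputSchemaVersion": "V_1",
                "Destination": {
                    "S3BucketDestination": {
                        "Format": "CSV",
                        "Bucket": "arn:aws:s3:::my-analytics-results",
                        "Prefix": "storage-class-analysis/",
                    }
                },
            }
        },
    },
)
```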

Another management tool is Amazon S3 Inventory, which audits and reports on the replication and encryption status of your S3 objects on a weekly or monthly basis. This feature provides CSV output files that list objects and their corresponding metadata, and it lets you configure multiple inventory lists for a single bucket, each scoped to a different object prefix. You can also query Amazon S3 inventory using standard SQL with Amazon Athena, Amazon Redshift Spectrum, and other tools such as Presto, Apache Hive, and Apache Spark.
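
The following sketch (bucket names are hypothetical) creates a weekly inventory configuration whose CSV output includes replication and encryption status:

```python
import boto3

s3 = boto3.client("s3")

# Produce a weekly CSV inventory of all object versions, delivered to a
# hypothetical reporting bucket that Athena or Redshift Spectrum can query.
s3.put_bucket_inventory_configuration(
    Bucket="my-data-lake-bucket",
    Id="weekly-inventory",
    InventoryConfiguration={
        "Id": "weekly-inventory",
        "IsEnabled": True,
        "IncludedObjectVersions": "All",
        "Schedule": {"Frequency": "Weekly"},
        "OptionalFields": [
            "Size", "LastModifiedDate", "StorageClass",
            "ReplicationStatus", "EncryptionStatus",
        ],
        "Destination": {
            "S3BucketDestination": {
                "Bucket": "arn:aws:s3:::my-inventory-reports",
                "Format": "CSV",
                "Prefix": "inventory/",
            }
        },
    },
)
```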

Amazon S3 can also publish storage, request, and data transfer metrics to Amazon CloudWatch. Storage metrics are reported daily, while request and data transfer metrics are available at one-minute intervals for granular visibility; both can be collected and reported for an entire bucket or a subset of objects (selected via prefix or tags).
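
As a sketch, the daily BucketSizeBytes storage metric can be pulled from CloudWatch like this (the bucket name is hypothetical):

```python
import boto3
from datetime import datetime, timedelta

cloudwatch = boto3.client("cloudwatch")

# Fetch two weeks of the daily BucketSizeBytes storage metric.
# S3 storage metrics carry BucketName and StorageType dimensions.
response = cloudwatch.get_metric_statistics(
    Namespace="AWS/S3",
    MetricName="BucketSizeBytes",
    Dimensions=[
        {"Name": "BucketName", "Value": "my-data-lake-bucket"},
        {"Name": "StorageType", "Value": "StandardStorage"},
    ],
    StartTime=datetime.utcnow() - timedelta(days=14),
    EndTime=datetime.utcnow(),
    Period=86400,  # storage metrics are reported once per day
    Statistics=["Average"],
)
for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Average"])
```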

With all the information these storage management tools provide, you can create policies to move less frequently accessed S3 data to cheaper storage tiers for considerable savings. For example, by moving data from Amazon S3 Standard to Amazon S3 Standard-IA, you can save up to 60% (on a per-gigabyte basis) of Amazon S3 pricing. By moving data that is at the end of its lifecycle and accessed on rare occasions to Amazon Glacier, you can save up to 80% of Amazon S3 pricing.

Optimize Amazon EBS Storage

With Amazon EBS, it’s important to keep in mind that you are paying for provisioned capacity and performance—even if the volume is unattached or has very low write activity. To optimize storage performance and costs for Amazon EBS, monitor volumes periodically to identify ones that are unattached or appear to be underutilized or overutilized, and adjust provisioning to match actual usage.

AWS offers tools that can help you optimize block storage. Amazon CloudWatch automatically collects a range of data points for EBS volumes and lets you set alarms on volume behavior. AWS Trusted Advisor is another way for you to analyze your infrastructure to identify unattached, underutilized, and overutilized EBS volumes. Third-party tools, such as Cloudability, can also provide insight into performance of EBS volumes.

Delete Unattached Amazon EBS Volumes

An easy way to reduce wasted spend is to find and delete unattached volumes. When an EC2 instance is stopped or terminated, its attached EBS volumes are not automatically deleted (unless the volume’s DeleteOnTermination attribute is set, as it is by default for the root volume) and continue to accrue charges, because the storage remains provisioned. To find unattached EBS volumes, look for volumes whose status is available, which indicates that they are not attached to an EC2 instance. You can also review throughput and IOPS metrics to see whether there has been any volume activity over the previous two weeks. If the volume is in a nonproduction environment, hasn’t been used in weeks, or hasn’t been attached in a month, there is a good chance you can delete it.
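
A minimal sketch for finding candidates, assuming default credentials and region, lists every volume whose status is available:

```python
import boto3

ec2 = boto3.client("ec2")

# List volumes with status "available", i.e. not attached to any instance.
paginator = ec2.get_paginator("describe_volumes")
pages = paginator.paginate(Filters=[{"Name": "status", "Values": ["available"]}])
for page in pages:
    for volume in page["Volumes"]:
        print(volume["VolumeId"], volume["Size"], volume["VolumeType"],
              volume["CreateTime"])
```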

Before deleting a volume, take an Amazon EBS snapshot (a backup copy of an EBS volume) so that the volume can be quickly restored later if needed. You can automate the process of snapshotting and deleting unattached volumes by using AWS Lambda functions scheduled through Amazon CloudWatch Events.
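
A hedged sketch of that snapshot-then-delete step (the helper name is hypothetical, and production code would add error handling and tagging):

```python
import boto3

ec2 = boto3.client("ec2")

def snapshot_and_delete(volume_id):
    """Back up a volume to a snapshot, then delete the volume.

    Hypothetical helper; in a Lambda function triggered on a CloudWatch
    Events schedule, you would call this for each volume your criteria
    mark as safe to remove.
    """
    snapshot = ec2.create_snapshot(
        VolumeId=volume_id,
        Description=f"Backup of {volume_id} before deletion",
    )
    # Wait until the snapshot completes before removing the volume.
    ec2.get_waiter("snapshot_completed").wait(
        SnapshotIds=[snapshot["SnapshotId"]]
    )
    ec2.delete_volume(VolumeId=volume_id)
    return snapshot["SnapshotId"]
```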

Resize or Change the EBS Volume Type

Another way to optimize storage costs is to identify volumes that are underutilized and downsize them or change the volume type. Monitor the read-write access of EBS volumes to determine if throughput is low. If you have a current-generation EBS volume attached to a current-generation EC2 instance type, you can use the elastic volumes feature to change the size or volume type, or (for an SSD io1 volume) adjust IOPS performance without detaching the volume.
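
For illustration, a boto3 sketch that switches a hypothetical io1 volume to gp2 and grows it in place, then polls the modification status (the volume ID is a placeholder):

```python
import boto3

ec2 = boto3.client("ec2")

# With elastic volumes, size, type, and (for io1) IOPS can be changed
# while the volume stays attached.
ec2.modify_volume(
    VolumeId="vol-0123456789abcdef0",  # hypothetical volume ID
    VolumeType="gp2",  # e.g. switch an io1 volume to gp2
    Size=200,          # GiB; size can grow but never shrink
)

# Track the modification until it reaches "optimizing" or "completed".
status = ec2.describe_volumes_modifications(
    VolumeIds=["vol-0123456789abcdef0"]
)["VolumesModifications"][0]
print(status["ModificationState"], status.get("Progress"))
```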

The following tips can help you optimize your EBS volumes:

  • For General Purpose SSD gp2 volumes, you’ll want to optimize for capacity so that you’re paying only for what you use.
  • With Provisioned IOPS SSD io1 volumes, pay close attention to IOPS utilization rather than throughput, since you pay for IOPS directly. Provision 10–20% above maximum IOPS utilization.
  • You can save by reducing provisioned IOPS or by switching from a Provisioned IOPS SSD io1 volume type to a General Purpose SSD gp2 volume type.
  • If the volume is 500 gigabytes or larger, consider converting to a Cold HDD sc1 volume to save on your storage rate.
  • You can return a volume’s type and provisioned IOPS to their original settings if needed; note that volume size can be increased but not decreased.

Delete Stale Amazon EBS Snapshots

If you have a backup policy that takes EBS volume snapshots daily or weekly, you will quickly accumulate snapshots. Check for stale snapshots that are over 30 days old and delete them to reduce storage costs. Deleting a snapshot has no effect on the volume. You can use the AWS Management Console or AWS Command Line Interface (CLI) for this purpose or third-party tools such as Skeddly.
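
A minimal cleanup sketch, assuming the 30-day cutoff above is your policy choice:

```python
import boto3
from datetime import datetime, timedelta, timezone

from botocore.exceptions import ClientError

ec2 = boto3.client("ec2")
cutoff = datetime.now(timezone.utc) - timedelta(days=30)

# Delete snapshots owned by this account that are more than 30 days old.
paginator = ec2.get_paginator("describe_snapshots")
for page in paginator.paginate(OwnerIds=["self"]):
    for snapshot in page["Snapshots"]:
        if snapshot["StartTime"] < cutoff:
            try:
                ec2.delete_snapshot(SnapshotId=snapshot["SnapshotId"])
                print("Deleted", snapshot["SnapshotId"], snapshot["StartTime"])
            except ClientError as err:
                # Snapshots backing a registered AMI cannot be deleted.
                print("Skipped", snapshot["SnapshotId"], err)
```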

Storage Optimization Is an Ongoing Process

Maintaining a storage architecture that is both right-sized and right-priced is an ongoing process. To get the most efficient use of your storage spend, you should optimize storage on a monthly basis. You can streamline this effort by:

  • Establishing an ongoing mechanism for optimizing storage and setting up storage policies.
  • Monitoring costs closely with AWS cost and reporting tools, such as Cost Explorer, AWS Budgets, and the detailed billing reports in the Billing and Cost Management console (a minimal Cost Explorer sketch follows this list).
  • Enforcing Amazon S3 object tagging and establishing S3 lifecycle policies to continually optimize data storage throughout the data lifecycle.
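
As referenced above, a Cost Explorer sketch for tracking storage spend month over month (the date range is hypothetical; "EC2 - Other" is the Cost Explorer service name under which EBS volume and snapshot charges appear):

```python
import boto3

ce = boto3.client("ce")  # Cost Explorer

# Monthly unblended cost for S3 and for EBS (billed under "EC2 - Other"),
# grouped by service, over a hypothetical three-month window.
response = ce.get_cost_and_usage(
    TimePeriod={"Start": "2023-01-01", "End": "2023-04-01"},
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    Filter={
        "Dimensions": {
            "Key": "SERVICE",
            "Values": [
                "Amazon Simple Storage Service",
                "EC2 - Other",
            ],
        }
    },
    GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
)
for period in response["ResultsByTime"]:
    for group in period["Groups"]:
        print(period["TimePeriod"]["Start"], group["Keys"][0],
              group["Metrics"]["UnblendedCost"]["Amount"])
```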