AWS Certified Machine Learning – Specialty Sample Questions

Machine learning has become one of the most sought-after skills in IT. The AWS Machine Learning specialty exam covers the Amazon Web Services products that enable developers to discover patterns in end-user data using algorithms, build mathematical models based on those patterns, and then create and deploy predictive applications. Every organization wants its most valuable asset – its workforce – to be technologically up to date at all times.

And, if you work for an IT company, staying current on technological developments is critical to maintaining your position and reputation. Certifications also demonstrate your dedication to your work and your organization. Keeping yourself up to date boosts your confidence and allows you to stand out in a crowd.

About the Certification

The AWS Certified Machine Learning – Specialty (MLS-C01) exam is designed for people who work in development or data science. This exam validates an examinee’s ability to use the AWS Cloud to build, train, tune, and deploy machine learning (ML) models.

It assesses an examinee’s ability to design, implement, deploy, and maintain machine learning solutions for specific business problems. It will confirm the candidate’s capability to:

  • Choose and justify the best ML approach for a given business problem.
  • Determine the best AWS services for implementing ML solutions.
  • Create and implement scalable, cost-effective, dependable, and secure machine learning solutions.

Advanced Sample Questions

What is the purpose of Amazon SageMaker Algorithm?

  • a. To deploy, monitor, and maintain machine learning models
  • b. To automate the building, training, and deploying of machine learning models
  • c. To build and host custom machine learning models
  • d. All of the above

Answer: d. All of the above

What is the role of Amazon EBS in Amazon SageMaker?

  • a. To store the output of the machine learning model
  • b. To store the training data for the model
  • c. To store the model artifacts
  • d. All of the above

Answer: d. All of the above

What is the main advantage of using Amazon SageMaker Ground Truth over traditional data labeling methods?

  • a. It reduces the time and cost of data labeling
  • b. It provides access to a large pool of annotators
  • c. It provides accurate and consistent labeling
  • d. All of the above

Answer: d. All of the above

What is the purpose of Amazon SageMaker Neo?

  • a. To optimize machine learning models for deployment on different hardware
  • b. To improve the performance of machine learning models
  • c. To reduce the size of machine learning models
  • d. All of the above

Answer: a. To optimize machine learning models for deployment on different hardware
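
To make the Neo answer concrete, here is a minimal, illustrative boto3 sketch of starting a Neo compilation job; the job name, role ARN, S3 paths, framework, and target device are all placeholders, not values from the exam:

```python
import boto3

sm = boto3.client("sagemaker")

# Compile a trained model for a specific hardware target with SageMaker Neo.
# Every name, ARN, path, and device below is an illustrative placeholder.
sm.create_compilation_job(
    CompilationJobName="example-neo-job",
    RoleArn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    InputConfig={
        "S3Uri": "s3://example-bucket/model/model.tar.gz",
        "DataInputConfig": '{"data": [1, 3, 224, 224]}',  # input tensor shape
        "Framework": "MXNET",
    },
    OutputConfig={
        "S3OutputLocation": "s3://example-bucket/compiled/",
        "TargetDevice": "jetson_nano",  # the hardware Neo optimizes for
    },
    StoppingCondition={"MaxRuntimeInSeconds": 900},
)
```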

What is the role of Amazon Kinesis in machine learning on AWS?

  • a. To process and store real-time streaming data
  • b. To train machine learning models on streaming data
  • c. To deploy machine learning models on streaming data
  • d. All of the above

Answer: a. To process and store real-time streaming data

What is Amazon Elastic Inference?

  • a. An inference engine for deep learning models
  • b. A GPU-accelerated inference engine
  • c. A cost-effective inference engine for machine learning models
  • d. All of the above

Answer: c. A cost-effective inference engine for machine learning models

What is the role of Amazon S3 in machine learning on AWS?

  • a. To store the training and testing data for machine learning models
  • b. To store the model artifacts
  • c. To store the output of machine learning models
  • d. All of the above

Answer: d. All of the above

What is the purpose of Amazon DynamoDB in machine learning on AWS?

  • a. To store the training and testing data for machine learning models
  • b. To store the model artifacts
  • c. To store the output of machine learning models
  • d. All of the above

Answer: c. To store the output of machine learning models

What is Amazon QuickSight used for in machine learning on AWS?

  • a. To visualize the results of machine learning models
  • b. To build and deploy machine learning models
  • c. To store and manage the data used for machine learning
  • d. All of the above

Answer: a. To visualize the results of machine learning models

What is the role of Amazon Machine Learning in machine learning on AWS?

  • a. To provide an interface for building, deploying, and monitoring machine learning models
  • b. To provide a library of pre-built machine learning algorithms
  • c. To provide tools for data preprocessing and feature engineering
  • d. All of the above

Answer: a. To provide an interface for building, deploying, and monitoring machine learning models

Basic Sample Questions

Question 1

While performing mini-batch training on a neural network for a classification task, a Data Scientist notices oscillations in training accuracy. Which of the following is MOST LIKELY to be the CAUSE of this problem?

  • A. The dataset’s class distribution is skewed.
  • B. Dataset shuffle is turned off.
  • C. The batch size is insufficient.
  • D. The learning rate is very high.

Correct Answer – D

Reference: https://towardsdatascience.com/deep-learning-personal-notes-part-1-lesson-2-8946fe970b95
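
To see why a high learning rate causes the oscillation, consider plain gradient descent on the one-dimensional function f(w) = w², a minimal sketch that is not AWS-specific:

```python
import numpy as np

# Gradient descent on f(w) = w^2, whose gradient is 2w. A small learning
# rate decays smoothly toward the minimum; a large one overshoots it on
# every step, so the metric bounces back and forth instead of converging.
def descend(lr, steps=8, w=1.0):
    path = [w]
    for _ in range(steps):
        w -= lr * 2 * w
        path.append(w)
    return np.round(path, 3)

print("lr=0.10:", descend(0.10))  # monotone decay toward 0
print("lr=0.99:", descend(0.99))  # sign flips each step -> oscillation
```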

Question 2

When submitting Amazon SageMaker training tasks that use one of the built-in algorithms, which common parameters MUST be provided? (Choose three.)

  • A. The training channel identifies training data on an Amazon S3 bucket.
  • B. The validation channel identifies the validation data’s location on an Amazon S3 bucket.
  • C. The IAM role that Amazon SageMaker can use to perform tasks on the users’ behalf.
  • D. Hyperparameters in a JSON array, as specified by the algorithm.
  • E. The Amazon EC2 instance class indicates whether training will be performed with a CPU or a GPU.
  • F. The output path specifies where the trained model will be stored on an Amazon S3 bucket.

Correct Answer – A,E,F
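
For orientation, here is a hedged boto3 sketch of submitting a training job for a built-in algorithm, showing where the training channel, IAM role, output path, instance type, and hyperparameters fit; the image URI, ARNs, and S3 paths are placeholders:

```python
import boto3

sm = boto3.client("sagemaker")

# Illustrative training job for a built-in algorithm; all names, the ECR
# image URI, the role ARN, and the S3 paths are placeholders.
sm.create_training_job(
    TrainingJobName="example-xgboost-job",
    AlgorithmSpecification={
        "TrainingImage": "123456789012.dkr.ecr.us-east-1.amazonaws.com/xgboost:1",
        "TrainingInputMode": "File",
    },
    RoleArn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    InputDataConfig=[{                     # the 'train' channel on S3
        "ChannelName": "train",
        "DataSource": {"S3DataSource": {
            "S3DataType": "S3Prefix",
            "S3Uri": "s3://example-bucket/train/",
        }},
    }],
    OutputDataConfig={"S3OutputPath": "s3://example-bucket/output/"},
    ResourceConfig={"InstanceType": "ml.m5.xlarge",   # CPU vs. GPU choice
                    "InstanceCount": 1,
                    "VolumeSizeInGB": 10},
    StoppingCondition={"MaxRuntimeInSeconds": 3600},
    HyperParameters={"num_round": "100"},  # algorithm-specific values
)
```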

Question 3

A retail chain has been using Amazon Kinesis Data Firehose to load purchase data from its 20,000-store network into Amazon S3. To support training a more advanced machine learning model, the training data will require additional but simple transformations, and some attributes will be combined. The model must be retrained daily.

Given a large number of stores and historical data ingestion, which update will require the LEAST amount of development work?

  • A. Require stores to capture their data locally on AWS Storage Gateway for loading into Amazon S3, then use AWS Glue to transform it.
  • B. Deploy an Amazon EMR cluster with Apache Spark and the transformation logic, and have the cluster run every day on the accumulating records in Amazon S3, outputting new/transformed records to Amazon S3.
  • C. Launch a fleet of Amazon EC2 instances containing the transformation logic, instruct them to transform the data records accumulating on Amazon S3, and then output the transformed records to Amazon S3.
  • D. Add an Amazon Kinesis Data Analytics stream downstream of the Kinesis Data Firehose stream that uses SQL to transform raw record attributes into simple transformed values.

Correct Answer – D

Question 4

A data scientist uses an Amazon SageMaker notebook instance to explore and analyze data. This entails installing on the notebook instance some Python packages that are not natively available on Amazon SageMaker. How can a machine learning expert ensure that the data scientist’s essential packages are always available on the notebook instance?

  • A. Set up the AWS Systems Manager Agent on the underlying Amazon EC2 instance and use Systems Manager Automation to run the package installation commands.
  • B. Create a Jupyter notebook file (.ipynb) with cells containing the package installation commands to run and save it to the /etc/init directory of each Amazon SageMaker notebook instance.
  • C. From the Jupyter notebook console, use the conda package manager to apply the necessary conda packages to the notebook’s default kernel.
  • D. Using the package installation commands, create an Amazon SageMaker lifecycle configuration and assign it to the notebook instance.

Correct Answer – D

Reference: https://towardsdatascience.com/automating-aws-sagemaker-notebooks-2dec62bc2c84
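
A minimal sketch of the lifecycle-configuration approach, assuming the packages are pip-installable; the config name and package list are placeholders:

```python
import base64
import boto3

sm = boto3.client("sagemaker")

# The OnStart script runs every time the notebook instance starts, so the
# packages are reinstalled after each stop/start cycle.
on_start = """#!/bin/bash
set -e
sudo -u ec2-user -i <<'EOF'
source activate python3
pip install xgboost shap
EOF
"""

sm.create_notebook_instance_lifecycle_config(
    NotebookInstanceLifecycleConfigName="example-install-packages",
    OnStart=[{"Content": base64.b64encode(on_start.encode()).decode()}],
)
```

The configuration is then assigned to the notebook instance, either at creation time or by updating the instance.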

Question 5

A web-based company wants to boost conversions on its landing page. Using a large historical dataset of customer visits, the company trained a multi-class deep learning network with Amazon SageMaker. However, there is an overfitting problem: training data shows a prediction accuracy of 90%, whereas test data shows a prediction accuracy of only 70%. To optimize visit-to-purchase conversions, the company must improve the generalizability of its model before putting it into production.

Which activity is recommended to ensure that the company’s test and validation data is modeled as accurately as possible?

  • A. Increase the randomization of training data in training mini-batches.
  • B. Assign a larger proportion of the total data to the training dataset.
  • C. Incorporate L1 or L2 regularisation as well as dropouts into the training.
  • D. Reduce the number of deep learning network layers and units (or neurones).

Correct Answer – C
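
As a minimal illustration of option C (not exam material), this Keras sketch adds L2 weight penalties and dropout to a small multi-class network; the layer sizes, rates, and class count are placeholders, not tuned values:

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

model = keras.Sequential([
    layers.Dense(128, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4)),  # L2 penalty
    layers.Dropout(0.5),   # randomly drop half the activations in training
    layers.Dense(64, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4)),
    layers.Dropout(0.5),
    layers.Dense(3, activation="softmax"),  # multi-class output
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```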

Question 6

A company uses a long short-term memory (LSTM) model to evaluate the risk factors associated with a specific energy sector. The application analyzes multi-page text documents and classifies each sentence as either posing a risk or posing no risk. Despite the Data Scientist's extensive experimentation with various network architectures and tuning of the associated hyperparameters, the model underperforms. Which technique will result in the LARGEST performance increase?

  • A. Pretrain term frequency-inverse document frequency (TF-IDF) vectors on a large collection of energy-related news articles to initialise the words.
  • B. Instead of LSTM, use gated recurrent units (GRUs) and run the training process until the validation loss stops decreasing.
  • C. Lower the learning rate and repeat the training process until the training loss no longer decreases.
  • D. Pretrain the words using word2vec embeddings on a large collection of energy-related news articles.

Correct Answer – D
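
A hedged sketch of the pretrained-embedding idea using gensim; the tiny corpus and vocabulary here are placeholders for a large collection of energy-sector articles:

```python
import numpy as np
from gensim.models import Word2Vec

# Pretrain word2vec on a (placeholder) domain corpus of tokenized sentences.
corpus = [["pipeline", "pressure", "exceeded", "threshold"],
          ["routine", "inspection", "completed"]]
w2v = Word2Vec(sentences=corpus, vector_size=100, window=5, min_count=1)

# Build an embedding matrix that can seed the LSTM's embedding layer.
vocab = {word: i for i, word in enumerate(w2v.wv.index_to_key)}
embedding_matrix = np.zeros((len(vocab), 100))
for word, idx in vocab.items():
    embedding_matrix[idx] = w2v.wv[word]
```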

Question 7

An eCommerce startup is using photographs to automate product categorization. A data scientist trained a computer vision model using the Amazon SageMaker image classification algorithm. The photos for each product are organized by product line. The model's accuracy is insufficient when classifying new products. All product photos are the same size and are stored in an Amazon S3 bucket. The company wants to improve the model as quickly as possible so that it can be used for future products. Which actions would improve the accuracy of the solution? (Choose three.)

  • A. To improve accuracy, use the SageMaker semantic segmentation algorithm to train a new model.
  • B. Classify the products in the dataset using the Amazon Rekognition DetectLabels API.
  • C. Enhance the dataset’s images. Crop, resize, flip, rotate, and adjust the brightness and contrast of the images using open source libraries.
  • D. Use a SageMaker notebook to implement pixel normalization and image scaling. Amazon S3 should be used to store the new dataset.
  • E. Train a new model with Amazon Rekognition Custom Labels.
  • F. Examine the product categories for class imbalances and use oversampling or undersampling as needed. Amazon S3 should be used to store the new dataset.

Correct Answer – C, E, F

References:
https://docs.aws.amazon.com/rekognition/latest/dg/how-it-works-types.html
https://towardsdatascience.com/image-processing-techniques-for-computer-vision-11f92f511e21
https://docs.aws.amazon.com/rekognition/latest/customlabels-dg/training-model.html
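
A small Pillow sketch of the augmentation in option C; the file names and enhancement factors are placeholders:

```python
from PIL import Image, ImageEnhance

img = Image.open("product.jpg")  # placeholder input image

# Generate simple variants: flip, rotate, crop, brightness, contrast.
variants = {
    "flipped": img.transpose(Image.FLIP_LEFT_RIGHT),
    "rotated": img.rotate(15, expand=True),
    "cropped": img.crop((10, 10, img.width - 10, img.height - 10)),
    "bright": ImageEnhance.Brightness(img).enhance(1.4),
    "contrast": ImageEnhance.Contrast(img).enhance(1.4),
}
for name, variant in variants.items():
    variant.save(f"product_{name}.jpg")  # saved alongside the original
```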

Question 8

An employee saw a video clip with audio on the company's social media page. The video is available only in Spanish, which the employee, a native English speaker, does not understand. The employee requests a sentiment analysis. Which combination of services is the MOST efficient way to accomplish the task?

  • A. Amazon Transcribe, Amazon Translate, and Amazon Comprehend
  • B. Amazon Transcribe, Amazon Comprehend, and Amazon SageMaker seq2seq
  • C. Amazon Transcribe, Amazon Translate, and Amazon SageMaker Neural Topic Model (NTM)
  • D. Amazon Transcribe, Amazon Translate, and Amazon SageMaker BlazingText

Correct Answer – A
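
A sketch of the tail end of that pipeline with boto3. Transcription itself is asynchronous (start_transcription_job writes its output to S3), so assume the Spanish transcript is already in hand; the sample text is a placeholder:

```python
import boto3

translate = boto3.client("translate")
comprehend = boto3.client("comprehend")

spanish_text = "Me encanta este producto."  # placeholder transcript

# Translate Spanish -> English, then score the sentiment of the English text.
english = translate.translate_text(
    Text=spanish_text,
    SourceLanguageCode="es",
    TargetLanguageCode="en",
)["TranslatedText"]

sentiment = comprehend.detect_sentiment(Text=english, LanguageCode="en")
print(sentiment["Sentiment"])  # e.g. POSITIVE
```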

Question 9

A company examines camera photos of the tops of items placed on store shelves to determine which items have been taken and which remain. After many hours of data labeling, the company has a total of 1,000 hand-labeled photos covering ten distinct items. The training was ineffective. Which machine learning technique best meets the company's long-term objectives?

  • A. Grayscale the images and retrain the model
  • B. Reduce the number of distinct items from ten to two, construct the model, and iterate.
  • C. Attach different coloured labels to each item, take new photos, and reassemble the model.
  • D. Use image variants such as inversions and translations to augment training data for each item, then build the model and iterate.

Correct Answer – D

Question 10

An aircraft engine manufacturer is collecting a time series of 200 performance indicators. During testing, engineers require near-real-time detection of significant production problems. All data must be saved for later analysis. Which strategy would be the MOST effective for near-real-time defect detection?

  • A. Make use of AWS IoT Analytics for data ingestion, storage, and analysis. To perform anomaly analysis, use Jupyter notebooks from within AWS IoT Analytics.
  • B. Use Amazon S3 for data ingestion, storage, and analysis. To detect anomalies, use an Amazon EMR cluster to perform Apache Spark ML k-means clustering.
  • C. Use Amazon S3 for data ingestion, storage, and analysis. To detect anomalies, use the Amazon SageMaker Random Cut Forest (RCF) algorithm.
  • D. Perform anomaly detection using Amazon Kinesis Data Firehose and Amazon Kinesis Data Analytics Random Cut Forest (RCF). Kinesis Data Firehose can be used to store data in Amazon S3 for later analysis.

Correct Answer – D
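
For the ingestion half of option D, a hedged boto3 sketch of pushing one telemetry reading into a Firehose delivery stream (the stream name and record fields are placeholders); Firehose persists the raw records to S3 while Kinesis Data Analytics scores the stream with its RANDOM_CUT_FOREST function:

```python
import json
import boto3

firehose = boto3.client("firehose")

reading = {"engine_id": "e-42", "metric": "vibration", "value": 0.87}

# Each record lands in S3 for later analysis and flows through the
# analytics application for near-real-time anomaly scoring.
firehose.put_record(
    DeliveryStreamName="example-telemetry-stream",
    Record={"Data": (json.dumps(reading) + "\n").encode()},
)
```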

Question 11

A company is using Amazon Polly to convert plain text documents to speech in order to automate corporate announcements. However, corporate acronyms are mispronounced in the current documents. What should a Machine Learning Specialist do to address this problem for future documents?

  • A. Convert existing documents to SSML and add pronunciation tags.
  • B. Create a lexicon of appropriate pronunciations.
  • C. Use speech marks to help with pronunciation.
  • D. Preprocess the text files for pronunciation using Amazon Lex.

Correct Answer – B

Reference: https://docs.aws.amazon.com/polly/latest/dg/ssml.html
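
A minimal sketch of a pronunciation lexicon: the PLS document below maps one placeholder acronym to its expansion, is uploaded with put_lexicon, and is then referenced during synthesis:

```python
import boto3

polly = boto3.client("polly")

# A tiny PLS lexicon; the grapheme/alias pair is an illustrative example.
pls = """<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="en-US">
  <lexeme>
    <grapheme>W3C</grapheme>
    <alias>World Wide Web Consortium</alias>
  </lexeme>
</lexicon>"""

polly.put_lexicon(Name="acronyms", Content=pls)
audio = polly.synthesize_speech(
    Text="The W3C publishes web standards.",
    VoiceId="Joanna",
    OutputFormat="mp3",
    LexiconNames=["acronyms"],  # apply the custom pronunciations
)
```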

Question 12

A manufacturing company asks a machine learning expert for help in developing a model that classifies damaged components into one of eight defect classes. The company provided over 100,000 photos per defect category for training. The expert finds that the validation accuracy of the image classification model is 80%, while the training accuracy is 90%. Human-level performance for this type of image classification is estimated to be around 90%. What should the expert consider to resolve this situation?

  • A. More training time
  • B. Extending the network
  • C. Making use of a different optimizer
  • D. Making use of some form of regularisation

Correct Answer – D

Reference: https://acloud.guru/forums/aws-certified-machine-learning-specialty/discussion/-MGdBUKmQ02zC3uOq4VL/AWS%20Exam%20Machine%20Learning

Question 13

A company is creating a demand forecasting model based on machine learning (ML). During the development stage, an ML expert performs feature engineering on an Amazon SageMaker notebook instance with limited CPU and memory resources. A data engineer typically uses the same notebook once a day for data preparation, which requires a large amount of RAM but takes only two hours to complete. The data preparation does not benefit from GPU acceleration. All processes run well on an ml.m5.4xlarge notebook instance.
The organization receives an AWS Budgets warning that the billing for this month exceeds the budgeted amount.

Which solution will result in the MOST cost savings?

  • A. Set the notebook instance type to memory optimized with the same vCPU number as the ml.m5.4xlarge instance. When not in use, turn off the notebook. In that instance, run both data preprocessing and feature engineering development.
  • B. Maintain the same notebook instance type and size. When not in use, turn off the notebook. Using Amazon SageMaker Processing, run data preprocessing on a P3 instance type with the same memory as the ml.m5.4xlarge instance.
  • C. Switch to a smaller general-purpose notebook instance. When not in use, turn off the notebook. Using Amazon SageMaker Processing, run data preprocessing on an ml.r5 instance with the same memory size as the ml.m5.4xlarge instance.
  • D. Replace the notebook instance with a smaller general-purpose instance. When not in use, turn off the notebook. Using the Reserved Instance option, run data preprocessing on an R5 instance with the same memory size as the ml.m5.4xlarge instance.

Correct Answer – C
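
A hedged sketch of option C with the SageMaker Python SDK: the two-hour job runs on a memory-optimized instance that exists only for the duration of the job; the role, paths, and script name are placeholders:

```python
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.processing import ProcessingInput, ProcessingOutput

processor = SKLearnProcessor(
    framework_version="1.2-1",
    role="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    instance_type="ml.r5.4xlarge",  # memory-optimized, billed per job
    instance_count=1,
)

# The daily preprocessing script runs in the ephemeral Processing container.
processor.run(
    code="preprocess.py",
    inputs=[ProcessingInput(source="s3://example-bucket/raw/",
                            destination="/opt/ml/processing/input")],
    outputs=[ProcessingOutput(source="/opt/ml/processing/output",
                              destination="s3://example-bucket/prepared/")],
)
```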

Question 14

Telemetry data is generated by wind turbines, weather stations, and solar panels for an energy company. The organization wishes to perform predictive maintenance on these devices. The devices are dispersed throughout the city and have spotty internet access.
A team of data scientists is analyzing the telemetry data to detect anomalies and predict repairs before the devices fail. The team requires a data ingestion system that is scalable, secure, and capable of handling large amounts of data at high speeds. The team has decided to keep data on Amazon S3.

Which technique meets these requirements?

  • A. Ingest the data by making an HTTP API call to an Amazon EC2 web server. To load the data into Amazon S3, set up EC2 instances in an Auto Scaling configuration behind an Elastic Load Balancer.
  • B. Send the data to AWS IoT Core via Message Queuing Telemetry Transport (MQTT). Create an AWS IoT Core rule to send data to an Amazon Kinesis data stream configured to write to an S3 bucket using Amazon Kinesis Data Firehose.
  • C. Send the data to AWS IoT Core via Message Queuing Telemetry Transport (MQTT). Create an AWS IoT Core rule to route all MQTT data to an Amazon Kinesis Data Firehose delivery stream configured to write to an S3 bucket.
  • D. Send the data via Message Queuing Telemetry Transport (MQTT) to an Amazon Kinesis data stream configured to write to an S3 bucket.

Correct Answer – C

Reference: https://aws.amazon.com/blogs/industries/real-time-operational-monitoring-of-renewable-energy-assets-with-aws-iot/
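
An illustrative boto3 sketch of the IoT Core rule in option C; the topic filter, stream name, and role ARN are placeholders:

```python
import boto3

iot = boto3.client("iot")

# Forward every MQTT message on the telemetry topics to a Kinesis Data
# Firehose delivery stream that writes to S3.
iot.create_topic_rule(
    ruleName="telemetry_to_s3",
    topicRulePayload={
        "sql": "SELECT * FROM 'devices/+/telemetry'",
        "actions": [{
            "firehose": {
                "roleArn": "arn:aws:iam::123456789012:role/ExampleIoTRole",
                "deliveryStreamName": "example-telemetry-stream",
                "separator": "\n",
            }
        }],
    },
)
```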

Question 15

A company wants to forecast home selling prices using historical sales data. In the company's dataset, the target variable is the selling price. The features include the lot size, living area and non-living area measurements, the number of bedrooms and bathrooms, the year built, and the postal code. The company wants to use multivariable linear regression to forecast home selling prices.

Which step should a machine learning expert take to remove extraneous data and simplify the model?

  • A. Create a histogram of the features and calculate the standard deviation. Remove features with high variance.
  • B. Create a histogram of the features and calculate the standard deviation. Remove features with low variance.
  • C. Create a heatmap displaying the dataset’s correlation with itself. Features with low mutual correlation scores should be removed.
  • D. Perform a correlation analysis on all features in relation to the target variable. Features with low target variable correlation scores should be removed.

Correct Answer – D
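
A short pandas sketch of option D; the file name, column names, and the 0.1 cutoff are placeholders chosen for illustration:

```python
import pandas as pd

df = pd.read_csv("home_sales.csv")  # placeholder dataset

# Correlate every feature with the target, then drop the weak ones.
corr = df.corr(numeric_only=True)["selling_price"].drop("selling_price")
weak = corr[corr.abs() < 0.1].index.tolist()

print("dropping:", weak)
simplified = df.drop(columns=weak)
```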

Question 16

A Machine Learning Specialist collects customer data for an online shopping website. The data includes demographic information, previous visits, and information about the surrounding area. The Specialist is in charge of creating a machine learning strategy for identifying client purchasing habits, preferences, and trends in order to improve the website’s service and recommendation capabilities.

What action should the Specialist recommend?

  • A. Identifying patterns in the customer database using Latent Dirichlet Allocation (LDA) for the given collection of discrete data.
  • B. A neural network with at least three layers and random initial weights to recognize patterns in the customer database.
  • C. Identifying patterns in the customer database through collaborative filtering based on user interactions and correlations.
  • D. Random Cut Forest (RCF) over random subsamples to identify patterns in the customer database.

Correct Answer – C
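
A toy item-based collaborative-filtering sketch (rows are users, columns are products); a real system would use purchase or click counts at much larger scale:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

interactions = np.array([
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 1],
])

item_sim = cosine_similarity(interactions.T)  # item-to-item similarity

# Score unseen items for user 0 by similarity to items they interacted with.
scores = interactions[0] @ item_sim
scores[interactions[0] > 0] = -np.inf  # mask items already seen
print("recommend item:", int(np.argmax(scores)))
```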

Question 17

A mobile network operator is developing an analytics platform using Amazon Athena and Amazon S3 to analyze and optimize business operations. The source systems send data in .CSV format in real time. The Data Engineering team wants to convert the data to the Apache Parquet format before storing it on Amazon S3. Which approach requires the LEAST implementation effort?

  • A. Ingest .CSV data using Apache Kafka Streams on Amazon EC2 instances and serialize the data as Parquet using Kafka Connect S3.
  • B. Ingest .CSV data from Amazon Kinesis Data Streams and convert it to Parquet using AWS Glue.
  • C. In an Amazon EMR cluster, use Apache Spark Structured Streaming to ingest the .CSV data and Apache Spark to convert it to Parquet.
  • D. Ingest .CSV data from Amazon Kinesis Data Streams and convert it to Parquet using Amazon Kinesis Data Firehose.

Correct Answer – B
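
To show what the conversion itself amounts to, a local pandas sketch of the CSV-to-Parquet rewrite that the Glue job would perform at scale (file names are placeholders; to_parquet needs pyarrow installed):

```python
import pandas as pd

# Read row-oriented CSV records and rewrite them as columnar Parquet,
# which Athena scans far more efficiently.
df = pd.read_csv("events.csv")
df.to_parquet("events.parquet", engine="pyarrow", index=False)
```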

Question 18

A utility company wants to forecast future energy consumption for its residential and commercial clients. Data on historical energy consumption over the last decade is provided. A team of data scientists will conduct the initial data analysis and feature selection, which will include historical power use data as well as data on the weather, the number of people on the property, and public holidays. Data scientists use Amazon Forecast to generate the projections. Which Forecast algorithm should data scientists use to meet these criteria?

  • A. Autoregressive Integrated Moving Average (ARIMA)
  • B. Exponential Smoothing (ETS)
  • C. Convolutional Neural Network – Quantile Regression (CNN-QR)
  • D. Prophet

Correct Answer – C

Reference: https://jesit.springeropen.com/articles/10.1186/s43067-020-00021-8
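
A hedged boto3 sketch of training a CNN-QR predictor; the dataset group ARN, horizon, and names are placeholders. CNN-QR fits here because it can use the related time series (weather, occupancy, holidays) alongside the target series:

```python
import boto3

forecast = boto3.client("forecast")

forecast.create_predictor(
    PredictorName="energy_cnn_qr",
    AlgorithmArn="arn:aws:forecast:::algorithm/CNN-QR",
    ForecastHorizon=30,  # e.g. 30 days ahead
    InputDataConfig={
        "DatasetGroupArn":
            "arn:aws:forecast:us-east-1:123456789012:dataset-group/example",
    },
    FeaturizationConfig={"ForecastFrequency": "D"},  # daily data
)
```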

Question 19

A machine learning (ML) model is used by a retail company to forecast daily sales. The model has been producing incorrect results for the last three weeks, according to the company’s brand manager.
At the end of each day, an AWS Glue job consolidates the forecasting input data with the actual daily sales data and the model's predictions, and saves the result in Amazon S3. The company's machine learning team is using an Amazon SageMaker Studio notebook to analyze the model's errors. What should the machine learning team do in the SageMaker Studio notebook to best demonstrate the model's degradation?

  • A. Make a histogram of the last three weeks’ daily sales. In addition, make a histogram of the daily sales prior to that time period.
  • B. Produce a histogram of model errors over the last three weeks. In addition, make a histogram of the model errors prior to that time period.
  • C. Make a line chart of the model’s weekly mean absolute error (MAE).
  • D. For the last three weeks, create a scatter plot of daily sales versus model error. Create a scatter plot of daily sales versus model error prior to that period.

Correct Answer – C

Reference: https://machinelearningmastery.com/time-series-forecasting-performance-measures-with-python/
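
A small pandas/matplotlib sketch of option C; the file path and column names are placeholders for the consolidated Glue output:

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("daily_results.csv", parse_dates=["date"]).set_index("date")

# Weekly mean absolute error; a rising line makes the degradation obvious.
weekly_mae = (df["prediction"] - df["actual_sales"]).abs().resample("W").mean()

weekly_mae.plot(title="Weekly MAE")
plt.ylabel("MAE")
plt.show()
```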

Question 20

A mobile device manufacturer wants to determine and adjust the optimal selling price for its products. The company is collecting and preparing relevant data to train machine learning (ML) models. There are over 1,000 features, and the company wants to know which ones are most important in determining the selling price. Which feature selection strategies should the company use? (Choose three.)

  • A. Scaling of data with standardization and normalization
  • B. Heat map correlation plot
  • C. Binning of data
  • D. Univariate analysis
  • E. Using a tree-based classifier to determine feature importance
  • F. Data augmentation

Correct Answer – B, D, E

References:
https://towardsdatascience.com/an-overview-of-data-preprocessing-features-enrichment-automatic-feature-selection-60b0c12d75ad
https://towardsdatascience.com/feature-selection-using-python-for-classification-problem-b5f00a1c7028
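
A brief sketch combining two of the selected techniques on a placeholder dataset: the correlation matrix behind a heat map, and tree-based feature importances:

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

df = pd.read_csv("device_features.csv")  # placeholder, numeric features
X, y = df.drop(columns=["selling_price"]), df["selling_price"]

corr = X.corr()  # pass to seaborn.heatmap(corr) to draw the heat map

# Fit a tree ensemble and rank features by importance.
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
ranked = pd.Series(model.feature_importances_, index=X.columns)
print(ranked.sort_values(ascending=False).head(20))
```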

Hurry up and start preparing now!
