Microsoft Azure Data Fundamentals: DP-900 Interview Questions

Every job aspirant looking to start a career has one common aim: to crack the interview. That goal often brings confusion and stress: what will be asked, what should I prepare, and how should I prepare? The same applies to roles built around Microsoft Azure Data Fundamentals. Even after passing the Microsoft Azure DP-900 exam, candidates still have to prepare for the interview for the job role they want to apply for. Given the competition for Azure roles, it is important to be fully prepared for the interview process as well.

But we are here to help! In this tutorial, we go through the important areas for a DP-900-related interview in the form of questions, so you can build both the knowledge and the confidence to clear the interview process. Before starting with the interview questions, take a look at the DP-900 overview and basic knowledge requirements.

Microsoft DP-900: Overview

The Microsoft Azure DP-900 exam is for candidates with basic knowledge of core data concepts and of how they are implemented using Microsoft Azure data services. If you are just beginning to work with data in the cloud, this exam is for you.

Knowledge Area:
  • For the Microsoft Azure DP-900 exam, you must be familiar with the concepts of relational and non-relational data, and with the various types of data workloads, such as transactional and analytical.

Moving on to the interview questions!

Advanced Interview Questions

What are some common data storage options available in Azure?

Azure provides several data storage options that can be used to store different types of data, depending on the specific requirements of the application or workload. Some common data storage options available in Azure include:

  1. Azure Blob Storage: A fully-managed object storage service that can store unstructured data such as text and binary data, including images, videos, and audio. It can also store structured data such as JSON and XML.
  2. Azure Data Lake Storage: A fully-managed, scalable, and secure data lake that allows organizations to store data of any size, type, and ingestion speed. It can store both structured and unstructured data, and can be used for big data analytics and machine learning.
  3. Azure SQL Database: A fully-managed relational database service that can store structured data and supports the T-SQL language. It is built on top of the Microsoft SQL Server engine and can be used for transactional and analytical workloads.
  4. Azure Cosmos DB: A globally distributed, fully-managed, multi-model database service that supports document, key-value, graph, and column-family data models. It can be used for NoSQL workloads, such as document databases, key-value stores, and graph databases.
  5. Azure Table Storage: A NoSQL key-value store service that can store structured and semi-structured data. It is a cost-effective option for storing large amounts of data and can be used for high-performance, big data workloads.
  6. Azure File Storage: A fully-managed file share in the cloud that allows you to store files and share them via the SMB protocol. It can be used for storing and sharing files, as well as for use with on-premises or cloud-based applications.
  7. Azure Queue Storage: A fully-managed message queue service that allows you to store and retrieve large numbers of messages. It can be used for messaging scenarios, such as buffering, scheduling, and batch processing.

These are some of the common data storage options available in Azure, but there are many other services and options available depending on the use case and requirements.
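
For illustration, here is a minimal Python sketch of working with the first of these options, Blob Storage, using the azure-storage-blob SDK. The connection string, container, and blob names are placeholders, and the container is assumed to already exist in the storage account.

```python
# A minimal sketch of writing to and reading from Azure Blob Storage with the
# azure-storage-blob SDK. Connection string and names are placeholders only.
from azure.storage.blob import BlobServiceClient

conn_str = "<your-storage-account-connection-string>"

service = BlobServiceClient.from_connection_string(conn_str)
container = service.get_container_client("interview-demo")  # assumes the container exists

# Upload unstructured data (here, a small text payload) as a block blob.
container.upload_blob(name="notes/hello.txt", data=b"Hello, DP-900!", overwrite=True)

# List blobs in the container and download the one we just uploaded.
for blob in container.list_blobs():
    print(blob.name)

downloaded = container.download_blob("notes/hello.txt").readall()
print(downloaded.decode("utf-8"))
```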

Can you explain the concept of data warehousing in Azure?

Data warehousing is the process of collecting, storing, and managing large amounts of data in a centralized location for reporting and analysis. Azure provides several services for data warehousing, including Azure SQL Data Warehouse and its successor, Azure Synapse Analytics.

Azure SQL Data Warehouse is a fully managed, scalable, and secure cloud data warehouse that uses a combination of massively parallel processing (MPP) and column store indexing to quickly query and analyze large amounts of data. It is built on top of Azure SQL Database and can be used to create a data warehouse in the cloud, with the ability to scale up and down based on the needs of the organization.

Azure Synapse Analytics (formerly SQL Data Warehouse) is a fully managed cloud data warehouse that allows analytics over structured and unstructured data. It combines enterprise data warehousing, big data analytics, and data integration, allowing you to ingest, prepare, manage, and serve data for immediate business intelligence needs, and it integrates seamlessly with Azure Machine Learning.

Both services provide a number of built-in capabilities, such as data integration, data modeling, and data visualization, as well as integration with other Azure data services such as Azure Data Factory and Power BI for advanced data analysis and reporting.

How do Azure Data Factory and Azure Data Lake differ in their functionality?

Azure Data Factory is a cloud-based data integration service that allows you to create, schedule, and manage data pipelines. These pipelines can move and transform data from various sources, such as on-premises databases and cloud-based services, into a data lake or a data warehouse for further analysis.

Azure Data Lake is a cloud-based data storage and analytics service. It allows you to store large amounts of data in its native format, without the need for a predefined schema. This makes it a good option for storing raw data that will be used for big data analytics, machine learning, and other advanced analytics use cases.

In summary, Azure Data Factory allows you to move and transform data while Azure Data Lake allows you to store and analyze data. You can use them together to create a complete data pipeline that ingests, stores, and analyzes data.

Can you explain the process of data ingestion using Azure Event Hubs?

Azure Event Hubs is a fully managed, real-time data streaming and event ingestion service capable of receiving and processing millions of events per second. Data ingestion with Event Hubs typically involves the following steps (a short code sketch follows the list):

  • Create an Event Hubs namespace and an event hub within that namespace.
  • Configure the event hub with the desired settings, such as the number of partitions and the retention period for data.
  • Use one of the available SDKs or client libraries to send events to the event hub. Events can be sent over the AMQP or HTTPS protocols, or via the Apache Kafka protocol.
  • Optionally, use an event processor to read and process the events as they are received.
  • Optionally, use Azure Stream Analytics or Azure Functions to perform real-time analytics or to trigger actions based on the events received.
  • Optionally, use Azure Data Lake Storage or Azure Cosmos DB to store the events for long-term retention or further analysis.
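
As a small illustration of the send step, here is a minimal sketch using the azure-eventhub Python SDK; the connection string, hub name, and event payloads are placeholders chosen for illustration.

```python
# A minimal sketch of sending events to an event hub with the azure-eventhub SDK.
import json
from azure.eventhub import EventHubProducerClient, EventData

conn_str = "<your-event-hubs-namespace-connection-string>"

producer = EventHubProducerClient.from_connection_string(conn_str, eventhub_name="telemetry")

with producer:
    # Batch events so a single send stays within the hub's size limits.
    batch = producer.create_batch()
    for reading in ({"deviceId": "sensor-1", "temp": 21.5}, {"deviceId": "sensor-2", "temp": 19.8}):
        batch.add(EventData(json.dumps(reading)))
    producer.send_batch(batch)
```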

How does Azure Stream Analytics help in real-time data processing?

Azure Stream Analytics is a real-time data stream processing service provided by Microsoft Azure. It allows for the analysis of streaming data from various sources such as IoT devices, social media, and logs. It can filter, aggregate, and perform mathematical operations on streaming data in real-time, and then output the results to various destinations such as Azure Storage, Power BI, and Azure Event Hubs. This allows for real-time monitoring, alerting, and decision-making based on streaming data. Additionally, it can be integrated with other Azure services such as Azure Functions and Azure Machine Learning for further processing and analysis.

Can you explain the concept of Azure Synapse Analytics and its use cases?

Azure Synapse Analytics (formerly known as Azure SQL Data Warehouse) is a cloud-based analytics service provided by Microsoft Azure. It combines the capabilities of big data integration, data warehousing, and data integration with a rich set of analytics and visualization tools. It enables organizations to analyze and visualize data from various sources such as on-premises, cloud, and streaming data in real-time.

The main use cases for Azure Synapse Analytics include:

  1. Data warehousing: It can store and manage large amounts of structured and semi-structured data, and perform complex queries on the data using T-SQL or ANSI SQL.
  2. Big data integration: It can process and analyze large amounts of unstructured and structured data from various sources such as Hadoop, Azure Data Lake, and Blob storage.
  3. Data integration: It can integrate data from various sources, including on-premises and cloud-based systems, and perform data transformations and data quality checks.
  4. Advanced analytics: It can perform advanced analytics tasks such as machine learning and predictive modeling using built-in R and Python libraries.
  5. Data visualization: It integrates with tools such as Power BI and Azure Data Studio, which allow users to easily create interactive reports and dashboards.
  6. Real-time analytics: It can process and analyze streaming data in real-time using Azure Stream Analytics and Power BI.

By using Azure Synapse Analytics, organizations can gain insights from their data in near real-time, make data-driven decisions, and take actions to optimize their business operations.

Which Azure Cosmos DB features differentiate it from other NoSQL databases?

Azure Cosmos DB has a number of features that differentiate it from other NoSQL databases. Some of these include:

  • Multi-model: Azure Cosmos DB supports multiple data models, including document, key-value, graph, and column-family, allowing developers to choose the best fit for their use case.
  • Global distribution: Azure Cosmos DB is designed to be globally distributed and allows for low-latency access to data from any location.
  • Automatic and instant scalability: Azure Cosmos DB allows for automatic and instant scalability of throughput and storage, making it easy to scale up or down based on the needs of an application.
  • Consistency models: Azure Cosmos DB offers a variety of consistency models, including strong and eventual consistency, which allows for fine-grained control over data consistency.
  • Advanced indexing: Azure Cosmos DB supports advanced indexing options, such as full-text search and geospatial indexing, that are not available in other NoSQL databases.
  • Built-in security: Azure Cosmos DB has built-in security features, including encryption at rest and in transit, role-based access control, and Azure Active Directory authentication and authorization.
  • Rich querying: Azure Cosmos DB supports rich querying of data using SQL, MongoDB, and Gremlin, allowing developers to use the language that is most familiar to them.

These features make Azure Cosmos DB a more versatile and powerful choice for applications that need to work with large volumes of data, that need low latency and high throughput, and that need to scale globally.
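
For illustration, here is a minimal sketch against the Cosmos DB SQL (Core) API using the azure-cosmos Python SDK; the account endpoint, key, database, container, and document fields are placeholders.

```python
# A minimal sketch of the Cosmos DB SQL (Core) API: create a container keyed on
# a partition key, upsert a schema-free JSON document, and run a parameterized query.
from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient("https://<your-account>.documents.azure.com:443/", credential="<your-key>")

db = client.create_database_if_not_exists("appdb")
orders = db.create_container_if_not_exists(id="orders", partition_key=PartitionKey(path="/customerId"))

# Documents are schema-free JSON; only "id" and the partition key path are required here.
orders.upsert_item({"id": "order-1", "customerId": "c-42", "total": 19.99, "items": ["book", "pen"]})

results = orders.query_items(
    query="SELECT c.id, c.total FROM c WHERE c.customerId = @cid",
    parameters=[{"name": "@cid", "value": "c-42"}],
    enable_cross_partition_query=True,
)
for doc in results:
    print(doc)
```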

Can you explain the concept of Azure Data Share and its use cases?

Azure Data Share is a cloud-based service provided by Microsoft Azure that allows organizations to securely share data with other organizations or individuals. It enables data sharing between Azure subscriptions and across Azure regions, as well as with external parties.

Data Share enables the following use cases:

  1. Data sharing: Allows organizations to share data with other organizations or individuals in a secure and controlled manner. It supports sharing of data stored in various Azure services such as Azure Data Lake Storage, Azure Blob storage, and Azure SQL Database.
  2. Data collaboration: Allows organizations to collaborate with external partners, customers, or suppliers on data projects. It provides a secure and controlled environment for data collaboration, allowing external parties to access and use the shared data without compromising security.
  3. Data governance: Allows organizations to maintain control over their data, including the ability to set access permissions and track data usage. It supports data lineage and data cataloging, as well as integration with Azure Policy and Azure Active Directory for fine-grained access control.
  4. Data integration: Allows organizations to easily share data with other systems and applications, such as Power BI, Azure Machine Learning, and Power Automate.

By using Azure Data Share, organizations can share data with external parties in a secure and controlled manner, which can help to improve data quality, reduce data duplication, and support data-driven decision making.

How does Azure Purview help in data lineage and governance?

Azure Purview is a cloud-based data governance service provided by Microsoft Azure that helps organizations to discover, understand, and govern their data across on-premises, multi-cloud and SaaS environments. It enables organizations to gain a holistic view of their data assets, understand the relationships between them, and apply governance policies to ensure data quality, compliance, and security.

Azure Purview helps in data lineage and governance in the following ways:

  1. Data Discovery: It automatically discovers and crawls data sources across on-premises, multi-cloud and SaaS environments and creates a comprehensive map of the organization’s data assets.
  2. Data Relationship: It uses machine learning algorithms to understand the relationships between data assets and create a data lineage map that shows how data flows through the organization.
  3. Data Governance: It allows organizations to define and apply governance policies to data assets, such as data quality, compliance, and security. Policies can be defined at the data source, column, and row level, and can be integrated with Azure Policy and Azure Active Directory for fine-grained access control.
  4. Data Catalog: It provides an intuitive and searchable catalog of data assets that allows users to quickly find and understand the data they need. It also allows users to understand the data lineage, data quality, and data policy compliance of data assets.

By using Azure Purview, organizations can gain a comprehensive understanding of their data assets and their relationships, as well as apply governance policies to ensure data quality, compliance, and security.

How does Azure SQL Data Warehouse differ from other SQL databases in Azure?

Azure SQL Data Warehouse (SQL DW) is a fully-managed, cloud-based data warehouse service that is part of the Azure SQL family of services. It is designed to work with very large amounts of data and allows for high-performance analytics and reporting. It differs from other SQL databases in Azure, such as Azure SQL Database and Azure Database for MySQL, in a number of ways:

  • Scale: Azure SQL DW can scale up and out to handle very large amounts of data, with the ability to scale compute and storage independently.
  • Columnstore Index: Azure SQL DW uses a columnstore index for storage which is optimized for analytics workloads and allows for faster query performance on large data sets.
  • MPP Architecture: Azure SQL DW uses a massively parallel processing (MPP) architecture that allows for parallel processing of queries and data across multiple nodes.
  • Partitioning: Azure SQL DW supports table partitioning, which allows for more efficient querying and management of large data sets.
  • Data integration: Azure SQL DW integrates with Azure data services like Azure Data Factory, Azure Data Lake Storage, and Power BI to enable data integration, movement and reporting.
  • High availability: Azure SQL DW offers high availability options such as active geo-replication and failover groups to ensure the availability of data.
  • Elasticity: Azure SQL DW provides the ability to pause and resume the compute resources, this allows you to save cost when the data warehouse is not in use.

These features make Azure SQL DW a more powerful choice for organizations that need to work with large amounts of data and that need to perform high-performance analytics and reporting.

Basic Interview Questions

1. Explain the term, Azure?

Azure is a cloud platform developed by Microsoft that helps businesses operate in the cloud on a pay-as-you-go basis. Azure provides infrastructure, hardware, operating systems, document services, storage, databases, and data services to help organizations grow. Using Azure, you can host virtual machines (VMs), web servers, database servers, and content storage.

2. What is Cloud Computing?

Cloud computing refers to web-based computing that lets businesses and individuals consume computing resources and services, such as virtual machines, databases, processing, memory, and more. In cloud computing, you pay for resources according to the pay-as-you-go model. It benefits organizations by providing large-scale computing with seamless access to virtually limitless resources.

3. Can you list some advantages of cloud computing?

Some advantages of cloud computing are:

  • Firstly, it allows businesses to host hardware, database servers, web servers, software, products, and services in the cloud.
  • Secondly, cloud computing offers 24×7 availability, with the cloud service provider managing the servers and data centers on your behalf.
  • Thirdly, it provides scalability and reliability. And, there is no limit on the number of users or resources, you can add or remove accordingly.
  • Next, it maintains and provides automatic updates for new software, OS, databases, and third-party software. Moreover, it has data centers in various locations globally.
4. Can you name the types of services that the cloud offers?

Cloud offers three types of services:

  • Firstly, Infrastructure as a Service (IaaS). In this, you get raw hardware such as servers, storage, and networking from your cloud provider as a service.
  • Secondly, Platform as a Service (PaaS). This provides a platform for publishing applications without giving you the underlying OS or infrastructure to manage.
  • Lastly, Software as a Service (SaaS). In this, you simply use the software: there is no infrastructure or platform to manage and no need to purchase the software outright.

5. How many types of cloud deployment models are there? Name and explain them.

There are three cloud deployment models:

  • Firstly, Public Cloud. The cloud provider owns the infrastructure you are using, and it may be a multi-tenant system shared with other customers.
  • Secondly, Private Cloud. The infrastructure is owned by you, or dedicated to you by your cloud provider, for your exclusive use. For example, hosting your website with the cloud provider on a dedicated server.
  • Lastly, Hybrid Cloud. When public and private clouds are used together, it is called a hybrid cloud. For example, using in-house servers for confidential data and the public cloud for other workloads.

6. Suppose there is a private server on my premises, and I have allocated some of my workloads to the public cloud. Which type of architecture is this?

This is a hybrid cloud, because both a public cloud and a private cloud are in use.

7. What makes you choose Cloud Computing?

Cloud computing is web-based computing that lets businesses and individuals consume computing resources and services, such as virtual machines, databases, processing, memory, services, storage, or even a number of calls or events, all on a pay-as-you-go basis. Given its growing demand in the market, it is an excellent field in which to start and grow a career.

8. Why go with Microsoft Azure and not AWS?

Microsoft Azure is often 4-12% cheaper than AWS for comparable services. Moreover, it offers some additional capabilities and services and keeps adding new features. Further, using Azure, you can host virtual machines (VMs), web servers, database servers, and content storage.

9. What is Azure Data Factory?

Azure Data Factory is a cloud-based data integration (ETL) service used for automating the movement and transformation of data. Using it, you can develop data-driven workflows for moving data between on-premises and cloud data stores. Moreover, you can execute your data processing either on an Azure-based compute service or in your own self-hosted compute environment, such as SSIS, SQL Server, or Oracle. After creating a pipeline that performs the action you need, you can schedule it to run periodically, run it on a time-window schedule, or trigger it from an event.

10. What is Azure SSIS Integration Runtime? How to execute the SSIS package in Data Factory?

The Azure-SSIS Integration Runtime is a fully managed cluster of virtual machines hosted in Azure and dedicated to running SSIS packages in Data Factory. The SSIS IR nodes can be scaled up by configuring the node size, and scaled out by configuring the number of nodes in the VM cluster. To execute SSIS packages, create an SSIS IR and an SSISDB catalog hosted in Azure SQL Database or Azure SQL Managed Instance, then deploy and run your packages there.

11. I want to get the list of all source files in a specific storage account and the properties of each file located in that storage. Which Data Factory activity is used?

For this, you can use the Get Metadata activity.

12. Name the types of triggers offered by Data Factory.

  • Firstly, the Schedule trigger. This is used for executing the ADF pipeline on a wall-clock schedule.
  • Secondly, the Tumbling window trigger. This executes the ADF pipeline on a periodic interval and retains the pipeline state.
  • Lastly, the Event-based trigger. This responds to a blob-related event, such as adding or deleting a blob in an Azure Storage account.

13. Name the methods for executing any Data Factory pipeline.

The methods include:

  • Firstly, under Debug mode
  • Secondly, manual execution using Trigger now
  • Lastly, adding a schedule, tumbling window, or event-based trigger
14. Explain the process for creating Azure Functions?

Azure Functions is a service for running small pieces of code, or functions, in the cloud. You can choose from a variety of programming languages, such as C#, F#, Node.js, Python, PHP, and Java, and you pay only for the time your code executes. It also supports continuous deployment and integration, letting you develop serverless applications.
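
As a concrete illustration, here is a minimal sketch of an HTTP-triggered function using the Azure Functions Python v2 programming model; the route and function names are chosen for illustration only.

```python
# A minimal sketch of an HTTP-triggered Azure Function (Python v2 programming model).
import azure.functions as func

app = func.FunctionApp(http_auth_level=func.AuthLevel.ANONYMOUS)

@app.route(route="hello")
def hello(req: func.HttpRequest) -> func.HttpResponse:
    # Read an optional query-string parameter and echo a greeting.
    name = req.params.get("name", "world")
    return func.HttpResponse(f"Hello, {name}!", status_code=200)
```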

15. What is a Virtual Machine (VM)?

An Azure VM gives you the flexibility of virtualization without having to buy and maintain the physical hardware that runs it. However, you still maintain the VM yourself by performing tasks such as configuring it, patching it, and installing the software that runs on it.

16. What is Microsoft SQL Azure?

Microsoft SQL Azure refers to a cloud-based relational database service that is built on SQL Server technologies. This runs in Microsoft data centers on hardware that is owned, hosted, and maintained by Microsoft.

17. What is Azure Active Directory (Azure AD)?

Azure Active Directory is a multi-tenant, cloud-based directory, and identity management service of Microsoft. This has the ability to combine core directory services, application access management, and identity protection into a single solution.

18. What do you understand by the term CRON expression?

A CRON expression is a string of characters that defines the schedule on which a service runs repeatedly. For example, in the NCRONTAB format used by Azure Functions timer triggers, the expression 0 */5 * * * * runs a function every five minutes.

19. What is Azure Kubernetes Services?

Azure Kubernetes Service (AKS) is a container orchestration service based on the open-source Kubernetes system. It is available on the Microsoft Azure public cloud and is used for deploying, scaling, and managing Docker containers and container-based applications across a cluster of container hosts.

Intermediate Level Questions

20. What is the integration runtime?

The integration runtime is the compute infrastructure that Azure Data Factory uses to provide data integration capabilities across different network environments. It supports:

  • Firstly, data movement. The integration runtime moves data between the source and destination data stores, with support for built-in connectors, format conversion, column mapping, and scalable data transfer.
  • Secondly, activity dispatch. The integration runtime dispatches and monitors transformation activities that run on compute services such as Azure HDInsight, Azure SQL Database, and others.
  • Thirdly, SSIS package execution. The integration runtime natively executes SSIS packages in a managed Azure compute environment.

Further, you can deploy one or more instances of the integration runtime as needed to move and transform data. The integration runtime can run on an Azure public network or on a private network.

21. What is the limit on the number of integration runtimes?

There is no limit on the number of integration runtime instances in a data factory. However, there is a limit on the number of VM cores that the integration runtime can use per subscription for SSIS package execution. 

22. Explain the top-level concepts of Azure Data Factory?

An Azure subscription can have one or more Azure Data Factory instances. A data factory is made up of key components that work together as a platform on which you can compose data-driven workflows for moving and transforming data. The concepts include:

1. Pipelines

  • A data factory can have one or more pipelines. A pipeline is a logical grouping of activities that performs a unit of work. For example, a pipeline can contain a group of activities that ingests data from an Azure blob and then runs a Hive query on an HDInsight cluster to partition the data.

2. Data flows

  • Data flows are objects that you build visually in Data Factory to transform data at scale on backend Spark services.

3. Activities

Activities represent a processing step in a pipeline. For example, you can use a copy activity for copying data from one datastore to another data store. Likewise, for transforming or analyzing your data, you can use a Hive activity running a Hive query on an Azure HDInsight cluster. Data Factory supports three types of activities: 

  • Firstly, data movement activities
  • Secondly, data transformation activities
  • Lastly, control activities.

4. Datasets

  • Datasets represent data structures within the data stores. This points to or references the data for use in your activities as inputs or outputs.
5. Linked services

  • Linked services are much like connection strings: they define the connection information that Data Factory needs in order to connect to external resources. That is to say, a linked service defines the connection to the data source, while a dataset represents the structure of the data. For example, an Azure Storage linked service specifies the connection string for connecting to the Azure Storage account.

6. Pipeline runs

  • A pipeline run refers to an instance of pipeline execution. Moreover, you can pass the arguments manually or within the trigger definition.

7. Parameters

  • Parameters refer to the key-value pairs in a read-only configuration. In this, you define parameters in a pipeline, and then pass the arguments for the defined parameters during execution from a run context. However, the run context is created by a trigger or from a pipeline that you execute manually. 

8. Control flows

  • Control flow is the orchestration of pipeline activities. It includes chaining activities in a sequence, branching, parameters that you define at the pipeline level, and the arguments that you pass when you invoke the pipeline on demand or from a trigger. It also includes custom state passing and looping containers.

23. Is it possible to define default values for the pipeline parameters?

Yes, default values can be defined for the parameters in the pipelines.

24. Can an activity in a pipeline consume arguments that are passed to a pipeline run?

Yes, every activity within the pipeline can consume a parameter value that is passed to the pipeline run by using the @parameter construct.

25. Can an activity output property be consumed in another activity?

Yes. An activity output can be consumed in a subsequent activity with the @activity construct.

26. For troubleshooting data flow logic, what information do I need to provide in order to get help?

Provide the Data Flow Script, which is what Microsoft asks for when helping troubleshoot data flows. It is the code-behind script from your data flow graph. From the ADF UI, open your data flow, click the Script button in the top-right corner, and then copy and paste the script or save it to a text file.

27. What is self-hosted integration runtime? Is it available for data flows?

A self-hosted IR is an ADF infrastructure component that you provision and register yourself; it is used with the Copy activity to acquire or move data to and from on-premises or VM-based data sources and sinks. The virtual machines that host a self-hosted IR can be placed inside the same VNET as your protected data stores so that ADF can reach those stores. Data flows themselves, however, run on the Azure integration runtime, so the self-hosted IR is typically used to stage data that a data flow then transforms.

28. Does the data flow compute engine serve multiple tenants?

No, clusters are never shared. Each job run gets its own isolated cluster in production. In debugging scenarios, each user gets one cluster, and all of that user's debug sessions go to the cluster that the user initiated.

29. Is it possible to write attributes in cosmos DB in the same order as specified in the sink in ADF data flow?

No. In Cosmos DB, each document is stored as a JSON object, which is an unordered set of name/value pairs, so the order of attributes cannot be guaranteed.

30. What is blob storage?

Azure Blob Storage refers to a service for storing large amounts of unstructured object data like text or binary data. This can be used for serving images or documents directly to a browser and storing files for distributed access. Further, you can use it for:

  • Firstly, streaming video and audio
  • Secondly, storing data for backup and restore, disaster recovery, and archiving

31. Explain the term Data. Why is it important?

Data is a collection of facts, such as numbers, descriptions, and observations, used in decision-making. Analyzing data helps uncover useful information and supports critical business decisions.

32. What is Structured Data?

Structured data refers to tabular data containing rows and columns in a database. Databases holding tables in this form are basically known as relational databases. However, each row in a table has the same set of columns.

33. What is Semi-structured Data?

Semi-structured data is information that does not reside in a relational database but still has some structure to it, for example documents held in JavaScript Object Notation (JSON) format. Other semi-structured stores include key-value stores and graph databases: a key-value store is similar to a relational table, except that each row can have any number of columns, while a graph database is used for storing and querying information about complex relationships.
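
To make this concrete, here is a tiny illustrative Python sketch (the field names are made up) showing that two JSON documents in the same collection can carry different fields, which is what makes the data semi-structured rather than tabular:

```python
# A tiny illustration of semi-structured data: documents in the same collection
# can carry different fields, unlike rows in a relational table.
import json

customer_a = {"id": 1, "name": "Ada", "email": "ada@example.com"}
customer_b = {"id": 2, "name": "Linus", "phones": ["+1-555-0100", "+1-555-0101"], "vip": True}

print(json.dumps([customer_a, customer_b], indent=2))
```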

34. What is Unstructured Data?

You can say, not all data is structured or even semi-structured. For example, audio and video files or binary data files. These files might not have a specific structure. We can say they are unstructured data.

35. Explain the process for creating and connecting to Azure SQL Database?
  • Firstly, log in to the Azure portal with your Azure credentials; you create the Azure SQL database from the portal.
  • Secondly, click Create a resource in the side menu to open the Azure Marketplace, then click Databases and select SQL Database.
  • After that, another section will appear. There, provide the basic information about the database, such as the database name, storage size, and server name. Once the database is deployed, you can connect to it from client tools or code, as sketched below.
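
A minimal connection sketch using pyodbc is shown below; the server, database, and credential values are placeholders, and in practice you would keep them in configuration or a secrets store rather than in code.

```python
# A minimal sketch of connecting to an Azure SQL database with pyodbc over the
# standard SQL Server ODBC driver. All connection values are placeholders.
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=tcp:<your-server>.database.windows.net,1433;"
    "Database=<your-database>;"
    "Uid=<your-user>;Pwd=<your-password>;"
    "Encrypt=yes;TrustServerCertificate=no;Connection Timeout=30;"
)

cursor = conn.cursor()
# List the five most recently created tables as a quick connectivity check.
cursor.execute("SELECT TOP (5) name, create_date FROM sys.tables ORDER BY create_date DESC")
for name, created in cursor.fetchall():
    print(name, created)
conn.close()
```
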
36. What is Azure SQL Data Warehouse?

A data warehouse is a large store of data collected from a wide range of sources within a company and used to guide management decisions. It is built by integrating data from multiple sources and is used for analytical reporting, decision making, and similar workloads. Azure SQL Data Warehouse is a cloud-based enterprise data warehouse that uses massively parallel processing to quickly analyze complex queries over huge volumes of data, and it underpins big-data solutions.

37. Is it possible to migrate a SQL Server database to Azure SQL?

Yes, it is quite common to migrate a SQL Server database to Azure SQL. You can use the Import and Export features in SSMS for this purpose.

38. What do you understand by CosmosDB?

Azure Cosmos DB is a globally replicated, multi-model database service that provides rich querying over schema-free data. It stores data in JSON format, and there is no need to define a schema in advance; you can run SQL queries directly over the stored JSON documents. It is a good fit for web, mobile, and gaming applications where predictable throughput, high availability, and low latency are key requirements.

39. Explain Azure Blob Storage?

Azure Blob Storage is part of Azure Storage, one of the PaaS (Platform as a Service) offerings from the Microsoft Azure team. It provides cloud storage that is highly available, secure, durable, scalable, and redundant, and it can store and process hundreds of terabytes of data, or just the small amounts of data required for a small business website.

40. What is Azure Databricks?

Azure Databricks is an Apache® Spark-based analytics platform optimized for Azure. It combines the best of Databricks and Azure to help customers with:

  • accelerating innovation with a one-click setup
  • streamlined workflows
  • creating an interactive workspace for enabling association between data scientists, data engineers, and business analysts.

Further, it helps customers to automatically benefit from native integration with other Azure services like Power BI, SQL Data Warehouse, Cosmos DB as well as from enterprise-grade Azure security and Active Directory integration.
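
As a rough illustration, here is the kind of PySpark code you might run in an Azure Databricks notebook; the abfss path, container, and column names are placeholders, and the SparkSession line is only needed outside a notebook, since Databricks already provides spark for you.

```python
# A minimal sketch of reading CSV files from an ADLS Gen2 path into a Spark
# DataFrame and aggregating them. Paths and column names are placeholders.
from pyspark.sql import SparkSession

# In a Databricks notebook, `spark` is already available; this line makes the sketch standalone.
spark = SparkSession.builder.getOrCreate()

df = (
    spark.read.format("csv")
    .option("header", "true")
    .option("inferSchema", "true")
    .load("abfss://raw@<your-storage-account>.dfs.core.windows.net/sales/")
)

# Aggregate the sales amount per region and display the result.
df.groupBy("region").sum("amount").show()
```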

41. Explain the difference between services: IaaS, PaaS, and SaaS?

These are the three main categories of cloud services.

1. Infrastructure as a Service (IaaS)

Using this, you can rent IT infrastructure from the cloud provider with a pay-as-you-go scheme. This includes:

  • servers and virtual machines (VMs)
  • storage, networks
  • operating systems
2. Platform as a Service (PaaS)

Platform as a service (PaaS) refers to cloud computing services used for supplying an on-demand environment for developing, testing, delivering, and managing software applications.

3. Software as a Service (SaaS)

Software as a service (SaaS) is a way of delivering software applications over the internet, on demand and typically on a subscription basis. With SaaS, the cloud provider is able to:

  • Host and manage the software application
  • Manage the underlying infrastructure
  • Handle maintenance such as software upgrades and security patching.

More Advanced Level Questions

42. Can I schedule a pipeline? If yes, explain?

Yes, you can schedule a pipeline by using the schedule trigger or the tumbling window trigger. The trigger uses a wall-clock calendar schedule, which can run pipelines periodically or in calendar-based recurring patterns, for example on Mondays at 6:00 PM and Fridays at 9:00 PM.

43. How to pass parameters to a pipeline run?

Parameters are a first-class, top-level concept in Data Factory. You can define parameters at the pipeline level and then pass arguments for them as you execute the pipeline run on demand or from a trigger, as the sketch below shows.
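
As a hedged sketch (the subscription, resource group, factory, pipeline, and parameter names below are invented for illustration), the azure-mgmt-datafactory management SDK can start a run and supply parameter values like this:

```python
# A minimal sketch of triggering a pipeline run on demand and passing arguments
# for its parameters via the azure-mgmt-datafactory management SDK.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

run = client.pipelines.create_run(
    resource_group_name="<resource-group>",
    factory_name="<data-factory-name>",
    pipeline_name="CopySalesPipeline",               # hypothetical pipeline name
    parameters={"sourceFolder": "sales/2024",        # arguments for pipeline parameters
                "targetTable": "dbo.SalesStaging"},
)
print("Started pipeline run:", run.run_id)
```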

44. Explain how I can access data using the other 90 dataset types in Data Factory?

The mapping data flow feature currently supports Azure SQL Database, Azure Synapse Analytics, delimited text files from Azure Blob storage or Azure Data Lake Storage Gen2, and Parquet files from Blob storage or Data Lake Storage Gen2 natively for source and sink.

However, you can use the Copy activity to stage data from any of the other connectors, and then execute a Data Flow activity to transform the data after it has been staged. For example, your pipeline might first copy data into Blob storage, and then a Data Flow activity would use a dataset pointing at that staged data as its source for transformation.

45. In regard to data flows, what has changed from private preview to limited public preview?

Now, you will no longer have to bring your own Azure Databricks clusters. That is to say, Data Factory will manage cluster creation and tear-down. Secondly, Blob datasets and Azure Data Lake Storage Gen2 datasets are separated into delimited text and Apache Parquet datasets. But, you can still use Data Lake Storage Gen2 and Blob storage for storing those files by using the appropriate linked service for those storage engines.

46. Can you explain the two levels of security in ADLS Gen2?

1. Role-Based Access Control (RBAC).

RBAC uses built-in Azure roles such as Reader, Contributor, and Owner, as well as custom roles. It is mainly used to manage the service itself (for example, updating settings and properties of the storage account) and to grant the Reader permission required to use the built-in data explorer tools.

2. Access Control Lists (ACLs).

Access control lists specify which data objects a user may read, write, or execute (execute permission is required to browse the directory structure). ACLs are POSIX-compliant, and POSIX does not use a security-inheritance model: access ACLs are specified for every object individually. Default ACLs are therefore critical, because they determine the security settings that new files within a directory receive.

Because of the overhead of assigning ACLs to every object, and because there is a limit of 32 ACL entries per object, it is important to manage data-level security in ADLS Gen1 or Gen2 via Azure Active Directory groups rather than individual users.
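
For illustration, here is a minimal, hedged sketch of setting a directory ACL with the azure-storage-file-datalake SDK; the connection string, filesystem, directory, and AAD group object ID are placeholders, and set_access_control applies only to the directory itself (default ACL entries would use the default: prefix).

```python
# A minimal sketch of the ACL side of ADLS Gen2 security: grant an Azure AD
# group read/execute on a directory. All identifiers are placeholders.
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient.from_connection_string("<your-storage-connection-string>")
directory = service.get_file_system_client("raw").get_directory_client("sales")

# POSIX-style ACL string: owner, owning group, others, plus a named AAD group entry.
directory.set_access_control(
    acl="user::rwx,group::r-x,other::---,group:<aad-group-object-id>:r-x"
)

# Read the ACL back to confirm what is now in effect on the directory.
print(directory.get_access_control()["acl"])
```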

47. Differentiate Dataset and Linked Service in Data Factory?

  • Linked Service refers to a description of the connection string used for connecting to the data stores. For example, while ingesting data from a SQL Server instance, the linked service contains the name for the SQL Server instance and the credentials used for connecting to that instance.
  • A dataset is a reference to the data held in the data store that the linked service describes. When ingesting data from a SQL Server instance, the dataset points to the table that contains the target data or to the query that returns data from different tables.
48. Explain the limitations and constraints with wrangling data flow?

Dataset names can only contain alphanumeric characters. The following data stores are supported:

  • Firstly, DelimitedText dataset in Azure Blob Storage using account key authentication
  • Secondly, DelimitedText dataset in Azure Data Lake Storage gen2 using account key or service principal authentication
  • Thirdly, DelimitedText dataset in Azure Data Lake Storage gen1 using service principal authentication
  • Lastly, Azure SQL Database and Data Warehouse using SQL authentication. However, there is no PolyBase or staging support for the data warehouse.
49. Can you explain the major difference between mapping and wrangling data flows?
  • Mapping data flow provides a way to transform data at scale without writing code. You design a data transformation job in the data flow canvas by constructing a series of transformations: start with any number of source transformations, follow with data transformation steps, and complete the data flow with a sink that lands your results in a destination. Mapping data flow is great at mapping and transforming data with both known and unknown schemas in the sinks and sources.
  • Wrangling data flow, on the other hand, provides agile data preparation and exploration using the Power Query Online mashup editor at scale via Spark execution. With the growth of data lakes, sometimes you just need to explore a data set or create a dataset in the lake. Wrangling data flows are aimed at these less formal, model-based analytics scenarios.
50. What do you understand by Microsoft Azure Table Storage?
  • Azure Table storage is a service used across many projects for storing structured NoSQL data in the cloud, providing a key/attribute store with a schemaless design in a fast and cost-effective way.
  • Secondly, you can store flexible datasets using Table storage, such as user data for a web application, device information, or any other type of metadata that your service requires.
  • Thirdly, it lets you store any number of entities in a table, and one storage account may contain any number of tables, up to the capacity limit of the storage account.
  • Lastly, being a NoSQL datastore, it accepts authenticated calls from inside and outside the Azure cloud, as the sketch below shows.
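
Below is a minimal sketch with the azure-data-tables SDK; the connection string, table, and property names are placeholders chosen for illustration.

```python
# A minimal sketch of the schemaless key/attribute model with azure-data-tables:
# entities in the same table can carry different properties.
from azure.data.tables import TableServiceClient

service = TableServiceClient.from_connection_string("<your-storage-connection-string>")
table = service.create_table_if_not_exists("Devices")

# Only PartitionKey and RowKey are required; other properties are free-form.
table.upsert_entity({"PartitionKey": "factory-1", "RowKey": "sensor-7", "temperature": 21.5})
table.upsert_entity({"PartitionKey": "factory-1", "RowKey": "gateway-2", "firmware": "1.4.2"})

# Query by partition to read the entities back.
for entity in table.query_entities("PartitionKey eq 'factory-1'"):
    print(dict(entity))
```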

Final Words

Above, we have covered the most important Microsoft Azure DP-900 interview questions, categorized into basic, intermediate, and advanced levels. We have already discussed how important the interview process is for landing a good job, so put your knowledge and skills together to answer all the questions and crack the interview. Go through the questions above, give yourself a revision, and feel free to comment if you have any doubts. I hope this helps you in your preparation.

Microsoft Azure Data Fundamentals (DP-900) Free Practice Test
Prepare and pass the Microsoft Azure Data Fundamentals: DP-900 Exam Now!