Data landing zones are connected to your data management landing zone through virtual network (VNet) peering. Each data landing zone is itself a landing zone, as defined in the Azure landing zone architecture.
Before provisioning a data landing zone, make sure your DevOps and CI/CD operating model is in place and a data management landing zone is deployed.
Each data landing zone has several layers that enable agility for the data integrations and data products it contains. You can deploy a new data landing zone with a standard set of services that let it begin ingesting and analyzing data.
The Azure subscription associated with your data landing zone is organized into the layers and resource groups described in the following sections. A data application produces one or more data products.
Data landing zone architecture
The data landing zone architecture diagram illustrates the layers, their resource groups, and the services each resource group contains. The diagram also provides an overview of all groups and roles associated with your data landing zone and the extent of their access to your control and data planes.
Before you deploy a data landing zone, consider how many data landing zones your organization initially needs.
Use this architecture as a starting point. Download the Visio file and modify it to fit your specific business and technical requirements when planning your data landing zone implementation.
Core services layer
The core services layer includes all services required to enable your data landing zone within the context of cloud-scale analytics. The following table lists the resource groups that provide the standard suite of available services in every data landing zone you deploy.
| Resource group | Required | Description |
|---|---|---|
| network-rg | Yes | Networking services |
| databricks-monitoring-rg | Optional | Log Analytics workspace for monitoring Azure Databricks workspaces in the landing zone |
| hive-rg | Optional | Hive metastore for Azure Databricks |
| storage-rg | Yes | Data lake services |
| external-data-rg | Yes | Upload ingest storage |
| runtimes-rg | Yes | Shared integration runtimes |
| metadata-ingestion-rg | Optional | Data agnostic ingestion |
| shared-synapse-rg | Optional | Shared Azure Synapse Analytics workspace |
| shared-databricks-rg | Optional | Shared Azure Databricks workspace |
Network
The network resource group contains core networking components, including Azure Network Watcher, network security groups (NSGs), and a virtual network. All of these services are deployed into a single resource group.
The virtual network of your data landing zone is automatically peered with your data management landing zone's VNet and your connectivity subscription's VNet.
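As a sketch, the landing-zone side of that peering could be declared in Bicep as follows. The VNet name, peering name, and parameter are hypothetical, and a matching peering must also be created from the remote VNet back to this one.

```bicep
// Hypothetical names: adjust to your naming convention.
param dataManagementVnetId string

resource dataLandingZoneVnet 'Microsoft.Network/virtualNetworks@2022-07-01' existing = {
  name: 'vnet-data-landing-zone'
}

// One direction of the peering; create the mirror peering on the remote VNet too.
resource peerToDataManagement 'Microsoft.Network/virtualNetworks/virtualNetworkPeerings@2022-07-01' = {
  parent: dataLandingZoneVnet
  name: 'peer-to-data-management'
  properties: {
    remoteVirtualNetwork: {
      id: dataManagementVnetId
    }
    allowVirtualNetworkAccess: true
    allowForwardedTraffic: true
  }
}
```

The same pattern applies to the peering with the connectivity subscription's VNet.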
Azure Databricks workspaces monitoring
This resource group is optional and only deploys with Azure Databricks.
The Azure landing zone pattern recommends that you send all logs to a central Log Analytics workspace. However, each data landing zone also includes a monitoring resource group to capture Spark logs from Databricks. This resource group contains a shared Log Analytics workspace and an Azure Key Vault that stores the Log Analytics keys.
Only use the Log Analytics workspace in your Databricks monitoring resource group to capture Azure Databricks Spark logs.
For more information, see Monitoring Azure Databricks.
Hive metastore for Azure Databricks
This resource group is optional and should only be deployed with Azure Databricks.
The Hive metastore for Azure Databricks provisions an Azure Database for MySQL database and a key vault. All Azure Databricks workspaces in your data landing zone use this metastore as their external Apache Hive metastore.
For more information, see External Apache Hive metastore.
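For illustration, an Azure Databricks cluster points at that external metastore through Spark configuration such as the following fragment. The server name, database name, and secret scope/key names are placeholders, and the metastore version must match your deployment.

```ini
# Azure Databricks cluster Spark config for an external Hive metastore.
# Placeholders: <mysql-server>, the metastore database name, and the
# secret scope/key names.
spark.sql.hive.metastore.version 2.3.7
spark.sql.hive.metastore.jars builtin
spark.hadoop.javax.jdo.option.ConnectionURL jdbc:mysql://<mysql-server>.mysql.database.azure.com:3306/hivemetastore?useSSL=true
spark.hadoop.javax.jdo.option.ConnectionDriverName org.mariadb.jdbc.Driver
spark.hadoop.javax.jdo.option.ConnectionUserName {{secrets/hive-scope/hive-username}}
spark.hadoop.javax.jdo.option.ConnectionPassword {{secrets/hive-scope/hive-password}}
```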
Data lake services
As shown in the previous diagram, three Azure Data Lake Storage Gen2 accounts are provisioned in a single data lake services resource group. Data transformed at different stages is saved in one of your data landing zone's data lakes. The data is available for consumption by your analytics, data science, and visualization teams.
Data lake layers use different terminology depending on technology and vendor. This table provides guidance on how to apply terms for cloud-scale analytics:
| Cloud-scale analytics | Delta Lake | Other terms | Description |
|---|---|---|---|
| Raw | Bronze | Landing and conformance | Ingestion tables |
| Enriched | Silver | Standardization zone | Refined tables that store full-entity, consumption-ready recordsets from systems of record |
| Curated | Gold | Product zone | Feature or aggregated tables. The primary zone for applications, teams, and users to consume data products. |
| Development | -- | Development zone | Location for data engineers and scientists, comprising both an analytics sandbox and a product development zone |
In the previous diagram, each data landing zone has three data lakes. However, depending on your requirements, you might want to consolidate your raw, enriched, and curated layers into one storage account and maintain another storage account, named development, where data consumers can bring in other useful data products.
For more information, see:
- Overview of Azure Data Lake Storage for cloud-scale analytics
- Data Standardization
- Provision Azure Data Lake Storage Gen2 accounts for each data landing zone
- Key considerations for Azure Data Lake Storage
- Access control and data lake configurations in Azure Data Lake Storage
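To make the layer mapping concrete, here's a minimal Python sketch of one possible path convention for these layers in a consolidated Data Lake Storage Gen2 account. The account, container, and folder names are illustrative assumptions, not a prescribed standard.

```python
# Sketch of a path convention for the data lake layers in the table above.
# Account and container names are illustrative, not prescriptive.

# Cloud-scale analytics layer -> Delta Lake synonym (None where no synonym).
LAYERS = {"raw": "bronze", "enriched": "silver", "curated": "gold", "development": None}

def layer_path(account: str, layer: str, data_product: str) -> str:
    """Build an abfss:// URI for a data product folder in a lake layer."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer}")
    # One container per layer; each data product gets a folder inside it.
    return f"abfss://{layer}@{account}.dfs.core.windows.net/{data_product}"

print(layer_path("dlzstorage", "curated", "sales-aggregates"))
# abfss://curated@dlzstorage.dfs.core.windows.net/sales-aggregates
```

A convention like this keeps access control simple: permissions can be granted per container (layer) or per data product folder.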
Upload ingest storage
Third-party data publishers need to land data in your platform so your data application teams can pull it into their data lakes. As shown in the following diagram, your upload ingest storage resource group lets you provision blob stores for third parties.
Your data application teams request these storage blobs, and your data landing zone operations team approves the requests. Data should be removed from its source blob once it has been pulled into the raw layer.
Since Azure Storage blobs are provisioned on an as-needed basis, you should initially deploy an empty storage services resource group in each data landing zone.
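One way to back up the clean-up expectation above is an Azure Storage lifecycle management rule that deletes uploaded blobs some time after they're written. This sketch assumes a hypothetical container prefix and a 30-day window, which you'd tune to your ingestion cadence.

```json
{
  "rules": [
    {
      "name": "purge-ingested-uploads",
      "enabled": true,
      "type": "Lifecycle",
      "definition": {
        "filters": {
          "blobTypes": [ "blockBlob" ],
          "prefixMatch": [ "upload/" ]
        },
        "actions": {
          "baseBlob": {
            "delete": { "daysAfterModificationGreaterThan": 30 }
          }
        }
      }
    }
  ]
}
```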
Shared integration runtime
Deploy a virtual machine scale set with self-hosted integration runtimes into your data landing zone, hosted in the shared integration resource group. This deployment lets you rapidly onboard data products to your data landing zone.
To enable the resource group:
- Create at least one Azure Data Factory in your data landing zone's shared integration resource group. Use it only for linking the shared self-hosted integration runtime, not for data pipelines.
- Create a shared image for the Azure virtual machine scale set with a self-hosted integration runtime configured.
- Set up the self-hosted integration runtimes in high availability mode.
- Associate the self-hosted integration runtimes with Azure data factories in your data landing zone(s).
- Set up Azure Automation to periodically update the self-hosted integration runtime.
Deploy shared integration runtimes as close to the data source as possible. This deployment doesn't restrict you from deploying integration runtimes in a data landing zone or in third-party clouds; instead, it provides a fallback for cloud-native, in-region data sources.
CI/CD agents
CI/CD agents help you deploy data applications and changes to the data landing zone.
For more information, see Azure Pipeline agents.
Data agnostic ingestion
This resource group is optional; omitting it doesn't prevent you from deploying your data landing zone.
This resource group applies if you have (or are developing) a data agnostic ingestion engine that automatically ingests data based on registered metadata (including connection strings, paths to copy data from and to, and the ingestion schedule). The ingestion and processing resource group has key services for this kind of framework.
Deploy an Azure SQL Database instance to hold metadata used by Azure Data Factory. Provision an Azure Key Vault to store secrets relating to automated ingestion services. These secrets can include:
- Azure Data Factory metastore credentials
- Service principal credentials for your automated ingestion process
For more information, see How automated ingestion frameworks support cloud-scale analytics in Azure.
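As a rough sketch of what such registered metadata might look like, the following Python models one registration record and translates it into parameters for a generic copy pipeline. All field and parameter names are illustrative assumptions rather than a fixed schema.

```python
# Minimal sketch of the metadata a data agnostic ingestion engine registers
# per source. Field names are illustrative assumptions, not a fixed schema.
from dataclasses import dataclass

@dataclass
class IngestionRegistration:
    source_name: str
    connection_secret: str   # Key Vault secret name, not the secret itself
    source_path: str         # path to copy data from
    sink_path: str           # path in the raw layer to copy data to
    schedule: str            # cron-style ingestion schedule

def to_pipeline_parameters(reg: IngestionRegistration) -> dict:
    """Translate a registration record into parameters for a generic copy
    pipeline (for example, a parameterized Azure Data Factory pipeline)."""
    return {
        "sourcePath": reg.source_path,
        "sinkPath": reg.sink_path,
        "connectionSecret": reg.connection_secret,
    }

reg = IngestionRegistration(
    source_name="crm",
    connection_secret="crm-connection-string",
    source_path="crm/exports/daily",
    sink_path="raw/crm/daily",
    schedule="0 2 * * *",
)
print(to_pipeline_parameters(reg))
```

In a real engine, records like this live in the Azure SQL Database metastore, and the orchestrator reads them on each scheduled run.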
The following services are included in this resource group:

| Service | Required | Description |
|---|---|---|
| Azure Data Factory | Yes | Azure Data Factory is the orchestration engine for data agnostic ingestion. |
| Azure SQL Database | Yes | Azure SQL Database is the metastore for Azure Data Factory. |
| Event Hubs or IoT Hub | Optional | Event Hubs or IoT Hub can ingest real-time data streams, with batch and stream processing handled through a Databricks engineering workspace. |
| Azure Databricks | Optional | You can deploy Azure Databricks or Azure Synapse Spark for use with your data agnostic ingestion engine. |
| Azure Synapse | Optional | You can deploy Azure Databricks or Azure Synapse Spark for use with your data agnostic ingestion engine. |
Shared Azure Databricks workspace
This resource group is optional and only deploys with Azure Databricks. Everyone in your data landing zone can use the shared Databricks workspace.
Azure Databricks is a key consumer of Azure Data Lake Storage. Its atomic file operations are optimized for Spark analytic engines, which speeds up the completion of the Spark jobs that the Azure Databricks service issues.
An Azure Databricks workspace called the Azure Databricks (analytics) workspace is provisioned for all data scientists and DataOps, as shown in the shared products resource group.
You can configure this workspace to connect to your Azure Data Lake using either Azure Active Directory passthrough or table access control. Depending on your use case, you can configure conditional access as another security measure.
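For example, Azure Active Directory credential passthrough is enabled on a cluster through its Spark configuration, shown here as a fragment; table access control and conditional access are configured separately.

```ini
# Cluster Spark config fragment (a high-concurrency cluster is assumed).
spark.databricks.passthrough.enabled true
```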
Follow cloud-scale analytics best practices to integrate Azure Databricks.
Shared Azure Synapse Analytics
This resource group is optional.
During your initial setup of a data landing zone, a single Azure Synapse Analytics workspace is deployed in your shared products resource group for use by all data analysts and scientists.
You can set up more Azure Synapse workspaces for data products if cost management and recharge are required. Your data application teams might use dedicated Azure Synapse Analytics workspaces to create dedicated Azure SQL Database pools as a read data store for your visualization layer.
Prevent use of your shared Azure Synapse workspace for data product creation by locking down the workspace to allow only serverless SQL on-demand queries. The workspace is there for exploratory purposes only.
Data applications
Each data landing zone can have multiple data products. You can create these data products by ingesting data from sources. You can also create data products from other data products within the same data landing zone or from other data landing zones. Creation of data products is subject to data steward approval.
Data product resource group
Your data product resource group includes all the services required to build that data product. For example, a visualization tool might require an Azure Database for MySQL instance, and data must be ingested and transformed before it lands in that MySQL database. In this case, you can deploy both Azure Database for MySQL and an Azure Data Factory into the data product resource group.
If you choose not to implement a data agnostic engine for ingesting once from operational sources, or if complex connections aren't facilitated in your data agnostic engine, create a source-aligned data application. For more information, see Data applications (source-aligned).
For more information on how to onboard data products, see Cloud-scale analytics data products in Azure.
Visualization
An empty visualization resource group is created for every data landing zone. Fill this resource group with the services you need to implement your visualization solution. Using your existing VNet lets your solution connect to data products.
This resource group can host virtual machines for third-party visualization services.
Due to licensing costs, it might be more economical to deploy third-party visualization products into your data management landing zone, and for those products to connect across data landing zones to pull data back.
Next steps
- Cloud-scale analytics data products in Azure