Transform data using a mapping data flow - Azure Data Factory


APPLIES TO: ✔️ Azure Data Factory ✔️ Azure Synapse Analytics

If you're new to Azure Data Factory, see Introduction to Azure Data Factory.

In this tutorial, you'll use the Azure Data Factory user interface (UX) to create a pipeline that copies and transforms data from an Azure Data Lake Storage (ADLS) Gen2 source to an ADLS Gen2 sink using mapping data flow. The configuration pattern in this tutorial can be expanded on when you transform data using mapping data flows.

Note

This tutorial is meant for mapping data flows in general. Data flows are available in both Azure Data Factory and Azure Synapse pipelines. If you're new to data flows in Azure Synapse pipelines, follow Data Flow using Azure Synapse Pipelines.

In this tutorial, you do the following steps:

  • Create a data factory.
  • Create a pipeline with a Data Flow activity.
  • Build a mapping data flow with four transformations.
  • Test run the pipeline.
  • Monitor a Data Flow activity.

Prerequisites

  • Azure subscription. If you don't have an Azure subscription, create a free Azure account before you begin.
  • Azure storage account. You use ADLS Gen2 storage as the source and sink data stores. If you don't have a storage account, see Create an Azure storage account for steps to create one.

The file that we're transforming in this tutorial is MoviesDB.csv, which can be found here. To retrieve the file from GitHub, copy the contents to a text editor of your choice and save it locally as a .csv file. To upload the file to your storage account, see Upload blobs with the Azure portal. The examples in this tutorial reference a container named 'sample-data'.
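
If you prefer to script the upload instead of using the portal, here's a minimal sketch using the azure-storage-file-datalake Python package; the storage account name, account key, and local file path are placeholders you'd substitute:

    # Upload moviesDB.csv to the 'sample-data' container in ADLS Gen2.
    # Assumes: pip install azure-storage-file-datalake. The <...> values
    # are placeholders, not values from this tutorial.
    from azure.storage.filedatalake import DataLakeServiceClient

    service = DataLakeServiceClient(
        account_url="https://<storage-account>.dfs.core.windows.net",
        credential="<account-key>")
    container = service.get_file_system_client("sample-data")
    with open("moviesDB.csv", "rb") as data:
        container.get_file_client("moviesDB.csv").upload_data(data, overwrite=True)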

Create a data factory

In this step, you create a data factory and open the Data Factory UX to create a pipeline in the data factory.

  1. Open Microsoft Edge or Google Chrome. Currently, Data Factory UI is supported only in the Microsoft Edge and Google Chrome web browsers.

  2. On the left menu, select Create a resource > Integration > Data Factory:


  3. On the New data factory page, under Name, enter ADFTutorialDataFactory.

    The name of the Azure data factory must be globally unique. If you receive an error message about the name value, enter a different name for the data factory (for example, yournameADFTutorialDataFactory). For naming rules for Data Factory artifacts, see Data Factory naming rules.


  4. Select the Azure subscription in which you want to create the data factory.

  5. For Resource Group, take one of the following steps:

    a. Select Use existing, and select an existing resource group from the drop-down list.


    b. Select Create new, and enter the name of a resource group.

    To learn about resource groups, see Use resource groups to manage your Azure resources.

  6. Under Version, select V2.

  7. Under Location, select a location for the data factory. Only locations that are supported are displayed in the drop-down list. Data stores (for example, Azure Storage and SQL Database) and computes (for example, Azure HDInsight) used by the data factory can be in other regions.

  8. Select Create.

  9. After the creation is finished, you see the notice in the Notifications center. Select Go to resource to navigate to the Data factory page.

  10. Select Author & Monitor to launch the Data Factory UI in a separate tab.

Create a pipeline with a Data Flow activity

In this step, you'll create a pipeline that contains a Data Flow activity.

  1. On the home page of Azure Data Factory, select Orchestrate.


  2. In the General tab for the pipeline, enter TransformMovies for Name of the pipeline.

  3. In the Activities pane, expand the Move and Transform accordion. Drag and drop the Data Flow activity from the pane to the pipeline canvas.


  4. In the Adding Data Flow pop-up, select Create new Data Flow and then name your data flow TransformMovies. Click Finish when done.


  5. In the top bar of the pipeline canvas, slide the Data Flow debug slider on. Debug mode allows for interactive testing of transformation logic against a live Spark cluster. Data Flow clusters take 5-7 minutes to warm up, so it's recommended that you turn on debug first if you plan to do Data Flow development. For more information, see Debug Mode.


Build transformation logic in the data flow canvas

Once you create your data flow, you'll be automatically sent to the data flow canvas. In this step, you'll build a data flow that takes the moviesDB.csv in ADLS storage and aggregates the average rating of comedies from 1910 to 2000. You'll then write this data back to ADLS storage.

  1. In the data flow canvas, add a source by clicking on the Add Source box.


  2. Name your source MoviesDB. Click on New to create a new source dataset.



  3. Choose Azure Data Lake Storage Gen2. Click Continue.


  4. Choose DelimitedText. Click Continue.


  5. Name your dataset MoviesDB. In the linked service dropdown, choose New.


  6. In the linked service creation screen, name your ADLS Gen2 linked service ADLSGen2 and specify your authentication method. Then enter your connection credentials. In this tutorial, we're using Account key to connect to our storage account. You can click Test connection to verify your credentials were entered correctly. Click Create when finished.
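
    This tutorial sticks to the UI, but for reference, roughly the same linked service could be created programmatically. Here's a sketch with the azure-mgmt-datafactory Python SDK, where the subscription, resource group, and factory names are placeholders:

    # Sketch only: creates an ADLS Gen2 (AzureBlobFS) linked service named
    # 'ADLSGen2' with account-key authentication. All <...> values are placeholders.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient
    from azure.mgmt.datafactory.models import (
        AzureBlobFSLinkedService, LinkedServiceResource, SecureString)

    adf = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")
    properties = AzureBlobFSLinkedService(
        url="https://<storage-account>.dfs.core.windows.net",
        account_key=SecureString(value="<account-key>"))
    adf.linked_services.create_or_update(
        "<resource-group>", "<factory-name>", "ADLSGen2",
        LinkedServiceResource(properties=properties))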


  7. Once you're back at the dataset creation screen, enter where your file is located under the File path field. In this tutorial, the file moviesDB.csv is located in container sample-data. As the file has headers, check First row as header. Select From connection/store to import the header schema directly from the file in storage. Click OK when done.


  8. If your debug cluster has started, go to the Data Preview tab of the source transformation and click Refresh to get a snapshot of the data. You can use data preview to verify your transformation is configured correctly.


  9. Next to your source node on the data flow canvas, click on the plus icon to add a new transformation. The first transformation you're adding is a Filter.


  10. Name your filter transformation FilterYears. Click on the expression box next to Filter on to open the expression builder. Here you'll specify your filtering condition.


  11. The data flow expression builder lets you interactively build expressions to use in various transformations. Expressions can include built-in functions, columns from the input schema, and user-defined parameters. For more information on how to build expressions, see Data Flow expression builder.

    In this tutorial, you want to filter movies of genre comedy that came out between the years 1910 and 2000. As year is currently a string, you need to convert it to an integer using the toInteger() function. Use the greater than or equal to (>=) and less than or equal to (<=) operators to compare against the literal year values 1910 and 2000. Combine these expressions with the and (&&) operator. The expression comes out as:

    toInteger(year) >= 1910 && toInteger(year) <= 2000

    To find which movies are comedies, you can use the rlike() function to find the pattern 'Comedy' in the column genres. Combine the rlike expression with the year comparison to get:


    toInteger(year) >= 1910 && toInteger(year) <= 2000 && rlike(genres, 'Comedy')

    If you have a debug cluster active, you can verify your logic by clicking Refresh to see the expression output compared to the inputs used. There's more than one right answer for how you can accomplish this logic using the data flow expression language.


    Click Save and Finish once you're done with your expression.
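
    If it helps to sanity-check the filter logic outside of the service, here's a rough pandas equivalent (illustrative only; it assumes the year and genres column names used above):

    import pandas as pd

    df = pd.read_csv("moviesDB.csv")
    year = pd.to_numeric(df["year"])                      # toInteger(year)
    comedies = df[(year >= 1910) & (year <= 2000)         # year range check
                  & df["genres"].str.contains("Comedy")]  # rlike(genres, 'Comedy')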

  12. Fetch a Data Preview to verify the filter is working correctly.


  13. The next transformation you'll add is an Aggregate transformation under Schema modifier.


  14. Name your aggregate transformation AggregateComedyRatings. In the Group by tab, select year from the dropdown to group the aggregations by the year the movie came out.


  15. Go to the Aggregates tab. In the left text box, name the aggregate column AverageComedyRating. Click on the right expression box to enter the aggregate expression via the expression builder.


  16. To get the average of column Rating, use the avg() aggregate function. As Rating is a string and avg() takes a numerical input, we must convert the value to a number via the toInteger() function. The expression looks like:

    avg(toInteger(Rating))

    Click Save and Finish when done.
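
    Continuing the rough pandas analogy from the filter step (again, illustrative only, not how the service executes the flow):

    # Average the Rating column per year, mirroring the aggregate above.
    # 'comedies' and 'pd' come from the filter sketch in step 11.
    result = (comedies.assign(Rating=pd.to_numeric(comedies["Rating"]))
              .groupby("year", as_index=False)["Rating"].mean()
              .rename(columns={"Rating": "AverageComedyRating"}))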


  17. Go to the Data Preview tab to view the transformation output. Notice that only two columns are present: year and AverageComedyRating.


  18. Next, you want to add a Sink transformation under Destination.


  19. Name your sink Sink. Click New to create your sink dataset.



  20. Choose Azure Data Lake Storage Gen2. Click Continue.


  21. Choose DelimitedText. Click Continue.


  22. Name your sink dataset MoviesSink. For linked service, choose the ADLS Gen2 linked service you created in step 6. Enter an output folder to write your data to. In this tutorial, we're writing to the folder 'output' in the container 'sample-data'. The folder doesn't need to exist beforehand and can be created dynamically. Set First row as header to true and select None for Import schema. Click Finish.


Now you've finished building your data flow. You're ready to run it in your pipeline.

Running and monitoring the Data Flow

You can debug a pipeline before you publish it. In this step, you're going to trigger a debug run of the data flow pipeline. While data preview doesn't write data, a debug run will write data to your sink destination.
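
Debug runs are triggered from the UI, but for completeness: once the pipeline is published, you could also trigger and poll it programmatically. A minimal sketch with the azure-mgmt-datafactory Python SDK (all <...> names are placeholders):

    from azure.identity import DefaultAzureCredential
    from azure.mgmt.datafactory import DataFactoryManagementClient

    adf = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")
    run = adf.pipelines.create_run(
        "<resource-group>", "<factory-name>", "TransformMovies")
    status = adf.pipeline_runs.get(
        "<resource-group>", "<factory-name>", run.run_id)
    print(status.status)  # for example: Queued, InProgress, Succeeded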

  1. Go to the pipeline canvas. Click Debug to trigger a debug run.


  2. Pipeline debug of Data Flow activities uses the active debug cluster but still takes at least a minute to initialize. You can track the progress via the Output tab. Once the run is successful, click the eyeglasses icon to open the monitoring pane.


  3. In the monitoring pane, you can see the number of rows and time spent in each transformation step.


  4. Click on a transformation to get detailed information about the columns and partitioning of the data.


If you followed this tutorial correctly, you should have written 83 rows and 2 columns into your sink folder. You can verify the data is correct by checking your blob storage.
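
One way to spot-check the output from Python, using the same storage package as in the prerequisites (a sketch; the data flow's Spark execution generates the part file names under 'output', so the folder is listed first):

    from azure.storage.filedatalake import DataLakeServiceClient

    service = DataLakeServiceClient(
        account_url="https://<storage-account>.dfs.core.windows.net",
        credential="<account-key>")
    container = service.get_file_system_client("sample-data")
    # The data flow writes one or more part files under the 'output' folder.
    for path in container.get_paths("output"):
        if path.name.endswith(".csv"):
            content = container.get_file_client(path.name).download_file().readall()
            print(content.decode("utf-8"))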

Next steps

The pipeline in this tutorial runs a data flow that aggregates the average rating of comedies from 1910 to 2000 and writes the data to ADLS. You learned how to:

  • Create a data factory.
  • Create a pipeline with a Data Flow activity.
  • Build a mapping data flow with four transformations.
  • Test run the pipeline.
  • Monitor a Data Flow activity.

Learn more about the data flow expression language.

