I have data from several flat .csv files that I have uploaded to Azure Blob Storage. Using Azure Data Factory, I have created a SQL database containing the tables from these files. All data sources contain the same underlying data, but use slightly different naming conventions. The hierarchy levels in my data sources are:

  • Brand House
  • Brand Group
  • Product Name
  • Size

I would like to create a single mapping convention (a unique ID at the lowest hierarchy level) that links all data sources together. The goal is for every table to carry a unique ID at the Size level.

Currently I am thinking of writing a Python script that looks at the string values in the different tables and creates a unique ID for each hierarchy level in the data. The script would then be run with Azure Databricks to create all the IDs. This approach requires me to review all the different values on each hierarchy level and come up with a smart naming convention.
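A minimal sketch of how such a script might look, under stated assumptions: the column names (`brand_house`, `brand_group`, `product_name`, `size`), the `normalize` rule, and the 12-character SHA-1 prefix are all hypothetical choices for illustration, not anything prescribed by Azure Data Factory or Databricks. Each level's ID is derived from the normalized path down to that level, so the same value always produces the same ID in every table.

```python
import hashlib
import re

# Hypothetical column names for the four hierarchy levels, top to bottom.
HIERARCHY = ["brand_house", "brand_group", "product_name", "size"]


def normalize(value: str) -> str:
    """Lowercase the value and collapse every run of non-alphanumeric
    characters into a single space, trimming the ends."""
    return re.sub(r"[^a-z0-9]+", " ", value.lower()).strip()


def hierarchy_ids(row: dict) -> dict:
    """Return one deterministic ID per hierarchy level for a single row.

    The ID of a level is a hash of the normalized path from the top level
    down to that level, so equal names under different parents stay distinct.
    """
    ids = {}
    path = []
    for level in HIERARCHY:
        path.append(normalize(row[level]))
        key = "|".join(path)
        # Assumption: a 12-character SHA-1 prefix is long enough for this data.
        ids[f"{level}_id"] = hashlib.sha1(key.encode("utf-8")).hexdigest()[:12]
    return ids


# Two rows from different sources that differ only in case and punctuation.
a = hierarchy_ids({"brand_house": "Acme Corp.", "brand_group": "Soft-Drinks",
                   "product_name": "Cola Zero", "size": "330 ML"})
b = hierarchy_ids({"brand_house": " acme corp", "brand_group": "soft drinks",
                   "product_name": "COLA  ZERO", "size": "330 ml"})
# Same product, different size: only the lowest-level ID should change.
c = hierarchy_ids({"brand_house": "Acme Corp.", "brand_group": "Soft-Drinks",
                   "product_name": "Cola Zero", "size": "500 ml"})
```

This only reconciles differences in case, spacing, and punctuation. Genuinely different spellings or abbreviations (for example `330ml` versus `330 ml`) would still hash to different IDs and would need an explicit lookup table or a fuzzy-matching step, which is where the manual effort remains.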

Is there any built-in functionality in Azure Data Factory, or any other tool, that can help me with this problem? The approach I have described above requires quite a bit of manual effort, and I would like to follow established best practices here.
