An Approach to Starting Data Profiling

So you have a large or medium enterprise legacy system, or maybe an ERP system, in your organization and have decided to profile the data in it. The first step is to decide which data you are going to work with. Spend analysis is normally done on procurement and material data, so the material master, vendor master, MRV, PO, and part master are the ideal candidates to start with.
You need to be very careful while extracting the data. If you miss a critical field, or one required for the analysis, you will only find out at a much later stage, and then everything starts again from scratch. The E in the ETL process is a large subject in itself, so we will not get into it right now.

Here I will assume that you are already there: you identified the fields correctly and went ahead and generated the text files, with proper delimiters. :-)
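Before profiling, it can be worth a quick sanity check that an extract really carries the fields you need and that every row splits into the same number of columns. The snippet below is only a minimal sketch of such a check: the pipe delimiter, the field names (MATERIAL_ID, DESCRIPTION, UOM, VENDOR_ID) and the helper name check_extract are all made up for illustration, and it assumes the extract has a header row.

```python
import csv
import io

def check_extract(text, required_fields, delimiter="|"):
    """Return (missing_fields, bad_rows) for a delimited extract with a header row."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader)
    # Required fields that never made it into the extract.
    missing = [f for f in required_fields if f not in header]
    # Line numbers are 1-based and the header is line 1.
    bad_rows = [n for n, row in enumerate(reader, start=2) if len(row) != len(header)]
    return missing, bad_rows

# Hypothetical material master extract; the second data row is one field short.
sample = (
    "MATERIAL_ID|DESCRIPTION|UOM\n"
    "1001|Hex bolt M8|EA\n"
    "1002|Washer\n"
    "1003|Bearing 6204|EA\n"
)

missing, bad_rows = check_extract(sample, ["MATERIAL_ID", "DESCRIPTION", "VENDOR_ID"])
print("Missing fields:", missing)
print("Rows with wrong column count:", bad_rows)
```

A check like this will not tell you whether the data is any good, but it catches a missing field or a broken delimiter before you have built anything on top of the extract.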

Now you are ready to run the profiling. There are two options: a semi-automatic, database-driven approach and a fully automatic, tool-dependent approach. Both are good and have their own advantages. Let's look at each.

1. Database-driven approach: Here you load all the extracted text files into tables in a database server, say MS SQL Server. You then write routines that tell you about the misses and hits. If you know the data structure and the underlying data well and, most importantly, how to write good generic queries, then this option is for you. I will talk about this in my next write-up.
2. Automatic tool-dependent approach: There are several tools on the market that do just this kind of work for you. Choose one of them and you are done. The one I know is Exeros X-profiler.
The tool lets you create a project, link a text file or any of the major database types as a data source, and get a profiling report. It fulfills most of our requirements, so it's good.
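To give a feel for the database-driven option, here is a rough sketch of one generic profiling routine. It is not the routine from the upcoming write-up. It uses Python's built-in sqlite3 module as a stand-in for MS SQL Server so that it runs anywhere, it assumes a "miss" means a NULL or blank value and a "hit" means anything else, and the vendor_master table, its columns and the profile_table helper are invented for the example.

```python
import sqlite3

def profile_table(conn, table):
    """Return {column: (hits, misses, distinct)}; a miss is a NULL or blank value."""
    # Column names come from the table's own metadata, not from user input.
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    report = {}
    for col in columns:
        # The same query shape is reused for every column, which keeps it generic.
        query = f"""
            SELECT
                SUM(CASE WHEN {col} IS NULL OR TRIM({col}) = '' THEN 0 ELSE 1 END),
                SUM(CASE WHEN {col} IS NULL OR TRIM({col}) = '' THEN 1 ELSE 0 END),
                COUNT(DISTINCT NULLIF(TRIM({col}), ''))
            FROM {table}
        """
        report[col] = conn.execute(query).fetchone()
    return report

# Hypothetical vendor master data loaded into an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE vendor_master (vendor_id TEXT, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO vendor_master VALUES (?, ?, ?)",
    [
        ("V001", "Acme Supplies", "Pune"),
        ("V002", "Acme Supplies", ""),
        ("V003", None, "Pune"),
        ("V004", "Globex", "  "),
    ],
)

report = profile_table(conn, "vendor_master")
for column, (hits, misses, distinct) in report.items():
    print(f"{column}: hits={hits} misses={misses} distinct={distinct}")
```

On a real server the metadata lookup would go through that database's own catalog rather than PRAGMA table_info, but the idea of looping one generic query over every column stays the same.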
Look for my next post, which compares the manual approach, the profiler tools, and the rest.

Enterprise Performance Management - Data is First Priority
