As the third step in my Data Management article series, let's look at commonly used terminology in the domain. These are standard definitions, quoted from a publicly available glossary. The next article will explain the relevance and usage of these terms in the business world, e.g. how to look at data standardization in a supplier data or material data context when you are optimizing your procurement processes. That's next.
In my first article in this data management series, I compared data management with the story of the elephant and the seven blind men: http://manageyourdata.blogspot.in/2012/09/data-management-elephant-seven-blind-men.html
The second post is about why it's important to speak the same language when you are running any data management initiative: http://manageyourdata.blogspot.in/2012/09/data-management-are-we-all-speaking.html
Data Analysis: The process of inspecting, cleaning, transforming, and modeling data with the goal of highlighting useful information, suggesting conclusions, and supporting decision making.
Data Governance: The exercise of decision-making and authority for data-related matters. It covers the organizational bodies, rules, decision rights, and accountability of people and information systems as they perform information-related processes. Data Governance determines how an organization makes decisions: how we "decide how to decide."
Data Governance Framework: A logical structure for organizing how we think about and communicate Data Governance concepts.
Data Governance Methodology: A logical structure providing step-by-step instructions for performing Data Governance processes.
Data Governance Office (DGO): A centralized organizational entity responsible for facilitating and coordinating Data Governance and/or Stewardship efforts for an organization. It supports a decision-making group, such as a Data Stewardship Council.
Data Mapping: The process of assigning a source data element to a target data element.
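To make the Data Mapping definition concrete, here is a minimal Python sketch. The field names (`SUPP_NM`, `supplier_name` and so on) are hypothetical and made up for illustration; they do not come from any real system or from the glossary.

```python
# Hypothetical mapping: source data element (left) -> target data element (right).
FIELD_MAP = {
    "SUPP_NM": "supplier_name",
    "SUPP_CTRY": "country_code",
    "SUPP_TEL": "phone",
}


def map_record(source_record, field_map=FIELD_MAP):
    """Rename every mapped source element to its target element.

    Source elements without a mapping are returned separately so they
    can be reviewed instead of being silently dropped.
    """
    target, unmapped = {}, {}
    for name, value in source_record.items():
        if name in field_map:
            target[field_map[name]] = value
        else:
            unmapped[name] = value
    return target, unmapped
```

Calling `map_record({"SUPP_NM": "Acme", "LEGACY_FLAG": "Y"})` would return the renamed supplier name in the first dictionary and the unmapped `LEGACY_FLAG` in the second.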
Data Modeling: The discipline, process, and organizational group that conducts analysis of data objects used in a business or other context, identifies the relationships among these data objects, and creates models that depict those relationships.
Master Data Management (MDM): A structured approach to defining and managing an organization's Master Data.
Data Classification: The categorization of data, following various schemas, to support various business or technology goals.
Data Cleansing: Also referred to as data scrubbing. Data Cleansing is the process of detecting dirty data in a database (data that is incorrect, out-of-date, redundant, incomplete, or formatted incorrectly) and then removing and/or correcting the data. Data cleansing is often necessary to bring consistency to different sets of data that have been merged from separate databases. Cleansing data involves consolidating data within a database by removing inconsistent data, removing duplicates, and re-indexing existing data in order to achieve the most accurate and concise database. It can involve manual tasks or processes automated by special Data Quality tools. A particular type of Data Cleansing is Address Cleansing, in which street addresses are converted to a standard format as set forth by the U.S. Postal Service master database. For example, standard abbreviations are utilized, typos are corrected, and ZIP codes are converted to 9-digit format. Address cleansing is usually done in conjunction with address matching, a process that validates an address against one of the 57 million addresses in the USPS database.
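The address cleansing idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abbreviation table is a tiny made-up subset, not the actual USPS standard, and the sketch does no address matching or ZIP code conversion.

```python
import re

# Tiny illustrative subset of street-suffix abbreviations (an assumption,
# not the full USPS table).
ABBREVIATIONS = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD", "BOULEVARD": "BLVD"}


def cleanse_address(raw):
    """Uppercase, drop periods and commas, collapse whitespace, abbreviate suffixes."""
    text = re.sub(r"[.,]", " ", raw.upper())
    return " ".join(ABBREVIATIONS.get(word, word) for word in text.split())


def cleanse_addresses(raw_addresses):
    """Cleanse each address and remove duplicates, keeping first-seen order."""
    seen, result = set(), []
    for raw in raw_addresses:
        if not raw or not raw.strip():
            continue  # skip incomplete (empty) entries
        cleaned = cleanse_address(raw)
        if cleaned not in seen:
            seen.add(cleaned)
            result.append(cleaned)
    return result
```

Note that duplicates are only detected after the addresses are put into the same format, which is why standardization usually comes before de-duplication.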
Data Conversion: The manipulation of information sets from one format or structure to another. Data Conversion is often required when acquiring sets of information from outside sources.
Data Enrichment: An activity that supplements and/or improves the existing data.
Data Mart: A repository of data gathered from operational data and other sources. The data may derive from an enterprise-wide database or data warehouse or from more specialized sources. The emphasis of a data mart is on meeting the expectations and needs of a particular group of users, so it may be designed to assist them in performing analysis and understanding the content.
Data Mining: The analysis of data for relationships not previously discovered. Data Mining (DM) is also known as Knowledge Discovery. It is the process of automatically searching large volumes of data for patterns that may be used to predict future behavior.
Data Profiling: The process of examining data in an existing database and collecting statistics and information about that data. The information collected may be used to collect metrics on data quality, assess whether metadata accurately describes the actual values in the source database, determine if existing data can be re-purposed, or understand risks and challenges in using the data.
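As a rough sketch of what profiling collects, the Python below computes a few basic statistics per column. The choice of statistics, and treating `None` and empty strings as missing, are my assumptions for illustration; real profiling tools gather far more.

```python
from collections import Counter


def profile_column(values):
    """Collect basic statistics about one column of values."""
    # Assumption: None and the empty string both count as missing.
    non_null = [v for v in values if v not in (None, "")]
    return {
        "rows": len(values),
        "missing": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "most_common": Counter(non_null).most_common(1)[0][0] if non_null else None,
    }


def profile_table(rows):
    """Profile every column found in a list of dict records."""
    columns = sorted({key for row in rows for key in row})
    return {col: profile_column([row.get(col) for row in rows]) for col in columns}
```

A high "missing" count or an unexpectedly low "distinct" count for a column is the kind of signal that tells you whether the data can be re-purposed as it is.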
Data Quality: The practice of correcting, standardizing, and verifying data.
Data Standardization: The transformation of data into consistent formats.
Data Validation: As a broad concept, Data Validation refers to the confirmation of the reliability of data through a checking process. As a set of processes, Data Validation refers to a systematic review of a data set to identify outliers or suspect values. More specifically, data validation refers to the systematic process of independently reviewing a body of analytical data against established criteria to provide assurance that the data are acceptable for their intended use. Within databases, Data Validation refers to procedures built into databases to define and check acceptable input for fields, and to accept or reject the data.
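The last sense of Data Validation (define acceptable input for fields, then accept or reject) can be sketched in Python as below. The field names and rules are hypothetical, chosen only for illustration; in a real database these checks would typically live in constraints or in the application layer.

```python
import re

# Hypothetical field rules: each check returns True when the value is acceptable.
RULES = {
    "supplier_name": lambda v: isinstance(v, str) and v.strip() != "",
    "country_code": lambda v: isinstance(v, str) and re.fullmatch(r"[A-Z]{2}", v) is not None,
    "zip_code": lambda v: isinstance(v, str) and re.fullmatch(r"\d{5}(-\d{4})?", v) is not None,
}


def validate_record(record, rules=RULES):
    """Return the fields that fail their rule; an empty list means the record is accepted."""
    return [field for field, check in rules.items() if not check(record.get(field))]
```

A record is accepted when `validate_record` returns an empty list; otherwise the returned field names say why it was rejected.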
For a more detailed glossary, you can visit http://www.datagovernance.com/glossary_d.html. Most of the definitions above are from this glossary.
Thanks
Prashant Mendki
Twitter - @pmendki
Great post! I have been trying to work with some data management systems and data recovery programs with the company I just opened up in case something were to happen. Thanks for all the useful definitions, I'm sure this will help me with this process.