Management of data quality related problems: exploiting operational knowledge
Publicatie van Kenniscentrum Creating 010
Jan Dijk,van, M.S. Bargh, R. Choenni | Artikel | Publicatiedatum: 01 juli 2016
Dealing with data quality related problems is an important issue that all organizations face in realizing and sustaining data intensive advanced applications. Upon detecting these problems in datasets, data analysts often register them in issue tracking systems in order to address them later on categorically and collectively. As there is no standard format for registering these problems, data analysts often describe them in natural languages and subsequently rely on ad-hoc, non-systematic, and expensive solutions to categorize and resolve registered problems. In this contribution we present a formal description of an innovative data quality resolving architecture to semantically and dynamically map the descriptions of data quality related problems to data quality attributes. Through this mapping, we reduce complexity – as the dimensionality of data quality attributes is far smaller than that of the natural language space – and enable data analysts to directly use the methods and tools.