Data Quality in Cooperative Information Systems
Monica Scannapieco
Dipartimento di Informatica e Sistemistica
Università di Roma “La Sapienza”, Italy
Supervisors: Carlo Batini, Tiziana Catarci
Istituto di Analisi dei Sistemi e Informatica
Consiglio Nazionale delle Ricerche, Italy
Supervisor: Paola Bertolazzi
Data Quality: a Multidimensional Concept
Data Quality dimensions according to different proposals:
(Wang & Strong 1996) Believability, Accuracy, Objectivity, Reputation, Value-Added, Relevancy, Timeliness, Completeness, ...
(Naumann 2002) Accuracy, Completeness, Customer Support, Documentation, Availability, Latency, Price, Quality of Service, ...
(Jarke et al. 1999) Correctness, Completeness, Minimality, Traceability, Interpretability, Metadata Evolution, Functionality, ...
(Redman 1996) Content, Scope, Level of Detail, Composition, Accuracy, Completeness, Currency, Appropriateness, Interpretability, ...
...
Data Quality in Cooperative Information Systems
PITFALL: exchanges of low-quality data deteriorate the quality of each DB
OPPORTUNITY: the same data held by different organizations enables quality improvement through comparisons
[Figure: Org 1, Org 2 and Org 3, each with its own database (DB 1, DB 2, DB 3), exchange copies of the same data, DataX.]
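As a minimal sketch of the opportunity (not the method used in this work), comparing copies of the same record held by different organizations already exposes where at least one copy is wrong. The organizations' records, their fields, and the `find_conflicts` helper below are hypothetical.

```python
from collections import defaultdict

# Hypothetical copies of the same record ("DataX") held by three organizations.
copies = {
    "Org 1": {"id": "P1", "city": "ROMA", "zip": "00198"},
    "Org 2": {"id": "P1", "city": "ROMA", "zip": "00189"},
    "Org 3": {"id": "P1", "city": "RMOA", "zip": "00198"},
}


def find_conflicts(copies):
    """Return {field: {value: [orgs]}} for every field on which the copies disagree."""
    values = defaultdict(lambda: defaultdict(list))
    for org, record in copies.items():
        for field_name, value in record.items():
            values[field_name][value].append(org)
    # Keep only fields that have more than one distinct value across copies.
    return {f: dict(v) for f, v in values.items() if len(v) > 1}


for field_name, variants in find_conflicts(copies).items():
    print(field_name, variants)
```

A disagreement only signals that some copy is wrong; deciding which value to keep requires quality information attached to each copy.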
Data and Data Quality Model and Quality Query Language
! A model according to which cooperating organizations exchange data with associated quality values: Data and Data Quality Model (D2Q)
! A query language according to which cooperating organizations request data with associated quality values: Quality Query Language (QQL)
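[Figure: Org 1 (DB 1) provides DataX with its quality values QualityX, and Org 2 (DB 2) provides DataY with QualityY, both exchanged in D2Q; data and quality are requested with QQL.]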
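The sketch below illustrates the idea of exchanging data together with quality values and requesting only data that meets a quality threshold. It is hypothetical: the inline `accuracy` attribute, the `Citizen` element, and `select_with_quality` stand in for D2Q and QQL, whose actual schema and syntax are not reproduced here.

```python
import xml.etree.ElementTree as ET

# Hypothetical export from Org 1: each data value carries an "accuracy" attribute.
# This inline encoding is a simplification, not the D2Q schema.
EXPORT_ORG1 = """
<Citizen>
  <Name accuracy="0.9">ANNA</Name>
  <Surname accuracy="0.8">BIANCHI</Surname>
  <City accuracy="0.4">MILNAO</City>
</Citizen>
"""


def select_with_quality(xml_text, min_accuracy):
    """QQL-like request (hypothetical): return {tag: (value, accuracy)} with accuracy >= min_accuracy."""
    root = ET.fromstring(xml_text.strip())
    return {
        elem.tag: (elem.text, float(elem.get("accuracy", "0")))
        for elem in root
        if float(elem.get("accuracy", "0")) >= min_accuracy
    }


print(select_with_quality(EXPORT_ORG1, 0.7))
```

Because quality travels with the data, the requester can filter on it without a second round trip to the provider.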
Preliminary Results
! Model and query language are XML-based
! Coupling of data graphs and quality graphs (one quality graph for each dimension)
[Figure: a D2Q data graph coupled with its Accuracy D2Q quality graph. The data graph describes a person: Name MARIA, Surname ROSSI, two TelephoneNumber values (+390649918479, +393391234567), and a ResidenceAddress with Street VIA SALARIA 113, City ROMA, ZIPCode 00198, Country ITALY. The quality graph has one accuracy node for each data node (Accuracy_Name, Accuracy_Surname, Accuracy_TelephoneNumber, Accuracy_ResidenceAddress, Accuracy_Street, Accuracy_City, Accuracy_ZIPCode, Accuracy_Country), with values between 0.2 and 0.9; links go from each data object/value to its quality object/value.]
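A minimal Python sketch of the coupling, assuming a tree-shaped data graph: the quality graph for one dimension mirrors the data graph, and each data node keeps a link to its quality node, one link per dimension. The class names, the `Person` root label, and the accuracy scores are hypothetical and are not the values in the figure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class QualityNode:
    label: str                       # e.g. "Accuracy_City"
    score: Optional[float] = None    # quality value; None if unknown
    children: List["QualityNode"] = field(default_factory=list)


@dataclass
class DataNode:
    label: str                       # e.g. "City"
    value: Optional[str] = None      # None for non-leaf objects such as an address
    children: List["DataNode"] = field(default_factory=list)
    quality: Dict[str, QualityNode] = field(default_factory=dict)  # dimension -> linked quality node


def build_quality_graph(data: DataNode, dimension: str, scores: Dict[str, float]) -> QualityNode:
    """Mirror `data` into a quality graph for one dimension, linking each data node to its quality node."""
    node = QualityNode(f"{dimension}_{data.label}", scores.get(data.label))
    data.quality[dimension] = node
    node.children = [build_quality_graph(c, dimension, scores) for c in data.children]
    return node


# Data values as in the figure; the root label "Person" is hypothetical.
address = DataNode("ResidenceAddress", children=[
    DataNode("Street", "VIA SALARIA 113"), DataNode("City", "ROMA"),
    DataNode("ZIPCode", "00198"), DataNode("Country", "ITALY"),
])
person = DataNode("Person", children=[DataNode("Name", "MARIA"), DataNode("Surname", "ROSSI"), address])

# Hypothetical accuracy scores (not taken from the figure).
accuracy = build_quality_graph(person, "Accuracy", {"Name": 0.95, "City": 0.6, "ZIPCode": 0.85})

city = address.children[1]
print(city.value, city.quality["Accuracy"].label, city.quality["Accuracy"].score)
```

Calling `build_quality_graph` again with another dimension (for example "Completeness") adds a second quality graph and a second link on every data node, matching the "one quality graph for each dimension" design.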
An Architecture for Quality Improvement in Cooperative Information Systems
Organizations Org 1 ... Org n each expose their e-Services (e-Service1 ... e-Servicen) through a Gateway, and each has a Quality Factory that evaluates quality values. Shared components:
BROKER: query planning and on-line improvement
NOTIFICATION SERVICE (Quality P&S, Bulletin Board): enables quality diffusion through the publish & subscribe (P&S) paradigm
RATING SERVICE: evaluates source reliability
RECORD MATCHER: off-line improvement algorithms
REPOSITORY: interschema knowledge, service-supporting knowledge, historical quality knowledge
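The slide does not say how the Rating Service evaluates source reliability. As one hypothetical possibility, reliability could be estimated from historical quality knowledge as the share of a source's past values later confirmed by other sources, with Laplace smoothing so that estimates for sources with little history stay close to 0.5. The history format and the `reliability` function are assumptions.

```python
from collections import Counter


def reliability(history):
    """Estimate source reliability from (org, confirmed) outcomes.

    Hypothetical rule: Laplace-smoothed share of confirmed values,
    (confirmed + 1) / (total + 2), which pulls sparse histories toward 0.5.
    """
    confirmed, total = Counter(), Counter()
    for org, was_confirmed in history:
        total[org] += 1
        confirmed[org] += int(was_confirmed)
    return {org: (confirmed[org] + 1) / (total[org] + 2) for org in total}


# Hypothetical history: was each value later confirmed by other sources?
history = [("Org 1", True), ("Org 1", True), ("Org 1", False), ("Org n", False)]
print(reliability(history))
```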
Strategies for Quality Improvement
Data quality ranges from very bad data, through not very bad data, to good data.
Quality Maintenance (Notification Service): the Notification Service multicasts data quality changes.
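Quality maintenance relies on publish & subscribe. The following in-process sketch shows the pattern under simplifying assumptions: organizations subscribe to a data item by key, and a quality change published for that item is delivered to every subscriber. The class, its method names, and the key format are hypothetical; a real notification service would be distributed across organizations.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Callback = Callable[[str, str, float], None]  # (item_key, dimension, new_value)


class QualityNotificationService:
    """In-process publish & subscribe sketch for quality changes (hypothetical API)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, item_key: str, callback: Callback) -> None:
        self._subscribers[item_key].append(callback)

    def publish(self, item_key: str, dimension: str, new_value: float) -> None:
        # Multicast the quality change to every subscriber of this data item.
        for callback in self._subscribers.get(item_key, []):
            callback(item_key, dimension, new_value)


received = []
service = QualityNotificationService()
service.subscribe("Citizen/P1/ResidenceAddress", lambda k, d, v: received.append(("Org 2", k, d, v)))
service.subscribe("Citizen/P1/ResidenceAddress", lambda k, d, v: received.append(("Org 3", k, d, v)))
service.publish("Citizen/P1/ResidenceAddress", "Accuracy", 0.9)
service.publish("Citizen/P2/Name", "Accuracy", 0.4)  # no subscribers: nothing is delivered
print(received)
```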
On-line Improvement (Broker): the Broker selects the best-quality data answering a query and sends it to the requester (query planning based on data quality optimization) and to the other providers (on-line improvement).
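A simplified sketch of the Broker's behavior, assuming each provider's copy carries a single aggregate quality score (the actual query planning optimizes over data quality and is not reproduced here): pick the highest-quality copy for the requester, and list the providers whose copies differ so they can be sent the better value. `broker_answer` and the sample offers are hypothetical.

```python
from typing import Dict, List, Tuple

Offer = Tuple[str, float]  # (value, aggregate quality score)


def broker_answer(offers: Dict[str, Offer]) -> Tuple[str, str, List[str]]:
    """Return (best value, its provider, providers holding a different value)."""
    best_org = max(offers, key=lambda org: offers[org][1])
    best_value = offers[best_org][0]
    # On-line improvement: these providers can be sent the best value.
    to_update = [org for org, (value, _) in offers.items() if value != best_value]
    return best_value, best_org, to_update


offers = {
    "Org 1": ("VIA SALARIA 113", 0.9),
    "Org 2": ("VIA SALARIA 13", 0.5),
    "Org 3": ("VIA SALARIA 113", 0.7),
}
print(broker_answer(offers))
```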
Off-line Improvement (Record Matcher): the Record Matcher periodically compares exported data in order to improve its quality.
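The slide does not specify the Record Matcher's algorithm. As a stand-in, the sketch below compares exported records from two organizations with a plain string-similarity ratio from Python's `difflib` and reports likely duplicates above an arbitrary threshold; the record format, identifiers, and threshold are assumptions.

```python
from difflib import SequenceMatcher
from itertools import product


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two strings, ignoring case."""
    return SequenceMatcher(None, a.upper(), b.upper()).ratio()


def match_records(exports_a, exports_b, threshold=0.85):
    """Return (id_a, id_b, score) for record pairs that likely describe the same entity.

    Records are tuples of field values, compared as space-joined strings.
    """
    matches = []
    for (id_a, rec_a), (id_b, rec_b) in product(exports_a.items(), exports_b.items()):
        score = similarity(" ".join(rec_a), " ".join(rec_b))
        if score >= threshold:
            matches.append((id_a, id_b, round(score, 2)))
    return matches


# Hypothetical exported records of two organizations.
org1 = {"a1": ("MARIA", "ROSSI", "VIA SALARIA 113"), "a2": ("LUCA", "VERDI", "VIA PO 10")}
org2 = {"b1": ("Maria", "Rossi", "Via Salaria 13"), "b2": ("ANNA", "NERI", "VIA NAZIONALE 5")}
print(match_records(org1, org2))
```

Matched pairs whose fields still differ (here the street number) are the candidates for off-line correction.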