Data Quality in Cooperative Information Systems

Monica Scannapieco

Dipartimento di Informatica e Sistemistica, Università di Roma “La Sapienza”, Italy

Supervisors: Carlo Batini, Tiziana Catarci

Istituto di Analisi dei Sistemi e Informatica, Consiglio Nazionale delle Ricerche, Italy

Supervisor: Paola Bertolazzi

Data Quality: a Multidimensional Concept

Data Quality

(Wang & Strong 1996)

Believability, Accuracy, Objectivity, Reputation, Value-Added, Relevancy, Timeliness, Completeness, ....

(Naumann 2002)

Accuracy, Completeness, Customer Support, Documentation, Availability, Latency, Price, Quality of Service, ....

(Jarke et al. 1999)

Correctness, Completeness, Minimality, Traceability, Interpretability, Metadata Evolution, Functionality, ....

(Redman 1996)

Content, Scope, Level of Detail, Composition, Accuracy, Completeness, Currency, Appropriateness, Interpretability, ....

......

Data Quality in Cooperative Information Systems

PITFALL: Exchanges of low-quality data deteriorate the quality in each DB

OPPORTUNITY: The same data in different organizations enables quality improvement by comparison

[Figure: three organizations (Org 1, Org 2, Org 3) with their databases (DB 1, DB 2, DB 3), each holding a copy of the same data, DataX, which is exchanged among them.]

Data and Data Quality Model and Quality Query Language

- A model according to which cooperating organizations exchange data with associated quality values: Data and Data Quality Model (D2Q)

- A query language according to which cooperating organizations request data with associated quality values: Quality Query Language (QQL)

[Figure: Org 1 (DB 1) exports DataX together with QualityX, and Org 2 (DB 2) exports DataY together with QualityY, both according to D2Q; data are requested through QQL.]
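The idea of requesting data together with its quality values can be illustrated with a small Python sketch. It is not QQL syntax and not the D2Q schema, neither of which is shown on the slide: `QualifiedValue`, `query_with_quality`, and the accuracy scores are hypothetical names and illustrative numbers chosen for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class QualifiedValue:
    """A data value exported together with its quality values (assumed shape)."""
    value: str
    quality: Dict[str, float]  # dimension name -> score in [0, 1]


# One exported record maps field names to qualified values.
ExportedRecord = Dict[str, QualifiedValue]


def query_with_quality(records: List[ExportedRecord], field_name: str,
                       dimension: str, min_score: float) -> List[QualifiedValue]:
    """Return the values of field_name whose score on dimension reaches min_score.

    The quality values travel with the data, so the requester sees both.
    A missing dimension is treated as score 0.0 (a design choice of this sketch).
    """
    return [
        record[field_name]
        for record in records
        if field_name in record
        and record[field_name].quality.get(dimension, 0.0) >= min_score
    ]


# Illustrative data only: the scores below are made up for the example.
exported = [
    {"Name": QualifiedValue("MARIA", {"Accuracy": 0.9})},
    {"Name": QualifiedValue("MARIO", {"Accuracy": 0.3})},
]
result = query_with_quality(exported, "Name", "Accuracy", 0.5)
```

Here `result` contains only the qualified value for `MARIA`, still carrying its accuracy score, so the requesting organization can decide how far to trust it.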

Preliminary Results

- Model and Query Language are XML-based
- Coupling of Data Graphs and Quality Graphs (one for each dimension)

[Figure: a D2Q data graph coupled with the Accuracy D2Q quality graph. Each data object/value has a link to the corresponding quality objects/values.]

D2Q data graph: Name = MARIA; Surname = ROSSI; ResidenceAddress (Street = VIA SALARIA 113, City = ROMA, Country = ITALY, ZIPCode = 00198); TelephoneNumber = +390649918479; TelephoneNumber = +393391234567

Accuracy D2Q quality graph: Accuracy_Name, Accuracy_Surname, Accuracy_ResidenceAddress (Accuracy_Street, Accuracy_City, Accuracy_Country, Accuracy_ZIPCode), Accuracy_TelephoneNumber, Accuracy_TelephoneNumber

Accuracy values shown in the figure: 0.7, 0.9, 0.3, 0.9, 0.9, 0.7, 0.5, 0.2
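As a rough illustration of coupling a data graph with a per-dimension quality graph, the Python sketch below mirrors a data graph into an accuracy graph whose nodes link back to the data they describe. The class names, the root label `Person`, and the accuracy scores are assumptions made for the example; D2Q itself is XML-based, and its actual schema and the figure's score assignment are not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DataNode:
    """A node of the data graph: a labelled object, or a leaf holding a value."""
    label: str
    value: Optional[str] = None
    children: List["DataNode"] = field(default_factory=list)


@dataclass
class QualityNode:
    """A node of a quality graph, linked to the data node it describes."""
    label: str
    data: DataNode
    score: Optional[float] = None
    children: List["QualityNode"] = field(default_factory=list)


def build_quality_graph(node: DataNode, dimension: str,
                        leaf_scores: Dict[str, float]) -> QualityNode:
    """Mirror the data graph for one dimension; leaf scores are looked up by value."""
    score = leaf_scores.get(node.value) if node.value is not None else None
    return QualityNode(
        label=f"{dimension}_{node.label}",
        data=node,
        score=score,
        children=[build_quality_graph(c, dimension, leaf_scores) for c in node.children],
    )


# Data graph values taken from the slide; the root label "Person" is an assumption.
person = DataNode("Person", children=[
    DataNode("Name", "MARIA"),
    DataNode("Surname", "ROSSI"),
    DataNode("ResidenceAddress", children=[
        DataNode("Street", "VIA SALARIA 113"),
        DataNode("City", "ROMA"),
        DataNode("Country", "ITALY"),
        DataNode("ZIPCode", "00198"),
    ]),
    DataNode("TelephoneNumber", "+390649918479"),
    DataNode("TelephoneNumber", "+393391234567"),
])

# Placeholder accuracy scores: illustrative only, not the figure's assignment.
illustrative_scores = {"MARIA": 0.9, "ROSSI": 0.7, "ROMA": 0.9, "00198": 0.5}

accuracy_graph = build_quality_graph(person, "Accuracy", illustrative_scores)
```

Calling `build_quality_graph` once per dimension yields one quality graph per dimension over the same data graph, which is the coupling the slide describes.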

An Architecture for Quality Improvement in Cooperative Information Systems

[Figure: architecture components and their roles]

BROKER: query planning and on-line improvement

NOTIFICATION SERVICE (Quality P&S, Bulletin Board): enabling quality diffusion, i.e., through the P&S paradigm

RATING SERVICE: evaluating source reliability

RECORD MATCHER: off-line improvement algorithms

REPOSITORY: interschema knowledge, service supporting knowledge, historical quality knowledge

Org 1 ... Org n: each with a Quality Factory (evaluating quality values), a Gateway, and e-Service 1 ... e-Service n

Strategies for Quality Improvement

Data quality levels: very bad data, not very bad data, good data.

Strategies: Quality Maintenance for good data (Notification Service), On-line Improvement (Broker), Off-line Improvement (Record Matcher).

The Notification Service multicasts data quality changes.
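The multicast of quality changes can be pictured as a small publish/subscribe mechanism. The Python sketch below is only an in-memory stand-in: the class `NotificationService`, its methods, and the shape of a quality change are assumptions for the example, not the interface of the actual service.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List, Tuple

# A quality change: (data item, quality dimension, new score). Assumed shape.
QualityChange = Tuple[str, str, float]
Subscriber = Callable[[QualityChange], None]


class NotificationService:
    """In-memory publish/subscribe hub keyed by data item."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Subscriber]] = defaultdict(list)

    def subscribe(self, data_item: str, callback: Subscriber) -> None:
        """Register an organization's callback for changes on one data item."""
        self._subscribers[data_item].append(callback)

    def publish(self, change: QualityChange) -> int:
        """Multicast the change to every subscriber of its data item.

        Returns the number of subscribers that were notified.
        """
        callbacks = self._subscribers.get(change[0], [])
        for callback in callbacks:
            callback(change)
        return len(callbacks)


service = NotificationService()
received_by_org2: List[QualityChange] = []
received_by_org3: List[QualityChange] = []
service.subscribe("DataX", received_by_org2.append)
service.subscribe("DataX", received_by_org3.append)

# The score 0.4 is an illustrative value.
notified = service.publish(("DataX", "Accuracy", 0.4))
```

Both subscribing organizations receive the same change, while a change on a data item nobody subscribed to reaches no one.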

The Broker selects the best-quality data answering a query and sends it to the requester (query planning based on data quality optimization) and to the other providers (on-line improvement).
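The selection step can be sketched in Python as follows. This is a simplified reading of the slide, assuming each provider's answer already carries a single aggregated quality score; the names `Answer` and `broker_answer`, the aggregation into one score, and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Answer:
    """One provider's answer to a query, with an aggregated quality score."""
    provider: str
    value: str
    quality: float  # assumed to be a single score in [0, 1]


def broker_answer(answers: List[Answer]) -> Tuple[Answer, Dict[str, str]]:
    """Pick the best-quality answer and prepare feedback for the other providers.

    Returns the answer for the requester and a mapping from each other
    provider to the best value, which it can use to improve its own copy.
    Ties are resolved in favour of the first answer; an empty list raises ValueError.
    """
    best = max(answers, key=lambda a: a.quality)
    feedback = {a.provider: best.value for a in answers if a.provider != best.provider}
    return best, feedback


# Illustrative answers: providers, values, and scores are made up for the example.
answers = [
    Answer("Org 1", "VIA SALARIA 113", 0.9),
    Answer("Org 2", "VIA SALARIA 13", 0.4),
    Answer("Org 3", "VIA SALARIA 113", 0.7),
]
best, feedback = broker_answer(answers)
```

The requester gets `best`, and `feedback` tells Org 2 and Org 3 which value the Broker considered best, so a provider holding a lower-quality copy can correct it.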

Off-line improvement: the Record Matcher periodically compares exported data in order to improve their quality.
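The comparison step can be illustrated with a minimal record-matching sketch in Python. The slide does not say which matching algorithm the Record Matcher uses, so the string similarity from the standard library's `difflib.SequenceMatcher`, the threshold of 0.85, and the function names are stand-ins chosen for the example.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

Record = Dict[str, str]


def similarity(a: Record, b: Record, fields: List[str]) -> float:
    """Average case-insensitive string similarity over the given fields."""
    ratios = [
        SequenceMatcher(None, a[f].upper(), b[f].upper()).ratio()
        for f in fields
    ]
    return sum(ratios) / len(ratios)


def match_records(left: List[Record], right: List[Record],
                  fields: List[str], threshold: float = 0.85) -> List[Tuple[int, int]]:
    """Return index pairs (i, j) of records likely to describe the same entity."""
    return [
        (i, j)
        for i, a in enumerate(left)
        for j, b in enumerate(right)
        if similarity(a, b, fields) >= threshold
    ]


# Illustrative exports from two organizations; the second copy has a typo.
org1_export = [{"Name": "MARIA", "Surname": "ROSSI"}]
org2_export = [
    {"Name": "MARIA", "Surname": "ROSI"},
    {"Name": "PAOLO", "Surname": "BIANCHI"},
]
matches = match_records(org1_export, org2_export, ["Name", "Surname"])
```

Matched pairs are the candidates for improvement: once two records are judged to describe the same entity, their values and quality scores can be compared and the lower-quality copy corrected. The pairwise loop is quadratic, which is acceptable for a periodic off-line job on small exports but would need blocking or indexing at scale.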