D. Beneventano, S. Bergamaschi, F. Mandreoli Università degli Studi di Modena e Reggio Emilia

Post on 25-Feb-2016

40 views 1 download

description

MOMIS Query Manager Prototipo di un query manager per la gestione di query globali. D. Beneventano, S. Bergamaschi, F. Mandreoli Università degli Studi di Modena e Reggio Emilia. D2I Integrazione, Warehousing e Mining di sorgenti eterogenee - PowerPoint PPT Presentation

Transcript of D. Beneventano, S. Bergamaschi, F. Mandreoli Università degli Studi di Modena e Reggio Emilia

Univ

ersit

à de

gli S

tudi

di M

oden

a e

Regg

io E

milia

The MOMIS project - http://www.dbgroup.unimo.it/Momis

D. Beneventano, S. Bergamaschi, F. Mandreoli

Università degli Studi di Modena e Reggio Emilia

MOMIS Query Manager Prototipo di un query manager per la gestione di query globali

D2I Integrazione, Warehousing e Mining di sorgenti eterogenee

Tema 1: Integrazione di dati provenienti da sorgenti eterogenee

ROMA, 11 OTTOBRE 2002

Univ

ersit

à de

gli S

tudi

di M

oden

a e

Regg

io E

milia

The MOMIS project - http://www.dbgroup.unimo.it/Momis

Example

S(G) = (Name,E_mail,Year,Dept,Section)

Local classes (relational)

Name E_mail Section Year DeptL1 firstn and lastn e_mail null year null

L2 name e_mail s_code null dept_code

S(L1) = (Name,E_mail,Year) S(L2) = (Name,E_mail,Dept,Section) Local Class Schemata w.r.t. Global Class:

Global Class Schema: G

L1(firstn,lastn,year,e_mail) L2(name,e_mail,dept_code,s_code)

Global Class: G

INTEGRATION

Univ

ersit

à de

gli S

tudi

di M

oden

a e

Regg

io E

milia

The MOMIS project - http://www.dbgroup.unimo.it/Momis

Data cleaning and reconciliation

name e_mail dept_c S_code

Rossi_Ada RA@i.it Dept1 413245

Po_Ugo UP@i.it Dept1 2314

firstn lastn e_mail year

Rita Verde PV@i.it 2

Ada Rossi RA@i.it 1

Name E_mail Year

Rita Verde PV@i.it 2

Ada Rossi RA@i.it 1

Name E_mail Dept Section

Ada Rossi RA@i.it Dept1 413245

Ugo Po UP@i.it Dept1 2314

Schema Translation (example: firstn and lastn to Name)

Data conversion (example: ‘Rita’ + ‘Verde’ to ‘Rita Verde’)

L1

Integration at the extensional level• the data returned by various sources need to be converted/reconciled• interpretation and merging of the data provided by the sources

L2

Univ

ersit

à de

gli S

tudi

di M

oden

a e

Regg

io E

milia

The MOMIS project - http://www.dbgroup.unimo.it/Momis

L1 L2O1 O O2

Name E_mail Year

Rita Verde PV@i.it 2

Ada Rossi RA@i.it 1OO1

O2O

Name E_mail Dept Section

Ada Rossi RA@i.it Dept1 413245

Ugo Po UP@i.it Dept1 2314

Redundancy and Reconcilation

Instances of the same object in different local class must have the same value for a common attribute

Hypothesis

L2L1

Univ

ersit

à de

gli S

tudi

di M

oden

a e

Regg

io E

milia

The MOMIS project - http://www.dbgroup.unimo.it/Momis

Object fusion

L1 L2O1 O O2

Name E_mail Year

Rita Verde PV@i.it 2

Ada Rossi RA@i.it 1

Name Name

Ada Rossi Ada Rossi

JoinMap JM(L1,L2)L1.Name=L2.Name

Name E_mail Dept Section

Ada Rossi RA@i.it Dept1 413245

Ugo Po UP@i.it Dept1 2314OO1

O2O

To identify instances of the same object and fuse them:JoinMap - join criteria among classes

Univ

ersit

à de

gli S

tudi

di M

oden

a e

Regg

io E

milia

The MOMIS project - http://www.dbgroup.unimo.it/Momis

Object fusion : indirect mapL1 L2O1 O2 O3

Id Name E_mail Year

123 Rita Verde PV@i.it 2

243 Ada Rossi RA@i.it 1

Matr SN

243 XY413245

JoinMapJMCS.S,UNI.RS

E_mail Dept SN

RA@i.it Dept1 XY413245

UP@i.it Dept1 XZ2314O2O1

O3O2

Univ

ersit

à de

gli S

tudi

di M

oden

a e

Regg

io E

milia

The MOMIS project - http://www.dbgroup.unimo.it/Momis

Name E_mail Year

Rita Verde PV@i.it 2

Ada Rossi RA@i.it 1

Name E_mail Dept Section

Ada Rossi RA@i.it Dept1 413245

Ugo Po UP@i.it Dept1 2314

Global Class Instance

L2

Name E_mail Year Dept Section

Ada Rossi RA@i.it 1 Dept1 413245

Rita Verde PV@i.it 2

Ugo Po UP@i.it Dept1 2314

L1

G

• The computation is based on “FULL DISJUNCTION” (Rajarama, Ullman - Integrating Information by Outerjoins and Full Disjunctions. PODS 1996)

“Computing the natural outerjoin of many relations in a way that preserves all possible connections amon facts”

G: select S(G) from L1 outer join L2 on JM(L1,L2)

• GAV with “Single database property”(Lenzerini - Data Integration: A Theoretical Perspective, PODS

2002)

Univ

ersit

à de

gli S

tudi

di M

oden

a e

Regg

io E

milia

The MOMIS project - http://www.dbgroup.unimo.it/Momis

FULL DISJUNCTION COMPUTATION

• Question: when a full disjunction can be computed by some sequence of natural outerjoins

• Answer: there is a natural outerjoin sequence producing the full disjunction if and only if the set of relation schemes forms a connected, -acyclic hypergraph (Fagin - 1983)

A Global class with n local classes, n >2 :

-cyclic hypergraph

G: select S(G) from (L1 outer join L2 on JM(L1,L2))

outer join (L1 outer join L3 on JM(L1,L3))

on JM(L2,L3)

Example: n = 3 :

New Method

L1

L2 L3

JM(L1,L3)JM(L1,L2)

JM(L2,L3)

Univ

ersit

à de

gli S

tudi

di M

oden

a e

Regg

io E

milia

The MOMIS project - http://www.dbgroup.unimo.it/Momis

Query rewiting methodGlobal query (in DNF) : Q1Local query for the class L : Q1_L

where-condition of Q1_L :all factors of DNF which can be solved in L

residual factors of Q1 :factors not included in all local where-condition

select-list of Q1_L : attributes of the select-list of Q1 + residual factors

+JoinMap

Global query reformulation full disjunction based on the JoinMap + residual factors

Univ

ersit

à de

gli S

tudi

di M

oden

a e

Regg

io E

milia

The MOMIS project - http://www.dbgroup.unimo.it/Momis

Query rewiting example

Q1: select E_mailfrom Gwhere (E_mail like ’*.it' and Dept='Dept1') or

(E_mail like ’*.it' and Year=2)

Q1_L1: select Name, Year, E_mail from L1 where (E_mail like ’*.it' or Year=2)

Q1_L2: select Name, Dept, E_mail from L2 where (E_mail like ’*.it' or Dept='Dept1')

Q1: select E_mail from Q1_L1 outer join Q1_L2 on JM where (Dept='Dept1' or Year=2)

residual factor

Global query

Local queries

Global query reformulation: