D. Beneventano, S. Bergamaschi, F. Mandreoli Università degli Studi di Modena e Reggio Emilia

10
rsità degli Studi di Modena e Reggio Emilia The MOMIS project - http://www.dbgroup.unimo.it/Momis D. Beneventano, S. Bergamaschi, F. Mandreoli Università degli Studi di Modena e Reggio Emilia MOMIS Query Manager Prototipo di un query manager per la gestione di query glob D2I Integrazione, Warehousing e Mining di sorgenti eterogenee Tema 1: Integrazione di dati provenienti da sorgenti eterogenee ROMA, 11 OTTOBRE 2002

description

MOMIS Query Manager Prototipo di un query manager per la gestione di query globali. D. Beneventano, S. Bergamaschi, F. Mandreoli Università degli Studi di Modena e Reggio Emilia. D2I Integrazione, Warehousing e Mining di sorgenti eterogenee - PowerPoint PPT Presentation

Transcript of D. Beneventano, S. Bergamaschi, F. Mandreoli Università degli Studi di Modena e Reggio Emilia

Univ

ersit

à de

gli S

tudi

di M

oden

a e

Regg

io E

milia

The MOMIS project - http://www.dbgroup.unimo.it/Momis

D. Beneventano, S. Bergamaschi, F. Mandreoli

Università degli Studi di Modena e Reggio Emilia

MOMIS Query Manager Prototipo di un query manager per la gestione di query globali

D2I Integrazione, Warehousing e Mining di sorgenti eterogenee

Tema 1: Integrazione di dati provenienti da sorgenti eterogenee

ROMA, 11 OTTOBRE 2002

Univ

ersit

à de

gli S

tudi

di M

oden

a e

Regg

io E

milia

The MOMIS project - http://www.dbgroup.unimo.it/Momis

Example

S(G) = (Name,E_mail,Year,Dept,Section)

Local classes (relational)

Name E_mail Section Year DeptL1 firstn and lastn e_mail null year null

L2 name e_mail s_code null dept_code

S(L1) = (Name,E_mail,Year) S(L2) = (Name,E_mail,Dept,Section) Local Class Schemata w.r.t. Global Class:

Global Class Schema: G

L1(firstn,lastn,year,e_mail) L2(name,e_mail,dept_code,s_code)

Global Class: G

INTEGRATION

Univ

ersit

à de

gli S

tudi

di M

oden

a e

Regg

io E

milia

The MOMIS project - http://www.dbgroup.unimo.it/Momis

Data cleaning and reconciliation

name e_mail dept_c S_code

Rossi_Ada [email protected] Dept1 413245

Po_Ugo [email protected] Dept1 2314

firstn lastn e_mail year

Rita Verde [email protected] 2

Ada Rossi [email protected] 1

Name E_mail Year

Rita Verde [email protected] 2

Ada Rossi [email protected] 1

Name E_mail Dept Section

Ada Rossi [email protected] Dept1 413245

Ugo Po [email protected] Dept1 2314

Schema Translation (example: firstn and lastn to Name)

Data conversion (example: ‘Rita’ + ‘Verde’ to ‘Rita Verde’)

L1

Integration at the extensional level• the data returned by various sources need to be converted/reconciled• interpretation and merging of the data provided by the sources

L2

Univ

ersit

à de

gli S

tudi

di M

oden

a e

Regg

io E

milia

The MOMIS project - http://www.dbgroup.unimo.it/Momis

L1 L2O1 O O2

Name E_mail Year

Rita Verde [email protected] 2

Ada Rossi [email protected] 1OO1

O2O

Name E_mail Dept Section

Ada Rossi [email protected] Dept1 413245

Ugo Po [email protected] Dept1 2314

Redundancy and Reconcilation

Instances of the same object in different local class must have the same value for a common attribute

Hypothesis

L2L1

Univ

ersit

à de

gli S

tudi

di M

oden

a e

Regg

io E

milia

The MOMIS project - http://www.dbgroup.unimo.it/Momis

Object fusion

L1 L2O1 O O2

Name E_mail Year

Rita Verde [email protected] 2

Ada Rossi [email protected] 1

Name Name

Ada Rossi Ada Rossi

JoinMap JM(L1,L2)L1.Name=L2.Name

Name E_mail Dept Section

Ada Rossi [email protected] Dept1 413245

Ugo Po [email protected] Dept1 2314OO1

O2O

To identify instances of the same object and fuse them:JoinMap - join criteria among classes

Univ

ersit

à de

gli S

tudi

di M

oden

a e

Regg

io E

milia

The MOMIS project - http://www.dbgroup.unimo.it/Momis

Object fusion : indirect mapL1 L2O1 O2 O3

Id Name E_mail Year

123 Rita Verde [email protected] 2

243 Ada Rossi [email protected] 1

Matr SN

243 XY413245

JoinMapJMCS.S,UNI.RS

E_mail Dept SN

[email protected] Dept1 XY413245

[email protected] Dept1 XZ2314O2O1

O3O2

Univ

ersit

à de

gli S

tudi

di M

oden

a e

Regg

io E

milia

The MOMIS project - http://www.dbgroup.unimo.it/Momis

Name E_mail Year

Rita Verde [email protected] 2

Ada Rossi [email protected] 1

Name E_mail Dept Section

Ada Rossi [email protected] Dept1 413245

Ugo Po [email protected] Dept1 2314

Global Class Instance

L2

Name E_mail Year Dept Section

Ada Rossi [email protected] 1 Dept1 413245

Rita Verde [email protected] 2

Ugo Po [email protected] Dept1 2314

L1

G

• The computation is based on “FULL DISJUNCTION” (Rajarama, Ullman - Integrating Information by Outerjoins and Full Disjunctions. PODS 1996)

“Computing the natural outerjoin of many relations in a way that preserves all possible connections amon facts”

G: select S(G) from L1 outer join L2 on JM(L1,L2)

• GAV with “Single database property”(Lenzerini - Data Integration: A Theoretical Perspective, PODS

2002)

Univ

ersit

à de

gli S

tudi

di M

oden

a e

Regg

io E

milia

The MOMIS project - http://www.dbgroup.unimo.it/Momis

FULL DISJUNCTION COMPUTATION

• Question: when a full disjunction can be computed by some sequence of natural outerjoins

• Answer: there is a natural outerjoin sequence producing the full disjunction if and only if the set of relation schemes forms a connected, -acyclic hypergraph (Fagin - 1983)

A Global class with n local classes, n >2 :

-cyclic hypergraph

G: select S(G) from (L1 outer join L2 on JM(L1,L2))

outer join (L1 outer join L3 on JM(L1,L3))

on JM(L2,L3)

Example: n = 3 :

New Method

L1

L2 L3

JM(L1,L3)JM(L1,L2)

JM(L2,L3)

Univ

ersit

à de

gli S

tudi

di M

oden

a e

Regg

io E

milia

The MOMIS project - http://www.dbgroup.unimo.it/Momis

Query rewiting methodGlobal query (in DNF) : Q1Local query for the class L : Q1_L

where-condition of Q1_L :all factors of DNF which can be solved in L

residual factors of Q1 :factors not included in all local where-condition

select-list of Q1_L : attributes of the select-list of Q1 + residual factors

+JoinMap

Global query reformulation full disjunction based on the JoinMap + residual factors

Univ

ersit

à de

gli S

tudi

di M

oden

a e

Regg

io E

milia

The MOMIS project - http://www.dbgroup.unimo.it/Momis

Query rewiting example

Q1: select E_mailfrom Gwhere (E_mail like ’*.it' and Dept='Dept1') or

(E_mail like ’*.it' and Year=2)

Q1_L1: select Name, Year, E_mail from L1 where (E_mail like ’*.it' or Year=2)

Q1_L2: select Name, Dept, E_mail from L2 where (E_mail like ’*.it' or Dept='Dept1')

Q1: select E_mail from Q1_L1 outer join Q1_L2 on JM where (Dept='Dept1' or Year=2)

residual factor

Global query

Local queries

Global query reformulation: