D. Beneventano, S. Bergamaschi, F. Mandreoli Università degli Studi di Modena e Reggio Emilia
description
Transcript of D. Beneventano, S. Bergamaschi, F. Mandreoli Università degli Studi di Modena e Reggio Emilia
Univ
ersit
à de
gli S
tudi
di M
oden
a e
Regg
io E
milia
The MOMIS project - http://www.dbgroup.unimo.it/Momis
D. Beneventano, S. Bergamaschi, F. Mandreoli
Università degli Studi di Modena e Reggio Emilia
MOMIS Query Manager Prototipo di un query manager per la gestione di query globali
D2I Integrazione, Warehousing e Mining di sorgenti eterogenee
Tema 1: Integrazione di dati provenienti da sorgenti eterogenee
ROMA, 11 OTTOBRE 2002
Univ
ersit
à de
gli S
tudi
di M
oden
a e
Regg
io E
milia
The MOMIS project - http://www.dbgroup.unimo.it/Momis
Example
S(G) = (Name,E_mail,Year,Dept,Section)
Local classes (relational)
Name E_mail Section Year DeptL1 firstn and lastn e_mail null year null
L2 name e_mail s_code null dept_code
S(L1) = (Name,E_mail,Year) S(L2) = (Name,E_mail,Dept,Section) Local Class Schemata w.r.t. Global Class:
Global Class Schema: G
L1(firstn,lastn,year,e_mail) L2(name,e_mail,dept_code,s_code)
Global Class: G
INTEGRATION
Univ
ersit
à de
gli S
tudi
di M
oden
a e
Regg
io E
milia
The MOMIS project - http://www.dbgroup.unimo.it/Momis
Data cleaning and reconciliation
name e_mail dept_c S_code
Rossi_Ada [email protected] Dept1 413245
Po_Ugo [email protected] Dept1 2314
firstn lastn e_mail year
Rita Verde [email protected] 2
Ada Rossi [email protected] 1
Name E_mail Year
Rita Verde [email protected] 2
Ada Rossi [email protected] 1
Name E_mail Dept Section
Ada Rossi [email protected] Dept1 413245
Ugo Po [email protected] Dept1 2314
Schema Translation (example: firstn and lastn to Name)
Data conversion (example: ‘Rita’ + ‘Verde’ to ‘Rita Verde’)
L1
Integration at the extensional level• the data returned by various sources need to be converted/reconciled• interpretation and merging of the data provided by the sources
L2
Univ
ersit
à de
gli S
tudi
di M
oden
a e
Regg
io E
milia
The MOMIS project - http://www.dbgroup.unimo.it/Momis
L1 L2O1 O O2
Name E_mail Year
Rita Verde [email protected] 2
Ada Rossi [email protected] 1OO1
O2O
Name E_mail Dept Section
Ada Rossi [email protected] Dept1 413245
Ugo Po [email protected] Dept1 2314
Redundancy and Reconcilation
Instances of the same object in different local class must have the same value for a common attribute
Hypothesis
L2L1
Univ
ersit
à de
gli S
tudi
di M
oden
a e
Regg
io E
milia
The MOMIS project - http://www.dbgroup.unimo.it/Momis
Object fusion
L1 L2O1 O O2
Name E_mail Year
Rita Verde [email protected] 2
Ada Rossi [email protected] 1
Name Name
Ada Rossi Ada Rossi
JoinMap JM(L1,L2)L1.Name=L2.Name
Name E_mail Dept Section
Ada Rossi [email protected] Dept1 413245
Ugo Po [email protected] Dept1 2314OO1
O2O
To identify instances of the same object and fuse them:JoinMap - join criteria among classes
Univ
ersit
à de
gli S
tudi
di M
oden
a e
Regg
io E
milia
The MOMIS project - http://www.dbgroup.unimo.it/Momis
Object fusion : indirect mapL1 L2O1 O2 O3
Id Name E_mail Year
123 Rita Verde [email protected] 2
243 Ada Rossi [email protected] 1
Matr SN
243 XY413245
JoinMapJMCS.S,UNI.RS
E_mail Dept SN
[email protected] Dept1 XY413245
[email protected] Dept1 XZ2314O2O1
O3O2
Univ
ersit
à de
gli S
tudi
di M
oden
a e
Regg
io E
milia
The MOMIS project - http://www.dbgroup.unimo.it/Momis
Name E_mail Year
Rita Verde [email protected] 2
Ada Rossi [email protected] 1
Name E_mail Dept Section
Ada Rossi [email protected] Dept1 413245
Ugo Po [email protected] Dept1 2314
Global Class Instance
L2
Name E_mail Year Dept Section
Ada Rossi [email protected] 1 Dept1 413245
Rita Verde [email protected] 2
Ugo Po [email protected] Dept1 2314
L1
G
• The computation is based on “FULL DISJUNCTION” (Rajarama, Ullman - Integrating Information by Outerjoins and Full Disjunctions. PODS 1996)
“Computing the natural outerjoin of many relations in a way that preserves all possible connections amon facts”
G: select S(G) from L1 outer join L2 on JM(L1,L2)
• GAV with “Single database property”(Lenzerini - Data Integration: A Theoretical Perspective, PODS
2002)
Univ
ersit
à de
gli S
tudi
di M
oden
a e
Regg
io E
milia
The MOMIS project - http://www.dbgroup.unimo.it/Momis
FULL DISJUNCTION COMPUTATION
• Question: when a full disjunction can be computed by some sequence of natural outerjoins
• Answer: there is a natural outerjoin sequence producing the full disjunction if and only if the set of relation schemes forms a connected, -acyclic hypergraph (Fagin - 1983)
A Global class with n local classes, n >2 :
-cyclic hypergraph
G: select S(G) from (L1 outer join L2 on JM(L1,L2))
outer join (L1 outer join L3 on JM(L1,L3))
on JM(L2,L3)
Example: n = 3 :
New Method
L1
L2 L3
JM(L1,L3)JM(L1,L2)
JM(L2,L3)
Univ
ersit
à de
gli S
tudi
di M
oden
a e
Regg
io E
milia
The MOMIS project - http://www.dbgroup.unimo.it/Momis
Query rewiting methodGlobal query (in DNF) : Q1Local query for the class L : Q1_L
where-condition of Q1_L :all factors of DNF which can be solved in L
residual factors of Q1 :factors not included in all local where-condition
select-list of Q1_L : attributes of the select-list of Q1 + residual factors
+JoinMap
Global query reformulation full disjunction based on the JoinMap + residual factors
Univ
ersit
à de
gli S
tudi
di M
oden
a e
Regg
io E
milia
The MOMIS project - http://www.dbgroup.unimo.it/Momis
Query rewiting example
Q1: select E_mailfrom Gwhere (E_mail like ’*.it' and Dept='Dept1') or
(E_mail like ’*.it' and Year=2)
Q1_L1: select Name, Year, E_mail from L1 where (E_mail like ’*.it' or Year=2)
Q1_L2: select Name, Dept, E_mail from L2 where (E_mail like ’*.it' or Dept='Dept1')
Q1: select E_mail from Q1_L1 outer join Q1_L2 on JM where (Dept='Dept1' or Year=2)
residual factor
Global query
Local queries
Global query reformulation: