Transcript of "Data Warehousing con ORACLE" — materialized views / snapshots, star queries, partitioned tables, index-organized tables, bitmap indexes, clusters, function-based indexes, transportable tablespaces
- Slide 1
- Data Warehousing con ORACLE: Materialized views / Snapshots, Star queries, Partitioned tables, Index-Organized Tables, Bitmapped indexes, Clusters, Function-Based indexes, Transportable Tablespaces
- Slide 2
- Oracle Architectural Components (Logical): Database, Tablespace, DataFile, Oracle Block, O.S. Block, Owner/Schema, Segment, Extent, Table, Index, Cluster, Snapshot, Partition
- Slide 3
- Partitioning offers these advantages: Partitioning enables data management operations such as 1) data loads, 2) index creation and rebuilding, and 3) backup/recovery at the partition level rather than on the entire table, which results in significantly reduced times for these operations. Partitioning improves query performance: in many cases, the results of a query can be achieved by accessing a subset of partitions rather than the entire table, and for some queries this technique (called partition pruning) can provide order-of-magnitude gains in performance. Partitioning can significantly reduce the impact of scheduled downtime for maintenance operations: partition independence lets you perform concurrent maintenance operations on different partitions of the same table or index.
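As a sketch of partition pruning (assuming the sales_range table defined on a later slide), a query that filters on the partition key lets the optimizer touch only the matching partition:

```sql
-- Assuming sales_range is partitioned by RANGE (sales_date):
-- only the sales_feb2000 partition is scanned, not the whole table.
SELECT SUM(sales_amount)
FROM   sales_range
WHERE  sales_date >= TO_DATE('01/02/2000', 'DD/MM/YYYY')
AND    sales_date <  TO_DATE('01/03/2000', 'DD/MM/YYYY');
```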
- Slide 4
- Partitioning offers these advantages: Partitioning increases the availability of mission-critical databases: if critical tables and indexes are divided into partitions, maintenance windows, recovery times, and the impact of failures are all reduced. Partitioning can be implemented without requiring any modifications to your applications. For example, you could convert a nonpartitioned table to a partitioned table without needing to modify any of the SELECT or DML statements that access it. You do not need to rewrite your application code to take advantage of partitioning.
- Slide 5
- Slide 6
- Partitioning Methods Oracle provides the following partitioning
methods: 1. Range Partitioning 2. List Partitioning 3. Hash
Partitioning 4. Composite Partitioning Composite partitioning is a
combination of other partitioning methods.
- Slide 7
- Range Partitioning Range partitioning maps data to partitions based on ranges of partition key values that you establish for each partition. It is the most common type of partitioning and is often used with dates. When using range partitioning, there are a few rules to keep in mind: 1) Each partition has a VALUES LESS THAN clause, which specifies a non-inclusive upper bound for the partition. Any value of the partition key equal to or higher than this literal is added to the next higher partition. 2) All partitions, except the first, have an implicit lower bound specified by the VALUES LESS THAN clause of the previous partition. 3) A MAXVALUE literal can be defined for the highest partition. MAXVALUE represents a virtual infinite value that sorts higher than any other possible value for the partition key, including the null value.
- Slide 8
- CREATE TABLE sales_range (
    salesman_id   NUMBER(5),
    salesman_name VARCHAR2(30),
    sales_amount  NUMBER(10),
    sales_date    DATE)
  PARTITION BY RANGE (sales_date) (
    PARTITION sales_jan2000 VALUES LESS THAN (TO_DATE('01/02/2000','DD/MM/YYYY')),
    PARTITION sales_feb2000 VALUES LESS THAN (TO_DATE('01/03/2000','DD/MM/YYYY')),
    PARTITION sales_mar2000 VALUES LESS THAN (TO_DATE('01/04/2000','DD/MM/YYYY')),
    PARTITION sales_apr2000 VALUES LESS THAN (TO_DATE('01/05/2000','DD/MM/YYYY')));
- Slide 9
- List Partitioning List partitioning enables you to explicitly control how rows map to partitions. You do this by specifying a list of discrete values for the partitioning key in the description of each partition. This is different from range partitioning, where a range of values is associated with a partition, and from hash partitioning, where a hash function controls the row-to-partition mapping. The advantage of list partitioning is that you can group and organize unordered and unrelated sets of data in a natural way.
  CREATE TABLE sales_list (
    salesman_id   NUMBER(5),
    salesman_name VARCHAR2(30),
    sales_state   VARCHAR2(20),
    sales_amount  NUMBER(10),
    sales_date    DATE)
  PARTITION BY LIST (sales_state) (
    PARTITION sales_west VALUES IN ('California', 'Hawaii'),
    PARTITION sales_east VALUES IN ('New York', 'Virginia', 'Florida'),
    PARTITION sales_cent VALUES IN ('Texas', 'Illinois'));
- Slide 10
- Hash Partitioning Hash partitioning enables easy partitioning
of data that does not lend itself to range or list partitioning. It
does this with a simple syntax and is easy to implement. It is a
better choice than range partitioning when: 1) You do not know
beforehand how much data will map into a given range 2) The sizes
of range partitions would differ quite substantially or would be
difficult to balance manually 3) Range partitioning would cause the
data to be undesirably clustered 4) Performance features such as
parallel DML are important The concepts of splitting, dropping or
merging partitions do not apply to Hash partitions. Instead, hash
partitions can be added and coalesced. CREATE TABLE sales_hash
(salesman_id NUMBER(5), salesman_name VARCHAR2(30), sales_amount
NUMBER(10), week_no NUMBER(2)) PARTITION BY HASH(salesman_id)
PARTITIONS 4 STORE IN (tbs1, tbs2, tbs3, tbs4);
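The add/coalesce operations mentioned above can be sketched as follows, assuming the sales_hash table just created:

```sql
-- Add another hash partition; some rows are rehashed into it.
ALTER TABLE sales_hash ADD PARTITION;

-- Coalesce: remove one hash partition, redistributing its rows
-- among the remaining partitions.
ALTER TABLE sales_hash COALESCE PARTITION;
```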
- Slide 11
- Maintaining Partitions This section describes how to perform
the following specific partition maintenance operations: 1.Moving
Partitions 2.Adding Partitions 3.Dropping Partitions 4.Coalescing
Partitions 5.Modifying Partition Default Attributes 6.Truncating
Partitions 7.Splitting Partitions 8.Merging Partitions 9.Exchanging
Table Partitions
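A few of the operations listed above can be sketched against the sales_range table from the earlier slide (partition, tablespace, and exchange-table names are illustrative, not from the slides):

```sql
ALTER TABLE sales_range MOVE PARTITION sales_jan2000 TABLESPACE tbs_hist;   -- moving
ALTER TABLE sales_range ADD PARTITION sales_may2000
  VALUES LESS THAN (TO_DATE('01/06/2000','DD/MM/YYYY'));                    -- adding
ALTER TABLE sales_range DROP PARTITION sales_jan2000;                       -- dropping
ALTER TABLE sales_range TRUNCATE PARTITION sales_feb2000;                   -- truncating
ALTER TABLE sales_range SPLIT PARTITION sales_apr2000
  AT (TO_DATE('15/04/2000','DD/MM/YYYY'))
  INTO (PARTITION sales_apr2000_a, PARTITION sales_apr2000_b);              -- splitting
-- Exchanging assumes a nonpartitioned table with a matching structure:
ALTER TABLE sales_range EXCHANGE PARTITION sales_mar2000
  WITH TABLE sales_mar2000_std;                                             -- exchanging
```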
- Slide 12
- Coalescing Free Space Space for tablespace segments is managed
using extents, which are made up of a specific number of contiguous
data blocks. Thus, a larger free extent can be fragmented, or
smaller contiguous free extents can be coalesced into one larger
free extent. However, continuous allocation and deallocation of
free space fragments your tablespace and makes allocation of larger
extents more difficult. By default, SMON (system monitor) processes
incrementally coalesce the free extents of tablespaces in the
background.
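Free extents can also be coalesced manually; a minimal sketch (the tablespace name is assumed):

```sql
-- Merge adjacent free extents in the USERS tablespace into larger ones.
ALTER TABLESPACE users COALESCE;

-- Inspect fragmentation before and after.
SELECT tablespace_name, COUNT(*) AS free_extents, MAX(bytes) AS largest_bytes
FROM   dba_free_space
GROUP  BY tablespace_name;
```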
- Slide 13
- [Diagram] Regular table access (via ROWID) compared with Index-Organized Table (IOT) access; an IOT row consists of a row header, the key column, and the non-key columns.
- Slide 14
- Index-Organized Tables An index-organized table has a storage
organization that is a variant of a primary B-tree. Unlike an
ordinary (heap-organized) table whose data is stored as an
unordered collection (heap), data for an index-organized table is
stored in a B-tree index structure in a primary key sorted manner.
Besides storing the primary key column values of an index-organized
table row, each index entry in the B-tree stores the nonkey column
values as well.
- Slide 15
- Index-Organized Tables Compared with Regular Tables: Faster key-based access to table data; reduced storage requirements. Main restrictions: must have a primary key; cannot use unique constraints; cannot be clustered.
- Slide 16
- Secondary Indexes on Index-Organized Tables Secondary index support on index-organized tables provides efficient access to an index-organized table using columns that: 1) are not the primary key, or 2) are not a prefix of the primary key. Oracle constructs secondary indexes on index-organized tables using logical row identifiers (logical rowids) that are based on the table's primary key. A logical rowid can optionally include a physical guess, which identifies the block location of the row. Oracle can use these physical guesses to probe directly into the leaf block of the index-organized table, bypassing the primary key search. Because rows in index-organized tables do not have permanent physical addresses, the physical guesses can become stale when rows are moved to new blocks. A) Without physical guesses, access involves two index scans: a secondary index scan followed by a scan of the primary key index. B) With accurate physical guesses, access involves a secondary index scan and an additional I/O to fetch the data block containing the row. C) With inaccurate physical guesses, access involves a secondary index scan and an I/O to fetch the wrong data block (as indicated by the physical guess), followed by a scan of the primary key index.
- Slide 17
- Creating Index-Organized Tables
  SQL> CREATE TABLE sales
         (office_cd  NUMBER(3),
          qtr_end    DATE,
          revenue    NUMBER(10,2),
          CONSTRAINT sales_pk
            PRIMARY KEY (office_cd, qtr_end))
       ORGANIZATION INDEX
       TABLESPACE indx
       STORAGE ();
- Slide 18
- Bitmap Indexes Bitmap indexes are widely used in data warehousing environments. These environments typically have large amounts of data and ad hoc queries, but a low level of concurrent DML transactions and low-cardinality columns. For such applications, bitmap indexing provides: 1) reduced response time for large classes of ad hoc queries, 2) reduced storage requirements compared to other indexing techniques, 3) dramatic performance gains even on hardware with a relatively small number of CPUs or a small amount of memory, and 4) efficient maintenance during parallel DML and loads. Fully indexing a large table with a traditional B-tree index can be prohibitively expensive in terms of space, because the indexes can be several times larger than the data in the table.
- Slide 19
- The following shows a portion of a company's customers table.
  SELECT cust_id, cust_gender, cust_marital_status, cust_income_level
  FROM customers;

  CUST_ID C CUST_MARITAL_STATUS CUST_INCOME_LEVEL
  ------- - ------------------- ---------------------
       70 F                     D: 70,000 - 89,999
       80 F married             H: 150,000 - 169,999
       90 M single              H: 150,000 - 169,999
      100 F                     I: 170,000 - 189,999
      110 F married             C: 50,000 - 69,999
      120 M single              F: 110,000 - 129,999
      130 M                     J: 190,000 - 249,999
      140 M married             G: 130,000 - 149,999

  Because cust_gender, cust_marital_status, and cust_income_level are all low-cardinality columns (there are only three possible values for marital status, two possible values for gender, and 12 for income level), bitmap indexes are ideal for these columns. Do not create a bitmap index on cust_id, because this is a unique column. Instead, a unique B-tree index on this column provides the most efficient representation and retrieval.
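A minimal sketch of the indexes suggested above (index names are illustrative):

```sql
CREATE BITMAP INDEX customers_gender_bix  ON customers (cust_gender);
CREATE BITMAP INDEX customers_marital_bix ON customers (cust_marital_status);
CREATE BITMAP INDEX customers_income_bix  ON customers (cust_income_level);

-- cust_id is unique, so a B-tree index is the better fit:
CREATE UNIQUE INDEX customers_id_ix ON customers (cust_id);
```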
- Slide 20
- Cluster Clusters are an optional method of storing table data. A cluster is a group of tables that share the same data blocks because they share common columns and are often used together. Use clusters to store one or more tables that: 1) are primarily queried, 2) are not predominantly inserted into or updated, and 3) are often joined with data of the other tables in the cluster.
- Slide 21
- Because clusters store related rows of different tables together in the same data blocks, properly used clusters offer these benefits: 1) Disk I/O is reduced for joins of clustered tables. 2) Access time improves for joins of clustered tables. 3) In a cluster, a cluster key value is the value of the cluster key columns for a particular row. Each cluster key value is stored only once in the cluster and the cluster index, no matter how many rows of different tables contain the value; therefore, less storage is required to store related table and index data.
- Slide 22
- CREATE CLUSTER emp_dept (deptno NUMBER(3))
    PCTFREE 5 TABLESPACE user_data
    STORAGE (INITIAL n NEXT m MINEXTENTS 1 MAXEXTENTS 121 PCTINCREASE 0);
  CREATE TABLE dept (deptno NUMBER(3) PRIMARY KEY, ..)
    CLUSTER emp_dept (deptno);
  CREATE TABLE emp (empno NUMBER(5) PRIMARY KEY,
    ename VARCHAR2(15) NOT NULL, ...
    deptno NUMBER(3) REFERENCES dept)
    CLUSTER emp_dept (deptno);
- Slide 23
- A cluster index must be created before any rows can be inserted into any clustered table. The following statement creates a cluster index for the emp_dept cluster:
  CREATE INDEX emp_dept_index ON CLUSTER emp_dept
    TABLESPACE user_indx
    STORAGE (INITIAL n NEXT m MINEXTENTS 1 MAXEXTENTS 121 PCTINCREASE 0)
    PCTFREE 5;
- Slide 24
- Viewing Information About Clusters The following views display
information about clusters:
- Slide 25
- Function-Based Indexes Features of function-based indexes: they allow you to 1) index more powerful expressions, 2) precompute the value of a computationally intensive function and store it in the index, and 3) increase the number of situations where the optimizer can perform a range scan instead of a full table scan. You must have the following initialization parameters defined to create a function-based index: QUERY_REWRITE_ENABLED set to TRUE; COMPATIBLE set to 8.1.0.0.0 or a greater value.
- Slide 26
- Function-Based Indexes Example: Function-Based Index for Case-Insensitive Searches. The following statement creates the function-based index idx on table emp, based on an uppercase evaluation of the ename column:
  CREATE INDEX idx ON emp (UPPER(ename));
  Now a SELECT statement can use the function-based index on UPPER(ename) to retrieve all employees with names that start with JOH:
  SELECT * FROM emp WHERE UPPER(ename) LIKE 'JOH%';
  Example: Precomputing Arithmetic Expressions with a Function-Based Index. This statement creates a function-based index on an expression:
  CREATE INDEX idx ON t (a + b * (c - 1), a, b);
  SELECT statements can then use either an index range scan (in the following SELECT statement the expression is a prefix of the index) or an index full scan (preferable when the index specifies a high degree of parallelism):
  SELECT a FROM t WHERE a + b * (c - 1) < 100;
- Slide 27
- [Diagram] Data transportation with Transportable Tablespaces: OLTP systems, staging, data warehouse, data marts, information distribution.
- Slide 28
- Transporting Tablespaces 1. Make the tablespace read-only 2. Export metadata from the source 3. Copy data files to the target system 4. Transfer the export file 5. Import metadata into the target 6. If necessary, alter the tablespace to read-write
- Slide 29
- Exporting and Importing Metadata
  exp sys/ FILE=s980501.dmp TRANSPORT_TABLESPACE=y TABLESPACES=sales_ts TRIGGERS=N CONSTRAINTS=N
  imp sys/ FILE=s980501.dmp TRANSPORT_TABLESPACE=y DATAFILES=(/disk1/sales01.dbf, /disk2/sales02.dbf)
- Slide 30
- [Diagram] Transporting a Tablespace: on System 1, the read-only SALES_TS tablespace has data files /u/d1/sales1.dbf and /u/d2/sales2.dbf; the data files are copied to /disk1/sales01.dbf and /disk2/sales02.dbf on System 2, together with the metadata export file s980501.dmp.
- Slide 31
- Transportable Tablespaces: Uses Moves entire tablespace data. Supports media recovery. Source and target databases must: be on the same operating system; run Oracle8i, release 8.1, or above; have the same block size; use the same character set.
- Slide 32
- Checking the Transport Set
  DBMS_TTS.TRANSPORT_SET_CHECK(ts_list => 'SALES_TS', incl_constraints => TRUE);
  Given a list of tablespaces, this procedure checks referential integrity and records, in the data dictionary table TRANSPORT_SET_VIOLATIONS, any objects with references to objects outside the transport set.
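A hedged sketch of running the check and inspecting the result:

```sql
-- Run the self-containment check for SALES_TS, including constraints.
EXECUTE DBMS_TTS.TRANSPORT_SET_CHECK(ts_list => 'SALES_TS', incl_constraints => TRUE);

-- Any objects referencing objects outside the transport set appear here;
-- an empty result means the tablespace set is self-contained.
SELECT * FROM transport_set_violations;
```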
- Slide 33
- [Diagram] Discoverer Architecture: the Administration edition, User edition, and Viewer edition access the database (OLTP, Data Warehouse, Data Mart) through the End User Layer; business areas hide database complexity from users.
- Slide 34
- Tuning
- Slide 35
- For best results, it is advisable to start the tuning phase in parallel with the design activity, without waiting for the development and production phases of the software. The chart below highlights the relationship between benefits (Y) and tuning start time (X). The opposite trend could be expected if, instead of benefits, the Y axis plotted the costs of the activity. Design / Development / Production
- Slide 36
- Here are the steps to follow, explicitly in this order, to obtain a sound tuning policy:
- Slide 37
- Slide 38
- Slide 39
- Slide 40
- Slide 41
- Slide 42
- Slide 43
- Slide 44
- Slide 45
- Slide 46
- [Diagram] SQL statement processing: Parse (syntax validation, terminology validation, grant/user validation, execution plan) → Bind → Describe → Execute (OK/KO) → Fetch (OK/KO), using an implicit cursor.
- Slide 47
- [Diagram] SQL command → Optimizer (internal rule manual / additional statistical information) → Execution plan
- Slide 48
- ANALYZE TABLE | INDEX | CLUSTER name
    COMPUTE | ESTIMATE [SAMPLE integer ROWS | PERCENT] | DELETE
    STATISTICS;

  DBMS_UTILITY.ANALYZE_SCHEMA(
    schema           VARCHAR2,
    method           VARCHAR2,
    estimate_rows    NUMBER   DEFAULT NULL,
    estimate_percent NUMBER   DEFAULT NULL,
    method_opt       VARCHAR2 DEFAULT NULL);
- Slide 49
- This methodology can be enabled at the database level, at the session level, or for a single SQL statement. Initialization parameter OPTIMIZER_MODE: RULE (rule-based), CHOOSE (cost-based), FIRST_ROWS (cost-based, minimizes response time), ALL_ROWS (cost-based, minimizes total execution time). Command: ALTER SESSION SET OPTIMIZER_MODE. SQL statement: SELECT /*+ choose */ columns FROM tables WHERE condition;
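The three levels can be sketched as follows (the table in the statement-level example is the sales_range table from the partitioning slides):

```sql
-- Instance level (initSID.ora):  OPTIMIZER_MODE = CHOOSE
-- Session level:
ALTER SESSION SET OPTIMIZER_MODE = FIRST_ROWS;
-- Statement level, via a hint:
SELECT /*+ ALL_ROWS */ salesman_id, sales_amount FROM sales_range;
```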
- Slide 50
- [Diagram] Optimizer: SQL command → Execution plan → PLAN_TABLE
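A hedged sketch of producing an execution plan and reading it back from PLAN_TABLE (statement id and query are illustrative):

```sql
EXPLAIN PLAN SET STATEMENT_ID = 'q1' FOR
  SELECT * FROM emp WHERE UPPER(ename) LIKE 'JOH%';

-- Walk the plan tree stored in PLAN_TABLE, indenting child steps.
SELECT LPAD(' ', 2 * level) || operation || ' ' || options
       || ' ' || object_name AS plan_step
FROM   plan_table
START  WITH id = 0 AND statement_id = 'q1'
CONNECT BY PRIOR id = parent_id AND statement_id = 'q1';
```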
- Slide 51
- Slide 52
- Slide 53
- Slide 54
- Automated Performance Tuning System The methodology underlying the strategy is built on four distinct steps: 1 - Identify the aspects of the database to be analyzed, define the rules, and establish the threshold values. 2 - Collecting data: acquire information about the database instance, schema, environment, and workload. 3 - Viewing the collected data: present the previously collected data through simple, immediately understandable reports. 4 - Analyzing data / generating recommendations: whenever the threshold values are not met, the system intervenes with a series of useful suggestions, or even with a set of automatic resolutions. These checks fall under the name of "Routine Tuning" and should be considered "prevention", since they help anticipate what could become real problems before they appear in all their complexity.
- Slide 55
- Check 1 - % Library Cache Misses < 1%
  select round(sum(reloads)/sum(pins)*100,2) Col1 from v$librarycache;
  If exceeded, increase shared_pool_size.
  Check 2 - % Data Dictionary Cache Misses < 10%
  select round(sum(getmisses)/sum(gets)*100,2) Col1 from v$rowcache;
  If exceeded, increase shared_pool_size.
- Slide 56
- Check 3 - Logical/Physical Ratio, critical threshold > 80%
  select round(((1-(a.value/(b.value+c.value)))*100),2) Col1
  from v$sysstat a, v$sysstat b, v$sysstat c
  where a.name = 'physical reads'   -- physical accesses
  and   b.name = 'db block gets'    -- logical accesses
  and   c.name = 'consistent gets'; -- logical accesses
  If too low, increase db_block_buffers.
- Slide 57
- Check 4 - Sort Area Analysis Alongside the shared RAM area (SGA), there is a set of smaller, non-shared areas in a 1:1 relationship with the processes that support user connections (PGA), whose main purpose is to support sort and grouping operations. Every statement that invokes operations such as sorts and grouping uses the PGA to accomplish them. When the PGA is not large enough to accommodate the activity, the work migrates to temporary segments created on disk, with a significant degradation of execution times. There is no fixed limit to compare against: it is left to the DBA's judgment to decide whether to revisit the initSID.ora parameters: sort_area_retained_size, the size in bytes allocated in the PGA for potential sorts; sort_area_size, the size in bytes allocatable in the PGA for actual sorts. Two thresholds (min & max) are identified to try to resolve as much activity as possible entirely in RAM.
- Slide 58
- Check 4 - Sort Area Analysis
  select name, value from v$sysstat
  where name in ('sorts (memory)', 'sorts (disk)');
- Slide 59
- Check 5 - Analysis of Objects with More Than 25 Extents The check continues by focusing on the objects (segments) with a critical level of fragmentation (a high number of associated extents), which can potentially be blamed for a degradation of system performance.
  select owner, segment_name, segment_type, tablespace_name, extents
  from dba_segments
  where extents > 25 and owner not in ('SYS')
  order by owner, segment_type, extents desc;
- Slide 60
- SQL Trace Facility It can provide valuable information for every SQL statement involved, generating the following statistics: number of parses, executes, and fetches; CPU time and elapsed time; number of logical reads and physical reads; number of rows processed. It can be activated: 1) at the session level (alter session set sql_trace = TRUE), or 2) at the instance level (initSID.ora configuration parameter sql_trace = true). The statistics are stored, in an internal format, in a file called the trace file.
- Slide 61
- To display the information recorded in the trace file described above, the file must be fed to a formatter called TKPROF, whose invocation syntax is:
  TKPROF file_trace_input file_output
    SORT=(option1,option2,..)   e.g. EXECPU, EXEELA, EXEROW
    PRINT=integer
    INSERT=file_scripts_sql_output
    SYS=boolean
    TABLE=schema.table
    EXPLAIN=user/password
    RECORD=file_record_output
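A hedged example invocation (trace and output file names, and the scott/tiger credentials, are illustrative):

```sql
-- Invoked from the operating-system shell, not from SQL*Plus:
-- format a raw trace file, sorting statements by CPU time spent
-- executing, and let TKPROF generate execution plans as user scott.
--
--   tkprof ora_12345.trc report.prf SORT=(EXECPU) EXPLAIN=scott/tiger SYS=NO
```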
- Slide 62
- Let us now look in detail at the techniques used to interpret the TKPROF output, an example of which follows:

  call     count    cpu  elapsed  disk  query  current  rows
  -------  -----  -----  -------  ----  -----  -------  ----
  Parse       11   0.08     0.18     0      0        0     0
  Execute     11   0.23     0.66     0      3        6     2
  Fetch       35   6.70     6.83   100  12326        2   824
  -------  -----  -----  -------  ----  -----  -------  ----
  Total       57   7.01     7.67   100  12329        8   826