Data Warehousing con ORACLE Viste materializzate / SnapshotsViste materializzate / Snapshots Star...

62
Data Warehousing con Data Warehousing con ORACLE ORACLE Viste materializzate / Snapshots Viste materializzate / Snapshots Star query Star query Tabelle partizionate Tabelle partizionate Tabelle Organizzate ad Indice Tabelle Organizzate ad Indice Indici BitMapped Indici BitMapped Cluster Cluster Indici Function Based Indici Function Based Transportable Tablespaces Transportable Tablespaces

Transcript of Data Warehousing con ORACLE Viste materializzate / SnapshotsViste materializzate / Snapshots Star...

  • Slide 1
  • Data Warehousing con ORACLE Viste materializzate / SnapshotsViste materializzate / Snapshots Star queryStar query Tabelle partizionateTabelle partizionate Tabelle Organizzate ad IndiceTabelle Organizzate ad Indice Indici BitMappedIndici BitMapped ClusterCluster Indici Function BasedIndici Function Based Transportable TablespacesTransportable Tablespaces
  • Slide 2
  • DataBase TablespaceDataFile Oracle Block O.S. Block Owner Schema SegmentExtent Table Index Cluster Snapshot Oracle Architectural Components (Logical) Partition
  • Slide 3
  • Partitioning offers these advantages: Partitioning enables data management operations such 1.data loads, 2.Index creation and rebuilding 3.backup/recovery at the partition level, rather than on the entire table. This results in significantly reduced times for these operations. Partitioning improves query performance. In many cases, the results of a query can be achieved by accessing a subset of partitions, rather than the entire table. For some queries, this technique (called partition pruning) can provide order-of-magnitude gains in performance. Partitioning can significantly reduce the impact of scheduled downtime for maintenance operations. Partition independence for partition maintenance operations lets you perform concurrent maintenance operations on different partitions of the same table or index.
  • Slide 4
  • Partitioning offers these advantages: Partitioning increases the availability of mission-critical databases if critical tables and indexes are divided into partitions to reduce the maintenance windows, recovery times, and impact of failures. Partitioning can be implemented without requiring any modifications to your applications. For example, you could convert a nonpartitioned table to a partitioned table without needing to modify any of the SELECT statements or DML statements which access that table. You do not need to rewrite your application code to take advantage of partitioning.
  • Slide 5
  • Slide 6
  • Partitioning Methods Oracle provides the following partitioning methods: 1. Range Partitioning 2. List Partitioning 3. Hash Partitioning 4. Composite Partitioning Composite partitioning is a combination of other partitioning methods.
  • Slide 7
  • Range Partitioning Range partitioning maps data to partitions based on ranges of partition Key Values that you establish for each partition. It is the most common type of partitioning and is often used with dates. When using range partitioning, there are a few rules to keep in mind: 1) Each partition has a VALUES LESS THAN clause, which specifies a non inclusive upper bound for the partitions. Any binary values of the partition key equal to or higher than this literal are added to the next higher partition. 2) All partitions, except the first, have an implicit lower bound specified by the VALUES LESS THAN clause on the previous partition. 3) A MAXVALUE literal can be defined for the highest partition. MAXVALUE represents a virtual infinite value that sorts higher than any other possible value for the partition key, including the null value.
  • Slide 8
  • CREATE TABLE sales_range ( salesman_id NUMBER(5), salesman_name VARCHAR2(30), sales_amount NUMBER(10), sales_date DATE) PARTITION BY RANGE(sales_date) ( PARTITION sales_jan2000 VALUES LESS THAN(TO_DATE(01/02/2000,DD/MM/YYYY)), PARTITION sales_feb2000 VALUES LESS THAN(TO_DATE(01/03/2000,DD/MM/YYYY)), PARTITION sales_mar2000 VALUES LESS THAN(TO_DATE(01/04/2000,DD/MM/YYYY)), PARTITION sales_apr2000 VALUES LESS THAN(TO_DATE(01/05/2000,DD/MM/YYYY)), )
  • Slide 9
  • List Partitioning List partitioning enables you to explicitly control how rows map to partitions. You do this by specifying a list of discrete values for the partitioning key in the description for each partition. This is different from range partitioning, where a range of values is associated with a partition and from hash partitioning, where a hash function controls the row-to-partition mapping. The advantage of list partitioning is that you can group and organize unordered and unrelated sets of data in a natural way. CREATE TABLE sales_list (salesman_id NUMBER(5), salesman_name VARCHAR2(30), sales_state VARCHAR2(20), sales_amount NUMBER(10), sales_date DATE) PARTITION BY LIST(sales_state) ( PARTITION sales_west VALUES IN (California, Hawaii), PARTITION sales_east VALUES IN (New York, Virginia, Florida), PARTITION sales_cent VALUES IN (Texas, Illinois) )
  • Slide 10
  • Hash Partitioning Hash partitioning enables easy partitioning of data that does not lend itself to range or list partitioning. It does this with a simple syntax and is easy to implement. It is a better choice than range partitioning when: 1) You do not know beforehand how much data will map into a given range 2) The sizes of range partitions would differ quite substantially or would be difficult to balance manually 3) Range partitioning would cause the data to be undesirably clustered 4) Performance features such as parallel DML are important The concepts of splitting, dropping or merging partitions do not apply to Hash partitions. Instead, hash partitions can be added and coalesced. CREATE TABLE sales_hash (salesman_id NUMBER(5), salesman_name VARCHAR2(30), sales_amount NUMBER(10), week_no NUMBER(2)) PARTITION BY HASH(salesman_id) PARTITIONS 4 STORE IN (tbs1, tbs2, tbs3, tbs4);
  • Slide 11
  • Maintaining Partitions This section describes how to perform the following specific partition maintenance operations: 1.Moving Partitions 2.Adding Partitions 3.Dropping Partitions 4.Coalescing Partitions 5.Modifying Partition Default Attributes 6.Truncating Partitions 7.Splitting Partitions 8.Merging Partitions 9.Exchanging Table Partitions
  • Slide 12
  • Coalescing Free Space Space for tablespace segments is managed using extents, which are made up of a specific number of contiguous data blocks. Thus, a larger free extent can be fragmented, or smaller contiguous free extents can be coalesced into one larger free extent. However, continuous allocation and deallocation of free space fragments your tablespace and makes allocation of larger extents more difficult. By default, SMON (system monitor) processes incrementally coalesce the free extents of tablespaces in the background.
  • Slide 13
  • Regular table access ROWID Index-Organized Tables IOT access Non-key columns Key column Row header
  • Slide 14
  • Index-Organized Tables An index-organized table has a storage organization that is a variant of a primary B-tree. Unlike an ordinary (heap-organized) table whose data is stored as an unordered collection (heap), data for an index-organized table is stored in a B-tree index structure in a primary key sorted manner. Besides storing the primary key column values of an index-organized table row, each index entry in the B-tree stores the nonkey column values as well.
  • Slide 15
  • Index-Organized Tables Compared with Regular Tables Faster key-based access to table data Faster key-based access to table data Reduced storage requirements Reduced storage requirements Main restrictions: Main restrictions: Must have a primary key Cannot use unique constraints Cannot be clustered Faster key-based access to table data Faster key-based access to table data Reduced storage requirements Reduced storage requirements Main restrictions: Main restrictions: Must have a primary key Cannot use unique constraints Cannot be clustered
  • Slide 16
  • Secondary Indexes on Index-Organized Tables Secondary index support on index-organized tables provides efficient access to index-organized table using columns that: 1) are not the primary key 2) are not prefix of the primary key. Oracle constructs secondary indexes on index-organized tables using logical row identifiers (logical rowids) that are based on the tables primary key. A logical rowid optionally can includes a physical guess, which identifies the block location of the row. Oracle can use these physical guesses to probe directly into the leaf block of the index-organized table, bypassing the primary key search. Because rows in index-organized tables do not have permanent physical addresses, the physical guesses can become stale when rows are moved to new blocks. A) Without physical guesses, access involves two index scans: a secondary index scan followed by a scan of the primary key index. B) With accurate physical guesses, access involves a secondary index scan and an additional I/O to fetch the data block containing the row. C) With inaccurate physical guesses, access involves a secondary index scan and an I/O to fetch the wrong data block (as indicated by the physical guess), followed by a scan of the primary key index.
  • Slide 17
  • Creating Index-Organized Tables SQL> create table sales 2 (office_cd number(3) 3,qtr_enddate 4,revenuenumber(10,2) 5,constraint sales_pk 6 PRIMARY KEY (office_cd,qtr_end) 7 ) 8 ORGANIZATION INDEX 9 tablespace indx 10 storage (); SQL> create table sales 2 (office_cd number(3) 3,qtr_enddate 4,revenuenumber(10,2) 5,constraint sales_pk 6 PRIMARY KEY (office_cd,qtr_end) 7 ) 8 ORGANIZATION INDEX 9 tablespace indx 10 storage ();
  • Slide 18
  • Bitmap Indexes Bitmap indexes are widely used in data warehousing environments. The environments typically have: large amounts of data and ad hoc queries, but a low level of concurrent DML transactions and a low level of cardinality. For such applications, bitmap indexing provides: 1) Reduced response time for large classes of ad hoc queries 2) Reduced storage requirements compared to other indexing techniques 3) Dramatic performance gains even on hardware with a relatively small number of CPUs or a small amount of memory 4) Efficient maintenance during parallel DML and loads Fully indexing a large table with a traditional B-tree index can be prohibitively expensive in terms of space because the indexes can be several times larger than the data in the table.
  • Slide 19
  • The following shows a portion of a company's customers table. SELECT cust_id, cust_gender, cust_marital_status, cust_income_level FROM customers; CUST_ID C CUST_MARITAL_STATUS CUST_INCOME_LEVEL ---------- - -------------------- --------------------- 70 F D: 70,000 - 89,999 80 F married H: 150,000 - 169,999 90 M single H: 150,000 - 169,999 100 F I: 170,000 - 189,999 110 F married C: 50,000 - 69,999 120 M single F: 110,000 - 129,999 130 M J: 190,000 - 249,999 140 M married G: 130,000 - 149,999 Because cust_gender,cust_marital_status, and cust_income_level are all low-cardinality columns (there are only three possible values for marital status and region, two possible values for gender, and 12 for income level), bitmap indexes are ideal for these columns. Do not create a bitmap index on cust_id because this is a unique column. Instead, a unique B-tree index on this column provides the most efficient representation and retrieval.
  • Slide 20
  • ClusterCluster Use clusters to store one or more tables that: 1) Are primarily queried 2) Not predominantly inserted into or updated 3) Which the queries often join data of multiple tables in the cluster. Are an optional method of storing table data. A cluster is a group of tables that share the same data blocks because they share common columns and are often used together.
  • Slide 21
  • Because clusters store related rows of different tables together in the same data blocks, properly used clusters offers these benefits: 1)Disk I/O is reduced for joins of clustered tables. 2) Access time improves for joins of clustered tables. 3) In a cluster, a cluster key value is the value of the cluster key columns for a particular row. Each cluster key value is stored only once each in the cluster and the cluster index, no matter how many rows of different tables contain the value. Therefore, less storage is required to store related table and index data. ClusterCluster
  • Slide 22
  • CREATE CLUSTER emp_dept (deptno NUMBER(3)) PCTFREE 5 TABLESPACE user_data STORAGE (INITIAL n NEXT m MINEXTENTS 1 MAXEXTENTS 121 PCTINCREASE 0 ); CREATE TABLE dept (deptno NUMBER(3) PRIMARY KEY,.. ) CLUSTER emp_dept (deptno); CREATE TABLE emp (empno NUMBER(5) PRIMARY KEY, ename VARCHAR2(15) NOT NULL,... deptno NUMBER(3) REFERENCES dept) CLUSTER emp_dept (deptno);
  • Slide 23
  • CREATE INDEX emp_dept_index ON CLUSTER emp_dept TABLESPACE user_indx STORAGE (INITIAL n NEXT m MINEXTENTS 1 MAXEXTENTS 121 PCTINCREASE 0) PCTFREE 5; A cluster index must be created before any rows can be inserted into any clustered table. The following statement creates a cluster index for the emp_dept cluster:
  • Slide 24
  • Viewing Information About Clusters The following views display information about clusters:
  • Slide 25
  • Index Function Based Features of Function-Based Indexes Function-based indexes allow you to: 1) More powerful 2) Precompute the value of a computationally intensive function and store it in the index 3) Increase the number of situations where the optimizer can perform a range scan instead of a full table scan You must have the following initialization parameters defined to create a function-based index: *) QUERY_REWRITE_ENABLED set to TRUE *) COMPATIBLE set to 8.1.0.0.0 or a greater value
  • Slide 26
  • Index Function Based Example: Function-Based Index for Case Insensitive Searches The following statement creates function-based index idx on table emp based on an uppercase evaluation of the ename column: CREATE INDEX idx ON emp (UPPER(ename)); Now the SELECT statement uses the function-based index on UPPER(ename) To retrieve all employees with names that start with JOH: SELECT * FROM emp WHERE UPPER(ename) LIKE 'JOH%'; Example: Precomputing Arithmetic Expressions with a Function-Based Index This statement creates a function-based index on an expression: CREATE INDEX idx ON t (a + b * (c - 1), a, b); SELECT statements can use either an index range scan (in the following SELECT statement the expression is a prefix of the index) or index full scan (preferable when the index specifies a high degree of parallelism). SELECT a FROM t WHERE a + b * (c - 1) < 100;
  • Slide 27
  • OLTP Data warehouse Data marts Information distribution Staging Data Transportation: Transportable Tablespaces
  • Slide 28
  • Transporting Tablespaces 1.Make tablespace read-only 2.Export metadata from source 3.Copy data files to target system 4.Transfer export file 5.Import metadata into target 6.If necessary, alter the tablespace to read-write 1.Make tablespace read-only 2.Export metadata from source 3.Copy data files to target system 4.Transfer export file 5.Import metadata into target 6.If necessary, alter the tablespace to read-write
  • Slide 29
  • Exporting and Importing Metadata exp sys/ FILE=s980501.dmp TRANSPORT_TABLESPACE=y TABLESPACES=sales_ts TRIGGERS=N CONSTRAINTS=N imp sys/ FILE=s980501.dmp TRANSPORT_TABLESPACE=y DATAFILES=(/disk1/sales01.dbf, /disk2/sales02.dbf) imp sys/ FILE=s980501.dmp TRANSPORT_TABLESPACE=y DATAFILES=(/disk1/sales01.dbf, /disk2/sales02.dbf)
  • Slide 30
  • Transporting a Tablespace /u/d1/sales1.dbf /u/d2/sales2.dbf SALES _ TS /u/d1 /u/d2 /disk1 /disk2 /disk1/sales01.dbf /disk2/sales02.dbf SALES_TS s980501.dmp Read Only System 1 System 2
  • Slide 31
  • Transportable Tablespaces: Uses Moves entire tablespace data Supports media recovery Source and target databases must : Be on the same operating system Be on the same operating system Run Oracle8i, release 8.1, or above Run Oracle8i, release 8.1, or above Have the same block size Have the same block size Use the same character set Use the same character set Moves entire tablespace data Supports media recovery Source and target databases must : Be on the same operating system Be on the same operating system Run Oracle8i, release 8.1, or above Run Oracle8i, release 8.1, or above Have the same block size Have the same block size Use the same character set Use the same character set
  • Slide 32
  • Checking Transport Set DBMS_TTS.TRANSPORT_SET_CHECK( ts_list=> SALES_TS, incl_constraints=> TRUE); Data dictionary Objects with references to objects outside the transport set into table TRANSPORT_SET_VIOLATIONS List of tablespaces Check referential integrity
  • Slide 33
  • Discoverer Architecture User edition Administration edition Administration edition End User Layer Database (OLTP, Data Warehouse, Data Mart) Business Area Database Complexity is Hidden From Users Viewer edition Data Base DWH
  • Slide 34
  • TuningTuning
  • Slide 35
  • Per ottenere i migliori risultati consigliabile intraprendere la fase di tuning in parallelo con lattivit di disegno, senza attendere le fasi di sviluppo e messa in produzione del software. Il grafico sotto riportato pone in evidenza il rapporto benefici (Y) / tempo di avvio del tuning (X). Un prospetto con andamento opposto si potrebbe pensare se, in alternativa ai benefici, sulle ordinate (Y) si prendessero in considerazione i costi della attivit. Disegno Sviluppo Produzione
  • Slide 36
  • Ecco riportati gli step da seguire, esplicitamente nellordine, al fine di ottenere una ottima politica di tuning:
  • Slide 37
  • Slide 38
  • Slide 39
  • Slide 40
  • Slide 41
  • Slide 42
  • Slide 43
  • Slide 44
  • Slide 45
  • Slide 46
  • Statement SQL Parse Syntax Validation Terminology Validation Grant / User Validation Execution Plan Bind Describe OK Execute KO Fetch OK KO Implicit Cursor
  • Slide 47
  • Comando SQL Ottimizzatore Manuale Interno delle Regole Informazioni Statistiche Addizionali Piano di Esecuzione
  • Slide 48
  • Analyze Table Index Cluster Name Compute Estimate Delete Statistics; Sample Integer Rows Percent DBMS_UTILITY.ANALYZE_SCHEMA( schema VARCHAR2, method VARCHAR2, estimate_rowsNUMBERDEFAULT NULL, estimate_percent NUMBER DEFAULT NULL, method_opt VARCHAR2 DEFAULT NULL);
  • Slide 49
  • Per abilitare tale metodologia si pu intervenire a livello di D.B di sessione di singolo statement SQL. Parametro di inizializzazione OPTIMIZER_MODE RULE(rule-based) CHOOSE(cost-based) FIRST_ROWS(cost-based - minimizza il response time) ALL_ROWS(cost-based - minimizza il total execution time) Comando ALTER SESSION SET OPTIMIZER_MODE Sql Statement Select /*+ choose */ campi from tavole where condizione;
  • Slide 50
  • Ottimizzatore Comando SQL Piano di Esecuzione Plan_Table
  • Slide 51
  • Slide 52
  • Slide 53
  • Slide 54
  • Automated Performance Tuning System La metodica che si pone alla base della strategia architettata su quattro step distinti: 1 - Individuazione degli aspetti del Data Base che si desidera porre sotto analisi, definizione delle regole e rintraccio dei valori limite. 2 - Collecting data. Acquisizione delle informazioni legate a: database instance schema environment workload (carico di lavoro). 3 - Viewing dei collected data. Prospetto, tramite reportistiche semplici e di immediata comprensione, dei dati precedentemente raccolti. 4 - Analyzing data / generate recommendations. Nel caso in cui i valori limite non siano soddisfatti ecco il sistema intervenire con una serie di utili consigli se non addirittura con un insieme di risoluzioni automatiche poste in essere. Sono controlli che rientrano sotto il nome di "Routine Tuning" da considerarsi come "prevenzione" essendo gli stessi un help per anticipare quelli che potrebbero divenire reali problemi prima chegli stessi si presentino con la propria complessita' risolutiva.
  • Slide 55
  • Controllo 1-%Library Cache Misses < 1% select round(sum(reloads)/sum(pins)*100,2) Col1 from v$librarycache; > shared_pool_size Controllo 2 -%Data Dictionary Cache Misses < 10% select round(sum(getmisses)/sum(gets)*100,2) Col1 from v$rowcache; > shared_pool_size
  • Slide 56
  • Controllo 3 -Ratio Logico/Fisico Critico > 80% select round(((1-(a.value/(b.value+c.value)))*100),2) Col1 from v$sysstat a, v$sysstat b, v$sysstat c where a.name = 'physical reads' -- accessi fisici and b.name = 'db block gets' -- accessi logici and c.name = 'consistent gets'; -- accessi logici > db_block_buffer
  • Slide 57
  • Controllo 4 -Analisi Aree di Sort Parallelamente all'area condivisa di ram (SGA), esistono un insieme di aree dimensionalmente ridotte, non condivise ed in relazione 1:1 con i processi dedicati al supporto delle connessioni utente (PGA) la cui funzionalita' principale e' rintracciabile nel supporto alle politiche di sort e di grouping. Ogni statement che richiama operazioni quali sort e grouping, sfrutta per il raggiungimento dell'obiettivo l'area PGA. Nel caso in cui la stessa risulti non sufficiente per accomodare l'attivit, quest'ultima migra sui segmenti temporanei opportunamente creati su disco, con un degrado significativo dei tempi di esecuzione. Non esiste un limite da utilizzarsi come confronto. Viene demandata alla sensibilit del DBA, la decisione su di una rianalisi dei parametri di initSID.ora: sort_area_retained_size dimensione in bytes allocata nella PGA per potenziali SORT sort_area_size dimensione in bytes allocabile nella PGA per sicuri SORT. Sono individuate due soglie (min & max) per cercare di risolvere il maggior numero di attivita' integralmente in ram.
  • Slide 58
  • select name, value from v$sysstat where name in ('sorts (memory)','sorts (disk)'); Controllo 4 -Analisi Aree di Sort
  • Slide 59
  • Controllo 5 -Analisi Oggetti con + 25 extents Il controllo prosegue concentrandosi sugli oggetti (segmenti) con un livello di criticit, un livello di frammentazione (numero di extents associati) elevato e potenzialmente colpevolizzabile di un peggioramento delle performances del sistema. select owner, segment_name, segment_type, tablespace_name, extents from dba_segments where extents > 25 and owner not in ('SYS') order by owner,segment_type, extents desc;
  • Slide 60
  • SQL Trace Facility Sono in grado di fornire informazioni preziose, per ogni statement SQL chiamato in causa, generando le seguenti statistiche attive: 1)a livello di sessione (alter session set sql_trace = TRUE) 2)a livello di intera istanza (parametro di configurazione del file initSID.ora sql_trace=true) Numero di parse, execute e fetch Tempo di CPU e tempo di elapsed (trascorso) Numero di letture logiche e letture fisiche Numero di record processati archiviate, in formato interno, in un file denominato trace file
  • Slide 61
  • Al fine di prospettare le informazione storicizzate nel file di trace sopra descritto, occorre dare in pasto lo stesso archivio ad un formattatore denominato TKPROF la cui sintassi di avvio risulta essere: TKPROF file_trace_input file_output SORT=(option1,option2,..) EXECPU, EXEELA, EXEROW PRINT=integer INSERT=file_scripts_sql_output SYS=boolean TABLE=schema.table EXPLAIN=user/password RECORD= file_record_output
  • Slide 62
  • Analizziamo in dettaglio le tecniche da utilizzarsi al fine di interpretare loutput del TKPROF di cui a seguire viene prospettato un esempio. call count cpu elapsed disk query current rows ---- ------- ------- --------- -------- -------- ------- ------ Parse 11 0.08 0.18 0 0 0 0 Execute 11 0.23 0.66 0 3 6 2 Fetch 35 6.70 6.83 100 12326 2 824 --------------------------------------------------------------- Total 57 7.01 7.67 100 12329 8 826