Views expressed here are solely that of my own. Please make sure that you test the code/queries that appear on my blog before applying them to the production environment.

Tuesday, July 19, 2011

Will index compression help us or not ?

Below are two demonstrations related to the index compression in Oracle. In the first test, index compression will make situation worse because the indexed column values are unique, in the second test we will see how much we will get by using index compression in a specific case.

Lets begin the first test by creating a sample schema and a sample table.
[oracle@localhost uural]$ sqlplus / as sysdba

SQL*Plus: Release 11.1.0.7.0 - Production on Tue Jul 19 01:45:22 2011

Copyright (c) 1982, 2008, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> !vi cre_user-U0001.sql

drop user u0001 cascade;
create user u0001 identified by u0001;
grant create session to u0001;
alter user u0001 quota 5G on users;
grant create table to u0001;
grant dba to u0001;

conn u0001/u0001
sho user
~
~
~
"cre_user-U0001.sql" 9L, 207C written

SQL> @cre_user-U0001

User dropped.


User created.


Grant succeeded.


User altered.


Grant succeeded.


Grant succeeded.

Connected.
USER is "U0001"

I created a script to query the index related information form "user_indexes", validate the index and then query the "index_stats" view.
SQL> !vi check_index_stats.sql

set lines 120
select index_name, blevel, leaf_blocks, distinct_keys, avg_leaf_blocks_per_key, avg_data_blocks_per_key, num_rows, clustering_factor
from user_indexes
where index_name = '&&index_name';
validate index &&index_name;
select name, height, blocks, lf_rows, lf_blks, lf_rows_len, lf_blk_len, br_rows, br_blks, br_rows_len, br_blk_len, del_lf_rows, del_lf_rows_len, distinct_keys, most_repeated_key, btree_space, used_space, pct_used, rows_per_key, blks_gets_per_access
from index_stats
where name = '&&index_name';
undefine index_name
set lines 80
~
~
~
"check_index_stats.sql" 11L, 558C written


SQL> create table t1 as
  2  select
  3          rownum                          id,
  4          dbms_random.string('U',6)       v1,
  5          lpad(rownum,10)                 v2,
  6          rpad('x',100)                   padding
  7  from
  8  dual
  9  connect by level <= 1000000
 10  ;
/

SQL> create index t1_01 on t1(id);

Index created.

Elapsed: 00:00:02.02
SQL> exec dbms_stats.gather_table_stats('', 't1', estimate_percent => null, cascade => true, method_opt => 'for all columns size 1');

PL/SQL procedure successfully completed.

Elapsed: 00:00:16.47
SQL> @check_index_stats
Enter value for index_name: T1_01
old   3: where index_name = '&&index_name'
new   3: where index_name = 'T1_01'

INDEX_NAME                         BLEVEL LEAF_BLOCKS DISTINCT_KEYS AVG_LEAF_BLOCKS_PER_KEY AVG_DATA_BLOCKS_PER_KEY
------------------------------ ---------- ----------- ------------- ----------------------- -----------------------
  NUM_ROWS CLUSTERING_FACTOR
---------- -----------------
T1_01                                   2        2226       1000000                       1                       1
   1000000             17858


Elapsed: 00:00:00.03
old   1: validate index &&index_name
new   1: validate index T1_01

Index analyzed.

Elapsed: 00:00:00.59
old   3: where name = '&&index_name'
new   3: where name = 'T1_01'

NAME                               HEIGHT     BLOCKS    LF_ROWS    LF_BLKS LF_ROWS_LEN LF_BLK_LEN    BR_ROWS    BR_BLKS
------------------------------ ---------- ---------- ---------- ---------- ----------- ---------- ---------- ----------
BR_ROWS_LEN BR_BLK_LEN DEL_LF_ROWS DEL_LF_ROWS_LEN DISTINCT_KEYS MOST_REPEATED_KEY BTREE_SPACE USED_SPACE   PCT_USED
----------- ---------- ----------- --------------- ------------- ----------------- ----------- ---------- ----------
ROWS_PER_KEY BLKS_GETS_PER_ACCESS
------------ --------------------
T1_01                                   3       2304    1000000       2226    15979802       7996       2225          5
      26657       8028           0               0       1000000                 1    17839236   16006459         90
           1                    4


Elapsed: 00:00:00.19
As we can see from the above output, the index we created has 2226 leaf blocks and since the values in "id" column are unique it has 1,000,000 distinct keys, it took nearly 18 MB storage to keep this index.

Lets now try to query this table by using this index.
SQL> alter system flush buffer_cache;

System altered.

Elapsed: 00:00:00.15
SQL> set lines 120
SQL> set autot trace
SQL> select * from t1 where id between 100000 and 200000;

100001 rows selected.

Elapsed: 00:00:03.25

Execution Plan
----------------------------------------------------------
Plan hash value: 2362378154

-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |   100K|    11M|  2013   (1)| 00:00:25 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1    |   100K|    11M|  2013   (1)| 00:00:25 |
|*  2 |   INDEX RANGE SCAN          | T1_01 |   100K|       |   226   (1)| 00:00:03 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID">=100000 AND "ID"<=200000)


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
      15213  consistent gets
       2013  physical reads
          0  redo size
   13766300  bytes sent via SQL*Net to client
      73745  bytes received via SQL*Net from client
       6668  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     100001  rows processed

SQL> set autot off
As a result it took 03.25 seconds, 15213 consistent gets and 2013 physical reads to get 100001 rows from the table.

Now drop and create this index as compressed
SQL> drop index t1_01;

Index dropped.

SQL> create index t1_01 on t1(id) compress;

Index created.

SQL> exec dbms_stats.gather_table_stats('', 't1', estimate_percent => null, cascade => true, method_opt => 'for all columns size 1');

PL/SQL procedure successfully completed.

SQL> @check_index_stats
Enter value for index_name: T1_01
old   3: where index_name = '&&index_name'
new   3: where index_name = 'T1_01'

INDEX_NAME                         BLEVEL LEAF_BLOCKS DISTINCT_KEYS AVG_LEAF_BLOCKS_PER_KEY AVG_DATA_BLOCKS_PER_KEY
------------------------------ ---------- ----------- ------------- ----------------------- -----------------------
  NUM_ROWS CLUSTERING_FACTOR
---------- -----------------
T1_01                                   2        3067       1000000                       1                       1
   1000000             17858


old   1: validate index &&index_name
new   1: validate index T1_01

Index analyzed.

old   3: where name = '&&index_name'
new   3: where name = 'T1_01'

NAME                               HEIGHT     BLOCKS    LF_ROWS    LF_BLKS LF_ROWS_LEN LF_BLK_LEN    BR_ROWS    BR_BLKS
------------------------------ ---------- ---------- ---------- ---------- ----------- ---------- ---------- ----------
BR_ROWS_LEN BR_BLK_LEN DEL_LF_ROWS DEL_LF_ROWS_LEN DISTINCT_KEYS MOST_REPEATED_KEY BTREE_SPACE USED_SPACE   PCT_USED
----------- ---------- ----------- --------------- ------------- ----------------- ----------- ---------- ----------
ROWS_PER_KEY BLKS_GETS_PER_ACCESS
------------ --------------------
T1_01                                   3       3200    1000000       3067    11000000       7992       3066          6
      36702       8028           0               0       1000000                 1    24559632   22016504         90
           1                    4
As we can see from the above output, the compressed index we created has 3067 leaf blocks and it took nearly 25 MB storage to keep this index. These values are greater than the values that we got when we used non-compressed index.

Lets now try to query this table by using this compressed index.
SQL> alter system flush buffer_cache;

System altered.

SQL> set timing on
SQL> set autot trace
SQL> select * from t1 where id between 100000 and 200000;

100001 rows selected.

Elapsed: 00:00:03.31

Execution Plan
----------------------------------------------------------
Plan hash value: 2362378154

-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |   100K|    11M|  2097   (1)| 00:00:26 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1    |   100K|    11M|  2097   (1)| 00:00:26 |
|*  2 |   INDEX RANGE SCAN          | T1_01 |   100K|       |   310   (1)| 00:00:04 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID">=100000 AND "ID"<=200000)


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
      15292  consistent gets
       2097  physical reads
          0  redo size
   13766300  bytes sent via SQL*Net to client
      73745  bytes received via SQL*Net from client
       6668  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     100001  rows processed

SQL> set autot off
SQL> drop index t1_01;

Index dropped.

Elapsed: 00:00:00.11

As a result, it took 03.31 seconds, 15292 consistent gets and 2097 physical reads to run the query by using this compressed index. These values are greater than the previous test which we used a non-compressed index.

In that case using a compressed index made situation worse. The reason for this is that, Oracle makes the compression by using an algorithm to tore the index key values only once in every index block in a special structure and put only the reference to this index key in index entries instead of using the index key itself.

Compressing an index on a column with unique values, causes the index to consume more storage space and it will make the performance worse. Because index will have to write all the key values in the block and keep an extra structure to manage the compressed index.

We should use the index compression on indexes with less selective columns, so that the number of repeating values for a index key in a block will increase making the index compression more efficient in terms of storage space and performance due to reading and fetching less number of index blocks to the database server memory.

Indeed, Oracle protects this kind of case to happen, if we had given it some information about this index column. If we try to create the index having only one column as unique and compressed, it will not allow us to do that, since the index compression in that case will not be efficient. It tries to protect us from making mistakes shortly.

Lets try this
SQL> create unique index t1_01 on t1(id) compress;
create unique index t1_01 on t1(id) compress
                                    *
ERROR at line 1:
ORA-25193: cannot use COMPRESS option for a single column key


-------------------

Lets demonstrate another case where index compression will help to decrease index size and improve the performance.
In this test, we will create a sample table which has a non-selective column with repeating values on which we will create our index.

SQL> set autot off
SQL> create table t1
  2  as
  3  with generator as (
  4      select  --+ materialize
  5          *
  6      from all_objects
  7  )
  8  select
  9      v1.*
 10  from
 11      generator   v1,
 12      generator   v2
 13  where
 14      rownum <= 1000000
 15  ;

Table created.

Elapsed: 00:00:23.85
SQL> create index t1_01 on t1(object_type);

Index created.

Elapsed: 00:00:02.80
SQL> exec dbms_stats.gather_table_stats('', 't1', estimate_percent => null, cascade => true, method_opt => 'for all columns size 1');

PL/SQL procedure successfully completed.

Elapsed: 00:00:10.38
SQL> @check_index_stats
Enter value for index_name: T1_01
old   3: where index_name = '&&index_name'
new   3: where index_name = 'T1_01'

INDEX_NAME                         BLEVEL LEAF_BLOCKS DISTINCT_KEYS AVG_LEAF_BLOCKS_PER_KEY AVG_DATA_BLOCKS_PER_KEY
------------------------------ ---------- ----------- ------------- ----------------------- -----------------------
  NUM_ROWS CLUSTERING_FACTOR
---------- -----------------
T1_01                                   2        2391             3                     797                    3800
   1000000             11402


Elapsed: 00:00:00.02
old   1: validate index &&index_name
new   1: validate index T1_01

Index analyzed.

Elapsed: 00:00:00.54
old   3: where name = '&&index_name'
new   3: where name = 'T1_01'

NAME                               HEIGHT     BLOCKS    LF_ROWS    LF_BLKS LF_ROWS_LEN LF_BLK_LEN    BR_ROWS    BR_BLKS
------------------------------ ---------- ---------- ---------- ---------- ----------- ---------- ---------- ----------
BR_ROWS_LEN BR_BLK_LEN DEL_LF_ROWS DEL_LF_ROWS_LEN DISTINCT_KEYS MOST_REPEATED_KEY BTREE_SPACE USED_SPACE   PCT_USED
----------- ---------- ----------- --------------- ------------- ----------------- ----------- ---------- ----------
ROWS_PER_KEY BLKS_GETS_PER_ACCESS
------------ --------------------
T1_01                                   3       2560    1000000       2391    17147394       7996       2390          7
      45746       8028           0               0             3            484121    19174632   17193140         90
  333333.333           166670.167


Elapsed: 00:00:00.19
This index has 2391 leaf blocks, takes nearly 19 MB storage space.

Lets try to query the table using this index.
SQL> alter system flush buffer_cache;

System altered.

Elapsed: 00:00:00.18
SQL> set autot trace
SQL> select object_type, count(*) from t1 group by object_type order by object_type;

Elapsed: 00:00:00.35

Execution Plan
----------------------------------------------------------
Plan hash value: 3946799371

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     3 |    21 |  3168   (2)| 00:00:39 |
|   1 |  SORT GROUP BY     |      |     3 |    21 |  3168   (2)| 00:00:39 |
|   2 |   TABLE ACCESS FULL| T1   |  1000K|  6835K|  3139   (1)| 00:00:38 |
---------------------------------------------------------------------------


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
      11397  consistent gets
      11395  physical reads
          0  redo size
        551  bytes sent via SQL*Net to client
        419  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          3  rows processed
Oracle chose Full Table Scan (FTS) because the index column "object_type" is not defined as a "NOT NULL" column and Oracle thinks that there maybe null values which are not in the index but in the table and do not use index. To force Oracle use index, we will make "object_type" column "NOT NULL".

SQL> alter table t1 modify object_type not null;

Table altered.

Elapsed: 00:00:00.29
SQL> alter system flush buffer_cache;

System altered.

Elapsed: 00:00:00.06
SQL> select object_type, count(*) from t1 group by object_type order by object_type;

Elapsed: 00:00:00.35

Execution Plan
----------------------------------------------------------
Plan hash value: 2450666175

-------------------------------------------------------------------------------
| Id  | Operation             | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |       |     3 |    21 |   682   (5)| 00:00:09 |
|   1 |  SORT GROUP BY        |       |     3 |    21 |   682   (5)| 00:00:09 |
|   2 |   INDEX FAST FULL SCAN| T1_01 |  1000K|  6835K|   653   (1)| 00:00:08 |
-------------------------------------------------------------------------------


Statistics
----------------------------------------------------------
         74  recursive calls
          0  db block gets
       2473  consistent gets
       2415  physical reads
          0  redo size
        551  bytes sent via SQL*Net to client
        419  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          7  sorts (memory)
          0  sorts (disk)
          3  rows processed
Now Oracle uses this index. It takes 00.35 seconds, 2473 consistent gets and 2415 physical reads to get the final 3 rows.

Now try to recreate the index as compressed.
SQL> drop index t1_01;

Index dropped.

Elapsed: 00:00:00.13
SQL> create index t1_01 on t1(object_type) compress;

Index created.

Elapsed: 00:00:02.18
SQL> exec dbms_stats.gather_table_stats('', 't1', estimate_percent => null, cascade => true, method_opt => 'for all columns size 1');

PL/SQL procedure successfully completed.

Elapsed: 00:00:10.07
SQL> set autot off
SQL> @check_index_stats
Enter value for index_name: T1_01
old   3: where index_name = '&&index_name'
new   3: where index_name = 'T1_01'

INDEX_NAME                         BLEVEL LEAF_BLOCKS DISTINCT_KEYS AVG_LEAF_BLOCKS_PER_KEY AVG_DATA_BLOCKS_PER_KEY
------------------------------ ---------- ----------- ------------- ----------------------- -----------------------
  NUM_ROWS CLUSTERING_FACTOR
---------- -----------------
T1_01                                   2        1537             3                     512                    3800
   1000000             11402


Elapsed: 00:00:00.01
old   1: validate index &&index_name
new   1: validate index T1_01

Index analyzed.

Elapsed: 00:00:00.56
old   3: where name = '&&index_name'
new   3: where name = 'T1_01'

NAME                               HEIGHT     BLOCKS    LF_ROWS    LF_BLKS LF_ROWS_LEN LF_BLK_LEN    BR_ROWS    BR_BLKS
------------------------------ ---------- ---------- ---------- ---------- ----------- ---------- ---------- ----------
BR_ROWS_LEN BR_BLK_LEN DEL_LF_ROWS DEL_LF_ROWS_LEN DISTINCT_KEYS MOST_REPEATED_KEY BTREE_SPACE USED_SPACE   PCT_USED
----------- ---------- ----------- --------------- ------------- ----------------- ----------- ---------- ----------
ROWS_PER_KEY BLKS_GETS_PER_ACCESS
------------ --------------------
T1_01                                   3       1664    1000000       1537    11000000       7992       1536          5
      29374       8028           0               0             3            484121    12323844   11048070         90
  333333.333           166670.167


Elapsed: 00:00:00.05
The new compressed index has 1537 leaf blocks, taking nearly 12 MB storage space and this is nearly half of the uncompressed index size. Also pay attention, how AVG_LEAF_BLOCKS_PER_KEY values got down from 797 to 512, which shows average index leaf blocks needed to store each index key.

Lets go on running the SQL by using this compressed index.
SQL> set autot trace
SQL> alter system flush buffer_cache;

System altered.

Elapsed: 00:00:00.07
SQL> select object_type, count(*) from t1 group by object_type order by object_type;

Elapsed: 00:00:00.28

Execution Plan
----------------------------------------------------------
Plan hash value: 2450666175

-------------------------------------------------------------------------------
| Id  | Operation             | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |       |     3 |    21 |   451   (8)| 00:00:06 |
|   1 |  SORT GROUP BY        |       |     3 |    21 |   451   (8)| 00:00:06 |
|   2 |   INDEX FAST FULL SCAN| T1_01 |  1000K|  6835K|   422   (1)| 00:00:06 |
-------------------------------------------------------------------------------


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       1550  consistent gets
       1543  physical reads
          0  redo size
        551  bytes sent via SQL*Net to client
        419  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          3  rows processed

This time query took 00.28 seconds which is not too much faster than using a non-compressed index but still faster. Physical reads which are very expensive went down from 2415 to 1543. By using compressed index we saved some index storage and performance of some SQLs.

2 comments:

Ugur Inal said...

Ural bey merhaba,

Çok güzel bir index yazı dizisi olmuş. Hele hele önümüzdeki 2 ay Hollanda'daki 9i sistemimizin aşırı yüksek clustering_factor değerine sahip 1500 civarında indeksinin yeniden iyileştirilmesi işlemine geçeceğim düşünülürse...

Ural Ural said...

Cok tesekkurler Ugur Bey, biraz gec oldu ama,size kolay gelsin dileklerimi iletiyorum.
Saygilar,
Ural