Views expressed here are solely that of my own. Please make sure that you test the code/queries that appear on my blog before applying them to the production environment.

Friday, July 22, 2011

Why we should put index creation statements at the end of the table transfer scripts

Try to put the index creation statements at the end of the table transfer scritps, after the table data transfer part. It does not only increase the performance of the script in data insert part but also results in a better utilized index structure. Creating the index first and inserting table data causes different index structure to be created than inserting the table data and then creating the index, if your index key is not a unique value and inserted into the table in sequential order. Second option will be most probably faster and will cause the index blocks better utilized.

Below is a demonstration showing that case.

Lets start with creating a sample schema. You can find the contents of the "cre_user-U0001.sql" in my previous blog posts.
[oracle@localhost uural]$ sqlplus / as sysdba
 
SQL*Plus: Release 11.1.0.7.0 - Production on Thu Jul 21 01:02:47 2011
 
Copyright (c) 1982, 2008, Oracle.  All rights reserved.
 

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
 
SQL> @cre_user-U0001
 
User dropped.
 

User created.
 

Grant succeeded.
 

User altered.
 

Grant succeeded.
 

Grant succeeded.
 
Connected.
USER is "U0001"
SQL> set lines 120
SQL> !vi check_index_stats.sql
 
set lines 120
select index_name, blevel, leaf_blocks, distinct_keys, avg_leaf_blocks_per_key, avg_data_blocks_per_key, num_rows, clustering_factor
from user_indexes
where index_name = '&&index_name';
validate index &&index_name;
select name, height, blocks, lf_rows, lf_blks, lf_rows_len, lf_blk_len, br_rows, br_blks, br_rows_len, br_blk_len, del_lf_rows, del_lf_rows_len, distinct_keys, most_repeated_key, b
tree_space, used_space, pct_used, rows_per_key, blks_gets_per_access
from index_stats
where name = '&&index_name';
undefine index_name
--set lines 80
 
~
~

Now create an empty table and an index on its id column.
SQL> create table t1 tablespace ts_no_assm
  2  as
  3  select rownum id
  4  from dual
  5  where null is not null;
 
Table created.
 
SQL> create index t1_01 on t1(id) tablespace ts_no_assm;
 
Index created.

Then insert some 100,000 rows into that table, with id column values are not unique and sequential. You can think of that operation like you are transferring the data from another database table to "t1" table.
SQL> insert into t1
  2  select mod(rownum,1000) id
  3  from dual
  4  connect by level <= 100000;
 
100000 rows created.
 
SQL> commit;
 
Commit complete.
 
SQL> exec dbms_stats.gather_table_stats('','t1',estimate_percent=>null, cascade=>true, method_opt=>'for all columns size 1');
 
PL/SQL procedure successfully completed.

Check the index stats
SQL> @check_index_stats
Enter value for index_name: T1_01
old   3: where index_name = '&&index_name'
new   3: where index_name = 'T1_01'
 
INDEX_NAME                         BLEVEL LEAF_BLOCKS DISTINCT_KEYS AVG_LEAF_BLOCKS_PER_KEY AVG_DATA_BLOCKS_PER_KEY
------------------------------ ---------- ----------- ------------- ----------------------- -----------------------
  NUM_ROWS CLUSTERING_FACTOR
---------- -----------------
T1_01                                   1         252          1000                       1                     100
    100000            100000
 

old   1: validate index &&index_name
new   1: validate index T1_01
 
Index analyzed.
 
old   3: where name = '&&index_name'
new   3: where name = 'T1_01'
 
NAME                               HEIGHT     BLOCKS    LF_ROWS    LF_BLKS LF_ROWS_LEN LF_BLK_LEN    BR_ROWS    BR_BLKS
------------------------------ ---------- ---------- ---------- ---------- ----------- ---------- ---------- ----------
BR_ROWS_LEN BR_BLK_LEN DEL_LF_ROWS DEL_LF_ROWS_LEN DISTINCT_KEYS MOST_REPEATED_KEY BTREE_SPACE USED_SPACE   PCT_USED
----------- ---------- ----------- --------------- ------------- ----------------- ----------- ---------- ----------
ROWS_PER_KEY BLKS_GETS_PER_ACCESS
------------ --------------------
T1_01                                   2        256     100000        252     1489000       7996        251          1
       3314       8028           0               0          1000               100     2023020    1492314         74
         100                 52.5
As you can see in the index stats above, index has grown to 252 leaf blocks which are %74 full on average and index size is around 2 MB.

Now try the second case by first inserting the rows than creating the index.
SQL> drop table t1;
 
Table dropped.

SQL> create table t1 tablespace ts_no_assm
  2  as
  3  select rownum id
  4  from dual
  5  where null is not null;
 
Table created.
 
SQL> insert into t1
  2  select mod(rownum,1000) id
  3  from dual
  4  connect by level <= 100000;
 
100000 rows created.
 
SQL> commit;
 
Commit complete.
 
SQL> create index t1_01 on t1(id) tablespace ts_no_assm;
 
Index created.
 
SQL> exec dbms_stats.gather_table_stats('','t1',estimate_percent=>null, cascade=>true, method_opt=>'for all columns size 1');
 
PL/SQL procedure successfully completed.
 
SQL> @check_index_stats
Enter value for index_name: T1_01
old   3: where index_name = '&&index_name'
new   3: where index_name = 'T1_01'
 
INDEX_NAME                         BLEVEL LEAF_BLOCKS DISTINCT_KEYS AVG_LEAF_BLOCKS_PER_KEY AVG_DATA_BLOCKS_PER_KEY
------------------------------ ---------- ----------- ------------- ----------------------- -----------------------
  NUM_ROWS CLUSTERING_FACTOR
---------- -----------------
T1_01                                   1         208          1000                       1                     100
    100000            100000
 

old   1: validate index &&index_name
new   1: validate index T1_01
 
Index analyzed.
 
old   3: where name = '&&index_name'
new   3: where name = 'T1_01'
 
NAME                               HEIGHT     BLOCKS    LF_ROWS    LF_BLKS LF_ROWS_LEN LF_BLK_LEN    BR_ROWS    BR_BLKS
------------------------------ ---------- ---------- ---------- ---------- ----------- ---------- ---------- ----------
BR_ROWS_LEN BR_BLK_LEN DEL_LF_ROWS DEL_LF_ROWS_LEN DISTINCT_KEYS MOST_REPEATED_KEY BTREE_SPACE USED_SPACE   PCT_USED
----------- ---------- ----------- --------------- ------------- ----------------- ----------- ---------- ----------
ROWS_PER_KEY BLKS_GETS_PER_ACCESS
------------ --------------------
T1_01                                   2        256     100000        208     1489000       7996        207          1
       3066       8028           0               0          1000               100     1671196    1492066         90
         100                 52.5

As you can see now, index has grown to 208 leaf blocks which are %90 full on average and index size is around 1.7 MB.

As a result, putting the index creation statements at the end of a table transfer script, does not only makes the data insert operation faster, since the database will not have to deal with creating the index tree, but also makes the index well organized by utilizing the leaf blocks better with %90 average usage because of 90-10 index block splits.

The reason for this result is, when you first create the index and then insert the rows, the indexed column values are inserted into the table not in sequential order because of the "mod(rownum, 1000)" function because the same value was inserted again for every 1,000 rows causing 50-50 index leaf block splits. This caused the index allocate more blocks for its leaf blocks making the index larger. Although in the second scenario, since all the rows were already inserted to the table, when we created the index on id column, Oracle selected the id column values from table and sort them before inserting into the index tree causing only 90-10 index leaf block splits.

If the id column were unique and inserted into table in sequential order, it would not make much difference creating the index first from the index structure point of view, but anyway insert operation will be faster if you create the index as a last step.

At least, if you can not drop the existing index on an existing table and then create it after the insert operation, you can use the method of making the existing index "UNUSABLE" then "REBUILD" it after the insert operation which will be faster and more effective related the index structure.

In an OLTP system, which the insert, update, delete operations will start to happen after you transferred the table, index structure will soon do some 50-50 index leaf block splits and return to a structure with less utilized leaf blocks, but in a datawarehouse-like database where mostly insert and select statements happen, you can use this method more effectively.

Tuesday, July 19, 2011

Will index compression help us or not ?

Below are two demonstrations related to the index compression in Oracle. In the first test, index compression will make situation worse because the indexed column values are unique, in the second test we will see how much we will get by using index compression in a specific case.

Lets begin the first test by creating a sample schema and a sample table.
[oracle@localhost uural]$ sqlplus / as sysdba

SQL*Plus: Release 11.1.0.7.0 - Production on Tue Jul 19 01:45:22 2011

Copyright (c) 1982, 2008, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> !vi cre_user-U0001.sql

drop user u0001 cascade;
create user u0001 identified by u0001;
grant create session to u0001;
alter user u0001 quota 5G on users;
grant create table to u0001;
grant dba to u0001;

conn u0001/u0001
sho user
~
~
~
"cre_user-U0001.sql" 9L, 207C written

SQL> @cre_user-U0001

User dropped.


User created.


Grant succeeded.


User altered.


Grant succeeded.


Grant succeeded.

Connected.
USER is "U0001"

I created a script to query the index related information form "user_indexes", validate the index and then query the "index_stats" view.
SQL> !vi check_index_stats.sql

set lines 120
select index_name, blevel, leaf_blocks, distinct_keys, avg_leaf_blocks_per_key, avg_data_blocks_per_key, num_rows, clustering_factor
from user_indexes
where index_name = '&&index_name';
validate index &&index_name;
select name, height, blocks, lf_rows, lf_blks, lf_rows_len, lf_blk_len, br_rows, br_blks, br_rows_len, br_blk_len, del_lf_rows, del_lf_rows_len, distinct_keys, most_repeated_key, btree_space, used_space, pct_used, rows_per_key, blks_gets_per_access
from index_stats
where name = '&&index_name';
undefine index_name
set lines 80
~
~
~
"check_index_stats.sql" 11L, 558C written


SQL> create table t1 as
  2  select
  3          rownum                          id,
  4          dbms_random.string('U',6)       v1,
  5          lpad(rownum,10)                 v2,
  6          rpad('x',100)                   padding
  7  from
  8  dual
  9  connect by level <= 1000000
 10  ;
/

SQL> create index t1_01 on t1(id);

Index created.

Elapsed: 00:00:02.02
SQL> exec dbms_stats.gather_table_stats('', 't1', estimate_percent => null, cascade => true, method_opt => 'for all columns size 1');

PL/SQL procedure successfully completed.

Elapsed: 00:00:16.47
SQL> @check_index_stats
Enter value for index_name: T1_01
old   3: where index_name = '&&index_name'
new   3: where index_name = 'T1_01'

INDEX_NAME                         BLEVEL LEAF_BLOCKS DISTINCT_KEYS AVG_LEAF_BLOCKS_PER_KEY AVG_DATA_BLOCKS_PER_KEY
------------------------------ ---------- ----------- ------------- ----------------------- -----------------------
  NUM_ROWS CLUSTERING_FACTOR
---------- -----------------
T1_01                                   2        2226       1000000                       1                       1
   1000000             17858


Elapsed: 00:00:00.03
old   1: validate index &&index_name
new   1: validate index T1_01

Index analyzed.

Elapsed: 00:00:00.59
old   3: where name = '&&index_name'
new   3: where name = 'T1_01'

NAME                               HEIGHT     BLOCKS    LF_ROWS    LF_BLKS LF_ROWS_LEN LF_BLK_LEN    BR_ROWS    BR_BLKS
------------------------------ ---------- ---------- ---------- ---------- ----------- ---------- ---------- ----------
BR_ROWS_LEN BR_BLK_LEN DEL_LF_ROWS DEL_LF_ROWS_LEN DISTINCT_KEYS MOST_REPEATED_KEY BTREE_SPACE USED_SPACE   PCT_USED
----------- ---------- ----------- --------------- ------------- ----------------- ----------- ---------- ----------
ROWS_PER_KEY BLKS_GETS_PER_ACCESS
------------ --------------------
T1_01                                   3       2304    1000000       2226    15979802       7996       2225          5
      26657       8028           0               0       1000000                 1    17839236   16006459         90
           1                    4


Elapsed: 00:00:00.19
As we can see from the above output, the index we created has 2226 leaf blocks and since the values in "id" column are unique it has 1,000,000 distinct keys, it took nearly 18 MB storage to keep this index.

Lets now try to query this table by using this index.
SQL> alter system flush buffer_cache;

System altered.

Elapsed: 00:00:00.15
SQL> set lines 120
SQL> set autot trace
SQL> select * from t1 where id between 100000 and 200000;

100001 rows selected.

Elapsed: 00:00:03.25

Execution Plan
----------------------------------------------------------
Plan hash value: 2362378154

-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |   100K|    11M|  2013   (1)| 00:00:25 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1    |   100K|    11M|  2013   (1)| 00:00:25 |
|*  2 |   INDEX RANGE SCAN          | T1_01 |   100K|       |   226   (1)| 00:00:03 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID">=100000 AND "ID"<=200000)


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
      15213  consistent gets
       2013  physical reads
          0  redo size
   13766300  bytes sent via SQL*Net to client
      73745  bytes received via SQL*Net from client
       6668  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     100001  rows processed

SQL> set autot off
As a result it took 03.25 seconds, 15213 consistent gets and 2013 physical reads to get 100001 rows from the table.

Now drop and create this index as compressed
SQL> drop index t1_01;

Index dropped.

SQL> create index t1_01 on t1(id) compress;

Index created.

SQL> exec dbms_stats.gather_table_stats('', 't1', estimate_percent => null, cascade => true, method_opt => 'for all columns size 1');

PL/SQL procedure successfully completed.

SQL> @check_index_stats
Enter value for index_name: T1_01
old   3: where index_name = '&&index_name'
new   3: where index_name = 'T1_01'

INDEX_NAME                         BLEVEL LEAF_BLOCKS DISTINCT_KEYS AVG_LEAF_BLOCKS_PER_KEY AVG_DATA_BLOCKS_PER_KEY
------------------------------ ---------- ----------- ------------- ----------------------- -----------------------
  NUM_ROWS CLUSTERING_FACTOR
---------- -----------------
T1_01                                   2        3067       1000000                       1                       1
   1000000             17858


old   1: validate index &&index_name
new   1: validate index T1_01

Index analyzed.

old   3: where name = '&&index_name'
new   3: where name = 'T1_01'

NAME                               HEIGHT     BLOCKS    LF_ROWS    LF_BLKS LF_ROWS_LEN LF_BLK_LEN    BR_ROWS    BR_BLKS
------------------------------ ---------- ---------- ---------- ---------- ----------- ---------- ---------- ----------
BR_ROWS_LEN BR_BLK_LEN DEL_LF_ROWS DEL_LF_ROWS_LEN DISTINCT_KEYS MOST_REPEATED_KEY BTREE_SPACE USED_SPACE   PCT_USED
----------- ---------- ----------- --------------- ------------- ----------------- ----------- ---------- ----------
ROWS_PER_KEY BLKS_GETS_PER_ACCESS
------------ --------------------
T1_01                                   3       3200    1000000       3067    11000000       7992       3066          6
      36702       8028           0               0       1000000                 1    24559632   22016504         90
           1                    4
As we can see from the above output, the compressed index we created has 3067 leaf blocks and it took nearly 25 MB storage to keep this index. These values are greater than the values that we got when we used non-compressed index.

Lets now try to query this table by using this compressed index.
SQL> alter system flush buffer_cache;

System altered.

SQL> set timing on
SQL> set autot trace
SQL> select * from t1 where id between 100000 and 200000;

100001 rows selected.

Elapsed: 00:00:03.31

Execution Plan
----------------------------------------------------------
Plan hash value: 2362378154

-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |   100K|    11M|  2097   (1)| 00:00:26 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1    |   100K|    11M|  2097   (1)| 00:00:26 |
|*  2 |   INDEX RANGE SCAN          | T1_01 |   100K|       |   310   (1)| 00:00:04 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID">=100000 AND "ID"<=200000)


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
      15292  consistent gets
       2097  physical reads
          0  redo size
   13766300  bytes sent via SQL*Net to client
      73745  bytes received via SQL*Net from client
       6668  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     100001  rows processed

SQL> set autot off
SQL> drop index t1_01;

Index dropped.

Elapsed: 00:00:00.11

As a result, it took 03.31 seconds, 15292 consistent gets and 2097 physical reads to run the query by using this compressed index. These values are greater than the previous test which we used a non-compressed index.

In that case using a compressed index made situation worse. The reason for this is that, Oracle makes the compression by using an algorithm to tore the index key values only once in every index block in a special structure and put only the reference to this index key in index entries instead of using the index key itself.

Compressing an index on a column with unique values, causes the index to consume more storage space and it will make the performance worse. Because index will have to write all the key values in the block and keep an extra structure to manage the compressed index.

We should use the index compression on indexes with less selective columns, so that the number of repeating values for a index key in a block will increase making the index compression more efficient in terms of storage space and performance due to reading and fetching less number of index blocks to the database server memory.

Indeed, Oracle protects this kind of case to happen, if we had given it some information about this index column. If we try to create the index having only one column as unique and compressed, it will not allow us to do that, since the index compression in that case will not be efficient. It tries to protect us from making mistakes shortly.

Lets try this
SQL> create unique index t1_01 on t1(id) compress;
create unique index t1_01 on t1(id) compress
                                    *
ERROR at line 1:
ORA-25193: cannot use COMPRESS option for a single column key


-------------------

Lets demonstrate another case where index compression will help to decrease index size and improve the performance.
In this test, we will create a sample table which has a non-selective column with repeating values on which we will create our index.

SQL> set autot off
SQL> create table t1
  2  as
  3  with generator as (
  4      select  --+ materialize
  5          *
  6      from all_objects
  7  )
  8  select
  9      v1.*
 10  from
 11      generator   v1,
 12      generator   v2
 13  where
 14      rownum <= 1000000
 15  ;

Table created.

Elapsed: 00:00:23.85
SQL> create index t1_01 on t1(object_type);

Index created.

Elapsed: 00:00:02.80
SQL> exec dbms_stats.gather_table_stats('', 't1', estimate_percent => null, cascade => true, method_opt => 'for all columns size 1');

PL/SQL procedure successfully completed.

Elapsed: 00:00:10.38
SQL> @check_index_stats
Enter value for index_name: T1_01
old   3: where index_name = '&&index_name'
new   3: where index_name = 'T1_01'

INDEX_NAME                         BLEVEL LEAF_BLOCKS DISTINCT_KEYS AVG_LEAF_BLOCKS_PER_KEY AVG_DATA_BLOCKS_PER_KEY
------------------------------ ---------- ----------- ------------- ----------------------- -----------------------
  NUM_ROWS CLUSTERING_FACTOR
---------- -----------------
T1_01                                   2        2391             3                     797                    3800
   1000000             11402


Elapsed: 00:00:00.02
old   1: validate index &&index_name
new   1: validate index T1_01

Index analyzed.

Elapsed: 00:00:00.54
old   3: where name = '&&index_name'
new   3: where name = 'T1_01'

NAME                               HEIGHT     BLOCKS    LF_ROWS    LF_BLKS LF_ROWS_LEN LF_BLK_LEN    BR_ROWS    BR_BLKS
------------------------------ ---------- ---------- ---------- ---------- ----------- ---------- ---------- ----------
BR_ROWS_LEN BR_BLK_LEN DEL_LF_ROWS DEL_LF_ROWS_LEN DISTINCT_KEYS MOST_REPEATED_KEY BTREE_SPACE USED_SPACE   PCT_USED
----------- ---------- ----------- --------------- ------------- ----------------- ----------- ---------- ----------
ROWS_PER_KEY BLKS_GETS_PER_ACCESS
------------ --------------------
T1_01                                   3       2560    1000000       2391    17147394       7996       2390          7
      45746       8028           0               0             3            484121    19174632   17193140         90
  333333.333           166670.167


Elapsed: 00:00:00.19
This index has 2391 leaf blocks, takes nearly 19 MB storage space.

Lets try to query the table using this index.
SQL> alter system flush buffer_cache;

System altered.

Elapsed: 00:00:00.18
SQL> set autot trace
SQL> select object_type, count(*) from t1 group by object_type order by object_type;

Elapsed: 00:00:00.35

Execution Plan
----------------------------------------------------------
Plan hash value: 3946799371

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     3 |    21 |  3168   (2)| 00:00:39 |
|   1 |  SORT GROUP BY     |      |     3 |    21 |  3168   (2)| 00:00:39 |
|   2 |   TABLE ACCESS FULL| T1   |  1000K|  6835K|  3139   (1)| 00:00:38 |
---------------------------------------------------------------------------


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
      11397  consistent gets
      11395  physical reads
          0  redo size
        551  bytes sent via SQL*Net to client
        419  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          3  rows processed
Oracle chose Full Table Scan (FTS) because the index column "object_type" is not defined as a "NOT NULL" column and Oracle thinks that there maybe null values which are not in the index but in the table and do not use index. To force Oracle use index, we will make "object_type" column "NOT NULL".

SQL> alter table t1 modify object_type not null;

Table altered.

Elapsed: 00:00:00.29
SQL> alter system flush buffer_cache;

System altered.

Elapsed: 00:00:00.06
SQL> select object_type, count(*) from t1 group by object_type order by object_type;

Elapsed: 00:00:00.35

Execution Plan
----------------------------------------------------------
Plan hash value: 2450666175

-------------------------------------------------------------------------------
| Id  | Operation             | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |       |     3 |    21 |   682   (5)| 00:00:09 |
|   1 |  SORT GROUP BY        |       |     3 |    21 |   682   (5)| 00:00:09 |
|   2 |   INDEX FAST FULL SCAN| T1_01 |  1000K|  6835K|   653   (1)| 00:00:08 |
-------------------------------------------------------------------------------


Statistics
----------------------------------------------------------
         74  recursive calls
          0  db block gets
       2473  consistent gets
       2415  physical reads
          0  redo size
        551  bytes sent via SQL*Net to client
        419  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          7  sorts (memory)
          0  sorts (disk)
          3  rows processed
Now Oracle uses this index. It takes 00.35 seconds, 2473 consistent gets and 2415 physical reads to get the final 3 rows.

Now try to recreate the index as compressed.
SQL> drop index t1_01;

Index dropped.

Elapsed: 00:00:00.13
SQL> create index t1_01 on t1(object_type) compress;

Index created.

Elapsed: 00:00:02.18
SQL> exec dbms_stats.gather_table_stats('', 't1', estimate_percent => null, cascade => true, method_opt => 'for all columns size 1');

PL/SQL procedure successfully completed.

Elapsed: 00:00:10.07
SQL> set autot off
SQL> @check_index_stats
Enter value for index_name: T1_01
old   3: where index_name = '&&index_name'
new   3: where index_name = 'T1_01'

INDEX_NAME                         BLEVEL LEAF_BLOCKS DISTINCT_KEYS AVG_LEAF_BLOCKS_PER_KEY AVG_DATA_BLOCKS_PER_KEY
------------------------------ ---------- ----------- ------------- ----------------------- -----------------------
  NUM_ROWS CLUSTERING_FACTOR
---------- -----------------
T1_01                                   2        1537             3                     512                    3800
   1000000             11402


Elapsed: 00:00:00.01
old   1: validate index &&index_name
new   1: validate index T1_01

Index analyzed.

Elapsed: 00:00:00.56
old   3: where name = '&&index_name'
new   3: where name = 'T1_01'

NAME                               HEIGHT     BLOCKS    LF_ROWS    LF_BLKS LF_ROWS_LEN LF_BLK_LEN    BR_ROWS    BR_BLKS
------------------------------ ---------- ---------- ---------- ---------- ----------- ---------- ---------- ----------
BR_ROWS_LEN BR_BLK_LEN DEL_LF_ROWS DEL_LF_ROWS_LEN DISTINCT_KEYS MOST_REPEATED_KEY BTREE_SPACE USED_SPACE   PCT_USED
----------- ---------- ----------- --------------- ------------- ----------------- ----------- ---------- ----------
ROWS_PER_KEY BLKS_GETS_PER_ACCESS
------------ --------------------
T1_01                                   3       1664    1000000       1537    11000000       7992       1536          5
      29374       8028           0               0             3            484121    12323844   11048070         90
  333333.333           166670.167


Elapsed: 00:00:00.05
The new compressed index has 1537 leaf blocks, taking nearly 12 MB storage space and this is nearly half of the uncompressed index size. Also pay attention, how AVG_LEAF_BLOCKS_PER_KEY values got down from 797 to 512, which shows average index leaf blocks needed to store each index key.

Lets go on running the SQL by using this compressed index.
SQL> set autot trace
SQL> alter system flush buffer_cache;

System altered.

Elapsed: 00:00:00.07
SQL> select object_type, count(*) from t1 group by object_type order by object_type;

Elapsed: 00:00:00.28

Execution Plan
----------------------------------------------------------
Plan hash value: 2450666175

-------------------------------------------------------------------------------
| Id  | Operation             | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |       |     3 |    21 |   451   (8)| 00:00:06 |
|   1 |  SORT GROUP BY        |       |     3 |    21 |   451   (8)| 00:00:06 |
|   2 |   INDEX FAST FULL SCAN| T1_01 |  1000K|  6835K|   422   (1)| 00:00:06 |
-------------------------------------------------------------------------------


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       1550  consistent gets
       1543  physical reads
          0  redo size
        551  bytes sent via SQL*Net to client
        419  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          3  rows processed

This time query took 00.28 seconds which is not too much faster than using a non-compressed index but still faster. Physical reads which are very expensive went down from 2415 to 1543. By using compressed index we saved some index storage and performance of some SQLs.

Sunday, July 17, 2011

Indexing only the needed column values to save index space and also to improve the performance

You can reduce the index space and also improve the performance of specific SQL statements by indexing only a specific value of a column by using function-based indexes in Oracle.

Below is a demonstration related to this.

First create a sample schema.
[oracle@localhost uural]$ sqlplus / as sysdba

SQL*Plus: Release 11.1.0.7.0 - Production on Sun Jul 17 05:01:51 2011

Copyright (c) 1982, 2008, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> @cre_user-U0001


User dropped.


User created.


Grant succeeded.


User altered.


Grant succeeded.


Grant succeeded.

SQL> !more cre_user-U0001.sql
drop user u0001 cascade;
create user u0001 identified by u0001;
grant create session to u0001;
alter user u0001 quota 5G on users;
grant create table to u0001;
grant dba to u0001;

Create a sample table with 1,000,000 rows with a status column which has the value of 'CLOSED'.
SQL> conn u0001/u0001
Connected.
SQL> create table t1 as
  2  select
  3    rownum id,
  dbms_random.string('U',6) v1,
  4    5    lpad(rownum,10) v2,
  6    rpad('x',100) padding,
  7    'CLOSED' status
  8  from
  dual
  9   10  connect by level <= 1000000;


Table created.

SQL> desc t1
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 ID                                                 NUMBER
 V1                                                 VARCHAR2(4000)
 V2                                                 VARCHAR2(40)
 PADDING                                            VARCHAR2(100)
 STATUS                                             CHAR(6)

Now insert some 100 rows which have the status column as 'OPEN'.
SQL> insert into t1
  2  select
  3    rownum id,
  4    dbms_random.string('U',6) v1,
  5    lpad(rownum,10) v2,
  6    rpad('x',100) padding,
  7    'OPEN' status
  8  from
  9    dual
 10  connect by level <= 100;

100 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats('','t1',estimate_percent => null,cascade => true,method_opt => 'for all columns size 1');

PL/SQL procedure successfully completed.

SQL> set autot trace
SQL> sho autot
autotrace TRACEONLY EXPLAIN STATISTICS

Create an index on status column.
SQL> create index t1_01 on t1(status);

Index created.

SQL> exec dbms_stats.gather_table_stats('','t1',estimate_percent => null,cascade => true,method_opt => 'for all columns size 1');

PL/SQL procedure successfully completed.

SQL> alter system flush buffer_cache;

System altered.

Query the the rows which has the status column with the value 'OPEN'.
SQL> set timing on
SQL> select * from t1 where status = 'OPEN';

100 rows selected.

Elapsed: 00:00:01.94

Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   500K|    62M|  5220   (1)| 00:01:03 |
|*  1 |  TABLE ACCESS FULL| T1   |   500K|    62M|  5220   (1)| 00:01:03 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("STATUS"='OPEN')


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
      18897  consistent gets
      18887  physical reads
          0  redo size
       4229  bytes sent via SQL*Net to client
        485  bytes received via SQL*Net from client
          8  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed

As you can see the CBO did not use the index on status column most probably because of the low distinct values and low selectivity of the "status" column. It did 18887 "physical reads" and 18897 "consistent gets" and it took "01.94" seconds to get the result.

Now try to give a hint to make the CBO use the index on "status" column.
SQL> alter system flush buffer_cache;

System altered.

Elapsed: 00:00:00.25
SQL> alter system flush buffer_cache;

System altered.

Elapsed: 00:00:00.02

SQL> select /*+ index(t, t1_01)*/ * from t1 t where status = 'OPEN';

100 rows selected.

Elapsed: 00:00:00.04

Execution Plan
----------------------------------------------------------
Plan hash value: 2362378154

-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time    |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |   500K|    62M| 10703   (1)| 00:02:09 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1    |   500K|    62M| 10703   (1)| 00:02:09 |
|*  2 |   INDEX RANGE SCAN          | T1_01 |   500K|       |  1262   (1)| 00:00:16 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("STATUS"='OPEN')


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
         19  consistent gets
          5  physical reads
          0  redo size
      14839  bytes sent via SQL*Net to client
        485  bytes received via SQL*Net from client
          8  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed

As you can see above, CBO used the index and did 5 "physical reads" and 19 "consistent gets" and query took "00.04" seconds although the CBO computed the cost higher then Full Table Scan (FTS).

Lets look at the index structure
SQL> set autot off

SQL> select BLEVEL, LEAF_BLOCKS, DISTINCT_KEYS, AVG_LEAF_BLOCKS_PER_KEY, AVG_DATA_BLOCKS_PER_KEY, NUM_ROWS from user_indexes where index_name = 'T1_01';

    BLEVEL LEAF_BLOCKS DISTINCT_KEYS AVG_LEAF_BLOCKS_PER_KEY
---------- ----------- ------------- -----------------------
AVG_DATA_BLOCKS_PER_KEY   NUM_ROWS
----------------------- ----------
         2        2513             2                    1256
                   9435    1000100


Elapsed: 00:00:00.06
SQL> validate index t1_01;

Index analyzed.

Elapsed: 00:00:01.45

SQL> select HEIGHT,BLOCKS,LF_ROWS,LF_BLKS,BR_ROWS,BR_BLKS,DEL_LF_ROWS,DISTINCT_KEYS,USED_SPACE,PCT_USED,ROWS_PER_KEY,BLKS_GETS_PER_ACCESS from index_stats;

    HEIGHT     BLOCKS    LF_ROWS    LF_BLKS    BR_ROWS    BR_BLKS DEL_LF_ROWS
---------- ---------- ---------- ---------- ---------- ---------- -----------
DISTINCT_KEYS USED_SPACE   PCT_USED ROWS_PER_KEY BLKS_GETS_PER_ACCESS
------------- ---------- ---------- ------------ --------------------
         3       2688    1000100       2513       2512          8           0
            2   18051946         90       500050             250028.5


Elapsed: 00:00:00.73
You can see from the index stats that, this index ha 2513 leaf blocks which takes about 18 MB storage space.

Lets now drop this index and create a function-based index on "status" column but for only the value that we are interested in.
SQL> drop index t1_01;

Index dropped.

Elapsed: 00:00:00.60

We use the CASE statement, to tell Oracle that we only want the rows whose status column is "OPEN" to be indexed. Since the CASE will return NULL value for other values these rows' rowids will not be kept in the index structure.
SQL> create index t1_01 on t1(case when status = 'OPEN' then status else null end);

Index created.

Elapsed: 00:00:01.87
SQL> exec dbms_stats.gather_table_stats('','t1',estimate_percent => null,cascade => true,method_opt => 'for all columns size 1');

PL/SQL procedure successfully completed.

Elapsed: 00:00:17.51
SQL> alter system flush buffer_cache;

System altered.

Elapsed: 00:00:00.11
SQL> set autot trace

SQL> SQL> sho autot
autotrace TRACEONLY EXPLAIN STATISTICS

Now we need to use the same statement that we used while creating the function-based index, in the where condition of our query to make it use the new index.
SQL> select * from t1 where (case when status = 'OPEN' then status else null end) = 'OPEN';

100 rows selected.

Elapsed: 00:00:00.03

Execution Plan
----------------------------------------------------------
Plan hash value: 2362378154

-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time    |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |   100 | 13200 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1    |   100 | 13200 |     3   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T1_01 |   100 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access(CASE  WHEN "STATUS"='OPEN' THEN "STATUS" ELSE NULL END ='OPEN')


Statistics
----------------------------------------------------------
          8  recursive calls
          0  db block gets
         19  consistent gets
          4  physical reads
          0  redo size
       4229  bytes sent via SQL*Net to client
        485  bytes received via SQL*Net from client
          8  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed

From the trace we see that this statement did 4 "physical reads", 19 "consistent gets" and completed in "00.03" seconds. These values are very close to the values we got from the test using the non function-based index before.
But what about the storage space that this new index uses, lets check.
SQL> set autot off
SQL> select BLEVEL, LEAF_BLOCKS, DISTINCT_KEYS, AVG_LEAF_BLOCKS_PER_KEY, AVG_DATA_BLOCKS_PER_KEY, NUM_ROWS from user_indexes where index_name = 'T1_01';

    BLEVEL LEAF_BLOCKS DISTINCT_KEYS AVG_LEAF_BLOCKS_PER_KEY
---------- ----------- ------------- -----------------------
AVG_DATA_BLOCKS_PER_KEY   NUM_ROWS
----------------------- ----------
         0           1             1                       1
                      2        100


Elapsed: 00:00:00.01
SQL> validate index t1_01;

Index analyzed.

Elapsed: 00:00:00.04
SQL> select HEIGHT,BLOCKS,LF_ROWS,LF_BLKS,BR_ROWS,BR_BLKS,DEL_LF_ROWS,DISTINCT_KEYS,USED_SPACE,PCT_USED,ROWS_PER_KEY,BLKS_GETS_PER_ACCESS from index_stats;

    HEIGHT     BLOCKS    LF_ROWS    LF_BLKS    BR_ROWS    BR_BLKS DEL_LF_ROWS
---------- ---------- ---------- ---------- ---------- ---------- -----------
DISTINCT_KEYS USED_SPACE   PCT_USED ROWS_PER_KEY BLKS_GETS_PER_ACCESS
------------- ---------- ---------- ------------ --------------------
         1          8        100          1          0          0           0
            1       1800         23          100                 51.5

Elapsed: 00:00:00.04
As we can see, this new function-based index has only 1 leaf blocks and takes only 1.8 KB storage space which is far less compared to the values of non function-based index which took around 18 MB.

Now we can go one step further, suppose instead of having a status column, we have two date columns named "date_to_be_processed" and "date_processed". Every row is inserted by a batch process with their date_to_be_processed column set to a specific date normally one day later and when tomorrow comes another job selects only the rows with their "date_to_be_processed" column set to this date and "date_processed" column is set to "NULL". In that case we are not interested in the column which their "date_processed" values set already. Below is the demonstration of that case.

SQL> alter table t1 drop column status;


Table altered.

Elapsed: 00:02:00.87

SQL> alter table t1 add (date_processed date);

Table altered.

Elapsed: 00:00:00.38
SQL> alter table t1 add (date_to_be_processed date);

Table altered.

Elapsed: 00:00:00.05
SQL> truncate table t1;

Table truncated.

Elapsed: 00:00:02.35

We insert 1,000,000 rows which their "date_processed" columns are set. Every day has 100 rows processed.
SQL> insert into t1
  2      select
  3        rownum id,
  4        dbms_random.string('U',6) v1,
  5        lpad(rownum,10) v2,
  6        rpad('x',100) padding,
  7        trunc(sysdate) - (10000 - trunc((rownum-1)/100)) date_processed,
      trunc(sysdate) - (10000 - trunc((rownum-1)/100)) date_to_be_processed
  8    9      from
 10       dual
 11     connect by level <= 1000000;


1000000 rows created.

Elapsed: 00:01:04.80
SQL> set pause on
SQL> select date_processed, count(*) from t1 group by date_processed order by date_processed;


DATE_PROC   COUNT(*)
--------- ----------
29-FEB-84        100
01-MAR-84        100
02-MAR-84        100
03-MAR-84        100
04-MAR-84        100
05-MAR-84        100
06-MAR-84        100
07-MAR-84        100
08-MAR-84        100
09-MAR-84        100
10-MAR-84        100
11-MAR-84        100
...
Ctrl+C
13 rows selected.

Elapsed: 00:00:05.70

SQL> select date_processed, count(*) from t1 group by date_processed order by date_processed desc;


DATE_PROC   COUNT(*)
--------- ----------
16-JUL-11        100
15-JUL-11        100
14-JUL-11        100
13-JUL-11        100
12-JUL-11        100
11-JUL-11        100
10-JUL-11        100
09-JUL-11        100
08-JUL-11        100
07-JUL-11        100
06-JUL-11        100
05-JUL-11        100
...
Ctrl+C
13 rows selected.

Elapsed: 00:00:04.60

Now insert some 100 rows which should be processed today with their "date_processed" columnsset to "NULL".
SQL>  insert into t1
  2      select
  3        rownum id,
  4        dbms_random.string('U',6) v1,
  5        lpad(rownum,10) v2,
  6        rpad('x',100) padding,
  7        null date_processed,
  8        trunc(sysdate) date_to_be_processed
  9      from
 10       dual
 11     connect by level <= 100;

100 rows created.

Elapsed: 00:00:00.04
SQL> commit;

Commit complete.

Elapsed: 00:00:00.00
SQL> select date_processed, count(*) from t1 group by date_processed order by date_processed desc;


DATE_PROC   COUNT(*)
--------- ----------
                 100
16-JUL-11        100
15-JUL-11        100
14-JUL-11        100
13-JUL-11        100
12-JUL-11        100
11-JUL-11        100
10-JUL-11        100
09-JUL-11        100
08-JUL-11        100
07-JUL-11        100
06-JUL-11        100
...
Ctrl+C
13 rows selected.

Elapsed: 00:00:07.02

Create a regular composite index on these columns as we could normally do.
SQL> create index t1_01 on t1(date_processed, date_to_be_processed);

Index created.

Elapsed: 00:00:04.99
SQL> exec dbms_stats.gather_table_stats('','t1',estimate_percent => null,cascade => true,method_opt => 'for all columns size 1');

PL/SQL procedure successfully completed.

Elapsed: 00:00:24.62
SQL> set autot trace
SQL> alter system flush buffer_cache;

System altered.

Elapsed: 00:00:00.07

Now try to query the rows that we are interested.
SQL> select * from t1 where date_processed is null and date_to_be_processed = trunc(sysdate);

100 rows selected.

Elapsed: 00:00:00.08

Execution Plan
----------------------------------------------------------
Plan hash value: 2362378154

-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time    |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |     1 |   140 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1    |     1 |   140 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T1_01 |     1 |       |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("DATE_PROCESSED" IS NULL AND
              "DATE_TO_BE_PROCESSED"=TRUNC(SYSDATE@!))
       filter("DATE_TO_BE_PROCESSED"=TRUNC(SYSDATE@!))


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
         21  consistent gets
          5  physical reads
        188  redo size
      15117  bytes sent via SQL*Net to client
        485  bytes received via SQL*Net from client
          8  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed

SQL> select BLEVEL, LEAF_BLOCKS, DISTINCT_KEYS, AVG_LEAF_BLOCKS_PER_KEY, AVG_DATA_BLOCKS_PER_KEY, NUM_ROWS from user_indexes where index_name = 'T1_01'
  2

SQL> set autot off
SQL> select BLEVEL, LEAF_BLOCKS, DISTINCT_KEYS, AVG_LEAF_BLOCKS_PER_KEY, AVG_DATA_BLOCKS_PER_KEY, NUM_ROWS from user_indexes where index_name = 'T1_01';


    BLEVEL LEAF_BLOCKS DISTINCT_KEYS AVG_LEAF_BLOCKS_PER_KEY
---------- ----------- ------------- -----------------------
AVG_DATA_BLOCKS_PER_KEY   NUM_ROWS
----------------------- ----------
         2        3774         10001                       1
                      2    1000100


Elapsed: 00:00:00.60
SQL> validate index t1_01;

Index analyzed.

Elapsed: 00:00:00.97
SQL> select HEIGHT,BLOCKS,LF_ROWS,LF_BLKS,BR_ROWS,BR_BLKS,DEL_LF_ROWS,DISTINCT_KEYS,USED_SPACE,PCT_USED,ROWS_PER_KEY,BLKS_GETS_PER_ACCESS from index_stats;


    HEIGHT     BLOCKS    LF_ROWS    LF_BLKS    BR_ROWS    BR_BLKS DEL_LF_ROWS
---------- ---------- ---------- ---------- ---------- ---------- -----------
DISTINCT_KEYS USED_SPACE   PCT_USED ROWS_PER_KEY BLKS_GETS_PER_ACCESS
------------- ---------- ---------- ------------ --------------------
         3       3968    1000100       3774       3773         15           0
        10001   27107816         90          100                 53.5


Elapsed: 00:00:04.46

It did 21 "consistent gets", 5 "physical reads" and it took "00.08" seconds to complete. The index we created has 3773 leaf blocks and it took about 27 MB storage space.

Now try a function-based index on only specific values we interested in. We tell Oracle that put into index "date_to_be_processed" values of only the rows which has "date_processed" columns are null.
SQL> set lines 120
SQL> drop index t1_01;

Index dropped.

Elapsed: 00:00:00.34
SQL> create index t1_01 on t1(case when date_processed is null then date_to_be_processed else null end);

Index created.

Elapsed: 00:00:02.22
SQL> exec dbms_stats.gather_table_stats('','t1',estimate_percent => null,cascade => true,method_opt => 'for all columns size 1');

PL/SQL procedure successfully completed.

Elapsed: 00:00:17.58
SQL> alter system flush buffer_cache;

System altered.

Elapsed: 00:00:00.03
SQL> set autot trace

SQL> alter system flush buffer_cache;

System altered.

Elapsed: 00:00:00.04
SQL> select * from t1 where (case when date_processed is null then date_to_be_processed else null end) = trunc(sysdate);

100 rows selected.

Elapsed: 00:00:00.02

Execution Plan
----------------------------------------------------------
Plan hash value: 2362378154

-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |   100 | 14100 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1    |   100 | 14100 |     3   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T1_01 |   100 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access(CASE  WHEN "DATE_PROCESSED" IS NULL THEN
              "DATE_TO_BE_PROCESSED" ELSE NULL END =TRUNC(SYSDATE@!))


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         17  consistent gets
          3  physical reads
          0  redo size
       4311  bytes sent via SQL*Net to client
        485  bytes received via SQL*Net from client
          8  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed

SQL> set autot off
SQL> select BLEVEL, LEAF_BLOCKS, DISTINCT_KEYS, AVG_LEAF_BLOCKS_PER_KEY, AVG_DATA_BLOCKS_PER_KEY, NUM_ROWS from user_indexes where index_name = 'T1_01';


    BLEVEL LEAF_BLOCKS DISTINCT_KEYS AVG_LEAF_BLOCKS_PER_KEY AVG_DATA_BLOCKS_PER_KEY   NUM_ROWS
---------- ----------- ------------- ----------------------- ----------------------- ----------
         0           1             1                       1                       2        100

Elapsed: 00:00:00.98
SQL> validate index t1_01;

Index analyzed.

Elapsed: 00:00:00.18
SQL> select HEIGHT,BLOCKS,LF_ROWS,LF_BLKS,BR_ROWS,BR_BLKS,DEL_LF_ROWS,DISTINCT_KEYS,USED_SPACE,PCT_USED,ROWS_PER_KEY,BLKS_GETS_PER_ACCESS from index_stats;


    HEIGHT     BLOCKS    LF_ROWS    LF_BLKS    BR_ROWS    BR_BLKS DEL_LF_ROWS DISTINCT_KEYS USED_SPACE   PCT_USED
---------- ---------- ---------- ---------- ---------- ---------- ----------- ------------- ---------- ----------
ROWS_PER_KEY BLKS_GETS_PER_ACCESS
------------ --------------------
         1          8        100          1          0          0           0             1       1900         24
         100                 51.5


Elapsed: 00:00:00.78
SQL>
Now we see that it did 17 "consistent gets, 3 "physical reads" and it took just "00.02" seconds to get the results which is also a better value compared to "00.08" of previous normal index test, but when you compare the index structure, new index has only 1 leaf block which took 1.9 KB storage space which is far less compared to the 27 MB which was used in a normal index test.

These examples are suitable for very selective cases, I mean, in these tests you are not interested in the old historical values, you only need to select the new values or values which we are interested in. But especially, from the storage point of view, if the table will grow very huge to billion of rows and you will not query the old historical values very often then this method is very well suits your needs.

Friday, July 15, 2011

Effects of indexes on Oracle Cost Based Optimizer (CBO) execution plan selection

Followings are some tests that I performed on an Oracle 11gR2 database and my comments about the effects of the indexes on Oracle Cost Based Optimizer (CBO) execution plan selection. The results can change on different database versions or on the databases configured with different database parameters. Below are my output that I got at the time of my tests.

[oracle@localhost uural]$ sqlplus / as sysdba

SQL*Plus: Release 11.1.0.7.0 - Production on Thu Jul 14 14:15:11 2011

Copyright (c) 1982, 2008, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

First I create a test user
SQL> !more cre_user-U0001.sql
drop user u0001 cascade;
create user u0001 identified by u0001;
grant create session to u0001;
alter user u0001 quota 5G on users;
grant create table to u0001;
grant dba to u0001;
SQL> @cre_user-U0001

User dropped.


User created.


Grant succeeded.


User altered.


Grant succeeded.


Grant succeeded.

Then I connect with that test user
SQL> conn u0001/u0001
Connected.

Create a big sample table
SQL> create table t1 as
  2  select
  3    rownum id,
  4    dbms_random.string('U',6) v1,
  5    lpad(rownum,10) v2,
  6    rpad('x',100) padding
  7  from
  8    dual
connect by level <= 1000000;  9

Table created.
Create an index on id column
SQL> create index t1_01 on t1(id);

Index created.
gather table statistics with cascade option, which will gather index statistics also
SQL> exec dbms_stats.gather_table_stats('','t1',estimate_percent => null,cascade => true,method_opt => 'for all columns size 1');

PL/SQL procedure successfully completed.

SQL> set lines 160

SQL> desc t1
 Name                                                                                      Null?    Type
 ----------------------------------------------------------------------------------------- -------- ------------------------------------------------------------
 ID                                                                                                 NUMBER
 V1                                                                                                 VARCHAR2(4000)
 V2                                                                                                 VARCHAR2(40)
 PADDING                                                                                            VARCHAR2(100)
set auto trace to explain only, we are only interested in the execution plans right now
SQL> set autot trace exp
SQL> sho autot
autotrace TRACEONLY EXPLAIN
When we try to select all columns with a condition using an indexed column, CBO decides to do an "INDEX RANGE SCAN" and after finding the related rowids of the rows matching the index key in where condition, then select the rows from table by using these rowids.
SQL> select * from t1 where id = 100000;

Execution Plan
----------------------------------------------------------
Plan hash value: 2362378154

-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |     1 |   124 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1    |     1 |   124 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T1_01 |     1 |       |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID"=100000)
When we select only the indexed column from the table CBO goes directly to the table and does a full table scan(TABLE ACCESS FULL). Although this column is indexed, it did not use the index for this sql, because there is no "NOT NULL" constraint on indexed column "ID" and there can be NULL values for this column in the table which they did not exist in the corresponding index, since the index keys consisting of null values are not kept in an index.
SQL> select id from t1;

Execution Plan
----------------------------------------------------------
Plan hash value: 3617692013

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |  1000K|  4882K|  4895   (1)| 00:00:59 |
|   1 |  TABLE ACCESS FULL| T1   |  1000K|  4882K|  4895   (1)| 00:00:59 |
--------------------------------------------------------------------------
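Incidentally, the constraint is not strictly required to get the index used for this query; a predicate that rules out NULLs is enough, because then the CBO knows every qualifying value is present in the index. A minimal check you can run in the same session (I would expect an "INDEX FAST FULL SCAN" here, but that expectation is mine, not output captured from this demo):
SQL> select id from t1 where id is not null;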
Back in the demo, we now query the indexed column with an equality condition; since "ID" = 100000 can never match a NULL, the CBO can use the associated index. It does not visit the table, because the indexed column value already exists in the index as the key, so there is no need to go to the table. Although we know there is only one row matching the condition, the CBO does not, so it performs an "INDEX RANGE SCAN" instead of an "INDEX UNIQUE SCAN", which would only be an option if we had created a unique index on the "ID" column.
SQL> select id from t1 where id = 100000;

Execution Plan
----------------------------------------------------------
Plan hash value: 2311469903

--------------------------------------------------------------------------
| Id  | Operation        | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT |       |     1 |     5 |     3   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| T1_01 |     1 |     5 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("ID"=100000)
The same applies to the count: because the condition on "ID" in the where clause can never be satisfied by a NULL, the CBO computes the count using only the index, without going to the table.
SQL> select count(*) from t1 where id = 100000;

Execution Plan
----------------------------------------------------------
Plan hash value: 1142046246

---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |     1 |     5 |     3   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE   |       |     1 |     5 |            |          |
|*  2 |   INDEX RANGE SCAN| T1_01 |     1 |     5 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID"=100000)
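For contrast, at this point a count with no predicate at all cannot be satisfied from the index, because rows whose "ID" is NULL would be missing from it; the CBO has to fall back to a full table scan. A quick check you can run yourself (the expected full scan is my assumption based on the nullable column, not output from this demo):
SQL> select count(*) from t1;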
Let's now make the indexed column "NOT NULL" and see what changes.
SQL> alter table t1 modify id not null;

Table altered.
Now the CBO knows that there are no NULL values for the indexed column in the table; since every value of the indexed column is therefore recorded in the corresponding index, there is no need to go to the table. The CBO chooses an "INDEX FAST FULL SCAN" (index ffs), which uses multiblock reads, and because of that the rows returned by this operation are not ordered by the indexed column value.
SQL> select id from t1;

Execution Plan
----------------------------------------------------------
Plan hash value: 93400700

------------------------------------------------------------------------------
| Id  | Operation            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |       |  1000K|  4882K|   609   (1)| 00:00:08 |
|   1 |  INDEX FAST FULL SCAN| T1_01 |  1000K|  4882K|   609   (1)| 00:00:08 |
------------------------------------------------------------------------------
This time we want the result ordered, and the CBO chooses an "INDEX FULL SCAN", which reads the index keys from the first leaf block to the last in key order and returns them, again without going to the table.
SQL> select id from t1 order by id;

Execution Plan
----------------------------------------------------------
Plan hash value: 2357002434

--------------------------------------------------------------------------
| Id  | Operation        | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT |       |  1000K|  4882K|  2235   (1)| 00:00:27 |
|   1 |  INDEX FULL SCAN | T1_01 |  1000K|  4882K|  2235   (1)| 00:00:27 |
--------------------------------------------------------------------------
This is a very common execution plan: the CBO uses the index to get the rowids matching the criteria and then goes to the table to fetch the matching rows using those rowids.
SQL> select * from t1 where id = 100000;

Execution Plan
----------------------------------------------------------
Plan hash value: 2362378154

-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |     1 |   124 |     4   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1    |     1 |   124 |     4   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T1_01 |     1 |       |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID"=100000)
This time the CBO does the ordering by using the index even though there is no where condition on the indexed column. Since the order by column is an indexed column, and all of its values exist in the index tree thanks to the "NOT NULL" constraint, the CBO reads the rowids of all rows in order from the index and then fetches the associated rows from the table using those rowids.
SQL> select * from t1 order by id;

Execution Plan
----------------------------------------------------------
Plan hash value: 331831787

-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |  1000K|   118M| 20103   (1)| 00:04:02 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1    |  1000K|   118M| 20103   (1)| 00:04:02 |
|   2 |   INDEX FULL SCAN           | T1_01 |  1000K|       |  2235   (1)| 00:00:27 |
-------------------------------------------------------------------------------------
Because of the "NOT NULL" constraint, the CBO knows that every value of the indexed column is in the index tree, so the number of index entries equals the number of rows in the table. It does an "INDEX FAST FULL SCAN" using multiblock reads.
SQL> select count(*) from t1;

Execution Plan
----------------------------------------------------------
Plan hash value: 442041690

-----------------------------------------------------------------------
| Id  | Operation             | Name  | Rows  | Cost (%CPU)| Time     |
-----------------------------------------------------------------------
|   0 | SELECT STATEMENT      |       |     1 |   609   (1)| 00:00:08 |
|   1 |  SORT AGGREGATE       |       |     1 |            |          |
|   2 |   INDEX FAST FULL SCAN| T1_01 |  1000K|   609   (1)| 00:00:08 |
-----------------------------------------------------------------------
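If you want to see why the CBO prefers the index fast full scan here, you can force the alternative with a FULL hint and compare the reported costs; a minimal sketch (the hint is standard syntax, the resulting cost is not shown here):
SQL> select /*+ full(t1) */ count(*) from t1;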
Now make the indexed column unique
SQL> alter table t1 add constraint t1_c_01 unique (id);

Table altered.
As we can see, the CBO still chooses an "INDEX RANGE SCAN": although we made the indexed column UNIQUE, the index itself is still non-unique.
SQL> select * from t1 where id = 100000;

Execution Plan
----------------------------------------------------------
Plan hash value: 2362378154

-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |     1 |   124 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1    |     1 |   124 |     3   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T1_01 |     1 |       |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID"=100000)
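You can confirm from the data dictionary that the index enforcing the constraint is still non-unique; a quick check, assuming the usual USER_INDEXES and USER_CONSTRAINTS columns:
SQL> select index_name, uniqueness from user_indexes where table_name = 'T1';
SQL> select constraint_name, constraint_type, index_name from user_constraints where table_name = 'T1';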
Let's drop and recreate the index as a unique index
SQL> drop index t1_01;
drop index t1_01
           *
ERROR at line 1:
ORA-02429: cannot drop index used for enforcement of unique/primary key


SQL> alter table t1 drop constraint t1_c_01;

Table altered.

SQL> drop index t1_01;

Index dropped.

SQL> create unique index t1_01 on t1(id);

Index created.

SQL> exec dbms_stats.gather_table_stats('','t1',estimate_percent => null,cascade => true,method_opt => 'for all columns size 1');

PL/SQL procedure successfully completed.
Now we see that the CBO decides to do an "INDEX UNIQUE SCAN", which is cheaper than an "INDEX RANGE SCAN" in terms of cost.
SQL> select * from t1 where id = 100000;

Execution Plan
----------------------------------------------------------
Plan hash value: 2407846814

-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |     1 |   124 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1    |     1 |   124 |     3   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | T1_01 |     1 |       |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ID"=100000)
Again the CBO chooses an "INDEX UNIQUE SCAN" without going to the table, since all the indexed column values exist in the index because of the "NOT NULL" constraint on the indexed column.
SQL> select id from t1 where id = 100000;

Execution Plan
----------------------------------------------------------
Plan hash value: 2394460924

---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |     1 |     5 |     2   (0)| 00:00:01 |
|*  1 |  INDEX UNIQUE SCAN| T1_01 |     1 |     5 |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("ID"=100000)
Nothing has changed in this execution plan after making the index unique.
SQL> select * from t1 order by id;

Execution Plan
----------------------------------------------------------
Plan hash value: 331831787

-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |  1000K|   118M| 19965   (1)| 00:04:00 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1    |  1000K|   118M| 19965   (1)| 00:04:00 |
|   2 |   INDEX FULL SCAN           | T1_01 |  1000K|       |  2097   (1)| 00:00:26 |
-------------------------------------------------------------------------------------
Let's see how the CBO chooses to scan the index leaf blocks from the end to the beginning when we ask for descending order, by using an "INDEX FULL SCAN DESCENDING".
SQL> select * from t1 order by id desc;

Execution Plan
----------------------------------------------------------
Plan hash value: 2369582229

-------------------------------------------------------------------------------------
| Id  | Operation                   | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |       |  1000K|   118M| 19965   (1)| 00:04:00 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T1    |  1000K|   118M| 19965   (1)| 00:04:00 |
|   2 |   INDEX FULL SCAN DESCENDING| T1_01 |  1000K|       |  2097   (1)| 00:00:26 |
-------------------------------------------------------------------------------------
Again, if we select only the indexed column, the column values come from the index alone, in descending order, without going to the table.
SQL> select id from t1 order by id desc;

Execution Plan
----------------------------------------------------------
Plan hash value: 1356827296

------------------------------------------------------------------------------------
| Id  | Operation                  | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |       |  1000K|  4882K|  2097   (1)| 00:00:26 |
|   1 |  INDEX FULL SCAN DESCENDING| T1_01 |  1000K|  4882K|  2097   (1)| 00:00:26 |
------------------------------------------------------------------------------------

SQL>

That's all folks.
I hope these demonstrations and comments give you some idea of how the Oracle CBO works and how it decides on execution plans in the presence of indexes.

Thursday, July 14, 2011

Oracle Heterogeneous Services (HS) problem with tables having lots of columns

Today, our Oracle data warehouse team got an error while querying a table with around 650 columns in an AS/400 database through Oracle Heterogeneous Services, or HS for short (it is also called HSODBC).

HSODBC is a component bundled with the Oracle database installation. It enables you to query tables in a non-Oracle database from within an Oracle database as if the remote non-Oracle database were an Oracle database. I am planning to publish another blog post in the future about how to set up HSODBC, which I will not cover today.

In this post, I only want to talk about an error in HSODBC and its solution, since I could not find much information about it when I hit the problem today, and I want to help anybody who runs into the same problem in their HSODBC configuration. The HSODBC version we use is Oracle 10gR2 HSODBC; the HSODBC parameters differ between Oracle versions, so the following solution applies specifically to an Oracle 10gR2 HSODBC configuration. In our case, the Oracle HSODBC server is a separate Windows machine that runs only Oracle HSODBC.

The SQL statements giving the error were like the one below.
select col1,col2,...,col650 from owner.table_name@database_link_using_hsodbc_to_as400_db;
The following SQL statement got the same error as well:
select * from owner.table_name@database_link_using_hsodbc_to_as400_db;

Error :
ORA-28500: connection from ORACLE to a non-Oracle system returned this message:
[Generic Connectivity Using ODBC][A048] The execution tree of this sql text is too big. Try to increase the variable 'firstTreeExtensions' in your environment file.
ORA-02063: preceding line from database_link_using_hsodbc_to_as400_db

The above error indicates that Oracle HSODBC has an internal limitation on the size of the selected data (the column values in the select list) coming from the non-Oracle database. We could select around 520 columns from that table without getting this error, but after adding a few more columns to the select list we started to get it, because we had reached that internal limit on the selected row size. In another document we found that this default maximum size of a row you can select from the non-Oracle database is 150 kilobytes.

So we set out to find how to increase that limit, and after trying different configurations we managed to raise the value of this parameter and got rid of the above error.

Solution :
Just add the following line to the "init<sid_name_of_as400_db_in_listener.ora_file>.ora" file in the "$ORACLE_HOME\hs\admin" folder. With this setting we increase the limit to 1000 KB.

HS_FDS_CONNECT_STRING="<navobj><binding><environment><queryprocessor firstTreeExtensions='1000'></queryProcessor></environment></binding></navobj>"
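For context, this line normally sits alongside the other HS parameters in the same init file. A minimal sketch of how the file might look, assuming "AS400DSN" is the Windows ODBC data source name for the AS/400; the first two lines are illustrative placeholders, and only the last line is the actual fix from this post:

# $ORACLE_HOME\hs\admin\init<sid>.ora  (sketch only)
HS_FDS_CONNECT_INFO = AS400DSN
HS_FDS_TRACE_LEVEL = off
HS_FDS_CONNECT_STRING="<navobj><binding><environment><queryprocessor firstTreeExtensions='1000'></queryProcessor></environment></binding></navobj>"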

Then we need to reload the listener on the HSODBC machine, as shown below, so that our change to the ".ora" configuration takes effect.

In a Windows Command Prompt on the HSODBC machine, do the following to reload the listener.

lsnrctl
LSNRCTL> reload


Now you need to create a new session to the Oracle database: sessions that were connected before these changes were made to the listener on the HSODBC server are not affected by the change. Because of that, it is important to open a new session to the Oracle database in which we will execute the SQL statement querying the big table in the non-Oracle database.
Then try to query all 650 columns of this table in the non-Oracle database, which is AS/400 in our case.

select col1,col2,...,col650 from owner.table_name@database_link_using_hsodbc_to_as400_db;

Finally, you should be able to query all 650 columns of this table successfully.
I hope the same solution helps in your case as well.

Monday, July 4, 2011

Oracle automatically keeps the old, historical table statistics

Starting with version 10g, Oracle automatically keeps old, historical table statistics in the database so that you can use them later if you need them.
If a specific SQL statement starts to perform badly and you realize the reason is that the table's statistics have changed a lot since they were last gathered, you can restore the previous statistics of that table in a very short time, without having to gather them from scratch, which could take a long time.

Below is how you can query and restore those old table statistics.

First connect to a sample schema in the database.
[oracle@localhost uural]$ sqlplus u0001/u0001

SQL*Plus: Release 11.1.0.7.0 - Production on Mon Jul 4 04:03:56 2011

Copyright (c) 1982, 2008, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL>

Then create a sample table; we will collect statistics on it by using the "dbms_stats.gather_table_stats" procedure.
SQL> create table t1 as
  2  select
  3          rownum                          id,
  4          dbms_random.string('U',6)       v1,
  5          lpad(rownum,10)                 v2,
  6          rpad('x',100)                   padding
  7  from
  8          all_objects
  9  where
 10          rownum <= 1000
 11  ;

Table created.
You can query the old statistics information in a database view called "user_tab_stats_history". Since no statistics have been collected on this table yet, the view returns no rows.
SQL> select * from user_tab_stats_history;

no rows selected
Now collect the statistics on the new table.
SQL> begin
  2          dbms_stats.gather_table_stats(
  3                  ownname          => user,
  4                  tabname          => 'T1',
  5                  cascade          => true,
  6                  estimate_percent => null,
  7                  granularity      => 'default',
  8                  method_opt       => 'for all columns size 1'
  9          );
 10  end;
 11  /

PL/SQL procedure successfully completed.
Now check the "num_rows" value for the table in the "user_tables" view.
SQL> select table_name, num_rows from user_tables where table_name = 'T1';

TABLE_NAME                       NUM_ROWS
------------------------------ ----------
T1                                   1000
Check the "user_tab_stats_history" view again.
SQL> set lines 250
SQL> r
  1* select * from user_tab_stats_history

TABLE_NAME                     PARTITION_NAME                 SUBPARTITION_NAME              STATS_UPDATE_TIME
------------------------------ ------------------------------ ------------------------------ ---------------------------------------------------------------------------
T1                                                                                           04-JUL-11 04.10.24.567434 AM -07:00

Now truncate the table and collect the statistics again, and check the "user_tab_stats_history" view.
SQL> truncate table t1;

Table truncated.

SQL> select table_name, num_rows from user_tables where table_name = 'T1';

TABLE_NAME                       NUM_ROWS
------------------------------ ----------
T1                                   1000

SQL> begin
  2          dbms_stats.gather_table_stats(
  3                  ownname          => user,
  4                  tabname          => 'T1',
  5                  cascade          => true,
  6                  estimate_percent => null,
  7                  granularity      => 'default',
  8                  method_opt       => 'for all columns size 1'
  9          );
 10  end;
 11  /

PL/SQL procedure successfully completed.

SQL> select table_name, num_rows from user_tables where table_name = 'T1';

TABLE_NAME                       NUM_ROWS
------------------------------ ----------
T1                                      0

SQL> select * from user_tab_stats_history;

TABLE_NAME                     PARTITION_NAME                 SUBPARTITION_NAME              STATS_UPDATE_TIME
------------------------------ ------------------------------ ------------------------------ ---------------------------------------------------------------------------
T1                                                                                           04-JUL-11 04.10.24.567434 AM -07:00
T1                                                                                           04-JUL-11 04.16.39.947784 AM -07:00

This sample table has only 1000 rows, so it is relatively small, but suppose you have a big table with 100 million rows where gathering statistics takes quite a long time. Imagine this table is truncated and its statistics are refreshed, then the 100 million rows are inserted back, but the new statistics have not been gathered yet, and an SQL statement in the database starts performing badly because the table's statistics are not up to date. In that case, if you still have the old statistics from the time the table had 100 million rows, you can restore them and try to tune the SQL. Below is how you can restore the old statistics. First insert 1000 rows back into the table and check the "num_rows" column in the "user_tables" view.
SQL> insert into t1
  2  select
  3          rownum                          id,
  4          dbms_random.string('U',6)       v1,
  5          lpad(rownum,10)                 v2,
  6          rpad('x',100)                   padding
  7  from
  8          all_objects
  9  where
 10          rownum <= 1000
 11  ;

1000 rows created.

SQL> commit;

Commit complete.

SQL> select count(*) from t1;

  COUNT(*)
----------
      1000

SQL> select table_name, num_rows from user_tables where table_name = 'T1';

TABLE_NAME                       NUM_ROWS
------------------------------ ----------
T1                                      0

Restore the old statistics back.
SQL> begin
  2    dbms_stats.restore_table_stats (ownname         => 'U0001',
  3                                    tabname         => 'T1',
  4                                    as_of_timestamp =>'04-JUL-11 04.10.24.567434 AM -07:00');
  5  end;
  6  /

PL/SQL procedure successfully completed.

SQL> select table_name, num_rows from user_tables where table_name = 'T1';

TABLE_NAME                       NUM_ROWS
------------------------------ ----------
T1                                   1000

SQL> select * from user_tab_stats_history;

TABLE_NAME                     PARTITION_NAME                 SUBPARTITION_NAME              STATS_UPDATE_TIME
------------------------------ ------------------------------ ------------------------------ ---------------------------------------------------------------------------
T1                                                                                           04-JUL-11 04.10.24.567434 AM -07:00
T1                                                                                           04-JUL-11 04.16.39.947784 AM -07:00
T1                                                                                           04-JUL-11 04.31.24.129217 AM -07:00
As you can see, we restored the statistics shown in the first row of the "user_tab_stats_history" output above, and Oracle added a new row indicating that the table's statistics changed after the restore operation. Oracle keeps these old statistics for 31 days and automatically removes entries older than that. You can query this history retention value with the following SQL.
SQL> select dbms_stats.get_stats_history_retention from dual;

GET_STATS_HISTORY_RETENTION
---------------------------
                         31
The function below returns the oldest timestamp for which statistics history is available. You cannot restore statistics to a timestamp older than that.
SQL> select dbms_stats.get_stats_history_availability from dual;

GET_STATS_HISTORY_AVAILABILITY
---------------------------------------------------------------------------
03-JUN-11 12.46.58.682468000 AM -07:00
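Note that restoring is not limited to a single table; DBMS_STATS also provides restore_schema_stats and restore_database_stats procedures that take an as_of_timestamp in the same way. A minimal sketch for the whole U0001 schema (the "one day ago" timestamp is just an illustration, not something taken from this session):
SQL> exec dbms_stats.restore_schema_stats(ownname => 'U0001', as_of_timestamp => systimestamp - interval '1' day);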
Automatic purging is enabled when the STATISTICS_LEVEL parameter is set to TYPICAL or ALL. If automatic purging is disabled, old versions of statistics need to be purged manually using the PURGE_STATS procedure. You can check that parameter as shown below.
[oracle@localhost uural]$ sqlplus / as sysdba

SQL*Plus: Release 11.1.0.7.0 - Production on Mon Jul 4 04:45:47 2011

Copyright (c) 1982, 2008, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options

SQL> sho parameter statistics_level

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
statistics_level                     string      TYPICAL
SQL>
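If you ever need to purge the statistics history manually, for example when automatic purging is disabled, PURGE_STATS removes the statistics saved before the timestamp you pass. A minimal sketch, assuming you want to drop anything older than 31 days:
SQL> exec dbms_stats.purge_stats(before_timestamp => systimestamp - interval '31' day);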
You can change the history retention value as shown below. To run this procedure, you must have the SYSDBA privilege, or both the ANALYZE ANY DICTIONARY and ANALYZE ANY system privileges. The retention parameter is the retention time in days; the stats history will be retained for at least that many days, and the valid range is [1,365000]. The following values have special meanings: 0 means old stats are never saved and the automatic purge deletes all stats history; -1 means the stats history is never purged by the automatic purge; null changes the stats history retention back to the default value.
SQL> select dbms_stats.get_stats_history_retention from dual;

GET_STATS_HISTORY_RETENTION
---------------------------
                         31

SQL> exec dbms_stats.alter_stats_history_retention(retention => 45);

PL/SQL procedure successfully completed.

SQL> select dbms_stats.get_stats_history_retention from dual;

GET_STATS_HISTORY_RETENTION
---------------------------
                         45
Set it back to the default value.
SQL> exec dbms_stats.alter_stats_history_retention(retention => null);

PL/SQL procedure successfully completed.

SQL> select dbms_stats.get_stats_history_retention from dual;

GET_STATS_HISTORY_RETENTION
---------------------------
                         31

SQL>