Views expressed here are solely that of my own. Please make sure that you test the code/queries that appear on my blog before applying them to the production environment.

Friday, July 22, 2011

Why we should put index creation statements at the end of the table transfer scripts

Try to put the index creation statements at the end of the table transfer scritps, after the table data transfer part. It does not only increase the performance of the script in data insert part but also results in a better utilized index structure. Creating the index first and inserting table data causes different index structure to be created than inserting the table data and then creating the index, if your index key is not a unique value and inserted into the table in sequential order. Second option will be most probably faster and will cause the index blocks better utilized.

Below is a demonstration showing that case.

Lets start with creating a sample schema. You can find the contents of the "cre_user-U0001.sql" in my previous blog posts.
[oracle@localhost uural]$ sqlplus / as sysdba
 
SQL*Plus: Release 11.1.0.7.0 - Production on Thu Jul 21 01:02:47 2011
 
Copyright (c) 1982, 2008, Oracle.  All rights reserved.
 

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
 
SQL> @cre_user-U0001
 
User dropped.
 

User created.
 

Grant succeeded.
 

User altered.
 

Grant succeeded.
 

Grant succeeded.
 
Connected.
USER is "U0001"
SQL> set lines 120
SQL> !vi check_index_stats.sql
 
set lines 120
select index_name, blevel, leaf_blocks, distinct_keys, avg_leaf_blocks_per_key, avg_data_blocks_per_key, num_rows, clustering_factor
from user_indexes
where index_name = '&&index_name';
validate index &&index_name;
select name, height, blocks, lf_rows, lf_blks, lf_rows_len, lf_blk_len, br_rows, br_blks, br_rows_len, br_blk_len, del_lf_rows, del_lf_rows_len, distinct_keys, most_repeated_key, b
tree_space, used_space, pct_used, rows_per_key, blks_gets_per_access
from index_stats
where name = '&&index_name';
undefine index_name
--set lines 80
 
~
~

Now create an empty table and an index on its id column.
SQL> create table t1 tablespace ts_no_assm
  2  as
  3  select rownum id
  4  from dual
  5  where null is not null;
 
Table created.
 
SQL> create index t1_01 on t1(id) tablespace ts_no_assm;
 
Index created.

Then insert some 100,000 rows into that table, with id column values are not unique and sequential. You can think of that operation like you are transferring the data from another database table to "t1" table.
SQL> insert into t1
  2  select mod(rownum,1000) id
  3  from dual
  4  connect by level <= 100000;
 
100000 rows created.
 
SQL> commit;
 
Commit complete.
 
SQL> exec dbms_stats.gather_table_stats('','t1',estimate_percent=>null, cascade=>true, method_opt=>'for all columns size 1');
 
PL/SQL procedure successfully completed.

Check the index stats
SQL> @check_index_stats
Enter value for index_name: T1_01
old   3: where index_name = '&&index_name'
new   3: where index_name = 'T1_01'
 
INDEX_NAME                         BLEVEL LEAF_BLOCKS DISTINCT_KEYS AVG_LEAF_BLOCKS_PER_KEY AVG_DATA_BLOCKS_PER_KEY
------------------------------ ---------- ----------- ------------- ----------------------- -----------------------
  NUM_ROWS CLUSTERING_FACTOR
---------- -----------------
T1_01                                   1         252          1000                       1                     100
    100000            100000
 

old   1: validate index &&index_name
new   1: validate index T1_01
 
Index analyzed.
 
old   3: where name = '&&index_name'
new   3: where name = 'T1_01'
 
NAME                               HEIGHT     BLOCKS    LF_ROWS    LF_BLKS LF_ROWS_LEN LF_BLK_LEN    BR_ROWS    BR_BLKS
------------------------------ ---------- ---------- ---------- ---------- ----------- ---------- ---------- ----------
BR_ROWS_LEN BR_BLK_LEN DEL_LF_ROWS DEL_LF_ROWS_LEN DISTINCT_KEYS MOST_REPEATED_KEY BTREE_SPACE USED_SPACE   PCT_USED
----------- ---------- ----------- --------------- ------------- ----------------- ----------- ---------- ----------
ROWS_PER_KEY BLKS_GETS_PER_ACCESS
------------ --------------------
T1_01                                   2        256     100000        252     1489000       7996        251          1
       3314       8028           0               0          1000               100     2023020    1492314         74
         100                 52.5
As you can see in the index stats above, index has grown to 252 leaf blocks which are %74 full on average and index size is around 2 MB.

Now try the second case by first inserting the rows than creating the index.
SQL> drop table t1;
 
Table dropped.

SQL> create table t1 tablespace ts_no_assm
  2  as
  3  select rownum id
  4  from dual
  5  where null is not null;
 
Table created.
 
SQL> insert into t1
  2  select mod(rownum,1000) id
  3  from dual
  4  connect by level <= 100000;
 
100000 rows created.
 
SQL> commit;
 
Commit complete.
 
SQL> create index t1_01 on t1(id) tablespace ts_no_assm;
 
Index created.
 
SQL> exec dbms_stats.gather_table_stats('','t1',estimate_percent=>null, cascade=>true, method_opt=>'for all columns size 1');
 
PL/SQL procedure successfully completed.
 
SQL> @check_index_stats
Enter value for index_name: T1_01
old   3: where index_name = '&&index_name'
new   3: where index_name = 'T1_01'
 
INDEX_NAME                         BLEVEL LEAF_BLOCKS DISTINCT_KEYS AVG_LEAF_BLOCKS_PER_KEY AVG_DATA_BLOCKS_PER_KEY
------------------------------ ---------- ----------- ------------- ----------------------- -----------------------
  NUM_ROWS CLUSTERING_FACTOR
---------- -----------------
T1_01                                   1         208          1000                       1                     100
    100000            100000
 

old   1: validate index &&index_name
new   1: validate index T1_01
 
Index analyzed.
 
old   3: where name = '&&index_name'
new   3: where name = 'T1_01'
 
NAME                               HEIGHT     BLOCKS    LF_ROWS    LF_BLKS LF_ROWS_LEN LF_BLK_LEN    BR_ROWS    BR_BLKS
------------------------------ ---------- ---------- ---------- ---------- ----------- ---------- ---------- ----------
BR_ROWS_LEN BR_BLK_LEN DEL_LF_ROWS DEL_LF_ROWS_LEN DISTINCT_KEYS MOST_REPEATED_KEY BTREE_SPACE USED_SPACE   PCT_USED
----------- ---------- ----------- --------------- ------------- ----------------- ----------- ---------- ----------
ROWS_PER_KEY BLKS_GETS_PER_ACCESS
------------ --------------------
T1_01                                   2        256     100000        208     1489000       7996        207          1
       3066       8028           0               0          1000               100     1671196    1492066         90
         100                 52.5

As you can see now, index has grown to 208 leaf blocks which are %90 full on average and index size is around 1.7 MB.

As a result, putting the index creation statements at the end of a table transfer script, does not only makes the data insert operation faster, since the database will not have to deal with creating the index tree, but also makes the index well organized by utilizing the leaf blocks better with %90 average usage because of 90-10 index block splits.

The reason for this result is, when you first create the index and then insert the rows, the indexed column values are inserted into the table not in sequential order because of the "mod(rownum, 1000)" function because the same value was inserted again for every 1,000 rows causing 50-50 index leaf block splits. This caused the index allocate more blocks for its leaf blocks making the index larger. Although in the second scenario, since all the rows were already inserted to the table, when we created the index on id column, Oracle selected the id column values from table and sort them before inserting into the index tree causing only 90-10 index leaf block splits.

If the id column were unique and inserted into table in sequential order, it would not make much difference creating the index first from the index structure point of view, but anyway insert operation will be faster if you create the index as a last step.

At least, if you can not drop the existing index on an existing table and then create it after the insert operation, you can use the method of making the existing index "UNUSABLE" then "REBUILD" it after the insert operation which will be faster and more effective related the index structure.

In an OLTP system, which the insert, update, delete operations will start to happen after you transferred the table, index structure will soon do some 50-50 index leaf block splits and return to a structure with less utilized leaf blocks, but in a datawarehouse-like database where mostly insert and select statements happen, you can use this method more effectively.

No comments: