Home > Oracle > Don’t forget about column projection

Don’t forget about column projection

Note: this post is not about some exact statement, but about importance of column projection which should not be ignored, especially in cases as operations requiring workareas, data access optimization, Exadata offloading and others.

Let’s consider merge of two simple tables.

 merge into t1
 using t2 on (t1.id = t2.id)
  when matched 
  then update set n = 1;

The tables:

SQL> select * from v$version where rownum = 1;
 
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production

SQL > create table t1 compress
  2             as
  3         select rownum as id,
  4                0 as n,
  5                lpad('*', 4000, '*') as pad
  6           from dual
  7        connect by level <= 1000000;

Table created.

SQL > create table t2 compress
  2             as
  3         select 1000000 + rownum as id,
  4                1 as n,
  5                lpad('*', 4000, '*') as pad
  6           from dual
  7        connect by level <= 1100000;

Table created.

SQL > exec dbms_stats.gather_table_stats(user, 'T1', estimate_percent => 100, degree => 4);

PL/SQL procedure successfully completed.

SQL > exec dbms_stats.gather_table_stats(user, 'T2', estimate_percent => 100, degree => 4);

PL/SQL procedure successfully completed.

SQL > alter session set statistics_level=all;

Session altered.

And a simple statement:

SQL > merge into t1
  2   using t2 on (t1.id = t2.id)
  3    when matched 
  4    then update set n = 1;

0 rows merged.

Elapsed: 00:08:41.93

Let’s look at the execution plan with rowsource statistics:

SQL > select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));

-------------------------------------
SQL_ID  b5cp092vum9nw, child number 0
-------------------------------------
merge into t1 using t2 on (t1.id = t2.id)  when matched then update set n = 1
Plan hash value: 3423882595
----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem | Used-Tmp|
----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | MERGE STATEMENT      |      |      1 |        |      0 |00:08:41.66 |   52742 |    608K|    556K|       |       |          |         |
|   1 |  MERGE               | T1   |      1 |        |      0 |00:08:41.66 |   52742 |    608K|    556K|       |       |          |         |
|   2 |   VIEW               |      |      1 |        |      0 |00:08:41.66 |   52742 |    608K|    556K|       |       |          |         |
|*  3 |    HASH JOIN         |      |      1 |      1 |      0 |00:08:41.66 |   52742 |    608K|    556K|  2047M|    56M|   30M (1)|    4346K|
|   4 |     TABLE ACCESS FULL| T1   |      1 |   1000K|   1000K|00:00:05.96 |   22174 |  22163 |      0 |       |       |          |         |
|   5 |     TABLE ACCESS FULL| T2   |      1 |   1100K|   1100K|00:00:05.58 |   30568 |  30556 |      0 |       |       |          |         |
----------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."ID"="T2"."ID")

So, our statement is executing more than 8 minutes. All this time has been consumed on the step 3 – HASH JOIN. Notice amount of used temp space 4346 Mb (this is old oddity that value in Used-Temp should be multiplied by 1000) as a result high amount of I/O operations on temp space (columns Reads, Writes – 556K of physical reads and 556K physical writes via direct path read temp and direct path write temp) due to size of workarea (hash_area_size) was not enough (Used-Mem = 30Mb).
Note: obviously in your case it can be absolutely another time and amount of I/O, in particular it depends on hash area size, in my case it was about 30Mb

How can we improve the performance of the statement and especially hash join?
Should we increase hash area size?

May be, but before let’s take look at the same execution plan with column projection:

SQL> select * from table(dbms_xplan.display_cursor('b5cp092vum9nw', null, 'allstats last +projection'));
 
-------------------------------------
SQL_ID  b5cp092vum9nw, child number 0
-------------------------------------
merge into t1 using t2 on (t1.id = t2.id)  when matched then update set n = 1
Plan hash value: 3423882595
----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem | Used-Tmp|
----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | MERGE STATEMENT      |      |      1 |        |      0 |00:08:41.66 |   52742 |    608K|    556K|       |       |          |         |
|   1 |  MERGE               | T1   |      1 |        |      0 |00:08:41.66 |   52742 |    608K|    556K|       |       |          |         |
|   2 |   VIEW               |      |      1 |        |      0 |00:08:41.66 |   52742 |    608K|    556K|       |       |          |         |
|*  3 |    HASH JOIN         |      |      1 |      1 |      0 |00:08:41.66 |   52742 |    608K|    556K|  2047M|    56M|   30M (1)|    4346K|
|   4 |     TABLE ACCESS FULL| T1   |      1 |   1000K|   1000K|00:00:05.96 |   22174 |  22163 |      0 |       |       |          |         |
|   5 |     TABLE ACCESS FULL| T2   |      1 |   1100K|   1100K|00:00:05.58 |   30568 |  30556 |      0 |       |       |          |         |
----------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."ID"="T2"."ID")
Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - SYSDEF[4], SYSDEF[32720], SYSDEF[1], SYSDEF[96], SYSDEF[32720]
   3 - (#keys=1) "T1"."ID"[NUMBER,22], "T2"."ID"[NUMBER,22], "T1".ROWID[ROWID,10], "T1"."PAD"[VARCHAR2,4000], "T1"."N"[NUMBER,22], 
       "T2"."PAD"[VARCHAR2,4000], "T2"."N"[NUMBER,22]
   4 - "T1".ROWID[ROWID,10], "T1"."ID"[NUMBER,22], "T1"."N"[NUMBER,22], "T1"."PAD"[VARCHAR2,4000]
   5 - "T2"."ID"[NUMBER,22], "T2"."N"[NUMBER,22], "T2"."PAD"[VARCHAR2,4000]

Pay attention on column projection of the steps 3, 4, 5. It contains ALL columns from the tables T1 and T2 although the query uses only some of them. The part of the execution plan below of the step 2 is equivalent to

select t1.rowid,
       t1.id,
       t1.n,
       t1.pad,
       t2.id,
       t2.n,
       t2.pad
  from t1, t2
 where t1.id = t2.id

It looks like a bug or “not implemented yet”. In our case it means that all columns from the table T1 (especially the fat column PAD) will be retrieved and stored in the workarea.

This can dramatically affect the performance:

  • For operations using workareas it means increased amount of dataset. If required amount of available memory is not enough then it leads to additional I/O in temp space.
    In our particular case all columns from the table T1 are retrieved and put into the workarea. If the condition in the statement would be something like

    t1.id between t2.id and t2.id + 100
    

    then Hash Join would be impossible, because it can be based on “equality” condition. In that case we can expect Merge Join and then both datasests must be sorted.

    SQL> explain plan for
      2  merge into t1
      3  using t2
      4  on (t1.id between t2.id and t2.id + 100)
      5  when matched then
      6      update set n = 1;
     
    Explained
     
    SQL> @plan
     
    ---------------------------------------------------------------------------------------
    | Id  | Operation              | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
    ---------------------------------------------------------------------------------------
    |   0 | MERGE STATEMENT        |      |     1 |     2 |       |  1766K  (1)| 05:53:21 |
    |   1 |  MERGE                 | T1   |       |       |       |            |          |
    |   2 |   VIEW                 |      |       |       |       |            |          |
    |   3 |    MERGE JOIN          |      |     1 |  8018 |       |  1766K  (1)| 05:53:21 |
    |   4 |     SORT JOIN          |      |  1000K|  3822M|  7812M|   840K  (1)| 02:48:06 |
    |   5 |      TABLE ACCESS FULL | T1   |  1000K|  3822M|       |  4928   (1)| 00:01:00 |
    |*  6 |     FILTER             |      |       |       |       |            |          |
    |*  7 |      SORT JOIN         |      |  1100K|  4206M|  8593M|   926K  (1)| 03:05:16 |
    |   8 |       TABLE ACCESS FULL| T2   |  1100K|  4206M|       |  6779   (1)| 00:01:22 |
    ---------------------------------------------------------------------------------------
    Predicate Information (identified by operation id):
    ---------------------------------------------------
       6 - filter("T1"."ID"<="T2"."ID"+100)
       7 - access(INTERNAL_FUNCTION("T1"."ID")>=INTERNAL_FUNCTION("T2"."ID"))
           filter(INTERNAL_FUNCTION("T1"."ID")>=INTERNAL_FUNCTION("T2"."ID"))
    
  • It affects data access optimization. If there would be indexes containing all columns using in a query as t1(id, n) or t2(id,n), then Index Fast Full Scan will be impossible here because the indexes do not contain required redundant columns (as PAD).
    If there would be a condition doing Index Range Scan appropriate access path then TABLE ACCESS BY INDEX ROWID would exist to get rest of unnecessary columns.
  • If a row is chained then it leads to additional logical (and maybe physical) I/O to read redundant columns
  • If the query would be run on Exadata and one or both Full Table Scan would be offloading, then there would no Column Projection optimization, returned amount of data from Storage Cells to PGA would be redundant and contains unnecessary columns from a table.
  • etc
  • It is what can happen if we ignore Column Projection.

    Our particular case can be fixed easy

    SQL >  merge into (select id, n from t1) t1
      2    using (select id, n from t2) t2 on (t1.id = t2.id)
      3     when matched
      4     then update set n = 1;
    
    0 rows merged.
    
    Elapsed: 00:00:20.09
    

    upd note: (select id, n from t2) is redundant here. It could be (select id from t2).

    The execution plan with rowsurce statistics and column projection:

    ----------------------------------------------------------------------------------------------------------------------------------------------
    | Id  | Operation            | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem | Used-Tmp|
    ----------------------------------------------------------------------------------------------------------------------------------------------
    |   0 | MERGE STATEMENT      |      |      1 |        |      0 |00:00:19.83 |   52742 |  35651 |   2032 |       |       |          |         |
    |   1 |  MERGE               | T1   |      1 |        |      0 |00:00:19.83 |   52742 |  35651 |   2032 |       |       |          |         |
    |   2 |   VIEW               |      |      1 |        |      0 |00:00:19.83 |   52742 |  35651 |   2032 |       |       |          |         |
    |*  3 |    HASH JOIN         |      |      1 |      1 |      0 |00:00:19.83 |   52742 |  35651 |   2032 |    36M|  6589K|   34M (1)|   16384 |
    |   4 |     TABLE ACCESS FULL| T2   |      1 |   1100K|   1100K|00:00:11.77 |   30568 |  28841 |      0 |       |       |          |         |
    |   5 |     TABLE ACCESS FULL| T1   |      1 |   1000K|   1000K|00:00:04.44 |   22174 |   4778 |      0 |       |       |          |         |
    ----------------------------------------------------------------------------------------------------------------------------------------------
    Predicate Information (identified by operation id):
    ---------------------------------------------------
       3 - access("ID"="ID")
    Column Projection Information (identified by operation id):
    -----------------------------------------------------------
       1 - SYSDEF[4], SYSDEF[32720], SYSDEF[1], SYSDEF[96], SYSDEF[32720]
       3 - (#keys=1) "ID"[NUMBER,22], "ID"[NUMBER,22], "N"[NUMBER,22], "T1".ROWID[ROWID,10], "N"[NUMBER,22]
       4 - "ID"[NUMBER,22], "N"[NUMBER,22]
       5 - "T1".ROWID[ROWID,10], "ID"[NUMBER,22], "N"[NUMBER,22]
    

    Now column projection does not contain unnecessary columns, amount of required memory and as a result amount of temp space (Used-Tmp ~ 16Mb) and I/O (2032 physical reads and 2032 physical writes via direct path read temp and direct path write temp) was significantly reduced, and the statement was executed about 20 seconds instead of 8 minutes.

    Notice, that in this case Oracle takes the table T2 as build input. It is because RowID from the table T1 is required to perform MERGE.
    The part of the execution plan below of the step 2 now looks like

    select t1.id, 
           t1.n, 
           t1.rowid, 
           t2.id, 
           t2.n
      from t1, t2
     where t1.id = t2.id
    

    It estimates that amount of resultset

    1100K rows * ("ID"[NUMBER,22], "N"[NUMBER,22])
    

    less than

    1000K rows * ("T1".ROWID[ROWID,10], "ID"[NUMBER,22], "N"[NUMBER,22])
    

    and that resultset requires less amount of workarea.
    Thus it is reasonable to use T2 as build input here.

    About these ads
  1. July 18, 2012 at 6:27 pm

    Very interesting. We use quite a few merge statements in our app. I’ll have to play around with this to see if we can get some performance gains.
    Have you tested this against any other database versions?

    • July 18, 2012 at 10:33 pm

      Hi Christoph,
      thanks for visiting my blog.

      The problem with merge is reproduced in 10.2 and cannot be fixed the same fashion as in 11.2, but can be fixed partially:

      SQL> select * from v$version where rownum = 1;
       
      BANNER
      ----------------------------------------------------------------
      Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi
      
      SQL> merge into (select id, n from t1) t1
        2  using t2 on (t1.id = t2.id)
        3   when matched
        4   then update set n = 1;
       
      merge into (select id, n from t1) t1
      using t2 on (t1.id = t2.id)
       when matched
       then update set n = 1
       
      ORA-00903: invalid table name
      
      SQL> explain plan for
        2  merge into t1
        3       using (select id, n from t2) t2 on (t1.id = t2.id)
        4        when matched
        5        then update set n = 1;
       
      Explained
       
      SQL> select * from table(dbms_xplan.display(null,null,'basic +projection'));
       
      -------------------------------------
      | Id  | Operation            | Name |
      -------------------------------------
      |   0 | MERGE STATEMENT      |      |
      |   1 |  MERGE               | T1   |
      |   2 |   VIEW               |      |
      |   3 |    HASH JOIN         |      |
      |   4 |     TABLE ACCESS FULL| T2   |
      |   5 |     TABLE ACCESS FULL| T1   |
      -------------------------------------
      Column Projection Information (identified by operation id):
      -----------------------------------------------------------
         1 - SYSDEF[4], SYSDEF[32720], SYSDEF[1], SYSDEF[76], SYSDEF[32720]
         3 - (#keys=1) "ID"[NUMBER,22], "T1"."ID"[NUMBER,22], "N"[NUMBER,22],
             "T1".ROWID[ROWID,10], "T1"."PAD"[VARCHAR2,4000], "T1"."N"[NUMBER,22]
         4 - "ID"[NUMBER,22], "N"[NUMBER,22]
         5 - "T1".ROWID[ROWID,10], "T1"."ID"[NUMBER,22], "T1"."N"[NUMBER,22],
             "T1"."PAD"[VARCHAR2,4000]
      

      In the second case amount of data from T2 is less than from T1, as the result in this exact case Oracle uses T2 as probe input for hash join.

      UPD:
      (select id, n from t2) is redundant here. It could be
      (select id from t2).

  2. Igor Usoltsev
    September 12, 2012 at 9:19 pm

    Thanks, Alexader
    very interesting and useful finding!

  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 80 other followers

%d bloggers like this: