Alexander Anokhin

_serial_direct_read is not a criteria for cursor [non-]sharing

October 22, 2018 Leave a comment

Those who are on Exadata and change _SERIAL_DIRECT_READ on session or system level may be interested in the issue described below. The issue exists in all versions from 11g and above.
Read more…

filter IS NOT NULL

November 16, 2013 14 comments

Introduction
Although Oracle is smart enough it is always helpful to double check. Even in cases where you expect that Oracle aware about nulls and it will apply necessary filters itself, result can be unexpected.
Below are several cases where you would get a benefit from manually added filter which could look unnecessary and redundant.
Read more…

Timing: rowsource statistics. Part 2: Overhead and inconsistent time

December 24, 2012 8 comments

Timing: query execution statistics (rowsource statistics). Part 1: How it works

December 24, 2012 6 comments

This is the first (but not last) blogpost of my explanation of timing. This stuff is about timing in query execution statistics which also called rowsource statistics.

Here I am explaining

why reported actual time (“A-Time” or “time=” in sql trace) can be inconsistent,
why execution time with statistics_level=all can be longer,
and how exactly it works.

This stuff covers versions 10g and later if another is not mentioned.

Merry Christmas and Happy New Year 2013! 😉

The Digger: additional notes. DTrace output can be shufled in multi-CPU environment.

November 5, 2012 Leave a comment

Below are important additions about the Digger tool.

1. DTrace output can be shufled in multi-CPU environment.

This means that output can be in not chronological order. It is not something Digger specific, it is how DTrace works.

When DTrace script is being executed there are two parts to DTrace: the in kernel part and the DTrace script process in userland. When a dtrace probe fires, the data traced is placed in a per CPU buffer. Then, periodically, the DTrace script reads the buffers (in round-robin style) and continues processing the data for final output. While the data from any single CPU was entered into its buffer in order the probes are firing asynchronously with respect to all the CPUs.

If probe1 fired on CPU3 and after that probe2 fired on CPU2, and after that probe3 fired on CPU1 it is possible to get output as, for example, as following:
probe3
probe2
probe1

The most common case where we can see this are the cases when a process(thread) migrated from one CPU to another and DTrace consumer reads and prints current CPU-buffer before previous. It does not happen always but sometimes you can see it.

So, do not panic if some part of the output is not chronological in multi-CPU environment. It is expected.

If the output order is important, then it is strongly recommended to print such columns as timestamp or relative timestamp in order to be sure that the output is chronological, otherwise to fix it manually or using tools like sort (1). Column CPU# also is helpful because allows to see places where a process(thread) migrated from one CPU to another.

2. How much is performance impact of the Digger?

It depends on the usage. When you trace everything – all application functions or all library calls – performance impact can be significant and amount of printed info can be huge.

The Digger allows to restrict
a) traced area – where tracing is enabled
b) traced contents – which functions are traced

For example, in order to trace kcb* functions and all syscalls *read* inside function qertbFetch
command line should be:

digger.sh -p PID -Fcdeo -f qertbFetch -t kcb* -s *read*

Then generated dtrace script will be containing only required probes:

pid$target:a.out:qertbFetch:entry
pid$target:a.out:qertbFetch:return
+
pid$target:a.out:kcb*:entry
pid$target:a.out:kcb*:return
+
syscall::*read*:entry
syscall::*read*:entry

Note: “a.out” is synonym of executing binary.

It helps to trace only relevant information, only required functions. It decreases performance impact and decreases probability of drops.
So, if you need to decrease impact of the tracing just try to restrict traced area and contents.

3. Important bug is fixed.
There was an issue related with replacement tab symbols on spaces during generating html wordpress page with source code (thanks Zhenx Li for let me know). The issue led to possible error “syntax error at line n: `end of file’ unexpected” when the source of the tool was copied/pasted. Now links on html pages with source are replaced on direct download links.

Buffer is pinned count

July 26, 2012 2 comments

Introduction

There are many cases where Oracle revisit some buffer in the buffer cache many times inside one database call. It such cases it can pin the buffer and hold the buffer pinned and just read pinned buffer in consequences visits. It allows to avoid redundant logical reads.

There are statistics “buffer is pinned count” and “buffer is not pinned count”.
The concept is simple Read more…

Digger: the tool for tracing of unix processes

July 26, 2012 Leave a comment

I would like to introduce new tool – Digger.
This tool allows to see tree of process’ calls such as application calls, library, system calls (and even kernel functions and OS scheduler actions) with additional information as function arguments, result, cpu & elapsed time.
This is not something Oracle specific, the tool can be used for tracing of any unix process (DTrace is required).

Continue…

Don’t forget about column projection

July 18, 2012 6 comments

Note: this post is not about some exact statement, but about importance of column projection which should not be ignored, especially in cases as operations requiring workareas, data access optimization, Exadata offloading and others.

Let’s consider merge of two simple tables.

 merge into t1
 using t2 on (t1.id = t2.id)
  when matched 
  then update set n = 1;

The tables:

SQL> select * from v$version where rownum = 1;
 
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production

SQL > create table t1 compress
  2             as
  3         select rownum as id,
  4                0 as n,
  5                lpad('*', 4000, '*') as pad
  6           from dual
  7        connect by level <= 1000000;

Table created.

SQL > create table t2 compress
  2             as
  3         select 1000000 + rownum as id,
  4                1 as n,
  5                lpad('*', 4000, '*') as pad
  6           from dual
  7        connect by level <= 1100000;

Table created.

SQL > exec dbms_stats.gather_table_stats(user, 'T1', estimate_percent => 100, degree => 4);

PL/SQL procedure successfully completed.

SQL > exec dbms_stats.gather_table_stats(user, 'T2', estimate_percent => 100, degree => 4);

PL/SQL procedure successfully completed.

SQL > alter session set statistics_level=all;

Session altered.

And a simple statement:

SQL > merge into t1
  2   using t2 on (t1.id = t2.id)
  3    when matched 
  4    then update set n = 1;

0 rows merged.

Elapsed: 00:08:41.93

Let’s look at the execution plan with rowsource statistics:

SQL > select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));

-------------------------------------
SQL_ID  b5cp092vum9nw, child number 0
-------------------------------------
merge into t1 using t2 on (t1.id = t2.id)  when matched then update set n = 1
Plan hash value: 3423882595
----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem | Used-Tmp|
----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | MERGE STATEMENT      |      |      1 |        |      0 |00:08:41.66 |   52742 |    608K|    556K|       |       |          |         |
|   1 |  MERGE               | T1   |      1 |        |      0 |00:08:41.66 |   52742 |    608K|    556K|       |       |          |         |
|   2 |   VIEW               |      |      1 |        |      0 |00:08:41.66 |   52742 |    608K|    556K|       |       |          |         |
|*  3 |    HASH JOIN         |      |      1 |      1 |      0 |00:08:41.66 |   52742 |    608K|    556K|  2047M|    56M|   30M (1)|    4346K|
|   4 |     TABLE ACCESS FULL| T1   |      1 |   1000K|   1000K|00:00:05.96 |   22174 |  22163 |      0 |       |       |          |         |
|   5 |     TABLE ACCESS FULL| T2   |      1 |   1100K|   1100K|00:00:05.58 |   30568 |  30556 |      0 |       |       |          |         |
----------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."ID"="T2"."ID")

So, our statement is executing more than 8 minutes. All this time has been consumed on the step 3 – HASH JOIN. Notice amount of used temp space 4346 Mb (this is old oddity that value in Used-Temp should be multiplied by 1000) as a result high amount of I/O operations on temp space (columns Reads, Writes – 556K of physical reads and 556K physical writes via direct path read temp and direct path write temp) due to size of workarea (hash_area_size) was not enough (Used-Mem = 30Mb).
Note: obviously in your case it can be absolutely another time and amount of I/O, in particular it depends on hash area size, in my case it was about 30Mb

How can we improve the performance of the statement and especially hash join?
Should we increase hash area size?

May be, but before let’s take look at the same execution plan with column projection:

SQL> select * from table(dbms_xplan.display_cursor('b5cp092vum9nw', null, 'allstats last +projection'));
 
-------------------------------------
SQL_ID  b5cp092vum9nw, child number 0
-------------------------------------
merge into t1 using t2 on (t1.id = t2.id)  when matched then update set n = 1
Plan hash value: 3423882595
----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem | Used-Tmp|
----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | MERGE STATEMENT      |      |      1 |        |      0 |00:08:41.66 |   52742 |    608K|    556K|       |       |          |         |
|   1 |  MERGE               | T1   |      1 |        |      0 |00:08:41.66 |   52742 |    608K|    556K|       |       |          |         |
|   2 |   VIEW               |      |      1 |        |      0 |00:08:41.66 |   52742 |    608K|    556K|       |       |          |         |
|*  3 |    HASH JOIN         |      |      1 |      1 |      0 |00:08:41.66 |   52742 |    608K|    556K|  2047M|    56M|   30M (1)|    4346K|
|   4 |     TABLE ACCESS FULL| T1   |      1 |   1000K|   1000K|00:00:05.96 |   22174 |  22163 |      0 |       |       |          |         |
|   5 |     TABLE ACCESS FULL| T2   |      1 |   1100K|   1100K|00:00:05.58 |   30568 |  30556 |      0 |       |       |          |         |
----------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."ID"="T2"."ID")
Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - SYSDEF[4], SYSDEF[32720], SYSDEF[1], SYSDEF[96], SYSDEF[32720]
   3 - (#keys=1) "T1"."ID"[NUMBER,22], "T2"."ID"[NUMBER,22], "T1".ROWID[ROWID,10], "T1"."PAD"[VARCHAR2,4000], "T1"."N"[NUMBER,22], 
       "T2"."PAD"[VARCHAR2,4000], "T2"."N"[NUMBER,22]
   4 - "T1".ROWID[ROWID,10], "T1"."ID"[NUMBER,22], "T1"."N"[NUMBER,22], "T1"."PAD"[VARCHAR2,4000]
   5 - "T2"."ID"[NUMBER,22], "T2"."N"[NUMBER,22], "T2"."PAD"[VARCHAR2,4000]

Pay attention on column projection of the steps 3, 4, 5. It contains ALL columns from the tables T1 and T2 although the query uses only some of them. The part of the execution plan below of the step 2 is equivalent to

select t1.rowid,
       t1.id,
       t1.n,
       t1.pad,
       t2.id,
       t2.n,
       t2.pad
  from t1, t2
 where t1.id = t2.id

It looks like a bug or “not implemented yet”. In our case it means that all columns from the table T1 (especially the fat column PAD) will be retrieved and stored in the workarea.

This can dramatically affect the performance:

For operations using workareas it means increased amount of dataset. If required amount of available memory is not enough then it leads to additional I/O in temp space.
In our particular case all columns from the table T1 are retrieved and put into the workarea. If the condition in the statement would be something like

t1.id between t2.id and t2.id + 100

then Hash Join would be impossible, because it can be based on “equality” condition. In that case we can expect Merge Join and then both datasests must be sorted.

SQL> explain plan for
  2  merge into t1
  3  using t2
  4  on (t1.id between t2.id and t2.id + 100)
  5  when matched then
  6      update set n = 1;
 
Explained
 
SQL> @plan
 
---------------------------------------------------------------------------------------
| Id  | Operation              | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | MERGE STATEMENT        |      |     1 |     2 |       |  1766K  (1)| 05:53:21 |
|   1 |  MERGE                 | T1   |       |       |       |            |          |
|   2 |   VIEW                 |      |       |       |       |            |          |
|   3 |    MERGE JOIN          |      |     1 |  8018 |       |  1766K  (1)| 05:53:21 |
|   4 |     SORT JOIN          |      |  1000K|  3822M|  7812M|   840K  (1)| 02:48:06 |
|   5 |      TABLE ACCESS FULL | T1   |  1000K|  3822M|       |  4928   (1)| 00:01:00 |
|*  6 |     FILTER             |      |       |       |       |            |          |
|*  7 |      SORT JOIN         |      |  1100K|  4206M|  8593M|   926K  (1)| 03:05:16 |
|   8 |       TABLE ACCESS FULL| T2   |  1100K|  4206M|       |  6779   (1)| 00:01:22 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   6 - filter("T1"."ID"<="T2"."ID"+100)
   7 - access(INTERNAL_FUNCTION("T1"."ID")>=INTERNAL_FUNCTION("T2"."ID"))
       filter(INTERNAL_FUNCTION("T1"."ID")>=INTERNAL_FUNCTION("T2"."ID"))

It affects data access optimization. If there would be indexes containing all columns using in a query as t1(id, n) or t2(id,n), then Index Fast Full Scan will be impossible here because the indexes do not contain required redundant columns (as PAD).
If there would be a condition doing Index Range Scan appropriate access path then TABLE ACCESS BY INDEX ROWID would exist to get rest of unnecessary columns.
If a row is chained then it leads to additional logical (and maybe physical) I/O to read redundant columns
If the query would be run on Exadata and one or both Full Table Scan would be offloading, then there would no Column Projection optimization, returned amount of data from Storage Cells to PGA would be redundant and contains unnecessary columns from a table.
etc

It is what can happen if we ignore Column Projection.

Our particular case can be fixed easy

SQL >  merge into (select id, n from t1) t1
  2    using (select id, n from t2) t2 on (t1.id = t2.id)
  3     when matched
  4     then update set n = 1;

0 rows merged.

Elapsed: 00:00:20.09

upd note: (select id, n from t2) is redundant here. It could be (select id from t2).

The execution plan with rowsurce statistics and column projection:

----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem | Used-Tmp|
----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | MERGE STATEMENT      |      |      1 |        |      0 |00:00:19.83 |   52742 |  35651 |   2032 |       |       |          |         |
|   1 |  MERGE               | T1   |      1 |        |      0 |00:00:19.83 |   52742 |  35651 |   2032 |       |       |          |         |
|   2 |   VIEW               |      |      1 |        |      0 |00:00:19.83 |   52742 |  35651 |   2032 |       |       |          |         |
|*  3 |    HASH JOIN         |      |      1 |      1 |      0 |00:00:19.83 |   52742 |  35651 |   2032 |    36M|  6589K|   34M (1)|   16384 |
|   4 |     TABLE ACCESS FULL| T2   |      1 |   1100K|   1100K|00:00:11.77 |   30568 |  28841 |      0 |       |       |          |         |
|   5 |     TABLE ACCESS FULL| T1   |      1 |   1000K|   1000K|00:00:04.44 |   22174 |   4778 |      0 |       |       |          |         |
----------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("ID"="ID")
Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - SYSDEF[4], SYSDEF[32720], SYSDEF[1], SYSDEF[96], SYSDEF[32720]
   3 - (#keys=1) "ID"[NUMBER,22], "ID"[NUMBER,22], "N"[NUMBER,22], "T1".ROWID[ROWID,10], "N"[NUMBER,22]
   4 - "ID"[NUMBER,22], "N"[NUMBER,22]
   5 - "T1".ROWID[ROWID,10], "ID"[NUMBER,22], "N"[NUMBER,22]

Now column projection does not contain unnecessary columns, amount of required memory and as a result amount of temp space (Used-Tmp ~ 16Mb) and I/O (2032 physical reads and 2032 physical writes via direct path read temp and direct path write temp) was significantly reduced, and the statement was executed about 20 seconds instead of 8 minutes.

Notice, that in this case Oracle takes the table T2 as build input. It is because RowID from the table T1 is required to perform MERGE.
The part of the execution plan below of the step 2 now looks like

select t1.id, 
       t1.n, 
       t1.rowid, 
       t2.id, 
       t2.n
  from t1, t2
 where t1.id = t2.id

It estimates that amount of resultset

1100K rows * ("ID"[NUMBER,22], "N"[NUMBER,22])

less than

1000K rows * ("T1".ROWID[ROWID,10], "ID"[NUMBER,22], "N"[NUMBER,22])

and that resultset requires less amount of workarea.
Thus it is reasonable to use T2 as build input here.

A small change in parallel insert with serial data access between 11.1 and 11.2

June 22, 2012 5 comments

There is an interesting small change in parallel insert between 11.1 and 11.2 in queries with parallelized insert, but serialized data access. On a practice the simplest case where we can see this it is CTAS or parallel insert as select from remote table. Read more…

Serial direct read for small tables in 11.2.0.2

May 21, 2012 7 comments

Today I have fixed an issue related with serial direct path reads.

There is 11.2.0.2 database

SQL> select * from v$version where rownum = 1;
 
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
 
SQL> SELECT a.ksppinm  "Parameter",
  2         b.ksppstvl "Session Value",
  3         c.ksppstvl "Instance Value"
  4    FROM x$ksppi a, x$ksppcv b, x$ksppsv c
  5   WHERE a.indx = b.indx
  6     AND a.indx = c.indx
  7     AND a.ksppinm = '_serial_direct_read';
 
Parameter               Session Value           Instance Value
----------------------- ----------------------- -----------------------
_serial_direct_read     auto                    auto

There is a very small table, let’s call it dualcopy.

SQL> exec dbms_stats.gather_table_stats(user, 'dualcopy', estimate_percent => 100);
 
PL/SQL procedure successfully completed
 
SQL> select object_type,
  2         num_rows,
  3         blocks,
  4         empty_blocks
  5    from all_tab_statistics
  6   where table_name = 'DUALCOPY';
 
OBJECT_TYPE    NUM_ROWS     BLOCKS EMPTY_BLOCKS
------------ ---------- ---------- ------------
TABLE                 1          1            0

Let’s try to do select from the table with enabled 10046 event:

SQL> alter session set events '10046 trace name context forever, level 8';
 
Session altered

SQL> select * from dualcopy;
 
DUMMY
-----
X

Below is an excerpt from the trace file

PARSING IN CURSOR #18446744071497556472 len=24 dep=0 uid=2500 oct=3 lid=2500 tim=3988723224325 hv=250290826 ad='3e911e388' sqlid='g32a9w87fq8na'
select * from dualcopy
END OF STMT
PARSE #18446744071497556472:c=20000,e=26430,p=0,cr=27,cu=0,mis=1,r=0,dep=0,og=2,plh=769194902,tim=3988723224324
EXEC #18446744071497556472:c=0,e=21,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=2,plh=769194902,tim=3988723224417
WAIT #18446744071497556472: nam='SQL*Net message to client' ela= 4 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=3988723224541
WAIT #18446744071497556472: nam='SQL*Net message from client' ela= 66047 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=3988723290684
WAIT #18446744071497556472: nam='direct path read' ela= 7852 file number=705 first dba=23954 block cnt=1 obj#=2229858 tim=3988723301159
WAIT #18446744071497556472: nam='SQL*Net message to client' ela= 5 driver id=1413697536 #bytes=1 p3=0 obj#=2229858 tim=3988723301360
FETCH #18446744071497556472:c=10000,e=10521,p=1,cr=2,cu=0,mis=0,r=1,dep=0,og=2,plh=769194902,tim=3988723301437
STAT #18446744071497556472 id=1 cnt=1 pid=0 pos=1 obj=2229858 op='TABLE ACCESS FULL DUALCOPY (cr=2 pr=1 pw=0 time=10419 us cost=2 size=2 card=1)'

What?! serial direct path read for the table in one block?! What’s going on?

The table can be small, but if your application accesses it many times it can lead to significant amount of physical I/O.

Serial direct read mode (_serial_direct_read) is a property of a child cursor. Similar behavior is possible if the same query was parsed earlier in a session with ALWAYS or TRUE mode (_serial_direct_read=always or _serial_direct_read=true). If we pay attention on the parse call, it was a hard parsing (mis=1) and new child cursor. Thus, this is not the reason in our case.

The reason is that the table has been configured for using of KEEP pool, but database has not.

SQL> select table_name, buffer_pool
  2    from all_tables
  3   where table_name = 'DUALCOPY';
 
TABLE_NAME                     BUFFER_POOL
------------------------------ -----------
DUALCOPY                       KEEP

SQL> show parameter db_keep_cache_size
 
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_keep_cache_size                   big integer 0

This is the bug 12530276. Oracle 11.2.0.2 treats all ‘keep objects’ as large objects when ‘keep pool’ is not configured.

Allocation of KEEP pool fixes the problem.

Older Entries