Archive
Timing: query execution statistics (rowsource statistics). Part 1: How it works
This is the first (but not last) blogpost of my explanation of timing. This stuff is about timing in query execution statistics which also called rowsource statistics.
Here I am explaining
- why reported actual time (“A-Time” or “time=” in sql trace) can be inconsistent,
- why execution time with statistics_level=all can be longer,
- and how exactly it works.
This stuff covers versions 10g and later if another is not mentioned.
Merry Christmas and Happy New Year 2013!
The Digger: additional notes. DTrace output can be shufled in multi-CPU environment.
Below are important additions about the Digger tool.
1. DTrace output can be shufled in multi-CPU environment.
This means that output can be in not chronological order. It is not something Digger specific, it is how DTrace works.
When DTrace script is being executed there are two parts to DTrace: the in kernel part and the DTrace script process in userland. When a dtrace probe fires, the data traced is placed in a per CPU buffer. Then, periodically, the DTrace script reads the buffers (in round-robin style) and continues processing the data for final output. While the data from any single CPU was entered into its buffer in order the probes are firing asynchronously with respect to all the CPUs.
If probe1 fired on CPU3 and after that probe2 fired on CPU2, and after that probe3 fired on CPU1 it is possible to get output as, for example, as following:
probe3
probe2
probe1
The most common case where we can see this are the cases when a process(thread) migrated from one CPU to another and DTrace consumer reads and prints current CPU-buffer before previous. It does not happen always but sometimes you can see it.
So, do not panic if some part of the output is not chronological in multi-CPU environment. It is expected.
If the output order is important, then it is strongly recommended to print such columns as timestamp or relative timestamp in order to be sure that the output is chronological, otherwise to fix it manually or using tools like sort (1). Column CPU# also is helpful because allows to see places where a process(thread) migrated from one CPU to another.
2. How much is performance impact of the Digger?
It depends on the usage. When you trace everything – all application functions or all library calls – performance impact can be significant and amount of printed info can be huge.
The Digger allows to restrict
a) traced area – where tracing is enabled
b) traced contents – which functions are traced
For example, in order to trace kcb* functions and all syscalls *read* inside function qertbFetch
command line should be:
digger.sh -p PID -Fcdeo -f qertbFetch -t kcb* -s *read*
Then generated dtrace script will be containing only required probes:
pid$target:a.out:qertbFetch:entry pid$target:a.out:qertbFetch:return + pid$target:a.out:kcb*:entry pid$target:a.out:kcb*:return + syscall::*read*:entry syscall::*read*:entry
Note: “a.out” is synonym of executing binary.
It helps to trace only relevant information, only required functions. It decreases performance impact and decreases probability of drops.
So, if you need to decrease impact of the tracing just try to restrict traced area and contents.
3. Important bug is fixed.
There was an issue related with replacement tab symbols on spaces during generating html wordpress page with source code (thanks Zhenx Li for let me know). The issue led to possible error “syntax error at line n: `end of file’ unexpected” when the source of the tool was copied/pasted. Now links on html pages with source are replaced on direct download links.
Buffer is pinned count
There are many cases where Oracle revisit some buffer in the buffer cache many times inside one database call. It such cases it can pin the buffer and hold the buffer pinned and just read pinned buffer in consequences visits. It allows to avoid redundant logical reads.
There are statistics “buffer is pinned count” and “buffer is not pinned count”.
The concept is simple Read more…
Digger: the tool for tracing of unix processes
I would like to introduce new tool – Digger.
This tool allows to see tree of process’ calls such as application calls, library, system calls (and even kernel functions and OS scheduler actions) with additional information as function arguments, result, cpu & elapsed time.
This is not something Oracle specific, the tool can be used for tracing of any unix process (DTrace is required).
Don’t forget about column projection
Note: this post is not about some exact statement, but about importance of column projection which should not be ignored, especially in cases as operations requiring workareas, data access optimization, Exadata offloading and others.
Let’s consider merge of two simple tables.
merge into t1 using t2 on (t1.id = t2.id) when matched then update set n = 1;
The tables:
SQL> select * from v$version where rownum = 1;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
SQL > create table t1 compress
2 as
3 select rownum as id,
4 0 as n,
5 lpad('*', 4000, '*') as pad
6 from dual
7 connect by level <= 1000000;
Table created.
SQL > create table t2 compress
2 as
3 select 1000000 + rownum as id,
4 1 as n,
5 lpad('*', 4000, '*') as pad
6 from dual
7 connect by level <= 1100000;
Table created.
SQL > exec dbms_stats.gather_table_stats(user, 'T1', estimate_percent => 100, degree => 4);
PL/SQL procedure successfully completed.
SQL > exec dbms_stats.gather_table_stats(user, 'T2', estimate_percent => 100, degree => 4);
PL/SQL procedure successfully completed.
SQL > alter session set statistics_level=all;
Session altered.
And a simple statement:
SQL > merge into t1 2 using t2 on (t1.id = t2.id) 3 when matched 4 then update set n = 1; 0 rows merged. Elapsed: 00:08:41.93
Let’s look at the execution plan with rowsource statistics:
SQL > select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));
-------------------------------------
SQL_ID b5cp092vum9nw, child number 0
-------------------------------------
merge into t1 using t2 on (t1.id = t2.id) when matched then update set n = 1
Plan hash value: 3423882595
----------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | Writes | OMem | 1Mem | Used-Mem | Used-Tmp|
----------------------------------------------------------------------------------------------------------------------------------------------
| 0 | MERGE STATEMENT | | 1 | | 0 |00:08:41.66 | 52742 | 608K| 556K| | | | |
| 1 | MERGE | T1 | 1 | | 0 |00:08:41.66 | 52742 | 608K| 556K| | | | |
| 2 | VIEW | | 1 | | 0 |00:08:41.66 | 52742 | 608K| 556K| | | | |
|* 3 | HASH JOIN | | 1 | 1 | 0 |00:08:41.66 | 52742 | 608K| 556K| 2047M| 56M| 30M (1)| 4346K|
| 4 | TABLE ACCESS FULL| T1 | 1 | 1000K| 1000K|00:00:05.96 | 22174 | 22163 | 0 | | | | |
| 5 | TABLE ACCESS FULL| T2 | 1 | 1100K| 1100K|00:00:05.58 | 30568 | 30556 | 0 | | | | |
----------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("T1"."ID"="T2"."ID")
So, our statement is executing more than 8 minutes. All this time has been consumed on the step 3 – HASH JOIN. Notice amount of used temp space 4346 Mb (this is old oddity that value in Used-Temp should be multiplied by 1000) as a result high amount of I/O operations on temp space (columns Reads, Writes – 556K of physical reads and 556K physical writes via direct path read temp and direct path write temp) due to size of workarea (hash_area_size) was not enough (Used-Mem = 30Mb).
Note: obviously in your case it can be absolutely another time and amount of I/O, in particular it depends on hash area size, in my case it was about 30Mb
How can we improve the performance of the statement and especially hash join?
Should we increase hash area size?
May be, but before let’s take look at the same execution plan with column projection:
SQL> select * from table(dbms_xplan.display_cursor('b5cp092vum9nw', null, 'allstats last +projection'));
-------------------------------------
SQL_ID b5cp092vum9nw, child number 0
-------------------------------------
merge into t1 using t2 on (t1.id = t2.id) when matched then update set n = 1
Plan hash value: 3423882595
----------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | Writes | OMem | 1Mem | Used-Mem | Used-Tmp|
----------------------------------------------------------------------------------------------------------------------------------------------
| 0 | MERGE STATEMENT | | 1 | | 0 |00:08:41.66 | 52742 | 608K| 556K| | | | |
| 1 | MERGE | T1 | 1 | | 0 |00:08:41.66 | 52742 | 608K| 556K| | | | |
| 2 | VIEW | | 1 | | 0 |00:08:41.66 | 52742 | 608K| 556K| | | | |
|* 3 | HASH JOIN | | 1 | 1 | 0 |00:08:41.66 | 52742 | 608K| 556K| 2047M| 56M| 30M (1)| 4346K|
| 4 | TABLE ACCESS FULL| T1 | 1 | 1000K| 1000K|00:00:05.96 | 22174 | 22163 | 0 | | | | |
| 5 | TABLE ACCESS FULL| T2 | 1 | 1100K| 1100K|00:00:05.58 | 30568 | 30556 | 0 | | | | |
----------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("T1"."ID"="T2"."ID")
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - SYSDEF[4], SYSDEF[32720], SYSDEF[1], SYSDEF[96], SYSDEF[32720]
3 - (#keys=1) "T1"."ID"[NUMBER,22], "T2"."ID"[NUMBER,22], "T1".ROWID[ROWID,10], "T1"."PAD"[VARCHAR2,4000], "T1"."N"[NUMBER,22],
"T2"."PAD"[VARCHAR2,4000], "T2"."N"[NUMBER,22]
4 - "T1".ROWID[ROWID,10], "T1"."ID"[NUMBER,22], "T1"."N"[NUMBER,22], "T1"."PAD"[VARCHAR2,4000]
5 - "T2"."ID"[NUMBER,22], "T2"."N"[NUMBER,22], "T2"."PAD"[VARCHAR2,4000]
Pay attention on column projection of the steps 3, 4, 5. It contains ALL columns from the tables T1 and T2 although the query uses only some of them. The part of the execution plan below of the step 2 is equivalent to
select t1.rowid,
t1.id,
t1.n,
t1.pad,
t2.id,
t2.n,
t2.pad
from t1, t2
where t1.id = t2.id
It looks like a bug or “not implemented yet”. In our case it means that all columns from the table T1 (especially the fat column PAD) will be retrieved and stored in the workarea.
This can dramatically affect the performance:
-
For operations using workareas it means increased amount of dataset. If required amount of available memory is not enough then it leads to additional I/O in temp space.
In our particular case all columns from the table T1 are retrieved and put into the workarea. If the condition in the statement would be something liket1.id between t2.id and t2.id + 100
then Hash Join would be impossible, because it can be based on “equality” condition. In that case we can expect Merge Join and then both datasests must be sorted.
SQL> explain plan for 2 merge into t1 3 using t2 4 on (t1.id between t2.id and t2.id + 100) 5 when matched then 6 update set n = 1; Explained SQL> @plan --------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | --------------------------------------------------------------------------------------- | 0 | MERGE STATEMENT | | 1 | 2 | | 1766K (1)| 05:53:21 | | 1 | MERGE | T1 | | | | | | | 2 | VIEW | | | | | | | | 3 | MERGE JOIN | | 1 | 8018 | | 1766K (1)| 05:53:21 | | 4 | SORT JOIN | | 1000K| 3822M| 7812M| 840K (1)| 02:48:06 | | 5 | TABLE ACCESS FULL | T1 | 1000K| 3822M| | 4928 (1)| 00:01:00 | |* 6 | FILTER | | | | | | | |* 7 | SORT JOIN | | 1100K| 4206M| 8593M| 926K (1)| 03:05:16 | | 8 | TABLE ACCESS FULL| T2 | 1100K| 4206M| | 6779 (1)| 00:01:22 | --------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 6 - filter("T1"."ID"<="T2"."ID"+100) 7 - access(INTERNAL_FUNCTION("T1"."ID")>=INTERNAL_FUNCTION("T2"."ID")) filter(INTERNAL_FUNCTION("T1"."ID")>=INTERNAL_FUNCTION("T2"."ID")) - It affects data access optimization. If there would be indexes containing all columns using in a query as t1(id, n) or t2(id,n), then Index Fast Full Scan will be impossible here because the indexes do not contain required redundant columns (as PAD).
If there would be a condition doing Index Range Scan appropriate access path then TABLE ACCESS BY INDEX ROWID would exist to get rest of unnecessary columns. - If a row is chained then it leads to additional logical (and maybe physical) I/O to read redundant columns
- If the query would be run on Exadata and one or both Full Table Scan would be offloading, then there would no Column Projection optimization, returned amount of data from Storage Cells to PGA would be redundant and contains unnecessary columns from a table.
- etc
It is what can happen if we ignore Column Projection.
Our particular case can be fixed easy
SQL > merge into (select id, n from t1) t1 2 using (select id, n from t2) t2 on (t1.id = t2.id) 3 when matched 4 then update set n = 1; 0 rows merged. Elapsed: 00:00:20.09
upd note: (select id, n from t2) is redundant here. It could be (select id from t2).
The execution plan with rowsurce statistics and column projection:
----------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | Reads | Writes | OMem | 1Mem | Used-Mem | Used-Tmp|
----------------------------------------------------------------------------------------------------------------------------------------------
| 0 | MERGE STATEMENT | | 1 | | 0 |00:00:19.83 | 52742 | 35651 | 2032 | | | | |
| 1 | MERGE | T1 | 1 | | 0 |00:00:19.83 | 52742 | 35651 | 2032 | | | | |
| 2 | VIEW | | 1 | | 0 |00:00:19.83 | 52742 | 35651 | 2032 | | | | |
|* 3 | HASH JOIN | | 1 | 1 | 0 |00:00:19.83 | 52742 | 35651 | 2032 | 36M| 6589K| 34M (1)| 16384 |
| 4 | TABLE ACCESS FULL| T2 | 1 | 1100K| 1100K|00:00:11.77 | 30568 | 28841 | 0 | | | | |
| 5 | TABLE ACCESS FULL| T1 | 1 | 1000K| 1000K|00:00:04.44 | 22174 | 4778 | 0 | | | | |
----------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
3 - access("ID"="ID")
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - SYSDEF[4], SYSDEF[32720], SYSDEF[1], SYSDEF[96], SYSDEF[32720]
3 - (#keys=1) "ID"[NUMBER,22], "ID"[NUMBER,22], "N"[NUMBER,22], "T1".ROWID[ROWID,10], "N"[NUMBER,22]
4 - "ID"[NUMBER,22], "N"[NUMBER,22]
5 - "T1".ROWID[ROWID,10], "ID"[NUMBER,22], "N"[NUMBER,22]
Now column projection does not contain unnecessary columns, amount of required memory and as a result amount of temp space (Used-Tmp ~ 16Mb) and I/O (2032 physical reads and 2032 physical writes via direct path read temp and direct path write temp) was significantly reduced, and the statement was executed about 20 seconds instead of 8 minutes.
Notice, that in this case Oracle takes the table T2 as build input. It is because RowID from the table T1 is required to perform MERGE.
The part of the execution plan below of the step 2 now looks like
select t1.id,
t1.n,
t1.rowid,
t2.id,
t2.n
from t1, t2
where t1.id = t2.id
It estimates that amount of resultset
1100K rows * ("ID"[NUMBER,22], "N"[NUMBER,22])
less than
1000K rows * ("T1".ROWID[ROWID,10], "ID"[NUMBER,22], "N"[NUMBER,22])
and that resultset requires less amount of workarea.
Thus it is reasonable to use T2 as build input here.
A small change in parallel insert with serial data access between 11.1 and 11.2
There is an interesting small change in parallel insert between 11.1 and 11.2 in queries with parallelized insert, but serialized data access. On a practice the simplest case where we can see this it is CTAS or parallel insert as select from remote table. Read more…
Serial direct read for small tables in 11.2.0.2
Today I have fixed an issue related with serial direct path reads.
There is 11.2.0.2 database
SQL> select * from v$version where rownum = 1; BANNER -------------------------------------------------------------------------------- Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production SQL> SELECT a.ksppinm "Parameter", 2 b.ksppstvl "Session Value", 3 c.ksppstvl "Instance Value" 4 FROM x$ksppi a, x$ksppcv b, x$ksppsv c 5 WHERE a.indx = b.indx 6 AND a.indx = c.indx 7 AND a.ksppinm = '_serial_direct_read'; Parameter Session Value Instance Value ----------------------- ----------------------- ----------------------- _serial_direct_read auto auto
There is a very small table, let’s call it dualcopy.
SQL> exec dbms_stats.gather_table_stats(user, 'dualcopy', estimate_percent => 100); PL/SQL procedure successfully completed SQL> select object_type, 2 num_rows, 3 blocks, 4 empty_blocks 5 from all_tab_statistics 6 where table_name = 'DUALCOPY'; OBJECT_TYPE NUM_ROWS BLOCKS EMPTY_BLOCKS ------------ ---------- ---------- ------------ TABLE 1 1 0
Let’s try to do select from the table with enabled 10046 event:
SQL> alter session set events '10046 trace name context forever, level 8'; Session altered SQL> select * from dualcopy; DUMMY ----- X
Below is an excerpt from the trace file
PARSING IN CURSOR #18446744071497556472 len=24 dep=0 uid=2500 oct=3 lid=2500 tim=3988723224325 hv=250290826 ad='3e911e388' sqlid='g32a9w87fq8na' select * from dualcopy END OF STMT PARSE #18446744071497556472:c=20000,e=26430,p=0,cr=27,cu=0,mis=1,r=0,dep=0,og=2,plh=769194902,tim=3988723224324 EXEC #18446744071497556472:c=0,e=21,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=2,plh=769194902,tim=3988723224417 WAIT #18446744071497556472: nam='SQL*Net message to client' ela= 4 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=3988723224541 WAIT #18446744071497556472: nam='SQL*Net message from client' ela= 66047 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=3988723290684 WAIT #18446744071497556472: nam='direct path read' ela= 7852 file number=705 first dba=23954 block cnt=1 obj#=2229858 tim=3988723301159 WAIT #18446744071497556472: nam='SQL*Net message to client' ela= 5 driver id=1413697536 #bytes=1 p3=0 obj#=2229858 tim=3988723301360 FETCH #18446744071497556472:c=10000,e=10521,p=1,cr=2,cu=0,mis=0,r=1,dep=0,og=2,plh=769194902,tim=3988723301437 STAT #18446744071497556472 id=1 cnt=1 pid=0 pos=1 obj=2229858 op='TABLE ACCESS FULL DUALCOPY (cr=2 pr=1 pw=0 time=10419 us cost=2 size=2 card=1)'
What?! serial direct path read for the table in one block?! What’s going on?
The table can be small, but if your application accesses it many times it can lead to significant amount of physical I/O.
Serial direct read mode (_serial_direct_read) is a property of a child cursor. Similar behavior is possible if the same query was parsed earlier in a session with ALWAYS or TRUE mode (_serial_direct_read=always or _serial_direct_read=true). If we pay attention on the parse call, it was a hard parsing (mis=1) and new child cursor. Thus, this is not the reason in our case.
The reason is that the table has been configured for using of KEEP pool, but database has not.
SQL> select table_name, buffer_pool 2 from all_tables 3 where table_name = 'DUALCOPY'; TABLE_NAME BUFFER_POOL ------------------------------ ----------- DUALCOPY KEEP SQL> show parameter db_keep_cache_size NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ db_keep_cache_size big integer 0
This is the bug 12530276. Oracle 11.2.0.2 treats all ‘keep objects’ as large objects when ‘keep pool’ is not configured.
Allocation of KEEP pool fixes the problem.
Dynamic tracing of Oracle logical I/O: part 2. Dtrace LIO v2 is released.
What’s new in DTrace LIO?
Jump to introduction: Dynamic tracing of Oracle logical I/O
1. The list of supported functions performing logical I/O is extended:
Note: some function names below are my suggestions.
Consistent gets:
- kcbgtcr – Kernel Cache Buffer Get Consistent Read. This is general entry point for consistent read.
- kcbldrget – Kernel Cache Buffer Load Direct-Read Get. The function performing direct-path read. Interesting detail: 10.2 the function kcbldrget is called just after kcbgtcr, in 11.2 by (from) kcbgtcr.
Current gets (db block gets):
- kcbgcur – Kernel Cache Buffer Get Current Read
- kcbget – Kernel Cache Buffer Get Buffer. This is analogue of kcbgcur function, as I observed this function is called for index branch and leaf blocks
- kcbnew – Kernel Cache Buffer New Buffer
- kcblnb (kcblnb_dscn in 11.2) – Kernel Cache Buffer Load New Buffer. The function performing direct-path load. Decoding of block coordinates is not supported for this function in the current version of DTraceLIO. The parameters such as object_id, data_object_id will be shown as 0 (zero)
Note: KCB and KCBL are module names
KCB – Kernel Cache Buffer (Get, change, and release buffers)
KCBL – Kernel Cache Buffer Load (Direct I/O routines)
2. Functions related with buffer pinning are added.
Following functions calls are traced:
- kcbispnd – Kernel Cache Buffer Is Pinned – functions where Oracle checks is some buffer pinned yet or not. In this function Oracle either increments statistic “buffer is pinned count” if buffer is pinned yet or “buffer is not pinned count” if buffer is not pinned.
- kcbrls – Kernel Cache Buffer Release Pin
Important note: buffer pinning itself is performed inside logical I/O functions. So, unfortunately it is impossible to see it as function calls.
3. Changed parameters
Usage:
dtracelio.d PID [show_each_call] [interval]
- PID – unix process ID
- show_each_call – This is the bitmask determining how functions calls will be shown.
1st bit – to show logical I/O functions
2nd bit – to show buffer pinning
Examples:
0: output of each call will be disabled
1: logical I/O functions will be shown
3: logical I/O and buffer pinning functions will be shown
Default value: 1 (only logical I/O functions will be shown) - interval – Specifies the number of seconds over which Summary form with cumulative figures will be shown. Works only when show_each_call is disabled.
Default value: 0 (disabled)
4. Existed summary table is changed, new columns are added
The existent summary table “Logical I/O Summary (grouped by function/object)” is changed.
============================= Logical I/O Summary (grouped by function/object) =========================== function stat object_id data_object_id mode_held where bufs calls --------- ------- ----------- ---------------- ----------- ------- -------- --------- kcbgcur cu 0 -1 2 381 1 1 kcbgcur cu 56756 56756 2 594 1 1 kcbnew cu 0 -1 383 1 1 kcbgtcr cr 56756 56756 446 2 2 kcbgtcr cr 56756 56756 447 2 2 kcbgtcr cr 56756 56756 577 2 2 ==========================================================================================================
New columns:
- stat: “cr” – consistent get, “cr (d)” – consistent get direct, “cu” – current get, ”cu (d)” – current get direct
- mode_held: mode in which buffer is pinned. The column is supported only for functions kcbgcur and kcbget
- bufs: number of processed bufers. This number is equal number of calls for almost all functions except kcbnew. In function kcbnew N (many) buffers can be processed by one function call.
5. New summary table is added
In addition to old summary output grouped by function/object new output form “grouped by object” is added.
An example:
================================= Logical I/O Summary (grouped by object) ================================
object_id data_object_id lio cr cr (d) cu cu (d) ispnd (Y) ispnd (N) pin rls
---------- --------------- --------- --------- --------- --------- --------- --------- --------- ---------
1 -1 21 0 0 21 0 0 0 21
17 17 108 108 0 0 0 0 0 108
0 -1 64042 18 0 64024 0 0 0 36
415893 2142703 463565 49351 0 414214 0 0 387842 88604
2142704 2142704 1977665 0 0 1977665 0 0 0 9467
---------- --------------- --------- --------- --------- --------- --------- --------- --------- ---------
total 2505401 49477 0 2455924 0 387842 0 98236
==========================================================================================================
Where following figures are showed:
- lio – All logical reads done by an object. Equal of “session logical reads” statistic. lio = cr + cu
- cr – consistent gets. This statistic includes cr (d)
- cr (d) – consistent gets direct
- cu – current gets. This statistic includes cu (d)
- cu (d) – current gets direct
- ispnd (Y) – buffer is pinned count
- ispnd (N) – buffer is not pinned count
- pin rls – buffer pin is released
The list of objects is sorted by lio column.
Enjoy!
p.s.
Announcement: One of the next blog entry will be about statistics “buffer is [not] pinned count”. I will show what these statistics do mean, how it works and how DTraceLIO with new functionality might be used.
Coming soon.
Dynamic tracing of Oracle logical I/O
I would like to provide one more option to investigate Oracle logical I/O with DTrace.
This method allows to see each consistent/current gets with details about block, object and location from which function is called.
This is the DTrace script: dtracelio.d
Note: Dtrace LIO with new features is released
Short description:
This tool allows to see:
– details of each call (consistent/current get)
– details of blocks been read
– function from which call been performed without investigating of call stack (“where”)
– aggregation of performed calls
The tool can be especially helpful in 11g because x$kcbuwhy (x$kcbsw in 10g) is not populated properly there.
Note: Examples of using views x$kcbsw/x$kcbuwhy:
- Investigating Logical I/O by Jonathan Lewis
- Finding the reasons for excessive logical IOs by Tanel Poder
Current version of the script monitors execution of two Kernel Cache Layer functions:
kcbgtcr – Kernel Cache Buffer Get Consistent Read. Number of calls of this function we can see as statistic “consistent gets” and in 10046 trace as “cr=”.
kcbgcur – Kernel Cache Buffer Get Current Read. Number of calls of this function we can see as statistic “db block gets” in 10046 trace as “cu=”.
upd:
statistic “db block gets” is incremented not only by kcbgcur, but also by kcbget (included in current version of dtracelio), kcbnew (coming soon), kcblnb* (direct-path load) and some others. But kcbgcur is most often used.
In the both functions:
- the first argument is a pointer on the structure which describes a block;
- the second argument is unknown, but I could suggest that this is lock_mode (from MOS bug 7109078);
- the third argument (the least significant bits) is “where” (x$kcbwh), that is a module where the function is called.
Usage:
dtracelio.d PID [show_each_call]
PID – unix process ID
show_each_call – if 0 then output of each call will be disabled, else details of each call will be shown
A little example of usage. Let’s try to see what is going on during select on very small table:
SQL> @spid
SID ORACLE_DEDICATED_PROCESS CLIENTPID
---------- ------------------------ ------------------------
41 18949 900:408
SQL> create table dualcopy as select * from dual;
Table created
SQL> select * from dualcopy;
DUMMY
-----
X
Notice, that I have executed “select * from dualcopy” to avoid hard parsing in consequences executions.
Let’s execute dtracelio
dtracelio.d 18949
I have executed the script with default parameter show_each_call, it means that details of each calls will be shown.
Now I am executing our query:
SQL> select * from dualcopy; DUMMY ----- X
And this is an output of dtracelio
kcbgtcr(0xFFFFFD7FFFDFB0F0,0,742,0) [tsn: 4 rdba: 0x100060a (4/1546) obj: 79218 dobj: 79218] where: 742 kcbgtcr(0xFFFFFD7FFFDFAED0,0,743,0) [tsn: 4 rdba: 0x100060a (4/1546) obj: 79218 dobj: 79218] where: 743 kcbgtcr(0xFFFFFD7FFDC3B2E0,0,860,0) [tsn: 4 rdba: 0x100060b (4/1547) obj: 79218 dobj: 79218] where: 860
You can see here:
kcbgtcr(0xFFFFFD7FFFDFB0F0,0,742,0) – it is function call with 3 arguments,
tsn: 4 – a tablespace number, ts# from v$tablespace
rdba: 0x100060a – a relative dba (data block address)
(4/1546) – file 4 block 1546
obj: 79218 – dictionary object number, object_id from dba_objects
dobj: 79218 – data object number, data_object_id from dba_objects
where: 742 – location from function (kcbgtcr in this case) was executed. This is INDX from x$kcbwf.
SQL> select indx, kcbwhdes from x$kcbwh where indx in (742, 743, 860);
INDX KCBWHDES
---------- ----------------------------------------------------------------
742 ktewh25: kteinicnt
743 ktewh26: kteinpscan
860 kdswh01: kdstgr
It is functions from which our function kcbgtcr was executed. It is parent function which you will see if will print call stack during the call of kcbgtcr.
If you would print call stack you will see (I’ve just added ustack() in the script):
kcbgtcr(0xFFFFFD7FFFDFB0F0,0,742,0) [tsn: 4 rdba: 0x100060a (4/1546) obj: 79218 dobj: 79218] where: 742
oracle`kcbgtcr
oracle`ktecgshx+0x3f6
oracle`kteinicnt1+0x1cf
oracle`qertbFetch+0xc2a
oracle`opifch2+0xaa2
oracle`opifch+0x3a
oracle`opiodr+0x433
oracle`ttcpip+0x599
oracle`opitsk+0x600
oracle`opiino+0x675
oracle`opiodr+0x433
oracle`opidrv+0x32e
oracle`sou2o+0x57
oracle`opimai_real+0x219
oracle`ssthrdmain+0x14e
oracle`main+0xcb
oracle`0x159e67c
kcbgtcr(0xFFFFFD7FFFDFAED0,0,743,0) [tsn: 4 rdba: 0x100060a (4/1546) obj: 79218 dobj: 79218] where: 743
oracle`kcbgtcr
oracle`ktecgshx+0x3f6
oracle`kteinpscan+0x1b6
oracle`kteiniscan+0x2c
oracle`kdselini+0x2f
oracle`kdsirs1+0x792
oracle`kdsirs+0x2b
oracle`qertbFetch+0xbb0
oracle`opifch2+0xaa2
oracle`opifch+0x3a
oracle`opiodr+0x433
oracle`ttcpip+0x599
oracle`opitsk+0x600
oracle`opiino+0x675
oracle`opiodr+0x433
oracle`opidrv+0x32e
oracle`sou2o+0x57
oracle`opimai_real+0x219
oracle`ssthrdmain+0x14e
oracle`main+0xcb
kcbgtcr(0xFFFFFD7FFDC3B2E0,0,860,0) [tsn: 4 rdba: 0x100060b (4/1547) obj: 79218 dobj: 79218] where: 860
oracle`kcbgtcr
oracle`ktrget2+0x27d
oracle`kdstgr+0x46b
oracle`qertbFetch+0x466
oracle`opifch2+0xaa2
oracle`opifch+0x3a
oracle`opiodr+0x433
oracle`ttcpip+0x599
oracle`opitsk+0x600
oracle`opiino+0x675
oracle`opiodr+0x433
oracle`opidrv+0x32e
oracle`sou2o+0x57
oracle`opimai_real+0x219
oracle`ssthrdmain+0x14e
oracle`main+0xcb
oracle`0x159e67c
So, 3 consistent gets, 3 logical reads were performed.
Two times block (4/1546) – this is segment header – has been read. The first time from function kteinicnt and the second time from function kteinpscan.
And the third get is block (4/1547) – this is data block – was read from function kdstgr.
After that, if you push Ctrl+C, dtracelio script will be finished, and you will see Summary section:
^C ========================= Summary ========================== object_id data_object_id function where count 79218 79218 kcbgtcr 742 1 79218 79218 kcbgtcr 743 1 79218 79218 kcbgtcr 860 1
Summary section is aggregation of all calls grouped by object_id, data_object_id, function, where.
example 2
There was question on sql.ru. The author asked “why number of current gets (db block gets) during update in distributed transaction is more then in usual, non-distributed transaction?”.
In his case it were:
3070178 current gets in non-distributed transaction
6070211 current gets in distributed transaction
So, let’s find out what exactly is read during distributed transaction and is not read in usual, non-distributed transaction.
SQL> select * from v$version;
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
PL/SQL Release 11.2.0.2.0 - Production
CORE 11.2.0.2.0 Production
TNS for Solaris: Version 11.2.0.2.0 - Production
NLSRTL Version 11.2.0.2.0 - Production
SQL> @spid
SID ORACLE_DEDICATED_PROCESS CLIENTPID
---------- ------------------------ ------------------------
1 16727 900:408
SQL> create table test as select * from all_objects;
Table created
SQL> alter session set optimizer_dynamic_sampling=0;
Session altered
SQL> @10046
Session altered
Here we are executing dtracelio.d:
dtracelio.d 16727 0
Here I have asked to show me only summary section of executions of functions performing logical I/O by process 16727.
1st update, without distributed transaction
SQL> begin 2 update test set owner = owner; 3 commit; 4 end; 5 / PL/SQL procedure successfully completed
After the update has been completed we can push Ctrl+C to finish the DTrace and see the Summary section:
^C ========================= Summary ========================== object_id data_object_id function where count ... 0 -1 kcbgcur 48 16 0 -1 kcbgcur 156 16 17 17 kcbgtcr 869 18 0 -1 kcbgcur 51 253 0 -1 kcbgcur 79 1365 79204 79204 kcbgcur 878 1656 0 -1 kcbgcur 50 2357 -1 79204 kcbgcur 859 2357 79204 79204 kcbgtcr 869 2726 79204 79204 kcbgcur 1053 75105
An excerpt from raw 10046 trace:
=====================
PARSING IN CURSOR #18446741324891442008 len=60 dep=0 uid=0 oct=47 lid=0 tim=229288260276 hv=2516922591 ad='88f557c0' sqlid='6sj578yb0ac6z'
begin
update test set owner = owner;
commit;
end;
END OF STMT
....
EXEC #18446741324891442008:c=2780000,e=4884618,p=0,cr=2776,cu=84545,mis=0,r=1,dep=0,og=1,plh=0,tim=229293144944
Pay attention, cu=84545.
Now let’s execute the same update within distributed transaction
SQL> declare 2 i integer; 3 begin 4 5 select count(*) into i from dual@dblink; 6 update test set owner = owner; 7 commit; 8 9 end; 10 /
PARSING IN CURSOR #18446741324891442008 len=122 dep=0 uid=0 oct=47 lid=0 tim=229346672450 hv=1954107067 ad='88f04bc0' sqlid='arss8t9u7kmpv' declare i integer; begin select count(*) into i from dual@dblink; update test set owner = owner; commit; end; END OF STMT ... EXEC #18446741324891442008:c=2950000,e=3432722,p=0,cr=2298,cu=159479,mis=0,r=1,dep=0,og=1,plh=0,tim=229350105247
As we can see cu=159479, about in two times more then in previous execution.
dtracelio.d output:
========================= Summary ========================== object_id data_object_id function where count ... 0 -1 kcbgcur 48 16 0 -1 kcbgcur 156 16 17 17 kcbgtcr 869 18 0 -1 kcbgcur 51 49 0 -1 kcbgcur 50 442 -1 79204 kcbgcur 859 442 0 -1 kcbgcur 79 1161 79204 79204 kcbgcur 878 1177 79204 79204 kcbgtcr 869 2248 79204 79204 kcbgcur 1053 75105 0 -1 kcbgcur 109 79853
Pay attention on the bottom row
0 -1 kcbgcur 109 79853
79853 additional db block gets of some undo block which does not exist in previous output.
If we take a look at the output of dtracelio.d with enabled output of each call we will see something like this:
... kcbgcur(0xFFFFFD7FFDC42D68,2,1053,0) [tsn: 0 rdba: 0x4186e6 (1/100070) obj: 79204 dobj: 79204] where: 1053 kcbgcur(0xFFFFFD7FFFDF46A0,1,109,0) [tsn: 2 rdba: 0xc000f0 (3/240) obj: 0 dobj: -1] where: 109 kcbgcur(0xFFFFFD7FFDC42D68,2,1053,0) [tsn: 0 rdba: 0x4186e6 (1/100070) obj: 79204 dobj: 79204] where: 1053 kcbgcur(0xFFFFFD7FFFDF46A0,1,109,0) [tsn: 2 rdba: 0xc000f0 (3/240) obj: 0 dobj: -1] where: 109 kcbgcur(0xFFFFFD7FFDC42D68,2,1053,0) [tsn: 0 rdba: 0x4186e6 (1/100070) obj: 79204 dobj: 79204] where: 1053 kcbgcur(0xFFFFFD7FFFDF46A0,1,109,0) [tsn: 2 rdba: 0xc000f0 (3/240) obj: 0 dobj: -1] where: 109 kcbgcur(0xFFFFFD7FFDC42D68,2,1053,0) [tsn: 0 rdba: 0x4186e6 (1/100070) obj: 79204 dobj: 79204] where: 1053 kcbgcur(0xFFFFFD7FFFDF46A0,1,109,0) [tsn: 2 rdba: 0xc000f0 (3/240) obj: 0 dobj: -1] where: 109 kcbgcur(0xFFFFFD7FFDC42D68,2,1053,0) [tsn: 0 rdba: 0x4186e6 (1/100070) obj: 79204 dobj: 79204] where: 1053 kcbgcur(0xFFFFFD7FFFDF46A0,1,109,0) [tsn: 2 rdba: 0xc000f0 (3/240) obj: 0 dobj: -1] where: 109 kcbgcur(0xFFFFFD7FFDC42D68,2,1053,0) [tsn: 0 rdba: 0x4186e6 (1/100070) obj: 79204 dobj: 79204] where: 1053 ...
Here we can see these additional calls “where: 109″ and that the same block (3/240) is read during these calls.
Here is only excerpt, but I checked, all 79853 calls for the same block from the same “where” (109 in this case).
Update: updated version of DTraceLIO allows to see that these current gets were performed in shared mode, so content of the block is not changed.
Let’s try to look at inside the block.
alter system dump datafile 3 block 240;
An excerpt from block dump:
... Block dump from disk: buffer tsn: 2 rdba: 0x00c000f0 (3/240) scn: 0x0000.0015240d seq: 0x02 flg: 0x04 tail: 0x240d2602 frmt: 0x02 chkval: 0x54dc type: 0x26=KTU SMU HEADER BLOCK ...
So, this is the undo header.
What is “where: 109″? “Where” means location in code, let’s try to find function name with indx 109
SQL> select indx, kcbwhdes from x$kcbwh where indx = 109;
INDX KCBWHDES
---------- ----------------------------------------------------------------
109 ktuwh87: ktugus:ktuGetExtTxnInfo
It is exactly what you will see if print call stack during call with “where: 109″ (the second row, function from kcbgcur was called is ktuGetExtTxnInfo):
0000000002c63d01 kcbgcur () + 1 00000000029272d4 ktuGetExtTxnInfo () + 174 00000000028f5193 ktugti () + 13 00000000028e2561 ktuchg2 () + 1971 0000000002944d2f ktbchg2 () + 12f 0000000001b04809 kdu_array_flush_retry () + c69 0000000001b01f42 kdu_array_buf () + 522 0000000001af5390 kduurp () + 460 0000000001adea00 kdusru () + 13e0 0000000001ac6c12 kauupd () + 1b2 0000000005a3370c updrow () + a5c 0000000007b8bd27 qerupFetch () + 397 000000000367cfdc qerstFetch () + 58c 0000000005a3f2fe updaul () + 4ce 0000000005a4469d updThreePhaseExe () + 1cd 0000000005a43abd updexe () + 1ed 000000000447b2c5 opiexe () + 2455 ...
It were for 11.2.0.2. In my 10.2.0.5 all these additional calls are with “where: 383″, “ktuwh02: ktugus” which Jonathan Lewis called “Get Undo Segment header for commit” (Seems it is not correct descriptoin).
And following call stack is during these calls
oracle`kcbgcur
oracle`ktugusc+0x321
oracle`ktugti+0xf2
oracle`ktuchg+0x139a
oracle`ktbchg2+0x115
oracle`kddchg+0x2c2
oracle`kddlok+0x7da
oracle`kddlkr+0x16a
oracle`updrow+0x2417
oracle`qerupRowProcedure+0x4f
oracle`qerupFetch+0x339
oracle`updaul+0x481
oracle`updThreePhaseExe+0xc72
oracle`updexe+0x171
oracle`opiexe+0xf1b
oracle`opipls+0x98e
oracle`opiodr+0x433
oracle`rpidrus+0xde
oracle`skgmstack+0x80
oracle`rpidru+0x86
Conclusion
We have determined what exactly are extra current gets within distributed transaction.
Why Oracle 79853 times reads undo header is a topic for further research. I would prefer to write another post about it.
Little oftopic, but interesting point is that Oracle opens distributed transaction (binds undo header, transaction appears in v$transaction, v$global_transaction) for PLSQL block with dblinks during hard parsing. If hard parsing is not performed then during parsing/execution of statement with dblink.
It follows that if you will execute a statement like this
declare
i integer;
begin
DML;
DML;
DML;
if 1=0 then
select count(*) into i from dual@dblink;
end if;
end;
then either it will be executed within dustributed transaction (with extra current gets) or not depends on will hard parsing be performed or not.
p.s.
Attention!
I am still looking for additional information about
* structure kcbds,
* argments of functions kcbgtcr
* argments of functions kcbgcur
If you have this information please let me know.