<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Alexander Anokhin</title>
	<atom:link href="http://alexanderanokhin.wordpress.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://alexanderanokhin.wordpress.com</link>
	<description>Unique Oracle Stories</description>
	<lastBuildDate>Tue, 04 Jun 2013 20:55:53 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='alexanderanokhin.wordpress.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://0.gravatar.com/blavatar/06e0ea700e4c113f4b127820af08266a?s=96&#038;d=http%3A%2F%2Fs2.wp.com%2Fi%2Fbuttonw-com.png</url>
		<title>Alexander Anokhin</title>
		<link>http://alexanderanokhin.wordpress.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://alexanderanokhin.wordpress.com/osd.xml" title="Alexander Anokhin" />
	<atom:link rel='hub' href='http://alexanderanokhin.wordpress.com/?pushpress=hub'/>
		<item>
		<title>Timing: rowsource statistics. Part 2: Overhead and inconsistent time</title>
		<link>http://alexanderanokhin.wordpress.com/2012/12/24/timing-rowsource-statistics-part-2-overhead-and-inconsistent-time/</link>
		<comments>http://alexanderanokhin.wordpress.com/2012/12/24/timing-rowsource-statistics-part-2-overhead-and-inconsistent-time/#comments</comments>
		<pubDate>Mon, 24 Dec 2012 21:16:05 +0000</pubDate>
		<dc:creator>Alexander Anokhin</dc:creator>
				<category><![CDATA[Oracle]]></category>

		<guid isPermaLink="false">http://alexanderanokhin.wordpress.com/?p=1664</guid>
		<description><![CDATA[Timing sampling frequency: The number of calls that take a timestamp depends on the parameter _rowsource_statistics_sampfreq (default is 128). If this parameter is set to 0, no time is calculated in rowsource statistics: the functions qerstSnapStats()/qerstUpdateStats() do not take timestamps. If this parameter is set to 1, time is always calculated: every qerstSnapStats()/qerstUpdateStats() pair takes a timestamp (this is the [&#8230;]]]></description>
				<content:encoded><![CDATA[<p><span id="more-1664"></span><br />
<strong>Timing sampling frequency: </strong><br />
The number of calls that take a timestamp depends on the parameter <span style="font-family:Courier New;">_rowsource_statistics_sampfreq</span> (default is 128).</p>
<ul>
<li>If this parameter is set to 0, no time is calculated in rowsource statistics: the functions <span style="font-family:Courier New;">qerstSnapStats()</span>/<span style="font-family:Courier New;">qerstUpdateStats()</span> do not take timestamps.</li>
<li>If this parameter is set to 1, time is always calculated: every <span style="font-family:Courier New;">qerstSnapStats()</span>/<span style="font-family:Courier New;">qerstUpdateStats()</span> pair takes a timestamp (this is the case in the excerpts above).</li>
<li>If this parameter is set to N (for example, the default value 128), a timestamp is taken every N tuples. That is, only every Nth call of <span style="font-family:Courier New;">qerstSnapStats()</span>/<span style="font-family:Courier New;">qerstUpdateStats()</span> at a given rowsource level takes a timestamp.</li>
<li>There is a special value, 3, which enables gathering of rowcounts only.</li>
</ul>
<p>Pay attention to the case when <span style="font-family:Courier New;">_rowsource_statistics_sampfreq</span> = N.<br />
Only every Nth call of <span style="font-family:Courier New;">qerstSnapStats()</span>/<span style="font-family:Courier New;">qerstUpdateStats()</span> at a given rowsource level takes a timestamp. Subsequent calls (for example, the next 127) of <span style="font-family:Courier New;">qerstSnapStats()</span>/<span style="font-family:Courier New;">qerstUpdateStats()</span> reuse that measured time. In this case the reported time of some rowsources is not really measured, but rather extrapolated. As a result, we can find inconsistent timings in an execution plan, where the reported time of a child step is greater than the time of its parent step.</p>
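<p>A minimal sketch of how such inconsistent steps could be found. It assumes 10g or later, that rowsource statistics were gathered for the cursor, and that (as in dbms_xplan output) the time of a step includes the time of its children. The sql_id and child number are placeholders to be replaced with those of your own cursor.</p>
<pre class="brush: plain; light: true; title: ; notranslate">
-- Plan steps whose reported time exceeds the time of their parent step.
-- The sql_id and child_number below are placeholders.
select c.id                            as child_id,
       c.operation || ' ' || c.options as child_operation,
       c.last_elapsed_time             as child_time_us,
       p.id                            as parent_id,
       p.last_elapsed_time             as parent_time_us
  from v$sql_plan_statistics_all c,
       v$sql_plan_statistics_all p
 where c.sql_id = '9pzmsyh5t14bb'
   and c.child_number = 0
   and p.sql_id = c.sql_id
   and p.child_number = c.child_number
   and p.id = c.parent_id
   and c.last_elapsed_time &gt; p.last_elapsed_time
 order by c.id;
</pre>
<p>If the text above is right, a run with the parameter set to 1 should return no rows here, while a run with a sampled frequency may return some.</p>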
<p>On the other hand, the most accurate setting (<span style="font-family:Courier New;">_rowsource_statistics_sampfreq</span> = 1) greatly increases the number of calls to the function that takes the timestamp and, as a result, the execution time of the query.</p>
<p>Here are examples of <a href="http://alexanderanokhin.wordpress.com/tools/digger/">Digger</a> output for the same query</p>
<pre class="brush: plain; title: ; notranslate">
select --+ index(tbl)
       count(pad)
  from tbl;

-----------------------------------------------------
| Id  | Operation                    | Name | Rows  |
-----------------------------------------------------
|   0 | SELECT STATEMENT             |      |     1 |
|   1 |  SORT AGGREGATE              |      |     1 |
|   2 |   TABLE ACCESS BY INDEX ROWID| TBL  |    50 |
|   3 |    INDEX FULL SCAN           | IDX  |    50 |
-----------------------------------------------------
</pre>
<p>but with different values of <span style="font-family:Courier New;">_rowsource_statistics_sampfreq</span>: <a href="https://dl.dropbox.com/s/iljyze02muanyhr/freq0.txt?dl=1">0</a> (timing disabled), <a href="https://dl.dropbox.com/s/2txp94duznt4mv5/freq1.txt?dl=1">1</a> (the most accurate timing), <a href="https://dl.dropbox.com/s/4yy67kuvohdota2/freq2.txt?dl=1">2</a>, <a href="https://dl.dropbox.com/s/coaxatmwsdsea2x/freq4.txt?dl=1">4</a>, <a href="https://dl.dropbox.com/s/zq5vsx6jk0q8h33/freq8.txt?dl=1">8</a>, <a href="https://dl.dropbox.com/s/fttdqxiw0wmlgps/freq16.txt?dl=1">16</a>, <a href="https://dl.dropbox.com/s/1a0gon2pn9ozwzs/freq128.txt?dl=1">128</a><br />
Special case: <span style="font-family:Courier New;">_rowsource_statistics_sampfreq</span> = <a href="https://dl.dropbox.com/s/vln7y74qsujqqwm/freq3.txt?dl=1">3</a> (rowcounts only)</p>
<p>Thus, we can distinguish three levels of detail of rowsource statistics, controlled by the parameter <span style="font-family:Courier New;">_rowsource_statistics_sampfreq</span>:<br />
1: the number of rows returned by a rowsource<br />
2: (1) + statistics such as consistent gets, current gets, starts (the number of times an execution plan step has been executed), etc.<br />
3: (1) + (2) + timing</p>
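<p>As a rough illustration, the three levels can be seen as groups of columns in V$SQL_PLAN_STATISTICS_ALL. This is only a sketch for 10g or later: the sql_id and child number are placeholders, and the mapping of columns to levels in the comments is my reading of the list above, not something taken from the Oracle documentation.</p>
<pre class="brush: plain; light: true; title: ; notranslate">
select id,
       operation || ' ' || options as operation,
       last_output_rows,     -- level 1: rows returned by the rowsource
       last_starts,          -- level 2: starts
       last_cr_buffer_gets,  -- level 2: consistent gets
       last_cu_buffer_gets,  -- level 2: current gets
       last_elapsed_time     -- level 3: timing, in microseconds
  from v$sql_plan_statistics_all
 where sql_id = '9pzmsyh5t14bb'
   and child_number = 0
 order by id;
</pre>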
<p>As we know, since 10g there are three ways to enable gathering of rowsource statistics:<br />
<strong>1)</strong> set the parameter <strong><span style="font-family:Courier New;">statistics_level</span> </strong>to &#8220;all&#8221; at the session or system level</p>
<pre class="brush: plain; light: true; title: ; notranslate">
alter session set statistics_level=all
</pre>
<p>This sets another, hidden parameter, _rowsource_execution_statistics, to TRUE, which enables gathering of rowsource statistics.</p>
<ul>
<li><em><span style="font-family:Courier New;">statistics_level</span>=all also sets the parameter timed_os_statistics.</em></li>
<li><em>This is the only way to enable gathering of rowsource statistics in 9i (except for actual rows, which are populated whenever SQL trace is enabled, even without statistics_level=all).<br />
Setting the parameter without SQL trace enables gathering of rowsource statistics only for the CBO (optimizer modes all_rows/first_rows), and the only way to get the statistics is to query V$SQL_PLAN_STATISTICS[_ALL].<br />
Setting the parameter with SQL trace enables gathering of rowsource statistics for queries with any optimizer mode. In this case a subset of the statistics can also be printed in the trace file. SQL trace without statistics_level=all gathers only rowcounts.</em>
</li>
</ul>
<p>In this case <span style="font-family:Courier New;">_rowsource_statistics_sampfreq</span> is used only if the parameter has been set explicitly.<br />
Otherwise the value actually used is 1.</p>
<pre class="brush: plain; light: true; title: ; notranslate">
SQL&gt; select ksppinm name,
  2         ksppstdvl value,
  3         ksppstdf IsDefault,
  4         decode(bitand(ksppstvf, 5), 1, 'TRUE', 0, 'FALSE') IsSetExplicitly
  5    from x$ksppi x, x$ksppcv y
  6   where (x.indx = y.indx)
  7     and ksppinm = '_rowsource_statistics_sampfreq';

NAME                                VALUE      ISDEFAULT  ISSETEXPLICITLY
----------------------------------- ---------- ---------- --------------------
_rowsource_statistics_sampfreq      128        TRUE       FALSE

SQL&gt; select count(*) from v$parameter where name ='_rowsource_statistics_sampfreq';

  COUNT(*)
----------
         0
</pre>
<p>Here, although the value of the parameter is still 128, the value actually used as the frequency of taking timestamps is 1.</p>
<p>If <span style="font-family:Courier New;">_rowsource_statistics_sampfreq</span> is set explicitly, the value used is that of <span style="font-family:Courier New;">_rowsource_statistics_sampfreq</span>.</p>
<pre class="brush: plain; light: true; title: ; notranslate">
SQL&gt; alter session set &quot;_rowsource_statistics_sampfreq&quot;=128;

Session altered.

SQL&gt; select ksppinm name,
  2         ksppstdvl value,
  3         ksppstdf IsDefault,
  4         decode(bitand(ksppstvf, 5), 1, 'TRUE', 0, 'FALSE') IsSetExplicitly
  5    from x$ksppi x, x$ksppcv y
  6   where (x.indx = y.indx)
  7     and ksppinm = '_rowsource_statistics_sampfreq';

NAME                                VALUE      ISDEFAULT  ISSETEXPLICITLY
----------------------------------- ---------- ---------- --------------------
_rowsource_statistics_sampfreq      128        TRUE       TRUE

SQL&gt; select count(*) from v$parameter where name ='_rowsource_statistics_sampfreq';

  COUNT(*)
----------
         1
</pre>
<p>In this case the value used for taking timestamps is 128.</p>
<ul>
<li><em>Note that hidden parameters that have been set explicitly appear in v$parameter.</em></li>
</ul>
<p>It is also possible to set _rowsource_execution_statistics=true manually in a session with <span style="font-family:Courier New;">statistics_level</span>=typical.<br />
In that case (and in the cases below) the real frequency of taking timestamps is equal to the parameter <span style="font-family:Courier New;">_rowsource_statistics_sampfreq</span>, regardless of how it was set (explicitly or not).</p>
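<p>For illustration, the manual variant described above could look like this in a session. It is a sketch only: hidden parameters are undocumented, so it belongs on a test system.</p>
<pre class="brush: plain; light: true; title: ; notranslate">
-- statistics_level stays at its default, so timing is sampled
-- with _rowsource_statistics_sampfreq (128 unless changed).
alter session set statistics_level=typical;
alter session set &quot;_rowsource_execution_statistics&quot;=true;
</pre>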
<p><strong>2)</strong> the hint <strong><span style="font-family:Courier New;">gather_plan_statistics</span></strong><br />
It enables rowsource statistics for a single query.<br />
In this case the value of <span style="font-family:Courier New;">_rowsource_statistics_sampfreq</span> is used as is.</p>
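<p>A short example of the hint, using the table tbl from the Digger example above. The &#8216;ALLSTATS LAST&#8217; format asks dbms_xplan.display_cursor to print the rowsource statistics of the last execution.</p>
<pre class="brush: plain; light: true; title: ; notranslate">
select /*+ gather_plan_statistics index(tbl) */
       count(pad)
  from tbl;

-- Report the last statement executed in this session
select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));
</pre>
<p>In SQL*Plus this works only with serveroutput switched off; otherwise the &#8220;last statement&#8221; is the internal dbms_output call rather than the query.</p>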
<p><strong>3)</strong> SQL trace (10046 event)<br />
In this case the value of <span style="font-family:Courier New;">_rowsource_statistics_sampfreq</span> is also used as is.<br />
Keep in mind that in this case rowsource statistics can be seen not only in the trace file or the tkprof report, but also with dbms_xplan.display_cursor, which is much more convenient.</p>
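<p>A sketch of that last point, again with the table tbl from the Digger example: the query carries no gather_plan_statistics hint, and the statistics come from SQL trace being enabled. Trace level 8 is just an example; the level is not what matters here.</p>
<pre class="brush: plain; light: true; title: ; notranslate">
alter session set events '10046 trace name context forever, level 8';

select --+ index(tbl)
       count(pad)
  from tbl;

-- Rowsource statistics of the traced query, without opening the trace file
select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));

alter session set events '10046 trace name context off';
</pre>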
<ul>
<li><em>In each of these three cases the optimizer environment (v$[sql/ses/sys]_optimizer_env) property <strong>sqlstat_enabled</strong> is set to true.</em></li>
</ul>
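<p>One possible way to check this property for the current session is the query below. For the hint, which works at statement level, the place to look would presumably be v$sql_optimizer_env for the cursor in question.</p>
<pre class="brush: plain; light: true; title: ; notranslate">
select name, value, isdefault
  from v$ses_optimizer_env
 where sid = sys_context('userenv', 'sid')
   and name = 'sqlstat_enabled';
</pre>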
<p><strong>Timing: overhead</strong></p>
<p>Let&#8217;s see how and why the precision of the timing calculation (<span style="font-family:Courier New;">_rowsource_statistics_sampfreq</span>) affects the CPU consumption and execution time of a query.<br />
I will run a query with different values of <span style="font-family:Courier New;">_rowsource_statistics_sampfreq</span>: default, 0, 128, 16, 8, 4, 2, 1.</p>
<p>In the test cases below I will use a simple table of 10 million rows with an index</p>
<pre class="brush: plain; light: true; title: ; notranslate">
create table test(id not null, pad) as
  select rownum as id,
         '*' as pad
    from dual
 connect by level &lt;= 10000000;

create index idx on test(id);

SQL&gt; exec dbms_stats.gather_table_stats(user, 'TEST', estimate_percent =&gt; 100, cascade =&gt; true);

PL/SQL procedure successfully completed</pre>
<p>and two queries over the table: one with a Full Table Scan and another with an Index Full Scan.<br />
All tests are run on two versions &#8211; 10g and 11g.</p>
<p>In all cases I will use SQL tracing, because it gives easy access to the consumed CPU time instead of relying only on elapsed time.<br />
In 10g I trace the whole session; in 11g I trace only the specific SQL statement:</p>
<pre class="brush: plain; light: true; title: ; notranslate">
alter session set events 'sql_trace[sql:9pzmsyh5t14bb]';
</pre>
<p>After all the queries had been run and traced, I ran the same queries again, this time measuring the number of timer calls (gethrtime) during each run. I did this in a separate run in order to keep the measurement (dtrace) overhead out of the traces, so the CPU time in the tables below does not include dtrace overhead.</p>
<p>So, the first query, with a Full Table Scan:</p>
<pre class="brush: plain; light: true; title: ; notranslate">
select count(pad)
  from test;

--------------------------------------------
| Id  | Operation          | Name |   Rows |
--------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |
|   1 |  SORT AGGREGATE    |      |      1 |
|   2 |   TABLE ACCESS FULL| TEST |     10M|
--------------------------------------------
</pre>
<p>The results of the test are below (all tkprof reports were generated with the option aggregate=no):<br />
10g Full Table Scan: <a href="https://dl.dropbox.com/s/rq8stn6ae7vidsb/script10g_fts.txt?dl=1">full script</a>, <a href="https://dl.dropbox.com/s/p7toy82teq20icb/scrout10g_fts.txt?dl=1">output</a>, <a href="https://dl.dropbox.com/s/qrs67oi3txiy8k4/orcl_ora_2069.trc?dl=1">raw trace</a>, <a href="https://dl.dropbox.com/s/svwhcj6gaxs57zi/tkprofrep10g_fts.txt?dl=1">tkprof report</a></p>
<p>11g Full Table Scan: <a href="https://dl.dropbox.com/s/mm7j8c4gmhblowx/script11g_fts.txt?dl=1">full script</a>, <a href="https://dl.dropbox.com/s/ucbocchetbc2myg/scrout11g_fts.txt?dl=1">output</a>, <a href="https://dl.dropbox.com/s/ewyxc51a8308prw/orcl2_ora_2946.trc?dl=1">raw trace</a>, <a href="https://dl.dropbox.com/s/bjzeoryf1mq3fgz/tkprofrep11g_fts.txt?dl=1">tkprof report</a></p>
<ul>
<li><em>Note an interesting detail: this form of SQL tracing</em>
<pre class="brush: plain; light: true; title: ; notranslate">alter session set events 'sql_trace[sql:9pzmsyh5t14bb]'</pre>
<p><em>does not trace parse calls.</em></p></li>
</ul>
<table width="500" border="0" cellspacing="10" cellpadding="10">
<tbody>
<tr>
<td style="text-align:center;" rowspan="2"><strong>freq</strong></td>
<td style="text-align:center;" colspan="2"><strong>10g</strong></td>
<td style="text-align:center;" colspan="2"><strong>11g</strong></td>
</tr>
<tr>
<td style="text-align:center;"><strong>cpu, s</strong></td>
<td style="text-align:center;"><strong>timer calls</strong></td>
<td style="text-align:center;"><strong>cpu, s</strong></td>
<td style="text-align:center;"><strong>timer calls</strong></td>
</tr>
<tr>
<td>default (128)</td>
<td style="text-align:right;">0,66</td>
<td style="text-align:right;">234 392</td>
<td style="text-align:right;">0,65</td>
<td style="text-align:right;">156 262</td>
</tr>
<tr>
<td>0</td>
<td style="text-align:right;">0,59</td>
<td style="text-align:right;">5</td>
<td style="text-align:right;">0,57</td>
<td style="text-align:right;">5</td>
</tr>
<tr>
<td>128</td>
<td style="text-align:right;">0,66</td>
<td style="text-align:right;">234 392</td>
<td style="text-align:right;">0,64</td>
<td style="text-align:right;">156 262</td>
</tr>
<tr>
<td>16</td>
<td style="text-align:right;">1,13</td>
<td style="text-align:right;">1 875 017</td>
<td style="text-align:right;">1,02</td>
<td style="text-align:right;">1 250 015</td>
</tr>
<tr>
<td>8</td>
<td style="text-align:right;">1,64</td>
<td style="text-align:right;">3 750 020</td>
<td style="text-align:right;">1,32</td>
<td style="text-align:right;">2 500 017</td>
</tr>
<tr>
<td>4</td>
<td style="text-align:right;">2,50</td>
<td style="text-align:right;">7 500 019</td>
<td style="text-align:right;">2,02</td>
<td style="text-align:right;">5 000 019</td>
</tr>
<tr>
<td>2</td>
<td style="text-align:right;">4,44</td>
<td style="text-align:right;">15 000 026</td>
<td style="text-align:right;">3,41</td>
<td style="text-align:right;">10 000 025</td>
</tr>
<tr>
<td>1</td>
<td style="text-align:right;">8,18</td>
<td style="text-align:right;">30 000 027</td>
<td style="text-align:right;">6,28</td>
<td style="text-align:right;">20 000 033</td>
</tr>
</tbody>
</table>
<p>And the same steps for the modified query with an Index Full Scan:</p>
<pre class="brush: plain; light: true; title: ; notranslate">
select --+ index(test)
       count(pad)
  from test;

-----------------------------------------------------
| Id  | Operation                    | Name | Rows  |
-----------------------------------------------------
|   0 | SELECT STATEMENT             |      |     1 |
|   1 |  SORT AGGREGATE              |      |     1 |
|   2 |   TABLE ACCESS BY INDEX ROWID| TEST |    10M|
|   3 |    INDEX FULL SCAN           | IDX  |    10M|
-----------------------------------------------------
</pre>
<p>10g Index Full Scan: <a href="https://dl.dropbox.com/s/bj2lg09o321zgps/script10g_idx.txt?dl=1">full script</a>, <a href="https://dl.dropbox.com/s/b6tc5vdh4n5e8db/scrout10g_idx.txt?dl=1">output</a>, <a href="https://dl.dropbox.com/s/u1hvbuw0thafkro/orcl_ora_2091.trc?dl=1">raw trace</a>, <a href="https://dl.dropbox.com/s/rqw0ztji4lgb0z0/tkprofrep10g_idx.txt?dl=1">tkprof report</a></p>
<p>11g Index Full Scan: <a href="https://dl.dropbox.com/s/aly55hpnzdmspdq/script11g_idx.txt?dl=1">full script</a>, <a href="https://dl.dropbox.com/s/ehg0cok9b56d1pn/scrout11g_idx.txt?dl=1">output</a>, <a href="https://dl.dropbox.com/s/9wtj1g51da1ifpr/orcl2_ora_21296.trc?dl=1">raw trace</a>, <a href="https://dl.dropbox.com/s/gvgkn3xddhuv7ei/tkprofrep11g_idx.txt?dl=1">tkprof report</a></p>
<table width="500" border="0" cellspacing="10" cellpadding="10">
<tbody>
<tr>
<td style="text-align:center;" rowspan="2"><strong>freq</strong></td>
<td style="text-align:center;" colspan="2"><strong>10g</strong></td>
<td style="text-align:center;" colspan="2"><strong>11g</strong></td>
</tr>
<tr>
<td style="text-align:center;"><strong>cpu, s</strong></td>
<td style="text-align:center;"><strong>timer calls</strong></td>
<td style="text-align:center;"><strong>cpu, s</strong></td>
<td style="text-align:center;"><strong>timer calls</strong></td>
</tr>
<tr>
<td>default (128)</td>
<td style="text-align:right;">3,49</td>
<td style="text-align:right;">937 531</td>
<td style="text-align:right;"> 2,78</td>
<td style="text-align:right;"> 312 519</td>
</tr>
<tr>
<td>0</td>
<td style="text-align:right;">3,28</td>
<td style="text-align:right;">5</td>
<td style="text-align:right;"> 2,62</td>
<td style="text-align:right;">  7</td>
</tr>
<tr>
<td>128</td>
<td style="text-align:right;">3,52</td>
<td style="text-align:right;"> 937 534</td>
<td style="text-align:right;"> 2,69</td>
<td style="text-align:right;"> 312 519</td>
</tr>
<tr>
<td>16</td>
<td style="text-align:right;">5,24</td>
<td style="text-align:right;">7 500 036</td>
<td style="text-align:right;"> 3,32</td>
<td style="text-align:right;">  2 500 021</td>
</tr>
<tr>
<td>8</td>
<td style="text-align:right;">7,18</td>
<td style="text-align:right;"> 15 000 038</td>
<td style="text-align:right;"> 4,31</td>
<td style="text-align:right;"> 5 000 025</td>
</tr>
<tr>
<td>4</td>
<td style="text-align:right;">11,25</td>
<td style="text-align:right;"> 30 000 044</td>
<td style="text-align:right;"> 5,28</td>
<td style="text-align:right;"> 10 000 029</td>
</tr>
<tr>
<td>2</td>
<td style="text-align:right;">19,01</td>
<td style="text-align:right;">60 000 056</td>
<td style="text-align:right;"> 7,92</td>
<td style="text-align:right;">  20 000 041</td>
</tr>
<tr>
<td>1</td>
<td style="text-align:right;">34,03</td>
<td style="text-align:right;">120 000 067</td>
<td style="text-align:right;"> 13,06</td>
<td style="text-align:right;">  40 000 057</td>
</tr>
</tbody>
</table>
<p>We can see that lowering <span style="font-family:Courier New;">_rowsource_statistics_sampfreq</span> (that is, taking timestamps more often) increases the number of additional timer calls, which obviously increases CPU consumption: each halving of the value roughly doubles the number of timer calls.<br />
We can also see that the default value 128 does not lead to significant overhead.</p>
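<p>The timer call counts in the tables above fit a simple model: (timer calls per row) * (10 million rows) / freq. The per-row factors used below (3 and 2 for the Full Table Scan, 12 and 4 for the Index Full Scan, for 10g and 11g respectively) are not documented constants. I simply derived them from the freq = 1 rows of the tables, so treat this as a back-of-the-envelope check.</p>
<pre class="brush: plain; light: true; title: ; notranslate">
-- Estimated timer calls = calls per row * rows / freq
select freq,
       round(3  * 10000000 / freq) as fts_10g,
       round(2  * 10000000 / freq) as fts_11g,
       round(12 * 10000000 / freq) as ifs_10g,
       round(4  * 10000000 / freq) as ifs_11g
  from (select power(2, level - 1) as freq
          from dual
       connect by level &lt;= 8)
 where freq in (1, 2, 4, 8, 16, 128)
 order by freq desc;
</pre>
<p>For freq = 128 the model gives 234 375, 156 250, 937 500 and 312 500 calls, against the measured 234 392, 156 262, 937 534 and 312 519.</p>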
<p>Also pay attention to the following:<br />
- in the Full Table Scan case the number of additional timer calls in 10g is 1.5 times greater than in 11g. We saw the reason above in the section &#8220;How it works&#8221; &#8211; in 10g <span style="font-family:Courier New;">qerstUpdateStats()</span> calls <span style="font-family:Courier New;">gethrtime()</span> twice<br />
- in the Index Full Scan case the number of additional timer calls in 10g is 3(!) times greater than in 11g. Why?<br />
Let&#8217;s take a look at where, in which call stacks, <span style="font-family:Courier New;">gethrtime()</span> was called.</p>
<pre class="brush: plain; light: true; title: ; notranslate">
dtrace -p &lt;PID&gt; -n 'pid$target:libc.so.1:gethrtime:entry{@tim[ustack()]=count();}' &gt; report.txt</pre>
<p>Excerpts:<br />
10g:</p>
<pre class="brush: plain; collapse: true; gutter: false; light: false; title: ; toolbar: true; notranslate">
...
              libc.so.1`gethrtime
              oracle`qerstUpdateStats+0x2fd
              oracle`qerstFetch+0x152
              oracle`qertbFetchByRowID+0x271
              oracle`qerstFetch+0xf0
              oracle`qergsFetch+0x16c
              oracle`qerstFetch+0xf0
              oracle`opifch2+0xa4e
              oracle`kpoal8+0xdb7
              oracle`opiodr+0x433
              oracle`ttcpip+0x46a
              oracle`opitsk+0x52d
              oracle`opiino+0x3f0
              oracle`opiodr+0x433
              oracle`opidrv+0x2f1
              oracle`sou2o+0x5b
              oracle`opimai_real+0x84
              oracle`main+0x64
              oracle`0xdba2dc
          9999999

              libc.so.1`gethrtime
              oracle`qerstUpdateStats+0x2fd
              oracle`qerstRowP+0x3d
              oracle`qerixtFetch+0x47d
              oracle`qerstFetch+0xf0
              oracle`qerpfFetch+0x102
              oracle`qerstFetch+0x12a
              oracle`qertbFetchByRowID+0x271
              oracle`qerstFetch+0xf0
              oracle`qergsFetch+0x16c
              oracle`qerstFetch+0xf0
              oracle`opifch2+0xa4e
              oracle`kpoal8+0xdb7
              oracle`opiodr+0x433
              oracle`ttcpip+0x46a
              oracle`opitsk+0x52d
              oracle`opiino+0x3f0
              oracle`opiodr+0x433
              oracle`opidrv+0x2f1
              oracle`sou2o+0x5b
          9999999

              libc.so.1`gethrtime
              oracle`qerstUpdateStats+0x2fd
              oracle`qerstRowP+0x3d
              oracle`qertbFetchByRowID+0x88f
              oracle`qerstFetch+0xf0
              oracle`qergsFetch+0x16c
              oracle`qerstFetch+0xf0
              oracle`opifch2+0xa4e
              oracle`kpoal8+0xdb7
              oracle`opiodr+0x433
              oracle`ttcpip+0x46a
              oracle`opitsk+0x52d
              oracle`opiino+0x3f0
              oracle`opiodr+0x433
              oracle`opidrv+0x2f1
              oracle`sou2o+0x5b
              oracle`opimai_real+0x84
              oracle`main+0x64
              oracle`0xdba2dc
          9999999

              libc.so.1`gethrtime
              oracle`qerstUpdateStats+0x31e
              oracle`qerstFetch+0x152
              oracle`qertbFetchByRowID+0x271
              oracle`qerstFetch+0xf0
              oracle`qergsFetch+0x16c
              oracle`qerstFetch+0xf0
              oracle`opifch2+0xa4e
              oracle`kpoal8+0xdb7
              oracle`opiodr+0x433
              oracle`ttcpip+0x46a
              oracle`opitsk+0x52d
              oracle`opiino+0x3f0
              oracle`opiodr+0x433
              oracle`opidrv+0x2f1
              oracle`sou2o+0x5b
              oracle`opimai_real+0x84
              oracle`main+0x64
              oracle`0xdba2dc
          9999999

              libc.so.1`gethrtime
              oracle`qerstUpdateStats+0x31e
              oracle`qerstRowP+0x3d
              oracle`qerixtFetch+0x47d
              oracle`qerstFetch+0xf0
              oracle`qerpfFetch+0x102
              oracle`qerstFetch+0x12a
              oracle`qertbFetchByRowID+0x271
              oracle`qerstFetch+0xf0
              oracle`qergsFetch+0x16c
              oracle`qerstFetch+0xf0
              oracle`opifch2+0xa4e
              oracle`kpoal8+0xdb7
              oracle`opiodr+0x433
              oracle`ttcpip+0x46a
              oracle`opitsk+0x52d
              oracle`opiino+0x3f0
              oracle`opiodr+0x433
              oracle`opidrv+0x2f1
              oracle`sou2o+0x5b
          9999999

              libc.so.1`gethrtime
              oracle`qerstUpdateStats+0x31e
              oracle`qerstRowP+0x3d
              oracle`qertbFetchByRowID+0x88f
              oracle`qerstFetch+0xf0
              oracle`qergsFetch+0x16c
              oracle`qerstFetch+0xf0
              oracle`opifch2+0xa4e
              oracle`kpoal8+0xdb7
              oracle`opiodr+0x433
              oracle`ttcpip+0x46a
              oracle`opitsk+0x52d
              oracle`opiino+0x3f0
              oracle`opiodr+0x433
              oracle`opidrv+0x2f1
              oracle`sou2o+0x5b
              oracle`opimai_real+0x84
              oracle`main+0x64
              oracle`0xdba2dc
          9999999

              libc.so.1`gethrtime
              oracle`qerstSnapStats+0x44
              oracle`qerstRowP+0x29b
              oracle`qerixtFetch+0x47d
              oracle`qerstFetch+0xf0
              oracle`qerpfFetch+0x102
              oracle`qerstFetch+0x12a
              oracle`qertbFetchByRowID+0x271
              oracle`qerstFetch+0xf0
              oracle`qergsFetch+0x16c
              oracle`qerstFetch+0xf0
              oracle`opifch2+0xa4e
              oracle`kpoal8+0xdb7
              oracle`opiodr+0x433
              oracle`ttcpip+0x46a
              oracle`opitsk+0x52d
              oracle`opiino+0x3f0
              oracle`opiodr+0x433
              oracle`opidrv+0x2f1
              oracle`sou2o+0x5b
         10000000

              libc.so.1`gethrtime
              oracle`qerstSnapStats+0x44
              oracle`qerstRowP+0x29b
              oracle`qertbFetchByRowID+0x88f
              oracle`qerstFetch+0xf0
              oracle`qergsFetch+0x16c
              oracle`qerstFetch+0xf0
              oracle`opifch2+0xa4e
              oracle`kpoal8+0xdb7
              oracle`opiodr+0x433
              oracle`ttcpip+0x46a
              oracle`opitsk+0x52d
              oracle`opiino+0x3f0
              oracle`opiodr+0x433
              oracle`opidrv+0x2f1
              oracle`sou2o+0x5b
              oracle`opimai_real+0x84
              oracle`main+0x64
              oracle`0xdba2dc
         10000000

              libc.so.1`gethrtime
              oracle`qerstSnapStats+0x44
              oracle`qerstFetch+0x48
              oracle`qertbFetchByRowID+0x271
              oracle`qerstFetch+0xf0
              oracle`qergsFetch+0x16c
              oracle`qerstFetch+0xf0
              oracle`opifch2+0xa4e
              oracle`kpoal8+0xdb7
              oracle`opiodr+0x433
              oracle`ttcpip+0x46a
              oracle`opitsk+0x52d
              oracle`opiino+0x3f0
              oracle`opiodr+0x433
              oracle`opidrv+0x2f1
              oracle`sou2o+0x5b
              oracle`opimai_real+0x84
              oracle`main+0x64
              oracle`0xdba2dc
         10000001

              libc.so.1`gethrtime
              oracle`qerstSnapStats+0x44
              oracle`qerstFetch+0x48
              oracle`qerpfFetch+0x102
              oracle`qerstFetch+0x12a
              oracle`qertbFetchByRowID+0x271
              oracle`qerstFetch+0xf0
              oracle`qergsFetch+0x16c
              oracle`qerstFetch+0xf0
              oracle`opifch2+0xa4e
              oracle`kpoal8+0xdb7
              oracle`opiodr+0x433
              oracle`ttcpip+0x46a
              oracle`opitsk+0x52d
              oracle`opiino+0x3f0
              oracle`opiodr+0x433
              oracle`opidrv+0x2f1
              oracle`sou2o+0x5b
              oracle`opimai_real+0x84
              oracle`main+0x64
         10000001

              libc.so.1`gethrtime
              oracle`qerstUpdateStats+0x134
              oracle`qerstFetch+0x152
              oracle`qerpfFetch+0x102
              oracle`qerstFetch+0x12a
              oracle`qertbFetchByRowID+0x271
              oracle`qerstFetch+0xf0
              oracle`qergsFetch+0x16c
              oracle`qerstFetch+0xf0
              oracle`opifch2+0xa4e
              oracle`kpoal8+0xdb7
              oracle`opiodr+0x433
              oracle`ttcpip+0x46a
              oracle`opitsk+0x52d
              oracle`opiino+0x3f0
              oracle`opiodr+0x433
              oracle`opidrv+0x2f1
              oracle`sou2o+0x5b
              oracle`opimai_real+0x84
              oracle`main+0x64
         10000001

              libc.so.1`gethrtime
              oracle`qerstUpdateStats+0x155
              oracle`qerstFetch+0x152
              oracle`qerpfFetch+0x102
              oracle`qerstFetch+0x12a
              oracle`qertbFetchByRowID+0x271
              oracle`qerstFetch+0xf0
              oracle`qergsFetch+0x16c
              oracle`qerstFetch+0xf0
              oracle`opifch2+0xa4e
              oracle`kpoal8+0xdb7
              oracle`opiodr+0x433
              oracle`ttcpip+0x46a
              oracle`opitsk+0x52d
              oracle`opiino+0x3f0
              oracle`opiodr+0x433
              oracle`opidrv+0x2f1
              oracle`sou2o+0x5b
              oracle`opimai_real+0x84
              oracle`main+0x64
         10000001
</pre>
<p>11g:</p>
<pre class="brush: plain; collapse: true; gutter: false; light: false; title: ; toolbar: true; notranslate">
...
              libc.so.1`gethrtime
              oracle`qerstUpdateStats+0x206
              oracle`qerstRowP+0x4e
              oracle`qertbFetchByRowID+0xa4e
              oracle`qerstFetch+0x3ea
              oracle`qergsFetch+0x7c3
              oracle`qerstFetch+0x3ea
              oracle`opifch2+0x8a4
              oracle`kpoal8+0xfb9
              oracle`opiodr+0x3a7
              oracle`ttcpip+0x4f7
              oracle`opitsk+0x608
              oracle`opiino+0x335
              oracle`opiodr+0x3a7
              oracle`opidrv+0x2e6
              oracle`sou2o+0x58
              oracle`opimai_real+0x20a
              oracle`ssthrdmain+0x125
              oracle`main+0xd4
              oracle`0x15d18bc
          9999999

              libc.so.1`gethrtime
              oracle`qerstUpdateStats+0x206
              oracle`qerstFetch+0x1a2
              oracle`qertbFetchByRowID+0x4c7
              oracle`qerstFetch+0x3ea
              oracle`qergsFetch+0x7c3
              oracle`qerstFetch+0x3ea
              oracle`opifch2+0x8a4
              oracle`kpoal8+0xfb9
              oracle`opiodr+0x3a7
              oracle`ttcpip+0x4f7
              oracle`opitsk+0x608
              oracle`opiino+0x335
              oracle`opiodr+0x3a7
              oracle`opidrv+0x2e6
              oracle`sou2o+0x58
              oracle`opimai_real+0x20a
              oracle`ssthrdmain+0x125
              oracle`main+0xd4
              oracle`0x15d18bc
          9999999

              libc.so.1`gethrtime
              oracle`qerstRowP+0x25d
              oracle`qertbFetchByRowID+0xa4e
              oracle`qerstFetch+0x3ea
              oracle`qergsFetch+0x7c3
              oracle`qerstFetch+0x3ea
              oracle`opifch2+0x8a4
              oracle`kpoal8+0xfb9
              oracle`opiodr+0x3a7
              oracle`ttcpip+0x4f7
              oracle`opitsk+0x608
              oracle`opiino+0x335
              oracle`opiodr+0x3a7
              oracle`opidrv+0x2e6
              oracle`sou2o+0x58
              oracle`opimai_real+0x20a
              oracle`ssthrdmain+0x125
              oracle`main+0xd4
              oracle`0x15d18bc
         10000000

              libc.so.1`gethrtime
              oracle`qerstFetch+0x234
              oracle`qertbFetchByRowID+0x4c7
              oracle`qerstFetch+0x3ea
              oracle`qergsFetch+0x7c3
              oracle`qerstFetch+0x3ea
              oracle`opifch2+0x8a4
              oracle`kpoal8+0xfb9
              oracle`opiodr+0x3a7
              oracle`ttcpip+0x4f7
              oracle`opitsk+0x608
              oracle`opiino+0x335
              oracle`opiodr+0x3a7
              oracle`opidrv+0x2e6
              oracle`sou2o+0x58
              oracle`opimai_real+0x20a
              oracle`ssthrdmain+0x125
              oracle`main+0xd4
              oracle`0x15d18bc
         10000001
</pre>
<p>As we can see, in 10g an additional rowsource level appears &#8211; qerpfFetch (prefetch), which is also wrapped by the qerstSnapStats/qerstUpdateStats functions. As a result, the number of timer calls is three times (1.5*2) higher than in 11g.</p>
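<p>As a rough worked example of this arithmetic, the per-stack counts from the 11g DTrace output above can be summed and scaled by the factor of three stated for 10g. The 10g total here is only an estimate derived from that factor, not a measured value, and the variable names are mine.</p>

```python
# gethrtime() call counts per distinct stack in the 11g DTrace output above
calls_11g_per_stack = [9_999_999, 9_999_999, 10_000_000, 10_000_001]
total_11g = sum(calls_11g_per_stack)

# Factor stated in the text for 10g relative to 11g (1.5 * 2)
factor_10g = 1.5 * 2
estimated_10g = int(total_11g * factor_10g)  # an estimate, not a measurement

print(f"11g timer calls: {total_11g:,}")
print(f"10g timer calls (estimate): {estimated_10g:,}")
```

<p>For a query processing 10 million rows this gives roughly 40 million timer calls in 11g and, under the stated factor, about 120 million in 10g.</p>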
<p>The amount of overhead depends on the number of these additional calls to the function that gets the timestamp, and also on how that function is implemented, so the overhead can differ between operating systems.<br />
The calls mentioned above, <strong><span style="font-family:Courier New;">gethrtime()</span></strong> on Solaris and <strong><span style="font-family:Courier New;">gettimeofday()</span></strong> on Linux, are syscalls, but of a special type: they are implemented with minimal overhead. On Solaris such syscalls are called &#8220;fast trap syscalls&#8221;; on Linux they are vsyscalls/VDSO (64-bit architectures only).</p>
<ul>
<li><em>more about vsyscall/VDSO in Linux:<br />
- <a href="http://lwn.net/Articles/446528/">vsyscalls and the vDSO</a>,<br />
- <a href="https://access.redhat.com/knowledge/docs/en-US/Red_Hat_Enterprise_MRG/1.3/html/Realtime_Tuning_Guide/sect-Realtime_Tuning_Guide-General_System_Tuning-gettimeofday_speedup.html">gettimeofday speedup</a></em></li>
</ul>
<p>This is why we do not see them as real syscalls. To catch these calls we have to trace the library calls that wrap the real syscalls. On Linux, for example, it is possible to use <strong>ltrace</strong> (library trace).</p>
<p>Thus, the performance overhead depends on<br />
- the number of additional timer calls, which itself depends on the number of rows processed in each rowsource and on the number of rowsources<br />
- the implementation of the function that gets the timestamp, so the overhead can differ between operating systems<br />
- how exactly it works inside the OS; even within the same OS it can work in different modes</p>
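<p>The cost of one timer call on a given system can be estimated with a simple loop. The sketch below is only an illustration: Python&#8217;s <span style="font-family:Courier New;">time.time()</span> stands in for the timestamp call, the function name <span style="font-family:Courier New;">timer_cost_ns</span> is mine, and the result includes interpreter loop overhead, so it overstates what Oracle itself would pay.</p>

```python
import time

def timer_cost_ns(calls=1_000_000):
    """Average cost of one timestamp call in nanoseconds (loop overhead included)."""
    start = time.perf_counter_ns()
    for _ in range(calls):
        time.time()  # the call being measured
    elapsed = time.perf_counter_ns() - start
    return elapsed / calls

per_call = timer_cost_ns()
# Scale to the 40 million timer calls made by the test query in this post
extra_seconds = per_call * 40_000_000 / 1e9
print(f"{per_call:.0f} ns per call, about {extra_seconds:.2f} s for 40M calls")
```

<p>Multiplying the per-call cost by the number of timer calls gives a first approximation of the overhead added by rowsource statistics.</p>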
<p><strong>Timing: Overhead: VDSO on Linux</strong></p>
<ul>
<li><em><a href="https://access.redhat.com/knowledge/docs/en-US/Red_Hat_Enterprise_MRG/1.3/html/Realtime_Tuning_Guide/sect-Realtime_Tuning_Guide-General_System_Tuning-gettimeofday_speedup.html">gettimeofday speedup</a>:<br />
A Virtual Dynamic Shared Object (VDSO), is a shared library that allows application in user space to perform some kernel actions without as much overhead as a system call. The VDSO is often used to provide fast access to the gettimeofday system call data.<br />
Enabling the VDSO instructs the kernel to use its definition of the symbols in the VDSO, rather than the ones found in any user-space shared libraries, particularly the glibc. The effects of enabling the VDSO are system-wide &#8211; either all processes use it or none do.<br />
When enabled, the VDSO overrides the glibc definition of gettimeofday with it&#8217;s own. This removes the overhead of a system call, as the call is made direct to the kernel memory, rather than going through the glibc.</em></li>
</ul>
<p>The behavior of gettimeofday() can be changed (system-wide):</p>
<ul>
<li><em>The VDSO boot parameter has three possible values:<br />
0<br />
Provides the most accurate time intervals at μs (microsecond) resolution, but also produces the highest call overhead, as it uses a regular system call<br />
1<br />
Slightly less accurate, although still at μs resolution, with a lower call overhead<br />
2<br />
The least accurate, with time intervals at the ms (millisecond) level, but offers the lowest call overhead</em></li>
</ul>
<p>VDSO behavior is enabled by default. The value used to enable the VDSO affects the behavior of gettimeofday. It can be set by echoing the desired value to /proc/sys/kernel/vsyscall64.</p>
<pre class="brush: plain; light: true; title: ; notranslate">echo 1 &gt; /proc/sys/kernel/vsyscall64</pre>
<p>This allows us to compare the performance of gettimeofday() as VDSO and as a regular syscall.<br />
Environment: Oracle 11.2.0.3 on Oracle Linux 6.3 inside VirtualBox</p>
<p>I will change the VDSO behavior and run the same query as above consecutively in three modes: 2, 1, 0 (in mode 0 gettimeofday() becomes a real syscall).<br />
Again, I use sql trace instead of <span style="font-family:Courier New;">statistics_level</span> or the hint <span style="font-family:Courier New;">gather_plan_statistics</span> in order to get easy access to cpu time.<br />
The tracefile: <a href="https://dl.dropbox.com/s/qc026kvqw4dyxlm/orcl_ora_3017.trc?dl=1">orcl_ora_3017.trc</a></p>
<pre class="brush: plain; light: true; title: ; notranslate">
# echo 2 &gt; /proc/sys/kernel/vsyscall64
</pre>
<pre class="brush: plain; light: true; title: ; wrap-lines: false; notranslate">
-----------------------------------------------------------------------------------------------
| Id  | Operation                    | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |      |      1 |        |      1 |00:00:05.45 |   41400 |
|   1 |  SORT AGGREGATE              |      |      1 |      1 |      1 |00:00:05.45 |   41400 |
|   2 |   TABLE ACCESS BY INDEX ROWID| TEST |      1 |     10M|     10M|00:00:04.54 |   41400 |
|   3 |    INDEX FULL SCAN           | IDX  |      1 |     10M|     10M|00:00:01.74 |   23555 |
-----------------------------------------------------------------------------------------------

FETCH #139665587261584:c=5403178,e=5453897,p=0,cr=41400,cu=0,mis=0,r=1,dep=0,og=1,plh=4182611088,tim=1350789769054152
</pre>
<p>cpu=5,40</p>
<pre class="brush: plain; light: true; title: ; notranslate">
# echo 1 &gt; /proc/sys/kernel/vsyscall64
</pre>
<pre class="brush: plain; light: true; title: ; wrap-lines: false; notranslate">
-----------------------------------------------------------------------------------------------
| Id  | Operation                    | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |      |      1 |        |      1 |00:00:05.17 |   41400 |
|   1 |  SORT AGGREGATE              |      |      1 |      1 |      1 |00:00:05.17 |   41400 |
|   2 |   TABLE ACCESS BY INDEX ROWID| TEST |      1 |     10M|     10M|00:00:04.29 |   41400 |
|   3 |    INDEX FULL SCAN           | IDX  |      1 |     10M|     10M|00:00:01.63 |   23555 |
-----------------------------------------------------------------------------------------------

FETCH #139665578749400:c=5119221,e=5169173,p=0,cr=41400,cu=0,mis=0,r=1,dep=0,og=1,plh=4182611088,tim=1350789789329746
</pre>
<p>cpu=5,12</p>
<pre class="brush: plain; light: true; title: ; notranslate">
# echo 0 &gt; /proc/sys/kernel/vsyscall64
</pre>
<pre class="brush: plain; light: true; title: ; wrap-lines: false; notranslate">
-----------------------------------------------------------------------------------------------
| Id  | Operation                    | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |      |      1 |        |      1 |00:00:11.67 |   41400 |
|   1 |  SORT AGGREGATE              |      |      1 |      1 |      1 |00:00:11.67 |   41400 |
|   2 |   TABLE ACCESS BY INDEX ROWID| TEST |      1 |     10M|     10M|00:00:09.15 |   41400 |
|   3 |    INDEX FULL SCAN           | IDX  |      1 |     10M|     10M|00:00:03.28 |   23555 |
-----------------------------------------------------------------------------------------------

FETCH #139665585290416:c=11504250,e=11668447,p=0,cr=41400,cu=0,mis=0,r=1,dep=0,og=1,plh=4182611088,tim=1350789819491035
</pre>
<p>cpu=11,50</p>
<p>Summary</p>
<table>
<tbody>
<tr>
<th>vsyscall64 mode</th>
<th>cpu time, s</th>
</tr>
<tr>
<td>0</td>
<td>11,50</td>
</tr>
<tr>
<td>1</td>
<td>5,12</td>
</tr>
<tr>
<td>2</td>
<td>5,40</td>
</tr>
</tbody>
</table>
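<p>The cpu figures in the summary come from the <span style="font-family:Courier New;">c=</span> field of the FETCH lines in the trace. A minimal sketch of extracting them, assuming <span style="font-family:Courier New;">c=</span> is reported in microseconds (which is consistent with the figures above); the helper name <span style="font-family:Courier New;">cpu_seconds</span> is mine:</p>

```python
import re

def cpu_seconds(line):
    """CPU time of one PARSE/EXEC/FETCH trace line, in seconds."""
    fields = dict(re.findall(r"(\w+)=(\d+)", line))
    return int(fields["c"]) / 1e6  # assumption: c= is in microseconds

# FETCH lines copied from the three runs above, keyed by vsyscall64 mode
# (trailing fields omitted for brevity)
fetch_lines = {
    2: "FETCH #139665587261584:c=5403178,e=5453897,p=0,cr=41400",
    1: "FETCH #139665578749400:c=5119221,e=5169173,p=0,cr=41400",
    0: "FETCH #139665585290416:c=11504250,e=11668447,p=0,cr=41400",
}
for mode in sorted(fetch_lines):
    print(mode, f"{cpu_seconds(fetch_lines[mode]):.2f}")
```

<p>Rounded to two decimals, this reproduces the values in the summary table.</p>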
<p>To measure the number of times <strong><span style="font-family:Courier New;">gettimeofday()</span></strong> has been called on Linux, it is possible to use either ltrace (in any mode) or any syscall measurement method (such as strace, or dtrace on OL6) in mode 0 (when <strong><span style="font-family:Courier New;">gettimeofday()</span></strong> is a real syscall).</p>
<p>I used dtrace and the simplest script:</p>
<pre class="brush: plain; light: true; title: ; notranslate">
# dtrace -p 3017 -n 'syscall::gettimeofday:entry/pid == $target/{@calls=count()}'
dtrace: description 'syscall::gettimeofday:entry' matched 1 probe
^C
40000640
</pre>
<p>Thus, the number of timer calls in this case is 40 million &#8211; the same number as on Solaris.</p>
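<p>Combining this call count with the cpu times from the trace gives a rough estimate of what a real syscall costs compared with the VDSO path. This is only a back-of-the-envelope calculation: it assumes that the whole cpu difference between modes 0 and 1 is caused by the timer calls, and it uses the dtrace count as is, including anything else the session called while it was being traced.</p>

```python
# Values taken from the runs above
cpu_mode0_us = 11_504_250   # c= with vsyscall64=0 (real syscall)
cpu_mode1_us = 5_119_221    # c= with vsyscall64=1 (VDSO path)
timer_calls = 40_000_640    # gettimeofday() calls counted by dtrace

# Microseconds to nanoseconds, then spread over all timer calls
extra_ns_per_call = (cpu_mode0_us - cpu_mode1_us) * 1000 / timer_calls
print(f"extra cost of a real syscall: ~{extra_ns_per_call:.0f} ns per call")
```

<p>In this environment the difference works out to roughly 160 ns per timer call.</p>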
<p><strong>Timing: known issues</strong></p>
<p>In some environments it is possible to get execution plans with absolutely crazy timings, as below.<br />
This case is Solaris 10 on VirtualBox:</p>
<pre class="brush: plain; light: true; title: ; wrap-lines: false; notranslate">
select count(*)
  from (select rownum id from dual connect by level &lt;= 999999) t1,
       (select rownum id from dual connect by level &lt;= 999999) t2
 where t1.id = t2.id;

--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name | Starts | E-Rows | A-Rows |   A-Time   |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |      |      1 |        |      1 |00:00:02.56 |       |       |          |
|   1 |  SORT AGGREGATE                  |      |      1 |      1 |      1 |00:00:02.56 |       |       |          |
|*  2 |   HASH JOIN                      |      |      1 |      1 |  99999 |07:15:32.57 |  3749K|  1936K| 5594K (0)|
|   3 |    VIEW                          |      |      1 |      1 |  99999 |03:28:07.19 |       |       |          |
|   4 |     COUNT                        |      |      1 |        |  99999 |02:31:21.65 |       |       |          |
|   5 |      CONNECT BY WITHOUT FILTERING|      |      1 |        |  99999 |01:53:30.97 |       |       |          |
|   6 |       FAST DUAL                  |      |      1 |      1 |      1 |00:00:00.01 |       |       |          |
|   7 |    VIEW                          |      |      1 |      1 |  99999 |03:28:17.39 |       |       |          |
|   8 |     COUNT                        |      |      1 |        |  99999 |02:50:24.37 |       |       |          |
|   9 |      CONNECT BY WITHOUT FILTERING|      |      1 |        |  99999 |00:56:48.52 |       |       |          |
|  10 |       FAST DUAL                  |      |      1 |      1 |      1 |00:00:00.01 |       |       |          |
--------------------------------------------------------------------------------------------------------------------
</pre>
<p>Notice that the query completed in ~3 seconds, but the reported time of some steps is more than 7 hours.<br />
The reason is a non-monotonic timer: a subsequent timestamp call can return a value lower than the previous call did (a time from the past).<br />
More details on this problem and a tool for checking it are described here: <a href="http://www.doerzbach.com/index.php?entry=entry120113-124240" rel="nofollow">http://www.doerzbach.com/index.php?entry=entry120113-124240</a></p>
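<p>The idea of such a check can be sketched in a few lines: take a series of timer readings and report every place where a reading is lower than the one before it. This is not the tool from the link above, just an illustration; the function name and the sample readings are made up.</p>

```python
def backward_steps(samples):
    """Return (index, delta) for every reading lower than the one before it."""
    steps = []
    for i in range(1, len(samples)):
        delta = samples[i] - samples[i - 1]
        if delta < 0:
            steps.append((i, delta))
    return steps

# Hypothetical timestamps in nanoseconds; the fourth reading jumps back
readings = [1000, 1500, 2100, 1900, 2600]
print(backward_steps(readings))
```

<p>With a monotonic timer the result is always an empty list; any entry means that an interval computed across that point would come out negative.</p>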
<p><strong>P.S.</strong><br />
By the way, Real-Time SQL Monitoring works in a completely different way&#8230; but that is another story. I am going to explain it in one of the next posts.</p>
<p><strong>P.P.S.</strong><br />
That seems to be all. I hope I didn&#8217;t forget anything <img src='http://s1.wp.com/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /><br />
Any additions, corrections, questions are welcome!</p>
]]></content:encoded>
			<wfw:commentRss>http://alexanderanokhin.wordpress.com/2012/12/24/timing-rowsource-statistics-part-2-overhead-and-inconsistent-time/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/48a7cd21f3eee64ace66d6668a3e1223?s=96&#38;d=http%3A%2F%2Fs0.wp.com%2Fi%2Fmu.gif&#38;r=G" medium="image">
			<media:title type="html">alexanderanokhin</media:title>
		</media:content>
	</item>
		<item>
		<title>Timing: query execution statistics (rowsource statistics). Part 1: How it works</title>
		<link>http://alexanderanokhin.wordpress.com/2012/12/24/timing-query-execution-statistics-rowsource-statistics-part-1-how-it-works/</link>
		<comments>http://alexanderanokhin.wordpress.com/2012/12/24/timing-query-execution-statistics-rowsource-statistics-part-1-how-it-works/#comments</comments>
		<pubDate>Mon, 24 Dec 2012 20:55:56 +0000</pubDate>
		<dc:creator>Alexander Anokhin</dc:creator>
				<category><![CDATA[Oracle]]></category>

		<guid isPermaLink="false">http://alexanderanokhin.wordpress.com/?p=1182</guid>
		<description><![CDATA[This is the first (but not the last) blog post in my explanation of timing. It is about timing in query execution statistics, which are also called rowsource statistics. Here I explain why the reported actual time (&#8220;A-Time&#8221; or &#8220;time=&#8221; in sql trace) can be inconsistent, why execution time with statistics_level=all can be longer, and how exactly [&#8230;]]]></description>
				<content:encoded><![CDATA[<p><em>This is the first (but not the last) blog post in my explanation of timing. It is about timing in query execution statistics, which are also called rowsource statistics.</em></p>
<p><em>Here I am explaining</em></p>
<ul>
<ul>
<li><em>why reported actual time (&#8220;A-Time&#8221; or &#8220;time=&#8221; in sql trace) can be inconsistent,</em></li>
<li><em>why execution time with <span style="font-family:Courier New;">statistics_level=all</span> can be longer,</em></li>
<li><em>and how exactly it works.</em></li>
</ul>
</ul>
<p><em>This covers versions 10g and later unless stated otherwise.</em></p>
<p><span style="color:#339966;"><strong>Merry Christmas and Happy New Year 2013!</strong></span> <img src='http://s1.wp.com/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> </p>
<p><span id="more-1182"></span></p>
<p><strong>Introduction</strong></p>
<p>Rowsource statistics are figures such as the time spent executing a rowsource (a step of the execution plan), the number of rows returned, the number of buffer gets, physical reads and writes, and some workarea usage statistics. These statistics are exposed in <span style="font-family:Courier New;">V$SQL_PLAN_STATISTICS</span> and <span style="font-family:Courier New;">V$SQL_PLAN_STATISTICS_ALL</span> (based on <span style="font-family:Courier New;">X$QESRSTAT</span> and <span style="font-family:Courier New;">X$QESRSTATALL</span>) when statistics gathering is enabled. Also, if sql_trace (event 10046) is enabled, these statistics are printed in the raw trace file as details in the &#8220;STAT #&#8221; rows and can be seen in the tkprof report as the execution plan.</p>
<p>Statistics are gathered for a query only if gathering is turned on before the query starts executing, and they can be seen only once the query has finished (even with an error) or been canceled.</p>
<p>Rowsource statistics were introduced in Oracle 9i, but there was no convenient interface for viewing them. In 10g a very important tool was introduced &#8211; the function <span style="font-family:Courier New;"><a href="http://docs.oracle.com/cd/E14072_01/appdev.112/e10577/d_xplan.htm#i998364">DBMS_XPLAN.DISPLAY_CURSOR</a></span>, which has many useful options. It has become indispensable for SQL tuning.</p>
<p><strong>How to use</strong></p>
<p>Gathering of rowsource statistics can be enabled by one of the following options:</p>
<p>1. set parameter <strong><span style="font-family:Courier New;">statistics_level</span></strong> to <strong>all</strong> (at session or system level)<br />
2. run a query with hint <span style="font-family:Courier New;">gather_plan_statistics</span><br />
3. enable sql trace</p>
<ul>
<li><em>these methods are explained in more detail below in the section &#8220;Timing sampling frequency&#8221; (part 2).</em></li>
</ul>
<p>Gathered statistics populate <span style="font-family:Courier New;">V$SQL_PLAN_STATISTICS</span> and <span style="font-family:Courier New;">V$SQL_PLAN_STATISTICS_ALL</span> and can be seen<br />
- in a very convenient form using the function <span style="font-family:Courier New;"><a href="http://docs.oracle.com/cd/E14072_01/appdev.112/e10577/d_xplan.htm#i998364">DBMS_XPLAN.DISPLAY_CURSOR</a></span> (preferable),<br />
- by querying <span style="font-family:Courier New;">V$SQL_PLAN_STATISTICS_ALL</span> directly,<br />
- in the trace file as <span style="font-family:Courier New;">STAT#</span> rows, or in the tkprof report as figures inside brackets in the execution plan, if the query was traced with sql trace</p>
<p><strong>Using DBMS_XPLAN.DISPLAY_CURSOR with SQL trace</strong><br />
It is worth emphasizing that even if you used sql trace to investigate what is going on with a query, you can still use dbms_xplan.display_cursor to see the rowsource statistics.<br />
The pros are obvious:<br />
It is easier, because it does not require access to the trace file on the host.<br />
Rowsource statistics printed in the trace file contain only a minimal set of figures: cr (consistent reads), pr (physical reads), pw (physical writes), time. The complete rowsource statistics collected in v$sql_plan_statistics[_all] and printed by dbms_xplan.display_cursor() contain more useful information than the trace, in particular<br />
- starts (number of times a rowsource has been executed),<br />
- current gets. The trace shows &#8220;cu=&#8221; only at the sql call level (parse, exec, fetch), not at the rowsource level. <span style="font-family:Courier New;">DISPLAY_CURSOR()</span> shows &#8220;buffer gets&#8221;, which is the sum of cr and cu, but it is possible to query v$sql_plan_statistics[_all] to see cr and cu as separate values;<br />
- workarea and temp space usage.<br />
So the rowsource statistics printed in the trace file are only a restricted subset of the gathered statistics.<br />
Also, display_cursor() can provide a lot of helpful information related to the execution plan, such as predicates, column projection, notes, etc.</p>
<p>Cons:<br />
<span style="font-family:Courier New;">V$SQL_PLAN_STATISTICS[_ALL]</span> contains two sets of figures: cumulative values for ALL executions of a cursor and values for the LAST execution of the cursor. This means that if a query is executed many times and only one execution in the middle is of interest, then <span style="font-family:Courier New;">V$SQL_PLAN_STATISTICS[_ALL]</span> and <span style="font-family:Courier New;">DISPLAY_CURSOR()</span> can be useless, and SQL trace is more appropriate.</p>
<p>Gathering rowsource statistics via sql trace also means that if you need rowsource statistics but the session settings cannot be changed (so the parameter <span style="font-family:Courier New;">statistics_level</span> cannot be used) and the query cannot be changed (so the hint <span style="font-family:Courier New;">gather_plan_statistics</span> cannot be used), you can simply enable sql trace in the session, or enable sql trace for a specific query using the 11g syntax:</p>
<pre class="brush: plain; light: true; title: ; notranslate">
SQL&gt; alter system set events 'sql_trace[sql:gpdmdntvzjgcr]';
</pre>
<p>where &#8220;gpdmdntvzjgcr&#8221; is the sql_id of the query we are going to trace</p>
<ul>
<li><em>See also: sql_trace event in 11g:<br />
<a href="http://oraclue.com/2009/03/24/oracle-event-sql_trace-in-11g">Miladin Modrakovic: ORACLE EVENT SQL_TRACE IN 11G</a><br />
<a href="http://blog.tanelpoder.com/2010/06/23/the-full-power-of-oracles-diagnostic-events-part-2-oradebug-doc-and-11g-improvements/">Tanel Poder: Diagnostic events. ORADEBUG DOC and 11g improvements.</a><br />
</em></li>
</ul>
<ul>
<li><em>See also: <a href="http://oracle-randolf.blogspot.co.uk/2012/12/dbmsxplandisplaycursor-and-parallel.html">Randolf Geist &#8211; How to use dbms_cursor with parallel queries</a></em></li>
</ul>
<ul>
<li><em><strong>upd:</strong> There is also the option of creating a SQL profile with the hint gather_plan_statistics inside. It can be useful especially when the sql_id is unknown, for example when the sql text is always changing/unique because of literals; details are in the comments below the page</em></li>
</ul>
<p><strong>Rowsources and rowsource statistics architecture</strong></p>
<p>Let&#8217;s look at how rowsource statistics work. First, let&#8217;s prepare the data: a table with 50 rows and an index</p>
<pre class="brush: plain; light: true; title: ; notranslate">
SQL&gt; create table tbl(id not null, pad) as
  2      select rownum as id, rpad('X',100,'X') as pad
  3        from all_objects
  4       where rownum &lt;= 50;

Table created

SQL&gt; create index idx on tbl(id);

Index created

SQL&gt; exec dbms_stats.gather_table_stats(user, 'TBL', estimate_percent =&gt; 100, cascade =&gt; true);

PL/SQL procedure successfully completed</pre>
<p>run <a href="http://alexanderanokhin.wordpress.com/tools/digger/">Digger</a></p>
<pre class="brush: plain; light: true; title: ; notranslate">
digger.sh -p PID -FSmdeoz -f opifch* -t *,libc.so.1:gethrtime &gt; output.trc

It means:
-f opifch* - tracing is enabled inside functions whose names begin with opifch
-t *,libc.so.1:gethrtime - trace all Oracle calls and calls of gethrtime from the libc library
-F - print flow indents
-S - trace all syscalls
-m - print module
-d - print relative timestamps
-z - print user stack before entering qergsFetch
</pre>
<p>and execute a query</p>
<pre class="brush: plain; title: ; notranslate">
select --+ index(tbl)
       count(pad)
  from tbl;

-----------------------------------------------------
| Id  | Operation                    | Name | Rows  |
-----------------------------------------------------
|   0 | SELECT STATEMENT             |      |     1 |
|   1 |  SORT AGGREGATE              |      |     1 |
|   2 |   TABLE ACCESS BY INDEX ROWID| TBL  |    50 |
|   3 |    INDEX FULL SCAN           | IDX  |    50 |
-----------------------------------------------------
</pre>
<p>twice &#8211; once without rowsource statistics: <a href="https://dl.dropbox.com/s/m7l9e33vfwggstt/stats_disabled_full.txt?dl=1">stats_disabled_full.txt</a><br />
and once with them: <a href="https://dl.dropbox.com/s/af2le4hmwoom3fg/freq1_full.txt?dl=1">freq1_full.txt</a></p>
<ul>
<li><em>The files <a href="https://dl.dropbox.com/s/m7l9e33vfwggstt/stats_disabled_full.txt?dl=1">stats_disabled_full.txt</a> and <a href="https://dl.dropbox.com/s/af2le4hmwoom3fg/freq1_full.txt?dl=1">freq1_full.txt</a> mentioned above contain the complete output of function calls. Instead of showing the complete call tree, I will use the output of only the qer* and qesa* functions and the <span style="font-family:Courier New;">gethrtime()</span> library call, just for better readability.<br />
To get this output I run <a href="http://alexanderanokhin.wordpress.com/tools/digger/">Digger</a> as above, but with the parameters<br />
-f qerstFetch<br />
-t qer*,qesa*,libc.so.1:gethrtime</em>
<pre class="brush: plain; light: true; title: ; notranslate">
digger.sh -p PID -FSmdeoz -f qerstFetch -t qer*,qesa*,libc.so.1:gethrtime &gt; output.trc
</pre>
<p><em>which means tracing only qer* and qesa* function calls and the function gethrtime from the libc library inside the function qerstFetch. So the output below comes from files containing these calls.<br />
Why qer* and qesa*? The qer* functions are the Query Execution Rowsource layer, and qesa* is there just to have the function qesaFastAggNonDistSS in the output; I will explain it in the section Rowsource Architecture</em></li>
</ul>
<p>We can see that with rowsource statistics enabled, every call of a rowsource function is wrapped in a corresponding qerst* function.</p>
<p>The output without rowsource statistics: <a href="https://dl.dropbox.com/s/vskc6o0pleqqjb0/stats_disabled.txt?dl=1">stats_disabled.txt</a></p>
<pre class="brush: plain; light: true; title: ; notranslate">
module call(args)             = return
------ ---------------------------------
oracle -&gt; qergsFetch(0xC1A2D828, 0xFFFFFD7FFDC353E8, 0x4B6D830)
oracle   -&gt; qertbFetchByRowID(0xC1A2D9F8, 0xFFFFFD7FFDC350C0, 0x39983B8)
oracle     -&gt; qerixtFetch(0xC1A2DC80, 0xFFFFFD7FFDC349B8, 0x0)
oracle     &lt;- qerixtFetch = 0x0
oracle     -&gt; qertbGetPartitionNumber(0xFFFFFD7FFFDFB264, 0xC1A2DB08, 0xFFFFFD7FFDC35188)
oracle     &lt;- qertbGetPartitionNumber = 0x1
oracle     -&gt; qerixtFetch(0xC1A2DC80, 0xFFFFFD7FFDC349B8, 0x0)
oracle     &lt;- qerixtFetch = 0x0
</pre>
<p>The output with rowsource statistics: <a href="https://dl.dropbox.com/s/2txp94duznt4mv5/freq1.txt?dl=1">freq1.txt</a></p>
<pre class="brush: plain; light: true; title: ; notranslate">
   module call(args)             = return
--------- ---------------------------------
   oracle -&gt; qerstFetch(0xC19A32B8, 0xFFFFFD7FFDC349B8, 0x4B6D830)
libc.so.1   -&gt; gethrtime(0xFFFFFD7FFFDFB510, 0x0, 0x43)
libc.so.1   &lt;- gethrtime = 0xB3789663
   oracle   -&gt; qergsFetch(0xC19A3428, 0xFFFFFD7FFDC346D8, 0x3570158)
   oracle     -&gt; qerstFetch(0xC19A3520, 0xFFFFFD7FFDC34438, 0x39983B8)
libc.so.1       -&gt; gethrtime(0xFFFFFD7FFFDFB290, 0x0, 0x14)
libc.so.1       &lt;- gethrtime = 0xB37B2B9C
   oracle       -&gt; qertbFetchByRowID(0xC19A3688, 0xFFFFFD7FFDC34120, 0x3570158)
   oracle         -&gt; qerstFetch(0xC19A3810, 0xFFFFFD7FFDC34088, 0x0)
libc.so.1           -&gt; gethrtime(0xFFFFFD7FFFDFAF90, 0x0, 0x51)
libc.so.1           &lt;- gethrtime = 0xB37D610A
   oracle           -&gt; qerixtFetch(0xC19A39A0, 0xFFFFFD7FFDC33980, 0x0)
   oracle           &lt;- qerixtFetch = 0x0
   oracle           -&gt; qerstUpdateStats(0xFFFFFD7FFDC34088, 0xFFFFFD7FFFDFAF90, 0x1)
libc.so.1             -&gt; gethrtime(0x59, 0xCA6C3AE0, 0x5A)
libc.so.1             &lt;- gethrtime = 0xB3812DCB
   oracle           &lt;- qerstUpdateStats = 0xE9E8513F
   oracle         &lt;- qerstFetch = 0x0
   oracle         -&gt; qertbGetPartitionNumber(0xFFFFFD7FFFDFB084, 0xC19A3798, 0xFFFFFD7FFDC341E8)
   oracle         &lt;- qertbGetPartitionNumber = 0x1
   oracle         -&gt; qerstRowP(0xFFFFFD7FFFDFB250, 0x7FFF, 0x3570158)
   oracle           -&gt; qerstUpdateStats(0xFFFFFD7FFDC34438, 0xFFFFFD7FFFDFB290, 0x1)
libc.so.1             -&gt; gethrtime(0x59, 0xCA6C3AE0, 0x5A)
libc.so.1             &lt;- gethrtime = 0xB38957DB
   oracle           &lt;- qerstUpdateStats = 0xE9E850AF
libc.so.1           -&gt; gethrtime(0xFFFFFD7FFFDFB450, 0x7FFF, 0xFFFFFD7FFDC4BBC8)
libc.so.1           &lt;- gethrtime = 0xB38B4E8F
   oracle         &lt;- qerstRowP = 0x7FFF
</pre>
<ul>
<li><em>Notice that the parameter -S was used in Digger, which means tracing syscalls, yet we do not see <span style="font-family:Courier New;">gethrtime()</span> as a syscall, only as a library call. This is because <span style="font-family:Courier New;">gethrtime()</span> is implemented as a special type of system call &#8211; a fast trap syscall. On Linux Oracle calls <span style="font-family:Courier New;">gettimeofday()</span>, which is implemented as a vsyscall/VDSO; it is also a special type of syscall, which allows an application in user space to perform some kernel actions without as much overhead as a system call.<br />
</em></li>
</ul>
<p>Each real rowsource function, such as qergsFetch, qerixtFetch or qertbFetchByRowID, is now wrapped by the qerstFetch function, which calls qerstSnapStats() before the real rowsource function and <span style="font-family:Courier New;">qerstUpdateStats()</span> after it. Inside <span style="font-family:Courier New;">qerstSnapStats()</span> and <span style="font-family:Courier New;">qerstUpdateStats()</span> Oracle gets a timestamp in order to calculate time.</p>
<p>A construct</p>
<pre class="brush: plain; light: true; title: ; notranslate">
-&gt; qersomeFetch()
&lt;- qersomeFetch()
</pre>
<p>becomes</p>
<pre class="brush: plain; light: true; title: ; notranslate">
-&gt; qerstFetch()
  -&gt; qerstSnapStats()
    -&gt; gethrtime()
    &lt;- gethrtime
  &lt;- qerstSnapStats
  -&gt; qersomeFetch()
  &lt;- qersomeFetch
  -&gt; qerstUpdateStats()
    -&gt; gethrtime()
    &lt;- gethrtime
  &lt;- qerstUpdateStats
&lt;- qerstFetch
</pre>
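<p>The wrapping can be modelled with a small toy program. It is only an illustration of the pattern, not Oracle&#8217;s code: the classes <span style="font-family:Courier New;">FakeClock</span>, <span style="font-family:Courier New;">StatsWrapper</span> and <span style="font-family:Courier New;">Rows</span> are made up, and the clock simply advances by one unit on every call, so the &#8220;time&#8221; measured is nothing but the timer calls themselves.</p>

```python
class FakeClock:
    """Deterministic stand-in for gethrtime(): advances 1 unit per call."""
    def __init__(self):
        self.now = 0
        self.calls = 0

    def read(self):
        self.calls += 1
        self.now += 1
        return self.now


class StatsWrapper:
    """Toy model of qerstFetch: times every fetch of the wrapped rowsource."""
    def __init__(self, name, source, clock):
        self.name, self.source, self.clock = name, source, clock
        self.elapsed = 0

    def fetch(self):
        started = self.clock.read()                  # role of qerstSnapStats
        row = self.source.fetch()
        self.elapsed += self.clock.read() - started  # role of qerstUpdateStats
        return row


class Rows:
    """Leaf rowsource returning n rows, then None."""
    def __init__(self, n):
        self.left = n

    def fetch(self):
        if self.left == 0:
            return None
        self.left -= 1
        return self.left


clock = FakeClock()
inner = StatsWrapper("index scan", Rows(3), clock)
outer = StatsWrapper("table access", inner, clock)
while outer.fetch() is not None:
    pass
print(clock.calls, inner.elapsed, outer.elapsed)
```

<p>Every fetch through one wrapped level costs two timer calls, and in this model the time recorded for the outer rowsource also includes the timer calls made by the inner one.</p>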
<p>Inside this qerst* function, before and after the rowsource function call, there are calls to the functions <strong><span style="font-family:Courier New;">qerstSnapStats()</span></strong>/<strong><span style="font-family:Courier New;">qerstUpdateStats()</span></strong>, which get the timestamp: <strong><span style="font-family:Courier New;">gethrtime()</span></strong> in this case (Solaris). On Linux Oracle calls <strong><span style="font-family:Courier New;">gettimeofday()</span></strong>.</p>
<ul>
<li><em>Notice that I mentioned <strong><span style="font-family:Courier New;">qerstSnapStats()</span></strong>, but there is no function <span style="font-family:Courier New;">qerstSnapStats()</span> in the output above. I suppose this is due to compiler inlining. In 10g, in the same case, we can see that the qerst* functions do not call <span style="font-family:Courier New;">gethrtime()</span> directly.<br />
An excerpt from 10.2.0.5:</em></p>
<pre class="brush: plain; light: true; title: ; notranslate">
-&gt; qergsFetch(0x389777988, 0x332EF70, 0xFFFFFD7FFFDFC4B0)
  -&gt; qerstFetch(0x389777A68, 0x1C84DE0, 0x389777988)
    -&gt; qerstSnapStats(0xFFFFFD7FFDB77520, 0xFFFFFD7FFFDFC2C8, 0x389777988)
      -&gt; gethrtime(0xFFFFFD7FFDB77520, 0xFFFFFD7FFFDFC2C8, 0x389777988)
      &lt;- gethrtime = 0x55E25FEA
    &lt;- qerstSnapStats = 0x3CD57897
    -&gt; qertbFetchByRowID(0x389777BD0, 0x332EF70, 0xFFFFFD7FFFDFC320)
      -&gt; qerstFetch(0x389777D48, 0x0, 0xFFFFFD7FFFDFC218)
        -&gt; qerstSnapStats(0xFFFFFD7FFDB772C8, 0xFFFFFD7FFFDFBEC8, 0xFFFFFD7FFFDFC218)
          -&gt; gethrtime(0xFFFFFD7FFDB772C8, 0xFFFFFD7FFFDFBEC8, 0xFFFFFD7FFFDFC218)
          &lt;- gethrtime = 0x55E65426
        &lt;- qerstSnapStats = 0x3CD57995
        -&gt; qerixtFetch(0x389777ED8, 0x0, 0xFFFFFD7FFFDFC218)
        &lt;- qerixtFetch = 0x0
        -&gt; qerstUpdateStats(0xFFFFFD7FFDB772C8, 0xFFFFFD7FFFDFBEC8, 0x1)
          -&gt; gethrtime(0xFFFFFD7FFDB772C8, 0xFFFFFD7FFFDFBEC8, 0x1)
          &lt;- gethrtime = 0x55EAB35F
          -&gt; gethrtime(0xFFFFFD7FFDB772C8, 0xFFFFFD7FFFDFBEC8, 0xF3)
          &lt;- gethrtime = 0x55EBAAB1
        &lt;- qerstUpdateStats = 0x3CD57AEA
      &lt;- qerstFetch = 0x0
</pre>
<p><em>But here in 10g we can see another problem: qerstUpdateStats calls <span style="font-family:Courier New;">gethrtime()</span> twice! We will see the effects below.<br />
Just keep in mind when looking at the output from 11.2 above that, where it looks as if qerstFetch() or qerstRowP calls gethrtime() directly, there is in fact qerstSnapStats() between them.</em></li>
</ul>
<p>Keep in mind that an execution plan is a virtual flow of execution.<br />
For example, let&#8217;s look at our execution plan:</p>
<pre class="brush: plain; title: ; notranslate">
-----------------------------------------------------
| Id  | Operation                    | Name | Rows  |
-----------------------------------------------------
|   0 | SELECT STATEMENT             |      |     1 |
|   1 |  SORT AGGREGATE              |      |     1 |
|   2 |   TABLE ACCESS BY INDEX ROWID| TBL  |    20 |
|   3 |    INDEX FULL SCAN           | IDX  |    20 |
-----------------------------------------------------
</pre>
<p>The virtual flow is:<br />
<span style="font-family:Courier New;">SELECT STATEMENT</span> (fetch) calls <span style="font-family:Courier New;">SORT AGGREGATE</span><br />
<span style="font-family:Courier New;">SORT AGGREGATE</span> calls <span style="font-family:Courier New;">TABLE ACCESS BY INDEX ROWID</span><br />
<span style="font-family:Courier New;">TABLE ACCESS BY INDEX ROWID</span> calls <span style="font-family:Courier New;">INDEX FULL SCAN</span></p>
<p>It almost matches the real flow, but not completely. In reality some actions belonging to the <span style="font-family:Courier New;">SORT AGGREGATE</span> rowsource are executed inside <span style="font-family:Courier New;">TABLE ACCESS BY INDEX ROWID</span>.<br />
<a href="https://dl.dropbox.com/s/vskc6o0pleqqjb0/stats_disabled.txt?dl=1">stats_disabled.txt</a>:</p>
<pre class="brush: plain; light: true; title: ; notranslate">
 -&gt; qergsFetch(0xC12FEB20, 0xFFFFFD7FFDC3B490, 0x4B6D830)
   -&gt; qertbFetchByRowID(0xC12FECF0, 0xFFFFFD7FFDC3B168, 0x39983B8)
     -&gt; qerixtFetch(0xC12FEF50, 0xFFFFFD7FFDC3AA60, 0x0)
     &lt;- qerixtFetch = 0x0
     -&gt; qertbGetPartitionNumber(0xFFFFFD7FFFDFB264, 0xC12FEE00, 0xFFFFFD7FFDC3B230)
     &lt;- qertbGetPartitionNumber = 0x1
     -&gt; qesaFastAggNonDistSS(0xFFFFFD7FFFDFB540, 0x7FFF, 0x39983B8)
     &lt;- qesaFastAggNonDistSS = 0x7FFF
     -&gt; qerixtFetch(0xC12FEF50, 0xFFFFFD7FFDC3AA60, 0x0)
     &lt;- qerixtFetch = 0x0
     -&gt; qertbGetPartitionNumber(0xFFFFFD7FFFDFB264, 0xC12FEE00, 0xFFFFFD7FFDC3B230)
     &lt;- qertbGetPartitionNumber = 0x1
     -&gt; qesaFastAggNonDistSS(0xFFFFFD7FFFDFB540, 0x7FFF, 0x39983B8)
     &lt;- qesaFastAggNonDistSS = 0x7FFF
</pre>
<p>Note that the function qesaFastAggNonDistSS from the SORT AGGREGATE rowsource is called inside qertbFetchByRowID.</p>
<p>Another example: let&#8217;s take a slightly modified query with a hash join:</p>
<pre class="brush: plain; light: true; title: ; notranslate">
select --+ index(t) leading(t t2) use_hash(t)
       count(pad)
  from tbl t, dual t2
 where substr(t.pad, 1, 1) = t2.dummy;

----------------------------------------------
| Id  | Operation                     | Name |
----------------------------------------------
|   0 | SELECT STATEMENT              |      |
|   1 |  SORT AGGREGATE               |      |
|*  2 |   HASH JOIN                   |      |
|   3 |    TABLE ACCESS BY INDEX ROWID| TBL  |
|   4 |     INDEX FULL SCAN           | IDX  |
|   5 |    TABLE ACCESS FULL          | DUAL |
----------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access(&quot;T2&quot;.&quot;DUMMY&quot;=SUBSTR(&quot;T&quot;.&quot;PAD&quot;,1,1))
</pre>
<p>Run the Digger:</p>
<pre class="brush: plain; light: true; title: ; notranslate">
digger.sh -p PID -FSmdeoz -f qergsFetch -t qer*,qesa* &gt; hj_stats_dis_qer_qesa.txt
</pre>
<p>Run the query; the output: <a href="https://dl.dropbox.com/s/eydldhzqo4l8nhk/hj_stats_dis_qer_qesa.txt?dl=1">hj_stats_dis_qer_qesa.txt</a></p>
<p>We can see here that the &#8220;build input&#8221; is built inside qertbFetchByRowID (the table TBL), and the &#8220;probe input&#8221; is walked inside qertbFetch (the table DUAL).</p>
<pre class="brush: plain; light: true; title: ; notranslate">
-&gt; qergsFetch(0xBF321E00, 0xFFFFFD7FFDC3B230, 0x4B6D830)
  -&gt; qerhjFetch(0xBF321FA0, 0xFFFFFD7FFDC3AFD8, 0x39983B8)
    ....
    -&gt; qertbFetchByRowID(0xBF3266D0, 0xFFFFFD7FFDC37258, 0x6C7FED0)
      -&gt; qerixtFetch(0xBF326958, 0xFFFFFD7FFDC36B50, 0x0)
      &lt;- qerixtFetch = 0x0
      -&gt; qertbGetPartitionNumber(0xFFFFFD7FFFDFB0B4, 0xBF3267E0, 0xFFFFFD7FFDC37320)
      &lt;- qertbGetPartitionNumber = 0x1
      -&gt; qerhjSplitBuild(0xFFFFFD7FFFDFB2E0, 0x7FFF, 0x6C7FED0)
      &lt;- qerhjSplitBuild = 0x7FFE
      -&gt; qerixtFetch(0xBF326958, 0xFFFFFD7FFDC36B50, 0x0)
      &lt;- qerixtFetch = 0x0
      -&gt; qertbGetPartitionNumber(0xFFFFFD7FFFDFB0B4, 0xBF3267E0, 0xFFFFFD7FFDC37320)
      &lt;- qertbGetPartitionNumber = 0x1
      -&gt; qerhjSplitBuild(0xFFFFFD7FFFDFB2E0, 0x7FFE, 0x6C7FED0)
      &lt;- qerhjSplitBuild = 0x7FFE
      ...
    &lt;- qertbFetchByRowID = 0x7FFE
     ...
    -&gt; qertbFetch(0xBF322208, 0xFFFFFD7FFDC375D0, 0x6C85860)
      -&gt; qerhjInnerProbeHashTable(0xFFFFFD7FFFDFB2E0, 0x7FFF, 0x0)
        -&gt; qerhjBuildHashTable(0xBF321FA0, 0xFFFFFD7FFDC3AFD8, 0xFFFFFD7FFFDFB2E0)
          -&gt; qerhjAllocHashTable(0xBF321FA0, 0xFFFFFD7FFDC3AFD8, 0x32)
          &lt;- qerhjAllocHashTable = 0x80
        &lt;- qerhjBuildHashTable = 0x51
        -&gt; qerhjSplitProbe(0xFFFFFD7FFFDFB2E0, 0xFFFFFD7FFDC3AFD8, 0x0)
        &lt;- qerhjSplitProbe = 0x0
        -&gt; qerhjWalkHashBucket(0xFFFFFD7FFFDFB2E0, 0xFFFFFD7FFDC3AFD8, 0x7FFF)
          -&gt; qesaFastAggNonDistSS(0xFFFFFD7FFFDFB540, 0x7FFF, 0xFFFFFD7FFDA7F366)
          &lt;- qesaFastAggNonDistSS = 0x7FFF
          -&gt; qesaFastAggNonDistSS(0xFFFFFD7FFFDFB540, 0x7FFF, 0xFFFFFD7FFDA7F2F7)
          &lt;- qesaFastAggNonDistSS = 0x7FFF
          -&gt; qesaFastAggNonDistSS(0xFFFFFD7FFFDFB540, 0x7FFF, 0xFFFFFD7FFDA7F288)
          &lt;- qesaFastAggNonDistSS = 0x7FFF
          -&gt; qesaFastAggNonDistSS(0xFFFFFD7FFFDFB540, 0x7FFF, 0xFFFFFD7FFDA7F219)
          &lt;- qesaFastAggNonDistSS = 0x7FFF
          -&gt; qesaFastAggNonDistSS(0xFFFFFD7FFFDFB540, 0x7FFF, 0xFFFFFD7FFDA7F1AA)
          &lt;- qesaFastAggNonDistSS = 0x7FFF
          -&gt; qesaFastAggNonDistSS(0xFFFFFD7FFFDFB540, 0x7FFF, 0xFFFFFD7FFDA7F13B)
          &lt;- qesaFastAggNonDistSS = 0x7FFF
          -&gt; qesaFastAggNonDistSS(0xFFFFFD7FFFDFB540, 0x7FFF, 0xFFFFFD7FFDA7F0CC)
          &lt;- qesaFastAggNonDistSS = 0x7FFF
          -&gt; qesaFastAggNonDistSS(0xFFFFFD7FFFDFB540, 0x7FFF, 0xFFFFFD7FFDA7F05D)
          &lt;- qesaFastAggNonDistSS = 0x7FFF
          ...
        &lt;- qerhjWalkHashBucket = 0x7FFF
      &lt;- qerhjInnerProbeHashTable = 0x7FFF
    &lt;- qertbFetch = 0x7FFF
    ...
  &lt;- qerhjFetch = 0x7FFF
&lt;- qergsFetch = 0x0
</pre>
<p>Why is this important in the context of our topic? Because it means that Oracle cannot simply measure time just before entering a rowsource and just after exiting it, something like this:</p>
<pre class="brush: plain; light: true; title: ; notranslate">
-&gt; qerstSnapStats() -- get time here
  -&gt; qerhjFetch()
    -&gt; qerstSnapStats() -- get time here
      -&gt; qertbFetchByRowID()
      &lt;- qertbFetchByRowID
    &lt;- qerstUpdateStats -- get time here, calculate time spent in qertbFetchByRowID
  &lt;- qerhjFetch
&lt;- qerstUpdateStats -- get time here, calculate time spent in qerhjFetch
</pre>
<p>In order to separate the time spent in one rowsource from another, in the cases mentioned Oracle has to call qerstSnapStats/qerstUpdateStats for every row using the function qerstRowP:<br />
(the output for the last query, this time with rowsource statistics enabled: <a href="https://dl.dropbox.com/s/xpb0bvgphdp52br/hj_freq1.txt?dl=1">hj_freq1.txt</a>)</p>
<pre class="brush: plain; light: true; title: ; notranslate">
-&gt; qerstRowP(0xFFFFFD7FFFDFAEC0, 0x7FFE, 0x3570158)
  -&gt; qerstUpdateStats(0xFFFFFD7FFDC37608, 0xFFFFFD7FFFDFAF00, 0x1)
    -&gt; gethrtime(0x59, 0xCA6EA940, 0x5A)
    &lt;- gethrtime = 0x68E025
  &lt;- qerstUpdateStats = 0x236
  -&gt; qerstRowP(0xFFFFFD7FFFDFAFE0, 0x7FFE, 0x368F5DD68)
    -&gt; qerstUpdateStats(0xFFFFFD7FFDC3A780, 0xFFFFFD7FFFDFB020, 0x1)
      -&gt; gethrtime(0x59, 0xCA6EA940, 0x5A)
      &lt;- gethrtime = 0x6B3FC8
    &lt;- qerstUpdateStats = 0x339
    -&gt; qerhjSplitBuild(0xFFFFFD7FFFDFB100, 0x7FFE, 0x368F5DE04)
    &lt;- qerhjSplitBuild = 0x7FFE
    -&gt; gethrtime(0xA545FF26, 0xFFFFFD7FFDB80FD0, 0x1)
    &lt;- gethrtime = 0x6EDEB8
  &lt;- qerstRowP = 0x7FFE
  -&gt; gethrtime(0xA545FF26, 0x4E, 0x59)
  &lt;- gethrtime = 0x706660
&lt;- qerstRowP = 0x7FFE
</pre>
<p>or</p>
<pre class="brush: plain; light: true; title: ; notranslate">
-&gt; qerstRowP(0xFFFFFD7FFFDFB250, 0x7FFF, 0xFFFFFD7FFDA7DFE3)
  -&gt; qerstUpdateStats(0xFFFFFD7FFDC3B520, 0xFFFFFD7FFFDFB290, 0x1)
    -&gt; gethrtime(0x59, 0xCA6EA940, 0x5A)
    &lt;- gethrtime = 0x323F47B
  &lt;- qerstUpdateStats = 0x27
  -&gt; qesaFastAggNonDistSS(0xFFFFFD7FFFDFB450, 0x7FFF, 0xFFFFFD7FFDC16CD0)
  &lt;- qesaFastAggNonDistSS = 0x7FFF
  -&gt; gethrtime(0xFFFFFD7FFFDFB450, 0x7FFF, 0xFFFFFD7FFDC32938)
  &lt;- gethrtime = 0x324AA83
&lt;- qerstRowP = 0x7FFF
</pre>
<p><span style="font-family:Courier New;">qerstUpdateStats()</span> here updates the statistics for qertbFetchByRowID. This keeps the time spent in the hash join or &#8220;sort aggregate&#8221; out of TABLE ACCESS BY INDEX ROWID, despite the fact that physically the hashing and counting happen inside TABLE ACCESS BY INDEX ROWID.</p>
<p>The result of this flow is that these rowsources measure time for every processed row. Thus the number of timer calls is approximately equal to the number of rows processed in every rowsource (except the top one) multiplied by two (two timestamps per row, SnapStats/UpdateStats), instead of depending only on the number of rowsources.</p>
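<p>The estimate can be written down as a rough model. The sketch below is an assumption-laden illustration: the function names are hypothetical, the row counts are illustrative rather than measured, and the real number of timer calls also depends on the Oracle version and sampling settings.</p>

```python
def timer_calls_per_row(rows_by_rowsource):
    """Two timestamps per processed row for every rowsource except the top one."""
    below_top = rows_by_rowsource[1:]
    return 2 * sum(rows for _, rows in below_top)

def timer_calls_per_rowsource(rows_by_rowsource):
    """Hypothetical alternative: one timestamp pair per rowsource, whatever the row count."""
    return 2 * len(rows_by_rowsource)

# Illustrative row counts (not measured): the top rowsource comes first.
plan = [
    ("SORT AGGREGATE", 1),
    ("TABLE ACCESS BY INDEX ROWID", 100),
    ("INDEX FULL SCAN", 100),
]

per_row = timer_calls_per_row(plan)
per_rowsource = timer_calls_per_rowsource(plan)
print(per_row, per_rowsource)
```

<p>With these illustrative numbers the per-row scheme needs hundreds of timer calls where a per-rowsource scheme would need a handful, and the gap grows linearly with the number of rows.</p>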
<p>Now is the time to look in more detail at timing: <a href="http://alexanderanokhin.wordpress.com/2012/12/24/timing-rowsource-statistics-part-2-overhead-and-inconsistent-time/">Part 2: Overhead and inconsistent time</a></p>
]]></content:encoded>
			<wfw:commentRss>http://alexanderanokhin.wordpress.com/2012/12/24/timing-query-execution-statistics-rowsource-statistics-part-1-how-it-works/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/48a7cd21f3eee64ace66d6668a3e1223?s=96&#38;d=http%3A%2F%2Fs0.wp.com%2Fi%2Fmu.gif&#38;r=G" medium="image">
			<media:title type="html">alexanderanokhin</media:title>
		</media:content>
	</item>
		<item>
		<title>The Digger: additional notes. DTrace output can be shuffled in multi-CPU environment.</title>
		<link>http://alexanderanokhin.wordpress.com/2012/11/05/the-digger-additional-notes-dtrace-output-can-be-shufled-in-multi-cpu-environment/</link>
		<comments>http://alexanderanokhin.wordpress.com/2012/11/05/the-digger-additional-notes-dtrace-output-can-be-shufled-in-multi-cpu-environment/#comments</comments>
		<pubDate>Mon, 05 Nov 2012 07:41:32 +0000</pubDate>
		<dc:creator>Alexander Anokhin</dc:creator>
				<category><![CDATA[Oracle]]></category>

		<guid isPermaLink="false">http://alexanderanokhin.wordpress.com/?p=1113</guid>
		<description><![CDATA[Below are important additions about the Digger tool. 1. DTrace output can be shuffled in a multi-CPU environment. This means that the output may not be in chronological order. This is not specific to the Digger; it is how DTrace works. When a DTrace script is being executed there are two parts to DTrace: the in-kernel part and [&#8230;]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=alexanderanokhin.wordpress.com&#038;blog=17752444&#038;post=1113&#038;subd=alexanderanokhin&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>Below are important additions about the Digger tool. </p>
<p>1. <strong>DTrace output can be shuffled in a multi-CPU environment.</strong> </p>
<p>This means that the output may not be in chronological order. This is not specific to the Digger; it is how DTrace works. </p>
<p>When a DTrace script is being executed there are two parts to DTrace: the in-kernel part and the DTrace script process in userland. When a DTrace probe fires, the traced data is placed in a per-CPU buffer. Then, periodically, the DTrace script reads the buffers (in round-robin style) and continues processing the data for final output. While the data from any single CPU was entered into its buffer in order, the probes fire asynchronously with respect to all the CPUs. </p>
<p>If probe1 fired on CPU3, after that probe2 fired on CPU2, and after that probe3 fired on CPU1, it is possible to get output such as the following:<br />
probe3<br />
probe2<br />
probe1</p>
<p>The most common case where we can see this is when a process (thread) has migrated from one CPU to another and the DTrace consumer reads and prints the current CPU buffer before the previous one. It does not always happen, but sometimes you can see it. </p>
<p>So, do not panic if some part of the output is not chronological in a multi-CPU environment. It is expected. </p>
<p>If the output order is important, it is strongly recommended to print columns such as the timestamp or relative timestamp in order to verify that the output is chronological, and otherwise to fix the order manually or with tools like sort(1). The CPU# column is also helpful because it shows the places where a process (thread) migrated from one CPU to another.</p>
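<p>As a minimal sketch of such post-processing, the following Python snippet restores chronological order and lists CPU migrations. It assumes a simplified, hypothetical line layout in which the first whitespace-separated column is a numeric timestamp and the second is the CPU number; real Digger output has more columns and would need the column positions adjusted.</p>

```python
def sort_by_timestamp(lines):
    """Return trace lines ordered by the numeric timestamp in the first column."""
    return sorted(lines, key=lambda line: int(line.split()[0]))

def cpu_migrations(lines):
    """Return (previous_cpu, next_cpu) pairs where the CPU column changes."""
    cpus = [line.split()[1] for line in lines]
    return [(a, b) for a, b in zip(cpus, cpus[1:]) if a != b]

# Hypothetical shuffled output: timestamp, CPU number, probe name.
shuffled = [
    "300 1 probe3",
    "200 2 probe2",
    "100 3 probe1",
]

ordered = sort_by_timestamp(shuffled)
moves = cpu_migrations(ordered)
print(ordered)
print(moves)
```

<p>Sorting on the timestamp column is the same thing sort(1) does with a numeric key; the migration list simply shows where the reordering was likely to come from.</p>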
<p>2. <strong>How large is the performance impact of the Digger?</strong> </p>
<p>It depends on the usage. When you trace everything &#8211; all application functions or all library calls &#8211; the performance impact can be significant and the amount of printed information can be huge.</p>
<p>The Digger allows you to restrict<br />
a) traced area &#8211; where tracing is enabled<br />
b) traced contents &#8211; which functions are traced</p>
<p>For example, in order to trace the kcb* functions and all *read* syscalls inside the function qertbFetch,<br />
the command line should be:</p>
<pre class="brush: plain; light: true; title: ; notranslate">
digger.sh -p PID -Fcdeo -f qertbFetch -t kcb* -s *read*
</pre>
<p>The generated DTrace script will then contain only the required probes:</p>
<pre class="brush: plain; light: true; title: ; notranslate">
pid$target:a.out:qertbFetch:entry
pid$target:a.out:qertbFetch:return
+
pid$target:a.out:kcb*:entry
pid$target:a.out:kcb*:return
+
syscall::*read*:entry
syscall::*read*:return
</pre>
<p><em>Note: &#8220;a.out&#8221; is a synonym for the executing binary.</em></p>
<p>It helps to trace only relevant information, only the required functions. This decreases the performance impact and the probability of <a>drops</a>.<br />
So, if you need to decrease the impact of tracing, just try to restrict the traced area and contents.</p>
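<p>To illustrate how restricting the traced area and contents keeps the probe set small, the sketch below builds probe descriptions in the same shape as the listing above. The function build_probes is hypothetical and is not the Digger's actual generator; it assumes that an entry and a return probe are wanted for every item.</p>

```python
def build_probes(area, functions, syscalls):
    """Build DTrace probe descriptions for a traced area and its two filters."""
    probes = []
    for name in (area, functions):
        probes.append("pid$target:a.out:" + name + ":entry")
        probes.append("pid$target:a.out:" + name + ":return")
    probes.append("syscall::" + syscalls + ":entry")
    probes.append("syscall::" + syscalls + ":return")
    return probes

# Same restriction as in the example: area qertbFetch, functions kcb*, syscalls *read*.
probes = build_probes("qertbFetch", "kcb*", "*read*")
for probe in probes:
    print(probe)
```

<p>Six probe descriptions are enabled here, and each pattern matches only a limited set of functions, compared with every function in the binary when no restriction is given.</p>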
<p>3. <strong>An important bug is fixed. </strong><br />
There was an issue with tab characters being replaced by spaces when the WordPress HTML page with the source code was generated (thanks to Zhenx Li for letting me know). The issue could lead to the error &#8220;syntax error at line n: `end of file&#8217; unexpected&#8221; when the source of the tool was copied and pasted. The links to HTML pages with the source have now been replaced with direct download links.</p>
]]></content:encoded>
			<wfw:commentRss>http://alexanderanokhin.wordpress.com/2012/11/05/the-digger-additional-notes-dtrace-output-can-be-shufled-in-multi-cpu-environment/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/48a7cd21f3eee64ace66d6668a3e1223?s=96&#38;d=http%3A%2F%2Fs0.wp.com%2Fi%2Fmu.gif&#38;r=G" medium="image">
			<media:title type="html">alexanderanokhin</media:title>
		</media:content>
	</item>
		<item>
		<title>Buffer is pinned count</title>
		<link>http://alexanderanokhin.wordpress.com/2012/07/26/buffer-is-pinned-count/</link>
		<comments>http://alexanderanokhin.wordpress.com/2012/07/26/buffer-is-pinned-count/#comments</comments>
		<pubDate>Thu, 26 Jul 2012 14:40:12 +0000</pubDate>
		<dc:creator>Alexander Anokhin</dc:creator>
				<category><![CDATA[Oracle]]></category>

		<guid isPermaLink="false">http://alexanderanokhin.wordpress.com/?p=892</guid>
		<description><![CDATA[Introduction There are many cases where Oracle revisits some buffer in the buffer cache many times inside one database call. In such cases it can pin the buffer, hold it pinned, and just read the pinned buffer on subsequent visits. This avoids redundant logical reads. There are statistics &#8220;buffer is pinned count&#8221; [&#8230;]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=alexanderanokhin.wordpress.com&#038;blog=17752444&#038;post=892&#038;subd=alexanderanokhin&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p><a name="1"></a><br />
<strong>Introduction</strong></p>
<p>There are many cases where Oracle revisits some buffer in the buffer cache many times inside one database call. In such cases it can pin the buffer, hold it pinned, and just read the pinned buffer on subsequent visits. This avoids redundant logical reads.</p>
<p>There are the statistics &#8220;buffer is pinned count&#8221; and &#8220;buffer is not pinned count&#8221;.<br />
The concept is simple<span id="more-892"></span>: there is the function kcbispnd (&#8220;Kernel Cache Buffer Is Pinned&#8221;, I suppose) where Oracle checks whether a buffer is pinned. If the buffer is pinned then the statistic &#8220;buffer is pinned count&#8221; is incremented; otherwise Oracle increments &#8220;buffer is not pinned count&#8221; and usually initiates a logical read after that.<br />
There is an analog of kcbispnd, the function kcbipnns. It performs the same check but does not change any statistics. The function kcbispnd is usually called before a logical read; the function kcbipnns can be called in other places, for example before releasing a buffer.</p>
<p>Inside a procedure performing a logical read Oracle pins the buffer (except in the case of examination) and attaches a buffer handle to x$kcbbf using the function kcbzgs. When it exits from the function performing the logical read it keeps the buffer pinned and continues to work, visiting the buffer or doing other work, until the buffer is released. The function kcbrls (Kernel Cache Buffer Release) is used to release a buffer; release is also possible from other places, for example kcb_post_apply can call kcbzfs (detach the handle from x$kcbbf) directly.</p>
<ul>
<li><em>Note: read more about <a href="http://www.jlcomp.demon.co.uk/buffer_handles.html">buffer_handles</a></em></li>
</ul>
<p>In very simplified form it looks something like this:</p>
<pre class="brush: plain; light: true; title: ; notranslate">
if kcbispnd(buffer) != 1 then
    &lt;logical read&gt;
end if;
&lt;works with buffer&gt;
&lt;release the buffer&gt; -- usually by kcbrls;
</pre>
<p>In such cases Oracle:<br />
 &#8211; acquires the latch &#8220;cache buffers chains&#8221; during the logical read, inside kcbgtcr<br />
 &#8211; <del datetime="2012-08-03T11:31:47+00:00">acquires</del> can acquire the latch &#8220;cache buffer handles&#8221; while attaching the buffer handle, inside kcbzgs<br />
 &#8211; acquires the latch &#8220;cache buffers chains&#8221; while releasing the buffer<br />
 &#8211; <del datetime="2012-08-03T11:31:47+00:00">acquires</del> can acquire the latch &#8220;cache buffer handles&#8221; while detaching the buffer handle, inside kcbzfs</p>
<ul>
<li>
Upd: In order not to take the latch &#8220;cache buffer handles&#8221; on every buffer pin, a session has _db_handles_cached (5 by default) cached buffer handles. The latch is acquired only if a session needs more handles than this number.
</li>
</ul>
<p>There are cases when Oracle just reads a buffer and does not pin it inside the logical read. In such cases it increments the statistic &#8220;consistent gets &#8211; examination&#8221;, and only one &#8220;cache buffers chains&#8221; latch acquisition is needed.</p>
<ul>
<li><em>Note: read <a href="http://alexanderanokhin.wordpress.com/tools/digger/#ImportantNotes">more</a> about latch acquisition inside Oracle functions </em></li>
</ul>
<p>Note that Oracle does not execute the function kcbispnd before every logical read, but only in some cases: when at some point in the code it does not know whether a buffer will be pinned or not, because that depends on some conditions or on some data. In some cases Oracle uses kcbipnns to check whether a buffer is pinned. And when it is known that a buffer will be pinned at some point in time, Oracle just revisits it without any check.</p>
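<p>The simplified flow described above can be modeled with a toy Python class. This is only a sketch of the bookkeeping, not Oracle's implementation: the class Session and its method names are hypothetical, latches and buffer handles are ignored, and every visit is assumed to go through the counted check.</p>

```python
class Session:
    """Toy model of the pin check and its statistics."""

    def __init__(self):
        self.pinned = set()
        self.stats = {
            "buffer is pinned count": 0,
            "buffer is not pinned count": 0,
            "logical reads": 0,
        }

    def is_pinned_counted(self, buf):
        """Plays the role of kcbispnd: check and update statistics."""
        hit = buf in self.pinned
        name = "buffer is pinned count" if hit else "buffer is not pinned count"
        self.stats[name] += 1
        return hit

    def is_pinned_silent(self, buf):
        """Plays the role of kcbipnns: same check, no statistics."""
        return buf in self.pinned

    def visit(self, buf):
        if not self.is_pinned_counted(buf):
            self.stats["logical reads"] += 1   # the logical read pins the buffer
            self.pinned.add(buf)

    def release(self, buf):
        """Plays the role of kcbrls."""
        self.pinned.discard(buf)

s = Session()
for _ in range(3):
    s.visit("block A")   # first visit reads and pins, the next two reuse the pin
s.release("block A")
s.visit("block A")       # after the release a new logical read is needed
print(s.stats)
```

<p>In this model four visits cost only two logical reads, which is the saving that the &#8220;buffer is pinned count&#8221; statistic reflects.</p>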
<p><a name="2"></a><br />
<strong>Example</strong><br />
<em>Note: Oracle 11.2.0.2 on Solaris 10 is used</em></p>
<p>Let&#8217;s look at a simple example: reading a table via an index.<br />
The query:</p>
<pre class="brush: plain; light: true; title: ; notranslate">
select --+ index(tbl)
       count(pad) 
  from tbl;
</pre>
<p>The execution plan:</p>
<pre class="brush: plain; light: true; title: ; notranslate">
--------------------------------------
| Id  | Operation                    |
--------------------------------------
|   0 | SELECT STATEMENT             |
|   1 |  SORT AGGREGATE              |
|   2 |   TABLE ACCESS BY INDEX ROWID|
|   3 |    INDEX FULL SCAN           |
--------------------------------------
</pre>
<p>In this example we create a simple table with an index:</p>
<pre class="brush: plain; light: true; title: ; notranslate">
SQL&gt; create table tbl(id not null, pad) as
  2      select rownum id,
  3             rpad('*', 100, '*') pad
  4        from all_objects
  5       where rownum &lt;= 100;
 
Table created

SQL&gt; create index idx on tbl(id) pctfree 95;
 
Index created

SQL&gt; exec dbms_stats.gather_table_stats(user, 'TBL', estimate_percent =&gt; 100, cascade =&gt; true);
 
PL/SQL procedure successfully completed

SQL&gt; select blevel,
  2         leaf_blocks,
  3         clustering_factor
  4    from dba_ind_statistics
  5   where index_name = 'IDX';
 
    BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR
---------- ----------- -----------------
         1           7                 2
</pre>
<p>Notice that I use pctfree on the index to force Oracle to create more than one index block for our small table. This is because the mechanics of pinning the root index block differ slightly from those of pinning a leaf index block.</p>
<p>I execute the following query to get rowsource statistics:</p>
<pre class="brush: plain; light: true; title: ; notranslate">
select --+ index(tbl) gather_plan_statistics
       count(pad) 
  from tbl;

SQL&gt; select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));

--------------------------------------------------------------------------------------
| Id  | Operation                    | Name | Starts | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |      |      1 |      1 |00:00:00.01 |      10 |
|   1 |  SORT AGGREGATE              |      |      1 |      1 |00:00:00.01 |      10 |
|   2 |   TABLE ACCESS BY INDEX ROWID| TBL  |      1 |    100 |00:00:00.01 |      10 |
|   3 |    INDEX FULL SCAN           | IDX  |      1 |    100 |00:00:00.01 |       8 |
--------------------------------------------------------------------------------------
</pre>
<p>8 buffer gets (1 root index block + 7 leaf blocks) were done during INDEX FULL SCAN and two buffer gets (clustering factor) were done during TABLE ACCESS BY INDEX ROWID.</p>
<p>Below is an excerpt from DTraceLIO aggregation output (full DTraceLIO output: <a href="http://alexanderanokhin.wordpress.com/scripts/digger/examples/dtracelio_opifch-trc/">dtracelio_opifch.trc</a>): </p>
<pre class="brush: plain; light: true; title: ; wrap-lines: false; notranslate">
================================= Logical I/O Summary (grouped by object) ================================
 object_id  data_object_id       lio        cr    cr (d)        cu    cu (d) ispnd (Y) ispnd (N)   pin rls
---------- --------------- --------- --------- --------- --------- --------- --------- --------- ---------
         0               0         0         0         0         0         0         0         1         0
     79083           79083         2         2         0         0         0       197         2         2
     79084           79084         8         8         0         0         0         0         1         7
---------- --------------- --------- --------- --------- --------- --------- --------- --------- ---------
     total                        10        10         0         0         0       197         4         9
==========================================================================================================
</pre>
<p>Here we can see again that this query requires 8 logical reads of the object 79084 (the index IDX) and two logical reads of the object 79083 (the table TBL).<br />
Pay attention to<br />
ispnd (Y) = 197 and ispnd (N) = 2 for the table TBL,<br />
ispnd (N) = 1 for the index IDX,<br />
ispnd (N) = 1 for the object with object_id = 0.</p>
<p>pin rls (pin release, kcbrls function) has been called twice for the table and 7 times for the index.</p>
<p>So, now it is time to use <a href="http://alexanderanokhin.wordpress.com/tools/digger/">the Digger</a>!</p>
<p>Before executing our query I start the Digger and wait for the prompt to appear in the output file.</p>
<pre class="brush: plain; light: true; title: ; notranslate">
digger.sh -p 16829 -Fgdeoz -f opifch* &gt; opifch.trc</pre>
<p>the output: <a href="http://alexanderanokhin.wordpress.com/scripts/digger/examples/opifch_11_2-trc/">opifch.trc</a> (200Kb)</p>
<p>These options mean:<br />
-p PID &#8211; trace process ID 16829<br />
-F &#8211; use flow indents<br />
-gdeo &#8211; print the columns: timestamp, relative timestamp (cpu time of the process minus DTrace overhead), elapsed time of a call (ela), cpu time of a call (cpu)<br />
-z &#8211; print the call stack before entering the traced function<br />
-f &#8211; trace calls inside the function opifch*<br />
No filter (option -t) is used, which means that all application calls inside the traced function opifch* will be traced.</p>
<p><em>Note: we trace only function calls from the application and do not trace library calls or syscalls. Don&#8217;t worry about it, we will see syscalls in other Digger examples.</em></p>
<p>Why opifch? It is possible to trace the function qergsFetch (qer Group by Sort Fetch), step 2 in the execution plan, where our execution plan starts to execute, or even qertbFetchByRowID, step 3, where buffer pinning takes place. But there is an important piece of code related to buffer pinning outside of qergsFetch. That is why I trace opifch.</p>
<p>Why do I use the star symbol (opifch*) instead of opifch?<br />
There are the functions opifch and opifch2. Different clients use different APIs, so with different clients you can get different entry points for a fetch call. An example is the output from PL/SQL Developer.<br />
Notice that the output from SQL*Plus contains two fetch calls. This is because SQL*Plus issues a second fetch to get &#8220;no data found&#8221;.<br />
The output from PL/SQL Developer also contains two fetch calls, because PL/SQL Developer runs &#8220;select x from dual&#8221; before each query.</p>
<p>After that I execute the query and look at the output:</p>
<pre class="brush: plain; light: true; title: ; notranslate">
select --+ index(tbl)
       count(pad) 
  from tbl;
</pre>
<p><a name="3"></a><br />
So, let&#8217;s take a look at the output. I will start from the function qergsFetch; this is step 2 in the execution plan, where our execution plan starts to execute. We will investigate the calls before qergsFetch in later examples.</p>
<pre class="brush: plain; first-line: 53; title: ; wrap-lines: false; notranslate">
   -&gt; qergsFetch(0x89FAF9B0, 0xFFFFFD7FFD99EB60, 0x4D98000)
     -&gt; qeaeCnt(0xFFFFFD7FFD99FCA8, 0xFFFFFD7FFD99EB60, 0x1)
     &lt;- qeaeCnt = 0x8
     -&gt; qertbFetchByRowID(0x89FAFB80, 0xFFFFFD7FFD99E858, 0x3ABDE00)
       -&gt; qerixtFetch(0x89FAFE08, 0xFFFFFD7FFD99E170, 0x0)
</pre>
<p> Here we see that Oracle calls qergsFetch in order to perform SORT AGGREGATE, step 2.<br />
SORT AGGREGATE needs to get data from the step 3, so it calls qertbFetchByRowID in order to perform TABLE ACCESS BY INDEX ROWID.<br />
TABLE ACCESS BY INDEX ROWID needs to get rowids from the index, so it calls qerixtFetch in order to perform INDEX FULL SCAN.</p>
<pre class="brush: plain; first-line: 58; title: ; wrap-lines: false; notranslate">
         -&gt; qeilsr(0xFFFFFD7FFD99E3A8, 0xFFFFFD7FFD99E568, 0x0)
           -&gt; qeilbk1(0xFFFFFD7FFD99E568, 0x0, 0xFFFFFD7FFD99E3A8)
             -&gt; kcbispnd(0xFFFFFD7FFD99E610, 0x0, 0x0)
             &lt;- kcbispnd = 0x0
             -&gt; ktrgtc2(0xFFFFFD7FFD99E600, 0xFFFFFD7FFD99E1A0, 0x46491D0)
               -&gt; ktsmg_max_query(0x0, 0x0, 0xFFFFFD7FFDC12038)
               &lt;- ktsmg_max_query = 0x3F
               -&gt; kcbgtcr(0xFFFFFD7FFD99E610, 0x1, 0x47C)
                 -&gt; ktrexc(0xFFFFFD7FFFDFA960, 0xE771F80, 0x0)
                   -&gt; ktrEvalBlockForCR(0xFFFFFD7FFD99DE9C, 0xE771F80, 0xFFFFFD7FFD99DEC0)
                   &lt;- ktrEvalBlockForCR = 0x1
                   -&gt; ktcckv(0xFFFFFD7FFFDFA980, 0xFFFFFD7FFD99DE9C, 0x129FC2)
                   &lt;- ktcckv = 0x0
                   -&gt; kdifkc(0x74ADA014, 0xFFFFFD7FFD99E568, 0x129FC2)
                     -&gt; kdxbrs1(0x74ADA04C, 0xFFFFFD7FFD9A2CD0, 0xFFFFFD7FFFDF9F5E)
                       -&gt; lmebucp(0xFFFFFD7FFD9A2D11, 0x0, 0x74ADBFB1)
                       &lt;- lmebucp = 0xFFFFFFFF
                       -&gt; lmebucp(0xFFFFFD7FFD9A2D11, 0x0, 0x74ADBFC1)
                       &lt;- lmebucp = 0xFFFFFFFF
                     &lt;- kdxbrs1 = 0x1000BCC
                   &lt;- kdifkc = 0x1
                 &lt;- ktrexc = 0x2
               &lt;- kcbgtcr = 0x0
             &lt;- ktrgtc2 = 0x0
</pre>
<p>Here Oracle gets the index root block. It executes the function kcbispnd to check whether the buffer is already pinned. The buffer is not pinned, so kcbispnd increments the statistic &#8220;buffer is not pinned count&#8221; and returns 0. After that Oracle initiates a logical read (ktrgtc2-&gt;kcbgtcr).<br />
Notice that the function ktrexc is called inside kcbgtcr. This is an examination: the case when Oracle just reads the buffer and does not pin it. The statistic &#8220;consistent gets &#8211; examination&#8221; is incremented inside this function.</p>
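<p>The statistics mentioned above can be watched from the traced session itself. The following is only a sketch: it assumes the session can read v$mystat and v$statname, and that the statistic names are spelled as in 11.2. Run it before and after the test query and compare the values:</p>
<pre class="brush: plain; light: true; title: ; notranslate">
-- session-level values of the statistics discussed here
select n.name,
       s.value
  from v$mystat s,
       v$statname n
 where s.statistic# = n.statistic#
   and n.name in ('buffer is pinned count',
                  'buffer is not pinned count',
                  'consistent gets',
                  'consistent gets - examination')
 order by n.name;
</pre>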
<pre class="brush: plain; first-line: 82; title: ; wrap-lines: false; notranslate">
             -&gt; kcbipnns(0xFFFFFD7FFD99E610, 0x1, 0xFFFFFD7FFD99E600)
             &lt;- kcbipnns = 0x0
             -&gt; ktrget2(0xFFFFFD7FFD99E600, 0xFFFFFD7FFD99E1A0, 0x47D)
               -&gt; ktsmg_max_query(0x0, 0x0, 0xFFFFFD7FFDC12038)
               &lt;- ktsmg_max_query = 0x3F
               -&gt; kcbgtcr(0xFFFFFD7FFD99E610, 0x0, 0x47D)
                 -&gt; ktrexf(0xFFFFFD7FFFDFA350, 0xE771F80, 0x0)
                 &lt;- ktrexf = 0x9
                 -&gt; kcbzgs(0x1, 0xE771F80, 0x1)
                   -&gt; kssadf_numa_intl(0x26, 0x91F2BB00, 0x924905D8)
                   &lt;- kssadf_numa_intl = 0x91A74018
                 &lt;- kcbzgs = 0x91A74018
                 -&gt; kcbz_fp_buf(0x74BDF348, 0x91A74098, 0x1)
                 &lt;- kcbz_fp_buf = 0x1
               &lt;- kcbgtcr = 0x748C4014
               -&gt; kcbcge(0xFFFFFD7FFD99E610, 0xFFFF8000, 0x0)
               &lt;- kcbcge = 0x129D0C
               -&gt; ktcckv(0xFFFFFD7FFD99E610, 0xFFFFFD7FFD99DE9C, 0x0)
               &lt;- ktcckv = 0x129D0C
             &lt;- ktrget2 = 0x748C4064
</pre>
<p>Here Oracle reads the index leaf block.<br />
Note that it uses the function kcbipnns instead of kcbispnd to determine whether the buffer is pinned. It finds that the buffer is not pinned and initiates a logical read of the leaf block. Inside kcbgtcr Oracle pins the buffer and attaches the buffer handle to x$kcbbf (kcbzgs).<br />
After Oracle exits from kcbgtcr it holds the buffer pinned.</p>
<pre class="brush: plain; first-line: 114; title: ; notranslate">
           &lt;- qeilbk1 = 0x0
         &lt;- qeilsr = 0x0
         -&gt; kdifxs(0xFFFFFD7FFD99E3A8, 0x1, 0x0)
           -&gt; kdifxs1(0xFFFFFD7FFD99E3A8, 0x1, 0x0)
           &lt;- kdifxs1 = 0x748C5FB8
         &lt;- kdifxs = 0x748C5FB8
         -&gt; kafgex1(0x0, 0x748C5FB8, 0x0)
         &lt;- kafgex1 = 0x0
       &lt;- qerixtFetch = 0x0
</pre>
<p>After that Oracle fetches a row from the index block (kdifxs) and exits from qerixtFetch.<br />
Now Oracle has the first rowid and needs to read the table block to get the value of the column PAD.</p>
<pre class="brush: plain; first-line: 123; title: ; notranslate">
       -&gt; qetlbr(0xFFFFFD7FFD99E970, 0xFFFFFD7FFD9A2C14, 0x0)
         -&gt; kcbispnd(0xFFFFFD7FFD99E988, 0xFFFFFD7FFD9A2C14, 0x0)
         &lt;- kcbispnd = 0x0
         -&gt; kdsgrp(0xFFFFFD7FFD99E970, 0x0, 0xFFFFFD7FFD99E970)
           -&gt; kcbispnd(0xFFFFFD7FFD99E988, 0x0, 0xFFFFFD7FFD99E970)
           &lt;- kcbispnd = 0x0
           -&gt; ktrget2(0xFFFFFD7FFD99E978, 0xFFFFFD7FFD99E8A0, 0x360)
             -&gt; ktsmg_max_query(0x0, 0x0, 0xFFFFFD7FFDC12038)
             &lt;- ktsmg_max_query = 0x3F
             -&gt; kcbgtcr(0xFFFFFD7FFD99E988, 0x0, 0x360)
               -&gt; ktrexf(0xFFFFFD7FFFDFA720, 0xE771F80, 0x0)
               &lt;- ktrexf = 0x9
               -&gt; kcbzgs(0x1, 0xE771F80, 0x1)
                 -&gt; kssadf_numa_intl(0x26, 0x91F2BB00, 0x924905D8)
                 &lt;- kssadf_numa_intl = 0x91A74398
               &lt;- kcbzgs = 0x91A74398
               -&gt; kcbz_fp_buf(0x74BF2CC8, 0x91A74418, 0x1)
               &lt;- kcbz_fp_buf = 0x1
             &lt;- kcbgtcr = 0x74AD4014
             -&gt; kcbcge(0xFFFFFD7FFD99E988, 0x0, 0x0)
             &lt;- kcbcge = 0x0
             -&gt; ktcckv(0xFFFFFD7FFD99E988, 0xFFFFFD7FFD99DE9C, 0x0)
             &lt;- ktcckv = 0x0
           &lt;- ktrget2 = 0x74AD407C
         &lt;- kdsgrp = 0x74AD5F91
       &lt;- qetlbr = 0x74AD5F91
</pre>
<p>Here we can see a similar strategy: Oracle executes kcbispnd to check whether the buffer containing the required <strong>table</strong> block is already pinned. The function kcbispnd increments the statistic &#8220;buffer is not pinned count&#8221; and returns 0, and Oracle initiates a logical read of the buffer containing the table block.<br />
Note that it executes kcbispnd twice: the first time before the function kdsgrp (Kernel Data Scan Get Row Piece) and the second time inside it.<br />
Be aware that the first call of kcbispnd here is the call for object_id=0 which we saw in the DTraceLIO output above. </p>
<p>Why is kcbispnd executed twice here?<br />
As we will see below, for each rowid from the index Oracle performs two checks:<br />
1) The first call of kcbispnd checks whether the buffer from the previous rowid is still pinned. If it is, and it holds a different block than the one the latest rowid from the index points to, Oracle releases the pinned buffer.<br />
In our case there was no previous rowid at this point and no pinned buffer containing a table block. That is why object_id=0.<br />
2) The second call of kcbispnd checks whether the buffer for the current (latest) rowid is pinned. If not, Oracle initiates a logical read.</p>
<p>Now we have two pinned buffers: one contains the index block and the other the table block.</p>
<p>After that Oracle repeats the following pattern (except in two cases: when Oracle needs to read the next leaf block, or when the rowid from the index points to another table block):</p>
<pre class="brush: plain; first-line: 153; title: ; notranslate">
       -&gt; qerixtFetch(0x89FAFE08, 0xFFFFFD7FFD99E170, 0x0)
         -&gt; kdifxs(0xFFFFFD7FFD99E3A8, 0x1, 0x0)
           -&gt; kdifxs1(0xFFFFFD7FFD99E3A8, 0x1, 0x0)
           &lt;- kdifxs1 = 0x748C5FAC
         &lt;- kdifxs = 0x748C5FAC
         -&gt; kafgex1(0x0, 0x748C5FAC, 0x0)
         &lt;- kafgex1 = 0x0
       &lt;- qerixtFetch = 0x0
</pre>
<p>Here Oracle again calls the index fetch (qerixtFetch) and gets a row from the index without calling kcbispnd and without logical I/O. It just reads the pinned buffer. </p>
<pre class="brush: plain; first-line: 161; title: ; notranslate">
       -&gt; qetlbr(0xFFFFFD7FFD99E970, 0xFFFFFD7FFD9A2C14, 0x0)
         -&gt; kcbispnd(0xFFFFFD7FFD99E988, 0xFFFFFD7FFD9A2C14, 0x0)
         &lt;- kcbispnd = 0x1
         -&gt; kdsgrp(0xFFFFFD7FFD99E970, 0x0, 0xFFFFFD7FFD99E970)
           -&gt; kcbispnd(0xFFFFFD7FFD99E988, 0x0, 0xFFFFFD7FFD99E970)
           &lt;- kcbispnd = 0x1
         &lt;- kdsgrp = 0x74AD5F26
       &lt;- qetlbr = 0x74AD5F26
</pre>
<p>After that Oracle reads the row from the table buffer, executing kcbispnd twice: before kdsgrp and inside kdsgrp. Both calls return 1 (the buffer is pinned), so no logical read is initiated.</p>
<p>Note that Oracle executes kcbispnd before each logical read of (or visit to an already pinned) buffer containing a table block, but does not execute it before each read of a buffer containing an index block. This is what I meant earlier: Oracle does not call kcbispnd every time it is about to perform a logical read, but only where, at some point in the source code, it does not know whether a buffer will be pinned because that depends on some conditions (on the data, in our case). Here it is known at that point in the code that the index buffer is pinned, and this does not depend on anything. In contrast, whether the table buffer is pinned depends on the previous rowid from the index.</p>
<p>In our case, with the current execution plan, Oracle gets index blocks one by one: it gets each index buffer once, reads the rows from it and releases it. In contrast, a table buffer can be got many times, depending on the clustering factor. This is exactly where the clustering factor matters.</p>
<p>Two exceptions to the previous pattern:</p>
<pre class="brush: plain; first-line: 853; title: ; notranslate">
       -&gt; qerixtFetch(0x89FAFE08, 0xFFFFFD7FFD99E170, 0x0)
         -&gt; kdifxs(0xFFFFFD7FFD99E3A8, 0x1, 0x0)
           -&gt; kdifxs1(0xFFFFFD7FFD99E3A8, 0x1, 0x0)
             -&gt; kcbipnns(0xFFFFFD7FFD99E3C0, 0x1, 0x0)
             &lt;- kcbipnns = 0x1
             -&gt; kcbrls(0xFFFFFD7FFD99E3C0, 0x1, 0x0)
               -&gt; kcbzar(0x91A74958, 0x91F5D738, 0x100000)
               &lt;- kcbzar = 0x8
               -&gt; kcbzfs(0x91A748D8, 0x2000000000000019, 0xFFFFFFFD)
                 -&gt; kjbilms(0x91A748D8, 0x2000000000000019, 0xFFFFFFFD)
                 &lt;- kjbilms = 0x0
                 -&gt; kssrmf_numa_intl(0x91A748D8, 0x91F2BB00, 0x0)
                 &lt;- kssrmf_numa_intl = 0x0
               &lt;- kcbzfs = 0x0
             &lt;- kcbrls = 0x0
             -&gt; ksuttctest(0xE768C18, 0x1, 0x0)
               -&gt; nioqts(0xE768D28, 0x0, 0x0)
               &lt;- nioqts = 0x0
             &lt;- ksuttctest = 0x0
             -&gt; ktrget2(0xFFFFFD7FFD99E3B0, 0xFFFFFD7FFD99E1A0, 0x391)
               -&gt; ktsmg_max_query(0x0, 0x0, 0xFFFFFD7FFDC12038)
               &lt;- ktsmg_max_query = 0x3F
               -&gt; kcbgtcr(0xFFFFFD7FFD99E3C0, 0x0, 0x391)
                 -&gt; ktrexf(0xFFFFFD7FFFDF94D0, 0xE771F80, 0x0)
                 &lt;- ktrexf = 0x9
                 -&gt; kcbzgs(0x1, 0xE771F80, 0x1)
                   -&gt; kssadf_numa_intl(0x26, 0x91F2BB00, 0x924905D8)
                   &lt;- kssadf_numa_intl = 0x91A748D8
                 &lt;- kcbzgs = 0x91A748D8
                 -&gt; kcbz_fp_buf(0x74BDF218, 0x91A74958, 0x1)
                 &lt;- kcbz_fp_buf = 0x1
               &lt;- kcbgtcr = 0x748C2014
               -&gt; kcbcge(0xFFFFFD7FFD99E3C0, 0xFFFF8000, 0x0)
               &lt;- kcbcge = 0x129D0C
               -&gt; ktcckv(0xFFFFFD7FFD99E3C0, 0xFFFFFD7FFD99DE9C, 0x0)
               &lt;- ktcckv = 0x129D0C
             &lt;- ktrget2 = 0x748C2064
           &lt;- kdifxs1 = 0x748C3FB8
         &lt;- kdifxs = 0x748C3FB8
         -&gt; kafgex1(0x0, 0x748C3FB8, 0x0)
         &lt;- kafgex1 = 0x0
       &lt;- qerixtFetch = 0x0
</pre>
<p>Here Oracle finds that it needs to read the next leaf block. It releases the pinned buffer and initiates a logical read in order to get the next leaf block.</p>
<pre class="brush: plain; first-line: 1649; title: ; wrap-lines: false; notranslate">
       -&gt; qetlbr(0xFFFFFD7FFD99E970, 0xFFFFFD7FFD9A2C14, 0x0)
         -&gt; kcbispnd(0xFFFFFD7FFD99E988, 0xFFFFFD7FFD9A2C14, 0x0)
         &lt;- kcbispnd = 0x1
         -&gt; kcbipnns(0xFFFFFD7FFD99E988, 0xFFFFFD7FFD9A2C14, 0x0)
         &lt;- kcbipnns = 0x1
         -&gt; kcbrls(0xFFFFFD7FFD99E988, 0xFFFFFD7FFD9A2C14, 0x0)
           -&gt; kcbzar(0x91A74418, 0x919D7E10, 0x100000)
           &lt;- kcbzar = 0x8
           -&gt; kcbzfs(0x91A74398, 0x2000000000000019, 0xFFFFFFFD)
             -&gt; kjbilms(0x91A74398, 0x2000000000000019, 0xFFFFFFFD)
             &lt;- kjbilms = 0x0
             -&gt; kssrmf_numa_intl(0x91A74398, 0x91F2BB00, 0x0)
             &lt;- kssrmf_numa_intl = 0x0
           &lt;- kcbzfs = 0x0
         &lt;- kcbrls = 0x0
         -&gt; kdsgrp(0xFFFFFD7FFD99E970, 0x0, 0xFFFFFD7FFD99E970)
           -&gt; kcbispnd(0xFFFFFD7FFD99E988, 0x0, 0xFFFFFD7FFD99E970)
           &lt;- kcbispnd = 0x0
           -&gt; ktrget2(0xFFFFFD7FFD99E978, 0xFFFFFD7FFD99E8A0, 0x360)
             -&gt; ktsmg_max_query(0x0, 0x0, 0xFFFFFD7FFDC12038)
             &lt;- ktsmg_max_query = 0x3F
             -&gt; kcbgtcr(0xFFFFFD7FFD99E988, 0x0, 0x360)
               -&gt; ktrexf(0xFFFFFD7FFFDFA720, 0xE771F80, 0x0)
               &lt;- ktrexf = 0x9
               -&gt; kcbzgs(0x1, 0xE771F80, 0x1)
                 -&gt; kssadf_numa_intl(0x26, 0x91F2BB00, 0x924905D8)
                 &lt;- kssadf_numa_intl = 0x91A74398
               &lt;- kcbzgs = 0x91A74398
               -&gt; kcbz_fp_buf(0x74BDF6D8, 0x91A74418, 0x1)
               &lt;- kcbz_fp_buf = 0x1
             &lt;- kcbgtcr = 0x748CA014
             -&gt; kcbcge(0xFFFFFD7FFD99E988, 0x0, 0x0)
             &lt;- kcbcge = 0x0
             -&gt; ktcckv(0xFFFFFD7FFD99E988, 0xFFFFFD7FFD99DE9C, 0x0)
             &lt;- ktcckv = 0x0
           &lt;- ktrget2 = 0x748CA07C
         &lt;- kdsgrp = 0x748CBF91
       &lt;- qetlbr = 0x748CBF91
</pre>
<p>Here Oracle gets a rowid pointing to another table block. It checks whether the previous buffer is pinned, and kcbispnd returns 1, which means the buffer is pinned. Oracle finds that the pinned buffer contains a different table block than the one the current rowid points to, and releases the pinned buffer. This is why Oracle executes kcbispnd before kdsgrp: it checks whether the rowid from the index points to the same table block as the buffer it keeps pinned. Inside kdsgrp it executes kcbispnd again to check whether the block for the current rowid is pinned, finds that this buffer is not pinned yet, increments &#8220;buffer is not pinned count&#8221;, returns 0, and initiates a logical read of the table block.</p>
<p>This is why the clustering factor is important.<br />
After Oracle reads the next index entry, it checks whether this entry points to the same table block as the previous one. If yes, Oracle just revisits the pinned buffer. Otherwise Oracle releases the previously pinned table buffer, then gets and pins the buffer for the current index entry.</p>
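<p>As a rough cross-check, the number of table block switches can be compared with the clustering factor stored in the dictionary. This is only a sketch: it assumes that statistics have been gathered and that the indexed column of TBL is called ID (a hypothetical name, substitute the real column):</p>
<pre class="brush: plain; light: true; title: ; notranslate">
-- clustering factor as recorded by the optimizer statistics
select i.index_name,
       i.clustering_factor,
       t.blocks,
       t.num_rows
  from user_indexes i,
       user_tables t
 where i.table_name = t.table_name
   and t.table_name = 'TBL';

-- the same idea counted by hand: walk the rows in index order
-- and count how many times the table block changes
-- (ID is an assumed column name)
select count(*) as block_switches
  from (select dbms_rowid.rowid_relative_fno(rowid) as fno,
               dbms_rowid.rowid_block_number(rowid) as blk,
               lag(dbms_rowid.rowid_relative_fno(rowid)) over (order by id, rowid) as prev_fno,
               lag(dbms_rowid.rowid_block_number(rowid)) over (order by id, rowid) as prev_blk
          from tbl
         where id is not null)
 where prev_blk is null
    or blk != prev_blk
    or fno != prev_fno;
</pre>
<p>In a plan like the one traced here, each block switch corresponds to releasing one table buffer and getting the next, so the closer this number is to the number of rows, the more logical reads of table blocks should be expected.</p>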
<p>Have you noticed that a few buffers are still pinned at this point? These buffers are released at the end of the fetch call.</p>
<pre class="brush: plain; first-line: 2459; title: ; wrap-lines: false; notranslate">
   -&gt; qecrlssub(0x89FB0D58, 0xFFFFFD7FFDC12038, 0xA)
     -&gt; qergsRelease(0x89FAF9B0, 0xFFFFFD7FFD99EB60, 0xA)
       -&gt; qertbRelease(0x89FAFB80, 0xFFFFFD7FFD99E858, 0xA)
         -&gt; kcbipnns(0xFFFFFD7FFD99E988, 0xFFFFFD7FFD99E858, 0xA)
         &lt;- kcbipnns = 0x1
         -&gt; kcbrls(0xFFFFFD7FFD99E988, 0xFFFFFD7FFD99E858, 0xA)
           -&gt; kcbzar(0x91A74418, 0x918C9830, 0x100000)
           &lt;- kcbzar = 0x8
           -&gt; kcbzfs(0x91A74398, 0x2000000000000019, 0xFFFFFFFD)
             -&gt; kjbilms(0x91A74398, 0x2000000000000019, 0xFFFFFFFD)
             &lt;- kjbilms = 0x0
             -&gt; kssrmf_numa_intl(0x91A74398, 0x91F2BB00, 0x0)
             &lt;- kssrmf_numa_intl = 0x0
           &lt;- kcbzfs = 0x0
         &lt;- kcbrls = 0x0
         -&gt; qerixRelease(0x89FAFE08, 0xFFFFFD7FFD99E170, 0x0)
           -&gt; qerixReleaseSelf(0x89FAFE08, 0x1, 0x0)
             -&gt; kcbipnns(0xFFFFFD7FFD99E610, 0x1, 0x0)
             &lt;- kcbipnns = 0x1
             -&gt; kcbipnns(0xFFFFFD7FFD99E610, 0x1, 0x0)
             &lt;- kcbipnns = 0x1
             -&gt; kcbrls(0xFFFFFD7FFD99E610, 0x1, 0x0)
               -&gt; kcbzar(0x91A74098, 0x918A0548, 0x100000)
               &lt;- kcbzar = 0x8
               -&gt; kcbzfs(0x91A74018, 0x2000000000000019, 0xFFFFFFFD)
                 -&gt; kjbilms(0x91A74018, 0x2000000000000019, 0xFFFFFFFD)
                 &lt;- kjbilms = 0x0
                 -&gt; kssrmf_numa_intl(0x91A74018, 0x91F2BB00, 0x0)
                 &lt;- kssrmf_numa_intl = 0x0
               &lt;- kcbzfs = 0x0
             &lt;- kcbrls = 0x0
             -&gt; kcbipnns(0xFFFFFD7FFD99E3C0, 0x91F2BB00, 0x0)
             &lt;- kcbipnns = 0x1
             -&gt; kcbrls(0xFFFFFD7FFD99E3C0, 0x91F2BB00, 0x0)
               -&gt; kcbzar(0x91A74958, 0x91F9D780, 0x100000)
               &lt;- kcbzar = 0x8
               -&gt; kcbzfs(0x91A748D8, 0x2000000000000019, 0xFFFFFFFD)
                 -&gt; kjbilms(0x91A748D8, 0x2000000000000019, 0xFFFFFFFD)
                 &lt;- kjbilms = 0x0
                 -&gt; kssrmf_numa_intl(0x91A748D8, 0x91F2BB00, 0x0)
                 &lt;- kssrmf_numa_intl = 0x0
               &lt;- kcbzfs = 0x0
             &lt;- kcbrls = 0x0
           &lt;- qerixReleaseSelf = 0x1
         &lt;- qerixRelease = 0xE6
       &lt;- qertbRelease = 0xE6
     &lt;- qergsRelease = 0x26
   &lt;- qecrlssub = 0x26
</pre>
<p><a name="3"></a><br />
<strong>Conclusion</strong><br />
We can conclude that &#8220;buffer is pinned count&#8221;<br />
– is not the number of times Oracle revisits a pinned buffer. Oracle can read from a pinned buffer without any checks, just reading from memory, as it does with the rows fetched from the index in our example.<br />
– is not a check made before every logical read.<br />
– is not the number of times a buffer has been pinned. The number of times a buffer has been pinned is the number of logical reads, excluding examinations and direct-path reads.<br />
– is counted at the points in the source code where Oracle does not know in advance whether a buffer is pinned, because that depends on some conditions and on the data.</p>
<p>Notice that there are no calls directly related to latches, such as kslget or ksl_get_shared_latch. This is what I meant when talking about KSLBEGIN/KSLEND inside the procedures kcbgtcr and kcbrls (see <a href="http://alexanderanokhin.wordpress.com/tools/digger/#ImportantNotes">Important note #1</a>).</p>
<p><a name="4"></a><br />
<strong>Appendix</strong><br />
There is an option to trace buffer pinning by setting the parameter _trace_pin_time:</p>
<pre class="brush: plain; light: true; title: ; notranslate">
alter system set &quot;_trace_pin_time&quot;=1 scope=spfile;
</pre>
<p>The database must then be restarted.</p>
<p>After that, buffer pins (where, data block address, time) are traced in all sessions. Example:</p>
<pre class="brush: plain; light: true; title: ; notranslate">
pin kdiwh15: kdifxs dba 0x100071d:1 time 3630971669
pin kdswh05: kdsgrp dba 0x1000714:1 time 3630973016
pin kdswh05: kdsgrp dba 0x1000713:1 time 3630973465
pin kdswh05: kdsgrp dba 0x1000714:1 time 3630974088
pin kdswh05: kdsgrp dba 0x1000713:1 time 3630974532
</pre>
<p>Every line has the following format:</p>
<pre class="brush: plain; light: true; title: ; notranslate">
pin &lt;where&gt; dba &lt;DBA&gt; time &lt;timestamp&gt;
</pre>
<p>&#8220;where&#8221; is the name of the function inside which the buffer was pinned.</p>
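<p>To switch the trace off again, the parameter can be removed from the spfile. This is a sketch, assuming the parameter was set with the command above; as with any hidden parameter, try it on a test system first:</p>
<pre class="brush: plain; light: true; title: ; notranslate">
-- remove the setting from the spfile; takes effect after the next restart
alter system reset &quot;_trace_pin_time&quot; scope=spfile sid='*';
</pre>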
<p>The following query can be used to see the buffer handles currently in use (active):</p>
<pre class="brush: plain; light: true; title: ; notranslate">
select bf.*, 
       w.kcbwhdes
  from (select b.*, 
               decode(kcbbfcr, 1, 'CR', 'CUR') cr
          from x$kcbbf b
         where bitand(b.kcbbfso_flg, 1) = 1) bf,
       x$kcbwh w
 where bf.kcbbfwhr = w.indx
</pre>
<p>The join with x$kcbwh is helpful here just to see kcbwhdes, the &#8220;where&#8221;: the name of the function in which the buffer was pinned.</p>
<p><a name="5"></a><br />
<strong>Post Scriptum</strong><br />
I hope this has helped you to better understand what &#8220;buffer is pinned count&#8221; is and how to use <a href="http://alexanderanokhin.com/tools/digger/">the Digger</a>.</p>
<p>By the way, look at the announcements on the right side of the page. That is what I am going to publish next: the next blog entries will be about timing, rowsource statistics and wait events.<br />
Coming soon.</p>
]]></content:encoded>
			<wfw:commentRss>http://alexanderanokhin.wordpress.com/2012/07/26/buffer-is-pinned-count/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/48a7cd21f3eee64ace66d6668a3e1223?s=96&#38;d=http%3A%2F%2Fs0.wp.com%2Fi%2Fmu.gif&#38;r=G" medium="image">
			<media:title type="html">alexanderanokhin</media:title>
		</media:content>
	</item>
		<item>
		<title>Digger: the tool for tracing of unix processes</title>
		<link>http://alexanderanokhin.wordpress.com/2012/07/26/digger-the-tool-for-tracing-of-unix-processes/</link>
		<comments>http://alexanderanokhin.wordpress.com/2012/07/26/digger-the-tool-for-tracing-of-unix-processes/#comments</comments>
		<pubDate>Thu, 26 Jul 2012 14:38:53 +0000</pubDate>
		<dc:creator>Alexander Anokhin</dc:creator>
				<category><![CDATA[Oracle]]></category>

		<guid isPermaLink="false">http://alexanderanokhin.wordpress.com/?p=825</guid>
		<description><![CDATA[I would like to introduce a new tool – Digger. This tool lets you see the tree of a process’ calls, such as application, library and system calls (and even kernel functions and OS scheduler actions), with additional information such as function arguments, result, CPU &#38; elapsed time. This is not something Oracle specific, the tool can be used [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>I would like to introduce a new tool – Digger.<br />
This tool lets you see the tree of a process’ calls, such as application, library and system calls (and even kernel functions and OS scheduler actions), with additional information such as function arguments, result, CPU &amp; elapsed time.<br />
This is not something Oracle specific: the tool can be used for tracing any unix process (DTrace is required).</p>
<p><a href="http://alexanderanokhin.com/tools/digger/">Continue&#8230;</a></p>
]]></content:encoded>
			<wfw:commentRss>http://alexanderanokhin.wordpress.com/2012/07/26/digger-the-tool-for-tracing-of-unix-processes/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/48a7cd21f3eee64ace66d6668a3e1223?s=96&#38;d=http%3A%2F%2Fs0.wp.com%2Fi%2Fmu.gif&#38;r=G" medium="image">
			<media:title type="html">alexanderanokhin</media:title>
		</media:content>
	</item>
		<item>
		<title>Don&#8217;t forget about column projection</title>
		<link>http://alexanderanokhin.wordpress.com/2012/07/18/dont-forget-about-column-projection/</link>
		<comments>http://alexanderanokhin.wordpress.com/2012/07/18/dont-forget-about-column-projection/#comments</comments>
		<pubDate>Wed, 18 Jul 2012 13:48:47 +0000</pubDate>
		<dc:creator>Alexander Anokhin</dc:creator>
				<category><![CDATA[Oracle]]></category>

		<guid isPermaLink="false">http://alexanderanokhin.wordpress.com/?p=785</guid>
		<description><![CDATA[Note: this post is not about one particular statement, but about the importance of column projection, which should not be ignored, especially in cases such as operations requiring workareas, data access optimization, Exadata offloading and others. Let&#8217;s consider a merge of two simple tables. The tables: And a simple statement: Let&#8217;s look at the execution plan with rowsource [&#8230;]]]></description>
				<content:encoded><![CDATA[<p><em>Note: this post is not about one particular statement, but about the importance of column projection, which should not be ignored, especially in cases such as operations requiring workareas, data access optimization, Exadata offloading and others.</em></p>
<p>Let&#8217;s consider a merge of two simple tables.</p>
<pre class="brush: plain; light: true; title: ; notranslate">
 merge into t1
 using t2 on (t1.id = t2.id)
  when matched 
  then update set n = 1;
</pre>
<p>The tables:</p>
<pre class="brush: plain; light: true; title: ; notranslate">
SQL&gt; select * from v$version where rownum = 1;
 
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production

SQL &gt; create table t1 compress
  2             as
  3         select rownum as id,
  4                0 as n,
  5                lpad('*', 4000, '*') as pad
  6           from dual
  7        connect by level &lt;= 1000000;

Table created.

SQL &gt; create table t2 compress
  2             as
  3         select 1000000 + rownum as id,
  4                1 as n,
  5                lpad('*', 4000, '*') as pad
  6           from dual
  7        connect by level &lt;= 1100000;

Table created.

SQL &gt; exec dbms_stats.gather_table_stats(user, 'T1', estimate_percent =&gt; 100, degree =&gt; 4);

PL/SQL procedure successfully completed.

SQL &gt; exec dbms_stats.gather_table_stats(user, 'T2', estimate_percent =&gt; 100, degree =&gt; 4);

PL/SQL procedure successfully completed.

SQL &gt; alter session set statistics_level=all;

Session altered.
</pre>
<p>And a simple statement:</p>
<pre class="brush: plain; light: true; title: ; notranslate">
SQL &gt; merge into t1
  2   using t2 on (t1.id = t2.id)
  3    when matched 
  4    then update set n = 1;

0 rows merged.

Elapsed: 00:08:41.93
</pre>
<p>Let&#8217;s look at the execution plan with rowsource statistics:</p>
<pre class="brush: plain; light: true; title: ; wrap-lines: false; notranslate">
SQL &gt; select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));

-------------------------------------
SQL_ID  b5cp092vum9nw, child number 0
-------------------------------------
merge into t1 using t2 on (t1.id = t2.id)  when matched then update set n = 1
Plan hash value: 3423882595
----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem | Used-Tmp|
----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | MERGE STATEMENT      |      |      1 |        |      0 |00:08:41.66 |   52742 |    608K|    556K|       |       |          |         |
|   1 |  MERGE               | T1   |      1 |        |      0 |00:08:41.66 |   52742 |    608K|    556K|       |       |          |         |
|   2 |   VIEW               |      |      1 |        |      0 |00:08:41.66 |   52742 |    608K|    556K|       |       |          |         |
|*  3 |    HASH JOIN         |      |      1 |      1 |      0 |00:08:41.66 |   52742 |    608K|    556K|  2047M|    56M|   30M (1)|    4346K|
|   4 |     TABLE ACCESS FULL| T1   |      1 |   1000K|   1000K|00:00:05.96 |   22174 |  22163 |      0 |       |       |          |         |
|   5 |     TABLE ACCESS FULL| T2   |      1 |   1100K|   1100K|00:00:05.58 |   30568 |  30556 |      0 |       |       |          |         |
----------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access(&quot;T1&quot;.&quot;ID&quot;=&quot;T2&quot;.&quot;ID&quot;)
</pre>
<p>So, our statement ran for more than 8 minutes, and all of this time was spent in step 3, the HASH JOIN. Notice the amount of temp space used: 4346 Mb (it is an old oddity that the value in Used-Tmp has to be multiplied by 1000). As a result there is a large amount of I/O on temp space (columns Reads and Writes: 556K physical reads and 556K physical writes via direct path read temp and direct path write temp), because the workarea (hash_area_size) was not large enough (Used-Mem = 30Mb).<br />
<em>Note: obviously in your case the time and the amount of I/O can be completely different; in particular they depend on the hash area size, which in my case was about 30Mb.</em></p>
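<p>The same workarea figures can also be read from v$sql_workarea, where sizes are reported in bytes. This is a sketch, assuming the cursor is still in the shared pool and the view is accessible to the session:</p>
<pre class="brush: plain; light: true; title: ; notranslate">
-- workarea statistics of the last execution, per plan operation
select operation_id,
       operation_type,
       policy,
       last_execution,   -- how the last execution ran (optimal / one-pass / multi-pass)
       round(estimated_optimal_size / 1024 / 1024) as optimal_mb,
       round(estimated_onepass_size / 1024 / 1024) as onepass_mb,
       round(last_memory_used / 1024 / 1024)       as last_mem_mb,
       round(last_tempseg_size / 1024 / 1024)      as last_temp_mb
  from v$sql_workarea
 where sql_id = 'b5cp092vum9nw'
 order by operation_id;
</pre>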
<p>How can we improve the performance of the statement, and especially of the hash join?<br />
Should we increase the hash area size?</p>
<p>Maybe, but first let&#8217;s take a look at the same execution plan with column projection:</p>
<pre class="brush: plain; light: true; title: ; wrap-lines: false; notranslate">
SQL&gt; select * from table(dbms_xplan.display_cursor('b5cp092vum9nw', null, 'allstats last +projection'));
 
-------------------------------------
SQL_ID  b5cp092vum9nw, child number 0
-------------------------------------
merge into t1 using t2 on (t1.id = t2.id)  when matched then update set n = 1
Plan hash value: 3423882595
----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem | Used-Tmp|
----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | MERGE STATEMENT      |      |      1 |        |      0 |00:08:41.66 |   52742 |    608K|    556K|       |       |          |         |
|   1 |  MERGE               | T1   |      1 |        |      0 |00:08:41.66 |   52742 |    608K|    556K|       |       |          |         |
|   2 |   VIEW               |      |      1 |        |      0 |00:08:41.66 |   52742 |    608K|    556K|       |       |          |         |
|*  3 |    HASH JOIN         |      |      1 |      1 |      0 |00:08:41.66 |   52742 |    608K|    556K|  2047M|    56M|   30M (1)|    4346K|
|   4 |     TABLE ACCESS FULL| T1   |      1 |   1000K|   1000K|00:00:05.96 |   22174 |  22163 |      0 |       |       |          |         |
|   5 |     TABLE ACCESS FULL| T2   |      1 |   1100K|   1100K|00:00:05.58 |   30568 |  30556 |      0 |       |       |          |         |
----------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access(&quot;T1&quot;.&quot;ID&quot;=&quot;T2&quot;.&quot;ID&quot;)
Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - SYSDEF[4], SYSDEF[32720], SYSDEF[1], SYSDEF[96], SYSDEF[32720]
   3 - (#keys=1) &quot;T1&quot;.&quot;ID&quot;[NUMBER,22], &quot;T2&quot;.&quot;ID&quot;[NUMBER,22], &quot;T1&quot;.ROWID[ROWID,10], &quot;T1&quot;.&quot;PAD&quot;[VARCHAR2,4000], &quot;T1&quot;.&quot;N&quot;[NUMBER,22], 
       &quot;T2&quot;.&quot;PAD&quot;[VARCHAR2,4000], &quot;T2&quot;.&quot;N&quot;[NUMBER,22]
   4 - &quot;T1&quot;.ROWID[ROWID,10], &quot;T1&quot;.&quot;ID&quot;[NUMBER,22], &quot;T1&quot;.&quot;N&quot;[NUMBER,22], &quot;T1&quot;.&quot;PAD&quot;[VARCHAR2,4000]
   5 - &quot;T2&quot;.&quot;ID&quot;[NUMBER,22], &quot;T2&quot;.&quot;N&quot;[NUMBER,22], &quot;T2&quot;.&quot;PAD&quot;[VARCHAR2,4000]
</pre>
<p>Pay attention to the column projection of steps 3, 4 and 5. It contains ALL columns of the tables T1 and T2, although the statement uses only some of them. The part of the execution plan below step 2 is equivalent to</p>
<pre class="brush: plain; light: true; title: ; notranslate">
select t1.rowid,
       t1.id,
       t1.n,
       t1.pad,
       t2.id,
       t2.n,
       t2.pad
  from t1, t2
 where t1.id = t2.id
</pre>
<p>It looks like a bug or a &#8220;not implemented yet&#8221; feature. In our case it means that all columns of the table T1 (including the fat column PAD) are retrieved and stored in the workarea.</p>
<p>This can dramatically affect performance:</p>
<ul>
<li>
For operations using workareas it means a larger dataset to process. If the available memory is not enough, it leads to additional I/O in temp space.<br />
In our particular case all columns of the table T1 are retrieved and put into the workarea. If the condition in the statement were something like </p>
<pre class="brush: plain; light: true; title: ; notranslate">
t1.id between t2.id and t2.id + 100
</pre>
<p>then a Hash Join would be impossible, because it can be based only on an &#8220;equality&#8221; condition. In that case we can expect a Merge Join, and then both datasets must be sorted.</p>
<pre class="brush: plain; collapse: true; light: false; title: ; toolbar: true; notranslate">
SQL&gt; explain plan for
  2  merge into t1
  3  using t2
  4  on (t1.id between t2.id and t2.id + 100)
  5  when matched then
  6      update set n = 1;
 
Explained
 
SQL&gt; @plan
 
---------------------------------------------------------------------------------------
| Id  | Operation              | Name | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | MERGE STATEMENT        |      |     1 |     2 |       |  1766K  (1)| 05:53:21 |
|   1 |  MERGE                 | T1   |       |       |       |            |          |
|   2 |   VIEW                 |      |       |       |       |            |          |
|   3 |    MERGE JOIN          |      |     1 |  8018 |       |  1766K  (1)| 05:53:21 |
|   4 |     SORT JOIN          |      |  1000K|  3822M|  7812M|   840K  (1)| 02:48:06 |
|   5 |      TABLE ACCESS FULL | T1   |  1000K|  3822M|       |  4928   (1)| 00:01:00 |
|*  6 |     FILTER             |      |       |       |       |            |          |
|*  7 |      SORT JOIN         |      |  1100K|  4206M|  8593M|   926K  (1)| 03:05:16 |
|   8 |       TABLE ACCESS FULL| T2   |  1100K|  4206M|       |  6779   (1)| 00:01:22 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   6 - filter(&quot;T1&quot;.&quot;ID&quot;&lt;=&quot;T2&quot;.&quot;ID&quot;+100)
   7 - access(INTERNAL_FUNCTION(&quot;T1&quot;.&quot;ID&quot;)&gt;=INTERNAL_FUNCTION(&quot;T2&quot;.&quot;ID&quot;))
       filter(INTERNAL_FUNCTION(&quot;T1&quot;.&quot;ID&quot;)&gt;=INTERNAL_FUNCTION(&quot;T2&quot;.&quot;ID&quot;))
</pre>
</li>
<li>It affects data access optimization. If there were indexes containing all columns used in the query, such as t1(id, n) or t2(id, n), an Index Fast Full Scan would still be impossible here, because the indexes do not contain the redundant columns (such as PAD) that the projection requires.<br />
If there were a condition for which an Index Range Scan is an appropriate access path, a TABLE ACCESS BY INDEX ROWID step would still be needed to fetch the remaining, unnecessary columns.</li>
<li>
If a row is chained, reading the redundant columns leads to additional logical (and possibly physical) I/O.</li>
<li>
If the query were run on Exadata and one or both Full Table Scans were offloaded, there would be no Column Projection optimization: the data returned from the Storage Cells to the PGA would include unnecessary columns of the table.
</li>
<li>etc.</li>
</ul>
<p>This is what can happen if we ignore Column Projection.</p>
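<p>The index-related point above can be sketched as a simple set check. This is only an illustration of the reasoning, not of how the optimizer is implemented: the helper name <code>index_covers</code> is hypothetical, ROWID is treated as always available from an index entry, and other requirements for an Index Fast Full Scan (for example NOT NULL constraints) are ignored.</p>

```python
# Hypothetical helper: can an index alone supply every projected column?
# ROWID is treated as always available, since index entries carry it.
def index_covers(projected, index_columns):
    needed = {c.upper() for c in projected} - {"ROWID"}
    return needed.issubset({c.upper() for c in index_columns})

# Projection of step 4 in the plan above (all columns of T1).
original_t1 = ["ROWID", "ID", "N", "PAD"]
# Columns the MERGE statement itself references in T1, plus the ROWID.
required_t1 = ["ROWID", "ID", "N"]

print(index_covers(original_t1, ["ID", "N"]))  # False: PAD forces a table visit
print(index_covers(required_t1, ["ID", "N"]))  # True: the index would be enough
```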
<p>Our particular case can be fixed easily:</p>
<pre class="brush: plain; light: true; title: ; notranslate">
SQL &gt;  merge into (select id, n from t1) t1
  2    using (select id, n from t2) t2 on (t1.id = t2.id)
  3     when matched
  4     then update set n = 1;

0 rows merged.

Elapsed: 00:00:20.09
</pre>
<p><em><strong>upd note:</strong> (select <strong>id, n</strong> from t2) is redundant here. It could be (select <strong>id</strong> from t2).</em></p>
<p>The execution plan with rowsource statistics and column projection:</p>
<pre class="brush: plain; light: true; title: ; wrap-lines: false; notranslate">
----------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  | Writes |  OMem |  1Mem | Used-Mem | Used-Tmp|
----------------------------------------------------------------------------------------------------------------------------------------------
|   0 | MERGE STATEMENT      |      |      1 |        |      0 |00:00:19.83 |   52742 |  35651 |   2032 |       |       |          |         |
|   1 |  MERGE               | T1   |      1 |        |      0 |00:00:19.83 |   52742 |  35651 |   2032 |       |       |          |         |
|   2 |   VIEW               |      |      1 |        |      0 |00:00:19.83 |   52742 |  35651 |   2032 |       |       |          |         |
|*  3 |    HASH JOIN         |      |      1 |      1 |      0 |00:00:19.83 |   52742 |  35651 |   2032 |    36M|  6589K|   34M (1)|   16384 |
|   4 |     TABLE ACCESS FULL| T2   |      1 |   1100K|   1100K|00:00:11.77 |   30568 |  28841 |      0 |       |       |          |         |
|   5 |     TABLE ACCESS FULL| T1   |      1 |   1000K|   1000K|00:00:04.44 |   22174 |   4778 |      0 |       |       |          |         |
----------------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access(&quot;ID&quot;=&quot;ID&quot;)
Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - SYSDEF[4], SYSDEF[32720], SYSDEF[1], SYSDEF[96], SYSDEF[32720]
   3 - (#keys=1) &quot;ID&quot;[NUMBER,22], &quot;ID&quot;[NUMBER,22], &quot;N&quot;[NUMBER,22], &quot;T1&quot;.ROWID[ROWID,10], &quot;N&quot;[NUMBER,22]
   4 - &quot;ID&quot;[NUMBER,22], &quot;N&quot;[NUMBER,22]
   5 - &quot;T1&quot;.ROWID[ROWID,10], &quot;ID&quot;[NUMBER,22], &quot;N&quot;[NUMBER,22]
</pre>
<p>Now the column projection does not contain unnecessary columns. The amount of required memory and, as a result, the amount of temp space (Used-Tmp ~ 16Mb) and I/O (2032 physical reads and 2032 physical writes via direct path read temp and direct path write temp) were significantly reduced, and the statement ran in about 20 seconds instead of more than 8 minutes.</p>
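<p>To put a number on the difference, the two A-Time values of step 1 can be compared directly. A minimal sketch; the helper name <code>a_time_seconds</code> is mine, and it assumes the HH:MM:SS.ff format shown in the plans above.</p>

```python
def a_time_seconds(a_time):
    """Convert an A-Time value such as '00:08:41.66' to seconds."""
    hours, minutes, seconds = a_time.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)

before = a_time_seconds("00:08:41.66")  # step 1 of the original plan
after = a_time_seconds("00:00:19.83")   # step 1 of the rewritten MERGE

print(f"{before:.2f} s vs {after:.2f} s, ratio {before / after:.1f}x")
```

<p>For these two runs the ratio comes out at roughly 26.</p>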
<p>Notice that in this case Oracle takes the table T2 as the build input. This is because the ROWID from the table T1 is required to perform the MERGE.<br />
The part of the execution plan below step 2 now looks like</p>
<pre class="brush: plain; light: true; title: ; notranslate">
select t1.id, 
       t1.n, 
       t1.rowid, 
       t2.id, 
       t2.n
  from t1, t2
 where t1.id = t2.id
</pre>
<p>Oracle estimates that the size of the result set </p>
<pre class="brush: plain; light: true; title: ; notranslate">
1100K rows * (&quot;ID&quot;[NUMBER,22], &quot;N&quot;[NUMBER,22])
</pre>
<p>is less than </p>
<pre class="brush: plain; light: true; title: ; notranslate">
1000K rows * (&quot;T1&quot;.ROWID[ROWID,10], &quot;ID&quot;[NUMBER,22], &quot;N&quot;[NUMBER,22])
</pre>
<p>and that this result set requires a smaller workarea.<br />
Thus it is reasonable to use T2 as the build input here.</p>
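<p>The comparison above can be worked through with the declared widths from the Column Projection section. This is a rough upper bound only: it uses the maximum declared widths (22 bytes for NUMBER, 10 for ROWID) and the E-Rows values from the plan, whereas I assume the optimizer works from average column lengths in the statistics. The function name <code>upper_bound_bytes</code> is just for illustration.</p>

```python
# Declared maximum widths taken from the Column Projection section.
WIDTH = {"ROWID": 10, "ID": 22, "N": 22}

def upper_bound_bytes(rows, columns):
    """Rows times the sum of declared column widths (an upper bound)."""
    return rows * sum(WIDTH[c] for c in columns)

# E-Rows in the plan: 1100K for T2 and 1000K for T1.
t2_bytes = upper_bound_bytes(1_100_000, ["ID", "N"])
t1_bytes = upper_bound_bytes(1_000_000, ["ROWID", "ID", "N"])

print(t2_bytes, t1_bytes)   # 48400000 54000000
print("build input:", "T2" if t1_bytes > t2_bytes else "T1")
```

<p>Even though T2 has more rows, the extra ROWID makes the T1 side the larger one in this estimate.</p>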
]]></content:encoded>
			<wfw:commentRss>http://alexanderanokhin.wordpress.com/2012/07/18/dont-forget-about-column-projection/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/48a7cd21f3eee64ace66d6668a3e1223?s=96&#38;d=http%3A%2F%2Fs0.wp.com%2Fi%2Fmu.gif&#38;r=G" medium="image">
			<media:title type="html">alexanderanokhin</media:title>
		</media:content>
	</item>
		<item>
		<title>A small change in parallel insert with serial data access between 11.1 and 11.2</title>
		<link>http://alexanderanokhin.wordpress.com/2012/06/22/change-in-parallel-insert-between-11-1-and-11-2-with-serial-data-access/</link>
		<comments>http://alexanderanokhin.wordpress.com/2012/06/22/change-in-parallel-insert-between-11-1-and-11-2-with-serial-data-access/#comments</comments>
		<pubDate>Fri, 22 Jun 2012 09:34:52 +0000</pubDate>
		<dc:creator>Alexander Anokhin</dc:creator>
				<category><![CDATA[Oracle]]></category>

		<guid isPermaLink="false">http://alexanderanokhin.wordpress.com/?p=756</guid>
		<description><![CDATA[There is an interesting small change in parallel insert between 11.1 and 11.2 in statements with a parallelized insert but serialized data access. In practice, the simplest case where we can see this is a CTAS or a parallel insert as select from a remote table. As an example I will use a CTAS from a remote table. Note: [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>There is an interesting small change in parallel insert between 11.1 and 11.2 in statements with a parallelized insert but serialized data access. In practice, the simplest case where we can see this is a CTAS or a parallel insert as select from a remote table.<span id="more-756"></span><br />
As an example I will use a CTAS from a remote table.</p>
<pre class="brush: plain; light: true; title: ; notranslate">
create table tbllocal parallel 2
    as select *
  from tblremote@dblink;
</pre>
<p><em>Note: even if the remote part of this query (select from tblremote@dblink) is parallelized, it runs in parallel only on the remote side. Communication between the remote and local servers is performed in a single stream: local query coordinator &#8211; remote query coordinator. So in our case it does not matter whether the query part is parallelized or not.</em></p>
<p><strong>11.1.0.7</strong></p>
<pre class="brush: plain; light: true; title: ; wrap-lines: false; notranslate">
SQL&gt; select * from v$version where rownum = 1;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - 64bit Production

SQL&gt; show parameter optimizer_features_enable

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
optimizer_features_enable            string      11.1.0.7

SQL&gt; explain plan for
  2  create table tbllocal parallel 2
  3      as select * from tblremote@dblink;

SQL&gt; @plan

-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name      | Rows  | Bytes | Cost (%CPU)| Time     | TQ/Ins |IN-OUT| PQ Distrib |
-------------------------------------------------------------------------------------------------------------------
|   0 | CREATE TABLE STATEMENT   |           |    41 |    82 |     3   (0)| 00:00:01 |        |      |            |
|   1 |  PX COORDINATOR          |           |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)    | :TQ10001  |    41 |    82 |     2   (0)| 00:00:01 |  Q1,01 | P-&gt;S | QC (RAND)  |
|   3 |    LOAD AS SELECT        | TBLLOCAL  |       |       |            |          |  Q1,01 | PCWP |            |
|   4 |     BUFFER SORT          |           |       |       |            |          |  Q1,01 | PCWC |            |
|   5 |      PX RECEIVE          |           |    41 |    82 |     2   (0)| 00:00:01 |  Q1,01 | PCWP |            |
|   6 |       PX SEND ROUND-ROBIN| :TQ10000  |    41 |    82 |     2   (0)| 00:00:01 |        | S-&gt;P | RND-ROBIN  |
|   7 |        REMOTE            | TBLREMOTE |    41 |    82 |     2   (0)| 00:00:01 | DBLINK | R-&gt;S |            |
-------------------------------------------------------------------------------------------------------------------
Remote SQL Information (identified by operation id):
----------------------------------------------------
   7 - SELECT &quot;DUMMY&quot; FROM &quot;TBLREMOTE&quot; &quot;TBLREMOTE&quot; (accessing 'DBLINK.WORLD' )
</pre>
<p>This execution plan means (slightly simplified):<br />
Query Coordinator:<br />
[step 7] gets the result from the remote side,<br />
[step 6] puts the data into special structures in the SGA</p>
<p>Slave process:<br />
[step 5] gets the data from the SGA<br />
[step 4] puts the data into a workarea (PGA)<br />
[step 3] performs the direct-path insert</p>
<p>Pay attention to step 4 &#8211; BUFFER SORT.<br />
BUFFER SORT is a buffering technique that uses the sort area, without actual sorting.<br />
<em>Note: additional details: <a title="Jonathan Lewis - buffer-sorts" href="http://jonathanlewis.wordpress.com/2006/12/17/buffer-sorts/">Jonathan Lewis &#8211; buffer-sorts</a></em></p>
<p>The problem here is that BUFFER SORT must get ALL data from its child rowsource before the parent operation is performed. Before the inserts start, all data from the remote side is read and put into the workareas of the slave processes. As a result, if you are going to carry out a CTAS from a huge remote table, you need the same huge amount of memory (sort area size), and if you do not have enough memory the data is spilled to temp.<br />
Consequently you are likely to see many &#8220;direct path read temp&#8221; and &#8220;direct path write temp&#8221; wait events and even <em>&#8220;ORA-01652: unable to extend temp segment .. in tablespace TEMP&#8221;</em> before the first row is inserted.</p>
<p><strong>11.2.0.2</strong><br />
Fortunately Oracle 11.2 does not have this step in the execution plan.</p>
<p><em>Note: Oracle 11.2 does not have this step in the execution plan even with optimizer_features_enable = 11.1.0.7 or 10.2.0.5; it adds the step back when optimizer_features_enable = 10.2.0.4</em></p>
<pre class="brush: plain; light: true; title: ; wrap-lines: false; notranslate">
SQL&gt; select * from v$version where rownum = 1;

BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production

SQL&gt; show parameter optimizer_features_enable

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
optimizer_features_enable            string      11.2.0.2

SQL&gt; explain plan for
  2  create table tbllocal parallel 2
  3      as select * from tblremote@dblink;

Explained

SQL&gt; @plan
------------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name      | Rows  | Bytes | Cost (%CPU)| Time     | TQ/Ins |IN-OUT| PQ Distrib |
------------------------------------------------------------------------------------------------------------------
|   0 | CREATE TABLE STATEMENT  |           |    41 |    82 |     3   (0)| 00:00:01 |        |      |            |
|   1 |  PX COORDINATOR         |           |       |       |            |          |        |      |            |
|   2 |   PX SEND QC (RANDOM)   | :TQ10001  |    41 |    82 |     2   (0)| 00:00:01 |  Q1,01 | P-&gt;S | QC (RAND)  |
|   3 |    LOAD AS SELECT       | TBLLOCAL  |       |       |            |          |  Q1,01 | PCWP |            |
|   4 |     PX RECEIVE          |           |    41 |    82 |     2   (0)| 00:00:01 |  Q1,01 | PCWP |            |
|   5 |      PX SEND ROUND-ROBIN| :TQ10000  |    41 |    82 |     2   (0)| 00:00:01 |        | S-&gt;P | RND-ROBIN  |
|   6 |       REMOTE            | TBLREMOTE |    41 |    82 |     2   (0)| 00:00:01 | DBLINK | R-&gt;S |            |
------------------------------------------------------------------------------------------------------------------
Remote SQL Information (identified by operation id):
----------------------------------------------------
   6 - SELECT &quot;DUMMY&quot; FROM &quot;TBLREMOTE&quot; &quot;TBLREMOTE&quot; (accessing 'DBLINK.WORLD' )
</pre>
<p>Here a slave process inserts the data immediately after it gets a portion from the query coordinator.<br />
Thus, you do not need a huge amount of memory and temp space to perform a CTAS from a huge remote table.</p>
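<p>The difference between the two plans can be illustrated with a toy model. This is an analogy only, not Oracle code: <code>remote_rows</code> stands in for the REMOTE rowsource, and the function names and the portion size of 100 rows are made up for the example. It compares how many rows have to be held at once when everything is buffered first (as with BUFFER SORT in 11.1) and when each portion is inserted as it arrives (as in 11.2).</p>

```python
def remote_rows(n):
    """Stand-in for the REMOTE rowsource: yields rows one at a time."""
    for i in range(n):
        yield i

def load_with_buffer(rows):
    """11.1-style: buffer the whole input, then insert.

    Returns (rows inserted, peak number of rows held).
    """
    buffered = list(rows)            # nothing is inserted until this completes
    inserted = 0
    for _ in buffered:
        inserted += 1
    return inserted, len(buffered)

def load_streaming(rows, batch_size=100):
    """11.2-style: insert each portion as soon as it arrives."""
    inserted, peak, batch = 0, 0, []
    for row in rows:
        batch.append(row)
        peak = max(peak, len(batch))
        if len(batch) == batch_size:
            inserted += len(batch)
            batch = []
    inserted += len(batch)           # flush the last partial portion
    return inserted, peak

print(load_with_buffer(remote_rows(10_000)))   # (10000, 10000)
print(load_streaming(remote_rows(10_000)))     # (10000, 100)
```

<p>In the buffered variant the peak grows with the size of the remote table; in the streaming variant it stays at the portion size.</p>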
]]></content:encoded>
			<wfw:commentRss>http://alexanderanokhin.wordpress.com/2012/06/22/change-in-parallel-insert-between-11-1-and-11-2-with-serial-data-access/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/48a7cd21f3eee64ace66d6668a3e1223?s=96&#38;d=http%3A%2F%2Fs0.wp.com%2Fi%2Fmu.gif&#38;r=G" medium="image">
			<media:title type="html">alexanderanokhin</media:title>
		</media:content>
	</item>
		<item>
		<title>Serial direct read for small tables in 11.2.0.2</title>
		<link>http://alexanderanokhin.wordpress.com/2012/05/21/serial-direct-read-for-small-tables-in-11-2-0-2/</link>
		<comments>http://alexanderanokhin.wordpress.com/2012/05/21/serial-direct-read-for-small-tables-in-11-2-0-2/#comments</comments>
		<pubDate>Mon, 21 May 2012 17:18:07 +0000</pubDate>
		<dc:creator>Alexander Anokhin</dc:creator>
				<category><![CDATA[Oracle]]></category>

		<guid isPermaLink="false">http://alexanderanokhin.wordpress.com/?p=638</guid>
		<description><![CDATA[Today I fixed an issue related to serial direct path reads. The database is 11.2.0.2. There is a very small table, let&#8217;s call it dualcopy. Let&#8217;s select from the table with event 10046 enabled. Below is an excerpt from the trace file. What?! A serial direct path read for a table in [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>Today I fixed an issue related to serial direct path reads. </p>
<p>The database is 11.2.0.2:</p>
<pre class="brush: plain; title: ; notranslate">
SQL&gt; select * from v$version where rownum = 1;
 
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
 
SQL&gt; SELECT a.ksppinm  &quot;Parameter&quot;,
  2         b.ksppstvl &quot;Session Value&quot;,
  3         c.ksppstvl &quot;Instance Value&quot;
  4    FROM x$ksppi a, x$ksppcv b, x$ksppsv c
  5   WHERE a.indx = b.indx
  6     AND a.indx = c.indx
  7     AND a.ksppinm = '_serial_direct_read';
 
Parameter               Session Value           Instance Value
----------------------- ----------------------- -----------------------
_serial_direct_read     auto                    auto
</pre>
<p>There is a very small table, let&#8217;s call it dualcopy. </p>
<pre class="brush: plain; title: ; notranslate">
SQL&gt; exec dbms_stats.gather_table_stats(user, 'dualcopy', estimate_percent =&gt; 100);
 
PL/SQL procedure successfully completed
 
SQL&gt; select object_type,
  2         num_rows,
  3         blocks,
  4         empty_blocks
  5    from all_tab_statistics
  6   where table_name = 'DUALCOPY';
 
OBJECT_TYPE    NUM_ROWS     BLOCKS EMPTY_BLOCKS
------------ ---------- ---------- ------------
TABLE                 1          1            0
</pre>
<p>Let&#8217;s select from the table with event 10046 enabled:</p>
<pre class="brush: plain; title: ; notranslate">
SQL&gt; alter session set events '10046 trace name context forever, level 8';
 
Session altered

SQL&gt; select * from dualcopy;
 
DUMMY
-----
X
</pre>
<p>Below is an excerpt from the trace file:</p>
<pre class="brush: plain; highlight: [8]; title: ; wrap-lines: false; notranslate">
PARSING IN CURSOR #18446744071497556472 len=24 dep=0 uid=2500 oct=3 lid=2500 tim=3988723224325 hv=250290826 ad='3e911e388' sqlid='g32a9w87fq8na'
select * from dualcopy
END OF STMT
PARSE #18446744071497556472:c=20000,e=26430,p=0,cr=27,cu=0,mis=1,r=0,dep=0,og=2,plh=769194902,tim=3988723224324
EXEC #18446744071497556472:c=0,e=21,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=2,plh=769194902,tim=3988723224417
WAIT #18446744071497556472: nam='SQL*Net message to client' ela= 4 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=3988723224541
WAIT #18446744071497556472: nam='SQL*Net message from client' ela= 66047 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=3988723290684
WAIT #18446744071497556472: nam='direct path read' ela= 7852 file number=705 first dba=23954 block cnt=1 obj#=2229858 tim=3988723301159
WAIT #18446744071497556472: nam='SQL*Net message to client' ela= 5 driver id=1413697536 #bytes=1 p3=0 obj#=2229858 tim=3988723301360
FETCH #18446744071497556472:c=10000,e=10521,p=1,cr=2,cu=0,mis=0,r=1,dep=0,og=2,plh=769194902,tim=3988723301437
STAT #18446744071497556472 id=1 cnt=1 pid=0 pos=1 obj=2229858 op='TABLE ACCESS FULL DUALCOPY (cr=2 pr=1 pw=0 time=10419 us cost=2 size=2 card=1)'
</pre>
<p>What?! A serial direct path read for a table of one block?! What&#8217;s going on?</p>
<p>The table may be small, but if your application accesses it many times, this can lead to a significant amount of physical I/O.</p>
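<p>To see how much such waits add up to in a larger trace file, the WAIT lines can be summarized with a few lines of script. A minimal sketch: the regular expression is based only on the WAIT lines shown in the excerpt above, the function name <code>wait_totals</code> is mine, and I assume <code>ela</code> is reported in microseconds as in other 11g trace files.</p>

```python
import re

# Pattern based only on the WAIT lines shown in the excerpt above.
WAIT_RE = re.compile(r"^WAIT #\d+: nam='([^']+)' ela= *(\d+) ")

def wait_totals(lines):
    """Return {event name: (count, total elapsed time)} for WAIT lines."""
    totals = {}
    for line in lines:
        match = WAIT_RE.match(line)
        if match:
            name, ela = match.group(1), int(match.group(2))
            count, total = totals.get(name, (0, 0))
            totals[name] = (count + 1, total + ela)
    return totals

# Three WAIT lines copied from the trace excerpt.
sample = [
    "WAIT #18446744071497556472: nam='SQL*Net message to client' ela= 4 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=3988723224541",
    "WAIT #18446744071497556472: nam='direct path read' ela= 7852 file number=705 first dba=23954 block cnt=1 obj#=2229858 tim=3988723301159",
    "WAIT #18446744071497556472: nam='SQL*Net message to client' ela= 5 driver id=1413697536 #bytes=1 p3=0 obj#=2229858 tim=3988723301360",
]

print(wait_totals(sample)["direct path read"])  # (1, 7852)
```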
<p>Serial direct read mode (_serial_direct_read) is a property of a child cursor. Similar behavior is possible if the same query was parsed earlier in a session with ALWAYS or TRUE mode (_serial_direct_read=always or _serial_direct_read=true). However, the parse call here was a hard parse (mis=1) that created a new child cursor, so this is not the reason in our case.</p>
<p>The reason is that the table has been assigned to the KEEP pool, but the database has no KEEP pool configured.</p>
<pre class="brush: plain; title: ; notranslate">
SQL&gt; select table_name, buffer_pool
  2    from all_tables
  3   where table_name = 'DUALCOPY';
 
TABLE_NAME                     BUFFER_POOL
------------------------------ -----------
DUALCOPY                       KEEP

SQL&gt; show parameter db_keep_cache_size
 
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_keep_cache_size                   big integer 0
</pre>
<p>This is bug 12530276. Oracle 11.2.0.2 treats all &#8216;keep objects&#8217; as large objects when the &#8216;keep pool&#8217; is not configured.</p>
<p>Allocating a KEEP pool fixes the problem.</p>
]]></content:encoded>
			<wfw:commentRss>http://alexanderanokhin.wordpress.com/2012/05/21/serial-direct-read-for-small-tables-in-11-2-0-2/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/48a7cd21f3eee64ace66d6668a3e1223?s=96&#38;d=http%3A%2F%2Fs0.wp.com%2Fi%2Fmu.gif&#38;r=G" medium="image">
			<media:title type="html">alexanderanokhin</media:title>
		</media:content>
	</item>
		<item>
		<title>Dynamic tracing of Oracle logical I/O: part 2. Dtrace LIO v2 is released.</title>
		<link>http://alexanderanokhin.wordpress.com/2012/03/19/dtrace-lio-new-features/</link>
		<comments>http://alexanderanokhin.wordpress.com/2012/03/19/dtrace-lio-new-features/#comments</comments>
		<pubDate>Mon, 19 Mar 2012 15:44:30 +0000</pubDate>
		<dc:creator>Alexander Anokhin</dc:creator>
				<category><![CDATA[Oracle]]></category>

		<guid isPermaLink="false">http://alexanderanokhin.wordpress.com/?p=489</guid>
		<description><![CDATA[What&#8217;s new in DTrace LIO? Jump to introduction: Dynamic tracing of Oracle logical I/O 1. The list of supported functions performing logical I/O is extended: Note: some function names below are my suggestions. Consistent gets: kcbgtcr &#8211; Kernel Cache Buffer Get Consistent Read. This is general entry point for consistent read. kcbldrget &#8211; Kernel Cache [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>What&#8217;s new in <a title="DTraecLIO source code" href="http://alexanderanokhin.wordpress.com/scripts/dtracelio-d/">DTrace LIO</a>?<br />
<em>Jump to introduction: <a href="http://alexanderanokhin.wordpress.com/2011/11/13/dynamic-tracing-of-oracle-logical-io/">Dynamic tracing of Oracle logical I/O</a></em></p>
<p><strong>1. The list of supported functions performing logical I/O is extended:</strong><br />
<em>Note: some function names below are my suggestions.</em></p>
<p><strong>Consistent gets:</strong></p>
<ul>
<li>kcbgtcr &#8211; Kernel Cache Buffer Get Consistent Read. This is the general entry point for consistent reads.</li>
<li>kcbldrget &#8211; Kernel Cache Buffer Load Direct-Read Get. The function performing direct-path read. Interesting detail: in 10.2 the function kcbldrget is called just after kcbgtcr; in 11.2 it is called from within kcbgtcr.</li>
</ul>
<p><strong>Current gets (db block gets):</strong></p>
<ul>
<li>kcbgcur &#8211; Kernel Cache Buffer Get Current Read</li>
<li>kcbget &#8211; Kernel Cache Buffer Get Buffer. This is an analogue of the kcbgcur function; as far as I have observed, it is called for index branch and leaf blocks.</li>
<li>kcbnew &#8211; Kernel Cache Buffer New Buffer</li>
<li>kcblnb (kcblnb_dscn in 11.2) &#8211; Kernel Cache Buffer Load New Buffer. The function performing direct-path load. Decoding of block coordinates is not supported for this function in the current version of DTraceLIO, so parameters such as object_id and data_object_id are shown as 0 (zero).</li>
</ul>
<p>Note: KCB and KCBL are module names:<br />
KCB &#8211; Kernel Cache Buffer (Get, change, and release buffers)<br />
KCBL &#8211; Kernel Cache Buffer Load (Direct I/O routines)</p>
<p><strong>2. Functions related to buffer pinning are added.</strong><br />
The following function calls are traced:</p>
<ul>
<li>kcbispnd &#8211; Kernel Cache Buffer Is Pinned &#8211; the function in which Oracle checks whether a buffer is already pinned. In this function Oracle increments either the statistic &#8220;buffer is pinned count&#8221;, if the buffer is already pinned, or &#8220;buffer is not pinned count&#8221;, if it is not.</li>
<li>kcbrls &#8211; Kernel Cache Buffer Release Pin</li>
</ul>
<p>Important note: buffer pinning itself is performed inside the logical I/O functions, so unfortunately it cannot be seen as separate function calls.</p>
<p><strong>3. Changed parameters</strong><br />
Usage:<br />
dtracelio.d PID [show_each_call] [interval]</p>
<ul>
<li>PID &#8211; unix process ID</li>
<li>show_each_call &#8211; A bitmask determining which function calls are shown.<br />
<span style="margin-left:20px;">1st bit &#8211; to show logical I/O functions</span><br />
<span style="margin-left:20px;">2nd bit &#8211; to show buffer pinning</span><br />
Examples:<br />
<span style="margin-left:20px;">0: output of each call will be disabled</span><br />
<span style="margin-left:20px;">1: logical I/O functions will be shown</span><br />
<span style="margin-left:20px;">3: logical I/O and buffer pinning functions will be shown</span><br />
Default value: 1 (only logical I/O functions will be shown)</li>
<li>interval &#8211; Specifies the interval, in seconds, at which the Summary form with cumulative figures is shown. Works only when show_each_call is disabled.<br />
Default value: 0 (disabled)</li>
</ul>
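<p>The show_each_call bitmask can be read as follows. A small sketch for illustration only; the function name <code>decode_show_each_call</code> and the dictionary keys are mine and are not part of dtracelio.d. The value 2 is not listed among the examples above, but it follows from the bit definitions.</p>

```python
def decode_show_each_call(mask):
    """Decode the show_each_call bitmask as described in the list above."""
    return {
        "logical_io": mask % 2 == 1,             # 1st bit
        "buffer_pinning": (mask // 2) % 2 == 1,  # 2nd bit
    }

for mask in (0, 1, 2, 3):
    print(mask, decode_show_each_call(mask))
```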
<p><strong>4. The existing summary table is changed; new columns are added</strong><br />
The existing summary table &#8220;Logical I/O Summary (grouped by function/object)&#8221; has changed.</p>
<pre class="brush: plain; gutter: false; title: ; wrap-lines: false; notranslate">
============================= Logical I/O Summary (grouped by function/object) ===========================
 function    stat   object_id   data_object_id   mode_held   where     bufs     calls
--------- ------- ----------- ---------------- ----------- ------- -------- ---------
  kcbgcur      cu           0               -1           2     381        1         1
  kcbgcur      cu       56756            56756           2     594        1         1
   kcbnew      cu           0               -1                 383        1         1
  kcbgtcr      cr       56756            56756                 446        2         2
  kcbgtcr      cr       56756            56756                 447        2         2
  kcbgtcr      cr       56756            56756                 577        2         2
==========================================================================================================
</pre>
<p>New columns:</p>
<ul>
<li>stat: &#8220;cr&#8221; &#8211; consistent get, &#8220;cr (d)&#8221; &#8211; consistent get direct, &#8220;cu&#8221; &#8211; current get, &#8220;cu (d)&#8221; &#8211; current get direct</li>
<li>mode_held: mode in which buffer is pinned. The column is supported only for functions kcbgcur and kcbget</li>
<li>bufs: number of processed buffers. For almost all functions this number is equal to the number of calls. The exception is kcbnew, where N (many) buffers can be processed by one function call.</li>
</ul>
<p><strong>5. New summary table is added</strong></p>
<p>In addition to the old summary output grouped by function/object, a new output form &#8220;grouped by object&#8221; is added.</p>
<p>An example:</p>
<pre class="brush: plain; gutter: false; title: ; wrap-lines: false; notranslate">
================================= Logical I/O Summary (grouped by object) ================================
 object_id  data_object_id       lio        cr    cr (d)        cu    cu (d) ispnd (Y) ispnd (N)   pin rls
---------- --------------- --------- --------- --------- --------- --------- --------- --------- ---------
         1              -1        21         0         0        21         0         0         0        21
        17              17       108       108         0         0         0         0         0       108
         0              -1     64042        18         0     64024         0         0         0        36
    415893         2142703    463565     49351         0    414214         0         0    387842     88604
   2142704         2142704   1977665         0         0   1977665         0         0         0      9467
---------- --------------- --------- --------- --------- --------- --------- --------- --------- ---------
     total                   2505401     49477         0   2455924         0    387842         0     98236
==========================================================================================================
</pre>
<p>The following figures are shown:</p>
<ul>
<li>lio &#8211; All logical reads performed on an object. Equal to the &#8220;session logical reads&#8221; statistic. lio = cr + cu</li>
<li>cr &#8211; consistent gets. This statistic includes cr (d)</li>
<li>cr (d) &#8211; consistent gets direct</li>
<li>cu &#8211; current gets. This statistic includes cu (d)</li>
<li>cu (d) &#8211; current gets direct</li>
<li>ispnd (Y) &#8211; buffer is pinned count</li>
<li>ispnd (N) &#8211; buffer is not pinned count</li>
<li>pin rls &#8211; buffer pin is released</li>
</ul>
<p>The list of objects is sorted by the lio column.</p>
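<p>As a quick sanity check, the relation lio = cr + cu can be verified against the rows of the example table. This is only a minimal Python sketch and is not part of DTraceLIO; the function name check_lio is made up for illustration, and the numbers are copied from the example output above.</p>
<pre class="brush: python; title: ; notranslate">
# Rows copied from the "grouped by object" example above:
# (object_id, data_object_id, lio, cr, cu)
rows = [
    (1, -1, 21, 0, 21),
    (17, 17, 108, 108, 0),
    (0, -1, 64042, 18, 64024),
    (415893, 2142703, 463565, 49351, 414214),
    (2142704, 2142704, 1977665, 0, 1977665),
]


def check_lio(rows):
    """Return the rows for which lio != cr + cu (empty list if all agree)."""
    return [r for r in rows if r[2] != r[3] + r[4]]


mismatches = check_lio(rows)
total_lio = sum(r[2] for r in rows)
total_cr = sum(r[3] for r in rows)
total_cu = sum(r[4] for r in rows)

print("mismatching rows:", mismatches)
print("totals: lio=%d cr=%d cu=%d" % (total_lio, total_cr, total_cu))
</pre>
<p>The computed totals can be compared with the &#8220;total&#8221; row of the table.</p>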
<p>Enjoy! <img src='http://s1.wp.com/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> </p>
<p>p.s.<br />
Announcement: one of the next blog entries will be about the statistics &#8220;buffer is [not] pinned count&#8221;. I will show what these statistics mean, how they work and how DTraceLIO with the new functionality might be used.<br />
Coming soon.</p>
]]></content:encoded>
			<wfw:commentRss>http://alexanderanokhin.wordpress.com/2012/03/19/dtrace-lio-new-features/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/48a7cd21f3eee64ace66d6668a3e1223?s=96&#38;d=http%3A%2F%2Fs0.wp.com%2Fi%2Fmu.gif&#38;r=G" medium="image">
			<media:title type="html">alexanderanokhin</media:title>
		</media:content>
	</item>
		<item>
		<title>Dynamic tracing of Oracle logical I/O</title>
		<link>http://alexanderanokhin.wordpress.com/2011/11/13/dynamic-tracing-of-oracle-logical-io/</link>
		<comments>http://alexanderanokhin.wordpress.com/2011/11/13/dynamic-tracing-of-oracle-logical-io/#comments</comments>
		<pubDate>Sun, 13 Nov 2011 03:01:47 +0000</pubDate>
		<dc:creator>Alexander Anokhin</dc:creator>
				<category><![CDATA[Oracle]]></category>
		<category><![CDATA[DTrace]]></category>

		<guid isPermaLink="false">http://alexanderanokhin.wordpress.com/?p=224</guid>
		<description><![CDATA[I would like to provide one more option to investigate Oracle logical I/O with DTrace. This method allows to see each consistent/current gets with details about block, object and location from which function is called. This is the DTrace script: dtracelio.d Note: Dtrace LIO with new features is released Short description: This tool allows to [&#8230;]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=alexanderanokhin.wordpress.com&#038;blog=17752444&#038;post=224&#038;subd=alexanderanokhin&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>I would like to provide one more option to investigate Oracle logical I/O with DTrace.<br />
This method lets you see each consistent/current get with details about the block, the object, and the location from which the function is called. </p>
<p>This is the DTrace script: <a href="https://dl.dropbox.com/s/yu91obrtrigdac6/dtracelio.d?dl=1">dtracelio.d</a><br />
<em>Note: <a href="http://alexanderanokhin.wordpress.com/2012/03/19/dtrace-lio-new-features/">Dtrace LIO with new features is released</a></em></p>
<p><strong>Short description</strong>:<br />
This tool lets you see:<br />
 &#8211; details of each call (consistent/current get)<br />
 &#8211; details of the blocks being read<br />
 &#8211; the function from which the call was performed, without inspecting the call stack (&#8220;where&#8221;)<br />
 &#8211; an aggregation of the performed calls</p>
<p>The tool can be especially helpful in 11g because x$kcbuwhy (x$kcbsw in 10g) is not populated properly there.<br />
<em>Note: Examples of using views x$kcbsw/x$kcbuwhy:<br />
- <a href="http://www.jlcomp.demon.co.uk/buffer_usage.html">Investigating Logical I/O by Jonathan Lewis</a><br />
- <a href="http://blog.tanelpoder.com/2009/11/19/finding-the-reasons-for-excessive-logical-ios/">Finding the reasons for excessive logical IOs by Tanel Poder</a><br />
</em></p>
<p>The current version of the script monitors the execution of two Kernel Cache Layer functions:<br />
kcbgtcr &#8211; Kernel Cache Buffer Get Consistent Read. The number of calls of this function is visible as the statistic &#8220;consistent gets&#8221; and in the 10046 trace as &#8220;cr=&#8221;.<br />
kcbgcur &#8211; Kernel Cache Buffer Get Current Read. The number of calls of this function is visible as the statistic &#8220;db block gets&#8221; and in the 10046 trace as &#8220;cu=&#8221;.<br />
<em>upd:<br />
statistic &#8220;db block gets&#8221; is incremented not only by kcbgcur, but also by kcbget (included in current version of dtracelio), kcbnew (coming soon), kcblnb* (direct-path load) and some others. But kcbgcur is most often used.<br />
</em></p>
<p>In both functions:<br />
- the first argument is a pointer to the structure which describes a block;<br />
- the second argument is unknown, but I would guess that it is lock_mode (from MOS bug 7109078);<br />
- the third argument (its least significant bits) is &#8220;where&#8221; (x$kcbwh), that is, the module from which the function is called. </p>
<p><strong>Usage:</strong><br />
dtracelio.d PID [show_each_call]</p>
<p>PID &#8211; unix process ID<br />
show_each_call &#8211; if 0 then output of each call will be disabled, else details of each call will be shown</p>
<p>A small usage example. Let&#8217;s see what happens during a select on a very small table:</p>
<pre class="brush: plain; title: ; notranslate">
SQL&gt; @spid
 
       SID ORACLE_DEDICATED_PROCESS CLIENTPID
---------- ------------------------ ------------------------
        41 18949                    900:408
 
SQL&gt; create table dualcopy as select * from dual;
 
Table created
 
SQL&gt; select * from dualcopy;
 
DUMMY
-----
X

</pre>
<p>Note that I have executed &#8220;select * from dualcopy&#8221; once beforehand to avoid hard parsing in subsequent executions.</p>
<p>Let&#8217;s execute dtracelio</p>
<pre class="brush: plain; title: ; notranslate">
dtracelio.d 18949
</pre>
<p>I have executed the script with the default value of the show_each_call parameter, which means that the details of each call will be shown.</p>
<p>Now I am executing our query:</p>
<pre class="brush: plain; title: ; notranslate">
SQL&gt; select * from dualcopy;
 
DUMMY
-----
X
</pre>
<p>And this is the output of dtracelio:</p>
<pre class="brush: plain; title: ; notranslate">
kcbgtcr(0xFFFFFD7FFFDFB0F0,0,742,0) [tsn: 4 rdba: 0x100060a (4/1546) obj: 79218 dobj: 79218] where: 742
kcbgtcr(0xFFFFFD7FFFDFAED0,0,743,0) [tsn: 4 rdba: 0x100060a (4/1546) obj: 79218 dobj: 79218] where: 743
kcbgtcr(0xFFFFFD7FFDC3B2E0,0,860,0) [tsn: 4 rdba: 0x100060b (4/1547) obj: 79218 dobj: 79218] where: 860
</pre>
<p>Here you can see:<br />
kcbgtcr(0xFFFFFD7FFFDFB0F0,0,742,0) &#8211; the function call with its arguments,<br />
tsn: 4 &#8211; the tablespace number, ts# from v$tablespace<br />
rdba: 0x100060a &#8211; the relative dba (data block address)<br />
(4/1546) &#8211; file 4, block 1546<br />
obj: 79218 &#8211; dictionary object number, object_id from dba_objects<br />
dobj: 79218 &#8211; data object number, data_object_id from dba_objects<br />
where: 742 &#8211; the location from which the function (kcbgtcr in this case) was called. This is INDX from x$kcbwh.</p>
<pre class="brush: plain; title: ; notranslate">
SQL&gt; select indx, kcbwhdes from x$kcbwh where indx in (742, 743, 860);
 
      INDX KCBWHDES
---------- ----------------------------------------------------------------
       742 ktewh25: kteinicnt
       743 ktewh26: kteinpscan
       860 kdswh01: kdstgr
</pre>
<p>These are the functions from which our function kcbgtcr was called. You will see them in the call stack if you print it during the call of kcbgtcr.<br />
If you print the call stack you will see (I&#8217;ve just added ustack() to the script):</p>
<pre class="brush: plain; highlight: [5,25,48]; title: ; notranslate">
kcbgtcr(0xFFFFFD7FFFDFB0F0,0,742,0) [tsn: 4 rdba: 0x100060a (4/1546) obj: 79218 dobj: 79218] where: 742

              oracle`kcbgtcr
              oracle`ktecgshx+0x3f6
              oracle`kteinicnt1+0x1cf
              oracle`qertbFetch+0xc2a
              oracle`opifch2+0xaa2
              oracle`opifch+0x3a
              oracle`opiodr+0x433
              oracle`ttcpip+0x599
              oracle`opitsk+0x600
              oracle`opiino+0x675
              oracle`opiodr+0x433
              oracle`opidrv+0x32e
              oracle`sou2o+0x57
              oracle`opimai_real+0x219
              oracle`ssthrdmain+0x14e
              oracle`main+0xcb
              oracle`0x159e67c

kcbgtcr(0xFFFFFD7FFFDFAED0,0,743,0) [tsn: 4 rdba: 0x100060a (4/1546) obj: 79218 dobj: 79218] where: 743

              oracle`kcbgtcr
              oracle`ktecgshx+0x3f6
              oracle`kteinpscan+0x1b6
              oracle`kteiniscan+0x2c
              oracle`kdselini+0x2f
              oracle`kdsirs1+0x792
              oracle`kdsirs+0x2b
              oracle`qertbFetch+0xbb0
              oracle`opifch2+0xaa2
              oracle`opifch+0x3a
              oracle`opiodr+0x433
              oracle`ttcpip+0x599
              oracle`opitsk+0x600
              oracle`opiino+0x675
              oracle`opiodr+0x433
              oracle`opidrv+0x32e
              oracle`sou2o+0x57
              oracle`opimai_real+0x219
              oracle`ssthrdmain+0x14e
              oracle`main+0xcb

kcbgtcr(0xFFFFFD7FFDC3B2E0,0,860,0) [tsn: 4 rdba: 0x100060b (4/1547) obj: 79218 dobj: 79218] where: 860

              oracle`kcbgtcr
              oracle`ktrget2+0x27d
              oracle`kdstgr+0x46b
              oracle`qertbFetch+0x466
              oracle`opifch2+0xaa2
              oracle`opifch+0x3a
              oracle`opiodr+0x433
              oracle`ttcpip+0x599
              oracle`opitsk+0x600
              oracle`opiino+0x675
              oracle`opiodr+0x433
              oracle`opidrv+0x32e
              oracle`sou2o+0x57
              oracle`opimai_real+0x219
              oracle`ssthrdmain+0x14e
              oracle`main+0xcb
              oracle`0x159e67c
</pre>
<p>So, 3 consistent gets (3 logical reads) were performed.<br />
Block (4/1546), the segment header, was read twice: the first time from function kteinicnt and the second time from function kteinpscan.<br />
The third get is block (4/1547), a data block, which was read from function kdstgr.</p>
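<p>The (file/block) pair printed next to the rdba can be reproduced from the rdba value itself. The following Python sketch assumes a smallfile tablespace, where the upper 10 bits of the rdba hold the relative file number and the lower 22 bits hold the block number; the helper name decode_rdba is made up for illustration and is not part of dtracelio.d.</p>
<pre class="brush: python; title: ; notranslate">
# Assumption: smallfile tablespace, so an rdba consists of 10 bits of
# relative file number followed by 22 bits of block number.
BLOCK_BITS = 22


def decode_rdba(rdba):
    """Split an rdba into (relative file number, block number)."""
    return divmod(rdba, 2 ** BLOCK_BITS)


# The two rdba values from the dtracelio output above.
for rdba in (0x100060a, 0x100060b):
    file_no, block_no = decode_rdba(rdba)
    print("rdba 0x%x = (%d/%d)" % (rdba, file_no, block_no))
</pre>
<p>For the two values above this gives (4/1546) and (4/1547), the same pairs that dtracelio prints.</p>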
<p>After that, if you press Ctrl+C, the dtracelio script will finish and you will see the Summary section:</p>
<pre class="brush: plain; title: ; notranslate">
^C

========================= Summary ==========================
object_id    data_object_id  function     where        count
79218        79218           kcbgtcr      742          1
79218        79218           kcbgtcr      743          1
79218        79218           kcbgtcr      860          1
</pre>
<p>The Summary section is an aggregation of all calls grouped by object_id, data_object_id, function and where.</p>
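<p>The same grouping can be reproduced offline from saved per-call output. This is only a minimal Python sketch: the regular expression is derived from the three sample lines shown above (other versions of the script may print the line differently), and the names CALL_RE and summarize are made up for illustration.</p>
<pre class="brush: python; title: ; notranslate">
import re
from collections import Counter

# Pattern for one per-call line, based only on the sample output above.
# Groups: 1 function, 2 tsn, 3 rdba, 4 file, 5 block, 6 obj, 7 dobj, 8 where
CALL_RE = re.compile(
    r"^(\w+)\(.*\) \[tsn: (-?\d+) rdba: (0x[0-9a-fA-F]+) \((\d+)/(\d+)\) "
    r"obj: (-?\d+) dobj: (-?\d+)\] where: (\d+)$"
)


def summarize(lines):
    """Count calls per (object_id, data_object_id, function, where)."""
    counts = Counter()
    for line in lines:
        m = CALL_RE.match(line.strip())
        if m is None:
            continue  # skip blank lines, stack frames and anything else
        key = (int(m.group(6)), int(m.group(7)), m.group(1), int(m.group(8)))
        counts[key] += 1
    return counts


sample = [
    "kcbgtcr(0xFFFFFD7FFFDFB0F0,0,742,0) [tsn: 4 rdba: 0x100060a (4/1546) obj: 79218 dobj: 79218] where: 742",
    "kcbgtcr(0xFFFFFD7FFFDFAED0,0,743,0) [tsn: 4 rdba: 0x100060a (4/1546) obj: 79218 dobj: 79218] where: 743",
    "kcbgtcr(0xFFFFFD7FFDC3B2E0,0,860,0) [tsn: 4 rdba: 0x100060b (4/1547) obj: 79218 dobj: 79218] where: 860",
]

summary = summarize(sample)
for key, count in sorted(summary.items()):
    print(key, count)
</pre>
<p>Lines that do not look like a call (for example stack frames printed by ustack()) are simply skipped.</p>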
<p><strong>example 2</strong><br />
There was a question on <a href="http://www.sql.ru/forum/actualthread.aspx?bid=3&amp;tid=876321">sql.ru</a>. The author asked &#8220;why is the number of current gets (db block gets) during an update in a distributed transaction higher than in a usual, non-distributed transaction?&#8221;.<br />
In his case there were:<br />
3070178 current gets in the non-distributed transaction<br />
6070211 current gets in the distributed transaction</p>
<p>So, let&#8217;s find out what exactly is read during a distributed transaction and is not read in a usual, non-distributed transaction.</p>
<pre class="brush: plain; title: ; notranslate">
SQL&gt; select * from v$version;
 
BANNER
--------------------------------------------------------------------------------
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
PL/SQL Release 11.2.0.2.0 - Production
CORE	11.2.0.2.0	Production
TNS for Solaris: Version 11.2.0.2.0 - Production
NLSRTL Version 11.2.0.2.0 - Production

SQL&gt; @spid
 
       SID ORACLE_DEDICATED_PROCESS CLIENTPID
---------- ------------------------ ------------------------
         1 16727                    900:408

SQL&gt; create table test as select * from all_objects;
 
Table created

SQL&gt; alter session set optimizer_dynamic_sampling=0;
 
Session altered

SQL&gt; @10046
 
Session altered
</pre>
<p>Here we are executing dtracelio.d:</p>
<pre class="brush: plain; title: ; notranslate">
dtracelio.d 16727 0
</pre>
<p>Here I have asked to see only the summary section for the functions performing logical I/O in process 16727.</p>
<p>1st update, without distributed transaction</p>
<pre class="brush: plain; title: ; notranslate">
SQL&gt; begin
  2      update test set owner = owner;
  3      commit;
  4  end;
  5  /
 
PL/SQL procedure successfully completed
</pre>
<p>After the update has been completed we can push Ctrl+C to finish the DTrace and see the Summary section:</p>
<pre class="brush: plain; title: ; notranslate">
^C
========================= Summary ==========================
object_id    data_object_id  function     where        count
...
0            -1              kcbgcur      48           16
0            -1              kcbgcur      156          16
17           17              kcbgtcr      869          18
0            -1              kcbgcur      51           253
0            -1              kcbgcur      79           1365
79204        79204           kcbgcur      878          1656
0            -1              kcbgcur      50           2357
-1           79204           kcbgcur      859          2357
79204        79204           kcbgtcr      869          2726
79204        79204           kcbgcur      1053         75105
</pre>
<p>An excerpt from raw 10046 trace:</p>
<pre class="brush: plain; title: ; notranslate">
=====================
PARSING IN CURSOR #18446741324891442008 len=60 dep=0 uid=0 oct=47 lid=0 tim=229288260276 hv=2516922591 ad='88f557c0' sqlid='6sj578yb0ac6z'
begin
    update test set owner = owner;
    commit;
end;
END OF STMT
....
EXEC #18446741324891442008:c=2780000,e=4884618,p=0,cr=2776,cu=84545,mis=0,r=1,dep=0,og=1,plh=0,tim=229293144944
</pre>
<p>Note that cu=84545.</p>
<p>Now let&#8217;s execute the same update within a distributed transaction:</p>
<pre class="brush: plain; title: ; notranslate">
SQL&gt; declare
  2    i integer;
  3  begin
  4  
  5    select count(*) into i from dual@dblink;
  6    update test set owner = owner;
  7    commit;
  8  
  9  end;
 10  /
</pre>
<pre class="brush: plain; title: ; notranslate">
PARSING IN CURSOR #18446741324891442008 len=122 dep=0 uid=0 oct=47 lid=0 tim=229346672450 hv=1954107067 ad='88f04bc0' sqlid='arss8t9u7kmpv'
declare
  i integer;
begin
  select count(*) into i from dual@dblink;
  update test set owner = owner;
  commit;
end;
END OF STMT
...
EXEC #18446741324891442008:c=2950000,e=3432722,p=0,cr=2298,cu=159479,mis=0,r=1,dep=0,og=1,plh=0,tim=229350105247
</pre>
<p>As we can see, cu=159479, almost twice as many as in the previous execution.</p>
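<p>The difference can be computed directly from the two EXEC lines. This is a minimal Python sketch that only handles the simple key=value layout of the EXEC lines shown above; the function name parse_exec is made up for illustration.</p>
<pre class="brush: python; title: ; notranslate">
def parse_exec(line):
    """Parse the key=value part of an EXEC line from a raw 10046 trace."""
    # Everything after the first colon is a comma-separated list of
    # key=value pairs with integer values.
    _, _, stats = line.partition(":")
    result = {}
    for pair in stats.split(","):
        key, _, value = pair.partition("=")
        result[key] = int(value)
    return result


local = parse_exec(
    "EXEC #18446741324891442008:c=2780000,e=4884618,p=0,cr=2776,cu=84545,"
    "mis=0,r=1,dep=0,og=1,plh=0,tim=229293144944"
)
distributed = parse_exec(
    "EXEC #18446741324891442008:c=2950000,e=3432722,p=0,cr=2298,cu=159479,"
    "mis=0,r=1,dep=0,og=1,plh=0,tim=229350105247"
)

extra = distributed["cu"] - local["cu"]
ratio = distributed["cu"] / local["cu"]
print("extra current gets: %d (ratio %.2f)" % (extra, ratio))
</pre>
<p>For these two lines the difference is 74934 current gets, a ratio of about 1.89.</p>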
<p>dtracelio.d output:</p>
<pre class="brush: plain; highlight: [14]; title: ; notranslate">
========================= Summary ==========================
object_id    data_object_id  function     where        count
...
0            -1              kcbgcur      48           16
0            -1              kcbgcur      156          16
17           17              kcbgtcr      869          18
0            -1              kcbgcur      51           49
0            -1              kcbgcur      50           442
-1           79204           kcbgcur      859          442
0            -1              kcbgcur      79           1161
79204        79204           kcbgcur      878          1177
79204        79204           kcbgtcr      869          2248
79204        79204           kcbgcur      1053         75105
0            -1              kcbgcur      109          79853
</pre>
<p>Pay attention to the bottom row:</p>
<pre class="brush: plain; first-line: 14; title: ; notranslate">
0            -1              kcbgcur      109          79853
</pre>
<p>These are 79853 additional db block gets of some undo block, which do not appear in the previous output.</p>
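<p>Rows that exist in only one of the two summaries can also be found mechanically. The following Python sketch uses only the rows visible in the two Summary sections above (the rows hidden behind &#8220;...&#8221; are not included), and the function name only_in_second is made up for illustration.</p>
<pre class="brush: python; title: ; notranslate">
# Key: (object_id, data_object_id, function, where), value: count.
# Rows copied from the two Summary sections above.
local = {
    (0, -1, "kcbgcur", 48): 16,
    (0, -1, "kcbgcur", 156): 16,
    (17, 17, "kcbgtcr", 869): 18,
    (0, -1, "kcbgcur", 51): 253,
    (0, -1, "kcbgcur", 79): 1365,
    (79204, 79204, "kcbgcur", 878): 1656,
    (0, -1, "kcbgcur", 50): 2357,
    (-1, 79204, "kcbgcur", 859): 2357,
    (79204, 79204, "kcbgtcr", 869): 2726,
    (79204, 79204, "kcbgcur", 1053): 75105,
}
distributed = {
    (0, -1, "kcbgcur", 48): 16,
    (0, -1, "kcbgcur", 156): 16,
    (17, 17, "kcbgtcr", 869): 18,
    (0, -1, "kcbgcur", 51): 49,
    (0, -1, "kcbgcur", 50): 442,
    (-1, 79204, "kcbgcur", 859): 442,
    (0, -1, "kcbgcur", 79): 1161,
    (79204, 79204, "kcbgcur", 878): 1177,
    (79204, 79204, "kcbgtcr", 869): 2248,
    (79204, 79204, "kcbgcur", 1053): 75105,
    (0, -1, "kcbgcur", 109): 79853,
}


def only_in_second(first, second):
    """Return the entries of second whose key does not occur in first."""
    return {key: count for key, count in second.items() if key not in first}


new_rows = only_in_second(local, distributed)
for key, count in new_rows.items():
    print(key, count)
</pre>
<p>With the rows above, the only key that is new in the distributed run is the one with where 109.</p>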
<p>If we take a look at the output of dtracelio.d with enabled output of each call we will see something like this:</p>
<pre class="brush: plain; title: ; notranslate">
...
kcbgcur(0xFFFFFD7FFDC42D68,2,1053,0) [tsn: 0 rdba: 0x4186e6 (1/100070) obj: 79204 dobj: 79204] where: 1053
kcbgcur(0xFFFFFD7FFFDF46A0,1,109,0) [tsn: 2 rdba: 0xc000f0 (3/240) obj: 0 dobj: -1] where: 109
kcbgcur(0xFFFFFD7FFDC42D68,2,1053,0) [tsn: 0 rdba: 0x4186e6 (1/100070) obj: 79204 dobj: 79204] where: 1053
kcbgcur(0xFFFFFD7FFFDF46A0,1,109,0) [tsn: 2 rdba: 0xc000f0 (3/240) obj: 0 dobj: -1] where: 109
kcbgcur(0xFFFFFD7FFDC42D68,2,1053,0) [tsn: 0 rdba: 0x4186e6 (1/100070) obj: 79204 dobj: 79204] where: 1053
kcbgcur(0xFFFFFD7FFFDF46A0,1,109,0) [tsn: 2 rdba: 0xc000f0 (3/240) obj: 0 dobj: -1] where: 109
kcbgcur(0xFFFFFD7FFDC42D68,2,1053,0) [tsn: 0 rdba: 0x4186e6 (1/100070) obj: 79204 dobj: 79204] where: 1053
kcbgcur(0xFFFFFD7FFFDF46A0,1,109,0) [tsn: 2 rdba: 0xc000f0 (3/240) obj: 0 dobj: -1] where: 109
kcbgcur(0xFFFFFD7FFDC42D68,2,1053,0) [tsn: 0 rdba: 0x4186e6 (1/100070) obj: 79204 dobj: 79204] where: 1053
kcbgcur(0xFFFFFD7FFFDF46A0,1,109,0) [tsn: 2 rdba: 0xc000f0 (3/240) obj: 0 dobj: -1] where: 109
kcbgcur(0xFFFFFD7FFDC42D68,2,1053,0) [tsn: 0 rdba: 0x4186e6 (1/100070) obj: 79204 dobj: 79204] where: 1053
...
</pre>
<p>Here we can see these additional calls with &#8220;where: 109&#8221; and that the same block (3/240) is read during these calls.<br />
This is only an excerpt, but I checked: all 79853 calls are for the same block from the same &#8220;where&#8221; (109 in this case).<br />
<em>Update: the updated version of DTraceLIO shows that these current gets were performed in shared mode, so the content of the block is not changed.</em></p>
<p>Let&#8217;s look inside the block.</p>
<pre class="brush: plain; title: ; notranslate">
alter system dump datafile 3 block 240;
</pre>
<p>An excerpt from block dump:</p>
<pre class="brush: plain; title: ; notranslate">
...
Block dump from disk:
buffer tsn: 2 rdba: 0x00c000f0 (3/240)
scn: 0x0000.0015240d seq: 0x02 flg: 0x04 tail: 0x240d2602
frmt: 0x02 chkval: 0x54dc type: 0x26=KTU SMU HEADER BLOCK
...
</pre>
<p>So, this is the undo header.</p>
<p>What is &#8220;where: 109&#8221;? &#8220;Where&#8221; means a location in the code, so let&#8217;s find the function name with indx 109:</p>
<pre class="brush: plain; title: ; notranslate">
SQL&gt; select indx, kcbwhdes from x$kcbwh where indx = 109;
 
      INDX KCBWHDES
---------- ----------------------------------------------------------------
       109 ktuwh87: ktugus:ktuGetExtTxnInfo
</pre>
<p>It is exactly what you will see if you print the call stack during a call with &#8220;where: 109&#8221; (see the second row: the function from which kcbgcur was called is ktuGetExtTxnInfo):</p>
<pre class="brush: plain; title: ; notranslate"> 
 0000000002c63d01 kcbgcur () + 1
 00000000029272d4 ktuGetExtTxnInfo () + 174
 00000000028f5193 ktugti () + 13
 00000000028e2561 ktuchg2 () + 1971
 0000000002944d2f ktbchg2 () + 12f
 0000000001b04809 kdu_array_flush_retry () + c69
 0000000001b01f42 kdu_array_buf () + 522
 0000000001af5390 kduurp () + 460
 0000000001adea00 kdusru () + 13e0
 0000000001ac6c12 kauupd () + 1b2
 0000000005a3370c updrow () + a5c
 0000000007b8bd27 qerupFetch () + 397
 000000000367cfdc qerstFetch () + 58c
 0000000005a3f2fe updaul () + 4ce
 0000000005a4469d updThreePhaseExe () + 1cd
 0000000005a43abd updexe () + 1ed
 000000000447b2c5 opiexe () + 2455
...
</pre>
<p>That was for 11.2.0.2. In my 10.2.0.5 all these additional calls have &#8220;where: 383&#8221;, &#8220;ktuwh02: ktugus&#8221;, which Jonathan Lewis called &#8220;Get Undo Segment header for commit&#8221; (it seems that this description is not correct).<br />
And this is the call stack during these calls:</p>
<pre class="brush: plain; title: ; notranslate">
              oracle`kcbgcur
              oracle`ktugusc+0x321
              oracle`ktugti+0xf2
              oracle`ktuchg+0x139a
              oracle`ktbchg2+0x115
              oracle`kddchg+0x2c2
              oracle`kddlok+0x7da
              oracle`kddlkr+0x16a
              oracle`updrow+0x2417
              oracle`qerupRowProcedure+0x4f
              oracle`qerupFetch+0x339
              oracle`updaul+0x481
              oracle`updThreePhaseExe+0xc72
              oracle`updexe+0x171
              oracle`opiexe+0xf1b
              oracle`opipls+0x98e
              oracle`opiodr+0x433
              oracle`rpidrus+0xde
              oracle`skgmstack+0x80
              oracle`rpidru+0x86
</pre>
<p><strong>Conclusion</strong><br />
We have determined what exactly the extra current gets within a distributed transaction are.<br />
Why Oracle reads the undo header 79853 times is a topic for further research. I would prefer to write another post about it.</p>
<p><strong>A little off-topic</strong>, but an interesting point is that, for a PL/SQL block with dblinks, Oracle opens a distributed transaction (binds an undo header; the transaction appears in v$transaction and v$global_transaction) during hard parsing. If hard parsing is not performed, the transaction is opened during parsing/execution of the statement with the dblink.<br />
It follows that if you execute a statement like this</p>
<pre class="brush: plain; title: ; notranslate">
declare
    i integer;
begin
    DML;
    DML;
    DML;
    if 1=0 then
        select count(*) into i from dual@dblink;
    end if;
end;
</pre>
<p>then whether it is executed within a distributed transaction (with the extra current gets) depends on whether hard parsing is performed.</p>
<p>p.s.<br />
Attention!<br />
I am still looking for additional information about<br />
* the structure kcbds,<br />
* the arguments of the function kcbgtcr<br />
* the arguments of the function kcbgcur<br />
If you have this information, please let me know.</p>
]]></content:encoded>
			<wfw:commentRss>http://alexanderanokhin.wordpress.com/2011/11/13/dynamic-tracing-of-oracle-logical-io/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/48a7cd21f3eee64ace66d6668a3e1223?s=96&#38;d=http%3A%2F%2Fs0.wp.com%2Fi%2Fmu.gif&#38;r=G" medium="image">
			<media:title type="html">alexanderanokhin</media:title>
		</media:content>
	</item>
	</channel>
</rss>
