Home > Oracle > Timing: rowsource statistics. Part 2: Overhead and inconsistent time

Timing: rowsource statistics. Part 2: Overhead and inconsistent time

December 24, 2012 Leave a comment Go to comments


Timing sampling frequency:
Number of calls getting timestamp depends of parameter _rowsource_statistics_sampfreq (default is 128).

  • If this parameter is set to 0, there are no time calculations in rowsource statistics. The functions qerstSnapStats()/qerstUpdateStats()do not get timestamp
  • If this parameter is set to 1, time is calculated always. Every pair qerstSnapStats()/qerstUpdateStats() gets timestamp (this is the case in the excerpts above).
  • If this parameter is set to N (as default value 128) then timestamp will be got every N tuples. It means that only N call of qerstSnapStats()/qerstUpdateStats() on some rowsource level will get timestamp.
  • There is special value 3 which enables gathering only rowcounts.

Pay attention on the case when _rowsource_statistics_sampfreq = N.
If this parameter is set to N (as default value 128) then timestamp will be got every N tuples. It means that only N call of qerstSnapStats()/qerstUpdateStats() on some rowsource level will get timestamp. Consequenced calls (for example next 127 times) of qerstSnapStats()/qerstUpdateStats() use that calculated time. In this case reported time of some rowsources is not real, but something like forecasted. As a result in this case we can find inconsistent time of some steps of an execution plan, when reported time of child steps more then time of parent steps.

From the other hand the most accurate (_rowsource_statistics_sampfreq = 1) leads to increased number of additional calls of the function getting timestamp, as a result increased time for execution of a query.

These are examples of the Digger’s output of the Digger for the same query

select --+ index(tbl)
       count(pad)
  from tbl;

-----------------------------------------------------
| Id  | Operation                    | Name | Rows  |
-----------------------------------------------------
|   0 | SELECT STATEMENT             |      |     1 |
|   1 |  SORT AGGREGATE              |      |     1 |
|   2 |   TABLE ACCESS BY INDEX ROWID| TBL  |    50 |
|   3 |    INDEX FULL SCAN           | IDX  |    50 |
-----------------------------------------------------

but with different _rowsource_statistics_sampfreq: 0 (timing disabled), 1 (the most accurate timing), 2, 4, 8, 16, 128
special case: _rowsource_statistics_sampfreq = 3 (only rowcounts)

Thus, we can separate three levels of details of rowsource statistics managed by the parameter _rowsource_statistics_sampfreq:
1: number of rows returned by rowsource
2: (1) + statistics as consistent gets, current gets, starts (number of times an execution plan step has been executed), etc
3: (1) + (2) + timing

As we know from 10g there are 3 ways how to enable gathering of rowsource statistics:
1) set parameter statistics_level to “all” on session or system level

alter session set statistics_level=all

By this another hidden parameter parameter _rowsource_execution_statistics is set into TRUE, it enables gathering of rowsource statistics.

  • statistics_level=all also sets parameter timed_os_statistics.
  • This is the only way to enable rowsource statistics gathering in 9i (except actual rows which are populated by enabled sql trace without statistics_level=all).
    Setting the parameter without sql trace enables gathering of rowsource statistics only for CBO (optimizer modes all_rows/first_rows), and the only way to get the statistics – to query V$SQL_PLAN_STATISTICS[_ALL].
    Setting the parameter with sql trace enables gathering of rowsource statistics for queries with any optimizer mode. In this case the subset statistics also can be printed in trace file. SQL trace without parameter statistics_level=all gathers only rowcounts.

In this case _rowsource_statistics_sampfreq is used only if the parameter has been set explicitly.
Otherwise using value is 1.

SQL> select ksppinm name,
  2         ksppstdvl value,
  3         ksppstdf IsDefault,
  4         decode(bitand(ksppstvf, 5), 1, 'TRUE', 0, 'FALSE') IsSetExplicitly
  5    from x$ksppi x, x$ksppcv y
  6   where (x.indx = y.indx)
  7     and ksppinm = '_rowsource_statistics_sampfreq';

NAME                                VALUE      ISDEFAULT  ISSETEXPLICITLY
----------------------------------- ---------- ---------- --------------------
_rowsource_statistics_sampfreq      128        TRUE       FALSE

SQL> select count(*) from v$parameter where name ='_rowsource_statistics_sampfreq';

  COUNT(*)
----------
         0

Here although the value of the parameter still be equal 128, the value using for frequency of getting timestamps will be 1.

If _rowsource_statistics_sampfreq is set manually then using value is _rowsource_statistics_sampfreq.

SQL> alter session set "_rowsource_statistics_sampfreq"=128;

Session altered.

SQL> select ksppinm name,
  2         ksppstdvl value,
  3         ksppstdf IsDefault,
  4         decode(bitand(ksppstvf, 5), 1, 'TRUE', 0, 'FALSE') IsSetExplicitly
  5    from x$ksppi x, x$ksppcv y
  6   where (x.indx = y.indx)
  7     and ksppinm = '_rowsource_statistics_sampfreq';

NAME                                VALUE      ISDEFAULT  ISSETEXPLICITLY
----------------------------------- ---------- ---------- --------------------
_rowsource_statistics_sampfreq      128        TRUE       TRUE

SQL> select count(*) from v$parameter where name ='_rowsource_statistics_sampfreq';

  COUNT(*)
----------
         1

In this case the value using for getting timestamps will be 128.

  • Notice that hidden parameters set explicitly appear in v$parameter

It is possible to set _rowsource_execution_statistics=true inside a session with statistics_level=typical manually.
In such case (and in cases below) real frequence of getting timestamps is equal to parameter _rowsource_statistics_sampfreq regardless how it was set (explicitly or not).

2) hint gather_plan_statistics
it enables rowsource statistics for a query.
In this case just _rowsource_statistics_sampfreq is used.

3) sql trace (10046 event)
In this case just _rowsource_statistics_sampfreq is used.
Keep in mind that in this case it is possible to see rowsource statistics not only via trace file or tkprof report, but also using dbms_xplan.display_cursor, it is much more convenient.

  • In each of these three cases optimizer environment (v$[sql/ses/sys]_optimizer_env) property sqlstat_enabled set to true.

Timing: overhead

Let’s see how and why different precise of timing calculation (_rowsource_statistics_sampfreq) affects amount of CPU and execution time of a query.
I will run a query with different _rowsource_statistics_sampfreq: default, 0, 128, 16, 8, 4, 2, 1.

In testcases below I will use simple table in 10 millions of rows and with an index

create table test(id not null, pad) as
  select rownum as id,
         '*' as pad
    from dual
 connect by level <= 10000000;

create index idx on test(id);

SQL> exec dbms_stats.gather_table_stats(user, 'TEST', estimate_percent => 100, cascade => true);

PL/SQL procedure successfully completed

and two queries over the table: one with Full Table Scan and another with Index Full Scan.
All runs on two versions – 10g and 11g.

In all cases I will use SQL tracing, because of easy access to consumed CPU instead of just rely on elapsed time.
In 10g sql trace for the session, in 11g SQL trace for exact SQL statement

alter session set events 'sql_trace[sql:9pzmsyh5t14bb]';

After all queries will be run and traced, I again run the same queries, but during each run of the SQL query I measured number of timer calls (gethrtime). I did it after previous run in order to avoid measurement (dtrace) overhead, so CPU time in the trace results in the table below does not contain dtrace overhead.

So, the first query with Full Table Scan:

select count(pad)
  from test;

--------------------------------------------
| Id  | Operation          | Name |   Rows |
--------------------------------------------
|   0 | SELECT STATEMENT   |      |      1 |
|   1 |  SORT AGGREGATE    |      |      1 |
|   2 |   TABLE ACCESS FULL| TEST |     10M|
--------------------------------------------

Result of the test are below: (all tkprof reports are made with option aggregate=no)
10g Full Table Scan: full script, output, raw trace, tkprof report

11g Full Table Scan: full script, output, raw trace, tkprof report

  • Notice interesting detail: such form of SQL tracing:
    alter session set events 'sql_trace[sql:9pzmsyh5t14bb]'

    does not trace parse calls

freq 10g 11g
cpu timer calls cpu timer calls
default (128) 0,66 234 392 0,65 156 262
0 0,59 5 0,57 5
128 0,66 234 392 0,64 156 262
16 1,13 1 875 017 1,02 1 250 015
8 1,64 3 750 020 1,32 2 500 017
4 2,50 7 500 019 2,02 5 000 019
2 4,44 15 000 026 3,41 10 000 025
1 8,18 30 000 027 6,28 20 000 033

And the same actions for changed query and Index Full Scan

select --+ index(test)
       count(pad)
  from test;

-----------------------------------------------------
| Id  | Operation                    | Name | Rows  |
-----------------------------------------------------
|   0 | SELECT STATEMENT             |      |     1 |
|   1 |  SORT AGGREGATE              |      |     1 |
|   2 |   TABLE ACCESS BY INDEX ROWID| TEST |    10M|
|   3 |    INDEX FULL SCAN           | IDX  |    10M|
-----------------------------------------------------

10g Index Full Scan: full script, output, raw trace, tkprof report

11g Index Full Scan: full script, output, raw trace, tkprof report

freq 10g 11g
cpu timer calls cpu timer calls
default (128) 3,49 937 531  2,78  312 519
0 3,28 5  2,62   7
128 3,52  937 534  2,69  312 519
16 5,24 7 500 036  3,32   2 500 021
8 7,18  15 000 038  4,31  5 000 025
4 11,25  30 000 044  5,28  10 000 029
2 19,01 60 000 056  7,92   20 000 041
1 34,03 120 000 067  13,06   40 000 057

We can see that _rowsource_statistics_sampfreq increases number of additional timer calls which obviously increases CPU consumption.
Also we can see that default value 128 does not lead to significant overhead

Pay attention aslo on:
- in the Full Table Scan case number of additional timer calls in 10g is in 1.5 times more than in 11g. We saw the reason above in section “How it works” – qerstUpdateStats() calls gethrtime() twice
- in the Index Full Scan case number of additional timer calls in 10g is in 3(!) times more than in 11g. Why?
Let’s try to take a look at where, in what call stack, gethrtime() were called.

dtrace -p <PID> -n 'pid$target:libc.so.1:gethrtime:entry{@tim[ustack()]=count();}' > report.txt

Excerpts:
10g:

...
              libc.so.1`gethrtime
              oracle`qerstUpdateStats+0x2fd
              oracle`qerstFetch+0x152
              oracle`qertbFetchByRowID+0x271
              oracle`qerstFetch+0xf0
              oracle`qergsFetch+0x16c
              oracle`qerstFetch+0xf0
              oracle`opifch2+0xa4e
              oracle`kpoal8+0xdb7
              oracle`opiodr+0x433
              oracle`ttcpip+0x46a
              oracle`opitsk+0x52d
              oracle`opiino+0x3f0
              oracle`opiodr+0x433
              oracle`opidrv+0x2f1
              oracle`sou2o+0x5b
              oracle`opimai_real+0x84
              oracle`main+0x64
              oracle`0xdba2dc
          9999999

              libc.so.1`gethrtime
              oracle`qerstUpdateStats+0x2fd
              oracle`qerstRowP+0x3d
              oracle`qerixtFetch+0x47d
              oracle`qerstFetch+0xf0
              oracle`qerpfFetch+0x102
              oracle`qerstFetch+0x12a
              oracle`qertbFetchByRowID+0x271
              oracle`qerstFetch+0xf0
              oracle`qergsFetch+0x16c
              oracle`qerstFetch+0xf0
              oracle`opifch2+0xa4e
              oracle`kpoal8+0xdb7
              oracle`opiodr+0x433
              oracle`ttcpip+0x46a
              oracle`opitsk+0x52d
              oracle`opiino+0x3f0
              oracle`opiodr+0x433
              oracle`opidrv+0x2f1
              oracle`sou2o+0x5b
          9999999

              libc.so.1`gethrtime
              oracle`qerstUpdateStats+0x2fd
              oracle`qerstRowP+0x3d
              oracle`qertbFetchByRowID+0x88f
              oracle`qerstFetch+0xf0
              oracle`qergsFetch+0x16c
              oracle`qerstFetch+0xf0
              oracle`opifch2+0xa4e
              oracle`kpoal8+0xdb7
              oracle`opiodr+0x433
              oracle`ttcpip+0x46a
              oracle`opitsk+0x52d
              oracle`opiino+0x3f0
              oracle`opiodr+0x433
              oracle`opidrv+0x2f1
              oracle`sou2o+0x5b
              oracle`opimai_real+0x84
              oracle`main+0x64
              oracle`0xdba2dc
          9999999

              libc.so.1`gethrtime
              oracle`qerstUpdateStats+0x31e
              oracle`qerstFetch+0x152
              oracle`qertbFetchByRowID+0x271
              oracle`qerstFetch+0xf0
              oracle`qergsFetch+0x16c
              oracle`qerstFetch+0xf0
              oracle`opifch2+0xa4e
              oracle`kpoal8+0xdb7
              oracle`opiodr+0x433
              oracle`ttcpip+0x46a
              oracle`opitsk+0x52d
              oracle`opiino+0x3f0
              oracle`opiodr+0x433
              oracle`opidrv+0x2f1
              oracle`sou2o+0x5b
              oracle`opimai_real+0x84
              oracle`main+0x64
              oracle`0xdba2dc
          9999999

              libc.so.1`gethrtime
              oracle`qerstUpdateStats+0x31e
              oracle`qerstRowP+0x3d
              oracle`qerixtFetch+0x47d
              oracle`qerstFetch+0xf0
              oracle`qerpfFetch+0x102
              oracle`qerstFetch+0x12a
              oracle`qertbFetchByRowID+0x271
              oracle`qerstFetch+0xf0
              oracle`qergsFetch+0x16c
              oracle`qerstFetch+0xf0
              oracle`opifch2+0xa4e
              oracle`kpoal8+0xdb7
              oracle`opiodr+0x433
              oracle`ttcpip+0x46a
              oracle`opitsk+0x52d
              oracle`opiino+0x3f0
              oracle`opiodr+0x433
              oracle`opidrv+0x2f1
              oracle`sou2o+0x5b
          9999999

              libc.so.1`gethrtime
              oracle`qerstUpdateStats+0x31e
              oracle`qerstRowP+0x3d
              oracle`qertbFetchByRowID+0x88f
              oracle`qerstFetch+0xf0
              oracle`qergsFetch+0x16c
              oracle`qerstFetch+0xf0
              oracle`opifch2+0xa4e
              oracle`kpoal8+0xdb7
              oracle`opiodr+0x433
              oracle`ttcpip+0x46a
              oracle`opitsk+0x52d
              oracle`opiino+0x3f0
              oracle`opiodr+0x433
              oracle`opidrv+0x2f1
              oracle`sou2o+0x5b
              oracle`opimai_real+0x84
              oracle`main+0x64
              oracle`0xdba2dc
          9999999

              libc.so.1`gethrtime
              oracle`qerstSnapStats+0x44
              oracle`qerstRowP+0x29b
              oracle`qerixtFetch+0x47d
              oracle`qerstFetch+0xf0
              oracle`qerpfFetch+0x102
              oracle`qerstFetch+0x12a
              oracle`qertbFetchByRowID+0x271
              oracle`qerstFetch+0xf0
              oracle`qergsFetch+0x16c
              oracle`qerstFetch+0xf0
              oracle`opifch2+0xa4e
              oracle`kpoal8+0xdb7
              oracle`opiodr+0x433
              oracle`ttcpip+0x46a
              oracle`opitsk+0x52d
              oracle`opiino+0x3f0
              oracle`opiodr+0x433
              oracle`opidrv+0x2f1
              oracle`sou2o+0x5b
         10000000

              libc.so.1`gethrtime
              oracle`qerstSnapStats+0x44
              oracle`qerstRowP+0x29b
              oracle`qertbFetchByRowID+0x88f
              oracle`qerstFetch+0xf0
              oracle`qergsFetch+0x16c
              oracle`qerstFetch+0xf0
              oracle`opifch2+0xa4e
              oracle`kpoal8+0xdb7
              oracle`opiodr+0x433
              oracle`ttcpip+0x46a
              oracle`opitsk+0x52d
              oracle`opiino+0x3f0
              oracle`opiodr+0x433
              oracle`opidrv+0x2f1
              oracle`sou2o+0x5b
              oracle`opimai_real+0x84
              oracle`main+0x64
              oracle`0xdba2dc
         10000000

              libc.so.1`gethrtime
              oracle`qerstSnapStats+0x44
              oracle`qerstFetch+0x48
              oracle`qertbFetchByRowID+0x271
              oracle`qerstFetch+0xf0
              oracle`qergsFetch+0x16c
              oracle`qerstFetch+0xf0
              oracle`opifch2+0xa4e
              oracle`kpoal8+0xdb7
              oracle`opiodr+0x433
              oracle`ttcpip+0x46a
              oracle`opitsk+0x52d
              oracle`opiino+0x3f0
              oracle`opiodr+0x433
              oracle`opidrv+0x2f1
              oracle`sou2o+0x5b
              oracle`opimai_real+0x84
              oracle`main+0x64
              oracle`0xdba2dc
         10000001

              libc.so.1`gethrtime
              oracle`qerstSnapStats+0x44
              oracle`qerstFetch+0x48
              oracle`qerpfFetch+0x102
              oracle`qerstFetch+0x12a
              oracle`qertbFetchByRowID+0x271
              oracle`qerstFetch+0xf0
              oracle`qergsFetch+0x16c
              oracle`qerstFetch+0xf0
              oracle`opifch2+0xa4e
              oracle`kpoal8+0xdb7
              oracle`opiodr+0x433
              oracle`ttcpip+0x46a
              oracle`opitsk+0x52d
              oracle`opiino+0x3f0
              oracle`opiodr+0x433
              oracle`opidrv+0x2f1
              oracle`sou2o+0x5b
              oracle`opimai_real+0x84
              oracle`main+0x64
         10000001

              libc.so.1`gethrtime
              oracle`qerstUpdateStats+0x134
              oracle`qerstFetch+0x152
              oracle`qerpfFetch+0x102
              oracle`qerstFetch+0x12a
              oracle`qertbFetchByRowID+0x271
              oracle`qerstFetch+0xf0
              oracle`qergsFetch+0x16c
              oracle`qerstFetch+0xf0
              oracle`opifch2+0xa4e
              oracle`kpoal8+0xdb7
              oracle`opiodr+0x433
              oracle`ttcpip+0x46a
              oracle`opitsk+0x52d
              oracle`opiino+0x3f0
              oracle`opiodr+0x433
              oracle`opidrv+0x2f1
              oracle`sou2o+0x5b
              oracle`opimai_real+0x84
              oracle`main+0x64
         10000001

              libc.so.1`gethrtime
              oracle`qerstUpdateStats+0x155
              oracle`qerstFetch+0x152
              oracle`qerpfFetch+0x102
              oracle`qerstFetch+0x12a
              oracle`qertbFetchByRowID+0x271
              oracle`qerstFetch+0xf0
              oracle`qergsFetch+0x16c
              oracle`qerstFetch+0xf0
              oracle`opifch2+0xa4e
              oracle`kpoal8+0xdb7
              oracle`opiodr+0x433
              oracle`ttcpip+0x46a
              oracle`opitsk+0x52d
              oracle`opiino+0x3f0
              oracle`opiodr+0x433
              oracle`opidrv+0x2f1
              oracle`sou2o+0x5b
              oracle`opimai_real+0x84
              oracle`main+0x64
         10000001

11g:

...
              libc.so.1`gethrtime
              oracle`qerstUpdateStats+0x206
              oracle`qerstRowP+0x4e
              oracle`qertbFetchByRowID+0xa4e
              oracle`qerstFetch+0x3ea
              oracle`qergsFetch+0x7c3
              oracle`qerstFetch+0x3ea
              oracle`opifch2+0x8a4
              oracle`kpoal8+0xfb9
              oracle`opiodr+0x3a7
              oracle`ttcpip+0x4f7
              oracle`opitsk+0x608
              oracle`opiino+0x335
              oracle`opiodr+0x3a7
              oracle`opidrv+0x2e6
              oracle`sou2o+0x58
              oracle`opimai_real+0x20a
              oracle`ssthrdmain+0x125
              oracle`main+0xd4
              oracle`0x15d18bc
          9999999

              libc.so.1`gethrtime
              oracle`qerstUpdateStats+0x206
              oracle`qerstFetch+0x1a2
              oracle`qertbFetchByRowID+0x4c7
              oracle`qerstFetch+0x3ea
              oracle`qergsFetch+0x7c3
              oracle`qerstFetch+0x3ea
              oracle`opifch2+0x8a4
              oracle`kpoal8+0xfb9
              oracle`opiodr+0x3a7
              oracle`ttcpip+0x4f7
              oracle`opitsk+0x608
              oracle`opiino+0x335
              oracle`opiodr+0x3a7
              oracle`opidrv+0x2e6
              oracle`sou2o+0x58
              oracle`opimai_real+0x20a
              oracle`ssthrdmain+0x125
              oracle`main+0xd4
              oracle`0x15d18bc
          9999999

              libc.so.1`gethrtime
              oracle`qerstRowP+0x25d
              oracle`qertbFetchByRowID+0xa4e
              oracle`qerstFetch+0x3ea
              oracle`qergsFetch+0x7c3
              oracle`qerstFetch+0x3ea
              oracle`opifch2+0x8a4
              oracle`kpoal8+0xfb9
              oracle`opiodr+0x3a7
              oracle`ttcpip+0x4f7
              oracle`opitsk+0x608
              oracle`opiino+0x335
              oracle`opiodr+0x3a7
              oracle`opidrv+0x2e6
              oracle`sou2o+0x58
              oracle`opimai_real+0x20a
              oracle`ssthrdmain+0x125
              oracle`main+0xd4
              oracle`0x15d18bc
         10000000

              libc.so.1`gethrtime
              oracle`qerstFetch+0x234
              oracle`qertbFetchByRowID+0x4c7
              oracle`qerstFetch+0x3ea
              oracle`qergsFetch+0x7c3
              oracle`qerstFetch+0x3ea
              oracle`opifch2+0x8a4
              oracle`kpoal8+0xfb9
              oracle`opiodr+0x3a7
              oracle`ttcpip+0x4f7
              oracle`opitsk+0x608
              oracle`opiino+0x335
              oracle`opiodr+0x3a7
              oracle`opidrv+0x2e6
              oracle`sou2o+0x58
              oracle`opimai_real+0x20a
              oracle`ssthrdmain+0x125
              oracle`main+0xd4
              oracle`0x15d18bc
         10000001

As we can see in 10g additional rowsource level appears – qerpfFetch (prefetch), which also wrapped by qerstSnapStats/qerstUpdateStats functions. As the result number of calls in three times (1.5*2) more than in 11g.

Amount of overhead depends on number of these additional calls of the function getting timestamp and also it depends on implementation of this function, that is in different OS overhead can be different.
Mentioned calls gethrtime() in Solaris or gettimeofday() in Linux are syscalls. But these are special type of syscalls, these syscalls are implemented as syscalls with minimal overhead. Such syscalls in Solaris are called “fast trap syscalls” in Linux vsyscalls/VDSO (only 64 bit architectures).

It is why we do not see it as real syscalls. In this case in order to catch these calls we should trace library calls which are wrappers for the real syscalls. For example, in Linux it is possible to use ltrace (library trace).

Thus, performance overhead depends on
- number of additional timer calls: which itself depends on number of processed rows in each rowsource and number of rowsources
- implementation of the function getting timestamp, so in different OS overhead can be different
- and how exactly it works inside OS, even in the same OS it can work in different modes

Timing: Overhead: VDSO on Linux

  • gettimeofday speedup:
    A Virtual Dynamic Shared Object (VDSO), is a shared library that allows application in user space to perform some kernel actions without as much overhead as a system call. The VDSO is often used to provide fast access to the gettimeofday system call data.
    Enabling the VDSO instructs the kernel to use its definition of the symbols in the VDSO, rather than the ones found in any user-space shared libraries, particularly the glibc. The effects of enabling the VDSO are system-wide – either all processes use it or none do.
    When enabled, the VDSO overrides the glibc definition of gettimeofday with it’s own. This removes the overhead of a system call, as the call is made direct to the kernel memory, rather than going through the glibc.

The behavior of gethrtime() can be changed (system-wide):

  • The VDSO boot parameter has three possible values:
    0
    Provides the most accurate time intervals at μs (microsecond) resolution, but also produces the highest call overhead, as it uses a regular system call
    1
    Slightly less accurate, although still at μs resolution, with a lower call overhead
    2
    The least accurate, with time intervals at the ms (millisecond) level, but offers the lowest call overhead

VDSO behavior is enabled by default. The value used to enable the VDSO affects the behavior of gettimeofday. It can be enabled by echoing the desired value to /proc/sys/kernel/syscall64.

echo 1 > /proc/sys/kernel/vsyscall64

It allows us to compare performance of gehrtime() as VDSO and as usual syscall.
Environment: VirtualBox with Oracle 11.2.0.3 on Oracle Linux 6.3 inside

I will change VDSO behavior and run the same query as above, consequentially in three modes: 2,1,0 (gethrtime() becames a real syscall).
Again I use sql trace instead of statistics_level or hint gather_plan_statistics in order to easy access to cpu time.
The tracefile: orcl_ora_3017.trc

# echo 2 > /proc/sys/kernel/vsyscall64
-----------------------------------------------------------------------------------------------
| Id  | Operation                    | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |      |      1 |        |      1 |00:00:05.45 |   41400 |
|   1 |  SORT AGGREGATE              |      |      1 |      1 |      1 |00:00:05.45 |   41400 |
|   2 |   TABLE ACCESS BY INDEX ROWID| TEST |      1 |     10M|     10M|00:00:04.54 |   41400 |
|   3 |    INDEX FULL SCAN           | IDX  |      1 |     10M|     10M|00:00:01.74 |   23555 |
-----------------------------------------------------------------------------------------------

FETCH #139665587261584:c=5403178,e=5453897,p=0,cr=41400,cu=0,mis=0,r=1,dep=0,og=1,plh=4182611088,tim=1350789769054152

cpu=5,40

# echo 1 > /proc/sys/kernel/vsyscall64
-----------------------------------------------------------------------------------------------
| Id  | Operation                    | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |      |      1 |        |      1 |00:00:05.17 |   41400 |
|   1 |  SORT AGGREGATE              |      |      1 |      1 |      1 |00:00:05.17 |   41400 |
|   2 |   TABLE ACCESS BY INDEX ROWID| TEST |      1 |     10M|     10M|00:00:04.29 |   41400 |
|   3 |    INDEX FULL SCAN           | IDX  |      1 |     10M|     10M|00:00:01.63 |   23555 |
-----------------------------------------------------------------------------------------------

FETCH #139665578749400:c=5119221,e=5169173,p=0,cr=41400,cu=0,mis=0,r=1,dep=0,og=1,plh=4182611088,tim=1350789789329746

cpu=5,12

# echo 0 > /proc/sys/kernel/vsyscall64
-----------------------------------------------------------------------------------------------
| Id  | Operation                    | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |      |      1 |        |      1 |00:00:11.67 |   41400 |
|   1 |  SORT AGGREGATE              |      |      1 |      1 |      1 |00:00:11.67 |   41400 |
|   2 |   TABLE ACCESS BY INDEX ROWID| TEST |      1 |     10M|     10M|00:00:09.15 |   41400 |
|   3 |    INDEX FULL SCAN           | IDX  |      1 |     10M|     10M|00:00:03.28 |   23555 |
-----------------------------------------------------------------------------------------------

FETCH #139665585290416:c=11504250,e=11668447,p=0,cr=41400,cu=0,mis=0,r=1,dep=0,og=1,plh=4182611088,tim=1350789819491035

cpu=11,50

Summary

mode cpu time
0 11,50
1 5,12
2 5,40

In order to measure number of times gettimeofday() has been called on Linux it is possible either use ltrace (in any mode) or any syscalls measurement method (as strace or dtrace on OL6) in mode 0 (when gettimeofday() is real syscall).

I have used dtrace and the simplest script:

# dtrace -p 3017 -n 'syscall::gettimeofday:entry/pid == $target/{@calls=count()}'
dtrace: description 'syscall::gettimeofday:entry' matched 1 probe
^C
40000640

Thus, number of timer calls in this case is 40 mln – the same number as in Solaris.

Timing: known issues

In some environments it is possible to get execution plans with absolutely crazy timings, as below.
This case is Solaris 10 on VirtualBox:

select count(*)
  from (select rownum id from dual connect by level <= 999999) t1,
       (select rownum id from dual connect by level <= 999999) t2
 where t1.id = t2.id;

--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name | Starts | E-Rows | A-Rows |   A-Time   |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                 |      |      1 |        |      1 |00:00:02.56 |       |       |          |
|   1 |  SORT AGGREGATE                  |      |      1 |      1 |      1 |00:00:02.56 |       |       |          |
|*  2 |   HASH JOIN                      |      |      1 |      1 |  99999 |07:15:32.57 |  3749K|  1936K| 5594K (0)|
|   3 |    VIEW                          |      |      1 |      1 |  99999 |03:28:07.19 |       |       |          |
|   4 |     COUNT                        |      |      1 |        |  99999 |02:31:21.65 |       |       |          |
|   5 |      CONNECT BY WITHOUT FILTERING|      |      1 |        |  99999 |01:53:30.97 |       |       |          |
|   6 |       FAST DUAL                  |      |      1 |      1 |      1 |00:00:00.01 |       |       |          |
|   7 |    VIEW                          |      |      1 |      1 |  99999 |03:28:17.39 |       |       |          |
|   8 |     COUNT                        |      |      1 |        |  99999 |02:50:24.37 |       |       |          |
|   9 |      CONNECT BY WITHOUT FILTERING|      |      1 |        |  99999 |00:56:48.52 |       |       |          |
|  10 |       FAST DUAL                  |      |      1 |      1 |      1 |00:00:00.01 |       |       |          |
--------------------------------------------------------------------------------------------------------------------

Pay attention that the query has been done in ~3 seconds, but reported time of some steps more than 7 hours.
The reason is non-monotonic timer. It means that subsequent call of timestamp can return value less that previous call (time from the past).
More details of this problem and the tool allowing to check it is described here: http://www.doerzbach.com/index.php?entry=entry120113-124240

P.S.
By the way Real-Time SQL monitoring works absolutely another way… but this is another story. I am going to explain it also in one of next posts.

P.P.S.
Seems that’s all. I hope didn’t forget anything ;-)
Any additions, corrections, questions are welcome!

About these ads
  1. December 25, 2012 at 5:56 am | #1

    Hi Alexander,

    a great post, as usual. Regarding inconsistent timings: when we see this, can we trust any part of the plan rowsource stats? For example, in this plan with a 7 hour hash join, can we conclude that the statement took 7h15m + 3h28m = 10h43m? Or that it took at least 7h15m? Or should we just discard the timing information altogether?

    It’s also annoying the way Oracle fails to properly include rowsource stats for scalar subqueries into those of their parents — in this case it’s even worse because in addition to timings, it also affects LIO. Is there any remedy for that?

  2. December 25, 2012 at 8:18 pm | #2

    savvinov :

    Regarding inconsistent timings: when we see this, can we trust any part of the plan rowsource stats? For example, in this plan with a 7 hour hash join, can we conclude that the statement took 7h15m + 3h28m = 10h43m? Or that it took at least 7h15m? Or should we just discard the timing information altogether?

    Hi Nikolay,

    There are two resons for inconsistent timing in rowsource statistics:
    1) the first one is “by design”, due the mechanic of _rowsource_statistics_sampfreq
    2) the second one is due to errors in timer’s work

    The first reason:
    When _rowsource_statistics_sampfreq is more than 1, let’s take default value 128, then Oracle calculates time for one of 128 rows. And use calculated time for the rest 127 rows. At this point some error appears, because time spent in calculated row and time spent in rest 127 rows can be different, at least calculated piece calls additional functions, system calls in order to get timestamps, when rest 127 rows do not.
    So, when the parameter _rowsource_statistics_sampfreq > 1 then reported time is adjusted, approximated.

    Let’s take a look on example where I ran a query with different _rowsource_statistics_sampfreq.
    This is SQL+ output from the test with the executoin plan with Index Full Scan on 11g, and below is an excerpt from this output:

    SQL> alter session set events 'sql_trace[sql:9pzmsyh5t14bb]';
    
    Session altered.
    
    SQL> alter session set "_rowsource_statistics_sampfreq"=128;
    
    Session altered.
    
    SQL> select --+ index(test)
      2         count(pad)
      3    from test;
    
    COUNT(PAD)                                                                                                                                                                                              
    ----------                                                                                                                                                                                              
      10000000                                                                                                                                                                                              
    
    Elapsed: 00:00:02.70
                                                                                                                                                                                                            
    -----------------------------------------------------------------------------------------------                                                                                                         
    | Id  | Operation                    | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |                                                                                                         
    -----------------------------------------------------------------------------------------------                                                                                                         
    |   0 | SELECT STATEMENT             |      |      1 |        |      1 |00:00:02.70 |   41401 |                                                                                                         
    |   1 |  SORT AGGREGATE              |      |      1 |      1 |      1 |00:00:02.70 |   41401 |                                                                                                         
    |   2 |   TABLE ACCESS BY INDEX ROWID| TEST |      1 |     10M|     10M|00:00:10.28 |   41401 |                                                                                                         
    |   3 |    INDEX FULL SCAN           | IDX  |      1 |     10M|     10M|00:00:03.63 |   23555 |                                                                                                         
    -----------------------------------------------------------------------------------------------                                                                                                         
                                                                                                                                                                                                            
    
    SQL> alter session set "_rowsource_statistics_sampfreq"=16;
    
    Session altered.
    
    SQL> select --+ index(test)
      2         count(pad)
      3    from test;
    
    COUNT(PAD)                                                                                                                                                                                              
    ----------                                                                                                                                                                                              
      10000000                                                                                                                                                                                              
    
    Elapsed: 00:00:03.33
                                                                                                                                                                                                            
    -----------------------------------------------------------------------------------------------                                                                                                         
    | Id  | Operation                    | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |                                                                                                         
    -----------------------------------------------------------------------------------------------                                                                                                         
    |   0 | SELECT STATEMENT             |      |      1 |        |      1 |00:00:03.33 |   41401 |                                                                                                         
    |   1 |  SORT AGGREGATE              |      |      1 |      1 |      1 |00:00:03.33 |   41401 |                                                                                                         
    |   2 |   TABLE ACCESS BY INDEX ROWID| TEST |      1 |     10M|     10M|00:00:10.36 |   41401 |                                                                                                         
    |   3 |    INDEX FULL SCAN           | IDX  |      1 |     10M|     10M|00:00:03.71 |   23555 |                                                                                                         
    -----------------------------------------------------------------------------------------------                                                                                                         
                                                                                                                                                                                                            
    
    SQL> alter session set "_rowsource_statistics_sampfreq"=8;
    
    Session altered.
    
    SQL> select --+ index(test)
      2         count(pad)
      3    from test;
    
    COUNT(PAD)                                                                                                                                                                                              
    ----------                                                                                                                                                                                              
      10000000                                                                                                                                                                                              
    
    Elapsed: 00:00:04.32
                                                                                                                                                                                                            
    -----------------------------------------------------------------------------------------------                                                                                                         
    | Id  | Operation                    | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |                                                                                                         
    -----------------------------------------------------------------------------------------------                                                                                                         
    |   0 | SELECT STATEMENT             |      |      1 |        |      1 |00:00:04.31 |   41401 |                                                                                                         
    |   1 |  SORT AGGREGATE              |      |      1 |      1 |      1 |00:00:04.31 |   41401 |                                                                                                         
    |   2 |   TABLE ACCESS BY INDEX ROWID| TEST |      1 |     10M|     10M|00:00:10.78 |   41401 |                                                                                                         
    |   3 |    INDEX FULL SCAN           | IDX  |      1 |     10M|     10M|00:00:03.92 |   23555 |                                                                                                         
    -----------------------------------------------------------------------------------------------                                                                                                         
                                                                                                                                                                                                            
    SQL> alter session set "_rowsource_statistics_sampfreq"=4;
    
    Session altered.
    
    SQL> select --+ index(test)
      2         count(pad)
      3    from test;
    
    COUNT(PAD)                                                                                                                                                                                              
    ----------                                                                                                                                                                                              
      10000000                                                                                                                                                                                              
    
    Elapsed: 00:00:05.28
                                                                                                                                                                                                            
    -----------------------------------------------------------------------------------------------                                                                                                         
    | Id  | Operation                    | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |                                                                                                         
    -----------------------------------------------------------------------------------------------                                                                                                         
    |   0 | SELECT STATEMENT             |      |      1 |        |      1 |00:00:05.28 |   41401 |                                                                                                         
    |   1 |  SORT AGGREGATE              |      |      1 |      1 |      1 |00:00:05.28 |   41401 |                                                                                                         
    |   2 |   TABLE ACCESS BY INDEX ROWID| TEST |      1 |     10M|     10M|00:00:10.24 |   41401 |                                                                                                         
    |   3 |    INDEX FULL SCAN           | IDX  |      1 |     10M|     10M|00:00:03.68 |   23555 |                                                                                                         
    -----------------------------------------------------------------------------------------------                                                                                                         
                                                                                                                                                                                                            
    
    SQL> alter session set "_rowsource_statistics_sampfreq"=2;
    
    Session altered.
    
    SQL> select --+ index(test)
      2         count(pad)
      3    from test;
    
    COUNT(PAD)                                                                                                                                                                                              
    ----------                                                                                                                                                                                              
      10000000                                                                                                                                                                                              
    
    Elapsed: 00:00:07.94
                                                                                                                                                                                                            
    -----------------------------------------------------------------------------------------------                                                                                                         
    | Id  | Operation                    | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |                                                                                                         
    -----------------------------------------------------------------------------------------------                                                                                                         
    |   0 | SELECT STATEMENT             |      |      1 |        |      1 |00:00:07.94 |   41401 |                                                                                                         
    |   1 |  SORT AGGREGATE              |      |      1 |      1 |      1 |00:00:07.94 |   41401 |                                                                                                         
    |   2 |   TABLE ACCESS BY INDEX ROWID| TEST |      1 |     10M|     10M|00:00:10.33 |   41401 |                                                                                                         
    |   3 |    INDEX FULL SCAN           | IDX  |      1 |     10M|     10M|00:00:03.73 |   23555 |                                                                                                         
    -----------------------------------------------------------------------------------------------                                                                                                         
                                                                                                                                                                                                            
    
    SQL> alter session set "_rowsource_statistics_sampfreq"=1;
    
    Session altered.
    
    SQL> select --+ index(test)
      2         count(pad)
      3    from test;
    
    COUNT(PAD)                                                                                                                                                                                              
    ----------                                                                                                                                                                                              
      10000000                                                                                                                                                                                              
    
    Elapsed: 00:00:13.07
                                                                                                                                                                                                            
    -----------------------------------------------------------------------------------------------                                                                                                         
    | Id  | Operation                    | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |                                                                                                         
    -----------------------------------------------------------------------------------------------                                                                                                         
    |   0 | SELECT STATEMENT             |      |      1 |        |      1 |00:00:13.08 |   41401 |                                                                                                         
    |   1 |  SORT AGGREGATE              |      |      1 |      1 |      1 |00:00:13.08 |   41401 |                                                                                                         
    |   2 |   TABLE ACCESS BY INDEX ROWID| TEST |      1 |     10M|     10M|00:00:10.22 |   41401 |                                                                                                         
    |   3 |    INDEX FULL SCAN           | IDX  |      1 |     10M|     10M|00:00:03.67 |   23555 |                                                                                                         
    -----------------------------------------------------------------------------------------------                                                                                                         
    

    Here we can see
    - in every case except the last one, when _rowsource_statistics_sampfreq = 1, reported time of child rowsources (spets 2 and 3) more than parent rowsources (step 1 of the execution plan, SORT AGGREGATE).
    - top rowsource of the execution plan, SORT AGGREGATE is always report right time (compare it with elapsed time of a query) because only few calculations have happened there, only two timestamps were need for the top rowsource.

       module call(args)             = return
    --------- ---------------------------------
       oracle -> qerstFetch(0xC2C9C2D8, 0xFFFFFD7FFDC3AA60, 0x4B6D830)
    libc.so.1   -> gethrtime(0xFFFFFD7FFFDFB510, 0x0, 0x43)
    libc.so.1   <- gethrtime = 0x8216E81D
       oracle   -> qergsFetch(0xC2C9C448, 0xFFFFFD7FFDC3A780, 0x3570158)
                   ....
       oracle   <- qergsFetch = 0x0
       oracle   -> qerstUpdateStats(0xFFFFFD7FFDC3AA60, 0xFFFFFD7FFFDFB510, 0x0)
    libc.so.1     -> gethrtime(0x59, 0xCA6EA940, 0x5A)
    libc.so.1     <- gethrtime = 0x84C1A7C9
       oracle   <- qerstUpdateStats = 0xA8
       oracle <- qerstFetch = 0x0
    

    It is why reported time for the top rowsource usually is real and true.

    Pay attention on the interesting detail here: reported time on the steps which are wrong in case when _rowsource_statistics_sampfreq > 1 is close or equal to the time of the same steps in case when _rowsource_statistics_sampfreq = 1 in our case.
    Let’s compare the time on the step 2 and 3 in all execution plans in this excerpt. It still be the same. In all plans except the latest, we see the time which we would have if set _rowsource_statistics_sampfreq = 1.

    But this rule does not work always. In some environment some another measurement mistake appears, when with less _rowsource_statistics_sampfreq time of steps 2 and 3 from our example become more accurate.

    The second reason:
    this crazy timing:

    --------------------------------------------------------------------------------------------------------------------
    | Id  | Operation                        | Name | Starts | E-Rows | A-Rows |   A-Time   |  OMem |  1Mem | Used-Mem |
    --------------------------------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT                 |      |      1 |        |      1 |00:00:02.56 |       |       |          |
    |   1 |  SORT AGGREGATE                  |      |      1 |      1 |      1 |00:00:02.56 |       |       |          |
    |*  2 |   HASH JOIN                      |      |      1 |      1 |  99999 |07:15:32.57 |  3749K|  1936K| 5594K (0)|
    |   3 |    VIEW                          |      |      1 |      1 |  99999 |03:28:07.19 |       |       |          |
    |   4 |     COUNT                        |      |      1 |        |  99999 |02:31:21.65 |       |       |          |
    |   5 |      CONNECT BY WITHOUT FILTERING|      |      1 |        |  99999 |01:53:30.97 |       |       |          |
    |   6 |       FAST DUAL                  |      |      1 |      1 |      1 |00:00:00.01 |       |       |          |
    |   7 |    VIEW                          |      |      1 |      1 |  99999 |03:28:17.39 |       |       |          |
    |   8 |     COUNT                        |      |      1 |        |  99999 |02:50:24.37 |       |       |          |
    |   9 |      CONNECT BY WITHOUT FILTERING|      |      1 |        |  99999 |00:56:48.52 |       |       |          |
    |  10 |       FAST DUAL                  |      |      1 |      1 |      1 |00:00:00.01 |       |       |          |
    --------------------------------------------------------------------------------------------------------------------
    

    happened due to issues with timer in virtualized environment.
    Pay attention that this timing has been got with _rowsource_statistics_sampfreq = 1.

    The problem in this case that “next” timestamp can be (not every timestamp, but some of them) less than “previous”. It is easy to check using simple piece of code.
    In my case it reported

     296338:
        start 5206560176179
        end   5206560140799
        diff  -35380
     315984:
        start 5206570390808
        end   5206570323725
        diff  -67083
     397163:
        start 5206610428346
        end   5206610140817
        diff  -287529
     853899:
        start 5206839843194
        end   5206839814437
        diff  -28757
     875899:
        start 5206850129682
        end   5206850112551
        diff  -17131
     1674618:
        start 5207250178393
        end   5207250111996
        diff  -66397
    ...
    

    As I suggest, when Oracle subtracts previous timestamp from the next (when the next less then the previous) using unsigned variable it gets huge numbers.
    Obviously we cannot trust reported time in this case, except top level (step 1, SORT AGGREGATE) because only few timestamps on the top level, very small probability to get wrong timing here.

  3. December 27, 2012 at 4:27 pm | #3

    Hi Alexander,

    thank you for the great explanation. Jonathan Lewis mentioned the role of the timer function some years ago: http://jonathanlewis.wordpress.com/2007/11/25/gather_plan_statistics/. But he did not say much about the details.

    Regards

    Martin

  1. December 24, 2012 at 9:26 pm | #1
  2. May 22, 2013 at 12:40 pm | #2
  3. June 4, 2013 at 8:55 pm | #3
  4. November 16, 2013 at 9:52 am | #4

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 77 other followers

%d bloggers like this: