Digger: the tool for tracing of unix processes

Contents:
1. Introduction
2. New Features
3. How to use
4. Oracle kernel layers and functions
5. Important notes
6. Examples
7. Post Scriptum


Introduction
I would like to introduce new tool – Digger.
This tool allows to see tree of process’ calls such as application calls, library, system calls (and even kernel functions and OS scheduler actions) with additional information as function arguments, result, cpu & elapsed time.
This is not something Oracle specific, the tool can be used for tracing of any unix process (DTrace is required).

Sourcecode: digger.sh

Examples:

CPU# timestamp relative elapsed    cpu    module call(args)              = return
---- --------- -------- ------- ------ --------- ---------------------------------
   4     10650   296149       .      .    oracle -> kdsgrp(0xFFFFFD7FFDC28CC8, 0x0, 0xFFFFFD7FFDC28CC8)
   4     10662   296150       .      .    oracle   -> kcbispnd(0xFFFFFD7FFDC28CE0, 0x0, 0xFFFFFD7FFDC28CC8)
   4     10670   296151       8      0    oracle   <- kcbispnd = 0x0
   4     10686   296153       .      .    oracle   -> ktrget2(0xFFFFFD7FFDC28CD0, 0xFFFFFD7FFDC28BF8, 0x360)
   4     10698   296156       .      .    oracle     -> ktsmg_max_query(0x0, 0x0, 0xFFFFFD7FFDC11B28)
   4     10707   296157       8      0    oracle     <- ktsmg_max_query = 0x0
   4     10724   296160       .      .    oracle     -> kcbgtcr(0xFFFFFD7FFDC28CE0, 0x0, 0x360)
   4     10735   296162       .      .    oracle       -> ktrexf(0xFFFFFD7FFFDFADD0, 0xE771F80, 0x0)
   4     10743   296163       7      0    oracle       <- ktrexf = 0x9
   4     10758   296166       .      .    oracle       -> kcbzgs(0x1, 0xE771F80, 0x1)
   4     10768   296168       .      .    oracle         -> kssadf_numa_intl(0x26, 0x91F2BB48, 0x8DCA3948)
   4     10777   296169       8      1    oracle         <- kssadf_numa_intl = 0x91A75F98
   4     10790   296170      32      3    oracle       <- kcbzgs = 0x91A75F98
   4     10805   296171       .      .    oracle       -> kcbz_fp_buf(0x80FF38A8, 0x91A76018, 0x1)
   4     10813   296172       8      1    oracle       <- kcbz_fp_buf = 0x1
   4     10826   296173     102     13    oracle     <- kcbgtcr = 0x80EE8014
   4     10840   296176       .      .    oracle     -> kcbcge(0xFFFFFD7FFDC28CE0, 0x0, 0x0)
   4     10848   296176       7      0    oracle     <- kcbcge = 0x0
   4     10862   296178       .      .    oracle     -> ktcckv(0xFFFFFD7FFDC28CE0, 0xFFFFFD7FFDC281F4, 0x0)
   4     10870   296179       7      0    oracle     <- ktcckv = 0x0
   4     10881   296180     195     26    oracle   <- ktrget2 = 0x80EE807C
   4     10895   296181     244     32    oracle <- kdsgrp = 0x80EE9FAD
CPU# timestamp relative elapsed    cpu    module call(args)              = return
---- --------- -------- ------- ------ --------- ---------------------------------
   1     28814   204036       .      .    oracle -> ksfd_skgfqio(0xFFFFFD7FFFDF8B70, 0x4, 0x0)
   1     28834   204043       .      .    oracle   -> skgfqio(0xFFFFFD7FFDD79580, 0xFFFFFD7FFFDF8B70, 0x4)
   1     28851   204049       .      .    oracle     -> skgfrvldtrq(0xFFFFFD7FFFDF8B70, 0x4, 0xFFFFFD7FFDD79580)
   1     28880   204051      29      1    oracle     <- skgfrvldtrq = 0x0
   1     28904   204054       .      .   syscall     => pread(0x101, 0x7F4E2000, 0x2000)
   1     28923   204064      19      9   syscall     <= pread = 0x2000
   1     28941   204066     107     23    oracle   <- skgfqio = 0x0
   1     28958   204068     143     31    oracle <- ksfd_skgfqio = 0x0

The Digger can be considered as improved and extended version of open source tool
dapptrace, the tool from DTraceToolkit, included in several operating systems, developed by Brendan Gregg.
I thought should it be new version of dapptrace or dedicated tool and decided that scope of the Digger is wider then dapptrace, thus it should be considered as dedicated branch with dedicated name. Perhaps some features of the Digger will be propagated into dapptrace. I am working on it.


New features of the Digger
1) The capability to trace calls only inside some exact function.
If the option is set then all tracing will be enabled only inside this function only.

Original version of dapptrace is not convinient to be used on Oracle processes.
Oracle process has hundreds of thousands of functions (~166K functions in 11.2.0.2) and if you are going to trace some the simplest “select * from dual” you will get huge amount of info where you most likely need only very small part.
Usually it is required to trace only some operation, for example it can be function performing some operation as Full Table Scan, or parse call, I/O calls, etc.

2) The capability to filter traced calls.
It means to show not all calls but only some functions. For example to trace only kcb* (Kernel Cache Buffer laers’s calls) calls.
This especially convinient in Oracle because in Oracle all function have prefixes denoting affiliation to kernel subsystem.
For example: k*: Kernel, kc*: Kernel Cache, kt*: Kernel Transaction layer, etc.

3) The capability to trace system calls

These features implemented as following options:

 -f function(s)  # to trace calls inside some function (patterns * and ? are allowed)
 -t function(s)  # calls filter
 -s function(s)  # trace syscalls with filter
 -S              # trace all syscalls

Parameters “function(s)” can be
– exact function name
– pattern using symbols * or ?, example: ktr*
– list separated by commas, example: kcbget,kds*
– function can include module name, example: libc.so.1:gethrtime
If the module is not set then the module is determined by option -U (if set then all modules, otherwise only application calls are traced).

Valid usage:

                     
digger -p PID -FS -f kdsgrp

means to trace all application calls and system calls inside function kdsgrp

                     
digger -p PID -F -f qertbFetch -t kcb*,ktr* -s *read*,*aio

means to trace kcb* and ktr* functions and system calls *read* and *aio inside function qertbFetch

4) The capability to trace kernel calls (only for hardcore geeks).
Kernel is the SYS component of CPU utilization. This is a chance to look inside a system call.

5) The capability to trace scheduler actions

CPU# call(args) 		 = return
---- ---------------------------------
   0 -> kslfre(0x9191A5D0, 0x29, 0x0)		
   0   => yield(0x9191A5D0, 0x2000000000000029, 0xFFFFFFFD)		
   0     - sched:TS:ts_yield:change-pri [pr_clname=TS, pr_sname=O, pr_pri=59, new_pri=39]
   0     - sched:unix:setbackdq:enqueue [pr_clname=TS, pr_sname=O, cpu_id=0]
   0     - sched:unix:disp:dequeue [pr_clname=TS, pr_sname=R, cpu_id=0]
   0     - sched:unix:swtch:remain-cpu
   0   <= yield = 0x0
   0 <- kslfre = 0x0

6) The capability to trace actions related with I/O

 syscall     => pread(0x101, 0x694BC000, 0x2000)
 genunix       - io:genunix:bdev_strategy:start [d=sd3 f=/u02/oradata/orcl/example01.dbf b=8192]
 genunix       - io:genunix:biowait:wait-start [d=sd3 f=/u02/oradata/orcl/example01.dbf b=8192]
 genunix       - io:genunix:biowait:wait-done [d=sd3 f=/u02/oradata/orcl/example01.dbf b=8192]
 syscall     <= pread = 0x2000

7) Mechanics of indents differs from dapptrace.
There are functions where return probes do not exist due to compiler optimizations.

See also:
Tail-call Optimization
http://dtrace.org/blogs/brendan/2011/02/14/dtrace-pid-provider-return

It leads to moving of function tree on right side. In the original version of dapptrace it looks like following piece

   oracle    -> kpofdr(0xFFFFFD7FFFDF2780, 0x64, 0x0)		
   oracle      -> ttcrbur(0xE77ADF0, 0xE768C18, 0xFFFFFD7FFFDF26C0)
libc.so.1        -> memcpy(0xE9A2751, 0xFFFFFD7FFFDF26C0, 0x1)
libc.so.1          -> memcpy(0xE9A2752, 0xFFFFFD7FFFDF26C2, 0x2)
libc.so.1            -> memcpy(0xE9A2754, 0xFFFFFD7FFFDF26C4, 0x4)
libc.so.1              -> memcpy(0xE9A2758, 0xFFFFFD7FFFDF26C8, 0x4)
libc.so.1                -> memcpy(0xE9A275C, 0xFFFFFD7FFFDF26CC, 0x2)
libc.so.1                  -> memcpy(0xE9A275E, 0xFFFFFD7FFFDF21B4, 0x4)
libc.so.1                    -> memcpy(0xE9A2762, 0xFFFFFD7FFFDF21B4, 0x4)
   oracle                    <- ttcrbur = 0x0
   oracle                  <- kpofdr = 0x0
   oracle                  -> ...

In the Digger return point always has the same indent as entry point.
It does not completely prevent of the moving on right side, but makes an output more convinient for reading.

   oracle    -> kpofdr(0xFFFFFD7FFFDF2780, 0x64, 0x0)		
   oracle      -> ttcrbur(0xE77ADF0, 0xE768C18, 0xFFFFFD7FFFDF26C0)
libc.so.1        -> memcpy(0xE9A2751, 0xFFFFFD7FFFDF26C0, 0x1)
libc.so.1          -> memcpy(0xE9A2752, 0xFFFFFD7FFFDF26C2, 0x2)
libc.so.1            -> memcpy(0xE9A2754, 0xFFFFFD7FFFDF26C4, 0x4)
libc.so.1              -> memcpy(0xE9A2758, 0xFFFFFD7FFFDF26C8, 0x4)
libc.so.1                -> memcpy(0xE9A275C, 0xFFFFFD7FFFDF26CC, 0x2)
libc.so.1                  -> memcpy(0xE9A275E, 0xFFFFFD7FFFDF21B4, 0x4)
libc.so.1                    -> memcpy(0xE9A2762, 0xFFFFFD7FFFDF21B4, 0x4)
   oracle      <- ttcrbur = 0x0
   oracle    <- kpofdr = 0x0
   oracle    -> ...


How to use

USAGE: digger [-acdeFHIgKlmnoSUz] [-u lib] [-t functions] 
                 [-f functions] [-s functions] 
                 [-k functions] { -p PID | command }

-p PID          # examine this PID
-a              # print all details
-c              # print cpu ID
-d              # print relative timestamps (us)
-e              # print elapsed times (us)
-F              # print flow indentation
-l              # print pid/lwpid per line
-m              # print module name
-o              # print on cpu times (us)
-g              # print timestamps (us)
-u lib          # trace this library instead
-U              # trace all libraries + user funcs
-b bufsize      # dynamic variable buf size
-S              # trace all syscalls
-s function(s)  # trace syscalls with filter
-f function(s)  # to trace calls inside some function (patterns * and ? are allowed)
-K              # to trace kernel calls (fbt provider)
-k function(s)  # to trace kernel calls (fbt provider) using filter
-z              # print call stack
-I              # print fired probes of io provider
-H              # print fired probes of sched provider

If you use wide filter (for example if you do not set filter at all, option -t) too much time can be required to set all probes.
It is necessary to wait for the prompt “The Digger is ready…”. Time of waiting can be from a few seconds to several minutes.


Oracle kernel layers and functions
In order to use the tool more efficiently it is helpful to know more about Oracle kernel layers and function names

Useful info:


Important notes
1) Obviously some important actions happen inside functions without dedicated function calls. Thus these actions are invisible.
As an example, which I would like to emphasize is the functions related with latches.
There are dedicated functions related with latches in KSL (Kernel Service layer Latching & Wait) layer as kslgetl (KSL Get Exclusive Latch), kslgetsl (KSL Get Shared Latch), etc.
But also latches can be captured inside functions like kcbgtcr (consistent get) or kcbgcur (current get) without dedicated calls of ksl* functions.
Looks like duplicated code. In fact historically there are macroses KSLBEGIN / KSLEND in many functions, so code is not duplicated but as a result we do not see actions with latches as dedicated calls.

Following query can be helpful

SQL> SELECT t1.ksllasnam "parent_name",
  2         t2.ksllwnam  "location"
  3    FROM x$ksllw t2, x$kslwsc t1
  4   WHERE t2.indx = t1.indx
  5     and ksllwnam like 'kcbrls%';
 
parent_name            location
---------------------- -----------------
cache buffers chains   kcbrls: kslbegin

2) not enough memory for probes
If you will try to trace oracle processes without calls filter (option -f) or with wide filter, for example k* (to trace all kernel functions), you very likely will be faced with an error “failed to create probe … Not enough space”, solution is here

3) Use the tool only in testing environments. Some options (for example option -K – to trace kernel calls) can reduce response time of a system.

4) DTrace has lazy load
Some probes from shared libraries are not fired if patterns are used.
If you will do something like (pay attention the first and the third command are the same)

dtrace -n "pid<PID>:lib*:gethrtime:entry"
dtrace -n "pid<PID>:libc.so.1:gethrtime:entry"
dtrace -n "pid<PID>:lib*:gethrtime:entry"

then the first run you will get an error

dtrace: invalid probe specifier pid7144:lib*:gethrtime:entry: probe description pid7144:lib*:gethrtime:entry does not match any probes

but the second and the third run will be succesfully.
The reason is in lazy load.

See also: http://www.mail-archive.com/dtrace-discuss@opensolaris.org/msg03853.html


Examples
Buffer is pinned count
Timing: rowsource statistics. Part 1: How it works
Timing: rowsource statistics. Part 2: Overhead and inconsistent time


Post Scriptum
Any ideas how to improve the tool or bugs are welcome.

  1. shany
    May 23, 2013 at 6:24 am

    Could you please provide source code of dagger tool. The link which you have provided in the post is not taking me to the site.

  1. July 26, 2012 at 2:40 pm
  2. December 24, 2012 at 8:56 pm
  3. December 24, 2012 at 9:16 pm

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 82 other followers

%d bloggers like this: