
filter IS NOT NULL

November 16, 2013

Introduction
Although Oracle is smart enough, it is always helpful to double-check. Even in cases where you expect Oracle to be aware of NULLs and to apply the necessary filters itself, the result can be unexpected.
Below are several cases where you would benefit from a manually added filter that might look unnecessary and redundant.

1. Join

Suppose we have the following query

select --+ leading(e) use_nl(d)
       *
  from emp e, dept d
 where e.deptno = d.deptno;

executed with a nested loops join and E as the leading table, where the column e.deptno is nullable and contains NULLs.

So the question is: will Oracle perform a join iteration when it gets a NULL in the join column (e.deptno), or will it skip that iteration?
Let's check.

SQL> create table t1 as select rownum id, '*' pad from dual connect by level <= 1000;

Table created

SQL> create table t2 as select rownum id, '*' pad from dual connect by level <= 1000;

Table created

SQL> update t1 set id = null where id > 10;

990 rows updated

SQL> commit;

Commit complete.

SQL> alter session set statistics_level=all;

Session altered.

SQL> select --+ leading(t1) use_nl(t2)
  2         count(*)
  3    from t1, t2
  4   where t1.id = t2.id;

  COUNT(*)
----------
        10

SQL> select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));

--------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      1 |00:00:00.04 |    4004 |
|   1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |00:00:00.04 |    4004 |
|   2 |   NESTED LOOPS      |      |      1 |     10 |     10 |00:00:00.04 |    4004 |
|   3 |    TABLE ACCESS FULL| T1   |      1 |   1000 |   1000 |00:00:00.01 |       4 |
|*  4 |    TABLE ACCESS FULL| T2   |   1000 |      1 |     10 |00:00:00.04 |    4000 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("T1"."ID"="T2"."ID")

Note
-----
   - dynamic sampling used for this statement (level=2)

T1.ID contains only 10 non-NULL values. However, as you can see, the full table scan of T1 returned 1000 rows (column A-Rows) and table T2 was scanned 1000 times (column Starts). The join condition is checked on the inner row source.
Note: the inner row source can be any row source; I am keeping the full table scan here just for simplicity (see the sketch below).
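
To convince yourself that the behaviour does not depend on the inner full table scan, the same test can be repeated with an index on T2 (a sketch, not executed here; the index name and the extra hint are illustrative). Based on the behaviour above, the index scan would be expected to show Starts = 1000 as well, i.e. one probe per outer row, NULL keys included, unless the IS NOT NULL filter appears on T1.

create index t2_idx on t2(id);

select --+ leading(t1) use_nl(t2) index(t2 t2_idx)
       count(*)
  from t1, t2
 where t1.id = t2.id;

select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));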

Let’s try to give Oracle more information about NULLs.

SQL> exec dbms_stats.gather_table_stats(user, 'T1', estimate_percent => 100, no_invalidate => false);

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats(user, 'T2', estimate_percent => 100, no_invalidate => false);

PL/SQL procedure successfully completed.

SQL> select --+ leading(t1) use_nl(t2)
  2         count(*)
  3    from t1, t2
  4   where t1.id = t2.id;

  COUNT(*)
----------
        10

SQL> select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));

--------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      1 |00:00:00.01 |      44 |
|   1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |00:00:00.01 |      44 |
|   2 |   NESTED LOOPS      |      |      1 |     10 |     10 |00:00:00.01 |      44 |
|*  3 |    TABLE ACCESS FULL| T1   |      1 |     10 |     10 |00:00:00.01 |       4 |
|*  4 |    TABLE ACCESS FULL| T2   |     10 |      1 |     10 |00:00:00.01 |      40 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("T1"."ID" IS NOT NULL)
   4 - filter("T1"."ID"="T2"."ID")

Statistics are gathered and now we have the filter; as a result, table T1 returned only 10 rows and table T2 was scanned only 10 times.

There is a threshold; it looks like it is nullcnt/numrows > 5%, which means the filter is added for the table when more than 5% of its rows have a NULL in the join column.
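
A quick way to check which side of that boundary a column sits on is to compare NUM_NULLS and NUM_ROWS in the data dictionary (a sketch using the standard views; adjust the table and column names to your case):

select c.num_nulls,
       t.num_rows,
       round(100 * c.num_nulls / t.num_rows, 2) as null_pct  -- the filter is expected above ~5%
  from user_tab_col_statistics c
  join user_tables t
    on t.table_name = c.table_name
 where c.table_name  = 'T1'
   and c.column_name = 'ID';

The demonstration below fakes the statistics with dbms_stats.set_table_stats and dbms_stats.set_column_stats to sit just below and just above that boundary.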

SQL> exec dbms_stats.set_table_stats(user, 'T1', numrows => 1000000);

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.set_column_stats(user, 'T1', 'ID', nullcnt => 50000, no_invalidate => false);

PL/SQL procedure successfully completed.

SQL> select --+ leading(t1) use_nl(t2)
  2         count(*)
  3    from t1, t2
  4   where t1.id = t2.id;

  COUNT(*)
----------
        10

SQL> select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));

--------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      1 |00:00:00.03 |    3003 |
|   1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |00:00:00.03 |    3003 |
|   2 |   NESTED LOOPS      |      |      1 |    950K|     10 |00:00:00.03 |    3003 |
|   3 |    TABLE ACCESS FULL| T1   |      1 |   1000K|   1000 |00:00:00.01 |       3 |
|*  4 |    TABLE ACCESS FULL| T2   |   1000 |      1 |     10 |00:00:00.03 |    3000 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("T1"."ID"="T2"."ID")

Table T2 was scanned 1000 times again. Let's increase the NULL count.

SQL> exec dbms_stats.set_column_stats(user, 'T1', 'ID', nullcnt => 50001, no_invalidate => false);

PL/SQL procedure successfully completed.

SQL> select --+ leading(t1) use_nl(t2)
  2         count(*)
  3    from t1, t2
  4   where t1.id = t2.id;

  COUNT(*)
----------
        10

SQL> select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));

--------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      1 |00:00:00.01 |      33 |
|   1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |00:00:00.01 |      33 |
|   2 |   NESTED LOOPS      |      |      1 |    949K|     10 |00:00:00.01 |      33 |
|*  3 |    TABLE ACCESS FULL| T1   |      1 |    949K|     10 |00:00:00.01 |       3 |
|*  4 |    TABLE ACCESS FULL| T2   |     10 |      1 |     10 |00:00:00.01 |      30 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - filter("T1"."ID" IS NOT NULL)
   4 - filter("T1"."ID"="T2"."ID")

Now table T2 has been scanned 10 times.

Fortunately, it is always possible to use a workaround :-) and add the condition “… is not null” manually.

SQL> exec dbms_stats.set_column_stats(user, 'T1', 'ID', nullcnt => 50000, no_invalidate => false);

PL/SQL procedure successfully completed.

SQL> select --+ leading(t1) use_nl(t2)
  2          count(*)
  3       from t1, t2
  4       where t1.id = t2.id
  5         and t1.id is not null;

  COUNT(*)
----------
        10

SQL> select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));

--------------------------------------------------------------------------------------
| Id  | Operation           | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |      1 |        |      1 |00:00:00.01 |      33 |
|   1 |  SORT AGGREGATE     |      |      1 |      1 |      1 |00:00:00.01 |      33 |
|   2 |   NESTED LOOPS      |      |      1 |    902K|     10 |00:00:00.01 |      33 |
|*  3 |    TABLE ACCESS FULL| T1   |      1 |    950K|     10 |00:00:00.01 |       3 |
|*  4 |    TABLE ACCESS FULL| T2   |     10 |      1 |     10 |00:00:00.01 |      30 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("T1"."ID" IS NOT NULL)
   4 - filter("T1"."ID"="T2"."ID")

Table T2 has been scanned 10 times.

2. Exadata offloading
The filter is especially important on Exadata, where this threshold determines whether all rows of the table are returned from the storage cells or the filter is offloaded to them.
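
One way to see how much of a statement's I/O was eligible for offloading, and how much data actually came back over the interconnect, is to look at the cell I/O columns of V$SQL (a sketch; these columns exist in 11.2+ and are populated only on Exadata):

select sql_id,
       io_cell_offload_eligible_bytes,  -- bytes eligible for smart scan offload
       io_interconnect_bytes            -- bytes actually shipped to the database servers
  from v$sql
 where sql_id = '&sql_id';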

Let's consider a simpler example with a single aggregate function.

Documentation tells us:
All aggregate functions except COUNT(*), GROUPING, and GROUPING_ID ignore nulls.
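
A minimal illustration of that statement, on three rows one of which is NULL:

select count(*) as cnt_all,       -- counts every row, including the NULL one: 3
       count(n) as cnt_not_null,  -- the NULL is ignored: 2
       avg(n)   as avg_not_null   -- (10 + 20) / 2 = 15; the NULL is not treated as 0
  from (select 10 n from dual union all
        select 20   from dual union all
        select null from dual);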

Now let's check the following example on a big table.

SQL> show parameter statistics_level

NAME                                 TYPE                             VALUE
------------------------------------ -------------------------------- ------------------------------
statistics_level                     string                           TYPICAL

SQL> set timing on

SQL> select --+ gather_plan_statistics
  2         max(subobject_name)
  3    from table_big01;

MAX(SUBOBJECT_NAME)
------------------------------
WRH$_WAITST_1321133030_1755

Elapsed: 00:00:46.46

SQL> select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));

-------------------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name        | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |             |      1 |        |      1 |00:00:45.98 |    7117K|   7117K|
|   1 |  SORT AGGREGATE            |             |      1 |      1 |      1 |00:00:45.98 |    7117K|   7117K|
|   2 |   TABLE ACCESS STORAGE FULL| TABLE_BIG01 |      1 |    534M|    534M|00:01:02.99 |    7117K|   7117K|
-------------------------------------------------------------------------------------------------------------

So, the query took 46 seconds and 534M rows were returned from the storage cells.
Let's add the filter manually.

SQL> select --+ gather_plan_statistics
  2         max(subobject_name)
  3    from table_big01
  4   where subobject_name is not null;

MAX(SUBOBJECT_NAME)
------------------------------
WRH$_WAITST_1321133030_1755

Elapsed: 00:00:10.27

SQL> select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));

-------------------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name        | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |             |      1 |        |      1 |00:00:09.79 |    7117K|   7117K|
|   1 |  SORT AGGREGATE            |             |      1 |      1 |      1 |00:00:09.79 |    7117K|   7117K|
|*  2 |   TABLE ACCESS STORAGE FULL| TABLE_BIG01 |      1 |     26M|     26M|00:00:11.24 |    7117K|   7117K|
-------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - storage("SUBOBJECT_NAME" IS NOT NULL)
       filter("SUBOBJECT_NAME" IS NOT NULL)

26M rows were returned and only 10 seconds were consumed. Quite a difference in times. Not bad.
Most of the rows were filtered on the storage cells and only the rows with non-NULL values were returned.

Off topic: the A-Time column in the previous execution plan shows an example of row source statistics time inconsistency, because I used the gather_plan_statistics hint. However, I preferred the hint and its time inconsistency, because with the parameter statistics_level=all there is a chance of significantly increasing the execution time. Let's check.

SQL> alter session set statistics_level=all;

Session altered.

SQL> select max(subobject_name)
  2    from table_big01;

MAX(SUBOBJECT_NAME)
------------------------------
WRH$_WAITST_1321133030_1755

Elapsed: 00:01:33.96

SQL> select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));

-------------------------------------------------------------------------------------------------------------
| Id  | Operation                  | Name        | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
-------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT           |             |      1 |        |      1 |00:01:33.47 |    7117K|   7117K|
|   1 |  SORT AGGREGATE            |             |      1 |      1 |      1 |00:01:33.47 |    7117K|   7117K|
|   2 |   TABLE ACCESS STORAGE FULL| TABLE_BIG01 |      1 |    534M|    534M|00:00:58.90 |    7117K|   7117K|
-------------------------------------------------------------------------------------------------------------

Now the row source statistics timing is consistent, but the execution is roughly twice as long: about 94s vs 46s.

3. Aggregate functions and lost Index Fast Full Scan

Another case where you would need to add the filter manually is operations on aggregate functions over a nullable column. For instance, apply the simplest arithmetic operation to an aggregate function and Oracle no longer recognizes that it can use an index fast full scan.

SQL> create table tbl as select * from all_objects;

Table created

SQL> create index idx on tbl (data_object_id);

Index created

SQL> explain plan for
  2  select --+ index_ffs(t)
  3         avg(data_object_id)
  4    from tbl t;

Explained.

SQL> select * from table(dbms_xplan.display(null,null,'basic cost'));

---------------------------------------------------
| Id  | Operation             | Name | Cost (%CPU)|
---------------------------------------------------
|   0 | SELECT STATEMENT      |      |     7   (0)|
|   1 |  SORT AGGREGATE       |      |            |
|   2 |   INDEX FAST FULL SCAN| IDX  |     7   (0)|
---------------------------------------------------

SQL> explain plan for
  2  select --+ index_ffs(t)
  3         -avg(data_object_id)
  4    from tbl t;

Explained.

SQL> select * from table(dbms_xplan.display(null,null,'basic cost'));

------------------------------------------------
| Id  | Operation          | Name | Cost (%CPU)|
------------------------------------------------
|   0 | SELECT STATEMENT   |      |   278   (1)|
|   1 |  SORT AGGREGATE    |      |            |
|   2 |   TABLE ACCESS FULL| TBL  |   278   (1)|
------------------------------------------------

Notice that the only difference between the queries is the unary minus, avg() vs -avg(), and the result is a full table scan. A likely explanation: a single-column B-tree index contains no entries for rows where the key is NULL, so it can be used only when the optimizer can prove that rows with a NULL data_object_id are not needed; it manages that for the bare avg(), but apparently not once the aggregate is wrapped in an expression.

And again, a manually added filter helps:

SQL> explain plan for
  2  select --+ index_ffs(t)
  3         -avg(data_object_id)
  4    from tbl t
  5   where data_object_id is not null;

Explained.

SQL> select * from table(dbms_xplan.display(null,null,'basic cost'));

---------------------------------------------------
| Id  | Operation             | Name | Cost (%CPU)|
---------------------------------------------------
|   0 | SELECT STATEMENT      |      |     7   (0)|
|   1 |  SORT AGGREGATE       |      |            |
|   2 |   INDEX FAST FULL SCAN| IDX  |     7   (0)|
---------------------------------------------------

Index Fast Full Scan instead of Full Table Scan now.
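
A quick way to verify that the index really contains only the non-NULL values (a sketch; since 10g index statistics are gathered automatically at creation, so USER_INDEXES.NUM_ROWS should be populated):

select num_rows
  from user_indexes
 where index_name = 'IDX';            -- number of entries in the index

select count(data_object_id)
  from tbl;                           -- number of non-NULL values in the column

-- The two numbers are expected to match: rows with a NULL data_object_id are simply
-- not present in the index, which is why the IS NOT NULL predicate (or the optimizer's
-- own proof that NULLs are not needed) is required before the index can be used.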

Conclusion: even if a query looks very simple and there seems to be nothing to improve, seemingly unnecessary and redundant predicates such as IS NOT NULL can be very helpful.

Update:
Here is a more realistic example for point (1) above about NULL filtering in joins.
It is a simple case demonstrating that even when it is obvious that the condition t1.m = t2.m will never be satisfied for NULLs,
Oracle does not check the condition at the Nested Loops step itself; it checks conditions on the inner row source, and join iterations are not skipped.

SQL> create table t1 as
  2    select rownum as n,
  3           rownum as m,
  4           '*' as pad
  5      from dual
  6   connect by level <= 10000;

Table created

SQL> create table t2 as
  2    select * from t1;

Table created

SQL> create index idx2 on t2(n);

Index created

SQL> update t1 set m = null where rownum <= 5000;

5000 rows updated

SQL> commit;

Commit complete

So, now 50% of t1.m are NULLs.

SQL> select * from t1 sample(0.1);

         N          M PAD
---------- ---------- ---
       130            *
       619            *
      2359            *
      3334            *
      4212            *
      4752            *
      6040       6040 *
      6277       6277 *
      6393       6393 *
      6704       6704 *
      7320       7320 *
      9169       9169 *

10 rows selected

Let’s check the following query

SQL> select --+ leading(t1) use_nl(t2)
  2         sum(t2.m)
  3    from t1, t2
  4   where t1.n = t2.n
  5     and t1.m = t2.m;

 SUM(T2.M)
----------
  37502500

------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |      |      1 |        |      1 |00:00:00.15 |     193 |
|   1 |  SORT AGGREGATE               |      |      1 |      1 |      1 |00:00:00.15 |     193 |
|   2 |   NESTED LOOPS                |      |      1 |        |   5000 |00:00:00.15 |     193 |
|   3 |    NESTED LOOPS               |      |      1 |  10000 |  10000 |00:00:00.09 |     172 |
|   4 |     TABLE ACCESS FULL         | T1   |      1 |  10000 |  10000 |00:00:00.01 |      24 |
|*  5 |     INDEX RANGE SCAN          | IDX2 |  10000 |      1 |  10000 |00:00:00.04 |     148 |
|*  6 |    TABLE ACCESS BY INDEX ROWID| T2   |  10000 |      1 |   5000 |00:00:00.03 |      21 |
------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   5 - access("T1"."N"="T2"."N")
   6 - filter("T1"."M"="T2"."M")
Note
-----
   - dynamic sampling used for this statement (level=2)

Here 10K rows were fetched from T1, the index IDX2 was scanned 10K times, 10K rows were returned from the index scan, and all of them were then looked up in table T2; step 6 was also called 10K times. All NULLs were filtered only at step 6.

Let's add the filter manually.

SQL> select --+ leading(t1) use_nl(t2) 
  2         sum(t2.m)
  3    from t1, t2
  4   where t1.n = t2.n
  5     and t1.m = t2.m
  6     and t1.m is not null;

 SUM(T2.M)
----------
  37502500

------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |      |      1 |        |      1 |00:00:00.08 |     113 |
|   1 |  SORT AGGREGATE               |      |      1 |      1 |      1 |00:00:00.08 |     113 |
|   2 |   NESTED LOOPS                |      |      1 |        |   5000 |00:00:00.07 |     113 |
|   3 |    NESTED LOOPS               |      |      1 |   5000 |   5000 |00:00:00.04 |     102 |
|*  4 |     TABLE ACCESS FULL         | T1   |      1 |   5000 |   5000 |00:00:00.01 |      24 |
|*  5 |     INDEX RANGE SCAN          | IDX2 |   5000 |      1 |   5000 |00:00:00.02 |      78 |
|*  6 |    TABLE ACCESS BY INDEX ROWID| T2   |   5000 |      1 |   5000 |00:00:00.01 |      11 |
------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("T1"."M" IS NOT NULL)
   5 - access("T1"."N"="T2"."N")
   6 - filter("T1"."M"="T2"."M")
Note
-----
   - dynamic sampling used for this statement (level=2)

Now T1 returned 5000 non-NULL rows, the index was scanned 5000 times, and 5000 rows were then checked at step 6. As a result, step 6 was called only 5000 times.
