Here is an interesting limitation of Exadata Smart Scans: if more than 254 columns from a table (not HCC compressed, more on that in a moment) need to be projected, Smart Scans for that particular segment will be disabled and Exadata will fall back to conventional I/O. This means that the number of columns in the projection clause can make a significant difference to performance, since only Smart Scans allow you to take advantage of offloading, and in particular of avoiding I/O via Storage Indexes.
Now the expression "254 columns" might ring a bell, since it is the maximum number of columns that Oracle can store in a single row piece - tables consisting of more than 254 columns have to be stored in multiple row pieces.
However, what I'm talking about here is not related to such columns residing in different row pieces of a row - Smart Scans still happily work even if columns from different row pieces are projected (which was subject to several bugs in the past), although you might end up with additional "cell single block reads" in the case of truly chained rows, rather than just the additional logical I/O needed to pick up the different row pieces from the same block (sometimes called "intra-block" chaining).
No, the limitation simply seems to be that Smart Scans - broadly speaking and ignoring edge cases - can only transport a maximum of 254 columns from a single (non-HCC) segment. Requesting more columns simply disables Smart Scans for that segment.
Now you might say that offloading, and in particular offloaded column projection, isn't that relevant if you select that many columns from a table anyway, but the point is that you lose the ability to benefit from Storage Indexes and from transporting only the relevant rows back to the compute nodes. Both features can speed up the processing significantly, in particular if the number of rows selected is only a fraction of the total number of rows, and/or the cells can avoid a significant amount of I/O via Storage Indexes.
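Apart from the session statistics used in the test runs below, one way to check whether a particular statement was offloaded at all is to look at the offload-related columns in V$SQL - a minimal sketch, assuming an 11.2 Exadata environment; an IO_CELL_OFFLOAD_ELIGIBLE_BYTES value of 0 usually indicates that no Smart Scan took place:

-- Sketch: offload-related columns in V$SQL (11.2)
select sql_id
     , io_cell_offload_eligible_bytes
     , io_cell_offload_returned_bytes
     , io_interconnect_bytes
from   v$sql
where  sql_id = '&sql_id';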
To demonstrate the point I've put together a simple test case that generates a test table with more than 254 columns - the script below generates a table of approx. 40GB uncompressed size, so that a significant difference in performance can be measured.
set echo on timing on time on
-- MAX is 999, there is an ID column as first col
define num_cols = 300
define compression = nocompress
--define compression = "compress for query low"
drop table many_cols_rg;
purge table many_cols_rg;
declare
s_sql varchar2(32767);
begin
s_sql := q'!
create table many_cols_rg pctfree 0
&compression
parallel nologging
as
with generator1 as
(
select /*+ materialize cardinality(1000) */
rownum as id
-- this makes the rowsource wider otherwise PX BLOCK ITERATOR has problems properly spreading work among the slaves
, rpad('x', 4000) as padding
from
dual
connect by
level <= 1e3
),
generator2 as
(
select /*+ materialize cardinality(10000) */
rownum as id
-- this makes the rowsource wider otherwise PX BLOCK ITERATOR has problems properly spreading work among the slaves
, rpad('x', 4000) as padding
from
dual
connect by
level <= 1e4
)
select
num_id as id
!';
for i in 1..&num_cols loop
s_sql := s_sql || ', char_id as col' || to_char(i, 'FM000');
end loop;
s_sql := s_sql || q'!
from
(
select /*+ no_merge */
a.id + (b.id - 1) * 1e3 as num_id
, cast(to_char(a.id + (b.id - 1) * 1e3, 'FM0000000000') as varchar2(10)) as char_id
from
generator1 a
, generator2 b
)
!';
execute immediate s_sql;
end;
/
exec dbms_stats.gather_table_stats(null, 'many_cols_rg')
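Just as a sanity check, the resulting segment size can be verified against the dictionary - a trivial sketch:

-- Sketch: verify the size of the generated segment (approx. 40GB expected uncompressed)
select segment_name
     , round(bytes / power(1024, 3), 1) as size_gb
from   user_segments
where  segment_name = 'MANY_COLS_RG';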
Assuming a Storage Index was generated on the ID column, a query like the following (using Parallel Query at a DOP of 32 in the test runs here) can benefit from offloading, since in principle all I/O can be avoided via the Storage Index, and virtually no data needs to be transported from the cells to the compute nodes. Note that it projects exactly 254 columns from different row pieces.
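Since typing out a select list of 254 columns by hand is tedious, it can be generated from the dictionary - a simple sketch (using LISTAGG, available from 11.2 on) that just picks the first 254 COLnnn columns and then only needs minor manual adjustment for the variations used below:

-- Sketch: generate a comma-separated list of the first 254 COLnnn columns
select listagg(lower(column_name), ', ') within group (order by column_id) as select_list
from   (
         select column_name, column_id
         from   user_tab_columns
         where  table_name = 'MANY_COLS_RG'
         and    column_name like 'COL%'
         order  by column_id
       )
where  rownum <= 254;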
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
11:28:22 SQL>
11:28:22 SQL> select a.name, b.value from v$mystat b, v$statname a where a.name in ('cell physical IO bytes saved by storage index', 'cell physical IO interconnect bytes returned by smart scan') and a.statistic# = b.statistic#;
NAME VALUE
---------------------------------------------------------------- ----------
cell physical IO bytes saved by storage index 0
cell physical IO interconnect bytes returned by smart scan 0
Elapsed: 00:00:00.00
11:28:22 SQL>
11:28:22 SQL> select
11:28:22 2 col001,
11:28:22 3 col002,
.
.
.
11:28:23 254 col253,
11:28:23 255 col300/*,
11:28:23 256 col254*/
11:28:23 257 from many_cols_rg where id between -2 and -1;
no rows selected
Elapsed: 00:00:02.40
11:28:25 SQL>
11:28:25 SQL> select a.name, b.value from v$mystat b, v$statname a where a.name in ('cell physical IO bytes saved by storage index', 'cell physical IO interconnect bytes returned by smart scan') and a.statistic# = b.statistic#;
NAME VALUE
---------------------------------------------------------------- ----------
cell physical IO bytes saved by storage index 2,1195E+10
cell physical IO interconnect bytes returned by smart scan 3000784
Elapsed: 00:00:00.01
11:28:25 SQL>
As you can see from the snippet, it took less than 2.5 seconds to run the query on the 40GB segment, and the session statistics report 20GB avoided via the Storage Index (which seems to be an instrumentation bug, as it always reports only 50% of the total segment size as a maximum; this output was taken from 11.2.0.2 Exadata BP14). Furthermore, only a couple of MB were exchanged between the cells and the compute nodes.
The corresponding Real-Time SQL Monitoring report confirms the "Smart Scan":
Increasing the number of columns projected from the segment above 254 (and, as outlined above, it doesn't matter which row pieces these columns come from) disables the Smart Scan, and it takes more than 9 seconds to run essentially the same query, pumping all 40GB through the compute nodes to filter the rows there.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
11:29:14 SQL>
11:29:14 SQL> select a.name, b.value from v$mystat b, v$statname a where a.name in ('cell physical IO bytes saved by storage index', 'cell physical IO interconnect bytes returned by smart scan') and a.statistic# = b.statistic#;
NAME VALUE
---------------------------------------------------------------- ----------
cell physical IO bytes saved by storage index 0
cell physical IO interconnect bytes returned by smart scan 0
Elapsed: 00:00:00.00
11:29:14 SQL>
11:29:14 SQL> select
11:29:14 2 col001,
11:29:14 3 col002,
.
.
.
11:29:15 254 col253,
11:29:15 255 col300,
11:29:15 256 col254
11:29:15 257 from many_cols_rg where id between -2 and -1;
no rows selected
Elapsed: 00:00:09.22
11:29:24 SQL>
11:29:24 SQL> select a.name, b.value from v$mystat b, v$statname a where a.name in ('cell physical IO bytes saved by storage index', 'cell physical IO interconnect bytes returned by smart scan') and a.statistic# = b.statistic#;
NAME VALUE
---------------------------------------------------------------- ----------
cell physical IO bytes saved by storage index 0
cell physical IO interconnect bytes returned by smart scan 0
Elapsed: 00:00:00.01
11:29:24 SQL>
The corresponding Real-Time SQL Monitoring report confirms the fallback to "direct path reads":
Funnily enough, in this deliberately crafted, extreme case it is much faster to access the segment twice and get the remaining columns via a self-join in order to benefit from the offloading features - it only takes 4.7 seconds to run the self-join, and the session statistics confirm that both segment scans could leverage offloading and in particular avoid I/O via Storage Indexes:
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
11:29:37 SQL>
11:29:37 SQL> select a.name, b.value from v$mystat b, v$statname a where a.name in ('cell physical IO bytes saved by storage index', 'cell physical IO interconnect bytes returned by smart scan') and a.statistic# = b.statistic#;
NAME VALUE
---------------------------------------------------------------- ----------
cell physical IO bytes saved by storage index 0
cell physical IO interconnect bytes returned by smart scan 0
Elapsed: 00:00:00.00
11:29:37 SQL>
11:29:37 SQL> select
11:29:37 2 a.col001,
11:29:37 3 a.col002,
.
.
.
11:29:37 254 a.col253,
11:29:37 255 a.col300,
11:29:37 256 b.col254
11:29:37 257 from many_cols_rg a, many_cols_rg b
11:29:37 258 where a.id between -2 and -1 and b.id between -2 and -1
11:29:37 259 and a.id = b.id;
no rows selected
Elapsed: 00:00:04.77
11:29:42 SQL>
11:29:42 SQL> select a.name, b.value from v$mystat b, v$statname a where a.name in ('cell physical IO bytes saved by storage index', 'cell physical IO interconnect bytes returned by smart scan') and a.statistic# = b.statistic#;
NAME VALUE
---------------------------------------------------------------- ----------
cell physical IO bytes saved by storage index 4,2390E+10
cell physical IO interconnect bytes returned by smart scan 6001568
Elapsed: 00:00:00.01
11:29:42 SQL>
11:29:42 SQL>
Note: This is one of the cases where you don't want the optimizer to
eliminate a self-join via a query transformation based on corresponding constraints on the join columns :-)
HCC Compression
Now the funny thing is that if you repeat the table creation script but enable the HCC compression define instead, the Smart Scan happily works with up to 1,000 columns of such a compressed segment.
So obviously the general code implementation supports transporting rows with more than 254 columns from the cells to the compute nodes, but the question is why it only does so with HCC compressed segments. That is probably a question my client will raise with Oracle Support to find out the answer.
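For reference, the only change required in the generation script above for the HCC run is to swap the compression define, i.e.:

--define compression = nocompress
define compression = "compress for query low"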
Footnote
At least only the number of "raw" columns projected counts towards the limitation - any expressions based on columns don't count, so you can actually project more than 254 expressions from a segment and still benefit from Smart Scans, as long as the expressions refer to a maximum of 254 "base" columns.
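So a query along the following lines (a contrived sketch) should still be eligible for a Smart Scan, even though it projects far more than 254 items, because only a handful of base columns are referenced:

-- Sketch: many projected expressions, but only two base columns referenced
select col001
     , col002
     , col001 || col002          as expr001
     , upper(col001)             as expr002
     , substr(col002, 1, 5)      as expr003
     -- ... many more expressions based on col001 / col002 ...
     , col001 || '#' || col002   as expr300
from   many_cols_rg
where  id between -2 and -1;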
The same limitation could also be reproduced with (almost) the latest Exadata version available at the time of writing (11.2.0.3 BP12 - "almost" because BP14 has just come out, as far as I know).
Thanks Randolf, very interesting post .
GregG
Hi Randolf,
Just FYI, this is documented in the Exadata Storage Server Owners Guide - "A query that has more than 255 columns referenced and heap table is uncompressed, or Basic or OLTP compressed. However such queries on Exadata Hybrid Columnar Compression-compressed tables are offloaded".
It's on Page 162 of the version I have (August 2011)
Cheers,
Tim
Hi Tim,
cool, thanks for the pointer. So this is a documented limitation - I didn't know that.
Randolf
Very good post, thanks!
ReplyDelete> but the question is why does it only do so with HCC compressed segments?
=> I guess to force usage of HCC in order to speed-up finding and fixing of bugs.
cheers,
goran
Interesting post, Thanks!
Will smartscan work with other types of compression e.g. OLTP with > 254 column referenced
> Will smartscan work with other types of compression e.g. OLTP with > 254 column referenced
OLTP and basic compression only work with tables having less than 255 columns, so there won't be any actual compression, although Oracle doesn't complain - it just silently falls back to no compression.
Since the internal organisation of a block therefore doesn't change, I'm pretty sure the same restriction applies to such a scenario, too.
Randolf