The current issue of the DOAG Red Stack Magazin (free for members of the DOAG, SOUG and AOUG) features the first part of my two-part article series "Oracle Database Cloud Performance".
The series is based on the findings in this area already published here and develops them further.
The first part covers the various aspects of the maximum achievable performance (CPU, storage etc.); the second part, to appear in the next issue, focuses on performance consistency, i.e. how consistently the databases in the cloud behaved in terms of performance during the tests.
Wednesday, June 28, 2017
Wednesday, May 31, 2017
New workshop "Exadata For Developers"
Just a short note that I've developed a new two-day course that covers all relevant features a database application developer should know when dealing with the Oracle Exadata Database Machine platform.
It covers Smart Scans, the Exadata Flash Cache, Hybrid Columnar Compression and all surrounding features like Storage Indexes and (serial) direct path reads in detail. Of course it also includes features added in 12c, like Attribute Clustering and Zone Maps.
All features are presented with live demo scripts, and there will be enough time to discuss your specific questions and analyse existing applications if desired.
For more information and details, check the corresponding pages:
German: Exadata für Anwendungsentwickler
English: Exadata For Developers
Monday, February 6, 2017
Oracle Database Cloud (DBaaS) Performance - Part 4 - Network
In this last part of the series I'll have a brief look at the network performance measured in the Oracle DBaaS environment, in particular the network interface that is used as private interconnect in a RAC configuration. The network performance could also be relevant when evaluating how to transfer data to the cloud database.
I've used the freely available "iperf" tool to measure the network bandwidth and got the following results:
[root@test12102rac2 ~]# iperf3 -c 10.196.49.126
Connecting to host 10.196.49.126, port 5201
[ 4] local 10.196.49.130 port 41647 connected to 10.196.49.126 port 5201
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 651 MBytes 5.46 Gbits/sec 15 786 KBytes
[ 4] 1.00-2.00 sec 823 MBytes 6.90 Gbits/sec 11 1.07 MBytes
[ 4] 2.00-3.00 sec 789 MBytes 6.62 Gbits/sec 7 1014 KBytes
[ 4] 3.00-4.00 sec 700 MBytes 5.87 Gbits/sec 39 1.04 MBytes
[ 4] 4.00-5.00 sec 820 MBytes 6.88 Gbits/sec 21 909 KBytes
[ 4] 5.00-6.00 sec 818 MBytes 6.86 Gbits/sec 17 1.17 MBytes
[ 4] 6.00-7.00 sec 827 MBytes 6.94 Gbits/sec 21 1005 KBytes
[ 4] 7.00-8.00 sec 792 MBytes 6.64 Gbits/sec 8 961 KBytes
[ 4] 8.00-9.00 sec 767 MBytes 6.44 Gbits/sec 4 1.11 MBytes
[ 4] 9.00-10.00 sec 823 MBytes 6.91 Gbits/sec 6 1.12 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 7.63 GBytes 6.55 Gbits/sec 149 sender
[ 4] 0.00-10.00 sec 7.63 GBytes 6.55 Gbits/sec receiver
iperf Done.
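For anyone who wants to reproduce such a measurement, the setup is minimal - iperf3 just needs to run in server mode on one node and be pointed at from the other (the explicit duration flag below is an arbitrary choice on my part, 10 seconds happens to be the iperf3 default):

# on the node with IP 10.196.49.126 (server side):
iperf3 -s

# on the other node (client side), 10 second TCP test:
iperf3 -c 10.196.49.126 -t 10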
So the network bandwidth seems to be somewhere between 6 and 7 Gbit/sec, which is not too bad.
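As a quick sanity check of the units: 6.55 Gbit/sec divided by 8 bits per byte is roughly 820 MByte/sec, which matches the per-interval transfer figures of around 800 MBytes per second shown above.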
For completeness, the UDP results look like the following:
[root@test12102rac2 ~]# iperf3 -c 10.196.49.126 -u -b 10000M
Connecting to host 10.196.49.126, port 5201
[ 4] local 10.196.49.130 port 55482 connected to 10.196.49.126 port 5201
[ ID] Interval Transfer Bandwidth Total Datagrams
[ 4] 0.00-1.00 sec 494 MBytes 4.14 Gbits/sec 63199
[ 4] 1.00-2.00 sec 500 MBytes 4.20 Gbits/sec 64057
[ 4] 2.00-3.00 sec 462 MBytes 3.87 Gbits/sec 59102
[ 4] 3.00-4.00 sec 496 MBytes 4.16 Gbits/sec 63491
[ 4] 4.00-5.00 sec 482 MBytes 4.05 Gbits/sec 61760
[ 4] 5.00-6.00 sec 425 MBytes 3.57 Gbits/sec 54411
[ 4] 6.00-7.00 sec 489 MBytes 4.10 Gbits/sec 62574
[ 4] 7.00-8.00 sec 411 MBytes 3.45 Gbits/sec 52599
[ 4] 8.00-9.00 sec 442 MBytes 3.71 Gbits/sec 56541
[ 4] 9.00-10.00 sec 481 MBytes 4.04 Gbits/sec 61614
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Jitter Lost/Total Datagrams
[ 4] 0.00-10.00 sec 4.57 GBytes 3.93 Gbits/sec 0.028 ms 23434/599340 (3.9%)
[ 4] Sent 599340 datagrams
iperf Done.
Finally, "ping" results look like the following:
9665 packets transmitted, 9665 received, 0% packet loss, time 9665700ms
rtt min/avg/max/mdev = 0.135/0.308/199.685/3.322 ms
So an average latency of 0.3 ms also doesn't look too bad.
[Update 6.2.2017]: Thanks to Frits Hoogland who pointed out the very high "max" value for the ping. Although I didn't spot the pattern that he saw in a different network test setup ("cross cloud platform"), which was an initial slowness, it's still worth pointing out the high "max" value of almost 200 ms for a ping, and the "mdev" value of 3.322 ms also seems to suggest that there were some significant variations in ping times that are potentially hidden behind the averages provided. I'll repeat the ping test and see if I can reproduce these outliers and, if so, find out more details.
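A minimal sketch of how such outliers could be hunted down - on Linux the "-D" flag of "ping" prefixes every reply with a Unix timestamp, so high round trip times can be correlated with the time they occurred (the 1 ms threshold is an arbitrary choice, not part of the original test):

ping -D -i 1 10.196.49.126 | awk -F'time=' 'NF > 1 && $2 + 0 > 1'
# prints only replies with a round trip time above 1 ms,
# each prefixed with a [unix timestamp]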
Labels:
DBaaS,
Oracle Cloud,
Performance
Monday, January 30, 2017
Oracle Database Cloud (DBaaS) Performance - Part 3 - Storage - 12.2 Update
Recently, out of curiosity, I repeated the I/O related tests on a 12.2.0.1 instance and was surprised by the fact that I consistently got significantly better results than on 11.2.0.4 and 12.1.0.2.
Now you're probably aware that version 12.2 is so far "cloud only", so I can't tell whether version 12.2 generically provides that increased performance, or whether Oracle has optimized the underlying stack so that previous versions in general could also benefit from better performance if they ran on the same platform. Repeated tests with versions 11.2.0.4 and 12.1.0.2 confirmed the performance figures reported in the previous installment of this series, so as of the time of writing it is only version 12.2 that provides the improved I/O performance.
Note that as of the time of writing only a single instance configuration was supported with version 12.2, so I wasn't able to run the tests in RAC configuration.
Here are the 12.2 I/O related test results:
IOPS
Again running the test on a 4 OCPU single instance configuration (resulting in 8 CPUs / 8 cores as outlined previously) with eight sessions:
So that is more than 65,000 IOPS on average at 8 KB block size, significantly more than the corresponding (and already impressive) 40,000 IOPS seen in the previous versions, and even at 16 KB block size still more than 50,000 IOPS - that is more than 800 MB / sec in single block requests!
I/O Throughput
Repeating the same Parallel Execution based test that first creates a very large table (8 KB block size, "direct path write") and then re-reads it using Parallel Query ("direct path read") I got the following results on the 4 OCPU single instance configuration:
Again the results are significantly improved over previous versions. The read performance improved from 640 MB / sec to almost 940 MB / sec. More importantly, however, the write performance improved from 120 MB / sec to 200 MB / sec, a level that should allow even more write intensive workloads to perform well.
I/O Latency
The same test as previously was run, switching from asynchronous I/O ("db file parallel read") to synchronous I/O ("db file sequential read") allowing measurement of single block reads, running with 8 threads on the 4 OCPU / 8 CPU / cores configuration at 8 KB block size.
Again a significant improvement across all figures: 0.375 ms average wait time vs. 0.45 ms previously, almost 19,000 IOPS vs. 16,380 IOPS, and more than 90% of the waits within 512 microseconds vs. 75% before.
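For anyone who wants to look at such latency distributions themselves, here is a minimal sketch - it assumes the V$EVENT_HISTOGRAM_MICRO view available from 12.1.0.2 on, which exposes the microsecond buckets for a given wait event:

select
        wait_time_micro
      , wait_count
      , wait_time_format
from
        v$event_histogram_micro
where
        event = 'db file sequential read'
order by
        wait_time_micro;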
Write Performance
With the significantly improved write throughput figures the expectation was that the random read + write test would perform much better than before, and it did:
In previous versions this test waited almost 50% on "free buffer waits" (due to the minimized buffer cache), so clearly the DBWR couldn't keep up with writing dirty blocks to disk. The picture has changed here significantly, with "free buffer waits" going down to just 4.5% of the overall database time, and performing almost 5,000 write requests per second (and almost 20,000 IOPS in total).
Note that version 12.2 obviously introduces an optimization that doesn't treat repeated modifications of the same kind to the same block (in this case updating a column of a row to the same value as before) as a "db block change". Therefore I had to adjust the test to update the column value to a different value each time - running the original test case showed a totally different profile, due to the minimized "db block changes".
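A minimal sketch of how this changed behaviour can be verified - run the update workload in one session and sample the "db block changes" statistic of that session from a second one (the SID below is a placeholder), then compare the deltas for the "same value" and "different value" variants of the update:

select
        sn.name
      , st.value
from
        v$sesstat st
      , v$statname sn
where
        st.statistic# = sn.statistic#
and     st.sid = <sid_of_test_session>
and     sn.name = 'db block changes';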
Summary
The 12.2 version in the Oracle Cloud shows a significantly improved I/O performance profile compared to previous versions. So far it is unclear whether that is a generic improvement of the new release or whether the underlying stack used by the virtual machines has been optimized.
The improved write performance in particular is an important step forward.
Labels:
DBaaS,
Oracle Cloud,
Performance
Monday, January 23, 2017
Oracle Database Cloud (DBaaS) Performance - Part 2 - Storage
In this second part of the series I'll focus on the I/O related performance figures encountered when the corresponding tests were performed on the platform.
IOPS
When running with a minimum sized buffer cache and direct and asynchronous I/O enabled, the following average read-only IOPS figures were measured over a period of several days (this is the test described in part three of the "performance consistency" series).
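As a side note, here is a minimal sketch of the kind of instance configuration meant by "minimum sized buffer cache, direct and asynchronous I/O enabled" - the values are illustrative and assume manual SGA management, not the exact settings used for the tests:

alter system set filesystemio_options = setall scope = spfile;
-- "setall" enables both direct and asynchronous I/O on file systems
alter system set db_cache_size = 64m scope = spfile;
-- restart the instance afterwards for filesystemio_options to take effect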
First, running on a 4 OCPU single instance configuration (8 CPUs / 8 cores as outlined in the previous part) with either four or eight sessions:
Second, running on a 2+2 OCPU two instance RAC configuration (4+4 CPUs / 4+4 cores as outlined in the previous part) with either four or eight sessions:
So for the 8 KB block size the single instance test shows an average of almost 40,000 IOPS for read-only tests, and the two instance RAC configuration even scales to almost 60,000 IOPS on average. These are pretty impressive IOPS figures for a general purpose shared / virtualized environment, and - at least for the read part - are way above what other DBaaS cloud providers offer out of the box.
It's also worth mentioning that I got the same IOPS results independent from the amount of storage allocated - in contrast to Amazon RDS for example, where even the "Provisioned IOPS" storage class requires you to allocate at least 3 TB of storage in order to get the maximum of 30,000 IOPS. I've repeated the same test setup with the table size inflated to the maximum possible within my test account limitations (so total size of storage allocated close to the 1 TB storage quota) and still got comparable IOPS results to this test that only allocates approx. 1 GB for the 8 KB block size test (total storage allocated approx. 100 GB).
I/O Throughput
In order to measure the maximum I/O throughput corresponding Parallel Execution statements were used to create tables of sufficient size and read them via Parallel Query.
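A minimal sketch of that kind of test - table name and generator sizes are placeholders, not the actual test setup, and the generator would need to be sized so that the table clearly exceeds the buffer cache:

create table t_px parallel 16 nologging
as
with generator as
(
  select /*+ materialize */ rownum as id
  from dual
  connect by level <= 1000
)
select
        rpad('x', 100) as filler
from
        generator a
      , generator b
; -- populated via "direct path write" at DOP 16

select /*+ parallel(t_px 16) */ count(filler) from t_px;
-- re-read via "direct path read" at DOP 16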
First, running on a 4 OCPU single instance configuration (8 CPUs / 8 cores as outlined in the previous part) at a degree of 16:
Second, running on a 2+2 OCPU two instance RAC configuration (4+4 CPUs / 4+4 cores as outlined in the previous part) at a degree of 16:
So the single instance configuration writes at 120 MB per second and reads at 640 MB per second, whereas the two instance RAC configuration writes at 80 MB per second and reads at 1,100 MB per second. Clearly the storage layer is read optimized and writes are much slower than reads. The read performance is very good however, and again above what other cloud providers deliver (for example Amazon RDS "Provisioned IOPS" offers a maximum throughput of 320 MB per second, but again only if a sufficient amount of storage is allocated).
I/O Latency
In order to measure the I/O latency the test used above to determine the IOPS rate was modified to use random single block reads ("db file sequential read") instead of asynchronous reads ("db file parallel read"). The reason for this is that from a database perspective the latency for reads performed asynchronously cannot be measured properly, due to the way the database handles the I/O.
The results were the following, this time running on a 2 OCPU (4 CPUs / 4 cores) single instance configuration with 8 KB block size and 8 sessions:
So again a rather impressive 0.45 ms average wait time for a random single block read, and the wait event histogram also shows that the performance was very consistent, with almost 75 percent of the waits taking less than 512 microseconds at a rate of more than 16,000 IOPS.
Write Performance
The read-only test above was modified to be 100 percent updates (for more details see here). Since the instance was configured to operate with a minimum sized buffer cache this puts maximum pressure on the database writer to write dirty blocks as fast as possible in order to allow new blocks to be read into the buffer cache.
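A minimal sketch of what each of the concurrent sessions could run for such a test - the table and column names and the iteration / ID ranges are placeholders, not the original harness:

declare
  n number;
begin
  for i in 1..1000000 loop
    n := trunc(dbms_random.value(1, 1000000));
    -- random single block read followed by a block change
    update t_upd set val = val + 1 where id = n;
    commit;
  end loop;
end;
/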
Running on a 4 OCPU single instance configuration (8 CPUs / 8 cores) with eight sessions the following profile was measured - using NOARCHIVELOG mode and Flashback disabled:
So clearly the database writer couldn't write the dirty blocks quickly enough - almost 50 percent of the database time the sessions had to wait for free buffers. This means that write intensive workloads might not work too well and could run into this limitation.
Summary
The storage layer is clearly optimized for reads and delivers at least for the read-only tests a very good I/O performance. Write intensive workloads might not work too well if they consistently require more write performance than provided by the service.
Labels:
DBaaS,
Oracle Cloud,
Performance
Monday, January 16, 2017
Oracle Parallel Execution Deep Dive Session
Here is a recording, available on YouTube, of a session I did a while ago, covering how to understand the essentials of Oracle Parallel Execution and how to read the corresponding execution plans.
Labels:
Parallel Execution,
Presentation,
Video Tutorial
Monday, January 9, 2017
12c Adaptive Joins Plus Statistics Feedback For Joins Cardinality Estimate Bug
I've encountered a bug at several clients that upgraded to Oracle 12c (12.1.0.2) - it takes the combination of several of the new adaptive features introduced with Oracle 12c to trigger it.
It needs an execution plan that makes use of adaptive join methods, plus at runtime the activation of the new "statistics feedback for joins" feature that was also introduced with Oracle 12c. Note that in 11.2 there was already the "cardinality feedback" feature that only applies to single table cardinality misestimates, but not to join cardinality misestimates.
If the join method used at runtime is then a Nested Loop join - not necessarily the join method preferred initially; a runtime switch from Hash to Nested Loop join also reproduces the problem - the "statistics feedback for joins" feature generates a bad OPT_ESTIMATE hint for the join cardinality that always seems to be one, like the following: OPT_ESTIMATE(... JOIN (...) ROWS=1.000000), no matter what actual join cardinality was observed.
This can lead to very inefficient execution plans that get generated based on the "statistics feedback" on subsequent executions of the same statement, caused by the misleading join cardinality used to generate the new plan.
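As a side note, whether a cursor has been marked for re-optimization by statistics feedback can be checked quite easily - the IS_REOPTIMIZABLE column of V$SQL is available from 12.1 on (the SQL_ID below is a placeholder):

select
        sql_id
      , child_number
      , is_reoptimizable
from
        v$sql
where
        sql_id = '<sql_id>';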
The good news is that in 12.2.0.1, and in 12.1 when making use of the new backport that enables the same optimizer default settings as in 12.2 - for more information see this MOS document: Recommendations for Adaptive Features in Oracle Database 12c Release 1 (2187449.1) and for example Christian Antognini's post - the "statistics feedback for joins" feature is disabled by default (in fact in principle only the "adaptive joins" feature is left enabled; all other adaptive features are more or less disabled), so the problem doesn't occur there. So this is one more good reason why installing this backport in 12.1 is a good idea.
However, when enabling this feature specifically or simply enabling all "adaptive statistics" features (OPTIMIZER_ADAPTIVE_STATISTICS = TRUE in 12.2 / 12.1 with backport) the problem also reproduces in 12.2.
If you don't make use of the recommended optimizer settings backport in 12.1 yet, then setting "_optimizer_use_feedback" to FALSE prevents the problem; however, this will disable both the "cardinality feedback for single table" 11.2 feature and the "statistics feedback for joins" 12c feature.
In 12.2, there are two parameters related to "cardinality feedback", "_OPTIMIZER_USE_FEEDBACK" that controls the "cardinality feedback for single table" 11.2 feature and "_OPTIMIZER_USE_FEEDBACK_FOR_JOIN" that controls the new "statistics feedback for joins" feature. Hence, in 12.2, when enabling the "adaptive statistics" feature, the problem can be avoided by setting specifically "_OPTIMIZER_USE_FEEDBACK_FOR_JOIN" to FALSE, which would still leave the "cardinality feedback for single table" 11.2 feature enabled.
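To summarize the settings discussed above in one place (these are underscore parameters, so use them only after checking with Oracle Support):

-- 12.1 without the optimizer settings backport:
-- disables both the 11.2 single table feature and the 12c join feature
alter session set "_optimizer_use_feedback" = false;

-- 12.2 / 12.1 with backport, when adaptive statistics are enabled:
-- disable only the join feedback part, keep single table feedback
alter session set optimizer_adaptive_statistics = true;
alter session set "_optimizer_use_feedback_for_join" = false;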
For more information regarding the various optimizer related settings and differences between 12.1 and 12.2 / 12.1 backport, see this very informative post by Christian Antognini.
Here is a simplified test case that allows reproducing the problem:
-----------------------------------------------------------------------
-- Adaptive joins combined with statistics feedback for joins
-- lead to join cardinality estimate of 1
-- caused by incorrect statistics feedback OPT_ESTIMATE hint generated
--
-- The problem seems to be related to the following combination:
--
-- - Adaptive join method selected
-- - Statistics feedback for joins kicks in
-- - The runtime join method used is Nested Loop join
--
-- Reproduced: 12.1.0.2
--             12.2.0.1 (with adaptive statistics features re-enabled)
--
-----------------------------------------------------------------------

set echo on

alter system flush shared_pool;

drop table t1 purge;

drop table t2 purge;

create table t1
as
select
        rownum as id
      , cast('9999' as varchar2(4)) as hist_ind
      , rpad('x', 200) as filler
from
        dual
connect by level <= 30000
order by
        dbms_random.value
;

exec dbms_stats.gather_table_stats(null, 't1', method_opt => 'for all columns size 1')

create unique index t1_idx on t1 (id);

create table t2
as
with generator1 as
(
  select /*+ cardinality(1e3) materialize */
          rownum as id
        , rpad('x', 100) as filler
  from
          dual
  connect by level <= 1e3
),
generator2 as
(
  select /*+ cardinality(1e3) materialize */
          rownum as id
        , rpad('x', 100) as filler
  from
          dual
  connect by level <= 1e3
),
generator as
(
  select /*+ leading(b a) */
          (a.id - 1) * 1e3 + b.id as id
  from
          generator1 a
        , generator2 b
)
select
        case
        when id <= 10000 then 1
        when id <= 20000 then -1
        else id
        end as id
      , cast('N' as varchar2(1)) as some_ind
      , rpad('x', 200) as filler
from
        generator
;

exec dbms_stats.gather_table_stats(null, 't2', method_opt => 'for all columns size 1')

create /*unique*/ index t2_idx on t2 (id);

set echo on define on

column sql_id new_value sql_id

alter session set statistics_level = all;

-- Enable statistics feedback for joins in 12.2.0.1
alter session set optimizer_adaptive_statistics = true;

-- Disabling statistics feedback for joins prevents the bug
-- alter session set "_OPTIMIZER_USE_FEEDBACK_FOR_JOIN" = false;

-- Or disabling adaptive joins prevents the bug
-- alter session set "_OPTIMIZER_ADAPTIVE_PLANS" = false;
-- alter session set "_OPTIMIZER_NLJ_HJ_ADAPTIVE_JOIN" = false;

-- Not related to NL join optimized plan shapes
--alter session set "_nlj_batching_enabled" = 0;
--alter session set "_table_lookup_prefetch_size" = 0;

-- Expected number of rows, initial join NL, still "statistics feedback" kicks in (is it a bug that statistics feedback kicks in here?)
-- No join method switch
-- Bad OPT_ESTIMATE hint for join (ROWS=1) => bug
SELECT /*+ leading(a1) */ count(A1.SOME_IND) FROM
T1 A0 INNER JOIN T2 A1
ON 1 = 1
AND A1.ID = A0.ID
WHERE 1 = 1
AND A0.HIST_IND = '9999'
AND A0.ID between 20000 and 22000
;

-- Re-execute to have statistics feedback kick in
/

select prev_sql_id as sql_id from v$session where sid = userenv('sid');

select * from table(dbms_xplan.display_cursor('&sql_id', null, format => 'TYPICAL ALLSTATS LAST ADAPTIVE'));

select hint_text from V$SQL_REOPTIMIZATION_HINTS where sql_id = '&sql_id' and hint_text like '%JOIN%';

-- More than expected number of rows, initial join NL, "statistics feedback" kicks in
-- Switch from NL to HASH (at runtime and at re-optimization time)
-- Generated OPT_ESTIMATE hints are OK
SELECT /*+ leading(a1) */ count(A1.SOME_IND) FROM
T1 A0 INNER JOIN T2 A1
ON 1 = 1
AND A1.ID = A0.ID
WHERE 1 = 1
AND A0.HIST_IND = '9999'
AND A0.ID between 1 and 2
;

-- Re-execute to have statistics feedback kick in
/

select prev_sql_id as sql_id from v$session where sid = userenv('sid');

select * from table(dbms_xplan.display_cursor('&sql_id', null, format => 'TYPICAL ALLSTATS LAST ADAPTIVE'));

select hint_text from V$SQL_REOPTIMIZATION_HINTS where sql_id = '&sql_id' and hint_text like '%JOIN%';

-- Less than expected number of rows, initial join HASH, "statistics feedback" kicks in
-- Switch from HASH to NL (at runtime and at re-optimization time)
-- Bad OPT_ESTIMATE hint for join generated (ROWS=1) => bug
SELECT /*+ leading(a1) */ count(A1.SOME_IND) FROM
T1 A0 INNER JOIN T2 A1
ON 1 = 1
AND A1.ID = A0.ID
WHERE 1 = 1
AND A0.HIST_IND = '9999'
AND A0.ID between 2 and 20500
;

-- Re-execute to have statistics feedback kick in
/

select prev_sql_id as sql_id from v$session where sid = userenv('sid');

select * from table(dbms_xplan.display_cursor('&sql_id', null, format => 'TYPICAL ALLSTATS LAST ADAPTIVE'));

select hint_text from V$SQL_REOPTIMIZATION_HINTS where sql_id = '&sql_id' and hint_text like '%JOIN%';

-- Expected number of rows, initial join HASH, "statistics feedback" does not kick in
-- No join method switch
SELECT /*+ leading(a1) */ count(A1.SOME_IND) FROM
T1 A0 INNER JOIN T2 A1
ON 1 = 1
AND A1.ID = A0.ID
WHERE 1 = 1
AND A0.HIST_IND = '9999'
AND A0.ID between 20000 and 30000
;

-- Re-execute to have statistics feedback kick in
/

select prev_sql_id as sql_id from v$session where sid = userenv('sid');

select * from table(dbms_xplan.display_cursor('&sql_id', null, format => 'TYPICAL ALLSTATS LAST ADAPTIVE'));

select hint_text from V$SQL_REOPTIMIZATION_HINTS where sql_id = '&sql_id' and hint_text like '%JOIN%';

The critical part of the output looks like this:
SQL> -- Expected number of rows, initial join NL, still "statistics feedback" kicks in (is it a bug that statistics feedback kicks in here?)
SQL> -- No join method switch
SQL> -- Bad OPT_ESTIMATE hint for join (ROWS=1) => bug
SQL> SELECT /*+ leading(a1) */ count(A1.SOME_IND) FROM
  2  T1 A0 INNER JOIN T2 A1
  3  ON 1 = 1
  4  AND A1.ID = A0.ID
  5  WHERE 1 = 1
  6  AND A0.HIST_IND = '9999'
  7  AND A0.ID between 20000 and 22000
  8  ;

COUNT(A1.SOME_IND)
------------------
              2000

SQL>
SQL> -- Re-execute to have statistics feedback kick in
SQL> /

COUNT(A1.SOME_IND)
------------------
              2000

SQL>
SQL> select prev_sql_id as sql_id from v$session where sid = userenv('sid');

SQL_ID
-------------
8mqn521y28t58

SQL>
SQL> select * from table(dbms_xplan.display_cursor('&sql_id', null, format => 'TYPICAL ALLSTATS LAST ADAPTIVE'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------
SQL_ID 8mqn521y28t58, child number 0
-------------------------------------
SELECT /*+ leading(a1) */ count(A1.SOME_IND) FROM T1 A0 INNER JOIN T2 A1
ON 1 = 1 AND A1.ID = A0.ID WHERE 1 = 1 AND A0.HIST_IND = '9999' AND
A0.ID between 20000 and 22000

Plan hash value: 3258782287

--------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers | Reads |
--------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 2036 (100)| | 1 |00:00:00.02 | 4044 | 38 |
| 1 | SORT AGGREGATE | | 1 | 1 | 17 | | | 1 |00:00:00.02 | 4044 | 38 |
|- * 2 | HASH JOIN | | 1 | 2001 | 34017 | 2036 (1)| 00:00:01 | 2000 |00:00:00.02 | 4044 | 38 |
| 3 | NESTED LOOPS | | 1 | 2001 | 34017 | 2036 (1)| 00:00:01 | 2000 |00:00:00.02 | 4044 | 38 |
| 4 | NESTED LOOPS | | 1 | 2002 | 34017 | 2036 (1)| 00:00:01 | 2000 |00:00:00.02 | 2044 | 38 |
|- 5 | STATISTICS COLLECTOR | | 1 | | | | | 2000 |00:00:00.01 | 2007 | 6 |
| 6 | TABLE ACCESS BY INDEX ROWID BATCHED| T2 | 1 | 2002 | 14014 | 1974 (0)| 00:00:01 | 2000 |00:00:00.01 | 2007 | 6 |
| * 7 | INDEX RANGE SCAN | T2_IDX | 1 | 2002 | | 7 (0)| 00:00:01 | 2000 |00:00:00.01 | 7 | 6 |
| * 8 | INDEX UNIQUE SCAN | T1_IDX | 2000 | 1 | | 0 (0)| | 2000 |00:00:00.01 | 37 | 32 |
| * 9 | TABLE ACCESS BY INDEX ROWID | T1 | 2000 | 1 | 10 | 1 (0)| 00:00:01 | 2000 |00:00:00.01 | 2000 | 0 |
|- * 10 | TABLE ACCESS FULL | T1 | 0 | 1 | 10 | 1 (0)| 00:00:01 | 0 |00:00:00.01 | 0 | 0 |
--------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("A1"."ID"="A0"."ID")
   7 - access("A1"."ID">=20000 AND "A1"."ID"<=22000)
   8 - access("A1"."ID"="A0"."ID")
       filter(("A0"."ID">=20000 AND "A0"."ID"<=22000))
   9 - filter("A0"."HIST_IND"='9999')
  10 - filter(("A0"."ID">=20000 AND "A0"."ID"<=22000 AND "A0"."HIST_IND"='9999'))

Note
-----
   - this is an adaptive plan (rows marked '-' are inactive)

SQL_ID 8mqn521y28t58, child number 1
-------------------------------------
SELECT /*+ leading(a1) */ count(A1.SOME_IND) FROM T1 A0 INNER JOIN T2 A1
ON 1 = 1 AND A1.ID = A0.ID WHERE 1 = 1 AND A0.HIST_IND = '9999' AND
A0.ID between 20000 and 22000

Plan hash value: 3258782287

-----------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers |
-----------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 2036 (100)| | 1 |00:00:00.02 | 4044 |
| 1 | SORT AGGREGATE | | 1 | 1 | 17 | | | 1 |00:00:00.02 | 4044 |
|- * 2 | HASH JOIN | | 1 | 1 | 17 | 2036 (1)| 00:00:01 | 2000 |00:00:00.02 | 4044 |
| 3 | NESTED LOOPS | | 1 | 1 | 17 | 2036 (1)| 00:00:01 | 2000 |00:00:00.02 | 4044 |
| 4 | NESTED LOOPS | | 1 | 2002 | 17 | 2036 (1)| 00:00:01 | 2000 |00:00:00.01 | 2044 |
|- 5 | STATISTICS COLLECTOR | | 1 | | | | | 2000 |00:00:00.01 | 2007 |
| 6 | TABLE ACCESS BY INDEX ROWID BATCHED| T2 | 1 | 2002 | 14014 | 1974 (0)| 00:00:01 | 2000 |00:00:00.01 | 2007 |
| * 7 | INDEX RANGE SCAN | T2_IDX | 1 | 2002 | | 7 (0)| 00:00:01 | 2000 |00:00:00.01 | 7 |
| * 8 | INDEX UNIQUE SCAN | T1_IDX | 2000 | 1 | | 0 (0)| | 2000 |00:00:00.01 | 37 |
| * 9 | TABLE ACCESS BY INDEX ROWID | T1 | 2000 | 1 | 10 | 1 (0)| 00:00:01 | 2000 |00:00:00.01 | 2000 |
|- * 10 | TABLE ACCESS FULL | T1 | 0 | 1 | 10 | 1 (0)| 00:00:01 | 0 |00:00:00.01 | 0 |
-----------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("A1"."ID"="A0"."ID")
   7 - access("A1"."ID">=20000 AND "A1"."ID"<=22000)
   8 - access("A1"."ID"="A0"."ID")
       filter(("A0"."ID">=20000 AND "A0"."ID"<=22000))
   9 - filter("A0"."HIST_IND"='9999')
  10 - filter(("A0"."ID">=20000 AND "A0"."ID"<=22000 AND "A0"."HIST_IND"='9999'))

Note
-----
   - statistics feedback used for this statement
   - this is an adaptive plan (rows marked '-' are inactive)

77 rows selected.

SQL>
SQL> select hint_text from V$SQL_REOPTIMIZATION_HINTS where sql_id = '&sql_id' and hint_text like '%JOIN%';

HINT_TEXT
--------------------------------------------------------------------------------
OPT_ESTIMATE (@"SEL$58A6D7F6" JOIN ("A0"@"SEL$1" "A1"@"SEL$1") ROWS=1.000000 )

SQL>
SQL> -- More than expected number of rows, initial join NL, "statistics feedback" kicks in
SQL> -- Switch from NL to HASH (at runtime and at re-optimization time)
SQL> -- Generated OPT_ESTIMATE hints are OK
SQL> SELECT /*+ leading(a1) */ count(A1.SOME_IND) FROM
  2  T1 A0 INNER JOIN T2 A1
  3  ON 1 = 1
  4  AND A1.ID = A0.ID
  5  WHERE 1 = 1
  6  AND A0.HIST_IND = '9999'
  7  AND A0.ID between 1 and 2
  8  ;

COUNT(A1.SOME_IND)
------------------
             10000

SQL>
SQL> -- Re-execute to have statistics feedback kick in
SQL> /

COUNT(A1.SOME_IND)
------------------
             10000

SQL>
SQL> select prev_sql_id as sql_id from v$session where sid = userenv('sid');

SQL_ID
-------------
92rttcj6ntzqs

SQL>
SQL> select * from table(dbms_xplan.display_cursor('&sql_id', null, format => 'TYPICAL ALLSTATS LAST ADAPTIVE'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------
SQL_ID 92rttcj6ntzqs, child number 0
-------------------------------------
SELECT /*+ leading(a1) */ count(A1.SOME_IND) FROM T1 A0 INNER JOIN T2 A1
ON 1 = 1 AND A1.ID = A0.ID WHERE 1 = 1 AND A0.HIST_IND = '9999' AND
A0.ID between 1 and 2

Plan hash value: 777836357

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers | Reads | OMem | 1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 7 (100)| | 1 |00:00:00.04 | 1262 | 70 | | | |
| 1 | SORT AGGREGATE | | 1 | 1 | 17 | | | 1 |00:00:00.04 | 1262 | 70 | | | |
| * 2 | HASH JOIN | | 1 | 2 | 34 | 7 (0)| 00:00:01 | 10000 |00:00:00.04 | 1262 | 70 | 2061K| 2061K| 1355K (0)|
|- 3 | NESTED LOOPS | | 1 | 2 | 34 | 7 (0)| 00:00:01 | 10000 |00:00:00.03 | 1258 | 40 | | | |
|- 4 | NESTED LOOPS | | 1 | 3 | 34 | 7 (0)| 00:00:01 | 10000 |00:00:00.02 | 1258 | 40 | | | |
|- 5 | STATISTICS COLLECTOR | | 1 | | | | | 10000 |00:00:00.02 | 1258 | 40 | | | |
| 6 | TABLE ACCESS BY INDEX ROWID BATCHED| T2 | 1 | 3 | 21 | 6 (0)| 00:00:01 | 10000 |00:00:00.02 | 1258 | 40 | | | |
| * 7 | INDEX RANGE SCAN | T2_IDX | 1 | 3 | | 3 (0)| 00:00:01 | 10000 |00:00:00.01 | 23 | 40 | | | |
|- * 8 | INDEX UNIQUE SCAN | T1_IDX | 0 | 1 | | 0 (0)| | 0 |00:00:00.01 | 0 | 0 | | | |
|- * 9 | TABLE ACCESS BY INDEX ROWID | T1 | 0 | 1 | 10 | 1 (0)| 00:00:01 | 0 |00:00:00.01 | 0 | 0 | | | |
| * 10 | TABLE ACCESS BY INDEX ROWID BATCHED | T1 | 1 | 1 | 10 | 1 (0)| 00:00:01 | 2 |00:00:00.01 | 4 | 30 | | | |
| * 11 | INDEX RANGE SCAN | T1_IDX | 1 | 1 | | 0 (0)| | 2 |00:00:00.01 | 2 | 30 | | | |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("A1"."ID"="A0"."ID")
   7 - access("A1"."ID">=1 AND "A1"."ID"<=2)
   8 - access("A1"."ID"="A0"."ID")
       filter(("A0"."ID"<=2 AND "A0"."ID">=1))
   9 - filter("A0"."HIST_IND"='9999')
  10 - filter("A0"."HIST_IND"='9999')
  11 - access("A0"."ID">=1 AND "A0"."ID"<=2)

Note
-----
   - this is an adaptive plan (rows marked '-' are inactive)

SQL_ID 92rttcj6ntzqs, child number 1
-------------------------------------
SELECT /*+ leading(a1) */ count(A1.SOME_IND) FROM T1 A0 INNER JOIN T2 A1
ON 1 = 1 AND A1.ID = A0.ID WHERE 1 = 1 AND A0.HIST_IND = '9999' AND
A0.ID between 1 and 2

Plan hash value: 3588347061

---------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 7982 (100)| | 1 |00:00:00.13 | 29516 | | | |
| 1 | SORT AGGREGATE | | 1 | 1 | 17 | | | 1 |00:00:00.13 | 29516 | | | |
|* 2 | HASH JOIN | | 1 | 10000 | 166K| 7982 (1)| 00:00:01 | 10000 |00:00:00.13 | 29516 | 2061K| 2061K| 1356K (0)|
|* 3 | TABLE ACCESS FULL | T2 | 1 | 10000 | 70000 | 7978 (1)| 00:00:01 | 10000 |00:00:00.12 | 29512 | | | |
|* 4 | TABLE ACCESS BY INDEX ROWID BATCHED| T1 | 1 | 2 | 20 | 4 (0)| 00:00:01 | 2 |00:00:00.01 | 4 | | | |
|* 5 | INDEX RANGE SCAN | T1_IDX | 1 | 2 | | 2 (0)| 00:00:01 | 2 |00:00:00.01 | 2 | | | |
---------------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("A1"."ID"="A0"."ID")
   3 - filter(("A1"."ID"<=2 AND "A1"."ID">=1))
   4 - filter("A0"."HIST_IND"='9999')
   5 - access("A0"."ID">=1 AND "A0"."ID"<=2)

Note
-----
   - statistics feedback used for this statement

71 rows selected.

SQL>
SQL> select hint_text from V$SQL_REOPTIMIZATION_HINTS where sql_id = '&sql_id' and hint_text like '%JOIN%';

HINT_TEXT
--------------------------------------------------------------------------------
OPT_ESTIMATE (@"SEL$58A6D7F6" JOIN ("A0"@"SEL$1" "A1"@"SEL$1") ROWS=10000.000000 )

SQL>
SQL> -- Less than expected number of rows, initial join HASH, "statistics feedback" kicks in
SQL> -- Switch from HASH to NL (at runtime and at re-optimization time)
SQL> -- Bad OPT_ESTIMATE hint for join generated (ROWS=1) => bug
SQL> SELECT /*+ leading(a1) */ count(A1.SOME_IND) FROM
  2  T1 A0 INNER JOIN T2 A1
  3  ON 1 = 1
  4  AND A1.ID = A0.ID
  5  WHERE 1 = 1
  6  AND A0.HIST_IND = '9999'
  7  AND A0.ID between 2 and 20500
  8  ;

COUNT(A1.SOME_IND)
------------------
               500

SQL>
SQL> -- Re-execute to have statistics feedback kick in
SQL> /

COUNT(A1.SOME_IND)
------------------
               500

SQL>
SQL> select prev_sql_id as sql_id from v$session where sid = userenv('sid');

SQL_ID
-------------
c55rjg5mdxpph

SQL>
SQL> select * from table(dbms_xplan.display_cursor('&sql_id', null, format => 'TYPICAL ALLSTATS LAST ADAPTIVE'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------
SQL_ID c55rjg5mdxpph, child number 0
-------------------------------------
SELECT /*+ leading(a1) */ count(A1.SOME_IND) FROM T1 A0 INNER JOIN T2 A1
ON 1 = 1 AND A1.ID = A0.ID WHERE 1 = 1 AND A0.HIST_IND = '9999' AND
A0.ID between 2 and 20500

Plan hash value: 1011946885

-------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 8227 (100)| | 1 |00:00:00.13 | 30028 |
| 1 | SORT AGGREGATE | | 1 | 1 | 17 | | | 1 |00:00:00.13 | 30028 |
|- * 2 | HASH JOIN | | 1 | 20500 | 340K| 8227 (1)| 00:00:01 | 500 |00:00:00.13 | 30028 |
| 3 | NESTED LOOPS | | 1 | 20500 | 340K| 8227 (1)| 00:00:01 | 500 |00:00:00.13 | 30028 |
| 4 | NESTED LOOPS | | 1 | | | | | 500 |00:00:00.13 | 29528 |
|- 5 | STATISTICS COLLECTOR | | 1 | | | | | 500 |00:00:00.13 | 29512 |
| * 6 | TABLE ACCESS FULL | T2 | 1 | 20500 | 140K| 7978 (1)| 00:00:01 | 500 |00:00:00.13 | 29512 |
| * 7 | INDEX UNIQUE SCAN | T1_IDX | 500 | | | | | 500 |00:00:00.01 | 16 |
| * 8 | TABLE ACCESS BY INDEX ROWID| T1 | 500 | 1 | 10 | 248 (0)| 00:00:01 | 500 |00:00:00.01 | 500 |
|- * 9 | TABLE ACCESS FULL | T1 | 0 | 20501 | 200K| 248 (0)| 00:00:01 | 0 |00:00:00.01 | 0 |
-------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("A1"."ID"="A0"."ID")
   6 - filter(("A1"."ID"<=20500 AND "A1"."ID">=2))
   7 - access("A1"."ID"="A0"."ID")
       filter(("A0"."ID"<=20500 AND "A0"."ID">=2))
   8 - filter("A0"."HIST_IND"='9999')
   9 - filter(("A0"."ID"<=20500 AND "A0"."ID">=2 AND "A0"."HIST_IND"='9999'))

Note
-----
   - this is an adaptive plan (rows marked '-' are inactive)

SQL_ID c55rjg5mdxpph, child number 1
-------------------------------------
SELECT /*+ leading(a1) */ count(A1.SOME_IND) FROM T1 A0 INNER JOIN T2 A1
ON 1 = 1 AND A1.ID = A0.ID WHERE 1 = 1 AND A0.HIST_IND = '9999' AND
A0.ID between 2 and 20500

Plan hash value: 1011946885

----------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers |
----------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 7994 (100)| | 1 |00:00:00.13 | 30028 |
| 1 | SORT AGGREGATE | | 1 | 1 | 17 | | | 1 |00:00:00.13 | 30028 |
| 2 | NESTED LOOPS | | 1 | 1 | 17 | 7994 (1)| 00:00:01 | 500 |00:00:00.13 | 30028 |
| 3 | NESTED LOOPS | | 1 | 500 | 17 | 7994 (1)| 00:00:01 | 500 |00:00:00.13 | 29528 |
|* 4 | TABLE ACCESS FULL | T2 | 1 | 500 | 3500 | 7978 (1)| 00:00:01 | 500 |00:00:00.13 | 29512 |
|* 5 | INDEX UNIQUE SCAN | T1_IDX | 500 | 1 | | 0 (0)| | 500 |00:00:00.01 | 16 |
|* 6 | TABLE ACCESS BY INDEX ROWID| T1 | 500 | 1 | 10 | 1 (0)| 00:00:01 | 500 |00:00:00.01 | 500 |
----------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(("A1"."ID"<=20500 AND "A1"."ID">=2))
   5 - access("A1"."ID"="A0"."ID")
       filter(("A0"."ID"<=20500 AND "A0"."ID">=2))
   6 - filter("A0"."HIST_IND"='9999')

Note
-----
   - statistics feedback used for this statement

69 rows selected.

SQL>
SQL> select hint_text from V$SQL_REOPTIMIZATION_HINTS where sql_id = '&sql_id' and hint_text like '%JOIN%';

HINT_TEXT
--------------------------------------------------------------------------------
OPT_ESTIMATE (@"SEL$58A6D7F6" JOIN ("A0"@"SEL$1" "A1"@"SEL$1") ROWS=1.000000 )

SQL>
SQL> -- Expected number of rows, initial join HASH, "statistics feedback" does not kick in
SQL> -- No join method switch
SQL> SELECT /*+ leading(a1) */ count(A1.SOME_IND) FROM
  2  T1 A0 INNER JOIN T2 A1
  3  ON 1 = 1
  4  AND A1.ID = A0.ID
  5  WHERE 1 = 1
  6  AND A0.HIST_IND = '9999'
  7  AND A0.ID between 20000 and 30000
  8  ;

COUNT(A1.SOME_IND)
------------------
             10000

SQL>
SQL> -- Re-execute to have statistics feedback kick in
SQL> /

COUNT(A1.SOME_IND)
------------------
             10000

SQL>
SQL> select prev_sql_id as sql_id from v$session where sid = userenv('sid');

SQL_ID
-------------
4tj7bn17xcbad

SQL>
SQL> select * from table(dbms_xplan.display_cursor('&sql_id', null, format => 'TYPICAL ALLSTATS LAST ADAPTIVE'));

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------
SQL_ID 4tj7bn17xcbad, child number 0
-------------------------------------
SELECT /*+ leading(a1) */ count(A1.SOME_IND) FROM T1 A0 INNER JOIN T2 A1
ON 1 = 1 AND A1.ID = A0.ID WHERE 1 = 1 AND A0.HIST_IND = '9999' AND
A0.ID between 20000 and 30000

Plan hash value: 4274056747

----------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows |E-Bytes| Cost (%CPU)| E-Time | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | | 8227 (100)| | 1 |00:00:00.17 | 30434 | | | |
| 1 | SORT AGGREGATE | | 1 | 1 | 17 | | | 1 |00:00:00.17 | 30434 | | | |
| * 2 | HASH JOIN | | 1 | 10001 | 166K| 8227 (1)| 00:00:01 | 10000 |00:00:00.17 | 30434 | 1969K| 1969K| 1895K (0)|
|- 3 | NESTED LOOPS | | 1 | 10001 | 166K| 8227 (1)| 00:00:01 | 10000 |00:00:00.14 | 29512 | | | |
|- 4 | NESTED LOOPS | | 1 | | | | | 10000 |00:00:00.14 | 29512 | | | |
|- 5 | STATISTICS COLLECTOR | | 1 | | | | | 10000 |00:00:00.13 | 29512 | | | |
| * 6 | TABLE ACCESS FULL | T2 | 1 | 10002 | 70014 | 7978 (1)| 00:00:01 | 10000 |00:00:00.13 | 29512 | | | |
|- * 7 | INDEX UNIQUE SCAN | T1_IDX | 0 | | | | | 0 |00:00:00.01 | 0 | | | |
|- * 8 | TABLE ACCESS BY INDEX ROWID| T1 | 0 | 1 | 10 | 248 (0)| 00:00:01 | 0 |00:00:00.01 | 0 | | | |
| * 9 | TABLE ACCESS FULL | T1 | 1 | 10001 | 97K| 248 (0)| 00:00:01 | 10001 |00:00:00.01 | 922 | | | |
----------------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("A1"."ID"="A0"."ID")
   6 - filter(("A1"."ID"<=30000 AND "A1"."ID">=20000))
   7 - access("A1"."ID"="A0"."ID")
       filter(("A0"."ID">=20000 AND "A0"."ID"<=30000))
   8 - filter("A0"."HIST_IND"='9999')
   9 - filter(("A0"."ID">=20000 AND "A0"."HIST_IND"='9999' AND "A0"."ID"<=30000))

Note
-----
   - this is an adaptive plan (rows marked '-' are inactive)

37 rows selected.

SQL>
SQL> select hint_text from V$SQL_REOPTIMIZATION_HINTS where sql_id = '&sql_id' and hint_text like '%JOIN%';

no rows selected

SQL>

Note how in each case where the Nested Loop join gets used at runtime and "statistics feedback for joins" kicks in, the bad OPT_ESTIMATE hint gets generated. I've discussed this case also with Nigel Bayliss at Oracle (the Optimizer Product Manager) and a corresponding bug was opened, so hopefully the problem gets addressed in the future.
Labels:
12.1.0.2,
12cR1,
12cR2,
12.2.0.1,
bug,
Cardinality Feedback,
Optimizer
Tuesday, January 3, 2017
Oracle Database Cloud (DBaaS) Performance - Part 1 - CPU
After having looked at the performance consistency provided by the Oracle Database Cloud offering in the previous series, I'll focus here on the raw performance figures I've measured during my tests, starting with the CPU related performance findings.
One of the first surprises is related to the fact that Oracle uses a unit called "OCPU" to specify the CPU capacity provided, which is explained here:
So one "OCPU" is supposed to represent one core with two Hyperthreading CPU threads, and hence should correspond for example to two VCPUs used as unit by Amazon.
But when looking at the actual CPU configuration of such a DBaaS VM, I got the following results for a 4 OCPU configuration:
So, that 4 OCPU configuration provides 8 CPUs, which is expected, but it provides those 8 CPUs with one thread per core, so that means 8 cores.
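The CPU details shown here and below come from the "lscpu" utility; a quick way to extract just the relevant lines (the grep pattern is merely one way to do it):

lscpu | egrep '^CPU\(s\)|^Thread|^Core|^Socket'
# the 4 OCPU DBaaS VM reported 8 CPU(s) with 1 thread per core,
# the Amazon EC2 VM below reports 8 CPU(s) with 2 threads per core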
This is what I get when I configure a corresponding Amazon EC2 VM with 8 VCPUs (m4.2xlarge), which should be the same as the Amazon RDS "db.m4.2xlarge" configuration (but I can't access an RDS instance at O/S level, hence the EC2 fallback):
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 79
Model name: Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
Stepping: 1
CPU MHz: 2300.062
BogoMIPS: 4600.12
Hypervisor vendor: Xen
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 46080K
NUMA node0 CPU(s): 0-7
So this is more in line with what is expected - 4 cores providing 8 CPU threads.
Does it make a difference in terms of actual performance? It does - when running my CPU tests, both the PL/SQL as well as the SQL engine based tests (see the previous "performance consistency" series for more details about the tests) show the following average duration per iteration per thread when running with 4 vs. 8 threads:
Oracle DBaaS:
Amazon RDS:
So, using 8 threads instead of 4 threads only increases the duration of a test iteration slightly in the Oracle DBaaS 4 OCPU environment, whereas the Amazon RDS 8 VCPU environment slows down significantly, even more than expected considering Hyperthreading effects - in particular the SQL Logical I/O test requires more than twice the time at 8 threads, and the PL/SQL based test, too, slows down significantly. It's interesting to see that running the SQL Logical I/O test at 4 threads the Amazon RDS environment outperforms the Oracle DBaaS offering to an unexpected degree.
Whether the better Amazon RDS SQL Logical I/O performance at 4 threads comes from the larger cache sizes reported by "lscpu" I don't know. I also don't know why Oracle provides more cores than outlined in their own product description. Maybe this is meant to avoid exactly the effects seen with the Amazon RDS environment - maybe Hyperthreading doesn't work that well in virtualized environments - but that is just my speculation. Whether Oracle will keep this different approach in the future I don't know either.
All I can say is that I consistently got that CPU / core ratio when configuring several services using different numbers of OCPUs, and that my performance tests showed the differences outlined above when comparing the Oracle DBaaS and Amazon RDS environments.
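For context, a minimal sketch of the flavour of a PL/SQL based CPU burning test - this is just an illustration of the principle, not the actual test harness of the series; run one copy per concurrent session and compare the duration per iteration at 4 vs. 8 sessions:

declare
  n number := 0;
begin
  -- pure CPU work, no I/O and no SQL engine involvement
  for i in 1..100000000 loop
    n := mod(n + i, 999983);
  end loop;
  dbms_output.put_line(n);
end;
/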
Labels:
DBaaS,
Oracle Cloud,
Performance