Sunday, August 28, 2016

Oracle Database Cloud (DBaaS) Performance Consistency - Part 3

This is the third part of this installment, comparing the performance consistency of the DBaaS cloud offering with a dedicated physical host. This time instead of burning CPU using logical I/O only (see part 2) the test harness executes a SQL statement that has to perform physical I/O mostly, and to be more precise this round is a read-only test, so doesn't include any modification to data (except the logging and any other background system generated write activity, like maintaining ASH /AWR data etc.).

In order to maximize the physical I/O part the database instance was configured with a minimum sized buffer cache (16k block size in this case) and the following script was executed as many times as CPUs were available:

declare
  n number;
begin
  loop
    select /*+
              leading(t_o)
              use_nl(t_i)
              index(t_o)
              index(t_i)
          */
          sum(t_i.n)
          into n
    from
          t_o
        , t_i&tabname t_i
    where
          t_o.id_fk = t_i.id;
    insert into timings(testtype, thread_id, ts) values ('&testtype', &thread_id, systimestamp);
    commit;
  end loop;
end;
/

The outer table T_O was created like this:

create table t_o (id primary key, id_fk)
organization index
as
select
       rownum as id
     , case mod(rownum, 2) + 1
       when 1
       then mod(rownum, &tab_size / 20)
       else &tab_size / 10 - mod(rownum, &tab_size / 20) + 1
       end as id_fk
from dual

connect by level <= &tab_size;

Each thread got its dedicated inner table of the Nested Loop join, created like this:

create table t_i' || i || ' (id not null, n, filler)
pctfree 99 pctused 1
tablespace &tbs
as
select cast(rownum as integer) as id,
cast(rownum as number) as n,
cast(rpad('x', 200) as varchar2(200)) as filler
from dual

connect by level <= &tab_size / 10;

create index t_i' || i || '_idx on t_i' || i || ' (id, filler) pctfree 99 tablespace &tbs;


Due to the way the data was crafted and thanks to the Nested Loop join batching performed by Oracle this ensured that the sessions were performing "db file parallel read" I/O as much and hard as possible, so were submitting multiple I/O requests using a single call and taking advantage of asynchronous I/O where possible.

When running with 8 threads on those 8 CPUs on the DBaaS service this resulted in approx. 30.000 IOPS for those 16K blocks (approx. 480MB / sec throughput). Note that I got these 30.000 IOPS for 16K blocks also on a different instance with just 4 CPUs (2 OCPUs) and 4 threads, so it looks like this is a common 16K blocks IOPS limit for a DBaaS instance independently from the CPU count without further tweaking (I didn't attempt to optimize this in any way but used the service as provided / configured by the Oracle Cloud). It looks like this is more a throughput limitation at 480MB / sec than an IOPS limitation, because when using a 4K blocksize I was able to get 60.000 IOPS (middle value), but the IOPS rate was much more fluctuating (between 20.000 and 90.000) whereas the 30.000 IOPS using 16K blocksize was pretty much stable.

On the physical host I got approx. half of these IOPS (roughly 15.500) for those 16K blocks (approx. 250MB / sec throughput), which is very likely the throughput limit for the rather outdated hardware configuration, albeit using a rather modern Samsung SSD drive, but also meant that running at 4 threads I got a comparable number of IOPS per thread as in the DBaaS setup. Using a 4K blocksize the physical host maxed out at (very stable) 52.000 IOPS.

The overall results look like this:

DBaaS:



Physical host:




Again the graph is supposed to show how many of the runs deviated how much from the overall median runtime. Similar to the previous, logical I/O only test this time the DBaaS service shows a significantly different profile, with an even larger spread of deviation up to almost 9 percent from the median runtime, whereas the physical host only shows significant deviation up to 2.5 percent.

Again, like in the previous test, the physical host shows more extreme outliers than the DBaaS service.

The same graph on a per day basis this time shows significant differences between the days for the DBaaS service:
The physical host shows a very common pattern, except for the first day:
Looking at the individual performance of each thread the DBaaS shows a significant variation in performance per day:
The physical host shows a pretty consistent performance pattern, but interestingly the different threads show different, but consistent runtimes:
The next test round will include physical writes.

No comments: