Thursday, November 24, 2011

How To Cancel A Query Running In Another Session

This is not really anything new - in fact Tanel Poder has already blogged about it a while ago. Tanel has specifically covered the handling of "urgent" TCP packets and how this could be used to signal a "cancel" to another process, however this only works on Unix environments and not with Windows SQL*Plus clients. In Tanel's article it is also mentioned that there is an officially documented way of doing this via the Resource Manager if you happen to have an Enterprise Edition license.

In my quick tests however the call to DBMS_RESOURCE_MANAGER.SWITCH_CONSUMER_GROUP_FOR_SESS using "CANCEL_SQL" as consumer group only errors out with ORA-29366 saying that the specified consumer group is invalid.

So ideally there should be an approach that is independent from client or server O/S or license details, and indeed there is one, however it is using an undocumented event and therefore is unsupported and can only be used at your own risk.

If you set event 10237 ("ORA-10237: simulate ^C (for testing purposes)") in a session to any level greater 0 then any currently running and future execution will be "cancelled", so once the cancellation was successful the event needs to be unset otherwise the session will be in an unusable state cancelling any further attempts (applies even if the "lifetime 1" clause is used instead of "forever" when using ORADEBUG to set the event).

So a simple script like the following should be sufficient to cancel a current execution in another session without the need to kill the session.


--------------------------------------------------------
--
-- simulate_control_c.sql
--
-- Purpose:
--
-- Sets event 10237 in a session to simulate
-- pressing CONTROL+C for that session
--
-- Allows to cancel a running SQL statement from
-- a remote session without killing the session
--
-- If the session is stuck on the server side
-- which means that it can't be killed this
-- probably won't help either
--
-- Requirements:
--
-- EXECUTE privilege on SYS.DBMS_SYSTEM
-- SELECT privilege on V$SESSION
--
-- Usage:
--
-- @simulate_control_c <SID>
--
-- Note:
--
-- The usage of that event is undocumented
-- Therefore use at your own risk!
-- Provided for free, without any warranties -
-- test this before using it on anything important
--
-- Other implementation ideas:
--
-- The following code is supposed to achieve the same on Enterprise Edition
-- and enabled Resource Manager in a documented way
-- In all versions tested (10.2.0.4, 11.1.0.7, 11.2.0.1, 11.2.0.2) I get however
-- ORA-29366 and it doesn't work as described
-- Note that the official documentation doesn't explicitly mention CANCEL_SQL as
-- valid consumer group for this call

-- begin
-- sys.dbms_resource_manager.switch_consumer_group_for_sess(
-- <sid>,<serial#>,'CANCEL_SQL'
-- );
-- end;
--
-- When running on Unix KILL -URG sent to the server process
-- should also simulate a Control-C
-- This doesn't work with Windows SQL*Plus clients though
--
-- See Tanel Poder's blog post for more info
-- http://blog.tanelpoder.com/2010/02/17/how-to-cancel-a-query-running-in-another-session/
--
-- Author:
--
-- Randolf Geist
-- http://oracle-randolf.blogspot.com
--
-- Versions tested:
--
-- 11.2.0.1 Server+Client
-- 10.2.0.4 Server
-- 11.2.0.2 Server
--
--------------------------------------------------------

set echo off verify off feedback off

column sid new_value v_sid noprint
column serial# new_value v_serial noprint

-- Get details from V$SESSION
select
sid
, serial#
from
v$session
where
sid = to_number('&1')
and status = 'ACTIVE'
;

declare
-- Avoid compilation errors in case of SID not found
v_sid number := to_number('&v_sid');
v_serial number := to_number('&v_serial');
v_status varchar2(100);
-- 60 seconds default timeout
n_timeout number := 60;
dt_start date := sysdate;
begin
-- SID not found
if v_sid is null then
raise_application_error(-20001, 'SID: &1 cannot be found or is not in STATUS=ACTIVE');
else
-- Set event 10237 to level 1 in session to simulate CONTROL+C
sys.dbms_system.set_ev(v_sid, v_serial, 10237, 1, '');
-- Check session state
loop
begin
select
status
into
v_status
from
v$session
where
sid = v_sid;
exception
-- SID no longer found
when NO_DATA_FOUND then
raise_application_error(-20001, 'SID: ' || v_sid || ' no longer found after cancelling');
end;

-- Status no longer active
-- then set event level to 0 to avoid further cancels
if v_status != 'ACTIVE' then
sys.dbms_system.set_ev(v_sid, v_serial, 10237, 0, '');
exit;
end if;

-- Session still active after timeout exceeded
-- Give up
if dt_start + (n_timeout / 86400) < sysdate then
sys.dbms_system.set_ev(v_sid, v_serial, 10237, 0, '');
raise_application_error(-20001, 'SID: ' || v_sid || ' still active after ' || n_timeout || ' seconds');
end if;

-- Back off after 5 seconds
-- Check only every second from then on
-- Avoids burning CPU and potential contention by this loop
-- However this means that more than a single statement potentially
-- gets cancelled during this second
if dt_start + (5 / 86400) < sysdate then
dbms_lock.sleep(1);
end if;
end loop;
end if;
end;
/


It is particularly useful in Windows environments where the SQL*Plus executable by default doesn't allow cancelling a current execution by pressing Control+C - it works only while fetching or pressing it a second time, terminating the whole SQL*Plus client.

Note that Tanel's method is probably able to cancel queries that this approach cannot cancel because the URGENT signal handler under Unix effectively causes an interrupt to the running process executing the corresponding handler code whereas the event set here has to be actively checked by the code of the running process.

6 comments:

  1. great stuff, thank you!
    If you want to avoid killing another sql in the same session, you can get sql_id and afterwards compare to make sure, you just cancel this particular statement.

    ReplyDelete
  2. Very nice! It was even able to cancel a

    loop null; end loop;

    in an APEX region. This will be helpful.

    ReplyDelete
  3. FWIW I've just re-published the source code, because I've again forgot to check for greater or less characters screwing up the formatting of the code, although it looks like only the comment section was affected - the code itself should have been fine in the initial version.

    Randolf

    ReplyDelete
  4. Great Stuff, Do we have option to pass instance number ?

    ReplyDelete
  5. > Do we have option to pass instance number

    You need to run this on the same instance where you want to cancel the other session.

    Randolf

    ReplyDelete
  6. Randolf

    thanks for this useful script - I was looking for this since ages !

    faschd_e_monnemer

    ReplyDelete