Introduction
After a long time, I had the pleasure to support a maintenance weekend. Times are changing, it has been a little while working with the operational team on maintenance activities.
I’m interfacing application and business projects and I’m mainly focused on architecture and strategic topics. Based on my role, it’s required to have a detailed understanding of the application landscape. And that’s the main reason for being Involved.
Proceeding the maintenance, immediately, an evergreen appeared:
Long lasting (application) client-based database session
To understand the environment: This is a trading environment. The related database systems are designed for the highest availability and need to satisfy governmental and regulatory requirements. Business and end-users are very sensitive to system outages. There’s high interest to not lose a database and business transaction. Therefore, the Maximum Availability Architecture – MAA is the first choice. This will include the Real Application Cluster – RAC with a two-node active/active setup. The patching procedure is executed in rolling manner – One node after the other.
Smart DBAs might argue: “Why worry? There’s Transaction Guard and Application Continuity!” Well, I’ve never seen a consistent implementation in the fields, by now. Let’s assume it’s not available.
Stopping database services
To install the patches, the first activity will be to stop (all) database services running on the cluster node. In the older days (e.g. 11gR2), the following command would stop the service:
Examples will make use of wildcard terminology explained in RFC3092
#srvctl stop service -d db_unique_name [-s "service_name_list"
# [-n node_name | -i instance_name] [-f]
$srvctl stop service -d orcl_site10.foo.bar -s myapp.foo.bar -i orcl1
You may find the full command line reference here: Server Control Utility Reference (oracle.com)
This command won’t disconnect existing connections and sessions. But, new connections and sessions can’t be established. In theory, connections will bail out to the other cluster node(s), over time, and we may stop the database instance on the node for maintenance. So we can be sure that we will not terminate active sessions, by accident.
If you need evidence about the status of the currently running database service, you may check with:
SELECT name, inst_id
FROM gv$active_services
ORDER BY name, inst_id;
NAME INST_ID
------------------------------ -------
orcl 1
orcl 2
orcl_app1.foo.bar 1
orcl_app1.foo.bar 2
orcl_app2.foo.bar 1
orcl_app2.foo.bar 2
...
Review connected sessions
Once all application related database services are down, review the current connected sessions. Carry on with the maintenance only, once all sessions are disconnected from the affected database instance. Once again! By intention, we do not want to interrupt something, what might cause an issue!
The status of current connected sessions might be verified with the help of the v$session
view.
SELECT service_name, inst_id, count(*)
FROM gv$session
WHERE service_name NOT IN ('SYS$BACKGROUND','SYS$USERS')
GROUP BY service_name, inst_id
ORDER BY service_name, inst_id;
SERVICE_NAME INST_ID COUNT(*)
------------------------- ------- --------
...
orcl_app1.foo.bar 1 50
orcl_app1.foo.bar 2 50
orcl_app2.foo.bar 1 50
orcl_app2.foo.bar 2 50
...
Let’s say we did stop orcl_app1
and orcl_app2
database service on database instance with INST_ID=1
. The expectation is, that over the time the session count will decrease. On the other hand, the amount of sessions on INST_ID=2
will increase.
SERVICE_NAME INST_ID COUNT(*)
------------------------- ------- --------
...
orcl_app1.foo.bar 1 17
orcl_app1.foo.bar 2 68
orcl_app2.foo.bar 1 5
orcl_app2.foo.bar 2 72
...
And finally we reach a state where we wait for infinity, because there are some session that will not disappear. Other teams are waiting and want to go one. Can you fell the slight pressure?
Identify resistent sessions
To sort out the situation, it’s required to identify and terminate the related sessions, that are still connected. The database view v$session
is providing a lot more details to identify which connections are still connected and where are they coming from. The following query has proven to get a quick overview, very the pain might be:
SELECT service_name, inst_id, count(*), username, machine, program, osuser
FROM gv$session
WHERE service_name NOT IN ('SYS$BACKGROUND','SYS$USERS')
--AND inst_id=1
GROUP BY service_name, inst_id, username, machine, program, osuser
ORDER BY service_name, inst_id, username, machine, program;
With the given information, we may find ways to have a closer look into the sessions. Querying the view gv$session
with the following attributes might help:
- Who is the owner off the session(s)?
USERNAME, SCHEMANAME, OSUSER, MACHINE, TERMINAL
- What is the current activity?
SQL_ID, STATE
- Which executable or program is attached to the session?
PROGRAM, MODULE
A friend told me, a lot of situations have been identified, where the DBA himself was connected to a database service and was keeping the sessions open.
Just mentioning this for a friend…
Once identified the owner, you may clarify the situation and terminate the session.
Job done!
Well,…
closing every remaining session individually, is quite a tough job. Wouldn’t be nice, if we can automate this? And what, if you can’t find the responsabel owner of an active session for clarification?
Option 1 – make use of parameter stopoption
Over time the srvctl stop service
command has been evolved. Comparing 11gR2 with 19c we see new options:
#srvctl stop service {-node node_name | -db db_unique_name [-pq] [-pdb pluggable_database |
#-service "service_list" [-eval]] [-node node_name | -instance instance_name |}
#-serverpool pool_name] [-stopoption IMMEDIATE|TRANSACTIONAL|NONE] [-#drain_timeout timeout]
#[-wait {YES | NO}] [-force [-noreplay]] [-global_override] [-verbose]
$srvctl stop service -db orcl_site10.foo.bar -service myapp.foo.bar -instance orcl1
As we have learned earlier, stopping a database service will not disconnect existing sessions! Even with current database versions this hasn’t changed (exception you will change the default behaviour of the service). Now we received a new option that might be used with the help of the command srvctl stop service
.
You may find the full command line reference (v19c) here: Server Control Utility Reference (oracle.com)
Using the optional parameter -stopoption
will enable use to define how and if we want to terminate connected session. The default is still to not [NONE]
terminate connected sessions. The default behavior may be modified with the database service configuration using srvctl modify service ...
.
Based on the documentation we may chose from two options to terminate existing session:
- IMMEDIATE: Sessions that won’t dispose in a defined time by parameter -drain_timeout will be terminated.
- TRANSACTIONAL: Sessions terminate, once they have send a commit.
So with the following example command, we’ll have a good way to drain database sessions stopping the database services:
$srvctl stop service -db orcl_site10.foo.bar
-service orcl_app1.foo.bar
-instance orcl1 -drain_timeout 60 -force
-stopoption immediate -verbose
In the sake of automatization this will be a big benefit. There’s a command line parameter to deal with still connected session to a database service.
But wait a moment! We still did not solve the issue! How may we handle session in a more controlled way, where we can’t hardly terminated sessions with the -stopoption IMMEDIATE
? Shall we wait for infinity?
Well, if you chose the save way and you need evidence about sessions that will not drain, you need to make use of the -stopoption TRANSACTIONAL
parameter. It’s still required to review the v$session
view and look into the details to make an educated decisions for a termination. The detailed description is in the beginning of the post.
But hey! If you are still connected with the database, is there a way to drain and terminate all sessions connected to a database service with a database procedure?
Yes! There’s the old fashioned way. Go with option 2!
Option 2 – DBMS_SERVICE
DBMS_SERVICE.STOP_SERVICE
As mentioned in the beginn, e.g. using version 11gR2, it was not possible to drain connected session with the help of the srvctl stop service
command..
Reviewing the DBMS_SERVICE
package reference documents, there is a STOP_SERVICE
procedure. And we may observe the same evolvement, like we did observed with the srvctl stop service
command.
#11gR2
DBMS_SERVICE.STOP_SERVICE(
service_name IN VARCHAR2,
instance_name IN VARCHAR2);
#19c
DBMS_SERVICE.STOP_SERVICE(
service_name IN VARCHAR2,
instance_name IN VARCHAR2 DEFAULT NULL,
stop_option IN VARCHAR2 DEFAULT NULL,
drain_timeout IN NUMBER DEFAULT NULL,
replay IN BOOLEAN DEFAULT TRUE);
You may find the full package reference here:
v11gR2: DBMS_SERVICE (oracle.com)
v19c: DBMS_SERVICE (oracle.com)
In conclusion we might say that always there’ll be a equivalent database package and procedure, we may utilise within the database.
Warning!
Even it looks promising to make use of the DBMS_SERVICE.STOP_SERVICE
and any other procedure in the package. Please consider that Clusterware aka. Grid Infrastructure (or whatever the name will be – product names are changing quickly) is the leading system. If you change or modify the database service configuration with the help of the DBMS_SERVICE
package, the cluster management system will be confused and might not consider the configuration changes you have made.
It’s highly recommend to do configuration changes and modifications using the srvctl
command, only!
DBMS_SERVICE.DISCONNECT_SESSION
Compared with the srvctl
command, the DBMS_SERVICE
package is providing the procedure DISCONNECT_SESSION
with the parameter DISCONNECT_OPTION
. This is the equivalent to the stopoption
parameter in the v19c srvctl stop service
command. This packahge provides the only way, in 11gR2, to drain sessions from a database service, in a gentil way. Of course, there is a -force
parameter with srvctl stop service
, but that’s exactly what we don’t want to archive.
DBMS_SERVICE.DISCONNECT_SESSION(
service_name IN VARCHAR2,
disconnect_option IN NUMBER DEFAULT POST_TRANSACTION;
You may find the full package reference here: DBMS_SERVICE (oracle.com)
And somehow it’s following the same logic:
POST_TRANSACTION
=
0
: The session will terminate once the current transaction commits or rolls back.IMMEDIATE
=
1
: Sessions terminate immediately.NOREPLAY = 2
: Sessions are terminating immediately and won’t be replayed by application continuity. Related to Oracle Documentation this isIMMEDIATE
andNOREPLAY
together.
The NOREPLAY
parameter has been added to satisfy the Application Continuity feature. With version 11gR2 just POST_TRANSACTION
and IMMEDIATE
were available.
Aligned to our initial scenario, you may execute the following command to automatically disconnect every session that’s attache to the named database service. This is valide once the sessions activities are finalised with a commit or rollback.
EXEC DBMS_SERVICE.DISCONNECT_SESSION('orcl_app1.foo.bar');
Following, you’ve to monitor the v$session
view and check the status of the remaining session. If a session is resistent to terminate and disconnect, depending on the situation, application design and risk you might chose the easy way and send a DISCONNECT_SESSION
with the IMMEDIATE
parameter.
EXEC DBMS_SERVICE.DISCONNECT_SESSION('orcl_app1.foo.bar',1);
--DBMS_SERVICE.IMMEDIATE
If this is not a feasible option, you need to sort out the situation in detail, as described in the beginning.
Summary
We all know that every database landscape is very specific. Requirements, architecture and criticality will vary. There won’t be the one and only way to deal with such a situation. At least for me, the following has been turned out as viable way to deal with long lasting sessions in mission critical environments:
select instance_number, instance_name, host_name from v$instance;
INSTANCE_NUMBER INSTANCE_NAME HOST_NAME
--------------- ------------- -----------------
1 ORCL1 SRV_NODE1
- Stop the required database services with the help of
srvctl stop service
- Connect to the related database instance and make sure you are on the right instance and cluster node with the help of
v$instance
: - Review all required database services are stopped with
gv$active_services
- Send
EXEC DBMS_SERVICE.DISCONNECT_SESSION('my_service.foo.bar');
to close all connect sessions POST_TRANSACTIONAL. - Take your time and monitor
v$session
- Identify if there are resistent session and if any, consider appropriate actions.
If you interested in some more details about database services in Oracle RAC environments and client connection methods, you might find my presentations useful: