Wednesday, March 2, 2011

Fast Connection Failover Behavior Change in 11.2

Fast Connection Failover (FCF) allows jdbc connections to failover to surviving nodes in a RAC when one or more nodes are unavailable. This is very useful to have with middleware application server connection pools so when rolling upgrades or patch application is done on the database the connections in the middleware server would failover to surviving instances and client connections would not receive any errors.

It seem with 11.2.0.2 and 11.2.0.1 this doesn't work same as in 11gR1 and 10gR2.

There are two metalink notes available to test fcf on 10gR2 (How to Verify and Test Fast Connection Failover (FCF) Setup from a JDBC Thin Client Against a 10.2.x RAC Cluster [ID 433827.1]) and 11gR1 (Fast Connection Failover (FCF) Test Client Using 11g JDBC Driver and 11g RAC Cluster [ID 566573.1]).

In this case the java code provided with 566573.1 was used to test the failover of 10gR2 (10.2.0.4) , 11gR1 (11.1.0.7) and 11gR2 (11.2.0.1 and 11.2.0.2)

There are some differences between 433827.1 and 566573.1 note. In 10gR2 note the ons configuration uses vip hostname whereas in the 11gR1 note machine hostname is used. But for the testing purpose vip hostnames were used throughout (and 11gR1 has been deployed on production system using the vip hostname for ons configuration and failover worked without any problem), for 11gR2 it was tested with both vip hostname and hostname but there was no change in the result.

Furthermore the ons remote port on 11gR2 is changed to 6200 (same as 10gR2) but on 11gR1 it was 6251. The onsctl ping command on 10gR2 and 11gR1 provided the remote port this is no longer the case on 11gR2 but remote port could be observed with onsctl debug.
$GI_HOME/bin/onsctl ping
ons is running ...

$GI_HOME/bin/onsctl debug
HTTP/1.1 200 OK
Content-Length: 2488
Content-Type: text/html
Response:

== rac4.domain.net:6200 3640 11/03/02 15:36:32 ==
Home: /opt/app/11.2.0/grid

======== ONS ========
IP ADDRESS                   PORT    TIME   SEQUENCE  FLAGS
--------------------------------------- ----- -------- -------- --------
192.168.0.86  6200 4d6e6210 00000002 00000008

Listener:

TYPE                BIND ADDRESS               PORT  SOCKET
-------- --------------------------------------- ----- ------
Local                                  127.0.0.1  6100      5
Remote                                       any  6200      7
Remote                                       any  6200      
ojdbc6.jar and ons.jar from a 11gR2 home was used for all the test.

The test code gets 5 jdbc connections from the connection pool and prints to which instance these connections point to and close them and waits 30 seconds and re-tries the same procedure. During the 30 second wait one of the instnaces is shutdown with
srvctl stop instance -d dbname -i instancename -o abort
If the FCF is working properly then connections to the aborted instance would have been cleaned up and all the connections in the pool will be valid connections, connected to surviving node(s).

Output from running against the 10gR2 RAC
java support.au.fcf11g.FCFDemo
>> PROGRAM using JDBC thin driver, no oracle client required
>> ojdbc5.jar (for JDK 1.5) or ojdbc6.jar (for JDK 1.6) and ons.jar from $ORACLE_HOME/opmn/lib directory must be in the CLASSPATH
>> Press CNTRL C to exit running program

** Retrieving 5 connections from pool **
5 connections have been retrieved..

=============
Database Product Name is ... Oracle
Database Product Version is  Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
=============
JDBC Driver Name is ........ Oracle JDBC driver
JDBC Driver Version is ..... 11.2.0.2.0
JDBC Driver Major Version is 11
JDBC Driver Minor Version is 2
=============

-- Connection number #1
Instance -> rac10g21
Host -> rac1
Service -> rac10g2.domain.net
-- Connection number #2
Instance -> rac10g21
Host -> rac1
Service -> rac10g2.domain.net
-- Connection number #3
Instance -> rac10g22
Host -> rac2
Service -> rac10g2.domain.net
-- Connection number #4
Instance -> rac10g22
Host -> rac2
Service -> rac10g2.domain.net
-- Connection number #5
Instance -> rac10g22
Host -> rac2
Service -> rac10g2.domain.net

-- Displaying Cache Details --
NumberOfAvailableConnections: 0
NumberOfActiveConnections: 5
Number of query failures: 0
-----------

Sleeping for 30 seconds..
Woke up, will continue now..

** Retrieving 5 connections from pool **
5 connections have been retrieved..

-- Connection number #1
Instance -> rac10g22
Host -> rac2
Service -> rac10g2.domain.net
-- Connection number #2
Instance -> rac10g22
Host -> rac2
Service -> rac10g2.domain.net
-- Connection number #3
Instance -> rac10g22
Host -> rac2
Service -> rac10g2.domain.net
-- Connection number #4
Instance -> rac10g22
Host -> rac2
Service -> rac10g2.domain.net
-- Connection number #5
Instance -> rac10g22
Host -> rac2
Service -> rac10g2.domain.net

-- Displaying Cache Details --
NumberOfAvailableConnections: 0
NumberOfActiveConnections: 5
Number of query failures: 0
-----------

Sleeping for 30 seconds..
The two connections that were connected to instance rac10g21 has been cleaned up and all the connections now point to rac10g22.

Output from running against 11gR1 RAC
java support.au.fcf11g.FCFDemo
>> PROGRAM using JDBC thin driver, no oracle client required
>> ojdbc5.jar (for JDK 1.5) or ojdbc6.jar (for JDK 1.6) and ons.jar from $ORACLE_HOME/opmn/lib directory must be in the CLASSPATH
>> Press CNTRL C to exit running program

** Retrieving 5 connections from pool **
5 connections have been retrieved..

=============
Database Product Name is ... Oracle
Database Product Version is  Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
=============
JDBC Driver Name is ........ Oracle JDBC driver
JDBC Driver Version is ..... 11.2.0.2.0
JDBC Driver Major Version is 11
JDBC Driver Minor Version is 2
=============

-- Connection number #1
Instance -> rac11g11
Host -> rac1
Service -> rac11g1.domain.net
-- Connection number #2
Instance -> rac11g11
Host -> rac1
Service -> rac11g1.domain.net
-- Connection number #3
Instance -> rac11g11
Host -> rac1
Service -> rac11g1.domain.net
-- Connection number #4
Instance -> rac11g12
Host -> rac2
Service -> rac11g1.domain.net
-- Connection number #5
Instance -> rac11g11
Host -> rac1
Service -> rac11g1.domain.net

-- Displaying Cache Details --
NumberOfAvailableConnections: 0
NumberOfActiveConnections: 5
Number of query failures: 0
-----------

Sleeping for 30 seconds..
Woke up, will continue now..

** Retrieving 5 connections from pool **
5 connections have been retrieved..

-- Connection number #1
Instance -> rac11g11
Host -> rac1
Service -> rac11g1.domain.net
-- Connection number #2
Instance -> rac11g11
Host -> rac1
Service -> rac11g1.domain.net
-- Connection number #3
Instance -> rac11g11
Host -> rac1
Service -> rac11g1.domain.net
-- Connection number #4
Instance -> rac11g11
Host -> rac1
Service -> rac11g1.domain.net
-- Connection number #5
Instance -> rac11g11
Host -> rac1
Service -> rac11g1.domain.net

-- Displaying Cache Details --
NumberOfAvailableConnections: 0
NumberOfActiveConnections: 5
Number of query failures: 0
-----------

Sleeping for 30 seconds..
There was one connection to rac11g12 instnace and this has been cleaned up and all the connections now point to rac11g11.

Now the 11.2 RACs first against 11.2.0.1
java support.au.fcf11g.FCFDemo
>> PROGRAM using JDBC thin driver, no oracle client required
>> ojdbc5.jar (for JDK 1.5) or ojdbc6.jar (for JDK 1.6) and ons.jar from $ORACLE_HOME/opmn/lib directory must be in the CLASSPATH
>> Press CNTRL C to exit running program

** Retrieving 5 connections from pool **
5 connections have been retrieved..

=============
Database Product Name is ... Oracle
Database Product Version is  Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
=============
JDBC Driver Name is ........ Oracle JDBC driver
JDBC Driver Version is ..... 11.2.0.2.0
JDBC Driver Major Version is 11
JDBC Driver Minor Version is 2
=============

-- Connection number #1
Instance -> rac11g21
Host -> rac4
Service -> rac11g2.domain.net
-- Connection number #2
Instance -> rac11g22
Host -> rac5
Service -> rac11g2.domain.net
-- Connection number #3
Instance -> rac11g22
Host -> rac5
Service -> rac11g2.domain.net
-- Connection number #4
Instance -> rac11g21
Host -> rac4
Service -> rac11g2.domain.net
-- Connection number #5
Instance -> rac11g21
Host -> rac4
Service -> rac11g2.domain.net

-- Displaying Cache Details --
NumberOfAvailableConnections: 0
NumberOfActiveConnections: 5
Number of query failures: 0
-----------

Sleeping for 30 seconds..
Woke up, will continue now..

** Retrieving 5 connections from pool **
5 connections have been retrieved..

-- Connection number #1
Error running query: No more data to read from socket
-- Connection number #2
Error running query: No more data to read from socket
-- Connection number #3
Instance -> rac11g22
Host -> rac5
Service -> rac11g2.domain.net
-- Connection number #4
Error running query: No more data to read from socket
-- Connection number #5
Instance -> rac11g22
Host -> rac5
Service -> rac11g2.domain.net

-- Displaying Cache Details --
NumberOfAvailableConnections: 0
NumberOfActiveConnections: 5
Number of query failures: 3
-----------

Sleeping for 30 seconds..
As seen from the above output the FCF hasn't happened and an error is thrown on the java program.

Running against 11.2.0.2
java support.au.fcf11g.FCFDemo
>> PROGRAM using JDBC thin driver, no oracle client required
>> ojdbc5.jar (for JDK 1.5) or ojdbc6.jar (for JDK 1.6) and ons.jar from $ORACLE_HOME/opmn/lib directory must be in the CLASSPATH
>> Press CNTRL C to exit running program

** Retrieving 5 connections from pool **
5 connections have been retrieved..

=============
Database Product Name is ... Oracle
Database Product Version is  Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
=============
JDBC Driver Name is ........ Oracle JDBC driver
JDBC Driver Version is ..... 11.2.0.2.0
JDBC Driver Major Version is 11
JDBC Driver Minor Version is 2
=============

-- Connection number #1
Instance -> rac11g22
Host -> rac5
Service -> rac11g2.domain.net
-- Connection number #2
Instance -> rac11g22
Host -> rac5
Service -> rac11g2.domain.net
-- Connection number #3
Instance -> rac11g21
Host -> rac4
Service -> rac11g2.domain.net
-- Connection number #4
Instance -> rac11g21
Host -> rac4
Service -> rac11g2.domain.net
-- Connection number #5
Instance -> rac11g21
Host -> rac4
Service -> rac11g2.domain.net

-- Displaying Cache Details --
NumberOfAvailableConnections: 0
NumberOfActiveConnections: 5
Number of query failures: 0
-----------

Sleeping for 30 seconds..
Woke up, will continue now..

** Retrieving 5 connections from pool **
5 connections have been retrieved..

-- Connection number #1
Error running query: No more data to read from socket
-- Connection number #2
Instance -> rac11g21
Host -> rac4
Service -> rac11g2.domain.net
-- Connection number #3
Instance -> rac11g21
Host -> rac4
Service -> rac11g2.domain.net
-- Connection number #4
Instance -> rac11g21
Host -> rac4
Service -> rac11g2.domain.net
-- Connection number #5
Error running query: No more data to read from socket

-- Displaying Cache Details --
NumberOfAvailableConnections: 0
NumberOfActiveConnections: 5
Number of query failures: 2
-----------

Sleeping for 30 seconds..
Since the FCF is not working when the application receives connections that are connected to the instnace that was shutdown error is thrown.

SR has been raised blog will be updated with the outcome.





Update 03 March 2011

Oracle has pointed out several metalink note regarding this. First it was stated this is expected behavior and mentioned in the metalink note
How To Use JDBC FCF Feature To Detect That a RAC Node Instance Has Been Shutdown ? [ID 364005.1]This error would have thrown if a node went down after a connection is taken from the pool and before doing some work with it. But the behavior change does not lie there but how connection pool cleans up invalid connections.

Two more metalink notes were pointed out Oracle Universal Connection Pool (UCP) and Deprecation of JDBC Implicit Connection Cache [ID 743726.1] and How to Verify Universal Connection Pool (UCP) / Fast Connection Failover (FCF) Setup [ID 1064652.1]

It seem with 11gR2 implicit connectoin caching is deprecated. Tests wer done using the code provided with 1064652.1 on 11gR2 and 11gR1 and still (with this whatever implicit caching mechanism present or not present on the jdbc side using UCP was common on both test cases) the difference on connection pool clean up is there. First test was on 11gR2 RAC.
----------- UCP Details ---------
NumberOfAvailableConnections: 4 <== 4 connection in the pool
BorrowedConnectionsCount: 1
---------------------------------

** UCPPool : connection returned to pool
** UCPPool : retrieveConnection
** FCFTest : Query #13  -> instance[rac11g22], host[rac5], service[rac11g2.domain.net]

----------- UCP Details ---------
NumberOfAvailableConnections: 4
BorrowedConnectionsCount: 1
---------------------------------

====================== instance shutdown =============================

** UCPPool : connection returned to pool
** UCPPool : retrieveConnection
** FCFTest : Connection retry necessary : No more data to read from socket
** FCFTest :
** UCPPool : connection returned to pool
** UCPPool : retrieveConnection
** FCFTest : Connection retry necessary : No more data to read from socket <== expected behavior on 11gR2
** FCFTest :
** UCPPool : connection returned to pool
** UCPPool : retrieveConnection
** FCFTest : Query #16  -> instance[rac11g22], host[rac5], service[rac11g2.domain.net]

----------- UCP Details ---------
NumberOfAvailableConnections: 2 <== connection pool cleaned up of invalid connections
BorrowedConnectionsCount: 1
---------------------------------

** UCPPool : connection returned to pool
** UCPPool : retrieveConnection
** FCFTest : Query #17  -> instance[rac11g22], host[rac5], service[rac11g2.domain.net]

----------- UCP Details ---------
NumberOfAvailableConnections: 2
BorrowedConnectionsCount: 1
---------------------------------
Now with 11gR1
java au.support.jdbc.scan.fcf.FCFTest
Started UCP FCF Test at Thu Mar 03 11:21:03 GMT 2011
** UCPPool : retrieveConnection
** FCFTest : Query #1  -> instance[rac11g12], host[rac2], service[rac11g1.domain.net]

----------- UCP Details ---------
NumberOfAvailableConnections: 4 <== 4 connections in the pool
BorrowedConnectionsCount: 1
---------------------------------

** UCPPool : connection returned to pool
** UCPPool : retrieveConnection
** FCFTest : Query #2  -> instance[rac11g11], host[rac1], service[rac11g1.domain.net]

----------- UCP Details ---------
NumberOfAvailableConnections: 4
BorrowedConnectionsCount: 1
---------------------------------
================== instance shutdown and connection pool gets cleaned up immediately.

** UCPPool : connection returned to pool
** UCPPool : retrieveConnection
** FCFTest : Query #3  -> instance[rac11g12], host[rac2], service[rac11g1.domain.net]

----------- UCP Details ---------
NumberOfAvailableConnections: 1 <== pool is cleaned up immediately no connection retry
BorrowedConnectionsCount: 1
---------------------------------

** UCPPool : connection returned to pool
** UCPPool : retrieveConnection
** FCFTest : Query #4  -> instance[rac11g12], host[rac2], service[rac11g1.domain.net]

----------- UCP Details ---------
NumberOfAvailableConnections: 1
BorrowedConnectionsCount: 1
---------------------------------
SR is ongoing will be updated.

Update 11 March 2011
Running the ucp test case with logging enabled showed the failover events in 11gR2 was missing the domain part of the service name whereas in 11gR1 the full service name including domain was sent. Logs below from the test for 11gR1
** UCPPool : connection returned to pool
2011-03-07T12:15:57.904+0000 UCP FINEST seq-43,thread-11 oracle.ucp.jdbc.oracle.OracleFailoverEventImpl.setEventType eventType: database/event/service
2011-03-07T12:15:57.904+0000 UCP FINEST seq-44,thread-11 oracle.ucp.jdbc.oracle.OracleFailoverEventImpl.validateEventType eventType: database/event/service
2011-03-07T12:15:57.904+0000 UCP FINEST seq-45,thread-11 oracle.ucp.jdbc.oracle.ONSDatabaseFailoverEvent. eventType: database/event/service, eventBody: VERSION=1.0 service=rac11g1.domain.net instance=rac11g11 database=rac11g1 host=rac1 status=down reason=user
2011-03-07T12:15:57.905+0000 UCP FINEST seq-46,thread-11 oracle.ucp.jdbc.oracle.OracleFailoverEventImpl.setServiceName serviceName: rac11g1.domain.net
2011-03-07T12:15:57.905+0000 UCP FINEST seq-47,thread-11 oracle.ucp.jdbc.oracle.OracleFailoverEventImpl.setInstanceName instanceName: rac11g11
2011-03-07T12:15:57.905+0000 UCP FINEST seq-48,thread-11 oracle.ucp.jdbc.oracle.OracleFailoverEventImpl.setDbUniqueName dbUniqueName: rac11g1
2011-03-07T12:15:57.906+0000 UCP FINEST seq-49,thread-11 oracle.ucp.jdbc.oracle.OracleFailoverEventImpl.setHostName hostName: rac1
2011-03-07T12:15:57.906+0000 UCP FINEST seq-50,thread-11 oracle.ucp.jdbc.oracle.OracleFailoverEventImpl.setStatus status: down
2011-03-07T12:15:57.906+0000 UCP FINEST seq-51,thread-11 oracle.ucp.jdbc.oracle.OracleFailoverEventImpl.setReason reason: user
On 11gR2 the domain name is missing on the service name
** UCPPool : connection returned to pool
2011-03-07T12:41:47.951+0000 UCP FINEST seq-49,thread-11 oracle.ucp.jdbc.oracle.OracleFailoverEventImpl.setEventType eventType: database/event/service
2011-03-07T12:41:47.951+0000 UCP FINEST seq-50,thread-11 oracle.ucp.jdbc.oracle.OracleFailoverEventImpl.validateEventType eventType: database/event/service
2011-03-07T12:41:47.952+0000 UCP FINEST seq-51,thread-11 oracle.ucp.jdbc.oracle.ONSDatabaseFailoverEvent. eventType: database/event/service, eventBody: VERSION=1.0 service=rac11g2 instance=rac11g21 database=rac11g2 host=rac4 status=down reason=USER
2011-03-07T12:41:47.952+0000 UCP FINEST seq-52,thread-11 oracle.ucp.jdbc.oracle.OracleFailoverEventImpl.setServiceName serviceName: rac11g2
2011-03-07T12:41:47.952+0000 UCP FINEST seq-53,thread-11 oracle.ucp.jdbc.oracle.OracleFailoverEventImpl.setInstanceName instanceName: rac11g21
2011-03-07T12:41:47.953+0000 UCP FINEST seq-54,thread-11 oracle.ucp.jdbc.oracle.OracleFailoverEventImpl.setDbUniqueName dbUniqueName: rac11g2
2011-03-07T12:41:47.953+0000 UCP FINEST seq-55,thread-11 oracle.ucp.jdbc.oracle.OracleFailoverEventImpl.setHostName hostName: rac4
2011-03-07T12:41:47.953+0000 UCP FINEST seq-56,thread-11 oracle.ucp.jdbc.oracle.OracleFailoverEventImpl.setStatus status: down
2011-03-07T12:41:47.954+0000 UCP FINEST seq-57,thread-11 oracle.ucp.jdbc.oracle.OracleFailoverEventImpl.setReason reason: user
According to Oracle Documentation provided throught the SRthis service name should match the service name in dba_services. Looking at this dba view
SQL> select service_id,name from dba_services;

SERVICE_ID NAME
---------- ------------
1 SYS$BACKGROUND
2 SYS$USERS
3 rac11g2sXDB
4 rac11g2s.domain.net
5 rac11g2XDB
6 rac11g2.domain.net

6 rows selected.

SQL> show parameter service

NAME TYPE VALUE
------------- ------ -------------------
service_names string rac11g2.domain.net
Pointed this out to oracle and was told this is related to "RDBMS Bug 9363352: SERVICE NAME HAS NO DOMAIN FOR UP/DOWN EVENTS, closed as not a bug" and at this point Oracle couldn't reproduce the error seen above (No more data to read from socket). Looking at the service name oracle is using noticed that it is a service name without a domain.

So created service name without a domain and tested.
srvctl add service -d rac11g2 -s nodomain -r "rac11g21,rac11g22"

srvctl start service -d rac11g2 -s nodomain

srvctl status service -d rac11g2 -s nodomain
Service nodomain is running on instance(s) rac11g21,rac11g22

SQL> show parameter service

NAME TYPE VALUE
------------- ------ ----------------------------
service_names string rac11g2.domain.net,nodomain

SQL> select name from dba_services;

NAME
---------------------
SYS$BACKGROUND
SYS$USERS
rac11g2sXDB
rac11g2s.domain.net
rac11g2XDB
rac11g2.domain.net
nodomain

$ lsnrctl status

LSNRCTL for Linux: Version 11.2.0.2.0 - Production on 11-MAR-2011 10:34:57

Copyright (c) 1991, 2010, Oracle.  All rights reserved.

Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 11.2.0.2.0 - Production
Start Date                07-MAR-2011 12:48:29
Uptime                    3 days 21 hr. 46 min. 27 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /opt/app/11.2.0/grid/network/admin/listener.ora
Listener Log File         /opt/app/oracle/diag/tnslsnr/rac4/listener/alert/log.xml
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.0.86)(PORT=1521)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.0.90)(PORT=1521)))
Services Summary...
Service "+ASM" has 1 instance(s).
Instance "+ASM1", status READY, has 1 handler(s) for this service...
Service "nodomain.domain.net" has 1 instance(s).
Instance "rac11g21", status READY, has 1 handler(s) for this service...
Service "rac11g2.domain.net" has 2 instance(s).
Instance "rac11g21", status UNKNOWN, has 1 handler(s) for this service...
Instance "rac11g21", status READY, has 1 handler(s) for this service...
Ran the test with logging enabled. This time the service name has the domain name even though it was created without a one and connection pool get refresh and validated quickly just as it was with 11gR1
** UCPPool : connection returned to pool
2011-03-11T10:25:00.022+0000 UCP FINEST seq-75,thread-11 oracle.ucp.jdbc.oracle.OracleFailoverEventImpl.setEventType eventType: database/event/service
2011-03-11T10:25:00.022+0000 UCP FINEST seq-76,thread-11 oracle.ucp.jdbc.oracle.OracleFailoverEventImpl.validateEventType eventType: database/event/service
2011-03-11T10:25:00.022+0000 UCP FINEST seq-77,thread-11 oracle.ucp.jdbc.oracle.ONSDatabaseFailoverEvent. eventType: database/event/service, eventBody: VERSION=1.0 service=nodomain.domain.net instance=rac11g22 database=rac11g2 host=rac5 status=down reason=USER
2011-03-11T10:25:00.023+0000 UCP FINEST seq-78,thread-11 oracle.ucp.jdbc.oracle.OracleFailoverEventImpl.setServiceName serviceName: nodomain.domain.net
2011-03-11T10:25:00.023+0000 UCP FINEST seq-79,thread-11 oracle.ucp.jdbc.oracle.OracleFailoverEventImpl.setInstanceName instanceName: rac11g22
2011-03-11T10:25:00.023+0000 UCP FINEST seq-80,thread-11 oracle.ucp.jdbc.oracle.OracleFailoverEventImpl.setDbUniqueName dbUniqueName: rac11g2
2011-03-11T10:25:00.024+0000 UCP FINEST seq-81,thread-11 oracle.ucp.jdbc.oracle.OracleFailoverEventImpl.setHostName hostName: rac5
2011-03-11T10:25:00.024+0000 UCP FINEST seq-82,thread-11 oracle.ucp.jdbc.oracle.OracleFailoverEventImpl.setStatus status: down
2011-03-11T10:25:00.024+0000 UCP FINEST seq-83,thread-11 oracle.ucp.jdbc.oracle.OracleFailoverEventImpl.setReason reason: user
Waiting for Oracle reply on this.

Update 06 April 2011
Oracle came back with this is similar to "unpublished bug 9740127 fixed in version 12" also mentioned couple of bugs 8779597 and 9740127 and pointed out that if a application service was created with domain name (instead of using the default database service) failover works as expected.

So created an application service (additional service as before)
srvctl add service -d rac11g2 -s withdomain.domain.net -r "rac11g21,rac11g22"
Ran the test again and the connection failover was as before.
** UCPPool : connection returned to pool
** UCPPool : retrieveConnection
** FCFTest : Query #9  -> instance[rac11g21], host[rac4], service[withdomain.domain.net]

----------- UCP Details ---------
NumberOfAvailableConnections: 4
BorrowedConnectionsCount: 1
---------------------------------

** UCPPool : connection returned to pool
** UCPPool : retrieveConnection
** FCFTest : Query #10  -> instance[rac11g21], host[rac4], service[withdomain.domain.net]

----------- UCP Details ---------
NumberOfAvailableConnections: 0
BorrowedConnectionsCount: 1
---------------------------------
In other versions (10gR2 and 11gR1) default service with or without domain would work fine in a failover situation.
Created another database without a domain name and ran the test again and saw that failover wasn't working as expected.
** UCPPool : connection returned to pool
** UCPPool : retrieveConnection
** FCFTest : Query #181  -> instance[testdb1], host[rac4], service[testdb]

----------- UCP Details ---------
NumberOfAvailableConnections: 1
BorrowedConnectionsCount: 1
---------------------------------

** UCPPool : connection returned to pool
** UCPPool : retrieveConnection
** FCFTest : Connection retry necessary : No more data to read from socket
** FCFTest : Apr 6, 2011 2:14 PM SUCCESS     Connections:(Available=1 Affected=0 FailedToProcess=0 MarkedDown=0 Closed=0)(Borrowed=0 Affected=0 FailedToProcess=0 MarkedDown=0 MarkedDeferredClose=0 Closed=0) cardinality=2 targetedToTear=0 tornDown=0 markedToClose=0 targetUpEventNewConns=1 upEventNewConns=1
Apr 6, 2011 2:14 PM SUCCESS      Connections:(Available=2 Affected=1 FailedToProcess=0 MarkedDown=1 Closed=1)(Borrowed=0 Affected=0 FailedToProcess=0 MarkedDown=0 MarkedDeferredClose=0 Closed=0)
Apr 6, 2011 2:13 PM SUCCESS     Connections:(Available=1 Affected=0 FailedToProcess=0 MarkedDown=0 Closed=0)(Borrowed=0 Affected=0 FailedToProcess=0 MarkedDown=0 MarkedDeferredClose=0 Closed=0) cardinality=2 targetedToTear=0 tornDown=0 markedToClose=0 targetUpEventNewConns=1 upEventNewConns=1
Apr 6, 2011 2:12 PM SUCCESS      Connections:(Available=1 Affected=1 FailedToProcess=0 MarkedDown=1 Closed=1)(Borrowed=0 Affected=0 FailedToProcess=0 MarkedDown=0 MarkedDeferredClose=0 Closed=0)
Apr 6, 2011 2:12 PM SUCCESS     Connections:(Available=1 Affected=0 FailedToProcess=0 MarkedDown=0 Closed=0)(Borrowed=0 Affected=0 FailedToProcess=0 MarkedDown=0 MarkedDeferredClose=0 Closed=0) cardinality=2 targetedToTear=0 tornDown=0 markedToClose=0 targetUpEventNewConns=1 upEventNewConns=1
Apr 6, 2011 2:11 PM SUCCESS      Connections:(Available=2 Affected=1 FailedToProcess=0 MarkedDown=1 Closed=1)(Borrowed=0 Affected=0 FailedToProcess=0 MarkedDown=0 MarkedDeferredClose=0 Closed=0)
Apr 6, 2011 2:11 PM SUCCESS     Connections:(Available=1 Affected=0 FailedToProcess=0 MarkedDown=0 Closed=0)(Borrowed=0 Affected=0 FailedToProcess=0 MarkedDown=0 MarkedDeferredClose=0 Closed=0) cardinality=2 targetedToTear=0 tornDown=0 markedToClose=0 targetUpEventNewConns=1 upEventNewConns=1
Apr 6, 2011 2:10 PM SUCCESS      Connections:(Available=3 Affected=2 FailedToProcess=0 MarkedDown=2 Closed=2)(Borrowed=0 Affected=0 FailedToProcess=0 MarkedDown=0 MarkedDeferredClose=0 Closed=0)
Apr 6, 2011 2:10 PM SUCCESS     Connections:(Available=2 Affected=0 FailedToProcess=0 MarkedDown=0 Closed=0)(Borrowed=0 Affected=0 FailedToProcess=0 MarkedDown=0 MarkedDeferredClose=0 Closed=0) cardinality=2 targetedToTear=0 tornDown=0 markedToClose=0 targetUpEventNewConns=2 upEventNewConns=2
Apr 6, 2011 2:09 PM SUCCESS      Connections:(Available=6 Affected=4 FailedToProcess=0 MarkedDown=4 Closed=4)(Borrowed=0 Affected=0 FailedToProcess=0 MarkedDown=0 MarkedDeferredClose=0 Closed=0)
Apr 6, 2011 2:09 PM SUCCESS     Connections:(Available=3 Affected=0 FailedToProcess=0 MarkedDown=0 Closed=0)(Borrowed=0 Affected=0 FailedToProcess=0 MarkedDown=0 MarkedDeferredClose=0 Closed=0) cardinality=2 targetedToTear=0 tornDown=0 markedToClose=0 targetUpEventNewConns=3 upEventNewConns=3
Apr 6, 2011 2:08 PM SUCCESS      Connections:(Available=5 Affected=2 FailedToProcess=0 MarkedDown=2 Closed=2)(Borrowed=0 Affected=0 FailedToProcess=0 MarkedDown=0 MarkedDeferredClose=0 Closed=0)

** UCPPool : connection returned to pool
** UCPPool : retrieveConnection
** FCFTest : Query #183  -> instance[testdb2], host[rac5], service[testdb]

----------- UCP Details ---------
NumberOfAvailableConnections: 0
BorrowedConnectionsCount: 1
---------------------------------
Oracle confirmed this is related to using default database service and suggested to create and use application services instead of the default database service.

Update 12 March 2012
UCP delay in processing ONS events during data guard or RAC failover using FCF [ID 1380527.1]
Apply 12596492 patch (available for 11.2.0.2 and 11.2.0.3)