JMS Bridge stalled - router freezes

TheQL
Hello,

We have problems with a JMS bridge from time to time. This seems to affect all swiftlets: when one of them dies for some reason, you cannot stop it without freezing the router.

In this case the JMS bridge doesn't do anything, although it appears to be running and connected. This is often preceded by a connection loss caused by a network outage, but when the connection to the other router is re-established, the bridge does not always resume working properly. One would assume that stopping the JMS bridge and starting it again would force a manual reconnect. But the stop times out in Explorer after 60,000 ms. Afterwards it isn't even possible to halt the active instance or do anything at all in SwiftMQ-Explorer. Issuing a kill via the shell initiates a shutdown, but because the JMS bridge cannot be stopped, the router doesn't shut down. I have to run kill -9 against it manually.

I have created a thread dump in the shutdown state, though:
Shutdown SwiftMQ 9.4.1 Production ...
... shutdown: JMS Bridge Extension Swiftlet
2014-05-09 20:07:04
Full thread dump Java HotSpot(TM) 64-Bit Server VM (20.45-b01 mixed mode):

"SIGTERM handler" daemon prio=10 tid=0x000000004f076000 nid=0x4f68 waiting for monitor entry [0x00002acc28d12000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at java.lang.Shutdown.exit(Shutdown.java:168)
        - waiting to lock <0x000000077b50ce88> (a java.lang.Class for java.lang.Shutdown)
        at java.lang.Terminator$1.handle(Terminator.java:35)
        at sun.misc.Signal$1.run(Signal.java:195)
        at java.lang.Thread.run(Thread.java:662)

"SIGTERM handler" daemon prio=10 tid=0x000000004efa8800 nid=0x4f5a waiting for monitor entry [0x00002acc420e2000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at java.lang.Shutdown.exit(Shutdown.java:168)
        - waiting to lock <0x000000077b50ce88> (a java.lang.Class for java.lang.Shutdown)

If you need more of the dump, I can send you the file; I didn't want to post it all here.
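
For reference, the two "SIGTERM handler" threads in the excerpt are only waiting; the thread that matters is whichever one holds the java.lang.Shutdown lock. Below is a minimal Java sketch of how BLOCKED threads and their lock owners can be listed from inside a JVM with the standard ThreadMXBean. The names BlockedThreads, describeBlocked and demo are made up for illustration, and the code inspects only the JVM it runs in, so it does not replace an external dump like the one taken here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.ArrayList;
import java.util.List;

class BlockedThreads {
    /** One line per BLOCKED thread: "<thread> waits for <lock> held by <owner>". */
    static List<String> describeBlocked() {
        List<String> lines = new ArrayList<>();
        ThreadInfo[] infos = ManagementFactory.getThreadMXBean().dumpAllThreads(false, false);
        for (ThreadInfo info : infos) {
            if (info != null && info.getThreadState() == Thread.State.BLOCKED) {
                lines.add(info.getThreadName() + " waits for " + info.getLockName()
                        + " held by " + info.getLockOwnerName());
            }
        }
        return lines;
    }

    /** Creates one artificially blocked thread and reports it (illustration only). */
    static List<String> demo() {
        Object lock = new Object();
        Thread waiter = new Thread(() -> { synchronized (lock) { } }, "waiter");
        waiter.setDaemon(true);
        synchronized (lock) {
            waiter.start();
            try {
                // Wait until the waiter is actually parked on the monitor we hold.
                while (waiter.getState() != Thread.State.BLOCKED) {
                    Thread.sleep(10);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return describeBlocked();
        }
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

The lock owner name is the useful part: in a dump like the one above it points to the thread that is still inside the shutdown sequence.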

Thanks!

Re: JMS Bridge stalled - router freezes

IIT Software
Administrator
I'll have a look, but I guess it's the old problem that lies in the foreign JMS: it disconnects but doesn't inform the ExceptionListener, and then hangs on close.

Re: JMS Bridge stalled - router freezes

TheQL
Thanks for looking into it, but the foreign JMS might be innocent, as there is no proper connection close at all. The VPN router that establishes the connection to the foreign JMS fails and drops the connection. In my opinion the swiftlet should detect this and try to reconnect indefinitely until the connection is back up. Anyway, even if it fails to detect this, it would be great if the router didn't die when you try to stop the swiftlet.

We have similar issues with the SNMP swiftlet from time to time. I don't know what causes the swiftlet to stop functioning, but when you try to stop it, the router dies in the same way as described above.

Re: JMS Bridge stalled - router freezes

IIT Software
Administrator
The Swiftlet uses the JMS implementation of the foreign JMS server. There are exactly 2 ways to get notified about a connection problem in JMS:

1) A JMSException during any synchronous JMS method call

2) Through a JMS ExceptionListener registered on the foreign JMS connection

The Swiftlet handles both and forces a connection close. During shutdown the Swiftlet closes all foreign JMS connections by calling their close method. It seems that this call hangs (you have not posted that part of the thread dump).

So, given what you state, the VPN router "drops" the connection. How is this "drop" performed? A TCP connection is packet oriented. It is only closed if one or both sides agree to close it (by exchanging the respective TCP packets, which closes that side's half of the socket) or through a timeout. So a VPN router can't actually "drop" a connection.

Re: JMS Bridge stalled - router freezes

TheQL
Thanks for coming back to this!

First of all I can gladly post or send the rest of the thread dump, if desired.

I don't know what kind of problem causes the VPN connection to be lost, but it is in no way a planned connection shutdown. Some kind of problem with the connection occurs and then it's just dead.

Re: JMS Bridge stalled - router freezes

IIT Software
Administrator
Yes, please post the whole dump.

Re: JMS Bridge stalled - router freezes

TheQL
Here you are: jms_bridge_dump.txt

Re: JMS Bridge stalled - router freezes

IIT Software
Administrator
It's what I assumed. It hangs while shutting down the foreign bridge (the first step is to disable the message listener):

"SwiftMQ-bridge.server-28" daemon prio=10 tid=0x000000004e700800 nid=0x3aa1 in Object.wait() [0x00002acc3f0b2000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:485)
        at com.ibm.mq.jmqi.remote.internal.system.ReentrantMutex.acquire(ReentrantMutex.java:92)
        - locked <0x0000000794784d70> (a com.ibm.mq.jmqi.remote.internal.RemoteHconn$DispatchLock)
        at com.ibm.mq.jmqi.remote.internal.RemoteHconn.requestDispatchLock(RemoteHconn.java:585)
        at com.ibm.mq.jmqi.remote.internal.RemoteFAP.MQCTL(RemoteFAP.java:2114)
        at com.ibm.msg.client.wmq.internal.WMQConsumerOwnerShadow.suspendAsyncService(WMQConsumerOwnerShadow.java:698)
        at com.ibm.msg.client.wmq.internal.WMQSession.suspendAsyncService(WMQSession.java:1577)
        at com.ibm.msg.client.wmq.internal.WMQAsyncConsumerShadow.deregisterMessageListener(WMQAsyncConsumerShadow.java:729)
        - locked <0x000000079478ce08> (a java.lang.Object)
        at com.ibm.msg.client.wmq.internal.WMQAsyncConsumerShadow.setMessageListener(WMQAsyncConsumerShadow.java:882)
        at com.ibm.msg.client.wmq.internal.WMQMessageConsumer.setMessageListener(WMQMessageConsumer.java:493)
        at com.ibm.msg.client.jms.internal.JmsMessageConsumerImpl.setMessageListener(JmsMessageConsumerImpl.java:527)
        - locked <0x000000079478c460> (a java.lang.Object)
        at com.ibm.mq.jms.MQMessageConsumer.setMessageListener(MQMessageConsumer.java:318)
        at com.swiftmq.extension.bridge.RemoteQueueBridgeSource.destroy(Unknown Source)
        at com.swiftmq.extension.bridge.DestinationBridge.d(Unknown Source)
        - locked <0x0000000794789168> (a com.swiftmq.extension.bridge.DestinationBridge)
        at com.swiftmq.extension.bridge.ServerBridge.b(Unknown Source)
        at com.swiftmq.extension.bridge.ServerBridge.b(Unknown Source)
        at com.swiftmq.extension.bridge.ServerBridge.f(Unknown Source)
        - locked <0x00000007814f9ad0> (a com.swiftmq.extension.bridge.ServerBridge)
        at com.swiftmq.extension.bridge.BridgeSwiftlet.a(Unknown Source)
        - locked <0x00000007814fcab0> (a com.swiftmq.extension.bridge.BridgeSwiftlet)
        at com.swiftmq.extension.bridge.BridgeSwiftlet.a(Unknown Source)
        - locked <0x00000007814fcab0> (a com.swiftmq.extension.bridge.BridgeSwiftlet)
        at com.swiftmq.extension.bridge.BridgeSwiftlet.b(Unknown Source)
        at com.swiftmq.extension.bridge.m.run(Unknown Source)
        at com.swiftmq.impl.threadpool.standard.PoolThread.run(Unknown Source)

If the physical connection is lost (like pulling the cable), then only a timeout mechanism can lead to a close. Check MQ's docs for how to configure it, and set a very low value.
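
To illustrate why only a timeout helps: if the peer simply goes silent (no FIN or RST ever arrives), a blocking read waits indefinitely unless a timeout is set. The sketch below uses plain java.net sockets on loopback, not the WebSphere MQ client. SilentPeerDemo and readTimesOut are hypothetical names, and which equivalent settings the MQ client exposes has to be taken from its documentation.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

class SilentPeerDemo {
    /** Returns true if a read from a peer that never sends anything was ended by the read timeout. */
    static boolean readTimesOut(int soTimeoutMillis) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {
            // Application-level read timeout: without it, read() below would block forever.
            client.setSoTimeout(soTimeoutMillis);
            // TCP keepalive is the OS-level alternative; its probe interval is set by the
            // operating system and is often far too long by default to be useful here.
            client.setKeepAlive(true);
            InputStream in = client.getInputStream();
            try {
                in.read();          // the accepted side stays open but silent
                return false;       // data or end-of-stream arrived (not expected in this demo)
            } catch (SocketTimeoutException e) {
                return true;        // the timeout is the only signal that something is wrong
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read ended by timeout: " + readTimesOut(500));
    }
}
```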

Re: JMS Bridge stalled - router freezes

TheQL
Thanks for your reply.

Do you mean the foreign MQ's docs? I am not sure where I would be able to set a timeout in the JMS bridge configuration. All there is is the retry interval.

On the other hand, do you consider it normal that the entire router becomes unresponsive while waiting for a connection close that is destined to fail?

Re: JMS Bridge stalled - router freezes

IIT Software
Administrator
You should consult the WebSphere MQ docs to see whether you can set connection properties or a system property to configure some kind of keepalive timeout.

If we don't close, we force memory or connection leaks. One is as bad as the other. The best outcome is that the foreign JMS behaves properly.

Re: JMS Bridge stalled - router freezes

TheQL
I had the problem again today. There was a log entry:

2014-09-17 08:18:38.581/xt$bridge/ERROR/SWORD_C2k/onException: com.ibm.msg.client.jms.DetailedJMSException: JMSWMQ1107: A problem with this connection has occurred. An error has occurred with the WebSphere MQ JMS connection. Use the linked exception to determine the cause of this error.

The exception from /swiftmq/scripts/unix/mqjms.log (not the ideal location for a log) was:
ACTION:
Review the exception details for further information.
--------------------------------------------------------------------
17/09/2014 08:18:38 [DispatchThread: 1] com.ibm.msg.client.jms.internal.JmsProviderExceptionListener
An exception has been delivered to the connections exception listener: '
                       Message : com.ibm.msg.client.jms.DetailedJMSException: JMSWMQ1107: A problem with this connection has occurred. An error has occurred with the WebSphere MQ JMS connection. Use the linked exception to determine the cause of this error.
                         Class : class com.ibm.msg.client.jms.DetailedJMSException
                         Stack : com.ibm.msg.client.wmq.common.internal.Reason.reasonToException(Reason.java:608)
                               : com.ibm.msg.client.wmq.common.internal.Reason.createException(Reason.java:236)
                               : com.ibm.msg.client.wmq.internal.WMQConnection.consumer(WMQConnection.java:834)
                               : com.ibm.mq.jmqi.remote.internal.RemoteAsyncConsume.callEventHandler(RemoteAsyncConsume.java:1021)
                               : com.ibm.mq.jmqi.remote.internal.RemoteAsyncConsume.driveEventsEH(RemoteAsyncConsume.java:1379)
                               : com.ibm.mq.jmqi.remote.internal.RemoteDispatchThread.run(RemoteDispatchThread.java:309)
                               : com.ibm.msg.client.commonservices.workqueue.WorkQueueItem.runTask(WorkQueueItem.java:209)
                               : com.ibm.msg.client.commonservices.workqueue.SimpleWorkQueueItem.runItem(SimpleWorkQueueItem.java:100)
                               : com.ibm.msg.client.commonservices.workqueue.WorkQueueItem.run(WorkQueueItem.java:224)
                               : com.ibm.msg.client.commonservices.workqueue.WorkQueueManager.runWorkQueueItem(WorkQueueManager.java:298)
                               : com.ibm.msg.client.commonservices.j2se.workqueue.WorkQueueManagerImplementation$ThreadPoolWorker.run(WorkQueueManagerImplementation.java:1220)
     Caused by [1] --> Message : com.ibm.mq.MQException: JMSCMQ0001: WebSphere MQ call failed with compcode '2' ('MQCC_FAILED') reason '2009' ('MQRC_CONNECTION_BROKEN').
                         Class : class com.ibm.mq.MQException
                         Stack : com.ibm.msg.client.wmq.common.internal.Reason.createException(Reason.java:223)
                               : com.ibm.msg.client.wmq.internal.WMQConnection.consumer(WMQConnection.java:834)
                               : com.ibm.mq.jmqi.remote.internal.RemoteAsyncConsume.callEventHandler(RemoteAsyncConsume.java:1021)
                               : com.ibm.mq.jmqi.remote.internal.RemoteAsyncConsume.driveEventsEH(RemoteAsyncConsume.java:1379)
                               : com.ibm.mq.jmqi.remote.internal.RemoteDispatchThread.run(RemoteDispatchThread.java:309)
                               : com.ibm.msg.client.commonservices.workqueue.WorkQueueItem.runTask(WorkQueueItem.java:209)
                               : com.ibm.msg.client.commonservices.workqueue.SimpleWorkQueueItem.runItem(SimpleWorkQueueItem.java:100)
                               : com.ibm.msg.client.commonservices.workqueue.WorkQueueItem.run(WorkQueueItem.java:224)
                               : com.ibm.msg.client.commonservices.workqueue.WorkQueueManager.runWorkQueueItem(WorkQueueManager.java:298)
                               : com.ibm.msg.client.commonservices.j2se.workqueue.WorkQueueManagerImplementation$ThreadPoolWorker.run(WorkQueueManagerImplementation.java:1220)
'.

EXPLANATION:
null

ACTION:
Review the exception details for further information.

Anyway, the connection was dead although it seemed to be up. I tried to stop the bridge, and that didn't work. I ended up with a dead router once more. I would really appreciate at least having a way to kill the thread instead of the entire SwiftMQ router, if you can think of any possible way to do that.

Also, if I knew any parameters for the MQ client, how would I pass them to the client?

Re: JMS Bridge stalled - router freezes

IIT Software
Administrator
This is a WMQ client (configuration?) problem: keepalive and the like. We will check whether we can work around it, e.g. by doing the close asynchronously with a timeout. But even then there is a blocked thread hanging on the WMQ connection close...
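
A rough sketch of what a close with a timeout could look like in plain Java, assuming the foreign connection can be treated as an AutoCloseable. TimedClose and closeWithTimeout are hypothetical names, not SwiftMQ code. It also shows the limitation mentioned above: on timeout the caller moves on, but the worker thread may stay blocked inside close() if the foreign client ignores interrupts.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class TimedClose {
    // Daemon threads, so a close() that never returns cannot keep the JVM alive on shutdown.
    private static final ExecutorService CLOSER = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "timed-close");
        t.setDaemon(true);
        return t;
    });

    /** Returns true if close() returned within the timeout, false if the caller gave up waiting. */
    static boolean closeWithTimeout(AutoCloseable resource, long timeoutMillis) {
        Future<?> pending = CLOSER.submit(() -> { resource.close(); return null; });
        try {
            pending.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (ExecutionException e) {
            return true;            // close() threw, but it did return; nothing is left hanging
        } catch (TimeoutException e) {
            pending.cancel(true);   // best effort: interrupts the worker, which may or may not react
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        AutoCloseable healthy = () -> { };
        CountDownLatch never = new CountDownLatch(1);
        AutoCloseable hanging = () -> never.await();   // stands in for a close() that hangs
        System.out.println(closeWithTimeout(healthy, 1000));  // true
        System.out.println(closeWithTimeout(hanging, 200));   // false
    }
}
```

In this demo the hanging close reacts to the interrupt, so the worker thread ends. A client library that swallows interrupts would leave that thread parked, which is the leak trade-off described earlier in the thread.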

Re: JMS Bridge stalled - router freezes

TheQL
Any improvement will be appreciated. In combination with the HA negotiation problem and null pointer exception with JMS bridges this is a great deal of maintenance work and leads to unnecessary service interruptions.