Management Swiftlet dies on failover of remote instance in Router network

TheQL
Hi,

I have had to perform a few failovers recently and am starting to recognize a pattern. We have a router network with 3 HA routers. When I perform a failover at one location, this exception quite regularly arises at another location:

Got Exception:
    ThreadGroup: mgmt
    ActiveTask : PipelineQueue, dispatchToken=sys$mgmt.dispatchqueue
Stack Trace:
java.lang.NullPointerException
        at com.swiftmq.impl.mgmt.standard.v750.DispatcherImpl.doExpire(Unknown Source)
        at com.swiftmq.impl.mgmt.standard.DispatchQueue.c(Unknown Source)
        at com.swiftmq.impl.mgmt.standard.DispatchQueue.visit(Unknown Source)
        at com.swiftmq.impl.mgmt.standard.po.CheckExpire.accept(Unknown Source)
        at com.swiftmq.tools.pipeline.PipelineQueue.process(Unknown Source)
        at com.swiftmq.tools.queue.SingleProcessorQueue.dequeue(Unknown Source)
        at com.swiftmq.tools.pipeline.PipelineQueue$QueueProcessor.run(Unknown Source)
        at com.swiftmq.impl.threadpool.standard.PoolThread.run(Unknown Source)

After this exception, the Management Swiftlet at that location is dead. Memory consumption also becomes huge, causing the GC to run permanently and performance to degrade. I then perform a failover on the broken router; since the Management Swiftlet does not terminate, a kill -9 is needed. Once all failovers are through and HA is back, all is well again. This issue does not seem to exist in 9.6.0, but it does in 9.7.1.

I also have a stack trace of the router.

Re: Management Swiftlet dies on failover of remote instance in Router network

IIT Software
Administrator
I've created a job fix. We fixed an NPE in the Management Swiftlet in 9.7.2 that occurred when the trace space "kernel" was enabled. Was it enabled when this happened?

Re: Management Swiftlet dies on failover of remote instance in Router network

TheQL
Hi, sorry, I gave you incorrect information: we are running 9.7.3. Also, no trace swiftlet is enabled on our sites; I only enable tracing temporarily when I really think I need it.

ALSO: Today I performed a lot of failovers without any of the remote mgmt swiftlets throwing an exception, and then it did happen shortly after I wrote this *sigh*