Extension Swiftlet can lose data on failover

TheQL
Hi,

It's hard to reproduce, I figure, because it has only happened twice here in over a year of running SwiftMQ 10.2.0. But since it happened twice and not just once, I thought I'd let you know.

On the first occurrence, after a double failover from instance 1 to 2 and back again, which seemingly worked fine, we noticed broken messages consumed by the JavaMail Bridge Extension Swiftlet. Upon investigation I found that on one of our many bridges a single property translation was no longer configured. I then saved the config, diffed routerconfig.xml against the backup version, and could verify that only this one translation was missing. I then copied the backup file over routerconfig.xml, the watchdog picked up the change and added the property, and everything was fine.

Today the same thing happened on a different HA cluster. Here a JMS Bridge had lost the "remote_to_local" bridge definition. I could easily verify this by performing the same steps as above.

2018-08-21 14:56:27.545/SwiftletManager/INFORMATION/ConfigfileWatchdog/performTimeAction/applyNewEntities, context=/xt$bridge/servers/somebridge/bridgings, entity added=copy remote to local
2018-08-21 14:57:27.725/SwiftletManager/INFORMATION/ConfigfileWatchdog/performTimeAction/applyNewEntities, context=/xt$bridge/servers/somebridge/bridgings, entity added=copy remote to local
2018-08-21 14:58:27.841/SwiftletManager/INFORMATION/ConfigfileWatchdog/performTimeAction/applyNewEntities, context=/xt$bridge/servers/somebridge/bridgings, entity added=copy remote to local

This time the watchdog did attempt to create the bridge but evidently failed, because it kept retrying every minute.
The error.log revealed this:

2018-08-21 14:58:27.843/xt$bridge/somebridge/ERROR/onEntityAdd (bridgings): Exception creating bridging 'copy remote to local': java.lang.NullPointerException
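
A repeated "entity added" entry for the same context and entity is what gives the failed creation away. A minimal Python sketch for spotting this in a log, assuming the line format shown in the excerpt above; the function name `repeated_adds` and the threshold are made up for illustration:

```python
import re
from collections import Counter

# Matches the ConfigfileWatchdog lines quoted above (format assumed from them).
ADDED = re.compile(
    r"applyNewEntities, context=(?P<context>[^,]+), entity added=(?P<entity>.+)$"
)


def repeated_adds(log_lines, threshold=2):
    """Return {(context, entity): count} for entities added at least `threshold` times."""
    counts = Counter()
    for line in log_lines:
        match = ADDED.search(line.strip())
        if match:
            counts[(match["context"], match["entity"])] += 1
    return {key: n for key, n in counts.items() if n >= threshold}


sample = [
    "2018-08-21 14:56:27.545/SwiftletManager/INFORMATION/ConfigfileWatchdog/performTimeAction/applyNewEntities, context=/xt$bridge/servers/somebridge/bridgings, entity added=copy remote to local",
    "2018-08-21 14:57:27.725/SwiftletManager/INFORMATION/ConfigfileWatchdog/performTimeAction/applyNewEntities, context=/xt$bridge/servers/somebridge/bridgings, entity added=copy remote to local",
    "2018-08-21 14:58:27.841/SwiftletManager/INFORMATION/ConfigfileWatchdog/performTimeAction/applyNewEntities, context=/xt$bridge/servers/somebridge/bridgings, entity added=copy remote to local",
]
suspicious = repeated_adds(sample)
print(suspicious)
```

An entity that shows up here is one the watchdog keeps re-applying, so the matching entry in error.log is the next place to look.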

I then deleted the entire bridge, saved the config, and copied the backup routerconfig.xml over again. This time it worked and the entire bridge was added. Just wanted to let you know...

Re: Extension Swiftlet can lose data on failover

IIT Software
Administrator
Thank you. I will try to reproduce it!

Re: Extension Swiftlet can lose data on failover

TheQL
Thinking about it some more, the NPE could have been the reason the bridge property disappeared on failover. But I don't believe there was an NPE in the first incident with the JavaMail Bridge. I could dig through our logs to verify, if you'd like.

Re: Extension Swiftlet can lose data on failover

IIT Software
Administrator
It only affects Extension Swiftlets. They differ from Kernel Swiftlets in that they are loaded with a delay: they are loaded in another thread, and the delay depends on the interval configured for the deploy space of the Deploy Swiftlet.

So if you have a failover:

- active replicates config to standby on initial connect
- but only for those Swiftlets that have been registered in the Management Tree
- Extension Swiftlets register in the tree upon load which takes place after a delay
- therefore the Extension Swiftlets load their config from the last saved state in the routerconfig of the standby

This is what I need to test. Maybe a simple auto-save after a config replication on an initial connect to the standby would solve that.
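
The sequence in the list above can be illustrated with a small toy model in Python. This is only a sketch of the described behaviour, not SwiftMQ code: the function names and the shape of the config dictionaries are invented for illustration, and `sys$queuemanager` merely stands in for any Kernel Swiftlet.

```python
def replicate_on_connect(active_config, registered_on_standby):
    """Copy config from active, but only for Swiftlets already registered."""
    return {
        name: config
        for name, config in active_config.items()
        if name in registered_on_standby
    }


def standby_config_after_failover(active_config, standby_saved, kernel, extension):
    # Kernel Swiftlets are registered by the time the instances connect,
    # so they receive the replicated config.
    replicated = replicate_on_connect(active_config, set(kernel))
    effective = {name: replicated.get(name, standby_saved.get(name)) for name in kernel}
    # Extension Swiftlets register later (after the deploy interval), miss the
    # replication and fall back to the standby's last saved routerconfig.
    for name in extension:
        effective[name] = standby_saved.get(name)
    return effective


# Hypothetical state: the standby's saved file is older than the active config.
active = {"sys$queuemanager": ["q1", "q2"], "xt$bridge": ["copy remote to local"]}
standby_saved = {"sys$queuemanager": ["q1"], "xt$bridge": []}

result = standby_config_after_failover(
    active, standby_saved, kernel=["sys$queuemanager"], extension=["xt$bridge"]
)
print(result)
```

In this model the bridging only survives the failover if the standby's saved file already contains it, which is why saving on both instances works around the problem.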

In the meantime, try to save every change on both active and standby. If standby is not running and you change the config, you might run into this issue, I guess.

I have created a job to fix this. It will probably be fixed in the next release.

Re: Extension Swiftlet can lose data on failover

TheQL
Hi,

Thanks for investigating!

Anyway, we usually save the config pretty often, and the standby instance is hardly ever unavailable when we do. As a matter of fact, the JavaMail Bridge that initially had this issue has been unchanged for quite a while... Nevertheless, your fixes will be an improvement, I believe.

Re: Extension Swiftlet can lose data on failover

TheQL
In reply to this post by IIT Software
Hello,

This just happened again. I have made another observation, though: this time there was only a single failover from active to standby, which gave me another perspective on the issue!

Although the standby config was saved on the same date as the active one, the property that disappeared was already missing from the saved config on standby. I can't explain why this was the case, and especially why we encounter this issue so regularly now while it was never an issue in the past. Just wanted to let you know.

See for yourself:

Active
-------
[root@mq1 replicated]$ ls
...
-rw-r--r--. 1 swiftmq swiftmq 254361 26. Sep 16:00 routerconfig.xml.20181005111307919
...
[root@mq1 replicated]$ grep -A 5 "name=\"XXX_Mailbridge_Inbox\" enabled" routerconfig.xml.20181005111307919
      <outbound-bridge name="XXX_Mailbridge_Inbox" enabled="true" smtp-host="localhost" source-name="XXX_Mailbridge_Inbox" transformer-class="com.swiftmq.outbound.Base64MailTransformer">
        <default-headers>
          <default-header name="from" value="swiftmq-admin@example.com"/>
          <default-header name="subject" value="data"/>
          <default-header name="to" value="recipient@exampe.com"/>
        </default-headers>

Standby
---------
[root@mq2 replicated]$ ls
...
-rw-r--r--. 1 swiftmq swiftmq 253933 26. Sep 16:00 routerconfig.xml.20181005111307922
...

[root@mq2 replicated]$ grep -A 5 "name=\"XXX_Mailbridge_Inbox\" enabled" routerconfig.xml.20181005111307922
      <outbound-bridge name="XXX_Mailbridge_Inbox" enabled="true" smtp-host="localhost" source-name="XXX_Mailbridge_Inbox" transformer-class="com.swiftmq.outbound.Base64MailTransformer">
        <default-headers/>
        <header-translations/>
        <javamail-properties/>
      </outbound-bridge>
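
The same comparison can be done per element instead of with grep. Below is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. The helper names are made up, and the XML fragments are trimmed from the output above and wrapped in a hypothetical `<router>` root so that they parse on their own.

```python
import xml.etree.ElementTree as ET


def child_counts(config_xml, tag, name):
    """Return {section tag: number of entries} for the first <tag name="...">."""
    root = ET.fromstring(config_xml)
    for elem in root.iter(tag):
        if elem.get("name") == name:
            return {child.tag: len(child) for child in elem}
    return None


def lost_entries(active_xml, standby_xml, tag, name):
    """Sections that hold fewer entries on standby: {section: (active, standby)}."""
    active = child_counts(active_xml, tag, name) or {}
    standby = child_counts(standby_xml, tag, name) or {}
    return {
        section: (count, standby.get(section, 0))
        for section, count in active.items()
        if count > standby.get(section, 0)
    }


ACTIVE = """<router>
  <outbound-bridge name="XXX_Mailbridge_Inbox" enabled="true">
    <default-headers>
      <default-header name="from" value="swiftmq-admin@example.com"/>
      <default-header name="subject" value="data"/>
      <default-header name="to" value="recipient@exampe.com"/>
    </default-headers>
  </outbound-bridge>
</router>"""

STANDBY = """<router>
  <outbound-bridge name="XXX_Mailbridge_Inbox" enabled="true">
    <default-headers/>
    <header-translations/>
    <javamail-properties/>
  </outbound-bridge>
</router>"""

lost = lost_entries(ACTIVE, STANDBY, "outbound-bridge", "XXX_Mailbridge_Inbox")
print(lost)
```

For real files, the two strings would be the contents of the saved routerconfig.xml copies on each instance. Because `iter()` searches all descendants, the name of the actual root element does not matter.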

Re: Extension Swiftlet can lose data on failover

IIT Software
Administrator
Thank you, this helps. This will be fixed in the next release.