HA reconnect timeout

Morc
  Hi,

  We're using SwiftMQ 9.2.5 in a replicated HA setup. Due to some unidentified network problems, the keepalive counter reaches 0 and the API tries to reconnect. A few questions:
- Can the API be forced to retry the server it was connected to, rather than the other node?
- How can I adjust the reconnection timeout? It seems to be set to 60000 (60 seconds).

  Keepalive is set to 500 to detect instance failure quickly.

The JNDI URL the application uses is: smqp://host1:4001/host2=host2;port2=4001;timeout=10000;retrydelay=1000;maxretries=50;reconnect=true

  Thanks,

  Morc
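
[Editor's sketch] To make the settings in the JNDI URL above easier to read, the parameter part can be split into key/value pairs. This is plain string handling, not SwiftMQ code. The class name `JndiUrlParams` is hypothetical, and the "retry budget" is only an estimate that assumes `retrydelay` is in milliseconds and that each retry waits exactly that long, neither of which the thread confirms.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class JndiUrlParams {
    // Splits everything after the last '/' into key=value pairs separated by ';'.
    static Map<String, String> parse(String url) {
        Map<String, String> params = new LinkedHashMap<>();
        String tail = url.substring(url.lastIndexOf('/') + 1);
        for (String pair : tail.split(";")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return params;
    }

    // Assumption: total waiting time is simply retrydelay * maxretries.
    static long retryBudgetMillis(Map<String, String> params) {
        return Long.parseLong(params.get("retrydelay"))
                * Long.parseLong(params.get("maxretries"));
    }

    public static void main(String[] args) {
        String url = "smqp://host1:4001/host2=host2;port2=4001;timeout=10000;"
                + "retrydelay=1000;maxretries=50;reconnect=true";
        Map<String, String> p = parse(url);
        p.forEach((k, v) -> System.out.println(k + " = " + v));
        System.out.println("retry budget (ms): " + retryBudgetMillis(p)); // 50000
    }
}
```

Note that none of these parameters is the 60000 value asked about in the post.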

Re: HA reconnect timeout

IIT Software
Administrator
Reconnect takes place round-robin (host2 - host1 - host2 ..). See here for how to configure the reconnect behavior.

Re: HA reconnect timeout

Morc
Thanks, but what about the timeout when trying to reconnect? Can it be adjusted somehow, or are those 60 seconds hardwired into the code? If the connection breaks or the keepalive counter ticks down, the API will try the standby node, which unnecessarily delays the reconnect by at least a minute.

Re: HA reconnect timeout

IIT Software
Administrator
Which timeout do you mean? If the client tries to reconnect and it doesn't get a connection, it immediately gets a SocketException and tries the other host.

If you mean that it takes time for the client to detect a broken network, then you may adjust the keep alive interval in the connection factory. The connection is marked as dead and disconnected after 5 missing keep alive messages.
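
[Editor's sketch] The detection time implied by the reply above is simple arithmetic: five missed keep alive messages times the keep alive interval. The snippet below assumes the interval of 500 from the original post is in milliseconds, which the thread does not state explicitly. The class and method names are hypothetical.

```java
public class KeepaliveDetection {
    // From the reply above: the connection is marked dead after 5 missing keep alives.
    static final int MISSED_KEEPALIVES = 5;

    // Approximate time until a broken connection is detected.
    static long detectionMillis(long keepaliveIntervalMillis) {
        return MISSED_KEEPALIVES * keepaliveIntervalMillis;
    }

    public static void main(String[] args) {
        // Keepalive of 500 as in the original post (assumed to be milliseconds).
        System.out.println(detectionMillis(500)); // prints 2500
    }
}
```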

Re: HA reconnect timeout

Morc
Well, the problem is that, due to a firewall between the two machines, the TCP connection does come up, but nothing comes through it because the port on the far end is not actually open. I don't know why it works like that; I'll talk to the network admins, as this is really weird. Anyhow, please note the one-minute gap between the two log lines (09:11:18 vs. 09:12:18) and the delay=60000 value. Is that waiting time configurable, and if so, where and how?
Attached: reconnect_log.txt
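
[Editor's sketch] The situation described above, where the TCP connection comes up but nothing comes through it, can be reproduced with plain `java.net` sockets. This is not SwiftMQ code and not the firewall itself. A local listening socket that never accepts or answers stands in for the silent peer, and the read timeout stands in for whatever timeout the client applies while waiting for a reply. The class name `SilentPeerDemo` is hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class SilentPeerDemo {
    // Returns true if connect() succeeded but no data arrived within the timeout.
    static boolean connectsButTimesOut(int readTimeoutMillis) {
        InetAddress local = InetAddress.getLoopbackAddress();
        // The server socket listens but never calls accept() and never writes;
        // the OS still completes the TCP handshake from the listen backlog.
        try (ServerSocket silent = new ServerSocket(0, 1, local);
             Socket client = new Socket()) {
            client.connect(new InetSocketAddress(local, silent.getLocalPort()), 1000);
            client.setSoTimeout(readTimeoutMillis);
            try {
                client.getInputStream().read(); // blocks until data or timeout
                return false;
            } catch (SocketTimeoutException e) {
                return true; // connected, but the peer stayed silent
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("connected but timed out: " + connectsButTimesOut(500));
    }
}
```

Because the connect step succeeds, no immediate SocketException is raised; the client only moves on once its own wait expires.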

Re: HA reconnect timeout

IIT Software
Administrator
OK, I see. This is configured by the system property "swiftmq.request.timeout". The default is 60000. Look here.
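
[Editor's sketch] A JVM system property such as the one named above is normally set either on the command line (`-Dswiftmq.request.timeout=10000`) or programmatically before the library reads it. The snippet below shows only the generic JVM mechanism. The value 10000 is a made-up example, the helper `effectiveTimeout` is hypothetical and merely mirrors a typical lookup with the default from the reply, and when exactly SwiftMQ reads the property is not stated in this thread, so setting it as early as possible is an assumption.

```java
public class RequestTimeoutProperty {
    static final String KEY = "swiftmq.request.timeout";
    static final long DEFAULT_MILLIS = 60000L; // default according to the reply above

    // Generic lookup of a numeric system property with a fallback default.
    static long effectiveTimeout() {
        return Long.getLong(KEY, DEFAULT_MILLIS);
    }

    public static void main(String[] args) {
        // Set before creating any connections (assumption: the client
        // library picks the value up when it is first needed).
        System.setProperty(KEY, "10000"); // hypothetical example value
        System.out.println(effectiveTimeout()); // prints 10000
    }
}
```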

Re: HA reconnect timeout

Morc
Cool, thanks!

Re: HA reconnect timeout

Morc
I have to return to this problem. I thought it was solved for good, but today we had another event where the reconnection took more than a minute. What I see in the log is that it tried to connect to the inactive node and, again, waited 60 seconds before trying the active node. Why?
Please see the attached log and note the one-minute gap between 06:17:33 and 06:18:36.
Attached: reconnect_log2.txt

Re: HA reconnect timeout

IIT Software
Administrator
It gets stuck while creating the connection. Could you please take a thread dump of your client when it is stuck there?
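
[Editor's sketch] A thread dump of a running JVM is usually taken from outside with the JDK's `jstack <pid>` tool. If that is not convenient, a rough equivalent can be produced from inside the client with the standard `Thread.getAllStackTraces()` call, as sketched below. The class name `ThreadDump` is hypothetical, and how to trigger it in the application (timer, signal, admin hook) is left open.

```java
import java.util.Map;

public class ThreadDump {
    // Builds a text dump with the name, state and stack of every live thread.
    static String dump() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append("\" state=")
              .append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(dump());
    }
}
```

The thread that is blocked in connection creation should show up with the socket or wait call at the top of its stack.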