We're running SwiftMQ 9.2.5 in HA. The server handles a few hundred messages a second at most, running on a multi-core physical server. Normal CPU usage is between 120% and 160%. When I try to connect to the server (no authentication) with SwiftMQ Explorer, it never displays the configuration tree or any error message. The displayed RAM usage goes up and down in cycles, and the SwiftMQ Router process CPU usage spikes to 500-600% or even higher and stays high until I close SwiftMQ Explorer. This is what I see in the SwiftMQ Router log:
2016-07-13 13:12:00.845/[BlockingTCPListener, swiftlet=sys$jms, port=4101]/Listener/INFORMATION/connection accepted: 10.0.0.1
2016-07-13 13:12:00.911/sys$jms/INFORMATION/JMSConnection v750/10.0.0.1:55416/receiving disconnect request, scheduling connection close
2016-07-13 13:12:00.913/10.0.0.1:55416/BlockingHandler/INFORMATION/Exception, EXITING: java.io.IOException: End-of-Stream reached
2016-07-13 13:12:00.913/sys$jms/INFORMATION/JMSConnection v750/10.0.0.1:55416/connection closed
2016-07-13 13:12:00.915/[BlockingTCPListener, swiftlet=sys$jms, port=4101]/Listener/INFORMATION/connection accepted: 10.0.0.1
2016-07-13 13:12:01.110/sys$mgmt/INFORMATION/SwiftMQ Explorer connected from host 'Client01'
2016-07-13 13:12:24.928/10.0.0.1:55417/BlockingHandler/INFORMATION/Exception, EXITING: java.io.IOException: End-of-Stream reached
2016-07-13 13:12:24.930/sys$jms/INFORMATION/JMSConnection v750/10.0.0.1:55417/connection closed
2016-07-13 13:12:27.935/[BlockingTCPListener, swiftlet=sys$jms, port=4101]/Listener/INFORMATION/connection accepted: 10.0.0.1
2016-07-13 13:12:27.962/sys$mgmt/INFORMATION/SwiftMQ Explorer connected from host 'Client01'
2016-07-13 13:12:47.933/10.0.0.1:55419/BlockingHandler/INFORMATION/Exception, EXITING: java.io.IOException: End-of-Stream reached
2016-07-13 13:12:47.936/sys$jms/INFORMATION/JMSConnection v750/10.0.0.1:55419/connection closed
2016-07-13 13:12:50.939/[BlockingTCPListener, swiftlet=sys$jms, port=4101]/Listener/INFORMATION/connection accepted: 10.0.0.1
2016-07-13 13:12:50.971/sys$mgmt/INFORMATION/SwiftMQ Explorer connected from host 'Client01'
2016-07-13 13:13:09.343/sys$mgmt/INFORMATION/SwiftMQ Explorer disconnected from host 'Client01'. Reason: Lease Timeout
2016-07-13 13:13:09.343/sys$mgmt/INFORMATION/SwiftMQ Explorer disconnected from host 'Client01'
2016-07-13 13:13:11.437/10.0.0.1:55421/BlockingHandler/INFORMATION/Exception, EXITING: java.io.IOException: End-of-Stream reached
2016-07-13 13:13:11.441/sys$jms/INFORMATION/JMSConnection v750/10.0.0.1:55421/connection closed
When the router is running with less/no load Explorer connects fine. This problematic connection attempt seems to cause a performance problem in the router itself as well. What timeout value should I raise to be able to connect reliably?
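To put a number on how long each Explorer attempt survives, the log excerpt above can be measured directly. The following is a minimal Python sketch, not a SwiftMQ tool: it assumes each "connection accepted" line is followed by the "End-of-Stream reached" line of the same connection, which holds for this excerpt but may not hold on a busy listener. The function names are made up for illustration.

```python
from datetime import datetime

# Four lines copied from the router log above (two Explorer attempts).
LOG = """\
2016-07-13 13:12:00.915/[BlockingTCPListener, swiftlet=sys$jms, port=4101]/Listener/INFORMATION/connection accepted: 10.0.0.1
2016-07-13 13:12:24.928/10.0.0.1:55417/BlockingHandler/INFORMATION/Exception, EXITING: java.io.IOException: End-of-Stream reached
2016-07-13 13:12:27.935/[BlockingTCPListener, swiftlet=sys$jms, port=4101]/Listener/INFORMATION/connection accepted: 10.0.0.1
2016-07-13 13:12:47.933/10.0.0.1:55419/BlockingHandler/INFORMATION/Exception, EXITING: java.io.IOException: End-of-Stream reached
"""


def parse_ts(line):
    # The timestamp is everything before the first '/'.
    return datetime.strptime(line.split("/", 1)[0], "%Y-%m-%d %H:%M:%S.%f")


def connection_lifetimes(log_text):
    """Seconds between each accept and the next End-of-Stream line.

    Assumption: accept/close lines alternate for a single client.
    """
    lifetimes = []
    accepted_at = None
    for line in log_text.splitlines():
        if "connection accepted" in line:
            accepted_at = parse_ts(line)
        elif "End-of-Stream reached" in line and accepted_at is not None:
            lifetimes.append((parse_ts(line) - accepted_at).total_seconds())
            accepted_at = None
    return lifetimes


for seconds in connection_lifetimes(LOG):
    print(f"connection lived {seconds:.3f} s")
```

For the two attempts included here this gives roughly 24 s and 20 s, so every attempt dies after about the same interval.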
The management tree might become quite big if you have many entries, e.g. many queues, many connections etc. The tree is serialized and sent to the Explorer as a JMS message. Have a look here to configure the JMS Swiftlet to better handle large messages (network buffer).
I created a listener with a big output buffer (10MB) and a connection factory with a big input buffer (10MB), but Explorer behaves the same way: I don't get the configuration tree in it. What does the "Lease Timeout" message in the log mean? Should I try setting a bigger buffer, or set the other buffer size on the listener and connection factory as well? We have maybe 150 queues/topics and maybe a thousand connections on the router.
There were no keepalive messages received from the Explorer. That is, the router doesn't send it. There is only one reason here: the tree is so huge that it takes a lot of time to expand the network buffer (= high CPU) to send it. During this time no output is possible on this connection, thus no keepalives.
I haven't seen this before. But if the tree is so huge, you would have problems managing it with Explorer anyway, because it's based on Swing and Swing is single-threaded. I doubt that it is able to process all these updates of the tree, so at some point it's just better to use CLI instead.
Please define huge! We have 1597 connections at the moment. Is this huge? A few hundred are topic connections, so there are temporary queues created. But even then there should not be more than 5000-600 entities total under the various Usage subtrees. I tried to run Explorer on the server, although that was before I tried these listener settings, but it did not work. Also, is it possible with CLI to inspect messages on queues, or even see how many are queued up, or what the flow control delay is? And to inspect message content?
Let's see. About 1500 JMS connections. I guess you're using JNDI. Do your clients close the JNDI context as recommended? If not then you have 1 further connection per JNDI context, hence you have 3K connections (please check Network Swiftlet / Connections). Plus temp queues etc. Do you have "Smart Management Tree" enabled (=default)? It's an attribute under the Router Environment (.env context). If not, you have all senders, receivers, sessions and so on in the tree.
The problem is that I can't figure out why your tree is so big.
You can do everything with CLI, including fc delays, message content etc. SwiftMQ Explorer uses CLI internally. Have a look here.
Yes, we use JNDI and I believe we close the context, but those 1597 were the actual TCP connections to the router as returned by the netstat command. Smart Management Tree is false, and I think false is the default in version 9.2.5 anyway. Can I see the message stats with CLI like I can with Explorer? Each connection, number of messages sitting on it, produced/consumed message count, FC delay, etc.? I don't think so. So finding the problem component that causes a queue to build up is cumbersome, to say the least, with CLI. I created the listener and the connection factory in CLI, so I have an idea how user-friendly that could be :)
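For reference, the netstat count mentioned above can be reproduced with a few lines of Python. This is only a sketch under assumptions: it expects Linux-style `netstat -tn` rows (protocol, queues, local address, foreign address, state), it uses port 4101 because that is the listener port in the log earlier in the thread, and the addresses in `SAMPLE` are made up for illustration.

```python
def count_established(netstat_text, port=4101):
    """Count ESTABLISHED TCP connections whose local address ends in :port.

    Assumes Linux `netstat -tn` layout:
    Proto Recv-Q Send-Q Local-Address Foreign-Address State
    """
    suffix = ":" + str(port)
    count = 0
    for line in netstat_text.splitlines():
        fields = line.split()
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            if fields[3].endswith(suffix) and fields[5] == "ESTABLISHED":
                count += 1
    return count


# Made-up sample rows, only to show the expected input shape.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 10.0.0.5:4101           10.0.0.1:55417          ESTABLISHED
tcp        0      0 10.0.0.5:4101           10.0.0.2:40112          ESTABLISHED
tcp        0      0 10.0.0.5:22             10.0.0.9:51000          ESTABLISHED
tcp        0      0 10.0.0.5:4101           10.0.0.3:40990          TIME_WAIT
"""

print(count_established(SAMPLE))
```

Counting per listener port this way separates JMS connections from anything else that netstat reports on the host.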
As I wrote, you can do everything with CLI but it's not as nice as Explorer. ;-)
Welcome to SwiftMQ!
Trying to connect ... connected
Router 'router1' is available for administration.
Type 'help' to get a list of available commands.
> sr router1
router1> cc /sys$queuemanager/usage
Entity List: Usage
Description: Active Queues
Number of Entities in this List: 21
router1/sys$queuemanager/usage> cc orderscollected
Entity: Active Queue
Description: Active Queue
Properties for this Entity:
Name Current Value
acache-size (R/O) 500
acache-size-kb (R/O) -1
acleanup-interval (R/O) 120000
aflowcontrol-start-queuesize (R/O) 400
amax-messages (R/O) -1
flowcontrol-delay (R/O) 0
mcache-messages (R/O) 0
mcache-size-kb (R/O) 0
messagecount (R/O) 60036
msg-consume-rate (R/O) 0
msg-produce-rate (R/O) 0
total-consumed (R/O) 0
total-produced (R/O) 0
Entity contains no Sub-Entities.
Smart Management Tree is true by default. If you have it set to false, then the tree is much deeper (and bigger). Please set it to true with CLI.
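If the CLI output is captured to a file, the property listing shown in the session above is easy to turn into numbers. The following Python sketch assumes the "name (R/O) value" line layout exactly as printed above; `parse_properties` and `looks_stalled` are hypothetical helper names, and the "stalled" rule is just an illustrative heuristic, not something SwiftMQ defines.

```python
import re

# A few lines copied from the CLI property listing above.
LISTING = """\
Name Current Value
flowcontrol-delay (R/O) 0
messagecount (R/O) 60036
msg-consume-rate (R/O) 0
msg-produce-rate (R/O) 0
total-consumed (R/O) 0
total-produced (R/O) 0
"""

# name, optional "(R/O)" marker, integer value
PROP = re.compile(r"^(\S+)\s+(?:\(R/O\)\s+)?(-?\d+)\s*$")


def parse_properties(text):
    """Return {property name: int value} for lines that match the layout."""
    props = {}
    for line in text.splitlines():
        match = PROP.match(line.strip())
        if match:
            props[match.group(1)] = int(match.group(2))
    return props


def looks_stalled(props):
    # Illustrative heuristic: messages are waiting but nothing is consumed.
    return props.get("messagecount", 0) > 0 and props.get("msg-consume-rate", 0) == 0


props = parse_properties(LISTING)
print(props["messagecount"], props["flowcontrol-delay"], looks_stalled(props))
```

Run over every queue under the usage list, a check like this points at the queues where messages pile up without opening Explorer.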
So basically the problem is that there's too much data and too tight a keepalive interval? If I raised the interval and measured how long it takes for the data to come through, could I set values which would work?
Yeah, there is definitely too much data. I don't know why, and it seems to be impossible to figure that out unless you check each usage section with CLI (hint: the sum command delivers the number of entries; just write a CLI script).
Try to increase the keepalive interval to the default of 60000. But I guess you will then hit the lease timeout.
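To find out which usage section is inflating the tree, one option is to capture a CLI session that visits each usage list and then tally the "Number of Entities in this List" lines. The Python sketch below works on such a captured transcript. It assumes the prompt and output layout shown in the CLI session earlier in the thread; the first entry in `TRANSCRIPT` is taken from that session, while the `/sys$jms/usage` context and its count are placeholders for illustration.

```python
import re

# Prompt line such as "router1> cc /sys$queuemanager/usage"
CC = re.compile(r"^\S*>\s*cc\s+(\S+)")
COUNT = re.compile(r"Number of Entities in this List:\s*(\d+)")

TRANSCRIPT = """\
router1> cc /sys$queuemanager/usage
Entity List: Usage
Description: Active Queues
Number of Entities in this List: 21
router1/sys$queuemanager/usage> cc /sys$jms/usage
Entity List: Usage
Number of Entities in this List: 1597
"""


def entity_counts(transcript):
    """Map each visited context to the entity count printed after it."""
    counts = {}
    context = None
    for line in transcript.splitlines():
        match = CC.match(line)
        if match:
            context = match.group(1)
            continue
        match = COUNT.search(line)
        if match and context is not None:
            counts[context] = int(match.group(1))
    return counts


counts = entity_counts(TRANSCRIPT)
# Largest sections first, then the grand total.
for context, number in sorted(counts.items(), key=lambda item: -item[1]):
    print(f"{number:6d}  {context}")
print(f"{sum(counts.values()):6d}  total")
```

Sorting by count shows at a glance which section contributes most entries to the management tree.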