We're using SwiftMQ HA 6.1.0 (replicated file store) on two Linux servers (Debian).
We use a queue as an "Archive" queue that stores more than 25000 persistent messages for a long period of time (4 to 6 months).
The application console allows users to search messages in this queue, and a monitoring thread uses the CLI API to get the queue message count every 20 seconds.
To improve overall performance, the store swiftlet cache is set to 50000 (min = max), the queue message cache to 10000, and the queue cleanup interval to 3600000 ms (1 hour). A backup job is scheduled every night.
Everything has been working fine since the beginning of July (when we put this application into production), but we noticed that after a certain amount of time the queue message count becomes inaccurate (the value returned is lower than the real message count)! Stopping and restarting the router corrects the problem...
For the past 15 days we have been facing another problem: the application continues to store messages in this queue without any error, but we are unable to access these messages using a browser or the SwiftMQ Explorer... The message count seems accurate (growing), though.
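A minimal sketch, in Python, of a consistency check the monitoring thread could run: it samples the reported message count and a browsed count side by side and flags any sample where they disagree, which is the kind of drift described above. `read_reported` and `read_browsed` are hypothetical callables standing in for the CLI API count and a browse of the queue; they are not SwiftMQ APIs. Browsing a 25000-message queue is expensive, so in practice the browsed count would likely be sampled much less often than every 20 seconds.

```python
import time
from typing import Callable, List, Tuple

POLL_INTERVAL_SECONDS = 20  # the polling interval described in the post


def poll_counts(read_reported: Callable[[], int],
                read_browsed: Callable[[], int],
                iterations: int,
                interval: float = POLL_INTERVAL_SECONDS,
                sleep: Callable[[float], None] = time.sleep) -> List[Tuple[int, int, bool]]:
    """Sample both counts `iterations` times, `interval` seconds apart.

    Hypothetical: read_reported wraps whatever returns the queue message
    count (e.g. the CLI API); read_browsed counts messages by browsing.
    Each sample is (reported, browsed, mismatch).
    """
    samples: List[Tuple[int, int, bool]] = []
    for i in range(iterations):
        reported = read_reported()
        browsed = read_browsed()
        # A mismatch in either direction suggests the count or the store is off.
        samples.append((reported, browsed, reported != browsed))
        if i < iterations - 1:
            sleep(interval)  # no sleep after the last sample
    return samples
```

Because the readers and `sleep` are injected, the loop can be exercised with stub functions and no running router.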
We first tried a complete restart of the SwiftMQ HA routers, with no luck.
We thought we might have reached a SwiftMQ limit, so we used a "Queue Purger" job to delete the oldest messages => the queue message count dropped from 25000 to 18000, but we are still unable to access the last 3000 (newest) messages using SwiftMQ Explorer => we get a "Request timeout" message box.
Tonight we will try a Shrink Job to see if it helps recover the hidden messages; another idea would be to use a Queue Mover job to transfer all the messages into a new empty queue, delete the current queue and move the messages back (in case the problem is due to a corrupted queue).
For completeness, the same application uses another queue to store messages the same way but with fewer messages (around 5000), and we have encountered no problems at all with this queue.
Can you help us recover these messages? Some users of our application need access to them and are currently stuck.
Just for the record: the problem was a corrupted store, because two people started/stopped the instances at the same time without being aware of each other's actions... We were able to recover a major part of the store with the Queue Mover job.