In a three-node network of brokers with persistence on network storage, a failed store operation can corrupt the index. With NFS, for example, setting timeo to the recommended value of 20 (NFS interprets timeo in tenths of a second, so this is 2 seconds rather than 20 ms) can produce Input/Output errors under load, which cause a broker restart. Occasionally the corrupted index prevents the broker from passing its journal checks and starting, and the index files must be removed to recover.
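The recovery step mentioned above (removing the index files so the broker can start) could be scripted roughly as follows. This is only a sketch under stated assumptions: it assumes the store is KahaDB with index files named db.data and db.redo, that the broker is stopped first, and that moving the files aside is preferable to deleting them. The function name set_aside_index and the backup directory naming are hypothetical.

```python
import shutil
import time
from pathlib import Path

# Assumption: KahaDB index file names. Journal files (db-*.log) are never touched.
INDEX_FILES = ("db.data", "db.redo")


def set_aside_index(store_dir, backup_dir=None):
    """Move index files out of store_dir and return the list of new paths.

    Run this only while the broker is stopped. The files are moved rather
    than deleted so they remain available for later inspection.
    """
    store = Path(store_dir)
    if backup_dir is None:
        # Hypothetical naming scheme: timestamped folder inside the store.
        backup = store / f"index-backup-{int(time.time())}"
    else:
        backup = Path(backup_dir)

    moved = []
    for name in INDEX_FILES:
        src = store / name
        if src.is_file():
            backup.mkdir(parents=True, exist_ok=True)
            dst = backup / name
            shutil.move(str(src), str(dst))
            moved.append(dst)
    return moved
```

Keeping the old index around makes it possible to compare queue counts before and after the rebuild.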
This sometimes also appears to produce erroneous message counts and duplicate messages within the network of brokers (NOB). For example, the broker reports fewer messages in its queue counts than actually exist in the journals. When the index is removed and rebuilt, the counts go up and extra or duplicate messages are sometimes observed.
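One way to confirm the duplicates after an index rebuild is to collect the message IDs from the affected queue and count repeats. The sketch below assumes the IDs have already been gathered into a list by some means (for example by browsing the queue) and that duplicated messages share the same ID. Both are assumptions, and find_duplicates is a hypothetical helper, not part of any broker tooling.

```python
from collections import Counter


def find_duplicates(message_ids):
    """Return {message_id: occurrences} for every ID seen more than once."""
    counts = Counter(message_ids)
    return {msg_id: n for msg_id, n in counts.items() if n > 1}
```

An empty result with a changed queue count would suggest the index was under-reporting rather than that messages were duplicated.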