
List:       james-dev
Subject:    Re: Making all queues on Rabbitmq quorum queue when option enabled
From:       Quan tran hong <quan.tranhong1999 () gmail ! com>
Date:       2024-04-19 8:05:02
Message-ID: CAFG0j1DweB3NV5CScO5D3620bVxaZr7NDVT_Yxs2DmRApWbyAg () mail ! gmail ! com


> Do you share this point of view?

+1

Quan

On Mon, 15 Apr 2024 at 16:43, Benoit TELLIER <btellier@apache.org>
wrote:

> Hi Quan,
>
> First, thanks for the work done on this topic.
>
> I know some members of the community (Karsten?) already did significant
> work on the topic but more oriented toward the POP3 server.
>
> This work is of course welcome as it would result in a higher
> reliability for the IMAP / JMAP components.
>
>  > What do you think about making all queues on Rabbitmq quorum queue
>  > when option enabled?
>
> On the principle, +1
>
> In practice that is slightly harder for the event bus notification
> queue...
>
>  - We can likely afford losing some of those pub sub messages?
>  - The queue is tied to a connection, thus if the node/connection goes
>    down it can be recreated elsewhere?
>  - We would need to come up with a cleanup strategy in order to
>    eventually delete queues hanging around.
>  - Also, how relevant is this RabbitMQ backend pub sub implementation
>    when compared with the work done with Redis?
>
> IMO the eventbus notification was the main blocker in order to achieve
> decent HA with RabbitMQ.
>
> Do you share this point of view?
>
> Best regards,
>
> Benoit TELLIER
>
> On 15/04/2024 09:53, Quan tran hong wrote:
> > Hi folks,
> >
> > Recently we hit an issue on a deployment using a RabbitMQ cluster: a
> > RabbitMQ node outage (about 1 hour) more or less took the James
> > service down too.
> >
> > I created a Jira ticket to report the issue:
> > https://issues.apache.org/jira/projects/JAMES/issues/JAMES-4027
> >
> > More details below for those who have not read the Jira ticket yet:
> >
> > Today, when the quorum option is enabled, only some queues are quorum
> > queues; others (e.g. event bus notification queues and the Task
> > Manager's termination queues) are not.
> >
> > I tried to reproduce the issue and here is my theory:
> >
> > The RabbitMQ node that stores the notification queues is down
> > -> James cannot publish messages to RabbitMQ, which causes e.g. IMAP
> > SELECT, STORE, APPEND, UNSELECT ... commands to fail
> > -> James keeps retrying the failed publishes (retry for Group
> > registration, which seems to rely on the classic queue too) and queues
> > other IMAP requests in the meantime.
> > -> The IMAP server queue becomes full and the exception `The IMAP server
> > has reached its maximum capacity` is thrown.
> > -> James IMAP becomes a zombie, leading to cascading failures.
> >
> > James needs to be more fault-tolerant in this case.
> >
> > We think making all queues on Rabbitmq quorum queues when
> > `quorum.queues.enable=true` would help James be more fault tolerant in
> > that scenario.
> >
> > We investigated a POC at
> > https://github.com/apache/james-project/pull/2191 and
> > the full quorum queues helped James be more fault tolerant as expected.
> >
> > With full quorum queues in use, James performance is a bit slower
> > but still fine, and that cost is likely needed to make James more
> > reliable.
> >
> > If we use Redis backed event bus notifications, the performance is better
> > than the RabbitMQ notification quorum queues.
> >
> > What do you think about making all queues on Rabbitmq quorum queue when
> > option enabled? Feedback and review are very welcome.
> >
> > Thanks for reading.
> >
> > Quan
> >
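
For readers following the thread, here is a minimal sketch of what "quorum queue when option enabled" can look like at queue-declaration time. It is not the code from the PR: the class and method names are hypothetical, and the boolean simply stands in for the `quorum.queues.enable` setting mentioned above. The only RabbitMQ-specific part is the `x-queue-type` queue argument, which RabbitMQ uses to select the quorum queue type.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, for illustration only (not taken from James).
final class QueueArgumentsSketch {

    // Builds the optional arguments map for a queue declaration.
    // quorumQueuesEnabled stands in for the quorum.queues.enable option.
    static Map<String, Object> queueArguments(boolean quorumQueuesEnabled) {
        Map<String, Object> arguments = new HashMap<>();
        if (quorumQueuesEnabled) {
            // Asks RabbitMQ for a replicated quorum queue instead of a
            // classic queue.
            arguments.put("x-queue-type", "quorum");
        }
        return arguments;
    }
}
```

With the RabbitMQ Java client, such a map would be passed as the last parameter of `Channel.queueDeclare(queue, durable, exclusive, autoDelete, arguments)`. Quorum queues have to be durable and cannot be exclusive, which is one reason the connection-tied event bus notification queues are harder to convert and need a cleanup strategy.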

