List: cassandra-dev
Subject: Re: GMFD messages
From: Anthony Molinaro <anthonym () alumni ! caltech ! edu>
Date: 2010-05-26 23:52:36
Message-ID: 20100526235236.GB63843 () alumni ! caltech ! edu
Hi,
Still haven't heard from anyone about this. While the restart helped
temporarily, it now seems to be broken again. I restarted one node
with TRACE logging on, and I see this:
INFO [GMFD:1] 2010-05-26 23:44:44,260 GossipDigestSynMessage.java (line 129) @@@@ Breaking out to respect the MTU size in EPS. Estimate is 56 @@@@
TRACE [GMFD:1] 2010-05-26 23:44:44,260 Gossiper.java (line 293) @@@@ Size of GossipDigestAckMessage is 1374
TRACE [GMFD:1] 2010-05-26 23:44:44,261 Gossiper.java (line 937) Sending a GossipDigestAckMessage to /10.192.63.127
So it seems like Cassandra needs 56 bytes for each server in a gossip packet,
which with a maximum packet size of 1428 means you can have at most 24 servers?
That sort of sucks, since I have 27 right now. However, I'm not certain
how this would explain what I see, which is that a complete cluster restart
works fine for about 14 hours and then suddenly stops working.
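
A rough sketch of the arithmetic above, using only the numbers from the log
lines and the 1428-byte maximum mentioned in this message. The function names
are hypothetical, and the exact comparison Cassandra performs (and how much
fixed overhead a message carries) is an assumption, not something taken from
the Gossiper source:

```python
# Numbers taken from the message and its log excerpt.
MAX_PACKET_BYTES = 1428        # maximum gossip packet size quoted above
PER_ENDPOINT_ESTIMATE = 56     # "Estimate is 56" from the INFO line
OBSERVED_ACK_BYTES = 1374      # "Size of GossipDigestAckMessage is 1374"


def fits_another(current_size, estimate=PER_ENDPOINT_ESTIMATE,
                 limit=MAX_PACKET_BYTES):
    """Return True if one more endpoint entry of `estimate` bytes would
    still fit under `limit`. The `<=` comparison is an assumption."""
    return current_size + estimate <= limit


def max_endpoints(payload_bytes, estimate=PER_ENDPOINT_ESTIMATE):
    """Whole number of endpoint entries that fit in `payload_bytes`."""
    return payload_bytes // estimate


# Adding one more 56-byte entry to the observed 1374-byte ack would exceed
# the limit, which matches the "Breaking out" message.
print(fits_another(OBSERVED_ACK_BYTES))

# Entries that fit in the observed message, and the bytes left over, which
# would be per-message overhead if the 56-byte estimate is exact.
entries = max_endpoints(OBSERVED_ACK_BYTES)
print(entries, OBSERVED_ACK_BYTES - entries * PER_ENDPOINT_ESTIMATE)
```

Under these assumptions the observed 1374-byte message holds 24 full entries,
so a 27-node ring would not fit in a single packet.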
Ideas?
-Anthony
On Wed, May 26, 2010 at 08:59:30AM -0700, Anthony Molinaro wrote:
> Hi,
>
> I noticed yesterday I have lots of these messages
>
> INFO [GMFD:1] 2010-05-25 23:21:04,070 GossipDigestSynMessage.java (line 152)
> Remaining bytes zero. Stopping deserialization in EndPointState.
> INFO [GMFD:1] 2010-05-25 23:21:05,224 GossipDigestSynMessage.java (line 129)
> @@@@ Breaking out to respect the MTU size in EPS. Estimate is 56 @@@@
>
> The first message only occurs on some machines in my cluster. The second
> on all of them.
>
> The ones with the first message seem to be building up quite a backlog
> in their MessageDeserializer PendingTasks.
>
> I assume there is a correlation, what could be causing this sort of thing?
>
> This cluster is now at 27 m1.xlarge boxes on ec2 running 0.6.2 of some flavor.
>
> I ended up restarting one of the boxes which was behind and when it came
> back it only had some parts of the ring, so I shutdown everything, brought
> the seed nodes back, then brought the rest back and that seemed to fix
> things, but this definitely seems like some sort of bug with gossip?
>
> Thanks,
>
> -Anthony
>
> --
> ------------------------------------------------------------------------
> Anthony Molinaro <anthonym@alumni.caltech.edu>
--
------------------------------------------------------------------------
Anthony Molinaro <anthonym@alumni.caltech.edu>