
List:       cassandra-dev
Subject:    Re: [EXTERNAL] Re: [Discuss] Generic Purpose Rate Limiter in Cassandra
From:       Jaydeep Chovatia <chovatia.jaydeep () gmail ! com>
Date:       2024-05-07 4:24:14
Message-ID: CABzeAR4u=PXbAgQ0a1t97czarj1uVnC6TkgY212eF2RdnRToZw () mail ! gmail ! com

Sure, Caleb. I will include the work as part of CASSANDRA-19534
<https://issues.apache.org/jira/browse/CASSANDRA-19534> in CEP-41.

Jaydeep

On Fri, May 3, 2024 at 7:48 AM Caleb Rackliffe <calebrackliffe@gmail.com>
wrote:

> FYI, there is some ongoing sort-of-related work in
> CASSANDRA-19534 <https://issues.apache.org/jira/browse/CASSANDRA-19534>
> 
> On Wed, Apr 10, 2024 at 6:35 PM Jaydeep Chovatia <
> chovatia.jaydeep@gmail.com> wrote:
> 
> > Just created an official CEP-41
> > <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-41+%28DRAFT%29+Apache+Cassandra+Unified+Rate+Limiter>
> >  incorporating the feedback from this discussion. Feel free to let me know
> > if I have missed any important feedback from this thread that is not
> > captured in CEP-41.
> > 
> > Jaydeep
> > 
> > On Thu, Feb 22, 2024 at 11:36 AM Jaydeep Chovatia <
> > chovatia.jaydeep@gmail.com> wrote:
> > 
> > > Thanks, Josh. I will file an official CEP with all the details in a few
> > > days and update this thread with that CEP number.
> > > Thanks a lot everyone for providing valuable insights!
> > > 
> > > Jaydeep
> > > 
> > > On Thu, Feb 22, 2024 at 9:24 AM Josh McKenzie <jmckenzie@apache.org>
> > > wrote:
> > > 
> > > > Do folks think we should file an official CEP and take it there?
> > > > 
> > > > +1 here.
> > > > 
> > > > Synthesizing your gdoc, Caleb's work, and the feedback from this thread
> > > > into a draft seems like a solid next step.
> > > > 
> > > > On Wed, Feb 7, 2024, at 12:31 PM, Jaydeep Chovatia wrote:
> > > > 
> > > > I see that a lot of great ideas have been discussed or proposed in the past to
> > > > cover the most common rate limiter candidate use cases. Do folks think we
> > > > should file an official CEP and take it there?
> > > > 
> > > > Jaydeep
> > > > 
> > > > On Fri, Feb 2, 2024 at 8:30 AM Caleb Rackliffe <
> > > > calebrackliffe@gmail.com> wrote:
> > > > 
> > > > I just remembered the other day that I had done a quick writeup on the
> > > > state of compaction stress-related throttling in the project:
> > > > 
> > > > 
> > > > https://docs.google.com/document/d/1dfTEcKVidRKC1EWu3SO1kE1iVLMdaJ9uY1WMpS3P_hs/edit?usp=sharing
> > > >  
> > > > I'm sure most of it is old news to the people on this thread, but I
> > > > figured I'd post it just in case :)
> > > > 
> > > > On Tue, Jan 30, 2024 at 11:58 AM Josh McKenzie <jmckenzie@apache.org>
> > > > wrote:
> > > > 
> > > > 
> > > > 2.) We should make sure the links between the "known" root causes of
> > > > cascading failures and the mechanisms we introduce to avoid them remain
> > > > very strong.
> > > > 
> > > > Seems to me that our historical strategy was to address individual
> > > > known cases one-by-one rather than looking for a more holistic
> > > > load-balancing and load-shedding solution. While the engineer in me likes
> > > > the elegance of a broad, more-inclusive *actual SEDA-like* approach,
> > > > the pragmatist in me wonders how far we think we are today from a stable
> > > > set-point.
> > > > 
> > > > i.e. are we facing a handful of cases where nodes can still get pushed
> > > > over and then cascade that we can surgically address, or are we facing a
> > > > broader lack of back-pressure that rears its head in different domains
> > > > (client -> coordinator, coordinator -> replica, internode with other
> > > > operations, etc) at surprising times and should be considered more
> > > > holistically?
> > > > 
> > > > On Tue, Jan 30, 2024, at 12:31 AM, Caleb Rackliffe wrote:
> > > > 
> > > > I almost forgot CASSANDRA-15817, which introduced
> > > > reject_repair_compaction_threshold, which provides a mechanism to stop
> > > > repairs while compaction is underwater.
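> > > > 
> > > > (Purely as a conceptual sketch of that guard, with hypothetical names
> > > > rather than the actual 15817 code: the idea reduces to refusing to start
> > > > a repair while the pending-compaction count is past a configured
> > > > threshold.)
> > > > 
> > > >     import java.util.function.IntSupplier;
> > > > 
> > > >     // Illustrative only; not the real implementation.
> > > >     public class RepairAdmissionGuard
> > > >     {
> > > >         private final IntSupplier pendingCompactions;      // fed from compaction metrics
> > > >         private final int rejectRepairCompactionThreshold; // the knob 15817 added
> > > > 
> > > >         public RepairAdmissionGuard(IntSupplier pendingCompactions, int threshold)
> > > >         {
> > > >             this.pendingCompactions = pendingCompactions;
> > > >             this.rejectRepairCompactionThreshold = threshold;
> > > >         }
> > > > 
> > > >         // Checked before a repair session starts.
> > > >         public boolean canStartRepair()
> > > >         {
> > > >             return pendingCompactions.getAsInt() <= rejectRepairCompactionThreshold;
> > > >         }
> > > >     }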
> > > > 
> > > > On Jan 26, 2024, at 6:22 PM, Caleb Rackliffe <calebrackliffe@gmail.com>
> > > > wrote:
> > > > 
> > > > 
> > > > Hey all,
> > > > 
> > > > I'm a bit late to the discussion. I see that we've already discussed
> > > > CASSANDRA-15013 <https://issues.apache.org/jira/browse/CASSANDRA-15013>
> > > > and CASSANDRA-16663
> > > > <https://issues.apache.org/jira/browse/CASSANDRA-16663> at least in
> > > > passing. Having written the latter, I'd be the first to admit it's a crude
> > > > tool, although it's been useful here and there, and provides a couple
> > > > primitives that may be useful for future work. As Scott mentions, while it
> > > > is configurable at runtime, it is not adaptive, although we did
> > > > make configuration easier in CASSANDRA-17423
> > > > <https://issues.apache.org/jira/browse/CASSANDRA-17423>. It also is
> > > > global to the node, although we've lightly discussed some ideas around
> > > > making it more granular. (For example, keyspace-based limiting, or limiting
> > > > "domains" tagged by the client in requests, could be interesting.) It also
> > > > does not deal with inter-node traffic, of course.
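> > > > 
> > > > To make the granularity idea a bit more concrete, a per-keyspace variant
> > > > could look roughly like the sketch below (names are hypothetical and it
> > > > just wraps a Guava RateLimiter per keyspace; this is not code from 16663):
> > > > 
> > > >     import java.util.concurrent.ConcurrentHashMap;
> > > >     import com.google.common.util.concurrent.RateLimiter;
> > > > 
> > > >     // Hypothetical per-keyspace limiter; illustrative only.
> > > >     public class KeyspaceRequestLimiter
> > > >     {
> > > >         private final double defaultPermitsPerSecond;
> > > >         private final ConcurrentHashMap<String, RateLimiter> limiters = new ConcurrentHashMap<>();
> > > > 
> > > >         public KeyspaceRequestLimiter(double defaultPermitsPerSecond)
> > > >         {
> > > >             this.defaultPermitsPerSecond = defaultPermitsPerSecond;
> > > >         }
> > > > 
> > > >         // Returns false if the request against this keyspace should be shed.
> > > >         public boolean tryAcquire(String keyspace)
> > > >         {
> > > >             return limiters.computeIfAbsent(keyspace, k -> RateLimiter.create(defaultPermitsPerSecond))
> > > >                            .tryAcquire();
> > > >         }
> > > > 
> > > >         // Runtime reconfiguration, in the same spirit as the global limit today.
> > > >         public void setRate(String keyspace, double permitsPerSecond)
> > > >         {
> > > >             limiters.computeIfAbsent(keyspace, k -> RateLimiter.create(permitsPerSecond))
> > > >                     .setRate(permitsPerSecond);
> > > >         }
> > > >     }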
> > > > 
> > > > Something we've not yet mentioned (that does address internode traffic)
> > > > is CASSANDRA-17324
> > > > <https://issues.apache.org/jira/browse/CASSANDRA-17324>, which I
> > > > proposed shortly after working on the native request limiter (and have just
> > > > not had much time to return to). The basic idea is this:
> > > > 
> > > > When a node is struggling under the weight of a compaction backlog and
> > > > becomes a cause of increased read latency for clients, we have two safety
> > > > valves:
> > > > 
> > > > 
> > > > 1.) Disabling the native protocol server, which stops the node from
> > > > coordinating reads and writes.
> > > > 2.) Jacking up the severity on the node, which tells the dynamic snitch
> > > > to avoid the node for reads from other coordinators.
> > > > 
> > > > 
> > > > These are useful, but we don't appear to have any mechanism that would
> > > > allow us to temporarily reject internode hint, batch, and mutation messages
> > > > that could further delay resolution of the compaction backlog.
> > > > 
> > > > 
> > > > Whether it's done as part of a larger framework or on its own, it still
> > > > feels like a good idea.
> > > > 
> > > > Thinking in terms of opportunity costs here (i.e. where we spend our
> > > > finite engineering time to holistically improve the experience of operating
> > > > this database) is healthy, but we probably haven't reached the point of
> > > > diminishing returns on nodes being able to protect themselves from clients
> > > > and from other nodes. I would just keep in mind two things:
> > > > 
> > > > 1.) The effectiveness of rate-limiting in the system (which includes
> > > > the database and all clients) as a whole necessarily decreases as we move
> > > > from the application to the lowest-level database internals. Limiting
> > > > correctly at the client will save more resources than limiting at the
> > > > native protocol server, and limiting correctly at the native protocol
> > > > server will save more resources than limiting after we've dispatched
> > > > requests to some thread pool for processing.
> > > > 2.) We should make sure the links between the "known" root causes of
> > > > cascading failures and the mechanisms we introduce to avoid them remain
> > > > very strong.
> > > > 
> > > > In any case, I'd be happy to help out in any way I can as this moves
> > > > forward (especially as it relates to our past/current attempts to address
> > > > this problem space).
> > > > 
> > > > 
> > > > 
> > > > 

