[prev in list] [next in list] [prev in thread] [next in thread]
List: cassandra-user
Subject: Re: Permission/Role Cache causing timeouts in apps.
From: Chahat Bhatia <chahat.bhatia () goevive ! com>
Date: 2021-07-27 15:55:17
Message-ID: CAJPMtkBspc+L+-YPA43XNL5K_=Jv+Ehrh6Pf9T3THSXZK837TQ () mail ! gmail ! com
[Download RAW message or body]
Okay. Sure. Thanks a lot for all the information. Really helped. :)
On Tue, 27 Jul 2021 at 21:05, Bowen Song <bowen@bso.ng> wrote:
> Based on the information I know, I'd say that you don't have any specific
> issue with the authentication related tables, but you do have a general
> overloading problem during peak load. I think it's fairly likely that your
> 7 nodes cluster (6 nodes in one DC) is not able to keep up with the peak
> load, and you will need to either scale up for the peak load or tune the
> application to avoid the bursty behaviour.
> On 27/07/2021 16:27, Chahat Bhatia wrote:
>
> Yes, the application in quite read heavy and the request pattern is bursty
> too. Hence that big a request failure in such less time.
> Also, nothing out of the ordinary in cfstats and proxyhistograms.
>
> But there are Native-Transport-Requests dropped messages (Almost similar
> stats on all the nodes) :
>
> /opt/evive/pkgs/apache-cassandra-3.0.13/bin/nodetool tpstats
> >
> > Pool Name Active Pending Completed
> > Blocked All time blocked
> > ...
> > Native-Transport-Requests 6 0 27777476
> > 0 8262
> >
> > Message type Dropped
> > READ 24260
> >
>
>
> * We have the below values for the following params:*
> concurrent_reads: 96
> native_transport_max_threads: 320
> JVM_OPTS="$JVM_OPTS -Dcassandra.max_queued_native_transport_requests=3072"
>
> Do you think it would be advisable to tune the number in the above params
> to have lesser load on the node?
>
>
> On Tue, 27 Jul 2021 at 20:13, Bowen Song <bowen@bso.ng> <bowen@bso.ng>
> wrote:
>
> > Wow, 15 seconds timeout? That's pretty long... You may want to check the nodetool
> > tpstats and make sure the NTP thread pool isn't blocking things.
> >
> > 16k read requests dropped in 5 seconds, or over 3k requests per second on
> > a single node, is a bit suspicious. Does your read requests tend to be
> > bursty?
> > On 27/07/2021 15:32, Chahat Bhatia wrote:
> >
> > Yes, RF=6 for system auth. Sorry my bad.
> >
> >
> > No, we are not using cassandra user for the application. We have a custom
> > super user for our operational and administrative tasks and a separate role
> > with needed perms for the application.
> >
> > > role | super | login | options
> > > ------------------------+-------+-------+---------
> > > EviveCassandra | True | True | {}
> >
> > ccusr | False | True | {}
> >
> >
> >
> >
> > - gc_warn_threshold_in_ms: 1000
> > - read_request_timeout_in_ms: 15000 // been the same since the start
> > - cross_node_timeout: false // but we have ntp on each
> > server and the clocks are synced.
> >
> > - On checking the logs a little more I found out these:
> >
> > INFO [ScheduledTasks:1] 2021-07-26 07:21:33,279
> > > MessagingService.java:1013 - READ messages were dropped in last 5000 ms:
> > > 289 for internal timeout and 0 for cross node timeout
> >
> > INFO [ScheduledTasks:1] 2021-07-26 07:22:33,894
> > > MessagingService.java:1013 - READ messages were dropped in last 5000 ms:
> > > 273 for internal timeout and 0 for cross node timeout
> >
> > INFO [ScheduledTasks:1] 2021-07-27 06:12:03,988
> > > MessagingService.java:1013 - READ messages were dropped in last 5000 ms:
> > > 160 for internal timeout and 0 for cross node timeout
> >
> > INFO [ScheduledTasks:1] 2021-07-27 06:13:09,125
> > > MessagingService.java:1013 - READ messages were dropped in last 5000 ms:
> > > 223 for internal timeout and 0 for cross node timeout
> >
> > INFO [ScheduledTasks:1] 2021-07-27 06:13:44,212
> > > MessagingService.java:1013 - READ messages were dropped in last 5000 ms: 72
> > > for internal timeout and 0 for cross node timeout
> >
> > INFO [ScheduledTasks:1] 2021-07-27 06:14:14,496
> > > MessagingService.java:1013 - READ messages were dropped in last 5000 ms:
> > > 16036 for internal timeout and 0 for cross node timeout
> >
> > INFO [ScheduledTasks:1] 2021-07-27 06:15:59,744
> > > MessagingService.java:1013 - READ messages were dropped in last 5000 ms:
> > > 287 for internal timeout and 0 for cross node timeout
> >
> >
> >
> > - Also, I am checking cfstats and proxyhistorgrams is in progress,
> > will update incase anythings suspicious.
> >
> >
> > On Tue, 27 Jul 2021 at 18:09, Bowen Song <bowen@bso.ng> <bowen@bso.ng>
> > wrote:
> >
> > > Hello Chahat,
> > >
> > >
> > > You haven't replied to the first point, are you using the "cassandra"
> > > user?
> > >
> > >
> > > The schema and your description don't quite match. When you said:
> > >
> > > *the system_auth for 2 DCs : **us-east*
> > > * with 6 nodes (and RF=3) and ... *
> > >
> > > I assume you meant to say 6 nodes and RF=6? That sounds totally fine.
> > >
> > >
> > > As of the GC logs, do you see any WARN level logs? You may also want to
> > > double check the choice of the gc_warn_threshold_in_ms value is
> > > reasonable. I'm a bit surprised to see the GC logs full of ~1.6ms GC
> > > pauses. AFAIK, by default Cassandra only logs GC pauses longer than 200ms.
> > > I have to assume that's the result of your custom script. You may want to
> > > double check the script actually does what it says on the tin, just in case
> > > it misses something.
> > >
> > >
> > > Then, the other relevant questions.
> > >
> > > Do you use cross node timeout? If so, are the clocks in sync between all
> > > servers?
> > >
> > > What is the timeout values in the cassandra.yaml file? (
> > > read_request_timeout_in_ms, etc.) Are they reasonably long enough for
> > > the corresponding request type to complete?
> > >
> > > Since you've only got 7 nodes, I'd also recommend you to check the nodetool
> > > cfstats & nodetool cfhistograms output for the tables in the
> > > system_auth keyspace on each node, and see if you can spot anything out of
> > > ordinary.
> > >
> > >
> > > Regards,
> > >
> > > Bowen
> > > On 27/07/2021 09:23, Chahat Bhatia wrote:
> > >
> > > Thanks for the prompt response.
> > >
> > > * Here is the system_schema.keyspaces entry:*
> > >
> > > system_auth | True | {'class':
> > > > 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'us-east': '6',
> > > > 'us-east-backup': '1'}
> > > > census | True | {'class':
> > > > 'org.apache.cassandra.locator.NetworkTopologyStrategy', 'us-east': '3',
> > > > 'us-east-backup': '1'}
> > >
> > >
> > > So, the system_auth for 2 DCs : *us-east with 6 nodes (and RF=3) and
> > > us-east-backup with a single node and 100% of the data.*
> > > *Keyspace census* showing RF=3 for the main DC and RF=1 for the backup
> > > site.
> > >
> > >
> > > And for running the application, we have a user specifically created for
> > > > that application itself and that user also has its appropriate permissions
> > > > in cassandra to select, modify and delete from concerned tables. And it
> > > > uses LOCAL_QUORUM for querying the data and the local-dc is set to
> > > > 'us-east'.
> > >
> > >
> > >
> > > Also, there is no excessive GC for any of the nodes, we run a custom
> > > script to trackthe GC stats (from the cassandra log itself) and output it.
> > > Below is the output from the current running script and its similar for all
> > > the servers:
> > >
> > > 2021-07-27 03:04:45,072 INFO gcstats:58 - Application Thread stop time
> > > > 0.001656 seconds.
> > > > 2021-07-27 03:04:45,080 INFO gcstats:58 - Application Thread stop time
> > > > 0.001669 seconds.
> > > > 2021-07-27 03:04:45,087 INFO gcstats:58 - Application Thread stop time
> > > > 0.001601 seconds.
> > > > 2021-07-27 03:04:45,095 INFO gcstats:58 - Application Thread stop time
> > > > 0.001713 seconds.
> > > > 2021-07-27 03:04:45,103 INFO gcstats:58 - Application Thread stop time
> > > > 0.001586 seconds.
> > > > 2021-07-27 03:04:45,110 INFO gcstats:58 - Application Thread stop time
> > > > 0.001671 seconds.
> > > > 2021-07-27 03:04:45,118 INFO gcstats:58 - Application Thread stop time
> > > > 0.001691 seconds.
> > > > 2021-07-27 03:04:45,127 INFO gcstats:58 - Application Thread stop time
> > > > 0.001860 seconds.
> > > > 2021-07-27 03:04:45,134 INFO gcstats:58 - Application Thread stop time
> > > > 0.001630 seconds.
> > > > 2021-07-27 03:04:45,141 INFO gcstats:58 - Application Thread stop time
> > > > 0.001515 seconds.
> > > > 2021-07-27 03:04:45,148 INFO gcstats:58 - Application Thread stop time
> > > > 0.001533 seconds.
> > > > 2021-07-27 03:04:45,156 INFO gcstats:58 - Application Thread stop time
> > > > 0.001630 seconds.
> > > > 2021-07-27 03:04:45,163 INFO gcstats:58 - Application Thread stop time
> > > > 0.001577 seconds.
> > > > 2021-07-27 03:04:45,170 INFO gcstats:58 - Application Thread stop time
> > > > 0.001538 seconds.
> > > > 2021-07-27 03:04:45,177 INFO gcstats:58 - Application Thread stop time
> > > > 0.001615 seconds.
> > > > 2021-07-27 03:04:45,186 INFO gcstats:58 - Application Thread stop time
> > > > 0.001584 seconds.
> > > > 2021-07-27 03:04:45,193 INFO gcstats:58 - Application Thread stop time
> > > > 0.001558 seconds.
> > > > 2021-07-27 03:04:45,200 INFO gcstats:58 - Application Thread stop time
> > > > 0.001696 seconds.
> > > > 2021-07-27 03:04:45,208 INFO gcstats:58 - Application Thread stop time
> > > > 0.001658 seconds.
> > > > 2021-07-27 03:04:45,215 INFO gcstats:58 - Application Thread stop time
> > > > 0.001592 seconds.
> > > > 2021-07-27 03:04:45,222 INFO gcstats:58 - Application Thread stop time
> > > > 0.001618 seconds.
> > > > 2021-07-27 03:05:08,907 INFO gcstats:58 - Application Thread stop time
> > > > 0.001624 seconds.
> > > > 2021-07-27 03:06:34,436 INFO gcstats:58 - Application Thread stop time
> > > > 0.297773 seconds.
> > > >
> > >
> > >
> > >
> > >
> > >
> > > On Tue, 27 Jul 2021 at 13:23, 'Bowen Song' via Infra Updates <
> > > infra-updates@goevive.com> wrote:
> > >
> > > > Hello Chahat,
> > > >
> > > >
> > > > First, can you please make sure the Cassandra user used by the
> > > > application is not "cassandra"? Because the "cassandra" user uses
> > > > QUORUM consistency level to read the auth tables.
> > > >
> > > > Then, can you please make sure the replication strategy is set
> > > > correctly for the system_auth namespace? I.e.: ensure the old DC is not
> > > > present, and the new DC has sufficient number of replicas for fault
> > > > tolerance.
> > > >
> > > > Finally, can you please check the GC logs, and make sure there isn't
> > > > JVM GC issues, espicially long STW pauses?
> > > >
> > > >
> > > > Regards,
> > > >
> > > > Bowen
> > > >
> > > >
> > > > On 27/07/2021 08:34, Chahat Bhatia wrote:
> > > >
> > > > Hi Community,
> > > >
> > > > Context: We are running a cluster of 6 nodes in production with a RF=3
> > > > in AWS.
> > > > We recently moved from physical servers to cloud by adding a new DC and
> > > > then removing the old one. Everything is working fine in all the other
> > > > applications except this one.
> > > >
> > > >
> > > > *As we recently started experiencing read timeouts in one of our
> > > > production applications where the client threw *
> > > >
> > > > Error An unexpected error occurred server side on ip-IP.ec2.internal:
> > > > > com.google.common.util.concurrent.*UncheckedExecutionException*:
> > > > > *com.google.common.util.concurrent.UncheckedExecutionException:**
> > > > > java.lang.RuntimeException:
> > > > > org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out -
> > > > > received only 0 responses.*
> > > > > com.datastax.driver.core.exceptions.ServerError: An unexpected error
> > > > > occurred server side :
> > > > > com.google.common.util.concurrent.UncheckedExecutionException:
> > > > > com.google.common.util.concurrent.UncheckedExecutionException:
> > > > > java.lang.RuntimeException:
> > > > > org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed
> > > > > out - received only 0 responses.
> > > >
> > > > at com.datastax.driver.core.exceptions.ServerError.copy(
> > > > > ServerError.java:63) ~[cassandra-driver-core-3.3.0-shaded.jar!/:?] at
> > > > > com.datastax.driver.core.exceptions.ServerError.copy(ServerError.java:
> > > > > 25) ~[cassandra-driver-core-3.3.0-shaded.jar!/:?] at
> > > > > com.datastax.driver.core.DriverThrowables.propagateCause(
> > > > > DriverThrowables.java:37) ~[cassandra-driver-core-3.3.0-shaded.jar!/:?
> > > > > ] at
> > > > > com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(
> > > > > DefaultResultSetFuture.java:245) ~[cassandra-driver-core-3
> > > > > .3.0-shaded.jar!/:?] at
> > > > > com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:
> > > > > 68) ~[cassandra-driver-core-3.3.0-shaded.jar!/:? ]
> > > >
> > > > ............ cntd
> > > >
> > > >
> > > >
> > > > com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra
> > > > > timeout during read query at consistency LOCAL_QUORUM (2 responses were
> > > > > required but only 1 replica responded)
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > *And around the same time these were the errors on the server side
> > > > (from cassandra logs): *
> > > >
> > > >
> > > > >
> > > > > *ERROR [RolesCacheRefresh:1] 2021-07-26 06:32:43,094
> > > > > CassandraDaemon.java:207 - Exception in thread
> > > > > Thread[RolesCacheRefresh:1,5,main] java.lang.RuntimeException:
> > > > > org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out -
> > > > > received only 0 responses. * at
> > > > > org.apache.cassandra.auth.CassandraRoleManager.getRole(CassandraRoleManager.java:512)
> > > > > ~[apache-cassandra-3.0.13.jar:3.0.13]
> > > > > at
> > > > > org.apache.cassandra.auth.CassandraRoleManager.getRoles(CassandraRoleManager.java:280)
> > > > > ~[apache-cassandra-3.0.13.jar:3.0.13]
> > > > > at
> > > > > org.apache.cassandra.auth.RolesCache$1$1.call(RolesCache.java:135)
> > > > > ~[apache-cassandra-3.0.13.jar:3.0.13]
> > > > > at
> > > > > org.apache.cassandra.auth.RolesCache$1$1.call(RolesCache.java:130)
> > > > > ~[apache-cassandra-3.0.13.jar:3.0.13]
> > > > > at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> > > > > ~[na:1.8.0_131]
> > > > > at
> > > > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> > > > > ~[na:1.8.0_131]
> > > > > at
> > > > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> > > > > [na:1.8.0_131]
> > > > > at
> > > > > org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79)
> > > > > [apache-cassandra-3.0.13.jar:3.0.13]
> > > > > at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_131]
> > > > > Caused by: org.apache.cassandra.exceptions.ReadTimeoutException:
> > > > > Operation timed out - received only 0 responses.
> > > > > at
> > > > > org.apache.cassandra.service.ReadCallback.awaitResults(ReadCallback.java:132)
> > > > > ~[apache-cassandra-3.0.13.jar:3.0.13]
> > > > > at
> > > > > org.apache.cassandra.service.ReadCallback.get(ReadCallback.java:137)
> > > > > ~[apache-cassandra-3.0.13.jar:3.0.13]
> > > > > at
> > > > > org.apache.cassandra.service.AbstractReadExecutor.get(AbstractReadExecutor.java:145)
> > > > > ~[apache-cassandra-3.0.13.jar:3.0.13]
> > > > > at
> > > > > org.apache.cassandra.service.StorageProxy$SinglePartitionReadLifecycle.awaitResultsAndRetryOnDigestMismatch(StorageProxy.java:1715)
> > > > > ~[apache-cassandra-3.0.13.jar:3.0.13]
> > > > > at
> > > > > org.apache.cassandra.service.StorageProxy.fetchRows(StorageProxy.java:1664)
> > > > > ~[apache-cassandra-3.0.13.jar:3.0.13]
> > > > > at
> > > > > org.apache.cassandra.service.StorageProxy.readRegular(StorageProxy.java:1605)
> > > > > ~[apache-cassandra-3.0.13.jar:3.0.13]
> > > > > at
> > > > > org.apache.cassandra.service.StorageProxy.read(StorageProxy.java:1524)
> > > > > ~[apache-cassandra-3.0.13.jar:3.0.13]
> > > > > at
> > > > > org.apache.cassandra.db.SinglePartitionReadCommand$Group.execute(SinglePartitionReadCommand.java:955)
> > > > > ~[apache-cassandra-3.0.13.jar:3.0.13]
> > > > > at
> > > > > org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:263)
> > > > > ~[apache-cassandra-3.0.13.jar:3.0.13]
> > > > > at
> > > > > org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:224)
> > > > > ~[apache-cassandra-3.0.13.jar:3.0.13]
> > > > > at
> > > > > org.apache.cassandra.auth.CassandraRoleManager.getRoleFromTable(CassandraRoleManager.java:520)
> > > > > ~[apache-cassandra-3.0.13.jar:3.0.13]
> > > > > at
> > > > > org.apache.cassandra.auth.CassandraRoleManager.getRole(CassandraRoleManager.java:502)
> > > > > ~[apache-cassandra-3.0.13.jar:3.0.13]
> > > > >
> > > > >
> > > > >
> > > > > *ERROR [PermissionsCacheRefresh:1] 2021-07-26 07:11:25,804
> > > > > CassandraDaemon.java:207 - Exception in thread
> > > > > Thread[PermissionsCacheRefresh:1,5,main] java.lang.RuntimeException:
> > > > > org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out -
> > > > > received only 0 responses. * at
> > > > > org.apache.cassandra.auth.CassandraRoleManager.getRole(CassandraRoleManager.java:512)
> > > > > ~[apache-cassandra-3.0.13.jar:3.0.13]
> > > > > at
> > > > > org.apache.cassandra.auth.CassandraRoleManager.isSuper(CassandraRoleManager.java:304)
> > > > > ~[apache-cassandra-3.0.13.jar:3.0.13]
> > > > > at
> > > > > org.apache.cassandra.auth.Roles.hasSuperuserStatus(Roles.java:52)
> > > > > ~[apache-cassandra-3.0.13.jar:3.0.13]
> > > > > at
> > > > > org.apache.cassandra.auth.AuthenticatedUser.isSuper(AuthenticatedUser.java:71)
> > > > > ~[apache-cassandra-3.0.13.jar:3.0.13]
> > > > > at
> > > > > org.apache.cassandra.auth.CassandraAuthorizer.authorize(CassandraAuthorizer.java:76)
> > > > > ~[apache-cassandra-3.0.13.jar:3.0.13]
> > > > > at
> > > > > org.apache.cassandra.auth.PermissionsCache$1$1.call(PermissionsCache.java:136)
> > > > > ~[apache-cassandra-3.0.13.jar:3.0.13]
> > > > > at
> > > > > org.apache.cassandra.auth.PermissionsCache$1$1.call(PermissionsCache.java:131)
> > > > > ~[apache-cassandra-3.0.13.jar:3.0.13]
> > > > > at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> > > > > ~[na:1.8.0_131]
> > > > > at
> > > > > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> > > > > ~[na:1.8.0_131]
> > > > > at
> > > > > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> > > > > [na:1.8.0_131]
> > > > > at
> > > > > org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79)
> > > > > [apache-cassandra-3.0.13.jar:3.0.13]
> > > > > at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_131]
> > > > > Caused by: org.apache.cassandra.exceptions.ReadTimeoutException:
> > > > > Operation timed out - received only 0 responses.
> > > > > at
> > > > > org.apache.cassandra.service.ReadCallback.awaitResults(ReadCallback.java:132)
> > > > > ~[apache-cassandra-3.0.13.jar:3.0.13]
> > > > > at
> > > > > org.apache.cassandra.service.ReadCallback.get(ReadCallback.java:137)
> > > > > ~[apache-cassandra-3.0.13.jar:3.0.13]
> > > > > at
> > > > > org.apache.cassandra.service.AbstractReadExecutor.get(AbstractReadExecutor.java:145)
> > > > > ~[apache-cassandra-3.0.13.jar:3.0.13]
> > > > > at
> > > > > org.apache.cassandra.service.StorageProxy$SinglePartitionReadLifecycle.awaitResultsAndRetryOnDigestMismatch(StorageProxy.java:1715)
> > > > > ~[apache-cassandra-3.0.13.jar:3.0.13]
> > > > > at
> > > > > org.apache.cassandra.service.StorageProxy.fetchRows(StorageProxy.java:1664)
> > > > > ~[apache-cassandra-3.0.13.jar:3.0.13]
> > > > > at
> > > > > org.apache.cassandra.service.StorageProxy.readRegular(StorageProxy.java:1605)
> > > > > ~[apache-cassandra-3.0.13.jar:3.0.13]
> > > > > at
> > > > > org.apache.cassandra.service.StorageProxy.read(StorageProxy.java:1524)
> > > > > ~[apache-cassandra-3.0.13.jar:3.0.13]
> > > > > at
> > > > > org.apache.cassandra.db.SinglePartitionReadCommand$Group.execute(SinglePartitionReadCommand.java:955)
> > > > > ~[apache-cassandra-3.0.13.jar:3.0.13]
> > > > > at
> > > > > org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:263)
> > > > > ~[apache-cassandra-3.0.13.jar:3.0.13]
> > > > > at
> > > > > org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:224)
> > > > > ~[apache-cassandra-3.0.13.jar:3.0.13]
> > > > > at
> > > > > org.apache.cassandra.auth.CassandraRoleManager.getRoleFromTable(CassandraRoleManager.java:520)
> > > > > ~[apache-cassandra-3.0.13.jar:3.0.13]
> > > > > at
> > > > > org.apache.cassandra.auth.CassandraRoleManager.getRole(CassandraRoleManager.java:502)
> > > > > ~[apache-cassandra-3.0.13.jar:3.0.13]
> > > > >
> > > >
> > > >
> > > >
> > > > *These are the values of these params in my configuration file *
> > > >
> > > > permissions_validity_in_ms: 300000
> > > > > permissions_update_interval_in_ms: 20000
> > > > > roles_validity_in_ms: 300000
> > > > > roles_update_interval_in_ms: 20000
> > > > >
> > > >
> > > > This was not the case earlier and since this comes from a single app
> > > > alone we are not sure if this is actually the issue. Can anyone please
> > > > point out if these values are misconfigured and hence causing the issue or
> > > > is it somewhere else we should be looking at?
> > > >
> > > > Any help would be appreciated.
> > > >
> > > > Thanks & Regards,
> > > > Chahat.
> > > >
> > > >
> > >
> > > --
> > > Thanks & Regards,
> > > Chahat Bhatia
> > > Systems Engineer
> > > *Evive*
> > > +91 7087629779
> > >
> > >
> >
> > --
> > Thanks & Regards,
> > Chahat Bhatia
> > Systems Engineer
> > *Evive*
> > +91 7087629779
> >
> >
>
> --
> Thanks & Regards,
> Chahat Bhatia
> Systems Engineer
> *Evive*
> +91 7087629779
>
>
--
Thanks & Regards,
Chahat Bhatia
Systems Engineer
*Evive*
+91 7087629779
[Attachment #3 (text/html)]
<div dir="ltr">Okay. Sure. Thanks a lot for all the information. Really helped. :) \
</div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, 27 Jul \
2021 at 21:05, Bowen Song <bowen@bso.ng> wrote:<br></div><blockquote \
class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid \
rgb(204,204,204);padding-left:1ex">
<div>
<p>Based on the information I know, I'd say that you don't have any
specific issue with the authentication related tables, but you do
have a general overloading problem during peak load. I think it's
fairly likely that your 7 nodes cluster (6 nodes in one DC) is not
able to keep up with the peak load, and you will need to either
scale up for the peak load or tune the application to avoid the
bursty behaviour. <br>
</p>
<div>On 27/07/2021 16:27, Chahat Bhatia
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">Yes, the application in quite read heavy and the
request pattern is bursty too. Hence that big a request failure
in such less time.
<div>Also, nothing out of the ordinary in cfstats and
proxyhistograms.<br>
<br>
But there are Native-Transport-Requests dropped messages
(Almost similar stats on all the nodes) : </div>
<div><br>
/opt/evive/pkgs/apache-cassandra-3.0.13/bin/nodetool tpstats
<blockquote class="gmail_quote" style="margin:0px 0px 0px \
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Pool Name \
Active Pending Completed Blocked \
All time blocked<br>
...<br>
Native-Transport-Requests 6 0 \
27777476 0 8262<br>
<br>
Message type Dropped<br>
READ 24260<br>
</blockquote>
<div><br>
<b><br>
We have the below values for the following params:</b><br>
concurrent_reads: 96<br>
native_transport_max_threads: 320<br>
JVM_OPTS="$JVM_OPTS
-Dcassandra.max_queued_native_transport_requests=3072"<br>
<br>
Do you think it would be advisable to tune the number in the
above params to have lesser load on the node? <br>
<br>
</div>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Tue, 27 Jul 2021 at 20:13,
Bowen Song <a href="mailto:bowen@bso.ng" \
target="_blank"><bowen@bso.ng></a> wrote:<br> </div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px \
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> <div>
<p>Wow, 15 seconds timeout? That's pretty long... You may
want to check the <tt>nodetool tpstats</tt> and make sure
the NTP thread pool isn't blocking things.<br>
</p>
<p> 16k read requests dropped in 5 seconds, or over 3k
requests per second on a single node, is a bit suspicious.
Does your read requests tend to be bursty?</p>
<div>On 27/07/2021 15:32, Chahat Bhatia wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">Yes, RF=6 for system auth. Sorry my bad.<br>
<br>
<br>
No, we are not using cassandra user for the application.
We have a custom super user for our operational and
administrative tasks and a separate role with
needed perms for the application. <br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px \
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> role \
| super | login | options<br>
------------------------+-------+-------+---------<br>
EviveCassandra | True | True | {} </blockquote>
<blockquote class="gmail_quote" style="margin:0px 0px 0px \
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> \
ccusr | False | True | {}</blockquote>
<div> </div>
<div><br>
<ul>
<li> gc_warn_threshold_in_ms: 1000</li>
<li>read_request_timeout_in_ms: 15000 // been the
same since the start </li>
<li>cross_node_timeout: false // but we
have ntp on each server and the clocks are
synced. <br>
<br>
</li>
<li>On checking the logs a little more I found out
these:</li>
</ul>
</div>
<blockquote style="margin:0px 0px 0px 40px;border:none;padding:0px">
<div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px \
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><font face="verdana, \
sans-serif">INFO [ScheduledTasks:1] 2021-07-26 07:21:33,279
MessagingService.java:1013 - READ messages were
dropped in last 5000 ms: 289 for internal
timeout and 0 for cross node timeout</font></blockquote>
</div>
<div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px \
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><font face="verdana, \
sans-serif">INFO [ScheduledTasks:1] 2021-07-26 07:22:33,894
MessagingService.java:1013 - READ messages were
dropped in last 5000 ms: 273 for internal
timeout and 0 for cross node timeout</font></blockquote>
</div>
<div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px \
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><font face="verdana, \
sans-serif">INFO [ScheduledTasks:1] 2021-07-27 06:12:03,988
MessagingService.java:1013 - READ messages were
dropped in last 5000 ms: 160 for internal
timeout and 0 for cross node timeout</font></blockquote>
</div>
<div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px \
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><font face="verdana, \
sans-serif">INFO [ScheduledTasks:1] 2021-07-27 06:13:09,125
MessagingService.java:1013 - READ messages were
dropped in last 5000 ms: 223 for internal
timeout and 0 for cross node timeout</font></blockquote>
</div>
<div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px \
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><font face="verdana, \
sans-serif">INFO [ScheduledTasks:1] 2021-07-27 06:13:44,212
MessagingService.java:1013 - READ messages were
dropped in last 5000 ms: 72 for internal timeout
and 0 for cross node timeout</font></blockquote>
</div>
<div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px \
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><font face="verdana, \
sans-serif">INFO [ScheduledTasks:1] 2021-07-27 06:14:14,496
MessagingService.java:1013 - READ messages were
dropped in last 5000 ms: 16036 for internal
timeout and 0 for cross node timeout</font></blockquote>
</div>
<div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px \
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><font face="verdana, \
sans-serif">INFO [ScheduledTasks:1] 2021-07-27 06:15:59,744
MessagingService.java:1013 - READ messages were
dropped in last 5000 ms: 287 for internal
timeout and 0 for cross node timeout</font></blockquote>
</div>
</blockquote>
<div><br>
</div>
<div>
<ul>
<li>Also, I am checking cfstats and
proxyhistorgrams is in progress, will update
incase anythings suspicious.</li>
</ul>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Tue, 27 Jul 2021 at
18:09, Bowen Song <a href="mailto:bowen@bso.ng" \
target="_blank"><bowen@bso.ng></a> wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px \
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> <div>
<p>Hello Chahat,</p>
<p><br>
</p>
<p>You haven't replied to the first point, are you
using the "cassandra" user?</p>
<p><br>
</p>
<p>The schema and your description don't quite
match. When you said:<br>
</p>
<blockquote>
<p><i>the system_auth for 2 DCs : \
</i><i><b>us-east</b></i><i> with 6 nodes (and RF=3) and ...<br>
</i></p>
</blockquote>
<p>I assume you meant to say 6 nodes and RF=6? That
sounds totally fine.</p>
<p><br>
</p>
<p>As of the GC logs, do you see any WARN level
logs? You may also want to double check the choice
of the <tt>gc_warn_threshold_in_ms</tt> value is
reasonable. I'm a bit surprised to see the GC logs
full of ~1.6ms GC pauses. AFAIK, by default
Cassandra only logs GC pauses longer than 200ms. I
have to assume that's the result of your custom
script. You may want to double check the script
actually does what it says on the tin, just in
case it misses something.<br>
</p>
<p><br>
</p>
<p>Then, the other relevant questions.</p>
<p>Do you use cross node timeout? If so, are the
clocks in sync between all servers?</p>
<p>What is the timeout values in the cassandra.yaml
file? (<tt>read_request_timeout_in_ms,</tt> etc.)
Are they reasonably long enough for the
corresponding request type to complete?<br>
</p>
<p>Since you've only got 7 nodes, I'd also recommend
you to check the <tt>nodetool cfstats</tt> &
<tt>nodetool cfhistograms</tt> output for the
tables in the system_auth keyspace on each node,
and see if you can spot anything out of ordinary.<br>
</p>
<p><br>
</p>
<p>Regards,</p>
<p>Bowen<br>
</p>
<div>On 27/07/2021 09:23, Chahat Bhatia wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">Thanks for the prompt response.<br>
<b><br>
Here is the system_schema.keyspaces entry:</b><br>
<blockquote style="margin:0px 0px 0px \
40px;border:none;padding:0px">
<blockquote class="gmail_quote" style="margin:0px 0px 0px \
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> \
system_auth | True | {'class':
\
'org.apache.cassandra.locator.NetworkTopologyStrategy',
'us-east': '6', 'us-east-backup': \
'1'}<br>
census | \
True | {'class':
\
'org.apache.cassandra.locator.NetworkTopologyStrategy',
'us-east': '3', 'us-east-backup': \
'1'}</blockquote> </blockquote>
<br>
So, the system_auth for 2 DCs : <i><b>us-east</b>
with 6 nodes (and RF=3) and <b>us-east-backup</b>
with a single node and 100% of the data.</i>
<div><b>Keyspace census</b> showing RF=3 for the
main DC and RF=1 for the backup site.<br>
<br>
<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px \
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">And for running the \
application, we have a user specifically created for that application
itself and that user also has its appropriate
permissions in cassandra to select, modify and
delete from concerned tables. And it uses
LOCAL_QUORUM for querying the data and the
local-dc is set to 'us-east'. </blockquote>
<div><br>
<br>
</div>
<div>Also, there is no excessive GC for any of
the nodes, we run a custom script to trackthe
GC stats (from the cassandra log itself) and
output it. Below is the output from the
current running script and its similar for all
the servers: </div>
<blockquote style="margin:0px 0px 0px \
40px;border:none;padding:0px"> <div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px \
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">2021-07-27 \
03:04:45,072 INFO gcstats:58 - Application Thread stop time 0.001656 seconds.<br>
2021-07-27 03:04:45,080 INFO gcstats:58 -
Application Thread stop time 0.001669
seconds.<br>
2021-07-27 03:04:45,087 INFO gcstats:58 -
Application Thread stop time 0.001601
seconds.<br>
2021-07-27 03:04:45,095 INFO gcstats:58 -
Application Thread stop time 0.001713
seconds.<br>
2021-07-27 03:04:45,103 INFO gcstats:58 -
Application Thread stop time 0.001586
seconds.<br>
2021-07-27 03:04:45,110 INFO gcstats:58 -
Application Thread stop time 0.001671
seconds.<br>
2021-07-27 03:04:45,118 INFO gcstats:58 -
Application Thread stop time 0.001691
seconds.<br>
2021-07-27 03:04:45,127 INFO gcstats:58 -
Application Thread stop time 0.001860
seconds.<br>
2021-07-27 03:04:45,134 INFO gcstats:58 -
Application Thread stop time 0.001630
seconds.<br>
2021-07-27 03:04:45,141 INFO gcstats:58 -
Application Thread stop time 0.001515
seconds.<br>
2021-07-27 03:04:45,148 INFO gcstats:58 -
Application Thread stop time 0.001533
seconds.<br>
2021-07-27 03:04:45,156 INFO gcstats:58 -
Application Thread stop time 0.001630
seconds.<br>
2021-07-27 03:04:45,163 INFO gcstats:58 -
Application Thread stop time 0.001577
seconds.<br>
2021-07-27 03:04:45,170 INFO gcstats:58 -
Application Thread stop time 0.001538
seconds.<br>
2021-07-27 03:04:45,177 INFO gcstats:58 -
Application Thread stop time 0.001615
seconds.<br>
2021-07-27 03:04:45,186 INFO gcstats:58 -
Application Thread stop time 0.001584
seconds.<br>
2021-07-27 03:04:45,193 INFO gcstats:58 -
Application Thread stop time 0.001558
seconds.<br>
2021-07-27 03:04:45,200 INFO gcstats:58 -
Application Thread stop time 0.001696
seconds.<br>
2021-07-27 03:04:45,208 INFO gcstats:58 -
Application Thread stop time 0.001658
seconds.<br>
2021-07-27 03:04:45,215 INFO gcstats:58 -
Application Thread stop time 0.001592
seconds.<br>
2021-07-27 03:04:45,222 INFO gcstats:58 -
Application Thread stop time 0.001618
seconds.<br>
2021-07-27 03:05:08,907 INFO gcstats:58 -
Application Thread stop time 0.001624
seconds.<br>
2021-07-27 03:06:34,436 INFO gcstats:58 -
Application Thread stop time 0.297773
seconds.<br>
</blockquote>
</div>
</blockquote>
<div>
<blockquote style="margin:0px 0px 0px \
40px;border:none;padding:0px"><br> <br>
<br>
</blockquote>
<br>
</div>
</div>
<br>
<div class="gmail_quote">
<div dir="ltr" class="gmail_attr">On Tue, 27 Jul
2021 at 13:23, 'Bowen Song' via Infra Updates
<<a href="mailto:infra-updates@goevive.com" \
target="_blank">infra-updates@goevive.com</a>> wrote:<br>
</div>
<blockquote class="gmail_quote" style="margin:0px 0px 0px \
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> <div>
<p>Hello Chahat,</p>
<p><br>
</p>
<p>First, can you please make sure the
Cassandra user used by the application is
not "<span>cassandra"</span>? Because the
"<span>cassandra" user</span> uses QUORUM
consistency level to read the auth tables.</p>
<p>Then, can you please make sure the
replication strategy is set correctly for
the system_auth namespace? I.e.: ensure
the old DC is not present, and the new DC
has sufficient number of replicas for
fault tolerance.</p>
<p>Finally, can you please check the GC
logs, and make sure there isn't JVM GC
issues, espicially long STW pauses?</p>
<p><br>
</p>
<p>Regards,</p>
<p>Bowen</p>
<p><br>
</p>
<div>On 27/07/2021 08:34, Chahat Bhatia
wrote:<br>
</div>
<blockquote type="cite">
<div dir="ltr">Hi Community,
<div><br>
Context: We are running a cluster of 6
nodes in production with a RF=3 in
AWS.<br>
We recently moved from physical
servers to cloud by adding a new DC
and then removing the old one.
Everything is working fine in all the
other applications except this one. </div>
<div><br>
<b>As we recently started experiencing
read timeouts in one of our
production applications where the
client threw <br>
</b></div>
<blockquote style="margin:0px 0px 0px \
40px;border:none;padding:0px"> <blockquote class="gmail_quote" style="margin:0px 0px \
0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span \
style="background-color:rgb(255,255,255)"><span \
style="color:rgba(28,43,52,0.98);font-family:"Roboto \
Mono",Menlo,"Liberation \
Mono",Courier,monospace;font-size:13px;white-space:break-spaces">Error An \
unexpected error occurred server side on</span><span \
style="box-sizing:inherit;color:rgba(28,43,52,0.98);font-family:"Roboto \
Mono",Menlo,"Liberation \
Mono",Courier,monospace;font-size:13px;white-space:break-spaces"> \
ip-IP</span><span style="color:rgba(28,43,52,0.98);font-family:"Roboto \
Mono",Menlo,"Liberation \
Mono",Courier,monospace;font-size:13px;white-space:break-spaces">.</span><span \
style="box-sizing:inherit;color:rgba(28,43,52,0.98);font-family:"Roboto \
Mono",Menlo,"Liberation \
Mono",Courier,monospace;font-size:13px;white-space:break-spaces">ec2</span><span \
style="color:rgba(28,43,52,0.98);font-family:"Roboto \
Mono",Menlo,"Liberation \
Mono",Courier,monospace;font-size:13px;white-space:break-spaces">.internal</span><span \
style="color:rgba(28,43,52,0.98);font-family:"Roboto \
Mono",Menlo,"Liberation \
Mono",Courier,monospace;font-size:13px;white-space:break-spaces">:</span><span \
style="box-sizing:inherit;font-family:"Roboto Mono",Menlo,"Liberation \
Mono",Courier,monospace;font-size:13px;white-space:break-spaces"> \
com.google.common.util.concurrent.<b>UncheckedExecutionException</b>:<span \
style="box-sizing:inherit"></span></span><span \
style="box-sizing:inherit;font-family:"Roboto Mono",Menlo,"Liberation \
Mono",Courier,monospace;font-size:13px;white-space:break-spaces"> \
<b>com.google.common.util.concurrent.UncheckedExecutionException:<span \
style="box-sizing:inherit"></span></b></span><b><span \
style="box-sizing:inherit;font-family:"Roboto Mono",Menlo,"Liberation \
Mono",Courier,monospace;font-size:13px;white-space:break-spaces"> \
java.lang.RuntimeException:<span style="box-sizing:inherit"></span></span><span \
style="box-sizing:inherit;font-family:"Roboto Mono",Menlo,"Liberation \
Mono",Courier,monospace;font-size:13px;white-space:break-spaces"> \
org.apache.cassandra.exceptions.ReadTimeoutException:<span \
style="box-sizing:inherit"></span></span><span \
style="color:rgba(28,43,52,0.98);font-family:"Roboto \
Mono",Menlo,"Liberation \
Mono",Courier,monospace;font-size:13px;white-space:break-spaces"> Operation \
timed out</span><span style="box-sizing:inherit;font-family:"Roboto \
Mono",Menlo,"Liberation \
Mono",Courier,monospace;font-size:13px;white-space:break-spaces"> -</span><span \
style="color:rgba(28,43,52,0.98);font-family:"Roboto \
Mono",Menlo,"Liberation \
Mono",Courier,monospace;font-size:13px;white-space:break-spaces"> received \
only</span><span style="box-sizing:inherit;font-family:"Roboto \
Mono",Menlo,"Liberation \
Mono",Courier,monospace;font-size:13px;white-space:break-spaces"> 0</span><span \
style="color:rgba(28,43,52,0.98);font-family:"Roboto \
Mono",Menlo,"Liberation \
Mono",Courier,monospace;font-size:13px;white-space:break-spaces"> \
responses.</span></b><span style="box-sizing:inherit;font-family:"Roboto \
Mono",Menlo,"Liberation \
Mono",Courier,monospace;font-size:13px;white-space:break-spaces"> \
com.datastax.driver.core.exceptions.ServerError:<span \
style="box-sizing:inherit"></span></span><span \
style="color:rgba(28,43,52,0.98);font-family:"Roboto \
Mono",Menlo,"Liberation \
Mono",Courier,monospace;font-size:13px;white-space:break-spaces"> An unexpected \
error occurred server side </span> <span \
style="font-size:13px;color:rgba(28,43,52,0.98);font-family:"Roboto \
Mono",Menlo,"Liberation \
Mono",Courier,monospace;white-space:break-spaces">:</span><span \
style="font-size:13px;box-sizing:inherit;font-family:"Roboto \
Mono",Menlo,"Liberation \
Mono",Courier,monospace;white-space:break-spaces"> \
com.google.common.util.concurrent.UncheckedExecutionException:<span \
style="box-sizing:inherit"></span></span><span \
style="font-size:13px;box-sizing:inherit;font-family:"Roboto \
Mono",Menlo,"Liberation \
Mono",Courier,monospace;white-space:break-spaces"> \
com.google.common.util.concurrent.UncheckedExecutionException:<span \
style="box-sizing:inherit"></span></span><span \
style="font-size:13px;box-sizing:inherit;font-family:"Roboto \
Mono",Menlo,"Liberation \
Mono",Courier,monospace;white-space:break-spaces"> \
java.lang.RuntimeException:<span style="box-sizing:inherit"></span></span><span \
style="font-size:13px;box-sizing:inherit;font-family:"Roboto \
Mono",Menlo,"Liberation \
Mono",Courier,monospace;white-space:break-spaces"> \
org.apache.cassandra.exceptions.ReadTimeoutException:<span \
style="box-sizing:inherit"></span></span><span \
style="font-size:13px;color:rgba(28,43,52,0.98);font-family:"Roboto \
Mono",Menlo,"Liberation \
Mono",Courier,monospace;white-space:break-spaces"> Operation timed \
out</span><span style="font-size:13px;box-sizing:inherit;font-family:"Roboto \
Mono",Menlo,"Liberation \
Mono",Courier,monospace;white-space:break-spaces"> -</span><span \
style="font-size:13px;color:rgba(28,43,52,0.98);font-family:"Roboto \
Mono",Menlo,"Liberation \
Mono",Courier,monospace;white-space:break-spaces"> received only</span><span \
style="font-size:13px;box-sizing:inherit;font-family:"Roboto \
Mono",Menlo,"Liberation \
Mono",Courier,monospace;white-space:break-spaces"> 0</span><span \
style="font-size:13px;color:rgba(28,43,52,0.98);font-family:"Roboto \
Mono",Menlo,"Liberation \
Mono",Courier,monospace;white-space:break-spaces"> \
responses.</span></span></blockquote> <blockquote class="gmail_quote" \
style="margin:0px 0px 0px 0.8ex;border-left:1px solid \
rgb(204,204,204);padding-left:1ex"><span \
style="background-color:rgb(255,255,255)"><span \
style="color:rgba(28,43,52,0.98);font-family:"Roboto \
Mono",Menlo,"Liberation \
Mono",Courier,monospace;font-size:13px;white-space:break-spaces"> at \
com.datastax.driver.core.exceptions.ServerError.copy</span><span \
style="box-sizing:inherit;font-family:"Roboto Mono",Menlo,"Liberation \
Mono",Courier,monospace;font-size:13px;white-space:break-spaces">(<span \
style="box-sizing:inherit">ServerError.java:<span \
style="box-sizing:inherit">63</span><span \
style="box-sizing:inherit"></span></span>)</span><span \
style="color:rgba(28,43,52,0.98);font-family:"Roboto \
Mono",Menlo,"Liberation \
Mono",Courier,monospace;font-size:13px;white-space:break-spaces"> ~</span><span \
style="box-sizing:inherit;font-family:"Roboto Mono",Menlo,"Liberation \
Mono",Courier,monospace;font-size:13px;white-space:break-spaces">[<span \
style="box-sizing:inherit">cassandra-driver-core-3</span><span \
style="box-sizing:inherit">.3.0-shaded.jar!/:?</span>]</span><span \
style="color:rgba(28,43,52,0.98);font-family:"Roboto \
Mono",Menlo,"Liberation \
Mono",Courier,monospace;font-size:13px;white-space:break-spaces"> at \
com.datastax.driver.core.exceptions.ServerError.copy</span><span \
style="box-sizing:inherit;font-family:"Roboto Mono",Menlo,"Liberation \
Mono",Courier,monospace;font-size:13px;white-space:break-spaces">(<span \
style="box-sizing:inherit">ServerError.java:<span \
style="box-sizing:inherit">25</span><span \
style="box-sizing:inherit"></span></span>)</span><span \
style="color:rgba(28,43,52,0.98);font-family:"Roboto \
Mono",Menlo,"Liberation \
Mono",Courier,monospace;font-size:13px;white-space:break-spaces"> ~</span><span \
style="box-sizing:inherit;font-family:"Roboto Mono",Menlo,"Liberation \
Mono",Courier,monospace;font-size:13px;white-space:break-spaces">[<span \
style="box-sizing:inherit">cassandra-driver-core-3</span><span \
style="box-sizing:inherit">.3.0-shaded.jar!/:?</span>]</span><span \
style="color:rgba(28,43,52,0.98);font-family:"Roboto \
Mono",Menlo,"Liberation \
Mono",Courier,monospace;font-size:13px;white-space:break-spaces"> at \
com.datastax.driver.core.DriverThrowables.propagateCause</span><span \
style="box-sizing:inherit;font-family:"Roboto Mono",Menlo,"Liberation \
Mono",Courier,monospace;font-size:13px;white-space:break-spaces">(<span \
style="box-sizing:inherit">DriverThrowables.java:<span \
style="box-sizing:inherit">37</span><span \
style="box-sizing:inherit"></span></span>)</span><span \
style="color:rgba(28,43,52,0.98);font-family:"Roboto \
Mono",Menlo,"Liberation \
Mono",Courier,monospace;font-size:13px;white-space:break-spaces"> ~</span><span \
style="box-sizing:inherit;font-family:"Roboto Mono",Menlo,"Liberation \
Mono",Courier,monospace;font-size:13px;white-space:break-spaces">[<span \
style="box-sizing:inherit">cassandra-driver-core-3</span><span \
style="box-sizing:inherit">.3.0-shaded.jar!/:?</span>]</span><span \
style="color:rgba(28,43,52,0.98);font-family:"Roboto \
Mono",Menlo,"Liberation \
Mono",Courier,monospace;font-size:13px;white-space:break-spaces"> at \
com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly</span><span \
style="box-sizing:inherit;font-family:"Roboto Mono",Menlo,"Liberation \
Mono",Courier,monospace;font-size:13px;white-space:break-spaces">(<span \
style="box-sizing:inherit">DefaultResultSetFuture.java:<span \
style="box-sizing:inherit">245</span><span \
style="box-sizing:inherit"></span></span>)</span><span \
style="color:rgba(28,43,52,0.98);font-family:"Roboto \
Mono",Menlo,"Liberation \
Mono",Courier,monospace;font-size:13px;white-space:break-spaces"> ~</span><span \
style="box-sizing:inherit;font-family:"Roboto Mono",Menlo,"Liberation \
Mono",Courier,monospace;font-size:13px;white-space:break-spaces">[<span \
style="box-sizing:inherit">cassandra-driver-core-3</span><span \
style="box-sizing:inherit">.3.0-shaded.jar!/:?</span>]</span><span \
style="color:rgba(28,43,52,0.98);font-family:"Roboto \
Mono",Menlo,"Liberation \
Mono",Courier,monospace;font-size:13px;white-space:break-spaces"> at \
com.datastax.driver.core.AbstractSession.execute</span><span \
style="box-sizing:inherit;font-family:"Roboto Mono",Menlo,"Liberation \
Mono",Courier,monospace;font-size:13px;white-space:break-spaces">(<span \
style="box-sizing:inherit">AbstractSession.java:<span \
style="box-sizing:inherit">68</span><span \
style="box-sizing:inherit"></span></span>)</span><span \
style="color:rgba(28,43,52,0.98);font-family:"Roboto \
Mono",Menlo,"Liberation \
Mono",Courier,monospace;font-size:13px;white-space:break-spaces"> \
~</span></span><span style="box-sizing:inherit;font-family:"Roboto \
Mono",Menlo,"Liberation \
Mono",Courier,monospace;font-size:13px;white-space:break-spaces"><span \
style="background-color:rgb(255,255,255)">[<span \
style="box-sizing:inherit">cassandra-driver-core-3</span><span \
style="box-sizing:inherit">.3.0-shaded.jar!/:?</span></span></span> <span \
style="font-size:13px;box-sizing:inherit;font-family:"Roboto \
Mono",Menlo,"Liberation \
Mono",Courier,monospace;white-space:break-spaces">]</span> </blockquote> \
</blockquote>
<blockquote style="margin:0px 0px 0px \
40px;border:none;padding:0px"> <blockquote class="gmail_quote" style="margin:0px 0px \
0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> ............ \
cntd</blockquote> </blockquote>
<blockquote style="margin:0px 0px 0px \
40px;border:none;padding:0px"> <div> </div>
</blockquote>
<blockquote style="margin:0px 0px 0px \
40px;border:none;padding:0px"> <blockquote class="gmail_quote" style="margin:0px 0px \
0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><font face="comic \
sans ms, sans-serif" \
color="#444444">com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra \
timeout during read query at consistency LOCAL_QUORUM
(2 responses were required but
only 1 replica responded)</font></blockquote>
</blockquote>
<div><br>
<br>
<b>And around the same time these were
the errors on the server side (from
cassandra logs):<br>
<br>
</b></div>
<div>
<blockquote style="margin:0px 0px 0px \
40px;border:none;padding:0px"> <blockquote class="gmail_quote" style="margin:0px 0px \
0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><font \
face="verdana, sans-serif" color="#444444"><b>ERROR [RolesCacheRefresh:1]
2021-07-26 06:32:43,094
CassandraDaemon.java:207 -
Exception in thread
Thread[RolesCacheRefresh:1,5,main]<br>
java.lang.RuntimeException:
\
org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received
only 0 responses.<br>
</b> at
org.apache.cassandra.auth.CassandraRoleManager.getRole(CassandraRoleManager.java:512)
~[apache-cassandra-3.0.13.jar:3.0.13]<br>
at
org.apache.cassandra.auth.CassandraRoleManager.getRoles(CassandraRoleManager.java:280)
~[apache-cassandra-3.0.13.jar:3.0.13]<br>
at
\
org.apache.cassandra.auth.RolesCache$1$1.call(RolesCache.java:135) \
~[apache-cassandra-3.0.13.jar:3.0.13]<br> at
\
org.apache.cassandra.auth.RolesCache$1$1.call(RolesCache.java:130) \
~[apache-cassandra-3.0.13.jar:3.0.13]<br> at
\
java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_131]<br>
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
~[na:1.8.0_131]<br>
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[na:1.8.0_131]<br>
at
org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79)
[apache-cassandra-3.0.13.jar:3.0.13]<br>
at
java.lang.Thread.run(Thread.java:748)
~[na:1.8.0_131]<br>
Caused by:
\
org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received
only 0 responses.<br>
at
org.apache.cassandra.service.ReadCallback.awaitResults(ReadCallback.java:132)
~[apache-cassandra-3.0.13.jar:3.0.13]<br>
at
\
org.apache.cassandra.service.ReadCallback.get(ReadCallback.java:137) \
~[apache-cassandra-3.0.13.jar:3.0.13]<br> at
org.apache.cassandra.service.AbstractReadExecutor.get(AbstractReadExecutor.java:145)
~[apache-cassandra-3.0.13.jar:3.0.13]<br>
at
org.apache.cassandra.service.StorageProxy$SinglePartitionReadLifecycle.awaitResultsAndRetryOnDigestMismatch(StorageProxy.java:1715)
~[apache-cassandra-3.0.13.jar:3.0.13]<br>
at
org.apache.cassandra.service.StorageProxy.fetchRows(StorageProxy.java:1664)
~[apache-cassandra-3.0.13.jar:3.0.13]<br>
at
org.apache.cassandra.service.StorageProxy.readRegular(StorageProxy.java:1605)
~[apache-cassandra-3.0.13.jar:3.0.13]<br>
at
\
org.apache.cassandra.service.StorageProxy.read(StorageProxy.java:1524) \
~[apache-cassandra-3.0.13.jar:3.0.13]<br> at
org.apache.cassandra.db.SinglePartitionReadCommand$Group.execute(SinglePartitionReadCommand.java:955)
~[apache-cassandra-3.0.13.jar:3.0.13]<br>
at
org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:263)
~[apache-cassandra-3.0.13.jar:3.0.13]<br>
at
org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:224)
~[apache-cassandra-3.0.13.jar:3.0.13]<br>
at
org.apache.cassandra.auth.CassandraRoleManager.getRoleFromTable(CassandraRoleManager.java:520)
~[apache-cassandra-3.0.13.jar:3.0.13]<br>
at
org.apache.cassandra.auth.CassandraRoleManager.getRole(CassandraRoleManager.java:502)
~[apache-cassandra-3.0.13.jar:3.0.13]<br>
<br>
<b>ERROR
[PermissionsCacheRefresh:1]
2021-07-26 07:11:25,804
CassandraDaemon.java:207 -
Exception in thread
\
Thread[PermissionsCacheRefresh:1,5,main]<br> java.lang.RuntimeException:
\
org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received
only 0 responses.<br>
</b> at
org.apache.cassandra.auth.CassandraRoleManager.getRole(CassandraRoleManager.java:512)
~[apache-cassandra-3.0.13.jar:3.0.13]<br>
at
org.apache.cassandra.auth.CassandraRoleManager.isSuper(CassandraRoleManager.java:304)
~[apache-cassandra-3.0.13.jar:3.0.13]<br>
at
\
org.apache.cassandra.auth.Roles.hasSuperuserStatus(Roles.java:52) \
~[apache-cassandra-3.0.13.jar:3.0.13]<br> at
org.apache.cassandra.auth.AuthenticatedUser.isSuper(AuthenticatedUser.java:71)
~[apache-cassandra-3.0.13.jar:3.0.13]<br>
at
org.apache.cassandra.auth.CassandraAuthorizer.authorize(CassandraAuthorizer.java:76)
~[apache-cassandra-3.0.13.jar:3.0.13]<br>
at
org.apache.cassandra.auth.PermissionsCache$1$1.call(PermissionsCache.java:136)
~[apache-cassandra-3.0.13.jar:3.0.13]<br>
at
org.apache.cassandra.auth.PermissionsCache$1$1.call(PermissionsCache.java:131)
~[apache-cassandra-3.0.13.jar:3.0.13]<br>
at
\
java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_131]<br>
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
~[na:1.8.0_131]<br>
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
[na:1.8.0_131]<br>
at
org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79)
[apache-cassandra-3.0.13.jar:3.0.13]<br>
at
java.lang.Thread.run(Thread.java:748)
~[na:1.8.0_131]<br>
Caused by:
\
org.apache.cassandra.exceptions.ReadTimeoutException: Operation timed out - received
only 0 responses.<br>
at
org.apache.cassandra.service.ReadCallback.awaitResults(ReadCallback.java:132)
~[apache-cassandra-3.0.13.jar:3.0.13]<br>
at
\
org.apache.cassandra.service.ReadCallback.get(ReadCallback.java:137) \
~[apache-cassandra-3.0.13.jar:3.0.13]<br> at
org.apache.cassandra.service.AbstractReadExecutor.get(AbstractReadExecutor.java:145)
~[apache-cassandra-3.0.13.jar:3.0.13]<br>
at
org.apache.cassandra.service.StorageProxy$SinglePartitionReadLifecycle.awaitResultsAndRetryOnDigestMismatch(StorageProxy.java:1715)
~[apache-cassandra-3.0.13.jar:3.0.13]<br>
at
org.apache.cassandra.service.StorageProxy.fetchRows(StorageProxy.java:1664)
~[apache-cassandra-3.0.13.jar:3.0.13]<br>
at
org.apache.cassandra.service.StorageProxy.readRegular(StorageProxy.java:1605)
~[apache-cassandra-3.0.13.jar:3.0.13]<br>
at
\
org.apache.cassandra.service.StorageProxy.read(StorageProxy.java:1524) \
~[apache-cassandra-3.0.13.jar:3.0.13]<br> at
org.apache.cassandra.db.SinglePartitionReadCommand$Group.execute(SinglePartitionReadCommand.java:955)
~[apache-cassandra-3.0.13.jar:3.0.13]<br>
at
org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:263)
~[apache-cassandra-3.0.13.jar:3.0.13]<br>
at
org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:224)
~[apache-cassandra-3.0.13.jar:3.0.13]<br>
at
org.apache.cassandra.auth.CassandraRoleManager.getRoleFromTable(CassandraRoleManager.java:520)
~[apache-cassandra-3.0.13.jar:3.0.13]<br>
at
org.apache.cassandra.auth.CassandraRoleManager.getRole(CassandraRoleManager.java:502)
~[apache-cassandra-3.0.13.jar:3.0.13]</font><br>
</blockquote>
</blockquote>
<blockquote style="margin:0px 0px 0px \
40px;border:none;padding:0px"> <div><br>
</div>
</blockquote>
<br>
<b>These are the values of these
params in my configuration file <br>
</b></div>
<blockquote style="margin:0px 0px 0px \
40px;border:none;padding:0px"> <div>
<blockquote class="gmail_quote" style="margin:0px \
0px 0px 0.8ex;border-left:1px solid \
rgb(204,204,204);padding-left:1ex">permissions_validity_in_ms: 300000<br>
permissions_update_interval_in_ms:
20000<br>
roles_validity_in_ms: 300000<br>
roles_update_interval_in_ms: 20000<br>
</blockquote>
</div>
</blockquote>
<div>
<div><br>
</div>
<div>This was not the case earlier and
since this comes from a single app
alone we are not sure if this is
actually the issue. Can anyone
please point out if these values are
misconfigured and hence causing the
issue or is it somewhere else we
should be looking at? <br>
<br>
Any help would be appreciated. <br>
<br>
Thanks & Regards,<br>
Chahat.<br clear="all">
<div><br>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</blockquote>
</div>
<br clear="all">
<div><br>
</div>
-- <br>
<div dir="ltr">
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div><span \
style="font-family:Arial;white-space:pre-wrap"><font size="1" color="#000000">Thanks \
& Regards,</font></span></div>
<div><font size="1" \
color="#000000"><span style="font-family:Arial;white-space:pre-wrap">Chahat \
Bhatia</span><br> </font></div>
<div><span \
style="font-family:Arial;white-space:pre-wrap"><font size="1" color="#000000">Systems \
Engineer</font></span></div> <div><span \
style="font-family:Arial;white-space:pre-wrap"><font size="1" \
color="#000000"><b>Evive</b></font></span></div> <div><span \
style="font-family:Arial;white-space:pre-wrap"><font size="1" color="#000000">+91 \
7087629779</font></span></div> </div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</blockquote>
</div>
<br clear="all">
<div><br>
</div>
-- <br>
<div dir="ltr">
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div><span \
style="font-family:Arial;white-space:pre-wrap"><font size="1" color="#000000">Thanks \
& Regards,</font></span></div>
<div><font size="1" color="#000000"><span \
style="font-family:Arial;white-space:pre-wrap">Chahat Bhatia</span><br> \
</font></div>
<div><span \
style="font-family:Arial;white-space:pre-wrap"><font size="1" color="#000000">Systems \
Engineer</font></span></div>
<div><span \
style="font-family:Arial;white-space:pre-wrap"><font size="1" \
color="#000000"><b>Evive</b></font></span></div>
<div><span \
style="font-family:Arial;white-space:pre-wrap"><font size="1" color="#000000">+91 \
7087629779</font></span></div> </div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</blockquote>
</div>
<br clear="all">
<div><br>
</div>
-- <br>
<div dir="ltr">
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div>
<div dir="ltr">
<div><span \
style="font-family:Arial;white-space:pre-wrap"><font size="1" color="#000000">Thanks \
& Regards,</font></span></div>
<div><font size="1" color="#000000"><span \
style="font-family:Arial;white-space:pre-wrap">Chahat Bhatia</span><br> \
</font></div>
<div><span \
style="font-family:Arial;white-space:pre-wrap"><font size="1" color="#000000">Systems \
Engineer</font></span></div>
<div><span \
style="font-family:Arial;white-space:pre-wrap"><font size="1" \
color="#000000"><b>Evive</b></font></span></div>
<div><span \
style="font-family:Arial;white-space:pre-wrap"><font size="1" color="#000000">+91 \
7087629779</font></span></div> </div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</blockquote>
</div>
</blockquote></div><br clear="all"><div><br></div>-- <br><div dir="ltr" \
class="gmail_signature"><div dir="ltr"><div><div dir="ltr"><div><div \
dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><span \
style="font-family:Arial;white-space:pre-wrap"><font color="#000000" size="1">Thanks \
& Regards,</font></span></div><div><font color="#000000" size="1"><span \
style="font-family:Arial;white-space:pre-wrap">Chahat \
Bhatia</span><br></font></div><div><span \
style="font-family:Arial;white-space:pre-wrap"><font color="#000000" size="1">Systems \
Engineer</font></span></div><div><span \
style="font-family:Arial;white-space:pre-wrap"><font color="#000000" \
size="1"><b>Evive</b></font></span></div><div><span \
style="font-family:Arial;white-space:pre-wrap"><font color="#000000" size="1">+91 \
7087629779</font></span></div></div></div></div></div></div></div></div></div></div></div></div></div>
[prev in list] [next in list] [prev in thread] [next in thread]
Configure |
About |
News |
Add a list |
Sponsored by KoreLogic