List:       cassandra-dev
Subject:    Re: [DISCUSSION] If we fix code that used default encoding to now be UTF-8... is this a regression?
From:       Derek Chen-Becker <derek () chen-becker ! org>
Date:       2022-11-29 17:26:21
Message-ID: CAMbmz5=Bzg7h7dKMJi5+MD+dnsqCdn4GYy+BjyeeC4L9aCj8+Q () mail ! gmail ! com

As an initial step, could we introduce some sort of log warning,
metric or other indicator for operators to determine if they're
running with a non-UTF-8 encoding?
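
A minimal sketch of what such a startup check might look like. The class
and method names here are hypothetical, and it uses java.util.logging only
to stay self-contained; the real thing would go through whatever logger and
metrics registry the project already uses.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.logging.Logger;

// Hypothetical helper, not an existing Cassandra class.
final class DefaultCharsetCheck {
    private static final Logger logger =
        Logger.getLogger(DefaultCharsetCheck.class.getName());

    // True when the given charset is UTF-8.
    static boolean isUtf8(Charset charset) {
        return StandardCharsets.UTF_8.equals(charset);
    }

    // Logs a warning and returns true when the JVM default charset is not UTF-8.
    static boolean warnIfNotUtf8() {
        Charset actual = Charset.defaultCharset();
        if (isUtf8(actual)) {
            return false;
        }
        logger.warning("JVM default charset is " + actual.name()
            + ", not UTF-8; code relying on the default encoding may behave differently");
        return true;
    }

    public static void main(String[] args) {
        warnIfNotUtf8();
    }
}
```

The boolean result could also feed a gauge, so operators can spot affected
nodes without grepping logs.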

On Mon, Nov 28, 2022 at 1:21 PM David Capwell <dcapwell@apple.com> wrote:
> 
> > It probably has to be done on a case-by-case basis
> 
> 
> Yeah, this is what I feel as well…
> 
> > Does the linter provide more detail than just the list?
> 
> 
> Not really, it shows how to fix but can't really say if the fix will cause
> issues… If you are not running with UTF-8 we do the right thing most of the time,
> but some files "may" break… this would also be true if you backup/restore these
> files on a different environment...
> 
> On Nov 10, 2022, at 12:44 PM, Derek Chen-Becker <derek@chen-becker.org> wrote:
> 
> This seems fraught with peril. I think that it should be fixed, but I
> also wonder what the testing requirements would be to validate no
> regression. It probably has to be done on a case-by-case basis. Is it
> as simple as auditing places where we're calling getBytes or
> PrintReader/PrintWriter without an explicit encoding? Some of them,
> like https://github.com/apache/cassandra/blob/30ad754d7e95501ffa916bf986e4cfda1aa5e441/src/java/org/apache/cassandra/tools/HashPassword.java#L128,
>  look like that would be easy to address, but others seem like they
> could be complicated.
> 
> Does the linter provide more detail than just the list?
> 
> Cheers,
> 
> Derek
> 
> On Fri, Nov 4, 2022 at 2:09 PM David Capwell <dcapwell@apple.com> wrote:
> 
> 
> Testing out linter trying to see if it can solve a case for Simulator and see we
> have 25 cases where we don't add the encoding and rely on default, which is based
> off the system…
> 
> If we attempt to fix these cases, I am wondering if this is a regression… it
> "might" be the case someone set -Dfile.encoding=ascii or updated env LANG to
> something non-UTF based…
> 
> Here is the list reported
> 
> org.apache.cassandra.cql3.functions.JavaBasedUDFunction since first historized release
> org.apache.cassandra.db.ColumnFamilyStore since first historized release
> org.apache.cassandra.db.compaction.CompactionLogger$CompactionLogSerializer since first historized release
> org.apache.cassandra.db.filter.RowFilter$CustomExpression since first historized release
> org.apache.cassandra.db.lifecycle.LogTransaction since first historized release
> org.apache.cassandra.gms.FailureDetector since first historized release
> org.apache.cassandra.index.sasi.analyzer.StandardTokenizerImpl since first historized release
> org.apache.cassandra.io.sstable.SSTable since first historized release
> org.apache.cassandra.io.util.FileReader since first historized release
> org.apache.cassandra.io.util.FileReader since first historized release
> org.apache.cassandra.io.util.FileWriter since first historized release
> org.apache.cassandra.io.util.FileWriter since first historized release
> org.apache.cassandra.metrics.SamplingManager since first historized release
> org.apache.cassandra.metrics.SamplingManager since first historized release
> org.apache.cassandra.schema.IndexMetadata since first historized release
> org.apache.cassandra.security.PEMBasedSslContextFactory since first historized release
> org.apache.cassandra.tools.HashPassword since first historized release
> org.apache.cassandra.tools.JMXTool$Dump$Format$3 since first historized release
> org.apache.cassandra.tools.NodeTool$NodeToolCmd since first historized release
> org.apache.cassandra.tools.SSTableMetadataViewer since first historized release
> org.apache.cassandra.transport.Client since first historized release
> org.apache.cassandra.utils.ByteArrayUtil since first historized release
> org.apache.cassandra.utils.FBUtilities since first historized release
> org.apache.cassandra.utils.GuidGenerator since first historized release
> org.apache.cassandra.utils.HeapUtils since first historized release
> 
> 
> 
> --
> +---------------------------------------------------------------+
> | Derek Chen-Becker                                             |
> | GPG Key available at https://keybase.io/dchenbecker and       |
> | https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
> | Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
> +---------------------------------------------------------------+
> 
> 
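
On the getBytes audit question quoted above: a rough sketch of why the
answer is likely case-by-case. Switching a call from the platform default
to an explicit UTF-8 only changes the bytes produced when the text contains
non-ASCII characters and the old default was not UTF-8. The class and
method names are hypothetical, for illustration only.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of Cassandra.
final class ExplicitCharsetDemo {
    // Returns true when encoding `text` as UTF-8 yields different bytes than
    // encoding it with `legacy` (standing in for a non-UTF-8 platform default).
    static boolean changesBytes(String text, Charset legacy) {
        byte[] before = text.getBytes(legacy);
        byte[] after = text.getBytes(StandardCharsets.UTF_8);
        return !Arrays.equals(before, after);
    }

    public static void main(String[] args) {
        // Pure ASCII text encodes identically under US-ASCII and UTF-8.
        System.out.println(changesBytes("cafe", StandardCharsets.US_ASCII));
        // "caf\u00e9" is 4 bytes in ISO-8859-1 but 5 bytes in UTF-8.
        System.out.println(changesBytes("caf\u00e9", StandardCharsets.ISO_8859_1));
    }
}
```

So call sites that only ever handle ASCII content should be safe to fix,
while ones that persist arbitrary text to disk are where previously written
files could stop round-tripping.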


-- 
+---------------------------------------------------------------+
| Derek Chen-Becker                                             |
| GPG Key available at https://keybase.io/dchenbecker and       |
| https://pgp.mit.edu/pks/lookup?search=derek%40chen-becker.org |
| Fngrprnt: EB8A 6480 F0A3 C8EB C1E7  7F42 AFC5 AFEE 96E4 6ACC  |
+---------------------------------------------------------------+