
List:       hadoop-dev
Subject:    Re: ZStandard compression crashes
From:       Wei-Chiu Chuang <weichiu@cloudera.com.INVALID>
Date:       2020-06-26 20:39:42
Message-ID: <CADiq6=wzconvpmgYVfouQF-ELDckLmmLHROQ+n8Pm67Vy5nhdA@mail.gmail.com>


A similar bug was reported: HADOOP-17096
<https://issues.apache.org/jira/browse/HADOOP-17096>

On Mon, May 11, 2020 at 3:48 PM Eric Yang <eyang@apache.org> wrote:

> If I recall this problem correctly, the root cause is that the default zstd
> compression block size is 256 KB, and Hadoop's zstd compression will attempt
> to use the platform's default compression size, if it is available.  The
> recommended output size is slightly larger than the input size to account
> for the header size in zstd compression.
> http://software.icecube.wisc.edu/coverage/00_LATEST/icetray/private/zstd/lib/compress/zstd_compress.c.gcov.html#2982
>
> The Hadoop code at
> https://github.com/apache/hadoop/blame/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/compress/zstd/ZStandardCompressor.c#L259
> sets the output size to the same value as the input size if the input size
> is bigger than the output size.  By manually setting the buffer size to a
> small value, the input size stays smaller than the recommended output size,
> which keeps the system working.  Returning ZSTD_CStreamOutSize() in
> getStreamSize may allow the system to work without a predefined default.
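
To make that sizing rule concrete, here is a minimal standalone sketch against
the zstd streaming API. It is my own illustration, not the Hadoop native code;
the 256 KB input, the repeated-byte payload, and compression level 3 are
arbitrary choices for the example. The point is that the destination buffer is
sized with ZSTD_CStreamOutSize() rather than being assumed equal to the input
size.

/*
 * Sketch only (not ZStandardCompressor.c): compress one 256 KB block with
 * the zstd streaming API, sizing the destination with ZSTD_CStreamOutSize().
 * Build with: cc zstd_sizing_sketch.c -lzstd
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <zstd.h>

int main(void) {
    const size_t inSize  = 256 * 1024;            /* example input block      */
    const size_t outSize = ZSTD_CStreamOutSize(); /* recommended output size  */

    char *src = malloc(inSize);
    char *dst = malloc(outSize);
    if (src == NULL || dst == NULL) return 1;
    memset(src, 'a', inSize);                     /* arbitrary payload        */

    ZSTD_CStream *cs = ZSTD_createCStream();
    size_t rc = ZSTD_initCStream(cs, 3);          /* level 3, arbitrary       */
    if (ZSTD_isError(rc)) { fprintf(stderr, "%s\n", ZSTD_getErrorName(rc)); return 1; }

    ZSTD_inBuffer  in  = { src, inSize, 0 };
    ZSTD_outBuffer out = { dst, outSize, 0 };

    /* With at least ZSTD_CStreamOutSize() bytes of output room, a complete
     * compressed block always fits; a smaller destination would have to be
     * drained and re-offered between calls. */
    while (in.pos < in.size) {
        rc = ZSTD_compressStream(cs, &out, &in);
        if (ZSTD_isError(rc)) { fprintf(stderr, "%s\n", ZSTD_getErrorName(rc)); return 1; }
    }
    do {                                          /* flush the final frame    */
        rc = ZSTD_endStream(cs, &out);            /* 0 means fully flushed    */
        if (ZSTD_isError(rc)) { fprintf(stderr, "%s\n", ZSTD_getErrorName(rc)); return 1; }
    } while (rc != 0);

    printf("compressed %zu bytes into %zu bytes (dst capacity %zu)\n",
           in.pos, out.pos, outSize);

    ZSTD_freeCStream(cs);
    free(src);
    free(dst);
    return 0;
}

For the one-shot API the analogous bound is ZSTD_compressBound(srcSize), which
is always strictly larger than srcSize, so a destination buffer that is only
as large as the source can never cover the worst case.
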
> 
> On Mon, May 11, 2020 at 2:29 PM Wei-Chiu Chuang
> <weichiu@cloudera.com.invalid> wrote:
> 
> > Thanks for the pointer, it does look similar. However, we are roughly on
> > the latest of branch-3.1 and this fix is in our branch. I'm pretty sure
> > we have all the zstd fixes.
> >
> > I believe the libzstd version used is 1.4.4, but I need to confirm. I
> > suspected a library version issue because we've been using zstd
> > compression for over a year, and this bug (which is reproducible) has
> > started happening consistently only recently.
> > 
> > On Mon, May 11, 2020 at 1:57 PM Ayush Saxena <ayushtkn@gmail.com> wrote:
> > 
> > > Hi Wei Chiu,
> > > What is the Hadoop version being used?
> > > Check whether HADOOP-15822 is in it; it had a similar error.
> > > 
> > > -Ayush
> > > 
> > > > On 11-May-2020, at 10:11 PM, Wei-Chiu Chuang <weichiu@apache.org>
> > wrote:
> > > > 
> > > > Hadoop devs,
> > > > 
> > > > A colleague of mine recently hit a strange issue where the zstd
> > > > compression codec crashes.
> > > > 
> > > > Caused by: java.lang.InternalError: Error (generic)
> > > >         at org.apache.hadoop.io.compress.zstd.ZStandardCompressor.deflateBytesDirect(Native Method)
> > > >         at org.apache.hadoop.io.compress.zstd.ZStandardCompressor.compress(ZStandardCompressor.java:216)
> > > >         at org.apache.hadoop.io.compress.CompressorStream.compress(CompressorStream.java:81)
> > > >         at org.apache.hadoop.io.compress.CompressorStream.write(CompressorStream.java:76)
> > > >         at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:57)
> > > >         at java.io.DataOutputStream.write(DataOutputStream.java:107)
> > > >         at org.apache.tez.runtime.library.common.sort.impl.IFile$Writer.writeKVPair(IFile.java:617)
> > > >         at org.apache.tez.runtime.library.common.sort.impl.IFile$Writer.append(IFile.java:480)
> > > > 
> > > > Anyone out there hitting a similar problem?
> > > > 
> > > > A temporary workaround is to set the buffer size:
> > > > "set io.compression.codec.zstd.buffersize=8192;"
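
The sizing numbers behind that workaround can be printed straight from
libzstd. The following is another sketch of my own (not Hadoop code); it shows
that an 8 KB input stays well below the recommended stream output size, while
the worst-case compressed size of a 256 KB input exceeds 256 KB, so a
destination buffer that merely matches the input cannot hold it.

/* Sketch only: print the zstd recommended stream buffer sizes and the
 * worst-case compressed sizes for an 8 KB and a 256 KB input.
 * Build with: cc zstd_bounds.c -lzstd */
#include <stdio.h>
#include <zstd.h>

int main(void) {
    printf("ZSTD_CStreamInSize()       = %zu\n", ZSTD_CStreamInSize());
    printf("ZSTD_CStreamOutSize()      = %zu\n", ZSTD_CStreamOutSize());
    printf("ZSTD_compressBound(8192)   = %zu\n", ZSTD_compressBound(8192));
    printf("ZSTD_compressBound(262144) = %zu\n", ZSTD_compressBound(262144));
    /* ZSTD_compressBound(n) is always larger than n, so an output buffer the
     * same size as the input cannot cover the worst case; capping the codec
     * buffer at 8192 keeps the input comfortably below the recommended
     * output size. */
    return 0;
}
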
> > > > 
> > > > We suspected it's a bug in the zstd library, but couldn't verify it.
> > > > Just want to send this out and see if I have any luck.
> > > 
> > 
> 


