List: hadoop-dev
Subject: Re: ZStandard compression crashes
From: Wei-Chiu Chuang <weichiu@cloudera.com.INVALID>
Date: 2020-06-26 20:39:42
Message-ID: CADiq6=wzconvpmgYVfouQF-ELDckLmmLHROQ+n8Pm67Vy5nhdA@mail.gmail.com
A similar bug was reported: HADOOP-17096
<https://issues.apache.org/jira/browse/HADOOP-17096>
On Mon, May 11, 2020 at 3:48 PM Eric Yang <eyang@apache.org> wrote:
> If I recall this problem correctly, the root cause is that the default zstd
> compression block size is 256 KB, and Hadoop's zstd compression will attempt
> to use the OS platform's default compression size, if it is available. The
> recommended output size is slightly bigger than the input size, to account
> for the header size in zstd compression:
> http://software.icecube.wisc.edu/coverage/00_LATEST/icetray/private/zstd/lib/compress/zstd_compress.c.gcov.html#2982
>
> However, the Hadoop code at
> https://github.com/apache/hadoop/blame/trunk/hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/compress/zstd/ZStandardCompressor.c#L259
> caps the output size at the input size if the input size is bigger than the
> output size. By manually setting the buffer size to a small value, the input
> size will be smaller than the recommended output size, which keeps the
> system working. Returning ZSTD_CStreamOutSize() from getStreamSize() might
> enable the system to work without a predefined default.
>
> On Mon, May 11, 2020 at 2:29 PM Wei-Chiu Chuang
> <weichiu@cloudera.com.invalid> wrote:
>
> > Thanks for the pointer, it does look similar. However, we are roughly on
> > the latest of branch-3.1 and that fix is in our branch. I'm pretty sure
> > we have all the zstd fixes.
> >
> > I believe the libzstd version used is 1.4.4, but I need to confirm. I
> > suspected it's a library version issue because we've been using zstd
> > compression for over a year, and this (reproducible) bug only started
> > happening recently.
> >
> > On Mon, May 11, 2020 at 1:57 PM Ayush Saxena <ayushtkn@gmail.com> wrote:
> >
> > > Hi Wei-Chiu,
> > > What is the Hadoop version being used?
> > > Check whether HADOOP-15822 is there; it had a similar error.
> > >
> > > -Ayush
> > >
> > > > On 11-May-2020, at 10:11 PM, Wei-Chiu Chuang <weichiu@apache.org>
> > wrote:
> > > >
> > > > Hadoop devs,
> > > >
> > > > A colleague of mine recently hit a strange issue where the zstd
> > > > compression codec crashes.
> > > >
> > > > Caused by: java.lang.InternalError: Error (generic)
> > > > at org.apache.hadoop.io.compress.zstd.ZStandardCompressor.deflateBytesDirect(Native Method)
> > > > at org.apache.hadoop.io.compress.zstd.ZStandardCompressor.compress(ZStandardCompressor.java:216)
> > > > at org.apache.hadoop.io.compress.CompressorStream.compress(CompressorStream.java:81)
> > > > at org.apache.hadoop.io.compress.CompressorStream.write(CompressorStream.java:76)
> > > > at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:57)
> > > > at java.io.DataOutputStream.write(DataOutputStream.java:107)
> > > > at org.apache.tez.runtime.library.common.sort.impl.IFile$Writer.writeKVPair(IFile.java:617)
> > > > at org.apache.tez.runtime.library.common.sort.impl.IFile$Writer.append(IFile.java:480)
> > > >
> > > > Is anyone out there hitting a similar problem?
> > > >
> > > > A temporary workaround is to set the buffer size: "set
> > > > io.compression.codec.zstd.buffersize=8192;"
> > > >
> > > > We suspect it's a bug in the zstd library, but couldn't verify. Just
> > > > wanted to send this out and see if I can get some luck.
> > >
> >
>
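The workaround quoted above uses a Hive-style "set" command. For a cluster-wide setting, the equivalent would presumably go into core-site.xml; the fragment below is a sketch using the property name exactly as it appears in the thread, not verified against any particular Hadoop version:

```xml
<!-- Sketch: limit the zstd codec buffer so the input stays below the
     recommended output bound (property name as used in this thread). -->
<property>
  <name>io.compression.codec.zstd.buffersize</name>
  <value>8192</value>
</property>
```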