List: busybox
Subject: Re: [PATCH] Add support for zstd decompression
From: Norbert Lange <nolange79 () gmail ! com>
Date: 2021-09-28 9:16:29
Message-ID: CADYdroNAByKEWCxNrrD4x3uEYRO9vkGE0tiQ_J4hMZsTTbgPTg () mail ! gmail ! com
On Sun, 12 Sep 2021 at 23:12, Norbert Lange <nolange79@gmail.com> wrote:
>
> On Sun, 12 Sep 2021 at 09:00, Jeff Pohlmeyer
> <yetanothergeek@gmail.com> wrote:
> >
> > On Fri, Sep 10, 2021 at 7:52 PM Denys Vlasenko <vda.linux@googlemail.com> wrote:
> > > I'm getting this:
> >
> > > (add/remove: 96/0 grow/shrink: 6/2 up/down: 24743/-98) Total: 24645 bytes
>
> I can bring this down a bit by declaring all functions static;
> inlining and constant propagation do the rest.
>
> Using git/busybox as source for busybox (x86_64, gcc 10)
> GEN /tmp/build/Makefile
> function old new delta
> unpack_zstd_stream - 5070 +5070
> static.HUF_readDTableX1_wksp_bmi2 - 1755 +1755
> static.ZSTD_decompressBlock_internal - 1468 +1468
> static.ZSTD_decompressSequences_body - 1429 +1429
> ZSTD_decompressContinue - 1062 +1062
> HUF_decompress4X1_usingDTable_internal_body - 883 +883
> FSE_readNCount_body - 622 +622
> ML_defaultDTable - 520 +520
> LL_defaultDTable - 520 +520
> static.ZSTD_buildFSETable_body - 518 +518
> XXH64_digest - 494 +494
> static.FSE_decompress_usingDTable_generic - 470 +470
> ZSTD_getFrameHeader_advanced - 423 +423
> static.XXH64_update_endian - 416 +416
> ZSTD_decompressBegin_usingDDict - 391 +391
> static.ZSTD_buildSeqTable - 375 +375
> ZSTD_execSequenceEnd - 300 +300
> OF_defaultDTable - 264 +264
> ZSTD_decodeFrameHeader - 259 +259
> BIT_initDStream - 258 +258
> ZSTD_DCtx_selectFrameDDict - 234 +234
> ZSTD_safecopy - 225 +225
> ML_bits - 212 +212
> ML_base - 212 +212
> .rodata 98830 99029 +199
> ZSTD_decompressContinueStream - 177 +177
> LL_bits - 144 +144
> LL_base - 144 +144
> OF_bits - 128 +128
> OF_base - 128 +128
> unzstd_main - 126 +126
> static.HUF_decodeStreamX1 - 117 +117
> BIT_reloadDStream - 114 +114
> ZSTD_overlapCopy8 - 107 +107
> ZSTD_clearDict - 105 +105
> ZSTD_frameHeaderSize_internal - 103 +103
> HUF_decompress1X1_usingDTable_internal_body - 102 +102
> ZSTD_wildcopy - 94 +94
> static.unzstd_longopts - 81 +81
> packed_usage 34120 34198 +78
> ZSTD_getcBlockSize - 78 +78
> tar_main 1290 1360 +70
> FSE_decodeSymbolFast - 58 +58
> BIT_reloadDStreamFast - 50 +50
> setup_transformer_on_fd 155 204 +49
> FSE_decodeSymbol - 44 +44
> HUF_decodeSymbolX1 - 39 +39
> BIT_readBits - 38 +38
> ZSTD_initFseState - 34 +34
> static.dec64table - 32 +32
> static.dec32table - 32 +32
> ZSTD_fcs_fieldSize - 32 +32
> ZSTD_did_fieldSize - 32 +32
> static.ZSTD_customFree - 27 +27
> applet_main 3192 3216 +24
> BIT_endOfDStream - 22 +22
> applet_names 2747 2767 +20
> repStartValue - 12 +12
> tar_longopts 314 321 +7
> static.CSWTCH - 6 +6
> applet_suid 100 101 +1
> applet_install_loc 200 201 +1
> ------------------------------------------------------------------------------
> (add/remove: 54/0 grow/shrink: 9/0 up/down: 21035/0) Total: 21035 bytes
> text data bss dec hex filename
> 999282 16443 1856 1017581 f86ed busybox_old
> 1020376 16467 1856 1038699 fd96b busybox_unstripped
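The dec and hex columns of the size output are simply text + data + bss, so the overall growth can be recomputed from the figures above. A minimal POSIX shell sketch using only the numbers printed there (the variable names are illustrative):

```sh
#!/bin/sh
# Recompute the size(1) totals quoted above: dec = text + data + bss.
old_dec=$((999282 + 16443 + 1856))    # busybox_old
new_dec=$((1020376 + 16467 + 1856))   # busybox_unstripped

# Print each total in decimal and hex, matching the dec/hex columns.
printf '%d %x busybox_old\n' "$old_dec" "$old_dec"
printf '%d %x busybox_unstripped\n' "$new_dec" "$new_dec"

# Overall growth of the image (text + data + bss).
printf 'growth: %d bytes\n' $((new_dec - old_dec))
```

This gives 21118 bytes, close to but not identical with the 21035-byte bloat-o-meter total: bloat-o-meter sums per-symbol size changes, whereas size reports whole sections, so the two need not agree exactly.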
>
> >
> > > I suspect Facebook et al do not share busybox's zeal about smaller size.
>
> In particular, speed is one of zstd's main selling points, so size is a bit
> beside the point ;)
> Ideally we could define some macros to get there;
> I believe the simplest explanation is that no one cared enough
> to cleanly separate every option.
>
> >
> > I found this comment on github[1]:
> > "There is no new magic number planned in the foreseeable future.
> > 0xFD2FB528 is intended to be the only magic number for zstd frames."
> >
> > Do you think that implies that at least the basic file format is
> > probably stable?
>
> The format is documented and even published as RFC 8878.
> Digging through the code, I already found some spots that add code to ensure
> no data is produced that old (reference) implementations can't decode
> (i.e. workarounds for bugs).
>
> So going with the reference implementation should be rather safe.
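The frame magic number mentioned above can be checked from the shell. A minimal sketch, assuming RFC 8878's rule that the 4-byte Magic_Number is stored little-endian, so a zstd frame starts with the bytes 28 b5 2f fd. is_zstd_frame is a hypothetical helper, not part of busybox or zstd, and it does not recognize skippable frames, which use a different range of magic numbers.

```sh
#!/bin/sh
# Hypothetical helper: succeed if file "$1" begins with the zstd frame magic
# number 0xFD2FB528, stored little-endian on disk as 28 b5 2f fd.
is_zstd_frame() {
    # Read the first 4 bytes, dump them as hex, drop od's spacing and
    # newline, and normalise to lowercase before comparing.
    magic=$(dd if="$1" bs=4 count=1 2>/dev/null | od -An -tx1 | tr -d ' \n' | tr 'A-F' 'a-f')
    [ "$magic" = "28b52ffd" ]
}
```

Usage: `is_zstd_frame archive.tar.zst && echo "zstd frame"`. A file shorter than 4 bytes yields a shorter hex string and is rejected.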
>
> Still, I think being able to track upstream is the best path.
>
> I did my own patch (some time ago; it just took time to clean it up).
> As far as I can see, it has some bits that are missing in Jeff's patch;
> the unzstd applet is a bit more full-featured and behaves like the reference.
>
> The concept for the upstream sources would be to use tools/scripts
> for most changes (documented in README.source as well).
>
> Extending that to, say, cut out comments or functions that aren't used
> (anything related to compression/dictionaries) should make upstream
> syncs simpler and drop about two thirds of the lines.
>
> $zstd_path/contrib/freestanding_lib/freestanding.py \
> --source-lib $zstd_path/lib \
> --output-lib zstd \
> -DZSTD_NO_INTRINSICS \
> -DZSTD_NO_UNUSED_FUNCTIONS \
> -DZSTD_LEGACY_SUPPORT=0 \
> -DZSTD_STATIC_LINKING_ONLY \
> -DFSE_STATIC_LINKING_ONLY \
> -DHUF_STATIC_LINKING_ONLY \
> -DXXH_STATIC_LINKING_ONLY \
> -DZSTD_ADDRESS_SANITIZER=0 \
> -DZSTD_MEMORY_SANITIZER=0 \
> -UFUZZING_BUILD_MODE_UNSAFE_FOR_PRODUCTION \
> -U__cplusplus \
> -UZSTD_DLL_EXPORT \
> -UZSTD_DLL_IMPORT \
> -UZSTD_MULTITHREAD \
> -RZSTDLIB_API=MEM_STATIC \
> -RZSTDLIB_VISIBILITY=MEM_STATIC \
> -RZSTDERRORLIB_VISIBILITY=MEM_STATIC \
> -DZSTD_HAVE_WEAK_SYMBOLS=0 \
> -DZSTD_TRACE=0 \
> -DZSTD_NO_TRACE
>
> sed -e 's,^\([[:alnum:]_\*]* ERR_[[:alnum:]_]*\)(,static \1(,' \
> -e 's,^\([[:alnum:]_\*]* FSE_[[:alnum:]_]*\) \?(,static \1(,' \
> -e 's,^\([[:alnum:]_\*]* ZSTD_[[:alnum:]_]*\) \?(,static \1(,' \
> -e 's,^\([[:alnum:]_\*]* HUF_[[:alnum:]_]*\) \?(,static \1(,' \
> -e 's,^\([[:alnum:]_\*]* HIST_[[:alnum:]_]*\)(,static \1(,' \
> -e 's,^\(const \)\?\([[:alnum:]_\*]* ZSTD_[[:alnum:]_]*\) \?(,static \1\2(,' \
> -i zstd/*/*.h
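To see what these substitutions do, the same expressions can be run on sample declarations read from standard input instead of editing zstd/*/*.h in place. A minimal sketch: make_static is a hypothetical wrapper, the two input lines are illustrative prototypes, and since `\?` is a GNU extension to basic regular expressions, this assumes GNU sed.

```sh
#!/bin/sh
# Hypothetical wrapper: apply the six substitutions quoted above to stdin.
# Each expression prefixes "static " to a line of the form
# "<type> <PREFIX>name(" where PREFIX is ERR_, FSE_, ZSTD_, HUF_ or HIST_;
# the last expression also allows a leading "const " on the return type.
make_static() {
    sed -e 's,^\([[:alnum:]_\*]* ERR_[[:alnum:]_]*\)(,static \1(,' \
        -e 's,^\([[:alnum:]_\*]* FSE_[[:alnum:]_]*\) \?(,static \1(,' \
        -e 's,^\([[:alnum:]_\*]* ZSTD_[[:alnum:]_]*\) \?(,static \1(,' \
        -e 's,^\([[:alnum:]_\*]* HUF_[[:alnum:]_]*\) \?(,static \1(,' \
        -e 's,^\([[:alnum:]_\*]* HIST_[[:alnum:]_]*\)(,static \1(,' \
        -e 's,^\(const \)\?\([[:alnum:]_\*]* ZSTD_[[:alnum:]_]*\) \?(,static \1\2(,'
}

printf '%s\n' 'size_t ZSTD_decompress(void* dst);' \
              'const char* ZSTD_getErrorName(size_t code);' | make_static
# static size_t ZSTD_decompress(void* dst);
# static const char* ZSTD_getErrorName(size_t code);
```

Lines that already start with "static" are not rewritten twice, because the single type token before the prefixed name would then have to be "static" itself.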
>
> Norbert
A new version is in a fork: https://github.com/nolange/busybox/commits/zstdapplets
The size cost is down to around 17 KB.
A few of the improvements are already upstreamed, and I opened an issue
about how to proceed:
https://github.com/facebook/zstd/issues/2806
Could we clear up how to proceed?
Norbert
_______________________________________________
busybox mailing list
busybox@busybox.net
http://lists.busybox.net/mailman/listinfo/busybox