List: zlib-devel
Subject: [Zlib-devel] deflateSetDictionary(): How to determine "most commonly used strings"?
From: madler () alumni ! caltech ! edu (Mark Adler)
Date: 2010-04-17 15:23:37
Message-ID: 64E8A8B8-0D48-42F3-A469-C16DC476535B () alumni ! caltech ! edu
On Apr 16, 2010, at 4:07 PM, John Bowler wrote:
> the dictionary just behaves as though it was prefixed on front of the data
> to be compressed.
...
> So far as I can see the dictionary can only have an effect on the first
> window-size bytes of the uncompressed data, because after that the
> dictionary bytes are no longer visible to the compression algorithm.
Correct.
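To make the "prefixed on front" behaviour concrete, here is a minimal sketch using Python's standard zlib module, whose zdict argument is passed through to the preset-dictionary calls. The dictionary and page contents are made up for illustration, and the helper names are hypothetical.

```python
import zlib

def compress_with_dict(data: bytes, dictionary: bytes) -> bytes:
    # zdict primes the compressor as if the dictionary preceded the data.
    comp = zlib.compressobj(level=9, zdict=dictionary)
    return comp.compress(data) + comp.flush()

def decompress_with_dict(blob: bytes, dictionary: bytes) -> bytes:
    # The receiver must supply exactly the same dictionary bytes.
    decomp = zlib.decompressobj(zdict=dictionary)
    return decomp.decompress(blob) + decomp.flush()

# Illustrative data only (an assumption, not taken from a real corpus).
dictionary = b"<html><head><title></title></head><body>"
page = b"<html><head><title>Hi</title></head><body>Hello</body></html>"

plain = zlib.compress(page, 9)                  # no dictionary
primed = compress_with_dict(page, dictionary)   # with dictionary
restored = decompress_with_dict(primed, dictionary)

print(len(plain), len(primed))
```

The saving only shows up near the start of the data, which is why it matters most for small inputs.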
> 1) HTML files all start the same way. The most obvious thing to put, right
> at the *start* of the dictionary, is that block starting "<html..."; that's
> an instant saving of 20 or more bytes.
Actually, you want to put the strings most likely to be repeated at the end
of the dictionary, not the start. The end of the dictionary will provide
shorter distances, which are coded in fewer bits.
On Apr 16, 2010, at 6:37 PM, Greg Roelofs wrote:
> This is not a commonly used option; it adds complexity with relatively
> little benefit except when compressing a whole bunch of similar, small
> files.
In fact, I know of an application that transmitted vending machine data as
text messages and benefited greatly from this. It used up to the previous
32K of messages (which was many messages) as the dictionary for the next
message. There was a return path for retransmission, so if the receiver lost
lock on the dictionary, the message was resent with no dictionary and the
process started over.
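A scheme along those lines might look roughly like the sketch below. This is not the application's actual code, which I am not reproducing here; the class name, message format, and reset policy are all assumptions. Each end keeps the last 32K of message bytes and uses it as the dictionary for the next message.

```python
import zlib

WINDOW = 32 * 1024  # keep at most the previous 32K of message bytes

class DictionaryLink:
    """One end of the link. Both ends must keep identical history."""

    def __init__(self):
        self.history = b""

    def _remember(self, message: bytes) -> None:
        self.history = (self.history + message)[-WINDOW:]

    def reset(self) -> None:
        # Called on both ends after lock is lost; the next message
        # is then sent with no dictionary.
        self.history = b""

    def encode(self, message: bytes) -> bytes:
        if self.history:
            comp = zlib.compressobj(level=9, zdict=self.history)
        else:
            comp = zlib.compressobj(level=9)
        blob = comp.compress(message) + comp.flush()
        self._remember(message)
        return blob

    def decode(self, blob: bytes) -> bytes:
        if self.history:
            decomp = zlib.decompressobj(zdict=self.history)
        else:
            decomp = zlib.decompressobj()
        message = decomp.decompress(blob) + decomp.flush()
        self._remember(message)
        return message

# Hypothetical message format, for illustration only.
messages = [
    b"machine=17 slot=A3 item=cola price=1.25 status=ok\n",
    b"machine=17 slot=B1 item=chips price=0.95 status=ok\n",
    b"machine=17 slot=A3 item=cola price=1.25 status=low\n",
]

sender, receiver = DictionaryLink(), DictionaryLink()
blobs = [sender.encode(m) for m in messages]
decoded = [receiver.decode(b) for b in blobs]
```

If the receiver's history ever differs from the sender's, the decode call raises zlib.error, which is the natural point to request a retransmission and reset both ends.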
On Apr 17, 2010, at 1:10 AM, Peter Elmer wrote:
> I've been meaning since some time to ask this same question regarding a
> zlib interface for dictionary discovery.
Actually, you don't need a zlib interface. You can get the same information
from the compressed data itself. infgen ( http://zlib.net/infgen.c.gz ) will
"disassemble" a deflate stream into readable descriptions of the contents.
The matches could perhaps be used to aid in dictionary creation.
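As a very rough sketch of what mining the matches could look like, the Python below tallies the strings copied by each match in a disassembly listing. It assumes the listing has lines of the form "literal 'text", "literal 10 13" and "match <length> <distance>"; check that against the actual infgen output before relying on it. The function name is hypothetical and every other line type is ignored.

```python
from collections import Counter

def matched_strings(listing: str) -> Counter:
    """Tally the byte strings copied by matches in an infgen-style listing.

    Assumed line forms (verify against real infgen output):
      literal 'abc      printable literal bytes after the quote
      literal 10 13     literal bytes given as decimal values
      match 5 12        copy 5 bytes from 12 bytes back
    """
    out = bytearray()
    counts = Counter()
    for line in listing.splitlines():
        if line.startswith("literal '"):
            out += line[len("literal '"):].encode("latin-1")
        elif line.startswith("literal "):
            out += bytes(int(tok) for tok in line.split()[1:])
        elif line.startswith("match "):
            _, length, dist = line.split()
            length, dist = int(length), int(dist)
            start = len(out) - dist
            if start < 0:
                raise ValueError("match reaches before start of data")
            begin = len(out)
            # Copy one byte at a time so overlapping matches work.
            for i in range(length):
                out.append(out[start + i])
            counts[bytes(out[begin:])] += 1
    return counts

# Hand-written listing for illustration, not real infgen output.
sample = "literal 'abcabc\nmatch 3 3\nliteral 'x\nmatch 3 7\n"
top = matched_strings(sample).most_common(1)
```

Strings that show up as matches across many sample files are candidates for the dictionary, with the most frequent ones placed last.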
Mark