
List:       php-doc-bugs
Subject:    [DOC-BUGS] Doc #68556 [Opn]: Fix for zlib.deflate and zlib.inflate filters; fix for example #1
From:       "salsi at icosaedro dot it" <php-bugs () lists ! php ! net>
Date:       2016-01-21 0:35:24
Message-ID: 201601210035.u0L0ZOFa023932 () sgrv20 ! php ! net


 ID:                 68556
 User updated by:    salsi at icosaedro dot it
 Reported by:        salsi at icosaedro dot it
-Summary:            zlib.deflate is not the reverse of zlib.inflate
+Summary:            Fix for zlib.deflate and zlib.inflate filters; fix
                     for example #1
 Status:             Open
 Type:               Documentation Problem
 Package:            Streams related
 PHP Version:        Irrelevant
 Block user comment: N
 Private report:     N

 New Comment:

Completing the description with the zlib.inflate filter:


zlib.inflate filter
-------------------

Only the 'window' parameter is used; any other parameter (memory, level) is ignored.
Let $W be the base-2 logarithm of the history buffer size, so that 2^$W bytes are
allocated by the decompressor. This value must be greater than or equal to the value
used for compression, because no attempt is made to reallocate. The range is
9 <= $W <= 15. If unknown, $W=15 is the safer choice.

- DEFLATE (RFC 1951): use window=-$W, with $W not less than the value used for
compression. If unknown, set window=-15.

- ZLIB (RFC 1950): use window=$W. The value of $W can be read from the ZLIB header
itself (its CINFO field holds $W-8); otherwise set window=15.

- GZIP (RFC 1952): use window=$W+16. The GZIP header does not record $W, so if it is
unknown set window=15+16=31.

- ZLIB or GZIP: use window=$W+32 to enable automatic header detection, so that both
formats are recognized and decompressed; window=15+32=47 is the safer choice.
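The window choices above can be exercised with a minimal sketch like the following, assuming PHP 7.1+ with the zlib extension. It feeds samples produced by gzdeflate(), gzcompress() and gzencode() through the zlib.inflate read filter; inflate_with_filter() is a hypothetical helper written for this illustration, and the php://temp buffer is just one convenient way to obtain a readable stream.

```php
<?php
// Minimal sketch (PHP 7.1+, zlib extension): decode each format with the
// zlib.inflate read filter, picking 'window' as described in the text.
// inflate_with_filter() is a hypothetical helper, not a PHP built-in.

function inflate_with_filter(string $compressed, int $window): string
{
    // Store the compressed bytes in a temporary stream.
    $fp = fopen('php://temp', 'w+b');
    fwrite($fp, $compressed);
    rewind($fp);

    // Read them back through zlib.inflate with the chosen window value.
    stream_filter_append($fp, 'zlib.inflate', STREAM_FILTER_READ, ['window' => $window]);
    $plain = stream_get_contents($fp);
    fclose($fp);
    return $plain;
}

$text = str_repeat("This is a test.\n", 100);

// Format name => [compressed sample, window value for zlib.inflate]
$cases = [
    'DEFLATE'     => [gzdeflate($text), -15],
    'ZLIB'        => [gzcompress($text), 15],
    'GZIP'        => [gzencode($text), 15 + 16],
    'auto (ZLIB)' => [gzcompress($text), 15 + 32],
    'auto (GZIP)' => [gzencode($text), 15 + 32],
];

foreach ($cases as $name => [$data, $window]) {
    $ok = inflate_with_filter($data, $window) === $text;
    printf("%-12s window=%3d  %s\n", $name, $window, $ok ? 'ok' : 'FAILED');
}
```

The "auto" rows use the same samples as the ZLIB and GZIP rows, so a single window=47 reader can be compared directly with the format-specific settings.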


Error detection:

- If the window parameter is not one of the values listed above, an error MAY be
generated:

	E_WARNING: stream_filter_append(): Invalid parameter give for window size.

- If $W is less than the value used for compression, an empty string or a short read
MAY result, but no error is triggered and no exception is thrown. This shows up when
highly compressible data (say, a source program at least 40 KB long) compressed with
a large window (window=15+16) are then decompressed with a smaller window
(window=9+16).

- On checksum failure, corrupted or incomplete data are returned, but no error is
triggered and no exception is thrown.
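Since the filter does not report checksum failures, one possible workaround (not something the manual prescribes) is to decode complete buffers with the string functions, which do fail on a bad checksum. A sketch for GZIP data, assuming PHP 7+ with zlib; checked_gzdecode() is a hypothetical helper, and the bit flip merely simulates corruption in the CRC-32 trailer.

```php
<?php
// Sketch: gzdecode() returns false (with an E_WARNING) when the CRC-32 or
// length check fails, so it can serve as an integrity check where the
// zlib.inflate filter stays silent. checked_gzdecode() is hypothetical.

function checked_gzdecode(string $gz): string
{
    $plain = @gzdecode($gz); // warning suppressed; we report via exception
    if ($plain === false) {
        throw new RuntimeException('corrupted or truncated GZIP data');
    }
    return $plain;
}

$text = str_repeat("This is only a test.\n", 50);
$gz = gzencode($text);

// Flip one bit in the first byte of the CRC-32 trailer (8 bytes from the end).
$bad = $gz;
$pos = strlen($bad) - 8;
$bad[$pos] = chr(ord($bad[$pos]) ^ 0x01);

echo checked_gzdecode($gz) === $text ? "intact data ok\n" : "unexpected result\n";
try {
    checked_gzdecode($bad);
    echo "corruption NOT detected\n";
} catch (RuntimeException $e) {
    echo "corruption detected: ", $e->getMessage(), "\n";
}
```

This only works when the whole compressed payload fits in memory, which is exactly the case the streaming filter is meant to avoid, so it is a check rather than a replacement.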


(About the missing detection of decoding errors, see bug #71417.)


Previous Comments:
------------------------------------------------------------------------
[2016-01-19 12:54:41] salsi at icosaedro dot it

Related To: Bug #71396

------------------------------------------------------------------------
[2016-01-19 12:39:46] salsi at icosaedro dot it

Having read RFCs 1950-1952, checked the zlib manual, read parts of the C source that
implements zlib.deflate, and experimented myself (see bug #71396 for a test program),
I finally understand why example #1 cannot work, and why that manual page needs a
thorough update:

http://php.net/manual/en/filters.compression.php

I'm not a doc contributor, but the following text might be a start:


zlib.deflate: what it really does and what the 'window' parameter really means
------------------------------------------------------------------------------

The zlib.deflate filter implements the DEFLATE, ZLIB and GZIP compression formats,
selected by the value of the 'window' parameter.

The window parameter encodes two things. Its magnitude, ignoring the +16 offset
described below, is the base-2 logarithm of the size of the internal "history buffer",
in the range 8 to 15. Its sign, or the +16 offset, selects the output format, as
described below.
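The mapping from format and history-buffer size to the window value, following the ranges given in this description, fits in a small helper. This is a sketch; deflate_window() is hypothetical and not part of PHP.

```php
<?php
// Minimal sketch: map an output format and a history-buffer size of 2^$w
// bytes to the 'window' value expected by zlib.deflate, using the ranges
// given in the text. deflate_window() is a hypothetical helper.

function deflate_window(string $format, int $w): int
{
    if ($w < 8 || $w > 15) {
        throw new InvalidArgumentException("history size exponent must be 8..15, got $w");
    }
    switch ($format) {
        case 'deflate':
            return -$w;      // raw DEFLATE (RFC 1951): negative value
        case 'zlib':
            return $w;       // ZLIB (RFC 1950): plain value
        case 'gzip':
            return $w + 16;  // GZIP (RFC 1952): value plus 16
    }
    throw new InvalidArgumentException("unknown format '$format'");
}

foreach (['deflate', 'zlib', 'gzip'] as $format) {
    echo $format, ': window=', deflate_window($format, 15), "\n";
}
// deflate: window=-15
// zlib: window=15
// gzip: window=31
```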


- DEFLATE (RFC 1951) is a raw compressed format with no header and no checksum. It is
produced when the window parameter is in the range -8 to -15. This algorithm is the
basis of all the formats generated by the zlib.deflate filter. The corresponding
functions that operate directly on strings are gzdeflate() and gzinflate().
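A quick check of the DEFLATE case, assuming PHP 7+ with zlib: compress through the zlib.deflate write filter with window=-15 and decode the result with gzinflate(). The temporary file and the deflate_with_filter() helper are illustrative assumptions; closing the stream is what makes the filter finish the compressed stream.

```php
<?php
// Sketch: compress with the zlib.deflate write filter (window=-15, raw
// DEFLATE) and verify that gzinflate() decodes the result.
// deflate_with_filter() is a hypothetical helper, not a PHP built-in.

function deflate_with_filter(string $data, int $window): string
{
    $path = tempnam(sys_get_temp_dir(), 'zdf');
    $fp = fopen($path, 'wb');
    stream_filter_append($fp, 'zlib.deflate', STREAM_FILTER_WRITE, ['window' => $window]);
    fwrite($fp, $data);
    fclose($fp); // flushes the filter and terminates the DEFLATE stream
    $compressed = file_get_contents($path);
    unlink($path);
    return $compressed;
}

$text = str_repeat("This is a test.\n", 100);
$raw = deflate_with_filter($text, -15);

echo strlen($text), " bytes -> ", strlen($raw), " bytes\n";
echo gzinflate($raw) === $text ? "gzinflate() decodes it\n" : "mismatch\n";
```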


- ZLIB (RFC 1950) applies the DEFLATE algorithm and adds a 2-byte header and a 4-byte
trailer containing the Adler-32 checksum of the uncompressed data, in big-endian
(network) byte order:

        ZLIB = ZLIBHEADER(2B)  DEFLATE  ADLER32(4B)

The 2-byte header, read as a 16-bit unsigned number in big-endian order
(CMF*256 + FLG), must be a multiple of 31. This format is produced when the window
parameter is in the range 8 to 15. The corresponding functions that operate directly
on strings are gzcompress() and gzuncompress().
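The ZLIB header and trailer described above can be checked by hand on the output of gzcompress(). This sketch assumes 64-bit PHP 7+ with zlib; it reads the Adler-32 trailer with unpack('N', ...) (big-endian) and compares it with PHP's own hash('adler32', ...).

```php
<?php
// Minimal sketch (64-bit PHP 7+, zlib extension): inspect the ZLIB
// (RFC 1950) framing of gzcompress() output. Multi-byte fields in this
// format are stored most significant byte first (big-endian).

$text = str_repeat("This is a test.\n", 20);
$z = gzcompress($text);

$cmf = ord($z[0]);   // compression method and info
$flg = ord($z[1]);   // flags, including the FCHECK bits

// CMF*256 + FLG must be a multiple of 31.
$headerOk = (($cmf << 8) | $flg) % 31 === 0;

// Low nibble of CMF: method (8 = deflate). High nibble (CINFO): $W - 8.
$method = $cmf & 0x0f;
$w = ($cmf >> 4) + 8;

// Trailer: Adler-32 of the uncompressed data, big-endian ('N').
$stored = unpack('N', substr($z, -4))[1];
$computed = unpack('N', hash('adler32', $text, true))[1];

printf("header ok: %s, method: %d, window: 2^%d bytes\n", $headerOk ? 'yes' : 'no', $method, $w);
printf("adler32 stored %08x, computed %08x\n", $stored, $computed);
```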


- GZIP (RFC 1952) applies the DEFLATE algorithm, adding a header and a trailer with
the CRC-32 checksum of the uncompressed data and their length, both in little-endian
byte order:

        GZIP = GZIPHEADER(10B)  DEFLATE  CRC32(4B)  LENGTH(4B)

This is the format of .gz files. The GZIPHEADER contains, in order, the GZIP signature
("\x1f\x8b"), the compression method ("\x08"), 6 bytes that are usually zero (flags,
modification time and extra flags), and the operating system code (e.g. "\x00" = FAT
filesystem, MS-DOS/Windows; "\x03" = Unix). This format is produced when the window
parameter is in the range 8+16=24 to 15+16=31. Note that LENGTH stores the length of
the uncompressed data modulo 2^32: for inputs of 4 GB or more, only the low 32 bits
of the actual length are kept. The corresponding functions that operate directly on
strings are gzencode() and gzdecode(); the gzopen() function allows reading and
writing .gz files.
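Likewise for GZIP, a sketch that inspects gzencode() output, assuming 64-bit PHP 7+ so that unsigned 32-bit values fit in a PHP int. Note the little-endian 'V' format for the trailer, as opposed to 'N' for ZLIB.

```php
<?php
// Minimal sketch (64-bit PHP 7+, zlib extension): inspect the GZIP
// (RFC 1952) framing of gzencode() output. Unlike ZLIB, the trailer
// fields are little-endian ('V' in unpack()).

$text = str_repeat("This is not an important string.\n", 20);
$gz = gzencode($text);

$magicOk = substr($gz, 0, 2) === "\x1f\x8b";  // GZIP signature
$method  = ord($gz[2]);                       // 8 = deflate
$os      = ord($gz[9]);                       // operating system code

// Trailer: CRC-32, then the uncompressed length modulo 2^32.
$trailer = unpack('Vcrc/Visize', substr($gz, -8));
$crcOk   = $trailer['crc'] === (crc32($text) & 0xffffffff);
$sizeOk  = $trailer['isize'] === (strlen($text) & 0xffffffff);

printf("magic ok: %s, method: %d, OS code: %d\n", $magicOk ? 'yes' : 'no', $method, $os);
printf("crc32 ok: %s, length ok: %s\n", $crcOk ? 'yes' : 'no', $sizeOk ? 'yes' : 'no');
```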


By default window=-15, producing a DEFLATE stream with an internal history buffer of
2^15 bytes. An invalid window parameter may be detected, producing a rather obscure
error:

       E_WARNING: unable to create or locate filter "zlib.deflate"


All this explains why example #1 does not work and nothing can be read back. With
window=15 the filter generates ZLIB, which the subsequent
readfile('php://filter/zlib.inflate/resource=test.deflated'); statement cannot read
because it expects DEFLATE. The same example works with the default window=-15.
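Example #1 rewritten along these lines might look as follows; it is a sketch, not the manual's official text. Both filters use raw DEFLATE (window=-15), and a temporary file stands in for test.deflated (an assumption to keep the script self-contained).

```php
<?php
// Sketch of example #1 with matching filters: raw DEFLATE (window=-15)
// on the writing side, and zlib.inflate's default (also window=-15) on
// the reading side. A temporary file replaces test.deflated (assumption).
$params = array('level' => 6, 'window' => -15, 'memory' => 9);

$original_text = "This is a test.\nThis is only a test.\nThis is not an important string.\n";
echo "The original text is " . strlen($original_text) . " characters long.\n";

$file = tempnam(sys_get_temp_dir(), 'deflated');
$fp = fopen($file, 'w');
stream_filter_append($fp, 'zlib.deflate', STREAM_FILTER_WRITE, $params);
fwrite($fp, $original_text);
fclose($fp);

echo "The compressed file is " . filesize($file) . " bytes long.\n";
echo "The original text was:\n";

/* Use zlib.inflate to decompress on the fly */
$decoded = file_get_contents('php://filter/read=zlib.inflate/resource=' . $file);
echo $decoded;
unlink($file);
```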

The remaining big question is why the original example #1 fails without showing any
error. But that is another story.

------------------------------------------------------------------------
[2014-12-06 07:14:58] salsi at icosaedro dot it

Description:
------------
Example #1 at http://php.net/manual/it/filters.compression.php is wrong: it does not
work, and the file read back is not decoded at all. The reason is that the
zlib.deflate filter actually generates the ZLIB Compress format (RFC 1950), NOT the
DEFLATE format (RFC 1951), while zlib.inflate decompresses DEFLATE, NOT ZLIB Compress;
these two stream filters are not the inverse of each other. So the situation is quite
asymmetrical:

DEFLATE compressed format (RFC 1951):
gzdeflate() --> compress
gzinflate() and zlib.inflate --> decompress

ZLIB Compress format (RFC 1950: 2 bytes header + DEFLATE + ADLER32):
gzcompress() and zlib.deflate --> compress
gzuncompress() --> decompress

That's why some users' comments give the strange recipe "zlib.inflate works on
DEFLATE compressed data, but you have to strip away the first two bytes": those two
bytes are in fact the ZLIB Compress header, because the supposedly DEFLATE compressed
data are actually ZLIB Compress; the trailing Adler-32 checksum is then simply ignored
as garbage by the DEFLATE decompressor.
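The stripped-header recipe can be reproduced with the string functions, where gzinflate() plays the role of zlib.inflate with its default window=-15. This is a sketch for illustration, assuming PHP 7+ with zlib: the hack decodes the data but never verifies the Adler-32 checksum, whereas gzuncompress() does.

```php
<?php
// Sketch: why "strip the first two bytes" appears to work on ZLIB data.
$text = str_repeat("This is only a test.\n", 40);
$z = gzcompress($text);   // 2-byte header + DEFLATE + 4-byte Adler-32

// The hack: drop the ZLIB header and treat the rest as raw DEFLATE.
// The decoder stops at the end of the DEFLATE stream, so the trailing
// Adler-32 bytes are never looked at, and never verified.
$viaHack = gzinflate(substr($z, 2));

// The proper way: decode ZLIB as ZLIB, which also checks the Adler-32.
$viaZlib = gzuncompress($z);

var_dump($viaHack === $text, $viaZlib === $text);
```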

I'm unsure if this is more a flawed API design than a doc issue.

Test script:
---------------
Example #1 with my fix:
<?php
$params = array('level' => 6, 'window' => 15, 'memory' => 9);

$original_text = "This is a test.\nThis is only a test.\nThis is not an important string.\n";
echo "The original text is " . strlen($original_text) . " characters long.\n";

$fp = fopen('test.deflated', 'w');
stream_filter_append($fp, 'zlib.deflate', STREAM_FILTER_WRITE, $params);
fwrite($fp, $original_text);
fclose($fp);

echo "The compressed file is " . filesize('test.deflated') . " bytes long.\n";
echo "The original text was:\n";

# WRONG: does not show the original message:
####/* Use readfile and zlib.inflate to decompress on the fly */
####readfile('php://filter/zlib.inflate/resource=test.deflated');

# FIX:
echo gzuncompress(file_get_contents("test.deflated"));

/* Generates output:
 *
 * The original text is 70 characters long.
 * The compressed file is 56 bytes long.
 * The original text was:
 * This is a test.
 * This is only a test.
 * This is not an important string.
 *
 */
?>




------------------------------------------------------------------------




-- 
PHP Documentation Bugs Mailing List (http://www.php.net/)

