List: php-doc-bugs
Subject: [DOC-BUGS] Doc #68556 [Opn]: Fix for zlib.deflate and zlib.inflate filters; fix for example #1
From: "salsi at icosaedro dot it" <php-bugs () lists ! php ! net>
Date: 2016-01-21 0:35:24
Message-ID: 201601210035.u0L0ZOFa023932 () sgrv20 ! php ! net
Edit report at https://bugs.php.net/bug.php?id=68556&edit=1
ID: 68556
User updated by: salsi at icosaedro dot it
Reported by: salsi at icosaedro dot it
-Summary: zlib.deflate is not the reverse of zlib.inflate
+Summary: Fix for zlib.deflate and zlib.inflate filters; fix for example #1
Status: Open
Type: Documentation Problem
Package: Streams related
PHP Version: Irrelevant
Block user comment: N
Private report: N
New Comment:
Completing description with the zlib.inflate filter:
zlib.inflate filter
-------------------
Only the 'window' parameter is honored; any other parameter (memory, level) is
ignored. Let $W be the base-2 logarithm of the history buffer size, so that the
decompressor allocates 2^$W bytes. $W must be greater than or equal to the value
used for compression, because the decompressor makes no attempt to reallocate. The
range is 9 <= $W <= 15. If the value used for compression is unknown, $W=15 is the
safe choice.
- DEFLATE (RFC 1951): use window=-$W, with $W not less than the value used for
compression. If unknown, set window=-15.
- ZLIB (RFC 1950): use window=$W. The value of $W can be read from the ZLIB header
of the data itself; otherwise set window=15.
- GZIP (RFC 1952): use window=$W+16. The GZIP header does not record $W, so if the
value used for compression is unknown, set window=31.
- ZLIB or GZIP: use window=$W+32 for automatic header detection, so that both
formats are recognized and decompressed; window=15+32=47 is the safe choice.
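The window values listed above can be tried with a small script. This is only a sketch: inflate_with_window() is a hypothetical helper written for this illustration (it is not part of PHP), and the input data are produced with gzdeflate(), gzcompress() and gzencode() at their default settings.

```php
<?php
// Sample data, compressed below in each of the three formats.
$text = str_repeat("This is only a test.\n", 50);

// Hypothetical helper (not part of PHP): pass $data through a
// zlib.inflate read filter configured with the given 'window' value.
function inflate_with_window(string $data, int $window): string
{
    $fp = fopen('php://temp', 'w+');
    fwrite($fp, $data);
    rewind($fp);
    stream_filter_append($fp, 'zlib.inflate', STREAM_FILTER_READ, ['window' => $window]);
    $out = stream_get_contents($fp);
    fclose($fp);
    return $out;
}

$results = [
    'DEFLATE, window=-15' => inflate_with_window(gzdeflate($text), -15),
    'ZLIB, window=15'     => inflate_with_window(gzcompress($text), 15),
    'GZIP, window=31'     => inflate_with_window(gzencode($text), 31),
    'ZLIB, window=47'     => inflate_with_window(gzcompress($text), 47),
    'GZIP, window=47'     => inflate_with_window(gzencode($text), 47),
];

// Report whether each combination gave back the original text.
foreach ($results as $label => $out) {
    echo $label, ': ', ($out === $text ? 'ok' : 'MISMATCH'), "\n";
}
```

Using $W=15 everywhere follows the "if unknown" advice in the list; a smaller $W is only safe when the value used for compression is known.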
Error detection:
- If the window parameter is not one of those listed above, an error MAY be
generated:
E_WARNING: stream_filter_append(): Invalid parameter give for window size.
- If $W is smaller than the value used for compression, an empty string or a short
read MAY result, but no error is triggered and no exception is thrown. This issue
shows up when highly compressible data (say, a source program at least 40 KB long)
compressed with a large window (window=15+16) are decompressed with a smaller
window (window=9+16).
- On checksum failure, corrupted or incomplete data are returned, but no error is
triggered and no exception is thrown.
(About the missing detection of decoding errors, see bug #71417.)
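Since the filter may return corrupted or truncated data silently, a caller that holds the whole GZIP input can compare the decoded result against the GZIP trailer. The following is a minimal sketch of such a check, not something the filter provides: gzip_trailer_matches() is a hypothetical helper, and the integer comparison assumes a 64-bit PHP build.

```php
<?php
// Hypothetical helper (not part of PHP): true when $decoded is consistent
// with the CRC32 and LENGTH fields stored in the last 8 bytes of $gzip.
function gzip_trailer_matches(string $gzip, string $decoded): bool
{
    if (strlen($gzip) < 18) {   // 10-byte header + 8-byte trailer at minimum
        return false;
    }
    // Both trailer fields are 32-bit little-endian integers.
    $trailer = unpack('Vcrc/Vlength', substr($gzip, -8));
    return $trailer['crc'] === crc32($decoded)
        && $trailer['length'] === (strlen($decoded) & 0xFFFFFFFF);
}

$text = str_repeat("This is only a test.\n", 100);
$gzip = gzencode($text);

// Complete output passes the check; a short read does not.
var_dump(gzip_trailer_matches($gzip, $text));
var_dump(gzip_trailer_matches($gzip, substr($text, 0, 100)));
```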
Previous Comments:
------------------------------------------------------------------------
[2016-01-19 12:54:41] salsi at icosaedro dot it
Related To: Bug #71396
------------------------------------------------------------------------
[2016-01-19 12:39:46] salsi at icosaedro dot it
Having read RFCs 1950-1952, checked the zlib manual, browsed the C source that
implements zlib.deflate, and experimented myself (see bug #71396 for a test
program), I finally reached a conclusion that explains why example #1 cannot work
and why that manual page needs a thorough update:
http://php.net/manual/en/filters.compression.php
I'm not a doc contributor, but the following text might be a start:
zlib.deflate: what it really does and what the 'window' parameter really means
------------------------------------------------------------------------------
The zlib.deflate filter implements the DEFLATE, ZLIB and GZIP compression methods,
depending on the value of the 'window' parameter.
The 4 lower bits of the window parameter set the size of the internal "history
buffer": they hold the base-2 logarithm of its size, in the range from 8 to 15. The
remaining bits of the window parameter select the output format, as described below.
- DEFLATE (RFC 1951) is a raw compression algorithm with no header and no checksum.
It is selected when the window parameter is in the range from -8 to -15. This
compression algorithm is the basis of every format generated by the zlib.deflate
filter. The corresponding functions that operate directly on strings are gzdeflate()
and gzinflate().
- ZLIB (RFC 1950) applies the DEFLATE algorithm and adds a 2-byte header and a 4-byte
trailer containing the Adler-32 checksum of the uncompressed data in big-endian
byte order:
ZLIB = ZLIBHEADER(2B) DEFLATE ADLER32(4B)
The 2-byte header, read as a 16-bit unsigned number in big-endian order, must be a
multiple of 31. This format is generated when the window parameter is in the
range from 8 to 15. The corresponding functions that operate directly on strings
are gzcompress() and gzuncompress().
- GZIP (RFC 1952) applies the DEFLATE algorithm and adds a header and a trailer holding
the CRC32 checksum of the uncompressed data and its length, both in little-endian
byte order:
GZIP = GZIPHEADER(10B) DEFLATE CRC32(4B) LENGTH(4B)
This is the format of .gz files. The GZIPHEADER contains, in order, the
GZIP signature ("\x1f\x8B"), the compression method ("\x08"), 6 zero bytes, and the
operating system code of the current system ("\x00" = Windows, "\x03" = Unix, etc.).
This format is generated when the window parameter is in the range from 8+16=24
to 15+16=31. Note that the LENGTH field is only 32 bits wide: for uncompressed data
of 4 GB or more, only the actual length modulo 2^32 is stored in the LENGTH part.
The corresponding functions that operate directly on strings are gzencode() and
gzdecode(); the gzopen() function reads and writes .gz files.
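The framing of the ZLIB and GZIP formats described above can be inspected directly on the output of gzcompress() and gzencode(). This is a sketch that assumes the default settings of both functions, in particular that gzencode() emits the plain 10-byte header with no optional fields.

```php
<?php
$text = "This is a test.\nThis is only a test.\n";

// ZLIB = ZLIBHEADER(2B) DEFLATE ADLER32(4B)
$zlib = gzcompress($text);
$zlibHeader   = unpack('n', substr($zlib, 0, 2))[1];   // 16-bit, big-endian
$zlibHeaderOk = ($zlibHeader % 31 === 0);
// hash('adler32') prints the checksum most significant byte first,
// which is the byte order of the trailer.
$adlerOk = (bin2hex(substr($zlib, -4)) === hash('adler32', $text));

// GZIP = GZIPHEADER(10B) DEFLATE CRC32(4B) LENGTH(4B)
$gzip = gzencode($text);
$gzipMagicOk  = (substr($gzip, 0, 3) === "\x1f\x8b\x08");
$gzipLength   = unpack('V', substr($gzip, -4))[1];     // 32-bit, little-endian
$gzipLengthOk = ($gzipLength === strlen($text));
// What remains between header and trailer is a raw DEFLATE stream.
$gzipBodyOk   = (gzinflate(substr($gzip, 10, -8)) === $text);

var_dump($zlibHeaderOk, $adlerOk, $gzipMagicOk, $gzipLengthOk, $gzipBodyOk);
```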
By default window=-15, which generates a DEFLATE compressed stream with an internal
history buffer of 2^15 B. An invalid window parameter may be detected, in which case
a rather obscure error is generated:
E_WARNING: unable to create or locate filter "zlib.deflate"
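To summarize the ranges given above, the mapping from a 'window' value to the format produced by zlib.deflate could be written as a small lookup. deflate_window_format() is a hypothetical helper made up for this illustration; it encodes only the ranges stated in this description and does not reproduce the checks done by the C source.

```php
<?php
// Hypothetical helper (not part of PHP): map a zlib.deflate 'window'
// value to the output format and to $W, the base-2 logarithm of the
// history buffer size. Returns null outside the ranges described above.
function deflate_window_format(int $window): ?array
{
    if ($window >= -15 && $window <= -8) {
        return ['format' => 'DEFLATE', 'W' => -$window];
    }
    if ($window >= 8 && $window <= 15) {
        return ['format' => 'ZLIB', 'W' => $window];
    }
    if ($window >= 24 && $window <= 31) {
        return ['format' => 'GZIP', 'W' => $window - 16];
    }
    return null;
}

foreach ([-15, 15, 31, 0] as $window) {
    $info = deflate_window_format($window);
    echo $window, ' => ',
        $info === null
            ? 'outside the described ranges'
            : "{$info['format']}, 2^{$info['W']} byte history buffer",
        "\n";
}
```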
All this explains why example #1 does not work and nothing can be read back: with
window=15 it generates ZLIB, which the subsequent
readfile('php://filter/zlib.inflate/resource=test.deflated'); statement cannot read
because it expects DEFLATE. The same example works with the default
window=-15.
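A round trip along these lines, with window=-15 on the writing side and the default zlib.inflate on the reading side, might look as follows. This is a sketch only: the scratch file created with tempnam() is an assumption of this illustration, and the manual's example writes to 'test.deflated' instead.

```php
<?php
$original = "This is a test.\nThis is only a test.\nThis is not an important string.\n";

// Scratch file for this illustration only.
$file = tempnam(sys_get_temp_dir(), 'zf');

// Write through zlib.deflate with window=-15, i.e. raw DEFLATE output.
$params = ['level' => 6, 'window' => -15, 'memory' => 9];
$fp = fopen($file, 'w');
stream_filter_append($fp, 'zlib.deflate', STREAM_FILTER_WRITE, $params);
fwrite($fp, $original);
fclose($fp);

// Read back through zlib.inflate, which expects DEFLATE by default.
$decoded = file_get_contents('php://filter/read=zlib.inflate/resource=' . $file);
unlink($file);

echo $decoded === $original ? "round trip ok\n" : "round trip FAILED\n";
```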
The remaining big question is why the original example #1 fails without showing any
error. But that is another story.
------------------------------------------------------------------------
[2014-12-06 07:14:58] salsi at icosaedro dot it
Description:
------------
Example #1 at http://php.net/manual/it/filters.compression.php is wrong: it
does not work, and the file read back is not decoded at all. The reason is that the
zlib.deflate filter actually generates the ZLIB Compress format (RFC 1950), NOT the
DEFLATE format (RFC 1951), while zlib.inflate decompresses DEFLATE, NOT the ZLIB
Compress format; the two stream filters are not each other's inverse. So
the situation is quite asymmetrical:
DEFLATE compressed format (RFC 1951):
gzdeflate() --> compress
gzinflate() and zlib.inflate --> decompress
ZLIB Compress format (RFC 1950: 2 bytes header + DEFLATE + ADLER32):
gzcompress() and zlib.deflate --> compress
gzuncompress() --> decompress
That is why some user comments offer the strange recipe "zlib.inflate works on
DEFLATE compressed data, but you have to strip away the first two bytes": those two
bytes are in fact the ZLIB Compress header, because the supposedly DEFLATE
compressed data are actually ZLIB Compress; the final ADLER32 checksum is ignored by
the DEFLATE decompressor as trailing garbage.
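The pairing of the string functions, and the "strip the first two bytes" recipe, can be checked with a few lines. In this sketch the 4-byte ADLER32 trailer is removed explicitly as well, rather than relying on the decompressor to ignore it.

```php
<?php
$text = "This is a test.\nThis is only a test.\n";

$zlib = gzcompress($text);   // ZLIB Compress: 2-byte header + DEFLATE + ADLER32
$raw  = gzdeflate($text);    // raw DEFLATE

// Each format decodes with its own counterpart.
$zlibPairOk    = (gzuncompress($zlib) === $text);
$deflatePairOk = (gzinflate($raw) === $text);

// The recipe from the user comments: drop the ZLIB framing and what is
// left is a plain DEFLATE stream that gzinflate() accepts.
$strippedOk = (gzinflate(substr($zlib, 2, -4)) === $text);

var_dump($zlibPairOk, $deflatePairOk, $strippedOk);
```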
I'm unsure if this is more a flawed API design than a doc issue.
Test script:
---------------
Example #1 with my fix:
<?php
$params = array('level' => 6, 'window' => 15, 'memory' => 9);
$original_text = "This is a test.\nThis is only a test.\nThis is not an important string.\n";
echo "The original text is " . strlen($original_text) . " characters long.\n";
$fp = fopen('test.deflated', 'w');
stream_filter_append($fp, 'zlib.deflate', STREAM_FILTER_WRITE, $params);
fwrite($fp, $original_text);
fclose($fp);
echo "The compressed file is " . filesize('test.deflated') . " bytes long.\n";
echo "The original text was:\n";
# WRONG: does not show the original message:
####/* Use readfile and zlib.inflate to decompress on the fly */
####readfile('php://filter/zlib.inflate/resource=test.deflated');
# FIX:
echo gzuncompress(file_get_contents("test.deflated"));
/* Generates output:
*
* The original text is 70 characters long.
* The compressed file is 56 bytes long.
* The original text was:
* This is a test.
* This is only a test.
* This is not an important string.
*
*/
?>
------------------------------------------------------------------------