[prev in list] [next in list] [prev in thread] [next in thread]
List: cifs-protocol
Subject: Re: [cifs-protocol] [EXTERNAL] [MS-XCA] LZ77+ Huffman: questions about blocks - TrackingID#221014004
From: "Jeff McCashland \(He/him\) via cifs-protocol" <cifs-protocol () lists ! samba ! org>
Date: 2022-10-27 16:56:41
Message-ID: BL1PR21MB3113DF944ECDE29C5FA690ADA3339 () BL1PR21MB3113 ! namprd21 ! prod ! outlook ! com
[Download RAW message or body]
Hi Douglas,
Thank you for the fast response. I will continue digging into this.
Best regards,
Jeff McCashland (He/him) | Senior Escalation Engineer | Microsoft Protocol Open \
Specifications Team
Phone: +1 (425) 703-8300 x38300 | Hours: 9am-5pm | Time zone: (UTC-08:00) Pacific \
Time (US and Canada) Local country phone number found here: \
http://support.microsoft.com/globalenglish | Extension 1138300
-----Original Message-----
From: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
Sent: Wednesday, October 26, 2022 2:10 PM
To: Jeff McCashland (He/him) <jeffm@microsoft.com>; cifs-protocol@lists.samba.org; \
Samuel Cabrero (Samba) <scabrero@samba.org>
Cc: Microsoft Support <supportmail@microsoft.com>
Subject: Re: [EXTERNAL] [MS-XCA] LZ77+ Huffman: questions about blocks - \
TrackingID#2210140040006030
hi Jeff,
Thanks. Yes, I think you're understanding correctly, and that is a valid answer, and \
I *would* have happily accepted it, but in the meantime I have had the misfortune of \
re-reading MS-XCA again and again, and I now believe it contradicts this view.
In 2.1.4.3 there is:
The following pseudocode demonstrates the encoding method.
Write the 256-byte table of symbol bit lengths
While there are more literals or matches to encode
[[write bits per the algorithm, not in question here]]
WriteBits(SymbolLength[256], SymbolCode[256])
FlushBits()
This appears to be encoding a single block (there's one 256-byte table), and it ends \
with the FlushBits(), which is essentially the "ignore ghi..." in my example. However \
it also has a "WriteBits(SymbolLength[256], SymbolCode[256])", which I understand \
should only happen at the end of the last block.
I think it would be accurate to say this pseudocode "demonstrates the encoding method \
for a message of 65536 or fewer bytes", but is unclear for multi-block messages.
And in section 2.2.4 the main decompression pseudocode loop starts like:
Loop until a decompression terminating condition
Build the decoding table
CurrentPosition = 256 // start at the end of the Huffman table
NextBits = Read16Bits(InputBuffer + CurrentPosition)
CurrentPosition += 2
NextBits <<= 16
NextBits |= Read16Bits(InputBuffer + CurrentPosition)
CurrentPosition += 2
ExtraBitCount = 16
which suggests that the bits "ghi..." are discarded because we are told implicitly in \
the text that Read16bits shifts input into a 32 bit register -- if we call it twice \
at the beginning of each block, whatever was in the register has to fall out the \
other end.
cheers,
Douglas
On 27/10/22 05:48, Jeff McCashland (He/him) wrote:
> Hi Douglas,
>
> As I understand, each 64k block is processed separately. In other words, the first \
> 64k block is LZ77 compressed, then the Huffman codes are constructed based on \
> symbol frequency in that 64k. If, in your example, DEF ends the 64k block, then the \
> subsequent ghi... will be processed with the second 64k block and Huffman table, \
> and not dropped.
> Am I understanding your question correctly?
>
> Best regards,
> Jeff McCashland (He/him) | Senior Escalation Engineer | Microsoft
> Protocol Open Specifications Team
> Phone: +1 (425) 703-8300 x38300 | Hours: 9am-5pm | Time zone:
> (UTC-08:00) Pacific Time (US and Canada) Local country phone number
> found here:
> https://nam06.safelinks.protection.outlook.com/?url=http%3A%2F%2Fsuppo
> rt.microsoft.com%2Fglobalenglish&data=05%7C01%7Cjeffm%40microsoft.
> com%7C2eed4b9f9fda44b7458d08dab79674a8%7C72f988bf86f141af91ab2d7cd011d
> b47%7C1%7C0%7C638024154089025057%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wL
> jAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&
> amp;sdata=gToAFtZwgAnmsIJgYkP9aVhup%2Blb5zsDN%2Bajebbzh18%3D&reser
> ved=0 | Extension 1138300
>
> -----Original Message-----
> From: Jeff McCashland (He/him)
> Sent: Friday, October 14, 2022 1:58 PM
> To: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>;
> cifs-protocol@lists.samba.org; Samuel Cabrero (Samba)
> <scabrero@samba.org>
> Cc: Microsoft Support <supportmail@microsoft.com>
> Subject: RE: [EXTERNAL] [MS-XCA] LZ77+ Huffman: questions about blocks
> - TrackingID#2210140040006030
>
> [Tom to BCC]
>
> Hi Douglas,
>
> I'll research this question and let you know what I learn.
>
> Best regards,
> Jeff McCashland (He/him) | Senior Escalation Engineer | Microsoft
> Protocol Open Specifications Team
> Phone: +1 (425) 703-8300 x38300 | Hours: 9am-5pm | Time zone:
> (UTC-08:00) Pacific Time (US and Canada) Local country phone number
> found here:
> https://nam06.safelinks.protection.outlook.com/?url=http%3A%2F%2Fsuppo
> rt.microsoft.com%2Fglobalenglish&data=05%7C01%7Cjeffm%40microsoft.
> com%7C2eed4b9f9fda44b7458d08dab79674a8%7C72f988bf86f141af91ab2d7cd011d
> b47%7C1%7C0%7C638024154089025057%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wL
> jAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&
> amp;sdata=gToAFtZwgAnmsIJgYkP9aVhup%2Blb5zsDN%2Bajebbzh18%3D&reser
> ved=0 | Extension 1138300
>
> -----Original Message-----
> From: Tom Jebo <tomjebo@microsoft.com>
> Sent: Friday, October 14, 2022 9:45 AM
> To: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>;
> cifs-protocol@lists.samba.org; Samuel Cabrero (Samba)
> <scabrero@samba.org>
> Cc: Microsoft Support <supportmail@microsoft.com>
> Subject: RE: [EXTERNAL] [MS-XCA] LZ77+ Huffman: questions about blocks
> - TrackingID#2210140040006030
>
> [dochelp to bcc]
> [casemail cc]
>
> Hi Douglas,
>
> Thank you for your request. One of the Open Specifications team will respond to \
> start working with you. I have created case 2210140040006030 and added the number \
> to the subject of this email. Please refer to this case number in future \
> communications regarding this issue.
> Best regards,
> Tom Jebo
> Sr Escalation Engineer
> Microsoft Open Specifications
>
> -----Original Message-----
> From: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
> Sent: Thursday, October 13, 2022 9:57 PM
> To: Interoperability Documentation Help <dochelp@microsoft.com>;
> cifs-protocol@lists.samba.org; Samuel Cabrero (Samba)
> <scabrero@samba.org>
> Subject: [EXTERNAL] [MS-XCA] LZ77+ Huffman: questions about blocks
>
> hi Dochelp,
>
>
> Does the beginning of the second and subsequent blocks break the bitstream, \
> starting again at a byte boundary after the new Huffman table?
> The question is best explained by analogy to the way long lengths are handled in \
> matches. Suppose we have a match symbol in the middle of a bitstream, and the match \
> is a long one, requiring the reading of an extra byte:
> ijklmnop abcDEFgh [distance] qrs...
> >
> [match 1, 15]
>
> Here abc, ghi.. are the sequence of bits in the stream around the match DEF, which \
> is read in alternating bytes by little-endian rules, and the distance is plonked in \
> the middle of the stream as an individual byte. The stream just flows around it, so \
> gh-ijklmnop are interpreted after [distance].
> Now, if DEF instead ended the block:
>
> ijklmnop abcDEFgh [new Huffman table] qrs...
> >
> [ends the block (64k)]
>
>
> would the bits gh-jklmnop be interpreted using the new Huffman table, as part of \
> the new block, or would those bits be dropped?
> Multi-block examples would of course be helpful.
>
>
> Douglas
>
>
_______________________________________________
cifs-protocol mailing list
cifs-protocol@lists.samba.org
https://lists.samba.org/mailman/listinfo/cifs-protocol
[prev in list] [next in list] [prev in thread] [next in thread]
Configure |
About |
News |
Add a list |
Sponsored by KoreLogic