
List:       cifs-protocol
Subject:    Re: [cifs-protocol] [EXTERNAL] [MS-XCA] LZ77+ Huffman: questions about blocks - TrackingID#221014004
From:       "Jeff McCashland \(He/him\) via cifs-protocol" <cifs-protocol () lists ! samba ! org>
Date:       2022-10-27 16:56:41
Message-ID: BL1PR21MB3113DF944ECDE29C5FA690ADA3339 () BL1PR21MB3113 ! namprd21 ! prod ! outlook ! com

Hi Douglas,

Thank you for the fast response. I will continue digging into this. 

Best regards,
Jeff McCashland (He/him) | Senior Escalation Engineer | Microsoft Protocol Open Specifications Team
Phone: +1 (425) 703-8300 x38300 | Hours: 9am-5pm | Time zone: (UTC-08:00) Pacific Time (US and Canada)
Local country phone number found here: http://support.microsoft.com/globalenglish | Extension 1138300

-----Original Message-----
From: Douglas Bagnall <douglas.bagnall@catalyst.net.nz> 
Sent: Wednesday, October 26, 2022 2:10 PM
To: Jeff McCashland (He/him) <jeffm@microsoft.com>; cifs-protocol@lists.samba.org; Samuel Cabrero (Samba) <scabrero@samba.org>
Cc: Microsoft Support <supportmail@microsoft.com>
Subject: Re: [EXTERNAL] [MS-XCA] LZ77+ Huffman: questions about blocks - TrackingID#2210140040006030

hi Jeff,

Thanks. Yes, I think you're understanding correctly, and that is a valid answer, and I *would* have happily accepted it, but in the meantime I have had the misfortune of re-reading MS-XCA again and again, and I now believe it contradicts this view.

In 2.1.4.3 there is:

   The following pseudocode demonstrates the encoding method.

        Write the 256-byte table of symbol bit lengths
        While there are more literals or matches to encode
           [[write bits per the algorithm, not in question here]]
        WriteBits(SymbolLength[256], SymbolCode[256])
        FlushBits()

This appears to encode a single block (there is one 256-byte table), and it ends with FlushBits(), which is essentially the "ignore ghi..." in my example. However, it also has a "WriteBits(SymbolLength[256], SymbolCode[256])", which I understand should only happen at the end of the last block.
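As a rough illustration of what WriteBits and FlushBits do with the last bits of a block, here is a minimal Python sketch. It assumes the bit order implied by the 2.2.4 decoder (codes packed most-significant bit first into 16-bit little-endian words). The BitWriter class and its method names are hypothetical, and it deliberately ignores the extra bytes that a real MS-XCA encoder has to interleave into the output for long matches.

```python
class BitWriter:
    """Hypothetical MSB-first bit writer emitting 16-bit little-endian words.

    Simplification: the real MS-XCA encoder also has to leave room for the
    extra bytes of long matches; that is not modelled here.
    """

    def __init__(self) -> None:
        self.out = bytearray()
        self.acc = 0    # pending bits, right-aligned
        self.nbits = 0  # how many pending bits acc holds

    def write_bits(self, length: int, code: int) -> None:
        """WriteBits(length, code): append `length` bits, MSB first."""
        self.acc = (self.acc << length) | (code & ((1 << length) - 1))
        self.nbits += length
        while self.nbits >= 16:
            self.nbits -= 16
            word = (self.acc >> self.nbits) & 0xFFFF
            self.out += word.to_bytes(2, "little")
        self.acc &= (1 << self.nbits) - 1  # drop bits already emitted

    def flush_bits(self) -> None:
        """FlushBits(): zero-pad the final partial word and emit it."""
        if self.nbits:
            word = (self.acc << (16 - self.nbits)) & 0xFFFF
            self.out += word.to_bytes(2, "little")
            self.acc = 0
            self.nbits = 0


w = BitWriter()
w.write_bits(3, 0b101)  # e.g. the code for the final symbol (made-up value)
w.flush_bits()          # the remaining 13 bits of the word are zero padding
# bytes(w.out) == b"\x00\xa0"
```

The zero padding added by flush_bits() is the part a decoder has to skip, which corresponds to the "ignore ghi..." case in the example.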

I think it would be accurate to say this pseudocode "demonstrates the encoding method for a message of 65536 or fewer bytes", but it is unclear for multi-block messages.
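To make the single-block versus multi-block distinction concrete, here is a small sketch of how a compressor might carve its input into blocks under the reading described here, where only the final block ends with symbol 256. The 65536-byte block size comes from this thread; the function name plan_blocks and its tuple layout are hypothetical.

```python
BLOCK_SIZE = 65536  # uncompressed bytes per Huffman block, per this thread


def plan_blocks(data_len: int) -> list[tuple[int, int, bool]]:
    """Split [0, data_len) into blocks of at most BLOCK_SIZE bytes.

    Each entry is (start, end, ends_with_symbol_256). Under the reading in
    this message, only the last block writes SymbolCode[256].
    Assumes data_len > 0.
    """
    blocks = []
    for start in range(0, data_len, BLOCK_SIZE):
        end = min(start + BLOCK_SIZE, data_len)
        blocks.append((start, end, end == data_len))
    return blocks


# 65536 bytes is still one block, so the 2.1.4.3 pseudocode covers it
# exactly; one more byte forces a second block and a second table.
print(plan_blocks(65536))  # [(0, 65536, True)]
print(plan_blocks(65537))  # [(0, 65536, False), (65536, 65537, True)]
```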


And in section 2.2.4 the main decompression pseudocode loop starts like:

   Loop until a decompression terminating condition
       Build the decoding table
       CurrentPosition = 256     // start at the end of the Huffman table
       NextBits = Read16Bits(InputBuffer + CurrentPosition)
       CurrentPosition += 2
       NextBits <<= 16
       NextBits |= Read16Bits(InputBuffer + CurrentPosition)
       CurrentPosition += 2
       ExtraBitCount = 16


which suggests that the bits "ghi..." are discarded, because we are told implicitly in the text that Read16Bits shifts input into a 32-bit register: if we call it twice at the beginning of each block, whatever was in the register has to fall out the other end.
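A direct Python transcription of that loop prologue makes the point visible: NextBits is assigned from two fresh words rather than merged with its previous value, so unread bits left over from the previous block cannot survive. This is a sketch only; read16 stands in for Read16Bits, and block_start is a hypothetical parameter, assuming CurrentPosition is counted from the start of the current block.

```python
import struct


def read16(buf: bytes, pos: int) -> int:
    """Read16Bits: one 16-bit little-endian word."""
    return struct.unpack_from("<H", buf, pos)[0]


def start_block(buf: bytes, block_start: int) -> tuple[int, int, int]:
    """Return (NextBits, CurrentPosition, ExtraBitCount) for a new block.

    Assumption: block_start (hypothetical) is where this block's 256-byte
    Huffman table begins.
    """
    pos = block_start + 256             # skip the 256-byte Huffman table
    next_bits = read16(buf, pos) << 16  # first word fills the top half
    pos += 2
    next_bits |= read16(buf, pos)       # second word fills the bottom half
    pos += 2
    return next_bits, pos, 16           # ExtraBitCount = 16


# Toy buffer: two blocks with all-zero tables and only two words of bits each.
buf = bytes(256) + b"\x34\x12\x78\x56" + bytes(256) + b"\xcd\xab\x01\x00"
next_bits, pos, extra = start_block(buf, 0)
# next_bits == 0x12345678, pos == 260, extra == 16
next_bits, pos, extra = start_block(buf, pos)  # old register contents are gone
# next_bits == 0xABCD0001, pos == 520, extra == 16
```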

cheers,
Douglas


On 27/10/22 05:48, Jeff McCashland (He/him) wrote:
> Hi Douglas,
> 
> As I understand it, each 64k block is processed separately. In other words, the
> first 64k block is LZ77 compressed, then the Huffman codes are constructed based
> on symbol frequency in that 64k. If, in your example, DEF ends the 64k block, then
> the subsequent ghi... will be processed with the second 64k block and Huffman
> table, and not dropped.
>
> Am I understanding your question correctly?
> 
> Best regards,
> Jeff McCashland (He/him) | Senior Escalation Engineer | Microsoft 
> Protocol Open Specifications Team
> Phone: +1 (425) 703-8300 x38300 | Hours: 9am-5pm | Time zone:
> (UTC-08:00) Pacific Time (US and Canada) Local country phone number
> found here: http://support.microsoft.com/globalenglish | Extension 1138300
> 
> -----Original Message-----
> From: Jeff McCashland (He/him)
> Sent: Friday, October 14, 2022 1:58 PM
> To: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>; 
> cifs-protocol@lists.samba.org; Samuel Cabrero (Samba) 
> <scabrero@samba.org>
> Cc: Microsoft Support <supportmail@microsoft.com>
> Subject: RE: [EXTERNAL] [MS-XCA] LZ77+ Huffman: questions about blocks 
> - TrackingID#2210140040006030
> 
> [Tom to BCC]
> 
> Hi Douglas,
> 
> I'll research this question and let you know what I learn.
> 
> Best regards,
> Jeff McCashland (He/him) | Senior Escalation Engineer | Microsoft 
> Protocol Open Specifications Team
> Phone: +1 (425) 703-8300 x38300 | Hours: 9am-5pm | Time zone:
> (UTC-08:00) Pacific Time (US and Canada) Local country phone number
> found here: http://support.microsoft.com/globalenglish | Extension 1138300
> 
> -----Original Message-----
> From: Tom Jebo <tomjebo@microsoft.com>
> Sent: Friday, October 14, 2022 9:45 AM
> To: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>; 
> cifs-protocol@lists.samba.org; Samuel Cabrero (Samba) 
> <scabrero@samba.org>
> Cc: Microsoft Support <supportmail@microsoft.com>
> Subject: RE: [EXTERNAL] [MS-XCA] LZ77+ Huffman: questions about blocks 
> - TrackingID#2210140040006030
> 
> [dochelp to bcc]
> [casemail cc]
> 
> Hi Douglas,
> 
> Thank you for your request. One of the Open Specifications team members will
> respond to start working with you. I have created case 2210140040006030 and added
> the number to the subject of this email. Please refer to this case number in
> future communications regarding this issue.
>
> Best regards,
> Tom Jebo
> Sr Escalation Engineer
> Microsoft Open Specifications
> 
> -----Original Message-----
> From: Douglas Bagnall <douglas.bagnall@catalyst.net.nz>
> Sent: Thursday, October 13, 2022 9:57 PM
> To: Interoperability Documentation Help <dochelp@microsoft.com>; 
> cifs-protocol@lists.samba.org; Samuel Cabrero (Samba) 
> <scabrero@samba.org>
> Subject: [EXTERNAL] [MS-XCA] LZ77+ Huffman: questions about blocks
> 
> hi Dochelp,
> 
> 
> Does the beginning of the second and subsequent blocks break the bitstream,
> starting again at a byte boundary after the new Huffman table?
>
> The question is best explained by analogy to the way long lengths are handled in
> matches. Suppose we have a match symbol in the middle of a bitstream, and the
> match is a long one, requiring the reading of an extra byte:
> ijklmnop  abcDEFgh [distance] qrs...
>              ^^^
>              [match 1, 15]
> 
> Here abc and ghi... are the bits in the stream around the match DEF, which is
> read in alternating bytes by little-endian rules, and the distance is plonked in
> the middle of the stream as an individual byte. The stream just flows around it,
> so gh-ijklmnop are interpreted after [distance].
>
> Now, if DEF instead ended the block:
> 
> ijklmnop  abcDEFgh [new Huffman table] qrs...
>              ^^^
>              [ends the block (64k)]
> 
> 
> would the bits gh-ijklmnop be interpreted using the new Huffman table, as part of
> the new block, or would those bits be dropped?
>
> Multi-block examples would of course be helpful.
> 
> 
> Douglas
> 
> 
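For the match example in the original question, here is a minimal Python sketch of the 32-bit register reader from the 2.2.4 pseudocode, showing why the stream "flows around" the extra byte of a long match: that byte is taken directly from CurrentPosition, which is already past the words preloaded into the register, so the register's bits are untouched. Class and method names are hypothetical; read_bits() follows the pseudocode's ExtraBitCount refill rule.

```python
import struct


def read16(buf: bytes, pos: int) -> int:
    """Read16Bits: one 16-bit little-endian word."""
    return struct.unpack_from("<H", buf, pos)[0]


class BitReader:
    """Hypothetical 32-bit register reader modelled on MS-XCA 2.2.4."""

    def __init__(self, buf: bytes, pos: int) -> None:
        self.buf = buf
        self.next_bits = (read16(buf, pos) << 16) | read16(buf, pos + 2)
        self.pos = pos + 4         # CurrentPosition, past both loaded words
        self.extra_bit_count = 16  # valid bits beyond the first 16

    def read_bits(self, n: int) -> int:
        """Consume n (1..16) bits from the top of the register."""
        value = self.next_bits >> (32 - n)
        self.next_bits = (self.next_bits << n) & 0xFFFFFFFF
        self.extra_bit_count -= n
        if self.extra_bit_count < 0:
            # Refill: slot the next word in directly below the remaining bits.
            self.next_bits |= read16(self.buf, self.pos) << -self.extra_bit_count
            self.pos += 2
            self.extra_bit_count += 16
        return value

    def read_byte(self) -> int:
        """Read one byte straight from the input, bypassing the register."""
        value = self.buf[self.pos]
        self.pos += 1
        return value


# Words 0xABCD and 0x1234, an extra byte 0x99, then words 0x5678 and 0x0000.
r = BitReader(b"\xcd\xab\x34\x12\x99\x78\x56\x00\x00", 0)
print(hex(r.read_bits(4)))   # 0xa
print(hex(r.read_byte()))    # 0x99 (from the byte stream, not the register)
print(hex(r.read_bits(16)))  # 0xbcd1 (bits continue where they left off)
print(hex(r.read_bits(12)))  # 0x234 (rest of the word 0x1234)
print(hex(r.read_bits(16)))  # 0x5678 (the word that followed the extra byte)
```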


_______________________________________________
cifs-protocol mailing list
cifs-protocol@lists.samba.org
https://lists.samba.org/mailman/listinfo/cifs-protocol
