'[john-dev] Latin-1 to UTF-16 conversion (was Lei's GSoC progress)'

List:       john-dev
Subject:    [john-dev] Latin-1 to UTF-16 conversion (was Lei's GSoC progress)
From:       Lei Zhang <zhanglei.april () gmail ! com>
Date:       2015-07-29 3:35:22
Message-ID: FDA6BA75-697E-4D59-9F58-17A6DB844713 () gmail ! com

> On Jul 27, 2015, at 5:01 PM, magnum <john.magnum@hushmail.com> wrote:
> 
> On 2015-07-27 03:15, Lei Zhang wrote:
> > 
> > 1. The input key is appropriately padded in set_key() for the SIMD
> > SHA function, and key length is also determined in the process. What
> > do I do if the key is UTF16-encoded? In episerver(non-SIMD), it uses
> > enc_to_utf16() to convert the key and get its length. But each key is
> > not contiguously stored for the SIMD SHA function, thus
> > enc_to_utf16() won't be applicable.
> 
> So episerver is sha256($s.utf16($p)) or sha1($s.utf16($p)). The MSSQL formats
> are similar but append the salt instead of prepending it (that's actually
> trickier to optimize, since we can't keep the salt at a fixed position).
>
> For fast formats like this, flat enc_to_utf16() is far too slow. You should
> convert right into the SIMD buffer, like in MSSQL05's set_key.
>
> Then you would just store the (bit-)length in the Merkle-Damgard buffer and be
> done with it. You'd read it back in get_key when needed.
>
> You don't need it for anything else: for best performance, you should write
> the salt right into the SIMD buffer in set_salt() (repeated across the whole
> vector width, of course). The set_key and get_key functions will know there's
> a fixed salt length of 16 (octets), so they can just start writing/reading
> after it, and write (read) the bit length with these extra 16 in mind. Then
> they'd write the Merkle-Damgard bit length field as 8 * (16 + keylen), with
> keylen counted in octets...
>
> After all this, crypt_all() is simply a matter of calling the SHA256 (or SHA1)
> function - the buffer is ready to use.
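
A minimal sketch of the length bookkeeping described in the quote, assuming Latin-1 input (so each character becomes two octets of UTF-16) and the fixed 16-octet salt mentioned there. The names SALT_OCTETS, md_bit_length and key_chars_from_bits are hypothetical and not taken from the john source:

```c
#include <stddef.h>
#include <stdint.h>

/* Fixed salt length in octets, as stated in the quoted text. */
#define SALT_OCTETS 16u

/* Hypothetical helper: value for the Merkle-Damgard bit-length field when
 * the hashed message is salt || UTF-16(key). Assumes Latin-1 input, so
 * every character occupies exactly 2 octets after conversion. */
static uint64_t md_bit_length(size_t key_chars)
{
	uint64_t key_octets = 2u * (uint64_t)key_chars;

	return 8u * (SALT_OCTETS + key_octets);
}

/* Hypothetical inverse, as a get_key() might use it to recover the key
 * length in characters from the stored bit length. */
static size_t key_chars_from_bits(uint64_t bits)
{
	return (size_t)((bits / 8u - SALT_OCTETS) / 2u);
}
```

For an 8-character key this gives 8 * (16 + 16) = 256 bits, and reading the field back yields 8 characters again.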

I looked at set_key() in mssql05 and nt2, both of which convert Latin-1 to
UTF-16 directly into the SIMD key buffer. There are still some details I don't
understand.

1. mssql05 uses SHA1 and nt2 uses MD4. Both use the same padding scheme, except
for the endianness of the length field at the tail of the block. Yet their
conversion code differs,

e.g. in mssql05's set_key():
	*keybuf_word = JOHNSWAP((temp << 16) | temp2);
and in nt2:
	temp2 |= (temp << 16);
	*keybuf_word = temp2;

Why is there no endianness swapping in nt2?
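
The two packing styles can be compared in a small sketch. It assumes that the SHA1 code consumes each 32-bit message word as a big-endian load of the byte stream and that the MD4 code consumes it as a little-endian load. This is a plausible reading, not something confirmed in this thread. swap32 is a hypothetical stand-in for JOHNSWAP, and all other names are made up for illustration:

```c
#include <stdint.h>

/* Hypothetical stand-in for JOHNSWAP: reverse the bytes of a 32-bit word. */
static uint32_t swap32(uint32_t x)
{
	return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
	       ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* nt2 style: second character in the high half, no swap. */
static uint32_t pack_plain(uint8_t c1, uint8_t c2)
{
	return ((uint32_t)c2 << 16) | c1;
}

/* mssql05 style: same packing, followed by a byte swap. */
static uint32_t pack_swapped(uint8_t c1, uint8_t c2)
{
	return swap32(pack_plain(c1, c2));
}

/* Reference: the UTF-16LE byte stream c1 00 c2 00 read as a little-endian
 * word (how MD4 is assumed to parse its input). */
static uint32_t word_le_from_stream(uint8_t c1, uint8_t c2)
{
	const uint8_t s[4] = { c1, 0, c2, 0 };

	return s[0] | ((uint32_t)s[1] << 8) |
	       ((uint32_t)s[2] << 16) | ((uint32_t)s[3] << 24);
}

/* Reference: the same byte stream read as a big-endian word (how SHA1 is
 * assumed to parse its input). */
static uint32_t word_be_from_stream(uint8_t c1, uint8_t c2)
{
	const uint8_t s[4] = { c1, 0, c2, 0 };

	return ((uint32_t)s[0] << 24) | ((uint32_t)s[1] << 16) |
	       ((uint32_t)s[2] << 8) | s[3];
}
```

Under these assumptions the unswapped word already equals what a little-endian hash reads from the UTF-16LE stream, while the swapped word equals what a big-endian hash reads from it.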

2. In mssql05's set_key():
	unsigned int *keybuf_word = (unsigned int*)&saved_key[GETPOS(3, index)];

What is the purpose of the number 3 here? In mssql05 the salt is appended to
the message, so this cannot be reserving space for a salt, and the salt size is
not 3 anyway.
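
One possible reading can be sketched as follows, assuming GETPOS reverses the byte order inside each 32-bit word, as an interleaved big-endian SIMD layout would need. COEF, BUF_WORDS and getpos_be are hypothetical names, and the formula is an illustration of such a layout rather than the macro from the tree:

```c
#include <stdint.h>

#define COEF      4u   /* hypothetical vector width in 32-bit lanes */
#define BUF_WORDS 16u  /* one 64-byte hash block per key */

/* Hypothetical GETPOS-like mapping: byte i of key number index to an
 * offset in an interleaved buffer, with bytes flipped inside each word. */
static unsigned getpos_be(unsigned i, unsigned index)
{
	return (index % COEF) * 4u                        /* lane in the vector */
	     + (i & ~3u) * COEF                           /* which message word */
	     + (3u - (i & 3u))                            /* flip inside the word */
	     + (index / COEF) * BUF_WORDS * 4u * COEF;    /* which vector group */
}
```

With this layout, byte 3 of a key maps to the lowest address of that key's first 32-bit word (getpos_be(3, 0) is 0 and getpos_be(3, 1) is 4), so taking the address at position 3 gives a word-aligned pointer to the start of the key.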

BTW, there are so many hardcoded values in the SIMD buffer handling code. This
would cause a lot of headaches for a newcomer...

3. I see that the values returned by get_salt() and get_binary() are sometimes
endianness-swapped in a SIMD build and sometimes not. What is the reason for
this?
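
One plausible motivation, stated as an assumption rather than as the reason used in the tree: if the SIMD hash leaves its result as host-order state words, converting the stored digest once at load time lets each candidate be compared without any per-hash byte swapping. A minimal sketch for a SHA1-sized digest, with hypothetical names throughout:

```c
#include <stdint.h>
#include <string.h>

#define SHA1_WORDS 5

/* Hypothetical get_binary()-style step: turn a big-endian 20-byte digest
 * into five host-order words, matching raw SHA1 state words. */
static void digest_to_state_words(const uint8_t digest[4 * SHA1_WORDS],
                                  uint32_t out[SHA1_WORDS])
{
	for (int i = 0; i < SHA1_WORDS; i++) {
		const uint8_t *p = digest + 4 * i;

		out[i] = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
		         ((uint32_t)p[2] << 8) | p[3];
	}
}

/* Hypothetical compare: the computed state is matched as-is, with no
 * byte swap on the hot path. */
static int state_matches(const uint32_t *state, const uint32_t *binary)
{
	return memcmp(state, binary, SHA1_WORDS * sizeof(uint32_t)) == 0;
}
```

On a little-endian host the conversion is a byte swap of each word. A format whose hash output is little-endian (MD4, for example) would need no such step, which could explain why only some formats swap.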


Thanks,
Lei

