'Re: [boost] [text] SIMD UTF-8 decoding'


List:       boost
Subject:    Re: [boost] [text] SIMD UTF-8 decoding
From:       Alexander Grund via Boost <boost () lists ! boost ! org>
Date:       2020-06-18 11:38:01
Message-ID: c7eeeb08-c39c-aa8c-a3f0-ed83adbd6dbb () tu-dresden ! de


On 18.06.20 at 13:10, Phil Endecott via Boost wrote:
> Alexander Grund wrote:
>> I've seen other SIMD UTF-8 conversions around, and they basically all
>> focus on converting as much ASCII as possible and fall back to
>> one-by-one decoding once a non-ASCII byte is found
>
> The question is, do they do that because they've determined that
> that gives the best performance (for some benchmark input), or
> have they not tried to do more with the SIMD code?
I'd guess the former, which would also be my intuition. It is easy to
detect the first byte of a multi-byte UTF-8 sequence with SIMD, and also
easy to bulk-convert single-byte (ASCII) sequences. Once you get to
converting the multi-byte sequences, SIMD no longer makes sense: there is
too much checking to do (how many bytes to "squash" together, end of
input, shortest-form encoding, legal code point value, ...).
In summary: once it requires branching, it doesn't make sense to use SIMD
anymore.
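To make the split concrete, here is a minimal sketch of that strategy,
assuming UTF-32 output and x86 SSE2 intrinsics. The function names
utf8_to_utf32 and decode_multibyte are hypothetical and this is not
Boost.Text's API. The SIMD loop only tests the high bit of 16 bytes at
once and widens pure-ASCII blocks; everything else goes to a scalar
decoder whose branches are exactly the checks listed above (sequence
length, end of input, shortest form, legal value).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics (x86/x86-64 only)
#endif

// Scalar slow path (hypothetical helper): decode one multi-byte sequence at s[i].
// Returns the number of bytes consumed, or 0 if the sequence is invalid.
// Every check below is a data-dependent branch, which is why this part stays scalar.
static std::size_t decode_multibyte(const unsigned char* s, std::size_t n,
                                    std::size_t i, char32_t& out) {
    assert(i < n);
    const unsigned char b0 = s[i];
    std::size_t len;
    char32_t cp;
    if (b0 >= 0xC2 && b0 <= 0xDF)      { len = 2; cp = b0 & 0x1F; }
    else if (b0 >= 0xE0 && b0 <= 0xEF) { len = 3; cp = b0 & 0x0F; }
    else if (b0 >= 0xF0 && b0 <= 0xF4) { len = 4; cp = b0 & 0x07; }
    else return 0;  // stray continuation byte, 0xC0/0xC1 or 0xF5..0xFF
    if (n - i < len) return 0;  // sequence cut off by end of input
    for (std::size_t k = 1; k < len; ++k) {
        const unsigned char b = s[i + k];
        if ((b & 0xC0) != 0x80) return 0;  // expected a continuation byte
        cp = (cp << 6) | (b & 0x3F);       // "squash" 6 payload bits
    }
    if (len == 3 && cp < 0x800) return 0;                       // not shortest form
    if (len == 4 && (cp < 0x10000 || cp > 0x10FFFF)) return 0;  // overlong or too large
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;                 // surrogates are illegal
    out = cp;
    return len;
}

// Hypothetical entry point: decode UTF-8 into UTF-32, returning false on invalid input.
bool utf8_to_utf32(const std::string& in, std::u32string& out) {
    const unsigned char* s = reinterpret_cast<const unsigned char*>(in.data());
    const std::size_t n = in.size();
    std::size_t i = 0;
    out.clear();
    out.reserve(n);  // never more code points than bytes
    while (i < n) {
#if defined(__SSE2__)
        // SIMD fast path: 16 bytes at a time while none has the high bit set.
        const __m128i zero = _mm_setzero_si128();
        while (n - i >= 16) {
            const __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(s + i));
            if (_mm_movemask_epi8(v) != 0) {
                // A non-ASCII byte is in this block: copy the ASCII bytes before it,
                // then hand over to the scalar path.
                while (s[i] < 0x80) out.push_back(s[i++]);
                break;
            }
            // Zero-extend 16 ASCII bytes to 16 32-bit code points.
            const __m128i lo = _mm_unpacklo_epi8(v, zero);
            const __m128i hi = _mm_unpackhi_epi8(v, zero);
            const __m128i w[4] = {_mm_unpacklo_epi16(lo, zero), _mm_unpackhi_epi16(lo, zero),
                                  _mm_unpacklo_epi16(hi, zero), _mm_unpackhi_epi16(hi, zero)};
            const std::size_t old = out.size();
            out.resize(old + 16);
            std::memcpy(&out[old], w, sizeof w);  // x86 is little-endian
            i += 16;
        }
        if (i >= n) break;
#endif
        // Scalar path: ASCII byte by byte, multi-byte sequences with full checks.
        if (s[i] < 0x80) { out.push_back(s[i]); ++i; continue; }
        char32_t cp = 0;
        const std::size_t len = decode_multibyte(s, n, i, cp);
        if (len == 0) return false;
        out.push_back(cp);
        i += len;
    }
    return true;
}
```

Without SSE2 (e.g. on ARM) the #if block drops out and only the scalar
loop remains, which is slower but still correct. A NEON or AVX2 variant
would change only the fast path; the branchy multi-byte decoder stays the
same, which is the point being made above.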


