List: boost
Subject: Re: [boost] [text] SIMD UTF-8 decoding
From: Alexander Grund via Boost <boost () lists ! boost ! org>
Date: 2020-06-18 11:38:01
Message-ID: c7eeeb08-c39c-aa8c-a3f0-ed83adbd6dbb () tu-dresden ! de
On 18.06.20 at 13:10, Phil Endecott via Boost wrote:
> Alexander Grund wrote:
>> I've seen other SIMD UTF-8 conversions around and they basically all
>> focus on ASCII converting as much as possible and fallback to
>> one-by-one decoding once a non-ascii is found
>
> The question is, do they do that because they've determined that
> that gives the best performance (for some benchmark input), or
> have they not tried to do more with the SIMD code?
I guess the former, which would also be my intuition. It is easy to
detect the first byte of a multi-byte UTF-8 sequence in SIMD, and also
easy to bulk-convert single-byte UTF-8 sequences. Once you get to
converting the multi-byte sequences, SIMD doesn't make sense anymore.
There is too much checking to do: how many bytes to "squash", end of
input, shortest-form encoding, legal value, ...
So, in summary: once it requires branching, it doesn't make sense to use
SIMD anymore.
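
To make that concrete, here is a rough sketch of the pattern I mean: an
SSE2 fast path that widens 16 ASCII bytes at a time, and a branchy
scalar decoder for everything else. The names utf8_to_utf32 and
decode_one are made up for this mail and are not taken from
Boost.Text or any other library. Replacing malformed input with U+FFFD
and resynchronizing one byte later is my own assumption, as is the
choice of SSE2; without SSE2 the sketch simply runs the scalar path.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Hypothetical scalar helper: decodes one code point starting at p and
// returns the number of bytes consumed. Malformed input appends U+FFFD
// and consumes a single byte (an assumed error policy).
inline std::size_t decode_one(const unsigned char* p, const unsigned char* end,
                              std::u32string& out)
{
    assert(p < end);
    const unsigned char b0 = *p;
    std::size_t len = 1;
    char32_t cp = b0;
    char32_t min = 0;
    if (b0 < 0x80) { out.push_back(cp); return 1; }
    else if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; min = 0x80; }
    else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; min = 0x800; }
    else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; min = 0x10000; }
    else { out.push_back(0xFFFD); return 1; }        // stray continuation or invalid lead

    if (static_cast<std::size_t>(end - p) < len) {   // end-of-input check
        out.push_back(0xFFFD);
        return 1;
    }
    for (std::size_t i = 1; i < len; ++i) {          // "squash" the continuation bytes
        if ((p[i] & 0xC0) != 0x80) { out.push_back(0xFFFD); return 1; }
        cp = (cp << 6) | (p[i] & 0x3F);
    }
    // Shortest-form and legal-value checks (overlong, surrogates, > U+10FFFF).
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        out.push_back(0xFFFD);
        return 1;
    }
    out.push_back(cp);
    return len;
}

inline std::u32string utf8_to_utf32(const std::string& in)
{
    const unsigned char* p = reinterpret_cast<const unsigned char*>(in.data());
    const unsigned char* end = p + in.size();
    std::u32string out;
    out.reserve(in.size());
    while (p < end) {
#if defined(__SSE2__)
        // Fast path: while a whole 16-byte block is ASCII, widen it in bulk.
        while (end - p >= 16) {
            const __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(p));
            if (_mm_movemask_epi8(v) != 0) break;    // some byte has its high bit set
            const __m128i zero = _mm_setzero_si128();
            const __m128i lo = _mm_unpacklo_epi8(v, zero);  // bytes 0..7  -> 16 bit
            const __m128i hi = _mm_unpackhi_epi8(v, zero);  // bytes 8..15 -> 16 bit
            alignas(16) char32_t buf[16];
            _mm_store_si128(reinterpret_cast<__m128i*>(buf + 0),  _mm_unpacklo_epi16(lo, zero));
            _mm_store_si128(reinterpret_cast<__m128i*>(buf + 4),  _mm_unpackhi_epi16(lo, zero));
            _mm_store_si128(reinterpret_cast<__m128i*>(buf + 8),  _mm_unpacklo_epi16(hi, zero));
            _mm_store_si128(reinterpret_cast<__m128i*>(buf + 12), _mm_unpackhi_epi16(hi, zero));
            out.append(buf, 16);
            p += 16;
        }
        if (p == end) break;
#endif
        // Slow path: one code point at a time, with all the branching.
        p += decode_one(p, end, out);
    }
    return out;
}
```

Note that the SIMD part contains no data-dependent branch other than
"is this whole block ASCII?", while decode_one is nothing but branches.
After each scalar step the loop tries the fast path again, so long ASCII
runs between multi-byte characters still get the bulk treatment.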
_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost