List:       whatwg
Subject:    Re: [whatwg] Encoding Sniffing
From:       Alexey Proskuryakov <ap () webkit ! org>
Date:       2012-04-23 17:58:17
Message-ID: FAA095D7-6F14-4F76-9D30-8574C9B3D019 () webkit ! org

On 21.04.2012, at 3:21, Anne van Kesteren wrote:

> 1) Is this something we want to define and eventually implement the
> same way?

I think that the general direction should be getting rid of encoding
sniffing. It is rarely, if ever, helpful, and implementations differ
wildly.

WebKit can optionally use ICU for charset detection. We also have custom
built-in heuristics that only switch between Japanese encodings (think
of rendering unlabeled EUC-JP pages when the default browser encoding is
set to Shift-JIS). Safari doesn't enable ICU-based detection, and that
has caused no visible user complaints. I don't know whether the Japanese
heuristics are still important.
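
To make the Japanese case concrete, here is a minimal sketch of that kind
of fallback. It is an illustration only and not WebKit's actual heuristic:
the function name, the candidate list, and the "first strict decode that
succeeds wins" rule are all assumptions made for this example.

```python
from typing import Optional

# Hypothetical candidate list for this sketch; real engines consider more
# encodings (for example ISO-2022-JP) and use statistical scoring.
JAPANESE_CANDIDATES = ("shift_jis", "euc_jp")


def pick_japanese_encoding(data: bytes, default: str = "shift_jis") -> Optional[str]:
    """Return the first encoding that decodes `data` without errors.

    The configured default is tried first, then the other candidates.
    Returns None if no candidate decodes the bytes cleanly.
    """
    ordered = [default] + [c for c in JAPANESE_CANDIDATES if c != default]
    for name in ordered:
        try:
            data.decode(name)  # strict mode: raises on invalid byte sequences
        except UnicodeDecodeError:
            continue
        return name
    return None


if __name__ == "__main__":
    # An unlabeled EUC-JP page while the browser default is Shift-JIS.
    page = "日本語".encode("euc_jp")
    print(pick_japanese_encoding(page, default="shift_jis"))
```

A strict-decode check like this is ambiguous for short or mostly-ASCII
input, since such bytes are valid in both encodings and the default simply
wins.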

> 2) Does this need to apply outside HTML? For JavaScript it is forbidden
> per the HTML standard at the moment. CSS and XML do not allow it either.
> Is it used for decoding text/plain at the moment?
> 3) Is there a limit to how many bytes we should look at?

Related to the last question, WebKit doesn't implement re-navigation
(either for charset sniffing or for <meta charset>), and I don't think
that we ever should.

- WBR, Alexey Proskuryakov
