List:       icu4j-support
Subject:    Re: Encoding Detection.
From:       Markus Scherer <markus.scherer () jtcsv ! com>
Date:       2002-10-28 17:13:30

Shaan wrote:
> Is there any way to find out the encoding of an InputStream in Java?
> Can the ICU4C library help me with this? If yes, then how?

ICU4C currently only has a function for interpreting Unicode signature byte sequences (BOMs). ICU 
does not have any heuristic code for "guessing" charsets.
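
To illustrate what signature detection amounts to on the Java side, here is a minimal
sketch. It is not ICU code: the class and method names (BomSniffer, detectBom, sniff) are
made up for this example, and it assumes only the standard Unicode signatures for UTF-8,
UTF-16 and UTF-32. It returns null when there is no BOM, which says nothing about the
actual charset.

```java
import java.io.IOException;
import java.io.PushbackInputStream;

// Hypothetical helper, not part of ICU or the JDK.
final class BomSniffer {
    private BomSniffer() {}

    // True if the first len bytes of b begin with the given signature bytes.
    private static boolean startsWith(byte[] b, int len, int... sig) {
        if (Math.min(len, b.length) < sig.length) return false;
        for (int i = 0; i < sig.length; i++) {
            if ((b[i] & 0xFF) != sig[i]) return false;
        }
        return true;
    }

    // Returns the charset name implied by a leading BOM, or null if none is present.
    // The 4-byte UTF-32 signatures are checked first because the UTF-32LE
    // signature (FF FE 00 00) begins with the UTF-16LE signature (FF FE).
    static String detectBom(byte[] b, int len) {
        if (startsWith(b, len, 0x00, 0x00, 0xFE, 0xFF)) return "UTF-32BE";
        if (startsWith(b, len, 0xFF, 0xFE, 0x00, 0x00)) return "UTF-32LE";
        if (startsWith(b, len, 0xEF, 0xBB, 0xBF)) return "UTF-8";
        if (startsWith(b, len, 0xFE, 0xFF)) return "UTF-16BE";
        if (startsWith(b, len, 0xFF, 0xFE)) return "UTF-16LE";
        return null;
    }

    // Peeks at up to 4 bytes and pushes them back, so the stream is left
    // unchanged (the BOM itself is NOT skipped). The PushbackInputStream
    // must have been created with a pushback buffer of at least 4 bytes.
    static String sniff(PushbackInputStream in) throws IOException {
        byte[] head = new byte[4];
        int n = 0;
        while (n < head.length) {
            int r = in.read(head, n, head.length - n);
            if (r < 0) break; // end of stream before 4 bytes were read
            n += r;
        }
        if (n > 0) in.unread(head, 0, n);
        return detectBom(head, n);
    }
}
```

Usage would be along the lines of
BomSniffer.sniff(new PushbackInputStream(yourStream, 4)). Note that FF FE 00 00 is
reported as UTF-32LE here, although it could also be UTF-16LE text starting with U+0000;
that is a choice made in this sketch.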

Such heuristics depend a lot on what kind of documents you expect to encounter - HTML, XML, plain 
text, natural language vs. mostly data, known language(s) and script(s), ...

Mozilla has such code, optimized for HTML pages and charsets commonly used in those. Other libraries 
may have different code with different optimizations.

Best regards,
markus
