
List:       kfm-devel
Subject:    how Windows browsers encode URL [Re: why the % cruft?]
From:       Vadim Plessky <lucy-ples () mtu-net ! ru>
Date:       2002-07-09 16:37:52

On Tuesday 09 July 2002 11:15 am, Waldo Bastian wrote:
> On Tuesday 09 July 2002 12:01 am, Lars Knoll wrote:
> > > URLs are spec'ed as a sequence of octets (8-bit values); "Unicode URLs"
> > > basically don't exist. Despite that, we try to handle them anyway, and
> > > apparently that doesn't always work. (E.g. we need to convert Unicode
> > > to an 8-bit sequence before we can transfer it to the website, but the
> > > encoding to use for that is unspecified, so we can only guess.)
> > 
> > As Dirk already pointed out, IE sends URLs in UTF-8 by default. I'm
> > pretty sure we could do the same without breaking a lot of web pages
> > (they'd be broken with IE as well). Maybe there's an HTTP header field we
> > can set to indicate this?
> 
> My impression was that many non-Latin-1 (e.g. Russian, Japanese, Korean,
> etc.) websites use the "local locale" as encoding and not UTF-8. Maybe Vadim
> can comment on that from the Russian point of view.

I ran the same experiment (a search for 'пример') with the several Windows
browsers I have.

Opera 6 / Windows
----------------
http://www.google.com/search?hl=en&ie=ISO-8859-1&q=%3F%3F%3F%3F%3F%3F&btnG=Google+Search
  --> here Opera fails in exactly the same way as Konqueror
http://www.google.com.ru/search?q=%EF%F0%E8%EC%E5%F0&ie=windows-1251&hl=ru&btnG=%CF%EE%E8%F1%EA+%E2+Google
  
Netscape 6/Win 
---------------- 
http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=%D0%BF%D1%80%D0%B8%D0%BC%D0%B5%D1%80&btnG=Google+Search
http://www.google.com.ru/search?q=%D0%BF%D1%80%D0%B8%D0%BC%D0%B5%D1%80&ie=UTF-8&oe=UTF-8&hl=ru&btnG=%D0%9F%D0%BE%D0%B8%D1%81%D0%BA+%D0%B2+Google
  
http://www.yandex.ru/yandsearch?text=%EF%F0%E8%EC%E5%F0 
(URL encoded in windows 1251) 
http://search.rambler.ru/cgi-bin/rambler_search?words=%EF%F0%E8%EC%E5%F0&where=1 
(URL encoded in windows 1251)

MS IE6
----------------
http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=%D0%BF%D1%80%D0%B8%D0%BC%D0%B5%D1%80&btnG=Google+Search
http://www.google.com.ru/search?q=%D0%BF%D1%80%D0%B8%D0%BC%D0%B5%D1%80&ie=UTF-8&oe=UTF-8&hl=ru&btnG=%D0%9F%D0%BE%D0%B8%D1%81%D0%BA+%D0%B2+Google
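
For anyone who wants to reproduce the three query strings above without
a browser, here is a minimal Python sketch. It only shows how the same
word percent-encodes under each charset; the helper name encode_term is
made up for illustration, and the use of errors="replace" (which
substitutes "?" for unrepresentable characters) is my assumption about
what presumably happens in the ISO-8859-1 case, not something taken
from Opera's or Konqueror's code.

```python
from urllib.parse import quote


def encode_term(term: str, charset: str) -> str:
    """Percent-encode `term` after converting it to bytes in `charset`.

    errors="replace" turns characters the charset cannot represent into
    "?", which quote() then escapes as %3F (an assumption about how the
    failing browsers behave, used here only for illustration).
    """
    return quote(term, encoding=charset, errors="replace")


term = "пример"  # the search term used in the experiment

for charset in ("utf-8", "windows-1251", "iso-8859-1"):
    print(f"{charset:>12}: {encode_term(term, charset)}")
```

The UTF-8 and windows-1251 results correspond to the q= values in the
Netscape/IE and google.com.ru (Opera) URLs, and the ISO-8859-1 result
is the run of %3F seen in the failing Opera URL.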



And finally, I extracted some words from a mail I have in Chinese:
老蟹

and searched Google for them using Mozilla. The results were quite good:
721 matches (don't ask me what those words mean!).
http://www.google.com/search?hl=en&lr=&ie=UTF-8&oe=UTF-8&q=%E8%80%81%E8%9F%B9&btnG=Google+Search
Again, UTF-8.
So it seems rather safe to encode URLs in UTF-8, as it's common practice
and is accepted not only by MS IE but by Mozilla as well.
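
One reason UTF-8 is a fairly safe default: byte sequences produced by
legacy 8-bit charsets are rarely valid UTF-8, so a receiver can try
UTF-8 first and fall back to a local charset only when that fails. The
sketch below illustrates the idea; it is not how Google or any of the
browsers above is known to work. The function name decode_query_value
and the windows-1251 fallback are assumptions chosen for the Russian
case, and "+" for spaces is deliberately not handled.

```python
from urllib.parse import unquote_to_bytes


def decode_query_value(raw: str, fallback: str = "windows-1251") -> str:
    """Guess the charset of a percent-encoded query value.

    Hypothetical helper: try strict UTF-8 first, then fall back to a
    legacy charset (windows-1251 here, an assumption for Russian sites).
    """
    data = unquote_to_bytes(raw)  # undo the %XX escapes, keep raw bytes
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode(fallback, errors="replace")


print(decode_query_value("%D0%BF%D1%80%D0%B8%D0%BC%D0%B5%D1%80"))  # UTF-8 form
print(decode_query_value("%EF%F0%E8%EC%E5%F0"))  # windows-1251 form
print(decode_query_value("%E8%80%81%E8%9F%B9"))  # the Chinese query
```

The guess can still be wrong for short inputs that happen to be valid
in both encodings, so this is a heuristic, not a guarantee.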

> 
> Cheers,
> Waldo

My Best Regards,
-- 

Vadim Plessky
http://kde2.newmail.ru  (English)
33 Window Decorations and 6 Widget Styles for KDE
http://kde2.newmail.ru/kde_themes.html
KDE mini-Themes
http://kde2.newmail.ru/themes/

