
List:       markdown-discuss
Subject:    Re: converting html with \xa9 to Markdown and using iconv?
From:       Milian Wolff <mail () milianw ! de>
Date:       2007-03-22 21:03:29
Message-ID: 200703222203.29530.mail () milianw ! de

On Thursday, 22 March 2007, Jeremy C. Reed wrote:
> The HTML documents have various characters like
>  	\xa0  (non-breaking space)
> ©	\xa9  (Copyright symbol)
> (and others).
>
> I tried using html2text.py but it didn't like these characters.
>
> Any ideas on how I can use iconv or another tool to convert documents like
> this so I can then convert to Markdown?
>
> I don't want to do manually as I have around 500+ documents.
>
>
>   Jeremy C. Reed

As far as I understand, you are looking for a converter that supports
UTF-8 / Unicode characters?
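If the documents are stored as ISO-8859-1 (Latin-1), the copyright sign is the
single byte 0xA9 and the non-breaking space is 0xA0; a converter expecting
UTF-8 will choke on those bytes. Re-encoding the files first is what
`iconv -f ISO-8859-1 -t UTF-8 in.html > out.html` does. A minimal Python sketch
of the same step, assuming the input really is Latin-1 (Windows-1252 pages
would need "cp1252" instead); `latin1_to_utf8` is just an illustrative name:

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8, like `iconv -f ISO-8859-1 -t UTF-8`.

    Assumes the input is Latin-1; every byte 0x00-0xFF maps to the code point
    of the same value, so decoding never fails.
    """
    return data.decode("latin-1").encode("utf-8")


if __name__ == "__main__":
    raw = b"\xa9 2007\xa0notice"  # Latin-1: copyright sign, then a non-breaking space
    print(latin1_to_utf8(raw))    # b'\xc2\xa9 2007\xc2\xa0notice'
```

Note that a wrong guess about the source encoding will not raise an error
here; it silently produces the wrong characters, so check a few files first.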

My PHP script (a port of html2text.py) leaves those characters untouched, so
in theory it should work. Try it out at [1].

But: it's PHP, so unless you can run PHP from a command line or write a little
PHP script to run locally, it will be of no use to you. The latter should be
pretty easy, though: simply recurse through your files and folders, apply
html2text to each file and save the output somewhere. You may also want to
raise PHP's maximum execution time (max_execution_time) while the batch runs.
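The batch step described above (walk the folders, run html2text on every file,
save the output elsewhere) could be sketched in Python as follows. This is an
illustration under assumptions, not the PHP script itself: `convert_tree` is a
hypothetical helper, the `convert` argument stands in for whatever
HTML-to-Markdown function you use, and the Latin-1 source encoding is assumed.

```python
from pathlib import Path
from typing import Callable


def convert_tree(src_dir: str, dst_dir: str, convert: Callable[[str], str],
                 src_encoding: str = "latin-1") -> int:
    """Convert every .html/.htm file under src_dir, mirroring the folder
    layout under dst_dir with a .md extension. Returns the file count.

    Hypothetical helper; `convert` is any HTML -> Markdown function.
    """
    src_root, dst_root = Path(src_dir), Path(dst_dir)
    count = 0
    # sorted() collects the whole listing before any output is written.
    for src in sorted(src_root.rglob("*")):
        if not src.is_file() or src.suffix.lower() not in (".html", ".htm"):
            continue
        # Decode with the documents' real encoding so 0xA9 becomes U+00A9.
        html = src.read_text(encoding=src_encoding)
        dst = (dst_root / src.relative_to(src_root)).with_suffix(".md")
        dst.parent.mkdir(parents=True, exist_ok=True)
        # Write UTF-8 so the copyright sign, non-breaking spaces etc. survive.
        dst.write_text(convert(html), encoding="utf-8")
        count += 1
    return count
```

With html2text.py importable, a call such as
`convert_tree("site", "markdown", html2text.html2text)` would process the
whole tree, assuming the module exposes its `html2text()` function as a
plain `str -> str` call.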

Alternatively, you could use one of the other converters; I know there are
some, but I don't have their URLs at hand. Maybe someone else will be able to
help you there.

 [1]: http://milianw.de/projects/html2text/

-- 
Milian Wolff
http://milianw.de
