List:       aspell-user
Subject:    [aspell-user] Recommendations for non-interactive use
From:       Greg Ward <gward () mems-exchange ! org>
Date:       2002-03-20 13:10:04
Message-ID: 20020320210922.GA13545 () mems-exchange ! org

I'm trying to figure out which tool is best for non-interactive,
just-show-me-the-misspelled-words-and-go-away use.  The specific
context is a web crawler that spellchecks each page, so the ability to
parse HTML would be spiffy.

ispell works, but its HTML parser appears broken, so I have to parse the
HTML myself and feed ispell the non-tag text.  This is implemented and
working, so if nobody has a better idea, it's what I'll stick with.
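
A minimal sketch of the strip-the-tags-then-pipe approach described
above, in Python.  The names `TextExtractor`, `html_to_text` and
`misspelled_words` are hypothetical, and this is not the crawler's
actual code.  The default command assumes ispell's `-l` list mode,
which reads text on standard input and prints the words it considers
misspelled; any other checker that behaves the same way could be
passed in instead.

```python
import subprocess
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text found outside tags, skipping <script>/<style> bodies."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def html_to_text(html):
    """Return the non-tag text of `html` with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def misspelled_words(html, command=("ispell", "-l")):
    """Pipe the non-tag text to `command` and return its output words.

    Assumption: `command` reads text on stdin and writes one
    misspelled word per line on stdout, as `ispell -l` does.
    """
    result = subprocess.run(
        command,
        input=html_to_text(html),
        capture_output=True,
        text=True,
        check=True,
    )
    return sorted(set(result.stdout.split()))
```

Keeping the checker command as a parameter means the same wrapper can
be pointed at ispell, aspell, or anything else that lists misspelled
words, without touching the HTML handling.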

I've just tried aspell .33.7.1, and its HTML parser is definitely
better, but it's nearly two orders of magnitude slower than ispell.
(On one 50k HTML file, ispell takes 0.025 sec, and aspell takes 2.2
sec, a factor of about 88.)  IMHO this is a showstopper, but I wonder
if it's possible my aspell is miscompiled or misconfigured or
something.  Or is aspell just ~100x slower than ispell in general?
This is on Debian Linux 3.0 (unstable).
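
For anyone who wants to repeat the comparison, here is a rough timing
sketch in Python.  The helper names `time_command` and `slowdown` are
hypothetical, and the exact command lines are left to the caller
because the aspell invocation depends on the installed version.  It
measures wall-clock time including process startup, which is what
matters for a crawler that launches the checker once per page.

```python
import subprocess
import time


def time_command(command, path, repeats=5):
    """Best wall-clock time (seconds) to run `command` with `path` on stdin."""
    best = float("inf")
    for _ in range(repeats):
        with open(path, "rb") as handle:
            start = time.perf_counter()
            subprocess.run(
                command,
                stdin=handle,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                check=True,
            )
            best = min(best, time.perf_counter() - start)
    return best


def slowdown(slow_seconds, fast_seconds):
    """Ratio of the slower time to the faster one."""
    return slow_seconds / fast_seconds
```

With the figures quoted above, `slowdown(2.2, 0.025)` comes out at
about 88.  Taking the best of several runs reduces the influence of
disk caching and other noise on the first run.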

Finally, if anyone knows of another tool that simply detects and reports
misspelled words, without bothering to suggest alternatives, I'd love to
hear about it.  I did a quick freshmeat search this morning (which is
what reminded me of aspell), but didn't find anything.

Thanks --

        Greg
-- 
Greg Ward - software developer                gward@mems-exchange.org
MEMS Exchange                            http://www.mems-exchange.org
