List:       koffice
Subject:    Re: Formatting large docs in Kword
From:       Nicolas Goutte <nicog () snafu ! de>
Date:       2002-12-13 14:18:37

On Friday 13 December 2002 06:54, Clarence Dang wrote:
> On Tue, 10 Dec 2002 07:04 pm, Jonathan Drews wrote:
> > On Tuesday 10 December 2002 12:25 am, Clarence Dang wrote:
> > > Hi Jonathan,
> > >
> > > Try loading your text file with the other two options in the "End of
> > > Paragraph" group i.e. "Sentence" and "Old method" (not "As is") in
> > > the "KWord's Plain Text Import Filter" dialog.  They are not perfect
> > > but I think they will do what you want.
> >
> >  Splendid!  Worked like a charm Clarence.
> >
> :)
>
> I would be careful about those modes though (esp. "Old method") because
> they heuristically determine when it's time to end the paragraph (so
> sentences could actually be appended to headings).

Yes, sometimes "Old method" is too smart. That is also why there is no real 
name for this method. However, it works well with paragraphs separated by 
empty lines.
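
To illustrate only that easy case, here is a minimal sketch in Python. It is
not the filter's actual code and not the full "Old method" heuristic; the
function name is made up for this example. It simply ends a paragraph at
each run of empty lines:

```python
def paragraphs_by_blank_lines(text):
    """Split text into paragraphs at runs of empty lines (hypothetical helper)."""
    paragraphs = []
    current = []  # lines collected for the paragraph being built
    for line in text.splitlines():
        if line.strip():
            # Non-empty line: it belongs to the current paragraph.
            current.append(line.strip())
        elif current:
            # Empty line after some text: close the paragraph.
            paragraphs.append(" ".join(current))
            current = []
    if current:
        # Flush the last paragraph if the text does not end with an empty line.
        paragraphs.append(" ".join(current))
    return paragraphs


blank_line_sample = (
    "CHAPTER I\n"
    "\n"
    "In the second century of the\n"
    "Christian era, the empire of Rome\n"
    "\n"
    "\n"
    "comprehended the fairest part of the earth.\n"
)
for paragraph in paragraphs_by_blank_lines(blank_line_sample):
    print(paragraph)
```

With input like this, a heading followed by an empty line stays a paragraph
of its own, so nothing gets appended to it.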

>
> > > > BTW Kword now loads and edits these big docs pretty quickly.
> > >
> > > Really?  The filter reads one character at a time, last time I
> > > checked... You should try loading the 20MB DNA sequence associated
> > > with bug #45973 :-) (actually the real problem with that bug is the
> > > amount of memory that KWord takes up but anyway I'm getting OT again
> >
> >  Yes, but by using your advice it loaded Volume 1 of Decline and Fall of
> > The Roman Empire pretty quickly.
>
> How many gigahertz is your machine? :)
> On my Pentium II, it took 8 minutes CPU time before I stopped loading
> it.... I'm actually going to be working on this filter in a few weeks
> (colour text via ANSI escape codes, table support...).

Do you plan to do it for the import only or for the export too?

>
> > I then saved that as a Kword doc (with
> > the new full page format) and reopened it.  Memory consumption was
> > negligible!
>
> I don't think KWord internally uses XML (haven't checked kwdoc.cc, but).
>
> > When I loaded the original text file using "As is: At the
> > end of the line", the memory consumption was enormous.
>
> Surely, "Sentence" mode would have taken a similar amount of memory
> (it would be slightly less but not significantly less)?

Not necessarily: ends of lines are not usually ends of sentences, so you will 
usually get far fewer paragraphs than with the "As is" mode.

That is why I made this mode: it can make quite good paragraphs out of 
monolithic texts (for example, those converted from HTML by clueless 
scripts).
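
A rough sketch in Python of why the paragraph count differs so much between
the two modes. This is an assumption about the general idea, not the
filter's real implementation; the function names and the choice of
sentence-ending characters are made up for the example:

```python
SENTENCE_END = (".", "!", "?")  # assumed set of sentence-ending characters


def paragraphs_as_is(text):
    """'As is' idea: every input line becomes its own paragraph."""
    return text.splitlines()


def paragraphs_by_sentence(text):
    """'Sentence' idea: keep joining lines until one ends a sentence."""
    paragraphs = []
    current = []  # lines collected for the paragraph being built
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            # An empty line always closes the paragraph in this sketch.
            if current:
                paragraphs.append(" ".join(current))
                current = []
            continue
        current.append(stripped)
        if stripped.endswith(SENTENCE_END):
            # The line ends with sentence punctuation: end the paragraph here.
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


sentence_sample = (
    "The decline of Rome was the natural\n"
    "and inevitable effect of immoderate\n"
    "greatness.\n"
    "Prosperity ripened the principle\n"
    "of decay.\n"
)
print(len(paragraphs_as_is(sentence_sample)))        # one paragraph per line
print(len(paragraphs_by_sentence(sentence_sample)))  # one paragraph per sentence
```

The same text gives one paragraph per line in the first function but only
one per sentence in the second, and every paragraph costs memory.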

>
> > It used ~500 MB.
> > So the page formatting makes a big difference in the memory
> > consumed?
>
> Well there isn't really a difference in Page Formatting at all with those
> modes, just how often "paragraphs" get written.  Dunno about memory though
> but I would imagine fewer paragraphs (not "As is" mode) would be better.
>
> > It took about half an hour to paste the 722 additional pages, to make
> > the 1444 page *kwd.

>
> IMHO, that's much too long!  But, I'm too busy to fix that at the moment.

It is perhaps too long, but it is probably the time KWord currently needs to 
paginate. (Do not forget: you are doing tons of double-precision 
floating-point operations, and they do not come for free!)

>
> Clarence

Have a nice day/evening/night!

>
>
> ____________________________________
> koffice mailing list
> koffice@mail.kde.org
> To unsubscribe please visit:
> http://mail.kde.org/mailman/listinfo/koffice
