'Re: pdf import in KWord'

List:       koffice-devel
Subject:    Re: pdf import in KWord
From:       Thomas Zander <zander () kde ! org>
Date:       2006-10-17 11:39:33
Message-ID: 200610171339.37795.zander () kde ! org

On Tuesday 17 October 2006 12:39, Brad Hards wrote:
> I think the best way to add ODF support is as a new OutputDev (as Jan
> and Martin pointed out) within the poppler codebase. There are some good
> hooks in PDF (and now in poppler) to extract useful information like
> reading order.
>
> This should be able to be incorporated into okular as "export to ODF",
> although it would require some changes to the Okular::Generator class.
>
> However I'm not sure what the "render to ODF" part is going to look like.
> Do you want to render a single page at a time? Do you want a different kind
> of rendering if the application is KWord, Krita or Karbon? Pass over a
> memory buffer or via a temp file?

The ODF file format can contain text, frames, notes, vector art and pixmaps,
nested and spread over multiple pages.
The ideal would be to export a full document with all the content there is.
Applications like KWord can use that as is, but applications that are only
interested in the vector art can ignore the text during import and take only
the vector part out of the XML tree.
Note that in KOffice 2, Karbon and Krita can also load the whole document,
including text; their only restriction is that multiple pages are stored
differently.
But those apps can either split the pages into different layers or just ignore
the other pages.
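
As a rough illustration of an importer that ignores the text and keeps only
the vector art, here is a minimal Python sketch. It assumes the document's
content.xml is available as a string. The function name
extract_vector_shapes, the VECTOR_TAGS selection and the sample document are
made up for this example and are not part of any poppler, okular or KOffice
API; only the namespace URIs are the ones ODF defines.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as defined by the OpenDocument format.
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
DRAW_NS = "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"

# Assumption: the subset of draw: elements we treat as "vector art".
VECTOR_TAGS = {"path", "rect", "line", "polygon", "polyline",
               "circle", "ellipse"}


def extract_vector_shapes(content_xml):
    """Return the vector shape elements in document order, skipping text."""
    root = ET.fromstring(content_xml)
    prefix = "{" + DRAW_NS + "}"
    shapes = []
    for elem in root.iter():
        # ElementTree spells qualified names as "{namespace-uri}localname".
        if elem.tag.startswith(prefix) and elem.tag[len(prefix):] in VECTOR_TAGS:
            shapes.append(elem)
    return shapes


# Hypothetical, heavily simplified document: two paragraphs and two shapes,
# one of the shapes nested inside a paragraph.
SAMPLE = (
    '<office:document-content'
    ' xmlns:office="%s" xmlns:text="%s" xmlns:draw="%s">'
    '<office:body><office:text>'
    '<text:p>Some paragraph text</text:p>'
    '<draw:rect draw:name="box"/>'
    '<text:p><draw:path draw:name="curve"/></text:p>'
    '</office:text></office:body>'
    '</office:document-content>'
) % (OFFICE_NS, TEXT_NS, DRAW_NS)

names = [s.get("{%s}name" % DRAW_NS) for s in extract_vector_shapes(SAMPLE)]
print(names)  # ['box', 'curve']
```

A real filter would of course also have to carry over styles and handle the
per-page layout, which this sketch leaves out entirely.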

> Martin pointed out some issues with fonts  - do you want to try to make the
> file editable (so we should try to substitute the font based on metrics),

Very much so, yes.

> or do you want to try to make it look like the original (so we should
> render the text and pass over an image of the glyph)?

I can imagine that in some corner cases (missing fonts, or when restoring the
text fails) importing a bitmap might be wanted, so it could be offered as an
option.
I guess so, if it's an easy-to-add feature...

> Note that it is also possible to have a PDF that is just an image of the
> page. Is it important to try to OCR the image?

AFAIK open source OCR is nowhere near living up to its promise, so I don't
see a lot of added value. I'd say it's a "nice to have".

Was the okular team's idea to move any such functionality into poppler (a
shared lib the koffice filter would then depend on), or to put it somewhere
else, so that the filter would have to depend on okular?
I'd prefer the former.

Cheers!
-- 
Thomas Zander

_______________________________________________
koffice-devel mailing list
koffice-devel@kde.org
https://mail.kde.org/mailman/listinfo/koffice-devel