List:       kde-bugs-dist
Subject:    [Bug 161324] recognise columns in the text of a page
From:       Robert Knight <robertknight () gmail ! com>
Date:       2010-09-17 9:21:56
Message-ID: 20100917092156.6B0DB67611 () immanuel ! kde ! org

https://bugs.kde.org/show_bug.cgi?id=161324

--- Comment #37 from Robert Knight <robertknight gmail com>  2010-09-17 11:21:53 ---
> Does poppler guess the text layout using some generic heuristic algorithm, or
> use some explicit information on text ordering embedded in the PDF format?

PDFs do not contain layout information about how text is structured into
paragraphs and columns.  As I understand it, what PDF provides is essentially a
list of commands that say "draw string S at position P with font F".
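To illustrate why layout has to be guessed, here is a minimal Python sketch
that models a page as a flat list of such draw commands and applies the most
naive ordering rule possible. The names DrawOp and naive_reading_order are
hypothetical and invented for this example; this is not Poppler's algorithm.

```python
from dataclasses import dataclass


@dataclass
class DrawOp:
    """Hypothetical model of one command: draw `text` at (x, y) with `font`."""
    text: str
    x: float
    y: float  # PDF user space: y grows upward, so larger y is higher on the page
    font: str


def naive_reading_order(ops, line_tolerance=2.0):
    """Group ops into lines by similar y, then read each line left to right."""
    lines = []
    for op in sorted(ops, key=lambda o: -o.y):  # top of page first
        if lines and abs(lines[-1][0].y - op.y) <= line_tolerance:
            lines[-1].append(op)  # same baseline: same line
        else:
            lines.append([op])
    return [
        " ".join(o.text for o in sorted(line, key=lambda o: o.x))
        for line in lines
    ]


# A two-column page: nothing in the commands says which strings form a column.
ops = [
    DrawOp("left column, line 1", 72, 700, "F1"),
    DrawOp("right column, line 1", 320, 700, "F1"),
    DrawOp("left column, line 2", 72, 686, "F1"),
    DrawOp("right column, line 2", 320, 686, "F1"),
]

for line in naive_reading_order(ops):
    print(line)
```

The naive rule reads straight across both columns on each line, which is
exactly the behaviour this bug asks to avoid. Recognising columns needs extra
heuristics on top of the raw positions, such as looking for vertical gaps.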

I haven't looked into recent versions of Poppler, but older versions used some
fairly complex heuristic algorithms to piece together the layout from the
input.  These algorithms had some interesting flaws.  If I remember correctly,
numerical instability meant that the order of paragraphs in the output text
could differ significantly depending on the processor on which you ran the
code.
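As a rough illustration of how floating-point precision alone can change an
ordering, the Python sketch below sorts two paragraphs by a computed vertical
position, once in double precision and once rounded to single precision. It is
not taken from Poppler; the helper names and the coordinate values are
assumptions chosen only to show the general mechanism.

```python
import struct


def to_f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def paragraph_order(rnd):
    """Sort two paragraphs top to bottom, rounding every step with `rnd`."""
    y_a = rnd(rnd(0.1) + rnd(0.2))  # position reached by adding two offsets
    y_b = rnd(0.3)                  # nominally the same position, stored directly
    paragraphs = [("B", y_b), ("A", y_a)]
    # sorted() is stable, so paragraphs with equal keys keep their input order.
    return [name for name, _ in sorted(paragraphs, key=lambda p: -p[1])]


print(paragraph_order(float))   # double precision
print(paragraph_order(to_f32))  # single precision
```

In double precision the sum 0.1 + 0.2 comes out slightly above 0.3, so A sorts
first. In single precision the two values round to the same number, the
comparison becomes a tie, and B stays first. Any heuristic that branches on
such comparisons can produce a different paragraph order when intermediate
results are kept at a different precision.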
