From kde-bugs-dist Fri Sep 17 09:21:56 2010
From: Robert Knight
Date: Fri, 17 Sep 2010 09:21:56 +0000
To: kde-bugs-dist
Subject: [Bug 161324] recognise columns in the text of a page
Message-Id: <20100917092156.6B0DB67611 () immanuel ! kde ! org>
X-MARC-Message: https://marc.info/?l=kde-bugs-dist&m=128471631126660

https://bugs.kde.org/show_bug.cgi?id=161324

--- Comment #37 from Robert Knight 2010-09-17 11:21:53 ---
> Does poppler guess the text layout using some generic heuristic algorithm, or
> use some explicit information on text ordering embedded in the PDF format?

PDFs do not contain layout information about how text is structured into
paragraphs and columns. As I understand it, what PDF provides is essentially
a list of commands that say "draw string S at position P with font F".

I haven't looked into recent versions of Poppler but older versions had some
fairly complex heuristic algorithms to try to piece together the layout given
the input.

These algorithms had some interesting flaws. If I remember correctly, due to
numerical instability the order of paragraphs in the output text could differ
significantly depending on the processor on which you ran the code.

--
Configure bugmail: https://bugs.kde.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching all bug changes.
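
To make the "draw string S at position P with font F" model in the comment above concrete, here is a minimal sketch of a layout heuristic over such commands. It is not Poppler's algorithm. The names `TextRun`, `split_columns` and `reading_order`, the `gap` threshold, and the assumption that each command carries one whole line of text are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class TextRun:
    """One hypothetical "draw string at position with font" command."""
    text: str
    x: float  # left edge, in points
    y: float  # baseline, measured downward from the top of the page
    font: str = "Times-Roman"  # carried along but unused by this sketch


def split_columns(runs, gap=50.0):
    """Start a new column wherever the sorted left edges jump by more than `gap`."""
    columns = []
    last_x = None
    for run in sorted(runs, key=lambda r: r.x):
        if last_x is None or run.x - last_x > gap:
            columns.append([])
        columns[-1].append(run)
        last_x = run.x
    return columns


def reading_order(runs, gap=50.0):
    """Return the strings column by column, top to bottom within each column."""
    ordered = []
    for column in split_columns(runs, gap):
        ordered.extend(r.text for r in sorted(column, key=lambda r: (r.y, r.x)))
    return ordered


# A two-column page whose draw commands arrive interleaved, row by row.
page = [
    TextRun("Left column, line 1", 72, 100),
    TextRun("Right column, line 1", 320, 100),
    TextRun("Left column, line 2", 72, 114),
    TextRun("Right column, line 2", 320, 114),
]
print(reading_order(page))
```

The sketch sorts on raw floating-point coordinates. If those coordinates come out of a chain of matrix transforms, two runs meant to share a position can differ in the last few bits, and the sort order can then depend on how the arithmetic was rounded. That is one plausible route to the processor-dependent ordering mentioned above, though whether it was the actual cause in Poppler is not something this sketch can confirm.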