List: freedesktop-poppler
Subject: Re: [poppler] [Poppler] Bug in your text matching routine
From: Ed Catmur <ed () catmur ! co ! uk>
Date: 2007-08-31 18:33:43
Message-ID: 1188585223.3416.4.camel () capella ! catmur ! co ! uk
On Thu, 2007-08-30 at 19:45 +0200, Albert Astals Cid wrote:
> On Thursday 30 August 2007, James Cloos wrote:
> > >>>>> "Ed" == Ed Catmur <ed@catmur.co.uk> writes:
> > Ed> Question: where do we want to draw the match box when a
> > Ed> search /partially/ matches a compatibility decomposition?
> >
> > Ed> 1. at the end of the compatibility character
> > Ed> 2. exactly halfway through the compatibility character
> > Ed> 3. as far through as the match constitutes of the compatibility
> > Ed> decomposition (e.g. 2/3 through when matching 'ff' of FFI LIGATURE)
> >
> > Ed> 3. seems the most elegant, but could be a little complex to implement
> > Ed> and may not always be the right solution (RTL, zero-width characters,
> > Ed> etc.)
> >
> > I'd vote for getting 1 in for now and only then spending any time on
> > implementing 3. It may even be the better option overall.
> >
> > As you say, 3 will be quite complex when dealing with the scripts which
> > require shaping engines or syllable-per-glyph scripts like Hangeul, if
> > you allow searching for syllable components.
> >
> > With some of the scripts you would even need disjoint match boxes.
> >
> > Even in cases where the syllable block isn't a single glyph it might be
> > better to highlight the whole thing rather than just the matched pieces.
>
> I'm with James here, go for 1 and then for 3 if you feel powerful :D
No, you're right; without information on the layout of subglyphs in
compatibility characters, trying to implement 3 is pointless (and perhaps
even with that information).
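
To make the difference between options 1 and 3 concrete, here is a minimal sketch of the two edge calculations for a single glyph. Both helpers are hypothetical and not part of poppler. The option 3 version assumes left-to-right text and equal-width decomposition components, which is exactly the layout information the thread says is not available.

```cpp
// Hypothetical helpers for illustration only; neither exists in poppler.

// Option 3: place the end of the match box part-way through the glyph,
// in proportion to how many decomposition components were matched.
// ASSUMPTION: left-to-right layout and equal-width components.
double proportionalEdge(double left, double right, int matched, int total) {
  return left + (right - left) * matched / total;
}

// Option 1: ignore how much of the decomposition matched and always
// use the far edge of the compatibility character.
double fullGlyphEdge(double /*left*/, double right) {
  return right;
}
```

For an FFI LIGATURE spanning x = 10 to x = 16, matching 'ff' gives an option 3 edge two thirds of the way through the glyph, while option 1 gives the glyph's far edge at 16.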
Here's the patch for 1.
Ed
["highlight-full-glyph.patch" (text/x-patch)]
--- poppler/TextOutputDev.cc 2007/08/30 22:11:59 1.1
+++ poppler/TextOutputDev.cc 2007/08/31 05:34:30
@@ -3068,30 +3068,35 @@ GBool TextPage::findText(Unicode *s, int
       // found it
       if (k == len) {
+        // where s2 matches a subsequence of a compatibility equivalence
+        // decomposition, highlight the entire glyph, since we don't know
+        // the internal layout of subglyph components
+        int normStart = line->normalized_idx[j];
+        int normAfterEnd = line->normalized_idx[j + len - 1] + 1;
         switch (line->rot) {
         case 0:
-          xMin1 = line->edge[line->normalized_idx[j]];
-          xMax1 = line->edge[line->normalized_idx[j + len]];
+          xMin1 = line->edge[normStart];
+          xMax1 = line->edge[normAfterEnd];
           yMin1 = line->yMin;
           yMax1 = line->yMax;
           break;
         case 1:
           xMin1 = line->xMin;
           xMax1 = line->xMax;
-          yMin1 = line->edge[line->normalized_idx[j]];
-          yMax1 = line->edge[line->normalized_idx[j + len]];
+          yMin1 = line->edge[normStart];
+          yMax1 = line->edge[normAfterEnd];
           break;
         case 2:
-          xMin1 = line->edge[line->normalized_idx[j + len]];
-          xMax1 = line->edge[line->normalized_idx[j]];
+          xMin1 = line->edge[normAfterEnd];
+          xMax1 = line->edge[normStart];
           yMin1 = line->yMin;
           yMax1 = line->yMax;
           break;
         case 3:
           xMin1 = line->xMin;
           xMax1 = line->xMax;
-          yMin1 = line->edge[line->normalized_idx[j + len]];
-          yMax1 = line->edge[line->normalized_idx[j]];
+          yMin1 = line->edge[normAfterEnd];
+          yMax1 = line->edge[normStart];
           break;
         }
         if (backward) {
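
The index arithmetic in the patch can be illustrated with a small standalone sketch. The `Line` struct below is a simplified stand-in, not poppler's `TextLine`. It assumes that `normalized_idx` maps each character of the normalized text back to the index of the original character it came from, and that `edge[i]` and `edge[i + 1]` bound original character `i`. Only the `rot == 0` case is shown, and the sample coordinates are made up.

```cpp
#include <utility>
#include <vector>

// Simplified stand-in for the per-line data the patch reads (assumption:
// this mirrors only the two arrays used in the diff, nothing else).
struct Line {
  std::vector<double> edge;         // edge[i]..edge[i+1] bounds original char i
  std::vector<int> normalized_idx;  // normalized char index -> original char index
};

// Pre-patch calculation (the removed lines): the end edge is the start of
// the original character that the first normalized char AFTER the match
// maps to. If the match ends inside a decomposition, that is the same glyph.
std::pair<double, double> oldBox(const Line &l, int j, int len) {
  return {l.edge[l.normalized_idx[j]], l.edge[l.normalized_idx[j + len]]};
}

// Patched calculation (the added lines): take the original character that
// the LAST matched normalized char maps to, and step one past it, so the
// whole glyph is covered.
std::pair<double, double> newBox(const Line &l, int j, int len) {
  int normStart = l.normalized_idx[j];
  int normAfterEnd = l.normalized_idx[j + len - 1] + 1;
  return {l.edge[normStart], l.edge[normAfterEnd]};
}

// Made-up sample line: original chars 'o', FFI LIGATURE, 'c', 'e';
// normalized text is "office", so 'f', 'f', 'i' all map to original char 1.
Line sampleLine() {
  return Line{{0.0, 5.0, 14.0, 19.0, 24.0}, {0, 1, 1, 1, 2, 3}};
}
```

With this sample, searching for "ff" (`j = 1`, `len = 2`) yields a zero-width box from 5 to 5 under the old calculation, and a box from 5 to 14 covering the whole ligature under the patched one.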
_______________________________________________
poppler mailing list
poppler@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/poppler