
List:       freedesktop-poppler
Subject:    Re: [poppler] [Poppler] Bug in your text matching routine
From:       Ed Catmur <ed () catmur ! co ! uk>
Date:       2007-08-31 18:33:43
Message-ID: 1188585223.3416.4.camel () capella ! catmur ! co ! uk


On Thu, 2007-08-30 at 19:45 +0200, Albert Astals Cid wrote:
> On Thursday 30 August 2007, James Cloos wrote:
> > >>>>> "Ed" == Ed Catmur <ed@catmur.co.uk> writes:
> > Ed> Question: where do we want to draw the match box when a
> > Ed> search /partially/ matches a compatibility decomposition?
> >
> > Ed> 1. at the end of the compatibility character
> > Ed> 2. exactly halfway through the compatibility character
> > Ed> 3. as far through as the match constitutes of the compatibility
> > Ed> decomposition (e.g. 2/3 through when matching 'ff' of FFI LIGATURE)
> >
> > Ed> 3. seems the most elegant, but could be a little complex to implement
> > Ed> and may not always be the right solution (RTL, zero-width characters,
> > Ed> etc.)
> >
> > I'd vote for getting 1 in for now and only then spending any time on
> > implementing 3.  It may even be the better option overall.
> >
> > As you say, 3 will be quite complex when dealing with the scripts which
> > require shaping engines or syllable-per-glyph scripts like Hangeul, if
> > you allow searching for syllable components.
> >
> > With some of the scripts you would even need disjoint match boxes.
> >
> > Even in cases where the syllable block isn't a single glyph it might be
> > better to highlight the whole thing rather than just the matched pieces.
> 
> I'm with James here, go for 1 and then for 3 if you feel powerful :D

No, you're right; without information on the layout of subglyphs in
compatibility characters, trying to implement 3 is pointless (and perhaps
even with that information).

Here's the patch for 1.

Ed

["highlight-full-glyph.patch" (text/x-patch)]

--- poppler/TextOutputDev.cc	2007/08/30 22:11:59	1.1
+++ poppler/TextOutputDev.cc	2007/08/31 05:34:30
@@ -3068,30 +3068,35 @@ GBool TextPage::findText(Unicode *s, int
 
 	// found it
 	if (k == len) {
+	  // where s2 matches a subsequence of a compatibility equivalence
+	  // decomposition, highlight the entire glyph, since we don't know
+	  // the internal layout of subglyph components
+	  int normStart = line->normalized_idx[j];
+	  int normAfterEnd = line->normalized_idx[j + len - 1] + 1;
 	  switch (line->rot) {
 	  case 0:
-	    xMin1 = line->edge[line->normalized_idx[j]];
-	    xMax1 = line->edge[line->normalized_idx[j + len]];
+	    xMin1 = line->edge[normStart];
+	    xMax1 = line->edge[normAfterEnd];
 	    yMin1 = line->yMin;
 	    yMax1 = line->yMax;
 	    break;
 	  case 1:
 	    xMin1 = line->xMin;
 	    xMax1 = line->xMax;
-	    yMin1 = line->edge[line->normalized_idx[j]];
-	    yMax1 = line->edge[line->normalized_idx[j + len]];
+	    yMin1 = line->edge[normStart];
+	    yMax1 = line->edge[normAfterEnd];
 	    break;
 	  case 2:
-	    xMin1 = line->edge[line->normalized_idx[j + len]];
-	    xMax1 = line->edge[line->normalized_idx[j]];
+	    xMin1 = line->edge[normAfterEnd];
+	    xMax1 = line->edge[normStart];
 	    yMin1 = line->yMin;
 	    yMax1 = line->yMax;
 	    break;
 	  case 3:
 	    xMin1 = line->xMin;
 	    xMax1 = line->xMax;
-	    yMin1 = line->edge[line->normalized_idx[j + len]];
-	    yMax1 = line->edge[line->normalized_idx[j]];
+	    yMin1 = line->edge[normAfterEnd];
+	    yMax1 = line->edge[normStart];
 	    break;
 	  }
 	  if (backward) {
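
For illustration only, the index arithmetic in the patch can be sketched as a
small standalone program. This is not poppler code: LineSketch, Span,
spanBefore, spanAfter and sampleLine are hypothetical names, and the sketch
assumes that normalized_idx maps each character of the normalized line text to
the index of the glyph it came from, and that edge holds one boundary per glyph
plus a final one. The sample line is 'a', FFI LIGATURE, 'x', searched for 'ff'.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the two TextLine fields the patch touches.
struct LineSketch {
  std::vector<int> normalized_idx;  // assumed: normalized char index -> glyph index
  std::vector<double> edge;         // assumed: glyph boundaries, glyph count + 1 entries
};

// Extent of a match along the line's reading direction.
struct Span {
  double lo;
  double hi;
};

// Pre-patch lookup: the far edge comes from the glyph of the character
// that follows the match.
Span spanBefore(const LineSketch &line, std::size_t j, std::size_t len) {
  assert(len > 0 && j + len < line.normalized_idx.size());
  return {line.edge[line.normalized_idx[j]],
          line.edge[line.normalized_idx[j + len]]};
}

// Patched lookup: the far edge is the trailing edge of the glyph that
// contains the last matched character, so a partly matched glyph is
// covered in full.
Span spanAfter(const LineSketch &line, std::size_t j, std::size_t len) {
  assert(len > 0 && j + len <= line.normalized_idx.size());
  int normStart = line.normalized_idx[j];
  int normAfterEnd = line.normalized_idx[j + len - 1] + 1;
  return {line.edge[normStart], line.edge[normAfterEnd]};
}

// Sample line: glyphs 'a', FFI LIGATURE, 'x'; normalized text "affix".
// The three characters f, f, i all map back to glyph 1.
LineSketch sampleLine() {
  return {{0, 1, 1, 1, 2}, {0.0, 10.0, 30.0, 40.0}};
}
```

Under those assumptions, the pre-patch lookup for 'ff' (j = 1, len = 2) lands
on the ligature's leading edge at both ends and yields the empty span 10..10,
while the patched lookup yields 10..30, the full width of the ligature.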



