
List:       fop-dev
Subject:    Re: Fwd: FOP 1.1 - Unable to copy/paste text is not working
From:       Vincent Hennebert <vhennebert () gmail ! com>
Date:       2013-01-31 9:23:28
Message-ID: 510A3810.5070309 () gmail ! com

[Moving over to fop-dev as this is getting technical]

On 30/01/13 15:58, Glenn Adams wrote:
> On Wed, Jan 30, 2013 at 6:44 AM, Neeraj <neerajiiita@gmail.com> wrote:
> 
> > 
> > Yes, my editor can handle the font used.
> > If you highlight the text in the editor and set the font to Arial, do you
> > see any glyph? For the PDF text: no.
> > 
> > As for embedding, I may have added the full embedding mode later, after
> > generating the PDF, but in both cases it gives the same results.
> > 
> > The issue I reported was for a non-Base14 font. You are using Arial, which
> > is a Base14 font, and FOP has full support for these kinds of fonts.
> > 
> > Well, as you said, I tried the same functionality with the Arial font as
> > well and found the same issue in a different form.
> > 
> > Original Arabic text - هذا تعليق الاختبار. تتم كتابة الكلمات بشكل صحيح
> > PDF Arabic text      - ھذا تعلیق الاختبار. تتم كتابة الكلمات بشكل صحیح
> > 
> > If I compare the PDF and MS-Word files, they look exactly the same, but
> > when I copy the text to an editor (with font support), the words look
> > different (glyphs are missing). You can check the text above.
> > 
> > Why am I losing text when doing copy/paste?
> 
> 
> One thing to keep in mind is that some fonts do not include entries in the
> CMAP table for all glyphs that can be referenced by performing the
> character to glyph transformation process. In this case, FOP synthesizes a
> CMAP entry which is used in the embedded font, where this entry uses a
> dynamically generated Unicode value in the PUA (private use area). The
> latter is necessary since PDF requires specifying *some* character code
> (and not a glyph index directly) when performing text drawing.

I may be missing something, but I don't understand this ‘PDF requires
specifying some character code’. AFAIU you can put glyph indices
directly in the PDF string; you just have to specify Identity-H as the
font's encoding and Identity in the CIDToGIDMap. So I'm not sure why it
is necessary to use codes in the private use area.
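
To make the idea concrete, here is a minimal sketch (in Python, not FOP's
actual code) of what "glyph indices directly in the PDF string" could look
like. It assumes a Type 0 font with /Encoding /Identity-H whose descendant
CIDFontType2 has /CIDToGIDMap /Identity, so each 2-byte code in the string
is used as the glyph index. The function names and glyph indices are
hypothetical.

```python
def glyphs_to_pdf_hex_string(glyph_ids):
    """Encode glyph indices as a PDF hex string of 2-byte big-endian codes.

    Assumption: the font uses /Encoding /Identity-H and /CIDToGIDMap /Identity,
    so every 2-byte code is interpreted as a glyph index.
    """
    for gid in glyph_ids:
        if not 0 <= gid <= 0xFFFF:
            raise ValueError(f"glyph index out of range: {gid}")
    return "<" + "".join(f"{gid:04X}" for gid in glyph_ids) + ">"


def show_text_operator(glyph_ids):
    """Return a Tj operator that draws the given glyphs."""
    return f"{glyphs_to_pdf_hex_string(glyph_ids)} Tj"


# Hypothetical glyph indices produced by the shaping step.
print(show_text_operator([0x0003, 0x01A5, 0x0F2C]))
```

No PUA code point appears anywhere in such a content stream; the codes are
glyph indices only.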

Then, to have copy-paste working, you ‘just’ have to provide an
appropriate ToUnicode CMap that maps each shaped glyph back to the
original Unicode code point(s).
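
As a rough illustration, the sketch below (Python, not FOP's implementation)
builds the body of such a ToUnicode CMap from a table of glyph index to
original text. The function name, the input table and the glyph indices are
assumptions made up for the example; the CMap keywords are the usual ones,
and bfchar blocks are kept to at most 100 entries each.

```python
def to_unicode_cmap(glyph_to_text):
    """Build a ToUnicode CMap body from {glyph index: original Unicode text}."""

    def utf16_hex(text):
        # ToUnicode destination values are UTF-16BE, written as hex.
        return text.encode("utf-16-be").hex().upper()

    items = sorted(glyph_to_text.items())
    lines = [
        "/CIDInit /ProcSet findresource begin",
        "12 dict begin",
        "begincmap",
        "/CIDSystemInfo << /Registry (Adobe) /Ordering (UCS) /Supplement 0 >> def",
        "/CMapName /Adobe-Identity-UCS def",
        "/CMapType 2 def",
        "1 begincodespacerange",
        "<0000> <FFFF>",
        "endcodespacerange",
    ]
    # Emit the mappings in blocks of at most 100 entries.
    for start in range(0, len(items), 100):
        chunk = items[start:start + 100]
        lines.append(f"{len(chunk)} beginbfchar")
        lines.extend(f"<{gid:04X}> <{utf16_hex(text)}>" for gid, text in chunk)
        lines.append("endbfchar")
    lines += ["endcmap", "CMapName currentdict /CMap defineresource pop", "end", "end"]
    return "\n".join(lines)


# Hypothetical glyphs: 0x01A5 is a lam-alef ligature, 0x0210 a final form of heh.
cmap = to_unicode_cmap({0x01A5: "\u0644\u0627", 0x0210: "\u0647"})
print(cmap)
```

Note how the ligature glyph maps to two code points: this is what lets a
viewer give back the original characters on copy, whatever shaping did to
them.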


> If you then attempt to copy this text and paste into another editor that
> isn't aware of this dynamic mapping using the embedded font's CMAP, then
> you may lose that mapping information. One possible way to fix this, which
> I haven't investigated in detail, is to provide a separately encoded
> Unicode string that contains the original, pre-transformed text, and
> associate this string with the displayed post-transformed character string
> that may contain these dynamic PUA characters. The PDF viewer would then
> need to make use of the pre-transformed string when performing copy
> operations. However, I haven't researched this to see if PDF supports this.
> 
> Anyway, I suspect this is what is causing your problem. I've opened a bug
> on this at [1].
> 
> [1] https://issues.apache.org/jira/browse/FOP-2204
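
For what it's worth, PDF does have a mechanism of this kind: the /ActualText
entry on a marked-content sequence. The sketch below (Python, hypothetical
function name and glyph code, not something FOP is known to do) wraps a text
showing operator in a /Span carrying the original string, encoded as
UTF-16BE with a leading byte order mark. Whether a given viewer honours it
on copy is a separate question.

```python
def actual_text_span(original_text, show_ops):
    """Wrap text-showing operators in a /Span that carries the original text."""
    # Text strings in UTF-16BE start with the byte order mark FE FF.
    payload = ("\ufeff" + original_text).encode("utf-16-be").hex().upper()
    return f"/Span << /ActualText <{payload}> >> BDC\n{show_ops}\nEMC"


# Hypothetical: glyph code 01A5 draws a lam-alef ligature.
print(actual_text_span("\u0644\u0627", "<01A5> Tj"))
```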

Vincent

