List:       freedesktop-poppler
Subject:    Re: [poppler] poppler-dump
From:       Marco <ctxspi () gmail ! com>
Date:       2014-03-13 12:07:09
Message-ID: CAAVAo4Ox8MK0feMPRmZ97oR3af4x4sLQTdOhHKwn3PPQaWY=Lg () mail ! gmail ! com
I found the logic in TextOutputDev.cc, inside the function 'void
TextPage::dump(void *outputStream, TextOutputFunc outputFunc, GBool
physLayout) { ... }', that gives me the layout I want. The code is:

...
  } else {
    for (flow = flows; flow; flow = flow->next) {
      for (blk = flow->blocks; blk; blk = blk->next) {
        for (line = blk->lines; line; line = line->next) {
          n = line->len;
          if (line->hyphenated && (line->next || blk->next)) {
            --n;
          }
          s = new GooString();
          dumpFragment(line->text, n, uMap, s);
          (*outputFunc)(outputStream, s->getCString(), s->getLength());
          delete s;
          // output a newline when a hyphen is not suppressed
          if (n == line->len) {
            (*outputFunc)(outputStream, eol, eolLen);
          }
        }
      }
      (*outputFunc)(outputStream, eol, eolLen);
    }
  }
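
To make clear which behaviour I mean, here is a minimal standalone sketch of the same loop. It does not use poppler at all: Line, Block, Flow and dump_raw_order are hypothetical stand-ins I made up for the internal TextLine/TextBlock/TextFlow lists, and the "n > 0" guard is my own addition. A trailing hyphen is dropped, and the newline skipped, only when another line or block follows in the same flow.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for poppler's internal line/block/flow lists.
struct Line {
    std::string text;
    bool hyphenated;  // true when the line ends in a soft line-break hyphen
};
using Block = std::vector<Line>;
using Flow = std::vector<Block>;

// Mirrors the non-physLayout branch quoted above: join hyphenated lines,
// end every other line with eol, and add one extra eol after each flow.
std::string dump_raw_order(const std::vector<Flow>& flows,
                           const std::string& eol = "\n") {
    std::string out;
    for (const Flow& flow : flows) {
        for (std::size_t b = 0; b < flow.size(); ++b) {
            const Block& blk = flow[b];
            for (std::size_t l = 0; l < blk.size(); ++l) {
                const Line& line = blk[l];
                std::size_t n = line.text.size();
                // Same test as (line->next || blk->next) in the original.
                const bool has_next =
                    (l + 1 < blk.size()) || (b + 1 < flow.size());
                if (line.hyphenated && has_next && n > 0) {
                    --n;  // drop the trailing hyphen
                }
                out.append(line.text, 0, n);
                // Output a newline only when the hyphen was not suppressed.
                if (n == line.text.size()) {
                    out += eol;
                }
            }
        }
        out += eol;  // blank line between flows
    }
    return out;
}
```

With one flow holding the lines "exam-" (hyphenated) and "ple", this gives "example" followed by two newlines.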

Do you know whether the same method can be brought into poppler-dump.cc?

Could you please explain how to do it?

As you can see from my first mail, the problem of printing to a string is solved.
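
For reference, a minimal sketch of the kind of conversion I mean, assuming byte_array is a std::vector<char> as in cpp/poppler-global.h. The alias below only copies that shape so the snippet builds without poppler, and to_std_string is a helper name I invented, not part of the library.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Same shape as poppler::byte_array; redeclared here (assumption) so the
// example compiles without the poppler-cpp headers.
using byte_array = std::vector<char>;

// Hypothetical helper: copy the UTF-8 bytes into a plain std::string.
// The bytes are copied as-is; no re-encoding takes place.
std::string to_std_string(const byte_array& bytes) {
    return std::string(bytes.begin(), bytes.end());
}
```

With poppler-cpp itself, the input would presumably be the result of ustring::to_utf8(), for example the text of a page, and .c_str() on the returned string gives the char pointer.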


2014-03-13 12:30 GMT+01:00 Marco <ctxspi@gmail.com>:

> Hi Brad
>
> I think the main problem is that the poppler-cpp library cannot output a PDF
> file's text the same way pdftotext does (the command without any layout option).
>
>
> 2014-03-13 10:20 GMT+01:00 Brad Hards <bradh@frogmouth.net>:
>
> On Thu, 13 Mar 2014 10:11:36 AM Marco wrote:
>> > I have tried it several times, but I need the output as a string or a
>> > pointer to chars, not as ustring data.
>> >
>> > I need UTF-8 text, but not in the ustring format.
>> From cpp/poppler-global.h header:
>>
>> class POPPLER_CPP_EXPORT ustring : public std::basic_string<unsigned
>> short>
>> {
>> public:
>>     ustring();
>>     ustring(size_type len, value_type ch);
>>     ~ustring();
>>
>>     byte_array to_utf8() const;
>>     std::string to_latin1() const;
>>
>>     static ustring from_utf8(const char *str, int len = -1);
>>     static ustring from_latin1(const std::string &str);
>> ...
>> }
>>
>>
>
>
> --
> Better to cultivate GNU/Linux... Windows crashes all by itself anyway!!
>



-- 
Better to cultivate GNU/Linux... Windows crashes all by itself anyway!!


_______________________________________________
poppler mailing list
poppler@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/poppler
