
List:       freedesktop-poppler
Subject:    Re: [poppler] poppler-dump
From:       Marco <ctxspi () gmail ! com>
Date:       2014-03-13 9:11:36
Message-ID: CAAVAo4OXzGriUUjP+VdwyjYOCOJAG_nNEH8rYvpvEGuH0zn4zA () mail ! gmail ! com


On 12 Mar 2014 at 20:36, "Albert Astals Cid" <aacid@kde.org> wrote:
>
> >
> > On Wednesday, 12 March 2014, at 20:25:45, Marco wrote:
> > > Hi Albert
> > >
> > > The command 'pdftotext -layout filename.pdf -' gives the same output as
> > > physical_layout in my small program. But if I have a PDF file with text
> > > inside tables (sorry for my poor description) and I run
> > > 'pdftotext filename.pdf -', it gives a result that I cannot reproduce
> > > with either 'raw_order_layout' or 'physical_layout' in my program.
> >
> > I'd say it is the other way around: poppler-dump can't give you what
> > -layout does.
> >
> > Compare the code of poppler-page.cpp and pdftotext; it's pretty
> > straightforward.
> >
> > Cheers,
> >   Albert
> >
> > >
> > > 2014-03-12 19:49 GMT+01:00 Marco <ctxspi@gmail.com>:
> > > > Hi all
> > > >
> > > > I am a new user of poppler and I have a short question.
> > > >
> > > > In my small program I use these lines:
> > > >
> > > > for (int i = 0; i < pages; ++i) {
> > > >     cout << "Page " << (i + 1) << "/" << pages << ":" << endl;
> > > >     // unique_ptr replaces the deprecated auto_ptr (C++11 and later)
> > > >     unique_ptr<poppler::page> p(doc->create_page(i));
> > > >     if (!p) continue; // create_page() returns a null pointer on failure
> > > >     poppler::byte_array text_ba =
> > > >         p->text(p->page_rect(), poppler::page::raw_order_layout).to_utf8();
> > > >     // Build the string from the byte range. No NUL terminator is pushed:
> > > >     // it would become part of the string instead of ending it.
> > > >     string text(text_ba.begin(), text_ba.end());
> > > >     cout << text << endl;
> > > > }
> > > >
> > > > to print the text of a PDF file, but with either 'raw_order_layout' or
> > > > 'physical_layout' the output differs from that of the command
> > > > 'pdftotext filename.pdf -'.
> > > >
> > > > How can I get the text (stored in a char pointer) the way the command
> > > > 'pdftotext filename.pdf -' prints it?
> > > >
> > > > Thanks
> >
>
Albert, I am sorry for the inconvenience.

I have tried several times, but I need the output not as ustring data
but as a string or a pointer to chars.

I need the text in UTF-8, but not in the ustring format.
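
For reference, a minimal sketch of the conversion being asked about, under a
few assumptions: poppler::byte_array is taken to be a typedef for
std::vector<char>, so the helper below accepts a plain std::vector<char> and
compiles without poppler. The names bytes_to_string and print_page_text are
hypothetical and are not part of poppler.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper: copy raw UTF-8 bytes (for example the result of
// ustring::to_utf8()) into a std::string.
std::string bytes_to_string(const std::vector<char> &bytes)
{
    // The range constructor copies exactly bytes.size() chars. std::string
    // manages its own terminator, so none is appended by hand.
    std::string text(bytes.begin(), bytes.end());
    assert(text.size() == bytes.size());
    return text;
}

// Hypothetical usage: pass the text to a C-style API as a const char *.
void print_page_text(const std::vector<char> &utf8_bytes)
{
    const std::string text = bytes_to_string(utf8_bytes);
    // c_str() is NUL-terminated and stays valid while `text` is alive.
    std::puts(text.c_str());
}
```

With poppler, the argument would be something like
p->text(p->page_rect(), layout).to_utf8(); the bytes stay UTF-8 encoded, only
the container changes.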




_______________________________________________
poppler mailing list
poppler@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/poppler

