'Re: Retrieving content of hyperlinked slides in powerpoint files(.PPTX) through apache POI'


List:       poi-user
Subject:    Re: Retrieving content of hyperlinked slides in powerpoint files(.PPTX) through apache POI
From:       David Law <david.law () apconsult ! de>
Date:       2014-01-31 9:37:05
Message-ID: 52EB6EC1.8070407 () apconsult ! de

Hi Simanchal,

Sorry, I missed that the for-loop was leaving the 1st entry intact.

In a way, your new forward loop is a bit better,
as it documents that you are always removing
entry[1], but you still have the uncertainty:
"did I remove the last one?" :-)

(This is NOT a fault of your code, rather a shortcoming
  of the Java language, which the for-each construct has
  not addressed. What is missing is an index in
  the syntax of for-each & the possibility to iterate
  backwards through collections.)
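
For plain java.util.List collections, java.util.ListIterator does offer
both an index and backward traversal, though not through the for-each
syntax. A minimal sketch, assuming an ordinary List stands in for the
run list (the class and method names here are hypothetical, not part of POI):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.ListIterator;

class BackwardRemoval {
    // Walks the list from the end towards the front and removes
    // every element except the one at index 0.
    static <T> void keepOnlyFirstBackwards(List<T> items) {
        ListIterator<T> it = items.listIterator(items.size());
        while (it.hasPrevious()) {
            int index = it.previousIndex(); // index of the element previous() will return
            it.previous();
            if (index > 0) {
                it.remove(); // removes the element just returned by previous()
            }
        }
    }

    public static void main(String[] args) {
        List<String> runs = new ArrayList<>(Arrays.asList("r0", "r1", "r2", "r3"));
        keepOnlyFirstBackwards(runs);
        System.out.println(runs);
    }
}
```

This only applies to real List implementations; whether the XmlObject exposes
its runs as such a list is not something this thread confirms.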

Slightly more paranoid is the following:
while (true) {
     try {
         xslfParagraph.getXmlObject().removeR(1);
     }
     catch (IndexOutOfBoundsException e) {
         break;
     }
}
...after this loop you are certain all except entry[0] were removed!
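
A variant that gives the same certainty without using the exception as the
loop exit is to test the size on every pass. The sketch below uses a plain
java.util.List as a stand-in for the paragraph's runs (an assumption, so that
it runs without POI); on the real paragraph, the XMLBeans-generated
sizeOfRArray() should be able to play the role of size(), but treat that as
an assumption to check against your POI version:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class KeepFirstRun {
    // Repeatedly removes the element at index 1 until at most
    // one element (the original entry[0]) is left.
    static <T> void keepOnlyFirst(List<T> runs) {
        while (runs.size() > 1) {
            runs.remove(1);
        }
    }

    public static void main(String[] args) {
        List<String> runs = new ArrayList<>(Arrays.asList("r0", "r1", "r2", "r3"));
        keepOnlyFirst(runs);
        System.out.println(runs); // prints [r0]
    }
}
```

The loop condition itself documents the goal ("stop when only one is left"),
and empty or single-run paragraphs fall through without any exception.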

By the way, I assume you're using Java 7,
so you might like to use the try-with-resources syntax:
try (final FileInputStream  fis = new FileInputStream(inputFile);
     final FileOutputStream fos = new FileOutputStream(outputFile)) {
     final XMLSlideShow ppt = new XMLSlideShow(fis);
     :     :            :   : :   :
}
catch(Exception ex){
     ex.printStackTrace();
}

This guarantees that fis & fos will always be closed automatically.
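
To see that guarantee in action without touching any files, here is a small
self-contained sketch. The Tracked class is hypothetical and merely stands in
for fis & fos; it records when close() is called, both when the body
completes normally and when it throws:

```java
import java.util.ArrayList;
import java.util.List;

class TryWithResourcesDemo {
    // Hypothetical resource that logs its own close() call.
    static class Tracked implements AutoCloseable {
        private final String name;
        private final List<String> log;

        Tracked(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        @Override
        public void close() {
            log.add("close " + name);
        }
    }

    // Runs a try-with-resources block and returns the order of events.
    static List<String> run(boolean fail) {
        List<String> log = new ArrayList<>();
        try (Tracked in = new Tracked("in", log);
             Tracked out = new Tracked("out", log)) {
            log.add("body");
            if (fail) {
                throw new IllegalStateException("boom");
            }
        } catch (IllegalStateException ex) {
            log.add("caught");
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // prints [body, close out, close in]
        System.out.println(run(true));  // prints [body, close out, close in, caught]
    }
}
```

Resources are closed in the reverse order of their declaration, and they are
already closed by the time the catch block runs.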

All the best,
DaveLaw

On 31.01.2014 08:43, simanchal maharana wrote:
> Hi David,
> 
> Thanks a lot for your suggestion.
> Actually, it's for translating PPTX files, so I have to replace each whole
> paragraph with its translation.
> But a paragraph is a combination of several <a:r>, i.e. CTRegularTextRun, and
> each <a:r> is in turn the parent of <a:t>.
> 
> 1. I looked at the XML of each paragraph and never saw more than one <a:t> in
> one <a:r> (CTRegularTextRun), but in code it gives an array of T, so
> I wrote the code that way.
> 
> 2. I have to replace the whole paragraph content with its translation. It's
> not possible to split the translated text per <a:r> or <a:t>. So I
> removed all siblings of <a:r>, i.e. CTRegularTextRun, except the 1st one, and
> I am replacing the <a:t> content of the paragraph's 1st CTRegularTextRun with
> its translation.
> 
> 3. For blank paragraphs the number of CTRegularTextRuns is zero, I guess, so
> setting the content with
> ctRegularTextRun[0].setT("Some Translated Text"); gives an
> ArrayIndexOutOfBoundsException. That is why I checked the length. I
> have modified it:
> String originalText = replaceUnwantedChar(xslfParagraph.getText());
> if ( ! originalText.isEmpty()) {
> 
> I am doing all the operations inside it, so now I don't need to check the
> length of CTRegularTextRun[] for that paragraph.
> Thanks a lot for this suggestion.
> 
> 4. Now I have figured out the difference between
> 
> for(int index = 1; index <= ctRegularTextRun.length-1; index++) and
> for(int index = ctRegularTextRun.length-1; index > 0 ; index--).
> 
> Traversing in the forward direction (1st one) gives an
> IndexOutOfBoundsException, while traversing backwards (2nd one)
> works fine. That is why I was deleting the ctRegularTextRuns backwards,
> leaving the 1st ctRegularTextRun, i.e. the one at index 0.
> 
> But now I can use either.
> 
> CTRegularTextRun[] ctRegularTextRun = xslfParagraph.getXmlObject().getRArray();
> for(int index = ctRegularTextRun.length-1; index > 0 ; index--){
> xslfParagraph.getXmlObject().removeR(index);
> }
> 
> or
> 
> for(int index = 1; index <= ctRegularTextRun.length-1; index++){
> xslfParagraph.getXmlObject().removeR(1);
> }
> 
> Thanks a lot for your suggestion.
> Simanchal
> 
> 
> 
> On Fri, Jan 31, 2014 at 4:17 AM, David Law-2 [via Apache POI]
> <ml-node+s1045710n5714785h97@n5.nabble.com> wrote:
> > Simanchal,
> > 
> > may I ask a couple of stupid questions?
> > 
> > I've removed some dead code & what's left
> > in the heart of all those nested if's & for's is this:
> > 
> > CTRegularTextRun[] ctRegularTextRun =
> > xslfParagraph.getXmlObject().getRArray();
> > 
> > for (int index = ctRegularTextRun.length - 1; index > 0; index--) {
> > xslfParagraph.getXmlObject().removeR(index);
> > }
> > if (ctRegularTextRun.length > 0) {
> > ctRegularTextRun[0].setT("");
> > }
> > 
> > First you get an Array of all CTRegularTextRuns contained in the XmlObject.
> > Then you remove them all from the XmlObject.
> > (they now only exist in the Array you just got)
> > Finally you set the T Element of (only!) the 1st CTRegularTextRun (if
> > present) to "".
> > 
> > Q1) Now I wonder why you need to iterate backwards through the Array?
> > Q2) Setting the T Element will have no effect (because you have just
> > deleted all R's from the XmlObject)?!
> > 
> > All the best,
> > DaveLaw
> > 
> > 
> > On 30.01.2014 04:44, simanchal maharana wrote:
> > 
> > > Hi Andreas,
> > > 
> > > Please find attached the PPTX file for your review.
> > > 
> > > Thanks,
> > > Simanchal
> > > 
> > > On Thu, Jan 30, 2014 at 2:44 AM, Andreas Beeker [via Apache POI]
> > > <[hidden email]> wrote:
> > > > Hi,
> > > > 
> > > > is there a chance to get your .pptx-files?
> > > > 
> > > > - link it to your stackoverflow post [1]
> > > > - or open a bugzilla entry [2]
> > > > - or send it to my email address (least preferred ...)
> > > > 
> > > > Andi.
> > > > 
> > > > 
> > > > [1] http://stackoverflow.com/questions/21386211/retrieving-content-of-hyperlinked-slides-in-powerpoint-files-pptx-through-apac
> > > > [2] http://issues.apache.org/bugzilla/buglist.cgi?product=POI
> > > > 
> > > > On 29.01.2014 07:21, simanchal maharana wrote:
> > > > 
> > > > > I am trying to get the text content of PowerPoint files and replace it
> > > > > with some other text. I have a PowerPoint file of 20 slides, where
> > > > > slides 13, 14, 15 and 16 have hyperlinks to slides 17, 18, 19 and 20.
> > > > > I am using XMLSlideShow to traverse the slides, but it gives only
> > > > > 16 slides. It does not give the last 4 hyperlinked slides.
> > > > > 
> > > > > Any idea how I can get the content of all the hyperlinked slides and
> > > > > replace it with some other text would be much appreciated.
> > > > > 
> > > > > Here is my code.
> > > > > 
> > > > > import java.io.FileInputStream;
> > > > > import java.io.FileOutputStream;
> > > > > import org.apache.poi.xslf.usermodel.XMLSlideShow;
> > > > > import org.apache.poi.xslf.usermodel.XSLFShape;
> > > > > import org.apache.poi.xslf.usermodel.XSLFSlide;
> > > > > import org.apache.poi.xslf.usermodel.XSLFTextParagraph;
> > > > > import org.apache.poi.xslf.usermodel.XSLFTextShape;
> > > > > import org.openxmlformats.schemas.drawingml.x2006.main.CTRegularTextRun;
> > > > > public class Testing {
> > > > >     static String inputFile =
> > > > >             "C:\\Users\\SM78882\\Desktop\\Testing\\IE_Basics_English.pptx";
> > > > >     static String outputFile =
> > > > >             "C:\\Users\\SM78882\\Desktop\\Testing\\result.pptx";
> > > > > 
> > > > >     public static String replaceUnwantedChar(String originalString) {
> > > > >         if (null != originalString)
> > > > >             return "" + originalString.replaceAll("(\n+)|(\t+)|(\\s{2,})", " ")
> > > > >                     .trim();
> > > > >         else
> > > > >             return "";
> > > > >     }
> > > > >     public static void main(String[] args) {
> > > > >         FileInputStream fis = null;
> > > > >         FileOutputStream fos = null;
> > > > >         XMLSlideShow ppt = null;
> > > > >         try {
> > > > >             fis = new FileInputStream(inputFile);
> > > > >             fos = new FileOutputStream(outputFile);
> > > > >             ppt = new XMLSlideShow(fis);
> > > > >             System.out.println("No of slides:" + ppt.getSlides().length); // gives 16 slides.
> > > > >             for (XSLFSlide slide : ppt.getSlides()) {
> > > > >                 for (XSLFShape shape : slide) {
> > > > >                     if (shape instanceof XSLFTextShape) {
> > > > >                         XSLFTextShape txShape = (XSLFTextShape) shape;
> > > > >                         for (XSLFTextParagraph xslfParagraph : txShape.getTextParagraphs()) {
> > > > >                             String originalText = replaceUnwantedChar(xslfParagraph.getText());
> > > > >                             if (!originalText.isEmpty()) {
> > > > >                                 String translation = "";
> > > > >                                 if (translation != null) {
> > > > >                                     CTRegularTextRun[] ctRegularTextRun = xslfParagraph
> > > > >                                             .getXmlObject().getRArray();
> > > > >                                     for (int index = ctRegularTextRun.length - 1; index > 0; index--) {
> > > > >                                         xslfParagraph.getXmlObject().removeR(index);
> > > > >                                     }
> > > > >                                     if (ctRegularTextRun.length > 0)
> > > > >                                         ctRegularTextRun[0].setT(translation);
> > > > >                                 }
> > > > >                             }
> > > > >                         }
> > > > >                     }
> > > > >                 }
> > > > >             }
> > > > >             ppt.write(fos);
> > > > >             fos.close();
> > > > >             fis.close();
> > > > >         } catch (Exception ex) {
> > > > >             ex.printStackTrace();
> > > > >         }
> > > > >     }
> > > > > }
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > 
> > > > > --
> > > > > View this message in context:
> > > > > 
> > > > > http://apache-poi.1045710.n5.nabble.com/Retrieving-content-of-hyperlinked-slides-in-powerpoint-files-PPTX-through-apache-POI-tp5714766.html
> > > > >  Sent from the POI - User mailing list archive at Nabble.com.
> > > > > 
> > > > > ---------------------------------------------------------------------
> > > > > To unsubscribe, e-mail: [hidden email]
> > > > > For additional commands, e-mail: [hidden email]
> > > > > 
> > > > > 
> > > Final_2.7z (16M)
> > > <http://apache-poi.1045710.n5.nabble.com/attachment/5714773/0/Final_2.7z>
> > > IE_Basics_English.pptx (4M)
> > > <http://apache-poi.1045710.n5.nabble.com/attachment/5714773/1/IE_Basics_English.pptx>
> > >  PPTXParser_Code.java (4K)
> > > <http://apache-poi.1045710.n5.nabble.com/attachment/5714773/2/PPTXParser_Code.java>
> > >  
> > > 
> > > 
> > > 
> > 
> 
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org

