
List:       mtos-dev
Subject:    Re: [MTOS-dev] Developing action streams
From:       Mark Paschal <mark () sixapart ! com>
Date:       2008-02-26 23:05:49
Message-ID: 47C49B4D.9040704 () sixapart ! com

Hi, John:

John Eckman wrote:
> First, is there a better place to ask questions about the Action  
> Streams plugin?

Not really, though the mt-dev Yahoo Group may be a better fit, since 
Action Streams isn't part of MTOS.

I agree that asking in the comment threads doesn't scale. Hopefully the 
next reorganization of code.sixapart.com will include forums for 
supporting projects like this one.


> Here's the relevant info from the config.yaml I'm trying to use:

> action_streams:
>      amazon:
>          purchased:	
>              name: Purchased
>              description: Items added to amazon wishlist
>              html_form: 'New music purchase: <a href="[_2]">[_3]</a>'
>              html_params:
>                  - url
>                  - title
>              url: 'http://www.amazon.com/gp/registry/wishlist/{{ident}}'
>              identifier: url
>              scraper:
>                  foreach: //tbody[@name]
>                  get:
>                      thumbnail:
>                          - //tr/td[@rowspan="4"]/a/img
>                          - @src
>                      title:
>                          - //tr/td[@class="small"]/b/a
>                          - TEXT
>                      url:
>                          - //tr/td[@class="small"]/b/a
>                          - @href

> - Any obvious reasons why this doesn't work? Can't seem to match the  
> img, which is inside a td with a rowspan of 4.

Is it? I see this HTML in your suggested example wishlist:

   <table width="100%">
   <tbody name="item.0.I2IRDH1RVW6KQ1.B000001Y33">
     <tr valign="top">
       <td align="center" width="65" valign="top">
         <a href="http://www.amazon.com/gp/product/B000001Y33/...">
           <img src="http://ecx.images-amazon.com/images/I/
                     11-gcd7NVjL.jpg"
                width="110" alt="" height="110" border="0" />
         </a>
       </td>

It doesn't seem to have a rowspan="4". I do see the rowspan="4" if I 
view my own wishlist while I'm logged in, but as Action Streams won't 
be logged in to Amazon when it fetches the wishlist page, I think it 
should see the HTML I'm seeing, with no rowspan.

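Dropping the [@rowspan="4"] predicate should let your original selector 
match that markup. Alternatively, assuming the thumbnail is always the 
first image in the tbody section, this XPath selector should work: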

     thumbnail:
         - //img[1 = position()]
         - @src
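
If you want to sanity-check selectors like these outside Action 
Streams, here is a minimal sketch. It is not Web::Scraper: it uses 
Python's xml.etree.ElementTree, which supports only a small XPath 
subset, so treat it as a rough stand-in. The names SAMPLE, thumbnails, 
and rowspan_matches are made up for illustration, the markup is the 
excerpt above with closing tags added so it parses as XML, and the 
image URL is joined onto one line on the assumption that the line 
break was only mail wrapping.

```python
import xml.etree.ElementTree as ET

# Well-formed version of the logged-out wishlist excerpt quoted above.
# The href is truncated in the original message and is left that way.
# Assumption: the img src was wrapped by the mail client, so it is joined here.
SAMPLE = """
<table width="100%">
  <tbody name="item.0.I2IRDH1RVW6KQ1.B000001Y33">
    <tr valign="top">
      <td align="center" width="65" valign="top">
        <a href="http://www.amazon.com/gp/product/B000001Y33/...">
          <img src="http://ecx.images-amazon.com/images/I/11-gcd7NVjL.jpg"
               width="110" alt="" height="110" border="0" />
        </a>
      </td>
    </tr>
  </tbody>
</table>
"""


def thumbnails(markup):
    """Return the src of the first <img> found under each named <tbody>."""
    root = ET.fromstring(markup)
    found = []
    # Rough equivalent of `foreach: //tbody[@name]`.
    for item in root.iterfind(".//tbody[@name]"):
        # Rough equivalent of `//img[1 = position()]` plus `@src`,
        # evaluated relative to the current tbody.
        img = item.find(".//img[1]")
        found.append(img.get("src") if img is not None else None)
    return found


def rowspan_matches(markup):
    """Count nodes matched by the original rowspan-based selector."""
    root = ET.fromstring(markup)
    return len(root.findall(".//tr/td[@rowspan='4']/a/img"))


if __name__ == "__main__":
    print(thumbnails(SAMPLE))
    print(rowspan_matches(SAMPLE))
```

Run against this excerpt, the position-based selector finds the one 
image and the rowspan-based selector finds nothing, which is consistent 
with the empty thumbnail you were getting.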


> - Is there a log anywhere of what the scraper is doing? I wanted to  
> try different XPath queries but felt like was working blind- ended up  
> with blank or with various Scalar() values - which I assume means the  
> xpath expression matched too many things.

Web::Scraper comes with a "scraper" command-line tool for debugging 
expressions. Debugging the wishlist example looked like this:

http://pastie.caboo.se/157833

The SCALAR(0x...) values are due to how Web::Scraper treats links. For 
tag attributes it knows are URIs, Web::Scraper returns URI objects 
instead of strings. Action Streams then has to turn those back into 
strings, but it only does so when selecting an event's "url" field, 
not the thumbnail, so the thumbnail's URI object is never converted.

In Action Streams 1.0, you'd have to make a Perl event instead of a 
Web::Scraper one to fix that. This patch will allow these selectors to 
work for your personal use, though:

http://pastie.caboo.se/157853

I hope this helps, and let me know if you have more questions.


Mark Paschal
Software developer, Movable Type
mark@sixapart.com

_______________________________________________
MTOS-dev mailing list
MTOS-dev@sixapart.com
http://www.sixapart.com/mailman/listinfo/mtos-dev