List: mtos-dev
Subject: Re: [MTOS-dev] Developing action streams
From: Mark Paschal <mark () sixapart ! com>
Date: 2008-02-26 23:05:49
Message-ID: 47C49B4D.9040704 () sixapart ! com
Hi, John:
John Eckman wrote:
> First, is there a better place to ask questions about the Action
> Streams plugin?
Not really. The mt-dev Yahoo Group may be better, as Action Streams
isn't part of MTOS.
I agree that asking in the comment threads doesn't scale. Hopefully the
next time we reorganize code.sixapart.com we'll have forums for
supporting projects like this one.
> Here's the relevant info from the config.yaml I'm trying to use:
> action_streams:
>     amazon:
>         purchased:
>             name: Purchased
>             description: Items added to amazon wishlist
>             html_form: 'New music purchase: <a href="[_2]">[_3]</a>'
>             html_params:
>                 - url
>                 - title
>             url: 'http://www.amazon.com/gp/registry/wishlist/{{ident}}'
>             identifier: url
>             scraper:
>                 foreach: //tbody[@name]
>                 get:
>                     thumbnail:
>                         - //tr/td[@rowspan="4"]/a/img
>                         - @src
>                     title:
>                         - //tr/td[@class="small"]/b/a
>                         - TEXT
>                     url:
>                         - //tr/td[@class="small"]/b/a
>                         - @href
> - Any obvious reasons why this doesn't work? Can't seem to match the
> img, which is inside a td with a rowspan of 4.
Is it? I see this HTML in your suggested example wishlist:
    <table width="100%">
      <tbody name="item.0.I2IRDH1RVW6KQ1.B000001Y33">
        <tr valign="top">
          <td align="center" width="65" valign="top">
            <a href="http://www.amazon.com/gp/product/B000001Y33/...">
              <img src="http://ecx.images-amazon.com/images/I/11-gcd7NVjL.jpg"
                   width="110" alt="" height="110" border="0" />
            </a>
          </td>
It doesn't seem to have a rowspan="4". I do see the rowspan="4" if I
view my own wishlist while I'm logged in, but as Action Streams won't
be logged in to Amazon when it fetches the wishlist page, I think it
should see the HTML I'm seeing, with no rowspan.
Assuming the thumbnail is always the first image in the tbody section,
this XPath selector should also work:
    thumbnail:
        - //img[1 = position()]
        - @src
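To see why the rowspan selector comes back empty, here is a minimal sketch in Python using only the standard library's xml.etree.ElementTree. It is not Web::Scraper and not how Action Streams runs the selectors; it only replays the same two XPath ideas against a trimmed, well-formed stand-in for the logged-out markup. The example.com addresses, the "item.0.example" name, and the first_src helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed stand-in for the logged-out wishlist markup.
# The href/src values and the tbody name are placeholders, not real Amazon data.
HTML = """
<table width="100%">
  <tbody name="item.0.example">
    <tr valign="top">
      <td align="center" width="65" valign="top">
        <a href="http://example.com/product">
          <img src="http://example.com/thumb.jpg" width="110" height="110" />
        </a>
      </td>
      <td class="small"><b><a href="http://example.com/product">Example title</a></b></td>
    </tr>
  </tbody>
</table>
"""


def first_src(item, path):
    """Return the src of the first element matching path, or None if nothing matches."""
    node = item.find(path)
    return None if node is None else node.get("src")


root = ET.fromstring(HTML)
results = []
# Mirrors "foreach: //tbody[@name]": evaluate the field selectors once per item.
for item in root.findall(".//tbody[@name]"):
    results.append({
        # The td carries no rowspan attribute here, so this selector finds nothing.
        "with_rowspan": first_src(item, ".//tr/td[@rowspan='4']/a/img"),
        # Taking the first image inside the item does not depend on rowspan.
        "first_image": first_src(item, ".//img[1]"),
    })

print(results)
```

The rowspan-based entry comes back as None while the first-image entry holds the src value, which matches the blank thumbnail you were seeing.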
> - Is there a log anywhere of what the scraper is doing? I wanted to
> try different XPath queries but felt like I was working blind - ended up
> with blank or with various Scalar() values - which I assume means the
> xpath expression matched too many things.
Web::Scraper comes with a "scraper" command line tool for debugging
expressions. Debugging the wishlist example looked like this:
http://pastie.caboo.se/157833
The SCALAR(0x...) values are due to how Web::Scraper treats links. For
tag attributes it knows are URIs, Web::Scraper returns URI objects
instead of strings. Action Streams then has to turn those back into
strings, but it only does it when selecting an event's "url" field,
not the thumbnail.
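As a rough analogy only, the Python sketch below shows the same shape of problem: a scraper hands back wrapper objects for address fields, and the consumer converts just one of them. The Uri class, the normalize helper, and the example.com values are hypothetical stand-ins; this is not the Perl code in Action Streams or Web::Scraper, and it is not the patch.

```python
class Uri:
    """Hypothetical stand-in for a URI object; it deliberately has no __str__."""

    def __init__(self, value):
        self.value = value

    def as_string(self):
        return self.value


def normalize(item, fields):
    """Return a copy of item with the named fields converted to plain strings."""
    out = dict(item)
    for name in fields:
        value = out.get(name)
        if isinstance(value, Uri):
            out[name] = value.as_string()
    return out


scraped = {
    "title": "Example title",
    "url": Uri("http://example.com/product"),
    "thumbnail": Uri("http://example.com/thumb.jpg"),
}

# Converting only "url" leaves the thumbnail as an object.
url_only = normalize(scraped, ["url"])
# Converting every field gives plain strings throughout.
all_fields = normalize(scraped, list(scraped))

print(str(url_only["thumbnail"]))  # opaque object representation, not the address
print(all_fields["thumbnail"])     # the address as a plain string
```

Printing the unconverted thumbnail gives an opaque object representation rather than the address, which is the Python cousin of the SCALAR(0x...) output.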
In Action Streams 1.0, you'd have to make a Perl event instead of a
Web::Scraper one to fix that. This patch will allow these selectors to
work for your personal use, though:
http://pastie.caboo.se/157853
I hope this helps, and let me know if you have more questions.
Mark Paschal
Software developer, Movable Type
mark@sixapart.com
_______________________________________________
MTOS-dev mailing list
MTOS-dev@sixapart.com
http://www.sixapart.com/mailman/listinfo/mtos-dev