List: pywikipediabot-users
Subject: Re: [Pywikipedia-l] match and list, but not replace
From: Chris Watkins <chriswaterguy () appropedia ! org>
Date: 2010-04-12 13:32:20
Message-ID: g2o393c781d1004120632off1c1fdzfc2c993fc720410f () mail ! gmail ! com
On Mon, Apr 12, 2010 at 18:05, Merlijn van Deen <valhallasw@arctus.nl> wrote:
> Searching by using a text dump sounds more reasonable to me.

How would you do this?

E.g. I want to create a list of all pages with tables, i.e. pages containing
the string "{|". The MediaWiki search won't do this, but I assume it's
possible with a site dump. But I don't know the command to use.

Thanks

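A minimal sketch of the dump approach, assuming you already have the wiki's pages as MediaWiki export XML (for example from Special:Export, or a dump made with the dumpBackup.php maintenance script). It uses only Python's standard library rather than any pywikipedia command; the function name pages_containing and the sample titles are hypothetical. A page is reported once if any of its <text> elements contains the string literally, so "{|" needs no escaping.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace}' prefix that MediaWiki export XML puts on tags."""
    return tag.rsplit("}", 1)[-1]


def pages_containing(dump, needle):
    """Yield the title of every page whose revision text contains `needle`.

    `dump` is a file name or binary file object holding MediaWiki export XML.
    The test is a plain substring check, so "{|" is matched literally.
    """
    title, matched = None, False
    for _event, elem in ET.iterparse(dump, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text" and needle in (elem.text or ""):
            matched = True
        elif name == "page":
            if matched:
                yield title
            title, matched = None, False
            elem.clear()  # release the finished page's contents


# Tiny in-memory stand-in for a real dump (hypothetical page titles).
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.4/">
  <page><title>Solar cooker</title><revision><text>{| class="wikitable"
|}</text></revision></page>
  <page><title>Compost</title><revision><text>No tables here.</text></revision></page>
</mediawiki>"""

print(list(pages_containing(io.BytesIO(SAMPLE), "{|")))  # ['Solar cooker']
```

For a real dump, pass an open binary file instead of the sample, e.g. pages_containing(open("dump.xml", "rb"), "{|"), where "dump.xml" is a placeholder name. Streaming with iterparse and clearing each finished <page> keeps memory use modest on large dumps.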
> If you insist on changing replace.py, make sure you are removing all
> occurrences of both put and put_async.
>
> Best regards,
> Merlijn 'valhallasw' van Deen
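If you do edit a copy of replace.py, one quick way to follow the advice above is to list every put and put_async call site and confirm the list is empty afterwards. This is a rough line-by-line text scan, not a parser: it assumes each call is written as something.put(...) or something.put_async(...) on a single line, and the name find_put_calls is hypothetical. A text scan is used instead of Python's ast module because the 2010 scripts are Python 2 code (e.g. "except wikipedia.SpamfilterError, e:") that Python 3 cannot parse.

```python
import re

# An attribute call such as page.put(...) or page.put_async(...).
# "def put(self, ...)" is not matched because it has no leading dot.
PUT_CALL = re.compile(r"\.(put|put_async)\s*\(")


def find_put_calls(source):
    """Return (line_number, method_name) for each put/put_async call in source."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in PUT_CALL.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits


sample = """page.put(new_text, self.editSummary)
page.put_async(new_text, comment)
def put(self, newtext):
"""
print(find_put_calls(sample))  # [(1, 'put'), (2, 'put_async')]
```

On the real file this would be find_put_calls(open("replacenoput.py").read()).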
>
>
> On 12 April 2010 09:54, Chris Watkins <chriswaterguy@appropedia.org> wrote:
>
>> So I haven't found a way to make a list of matches without replacing. I
>> suspect there's a very simple way, or it would take very simple changes to
>> replace.py.
>>
>>
>> I tried editing replace.py myself, to make it do everything except replace
>> the files. Then I could hack the log files to get the list I want. But I had
>> no success - I'm not a coder, so it was guesswork.
>>
>> I copied replace.py to a new file intended to do everything except put
>> files, and called it *replacenoput.py* (i.e. "replace," but no "put")
>>
>> My first attempt was to remove this section (commented it out first, but
>> then removed to be sure):
>>
>>     if self.acceptall and new_text != original_text:
>>         try:
>>             page.put(new_text, self.editSummary)
>>         except wikipedia.EditConflict:
>>             wikipedia.output(u'Skipping %s because of edit conflict'
>>                              % (page.title(),))
>>         except wikipedia.SpamfilterError, e:
>>             wikipedia.output(
>>                 u'Cannot change %s because of blacklist entry %s'
>>                 % (page.title(), e.url))
>>         except wikipedia.PageNotSaved, error:
>>             wikipedia.output(u'Error putting page: %s'
>>                              % (error.args,))
>>         except wikipedia.LockedPage:
>>             wikipedia.output(u'Skipping %s (locked page)'
>>                              % (page.title(),))
>>
>>
>> Fail - it made the changes all the same.
>>
>> Then I figured out that wikipedia.py was being used to put the files. So I
>> copied that to a new file *wikipedianoput.py* and changed every wikipedia
>> reference in *replacenoput.py* to wikipedianoput.
>>
>> Then I scanned through wikipedianoput.py looking for what I need to
>> block... but I couldn't tell.
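A lighter-weight alternative to copying wikipedia.py (an assumption on my part, not something proposed in this thread) is to leave both files untouched and, in a small wrapper script, replace the page class's save methods with a function that only records the title. The sketch uses a stand-in Page class so it runs on its own; in the real framework the class would presumably be wikipedia.Page, since the quoted code calls page.put, but that name and its method signatures are assumptions. Both put and put_async are patched, since either may be used to save.

```python
class Page(object):
    """Hypothetical stand-in for the framework's page class."""

    def __init__(self, title):
        self._title = title

    def title(self):
        return self._title

    def put(self, newtext, comment=None):
        raise RuntimeError("would really save " + self._title)

    def put_async(self, newtext, comment=None):
        raise RuntimeError("would really save " + self._title)


would_change = []


def record_instead_of_saving(self, newtext, *args, **kwargs):
    """Drop-in for put/put_async: remember the page title, never save."""
    would_change.append(self.title())


# Patch both save methods so neither of them writes anything.
Page.put = record_instead_of_saving
Page.put_async = record_instead_of_saving

Page("Solar cooker").put("new text", "summary")
Page("Compost").put_async("new text")
print(would_change)  # ['Solar cooker', 'Compost']
```

With the real library, the patch would presumably need to run before the bot starts saving pages; that ordering detail is also an assumption.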
>>
>> Can anyone help? Or even better, is there a more elegant way?
>>
>> Thanks
>> Chris
>>
>>
>> On Fri, Apr 2, 2010 at 00:12, Daniel Mietchen <daniel.mietchen@googlemail.com> wrote:
>>
>>> Hi Chris,
>>>
>>> On Thu, Apr 1, 2010 at 2:26 PM, Chris Watkins <chriswaterguy@appropedia.org> wrote:
>>> > Thanks Daniel... I'm confused though.
>>> >
>>> > On Thu, Apr 1, 2010 at 20:25, Daniel Mietchen <daniel.mietchen@googlemail.com> wrote:
>>> >>
>>> >> Perhaps
>>> >> http://meta.wikimedia.org/wiki/Pywikipediabot/copyright.py
>>> >> will do the trick,
>>> >
>>> > I can't see how to use it for matching a specific string.
>>> Nor do I - sorry. What I had in mind was to apply it to a page that
>>> contains your search string, and to restrict the search for "copyright
>>> violations" to your site.
>>> But this may indeed be a dead end.
>>>
>>> >> or simply
>>> >> http://meta.wikimedia.org/wiki/Pywikipediabot/replace.py
>>> >> in -debug mode?
>>> >
>>> > Where can I find information on -debug mode? I see there is -verbose mode
>>> > which "may be helpful when debugging", but I don't see how that helps.
>>> I thought that most PWB scripts had it, but apparently replace.py does
>>> not.
>>>
>>> But if the
>>>     def __init__(self, reader, force, append, summary, minor, autosummary, debug):
>>> line contains "debug" (as in the example above, taken from
>>> http://svn.wikimedia.org/viewvc/pywikipedia/trunk/pywikipedia/pagefromfile.py?view=markup),
>>> then -debug is an option with which the script can be run such that it
>>> performs all its actions except editing the pages.
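The pattern Daniel describes, a debug flag passed to the bot's constructor that lets it do all its work but skip the edit, might look roughly like this. It is a hypothetical sketch, not pagefromfile.py's actual code: the ReplaceBot name, its arguments, and the dict of page texts are assumptions, and the "edit" is only simulated.

```python
class ReplaceBot(object):
    """Hypothetical bot: replaces `old` with `new` in each page's text."""

    def __init__(self, pages, old, new, debug=False):
        self.pages = pages      # dict mapping title -> current wikitext
        self.old = old
        self.new = new
        self.debug = debug
        self.saved = {}         # simulated edits: title -> new text
        self.matches = []       # titles that would change

    def run(self):
        for title, text in self.pages.items():
            new_text = text.replace(self.old, self.new)
            if new_text == text:
                continue        # no match, nothing to do
            self.matches.append(title)
            if self.debug:
                print("Would change: %s" % title)  # report only, no edit
            else:
                self.saved[title] = new_text       # stands in for page.put()


pages = {"Solar cooker": "{| table |}", "Compost": "plain text"}
bot = ReplaceBot(pages, "{|", '{| class="wikitable"', debug=True)
bot.run()
print(bot.matches, bot.saved)  # ['Solar cooker'] {}
```

The design point is that matching runs identically in both modes; only the final save is gated, so a debug run doubles as a list of matches.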
>>>
>>> I am not very experienced with Python or PWB either, but since nobody
>>> had replied so far, I wrote out my ideas as they came to mind.
>>> Sorry for the confusion,
>>>
>>> Daniel
>>>
>>> > I may be missing something obvious &-)
>>> Me too.
>>>
>>> > Chris
>>> >
>>> >
>>> >>
>>> >> Daniel
>>> >>
>>> >> On Thu, Apr 1, 2010 at 6:05 AM, Chris Watkins <chriswaterguy@appropedia.org> wrote:
>>> >> > I want to generate a list of matches for a search, but not do
>>> >> > anything to the page.
>>> >> >
>>> >> > E.g. I want to list all pages that contain "redirect[[:Category", but
>>> >> > I don't want to modify the pages.
>>> >> >
>>> >> > I guess that it's possible to modify redirect.py (I don't speak
>>> >> > Python, but it shouldn't be hard) and run it with -log. But maybe
>>> >> > there's a simpler way?
>>> >> >
>>> >> > Thanks in advance.
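Whichever tool ends up doing the search, strings like "redirect[[:Category" and "{|" contain regular-expression metacharacters ("[", "{" and "|"). If they are given to a regex-based matcher they should be escaped first; unescaped, "{|" is the alternation of "{" and an empty pattern, so it matches every page. A minimal sketch with Python's re module follows; the function name matching_titles and the sample pages are hypothetical, and case-insensitive matching is offered because redirects are usually written #REDIRECT in upper case.

```python
import re


def matching_titles(pages, needle, ignore_case=False):
    """Return titles from (title, text) pairs whose text contains `needle`.

    re.escape makes characters such as '[', '{' and '|' match literally.
    """
    flags = re.IGNORECASE if ignore_case else 0
    pattern = re.compile(re.escape(needle), flags)
    return [title for title, text in pages if pattern.search(text)]


pages = [
    ("Old category", "#REDIRECT[[:Category:Water]]"),
    ("Water", "Text filed under [[Category:Water]]."),
]
print(matching_titles(pages, "redirect[[:Category", ignore_case=True))
# ['Old category']

# Unescaped, "{|" means "'{' or nothing", so it matches even a page without tables.
print(bool(re.search("{|", "plain text")))  # True
```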
>>> >> >
>>> >> > --
>>> >> > Chris Watkins
>>> >> >
>>> >> > Appropedia.org - Sharing knowledge to build rich, sustainable lives.
>>> >> >
>>> >> > blogs.appropedia.org
>>> >> > community.livejournal.com/appropedia
>>> >> > identi.ca/appropedia
>>> >> > twitter.com/appropedia
>>> >> >
>>> >> > _______________________________________________
>>> >> > Pywikipedia-l mailing list
>>> >> > Pywikipedia-l@lists.wikimedia.org
>>> >> > https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l
>>> >> >
>>> >> >
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> http://www.google.com/profiles/daniel.mietchen