
List:       pywikipediabot-users
Subject:    Re: [Pywikipedia-l] match and list, but not replace
From:       Chris Watkins <chriswaterguy () appropedia ! org>
Date:       2010-04-12 13:32:20
Message-ID: g2o393c781d1004120632off1c1fdzfc2c993fc720410f () mail ! gmail ! com


On Mon, Apr 12, 2010 at 18:05, Merlijn van Deen <valhallasw@arctus.nl> wrote:

> Searching by using a text dump sounds more reasonable to me.


How would you do this?

E.g. I want to create a list of all pages with tables - i.e. with the string
"{|". The MediaWiki search won't do this, but I assume it's possible with a
site dump. But I don't know the command to use.
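
A minimal sketch of what such a dump search might look like, assuming the
dump is a standard MediaWiki XML export (for example a pages-articles
file) in which every page has a title element and a text element. This is
plain Python 3 using only the standard library, not a pywikipediabot
command; the function names titles_containing and local_name are made up
for illustration.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that the export format puts on tags."""
    return tag.rsplit('}', 1)[-1]


def titles_containing(dump_file, needle):
    """Yield the title of each page whose wikitext contains `needle`.

    `dump_file` may be a file name or an open binary file object.
    The pages are only read, never modified.
    """
    title = None
    for _event, elem in ET.iterparse(dump_file, events=('end',)):
        name = local_name(elem.tag)
        if name == 'title':
            title = elem.text
        elif name == 'text':
            # An empty <text/> element has text None, hence the fallback.
            if needle in (elem.text or ''):
                yield title
        elif name == 'page':
            # Drop the finished page so a large dump does not fill memory.
            elem.clear()
            title = None


# Hypothetical usage (the file name is an assumption):
# for t in titles_containing('pages-articles.xml', '{|'):
#     print(t)
```

With a full-history dump the same title would be listed once for every
matching revision, so a current-revisions dump is the better input. The
match is a plain substring test, so "redirect[[:Category" works the same
way as "{|".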

Thanks


> If you insist on changing replace.py, make sure you are removing all
> occurrences of both put and put_async.
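
One way to read this advice, as a sketch only: rather than deleting each
call by hand, both save methods could be swapped for a function that just
records the page title. The Page class below is a hypothetical stand-in
so that the example runs on its own; that the real page class could be
patched the same way is an assumption and has not been tested against
the framework. The names disable_saving and record_only are made up.

```python
class Page(object):
    """Hypothetical stand-in for the bot framework's page class."""

    def __init__(self, title):
        self._title = title

    def title(self):
        return self._title

    def put(self, newtext, comment=None):
        raise RuntimeError('would have saved %s' % self._title)

    def put_async(self, newtext, comment=None):
        raise RuntimeError('would have queued %s' % self._title)


def disable_saving(page_class, matched):
    """Replace put and put_async with a function that only logs the title."""
    def record_only(self, newtext, *args, **kwargs):
        matched.append(self.title())
    page_class.put = record_only
    page_class.put_async = record_only


matched_titles = []
disable_saving(Page, matched_titles)

# Both calls now record the title instead of saving anything.
Page('Example A').put('new text', 'summary')
Page('Example B').put_async('new text')
print(matched_titles)
```

Because the patch is applied to the class, every call site is covered at
once, including any that were overlooked when only one block of
replace.py was removed.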
>
> Best regards,
> Merlijn 'valhallasw' van Deen
>
>
> On 12 April 2010 09:54, Chris Watkins <chriswaterguy@appropedia.org> wrote:
>
>> So I haven't found a way to make a list of matches without replacing. I
>> suspect there's a very simple way, or it would take very simple changes to
>> replace.py.
>>
>>
>> I tried editing replace.py myself, to make it do everything except replace
>> the files. Then I could hack the log files to get the list I want. But I had
>> no success - I'm not a coder, so it was guesswork.
>>
>> I copied replace.py to a new file intended to do everything except put
>> files,  and called it *replacenoput.py* (i.e. "replace," but no "put")
>>
>> My first attempt was to remove this section (commented it out first, but
>> then removed to be sure):
>>
>>             if self.acceptall and new_text != original_text:
>>                 try:
>>                     page.put(new_text, self.editSummary)
>>                 except wikipedia.EditConflict:
>>                     wikipedia.output(u'Skipping %s because of edit conflict'
>>                                      % (page.title(),))
>>                 except wikipedia.SpamfilterError, e:
>>                     wikipedia.output(
>>                         u'Cannot change %s because of blacklist entry %s'
>>                         % (page.title(), e.url))
>>                 except wikipedia.PageNotSaved, error:
>>                     wikipedia.output(u'Error putting page: %s'
>>                                      % (error.args,))
>>                 except wikipedia.LockedPage:
>>                     wikipedia.output(u'Skipping %s (locked page)'
>>                                      % (page.title(),))
>>
>>
>> Fail - it made the changes all the same.
>>
>> Then I figured out that wikipedia.py was being used to put the files. So I
>> copied that to a new file *wikipedianoput.py* and changed every wikipedia
>> reference in *replacenoput.py* to wikipedianoput.
>>
>> Then I scanned through wikipedianoput.py looking for what I need to
>> block... but I couldn't tell.
>>
>> Can anyone help? Or even better, is there a more elegant way?
>>
>> Thanks
>> Chris
>>
>>
>> On Fri, Apr 2, 2010 at 00:12, Daniel Mietchen <
>> daniel.mietchen@googlemail.com> wrote:
>>
>>> Hi Chris,
>>>
>>> On Thu, Apr 1, 2010 at 2:26 PM, Chris Watkins
>>> <chriswaterguy@appropedia.org> wrote:
>>> > Thanks Daniel... I'm confused though.
>>> >
>>> > On Thu, Apr 1, 2010 at 20:25, Daniel Mietchen
>>> > <daniel.mietchen@googlemail.com> wrote:
>>> >>
>>> >> Perhaps
>>> >> http://meta.wikimedia.org/wiki/Pywikipediabot/copyright.py
>>> >> will do the trick,
>>> >
>>> > I can't see how to use it for matching a specific string.
>>> Nor do I - sorry. What I had in mind was to apply it to a page that
>>> contains your search string, and to restrict the search for "copyright
>>> violations" to your site.
>>> But this may indeed be a dead end.
>>>
>>> >> or simply
>>> >> http://meta.wikimedia.org/wiki/Pywikipediabot/replace.py
>>> >> in -debug mode?
>>> >
>>> > Where can I find information on -debug mode? I see there is -verbose mode
>>> > which "may be helpful when debugging", but I don't see how that helps.
>>> I thought that most PWB scripts had it, but apparently replace.py does
>>> not.
>>>
>>> But if a script's
>>>  def __init__(self, reader, force, append, summary, minor, autosummary, debug):
>>> line contains "debug" (as in this example, taken from
>>> http://svn.wikimedia.org/viewvc/pywikipedia/trunk/pywikipedia/pagefromfile.py?view=markup
>>> ), then -debug is an option with which the script can be run such that it
>>> performs all its actions except editing the pages.
>>>
>>> I am not very experienced with Python or PWB either, but since nobody
>>> had replied so far, I wrote out my ideas as they came to mind.
>>> Sorry for the confusion,
>>>
>>> Daniel
>>>
>>> > I may be missing something obvious &-)
>>> Me too.
>>>
>>> > Chris
>>> >
>>> >
>>> >>
>>> >> Daniel
>>> >>
>>> >> On Thu, Apr 1, 2010 at 6:05 AM, Chris Watkins
>>> >> <chriswaterguy@appropedia.org> wrote:
>>> >> > I want to generate a list of matches for a search, but not do anything
>>> >> > to the page.
>>> >> >
>>> >> > E.g. I want to list all pages that contain "redirect[[:Category", but I
>>> >> > don't want to modify the pages.
>>> >> >
>>> >> > I guess that it's possible to modify redirect.py (I don't speak python,
>>> >> > but it shouldn't be hard) and run it with -log. But maybe there's a
>>> >> > simpler way?
>>> >> >
>>> >> > Thanks in advance.
>>> >> >
>>> >> > --
>>> >> > Chris Watkins
>>> >> >
>>> >> > Appropedia.org - Sharing knowledge to build rich, sustainable lives.
>>> >> >
>>> >> > blogs.appropedia.org
>>> >> > community.livejournal.com/appropedia
>>> >> > identi.ca/appropedia
>>> >> > twitter.com/appropedia
>>> >> >
>>> >> > _______________________________________________
>>> >> > Pywikipedia-l mailing list
>>> >> > Pywikipedia-l@lists.wikimedia.org
>>> >> > https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l
>>> >> >
>>> >> >
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> http://www.google.com/profiles/daniel.mietchen
>>>
>>>
>>
>>
>
>


-- 
Chris Watkins

Appropedia.org - Sharing knowledge to build rich, sustainable lives.

blogs.appropedia.org
community.livejournal.com/appropedia
identi.ca/appropedia
twitter.com/appropedia


_______________________________________________
Pywikipedia-l mailing list
Pywikipedia-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l

