
List:       pywikipediabot-users
Subject:    Re: [Pywikipedia-l] Wikidata and Pywikipedia
From:       Yuri Astrakhan <yuriastrakhan () gmail ! com>
Date:       2013-03-03 18:34:23
Message-ID: CAJGfNe_jCLURRssNmBBKG8Pg+_Vn-tQqiu8pbFWTver0s1trJw () mail ! gmail ! com


Hehe, I could always use more hands - especially if you want to try
reworking the server-side Wikidata API. Right now is the best time to do
it; otherwise lots more bots will be written for the current API, and it
will be much harder to change.

Let's start with cleaning up the API specification at
http://www.mediawiki.org/wiki/Requests_for_comment/Wikidata_API

I need to iron out:
* generator - the ability to search Wikidata based on some parameters to
get a list of items (Q pages)
* props - the ability to request specific information from each Q page
(see the sketch below)
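
For the generator/props combination, here is a rough Python sketch (using
the requests library) of what a unified request could look like. The module
and parameter names "wbsearch", "gwbsearch", "wbentity", "wbeprops" and
"wbelanguages" are hypothetical placeholders for whatever the RFC settles
on, not existing API parameters.

    import requests

    params = {
        "action": "query",
        "format": "json",
        "generator": "wbsearch",      # hypothetical: search Wikidata for items
        "gwbsearch": "Willem IV",     # hypothetical search text parameter
        "prop": "wbentity",           # hypothetical: request data from each Q page
        "wbeprops": "labels|claims",  # which pieces of each item to return
        "wbelanguages": "en|nl",
    }
    r = requests.get("https://www.wikidata.org/w/api.php", params=params)
    print(r.json())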

This should get us started, as coding should be easy once we know exactly
what we need :)
Thanks!


On Sun, Mar 3, 2013 at 1:13 PM, Amir Ladsgroup <ladsgroup@gmail.com> wrote:

> You can count on me for writing the code :) Tell me if you need more hands
> 
> 
> On Sat, Mar 2, 2013 at 7:48 PM, Yuri Astrakhan <yuriastrakhan@gmail.com> wrote:
> 
> > Of course, the goal is to keep (and even enhance) the current
> > functionality. And yes, there will be a significant overlap while both
> > APIs are functional. The main reason for posting it now is so that whoever
> > decides to implement it knows the Wikidata API roadmap and can plan
> > accordingly. Plus I hope for a ton of good suggestions :)
> > 
> > 
> > On Sat, Mar 2, 2013 at 11:13 AM, Maarten Dammers <maarten@mdammers.nl> wrote:
> > 
> > > Hi Yurik,
> > > 
> > > On 2-3-2013 17:00, Yuri Astrakhan wrote:
> > > 
> > > I would also like to bring up the pending Wikidata API RFC at
> > > http://www.mediawiki.org/wiki/Requests_for_comment/Wikidata_API
> > > 
> > > Thanks for pointing that out.
> > > 
> > > 
> > > In that RFC I plan to unify the Wikidata API so that it is more
> > > seamlessly integrated with the core query API. Any feedback from the
> > > pywiki community is welcome.
> > > 
> > > If we can still functionally do the same things I described below, you
> > > probably won't get any complaints from this side. Is that the case?
> > > Can you please include a period of overlap between the old-style and
> > > new-style API so we have time to update the framework?
> > > 
> > > Maarten
> > > 
> > > 
> > > 
> > > On Sat, Mar 2, 2013 at 10:49 AM, Maarten Dammers <maarten@mdammers.nl> wrote:
> > > 
> > > > Hi everyone,
> > > > 
> > > > As you might know, phase 1 of Wikidata (interwiki links) is live on a
> > > > lot of Wikipedias and will soon be turned on for all Wikipedias. Phase 2
> > > > is next; that's basically about infobox data. We are going to need a lot
> > > > of clever bots to fill Wikidata. To make that possible, Pywikipedia should
> > > > (properly) implement Wikidata. That way bot authors don't have to worry or
> > > > care about the inner workings of the Wikidata API; they just talk to the
> > > > framework. At the moment trunk has a first implementation that isn't very
> > > > clean, and in the rewrite it's still missing.
> > > > 
> > > > Legoktm and I talked about this on IRC. We need to have a proper data
> > > > model in Pywikipedia. Based on
> > > > https://meta.wikimedia.org/wiki/Wikidata/Notes/Data_model_primer (a
> > > > skeleton sketch follows the list):
> > > > * WikibasePage is a subclass of Page and has some basic shared
> > > > functions for labels, descriptions and aliases
> > > > * ItemPage is a subclass of WikibasePage with some item-specific
> > > > functions like claims and sitelinks (example
> > > > https://www.wikidata.org/wiki/Q256638)
> > > > * PropertyPage is a subclass of WikibasePage with some property-specific
> > > > functions for the datatype (example
> > > > https://www.wikidata.org/wiki/Property:P22)
> > > > * QueryPage is a subclass of WikibasePage for the future query type
> > > > * Claim is a subclass of object for claims. Simplified: it's a property
> > > > (P22, father) attached to an item (Q256638, the princess) linking to
> > > > another item (Q380949, Willem IV)
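> > > > 
> > > > One possible skeleton for that hierarchy (method and attribute names
> > > > are only illustrative; Page stands in for the existing framework class):
> > > > 
> > > >     class Page(object):
> > > >         """Placeholder for the framework's existing Page class."""
> > > >         def __init__(self, site, title):
> > > >             self.site = site
> > > >             self.title = title
> > > > 
> > > >     class WikibasePage(Page):
> > > >         """Shared behaviour: labels, descriptions and aliases."""
> > > >         def get(self, force=False):
> > > >             raise NotImplementedError  # fetch entity data, see below
> > > > 
> > > >     class ItemPage(WikibasePage):
> > > >         """Q pages: adds claims and sitelinks (e.g. Q256638)."""
> > > > 
> > > >     class PropertyPage(WikibasePage):
> > > >         """P pages: adds the datatype (e.g. Property:P22)."""
> > > > 
> > > >     class QueryPage(WikibasePage):
> > > >         """Reserved for the future query entity type."""
> > > > 
> > > >     class Claim(object):
> > > >         """A property (e.g. P22, father) on an item (e.g. Q256638)
> > > >         pointing to a target (e.g. Q380949)."""
> > > >         def __init__(self, prop, target):
> > > >             self.prop = prop
> > > >             self.target = target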
> > > > 
> > > > You can get these pages like a normal page (site object + title), but
> > > > you probably also want to get them based on a Wikipedia page. For that
> > > > there is
> > > > https://www.wikidata.org/wiki/Special:ItemByTitle/enwiki/Princess%20Carolina%20of%20Orange-Nassau.
> > > > We should have a staticmethod itemByPage(Page) in which Page is
> > > > https://en.wikipedia.org/wiki/Princess_Carolina_of_Orange-Nassau and it
> > > > will give you the ItemPage object for https://www.wikidata.org/wiki/Q256638.
> > > > Currently in trunk the DataPage object has a constructor where you can
> > > > give a page object and you'll get the corresponding DataPage. I don't
> > > > think that's the way to do it, because it violates the data model and
> > > > will get us into a lot of trouble later on when other sites (like
> > > > Commons) might implement the Wikibase extension.
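> > > > 
> > > > A minimal sketch of such a static method, talking to the API directly
> > > > with the requests library (in the framework it would go through the
> > > > existing query layer); wbgetentities accepts sites/titles, which is the
> > > > API counterpart of Special:ItemByTitle. The page.site_dbname attribute
> > > > is only illustrative.
> > > > 
> > > >     import requests
> > > > 
> > > >     class ItemPage(object):
> > > >         def __init__(self, item_id):
> > > >             self.item_id = item_id  # e.g. "Q256638"
> > > > 
> > > >         @staticmethod
> > > >         def itemByPage(page):
> > > >             """Return the ItemPage for a Wikipedia Page (site + title)."""
> > > >             params = {
> > > >                 "action": "wbgetentities",
> > > >                 "format": "json",
> > > >                 "sites": page.site_dbname,  # e.g. "enwiki"
> > > >                 "titles": page.title,       # e.g. the article title
> > > >                 "props": "info",
> > > >             }
> > > >             data = requests.get("https://www.wikidata.org/w/api.php",
> > > >                                 params=params).json()
> > > >             # the response is keyed by item id ("-1" if no item exists)
> > > >             return ItemPage(next(iter(data["entities"])))
> > > > 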
> > > > A WikibasePage should work the same as a normal page when it comes to
> > > > fetching data. It should start out with just a title and no content, and
> > > > once you use a function that needs data (or you force it), it will fetch
> > > > all the data from Wikibase and cache it.
> > > > * For an item the data looks like
> > > > https://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q256638&format=json
> > > > * For a property the data looks like
> > > > https://www.wikidata.org/w/api.php?action=wbgetentities&ids=P22&format=json
> > > > Parts of the data (descriptions, aliases and labels) should be processed
> > > > in the get function of WikibasePage, other parts in ItemPage / PropertyPage.
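> > > > 
> > > > A sketch of that lazy fetch-and-cache behaviour (attribute names are
> > > > illustrative; wbgetentities itself is the existing API module):
> > > > 
> > > >     import requests
> > > > 
> > > >     class WikibasePage(object):
> > > >         def __init__(self, entity_id):
> > > >             self.entity_id = entity_id  # "Q256638" or "P22"
> > > >             self._content = None        # nothing fetched yet
> > > > 
> > > >         def get(self, force=False):
> > > >             if self._content is None or force:
> > > >                 params = {"action": "wbgetentities", "format": "json",
> > > >                           "ids": self.entity_id}
> > > >                 data = requests.get("https://www.wikidata.org/w/api.php",
> > > >                                     params=params).json()
> > > >                 self._content = data["entities"][self.entity_id]
> > > >                 # shared parts handled here...
> > > >                 self.labels = self._content.get("labels", {})
> > > >                 self.descriptions = self._content.get("descriptions", {})
> > > >                 self.aliases = self._content.get("aliases", {})
> > > >             return self._content
> > > > 
> > > >     class ItemPage(WikibasePage):
> > > >         def get(self, force=False):
> > > >             content = super(ItemPage, self).get(force)
> > > >             # ...item-specific parts handled in the subclass
> > > >             self.claims = content.get("claims", {})
> > > >             self.sitelinks = content.get("sitelinks", {})
> > > >             return content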
> > > > 
> > > > Based on the API we should probably have some generators (sketched
> > > > below):
> > > > * One or more generators that use wbgetentities to (pre-)fetch objects
> > > > * A search generator that uses wbsearchentities
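> > > > 
> > > > A sketch of both (the batch size of 50 is the usual API limit for
> > > > normal users; parameter values are just examples):
> > > > 
> > > >     import requests
> > > > 
> > > >     API = "https://www.wikidata.org/w/api.php"
> > > > 
> > > >     def preload_entities(ids):
> > > >         """Yield raw entity data for the given Q/P ids, 50 at a time."""
> > > >         ids = list(ids)
> > > >         for i in range(0, len(ids), 50):
> > > >             params = {"action": "wbgetentities", "format": "json",
> > > >                       "ids": "|".join(ids[i:i + 50])}
> > > >             data = requests.get(API, params=params).json()
> > > >             for entity in data["entities"].values():
> > > >                 yield entity
> > > > 
> > > >     def search_entities(text, language="en"):
> > > >         """Yield search results from wbsearchentities."""
> > > >         params = {"action": "wbsearchentities", "format": "json",
> > > >                   "search": text, "language": language}
> > > >         for hit in requests.get(API, params=params).json()["search"]:
> > > >             yield hit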
> > > > 
> > > > WikibasePage:
> > > > * Set/add/delete label (@property?)
> > > > * Set/add/delete description (@property?)
> > > > * Set/add/delete alias (@property?)
> > > > 
> > > > ItemPage:
> > > > * Set/add/delete sitelink (@property?)
> > > > 
> > > > Claim logic
> > > > 
> > > > Not sure yet how we can use wbeditentity and wblinktitles. A rough
> > > > sketch of one setter follows.
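> > > > 
> > > > This uses the existing wbsetlabel module (wbsetdescription, wbsetaliases
> > > > and wbsetsitelink work similarly); getting a logged-in session and an
> > > > edit token is left out, and the example values are made up:
> > > > 
> > > >     import requests
> > > > 
> > > >     API = "https://www.wikidata.org/w/api.php"
> > > > 
> > > >     def set_label(session, entity_id, language, value, token):
> > > >         """Set the label of one entity in one language via wbsetlabel."""
> > > >         payload = {"action": "wbsetlabel", "format": "json",
> > > >                    "id": entity_id,       # e.g. "Q256638"
> > > >                    "language": language,  # e.g. "nl"
> > > >                    "value": value,
> > > >                    "token": token}        # a valid edit token
> > > >         return session.post(API, data=payload).json()
> > > > 
> > > >     # usage, with session = a logged-in requests.Session():
> > > >     # set_label(session, "Q256638", "en", "some label", token)
> > > > 
> > > > wbeditentity could cover the same ground in one call by POSTing a JSON
> > > > "data" blob with labels, descriptions, aliases and sitelinks together.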
> > > > 
> > > > We took some notes on
> > > > https://www.mediawiki.org/wiki/Manual:Pywikipediabot/Wikidata/Rewrite_proposal.
> > > >  
> > > > What do you think? Is this the right direction? Feedback is appreciated.
> > > > 
> > > > Maarten
> > > > 
> > > > 
> > > > _______________________________________________
> > > > Pywikipedia-l mailing list
> > > > Pywikipedia-l@lists.wikimedia.org
> > > > https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l
> > > > 
> > > 
> > > 
> > > _______________________________________________
> > > Pywikipedia-l mailing list
> > > Pywikipedia-l@lists.wikimedia.org
> > > https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l
> > > 
> > > 
> > 
> > _______________________________________________
> > Pywikipedia-l mailing list
> > Pywikipedia-l@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l
> > 
> > 
> 
> 
> --
> Amir
> 
> 
> _______________________________________________
> Pywikipedia-l mailing list
> Pywikipedia-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l
> 
> 



_______________________________________________
Pywikipedia-l mailing list
Pywikipedia-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l

