List:       wikipedia-l
Subject:    Re: [Wikipedia-l] Project: This wikipedia-related article is a stub...
From:       Mark Williamson <node.ue () gmail ! com>
Date:       2005-09-07 0:18:50
Message-ID: 849f98ed0509061718e539b58 () mail ! gmail ! com

I think that the problem is how much value is placed on article count.

Rather, we should place the value on size in bytes -- obviously, some
languages take up more or less space than others, but it does seem to
work better: Wikis with high article counts but little content rank
below or level with Wikis with low article counts but relatively
large amounts of content.

For example, look at br.wiki, scn.wiki, and li.wiki, and compare them
with bn.wiki, sa.wiki (much of sa.wiki's size is artificial as well,
due to whole sections of the Rgveda being copied verbatim when they
really belong in Wikisource), and gd.wiki.

In fact, you can tell just how nasty so many of the articles on
sa.wiki are by taking a look at this image:
http://en.wikipedia.org/wikistats/EN/PlotDatabaseSize7.png

It's the only wiki of that size to show the vast majority of its
growth in giant leaps like that, which suggests a bot or some other
fast, low-quality article-adding technique.
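
One rough way to put a number on "growth in giant leaps" is to ask how
much of a wiki's total growth comes from a few unusually large steps.
The sketch below is only an illustration: the function name, the
factor of 5, and the sample series are all made up, and it is not how
Wikistats computes anything.

```python
from statistics import median

def leap_share(sizes, factor=5.0):
    """Fraction of total growth that comes from steps larger than
    `factor` times the median positive step (hypothetical heuristic)."""
    # Period-to-period changes in database size.
    steps = [later - earlier for earlier, later in zip(sizes, sizes[1:])]
    positive = [s for s in steps if s > 0]
    total = sum(positive)
    if total == 0:
        return 0.0
    typical = median(positive)
    leaps = [s for s in positive if s > factor * typical]
    return sum(leaps) / total

# Made-up monthly database sizes, not real Wikistats data.
steady = [0, 10, 20, 30, 40]
bursty = [0, 1, 2, 102, 103, 104, 204]

print(leap_share(steady), leap_share(bursty))
```

A value near 1 would mean that almost all growth arrived in a handful
of bursts, which is the pattern that looks like a bot import.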

Mark

On 06/09/05, Tomasz Wegrzanowski <taw@users.sf.net> wrote:
> On Tue, Sep 06, 2005 at 11:52:21PM +0200, Lars Aronsson wrote:
> > Paweł Dembowski wrote:
> > > It seems to me that Swedish Wikipedia is quite the opposite - they
> > > have over 100,000 articles mostly because of the huge amount of
> > > substubs...
> >
> > I agree that this is embarrassing and should be addressed. I think
> > that the Danish Wikipedia, with 30,000 articles, has an even
> > higher percentage of (sub-)stubs than the Swedish one, but this is
> > just a feeling and I have no numbers to prove this.  We need a
> > statistic for the amount of (sub-)stubs, so we can talk verifiable
> > numbers (and set goals) instead of guesstimates.  How do we define
> > that?  Is the ">200 ch" count ("alternative" article count, [1])
> > in Erik Zachte's Wikistats a good metric?  Or the percentage of
> > articles longer than 0.5 kilobytes [2]?  I think 200 characters is
> > an OK stub, but perhaps a substub is less than 70 characters?
> > This leaves us with the Special:Shortpages page.  That page has
> > the advantage of being instantly updated, which Wikistats is not.
> >
> > The Swedish Wikipedia has 421 articles (0.4% of 102K) shorter than
> > 70 bytes and the Danish has 351 (1.1% of 31K).  As a comparison,
> > the Dutch Wikipedia has 79 (0.08% of 89K) and the Polish has 387
> > (0.4% of 93K).  This makes the Polish look just as bad as the
> > Swedish, since both have 0.4% of articles shorter than 70 bytes.
> > But perhaps a substub should be defined at 50 bytes instead?
> > Or 100 bytes or 150?
> 
> Numbers like 0.4% of articles tell more about effectiveness
> of the wikicleaning process than about the typical article.
> (and by the way, Special:Shortpages is not updated live
> on Wikimedia servers)
> 
> Just take a look at the list of shortest pages on Polish
> Wikipedia - they're almost all:
> * Redirects (what are they doing on the list?)
> * Disambiguation pages without descriptions for the links.
>   Sometimes articles have titles so obvious that {{disambig}} +
>   list of the links is enough.
> * A few cases of things that look like leftovers of
>   past technical problems
> * A few cases of things that should be immediately deleted,
>   but have been missed or are simply too recent and will
>   be deleted soon
> _______________________________________________
> Wikipedia-l mailing list
> Wikipedia-l@Wikimedia.org
> http://mail.wikipedia.org/mailman/listinfo/wikipedia-l
> 
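
The substub shares quoted above can be rechecked with a few lines of
arithmetic. This is only a sketch: the counts are the ones Lars gives,
the totals are the rounded "K" figures from his message (so the
results are approximate), and the names used here are made up for the
example.

```python
# (articles shorter than 70 bytes, total articles) as quoted above.
# Totals are rounded to the nearest thousand, so shares are approximate.
counts = {
    "Swedish": (421, 102_000),
    "Danish": (351, 31_000),
    "Dutch": (79, 89_000),
    "Polish": (387, 93_000),
}

def substub_share(short, total):
    """Percentage of articles that fall below the length threshold."""
    return 100.0 * short / total

shares = {name: substub_share(*pair) for name, pair in counts.items()}

# List the wikis from highest to lowest share of very short articles.
for name, pct in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {pct:.2f}%")
```

Changing the threshold (50, 100 or 150 bytes) only changes the first
number in each pair; the percentage calculation stays the same.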


-- 
SI HOC LEGERE SCIS NIMIVM ERVDITIONIS HABES
QVANTVM MATERIAE MATERIETVR MARMOTA MONAX SI MARMOTA MONAX MATERIAM
POSSIT MATERIARI
ESTNE VOLVMEN IN TOGA AN SOLVM TIBI LIBET ME VIDERE



