List:       xml-dev
Subject:    Re: [xml-dev] XML data sets with (known) data quality problems
From:       Andrew Welch <andrew.j.welch () gmail ! com>
Date:       2012-02-06 14:02:33
Message-ID: CAEG2duAbX3d7tHvvAhv+WUsF8jtJodBTCE14LeV9rBstwD0rAA () mail ! gmail ! com

> In order to test exhaustively this library, we need to have XML data sets
> that have data quality problems known a priori.
> By data quality problems, we mean: missing values, misspellings, synonyms,
> values out of domain, approximate duplicates, etc.


Government data:  http://data.gov.uk/data

I did a short contract for 'LinkedGov' a while back
(http://linkedgov.org/). Their goal is to make the data clean and
usable, so you might want to get in touch with them.



-- 
Andrew Welch
http://andrewjwelch.com

_______________________________________________________________________

XML-DEV is a publicly archived, unmoderated list hosted by OASIS
to support XML implementation and development. To minimize
spam in the archives, you must subscribe before posting.

[Un]Subscribe/change address: http://www.oasis-open.org/mlmanage/
Or unsubscribe: xml-dev-unsubscribe@lists.xml.org
subscribe: xml-dev-subscribe@lists.xml.org
List archive: http://lists.xml.org/archives/xml-dev/
List Guidelines: http://www.oasis-open.org/maillists/guidelines.php

