List: xml-dev
Subject: Re: [xml-dev] XML data sets with (known) data quality problems
From: Andrew Welch <andrew.j.welch () gmail ! com>
Date: 2012-02-06 14:02:33
Message-ID: CAEG2duAbX3d7tHvvAhv+WUsF8jtJodBTCE14LeV9rBstwD0rAA () mail ! gmail ! com
> In order to test exhaustively this library, we need to have XML data sets
> that have data quality problems known a priori.
> By data quality problems, we mean: missing values, misspellings, synonyms,
> values out of domain, approximate duplicates, etc.
Government data: http://data.gov.uk/data
I did a short contract for 'LinkedGov' a while back
(http://linkedgov.org/). Their goal is to make the data clean and
usable, so you might want to get in touch with them.
--
Andrew Welch
http://andrewjwelch.com