[prev in list] [next in list] [prev in thread] [next in thread]
List: sqlite-users
Subject: Re: [sqlite] Finding duplicate records
From: Clark Christensen <cdcmicro () yahoo ! com>
Date: 2005-01-27 23:13:39
Message-ID: 20050127231339.15338.qmail () web20423 ! mail ! yahoo ! com
[Download RAW message or body]
--- Brass Tilde <brasstilde@insightbb.com> wrote:
> > I'm new to SQL, and SQLite, and I find myself needing
> to
> > identify duplicate records in a SQLite table (there are
> > about 2K duplicates in a 36K row table). Any
> suggestions
> > or magic SQL queries appreciated :-)
>
> Here are a couple of strategies I've used successfully
> for finding and
> potentially removing duplicate records. It assumes a
> primary key field that
> is *separate* from the fields being checked for
> duplication. This example
> also assumes that I only want to check for duplication on
> two of the fields
> in the table: field1 and field2.
>
> Assume a table such as follows, that might included
> duplicate records.
>
> create table checkedtable
> (
> priKey integer primary key,
> field1 char (10),
> field2 char (10),
> <other possible fields>
> );
>
> and a temporary table to hold the duplicates
>
> create temp table dups
> (
> field1,
> field2,
> reccount,
> keyvalue
> );
>
> The first method is probably the simplest:
>
> insert into dups (field1, field2, keyvalue)
> select field1, field2, max(priKey)
> from checkedtable
> group by field1, field2
>
>
> I don't know if this one works on SQLite, as I've never
> had reason to try
> it. (I've learned a fair bit about DB design in the last
> couple of years
> before starting with SQLite, so duplication that needs
> correction is less of
> a problem now :) It does work on SQL Server 2000,
> though.
>
> The second method is one that I used before I understood
> that GROUP BY could
> be used for more than counting. :-)
>
> Get all the records that have duplicate information and
> put them into the
> temp table:
>
> insert into dups (field1, field2, reccount)
> select field1, field2, count(priKey)
> from checkedtable
> group by field1, field2
> having count(priKey) > 1
>
> Now, get the key for one of them:
>
> update dups
> set keyvalue = checkedtable.priKey
> from checkedtable
> where checkedtable.field1 = checkedtable.field1
> and checkedtable.field2 = checkedtable.field2
>
>
>
> Using either of these methods, the temporary table now
> contains key values
> that point to one of each set of duplicated records in
> the original table.
> It can be used to delete them, archive them, change them,
> whatever, by
> joining back to the original.
>
> Note that the code presented here is off the top of my
> head, and not tested,
> so there may be syntactic errors that I didn't catch
> while writing it. With
> any luck, there aren't any logic errors. Hopefully, it
> will give you a
> starting point.
>
> Brad
>
>
Basically, it worked in SQLite. Thanks for your help.
-Clark
[prev in list] [next in list] [prev in thread] [next in thread]
Configure |
About |
News |
Add a list |
Sponsored by KoreLogic