'Re: [sqlite] Finding duplicate records'

[prev in list] [next in list] [prev in thread] [next in thread] 

List:       sqlite-users
Subject:    Re: [sqlite] Finding duplicate records
From:       Clark Christensen <cdcmicro () yahoo ! com>
Date:       2005-01-27 23:13:39
Message-ID: 20050127231339.15338.qmail () web20423 ! mail ! yahoo ! com
[Download RAW message or body]


--- Brass Tilde <brasstilde@insightbb.com> wrote:

> > I'm new to SQL, and SQLite, and I find myself needing
> to
> > identify duplicate records in a SQLite table (there are
> > about 2K duplicates in a 36K row table).  Any
> suggestions
> > or magic SQL queries appreciated :-)
> 
> Here are a couple of strategies I've used successfully
> for finding and
> potentially removing duplicate records.  It assumes a
> primary key field that
> is *separate* from the fields being checked for
> duplication.  This example
> also assumes that I only want to check for duplication on
> two of the fields
> in the table: field1 and field2.
> 
> Assume a table such as follows, that might included
> duplicate records.
> 
>     create table checkedtable
>         (
>             priKey    integer primary key,
>             field1    char    (10),
>             field2    char    (10),
>             <other possible fields>
>         );
> 
> and a temporary table to hold the duplicates
> 
>     create temp table dups
>         (
>             field1,
>             field2,
>             reccount,
>             keyvalue
>         );
> 
> The first method is probably the simplest:
> 
>     insert into dups (field1, field2, keyvalue)
>         select field1, field2, max(priKey)
>         from   checkedtable
>         group by field1, field2
> 
> 
> I don't know if this one works on SQLite, as I've never
> had reason to try
> it.  (I've learned a fair bit about DB design in the last
> couple of years
> before starting with SQLite, so duplication that needs
> correction is less of
> a problem now :)  It does work on SQL Server 2000,
> though.
> 
> The second method is one that I used before I understood
> that GROUP BY could
> be used for more than counting. :-)
> 
> Get all the records that have duplicate information and
> put them into the
> temp table:
> 
>     insert into dups (field1, field2, reccount)
>         select field1, field2, count(priKey)
>         from checkedtable
>         group by field1, field2
>         having count(priKey) > 1
> 
> Now, get the key for one of them:
> 
>     update  dups
>     set     keyvalue = checkedtable.priKey
>     from    checkedtable
>     where   checkedtable.field1 = checkedtable.field1
>     and     checkedtable.field2 = checkedtable.field2
> 
> 
> 
> Using either of these methods, the temporary table now
> contains key values
> that point to one of each set of duplicated records in
> the original table.
> It can be used to delete them, archive them, change them,
> whatever, by
> joining back to the original.
> 
> Note that the code presented here is off the top of my
> head, and not tested,
> so there may be syntactic errors that I didn't catch
> while writing it.  With
> any luck, there aren't any logic errors.  Hopefully, it
> will give you a
> starting point.
> 
> Brad
> 
> 
Basically, it worked in SQLite.  Thanks for your help.

 -Clark

[prev in list] [next in list] [prev in thread] [next in thread]
Configure | About | News | Add a list | Sponsored by KoreLogic