'Re: Load duplicates'

[prev in list] [next in list] [prev in thread] [next in thread] 

List:       dbi-users
Subject:    Re: Load duplicates
From:       Ian <pcs305 () gmail ! com>
Date:       2023-08-16 14:42:44
Message-ID: CAMH845TsXBB8VjC_3fWcA+sjRA+81tBbZBbpUgYii=2VhXhHBw () mail ! gmail ! com
[Download RAW message or body]

Geoffrey,
Thank you for that info. Working on getting it in.
Very helpful.

Best regards.

On Tue, Aug 15, 2023 at 1:34â€¯PM Geoffrey Rommel <wgrommel@gmail.com> wrote:

> So are you reading the records one at a time through DBI and looking for
> duplicates? There is probably a better way. ðŸ˜
>
> I work with Teradata, so I'll use Teradata syntax, but the syntax for
> other databases will be similar.
>
> To find the duplicates in your table, you can select all columns from the
> table (except the auto-increment) along with a count, like so:
>
> create volatile table ##dup_counts
> as
> (select acct, comp, accr, descrip, all-other-columns...,
> min(auto_increment) min_num /* or max() */ , count(*) kount
> from table1
> group by acct, comp, accr, descrip, all-other-columns... )
> with data;
>
> Now there are no duplicates in ##dup_counts, so you can insert them to
> your target table, either generating a new auto_increment or using min_num.
>
> The duplicates can now be inserted into your error table:
> insert into error_table
> select acct, comp, ...etc.
> from ##dup_counts
> where kount > 1;
>
> Et voilÃ  ... the records are deduped with only three statements.
>
> As you probably know, in relational databases rows do not have an order,
> so I'm not sure that you necessarily need a sequence number on the
> duplicates. If you want to trace the duplicates back to a flat file source,
> it might be better to add a sequence number to the file before loading it
> to the database.
>
> I hope this helps.
>
>
> On Tue, Aug 15, 2023 at 11:50â€¯AM Ian <pcs305@gmail.com> wrote:
>
>> Hi group,
>>
>> My perl skills are basic and my SQL skills almost match that.
>> Using perl 5.28 and mysql on windows.
>>
>> I have a couple million records that needs processing to go to their
>> final destination.
>> Currently I'm catchin duplicates with error 1062 in perl and using that
>> to write the duplicate records to a separate table for later processing.
>>
>> Question: Can DBI give me a field from the record on the main table
>> causing the duplicate so I can add that to the record going to
>> the duplicate table?
>>
>> Example:
>> MAIN table fields = "auto_increment", acct, comp,accr,desc,etc....  :
>>  (auto_increment created when records are added, rest is the source
>> record)
>> Duplicate table fields = "auto_increment from MAIN table",
>> acct,comp,accr,desc,etc.....
>>
>> Thanks
>> Ian
>>
>>
>>

-- 
Ian

[Attachment #3 (text/html)]

<div dir="ltr">Geoffrey,<div>Thank you for that info. Working on getting it \
in.</div><div>Very helpful.</div><div><br></div><div>Best \
regards.</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On \
Tue, Aug 15, 2023 at 1:34â€¯PM Geoffrey Rommel &lt;<a \
href="mailto:wgrommel@gmail.com">wgrommel@gmail.com</a>&gt; \
wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px \
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div>So \
are you reading the records one at a time through DBI and looking for duplicates? \
There is probably a better way. ðŸ˜</div><div><br></div><div>I work with Teradata, \
so I&#39;ll use Teradata syntax, but the syntax for other databases will be \
similar.</div><div><br></div><div>To find the duplicates in your table, you can \
select all columns from the table (except the auto-increment) along with a count, \
like so:</div><div><br></div><div>create volatile table \
##dup_counts</div><div>as</div><div>(select acct, comp, accr, descrip, \
all-other-columns..., min(auto_increment) min_num /* or max() */ , count(*) \
kount</div><div>from table1</div><div>group by  acct, comp, accr, descrip, \
all-other-columns... )</div><div>with data;</div><div><br></div><div>Now there are no \
duplicates in ##dup_counts, so you can insert them to your target table, either \
generating a new auto_increment or using min_num.</div><div><br></div><div>The \
duplicates can now be inserted into your error table:</div><div>insert into \
error_table</div><div>select acct, comp, ...etc.</div><div>from \
##dup_counts</div><div>where kount &gt; 1;</div><div><br></div><div>Et voilÃ  ... the \
records are deduped with only three statements.</div><div><br></div><div>As you \
probably know, in relational databases rows do not have an order, so I&#39;m not sure \
that you necessarily need a sequence number on the duplicates. If you want to trace \
the duplicates back to a flat file source, it might be better to add a sequence \
number to the file before loading it to the database.</div><div><br></div><div>I hope \
this helps.</div><br></div><br><div class="gmail_quote"><div dir="ltr" \
class="gmail_attr">On Tue, Aug 15, 2023 at 11:50â€¯AM Ian &lt;<a \
href="mailto:pcs305@gmail.com" target="_blank">pcs305@gmail.com</a>&gt; \
wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px \
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr">Hi \
group,<div><br></div><div>My perl skills are basic and my SQL skills almost match \
that.</div><div>Using perl 5.28 and mysql on windows.</div><div><br></div><div>I have \
a couple million records that needs processing to go to their final \
destination.</div><div>Currently I&#39;m catchin duplicates with error 1062 in perl \
and using that to write the duplicate records to a separate  table for later \
processing.<br></div><div><br></div><div>Question: Can DBI give me a field from the \
record on the main table causing the duplicate so I can add  that to the record going \
to the  duplicate table?</div><div><br></div><div>Example:</div><div>MAIN table \
fields = &quot;auto_increment&quot;, acct, comp,accr,desc,etc....   :</div><div>  \
(auto_increment created when records are added, rest is the source \
record)</div><div>Duplicate table fields = &quot;auto_increment from MAIN \
table&quot;, acct,comp,accr,desc,etc.....</div><div><br></div><div>Thanks</div><div>Ian</div><div><br></div><div><br></div></div>
 </blockquote></div>
</blockquote></div><br clear="all"><div><br></div><span \
class="gmail_signature_prefix">-- </span><br><div dir="ltr" \
class="gmail_signature"><div dir="ltr">Ian<br></div></div>



[prev in list] [next in list] [prev in thread] [next in thread]
Configure | About | News | Add a list | Sponsored by KoreLogic