List:       postgresql-general
Subject:    Re: [TLM] Re: [GENERAL] batch insert/update
From:       "blackwater dev" <blackwaterdev () gmail ! com>
Date:       2007-12-31 15:34:05
Message-ID: 34a9824e0712310734i3a412b80ia50dd05e4938f4ac () mail ! gmail ! com

I was also thinking about adding an 'is_new' column to the table, which
would be 0 for every existing row, and then doing a basic COPY of all the
new rows in with is_new set to 1.  I'd then run a DELETE statement to
remove every row that has a duplicate and a flag of 0, since the COPY
should leave some ids with two rows, one with is_new of 1 and one with 0.
I just don't know if this would be the best approach.
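
To make the idea concrete, here is a rough sketch of the is_new approach. Everything named here is an assumption for illustration: the table `items`, its `name` column, and the helper `replace_with_flagged_copy` are hypothetical, SQLite (via Python's sqlite3 module) stands in for Postgres so the sketch runs on its own, and `executemany` stands in for COPY. Note that it only works if id has no primary key or unique constraint, because the old and new rows must coexist briefly.

```python
import sqlite3


def replace_with_flagged_copy(conn, new_rows):
    """Load new_rows flagged is_new = 1, then drop the old duplicates."""
    cur = conn.cursor()
    # Stand-in for COPY: bulk-load the incoming rows flagged as new.
    cur.executemany(
        "INSERT INTO items (id, name, is_new) VALUES (?, ?, 1)", new_rows
    )
    # Delete old rows (is_new = 0) that now have a new twin with the same id.
    cur.execute(
        """
        DELETE FROM items
        WHERE is_new = 0
          AND id IN (SELECT id FROM items WHERE is_new = 1)
        """
    )
    # Reset the flag so the next nightly run starts from a clean state.
    cur.execute("UPDATE items SET is_new = 0 WHERE is_new = 1")
    conn.commit()


conn = sqlite3.connect(":memory:")
# Hypothetical table; id is deliberately not unique (see the note above).
conn.execute(
    "CREATE TABLE items (id INTEGER, name TEXT, is_new INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO items (id, name) VALUES (?, ?)",
    [(1, "old one"), (2, "old two")],
)
replace_with_flagged_copy(conn, [(2, "new two"), (3, "new three")])
rows = conn.execute("SELECT id, name, is_new FROM items ORDER BY id").fetchall()
```

After the run, id 1 is untouched, id 2 carries the new values, and id 3 has been added, all with is_new back at 0.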

On Dec 26, 2007 3:13 PM, Ivan Sergio Borgonovo <mail@webthatworks.it> wrote:

> On Wed, 26 Dec 2007 20:48:27 +0100
> Andreas Kretschmer <akretschmer@spamfence.net> wrote:
>
> > blackwater dev <blackwaterdev@gmail.com> schrieb:
> >
> > > I have some php code that will be pulling in a file via ftp.
> > > This file will contain 20,000+ records that I then need to pump
> > > into the postgres db.  These records will represent a subset of
> > > the records in a certain table.  I basically need an efficient
> > > way to pump these rows into the table, replacing matching rows
> > > (based on id) already there and inserting ones that aren't.  Sort
> > > of looping through the result and inserting or updating based on
> > > the presence of the row, what is the best way to handle this?
> > > This is something that will run nightly.
>
> > Insert your data into an extra table and work with regular SQL to
> > insert/update the destination table. You can use COPY to load the
> > data into your extra table; this works very fast, but you need a
> > suitable file format for it.
>
> What if you know in advance which rows should be inserted
> and you have a separate batch of rows that should be updated?
>
> Is it still the fastest approach to load them all into a temp table with
> COPY?
>
> What about the rows that have to be updated, if you have all the columns,
> not just the changed ones?
> Is it faster to delete & insert or to update?
>
> Updates come with the same pk as the destination table.
>
> thx
>
> --
> Ivan Sergio Borgonovo
> http://www.webthatworks.it
>
>
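
For comparison, the staging-table suggestion quoted above might look roughly like the following. This is only a sketch under the same assumptions as before: the `items` table, the `staging` temp table, and the helper `merge_from_staging` are hypothetical names, SQLite stands in for Postgres, and `executemany` stands in for COPY. The UPDATE uses a portable correlated subquery; no claim is made here about which variant is fastest on a real Postgres table.

```python
import sqlite3


def merge_from_staging(conn, incoming):
    """Load incoming rows into a temp table, then update and insert from it."""
    cur = conn.cursor()
    cur.execute("CREATE TEMP TABLE staging (id INTEGER PRIMARY KEY, name TEXT)")
    # Stand-in for COPY: bulk-load the file contents into the staging table.
    cur.executemany("INSERT INTO staging (id, name) VALUES (?, ?)", incoming)
    # Step 1: overwrite destination rows whose id also appears in staging.
    cur.execute(
        """
        UPDATE items
        SET name = (SELECT s.name FROM staging s WHERE s.id = items.id)
        WHERE id IN (SELECT id FROM staging)
        """
    )
    # Step 2: insert staging rows whose id is not in the destination yet.
    cur.execute(
        """
        INSERT INTO items (id, name)
        SELECT s.id, s.name
        FROM staging s
        WHERE NOT EXISTS (SELECT 1 FROM items i WHERE i.id = s.id)
        """
    )
    cur.execute("DROP TABLE staging")
    conn.commit()


conn = sqlite3.connect(":memory:")
# Hypothetical destination table; here id can stay the primary key.
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items (id, name) VALUES (?, ?)",
    [(1, "old one"), (2, "old two")],
)
merge_from_staging(conn, [(2, "new two"), (3, "new three")])
merged = conn.execute("SELECT id, name FROM items ORDER BY id").fetchall()
```

Unlike the is_new idea, this keeps the primary key on the destination table intact, since duplicates only ever meet in the join and never in the table itself.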



