List: pgsql-bugs
Subject: Re: [BUGS] BUG #8173: Inserting heap tuples in bulk in COPY patch return wrong line on failure 999 o
From: Heikki Linnakangas <hlinnakangas () vmware ! com>
Date: 2013-05-23 11:56:35
Message-ID: 519E03F3.70306 () vmware ! com
On 21.05.2013 18:20, lalbin@fhcrc.org wrote:
> The following bug has been logged on the website:
>
> Bug reference: 8173
> Logged by: Lloyd Albin
> Email address: lalbin@fhcrc.org
> PostgreSQL version: 9.2.4
> Operating system: SUSE Linux (64-bit)
> Description:
>
> During testing for our 9.2 upgrade, we found that the error messages we were
> expecting did not match what the program produced. Looking over the release
> notes from our current version, 9.0.12, through 9.2.4, which we are testing,
> I believe I have tracked the issue down to "Improve COPY performance by
> adding tuples to the heap in batches". When I looked at the patch code in
> http://www.postgresql.org/message-id/4E708759.40206@enterprisedb.com I found
> that you are inserting 1000 rows at a time. The problem is that on failure,
> the error context reports either line 1000 of the batch or the last line
> read, whichever comes first. This can be confusing, as the demo code below
> shows.
>
> CREATE TABLE public.table1 (
> key INTEGER,
> PRIMARY KEY(key)
> );
>
> Create a csv file with only one column of data, numbered from 1 to 1008.
>
> Make two copies of the file and name them csv_test.csv and csv_test2.csv.
>
> Edit csv_test.csv and change the entry 1000 to 500.
>
> Edit csv_test2.csv and change the entry 900 to 500.
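The file-preparation steps above can be scripted. This is a hypothetical sketch (it assumes GNU seq and sed; the filenames come from the steps above):

```shell
# One column of data, numbered 1 to 1008:
seq 1 1008 > csv_test.csv
# Make a second copy of the file:
cp csv_test.csv csv_test2.csv
# In csv_test.csv, change the entry 1000 to 500:
sed -i 's/^1000$/500/' csv_test.csv
# In csv_test2.csv, change the entry 900 to 500:
sed -i 's/^900$/500/' csv_test2.csv
```

(On BSD/macOS, `sed -i` needs an explicit suffix argument, e.g. `sed -i ''`.)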
>
> On 9.0.12 Server
>
> COPY public.table1 FROM 'csv_test.csv';
>
> ERROR: duplicate key value violates unique constraint "table1_pkey"
> DETAIL: Key (key)=(500) already exists.
> CONTEXT: COPY table1, line 1000: "500"
>
> COPY public.table1 FROM 'csv_test2.csv';
>
> ERROR: duplicate key value violates unique constraint "table1_pkey"
> DETAIL: Key (key)=(500) already exists.
> CONTEXT: COPY table1, line 900: "500"
>
> Both times the context gave us the correct information.
>
> Now try the same thing on 9.2.4 Server
>
> COPY public.table1 FROM 'csv_test.csv';
>
> ERROR: duplicate key value violates unique constraint "table1_pkey"
> DETAIL: Key (key)=(500) already exists.
> CONTEXT: COPY table1, line 1000: "500"
>
> COPY public.table1 FROM 'csv_test2.csv';
>
> ERROR: duplicate key value violates unique constraint "table1_pkey"
> DETAIL: Key (key)=(500) already exists.
> CONTEXT: COPY table1, line 1000: "1000"
>
> As you can see, the second test reported the last line of the batch of
> tuples being inserted, not the line that actually failed.
>
> Make a copy of csv_test2.csv and name it csv_test3.csv.
> Edit csv_test3.csv and remove all entries after 994.
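The third test file can be produced similarly. A hypothetical sketch (assumes GNU coreutils/sed; csv_test2.csv is regenerated here so the snippet is self-contained):

```shell
# csv_test2.csv as created earlier: 1..1008 with entry 900 changed to 500.
seq 1 1008 | sed 's/^900$/500/' > csv_test2.csv
# Copy it and remove all entries after 994 (i.e. keep the first 994 lines):
head -n 994 csv_test2.csv > csv_test3.csv
```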
>
> COPY public.table1 FROM 'csv_test3.csv';
>
> ERROR: duplicate key value violates unique constraint "table1_pkey"
> DETAIL: Key (key)=(500) already exists.
> CONTEXT: COPY table1, line 995: ""
>
> If the final batch contains fewer than 1000 lines, the error reports the
> line after the last line, with a content of "".
Hmm, yeah, it's quite evident what's happening: the server reports the
last line that was *read*, not the line where the error happened.

Committed a fix for this. Unfortunately, we only keep the most recently
read line buffered in text format, so after the fix you'll get only the
line number, not the contents of the failing line:
postgres=# copy foo from '/tmp/foo';
ERROR: duplicate key value violates unique constraint "foo_pkey"
DETAIL: Key (id)=(4500) already exists.
CONTEXT: COPY foo, line 4500
Thanks for the report!
- Heikki
--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs