List: postgresql-general
Subject: Re: Inserts create new records in reporting table
From: Ron <ronljohnsonjr () gmail ! com>
Date: 2022-12-31 8:55:10
Message-ID: 348ac840-800a-7402-c2ea-e64167a68e2d () gmail ! com
200K *total* records, or 200K records per day/month/quarter/etc? Because
200K records isn't that much on modern hardware. Heck, it wasn't much 20
years ago on "Enterprise" hardware.
And you don't mention the window for loading the data into "the" table, and
then into the reporting table.
A trigger will slow down each insert into "the" table, since each operation
will do 2x (or more) work. Only you know whether it's more important to get
the data into "the" table ASAP and then load into the reporting table
at your leisure, or to get it into both tables ASAP.
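The per-row trigger approach can be sketched end to end. This is a hypothetical example using SQLite via Python's sqlite3 module so it runs with no server; in PostgreSQL the equivalent would be a plpgsql trigger function plus CREATE TRIGGER, and all table and column names here are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ar_transactions (
    location     TEXT,
    invoice_date TEXT,   -- ISO yyyy-mm-dd
    amount       REAL
);
-- Reporting rollup, keyed by location and calendar month.
CREATE TABLE ar_monthly (
    location TEXT,
    month    TEXT,       -- yyyy-mm
    total    REAL,
    PRIMARY KEY (location, month)
);
-- Every insert into the base table also maintains the rollup:
-- this is the "2x (or more) work" done on each row.
CREATE TRIGGER ar_rollup AFTER INSERT ON ar_transactions
BEGIN
    INSERT OR IGNORE INTO ar_monthly (location, month, total)
        VALUES (NEW.location, substr(NEW.invoice_date, 1, 7), 0.0);
    UPDATE ar_monthly SET total = total + NEW.amount
     WHERE location = NEW.location
       AND month = substr(NEW.invoice_date, 1, 7);
END;
""")

rows = [
    ("Dallas",  "2012-01-09", 7500.00),
    ("Dallas",  "2012-11-19", 1200.00),
    ("Dallas",  "2012-04-02",  500.00),
    ("Phoenix", "2012-01-03", 1200.00),
    ("Phoenix", "2012-04-12", 7500.00),
]
conn.executemany("INSERT INTO ar_transactions VALUES (?, ?, ?)", rows)
for r in conn.execute("SELECT * FROM ar_monthly ORDER BY location, month"):
    print(r)
```

The same rollup can instead be built once, after the load, with a single GROUP BY — that is the "at your leisure" option.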
No matter what you do, create as few indices as possible before the load
phase. Add the rest afterwards.
Lastly, remember "locality of data". That makes caches much more
effective. A single query on "the" table when it's clustered on an index
tuned to support the reporting table will run Really Fast, whereas a query
(or trigger) which bounces all around the table in each fetch and insert
won't be nearly as fast.
Really lastly: *test each method*.
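A rough harness for testing the two methods side by side, sketched in SQLite (schema and row counts are made up); the timings are machine-dependent, which is exactly why it's worth running against your own data:

```python
import sqlite3
import time

def timed_load(with_trigger, n=50_000):
    """Load n rows either with a per-row rollup trigger, or with a
    plain load followed by one set-based aggregate query."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE ar (location TEXT, invoice_date TEXT, amount REAL);
        CREATE TABLE ar_monthly (location TEXT, month TEXT, total REAL,
                                 PRIMARY KEY (location, month));
    """)
    if with_trigger:
        conn.executescript("""
            CREATE TRIGGER rollup AFTER INSERT ON ar BEGIN
                INSERT OR IGNORE INTO ar_monthly
                    VALUES (NEW.location, substr(NEW.invoice_date, 1, 7), 0.0);
                UPDATE ar_monthly SET total = total + NEW.amount
                 WHERE location = NEW.location
                   AND month = substr(NEW.invoice_date, 1, 7);
            END;
        """)
    rows = (("Dallas", "2012-%02d-01" % (i % 12 + 1), 1.0) for i in range(n))
    t0 = time.perf_counter()
    conn.executemany("INSERT INTO ar VALUES (?, ?, ?)", rows)
    if not with_trigger:
        # Load-then-aggregate: one GROUP BY instead of per-row trigger work.
        conn.execute("""
            INSERT INTO ar_monthly
            SELECT location, substr(invoice_date, 1, 7), sum(amount)
              FROM ar GROUP BY 1, 2
        """)
    return time.perf_counter() - t0, conn

for label, flag in (("trigger", True), ("bulk+aggregate", False)):
    secs, conn = timed_load(flag)
    total = conn.execute("SELECT sum(total) FROM ar_monthly").fetchone()[0]
    print(f"{label}: {secs:.3f}s, rollup total {total}")
```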
On 12/31/22 01:02, Bret Stern wrote:
> Love the forum,
>
> I'm bringing 100k - 200k AR transactions into a table from several
> separate ERP companies.
>
> As I insert the records, does it make sense to run a trigger event to
> build or insert data into
>
> a reporting table (to save generating a query to build the reporting table
> later.)
>
>
> For example, I bring in 120k records which contain invoice dates from the
> beginning of time
>
> in the ERP system. (e.g., records with transactions dated 2010...2022)
>
>
> Conceptual data
>
> [Location][InvoiceDate][Amount]
>
> Dallas 2012-01-09 7500.00
> Dallas 2012-11-19 1200.00
> Dallas 2012-04-02 500.00
> Phoenix 2012-01-03 1200.00
> Phoenix 2012-04-12 7500.00
>
> Would it be possible, or would it make sense to create a trigger which
> populated another
>
> table like below after each insert... or is it a resource killer... or just
> unnecessary, and smarter to create a query
>
> that computes the month columns on the fly from the invoice dates?
>
>
> [Location][InvoiceDate][Amount] [Jan] [Feb] [Mar] [Apr] [May] [Jun] [Jul] [Aug] [Sep] [Oct] [Nov] [Dec]
>
> Dallas   2012-01-09  7500.00
> Dallas   2012-11-19  1200.00
> Dallas   2012-04-02   500.00
> Phoenix  2012-01-03  1200.00
> Phoenix  2012-04-12  7500.00
>
>
> Bret
>
>
>
--
Born in Arizona, moved to Babylonia.