'Re: When to store data that could be derived'

[prev in list] [next in list] [prev in thread] [next in thread] 

List:       postgresql-general
Subject:    Re: When to store data that could be derived
From:       Frank <frank () chagford ! com>
Date:       2019-03-26 14:37:38
Message-ID: 24fa9e9a-20e8-9bf1-2400-9e9d12d6bf69 () chagford ! com
[Download RAW message or body]

On 2019-03-25 5:44 PM, Frank wrote:
> 
> 
> On reflection, I have not been consistent with my use of indexes, and I 
> think that will affect the query plan.
> 
> There are at least two issues -
> 
> 1. The first table defined in the VIEW is ccc.ar_tran_inv. It has the 
> following index -
>      "arinv_cust_date" btree (cust_row_id NULLS FIRST, tran_date NULLS 
> FIRST) WHERE deleted_id = 0
> 
> I have not used "WHERE deleted_id = 0" when constructing the VIEW, but I 
> have used "WHERE posted = '1'". I don't think the index can be used with 
> this setup.
> 
> 2. The complicated table in the VIEW is ccc.ar_rec_subtran. Various 
> columns such as tran_date and posted are retrieved via CASE expressions 
> from two underlying tables. Those tables have certain indexes defined, 
> but I cannot see how they can be utilised from my current setup.
> 
> I think I should spend some time tidying this up before you try to make 
> sense of the query plan. Any tips on how to improve it will be appreciated.
> 

I have spent a lot of time testing various permutations and trying to 
understand them using EXPLAIN.

My original concern was the overhead of calculating derived data. I now 
realise that it is more important to get the indexes right, as that has 
a much bigger impact on performance.

The VIEW that I have been using for testing is actually masking the 
problem. The view combines four tables, three of which are 
straightforward, easy to index, and fast. The fourth one is complex, 
difficult to index, and comparatively slow. So I forgot about the VIEW 
and concentrated on the complex table.

I now understand the caveats I received earlier in this thread. It seems 
impossible to make use of the indexes on the JOINed tables in the 
following query. I did create an index on cust_row_id in the main table, 
and it made a difference with a simple query, but by the time I added 
the JOINs, the improvement was hardly noticeable.

This is the query that I used -

EXPLAIN SELECT COUNT(*)
     FROM ccc.ar_rec_subtran a
     LEFT JOIN ccc.ar_tran_rec_det z ON z.row_id = a.tran_det_row_id
     LEFT JOIN ccc.ar_tran_rec y ON y.row_id = z.tran_row_id
     LEFT JOIN ccc.cb_tran_rec_det x ON x.row_id = a.tran_det_row_id
     LEFT JOIN ccc.cb_tran_rec w ON w.row_id = x.tran_row_id
WHERE a.cust_row_id = 4 AND
     CASE
         WHEN a.tran_type = 'ar_rec' THEN y.tran_date
         WHEN a.tran_type = 'cb_rec' THEN w.tran_date
     END BETWEEN '2015-05-01' AND '2015-05-31' AND
     CASE
         WHEN a.tran_type = 'ar_rec' THEN y.posted
         WHEN a.tran_type = 'cb_rec' THEN w.posted
     END = true AND
     a.deleted_id = 0;

Attached is the EXPLAIN for this one.

I also ran the same query with the following WHERE clause -

WHERE a.cust_row_id = 4 AND
     ((a.tran_type = 'ar_rec' AND
         y.tran_date BETWEEN '2015-05-01' AND '2015-05-31') OR
     (a.tran_type = 'cb_rec' AND
         w.tran_date BETWEEN '2015-05-01' AND '2015-05-31')) AND
     ((a.tran_type = 'ar_rec' AND y.posted = true) OR
         (a.tran_type = 'cb_rec' AND w.posted = true)) AND
     a.deleted_id = 0;

The timings were virtually identical, so I have not attached that EXPLAIN.

Frank

["explain_2" (text/plain)]

                                                                                      \
QUERY PLAN                                                                            \
                
-------------------------------------------------------------------------------------- \
-------------------------------------------------------------------------------------- \
-------------------------------------------------------------------------------------- \
-------------------------------------------------------------------------------------- \
-------------------------------------------------------------------------------------- \
-----------------------------------------------------------------------------------  \
                Aggregate  (cost=1514.75..1514.76 rows=1 width=8)
   ->  Hash Left Join  (cost=895.41..1514.49 rows=102 width=0)
         Hash Cond: (a.tran_det_row_id = x.row_id)
         Filter: (CASE WHEN ((a.tran_type)::text = 'ar_rec'::text) THEN y.posted WHEN \
((a.tran_type)::text = 'cb_rec'::text) THEN w.posted ELSE NULL::boolean END AND (CASE \
WHEN ((a.tran_type)::text = 'ar_rec'::text) THEN y.tran_date WHEN \
((a.tran_type)::text = 'cb_rec'::text) THEN w.tran_date ELSE NULL::date END >= \
'2015-05-01'::date) AND (CASE WHEN ((a.tran_type)::text = 'ar_rec'::text) THEN \
y.tran_date WHEN ((a.tran_type)::text = 'cb_rec'::text) THEN w.tran_date ELSE \
                NULL::date END <= '2015-05-31'::date))
         ->  Hash Right Join  (cost=893.05..1505.23 rows=1835 width=16)
               Hash Cond: (y.row_id = z.tran_row_id)
               ->  Seq Scan on ar_tran_rec y  (cost=0.00..511.25 rows=22025 width=9)
               ->  Hash  (cost=870.12..870.12 rows=1835 width=15)
                     ->  Hash Right Join  (cost=343.97..870.12 rows=1835 width=15)
                           Hash Cond: (z.row_id = a.tran_det_row_id)
                           ->  Seq Scan on ar_tran_rec_det z  (cost=0.00..397.53 \
                rows=22053 width=8)
                           ->  Hash  (cost=321.03..321.03 rows=1835 width=11)
                                 ->  Bitmap Heap Scan on ar_rec_subtran a  \
(cost=38.51..321.03 rows=1835 width=11)  Recheck Cond: (cust_row_id = 4)
                                       Filter: (deleted_id = 0)
                                       ->  Bitmap Index Scan on ar_sub_cust  \
(cost=0.00..38.05 rows=1835 width=0)  Index Cond: (cust_row_id = 4)
         ->  Hash  (cost=2.23..2.23 rows=10 width=9)
               ->  Hash Left Join  (cost=1.04..2.23 rows=10 width=9)
                     Hash Cond: (x.tran_row_id = w.row_id)
                     ->  Seq Scan on cb_tran_rec_det x  (cost=0.00..1.10 rows=10 \
                width=8)
                     ->  Hash  (cost=1.02..1.02 rows=2 width=9)
                           ->  Seq Scan on cb_tran_rec w  (cost=0.00..1.02 rows=2 \
width=9)



[prev in list] [next in list] [prev in thread] [next in thread]
Configure | About | News | Add a list | Sponsored by KoreLogic