'Re: Query plan: SELECT vs INSERT from same select'

[prev in list] [next in list] [prev in thread] [next in thread] 

List:       postgresql-general
Subject:    Re: Query plan: SELECT vs INSERT from same select
From:       Alban Hertroys <haramrae () gmail ! com>
Date:       2019-07-24 9:24:23
Message-ID: 346DB5C5-3B26-4FA6-9B17-93FDED88A738 () gmail ! com
[Download RAW message or body]


> On 23 Jul 2019, at 22:29, Alexander Voytsekhovskyy <young.inbox@gmail.com> wrote:
> 
> I have quite complicated query:
> 
> SELECT axis_x1, axis_y1, SUM(delivery_price)  as v_1 FROM (
> SELECT to_char(delivery_data.delivery_date, 'YYYY-MM') as axis_x1, \
> clients.id_client as axis_y1, delivery_data.amount * production_price.price * \
> groups.discount as delivery_price 
> FROM delivery_data 
> JOIN client_tt ON (client_tt.id_client_tt = delivery_data.id_client_tt) 
> JOIN clients ON (client_tt.id_client = clients.id_client) 
> JOIN production ON (production.id = delivery_data.id_product) 
> JOIN groups ON (groups.id = delivery_data.delivery_group_id AND client_tt.id_group \
> = groups.id AND groups.id = clients.id_group) 

Are client_tt.id_group and clients.id_group ever different from each other? It looks \
like you might have redundant information there, but... If they are guaranteed to be \
the same then you don't need the JOIN to clients, which would both remove a JOIN and \
reduce the complexity of the JOIN condition on groups.

Or (assuming the group id's are indeed supposed to be equal), you could
 JOIN clients ON (client_tt.id_client = clients.id_client AND client_tt.id_group = \
clients.id_group) instead of putting that condition within the JOIN condition on \
groups.

I don't think either option will make a huge difference (the first probably more than \
the second, as it reduces an entire join), but it could be enough to help the \
database figure out a better plan.

> LEFT JOIN production_price on (delivery_data.id_product = \
> production_price.id_production AND groups.price_list_id = \
> production_price.price_list_id AND delivery_data.delivery_date BETWEEN \
> production_price.date_from AND production_price.date_to)  
> WHERE delivery_data.delivery_date between '2019-03-01' AND '2019-06-30' 
> AND delivery_data.delivery_group_id IN (...short list of values...) 
> AND delivery_data.id_product IN ()) AS tmpsource 

You don't have a price if your goods weren't produced in the delivery window you set? \
Or do you have goods that get delivered without having a price?

You seem to be using this query for a report on nett sales by month, but I have my \
doubts whether that LEFT JOIN, and especially the condition on the production date \
window, is really what you want: Your formula for delivery_price includes the price \
column from that LEFT JOIN, so you're going to get 0 values when there is no \
production_price record in your delivery-window, resulting in a SUM that's too low if \
the product was produced before (or after, but that seems unlikely) the delivery \
window.

> WHERE TRUE 

This line is unnecessary.

> GROUP BY GROUPING SETS ((axis_x1, axis_y1), (axis_x1), (axis_y1), ())

Apparently (I'm new to these statements), CUBE (axis_x1, axis_y1) is a shorthand for \
the above. They seem to have been introduced at the same time (in 9.6?). See: \
https://www.postgresql.org/docs/11/queries-table-expressions.html#QUERIES-GROUPING-SETS


> It runs well, took 1s and returns 4000 rows.

I won't go into the performance issue ash this point, other more knowledgeable people \
already did.

Regards,

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.


[prev in list] [next in list] [prev in thread] [next in thread]
Configure | About | News | Add a list | Sponsored by KoreLogic