'=?GBK?Q?Re:_How_to_improve_the_performance_of_my_SQL_query=A3=BF?='

[prev in list] [next in list] [prev in thread] [next in thread] 

List:       postgresql-general
Subject:    =?GBK?Q?Re:_How_to_improve_the_performance_of_my_SQL_query=A3=BF?=
From:       gzh  <gzhcoder () 126 ! com>
Date:       2023-07-28 11:50:29
Message-ID: 4f2b7ce.6eac.1899c560173.Coremail.gzhcoder () 126 ! com
[Download RAW message or body]

[Attachment #2 (text/plain)]

Thank you very much for taking the time to reply to my question. 




> The problem is none of the explains you sent match with the description above. The \
> last one when you forced the optimizer to go with index scan (SET enable_seqscan TO \
> off;) the chosen index seems to be one composed by ms_cd and cd_ate (following your \
> standard of 2 characters column name). There may have a couple of explanations to \
> this:

> - One is that the index may not be exactly the same as described above;

The primary key of the tbl_sha table consists of several fields, and ms_cd is just \
one of them. I just explained the definitions of the fields used in the query SQL.




> - Another one is the order in the index. Because you have a composed index the \
> order of the columns in the index matters, and it seems the order is (ms_cd, et_cd, \
> etrys). I wonder if you could recreate this index with the following order: (ms_cd, \
> etrys, et_cd) and run the same query;

The index of TBL_SHA table is defined as follows.




CREATE INDEX index_search_01 ON mdb.TBL_SHA USING btree (MS_CD, ETRYS)

CREATE INDEX index_search_02 ON mdb.TBL_SHA USING btree (ET_CD, ETRYS)

CREATE INDEX index_search_03 ON mdb.TBL_SHA USING btree (MS_CD, ET_DAY, BK_CD, FR_CD, \
RM_CD)




> You can also try to trick the optimizer, for example, what is the result (and \
> explain) of the below query?

> 

> WITH tbi (ry_cd) AS (

> SELECT tbl_inf.ry_cd

> FROM tbl_inf tbi

> WHERE tbi.ms_cd = 'MLD009'

> AND tbl_inf.ry_cd = '00000001'

> ) SELECT COUNT(et_cd) FROM tbl_sha tbs

> JOIN tbi ON tbi.ry_cd = tbs .etrys

> WHERE tbs .ms_cd = 'MLD009';

The SQL execution encountered an error, so I made some modifications.

Please refer to the execution plan.

     WHERE tbi.ms_cd = 'MLD009'

→

     WHERE tbl_inf.ms_cd = 'MLD009'




QUERY PLAN

Limit  (cost=2668811.76..2668811.77 rows=1 width=8) (actual \
time=133555.074..133557.729 rows=1 loops=1)

  ->  Aggregate  (cost=2668811.76..2668811.77 rows=1 width=8) (actual \
time=133555.072..133557.726 rows=1 loops=1)

        ->  Nested Loop  (cost=1000.29..2664512.83 rows=1719572 width=9) (actual \
time=29657.638..133341.053 rows=2113500 loops=1)

              ->  Index Only Scan using tbl_inf_pkc on tbl_inf  (cost=0.29..8.31 \
rows=1 width=9) (actual time=1.316..1.321 rows=1 loops=1)

                    Index Cond: ((ms_cd = 'MLD009'::bpchar) AND (ry_cd = \
'00000001'::bpchar))

                    Heap Fetches: 1

              ->  Gather  (cost=1000.00..2647308.80 rows=1719572 width=18) (actual \
time=29656.318..132969.910 rows=2113500 loops=1)

                    Workers Planned: 2

                    Workers Launched: 2

                    ->  Parallel Seq Scan on tbl_sha tbs  (cost=0.00..2474351.60 \
rows=716488 width=18) (actual time=29654.184..132876.292 rows=704500 loops=3)

                          Filter: ((ms_cd = 'MLD009'::bpchar) AND (etrys = \
'00000001'::bpchar))

                          Rows Removed by Filter: 14678996

Planning Time: 0.164 ms

Execution Time: 133557.767 ms




> Well, adding more resources tends to improve performance, but it's usually not \
> linear and the improvement may not be as large as you want for the extra price you \
> are paying. I would first try to understand the performance problem because using \
> the "add more resources" approach may just delay the problem and it tends to get \
> worse with time as the dataset increases.

I strongly agree with your viewpoint, but I currently don't have a solution in mind \
for the problem.
















At 2023-07-28 04:38:39, "Charly" <carlbsb@gmail.com> wrote:

Hi "gzh",


Based on the info you provided I'm assuming you are trying to use the TBL_SHA primary \
key to do an index-only scan as in you mentioned above you have:
> TBL_SHA
> ms_cd character(6) NOT NULL       -- PRIMARY KEY
> et_cd character(8)
> etrys character(8)

Assuming a composed index here by the 3 columns.


> TBL_INF
> ms_cd character(6) NOT NULL       -- PRIMARY KEY
> ry_cd character(8) NOT NULL       -- PRIMARY KEY
Here it's more clear that there is a composed index based on those 2 columns.



The problem is none of the explains you sent match with the description above. The \
last one when you forced the optimizer to go with index scan (SET enable_seqscan TO \
off;) the chosen index seems to be one composed by ms_cd and cd_ate (following your \
standard of 2 characters column name). There may have a couple of explanations to \
                this:
 - One is that the index may not be exactly the same as described above;
 - Another one is the order in the index. Because you have a composed index the order \
of the columns in the index matters, and it seems the order is (ms_cd, et_cd, etrys). \
I wonder if you could recreate this index with the following order: (ms_cd, etrys, \
et_cd) and run the same query;  There may be other problems happening there, but \
those are the ones I see more evident from the description of the problem.


Giving a closer look to your query I really didn't understand the reasoning to have \
that subselect as it seems only to validate the two relations have common items or to \
validate a parent relationship, in this case you can probably use a join to get the \
same result:  
SELECT COUNT(et_cd) FROM tbl_sha tbs
    JOIN tbi ON tbi.ry_cd = tbs.etrys AND tbi.ms_cd = tbi.ms_cd
WHERE tbi.ms_cd = 'MLD009'
    AND tbl_inf.ry_cd = '00000001';


You can also try to trick the optimizer, for example, what is the result (and \
explain) of the below query?


WITH tbi (ry_cd) AS (
    SELECT tbl_inf.ry_cd
    FROM tbl_inf tbi
    WHERE tbi.ms_cd = 'MLD009'
        AND tbl_inf.ry_cd = '00000001'
) SELECT COUNT(et_cd) FROM tbl_sha tbs
    JOIN tbi ON tbi.ry_cd = tbs .etrys
    WHERE tbs .ms_cd = 'MLD009';


You can alternatively try the CTE using the both columns in the JOIN clause.



On Thu, 27 Jul 2023 at 05:10, gzh <gzhcoder@126.com> wrote:

Thank you for your suggestion. 
The database is hosted on AWS RDS, with the instance having vCPU=2 and RAM=4GB. 
Could the low performance be due to the low configuration of AWS RDS? 
We are considering trying a higher configuration instance.


Well, adding more resources tends to improve performance, but it's usually not linear \
and the improvement may not be as large as you want for the extra price you are \
paying. I would first try to understand the performance problem because using the \
"add more resources" approach may just delay the problem and it tends to get worse \
with time as the dataset increases.



I hope it helps.


--

Regards,


Charly Batista
Shanghai, China
carlbsb@gmail.com 

https://www.linkedin.com/in/charlybatista/

Linux user #391083


“If you have an apple and I have an apple and we exchange these apples then you and \
I will still each have one apple. But if you have an idea and I have an idea and we \
exchange these ideas, then each of us will have two ideas."  George Bernard Shaw \
(1856-1950)


[Attachment #3 (text/html)]

<div style="line-height:1.7;color:#000000;font-size:14px;font-family:Arial"><p \
style="margin: 0;">Thank you very much for taking the time to reply to my \
question.&nbsp;</p><p style="margin: 0;"><br></p><p style="margin: 0;">&gt; The \
problem is none of the explains you sent match with the description above. The last \
one when you forced the optimizer to go with index scan (SET enable_seqscan TO off;) \
the chosen index seems to be one composed by ms_cd and cd_ate (following your \
standard of 2 characters column name). There may have a couple of explanations to \
this:</p><p style="margin: 0;">&gt; - One is that the index may not be exactly the \
same as described above;</p><p style="margin: 0;">The primary key of the tbl_sha \
table consists of several fields, and ms_cd is just one of them. I just explained the \
definitions of the fields used in the query SQL.</p><p style="margin: 0;"><br></p><p \
style="margin: 0;">&gt; - Another one is the order in the index. Because you have a \
composed index the order of the columns in the index matters, and it seems the order \
is (ms_cd, et_cd, etrys). I wonder if you could recreate this index with the \
following order: (ms_cd, etrys, et_cd) and run the same query;</p><p style="margin: \
0;">The index of TBL_SHA table is defined as follows.</p><p style="margin: \
0;"><br></p><p style="margin: 0;">CREATE INDEX index_search_01 ON mdb.TBL_SHA USING \
btree (MS_CD, ETRYS)</p><p style="margin: 0;">CREATE INDEX index_search_02 ON \
mdb.TBL_SHA USING btree (ET_CD, ETRYS)</p><p style="margin: 0;">CREATE INDEX \
index_search_03 ON mdb.TBL_SHA USING btree (MS_CD, ET_DAY, BK_CD, FR_CD, RM_CD)</p><p \
style="margin: 0;"><br></p><p style="margin: 0;">&gt; You can also try to trick the \
optimizer, for example, what is the result (and explain) of the below query?</p><p \
style="margin: 0;">&gt;&nbsp;</p><p style="margin: 0;">&gt; WITH tbi (ry_cd) AS \
(</p><p style="margin: 0;">&gt;&nbsp; &nbsp; &nbsp;SELECT tbl_inf.ry_cd</p><p \
style="margin: 0;">&gt;&nbsp; &nbsp; &nbsp;FROM tbl_inf tbi</p><p style="margin: \
0;">&gt;&nbsp; &nbsp; &nbsp;WHERE tbi.ms_cd = 'MLD009'</p><p style="margin: \
0;">&gt;&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;AND tbl_inf.ry_cd = '00000001'</p><p \
style="margin: 0;">&gt; ) SELECT COUNT(et_cd) FROM tbl_sha tbs</p><p style="margin: \
0;">&gt;&nbsp; &nbsp; &nbsp;JOIN tbi ON tbi.ry_cd = tbs .etrys</p><p style="margin: \
0;">&gt;&nbsp; &nbsp; &nbsp;WHERE tbs .ms_cd = 'MLD009';</p><p style="margin: 0;">The \
SQL execution encountered an error, so I made some modifications.</p><p \
style="margin: 0;">Please refer to the execution plan.</p><p style="margin: \
0;">&nbsp; &nbsp; &nbsp;WHERE tbi.ms_cd = 'MLD009'</p><p style="margin: 0;">→</p><p \
style="margin: 0;">&nbsp; &nbsp; &nbsp;WHERE tbl_inf.ms_cd = 'MLD009'</p><p \
style="margin: 0;"><br></p><p style="margin: 0;">QUERY PLAN</p><p style="margin: \
0;">Limit&nbsp; (cost=2668811.76..2668811.77 rows=1 width=8) (actual \
time=133555.074..133557.729 rows=1 loops=1)</p><p style="margin: 0;">&nbsp; \
-&gt;&nbsp; Aggregate&nbsp; (cost=2668811.76..2668811.77 rows=1 width=8) (actual \
time=133555.072..133557.726 rows=1 loops=1)</p><p style="margin: 0;">&nbsp; &nbsp; \
&nbsp; &nbsp; -&gt;&nbsp; Nested Loop&nbsp; (cost=1000.29..2664512.83 rows=1719572 \
width=9) (actual time=29657.638..133341.053 rows=2113500 loops=1)</p><p \
style="margin: 0;">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; -&gt;&nbsp; Index \
Only Scan using tbl_inf_pkc on tbl_inf&nbsp; (cost=0.29..8.31 rows=1 width=9) (actual \
time=1.316..1.321 rows=1 loops=1)</p><p style="margin: 0;">&nbsp; &nbsp; &nbsp; \
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Index Cond: ((ms_cd = \
'MLD009'::bpchar) AND (ry_cd = '00000001'::bpchar))</p><p style="margin: 0;">&nbsp; \
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Heap Fetches: 1</p><p \
style="margin: 0;">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; -&gt;&nbsp; \
Gather&nbsp; (cost=1000.00..2647308.80 rows=1719572 width=18) (actual \
time=29656.318..132969.910 rows=2113500 loops=1)</p><p style="margin: 0;">&nbsp; \
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Workers Planned: \
2</p><p style="margin: 0;">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; \
&nbsp; &nbsp; Workers Launched: 2</p><p style="margin: 0;">&nbsp; &nbsp; &nbsp; \
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; -&gt;&nbsp; Parallel Seq Scan on \
tbl_sha tbs&nbsp; (cost=0.00..2474351.60 rows=716488 width=18) (actual \
time=29654.184..132876.292 rows=704500 loops=3)</p><p style="margin: 0;">&nbsp; \
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; \
Filter: ((ms_cd = 'MLD009'::bpchar) AND (etrys = '00000001'::bpchar))</p><p \
style="margin: 0;">&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; \
&nbsp; &nbsp; &nbsp; &nbsp; Rows Removed by Filter: 14678996</p><p style="margin: \
0;">Planning Time: 0.164 ms</p><p style="margin: 0;">Execution Time: 133557.767 \
ms</p><p style="margin: 0;"><br></p><p style="margin: 0;">&gt; Well, adding more \
resources tends to improve performance, but it's usually not linear and the \
improvement may not be as large as you want for the extra price you are paying. I \
would first try to understand the performance problem because using the "add more \
resources" approach may just delay the problem and it tends to get worse with time as \
the dataset increases.</p><p style="margin: 0;">I strongly agree with your viewpoint, \
but I currently don't have a solution in mind for the problem.</p><p style="margin: \
0;"><br></p><p style="margin: 0;"><br></p><p style="margin: 0;"><br></p><p \
style="margin: 0;"><br></p><div style="position:relative;zoom:1"></div><div \
id="divNeteaseMailCard"></div><p style="margin: 0;"><br></p><p>At 2023-07-28 \
04:38:39, "Charly" &lt;carlbsb@gmail.com&gt; wrote:</p><blockquote \
id="isReplyContent" style="PADDING-LEFT: 1ex; MARGIN: 0px 0px 0px 0.8ex; BORDER-LEFT: \
#ccc 1px solid"><div dir="ltr"><div dir="ltr">Hi "gzh",</div><div \
dir="ltr"><br></div><div>Based on the info you provided I'm assuming you are trying \
to use the TBL_SHA primary key to do an index-only scan as in you mentioned above you \
have:</div><div></div><div>&gt; TBL_SHA<br>&gt; ms_cd character(6) NOT NULL &nbsp; \
&nbsp; &nbsp; -- PRIMARY KEY<br>&gt; et_cd character(8)<br>&gt; etrys \
character(8)<br></div><div>Assuming a composed index here by the 3 \
columns.<br></div><div><br>&gt; TBL_INF<br>&gt; ms_cd character(6) NOT NULL &nbsp; \
&nbsp; &nbsp; -- PRIMARY KEY<br>&gt; ry_cd character(8) NOT NULL &nbsp; &nbsp; &nbsp; \
-- PRIMARY KEY</div><div>Here it's more clear that there is a composed index based on \
those 2 columns.<br></div><div><br></div><div>The problem is none of the explains you \
sent match with the description above. The last one when you forced the optimizer to \
go with index scan (SET enable_seqscan TO off;) the chosen index seems to be one \
composed by <b>ms_cd</b> and cd_ate (following your standard of 2 characters column \
name). There may have a couple of explanations to this:</div><div>&nbsp;- One is that \
the index may not be exactly the same as described above;</div><div>&nbsp;- Another \
one is the order in the index. Because you have a composed index the order of the \
columns in the index matters, and it seems the order is (ms_cd, et_cd, etrys). I \
wonder if you could recreate this index with the following order: (ms_cd, etrys, \
et_cd) and run the same query;</div><div>&nbsp;There may be other problems happening \
there, but those are the ones I see more evident from the description of the \
problem.</div><div><br></div><div>Giving a closer look to your query I really didn't \
understand the reasoning to have that subselect as it seems only to validate the two \
relations have common items or to validate a parent relationship, in this case you \
can probably use a join to get the same result:</div><div>&nbsp;</div><div>SELECT \
COUNT(et_cd) FROM tbl_sha tbs<br>&nbsp;&nbsp;&nbsp; JOIN tbi ON tbi.ry_cd = tbs.etrys \
AND tbi.ms_cd = tbi.ms_cd<br>WHERE tbi.ms_cd = 'MLD009' <br>&nbsp;&nbsp;&nbsp; AND \
tbl_inf.ry_cd = '00000001';</div><div><br></div><div>You can also try to trick the \
optimizer, for example, what is the result (and explain) of the below \
query?</div><div><br></div><div>WITH tbi (ry_cd) AS (<br>&nbsp; &nbsp; SELECT \
tbl_inf.ry_cd<br>&nbsp; &nbsp; FROM tbl_inf tbi<br>&nbsp; &nbsp; WHERE tbi.ms_cd = \
'MLD009'<br>&nbsp; &nbsp; &nbsp; &nbsp; AND tbl_inf.ry_cd = '00000001'<br>) SELECT \
COUNT(et_cd) FROM tbl_sha tbs <br>&nbsp;&nbsp;&nbsp; JOIN tbi ON tbi.ry_cd = tbs \
.etrys<br>&nbsp;&nbsp;&nbsp; WHERE tbs .ms_cd = \
'MLD009';</div><div><br></div><div>You can alternatively try the CTE using the both \
columns in the JOIN clause.<br></div><br><div class="gmail_quote"><div dir="ltr" \
class="gmail_attr">On Thu, 27 Jul 2023 at 05:10, gzh &lt;<a \
href="mailto:gzhcoder@126.com">gzhcoder@126.com</a>&gt; wrote:<br></div><blockquote \
class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid \
rgb(204,204,204);padding-left:1ex"><div \
style="line-height:1.7;color:rgb(0,0,0);font-size:14px;font-family:Arial"><div \
style="margin:0px"><span style="color:rgb(17,24,39);font-family:&quot;Albert \
sans&quot;,ui-sans-serif,system-ui,-apple-system,BlinkMacSystemFont,&quot;Segoe \
UI&quot;,Roboto,&quot;Helvetica Neue&quot;,Arial,&quot;Noto \
Sans&quot;,sans-serif,&quot;Apple Color Emoji&quot;,&quot;Segoe UI \
Emoji&quot;,&quot;Segoe UI Symbol&quot;,&quot;Noto Color \
Emoji&quot;;font-size:medium">Thank you for your suggestion.&nbsp;</span></div><div \
style="margin:0px"><span style="color:rgb(17,24,39);font-family:&quot;Albert \
sans&quot;,ui-sans-serif,system-ui,-apple-system,BlinkMacSystemFont,&quot;Segoe \
UI&quot;,Roboto,&quot;Helvetica Neue&quot;,Arial,&quot;Noto \
Sans&quot;,sans-serif,&quot;Apple Color Emoji&quot;,&quot;Segoe UI \
Emoji&quot;,&quot;Segoe UI Symbol&quot;,&quot;Noto Color \
Emoji&quot;;font-size:medium">The database is hosted on AWS RDS, with the instance \
having vCPU=2 and RAM=4GB.&nbsp;</span></div><div style="margin:0px"><span \
style="color:rgb(17,24,39);font-family:&quot;Albert \
sans&quot;,ui-sans-serif,system-ui,-apple-system,BlinkMacSystemFont,&quot;Segoe \
UI&quot;,Roboto,&quot;Helvetica Neue&quot;,Arial,&quot;Noto \
Sans&quot;,sans-serif,&quot;Apple Color Emoji&quot;,&quot;Segoe UI \
Emoji&quot;,&quot;Segoe UI Symbol&quot;,&quot;Noto Color \
Emoji&quot;;font-size:medium">Could the low performance be due to the low \
configuration of AWS RDS?&nbsp;</span></div><div style="margin:0px"><span \
style="color:rgb(17,24,39);font-family:&quot;Albert \
sans&quot;,ui-sans-serif,system-ui,-apple-system,BlinkMacSystemFont,&quot;Segoe \
UI&quot;,Roboto,&quot;Helvetica Neue&quot;,Arial,&quot;Noto \
Sans&quot;,sans-serif,&quot;Apple Color Emoji&quot;,&quot;Segoe UI \
Emoji&quot;,&quot;Segoe UI Symbol&quot;,&quot;Noto Color \
Emoji&quot;;font-size:medium">We are considering trying a higher configuration \
instance.</span></div></div></blockquote><div><br></div>Well, adding more resources \
tends to improve performance, but it's usually not linear and the improvement may not \
be as large as you want for the extra price you are paying. I would first try to \
understand the performance problem because using the "add more resources" approach \
may just delay the problem and it tends to get worse with time as the dataset \
increases. <br></div><div class="gmail_quote"><br></div><div class="gmail_quote">I \
hope it helps.<br clear="all"></div><br><span class="gmail_signature_prefix">-- \
</span><br><div dir="ltr" class="gmail_signature"><div \
dir="ltr"><div>Regards,</div><div><br></div><div><b>Charly \
Batista</b></div><div><font size="1"><i>Shanghai, China</i></font></div><div><font \
size="1" color="#0000ee"><i><u><a href="mailto:carlbsb@gmail.com" \
target="_blank">carlbsb@gmail.com</a>&nbsp;</u></i></font><br></div><div><font \
size="1"><i><a href="https://www.linkedin.com/in/charlybatista/" \
target="_blank">https://www.linkedin.com/in/charlybatista/</a></i></font><br></div><div><i><font \
size="1">Linux user #391083<br></font></i></div><div><div><i><br>“If you have an \
apple and I have an apple and we exchange these apples then you and I will still each \
have one apple. But if you have an idea and I have an idea and we exchange these \
ideas, then each of us will have two ideas."<br>&nbsp; &nbsp; &nbsp; George Bernard \
Shaw (1856-1950)</i></div></div></div></div></div> </blockquote></div>



[prev in list] [next in list] [prev in thread] [next in thread]
Configure | About | News | Add a list | Sponsored by KoreLogic