[prev in list] [next in list] [prev in thread] [next in thread]
List: postgresql-general
Subject: =?GBK?Q?Re:_How_to_improve_the_performance_of_my_SQL_query=A3=BF?=
From: gzh <gzhcoder () 126 ! com>
Date: 2023-07-28 11:50:29
Message-ID: 4f2b7ce.6eac.1899c560173.Coremail.gzhcoder () 126 ! com
[Download RAW message or body]
[Attachment #2 (text/plain)]
Thank you very much for taking the time to reply to my question.
> The problem is none of the explains you sent match with the description above. The \
> last one when you forced the optimizer to go with index scan (SET enable_seqscan TO \
> off;) the chosen index seems to be one composed by ms_cd and cd_ate (following your \
> standard of 2 characters column name). There may have a couple of explanations to \
> this:
> - One is that the index may not be exactly the same as described above;
The primary key of the tbl_sha table consists of several fields, and ms_cd is just \
one of them. I just explained the definitions of the fields used in the query SQL.
> - Another one is the order in the index. Because you have a composed index the \
> order of the columns in the index matters, and it seems the order is (ms_cd, et_cd, \
> etrys). I wonder if you could recreate this index with the following order: (ms_cd, \
> etrys, et_cd) and run the same query;
The index of TBL_SHA table is defined as follows.
CREATE INDEX index_search_01 ON mdb.TBL_SHA USING btree (MS_CD, ETRYS)
CREATE INDEX index_search_02 ON mdb.TBL_SHA USING btree (ET_CD, ETRYS)
CREATE INDEX index_search_03 ON mdb.TBL_SHA USING btree (MS_CD, ET_DAY, BK_CD, FR_CD, \
RM_CD)
> You can also try to trick the optimizer, for example, what is the result (and \
> explain) of the below query?
>
> WITH tbi (ry_cd) AS (
> SELECT tbl_inf.ry_cd
> FROM tbl_inf tbi
> WHERE tbi.ms_cd = 'MLD009'
> AND tbl_inf.ry_cd = '00000001'
> ) SELECT COUNT(et_cd) FROM tbl_sha tbs
> JOIN tbi ON tbi.ry_cd = tbs .etrys
> WHERE tbs .ms_cd = 'MLD009';
The SQL execution encountered an error, so I made some modifications.
Please refer to the execution plan.
WHERE tbi.ms_cd = 'MLD009'
¡ú
WHERE tbl_inf.ms_cd = 'MLD009'
QUERY PLAN
Limit (cost=2668811.76..2668811.77 rows=1 width=8) (actual \
time=133555.074..133557.729 rows=1 loops=1)
-> Aggregate (cost=2668811.76..2668811.77 rows=1 width=8) (actual \
time=133555.072..133557.726 rows=1 loops=1)
-> Nested Loop (cost=1000.29..2664512.83 rows=1719572 width=9) (actual \
time=29657.638..133341.053 rows=2113500 loops=1)
-> Index Only Scan using tbl_inf_pkc on tbl_inf (cost=0.29..8.31 \
rows=1 width=9) (actual time=1.316..1.321 rows=1 loops=1)
Index Cond: ((ms_cd = 'MLD009'::bpchar) AND (ry_cd = \
'00000001'::bpchar))
Heap Fetches: 1
-> Gather (cost=1000.00..2647308.80 rows=1719572 width=18) (actual \
time=29656.318..132969.910 rows=2113500 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on tbl_sha tbs (cost=0.00..2474351.60 \
rows=716488 width=18) (actual time=29654.184..132876.292 rows=704500 loops=3)
Filter: ((ms_cd = 'MLD009'::bpchar) AND (etrys = \
'00000001'::bpchar))
Rows Removed by Filter: 14678996
Planning Time: 0.164 ms
Execution Time: 133557.767 ms
> Well, adding more resources tends to improve performance, but it's usually not \
> linear and the improvement may not be as large as you want for the extra price you \
> are paying. I would first try to understand the performance problem because using \
> the "add more resources" approach may just delay the problem and it tends to get \
> worse with time as the dataset increases.
I strongly agree with your viewpoint, but I currently don't have a solution in mind \
for the problem.
At 2023-07-28 04:38:39, "Charly" <carlbsb@gmail.com> wrote:
Hi "gzh",
Based on the info you provided I'm assuming you are trying to use the TBL_SHA primary \
key to do an index-only scan as in you mentioned above you have:
> TBL_SHA
> ms_cd character(6) NOT NULL -- PRIMARY KEY
> et_cd character(8)
> etrys character(8)
Assuming a composed index here by the 3 columns.
> TBL_INF
> ms_cd character(6) NOT NULL -- PRIMARY KEY
> ry_cd character(8) NOT NULL -- PRIMARY KEY
Here it's more clear that there is a composed index based on those 2 columns.
The problem is none of the explains you sent match with the description above. The \
last one when you forced the optimizer to go with index scan (SET enable_seqscan TO \
off;) the chosen index seems to be one composed by ms_cd and cd_ate (following your \
standard of 2 characters column name). There may have a couple of explanations to \
this:
- One is that the index may not be exactly the same as described above;
- Another one is the order in the index. Because you have a composed index the order \
of the columns in the index matters, and it seems the order is (ms_cd, et_cd, etrys). \
I wonder if you could recreate this index with the following order: (ms_cd, etrys, \
et_cd) and run the same query; There may be other problems happening there, but \
those are the ones I see more evident from the description of the problem.
Giving a closer look to your query I really didn't understand the reasoning to have \
that subselect as it seems only to validate the two relations have common items or to \
validate a parent relationship, in this case you can probably use a join to get the \
same result:
SELECT COUNT(et_cd) FROM tbl_sha tbs
JOIN tbi ON tbi.ry_cd = tbs.etrys AND tbi.ms_cd = tbi.ms_cd
WHERE tbi.ms_cd = 'MLD009'
AND tbl_inf.ry_cd = '00000001';
You can also try to trick the optimizer, for example, what is the result (and \
explain) of the below query?
WITH tbi (ry_cd) AS (
SELECT tbl_inf.ry_cd
FROM tbl_inf tbi
WHERE tbi.ms_cd = 'MLD009'
AND tbl_inf.ry_cd = '00000001'
) SELECT COUNT(et_cd) FROM tbl_sha tbs
JOIN tbi ON tbi.ry_cd = tbs .etrys
WHERE tbs .ms_cd = 'MLD009';
You can alternatively try the CTE using the both columns in the JOIN clause.
On Thu, 27 Jul 2023 at 05:10, gzh <gzhcoder@126.com> wrote:
Thank you for your suggestion.
The database is hosted on AWS RDS, with the instance having vCPU=2 and RAM=4GB.
Could the low performance be due to the low configuration of AWS RDS?
We are considering trying a higher configuration instance.
Well, adding more resources tends to improve performance, but it's usually not linear \
and the improvement may not be as large as you want for the extra price you are \
paying. I would first try to understand the performance problem because using the \
"add more resources" approach may just delay the problem and it tends to get worse \
with time as the dataset increases.
I hope it helps.
--
Regards,
Charly Batista
Shanghai, China
carlbsb@gmail.com
https://www.linkedin.com/in/charlybatista/
Linux user #391083
¡°If you have an apple and I have an apple and we exchange these apples then you and \
I will still each have one apple. But if you have an idea and I have an idea and we \
exchange these ideas, then each of us will have two ideas." George Bernard Shaw \
(1856-1950)
[Attachment #3 (text/html)]
<div style="line-height:1.7;color:#000000;font-size:14px;font-family:Arial"><p \
style="margin: 0;">Thank you very much for taking the time to reply to my \
question. </p><p style="margin: 0;"><br></p><p style="margin: 0;">> The \
problem is none of the explains you sent match with the description above. The last \
one when you forced the optimizer to go with index scan (SET enable_seqscan TO off;) \
the chosen index seems to be one composed by ms_cd and cd_ate (following your \
standard of 2 characters column name). There may have a couple of explanations to \
this:</p><p style="margin: 0;">> - One is that the index may not be exactly the \
same as described above;</p><p style="margin: 0;">The primary key of the tbl_sha \
table consists of several fields, and ms_cd is just one of them. I just explained the \
definitions of the fields used in the query SQL.</p><p style="margin: 0;"><br></p><p \
style="margin: 0;">> - Another one is the order in the index. Because you have a \
composed index the order of the columns in the index matters, and it seems the order \
is (ms_cd, et_cd, etrys). I wonder if you could recreate this index with the \
following order: (ms_cd, etrys, et_cd) and run the same query;</p><p style="margin: \
0;">The index of TBL_SHA table is defined as follows.</p><p style="margin: \
0;"><br></p><p style="margin: 0;">CREATE INDEX index_search_01 ON mdb.TBL_SHA USING \
btree (MS_CD, ETRYS)</p><p style="margin: 0;">CREATE INDEX index_search_02 ON \
mdb.TBL_SHA USING btree (ET_CD, ETRYS)</p><p style="margin: 0;">CREATE INDEX \
index_search_03 ON mdb.TBL_SHA USING btree (MS_CD, ET_DAY, BK_CD, FR_CD, RM_CD)</p><p \
style="margin: 0;"><br></p><p style="margin: 0;">> You can also try to trick the \
optimizer, for example, what is the result (and explain) of the below query?</p><p \
style="margin: 0;">> </p><p style="margin: 0;">> WITH tbi (ry_cd) AS \
(</p><p style="margin: 0;">> SELECT tbl_inf.ry_cd</p><p \
style="margin: 0;">> FROM tbl_inf tbi</p><p style="margin: \
0;">> WHERE tbi.ms_cd = 'MLD009'</p><p style="margin: \
0;">> AND tbl_inf.ry_cd = '00000001'</p><p \
style="margin: 0;">> ) SELECT COUNT(et_cd) FROM tbl_sha tbs</p><p style="margin: \
0;">> JOIN tbi ON tbi.ry_cd = tbs .etrys</p><p style="margin: \
0;">> WHERE tbs .ms_cd = 'MLD009';</p><p style="margin: 0;">The \
SQL execution encountered an error, so I made some modifications.</p><p \
style="margin: 0;">Please refer to the execution plan.</p><p style="margin: \
0;"> WHERE tbi.ms_cd = 'MLD009'</p><p style="margin: 0;">¡ú</p><p \
style="margin: 0;"> WHERE tbl_inf.ms_cd = 'MLD009'</p><p \
style="margin: 0;"><br></p><p style="margin: 0;">QUERY PLAN</p><p style="margin: \
0;">Limit (cost=2668811.76..2668811.77 rows=1 width=8) (actual \
time=133555.074..133557.729 rows=1 loops=1)</p><p style="margin: 0;"> \
-> Aggregate (cost=2668811.76..2668811.77 rows=1 width=8) (actual \
time=133555.072..133557.726 rows=1 loops=1)</p><p style="margin: 0;"> \
-> Nested Loop (cost=1000.29..2664512.83 rows=1719572 \
width=9) (actual time=29657.638..133341.053 rows=2113500 loops=1)</p><p \
style="margin: 0;"> -> Index \
Only Scan using tbl_inf_pkc on tbl_inf (cost=0.29..8.31 rows=1 width=9) (actual \
time=1.316..1.321 rows=1 loops=1)</p><p style="margin: 0;"> \
Index Cond: ((ms_cd = \
'MLD009'::bpchar) AND (ry_cd = '00000001'::bpchar))</p><p style="margin: 0;"> \
Heap Fetches: 1</p><p \
style="margin: 0;"> -> \
Gather (cost=1000.00..2647308.80 rows=1719572 width=18) (actual \
time=29656.318..132969.910 rows=2113500 loops=1)</p><p style="margin: 0;"> \
Workers Planned: \
2</p><p style="margin: 0;"> \
Workers Launched: 2</p><p style="margin: 0;"> \
-> Parallel Seq Scan on \
tbl_sha tbs (cost=0.00..2474351.60 rows=716488 width=18) (actual \
time=29654.184..132876.292 rows=704500 loops=3)</p><p style="margin: 0;"> \
\
Filter: ((ms_cd = 'MLD009'::bpchar) AND (etrys = '00000001'::bpchar))</p><p \
style="margin: 0;"> \
Rows Removed by Filter: 14678996</p><p style="margin: \
0;">Planning Time: 0.164 ms</p><p style="margin: 0;">Execution Time: 133557.767 \
ms</p><p style="margin: 0;"><br></p><p style="margin: 0;">> Well, adding more \
resources tends to improve performance, but it's usually not linear and the \
improvement may not be as large as you want for the extra price you are paying. I \
would first try to understand the performance problem because using the "add more \
resources" approach may just delay the problem and it tends to get worse with time as \
the dataset increases.</p><p style="margin: 0;">I strongly agree with your viewpoint, \
but I currently don't have a solution in mind for the problem.</p><p style="margin: \
0;"><br></p><p style="margin: 0;"><br></p><p style="margin: 0;"><br></p><p \
style="margin: 0;"><br></p><div style="position:relative;zoom:1"></div><div \
id="divNeteaseMailCard"></div><p style="margin: 0;"><br></p><p>At 2023-07-28 \
04:38:39, "Charly" <carlbsb@gmail.com> wrote:</p><blockquote \
id="isReplyContent" style="PADDING-LEFT: 1ex; MARGIN: 0px 0px 0px 0.8ex; BORDER-LEFT: \
#ccc 1px solid"><div dir="ltr"><div dir="ltr">Hi "gzh",</div><div \
dir="ltr"><br></div><div>Based on the info you provided I'm assuming you are trying \
to use the TBL_SHA primary key to do an index-only scan as in you mentioned above you \
have:</div><div></div><div>> TBL_SHA<br>> ms_cd character(6) NOT NULL \
-- PRIMARY KEY<br>> et_cd character(8)<br>> etrys \
character(8)<br></div><div>Assuming a composed index here by the 3 \
columns.<br></div><div><br>> TBL_INF<br>> ms_cd character(6) NOT NULL \
-- PRIMARY KEY<br>> ry_cd character(8) NOT NULL \
-- PRIMARY KEY</div><div>Here it's more clear that there is a composed index based on \
those 2 columns.<br></div><div><br></div><div>The problem is none of the explains you \
sent match with the description above. The last one when you forced the optimizer to \
go with index scan (SET enable_seqscan TO off;) the chosen index seems to be one \
composed by <b>ms_cd</b> and cd_ate (following your standard of 2 characters column \
name). There may have a couple of explanations to this:</div><div> - One is that \
the index may not be exactly the same as described above;</div><div> - Another \
one is the order in the index. Because you have a composed index the order of the \
columns in the index matters, and it seems the order is (ms_cd, et_cd, etrys). I \
wonder if you could recreate this index with the following order: (ms_cd, etrys, \
et_cd) and run the same query;</div><div> There may be other problems happening \
there, but those are the ones I see more evident from the description of the \
problem.</div><div><br></div><div>Giving a closer look to your query I really didn't \
understand the reasoning to have that subselect as it seems only to validate the two \
relations have common items or to validate a parent relationship, in this case you \
can probably use a join to get the same result:</div><div> </div><div>SELECT \
COUNT(et_cd) FROM tbl_sha tbs<br> JOIN tbi ON tbi.ry_cd = tbs.etrys \
AND tbi.ms_cd = tbi.ms_cd<br>WHERE tbi.ms_cd = 'MLD009' <br> AND \
tbl_inf.ry_cd = '00000001';</div><div><br></div><div>You can also try to trick the \
optimizer, for example, what is the result (and explain) of the below \
query?</div><div><br></div><div>WITH tbi (ry_cd) AS (<br> SELECT \
tbl_inf.ry_cd<br> FROM tbl_inf tbi<br> WHERE tbi.ms_cd = \
'MLD009'<br> AND tbl_inf.ry_cd = '00000001'<br>) SELECT \
COUNT(et_cd) FROM tbl_sha tbs <br> JOIN tbi ON tbi.ry_cd = tbs \
.etrys<br> WHERE tbs .ms_cd = \
'MLD009';</div><div><br></div><div>You can alternatively try the CTE using the both \
columns in the JOIN clause.<br></div><br><div class="gmail_quote"><div dir="ltr" \
class="gmail_attr">On Thu, 27 Jul 2023 at 05:10, gzh <<a \
href="mailto:gzhcoder@126.com">gzhcoder@126.com</a>> wrote:<br></div><blockquote \
class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid \
rgb(204,204,204);padding-left:1ex"><div \
style="line-height:1.7;color:rgb(0,0,0);font-size:14px;font-family:Arial"><div \
style="margin:0px"><span style="color:rgb(17,24,39);font-family:"Albert \
sans",ui-sans-serif,system-ui,-apple-system,BlinkMacSystemFont,"Segoe \
UI",Roboto,"Helvetica Neue",Arial,"Noto \
Sans",sans-serif,"Apple Color Emoji","Segoe UI \
Emoji","Segoe UI Symbol","Noto Color \
Emoji";font-size:medium">Thank you for your suggestion. </span></div><div \
style="margin:0px"><span style="color:rgb(17,24,39);font-family:"Albert \
sans",ui-sans-serif,system-ui,-apple-system,BlinkMacSystemFont,"Segoe \
UI",Roboto,"Helvetica Neue",Arial,"Noto \
Sans",sans-serif,"Apple Color Emoji","Segoe UI \
Emoji","Segoe UI Symbol","Noto Color \
Emoji";font-size:medium">The database is hosted on AWS RDS, with the instance \
having vCPU=2 and RAM=4GB. </span></div><div style="margin:0px"><span \
style="color:rgb(17,24,39);font-family:"Albert \
sans",ui-sans-serif,system-ui,-apple-system,BlinkMacSystemFont,"Segoe \
UI",Roboto,"Helvetica Neue",Arial,"Noto \
Sans",sans-serif,"Apple Color Emoji","Segoe UI \
Emoji","Segoe UI Symbol","Noto Color \
Emoji";font-size:medium">Could the low performance be due to the low \
configuration of AWS RDS? </span></div><div style="margin:0px"><span \
style="color:rgb(17,24,39);font-family:"Albert \
sans",ui-sans-serif,system-ui,-apple-system,BlinkMacSystemFont,"Segoe \
UI",Roboto,"Helvetica Neue",Arial,"Noto \
Sans",sans-serif,"Apple Color Emoji","Segoe UI \
Emoji","Segoe UI Symbol","Noto Color \
Emoji";font-size:medium">We are considering trying a higher configuration \
instance.</span></div></div></blockquote><div><br></div>Well, adding more resources \
tends to improve performance, but it's usually not linear and the improvement may not \
be as large as you want for the extra price you are paying. I would first try to \
understand the performance problem because using the "add more resources" approach \
may just delay the problem and it tends to get worse with time as the dataset \
increases. <br></div><div class="gmail_quote"><br></div><div class="gmail_quote">I \
hope it helps.<br clear="all"></div><br><span class="gmail_signature_prefix">-- \
</span><br><div dir="ltr" class="gmail_signature"><div \
dir="ltr"><div>Regards,</div><div><br></div><div><b>Charly \
Batista</b></div><div><font size="1"><i>Shanghai, China</i></font></div><div><font \
size="1" color="#0000ee"><i><u><a href="mailto:carlbsb@gmail.com" \
target="_blank">carlbsb@gmail.com</a> </u></i></font><br></div><div><font \
size="1"><i><a href="https://www.linkedin.com/in/charlybatista/" \
target="_blank">https://www.linkedin.com/in/charlybatista/</a></i></font><br></div><div><i><font \
size="1">Linux user #391083<br></font></i></div><div><div><i><br>¡°If you have an \
apple and I have an apple and we exchange these apples then you and I will still each \
have one apple. But if you have an idea and I have an idea and we exchange these \
ideas, then each of us will have two ideas."<br> George Bernard \
Shaw (1856-1950)</i></div></div></div></div></div> </blockquote></div>
[prev in list] [next in list] [prev in thread] [next in thread]
Configure |
About |
News |
Add a list |
Sponsored by KoreLogic