[prev in list] [next in list] [prev in thread] [next in thread]
List: postgresql-general
Subject: [GENERAL] difference functions
From: Mark Rostron <mrostron () ql2 ! com>
Date: 2010-10-31 20:00:54
Message-ID: FD020D3E50E7FA479567872E5F5F31E3045A186C4E () ex01 ! corp ! ql2 ! com
[Download RAW message or body]
This post is just to record an example of how to use the new window fn's in 8.4 to \
perform difference-between-row calculations. To demonstrate, we create a table with \
30 rows of data, two columns, one of which contains the sequence 1..30, the other \
contains mod(c1,10). So the table looks like this:
> \d x
Table "public.x"
Column | Type | Modifiers
--------+---------+-----------
c1 | integer |
c2 | integer |
> select * from x order by c1,c2;
c1 | c2
----+----
1 | 1
2 | 2
3 | 3
4 | 4
5 | 5
6 | 6
7 | 7
8 | 8
9 | 9
10 | 0
11 | 1
12 | 2
13 | 3
14 | 4
15 | 5
16 | 6
17 | 7
18 | 8
19 | 9
20 | 0
21 | 1
22 | 2
23 | 3
24 | 4
25 | 5
26 | 6
27 | 7
28 | 8
29 | 9
30 | 0
(30 rows)
Now if I want to calculate the difference between two rows, I can use the lag() \
window function (8.4 manual ch 9.19). The query and result set of a typical function \
would be:
> select c1,c2,lag(c1,1,(select max(c1) from x x1 where x1.c2 = x.c2 and x1.c1 < \
> x.c1)) over ( partition by c2 order by c1) from x;
c1 | c2 | lag
----+----+-----
10 | 0 |
20 | 0 | 10
30 | 0 | 20
1 | 1 |
11 | 1 | 1
21 | 1 | 11
2 | 2 |
12 | 2 | 2
22 | 2 | 12
3 | 3 |
13 | 3 | 3
23 | 3 | 13
4 | 4 |
14 | 4 | 4
24 | 4 | 14
5 | 5 |
15 | 5 | 5
25 | 5 | 15
6 | 6 |
16 | 6 | 6
26 | 6 | 16
7 | 7 |
17 | 7 | 7
27 | 7 | 17
8 | 8 |
18 | 8 | 8
28 | 8 | 18
9 | 9 |
19 | 9 | 9
29 | 9 | 19
(30 rows)
Almost there.
You can already see that the delta is (lag - c1).
But: how to handle the missing-first-row-null-delta.
Now, we need a 'sliding window' capability, so we can perform the calculations using \
values outside of this result set. (eg stock values have a daily closing price that \
would be used to calculate the next day's delta .... etc) (this is always messy, so I \
was really happy to discover that the lag function provides an elegant way of \
reducing the mess).
In this example, let's assume that the first row in each partition (i.e. c1<=10) is \
our 'previous-row', upon which we will base our delta.
So, handling the 'first-value' case is as follows:
> select c1,c2,lag(c1,1,(select max(c1) from x x1 where x1.c2 = x.c2 and x1.c1 < \
> x.c1)) over ( partition by c2 order by c1) from x where c1 > 10;
c1 | c2 | lag
----+----+-----
20 | 0 | 10
30 | 0 | 20
11 | 1 | 1
21 | 1 | 11
12 | 2 | 2
22 | 2 | 12
13 | 3 | 3
23 | 3 | 13
14 | 4 | 4
24 | 4 | 14
15 | 5 | 5
25 | 5 | 15
16 | 6 | 6
26 | 6 | 16
17 | 7 | 7
27 | 7 | 17
18 | 8 | 8
28 | 8 | 18
19 | 9 | 9
29 | 9 | 19
(20 rows)
As you can see, now the 'lag' column contains the previous value of c1 in all cases, \
the first of which is calculated from the previous 'sliding-window' maximum. You can \
now add the delta calculation (lag-c1) and you have your delta.
Finally, another point, on performance.
The explain plan for this is:
> explain select c1,c2,lag(c1,1,(select max(c1) from x x1 where x1.c2 = x.c2 and \
> x1.c1 < x.c1)) over ( partition by c2 order by c1) from x where c1 > 10;
QUERY PLAN
------------------------------------------------------------------
WindowAgg (cost=1.84..33.02 rows=21 width=8)
-> Sort (cost=1.84..1.89 rows=21 width=8)
Sort Key: x.c2, x.c1
-> Seq Scan on x (cost=0.00..1.38 rows=21 width=8)
Filter: (c1 > 10)
SubPlan 1
-> Aggregate (cost=1.45..1.46 rows=1 width=4)
-> Seq Scan on x x1 (cost=0.00..1.45 rows=1 width=4)
Filter: ((c1 < $1) AND (c2 = $0))
(9 rows)
We have already gained because this feature eliminates what would have been handled \
by a lot of messy client code, not to mention the reduction of all of this to a \
single scan, and lookup for the first row. But the first row lookup is going to mean \
multiple-lookups, and if your table 'x' is large, that's going to hurt.
However, typically, I will offset the first-row-lookup overhead by preparing for it:
Create a small temp table containing exactly the first-row-lag-set that you are gonna \
want And then the first-row-query goes against the temp table.
If the size of table 'x' is such that you need to do this, it'll help.
Alternatively, instead of a sub-query, you can union the temp table into your \
calculation, and then exclude it from the final result using nested-sub-queries. \
Small temp tables are trivial to build and very useful for this type of thing.
I used this technique to do a covariance matrix calculation in an oracle database a \
while ago - it kinda worked pretty well. So I'm really glad to see it available in pg \
now - it's really gonna help getting queries simpler and faster.
Mr
[Attachment #3 (text/html)]
<html xmlns:v="urn:schemas-microsoft-com:vml" \
xmlns:o="urn:schemas-microsoft-com:office:office" \
xmlns:w="urn:schemas-microsoft-com:office:word" \
xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" \
xmlns="http://www.w3.org/TR/REC-html40"><head><meta http-equiv=Content-Type \
content="text/html; charset=us-ascii"><meta name=Generator content="Microsoft Word 14 \
(filtered medium)"><style><!-- /* Font Definitions */
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri","sans-serif";}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
span.EmailStyle17
{mso-style-type:personal-compose;
font-family:"Calibri","sans-serif";
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-family:"Calibri","sans-serif";}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]--></head><body lang=EN-US link=blue vlink=purple><div \
class=WordSection1><p class=MsoNormal>This post is just to record an example of how \
to use the new window fn’s in 8.4 to perform difference-between-row \
calculations.<o:p></o:p></p><p class=MsoNormal>To demonstrate, we create a \
table with 30 rows of data, two columns, one of which contains the sequence \
1..30, the other contains mod(c1,10).<o:p></o:p></p><p class=MsoNormal>So the table \
looks like this:<o:p></o:p></p><p class=MsoNormal>>\d x<o:p></o:p></p><p \
class=MsoNormal> Table \
"public.x"<o:p></o:p></p><p class=MsoNormal> Column | \
Type | Modifiers<o:p></o:p></p><p \
class=MsoNormal>--------+---------+-----------<o:p></o:p></p><p class=MsoNormal> \
c1 | integer |<o:p></o:p></p><p class=MsoNormal> \
c2 | integer |<o:p></o:p></p><p class=MsoNormal>> select * \
from x order by c1,c2;<o:p></o:p></p><p class=MsoNormal> c1 | c2<o:p></o:p></p><p \
class=MsoNormal>----+----<o:p></o:p></p><p class=MsoNormal> 1 | \
1<o:p></o:p></p><p class=MsoNormal> 2 | 2<o:p></o:p></p><p \
class=MsoNormal> 3 | 3<o:p></o:p></p><p class=MsoNormal> 4 | \
4<o:p></o:p></p><p class=MsoNormal> 5 | 5<o:p></o:p></p><p \
class=MsoNormal> 6 | 6<o:p></o:p></p><p class=MsoNormal> 7 | \
7<o:p></o:p></p><p class=MsoNormal> 8 | 8<o:p></o:p></p><p \
class=MsoNormal> 9 | 9<o:p></o:p></p><p class=MsoNormal> 10 | \
0<o:p></o:p></p><p class=MsoNormal> 11 | 1<o:p></o:p></p><p class=MsoNormal> 12 \
| 2<o:p></o:p></p><p class=MsoNormal> 13 | 3<o:p></o:p></p><p \
class=MsoNormal> 14 | 4<o:p></o:p></p><p class=MsoNormal> 15 | \
5<o:p></o:p></p><p class=MsoNormal> 16 | 6<o:p></o:p></p><p class=MsoNormal> 17 \
| 7<o:p></o:p></p><p class=MsoNormal> 18 | 8<o:p></o:p></p><p \
class=MsoNormal> 19 | 9<o:p></o:p></p><p class=MsoNormal> 20 | \
0<o:p></o:p></p><p class=MsoNormal> 21 | 1<o:p></o:p></p><p class=MsoNormal> 22 \
| 2<o:p></o:p></p><p class=MsoNormal> 23 | 3<o:p></o:p></p><p \
class=MsoNormal> 24 | 4<o:p></o:p></p><p class=MsoNormal> 25 | \
5<o:p></o:p></p><p class=MsoNormal> 26 | 6<o:p></o:p></p><p class=MsoNormal> 27 \
| 7<o:p></o:p></p><p class=MsoNormal> 28 | 8<o:p></o:p></p><p \
class=MsoNormal> 29 | 9<o:p></o:p></p><p class=MsoNormal> 30 | \
0<o:p></o:p></p><p class=MsoNormal>(30 rows)<o:p></o:p></p><p class=MsoNormal>Now if \
I want to calculate the difference between two rows, I can use the lag() window \
function (8.4 manual ch 9.19).<o:p></o:p></p><p class=MsoNormal>The query and result \
set of a typical function would be:<o:p></o:p></p><p class=MsoNormal>> select \
c1,c2,lag(c1,1,(select max(c1) from x x1 where x1.c2 = x.c2 and x1.c1 < \
x.c1)) over ( partition by c2 order by c1) from x;<o:p></o:p></p><p \
class=MsoNormal> c1 | c2 | lag<o:p></o:p></p><p \
class=MsoNormal>----+----+-----<o:p></o:p></p><p class=MsoNormal> 10 | 0 \
|<o:p></o:p></p><p class=MsoNormal> 20 | 0 | 10<o:p></o:p></p><p \
class=MsoNormal> 30 | 0 | 20<o:p></o:p></p><p class=MsoNormal> 1 \
| 1 |<o:p></o:p></p><p class=MsoNormal> 11 | 1 | \
1<o:p></o:p></p><p class=MsoNormal> 21 | 1 | 11<o:p></o:p></p><p \
class=MsoNormal> 2 | 2 |<o:p></o:p></p><p class=MsoNormal> 12 | 2 \
| 2<o:p></o:p></p><p class=MsoNormal> 22 | 2 | \
12<o:p></o:p></p><p class=MsoNormal> 3 | 3 |<o:p></o:p></p><p \
class=MsoNormal> 13 | 3 | 3<o:p></o:p></p><p class=MsoNormal> 23 \
| 3 | 13<o:p></o:p></p><p class=MsoNormal> 4 | 4 \
|<o:p></o:p></p><p class=MsoNormal> 14 | 4 | 4<o:p></o:p></p><p \
class=MsoNormal> 24 | 4 | 14<o:p></o:p></p><p class=MsoNormal> 5 \
| 5 |<o:p></o:p></p><p class=MsoNormal> 15 | 5 | \
5<o:p></o:p></p><p class=MsoNormal> 25 | 5 | 15<o:p></o:p></p><p \
class=MsoNormal> 6 | 6 |<o:p></o:p></p><p class=MsoNormal> 16 | 6 \
| 6<o:p></o:p></p><p class=MsoNormal> 26 | 6 | \
16<o:p></o:p></p><p class=MsoNormal> 7 | 7 |<o:p></o:p></p><p \
class=MsoNormal> 17 | 7 | 7<o:p></o:p></p><p class=MsoNormal> 27 \
| 7 | 17<o:p></o:p></p><p class=MsoNormal> 8 | 8 \
|<o:p></o:p></p><p class=MsoNormal> 18 | 8 | 8<o:p></o:p></p><p \
class=MsoNormal> 28 | 8 | 18<o:p></o:p></p><p class=MsoNormal> 9 \
| 9 |<o:p></o:p></p><p class=MsoNormal> 19 | 9 | \
9<o:p></o:p></p><p class=MsoNormal> 29 | 9 | 19<o:p></o:p></p><p \
class=MsoNormal>(30 rows)<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p \
class=MsoNormal>Almost there.<o:p></o:p></p><p class=MsoNormal>You can already see \
that the delta is (lag – c1).<o:p></o:p></p><p class=MsoNormal>But: how to \
handle the missing-first-row-null-delta.<o:p></o:p></p><p class=MsoNormal>Now, we \
need a ‘sliding window’ capability, so we can perform the calculations \
using values outside of this result set.<o:p></o:p></p><p class=MsoNormal>(eg stock \
values have a daily closing price that would be used to calculate the next \
day’s delta …. etc)<o:p></o:p></p><p class=MsoNormal>(this is always \
messy, so I was really happy to discover that the lag function provides an elegant \
way of reducing the mess).<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p \
class=MsoNormal>In this example, let’s assume that the first row in each \
partition (i.e. c1<=10) is our ‘previous-row’, upon which we will base \
our delta.<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p \
class=MsoNormal>So, handling the ‘first-value’ case is as \
follows:<o:p></o:p></p><p class=MsoNormal>> select c1,c2,lag(c1,1,(select max(c1) \
from x x1 where x1.c2 = x.c2 and x1.c1 < x.c1)) over ( partition by c2 order \
by c1) from x where c1 > 10;<o:p></o:p></p><p class=MsoNormal> c1 | c2 | \
lag<o:p></o:p></p><p class=MsoNormal>----+----+-----<o:p></o:p></p><p \
class=MsoNormal> 20 | 0 | 10<o:p></o:p></p><p class=MsoNormal> 30 | \
0 | 20<o:p></o:p></p><p class=MsoNormal> 11 | 1 | \
1<o:p></o:p></p><p class=MsoNormal> 21 | 1 | 11<o:p></o:p></p><p \
class=MsoNormal> 12 | 2 | 2<o:p></o:p></p><p class=MsoNormal> 22 \
| 2 | 12<o:p></o:p></p><p class=MsoNormal> 13 | 3 | \
3<o:p></o:p></p><p class=MsoNormal> 23 | 3 | 13<o:p></o:p></p><p \
class=MsoNormal> 14 | 4 | 4<o:p></o:p></p><p class=MsoNormal> 24 \
| 4 | 14<o:p></o:p></p><p class=MsoNormal> 15 | 5 | \
5<o:p></o:p></p><p class=MsoNormal> 25 | 5 | 15<o:p></o:p></p><p \
class=MsoNormal> 16 | 6 | 6<o:p></o:p></p><p class=MsoNormal> 26 \
| 6 | 16<o:p></o:p></p><p class=MsoNormal> 17 | 7 | \
7<o:p></o:p></p><p class=MsoNormal> 27 | 7 | 17<o:p></o:p></p><p \
class=MsoNormal> 18 | 8 | 8<o:p></o:p></p><p class=MsoNormal> 28 \
| 8 | 18<o:p></o:p></p><p class=MsoNormal> 19 | 9 | \
9<o:p></o:p></p><p class=MsoNormal> 29 | 9 | 19<o:p></o:p></p><p \
class=MsoNormal>(20 rows)<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p \
class=MsoNormal>As you can see, now the ‘lag’ column contains the \
previous value of c1 in all cases, the first of which is calculated from the previous \
‘sliding-window’ maximum.<o:p></o:p></p><p class=MsoNormal>You can now \
add the delta calculation (lag-c1) and you have your delta.<o:p></o:p></p><p \
class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>Finally, another point, on \
performance.<o:p></o:p></p><p class=MsoNormal>The explain plan for this \
is:<o:p></o:p></p><p class=MsoNormal>> explain select c1,c2,lag(c1,1,(select \
max(c1) from x x1 where x1.c2 = x.c2 and x1.c1 < x.c1)) over ( partition by \
c2 order by c1) from x where c1 > 10;<o:p></o:p></p><p \
class=MsoNormal> &nbs \
p; \
QUERY PLAN<o:p></o:p></p><p \
class=MsoNormal>------------------------------------------------------------------<o:p></o:p></p><p \
class=MsoNormal> WindowAgg (cost=1.84..33.02 rows=21 width=8)<o:p></o:p></p><p \
class=MsoNormal> -> Sort (cost=1.84..1.89 rows=21 \
width=8)<o:p></o:p></p><p \
class=MsoNormal> Sort Key: x.c2, \
x.c1<o:p></o:p></p><p \
class=MsoNormal> -> Seq Scan \
on x (cost=0.00..1.38 rows=21 width=8)<o:p></o:p></p><p \
class=MsoNormal> \
Filter: (c1 > 10)<o:p></o:p></p><p class=MsoNormal> SubPlan \
1<o:p></o:p></p><p class=MsoNormal> -> \
Aggregate (cost=1.45..1.46 rows=1 width=4)<o:p></o:p></p><p \
class=MsoNormal> \
-> Seq Scan on x x1 (cost=0.00..1.45 rows=1 width=4)<o:p></o:p></p><p \
class=MsoNormal> \
Filter: ((c1 < $1) AND (c2 = $0))<o:p></o:p></p><p class=MsoNormal>(9 \
rows)<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>We \
have already gained because this feature eliminates what would have been handled by a \
lot of messy client code, not to mention the reduction of all of this to a single \
scan, and lookup for the first row.<o:p></o:p></p><p class=MsoNormal>But the first \
row lookup is going to mean multiple-lookups, and if your table ‘x’ is \
large, that’s going to hurt.<o:p></o:p></p><p \
class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>However, typically, I will \
offset the first-row-lookup overhead by preparing for it:<o:p></o:p></p><p \
class=MsoNormal>Create a small temp table containing exactly the first-row-lag-set \
that you are gonna want<o:p></o:p></p><p class=MsoNormal>And then the first-row-query \
goes against the temp table.<o:p></o:p></p><p class=MsoNormal>If the size of table \
‘x’ is such that you need to do this, it’ll help.<o:p></o:p></p><p \
class=MsoNormal>Alternatively, instead of a sub-query, you can union the temp table \
into your calculation, and then exclude it from the final result using \
nested-sub-queries.<o:p></o:p></p><p class=MsoNormal>Small temp tables are trivial to \
build and very useful for this type of thing.<o:p></o:p></p><p \
class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>I used this technique to do a \
covariance matrix calculation in an oracle database a while ago – it kinda \
worked pretty well.<o:p></o:p></p><p class=MsoNormal>So I’m really glad to see \
it available in pg now – it’s really gonna help getting queries simpler \
and faster.<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p \
class=MsoNormal>Mr<o:p></o:p></p><p \
class=MsoNormal><o:p> </o:p></p></div></body></html>
[prev in list] [next in list] [prev in thread] [next in thread]
Configure |
About |
News |
Add a list |
Sponsored by KoreLogic