List: postgresql-general
Subject: Re: [HACKERS] proposal: be smarter about i/o patterns in index scan
From: "Glen Parker" <glenebob () nwlink ! com>
Date: 2004-05-19 21:06:44
Message-ID: AJEKKAIECKNMBCEKADJPKEACCFAA.glenebob () nwlink ! com
> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org]On Behalf Of Tom Lane
> > Or:
> > 2. Sort AND COPY TID's into physical order.
> > 3. Read heap in phy. order, match results to un-sorted TID list.
> > That sounds quite cheap.
>
> No, it sounds like handwaving. What's your "match results" step, and
You got me :-)
> does it take less than O(N^2) time? How do you get to *return* the
Less than O(N^2)??? Geez I think so!
> tuples in index order, without having read them all before you return
> the first one? (Which is really what the problem is with the sort
> alternative, at least for fast-start cases such as the LIMIT 1 example.)
(you'll have to pardon me for thinking as I type... and for being too
long-winded :-)
It takes O(1) time per tuple, O(N) overall.
If you have a hard maximum of TIDs you'll grab from the index (which sounds
right), you could store the TID list in a vector. The IO-order-sorted list
could be a singly-linked list OR a vector, but either way, each entry would
have to point back to an offset in the index-ordered list.
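To make that concrete, here's a minimal sketch of the two-list idea (the
type names and the TID layout are mine, not PostgreSQL's): the IO-ordered
copy carries a back-offset into the index-ordered vector, which is what
makes the match-back step constant time per tuple.

```c
#include <stdlib.h>

/* Hypothetical simplified TID: (block number, line offset). */
typedef struct { unsigned blkno; unsigned offset; } Tid;

/* IO-ordered entry: the TID plus a back-offset into the
   index-ordered vector. */
typedef struct { Tid tid; int index_pos; } IoEntry;

/* Compare by physical location: block first, then line offset. */
static int io_cmp(const void *a, const void *b)
{
    const IoEntry *x = a, *y = b;
    if (x->tid.blkno != y->tid.blkno)
        return x->tid.blkno < y->tid.blkno ? -1 : 1;
    return (int) x->tid.offset - (int) y->tid.offset;
}

/* Build the IO-ordered copy of an index-ordered TID vector. */
IoEntry *build_io_list(const Tid *index_order, int n)
{
    IoEntry *io = malloc(n * sizeof(IoEntry));
    for (int i = 0; i < n; i++)
    {
        io[i].tid = index_order[i];
        io[i].index_pos = i;    /* remember original index order */
    }
    qsort(io, n, sizeof(IoEntry), io_cmp);
    return io;
}
```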
The index-ordered list would be (sizeof(TID) * max_TID_count) bytes, and the
IO-ordered list would be ((sizeof(TID) + sizeof(int)) * max_TID_count) bytes.
If a TID is 8 bytes and we're going to fetch 2048 index entries per
iteration, that gets us (*counting on fingers*) 16K (for index-ordered list)
+ 24K for the IO-ordered list.
The index-ordered list can also be a singly-linked list, in which case the
mapping would be by pointer. If both lists are singly-linked lists, then
the size overhead (for a fully realized 2048 entry scan) would be:
((sizeof(TID) + sizeof(void*)) * max_TID_count) + ((sizeof(TID) +
sizeof(void*) + sizeof(void*)) * max_TID_count) = (*more counting on
fingers*, assuming 4-byte pointers) ... 24K + 32K = 56K.
Hmm, the vector style would be crazy expensive for small queries. The
linked-list approach sounds pretty good though. I assume no one thinks
doing this would be completely free :-) And, if you did the IO-ordered list
as a vector, you'd save the linked-list overhead, and you would know exactly
how big to make it, so it wouldn't have to hurt you on small queries.
2048 entries seems pretty darn generous actually.
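And the "match results" step itself is just filing each heap read into the
slot its back-pointer names. A sketch, reusing the hypothetical Tid/IoEntry
types from above (fetch_tuple() is a stand-in for the real heap fetch):

```c
#include <stdlib.h>

typedef struct { unsigned blkno; unsigned offset; } Tid;
typedef struct { Tid tid; int index_pos; } IoEntry;

/* Hypothetical "heap fetch": just derives a value from the TID here;
   the real thing would read the tuple off disk. */
static long fetch_tuple(Tid t) { return (long) t.blkno * 1000 + t.offset; }

/* Read the heap in physical (IO) order, but file each result into the
   slot its back-pointer names, so the output ends up in index order. */
long *fetch_in_io_order(const IoEntry *io, int n)
{
    long *results = malloc(n * sizeof(long));
    for (int i = 0; i < n; i++)         /* sequential-ish heap reads */
        results[io[i].index_pos] = fetch_tuple(io[i].tid);
    return results;                     /* results[j] matches index entry j */
}
```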
> How do you get to *return* the
> tuples in index order, without having read them all before you return
Hmm, the worst-case scenario would be to scan the heap in exactly the
reverse of index order. With a 2048-entry batch size, you'd have a fair amount of
overhead on a LIMIT 1 :-)
Unless the index scanner could be given a maximum tuple count, rather than
just 'knowing' it from a GUC value. Would this dirty up the plan tree
beyond all recognition?
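The cap itself would be trivial; something like this hypothetical helper
(the name and the idea of the executor handing down its tuple limit are
mine, not anything in the current tree):

```c
/* If the executor can pass its tuple limit down (e.g. from LIMIT 1),
   cap the per-iteration TID batch at that limit instead of always
   using the configured batch size. A limit of 0 means "no limit". */
int effective_batch_size(int configured_batch, long tuple_limit)
{
    if (tuple_limit > 0 && tuple_limit < configured_batch)
        return (int) tuple_limit;
    return configured_batch;
}
```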
> What happens when you run out of memory for the list of TIDs ... er,
> make that two lists of TIDs?
Now you really got me. Hand waving: the same thing you do when you run out
of memory anywhere else :-) By configuring the server to fetch index
entries in 1-entry batches, you'd be right back to the overhead (or very
nearly so) of the current implementation, so it would only hurt people who
chose to be hurt by it :-)
-Glen
---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend