List: postgresql-general
Subject: Re: [HACKERS] proposal: be smarter about i/o patterns in index scan
From: "Glen Parker" <glenebob () nwlink ! com>
Date: 2004-05-19 21:06:44
Message-ID: AJEKKAIECKNMBCEKADJPKEACCFAA.glenebob () nwlink ! com
> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org]On Behalf Of Tom Lane
> > Or:
> > 2. Sort AND COPY TID's into physical order.
> > 3. Read heap in phy. order, match results to un-sorted TID list.
> > That sounds quite cheap.
>
> No, it sounds like handwaving. What's your "match results" step, and
You got me :-)
> does it take less than O(N^2) time? How do you get to *return* the
Less than O(N^2)??? Geez I think so!
> tuples in index order, without having read them all before you return
> the first one? (Which is really what the problem is with the sort
> alternative, at least for fast-start cases such as the LIMIT 1 example.)
(you'll have to pardon me for thinking as I type... and for being too
long-winded :-)
It takes O(1) time per tuple, O(N) overall.
If you have a hard maximum of TIDs you'll grab from the index (which sounds
right), you could store the TID list in a vector. The IO-order-sorted list
could be a singly-linked list OR a vector, but either way, each entry would
have to point back to an offset in the index-ordered list.
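To make that concrete, here's a minimal sketch of the two-list idea (the
type names and the TID layout are mine, not PostgreSQL's): the IO-ordered
copy carries a back-offset into the index-ordered vector, which is what
makes the match-back step constant time per tuple.

```c
#include <stdlib.h>

/* Hypothetical simplified TID: (block number, line offset). */
typedef struct { unsigned blkno; unsigned offset; } Tid;

/* IO-ordered entry: the TID plus a back-offset into the
   index-ordered vector. */
typedef struct { Tid tid; int index_pos; } IoEntry;

/* Compare by physical location: block first, then line offset. */
static int io_cmp(const void *a, const void *b)
{
    const IoEntry *x = a, *y = b;
    if (x->tid.blkno != y->tid.blkno)
        return x->tid.blkno < y->tid.blkno ? -1 : 1;
    return (int) x->tid.offset - (int) y->tid.offset;
}

/* Build the IO-ordered copy of an index-ordered TID vector. */
IoEntry *build_io_list(const Tid *index_order, int n)
{
    IoEntry *io = malloc(n * sizeof(IoEntry));
    for (int i = 0; i < n; i++)
    {
        io[i].tid = index_order[i];
        io[i].index_pos = i;    /* remember original index order */
    }
    qsort(io, n, sizeof(IoEntry), io_cmp);
    return io;
}
```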
The index-ordered list would be (sizeof(TID) * max_TID_count) bytes, and the
IO-ordered list would be ((sizeof(TID) + sizeof(int)) * max_TID_count) bytes.
If a TID is 8 bytes and we're going to fetch 2048 index entries per
iteration, that gets us (*counting on fingers*) 16K (for index-ordered list)
+ 24K for the IO-ordered list.
The index-ordered list can also be a singly-linked list, in which case the
mapping would be by pointer. If both lists are singly-linked lists, then
the size overhead (for a fully realized 2048 entry scan) would be:
((sizeof(TID) + sizeof(void*)) * max_TID_count) + ((sizeof(TID) +
sizeof(void*) + sizeof(void*)) * max_TID_count) = (*more counting on
fingers*, assuming 4-byte pointers) ... 24K + 32K = 56K.
Hmm, the vector style would be crazy expensive for small queries. The
linked-list approach sounds pretty good though. I assume no one thinks
doing this would be completely free :-) And, if you did the IO-ordered list
as a vector, you'd save the linked-list overhead, and you would know exactly
how big to make it, so it wouldn't have to hurt you on small queries.
2048 entries seems pretty darn generous actually.
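And the "match results" step itself is just filing each heap read into the
slot its back-pointer names. A sketch, reusing the hypothetical Tid/IoEntry
types from above (fetch_tuple() is a stand-in for the real heap fetch):

```c
#include <stdlib.h>

typedef struct { unsigned blkno; unsigned offset; } Tid;
typedef struct { Tid tid; int index_pos; } IoEntry;

/* Hypothetical "heap fetch": just derives a value from the TID here;
   the real thing would read the tuple off disk. */
static long fetch_tuple(Tid t) { return (long) t.blkno * 1000 + t.offset; }

/* Read the heap in physical (IO) order, but file each result into the
   slot its back-pointer names, so the output ends up in index order. */
long *fetch_in_io_order(const IoEntry *io, int n)
{
    long *results = malloc(n * sizeof(long));
    for (int i = 0; i < n; i++)         /* sequential-ish heap reads */
        results[io[i].index_pos] = fetch_tuple(io[i].tid);
    return results;                     /* results[j] matches index entry j */
}
```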
> How do you get to *return* the
> tuples in index order, without having read them all before you return
Hmm, the worst-case scenario would be to scan the heap in exactly the
reverse of index order. With a 2048-entry batch size, you'd have a fair amount of
overhead on a LIMIT 1 :-)
Unless the index scanner could be given a maximum tuple count, rather than
just 'knowing' it from a GUC value. Would this dirty up the plan tree
beyond all recognition?
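The cap itself would be trivial; something like this hypothetical helper
(the name and the idea of the executor handing down its tuple limit are
mine, not anything in the current tree):

```c
/* If the executor can pass its tuple limit down (e.g. from LIMIT 1),
   cap the per-iteration TID batch at that limit instead of always
   using the configured batch size. A limit of 0 means "no limit". */
int effective_batch_size(int configured_batch, long tuple_limit)
{
    if (tuple_limit > 0 && tuple_limit < configured_batch)
        return (int) tuple_limit;
    return configured_batch;
}
```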
> What happens when you run out of memory for the list of TIDs ... er,
> make that two lists of TIDs?
Now you really got me. Hand waving: the same thing you do when you run out
of memory anywhere else :-) By configuring the server to fetch index
entries in 1-entry batches, you'd be right back to the overhead (or very
nearly so) of the current implementation, so it would only hurt people who
chose to be hurt by it :-)
-Glen
---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend