'Re: [PERFORM] Aggregating tsqueries'

[prev in list] [next in list] [prev in thread] [next in thread] 

List:       pgsql-performance
Subject:    Re: [PERFORM] Aggregating tsqueries
From:       Heikki Linnakangas <hlinnakangas () vmware ! com>
Date:       2014-09-18 7:20:41
Message-ID: 541A87C9.4030002 () vmware ! com
[Download RAW message or body]

On 09/17/2014 07:56 AM, Alexander Hill wrote:
> Hello,
>
> I have a table of tree nodes with a tsquery column. To get a subtree's
> tsquery, I need to OR all of its nodes' tsqueries together.
>
> I defined a custom aggregate using tsquery_or:
>
>      CREATE AGGREGATE tsquery_or_agg (tsquery)
>      (
>          sfunc = tsquery_or,
>          stype = tsquery
>      );
>
> but I've found that
>
>      tsquery_or_agg(query)
>
> is about a hundred times slower than this:
>
>      ('(' || string_agg(query::text, ')|(') || ')')::tsquery
>
> That works perfectly so I'm happy to continue doing it, but I'm curious to
> know why the difference is so great and if anything can be done about it?

string_agg's state transition function uses a buffer that's expanded as 
needed. At every step, the next string is appended to the buffer. Your 
custom aggregate is less efficient, because it constructs a new tsquery 
object at every step. In every step, a new tsquery object is allocated 
and the old result and the next source tsquery are copied to it. That's 
much more expensive.

If you're not shy of writing C code, you could write a more efficient 
version of tsquery_or_agg too, using a similar technique.

- Heikki


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
[prev in list] [next in list] [prev in thread] [next in thread]
Configure | About | News | Add a list | Sponsored by KoreLogic