[prev in list] [next in list] [prev in thread] [next in thread] 

List:       pgsql-performance
Subject:    Re: Index for range queries on JSON (user defined fields)
From:       Marco Colli <collimarco91 () gmail ! com>
Date:       2020-12-05 10:50:09
Message-ID: CAFvCgN4fZc4ZGw5z-y_xbj9sJrK9nMNgzDiUkXBDQQuunZgstQ () mail ! gmail ! com
[Download RAW message or body]

Thanks for the suggestion: I had already considered that solution (first
link), but the fear is having to JOIN large tables with hundreds of
millions of records.

For my understanding **using JOIN when dealing with big data is bad and a
nightmare for performance**: can you confirm? Or am I missing something?

That tables would be frequently read and updated and are the core of the
application: that also means that every update on a user would produce
**many dead rows** - not just 1 user row, as in the case of JSON, but many
rows in the user metadata table.





On Fri, Dec 4, 2020 at 11:40 PM Nick Cleaton <nick@cleaton.net> wrote:

> On Fri, 4 Dec 2020 at 15:39, Marco Colli <collimarco91@gmail.com> wrote:
>
>> Hello!
>>
>> We have a multi-tenant service where each customer has millions of users
>> (total: ~150M rows). Now we would like to let each customer define some
>> custom columns for his users and then let the customer search his users
>> efficiently based on these columns.
>>
>> This problem seems really hard to solve with PostgreSQL:
>>
>> https://stackoverflow.com/questions/5106335/how-to-design-a-database-for-user-defined-fields
>>
>> In particular the easiest way would be to add a JSON field on the users
>> table (e.g. user metadata). However the PostgreSQL GIN index only supports
>> exact matches and not range queries. This means that a query on a range
>> (e.g. age > 30) would be extremely inefficient and would result in a table
>> scan.
>>
>
> You could have a table of (tenant, customer, setting_name, setting_value)
> so that a btree index on (tenant, setting_name, setting_value) would work
> for "select customer from my_table where tenant=$1 and setting_name='age'
> and setting_value > 30"
>
> That doesn't deal with setting values having a variety of types, but you
> could have a distinct user defined settings table for each setting value
> type that you want to support.
>
>

[Attachment #3 (text/html)]

<div dir="ltr"><div dir="ltr">Thanks for the suggestion: I had already considered \
that solution (first link), but the fear is having to JOIN large tables with hundreds \
of millions of records.  <div><br></div><div>For my understanding **using JOIN when \
dealing with big data is bad and a nightmare for performance**: can you confirm? Or \
am I missing something?<br><div><br></div><div>That tables would be frequently read \
and updated and are the core of the application: that also means that every update on \
a user would produce **many dead rows** - not just 1 user row, as in the case of \
JSON, but many rows in the user metadata \
table.<br></div><div><br></div><div><div><div><br></div><div><br><div><br></div></div></div></div></div></div></div><br><div \
class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Dec 4, 2020 at 11:40 PM \
Nick Cleaton &lt;<a href="mailto:nick@cleaton.net">nick@cleaton.net</a>&gt; \
wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px \
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div \
dir="ltr"><div dir="ltr">On Fri, 4 Dec 2020 at 15:39, Marco Colli &lt;<a \
href="mailto:collimarco91@gmail.com" target="_blank">collimarco91@gmail.com</a>&gt; \
wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" \
style="margin:0px 0px 0px \
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div \
dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div \
dir="ltr">Hello!<br><div><br></div><div>We have a multi-tenant service where each \
customer has millions of users (total: ~150M rows). Now we would like to let each \
customer define some custom columns for his users and then let the customer  search \
his users efficiently based on these columns.</div><div><br></div><div>This problem \
seems really hard to solve with PostgreSQL:</div><div><a \
href="https://stackoverflow.com/questions/5106335/how-to-design-a-database-for-user-defined-fields" \
target="_blank">https://stackoverflow.com/questions/5106335/how-to-design-a-database-for-user-defined-fields</a><br></div><div><br></div><div>In \
particular the easiest way would be to add a JSON field on the users table (e.g. user \
metadata). However the PostgreSQL GIN index only supports exact matches and not range \
queries. This means that a query on a range (e.g. age &gt; 30) would be extremely \
inefficient and would result in a table \
scan.</div></div></div></div></div></div></div></blockquote><div><br></div><div>You \
could have a table of (tenant, customer, setting_name, setting_value) so that a btree \
index on (tenant, setting_name, setting_value) would work for &quot;select customer \
from my_table where tenant=$1 and setting_name=&#39;age&#39; and setting_value &gt; \
30&quot; <br></div><div><br></div><div>That doesn&#39;t deal with setting values \
having a variety of types, but you could have a distinct user defined settings table \
for each setting value type that you want to \
support.<br></div><div><br></div></div></div> </blockquote></div>



[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic