[prev in list] [next in list] [prev in thread] [next in thread]
List: pgsql-performance
Subject: Re: Index for range queries on JSON (user defined fields)
From: Marco Colli <collimarco91 () gmail ! com>
Date: 2020-12-05 10:50:09
Message-ID: CAFvCgN4fZc4ZGw5z-y_xbj9sJrK9nMNgzDiUkXBDQQuunZgstQ () mail ! gmail ! com
[Download RAW message or body]
Thanks for the suggestion: I had already considered that solution (first
link), but the fear is having to JOIN large tables with hundreds of
millions of records.
For my understanding **using JOIN when dealing with big data is bad and a
nightmare for performance**: can you confirm? Or am I missing something?
That tables would be frequently read and updated and are the core of the
application: that also means that every update on a user would produce
**many dead rows** - not just 1 user row, as in the case of JSON, but many
rows in the user metadata table.
On Fri, Dec 4, 2020 at 11:40 PM Nick Cleaton <nick@cleaton.net> wrote:
> On Fri, 4 Dec 2020 at 15:39, Marco Colli <collimarco91@gmail.com> wrote:
>
>> Hello!
>>
>> We have a multi-tenant service where each customer has millions of users
>> (total: ~150M rows). Now we would like to let each customer define some
>> custom columns for his users and then let the customer search his users
>> efficiently based on these columns.
>>
>> This problem seems really hard to solve with PostgreSQL:
>>
>> https://stackoverflow.com/questions/5106335/how-to-design-a-database-for-user-defined-fields
>>
>> In particular the easiest way would be to add a JSON field on the users
>> table (e.g. user metadata). However the PostgreSQL GIN index only supports
>> exact matches and not range queries. This means that a query on a range
>> (e.g. age > 30) would be extremely inefficient and would result in a table
>> scan.
>>
>
> You could have a table of (tenant, customer, setting_name, setting_value)
> so that a btree index on (tenant, setting_name, setting_value) would work
> for "select customer from my_table where tenant=$1 and setting_name='age'
> and setting_value > 30"
>
> That doesn't deal with setting values having a variety of types, but you
> could have a distinct user defined settings table for each setting value
> type that you want to support.
>
>
[Attachment #3 (text/html)]
<div dir="ltr"><div dir="ltr">Thanks for the suggestion: I had already considered \
that solution (first link), but the fear is having to JOIN large tables with hundreds \
of millions of records. <div><br></div><div>For my understanding **using JOIN when \
dealing with big data is bad and a nightmare for performance**: can you confirm? Or \
am I missing something?<br><div><br></div><div>That tables would be frequently read \
and updated and are the core of the application: that also means that every update on \
a user would produce **many dead rows** - not just 1 user row, as in the case of \
JSON, but many rows in the user metadata \
table.<br></div><div><br></div><div><div><div><br></div><div><br><div><br></div></div></div></div></div></div></div><br><div \
class="gmail_quote"><div dir="ltr" class="gmail_attr">On Fri, Dec 4, 2020 at 11:40 PM \
Nick Cleaton <<a href="mailto:nick@cleaton.net">nick@cleaton.net</a>> \
wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px \
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div \
dir="ltr"><div dir="ltr">On Fri, 4 Dec 2020 at 15:39, Marco Colli <<a \
href="mailto:collimarco91@gmail.com" target="_blank">collimarco91@gmail.com</a>> \
wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" \
style="margin:0px 0px 0px \
0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204,204,204);padding-left:1ex"><div \
dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div \
dir="ltr">Hello!<br><div><br></div><div>We have a multi-tenant service where each \
customer has millions of users (total: ~150M rows). Now we would like to let each \
customer define some custom columns for his users and then let the customer search \
his users efficiently based on these columns.</div><div><br></div><div>This problem \
seems really hard to solve with PostgreSQL:</div><div><a \
href="https://stackoverflow.com/questions/5106335/how-to-design-a-database-for-user-defined-fields" \
target="_blank">https://stackoverflow.com/questions/5106335/how-to-design-a-database-for-user-defined-fields</a><br></div><div><br></div><div>In \
particular the easiest way would be to add a JSON field on the users table (e.g. user \
metadata). However the PostgreSQL GIN index only supports exact matches and not range \
queries. This means that a query on a range (e.g. age > 30) would be extremely \
inefficient and would result in a table \
scan.</div></div></div></div></div></div></div></blockquote><div><br></div><div>You \
could have a table of (tenant, customer, setting_name, setting_value) so that a btree \
index on (tenant, setting_name, setting_value) would work for "select customer \
from my_table where tenant=$1 and setting_name='age' and setting_value > \
30" <br></div><div><br></div><div>That doesn't deal with setting values \
having a variety of types, but you could have a distinct user defined settings table \
for each setting value type that you want to \
support.<br></div><div><br></div></div></div> </blockquote></div>
[prev in list] [next in list] [prev in thread] [next in thread]
Configure |
About |
News |
Add a list |
Sponsored by KoreLogic