[prev in list] [next in list] [prev in thread] [next in thread] 

List:       postgresql-general
Subject:    Re: bug or lacking doc hint
From:       Kirk Wolak <wolakk () gmail ! com>
Date:       2023-06-27 18:30:46
Message-ID: CACLU5mTQ+qi_7Y2NpZb+dv9Q9VS3KNebWC2JHbvzhYxrr4anTQ () mail ! gmail ! com
[Download RAW message or body]

On Mon, Jun 26, 2023 at 4:21 PM Marc Millas <marc.millas@mokadb.com> wrote:

> On Mon, Jun 26, 2023 at 4:05 PM Ron <ronljohnsonjr@gmail.com> wrote:
>
>> On 6/26/23 07:22, Marc Millas wrote:
>>
>> On Mon, Jun 26, 2023 at 5:47 AM Avin Kavish <avin@baseboard.ai> wrote:
>>
>>> Sounds like the problem you are having is, the server is running out of
>>> temporary resources for the operation that users are trying to do. So
>>> according to Tom, on the postgres side, the operation cannot be optimized
>>> further.
>>>
>>> I think you have few choices here,
>>>   - See if increasing the resources of the server will allow them to run
>>> the operation
>>>   - Ask users not to do that operation
>>>   - Use a extension like citus to scale horizontally
>>>
>>> But I'm thinking why a massively inefficient join is needed in the first
>>> place. Shouldn't joins be for following keyed relationships. So ideally a
>>> unique indexed column, but at the very least an indexed column. Why is a
>>> join required on a dynamically calculated substring? Can it be made into a
>>> static computed value and indexed? Substring sounds like an op that should
>>> be in the filter stage.
>>>
>>> Can you describe your data model? Maybe we can give some specific advice.
>>>
>>
>> There is a set of big tables containing fine grain health data. The DB is
>> accessed by people doing research on various kind of sicknesses. So, by
>> nature, all columns (I mean ALL) can be used for every kind of SQL
>> including, obviously, lots of intricated joins.
>>
>>
>> This sounds like the kind of problem normally solved by data warehouses.
>> Is your schema designed like a DW, or is it in 3NF?
>>
>
> it's, indeed, some kind of dwh.
> but it's neither a star nor a snowflake .at least not used like those
> standard schemas.
> in one of the big tables (10 billions+ rows), there is around 60 columns,
> describing one event: some guy have had a given sickness, got a given medoc
> etc
> The pb is that its not one simple event with a set of dimensions,
> the  people  using that db are NOT looking for an event according to
> various criterias, they are looking for correlations between each of the
> 60+ columns.
> As a consequence very few indexes are used as most requests end in some
> kind of huge sequential reads.
> The machine was built for this and perform well, but some requests are
> posing pb and we must find solutions/workaround.
> one of the users did rewrite the request using  a select distinct matched
> with left join(s) and table.a is not null set of conditions.
> looks crazy, but does work. I'll get the request tomorrow.
>
Marc,
  Something we did for cases like this... We actually created views that
handled the complex joining.
Then we trained users to select from the views (as opposed to the tables).

  The upside of this approach is that you can really optimize the views for
the joins.  And then the results get filtered by
the where clauses they provide.

  Usually a DBA or heavy DB user creates the views, and gets the permission
to create the indexes that speed them up.

  The DOWNSIDE of this approach is that if you are constantly adding
columns, you have to recreate your views.
And please be careful with Views that depend on views.  While it works, it
creates downsides when you attempt to
change a view.  Often having to drop all of the downstream views, and
recreate them.

HTH

Kirk

>

[Attachment #3 (text/html)]

<div dir="ltr"><div dir="ltr">On Mon, Jun 26, 2023 at 4:21 PM Marc Millas &lt;<a \
href="mailto:marc.millas@mokadb.com">marc.millas@mokadb.com</a>&gt; \
wrote:<br></div><div class="gmail_quote"><blockquote class="gmail_quote" \
style="margin:0px 0px 0px 0.8ex;border-left:1px solid \
rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div><div dir="ltr" \
class="gmail_signature"><div dir="ltr"><div dir="ltr"><div>On Mon, Jun 26, 2023 at \
4:05 PM Ron &lt;<a href="mailto:ronljohnsonjr@gmail.com" \
target="_blank">ronljohnsonjr@gmail.com</a>&gt; \
wrote:<br></div></div></div></div></div></div><div class="gmail_quote"><blockquote \
class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid \
rgb(204,204,204);padding-left:1ex">  
    
  
  <div>
    On 6/26/23 07:22, Marc Millas wrote:<br>
    <blockquote type="cite">
      
      <div dir="ltr">
        <div dir="ltr">
          <div>
            <div dir="ltr" class="gmail_signature">
              <div dir="ltr">
                <div dir="ltr">
                  <div>On Mon, Jun 26, 2023 at
            5:47 AM Avin Kavish &lt;<a href="mailto:avin@baseboard.ai" \
                target="_blank">avin@baseboard.ai</a>&gt;
            wrote:<br></div></div></div></div></div></div><div class="gmail_quote">
          <blockquote class="gmail_quote" style="margin:0px 0px 0px \
0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">  <div \
dir="ltr">Sounds like the problem you are having is,  the server is running out of \
temporary resources for the  operation that users are trying to do. So according to
              Tom, on the postgres side, the operation cannot be
              optimized further.
              <div><br>
              </div>
              <div>I think you have few choices here,</div>
              <div>   - See if increasing the resources of the server
                will allow them to run the operation</div>
              <div>   - Ask users not to do that operation</div>
              <div>   - Use a extension like citus to scale horizontally</div>
              <div><br>
              </div>
              <div>But I&#39;m thinking why a massively inefficient join is
                needed in the first place. Shouldn&#39;t joins be for
                following keyed relationships. So ideally a unique
                indexed column, but at the very least an indexed column.
                Why is a join required on a dynamically calculated
                substring? Can it be made into a static computed
                value  and indexed? Substring sounds like an op that
                should be in the filter stage.</div>
              <div><br>
              </div>
              <div>Can you describe your data model? Maybe we can give
                some specific advice.</div>
            </div>
          </blockquote>
          <div><br>
          </div>
          <div>There is a set of big tables containing fine grain health
            data. The DB is accessed by people doing research on various
            kind of sicknesses. So, by nature, all columns (I mean ALL)
            can be used for every kind of SQL including, obviously, lots
            of intricated  joins. <br>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
    This sounds like the kind of problem normally solved by data
    warehouses.   Is your schema designed like a DW, or is it in \
3NF?<br></div></blockquote><div><br></div><div>it&#39;s, indeed, some kind of \
dwh.</div><div>but it&#39;s neither a star nor  a snowflake .at least not used like \
those standard schemas.</div><div>in one of the big tables (10 billions+ rows), there \
is around 60 columns, describing one event: some guy have had a given sickness, got a \
given medoc etc</div><div>The pb is that its not one simple event with a set of \
dimensions, the    people  

using that db

are NOT looking for an event according to various criterias, they  are looking for \
correlations between each of the 60+ columns.</div><div>As a consequence very few \
indexes are used as most requests end in some kind of huge sequential \
reads.</div><div>The machine was built for this and perform well, but some requests \
are posing pb and we must find solutions/workaround.</div><div>one of the users did \
rewrite the request using   a select distinct matched with left join(s) and table.a \
is not null set of conditions.</div><div>looks crazy, but does work. I&#39;ll get the \
request tomorrow.</div></div></div></blockquote><div>Marc,</div><div>   Something we \
did for cases like this... We actually created views that handled the complex \
joining.</div><div>Then we trained users to select from the views (as opposed to the \
tables).</div><div><br></div><div>   The upside of this approach is that you can \
really optimize the views for the joins.   And then the results get filtered \
by</div><div>the where clauses they provide.</div><div><br></div><div>   Usually a \
DBA or heavy DB user creates the views, and gets the permission to create the indexes \
that speed them up.</div><div><br></div><div>   The DOWNSIDE of this approach is that \
if you are constantly  adding columns, you have to recreate your views.<br>And please \
be careful with Views that depend on views.   While it works, it creates downsides \
when you attempt to</div><div>change a view.   Often having to drop all of the \
downstream views, and recreate \
them.</div><div><br></div><div>HTH</div><div><br></div><div>Kirk</div><blockquote \
class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid \
rgb(204,204,204);padding-left:1ex"> </blockquote></div></div>



[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic