[prev in list] [next in list] [prev in thread] [next in thread] 

List:       postgresql-general
Subject:    Re: Please recommend postgresql.conf improvements for osm2pgsql loading Europe
From:       hector vass <hector.vass () gmail ! com>
Date:       2024-03-31 15:55:02
Message-ID: CAJJx+iXnJUqUntghEmRr1Gb+XWQ3ES4dOZ3O9s+sKJW2wZbMug () mail ! gmail ! com
[Download RAW message or body]

On Sat, 30 Mar 2024, 10:04 Alexander Farber, <alexander.farber@gmail.com>
wrote:

> Thank you, Justin -
>
> On Sat, Mar 30, 2024 at 4:33 AM Justin Clift <justin@postgresql.org>
> wrote:
>
>> On 2024-03-30 05:53, Alexander Farber wrote:
>> > I use the following postgresql.conf in my Dockerfile
>> > ( the full version at https://stackoverflow.com/a/78243530/165071 ),
>> > when loading a 28 GByte large europe-latest.osm.pbf
>>
>> Not specific conf file improvements, but for an initial data load
>> have you done things like turning off fsync(), deferring index
>> creating until after the data load finishes, and that kind of thing?
>>
>
> I will try the following commands in my Dockerfile then
> and later report back on any improvements:
>
> RUN set -eux && \
>     pg_ctl init && \
>     echo "shared_buffers = 1GB"                >> $PGDATA/postgresql.conf
> && \
>     echo "work_mem = 50MB"                     >> $PGDATA/postgresql.conf
> && \
>     echo "maintenance_work_mem = 10GB"         >> $PGDATA/postgresql.conf
> && \
>     echo "autovacuum_work_mem = 2GB"           >> $PGDATA/postgresql.conf
> && \
>     echo "wal_level = minimal"                 >> $PGDATA/postgresql.conf
> && \
>     echo "checkpoint_timeout = 60min"          >> $PGDATA/postgresql.conf
> && \
>     echo "max_wal_size = 10GB"                 >> $PGDATA/postgresql.conf
> && \
>     echo "checkpoint_completion_target = 0.9"  >> $PGDATA/postgresql.conf
> && \
>     echo "max_wal_senders = 0"                 >> $PGDATA/postgresql.conf
> && \
>     echo "random_page_cost = 1.0"              >> $PGDATA/postgresql.conf
> && \
>     echo "password_encryption = scram-sha-256" >> $PGDATA/postgresql.conf
> && \
>     echo "fsync = off"                            >>
> $PGDATA/postgresql.conf && \
>     pg_ctl start && \
>     createuser --username=postgres $PGUSER && \
>     createdb --username=postgres --encoding=UTF8 --owner=$PGUSER
> $PGDATABASE && \
>     psql --username=postgres $PGDATABASE --command="ALTER USER $PGUSER
> WITH PASSWORD '$PGPASSWORD';" && \
>     psql --username=postgres $PGDATABASE --command='CREATE EXTENSION IF
> NOT EXISTS postgis;' && \
>     psql --username=postgres $PGDATABASE --command='CREATE EXTENSION IF
> NOT EXISTS hstore;' && \
>     osm2pgsql --username=$PGUSER --database=$PGDATABASE --create
> --cache=60000 --hstore --latlong /data/map.osm.pbf && \
>     rm -f /data/map.osm.pbf && \
>     pg_ctl stop && \
>     echo "fsync = on"                            >>
> $PGDATA/postgresql.conf && \
>     echo '# TYPE DATABASE USER ADDRESS METHOD'                >
> $PGDATA/pg_hba.conf && \
>     echo "local all postgres peer"                           >>
> $PGDATA/pg_hba.conf && \
>     echo "local $PGDATABASE $PGUSER           scram-sha-256" >>
> $PGDATA/pg_hba.conf && \
>     echo "host  $PGDATABASE $PGUSER 0.0.0.0/0 scram-sha-256" >>
> $PGDATA/pg_hba.conf
>
> The later fsync = on will override the former, right?
>
> Best regards
> Alex
>
>
>

2hrs sounds reasonable for Europe, it's a big place in terms of osm data
and osm2pgsql is doing processing to convert to geometry objects prior to
doing anything on the Postgresql side.
If you examine the --log--sql output for a small test country you can see
what it does in terms of the postgresql.
osm2pgsql gives options to trim the output to only what you need (so if you
don't want waterways, traffic features, parking places or places of worship
etc.. why load them)
Hopefully you have found the excellent geofabrik
https://download.geofabrik.de/ source for osm data.
Rather than load this data afresh each update cycle you would be better off
simply loading the changes so the .osc files or ... osm osmosis will create
the equivalent of a diff file for you
Looks like you are already using osm2psql's recommended postgresql.config
settings, I'd be surprised if this was way off.  Getting as close to tin
rather than virtual machines and containers will also help, lots of io
going on here.
If you are only interested in the geography you might consider geofabrik's
shapefile available for many countries, they have already done some of the
work for you.

Apologies if you are already a long way down this route & just asking about
the final stage of loading the osm2pgsql output to Postgresql but however
well you do here I would only expect small marginal gains.

[Attachment #3 (text/html)]

<div dir="ltr"><div dir="auto"><div><br><br><div class="gmail_quote"><div dir="ltr" \
class="gmail_attr">On Sat, 30 Mar 2024, 10:04 Alexander Farber, &lt;<a \
href="mailto:alexander.farber@gmail.com" \
target="_blank">alexander.farber@gmail.com</a>&gt; wrote:<br></div><blockquote \
class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc \
solid;padding-left:1ex"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div \
dir="ltr"><div>Thank you, Justin -</div><br><div class="gmail_quote"><div dir="ltr" \
class="gmail_attr">On Sat, Mar 30, 2024 at 4:33 AM Justin Clift &lt;<a \
href="mailto:justin@postgresql.org" rel="noreferrer" \
target="_blank">justin@postgresql.org</a>&gt; wrote:<br></div><blockquote \
class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid \
rgb(204,204,204);padding-left:1ex">On 2024-03-30 05:53, Alexander Farber \
wrote:<br>&gt; I use the following postgresql.conf in my Dockerfile<br> &gt; ( the \
full version at <a href="https://stackoverflow.com/a/78243530/165071" rel="noreferrer \
noreferrer" target="_blank">https://stackoverflow.com/a/78243530/165071</a> ),<br> \
&gt; when loading a 28 GByte large europe-latest.osm.pbf<br><br> Not specific conf \
file improvements, but for an initial data load<br> have you done things like turning \
off fsync(), deferring index<br> creating until after the data load finishes, and \
that kind of thing?<br></blockquote><div><br>I will try the following commands in my \
Dockerfile then  </div><div>and later report back on any \
improvements:<br><br><div>RUN set -eux &amp;&amp; \</div><div>      pg_ctl init \
&amp;&amp; \</div><div>      echo &quot;shared_buffers = 1GB&quot;                    \
&gt;&gt; $PGDATA/postgresql.conf &amp;&amp; \</div><div>      echo &quot;work_mem = \
50MB&quot;                                &gt;&gt; $PGDATA/postgresql.conf &amp;&amp; \
\</div><div>      echo &quot;maintenance_work_mem = 10GB&quot;              &gt;&gt; \
$PGDATA/postgresql.conf &amp;&amp; \</div><div>      echo &quot;autovacuum_work_mem = \
2GB&quot;                 &gt;&gt; $PGDATA/postgresql.conf &amp;&amp; \</div><div>    \
echo &quot;wal_level = minimal&quot;                          &gt;&gt; \
$PGDATA/postgresql.conf &amp;&amp; \</div><div>      echo &quot;checkpoint_timeout = \
60min&quot;               &gt;&gt; $PGDATA/postgresql.conf &amp;&amp; \</div><div>    \
echo &quot;max_wal_size = 10GB&quot;                          &gt;&gt; \
$PGDATA/postgresql.conf &amp;&amp; \</div><div>      echo \
&quot;checkpoint_completion_target = 0.9&quot;   &gt;&gt; $PGDATA/postgresql.conf \
&amp;&amp; \</div><div>      echo &quot;max_wal_senders = 0&quot;                     \
&gt;&gt; $PGDATA/postgresql.conf &amp;&amp; \</div><div>      echo \
&quot;random_page_cost = 1.0&quot;                     &gt;&gt; \
$PGDATA/postgresql.conf &amp;&amp; \</div><div>      echo &quot;password_encryption = \
scram-sha-256&quot; &gt;&gt; $PGDATA/postgresql.conf &amp;&amp; \</div><div>      \
echo &quot;fsync = off&quot;                                          &gt;&gt; \
$PGDATA/postgresql.conf &amp;&amp; \</div><div>      pg_ctl start &amp;&amp; \  \
</div><div>      createuser --username=postgres $PGUSER &amp;&amp; \</div><div>      \
createdb --username=postgres --encoding=UTF8 --owner=$PGUSER $PGDATABASE &amp;&amp; \
\</div><div>      psql --username=postgres $PGDATABASE --command=&quot;ALTER USER \
$PGUSER WITH PASSWORD &#39;$PGPASSWORD&#39;;&quot; &amp;&amp; \</div><div>      psql \
--username=postgres $PGDATABASE --command=&#39;CREATE EXTENSION IF NOT EXISTS \
postgis;&#39; &amp;&amp; \</div><div>      psql --username=postgres $PGDATABASE \
--command=&#39;CREATE EXTENSION IF NOT EXISTS hstore;&#39; &amp;&amp; \</div><div>    \
osm2pgsql --username=$PGUSER --database=$PGDATABASE --create --cache=60000 --hstore \
--latlong /data/map.osm.pbf &amp;&amp; \</div><div>      rm -f /data/map.osm.pbf \
&amp;&amp; \</div><div>      pg_ctl stop &amp;&amp; \</div><div>      echo \
&quot;fsync = on&quot;                                          &gt;&gt; \
$PGDATA/postgresql.conf &amp;&amp; \</div><div>      echo &#39;# TYPE DATABASE USER \
ADDRESS METHOD&#39;                        &gt; $PGDATA/pg_hba.conf &amp;&amp; \
\</div><div>      echo &quot;local all postgres peer&quot;                            \
&gt;&gt; $PGDATA/pg_hba.conf &amp;&amp; \</div><div>      echo &quot;local \
$PGDATABASE $PGUSER                 scram-sha-256&quot; &gt;&gt; $PGDATA/pg_hba.conf \
&amp;&amp; \</div><div>      echo &quot;host   $PGDATABASE $PGUSER <a \
href="http://0.0.0.0/0" rel="noreferrer" target="_blank">0.0.0.0/0</a> \
scram-sha-256&quot; &gt;&gt; $PGDATA/pg_hba.conf<br><br>The later fsync = on will \
override the former, right?</div></div><div><br></div><div>Best \
regards</div><div>Alex</div><div><br></div><div>  \
</div></div></div></div></div></div></blockquote></div></div><div \
dir="auto"><br></div><div dir="auto">2hrs sounds reasonable for Europe, it&#39;s a \
big place in terms of osm data and osm2pgsql is doing processing to convert to \
geometry objects prior to doing anything on the Postgresql side.</div><div \
dir="auto">If you examine the --log--sql output for a small test country you can see \
what it does in terms of the postgresql.</div><div dir="auto">osm2pgsql gives options \
to trim the output to only what you need (so if you don&#39;t  want waterways, \
traffic features, parking places or places of worship etc.. why load them)</div><div \
dir="auto">Hopefully you have found the excellent geofabrik  <a \
href="https://download.geofabrik.de/" \
target="_blank">https://download.geofabrik.de/</a> source for osm data.    \
</div><div>Rather than load this data afresh each update cycle you would be better \
off simply loading the changes so the .osc files or ... osm osmosis will create the \
equivalent of a diff file for you</div><div dir="auto">Looks like you are already \
using osm2psql&#39;s recommended postgresql.config settings, I&#39;d be surprised if \
this was way off.   Getting as close to tin rather than virtual machines and \
containers will also help, lots of io going on here.    </div><div dir="auto">If you \
are only interested in the geography  you might consider geofabrik&#39;s shapefile \
available for many countries, they have already done some of the work for \
you.<br></div><div dir="auto"><br></div><div dir="auto"><div dir="auto">Apologies if \
you are already a long way down this route &amp; just asking about the final stage of \
loading the osm2pgsql output to Postgresql but however well you do here I would only \
expect small marginal gains.</div><div dir="auto"><br></div></div><div \
dir="auto"><br></div><div dir="auto"><div class="gmail_quote"><blockquote \
class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc \
solid;padding-left:1ex"> </blockquote></div></div></div>
</div>



[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic