[prev in list] [next in list] [prev in thread] [next in thread] 

List:       postgis-users
Subject:    Re: [postgis-users] Merging census block data in postgresql
From:       "Tsering W. Shawa" <shawatw () princeton ! edu>
Date:       2020-05-12 13:39:02
Message-ID: BL0PR04MB648428C0BDC005475F5026AAB9BE0 () BL0PR04MB6484 ! namprd04 ! prod ! outlook ! com
[Download RAW message or body]

[Attachment #2 (text/plain)]

Hi Paul,

Thanks for your email. I have a background in a GIS but not in a relational database. \
I am trying to understand conceptually how process should work. This is what I was \
thinking.....after I import all the individual state census block layer or table on \
the Postgresql database through the PostGIS Shapefile Import/Export Manager then I \
have to merge or combine them into one. After I merged all the census block \
boundaries and imported geocoded location data that is in a shapefile format onto the \
Postgresql database then I could run the spatial join SQL statement something like \
below example.



SELECT *

FROM census_blocks AS c

JOIN address AS a

ON ST_Contains (c.geom, a.geom);


I, therefore, trying to understand what options are there to do a similar function in \
the Postgresql - PostGIS world.

Any help and suggestions will be appreciated.

Thanks,
-Tsering


________________________________
From: postgis-users <postgis-users-bounces@lists.osgeo.org> on behalf of Paul Ramsey \
                <pramsey@cleverelephant.ca>
Sent: Monday, May 11, 2020 4:54 PM
To: PostGIS Users Discussion <postgis-users@lists.osgeo.org>
Subject: Re: [postgis-users] Merging census block data in postgresql

Check for a GIST index on the geometry column of the address points in particular, \
but also the blocks. If missing,

CREATE INDEX blocks_geom_x ON blocks USING GIST(geom);
CREATE INDEX addresses_geom_x ON addresses USING GIST(geom);

What SQL you run will depend to some extent on what you are trying to accomplish with \
the query. I'll assume you just want to make an address/block lookup table.

CREATE table addresses_blocks (
  address_id bigint,
  block_id bigint
);

INSERT INTO addresses_blocks (address_id, block_id)
  SELECT a.address_id, b.block_id
  FROM addresses a
  JOIN blocks b ON ST_Intersects(b.geom, a.geom);

Then just wait. Depending on your version of PgSQL and PostGIS this may or may not \
result in a parallel plan. Use EXPLAIN on the query before running it for real to see \
if you're getting multiple workers.

If you aren't getting multiple workers, you can get parallel behaviour by \
establishing multiple connections to the database and running independent queries in \
each that work on separate pieces of the problem.

INSERT INTO addresses_blocks (address_id, block_id)
  SELECT a.address_id, b.block_id
  FROM addresses a
  JOIN blocks b ON ST_Intersects(b.geom, a.geom);
  WHERE b.state = 'NY';

INSERT INTO addresses_blocks (address_id, block_id)
  SELECT a.address_id, b.block_id
  FROM addresses a
  JOIN blocks b ON ST_Intersects(b.geom, a.geom);
  WHERE b.state = 'CA';

Etc...
Depending on your platform, you might find that GNU parallel is a useful tool for \
spawning a set of workers that chew through a task file and run all the jobs you want \
in an optimally parallel way (ie, 4 cpus => 4 workers).

ATB,

P

> On May 11, 2020, at 1:27 PM, Tsering W. Shawa <shawatw@princeton.edu> wrote:
> 
> 
> I have to run a spatial join between millions of addresses and the census block \
> boundaries covering whole United States. I can do this function in ArcGIS Pro by \
> merging all the census blocks of each state to one File geodatabase and then run a \
> spatial join tool. Many of you might know that the census block boundaries for the \
> whole United States is over 11 GB and therefore too large for a shapefile. 
> I am learning PostGIS and wanted to see how fast spatial join function works in \
> PostGIS compare to ArcGIS Pro or QGIS. I have already imported all the census block \
> boundaries in postgresql and plan to import geocoded addresses also in the \
> postgresql database. 
> My question is...what sql function should I use to merge them together? Do I need \
> to create a new database or table that has all the census block boundaries merged \
> together? Any spatial index I need to use to run it faster? 
> Any workflow or suggestion or example of sql will be appreciated.
> 
> Many thanks in advance.
> 
> -Tsering
> _______________________________________________
> postgis-users mailing list
> postgis-users@lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/postgis-users

_______________________________________________
postgis-users mailing list
postgis-users@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/postgis-users


[Attachment #3 (text/html)]

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} \
</style> </head>
<body dir="ltr">
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; \
color: rgb(0, 0, 0);"> Hi Paul,</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; \
color: rgb(0, 0, 0);"> <br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; \
color: rgb(0, 0, 0);"> Thanks for your email. I have a background in a GIS but not in \
a relational database. I am trying to understand conceptually how process should \
work. This is what I was thinking.....after I import all the individual state census \
block layer or table on the  Postgresql database through the PostGIS Shapefile \
Import/Export Manager then I have to merge or combine them into one. After I merged \
all the census block boundaries and imported geocoded location data that is in a \
shapefile format onto the Postgresql database  then I could run the spatial join SQL \
statement something like below example.&nbsp;</div> <div style="font-family: Calibri, \
Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);"> <p \
style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: Calibri, sans-serif"> \
<o:p>&nbsp;</o:p></p> <p style="margin: 0in 0in 0.0001pt; font-size: 12pt; \
font-family: Calibri, sans-serif"> SELECT *</p>
<p style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: Calibri, \
sans-serif"> FROM census_blocks AS c</p>
<p style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: Calibri, \
sans-serif"> JOIN address AS a</p>
<p style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: Calibri, \
sans-serif"> ON ST_Contains (c.geom, a.geom);</p>
<p style="margin: 0in 0in 0.0001pt; font-size: 12pt; font-family: Calibri, \
sans-serif"> <br>
</p>
I, therefore, trying to understand what options are there to do a similar function in \
the Postgresql - PostGIS world.&nbsp;</div> <div style="font-family: Calibri, Arial, \
Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);"> <br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; \
color: rgb(0, 0, 0);"> Any help and suggestions will be appreciated.</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; \
color: rgb(0, 0, 0);"> <br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; \
color: rgb(0, 0, 0);"> Thanks,</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; \
                color: rgb(0, 0, 0);">
-Tsering</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; \
color: rgb(0, 0, 0);"> <br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; \
color: rgb(0, 0, 0);"> <br>
</div>
<div>
<div id="Signature"></div>
</div>
<div id="appendonsend"></div>
<hr style="display:inline-block;width:98%" tabindex="-1">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" \
style="font-size:11pt" color="#000000"><b>From:</b> postgis-users \
&lt;postgis-users-bounces@lists.osgeo.org&gt; on behalf of Paul Ramsey \
&lt;pramsey@cleverelephant.ca&gt;<br> <b>Sent:</b> Monday, May 11, 2020 4:54 PM<br>
<b>To:</b> PostGIS Users Discussion &lt;postgis-users@lists.osgeo.org&gt;<br>
<b>Subject:</b> Re: [postgis-users] Merging census block data in postgresql</font>
<div>&nbsp;</div>
</div>
<div class="BodyFragment"><font size="2"><span style="font-size:11pt;">
<div class="PlainText">Check for a GIST index on the geometry column of the address \
points in particular, but also the blocks.<br> If missing,<br>
<br>
CREATE INDEX blocks_geom_x ON blocks USING GIST(geom);<br>
CREATE INDEX addresses_geom_x ON addresses USING GIST(geom);<br>
<br>
What SQL you run will depend to some extent on what you are trying to accomplish with \
the query. I'll assume you just want to make an address/block lookup table.<br> <br>
CREATE table addresses_blocks (<br>
&nbsp; address_id bigint,<br>
&nbsp; block_id bigint<br>
);<br>
<br>
INSERT INTO addresses_blocks (address_id, block_id) <br>
&nbsp; SELECT a.address_id, b.block_id<br>
&nbsp; FROM addresses a<br>
&nbsp; JOIN blocks b ON ST_Intersects(b.geom, a.geom);<br>
<br>
Then just wait. Depending on your version of PgSQL and PostGIS this may or may not \
result in a parallel plan. Use EXPLAIN on the query before running it for real to see \
if you're getting multiple workers.<br> <br>
If you aren't getting multiple workers, you can get parallel behaviour by \
establishing multiple connections to the database and running independent queries in \
each that work on separate pieces of the problem.<br> <br>
INSERT INTO addresses_blocks (address_id, block_id) <br>
&nbsp; SELECT a.address_id, b.block_id<br>
&nbsp; FROM addresses a<br>
&nbsp; JOIN blocks b ON ST_Intersects(b.geom, a.geom);<br>
&nbsp; WHERE b.state = 'NY';<br>
<br>
INSERT INTO addresses_blocks (address_id, block_id) <br>
&nbsp; SELECT a.address_id, b.block_id<br>
&nbsp; FROM addresses a<br>
&nbsp; JOIN blocks b ON ST_Intersects(b.geom, a.geom);<br>
&nbsp; WHERE b.state = 'CA';<br>
<br>
Etc...<br>
Depending on your platform, you might find that GNU parallel is a useful tool for \
spawning a set of workers that chew through a task file and run all the jobs you want \
in an optimally parallel way (ie, 4 cpus =&gt; 4 workers).<br> <br>
ATB,<br>
<br>
P<br>
<br>
&gt; On May 11, 2020, at 1:27 PM, Tsering W. Shawa &lt;shawatw@princeton.edu&gt; \
wrote:<br> &gt; <br>
&gt; <br>
&gt; I have to run a spatial join between millions of addresses and the census block \
boundaries covering whole United States. I can do this function in ArcGIS Pro by \
merging all the census blocks of each state to one File geodatabase and then run a \
spatial join  tool. Many of you might know that the census block boundaries for the \
whole United States is over 11 GB and therefore too large for a shapefile. <br>
&gt; <br>
&gt; I am learning PostGIS and wanted to see how fast spatial join function works in \
PostGIS compare to ArcGIS Pro or QGIS. I have already imported all the census block \
boundaries in postgresql and plan to import geocoded addresses also in the postgresql \
database. <br>
&gt; <br>
&gt; My question is...what sql function should I use to merge them together? Do I \
need to create a new database or table that has all the census block boundaries \
merged together? Any spatial index I need to use to run it faster?&nbsp; <br>
&gt; <br>
&gt; Any workflow or suggestion or example of sql will be appreciated.<br>
&gt; <br>
&gt; Many thanks in advance.<br>
&gt; <br>
&gt; -Tsering<br>
&gt; _______________________________________________<br>
&gt; postgis-users mailing list<br>
&gt; postgis-users@lists.osgeo.org<br>
&gt; <a href="https://lists.osgeo.org/mailman/listinfo/postgis-users">https://lists.osgeo.org/mailman/listinfo/postgis-users</a><br>
 <br>
_______________________________________________<br>
postgis-users mailing list<br>
postgis-users@lists.osgeo.org<br>
<a href="https://lists.osgeo.org/mailman/listinfo/postgis-users">https://lists.osgeo.org/mailman/listinfo/postgis-users</a></div>
 </span></font></div>
</body>
</html>


[Attachment #4 (unknown)]

_______________________________________________
postgis-users mailing list
postgis-users@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/postgis-users

[prev in list] [next in list] [prev in thread] [next in thread] 

Configure | About | News | Add a list | Sponsored by KoreLogic