
List:       grass-dev
Subject:    [GRASS-dev] Heading towards unified dataset
From:       Vaclav Petras <wenzeslaus () gmail ! com>
Date:       2014-09-30 18:18:40
Message-ID: CABo5uVt7MPvR6RT_S5dH=8pvyNyV3+asWpypsbn2BCZD+jU6wg () mail ! gmail ! com

Hi,

At FOSS4G we talked about the need for a unified dataset (a location, in
GRASS terms) to make it easier to write examples and tests.

The location would have maps with unified names such as "elevation", and
these names could be used in examples and tests so that both can, to a
certain extent, work in different locations. For examples in manual pages
or other educational material, this would mean that they could be used more
easily across countries or areas. For tests, this would mean that the
algorithms can be tested with different projections or data.

This of course has its limits: for example, the result of a statistical
analysis would differ between locations, but the point is that the
analysis can be run.

We already have the NC (full) location and the NC basic location. They have
these raster maps in common:

elevation
elevation_shade
lakes

These maps are in NC basic:

basins
geology
landuse
soils

But in full NC they have different names:

basin_50K
geology_30m
landuse96_28m
soilsID

Of course, the longer names are there to distinguish these maps from similar
raster maps in the same location, but I would say that having unified names
is more advantageous for teaching/test datasets than absolute clarity of
names. That information should be in the metadata anyway.

One can also argue about the unified names themselves (e.g. elevation vs
dtm, or the use of underscores), but most of it is pretty clear since they
have to be the most general names possible.

The names obviously must be in English. If somebody would like to have the
data in a different language, a derived dataset must be created. Perhaps it
would be possible to provide some batch version of g.rename (but attribute
columns and other things would need renaming too).
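
A rough sketch of what such a batch rename could look like, in Python. It
only builds the g.rename calls and does not run them, so it works without
a GRASS session. The pairing of the full NC names with the NC basic names
is my assumption from the order of the two lists above, the function name
is made up for illustration, and the option name ("raster" here) may be
different depending on the GRASS version.

```python
# Assumed pairing of full NC raster names with the unified (NC basic)
# names, inferred from the order of the two lists in this message.
FULL_NC_TO_UNIFIED = {
    "basin_50K": "basins",
    "geology_30m": "geology",
    "landuse96_28m": "landuse",
    "soilsID": "soils",
}


def build_rename_commands(mapping, map_type="raster"):
    """Return one g.rename argument list per (old, new) pair.

    map_type is the g.rename option name; "raster" is an assumption
    and may need adjusting for the GRASS version in use.
    """
    commands = []
    for old, new in mapping.items():
        commands.append(["g.rename", "{}={},{}".format(map_type, old, new)])
    return commands


if __name__ == "__main__":
    # Print the commands instead of running them.
    for command in build_rename_commands(FULL_NC_TO_UNIFIED):
        print(" ".join(command))
```

Inside a GRASS session, each argument list could then be passed to
something like subprocess.call(). Attribute columns are not handled here.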

The last issue might be what to do if the area contains nothing that
belongs in a given map, or whether a dam or a pond counts as a lake. But we
can allow for some inaccuracies when creating a training dataset.

The other locations which can be unified are Piemonte and Spearfish.

So, what are the next steps? Decide which maps to include and which names
to use? Let's start from the NC basic location.

Raster maps

basins
elevation
elevation_shade
geology
lakes
landuse
soils

I'm not sure if geology and soils would be available in other locations, so
we could leave them out. However, they are available for Spearfish and
maybe for Piemonte (my Italian is not really usable).
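
To see how far a given location is from the list above, a test or an
example could first check which of the unified raster names are missing.
A minimal sketch in Python follows; the function name is hypothetical, and
the list of available maps would have to come from the location itself
(here it is just written out by hand using the full NC names mentioned
earlier).

```python
# Unified raster names proposed above (taken from NC basic).
UNIFIED_RASTERS = [
    "basins",
    "elevation",
    "elevation_shade",
    "geology",
    "lakes",
    "landuse",
    "soils",
]


def missing_maps(required, available):
    """Return the required names not found in available, in order."""
    available_set = set(available)
    return [name for name in required if name not in available_set]


if __name__ == "__main__":
    # Raster names from the full NC location as listed in this message.
    full_nc = [
        "elevation", "elevation_shade", "lakes",
        "basin_50K", "geology_30m", "landuse96_28m", "soilsID",
    ]
    print(missing_maps(UNIFIED_RASTERS, full_nc))
```

A test could skip itself, or an example could warn, when this list is not
empty.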

Vector maps

boundary_region
boundary_state
census
elev_points
firestations
geology
geonames
hospitals
points_of_interest
railroads
roadsmajor
schools
streams
streets
zipcodes

We would need to have at least one map for each type. I'm not sure which
ones are crucial and broadly available, but it seems that training datasets
are usually near some civilization, so roads or schools might be
available. Buildings would be nice to have.

Attribute data, time series, 3D rasters and (real) 3D vectors are of
course a whole new level. So I would start with rasters and (mostly 2D)
vectors.

Vaclav
