'[vtk-developers] Flying Edges'

[prev in list] [next in list] [prev in thread] [next in thread] 

List:       vtk-developers
Subject:    [vtk-developers] Flying Edges
From:       Will Schroeder <will.schroeder () kitware ! com>
Date:       2015-01-10 15:04:46
Message-ID: CAEiDrtW++84mVSqLkkg68xdJkTyxRMmkThQ8-ypDLiv8BjD6+A () mail ! gmail ! com
[Download RAW message or body]

[Attachment #2 (multipart/alternative)]

Over the holidays I gave myself the gift of designing and implementing a
new contouring algorithm :-) To my utter delight and amazement I stumbled
upon a novel approach that is readily parallelizable and yet wicked fast in
serial mode as well. The algorithm (vtkFlyingEdges2D/vtkFlyingEdges3D) has
been pushed into gerrit for review if anyone is interested (FlyingEdges
branch). Consider this a work in progress, we are benchmarking and working
through parallel implementation details now. We are planning a paper
submission in the near future.

TLDR Details
+ As many of you know the VTK community is taking aim at seriously
reworking the system to better support parallel computing. Much of this
work is being led by Berk Geveci and his team. For example, Berk introduced
the vtkSMPTools into VTK about a year ago, and we are actively working with
some big name chip manufacturers, DOE Labs, research organizations, and
other customers/collaborators to get this ambitious undertaking done.
Hopefully individuals in this VTK community will also lend their
significant expertise as well.
+ Under prodding by Berk, and inspired by Patrick O'Leary's "born parallel"
slogan to rethinking systems and algorithms for parallel computing, the
motivation was to take one of the most important visualization algorithms
(isocontouring) and see if we could redesign it for emerging parallel
approaches.
+ The venerable Marching Cubes algorithm introduced the key idea that the
topology of an isosurface within a voxel can be captured with an index into
a simple, finite case table. Yet the implementation of the algorithm can be
slower than you might expect because, for example, on interior voxels,
voxel edges can be processed up to four times, and vertices visited eight
times (ignoring gradient calculations). Also the output mesh is dynamically
created: a priori the number of output points and triangles is unknown,
which means that memory resizing (realloc's) are required. Finally, the
algorithm is often implemented with a "locator" or hash table to prevent
the creation of duplicate points on shared voxel edges (the so-called point
merging problem). Not good for parallel computing.
+ Ken Martin's extremely fast Synchronized Templates algorithm introduced
the notion of a "voxel axes" or "edge triad" which consists of the three
x-y-z voxel axes located at the voxel origin. This triad is moved through
the volume in such a way as to intersect each voxel edge only one time
(with special treatment on the +x+y+z volume boundaries). (An aside: Ken
wrote this like 15 years ago, it's been the VTK standard ever since.
Initially we filed a patent application [since abandoned] but that was
before we realized how stupid it is for an open source company to hold
patents. Live and learn :-))
+ Flying Edges builds on these concepts and introduces some novel ideas. It
is a 3-pass algorithm: the first two passes gather information about the
volume including "intersections" along x-edges, and the number of output
primitives and points. The final pass generates output into preallocated
arrays. Some of the key ideas include:
-- each pass traverses x-rows in the volume completely independently
meaning that it is readily parallelizable.
-- the case table is based on the four voxel x-edges (which can be easily
transformed to and from a vertex-based MC table). Because the cases are
edge based, the computation of cases is parallel separable.
-- The novel concept of "computational trimming" is used to avoid massive
amounts of work. Basically the fact that isosurfaces are topologically
smooth is combined with the results of the first pass (building
x-voxel-edge cases) to reason about where the isosurface can exist in the
volume. This very simple topological reasoning enables the algorithm to
skip portions of x-rows, entire x-rows, and even x-y slices very rapidly.
-- Some simple edge metadata is gathered for each x-row volume edge which
can be used to allocate and partition memory for separate threads to write
into without contention (i.e., no mutexes or locks). This metadata also
enables the algorithm to process the voxels along each x-row in such a way
as to synchronously increment the point ids and triangle ids without
requiring point merging or hashing. In fact, topology generation (i.e.,
creating triangles) is completely independent from point/geometry
generation (i.e., interpolating point coordinates and attributes along
edges).
+ Early results show a 2-5x speedup over Synchronized Templates (>10x over
vtkImageMarchingCubes) when running in serial. However, this is very data
dependent and I suspect compiler dependent since there is a lot of C++
templating and in-lining going on (make sure to build optimized/release
mode). Anyway I am very excited because once the parallel loops are
functioning properly we should see much more performance improvement. The
scalability may be hampered however because data movement over the memory
bus may slow things up (computations are relatively modest compared to the
total data processed). We'll be benchmarking performance over the next
couple of months to see where we really are and of course we'll update this
community at that time.
+ The memory overhead of the algorithm is roughly 2-bits per vertex.
However as implemented now I'm avoiding bit packing and just use a byte
(unsigned char) per vertex.
+ Other interesting tidbits:
-- The basic algorithm is extremely simple to implement. Most of the
complications come from dealing with volume boundaries. It would be
possible to significantly reduce code complexity by either padding the
+x+y+z volume boundaries by a layer of voxels, or alternatively not
processing the outermost layer of voxels.
-- Because I am lazy, the algorithm reuses the MC case table as enumerated
in VTK (vtkMarchingCubesTriangleCases.h). However at instantiation the MC
table is converted to a Flying Edges edge-based case table. Also note that
the MC table as originally defined is for a left-handed coordinate system
(see the original paper). So reordering of triangles is required to make
sure that normals and triangle ordering is consistent.
-- On our radar is potential vectorization of operation like interpolation
across voxel edges. As you know some of the big hardware manufacturers are
really pushing this now and we'd like to learn how to best take advantage
of this emerging resource.
-- Probably the biggest downside of this algorithm is that is can produce
degenerate triangles (i.e., zero-area triangles in 3D, zero-length line
segments in 2D). The major reason is that when preallocating output memory,
we do so based on topological evaluation. However, during actual point
generation in degenerate cases (a degenerate case occurs when one of more
vertex scalar values == isocontour value) multiple intersection points can
occur at the vertex producing degeneracies. Since discarding a degenerate
triangle would mess up the output preallocation we can't just throw away
degenerate primitives. However I have already thought through a solution to
this problem but it is based on (spoiler alert) a much more complex case
table that enumerates 3-values per vertex: (<,==,>) when compared to the
isocontour value. This is a novel idea as well, mostly the academic
literature completely ignores the reality of degeneracies. Maybe next
holiday season we'll slip it into the algorithm.....

Okay I've got to get back to the pointy-haired management stuff. But please
feel free to provide feedback, or offers of collaboration :-) In particular
we'd love support, either in funding or brainpower, to do even more cool
parallel processing stuff. Hoping to hear from you, and I hope your coming
year is as exciting as mine is looking to be!

Best,
W

-- 
William J. Schroeder, PhD
Kitware, Inc.
28 Corporate Drive
Clifton Park, NY 12065
will.schroeder@kitware.com
http://www.kitware.com
(518) 881-4902

[Attachment #5 (text/html)]

<div dir="ltr">Over the holidays I gave myself the gift of designing and implementing \
a new contouring algorithm :-) To my utter delight and amazement I stumbled upon a \
novel approach that is readily parallelizable and yet wicked fast in serial mode as \
well. The algorithm (vtkFlyingEdges2D/vtkFlyingEdges3D) has been pushed into gerrit \
for review if anyone is interested (FlyingEdges branch). Consider this a work in \
progress, we are benchmarking and working through parallel implementation details \
now. We are planning a paper submission in the near future.<div><br></div><div>TLDR \
Details</div><div>+ As many of you know the VTK community is taking aim at seriously \
reworking the system to better support parallel computing. Much of this work is being \
led by Berk Geveci and his team. For example, Berk introduced the vtkSMPTools into \
VTK about a year ago, and we are actively working with some big name chip \
manufacturers, DOE Labs, research organizations, and other customers/collaborators to \
get this ambitious undertaking done. Hopefully individuals in this VTK community will \
also lend their significant expertise as well.</div><div>+ Under prodding by Berk, \
and inspired by Patrick O&#39;Leary&#39;s &quot;born parallel&quot; slogan to \
rethinking systems and algorithms for parallel computing, the motivation was to take \
one of the most important visualization algorithms (isocontouring) and see if we \
could redesign it for emerging parallel approaches.</div><div>+ The venerable \
Marching Cubes algorithm introduced the key idea that the topology of an isosurface \
within a voxel can be captured with an index into a simple, finite case table. Yet \
the implementation of the algorithm can be slower than you might expect because, for \
example, on interior voxels, voxel edges can be processed up to four times, and \
vertices visited eight times (ignoring gradient calculations). Also the output mesh \
is dynamically created: a priori the number of output points and triangles is \
unknown, which means that memory resizing (realloc&#39;s) are required. Finally, the \
algorithm is often implemented with a &quot;locator&quot; or hash table to prevent \
the creation of duplicate points on shared voxel edges (the so-called point merging \
problem). Not good for parallel computing.</div><div>+ Ken Martin&#39;s extremely \
fast Synchronized Templates algorithm introduced the notion of a &quot;voxel \
axes&quot; or &quot;edge triad&quot; which consists of the three x-y-z voxel axes \
located at the voxel origin. This triad is moved through the volume in such a way as \
to intersect each voxel edge only one time (with special treatment on the +x+y+z \
volume boundaries). (An aside: Ken wrote this like 15 years ago, it&#39;s been the \
VTK standard ever since. Initially we filed a patent application [since abandoned] \
but that was before we realized how stupid it is for an open source company to hold \
patents. Live and learn :-))</div><div>+ Flying Edges builds on these concepts and \
introduces some novel ideas. It is a 3-pass algorithm: the first two passes gather \
information about the volume including &quot;intersections&quot; along x-edges, and \
the number of output primitives and points. The final pass generates output into \
preallocated arrays. Some of the key ideas include:</div><div>-- each pass traverses \
x-rows in the volume completely independently meaning that it is readily \
parallelizable.</div><div>-- the case table is based on the four voxel x-edges (which \
can be easily transformed to and from a vertex-based MC table). Because the cases are \
edge based, the computation of cases is parallel separable.</div><div>-- The novel \
concept of &quot;computational trimming&quot; is used to avoid massive amounts of \
work. Basically the fact that isosurfaces are topologically smooth is combined with \
the results of the first pass (building x-voxel-edge cases) to reason about where the \
isosurface can exist in the volume. This very simple topological reasoning enables \
the algorithm to skip portions of x-rows, entire x-rows, and even x-y slices very \
rapidly.  </div><div>-- Some simple edge metadata is gathered for each x-row volume \
edge which can be used to allocate and partition memory for separate threads to write \
into without contention (i.e., no mutexes or locks). This metadata also enables the \
algorithm to process the voxels along each x-row in such a way as to synchronously \
increment the point ids and triangle ids without requiring point merging or hashing. \
In fact, topology generation (i.e., creating triangles) is completely independent \
from point/geometry generation (i.e., interpolating point coordinates and attributes \
along edges).</div><div>+ Early results show a 2-5x speedup over Synchronized \
Templates (&gt;10x over vtkImageMarchingCubes) when running in serial. However, this \
is very data dependent and I suspect compiler dependent since there is a lot of C++ \
templating and in-lining going on (make sure to build optimized/release mode). Anyway \
I am very excited because once the parallel loops are functioning properly we should \
see much more performance improvement. The scalability may be hampered however \
because data movement over the memory bus may slow things up (computations are \
relatively modest compared to the total data processed). We&#39;ll be benchmarking \
performance over the next couple of months to see where we really are and of course \
we&#39;ll update this community at that time.</div><div>+ The memory overhead of the \
algorithm is roughly 2-bits per vertex. However as implemented now I&#39;m avoiding \
bit packing and just use a byte (unsigned char) per vertex.</div><div>+ Other \
interesting tidbits:</div><div>-- The basic algorithm is extremely simple to \
implement. Most of the complications come from dealing with volume boundaries. It \
would be possible to significantly reduce code complexity by either padding the \
+x+y+z volume boundaries by a layer of voxels, or alternatively not processing the \
outermost layer of voxels.</div><div>-- Because I am lazy, the algorithm reuses the \
MC case table as enumerated in VTK (vtkMarchingCubesTriangleCases.h). However at \
instantiation the MC table is converted to a Flying Edges edge-based case table. Also \
note that the MC table as originally defined is for a left-handed coordinate system \
(see the original paper). So reordering of triangles is required to make sure that \
normals and triangle ordering is consistent.</div><div>-- On our radar is potential \
vectorization of operation like interpolation across voxel edges. As you know some of \
the big hardware manufacturers are really pushing this now and we&#39;d like to learn \
how to best take advantage of this emerging resource.</div><div>-- Probably the \
biggest downside of this algorithm is that is can produce degenerate triangles (i.e., \
zero-area triangles in 3D, zero-length line segments in 2D). The major reason is that \
when preallocating output memory, we do so based on topological evaluation. However, \
during actual point generation in degenerate cases (a degenerate case occurs when one \
of more vertex scalar values == isocontour value) multiple intersection points can \
occur at the vertex producing degeneracies. Since discarding a degenerate triangle \
would mess up the output preallocation we can&#39;t just throw away degenerate \
primitives. However I have already thought through a solution to this problem but it \
is based on (spoiler alert) a much more complex case table that enumerates 3-values \
per vertex: (&lt;,==,&gt;) when compared to the isocontour value. This is a novel \
idea as well, mostly the academic literature completely ignores the reality of \
degeneracies. Maybe next holiday season we&#39;ll slip it into the \
algorithm.....</div><div><br></div><div>Okay I&#39;ve got to get back to the \
pointy-haired management stuff. But please feel free to provide feedback, or offers \
of collaboration :-) In particular we&#39;d love support, either in funding or \
brainpower, to do even more cool parallel processing stuff. Hoping to hear from you, \
and I hope your coming year is as exciting as mine is looking to \
be!</div><div><br></div><div>Best,</div><div>W<br clear="all"><div><br></div>-- \
<br><div class="gmail_signature">William J. Schroeder, PhD<br>Kitware, Inc.<br>28 \
Corporate Drive<br>Clifton Park, NY 12065<br><a \
href="mailto:will.schroeder@kitware.com">will.schroeder@kitware.com</a><br><a \
href="http://www.kitware.com">http://www.kitware.com</a><br>(518) 881-4902</div> \
</div></div>

_______________________________________________
Powered by www.kitware.com

Visit other Kitware open-source projects at http://www.kitware.com/opensource/opensource.html

Search the list archives at: http://markmail.org/search/?q=vtk-developers

Follow this link to subscribe/unsubscribe:
http://public.kitware.com/mailman/listinfo/vtk-developers

[prev in list] [next in list] [prev in thread] [next in thread]