'Re: [Haskell-beginners] What is the functional way of implementing a function that takes a long time'

[prev in list] [next in list] [prev in thread] [next in thread] 

List:       haskell-beginners
Subject:    Re: [Haskell-beginners] What is the functional way of implementing a function that takes a long time
From:       Bryce Verdier <bryceverdier () gmail ! com>
Date:       2013-04-24 16:28:35
Message-ID: 51780833.2050103 () gmail ! com
[Download RAW message or body]

[Attachment #2 (multipart/alternative)]

Thank you Ertugrul for this awesome explanation.

It's emails like that are the reason why I love this list.

Bryce

On 04/23/2013 06:28 AM, Ertugrul Söylemez wrote:
> "Costello, Roger L." <costello@mitre.org> wrote:
>
>> Suppose a function takes a long time to do its work.
>>
>> Perhaps it takes minutes or even hours to complete.
>>
>> While it is crunching along, it would be nice to have some insight
>> into its status such as (1) how close is it to completing? (2) what
>> part of the task is it currently working on?
>>
>> It might even be nice to be notified when it is finished.
>>
>> What is the functional way of implementing the function?
> The traditional way in Haskell to see intermediate results is not to
> produce the final result only, but to produce a list of intermediate
> results, the last of which is the final result.  The trick is to apply
> what we call corecursion, which basically means:  Wrap the recursion in
> an (ideally nonstrict) data constructor cell.  The function
>
>      sqrtApprox :: Rational -> Rational
>
> then becomes:
>
>      sqrtApprox :: Rational -> [Rational]
>
> You know that you have done it properly if you used proper corecursion,
> which looks similar to this:
>
>      loop x y = (x, y) : loop x' y'
>          where
>          x' = {- ... -}
>          y' = {- ... -}
>
> The important part is that the recursion is the right argument of the
> constructor (:) and that the constructor application is the last thing
> that happens.  You can make sure that you have done it properly and even
> get some nice deforestation optimizations by using one of the predefined
> corecursion operators like 'unfoldr', 'iterate', etc.
>
> Sometimes you'll want to encode an algorithm even as a composition of
> predefined corecursive formulas like 'map', 'filter', 'tails', etc.  For
> example you may have a list [1,2,3,4,5,6,7,8,9] to begin with and you
> want to encode the sum of the products of three consecutive values,
> 1*2*3 + 4*5*6 + 7*8*9 + 10:
>
>      takeWhile (not . null) . map (take 3) . iterate (drop 3)
>
> Now how do we produce online statistics while this algorithm calculates
> its result?  This is the easiest part, but there is a catch.  You have a
> corecursively produced list, which ensures that the list is lazy enough.
> All you have to do now is to consume the list as part of an IO action.
> The foldM combinator is most helpful here:
>
>      sumStats :: (Num a) => [a] -> IO a
>      sumStats = foldM f 0
>          where
>          f s x = do
>              putStrLn ("Sum so far: " ++ s)
>              return (s + x)
>
> As said there is a catch.  In your recursive consumer you actually have
> to make sure that the intermediate result is actually calculated.  This
> is important, because otherwise your consumer doesn't actually force the
> calculation, but really just builds up a large unevaluated expression,
> which is only evaluated at the very end.
>
> The easiest way to ensure this is to just do what I did in sumStats:
> Print the intermediate result (you may call it 'state') or some value
> derived from it (make sure that the value depends on the entire state).
> If you don't want to perform some IO action with the state you can also
> just be strict.  There are many ways to be strict, my favorite being
>
>      f s x = do
>          {- ... -}
>          return $! s + x
>
> but you can also use the BangPatterns extension.
>
> The basic idea of all this is that you turn an opaque monolithic formula
> into a stream processing formula.  This not only makes it more flexible,
> but may also help you understand the original problem better and split
> it into independent modules.  Remember that you can always compose
> stream processors using regular function composition as done above.
>
> Once you are comfortable with using lists for stream processing you can
> also use an actual stream processing abstraction.  My personal favorite
> right now is the 'pipes' library, but there are other useful libraries
> including the other modern library 'conduit' as well as the traditional
> 'enumerator' library.
>
> This of course does not end with streams.  Streams are for algorithms
> with a linear execution paths, but not every algorithm follows that.  If
> your formula naturally follows multiple paths, there is nothing wrong
> with corecursively producing and recursively consuming trees or graphs,
> for example.  You just need unfoldTree+foldTreeM or
> unfoldGraph+foldGraphM for that.  This works with every algebraic data
> structure.
>
> As a final remark, don't worry about the performance.  Haskell is a lazy
> language.  Done properly at least the intermediate lists will be
> optimized away by the compiler and the resulting machine code is close
> to what a C compiler would have produced for the original monolithic
> formula randomly interspersed with statistics printing commands.
>
>
> Greets
> Ertugrul
>
>
>
> _______________________________________________
> Beginners mailing list
> Beginners@haskell.org
> http://www.haskell.org/mailman/listinfo/beginners

[Attachment #5 (text/html)]

<html>
  <head>
    <meta content="text/html; charset=ISO-8859-1"
      http-equiv="Content-Type">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    <div class="moz-cite-prefix">Thank you Ertugrul for this awesome
      explanation.<br>
      <br>
      It's emails like that are the reason why I love this list.<br>
      <br>
      Bryce<br>
      <br>
      On 04/23/2013 06:28 AM, Ertugrul S&ouml;ylemez wrote:<br>
    </div>
    <blockquote cite="mid:20130423152849.59b6ea25@tritium.ertes.de"
      type="cite">
      <pre wrap="">"Costello, Roger L." <a class="moz-txt-link-rfc2396E" \
href="mailto:costello@mitre.org">&lt;costello@mitre.org&gt;</a> wrote:

</pre>
      <blockquote type="cite">
        <pre wrap="">Suppose a function takes a long time to do its work.

Perhaps it takes minutes or even hours to complete.

While it is crunching along, it would be nice to have some insight
into its status such as (1) how close is it to completing? (2) what
part of the task is it currently working on?

It might even be nice to be notified when it is finished.

What is the functional way of implementing the function?
</pre>
      </blockquote>
      <pre wrap="">
The traditional way in Haskell to see intermediate results is not to
produce the final result only, but to produce a list of intermediate
results, the last of which is the final result.  The trick is to apply
what we call corecursion, which basically means:  Wrap the recursion in
an (ideally nonstrict) data constructor cell.  The function

    sqrtApprox :: Rational -&gt; Rational

then becomes:

    sqrtApprox :: Rational -&gt; [Rational]

You know that you have done it properly if you used proper corecursion,
which looks similar to this:

    loop x y = (x, y) : loop x' y'
        where
        x' = {- ... -}
        y' = {- ... -}

The important part is that the recursion is the right argument of the
constructor (:) and that the constructor application is the last thing
that happens.  You can make sure that you have done it properly and even
get some nice deforestation optimizations by using one of the predefined
corecursion operators like 'unfoldr', 'iterate', etc.

Sometimes you'll want to encode an algorithm even as a composition of
predefined corecursive formulas like 'map', 'filter', 'tails', etc.  For
example you may have a list [1,2,3,4,5,6,7,8,9] to begin with and you
want to encode the sum of the products of three consecutive values,
1*2*3 + 4*5*6 + 7*8*9 + 10:

    takeWhile (not . null) . map (take 3) . iterate (drop 3)

Now how do we produce online statistics while this algorithm calculates
its result?  This is the easiest part, but there is a catch.  You have a
corecursively produced list, which ensures that the list is lazy enough.
All you have to do now is to consume the list as part of an IO action.
The foldM combinator is most helpful here:

    sumStats :: (Num a) =&gt; [a] -&gt; IO a
    sumStats = foldM f 0
        where
        f s x = do
            putStrLn ("Sum so far: " ++ s)
            return (s + x)

As said there is a catch.  In your recursive consumer you actually have
to make sure that the intermediate result is actually calculated.  This
is important, because otherwise your consumer doesn't actually force the
calculation, but really just builds up a large unevaluated expression,
which is only evaluated at the very end.

The easiest way to ensure this is to just do what I did in sumStats:
Print the intermediate result (you may call it 'state') or some value
derived from it (make sure that the value depends on the entire state).
If you don't want to perform some IO action with the state you can also
just be strict.  There are many ways to be strict, my favorite being

    f s x = do
        {- ... -}
        return $! s + x

but you can also use the BangPatterns extension.

The basic idea of all this is that you turn an opaque monolithic formula
into a stream processing formula.  This not only makes it more flexible,
but may also help you understand the original problem better and split
it into independent modules.  Remember that you can always compose
stream processors using regular function composition as done above.

Once you are comfortable with using lists for stream processing you can
also use an actual stream processing abstraction.  My personal favorite
right now is the 'pipes' library, but there are other useful libraries
including the other modern library 'conduit' as well as the traditional
'enumerator' library.

This of course does not end with streams.  Streams are for algorithms
with a linear execution paths, but not every algorithm follows that.  If
your formula naturally follows multiple paths, there is nothing wrong
with corecursively producing and recursively consuming trees or graphs,
for example.  You just need unfoldTree+foldTreeM or
unfoldGraph+foldGraphM for that.  This works with every algebraic data
structure.

As a final remark, don't worry about the performance.  Haskell is a lazy
language.  Done properly at least the intermediate lists will be
optimized away by the compiler and the resulting machine code is close
to what a C compiler would have produced for the original monolithic
formula randomly interspersed with statistics printing commands.

Greets
Ertugrul

</pre>
      <br>
      <fieldset class="mimeAttachmentHeader"></fieldset>
      <br>
      <pre wrap="">_______________________________________________
Beginners mailing list
<a class="moz-txt-link-abbreviated" \
href="mailto:Beginners@haskell.org">Beginners@haskell.org</a> <a \
class="moz-txt-link-freetext" \
href="http://www.haskell.org/mailman/listinfo/beginners">http://www.haskell.org/mailman/listinfo/beginners</a>
 </pre>
    </blockquote>
    <br>
  </body>
</html>

_______________________________________________
Beginners mailing list
Beginners@haskell.org
http://www.haskell.org/mailman/listinfo/beginners

[prev in list] [next in list] [prev in thread] [next in thread]