List: openjdk-lambda-dev
Subject: Re: Parallelism cost function
From: Paul Sandoz <paul.sandoz () oracle ! com>
Date: 2014-01-29 10:27:30
Message-ID: 98AD6353-C8AB-4341-95B5-6E1E2CEE31B9 () oracle ! com
On Jan 29, 2014, at 3:46 AM, Sam Pullara <spullara@gmail.com> wrote:
> I think that we should have a lot more information here about when it is
> appropriate to use the parallelStream() call unless we are going to make sure that
> it executes inappropriate workloads sequentially.
That is a very tricky problem to solve, since we don't know what the per-element cost
of the pipeline is. (We do have some indications as to how well the source splits.)
> I’d hate to have a generation of Java programmers randomly adding .parallelStream()
> to all their Streams just because they think it will always be faster.
In our presentations we explicitly talk about this ("Going parallel is easy to do,
but not always the right thing to do") and help developers to derive a mental model
(and when in doubt always measure! [1]).
So far we have deliberately avoided getting into the details of this in the JavaDoc [2],
since it could easily take up a few chapters of a book.
However, in hindsight we could have a section highlighting the main areas, such as the
source size and splitting characteristics, the cost per element, and imbalance that
makes the computation "lumpy", to at least head off the meme that "parallel always
equals faster". I believe we could add such a section to the docs of an 8u release.
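As a rough illustration of those factors (a sketch, not something from the JavaDoc), the following compares a sequential and a parallel run of the same pipeline while varying the source size and the cost per element. The class name ParallelCostSketch, the work() function and the chosen sizes are all hypothetical, and timing with System.nanoTime() is only a crude indication; a harness such as jmh would give far more trustworthy numbers.

```java
import java.util.stream.LongStream;

public class ParallelCostSketch {
    // Hypothetical per-element work: `spins` rounds of cheap arithmetic.
    // A larger `spins` value stands in for a higher cost per element.
    static long work(long x, int spins) {
        long h = x;
        for (int i = 0; i < spins; i++) {
            h = h * 31 + i;
        }
        return h;
    }

    // Runs the same pipeline over [0, size) either sequentially or in parallel.
    static long run(long size, int spins, boolean parallel) {
        LongStream s = LongStream.range(0, size);
        if (parallel) {
            s = s.parallel();
        }
        return s.map(x -> work(x, spins)).sum();
    }

    // Crude wall-clock timing of a single run (no warm-up, no repetition).
    static long timeNanos(long size, int spins, boolean parallel) {
        long start = System.nanoTime();
        run(size, spins, parallel);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Assumed sample points: small/cheap versus large/expensive.
        long[] sizes = {100, 1_000_000};
        int[] costs = {1, 1_000};
        for (long size : sizes) {
            for (int spins : costs) {
                System.out.printf("size=%d spins=%d seq=%dns par=%dns%n",
                        size, spins,
                        timeNanos(size, spins, false),
                        timeNanos(size, spins, true));
            }
        }
    }
}
```

No particular outcome is claimed here; the point is only that source size and cost per element are the knobs worth varying when measuring.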
Paul.
[1] We could provide some helper tooling leveraging jmh.
[2] "flatMap pushed loads of elements into the stream but i ain't seeing any speed
up, why?", "that's because the source, which is a Set, only contains a few elements
that happen to be mostly lumped in the same bucket".
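The situation in [2] can be sketched as follows, assuming a three-element HashSet as the source and a hypothetical helper, expandAndSum(), that fans each element out into many values. Splitting for parallel execution is driven by the source spliterator, which here only has three elements to divide, while the elements produced by each flatMap function are processed sequentially within a task. The class and method names are illustrative only.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.LongStream;
import java.util.stream.Stream;

public class FlatMapSourceSketch {
    // Hypothetical helper: expands each source element `base` into the values
    // base, base + 1, ..., base + fanOut - 1 and sums everything.
    static long expandAndSum(Set<Integer> source, int fanOut, boolean parallel) {
        Stream<Integer> s = parallel ? source.parallelStream() : source.stream();
        return s.flatMapToLong(base -> LongStream.range(0, fanOut).map(i -> base + i))
                .sum();
    }

    public static void main(String[] args) {
        Set<Integer> source = new HashSet<>(Arrays.asList(1, 2, 3));

        // The source reports only three elements to split on, no matter how
        // many elements flatMap later pushes downstream.
        System.out.println("source size estimate: "
                + source.spliterator().estimateSize());

        // Same result either way; the parallel run simply has little to split.
        System.out.println(expandAndSum(source, 1_000_000, false));
        System.out.println(expandAndSum(source, 1_000_000, true));
    }
}
```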