List:       beowulf
Subject:    Re: [Beowulf] likwid vs stream (after HPCG discussion)
From:       Jörg_Saßmannshausen <sassy-work () sassy ! formativ ! net>
Date:       2022-03-21 17:39:04
Message-ID: 1790884.dna9SdsAjJ () deepblue

Dear all,

reading through this made me realise how difficult things are these days.
If you have an HPC cluster for just a few applications, you will probably find
it easier to build the software and replace the hardware, as you really could go
down to the assembler level and have highly optimised code and hardware
for exactly these few jobs.
If, as we do, you have to support anything from a single-core job to, say, a
few hundred cores, MPI and threaded jobs, and anything from memory-bandwidth-
intensive to I/O-heavy ones, all of that goes out of the window. I reckon it
will be very difficult to then find the one 'right' benchmark for you, as your
applications vary so much. So the trick is probably to find the sweet spot for
your cluster, and that might be a different setup at other sites.

As always, thanks for sharing your thoughts. 

All the best from a sunny and mild London

Jörg

On Monday, 21 March 2022, 09:46:31 GMT, Mikhail Kuzminsky wrote:
> In message from Scott Atchley <e.scott.atchley@gmail.com> (Sun, 20 Mar 2022 14:52:10 -0400):
> > On Sat, Mar 19, 2022 at 6:29 AM Mikhail Kuzminsky <kus@free.net> wrote:
> > > If so, it turns out that for the HPC user, stream gives a more
> > > important estimate - the application is translated by the compiler
> > > (they do not write in assembler - except for modules from
> > > mathematical libraries), and stream will give a real estimate of
> > > what will be received in the application.
> > 
> > When vendors advertise STREAM results, they compile the application
> > with non-temporal loads and stores. This means that all memory
> > accesses bypass the processor's caches. If your application of
> > interest does a random walk through memory and there is neither
> > temporal nor spatial locality, then using non-temporal loads and
> > stores makes sense and STREAM irrelevant.
> 
> STREAM was not originally designed for random access to memory. In
> that case, memory latencies are important, and it makes more sense to
> get a bandwidth estimate from mega-sweep
> (https://github.com/UK-MAC/mega-stream).
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf
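
The point quoted above about non-temporal stores can be made concrete with
the byte accounting of the STREAM triad kernel, a[i] = b[i] + scalar * c[i].
The sketch below is only an illustration, not part of STREAM itself: the
function names are hypothetical, 8-byte doubles are assumed, and the cache is
assumed to be write-allocate (an ordinary store first reads the target line
from memory). STREAM counts 24 bytes per triad iteration, so with ordinary
stores the reported figure can understate the traffic actually moved.

```python
# Hypothetical helper names; byte counts assume 8-byte doubles and a
# write-allocate cache. Kernel modelled: a[i] = b[i] + scalar * c[i]

BYTES_PER_DOUBLE = 8


def triad_bytes(n, non_temporal_stores):
    """Bytes moved to/from memory for one triad pass over n elements."""
    reads = 2 * n * BYTES_PER_DOUBLE   # loads of b[i] and c[i]
    writes = n * BYTES_PER_DOUBLE      # store to a[i]
    # An ordinary store on a write-allocate cache first reads a[i]'s line;
    # a non-temporal store skips that extra read.
    write_allocate = 0 if non_temporal_stores else n * BYTES_PER_DOUBLE
    return reads + writes + write_allocate


def reported_bandwidth_gbs(n, seconds):
    """Bandwidth in GB/s using the 24 bytes per iteration that STREAM counts."""
    return triad_bytes(n, True) / seconds / 1e9


def fraction_reported(non_temporal_stores, n=1_000_000):
    """Share of the modelled memory traffic that the STREAM count covers."""
    return triad_bytes(n, True) / triad_bytes(n, non_temporal_stores)
```

Under these assumptions, fraction_reported(False) gives 0.75: a triad built
with ordinary stores moves 32 bytes per iteration while 24 are counted, which
is one reason published results are usually built with non-temporal stores.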