List:       john-dev
Subject:    Re: [john-dev] rawsha256.cu patch(using shared memory)
From:       myrice <qqlddg () gmail ! com>
Date:       2012-03-28 21:16:21
Message-ID: CANJ2NMMoV1EWU-sqW1u5jeFZZ4ACcMRRJT2a5dWbi06Rqvm4HQ () mail ! gmail ! com

Hi,

Lukas, Solar

Thank you for help!

To Lukas:
> You used shared memory after ALL time consuming computations, totally
> not good idea:)
> First of all you must decide will you apply for slow, or fast hashes.
> Those are DIFFERENT tasks with different needs.

I went through the code and found your comment "//use shared memory". I
suppose you mean using shared memory to make the final output (i.e., the
writes to global memory) coalesced. As the code shows, I first write the
result to shared memory and then to global memory, which contributes to the
performance gain. I just gave this a try. Now I know that, in order to make
fast hashes efficient, there is a lot of work to do; I will discuss the next
steps with Solar.
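
For illustration, here is a minimal sketch of the staging pattern I mean. It
is not the actual rawsha256.cu code; the kernel name, the block size, and the
dummy digest computation are just placeholders:

/*
 * Illustrative only: each thread computes its digest into registers,
 * stages it in shared memory, and then the whole block copies the
 * staged results to global memory so that consecutive threads write
 * consecutive 32-bit words (coalesced stores).
 */
#include <stdint.h>

#define HASH_WORDS        8     /* SHA-256 digest = 8 x 32-bit words */
#define THREADS_PER_BLOCK 256   /* placeholder block size */

__global__ void sha256_coalesced_out(uint32_t *out)
{
    __shared__ uint32_t staged[THREADS_PER_BLOCK * HASH_WORDS];
    uint32_t digest[HASH_WORDS];

    /* ... the real kernel computes SHA-256 of this thread's candidate
       into digest[]; a dummy value stands in for it here ... */
    for (int i = 0; i < HASH_WORDS; i++)
        digest[i] = threadIdx.x + i;

    /* stage the per-thread result in shared memory */
    for (int i = 0; i < HASH_WORDS; i++)
        staged[threadIdx.x * HASH_WORDS + i] = digest[i];
    __syncthreads();

    /* cooperative copy-out: adjacent threads store adjacent words,
       instead of each thread writing its own strided 32-byte chunk */
    uint32_t base = blockIdx.x * THREADS_PER_BLOCK * HASH_WORDS;
    for (int i = threadIdx.x; i < THREADS_PER_BLOCK * HASH_WORDS;
         i += THREADS_PER_BLOCK)
        out[base + i] = staged[i];
}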

To Solar:
> That's nice, but this is still awfully slow.  In fact, even the
> benchmarks we have on the wiki somehow show higher speeds, even though
> you have a faster card (GTX-580, right?)

I am sorry for leaving out my hardware details. The GTX-580 is in my lab's
server, but recently it has become unstable :(
I tested this code on my laptop, which has a GeForce 9600M GS card and a
P8600 CPU, so the performance is low.

> The formats interface bottleneck is somewhere above 50M c/s.  Actually,
> --format=dummy shows it at around 130M c/s on Core i7-2600, which is
> what you said you use, but indeed interfacing to the GPU takes time.
> With Samuele's fast hash implementations in OpenCL and running on GPU,
> we're getting close to 50M c/s.  So you also need to get close to that.
> This is a good thing for you to attempt.

> (And once you get there, you'd need to somehow demonstrate that your
> code would be even faster without the interface bottleneck - e.g., by
> starting to implement candidate password generation and hash comparison
> on GPU in whatever quick way you can for the demo.)

Okay, I will implement XSHA512 first, and if I have time I will attempt this
as well. However, I think that if I implement candidate password generation
and comparison on the GPU, there is a lot of work to do: I have to go through
the existing code for password generation (I guess it is mainly in
cracker.c?) and substitute it with CUDA.
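
For the comparison part, I imagine something roughly like the sketch below: a
kernel that flags matches on the GPU so only one int has to be copied back
per call. The names and the loaded-hash layout here are placeholders I made
up, not the existing formats interface:

/*
 * Rough sketch of on-GPU hash comparison only (candidate generation is
 * left out). Each thread compares its computed digest against all
 * loaded hashes and raises a flag on a match, so the host only has to
 * read back a single int rather than every digest.
 */
#include <stdint.h>

#define HASH_WORDS  8       /* SHA-256 digest = 8 x 32-bit words */
#define MAX_LOADED  1024    /* placeholder limit; fits in constant memory */

__constant__ uint32_t loaded_hashes[MAX_LOADED * HASH_WORDS];
__device__ int any_match;   /* host resets this to 0 before each launch */

__global__ void compare_kernel(const uint32_t *digests, int num_loaded)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    const uint32_t *d = &digests[tid * HASH_WORDS];

    for (int h = 0; h < num_loaded; h++) {
        /* cheap first-word test; full comparison only on a partial hit */
        if (d[0] == loaded_hashes[h * HASH_WORDS]) {
            int equal = 1;
            for (int i = 1; i < HASH_WORDS; i++)
                equal &= (d[i] == loaded_hashes[h * HASH_WORDS + i]);
            if (equal)
                atomicExch(&any_match, 1);
        }
    }
}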
