'[CFRG] Comments on draft-irtf-cfrg-argon2-13.txt'

List:       cfrg
Subject:    [CFRG] Comments on draft-irtf-cfrg-argon2-13.txt
From:       steve () tobtu ! com
Date:       2021-04-11 3:18:24
Message-ID: 587454483.311259.1618111104472 () email ! ionos ! com

> 4. Parameter Choice
> 
> Argon2id is optimized for more realistic settings, where the adversary
> possibly can access the same machine, use its CPU or mount cold-boot attacks.

"mount cold-boot attacks" needs to be removed.

----

> 7.4. Recommendations
> 
> The Argon2id variant with t=1 and 2GiB memory is FIRST RECOMMENDED option and
> is suggested as a default setting for all environments. This setting is secure
> against side-channel attacks and maximizes adversarial costs on dedicated
> bruteforce hardware. The Argon2id variant with t=3 and 64 MiB memory is SECOND
> RECOMMENDED option and is suggested as a default setting for
> memory-constrained environments.

"This setting is secure against side-channel attacks": this is not true. Note
that with side-channel attacks, Argon2id drops to basically Argon2i with
m=m'/4*(1+x/p), t=1, where x is 1 or 2. x is 2 with probability ((p-1)/p)^p.
You only need to do 99% of the first slice on all lanes, then the second slice
on one lane. Then you get to a dependent read that is in the current lane (that
is the ((p-1)/p)^p probability) and use that to get the second dependent read
location. With those two you should have a low probability (1 in 10000 or so)
of a false positive. In some cases, if m is large enough, you only need the
first dependent read location, so you don't care whether it's in the current
lane.
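
To put numbers on the two formulas above, here is a small sketch for p=4. It
assumes m'/4*(1+x/p) is read as (m'/4)*(1+x/p); the function names are mine and
only illustrate the arithmetic, not an attack implementation.

```python
from fractions import Fraction


def reduced_memory_fraction(p: int, x: int) -> Fraction:
    """Fraction of m' that remains, reading m'/4*(1+x/p) as (m'/4)*(1+x/p)."""
    return Fraction(1, 4) * (1 + Fraction(x, p))


def prob_x_is_2(p: int) -> Fraction:
    """Probability ((p-1)/p)^p that x is 2, as stated in the message."""
    return Fraction(p - 1, p) ** p


if __name__ == "__main__":
    p = 4  # lane count used in the settings discussed in this message
    print("P(x = 2):", prob_x_is_2(p), "=", float(prob_x_is_2(p)))
    for x in (1, 2):
        print(f"x = {x}: effective m is {reduced_memory_fraction(p, x)} of m'")
```

For p=4 this gives a probability of 81/256 (about 32%) that x is 2, and an
effective Argon2i memory of 5/16 or 3/8 of m'.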

----

Argon2id: m=2 GiB, t=1, p=4
Argon2id: m=64 MiB, t=3, p=4

Why are these so disparate? For an attacker with enough memory, these settings
should be similarly hard. In reality the one with lower memory usage should be
harder for an attacker with enough memory. *BUT* "m=2 GiB, t=1" reads/writes
4 GiB of memory and "m=64 MiB, t=3" reads/writes 0.5 GiB of memory. So
"m=2 GiB, t=1" is 8x harder than "m=64 MiB, t=3" given an attacker with enough
memory. To match memory reads/writes you need "m=64 MiB, t=22". Bandwidth used
for Argon2 is m*(3*t-1).

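The comparison above can be checked with a few lines of arithmetic. This is
only a sketch of the m*(3*t-1) estimate given in this message; the function
names are made up for illustration.

```python
def traffic_mib(m_mib: int, t: int) -> int:
    """Memory read/written in MiB, using the m*(3*t-1) estimate from the message."""
    return m_mib * (3 * t - 1)


def passes_to_match(m_mib: int, target_mib: int) -> int:
    """Smallest t such that traffic_mib(m_mib, t) >= target_mib (integer ceiling)."""
    return -(-(target_mib + m_mib) // (3 * m_mib))


if __name__ == "__main__":
    first = traffic_mib(2048, 1)   # m=2 GiB, t=1
    second = traffic_mib(64, 3)    # m=64 MiB, t=3
    print(first, "MiB vs", second, "MiB, ratio", first // second)
    print("t needed at m=64 MiB to match:", passes_to_match(64, first))
```
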
I'm all for going to 11 on settings, but you want a server to do
"m=2 GiB, t=1, p=4" with no mention of a queue. When settings are this high, it
is very important to set up a queue to prevent password hashing from exhausting
all memory. Also, there is no mention of throughput: the number of password
checks per second. In the past it was suggested to be able to do about
10 checks/core/second.
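
As one possible shape for such a queue, here is a minimal sketch that caps the
number of hashes running at once so their combined memory stays within a
budget. HashLimiter, memory_budget_mib and hash_fn are hypothetical names; the
actual Argon2 call is whatever the caller passes in, and a real server would
also want timeouts and a bound on how many requests may wait.

```python
import threading
from typing import Callable


class HashLimiter:
    """Caps concurrent password hashes so total memory stays within a budget."""

    def __init__(self, memory_budget_mib: int, m_mib: int) -> None:
        # Number of hashes of size m_mib that fit in the budget (at least one).
        self.max_concurrent = max(1, memory_budget_mib // m_mib)
        self._slots = threading.BoundedSemaphore(self.max_concurrent)

    def run(self, hash_fn: Callable[[], bytes]) -> bytes:
        # Callers block here (that is, they queue) while all slots are in use.
        with self._slots:
            return hash_fn()


if __name__ == "__main__":
    # Assumed numbers: 8 GiB set aside for hashing, m=2 GiB per hash.
    limiter = HashLimiter(memory_budget_mib=8192, m_mib=2048)
    print("hashes allowed at once:", limiter.max_concurrent)
    print(limiter.run(lambda: b"stand-in for an Argon2id call"))
```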

----

It may be useful to state minimum settings. Basically a bottom of the barrel:
don't go lower than these settings. After searching for CPUs, GPUs, and FPGAs,
the best attacker is GPUs. FPGAs have 1/2 to 1/3 the memory bandwidth of GPUs.
CPUs have 1/4 to 1/20 the memory bandwidth of GPUs.

GPUs:
RTX 3080: 10 GiB, 760.0 GB/s
RTX 3090: 24 GiB, 935.8 GB/s (takes 3 "PCI slots" vs 2)
Radeon VII: 16 GiB, 1028 GB/s

There are also these GPUs. Their cost is unknown, but a system with 8 A100
cards is $200k:
A100 80GB: 80 GiB, 2039 GB/s
A100: 40 GiB, 1555 GB/s

Since the current best cost/performance GPU for password cracking is the
RTX 3080, I'll go with that. Cracking speed for memory-hard algorithms is based
on memory bandwidth if there's enough computing power. At low memory usage we
can assume it gets near max bandwidth. The goal for a good password hash/KDF
for auth is <10 kH/s/GPU. Basically <10 kH/s/GPU is good and ≥10 kH/s/GPU is
not good. Therefore a memory-hard algorithm needs to read/write at least
72.5 MiB (760,000,000,000/10,000/1024/1024).

m=37 MiB, t=1: does 74 MiB
m=15 MiB, t=2: does 75 MiB
m=10 MiB, t=3: does 80 MiB

These are just higher than 72.5 MiB, which means the theoretical max is
<10 kH/s/GPU. You could argue that an attacker could buy Radeon VIIs, which
would push those to read/write at least 98.1 MiB. I'd argue that people aren't
building password cracking rigs specifically for memory-hard algorithms, but
let's say you win:

m=50 MiB, t=1: does 100 MiB
m=20 MiB, t=2: does 100 MiB
m=13 MiB, t=3: does 104 MiB

----

Note for encryption you want these to be <1 kH/s/GPU. At these memory sizes the
GPU likely can't use max bandwidth unless p>1.

m=363 MiB, t=1: does 726 MiB
m=145 MiB, t=2: does 725 MiB
m=91 MiB, t=3: does 728 MiB

Or for Radeon VII:
m=491 MiB, t=1: does 982 MiB
m=197 MiB, t=2: does 985 MiB
m=123 MiB, t=3: does 984 MiB
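
The minimum settings listed in this message can be recomputed from the GPU
bandwidth and the target hash rate. The sketch below assumes, as the message
does, that the attacker reaches the full advertised bandwidth and that traffic
is m*(3*t-1); rounding m up to a whole MiB is my assumption, and the function
names are illustrative.

```python
import math

MIB = 1024 * 1024


def min_traffic_mib(bandwidth_gb_s: float, max_hashes_per_s: int) -> float:
    """MiB each hash must read/write to keep a GPU at or below max_hashes_per_s."""
    return bandwidth_gb_s * 1e9 / max_hashes_per_s / MIB


def min_memory_mib(bandwidth_gb_s: float, max_hashes_per_s: int, t: int) -> int:
    """Smallest whole-MiB m whose m*(3*t-1) traffic meets the minimum."""
    return math.ceil(min_traffic_mib(bandwidth_gb_s, max_hashes_per_s) / (3 * t - 1))


if __name__ == "__main__":
    gpus = {"RTX 3080": 760.0, "Radeon VII": 1028.0}          # GB/s, from the message
    targets = {"auth": 10_000, "encryption": 1_000}           # hashes/s/GPU
    for use, rate in targets.items():
        for name, bandwidth in gpus.items():
            for t in (1, 2, 3):
                m = min_memory_mib(bandwidth, rate, t)
                print(f"{use}, {name}: m={m} MiB, t={t}: does {m * (3 * t - 1)} MiB")
```

For the RTX 3080 at 10 kH/s this gives m=37, 15 and 10 MiB for t=1, 2 and 3,
matching the figures in the message.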

----

To future-proof these, you could say: find the GPU with the highest memory
bandwidth that has an MSRP of about $700 in 2015 USD, and use its memory
bandwidth or 1000 GB/s, whichever is higher. Then divide that memory bandwidth
by ((3*t-1)*10.5 GB/s) for the minimum memory size in MiB.
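
A sketch of that rule of thumb, for illustration only. My reading is that the
10.5 constant is 10,000 hashes/s times 2^20 bytes per MiB divided by 10^9
(about 10.49), rounded up; that reading, the choice to round the result up to a
whole MiB, and the function name are all assumptions rather than something the
message states.

```python
import math

FLOOR_BANDWIDTH_GB_S = 1000.0  # "or 1000 GB/s, whichever is higher"
DIVISOR_GB_S = 10.5            # constant from the message


def rule_of_thumb_min_memory_mib(gpu_bandwidth_gb_s: float, t: int) -> int:
    """Minimum m in MiB: bandwidth / ((3*t-1) * 10.5), rounded up (assumption)."""
    bandwidth = max(gpu_bandwidth_gb_s, FLOOR_BANDWIDTH_GB_S)
    return math.ceil(bandwidth / ((3 * t - 1) * DIVISOR_GB_S))


if __name__ == "__main__":
    # An RTX 3080 (760 GB/s) falls below the floor, so 1000 GB/s is used.
    for t in (1, 2, 3):
        print(f"t={t}: m >= {rule_of_thumb_min_memory_mib(760.0, t)} MiB")
```

Because 10.5 is slightly larger than 10.49, this can land one MiB below an
exact calculation for some bandwidths.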

_______________________________________________
CFRG mailing list
CFRG@irtf.org
https://www.irtf.org/mailman/listinfo/cfrg