List:       tor-dev
Subject:    [tor-dev] Improving Tor network models [was: Update to Proposal 316: FlashFlow]
From:       "Jansen, Robert G CIV USN NRL (5543) Washington DC (USA)" <rob.g.jansen () nrl ! navy
Date:       2020-10-17 23:42:43
Message-ID: 6959E031-706A-4DB4-B7B4-51C76589B537 () nrl ! navy ! mil

> 
> On Oct 8, 2020, at 2:50 PM, Mike Perry <mikeperry@torproject.org> wrote:
> 
> I do not yet have confidence that these issues are solved simply because
> they did not appear in Shadow. Shadow does not simulate multi-instance
> relays, CPU bound relays, or structural load imbalances in the network.

Hi Mike and others!

I'd like to better understand your criticisms here so that we can work
to make Shadow more useful (work that fits squarely under sponsor 38).

> multi-instance relays

Nothing prevents Shadow from running multiple tor relay processes on the
same virtual host. We could add this to the Tor models that are created
by our model generation tool[0].

One issue is that we don't have ground truth about:
- which relays are co-resident with one another; and
- the capacity of the machine hosting the co-resident relays.

A short-term fix could be to look at relays in the same family and
randomly choose some of them to run on the same machine (setting the
capacity of the machine to the sum of the maximum observed bandwidths of
the co-resident relays). A longer-term solution would be to add a new
parameter similar to MyFamily and ask operators to identify which relays
are co-resident, or to add to tor a self-measurement of co-residency;
either would provide the ground truth we would need for accurate
modeling.
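
The short-term fix above could be sketched roughly as follows. This is
only an illustration, not tornetgen code: the function name
group_coresident, the input fields (fingerprint, family,
max_observed_bw), and the group_size parameter are all assumptions made
for the example.

```python
import random
from collections import defaultdict


def group_coresident(relays, group_size=2, seed=0):
    """Randomly place relays from the same family on shared virtual hosts.

    `relays` is a list of dicts with hypothetical keys 'fingerprint',
    'family' (None if the relay declares no family), and
    'max_observed_bw'. Returns one dict per virtual host.
    """
    rng = random.Random(seed)  # seeded so a generated model is reproducible

    by_family = defaultdict(list)
    for relay in relays:
        by_family[relay["family"]].append(relay)

    hosts = []
    for family, members in by_family.items():
        members = list(members)
        rng.shuffle(members)
        # Relays without a family are never grouped with each other.
        step = 1 if family is None else group_size
        for i in range(0, len(members), step):
            chunk = members[i:i + step]
            hosts.append({
                "relays": sorted(r["fingerprint"] for r in chunk),
                # Host capacity: sum of the max observed bandwidths.
                "capacity": sum(r["max_observed_bw"] for r in chunk),
            })
    return hosts
```

Which relays end up together is a random guess in this sketch; it does
not replace the ground truth that operator reporting or self-measurement
would give us.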

Thoughts? Any other ideas?

> CPU-bound relays

There are two issues here:
- we need to improve/rewrite our virtual CPU module in Shadow that
  accounts for CPU load; and
- we need ground truth about the number of CPUs and CPU speeds for each relay.

The first one is relatively straightforward to resolve; the second one
again requires some form of self-reporting or automated self-measurement
in tor.

> structural load imbalances

Could you please explain this one in a couple more sentences?

By 'structural' I think you might mean imbalances across relay positions
(i.e., more guard bandwidth and less exit bandwidth). If so, then Shadow
already accounts for this by statically assigning flags using the
TestingDirAuthVoteExit and TestingDirAuthVoteGuard torrc options.
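
As a rough illustration of what static flag assignment can look like,
the sketch below renders the two torrc lines for a directory authority.
Both options take a comma-separated list of nodes. The helper name
render_flag_options and the relay nicknames are hypothetical, and this
is not necessarily how our tooling writes its configuration.

```python
def render_flag_options(exit_nodes, guard_nodes):
    """Render torrc lines that pin the Exit and Guard flags.

    `exit_nodes` and `guard_nodes` are iterables of node identifiers
    (for example nicknames or identity fingerprints).
    """
    lines = []
    if exit_nodes:
        lines.append("TestingDirAuthVoteExit " + ",".join(sorted(exit_nodes)))
    if guard_nodes:
        lines.append("TestingDirAuthVoteGuard " + ",".join(sorted(guard_nodes)))
    return "\n".join(lines)


# Hypothetical nicknames, for illustration only.
print(render_flag_options(["relay2", "relay1"], ["relay3"]))
```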

Here are some bonus ones for you:

> capacity of relays

We currently use the maximum observed bandwidth that we've seen for a
relay and set that value as the network link capacity of the (virtual)
host machine that runs each relay. Again, we don't have any ground truth
of how much capacity is available to each relay, though maybe someday
FlashFlow will collect it for us.

> diversity of Tor versions

We should make sure our modeling tool includes relays across different
versions of Tor, since not all relays in the public network run the same
version. This one is pretty simple to fix (it just requires us to build
Tor plugins from multiple different Tor source versions), but research
that is testing how a new idea performs across the network by modifying
Tor source will obviously need to use its custom research version of
Tor.

Peace, love, and positivity,
Rob

[0] https://github.com/shadow/tornetgen
