List:       openbsd-tech
Subject:    Re: amd64: add tsc_delay(), a TSC-based delay(9) implementation
From:       Mark Kettenis <mark.kettenis () xs4all ! nl>
Date:       2020-08-25 20:03:23
Message-ID: ea27a40a3915085f () bloch ! sibelius ! xs4all ! nl

> Date: Tue, 25 Aug 2020 12:20:22 -0700
> From: Mike Larkin <mlarkin@nested.page>
> 
> On Mon, Aug 24, 2020 at 12:29:15AM -0500, Scott Cheloha wrote:
> > On Sun, Aug 23, 2020 at 11:45:22PM -0500, Scott Cheloha wrote:
> > >
> > > [...]
> > >
> > > > > This patch (or something equivalent) is a prerequisite to running the
> > > > > lapic timer in oneshot or TSC deadline mode.  Using the lapic timer to
> > > > > implement delay(9) when it isn't running in periodic mode is too
> > > > > complicated.  However, using the i8254 for delay(9) is too slow.  We
> > > > > need an alternative.
> > > >
> > > > Hmm, but what are we going to use on machines where the TSC isn't
> > > > constant/invariant?
> > >
> > > Probably fall back on the i8254?  Unless someone wants to add yet
> > > another delay(9) implementation to amd64...
> > >
> > > > In what respect is the i8254 too slow?  Does it take more than a
> > > > microsecond to read it?
> > >
> > > On my machine, the portion of gettick() *within* the mutex runs in ~19
> > > microseconds.
> > >
> > > That's before any overhead from mtx_enter(9).  I think having multiple
> > > threads in delay(9) should be relatively rare, but you have to keep
> > > that in mind.
> > >
> > > No idea what the overhead would look like on real hardware.  I'm
> > > pretty sure my i8254 is emulated.
> > >
> > > > We could use the HPET I suppose, which may be a bit better.
> > >
> > > It's better.  No mutex.  On my machine it takes ~11 microseconds.
> > > It's a start.
> >
> > Hmmm, now I'm worried I have screwed something up or misconfigured
> > something.
> >
> > It doesn't seem right that it would take 20K cycles to read the HPET
> > on this machine.
> >
> > Am I way off?  Or is 20K actually a reasonable number?
> >
> 
> There have been reports of the HPET being really slow on some machines.
> IIRC this is why we ended up getting a tsc timecounter a number of years
> ago. Someone (reyk@?) found his skylake had a super slow HPET and that
> ended up being part of the impetus to do a tsc timecounter.

I believe that was "discovered" years ago, before Skylake existed.

Anyway, yes, HPET is much slower.  But since both Intel and AMD seem
to mess up the TSC every other CPU generation or so, we have to have a
fallback.
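
A rough sketch of that fallback order, for illustration only.  The
names below (timer_caps, delay_source, pick_delay_source) are
hypothetical and are not existing kernel identifiers, and preferring
the HPET over the i8254 is an assumption based on the timings quoted
above:

```c
#include <stdbool.h>

/* Hypothetical delay(9) backends, cheapest to read last. */
enum delay_source {
	DELAY_I8254,	/* always available, slowest to read */
	DELAY_HPET,	/* no mutex needed, but may still be slow */
	DELAY_TSC	/* cheapest, only trustworthy if invariant */
};

/* Hypothetical summary of what was detected at boot. */
struct timer_caps {
	bool tsc_invariant;	/* cf. machdep.invarianttsc */
	bool hpet_present;
};

/*
 * Pick the cheapest counter that can be trusted.  The TSC is used
 * only when it is invariant; otherwise fall back to the HPET if one
 * exists, and to the i8254 as the last resort.
 */
enum delay_source
pick_delay_source(const struct timer_caps *caps)
{
	if (caps->tsc_invariant)
		return DELAY_TSC;
	if (caps->hpet_present)
		return DELAY_HPET;
	return DELAY_I8254;
}
```

With the sysctl output quoted further down (machdep.invarianttsc=1)
such a selector would pick the TSC.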

> Also, 20k cycles is totally expected if you are on a VM (not sure if
> this is the case).

And in the end, delay(9) should not be used in performance critical
paths, so it doesn't matter all that much.  Your emulated com(4) may
result in some wasted cycles perhaps.  But if you are pushing lots of
data over your virtual serial port, maybe you should rethink what
you're doing.

> > For comparison, lapic_gettick() completes in... 80 nanoseconds (?) on
> > the same machine.  Relevant sysctls:
> >
> 
> LAPIC memory page accesses go to the CPU. It's not always the case that
> the HPET does the same (they may be accessed via PCI). Also, in a VM,
> on new CPUs, LAPIC virtualization can be enabled which means no exits
> for LAPIC accesses. So, yeah, these numbers you are seeing aren't surprising.
> 
> > $ sysctl hw.{model,setperf,perfpolicy} machdep.{tscfreq,invarianttsc}
> > hw.model=Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
> > hw.setperf=100
> > hw.perfpolicy=high
> > machdep.tscfreq=2112000000
> > machdep.invarianttsc=1
> >
> > ... if it really takes that long, then "high precision" is a bit of a
> > misnomer.
> >
> 
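
For scale, the cycle counts and latencies quoted in this thread can be
related through machdep.tscfreq.  The helpers below are a minimal
sketch with hypothetical names (usec_to_ticks, ticks_to_nsec,
counter_delay); this is not the code from the patch under discussion.
The busy-wait loop takes the counter read as a callback, so it makes
no assumption about which counter is behind it:

```c
#include <assert.h>
#include <stdint.h>

/* Convert microseconds to counter ticks at freq_hz (rounds down). */
uint64_t
usec_to_ticks(uint64_t usecs, uint64_t freq_hz)
{
	assert(freq_hz > 0);
	/* Split whole seconds from the remainder to avoid overflow. */
	return (usecs / 1000000) * freq_hz +
	    (usecs % 1000000) * freq_hz / 1000000;
}

/* Convert counter ticks at freq_hz to nanoseconds (rounds down). */
uint64_t
ticks_to_nsec(uint64_t ticks, uint64_t freq_hz)
{
	assert(freq_hz > 0);
	return (ticks / freq_hz) * 1000000000ULL +
	    (ticks % freq_hz) * 1000000000ULL / freq_hz;
}

/* Hypothetical counter-read callback, e.g. a wrapper around rdtsc. */
typedef uint64_t (*counter_read_fn)(void *);

/*
 * Spin until at least usecs microseconds' worth of ticks have passed.
 * Unsigned subtraction keeps the comparison correct across a single
 * wrap of the counter.  Every iteration pays for one counter read, so
 * the achievable granularity is bounded by the read latency.
 */
void
counter_delay(uint64_t usecs, uint64_t freq_hz, counter_read_fn rd,
    void *arg)
{
	uint64_t start = rd(arg);
	uint64_t ticks = usec_to_ticks(usecs, freq_hz);

	while (rd(arg) - start < ticks)
		continue;
}
```

At the reported machdep.tscfreq of 2112000000, 20000 cycles comes to
about 9.5 microseconds, and the ~11 microsecond HPET read corresponds
to 23232 cycles, so the two figures quoted above are consistent with
each other.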
