List:       linux-arm-kernel
Subject:    Re: On TLB flushing
From:       Russell King - ARM Linux <linux () arm ! linux ! org ! uk>
Date:       2004-04-16 16:51:09
Message-ID: 20040416175109.K29891 () flint ! arm ! linux ! org ! uk

On Fri, Apr 16, 2004 at 09:20:54AM -0700, Marc Singer wrote:
> On Fri, Apr 16, 2004 at 05:06:06PM +0100, Russell King - ARM Linux wrote:
> > No.  After that point, a read to that page _may_ (not should) generate
> > a fault.  There is no requirement that accesses do generate a fault at
> > this stage.
> > 
> > As far as the kernel is concerned, the page is still mapped into the
> > user process and the user process has every right to access that page
> > without generating a fault and telling the kernel about it.
> 
> Let me understand this.  
> 
> The kernel has knowingly removed the PTE without flushing the TLB.

No.  You're looking at it from too low a level.  Take a moment to stand
back and look at what's happening from a higher level.

High-level: The kernel has marked the page "old".  However, the page
is still mapped into user space, and may be accessed by userspace at
any time without causing any faults.  IOW, the page has not been
unmapped.


Low-level: In order to ascertain whether the page is still in use,
and thereby to restore its "young" status, the architecture
implementation has _lazily_ disabled access to the page.

Accesses to this page are still permitted as long as the TLB contains
an entry.  The lifetime of the TLB entry is bounded by either the next
context switch, or the TLB entry being evicted due to the TLB's
replacement rules.

=== time passes ===

When the TLB entry has been evicted, the next access causes a fault,
and the kernel looks at the PTE entry.


High-level: The kernel notices that the PTE indicates an "old" but
present page.  It marks it "young" and returns.


Low-level: We re-instate the PTE entry.  Since the TLB does not
contain an entry for this address, we avoid the flush.  When the
aborted instruction is retried, the MMU fetches the PTE and places
it in the TLB.
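
The sequence above can be sketched as a toy model in C.  This is only an
illustration, assuming one page, one PTE and a single TLB slot; it is not
the ARM implementation, and every name in it (toy_pte, toy_mmu,
toy_mark_old, toy_tlb_evict, toy_access) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: one page, one PTE, one TLB slot.  All names are
 * hypothetical and do not correspond to real kernel symbols. */
struct toy_pte {
	bool present;   /* kernel view: page is mapped into the process */
	bool young;     /* kernel view: page was referenced recently */
	bool hw_valid;  /* hardware view: MMU may load this entry */
};

struct toy_mmu {
	struct toy_pte pte;
	bool tlb_has_entry;  /* a cached translation exists */
	int faults;          /* faults taken so far */
};

/* Mark the page old: clear "young" and lazily disable hardware access.
 * The TLB is deliberately left alone (no flush). */
static void toy_mark_old(struct toy_mmu *m)
{
	m->pte.young = false;
	m->pte.hw_valid = false;
}

/* The TLB entry disappears: context switch or normal replacement. */
static void toy_tlb_evict(struct toy_mmu *m)
{
	m->tlb_has_entry = false;
}

/* One user access to the page.  Returns true if it took a fault. */
static bool toy_access(struct toy_mmu *m)
{
	bool faulted = false;

	assert(m->pte.present);  /* the page was never unmapped */

	if (m->tlb_has_entry)
		return false;    /* cached translation still usable */

	if (!m->pte.hw_valid) {
		/* Fault handler: PTE is "old" but present, so mark it
		 * young and re-enable hardware access.  No flush is
		 * needed because the TLB holds no entry for it. */
		m->faults++;
		m->pte.young = true;
		m->pte.hw_valid = true;
		faulted = true;
	}

	/* The retried access walks the page table and fills the TLB. */
	m->tlb_has_entry = true;
	return faulted;
}
```

In this model, an access made while the TLB entry is still live neither
faults nor restores "young"; only the first access after toy_tlb_evict()
does.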



I hope this gives you a better understanding.  Now, to the particular
points you've raised:

> It
> might leave this condition as is.  If, for some reason, the TLB is
> cleared (e.g. context switch), then a reference to that page will
> cause a fault even though the kernel hasn't reassigned the page.  This
> seems strange to me.  Why change the hardware mapping if it isn't
> going to reassign the page?

We are after a feel for which pages in the system are in use, so we
can improve the selection of which pages to throw out on to disk or
discard and which to keep.  This is only meant to be a hint to the
MM layer, and not an absolute state indicator.
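
To show how such a hint might be consumed, here is a minimal sketch of a
classic "second chance" (clock) scan in C.  It is an assumption made for
illustration, not the Linux MM layer's actual reclaim code; toy_page and
toy_pick_victim are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-page state: just the "young" hint. */
struct toy_page {
	bool young;
};

/* Second-chance scan starting at *hand.  A young page is aged (marked
 * old) and skipped; the first page found old is chosen as the victim.
 * Terminates because every skipped page becomes old. */
static size_t toy_pick_victim(struct toy_page *pages, size_t n, size_t *hand)
{
	assert(n > 0 && *hand < n);

	for (;;) {
		size_t i = *hand;

		*hand = (*hand + 1) % n;
		if (!pages[i].young)
			return i;          /* not referenced since last scan */
		pages[i].young = false;    /* give it a second chance */
	}
}
```

Because the scan only needs a rough ordering, a page whose "young" status
is restored a little late costs at most a slightly worse choice of victim.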

> Hang on.  Do we do this so that we can detect that the page is still
> in use?

Yes.  But again, I'll stress that we are only looking for a vague hint
not an absolute state.  "Lazy" is the buzz-word here.

> In other words, we know the page is still mapped, but we
> don't tell the CPU that, we tell it that the page is gone.  If the
> user accesses the page again, we remap the PTE and set the appropriate
> kernel-pte bits.

More or less.

> If this is so, isn't it imperative that we flush the tlb?

No, because we are looking for hints, not an absolute "this is being
accessed right now" indication - we couldn't really care that the page
has been accessed a couple of ms after we've marked it old.

> > > > However, once the page has been cleared (by, eg, ptep_clear_flush())
> > > > then the page is free to be reused.  In that case, having a stale
> > > > PTE entry would be _really_ bad.
> > > 
> > > I'll have to get a more detailed execution trace so that I can
> > > convince you that the flush code isn't being called.
> > 
> > If ptep_clear_flush() has not been called, as far as the kernel is
> > concerned, userspace has every right to access the page.
> > 
> > If you're saying that, in your case, the kernel thinks it owns the
> > page and is overwriting the contents of that page, there is a serious
> > kernel bug somewhere, and it isn't with the ARM implementation.
> 
> I don't think this is true.  Adding a TLB flush eliminates the bug.
> If this were merely a page overwrite problem then I think the TLB
> flush wouldn't fix it.

Please read my comments in the context of your comments above and not
in isolation.  I believe my comments are true in the context of your
statements.

> Honestly, I don't see how other architectures could be experiencing
> this problem.  No matter how cool I might think it would be to find a
> bona fide kernel bug, I cannot believe that I'm the first to uncover
> something so wrong.

Other architectures.

* x86:

  - page is marked old.  PTE entry remains intact, pointing at the page.
  - page is accessed, hardware itself marks the page young.  No software
    intervention at all.

* Alpha:

  - same as ARM; "old" emulated by lazily disabling access to the page
    and _not_ flushing the TLB, and waiting for a fault to mark it
    young again.
