
List:       kvm
Subject:    Re: [PATCH] KVM: PPC: Book3S HV: Make the guest MMU hash table size configurable
From:       Avi Kivity <avi () redhat ! com>
Date:       2012-04-30 13:34:33
Message-ID: 4F9E94E9.9040500 () redhat ! com

On 04/30/2012 02:54 PM, Paul Mackerras wrote:
> > >
> > > It's not practical to grow the HPT after the guest has started
> > > booting.  It is possible to have two HPTs: one that the guest sees,
> > > which can be in pageable memory, and another shadow HPT that the
> > > hardware uses, which has to be in physically contiguous memory.  In
> > > this model the size of the shadow HPT can be changed at will, at the
> > > expense of having to reestablish the entries in it, though that can be
> > > done on demand.  I have avoided that approach until now because it
> > > uses more memory and is slower than just having a single HPT.
> > 
> > This is similar to x86 in the pre npt/ept days, it's indeed slow.  I
> > guess we'll be stuck with the pv hash until you get nested lookups (at
> > least a nested hash lookup is just 3 accesses instead of 24).
>
> How do you get 24?  Naively I would have thought that with a 4-level
> guest page table and a 4-level host page table you would get 16
> accesses.  

Each of the four guest ptes requires 4 host pte accesses to translate
its guest-physical address, plus one access to fetch the pte itself.
Finally, translating the guest-physical address of the data needs
another 4 host pte accesses: 4*5 + 4 = 24 translation accesses, so
counting the data fetch itself we need 25 memory accesses in a guest
to fetch a word with an empty TLB, compared to 5 on bare metal.
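
To make the counting explicit, here is a throwaway sketch (plain C,
nothing KVM-specific; the level counts are just parameters and the
function name is mine):

#include <stdio.h>

/* Worst-case memory accesses to translate one address with nested
 * (two-dimensional) radix walks: every guest pte fetch costs a full
 * host walk plus the fetch itself, and the final guest-physical
 * address of the data needs one more host walk. */
static unsigned nested_walk(unsigned guest_levels, unsigned host_levels)
{
	return guest_levels * (host_levels + 1) + host_levels;
}

int main(void)
{
	unsigned translation = nested_walk(4, 4);	/* 4*5 + 4 = 24 */

	printf("translation accesses: %u\n", translation);
	printf("with the data fetch:  %u\n", translation + 1);	/* 25 */
	printf("bare metal:           %u\n", 4 + 1);		/* 5 */
	return 0;
}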

> I have seen a research paper that shows that those accesses
> can be cached really well, whereas accesses in a hash generally don't
> cache well at all.

Yes, generally the first three levels (on both guest and host) cache
well, plus there are intermediate TLB entries for them.  The last level
tends to miss the cache on large guests (since it alone occupies about
0.2% of memory even with a single mm_struct), which is why we (=Andrea)
implemented transparent huge pages, which remove that level completely.
Another thing that can't be done on ppc IIUC.  Maybe you should talk to
your hardware people.
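
The 0.2% figure is just the pte size over the page size; a quick
illustration with the usual x86-64 numbers (example arithmetic only,
not measured data):

#include <stdio.h>

int main(void)
{
	double pte_bytes  = 8.0;
	double small_page = 4096.0;		/* 4 KiB */
	double huge_page  = 2048.0 * 1024;	/* 2 MiB */

	/* Bottom level of the radix tree: one 8-byte pte per 4 KiB page. */
	printf("last-level overhead, 4 KiB pages: %.3f%%\n",
	       100.0 * pte_bytes / small_page);		/* ~0.195%  */

	/* With 2 MiB huge pages that level disappears; the leaf entries
	 * move up a level and cost 8 bytes per 2 MiB instead. */
	printf("leaf overhead, 2 MiB pages:       %.5f%%\n",
	       100.0 * pte_bytes / huge_page);		/* ~0.0004% */
	return 0;
}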

> > How are limits managed?  Won't a user creating a thousand guests with a
> > 16MB hash each bring a server to its knees?
>
> Well, that depends on how much memory the server has.  In my
> experience the limit seems to be about 300 to 400 guests on a POWER7
> with 128GB of RAM; that's with each guest getting 0.5GB of RAM (about
> the minimum needed to boot Fedora or RHEL successfully) and using KSM.
> Beyond that it gets really short of memory and starts thrashing.  It
> seems to be the guest memory that consumes the memory rather than the
> HPTs, which are much smaller.  And for a 0.5GB guest, a 1MB HPT is
> ample, so 1000 guests then only use up 1GB.  Part of the point of my
> patch is to allow userspace to make the HPT be 1MB rather than 16MB
> for small guests like these.
>

Okay, sounds reasonable for the constraints you have.

I guess you can still make the sizing automatic by deferring hash table
allocation until the first KVM_RUN (when you know the size of guest
memory), but that leads to awkward locking and doesn't mesh well with
memory hotplug.
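
The sizing heuristic itself is simple either way; a sketch of what
userspace could do, using the rule of thumb from this thread (roughly
1 MiB of HPT per 512 MiB of guest RAM, never below 1 MiB).  The ratio,
the floor and the function name are illustrative, not part of the
proposed interface, and the resulting order would be handed to whatever
ioctl the patch ends up exposing:

#include <stdint.h>
#include <stdio.h>

/* Pick a power-of-two hash table order from the guest memory size. */
static unsigned int hpt_order_for_ram(uint64_t ram_bytes)
{
	uint64_t target = ram_bytes / 512;	/* ~1 MiB per 512 MiB */
	unsigned int order = 20;		/* 1 MiB floor */

	while ((1ULL << order) < target)
		order++;
	return order;
}

int main(void)
{
	printf("0.5 GiB guest -> HPT order %u (%llu KiB)\n",
	       hpt_order_for_ram(512ULL << 20),
	       (1ULL << hpt_order_for_ram(512ULL << 20)) >> 10);
	printf("8 GiB guest   -> HPT order %u (%llu KiB)\n",
	       hpt_order_for_ram(8ULL << 30),
	       (1ULL << hpt_order_for_ram(8ULL << 30)) >> 10);
	return 0;
}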

-- 
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html