
List:       ast-users
Subject:    Re: [ast-users] [ast-developers] vmalloc memory allocations via shared memory ? / was: Re: [patch] v
From:       Glenn Fowler <glenn.s.fowler () gmail ! com>
Date:       2013-12-10 9:27:05
Message-ID: CAK449vCyVVvPk_Jm9uATDjsgcMgpQC33sfP9kiab5Fjov3SkLw () mail ! gmail ! com

i'll defer to kpv on this


On Tue, Dec 10, 2013 at 4:08 AM, Lionel Cons <lionelcons1972@gmail.com> wrote:

> On 10 December 2013 09:13, Glenn Fowler <glenn.s.fowler@gmail.com> wrote:
> > On Mon, Dec 9, 2013 at 5:17 PM, Roland Mainz <roland.mainz@nrubsig.org> wrote:
> >>
> >> On Mon, Dec 9, 2013 at 10:53 PM, Roland Mainz <roland.mainz@nrubsig.org> wrote:
> >> > On Fri, Dec 6, 2013 at 5:40 AM, Glenn Fowler <glenn.s.fowler@gmail.com> wrote:
> >> >> On Thu, Dec 5, 2013 at 4:50 PM, Irek Szczesniak <iszczesniak@gmail.com> wrote:
> >> [snip]
> >> > Erm... Solaris (|__SunOS|) was once (pre-vmalloc-rewrite) "exempt"
> >> > from this functionality since it cannot overcommit memory (except if
> >> > someone uses |MAP_NORESERVE| or uses kernel debugging options in
> >> > /etc/system) ...
> >> >
> >> > ... attached (as
> >> > "astksh20131010_vmalloc_sunos_fragmentation_fix001.diff.txt") is a
> >> > patch which...
> >> > 1. ... restores this exception for Solaris
> >> >
> >> > 2. ... bumps the |mmap()| size to 4MB for 32bit processes and 16MB for
> >> > 64bit processes since both values are more or less the points where
> >> > the fragmentation stops. Note that this does *not* mean it will use so
> >> > much memory... it only means that it reserves this amount of memory
> >> > and the real allocation happens on the first read, write or execute
> >> > access of the matching MMU page. This also means there is no
> >> > performance difference between a 1MB |mmap(MAP_ANON)| and a 128MB
> >> > |mmap(MAP_ANON)| since it only reserves memory but does not
> >> > initialise/allocate it yet... this happens the first time it's
> >> > accessed. The other reasons for the 4MB/16MB size were: x86 has 2MB
> >> > largepages, allowing a ksh process to benefit from such pages,
> >> > additionally most AST (including ksh93) applications consume a few MB
> >> > of memory... so there is a good chance that the "typical"
> >> > application/shell memory consumption completely fits into that 4MB
> >> > chunk. 64bit processes get four times as much memory since it's
> >> > expected that they may operate on much larger datasets (and see the
> >> > comment about fragmentation above)
> >> >
> >> > Just to demonstrate "reservation" vs. "real usage" via Solaris pmap:
> >> > -- snip --
> >> > $ ksh -c 'print hello ; pmap -x $$ ; true' | egrep '16384.*anon'
> >> > FFFFFD7FFDA00000      16384        148         20          - rw---    [ anon ]
> >> > -- snip --
> >> > The test shows that of 16384k only 148k have really been touched...
> >> > the difference (16384-148) is reserved by the shell process but not
> >> > used.
> >> >
> >> > 3. Linux has /proc/sys/vm/overcommit_memory which is 0 (heuristic
> >> > overcommit), 1 (always overcommit) or 2 (strict accounting, no
> >> > overcommit) to describe whether the kernel permits overcommitment of
> >> > memory or not. AFAIK a simple function could be written which returns
> >> > |-1| (does not permit overcommitment), |0| (don't know) or |1| (does
> >> > permit overcommitment) ... and if the function returns |-1| vmalloc
> >> > should do the same as on Solaris
> >> >
> >> > 4. The patch removes one unnecessary |memset(p, 0, size)| which was
> >> > touching pages and therefore allocating them
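
A minimal sketch of the function suggested in item 3, assuming the usual Linux meaning of the three modes (0 = heuristic, 1 = always, 2 = strict accounting). The names overcommit_from_mode() and overcommit_policy() are made up for illustration; they are not part of vmalloc or of the attached patch.

```c
#include <stdio.h>

/* Map a Linux vm.overcommit_memory mode to the tri-state from item 3:
 *  1 = kernel may overcommit, -1 = kernel does not overcommit,
 *  0 = don't know. */
int overcommit_from_mode(int mode)
{
    switch (mode) {
    case 0:             /* heuristic overcommit */
    case 1:             /* always overcommit */
        return 1;
    case 2:             /* strict accounting, no overcommit */
        return -1;
    default:            /* unknown value: make no claim */
        return 0;
    }
}

/* Hypothetical helper: read the mode from `path`, which would normally
 * be "/proc/sys/vm/overcommit_memory". Any failure yields 0. */
int overcommit_policy(const char *path)
{
    FILE *fp = fopen(path, "r");
    int mode = 0;
    int ok;

    if (fp == NULL)
        return 0;       /* not Linux, or /proc is not mounted */
    ok = (fscanf(fp, "%d", &mode) == 1);
    fclose(fp);
    return ok ? overcommit_from_mode(mode) : 0;
}
```

A caller would use overcommit_policy("/proc/sys/vm/overcommit_memory") and treat a result of -1 the same way as the Solaris case. Returning 0 on every read or parse failure keeps the function safe on systems without /proc.
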
> >>
> >> Note that if I use VMALLOC_OPTIONS=getmem=safe with the patch above
> >> vmalloc seems to resort to try shared memory:
> >> -- snip --
> >> shmget(IPC_PRIVATE, 67108864, 0600|IPC_CREAT)   = 8
> >> brk(0x00603480)                                 = 0
> >> shmat(8, 0, 0600)                               = 0xFFFFFD7FFAA00000
> >> shmdt(0xFFFFFD7FFAA00000)                       = 0
> >> shmat(8, 0xDFFFFFAFFFA83000, 0600)              Err#22 EINVAL
> >> shmat(8, 0xEFFFFE97FD241000, 0600)              Err#22 EINVAL
> >> shmat(8, 0xF7FFFE0BFBE20000, 0600)              Err#22 EINVAL
> >> shmat(8, 0xFBFFFDC5FB410000, 0600)              Err#22 EINVAL
> >> shmat(8, 0xFDFFFDA2FAF08000, 0600)              Err#22 EINVAL
> >> shmat(8, 0xFEFFFD917AC84000, 0600)              Err#22 EINVAL
> >> shmat(8, 0xFF7FFD88BAB42000, 0600)              Err#22 EINVAL
> >> shmat(8, 0xFFBFFD845AAA1000, 0600)              Err#22 EINVAL
> >> shmat(8, 0xFFDFFD822AA50000, 0600)              Err#22 EINVAL
> >> shmat(8, 0xFFEFFD8112A28000, 0600)              Err#22 EINVAL
> >> shmat(8, 0xFFF7FD8086A14000, 0600)              Err#22 EINVAL
> >> shmat(8, 0xFFFBFD8040A0A000, 0600)              Err#22 EINVAL
> >> shmat(8, 0xFFFDFD801DA05000, 0600)              Err#22 EINVAL
> >> shmat(8, 0xFFFEFD800C202000, 0600)              Err#22 EINVAL
> >> shmat(8, 0xFFFF7D8003601000, 0600)              Err#22 EINVAL
> >> shmat(8, 0xFFFFBD7FFF000000, 0600)              = 0xFFFFBD7FFF000000
> >> shmdt(0xFFFFBD7FFF000000)                       = 0
> >> shmat(8, 0xFFFFBD7FFB000000, 0600)              = 0xFFFFBD7FFB000000
> >> shmdt(0xFFFFBD7FFB000000)                       = 0
> >> shmat(8, 0xFFFFBD7FF7000000, 0600)              = 0xFFFFBD7FF7000000
> >> shmdt(0xFFFFBD7FF7000000)                       = 0
> >> -- snip --
> >> ... note that such an allocation is... erm... not wise... because
> >> shared memory is usually a system-wide resource... which means
> >> if many shell processes use shared memory it won't be available for
> >> other processes (like databases) anymore.
> >
> >
> > if you look at that particular code it's probing process address boundaries
> > and then releasing after the probe (shmdt())
>
> How useful is this kind of probing? Most operating systems restrict
> shared memory to a specific virtual address range which is defined at
> boot time. Probing outside that range will always return a failure
> because it's not in the 'address window' defined by the system.
> Such a probe strikes me as pretty useless because it depends on
> a behaviour which is not portable across platforms or different
> hardware configurations running the same operating system version.
>
> Lionel
>

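The "reservation" vs. "real usage" point from item 2 of the quoted patch description can also be checked from C. This is only a sketch under assumptions: Linux with glibc (MAP_ANONYMOUS, and mincore() taking an unsigned char vector; other systems declare the vector as char *). resident_after_touch() is a hypothetical helper written for this example, not vmalloc code.

```c
#define _DEFAULT_SOURCE         /* MAP_ANONYMOUS and mincore() on glibc */
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Reserve `len` bytes of anonymous memory, write to the first `touch`
 * pages, and return how many pages the kernel reports as resident.
 * Returns -1 on error. */
long resident_after_touch(size_t len, size_t touch)
{
    long ps = sysconf(_SC_PAGESIZE);
    size_t pagesize, npages, i;
    long resident = 0;
    unsigned char *vec;
    char *p;

    if (ps <= 0 || len == 0)
        return -1;
    pagesize = (size_t)ps;
    npages = (len + pagesize - 1) / pagesize;
    if (touch > npages)
        touch = npages;

    /* Reservation only: no page is backed by memory yet. */
    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    vec = malloc(npages);
    if (vec == NULL) {
        munmap(p, len);
        return -1;
    }

    /* The first access to each page is what makes the kernel allocate it. */
    for (i = 0; i < touch; i++)
        p[i * pagesize] = 1;

    if (mincore(p, len, vec) != 0) {
        resident = -1;
    } else {
        for (i = 0; i < npages; i++)
            resident += vec[i] & 1;     /* bit 0 set = page is resident */
    }

    free(vec);
    munmap(p, len);
    return resident;
}
```

Calling resident_after_touch(16UL << 20, 4) reserves 16MB and touches four pages. The exact count depends on the kernel (transparent huge pages, for example, can make a single touch bring in far more than one page), so the number is best read as a lower-bound check rather than an exact figure.
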


_______________________________________________
ast-users mailing list
ast-users@lists.research.att.com
http://lists.research.att.com/mailman/listinfo/ast-users

