
List:       linux-kernel
Subject:    Networking & atomic allocations in kernel [patch]
From:       Zlatko Calusic <Zlatko.Calusic () CARNet ! hr>
Date:       1997-10-23 16:03:56

Well, many people on the list claim that atomic memory allocations can 
fail, and that it's no big trouble if it happens.

But... what if I have to wait 10-30 minutes after the first failed
allocation to get my machine back? And all my running applications
get killed!

Problem:
While I'm logged in to my Linux machine remotely (via XDM login),
it's easy to kill it. If I start Netscape and XEmacs, so that the
machine swaps somewhat, and then start an FTP transfer or just click
on some URL in Netscape, I lose control. The workaround is to reset
(CTRL-ALT-BS) the X server on the local machine (also Linux, of
course), then wait about 15 minutes, sometimes even half an hour,
before I can log in again. In the meantime, the machine doesn't even
answer ICMP packets!!! Once I regain control, the kernel log is full
of "Insufficient memory; nuking packet!" messages (thousands of
them). It's slightly harder to trigger this lockup from the console,
but not impossible. It has happened to me at least twice, and a
reboot was the only solution.

My investigation:
The networking layer uses buffers (skb's) of ~1700 bytes if you're
using Ethernet as the physical layer (MTU: 1500). That means the slab
allocator will use the size-2048 cache! One slab in the size-2048
cache spans four (4) pages. Under (rather) heavy networking, memory
becomes so fragmented that you can't find even one chunk of free
pages bigger than 8k. All get_free_page(...,order,...) requests will
fail if order is >= 2. And lots of Ethernet packets are arriving,
waiting to be served!
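
The arithmetic behind that paragraph can be sketched in a few lines of
C. This is only an illustration, not kernel code: the helper names are
made up, the 4096-byte page size assumes i386, and the power-of-two
rounding (starting at 32 bytes) is a simplification of how the general
slab caches are sized.

```c
#include <stddef.h>

/* Assumption: i386 page size. */
#define SKETCH_PAGE_SIZE 4096UL

/* Hypothetical helper: round a request up to the next power-of-two
 * general cache size. The starting size of 32 is an assumption. */
static size_t general_cache_size(size_t bytes)
{
	size_t size = 32;

	while (size < bytes)
		size <<= 1;
	return size;
}

/* Hypothetical helper: smallest order whose block of (1 << order)
 * pages covers the given number of pages. */
static unsigned int order_for_pages(unsigned long pages)
{
	unsigned int order = 0;

	while ((1UL << order) < pages)
		order++;
	return order;
}
```

With these, general_cache_size(1700) gives 2048, and a four-page slab
needs order_for_pages(4) == 2, i.e. one contiguous 16kB block.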

Now the question is: what does kswapd do to free memory? The answer
is: nothing! After kswapd brings nr_free_pages up to free_pages_high,
it just sleeps. To be completely correct, it wakes up 4 times a
second, but only to see that there's no job for it. And that's when I
lose control of my Linux box. Too bad.
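
To make the point concrete, here is a small user-space sketch of why a
page-count test alone is blind to fragmentation. It is a simplified
model, not the kernel's data structures: free_blocks[] stands in for
the free_area lists, NR_ORDERS and all function names are invented for
the example, and kswapd_may_sleep() only mirrors the
"pages >= free_pages_high" comparison.

```c
/* Assumption: six orders (0..5), chosen only for illustration. */
#define NR_ORDERS 6

/* free_blocks[o] is the number of free blocks of (1 << o) pages. */
static unsigned long total_free_pages(const unsigned long free_blocks[NR_ORDERS])
{
	unsigned long pages = 0;

	for (int o = 0; o < NR_ORDERS; o++)
		pages += free_blocks[o] << o;
	return pages;
}

/* A request of the given order can be met only if a free block of
 * that order or larger exists (a larger block would be split). */
static int can_allocate(const unsigned long free_blocks[NR_ORDERS], int order)
{
	for (int o = order; o < NR_ORDERS; o++)
		if (free_blocks[o] > 0)
			return 1;
	return 0;
}

/* Count-only stop test: fragmentation plays no part in it. */
static int kswapd_may_sleep(unsigned long pages, unsigned long free_pages_high)
{
	return pages >= free_pages_high;
}
```

With 512 free single pages and 512 free two-page blocks (1536 pages,
6MB with 4kB pages), kswapd_may_sleep() is satisfied for any reasonable
free_pages_high, yet can_allocate(..., 2) fails.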

Workaround:
Kswapd should check whether memory is badly fragmented, and if there
are not enough free pages in the free_area lists of order >= 2, it
should swap out pages until it frees a few chunks. Under some
circumstances (with 100Mb/s Ethernet and a few TCP connections) it
was easy to have 6MB or even more of free memory without a single
16kB contiguous free area in memory!!! Since kswapd has a tough job
with this fragmentation, I still occasionally get "Insufficient
memory..." messages, but now my system doesn't lock up any more. And
networking is faster! In the patch I'm using 'min_free_pages / 2' as
the minimum number of pages I want to have in chunks of at least 16kB
(4 pages). I arrived at this after thorough testing and
experimenting, and it seems like a good idea. The patch is well
tested and should not cause you any trouble.
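
As a rough user-space model of the check described above: the patch
keeps nr_free_pages_bigorder up to date incrementally inside the
allocator, while this sketch simply recomputes it from a per-order
count of free blocks. The array, NR_ORDERS and the function names are
assumptions made for the example; only the final comparison follows
the patch.

```c
/* Assumption: six orders (0..5), chosen only for illustration. */
#define NR_ORDERS 6

/* Pages sitting in free blocks of order 2 and bigger.
 * free_blocks[o] is the number of free blocks of (1 << o) pages. */
static unsigned long free_pages_bigorder(const unsigned long free_blocks[NR_ORDERS])
{
	unsigned long pages = 0;

	for (int o = 2; o < NR_ORDERS; o++)
		pages += free_blocks[o] << o;
	return pages;
}

/* Stop test with the extra fragmentation condition: enough free
 * pages overall AND enough of them in 16kB-or-larger chunks. */
static int kswapd_may_stop(unsigned long pages, unsigned long bigorder,
			   unsigned long free_pages_high,
			   unsigned long min_free_pages)
{
	return pages >= free_pages_high && bigorder >= min_free_pages / 2;
}
```

A system with 1536 free pages but none in order >= 2 blocks no longer
passes the test, so kswapd keeps working until some larger chunks
appear.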

BTW, typical suggestions like:
	"... just echo "256 1024 2048" > /proc/sys/vm/freepages..."
don't help at all with memory fragmentation. They just make the
computer crawl (since you're not using all of the memory you paid
for :)).

My setup (though it should not be important in this issue):
Pentium 133, 32MB RAM, 10/100Mb/s Ethernet, de4x5, Linux v2.1.59.

Disclaimer:
I hope my English was good enough to explain what I wanted to say. :)

Patch follows:
-------------------------------------------------------------------------


diff -urN linux-2.1.59/include/linux/swap.h linux/include/linux/swap.h
--- linux-2.1.59/include/linux/swap.h	Tue Oct 21 21:31:30 1997
+++ linux/include/linux/swap.h	Tue Oct 21 22:06:48 1997
@@ -34,6 +34,7 @@
 
 extern int nr_swap_pages;
 extern int nr_free_pages;
+extern int nr_free_pages_bigorder;
 extern atomic_t nr_async_pages;
 extern int min_free_pages;
 extern int free_pages_low;
diff -urN linux-2.1.59/mm/page_alloc.c linux/mm/page_alloc.c
--- linux-2.1.59/mm/page_alloc.c	Tue Jun 17 01:36:01 1997
+++ linux/mm/page_alloc.c	Tue Oct 21 22:06:48 1997
@@ -30,6 +30,9 @@
 int nr_swap_pages = 0;
 int nr_free_pages = 0;
 
+/* Number of the free pages in chunks of order 2 and bigger */
+int nr_free_pages_bigorder = 0;
+
 /*
  * Free area management
  *
@@ -118,12 +121,17 @@
 		if (!test_and_change_bit(index, area->map))
 			break;
 		remove_mem_queue(list(map_nr ^ -mask));
+		if (order >= 2)
+			nr_free_pages_bigorder -= 1 << order;
 		mask <<= 1;
+		order++;
 		area++;
 		index >>= 1;
 		map_nr &= mask;
 	}
 	add_mem_queue(area, list(map_nr));
+	if (order >= 2)
+		nr_free_pages_bigorder += 1 << order;
 
 #undef list
 
@@ -171,6 +179,8 @@
 				(prev->next = ret->next)->prev = prev; \
 				MARK_USED(map_nr, new_order, area); \
 				nr_free_pages -= 1 << order; \
+			        if (new_order >= 2) \
+				        nr_free_pages_bigorder -= 1 << new_order; \
 				EXPAND(ret, map_nr, order, new_order, area); \
 				spin_unlock_irqrestore(&page_alloc_lock, flags); \
 				return ADDRESS(map_nr); \
@@ -187,6 +197,8 @@
 		area--; high--; size >>= 1; \
 		add_mem_queue(area, map); \
 		MARK_USED(index, high, area); \
+		if (high >= 2) \
+		        nr_free_pages_bigorder += 1 << high; \
 		index += size; \
 		map += size; \
 	} \
diff -urN linux-2.1.59/mm/vmscan.c linux/mm/vmscan.c
--- linux-2.1.59/mm/vmscan.c	Tue Oct 21 21:31:24 1997
+++ linux/mm/vmscan.c	Wed Oct 22 12:52:27 1997
@@ -464,7 +464,8 @@
 			pages = nr_free_pages;
 			if (nr_free_pages >= min_free_pages)
 				pages += atomic_read(&nr_async_pages);
-			if (pages >= free_pages_high)
+			if (pages >= free_pages_high &&
+				nr_free_pages_bigorder >= min_free_pages / 2)
 				break;
 			wait = (pages < free_pages_low);
 			if (try_to_free_page(GFP_KERNEL, 0, wait))
@@ -488,7 +489,7 @@
 	int want_wakeup = 0, memory_low = 0;
 	int pages = nr_free_pages + atomic_read(&nr_async_pages);
 
-	if (pages < free_pages_low)
+	if (pages < free_pages_low || nr_free_pages_bigorder < min_free_pages / 2)
 		memory_low = want_wakeup = 1;
 	else if (pages < free_pages_high && jiffies >= next_swap_jiffies)
 		want_wakeup = 1;




Regards,
-- 
Posted by Zlatko Calusic           E-mail: <Zlatko.Calusic@CARNet.hr>
---------------------------------------------------------------------
RTFM in Unix: read the fine manual; RTFM in Win32: reboot the fine machine
