
List:       helix-server-cvs
Subject:    [Server-cvs] engine/core/pub server_engine.h,1.8.72.2,1.8.72.3
From:       dcollins () helixcommunity ! org
Date:       2013-09-24 21:03:29

Update of /cvsroot/server/engine/core/pub
In directory cvs01.internal.helixcommunity.org:/tmp/cvs-serv7879/pub

Modified Files:
      Tag: SERVER_NUCLEUS
	server_engine.h 
Log Message:
Synopsis
========
Fixes RPD-2431: high CPU load with select() errors

Branches:  SERVER_NUCLEUS
Reviewer:  Jamie


Description
===========
This was a highly unusual problem that appears to have been triggered
by a fault in the OS.  Much of the networking on the system was broken,
although a few things still worked (ping, Remote Desktop).  Whatever the
underlying fault was, it caused the server to get WSAEPROVIDERFAILEDINIT
errors from socket calls.

While there seems to be little we can do about the core system problem,
we were making it much worse for the user since we were spinning
the CPUs at 100% with select()-related socket errors.

This adds some defensive code: if we see excessive select() socket
errors in the first minute of uptime, we terminate the server.  If we
see them later, we assume it is a different, unknown problem and
restart the server instead.
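
As a rough illustration of the policy described above (not the actual
code in server_engine.cpp), the decision could be sketched as follows.
The threshold value, the enum, and the function name
DecideSelectErrorAction are assumptions made for this sketch; the
change itself only speaks of "excessive" errors and a one-minute window.

```cpp
#include <cstdint>

// Hypothetical threshold: the commit does not say what counts as "excessive".
constexpr std::uint32_t kMaxSelectErrors = 1000;
// "First minute of uptime", expressed in seconds.
constexpr std::uint32_t kStartupWindowSecs = 60;

// Hypothetical result type for this sketch.
enum class SelectErrorAction { Continue, Terminate, Restart };

// Decide what to do given the number of select() errors counted so far
// and how long the server has been running.
constexpr SelectErrorAction DecideSelectErrorAction(std::uint32_t ulErrCounter,
                                                    std::uint32_t ulUptimeSecs)
{
    if (ulErrCounter < kMaxSelectErrors)
    {
        // Not excessive yet: keep running the main loop.
        return SelectErrorAction::Continue;
    }
    if (ulUptimeSecs < kStartupWindowSecs)
    {
        // Failing right from startup: likely a broken system, so give up.
        return SelectErrorAction::Terminate;
    }
    // Worked for a while and then failed: unknown problem, try a restart.
    return SelectErrorAction::Restart;
}
```

Terminating during the startup window avoids a restart loop on a system
whose networking is broken from the outset, while a restart later on
gives the server a chance to recover from a transient fault.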


Files Affected
==============
server/engine/core/server_engine.cpp
server/engine/core/platform/win/engine.cpp
server/engine/core/pub/server_engine.h


Testing Performed
=================
Unit Tests:
- None

Integration Tests:
- After diagnosing the problematic system, I reproduced the main
  part of the behavior by forcing callbacks.Select() to return -1
  in the mainloop.  With this change applied, the server detected the
  problem and terminated.  I also forced a delay to verify that the
  server restarts instead once it has been running for a while.

Leak Tests:
- None

Performance Tests:
- None

Platforms Tested:  Windows 7 / x86 (Debug)
Builds Verified:  VC10 / x86 / Debug


QA Hints
========
This is another change that will be hard for QA to test directly;
I'll have to provide an instrumented build.



Index: server_engine.h
===================================================================
RCS file: /cvsroot/server/engine/core/pub/server_engine.h,v
retrieving revision 1.8.72.2
retrieving revision 1.8.72.3
diff -u -d -r1.8.72.2 -r1.8.72.3
--- server_engine.h	19 Sep 2013 17:25:42 -0000	1.8.72.2
+++ server_engine.h	24 Sep 2013 21:03:12 -0000	1.8.72.3
@@ -70,6 +70,7 @@
 private:
     void                OnSocketsChanged();
     void                OnDescriptorsChanged();
+    void                HandleSelectErrors(UINT32 ulErrCounter, UINT32 ulErrCode);
 
     Process*            m_pProc;
 


_______________________________________________
Server-cvs mailing list
Server-cvs@helixcommunity.org
http://lists.helixcommunity.org/mailman/listinfo/server-cvs