
List:       opensolaris-security-discuss
Subject:    Re: [networking-discuss] New project proposal: IP Datapath Refactoring
From:       Erik Nordmark <erik.nordmark () sun ! com>
Date:       2009-01-29 2:37:56
Message-ID: 49811684.1050009 () sun ! com

Dan McDonald wrote:
> Pardon the top-post, but I think the Security community will be interested
> in this project too because complexity is the enemy of security, and this
> project reduces complexity.

Sorry for not thinking about that up front.

> And as a core contributor in Security, I ACK/+1 endorsing this
> project.  (The project team can deem this endorsement inappropriate if
> they wish.)

I don't have an issue with endorsements from the security community.

    Erik

> Dan
> 
> On Wed, Jan 28, 2009 at 03:07:57PM -0800, Erik Nordmark wrote:
>> We would like to start the IP datapath refactoring project.
>> We are requesting endorsement from the networking community.
>>
>> Thanks,
>>     Erik
>>
>>
>> -- OPENSOLARIS PROJECT PROPOSAL --
>>
>>
>> Project Name:
>> 	IP Datapath Refactoring
>>
>> Project Synopsis:
>> 	Simplify the IP Datapaths to make them more understandable and evolvable
>>
>>
>> Project Purpose:
>>
>> The IP datapaths are extremely hard to follow, both at the micro level
>> (ip_output_options, ip_wput_ire, and ip_input) and at the macro level
>> (an outbound packet needing IPsec and ARP resolution goes through a
>> convoluted sequence of steps).
>>
>> That makes it hard even to fix bugs in that code, let alone to make it
>> perform well. As a result, performance has been improved by creating
>> numerous fast paths, which are subsets of the full datapaths. This in
>> turn makes maintaining the code a hazardous activity.
>>
>> The root cause of the complexity is that ip_newroute introduces
>> asynchrony in the wrong part of the code. Traditionally ARP is done at the
>> very bottom of the IP output side, but to avoid a separate ARP table
>> lookup, Solaris has an IRE_CACHE entry which is created to include the
>> ARP information. This is done early in ip_output because the IRE_CACHE
>> is also used to pick an IP source address in some cases (unconnected UDP 
>> and RAWIP sockets) and we need the IP source address early (before doing 
>> IPsec etc).
>>
>> We need to move the ARP-related asynchrony to the bottom of IP output to
>> make the output datapaths saner, and it also makes sense to decouple
>> source address selection from routing/IRE lookup. (In 1991 source
>> address selection was simpler than it is today, so the association made
>> some sense. But with IPMP, IPv6, shared-IP zones, etc., source address
>> selection can no longer simply be tied to the route.)
>>
>> A side effect of ip_newroute is that we need to carry various 
>> information from the transport protocols to the point after ip_newroute 
>> is done. We've created various ways to put this information in the 
>> messages so that they can be queued with the packets waiting for ARP 
>> resolution; the ip6i_t is there for this purpose as well as the 
>> ipsec_{in,out}_t, which is currently used for more than just IPsec. There
>> are also ad-hoc places where we scribble information (b_prev, etc.).
>>
>> Note that the ip6i_t and M_CTL are also used to carry information
>> between the transport protocols and IP (on both the input and output
>> paths). But since FireEngine in S10 introduced direct function calls
>> between the transports and IP, we are no longer limited to passing a
>> message using putnext. Hence we can relatively easily add function call
>> arguments up and down between the transports and IP and have those
>> arguments carry the meta-data associated with the packet. (One example
>> of such meta-data: on the receive side the transports need the incoming
>> interface, the ill_t, to handle IP_RECVPKTINFO and IPv6 link-local
>> addresses correctly.)
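Passing meta-data as a function argument instead of prepending it to the message can be sketched as follows. The names recv_attr_t, sock_t, udp_input, and ip_input here are hypothetical simplifications; the ifindex field stands in for the incoming ill_t, which this sketch assumes is what a transport would report for IP_RECVPKTINFO (and, for IPv6 link-local sources, as the scope).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical receive attributes, loosely in the spirit of ip_recv_attr_t. */
typedef struct recv_attr {
    uint32_t ifindex;     /* incoming interface (stands in for the ill_t) */
    uint32_t src;
    uint32_t dst;
} recv_attr_t;

/* Hypothetical per-socket state. */
typedef struct sock {
    int      want_pktinfo;    /* set when the application enabled IP_RECVPKTINFO */
    uint32_t last_ifindex;    /* what would be returned as ancillary data */
    int      delivered;
} sock_t;

/* Transport input: meta-data arrives as an argument, not in an M_CTL/ip6i_t prefix. */
static void udp_input(sock_t *so, const unsigned char *data, size_t len,
    const recv_attr_t *ira) {
    assert(ira != NULL);
    (void)data; (void)len;
    if (so->want_pktinfo)
        so->last_ifindex = ira->ifindex;
    so->delivered++;
}

/* IP input fills in the attributes once and calls the transport directly. */
static void ip_input(sock_t *so, uint32_t ifindex, const unsigned char *data,
    size_t len) {
    recv_attr_t ira;

    memset(&ira, 0, sizeof (ira));
    ira.ifindex = ifindex;
    udp_input(so, data, len, &ira);
}
```

The attribute structure lives on the caller's stack for the duration of the call, so there is no extra mblk to allocate, prepend, or strip.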
>>
>> Having looked at the dependencies that unravel when ip_newroute is
>> removed, it turns out that the whole concept of IRE_CACHE isn't needed
>> any more. We can do more efficient caching (as S10 already does for
>> TCP) by caching the IRE and the NCE (neighbor cache entry, containing
>> the ARP information) in the conn_t.
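Per-connection caching of the route and neighbor entries could look roughly like this. It is a minimal sketch: the generation-number validation is an assumed mechanism chosen for illustration (the proposal does not say how cached entries are invalidated), reference counting is omitted, and ire_t, nce_t, conn_t, route_lookup, conn_get_nce, and ire_invalidate are simplified hypothetical versions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified route and neighbor entries (no refcounts). */
typedef struct nce { uint32_t nexthop; } nce_t;
typedef struct ire {
    uint32_t dst;
    nce_t   *nce;
    unsigned generation;  /* assumption: bumped whenever the entry changes */
} ire_t;

typedef struct conn {
    ire_t   *cached_ire;
    unsigned cached_gen;  /* generation observed when the entry was cached */
    nce_t   *cached_nce;
} conn_t;

static int lookups;       /* counts full routing-table lookups */

/* Stand-in for a full lookup; the "table" is a single entry in this sketch. */
static ire_t *route_lookup(ire_t *table, uint32_t dst) {
    lookups++;
    return (table->dst == dst) ? table : NULL;
}

/*
 * Per-packet path for a connected socket (dst never changes): reuse the
 * cached IRE/NCE while the IRE's generation still matches.
 */
static nce_t *conn_get_nce(conn_t *c, ire_t *table, uint32_t dst) {
    assert(c != NULL);
    if (c->cached_ire != NULL && c->cached_gen == c->cached_ire->generation)
        return c->cached_nce;
    ire_t *ire = route_lookup(table, dst);
    if (ire == NULL)
        return NULL;
    c->cached_ire = ire;
    c->cached_gen = ire->generation;
    c->cached_nce = ire->nce;
    return c->cached_nce;
}

/* Changing an IRE only bumps its generation; conns revalidate lazily. */
static void ire_invalidate(ire_t *ire, nce_t *new_nce) {
    ire->nce = new_nce;
    ire->generation++;
}
```

The design choice illustrated is lazy revalidation: changing a route costs O(1) and never walks the connections that cached it.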
>>
>> This results in the removal of
>> 	ip_newroute*
>> 	IRE_CACHE
>> 	ip6i_t
>> 	M_CTL usage, including ipsec_out_t and ipsec_in_t
>> 	Various b_prev uses on the ip_input side
>> and the addition of
>> 	ip_xmit_attr_t - the transmit attributes passed to ip_output
>> 	ip_recv_attr_t - the receive attributes passed up to the ULP (and used
>> 	internally in IP)
>> 	A new way to track dependencies when IREs are added and removed
>> 	Using nce_t for ARP information (we partially do this today, mostly for
>> 	the IPv4 forwarding paths)
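On the transmit side, the idea of an attribute structure handed to ip_output can be sketched like this. The fields, the XA_IPSEC flag, DEFAULT_ROUTE_IFINDEX, and this ip_output signature are hypothetical and do not reflect the actual layout of ip_xmit_attr_t. The only protocol fact relied on is that the IPv4 TTL is byte 8 of the header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical transmit attributes, in the spirit of ip_xmit_attr_t. */
typedef struct xmit_attr {
    uint32_t flags;       /* hypothetical flag bits, e.g. XA_IPSEC */
    uint32_t ifindex;     /* outgoing interface if pinned (e.g. IP_BOUND_IF), else 0 */
    uint8_t  ttl;
} xmit_attr_t;

#define XA_IPSEC              0x1u  /* hypothetical: apply outbound IPsec policy */
#define DEFAULT_ROUTE_IFINDEX 2u    /* hypothetical: interface of the default route */

static int ipsec_calls;             /* placeholder for IPsec processing */

/* Returns the outgoing interface index chosen for this IPv4 packet. */
static uint32_t ip_output(unsigned char *pkt, size_t len, const xmit_attr_t *ixa) {
    assert(ixa != NULL && len >= 20);
    pkt[8] = ixa->ttl;              /* TTL is byte 8 of the IPv4 header */
    if (ixa->flags & XA_IPSEC)
        ipsec_calls++;              /* real code would apply the policy here */
    if (ixa->ifindex != 0)
        return ixa->ifindex;        /* pinned by the application */
    return DEFAULT_ROUTE_IFINDEX;   /* otherwise, what the route lookup picked */
}
```

A transport would typically fill such a structure once per connection and pass a pointer with every packet, rather than building an ipsec_out_t-style M_CTL in front of each message.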
>>
>> Current prototyping indicates that about 30,000 lines of code can be 
>> removed as a result of these changes (combined with the ARP/IP merge 
>> pieces).
>>
>> The discussion will take place on the existing 
>> networking-discuss@opensolaris.org list.
>>
>>
>> Proposed Community Sponsors:
>>      Networking
>>
>>
>> Participants:
>>      Project lead:
>>          Erik Nordmark
>>
>>      Other Participants:
>>          Sowmini Varadhan
>>          Yunsong (Roamer) Lu
>>          Nitin Hande
>>
>>    Other interested participants: please speak up. We have some
>>    prototype code, and contributions of review time, bug fixes, or
>>    testing are very welcome; there are a lot of code changes here.
>>
>> ------
>> _______________________________________________
>> networking-discuss mailing list
>> networking-discuss@opensolaris.org

_______________________________________________
security-discuss mailing list
security-discuss@opensolaris.org