
List:       osdl-fastboot
Subject:    Re: [Fastboot] [FYI] kexec: design point and implementation for the
From:       Vivek Goyal <vgoyal () in ! ibm ! com>
Date:       2005-12-15 6:26:14
Message-ID: 20051215061414.GB5200 () in ! ibm ! com

On Wed, Dec 14, 2005 at 02:09:32PM -0600, Milton Miller wrote:
> I saw a discussion that had occurred on IRC while I slept last night, 
> between a maintainer unfamiliar with the kexec design point and another 
> person.  I wrote this introduction up to provide some background for 
> someone who knows an architecture but not the design of kexec and 
> the user-space vs kernel split.
> 
> It contains some information about what the kernel implements, some 
> background on the interface between the kernel and the loaded image, 
> my design decisions when writing the ppc64 port, my experiences 
> testing the port, and some background on the current state of 
> kexec-tools.
> 
> I am posting it here as an FYI but feel free to incorporate this as 
> documentation either in kexec-tools or the kernel Documentation 
> directory.
> 
> milton
> 
> 
> 
> I.  The kernel level kexec design point.
> 
> Kexec has evolved into two operations: a load syscall, and an execute 
> step for the loaded image that is implemented as part of the reboot 
> syscall.  The kexec_load syscall specifies data buffers, the 
> destinations for these buffers, and a single entry point at which to 
> continue execution after the data has been copied to the specified 
> memory locations.  The kexec interface (both the load and exec 
> portions) provides no method to pass arguments or register values, 
> only memory contents and a single address as the entry point.  The 
> entry point is expected to create a stack, load registers, and set up 
> any other environment required by the loaded code's calling 
> convention.  To explain why, a short description of the kexec exit 
> (exec) path is in order.
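
To make the load interface concrete, here is a minimal C sketch of a
load request: a list of segments plus one entry address, and nothing
else.  The struct has the same fields as struct kexec_segment in
<linux/kexec.h>, but the names sketch_segment, sketch_load_is_valid and
SKETCH_PAGE_SIZE are mine, the 4 KiB page size is an assumption, and the
checks are picked for illustration rather than copied from the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumption: 4 KiB pages */

/* Same fields as struct kexec_segment; renamed so the sketch builds anywhere. */
struct sketch_segment {
    const void *buf;  /* source buffer in the caller's address space */
    size_t bufsz;     /* bytes to copy from buf */
    uintptr_t mem;    /* destination address for the copied data */
    size_t memsz;     /* bytes reserved at mem, at least bufsz */
};

/* Returns 1 if the request looks self-consistent, 0 otherwise.
 * Illustrative checks only, not the kernel's validation code. */
int sketch_load_is_valid(uintptr_t entry, const struct sketch_segment *seg, size_t n)
{
    int entry_inside = 0;

    assert(seg != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (seg[i].mem % SKETCH_PAGE_SIZE || seg[i].memsz % SKETCH_PAGE_SIZE)
            return 0;                       /* destinations are whole pages */
        if (seg[i].bufsz > seg[i].memsz)
            return 0;                       /* data must fit its destination */
        for (size_t j = 0; j < i; j++)      /* destinations must not overlap */
            if (seg[i].mem < seg[j].mem + seg[j].memsz &&
                seg[j].mem < seg[i].mem + seg[i].memsz)
                return 0;
        if (entry >= seg[i].mem && entry - seg[i].mem < seg[i].memsz)
            entry_inside = 1;               /* entry lands in loaded memory */
    }
    return entry_inside;
}
```

Note that nothing in the request carries arguments or register values;
anything the loaded code needs has to be placed in one of the segments.
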
> 
> On most architectures, memory is copied in real mode by a small stub 
> (relocate-new-kernel) that is self-contained and position independent.  
> It is often stackless, or a stack is included in its allocated size.  
> The runtime location of this stub is allocated by the kernel during 
> kexec_load so it will not conflict with the user-specified target 
> memory.  This code parses the descriptor list built by the generic 
> code and performs the page moves specified therein.  It then branches 
> to the address specified at load time with a minimally specified state.
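
The descriptor walk can be sketched in C as below.  This is a userspace
simulation over a byte array, not the real stub (which is assembly and
runs in real mode).  The flag values follow the IND_* convention I
recall from the kernel's kexec code, but treat them, the SK_* names, the
sketch_relocate function and the 4 KiB page size as assumptions.
Indirection entries, which chain the list across pages, are left out to
keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SK_PAGE 4096u  /* assumption: 4 KiB pages */

/* Low bits of each entry say what the page-aligned address means. */
enum { SK_DEST = 0x1, SK_DONE = 0x4, SK_SRC = 0x8 };

/* Walks the list and copies pages inside the simulated memory `ram`.
 * Returns the number of pages copied. */
size_t sketch_relocate(unsigned char *ram, size_t ram_size, const unsigned long *list)
{
    unsigned long dest = 0;
    size_t copied = 0;

    for (size_t i = 0; ; i++) {
        unsigned long entry = list[i];
        unsigned long addr = entry & ~(unsigned long)(SK_PAGE - 1);

        if (entry & SK_DONE) {
            break;                          /* end of list */
        } else if (entry & SK_DEST) {
            dest = addr;                    /* following sources go here */
        } else if (entry & SK_SRC) {
            assert(addr + SK_PAGE <= ram_size && dest + SK_PAGE <= ram_size);
            memcpy(ram + dest, ram + addr, SK_PAGE);
            dest += SK_PAGE;                /* next source lands one page on */
            copied++;
        } else {
            break;                          /* unknown entry: stop */
        }
    }
    return copied;
}
```

After the loop, the real stub would branch to the entry address; the
sketch just returns so the result can be inspected.
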
> 
> Instead of passing arguments or even defining a stack, userspace is 
> expected to load a trampoline that creates the environment expected by 
> the called image.  This includes establishing a stack and loading cpu 
> registers as required.  It may change cpu modes (for example, switch to 
> 16 or 32 bit mode on x86).  The combination of memory and an entry 
> point should be sufficient to write a stub to call any program, not 
> just a new kernel.
> 
> 
> II. Implementation for 64-bit PowerPC platforms (formerly ppc64)
> 
> II.A  Design decisions
> 
> PowerPC 64 bit platforms challenged the generic kexec code in some 
> areas.  Real mode does not provide access to all or even significant 
> portions of memory on some platforms, and the MMU is complex and 
> interfaced differently depending on the platform.  No method to stop 
> and restart cpus is guaranteed to be available, nor is there a way 
> for a cpu to find out which cpu it is in the system.
> 
> To avoid writing code to manage segments, page tables, or RMO issues, 
> on 64-bit PowerPC the static kernel is blocked from being a kexec load 
> destination, and the copy normally done by relocate-new-kernel is done 
> using kernel facilities.  I deemed this restriction acceptable since 
> the kernel does not care where it is loaded except that it is linear 
> and within the RMO.  If some other application required a fixed 
> address, adding a copy loop to its trampoline should be minor.
> 
> Since the PowerPC architecture does not provide a method to stop and 
> restart a cpu and not all platforms do either, I had to do something to 
> handle "secondary" cpus.  Each cpu needed to know its hardware cpu 
> number (since there is no generic way to obtain this number) and needed 
> code to execute.  I could have said all cpus call the same entry point 
> passing the cpu number.
> 
> Instead, I decided to take inspiration from the interface between 
> prom_init and the kernel.  The image has two entry points (one for the 
> main thread and one for the secondary threads) specified as one 
> address.  The cpus hold their hardware cpu number in r3.  The secondary 
> cpus are instructed to branch to 0x60 after the first 0x100 bytes of 
> the entry point are copied to address 0.  The primary (selected at 
> execution time) enters at the specified address.  Unlike the kernel's 
> own entry convention, r3 contains the hardware cpu number, as the 
> primary is not selected at load time.  Since the architecture requires 
> the address to go through a register to start execution at an 
> arbitrary destination, I defined that r4 contains this value and r5 
> contains 0, as the kernel requires (both could have been left 
> unspecified).
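
The entry convention above can be written down as a small C model.  It
only records the state each cpu starts with; the names sketch_cpu_entry,
sketch_copy_low and sketch_entry_state are hypothetical.  The text does
not say what r4 and r5 hold for secondary cpus, so the sketch leaves
them at 0 as a placeholder, and it assumes the entry point sits at or
above 0x100 so the copy does not overlap itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_LOW_COPY_LEN    0x100u  /* bytes of the entry point copied to address 0 */
#define SKETCH_SECONDARY_ENTRY 0x60u   /* where secondary cpus are sent */

struct sketch_cpu_entry { uint64_t pc, r3, r4, r5; };

/* Copy the first 0x100 bytes found at the entry point down to address 0
 * of the simulated memory `ram`. */
void sketch_copy_low(unsigned char *ram, size_t ram_size, uint64_t entry)
{
    assert(ram_size >= SKETCH_LOW_COPY_LEN);
    assert(entry >= SKETCH_LOW_COPY_LEN && entry <= ram_size - SKETCH_LOW_COPY_LEN);
    memcpy(ram, ram + entry, SKETCH_LOW_COPY_LEN);
}

/* State a cpu starts with when the loaded image takes over. */
struct sketch_cpu_entry sketch_entry_state(uint64_t hw_cpu, int is_primary, uint64_t entry)
{
    struct sketch_cpu_entry s = { 0, 0, 0, 0 };

    s.r3 = hw_cpu;                      /* every cpu carries its hardware number */
    if (is_primary) {
        s.pc = entry;                   /* primary enters at the loaded address */
        s.r4 = entry;                   /* the branch target went through r4 */
        s.r5 = 0;
    } else {
        s.pc = SKETCH_SECONDARY_ENTRY;  /* secondaries run the copy at address 0 */
    }
    return s;
}
```
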
> 
> 
> II.B Testing it out: user space tools
> 
> When I wrote the kernel code, the kexec-tools package was quite 
> architecture specific.  Much of it was obtaining architecture 
> information to create data structures used by the i386 kernel entry 
> points, and code written to be called was mixed with the code creating 
> the data and making the system calls.  However, there was some generic 
> code that expected things like allocating memory from an architecture 
> supplied map.  Because the ppc64 kernel entry point requirements were 
> similar to what could be provided by the kexec_load system call, I 
> decided it was easier to write code that just called the kexec syscalls 
> directly from a command line.  Hence the previously posted tools (see 
> the fastboot archives around April 2005).
> 
> While the kexec entry-point interface is close to what the kernel 
> expects, it is not exactly the same.  I wrote a short assembly 
> trampoline (called v2wrap for device-tree-struct version 2 wrapper) 
> that needed two arguments.  The first, the kernel load address, was 
> patched into the last 8 bytes of the code; the second, the device tree 
> structure, was assumed to immediately follow the code.  The trampoline 
> stores the hardware cpu number into the device-tree header structure 
> (it was not known at load time), marshals the secondary cpus while the 
> actual kernel's code is copied to address 0, and calls the kernel 
> entry point with r3 pointing to the device tree structure as expected 
> by the kernel.
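
For readers who want to picture the wrapper layout, here is a rough C
sketch of the two steps described above: building the blob at load time
(code, patched address, device tree) and storing the cpu number into the
device tree header.  The real v2wrap does the second step in assembly at
run time; the function names are hypothetical, and I am assuming
big-endian fields and that the cpu number lives in the boot_cpuid_phys
field at byte offset 28 of the flattened device tree header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_FDT_OFF_BOOT_CPUID 28u  /* assumed offset of boot_cpuid_phys */

static void sketch_put_be32(unsigned char *p, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        p[i] = (unsigned char)(v >> (24 - 8 * i));
}

static void sketch_put_be64(unsigned char *p, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        p[i] = (unsigned char)(v >> (56 - 8 * i));
}

/* Lays out trampoline code followed by the device tree in `out`, which
 * must hold code_len + dt_len bytes.  The last 8 bytes of the code are
 * overwritten with the kernel load address.  Returns the offset of the
 * device tree within `out`. */
size_t sketch_build_wrapper(unsigned char *out,
                            const unsigned char *code, size_t code_len,
                            uint64_t kernel_addr,
                            const unsigned char *dt, size_t dt_len)
{
    assert(code_len >= 8);
    memcpy(out, code, code_len);
    sketch_put_be64(out + code_len - 8, kernel_addr);
    memcpy(out + code_len, dt, dt_len);   /* tree immediately follows code */
    return code_len;
}

/* Models the run-time store of the hardware cpu number into the header. */
void sketch_set_boot_cpu(unsigned char *dt, uint32_t hw_cpu)
{
    sketch_put_be32(dt + SKETCH_FDT_OFF_BOOT_CPUID, hw_cpu);
}
```
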
> 
> Since that code was written, the kexec-tools package has improved, and 
> its separation of infrastructure from architecture code has improved.
> 
> 
> III.  The kexec-tools package
> 
> The reference implementation for the user space part of kexec is 
> maintained in the kexec-tools package.  Release 1.101 has separated out 
> architecture-specific code from the generic code.  It created the 
> purgatory concept, where code to be loaded is built separately from the 
> code that loads it.  The generic code has library functions to allocate, 
> relocate, and otherwise patch elf executables.  It relies on 
> architecture code to determine the available memory map, select the 
> pieces of code and their sequence, build any argument data structures 
> required, and patch purgatory code to load arguments as required.
> 
> The package includes some generic C code for purgatory to do things 
> like checksum the image that was loaded to check its integrity.  It is 
> anticipated that purgatory will do things like convert specific per-cpu 
> elf note buffers into the single crash-notes expected by the dump 
> kernel for kdump.

Merging of the per-cpu PT_NOTE headers into a single PT_NOTE, to be
compatible with core file conventions, is done by the second kernel.
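
On the integrity check mentioned in the quoted paragraph: the idea is
that the loader records a digest of the loaded regions and purgatory
recomputes it before jumping to the image.  A minimal C sketch of that
pattern follows.  It uses a 64-bit FNV-1a hash as a simple stand-in,
which is not what kexec-tools uses (I believe the real purgatory uses a
stronger digest), and the names sketch_region, sketch_digest and
sketch_verify are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_region {
    const unsigned char *start;  /* first byte of a loaded region */
    size_t len;                  /* length of the region in bytes */
};

/* FNV-1a over all regions in order; a stand-in for a real digest. */
uint64_t sketch_digest(const struct sketch_region *r, size_t n)
{
    uint64_t h = 0xcbf29ce484222325ULL;      /* FNV-1a 64-bit offset basis */

    assert(r != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < r[i].len; j++) {
            h ^= r[i].start[j];
            h *= 0x100000001b3ULL;           /* FNV 64-bit prime */
        }
    }
    return h;
}

/* Returns 1 if the regions still match the digest recorded at load time. */
int sketch_verify(const struct sketch_region *r, size_t n, uint64_t expected)
{
    return sketch_digest(r, n) == expected;
}
```

The loader would call sketch_digest once and patch the result into the
purgatory image; the loaded code would call sketch_verify and refuse to
continue on a mismatch.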

> 
> While the generic purgatory code does have the concept of a debug 
> print method, the architecture code is responsible for actually 
> producing any output.
> 

> _______________________________________________
> fastboot mailing list
> fastboot@lists.osdl.org
> https://lists.osdl.org/mailman/listinfo/fastboot
