List: osdl-fastboot
Subject: Re: [Fastboot] [FYI] kexec: design point and implementation for the
From: Vivek Goyal <vgoyal () in ! ibm ! com>
Date: 2005-12-15 6:26:14
Message-ID: 20051215061414.GB5200 () in ! ibm ! com
On Wed, Dec 14, 2005 at 02:09:32PM -0600, Milton Miller wrote:
> I saw a discussion that had occurred on IRC while I slept last night,
> between a maintainer unfamiliar with the kexec design point and another
> person. I wrote this introduction up to provide some background for
> someone who knows an architecture but not the design of kexec and
> the user-space vs kernel split.
>
> It contains some information about what the kernel implements, some
> background as to the kernel to loaded image interface, my design
> decisions when writing the ppc64 port, my experiences testing the port,
> and some background on the current state of kexec-tools.
>
> I am posting it here as an FYI but feel free to incorporate this as
> documentation either in kexec-tools or the kernel Documentation
> directory.
>
> milton
>
>
>
> I. The kernel level kexec design point.
>
> Kexec has evolved into two operations: a load syscall, and an execute
> of the loaded image, which is implemented as part of the reboot syscall.
> The kexec_load syscall specifies data buffers, the destination for each
> buffer, and a single entry point at which to continue execution after
> the data has been copied to the specified memory locations. The kexec
> interface (both the load and exec portions) provides no method to pass
> arguments or register values, only memory contents and a single address
> as the entry point. The entry point is expected to create a stack, load
> registers, and set up any other environment required by the loaded
> code's calling convention. To explain why, a short description of the
> kexec exit (exec) path is in order.
>
> On most architectures, memory is copied in real mode by a small stub
> (relocate-new-kernel) that is self-contained and position independent.
> It is often stackless, or a stack is included in its allocated size.
> The runtime location of this stub is allocated by the kernel during
> kexec_load so it will not conflict with the user specified target
> memory. This code parses the descriptor list built by the generic
> code and does the page-moves specified therein. It then branches to
> the address specified at load-time with a minimally specified state.
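As a rough illustration of the page-move step just described, here is a simplified C model of a stub walking a descriptor list. The tagged entries (the OP_* names are hypothetical), the toy PAGE_SZ, and the flat array standing in for physical memory are all assumptions made for this sketch. The real kernel list packs flags into page addresses and chains indirection pages, which is omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SZ 16   /* toy page size for the sketch, not a real one */

/* Hypothetical entry kinds; the kernel encodes these as flag bits. */
enum op { OP_DEST, OP_SOURCE, OP_DONE };

struct entry {
    enum op op;
    size_t page;     /* page index into the memory array */
};

/*
 * Walk the list: OP_DEST sets the current destination page, each
 * OP_SOURCE copies one page there and advances the destination,
 * OP_DONE stops. Returns the number of pages moved.
 */
size_t move_pages(unsigned char *mem, const struct entry *list)
{
    size_t dest = 0, moved = 0;

    assert(mem != NULL && list != NULL);
    for (; list->op != OP_DONE; list++) {
        if (list->op == OP_DEST) {
            dest = list->page;
        } else { /* OP_SOURCE */
            memmove(mem + dest * PAGE_SZ,
                    mem + list->page * PAGE_SZ, PAGE_SZ);
            dest++;
            moved++;
        }
    }
    return moved;
}
```

After the loop ends, a real stub would branch to the entry address; the model simply returns.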
>
> Instead of passing arguments or even defining a stack, userspace is
> expected to load a trampoline that creates the environment expected by
> the called image. This includes establishing a stack and loading cpu
> registers as required. It may change cpu modes (for example, switch to
> 16 or 32 bit mode on x86). The combination of memory and an entry
> point should be sufficient to write a stub to call any program, not
> just a new kernel.
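The load-side contract described in this section (buffers, a destination for each, and one entry address) can be sketched in C as follows. This is a user-space model, not the syscall itself: struct seg mirrors the field names of the kernel's struct kexec_segment, but the destination is an offset into a byte array rather than a physical address, and struct image and place_image are hypothetical names. The zero-fill of the tail is how I understand the kernel treats memsz larger than bufsz.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One buffer and where it should end up. */
struct seg {
    const void *buf;   /* source data in the caller's memory */
    size_t bufsz;      /* bytes of source data */
    size_t mem;        /* destination offset (stands in for a physical address) */
    size_t memsz;      /* destination size, at least bufsz */
};

/* The whole interface: memory contents plus a single entry address. */
struct image {
    size_t entry;
    const struct seg *segs;
    size_t nsegs;
};

/*
 * Copy every segment to its destination and zero-fill the tail.
 * Returns 0 on success, -1 if a segment is malformed or out of range.
 */
int place_image(unsigned char *ram, size_t ramsz, const struct image *img)
{
    size_t i;

    assert(ram != NULL && img != NULL);
    for (i = 0; i < img->nsegs; i++) {
        const struct seg *s = &img->segs[i];

        if (s->bufsz > s->memsz || s->mem > ramsz ||
            s->memsz > ramsz - s->mem)
            return -1;
        memcpy(ram + s->mem, s->buf, s->bufsz);
        memset(ram + s->mem + s->bufsz, 0, s->memsz - s->bufsz);
    }
    return 0;
}
```

Note that nothing in struct image carries arguments or register values; anything the target needs has to be placed in memory by a segment, typically the trampoline.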
>
>
> II. Implementation for 64-bit PowerPC platforms (formerly ppc64)
>
> II.A Design decisions
>
> PowerPC 64 bit platforms challenged the generic kexec code in some
> areas. Real mode does not provide access to all or even significant
> portions of memory on some platforms, and the MMU is complex and
> interfaced differently depending on the platform. There is no method
> to stop and restart cpus guaranteed to be available nor is there a way
> for a cpu to find out which cpu it is in the system.
>
> To avoid writing code to manage segments, page tables, or RMO issues, on
> 64 bit PowerPC the static kernel is blocked from being a kexec load
> destination, and the copy normally done by relocate-new-kernel is done
> using kernel facilities. I deemed this restriction acceptable since
> the kernel does not care where it is loaded except that it is linear
> and within the RMO. If some other application required a fixed
> address, adding a copy loop to its trampoline should be minor.
>
> Since the PowerPC architecture does not provide a method to stop and
> restart a cpu and not all platforms do either, I had to do something to
> handle "secondary" cpus. Each cpu needed to know its hardware cpu
> number (since there is no generic way to obtain this number) and needed
> code to execute. I could have said all cpus call the same entry point
> passing the cpu number.
>
> Instead, I decided to take inspiration from the interface between
> prom_init and the kernel. The image has two entry points (one for the
> main thread and one for the secondary threads) specified as one
> address. The cpus hold their hardware cpu number in r3. The secondary
> cpus are instructed to branch to 0x60 after the first 0x100 bytes of
> the entry point are copied to address 0. The primary (selected at
> execution time) enters at the specified address. Unlike the kernel, r3
> contains the hardware cpu number as the primary is not selected at load
> time. Since the architecture requires the address to go through a
> register to start execution at an arbitrary destination, I defined
> that r4 contains this value and r5 contains 0 as the kernel requires
> (both could have been left unspecified).
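The entry convention described above can be written down as a small C model. It only records the state each cpu is handed; struct cpu_entry and the function names are hypothetical, and the r4/r5 values for secondary cpus are placeholders, since the text defines them only for the primary.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SECONDARY_ENTRY 0x60   /* secondaries branch to this address */
#define LOW_COPY_SIZE   0x100  /* bytes of the image copied to address 0 */

/* State a cpu sees when it leaves the old kernel (hypothetical struct). */
struct cpu_entry {
    uint64_t pc;
    uint64_t r3;   /* hardware cpu number */
    uint64_t r4;   /* entry address (defined for the primary) */
    uint64_t r5;   /* 0 (defined for the primary) */
};

/* Copy the first 0x100 bytes found at `entry` down to address 0. */
void copy_low(unsigned char *ram, uint64_t entry)
{
    assert(ram != NULL);
    memmove(ram, ram + entry, LOW_COPY_SIZE);
}

/* Primary: enters at the address given at load time. */
struct cpu_entry primary_entry(uint64_t hwcpu, uint64_t entry)
{
    struct cpu_entry s = { entry, hwcpu, entry, 0 };
    return s;
}

/* Secondary: enters at 0x60 in the low copy; r4 and r5 are placeholders. */
struct cpu_entry secondary_entry(uint64_t hwcpu)
{
    struct cpu_entry s = { SECONDARY_ENTRY, hwcpu, 0, 0 };
    return s;
}
```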
>
>
> II.B Testing it out: user space tools
>
> When I wrote the kernel code, the kexec-tools package was quite
> architecture specific. Much of it was obtaining architecture
> information to create data structures used by the i386 kernel entry
> points, and code written to be called was mixed with the code creating
> the data and making the system calls. However, there was some generic
> code that expected things like allocating memory from an architecture
> supplied map. Because the ppc64 kernel entry point requirements were
> similar to what could be provided by the kexec_load system call, I
> decided it was easier to write code that just called the kexec syscalls
> directly from a command line. Hence the previously posted tools (See
> the fastboot archives around April 2005).
>
> While the kexec entry-point interface is close to what the kernel
> expects, it is not exactly the same. I wrote a short assembly
> trampoline (called v2wrap for device-tree-struct version 2 wrapper)
> that needed two arguments. The last 8 bytes were patched with the
> kernel load address, and the second argument, the device tree
> structure, was assumed to immediately follow the code. The trampoline
> stores the hardware cpu number into the device-tree header structure
> (it was not known at load time), marshals the secondary cpus while the
> actual kernel's code is copied to address 0, and calls the kernel
> entry-point with r3 pointing to the device tree structure as expected
> by the kernel.
>
> Since that code was written, the kexec-tools package has improved,
> with better separation of its infrastructure.
>
>
> III. The kexec-tools package
>
> The reference implementation for the user space part of kexec is
> maintained in the kexec-tools package. Release 1.101 has separated out
> architecture specific code from the generic code. It created the
> purgatory concept where code to be loaded is built separately from the
> code loader code. The generic code has library functions to allocate,
> relocate, and otherwise patch elf executables. It relies on
> architecture code to determine the available memory map, select the
> pieces of code and their sequence, build any argument data structures
> required, and patch purgatory code to load arguments as required.
>
> The package includes some generic C code for purgatory to do things
> like checksum the image that was loaded to check its integrity. It is
> anticipated that purgatory will do things like convert specific per-cpu
> elf note buffers into the single crash-notes expected by the dump
> kernel for kdump.
Merging the per-cpu PT_NOTE headers into a single PT_NOTE, to be
compatible with core file conventions, is done by the second kernel.
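On the integrity check mentioned in the quoted paragraph: the idea is that a digest over the loaded regions is recorded at load time and recomputed before the jump. Below is a minimal C sketch of that idea only. FNV-1a is a simple stand-in hash chosen to keep the example short, not the digest kexec-tools uses, and struct region, digest_regions, and verify_image are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* FNV-1a over n bytes, continuing from running hash h. */
uint32_t fnv1a(const unsigned char *p, size_t n, uint32_t h)
{
    size_t i;

    for (i = 0; i < n; i++) {
        h ^= p[i];
        h *= 16777619u;          /* 32-bit FNV prime */
    }
    return h;
}

/* A loaded region: offset and length within the memory array. */
struct region {
    size_t start;
    size_t len;
};

/* Hash all regions in order, starting from the FNV offset basis. */
uint32_t digest_regions(const unsigned char *ram,
                        const struct region *r, size_t n)
{
    uint32_t h = 2166136261u;
    size_t i;

    assert(ram != NULL && r != NULL);
    for (i = 0; i < n; i++)
        h = fnv1a(ram + r[i].start, r[i].len, h);
    return h;
}

/* Returns 0 if memory still matches the recorded digest, -1 otherwise. */
int verify_image(const unsigned char *ram, const struct region *r,
                 size_t n, uint32_t expected)
{
    return digest_regions(ram, r, n) == expected ? 0 : -1;
}
```

The loader would call digest_regions once and patch the result into purgatory; purgatory would call verify_image before handing control to the new image.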
>
> While the generic purgatory code does have the concept of a debug print
> method, the architecture code is responsible for actually producing any
> output.
>
> _______________________________________________
> fastboot mailing list
> fastboot@lists.osdl.org
> https://lists.osdl.org/mailman/listinfo/fastboot