List:       linux-power
Subject:    [linux-power][pre-patch] Suspend To RAM
From:       Patrick Mochel <mochel () transmeta ! com>
Date:       2001-04-19 18:17:59


I tried this once, but since the patch was quite large, it is waiting for
approval from the list moderator. In the meantime, I'm resending it, with
a gzip'd patch.

If the moderator is reading, go ahead and deny the previous message.


---------


Suspend To RAM

FIRST: DO NOT USE THIS CODE!!!

Repeat: DO NOT USE THIS CODE!!!

This code is highly experimental and should only be touched by people who
are working on the _infrastructure_ concerning ACPI suspend states.

Parts of this code have been known to work for me. Parts have not been tested at 
all. I should have added deliberate compile errors to prevent it from even compiling.
It is meant for demonstration purposes ONLY.

That said, here is a little explanation about what is going on.

I started working on the issue of making the kernel suspend and resume. A little
over a month ago I got a working implementation, which is what most of this code
is. But I was at the point where I needed to wake the system back up somehow, so
I started looking into how to enable that. I found kernel/pm.c and started hacking
around in it. Then I found drivers/pci/pci.c and soon found myself entrenched in
the problem of solving suspend completely. No longer could I just concentrate on
the kernel aspect of suspending.

First, I'll explain how the suspend/resume code works, then I'll talk about the
changes I made to kernel/pm.c.

If your system supports any sleep states, /proc/acpi/sleep/sleep will show up. I know
it's redundant, but I'll fix it. You write the system sleep state you want to enter
into that file. An identity-mapped page table is created, for graceful execution
in real mode when we come back, the processor register state is saved, and the
proper values are written to the ACPI registers. Suddenly you find yourself asleep.
And, oh yes, the address of the wakeup vector is written to the FACS table to tell
the BIOS where to go when we wake up.

When we wake up and the BIOS jumps to the wakeup vector, we're in real mode.
%cr3 is set, paging is enabled, protected mode is entered, and a far jump is made
back to the kernel. This takes about 54 bytes :)

At this point the complete register state is restored. The original page table is
restored, and we return to the point where we left off. It seems simple enough, but
it is some nasty code.
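
To make the real-mode entry a little more concrete, here is a minimal sketch of
how a physical wakeup address maps onto a real-mode CS:IP pair. It assumes the
firmware follows the usual ACPI convention of taking the segment from the address
shifted right by four and the offset from the low four bits. The struct and
function names are made up for illustration and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Real-mode entry point as the firmware would load it into CS:IP. */
struct real_mode_addr {
    uint16_t cs;  /* segment */
    uint16_t ip;  /* offset  */
};

/*
 * Split a physical wakeup address into segment:offset.
 * Real mode can only reach the first megabyte, so the address
 * must be below 0x100000 (assumed convention, see the text above).
 */
struct real_mode_addr wakeup_vector_to_real_mode(uint32_t phys)
{
    struct real_mode_addr a;

    assert(phys < 0x100000u);
    a.cs = (uint16_t)(phys >> 4);
    a.ip = (uint16_t)(phys & 0xFu);
    return a;
}

/* Inverse mapping: the linear address that CS:IP refers to. */
uint32_t real_mode_to_phys(struct real_mode_addr a)
{
    return ((uint32_t)a.cs << 4) + a.ip;
}
```

With a wakeup page at physical 0x9F000, this gives CS = 0x9F00 and IP = 0, which
is one reason the wakeup code has to live below 1M.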

The next piece of the puzzle is the devices. It's going to be difficult to get this
right, but I think that we can figure it out.

The main thing that we need for this to work is a complete tree view of all the
devices in the system. For PCI, this is easy, since it already exists. There is
even code written to walk the PCI device tree for suspending and resuming. And,
there is even code to set the power state of PCI devices that support the PCI PM
spec and have the PM Capabilities register space. Granted, I don't think that
any of the drivers actually make use of the suspend/resume callbacks in struct
pci_driver, or call pci_set_power_state, but to be honest, I haven't looked yet.

Our main problem comes in interfacing with not only the PCI tree, but also the
rest of the devices and buses in the system, like USB and PCMCIA. There is a 
pm_* API, but I thought that it was initially worthless. After some divine 
intervention and some playing with it tonight, I am starting to think otherwise.

pm.c maintains a list of all the devices that call pm_register(). A flat list does
us no good if all the devices in the system register with it. The PM layer must
guarantee a sane ordering when walking the tree - a bus cannot go into suspend
before its child devices. And you can't access a device before its parent is
awake.

But, if the list in pm.c is a list of only the buses in the system, and each bus
handles the suspend/resume of its children, like PCI does, then it can actually
be useful to us.
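
The ordering constraint can be sketched as two tree walks: suspend visits
children before their parent, and resume visits the parent before its children.
This is only an illustration. struct pm_node, struct visit_log and the function
names are hypothetical, and the walks just record names instead of calling into
real drivers.

```c
#include <stddef.h>

#define MAX_CHILDREN 4
#define MAX_LOG 16

/* Hypothetical device/bus node; not the struct pm_dev from pm.c. */
struct pm_node {
    const char *name;
    struct pm_node *children[MAX_CHILDREN];
    size_t nr_children;
};

/* Records the order in which nodes were visited. */
struct visit_log {
    const char *names[MAX_LOG];
    size_t count;
};

void log_visit(struct visit_log *log, const char *name)
{
    if (log->count < MAX_LOG)
        log->names[log->count++] = name;
}

/* Suspend: children first, then the node itself (post-order). */
void suspend_tree(const struct pm_node *node, struct visit_log *log)
{
    for (size_t i = 0; i < node->nr_children; i++)
        suspend_tree(node->children[i], log);
    log_visit(log, node->name);
}

/* Resume: the node first, then its children (pre-order). */
void resume_tree(const struct pm_node *node, struct visit_log *log)
{
    log_visit(log, node->name);
    for (size_t i = 0; i < node->nr_children; i++)
        resume_tree(node->children[i], log);
}
```

For a bus with two devices under it, the suspend walk records both devices and
then the bus, while the resume walk records the bus first.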

I modified that a bit more and made it keep a tree view of all the buses
and devices in the system. This was for a couple of reasons. One is that the
pm layer may want to touch all of the devices at some point.
Definitely not for the case of suspend or resume, but maybe for some other signal?
I suppose that whatever signal this is could just be passed on to the list of
buses, and they can take care of passing it to their children, and that may work
out better.

But, there is also the case of the interface to the user. This is primarily for
the ability to enable the device to generate wake events, which should be
user-configurable. A tree view of all the devices can be easily exported to
/proc.

When the PM layer is initialized, it creates /proc/pm. Every time a driver
registers with it, it creates a new subdirectory under its parent. Root 
buses and system devices (like the keyboard controller, serial ports, etc) have
pm_root as their root (the system root). Other drivers must pass a pointer to 
their parent's struct pm_dev. In each device's directory, besides child directories,
there are a handful of files - wake, state, and capabilities. Reading the first
confesses its ability to wake the system, writing to it sets that ability. The 
second tells the power state of the device, and writing to it can set it. The 
third could be used for describing what the device supports, as far as PM goes -
sleep states, latencies, etc.
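
As a rough picture of the layout described above, the following sketch builds
the directory path for an entry by walking its parent pointers, and checks an
attribute name against the three files mentioned (wake, state, capabilities).
struct pm_entry and both functions are hypothetical stand-ins, and the device
names in the usage note are invented. The patch's own struct pm_dev may look
quite different.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a registered device or bus. */
struct pm_entry {
    const char *name;
    const struct pm_entry *parent;  /* NULL means directly under the root */
};

/*
 * Write "/proc/pm/<ancestors...>/<name>" into buf.
 * Returns the length of the path, or -1 if buf is too small.
 */
int pm_entry_path(const struct pm_entry *e, char *buf, size_t len)
{
    int used;
    int n;

    if (e == NULL) {
        n = snprintf(buf, len, "/proc/pm");
        return (n < 0 || (size_t)n >= len) ? -1 : n;
    }
    used = pm_entry_path(e->parent, buf, len);
    if (used < 0)
        return -1;
    n = snprintf(buf + used, len - (size_t)used, "/%s", e->name);
    if (n < 0 || (size_t)n >= len - (size_t)used)
        return -1;
    return used + n;
}

/* The three per-device files named in the text. */
int pm_attr_is_known(const char *attr)
{
    return strcmp(attr, "wake") == 0 ||
           strcmp(attr, "state") == 0 ||
           strcmp(attr, "capabilities") == 0;
}
```

An entry named "nic" whose parent is a root bus named "pci0" would end up at
/proc/pm/pci0/nic, with its wake, state, and capabilities files inside.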

This interface, and the tree in pm.c, was just a quick hack so that I could set
wake events. I don't particularly love it, nor am I convinced that it should stick
around in that form. But, there does need to be some interface to set wake events,
and this was easy enough. I actually haven't tested it, and I have already thought
of a couple of bugs in it. But, that's really beside the point.

There are a lot of issues and details that have not even begun to be addressed, 
such as dealing with the other buses in the system, dealing with system devices
on the motherboard, or even dealing with the bridges themselves. That's not to mention
any synchronization issues, or dealing with the appropriate AML in the system. And
many more that I am not mentioning.

This code, though, along with a patch that Jeff Garzik just posted, should get
us a lot closer to a working system.

-------------------------------------------------------------------------------

A couple of other notes on some code in this patch:

When we resume, the BIOS jumps to our code in real mode. This code must
reside in low memory. Guaranteeing this is not possible, given current
semantics. I modified the zone list that get_free_page uses to include
a zone covering 0-1M, so that you may allocate a page down there. Note that this
does not interfere with anything else that tries to allocate memory, esp. devices
that need legacy DMA memory under 16M (if 1M - 16M is full, it will fall back
on the lowest zone).
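
The fallback behaviour can be illustrated with a toy allocator that tries the
preferred zone first and then walks down to lower zones. This is a sketch only.
The zone names, struct zone, and alloc_page_from are hypothetical, and the
kernel's real zone lists are more involved than a simple free-page counter.

```c
/* Hypothetical zones, lowest addresses first. */
enum zone_id { ZONE_LOW_1M, ZONE_DMA_16M, ZONE_NORMAL, NR_ZONES };

struct zone {
    unsigned long free_pages;
};

/*
 * Take one page, starting at the preferred zone and falling back
 * to lower zones when it is empty. Returns the index of the zone
 * the page came from, or -1 if every candidate zone is full.
 */
int alloc_page_from(struct zone zones[NR_ZONES], enum zone_id preferred)
{
    for (int z = (int)preferred; z >= 0; z--) {
        if (zones[z].free_pages > 0) {
            zones[z].free_pages--;
            return z;
        }
    }
    return -1;
}
```

A legacy DMA request would ask for ZONE_DMA_16M and only dip into the 0-1M zone
once 1M - 16M is exhausted, while the wakeup code would ask for ZONE_LOW_1M
directly.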

real_mode_restart was modified to support sane handling of three-level page 
directories. There are also some useless cleanups in there. 

I modified the suspend and resume callbacks in struct pci_driver, as well
as all of the pci_pm_* functions to accept a 'state' parameter indicating the
desired sleep state to enter. Ideally, we should be able to tell a device to
enter any of D0-D3, not just D0 and D3. We may want to forgo this for now, though.

I also modified the calls to pm_register (in the keyboard, console, and pci
drivers) to pass a NULL in as their parent and an ASCII string for their
"connector_id", which is some text description of where they reside on the
bus. This helps mainly for the /proc interface.



Please, don't use this code unless you absolutely know what you're doing.
It's quite dangerous and likely to cause you and your computer lots of
grief. Other than that, enjoy, and good luck.

-pat


["acpi-19042001.diff.gz" (APPLICATION/octet-stream)]
