List: freebsd-ppc
Subject: Re: lr=u_trap+0x10 and srr0=k_trap+0x28 for "stopped at 0 illegal instruction 0" before-copyright ha
From: Mark Millard <markmi () dsl-only ! net>
Date: 2014-09-27 10:51:32
Message-ID: E84C7587-E155-43A1-922F-848B112108C5 () dsl-only ! net
I found the backtrace for the OF_peer call that leads to the
"before copyright"/ofwcall-for-peer hang/crash in ofwcall. This happens to be
the first ofwcall with pmap_bootstrapped!=0, which may be the biggest issue
involved (for what it implies).
.OF_peer+0x8c
.powermac_smp_first_cpu+0x3c (OF_peer(0) below)
.platform_smp_first_cpu+0x78
.cpu_mp_setmaxid+0x2c (via .mpt_fc_els_reply_handler+0x2e68 that is not explicitly listed)
.mp_setmaxid+0x14
.mi_startup+0x10c
btext+0xbc
The source code involved is:
static int
powermac_smp_first_cpu(platform_t plat, struct cpuref *cpuref)
{
char buf[8];
phandle_t cpu, dev, root;
int res;
root = OF_peer(0);
dev = OF_child(root);
while (dev != 0) {
res = OF_getprop(dev, "name", buf, sizeof(buf));
if (res > 0 && strcmp(buf, "cpus") == 0)
break;
dev = OF_peer(dev);
}
if (dev == 0) {
/*
* psim doesn't have a name property on the /cpus node,
* but it can be found directly
*/
dev = OF_finddevice("/cpus");
if (dev == -1)
return (ENOENT);
}
cpu = OF_child(dev);
while (cpu != 0) {
res = OF_getprop(cpu, "device_type", buf, sizeof(buf));
if (res > 0 && strcmp(buf, "cpu") == 0)
break;
cpu = OF_peer(cpu);
}
if (cpu == 0)
return (ENOENT);
return (powermac_smp_fill_cpuref(cpuref, cpu));
}
To check whether the peer use is special, I temporarily made OF_peer cache the
node 0 result so that only the first such call uses ofwcall. (The call above is
not the first such call.) The expectation is that the OF_child call should then
fail instead. And it does. So peer is not special: whichever kind of ofwcall
happens to be the first one after pmap_bootstrapped!=0 gets the problem.
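The caching experiment described above can be sketched roughly as follows. This
is a user-space illustration only, not the change actually made to the kernel:
firmware_peer(), its call counter, and the handle value 0x1000 are hypothetical
stand-ins for the real ofwcall path.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t phandle_t;

/* Hypothetical stand-in for the real firmware call; counts how often it runs. */
static int firmware_peer_calls = 0;

static phandle_t
firmware_peer(phandle_t node)
{
    firmware_peer_calls++;
    /* Made-up handles: the root is 0x1000, every other node has no sibling. */
    return (node == 0 ? 0x1000 : 0);
}

/* Memoize only the node-0 (root) lookup; other nodes still reach the firmware. */
static phandle_t
cached_peer(phandle_t node)
{
    static int have_root = 0;
    static phandle_t root;

    if (node != 0)
        return (firmware_peer(node));
    if (!have_root) {
        root = firmware_peer(0);
        assert(root != 0);    /* a cached 0 would look like "no node" */
        have_root = 1;
    }
    return (root);
}
```

With this in place a later cached_peer(0) never reaches the firmware, so the
next firmware call of any other kind becomes the first one made in the new
context.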
===
Mark Millard
markmi at dsl-only.net
On Sep 26, 2014, at 11:55 PM, Mark Millard <markmi at dsl-only.net> wrote:
According to my adjusted dumping: At the "before Copyright"/ofwcall-for-peer
crash ofw_real_mode==0.
And that does turn off exception vector save/restore:
__inline void
ofw_save_trap_vec(char *save_trap_vec)
{
if (!ofw_real_mode)
return;
bcopy((void *)EXC_RST, save_trap_vec, EXC_LAST - EXC_RST);
}
static __inline void
ofw_restore_trap_vec(char *restore_trap_vec)
{
if (!ofw_real_mode)
return;
bcopy(restore_trap_vec, (void *)EXC_RST, EXC_LAST - EXC_RST);
__syncicache(EXC_RSVD, EXC_LAST - EXC_RSVD);
}
So now it is clear to me how FreeBSD's exception vectors could be involved in a
context that does not have FreeBSD's environment in place. (Finally!)
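A minimal user-space model of that gating, to show what the firmware would see
in each mode. Everything here is an assumption made for illustration: real_mode
stands in for ofw_real_mode, live_vectors for the EXC_RST..EXC_LAST region, and
the 'K'/'F' bytes for kernel and firmware handlers.

```c
#include <assert.h>
#include <string.h>

#define VEC_SIZE 16

static int real_mode;                  /* models ofw_real_mode */
static char live_vectors[VEC_SIZE];    /* models the exception vector region */

static void
save_vec(char *save)
{
    if (!real_mode)
        return;        /* nothing copied: the save buffer is untouched */
    memcpy(save, live_vectors, VEC_SIZE);
}

static void
restore_vec(const char *saved)
{
    if (!real_mode)
        return;        /* live vectors stay as the kernel left them */
    memcpy(live_vectors, saved, VEC_SIZE);
}

/* Returns the first vector byte that is live while the "firmware" would run. */
static char
vectors_seen_by_firmware(int mode, const char *firmware_vecs)
{
    char kernel_copy[VEC_SIZE];
    char seen;

    real_mode = mode;
    memset(live_vectors, 'K', VEC_SIZE);    /* kernel handlers installed */
    memset(kernel_copy, 0, sizeof(kernel_copy));

    save_vec(kernel_copy);          /* like ofw_save_trap_vec(save_trap_of) */
    restore_vec(firmware_vecs);     /* like ofw_restore_trap_vec(save_trap_init) */
    seen = live_vectors[0];         /* what is in place during the call */
    restore_vec(kernel_copy);       /* like ofw_restore_trap_vec(save_trap_of) */

    assert(live_vectors[0] == 'K'); /* kernel vectors are back either way */
    return (seen);
}
```

In this model mode 1 yields 'F' (firmware vectors swapped in) and mode 0 yields
'K' (the kernel's vectors are still live during the firmware call).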
For powerpc64/GENERIC64 it should also then establish OFW_STD_32BIT:
boolean_t
OF_bootstrap()
{
boolean_t status = FALSE;
if (openfirmware_entry != NULL) {
if (ofw_real_mode) {
status = OF_install(OFW_STD_REAL, 0);
} else {
#ifdef __powerpc64__
status = OF_install(OFW_STD_32BIT, 0);
#else
status = OF_install(OFW_STD_DIRECT, 0);
#endif
}
This seems to be like OFW_STD_REAL in what it sets up: ofw_real_methods.
static ofw_def_t ofw_real = {
OFW_STD_REAL,
ofw_real_methods,
0
};
OFW_DEF(ofw_real);
static ofw_def_t ofw_32bit = {
OFW_STD_32BIT,
ofw_real_methods,
0
};
OFW_DEF(ofw_32bit);
From what I can tell so far, ofw_real_mode is used to figure out the context
when it matters.
To be sure, I temporarily hacked ofw_save_trap_vec and ofw_restore_trap_vec to
ignore ofw_real_mode so that they would actually swap the exception vectors.
As I guessed, it still hangs before the copyright notice. (It does not get to
DDB, so no dump information is displayed.)
===
Mark Millard
markmi at dsl-only.net
On Sep 26, 2014, at 10:18 PM, Mark Millard <markmi at dsl-only.net> wrote:
The first send of this was big enough for the moderator to be involved. So I
canceled and am sending with less history included.
[I'll note that I seem to have trouble typing 0xdbb290 vs. 0xbdd290. The actual
value is 0xdbb290. The references to the incorrect typing should say 0xbdd290,
which is the wrong value. But I've had both types of references listing the
wrong text... in various notes.]
===
Mark Millard
markmi@dsl-only.net
On Sep 26, 2014, at 10:11 PM, Mark Millard <markmi@dsl-only.net> wrote:
The openfirmware peer crash (i.e., the before Copyright notice crash) happens
during/just-after the MMU setup and the peer ofwcall is the first ofwcall where
pmap_bootstrapped is non-zero at the time. In other words: the very first
ofwcall in the new context fails.
And this failure involves some of the same code area that I got a backtrace for
and reported as a separate crash (with the trace listed). As a reminder, that
backtrace has a different failure point:
.pvo_vaddr_compare+0x14, instruction ld r0, r4, 0x58 [or ld r0,88(r4) in an alternate notation]
.pvo_tree_RB_FIND+0x38
.moea64_dev_direct_mapped+0x90
.pmap_dev_direct_mapped+0x84 ("_dev" was missing in earlier note)
.bs_remap_earlyboot+0x6c
.moea64_late_bootstrap+0x178
.moea64_bootstrap_native+0x120
.pmap_bootstrap+0xac
.powerpc_init+0x514
btext+0xa8
As for the sequence of ofwcall's that I reported: starting at the last
OF_finddevice before the OF_instance_to_package that I reported in the sequence
of ofwcall's from quiesce until the crash...
moea64_late_bootstrap does
chosen = OF_finddevice("/chosen");
if (chosen != -1 && OF_getprop(chosen, "mmu", &mmui, 4) != -1) {
mmu = OF_instance_to_package(mmui);
if (mmu == -1 || (sz = OF_getproplen(mmu, "translations")) == -1)
sz = 0;
if (sz > 6144 /* tmpstksz - 2 KB headroom */)
panic("moea64_bootstrap: too many ofw translations");
if (sz > 0)
moea64_add_ofw_mappings(mmup, mmu, sz);
}
with moea64_add_ofw_mappings called. Then...
moea64_add_ofw_mappings does...
bzero(translations, sz);
OF_getprop(OF_finddevice("/"), "#address-cells", &acells,
sizeof(acells));
if (OF_getprop(mmu, "translations", trans_cells, sz) == -1)
panic("moea64_bootstrap: can't get ofw translations");
And it is the next ofwcall after that last OF_getprop that fails. (It happens
to be a peer request.) Adding a dump of the pmap_bootstrapped value with the
ofwcall name in my hack for reporting things about the crash confirmed that
peer ofwcall as the first with pmap_bootstrapped non-zero.
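The kind of check described above can be sketched like this. It is a
user-space illustration under assumptions, not the hack I actually used:
bootstrapped models pmap_bootstrapped, and note_call() is a hypothetical hook
that would run on entry to every firmware call.

```c
#include <assert.h>
#include <stddef.h>

static int bootstrapped;                  /* models pmap_bootstrapped */
static const char *first_after = NULL;    /* first name seen with the flag set */

/* Hypothetical hook: called with the service name on every firmware call. */
static void
note_call(const char *name)
{
    assert(name != NULL);
    /* Latch only the first call made once the flag has become non-zero. */
    if (bootstrapped != 0 && first_after == NULL)
        first_after = name;
}
```

Calls made while bootstrapped is still 0 leave first_after at NULL; after the
increment, the first name passed in is latched and later calls do not change
it.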
I will note here that it is somewhat later than the above code that
pvo_vaddr_compare ends up executing via bs_remap_earlyboot. That earlier
moea64_late_bootstrap code continues after the } from the first if above with:
/*
* Calculate the last available physical address.
*/
for (i = 0; phys_avail[i + 2] != 0; i += 2)
;
Maxmem = powerpc_btop(phys_avail[i + 1]);
/*
* Initialize MMU and remap early physical mappings
*/
MMU_CPU_BOOTSTRAP(mmup,0);
mtmsr(mfmsr() | PSL_DR | PSL_IR);
pmap_bootstrapped++;
bs_remap_earlyboot();
(and more). I've not found the peer call yet but it may well be after the
pvo_vaddr_compare shown above as far as execution order goes.
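As an aside on the "last available physical address" loop in the code above:
here is a small stand-alone sketch of how that scan behaves on an array of
(start, end) pairs terminated by zeros. The 4 KB page size and the reading of
powerpc_btop as a right shift by the page shift are assumptions on my part;
last_page() and PAGE_SHIFT_ASSUMED are made-up names.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_ASSUMED 12    /* assumption: 4 KB pages */

/*
 * regions[] holds (start, end) pairs and ends with a (0, 0) pair.
 * Returns the page number corresponding to the end of the last pair.
 */
static uint64_t
last_page(const uint64_t *regions)
{
    int i;

    assert(regions[1] != 0);    /* need at least one non-empty region */
    /* Stop on the last pair: the one whose successor starts at 0. */
    for (i = 0; regions[i + 2] != 0; i += 2)
        ;
    return (regions[i + 1] >> PAGE_SHIFT_ASSUMED);
}
```

Note that the loop tests the start of the next pair, so only the first region
may start at physical address 0 without ending the scan early.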
===
Mark Millard
markmi at dsl-only.net
On Sep 25, 2014, at 2:41 PM, Mark Millard <markmi at dsl-only.net> wrote:
The first boot after make -8 kernel without quiesce also died during peer, I'd
guess the same one.
Looks like quiesce does not matter for the issue. (But it is handy for
identifying which peer fails.)
===
Mark Millard
markmi at dsl-only.net
On Sep 25, 2014, at 2:08 PM, Nathan Whitehorn <nwhitehorn at freebsd.org> wrote:
Can you comment out the call to quiesce? It may not be necessary on your system.
-Nathan
On 09/25/14 13:17, Mark Millard wrote:
> The "before copyright" hang/exception is during the first openfirmware "peer"
> after "quiesce". The ofw_restore_trap_vec(save_trap_init) completes fine, the
> ofwcall(args) is made but it does not return normally.
> Ignoring the ofwcall's from before quiesce, the sequence of ofwcall's is:
>
> quiesce
> finddevice
> parent
> getprop
> getprop
> getprop
> finddevice
> getprop
> instance-to-package
> getproplen
> finddevice
> getprop
> getprop
> peer
>
> And when the boot fails before the copyright that ofwcall for peer ends up
> resulting in the register dump with no register pointing to the kernel's
> normal stack area.
> I still have no clue what is happening during peer.
> ofw_restore_trap_vec(save_trap_init) is being called and is returning before
> ofwcall is used. For all I know some uses of peer could require not being
> quiesce'd in order for peer to be reliable.
> In the form of my display indicating what executed, the text reported ends in:
>
> <peer>^
>
> where the ^ indicates the stage that last completed in the call sequence
> inside openfirmware_core. This information is displayed by the
> x/s ofw_name_history
>
> in the automatically created default script for DDB. I read the sequence
> backwards from the end marker (here ^), following the wraparound if there is
> that much text and if I care to go back that far.
> FreeBSD FBSDG5M1 10.1-BETA2 FreeBSD 10.1-BETA2 #11 r271944M: Thu Sep 25 12:14:05 PDT 2014 root@FBSDG5M1:/usr/obj/usr/src/sys/GENERIC64 powerpc
> My current hacks to get this information are:
>
> Index: /usr/src/sys/ddb/db_script.c
> ===================================================================
> --- /usr/src/sys/ddb/db_script.c (revision 271944)
> +++ /usr/src/sys/ddb/db_script.c (working copy)
> @@ -319,10 +319,25 @@
> {
> char scriptname[DB_MAXSCRIPTNAME];
>
> + /* HACK!!! : Additional lines to force a basic default script to exist.
> + * Will dump information even if ddb input is not available for early crash.
> + * Used to get more information about PowerMac G5 "before Copyright" hangs.
> + */
> + struct ddb_script *dsp = db_script_lookup(DB_SCRIPT_KDBENTER_DEFAULT);
> + if (!dsp) db_script_set(DB_SCRIPT_KDBENTER_DEFAULT, "show registers; bt; x/s ofw_name_history");
> +
> snprintf(scriptname, sizeof(scriptname), "%s.%s",
> DB_SCRIPT_KDBENTER_PREFIX, eventname);
> if (db_script_exec(scriptname, 0) == ENOENT)
> (void)db_script_exec(DB_SCRIPT_KDBENTER_DEFAULT, 0);
> +
> + /* HACK!!! : Additional lines to always use the default script,
> + * even if scriptname existed and was executed.
> + * Will dump information even if ddb input is not available for early crash.
> + * Used to get more information about PowerMac G5 "before Copyright" hangs.
> + */
> + else
> + (void)db_script_exec(DB_SCRIPT_KDBENTER_DEFAULT, 0);
> }
>
> /*-
> Index: /usr/src/sys/powerpc/conf/GENERIC64
> ===================================================================
> --- /usr/src/sys/powerpc/conf/GENERIC64 (revision 271944)
> +++ /usr/src/sys/powerpc/conf/GENERIC64 (working copy)
> @@ -76,6 +76,8 @@
> # Debugging support. Always need this:
> options KDB # Enable kernel debugger support.
> options KDB_TRACE # Print a stack trace for a panic.
> +options DDB
> +options GDB
>
> # Make an SMP-capable kernel by default
> options SMP # Symmetric MultiProcessor Kernel
> Index: /usr/src/sys/powerpc/ofw/ofw_machdep.c
> ===================================================================
> --- /usr/src/sys/powerpc/ofw/ofw_machdep.c (revision 271944)
> +++ /usr/src/sys/powerpc/ofw/ofw_machdep.c (working copy)
> @@ -324,6 +324,12 @@
> openfirmware(&args);
> }
>
> +/* Part of HACK to have record of ofw call names */
> +#define ofw_name_history_record_size 256
> +char ofw_name_history[ofw_name_history_record_size+1] = {}; /* Initially: automatically '\0' filled */
> +char * ofw_name_history_pos = ofw_name_history;
> +/* End Part of HACK */
> +
> static int
> openfirmware_core(void *args)
> {
> @@ -330,6 +336,42 @@
> int result;
> register_t oldmsr;
>
> + { /* HACK to have record of ofw call names */
> + struct argtype_prefix {
> + cell_t name;
> + };
> +
> + char *name = (char*) (uintptr_t) (((struct argtype_prefix*)args)->name);
> +
> + int i;
> +
> + *ofw_name_history_pos = '<';
> +
> + for(i=0; (*name) && i!=20; i++) {
> + ofw_name_history_pos++;
> + if (ofw_name_history_pos == &ofw_name_history[ofw_name_history_record_size]) {
> + ofw_name_history_pos = ofw_name_history;
> + }
> + *ofw_name_history_pos = *name;
> +
> + name++;
> + }
> +
> + ofw_name_history_pos++;
> + if (ofw_name_history_pos == &ofw_name_history[ofw_name_history_record_size]) {
> + ofw_name_history_pos = ofw_name_history;
> + }
> + *ofw_name_history_pos = '>';
> +
> + ofw_name_history_pos++;
> + if (ofw_name_history_pos == &ofw_name_history[ofw_name_history_record_size]) {
> + ofw_name_history_pos = ofw_name_history;
> + }
> + *ofw_name_history_pos = '@';
> +
> + ofw_name_history[ofw_name_history_record_size] = '\0'; /* Paranoia */
> + } /* HACK end */
> +
> /*
> * Turn off exceptions - we really don't want to end up
> * anywhere unexpected with PCPU set to something strange
> @@ -337,14 +379,22 @@
> */
> oldmsr = intr_disable();
>
> + *ofw_name_history_pos = '#'; /* HACK */
> +
> ofw_sprg_prepare();
>
> + *ofw_name_history_pos = '$'; /* HACK */
> +
> /* Save trap vectors */
> ofw_save_trap_vec(save_trap_of);
>
> + *ofw_name_history_pos = '%'; /* HACK */
> +
> /* Restore initially saved trap vectors */
> ofw_restore_trap_vec(save_trap_init);
>
> + *ofw_name_history_pos = '^'; /* HACK */
> +
> #if defined(AIM) && !defined(__powerpc64__)
> /*
> * Clear battable[] translations
> @@ -357,13 +407,21 @@
>
> result = ofwcall(args);
>
> + *ofw_name_history_pos = '&'; /* HACK */
> +
> /* Restore trap vecotrs */
> ofw_restore_trap_vec(save_trap_of);
>
> + *ofw_name_history_pos = '*'; /* HACK */
> +
> ofw_sprg_restore();
>
> + *ofw_name_history_pos = '~'; /* HACK */
> +
> intr_restore(oldmsr);
>
> + *ofw_name_history_pos = '!'; /* HACK */
> +
> return (result);
> }
>
>
>
>
>
> ===
> Mark Millard
> markmi at dsl-only.net
>
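For anyone wanting to try the name-history idea outside the kernel, the ring
buffer from the quoted patch can be modeled in user space roughly as below.
This is a sketch under assumptions, not the patch itself: the buffer is shrunk
to 32 bytes so wraparound shows up quickly, an index replaces the pointer, and
last_call() is a hypothetical helper that does by code what I do by eye when
reading backwards from the stage marker.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HIST_SIZE 32    /* the patch uses 256; smaller here to force wraparound */

static char hist[HIST_SIZE + 1];    /* zero-filled; the final byte stays '\0' */
static size_t pos;                  /* index of the most recently written byte */

/* Advance the write position, wrapping at the end of the record area. */
static void
advance(void)
{
    if (++pos == HIST_SIZE)
        pos = 0;
}

/* Append "<name>" followed by a one-character stage marker. */
static void
record_call(const char *name)
{
    int i;

    hist[pos] = '<';    /* overwrites the previous call's stage marker */
    for (i = 0; *name != '\0' && i != 20; i++, name++) {
        advance();
        hist[pos] = *name;
    }
    advance();
    hist[pos] = '>';
    advance();
    hist[pos] = '@';    /* initial stage marker */
}

/* Overwrite the stage marker in place as the call progresses. */
static void
mark_stage(char stage)
{
    hist[pos] = stage;
}

/* Walk backwards from the marker to recover the most recent call name. */
static void
last_call(char *out, size_t outlen)
{
    char tmp[HIST_SIZE];
    size_t p, n = 0, i;

    assert(outlen > 0);
    p = (pos + HIST_SIZE - 2) % HIST_SIZE;    /* skip the marker and the '>' */
    while (hist[p] != '<' && n < HIST_SIZE) {
        tmp[n++] = hist[p];
        p = (p + HIST_SIZE - 1) % HIST_SIZE;
    }
    /* tmp holds the name reversed; copy it out in forward order. */
    for (i = 0; i < n && i + 1 < outlen; i++)
        out[i] = tmp[n - 1 - i];
    out[i] = '\0';
}
```

Recording "quiesce" (marked '!' on completion) and then "peer" (marked '^')
leaves the buffer reading <quiesce><peer>^, the same shape as the DDB output
quoted above.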