
List:       freebsd-ppc
Subject:    Re: lr=u_trap+0x10 and srr0=k_trap+0x28 for "stopped at 0 illegal instruction 0" before-copyright ha
From:       Mark Millard <markmi () dsl-only ! net>
Date:       2014-09-27 10:51:32
Message-ID: E84C7587-E155-43A1-922F-848B112108C5 () dsl-only ! net

I found the backtrace for the OF_peer call that leads to the "before
copyright"/ofwcall-for-peer hang/crash in ofwcall. This happens to be the first
ofwcall with pmap_bootstrapped!=0, which may be the biggest issue involved (given
what it implies).

.OF_peer+0x8c
.powermac_smp_first_cpu+0x3c (OF_peer(0) below)
.platform_smp_first_cpu+0x78
.cpu_mp_setmaxid+0x2c (via .mpt_fc_els_reply_handler+0x2e68, which is not explicitly listed)
.mp_setmaxid+0x14
.mi_startup+0x10c
btext+0xbc

The source code involved is:

static int
powermac_smp_first_cpu(platform_t plat, struct cpuref *cpuref)
{
        char buf[8];
        phandle_t cpu, dev, root;
        int res;

        root = OF_peer(0);

        dev = OF_child(root);
        while (dev != 0) {
                res = OF_getprop(dev, "name", buf, sizeof(buf));
                if (res > 0 && strcmp(buf, "cpus") == 0)
                        break;
                dev = OF_peer(dev);
        }
        if (dev == 0) {
                /*
                 * psim doesn't have a name property on the /cpus node,
                 * but it can be found directly
                 */
                dev = OF_finddevice("/cpus");
                if (dev == -1)
                        return (ENOENT);
        }

        cpu = OF_child(dev);

        while (cpu != 0) {
                res = OF_getprop(cpu, "device_type", buf, sizeof(buf));
                if (res > 0 && strcmp(buf, "cpu") == 0)
                        break;
                cpu = OF_peer(cpu);
        }
        if (cpu == 0)
                return (ENOENT);

        return (powermac_smp_fill_cpuref(cpuref, cpu));
}

To check whether the peer use is special, I temporarily made OF_peer cache the
node-0 result so that only the first such call uses ofwcall. (The call above is
not the first such call.) The expectation was that the OF_child would then fail,
and it does. So peer is not special: it is just whichever ofwcall service
(argument type) happens to be first after pmap_bootstrapped!=0 that gets the
problem.
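The caching experiment can be sketched in user space roughly as follows. This is an illustration under assumptions, not the actual hack: firmware_peer() is a hypothetical stand-in for the real ofwcall path that just counts how often it is reached, and the three-node tree and the mock_* names are made up. It relies on the Open Firmware convention that peer(0) returns the root node and that 0 means "no more siblings".

```c
typedef unsigned int mock_phandle_t;   /* hypothetical; 0 means "no node" */

/* Hypothetical three-node tree: array index = phandle. */
struct mock_node { mock_phandle_t child, peer; };
static const struct mock_node tree[] = {
	[1] = { 2, 0 },   /* root: first child is node 2 */
	[2] = { 0, 3 },   /* next sibling of node 2 is node 3 */
	[3] = { 0, 0 },   /* node 3 is the last sibling */
};
#define MOCK_ROOT 1

int ofwcall_count;   /* calls that actually reached the "firmware" */

/* Stand-in for the real ofwcall path; peer(0) yields the root node. */
static mock_phandle_t firmware_peer(mock_phandle_t node)
{
	ofwcall_count++;
	return node == 0 ? MOCK_ROOT : tree[node].peer;
}

mock_phandle_t mock_OF_child(mock_phandle_t node)
{
	ofwcall_count++;
	return tree[node].child;
}

/* The experiment: remember the OF_peer(0) result so that only the
 * first such call goes to the firmware. */
static mock_phandle_t cached_root;

mock_phandle_t mock_OF_peer(mock_phandle_t node)
{
	if (node != 0)
		return firmware_peer(node);
	if (cached_root == 0)
		cached_root = firmware_peer(0);
	return cached_root;
}
```

With the cache in place, the OF_peer(0) in powermac_smp_first_cpu is answered without a firmware call, so the OF_child(root) that follows becomes the first firmware call after pmap_bootstrapped is set, which is where the failure moved in the experiment.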




===
Mark Millard
markmi at dsl-only.net

On Sep 26, 2014, at 11:55 PM, Mark Millard <markmi at dsl-only.net> wrote:

According to my adjusted dump output, ofw_real_mode==0 at the "before
Copyright"/ofwcall-for-peer crash.

And that does turn off exception vector save/restore:

static __inline void
ofw_save_trap_vec(char *save_trap_vec)
{
        if (!ofw_real_mode)
                return;

        bcopy((void *)EXC_RST, save_trap_vec, EXC_LAST - EXC_RST);
}

static __inline void
ofw_restore_trap_vec(char *restore_trap_vec)
{
        if (!ofw_real_mode)
                return;

        bcopy(restore_trap_vec, (void *)EXC_RST, EXC_LAST - EXC_RST);
        __syncicache(EXC_RSVD, EXC_LAST - EXC_RSVD);
}

So now it is clear to me how FreeBSD's exception vectors could be involved in a
context that does not have FreeBSD's environment in place. (Finally!)
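To make the consequence concrete, here is a user-space simulation, a sketch under stated assumptions rather than kernel code: a small byte array stands in for the exception-vector area from EXC_RST to EXC_LAST, the 'F'/'K' fill bytes are made-up markers for firmware and kernel vectors, and the helpers mirror the early return on ofw_real_mode quoted above (the real restore also calls __syncicache, which has no equivalent here). The save/restore order follows openfirmware_core: save the current vectors into save_trap_of, install save_trap_init for the call, then put save_trap_of back.

```c
#include <string.h>

#define VEC_SIZE 8   /* stand-in for EXC_LAST - EXC_RST */

/* Hypothetical stand-ins for the kernel state. */
static unsigned char vectors[VEC_SIZE];         /* the live vector area */
static unsigned char save_trap_init[VEC_SIZE];  /* firmware's vectors, saved at boot */
static unsigned char save_trap_of[VEC_SIZE];    /* kernel's vectors, saved per call */
int ofw_real_mode;

static void save_trap_vec(unsigned char *dst)
{
	if (!ofw_real_mode)
		return;              /* same early return as ofw_save_trap_vec */
	memcpy(dst, vectors, VEC_SIZE);
}

static void restore_trap_vec(const unsigned char *src)
{
	if (!ofw_real_mode)
		return;              /* same early return as ofw_restore_trap_vec */
	memcpy(vectors, src, VEC_SIZE);
}

/* Returns the first vector byte the "firmware" would see during the call. */
unsigned char vector_seen_by_firmware(void)
{
	unsigned char seen;

	memset(save_trap_init, 'F', VEC_SIZE);  /* 'F' = firmware vectors */
	memset(vectors, 'K', VEC_SIZE);         /* 'K' = kernel vectors now installed */

	save_trap_vec(save_trap_of);
	restore_trap_vec(save_trap_init);
	seen = vectors[0];                      /* the ofwcall would happen here */
	restore_trap_vec(save_trap_of);
	return seen;
}
```

With ofw_real_mode set, the firmware sees its own vectors and the kernel's are put back afterwards; with ofw_real_mode == 0 both helpers return immediately, so the firmware runs with the kernel's vectors still installed.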

For powerpc64/GENERIC64 it should also then establish OFW_STD_32BIT:

boolean_t
OF_bootstrap()
{
        boolean_t status = FALSE;

        if (openfirmware_entry != NULL) {
                if (ofw_real_mode) {
                        status = OF_install(OFW_STD_REAL, 0);
                } else {
#ifdef __powerpc64__
                        status = OF_install(OFW_STD_32BIT, 0);
#else
                        status = OF_install(OFW_STD_DIRECT, 0);
#endif
                }
                /* ... (rest of OF_bootstrap not shown) ... */

This appears to set up the same thing as OFW_STD_REAL: both use ofw_real_methods.

static ofw_def_t ofw_real = {
        OFW_STD_REAL,
        ofw_real_methods,
        0
};
OFW_DEF(ofw_real);

static ofw_def_t ofw_32bit = {
        OFW_STD_32BIT,
        ofw_real_methods,
        0
};
OFW_DEF(ofw_32bit);

From what I can tell so far, ofw_real_mode is what is used to determine the
context where it matters.
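A simplified model of how both names end up with the same method table. As I understand it, the kernel registers the ofw_def_t entries through OFW_DEF (into a linker set) and OF_install selects one by name; here a plain array and a hypothetical install_by_name() stand in for that, and the string values given to the OFW_STD_* names and the other_methods table are placeholders, not the kernel's definitions. The point is that OFW_STD_32BIT gets ofw_real_methods, while ofw_real_mode, which gates the trap-vector save/restore, is a separate flag that was observed to be 0.

```c
#include <stddef.h>
#include <string.h>

/* Placeholder names; the real OFW_STD_* macros come from the kernel headers. */
#define OFW_STD_REAL   "std-real"
#define OFW_STD_32BIT  "std-32bit"
#define OFW_STD_DIRECT "std-direct"

struct methods { const char *label; };
static const struct methods ofw_real_methods = { "real" };
static const struct methods other_methods = { "other" };   /* placeholder */

/* Simplified ofw_def_t: a name and its method table. */
struct ofw_def { const char *name; const struct methods *methods; };

/* Plain array standing in for the set of OFW_DEF registrations. */
static const struct ofw_def ofw_defs[] = {
	{ OFW_STD_REAL,   &ofw_real_methods },
	{ OFW_STD_32BIT,  &ofw_real_methods },   /* same table as OFW_STD_REAL */
	{ OFW_STD_DIRECT, &other_methods },
};

/* Hypothetical lookup modeled on installing an OFW implementation by name. */
const struct methods *install_by_name(const char *name)
{
	for (size_t i = 0; i < sizeof(ofw_defs) / sizeof(ofw_defs[0]); i++)
		if (strcmp(ofw_defs[i].name, name) == 0)
			return ofw_defs[i].methods;
	return NULL;
}
```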


Just to be sure, I experimented by temporarily hacking ofw_save_trap_vec and
ofw_restore_trap_vec to ignore ofw_real_mode, so that they would actually swap the
exception vectors.

As I guessed, it still hangs before the copyright notice (without reaching DDB, so
no dump information is displayed).






===
Mark Millard
markmi at dsl-only.net

On Sep 26, 2014, at 10:18 PM, Mark Millard <markmi at dsl-only.net> wrote:

The first send of this was large enough to require moderator approval, so I
canceled it and am resending with less history included.

[Note: I seem to have trouble typing 0xdbb290 vs. 0xbdd290. The actual value is
0xdbb290; 0xbdd290 is the wrong value. References to the mistyped value should say
0xbdd290, but in various notes I have gotten both kinds of reference wrong.]

===
Mark Millard
markmi@dsl-only.net

On Sep 26, 2014, at 10:11 PM, Mark Millard <markmi@dsl-only.net> wrote:

The openfirmware peer crash (i.e., the before-Copyright-notice crash) happens
during or just after the MMU setup, and the peer ofwcall is the first ofwcall made
while pmap_bootstrapped is non-zero. In other words: the very first ofwcall in the
new context fails.

This failure also involves some of the same code area as the separate crash I
reported earlier (with the backtrace listed). As a reminder, that backtrace, which
has a different failure point, is:

.pvo_vaddr_compare+0x14, instruction ld r0, r4, 0x58 [or ld r0,88(r4) in alternate notation]
.pvo_tree_RB_FIND+0x38
.moea64_dev_direct_mapped+0x90
.pmap_dev_direct_mapped+0x84 ("_dev" was missing in an earlier note)
.bs_remap_earlyboot+0x6c
.moea64_late_bootstrap+0x178
.moea64_bootstrap_native+0x120
.pmap_bootstrap+0xac
.powerpc_init+0x514
btext+0xa8

As for the sequence of ofwcalls that I reported from quiesce until the crash,
starting at the last OF_finddevice before the OF_instance_to_package:

moea64_late_bootstrap does

        chosen = OF_finddevice("/chosen");
        if (chosen != -1 && OF_getprop(chosen, "mmu", &mmui, 4) != -1) {
            mmu = OF_instance_to_package(mmui);
            if (mmu == -1 || (sz = OF_getproplen(mmu, "translations")) == -1)
                sz = 0;
            if (sz > 6144 /* tmpstksz - 2 KB headroom */)
                panic("moea64_bootstrap: too many ofw translations");
                        
            if (sz > 0)
                moea64_add_ofw_mappings(mmup, mmu, sz);
        }

with moea64_add_ofw_mappings called. Then...

moea64_add_ofw_mappings does...

        bzero(translations, sz);
        OF_getprop(OF_finddevice("/"), "#address-cells", &acells,
            sizeof(acells));
        if (OF_getprop(mmu, "translations", trans_cells, sz) == -1)
                panic("moea64_bootstrap: can't get ofw translations");

It is the next ofwcall after that last OF_getprop that fails (it happens to be a
peer request). Adding the pmap_bootstrapped value to the ofwcall-name record in my
crash-reporting hack confirmed that this peer ofwcall is the first one made with
pmap_bootstrapped non-zero.
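The check amounts to tagging each recorded call name with the value of pmap_bootstrapped at the time of the call, then looking for the first tagged entry. A user-space sketch under assumptions: record_call(), the fixed-size call_log, and first_call_after_bootstrap() are hypothetical names for illustration, not part of the actual hack (which records into the ofw_name_history text buffer instead).

```c
#include <stddef.h>

#define LOG_MAX 32

/* One recorded firmware call: its service name and the flag value seen. */
struct call_rec { const char *name; int pmap_bootstrapped; };

static struct call_rec call_log[LOG_MAX];
static size_t call_log_len;
int pmap_bootstrapped;   /* stand-in for the kernel global */

/* Hypothetical hook, meant to run at the top of openfirmware_core. */
void record_call(const char *name)
{
	if (call_log_len < LOG_MAX) {
		call_log[call_log_len].name = name;
		call_log[call_log_len].pmap_bootstrapped = pmap_bootstrapped;
		call_log_len++;
	}
}

/* Name of the first call made with pmap_bootstrapped non-zero, or NULL. */
const char *first_call_after_bootstrap(void)
{
	for (size_t i = 0; i < call_log_len; i++)
		if (call_log[i].pmap_bootstrapped != 0)
			return call_log[i].name;
	return NULL;
}
```

Fed the tail of the reported sequence (finddevice, getprop, getprop, then pmap_bootstrapped++ in moea64_late_bootstrap, then peer), first_call_after_bootstrap() returns the peer entry.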

Note that pvo_vaddr_compare ends up executing (via bs_remap_earlyboot) somewhat
later than the code above. The moea64_late_bootstrap code quoted earlier continues,
after the closing } of the first if, with:

        /*
         * Calculate the last available physical address.
         */
        for (i = 0; phys_avail[i + 2] != 0; i += 2)
                ;
        Maxmem = powerpc_btop(phys_avail[i + 1]);

        /*
         * Initialize MMU and remap early physical mappings
         */
        MMU_CPU_BOOTSTRAP(mmup,0);
        mtmsr(mfmsr() | PSL_DR | PSL_IR);
        pmap_bootstrapped++;
        bs_remap_earlyboot();

(and more). I have not found the peer call yet, but in execution order it may well
come after the pvo_vaddr_compare shown above.
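For reference, the "last available physical address" loop quoted above relies on phys_avail holding start/end address pairs terminated by a 0 entry, and it stops at the last pair. A stand-alone sketch with made-up addresses; the page shift of 12 (4 KB pages) is an assumption standing in for powerpc_btop, and btop() and last_avail_page() are hypothetical names.

```c
#include <stddef.h>

#define PAGE_SHIFT_ASSUMED 12   /* assumption: 4 KB pages */

/* Hypothetical stand-in for powerpc_btop(): bytes to page number. */
static unsigned long btop(unsigned long bytes)
{
	return bytes >> PAGE_SHIFT_ASSUMED;
}

/*
 * avail is a phys_avail-style array: start/end pairs, terminated by 0.
 * Like the quoted loop, step over pairs until the next start is 0,
 * then convert the end of that last pair to a page number.
 */
unsigned long last_avail_page(const unsigned long *avail)
{
	size_t i;

	for (i = 0; avail[i + 2] != 0; i += 2)
		;
	return btop(avail[i + 1]);
}
```

For example, with pairs {0x3000, 0x100000} and {0x200000, 0x40000000}, the loop stops at the second pair and the result is 0x40000000 >> 12, i.e. page 0x40000.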





===
Mark Millard
markmi at dsl-only.net

On Sep 25, 2014, at 2:41 PM, Mark Millard <markmi at dsl-only.net> wrote:

The first boot of a kernel built (make -8 kernel) without quiesce also died during
peer; I'd guess the same one.

It looks like quiesce does not matter for the issue (but it is handy for
identifying which peer call fails).



===
Mark Millard
markmi at dsl-only.net

On Sep 25, 2014, at 2:08 PM, Nathan Whitehorn <nwhitehorn at freebsd.org> wrote:

Can you comment out the call to quiesce? It may not be necessary on your system.
-Nathan

On 09/25/14 13:17, Mark Millard wrote:
> The "before copyright" hang/exception happens during the first openfirmware "peer"
> call after "quiesce". The ofw_restore_trap_vec(save_trap_init) call completes
> fine and ofwcall(args) is made, but it does not return normally.
>
> Ignoring the ofwcalls made before quiesce, the sequence of ofwcalls is:
> 
> quiesce
> finddevice
> parent
> getprop
> getprop
> getprop
> finddevice
> getprop
> instance-to-package
> getproplen
> finddevice
> getprop
> getprop
> peer
> 
> When the boot fails before the copyright notice, that ofwcall for peer ends up
> producing the register dump in which no register points into the kernel's normal
> stack area.
>
> I still have no clue what is happening during peer.
> ofw_restore_trap_vec(save_trap_init) is called and returns before ofwcall is
> used. For all I know, some uses of peer may require that the firmware not be
> quiesced in order to be reliable.
>
> In my display of what executed, the reported text ends in:
> 
> <peer>^
> 
> where the ^ indicates the stage that last completed in the call sequence inside
> openfirmware_core. This information is displayed by the
>
> x/s ofw_name_history
>
> command in the automatically created default script for DDB. I read the sequence
> backwards from the end marker (here ^), following the wraparound if there is that
> much text and if I care to go back that far.
>
> FreeBSD FBSDG5M1 10.1-BETA2 FreeBSD 10.1-BETA2 #11 r271944M: Thu Sep 25 12:14:05 PDT 2014     root@FBSDG5M1:/usr/obj/usr/src/sys/GENERIC64  powerpc
>
> My current hacks to get this information are:
> 
> Index: /usr/src/sys/ddb/db_script.c
> ===================================================================
> --- /usr/src/sys/ddb/db_script.c (revision 271944)
> +++ /usr/src/sys/ddb/db_script.c (working copy)
> @@ -319,10 +319,25 @@
> {
> char scriptname[DB_MAXSCRIPTNAME];
> 
> + /* HACK!!! : Additional lines to force a basic default script to exist.
> +  * Will dump information even if ddb input is not available for early crash.
> +  * Used to get more information about PowerMac G5 "before Copyright" hangs.
> +  */
> + struct ddb_script *dsp = db_script_lookup(DB_SCRIPT_KDBENTER_DEFAULT);
> + if (!dsp) db_script_set(DB_SCRIPT_KDBENTER_DEFAULT, "show registers; bt; x/s ofw_name_history");
> +
> snprintf(scriptname, sizeof(scriptname), "%s.%s",
> DB_SCRIPT_KDBENTER_PREFIX, eventname);
> if (db_script_exec(scriptname, 0) == ENOENT)
> (void)db_script_exec(DB_SCRIPT_KDBENTER_DEFAULT, 0);
> +
> + /* HACK!!! : Additional lines to always use the default script,
> +  *           even if scriptname existed and was executed.
> +  * Will dump information even if ddb input is not available for early crash.
> +  * Used to get more information about PowerMac G5 "before Copyright" hangs.
> +  */
> + else
> + (void)db_script_exec(DB_SCRIPT_KDBENTER_DEFAULT, 0);
> }
> 
> /*-
> Index: /usr/src/sys/powerpc/conf/GENERIC64
> ===================================================================
> --- /usr/src/sys/powerpc/conf/GENERIC64 (revision 271944)
> +++ /usr/src/sys/powerpc/conf/GENERIC64 (working copy)
> @@ -76,6 +76,8 @@
> # Debugging support.  Always need this:
> options   KDB # Enable kernel debugger support.
> options   KDB_TRACE # Print a stack trace for a panic.
> +options   DDB
> +options   GDB
> 
> # Make an SMP-capable kernel by default
> options   SMP # Symmetric MultiProcessor Kernel
> Index: /usr/src/sys/powerpc/ofw/ofw_machdep.c
> ===================================================================
> --- /usr/src/sys/powerpc/ofw/ofw_machdep.c (revision 271944)
> +++ /usr/src/sys/powerpc/ofw/ofw_machdep.c (working copy)
> @@ -324,6 +324,12 @@
> openfirmware(&args);
> }
> 
> +/* Part of HACK to have record of ofw call names */
> +#define ofw_name_history_record_size 256
> +char ofw_name_history[ofw_name_history_record_size+1] = {}; /* Initially: automatically '\0' filled */
> +char * ofw_name_history_pos = ofw_name_history;
> +/* End Part of HACK */
> +
> static int
> openfirmware_core(void *args)
> {
> @@ -330,6 +336,42 @@
> int result;
> register_t oldmsr;
> 
> + { /* HACK to have record of ofw call names */
> + struct argtype_prefix {
> + cell_t name;
> + };
> +
> + char *name = (char*) (uintptr_t) (((struct argtype_prefix*)args)->name);
> + 
> + int i;
> +
> + *ofw_name_history_pos = '<';
> +
> + for(i=0; (*name) && i!=20; i++) {
> + ofw_name_history_pos++;
> + if (ofw_name_history_pos == &ofw_name_history[ofw_name_history_record_size]) {
> + ofw_name_history_pos = ofw_name_history;
> + }
> + *ofw_name_history_pos = *name;
> +
> + name++;
> + }
> +
> + ofw_name_history_pos++;
> + if (ofw_name_history_pos == &ofw_name_history[ofw_name_history_record_size]) {
> + ofw_name_history_pos = ofw_name_history;
> + }
> + *ofw_name_history_pos = '>';
> +
> + ofw_name_history_pos++;
> + if (ofw_name_history_pos == &ofw_name_history[ofw_name_history_record_size]) {
> + ofw_name_history_pos = ofw_name_history;
> + }
> + *ofw_name_history_pos = '@';
> +
> + ofw_name_history[ofw_name_history_record_size] = '\0'; /* Paranoia */
> + } /* HACK end */
> +
> /*
> * Turn off exceptions - we really don't want to end up
> * anywhere unexpected with PCPU set to something strange
> @@ -337,14 +379,22 @@
> */
> oldmsr = intr_disable();
> 
> + *ofw_name_history_pos = '#'; /* HACK */
> +
> ofw_sprg_prepare();
> 
> + *ofw_name_history_pos = '$'; /* HACK */
> +
> /* Save trap vectors */
> ofw_save_trap_vec(save_trap_of);
> 
> + *ofw_name_history_pos = '%'; /* HACK */
> +
> /* Restore initially saved trap vectors */
> ofw_restore_trap_vec(save_trap_init);
> 
> + *ofw_name_history_pos = '^'; /* HACK */
> +
> #if defined(AIM) && !defined(__powerpc64__)
> /*
> * Clear battable[] translations
> @@ -357,13 +407,21 @@
> 
> result = ofwcall(args);
> 
> + *ofw_name_history_pos = '&'; /* HACK */
> +
> /* Restore trap vecotrs */
> ofw_restore_trap_vec(save_trap_of);
> 
> + *ofw_name_history_pos = '*'; /* HACK */
> +
> ofw_sprg_restore();
> 
> + *ofw_name_history_pos = '~'; /* HACK */
> +
> intr_restore(oldmsr);
> 
> + *ofw_name_history_pos = '!'; /* HACK */
> +
> return (result);
> }
> 
> 
> 
> 
> 
> ===
> Mark Millard
> markmi at dsl-only.net
> 

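The ofw_name_history part of the patch above is a small ring buffer: each call writes '<', up to 20 name characters, '>', and then a '@' stage marker that the later stages overwrite in place, and the next call's '<' overwrites the previous call's final marker. A user-space sketch of the same idea, under assumptions: the buffer is shrunk to 16 bytes to make the wraparound visible, and history_unwrapped() is a hypothetical helper (not part of the patch) that does what I described doing by hand, reading oldest-first from just after the current position so the text ends at the stage marker.

```c
#include <stddef.h>
#include <string.h>

#define HIST_SIZE 16   /* the patch uses 256; small here to show wraparound */

static char hist[HIST_SIZE + 1];   /* zero-filled; last byte stays '\0' */
static char *hist_pos = hist;

/* Advance one slot, wrapping at the end of the ring. */
static void hist_advance(void)
{
	if (++hist_pos == &hist[HIST_SIZE])
		hist_pos = hist;
}

/* Record "<name>" followed by the '@' stage marker, as the patch does. */
void hist_record(const char *name)
{
	*hist_pos = '<';   /* overwrites the previous call's stage marker */
	for (int i = 0; *name != '\0' && i != 20; i++, name++) {
		hist_advance();
		*hist_pos = *name;
	}
	hist_advance();
	*hist_pos = '>';
	hist_advance();
	*hist_pos = '@';
}

/* Later stages overwrite the marker in place (e.g. '^' just before ofwcall). */
void hist_stage(char marker)
{
	*hist_pos = marker;
}

/* Copy the ring oldest-first into out (size >= HIST_SIZE + 1),
 * skipping slots that were never written. */
void history_unwrapped(char *out)
{
	size_t n = 0;
	const char *p = hist_pos;

	for (size_t k = 0; k < HIST_SIZE; k++) {
		if (++p == &hist[HIST_SIZE])
			p = hist;
		if (*p != '\0')
			out[n++] = *p;
	}
	out[n] = '\0';
}
```

For example, recording quiesce (completing with '!') and then peer (stopping at '^') fills the 16 bytes exactly and unwraps to "<quiesce><peer>^", matching the "<peer>^" ending seen in the DDB output; a further record then starts overwriting the oldest text.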

_______________________________________________
freebsd-ppc@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-ppc
To unsubscribe, send any mail to "freebsd-ppc-unsubscribe@freebsd.org"

