--===============4554200391576093686== Content-Type: multipart/alternative; boundary="001a1140ad5e0a5bbf0561c87bb3" --001a1140ad5e0a5bbf0561c87bb3 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Hi, IS_FATAL_ERROR sounds like a utility macro, I would add a parameter to test the flags and make it return a Bool compatible value. My 2 cents comments. Cheers Le 2 janv. 2018 11:05, "Stefan Fritsch" a =C3=A9crit : Hi, on a Fujitsu E744 laptop we have a problem that sometimes there is a very long delay (up to several minutes) when booting from hard disk. It seems accessing the DVD drive (which has no disk inserted) sometimes fails with some errors that are not correctly handled by grub, which leads to each access being stalled until the 20s timeout triggers. This seems to happen when grub is trying to read filesystem/partition data. Example debug output: disk/ahci.c:1020: AHCI command scheduled disk/ahci.c:1022: AHCI tfd =3D 50 disk/ahci.c:1024: AHCI inten =3D 7dc000ff disk/ahci.c:1026: AHCI intstatus =3D 3 disk/ahci.c:1031: AHCI inten =3D 7dc000ff disk/ahci.c:1033: AHCI tfd =3D 50 disk/ahci.c:1036: AHCI sig =3D eb140101 disk/ahci.c:1038: AHCI tfd =3D 2051 disk/ahci.c:1049: AHCI status <1 1 40000001 2051> disk/ahci.c:926: AHCI tfd =3D 2051 disk/ahci.c:932: AHCI tfd =3D 2051 disk/ahci.c:936: AHCI tfd =3D 2051 disk/ahci.c:942: grub_ahci_read (size=3D0, cmdsize =3D 0) disk/ahci.c:957: AHCI tfd =3D 2051, CL=3D0xda455000 disk/ahci.c:968: AHCI tfd =3D 2051 disk/ahci.c:980: AHCI tfd =3D 2051 disk/ahci.c:987: AHCI tfd =3D 2051 disk/ahci.c:998: cfis: 27 80 08 00 00 00 00 00 disk/ahci.c:1003: cfis: 00 00 00 00 00 00 00 00 disk/ahci.c:1015: PRDT =3D da453000, 0, ffffffff (128) The problem is that the command_issue bit that is checked in the loop is only reset if the "HBA receives a FIS which clears the BSY, DRQ, and ERR bits for the command", but the ERR bit is never cleared. The patch below seems to fix the issue by checking the error bits in the interrupt status register. According to the AHCI 1.2 spec, "Interrupt sources that are disabled (=E2=80=980=E2=80=99) are still reflec= ted in the status registers.", so this should work even though grub uses polling. The relevant bit in our case is the Task File Error Status (TFES), which is equivalent to the ERR bit 0 in tfd. But the patch below also checks the other error bits except for the "Interface non-fatal error status" bit. Cheers, Stefan diff --git a/grub-core/disk/ahci.c b/grub-core/disk/ahci.c --- a/grub-core/disk/ahci.c +++ b/grub-core/disk/ahci.c @@ -82,6 +82,19 @@ enum grub_ahci_hba_port_command GRUB_AHCI_HBA_PORT_CMD_FR =3D 0x4000, }; +enum grub_ahci_hba_port_int_status + { + GRUB_AHCI_HBA_PORT_IS_IFS =3D (1UL << 27), + GRUB_AHCI_HBA_PORT_IS_HBDS =3D (1UL << 28), + GRUB_AHCI_HBA_PORT_IS_HBFS =3D (1UL << 29), + GRUB_AHCI_HBA_PORT_IS_TFES =3D (1UL << 30), + }; +#define IS_FATAL_ERROR (\ + GRUB_AHCI_HBA_PORT_IS_IFS| \ + GRUB_AHCI_HBA_PORT_IS_HBDS|\ + GRUB_AHCI_HBA_PORT_IS_HBFS|\ + GRUB_AHCI_HBA_PORT_IS_TFES) + struct grub_ahci_hba { grub_uint32_t cap; @@ -1026,7 +1039,8 @@ grub_ahci_readwrite_real (struct grub_ahci_device *dev, endtime =3D grub_get_time_ms () + (spinup ? 20000 : 20000); while ((dev->hba->ports[dev->port].command_issue & 1)) - if (grub_get_time_ms () > endtime) + if (grub_get_time_ms () > endtime || + (dev->hba->ports[dev->port].intstatus & IS_FATAL_ERROR)) { grub_dprintf ("ahci", "AHCI status <%x %x %x %x>\n", dev->hba->ports[dev->port].command_issue, @@ -1034,7 +1048,10 @@ grub_ahci_readwrite_real (struct grub_ahci_device *dev, dev->hba->ports[dev->port].intstatus, dev->hba->ports[dev->port].task_file_data); dev->hba->ports[dev->port].command_issue =3D 0; - err =3D grub_error (GRUB_ERR_IO, "AHCI transfer timed out"); + if (dev->hba->ports[dev->port].intstatus & IS_FATAL_ERROR) + err =3D grub_error (GRUB_ERR_IO, "AHCI transfer error"); + else + err =3D grub_error (GRUB_ERR_IO, "AHCI transfer timed out"); if (!reset) grub_ahci_reset_port (dev, 1); break; _______________________________________________ Grub-devel mailing list Grub-devel@gnu.org https://lists.gnu.org/mailman/listinfo/grub-devel --001a1140ad5e0a5bbf0561c87bb3 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hi,
IS_FATAL_ERROR sounds like a utility= macro, I would add a parameter to test the flags and make it return a Bool= compatible value.
My 2 cents comments.
Cheers

Le=C2=A02 janv. 2018 11:05, "Stefan Fritsch" <sf@sfritsch.de> a =C3=A9crit=C2=A0:
Hi,

on a Fujitsu E744 laptop we have a problem that sometimes there is a
very long delay (up to several minutes) when booting from hard disk.
It seems accessing the DVD drive (which has no disk inserted)
sometimes fails with some errors that are not correctly handled by
grub, which leads to each access being stalled until the 20s timeout
triggers. This seems to happen when grub is trying to read
filesystem/partition data.

Example debug output:

disk/ahci.c:1020: AHCI command scheduled
disk/ahci.c:1022: AHCI tfd =3D 50
disk/ahci.c:1024: AHCI inten =3D 7dc000ff
disk/ahci.c:1026: AHCI intstatus =3D 3
disk/ahci.c:1031: AHCI inten =3D 7dc000ff
disk/ahci.c:1033: AHCI tfd =3D 50
disk/ahci.c:1036: AHCI sig =3D eb140101
disk/ahci.c:1038: AHCI tfd =3D 2051
disk/ahci.c:1049: AHCI status <1 1 40000001 2051>
disk/ahci.c:926: AHCI tfd =3D 2051
disk/ahci.c:932: AHCI tfd =3D 2051
disk/ahci.c:936: AHCI tfd =3D 2051
disk/ahci.c:942: grub_ahci_read (size=3D0, cmdsize =3D 0)
disk/ahci.c:957: AHCI tfd =3D 2051, CL=3D0xda455000
disk/ahci.c:968: AHCI tfd =3D 2051
disk/ahci.c:980: AHCI tfd =3D 2051
disk/ahci.c:987: AHCI tfd =3D 2051
disk/ahci.c:998: cfis: 27 80 08 00 00 00 00 00
disk/ahci.c:1003: cfis: 00 00 00 00 00 00 00 00
disk/ahci.c:1015: PRDT =3D da453000, 0, ffffffff (128)

The problem is that the command_issue bit that is checked in the loop
is only reset if the "HBA receives a FIS which clears the BSY, DRQ, an= d
ERR bits for the command", but the ERR bit is never cleared.

The patch below seems to fix the issue by checking the error bits in
the interrupt status register. According to the AHCI 1.2 spec,
"Interrupt sources that are disabled (=E2=80=980=E2=80=99) are still r= eflected in the
status registers.", so this should work even though grub uses polling.=
The relevant bit in our case is the Task File Error Status (TFES),
which is equivalent to the ERR bit 0 in tfd. But the patch below also
checks the other error bits except for the "Interface non-fatal error<= br> status" bit.

Cheers,
Stefan

diff --git a/grub-core/disk/ahci.c b/grub-core/disk/ahci.c
--- a/grub-core/disk/ahci.c
+++ b/grub-core/disk/ahci.c
@@ -82,6 +82,19 @@ enum grub_ahci_hba_port_command
=C2=A0 =C2=A0 =C2=A0GRUB_AHCI_HBA_PORT_CMD_FR =3D 0x4000,
=C2=A0 =C2=A0};

+enum grub_ahci_hba_port_int_status
+=C2=A0 {
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0GRUB_AHCI_HBA_PORT_IS_IFS=C2=A0 =3D (1UL= << 27),
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0GRUB_AHCI_HBA_PORT_IS_HBDS =3D (1UL <= < 28),
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0GRUB_AHCI_HBA_PORT_IS_HBFS =3D (1UL <= < 29),
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0GRUB_AHCI_HBA_PORT_IS_TFES =3D (1UL <= < 30),
+=C2=A0 };
+#define IS_FATAL_ERROR (\
+=C2=A0 =C2=A0 =C2=A0 =C2=A0GRUB_AHCI_HBA_PORT_IS_IFS| \
+=C2=A0 =C2=A0 =C2=A0 =C2=A0GRUB_AHCI_HBA_PORT_IS_HBDS|\
+=C2=A0 =C2=A0 =C2=A0 =C2=A0GRUB_AHCI_HBA_PORT_IS_HBFS|\
+=C2=A0 =C2=A0 =C2=A0 =C2=A0GRUB_AHCI_HBA_PORT_IS_TFES)
+
=C2=A0struct grub_ahci_hba
=C2=A0{
=C2=A0 =C2=A0grub_uint32_t cap;
@@ -1026,7 +1039,8 @@ grub_ahci_readwrite_real (struct grub_ahci_device *de= v,

=C2=A0 =C2=A0endtime =3D grub_get_time_ms () + (spinup ? 20000 : 20000); =C2=A0 =C2=A0while ((dev->hba->ports[dev->port].command_issue= & 1))
-=C2=A0 =C2=A0 if (grub_get_time_ms () > endtime)
+=C2=A0 =C2=A0 if (grub_get_time_ms () > endtime ||
+=C2=A0 =C2=A0 =C2=A0 =C2=A0(dev->hba->ports[dev->port].intst= atus & IS_FATAL_ERROR))
=C2=A0 =C2=A0 =C2=A0 =C2=A0{
=C2=A0 =C2=A0 =C2=A0 =C2=A0 grub_dprintf ("ahci", "AHCI stat= us <%x %x %x %x>\n",
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 dev->hba->ports[dev->port].command_issue,
@@ -1034,7 +1048,10 @@ grub_ahci_readwrite_real (struct grub_ahci_device *d= ev,
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 dev->hba->ports[dev->port].intstatus,
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 dev->hba->ports[dev->port].task_file_data);
=C2=A0 =C2=A0 =C2=A0 =C2=A0 dev->hba->ports[dev->port].comman= d_issue =3D 0;
-=C2=A0 =C2=A0 =C2=A0 =C2=A0err =3D grub_error (GRUB_ERR_IO, "AHCI tra= nsfer timed out");
+=C2=A0 =C2=A0 =C2=A0 =C2=A0if (dev->hba->ports[dev->port].in= tstatus & IS_FATAL_ERROR)
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0err =3D grub_error (GRUB_ERR_IO, "A= HCI transfer error");
+=C2=A0 =C2=A0 =C2=A0 =C2=A0else
+=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0err =3D grub_error (GRUB_ERR_IO, "A= HCI transfer timed out");
=C2=A0 =C2=A0 =C2=A0 =C2=A0 if (!reset)
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 grub_ahci_reset_port (dev, 1);
=C2=A0 =C2=A0 =C2=A0 =C2=A0 break;
________________________________= _______________
Grub-devel mailing list
Grub-devel@gnu.org
https://lists.gnu.org/mailman/listinfo/grub-de= vel


--001a1140ad5e0a5bbf0561c87bb3-- --===============4554200391576093686== Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Disposition: inline _______________________________________________ Grub-devel mailing list Grub-devel@gnu.org https://lists.gnu.org/mailman/listinfo/grub-devel --===============4554200391576093686==--