
List:       freeipmi-users
Subject:    Re: [Freeipmi-users] ASROCK MT-C224 + ipmiconsole: [error received]: excess errors received
From:       Albert Chu <chu11 () llnl ! gov>
Date:       2017-01-30 19:10:52
Message-ID: 1485803452.30961.55.camel () llnl ! gov

> I have learned that there are two ways to run IPMI on the ASROCK
> MT-C224. IPMI can share the LAN1 port and/or IPMI can use a dedicated
> IPMI_LAN port. The earlier grep (still attached below) of --debug for
> the linux boot was made sharing the LAN1 port.

Ahh, this is a common feature of many boards.  However, I have not heard
of a system that has had the "jumping sequence numbers" issue you saw.
I haven't seen it on any machines at my company, but we use dedicated
mode almost exclusively.
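
For anyone wondering why jumping sequence numbers matter, here is a rough
sketch of a sliding-window style check. It is only an illustration, not
ipmiconsole's actual code: the function names are made up, the window size
of 8 is an assumption picked for the example, and wraparound is ignored.

```python
# Simplified model for illustration only; this is NOT ipmiconsole's code.
ASSUMED_WINDOW = 8  # hypothetical acceptance window, chosen for the example


def accepts(seq: int, highest_received: int, window: int = ASSUMED_WINDOW) -> bool:
    """Accept a packet only if it is ahead of the highest sequence number
    seen so far by no more than `window` (wraparound is ignored here)."""
    return 0 < seq - highest_received <= window


def tolerated_losses(step: int, window: int = ASSUMED_WINDOW) -> int:
    """Consecutive packets that can be lost before the next arrival falls
    outside the window, when the sender advances by `step` per packet."""
    # After k losses the next packet is (k + 1) * step ahead of the last
    # accepted one, so it is accepted while (k + 1) * step <= window.
    return max(window // step - 1, 0)


print(tolerated_losses(1), tolerated_losses(2))
print(accepts(329, 317))
```

Under these assumed numbers, a sender that advances by one per packet can
lose 7 packets in a row and still be accepted, while one that advances by
two can lose only 3. Once the gap exceeds the window, every later packet is
rejected as well, because the highest received value never moves forward.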

It's likely a bug on the ASROCK.  I can add it to the "bugs and
workarounds" doc
(https://www.gnu.org/software/freeipmi/freeipmi-bugs-issues-and-workarounds.txt).
 
> I switched to the dedicated IPMI_LAN port and the situation is improved
> as shown by the grep of --debug output for the linux boot below ...
> 
> g1@g1 cat power-up-debug.txt | grep -a failed
> (ipmiconsole_checks.c, ipmiconsole_check_requester_sequence_number, 389): hostname=e3bIPMI; protocol_state=Ah: requester sequence number check failed; p = 21; req_seq = 3Eh; expected_req_seq = 3Fh
> (ipmiconsole_checks.c, ipmiconsole_check_command, 353): hostname=e3bIPMI; protocol_state=Bh: command check failed; p = 23; cmd = 49h; expected_cmd = 3Ch

This appears to be a single lost packet, which can happen once in a
while.
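
If it helps with comparing runs, the failed checks in a --debug capture can
be tallied with a few lines of Python instead of reading the grep output by
eye. This is only a sketch: the regular expression is inferred from the log
lines quoted in this thread rather than from any documented format, and the
function names are made up for the example.

```python
import re
from collections import Counter

# Shape inferred from the quoted --debug lines:
#   (file, function, line): hostname=...; protocol_state=..: <name> check failed; k = v; ...
FAILED_RE = re.compile(
    r"\((?P<file>[^,]+), (?P<func>[^,]+), (?P<line>\d+)\):"
    r"[^()]*?: (?P<check>[^;]+?) check failed; (?P<fields>[^()]*)"
)


def parse_failures(text: str):
    """Yield (function, check name, fields) for every failed check in text."""
    # Collapse whitespace so entries that were hard-wrapped still match.
    flat = " ".join(text.split())
    for m in FAILED_RE.finditer(flat):
        pairs = (kv.split("=", 1) for kv in m["fields"].split(";") if "=" in kv)
        fields = {key.strip(): value.strip() for key, value in pairs}
        yield m["func"], m["check"], fields


def summarize(text: str) -> Counter:
    """Count failed checks by check name."""
    return Counter(check for _, check, _ in parse_failures(text))
```

Calling summarize() on the contents of power-up-debug.txt gives one count
per kind of failed check, and the fields dictionary keeps values such as
req_seq and expected_req_seq as the strings printed in the log.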

Al

On Fri, 2017-01-27 at 23:05 -0500, myglc2 wrote:
> On 01/27/2017 at 00:52 Albert Chu writes:
> 
> > It's also worth mentioning, I don't know why the sequence numbers are
> > jumping by two most times.  It shouldn't be the case, also suggesting
> > lost messages.
> > 
> > If the BMC has a bug where sequence numbers are jumping by two somewhat
> > randomly, that could explain the issue further.  It means the default
> > ipmiconsole can only tolerate half as many dropped messages as it
> > normally would.
> > 
> > Al
> 
> Hi Al,
> 
> I have learned that there are two ways to run IPMI on the ASROCK
> MT-C224. IPMI can share the LAN1 port and/or IPMI can use a dedicated
> IPMI_LAN port. The earlier grep (still attached below) of --debug for
> the linux boot was made sharing the LAN1 port.
> 
> I switched to the dedicated IPMI_LAN port and the situation is improved
> as shown by the grep of --debug output for the linux boot below ...
> 
> g1@g1 cat power-up-debug.txt | grep -a failed
> (ipmiconsole_checks.c, ipmiconsole_check_requester_sequence_number, 389): \
> hostname=e3bIPMI; protocol_state=Ah: requester sequence number check failed; p = \
> 21; req_seq = 3Eh; expected_req_seq = 3Fh
> (ipmiconsole_checks.c, ipmiconsole_check_command, 353): hostname=e3bIPMI; \
> protocol_state=Bh: command check failed; p = 23; cmd = 49h; expected_cmd = 3Ch
> 
> ... and freeipmi doesn't drop the connection. So, at this point I have a
> usable system and I am a happy user ;-)
> 
> I still see 5 to 10 control-@ 's in the linux log, which are errors that
> typically involve the loss of a few characters. Do you think turning
> on flow control would help this?
> 
> Many thanks for your help!
> 
> - George
> 
> > 
> > On Wed, 2017-01-25 at 20:31 -0500, myglc2 wrote:
> > > Hi Albert, Thank you for the quick response.
> > > 
> > > On 01/25/2017 at 20:00 Albert Chu writes:
> > > 
> > > > Hi,
> > > > 
> > > > This seems to be an error caused by a simple sequence number issue.
> > > > Enough messages from the remote service processor have gotten lost, so
> > > > ipmiconsole gives up at some point.  I don't know if your log output
> > > > below is showing consecutive
> > > > "ipmiconsole_check_outbound_sequence_number" errors, but there is
> > > > at least that one big jump from #398 to #429, indicating lots of lost
> > > > messages.
> > > 
> > > Sorry, I think my cut-and-paste left a bit to be desired ;-) Here is a
> > > more informative (hopefully) grep of a session that bags out about 1/2
> > > way thru a linux boot ...
> > > 
> > > g1@g1 /root/con/06$ cat freeipmi.debug.txt | grep -a -E '(failed;|excessive)'
> > > (ipmiconsole_checks.c, ipmiconsole_check_outbound_sequence_number, 186): \
> > > hostname=e3bIPMI; protocol_state=9h: session sequence number check failed; p = \
> > > 17; session_sequence_number = 329; highest_received_sequence_number = 317 \
> > > (ipmiconsole_checks.c, ipmiconsole_check_outbound_sequence_number, 186): \
> > > hostname=e3bIPMI; protocol_state=9h: session sequence number check failed; p = \
> > > 17; session_sequence_number = 331; highest_received_sequence_number = 317 \
> > > (ipmiconsole_checks.c, ipmiconsole_check_outbound_sequence_number, 186): \
> > > hostname=e3bIPMI; protocol_state=9h: session sequence number check failed; p = \
> > > 17; session_sequence_number = 333; highest_received_sequence_number = 317 \
> > > (ipmiconsole_checks.c, ipmiconsole_check_outbound_sequence_number, 186): \
> > > hostname=e3bIPMI; protocol_state=9h: session sequence number check failed; p = \
> > > 17; session_sequence_number = 335; highest_received_sequence_number = 317 \
> > > (ipmiconsole_checks.c, ipmiconsole_check_outbound_sequence_number, 186): \
> > > hostname=e3bIPMI; protocol_state=9h: session sequence number check failed; p = \
> > > 17; session_sequence_number = 337; highest_received_sequence_number = 317 \
> > > (ipmiconsole_checks.c, ipmiconsole_check_outbound_sequence_number, 186): \
> > > hostname=e3bIPMI; protocol_state=9h: session sequence number check failed; p = \
> > > 17; session_sequence_number = 339; highest_received_sequence_number = 317 \
> > > (ipmiconsole_checks.c, ipmiconsole_check_outbound_sequence_number, 186): \
> > > hostname=e3bIPMI; protocol_state=9h: session sequence number check failed; p = \
> > > 17; session_sequence_number = 341; highest_received_sequence_number = 317 \
> > > (ipmiconsole_checks.c, ipmiconsole_check_outbound_sequence_number, 186): \
> > > hostname=e3bIPMI; protocol_state=9h: session sequence number check failed; p = \
> > > 17; session_sequence_number = 343; highest_received_sequence_number = 317 \
> > > (ipmiconsole_checks.c, ipmiconsole_check_outbound_sequence_number, 186): \
> > > hostname=e3bIPMI; protocol_state=9h: session sequence number check failed; p = \
> > > 17; session_sequence_number = 345; highest_received_sequence_number = 317 \
> > > (ipmiconsole_checks.c, ipmiconsole_check_outbound_sequence_number, 186): \
> > > hostname=e3bIPMI; protocol_state=9h: session sequence number check failed; p = \
> > > 17; session_sequence_number = 347; highest_received_sequence_number = 317 \
> > > (ipmiconsole_checks.c, ipmiconsole_check_outbound_sequence_number, 186): \
> > > hostname=e3bIPMI; protocol_state=9h: session sequence number check failed; p = \
> > > 17; session_sequence_number = 349; highest_received_sequence_number = 317 \
> > > (ipmiconsole_checks.c, ipmiconsole_check_outbound_sequence_number, 186): \
> > > hostname=e3bIPMI; protocol_state=9h: session sequence number check failed; p = \
> > > 17; session_sequence_number = 351; highest_received_sequence_number = 317 \
> > > (ipmiconsole_checks.c, ipmiconsole_check_outbound_sequence_number, 186): \
> > > hostname=e3bIPMI; protocol_state=9h: session sequence number check failed; p = \
> > > 17; session_sequence_number = 353; highest_received_sequence_number = 317 \
> > > (ipmiconsole_checks.c, ipmiconsole_check_outbound_sequence_number, 186): \
> > > hostname=e3bIPMI; protocol_state=9h: session sequence number check failed; p = \
> > > 17; session_sequence_number = 355; highest_received_sequence_number = 317 \
> > > (ipmiconsole_checks.c, ipmiconsole_check_outbound_sequence_number, 186): \
> > > hostname=e3bIPMI; protocol_state=9h: session sequence number check failed; p = \
> > > 19; session_sequence_number = 356; highest_received_sequence_number = 317 \
> > > (ipmiconsole_checks.c, ipmiconsole_check_outbound_sequence_number, 186): \
> > > hostname=e3bIPMI; protocol_state=9h: session sequence number check failed; p = \
> > > 17; session_sequence_number = 358; highest_received_sequence_number = 317 \
> > > (ipmiconsole_checks.c, ipmiconsole_check_outbound_sequence_number, 186): \
> > > hostname=e3bIPMI; protocol_state=9h: session sequence number check failed; p = \
> > > 17; session_sequence_number = 360; highest_received_sequence_number = 317 \
> > > (ipmiconsole_checks.c, ipmiconsole_check_outbound_sequence_number, 186): \
> > > hostname=e3bIPMI; protocol_state=9h: session sequence number check failed; p = \
> > > 17; session_sequence_number = 362; highest_received_sequence_number = 317 \
> > > > (ipmiconsole_processing.c, _process_ctx, 4077): hostname=e3bIPMI; \
> > > > protocol_state=9h: closing with excessive errors \
> > > > (ipmiconsole_checks.c, ipmiconsole_check_outbound_sequence_number, 186): \
> > > > hostname=e3bIPMI; protocol_state=Ah: session sequence number check failed; p = \
> > > > 21; session_sequence_number = 363; highest_received_sequence_number = 317 \
> > > > (ipmiconsole_checks.c, ipmiconsole_check_requester_sequence_number, 389): \
> > > > hostname=e3bIPMI; protocol_state=Ah: requester sequence number check failed; p = \
> > > > 21; req_seq = 2Eh; expected_req_seq = 2Fh \
> > > > (ipmiconsole_checks.c, ipmiconsole_check_outbound_sequence_number, 186): \
> > > > hostname=e3bIPMI; protocol_state=Bh: session sequence number check failed; p = \
> > > > 23; session_sequence_number = 366; highest_received_sequence_number = 317 \
> > > > (ipmiconsole_checks.c, ipmiconsole_check_command, 353): hostname=e3bIPMI; \
> > > > protocol_state=Bh: command check failed; p = 23; cmd = 49h; expected_cmd = 3Ch \
> > > > (ipmiconsole_checks.c, ipmiconsole_check_outbound_sequence_number, 186): \
> > > > hostname=e3bIPMI; protocol_state=Bh: session sequence number check failed; p = \
> > > > 23; session_sequence_number = 367; highest_received_sequence_number = 317
> > > > g1@g1 /root/con/06$
> > > > 
> > > > You may wish to check network connections and such for errors, lost
> > > > packets, etc.
> > > 
> > > I don't think so, the two machines are the only ones on a switch.
> > > 
> > > > > If you believe this to not be the case, there is at least one other
> > > > > situation where I know this to occur.  It occurs when the server is
> > > > > being rebooted (or something similar) and the internal serial UART
> > > > > chip is rebooted, which leads to communication problems between it and
> > > > > the internal service processor and, suddenly, to huge jumps in
> > > > > sequence numbers.  Unfortunately, there is no solution for this other
> > > > > than to restart.
> > > 
> > > I tried rebooting/repowering and it does not affect this behavior.
> > > 
> > > > FWIW, ipmitool does not report any errors and handles a full linux boot
> > > > without bagging out.  However, I do see 5 to 10 control-@ 's in the log
> > > > which are clearly errors.
> > > 
> > > So... I believe the problem is with the ASROCK MT-C224. I will follow up
> > > with them.
> > > 
> > > Many thanks!
> > > 
> > > - George
> > > 
> > > > Al
> > > > 
> > > > On Wed, 2017-01-25 at 14:38 -0500, myglc2 wrote:
> > > > > NOTE: Please pardon if duplicate, I also posted via gmane by mistake.
> > > > > 
> > > > > Hi,
> > > > > 
> > > > > Using freeipmi ipmiconsole SOL to connect to ASROCK MT-C224 everything
> > > > > is looking good until ...
> > > > > 
> > > > > [...]
> > > > > [error received]: excess errors received
> > > > > [closing the connection]
> > > > > 
> > > > > So I tried ...
> > > > > 
> > > > > ipmiconsole -h e3bIPMI -u admin -p admin --debug
> > > > > 
> > > > > ... which showed me ....
> > > > > 
> > > > > [...]
> > > > > > (ipmiconsole_checks.c, ipmiconsole_check_outbound_sequence_number, 186): \
> > > > > > hostname=e3bIPMI; protocol_state=9h: session sequence number check failed; \
> > > > > > p = 17; session_sequence_number = 396; highest_received_sequence_number = 384
> > > > > > [...]
> > > > > > (ipmiconsole_checks.c, ipmiconsole_check_outbound_sequence_number, 186): \
> > > > > > hostname=e3bIPMI; protocol_state=9h: session sequence number check failed; \
> > > > > > p = 17; session_sequence_number = 398; highest_received_sequence_number = 384
> > > > > > [...]
> > > > > > (ipmiconsole_checks.c, ipmiconsole_check_outbound_sequence_number, 186): \
> > > > > > hostname=e3bIPMI; protocol_state=9h: session sequence number check failed; \
> > > > > > p = 17; session_sequence_number = 429; highest_received_sequence_number = 384
> > > > > > (ipmiconsole_processing.c, _process_ctx, 4077): hostname=e3bIPMI; \
> > > > > > protocol_state=9h: closing with excessive errors
> > > > > 
> > > > > Looks like the BMC is stuck at # 186, EH? So I tried ...
> > > > > 
> > > > > ipmiconsole -h e3bIPMI -u admin -p admin  -W solpacketseq --debug
> > > > > 
> > > > > ... and ...
> > > > > 
> > > > > ipmiconsole -h e3bIPMI -u admin -p admin  -W solpacketseq
> > > > > 
> > > > > ... neither of which helped. Suggestions would be most welcome.
> > > > > 
> > > > > Thanks in advance - George
> > > > > 
> > > > > VERSIONS:
> > > > > 
> > > > > ipmiconsole --version
> > > > > ipmiconsole - 1.4.5
> > > > > 
> > > > > ASROCK
> > > > > BIOS 3.20        7/17/2015
> > > > > BMC  04.04.00    9/3/2014
> > > > > 
> > > > > _______________________________________________
> > > > > Freeipmi-users mailing list
> > > > > Freeipmi-users@gnu.org
> > > > > https://lists.gnu.org/mailman/listinfo/freeipmi-users

-- 
Albert Chu
chu11@llnl.gov
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory




