List: ceph-users
Subject: [ceph-users] Network issues with a CephFS client mount via a Cloudstack instance
From: Jeremy Hansen <jeremy () skidrow ! la>
Date: 2021-08-31 0:49:35
Message-ID: 37c461a6-5d62-4135-abd0-13ab4cc7bb34 () Canary
I'm going to post this to the CloudStack list as well.
When attempting to rsync a large file to the Ceph volume, the instance becomes unresponsive at the network level. It eventually comes back, but it continually drops offline as the file copies. Dmesg shows this on the CloudStack host machine:
[ 7144.888744] e1000e 0000:00:19.0 eno1: Detected Hardware Unit Hang:
  TDH                  <80>
  TDT                  <d0>
  next_to_use          <d0>
  next_to_clean        <7f>
  buffer_info[next_to_clean]:
  time_stamp           <100686d46>
  next_to_watch        <80>
  jiffies              <100687140>
  next_to_watch.status <0>
  MAC Status           <80083>
  PHY Status           <796d>
  PHY 1000BASE-T Status <3800>
  PHY Extended Status  <3000>
  PCI Status           <10>
[ 7146.872563] e1000e 0000:00:19.0 eno1: Detected Hardware Unit Hang:
  TDH                  <80>
  TDT                  <d0>
  next_to_use          <d0>
  next_to_clean        <7f>
  buffer_info[next_to_clean]:
  time_stamp           <100686d46>
  next_to_watch        <80>
  jiffies              <100687900>
  next_to_watch.status <0>
  MAC Status           <80083>
  PHY Status           <796d>
  PHY 1000BASE-T Status <3800>
  PHY Extended Status  <3000>
  PCI Status           <10>
[ 7148.856703] e1000e 0000:00:19.0 eno1: Detected Hardware Unit Hang:
  TDH                  <80>
  TDT                  <d0>
  next_to_use          <d0>
  next_to_clean        <7f>
  buffer_info[next_to_clean]:
  time_stamp           <100686d46>
  next_to_watch        <80>
  jiffies              <1006880c0>
  next_to_watch.status <0>
  MAC Status           <80083>
  PHY Status           <796d>
  PHY 1000BASE-T Status <3800>
  PHY Extended Status  <3000>
  PCI Status           <10>
[ 7150.199756] e1000e 0000:00:19.0 eno1: Reset adapter unexpectedly
The host machine:
System Information
Manufacturer: Dell Inc.
Product Name: OptiPlex 990
Running CentOS 8.4.
I also see the same error on another host of a different hardware type:
Manufacturer: Hewlett-Packard
Product Name: HP Compaq 8200 Elite SFF PC
Both hosts use the e1000e driver.
I upgraded the kernel to 5.13.x and thought that fixed the issue, but now I'm seeing the error again.
After migrating the instance to a bigger server-class machine (also e1000e, an old Rackable system) where I have a bigger pipe via bonding, I don't seem to have the issue.
Just curious whether this is a known bug with e1000e, and whether there is any kind of workaround.
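One mitigation that is commonly reported for e1000e "Detected Hardware Unit Hang" messages is disabling segmentation offloads on the NIC with ethtool. This is a sketch only, not a confirmed fix for this setup; the interface name `eno1` is taken from the dmesg output above, and these settings do not persist across reboots:

```shell
# Turn off TCP segmentation offload and generic segmentation/receive
# offload on eno1 (interface name taken from the dmesg log above).
# Frequently suggested as a workaround for e1000e unit hangs under
# sustained transmit load; effectiveness here is an assumption.
ethtool -K eno1 tso off gso off gro off

# Confirm the new offload state.
ethtool -k eno1 | grep -E 'tcp-segmentation|generic-(segmentation|receive)'
```

To make the change persistent, the same `ethtool -K` command would need to be applied at boot (for example via a NetworkManager dispatcher script or the connection's ethtool settings on CentOS 8).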
Thanks
-jeremy
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-leave@ceph.io