wrx80e_sage_nvme_disable_aer_severity_corrected.md
source link: https://gist.github.com/zekome/35db528b33206e68f18439ad7fabfcd5
Turn off AER logging of corrected-severity events for NVMe devices
Motherboard: Asus Pro WS WRX80E-SAGE SE WIFI
Card: Asus HYPER M.2 X16 GEN 4 CARD
NVMe: 4x Samsung SSD 980 PRO 1TB
OS: Linux fedora 5.16.12-200.fc35.x86_64
AER (Advanced Error Reporting) logs corrected errors excessively:
```
$ dmesg
nvme 0000:44:00.0: AER: aer_status: 0x00000001, aer_mask: 0x00000000
nvme 0000:44:00.0: [ 0] RxErr (First)
nvme 0000:44:00.0: AER: aer_layer=Physical Layer, aer_agent=Receiver ID
nvme 0000:44:00.0: AER: aer_status: 0x00000001, aer_mask: 0x00000000
nvme 0000:44:00.0: [ 0] RxErr (First)
nvme 0000:44:00.0: AER: aer_layer=Physical Layer, aer_agent=Receiver ID
{2085}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 514
{2085}[Hardware Error]: It has been corrected by h/w and requires no further action
{2085}[Hardware Error]: event severity: corrected
{2085}[Hardware Error]: Error 0, type: corrected
{2085}[Hardware Error]: section_type: PCIe error
{2085}[Hardware Error]: port_type: 0, PCIe end point
{2085}[Hardware Error]: version: 0.2
{2085}[Hardware Error]: command: 0x0406, status: 0x0010
{2085}[Hardware Error]: device_id: 0000:44:00.0
{2085}[Hardware Error]: slot: 0
{2085}[Hardware Error]: secondary_bus: 0x00
{2085}[Hardware Error]: vendor_id: 0x144d, device_id: 0xa80a
{2085}[Hardware Error]: class_code: 010802
{2085}[Hardware Error]: bridge: secondary_status: 0x0000, control: 0x0000
```
Note the device ID in the logs; in this case it is 0000:44:00.0. There are similar logs for the other three NVMe disks on the same card, with device IDs 0000:43:00.0, 0000:42:00.0 and 0000:41:00.0.

Then, for each device ID (for example 0000:44:00.0), clear bit 0 of the PCIe Device Control register (CAP_EXP+0x8), the Correctable Error Reporting Enable bit, if it is set. Read the current value and XOR it with 0x1 to toggle the bit off.
```
$ setpci -v -s 0000:44:00.0 CAP_EXP+0x8.w
0000:44:00.0 (cap 10 @70) @78 = 2937
```
So the bit is set. Toggle it: 0x2937 XOR 0x1 = 0x2936.
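The same arithmetic can be checked in the shell. This sketch uses AND with the inverted mask rather than XOR: the result is identical when bit 0 is set, as it is here, but AND never turns the bit back on if it was already clear. `clear_cere` is a hypothetical helper name.

```shell
#!/bin/sh
# Clear bit 0 (Correctable Error Reporting Enable) of a 16-bit
# Device Control value. $1 may be given as 0x2937 or 2937.
clear_cere() {
    v=$1
    case $v in 0x*|0X*) ;; *) v=0x$v ;; esac
    # AND with ~0x1 clears only bit 0; the 0xffff mask keeps it 16-bit.
    printf '0x%04x\n' $(( $v & ~0x1 & 0xffff ))
}

clear_cere 0x2937   # prints 0x2936
clear_cere 2936     # prints 0x2936 (bit already clear, value unchanged)
```

Bits 1, 2 and 3 of the same register enable non-fatal, fatal and unsupported-request reporting respectively, so masking only bit 0 leaves those untouched.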
```
$ setpci -v -s 0000:44:00.0 CAP_EXP+0x8.w=0x2936
0000:44:00.0 (cap 10 @70) @78 2936
```
Device IDs and register values may differ on other systems.
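To apply this to all four disks in one go, the read, clear and write steps can be looped. This is a minimal sketch, assuming it runs as root and that `setpci` without `-v` prints only the register value (e.g. `2937`). The function `disable_ce_reporting` and the `SETPCI` override (handy for a dry run with a stand-in command) are hypothetical. The setting lives in PCI configuration space, so it does not survive a reboot and would have to be reapplied at each boot.

```shell
#!/bin/sh
# Clear bit 0 (Correctable Error Reporting Enable) of the PCIe Device
# Control register (CAP_EXP+0x8) on every device ID passed as an argument.
# SETPCI may name a substitute command for testing (hypothetical knob).
SETPCI=${SETPCI:-setpci}

disable_ce_reporting() {
    for dev in "$@"; do
        # Read the current 16-bit value, e.g. "2937".
        cur=$("$SETPCI" -s "$dev" CAP_EXP+0x8.w) || return 1
        # Clear bit 0 and format back as four hex digits.
        new=$(printf '%04x' $(( 0x$cur & ~0x1 & 0xffff )))
        if [ "$new" = "$cur" ]; then
            echo "$dev: bit 0 already clear ($cur)"
            continue
        fi
        "$SETPCI" -s "$dev" CAP_EXP+0x8.w="0x$new" || return 1
        echo "$dev: $cur -> $new"
    done
}

# Usage (as root), with the device IDs from this system:
# disable_ce_reporting 0000:41:00.0 0000:42:00.0 0000:43:00.0 0000:44:00.0
```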