Dump
CPU 0: Machine Check Exception: 0000000000000004
CPU 1: Machine Check Exception: 0000000000000005
Bank 5: b200000802000e0f
Kernel Code snippet:
rdmsr(MSR_IA32_MC0_STATUS + i*4, low, high);  /* reads the 64-bit machine check status register of bank i as two 32-bit halves */
if (high & (1<<31)) {          /* bit 63 of the register: VAL */
    if (high & (1<<29))        /* bit 61: uncorrected error */
        recover |= 1;
    if (high & (1<<25))        /* bit 57: processor context corrupt */
        recover |= 2;
    printk(KERN_EMERG "Bank %d: %08x%08x", i, high, low);
}
if (recover & 2)
    panic("CPU context corrupt");
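Because the kernel tests the upper 32-bit half, bit 31 of high is bit 63 of the full register, bit 29 is bit 61, and bit 25 is bit 57. The following is a minimal user-space sketch of the same checks, applied to a 64-bit value copied from a "Bank N:" dump line. The function name mc_recover_flags is hypothetical and not part of the kernel.

```c
#include <stdint.h>

/* Hypothetical helper (not a kernel function): repeats the checks from the
 * kernel snippet on a full 64-bit status value.
 * Returns -1 if the VAL bit is clear, otherwise the "recover" bit mask. */
int mc_recover_flags(uint64_t status)
{
    uint32_t high = (uint32_t)(status >> 32);   /* bits 63:32 of the register */
    int recover = 0;

    if (!(high & (1u << 31)))   /* bit 63 (VAL) clear: no valid error logged */
        return -1;
    if (high & (1u << 29))      /* bit 61: uncorrected error */
        recover |= 1;
    if (high & (1u << 25))      /* bit 57: processor context corrupt */
        recover |= 2;
    return recover;
}
```

For the dumped value, mc_recover_flags(0xb200000802000e0fULL) returns 3, so both conditions are set and the kernel would take the panic path.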
Decoding:
Reference document: go to the page on Intel's website that hosts the Intel® 64 and IA-32 Architectures Software Developer Manuals and download the manual named "Intel 64 and IA-32 Architectures Software Developer's Manual Combined Volumes 3A, 3B, and 3C: System Programming Guide".
(Reference: https://vmxp.wordpress.com/2014/10/27/debugging-machine-check-errors-mces/comment-page-1/)
IA32_MCi_STATUS MSRs
Each IA32_MCi_STATUS MSR contains information related to a machine-check error if its VAL (valid) flag is set.
Bank 5: b200000802000e0f (hex format). The same bank value in binary format, split by bit field:

Bit(s)   Value
63       1
62       0
61       1
60       1
59       0
58       0
57       1
56       0
55       0
54-53    00
52-38    000000000000000
37       0
36-32    01000
31-16    0000 0010 0000 0000
15-0     0000 1110 0000 1111
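The split above can be checked mechanically. Here is a small sketch of a bit-range extractor; the helper name mc_bits is made up for illustration and does not come from the kernel or the Intel manual.

```c
#include <stdint.h>

/* Hypothetical helper: return bits hi..lo (inclusive) of a 64-bit value,
 * right-aligned. Assumes 0 <= lo <= hi <= 63. */
uint64_t mc_bits(uint64_t value, unsigned hi, unsigned lo)
{
    unsigned width = hi - lo + 1;
    /* A shift by 64 is undefined in C, so handle the full-width case apart. */
    uint64_t mask = (width == 64) ? ~0ULL : ((1ULL << width) - 1);

    return (value >> lo) & mask;
}
```

With value = 0xb200000802000e0fULL, mc_bits(value, 36, 32) gives 0x08 (binary 01000), mc_bits(value, 31, 16) gives 0x0200, and mc_bits(value, 15, 0) gives 0x0e0f, which agrees with the binary breakdown.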
Flags set in our value:
63 - VAL - MCi_STATUS register valid
61 - Uncorrected error
60 - Error reporting enabled
57 - Processor context corrupt
Model-specific errors
Bits 31:16 - Model-specific error code field
Bits 27:25 - Bus queue error type
000 for BQ_ERR_HARD_TYPE error
001 for BQ_ERR_DOUBLE_TYPE error (our case: a double-bit error detected on a data read)
010 for BQ_ERR_AERR2_TYPE error
100 for BQ_ERR_SINGLE_TYPE error
101 for BQ_ERR_AERR1_TYPE error
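The lookup above can be sketched in a few lines of C. The function name mc_bq_err_type is hypothetical, and because this field is model-specific, the mapping only applies to processors whose manual section lists exactly these encodings.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: map bits 27:25 of the status value to the bus queue
 * error type names listed above. Returns NULL for unlisted encodings. */
const char *mc_bq_err_type(uint64_t status)
{
    switch ((status >> 25) & 0x7) {
    case 0x0: return "BQ_ERR_HARD_TYPE";
    case 0x1: return "BQ_ERR_DOUBLE_TYPE";
    case 0x2: return "BQ_ERR_AERR2_TYPE";
    case 0x4: return "BQ_ERR_SINGLE_TYPE";
    case 0x5: return "BQ_ERR_AERR1_TYPE";
    default:  return NULL;   /* 011, 110, 111 are not in the list above */
    }
}
```

For 0xb200000802000e0f, bits 27:25 are 001, so the function returns "BQ_ERR_DOUBLE_TYPE".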
Bits 15:0 - Specifies the machine-check architecture-defined error code for the machine-check error condition detected
IA32_MCi_STATUS[15:0] Compound Error Code Encoding
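As an illustration, the low 16 bits of our value are 0x0e0f (0000 1110 0000 1111). This fits the bus and interconnect form that the manual's compound error code table gives as 0000 1PPT RRRR IILL. The sketch below assumes that layout and should be checked against the manual edition for your processor. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Fields of a bus/interconnect compound error code, assuming the layout
 * 0000 1PPT RRRR IILL. All names here are made up for illustration. */
struct mc_bus_err {
    unsigned pp;    /* participation: 0 SRC, 1 RES, 2 OBS, 3 generic */
    unsigned t;     /* 1 = request timed out, 0 = no timeout */
    unsigned rrrr;  /* request type; 0 = generic error (ERR) */
    unsigned ii;    /* 0 = memory access, 2 = I/O, 3 = other */
    unsigned ll;    /* cache level: 0..2 = L0..L2, 3 = generic */
};

/* Returns 1 and fills *out if code has the bus/interconnect form,
 * otherwise returns 0 and leaves *out untouched. */
int mc_decode_bus_err(uint16_t code, struct mc_bus_err *out)
{
    if ((code & 0xF800) != 0x0800)   /* bits 15:12 must be 0, bit 11 must be 1 */
        return 0;
    out->pp   = (code >> 9) & 0x3;
    out->t    = (code >> 8) & 0x1;
    out->rrrr = (code >> 4) & 0xF;
    out->ii   = (code >> 2) & 0x3;
    out->ll   = code & 0x3;
    return 1;
}
```

Under that assumption, 0x0e0f decodes to PP = 3, T = 0, RRRR = 0, II = 3, LL = 3: a generic bus/interconnect error with no timeout.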