
Monday, August 1, 2016

Decoding a Machine Check Exception Register

Dump
CPU 0: Machine Check Exception: 0000000000000004
CPU 1: Machine Check Exception: 0000000000000005
      Bank 5: b200000802000e0f


Kernel code snippet:

/* Read the 64-bit machine check status register of bank i as two 32-bit halves. */
rdmsr(MSR_IA32_MC0_STATUS + i*4, low, high);
if (high & (1<<31)) {            /* bit 63: VAL, register contents are valid */
    if (high & (1<<29))          /* bit 61: uncorrected error */
        recover |= 1;
    if (high & (1<<25))          /* bit 57: processor context corrupt */
        recover |= 2;
    printk(KERN_EMERG "Bank %d: %08x%08x", i, high, low);
}
if (recover & 2)
    panic("CPU context corrupt");
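
The kernel tests the upper 32-bit half of the register, so bit n of `high` is bit n+32 of the full 64-bit value. A minimal user-space sketch of the same checks, assuming the status has already been captured as one 64-bit number (the function name `mce_recover_flags` is made up for illustration and is not a kernel API):

```c
#include <stdint.h>

/* Hypothetical helper: repeats the kernel's tests on the high word of an
 * IA32_MCi_STATUS value. Returns -1 when the VAL flag is clear. */
static int mce_recover_flags(uint64_t status)
{
    uint32_t high = (uint32_t)(status >> 32);   /* bits 63:32 */
    int recover = 0;

    if (!(high & (1u << 31)))   /* bit 31 of high == bit 63: VAL */
        return -1;
    if (high & (1u << 29))      /* bit 29 of high == bit 61: uncorrected */
        recover |= 1;
    if (high & (1u << 25))      /* bit 25 of high == bit 57: context corrupt */
        recover |= 2;
    return recover;
}
```

For the Bank 5 value from the dump, `mce_recover_flags(0xb200000802000e0fULL)` sets both recover bits, which is the condition under which the kernel snippet panics.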

Decoding:

Reference document: the register layout is described in the Intel® 64 and IA-32 Architectures Software Developer's Manuals, hosted on Intel's website. Download the manual named “Intel 64 and IA-32 Architectures Software Developer’s Manual Combined Volumes 3A, 3B, and 3C: System Programming Guide”.

(Reference: https://vmxp.wordpress.com/2014/10/27/debugging-machine-check-errors-mces/comment-page-1/)

IA32_MCi_STATUS MSRs
Each IA32_MCi_STATUS MSR contains information related to a machine-check error if its VAL (valid) flag is set.

Bank 5: b200000802000e0f  (hex)

The same value in binary, split by bit position:

Bits     Value
63       1
62       0
61       1
60       1
59       0
58       0
57       1
56       0
55       0
54-53    00
52-38    000000000000000
37       0
36-32    01000
31-16    0000 0010 0000 0000
15-0     0000 1110 0000 1111
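
Splitting the hex value by hand is error-prone, so a small helper can pull out any bit range. This is only a sketch; the name `mci_bits` is hypothetical and the helper is not taken from the kernel:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: return bits hi..lo (inclusive) of a 64-bit value,
 * shifted down so the lowest requested bit lands at position 0. */
static uint64_t mci_bits(uint64_t status, unsigned hi, unsigned lo)
{
    assert(hi <= 63 && lo <= hi);
    unsigned width = hi - lo + 1;
    /* A shift by 64 is undefined in C, so handle the full-width case. */
    uint64_t mask = (width == 64) ? ~0ULL : ((1ULL << width) - 1);
    return (status >> lo) & mask;
}
```

With the Bank 5 value, `mci_bits(status, 36, 32)` gives 0x08 (binary 01000) and `mci_bits(status, 15, 0)` gives 0x0e0f.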


Flags set in this dump:

63 - VAL: MCi_STATUS register valid
61 - Uncorrected error
60 - Error reporting enabled
57 - Processor context corrupt
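
The four flags listed above can be tested with single-bit masks. The sketch below is illustrative only: the macro and struct names are chosen here to match the flag descriptions and are not copied from any header.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative masks for the flag positions listed in the text. */
#define MCI_FLAG_VAL (1ULL << 63)   /* register contents valid */
#define MCI_FLAG_UC  (1ULL << 61)   /* uncorrected error */
#define MCI_FLAG_EN  (1ULL << 60)   /* error reporting enabled */
#define MCI_FLAG_PCC (1ULL << 57)   /* processor context corrupt */

/* Hypothetical container for the decoded flags. */
struct mci_flags {
    bool val, uc, en, pcc;
};

static struct mci_flags mci_decode_flags(uint64_t status)
{
    struct mci_flags f;
    f.val = (status & MCI_FLAG_VAL) != 0;
    f.uc  = (status & MCI_FLAG_UC)  != 0;
    f.en  = (status & MCI_FLAG_EN)  != 0;
    f.pcc = (status & MCI_FLAG_PCC) != 0;
    return f;
}
```

Calling `mci_decode_flags(0xb200000802000e0fULL)` yields all four fields true, matching the leading byte 0xb2 (binary 1011 0010).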

Bits 31:16 - Model-specific error code field
Bits 27:25 - Bus queue error type

000 for BQ_ERR_HARD_TYPE error
001 for BQ_ERR_DOUBLE_TYPE error   (our case: a double-bit error detected on a data read)
010 for BQ_ERR_AERR2_TYPE error
100 for BQ_ERR_SINGLE_TYPE error
101 for BQ_ERR_AERR1_TYPE error
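
The lookup from bits 27:25 to the names above can be sketched as a small function. The function name `bq_err_type_name` is hypothetical, and only the five encodings listed above are handled; any other value returns NULL rather than a guess.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: map bits 27:25 of the status value to the bus
 * queue error type names listed in the text. */
static const char *bq_err_type_name(uint64_t status)
{
    switch ((status >> 25) & 0x7) {
    case 0: return "BQ_ERR_HARD_TYPE";
    case 1: return "BQ_ERR_DOUBLE_TYPE";
    case 2: return "BQ_ERR_AERR2_TYPE";
    case 4: return "BQ_ERR_SINGLE_TYPE";
    case 5: return "BQ_ERR_AERR1_TYPE";
    default: return NULL;   /* encoding not listed in the text */
    }
}
```

For the Bank 5 value, bits 27:25 are 001, so the function returns "BQ_ERR_DOUBLE_TYPE".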

Bits 15:0 - The machine-check architecture-defined error code for the error condition detected. See "IA32_MCi_STATUS [15:0] Compound Error Code Encoding" in the manual. In this dump the value is 0000 1110 0000 1111 (0x0e0f).
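
As a rough sketch of how the low 16 bits could be split further: the value 0x0e0f has the shape 0000 1PPT RRRR IILL, which is the pattern the manual's compound error code table gives for bus and interconnect errors. That identification is an assumption of this example and should be checked against the manual; the struct and function names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the sub-fields of a code shaped like
 * 0000 1PPT RRRR IILL (assumed here to be the bus/interconnect form). */
struct bus_err {
    unsigned pp;    /* bits 10:9, participation */
    unsigned t;     /* bit 8, timeout */
    unsigned rrrr;  /* bits 7:4, request type */
    unsigned ii;    /* bits 3:2, memory or I/O */
    unsigned ll;    /* bits 1:0, level */
};

/* Returns false if bits 15:11 are not 0000 1, i.e. the code does not
 * have the assumed shape; *out is left untouched in that case. */
static bool decode_bus_error(uint64_t status, struct bus_err *out)
{
    uint16_t code = (uint16_t)(status & 0xffff);

    if ((code & 0xf800) != 0x0800)
        return false;
    out->pp   = (code >> 9) & 0x3;
    out->t    = (code >> 8) & 0x1;
    out->rrrr = (code >> 4) & 0xf;
    out->ii   = (code >> 2) & 0x3;
    out->ll   = code & 0x3;
    return true;
}
```

For the Bank 5 value this gives PP=11, T=0, RRRR=0000, II=11 and LL=11; the manual's table explains what each sub-field value stands for.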

