History log of /drivers/edac/mce_amd.c
Revision Date Author Comments
eba4bfb34d45a2219d1d7534905c026eea6fcd49 14-Jul-2014 Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com> EDAC, MCE, AMD: Add MCE decoding for F15h M60h

Add decoding logic for new Fam15h model 60h.

Tested using the mce_amd_inj module; works fine.

Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
Link: http://lkml.kernel.org/r/1405098795-4678-1-git-send-email-Aravind.Gopalakrishnan@amd.com
[ Boris: simplify a bit. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
c5c0903b2cda930c76d296419d290137294779f2 08-May-2014 Borislav Petkov <bp@suse.de> EDAC, MCE, AMD: Remove leftover unused mask

295d8cda2689 ("EDAC, MCE, AMD: Drop local coreid reporting") removed the
code snippet which used that mask but forgot to drop the mask itself. Do
that now.

Signed-off-by: Borislav Petkov <bp@suse.de>
fd0f5ffff8a21fd9a32688b850c5bd694e76cc27 17-Feb-2014 Borislav Petkov <bp@suse.de> MCE, AMD: Fix decoding module loading on unsupported hw

We want to still be able to issue some error information on systems for
which there is no decoding support (think older distro kernels here,
for example). Therefore, we allow module registration but skip the
per-family bank-specific decoders and issue the general information
only, i.e.:

[ 46.822828] [Hardware Error]: Error Status: Uncorrected, software containable error.
[ 46.822846] [Hardware Error]: CPU:0 (15:30:0) MC0_STATUS[-|UE|-|-|-|-|-]: 0xa000000000010f0f
[ 46.822858] [Hardware Error]: cache level: L3/GEN, mem/io: GEN, mem-tx: GEN, part-proc: GEN (timed out)

in the hope that it still contains useful bits.

Suggested-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
Tested-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
Link: http://lkml.kernel.org/r/1392659391-2411-1-git-send-email-Aravind.Gopalakrishnan@amd.com
Signed-off-by: Borislav Petkov <bp@suse.de>
aad19e51769d761ffc0608b381313e18f0bd82b3 05-Jun-2013 Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com> EDAC, MCE, AMD: Add an MCE signature for new Fam15h models

Add a new error signature for Family 15h, models 30h-3fh. Patch has been
tested on Fam15h using mce_amd_inj facility and has been verified to
work correctly.

Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
[ cleanup commit message and error string ]
Signed-off-by: Borislav Petkov <bp@suse.de>
0f08669e869e7732846088d67acd2e339c2aa2fb 23-Dec-2012 Borislav Petkov <bp@alien8.de> EDAC, MCE, AMD: Remove unneeded exports

Initially, those strings describing different parts of an MCE message
were shared with amd64_edac and were therefore exported to modules.
However, all except pp_msgs are now used in only one place, so hide
them by making them static.

No functionality change.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Borislav Petkov <bp@alien8.de>
980eec8b20a9093f862a28f0f4bf67e55a9497be 18-Dec-2012 Jacob Shin <jacob.shin@amd.com> EDAC, MCE, AMD: Add MCE decoding support for Family 16h

Add MCE decoding logic for AMD Family 16h processors.

Boris:

- drop unneeded uu_msgs export
- exit early in cat_mc1_mce and save us an indentation level

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Signed-off-by: Borislav Petkov <bp@alien8.de>
4a73d3de63d4c4498e3dbf8614604c6b1dcc1fc2 18-Dec-2012 Jacob Shin <jacob.shin@amd.com> EDAC, MCE, AMD: Make MC2 decoding per-family

Currently only AMD Family 15h processors have special handling for MC2
errors. Since upcoming Family 16h will also need unique handling, let's
make MC2 handling part of amd_decoder_ops.

Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Signed-off-by: Borislav Petkov <bp@alien8.de>
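The idea of moving MC2 handling into a per-family ops table can be sketched in plain C as below. This is a hypothetical, simplified model rather than the driver's code: the struct name `decoder_ops`, the helpers `ops_for_family()` and `decode_mc2()`, and the placeholder labels returned by each decoder are all assumptions. A real MC2 decoder would interpret the error code (ec) and extended error code (xec) against the family's BKDG tables.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-family ops table; only the MC2 hook is modelled. */
struct decoder_ops {
	const char *(*mc2_mce)(uint16_t ec, uint8_t xec);
};

/* Placeholder decoders: real ones would decode ec/xec per the BKDG. */
static const char *k8_mc2_mce(uint16_t ec, uint8_t xec)
{
	(void)ec; (void)xec;
	return "K8/F10h-style MC2";
}

static const char *f15h_mc2_mce(uint16_t ec, uint8_t xec)
{
	(void)ec; (void)xec;
	return "F15h MC2";
}

static const char *f16h_mc2_mce(uint16_t ec, uint8_t xec)
{
	(void)ec; (void)xec;
	return "F16h MC2";
}

static const struct decoder_ops k8_ops   = { .mc2_mce = k8_mc2_mce };
static const struct decoder_ops f15h_ops = { .mc2_mce = f15h_mc2_mce };
static const struct decoder_ops f16h_ops = { .mc2_mce = f16h_mc2_mce };

/* Pick the ops once (e.g. at module init) based on the CPU family. */
static const struct decoder_ops *ops_for_family(unsigned int family)
{
	switch (family) {
	case 0x15: return &f15h_ops;
	case 0x16: return &f16h_ops;
	default:   return &k8_ops;
	}
}

/* Generic MC2 path: no family checks here, just call through the table. */
static const char *decode_mc2(const struct decoder_ops *ops, uint16_t ec, uint8_t xec)
{
	assert(ops != NULL && ops->mc2_mce != NULL);
	return ops->mc2_mce(ec, xec);
}
```

With the table chosen once up front, supporting a new family means adding one entry to the switch instead of another family check inside the shared decode path.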
d5c6770d4cb27bc33aa433cf8fb848ad9af6644b 14-Sep-2012 Borislav Petkov <borislav.petkov@amd.com> MCE, AMD: Dump error status

After decoding the error, dump the error status, which describes the
error's disposition.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
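A minimal sketch of how an error disposition can be derived from the status registers. Assumptions: the bit positions (UC = 61, PCC = 57 and the AMD deferred bit = 44 in MCi_STATUS; RIPV = 0 in MCG_STATUS), the helper name `error_status()`, and every category string except "Uncorrected, software containable error.", which appears in the sample output of the fd0f5ffff8a2 entry. The exact decision order in the driver may differ.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed bit positions in MCi_STATUS / MCG_STATUS. */
#define MCI_STATUS_UC       (1ULL << 61)  /* uncorrected error */
#define MCI_STATUS_PCC      (1ULL << 57)  /* processor context corrupt */
#define MCI_STATUS_DEFERRED (1ULL << 44)  /* AMD deferred error */
#define MCG_STATUS_RIPV     (1ULL << 0)   /* restart IP valid */

/* Map MCi_STATUS and MCG_STATUS to a human-readable error disposition. */
static const char *error_status(uint64_t status, uint64_t mcgstatus)
{
	if (status & MCI_STATUS_UC) {
		if (status & MCI_STATUS_PCC)
			return "System fatal error.";
		if (mcgstatus & MCG_STATUS_RIPV)
			return "Uncorrected, software restartable error.";
		return "Uncorrected, software containable error.";
	}
	if (status & MCI_STATUS_DEFERRED)
		return "Deferred error.";
	return "Corrected error, no action required.";
}
```

Fed the MC0_STATUS value 0xa000000000010f0f from that sample (UC set, PCC clear) with RIPV clear, this returns the same "software containable" disposition.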
d824c7718b78b6a5afae7fc78731b70318cd076f 14-Sep-2012 Borislav Petkov <borislav.petkov@amd.com> MCE, AMD: Report decoded error type first

Instead of starting with the error details, report the decoded, readable
error type first.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
f89f8388cd11faa8e77992cb11ab44ac9a6abf4f 13-Sep-2012 Borislav Petkov <borislav.petkov@amd.com> MCE, AMD: Dump CPU f/m/s triple with the error

It is very useful to have the family/model/stepping with the reported
error so dump it. This saves us asking the bug reporter about it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
f05c41a9c6057a0d5851ebc9589e3834fde1a4b6 11-Sep-2012 Borislav Petkov <borislav.petkov@amd.com> MCE, AMD: Remove functional unit references

Having the functional unit names in each bank decode is only misleading,
as this code supports multiple families and there's no guarantee that the
mapping between FUs and MCE banks will stay the same.

Also, knowing the functional unit name doesn't help much, since you
end up looking at the respective BKDG anyway.

So drop all FU references and use the MC bank numbers instead.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
ec3e82d6dc46cac7309b01ff9761f469b0263019 04-Apr-2012 Borislav Petkov <borislav.petkov@amd.com> MCE, AMD: Drop too granulary family model checks

MCA details seldom change between the models of a family, so don't
be too conservative and enable decoding on everything from K8
onwards. Minor adjustments can come later but, most importantly,
we have some decoding infrastructure in place for upcoming models by
default.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
ebe2aea86872622d4352cd71d55298fedf69a7bb 29-Nov-2011 Borislav Petkov <borislav.petkov@amd.com> MCE, AMD: Constify error tables

... so that checkpatch can chill out.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Reviewed-by: Andreas Herrmann <andreas.herrmann3@amd.com>
ae615b4b5f0b875cbe8a029239436c6aed8c0ef4 25-Nov-2011 Borislav Petkov <borislav.petkov@amd.com> MCE, AMD: Correct bank 5 error signatures

... and remove superfluous ErrorCodeExt check.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Reviewed-by: Andreas Herrmann <andreas.herrmann3@amd.com>
68782673e6dd69054a9b75b0983a5e45e16f6625 24-Nov-2011 Borislav Petkov <borislav.petkov@amd.com> MCE, AMD: Rework NB MCE signatures

Correct their formulation, replace per-family functions with a single,
unified lookup table.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Reviewed-by: Andreas Herrmann <andreas.herrmann3@amd.com>
b64a99c1752d2b6525a5011a8e473f8f8a4bdd79 23-Nov-2011 Borislav Petkov <borislav.petkov@amd.com> MCE, AMD: Correct VB data error description

Sync with latest BKDG error types.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Reviewed-by: Andreas Herrmann <andreas.herrmann3@amd.com>
6c1173a61e63c32bd40cb1e6dd16343240a328eb 21-Nov-2011 Borislav Petkov <borislav.petkov@amd.com> MCE, AMD: Correct ucode patch buffer description

This MC1 error signature has been renamed; update it accordingly.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Reviewed-by: Andreas Herrmann <andreas.herrmann3@amd.com>
344f0a0631e1b2784859fbe2351d99dce2652b77 15-Nov-2011 Borislav Petkov <borislav.petkov@amd.com> MCE, AMD: Correct some MC0 error types

Use "System Read Data Error" as a more general name for MC0 bus errors
on F15h and update some error definitions.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Reviewed-by: Andreas Herrmann <andreas.herrmann3@amd.com>
3653ada5d3e173489b3a466305687cb5c44b2ab1 04-Dec-2011 Borislav Petkov <borislav.petkov@amd.com> x86, mce: Add wrappers for registering on the decode chain

No functionality change, this is done so that in a follow-on patch all
queued-up MCEs can be decoded after registering on the chain.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
b0b07a2bd4fbb6198d4e7142337214eeb77c417a 24-Aug-2011 Borislav Petkov <borislav.petkov@amd.com> EDAC, MCE, AMD: Simplify NB MCE decoder interface

Drop the third argument, nbcfg, which is a leftover and no longer required.

No functionality change.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
295d8cda2689a74ae88bcece7b4cfe0bf8bf9a91 24-Aug-2011 Borislav Petkov <borislav.petkov@amd.com> EDAC, MCE, AMD: Drop local coreid reporting

The MCE decoding code now reports the core which encountered the error
unconditionally, so drop this piece. Besides, it reported the coreid
within the local processor package, which is not that valuable a
datapoint.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
086be786ca10af7a9783ab06a9b5594c2c6facbf 30-Sep-2011 Borislav Petkov <borislav.petkov@amd.com> EDAC, MCE, AMD: Print valid addr when reporting an error

The MCi_STATUS register has an AddrV bit which, when set, denotes that
the corresponding MCi_ADDR MSR contains a valid address belonging to the
MCE currently being reported. Dump it, since it is definitely relevant
information.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
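The AddrV gating described above can be sketched as follows. The bit position (MCi_STATUS[58]), the helper name `format_addr()` and the "MC%u_ADDR" output format are assumptions for illustration; the point is only that MCi_ADDR is reported when AddrV says it is valid.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* MCi_STATUS[58] (AddrV): MCi_ADDR holds a valid address (assumed position). */
#define MCI_STATUS_ADDRV (1ULL << 58)

/*
 * Format the MCi_ADDR line only when AddrV is set; otherwise leave buf
 * empty. Returns snprintf's result, or 0 when nothing is printed.
 */
static int format_addr(char *buf, size_t len, unsigned int bank,
		       uint64_t status, uint64_t addr)
{
	assert(buf != NULL && len > 0);

	if (!(status & MCI_STATUS_ADDRV)) {
		buf[0] = '\0';
		return 0;
	}
	return snprintf(buf, len, "MC%u_ADDR: 0x%016" PRIx64, bank, addr);
}
```

Checking AddrV first avoids printing whatever stale value happens to sit in MCi_ADDR for errors that carry no address.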
bff7b812465a797bc563e9938fa11316fcd2ac0d 04-Aug-2011 Borislav Petkov <borislav.petkov@amd.com> EDAC, MCE, AMD: Print CPU number when reporting the error

Currently, correctable ECCs go through mcelog and do not print the scary
MCE banner. In that case, however, the core on which the CECC happened is
important information, so dump it along with the decoded string, albeit
at the risk of a minor redundancy.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
df71a053241548b728d3bf45b0c11ed092a20319 19-Jan-2011 Borislav Petkov <borislav.petkov@amd.com> amd64_edac: Enable driver on F15h

Add the PCI device ids required for driver registration. Remove
pvt->ctl_name and use the family descriptor directly instead. Then,
bump the driver version and fix up its format. Finally, enable DRAM ECC
decoding on F15h.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
bcd781f46a5f892ef2ae5843839849aa579fe096 07-Jan-2011 Borislav Petkov <borislav.petkov@amd.com> amd64_edac: Cleanup NBSH cruft

Remove reporting of errors with UC bit set - this is done by the MCE
decoding code anyway and this driver deals with DRAM ECC errors only. UC
(NB uncorrectable error) doesn't necessarily mean it is a DRAM error.
Remove unused macros while at it.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
6d5db4668796d903dc3bad2852c82073509c37d2 25-Nov-2010 Borislav Petkov <borislav.petkov@amd.com> EDAC, MCE: Fix NB error formatting

Minor formatting fixup since the information which core was associated
with the MCE is not always valid.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
50adbbd8a8e572ad2533eace228c841ec84028a3 13-Nov-2010 Randy Dunlap <randy.dunlap@oracle.com> EDAC, MCE: Use BIT_64() to eliminate warnings on 32-bit

Building for X86_32 produces shift count warnings, so use BIT_64() to
eliminate the warnings.

drivers/edac/mce_amd.c:778: warning: left shift count >= width of type
drivers/edac/mce_amd.c:778: warning: left shift count >= width of type

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: bluesmoke-devel@lists.sourceforge.net
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
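Why a 64-bit bit macro silences the warning: on X86_32, unsigned long is 32 bits wide, so an expression like (1UL << 40) shifts past the width of its type, which is undefined behavior in C and exactly what "left shift count >= width of type" flags. The kernel provides its own BIT_64(); the macro below is a local stand-in with the same intent (its definition here is an assumption, not copied from the kernel), and `status_bit()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for BIT_64(): widen the operand to 64 bits before shifting. */
#define BIT_64(n) ((uint64_t)1 << (n))

/* Test bit n (0..63) of a 64-bit MCi_STATUS value, safe on 32-bit builds. */
static int status_bit(uint64_t status, unsigned int n)
{
	assert(n < 64);
	return (status & BIT_64(n)) != 0;
}
```

Because the operand is already 64 bits wide, the same expression produces the intended mask on both 32-bit and 64-bit builds.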
bad11e031862294265145d87dd4be1ae4af0d57f 22-Sep-2010 Borislav Petkov <borislav.petkov@amd.com> EDAC, MCE: Enable MCE decoding on F15h

Now that everything is in place, enable MCE decoding on F15h. Make the
initcall routine a bit more readable.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
fa7ae8cc8c88c0679eab24c5a1b5d3b134a5f542 22-Sep-2010 Borislav Petkov <borislav.petkov@amd.com> EDAC, MCE: Shorten error report formatting

Shorten up MCi_STATUS flags and add BD's new deferred and poison types.
Also, simplify formatting.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
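The shortened flag string (as seen in the "MC0_STATUS[-|UE|-|-|-|-|-]" sample of the fd0f5ffff8a2 entry) can be sketched as below. Assumptions: the helper name `format_status_flags()`, the field order Over|UE/CE|MiscV|PCC|AddrV followed by Deferred|Poison on F15h, and the bit positions (standard MCA bits 62..57; AMD deferred = 44, poison = 43). Only the bracketed part is produced, not the "CPU:..." prefix.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumed MCi_STATUS bit positions. */
#define MCI_STATUS_OVER     (1ULL << 62)
#define MCI_STATUS_UC       (1ULL << 61)
#define MCI_STATUS_MISCV    (1ULL << 59)
#define MCI_STATUS_ADDRV    (1ULL << 58)
#define MCI_STATUS_PCC      (1ULL << 57)
#define MCI_STATUS_DEFERRED (1ULL << 44)
#define MCI_STATUS_POISON   (1ULL << 43)

/* Render MCi_STATUS flags as "[a|b|...]", "-" for each clear flag. */
static void format_status_flags(char *buf, size_t len, uint64_t s, unsigned int family)
{
	int n;

	assert(buf != NULL && len > 0);

	n = snprintf(buf, len, "[%s|%s|%s|%s|%s",
		     (s & MCI_STATUS_OVER)  ? "Over"  : "-",
		     (s & MCI_STATUS_UC)    ? "UE"    : "CE",
		     (s & MCI_STATUS_MISCV) ? "MiscV" : "-",
		     (s & MCI_STATUS_PCC)   ? "PCC"   : "-",
		     (s & MCI_STATUS_ADDRV) ? "AddrV" : "-");

	/* BD (F15h) adds the deferred and poison indicators. */
	if (family == 0x15 && n > 0 && (size_t)n < len)
		n += snprintf(buf + n, len - n, "|%s|%s",
			      (s & MCI_STATUS_DEFERRED) ? "Deferred" : "-",
			      (s & MCI_STATUS_POISON)   ? "Poison"   : "-");

	/* Close the bracket only if nothing was truncated so far. */
	if (n > 0 && (size_t)n < len)
		snprintf(buf + n, len - n, "]");
}
```

For the sample status 0xa000000000010f0f on family 0x15, this yields "[-|UE|-|-|-|-|-]", matching the logged line.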
6245288232516aadf293f575d1812dafb4696aee 22-Sep-2010 Borislav Petkov <borislav.petkov@amd.com> EDAC, MCE: Overhaul error fields extraction macros

Make macro names shorter, thus making the code shorter and clearer.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
b8f85c477bdf1fec98ea7cbe952fdb5f40eb0aa7 22-Sep-2010 Borislav Petkov <borislav.petkov@amd.com> EDAC, MCE: Add F15h FP MCE decoder

Add decoder for FP MCEs.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
8259a7e5724c42c89d927b92cda3e0ab15b9ade9 22-Sep-2010 Borislav Petkov <borislav.petkov@amd.com> EDAC, MCE: Add F15 EX MCE decoder

Integrate the single FIROB signature into an expanded table along with
the new BD MCE types.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
05cd667d668eb08845dd49c02130e5223121b715 22-Sep-2010 Borislav Petkov <borislav.petkov@amd.com> EDAC, MCE: Add an F15h NB MCE decoder

by (almost) reusing the F10h one since the signatures are the same.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
b18434cad1740466f7a1c304ea4af0f4d3c874f1 22-Sep-2010 Borislav Petkov <borislav.petkov@amd.com> EDAC, MCE: No F15h LS MCE decoder

F15h BD doesn't generate LS MCEs, so warn if one is ever reported.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
70fdb494aa8c82f76745d5a32b8abc505813557c 21-Sep-2010 Borislav Petkov <borislav.petkov@amd.com> EDAC, MCE: Add F15h CU MCE decoder

MCE bank 2 is redefined from a BU to a CU (Combined Unit) bank on F15h.
Add a decoder function for CU MCEs.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
86039cd401e1780573733870f9c0bd458fc96ea2 08-Nov-2010 Borislav Petkov <borislav.petkov@amd.com> EDAC, MCE: Add F15h IC MCE decoder

Add support for decoding F15h IC MCEs.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
25a4f8b05917f8137bfff8a3f8c6c8c1ac561208 17-Sep-2010 Borislav Petkov <borislav.petkov@amd.com> EDAC, MCE: Add F15h DC MCE decoder

Add a decoder for F15h DC MCEs to support the new types of DC MCEs
introduced by the BD microarchitecture.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
2be64bfac71378e1aa8c20031a499bd55e391244 17-Sep-2010 Borislav Petkov <borislav.petkov@amd.com> EDAC, MCE: Select extended error code mask

F15h enlarges the extended error code of an MCE to a 5-bit field
(MCi_STATUS[20:16]). Add a mask variable whose default of 0xf is
overridden on F15h.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
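A minimal sketch of the family-dependent extended error code mask described above. The helper names `mce_ec()`, `xec_mask_for_family()` and `mce_xec()` are hypothetical; the field positions (error code in MCi_STATUS[15:0], extended error code starting at bit 16, 4 bits by default and 5 bits on F15h) follow the commit text.

```c
#include <assert.h>
#include <stdint.h>

/* MCi_STATUS[15:0]: error code. */
static uint16_t mce_ec(uint64_t status)
{
	return (uint16_t)(status & 0xffff);
}

/* Default 4-bit extended error code mask, widened to 5 bits on F15h. */
static uint8_t xec_mask_for_family(unsigned int family)
{
	return family == 0x15 ? 0x1f : 0xf;
}

/* Extended error code: MCi_STATUS[19:16], or [20:16] with the F15h mask. */
static uint8_t mce_xec(uint64_t status, uint8_t mask)
{
	assert(mask == 0xf || mask == 0x1f);
	return (uint8_t)((status >> 16) & mask);
}
```

Selecting the mask once per family keeps the extraction itself identical for every family; only the width changes.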
525906bc898d712f21e5bfcfc85ab0e517e3d086 15-Oct-2010 Borislav Petkov <borislav.petkov@amd.com> EDAC, MCE: Fix shift warning on 32-bit

Fix

drivers/edac/mce_amd.c:262: warning: left shift count >= width of type

on 32-bit builds.

Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
fda7561f438aeddf074e2db0890e283195aa7779 22-Sep-2010 Borislav Petkov <borislav.petkov@amd.com> EDAC, MCE: Enable MCE decoding on F12h

Turn on MCE decoding on F12h.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
cb9d5ecdff66197f65a6be8032ccc1ebf7199684 16-Sep-2010 Borislav Petkov <borislav.petkov@amd.com> EDAC, MCE: Add F12h NB MCE decoder

F12h is completely covered by the generic path.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
e7281eb37da045abac5bd795d1169fc2e3eeea49 16-Sep-2010 Borislav Petkov <borislav.petkov@amd.com> EDAC, MCE: Add F12h IC MCE decoder

... which is the same as for K8 and F10h.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
9be0bb1072e3544934e0ac20f184e50805aecf9c 16-Sep-2010 Borislav Petkov <borislav.petkov@amd.com> EDAC, MCE: Add F12h DC MCE decoder

F12h DC MCE signatures are a subset of F10h's so reuse them.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
f0157b3afd2ec6331245768a785487249a3c9734 05-Oct-2010 Borislav Petkov <borislav.petkov@amd.com> EDAC, MCE: Add support for F11h MCEs

F11h has almost the same MCE signatures as K8 except DRAM ECC and MC5
bank errors. Reuse functionality from the other families.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
9530d608ef0e1f76b7fd82bb92645062292fc009 06-Sep-2010 Borislav Petkov <borislav.petkov@amd.com> EDAC, MCE: Enable MCE decoding on F14h

Now that all decoders have been taught about MCEs on F14h models < 0x10,
enable decoding on this family of CPUs. Also, issue a short
informational message at boot that MCE decoding is enabled.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
fe4ea2623bec3e595f8e77a8514307c389c096ae 31-Aug-2010 Borislav Petkov <borislav.petkov@amd.com> EDAC, MCE: Fix FR MCEs decoding

Those are N/A on K8, so don't decode them there.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
5ce88f6ea6bef929f59f9468413f922c9a486fa4 31-Aug-2010 Borislav Petkov <borislav.petkov@amd.com> EDAC, MCE: Complete NB MCE decoders

Add support for decoding F14h BU MCEs and improve decoding of the
remaining families.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
ded506232865e8e932bc21c87f48170d50db4d97 27-Aug-2010 Borislav Petkov <borislav.petkov@amd.com> EDAC, MCE: Warn about LS MCEs on F14h

F14h CPUs do not generate LS MCEs, so exit early and, should this path
ever be hit, warn the user that something else might be going haywire.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
dd53bce4e8987f6848840d42bbeead5221eff308 26-Aug-2010 Borislav Petkov <borislav.petkov@amd.com> EDAC, MCE: Adjust IC decoders to F14h

Add support for IC MCEs for F14h CPUs. K8 and F10h are almost identical
so use one function for both.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
888ab8e6eb2e41179cdc8edf5d0abd1cce0f0370 18-Aug-2010 Borislav Petkov <borislav.petkov@amd.com> EDAC, MCE: Adjust DC decoders to F14h

Add per-family data cache decoders. Since there is a certain overlap
between the different DC MCE signatures, reuse functionality between the
families as far as possible.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
47ca08a40b043815134d489e21870b53276f1a4a 27-Sep-2010 Borislav Petkov <borislav.petkov@amd.com> EDAC, MCE: Rename files

Drop "edac_" string from the filenames since they're prefixed with edac/
in their pathname anyway.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>