Cross Reference: /drivers/acpi/apei/ghes.c

History log of /drivers/acpi/apei/ghes.c
Revision	Date	Author	Comments
594c7255dce7a13cac50cf2470cc56e2c3b0494e	22-Jul-2014	Tomasz Nowicki <tomasz.nowicki@linaro.org>	acpi, apei, ghes: Factor out ioremap virtual memory for IRQ and NMI context. GHES currently maps two pages with atomic_ioremap. From now on, NMI is architectural depended so there is no need to allocate an NMI page for platforms without NMI support. To make it possible to not use a second page, swap the existing page order so that the IRQ context page is first, and the optional NMI context page is second. Then, use HAVE_ACPI_APEI_NMI to decide how many pages are to be allocated. Signed-off-by: Tomasz Nowicki <tomasz.nowicki@linaro.org> Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Tony Luck <tony.luck@intel.com>
44a69f6195628f6f940566d133a72987559e102d	22-Jul-2014	Tomasz Nowicki <tomasz.nowicki@linaro.org>	acpi, apei, ghes: Make NMI error notification to be GHES architecture extension. Currently APEI depends on x86 architecture. It is because of NMI hardware error notification of GHES which is currently supported by x86 only. However, many other APEI features can be still used perfectly by other architectures. This commit adds two symbols: 1. HAVE_ACPI_APEI for those archs which support APEI. 2. HAVE_ACPI_APEI_NMI which is used for NMI code isolation in ghes.c file. NMI related data and functions are grouped so they can be wrapped inside one #ifdef section. Appropriate function stubs are provided for !NMI case. Note there is no functional changes for x86 due to hard selected HAVE_ACPI_APEI and HAVE_ACPI_APEI_NMI symbols. Signed-off-by: Tomasz Nowicki <tomasz.nowicki@linaro.org> Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Tony Luck <tony.luck@intel.com>
9dae3d0d9e64c3cb8bb172f041d4e66d4b92088a	22-Jul-2014	Tomasz Nowicki <tomasz.nowicki@linaro.org>	apei, mce: Factor out APEI architecture specific MCE calls. This commit abstracts MCE calls and provides weak corresponding default implementation for those architectures which do not need arch specific actions. Each platform willing to do additional architectural actions should provides desired function definition. It allows us to avoid wrap code into #ifdef in generic code and prevent new platform from introducing dummy stub function too. Initially, there are two APEI arch-specific calls: - arch_apei_enable_cmcff() - arch_apei_report_mem_error() Both interact with MCE driver for X86 architecture. Signed-off-by: Tomasz Nowicki <tomasz.nowicki@linaro.org> Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Tony Luck <tony.luck@intel.com>
0a00fd5e20fd5dc89e976e163588d7c54edaf745	03-Jun-2014	Lv Zheng <lv.zheng@intel.com>	ACPICA: Restore error table definitions to reduce code differences between Linux and ACPICA upstream. The following commit has changed ACPICA table header definitions: Commit: 88f074f4871a8c212b212b725e4dcdcdb09613c1 Subject: ACPI, CPER: Update cper info While such definitions are currently maintained in ACPICA. As the modifications applying to the table definitions affect other OSPMs' drivers, it is very difficult for ACPICA to initiate a process to complete the merge. Thus this commit finally only leaves us divergences. Revert such naming modifications to reduce the source code differecnes between Linux and ACPICA upstream. No functional changes. Signed-off-by: Lv Zheng <lv.zheng@intel.com> Cc: Bob Moore <robert.moore@intel.com> Cc: Chen, Gong <gong.chen@linux.intel.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Borislav Petkov <bp@alien8.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
ca104edc17841da87850b20ab77e57fe0a99ead6	25-Nov-2013	Chen, Gong <gong.chen@linux.intel.com>	ACPI, APEI, GHES: Cleanup ghes memory error handling Cleanup the logic in ghes_handle_memory_failure(). While at it, add proper PFN validity check for UC error and cleanup the code logic to make it simpler and cleaner. Signed-off-by: Chen, Gong <gong.chen@linux.intel.com> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/1385363701-12387-2-git-send-email-gong.chen@linux.intel.com [ Boris: massage commit message. ] Signed-off-by: Borislav Petkov <bp@suse.de>
addccbb264e5e0e5762f4893f6df24afad327c8c	25-Nov-2013	Chen, Gong <gong.chen@linux.intel.com>	ACPI, APEI, GHES: Do not report only correctable errors with SCI Currently SCI is employed to handle corrected errors - memory corrected errors, more specifically but in fact SCI still can be used to handle any errors, e.g. uncorrected or even fatal ones if enabled by the BIOS. Enable logging for those kinds of errors too. Signed-off-by: Chen, Gong <gong.chen@linux.intel.com> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Tony Luck <tony.luck@intel.com> Link: http://lkml.kernel.org/r/1385363701-12387-1-git-send-email-gong.chen@linux.intel.com [ Boris: massage commit message, rename function arg. ] Signed-off-by: Borislav Petkov <bp@suse.de>
27d50c82714f6477ac690034b37d202f76eb4f70	06-Dec-2013	Lv Zheng <lv.zheng@intel.com>	ACPI / i915: Fix incorrect <acpi/acpi.h> inclusions via <linux/acpi_io.h> To avoid build problems and breaking dependencies between ACPI header files, <acpi/acpi.h> should not be included directly by code outside of the ACPI core subsystem. However, that is possible if <linux/acpi_io.h> is included, because that file contains a direct inclusion of <acpi/acpi.h>. For this reason, remove the direct <acpi/acpi.h> inclusion from <linux/acpi_io.h>, move that file from include/linux/ to include/acpi/ and make <linux/acpi.h> include it for CONFIG_ACPI set along with the other ACPI header files. Accordingly, Remove the inclusions of <linux/acpi_io.h> from everywhere. Of course, that causes the contents of the new <acpi/acpi_io.h> file to be available for CONFIG_ACPI set only, so intel_opregion.o that depends on it should also depend on CONFIG_ACPI (and it really should not be compiled for CONFIG_ACPI unset anyway). References: https://01.org/linuxgraphics/sites/default/files/documentation/acpi_igd_opregion_spec.pdf Cc: Matthew Garrett <mjg59@srcf.ucam.org> Signed-off-by: Lv Zheng <lv.zheng@intel.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> [rjw: Subject and changelog] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
147de14772ed897727dba7353916b02d1e0f17f4	18-Oct-2013	Chen, Gong <gong.chen@linux.intel.com>	ACPI, APEI, CPER: Add UEFI 2.4 support for memory error In latest UEFI spec(by now it is 2.4) memory error definition for CPER (UEFI 2.4 Appendix N Common Platform Error Record) adds some new fields. These fields help people to locate memory error to an actual DIMM location. Original-author: Tony Luck <tony.luck@intel.com> Signed-off-by: Chen, Gong <gong.chen@linux.intel.com> Reviewed-by: Borislav Petkov <bp@suse.de> Reviewed-by: Mauro Carvalho Chehab <m.chehab@samsung.com> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
88f074f4871a8c212b212b725e4dcdcdb09613c1	18-Oct-2013	Chen, Gong <gong.chen@linux.intel.com>	ACPI, CPER: Update cper info We have a lot of confusing names of functions and data structures in amongs the the error reporting code. In particular the "apei" prefix has been applied to many objects that are not part of APEI. Since we will be using these routines for extended error log reporting it will be clearer if we fix up the names first. Signed-off-by: Chen, Gong <gong.chen@linux.intel.com> Acked-by: Borislav Petkov <bp@suse.de> Reviewed-by: Mauro Carvalho Chehab <m.chehab@samsung.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
cf870c70a194443f8fc654ddc9d6cfd02c58003b	10-Jul-2013	Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>	mce: acpi/apei: Soft-offline a page on firmware GHES notification If the firmware indicates in GHES error data entry that the error threshold has exceeded for a corrected error event, then we try to soft-offline the page. This could be called in interrupt context, so we queue this up similar to how we handle memory failure scenarios. Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Tony Luck <tony.luck@intel.com>
0ba98ec9196746dd6abfa7bb9856ef4f29ffb9be	06-Jun-2013	Betty Dall <betty.dall@hp.com>	ACPI / APEI: Force fatal AER severity when component has been reset The CPER error record has a reset bit that indicates that the platform has reset the component. The reset bit can be set for any severity error including recoverable. From the AER code path's perspective, any error is fatal if the component has been reset. This patch upgrades the severity of the AER recovery to AER_FATAL whenever the CPER error record indicates that the component has been reset. [bhelgaas: s/bus has been reset/component has been reset/] Signed-off-by: Betty Dall <betty.dall@hp.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
a98d4f64a20b2b88697e7e08c871144a7e3f0ec4	03-Jun-2013	Wei Yongjun <yongjun_wei@trendmicro.com.cn>	ACPI / APEI: fix error return code in ghes_probe() Fix to return a negative error code in the acpi_gsi_to_irq() and request_irq() error handling case instead of 0, as done elsewhere in this function. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> Reviewed-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
37448adfc7ce0d6d5892b87aa8d57edde4126f49	30-May-2013	Lance Ortiz <lance.ortiz@hp.com>	aerdrv: Move cper_print_aer() call out of interrupt context The following warning was seen on 3.9 when a corrected PCIe error was being handled by the AER subsystem. WARNING: at .../drivers/pci/search.c:214 pci_get_dev_by_id+0x8a/0x90() This occurred because a call to pci_get_domain_bus_and_slot() was added to cper_print_pcie() to setup for the call to cper_print_aer(). The warning showed up because cper_print_pcie() is called in an interrupt context and pci_get* functions are not supposed to be called in that context. The solution is to move the cper_print_aer() call out of the interrupt context and into aer_recover_work_func() to avoid any warnings when calling pci_get* functions. Signed-off-by: Lance Ortiz <lance.ortiz@hp.com> Acked-by: Borislav Petkov <bp@suse.de> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
21480547c8b85be6c08c4d77ed514673b73eda8a	15-Feb-2013	Mauro Carvalho Chehab <mchehab@redhat.com>	ghes: add the needed hooks for EDAC error report In order to allow reporting errors via EDAC, add hooks for: 1) register an EDAC driver; 2) unregister an EDAC driver; 3) report errors via EDAC. As the EDAC driver will need to access the ghes structure, adds it as one of the parameters for ghes_do_proc. Acked-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
40e064153814ce534a28714b25cb98259f107d96	15-Feb-2013	Mauro Carvalho Chehab <mchehab@redhat.com>	ghes: move structures/enum to a header file As a ghes_edac driver will need to access ghes structures, in order to properly handle the errors, move those structures to a separate header file. No functional changes. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
da095fd3d5063f2dd03468d71f7df39a0430d86f	19-Nov-2012	Bill Pemberton <wfp5p@virginia.edu>	acpi: remove use of __devinit CONFIG_HOTPLUG is going away as an option so __devinit is no longer needed. Signed-off-by: Bill Pemberton <wfp5p@virginia.edu> Acked-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
b59bc2fbb4bb67e486c40cdb6a306c06acbaec06	21-Nov-2012	Bill Pemberton <wfp5p@virginia.edu>	ACPI: remove use of __devexit CONFIG_HOTPLUG is going away as an option so __devexit is no longer needed. Signed-off-by: Bill Pemberton <wfp5p@virginia.edu> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
34ddeb035d704eafdcdb3cbc781894300136c3c4	12-Jun-2012	Huang Ying <ying.huang@intel.com>	ACPI, APEI, Avoid too much error reporting in runtime This patch fixed the following bug. https://bugzilla.kernel.org/show_bug.cgi?id=43282 This is caused by a firmware bug checking (checking generic address register provided by firmware) in runtime. The checking should be done in address mapping time instead of runtime to avoid too much error reporting in runtime. Reported-by: Pawel Sikora <pluto@agmk.net> Signed-off-by: Huang Ying <ying.huang@intel.com> Tested-by: Jean Delvare <khali@linux-fr.org> Cc: stable@vger.kernel.org Signed-off-by: Len Brown <len.brown@intel.com>
700130b41f4ee54520ac2ef2f7f1d072789711a4	08-Nov-2011	Myron Stowe <mstowe@redhat.com>	ACPI APEI: Convert atomicio routines APEI needs memory access in interrupt context. The obvious choice is acpi_read(), but originally it couldn't be used in interrupt context because it makes temporary mappings with ioremap(). Therefore, we added drivers/acpi/atomicio.c, which provides: acpi_pre_map_gar() -- ioremap in process context acpi_atomic_read() -- memory access in interrupt context acpi_post_unmap_gar() -- iounmap Later we added acpi_os_map_generic_address() (2971852) and enhanced acpi_read() so it works in interrupt context as long as the address has been previously mapped (620242a). Now this sequence: acpi_os_map_generic_address() -- ioremap in process context acpi_read()/apei_read() -- now OK in interrupt context acpi_os_unmap_generic_address() is equivalent to what atomicio.c provides. This patch introduces apei_read() and apei_write(), which currently are functional equivalents of acpi_read() and acpi_write(). This is mainly proactive, to prevent APEI breakages if acpi_read() and acpi_write() are ever augmented to support the 'bit_offset' field of GAS, as APEI's __apei_exec_write_register() precludes splitting up functionality related to 'bit_offset' and APEI's 'mask' (see its APEI_EXEC_PRESERVE_REGISTER block). With apei_read() and apei_write() in place, usages of atomicio routines are converted to apei_read()/apei_write() and existing calls within osl.c and the CA, based on the re-factoring that was done in an earlier patch series - http://marc.info/?l=linux-acpi&m=128769263327206&w=2: acpi_pre_map_gar() --> acpi_os_map_generic_address() acpi_post_unmap_gar() --> acpi_os_unmap_generic_address() acpi_atomic_read() --> apei_read() acpi_atomic_write() --> apei_write() Note that acpi_read() and acpi_write() currently use 'bit_width' for accessing GARs which seems incorrect. 'bit_width' is the size of the register, while 'access_width' is the size of the access the processor must generate on the bus. The 'access_width' may be larger, for example, if the hardware only supports 32-bit or 64-bit reads. I wanted to minimize any possible impacts with this patch series so I did not change this behavior. Signed-off-by: Myron Stowe <myron.stowe@redhat.com> Signed-off-by: Len Brown <len.brown@intel.com>
46d12f0bcb17b2de89a059114349d472b7eb1783	08-Dec-2011	Huang Ying <ying.huang@intel.com>	ACPI, APEI, Printk queued error record before panic Because printk is not safe inside NMI handler, the recoverable error records received in NMI handler will be queued to be printked in a delayed IRQ context via irq_work. If a fatal error occurs after the recoverable error and before the irq_work processed, we lost a error report. To solve the issue, the queued error records are printked in NMI handler if system will go panic. Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
5ba82ab534a325d310fe02af1c149f1072792c7b	08-Dec-2011	Huang Ying <ying.huang@intel.com>	ACPI, APEI, GHES, Distinguish interleaved error report in kernel log In most cases, printk only guarantees messages from different printk calling will not be interleaved between each other. But, one APEI GHES hardware error report will involve multiple printk calling, normally each for one line. So it is possible that the hardware error report comes from different generic hardware error source will be interleaved. In this patch, a sequence number is prefixed to each line of error report. So that, even if they are interleaved, they still can be distinguished by the prefixed sequence number. Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
a654e5ee4f2213844d23361eda4955fe9efaf35f	08-Dec-2011	Huang Ying <ying.huang@intel.com>	ACPI, APEI, GHES: Add PCIe AER recovery support aer_recover_queue() is called when recoverable PCIe AER errors are notified by firmware to do the recovery work. Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
90ab5ee94171b3e28de6bb42ee30b527014e0be7	13-Jan-2012	Rusty Russell <rusty@rustcorp.com.au>	module_param: make bool parameters really bool (drivers & misc) module_param(bool) used to counter-intuitively take an int. In fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy trick. It's time to remove the int/unsigned int option. For this version it'll simply give a warning, but it'll break next kernel version. Acked-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9c48f1c629ecfa114850c03f875c6691003214de	30-Sep-2011	Don Zickus <dzickus@redhat.com>	x86, nmi: Wire up NMI handlers to new routines Just convert all the files that have an nmi handler to the new routines. Most of it is straight forward conversion. A couple of places needed some tweaking like kgdb which separates the debug notifier from the nmi handler and mce removes a call to notify_die. [Thanks to Ying for finding out the history behind that mce call https://lkml.org/lkml/2010/5/27/114 And Boris responding that he would like to remove that call because of it https://lkml.org/lkml/2011/9/21/163] The things that get converted are the registeration/unregistration routines and the nmi handler itself has its args changed along with code removal to check which list it is on (most are on one NMI list except for kgdb which has both an NMI routine and an NMI Unknown routine). Signed-off-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Corey Minyard <minyard@acm.org> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Corey Minyard <minyard@acm.org> Cc: Jack Steiner <steiner@sgi.com> Link: http://lkml.kernel.org/r/1317409584-23662-4-git-send-email-dzickus@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
70cb6e1da00db6c9212e6fd69bd96fd41c797077	03-Aug-2011	Len Brown <len.brown@intel.com>	APEI GHES: 32-bit buildfix drivers/acpi/apei/ghes.c:542: warning: integer overflow in expression drivers/acpi/apei/ghes.c:619: warning: integer overflow in expression ghes.c:(.text+0x46289): undefined reference to `__udivdi3' in function ghes_estatus_cache_add(). Reported-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Len Brown <len.brown@intel.com>
ba61ca4aab47441f1c6cec28a9a6aa0489fd1df3	13-Jul-2011	Huang Ying <ying.huang@intel.com>	ACPI, APEI, GHES: Add hardware memory error recovery support memory_failure_queue() is called when recoverable memory errors are notified by firmware to do the recovery work. Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
152cef40a808d3034e383465b3f7d6783613e458	13-Jul-2011	Huang Ying <ying.huang@intel.com>	ACPI, APEI, GHES, Error records content based throttle printk is used by GHES to report hardware errors. Ratelimit is enforced on the printk to avoid too many hardware error reports in kernel log. Because there may be thousands or even millions of corrected hardware errors during system running. Currently, a simple scheme is used. That is, the total number of hardware error reporting is ratelimited. This may cause some issues in practice. For example, there are two kinds of hardware errors occurred in system. One is corrected memory error, because the fault memory address is accessed frequently, there may be hundreds error report per-second. The other is corrected PCIe AER error, it will be reported once per-second. Because they share one ratelimit control structure, it is highly possible that only memory error is reported. To avoid the above issue, an error record content based throttle algorithm is implemented in the patch. Where after the first successful reporting, all error records that are same are throttled for some time, to let other kinds of error records have the opportunity to be reported. In above example, the memory errors will be throttled for some time, after being printked. Then the PCIe AER error will be printked successfully. Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
67eb2e99076708cc790019a6a08ca3e0ae130a3a	13-Jul-2011	Huang Ying <ying.huang@intel.com>	ACPI, APEI, GHES, printk support for recoverable error via NMI Some APEI GHES recoverable errors are reported via NMI, but printk is not safe in NMI context. To solve the issue, a lock-less memory allocator is used to allocate memory in NMI handler, save the error record into the allocated memory, put the error record into a lock-less list. On the other hand, an irq_work is used to delay the operation from NMI context to IRQ context. The irq_work IRQ handler will remove nodes from lock-less list, printk the error record and do some further processing include recovery operation, then free the memory. Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
9fb0bfe1408d5506b7b83d13d1eed573fd71d67d	13-Jul-2011	Huang Ying <ying.huang@intel.com>	ACPI, APEI, Add WHEA _OSC support APEI firmware first mode must be turned on explicitly on some machines, otherwise there may be no GHES hardware error record for hardware error notification. APEI bit in generic _OSC call can be used to do that, but on some machine, a special WHEA _OSC call must be used. This patch adds the support to that WHEA _OSC call. Signed-off-by: Huang Ying <ying.huang@intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: Len Brown <len.brown@intel.com>
b6a9501658530d8b8374e37f1edb549039a8a260	13-Jul-2011	Huang Ying <ying.huang@intel.com>	ACPI, APEI, GHES, Support disable GHES at boot time Some machine may have broken firmware so that GHES and firmware first mode should be disabled. This patch adds support to that. Signed-off-by: Huang Ying <ying.huang@intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Matthew Garrett <mjg@redhat.com> Signed-off-by: Len Brown <len.brown@intel.com>
5588340d46a484da53bbce8136184d9c7fbc259c	13-Jul-2011	Huang Ying <ying.huang@intel.com>	ACPI, APEI, GHES, Do not ratelimit fatal error printk before panic printk is used by GHES to report hardware errors. Normally, the printk will be ratelimited to avoid too many hardware error reports in kernel log. Because there may be thousands or even millions of corrected hardware errors during system running. That is different for fatal hardware error, because system will go panic as soon as possible, there will be no more than several error records. And these error records are valuable for system fault diagnosis, so they should not be ratelimited. Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
25985edcedea6396277003854657b5f3cb31a628	31-Mar-2011	Lucas De Marchi <lucas.demarchi@profusion.mobi>	Fix common misspellings Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
81e88fdc432a1552401d6e91a984dcccce72b8dc	12-Jan-2011	Huang Ying <ying.huang@intel.com>	ACPI, APEI, Generic Hardware Error Source POLL/IRQ/NMI notification type support Generic Hardware Error Source provides a way to report platform hardware errors (such as that from chipset). It works in so called "Firmware First" mode, that is, hardware errors are reported to firmware firstly, then reported to Linux by firmware. This way, some non-standard hardware error registers or non-standard hardware link can be checked by firmware to produce more valuable hardware error information for Linux. This patch adds POLL/IRQ/NMI notification types support. Because the memory area used to transfer hardware error information from BIOS to Linux can be determined only in NMI, IRQ or timer handler, but general ioremap can not be used in atomic context, so a special version of atomic ioremap is implemented for that. Known issue: - Error information can not be printed for recoverable errors notified via NMI, because printk is not NMI-safe. Will fix this via delay printing to IRQ context via irq_work or make printk NMI-safe. v2: - adjust printk format per comments. Signed-off-by: Huang Ying <ying.huang@intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
32c361f574f85fa47600d84900598e2efc99082e	07-Dec-2010	Huang Ying <ying.huang@intel.com>	ACPI, APEI, Report GHES error information via printk printk is one of the methods to report hardware errors to user space. This patch implements hardware error reporting for GHES via printk. Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
1dd6b20e368765223c31569d364219785b24700b	29-Sep-2010	Jin Dongming <jin.dongming@np.css.fujitsu.com>	ACPI, APEI, HEST Fix the unsuitable usage of platform_data platform_data in hest_parse_ghes() is used for saving the address of entry information of erst_tab. When the device is failed to be added, platform_data will be freed by platform_device_put(). But the value saved in platform_data should not be freed here. If it is done, it will make system panic. So I think platform_data should save the address of allocated memory which saves entry information of erst_tab. This patch fixed it and I confirmed it on x86_64 next-tree. v2: Transport the pointer of hest_hdr to platform_data using platform_device_add_data() Signed-off-by: Jin Dongming <jin.dongming@np.css.fujitsu.com> Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
7ad6e9435596f692ff65f399da12816c94960185	02-Aug-2010	Huang Ying <ying.huang@intel.com>	ACPI, APEI, Manage GHES as platform devices Register GHES during HEST initialization as platform devices. And make GHES driver into platform device driver. So that the GHES driver module can be loaded automatically when there are GHES available. Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
ad4ecef2f13c790f95b55320f2925c205d8f971f	02-Aug-2010	Huang Ying <ying.huang@intel.com>	ACPI, APEI, Rename CPER and GHES severity constants The abbreviation of severity should be SEV instead of SER, so the CPER severity constants are renamed accordingly. GHES severity constants are renamed in the same way too. Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
d334a49113a4a33109fd24e46073280ecd1bea0d	18-May-2010	Huang Ying <ying.huang@intel.com>	ACPI, APEI, Generic Hardware Error Source memory error support Generic Hardware Error Source provides a way to report platform hardware errors (such as that from chipset). It works in so called "Firmware First" mode, that is, hardware errors are reported to firmware firstly, then reported to Linux by firmware. This way, some non-standard hardware error registers or non-standard hardware link can be checked by firmware to produce more valuable hardware error information for Linux. Now, only SCI notification type and memory errors are supported. More notification type and hardware error type will be added later. These memory errors are reported to user space through /dev/mcelog via faking a corrected Machine Check, so that the error memory page can be offlined by /sbin/mcelog if the error count for one page is beyond the threshold. On some machines, Machine Check can not report physical address for some corrected memory errors, but GHES can do that. So this simplified GHES is implemented firstly. Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Len Brown <len.brown@intel.com>