History log of /drivers/edac/edac_mc.c
Revision Date Author Comments
f4ce6eca71d15b8e12a33ac8e1ef733a83944d2e 13-Aug-2014 Borislav Petkov <bp@suse.de> EDAC: Fix mem_types strings type

This one got forgotten during an earlier cleanup.

Signed-off-by: Borislav Petkov <bp@suse.de>
76ac8275f296b49c58f684825543bf4eb85d43d0 11-Jun-2014 Chen, Gong <gong.chen@linux.intel.com> trace, RAS: Add basic RAS trace event

To avoid confusion and conflicting usage of RAS-related trace events,
add a unified RAS trace event stub.

Start a RAS subsystem menu which will be fleshed out in time, when more
features get added to it.

Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
Link: http://lkml.kernel.org/r/1402475691-30045-2-git-send-email-gong.chen@linux.intel.com
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Tony Luck <tony.luck@intel.com>
aa2064d7dd35ac5812645780d2f22a7899e7c6e1 09-May-2014 Loc Ho <lho@apm.com> EDAC: Fix MC scrub mode comparsion bug for correctable errors

The MC structure field scrub_mode is an integer, not a bit field.
Compare it accordingly.
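
To illustrate the difference, here is a minimal C sketch. The enumerators are modeled on the kernel's enum scrub_type (treat the exact values as an assumption), and the two helper functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Modeled on the kernel's enum scrub_type: plain enumerators, not bit flags. */
enum scrub_type {
	SCRUB_UNKNOWN = 0,
	SCRUB_NONE,        /* 1 */
	SCRUB_SW_PROG,     /* 2 */
	SCRUB_SW_SRC,      /* 3 */
	SCRUB_SW_PROG_SRC, /* 4 */
};

/* Buggy: treats scrub_mode as a bit field, so any mode sharing a bit
 * with SCRUB_SW_SRC (value 3) is wrongly accepted. */
static bool wants_sw_scrub_buggy(enum scrub_type mode)
{
	return (mode & SCRUB_SW_SRC) != 0;
}

/* Fixed: scrub_mode holds exactly one enumerator, so compare for equality. */
static bool wants_sw_scrub(enum scrub_type mode)
{
	return mode == SCRUB_SW_SRC;
}
```

With SCRUB_NONE (1), the buggy check returns true because 1 & 3 is non-zero, while the equality check correctly returns false.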

Signed-off-by: Loc Ho <lho@apm.com>
Link: http://lkml.kernel.org/r/1399590199-12256-2-git-send-email-lho@apm.com
Signed-off-by: Borislav Petkov <bp@suse.de>
cb6ef42e516cb8948f15e4b70dc03af8020050a2 12-Feb-2014 Borislav Petkov <bp@suse.de> EDAC: Correct workqueue setup path

We're using edac_mc_workq_setup() both on the init path, when we load
an EDAC driver, and when we change the polling period
(edac_mc_reset_delay_period()) through /sys/.../edac_mc_poll_msec.

On that second path we don't need to initialize the workqueue, as it
has already been initialized.
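
A userspace sketch of the idea, assuming a boolean flag that tells the setup routine which path it is on. The structs and mc_workq_setup() are hypothetical stand-ins; the comments note which kernel calls (INIT_DELAYED_WORK(), mod_delayed_work()) each step represents.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's delayed work item and MC struct. */
struct fake_delayed_work {
	int init_count;          /* how many times the work item was initialized */
	unsigned long delay_ms;  /* currently scheduled polling delay */
};

struct fake_mci {
	struct fake_delayed_work work;
};

/* Initialize the work item only on the driver-load path; on the
 * poll-period change path just re-arm it with the new delay. */
static void mc_workq_setup(struct fake_mci *mci, unsigned long msec, bool init)
{
	if (init)
		mci->work.init_count++;   /* stands in for INIT_DELAYED_WORK() */
	mci->work.delay_ms = msec;        /* stands in for mod_delayed_work() */
}
```

Loading the driver would call mc_workq_setup(mci, msec, true); a write to the poll-period parameter would call it with false, so an already-initialized work item is never re-initialized.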

Thanks to Tejun for workqueue insights.

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1391457913-881-1-git-send-email-prarit@redhat.com
Cc: <stable@vger.kernel.org>
9da21b1509d8aa7ab4846722817d16c72d656c91 03-Feb-2014 Borislav Petkov <bp@suse.de> EDAC: Poll timeout cannot be zero, p2

Sanitize the code even more so that it accepts only unsigned longs and
rejects polling intervals below 1 second, as such short intervals are
unnecessary and make little sense for polling errors anyway.
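
A userspace sketch of such a sanitizing setter. The function name is hypothetical and strtoul() stands in for the kernel's string-to-number helper; the 1000 ms floor is the rule described above.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical setter for the poll interval: accept only a plain unsigned
 * number (optionally followed by the newline a sysfs write adds) and
 * reject intervals shorter than 1000 ms. */
static int set_poll_msec(const char *val, unsigned long *out)
{
	char *end;
	unsigned long ms;

	/* strtoul() would silently accept "-5" or leading blanks, so insist
	 * on a leading digit. */
	if (!val || !isdigit((unsigned char)val[0]))
		return -EINVAL;

	errno = 0;
	ms = strtoul(val, &end, 0);
	if (errno || (*end != '\0' && strcmp(end, "\n") != 0))
		return -EINVAL;

	if (ms < 1000)
		return -EINVAL;   /* polling more often than once a second */

	*out = ms;
	return 0;
}
```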

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1391457913-881-1-git-send-email-prarit@redhat.com
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: <stable@vger.kernel.org>
7270a6085a20a9c6aecb0be8c70510702118dc71 10-Oct-2013 Robert Richter <robert.richter@linaro.org> edac: Unify reporting of device info for device, mc and pci

Log messages differ slightly between the EDAC subsystems. Unify them.

Signed-off-by: Robert Richter <robert.richter@linaro.org>
Acked-by: Rob Herring <rob.herring@calxeda.com>
Acked-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Robert Richter <rric@kernel.org>
88d84ac97378c2f1d5fec9af1e8b7d9a662d6b00 19-Jul-2013 Borislav Petkov <bp@suse.de> EDAC: Fix lockdep splat

Fix the following:

BUG: key ffff88043bdd0330 not in .data!
------------[ cut here ]------------
WARNING: at kernel/lockdep.c:2987 lockdep_init_map+0x565/0x5a0()
DEBUG_LOCKS_WARN_ON(1)
Modules linked in: glue_helper sb_edac(+) edac_core snd acpi_cpufreq lrw gf128mul ablk_helper iTCO_wdt evdev i2c_i801 dcdbas button cryptd pcspkr iTCO_vendor_support usb_common lpc_ich mfd_core soundcore mperf processor microcode
CPU: 2 PID: 599 Comm: modprobe Not tainted 3.10.0 #1
Hardware name: Dell Inc. Precision T3600/0PTTT9, BIOS A08 01/24/2013
0000000000000009 ffff880439a1d920 ffffffff8160a9a9 ffff880439a1d958
ffffffff8103d9e0 ffff88043af4a510 ffffffff81a16e11 0000000000000000
ffff88043bdd0330 0000000000000000 ffff880439a1d9b8 ffffffff8103dacc
Call Trace:
dump_stack
warn_slowpath_common
warn_slowpath_fmt
lockdep_init_map
? trace_hardirqs_on_caller
? trace_hardirqs_on
debug_mutex_init
__mutex_init
bus_register
edac_create_sysfs_mci_device
edac_mc_add_mc
sbridge_probe
pci_device_probe
driver_probe_device
__driver_attach
? driver_probe_device
bus_for_each_dev
driver_attach
bus_add_driver
driver_register
__pci_register_driver
? 0xffffffffa0010fff
sbridge_init
? 0xffffffffa0010fff
do_one_initcall
load_module
? unset_module_init_ro_nx
SyS_init_module
tracesys
---[ end trace d24a70b0d3ddf733 ]---
EDAC MC0: Giving out device to 'sbridge_edac.c' 'Sandy Bridge Socket#0': DEV 0000:3f:0e.0
EDAC sbridge: Driver loaded.

What happens is that bus_register() needs a statically allocated lock_key
because the key is handed to lockdep. However, struct mem_ctl_info
embeds struct bus_type (the whole struct, not a pointer to it) and the
whole thing gets dynamically allocated.

Fix this by using a statically allocated struct bus_type for the MC bus.
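
The pattern can be sketched in userspace C: keep a static pool of bus objects indexed by controller number and let each dynamically allocated controller point into it instead of embedding its own copy. MAX_MCS, the fake_* types and fake_mc_alloc() are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_MCS 16   /* hypothetical limit on memory controllers */

/* Stand-in for struct bus_type: lockdep wants its lock key in static storage. */
struct fake_bus_type {
	const char *name;
};

/* Statically allocated pool, one bus per memory controller index. */
static struct fake_bus_type mc_bus[MAX_MCS];

struct fake_mci {
	int mc_idx;
	struct fake_bus_type *bus;   /* pointer into mc_bus[], not an embedded copy */
};

static struct fake_mci *fake_mc_alloc(int mc_idx)
{
	struct fake_mci *mci;

	if (mc_idx < 0 || mc_idx >= MAX_MCS)
		return NULL;
	mci = calloc(1, sizeof(*mci));   /* the MC itself stays dynamically allocated */
	if (!mci)
		return NULL;
	mci->mc_idx = mc_idx;
	mci->bus = &mc_bus[mc_idx];
	return mci;
}
```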

Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Mauro Carvalho Chehab <mchehab@infradead.org>
Cc: Markus Trippelsdorf <markus@trippelsdorf.de>
Cc: stable@kernel.org # v3.10
Signed-off-by: Tony Luck <tony.luck@intel.com>
9713faecff3d071de1208b081d4943b002e9cb1c 11-Mar-2013 Mauro Carvalho Chehab <mchehab@redhat.com> EDAC: Merge mci.mem_is_per_rank with mci.csbased

Both mci.mem_is_per_rank and mci.csbased denote the same thing: the
memory controller is csrow-based. Merge both fields into one.

There's no need for the driver to actually fill it in, as the core detects
it by checking whether one of the layers has the csrow type as part of the
memory hierarchy:

	if (layers[i].type == EDAC_MC_LAYER_CHIP_SELECT)
		per_rank = true;
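
For context, the check above can be wrapped into a self-contained loop. The enumerator names mirror the kernel's enum edac_mc_layer_type, while the struct is a simplified stand-in and mc_is_csbased() is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Enumerator names mirror the kernel's enum edac_mc_layer_type. */
enum edac_mc_layer_type {
	EDAC_MC_LAYER_BRANCH,
	EDAC_MC_LAYER_CHANNEL,
	EDAC_MC_LAYER_SLOT,
	EDAC_MC_LAYER_CHIP_SELECT,
};

/* Simplified stand-in for struct edac_mc_layer. */
struct edac_mc_layer {
	enum edac_mc_layer_type type;
	unsigned size;          /* number of entries in this layer */
	bool is_virt_csrow;
};

/* The controller is csrow-based if any layer is a chip select. */
static bool mc_is_csbased(const struct edac_mc_layer *layers, size_t n_layers)
{
	for (size_t i = 0; i < n_layers; i++)
		if (layers[i].type == EDAC_MC_LAYER_CHIP_SELECT)
			return true;
	return false;
}
```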

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
e7e248304c8ccf02b89e04c3b3b66006b993b5a7 31-Oct-2012 Mauro Carvalho Chehab <mchehab@redhat.com> edac: add support for raw error reports

This allows the APEI GHES driver to report errors directly, using
the EDAC error-reporting API.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
c7ef7645544131b0750478d1cf94cdfa945c809d 21-Feb-2013 Mauro Carvalho Chehab <mchehab@redhat.com> edac: reduce stack pressure by using a pre-allocated buffer

The number of variables on the stack is too big.
Reduce stack usage by using a pre-allocated error
buffer.
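
A userspace sketch of the technique: the scratch text lives in a buffer allocated once with the controller, and the formatting function fills it instead of declaring large local arrays. All names are hypothetical, and the sketch assumes reports for one controller are never generated concurrently, since they share one buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-controller scratch area replacing large stack locals. */
struct error_desc {
	char location[256];
};

struct fake_mci {
	int mc_idx;
	struct error_desc error_desc;   /* allocated once, together with the MC */
};

/* Build the report text in the pre-allocated buffer instead of in a
 * local array, so this function's stack frame stays small. */
static const char *format_location(struct fake_mci *mci, int csrow, int channel)
{
	struct error_desc *e = &mci->error_desc;

	snprintf(e->location, sizeof(e->location), "csrow:%d channel:%d",
		 csrow, channel);
	return e->location;
}
```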

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
80cc7d87d5eb34375f916d282450a0906a8ead60 31-Oct-2012 Mauro Carvalho Chehab <mchehab@redhat.com> edac: lock module owner to avoid error report conflicts

APEI GHES and i7core_edac/sb_edac can currently be loaded at
the same time, but those are Highlander modules:
"There can be only one".

There are two reasons for that:

1) Each driver assumes that it is the only one registering with
the EDAC core, as it is the driver's responsibility to number
the memory controllers, and all of them start from 0;

2) If the BIOS is handling the memory errors, the OS can't also be
doing it, as one would interfere with the other.

So, we need to add a module-owner lock to the EDAC core,
in order to avoid having two different modules handling memory
errors at the same time. The best way to implement this lock seems
to be to use the driver's name, as it is unique and won't require
changes to every driver.
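
The locking rule can be sketched in userspace C: the first module to register becomes the owner, other modules are refused, and ownership is released when the last controller goes away. The function names, the counter and the -EPERM return value are assumptions of this sketch, which compares names with strcmp().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Name of the module that currently owns EDAC MC error reporting. */
static const char *mc_owner;
static int mc_count;   /* registered memory controllers */

/* The first driver to register becomes the owner; any other driver is
 * refused until every controller has been removed. */
static int mc_add(const char *mod_name)
{
	if (mc_owner && strcmp(mc_owner, mod_name) != 0)
		return -EPERM;
	mc_owner = mod_name;
	mc_count++;
	return 0;
}

static void mc_del(void)
{
	if (mc_count > 0 && --mc_count == 0)
		mc_owner = NULL;   /* last controller gone: release ownership */
}
```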

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
c66b5a79a9348ccd6d1cd81416027d0e12da965d 15-Feb-2013 Mauro Carvalho Chehab <mchehab@redhat.com> edac: add a new memory layer type

There are some cases where the memory controller layout is
completely hidden. This is the case of firmware-driven error
code, like the one provided by GHES. Add a new layer to be
used on such memory error report mechanisms.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
d3d09e18203dba16a9dbdb2b4cc673d90748cdd1 26-Jan-2013 Joe Perches <joe@perches.com> EDAC: Fix kcalloc argument order

The element count comes first, then the element size.
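
kcalloc() follows the same argument convention as C's calloc(): element count first, element size second. Because the product is the same, swapped arguments usually still allocate the right amount, which is why such mistakes go unnoticed. A userspace sketch with a hypothetical element type:

```c
#include <assert.h>
#include <stdlib.h>

struct csrow_stub {
	int nr_channels;   /* hypothetical element type */
};

/* Count first, then the size of one element; calloc() zeroes the memory. */
static struct csrow_stub *alloc_rows(size_t nr_rows)
{
	return calloc(nr_rows, sizeof(struct csrow_stub));
}
```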

Signed-off-by: Joe Perches <joe@perches.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Borislav Petkov <bp@suse.de>
80f5ab097b87c86581cb9736a8e55c5a3047d4bb 19-Aug-2012 Shaun Ruffell <sruffell@digium.com> edac: edac_mc no longer deals with kobjects directly

There are no more embedded kobjects in struct mem_ctl_info. Remove a header and
a comment that no longer reflect the code.

Signed-off-by: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
f430d5707aa47af8669bbc0083a79e7d780908b2 10-Sep-2012 Borislav Petkov <borislav.petkov@amd.com> EDAC: Handle empty msg strings when reporting errors

A reported error could look like this:

[ 226.178315] EDAC MC0: 1 CE on mc#0csrow#0channel#0 (csrow:0 channel:0 page:0x427c0d offset:0xde0 grain:0 syndrome:0x1c6)

with two spaces back-to-back, because the specific drivers pass an
empty msg argument to edac_mc_handle_error(). Handle that case.
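
A small sketch of the idea: emit the separating space only when msg is non-empty. The function name and the simplified format string are hypothetical, not the kernel's actual message layout.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Print the separator before msg only when msg is non-empty, so an empty
 * driver message does not leave two spaces back to back. */
static void format_report(char *buf, size_t len, int count, const char *type,
			  const char *msg, const char *location)
{
	snprintf(buf, len, "%d %s%s%s on %s", count, type,
		 *msg ? " " : "", msg, location);
}
```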

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
4da1b7bfe7699881c761d71b5e299a65bce48ab2 10-Sep-2012 Borislav Petkov <borislav.petkov@amd.com> EDAC: Remove useless assignment of error type

The tracepoint decodes the error type itself, so remove a useless
assignment to the temporary p, which gets overwritten later anyway.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
24bef66e74d647aebd34e0bef7693512b7912029 24-Oct-2012 Mauro Carvalho Chehab <mchehab@redhat.com> edac: Fix the dimm filling for csrows-based layouts

The EDAC core currently fills in the DIMM data the wrong way for
drivers of csrow-based memory controllers, when the first layer is a
csrow.

This is not easy to notice because, in general, memories are
populated in dual, interleaved, symmetric mode, as very few memory
controllers support asymmetric modes.

While digging into a bug in the i82975x_edac driver, the asymmetric
mode there is now working, allowing us to fill the machine with
4x1GB ranks on channel 0 and 2x512MB ranks on channel 1:

Channel 0 ranks:
EDAC DEBUG: i82975x_init_csrows: DIMM A0: from page 0x00000000 to 0x0003ffff (size: 0x00040000 pages)
EDAC DEBUG: i82975x_init_csrows: DIMM A1: from page 0x00040000 to 0x0007ffff (size: 0x00040000 pages)
EDAC DEBUG: i82975x_init_csrows: DIMM A2: from page 0x00080000 to 0x000bffff (size: 0x00040000 pages)
EDAC DEBUG: i82975x_init_csrows: DIMM A3: from page 0x000c0000 to 0x000fffff (size: 0x00040000 pages)

Channel 1 ranks:
EDAC DEBUG: i82975x_init_csrows: DIMM B0: from page 0x00100000 to 0x0011ffff (size: 0x00020000 pages)
EDAC DEBUG: i82975x_init_csrows: DIMM B1: from page 0x00120000 to 0x0013ffff (size: 0x00020000 pages)
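
The page counts in the debug output above translate into the rank sizes shown in the tables that follow. Assuming 4 KiB pages (as on x86), 0x40000 pages is 1024 MB and 0x20000 pages is 512 MB. The helper and the SKETCH_PAGE_SHIFT constant are hypothetical.

```c
#include <assert.h>

#define SKETCH_PAGE_SHIFT 12   /* assumption: 4 KiB pages, as on x86 */

/* Size in MiB of an inclusive page-frame range [first, last]. */
static unsigned long range_mib(unsigned long first, unsigned long last)
{
	unsigned long pages = last - first + 1;

	return (pages << SKETCH_PAGE_SHIFT) >> 20;
}
```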

Before this patch, instead of showing the memories properly, it
shows the memory layout as:

          +-----------------------------------+
          |                mc0                |
          |  csrow0   |  csrow1   |  csrow2   |
----------+-----------------------------------+
channel1: |  1024 MB  |  1024 MB  |   512 MB  |
channel0: |  1024 MB  |  1024 MB  |   512 MB  |
----------+-----------------------------------+

as if both channels were symmetric, grouping the DIMMs into the wrong
layout.

After this patch, the memory is correctly represented.
So, for csrows at layers[0], it shows:

          +-----------------------------------------------+
          |                      mc0                      |
          |  csrow0   |  csrow1   |  csrow2   |  csrow3   |
----------+-----------------------------------------------+
channel1: |   512 MB  |   512 MB  |     0 MB  |     0 MB  |
channel0: |  1024 MB  |  1024 MB  |  1024 MB  |  1024 MB  |
----------+-----------------------------------------------+

For csrows at layers[1], it shows:

        +-----------------------+
        |          mc0          |
        | channel0  | channel1  |
--------+-----------------------+
csrow3: |  1024 MB  |     0 MB  |
csrow2: |  1024 MB  |     0 MB  |
--------+-----------------------+
csrow1: |  1024 MB  |   512 MB  |
csrow0: |  1024 MB  |   512 MB  |
--------+-----------------------+

So, no matter which layer comes first, the channel and csrow
information is properly represented.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
faa2ad09c01c48012fe4c117d3256e354e0f9238 23-Sep-2012 Shaun Ruffell <sruffell@digium.com> edac_mc: edac_mc_free() cannot assume mem_ctl_info is registered in sysfs.

Fix a potential NULL pointer dereference in edac_unregister_sysfs() on
system boot, introduced in 3.6-rc1.

Since commit 7a623c039 ("edac: rewrite the sysfs code to use struct
device") edac_mc_alloc() no longer initializes embedded kobjects in
struct mem_ctl_info. Therefore edac_mc_free() can no longer simply
decrement a kobject reference count to free the allocated memory unless
the memory controller driver module had also called edac_mc_add_mc().

Now edac_mc_free() checks whether the newly embedded struct device has
been registered with sysfs, and then either uses the standard device
release functions or frees the data structures itself, with logic
pulled out of the error path of edac_mc_alloc().
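
The decision can be modeled in userspace C. Here a boolean flag stands in for the kernel's device_is_registered() check, and every struct and function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of the two ways an MC can be torn down. */
struct fake_mci {
	bool registered;    /* stands in for device_is_registered(&mci->dev) */
	int *csrows;        /* stands in for the separately allocated children */
};

static int released_via_device;   /* counts release-callback teardowns */

static void fake_device_release(struct fake_mci *mci)
{
	released_via_device++;
	free(mci->csrows);
	free(mci);
}

/* Only take the device/sysfs teardown path if the device was actually
 * registered (i.e. edac_mc_add_mc() ran); otherwise free directly. */
static void fake_mc_free(struct fake_mci *mci)
{
	if (mci->registered) {
		fake_device_release(mci);   /* real code: unregister, let release() free */
		return;
	}
	free(mci->csrows);
	free(mci);
}
```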

The BUG this patch resolves for me:

BUG: unable to handle kernel NULL pointer dereference at (null)
EIP is at __wake_up_common+0x1a/0x6a
Process modprobe (pid: 933, ti=f3dc6000 task=f3db9520 task.ti=f3dc6000)
Call Trace:
complete_all+0x3f/0x50
device_pm_remove+0x23/0xa2
device_del+0x34/0x142
edac_unregister_sysfs+0x3b/0x5c [edac_core]
edac_mc_free+0x29/0x2f [edac_core]
e7xxx_probe1+0x268/0x311 [e7xxx_edac]
e7xxx_init_one+0x56/0x61 [e7xxx_edac]
local_pci_probe+0x13/0x15
...

Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
Cc: Shaohui Xie <Shaohui.Xie@freescale.com>
Signed-off-by: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ef6e7816b4546475d04b4ea22d58c48472157c70 23-Sep-2012 Fengguang Wu <fengguang.wu@intel.com> edac_mc: fix messy kfree calls in the error path

coccinelle warns about:

+ drivers/edac/edac_mc.c:429:9-23: ERROR: reference preceded by free on line 429

 421		if (mci->csrows) {
>422			for (chn = 0; chn < tot_channels; chn++) {
 423				csr = mci->csrows[chn];
 424				if (csr) {
>425					for (chn = 0; chn < tot_channels; chn++)
 426						kfree(csr->channels[chn]);
 427					kfree(csr);
 428				}
>429				kfree(mci->csrows[i]);
 430			}
 431			kfree(mci->csrows);
 432		}

and that code block seems to mess things up in several ways (double free,
memory leak, out-of-bounds reads, etc.):

L422: The iterator "chn" and bound "tot_channels" are totally wrong. They should be
"row" and "tot_csrows", respectively. This means either a memory leak or
out-of-bounds reads (which, if they do not trigger an immediate page fault,
will further lead to kfree() on random addresses).

L425: The inner loop reuses the same iterator "chn" as the outer loop,
which could end the outer loop prematurely and hence leak memory.

L429: The array index 'i' in mci->csrows[i] is a leftover value from
previous loops and doesn't change at all in the current loop. This
means either an out-of-bounds read and possibly kfree(random number), or
the same mci->csrows[i] getting freed again and again, and possibly a
double free with the kfree(csr) at L427.

L426/L427: a kfree(csr->channels) is needed in between to avoid leaking memory.

The buggy code was introduced by commit de3910eb ("edac: change the mem
allocation scheme to make Documentation/kobject.txt happy") in the 3.6-rc1
merge window. Fix it by freeing up resources in this order:

free csrows[i]->channels[j]
free csrows[i]->channels
free csrows[i]
free csrows
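
A userspace sketch of the corrected teardown, in the order listed above. It uses simplified hypothetical structs (not the kernel's csrow_info/rank_info) and free() in place of kfree(). Separate iterators for rows and channels avoid clobbering the outer loop, and the allocation counterpart reuses the same function on its error path.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified, hypothetical stand-ins for the csrow and channel structs. */
struct channel { int chan_idx; };
struct csrow {
	struct channel **channels;
	int nr_channels;
};

/* Free each channel, then the channel array, then the csrow, and
 * finally the csrow pointer array. NULL entries are skipped. */
static void free_csrows(struct csrow **csrows, int tot_csrows)
{
	if (!csrows)
		return;
	for (int row = 0; row < tot_csrows; row++) {
		struct csrow *csr = csrows[row];

		if (!csr)
			continue;
		if (csr->channels) {
			for (int chn = 0; chn < csr->nr_channels; chn++)
				free(csr->channels[chn]);
			free(csr->channels);
		}
		free(csr);
	}
	free(csrows);
}

/* Every object is allocated separately; on failure the partially built
 * tree is released with free_csrows(). */
static struct csrow **alloc_csrows(int tot_csrows, int tot_channels)
{
	struct csrow **csrows = calloc(tot_csrows, sizeof(*csrows));

	if (!csrows)
		return NULL;
	for (int row = 0; row < tot_csrows; row++) {
		struct csrow *csr = calloc(1, sizeof(*csr));

		if (!csr)
			goto fail;
		csrows[row] = csr;
		csr->channels = calloc(tot_channels, sizeof(*csr->channels));
		if (!csr->channels)
			goto fail;
		csr->nr_channels = tot_channels;
		for (int chn = 0; chn < tot_channels; chn++) {
			csr->channels[chn] = calloc(1, sizeof(**csr->channels));
			if (!csr->channels[chn])
				goto fail;
			csr->channels[chn]->chan_idx = chn;
		}
	}
	return csrows;
fail:
	free_csrows(csrows, tot_csrows);
	return NULL;
}
```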

CC: Mauro Carvalho Chehab <mchehab@redhat.com>
CC: Shaun Ruffell <sruffell@digium.com>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
41f63c5359d14ca995172b8f6eaffd93f60fec54 03-Aug-2012 Tejun Heo <tj@kernel.org> workqueue: use mod_delayed_work() instead of cancel + queue

Convert delayed_work users doing cancel_delayed_work() followed by
queue_delayed_work() to mod_delayed_work().

Most conversions are straightforward. The ones worth mentioning are:

* drivers/edac/edac_mc.c: edac_mc_workq_setup() converted to always
use mod_delayed_work() and cancel loop in
edac_mc_reset_delay_period() is dropped.

* drivers/platform/x86/thinkpad_acpi.c: No need to remember whether
watchdog is active or not. @fan_watchdog_active and related code
dropped.

* drivers/power/charger-manager.c: Seemingly a lot of
delayed_work_pending() abuse going on here.
[delayed_]work_pending() are unsynchronized and racy when used like
this. I converted one instance in fullbatt_handler(). Please
convert the rest so that they invoke workqueue APIs for the intended
target state rather than trying to game work item pending state
transitions, e.g. if the timer should be modified, call
mod_delayed_work(); if it should be canceled, call
cancel_delayed_work[_sync]().

* drivers/thermal/thermal_sys.c: thermal_zone_device_set_polling()
simplified. Note that the round_jiffies() calls in this function are
meaningless: round_jiffies() works on absolute jiffies, not on the
relative delay used by delayed_work.

v2: Tomi pointed out that __cancel_delayed_work() users can't be
safely converted to mod_delayed_work(). They could be calling it
from irq context and if that happens while delayed_work_timer_fn()
is running, it could deadlock. __cancel_delayed_work() users are
dropped.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Acked-by: Anton Vorontsov <cbouatmailru@gmail.com>
Acked-by: David Howells <dhowells@redhat.com>
Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Roland Dreier <roland@kernel.org>
Cc: "John W. Linville" <linville@tuxdriver.com>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Len Brown <len.brown@intel.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
9eb07a7fb8a90ee39fa9d5489afc0330cfcfbea7 04-Jun-2012 Mauro Carvalho Chehab <mchehab@redhat.com> edac: edac_mc_handle_error(): add an error_count parameter

In order to avoid losing error events, it is desirable to group
error events together and generate a single trace for several identical
errors.

The trace API already allows reporting multiple errors. Change the
handle_error function to also allow that.

The changes to the drivers were made with this small Perl script:

$file .=$_ while (<>);
$file =~ s/(edac_mc_handle_error)\s*\(([^\,]+)\,([^\,]+)\,/$1($2,$3, 1,/g;
print $file;
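
How a driver might make use of the new error_count parameter can be sketched by collapsing runs of identical consecutive errors into one report that carries a count. The record and report types and coalesce() are hypothetical; only the grouping idea comes from the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical decoded error record. */
struct err_rec {
	unsigned long page;
	unsigned long syndrome;
};

/* One report covers error_count identical errors. */
struct report {
	struct err_rec rec;
	unsigned count;
};

/* Collapse runs of identical consecutive errors into a single report,
 * instead of emitting one trace event per error. Returns the number of
 * reports written to out (which must hold at least n entries). */
static size_t coalesce(const struct err_rec *in, size_t n, struct report *out)
{
	size_t nr = 0;

	for (size_t i = 0; i < n; i++) {
		if (nr > 0 && out[nr - 1].rec.page == in[i].page &&
		    out[nr - 1].rec.syndrome == in[i].syndrome) {
			out[nr - 1].count++;
			continue;
		}
		out[nr].rec = in[i];
		out[nr].count = 1;
		nr++;
	}
	return nr;
}
```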

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
03f7eae80f4b913929be84e0c883ee98196fd6ff 04-Jun-2012 Mauro Carvalho Chehab <mchehab@redhat.com> edac: remove arch-specific parameter for the error handler

Remove the arch-dependent parameter, as it was not used:
the MCE tracepoint wasn't implemented. It probably doesn't
make sense to have an MCE-specific tracepoint, as this would
cost more bytes in the tracepoint, and tracepoints are not free.

The changes to the EDAC drivers were made with this small Perl script:

$file .=$_ while (<>);
$file =~ s/(edac_mc_handle_error)\s*\(([^\;]+)\,([^\,\)]+)\s*\)/$1($2)/g;
print $file;

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
08a4a136909602eae0e71e147153461df077a46f 18-May-2012 Dan Carpenter <dan.carpenter@oracle.com> edac_mc: check for allocation failure in edac_mc_alloc()

Add a check here for whether kzalloc() failed.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
6e84d359b2bea5ce659b3c3e5d3003fb11bd91d5 30-Apr-2012 Mauro Carvalho Chehab <mchehab@redhat.com> edac_mc: Cleanup per-dimm_info debug messages

The edac_mc_alloc() routine allocates one dimm_info struct for all
possible memories, including the unfilled ones. The debug messages
there are somewhat confusing. So, clean them up by moving the code
that prints the memory location to edac_mc, and using it in both
edac_mc_sysfs and edac_mc.

Also, only dump information when DIMMs/ranks are actually
filled.

After this patch, a dimm-based memory controller will print the debug
info as:

[ 1011.380027] EDAC DEBUG: edac_mc_dump_csrow: csrow->csrow_idx = 0
[ 1011.380029] EDAC DEBUG: edac_mc_dump_csrow: csrow = ffff8801169be000
[ 1011.380031] EDAC DEBUG: edac_mc_dump_csrow: csrow->first_page = 0x0
[ 1011.380032] EDAC DEBUG: edac_mc_dump_csrow: csrow->last_page = 0x0
[ 1011.380034] EDAC DEBUG: edac_mc_dump_csrow: csrow->page_mask = 0x0
[ 1011.380035] EDAC DEBUG: edac_mc_dump_csrow: csrow->nr_channels = 3
[ 1011.380037] EDAC DEBUG: edac_mc_dump_csrow: csrow->channels = ffff8801149c2840
[ 1011.380039] EDAC DEBUG: edac_mc_dump_csrow: csrow->mci = ffff880117426000
[ 1011.380041] EDAC DEBUG: edac_mc_dump_channel: channel->chan_idx = 0
[ 1011.380042] EDAC DEBUG: edac_mc_dump_channel: channel = ffff8801149c2860
[ 1011.380044] EDAC DEBUG: edac_mc_dump_channel: channel->csrow = ffff8801169be000
[ 1011.380046] EDAC DEBUG: edac_mc_dump_channel: channel->dimm = ffff88010fe90400
...
[ 1011.380095] EDAC DEBUG: edac_mc_dump_dimm: dimm0: channel 0 slot 0 mapped as virtual row 0, chan 0
[ 1011.380097] EDAC DEBUG: edac_mc_dump_dimm: dimm = ffff88010fe90400
[ 1011.380099] EDAC DEBUG: edac_mc_dump_dimm: dimm->label = 'CPU#0Channel#0_DIMM#0'
[ 1011.380101] EDAC DEBUG: edac_mc_dump_dimm: dimm->nr_pages = 0x40000
[ 1011.380103] EDAC DEBUG: edac_mc_dump_dimm: dimm->grain = 8
[ 1011.380104] EDAC DEBUG: edac_mc_dump_dimm: dimm->nr_pages = 0x40000
...

(a rank-based memory controller would print "rank?" instead of
"dimm?" in the debug info above)

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
956b9ba156dbfdb9cede2b2927ddf8be2233b3a7 29-Apr-2012 Joe Perches <joe@perches.com> edac: Convert debugfX to edac_dbg(X,

Use a more common debugging style.

Remove __FILE__ uses, add missing newlines,
coalesce formats and align arguments.
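
A userspace sketch of an edac_dbg()-style macro that adds the "EDAC DEBUG: <function>: " prefix seen in the debug output quoted elsewhere in this log, so callers no longer pass __FILE__ or __func__ themselves. The level variable, the message buffer and edac_debug_print() are hypothetical stand-ins, and ##__VA_ARGS__ is a GNU extension supported by GCC and Clang.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

static int edac_debug_level = 2;   /* hypothetical stand-in for the module parameter */
static char edac_last_msg[256];    /* keeps the last message so it can be inspected */

static void edac_debug_print(const char *func, const char *fmt, ...)
{
	va_list ap;
	int n = snprintf(edac_last_msg, sizeof(edac_last_msg),
			 "EDAC DEBUG: %s: ", func);

	if (n < 0 || (size_t)n >= sizeof(edac_last_msg))
		return;
	va_start(ap, fmt);
	vsnprintf(edac_last_msg + n, sizeof(edac_last_msg) - n, fmt, ap);
	va_end(ap);
	fputs(edac_last_msg, stdout);
}

/* The macro supplies the calling function's name itself. */
#define edac_dbg(level, fmt, ...)                                       \
	do {                                                            \
		if ((level) <= edac_debug_level)                        \
			edac_debug_print(__func__, fmt, ##__VA_ARGS__); \
	} while (0)
```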

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
dd23cd6eb1f59ba722a6e6aa228adff7c01404de 29-Apr-2012 Mauro Carvalho Chehab <mchehab@redhat.com> edac: Don't add __func__ or __FILE__ for debugf[0-9] msgs

The debug macro already adds that. Most of the work here was
done by this small script:

$f .=$_ while (<>);

$f =~ s/(debugf[0-9]\s*\(\s*)__FILE__\s*": /\1"/g;
$f =~ s/(debugf[0-9]\s*\(\s*)__FILE__\s*/\1/g;
$f =~ s/(debugf[0-9]\s*\(\s*)__FILE__\s*"MC: /\1"/g;

$f =~ s/(debugf[0-9]\s*\(\")\%s[\:\,\(\)]*\s*([^\"]*\s*[^\)]+)__func__\s*\,\s*/\1\2/g;
$f =~ s/(debugf[0-9]\s*\(\")\%s[\:\,\(\)]*\s*([^\"]*\s*[^\)]+),\s*__func__\s*\)/\1\2)/g;
$f =~ s/(debugf[0-9]\s*\(\"MC\:\s*)\%s[\:\,\(\)]*\s*([^\"]*\s*[^\)]+)__func__\s*\,\s*/\1\2/g;
$f =~ s/(debugf[0-9]\s*\(\"MC\:\s*)\%s[\:\,\(\)]*\s*([^\"]*\s*[^\)]+),\s*__func__\s*\)/\1\2)/g;

$f =~ s/\"MC\: \\n\"/"MC:\\n"/g;

print $f;

After running the script, manual cleanups were done to fix the remaining
places.

While here, remove __LINE__ from most places, as it doesn't give
useful info in most of them.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
de3910eb79ac8c0f29a11224661c0ebaaf813039 24-Apr-2012 Mauro Carvalho Chehab <mchehab@redhat.com> edac: change the mem allocation scheme to make Documentation/kobject.txt happy

Kernel kobjects have rigid rules: each container object should be
dynamically allocated on its own; several objects can't share a single kmalloc.

EDAC never obeyed this rule: it has a single alloc function that
allocates all needed data with a single kzalloc.

As this is no longer accepted, change the allocation scheme of the
EDAC *_info structs to enforce this kernel standard.

Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Cc: Aristeu Rozanski <arozansk@redhat.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Cc: Greg K H <gregkh@linuxfoundation.org>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: Mark Gross <mark.gross@intel.com>
Cc: Tim Small <tim@buttersideup.com>
Cc: Ranganathan Desikan <ravi@jetztechnologies.com>
Cc: "Arvind R." <arvino55@gmail.com>
Cc: Olof Johansson <olof@lixom.net>
Cc: Egor Martovetsky <egor@pasemi.com>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Hitoshi Mitake <h.mitake@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Shaohui Xie <Shaohui.Xie@freescale.com>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
d90c008963ef638cb7ab7d5eb76362b3c2d379bc 21-Mar-2012 Mauro Carvalho Chehab <mchehab@redhat.com> edac: Get rid of the old kobj's from the edac mc code

Now that all users of the old raw kobj access are gone,
we can get rid of the legacy kobj-based structures and
data.

Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Cc: Michal Marek <mmarek@suse.cz>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
7a623c039075e4ea21648d88133fafa6dcfd113d 16-Apr-2012 Mauro Carvalho Chehab <mchehab@redhat.com> edac: rewrite the sysfs code to use struct device

The EDAC subsystem uses the old struct sysdev approach,
creating all nodes using the raw sysfs API. This is bad,
as the API is deprecated.

As we'll be changing the EDAC API, let's first port the existing
code to struct device.

There's one drawback to this patch: driver-specific sysfs
nodes, used by mpc85xx_edac, amd64_edac and i7core_edac,
won't be created anymore. While it would be possible to
also port the device-specific code, that would mix kobj with
struct device, which is not recommended. Also, it is easier and nicer
to move the code to the drivers instead, as the core can get rid
of some complex logic that just emulates what device_add()
and device_create_file() already do.

The next patches will convert the driver-specific code to use
the device-specific calls. Then, the remaining bits of the old
sysfs API will be removed.

NOTE: a per-MC bus is required, otherwise devices with more than
one memory controller will hit a bug like the one below:

[ 819.094946] EDAC DEBUG: find_mci_by_dev: find_mci_by_dev()
[ 819.094948] EDAC DEBUG: edac_create_sysfs_mci_device: edac_create_sysfs_mci_device() idx=1
[ 819.094952] EDAC DEBUG: edac_create_sysfs_mci_device: edac_create_sysfs_mci_device(): creating device mc1
[ 819.094967] EDAC DEBUG: edac_create_sysfs_mci_device: edac_create_sysfs_mci_device creating dimm0, located at channel 0 slot 0
[ 819.094984] ------------[ cut here ]------------
[ 819.100142] WARNING: at fs/sysfs/dir.c:481 sysfs_add_one+0xc1/0xf0()
[ 819.107282] Hardware name: S2600CP
[ 819.111078] sysfs: cannot create duplicate filename '/bus/edac/devices/dimm0'
[ 819.119062] Modules linked in: sb_edac(+) edac_core ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle iptable_filter ip_tables bridge stp llc sunrpc binfmt_misc dm_mirror dm_region_hash dm_log vhost_net macvtap macvlan tun kvm microcode pcspkr iTCO_wdt iTCO_vendor_support igb i2c_i801 i2c_core sg ioatdma dca sr_mod cdrom sd_mod crc_t10dif ahci libahci isci libsas libata scsi_transport_sas scsi_mod wmi dm_mod [last unloaded: scsi_wait_scan]
[ 819.175748] Pid: 10902, comm: modprobe Not tainted 3.3.0-0.11.el7.v12.2.x86_64 #1
[ 819.184113] Call Trace:
[ 819.186868] [<ffffffff8105adaf>] warn_slowpath_common+0x7f/0xc0
[ 819.193573] [<ffffffff8105aea6>] warn_slowpath_fmt+0x46/0x50
[ 819.200000] [<ffffffff811f53d1>] sysfs_add_one+0xc1/0xf0
[ 819.206025] [<ffffffff811f5cf5>] sysfs_do_create_link+0x135/0x220
[ 819.212944] [<ffffffff811f7023>] ? sysfs_create_group+0x13/0x20
[ 819.219656] [<ffffffff811f5df3>] sysfs_create_link+0x13/0x20
[ 819.226109] [<ffffffff813b04f6>] bus_add_device+0xe6/0x1b0
[ 819.232350] [<ffffffff813ae7cb>] device_add+0x2db/0x460
[ 819.238300] [<ffffffffa0325634>] edac_create_dimm_object+0x84/0xf0 [edac_core]
[ 819.246460] [<ffffffffa0325e18>] edac_create_sysfs_mci_device+0xe8/0x290 [edac_core]
[ 819.255215] [<ffffffffa0322e2a>] edac_mc_add_mc+0x5a/0x2c0 [edac_core]
[ 819.262611] [<ffffffffa03412df>] sbridge_register_mci+0x1bc/0x279 [sb_edac]
[ 819.270493] [<ffffffffa03417a3>] sbridge_probe+0xef/0x175 [sb_edac]
[ 819.277630] [<ffffffff813ba4e8>] ? pm_runtime_enable+0x58/0x90
[ 819.284268] [<ffffffff812f430c>] local_pci_probe+0x5c/0xd0
[ 819.290508] [<ffffffff812f5ba1>] __pci_device_probe+0xf1/0x100
[ 819.297117] [<ffffffff812f5bea>] pci_device_probe+0x3a/0x60
[ 819.303457] [<ffffffff813b1003>] really_probe+0x73/0x270
[ 819.309496] [<ffffffff813b138e>] driver_probe_device+0x4e/0xb0
[ 819.316104] [<ffffffff813b149b>] __driver_attach+0xab/0xb0
[ 819.322337] [<ffffffff813b13f0>] ? driver_probe_device+0xb0/0xb0
[ 819.329151] [<ffffffff813af5d6>] bus_for_each_dev+0x56/0x90
[ 819.335489] [<ffffffff813b0d7e>] driver_attach+0x1e/0x20
[ 819.341534] [<ffffffff813b0980>] bus_add_driver+0x1b0/0x2a0
[ 819.347884] [<ffffffffa0347000>] ? 0xffffffffa0346fff
[ 819.353641] [<ffffffff813b19f6>] driver_register+0x76/0x140
[ 819.359980] [<ffffffff8159f18b>] ? printk+0x51/0x53
[ 819.365524] [<ffffffffa0347000>] ? 0xffffffffa0346fff
[ 819.371291] [<ffffffff812f5896>] __pci_register_driver+0x56/0xd0
[ 819.378096] [<ffffffffa0347054>] sbridge_init+0x54/0x1000 [sb_edac]
[ 819.385231] [<ffffffff8100203f>] do_one_initcall+0x3f/0x170
[ 819.391577] [<ffffffff810bcd2e>] sys_init_module+0xbe/0x230
[ 819.397926] [<ffffffff815bb529>] system_call_fastpath+0x16/0x1b
[ 819.404633] ---[ end trace 1654fdd39556689f ]---

This happens because the bus is not being properly initialized.
Instead of putting the memory sub-devices inside the memory controller,
it is putting everything under the same directory:

$ tree /sys/bus/edac/
/sys/bus/edac/
├── devices
│   ├── all_channel_counts -> ../../../devices/system/edac/mc/mc0/all_channel_counts
│   ├── csrow0 -> ../../../devices/system/edac/mc/mc0/csrow0
│   ├── csrow1 -> ../../../devices/system/edac/mc/mc0/csrow1
│   ├── csrow2 -> ../../../devices/system/edac/mc/mc0/csrow2
│   ├── dimm0 -> ../../../devices/system/edac/mc/mc0/dimm0
│   ├── dimm1 -> ../../../devices/system/edac/mc/mc0/dimm1
│   ├── dimm3 -> ../../../devices/system/edac/mc/mc0/dimm3
│   ├── dimm6 -> ../../../devices/system/edac/mc/mc0/dimm6
│   ├── inject_addrmatch -> ../../../devices/system/edac/mc/mc0/inject_addrmatch
│   ├── mc -> ../../../devices/system/edac/mc
│   └── mc0 -> ../../../devices/system/edac/mc/mc0
├── drivers
├── drivers_autoprobe
├── drivers_probe
└── uevent

On a multi-memory controller system, the names "csrow%d" and "dimm%d"
should be under "mc%d", and not at the main hierarchy level.

So, we need to create a per-MC bus, so that each MC has its own namespace.

Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Cc: Greg K H <gregkh@linuxfoundation.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
8447c4d15e357a458c9051ddc84aa6c8b9c27000 06-Jun-2012 Chris Metcalf <cmetcalf@tilera.com> edac: Do alignment logic properly in edac_align_ptr()

The logic was checking the size of the structure being allocated to
determine whether an alignment fixup was required. This isn't right;
what we actually care about is the alignment of the actual pointer that's
about to be returned. This became an issue recently because struct
edac_mc_layer has a size that is not zero modulo eight, so we were
taking the correctly-aligned pointer and forcing it to be misaligned.
On Tile this caused an alignment exception.
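
The difference can be shown with plain integer arithmetic on addresses. This is a simplified illustration, not the kernel's edac_align_ptr(); the helper names and the 12-byte object size are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Buggy idea: derive the padding from the object size. A 12-byte struct
 * gives 12 % 8 == 4, so an already aligned address is pushed 4 bytes
 * forward and becomes misaligned. */
static uintptr_t align_by_size(uintptr_t ptr, size_t size, size_t align)
{
	size_t r = size % align;

	return r ? ptr + (align - r) : ptr;
}

/* Fixed idea: derive the padding from the address that is about to be
 * returned, rounding it up to the next multiple of align. */
static uintptr_t align_by_ptr(uintptr_t ptr, size_t align)
{
	size_t r = ptr % align;

	return r ? ptr + (align - r) : ptr;
}
```

For an 8-byte-aligned address 0x1000 and a 12-byte object, align_by_size() yields the misaligned 0x1004, while align_by_ptr() leaves 0x1000 untouched and rounds 0x1004 up to 0x1008.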

Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
fd687502dc8037aa5a4b84c570ada971106574ee 16-Mar-2012 Mauro Carvalho Chehab <mchehab@redhat.com> edac: Rename the parent dev to pdev

As EDAC doesn't use struct device itself, it created a pointer to the
parent device called "dev". Now that we'll be converting it to use
struct device, instead of struct sysdev, this needs to be fixed by
renaming the parent pointer to "pdev".

No functional changes.

Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: Mark Gross <mark.gross@intel.com>
Cc: Jason Uhlenkott <juhlenko@akamai.com>
Cc: Tim Small <tim@buttersideup.com>
Cc: Ranganathan Desikan <ravi@jetztechnologies.com>
Cc: "Arvind R." <arvino55@gmail.com>
Cc: Olof Johansson <olof@lixom.net>
Cc: Egor Martovetsky <egor@pasemi.com>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Joe Perches <joe@perches.com>
Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Hitoshi Mitake <h.mitake@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com>
Cc: Shaohui Xie <Shaohui.Xie@freescale.com>
Cc: Josh Boyer <jwboyer@gmail.com>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
53f2d02898755d1b24bde1975e202815d29fdb81 23-Feb-2012 Mauro Carvalho Chehab <mchehab@redhat.com> RAS: Add a tracepoint for reporting memory controller events

Add a new tracepoint-based hardware events report method for
reporting Memory Controller events.

Part of the description below is shamelessly copied from Tony
Luck's notes about the Hardware Error BoF during LPC 2010 [1].
Tony, thanks for your notes and discussions to generate the
h/w error reporting requirements.

[1] http://lwn.net/Articles/416669/

We have several subsystems & methods for reporting hardware errors:

1) EDAC ("Error Detection and Correction"). In its original form
this consisted of a platform specific driver that read topology
information and error counts from chipset registers and reported
the results via a sysfs interface.

2) mcelog - x86 specific decoding of machine check bank registers
reporting in binary form via /dev/mcelog. Recent additions make use
of the APEI extensions that were documented in version 4.0a of the
ACPI specification to acquire more information about errors without
having to rely on reading chipset registers directly. A user-level
program decodes them into a somewhat human-readable format.

3) drivers/edac/mce_amd.c - this driver hooks into the mcelog path and
decodes errors reported via machine check bank registers in AMD
processors to the console log using printk().

Each of these mechanisms has a band of followers ... and none
of them appear to meet all the needs of all users.

As part of a RAS subsystem, let's encapsulate the memory error hardware
events into a trace facility.

The tracepoint printk will be displayed like:

mc_event: [quant] (Corrected|Uncorrected|Fatal) error:[error msg] on [label] ([location] [edac_mc detail] [driver_detail])

Where:
[quant] is the quantity of errors
[error msg] is the driver-specific error message
(e.g. "memory read", "bus error", ...);
[location] is the location in terms of memory controller and
branch/channel/slot, channel/slot or csrow/channel;
[label] is the memory stick label;
[edac_mc detail] describes the address location of the error
and the syndrome;
[driver_detail] holds driver-specific error details,
when needed/provided (e.g. "area:DMA", ...)

For example:

mc_event: 1 Corrected error:memory read on memory stick DIMM_1A (mc:0 location:0:0:0 page:0x586b6e offset:0xa66 grain:32 syndrome:0x0 area:DMA)

Of course, any userspace tools meant to handle errors should not parse
the above data. They should, instead, use the binary fields provided by
the tracepoint, mapping them directly into their Management Information
Base.
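Here is a hypothetical userspace formatter that shows how structured fields map to the text line. The struct and function names are assumptions, not the tracepoint's actual record layout. The point is only that the human-readable line is derived from structured fields, and tools should consume those fields.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the binary fields of one memory event. */
struct mc_event_fields {
	unsigned int count;		/* [quant] */
	const char *severity;		/* "Corrected", "Uncorrected" or "Fatal" */
	const char *msg;		/* [error msg] */
	const char *label;		/* [label] */
	int mc;				/* memory controller index */
	int pos[3];			/* [location], e.g. branch/channel/slot */
	unsigned long page, offset, syndrome;
	int grain;
	const char *driver_detail;	/* [driver_detail], "" if none */
};

/* Render the human-readable line; returns snprintf()'s result. */
static int format_mc_event(char *buf, size_t len,
			   const struct mc_event_fields *e)
{
	const char *sep;

	assert(buf && e && e->driver_detail);
	/* Only emit a separating space when a driver detail is present. */
	sep = strlen(e->driver_detail) ? " " : "";
	return snprintf(buf, len,
			"mc_event: %u %s error:%s on %s (mc:%d location:%d:%d:%d "
			"page:0x%lx offset:0x%lx grain:%d syndrome:0x%lx%s%s)",
			e->count, e->severity, e->msg, e->label, e->mc,
			e->pos[0], e->pos[1], e->pos[2],
			e->page, e->offset, e->grain, e->syndrome,
			sep, e->driver_detail);
}
```

With count 1, label "memory stick DIMM_1A", page 0x586b6e, offset 0xa66, grain 32, syndrome 0 and detail "area:DMA", this reproduces the example line above.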

NOTE: The original patch was providing an additional mechanism for
MCA-based trace events that also contained MCA error register data.
However, as no agreement has been reached so far for the MCA-based trace
events, for now, let's add events only for memory errors.
A later patch is planned to change the tracepoint for those types
of event.

Cc: Aristeu Rozanski <arozansk@redhat.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
5926ff502f6b93ca0c1654f8a5c5317ea236dbdb 09-Feb-2012 Mauro Carvalho Chehab <mchehab@redhat.com> edac: Initialize the dimm label with the known information

While userspace hasn't filled in the DIMM labels, fill them with the DIMM
location, as described by the memory model in use. This could eventually
match what is described by dmidecode, making it easier for people to
identify the memory.

For example, on an Intel motherboard where the DMI table is reliable,
the first memory stick is described as:

Memory Device
Array Handle: 0x0029
Error Information Handle: Not Provided
Total Width: 64 bits
Data Width: 64 bits
Size: 2048 MB
Form Factor: DIMM
Set: 1
Locator: A1_DIMM0
Bank Locator: A1_Node0_Channel0_Dimm0
Type: <OUT OF SPEC>
Type Detail: Synchronous
Speed: 800 MHz
Manufacturer: A1_Manufacturer0
Serial Number: A1_SerNum0
Asset Tag: A1_AssetTagNum0
Part Number: A1_PartNum0

The memory named as "A1_DIMM0" is physically located at the first
memory controller (node 0), at channel 0, dimm slot 0.

After this patch, the memory label will be filled with:
/sys/devices/system/edac/mc/csrow0/ch0_dimm_label:mc#0channel#0slot#0

And (after the new EDAC API patches) as:
/sys/devices/system/edac/mc/mc0/dimm0/dimm_label:mc#0channel#0slot#0

So, even if the memory label is not initialized by userspace, useful
information about the error location is filled in there, especially since
several systems/motherboards are provided with enough info to map from
channel/slot (or branch/channel/slot) into the DIMM label. So, letting the
EDAC core fill it by default is a good thing.

It should be noted that, as the label filling happens in
edac_mc_alloc(), drivers can override it to better describe the memories
(and some actually do it).
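Here is a sketch of how such a default label could be composed. It assumes a hypothetical list of layer types and names and is not the EDAC core's code. Each layer adds a "name#position" pair after the "mc#N" prefix, so a channel/slot controller yields the label shown above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical layer types and their printable names. */
enum layer_type { LAYER_BRANCH, LAYER_CHANNEL, LAYER_SLOT, LAYER_CSROW };
static const char *const layer_name[] = {
	[LAYER_BRANCH] = "branch",
	[LAYER_CHANNEL] = "channel",
	[LAYER_SLOT] = "slot",
	[LAYER_CSROW] = "csrow",
};

/*
 * Build "mc#<n>" followed by "<layer>#<pos>" for each layer.
 * Returns the label length, or -1 if it did not fit in buf.
 */
static int default_dimm_label(char *buf, size_t len, unsigned int mc,
			      const enum layer_type *types,
			      const unsigned int *pos, int n_layers)
{
	size_t used;
	int n, i;

	assert(buf && len > 0 && n_layers >= 0);
	n = snprintf(buf, len, "mc#%u", mc);
	if (n < 0 || (size_t)n >= len)
		return -1;
	used = (size_t)n;
	for (i = 0; i < n_layers; i++) {
		n = snprintf(buf + used, len - used, "%s#%u",
			     layer_name[types[i]], pos[i]);
		if (n < 0 || (size_t)n >= len - used)
			return -1;	/* truncated */
		used += (size_t)n;
	}
	return (int)strlen(buf);
}
```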

Cc: Aristeu Rozanski <arozansk@redhat.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
ca0907b9e413bb1d1f3ea123b663535b74928846 02-May-2012 Mauro Carvalho Chehab <mchehab@redhat.com> edac: Remove the legacy EDAC ABI

Now that all drivers got converted to use the new ABI, we can
drop the old one.

Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
4275be63559719c3149b19751029f1b0f1b26775 18-Apr-2012 Mauro Carvalho Chehab <mchehab@redhat.com> edac: Change internal representation to work with layers

Change the EDAC internal representation to work with non-csrow
based memory controllers.

There are lots of those memory controllers nowadays, and more
are coming. So, the EDAC internal representation needs to be
changed, in order to work with those memory controllers, while
preserving backward compatibility with the old ones.

The edac core was written with the idea that memory controllers
are able to directly access csrows.

This is not true for FB-DIMM and RAMBUS memory controllers.

Also, some recent advanced memory controllers don't present a per-csrow
view. Instead, they see memories as DIMMs rather than as ranks.

So, change the allocation and error report routines to allow
them to work with all types of architectures.
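One common way to address DIMMs stored in a flat array by their per-layer position is row-major indexing. The sketch assumes that layout. struct layer_sketch and dimm_offset() are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one layer of the memory hierarchy. */
struct layer_sketch {
	const char *name;	/* e.g. "branch", "channel", "slot", "csrow" */
	unsigned int size;	/* number of positions in this layer */
};

/*
 * Row-major offset of a DIMM in a flat array, given its position in
 * each layer (outermost layer first).
 */
static size_t dimm_offset(const struct layer_sketch *layers,
			  const unsigned int *pos, int n_layers)
{
	size_t off = 0;
	int i;

	for (i = 0; i < n_layers; i++) {
		assert(pos[i] < layers[i].size);
		off = off * layers[i].size + pos[i];
	}
	return off;
}
```

With 2 branches, 2 channels and 4 slots, position (1, 0, 3) maps to entry 11 of 16. The same code handles a single csrow/channel pair of layers.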

This will allow the removal of several hacks with FB-DIMM and RAMBUS
memory controllers.

Also, several tests were done on different platforms using different
x86 drivers.

TODO: multi-rank DIMMs are currently represented by multiple DIMM
entries in struct dimm_info. That means that changing a label for one
rank won't change the same label for the other ranks of the same DIMM.
This bug has been present since the beginning of EDAC, so it is not a big
deal. However, on several drivers, it is possible to fix this issue, but
it should be a per-driver fix, as the csrow => DIMM arrangement may not
be the same for all. So, don't try to fix it here yet.

I tried to make this patch as short as possible, preceding it with
several other patches that simplified the logic here. Yet, as the
internal API changes, all drivers need changes. The changes are
generally bigger in the drivers for FB-DIMMs.

Cc: Aristeu Rozanski <arozansk@redhat.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: Mark Gross <mark.gross@intel.com>
Cc: Jason Uhlenkott <juhlenko@akamai.com>
Cc: Tim Small <tim@buttersideup.com>
Cc: Ranganathan Desikan <ravi@jetztechnologies.com>
Cc: "Arvind R." <arvino55@gmail.com>
Cc: Olof Johansson <olof@lixom.net>
Cc: Egor Martovetsky <egor@pasemi.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Joe Perches <joe@perches.com>
Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Hitoshi Mitake <h.mitake@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com>
Cc: Shaohui Xie <Shaohui.Xie@freescale.com>
Cc: Josh Boyer <jwboyer@gmail.com>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
93e4fe64ece4eccf0ff4ac69bceb389290b8ab7c 16-Apr-2012 Mauro Carvalho Chehab <mchehab@redhat.com> edac: rewrite edac_align_ptr()

The edac_align_ptr() function is used to prepare data for a single
memory allocation kzalloc() call. It counts how many bytes are needed
by some data structure.

Using it as-is is not that trivial, as the number of memory elements
to reserve is not passed to it; instead, it is only taken into account
at the next call.

In order to avoid mistakes when using it, move the number of allocated
elements into it, making it easier to use.
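Here is a hypothetical sketch of the single-allocation pattern with the element count passed in, as the commit describes. edac_align_ptr() derives the alignment from the object size. reserve() instead takes an explicit alignment. Both reserve() and the layout are assumptions. A first pass computes the offsets, then a single calloc() backs all of them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Reserve room for n_elems objects of `size` bytes at the running
 * offset *cursor, aligned to `align` (a power of two). Returns the
 * start offset and advances *cursor past the array.
 */
static size_t reserve(size_t *cursor, size_t size, size_t n_elems,
		      size_t align)
{
	size_t start;

	assert(align && (align & (align - 1)) == 0);
	start = (*cursor + align - 1) & ~(align - 1);
	*cursor = start + size * n_elems;
	return start;
}

/* Hypothetical layout: one header followed by two arrays. */
struct hdr { size_t n_a, n_b; };

static struct hdr *alloc_layout(size_t n_a, size_t n_b, long **a, short **b)
{
	size_t cursor = 0, off_a, off_b;
	char *base;

	reserve(&cursor, sizeof(struct hdr), 1, _Alignof(struct hdr));
	off_a = reserve(&cursor, sizeof(long), n_a, _Alignof(long));
	off_b = reserve(&cursor, sizeof(short), n_b, _Alignof(short));

	base = calloc(1, cursor);	/* one zeroed allocation for everything */
	if (!base)
		return NULL;
	*a = (long *)(base + off_a);
	*b = (short *)(base + off_b);
	((struct hdr *)base)->n_a = n_a;
	((struct hdr *)base)->n_b = n_b;
	return (struct hdr *)base;
}
```

A single free() of the returned header releases the whole block.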

Reviewed-by: Borislav Petkov <bp@amd64.org>
Cc: Aristeu Rozanski <arozansk@redhat.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
a895bf8b1e1ea4c032a8fa8a09475a2ce09fe77a 28-Jan-2012 Mauro Carvalho Chehab <mchehab@redhat.com> edac: move nr_pages to dimm struct

The number of pages is a dimm property. Move it to the dimm struct.

After this change, it is possible to add sysfs nodes for the DIMM's that
will properly represent the DIMM stick properties, including its size.

A TODO fix here is to properly represent dual-rank/quad-rank DIMMs when
the memory controller represents the memory via chip select rows.

Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Cc: Mark Gross <mark.gross@intel.com>
Cc: Jason Uhlenkott <juhlenko@akamai.com>
Cc: Tim Small <tim@buttersideup.com>
Cc: Ranganathan Desikan <ravi@jetztechnologies.com>
Cc: "Arvind R." <arvino55@gmail.com>
Cc: Olof Johansson <olof@lixom.net>
Cc: Egor Martovetsky <egor@pasemi.com>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Joe Perches <joe@perches.com>
Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Hitoshi Mitake <h.mitake@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com>
Cc: Shaohui Xie <Shaohui.Xie@freescale.com>
Cc: Josh Boyer <jwboyer@gmail.com>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
084a4fccef39ac7abb039511f32380f28d0b67e6 27-Jan-2012 Mauro Carvalho Chehab <mchehab@redhat.com> edac: move dimm properties to struct dimm_info

On systems based on chip select rows, all channels need to use memories
with the same properties, otherwise the memories on channels A and B
won't be recognized.

However, such assumption is not true for all types of memory
controllers.

Controllers for FB-DIMMs don't have such requirements.

Also, modern Intel controllers seem to be capable of handling such
differences.

So, we need to stop storing the DIMM information in per-csrow
data and store it, instead, in the right place.

The first step is to move grain, mtype, dtype and edac_mode to the
per-dimm struct.

Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
Reviewed-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: Mark Gross <mark.gross@intel.com>
Cc: Jason Uhlenkott <juhlenko@akamai.com>
Cc: Tim Small <tim@buttersideup.com>
Cc: Ranganathan Desikan <ravi@jetztechnologies.com>
Cc: "Arvind R." <arvino55@gmail.com>
Cc: Olof Johansson <olof@lixom.net>
Cc: Egor Martovetsky <egor@pasemi.com>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Joe Perches <joe@perches.com>
Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Hitoshi Mitake <h.mitake@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: James Bottomley <James.Bottomley@parallels.com>
Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com>
Cc: Shaohui Xie <Shaohui.Xie@freescale.com>
Cc: Josh Boyer <jwboyer@gmail.com>
Cc: Mike Williams <mike@mikebwilliams.com>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
a7d7d2e1a07e3811dc49af2962c940fd8bbb6c8f 27-Jan-2012 Mauro Carvalho Chehab <mchehab@redhat.com> edac: Create a dimm struct and move the labels into it

The way a DIMM is currently represented implies that it is
linked into a per-csrow struct. However, some drivers don't see
csrows, as they're hidden behind some chip, like the AMBs
on FB-DIMMs, for example.

This forced drivers to fake^Wvirtualize a csrow struct, and to make
a mess of the original csrow/channel concept.

Move the DIMM labels into a per-DIMM struct, and add there
the real location of the socket, in terms of csrow/channel.
Later patches will modify the location to properly represent the
memory architecture.

All other drivers will use a per-csrow type of location.
Some of those drivers will require a later conversion, as
they also fake the csrows internally.

TODO: While this patch doesn't change the existing behavior, on
csrows-based memory controllers, a csrow/channel pair points to a memory
rank. There's a known bug in the EDAC core that allows having different
labels for the same DIMM, if it has more than one rank. A later patch
is needed to merge the several ranks for a DIMM into the same dimm_info
struct, in order to avoid having different labels for the same DIMM.

The edac_mc_alloc() function will now contain a per-dimm initialization
loop that will be changed by later patches in order to match other types
of memory architectures.

Reviewed-by: Aristeu Rozanski <arozansk@redhat.com>
Reviewed-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Cc: Ranganathan Desikan <ravi@jetztechnologies.com>
Cc: "Arvind R." <arvino55@gmail.com>
Cc: "Niklas Söderlund" <niklas.soderlund@ericsson.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
a4b4be3fd7a76021f67380b03d8bccebf067db72 27-Jan-2012 Mauro Carvalho Chehab <mchehab@redhat.com> edac: rename channel_info to rank_info

What a csrow/channel vector points to is rank information, and
not channel information.

On a traditional architecture, the memory controller directly accesses the
memory ranks, via chip select rows. Different ranks on the same DIMM are
selected via different chip select rows. So, typically, one
csrow/channel pair means one different DIMM.

On FB-DIMMs, there's a microcontroller chip at the DIMM, called Advanced
Memory Buffer (AMB) that serves as the interface between the memory
controller and the memory chips.

The AMB selection is via the DIMM slot, and not via a csrow.

It is up to the AMB to talk with the csrows of the DRAM chips.

So, the FB-DIMM memory controllers see the DIMM slot, and not the DIMM
rank. RAMBUS is similar.

Newer memory controllers, like the ones found on Intel Sandy Bridge and
Nehalem, even when working with normal DDR3 DIMMs, don't use the usual
channel A/channel B interleaving scheme to provide 128-bit data access.

Instead, they have more channels (3 or 4 channels), and they can use
several interleaving schemes. Such memory controllers see the DIMMs
directly in their registers, instead of the ranks, which is better for
the driver, as its main usage is to point to a broken DIMM stick (the
Field Replaceable Unit), and not to point to a broken DRAM chip.

The drivers that support such newer memory architecture models
currently need to fake information and to abuse EDAC structures, as
the subsystem was conceived with the idea that the csrow would always be
visible by the CPU.

To make things a little worse, those drivers don't currently fake
csrows/channels in a consistent way, as the concepts there don't apply
to the memory controllers they're talking with. So, each driver author
interpreted the concepts using a different logic.

In order to fix it, let's rename the data structure that points into a
DIMM rank to "rank_info", in order to be clearer about what's stored
there.

Later patches will provide a better way to represent the memory
hierarchy for the other types of memory controller.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
4e5df7ca3091a846b65f2a940a68506790a62d6a 25-Nov-2011 Cong Wang <amwang@redhat.com> edac: remove the second argument of k[un]map_atomic()

Signed-off-by: Cong Wang <amwang@redhat.com>
fe5ff8b84c8b03348a2f64ea9d884348faec2217 15-Dec-2011 Kay Sievers <kay.sievers@vrfy.org> edac: convert sysdev_class to a regular subsystem

After all sysdev classes are ported to regular driver core entities, the
sysdev implementation will be entirely removed from the kernel.

Cc: Doug Thompson <dougthompson@xmission.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Lucas De Marchi <lucas.demarchi@profusion.mobi>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
e2e77098764636456ba7092a8b3b3b34b2a8e8d8 27-May-2011 Lai Jiangshan <laijs@cn.fujitsu.com> edac,rcu: use synchronize_rcu() instead of call_rcu()+rcu_barrier()

synchronize_rcu() does the stuff as needed.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
25985edcedea6396277003854657b5f3cb31a628 31-Mar-2011 Lucas De Marchi <lucas.demarchi@profusion.mobi> Fix common misspellings

Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
24f9a7fe3f19f3fd310f556364d01a22911724b3 07-Oct-2010 Borislav Petkov <borislav.petkov@amd.com> amd64_edac: Rework printk macros

Add a macro per printk level, shorten up error messages. Add relevant
information to KERN_INFO level. No functional change.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
bb31b3122c0dd07d2d958da17a50ad771ce79e2b 02-Dec-2010 Borislav Petkov <borislav.petkov@amd.com> EDAC: Fix workqueue-related crashes

00740c58541b6087d78418cebca1fcb86dc6077d changed edac_core to
un-/register a workqueue item only if a lowlevel driver supplies a
polling routine. Normally, when we remove a polling low-level driver, we
go and cancel all the queued work. However, the workqueue unreg happens
based on the ->op_state setting, and edac_mc_del_mc() sets this to
OP_OFFLINE _before_ we cancel the work item, leading to NULL ptr oops on
the workqueue list.

Fix it by putting the unreg stuff in proper order.
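Here is a simplified model of the ordering problem. The state names follow EDAC's op_state values, but the structure and functions are hypothetical. Teardown only cancels work for instances in polling mode. If the state is set to offline first, the cancel is skipped.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified operating states, named after the EDAC op_state values. */
enum op_state { OP_ALLOC, OP_RUNNING_POLL, OP_OFFLINE };

struct mci_sketch {
	enum op_state op_state;
	bool work_queued;	/* stands in for the delayed work item */
};

/* Only instances in polling mode own a work item to cancel. */
static void workq_teardown(struct mci_sketch *mci)
{
	assert(mci);
	if (mci->op_state != OP_RUNNING_POLL)
		return;
	mci->work_queued = false;	/* "cancel" the work */
}

/* Buggy order: going offline first makes teardown skip the cancel. */
static void del_mc_buggy(struct mci_sketch *mci)
{
	mci->op_state = OP_OFFLINE;
	workq_teardown(mci);
}

/* Fixed order: cancel the work while still in polling mode. */
static void del_mc_fixed(struct mci_sketch *mci)
{
	workq_teardown(mci);
	mci->op_state = OP_OFFLINE;
}
```

In the buggy order the work item is left queued after the instance goes away, which matches the dangling workqueue entry described above.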

Cc: <stable@kernel.org> #36.x
Reported-and-tested-by: Tobias Karnat <tobias.karnat@googlemail.com>
LKML-Reference: <1291201307.3029.21.camel@Tobias-Karnat>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
accf74fff36315a31dc7319dae2927af06e9296f 16-Aug-2010 Mauro Carvalho Chehab <mchehab@redhat.com> i7core_edac: don't use a freed mci struct

This is a nasty bug. Since the kobject count will be reduced to zero by
edac_mc_del_mc(), and this triggers the kobj release method, the
mci memory will be freed automatically. So, all we have left is ctl_name,
as shown by enabling debug:

[ 80.822186] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 1020: edac_remove_sysfs_mci_device() remove_link
[ 80.832590] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 1024: edac_remove_sysfs_mci_device() remove_mci_instance
[ 80.843776] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 640: edac_mci_control_release() mci instance idx=0 releasing
[ 80.855163] EDAC MC: Removed device 0 for i7core_edac.c i7 core #0: DEV 0000:3f:03.0
[ 80.862936] EDAC DEBUG: in drivers/edac/i7core_edac.c, line at 2089: (null): free structs
[ 80.871134] EDAC DEBUG: in drivers/edac/edac_mc.c, line at 238: edac_mc_free()
[ 80.878379] EDAC DEBUG: in drivers/edac/edac_mc_sysfs.c, line at 726: edac_mc_unregister_sysfs_main_kobj()
[ 80.888043] EDAC DEBUG: in drivers/edac/i7core_edac.c, line at 1232: drivers/edac/i7core_edac.c: i7core_put_devices()

Also, kfree(mci) shouldn't happen at the kobj.release, as it happens
when edac_remove_sysfs_mci_device() is called, but the logic is:
	edac_remove_sysfs_mci_device(mci);
	edac_printk(KERN_INFO, EDAC_MC,
		"Removed device %d for %s %s: DEV %s\n", mci->mc_idx,
		mci->mod_name, mci->ctl_name, edac_dev_name(mci));
So, as the edac_printk() needs the mci struct, this generates an OOPS.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
bbc560ae677c0f4d7ff8404a21409c99f35b297b 16-Aug-2010 Mauro Carvalho Chehab <mchehab@redhat.com> edac_core: Print debug messages at release calls

This is important to track down a nasty bug in the free logic.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
6fe1108f14f4f9581af97cab752f37dc8fa9fdec 12-Aug-2010 Mauro Carvalho Chehab <mchehab@redhat.com> edac_core: Do a better job with node removal

Make sure we remove groups in the right order.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
939747bd680eb09bb98792b17a5bfd2f525afe9d 10-Aug-2010 Mauro Carvalho Chehab <mchehab@redhat.com> i7core_edac: Be sure that the edac pci handler will be properly released

With multi-sockets, more than one edac pci handler is enabled. Be sure to
un-register all instances.

Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
00740c58541b6087d78418cebca1fcb86dc6077d 26-Sep-2010 Borislav Petkov <borislav.petkov@amd.com> amd64_edac: Fix driver module removal

f4347553b30ec66530bfe63c84530afea3803396 removed the edac polling
mechanism in favor of using a notifier chain for conveying MCE
information to edac. However, the module removal path didn't test
whether the driver had set up the polling function workqueue at all and
the rmmod process was hanging in the kernel at try_to_del_timer_sync()
in the cancel_delayed_work() path, trying to cancel an uninitialized
work struct.

Fix that by adding a balancing check to the workqueue removal path.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
239642fe19adc19ba0a69e96f3b1904dfd6a3b9f 12-Nov-2009 Borislav Petkov <borislav.petkov@amd.com> edac: add memory types strings for debugging

Instead of using deeply-nested conditionals for dumping the DIMM type in
debug mode, add a strings array of the supported DIMM types.

This is useful in cases where an EDAC driver supports multiple DRAM
types. The strings array is only defined in debug builds.
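Here is a sketch of the strings-array approach. It uses a hypothetical subset of memory types, not EDAC's real enum or strings. Designated initializers tie each string to its enum value. The lookup returns a placeholder for out-of-range or missing entries.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative subset; not the real EDAC memory type enum. */
enum mem_type_sketch { MT_EMPTY, MT_DDR, MT_DDR2, MT_FB_DDR2, MT_DDR3, MT_NR };

/* Designated initializers keep each string next to its enum value. */
static const char * const mem_type_names[] = {
	[MT_EMPTY]   = "Empty",
	[MT_DDR]     = "DDR",
	[MT_DDR2]    = "DDR2",
	[MT_FB_DDR2] = "FB-DDR2",
	[MT_DDR3]    = "DDR3",
};

/* Catch a new trailing enum value that was added without a string. */
static_assert(sizeof(mem_type_names) / sizeof(mem_type_names[0]) == MT_NR,
	      "mem_type_names out of sync with enum mem_type_sketch");

static const char unknown_mem_type[] = "Unknown";

/* Replaces a chain of nested conditionals with a table lookup. */
static const char *mem_type_name(enum mem_type_sketch t)
{
	if ((size_t)t >= MT_NR || !mem_type_names[t])
		return unknown_mem_type;
	return mem_type_names[t];
}
```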

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
458e5ff13e1bed050990d97e9aa55bcdafc951a7 24-Sep-2009 Jesper Dangaard Brouer <hawk@comx.dk> edac: core: remove completion-wait for complete with rcu_barrier

Module edac_core.ko uses call_rcu() callbacks in edac_device.c, edac_mc.c
and edac_pci.c.

They all use a wait_for_completion() scheme, but this scheme is not 100%
safe on multiple CPUs. See the _rcu_barrier() implementation, which
explains why extra precaution is needed.

The patch adds a comment about rcu_barrier() and, as a precaution, calls
rcu_barrier(). A maintainer needs to look at removing the
wait_for_completion code.

[dougthompson@xmission.com: remove the wait_for_completion code]
Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
fbeb4384748abb78531bbe1e80d627412a0abcfa 13-Apr-2009 Jean Delvare <khali@linux-fr.org> edac: use to_delayed_work()

The edac-core driver includes code which assumes that the work_struct
which is included in every delayed_work is the first member of that
structure. This is currently the case but might change in the future, so
use to_delayed_work() instead, which doesn't make such an assumption.
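The idea behind to_delayed_work() can be shown in userspace with a container_of() built on offsetof(). The structures are simplified stand-ins. The work member is placed second on purpose, to show where a plain cast would go wrong. The helper is only named after the kernel one and is a sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-ins for the kernel structures, for illustration only. */
struct work_struct { int pending; };

/* The work member is deliberately not first: the case a cast gets wrong. */
struct delayed_work {
	long expires;
	struct work_struct work;
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static struct delayed_work *to_delayed_work(struct work_struct *work)
{
	assert(work);
	return container_of(work, struct delayed_work, work);
}
```

A cast such as (struct delayed_work *)work is only correct while work happens to sit at offset zero. The offsetof() subtraction stays correct if the layout changes.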

linux-2.6.30-rc1 has the to_delayed_work() function that will allow this
patch to work.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
281efb17d88a91dc3b879bb1d49e3a66daf48797 06-Jan-2009 Kay Sievers <kay.sievers@vrfy.org> edac: struct device: replace bus_id with dev_name(), dev_set_name()

This patch is part of a larger patch series which will remove the "char
bus_id[20]" name string from struct device. The device name is managed in
the kobject anyway, and without any size limitation, and just needlessly
copied into "struct device".

[akpm@linux-foundation.org: coding-style fixes]
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
17aa7e034416e3080bc57a786d09ba0a4a044561 05-May-2008 Stephen Rothwell <sfr@canb.auug.org.au> dev_name introduction fall out fix

Commit 06916639e2fed9ee475efef2747a1b7429f8fe76 ("driver-core: add
dev_name() to help transition away from using bus_id") added a static
inline dev_name() and used it in dev_printk.

Unfortunately, drivers/edac/edac_core.h defines a macro called
dev_name(). Rename the latter.

Diagnosis by Tony Breeds and Michael Ellerman.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
1a45027d1afd7e85254b5ef8535e93ce3d588cf4 29-Apr-2008 Adrian Bunk <bunk@kernel.org> edac: remove unneeded functions and add static accessor

Collection of patches, merged into one, from Adrian that do the following:

1) This patch makes the following needlessly global functions static:
- edac_pci_get_log_pe()
- edac_pci_get_log_npe()
- edac_pci_get_panic_on_pe()
- edac_pci_unregister_sysfs_instance_kobj()
- edac_pci_main_kobj_setup()

2) Remove unneeded function edac_device_find()

3) Added #if 0 around function edac_pci_find()

4) make the needlessly global edac_pci_generic_check() static

5) Removed function edac_check_mc_devices()

Doug Thompson modified Adrian's patches to better represent
the direction of EDAC, and made them one patch.

Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ff6ac2a616c85d1215899ffda815e29b699cbd3a 29-Apr-2008 Robert P. J. Day <rpjday@crashcourse.ca> edac: use the shorter LIST_HEAD for brevity

Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Acked-by: Doug Thompson <norsk5@yahoo.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
bce19683c17485b584b62b984d6dcf5332181588 26-Jul-2007 Doug Thompson <dougthompson@xmission.com> drivers/edac: fix reset edac_mc pollmsec

This fixes a deadlock that could occur on a 'setup' and 'teardown' sequence of
the workq for an edac_mc control structure instance. A similar fix was
previously implemented for the edac_device code.

In addition, the edac_mc device code was missing code to allow the workq
period value to be altered via sysfs control.

This patch adds that fix to the code, and allows the period value to be
changed as well.

Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
bf52fa4a26567bfbf5b1d30f84cf0226e61d26cd 19-Jul-2007 Doug Thompson <dougthompson@xmission.com> drivers/edac: fix workq reset deadlock

Fix mutex locking deadlock on the device controller linked list. The code was
taking a lock and then calling a function that could take the same lock. Moved
the cancel workq function outside the lock.

Added some short circuit logic in the workq code

Added comments of description

Code tidying

Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Cc: Greg KH <greg@kroah.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8096cfafbb7ad3cb1a286ae7e8086167f4ebb4b6 19-Jul-2007 Doug Thompson <dougthompson@xmission.com> drivers/edac: fix edac_mc sysfs completion code

This patch refactors the 'releasing' of kobjects for the edac_mc type of
device. The correct pattern of kobject release is followed.

As internal kobjs are allocated they bump a ref count on the top level kobj.
It in turn has a module ref count on the edac_core module. When internal
kobjects are released, they dec the ref count on the top level kobj. When the
top level kobj reaches zero, it decrements the ref count on the edac_core
object, allowing it to be unloaded, as all resources have now been released.
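Here is a toy model of the reference chain described above. Plain integer counters stand in for kobjects and module reference counts, and all names are hypothetical. Internal objects pin the top-level object. The first reference on the top-level object pins the core module, and the last one released unpins it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the module, top-level and internal kobjects. */
struct module_sketch { int refs; };

struct top_obj {
	int refs;
	struct module_sketch *owner;
	bool released;
};

struct child_obj { struct top_obj *parent; };

static void top_get(struct top_obj *t)
{
	if (t->refs++ == 0)	/* first reference pins the core module */
		t->owner->refs++;
}

static void top_put(struct top_obj *t)
{
	assert(t->refs > 0);
	if (--t->refs == 0) {	/* last reference: release, unpin module */
		t->released = true;
		t->owner->refs--;
	}
}

/* Each internal object holds a reference on the top-level one. */
static void child_init(struct child_obj *c, struct top_obj *parent)
{
	c->parent = parent;
	top_get(parent);
}

static void child_release(struct child_obj *c)
{
	top_put(c->parent);
	c->parent = NULL;
}
```

The module count drops to zero only after every internal object and the registration reference have been released.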

Cc: Alan Cox alan@lxorguk.ukuu.org.uk
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Acked-by: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
b8f6f9755248026f21282e25cac49a1af698056c 19-Jul-2007 Doug Thompson <dougthompson@xmission.com> drivers/edac: fix edac_mc init apis

Refactoring of sysfs code necessitated the refactoring of the edac_mc_alloc()
and edac_mc_add_mc() APIs, moving the index value to the alloc() function.
This patch alters the in-tree drivers to use this new API signature.

Assigning the index value later created a chicken-and-egg issue.
Moving it to the alloc() function allows for creating the necessary sysfs
entries with the proper index number.

Cc: Alan Cox alan@lxorguk.ukuu.org.uk
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7391c6dcab3094610cb99bbd559beaa282582eac 19-Jul-2007 Douglas Thompson <dougthompson@xmission.com> drivers/edac: mod edac_align_ptr function

Refactor the edac_align_ptr() function to reduce the noise of casting the
aligned pointer to the various types of data objects, and modify its callers
to match its new signature.

Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
052dfb45ccb5ea354a426b52556bcfee75b9d2f5 19-Jul-2007 Douglas Thompson <dougthompson@xmission.com> drivers/edac: cleanup spaces-gotos after Lindent messup

This patch fixes some remnant spaces inserted by the use of Lindent.
It seems Lindent adds some spaces when it shouldn't. These have been fixed.
In addition, goto targets had issues; these have been fixed
in this patch.

Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
86aa8cb7bc47fe786df073246055d69d98e6330a 19-Jul-2007 Douglas Thompson <dougthompson@xmission.com> drivers/edac: cleanup workq ifdefs

The origin of this code comes from patches at sourceforge, that
allow EDAC to be updated to various kernels. With kernel version 2.6.20 a
new workq system was installed, thus the patches needed to be modified
based on the kernel version. For submission to the latest kernel.org,
those #ifdefs are removed.

Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
079708b9173595bf74b31b14c36e946359ae6c7e 19-Jul-2007 Douglas Thompson <dougthompson@xmission.com> drivers/edac: core Lindent cleanup

Run the EDAC CORE files through Lindent for cleanup

Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Signed-off-by: Dave Jiang <djiang@mvista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4de78c6877ec21142582ac19453c2d453d1ea298 19-Jul-2007 Dave Jiang <djiang@mvista.com> drivers/edac: mod PCI poll names

Fix up poll values for MC and PCI.
Also make MC function names unique to MC.

Signed-off-by: Dave Jiang <djiang@mvista.com>
Signed-off-by: Douglas Thompson <dougthompson@xmissin.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
66ee2f940ac8ab25f0c43a1e717d25dc46bfe74d 19-Jul-2007 Dave Jiang <djiang@mvista.com> drivers/edac: mod assert_error check

Change the error check-and-clear variable from an atomic to an int.

Signed-off-by: Dave Jiang <djiang@mvista.com>
Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
81d87cb13e367bb804bf44889ae0de7369705d6c 19-Jul-2007 Dave Jiang <djiang@mvista.com> drivers/edac: mod MC to use workq instead of kthread

Move the memory controller object from a kernel-thread-based implementation
to a workqueue-based one.

Signed-off-by: Dave Jiang <djiang@mvista.com>
Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
c4192705fec85219086231a1c0fa61e8776e2c3b 19-Jul-2007 Dave Jiang <djiang@mvista.com> drivers/edac: add dev_name getter function

Replace the dev_name() macro with a more generic interface, since it is not
easy to determine whether a device is a PCI, platform, or of_device.

Now each low-level driver sets the name in the control structure, and
the EDAC core references the control structure for that information.

Better abstraction.

Signed-off-by: Dave Jiang <djiang@mvista.com>
Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
20bcb7a81dee21bfa3408f03f46b2891c9b5c84b 19-Jul-2007 Douglas Thompson <dougthompson@xmission.com> drivers/edac: mod use edac_core.h

In the refactoring of edac_mc.c into several subsystem files,
the header file edac_mc.h became meaningless. A new header file
edac_core.h was created. All the files that previously included
"edac_mc.h" are changed to include "edac_core.h".

Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
c0d121720220584bba2876b032e58a076b843fa1 19-Jul-2007 Dave Jiang <djiang@mvista.com> drivers/edac: add new nmi rescan

Provide a way for NMI-reported errors on x86 to notify the EDAC
subsystem of pending ECC errors by writing to a software state variable.

Here's the reworked patch. I added an EDAC stub to the kernel so we can
have variables that are in the kernel even if EDAC is a module. I also
implemented the idea of using the chip driver to select the error detection
mode via a module parameter, and eliminated the kernel compile option.
Please review/test. Thx!

Also, I only made changes to some of the chipset drivers since I am
unfamiliar with the other ones. We can add similar changes as we go.

Signed-off-by: Dave Jiang <djiang@mvista.com>
Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
63b7df9101895d1f0a259c567b3bab949a23075f 19-Jul-2007 Matthias Kaehlcke <matthias.kaehlcke@gmail.com> drivers/edac: change from semaphore to mutex operation

The EDAC core code uses a semaphore as a mutex. Use the mutex API
instead of the (binary) semaphore.

Matthias wrote this, but since I had some patches ahead of it,
I needed to modify it to apply on top of my patches.

Signed-off-by: Matthias Kaehlcke <matthias.kaehlcke@gmail.com>
Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
e27e3dac651771fe3250f6305dee277bce29fc5d 19-Jul-2007 Douglas Thompson <dougthompson@xmission.com> drivers/edac: add edac_device class

This patch adds the new 'class' of object to be managed, named: 'edac_device'.

As a peer of the 'edac_mc' class of object, it provides a non-memory-centric
view of an ERROR DETECTING device in hardware. It provides a sysfs interface
and an abstraction for various EDAC type devices.

Multiple 'instances' within the class are possible, with each 'instance'
able to have multiple 'blocks', and each 'block' having 'attributes'.

At the 'block' level there are the 'ce_count' and 'ue_count' fields,
which the device driver can update directly or by calling the
edac_device_handle_XX() functions. Each higher level has additional 'total'
count fields, which are the sum of the counts below that level.

This 'edac_device' has been used to capture and present ECC errors
found in the L1 and L2 caches of a system on a per-CORE/CPU basis.

Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7c9281d76c1c0b130f79d5fc021084e9749959d4 19-Jul-2007 Douglas Thompson <dougthompson@xmission.com> drivers/edac: split out functions to unique files

This is a large patch to refactor the original EDAC module in the kernel
and to break it up into better file granularity, such that each source
file contains a given subsystem of the EDAC CORE.

Originally, the EDAC 'core' was contained in one source file, edac_mc.c,
with its corresponding edac_mc.h file.

Now, there are the following files:

edac_module.c     The main module init/exit function and other overhead
edac_mc.c         Code handling the edac_mc class of object
edac_mc_sysfs.c   Code handling for sysfs presentation
edac_pci_sysfs.c  Code handling for PCI sysfs presentation
edac_core.h       CORE .h include file for 'edac_mc' and 'edac_device' drivers
edac_module.h     Internal CORE .h include file

This forms a foundation upon which a later patch can create the 'edac_device'
class of object code in a new file 'edac_device.c'.

Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2da1c119fd999cb834b4fe0c1a5a8c36195df1cb 19-Jul-2007 Adrian Bunk <bunk@stusta.de> drivers/edac: core: make functions static

This patch makes needlessly global code in the EDAC core static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Cc: Doug Thompson <norsk5@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5da0831c598f94582bce6bb0a55b8de2f9897cb1 19-Jul-2007 Douglas Thompson <dougthompson@xmission.com> drivers/edac: add edac_mc_find API

This simple patch adds an important CORE API for EDAC that EDAC drivers can
use to find their edac_mc control structure (a mem_ctl_info) by passing its
'instance' value.

Needed for subsequent patches.

Signed-off-by: Douglas Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
831441862956fffa17b9801db37e6ea1650b0f69 17-Jul-2007 Rafael J. Wysocki <rjw@sisk.pl> Freezer: make kernel threads nonfreezable by default

Currently, the freezer treats all tasks as freezable, except for the kernel
threads that explicitly set the PF_NOFREEZE flag for themselves. This
approach is problematic, since it requires every kernel thread to either
set PF_NOFREEZE explicitly, or call try_to_freeze(), even if it doesn't
care for the freezing of tasks at all.

It seems better to only require the kernel threads that want to or need to
be frozen to use some freezer-related code and to remove any
freezer-related code from the other (nonfreezable) kernel threads, which is
done in this patch.

The patch causes all kernel threads to be nonfreezable by default (i.e., to
have PF_NOFREEZE set by default) and introduces the set_freezable()
function that should be called by the freezable kernel threads in order to
unset PF_NOFREEZE. It also makes all of the currently freezable kernel
threads call set_freezable(), so it shouldn't cause any (intentional)
change of behaviour to appear. Additionally, it updates documentation to
describe the freezing of tasks more accurately.

[akpm@linux-foundation.org: build fixes]
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Nigel Cunningham <nigel@nigel.suspend2.net>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
9794f33ddedd878dd92fcf8b4834391840366919 12-Feb-2007 eric wollesen <ericw@xmtp.net> [PATCH] EDAC: Add Fully-Buffered DIMM APIs to core

Eric Wollesen ported the Bluesmoke Memory Controller driver for the Intel
5000X/V/P (Blackford/Greencreek) chipset to the in kernel EDAC model.

This patch incorporates the required changes to the edac_mc.c and edac_mc.h
core files by adding a new Fully Buffered DIMM interface to the EDAC core module.

Signed-off-by: eric wollesen <ericw@xmtp.net>
Signed-off-by: doug thompson <norsk5@xmission.com>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4f423ddf56e5ecb1fb2eac83b8e228e3d0aae0f6 12-Feb-2007 Frithiof Jensen <frithiof.jensen@ericson.com> [PATCH] EDAC: Add memory scrubbing controls API to core

This is an attempt to provide an interface for memory scrubbing control in
EDAC.

This patch modifies the EDAC core to provide the interface for memory
controller modules to implement.

The following things are still outstanding:

- K8 is the first implementation,

The patch provides a method of configuring the K8 hardware memory scrubber
via the 'mcX' sysfs directory. There should be some fallback to a generic
scrubber implemented in software if the hardware does not support
scrubbing.

Or .. the scrubbing sysfs entry should not be visible at all.

- Only works with SDRAM, not cache,

The K8 can scrub cache and l2cache also - but I think this is not so
useful as the cache is busy all the time (one hopes).

One would also expect that cache scrubbing requires hardware support.

- Error Handling,

I would like errors to be returned to the user in "terms of file
system".

- Presentation,

I chose bandwidth in bytes/second as a representation of the scrubbing
rate for the following reasons:

I like that the sysfs entries are sort-of textual, related to something
that makes sense instead of magical values that must be looked up.

"My People" wants "% main memory scrubbed per hour"; others prefer "%
memory bandwidth used" as the representation. "Bandwidth used" makes it easy
to calculate both versions in one-liner scripts.

If one later wants to scrub cache, the scaling becomes weird for K8,
changing from "blocks of 64 byte memory" to "blocks of 64 cache lines" to
"blocks of 64 bit". Using "bandwidth used" makes sense in all three cases
(I.M.O. anyway ;-).

- Discovery,

There is no way to discover the possible settings and what they do
without reading the code and the documentation.

*I* do not know how to make that work in a practical way.

- Bugs(??),

Other tools can set invalid values in the memory scrub control register;
those will read back as '-1', requiring the user to reset the scrub rate.
This is how *I* think it should be.

- Afflicting other areas of code,

I made changes to edac_mc.c and edac_mc.h which will show up globally.
This is not nice; it would be better if the memory scrubbing functionality
and interface could be entirely contained within the memory controller it
applies to.

Frithiof Jensen

edac_mc.c and its .h file form a CORE helper module for EDAC
driver modules. This provides the abstraction for device-specific
drivers. It is fine to modify this CORE to provide help for
new features of the drivers.

doug thompson

Signed-off-by: Frithiof Jensen <frithiof.jensen@ericson.com>
Signed-off-by: doug thompson <norsk5@xmission.com>
Acked-by: Alan Cox <alan@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7dfb71030f7636a0d65200158113c37764552f93 07-Dec-2006 Nigel Cunningham <ncunningham@linuxmail.org> [PATCH] Add include/linux/freezer.h and move definitions from sched.h

Move process freezing functions from include/linux/sched.h to freezer.h, so
that modifications to the freezer or the kernel configuration don't require
recompiling just about everything.

[akpm@osdl.org: fix ueagle driver]
Signed-off-by: Nigel Cunningham <nigel@suspend2.net>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
77d6e1397a004c9376fed855e4164ca2b1dba2ed 03-Nov-2006 Akinobu Mita <akinobu.mita@gmail.com> [PATCH] edac_mc: fix error handling

Call sysdev_class_unregister() on failure in edac_sysfs_memctrl_setup()
and reduce the indentation level for clearer logic.

Acked-by: Doug Thompson <norsk5@xmission.com>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
49c0dab7e6000888b616bedcbbc8cd4710331610 10-Jul-2006 Doug Thompson <norsk5@xmission.com> [PATCH] Fix and enable EDAC sysfs operation

When EDAC was first introduced into the kernel it had a sysfs interface,
but due to some problems it was disabled in 2.6.16 and remained disabled in
2.6.17.

Several of the control and attribute files of that interface received
good constructive feedback. The PCI blacklist/whitelist was a major set
that had design issues, and it has been removed in this patch. Rather
than being stored in EDAC, the PCI broken parity status has been moved
into the pci_dev structure itself by a previous PCI patch. A future patch
will enable that feature in EDAC by utilizing the pci_dev info.

The sysfs is now enabled in this patch, with a minimal set of control and
attribute files for examining EDAC state and for enabling/disabling the
memory and PCI operations.

The Documentation for EDAC has also been updated to reflect the new state
of EDAC operation.

Signed-off-by:Doug Thompson <norsk5@xmisson.com>
Cc: Greg KH <greg@kroah.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2d7bbb91c8df26c60d223205a087507430024177 30-Jun-2006 Doug Thompson <norsk5@xmission.com> [PATCH] EDAC: mc numbers refactor 1-of-2

Remove add_mc_to_global_list(). In the next patch, this function will be
reimplemented with different semantics.

1. Reimplement add_mc_to_global_list() with semantics that allow the caller to
determine the ID number for a mem_ctl_info structure. Then modify
edac_mc_add_mc() so that the caller specifies the ID number for the new
mem_ctl_info structure. Platform-specific code should be able to assign the
ID numbers in a platform-specific manner. For instance, on Opteron it makes
sense to have the ID of the mem_ctl_info structure match the ID of the node
that the memory controller belongs to.

2. Modify callers of edac_mc_add_mc() so they use the new semantics.

Signed-off-by: Doug Thompson <norsk5@xmission.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
37f04581abac20444e5b7106c1e1f28bec5b989c 30-Jun-2006 Doug Thompson <norsk5@xmission.com> [PATCH] EDAC: PCI device to DEVICE cleanup

Change MC drivers from using CVS revision strings for their version numbers;
now each driver has its own local string.

Remove some PCI dependencies from the core EDAC module. Make the code
'struct device'-centric instead of 'struct pci_dev'-centric. Most of the code
changes here are from a patch by Dave Jiang. It may be best to eventually move
the PCI-specific code into a separate source file.

Signed-off-by: Doug Thompson <norsk5@xmission.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
6ab3d5624e172c553004ecc862bfeac16d9d68b7 30-Jun-2006 Jörn Engel <joern@wohnheim.fh-wedel.de> Remove obsolete #include <linux/config.h>

Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
7f927fcc2fd1575d01efb4b76665975007945690 28-Mar-2006 Alexey Dobriyan <adobriyan@gmail.com> [PATCH] Typo fixes

Fix a lot of typos. Eyeballed by jmc@ in OpenBSD.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
9110540f7f2bbcc3577d2580a696fbb7af68c892 26-Mar-2006 Dave Peterson <dsp@llnl.gov> [PATCH] EDAC: use EXPORT_SYMBOL_GPL

Change all instances of EXPORT_SYMBOL() in the core EDAC module to
EXPORT_SYMBOL_GPL().

Signed-off-by: David S. Peterson <dsp@llnl.gov>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
e7ecd8910293564d357dbaf18eb179e06fa35fd0 26-Mar-2006 Dave Peterson <dsp@llnl.gov> [PATCH] EDAC: formatting cleanup

Cosmetic indentation/formatting cleanup for EDAC code. Make sure we
are using tabs rather than spaces to indent, etc.

Signed-off-by: David S. Peterson <dsp@llnl.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
54933dddc3e8ccd9db48966d8ada11951cb8a558 26-Mar-2006 Dave Peterson <dsp@llnl.gov> [PATCH] EDAC: reorder EXPORT_SYMBOL macros

Fix EDAC code so EXPORT_SYMBOL comes after the function that is being
exported. This is to maintain consistency with the rest of the kernel.

Signed-off-by: David S. Peterson <dsp@llnl.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
18dbc337af5d6efd30cb9291e74722c8ad134fd3 26-Mar-2006 Dave Peterson <dsp@llnl.gov> [PATCH] EDAC: protect memory controller list

- Fix code so we always hold mem_ctls_mutex while we are stepping
through the list of mem_ctl_info structures. Otherwise bad things
may happen if one task is stepping through the list while another
task is modifying it. We may eventually want to use reference
counting to manage the mem_ctl_info structures. In the meantime we
may as well fix this bug.

- Don't disable interrupts while we are walking the list of
mem_ctl_info structures in check_mc_devices(). This is unnecessary.

Signed-off-by: David S. Peterson <dsp@llnl.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
472678ebd30d87cbe8d97562dcc0e46d1076040f 26-Mar-2006 Dave Peterson <dsp@llnl.gov> [PATCH] EDAC: kobject/sysfs fixes

- After we unregister a kobject, wait for our kobject release method
to call complete(). This causes us to wait until the kobject
reference count reaches 0. Otherwise, a task accessing the EDAC
sysfs interface can hold the reference count above 0 until after the
EDAC module has been unloaded. When the reference count finally
drops to 0, this will result in an attempt to call our release
method inside the EDAC module after the module has already been
unloaded.

This isn't the best fix, since a process can get stuck sleeping forever
uninterruptibly if the user does the following:

rmmod my_module < /sys/my_sysfs/file

I'll go back and implement a better fix later. However this should
be ok for now.

- Call edac_remove_sysfs_mci_device() from edac_mc_del_mc() rather
than from edac_mc_free(). Since edac_mc_add_mc() calls
edac_create_sysfs_mci_device(), edac_mc_del_mc() should call
edac_remove_sysfs_mci_device().

Signed-off-by: David S. Peterson <dsp@llnl.gov>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
6e5a8748507dea83386d1d76c58aeaed1ff5a1ec 26-Mar-2006 Dave Peterson <dsp@llnl.gov> [PATCH] EDAC: kobject_init/kobject_put fixes

- Remove calls to kobject_init(). These are unnecessary because
kobject_register() calls kobject_init().

- Remove extra calls to kobject_put(). When we call
kobject_unregister(), this releases our reference to the kobject.
The extra calls to kobject_put() may cause the reference count to
drop to 0 while a kobject is still in use.

Signed-off-by: David S. Peterson <dsp@llnl.gov>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
028a7b6d3d9fa2cc41d76d45575345cca8d00a4c 26-Mar-2006 Dave Peterson <dsp@llnl.gov> [PATCH] EDAC: edac_mc_add_mc fix [2/2]

This is part 2 of a 2-part patch set.

Fix edac_mc_add_mc() so it cleans up properly if the call to
edac_create_sysfs_mci_device() fails.

Signed-off-by: David S. Peterson <dsp@llnl.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
a1d03fcc1399b1e23922bcc3af1772b128aa6e93 26-Mar-2006 Dave Peterson <dsp@llnl.gov> [PATCH] EDAC: edac_mc_add_mc fix [1/2]

This is part 1 of a 2-part patch set. The code changes are split into
two parts to make the patches more readable.

Move complete_mc_list_del() and del_mc_from_global_list() so we can
call del_mc_from_global_list() from edac_mc_add_mc() without forward
declarations. Perhaps using forward declarations would be better?
I'm doing things this way because the rest of the code is missing
them.

Signed-off-by: David S. Peterson <dsp@llnl.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
749ede57443b2a7ede2db105145f21047efcea6a 26-Mar-2006 Dave Peterson <dsp@llnl.gov> [PATCH] EDAC: cleanup code for clearing initial errors

Fix xxx_probe1() functions so they call xxx_get_error_info() functions
to clear initial errors. This is simpler and cleaner than duplicating
the low-level code for accessing PCI config space.

Signed-off-by: David S. Peterson <dsp@llnl.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
537fba28928c01b7db1580627450691a4bb0b9b3 26-Mar-2006 Dave Peterson <dsp@llnl.gov> [PATCH] EDAC: printk cleanup

This implements the following idea:

On Monday 30 January 2006 19:22, Eric W. Biederman wrote:
> One piece missing from this conversation is the issue that we need errors
> in a uniform format. That is why edac_mc has helper functions.
>
> However there will always be errors that don't fit any particular model.
> Could we add an edac_printk(dev, ); That is similar to dev_printk but
> prints out an EDAC header and the device on which the error was found?
> Letting the rest of the string be user specified.
>
> For actual control that interface may be too blunt, but at least for people
> looking in the logs it allows all of the errors to be detected and
> harvested.

Signed-off-by: David S. Peterson <dsp@llnl.gov>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
f2fe42abbf0d99a8c4b96f1cc55db10ac35d2fb9 26-Mar-2006 Dave Peterson <dsp@llnl.gov> [PATCH] EDAC: switch to kthread_ API

This patch was originally posted by Christoph Hellwig (see
http://lkml.org/lkml/2006/2/14/331):

"Christoph Hellwig" <hch@lst.de> wrote:
> Use the kthread_ API instead of opencoding lots of hairy code for kernel
> thread creation and teardown, including tasklist_lock abuse.
>

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David S. Peterson <dsp@llnl.gov>
Cc: <dave_peterson@pobox.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
ceb2ca9cb0bfd885127fa9a2c27127b3fe1c8f28 14-Mar-2006 Dave Peterson <dsp@llnl.gov> [PATCH] EDAC: disable sysfs interface

- Disable the EDAC sysfs code. The sysfs interface that EDAC presents to
user space needs more thought, and is likely to change substantially.
Therefore disable it for now so users don't start depending on it in its
current form.

- Disable the default behavior of calling panic() when an uncorrectable
error is detected (since for now, there is no sysfs interface that allows
the user to configure this behavior).

Signed-off-by: David S. Peterson <dsp@llnl.gov>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
4136cabff33d6d73b8daf2f2612670cc0296f844 11-Mar-2006 Arjan van de Ven <arjan@linux.intel.com> [PATCH] edac: disable a few sysfs files to avoid them becoming an ABI

Disable (via ugly #if 0's) the 3 sysfs files that I think by now we all
agree are very much wrong. These files shouldn't become part of the ABI by
the 2.6.16 release, so I would rather have this minimal patch merged to
disable them for now; the real fix can then come during the 2.6.17 devel window.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
353368dffb56b066cbe00264581a56caf0241b29 03-Feb-2006 Eric W. Biederman <ebiederm@xmission.com> [PATCH] edac_mc: Remove include of version.h

By including version.h, edac_mc was rebuilt on every incremental build,
which defeats the point of incremental builds.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
da9bb1d27b21cb24cbb6a2efb5d3c464d357a01e 19-Jan-2006 Alan Cox <alan@lxorguk.ukuu.org.uk> [PATCH] EDAC: core EDAC support code

This is a subset of the bluesmoke project core code, stripped of the NMI work
which isn't ready to merge and some of the "interesting" proc functionality
that needs reworking or just has no place in kernel. It requires no core
kernel changes except the added scrub functions already posted.

The goal is to merge further functionality only after the core code is
accepted and proven in the base kernel, and only at the point the upstream
extras are really ready to merge.

From: doug thompson <norsk5@xmission.com>

This converts EDAC to sysfs and is the final chunk necessary before EDAC
has a stable user space API and can be considered for submission into the
base kernel.

Signed-off-by: Alan Cox <alan@redhat.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: doug thompson <norsk5@xmission.com>
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>