History log of /drivers/char/mem.c
Revision Date Author Comments
f945a126e264b7a570b8b358652c0b61efacaab2 29-Apr-2008 Robert Love <rlove@google.com> Make /dev/mem configurable, as we don't want it.

Signed-off-by: Brian Swetland <swetland@google.com>
13ba33e89991f6c020a36cfac0001dd54281e67c 18-Aug-2014 Al Viro <viro@zeniv.linux.org.uk> switch /dev/zero and /dev/full to ->read_iter()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
08d2d00b291ed4eb91530050274e67a761c1901d 30-Jan-2014 Petr Tesarik <ptesarik@suse.cz> /dev/mem: handle out-of-bounds read/write

The loff_t type may be wider than phys_addr_t (e.g. on 32-bit systems).
Consequently, the file offset may be truncated in the assignment.
Currently, /dev/mem wraps around, which may cause applications to read
or write incorrect regions of memory by accident.

Let's follow POSIX file semantics here and return 0 when reading from
and -EFBIG when writing to an offset that cannot be represented by a
phys_addr_t.

Note that the conditional is optimized out by the compiler if loff_t
has the same size as phys_addr_t.

Signed-off-by: Petr Tesarik <ptesarik@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
869a84e1ca163b737236dae997db4a6a1e230b9b 22-Jan-2014 Grygorii Strashko <grygorii.strashko@ti.com> mm/memblock: remove unnecessary inclusions of bootmem.h

Clean-up to remove depedency with bootmem headers.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Reviewed-by: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Paul Walmsley <paul@pwsan.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Tony Lindgren <tony@atomide.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
a11edb59a05d8d5195419bd1fc28d82752324158 04-Jul-2013 Zhang Yanfei <zhangyanfei@cn.fujitsu.com> /dev/oldmem: Remove the interface

/dev/oldmem provides the interface for us to access the "old memory" in
the dump-capture kernel. Unfortunately, no one actually uses this
interface.

And this interface could actually cause some real problems if used on ia64
where the cached/uncached accesses are mixed. See the discussion from the
link: https://lkml.org/lkml/2013/4/12/386.

So Eric suggested that we should remove /dev/oldmem as an unused piece of
code.

[akpm@linux-foundation.org: mention /dev/oldmem obsolescence in devices.txt]
Suggested-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Dave Hansen <dave@sr71.net>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Matt Fleming <matt.fleming@intel.com>
Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
71811f3271cd986e223be44830e5961056561ac3 05-Jun-2013 Rasmus Villemoes <mail@rasmusvillemoes.dk> drivers: char: mem: use IS_ERR_VALUE() in memory_lseek()

Use IS_ERR_VALUE() instead of comparing the new offset to a hard-coded
value of -MAX_ERRNO (which is also off-by-one due to the use of ~
instead of -).

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
a27bb332c04cec8c4afd7912df0dc7890db27560 08-May-2013 Kent Overstreet <koverstreet@google.com> aio: don't include aio.h in sched.h

Faster kernel compiles by way of fewer unnecessary includes.

[akpm@linux-foundation.org: fix fallout]
[akpm@linux-foundation.org: fix build]
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Cc: Zach Brown <zab@redhat.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
162934de51e0271f6e2955075735656ea5092ea9 08-May-2013 Zach Brown <zab@redhat.com> char: add aio_{read,write} to /dev/{null,zero}

These are handy for measuring the cost of the aio infrastructure with
operations that do very little and complete immediately.

Signed-off-by: Zach Brown <zab@redhat.com>
Signed-off-by: Kent Overstreet <koverstreet@google.com>
Cc: Felipe Balbi <balbi@ti.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Asai Thambi S P <asamymuthupa@micron.com>
Cc: Selvan Mani <smani@micron.com>
Cc: Sam Bradshaw <sbradshaw@micron.com>
Acked-by: Jeff Moyer <jmoyer@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
496ad9aa8ef448058e36ca7a787c61f2e63f0f54 23-Jan-2013 Al Viro <viro@zeniv.linux.org.uk> new helper: file_inode(file)

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
890537b3ac953ad2cc4f5ecb83744e967ae2aa31 06-Feb-2013 Hans Grob <H.Grob@physik.uni-muenchen.de> drivers/char/mem.c: fix small coding style issues

This patch fixes four foo * bar errors, and one trailing whitespace
complaint from checkpatch.pl

Signed-off-by: Hans Grob <H.Grob@physik.uni-muenchen.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7e6735c3578e76c270a2797225a4214176ba13ef 12-Sep-2012 Cyril Chemparathy <cyril@ti.com> /dev/mem: use phys_addr_t for physical addresses

This patch fixes the /dev/mem driver to use phys_addr_t for physical
addresses. This is required on PAE systems, especially those that run
entirely out of >4G physical memory space.

Signed-off-by: Cyril Chemparathy <cyril@ti.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
314e51b9851b4f4e8ab302243ff5a6fc6147f379 09-Oct-2012 Konstantin Khlebnikov <khlebnikov@openvz.org> mm: kill vma flag VM_RESERVED and mm->reserved_vm counter

A long time ago, in v2.4, VM_RESERVED kept swapout process off VMA,
currently it lost original meaning but still has some effects:

| effect | alternative flags
-+------------------------+---------------------------------------------
1| account as reserved_vm | VM_IO
2| skip in core dump | VM_IO, VM_DONTDUMP
3| do not merge or expand | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP
4| do not mlock | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP

This patch removes reserved_vm counter from mm_struct. Seems like nobody
cares about it, it does not exported into userspace directly, it only
reduces total_vm showed in proc.

Thus VM_RESERVED can be replaced with VM_IO or pair VM_DONTEXPAND | VM_DONTDUMP.

remap_pfn_range() and io_remap_pfn_range() set VM_IO|VM_DONTEXPAND|VM_DONTDUMP.
remap_vmalloc_range() set VM_DONTEXPAND | VM_DONTDUMP.

[akpm@linux-foundation.org: drivers/vfio/pci/vfio_pci.c fixup]
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Venkatesh Pallipadi <venki@google.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
e1612de9e4cdf375c3cf1c72434ab8abdcb3927e 11-Jul-2012 Haren Myneni <haren@us.ibm.com> powerpc: Disable /dev/port interface on systems without an ISA bridge

Some power systems do not have legacy ISA devices. So, /dev/port is not
a valid interface on these systems. User level tools such as kbdrate is
trying to access the device using this interface which is causing the
system crash.

This patch will fix this issue by not creating this interface on these
powerpc systems.

Signed-off-by: Haren Myneni <haren@us.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
7f3a781d6fd81e397c3928c9af33f1fc63232db6 09-May-2012 Kay Sievers <kay@vrfy.org> printk - fix compilation for CONFIG_PRINTK=n

Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Kay Sievers <kay@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
e11fea92e13fb91c50bacca799a6131c81929986 03-May-2012 Kay Sievers <kay@vrfy.org> kmsg: export printk records to the /dev/kmsg interface

Support for multiple concurrent readers of /dev/kmsg, with read(),
seek(), poll() support. Output of message sequence numbers, to allow
userspace log consumers to reliably reconnect and reconstruct their
state at any given time. After open("/dev/kmsg"), read() always
returns *all* buffered records. If only future messages should be
read, SEEK_END can be used. In case records get overwritten while
/dev/kmsg is held open, or records get faster overwritten than they
are read, the next read() will return -EPIPE and the current reading
position gets updated to the next available record. The passed
sequence numbers allow the log consumer to calculate the amount of
lost messages.

[root@mop ~]# cat /dev/kmsg
5,0,0;Linux version 3.4.0-rc1+ (kay@mop) (gcc version 4.7.0 20120315 ...
6,159,423091;ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
7,160,424069;pci_root PNP0A03:00: host bridge window [io 0x0000-0x0cf7] (ignored)
SUBSYSTEM=acpi
DEVICE=+acpi:PNP0A03:00
6,339,5140900;NET: Registered protocol family 10
30,340,5690716;udevd[80]: starting version 181
6,341,6081421;FDC 0 is a S82078B
6,345,6154686;microcode: CPU0 sig=0x623, pf=0x0, revision=0x0
7,346,6156968;sr 1:0:0:0: Attached scsi CD-ROM sr0
SUBSYSTEM=scsi
DEVICE=+scsi:1:0:0:0
6,347,6289375;microcode: CPU1 sig=0x623, pf=0x0, revision=0x0

Cc: Karel Zak <kzak@redhat.com>
Tested-by: William Douglas <william.douglas@intel.com>
Signed-off-by: Kay Sievers <kay@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
7ff9554bb578ba02166071d2d487b7fc7d860d62 03-May-2012 Kay Sievers <kay@vrfy.org> printk: convert byte-buffer to variable-length record buffer

- Record-based stream instead of the traditional byte stream
buffer. All records carry a 64 bit timestamp, the syslog facility
and priority in the record header.

- Records consume almost the same amount, sometimes less memory than
the traditional byte stream buffer (if printk_time is enabled). The record
header is 16 bytes long, plus some padding bytes at the end if needed.
The byte-stream buffer needed 3 chars for the syslog prefix, 15 char for
the timestamp and a newline.

- Buffer management is based on message sequence numbers. When records
need to be discarded, the reading heads move on to the next full
record. Unlike the byte-stream buffer, no old logged lines get
truncated or partly overwritten by new ones. Sequence numbers also
allow consumers of the log stream to get notified if any message in
the stream they are about to read gets discarded during the time
of reading.

- Better buffered IO support for KERN_CONT continuation lines, when printk()
is called multiple times for a single line. The use of KERN_CONT is now
mandatory to use continuation; a few places in the kernel need trivial fixes
here. The buffering could possibly be extended to per-cpu variables to allow
better thread-safety for multiple printk() invocations for a single line.

- Full-featured syslog facility value support. Different facilities
can tag their messages. All userspace-injected messages enforce a
facility value > 0 now, to be able to reliably distinguish them from
the kernel-generated messages. Independent subsystems like a
baseband processor running its own firmware, or a kernel-related
userspace process can use their own unique facility values. Multiple
independent log streams can co-exist that way in the same
buffer. All share the same global sequence number counter to ensure
proper ordering (and interleaving) and to allow the consumers of the
log to reliably correlate the events from different facilities.

Tested-by: William Douglas <william.douglas@intel.com>
Signed-off-by: Kay Sievers <kay@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2c9ede55ecec58099b72e4bb8eab719f32f72c31 24-Jul-2011 Al Viro <viro@zeniv.linux.org.uk> switch device_get_devnode() and ->devnode() to umode_t *

both callers of device_get_devnode() are only interested in lower 16bits
and nobody tries to return anything wider than 16bit anyway.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
66300e66c680f7bcc43127627740f493ef0b05bc 10-Jul-2011 Paul Gortmaker <paul.gortmaker@windriver.com> drivers/char: Add export.h for EXPORT_SYMBOL/THIS_MODULE as required

They will need it called out explicitly in the near future due
to a module.h usage cleanup that removes its implicit presence
everywhere.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
70a5f52165bd04cf3b33f30d5d234be28dcf29d4 13-Apr-2011 Andrew Morton <akpm@linux-foundation.org> kmsg: properly support writev to avoid interleaved printk lines fix

make `len' size_t, avoid multiple-assignments.

Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Lennart Poettering <lennart@poettering.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
7e5b58bcbcb3d7518389c1d82fb6e926f5a9f72c 07-Apr-2011 Kay Sievers <kay.sievers@vrfy.org> printk: /dev/kmsg - properly support writev() to avoid interleaved printk() lines

printk: /dev/kmsg - properly support writev() to avoid interleaved printk lines

We should avoid calling printk() in a loop, when we pass a single
string to /dev/kmsg with writev().

Cc: Lennart Poettering <lennart@poettering.net>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
cfaf346cb2741ca648d83527df173b759381e607 24-Mar-2011 Changli Gao <xiaosuo@gmail.com> drivers/char/mem.c: clean up the code

Reduce the lines of code and simplify the logic.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4a3956c790290efeb647bbb0c3a90476bb57800e 01-Oct-2010 KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> vfs: introduce FMODE_UNSIGNED_OFFSET for allowing negative f_pos

Now, rw_verify_area() checsk f_pos is negative or not. And if negative,
returns -EINVAL.

But, some special files as /dev/(k)mem and /proc/<pid>/mem etc.. has
negative offsets. And we can't do any access via read/write to the
file(device).

So introduce FMODE_UNSIGNED_OFFSET to allow negative file offsets.

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
6038f373a3dc1f1c26496e60b6c40b164716f07e 15-Aug-2010 Arnd Bergmann <arnd@arndb.de> llseek: automatically add .llseek fop

All file_operations should get a .llseek operation so we can make
nonseekable_open the default for future file operations without a
.llseek pointer.

The three cases that we can automatically detect are no_llseek, seq_lseek
and default_llseek. For cases where we can we can automatically prove that
the file offset is always ignored, we use noop_llseek, which maintains
the current behavior of not returning an error from a seek.

New drivers should normally not use noop_llseek but instead use no_llseek
and call nonseekable_open at open time. Existing drivers can be converted
to do the same when the maintainer knows for certain that no user code
relies on calling seek on the device file.

The generated code is often incorrectly indented and right now contains
comments that clarify for each added line why a specific variant was
chosen. In the version that gets submitted upstream, the comments will
be gone and I will manually fix the indentation, because there does not
seem to be a way to do that using coccinelle.

Some amount of new code is currently sitting in linux-next that should get
the same modifications, which I will do at the end of the merge window.

Many thanks to Julia Lawall for helping me learn to write a semantic
patch that does all this.

===== begin semantic patch =====
// This adds an llseek= method to all file operations,
// as a preparation for making no_llseek the default.
//
// The rules are
// - use no_llseek explicitly if we do nonseekable_open
// - use seq_lseek for sequential files
// - use default_llseek if we know we access f_pos
// - use noop_llseek if we know we don't access f_pos,
// but we still want to allow users to call lseek
//
@ open1 exists @
identifier nested_open;
@@
nested_open(...)
{
<+...
nonseekable_open(...)
...+>
}

@ open exists@
identifier open_f;
identifier i, f;
identifier open1.nested_open;
@@
int open_f(struct inode *i, struct file *f)
{
<+...
(
nonseekable_open(...)
|
nested_open(...)
)
...+>
}

@ read disable optional_qualifier exists @
identifier read_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
expression E;
identifier func;
@@
ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
{
<+...
(
*off = E
|
*off += E
|
func(..., off, ...)
|
E = *off
)
...+>
}

@ read_no_fpos disable optional_qualifier exists @
identifier read_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
@@
ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
{
... when != off
}

@ write @
identifier write_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
expression E;
identifier func;
@@
ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
{
<+...
(
*off = E
|
*off += E
|
func(..., off, ...)
|
E = *off
)
...+>
}

@ write_no_fpos @
identifier write_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
@@
ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
{
... when != off
}

@ fops0 @
identifier fops;
@@
struct file_operations fops = {
...
};

@ has_llseek depends on fops0 @
identifier fops0.fops;
identifier llseek_f;
@@
struct file_operations fops = {
...
.llseek = llseek_f,
...
};

@ has_read depends on fops0 @
identifier fops0.fops;
identifier read_f;
@@
struct file_operations fops = {
...
.read = read_f,
...
};

@ has_write depends on fops0 @
identifier fops0.fops;
identifier write_f;
@@
struct file_operations fops = {
...
.write = write_f,
...
};

@ has_open depends on fops0 @
identifier fops0.fops;
identifier open_f;
@@
struct file_operations fops = {
...
.open = open_f,
...
};

// use no_llseek if we call nonseekable_open
////////////////////////////////////////////
@ nonseekable1 depends on !has_llseek && has_open @
identifier fops0.fops;
identifier nso ~= "nonseekable_open";
@@
struct file_operations fops = {
... .open = nso, ...
+.llseek = no_llseek, /* nonseekable */
};

@ nonseekable2 depends on !has_llseek @
identifier fops0.fops;
identifier open.open_f;
@@
struct file_operations fops = {
... .open = open_f, ...
+.llseek = no_llseek, /* open uses nonseekable */
};

// use seq_lseek for sequential files
/////////////////////////////////////
@ seq depends on !has_llseek @
identifier fops0.fops;
identifier sr ~= "seq_read";
@@
struct file_operations fops = {
... .read = sr, ...
+.llseek = seq_lseek, /* we have seq_read */
};

// use default_llseek if there is a readdir
///////////////////////////////////////////
@ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier readdir_e;
@@
// any other fop is used that changes pos
struct file_operations fops = {
... .readdir = readdir_e, ...
+.llseek = default_llseek, /* readdir is present */
};

// use default_llseek if at least one of read/write touches f_pos
/////////////////////////////////////////////////////////////////
@ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read.read_f;
@@
// read fops use offset
struct file_operations fops = {
... .read = read_f, ...
+.llseek = default_llseek, /* read accesses f_pos */
};

@ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier write.write_f;
@@
// write fops use offset
struct file_operations fops = {
... .write = write_f, ...
+ .llseek = default_llseek, /* write accesses f_pos */
};

// Use noop_llseek if neither read nor write accesses f_pos
///////////////////////////////////////////////////////////

@ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read_no_fpos.read_f;
identifier write_no_fpos.write_f;
@@
// write fops use offset
struct file_operations fops = {
...
.write = write_f,
.read = read_f,
...
+.llseek = noop_llseek, /* read and write both use no f_pos */
};

@ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier write_no_fpos.write_f;
@@
struct file_operations fops = {
... .write = write_f, ...
+.llseek = noop_llseek, /* write uses no f_pos */
};

@ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read_no_fpos.read_f;
@@
struct file_operations fops = {
... .read = read_f, ...
+.llseek = noop_llseek, /* read uses no f_pos */
};

@ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
@@
struct file_operations fops = {
...
+.llseek = noop_llseek, /* no read or write fn */
};
===== End semantic patch =====

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Julia Lawall <julia@diku.dk>
Cc: Christoph Hellwig <hch@infradead.org>
371d217ee1ff8b418b8f73fb2a34990f951ec2d4 21-Sep-2010 Jan Kara <jack@suse.cz> char: Mark /dev/zero and /dev/kmem as not capable of writeback

These devices don't do any writeback but their device inodes still can get
dirty so mark bdi appropriately so that bdi code does the right thing and files
inodes to lists of bdi carrying the device inodes.

Cc: stable@kernel.org
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
31d1d48e199e99077fb30f6fb9a793be7bec756f 06-Aug-2010 David Howells <dhowells@redhat.com> Fix init ordering of /dev/console vs callers of modprobe

Make /dev/console get initialised before any initialisation routine that
invokes modprobe because if modprobe fails, it's going to want to open
/dev/console, presumably to write an error message to.

The problem with that is that if the /dev/console driver is not yet
initialised, the chardev handler will call request_module() to invoke
modprobe, which will fail, because we never compile /dev/console as a
module.

This will lead to a modprobe loop, showing the following in the kernel
log:

request_module: runaway loop modprobe char-major-5-1
request_module: runaway loop modprobe char-major-5-1
request_module: runaway loop modprobe char-major-5-1
request_module: runaway loop modprobe char-major-5-1
request_module: runaway loop modprobe char-major-5-1

This can happen, for example, when the built in md5 module can't find
the built in cryptomgr module (because the latter fails to initialise).
The md5 module comes before the call to tty_init(), presumably because
'crypto' comes before 'drivers' alphabetically.

Fix this by calling tty_init() from chrdev_init().

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ea56f411ec2d4d8689eb7f3bcd24e860acf87c47 06-Apr-2010 David Howells <dhowells@redhat.com> frv: hide uncached_access() when pgprot_noncached is not #defined

Hide uncached_access() when pgprot_noncached is not #defined. This prevents
the following warning:

CC drivers/char/mem.o
drivers/char/mem.c:229: warning: 'uncached_access' defined but not used

Repairs d7d4d849b4e3acc405ec222884936800ffb26d48 ("drivers/char/mem.c:
cleanups").

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ee5d2acd5ca1534f40e06d4f0d41a940b17beb54 06-Apr-2010 Eric Dumazet <eric.dumazet@gmail.com> /dev/mem: allow rewinding

commit dcefafb6 ("/dev/mem: dont allow seek to last page") inadvertently
disabled rewinding on /dev/mem.

This broke x86info for example.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6e191f7bb083544dc4fa3879ff81caf97c65d197 06-Apr-2010 Anton Blanchard <anton@samba.org> devmem: handle class_create() failure

I hit this when we had a bug in IDR for a few days. Basically sysfs would
fail to create new inodes since it uses an IDR and therefore class_create
would fail.

While we are unlikely to see this fail we may as well handle it instead of
oopsing.

Signed-off-by: Anton Blanchard <anton@samba.org>
Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
d7d4d849b4e3acc405ec222884936800ffb26d48 11-Mar-2010 Andrew Morton <akpm@linux-foundation.org> drivers/char/mem.c: cleanups

- fix switch statement layout

- fix whitespace stuff

- fix comment layout

- remove unneeded inlining

- use __weak

- remove trailing whitespace

- move uncached_access() inside `#ifndef __HAVE_PHYS_MEM_ACCESS_PROT' - it
is otherwise unused.

Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
dcefafb6ac90ece8d68a6c203105f3d313e52da4 11-Mar-2010 Wu Fengguang <fengguang.wu@intel.com> /dev/mem: dont allow seek to last page

So as to return a uniform error -EOVERFLOW instead of a random one:

# kmem-seek 0xfffffffffffffff0
seek /dev/kmem: Device or resource busy
# kmem-seek 0xfffffffffffffff1
seek /dev/kmem: Block device required

Suggested by OGAWA Hirofumi.

Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
c85e9a97c4102ce2e83112da850d838cfab5ab13 02-Feb-2010 Wu Fengguang <fengguang.wu@intel.com> devmem: fix kmem write bug on memory holes

write_kmem() used to assume vwrite() always return the full buffer length.
However now vwrite() could return 0 to indicate memory hole. This
creates a bug that "buf" is not advanced accordingly.

Fix it to simply ignore the return value, hence the memory hole.

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Tejun Heo <tj@kernel.org>
Cc: Nick Piggin <npiggin@suse.de>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
325fda71d0badc1073dc59f12a948f24ff05796a 02-Feb-2010 KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> devmem: check vmalloc address on kmem read/write

Otherwise vmalloc_to_page() will BUG().

This also makes the kmem read/write implementation aligned with mem(4):
"References to nonexistent locations cause errors to be returned." Here we
return -ENXIO (inspired by Hugh) if no bytes have been transfered to/from
user space, otherwise return partial read/write results.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ee32398fda8ab9867cf8d5469d6e83de5f5c1f7c 15-Dec-2009 Wu Fengguang <fengguang.wu@intel.com> /dev/mem: remove redundant parameter from do_write_kmem()

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Avi Kivity <avi@qumranet.com>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
80ad89a0ceb3b16d0f670751ef9875c4569fb4d3 15-Dec-2009 Wu Fengguang <fengguang.wu@intel.com> /dev/mem: remove the "written" variable in write_kmem()

Also rename "len" to "sz". No behavior change.

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Avi Kivity <avi@qumranet.com>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
7fabaddd09ab32a7c0c08da80315758a2245189d 15-Dec-2009 Wu Fengguang <fengguang.wu@intel.com> /dev/mem: make size_inside_page() logic straight

Also convert more size_inside_page() users.

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Avi Kivity <avi@qumranet.com>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
fa29e97bb8c70fd7f564acbed3422403cee10ab7 15-Dec-2009 Wu Fengguang <fengguang.wu@intel.com> /dev/mem: cleanup unxlate_dev_mem_ptr() calls

No behaviour change.

[akpm@linux-foundation.org: cleanuplets]
[akpm@linux-foundation.org: remove unused `ret']
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Mark Brown <broonie@opensource.wolfsonmicro.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
f222318e9c3a315723e3524fb9d6566b2430db44 15-Dec-2009 Wu Fengguang <fengguang.wu@intel.com> /dev/mem: introduce size_inside_page()

Introduce size_inside_page() to replace duplicate /dev/mem code.

Also apply it to /dev/kmem, whose alignment logic was buggy.

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Mark Brown <broonie@opensource.wolfsonmicro.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4ea2f43f28e30050bc99fe3134b6b679f3bf5b22 15-Dec-2009 Wu Fengguang <fengguang.wu@intel.com> /dev/mem: remove redundant test on len

The len test in write_kmem() is always true, so can be reduced.

Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Mark Brown <broonie@opensource.wolfsonmicro.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6b2f3d1f769be5779b479c37800229d9a4809fc3 27-Oct-2009 Christoph Hellwig <hch@lst.de> vfs: Implement proper O_SYNC semantics

While Linux provided an O_SYNC flag basically since day 1, it took until
Linux 2.4.0-test12pre2 to actually get it implemented for filesystems,
since that day we had generic_osync_around with only minor changes and the
great "For now, when the user asks for O_SYNC, we'll actually give
O_DSYNC" comment. This patch intends to actually give us real O_SYNC
semantics in addition to the O_DSYNC semantics. After Jan's O_SYNC
patches which are required before this patch it's actually surprisingly
simple, we just need to figure out when to set the datasync flag to
vfs_fsync_range and when not.

This patch renames the existing O_SYNC flag to O_DSYNC while keeping it's
numerical value to keep binary compatibility, and adds a new real O_SYNC
flag. To guarantee backwards compatiblity it is defined as expanding to
both the O_DSYNC and the new additional binary flag (__O_SYNC) to make
sure we are backwards-compatible when compiled against the new headers.

This also means that all places that don't care about the differences can
just check O_DSYNC and get the right behaviour for O_SYNC, too - only
places that actuall care need to check __O_SYNC in addition. Drivers and
network filesystems have been updated in a fail safe way to always do the
full sync magic if O_DSYNC is set. The few places setting O_SYNC for
lower layers are kept that way for now to stay failsafe.

We enforce that O_DSYNC is set when __O_SYNC is set early in the open path
to make sure we always get these sane options.

Note that parisc really screwed up their headers as they already define a
O_DSYNC that has always been a no-op. We try to repair it by using it for
the new O_DSYNC and redefinining O_SYNC to send both the traditional
O_SYNC numerical value _and_ the O_DSYNC one.

Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Grant Grundler <grundler@parisc-linux.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Andreas Dilger <adilger@sun.com>
Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Acked-by: Kyle McMartin <kyle@mcmartin.ca>
Acked-by: Ulrich Drepper <drepper@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jan Kara <jack@suse.cz>
af901ca181d92aac3a7dc265144a9081a86d8f39 14-Nov-2009 André Goddard Rosa <andre.goddard@gmail.com> tree-wide: fix assorted typos all over the place

That is "success", "unknown", "through", "performance", "[re|un]mapping"
, "access", "default", "reasonable", "[con]currently", "temperature"
, "channel", "[un]used", "application", "example","hierarchy", "therefore"
, "[over|under]flow", "contiguous", "threshold", "enough" and others.

Signed-off-by: André Goddard Rosa <andre.goddard@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
205153aa40b7fb36dc7fe76c1798584ace55b288 09-Oct-2009 Frederic Weisbecker <fweisbec@gmail.com> mem_class: Drop the bkl from memory_open()

The generic open callback for the mem class devices is "protected" by
the bkl.

Let's look at the datas manipulated inside memory_open:

- inode and file: safe
- the devlist: safe because it is constant
- the memdev classes inside this array are safe too (constant)

After we find out which memdev file operation we need to use, we call
its open callback. Depending on the targeted memdev, we call either
open_port() that doesn't manipulate any racy data (just a capable()
check), or we call nothing.

So it's safe to remove the big kernel lock there.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1255113062-5835-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
f0f37e2f77731b3473fa6bd5ee53255d9a9cdb40 27-Sep-2009 Alexey Dobriyan <adobriyan@gmail.com> const: mark struct vm_struct_operations

* mark struct vm_area_struct::vm_ops as const
* mark vm_ops in AGP code

But leave TTM code alone, something is fishy there with global vm_ops
being used.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
bb521c5de070b86a1e049e2dbf62328f717ff1e8 24-Sep-2009 Nikanth Karthikesan <knikanth@suse.de> /dev/zero: avoid repeated access_ok() checks

In read_zero, we check for access_ok() once for the count bytes. It is
unnecessarily checked again in clear_user. Use __clear_user, which does
not check for access_ok().

Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
e454cea20bdcff10ee698d11b8882662a0153a47 18-Sep-2009 Kay Sievers <kay.sievers@vrfy.org> Driver-Core: extend devnode callbacks to provide permissions

This allows subsytems to provide devtmpfs with non-default permissions
for the device node. Instead of the default mode of 0600, null, zero,
random, urandom, full, tty, ptmx now have a mode of 0666, which allows
non-privileged processes to access standard device nodes in case no
other userspace process applies the expected permissions.

This also fixes a wrong assignment in pktcdvd and a checkpatch.pl complain.

Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
162dd4212409fd2d36ff22547ea821bf3e86bcc9 14-Sep-2009 Jin Dongming <jin.dongming@np.css.fujitsu.com> mem_class: fix bug

When I build and boot -next on fedora 10, I can not login anymore.
When I input the user name and password, the system does not output
any message and requires user to input the user name and password
again and again.

I find the patch which caused this problem with "GIT BISECT" command.
And the patch is
commit 7c4b7daa1878972ed0137c95f23569124bd6e2b1
"mem_class: use minor as index instead of searching the array".

Though I don't know the real reason why user could not login, I
confirmed the patch I made as following could resolve the problem on
fedora 10.

Signed-off-by: Jin Dongming <jin.dongming@np.css.fujitsu.com>
Acked-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
389e0cb9a1dec22ec37a104dec5009dd7a33dd59 04-Jul-2009 Kay Sievers <kay.sievers@vrfy.org> mem_class: use minor as index instead of searching the array

Declare the device list with the minor numbers as the index, which saves us from
searching for a matching list entry. Remove old devfs permissions declaration.

Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
d993831fa7ffeb89e994f046f93eeb09ec91df08 12-Jun-2009 Jens Axboe <jens.axboe@oracle.com> writeback: add name to backing_dev_info

This enables us to track who does what and print info. Its main use
is catching dirty inodes on the default_backing_dev_info, so we can
fix that up.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
d6f47befdd7483cd1e14a7ae76ef22f7f9722c90 18-Jun-2009 Adriano dos Santos Fernandes <adrianosf@uol.com.br> drivers/char/mem.c: memory_open() cleanup: lookup minor device number from devlist

memory_open() ignores devlist and does a switch for each item, duplicating
code and conditional definitions.

Clean it up by adding backing_dev_info to devlist and use it to lookup for
the minor device.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Adriano dos Santos Fernandes <adrianosf@uol.com.br>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2b83868723d090078ac0e2120e06a1cc94dbaef0 10-Jun-2009 Linus Torvalds <torvalds@linux-foundation.org> Make /dev/zero reads interruptible by signals

This helps with bad latencies for large reads from /dev/zero, but might
conceivably break some application that "knows" that a read of /dev/zero
cannot return early. So do this early in the merge window to give us
maximal test coverage, even if the patch is totally trivial.

Obviously, no well-behaved application should ever depend on the read
being uninterruptible, but hey, bugs happen.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
730c586ad5228c339949b2eb4e72b80ae167abc4 05-Jun-2009 Salman Qazi <sqazi@google.com> drivers/char/mem.c: avoid OOM lockup during large reads from /dev/zero

While running 20 parallel instances of dd as follows:

#!/bin/bash
for i in `seq 1 20`; do
dd if=/dev/zero of=/export/hda3/dd_$i bs=1073741824 count=1 &
done
wait

on a 16G machine, we noticed that rather than just killing the processes,
the entire kernel went down. Stracing dd reveals that it first does an
mmap2, which makes 1GB worth of zero page mappings. Then it performs a
read on those pages from /dev/zero, and finally it performs a write.

The machine died during the reads. Looking at the code, it was noticed
that /dev/zero's read operation had been changed by
557ed1fa2620dc119adb86b34c614e152a629a80 ("remove ZERO_PAGE") from giving
zero page mappings to actually zeroing the page.

The zeroing of the pages causes physical pages to be allocated to the
process. But, when the process exhausts all the memory that it can, the
kernel cannot kill it, as it is still in the kernel mode allocating more
memory. Consequently, the kernel eventually crashes.

To fix this, I propose that when a fatal signal is pending during
/dev/zero read operation, we simply return and let the user process die.

Signed-off-by: Salman Qazi <sqazi@google.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
[ Modified error return and comment trivially. - Linus]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
0c3c8a18361a636069f5a5d9d0d0f9c2124e6b94 09-Apr-2009 Suresh Siddha <suresh.b.siddha@intel.com> x86, PAT: Remove duplicate memtype reserve in devmem mmap

/dev/mem mmap code was doing memtype reserve/free for a while now.
Recently we added memtype tracking in remap_pfn_range, and /dev/mem mmap
uses it indirectly. So, we don't need seperate tracking in /dev/mem code
any more. That means another ~100 lines of code removed :-).

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
LKML-Reference: <20090409212709.085210000@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
69beeb1d3428424fbc7546f85e5cd7ac4119c09d 06-Jan-2009 KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> mm: make vread() and vwrite() declaration

Sparse output following warnings.

mm/vmalloc.c:1436:6: warning: symbol 'vread' was not declared. Should it be static?
mm/vmalloc.c:1474:6: warning: symbol 'vwrite' was not declared. Should it be static?

However, it is used by /dev/kmem. fixed here.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
03457cd455d042c9ee4cc47c1ed4532257980693 22-Jul-2008 Greg Kroah-Hartman <gregkh@suse.de> device create: char: convert device_create_drvdata to device_create

Now that device_create() has been audited, rename things back to the
original call to be sane.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
7ae8ed5053a39082d224a3f48409e016baca9c16 24-Jul-2008 Rik van Riel <riel@redhat.com> use generic_access_phys for /dev/mem mappings

Use generic_access_phys as the access_process_vm access function for
/dev/mem mappings. This makes it possible to debug the X server.

[akpm@linux-foundation.org: repair all the architectures which broke]
Signed-off-by: Rik van Riel <riel@redhat.com>
Cc: Benjamin Herrensmidt <benh@kernel.crashing.org>
Cc: Dave Airlie <airlied@linux.ie>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
47aa5793f78c274d51711f6a621fa6b02d4e6402 21-May-2008 Greg Kroah-Hartman <gregkh@suse.de> device create: char: convert device_create to device_create_drvdata

device_create() is race-prone, so use the race-free
device_create_drvdata() instead as device_create() is going away.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
d092633bff3b19faffc480fe9810805e7792a029 18-Jul-2008 Ingo Molnar <mingo@elte.hu> Subject: devmem, x86: fix rename of CONFIG_NONPROMISC_DEVMEM
From: Arjan van de Ven <arjan@infradead.org>
Date: Sat, 19 Jul 2008 15:47:17 -0700

CONFIG_NONPROMISC_DEVMEM was a rather confusing name - but renaming it
to CONFIG_PROMISC_DEVMEM causes problems on architectures that do not
support this feature; this patch renames it to CONFIG_STRICT_DEVMEM,
so that architectures can opt-in into it.

( the polarity of the option is still the same as it was originally; it
needs to be for now to not break architectures that don't have the
infastructure yet to support this feature)

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: "V.Radhakrishnan" <rk@atr-labs.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
64d206d896ff70b828138577d5ff39deda5f1c4d 18-Jul-2008 Ingo Molnar <mingo@elte.hu> x86: rename CONFIG_NONPROMISC_DEVMEM to CONFIG_PROMISC_DEVMEM

Linus observed:

> The real bug is that we shouldn't have "double negatives", and
> certainly not negative config options. Making that "promiscuous
> /dev/mem" option a negated thing as a config option was bad.

right ... lets rename this option. There should never be a negation
in config options.

[ that reminds me of CONFIG_SCHED_NO_NO_OMIT_FRAME_POINTER, but that
is for another commit ;-) ]

Signed-off-by: Ingo Molnar <mingo@elte.hu>
1f439647a4072ec64bb2e4b9290cd7be6aee8328 15-May-2008 Jonathan Corbet <corbet@lwn.net> mem: cdev lock_kernel() pushdown

It's really hard to tell if this is necessary - lots of weird
magic happens by way of map_devmem()

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
b781ecb6a379f155568ef7093e38c6c1d857fe53 29-Apr-2008 Arjan van de Ven <arjan@linux.intel.com> make /dev/kmem a config option

Make /dev/kmem a config option; /dev/kmem is VERY rarely used, and when
used, it's generally for no good (rootkits tend to be the most common
users). With this config option, users have the choice to disable
/dev/kmem, saving some size as well.

A patch to disable /dev/kmem has been in the Fedora and RHEL kernels for
4+ years now without any known problems or legit users of /dev/kmem.

[akpm@linux-foundation.org: make CONFIG_DEVKMEM default to y]
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
e7f260a276f2c9184fe753732d834b1f6fbe9f17 19-Mar-2008 venkatesh.pallipadi@intel.com <venkatesh.pallipadi@intel.com> x86: PAT use reserve free memtype in mmap of /dev/mem

Use reserve_memtype and free_memtype wrappers for /dev/mem mmaps. The memtype
is slightly complicated here, given that we have to support existing X mappings.
We fallback on UC_MINUS for that.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
f0970c13b6a5b01189aeb196ebb573cf87d95839 19-Mar-2008 venkatesh.pallipadi@intel.com <venkatesh.pallipadi@intel.com> x86: PAT phys_mem_access_prot_allowed for dev/mem mmap

Introduce phys_mem_access_prot_allowed(), which checks whether the mapping
is possible, without any conflicts and returns success or failure based on that.
phys_mem_access_prot() by itself does not allow failure case. This ability
to return error is needed for PAT where we may have aliasing conflicts.

x86 setup __HAVE_PHYS_MEM_ACCESS_PROT and move x86 specific code out of
/dev/mem into arch specific area.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
e045fb2a988a9a1964059b0d33dbaf18d12f925f 19-Mar-2008 venkatesh.pallipadi@intel.com <venkatesh.pallipadi@intel.com> x86: PAT avoid aliasing in /dev/mem read/write

Add xlate and unxlate around /dev/mem read/write. This sets up the mapping
that can be used for /dev/mem read and write without aliasing worries.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
e2beb3eae627211b67e456c53f946cede2ac10d7 07-Mar-2008 Venki Pallipadi <venkatesh.pallipadi@intel.com> devmem: add range_is_allowed() check to mmap of /dev/mem

Earlier patch that introduced CONFIG_NONPROMISC_DEVMEM, did the
range_is_allowed() check only for read and write. Add range_is_allowed()
check to mmap of /dev/mem as well.

Changes the paramaters of range_is_allowed() to pfn and size to handle
more than 32 bits of physical address on 32 bit arch cleanly.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
ae531c26c5c2a28ca1b35a75b39b3b256850f2c8 24-Apr-2008 Arjan van de Ven <arjan@linux.intel.com> x86: introduce /dev/mem restrictions with a config option

This patch introduces a restriction on /dev/mem: Only non-memory can be
read or written unless the newly introduced config option is set.

The X server needs access to /dev/mem for the PCI space, but it doesn't need
access to memory; both the file permissions and SELinux permissions of /dev/mem
just make X effectively super-super powerful. With the exception of the
BIOS area, there's just no valid app that uses /dev/mem on actual memory.
Other popular users of /dev/mem are rootkits and the like.
(note: mmap access of memory via /dev/mem was already not allowed since
a really long time)

People who want to use /dev/mem for kernel debugging can enable the config
option.

The restrictions of this patch have been in the Fedora and RHEL kernels for
at least 4 years without any problems.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
ca5cd877ae699e758e6f26efc11b01bf6631d427 29-Oct-2007 Al Viro <viro@ftp.linux.org.uk> x86 merge fallout: uml

Don't undef __i386__/__x86_64__ in uml anymore, make sure that (few) places
that required adjusting the ifdefs got those.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
e0bf68ddec4f4f90e5871404be4f1854c17f3120 17-Oct-2007 Peter Zijlstra <a.p.zijlstra@chello.nl> mm: bdi init hooks

provide BDI constructor/destructor hooks

[akpm@linux-foundation.org: compile fix]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
557ed1fa2620dc119adb86b34c614e152a629a80 16-Oct-2007 Nick Piggin <npiggin@suse.de> remove ZERO_PAGE

The commit b5810039a54e5babf428e9a1e89fc1940fabff11 contains the note

A last caveat: the ZERO_PAGE is now refcounted and managed with rmap
(and thus mapcounted and count towards shared rss). These writes to
the struct page could cause excessive cacheline bouncing on big
systems. There are a number of ways this could be addressed if it is
an issue.

And indeed this cacheline bouncing has shown up on large SGI systems.
There was a situation where an Altix system was essentially livelocked
tearing down ZERO_PAGE pagetables when an HPC app aborted during startup.
This situation can be avoided in userspace, but it does highlight the
potential scalability problem with refcounting ZERO_PAGE, and corner
cases where it can really hurt (we don't want the system to livelock!).

There are several broad ways to fix this problem:
1. add back some special casing to avoid refcounting ZERO_PAGE
2. per-node or per-cpu ZERO_PAGES
3. remove the ZERO_PAGE completely

I will argue for 3. The others should also fix the problem, but they
result in more complex code than does 3, with little or no real benefit
that I can see.

Why? Inserting a ZERO_PAGE for anonymous read faults appears to be a
false optimisation: if an application is performance critical, it would
not be doing many read faults of new memory, or at least it could be
expected to write to that memory soon afterwards. If cache or memory use
is critical, it should not be working with a significant number of
ZERO_PAGEs anyway (a more compact representation of zeroes should be
used).

As a sanity check -- mesuring on my desktop system, there are never many
mappings to the ZERO_PAGE (eg. 2 or 3), thus memory usage here should not
increase much without it.

When running a make -j4 kernel compile on my dual core system, there are
about 1,000 mappings to the ZERO_PAGE created per second, but about 1,000
ZERO_PAGE COW faults per second (less than 1 ZERO_PAGE mapping per second
is torn down without being COWed). So removing ZERO_PAGE will save 1,000
page faults per second when running kbuild, while keeping it only saves
less than 1 page clearing operation per second. 1 page clear is cheaper
than a thousand faults, presumably, so there isn't an obvious loss.

Neither the logical argument nor these basic tests give a guarantee of no
regressions. However, this is a reasonable opportunity to try to remove
the ZERO_PAGE from the pagefault path. If it is found to cause regressions,
we can reintroduce it and just avoid refcounting it.

The /dev/zero ZERO_PAGE usage and TLB tricks also get nuked. I don't see
much use to them except on benchmarks. All other users of ZERO_PAGE are
converted just to use ZERO_PAGE(0) for simplicity. We can look at
replacing them all and maybe ripping out ZERO_PAGE completely when we are
more satisfied with this solution.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus "snif" Torvalds <torvalds@linux-foundation.org>
24e9d0b96dac5503c0b6f034d553030c604228a7 10-Jul-2007 Ralf Baechle <ralf@linux-mips.org> [MIPS] Hook for platforms to define cachability of /dev/mem regions

Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
d6b29d7cee064f28ca097e906de7453541351095 04-Jun-2007 Jens Axboe <jens.axboe@oracle.com> splice: divorce the splice structure/function definitions from the pipe header

We need to move even more stuff into the header so that folks can use
the splice_to_pipe() implementation instead of open-coding a lot of
pipe knowledge (see relay implementation), so move to our own header
file finally.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
4f911d64e04a44c47985be30f978fb3c2efcee0c 08-May-2007 Russell King <rmk+lkml@arm.linux.org.uk> Make /dev/port conditional on config symbol

Instead of having /dev/port support dependent in multiple places on a
string of preprocessor symbols, define a new configuration directive for
it. This ensures that all four places remain consistent with each other.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
e63340ae6b6205fef26b40a75673d1c9c0c8bb90 08-May-2007 Randy Dunlap <randy.dunlap@oracle.com> header cleaning: don't include smp_lock.h when not used

Remove includes of <linux/smp_lock.h> where it is not used/needed.
Suggested by Al Viro.

Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc,
sparc64, and arm (all 59 defconfigs).

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8a93258ce302c2b597289770cb7de8dba7c6c219 17-Apr-2007 Benjamin Herrenschmidt <benh@kernel.crashing.org> fix bogon in /dev/mem mmap'ing on nommu

While digging through my MAP_FIXED changes, I found that rather obvious
bug in /dev/mem mmap implementation for nommu archs. get_unmapped_area()
is expected to return an address, not a pfn.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-By: David Howells <dhowells@redhat.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6d3154cc1143f62c3b80d9929caeaec6db8cb451 22-Jan-2007 Linus Torvalds <torvalds@woody.linux-foundation.org> Revert "[PATCH] Fix up mmap_kmem"

This reverts commit 99a10a60ba9bedcf5d70ef81414d3e03816afa3f.

As per Hugh Dickins:

"Nadia Derbey has reported that mmap of /dev/kmem no longer works with
the kernel virtual address as offset, and Franck has confirmed that
his patch came from a misunderstanding of what an offset means to
/dev/kmem - whereas his patch description seems to say that he was
correcting the offset on a few plaforms, there was no such problem to
correct, and his patch was in fact changing its API on all platforms."

Suggested-by: Hugh Dickins <hugh@veritas.com>
Cc: Franck Bui-Huu <fbuihuu@gmail.com>
Cc: Nadia Derbey <Nadia.Derbey@bull.net>
Cc: Andi Kleen <ak@suse.de>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5fcf7bb73f66cc1c4ad90788b0f367c4d6852b75 10-Dec-2006 Hugh Dickins <hugh@veritas.com> [PATCH] read_zero_pagealigned() locking fix

Ramiro Voicu hits the BUG_ON(!pte_none(*pte)) in zeromap_pte_range: kernel
bugzilla 7645. Right: read_zero_pagealigned uses down_read of mmap_sem,
but another thread's racing read of /dev/zero, or a normal fault, can
easily set that pte again, in between zap_page_range and zeromap_page_range
getting there. It's been wrong ever since 2.4.3.

The simple fix is to use down_write instead, but that would serialize reads
of /dev/zero more than at present: perhaps some app would be badly
affected. So instead let zeromap_page_range return the error instead of
BUG_ON, and read_zero_pagealigned break to the slower clear_user loop in
that case - there's no need to optimize for it.

Use -EEXIST for when a pte is found: BUG_ON in mmap_zero (the other user of
zeromap_page_range), though it really isn't interesting there. And since
mmap_zero wants -EAGAIN for out-of-memory, the zeromaps better return that
than -ENOMEM.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Cc: Ramiro Voicu: <Ramiro.Voicu@cern.ch>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
a7113a966241b700aecc7b8cb326cecb62e3c4b2 08-Dec-2006 Josef Sipek <jsipek@fsl.cs.sunysb.edu> [PATCH] struct path: convert char-drivers

Signed-off-by: Josef Sipek <jsipek@fsl.cs.sunysb.edu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
ebf644c4623bc3eb57683199cd2b9080028b0f6f 26-Jul-2006 Greg Kroah-Hartman <gregkh@suse.de> Driver core: change mem class_devices to be real devices

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
b8a3ad5b53918787f4708ad9dfe90d2557cc78dd 13-Oct-2006 Linus Torvalds <torvalds@g5.osdl.org> Include proper header file for PFN_DOWN()

The recent commit (99a10a60ba9bedcf5d70ef81414d3e03816afa3f) to fix up
mmap_kmem() broke compiles because it used PFN_DOWN() without including
<linux/pfn.h>.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
99a10a60ba9bedcf5d70ef81414d3e03816afa3f 12-Oct-2006 Franck Bui-Huu <fbuihuu@gmail.com> [PATCH] Fix up mmap_kmem

vma->vm_pgoff is an pfn _offset_ relatif to the begining
of the memory start. The previous code was doing at first:

vma->vm_pgoff << PAGE_SHIFT

which results into a wrong physical address since some
platforms have a physical mem start that can be different
from 0. After that the previous call __pa() on this
wrong physical address, however __pa() is used to convert
a _virtual_ address into a physical one.

This patch rewrites this convertion. It calculates the
pfn of PAGE_OFFSET which is the pfn of the mem start
then it adds the vma->vm_pgoff to it.

It also uses virt_to_phys() instead of __pa() since the
latter shouldn't be used by drivers.

Signed-off-by: Franck Bui-Huu <fbuihuu@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
153dcc54df826d2f8413c026313cba673c6bcc5b 29-Sep-2006 Geoff Levand <geoffrey.levand@am.sony.com> [PATCH] mem driver: fix conditional on isa i/o support

This change corrects the logic on the preprocessor conditionals that
include support for ISA port i/o (/dev/ioports) into the mem character
driver.

This fixes the following error when building for powerpc platforms with
CONFIG_PCI=n.

drivers/built-in.o: undefined reference to `pci_io_base'

Signed-off-by: Geoff Levand <geoffrey.levand@am.sony.com>
Acked-by: Linas Vepstas <lins@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
5da6185bca064e35aa73a7c1f27488d2b96434f4 27-Sep-2006 David Howells <dhowells@redhat.com> [PATCH] NOMMU: Set BDI capabilities for /dev/mem and /dev/kmem

Set the backing device info capabilities for /dev/mem and /dev/kmem to
permit direct sharing under no-MMU conditions and full mapping capabilities
under MMU conditions. Make the BDI used by these available to all directly
mappable character devices.

Also comment the capabilities for /dev/zero.

[akpm@osdl.org: ifdef reductions]
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
06c67befeeb16f2995c11b0e04a348103ddbfab1 10-Jul-2006 Lennert Buytenhek <buytenh@wantstofly.org> [PATCH] make valid_mmap_phys_addr_range() take a pfn

Newer ARMs have a 40 bit physical address space, but mapping physical
memory above 4G needs a special page table format which we (currently?) do
not use for userspace mappings, so what happens instead is that mapping an
address >= 4G will happily discard the upper bits and wrap.

There is a valid_mmap_phys_addr_range() arch hook where we could check for
>= 4G addresses and deny the mapping, but this hook takes an unsigned long
address:

static inline int valid_mmap_phys_addr_range(unsigned long addr, size_t size);

And drivers/char/mem.c:mmap_mem() calls it like this:

static int mmap_mem(struct file * file, struct vm_area_struct * vma)
{
size_t size = vma->vm_end - vma->vm_start;

if (!valid_mmap_phys_addr_range(vma->vm_pgoff << PAGE_SHIFT, size))

So that's not much help either.

This patch makes the hook take a pfn instead of a phys address.

Signed-off-by: Lennert Buytenhek <buytenh@wantstofly.org>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
62322d2554d2f9680c8ace7bbf1f97d8fa84ad1a 03-Jul-2006 Arjan van de Ven <arjan@infradead.org> [PATCH] make more file_operation structs static

Mark the static struct file_operations in drivers/char as const. Making
them const prevents accidental bugs, and moves them to the .rodata section
so that they no longer do any false sharing; in addition with the proper
debug option they are then protected against corruption..

[akpm@osdl.org: build fix]
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
6ab3d5624e172c553004ecc862bfeac16d9d68b7 30-Jun-2006 Jörn Engel <joern@wohnheim.fh-wedel.de> Remove obsolete #include <linux/config.h>

Signed-off-by: Jörn Engel <joern@wohnheim.fh-wedel.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
ff23eca3e8f613034e0d20ff86f6a89b62f5a14e 21-Jun-2005 Greg Kroah-Hartman <gregkh@suse.de> [PATCH] devfs: Remove the devfs_fs_kernel.h file from the tree

Also fixes up all files that #include it.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
7c69ef79741910883d5543caafa06aca3ebadbd1 21-Jun-2005 Greg Kroah-Hartman <gregkh@suse.de> [PATCH] devfs: Remove devfs_mk_cdev() function from the kernel tree

Removes the devfs_mk_cdev() function and all callers of it.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
1ebd32fc54bd04de6b3944587f25513c0681f98e 26-Apr-2006 Jens Axboe <axboe@suse.de> [PATCH] splice: add ->splice_write support for /dev/null

Useful for testing.

Signed-off-by: Jens Axboe <axboe@suse.de>
99ac48f54a91d02140c497edc31dc57d4bc5c85d 28-Mar-2006 Arjan van de Ven <arjan@infradead.org> [PATCH] mark f_ops const in the inode

Mark the f_ops members of inodes as const, as well as fix the
ripple-through this causes by places that copy this f_ops and then "do
stuff" with it.

Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
136939a2b5aa4302281215745ccd567e1df2e8d4 26-Mar-2006 Bjorn Helgaas <bjorn.helgaas@hp.com> [PATCH] EFI, /dev/mem: simplify efi_mem_attribute_range()

Pass the size, not a pointer to the size, to efi_mem_attribute_range().

This function validates memory regions for the /dev/mem read/write/mmap paths.
The pointer allows arches to reduce the size of the range, but I think that's
unnecessary complexity. Simplifying it will let me use
efi_mem_attribute_range() to improve the ia64 ioremap() implementation.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Matt Domsch <Matt_Domsch@dell.com>
Cc: "Tolentino, Matthew E" <matthew.e.tolentino@intel.com>
Cc: "Brown, Len" <len.brown@intel.com>
Cc: Andi Kleen <ak@muc.de>
Acked-by: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
c654d60e8f0ea13e35b15cff54c0e473b8b162be 25-Mar-2006 Jan Beulich <JBeulich@novell.com> [PATCH] adjust /dev/{kmem,mem,port} write handlers

The /dev/mem and /dev/kmem write handlers weren't fully POSIX compliant in
that they wouldn't always force the file pointer to be updated when
returning success status.

The /dev/port write handler was inconsistent with the /dev/mem and
/dev/kmem handlers in that when encountering a -EFAULT condition after
already having written a number of items it would return -EFAULT rather
than the number of bytes written.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
ee2cdecec4dce8f7eb0d37a1bbf820cb32b2b75b 12-Jan-2006 Stephen Rothwell <sfr@canb.auug.org.au> [PATCH] powerpc: iSeries fixes for build with no PCI

This reverts part of "ppc64 iSeries: allow build with no PCI"
(145d01e4287b8cbf50f87c3283e33bf5c84e8468) which affected generic code
and applies a fix in the arch specific code.

Commit "partly merge iseries do_IRQ"
(5fee9b3b39eb55c7e3619a3b36ceeabffeb8f144) introduced iSeries_get_irq
which was only available if CONFIG_PCI is set.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Paul Mackerras <paulus@samba.org>
1b1dcc1b57a49136f118a0f16367256ff9994a69 10-Jan-2006 Jes Sorensen <jes@sgi.com> [PATCH] mutex subsystem, semaphore to mutex: VFS, ->i_sem

This patch converts the inode semaphore to a mutex. I have tested it on
XFS and compiled as much as one can consider on an ia64. Anyway your
luck with it might be different.

Modified-by: Ingo Molnar <mingo@elte.hu>

(finished the conversion)

Signed-off-by: Jes Sorensen <jes@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
80851ef2a5a404e6054211ca96ecd5ac4b06d297 08-Jan-2006 Bjorn Helgaas <bjorn.helgaas@hp.com> [PATCH] /dev/mem: validate mmap requests

Add a hook so architectures can validate /dev/mem mmap requests.

This is analogous to validation we already perform in the read/write
paths.

The identity mapping scheme used on ia64 requires that each 16MB or
64MB granule be accessed with exactly one attribute (write-back or
uncacheable). This avoids "attribute aliasing", which can cause a
machine check.

Sample problem scenario:
- Machine supports VGA, so it has uncacheable (UC) MMIO at 640K-768K
- efi_memmap_init() discards any write-back (WB) memory in the first granule
- Application (e.g., "hwinfo") mmaps /dev/mem, offset 0
- hwinfo receives UC mapping (the default, since memmap says "no WB here")
- Machine check abort (on chipsets that don't support UC access to WB
memory, e.g., sx1000)

In the scenario above, the only choices are
- Use WB for hwinfo mmap. Can't do this because it causes attribute
aliasing with the UC mapping for the VGA MMIO space.
- Use UC for hwinfo mmap. Can't do this because the chipset may not
support UC for that region.
- Disallow the hwinfo mmap with -EINVAL. That's what this patch does.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Hugh Dickins <hugh@veritas.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
44ac8413901167589226abf824d994aa57e4fd28 08-Jan-2006 Bjorn Helgaas <bjorn.helgaas@hp.com> [PATCH] /dev/mem __HAVE_PHYS_MEM_ACCESS_PROT tidy-up

Tidy up __HAVE_PHYS_MEM_ACCESS_PROT usage to make mmap_mem() easier to read.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
cd140a5c1f456f50897af4a2e9a23d228a5fe719 08-Jan-2006 Guillaume Chazarain <guichaz@yahoo.fr> [PATCH] kmsg_write: don't return printk return value

kmsg_write returns with printk, so some programs may be confused by a
successful write() with a return value different than the buffer length.

# /bin/echo something > /dev/kmsg
/bin/echo: write error: Inappropriate ioctl for device

The drawbacks is that the printk return value can no more be quickly
checked from userspace.

Signed-off-by: Guillaume Chazarain <guichaz@yahoo.fr>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
6aab341e0a28aff100a09831c5300a2994b8b986 28-Nov-2005 Linus Torvalds <torvalds@g5.osdl.org> mm: re-architect the VM_UNPAGED logic

This replaces the (in my opinion horrible) VM_UNMAPPED logic with very
explicit support for a "remapped page range" aka VM_PFNMAP. It allows a
VM area to contain an arbitrary range of page table entries that the VM
never touches, and never considers to be normal pages.

Any user of "remap_pfn_range()" automatically gets this new
functionality, and doesn't even have to mark the pages reserved or
indeed mark them any other way. It just works. As a side effect, doing
mmap() on /dev/mem works for arbitrary ranges.

Sparc update from David in the next commit.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
f57e88a8d83de8d844b57e16b84d2f762fe9f092 22-Nov-2005 Hugh Dickins <hugh@veritas.com> [PATCH] unpaged: ZERO_PAGE in VM_UNPAGED

It's strange enough to be looking out for anonymous pages in VM_UNPAGED areas,
let's not insert the ZERO_PAGE there - though whether it would matter will
depend on what we decide about ZERO_PAGE refcounting.

But whereas do_anonymous_page may (exceptionally) be called on a VM_UNPAGED
area, do_no_page should never be: just BUG_ON.

Signed-off-by: Hugh Dickins <hugh@veritas.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
8b150478aeb1a8edb9015c2f7ac4da637ff65c45 29-Oct-2005 Roland Dreier <rolandd@cisco.com> [PATCH] ppc: make phys_mem_access_prot() work with pfns instead of addresses

Change the phys_mem_access_prot() function to take a pfn instead of an
address. This allows mmap64() to work on /dev/mem for addresses above 4G
on 32-bit architectures. We start with a pfn in mmap_mem(), so there's no
need to convert to an address; in fact, it's actively bad, since the
conversion can overflow when the address is above 4G.

Similarly fix the ppc32 page_is_ram() function to avoid a conversion to an
address by directly comparing to max_pfn. Working with max_pfn instead of
high_memory fixes page_is_ram() to give the right answer for highmem pages.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
53f4654272df7c51064825024340554b39c9efba 28-Oct-2005 Greg Kroah-Hartman <gregkh@suse.de> [PATCH] Driver Core: fix up all callers of class_device_create()

The previous patch adding the ability to nest struct class_device
changed the paramaters to the call class_device_create(). This patch
fixes up all in-kernel users of the function.

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
edf83015fcbff8976b75b42b565a77e9d450c567 07-Sep-2005 Christoph Hellwig <hch@lst.de> [PATCH] remove a dead extern in mem.c

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
4bb82551e165f887448f6f61055d7bcd90aefa2a 13-Aug-2005 Linus Torvalds <torvalds@g5.osdl.org> Fix up mmap of /dev/kmem

This leaves the issue of whether we should deprecate the whole thing (or
if we should check the whole mmap range, for that matter) open. Just do
the minimal fix for now.
72414d3f1d22fc3e311b162fca95c430048d38ce 25-Jun-2005 Maneesh Soni <maneesh@in.ibm.com> [PATCH] kexec code cleanup

o Following patch provides purely cosmetic changes and corrects CodingStyle
guide lines related certain issues like below in kexec related files

o braces for one line "if" statements, "for" loops,
o more than 80 column wide lines,
o No space after "while", "for" and "switch" key words

o Changes:
o take-2: Removed the extra tab before "case" key words.
o take-3: Put operator at the end of line and space before "*/"

Signed-off-by: Maneesh Soni <maneesh@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
315c215c0a7324894541d43b0e720f20cafca92e 25-Jun-2005 Vivek Goyal <vgoyal@in.ibm.com> [PATCH] kdump: cleanups for dump file access in linear raw format

Removed the dependency on backup region. Now all the information is encoded
in ELF format. /dev/oldmem is a dummy interface. User space tool need to be
intelligent enough to parse the elf headers and read the relevant memory areas
with the help of /dev/oldmem.

Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
50b1fdbd81edcc8bd343ca44aca2b87a29e2f15c 25-Jun-2005 Vivek Goyal <vgoyal@in.ibm.com> [PATCH] kdump: Accessing dump file in linear raw format (/dev/oldmem)

Hariprasad Nellitheertha <hari@in.ibm.com>

This patch contains the code that enables us to access the previous kernel's
memory as /dev/oldmem.

Signed-off-by: Eric Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
145d01e4287b8cbf50f87c3283e33bf5c84e8468 22-Jun-2005 Stephen Rothwell <sfr@canb.auug.org.au> [PATCH] ppc64 iSeries: allow build with no PCI

This patch allows iSeries to build with CONFIG_PCI=n. This is useful for
partitions that have only virtual I/O.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
ca8eca6884861c1ce294b05aacfdf9045bba9aff 23-Mar-2005 gregkh@suse.de <gregkh@suse.de> [PATCH] class: convert drivers/char/* to use the new class api instead of class_simple

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 17-Apr-2005 Linus Torvalds <torvalds@ppc970.osdl.org> Linux-2.6.12-rc2

Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!