History log of /arch/sh/mm/fault_32.c
Revision Date Author Comments
8d9a784d1e2c75e0dcae06f77a02f5e7bb547f3a 14-Feb-2012 Stuart Menefy <stuart.menefy@st.com> sh: Fix error synchronising kernel page tables

The problem is caused by the interaction of two features in the Linux
memory management code.

A process's address space is described by a struct mm_struct, and
every thread has a pointer to the mm it should run in. The exception
to this are kernel threads, which don't have an mm, and so borrow
the mm from the last thread which ran. The system is bootstrapped
by the initial kernel thread using init's mm (even though init hasn't
been created yet, its mm is the static init_mm).

The other feature is how the kernel handles the page table which
describes the portion of the address space which is only visible when
executing inside the kernel, and which is shared by all threads. On
the SH4 the only portion of the kernel's address space which is described
using the page table is called P3, from 0xc0000000 to 0xdfffffff. This
portion of the address space is divided into three:
- mappings for dma_alloc_coherent()
- mappings for vmalloc() and ioremap()
- fixmap mappings, primarily used in copy_user_pages() to create
kernel mappings of user pages with the correct cache colour.

To optimise the TLB miss handler we don't want to add an additional
condition which checks whether the faulting address is in the user or
the kernel portion of the address space, and so all page tables have a
common portion which describes the kernel part of the address
space. As the SH4 uses a two level page table, only the kernel portion
of first level page table (the pgd entries) is duplicated. These all
point to the same second level entries (the pte's), and so no memory
is wasted.
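
A minimal sketch of the sharing described above, as a user-space model rather than the kernel's own code. The type names, pgd_init_from_reference() and the constants (1024 first level entries, 4 MiB per entry) are assumptions made for illustration; only the P3 start address and the name swapper_pg_dir come from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PTRS_PER_PGD      1024  /* assumption: 4 KiB pages, 32-bit entries */
#define PGDIR_SHIFT       22    /* assumption: each pgd entry spans 4 MiB */
#define KERNEL_PGD_START  (0xc0000000UL >> PGDIR_SHIFT)  /* first P3 entry */

/* Second level table (the pte's). */
typedef struct { unsigned long pte[1024]; } pte_table_t;

/* First level table (the pgd entries): pointers to second level tables. */
typedef struct { pte_table_t *entry[PTRS_PER_PGD]; } pgd_table_t;

/* Reference table, standing in for swapper_pg_dir. */
static pgd_table_t reference_pgd;

/* Index of the first level entry which covers a 32-bit address. */
static size_t pgd_index(unsigned long addr)
{
    assert(addr <= 0xffffffffUL);
    return (size_t)(addr >> PGDIR_SHIFT);
}

/*
 * Build the first level of a new process: the user part starts empty,
 * the kernel part copies only the pointers from the reference table,
 * so the second level tables themselves are shared, not duplicated.
 */
static void pgd_init_from_reference(pgd_table_t *pgd)
{
    memset(pgd->entry, 0, KERNEL_PGD_START * sizeof(pgd->entry[0]));
    memcpy(&pgd->entry[KERNEL_PGD_START],
           &reference_pgd.entry[KERNEL_PGD_START],
           (PTRS_PER_PGD - KERNEL_PGD_START) * sizeof(pgd->entry[0]));
}
```

Because only pointers are copied, a later change to a shared second level table is visible through every process, while a first level entry added to the reference table after the copy is not.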

The reference page table for the kernel is called the swapper_pg_dir,
and when a new page table is created for a new process the kernel
portion of the page table is copied from swapper_pg_dir. This works
fine when changes only occur in the second level of the kernel's page
table, or the first level entries are created before any new user
processes. However if a change occurs to the first level of the page
table, and there are existing processes which don't have this entry in
their page table, this new entry needs to be added. This is done on
demand: when the kernel accesses a P3 address which isn't mapped using
the current page table, the code in vmalloc_fault() copies the entry
from the reference page table (swapper_pg_dir) into the current
process's page table.

The bug which this patch addresses is that the code in vmalloc_fault()
was not copying addresses which fell in the dma_alloc_coherent()
portion of the address space, and it should have been copying any P3
address.
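
The difference between the old and the new behaviour can be sketched with a small model. This is not the code in vmalloc_fault(): sync_kernel_pgd(), both range predicates and MODEL_VMALLOC_START are hypothetical, and the model assumes that the dma_alloc_coherent() mappings sit below the vmalloc() range inside P3. Only the P3 bounds are taken from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PGDIR_SHIFT   22            /* assumption: 4 MiB per pgd entry */
#define PTRS_PER_PGD  1024          /* assumption */
#define P3_START      0xc0000000UL  /* from the commit message */
#define P3_END        0xdfffffffUL  /* from the commit message, inclusive */

/* Hypothetical start of the vmalloc()/ioremap() sub-range inside P3. */
#define MODEL_VMALLOC_START 0xc8000000UL

typedef struct { void *entry[PTRS_PER_PGD]; } pgd_table_t;

/* Too narrow: ignores the part of P3 below the vmalloc() range. */
static bool in_vmalloc_only(unsigned long addr)
{
    return addr >= MODEL_VMALLOC_START && addr <= P3_END;
}

/* What the fix asks for: any P3 address qualifies. */
static bool in_p3(unsigned long addr)
{
    return addr >= P3_START && addr <= P3_END;
}

/*
 * Copy one first level entry from the reference table into the current
 * table. Returns true if the fault was resolved by the copy.
 */
static bool sync_kernel_pgd(pgd_table_t *cur, const pgd_table_t *ref,
                            unsigned long addr,
                            bool (*covers)(unsigned long))
{
    size_t idx;

    if (!covers(addr))
        return false;               /* not treated as a kernel mapping */

    idx = (size_t)(addr >> PGDIR_SHIFT);
    assert(idx < PTRS_PER_PGD);

    if (ref->entry[idx] == NULL)
        return false;               /* the reference has nothing to copy */

    cur->entry[idx] = ref->entry[idx];
    return true;
}
```

With in_vmalloc_only() a fault on an address in the lower part of P3 is left unresolved even though the reference table has an entry for it; with in_p3() the entry is copied and the access can be retried.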

Why we hadn't seen this before, and what made this hard to reproduce,
is that normally the kernel will have called dma_alloc_coherent(), and
accessed the memory mapping created, before any user process
runs. Typically drivers such as USB or SATA will have created and used
mappings of this type during the kernel initialisation, when probing
for the attached devices, before init runs. Ethernet is slightly
different, as it normally only creates and accesses
dma_alloc_coherent() mappings when the network is brought up, but if
kernel level IP configuration is used this will also occur before any
user space process runs. So the first reproduction of this problem
which we saw occurred when USB and SATA were removed from the
kernel, and Ethernet was then brought up from user space using ifconfig.
I'd like to thank Joseph Bormolini who did the hard work reducing the
problem to these simple-to-reproduce criteria.

In your case the situation is slightly different, and turns out to
depend on the exact kernel configuration (which we had) and your
ramdisk contents (which we didn't - hence the need for some assumptions).

In this case the problem is a side effect of kernel level module
loading. Kernel subsystems sometimes trigger the load of kernel
modules directly, for example the crypto subsystem tries to load the
cryptomgr and MTD tries to load modules for Flash partitioning if
these are not built into the kernel. This is done by the kernel
creating a user process which runs insmod to try and load the
appropriate module.

In order for this to cause problems the system must be running with an
initrd or initramfs, which contains an insmod executable - if the
kernel can't find an insmod to run, no user process is created, and
the problem doesn't occur. If an insmod is found, a process is
created to run it, which will inherit the kernel portion of the
swapper_pg_dir first level page table. It doesn't matter whether the
insmod is successful or not, but when the kernel scheduler context
switches back to the kernel initialisation thread, the insmod's mm is
'borrowed' by the kernel thread, as it doesn't have an address space
of its own. (Reference counting is used to ensure this mm is not
destroyed, even though the user process which caused its creation may no
longer exist.) If this address space doesn't have a first level page
table entry for the consistent mappings, and a driver tries to access
such a mapping, we are in the same situation as described above,
except this time in a kernel thread rather than a user thread
executing inside the kernel.
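
The borrowing of an mm by a kernel thread can be modelled roughly as follows. This is a deliberately simplified sketch: the struct layouts, the single reference counter and model_switch() are assumptions for illustration, and the real scheduler code is considerably more involved.

```c
#include <assert.h>
#include <stddef.h>

/* An address space with one reference counter (simplified). */
struct mm { int users; };

/*
 * mm is the task's own address space (NULL for a kernel thread),
 * active_mm is the one it actually runs with.
 */
struct task { struct mm *mm; struct mm *active_mm; };

static struct mm *grab(struct mm *mm)
{
    mm->users++;
    return mm;
}

static void drop(struct mm *mm)
{
    assert(mm->users > 0);
    mm->users--;
}

/* Switch from prev to next; returns the mm in use after the switch. */
static struct mm *model_switch(struct task *prev, struct task *next)
{
    struct mm *old = prev->active_mm;

    if (next->mm == NULL)
        next->active_mm = grab(old);   /* kernel thread: borrow old mm */
    else
        next->active_mm = next->mm;    /* user thread: use its own mm */

    if (prev->mm == NULL) {            /* prev was itself a borrower */
        prev->active_mm = NULL;
        drop(old);
    }
    return next->active_mm;
}
```

In this model a kernel thread scheduled after insmod keeps running on insmod's page table, including whatever first level entries that table happened to have when it was created, even after the insmod process itself has gone.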

See bugzilla: 15425, 15836, 15862, 16106, 16793

Signed-off-by: Stuart Menefy <stuart.menefy@st.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
e839ca528718e68cad32a307dc9aabf01ef3eb05 28-Mar-2012 David Howells <dhowells@redhat.com> Disintegrate asm/system.h for SH

Disintegrate asm/system.h for SH.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-sh@vger.kernel.org
a8b0ca17b80e92faab46ee7179ba9e99ccb61233 27-Jun-2011 Peter Zijlstra <a.p.zijlstra@chello.nl> perf: Remove the nmi parameter from the swevent and overflow interface

The nmi parameter indicated if we could do wakeups from the current
context, if not, we would set some state and self-IPI and let the
resulting interrupt do the wakeup.

For the various event classes:

- hardware: nmi=0; PMI is in fact an NMI or we run irq_work_run from
the PMI-tail (ARM etc.)
- tracepoint: nmi=0; since tracepoint could be from NMI context.
- software: nmi=[0,1]; some, like the schedule thing cannot
perform wakeups, and hence need 0.

As one can see, there is very little nmi=1 usage, and the down-side of
not using it is that on some platforms some software events can have a
jiffy delay in wakeup (when arch_irq_work_raise isn't implemented).

The up-side however is that we can remove the nmi parameter and save a
bunch of conditionals in fast paths.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Michael Cree <mcree@orcon.net.nz>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
Cc: Anton Blanchard <anton@samba.org>
Cc: Eric B Munson <emunson@mgebm.net>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Don Zickus <dzickus@redhat.com>
Link: http://lkml.kernel.org/n/tip-agjev8eu666tvknpb3iaj0fg@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
364b97d9e2fec32b7c125f67e5a9e5f1cd0e6a37 26-Apr-2010 Paul Mundt <lethal@linux-sh.org> sh: Kill off dangling goto labels from oom-killer rework.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
6b6b18e62cfba44ce7b6489c7100f12b199232d7 22-Apr-2010 Nick Piggin <npiggin@suse.de> sh: invoke oom-killer from page fault

As explained in commit 1c0fe6e3bd, we want to call the architecture independent
oom killer when getting an unexplained OOM from handle_mm_fault, rather than
simply killing current.

Cc: linux-sh@vger.kernel.org
Cc: linux-arch@vger.kernel.org
Signed-off-by: Nick Piggin <npiggin@suse.de>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
4b3073e1c53a256275f1079c0fbfbe85883d9275 18-Dec-2009 Russell King <rmk+kernel@arm.linux.org.uk> MM: Pass a PTE pointer to update_mmu_cache() rather than the PTE itself

On VIVT ARM, when we have multiple shared mappings of the same file
in the same MM, we need to ensure that we have coherency across all
copies. We do this via make_coherent() by making the pages
uncacheable.

This used to work fine, until we allowed highmem with highpte - we
now have a page table which is mapped as required, and is not available
for modification via update_mmu_cache().

Ralf Baechle suggested getting rid of the PTE value passed to
update_mmu_cache():

On MIPS update_mmu_cache() calls __update_tlb() which walks pagetables
to construct a pointer to the pte again. Passing a pte_t * is much
more elegant. Maybe we might even replace the pte argument with the
pte_t?

Ben Herrenschmidt would also like the pte pointer for PowerPC:

Passing the ptep in there is exactly what I want. I want that
-instead- of the PTE value, because I have issue on some ppc cases,
for I$/D$ coherency, where set_pte_at() may decide to mask out the
_PAGE_EXEC.

So, pass in the mapped page table pointer into update_mmu_cache(), and
remove the PTE value, updating all implementations and call sites to
suit.

Includes a fix from Stephen Rothwell:

sparc: fix fallout from update_mmu_cache API change

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>

Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
5d9b4b19f118abfb75e352841f7bf74580d7e427 13-Dec-2009 Matt Fleming <matt@console-pimps.org> sh: Definitions for 3-level page table layout

If using 64-bit PTEs and 4K pages then each page table has 512 entries
(as opposed to 1024 entries with 32-bit PTEs). Unlike MIPS, SH follows
the convention that all structures in the page table (pgd_t, pmd_t,
pgprot_t, etc) must be the same size. Therefore, 64-bit PTEs require
64-bit PGD entries, etc. Using 2-levels of page tables and 64-bit PTEs
it is only possible to map 1GB of virtual address space.
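
The figures above follow from simple arithmetic, which the sketch below reproduces. It assumes that every table at every level occupies exactly one page; the helper names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Number of entries which fit in a table of one page. */
static uint64_t entries_per_table(uint64_t page_size, uint64_t entry_size)
{
    assert(entry_size != 0 && page_size % entry_size == 0);
    return page_size / entry_size;
}

/* Bytes of virtual address space reachable through `levels` full levels. */
static uint64_t mapped_bytes(uint64_t page_size, uint64_t entry_size,
                             unsigned int levels)
{
    uint64_t n = entries_per_table(page_size, entry_size);
    uint64_t span = page_size;

    while (levels--)
        span *= n;   /* each level multiplies the reach by n */
    return span;
}
```

With 4K pages, mapped_bytes(4096, 4, 2) gives 4GB for 32-bit entries, while mapped_bytes(4096, 8, 2) gives only 1GB for 64-bit entries, so covering 4GB needs a further level with four entries of 1GB each.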

In order to map all 4GB of virtual address space we need to adopt a
3-level page table layout. This actually works out better for
CONFIG_SUPERH32 because we only waste 2 PGD entries on the P1 and P2
areas (which are untranslated) instead of 256.

Signed-off-by: Matt Fleming <matt@console-pimps.org>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
cdd6c482c9ff9c55475ee7392ec8f672eddb7be6 21-Sep-2009 Ingo Molnar <mingo@elte.hu> perf: Do the big rename: Performance Counters -> Performance Events

Bye-bye Performance Counters, welcome Performance Events!

In the past few months the perfcounters subsystem has grown out of its
initial role of counting hardware events, and has become (and is
becoming) a much broader generic event enumeration, reporting, logging,
monitoring, analysis facility.

Naming its core object 'perf_counter' and naming the subsystem
'perfcounters' has become more and more of a misnomer. With pending
code like hw-breakpoints support the 'counter' name is less and
less appropriate.

All in all, we've decided to rename the subsystem to 'performance
events' and to propagate this rename through all fields, variables
and API names. (in an ABI compatible fashion)

The word 'event' is also a bit shorter than 'counter' - which makes
it slightly more convenient to write/handle as well.

Thanks go to Stephane Eranian who first observed this misnomer and
suggested a rename.

User-space tooling and ABI compatibility is not affected - this patch
should be function-invariant. (Also, defconfigs were not touched to
keep the size down.)

This patch has been generated via the following script:

  FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')

  sed -i \
    -e 's/PERF_EVENT_/PERF_RECORD_/g' \
    -e 's/PERF_COUNTER/PERF_EVENT/g' \
    -e 's/perf_counter/perf_event/g' \
    -e 's/nb_counters/nb_events/g' \
    -e 's/swcounter/swevent/g' \
    -e 's/tpcounter_event/tp_event/g' \
    $FILES

  for N in $(find . -name perf_counter.[ch]); do
    M=$(echo $N | sed 's/perf_counter/perf_event/g')
    mv $N $M
  done

  FILES=$(find . -name perf_event.*)

  sed -i \
    -e 's/COUNTER_MASK/REG_MASK/g' \
    -e 's/COUNTER/EVENT/g' \
    -e 's/\<event\>/event_id/g' \
    -e 's/counter/event/g' \
    -e 's/Counter/Event/g' \
    $FILES

... to keep it as correct as possible. This script can also be
used by anyone who has pending perfcounters patches - it converts
a Linux kernel tree over to the new naming. We tried to time this
change to the point in time where the amount of pending patches
is the smallest: the end of the merge window.

Namespace clashes were fixed up in a preparatory patch - and some
stylistic fallout will be fixed up in a subsequent patch.

( NOTE: 'counters' are still the proper terminology when we deal
with hardware registers - and these sed scripts are a bit
over-eager in renaming them. I've undone some of that, but
in case there's something left where 'counter' would be
better than 'event' we can undo that on an individual basis
instead of touching an otherwise nicely automated patch. )

Suggested-by: Stephane Eranian <eranian@google.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Paul Mackerras <paulus@samba.org>
Reviewed-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <linux-arch@vger.kernel.org>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
0906a3ad33a254094fb74828e3ddb9af8771a6da 03-Sep-2009 Paul Mundt <lethal@linux-sh.org> sh: Fix up and optimize the kmap_coherent() interface.

This fixes up the kmap_coherent/kunmap_coherent() interface for recent
changes both in the page fault path and the shared cache flushers, as
well as adding in some optimizations.

One of the key things to note here is that the TLB flush itself is
deferred until the unmap, and the call in to update_mmu_cache() itself
goes away, relying on the regular page fault path to handle the lazy
dcache writeback if necessary.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
8010fbe7a67c2f993cbb11b9d8b7e98528256dd1 14-Aug-2009 Paul Mundt <lethal@linux-sh.org> sh: TLB fast path optimizations for load/store exceptions.

This only bothers with the TLB entry flush in the case of the initial
page write exception, as it is unnecessary in the case of the load/store
exceptions.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
112e58471de3431fbd03dee514777ad4a66a77b2 14-Aug-2009 Paul Mundt <lethal@linux-sh.org> sh: TLB protection violation exception optimizations.

This adds a bit of rework to have the TLB protection violations skip the
TLB miss fastpath and go directly in to do_page_fault(), as these require
slow path handling.

Based on an earlier patch by SUGIOKA Toshinobu.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
05dd2cd3bb3299540e33ff60c5b401dd88f273bd 13-Jul-2009 Matt Fleming <matt@console-pimps.org> sh: Restore previous behaviour on kernel fault

The last commit changed the behaviour on kernel faults when we were
doing something other than syncing the page tables. vmalloc_sync_one()
needs to return NULL if the page tables are up to date, because the
reason for the fault was not a missing/inconsistent page table entry. By
returning NULL if the page tables are sync'd we signal to the calling
function that further work must be done to resolve this fault.
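
The return convention described above can be sketched on a single directory entry. This is a reduced model, not the body of vmalloc_sync_one(): dir_entry_t and model_sync_one() are hypothetical, and a NULL table pointer stands in for a "not present" entry.

```c
#include <assert.h>
#include <stddef.h>

/* One directory entry; a NULL table means "not present". */
typedef struct { void *table; } dir_entry_t;

/*
 * Bring `cur` into line with the reference entry `ref`.
 * Returns ref only if an entry was actually installed, NULL otherwise.
 */
static dir_entry_t *model_sync_one(dir_entry_t *cur, dir_entry_t *ref)
{
    if (ref->table == NULL)
        return NULL;             /* the reference has no mapping either */

    if (cur->table == NULL) {
        cur->table = ref->table; /* repair the missing entry */
        return ref;
    }

    /* Already in sync, so the fault must have some other cause. */
    assert(cur->table == ref->table);
    return NULL;
}
```

A caller which sees NULL knows that the page tables were not the problem and has to carry on with the rest of the fault handling.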

Also, remove the superfluous __va() around the first argument to
vmalloc_sync_one(). The value of pgd_k is already a virtual address and
using it with __va() causes a NULL dereference.

Signed-off-by: Matt Fleming <matt@console-pimps.org>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
0f60bb25b4036d30fd795709be09626c58c52464 04-Jul-2009 Paul Mundt <lethal@linux-sh.org> sh: Tidy up vmalloc fault handling.

This rewrites the vmalloc fault handling as per x86, which subsequently
allows for easy future tie-in for vmalloc_sync_all().

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
c63c3105e4991b2991ba73a742b8b59bfdbe4acd 04-Jul-2009 Paul Mundt <lethal@linux-sh.org> sh: use kprobes_built_in() for notify_page_fault().

Kill off the KPROBES ifdef, as per x86.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
7433ab770327b471399f7b5baacad78e160b5393 24-Jun-2009 Paul Mundt <lethal@linux-sh.org> sh: Hook up page fault events for software perf counters.

This adds page fault instrumentation for the software performance
counters. Follows the x86 and powerpc changes.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
d06063cc221fdefcab86589e79ddfdb7c0e14b63 10-Apr-2009 Linus Torvalds <torvalds@linux-foundation.org> Move FAULT_FLAG_xyz into handle_mm_fault() callers

This allows the callers to now pass down the full set of FAULT_FLAG_xyz
flags to handle_mm_fault(). All callers have been (mechanically)
converted to the new calling convention; there's almost certainly room
for architectures to clean up their code and then add FAULT_FLAG_RETRY
when that support is added.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4505ffda54b352a08eb08ebad62ac48725c41966 18-Jun-2009 Christoph Hellwig <hch@lst.de> sh: remove stray markers.

arch/sh has a couple of stray markers without any users introduced
in commit 3d58695edbfac785161bf282dc11fd42a483d6c9. Remove them in
preparation of removing the markers in favour of the TRACE_EVENT
macro (and also because we don't keep dead code around).

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
ab6e570ba33dbee18c2520d386e0f367a9b573c3 11-Dec-2008 Paul Mundt <lethal@linux-sh.org> sh: Generic kgdb stub support.

This migrates from the old bitrotted kgdb stub implementation and moves
to the generic stub. In the process support for SH-2/SH-2A is also added,
which the old stub never provided.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
716777db7270255f1f7210fd87a7188b08c9a267 25-Nov-2008 Magnus Damm <damm@igel.co.jp> sh: P4 ioremap pass-through

This patch adds a pass-through case when ioremapping P4 addresses.

Addresses passed to ioremap() should be physical addresses, so the
best option is usually to convert the virtual address to a physical
address before calling ioremap. This will give you a virtual address
in P2 which matches the physical address and this works well for
most internal hardware blocks on the SuperH architecture.

However, some hardware blocks must be accessed through P4. Converting
the P4 address to a physical and then back to a P2 does not work. One
example of this is the sh7722 TMU block, it must be accessed through P4.

Without this patch P4 addresses will be mapped using PTEs which
requires the page allocator to be up and running.

Signed-off-by: Magnus Damm <damm@igel.co.jp>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
3d58695edbfac785161bf282dc11fd42a483d6c9 21-Sep-2008 Paul Mundt <lethal@linux-sh.org> sh: Trivial trace_mark() instrumentation for core events.

This implements a few trace points across events that are deemed
interesting:

- The page fault handler / TLB miss
- IPC calls
- Kernel thread creation

The original LTTng patch had the slow-path instrumented, which
fails to account for the vast majority of events. In general
placing this in the fast-path is not a huge performance hit, as
we don't take page faults for kernel addresses.

The other bits of interest are some of the other trap handlers, as
well as the syscall entry/exit (which is better off being handled
through the tracehook API). Most of the other trap handlers are corner
cases where alternate means of notification exist, so there is little
value in placing extra trace points in these locations.

Based on top of the points provided both by the LTTng instrumentation
patch as well as the patch shipping in the ST-Linux tree, albeit in a
stripped down form.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
8f2baee28093ea77c7cc8da45049fd94cc76998e 20-Sep-2008 Paul Mundt <lethal@linux-sh.org> sh: Kill off duplicate page fault notifiers in slow path.

We already have hooks in place in the __do_page_fault() fast-path,
so kill them off in the slow path.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
887f1ae3bc1701604a7b5ef145e1021072675444 20-Sep-2008 Paul Mundt <lethal@linux-sh.org> sh: Look up the trap vector for the page fault notifier.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
037c10a612e8b7461e33672fb3848807ac6e2346 07-Sep-2008 Paul Mundt <lethal@linux-sh.org> sh: kprobes: Hook up kprobe_fault_handler() in the page fault path.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
96e14e54a6abd5a4bcd75e33962f87bef145d1f6 05-Sep-2008 Stuart Menefy <stuart.menefy@st.com> sh: vmalloc pgtable sync fix.

This fixes a problem in the code which copies the vmalloc portion of the
kernel's page table into the current user space page table. The addition
of the four level page table code breaks on folded page tables, because
the pud level is always present (although folded). This updates the code
to use the same style of updates for the pud as is used for the pgd
level.

Signed-off-by: Stuart Menefy <stuart.menefy@st.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
f2fb4e4f647dabf1177d3ce164988e73482d76b1 02-Jul-2008 Stuart Menefy <stuart.menefy@st.com> sh: Conditionally re-enable IRQs in fault path.

The current kernel behaviour is to reenable interrupts unconditionally
when taking a page fault. This patch changes this to only enable them
if interrupts were previously enabled.
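
A small model of the idea, assuming that the interrupted context's status register is saved in a register frame and that interrupts count as disabled when all of its interrupt mask bits are set. struct model_regs, MODEL_SR_IMASK and the global irqs_enabled flag are inventions for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_SR_IMASK 0x000000f0u  /* assumption: interrupt mask bits */

/* Saved state of the context which took the fault. */
struct model_regs { unsigned int sr; };

/* Stands in for the CPU's live interrupt state. */
static bool irqs_enabled;

/* Interrupts were enabled unless every mask bit was set. */
static bool irqs_were_enabled(const struct model_regs *regs)
{
    assert(regs != NULL);
    return (regs->sr & MODEL_SR_IMASK) != MODEL_SR_IMASK;
}

/* Entry to the fault path: restore, rather than force, the IRQ state. */
static void model_fault_entry(const struct model_regs *regs)
{
    irqs_enabled = false;           /* the exception arrives with IRQs off */
    if (irqs_were_enabled(regs))
        irqs_enabled = true;        /* only re-enable if they were on */
}
```

A fault taken inside a section which had interrupts disabled therefore stays that way while it is being handled.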

It also fixes a problem seen with this fix in place: the kernel previously
flushed the vsyscall page when handling a signal, which is not only
unnecessary, but caused a possible sleep with interrupts disabled.

Signed-off-by: Stuart Menefy <stuart.menefy@st.com>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
a602cc05f8fc849023e72e2857bd842f0104f648 14-Feb-2008 Hideo Saito <saito@densan.co.jp> sh: Fix multiple UTLB hit on UP SH-4.

This acts as a reversion of 1c6b2ca5e0939bf8b5d1a11f1646f25189ecd447 in
the case of UP SH-4, where we still have the risk of a multiple hit
between the slow and fast paths. As seen on SH7780.

Signed-off-by: Hideo Saito <saito@densan.co.jp>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
e7cc9a7340b8ec018caa9eb1d035fdaef1f2fc51 07-Feb-2008 Magnus Damm <magnus.damm@gmail.com> sh: trapped io support V2

The idea is that we want to get rid of the in/out/readb/writeb callbacks from
the machvec and replace that with simple inline read and write operations to
memory. Fast and simple for most hardware devices (think pci).

Some devices require special treatment though - like 16-bit only CF devices -
so we need to have some method to hook in callbacks.

This patch makes it possible to add a per-device trap generating filter. This
way we can get maximum performance of sane hardware - which doesn't need this
filter - and crappy hardware works but gets punished by a performance hit.

V2 changes things around a bit and replaces io access callbacks with a
simple minimum_bus_width value. In the future we can add stride as well.

Signed-off-by: Magnus Damm <damm@igel.co.jp>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
b62ad83d91ebf1368e9e72d476b18698ac67bef9 10-Jan-2008 Paul Mundt <lethal@linux-sh.org> sh: Correct pte size mismatch for X2 TLB.

Fixes up a build warning/error in arch/sh/mm/fault_32.c.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
811d50cb43eb730cc325df0c6913556e25739797 20-Nov-2007 Paul Mundt <lethal@linux-sh.org> sh: Move in the SH-5 TLB miss.

Signed-off-by: Paul Mundt <lethal@linux-sh.org>