8d9a784d1e2c75e0dcae06f77a02f5e7bb547f3a |
|
14-Feb-2012 |
Stuart Menefy <stuart.menefy@st.com> |
sh: Fix error synchronising kernel page tables The problem is caused by the interaction of two features in the Linux memory management code. A processes address space is described by a struct mm_struct, and every thread has a pointer to the mm it should run in. The exception to this are kernel threads, which don't have an mm, and so borrow the mm from the last thread which ran. The system is bootstrapped by the initial kernel thread using init's mm (even though init hasn't been created yet, its mm is the static init_mm). The other feature is how the kernel handles the page table which describes the portion of the address space which is only visible when executing inside the kernel, and which is shared by all threads. On the SH4 the only portion of the kernel's address space which described using the page table is called P3, from 0xc0000000 to 0xdfffffff. This portion of the address space is divided into three: - mappings for dma_alloc_coherent() - mappings for vmalloc() and ioremap() - fixmap mappings, primarily used in copy_user_pages() to create kernel mappings of user pages with the correct cache colour. To optimise the TLB miss handler we don't want to add an additional condition which checks whether the faulting address is in the user or the kernel portion of the address space, and so all page tables have a common portion which describes the kernel part of the address space. As the SH4 uses a two level page table, only the kernel portion of first level page table (the pgd entries) is duplicated. These all point to the same second level entries (the pte's), and so no memory is wasted. The reference page table for the kernel is called the swapper_pg_dir, and when a new page table is created for a new process the kernel portion of the page table is copied from swapper_pg_dir. This works fine when changes only occur in the second level of the kernel's page table, or the first level entries are created before any new user processes. However if a change occurs to the first level of the page table, and there are existing processes which don't have this entry in their page table, this new entry needs to be added. This is done on demand, when the kernel accesses a P3 address which isn't mapped using the current page table, the code in vmalloc_fault() copies the entry from the reference page table (swapper_pg_dir) into the current processes page table. The bug which this patch addresses is that the code in vmalloc_fault() was not copying addresses which fell in the dma_alloc_coherent() portion of the address space, and it should have been copying any P3 address. Why we hadn't seen this before, and what made this hard to reproduce, is that normally the kernel will have called dma_alloc_coherent(), and accessed the memory mapping created, before any user process runs. Typically drivers such as USB or SATA will have created and used mappings of this type during the kernel initialisation, when probing for the attached devices, before init runs. Ethernet is slightly different, as it normally only creates and accesses dma_alloc_coherent() mappings when the network is brought up, but if kernel level IP configuration is used this will also occur before any user space process runs. So the first reproduction of this problem which we saw was occurred when USB and SATA were removed from the kernel, and then bring up Ethernet from user space using ifconfig. I'd like to thank Joseph Bormolini who did the hard work reducing the problem to this simple to reproduce criteria. In your case the situation is slightly different, and turns out to depends on the exact kernel configuration (which we had) and your ramdisk contents (which we didn't - hence the need for some assumptions). In this case the problem is a side effect of kernel level module loading. Kernel subsystems sometimes trigger the load of kernel modules directly, for example the crypto subsystem tries to load the cryptomgr and MTD tries to load modules for Flash partitioning if these are not built into the kernel. This is done by the kernel creating a user process which runs insmod to try and load the appropriate module. In order for this to cause problems the system must be running with a initrd or initramfs, which contains an insmod executable - if the kernel can't find an insmod to run, no user process is created, and the problem doesn't occur. If an insmod is found, a process is created to run it, which will inherit the kernel portion of the swapper_pg_dir first level page table. It doesn't matter whether the inmod is successful or not, but when the the kernel scheduler context switches back to the kernel initialisation thread, the insmod's mm is 'borrowed' by the kernel thread, as it doesn't have an address space of its own. (Reference counting is used to ensure this mm is not destroyed, even though the user process which caused its creation may no longer exist.) If this address space doesn't have a first level page table entry for the consistent mappings, and a driver tries to access such a mapping, we are in the same situation as described above, except this time in a kernel thread rather than a user thread executing inside the kernel. See bugzilla: 15425, 15836, 15862, 16106, 16793 Signed-off-by: Stuart Menefy <stuart.menefy@st.com> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
e839ca528718e68cad32a307dc9aabf01ef3eb05 |
|
28-Mar-2012 |
David Howells <dhowells@redhat.com> |
Disintegrate asm/system.h for SH Disintegrate asm/system.h for SH. Signed-off-by: David Howells <dhowells@redhat.com> cc: linux-sh@vger.kernel.org
|
a8b0ca17b80e92faab46ee7179ba9e99ccb61233 |
|
27-Jun-2011 |
Peter Zijlstra <a.p.zijlstra@chello.nl> |
perf: Remove the nmi parameter from the swevent and overflow interface The nmi parameter indicated if we could do wakeups from the current context, if not, we would set some state and self-IPI and let the resulting interrupt do the wakeup. For the various event classes: - hardware: nmi=0; PMI is in fact an NMI or we run irq_work_run from the PMI-tail (ARM etc.) - tracepoint: nmi=0; since tracepoint could be from NMI context. - software: nmi=[0,1]; some, like the schedule thing cannot perform wakeups, and hence need 0. As one can see, there is very little nmi=1 usage, and the down-side of not using it is that on some platforms some software events can have a jiffy delay in wakeup (when arch_irq_work_raise isn't implemented). The up-side however is that we can remove the nmi parameter and save a bunch of conditionals in fast paths. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Michael Cree <mcree@orcon.net.nz> Cc: Will Deacon <will.deacon@arm.com> Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com> Cc: Anton Blanchard <anton@samba.org> Cc: Eric B Munson <emunson@mgebm.net> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Paul Mundt <lethal@linux-sh.org> Cc: David S. Miller <davem@davemloft.net> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jason Wessel <jason.wessel@windriver.com> Cc: Don Zickus <dzickus@redhat.com> Link: http://lkml.kernel.org/n/tip-agjev8eu666tvknpb3iaj0fg@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
364b97d9e2fec32b7c125f67e5a9e5f1cd0e6a37 |
|
26-Apr-2010 |
Paul Mundt <lethal@linux-sh.org> |
sh: Kill off dangling goto labels from oom-killer rework. Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
6b6b18e62cfba44ce7b6489c7100f12b199232d7 |
|
22-Apr-2010 |
Nick Piggin <npiggin@suse.de> |
sh: invoke oom-killer from page fault As explained in commit 1c0fe6e3bd, we want to call the architecture independent oom killer when getting an unexplained OOM from handle_mm_fault, rather than simply killing current. Cc: linux-sh@vger.kernel.org Cc: linux-arch@vger.kernel.org Signed-off-by: Nick Piggin <npiggin@suse.de> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
4b3073e1c53a256275f1079c0fbfbe85883d9275 |
|
18-Dec-2009 |
Russell King <rmk+kernel@arm.linux.org.uk> |
MM: Pass a PTE pointer to update_mmu_cache() rather than the PTE itself On VIVT ARM, when we have multiple shared mappings of the same file in the same MM, we need to ensure that we have coherency across all copies. We do this via make_coherent() by making the pages uncacheable. This used to work fine, until we allowed highmem with highpte - we now have a page table which is mapped as required, and is not available for modification via update_mmu_cache(). Ralf Beache suggested getting rid of the PTE value passed to update_mmu_cache(): On MIPS update_mmu_cache() calls __update_tlb() which walks pagetables to construct a pointer to the pte again. Passing a pte_t * is much more elegant. Maybe we might even replace the pte argument with the pte_t? Ben Herrenschmidt would also like the pte pointer for PowerPC: Passing the ptep in there is exactly what I want. I want that -instead- of the PTE value, because I have issue on some ppc cases, for I$/D$ coherency, where set_pte_at() may decide to mask out the _PAGE_EXEC. So, pass in the mapped page table pointer into update_mmu_cache(), and remove the PTE value, updating all implementations and call sites to suit. Includes a fix from Stephen Rothwell: sparc: fix fallout from update_mmu_cache API change Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
|
5d9b4b19f118abfb75e352841f7bf74580d7e427 |
|
13-Dec-2009 |
Matt Fleming <matt@console-pimps.org> |
sh: Definitions for 3-level page table layout If using 64-bit PTEs and 4K pages then each page table has 512 entries (as opposed to 1024 entries with 32-bit PTEs). Unlike MIPS, SH follows the convention that all structures in the page table (pgd_t, pmd_t, pgprot_t, etc) must be the same size. Therefore, 64-bit PTEs require 64-bit PGD entries, etc. Using 2-levels of page tables and 64-bit PTEs it is only possible to map 1GB of virtual address space. In order to map all 4GB of virtual address space we need to adopt a 3-level page table layout. This actually works out better for CONFIG_SUPERH32 because we only waste 2 PGD entries on the P1 and P2 areas (which are untranslated) instead of 256. Signed-off-by: Matt Fleming <matt@console-pimps.org> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
cdd6c482c9ff9c55475ee7392ec8f672eddb7be6 |
|
21-Sep-2009 |
Ingo Molnar <mingo@elte.hu> |
perf: Do the big rename: Performance Counters -> Performance Events Bye-bye Performance Counters, welcome Performance Events! In the past few months the perfcounters subsystem has grown out its initial role of counting hardware events, and has become (and is becoming) a much broader generic event enumeration, reporting, logging, monitoring, analysis facility. Naming its core object 'perf_counter' and naming the subsystem 'perfcounters' has become more and more of a misnomer. With pending code like hw-breakpoints support the 'counter' name is less and less appropriate. All in one, we've decided to rename the subsystem to 'performance events' and to propagate this rename through all fields, variables and API names. (in an ABI compatible fashion) The word 'event' is also a bit shorter than 'counter' - which makes it slightly more convenient to write/handle as well. Thanks goes to Stephane Eranian who first observed this misnomer and suggested a rename. User-space tooling and ABI compatibility is not affected - this patch should be function-invariant. (Also, defconfigs were not touched to keep the size down.) This patch has been generated via the following script: FILES=$(find * -type f | grep -vE 'oprofile|[^K]config') sed -i \ -e 's/PERF_EVENT_/PERF_RECORD_/g' \ -e 's/PERF_COUNTER/PERF_EVENT/g' \ -e 's/perf_counter/perf_event/g' \ -e 's/nb_counters/nb_events/g' \ -e 's/swcounter/swevent/g' \ -e 's/tpcounter_event/tp_event/g' \ $FILES for N in $(find . -name perf_counter.[ch]); do M=$(echo $N | sed 's/perf_counter/perf_event/g') mv $N $M done FILES=$(find . -name perf_event.*) sed -i \ -e 's/COUNTER_MASK/REG_MASK/g' \ -e 's/COUNTER/EVENT/g' \ -e 's/\<event\>/event_id/g' \ -e 's/counter/event/g' \ -e 's/Counter/Event/g' \ $FILES ... to keep it as correct as possible. This script can also be used by anyone who has pending perfcounters patches - it converts a Linux kernel tree over to the new naming. We tried to time this change to the point in time where the amount of pending patches is the smallest: the end of the merge window. Namespace clashes were fixed up in a preparatory patch - and some stylistic fallout will be fixed up in a subsequent patch. ( NOTE: 'counters' are still the proper terminology when we deal with hardware registers - and these sed scripts are a bit over-eager in renaming them. I've undone some of that, but in case there's something left where 'counter' would be better than 'event' we can undo that on an individual basis instead of touching an otherwise nicely automated patch. ) Suggested-by: Stephane Eranian <eranian@google.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Paul Mackerras <paulus@samba.org> Reviewed-by: Arjan van de Ven <arjan@linux.intel.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: David Howells <dhowells@redhat.com> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: <linux-arch@vger.kernel.org> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
0906a3ad33a254094fb74828e3ddb9af8771a6da |
|
03-Sep-2009 |
Paul Mundt <lethal@linux-sh.org> |
sh: Fix up and optimize the kmap_coherent() interface. This fixes up the kmap_coherent/kunmap_coherent() interface for recent changes both in the page fault path and the shared cache flushers, as well as adding in some optimizations. One of the key things to note here is that the TLB flush itself is deferred until the unmap, and the call in to update_mmu_cache() itself goes away, relying on the regular page fault path to handle the lazy dcache writeback if necessary. Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
8010fbe7a67c2f993cbb11b9d8b7e98528256dd1 |
|
14-Aug-2009 |
Paul Mundt <lethal@linux-sh.org> |
sh: TLB fast path optimizations for load/store exceptions. This only bothers with the TLB entry flush in the case of the initial page write exception, as it is unecessary in the case of the load/store exceptions. Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
112e58471de3431fbd03dee514777ad4a66a77b2 |
|
14-Aug-2009 |
Paul Mundt <lethal@linux-sh.org> |
sh: TLB protection violation exception optimizations. This adds a bit of rework to have the TLB protection violations skip the TLB miss fastpath and go directly in to do_page_fault(), as these require slow path handling. Based on an earlier patch by SUGIOKA Toshinobu. Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
05dd2cd3bb3299540e33ff60c5b401dd88f273bd |
|
13-Jul-2009 |
Matt Fleming <matt@console-pimps.org> |
sh: Restore previous behaviour on kernel fault The last commit changed the behaviour on kernel faults when we were doing something other than syncing the page tables. vmalloc_sync_one() needs to return NULL if the page tables are up to date, because the reason for the fault was not a missing/inconsitent page table entry. By returning NULL if the page tables are sync'd we signal to the calling function that further work must be done to resolve this fault. Also, remove the superfluous __va() around the first argument to vmalloc_sync_one(). The value of pgd_k is already a virtual address and using it wth __va() causes a NULL dereference. Signed-off-by: Matt Fleming <matt@console-pimps.org> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
0f60bb25b4036d30fd795709be09626c58c52464 |
|
04-Jul-2009 |
Paul Mundt <lethal@linux-sh.org> |
sh: Tidy up vmalloc fault handling. This rewrites the vmalloc fault handling as per x86, which subsequently allows for easy future tie-in for vmalloc_sync_all(). Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
c63c3105e4991b2991ba73a742b8b59bfdbe4acd |
|
04-Jul-2009 |
Paul Mundt <lethal@linux-sh.org> |
sh: use kprobes_built_in() for notify_page_fault(). Kill off the KPROBES ifdef, as per x86. Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
7433ab770327b471399f7b5baacad78e160b5393 |
|
24-Jun-2009 |
Paul Mundt <lethal@linux-sh.org> |
sh: Hook up page fault events for software perf counters. This adds page fault instrumentation for the software performance counters. Follows the x86 and powerpc changes. Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
d06063cc221fdefcab86589e79ddfdb7c0e14b63 |
|
10-Apr-2009 |
Linus Torvalds <torvalds@linux-foundation.org> |
Move FAULT_FLAG_xyz into handle_mm_fault() callers This allows the callers to now pass down the full set of FAULT_FLAG_xyz flags to handle_mm_fault(). All callers have been (mechanically) converted to the new calling convention, there's almost certainly room for architectures to clean up their code and then add FAULT_FLAG_RETRY when that support is added. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
4505ffda54b352a08eb08ebad62ac48725c41966 |
|
18-Jun-2009 |
Christoph Hellwig <hch@lst.de> |
sh: remove stray markers. arch/sh has a couple of stray markers without any users introduced in commit 3d58695edbfac785161bf282dc11fd42a483d6c9. Remove them in preparation of removing the markers in favour of the TRACE_EVENT macro (and also because we don't keep dead code around). Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
ab6e570ba33dbee18c2520d386e0f367a9b573c3 |
|
11-Dec-2008 |
Paul Mundt <lethal@linux-sh.org> |
sh: Generic kgdb stub support. This migrates from the old bitrotted kgdb stub implementation and moves to the generic stub. In the process support for SH-2/SH-2A is also added, which the old stub never provided. Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
716777db7270255f1f7210fd87a7188b08c9a267 |
|
25-Nov-2008 |
Magnus Damm <damm@igel.co.jp> |
sh: P4 ioremap pass-through This patch adds a pass-through case when ioremapping P4 addresses. Addresses passed to ioremap() should be physical addresses, so the best option is usually to convert the virtual address to a physical address before calling ioremap. This will give you a virtual address in P2 which matches the physical address and this works well for most internal hardware blocks on the SuperH architecture. However, some hardware blocks must be accessed through P4. Converting the P4 address to a physical and then back to a P2 does not work. One example of this is the sh7722 TMU block, it must be accessed through P4. Without this patch P4 addresses will be mapped using PTEs which requires the page allocator to be up and running. Signed-off-by: Magnus Damm <damm@igel.co.jp> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
3d58695edbfac785161bf282dc11fd42a483d6c9 |
|
21-Sep-2008 |
Paul Mundt <lethal@linux-sh.org> |
sh: Trivial trace_mark() instrumentation for core events. This implements a few trace points across events that are deemed interesting. This implements a number of trace points: - The page fault handler / TLB miss - IPC calls - Kernel thread creation The original LTTng patch had the slow-path instrumented, which fails to account for the vast majority of events. In general placing this in the fast-path is not a huge performance hit, as we don't take page faults for kernel addresses. The other bits of interest are some of the other trap handlers, as well as the syscall entry/exit (which is better off being handled through the tracehook API). Most of the other trap handlers are corner cases where alternate means of notification exist, so there is little value in placing extra trace points in these locations. Based on top of the points provided both by the LTTng instrumentation patch as well as the patch shipping in the ST-Linux tree, albeit in a stripped down form. Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
8f2baee28093ea77c7cc8da45049fd94cc76998e |
|
20-Sep-2008 |
Paul Mundt <lethal@linux-sh.org> |
sh: Kill off duplicate page fault notifiers in slow path. We already have hooks in place in the __do_page_fault() fast-path, so kill them off in the slow path. Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
887f1ae3bc1701604a7b5ef145e1021072675444 |
|
20-Sep-2008 |
Paul Mundt <lethal@linux-sh.org> |
sh: Look up the trap vector for the page fault notifier. Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
037c10a612e8b7461e33672fb3848807ac6e2346 |
|
07-Sep-2008 |
Paul Mundt <lethal@linux-sh.org> |
sh: kprobes: Hook up kprobe_fault_handler() in the page fault path. Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
96e14e54a6abd5a4bcd75e33962f87bef145d1f6 |
|
05-Sep-2008 |
Stuart Menefy <stuart.menefy@st.com> |
sh: vmalloc pgtable sync fix. This fixes a problem in the code which copies the vmalloc portion of the kernel's page table into the current user space page table. The addition of the four level page table code breaks on folded page tables, because the pud level is always present (although folded). This updates the code to use the same style of updates for the pud as is used for the pgd level. Signed-off-by: Stuart Menefy <stuart.menefy@st.com> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
f2fb4e4f647dabf1177d3ce164988e73482d76b1 |
|
02-Jul-2008 |
Stuart Menefy <stuart.menefy@st.com> |
sh: Conditionally re-enable IRQs in fault path. The current kernel behaviour is to reenable interrupts unconditionally when taking a page fault. This patch changes this to only enable them if interrupts were previously enabled. It also fixes a problem seen with this fix in place: the kernel previously flushed the vsyscall page when handling a signal, which is not only unncessary, but caused a possible sleep with interrupts disabled. Signed-off-by: Stuart Menefy <stuart.menefy@st.com> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
a602cc05f8fc849023e72e2857bd842f0104f648 |
|
14-Feb-2008 |
Hideo Saito <saito@densan.co.jp> |
sh: Fix multiple UTLB hit on UP SH-4. This acts as a reversion of 1c6b2ca5e0939bf8b5d1a11f1646f25189ecd447 in the case of UP SH-4, where we still have the risk of a multiple hit between the slow and fast paths. As seen on SH7780. Signed-off-by: Hideo Saito <saito@densan.co.jp> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
e7cc9a7340b8ec018caa9eb1d035fdaef1f2fc51 |
|
07-Feb-2008 |
Magnus Damm <magnus.damm@gmail.com> |
sh: trapped io support V2 The idea is that we want to get rid of the in/out/readb/writeb callbacks from the machvec and replace that with simple inline read and write operations to memory. Fast and simple for most hardware devices (think pci). Some devices require special treatment though - like 16-bit only CF devices - so we need to have some method to hook in callbacks. This patch makes it possible to add a per-device trap generating filter. This way we can get maximum performance of sane hardware - which doesn't need this filter - and crappy hardware works but gets punished by a performance hit. V2 changes things around a bit and replaces io access callbacks with a simple minimum_bus_width value. In the future we can add stride as well. Signed-off-by: Magnus Damm <damm@igel.co.jp> Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
b62ad83d91ebf1368e9e72d476b18698ac67bef9 |
|
10-Jan-2008 |
Paul Mundt <lethal@linux-sh.org> |
sh: Correct pte size mismatch for X2 TLB. Fixes up a build warning/error in arch/sh/mm/fault_32.c. Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|
811d50cb43eb730cc325df0c6913556e25739797 |
|
20-Nov-2007 |
Paul Mundt <lethal@linux-sh.org> |
sh: Move in the SH-5 TLB miss. Signed-off-by: Paul Mundt <lethal@linux-sh.org>
|