History log of /arch/arc/mm/tlb.c
Revision Date Author Comments
56372082533afb859e6d64707859349a2ee171bf 25-Sep-2014 Vineet Gupta <vgupta@synopsys.com> ARC: boot: cpu feature print enhancements

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
5ea72a90261552ed5fdca35239feb6cba498301e 27-Oct-2013 Vineet Gupta <vgupta@synopsys.com> ARC: [SMP] TLB flush

- Add mm_cpumask setting (aggregating only, unlike some other arches)
used to restrict the TLB flush cross-calling

- cross-calling versions of TLB flush routines (thanks to Noam)
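
A hedged sketch of how the two pieces fit together (the tlb_args struct
and the ipi_* helper names are illustrative; on_each_cpu_mask() is the
generic kernel primitive): switch_mm() only ever sets bits in
mm_cpumask() (aggregation), and each flush routine cross-calls just the
CPUs recorded there.

/* Sketch: mm_cpumask-restricted TLB shootdown; names illustrative */
struct tlb_args {
        struct vm_area_struct *ta_vma;
        unsigned long ta_start;
};

static void ipi_flush_tlb_page(void *arg)
{
        struct tlb_args *ta = arg;

        local_flush_tlb_page(ta->ta_vma, ta->ta_start);
}

void flush_tlb_page(struct vm_area_struct *vma, unsigned long uaddr)
{
        struct tlb_args ta = { .ta_vma = vma, .ta_start = uaddr };

        /* IPI only the CPUs whose bit was ever set for this mm */
        on_each_cpu_mask(mm_cpumask(vma->vm_mm), ipi_flush_tlb_page, &ta, 1);
}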

Signed-off-by: Noam Camus <noamc@ezchip.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
63eca94ca206e342bad4a06a86d8e7eda3053a4e 23-Aug-2013 Vineet Gupta <vgupta@synopsys.com> ARC: [SMP] ASID allocation

- Track a per-CPU ASID counter
- Track an mm's ASID per CPU (multiple threads, or an mm migrating around)

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
0a4c40a3b702730c8b1ad0952e6501e84fadd395 27-Sep-2013 Vineet Gupta <vgupta@synopsys.com> ARC: Fix bogus gcc warning and micro-optimise TLB iteration loop

------------------>8----------------------
arch/arc/mm/tlb.c: In function ‘do_tlb_overlap_fault’:
arch/arc/mm/tlb.c:688:13: warning: array subscript is above array bounds
[-Warray-bounds]
(pd0[n] & PAGE_MASK)) {
^
------------------>8----------------------

While at it, remove the useless last iteration of the outer loop when
scanning a TLB SET for duplicate entries.
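
The scan shape, as a hedged sketch (pd0[] caching each way's PD0 word
and the way count are illustrative names): entry n is only compared
against entries n+1 onwards, so the outer loop can stop at the
second-to-last way.

/* Sketch: find duplicate entries within one TLB set */
for (way = 0; way < n_ways - 1; way++) {        /* last way: no partner left */
        if (!pd0[way])
                continue;                       /* invalid entry */

        for (n = way + 1; n < n_ways; n++) {
                if ((pd0[way] & PAGE_MASK) == (pd0[n] & PAGE_MASK))
                        pd0[n] = 0;             /* clobber the duplicate */
        }
}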

Suggested-by: Mischa Jonker <mjonker@synopsys.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
947bf103fcd2defa3bc4b7ebc6b05d0427bcde2d 26-Jul-2013 Vineet Gupta <vgupta@synopsys.com> ARC: [ASID] Track ASID allocation cycles/generations

This helps remove the asid-to-mm reverse map

While mm->context.id contains the ASID assigned to a process, our ASID
allocator also kept an asid_mm_map[] reverse map. In a new allocation
cycle (mm->ASID >= @asid_cache), the round-robin ASID allocator used the
map to check whether the new @asid_cache already belonged to some mm2
(from the previous cycle). If so, it could locate that mm via the
reverse map and mark its ASID as unallocated, forcing a refresh at the
time of switch_mm()

However, for SMP the reverse map has to be maintained per CPU, i.e. it
becomes two-dimensional, hence we got rid of it.

With the reverse map gone, it is NOT possible to reach out to the
current assignee. So instead we track the ASID allocation
generation/cycle, and on every switch_mm() check whether the CPU's
current ASID generation matches the mm's ASID; if not, it is refreshed.

(Based loosely on arch/sh implementation)
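
A hedged sketch of the generation scheme (mask and register names are
illustrative, and the counter is per-CPU in the real code): the
allocator counter keeps the 8-bit hardware ASID in its low byte and the
allocation cycle/generation in the upper bits, so checking an mm's ASID
for staleness is a single XOR-and-mask.

/* Sketch: ASID + generation packed into one word; names illustrative */
#define ASID_MASK       0xff            /* h/w ASID: low 8 bits */
#define CYCLE_MASK      (~ASID_MASK)    /* allocation generation: the rest */

static unsigned long asid_cache;        /* per-CPU in the real code */

static void get_new_mmu_context(struct mm_struct *mm)
{
        /* same generation: mm's ASID is still valid, just (re)program hw */
        if (!((mm->context.asid ^ asid_cache) & CYCLE_MASK))
                goto set_hw;

        if (!(++asid_cache & ASID_MASK))        /* 8-bit ASID rolled over */
                local_flush_tlb_all();          /* new cycle: purge stale entries */

        mm->context.asid = asid_cache;          /* generation rides along */
set_hw:
        write_aux_reg(ARC_REG_PID, mm->context.asid & ASID_MASK);
}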

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
3daa48d1d9bc44baa079d65e72ef2e3f1139ac03 24-Jul-2013 Vineet Gupta <vgupta@synopsys.com> ARC: [ASID] get_new_mmu_context() to conditionally allocate new ASID

ASID allocation changes/1

This patch does 2 things:

(1) get_new_mmu_context() NOW moves mm->ASID to a new value ONLY if it
was from a prev allocation cycle/generation OR if mm had no ASID
allocated (vs. before, when it would unconditionally move to a new ASID)

Callers desiring an unconditional ASID update, e.g. local_flush_tlb_mm()
(for invalidating the parent's address space at fork), need to first
force the parent to an unallocated ASID.

(2) get_new_mmu_context() always programs the MMU PID reg, whether the
ASID value is unchanged or new.

The gains are:
- consolidation of all asid alloc logic into get_new_mmu_context()
- avoiding code duplication in switch_mm() for PID reg setting
- Enables future change to fold activate_mm() into switch_mm()
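
A hedged caller-side sketch of both points (bodies are schematic, not
the exact kernel code):

#define NO_ASID 0       /* illustrative: generation 0 never matches a live cycle */

static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
                             struct task_struct *tsk)
{
        /* allocates a new ASID only if needed; always writes MMU PID reg */
        get_new_mmu_context(next);
}

void local_flush_tlb_mm(struct mm_struct *mm)
{
        mm->context.asid = NO_ASID;     /* force "unallocated" first */

        if (current->mm == mm)
                get_new_mmu_context(mm);        /* guaranteed fresh ASID */
}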

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
5bd87adf9b2ae5fa1bb469c68029b4eec06d6e03 23-Aug-2013 Vineet Gupta <vgupta@synopsys.com> ARC: [ASID] Refactor the TLB paranoid debug code

- Asm code already has the SW and HW ASID values, so they can be
passed to the printing routine.

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
c0857f5d0e747dbbf53d8f27bcf7d977aac33760 29-Aug-2013 Vineet Gupta <vgupta@synopsys.com> ARC: No need to flush the TLB in early boot

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
483e9bcb01432ce66448c214bd0afc231da48b4b 01-Jul-2013 Vineet Gupta <vgupta@synopsys.com> ARC: MMUv4 preps/3 - Abstract out TLB Insert/Delete

This reorganizes the current TLB operations into pseudo-ops to better
pair with MMUv4's native Insert/Delete operations
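
A hedged sketch of the insert pseudo-op (the aux-reg and command names
follow the ARC700 convention, but the exact sequence here is
illustrative): on MMUv1-3 an insert is a probe-then-write sequence,
which MMUv4 can later collapse into its native Insert operation behind
the same interface.

/* Sketch: TLB insert as a pseudo-op */
static void tlb_entry_insert(unsigned int pd0, unsigned int pd1)
{
        write_aux_reg(ARC_REG_TLBPD0, pd0);
        write_aux_reg(ARC_REG_TLBPD1, pd1);

        /* reuse an existing entry for this vaddr+ASID, else pick a victim */
        write_aux_reg(ARC_REG_TLBCOMMAND, TLBProbe);
        if (read_aux_reg(ARC_REG_TLBINDEX) & TLB_LKUP_ERR)
                write_aux_reg(ARC_REG_TLBCOMMAND, TLBGetIndex);

        write_aux_reg(ARC_REG_TLBCOMMAND, TLBWrite);    /* commit */
}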

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
d091fcb97ff48a5cb6de19ad0881fb2c8e76dbc0 17-Jun-2013 Vineet Gupta <vgupta@synopsys.com> ARC: MMUv4 preps/2 - Reshuffle PTE bits

With previous commit freeing up PTE bits, reassign them so as to:

- Match the bit to its H/w counterpart where possible
(e.g. MMUv2 GLOBAL/PRESENT; this avoids a shift in create_tlb())
- Avoid holes in _PAGE_xxx definitions

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
64b703ef276964b160a5e88df0764f254460cafb 17-Jun-2013 Vineet Gupta <vgupta@synopsys.com> ARC: MMUv4 preps/1 - Fold PTE K/U access flags

The current ARC VM code has 13 flags in the Page Table entry: some
software (accessed/dirty/non-linear-maps) and the rest hardware specific.
With an 8k MMU page we need 19 bits for addressing the page frame, so the
remaining 13 bits are just about enough to accommodate the current flags.

In MMUv4 there are 2 additional flags, SZ (normal or super page) and WT
(cache access mode write-thru) - and additionally the PFN is 20 bits (vs.
19 before for 8k). Thus these can't be held in the current PTE w/o making
each entry 64 bits wide.

It seems there is some scope for compressing the current PTE flags (and
freeing up a few bits). Currently the PTE contains fully orthogonal,
distinct access permissions for kernel and user mode (Kr, Kw, Kx; Ur,
Uw, Ux) which can be folded into one set (R, W, X). The translation of 3
PTE bits into 6 TLB bits (when programming the MMU) can be done based on
the following pre-requisites/assumptions:

1. For kernel-mode-only translations (vmalloc: 0x7000_0000 to
0x7FFF_FFFF), PTE additionally has PAGE_GLOBAL flag set (and user
space entries can never be global). Thus such a PTE can translate
to Kr, Kw, Kx (as appropriate) and zero for User mode counterparts.

2. For non global entries, the PTE flags can be used to create mirrored
K and U TLB bits. This is true after commit a950549c675f2c8c504
"ARC: copy_(to|from)_user() to honor usermode-access permissions"
which ensured that user-space translations _MUST_ have same access
permissions for both U/K mode accesses so that copy_{to,from}_user()
play fair with fault based CoW break and such...

There is no such thing as a free lunch - the cost is slightly inflated
TLB-Miss Handlers.
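
A hedged sketch of the unfold done at TLB-programming time (flag and
shift names are illustrative): the PTE keeps a single R/W/X set, and
PAGE_GLOBAL decides whether the user half of the TLB permissions stays
zero.

/* Sketch: expand 3 folded PTE permission bits into 6 TLB bits */
static unsigned int pte_perms_to_tlb(unsigned long pte)
{
        unsigned int rwx = pte & (_PAGE_READ | _PAGE_WRITE | _PAGE_EXECUTE);
        unsigned int tlb = rwx << TLB_KERN_SHIFT;       /* Kr, Kw, Kx */

        /* kernel-only (global, e.g. vmalloc) pages: user half stays 0 */
        if (!(pte & _PAGE_GLOBAL))
                tlb |= rwx << TLB_USER_SHIFT;           /* mirrored Ur, Uw, Ux */

        return tlb;
}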

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
ce7599567e27eabc1003e35b6f05579268dafecd 24-Jun-2013 Paul Gortmaker <paul.gortmaker@windriver.com> arc: delete __cpuinit usage from all arc files

The __cpuinit type of throwaway sections might have made sense
some time ago when RAM was more constrained, but now the savings
do not offset the cost and complications. For example, the fix in
commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time")
is a good example of the nasty type of bugs that can be created
with improper use of the various __init prefixes.

After a discussion on LKML[1] it was decided that cpuinit should go
the way of devinit and be phased out. Once all the users are gone,
we can then finally remove the macros themselves from linux/init.h.

Note that some harmless section mismatch warnings may result, since
notify_cpu_starting() and cpu_up() are arch independent (kernel/cpu.c)
and are flagged as __cpuinit -- so if we remove the __cpuinit from
arch specific callers, we will also get section mismatch warnings.
As an intermediate step, we intend to turn the linux/init.h cpuinit
content into no-ops as early as possible, since that will get rid
of these warnings. In any case, they are temporary and harmless.

This removes all the arch/arc uses of the __cpuinit macros from
all C files. Currently arc does not have any __CPUINIT used in
assembly files.

[1] https://lkml.org/lkml/2013/5/20/589

Cc: Vineet Gupta <vgupta@synopsys.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2ed21dae021db1f9f988494ceee519290217520d 13-May-2013 Vineet Gupta <vgupta@synopsys.com> ARC: [mm] Assume pagecache page dirty by default

Similar to ARM/SH

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
29b93c68bf81d2aad1030e989d844cff9f3ba99a 19-May-2013 Vineet Gupta <vgupta@synopsys.com> ARC: [mm] Zero page optimization

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
da1677b02d3ef674dfd8a4ba1ed32153dc717fa2 14-May-2013 Vineet Gupta <vgupta@synopsys.com> ARC: Disintegrate arcregs.h

* Move the various sub-system defines/types into relevant files/functions
(reduces compilation time)

* move CPU specific stuff out of asm/tlb.h into asm/mmu.h

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
8235703e103579bdcedadcaf63bc1896f82b191b 31-May-2013 Vineet Gupta <vgupta@synopsys.com> ARC: Use kconfig helper IS_ENABLED() to get rid of defines.h

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
3e87974dec5ec25a8a4852d9292db6be659164e6 22-May-2013 Vineet Gupta <vgupta@synopsys.com> ARC: Brown paper bag bug in macro for checking cache color

The VM_EXEC check in update_mmu_cache() was getting optimized away
because of a stupid error in the definition of the macro
addr_not_cache_congruent()

The intention was to have the equivalent of the following:

if (a || (1 ? b : 0))

but we ended up with following:

if (a || 1 ? b : 0)

And because the precedence of '||' is higher than that of '?:', gcc was
optimizing away the evaluation of <a>
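
A schematic reduction of the bug (the macro names here are made up; the
real one is addr_not_cache_congruent()):

#define NOT_CONGRUENT_BAD(b)    1 ? (b) : 0     /* body unparenthesized */
#define NOT_CONGRUENT_OK(b)     (1 ? (b) : 0)   /* whole body parenthesized */

/* if (a || NOT_CONGRUENT_BAD(b))  parses as  if ((a || 1) ? (b) : 0)
 *     -> always just 'b'; 'a' (the VM_EXEC check) is dead code
 * if (a || NOT_CONGRUENT_OK(b))   parses as  if (a || (b)), as intended
 */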

Nasty Repercussions:
1. For non-aliasing configs it would mean some extraneous dcache flushes
for non-code pages if U/K mappings were not congruent.
2. For aliasing config, some needed dcache flush for code pages might
be missed if U/K mappings were congruent.

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
4102b53392d6397d80b6e09b516517efacf7ea77 09-May-2013 Vineet Gupta <vgupta@synopsys.com> ARC: [mm] Aliasing VIPT dcache support 2/4

This is the meat of the series, which prevents any dcache alias creation
by always keeping the U and K mappings of a page congruent.
If a mapping already exists and another one tries to access the page, the
previous one is first flushed to the physical page (wback+inv)

Essentially flush_dcache_page()/copy_user_highpage() create a K-mapping
of a page, but try to defer the flushing, unless a U-mapping exists.
When the page is actually mapped to userspace, update_mmu_cache() flushes
the K-mapping (in certain cases this can be optimised out)

Additionally flush_cache_mm(), flush_cache_range(), flush_cache_page()
handle the purging of stale userspace mappings on exit/munmap...

flush_anon_page() handles the existing U-mapping for anon page before
kernel reads it via the GUP path.

Note that while not complete, this is enough to boot a simple
dynamically linked Busybox based rootfs

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
6ec18a81b22ab2b40df8424f2b5fc6be20ccad87 09-May-2013 Vineet Gupta <vgupta@synopsys.com> ARC: [mm] Aliasing VIPT dcache support 1/4

This preps the low level dcache flush helpers to take a vaddr argument
in addition to the existing paddr, to properly flush the VIPT dcache
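
Schematically, the resulting helper shape (signature illustrative):

/* before: paddr alone can't pick the right line in a VIPT dcache
 *      void __flush_dcache_page(unsigned long paddr);
 * after: vaddr selects the cache index, paddr still provides the tag
 */
void __flush_dcache_page(unsigned long paddr, unsigned long vaddr);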

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
eacd0e950dc2100af54f2a94ae29105bf48ab921 16-Apr-2013 Vineet Gupta <vgupta@synopsys.com> ARC: [mm] Lazy D-cache flush (non aliasing VIPT)

flush_dcache_page() is the MM hook to ensure that a page has consistent
views between kernel and userspace. Thus it is called when

* kernel writes to a page which at some later point could get mapped to
userspace (so kernel mapping needs to be flushed-n-inv)
* kernel is about to read from a page with possible userspace mappings
(so userspace mappings needs to be made coherent with kernel ones)

However, for a non-aliasing VIPT dcache, any userspace mapping will
always be congruent to the kernel mapping. Thus the d-cache need not be
flushed at all (or the flush can be delayed indefinitely).

The only reason it does need to be flushed is when mapping code pages.
Since the icache doesn't snoop the dcache, those dirty dcache lines need
to be written back to memory and the icache lines invalidated, so that
icache fetches will get the right data.
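
A hedged sketch of the deferral (PG_dc_clean and the low-level range
helpers are illustrative names): flush_dcache_page() merely marks the
page dirty-in-dcache, and the writeback + icache invalidate is paid only
when the page is finally mapped executable into userspace.

/* Sketch: lazy flush for non-aliasing VIPT dcache */
void flush_dcache_page(struct page *page)
{
        /* defer: just note that the kernel mapping dirtied the page */
        clear_bit(PG_dc_clean, &page->flags);
}

void update_mmu_cache(struct vm_area_struct *vma, unsigned long vaddr,
                      pte_t *ptep)
{
        struct page *page = pte_page(*ptep);
        unsigned long kaddr = (unsigned long)page_address(page);

        /* pay the cost only for executable mappings of dirty pages */
        if ((vma->vm_flags & VM_EXEC) &&
            !test_and_set_bit(PG_dc_clean, &page->flags)) {
                __flush_dcache_range(kaddr, PAGE_SIZE); /* wback + inv */
                __inv_icache_range(kaddr, PAGE_SIZE);
        }
}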

Decent gains on LMBench fork/exec/sh and File I/O micro-benchmarks.

(1) FPGA @ 80 MHZ

Processor, Processes - times in microseconds - smaller is better
------------------------------------------------------------------------------
Host OS Mhz null null open slct sig sig fork exec sh
call I/O stat clos TCP inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
3.9-rc6-a Linux 3.9.0-r 80 4.79 8.72 66.7 116. 239. 8.39 30.4 4798 14.K 34.K
3.9-rc6-b Linux 3.9.0-r 80 4.79 8.62 65.4 111. 239. 8.35 29.0 3995 12.K 30.K
3.9-rc7-c Linux 3.9.0-r 80 4.79 9.00 66.1 106. 239. 8.61 30.4 2858 10.K 24.K
^^^^ ^^^^ ^^^

File & VM system latencies in microseconds - smaller is better
-------------------------------------------------------------------------------
Host OS 0K File 10K File Mmap Prot Page 100fd
Create Delete Create Delete Latency Fault Fault selct
--------- ------------- ------ ------ ------ ------ ------- ----- ------- -----
3.9-rc6-a Linux 3.9.0-r 317.8 204.2 1122.3 375.1 3522.0 4.288 20.7 126.8
3.9-rc6-b Linux 3.9.0-r 298.7 223.0 1141.6 367.8 3531.0 4.866 20.9 126.4
3.9-rc7-c Linux 3.9.0-r 278.4 179.2 862.1 339.3 3705.0 3.223 20.3 126.6
^^^^^ ^^^^^ ^^^^^ ^^^^

(2) Customer Silicon @ 500 MHz (166 MHz mem)

------------------------------------------------------------------------------
Host OS Mhz null null open slct sig sig fork exec sh
call I/O stat clos TCP inst hndl proc proc proc
--------- ------------- ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- ----
abilis-ba Linux 3.9.0-r 497 0.71 1.38 4.58 12.0 35.5 1.40 3.89 2070 5525 13.K
abilis-ca Linux 3.9.0-r 497 0.71 1.40 4.61 11.8 35.6 1.37 3.92 1411 4317 10.K
^^^^ ^^^^ ^^^

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
24603fdd19d978fcc0d089d92370ee1aa3a71e84 11-Apr-2013 Vineet Gupta <vgupta@synopsys.com> ARC: [mm] optimise icache flush for user mappings

The ARC icache doesn't snoop the dcache, thus executable pages need to
be made coherent before mapping into userspace, in flush_icache_page().

However the ARC700 CDU (hardware cache flush module) requires both vaddr
(index in cache) as well as paddr (tag match) to correctly identify a
line in the VIPT cache. A typical ARC700 SoC has an aliasing icache, thus
the paddr-only based flush_icache_page() API couldn't be implemented
efficiently. It had to loop through all possible alias indexes and perform
the invalidate operation (of course the cache op would only succeed at
the index(es) where the tag matches - typically only 1, but the cost of
visiting all the cache-bins needs to be paid nevertheless).

It turns out, however, that the vaddr (along with the paddr) is
available in update_mmu_cache(), which hence better suits the ARC icache
flush semantics. With both vaddr+paddr, exactly one flush operation per
line is done.
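
A hedged sketch of the difference (the line-op helper and color count
are illustrative names):

/* hw line op: vaddr picks the cache index, paddr must match the tag */
static void ic_line_inv(unsigned long paddr, unsigned long vaddr);

/* paddr-only API: visit every alias index; the op takes effect only
 * where the tag actually matches (typically just one bin)
 */
static void ic_inv_no_vaddr(unsigned long paddr, unsigned long offset)
{
        int alias;

        for (alias = 0; alias < IC_COLORS; alias++)
                ic_line_inv(paddr, (alias << PAGE_SHIFT) + offset);
}

/* with vaddr from update_mmu_cache(): exactly one op, no loop */
static void ic_inv(unsigned long paddr, unsigned long vaddr)
{
        ic_line_inv(paddr, vaddr);
}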

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
e3edeb67fbd6c522a46a844c569fc41a8a2b6876 26-Feb-2013 Noam Camus <noamc@ezchip.com> ARC: Respect the cpu_id passed for fetching correct cpu info

Signed-off-by: Noam Camus <noamc@ezchip.com>
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
30ecee8cdd05415e5602bd755d9210e1c5a5b64d 09-Apr-2013 Vineet Gupta <vgupta@synopsys.com> ARC: [build] Fix warnings with CONFIG_DEBUG_SECTION_MISMATCH

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
af61742813aa9dde65ca796801e36d03b83fa79f 18-Jan-2013 Vineet Gupta <vgupta@synopsys.com> ARC: Boot #2: Verbose Boot reporting / feature verification

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
41195d236e84458bebd4fdc218610a92231ac791 18-Jan-2013 Vineet Gupta <vgupta@synopsys.com> ARC: SMP support

ARC common code to enable an SMP system + ISS provided SMP extensions.

ARC700 natively lacks SMP support, hence some of the core features are
only enabled if SoCs have the necessary h/w pixie-dust. This
includes:
-Inter Processor Interrupts (IPI)
-Cache coherency
-load-locked/store-conditional
...

The low level exception handling would be completely broken in SMP
because we don't have hardware assisted stack switching. Thus a fair bit
of this code is repurposing the MMU_SCRATCH reg for event handler
prologues to keep them re-entrant.

Many thanks to Rajeshwar Ranga for his initial "major" contributions to
SMP Port (back in 2008), and to Noam Camus and Gilad Ben-Yossef for help
with resurrecting that in 3.2 kernel (2012).

Note that this platform code again follows the singleton design pattern,
so multiple SMP platforms won't build at the moment; this deficiency is
addressed in subsequent patches within this series.

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Rajeshwar Ranga <rajeshwar.ranga@gmail.com>
Cc: Noam Camus <noamc@ezchip.com>
Cc: Gilad Ben-Yossef <gilad@benyossef.com>
d79e678d746d3d4234477f08ce7d27d55ebe283a 18-Jan-2013 Vineet Gupta <vgupta@synopsys.com> ARC: TLB flush Handling

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
cc562d2eae93bc2768a6575d31c089719e8939e8 18-Jan-2013 Vineet Gupta <vgupta@synopsys.com> ARC: MMU Exception Handling

* MMU I-TLB / D-TLB Miss Exceptions
- Fast Path TLB Refill Handler
- slowpath TLB creation via do_page_fault() -> update_mmu_cache()
* Duplicate PD Exception Handler

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
f1f3347da9440eedd2350f4f5d13d8860f570b92 18-Jan-2013 Vineet Gupta <vgupta@synopsys.com> ARC: MMU Context Management

The ARC700 MMU provides for tagging TLB entries with an 8-bit ASID to
avoid having to flush the TLB on every task switch.

It also allows for a quick way to invalidate all the TLB entries for a
task, useful for:
* COW semantics during fork()
* a task exit()ing

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>