History log of /arch/x86/include/asm/pgtable_64.h
Revision Date Author Comments
f2d6bfe9ff0acec30b713614260e78b03d20e909 14-Jan-2011 Johannes Weiner <hannes@cmpxchg.org> thp: add x86 32bit support

Add support for transparent hugepages to x86 32bit.

Share the same VM_ bitflag for VM_MAPPED_COPY. mm/nommu.c will never
support transparent hugepages.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
71e3aac0724ffe8918992d76acfe3aad7d8724a5 14-Jan-2011 Andrea Arcangeli <aarcange@redhat.com> thp: transparent hugepage core

Lately I've been working to make KVM use hugepages transparently without
the usual restrictions of hugetlbfs. Some of the restrictions I'd like to
see removed:

1) hugepages have to be swappable or the guest physical memory remains
locked in RAM and can't be paged out to swap

2) if a hugepage allocation fails, regular pages should be allocated
instead and mixed in the same vma without any failure and without
userland noticing

3) if some task quits and more hugepages become available in the
buddy, guest physical memory backed by regular pages should be
relocated on hugepages automatically in regions under
madvise(MADV_HUGEPAGE) (ideally event driven by waking up the
kernel deamon if the order=HPAGE_PMD_SHIFT-PAGE_SHIFT list becomes
not null)

4) avoidance of reservation and maximization of use of hugepages whenever
possible. Reservation (needed to avoid runtime fatal faliures) may be ok for
1 machine with 1 database with 1 database cache with 1 database cache size
known at boot time. It's definitely not feasible with a virtualization
hypervisor usage like RHEV-H that runs an unknown number of virtual machines
with an unknown size of each virtual machine with an unknown amount of
pagecache that could be potentially useful in the host for guest not using
O_DIRECT (aka cache=off).

hugepages in the virtualization hypervisor (and also in the guest!) are
much more important than in a regular host not using virtualization,
becasue with NPT/EPT they decrease the tlb-miss cacheline accesses from 24
to 19 in case only the hypervisor uses transparent hugepages, and they
decrease the tlb-miss cacheline accesses from 19 to 15 in case both the
linux hypervisor and the linux guest both uses this patch (though the
guest will limit the addition speedup to anonymous regions only for
now...). Even more important is that the tlb miss handler is much slower
on a NPT/EPT guest than for a regular shadow paging or no-virtualization
scenario. So maximizing the amount of virtual memory cached by the TLB
pays off significantly more with NPT/EPT than without (even if there would
be no significant speedup in the tlb-miss runtime).

The first (and more tedious) part of this work requires allowing the VM to
handle anonymous hugepages mixed with regular pages transparently on
regular anonymous vmas. This is what this patch tries to achieve in the
least intrusive possible way. We want hugepages and hugetlb to be used in
a way so that all applications can benefit without changes (as usual we
leverage the KVM virtualization design: by improving the Linux VM at
large, KVM gets the performance boost too).

The most important design choice is: always fallback to 4k allocation if
the hugepage allocation fails! This is the _very_ opposite of some large
pagecache patches that failed with -EIO back then if a 64k (or similar)
allocation failed...

Second important decision (to reduce the impact of the feature on the
existing pagetable handling code) is that at any time we can split an
hugepage into 512 regular pages and it has to be done with an operation
that can't fail. This way the reliability of the swapping isn't decreased
(no need to allocate memory when we are short on memory to swap) and it's
trivial to plug a split_huge_page* one-liner where needed without
polluting the VM. Over time we can teach mprotect, mremap and friends to
handle pmd_trans_huge natively without calling split_huge_page*. The fact
it can't fail isn't just for swap: if split_huge_page would return -ENOMEM
(instead of the current void) we'd need to rollback the mprotect from the
middle of it (ideally including undoing the split_vma) which would be a
big change and in the very wrong direction (it'd likely be simpler not to
call split_huge_page at all and to teach mprotect and friends to handle
hugepages instead of rolling them back from the middle). In short the
very value of split_huge_page is that it can't fail.

The collapsing and madvise(MADV_HUGEPAGE) part will remain separated and
incremental and it'll just be an "harmless" addition later if this initial
part is agreed upon. It also should be noted that locking-wise replacing
regular pages with hugepages is going to be very easy if compared to what
I'm doing below in split_huge_page, as it will only happen when
page_count(page) matches page_mapcount(page) if we can take the PG_lock
and mmap_sem in write mode. collapse_huge_page will be a "best effort"
that (unlike split_huge_page) can fail at the minimal sign of trouble and
we can try again later. collapse_huge_page will be similar to how KSM
works and the madvise(MADV_HUGEPAGE) will work similar to
madvise(MADV_MERGEABLE).

The default I like is that transparent hugepages are used at page fault
time. This can be changed with
/sys/kernel/mm/transparent_hugepage/enabled. The control knob can be set
to three values "always", "madvise", "never" which mean respectively that
hugepages are always used, or only inside madvise(MADV_HUGEPAGE) regions,
or never used. /sys/kernel/mm/transparent_hugepage/defrag instead
controls if the hugepage allocation should defrag memory aggressively
"always", only inside "madvise" regions, or "never".

The pmd_trans_splitting/pmd_trans_huge locking is very solid. The
put_page (from get_user_page users that can't use mmu notifier like
O_DIRECT) that runs against a __split_huge_page_refcount instead was a
pain to serialize in a way that would result always in a coherent page
count for both tail and head. I think my locking solution with a
compound_lock taken only after the page_first is valid and is still a
PageHead should be safe but it surely needs review from SMP race point of
view. In short there is no current existing way to serialize the O_DIRECT
final put_page against split_huge_page_refcount so I had to invent a new
one (O_DIRECT loses knowledge on the mapping status by the time gup_fast
returns so...). And I didn't want to impact all gup/gup_fast users for
now, maybe if we change the gup interface substantially we can avoid this
locking, I admit I didn't think too much about it because changing the gup
unpinning interface would be invasive.

If we ignored O_DIRECT we could stick to the existing compound refcounting
code, by simply adding a get_user_pages_fast_flags(foll_flags) where KVM
(and any other mmu notifier user) would call it without FOLL_GET (and if
FOLL_GET isn't set we'd just BUG_ON if nobody registered itself in the
current task mmu notifier list yet). But O_DIRECT is fundamental for
decent performance of virtualized I/O on fast storage so we can't avoid it
to solve the race of put_page against split_huge_page_refcount to achieve
a complete hugepage feature for KVM.

Swap and oom works fine (well just like with regular pages ;). MMU
notifier is handled transparently too, with the exception of the young bit
on the pmd, that didn't have a range check but I think KVM will be fine
because the whole point of hugepages is that EPT/NPT will also use a huge
pmd when they notice gup returns pages with PageCompound set, so they
won't care of a range and there's just the pmd young bit to check in that
case.

NOTE: in some cases if the L2 cache is small, this may slowdown and waste
memory during COWs because 4M of memory are accessed in a single fault
instead of 8k (the payoff is that after COW the program can run faster).
So we might want to switch the copy_huge_page (and clear_huge_page too) to
not temporal stores. I also extensively researched ways to avoid this
cache trashing with a full prefault logic that would cow in 8k/16k/32k/64k
up to 1M (I can send those patches that fully implemented prefault) but I
concluded they're not worth it and they add an huge additional complexity
and they remove all tlb benefits until the full hugepage has been faulted
in, to save a little bit of memory and some cache during app startup, but
they still don't improve substantially the cache-trashing during startup
if the prefault happens in >4k chunks. One reason is that those 4k pte
entries copied are still mapped on a perfectly cache-colored hugepage, so
the trashing is the worst one can generate in those copies (cow of 4k page
copies aren't so well colored so they trashes less, but again this results
in software running faster after the page fault). Those prefault patches
allowed things like a pte where post-cow pages were local 4k regular anon
pages and the not-yet-cowed pte entries were pointing in the middle of
some hugepage mapped read-only. If it doesn't payoff substantially with
todays hardware it will payoff even less in the future with larger l2
caches, and the prefault logic would blot the VM a lot. If one is
emebdded transparent_hugepage can be disabled during boot with sysfs or
with the boot commandline parameter transparent_hugepage=0 (or
transparent_hugepage=2 to restrict hugepages inside madvise regions) that
will ensure not a single hugepage is allocated at boot time. It is simple
enough to just disable transparent hugepage globally and let transparent
hugepages be allocated selectively by applications in the MADV_HUGEPAGE
region (both at page fault time, and if enabled with the
collapse_huge_page too through the kernel daemon).

This patch supports only hugepages mapped in the pmd, archs that have
smaller hugepages will not fit in this patch alone. Also some archs like
power have certain tlb limits that prevents mixing different page size in
the same regions so they will not fit in this framework that requires
"graceful fallback" to basic PAGE_SIZE in case of physical memory
fragmentation. hugetlbfs remains a perfect fit for those because its
software limits happen to match the hardware limits. hugetlbfs also
remains a perfect fit for hugepage sizes like 1GByte that cannot be hoped
to be found not fragmented after a certain system uptime and that would be
very expensive to defragment with relocation, so requiring reservation.
hugetlbfs is the "reservation way", the point of transparent hugepages is
not to have any reservation at all and maximizing the use of cache and
hugepages at all times automatically.

Some performance result:

vmx andrea # LD_PRELOAD=/usr/lib64/libhugetlbfs.so HUGETLB_MORECORE=yes HUGETLB_PATH=/mnt/huge/ ./largep
ages3
memset page fault 1566023
memset tlb miss 453854
memset second tlb miss 453321
random access tlb miss 41635
random access second tlb miss 41658
vmx andrea # LD_PRELOAD=/usr/lib64/libhugetlbfs.so HUGETLB_MORECORE=yes HUGETLB_PATH=/mnt/huge/ ./largepages3
memset page fault 1566471
memset tlb miss 453375
memset second tlb miss 453320
random access tlb miss 41636
random access second tlb miss 41637
vmx andrea # ./largepages3
memset page fault 1566642
memset tlb miss 453417
memset second tlb miss 453313
random access tlb miss 41630
random access second tlb miss 41647
vmx andrea # ./largepages3
memset page fault 1566872
memset tlb miss 453418
memset second tlb miss 453315
random access tlb miss 41618
random access second tlb miss 41659
vmx andrea # echo 0 > /proc/sys/vm/transparent_hugepage
vmx andrea # ./largepages3
memset page fault 2182476
memset tlb miss 460305
memset second tlb miss 460179
random access tlb miss 44483
random access second tlb miss 44186
vmx andrea # ./largepages3
memset page fault 2182791
memset tlb miss 460742
memset second tlb miss 459962
random access tlb miss 43981
random access second tlb miss 43988

============
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>

#define SIZE (3UL*1024*1024*1024)

int main()
{
char *p = malloc(SIZE), *p2;
struct timeval before, after;

gettimeofday(&before, NULL);
memset(p, 0, SIZE);
gettimeofday(&after, NULL);
printf("memset page fault %Lu\n",
(after.tv_sec-before.tv_sec)*1000000UL +
after.tv_usec-before.tv_usec);

gettimeofday(&before, NULL);
memset(p, 0, SIZE);
gettimeofday(&after, NULL);
printf("memset tlb miss %Lu\n",
(after.tv_sec-before.tv_sec)*1000000UL +
after.tv_usec-before.tv_usec);

gettimeofday(&before, NULL);
memset(p, 0, SIZE);
gettimeofday(&after, NULL);
printf("memset second tlb miss %Lu\n",
(after.tv_sec-before.tv_sec)*1000000UL +
after.tv_usec-before.tv_usec);

gettimeofday(&before, NULL);
for (p2 = p; p2 < p+SIZE; p2 += 4096)
*p2 = 0;
gettimeofday(&after, NULL);
printf("random access tlb miss %Lu\n",
(after.tv_sec-before.tv_sec)*1000000UL +
after.tv_usec-before.tv_usec);

gettimeofday(&before, NULL);
for (p2 = p; p2 < p+SIZE; p2 += 4096)
*p2 = 0;
gettimeofday(&after, NULL);
printf("random access second tlb miss %Lu\n",
(after.tv_sec-before.tv_sec)*1000000UL +
after.tv_usec-before.tv_usec);

return 0;
}
============

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
db3eb96f4e6281b84dd33c8980dacc27f2efe177 14-Jan-2011 Andrea Arcangeli <aarcange@redhat.com> thp: add pmd mangling functions to x86

Add needed pmd mangling functions with symmetry with their pte
counterparts. pmdp_splitting_flush() is the only new addition on the pmd_
methods and it's needed to serialize the VM against split_huge_page. It
simply atomically sets the splitting bit in a similar way
pmdp_clear_flush_young atomically clears the accessed bit.
pmdp_splitting_flush() also has to flush the tlb to make it effective
against gup_fast, but it wouldn't really require to flush the tlb too.
Just the tlb flush is the simplest operation we can invoke to serialize
pmdp_splitting_flush() against gup_fast.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5f6e8da70a289d403975907371ce5738c726ad3f 14-Jan-2011 Andrea Arcangeli <aarcange@redhat.com> thp: special pmd_trans_* functions

These returns 0 at compile time when the config option is disabled, to
allow gcc to eliminate the transparent hugepage function calls at compile
time without additional #ifdefs (only the export of those functions have
to be visible to gcc but they won't be required at link time and
huge_memory.o can be not built at all).

_PAGE_BIT_UNUSED1 is never used for pmd, only on pte.

Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ece0e2b6406a995c371e0311190631ea34ad851a 26-Oct-2010 Peter Zijlstra <a.p.zijlstra@chello.nl> mm: remove pte_*map_nested()

Since we no longer need to provide KM_type, the whole pte_*map_nested()
API is now redundant, remove it.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: David Miller <davem@davemloft.net>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
6afb5157b9eba4092e2f0f54d24a3806409bdde5 19-May-2010 Haicheng Li <haicheng.li@linux.intel.com> x86, mm: Separate x86_64 vmalloc_sync_all() into separate functions

No behavior change.

Move some of vmalloc_sync_all() code into a new function
sync_global_pgds() that will be useful for memory hotplug.

Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
LKML-Reference: <4C6E4ECD.1090607@linux.intel.com>
Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
4e60c86bd9e5a7110ed28874d0b6592186550ae8 10-Aug-2010 Andi Kleen <andi@firstfloor.org> gcc-4.6: mm: fix unused but set warnings

No real bugs, just some dead code and some fixups.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4b3073e1c53a256275f1079c0fbfbe85883d9275 18-Dec-2009 Russell King <rmk+kernel@arm.linux.org.uk> MM: Pass a PTE pointer to update_mmu_cache() rather than the PTE itself

On VIVT ARM, when we have multiple shared mappings of the same file
in the same MM, we need to ensure that we have coherency across all
copies. We do this via make_coherent() by making the pages
uncacheable.

This used to work fine, until we allowed highmem with highpte - we
now have a page table which is mapped as required, and is not available
for modification via update_mmu_cache().

Ralf Beache suggested getting rid of the PTE value passed to
update_mmu_cache():

On MIPS update_mmu_cache() calls __update_tlb() which walks pagetables
to construct a pointer to the pte again. Passing a pte_t * is much
more elegant. Maybe we might even replace the pte argument with the
pte_t?

Ben Herrenschmidt would also like the pte pointer for PowerPC:

Passing the ptep in there is exactly what I want. I want that
-instead- of the PTE value, because I have issue on some ppc cases,
for I$/D$ coherency, where set_pte_at() may decide to mask out the
_PAGE_EXEC.

So, pass in the mapped page table pointer into update_mmu_cache(), and
remove the PTE value, updating all implementations and call sites to
suit.

Includes a fix from Stephen Rothwell:

sparc: fix fallout from update_mmu_cache API change

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>

Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
9063c61fd5cbd6f42e95929aa0e02380c9e15656 21-Jun-2009 Linus Torvalds <torvalds@linux-foundation.org> x86, 64-bit: Clean up user address masking

The discussion about using "access_ok()" in get_user_pages_fast() (see
commit 7f8189068726492950bf1a2dcfd9b51314560abf: "x86: don't use
'access_ok()' as a range check in get_user_pages_fast()" for details and
end result), made us notice that x86-64 was really being very sloppy
about virtual address checking.

So be way more careful and straightforward about masking x86-64 virtual
addresses:

- All the VIRTUAL_MASK* variants now cover half of the address
space, it's not like we can use the full mask on a signed
integer, and the larger mask just invites mistakes when
applying it to either half of the 48-bit address space.

- /proc/kcore's kc_offset_to_vaddr() becomes a lot more
obvious when it transforms a file offset into a
(kernel-half) virtual address.

- Unify/simplify the 32-bit and 64-bit USER_DS definition to
be based on TASK_SIZE_MAX.

This cleanup and more careful/obvious user virtual address checking also
uncovered a buglet in the x86-64 implementation of strnlen_user(): it
would do an "access_ok()" check on the whole potential area, even if the
string itself was much shorter, and thus return an error even for valid
strings. Our sloppy checking had hidden this.

So this fixes 'strnlen_user()' to do this properly, the same way we
already handled user strings in 'strncpy_from_user()'. Namely by just
checking the first byte, and then relying on fault handling for the
rest. That always works, since we impose a guard page that cannot be
mapped at the end of the user space address space (and even if we
didn't, we'd have the address space hole).

Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2c1b284e4fa260fd922b9a65c99169e2630c6862 10-Apr-2009 Jaswinder Singh Rajput <jaswinder@kernel.org> x86: clean up declarations and variables

Impact: cleanup, no code changed

- syscalls.h update declarations due to unifications
- irq.c declare smp_generic_interrupt() before it gets used
- process.c declare sys_fork() and sys_vfork() before they get used
- tsc.c rename tsc_khz shadowed variable
- apic/probe_32.c declare apic_default before it gets used
- apic/nmi.c prev_nmi_count should be unsigned
- apic/io_apic.c declare smp_irq_move_cleanup_interrupt() before it gets used
- mm/init.c declare direct_gbpages and free_initrd_mem before they get used

Signed-off-by: Jaswinder Singh Rajput <jaswinder@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
fb3551491b20442c0de0d4debd54f40c654bbe98 09-Feb-2009 Jeremy Fitzhardinge <jeremy@goop.org> x86: Split pgtable_64.h into pgtable_64_types.h and pgtable_64.h

Signed-off-by: Jeremy Fitzhardinge <jeremy@goop.org>
6cf7150084500962b8e225e2409ec01ed06a2c71 05-Feb-2009 Jeremy Fitzhardinge <jeremy@goop.org> x86: unify io_remap_pfn_range

Impact: cleanup

Unify io_remap_pfn_range. Don't demacro yet.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
7325cc2e333cdaaabd2103552458876ea85adb33 05-Feb-2009 Jeremy Fitzhardinge <jeremy@goop.org> x86: unify pgd_none

Impact: cleanup

Unify and demacro pgd_none.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
deb79cfb365c96ff960570d1bcf2c205424b6195 05-Feb-2009 Jeremy Fitzhardinge <jeremy@goop.org> x86: unify pud_none

Impact: cleanup

Unify and demacro pud_none.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
cc290ca38cc4c78b0d6175633232f05b8d8732ab 05-Feb-2009 Jeremy Fitzhardinge <jeremy@goop.org> x86: unify pages_to_mb

Impact: cleanup

Unify and demacro pages_to_mb.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
99510238bb428091e7caba020bc5e18b5f30b619 05-Feb-2009 Jeremy Fitzhardinge <jeremy@goop.org> x86: unify pmd_bad

Impact: cleanup

Unify and demacro pmd_bad.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
a61bb29af47b0e4052566d25f3391894306a23fd 05-Feb-2009 Jeremy Fitzhardinge <jeremy@goop.org> x86: unify pgd_bad

Impact: cleanup

Unify and demacro pgd_bad.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
30f103167fcf2b08de64f5f37ece6bfff7392290 05-Feb-2009 Jeremy Fitzhardinge <jeremy@goop.org> x86: unify pgd_bad

Impact: cleanup

Unify and demacro pgd_bad.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
3f6cbef1d7f474d16f3a824c6d2910d930778fbd 05-Feb-2009 Jeremy Fitzhardinge <jeremy@goop.org> x86: unify pud_large

Impact: cleanup

Unify and demacro pud_large.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
3fbc2444f465710cdf0c832461a6a14338437453 05-Feb-2009 Jeremy Fitzhardinge <jeremy@goop.org> x86: unify pte_offset_kernel

Impact: cleanup

Unify and demacro pte_offset_kernel.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
346309cff6127a38731cf102de3413a562700b84 05-Feb-2009 Jeremy Fitzhardinge <jeremy@goop.org> x86: unify pte_index

Impact: cleanup

Unify and demacro pte_index.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
97e2817d3423d753fd2f80ea936a370032846382 05-Feb-2009 Jeremy Fitzhardinge <jeremy@goop.org> x86: unify pmd_pfn

Impact: cleanup

Unify pmd_pfn. Unfortunately it can't be demacroed because it has a
cyclic dependency on linux/mm.h:page_to_nid().

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
3180fba0eec0d14e4ac8183a90d643d0d3383c75 05-Feb-2009 Jeremy Fitzhardinge <jeremy@goop.org> x86: unify pmd_pfn

Impact: cleanup

Unify and demacro pmd_pfn.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
bd44d64db1db6acf2dea9ba130b8cb0a54e1dabd 05-Feb-2009 Jeremy Fitzhardinge <jeremy@goop.org> x86: remove redundant pfn_pmd definition

Impact: cleanup

It's already defined in pgtable.h

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
01ade20d5a22e6ef002cbb751dddc3a01a78f998 05-Feb-2009 Jeremy Fitzhardinge <jeremy@goop.org> x86: unify pmd_offset

Impact: cleanup

Unify and demacro pmd_offset.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
e24d7eee0beda24504bf6a4aa03be68328557475 05-Feb-2009 Jeremy Fitzhardinge <jeremy@goop.org> x86: unify pmd_index

Impact: cleanup

Unify and demacro pmd_index.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
20063ca4eb26d4b10f01d59925deea4aeee415e8 05-Feb-2009 Jeremy Fitzhardinge <jeremy@goop.org> x86: unify pmd_page

Impact: cleanup

Unify and demacro pmd_page.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
3ffb3564cd3cd59de8a0d74430ffe2d43ae11f19 05-Feb-2009 Jeremy Fitzhardinge <jeremy@goop.org> x86: unify pmd_page_vaddr

Impact: cleanup

Unify and demacro pmd_page_vaddr.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
3d081b1812bd4de2bbef58c6d598ddf45493010e 05-Feb-2009 Jeremy Fitzhardinge <jeremy@goop.org> x86: unify pud_offset

Impact: cleanup

Unify and demacro pud_offset.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
7cfb81024bc1dbe8ad2bf5affd58a6a7ad4172ba 05-Feb-2009 Jeremy Fitzhardinge <jeremy@goop.org> x86: unify pud_index

Impact: cleanup

Unify and demacro pud_index.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
777cba16aac5a1096db0b936912eb7fd06fb0cc5 05-Feb-2009 Jeremy Fitzhardinge <jeremy@goop.org> x86: unify pgd_page

Impact: cleanup

Unify and demacro pgd_page.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
f476961cb16312fe4cb80b2b457ef9acf220a7fc 05-Feb-2009 Jeremy Fitzhardinge <jeremy@goop.org> x86: unify pud_page

Impact: cleanup

Unify and demacro pud_page.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
6fff47e3ac5e17f7e164ac4ff9ea29aba3c54d73 05-Feb-2009 Jeremy Fitzhardinge <jeremy@goop.org> x86: unify pud_page_vaddr

Impact: cleanup

Unify and demacro pud_page_vaddr.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
c5f040b12b2381591932a007432e7ed86b3f2796 05-Feb-2009 Jeremy Fitzhardinge <jeremy@goop.org> x86: unify pgd_page_vaddr

Impact: cleanup

Unify and demacro pgd_page_vaddr.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
4fea801ac95d6534a93aa01d3ac62be163d845af 05-Feb-2009 Jeremy Fitzhardinge <jeremy@goop.org> x86: unify pmd_none

Impact: cleanup

Unify and demacro pmd_none.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
649e8ef60fac0a2f6960cdb090d73e78717ac065 05-Feb-2009 Jeremy Fitzhardinge <jeremy@goop.org> x86: unify pmd_present

Impact: cleanup

Unify and demacro pmd_present.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
9f38d7e85e914f10a875f65d283432d55a12fc27 05-Feb-2009 Jeremy Fitzhardinge <jeremy@goop.org> x86: unify pgd_present

Impact: cleanup

Unify and demacro pgd_present.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
5ba7c91341be61e0942f792c237ac067d9f32f51 05-Feb-2009 Jeremy Fitzhardinge <jeremy@goop.org> x86: unify pud_present

Impact: cleanup

Unify and demacro pud_present.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
7c683851d96c8313586c0695b25ca41bde9f0f73 05-Feb-2009 Jeremy Fitzhardinge <jeremy@goop.org> x86: unify pte_present

Impact: cleanup

Unify and demacro pte_present.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
8de01da35e9dbbb4a9d1e9d5a37df98395dfa558 05-Feb-2009 Jeremy Fitzhardinge <jeremy@goop.org> x86: unify pte_same

Impact: cleanup

Unify and demacro pte_same.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
a034a010f48bf49efe25098c16c16b9708ccbba5 05-Feb-2009 Jeremy Fitzhardinge <jeremy@goop.org> x86: unify pte_none

Impact: cleanup

Unify and demacro pte_none.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
0d974d4592708f85044751817da4b7016e1b0602 19-Jan-2009 Brian Gerst <brgerst@gmail.com> x86: remove pda.h

Impact: cleanup

Signed-off-by: Brian Gerst <brgerst@gmail.com>
8a7b12f70fb135a1b1d865687de3edcdc780f6d1 18-Dec-2008 venkatesh.pallipadi@intel.com <venkatesh.pallipadi@intel.com> x86: PAT: change pgprot_noncached to uc_minus instead of strong uc - v3

Impact: mm behavior change.

Make pgprot_noncached uc_minus instead of strong UC. This will make
pgprot_noncached to be in line with ioremap_nocache() and all the other
APIs that map page uc_minus on uc request.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
b6fd6f26733e864fba2ea3eb1d716e23d2e66f3a 16-Dec-2008 Ingo Molnar <mingo@elte.hu> x86, mm: limit MAXMEM on 64-bit

on 64-bit x86 the physical memory limit is controlled by the sparsemem
bits - which are 44 bits right now. But MAXMEM (the max pfn number
e820 parsing will allow to enter our sizing routines) is set to
0x00003fffffffffff, i.e. 46 bits - that's too large because it overlaps
into the vmalloc range.

So couple MAXMEM to MAX_PHYSMEM_BITS, and add a comment that the
maximum of MAX_PHYSMEM_BITS is 45 bits.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
1796316a8b028a148be48ba5d4e7be493a39d173 16-Dec-2008 Jan Beulich <jbeulich@novell.com> x86: consolidate __swp_XXX() macros

Impact: cleanup, code robustization

The __swp_...() macros silently relied upon which bits are used for
_PAGE_FILE and _PAGE_PROTNONE. After having changed _PAGE_PROTNONE in
our Xen kernel to no longer overlap _PAGE_PAT, live locks and crashes
were reported that could have been avoided if these macros properly
used the symbolic constants. Since, as pointed out earlier, for Xen
Dom0 support mainline likewise will need to eliminate the conflict
between _PAGE_PAT and _PAGE_PROTNONE, this patch does all the necessary
adjustments, plus it introduces a mechanism to check consistency
between MAX_SWAPFILES_SHIFT and the actual encoding macros.

This also fixes a latent bug in that x86-64 used a 6-bit mask in
__swp_type(), and if MAX_SWAPFILES_SHIFT was increased beyond 5 in (the
seemingly unrelated) linux/swap.h, this would have resulted in a
collision with _PAGE_FILE.

Non-PAE 32-bit code gets similarly adjusted for its pte_to_pgoff() and
pgoff_to_pte() calculations.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
1965aae3c98397aad957412413c07e97b1bd4e64 23-Oct-2008 H. Peter Anvin <hpa@zytor.com> x86: Fix ASM_X86__ header guards

Change header guards named "ASM_X86__*" to "_ASM_X86_*" since:

a. the double underscore is ugly and pointless.
b. no leading underscore violates namespace constraints.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
bb8985586b7a906e116db835c64773b7a7d51663 18-Aug-2008 Al Viro <viro@zeniv.linux.org.uk> x86, um: ... and asm-x86 move

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>