History log of /arch/x86/include/asm/fpu-internal.h
Revision Date Author Comments
31d963389f67165402aa447a8e8ce5ffb9188b3d 02-Sep-2014 Oleg Nesterov <oleg@redhat.com> x86, fpu: Change __thread_fpu_begin() to use use_eager_fpu()

__thread_fpu_begin() checks X86_FEATURE_EAGER_FPU by hand, we have
a helper for that.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Link: http://lkml.kernel.org/r/20140902175720.GA21656@redhat.com
Reviewed-by: Suresh Siddha <sbsiddha@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
9b13a93df267af681a66a6a738bf1af10102da7d 18-Jun-2014 Borislav Petkov <bp@suse.de> x86, cpufeature: Convert more "features" to bugs

X86_FEATURE_FXSAVE_LEAK, X86_FEATURE_11AP and
X86_FEATURE_CLFLUSH_MONITOR are not really features but synthetic bits
we use for applying different bug workarounds. Call them what they
really are, and make sure they get the proper cross-CPU behavior (OR
rather than AND).

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1403042783-23278-1-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
f41d830fa890044cb60f6bb39fc8f6493ffebb47 29-May-2014 Fenghua Yu <fenghua.yu@intel.com> x86/xsaves: Save xstate to task's xsave area in __save_fpu during booting time

__save_fpu() can be called during early booting time when cpu caps are not
enabled and alternative can not be used yet. Therefore, it calls
xsave_state_booting() during booting time to save xstate to task's xsave area.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
Link: http://lkml.kernel.org/r/1401387164-43416-14-git-send-email-fenghua.yu@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
c6b406919288a617815f710175da20f3fca72065 27-Mar-2014 Matt Fleming <matt.fleming@intel.com> x86, fpu: Extend the use of static_cpu_has_safe

It may be necessary to save and restore the FPU context during EFI runtime
system services calls. However, this may happen during boot and before
alternatives have run. Thus, we need to use static_cpu_has_safe instead.

The rationale behind the use of static_cpu_has_safe is the same as in
commit 5f8c4218148822fde6ee ("x86, fpu: Use static_cpu_has_safe
before alternatives") by Borislav Petkov.

Signed-off-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Cc: Borislav Petkov <bp@suse.de>
26bef1318adc1b3a530ecc807ef99346db2aa8b0 12-Jan-2014 Linus Torvalds <torvalds@linux-foundation.org> x86, fpu, amd: Clear exceptions in AMD FXSAVE workaround

Before we do an EMMS in the AMD FXSAVE information leak workaround we
need to clear any pending exceptions, otherwise we trap with a
floating-point exception inside this code.

Reported-by: halfdog <me@halfdog.net>
Tested-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/CA%2B55aFxQnY_PCG_n4=0w-VG=YLXL-yr7oMxyy0WU2gCBAf3ydg@mail.gmail.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
c375f15a434db1867cb004bafba92aba739e4e39 13-Nov-2013 Vineet Gupta <Vineet.Gupta1@synopsys.com> x86: move fpu_counter into ARCH specific thread_struct

Only a couple of arches (sh/x86) use fpu_counter in task_struct so it can
be moved out into ARCH specific thread_struct, reducing the size of
task_struct for other arches.

Compile tested i386_defconfig + gcc 4.7.3

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Paul Mundt <paul.mundt@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5f8c4218148822fde6eebbeefc34bd0a6061e031 09-Jun-2013 Borislav Petkov <bp@suse.de> x86, fpu: Use static_cpu_has_safe before alternatives

The call stack below shows how this happens: basically eager_fpu_init()
calls __thread_fpu_begin(current) which then does if (!use_eager_fpu()),
which, in turn, uses static_cpu_has.

And we're executing before alternatives so static_cpu_has doesn't work
there yet.

Use the safe variant in this path which becomes optimal after
alternatives have run.

WARNING: at arch/x86/kernel/cpu/common.c:1368 warn_pre_alternatives+0x1e/0x20()
You're using static_cpu_has before alternatives have run!
Modules linked in:
Pid: 0, comm: swapper Not tainted 3.9.0-rc8+ #1
Call Trace:
warn_slowpath_common
warn_slowpath_fmt
? fpu_finit
warn_pre_alternatives
eager_fpu_init
fpu_init
cpu_init
trap_init
start_kernel
? repair_env_string
x86_64_start_reservations
x86_64_start_kernel

Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1370772454-6106-6-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
60e019eb37a8d989031ad47ae9810453536f3127 29-Apr-2013 H. Peter Anvin <hpa@zytor.com> x86: Get rid of ->hard_math and all the FPU asm fu

Reimplement FPU detection code in C and drop old, not-so-recommended
detection method in asm. Move all the relevant stuff into i387.c where
it conceptually belongs. Finally drop cpuinfo_x86.hard_math.

[ hpa: huge thanks to Borislav for taking my original concept patch
and productizing it ]

[ Boris, note to self: do not use static_cpu_has before alternatives! ]

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Link: http://lkml.kernel.org/r/1367244262-29511-2-git-send-email-bp@alien8.de
Link: http://lkml.kernel.org/r/1365436666-9837-2-git-send-email-bp@alien8.de
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
235b80226b986dabcbba844968f7807866bd0bfe 10-Nov-2012 Al Viro <viro@zeniv.linux.org.uk> x86: convert to ksignal

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
644c154186386bb1fa6446bc5e037b9ed098db46 30-Nov-2012 Vincent Palatin <vpalatin@chromium.org> x86, fpu: Avoid FPU lazy restore after suspend

When a cpu enters S3 state, the FPU state is lost.
After resuming for S3, if we try to lazy restore the FPU for a process running
on the same CPU, this will result in a corrupted FPU context.

Ensure that "fpu_owner_task" is properly invalided when (re-)initializing a CPU,
so nobody will try to lazy restore a state which doesn't exist in the hardware.

Tested with a 64-bit kernel on a 4-core Ivybridge CPU with eagerfpu=off,
by doing thousands of suspend/resume cycles with 4 processes doing FPU
operations running. Without the patch, a process is killed after a
few hundreds cycles by a SIGFPE.

Cc: Duncan Laurie <dlaurie@chromium.org>
Cc: Olof Johansson <olofj@chromium.org>
Cc: <stable@kernel.org> v3.4+ # for 3.4 need to replace this_cpu_write by percpu_write
Signed-off-by: Vincent Palatin <vpalatin@chromium.org>
Link: http://lkml.kernel.org/r/1354306532-1014-1-git-send-email-vpalatin@chromium.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
e139e95590dfebab81841bf7a3ac46500f51a47c 26-Sep-2012 H. Peter Anvin <hpa@zytor.com> x86, smap: Do not abuse the [f][x]rstor_checking() functions for user space

With SMAP, the [f][x]rstor_checking() functions are no longer usable
for user-space pointers by applying a simple __force cast. Instead,
create new [f][x]rstor_user() functions which do the proper SMAP
magic.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/1343171129-2747-3-git-send-email-suresh.b.siddha@intel.com
63bcff2a307b9bcc712a8251eb27df8b2e117967 21-Sep-2012 H. Peter Anvin <hpa@linux.intel.com> x86, smap: Add STAC and CLAC instructions to control user space access

When Supervisor Mode Access Prevention (SMAP) is enabled, access to
userspace from the kernel is controlled by the AC flag. To make the
performance of manipulating that flag acceptable, there are two new
instructions, STAC and CLAC, to set and clear it.

This patch adds those instructions, via alternative(), when the SMAP
feature is enabled. It also adds X86_EFLAGS_AC unconditionally to the
SYSCALL entry mask; there is simply no reason to make that one
conditional.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Link: http://lkml.kernel.org/r/1348256595-29119-9-git-send-email-hpa@linux.intel.com
a8615af4bc3621cb01096541dafa6f68352ec2d9 10-Sep-2012 Suresh Siddha <suresh.b.siddha@intel.com> x86, fpu: remove cpu_has_xmm check in the fx_finit()

CPUs with FXSAVE but no XMM/MXCSR (Pentium II from Intel,
Crusoe/TM-3xxx/5xxx from Transmeta, and presumably some of the K6
generation from AMD) ever looked at the mxcsr field during
fxrstor/fxsave. So remove the cpu_has_xmm check in the fx_finit()

Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/1347300665-6209-6-git-send-email-suresh.b.siddha@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
5d2bd7009f306c82afddd1ca4d9763ad8473c216 06-Sep-2012 Suresh Siddha <suresh.b.siddha@intel.com> x86, fpu: decouple non-lazy/eager fpu restore from xsave

Decouple non-lazy/eager fpu restore policy from the existence of the xsave
feature. Introduce a synthetic CPUID flag to represent the eagerfpu
policy. "eagerfpu=on" boot paramter will enable the policy.

Requested-by: H. Peter Anvin <hpa@zytor.com>
Requested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/1347300665-6209-2-git-send-email-suresh.b.siddha@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
304bceda6a18ae0b0240b8aac9a6bdf8ce2d2469 24-Aug-2012 Suresh Siddha <suresh.b.siddha@intel.com> x86, fpu: use non-lazy fpu restore for processors supporting xsave

Fundamental model of the current Linux kernel is to lazily init and
restore FPU instead of restoring the task state during context switch.
This changes that fundamental lazy model to the non-lazy model for
the processors supporting xsave feature.

Reasons driving this model change are:

i. Newer processors support optimized state save/restore using xsaveopt and
xrstor by tracking the INIT state and MODIFIED state during context-switch.
This is faster than modifying the cr0.TS bit which has serializing semantics.

ii. Newer glibc versions use SSE for some of the optimized copy/clear routines.
With certain workloads (like boot, kernel-compilation etc), application
completes its work with in the first 5 task switches, thus taking upto 5 #DNA
traps with the kernel not getting a chance to apply the above mentioned
pre-load heuristic.

iii. Some xstate features (like AMD's LWP feature) don't honor the cr0.TS bit
and thus will not work correctly in the presence of lazy restore. Non-lazy
state restore is needed for enabling such features.

Some data on a two socket SNB system:
* Saved 20K DNA exceptions during boot on a two socket SNB system.
* Saved 50K DNA exceptions during kernel-compilation workload.
* Improved throughput of the AVX based checksumming function inside the
kernel by ~15% as xsave/xrstor is faster than the serializing clts/stts
pair.

Also now kernel_fpu_begin/end() relies on the patched
alternative instructions. So move check_fpu() which uses the
kernel_fpu_begin/end() after alternative_instructions().

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/1345842782-24175-7-git-send-email-suresh.b.siddha@intel.com
Merge 32-bit boot fix from,
Link: http://lkml.kernel.org/r/1347300665-6209-4-git-send-email-suresh.b.siddha@intel.com
Cc: Jim Kukunas <james.t.kukunas@linux.intel.com>
Cc: NeilBrown <neilb@suse.de>
Cc: Avi Kivity <avi@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
377ffbcc536a5a6666dc077395163ab149c02610 24-Aug-2012 Suresh Siddha <suresh.b.siddha@intel.com> x86, fpu: remove unnecessary user_fpu_end() in save_xstate_sig()

Few lines below we do drop_fpu() which is more safer. Remove the
unnecessary user_fpu_end() in save_xstate_sig(), which allows
the drop_fpu() to ignore any pending exceptions from the user-space
and drop the current fpu.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/1345842782-24175-3-git-send-email-suresh.b.siddha@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
e962591749dfd4df9fea2c530ed7a3cfed50e5aa 24-Aug-2012 Suresh Siddha <suresh.b.siddha@intel.com> x86, fpu: drop_fpu() before restoring new state from sigframe

No need to save the state with unlazy_fpu(), that is about to get overwritten
by the state from the signal frame. Instead use drop_fpu() and continue
to restore the new state.

Also fold the stop_fpu_preload() into drop_fpu().

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/1345842782-24175-2-git-send-email-suresh.b.siddha@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
72a671ced66db6d1c2bfff1c930a101ac8d08204 25-Jul-2012 Suresh Siddha <suresh.b.siddha@intel.com> x86, fpu: Unify signal handling code paths for x86 and x86_64 kernels

Currently for x86 and x86_32 binaries, fpstate in the user sigframe is copied
to/from the fpstate in the task struct.

And in the case of signal delivery for x86_64 binaries, if the fpstate is live
in the CPU registers, then the live state is copied directly to the user
sigframe. Otherwise fpstate in the task struct is copied to the user sigframe.
During restore, fpstate in the user sigframe is restored directly to the live
CPU registers.

Historically, different code paths led to different bugs. For example,
x86_64 code path was not preemption safe till recently. Also there is lot
of code duplication for support of new features like xsave etc.

Unify signal handling code paths for x86 and x86_64 kernels.

New strategy is as follows:

Signal delivery: Both for 32/64-bit frames, align the core math frame area to
64bytes as needed by xsave (this where the main fpu/extended state gets copied
to and excludes the legacy compatibility fsave header for the 32-bit [f]xsave
frames). If the state is live, copy the register state directly to the user
frame. If not live, copy the state in the thread struct to the user frame. And
for 32-bit [f]xsave frames, construct the fsave header separately before
the actual [f]xsave area.

Signal return: As the 32-bit frames with [f]xstate has an additional
'fsave' header, copy everything back from the user sigframe to the
fpstate in the task structure and reconstruct the fxstate from the 'fsave'
header (Also user passed pointers may not be correctly aligned for
any attempt to directly restore any partial state). At the next fpstate usage,
everything will be restored to the live CPU registers.
For all the 64-bit frames and the 32-bit fsave frame, restore the state from
the user sigframe directly to the live CPU registers. 64-bit signals always
restored the math frame directly, so we can expect the math frame pointer
to be correctly aligned. For 32-bit fsave frames, there are no alignment
requirements, so we can restore the state directly.

"lat_sig catch" microbenchmark numbers (for x86, x86_64, x86_32 binaries) are
with in the noise range with this change.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/1343171129-2747-4-git-send-email-suresh.b.siddha@intel.com
[ Merged in compilation fix ]
Link: http://lkml.kernel.org/r/1344544736.8326.17.camel@sbsiddha-desk.sc.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
0ca5bd0d886578ad0afeceaa83458c0f35cb3c6b 25-Jul-2012 Suresh Siddha <suresh.b.siddha@intel.com> x86, fpu: Consolidate inline asm routines for saving/restoring fpu state

Consolidate x86, x86_64 inline asm routines saving/restoring fpu state
using config_enabled().

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/1343171129-2747-3-git-send-email-suresh.b.siddha@intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
050902c011712ad4703038fa4489ec4edd87d396 25-Jul-2012 Suresh Siddha <suresh.b.siddha@intel.com> x86, signal: Cleanup ifdefs and is_ia32, is_x32

Use config_enabled() to cleanup the definitions of is_ia32/is_x32. Move
the function prototypes to the header file to cleanup ifdefs,
and move the x32_setup_rt_frame() code around.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/1343171129-2747-2-git-send-email-suresh.b.siddha@intel.com
Merged in compilation fix from,
Link: http://lkml.kernel.org/r/1344544736.8326.17.camel@sbsiddha-desk.sc.intel.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
c6ae41e7d469f00d9c92a2b2887c7235d121c009 11-May-2012 Alex Shi <alex.shi@intel.com> x86: replace percpu_xxx funcs with this_cpu_xxx

Since percpu_xxx() serial functions are duplicated with this_cpu_xxx().
Removing percpu_xxx() definition and replacing them by this_cpu_xxx()
in code. There is no function change in this patch, just preparation for
later percpu_xxx serial function removing.

On x86 machine the this_cpu_xxx() serial functions are same as
__this_cpu_xxx() without no unnecessary premmpt enable/disable.

Thanks for Stephen Rothwell, he found and fixed a i386 build error in
the patch.

Also thanks for Andrew Morton, he kept updating the patchset in Linus'
tree.

Signed-off-by: Alex Shi <alex.shi@intel.com>
Acked-by: Christoph Lameter <cl@gentwo.org>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
1361b83a13d4d92e53fbb6c877528713e118b821 21-Feb-2012 Linus Torvalds <torvalds@linux-foundation.org> i387: Split up <asm/i387.h> into exported and internal interfaces

While various modules include <asm/i387.h> to get access to things we
actually *intend* for them to use, most of that header file was really
pretty low-level internal stuff that we really don't want to expose to
others.

So split the header file into two: the small exported interfaces remain
in <asm/i387.h>, while the internal definitions that are only used by
core architecture code are now in <asm/fpu-internal.h>.

The guiding principle for this was to expose functions that we export to
modules, and leave them in <asm/i387.h>, while stuff that is used by
task switching or was marked GPL-only is in <asm/fpu-internal.h>.

The fpu-internal.h file could be further split up too, especially since
arch/x86/kvm/ uses some of the remaining stuff for its module. But that
kvm usage should probably be abstracted out a bit, and at least now the
internal FPU accessor functions are much more contained. Even if it
isn't perhaps as contained as it _could_ be.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1202211340330.5354@i5.linux-foundation.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>