History log of /include/linux/hrtimer.h
Revision Date Author Comments
e1f91f82b8bb031fe1b7731fb3666fa68c97fd38 27-Jun-2011 Vitaliy Ivanov <vitalivanov@gmail.com> treewide: fix kernel-doc warnings

Fix 'make htmldocs' warnings:

Warning(/include/linux/hrtimer.h:153): No description found for
parameter 'clockid'
Warning(/include/linux/device.h:604): Excess struct/union/enum/typedef
member 'of_match' description in 'device'
Warning(/include/net/sock.h:349): Excess struct/union/enum/typedef
member 'sk_rmem_alloc' description in 'sock'

Signed-off-by: Vitaliy Ivanov <vitalivanov@gmail.com>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
4d258b25d947521c8b913154db61ec55198243f8 27-Jun-2011 Vitaliy Ivanov <vitalivanov@gmail.com> Fix some kernel-doc warnings

Fix 'make htmldocs' warnings:

Warning(/include/linux/hrtimer.h:153): No description found for parameter 'clockid'
Warning(/include/linux/device.h:604): Excess struct/union/enum/typedef member 'of_match' description in 'device'
Warning(/include/net/sock.h:349): Excess struct/union/enum/typedef member 'sk_rmem_alloc' description in 'sock'

Signed-off-by: Vitaliy Ivanov <vitalivanov@gmail.com>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
68fa61c026057a39d6ccb850aa8785043afbee02 20-May-2011 Thomas Gleixner <tglx@linutronix.de> hrtimers: Reorder clock bases

The ordering of the clock bases is historical due to the
CLOCK_REALTIME and CLOCK_MONOTONIC constants. Now the hrtimer bases
have their own enumeration due to the gap between CLOCK_MONOTONIC and
CLOCK_BOOTTIME. So we can be more clever, as most timers end up on the
CLOCK_MONOTONIC base by virtue of POSIX declaring that
relative CLOCK_REALTIME timers are not affected by time changes. In
desktop environments this is slowly changing as applications switch to
absolute timers, but I've observed empty CLOCK_REALTIME bases often
enough. There is no performance penalty or overhead when
CLOCK_REALTIME timers are active, but when they are not, we avoid
skipping over a full cache line.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Peter Zijlstra <peterz@infradead.org>
ab8177bc53e8ae3a3ba6d200ce2c2dae263f7ee5 20-May-2011 Thomas Gleixner <tglx@linutronix.de> hrtimers: Avoid touching inactive timer bases

Avoid iterating over all possible timer bases by marking
the active bases in the cpu base.
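The idea can be pictured with a simplified user-space model: a per-CPU bitmask where bit i is set while clock base i has queued timers, so the expiry loop skips empty bases without touching their data. This is a sketch under assumptions; every name below (model_cpu_base, model_enqueue, and so on) is hypothetical and is not the kernel's code.

```c
#include <stddef.h>

/* Hypothetical simplified model: three clock bases per CPU. */
enum { MODEL_MAX_CLOCK_BASES = 3 };

struct model_cpu_base {
	unsigned int active_bases;                    /* bit i set: base i has timers */
	unsigned int nr_queued[MODEL_MAX_CLOCK_BASES]; /* timers queued per base */
};

/* Queue a timer on base idx; the first timer marks the base active. */
static void model_enqueue(struct model_cpu_base *cb, int idx)
{
	if (cb->nr_queued[idx]++ == 0)
		cb->active_bases |= 1u << idx;
}

/* Remove a timer from base idx; the last timer clears the active bit. */
static void model_remove(struct model_cpu_base *cb, int idx)
{
	if (--cb->nr_queued[idx] == 0)
		cb->active_bases &= ~(1u << idx);
}

/* Collect the indices of active bases into out[]; returns how many.
 * Inactive bases are skipped on the bit test alone. */
static size_t model_visit_active(const struct model_cpu_base *cb,
				 int out[MODEL_MAX_CLOCK_BASES])
{
	size_t n = 0;

	for (int i = 0; i < MODEL_MAX_CLOCK_BASES; i++) {
		if (!(cb->active_bases & (1u << i)))
			continue;       /* empty base: its data is never read */
		out[n++] = i;
	}
	return n;
}
```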

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Peter Zijlstra <peterz@infradead.org>
f24444b01bf6c51c300fd3ffc73423383d747882 20-May-2011 Thomas Gleixner <tglx@linutronix.de> hrtimers: Make struct hrtimer_cpu_base layout less stupid

In the HIGHRES=y case we access the members at the end of struct
hrtimer_cpu_base first and then the ones at the beginning. Move the
hrtimer data to the front, so that accesses progress linearly.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Peter Zijlstra <peterz@infradead.org>
9ec2690758a5467f24beb301cca5098078073bba 20-May-2011 Thomas Gleixner <tglx@linutronix.de> timerfd: Manage cancelable timers in timerfd

Peter is concerned about the extra scan of CLOCK_REALTIME_COS in the
timer interrupt. Yes, I did not think about it, because the solution
was so elegant. I didn't like the extra list in timerfd when it was
proposed some time ago, but with an RCU-based list the list walk is
less horrible than the original global lock, which was held over the
list iteration.

Requested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Peter Zijlstra <peterz@infradead.org>
99ee5315dac6211e972fa3f23bcc9a0343ff58c4 27-Apr-2011 Thomas Gleixner <tglx@linutronix.de> timerfd: Allow timers to be cancelled when clock was set

Some applications must be aware of clock realtime being set
backward. A simple example is a clock applet which arms a timer for
the next minute display. If clock realtime is set backward then the
applet displays a stale time for the amount of time which the clock
was set backwards. Because there is no interface for this, such
applications have to poll the time.

Extend the timerfd interface by adding a flag which puts the timer
onto a different internal realtime clock. All timers on this clock are
expired whenever the clock was set.

The timerfd core records the monotonic offset when the timer is
created. When the timer is armed, the current offset is compared
to the previously recorded offset. If it has changed,
timerfd_settime returns -ECANCELED. When a timer is read, the offset is
compared and, if it has changed, -ECANCELED is returned to user space.
Periodic timers are not rearmed in the cancellation case.
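A user-space sketch of how an application such as the clock applet above might use this: arm an absolute CLOCK_REALTIME timerfd with the cancel-on-set flag and treat ECANCELED from read() as "the clock was set, recompute". It assumes the flag is exposed as TFD_TIMER_CANCEL_ON_SET in <sys/timerfd.h> (the fallback define assumes the kernel value 1 << 1 for older headers); the helper names are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <stdint.h>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

#ifndef TFD_TIMER_CANCEL_ON_SET
#define TFD_TIMER_CANCEL_ON_SET (1 << 1)  /* assumed kernel value */
#endif

/* Arm an absolute CLOCK_REALTIME timer secs seconds from now that is
 * cancelled if the realtime clock is set. Returns the fd, or -1. */
static int arm_cancelable_timer(time_t secs)
{
	struct itimerspec its = { .it_interval = { 0, 0 } };
	int fd = timerfd_create(CLOCK_REALTIME, TFD_CLOEXEC);

	if (fd < 0)
		return -1;
	if (clock_gettime(CLOCK_REALTIME, &its.it_value) < 0)
		goto fail;
	its.it_value.tv_sec += secs;
	if (timerfd_settime(fd, TFD_TIMER_ABSTIME | TFD_TIMER_CANCEL_ON_SET,
			    &its, NULL) < 0)
		goto fail;
	return fd;
fail:
	close(fd);
	return -1;
}

/* Block until the timer expires or is cancelled.
 * Returns 1 on expiry, 0 if the clock was set (ECANCELED), -1 on error. */
static int wait_expiry_or_cancel(int fd)
{
	uint64_t expirations;
	ssize_t n = read(fd, &expirations, sizeof(expirations));

	if (n == (ssize_t)sizeof(expirations))
		return 1;
	if (n < 0 && errno == ECANCELED)
		return 0;
	return -1;
}
```

On a 0 return the applet would re-read the time, redraw, and arm a new timer, instead of polling.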

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: John Stultz <johnstul@us.ibm.com>
Cc: Chris Friesen <chris.friesen@genband.com>
Tested-by: Kay Sievers <kay.sievers@vrfy.org>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Davide Libenzi <davidel@xmailserver.org>
Reviewed-by: Alexander Shishkin <virtuoso@slind.org>
Link: http://lkml.kernel.org/r/%3Calpine.LFD.2.02.1104271359580.3323%40ionos%3E
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
b12a03ce4880bd13786a98db6de494a3e0123129 02-May-2011 Thomas Gleixner <tglx@linutronix.de> hrtimers: Prepare for cancel on clock was set timers

Make clock_was_set() unconditional and rename hres_timers_resume to
hrtimers_resume. This is a preparatory patch for hrtimers which are
cancelled when clock realtime was set.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
53370d2e8c0382e3e2aa76def93365ed674e7fc7 10-Mar-2011 Thomas Gleixner <tglx@linutronix.de> hrtimer: Update hrtimer->state documentation

We changed some of the state bits and combinations thereof over time,
but never updated the documentation.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
70a08cca1227dc31c784ec930099a4417a06e7d0 15-Feb-2011 John Stultz <john.stultz@linaro.org> timers: Add CLOCK_BOOTTIME hrtimer base

CLOCK_MONOTONIC stops while the system is in suspend. This is because
system suspend is invisible to applications. However, there is a
growing set of applications that want to be suspend-aware,
but do not want to deal with the complications of CLOCK_REALTIME
(which might jump around if settimeofday is called).

For these applications, I propose a new clockid: CLOCK_BOOTTIME.
CLOCK_BOOTTIME is identical to CLOCK_MONOTONIC, except it also
includes any time spent in suspend.

This patch adds an hrtimer base for CLOCK_BOOTTIME, using
get_monotonic_boottime/ktime_get_boottime, to allow
in-kernel users to set timers against it.

CC: Jamie Lokier <jamie@shareable.org>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Alexander Shishkin <virtuoso@slind.org>
CC: Arve Hjønnevåg <arve@android.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
abb3a4ea2e0ea7114a4475745da2f32bd9ad5b73 15-Feb-2011 John Stultz <john.stultz@linaro.org> time: Introduce get_monotonic_boottime and ktime_get_boottime

This adds new functions that return the monotonic time since boot
(in other words, CLOCK_MONOTONIC + suspend time).
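The relation "CLOCK_BOOTTIME = CLOCK_MONOTONIC + suspend time" can be checked from user space, assuming the clock is exposed through clock_gettime() as CLOCK_BOOTTIME (the fallback define assumes the Linux clockid value 7). Reading MONOTONIC first keeps the difference non-negative, since both clocks advance at the same rate. The helper names are hypothetical.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stdint.h>
#include <time.h>

#ifndef CLOCK_BOOTTIME
#define CLOCK_BOOTTIME 7   /* assumed Linux clockid; fallback for old headers */
#endif

static int64_t ts_to_ns(const struct timespec *ts)
{
	return (int64_t)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

/* Approximate total time spent in suspend since boot, in nanoseconds:
 * BOOTTIME - MONOTONIC. Returns -1 if either clock cannot be read. */
static int64_t suspended_ns(void)
{
	struct timespec mono, boot;

	/* MONOTONIC is read first, so boot >= mono at the later instant. */
	if (clock_gettime(CLOCK_MONOTONIC, &mono) ||
	    clock_gettime(CLOCK_BOOTTIME, &boot))
		return -1;
	return ts_to_ns(&boot) - ts_to_ns(&mono);
}
```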

CC: Jamie Lokier <jamie@shareable.org>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Alexander Shishkin <virtuoso@slind.org>
CC: Arve Hjønnevåg <arve@android.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
e06383db9ec591696a06654257474b85bac1f8cb 15-Dec-2010 John Stultz <john.stultz@linaro.org> hrtimers: extend hrtimer base code to handle more then 2 clockids

The hrtimer code is written mainly with CLOCK_REALTIME and CLOCK_MONOTONIC
in mind. These are clockids 0 and 1 respectively. However, if we are
to introduce any new hrtimer bases, using new clockids, we have to skip
the cputimers (clockids 2,3) as well as other clockids that may not implement
timers.

This patch adds a little bit of indirection between the clockid and
the base, so that we can extend the base by one when we add
a new clockid at number 7 or so.
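The indirection can be sketched as a small mapping from sparse clockids to dense base indices, so the per-CPU array holds one entry per supported clock rather than one per clockid. The clockid numbers are the standard Linux values; the base ordering and all names are hypothetical and only model the idea.

```c
#include <stddef.h>

/* Linux clockid values for the clocks with hrtimer bases. */
enum { MODEL_CLOCK_REALTIME = 0, MODEL_CLOCK_MONOTONIC = 1,
       MODEL_CLOCK_BOOTTIME = 7 };

/* Dense base indices, independent of the clockid numbering. */
enum model_base { MODEL_BASE_REALTIME, MODEL_BASE_MONOTONIC,
		  MODEL_BASE_BOOTTIME, MODEL_MAX_CLOCK_BASES };

struct model_clock_base { int clockid; };

struct model_cpu_base {
	/* 3 entries, not 8: clockids 2..6 take no space. */
	struct model_clock_base clock_base[MODEL_MAX_CLOCK_BASES];
};

/* Map a clockid to its base index, or -1 if it has no hrtimer base. */
static int model_clockid_to_base(int clockid)
{
	switch (clockid) {
	case MODEL_CLOCK_REALTIME:  return MODEL_BASE_REALTIME;
	case MODEL_CLOCK_MONOTONIC: return MODEL_BASE_MONOTONIC;
	case MODEL_CLOCK_BOOTTIME:  return MODEL_BASE_BOOTTIME;
	default:                    return -1; /* e.g. CPU-time clocks 2, 3 */
	}
}

static struct model_clock_base *model_get_base(struct model_cpu_base *cb,
					       int clockid)
{
	int idx = model_clockid_to_base(clockid);

	return idx < 0 ? NULL : &cb->clock_base[idx];
}
```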

CC: Jamie Lokier <jamie@shareable.org>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Alexander Shishkin <virtuoso@slind.org>
CC: Arve Hjønnevåg <arve@android.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
175881db8916a5f5cdf920d32214caef588870fd 09-Jan-2011 Randy Dunlap <randy.dunlap@oracle.com> hrtimer.h: fix kernel-doc warning

Fix new kernel-doc notation warning in hrtimer.h:

Warning(include/linux/hrtimer.h:150): Excess struct/union/enum/typedef member 'first' description in 'hrtimer_clock_base'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
998adc3dda59f811966b3ccb21eb223680b25ec4 21-Sep-2010 John Stultz <john.stultz@linaro.org> hrtimers: Convert hrtimers to use timerlist infrastructure

Converts the hrtimer code to use the new timerlist infrastructure.

Signed-off-by: John Stultz <john.stultz@linaro.org>
LKML Reference: <1290136329-18291-3-git-send-email-john.stultz@linaro.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
CC: Alessandro Zummo <a.zummo@towertech.it>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Richard Cochran <richardcochran@gmail.com>
5e4f083f78d03e9f8d2e327daccde16976f9bb00 24-Oct-2010 Yong Zhang <yong.zhang@windriver.com> hrtimer: Remove stale comment on curr_timer

curr_timer no longer resides in struct hrtimer_cpu_base.

Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
LKML-Reference: <1287892253-2587-1-git-send-email-yong.zhang0@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
351b3f7a21e413a9b14d0393171497d2373bd702 02-Apr-2010 Carsten Emde <C.Emde@osadl.org> hrtimers: Provide schedule_hrtimeout for CLOCK_REALTIME

The current version of schedule_hrtimeout() always uses the
monotonic clock. Some system calls such as mq_timedsend()
and mq_timedreceive(), however, require the use of the wall
clock due to the definition of the system call.

This patch provides the infrastructure to use schedule_hrtimeout()
with a CLOCK_REALTIME timer.

Signed-off-by: Carsten Emde <C.Emde@osadl.org>
Tested-by: Pradyumna Sampath <pradysam@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arjan van de Veen <arjan@infradead.org>
LKML-Reference: <20100402204331.167439615@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
ecb49d1a639acbacfc3771cae5ec07bed5df3847 17-Nov-2009 Thomas Gleixner <tglx@linutronix.de> hrtimers: Convert to raw_spinlocks

Convert locks which cannot be sleeping locks in preempt-rt to
raw_spinlocks.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
5f201907dfe4ad42c44006ddfcec00ed12e59497 10-Dec-2009 Heiko Carstens <heiko.carstens@de.ibm.com> hrtimer: move timer stats helper functions to hrtimer.c

There is no reason to make timer_stats_hrtimer_set_start_info and
friends visible to the rest of the kernel. So move all of them to
hrtimer.c. Also make timer_stats_hrtimer_set_start_info a static
inline function so it gets inlined and we avoid another function call.
Based on a patch by Thomas Gleixner.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
LKML-Reference: <20091210095629.GC4144@osiris.boeblingen.de.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
41d2e494937715d3150e5c75d01f0e75ae899337 13-Nov-2009 Thomas Gleixner <tglx@linutronix.de> hrtimer: Tune hrtimer_interrupt hang logic

The hrtimer_interrupt hang logic adjusts min_delta_ns based on the
execution time of the hrtimer callbacks.

This is error-prone for virtual machines, where a guest vcpu can be
scheduled out during the execution of the callbacks (and the callbacks
themselves can do operations that translate to blocking operations in
the hypervisor), which in turn can lead to large min_delta_ns rendering the
system unusable.

Replace the current heuristics with something more reliable. Allow the
interrupt code to try 3 times to catch up with the lost time. If that
fails use the total time spent in the interrupt handler to defer the
next timer interrupt so the system can catch up with other things
which got delayed. Limit that deferment to 100ms.

The retry events and the maximum time spent in the interrupt handler
are recorded and exposed via /proc/timer_list.
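The heuristic described above (three attempts to catch up, then defer the next interrupt by the time spent in the handler, capped at 100ms) can be modelled as below. This is a simplified sketch, not the kernel's hrtimer_interrupt(): the per-pass outcomes are passed in as an array, "now" is a single timestamp taken after the last pass, and all names are hypothetical.

```c
#include <stdint.h>

#define MODEL_NSEC_PER_MSEC 1000000LL
#define MODEL_MAX_PASSES    3                          /* "try 3 times" */
#define MODEL_MAX_DEFER_NS  (100 * MODEL_NSEC_PER_MSEC) /* 100ms cap */

/* Counters of the kind exposed via /proc/timer_list (names hypothetical). */
struct model_hang_stats {
	unsigned int nr_retries;
	unsigned int nr_hangs;
	int64_t max_hang_ns;
};

/*
 * caught_up[i] is nonzero if pass i could program the next event in the
 * future. Returns -1 if the handler caught up; otherwise the absolute time
 * at which the next interrupt should be programmed after a hang.
 */
static int64_t model_interrupt(struct model_hang_stats *st,
			       const int caught_up[MODEL_MAX_PASSES],
			       int64_t entry_ns, int64_t now_ns)
{
	for (int pass = 0; pass < MODEL_MAX_PASSES; pass++) {
		if (caught_up[pass])
			return -1;
		st->nr_retries++;
	}

	/* Hang: defer by the total time spent in the handler, capped. */
	st->nr_hangs++;
	int64_t delta = now_ns - entry_ns;
	if (delta > st->max_hang_ns)
		st->max_hang_ns = delta;
	if (delta > MODEL_MAX_DEFER_NS)
		delta = MODEL_MAX_DEFER_NS;
	return now_ns + delta;
}
```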

Inspired by a patch from Marcelo.

Reported-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Marcelo Tosatti <mtosatti@redhat.com>
Cc: kvm@vger.kernel.org
8629ea2eaba8ca0de2e38ce1b4a825e16255976e 03-Sep-2009 Feng Tang <feng.tang@intel.com> hrtimer: Fix /proc/timer_list regression

commit 507e1231 (timer stats: Optimize by adding quick check to avoid
function calls) introduced a regression in /proc/timer_list.

/proc/timer_list shows now
#0: <c27d46b0>, tick_sched_timer, S:01, <(null)>, /-1
instead of
#0: <c27d46b0>, tick_sched_timer, S:01, hrtimer_start, swapper/0

Revert the hrtimer quick check for now. The optimization needs more
thought, but this is neither 2.6.32-rc7 nor stable material.

[ tglx: - Removed unrelated changes from the original patch
- Prevent unnecessary call to timer_stats_update_stats
- massaged the changelog ]

Signed-off-by: Feng Tang <feng.tang@intel.com>
LKML-Reference: <alpine.LFD.2.00.0911181933540.24119@localhost.localdomain>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: stable@kernel.org
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
fbd90375d7531927d312766b548376d909811b4d 22-Jul-2009 Peter Zijlstra <a.p.zijlstra@chello.nl> hrtimer: Remove cb_entry from struct hrtimer

It's unused, remove it.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <new-submission>
f9f868dbcca961ed62f1df1d114abd0c38c47dce 09-Jul-2009 Heiko Carstens <heiko.carstens@de.ibm.com> timer stats: fix quick check optimization

git commit 507e1231 "timer stats: Optimize by adding quick check to
avoid function calls" added one wrong check so that one unnecessary
function call isn't eliminated.

timer_stats_account_hrtimer() checks if timer->start_pid isn't
initialized in order to find out if timer_stats_update_stats() should
be called. However start_pid is initialized with -1 instead of 0, so
that the function call always happens.

Check timer->start_site like in timer_stats_account_timer() to fix
this.
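Why the start_pid test never short-circuits can be shown with a tiny model: a field initialised to -1 never compares equal to 0, while start_site stays NULL until start info is recorded. The struct and function names are hypothetical stand-ins, not the kernel's code.

```c
#include <stddef.h>

/* Hypothetical stand-in for the timer-stats fields of struct hrtimer. */
struct model_hrtimer {
	void *start_site;   /* NULL until start info is recorded */
	int start_pid;      /* initialised to -1, not 0 */
};

static void model_init(struct model_hrtimer *t)
{
	t->start_site = NULL;
	t->start_pid = -1;
}

/* Broken quick check: start_pid is -1 after init, so this is never
 * true and the stats function is always called. */
static int model_skip_accounting_broken(const struct model_hrtimer *t)
{
	return t->start_pid == 0;
}

/* Fixed quick check: test start_site instead. */
static int model_skip_accounting_fixed(const struct model_hrtimer *t)
{
	return t->start_site == NULL;
}
```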

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
507e123151149e578c9aae33eb876c49824da5f8 23-Jun-2009 Heiko Carstens <heiko.carstens@de.ibm.com> timer stats: Optimize by adding quick check to avoid function calls

When the kernel is configured with CONFIG_TIMER_STATS but timer
stats are runtime disabled we still get calls to
__timer_stats_timer_set_start_info which initializes some
fields in the corresponding struct timer_list.

So add some quick checks in the timer stats setup functions
to avoid function calls to __timer_stats_timer_set_start_info
when timer stats are disabled.
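The pattern is an inline wrapper that tests a runtime flag and calls the out-of-line function only when stats are enabled. A minimal sketch with hypothetical names (model_timer_stats_active stands in for the runtime switch; the kernel would typically also wrap the test in likely()):

```c
#include <stddef.h>

struct model_timer { void *start_site; };

static int model_timer_stats_active;      /* runtime switch, off by default */
static unsigned long model_slow_calls;    /* counts out-of-line calls */

/* Out-of-line slow path, standing in for the real set_start_info work. */
static void model_set_start_info_slow(struct model_timer *t, void *site)
{
	model_slow_calls++;
	t->start_site = site;
}

/* Inline fast path: one flag test, no function call when disabled. */
static inline void model_set_start_info(struct model_timer *t, void *site)
{
	if (!model_timer_stats_active)
		return;
	model_set_start_info_slow(t, site);
}
```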

In an artificial workload that does nothing but play ping
pong with a single tcp packet via loopback this decreases cpu
consumption by 1 - 1.5%.

This is part of a modified function trace output on SLES11:

perl-2497 [00] 28630647177732388 [+ 125]: sk_reset_timer <-tcp_v4_rcv
perl-2497 [00] 28630647177732513 [+ 125]: mod_timer <-sk_reset_timer
perl-2497 [00] 28630647177732638 [+ 125]: __timer_stats_timer_set_start_info <-mod_timer
perl-2497 [00] 28630647177732763 [+ 125]: __mod_timer <-mod_timer
perl-2497 [00] 28630647177732888 [+ 125]: __timer_stats_timer_set_start_info <-__mod_timer
perl-2497 [00] 28630647177733013 [+ 93]: lock_timer_base <-__mod_timer

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Mustafa Mesanovic <mustafa.mesanovic@de.ibm.com>
Cc: Arjan van de Ven <arjan@infradead.org>
LKML-Reference: <20090623153811.GA4641@osiris.boeblingen.de.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
cd6d95d8449b7c9f415f26041e9ae173d387b6bd 12-Jun-2009 Thomas Gleixner <tglx@linutronix.de> clocksource: prevent selection of low resolution clocksourse also for nohz=on

commit 3f68535adad (clocksource: sanity check sysfs clocksource
changes) prevents selection of non high resolution capable
clocksources when high resolution mode is active, but did not take
into account that the same rules apply for highres=off nohz=on.

Check the tick device mode instead of hrtimer_hres_active() to verify
whether the system needs to be protected from a switch to jiffies or
other non highres capable clock sources.
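The selection rule from this commit and the older sysfs sanity-check commit below can be sketched as a filter: when the tick device runs in oneshot mode (highres=on, or highres=off with nohz=on), only high-resolution capable clocksources are offered or accepted. The flag bit, struct, and function names are hypothetical; the kernel tracks capability with its own clocksource flag.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_CS_VALID_FOR_HRES 0x1u   /* hypothetical flag bit */

struct model_clocksource {
	const char *name;
	unsigned int flags;
};

/* In oneshot tick mode only high-res capable clocksources are allowed. */
static bool model_cs_allowed(const struct model_clocksource *cs,
			     bool tick_oneshot)
{
	return !tick_oneshot || (cs->flags & MODEL_CS_VALID_FOR_HRES);
}

/* Fill out[] with the names that would be listed as available;
 * returns how many there are. */
static size_t model_cs_available(const struct model_clocksource *all,
				 size_t n, bool tick_oneshot,
				 const char **out)
{
	size_t k = 0;

	for (size_t i = 0; i < n; i++)
		if (model_cs_allowed(&all[i], tick_oneshot))
			out[k++] = all[i].name;
	return k;
}
```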

Reported-by: Luming Yu <luming.yu@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
3f68535adad8dd89499505a65fb25d0e02d118cc 22-Jan-2009 john stultz <johnstul@us.ibm.com> clocksource: sanity check sysfs clocksource changes

Thomas, Andrew and Ingo pointed out that we don't have any safety checks
in the clocksource sysfs entries to make sure sysadmins don't try to
change the clocksource to a non high-res timer capable clocksource (such
as jiffies) when high-res timers (HRT) are enabled. Doing so will likely
hang a system.

Correct this by filtering non HRT clocksources from available_clocksources
and not accepting non HRT clocksources with HRT enabled.

Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
597d0275736dad9c3bda6f0a00a1c477dc0f37b1 15-Apr-2009 Arun R Bharadwaj <arun@linux.vnet.ibm.com> timers: Framework for identifying pinned timers

* Arun R Bharadwaj <arun@linux.vnet.ibm.com> [2009-04-16 12:11:36]:

This patch creates a new framework for identifying cpu-pinned timers
and hrtimers.

This framework is needed because pinned timers are expected to fire on
the same CPU on which they are queued. So it is essential to identify
these and not migrate them, in case there are any.

For regular timers, the currently existing add_timer_on() can be used
to queue pinned timers and subsequently mod_timer_pinned() can be used
to modify the 'expires' field.

For hrtimers, new modes HRTIMER_ABS_PINNED and HRTIMER_REL_PINNED are
added to queue cpu-pinned hrtimers.
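The "_PINNED mode argument" approach folds pinning into the existing mode value as an extra bit, so one start function serves all cases. A sketch with hypothetical names; the bit values are an assumption modelled on the ABS/REL/PINNED layout, not quoted from the header.

```c
#include <stdbool.h>

/* Mode bits (values assumed): relative vs absolute, plus a pinned bit. */
enum model_hrtimer_mode {
	MODEL_MODE_ABS        = 0x0,
	MODEL_MODE_REL        = 0x1,
	MODEL_MODE_PINNED     = 0x2,
	MODEL_MODE_ABS_PINNED = MODEL_MODE_ABS | MODEL_MODE_PINNED,
	MODEL_MODE_REL_PINNED = MODEL_MODE_REL | MODEL_MODE_PINNED,
};

static bool model_mode_is_relative(enum model_hrtimer_mode m)
{
	return m & MODEL_MODE_REL;
}

/* A pinned timer must fire on the CPU it was queued on. */
static bool model_may_migrate(enum model_hrtimer_mode m)
{
	return !(m & MODEL_MODE_PINNED);
}

/* Choose the CPU to queue on: pinned timers stay on this_cpu, others may
 * go to a preferred CPU (how that CPU is picked is out of scope here). */
static int model_target_cpu(enum model_hrtimer_mode m, int this_cpu,
			    int preferred_cpu)
{
	return model_may_migrate(m) ? preferred_cpu : this_cpu;
}
```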

[ tglx: use .._PINNED mode argument instead of creating tons of new
functions ]

Signed-off-by: Arun R Bharadwaj <arun@linux.vnet.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
7f1e2ca9f04b02794597f60e7b1d43f0a1317939 13-Mar-2009 Peter Zijlstra <a.p.zijlstra@chello.nl> hrtimer: fix rq->lock inversion (again)

It appears I inadvertently introduced rq->lock recursion to the
hrtimer_start() path when I delegated running already expired
timers to softirq context.

This patch fixes it by introducing a __hrtimer_start_range_ns()
method that will not use raise_softirq_irqoff() but
__raise_softirq_irqoff() which avoids the wakeup.

It then also changes schedule() to check for pending softirqs and
do the wakeup then, I'm not quite sure I like this last bit, nor
am I convinced it's really needed.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: paulus@samba.org
LKML-Reference: <20090313112301.096138802@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
ca109491f612aab5c8152207631c0444f63da97f 25-Nov-2008 Peter Zijlstra <peterz@infradead.org> hrtimer: removing all ur callback modes

Impact: cleanup, move all hrtimer processing into hardirq context

This is an attempt at removing some of the hrtimer complexity by
reducing the number of callback modes to 1.

This means that all hrtimer callback functions will be run from HARD-irq
context.

I went through all the 30 odd hrtimer callback functions in the kernel
and saw only one that I'm not quite sure of, which is the one in
net/can/bcm.c - hence I'm CC-ing the folks responsible for that code.

Furthermore, the hrtimer core now calls callbacks directly with IRQs
disabled in case you try to enqueue an expired timer. If this timer is a
periodic timer (which should use hrtimer_forward() to advance its time)
then it might be possible to end up in an infinite recursive loop due to the
fact that hrtimer_forward() doesn't round up to the next timer
granularity, and therefore keeps on calling the callback - obviously
this needs a fix.
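For reference, the forwarding arithmetic a periodic timer relies on can be modelled as: advance the expiry by whole intervals until it lies after now and report how many periods elapsed. This is a sketch under assumptions; res stands in for the clock base resolution (intervals below it are raised to it), the names are hypothetical, and interval and res are assumed positive.

```c
#include <stdint.h>

/*
 * Advance *expires by whole multiples of interval until it lies after now.
 * Returns the number of elapsed periods (overruns), or 0 if *expires is
 * still in the future. Assumes interval > 0 and res > 0.
 */
static uint64_t model_forward(int64_t *expires, int64_t now,
			      int64_t interval, int64_t res)
{
	int64_t delta = now - *expires;
	uint64_t orun = 1;

	if (delta < 0)
		return 0;
	if (interval < res)
		interval = res;          /* never step by less than resolution */
	if (delta >= interval) {
		/* Jump over all fully elapsed periods at once. */
		orun = (uint64_t)(delta / interval);
		*expires += (int64_t)orun * interval;
		if (*expires > now)
			return orun;
		orun++;
	}
	*expires += interval;
	return orun;
}
```

If the callback runs for longer than the interval, the forwarded expiry can already be in the past when the timer is re-enqueued, which is the recursion risk described above.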

Aside from that, this seems to compile and actually boot on my dual core
test box - although I'm sure there are some bugs in, me not hitting any
makes me certain :-)

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
621a0d5207c18012cb39932f2d9830a11a6cb03d 12-Nov-2008 Peter Zijlstra <a.p.zijlstra@chello.nl> hrtimer: clean up unused callback modes

Impact: cleanup

git grep HRTIMER_CB_IRQSAFE revealed half the callback modes are actually
unused.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
7597bc94d6f3bdccb086ac7f2ad91292fdaee2a4 05-Nov-2008 David Howells <dhowells@redhat.com> Fix accidental implicit cast in HR-timer conversion

Fix the hrtimer_add_expires_ns() function. It should take a 'u64 ns' argument,
but rather takes an 'unsigned long ns' argument - which might only be 32-bits.

On FRV, this results in the kernel locking up because hrtimer_forward() passes
the result of a 64-bit multiplication to this function, for which the compiler
discards the top 32-bits - something that didn't happen when ktime_add_ns() was
called directly.
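The truncation is easy to reproduce: a 64-bit nanosecond product passed to a 32-bit parameter is silently reduced modulo 2^32. In the sketch below, uint32_t stands in for a 32-bit 'unsigned long' so the effect is the same on any host; the function names are hypothetical.

```c
#include <stdint.h>

/* Buggy shape: a 32-bit parameter silently drops the top bits. */
static int64_t add_expires_ns_narrow(int64_t expires, uint32_t ns)
{
	return expires + ns;
}

/* Fixed shape: a 64-bit parameter keeps the full product. */
static int64_t add_expires_ns_u64(int64_t expires, uint64_t ns)
{
	return expires + (int64_t)ns;
}
```

With a 1s interval and 5 overruns, the product is 5000000000 ns; the narrow version receives 5000000000 mod 2^32 = 705032704 ns instead.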

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
592aa999d6a272856c9bfbdaac0cfba1bb37c24c 20-Oct-2008 Thomas Gleixner <tglx@linutronix.de> hrtimers: add missing docbook comments to struct hrtimer

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
40b8606253552109815786e5d4b0de98782d31f5 15-Oct-2008 Stephen Rothwell <sfr@canb.auug.org.au> DECLARE_PER_CPU needs linux/percpu.h

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
2075eb8d95612cadde91ef5be82691d97a2ea6c5 07-Oct-2008 Arjan van de Ven <arjan@linux.intel.com> rangetimer: fix x86 build failure for the !HRTIMERS case

the timer peek function was on the wrong side of an ifdef,
breaking for the !HRTIMERs case. Just provide an empty inline
for that case since it doesn't make sense in that scenario.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
ccc7dadf736639da86f3e0c86832c11a66fc8221 29-Sep-2008 Thomas Gleixner <tglx@linutronix.de> hrtimer: prevent migration of per CPU hrtimers

Impact: per CPU hrtimers can be migrated from a dead CPU

The hrtimer code has no knowledge about per CPU timers, but we need to
prevent the migration of such timers and warn when such a timer is
active at migration time.

Explicitly mark the timers as per CPU and use a more understandable
mode descriptor for the interrupt-safe unlocked callback mode, which
is used by hrtimer_sleeper and the scheduler code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
b00c1a99e7758f794923c61e5cd55268d61c9469 29-Sep-2008 Thomas Gleixner <tglx@linutronix.de> hrtimer: mark migration state

Impact: during migration active hrtimers can be seen as inactive

The migration code removes the hrtimers from the queues of the dead
CPU and sets the state temporarily to INACTIVE. The enqueue code sets it
to ACTIVE/PENDING again.

Prevent the wrong state from being seen by using a separate migration
state bit.
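A sketch of why a separate bit helps: if the timer is parked in a MIGRATE state rather than INACTIVE while it moves between CPU queues, an "is it active?" check never sees a false negative. The bit values are assumed to follow the hrtimer state documentation, and all names are hypothetical.

```c
#include <stdbool.h>

/* State bits (values assumed). */
#define MODEL_STATE_INACTIVE 0x00UL
#define MODEL_STATE_ENQUEUED 0x01UL
#define MODEL_STATE_CALLBACK 0x02UL
#define MODEL_STATE_MIGRATE  0x04UL

static bool model_active(unsigned long state)
{
	return state != MODEL_STATE_INACTIVE;
}

/* Step 1: take the timer off the dead CPU's queue. Using MIGRATE instead
 * of INACTIVE keeps model_active() true during the move. */
static void model_migrate_remove(unsigned long *state)
{
	*state = MODEL_STATE_MIGRATE;
}

/* Step 2: queue on the new CPU, then drop the temporary MIGRATE bit. */
static void model_migrate_enqueue(unsigned long *state)
{
	*state |= MODEL_STATE_ENQUEUED;
	*state &= ~MODEL_STATE_MIGRATE;
}
```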

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
1b02469088ac7a13d7e622b618b7410d0f1ce5ec 22-Sep-2008 Richard Kennedy <richard@rsk.demon.co.uk> hrtimer: reorder struct hrtimer to save 8 bytes on 64bit builds

reorder struct hrtimer to save 8 bytes on 64 bit builds when
CONFIG_TIMER_STATS selected. (also removes 8 bytes from signal_struct)

Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
b91c4996df56fcd201f85c392a1de7bc3f6641f5 19-Sep-2008 Mark McLoughlin <markmc@redhat.com> hrtimer: remove hrtimer_clock_base::reprogram()

hrtimer_clock_base::reprogram() also appears to never
have been used, so remove it.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
d7cfb60c5cf904ecf1e0ae23ec178175b86f0d4a 19-Sep-2008 Mark McLoughlin <markmc@redhat.com> hrtimer: remove hrtimer_clock_base::get_softirq_time()

Peter Zijlstra noticed this 8 months ago and I just noticed
it again.

hrtimer_clock_base::get_softirq_time() is currently unused
in the entire tree. In fact, looking at the logs, it appears
as if it was never used. Remove it.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2e94d1f71f7e4404d997e6fb4f1618aa147d76f9 11-Sep-2008 Arjan van de Ven <arjan@linux.intel.com> hrtimer: peek at the timer queue just before going idle

As part of going idle, we already look at the time of the next timer event to determine
which C-state to select etc.

This patch adds functionality that causes the timers that are past their
soft expire time to fire at this time, before we calculate the next wakeup
time. This functionality will thus avoid wakeups by running timers before
going idle rather than specially waking up for it.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
4ce105d30e08fb8a1783c55a0e48aa3fa200c455 08-Sep-2008 Arjan van de Ven <arjan@linux.intel.com> hrtimer: incorporate feedback from Peter Zijlstra

(based on lkml review)
* use rt_task()
* task_nice() has a sign

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
da8f2e170ea94cc20f8ebbc8ee8d127edb8f12f1 07-Sep-2008 Arjan van de Ven <arjan@linux.intel.com> hrtimer: add a hrtimer_start_range() function

this patch adds a _range version of hrtimer_start() so that range timers
can be created; the hrtimer_start() function is just a wrapper around this.

In addition, hrtimer_start_expires() will now preserve existing ranges.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
2ec02270c00f94b08fddfb68c37510a9fb47ac7c 06-Sep-2008 Arjan van de Ven <arjan@linux.intel.com> hrtimer: another build fix

More randconfig testing

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
584fb4a76413ec9215741e075e0dfb69173b213f 06-Sep-2008 Arjan van de Ven <arjan@linux.intel.com> hrtimer: fix build bug found by Ingo

in some randconfig configurations, hrtimers are used even though
the hrtimer config is off; and it broke the build due to some of
the new functions being on the wrong side of the ifdef.

This patch moves the functions to the other side of the ifdef, fixing
the build bug.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
654c8e0b1c623b156c5b92f28d914ab38c9c2c90 02-Sep-2008 Arjan van de Ven <arjan@linux.intel.com> hrtimer: turn hrtimers into range timers

this patch turns hrtimers into range timers; they have 2 expire points
1) the soft expire point
2) the hard expire point

the kernel will do its regular best-effort attempt to get the timer run
at the hard expire point. However, if some other timer fires after the soft
expire point, the kernel now has the freedom to fire this timer at this point,
thus grouping the events and preventing a power-expensive wakeup in the
future.
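The grouping behaviour can be modelled with two expiry points per timer: soft = requested time, hard = requested time plus slack. When any interrupt arrives, every pending timer whose soft point has passed fires with it, and the next interrupt is programmed for the earliest remaining hard point. This is a simplified sketch with hypothetical names, not the kernel's queue code.

```c
#include <stddef.h>
#include <stdint.h>

struct model_range_timer {
	int64_t soft;   /* earliest time the timer may fire */
	int64_t hard;   /* latest time; wakeups are programmed for this */
	int fired;
};

/* Arm with expiry t and slack delta: soft = t, hard = t + delta. */
static void model_start_range(struct model_range_timer *tm, int64_t t,
			      int64_t delta)
{
	tm->soft = t;
	tm->hard = t + delta;
	tm->fired = 0;
}

/* Handle an interrupt at time now: fire every pending timer whose soft
 * expiry has passed, then return the earliest hard expiry still pending
 * (INT64_MAX if none) as the next programming point. */
static int64_t model_expire(struct model_range_timer *tms, size_t n,
			    int64_t now)
{
	int64_t next = INT64_MAX;

	for (size_t i = 0; i < n; i++) {
		if (tms[i].fired)
			continue;
		if (tms[i].soft <= now) {
			tms[i].fired = 1;   /* piggy-backs on this wakeup */
			continue;
		}
		if (tms[i].hard < next)
			next = tms[i].hard;
	}
	return next;
}
```

For example, with timers at [10,20], [15,50] and [30,40], the interrupt at 20 also fires the second timer, saving a separate wakeup at 50.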

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
799b64de256ea68fbb5db63bb55f61c305870643 02-Sep-2008 Arjan van de Ven <arjan@linux.intel.com> hrtimer: rename the "expires" struct member to avoid accidental usage

To catch code that still touches the "expires" memory directly, rename it
to have the compiler complain rather than get nasty, hard to explain,
runtime behavior.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
63ca243b271f5b44e0b1057003cf498b6d0fadf7 01-Sep-2008 Arjan van de Ven <arjan@linux.intel.com> hrtimer: add abstraction functions for accessing the "expires" member

In order to be able to turn hrtimers into range-based timers, we need to provide
accessor functions for getting to the "expires" ktime_t member of the
struct hrtimer.

This patch adds a set of accessors for this purpose:
* hrtimer_set_expires
* hrtimer_set_expires_tv64
* hrtimer_add_expires
* hrtimer_add_expires_ns
* hrtimer_get_expires
* hrtimer_get_expires_tv64
* hrtimer_get_expires_ns
* hrtimer_expires_remaining
* hrtimer_start_expires

No users of these new accessors are added yet; these follow in later patches.
Hopefully this patch can even go into 2.6.27-rc so that the conversions will
not have a bottleneck in -next.
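The benefit of the accessors can be sketched in a few lines: once callers only use set/add/get helpers, adding a second (soft) expiry later changes the helpers, not every caller. The underscore-prefixed fields mirror the later rename intended to make direct users fail to compile. Names are hypothetical, and the remaining-time helper takes now as a parameter instead of reading the base clock.

```c
#include <stdint.h>

/* Fields are "private": callers use the helpers below. */
struct model_hrtimer {
	int64_t _softexpires;
	int64_t _expires;
};

static inline void model_set_expires(struct model_hrtimer *t, int64_t time)
{
	t->_softexpires = time;
	t->_expires = time;
}

static inline void model_add_expires_ns(struct model_hrtimer *t, uint64_t ns)
{
	t->_softexpires += (int64_t)ns;
	t->_expires += (int64_t)ns;
}

static inline int64_t model_get_expires(const struct model_hrtimer *t)
{
	return t->_expires;
}

static inline int64_t model_get_softexpires(const struct model_hrtimer *t)
{
	return t->_softexpires;
}

/* Time left until the (hard) expiry, relative to a caller-supplied now. */
static inline int64_t model_expires_remaining(const struct model_hrtimer *t,
					      int64_t now)
{
	return t->_expires - now;
}
```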

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
7bb67439bf6bd3782f07f1d7be1e63406453d5de 31-Aug-2008 Arjan van de Ven <arjan@linux.intel.com> select: Introduce a hrtimeout function

This patch adds a schedule_hrtimeout() function, to be used by select() and
poll() in a later patch. This function works similarly to schedule_timeout()
in most ways, but takes a timespec rather than jiffies.

With a lot of contributions/fixes from Thomas

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
4346f65426cbceb64794b468e4af6f5632d58c5e 30-Apr-2008 Oliver Hartkopp <oliver@hartkopp.net> hrtimer: remove duplicate helper function

The helper function hrtimer_callback_running() is used in
kernel/hrtimer.c as well as in the updated net/can/bcm.c which now
supports hrtimers. Moving the helper function to hrtimer.h removes the
duplicate definition in the C-files.

Signed-off-by: Oliver Hartkopp <oliver@hartkopp.net>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
237fc6e7a35076f584b9d0794a5204fe4bd9b9e5 30-Apr-2008 Thomas Gleixner <tglx@linutronix.de> add hrtimer specific debugobjects code

hrtimers now have dynamic users in the network code. Put them under
debugobjects surveillance as well.

Add calls to the generic object debugging infrastructure and provide fixup
functions which keep the system alive when recoverable problems have
been detected by the object debugging core code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Greg KH <greg@kroah.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8e60e05fdc7344415fa69a3883b11f65db967b47 04-Apr-2008 Oleg Nesterov <oleg@tv-sign.ru> hrtimers: simplify lockdep handling

In order to avoid the false positive from lockdep, each per-cpu base->lock has
a separate lock class and migrate_hrtimers() uses double_spin_lock().

This is overcomplicated: except for migrate_hrtimers() we never take 2 locks
at once, and migrate_hrtimers() can use spin_lock_nested().

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
080344b98805553f9b01de0f59a41b1533036d8d 01-Feb-2008 Oleg Nesterov <oleg@tv-sign.ru> hrtimer: fix *rmtp handling in hrtimer_nanosleep()

Spotted by Pavel Emelyanov and Alexey Dobriyan.

hrtimer_nanosleep() sets restart_block->arg1 = rmtp, but this rmtp points to
the local variable which lives in the caller's stack frame. This means that
if sys_restart_syscall() actually happens and it is interrupted as well, we
don't update the user-space variable, but write into the already dead stack
frame.

Introduced by commit 04c227140fed77587432667a574b14736a06dd7f
hrtimer: Rework hrtimer_nanosleep to make sys_compat_nanosleep easier

Change the callers to pass "__user *rmtp" to hrtimer_nanosleep(), and change
hrtimer_nanosleep() to use copy_to_user() to actually update *rmtp.

A small problem remains. man 2 nanosleep states that *rmtp should be written if
nanosleep() was interrupted (it says nothing whether it is OK to update *rmtp
if nanosleep returns 0), but (with or without this patch) we can dirty *rem
even if nanosleep() returns 0.

NOTE: this patch doesn't change compat_sys_nanosleep(), because it has other
bugs; those are fixed by the next patch.
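From user space, the *rmtp contract discussed above is what the usual restart-on-EINTR idiom relies on: when nanosleep() is interrupted, the unslept time is written to the remaining-time argument and can be fed straight back in. A minimal sketch, assuming only POSIX nanosleep(2); the helper name sleep_fully() is hypothetical and not part of this patch:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

/* Hypothetical helper: sleep for the full duration in 'req',
 * resuming after signal interruptions. On EINTR, nanosleep()
 * stores the unslept time in 'rem', which becomes the next request. */
int sleep_fully(struct timespec req)
{
    struct timespec rem;

    while (nanosleep(&req, &rem) == -1) {
        if (errno != EINTR)
            return -1;      /* EINVAL, EFAULT: give up */
        req = rem;          /* resume with the remaining time */
    }
    return 0;
}
```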

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Alexey Dobriyan <adobriyan@sw.ru>
Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
Cc: Pavel Emelyanov <xemul@sw.ru>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Toyo Abe <toyoa@mvista.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

include/linux/hrtimer.h | 2 -
kernel/hrtimer.c | 51 +++++++++++++++++++++++++-----------------------
kernel/posix-timers.c | 14 +------------
3 files changed, 30 insertions(+), 37 deletions(-)
3eb056764dd806bbe84eb604e45e7470feeaafd8 08-Feb-2008 Li Zefan <lizf@cn.fujitsu.com> time: fix typo in comments

Fix typo in comments.

BTW: I also had to fix the coding style in arch/ia64/kernel/time.c, otherwise
checkpatch.pl would complain.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
151db1fc23800875c7ac353b106b7dab77061275 07-Feb-2008 Tony Breeds <tony@bakeyournoodle.com> Fix compilation of powerpc asm-offsets.c with old gcc

Commit ad7f71674ad7c3c4467e48f6ab9e85516dae2720 ("[POWERPC] Use a
sensible default for clock_getres() in the VDSO") corrected the clock
resolution reported by the VDSO clock_getres() but introduced another
problem in that older versions of gcc (gcc-4.0 and earlier) fail to
compile the new code in arch/powerpc/kernel/asm-offsets.c.

This fixes it by introducing a new MONOTONIC_RES_NSEC define in the
generic code which is equivalent to KTIME_MONOTONIC_RES but is just an
integer constant, not a ktime union.

Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
4d672e7ac79b5ec5cdc90e450823441e20464691 05-Feb-2008 Davide Libenzi <davidel@xmailserver.org> timerfd: new timerfd API

This is the new timerfd API as it is implemented by the following patch:

    int timerfd_create(int clockid, int flags);
    int timerfd_settime(int ufd, int flags,
                        const struct itimerspec *utmr,
                        struct itimerspec *otmr);
    int timerfd_gettime(int ufd, struct itimerspec *otmr);

The timerfd_create() API creates an un-programmed timerfd fd. The "clockid"
parameter can be either CLOCK_MONOTONIC or CLOCK_REALTIME.

The timerfd_settime() API gives new settings to the timerfd fd, optionally
retrieving the previous expiration time (if the "otmr" parameter is not
NULL).

The time value specified in "utmr" is absolute if the TFD_TIMER_ABSTIME bit
is set in the "flags" parameter; otherwise it is relative.

The timerfd_gettime() API returns the next expiration time of the timer, or
{0, 0} if the timerfd has not been set yet.

Like the previous timerfd API implementation, read(2) and poll(2) are
supported (with the same interface). Here's a simple test program I used to
exercise the new timerfd APIs:

http://www.xmailserver.org/timerfd-test2.c
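A minimal sketch of timerfd_create() and timerfd_settime() together with read(2), assuming the glibc wrappers from <sys/timerfd.h> that expose the prototypes above; the helper wait_ticks() is hypothetical and is not the linked test program. It arms a periodic CLOCK_MONOTONIC timer with a relative first expiry (flags == 0) and relies on each read(2) returning an 8-byte expiration count.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: arm a periodic CLOCK_MONOTONIC timerfd and
 * block in read(2) until it has expired at least 'want' times.
 * Returns the total expiration count, or -1 on error. */
long wait_ticks(long long period_ns, uint64_t want)
{
    struct itimerspec its;
    uint64_t total = 0, n;
    int fd;

    if (period_ns <= 0)
        return -1;          /* a zero it_value would disarm the timer */

    /* Relative first expiry (flags == 0), then every period_ns. */
    its.it_value.tv_sec = period_ns / 1000000000LL;
    its.it_value.tv_nsec = period_ns % 1000000000LL;
    its.it_interval = its.it_value;

    fd = timerfd_create(CLOCK_MONOTONIC, 0);
    if (fd < 0)
        return -1;
    if (timerfd_settime(fd, 0, &its, NULL) < 0) {
        close(fd);
        return -1;
    }
    while (total < want) {
        /* Each read returns the number of expirations since the last read. */
        if (read(fd, &n, sizeof(n)) != (ssize_t)sizeof(n)) {
            close(fd);
            return -1;
        }
        total += n;
    }
    close(fd);
    return (long)total;
}
```

Because a single read can report several expirations, the loop accumulates counts rather than assuming one expiration per read.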

[akpm@linux-foundation.org: coding-style cleanups]
[akpm@linux-foundation.org: fix ia64 build]
[akpm@linux-foundation.org: fix m68k build]
[akpm@linux-foundation.org: fix mips build]
[akpm@linux-foundation.org: fix alpha, arm, blackfin, cris, m68k, s390, sparc and sparc64 builds]
[heiko.carstens@de.ibm.com: fix s390]
[akpm@linux-foundation.org: fix powerpc build]
[akpm@linux-foundation.org: fix sparc64 more]
Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Davide Libenzi <davidel@xmailserver.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5e05ad7d4e3b11f935998882b5d9c3b257137f1b 05-Feb-2008 Davide Libenzi <davidel@xmailserver.org> timerfd: introduce a new hrtimer_forward_now() function

I think that advancing the timer against the timer's current "now" can be a
pretty common usage, so, without exposing hrtimer's internals, we add a new
hrtimer_forward_now() function.
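The semantics can be illustrated with a simplified userspace model (not the kernel source): forwarding pushes the expiry past "now" in whole multiples of the interval and returns the number of periods consumed, and the _now variant merely supplies "now" from the timer's own clock. struct model_timer, model_forward() and model_forward_now() are hypothetical names; times are plain nanosecond integers and the interval must be positive.

```c
#include <stdint.h>

/* Hypothetical model of a timer: only the fields needed here. */
struct model_timer {
    int64_t expires;             /* absolute expiry, ns */
    int64_t (*get_time)(void);   /* stands in for the base's clock */
};

/* Push 'expires' past 'now' in whole multiples of 'interval' and
 * return the number of periods consumed (the overrun count);
 * return 0 if the timer has not expired yet. */
uint64_t model_forward(struct model_timer *t, int64_t now, int64_t interval)
{
    int64_t delta = now - t->expires;
    uint64_t orun;

    if (delta < 0)
        return 0;
    /* Periods ending at or before 'now', plus the one being entered. */
    orun = (uint64_t)(delta / interval) + 1;
    t->expires += (int64_t)orun * interval;
    return orun;
}

/* Same, but "now" comes from the timer's own clock, so the caller
 * does not need to know which base the timer lives on. */
uint64_t model_forward_now(struct model_timer *t, int64_t interval)
{
    return model_forward(t, t->get_time(), interval);
}
```

For example, a timer due at 100 with interval 10, forwarded at time 125, has missed the expiries at 100, 110 and 120, so it reports 3 and moves to 130.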

Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
Cc: Michael Kerrisk <mtk-manpages@gmx.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ef08cce81d9be38063ec7796e36f2b32bdf82ff2 03-Feb-2008 Li Zefan <lizf@cn.fujitsu.com> time: delete comments that refer to noexistent symbols

The function do_timer_interrupt_hook() doesn't take a regs argument, and
struct hrtimer_sleeper doesn't have a cb_pending member, so delete the
comments referring to these symbols.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
d3d74453c34f8fd87674a8cf5b8a327c68f22e99 25-Jan-2008 Peter Zijlstra <a.p.zijlstra@chello.nl> hrtimer: fixup the HRTIMER_CB_IRQSAFE_NO_SOFTIRQ fallback

Currently all highres=off timers are run from softirq context, but
HRTIMER_CB_IRQSAFE_NO_SOFTIRQ timers expect to run from irq context.

Fix this up by splitting it similarly to the highres=on case.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
8f4d37ec073c17e2d4aa8851df5837d798606d6f 25-Jan-2008 Peter Zijlstra <a.p.zijlstra@chello.nl> sched: high-res preemption tick

Use HR-timers (when available) to deliver an accurate preemption tick.

The regular scheduler tick, which fires every 1/HZ, can be too coarse when nice
levels are used. The fairness system will still keep the CPU utilisation 'fair'
by subsequently delaying the task that got an excessive amount of CPU time, but
we try to minimize this by delivering preemption points spot-on.

The average interval between these extra interrupts is sched_latency / nr_latency,
which need not be shorter than 1/HZ; it's just that their distribution within
the sched_latency period is important.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
04c227140fed77587432667a574b14736a06dd7f 15-Oct-2007 Anton Blanchard <anton@samba.org> hrtimer: Rework hrtimer_nanosleep to make sys_compat_nanosleep easier

Pull the copy_to_user out of hrtimer_nanosleep and into the callers
(common_nsleep, sys_nanosleep) in preparation for converting
compat_sys_nanosleep to use hrtimers.

Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
c5c061b8f9726bc2c25e19dec227933a13d1e6b7 16-Jul-2007 Venki Pallipadi <venkatesh.pallipadi@intel.com> Add a flag to indicate deferrable timers in /proc/timer_stats

Add a flag in /proc/timer_stats to indicate deferrable timers. This lets
developers/users differentiate between types of timers in
/proc/timer_stats.

Deferrable and normal timers appear in /proc/timer_stats as below (the "D"
suffix on the count marks the deferrable timer):
10D, 1 swapper queue_delayed_work_on (delayed_work_timer_fn)
10, 1 swapper queue_delayed_work_on (delayed_work_timer_fn)

The timer_stats version also changes from v0.1 to v0.2.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
995f054f2a342f8505fed4f8395d12c0f5966414 07-Apr-2007 Ingo Molnar <mingo@elte.hu> [PATCH] high-res timers: resume fix

Soeren Sonnenburg reported that upon resume he is getting
this backtrace:

[<c0119637>] smp_apic_timer_interrupt+0x57/0x90
[<c0142d30>] retrigger_next_event+0x0/0xb0
[<c0104d30>] apic_timer_interrupt+0x28/0x30
[<c0142d30>] retrigger_next_event+0x0/0xb0
[<c0140068>] __kfifo_put+0x8/0x90
[<c0130fe5>] on_each_cpu+0x35/0x60
[<c0143538>] clock_was_set+0x18/0x20
[<c0135cdc>] timekeeping_resume+0x7c/0xa0
[<c02aabe1>] __sysdev_resume+0x11/0x80
[<c02ab0c7>] sysdev_resume+0x47/0x80
[<c02b0b05>] device_power_up+0x5/0x10

It turns out that on resume we mistakenly re-enable interrupts too
early. Do the timer retrigger only on the current CPU.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Soeren Sonnenburg <kernel@nn7.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
d1d67174b42a02c7d106894df0ed155d595871f7 06-Mar-2007 Andres Salomon <dilinger@debian.org> [PATCH] hrtimers: hrtimer_clock_base description typo

The description for the hrtimer_clock_base struct describes "hrtimer_base".
That should be hrtimer_clock_base.

Signed-off-by: Andres Salomon <dilinger@debian.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
8437fdc7428eac363579bf0cce2526c35573735c 06-Mar-2007 Andres Salomon <dilinger@debian.org> [PATCH] hrtimers: fix HRTIMER_CB_IRQSAFE_NO_SOFTIRQ description

The description for HRTIMER_CB_IRQSAFE_NO_SOFTIRQ is backwards; "NO
SOFTIRQ" sounds a whole lot like it means it must not be run in a softirq.

Signed-off-by: Andres Salomon <dilinger@debian.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
05fb6bf0b29552b64dc86f405a484de2514e0ac2 01-Mar-2007 Randy Dunlap <randy.dunlap@oracle.com> [PATCH] kernel-doc fixes for 2.6.20-git15 (non-drivers)

Fix kernel-doc warnings in 2.6.20-git15 (lib/, mm/, kernel/, include/).

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
88ad0bf6890505cbd9ca1dbb79944a27b5c8697d 16-Feb-2007 Ingo Molnar <mingo@elte.hu> [PATCH] Add SysRq-Q to print timer_list debug info

Add SysRq-Q to print pending timers and other timer info.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
82f67cd9fca8c8762c15ba7ed0d5747588c1e221 16-Feb-2007 Ingo Molnar <mingo@elte.hu> [PATCH] Add debugging feature /proc/timer_stat

Add /proc/timer_stats support, a debugging feature to profile timer expiration.
The starting site, the process/PID and the expiration function are all captured.
This allows quick identification of timer event sources in a system.

Sample output:

# echo 1 > /proc/timer_stats
# cat /proc/timer_stats
Timer Stats Version: v0.1
Sample period: 4.010 s
24, 0 swapper hrtimer_stop_sched_tick (hrtimer_sched_tick)
11, 0 swapper sk_reset_timer (tcp_delack_timer)
6, 0 swapper hrtimer_stop_sched_tick (hrtimer_sched_tick)
2, 1 swapper queue_delayed_work_on (delayed_work_timer_fn)
17, 0 swapper hrtimer_restart_sched_tick (hrtimer_sched_tick)
2, 1 swapper queue_delayed_work_on (delayed_work_timer_fn)
4, 2050 pcscd do_nanosleep (hrtimer_wakeup)
5, 4179 sshd sk_reset_timer (tcp_write_timer)
4, 2248 yum-updatesd schedule_timeout (process_timeout)
18, 0 swapper hrtimer_restart_sched_tick (hrtimer_sched_tick)
3, 0 swapper sk_reset_timer (tcp_delack_timer)
1, 1 swapper neigh_table_init_no_netlink (neigh_periodic_timer)
2, 1 swapper e1000_up (e1000_watchdog)
1, 1 init schedule_timeout (process_timeout)
100 total events, 25.24 events/sec
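Because the format is one event source per line, it is easy to post-process. A minimal parser sketch, assuming the line layout in the sample above (count, PID, command name, start site, callback in parentheses) and, for the later v0.2 format, an optional "D" suffix on the count marking a deferrable timer; struct ts_entry and ts_parse() are hypothetical names. Header and summary lines are rejected, so summing the parsed counts of the sample reproduces its "100 total events".

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* One parsed /proc/timer_stats entry (hypothetical layout). */
struct ts_entry {
    unsigned long count;   /* number of events */
    bool deferrable;       /* 'D' suffix on the count (v0.2) */
    int pid;
    char comm[16];         /* process name */
    char start[64];        /* function that started the timer */
    char expire[64];       /* callback shown in parentheses */
};

/* Parse e.g. "24, 0 swapper hrtimer_stop_sched_tick (hrtimer_sched_tick)".
 * Returns false for header/summary lines or malformed input. */
bool ts_parse(const char *line, struct ts_entry *e)
{
    char *p;

    e->count = strtoul(line, &p, 10);
    if (p == line)
        return false;          /* no leading count */
    e->deferrable = (*p == 'D');
    if (e->deferrable)
        p++;
    /* The count must be followed directly by ", pid comm start (expire)". */
    return sscanf(p, ", %d %15s %63s (%63[^)])",
                  &e->pid, e->comm, e->start, e->expire) == 4;
}
```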

[ cleanups and hrtimers support from Thomas Gleixner <tglx@linutronix.de> ]
[bunk@stusta.de: nr_entries can become static]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
54cdfdb47f73b5af3d1ebb0f1e383efbe70fde9e 16-Feb-2007 Thomas Gleixner <tglx@linutronix.de> [PATCH] hrtimers: add high resolution timer support

Implement high resolution timers on top of the hrtimers infrastructure and the
clockevents / tick-management framework. This provides accurate timers for
all hrtimer subsystem users.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
79bf2bb335b85db25d27421c798595a2fa2a0e82 16-Feb-2007 Thomas Gleixner <tglx@linutronix.de> [PATCH] tick-management: dyntick / highres functionality

With Ingo Molnar <mingo@elte.hu>

Add functions to provide dynamic ticks and high resolution timers. The code
which keeps track of jiffies and handles the long idle periods is shared
between tick based and high resolution timer based dynticks. The dyntick
functionality can be disabled on the kernel command line. Also provide the
infrastructure to support high resolution timers.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
d316c57ff6bfad9557462b9100f25c6260d2b774 16-Feb-2007 Thomas Gleixner <tglx@linutronix.de> [PATCH] clockevents: add core functionality

Architectures register their clock event devices in the clockevents core.
Users of the clockevents core can get clock event devices for their use. The
clockevents core code provides notification mechanisms for various clock
related management events.

This allows the clock event devices to be controlled without the architectures
having to worry about the details of function assignment. This is also a
prerequisite for high resolution timers and dynamic ticks, allowing the core
code to control the clock functionality without intrusive changes to the
architecture code.

[Fixes-by: Ingo Molnar <mingo@elte.hu>]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
5cfb6de7cd7c8f04655c9d23533ca506647beace 16-Feb-2007 Thomas Gleixner <tglx@linutronix.de> [PATCH] hrtimers: clean up callback tracking

Reintroduce ktimers feature "optimized away" by the ktimers review process:
remove the curr_timer pointer from the cpu-base and use the hrtimer state.

No functional changes.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
303e967ff90a9d19ad3f8c9028ccbfa7f408fbb3 16-Feb-2007 Thomas Gleixner <tglx@linutronix.de> [PATCH] hrtimers; add state tracking

Reintroduce ktimers feature "optimized away" by the ktimers review process:
multiple hrtimer states to enable the running of hrtimers without holding the
cpu-base-lock.

(The "optimized" rbtree hack carried only 2 states' worth of information, and
we need 4 for high resolution timers and dynamic ticks.)

No functional changes.

Build-fixes-from: Andrew Morton <akpm@osdl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3c8aa39d7c445ae2612b6b626f76f077e7a7ab0d 16-Feb-2007 Thomas Gleixner <tglx@linutronix.de> [PATCH] hrtimers: cleanup locking

Improve kernel/hrtimer.c locking: use a per-CPU base with a lock to control
locking of all clocks belonging to a CPU. This simplifies code that needs to
lock all clocks at once, which makes life easier for high-res timers and
dyntick.

No functional changes.

[ optimization change from Andrew Morton <akpm@osdl.org> ]

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
c9cb2e3d7c9178ab75d0942f96abb3abe0369906 16-Feb-2007 Thomas Gleixner <tglx@linutronix.de> [PATCH] hrtimers: namespace and enum cleanup

- hrtimers did not use the hrtimer_restart enum and relied on the implicit
int representation. Fix the prototypes and the functions using the enums.
- Use separate name spaces for the enumerations
- Convert hrtimer_restart macro to inline function
- Add comments

No functional changes.

[akpm@osdl.org: fix input driver]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Dmitry Torokhov <dtor@mail.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
411187fb05cd11676b0979d9fbf3291db69dbce2 16-Feb-2007 John Stultz <johnstul@us.ibm.com> [PATCH] GTOD: persistent clock support

Persistent clock support: do proper timekeeping across suspend/resume.

[bunk@stusta.de: cleanup]
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: Roman Zippel <zippel@linux-m68k.org>
Cc: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
1711ef3866b0360e102327389fe4b76c849bbe83 29-Sep-2006 Toyo Abe <toyoa@mvista.com> [PATCH] posix-timers: Fix clock_nanosleep() doesn't return the remaining time in compatibility mode

The clock_nanosleep() function does not return the time remaining when the
sleep is interrupted by a signal.

This patch creates a new call out, compat_clock_nanosleep_restart(), which
handles returning the remaining time after a sleep is interrupted. This
patch revives clock_nanosleep_restart(). It is now accessed via the new
call out. The compat_clock_nanosleep_restart() is used for compatibility
access.

Since this is implemented in compatibility mode the normal path is
virtually unaffected - no real performance impact.

Signed-off-by: Toyo Abe <toyoa@mvista.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
6dba28379edc08327ede01ff41bd3c9dd46a7fa0 06-Sep-2006 Henrik Kretzschmar <henne@nachtwindheim.de> [PATCH] Documentation for lock_key in struct hrtimer_base

Fixes an error message from 'make xmldocs'.

Signed-off-by: Henrik Kretzschmar <henne@nachtwindheim.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
543655244866b8ec648fea1eb9c32a35ffba5721 03-Jul-2006 Ingo Molnar <mingo@elte.hu> [PATCH] lockdep: annotate hrtimer base locks

Teach special (recursive) locking code to the lock validator. Has no effect
on non-lockdep kernels.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
fa9799e33d362aeca4555cd6318735bab1c04d16 25-Jun-2006 Randy Dunlap <rdunlap@xenotime.net> [PATCH] ktime/hrtimer: fix kernel-doc comments

Fix kernel-doc formatting in ktime.h and hrtimer.[ch] files.

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
ed198cb49750fd9ec564e9f1df66c10efea605f1 22-Apr-2006 David Woodhouse <dwmw2@infradead.org> [RBTREE] Update hrtimers to use rb_parent() accessor macro.

Also switch it to use the same method of using off-tree nodes as
everyone else now does -- set them to point to themselves.
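The self-pointing convention can be modelled in a few lines. A NULL parent already means "root of a tree", so a node whose parent pointer refers to itself is an unambiguous "not linked into any tree" marker. This is a simplified sketch with hypothetical names (struct node, node_clear(), node_is_off_tree()); the real struct rb_node packs the parent pointer and the colour bit into one word.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified tree node. */
struct node {
    struct node *parent;
    struct node *left, *right;
};

/* Mark a node as "not in any tree" by pointing it at itself. */
static void node_clear(struct node *n)
{
    n->parent = n;
}

/* A self-pointing node is off-tree; any other value (including
 * NULL, which a tree's root uses) means it is linked in. */
static bool node_is_off_tree(const struct node *n)
{
    return n->parent == n;
}
```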

Signed-off-by: David Woodhouse <dwmw2@infradead.org>
a580290c3e64bb695158a090d02d1232d9609311 02-Apr-2006 Martin Waitz <tali@admingilde.org> Documentation: fix minor kernel-doc warnings

This patch updates the comments to match the actual code.

Signed-off-by: Martin Waitz <tali@admingilde.org>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
00362e33f65f1cb5d15e62ea5509520ce2770360 31-Mar-2006 Thomas Gleixner <tglx@linutronix.de> [PATCH] hrtimer: create generic sleeper

The removal of the data field in the hrtimer structure enforces the
embedding of the timer into another data structure. nanosleep now uses a
private implementation of the most commonly used timer callback function
(simple task wakeup).

In order to avoid reimplementing such functionality all over the
place, a generic hrtimer_sleeper functionality is created.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
05cfb614ddbf3181540ce09d44d96486f8ba8d6a 26-Mar-2006 Roman Zippel <zippel@linux-m68k.org> [PATCH] hrtimers: remove data field

The nanosleep cleanup allows the data field of hrtimer to be removed. The
callback function can use container_of() to get its own data. Since the
hrtimer structure is embedded in other structures anyway, this adds no
overhead.
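A minimal userspace sketch of the container_of() pattern described above, assuming hypothetical types (struct hrtimer_model, struct my_device) in place of the kernel's. The macro here is a simplified version without the kernel's type checking, and the callback returns a plain int rather than the kernel's enum.

```c
#include <stddef.h>

/* Simplified stand-in for the kernel's container_of(): recover the
 * enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for struct hrtimer (no 'data' field). */
struct hrtimer_model {
    long expires;
    int (*function)(struct hrtimer_model *);
};

/* Hypothetical owner that embeds the timer by value. */
struct my_device {
    int fired;
    struct hrtimer_model timer;
};

/* The callback receives only the timer, yet reaches its owner. */
static int my_timer_fn(struct hrtimer_model *t)
{
    struct my_device *dev = container_of(t, struct my_device, timer);

    dev->fired++;
    return 0;
}
```

The offset arithmetic is resolved at compile time, which is why removing the data field costs nothing at run time.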

Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
b75f7a51ca75c977d7d77f735d7a7859194eb39e 26-Mar-2006 Roman Zippel <zippel@linux-m68k.org> [PATCH] hrtimers: remove state field

Remove the state field and encode this information in the rb_node, similar to
normal timers.

Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
432569bb9d9d424d7ffe5b21f8205c55bdd1aaa8 26-Mar-2006 Roman Zippel <zippel@linux-m68k.org> [PATCH] hrtimers: simplify nanosleep

nanosleep is the only user of the expired state, so let it manage this itself,
which makes the hrtimer code a bit simpler. The remaining time is also only
calculated if requested.

Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
44f21475511bbc0135b52c66ad74dcc6a9026da3 26-Mar-2006 Roman Zippel <zippel@linux-m68k.org> [PATCH] hrtimers: pass current time to hrtimer_forward()

Pass the current time to hrtimer_forward(). This allows the softirq time in
the timer base to be used when the forward function is called from the timer
callback. Other places pass the current time with a call to
timer->base->get_time().

Signed-off-by: Roman Zippel <zippel@linux-m68k.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
92127c7a45d4d167d9b015a5f9de6b41ed66f1d0 26-Mar-2006 Thomas Gleixner <tglx@linutronix.de> [PATCH] hrtimers: optimize softirq runqueues

The hrtimer softirq is called from the timer softirq every tick. Retrieve the
current time from xtime and wall_to_monotonic instead of calling
base->get_time() for each timer base. Store the time in the base structure,
and provide a hook for when clock source abstractions are in place, keeping
the code open for new base clocks.

Based on a patch from: Roman Zippel <zippel@linux-m68k.org>

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
69239749e1ac4f3496906aa4267cb9f61ce52c9c 07-Mar-2006 Tony Lindgren <tony@atomide.com> [PATCH] fix next_timer_interrupt() for hrtimer

Also from Thomas Gleixner <tglx@linutronix.de>

The function next_timer_interrupt() was broken by a recent patch,
6ba1b91213e81aa92b5cf7539f7d2a94ff54947c, which moved sys_nanosleep() to
hrtimers: next_timer_interrupt() did not check the hrtimer tree for the next
event.

The function next_timer_interrupt() is needed with dyntick (CONFIG_NO_IDLE_HZ,
VST) implementations, as the system can be idle when the next hrtimer event
is supposed to happen. At least ARM and S390 currently use
next_timer_interrupt().

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
7978672c4d9a1e6a6081de3a9d9ba5e5b24904a0 01-Feb-2006 George Anzinger <george@wildturkeyranch.net> [PATCH] hrtimers: cleanups and simplifications

Clean up the interface to hrtimers by changing the init code to pass the mode
as well as the clock. This allows the init code to select the correct base and
eliminates extra timer re-init code in posix-timers. We also simplify the
restart interface that nanosleep uses.

Signed-off-by: George Anzinger <george@mvista.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
ff60a5dc4fa584d47022d2533bc5c53b80096fb5 01-Feb-2006 akpm@osdl.org <akpm@osdl.org> [PATCH] hrtimers: fix posix-timer requeue race

From: Steven Rostedt <rostedt@goodmis.org>

CPU0 expires a posix-timer and runs the callback function. The signal is
queued.

After releasing the posix-timer lock and before returning to hrtimer_run_queue,
CPU0 gets interrupted. CPU1 delivers the queued signal and rearms the timer.
CPU0 comes back to hrtimer_run_queue and sets the timer state to expired.

The next modification of the timer can result in an oops, because the state
information is wrong.

Keep track of state = RUNNING and check in the return path of
hrtimer_run_queue whether the state has been changed. If it has, ignore a
restart request and do not touch the state variable.
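The fix can be sketched as a small state model, under the assumption that the lock is dropped around the callback as described above. All names (struct ptimer, the M_* values, model_run_timer()) are hypothetical; the point is only that the expiry path acts on the callback's return value when the state it set is still RUNNING.

```c
/* Hypothetical model states and restart codes. */
enum model_state { M_INACTIVE, M_PENDING, M_RUNNING, M_EXPIRED };
enum model_restart { M_NORESTART, M_RESTART };

struct ptimer {
    enum model_state state;
    enum model_restart (*fn)(struct ptimer *);
};

/* Expiry path: mark the timer RUNNING, run the callback (the lock
 * would be dropped here), then act on its return value only if
 * nobody else changed the state in the meantime. */
static void model_run_timer(struct ptimer *t)
{
    enum model_restart r;

    t->state = M_RUNNING;
    r = t->fn(t);               /* another CPU may re-arm t here */
    if (t->state != M_RUNNING)
        return;                 /* state changed: leave it alone */
    t->state = (r == M_RESTART) ? M_PENDING : M_EXPIRED;
}
```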

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
c9db4fa11526affde83603fe52595bd1260c1354 12-Jan-2006 Thomas Gleixner <tglx@linutronix.de> [hrtimer] Enforce resolution as lower limit of intervals

Roman Zippel pointed out that the missing lower limit of intervals
leads to an accounting error in the overrun count. Enforce the lower
limit of intervals to resolution in the timer forwarding code.
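A sketch of why the clamp matters for the overrun count, using a hypothetical forward_clamped() helper over nanosecond integers (not the kernel code). If a timer with a 1 us interval can only be serviced at, say, an assumed 4 ms resolution, forwarding by the raw interval counts thousands of overruns per service, while clamping the interval to the resolution counts one.

```c
#include <stdint.h>

/* Hypothetical helper: forward '*expires' past 'now', never stepping
 * by less than the clock resolution 'res'. Returns the overrun
 * count, or 0 if the timer has not expired yet. */
uint64_t forward_clamped(int64_t *expires, int64_t now,
                         int64_t interval, int64_t res)
{
    int64_t delta = now - *expires;
    uint64_t orun;

    if (interval < res)
        interval = res;         /* lower limit: the resolution */
    if (delta < 0)
        return 0;
    orun = (uint64_t)(delta / interval) + 1;
    *expires += (int64_t)orun * interval;
    return orun;
}
```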

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
e2787630c1abb075c935cf47e91beb7c656f48c4 12-Jan-2006 Thomas Gleixner <tglx@linutronix.de> [hrtimer] Change resolution storage to ktime_t format

Change the storage format of the per base resolution to ktime_t to
make it more easily accessible in the hrtimers code.

Change the resolution from (NSEC_PER_SEC/HZ) to TICK_NSEC as Roman
pointed out. TICK_NSEC is closer to the real resolution.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
288867ec5c377db82933b64460ce050e5c998ee9 12-Jan-2006 Thomas Gleixner <tglx@linutronix.de> [hrtimer] Remove listhead from hrtimer struct

The list_head in the hrtimer structure was introduced for easy access
to the first timer with the further extensions of real high resolution
timers in mind, but it turned out in the course of development that
it is not necessary for the standard use case. Remove the list head
and access the first expiry timer through a data field in the timer base.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
becf8b5d00f4b47e847f98322cdaf8cd16243861 10-Jan-2006 Thomas Gleixner <tglx@linutronix.de> [PATCH] hrtimer: convert posix timers completely

- convert posix-timers.c to use hrtimers

- remove the now obsolete abslist code

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
10c94ec16dd187f8d8dfdbb088e98330c05bf03c 10-Jan-2006 Thomas Gleixner <tglx@linutronix.de> [PATCH] hrtimer: create hrtimer nanosleep API

Introduce the hrtimer_nanosleep() and hrtimer_nanosleep_real() APIs. Not yet
used by any code.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
c0a3132963db68f1fbbd0e316b73de100fee3f08 10-Jan-2006 Thomas Gleixner <tglx@linutronix.de> [PATCH] hrtimer: hrtimer core code

hrtimer subsystem core. It is initialized at bootup and expired by the timer
interrupt, but is otherwise not utilized by any other subsystem yet.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>