History log of /drivers/cpufreq/cpufreq_ondemand.c
Revision Date Author Comments
fd0ef7a0583b9af3efeb7b1f965ea80b5ff70cdf 29-Feb-2012 MyungJoo Ham <myungjoo.ham@samsung.com> [CPUFREQ] CPUfreq ondemand: update sampling rate without waiting for next sampling

When a new sampling rate is shorter than the current one, (e.g., 1 sec
--> 10 ms) regardless how short the new one is, the current ondemand
mechanism wait for the previously set timer to be expired.

For example, if the user has just expressed that the sampling rate
should be 10 ms from now and the previous was 1000 ms, the new rate may
become effective 999 ms later, which could be not acceptable for the
user if the user has intended to speed up sampling because the system is
expected to react to CPU load fluctuation quickly from __now__.

In order to address this issue, we need to cancel the previously set
timer (schedule_delayed_work) and reset the timer if resetting timer is
expected to trigger the delayed_work ealier.

Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Signed-off-by: Dave Jones <davej@redhat.com>
648616343cdbe904c585a6c12e323d3b3c72e46f 15-Dec-2011 Martin Schwidefsky <schwidefsky@de.ibm.com> [S390] cputime: add sparse checking and cleanup

Make cputime_t and cputime64_t nocast to enable sparse checking to
detect incorrect use of cputime. Drop the cputime macros for simple
scalar operations. The conversion macros are still needed.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
21f2e3c86b3746aaa462f9a2734363f4f41a641c 09-Dec-2011 Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> [CPUFREQ] Remove wall variable from cpufreq_gov_dbs_init()

CPUFREQ Remove wall variable from cpufreq_gov_dbs_init()

Remove wall variable from cpufreq_gov_dbs_init() as
get_cpu_idle_time_us() no longer updates the last_update_time
unconditionally. Passing non-NULL last_update_time address
will result in accounting additional idle time with
update_ts_time_stats() before returning idle_sleeptime.

Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
Signed-off-by: Dave Jones <davej@redhat.com>
--
drivers/cpufreq/cpufreq_ondemand.c | 3 +--
1 files changed, 1 insertions(+), 2 deletions(-)
3292beb340c76884427faa1f5d6085719477d889 28-Nov-2011 Glauber Costa <glommer@parallels.com> sched/accounting: Change cpustat fields to an array

This patch changes fields in cpustat from a structure, to an
u64 array. Math gets easier, and the code is more flexible.

Signed-off-by: Glauber Costa <glommer@parallels.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul Tuner <pjt@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1322498719-2255-2-git-send-email-glommer@parallels.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
6beea0cda8ce71c01354e688e5735c47e331e84f 24-Aug-2011 Michal Hocko <mhocko@suse.cz> nohz: Fix update_ts_time_stat idle accounting

update_ts_time_stat currently updates idle time even if we are in
iowait loop at the moment. The only real users of the idle counter
(via get_cpu_idle_time_us) are CPU governors and they expect to get
cumulative time for both idle and iowait times.
The value (idle_sleeptime) is also printed to userspace by print_cpu
but it prints both idle and iowait times so the idle part is misleading.

Let's clean this up and fix update_ts_time_stat to account both counters
properly and update consumers of idle to consider iowait time as well.
If we do this we might use get_cpu_{idle,iowait}_time_us from other
contexts as well and we will get expected values.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
Cc: Dave Jones <davej@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Link: http://lkml.kernel.org/r/e9c909c221a8da402c4da07e4cd968c3218f8eb1.1314172057.git.mhocko@suse.cz
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
bd74b32b77e0e4af47651da852cb3ced84e423b3 06-Aug-2011 Paul Bolle <pebolle@tiscali.nl> Fix documentation and comment typo 'no_hz'

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Acked-by: Len Brown <len.brown@intel.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
326c86deaed54ad1b364fcafe5073f563671eb58 03-Mar-2011 Thomas Renninger <trenn@suse.de> [CPUFREQ] Remove unneeded locks

There cannot be any concurrent access to these through
different cpu sysfs files anymore, because these tunables
are now all global (not per cpu).

I still have some doubts whether some of these locks
were needed at all. Anyway, let's get rid of them.

Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>
CC: cpufreq@vger.kernel.org
e8951251b89440644a39f2512b4f265973926b41 03-Mar-2011 Thomas Renninger <trenn@suse.de> [CPUFREQ] Remove old, deprecated per cpu ondemand/conservative sysfs files

Marked deprecated for quite a whilte now...

Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>
CC: cpufreq@vger.kernel.org
ef598549b28014ec2ea7574d4e793728e0e33d02 03-Mar-2011 Thomas Renninger <trenn@suse.de> [CPUFREQ] Remove deprecated sysfs file sampling_rate_max

Marked deprecated for quite a while now...

Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>
CC: cpufreq@vger.kernel.org
5cb2c3bd0c5e0f3ced63f250ec2ad59d7c5c626a 07-Feb-2011 Vincent Guittot <vincent.guittot@linaro.org> [CPUFREQ] calculate delay after dbs_check_cpu

calculate ondemand delay after dbs_check_cpu call because it can
modify rate_mult value

use freq_lo_jiffies value for the sub sample period of powersave_bias mode

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Dave Jones <davej@redhat.com>
57df5573a56322e6895451f759c19e875252817d 26-Jan-2011 Tejun Heo <tj@kernel.org> cpufreq: use system_wq instead of dedicated workqueues

With cmwq, there's no reason for cpufreq drivers to use separate
workqueues. Remove the dedicated workqueues from cpufreq_conservative
and cpufreq_ondemand and use system_wq instead. The work items are
already sync canceled on stop, so it's already guaranteed that no work
is running on module exit.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Dave Jones <davej@redhat.com>
Cc: cpufreq@vger.kernel.org
3f78a9f7fcee0e9b44a15f39ac382664e301fad5 06-Oct-2010 David C Niemi <dniemi@verisign.com> [CPUFREQ] add sampling_down_factor tunable to improve ondemand performance

Adds a new global tunable, sampling_down_factor. Set to 1 it makes no
changes from existing behavior, but set to greater than 1 (e.g. 100)
it acts as a multiplier for the scheduling interval for reevaluating
load when the CPU is at its top speed due to high load. This improves
performance by reducing the overhead of load evaluation and helping
the CPU stay at its top speed when truly busy, rather than shifting
back and forth in speed. This tunable has no effect on behavior at
lower speeds/lower CPU loads.

This patch is against 2.6.36-rc6.

This patch should help solve kernel bug 19672 "ondemand is slow".

Signed-off-by: David Niemi <dniemi@verisign.com>
Acked-by: Venkatesh Pallipadi <venki@google.com>
CC: Daniel Hollocher <danielhollocher@gmail.com>
CC: <cpufreq-list@vger.kernel.org>
CC: <linux-kernel@vger.kernel.org>
Signed-off-by: Dave Jones <davej@redhat.com>
a665df9d510bfd5bac5664f436411f921471264a 11-Mar-2010 Jocelyn Falempe <jocelyn.falempe@motorola.com> [CPUFREQ] ondemand: don't synchronize sample rate unless multiple cpus present

For UP systems this is not required, and results in a more consistent
sample interval.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Jocelyn Falempe <jocelyn.falempe@motorola.com>
Signed-off-by: Mike Chan <mike@android.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Jones <davej@redhat.com>
00e299fff3cc2745847b03eebcc9e9362db9366d 27-Jan-2010 Mike Chan <mike@android.com> [CPUFREQ] ondemand: Refactor frequency increase code

Make simpler to read and call.

*** v3 - Always call when powersave_bias is enabled.

Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Mike Chan <mike@android.com>
Signed-off-by: Dave Jones <davej@redhat.com>
19379b11819efc1fc3b602e64f7e7531050aaddb 09-May-2010 Arjan van de Ven <arjan@linux.intel.com> ondemand: Make the iowait-is-busy time a sysfs tunable

Pavel Machek pointed out that not all CPUs have an efficient
idle at high frequency. Specifically, older Intel and various
AMD cpus would get a higher powerusage when copying files from
USB.

Mike Chan pointed out that the same is true for various ARM
chips as well.

Thomas Renninger suggested to make this a sysfs tunable with a
reasonable default.

This patch adds a sysfs tunable for the new behavior, and uses
a very simple function to determine a reasonable default,
depending on the CPU vendor/type.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: davej@redhat.com
LKML-Reference: <20100509082651.46914d04@infradead.org>
[ minor tidyup ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
6b8fcd9029f217a9ecce822db645e19111c11080 09-May-2010 Arjan van de Ven <arjan@linux.intel.com> ondemand: Solve a big performance issue by counting IOWAIT time as busy

The ondemand cpufreq governor uses CPU busy time (e.g. not-idle
time) as a measure for scaling the CPU frequency up or down.
If the CPU is busy, the CPU frequency scales up, if it's idle,
the CPU frequency scales down. Effectively, it uses the CPU busy
time as proxy variable for the more nebulous "how critical is
performance right now" question.

This algorithm falls flat on its face in the light of workloads
where you're alternatingly disk and CPU bound, such as the ever
popular "git grep", but also things like startup of programs and
maildir using email clients... much to the chagarin of Andrew
Morton.

This patch changes the ondemand algorithm to count iowait time
as busy, not idle, time. As shown in the breakdown cases above,
iowait is performance critical often, and by counting iowait,
the proxy variable becomes a more accurate representation of the
"how critical is performance" question.

The problem and fix are both verified with the "perf timechar"
tool.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Jones <davej@redhat.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20100509082606.3d9f00d0@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
6dad2a29646ce3792c40cfc52d77e9b65a7bb143 31-Mar-2010 Borislav Petkov <borislav.petkov@amd.com> cpufreq: Unify sysfs attribute definition macros

Multiple modules used to define those which are with identical
functionality and were needlessly replicated among the different cpufreq
drivers. Push them into the header and remove duplication.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <1270065406-1814-7-git-send-email-bp@amd64.org>
Reviewed-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
1dbf58881f307e21a3df4b990a5bea401360d02e 21-Dec-2009 Nagananda.Chumbalkar@hp.com <Nagananda.Chumbalkar@hp.com> [CPUFREQ] Fix ondemand to not request targets outside policy limits

Dominik said:
target_freq cannot be below policy->min or above policy->max.
If it were, the whole cpufreq subsystem is broken.

But (answer):
I think the "ondemand" governor can ask for a target frequency that is
below policy->min.
...
A patch such as below may be needed to sanitize the target frequency
requested by "ondemand". The "conservative" governor already has this check:

Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>

# diff -bur x/drivers/cpufreq/cpufreq_ondemand.c.orig y/drivers/cpufreq/cpufreq_ondemand.c
54c9a35d9faef06e00e2a941eb8fe674f1886901 12-Nov-2009 Pallipadi, Venkatesh <venkatesh.pallipadi@intel.com> [CPUFREQ] Resolve time unit thinko in ondemand/conservative govs

ondemand and conservative governors are messing up time units in the
code path where NO_HZ is not enabled and ignore_nice is set. The walltime
idletime stored is in jiffies and nice time calculation is happening in
microseconds.

The problem was reported and diagnosed by Alexander here.
http://marc.info/?l=linux-kernel&m=125752550404513&w=2

The patch below fixes this thinko.

Reported-by: Alexander Miller <Miller@fmi.uni-stuttgart.de>
Tested-by: Alexander Miller <Miller@fmi.uni-stuttgart.de>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
0e625ac153126a0a62b7635fa9dc91f87ff39e38 24-Jul-2009 Thomas Renninger <trenn@suse.de> [CPUFREQ] ondemand - Use global sysfs dir for tuning settings

Ondemand has only global variables for userspace tunings via sysfs.
But they were exposed per CPU which wrongly implies to the user that
his settings are applied per cpu. Also locking sysfs against concurrent
access won't be necessary anymore after deprecation time.

This means the ondemand config dir is moved:
/sys/devices/system/cpu/cpu*/cpufreq/ondemand ->
/sys/devices/system/cpu/cpufreq/ondemand

The old files will still exist, but reading or writing to them will
result in one (printk_once) deprecation msg to syslog per file.

Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>
5a75c82828e7c088ca6e7b4827911dc29cc8e774 03-Jul-2009 venkatesh.pallipadi@intel.com <venkatesh.pallipadi@intel.com> [CPUFREQ] Cleanup locking in ondemand governor

Redesign the locking inside ondemand driver. Make dbs_mutex handle all the
global state changes inside the driver and invent a new percpu mutex
to serialize percpu timer and frequency limit change.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
7d26e2d5e2da37e92c6c7644b26b294dedd8c982 03-Jul-2009 venkatesh.pallipadi@intel.com <venkatesh.pallipadi@intel.com> [CPUFREQ] Eliminate the recent lockdep warnings in cpufreq

Commit b14893a62c73af0eca414cfed505b8c09efc613c although it was very
much needed to properly cleanup ondemand timer, opened-up a can of worms
related to locking dependencies in cpufreq.

Patch here defines the need for dbs_mutex and cleans up its usage in
ondemand governor. This also resolves the lockdep warnings reported here

http://lkml.indiana.edu/hypermail/linux/kernel/0906.1/01925.html
http://lkml.indiana.edu/hypermail/linux/kernel/0907.0/00820.html

and few others..

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
245b2e70eabd797932adb263a65da0bab3711753 24-Jun-2009 Tejun Heo <tj@kernel.org> percpu: clean up percpu variable definitions

Percpu variable definition is about to be updated such that all percpu
symbols including the static ones must be unique. Update percpu
variable definitions accordingly.

* as,cfq: rename ioc_count uniquely

* cpufreq: rename cpu_dbs_info uniquely

* xen: move nesting_count out of xen_evtchn_do_upcall() and rename it

* mm: move ratelimits out of balance_dirty_pages_ratelimited_nr() and
rename it

* ipv4,6: rename cookie_scratch uniquely

* x86 perf_counter: rename prev_left to pmc_prev_left, irq_entry to
pmc_irq_entry and nmi_entry to pmc_nmi_entry

* perf_counter: rename disable_count to perf_disable_count

* ftrace: rename test_event_disable to ftrace_test_event_disable

* kmemleak: rename test_pointer to kmemleak_test_pointer

* mce: rename next_interval to mce_next_interval

[ Impact: percpu usage cleanups, no duplicate static percpu var names ]

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Dave Jones <davej@redhat.com>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: linux-mm <linux-mm@kvack.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Andi Kleen <andi@firstfloor.org>
4f4d1ad6ee69027f51f9d137f7e7d3c863cbc53d 22-Apr-2009 Thomas Renninger <trenn@suse.de> [CPUFREQ] Only set sampling_rate_max deprecated, sampling_rate_min is useful

Update the documentation accordingly.
Cleanup and use printk_once.

Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>
cef9615a853ebc4972084f7e70b52892557420ac 22-Apr-2009 Thomas Renninger <trenn@suse.de> [CPUFREQ] ondemand: Uncouple minimal sampling rate from HZ in NO_HZ case

With this patch you have following minimal sampling rate restrictions:

Kernel restrictions:
If CONFIG_NO_HZ is set, the limit is 10ms fixed.
If CONFIG_NO_HZ is not set or no_hz=off boot parameter is used, the
limits depend on the CONFIG_HZ option:
HZ=1000: min=20000us (20ms)
HZ=250: min=80000us (80ms)
HZ=100: min=200000us (200ms)

HW restrictions:
Do not sample/poll more often than HW latency * 100 exported by the low
level cpufreq HW driver

The higher value of above restrictions is the minimal sampling rate
that can be set (and can be seen via ondemand/sampling_rate_min sysfs file)

Default sampling rate still is HW latency * 1000, but this will now end
up in lower values on latest (Intel and AMD) hardware as these can switch
really fast and sampling rate mostly was limited to the 80ms or 200ms
(depending on whether HZ=250 or HZ=1000 is used).

Signed-off-by: Thomas Renninger <trenn@suse.de>
Cc: Pallipadi Venkatesh <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
b14893a62c73af0eca414cfed505b8c09efc613c 17-May-2009 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> [CPUFREQ] fix timer teardown in ondemand governor

* Rafael J. Wysocki (rjw@sisk.pl) wrote:
> This message has been generated automatically as a part of a report
> of regressions introduced between 2.6.28 and 2.6.29.
>
> The following bug entry is on the current list of known regressions
> introduced between 2.6.28 and 2.6.29. Please verify if it still should
> be listed and let me know (either way).
>
>
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13186
> Subject : cpufreq timer teardown problem
> Submitter : Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> Date : 2009-04-23 14:00 (24 days old)
> References : http://marc.info/?l=linux-kernel&m=124049523515036&w=4
> Handled-By : Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
> Patch : http://patchwork.kernel.org/patch/19754/
> http://patchwork.kernel.org/patch/19753/
>

(updated changelog)

cpufreq fix timer teardown in ondemand governor

The problem is that dbs_timer_exit() uses cancel_delayed_work() when it should
use cancel_delayed_work_sync(). cancel_delayed_work() does not wait for the
workqueue handler to exit.

The ondemand governor does not seem to be affected because the
"if (!dbs_info->enable)" check at the beginning of the workqueue handler returns
immediately without rescheduling the work. The conservative governor in
2.6.30-rc has the same check as the ondemand governor, which makes things
usually run smoothly. However, if the governor is quickly stopped and then
started, this could lead to the following race :

dbs_enable could be reenabled and multiple do_dbs_timer handlers would run.
This is why a synchronized teardown is required.

The following patch applies to, at least, 2.6.28.x, 2.6.29.1, 2.6.30-rc2.

Depends on patch
cpufreq: remove rwsem lock from CPUFREQ_GOV_STOP call

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: gregkh@suse.de
CC: stable@kernel.org
CC: cpufreq@vger.kernel.org
CC: Ingo Molnar <mingo@elte.hu>
CC: rjw@sisk.pl
CC: Ben Slusky <sluskyb@paranoiacs.org>
Signed-off-by: Dave Jones <davej@redhat.com>
112124ab0a9f507a0d7fdbb1e1ed2b9a24f8c4ea 04-Feb-2009 Thomas Renninger <trenn@suse.de> [CPUFREQ] ondemand/conservative: sanitize sampling_rate restrictions

Limit sampling rate to transition_latency * 100 or kernel limits.
If sampling_rate is tried to be set too low, set the lowest allowed value.

Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>
9411b4ef7fcb534fe1582fe02738254e398dd931 04-Feb-2009 Thomas Renninger <trenn@suse.de> [CPUFREQ] ondemand/conservative: deprecate sampling_rate{min,max}

The same info can be obtained via the transition_latency sysfs file

Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>
2b03f891ad3804dd3fa4dadfd33e5dcb200389c5 18-Jan-2009 Dave Jones <davej@redhat.com> [CPUFREQ] checkpatch cleanups for ondemand governor.

Signed-off-by: Dave Jones <davej@redhat.com>
1ca3abdb6a4b87246b00292f048acd344325fd12 23-Jan-2009 Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> [CPUFREQ] Make ignore_nice_load setting of ondemand work as expected.

ondemand micro-accounting of idle time changes broke ignore_nice_load
sysfs setting due to a thinko in the code.

The bug entry:
http://bugzilla.kernel.org/show_bug.cgi?id=12310

Reported-by: Jim Bray <jimsantelmo@gmail.com>

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
835481d9bcd65720b473db6b38746a74a3964218 04-Jan-2009 Rusty Russell <rusty@rustcorp.com.au> cpumask: convert struct cpufreq_policy to cpumask_var_t

Impact: use new cpumask API to reduce memory usage

This is part of an effort to reduce structure sizes for machines
configured with large NR_CPUS. cpumask_t gets replaced by
cpumask_var_t, which is either struct cpumask[1] (small NR_CPUS) or
struct cpumask * (large NR_CPUS).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Dave Jones <davej@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
4f6e6b9f97b0ce98a8d1da65adbaf743bd0486a9 18-Sep-2008 Andrea Righi <righi.andrea@gmail.com> [CPUFREQ] Fix BUG: using smp_processor_id() in preemptible code

Use get_cpu()/put_cpu() in cpufreq_ondemand init routine, instead of
smp_processor_id() to avoid the following BUG:

[ 35.313118] BUG: using smp_processor_id() in preemptible [00000000] code=: modprobe/4952
[ 35.313132] caller is cpufreq_gov_dbs_init+0xa/0x8f [cpufreq_ondemand]
[ 35.313140] Pid: 4952, comm: modprobe Not tainted 2.6.27-rc5-mm1 #23
[ 35.313145] Call Trace:
[ 35.313158] [<ffffffff80361ff7>] debug_smp_processor_id+0xd7/0xe0
[ 35.313167] [<ffffffffa010800a>] cpufreq_gov_dbs_init+0xa/0x8f [cpufreq_ondemand]
[ 35.313176] [<ffffffff8020903b>] _stext+0x3b/0x160
[ 35.313185] [<ffffffff804768c5>] __mutex_unlock_slowpath+0xe5/0x190
[ 35.313195] [<ffffffff8026236a>] trace_hardirqs_on_caller+0xca/0x140
[ 35.313205] [<ffffffff8026ef4c>] sys_init_module+0xdc/0x210
[ 35.313212] [<ffffffff8020b7cb>] system_call_fastpath+0x16/0x1b

Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
Signed-off-by: Dave Jones <davej@redhat.com>
c4d14bc0bb5d13e316890651ae4518b764c3344c 20-Sep-2008 Sven Wegener <sven.wegener@stealer.net> [CPUFREQ] Don't export governors for default governor

We don't need to export the governors for use as the default governor,
because the default governor will be built-in anyway and we can access
the symbol directly.

This also fixes the following sparse warnings:

drivers/cpufreq/cpufreq_conservative.c:578:25: warning: symbol 'cpufreq_gov_conservative' was not declared. Should it be static?
drivers/cpufreq/cpufreq_ondemand.c:582:25: warning: symbol 'cpufreq_gov_ondemand' was not declared. Should it be static?
drivers/cpufreq/cpufreq_performance.c:39:25: warning: symbol 'cpufreq_gov_performance' was not declared. Should it be static?
drivers/cpufreq/cpufreq_powersave.c:38:25: warning: symbol 'cpufreq_gov_powersave' was not declared. Should it be static?
drivers/cpufreq/cpufreq_userspace.c:190:25: warning: symbol 'cpufreq_gov_userspace' was not declared. Should it be static?

Signed-off-by: Sven Wegener <sven.wegener@stealer.net>
Signed-off-by: Dave Jones <davej@redhat.com>
808009131046b62ac434dbc796c0fe8accaab415 04-Aug-2008 venkatesh.pallipadi@intel.com <venkatesh.pallipadi@intel.com> [CPUFREQ][6/6] cpufreq: Add idle microaccounting in ondemand governor

Use get_cpu_idle_time_us() to get micro-accounted idle information.
This enables ondemand to get more accurate idle and busy timings
than the jiffy based calculation. As a result, we can decrease
the ondemand safety gaurd band from 80-10 to 95-3.

Results in more aggressive power savings.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
e9d95bf7eb929b9ddc9af9f4327b76c77ed4c7d6 04-Aug-2008 venkatesh.pallipadi@intel.com <venkatesh.pallipadi@intel.com> [CPUFREQ][4/6] cpufreq_ondemand: Parameterize down differential

Use a parameter for down differential, instead of hardcoded 10%. Follow-on
patch changes the down-differential dynamically, based on whether
we are using idle micro-accounting or not.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
3430502d356284ff4f7782d75bb01a402fd3d45e 04-Aug-2008 venkatesh.pallipadi@intel.com <venkatesh.pallipadi@intel.com> [CPUFREQ][3/6] cpufreq: get_cpu_idle_time() changes in ondemand for idle-microaccounting

Preparatory changes for doing idle micro-accounting in ondemand governor.
get_cpu_idle_time() gets extra parameter and returns idle time and also the
wall time that corresponds to the idle time measurement.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
c43aa3bd99a67009a167430e80c5fde6f37288d8 04-Aug-2008 venkatesh.pallipadi@intel.com <venkatesh.pallipadi@intel.com> [CPUFREQ][2/6] cpufreq: Change load calculation in ondemand for software coordination

Change the load calculation algorithm in ondemand to work well with software
coordination of frequency across the dependent cpus.

Multiply individual CPU utilization with the average freq of that logical CPU
during the measurement interval (using getavg call). And find the max CPU
utilization number in terms of CPU freq. That number is then used to
get to the target freq for next sampling interval.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
bf0b90e357c883e8efd72954432efe652de74c76 04-Aug-2008 venkatesh.pallipadi@intel.com <venkatesh.pallipadi@intel.com> [CPUFREQ][1/6] cpufreq: Add cpu number parameter to __cpufreq_driver_getavg()

Add a cpu parameter to __cpufreq_driver_getavg(). This is needed for software
cpufreq coordination where policy->cpu may not be same as the CPU on which we
want to getavg frequency.

A follow-on patch will use this parameter to getavg freq from all cpus
in policy->cpus.

Change since last patch. Fix the offline/online and suspend/resume
oops reported by Youquan Song <youquan.song@intel.com>

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
888a794cacd8950ac6be701db20b62a4ab2ce90c 13-Jul-2008 Akinobu Mita <akinobu.mita@gmail.com> [CPUFREQ] add error handling for cpufreq_register_governor() error

Add error handling for cpufreq_register_governor() error

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: cpufreq@lists.linux.org.uk
Signed-off-by: Dave Jones <davej@redhat.com>
068b12772a64c2440ef2f64ac5d780688c06576f 12-May-2008 Mike Travis <travis@sgi.com> cpufreq: use performance variant for_each_cpu_mask_nr

Change references from for_each_cpu_mask to for_each_cpu_mask_nr
where appropriate

Reviewed-by: Paul Jackson <pj@sgi.com>
Reviewed-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Mike Travis <travis@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
6915719b36a97d28fab576c6fa2a20364b435fe6 18-Jan-2008 Johannes Weiner <hannes@saeurebad.de> cpufreq: Initialise default governor before use

When the cpufreq driver starts up at boot time, it calls into the default
governor which might not be initialised yet. This hurts when the
governor's worker function relies on memory that is not yet set up by its
init function.

This migrates all governors from module_init() to fs_initcall() when being
the default, as was already done in cpufreq_performance when it was the
only possible choice. The performance governor is always initialized early
because it might be used as fallback even when not being the default.

Fixes at least one actual oops where ondemand is the default governor and
cpufreq_governor_dbs() uses the uninitialised kondemand_wq work-queue
during boot-time.

Signed-off-by: Johannes Weiner <hannes@saeurebad.de>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
1c2562459faedc35927546cfa5273ec6c2884cce 02-Oct-2007 Thomas Renninger <trenn@suse.de> [CPUFREQ] allow ondemand and conservative cpufreq governors to be used as default

Depending on the transition latency of the HW for cpufreq switches, the
ondemand or conservative governor cannot be used with certain cpufreq
drivers. Still the ondemand should be the default governor on a wide range
of systems. This patch allows this and lets the governor fallback to the
performance governor at cpufreq driver load time, if the driver does not
support fast enough frequency switching.

Main benefit is that on e.g. installation or other systems without
userspace support a working dynamic cpufreq support can be achieved on most
systems by simply loading the cpufreq driver. This is especially essential
for recent x86(_64) laptop hardware which may rely on working dynamic
cpufreq OS support.

Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Bryan Wu <bryan.wu@analog.com>
Cc: Andi Kleen <ak@suse.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Jones <davej@redhat.com>
ea48761519bd40d7a881c587b5f3177664b2987e 20-Jun-2007 Venki Pallipadi <venkatesh.pallipadi@intel.com> [CPUFREQ] ondemand: fix tickless accounting and software coordination bug

With tickless kernel and software coordination os P-states, ondemand
can look at wrong idle statistics. This can happen when ondemand sampling
is happening on CPU 0 and due to software coordination sampling also looks at
utilization of CPU 1. If CPU 1 is in tickless state at that moment, its idle
statistics will not be uptodate and CPU 0 thinks CPU 1 is idle for less
amount of time than it actually is.

This can be resolved by looking at all the busy times of CPUs, which is
accurate, even with tickless, and use that to determine idle time in a
round about way (total time - busy time).

Thanks to Arjan for originally reporting the ondemand bug on
Lenovo T61.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
0af99b13c9f323e658b4f1d69a1ccae7d6f3f80a 20-Jun-2007 Venki Pallipadi <venkatesh.pallipadi@intel.com> [CPUFREQ] ondemand: add a check to avoid negative load calculation

Due to rounding and inexact jiffy accounting, idle_ticks can sometimes
be higher than total_ticks. Make sure those cases are handled as
zero load case.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
28287033e12463c8ff89f1ea8038783d0360391c 08-May-2007 Venki Pallipadi <venkatesh.pallipadi@intel.com> Add a new deferrable delayed work init

Add a new deferrable delayed work init. This can be used to schedule work
that are 'unimportant' when CPU is idle and can be called later, when CPU
eventually comes out of idle.

Use this init in cpufreq ondemand governor.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
48ac3271e52d23ee987da93f80d20f6bec8e6717 18-Feb-2007 Oleg Nesterov <oleg@tv-sign.ru> [CPUFREQ] cpufreq_ondemand.c: don't use _WORK_NAR

Looks like dbs_timer() is very careful wrt per_cpu(cpu_dbs_info),
and it doesn't need the help of WORK_STRUCT_NOAUTOREL.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Acked-By: David Howells <dhowells@redhat.com>
Signed-off-by: Dave Jones <davej@redhat.com>
c18a1483f478adbeb4cc7148db22c4a9c10aaee3 11-Feb-2007 Dave Jones <davej@redhat.com> [CPUFREQ] Whitespace fixup

Signed-off-by: Dave Jones <davej@redhat.com>
56463b78cdca8e9ff8cc1759bca0c0777a061d6b 06-Feb-2007 Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> [CPUFREQ] ondemand governor use new cpufreq rwsem locking in work callback

Eliminate flush_workqueue in cpufreq_governor(STOP) callpath. Using flush
there has a deadlock potential as in

http://uwsg.iu.edu/hypermail/linux/kernel/0611.3/1223.html

Also, cleanup the locking issues with do_dbs_timer delayed_work callback. As
it changes the CPU frequency using __cpufreq_target, it needs to have
policy_rwsem in write mode, which also protects it from hot plug.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Jones <davej@redhat.com>
529af7a14f04f92213bac371931a2b2b060c63fa 06-Feb-2007 Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> [CPUFREQ] ondemand governor restructure the work callback

Restructure the delayed_work callback in ondemand.

This eliminates the need for smp_processor_id in the callback function and
also helps in proper locking and avoiding flush_workqueue when stopping the
governor (done in subsequent patch).

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Jones <davej@redhat.com>
c120069779e3e35917c15393cf2847fa79811eb6 06-Feb-2007 Dave Jones <davej@redhat.com> [CPUFREQ] Remove hotplug cpu crap

The hotplug CPU locking in cpufreq is horrendous. No-one seems to care
enough to fix it, so just remove it so that the 99.9% of the real world
users of this code can use cpufreq without being bothered by warnings.

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Jones <davej@redhat.com>
c4028958b6ecad064b1a6303a6a5906d4fe48d73 22-Nov-2006 David Howells <dhowells@redhat.com> WorkStruct: make allyesconfig

Fix up for make allyesconfig.

Signed-Off-By: David Howells <dhowells@redhat.com>
e08f5f5bb5dfaaa28d69ffe37eb774533297657f 26-Oct-2006 Gautham R Shenoy <ego@in.ibm.com> [CPUFREQ] Fix coding style issues in cpufreq.

Clean up cpufreq subsystem to fix coding style issues and to improve
the readability.

Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Dave Jones <davej@redhat.com>
914f7c31b0bea0ccf3bf474d0b99d803f7985097 20-Oct-2006 Jeff Garzik <jeff@garzik.org> [CPUFREQ] handle sysfs errors

Signed-off-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Dave Jones <davej@redhat.com>
dfde5d62ed9b28b0bda676c16e8cb635df244ef2 03-Oct-2006 Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> [CPUFREQ][8/8] acpi-cpufreq: Add support for freq feedback from hardware

Enable ondemand governor and acpi-cpufreq to use IA32_APERF and IA32_MPERF MSR
to get active frequency feedback for the last sampling interval. This will
make ondemand take right frequency decisions when hardware coordination of
frequency is going on.

Without APERF/MPERF, ondemand can take wrong decision at times due
to underlying hardware coordination or TM2.
Example:
* CPU 0 and CPU 1 are hardware cooridnated.
* CPU 1 running at highest frequency.
* CPU 0 was running at highest freq. Now ondemand reduces it to
some intermediate frequency based on utilization.
* Due to underlying hardware coordination with other CPU 1, CPU 0 continues to
run at highest frequency (as long as other CPU is at highest).
* When ondemand samples CPU 0 again next time, without actual frequency
feedback from APERF/MPERF, it will think that previous frequency change
was successful and can go to wrong target frequency. This is because it
thinks that utilization it has got this sampling interval is when running at
intermediate frequency, rather than actual highest frequency.

More information about IA32_APERF IA32_MPERF MSR:
Refer to IA-32 Intel® Architecture Software Developer's Manual at
http://developer.intel.com

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
3906f4edeef976c081c4e7bd92164d2f59c325ae 05-Sep-2006 Dave Jones <davej@redhat.com> [CPUFREQ] Fix sparse warning in ondemand

drivers/cpufreq/cpufreq_ondemand.c:323:2: warning: Using plain integer as NULL pointer

Signed-off-by: Dave Jones <davej@redhat.com>
b5ecf60fe6b18de0bc59d336d444835d4ef835ed 13-Aug-2006 Adrian Bunk <bunk@stusta.de> [CPUFREQ] make drivers/cpufreq/cpufreq_ondemand.c:powersave_bias_target() static

This patch makes the needlessly global powersave_bias_target() static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Dave Jones <davej@redhat.com>
05ca0350e8caa91a5ec9961c585c98005b6934ea 31-Jul-2006 Alexey Starikovskiy <alexey_y_starikovskiy@linux.intel.com> [CPUFREQ][2/2] ondemand: updated add powersave_bias tunable

ondemand selects the minimum frequency that can retire
a workload with negligible idle time -- ideally resulting in the highest
performance/power efficiency with negligible performance impact.

But on some systems and some workloads, this algorithm
is more performance biased than necessary, and
de-tuning it a bit to allow some performance impact
can save measurable power.

This patch adds a "powersave_bias" tunable to ondemand
to allow it to reduce its target frequency by a specified percent.

By default, the powersave_bias is 0 and has no effect.
powersave_bias is in units of 0.1%, so it has an effective range
of 1 through 1000, resulting in 0.1% to 100% impact.

In practice, users will not be able to detect a difference between
0.1% increments, but 1.0% increments turned out to be too large.
Also, the max value of 1000 (100%) would simply peg the system
in its deepest power saving P-state, unless the processor really has
a hardware P-state at 0Hz:-)

For example, If ondemand requests 2.0GHz based on utilization,
and powersave_bias=100, this code will knock 10% off the target
and seek a target of 1.8GHz instead of 2.0GHz until the
next sampling. If 1.8 is an exact match with an hardware frequency
we use it, otherwise we average our time between the frequency
next higher than 1.8 and next lower than 1.8.

Note that a user or administrative program can change powersave_bias
at run-time depending on how they expect the system to be used.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi at intel.com>
Signed-off-by: Alexey Starikovskiy <alexey.y.starikovskiy at intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
1ce28d6b19112a7c76af8e971e2de3109d19a943 31-Jul-2006 Alexey Starikovskiy <alexey_y_starikovskiy@linux.intel.com> [CPUFREQ][1/2] ondemand: updated tune for hardware coordination

Try to make dbs_check_cpu() call on all CPUs at the same jiffy.
This will help when multiple cores share P-states via Hardware Coordination.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi at intel.com>
Signed-off-by: Alexey Starikovskiy <alexey.y.starikovskiy at intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
153d7f3fcae7ed4e19328549aa9467acdfbced10 26-Jul-2006 Arjan van de Ven <arjan@linux.intel.com> [PATCH] Reorganize the cpufreq cpu hotplug locking to not be totally bizare

The patch below moves the cpu hotplugging higher up in the cpufreq
layering; this is needed to avoid recursive taking of the cpu hotplug
lock and to otherwise detangle the mess.

The new rules are:
1. you must do lock_cpu_hotplug() around the following functions:
__cpufreq_driver_target
__cpufreq_governor (for CPUFREQ_GOV_LIMITS operation only)
__cpufreq_set_policy
2. governer methods (.governer) must NOT take the lock_cpu_hotplug()
lock in any way; they are called with the lock taken already
3. if your governer spawns a thread that does things, like calling
__cpufreq_driver_target, your thread must honor rule #1.
4. the policy lock and other cpufreq internal locks nest within
the lock_cpu_hotplug() lock.

I'm not entirely happy about how the __cpufreq_governor rule ended up
(conditional locking rule depending on the argument) but basically all
callers pass this as a constant so it's not too horrible.

The patch also removes the cpufreq_governor() function since during the
locking audit it turned out to be entirely unused (so no need to fix it)

The patch works on my testbox, but it could use more testing
(otoh... it can't be much worse than the current code)

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2cd7cbdf4bd0d0fe58e4dc903e8b413412595504 23-Jul-2006 Linus Torvalds <torvalds@macmini.osdl.org> [cpufreq] ondemand: make shutdown sequence more robust

Shutting down the ondemand policy was fraught with potential
problems, causing issues for SMP suspend (which wants to hot-
unplug) all but the last CPU.

This should fix at least the worst problems (divide-by-zero
and infinite wait for the workqueue to shut down).

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
ffac80e925e54d84f6ea580231aa46d0ef051756 28-Jun-2006 Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> [CPUFREQ] Misc cleanups in ondemand.

Misc cleanups in ondemand. Should have zero functional impact.
Also adding Alexey as author.

Signed-off-by: Alexey Starikovskiy <alexey.y.starikovskiy@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2f8a835c705794f71726eb12c06fb0f24fe07ed3 28-Jun-2006 Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> [CPUFREQ] Make ondemand sampling per CPU and remove the mutex usage in sampling path.

Make ondemand sampling per CPU and remove the mutex usage in sampling path.

Signed-off-by: Alexey Starikovskiy <alexey.y.starikovskiy@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
ccb2fe209dac9ff67f6351e783e610073afaaeaf 28-Jun-2006 Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> [CPUFREQ] Remove slowdown from ondemand sampling path.

Remove slowdown from ondemand sampling path. This reduces the code path length
in dbs_check_cpu() by half. slowdown was not used by ondemand by default.
If there are any user level tools that were using this tunable, they
may report error now.

Signed-off-by: Alexey Starikovskiy <alexey.y.starikovskiy@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
138a0128c01899ee8492858a50856b90d0d9d815 23-Jun-2006 Andrew Morton <akpm@osdl.org> [PATCH] cpufreq build fix

drivers/cpufreq/cpufreq_ondemand.c: In function 'do_dbs_timer':
drivers/cpufreq/cpufreq_ondemand.c:374: warning: implicit declaration of function 'lock_cpu_hotplug'
drivers/cpufreq/cpufreq_ondemand.c:381: warning: implicit declaration of function 'unlock_cpu_hotplug'
drivers/cpufreq/cpufreq_conservative.c: In function 'do_dbs_timer':
drivers/cpufreq/cpufreq_conservative.c:425: warning: implicit declaration of function 'lock_cpu_hotplug'
drivers/cpufreq/cpufreq_conservative.c:432: warning: implicit declaration of function 'unlock_cpu_hotplug'

Cc: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
4ec223d02f4d5f5a3129edc0e3d22550d6ac8a32 22-Jun-2006 Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> [CPUFREQ] Fix ondemand vs suspend deadlock

Rootcaused the bug to a deadlock in cpufreq and ondemand. Due to non-existent
ordering between cpu_hotplug lock and dbs_mutex. Basically a race condition
between cpu_down() and do_dbs_timer().

cpu_down() flow:
* cpu_down() call for CPU 1
* Takes hot plug lock
* Calls pre down notifier
* cpufreq notifier handler calls cpufreq_driver_target() which takes
cpu_hotplug lock again. OK as cpu_hotplug lock is recursive in same
process context
* CPU 1 goes down
* Calls post down notifier
* cpufreq notifier handler calls ondemand event stop which takes dbs_mutex

So, cpu_hotplug lock is taken before dbs_mutex in this flow.

do_dbs_timer is triggerred by a periodic timer event.
It first takes dbs_mutex and then takes cpu_hotplug lock in
cpufreq_driver_target().
Note the reverse order here compared to above. So, if this timer event happens
at right moment during cpu_down, system will deadlok.

Attached patch fixes the issue for both ondemand and conservative.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
6810b548b25114607e0814612d84125abccc0a4f 08-May-2006 Andi Kleen <ak@suse.de> [PATCH] x86_64: Move ondemand timer into own work queue

Taking the cpu hotplug semaphore in a normal events workqueue
is unsafe because other tasks can wait for any workqueues with
it hold. This results in a deadlock.

Move the DBS timer into its own work queue which is not
affected by other work queue flushes to avoid this.

Has been acked by Venkatesh.

Cc: venkatesh.pallipadi@intel.com
Cc: cpufreq@lists.linux.org.uk
Signed-off-by: Andi Kleen <ak@suse.de>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
7c9d8c0e84d395a01289ebd1597758939a875a86 26-Mar-2006 Dominik Brodowski <linux@dominikbrodowski.net> [PATCH] cpufreq_ondemand: add range check

Assert that cpufreq_target is, at least, called with the minimum frequency
allowed by this policy, not something lower. It triggered problems on ARM.

Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
9cbad61b41f0b6f0a4c600fe96d8292ffd592b50 10-Mar-2006 Eric Piel <Eric.Piel@tremplin-utc.net> [PATCH] cpufreq_ondemand: keep ignore_nice_load value when it is reselected

Keep the value of ignore_nice_load of the ondemand governor even after
the governor has been deselected and selected back. This is the behavior
of the other exported values of the ondemand governor and it's much more
user-friendly.

Signed-off-by: Eric Piel <eric.piel@tremplin-utc.net>
Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
ff8c288d7d1a368b663058cdee1ea0adcdef2fa2 10-Mar-2006 Eric Piel <Eric.Piel@tremplin-utc.net> [PATCH] cpufreq_ondemand: Warn if it cannot run due to too long transition latency

Display a warning if the ondemand governor can not be selected due to a
transition latency of the cpufreq driver which is too long.

Signed-off-by: Eric Piel <eric.piel@tremplin-utc.net>
Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
32ee8c3e470d86588b51dc42ed01e85c5fa0f180 28-Feb-2006 Dave Jones <davej@redhat.com> [CPUFREQ] Lots of whitespace & CodingStyle cleanup.

Signed-off-by: Dave Jones <davej@redhat.com>
3fc54d37ab64733448faf0185e19a80f070eb9e3 14-Jan-2006 akpm@osdl.org <akpm@osdl.org> [CPUFREQ] Convert drivers/cpufreq semaphores to mutexes.

Semaphore to mutex conversion.

The conversion was generated via scripts, and the result was validated
automatically via a script as well.

Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Dave Jones <davej@redhat.com>
001893cda2f280ab882164737a0b608208524809 01-Dec-2005 Alexander Clouter <alex@digriz.org.uk> [PATCH] cpufreq_conservative/ondemand: invert meaning of 'ignore nice'

The use of the 'ignore_nice' sysfs file is confusing to anyone using it.
This removes the sysfs file 'ignore_nice' and in its place creates a
'ignore_nice_load' entry that defaults to '0'; meaning nice'd processes
_are_ counted towards the 'business' calculation.

WARNING: this obvious breaks any userland tools that expected ignore_nice'
to exist, to draw attention to this fact it was concluded on the mailing
list that the entry should be removed altogether so the userland app breaks
and so the author can build simple to detect workaround. Having said that
it seems currently very few tools even make use of this functionality; all
I could find was a Gentoo Wiki entry.

Signed-off-by: Alexander Clouter <alex-kernel@digriz.org.uk>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Dave Jones <davej@redhat.com>
df8b59be0976c56820453730078bef99a8d1dbda 20-Sep-2005 Dave Jones <davej@redhat.com> [CPUFREQ] Avoid the ondemand cpufreq governor to use a too high frequency for stats.

The problem is in the ondemand governor, there is a periodic measurement
of the CPU usage. This CPU usage is updated by the scheduler after every
tick (basically, by adding 1 either to "idle" or to "user" or to
"system"). So if the frequency of the governor is too high, the stat
will be meaningless (as mostly no number have changed).

So this patch checks that the measurements are separated by at least 10
ticks. It means that by default, stats will have about 5% error (20
ticks). Of course those numbers can be argued but, IMHO, they look sane.
The patch also includes a small clean-up to check more explictly the
result of the conversion from ns to µs being null.

Let's note that (on x86) this has never been really needed before 2.6.13
because HZ was always 1000. Now that HZ can be 100, some CPU might be
affected by this problem. For instance when HZ=100, the centrino ,which
has a 10µs transition latency, would lead to the governor allowing to
read stats every tick (10ms)!

Signed-off-by: Eric Piel <eric.piel@tremplin-utc.net>
Signed-off-by: Dave Jones <davej@redhat.com>
e131832ca7d3a3e5f9c7624bb310a7747dc2b57c 01-Jun-2005 Dave Jones <davej@redhat.com> [CPUFREQ] ondemand governor default sampling downfactor as 1

[PATCH] [5/5] ondemand governor default sampling downfactor as 1

Make default sampling downfactor 1.
This works better with earlier auto downscaling change in ondemand governor.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
c29f1403098135bdef75b190a5037db514701031 01-Jun-2005 Dave Jones <davej@redhat.com> [CPUFREQ] ondemand governor automatic downscaling

[PATCH] [4/5] ondemand governor automatic downscaling

Here is a change of policy for the ondemand governor. The modification
concerns the frequency downscaling. Instead of decreasing to a lower
frequency when the CPU usage is under 20%, this new policy automatically
scales to the optimal frequency. The optimal frequency being the lowest
frequency which provides enough power to not trigger the upscaling policy.

Signed-off-by: Eric Piel <eric.piel@tremplin-utc.net>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
9c7d269b9b05440dd0fe92d96f4e5d7e73dd7238 01-Jun-2005 Dave Jones <davej@redhat.com> [CPUFREQ] ondemand,conservative governor idle_tick clean-up

[PATCH] [3/5] ondemand,conservative governor idle_tick clean-up

Ondemand and conservative governor clean-up, it factorises the idle ticks
measurement.

Signed-off-by: Eric Piel <eric.piel@tremplin-utc.net>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
790d76fa979f55bfc49a6901bb911778949b582d 01-Jun-2005 Dave Jones <davej@redhat.com> [CPUFREQ] ondemand,conservative governor store the idle ticks for all cpus

[PATCH] [2/5] ondemand,conservative governor store the idle ticks for all cpus

Ondemand, conservative governor did not store prev_cpu_idle_up into
prev_cpu_idle_down for other CPUs than the current CPU.

Signed-off-by: Eric Piel <eric.piel@tremplin-utc.net>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
dac1c1a56279b4545a822ec7bc770003c233e546 01-Jun-2005 Dave Jones <davej@redhat.com> [CPUFREQ] ondemand,conservative minor bug-fix and cleanup

[PATCH] [1/5] ondemand,conservative minor bug-fix and cleanup

Attached patch fixes some minor issues with Alexander's patch and related
cleanup in both ondemand and conservative governor.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
1206aaac285904e3e3995eecbf4129b6555a8973 01-Jun-2005 Dave Jones <davej@redhat.com> [CPUFREQ] Allow ondemand stepping to be changed by user.

Adds support so that the cpufreq change stepping is no longer fixed at 5% and
can be changed dynamically by the user

Signed-off-by: Alexander Clouter <alex-kernel@digriz.org.uk>
Signed-off-by: Dave Jones <davej@redhat.com>
c11420a616039e2181e4ecbffb4d125d39e6877d 01-Jun-2005 Dave Jones <davej@redhat.com> [CPUFREQ] Prevents un-necessary cpufreq changes if we are already at min/max

Signed-off-by: Alexander Clouter <alex-kernel@digriz.org.uk>
Signed-off-by: Dave Jones <davej@redhat.com>
3d5ee9e55d13de28d2fa58d6e13f2e4d3a5f8b1a 01-Jun-2005 Dave Jones <davej@redhat.com> [CPUFREQ] Add support to cpufreq_ondemand to ignore 'nice' cpu time

Signed-off-by: Alexander Clouter <alex-kernel@digriz.org.uk>
Signed-off-by: Dave Jones <davej@redhat.com>
7f335d4ef2d50a693fad70b8fa053d0382f4a45c 01-Jun-2005 Dave Jones <davej@redhat.com> [CPUFREQ] make cpufreq_gov_dbs static

This patch makes a needlessly global and EXPORT_SYMBOL'ed struct static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Dave Jones <davej@redhat.com>
6fe711658fcf92d39d84c0b7e6332ed6625dc520 01-Jun-2005 Dave Jones <davej@redhat.com> [CPUFREQ] ondemand: trivial clean-ups

Trivial ondemand governor clean-ups:
- change from sampling_rate_in_HZ() to the official function
usecs_to_jiffies().
- use for_each_online_cpu() to instead of using "if (cpu_online(i))"

Signed-off-by: Eric Piel <eric.piel@tremplin-utc.net>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Dave Jones <davej@redhat.com>
1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 17-Apr-2005 Linus Torvalds <torvalds@ppc970.osdl.org> Linux-2.6.12-rc2

Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!