Cross Reference: /kernel/rcu/tree

History log of /kernel/rcu/tree_plugin.h
Revision	Date	Author	Comments
d7e29933969e5ca7c112ce1368a07911f4485dc2	27-Oct-2014	Paul E. McKenney <paulmck@linux.vnet.ibm.com>	rcu: Make rcu_barrier() understand about missing rcuo kthreads Commit 35ce7f29a44a (rcu: Create rcuo kthreads only for onlined CPUs) avoids creating rcuo kthreads for CPUs that never come online. This fixes a bug in many instances of firmware: Instead of lying about their age, these systems instead lie about the number of CPUs that they have. Before commit 35ce7f29a44a, this could result in huge numbers of useless rcuo kthreads being created. It appears that experience indicates that I should have told the people suffering from this problem to fix their broken firmware, but I instead produced what turned out to be a partial fix. The missing piece supplied by this commit makes sure that rcu_barrier() knows not to post callbacks for no-CBs CPUs that have not yet come online, because otherwise rcu_barrier() will hang on systems having firmware that lies about the number of CPUs. It is tempting to simply have rcu_barrier() refuse to post a callback on any no-CBs CPU that does not have an rcuo kthread. This unfortunately does not work because rcu_barrier() is required to wait for all pending callbacks. It is therefore required to wait even for those callbacks that cannot possibly be invoked. Even if doing so hangs the system. Given that posting a callback to a no-CBs CPU that does not yet have an rcuo kthread can hang rcu_barrier(), It is tempting to report an error in this case. Unfortunately, this will result in false positives at boot time, when it is perfectly legal to post callbacks to the boot CPU before the scheduler has started, in other words, before it is legal to invoke rcu_barrier(). So this commit instead has rcu_barrier() avoid posting callbacks to CPUs having neither rcuo kthread nor pending callbacks, and has it complain bitterly if it finds CPUs having no rcuo kthread but some pending callbacks. And when rcu_barrier() does find CPUs having no rcuo kthread but pending callbacks, as noted earlier, it has no choice but to hang indefinitely. Reported-by: Yanko Kaneti <yaneti@declera.com> Reported-by: Jay Vosburgh <jay.vosburgh@canonical.com> Reported-by: Meelis Roos <mroos@linux.ee> Reported-by: Eric B Munson <emunson@akamai.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Eric B Munson <emunson@akamai.com> Tested-by: Jay Vosburgh <jay.vosburgh@canonical.com> Tested-by: Yanko Kaneti <yaneti@declera.com> Tested-by: Kevin Fenzi <kevin@scrye.com> Tested-by: Meelis Roos <mroos@linux.ee>
dd56af42bd829c6e770ed69812bd65a04eaeb1e4	26-Aug-2014	Paul E. McKenney <paulmck@linux.vnet.ibm.com>	rcu: Eliminate deadlock between CPU hotplug and expedited grace periods Currently, the expedited grace-period primitives do get_online_cpus(). This greatly simplifies their implementation, but means that calls to them holding locks that are acquired by CPU-hotplug notifiers (to say nothing of calls to these primitives from CPU-hotplug notifiers) can deadlock. But this is starting to become inconvenient, as can be seen here: https://lkml.org/lkml/2014/8/5/754. The problem in this case is that some developers need to acquire a mutex from a CPU-hotplug notifier, but also need to hold it across a synchronize_rcu_expedited(). As noted above, this currently results in deadlock. This commit avoids the deadlock and retains the simplicity by creating a try_get_online_cpus(), which returns false if the get_online_cpus() reference count could not immediately be incremented. If a call to try_get_online_cpus() returns true, the expedited primitives operate as before. If a call returns false, the expedited primitives fall back to normal grace-period operations. This falling back of course results in increased grace-period latency, but only during times when CPU hotplug operations are actually in flight. The effect should therefore be negligible during normal operation. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Tested-by: Lan Tianyu <tianyu.lan@intel.com>
c847f14217d5aec5336272a54a32ffcf6e06ddcb	12-Aug-2014	Paul E. McKenney <paulmck@linux.vnet.ibm.com>	rcu: Avoid misordering in nocb_leader_wait() The NOCB follower wakeup ordering depends on the store to the tail pointer happening before the wakeup. However, because atomic_long_add() does not return a value, it does not provide ordering guarantees, and the locking in wake_up() only guarantees that the store will happen before the unlock, which might be too late. Even though this is only a theoretical issue, this commit adds a smp_mb__after_atomic() after the final atomic_long_add() to provide the needed ordering guarantee. Reported-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>
1772947bd0126661866069157e95197e9c0020e9	12-Aug-2014	Paul E. McKenney <paulmck@linux.vnet.ibm.com>	rcu: Handle NOCB callbacks from irq-disabled idle code If an RCU callback is queued on a no-CBs CPU from idle code with irqs disabled, and if that CPU stays idle forever after, the callback will never be invoked. This commit therefore adds a check for this situation in ____call_rcu_nocb(), invoking the RCU core solely for the purpose of the ensuing return-to-idle transition. (If the CPU doesn't return to idle, the next scheduling-clock interrupt will fix things up.) Reported-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>
39953dfd40077c7480b1d5deb4d617e086b1c865	12-Aug-2014	Paul E. McKenney <paulmck@linux.vnet.ibm.com>	rcu: Avoid misordering in __call_rcu_nocb_enqueue() The NOCB leader wakeup ordering depends on the store to the header happening before the check for the leader already being awake. However, because atomic_long_add() does not return a value, it does not provide ordering guarantees, the incorrect comment in wake_nocb_leader() notwithstanding. This commit therefore adds a smp_mb__after_atomic() after the final atomic_long_add() to provide the needed ordering guarantee. Reported-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>
663e131090dd10bac9dc0b4f5b624dd3211b20f6	21-Jul-2014	Paul E. McKenney <paulmck@linux.vnet.ibm.com>	rcu: Don't track sysidle state if no nohz_full= CPUs If there are no nohz_full= CPUs, then there is currently no reason to track sysidle state. This commit therefore short-circuits this state tracking if !tick_nohz_full_enabled(). Note that these checks will need to be revisited if nohz_full= state can ever be changed at runtime. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>
417e8d26557c4264a484d78a7491316751afa46f	21-Jul-2014	Paul E. McKenney <paulmck@linux.vnet.ibm.com>	rcu: Eliminate redundant rcu_sysidle_state variable Now that we have rcu_state_p, which references rcu_preempt_state for TREE_PREEMPT_RCU and rcu_sched_state for TREE_RCU, we don't need a separate rcu_sysidle_state variable. This commit therefore eliminates rcu_preempt_state in favor of rcu_state_p. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>
22c2f669611590b428647ac9a73bc63ef3989d4b	18-Jul-2014	Pranith Kumar <bobby.prani@gmail.com>	rcu: Check for have_rcu_nocb_mask instead of rcu_nocb_mask If we configure a kernel with CONFIG_NOCB_CPU=y, CONFIG_RCU_NOCB_CPU_NONE=y and CONFIG_CPUMASK_OFFSTACK=n and do not pass in a rcu_nocb= boot parameter, the cpumask rcu_nocb_mask can be garbage instead of NULL. Hence this commit replaces checks for rcu_nocb_mask == NULL with a check for have_rcu_nocb_mask. Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>
35ce7f29a44a888c45c0a9f202f69e10613c5306	11-Jul-2014	Paul E. McKenney <paulmck@linux.vnet.ibm.com>	rcu: Create rcuo kthreads only for onlined CPUs RCU currently uses for_each_possible_cpu() to spawn rcuo kthreads, which can result in more rcuo kthreads than one would expect, for example, derRichard reported 64 CPUs worth of rcuo kthreads on an 8-CPU image. This commit therefore creates rcuo kthreads only for those CPUs that actually come online. This was reported by derRichard on the OFTC IRC network. Reported-by: Richard Weinberger <richard@nod.at> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>
9386c0b75dda05f535a10ea1abf1817fe292c81c	13-Jul-2014	Paul E. McKenney <paulmck@linux.vnet.ibm.com>	rcu: Rationalize kthread spawning Currently, RCU spawns kthreads from several different early_initcall() functions. Although this has served RCU well for quite some time, as more kthreads are added a more deterministic approach is required. This commit therefore causes all of RCU's early-boot kthreads to be spawned from a single early_initcall() function. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>
f4aa84ba24872e3a8e59b58bc8533cae95597f2e	09-Jul-2014	Pranith Kumar <bobby.prani@gmail.com>	rcu: Return false instead of 0 in rcu_nocb_adopt_orphan_cbs() Return false instead of 0 in rcu_nocb_adopt_orphan_cbs() as this has bool as return type. Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>
4afc7e269befc7b6e09a994e48c67e36f4a378e1	09-Jul-2014	Pranith Kumar <bobby.prani@gmail.com>	rcu: Use false for return in __call_rcu_nocb() Return false instead of 0 in __call_rcu_nocb() as this has bool as return type. Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>
0a9e1e111b3a9e1c21d2dd27ca361cd9601d99af	09-Jul-2014	Pranith Kumar <bobby.prani@gmail.com>	rcu: Use true/false for return in rcu_nocb_adopt_orphan_cbs() Return true/false in rcu_nocb_adopt_orphan_cbs() instead of 0/1 as this function has return type of bool. Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>
c271d3a957384a162f7a6aae53455d8e8afd1f3e	09-Jul-2014	Pranith Kumar <bobby.prani@gmail.com>	rcu: Use true/false for return in __call_rcu_nocb() Return true/false instead of 0/1 in __call_rcu_nocb() as this returns a bool type. Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>
949cccdbe6d286544ce3fe170298183eb7ada81c	26-Jul-2014	Pranith Kumar <bobby.prani@gmail.com>	rcu: Check the return value of zalloc_cpumask_var() This commit checks the return value of the zalloc_cpumask_var() used for allocating cpumask for rcu_nocb_mask. Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>
f4579fc57cf4244057b713b1f73f4dc9f0b11e97	25-Jul-2014	Paul E. McKenney <paulmck@linux.vnet.ibm.com>	rcu: Fix attempt to avoid unsolicited offloading of callbacks Commit b58cc46c5f6b (rcu: Don't offload callbacks unless specifically requested) failed to adjust the callback lists of the CPUs that are known to be no-CBs CPUs only because they are also nohz_full= CPUs. This failure can result in callbacks that are posted during early boot getting stranded on nxtlist for CPUs whose no-CBs property becomes apparent late, and there can also be spurious warnings about offline CPUs posting callbacks. This commit fixes these problems by adding an early-boot rcu_init_nohz() that properly initializes the no-CBs CPUs. Note that kernels built with CONFIG_RCU_NOCB_CPU_ALL=y or with CONFIG_RCU_NOCB_CPU=n do not exhibit this bug. Neither do kernels booted without the nohz_full= boot parameter. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Pranith Kumar <bobby.prani@gmail.com> Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>
284a8c93af47306beed967a303d84730b32bab39	15-Aug-2014	Paul E. McKenney <paulmck@linux.vnet.ibm.com>	rcu: Per-CPU operation cleanups to rcu_*_qs() functions The rcu_bh_qs(), rcu_preempt_qs(), and rcu_sched_qs() functions use old-style per-CPU variable access and write to ->passed_quiesce even if it is already set. This commit therefore updates to use the new-style per-CPU variable access functions and avoids the spurious writes. This commit also eliminates the "cpu" argument to these functions because they are always invoked on the indicated CPU. Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
1d082fd061884a587c490c4fc8a2056ce1e47624	15-Aug-2014	Paul E. McKenney <paulmck@linux.vnet.ibm.com>	rcu: Remove local_irq_disable() in rcu_preempt_note_context_switch() The rcu_preempt_note_context_switch() function is on a scheduling fast path, so it would be good to avoid disabling irqs. The reason that irqs are disabled is to synchronize process-level and irq-handler access to the task_struct ->rcu_read_unlock_special bitmask. This commit therefore makes ->rcu_read_unlock_special instead be a union of bools with a short allowing single-access checks in RCU's __rcu_read_unlock(). This results in the process-level and irq-handler accesses being simple loads and stores, so that irqs need no longer be disabled. This commit therefore removes the irq disabling from rcu_preempt_note_context_switch(). Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
176f8f7a52cc6d09d686f0d900abda6942a52fbb	05-Aug-2014	Paul E. McKenney <paulmck@linux.vnet.ibm.com>	rcu: Make TASKS_RCU handle nohz_full= CPUs Currently TASKS_RCU would ignore a CPU running a task in nohz_full= usermode execution. There would be neither a context switch nor a scheduling-clock interrupt to tell TASKS_RCU that the task in question had passed through a quiescent state. The grace period would therefore extend indefinitely. This commit therefore makes RCU's dyntick-idle subsystem record the task_struct structure of the task that is running in dyntick-idle mode on each CPU. The TASKS_RCU grace period can then access this information and record a quiescent state on behalf of any CPU running in dyntick-idle usermode. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
bde6c3aa993066acb0d6ce32ecabe03b9d5df92d	01-Jul-2014	Paul E. McKenney <paulmck@linux.vnet.ibm.com>	rcu: Provide cond_resched_rcu_qs() to force quiescent states in long loops RCU-tasks requires the occasional voluntary context switch from CPU-bound in-kernel tasks. In some cases, this requires instrumenting cond_resched(). However, there is some reluctance to countenance unconditionally instrumenting cond_resched() (see http://lwn.net/Articles/603252/), so this commit creates a separate cond_resched_rcu_qs() that may be used in place of cond_resched() in locations prone to long-duration in-kernel looping. This commit currently instruments only RCU-tasks. Future possibilities include also instrumenting RCU, RCU-bh, and RCU-sched in order to reduce IPI usage. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
73a860cd58a1eb258e889b615cebf738ab33aa23	14-Aug-2014	Paul E. McKenney <paulmck@linux.vnet.ibm.com>	rcu: Replace flush_signals() with WARN_ON(signal_pending()) Currently, when RCU awakens from a wait_event_interruptible() that might have awakened prematurely, it does a flush_signals(). This is done on the off-chance that someone figured out how to deliver a signal to a kthread, which is supposed to be impossible. Given that this is supposed to be impossible, this commit changes the flush_signals() calls into WARN_ON(signal_pending()). Reported-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
9fdd3bc9005824704f9802bec7b3e06f5edae434	29-Jul-2014	Paul E. McKenney <paulmck@linux.vnet.ibm.com>	rcu: Break more call_rcu() deadlock involving scheduler and perf Commit 96d3fd0d315a9 (rcu: Break call_rcu() deadlock involving scheduler and perf) covered the case where __call_rcu_nocb_enqueue() needs to wake the rcuo kthread due to the queue being initially empty, but did not do anything for the case where the queue was overflowing. This commit therefore also defers wakeup for the overflow case. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
d0bc90fd37e50e4ea22c51c26947fd78c2a7a6c2	09-Jul-2014	Pranith Kumar <bobby.prani@gmail.com>	rcu: Return bool type for rcu_try_advance_all_cbs() Return a bool type instead of 0 in rcu_try_advance_all_cbs(). Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
bf33eb1aef23e8049cd222471d35b0988c420b18	09-Jul-2014	Pranith Kumar <bobby.prani@gmail.com>	rcu: Fix sparse warning about rcu_batches_completed_preempt() being non-static fix sparse warning about rcu_batches_completed_preempt() being non-static by marking it as static Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
4de376a1b14e32f550931274f06b571abc0f3d4b	08-Jul-2014	Pranith Kumar <bobby.prani@gmail.com>	rcu: Remove remaining read-modify-write ACCESS_ONCE() calls Change the remaining uses of ACCESS_ONCE() so that each ACCESS_ONCE() either does a load or a store, but not both. Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
11ed7f934cb807f26da09547b5946c2e534d1dac	27-Aug-2014	Pranith Kumar <bobby.prani@gmail.com>	rcu: Make nocb leader kthreads process pending callbacks after spawning The nocb callbacks generated before the nocb kthreads are spawned are enqueued in the nocb queue for later processing. Commit fbce7497ee5af ("rcu: Parallelize and economize NOCB kthread wakeups") introduced nocb leader kthreads which checked the nocb_leader_wake flag to see if there were any such pending callbacks. A case was reported in which newly spawned leader kthreads were not processing the pending callbacks as this flag was not set, which led to a boot hang. The following commit ensures that the newly spawned nocb kthreads process the pending callbacks by allowing the kthreads to run immediately after spawning instead of waiting. This is done by inverting the logic of nocb_leader_wake tests to nocb_leader_sleep which allows us to use the default initialization of this flag to 0 to let the kthreads run. Reported-by: Amit Shah <amit.shah@redhat.com> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Link: http://www.spinics.net/lists/kernel/msg1802899.html [ paulmck: Backported to v3.17-rc2. ] Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Amit Shah <amit.shah@redhat.com>
187497fa5e9e9383820d33e48b87f8200a747c2a	16-Jul-2014	Paul E. McKenney <paulmck@linux.vnet.ibm.com>	rcu: Allow for NULL tick_nohz_full_mask when nohz_full= missing If there isn't a nohz_full= kernel parameter specified, then tick_nohz_full_mask can legitimately be NULL. This can cause problems when RCU's boot code tries to cpumask_or() this value into rcu_nocb_mask. In addition, if NO_HZ_FULL_ALL=y, there is no point in doing the cpumask_or() in the first place because this will cause RCU_NOCB_CPU_ALL=y, which in turn will have all bits already set in rcu_nocb_mask. This commit therefore avoids the cpumask_or() if NO_HZ_FULL_ALL=y and checks for !tick_nohz_full_running otherwise, this latter check catching cases when there was no nohz_full= kernel parameter specified. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
b41d1b924d0bd41a225a17f39297b9de0dca93d9	11-Jun-2014	Pranith Kumar <bobby.prani@gmail.com>	rcu: Fix a sparse warning in rcu_report_unblock_qs_rnp() This commit annotates rcu_report_unblock_qs_rnp() in order to fix the following sparse warning: kernel/rcu/tree_plugin.h:990:13: warning: context imbalance in 'rcu_report_unblock_qs_rnp' - unexpected unlock Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
615e41c6050a4878b2b68297f4672287941b93cd	11-Jun-2014	Pranith Kumar <bobby.prani@gmail.com>	rcu: Fix a sparse warning in rcu_initiate_boost() This commit annotates rcu_initiate_boost() fixes the following sparse warning: kernel/rcu/tree_plugin.h:1494:13: warning: context imbalance in 'rcu_initiate_boost' - unexpected unlock Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
c0f489d2c6fec8994c642c2ec925eb858727dc7b	04-Jun-2014	Paul E. McKenney <paulmck@linux.vnet.ibm.com>	rcu: Bind grace-period kthreads to non-NO_HZ_FULL CPUs Binding the grace-period kthreads to the timekeeping CPU resulted in significant performance decreases for some workloads. For more detail, see: https://lkml.org/lkml/2014/6/3/395 for benchmark numbers https://lkml.org/lkml/2014/6/4/218 for CPU statistics It turns out that it is necessary to bind the grace-period kthreads to the timekeeping CPU only when all but CPU 0 is a nohz_full CPU on the one hand or if CONFIG_NO_HZ_FULL_SYSIDLE=y on the other. In other cases, it suffices to bind the grace-period kthreads to the set of non-nohz_full CPUs. This commit therefore creates a tick_nohz_not_full_mask that is the complement of tick_nohz_full_mask, and then binds the grace-period kthread to the set of CPUs indicated by this new mask, which covers the CONFIG_NO_HZ_FULL_SYSIDLE=n case. The CONFIG_NO_HZ_FULL_SYSIDLE=y case still binds the grace-period kthreads to the timekeeping CPU. This commit also includes the tick_nohz_full_enabled() check suggested by Frederic Weisbecker. Reported-by: Jet Chen <jet.chen@intel.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> [ paulmck: Created housekeeping_affine() and housekeeping_mask per fweisbec feedback. ]
abaa93d9e1de2c29297e69ddba8ddd38f15064cf	12-Jun-2014	Paul E. McKenney <paulmck@linux.vnet.ibm.com>	rcu: Simplify priority boosting by putting rt_mutex in rcu_node RCU priority boosting currently checks for boosting via a pointer in task_struct. However, this is not needed: As Oleg noted, if the rt_mutex is placed in the rcu_node instead of on the booster's stack, the boostee can simply check it see if it owns the lock. This commit makes this change, shrinking task_struct by one pointer and the kernel by thirteen lines. Suggested-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
dfeb9765ce3c33cb3cbc5f16db423f1c58a4cc55	11-Jun-2014	Paul E. McKenney <paulmck@linux.vnet.ibm.com>	rcu: Allow post-unlock reference for rt_mutex The current approach to RCU priority boosting uses an rt_mutex strictly for its priority-boosting side effects. The rt_mutex_init_proxy_locked() function is used by the booster to initialize the lock as held by the boostee. The booster then uses rt_mutex_lock() to acquire this rt_mutex, which priority-boosts the boostee. When the boostee reaches the end of its outermost RCU read-side critical section, it checks a field in its task structure to see whether it has been boosted, and, if so, uses rt_mutex_unlock() to release the rt_mutex. The booster can then go on to boost the next task that is blocking the current RCU grace period. But reasonable implementations of rt_mutex_unlock() might result in the boostee referencing the rt_mutex's data after releasing it. But the booster might have re-initialized the rt_mutex between the time that the boostee released it and the time that it later referenced it. This is clearly asking for trouble, so this commit introduces a completion that forces the booster to wait until the boostee has completely finished with the rt_mutex, thus avoiding the case where the booster is re-initializing the rt_mutex before the last boostee's last reference to that rt_mutex. This of course does introduce some overhead, but the priority-boosting code paths are miles from any possible fastpath, and the overhead of executing the completion will normally be quite small compared to the overhead of priority boosting and deboosting, so this should be OK. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
4da117cfa72e6cde3d9e8f5ed932381863cdeec9	10-May-2014	Paul E. McKenney <paulmck@linux.vnet.ibm.com>	rcu: Remove redundant ACCESS_ONCE() from tick_do_timer_cpu In kernels built with CONFIG_NO_HZ_FULL, tick_do_timer_cpu is constant once boot completes. Thus, there is no need to wrap it in ACCESS_ONCE() in code that is built only when CONFIG_NO_HZ_FULL. This commit therefore removes the redundant ACCESS_ONCE(). Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
b58cc46c5f6b57f1c814e374dbc47176e6b4938e	02-Jul-2014	Paul E. McKenney <paulmck@linux.vnet.ibm.com>	rcu: Don't offload callbacks unless specifically requested Enabling NO_HZ_FULL currently has the side effect of enabling callback offloading on all CPUs. This results in lots of additional rcuo kthreads, and can also increase context switching and wakeups, even in cases where callback offloading is neither needed nor particularly desirable. This commit therefore enables callback offloading on a given CPU only if specifically requested at build time or boot time, or if that CPU has been specifically designated (again, either at build time or boot time) as a nohz_full CPU. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
fbce7497ee5af800a1c350c73f3c3f103cb27a15	24-Jun-2014	Paul E. McKenney <paulmck@linux.vnet.ibm.com>	rcu: Parallelize and economize NOCB kthread wakeups An 80-CPU system with a context-switch-heavy workload can require so many NOCB kthread wakeups that the RCU grace-period kthreads spend several tens of percent of a CPU just awakening things. This clearly will not scale well: If you add enough CPUs, the RCU grace-period kthreads would get behind, increasing grace-period latency. To avoid this problem, this commit divides the NOCB kthreads into leaders and followers, where the grace-period kthreads awaken the leaders each of whom in turn awakens its followers. By default, the number of groups of kthreads is the square root of the number of CPUs, but this default may be overridden using the rcutree.rcu_nocb_leader_stride boot parameter. This reduces the number of wakeups done per grace period by the RCU grace-period kthread by the square root of the number of CPUs, but of course by shifting those wakeups to the leaders. In addition, because the leaders do grace periods on behalf of their respective followers, the number of wakeups of the followers decreases by up to a factor of two. Instead of being awakened once when new callbacks arrive and again at the end of the grace period, the followers are awakened only at the end of the grace period. For a numerical example, in a 4096-CPU system, the grace-period kthread would awaken 64 leaders, each of which would awaken its 63 followers at the end of the grace period. This compares favorably with the 79 wakeups for the grace-period kthread on an 80-CPU system. Reported-by: Rik van Riel <riel@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
4a81e8328d3791a4f99bf5b436d050f6dc5ffea3	21-Jun-2014	Paul E. McKenney <paulmck@linux.vnet.ibm.com>	rcu: Reduce overhead of cond_resched() checks for RCU Commit ac1bea85781e (Make cond_resched() report RCU quiescent states) fixed a problem where a CPU looping in the kernel with but one runnable task would give RCU CPU stall warnings, even if the in-kernel loop contained cond_resched() calls. Unfortunately, in so doing, it introduced performance regressions in Anton Blanchard's will-it-scale "open1" test. The problem appears to be not so much the increased cond_resched() path length as an increase in the rate at which grace periods complete, which increased per-update grace-period overhead. This commit takes a different approach to fixing this bug, mainly by moving the RCU-visible quiescent state from cond_resched() to rcu_note_context_switch(), and by further reducing the check to a simple non-zero test of a single per-CPU variable. However, this approach requires that the force-quiescent-state processing send resched IPIs to the offending CPUs. These will be sent only once the grace period has reached an age specified by the boot/sysfs parameter rcutree.jiffies_till_sched_qs, or once the grace period reaches an age halfway to the point at which RCU CPU stall warnings will be emitted, whichever comes first. Reported-by: Dave Hansen <dave.hansen@intel.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Christoph Lameter <cl@gentwo.org> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> [ paulmck: Made rcu_momentary_dyntick_idle() as suggested by the ktest build robot. Also fixed smp_mb() comment as noted by Oleg Nesterov. ] Merge with e552592e (Reduce overhead of cond_resched() checks for RCU) Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
e534165bbf6a04d001748c573c7d6a7bae3713a5	24-Mar-2014	Uma Sharma <uma.sharma523@gmail.com>	rcu: Variable name changed in tree_plugin.h and used in tree.c The variable and struct both having the name "rcu_state" confuses sparse in some situations, so this commit changes the variable to "rcu_state_p" in order to avoid this confusion. This also makes things easier for human readers. Signed-off-by: Uma Sharma <uma.sharma523@gmail.com> [ paulmck: Changed the declaration and several additional uses. ] Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
fa07a58f71ee23a82597ce337126982d0cc60b72	15-Apr-2014	Christoph Lameter <cl@linux.com>	rcu: Replace __this_cpu_ptr() uses with raw_cpu_ptr() __this_cpu_ptr is being phased out. One special case is increment_cpu_stall_ticks(). A per cpu variable is incremented so use raw_cpu_inc(). Cc: Dipankar Sarma <dipankar@in.ibm.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
becb41bfe0544f1f7f494f48d6f68cbdb2e1ed0e	07-Apr-2014	Paul E. McKenney <paulmck@linux.vnet.ibm.com>	rcu: Make large and small sysidle systems use same state machine Currently, small systems move back into RCU_SYSIDLE_NOT from RCU_SYSIDLE_SHORT and large systems do not. This works because moving aggressively to RCU_SYSIDLE_NOT affects only performance, not correctness, and on small systems, the performance impact should be negligible. That said, this difference does make RCU a bit more complex, and RCU does not seem to be suffering from any lack of complexity. This commit therefore adjusts small-system operation to match that of large systems, so that the state never moves back to RCU_SYSIDLE_NOT from RCU_SYSIDLE_SHORT. Reported-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
5057f55e543b7859cfd26bc281291795eac93f8a	01-Apr-2014	Paul E. McKenney <paulmck@linux.vnet.ibm.com>	rcu: Bind RCU grace-period kthreads if NO_HZ_FULL Currently, RCU binds the grace-period kthreads to the timekeeping CPU only if CONFIG_NO_HZ_FULL_SYSIDLE=y. This means that these kthreads must be bound manually when CONFIG_NO_HZ_FULL_SYSIDLE=n and CONFIG_NO_HZ_FULL=y: Otherwise, these kthreads will induce OS jitter on random CPUs. Given that we are trying to reduce the amount of manual tweaking required to make CONFIG_NO_HZ_FULL=y work nicely, this commit makes this binding happen when CONFIG_NO_HZ_FULL=y, even in cases where CONFIG_NO_HZ_FULL_SYSIDLE=n. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>