History log of /kernel/trace/trace.c
Revision Date Author Comments
63351df85f7ece011618955054c1b0d15b393691 26-Jan-2015 Ruchi Kandoi <kandoiruchi@google.com> Fix compile errors in accordance with changes from 3.14 to 3.18

Signed-off-by: Ruchi Kandoi<kandoiruchi@google.com>
0438cf86ad894af8b5065708afd283568d373cfc 22-Nov-2012 Jamie Gennis <jgennis@google.com> trace: Add an option to show tgids in trace output

The tgids are tracked along side the saved_cmdlines tracking, and can be
included in trace output by enabling the 'print-tgid' trace option. This is
useful when doing post-processing of the trace data, as it allows events to be
grouped by tgid.

Change-Id: I52ed04c3a8ca7fddbb868b792ce5d21ceb76250e
Signed-off-by: Jamie Gennis <jgennis@google.com>
07906da78810dce5fd35b9449358c9208c693dca 06-Nov-2014 Rabin Vincent <rabin@rab.in> tracing: Do not risk busy looping in buffer splice

If the read loop in trace_buffers_splice_read() keeps failing due to
memory allocation failures without reading even a single page then this
function will keep busy looping.

Remove the risk for that by exiting the function if memory allocation
failures are seen.

Link: http://lkml.kernel.org/r/1415309167-2373-2-git-send-email-rabin@rab.in

Signed-off-by: Rabin Vincent <rabin@rab.in>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
e30f53aad2202b5526c40c36d8eeac8bf290bde5 10-Nov-2014 Rabin Vincent <rabin@rab.in> tracing: Do not busy wait in buffer splice

On a !PREEMPT kernel, attempting to use trace-cmd results in a soft
lockup:

# trace-cmd record -e raw_syscalls:* -F false
NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [trace-cmd:61]
...
Call Trace:
[<ffffffff8105b580>] ? __wake_up_common+0x90/0x90
[<ffffffff81092e25>] wait_on_pipe+0x35/0x40
[<ffffffff810936e3>] tracing_buffers_splice_read+0x2e3/0x3c0
[<ffffffff81093300>] ? tracing_stats_read+0x2a0/0x2a0
[<ffffffff812d10ab>] ? _raw_spin_unlock+0x2b/0x40
[<ffffffff810dc87b>] ? do_read_fault+0x21b/0x290
[<ffffffff810de56a>] ? handle_mm_fault+0x2ba/0xbd0
[<ffffffff81095c80>] ? trace_event_buffer_lock_reserve+0x40/0x80
[<ffffffff810951e2>] ? trace_buffer_lock_reserve+0x22/0x60
[<ffffffff81095c80>] ? trace_event_buffer_lock_reserve+0x40/0x80
[<ffffffff8112415d>] do_splice_to+0x6d/0x90
[<ffffffff81126971>] SyS_splice+0x7c1/0x800
[<ffffffff812d1edd>] tracesys_phase2+0xd3/0xd8

The problem is this: tracing_buffers_splice_read() calls
ring_buffer_wait() to wait for data in the ring buffers. The buffers
are not empty so ring_buffer_wait() returns immediately. But
tracing_buffers_splice_read() calls ring_buffer_read_page() with full=1,
meaning it only wants to read a full page. When the full page is not
available, tracing_buffers_splice_read() tries to wait again with
ring_buffer_wait(), which again returns immediately, and so on.

Fix this by adding a "full" argument to ring_buffer_wait() which will
make ring_buffer_wait() wait until the writer has left the reader's
page, i.e. until full-page reads will succeed.

Link: http://lkml.kernel.org/r/1415645194-25379-1-git-send-email-rabin@rab.in

Cc: stable@vger.kernel.org # 3.16+
Fixes: b1169cc69ba9 ("tracing: Remove mock up poll wait function")
Signed-off-by: Rabin Vincent <rabin@rab.in>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
1b3e5c0936046e7e023149ddc8946d21c2ea20eb 16-Jul-2014 Thomas Gleixner <tglx@linutronix.de> ftrace: Provide trace clocks monotonic

Expose the new NMI safe accessor to clock monotonic to the tracer.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
58d4e21e50ff3cc57910a8abc20d7e14375d2f61 18-Jul-2014 Tony Luck <tony.luck@intel.com> tracing: Fix wraparound problems in "uptime" trace clock

The "uptime" trace clock added in:

commit 8aacf017b065a805d27467843490c976835eb4a5
tracing: Add "uptime" trace clock that uses jiffies

has wraparound problems when the system has been up more
than 1 hour 11 minutes and 34 seconds. It converts jiffies
to nanoseconds using:
(u64)jiffies_to_usecs(jiffy) * 1000ULL
but since jiffies_to_usecs() only returns a 32-bit value, it
truncates at 2^32 microseconds. An additional problem on 32-bit
systems is that the argument is "unsigned long", so fixing the
return value only helps until 2^32 jiffies (49.7 days on a HZ=1000
system).

Avoid these problems by using jiffies_64 as our basis, and
not converting to nanoseconds (we do convert to clock_t because
user facing API must not be dependent on internal kernel
HZ values).

Link: http://lkml.kernel.org/p/99d63c5bfe9b320a3b428d773825a37095bf6a51.1405708254.git.tony.luck@intel.com

Cc: stable@vger.kernel.org # 3.10+
Fixes: 8aacf017b065 "tracing: Add "uptime" trace clock that uses jiffies"
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
6508fa761c330a1d2b4ae36199d08dbcb70e3ddb 18-Jul-2014 Stanislav Fomichev <stfomichev@yandex-team.ru> tracing: let user specify tracing_thresh after selecting function_graph

Currently, tracing_thresh works only if we specify it before selecting
function_graph tracer. If we do the opposite, tracing_thresh will change
it's value, but it will not be applied.
To fix it, we add update_thresh callback which is called whenever
tracing_thresh is updated and for function_graph tracer we register
handler which reinitializes tracer depending on tracing_thresh.

Link: http://lkml.kernel.org/p/20140718111727.GA3206@stfomichev-desktop.yandex.net

Signed-off-by: Stanislav Fomichev <stfomichev@yandex-team.ru>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
f0160a5a2912267c02cfe692eac955c360de5fdf 18-Jul-2013 zhangwei(Jovi) <jovi.zhangwei@huawei.com> tracing: Add TRACE_ITER_PRINTK flag check in __trace_puts/__trace_bputs

The TRACE_ITER_PRINTK check in __trace_puts/__trace_bputs is missing,
so add it, to be consistent with __trace_printk/__trace_bprintk.
Those functions are all called by the same function: trace_printk().

Link: http://lkml.kernel.org/p/51E7A7D6.8090900@huawei.com

Cc: stable@vger.kernel.org # 3.11+
Signed-off-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
8abfb8727f4a724d31f9ccfd8013fbd16d539445 18-Jul-2013 zhangwei(Jovi) <jovi.zhangwei@huawei.com> tracing: Add ftrace_trace_stack into __trace_puts/__trace_bputs

Currently trace option stacktrace is not applicable for
trace_printk with constant string argument, the reason is
in __trace_puts/__trace_bputs ftrace_trace_stack is missing.

In contrast, when using trace_printk with non constant string
argument(will call into __trace_printk/__trace_bprintk), then
trace option stacktrace is workable, this inconstant result
will confuses users a lot.

Link: http://lkml.kernel.org/p/51E7A7C9.9040401@huawei.com

Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
099ed151675cd1d2dbeae1dac697975f6a68716d 25-Jun-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org> tracing: Remove ftrace_stop/start() from reading the trace file

Disabling reading and writing to the trace file should not be able to
disable all function tracing callbacks. There's other users today
(like kprobes and perf). Reading a trace file should not stop those
from happening.

Cc: stable@vger.kernel.org # 3.0+
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
d048a8c7b509f35dd351e1415fe49fa99e4cb7ef 12-Jun-2014 Namhyung Kim <namhyung@kernel.org> tracing: Add description of set_graph_notrace to tracing/README

It was missing the description of set_graph_notrace file. Add it.

Link: http://lkml.kernel.org/p/1402590233-22321-5-git-send-email-namhyung@kernel.org

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
3f4d8f78a07dba1cb333ce749bd6a15c1ada362d 26-Jun-2014 Fabian Frederick <fabf@skynet.be> tracing: Remove unnecessary null test before debugfs_remove()

This fixes checkpatch warning:

"WARNING: debugfs_remove(NULL) is safe this check is probably not required"
Link: http://lkml.kernel.org/p/1403802871-8599-1-git-send-email-fabf@skynet.be

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
12306276fabcb746a14979e96f43a13c724dec49 20-Jun-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org> tracing: Move the trace_seq_* functions into its own trace_seq.c file

The trace_seq_*() functions are a nice utility that allows users to manipulate
buffers with printf() like formats. It has its own trace_seq.h header in
include/linux and should be in its own file. Being tied with trace_output.c
is rather awkward.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
f0b70cc48cc282cb326a4d71b3d1dda7d8fafd2a 10-Jun-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org> tracing: Fix leak of per cpu max data in instances

The freeing of an instance, if max data is configured, there will be
per cpu data structures created. But these are not freed when the instance
is deleted, which causes a memory leak.

A new helper function is added that frees the individual buffers within a
trace array, instead of duplicating the code. This way changes made for one
are applied to the other (normal buffer vs max buffer).

Link: http://lkml.kernel.org/r/87k38pbake.fsf@sejong.aot.lge.com

Reported-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
a6af8fbf17989e41fef5cacf3988a724fb687d78 10-Jun-2014 Namhyung Kim <namhyung@kernel.org> tracing: Cleanup saved_cmdlines_size changes

The recent addition of saved_cmdlines_size file had some remaining
(minor - mostly coding style) issues. Fix them by passing pointer
name to sizeof() and using scnprintf().

Link: http://lkml.kernel.org/p/1402384295-23680-1-git-send-email-namhyung@kernel.org

Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
8b8b36834d0fff67fc8668093f4312dd04dcf21d 10-Jun-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org> ring-buffer: Check if buffer exists before polling

The per_cpu buffers are created one per possible CPU. But these do
not mean that those CPUs are online, nor do they even exist.

With the addition of the ring buffer polling, it assumes that the
caller polls on an existing buffer. But this is not the case if
the user reads trace_pipe from a CPU that does not exist, and this
causes the kernel to crash.

Simple fix is to check the cpu against buffer bitmask against to see
if the buffer was allocated or not and return -ENODEV if it is
not.

More updates were done to pass the -ENODEV back up to userspace.

Link: http://lkml.kernel.org/r/5393DB61.6060707@oracle.com

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
a9fcaaac37b3baba1343f906f52aeb65c4d4e356 07-Jun-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org> tracing: Fix memory leak on instance deletion

When an instance is created, it also gets a snapshot ring buffer
allocated (with minimum of pages). But when it is deleted the snapshot
buffer is not. There was a helper function added to match the allocation
of these ring buffers to a way to free them, but it wasn't used by
the deletion of an instance. Using that helper function solves this
memory leak.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
23aaa3c18e33fe048671b419781b5e44175efafe 06-Jun-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org> tracing: Fix leak of ring buffer data when new instances creation fails

Yoshihiro Yunomae reported that the ring buffer data for a trace
instance does not get properly cleaned up when it fails. He proposed
a patch that manually cleaned the data up and addad a bunch of labels.
The labels are not needed because all trace array is allocated with
a kzalloc which initializes it to 0 and all kfree()s can take a NULL
pointer and will ignore it.

Adding a new helper function free_trace_buffers() that can also take
null buffers to free the buffers that were allocated by
allocate_trace_buffers().

Link: http://lkml.kernel.org/r/20140605223522.32311.31664.stgit@yunodevel

Reported-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
939c7a4f04fcd2162109744e8bf88194948a6e65 05-Jun-2014 Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com> tracing: Introduce saved_cmdlines_size file

Introduce saved_cmdlines_size file for changing the number of saved pid-comms.
saved_cmdlines currently stores 128 command names using SAVED_CMDLINES, but
'no-existing processes' names are often lost in saved_cmdlines when we
read the trace data. So, by introducing saved_cmdlines_size file, we can
now change the 128 command names saved to something much larger if needed.

When we write a value to saved_cmdlines_size, the number of the value will
be stored in pid-comm list:

# echo 1024 > /sys/kernel/debug/tracing/saved_cmdlines_size

Here, 1024 command names can be stored. The default number is 128 and the maximum
number is PID_MAX_DEFAULT (=32768 if CONFIG_BASE_SMALL is not set). So, if we
want to avoid losing any command names, we need to set 32768 to
saved_cmdlines_size.

We can read the maximum number of the list:

# cat /sys/kernel/debug/tracing/saved_cmdlines_size
128

Link: http://lkml.kernel.org/p/20140605012427.22115.16173.stgit@yunodevel

Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
198376cd8877a612cc6494201b815c36d1e40391 03-Jun-2014 Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com> tracing: Eliminate double free on failure of allocation on boot up

If allocation of the max_buffer fails on boot up, the error path will
free both per_cpu data structures from the buffers. With the new redesign
of the code, those structures are freed if allocations failed. That is,
the helper function that allocates the buffers will free the per cpu data
on failure. No need to do it again. In fact, the second free will cause
a bug as the code can not handle a double free.

Link: http://lkml.kernel.org/p/20140603042803.27308.30956.stgit@yunodevel

Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
4c27e756bc019ec1c11232893af036fdae720a97 30-May-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org> tracing: Move locking of trace_cmdline_lock into start/stop seq calls

With the conversion of the saved_cmdlines output to use seq_read, there
is now a race between accessing the values of the saved_cmdlines and
the writing to them. The trace_cmdline_lock needs to be taken at
the start and stop of the seq calls.

A new __trace_find_cmdline() call is created to allow for the look up
to happen without taking the lock.

Fixes: 42584c81c5ad tracing: Have saved_cmdlines use the seq_read infrastructure
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
379cfdac37923653c9d4242d10052378b7563005 30-May-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org> tracing: Try again for saved cmdline if failed due to locking

In order to prevent the saved cmdline cache from being filled when
tracing is not active, the comms are only recorded after a trace event
is recorded.

The problem is, a comm can fail to be recorded if the trace_cmdline_lock
is held. That lock is taken via a trylock to allow it to happen from
any context (including NMI). If the lock fails to be taken, the comm
is skipped. No big deal, as we will try again later.

But! Because of the code that was added to only record after an event,
we may not try again later as the recording is made as a oneshot per
event per CPU.

Only disable the recording of the comm if the comm is actually recorded.

Fixes: 7ffbd48d5cab "tracing: Cache comms only after an event occurred"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
42584c81c5adc1737a6fe0687facc5e62a5dc8c1 20-Feb-2014 Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com> tracing: Have saved_cmdlines use the seq_read infrastructure

Current tracing_saved_cmdlines_read() implementation is naive; It allocates
a large buffer, constructs output data to that buffer for each read
operation, and then copies a portion of the buffer to the user space
buffer. This has several issues such as slow memory allocation, high
CPU usage, and even corruption of the output data.

The seq_read infrastructure is made to handle this type of work.
By converting it to use seq_read() the code becomes smaller, simplified,
as well as correct.

Link: http://lkml.kernel.org/p/20140220084431.3839.51793.stgit@yunodevel

Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2184db46e425c2b84a783eeead0e2b24ce67cd7f 28-May-2014 Steven Rostedt <rostedt@goodmis.org> tracing: Print nasty banner when trace_printk() is in use

trace_printk() is used to debug fast paths within the kernel. Places
that gets called in any context (interrupt or NMI) or thousands of
times a second. Something you do not want to do with a printk().

In order to make it completely lockless as it needs a temporary buffer
to handle some of the string formatting, a page is created per cpu for
every context (four per cpu; normal, softirq, irq, NMI).

Since trace_printk() should only be used for debugging purposes,
there's no reason to waste memory on these buffers on a production
system. That means, trace_printk() should never be used unless a
developer is debugging their kernel. There's macro magic to allocate
the buffers if trace_printk() is used anywhere in the kernel.

To help enforce that trace_printk() isn't used outside of development,
when it is used, a nasty banner is displayed on bootup (or when a module
is loaded that uses trace_printk() and the kernel core does not).

Here's the banner:

**********************************************************
** NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE **
** **
** trace_printk() being used. Allocating extra memory. **
** **
** This means that this is a DEBUG kernel and it is **
** unsafe for produciton use. **
** **
** If you see this message and you are not debugging **
** the kernel, report this immediately to your vendor! **
** **
** NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE **
**********************************************************

That should hopefully keep developers from trying to sneak in a
trace_printk() or two.

Link: http://lkml.kernel.org/p/20140528131440.2283213c@gandalf.local.home

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
bdffd893a0e9c431304142d12d9a0a21d365c502 29-Apr-2014 Christoph Lameter <cl@linux.com> tracing: Replace __get_cpu_var uses with this_cpu_ptr

Replace uses of &__get_cpu_var for address calculation with this_cpu_ptr.

Link: http://lkml.kernel.org/p/alpine.DEB.2.10.1404291415560.18364@gentwo.org

Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
b1169cc69ba96b124df820904a6d3eb775491d7f 29-Apr-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org> tracing: Remove mock up poll wait function

Now that the ring buffer has a built in way to wake up readers
when there's data, using irq_work such that it is safe to do it
in any context. But it was still using the old "poor man's"
wait polling that checks every 1/10 of a second to see if it
should wake up a waiter. This makes the latency for a wake up
excruciatingly long. No need to do that anymore.

Completely remove the different wait_poll types from the tracers
and have them all use the default one now.

Reported-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
f4874261049e3abdd481359d82cafa5068369ebd 29-Apr-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org> tracing: Break out of tracing_wait_pipe() before wait_pipe() is called

When reading from trace_pipe, if tracing is off but nothing was read
it should block. If something is read and tracing is off, then EOF
is returned. If tracing is on and there's nothing to read, it will block.

But because the check of whether tracing is off and something was read
is done after the block on the pipe, it is hit or miss if the EOF is
returned or not leading to inconsistent behavior.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
ad1438a076e275b70d1a04de1364bc483e5a81db 17-Apr-2014 Fabian Frederick <fabf@skynet.be> tracing: Add static to local functions

This patch adds static to the following functions:
-cycle_t buffer_ftrace_now
-void free_snapshot
-int trace_selftest_startup_dynamic_tracing

Link: http://lkml.kernel.org/p/20140417214442.d7abc7c0b0e4b90e7fedecc9@skynet.be

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
0b9b12c1b884eb34773312f15c194220025e0416 14-Jan-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org> tracing: Move ftrace_max_lock into trace_array

In preparation for having tracers enabled in instances, the max_lock
should be unique as updating the max for one tracer is a separate
operation than updating it for another tracer using a different max.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
6d9b3fa5e7f663bbfb9d2d80d46136f75319cb28 14-Jan-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org> tracing: Move tracing_max_latency into trace_array

In preparation for letting the latency tracers be used by instances,
remove the global tracing_max_latency variable and add a max_latency
field to the trace_array that the latency tracers will now use.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
4104d326b670c2b66f575d2004daa28b2d1b4c8d 10-Jan-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org> ftrace: Remove global function list and call function directly

Instead of having a list of global functions that are called,
as only one global function is allow to be enabled at a time, there's
no reason to have a list.

Instead, simply have all the users of the global ops, use the global ops
directly, instead of registering their own ftrace_ops. Just switch what
function is used before enabling the function tracer.

This removes a lot of code as well as the complexity involved with it.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
a786c06d9f2719203c00b3d97b21f9a96980d0b5 11-Apr-2014 Al Viro <viro@zeniv.linux.org.uk> missing bits of "splice: fix racy pipe->buffers uses"

that commit has fixed only the parts of that mess in fs/splice.c itself;
there had been more in several other ->splice_read() instances...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
17a280ea8111c66791c18c0353b7986aafcb24fe 11-Apr-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org> tracing: Add missing function triggers dump and cpudump to README

The debugfs tracing README file lists all the function triggers except for
dump and cpudump. These should be added too.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
fbb32750a62df75d1ffea547f3908b21c5496d9f 03-Feb-2014 Al Viro <viro@zeniv.linux.org.uk> pipe: kill ->map() and ->unmap()

all pipe_buffer_operations have the same instances of those...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2c4a33aba5f9ea3a28f2e40351f078d95f00786b 26-Mar-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org> tracing: Fix traceon trigger condition to actually turn tracing on

While working on my tutorial for 2014 Linux Collaboration Summit
I found that the traceon trigger did not work when conditions were
used. The other triggers worked fine though. Looking into it, it
is because of the way the triggers use the ring buffer to store
the fields it will use for the condition. But if tracing is off, nothing
is stored in the buffer, and the tracepoint exits before calling the
trigger to test the condition. This is fine for all the triggers that
only work when tracing is on, but for traceon trigger that is to
work when tracing is off, nothing happens.

The fix is simple, just use a temp ring buffer to record the event
if tracing is off and the event has a trace event conditional trigger
enabled. The rest of the tracepoint code will work just fine, but
the tracepoint wont be recorded in the other buffers.

Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
e1e232ca6b8faa210e5509f17d55519b4392524f 11-Feb-2014 Steven Rostedt <rostedt@goodmis.org> tracing: Add trace_clock=<clock> kernel parameter

Being able to change the trace clock at boot can be advantageous if
you need a better source of when things happen across CPUs. The default
trace clock is the fastest, but it uses local clocks which may not be
synced across CPUs and it does not let you know when events took place
with respect to events on other CPUs.

The global trace clock can help in this case, and if you do not care
about timings, the counter "clock" is the best, as that is just a simple
atomic counter that is incremented for every event.

Usage is to add "trace_clock=counter" on the kernel command line. You
can replace counter with "global" or any of the clocks listed in
/sys/kernel/debug/tracing/trace_clock

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Appreciated-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
591dffdade9f07692a7dd3ed16830ec24e901ece 10-Jan-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org> ftrace: Allow for function tracing instance to filter functions

Create a "set_ftrace_filter" and "set_ftrace_notrace" files in the instance
directories to let users filter of functions to trace for the given instance.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
50512ab576e1ce29953c9259e1f36ce16f350f20 14-Jan-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org> tracing: Convert tracer->enabled to counter

As tracers will soon be used by instances, the tracer enabled field
needs to be converted to a counter instead of a boolean.
This counter is protected by the trace_types_lock mutex.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
6b450d2533e2c1c71fbe7f1bdce0bb1c9f813030 14-Jan-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org> tracing: Disable tracers before deletion of instance

When an instance is about to be deleted, make sure the tracer
is set to nop. If it isn't reset the tracer and set it to the nop
tracer, otherwise memory leaks and bad pointers may result.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
f1b21c9a40704dfdf7b8423c7d2969ea31c9857d 14-Jan-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org> tracing: Only let top level have option files

Currently, only the top level instance can have tracing options.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
607e2ea167e56db84387f3ab97e59a862e101cab 07-Nov-2013 Steven Rostedt (Red Hat) <rostedt@goodmis.org> tracing: Set up infrastructure to allow tracers for instances

Currently the tracers (function, function_graph, irqsoff, etc) can only
be used by the top level tracing directory (not for instances).

This sets up the infrastructure to allow instances to be able to
run a separate tracer apart from the what the top level tracing is
doing.

As tracers need to adapt for being used by instances, the tracers
must flag if they can be used by instances or not. Currently only the
'nop' tracer can be used by all instances.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
bf6065b5c7014ab30383405718c7a6b96d2cbdb2 10-Jan-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org> tracing: Pass trace_array to flag_changed callback

As options (flags) may affect instances instead of being global
the flag_changed() callbacks need to receive the trace_array descriptor
of the instance they will be modifying.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
8c1a49aedb73fb2f15aaa32ad9e2e1c4289f45cb 10-Jan-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org> tracing: Pass trace_array to set_flag callback

As options (flags) may affect instances instead of being global
the set_flag() callbacks need to receive the trace_array descriptor
of the instance they will be modifying.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
3132e107d608f8753240d82d61303c500fd515b4 23-Jan-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org> tracing: Check if tracing is enabled in trace_puts()

If trace_puts() is used very early in boot up, it can crash the machine
if it is called before the ring buffer is allocated. If a trace_printk()
is used with no arguments, then it will be converted into a trace_puts()
and suffer the same fate.

Cc: stable@vger.kernel.org # 3.10+
Fixes: 09ae72348ecc "tracing: Add trace_puts() for even faster trace_printk() tracing"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
71485c45891b8a0fcc4ce22d87251424ab51e096 23-Jan-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org> tracing: Fix formatting of trace README file

Fix the formatting of the README file in the trace debugfs to fit in
an 80 character window.

Also add a comment about the event trigger counter with regards to
traceon and traceoff.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
26f255646e0ca6fde0e994e2a815ba2b31770dce 17-Jan-2014 Tom Zanussi <tom.zanussi@linux.intel.com> tracing/README: Add event file usage to tracing mini-HOWTO

It would be useful to have a cheat-sheet for everything under
tracing/events/ alongside the existing text describing the other files
in the tracing/ dir.

Add short descriptions of the directories and files under events/
along with examples, similar to the existing text for the other files
in tracing/.

Also clean up a few minor alignment problems noticed when adding the
new text.

Link: http://lkml.kernel.org/r/1389993104.3040.445.camel@empanada

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
92fdd98cf8bdec4d6b0c510e2f073ac4fd059be8 17-Jan-2014 Al Viro <viro@ZenIV.linux.org.uk> tracing: Fix buggered tee(2) on tracing_pipe

In kernel/trace/trace.c we have this:
static void tracing_pipe_buf_release(struct pipe_inode_info *pipe,
struct pipe_buffer *buf)
{
__free_page(buf->page);
}
static const struct pipe_buf_operations tracing_pipe_buf_ops = {
.can_merge = 0,
.map = generic_pipe_buf_map,
.unmap = generic_pipe_buf_unmap,
.confirm = generic_pipe_buf_confirm,
.release = tracing_pipe_buf_release,
.steal = generic_pipe_buf_steal,
.get = generic_pipe_buf_get,
};
with
void generic_pipe_buf_get(struct pipe_inode_info *pipe, struct pipe_buffer *buf)
{
page_cache_get(buf->page);
}

and I don't see anything that would've prevented tee(2) called on the pipe
that got stuff spliced into it from that sucker. ->ops->get() will be
called, then buf gets copied into target pipe's ->bufs[] and eventually
readers get to both copies of the buffer. With
get_page(page)
look at that page
__free_page(page)
look at that page
__free_page(page)
which is not a good thing, to put it mildly. AFAICS, that ought to use
the normal generic_pipe_buf_release() (aka page_cache_release(buf->page)),
shouldn't it?

[
SDR - As trace_pipe just allocates the page with alloc_page(GFP_KERNEL),
and doesn't do anything special with it (no LRU logic). The __free_page()
should be fine, as it wont actually free a page with reference count.
Maybe there's a chance to leak memory? Anyway, This change is at a minimum
good for being symmetric with generic_pipe_buf_get, it is fine to add.
]

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[ SDR - Removed no longer used tracing_pipe_buf_release ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
dced341b2d4f06668efaab33f88de5d287c0f45b 14-Jan-2014 Steven Rostedt (Red Hat) <rostedt@goodmis.org> tracing: Have trace buffer point back to trace_array

The trace buffer has a descriptor pointer that goes back to the trace
array. But it was never assigned. Luckily, nothing uses it (yet), but
it will in the future.

Although nothing currently uses this, if any of the new features get
backported to older kernels, and because this is such a simple change,
I'm marking it for stable too.

Cc: stable@vger.kernel.org # v3.10+
Fixes: 12883efb670c "tracing: Consolidate max_tr into main trace_array structure"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
098c879e1f2d6ee7afbfe959f6b04070065cec90 21-Dec-2013 Steven Rostedt (Red Hat) <rostedt@goodmis.org> tracing: Add generic tracing_lseek() function

Trace event triggers added a lseek that uses the ftrace_filter_lseek()
function. Unfortunately, when function tracing is not configured in
that function is not defined and the kernel fails to build.

This is the second time that function was added to a file ops and
it broke the build due to requiring special config dependencies.

Make a generic tracing_lseek() that all the tracing utilities may
use.

Also, modify the old ftrace_filter_lseek() to return 0 instead of
1 on WRONLY. Not sure why it was a 1 as that does not make sense.

This also changes the old tracing_seek() to modify the file pos
pointer on WRONLY as well.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Tested-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Acked-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
93e31ffbf417a84fbae518fb46b3ea3f0d8fa6e1 24-Oct-2013 Tom Zanussi <tom.zanussi@linux.intel.com> tracing: Add 'snapshot' event trigger command

Add 'snapshot' event_command. snapshot event triggers are added by
the user via this command in a similar way and using practically the
same syntax as the analogous 'snapshot' ftrace function command, but
instead of writing to the set_ftrace_filter file, the snapshot event
trigger is written to the per-event 'trigger' files:

echo 'snapshot' > .../somesys/someevent/trigger

The above command will turn on snapshots for someevent i.e. whenever
someevent is hit, a snapshot will be done.

This also adds a 'count' version that limits the number of times the
command will be invoked:

echo 'snapshot:N' > .../somesys/someevent/trigger

Where N is the number of times the command will be invoked.

The above command will snapshot N times for someevent i.e. whenever
someevent is hit N times, a snapshot will be done.

Also adds a new tracing_alloc_snapshot() function - the existing
tracing_snapshot_alloc() function is a special version of
tracing_snapshot() that also does the snapshot allocation - the
snapshot triggers would like to be able to do just the allocation but
not take a snapshot; the existing tracing_snapshot_alloc() in turn now
also calls tracing_alloc_snapshot() underneath to do that allocation.

Link: http://lkml.kernel.org/r/c9524dd07ce01f9dcbd59011290e0a8d5b47d7ad.1382622043.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
[ fix up from kbuild test robot <fengguang.wu@intel.com report ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
e5137b50a0640009fd63a3e65c14bc6e1be8796a 04-Oct-2013 Peter Zijlstra <peterz@infradead.org> ftrace, sched: Add TRACE_FLAG_PREEMPT_RESCHED

Since the introduction of PREEMPT_NEED_RESCHED in:

f27dde8deef3 ("sched: Add NEED_RESCHED to the preempt_count")

we need to be able to look at both TIF_NEED_RESCHED and
PREEMPT_NEED_RESCHED to understand the full preemption behaviour.

Add it to the trace output.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Yuanhan Liu <yuanhan.liu@linux.intel.com>
Link: http://lkml.kernel.org/r/20131004152826.GP3081@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
042b10d83d05174e50ee861ee3aca55fd6204324 06-Nov-2013 Steven Rostedt (Red Hat) <rostedt@goodmis.org> tracing: Remove unused function ftrace_off_permanent()

In the past, ftrace_off_permanent() was called if something
strange was detected. But the ftrace_bug() now handles all the
anomolies that can happen with ftrace (function tracing), and there
are no uses of ftrace_off_permanent(). Get rid of it.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2e86421debc2cf4d1513c9b73fcd34c5ce431ae3 19-Oct-2013 Geyslan G. Bem <geyslan@gmail.com> tracing: Add helper function tracing_is_disabled()

This patch creates the function 'tracing_is_disabled', which
can be used outside of trace.c.

Link: http://lkml.kernel.org/r/1382141754-12155-1-git-send-email-geyslan@gmail.com

Signed-off-by: Geyslan G. Bem <geyslan@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
b2f974d6af9accfec11e69cc76d2ab9f0c7359e0 23-Oct-2013 Cody P Schafer <cody@linux.vnet.ibm.com> tracing: Open tracer when ftrace_dump_on_oops is used

With ftrace_dump_on_oops, we previously did not open the tracer in
question, sometimes causing the trace output to be useless.

For example, the function_graph tracer with tracing_thresh set dumped via
ftrace_dump_on_oops would show a series of '}' indented at different levels,
but no function names.

call trace->open() (and do a few other fixups copied from the normal dump
path) to make the output more intelligible.

Link: http://lkml.kernel.org/r/1382554197-16961-1-git-send-email-cody@linux.vnet.ibm.com

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
38de93abec8d8acd8d6dbbe9b0d92d6d5cdb3090 24-Oct-2013 Tom Zanussi <tom.zanussi@linux.intel.com> tracing: Make register/unregister_ftrace_command __init

register/unregister_ftrace_command() are only ever called from __init
functions, so can themselves be made __init.

Also make register_snapshot_cmd() __init for the same reason.

Link: http://lkml.kernel.org/r/d4042c8cadb7ae6f843ac9a89a24e1c6a3099727.1382620672.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
f306cc82a93d6b19f01634b80c580b9755c8b7cc 24-Oct-2013 Tom Zanussi <tom.zanussi@linux.intel.com> tracing: Update event filters for multibuffer

The trace event filters are still tied to event calls rather than
event files, which means you don't get what you'd expect when using
filters in the multibuffer case:

Before:

# echo 'bytes_alloc > 8192' > /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
# cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
bytes_alloc > 8192
# mkdir /sys/kernel/debug/tracing/instances/test1
# echo 'bytes_alloc > 2048' > /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
# cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
bytes_alloc > 2048
# cat /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
bytes_alloc > 2048

Setting the filter in tracing/instances/test1/events shouldn't affect
the same event in tracing/events as it does above.

After:

# echo 'bytes_alloc > 8192' > /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
# cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
bytes_alloc > 8192
# mkdir /sys/kernel/debug/tracing/instances/test1
# echo 'bytes_alloc > 2048' > /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
# cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter
bytes_alloc > 8192
# cat /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter
bytes_alloc > 2048

We'd like to just move the filter directly from ftrace_event_call to
ftrace_event_file, but there are a couple cases that don't yet have
multibuffer support and therefore have to continue using the current
event_call-based filters. For those cases, a new USE_CALL_FILTER bit
is added to the event_call flags, whose main purpose is to keep the
old behavior for those cases until they can be updated with
multibuffer support; at that point, the USE_CALL_FILTER flag (and the
new associated call_filter_check_discard() function) can go away.

The multibuffer support also made filter_current_check_discard()
redundant, so this change removes that function as well and replaces
it with filter_check_discard() (or call_filter_check_discard() as
appropriate).

Link: http://lkml.kernel.org/r/f16e9ce4270c62f46b2e966119225e1c3cca7e60.1382620672.git.tom.zanussi@linux.intel.com

Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
057db8488b53d5e4faa0cedb2f39d4ae75dfbdbb 10-Oct-2013 Steven Rostedt <rostedt@goodmis.org> tracing: Fix potential out-of-bounds in trace_get_user()

Andrey reported the following report:

ERROR: AddressSanitizer: heap-buffer-overflow on address ffff8800359c99f3
ffff8800359c99f3 is located 0 bytes to the right of 243-byte region [ffff8800359c9900, ffff8800359c99f3)
Accessed by thread T13003:
#0 ffffffff810dd2da (asan_report_error+0x32a/0x440)
#1 ffffffff810dc6b0 (asan_check_region+0x30/0x40)
#2 ffffffff810dd4d3 (__tsan_write1+0x13/0x20)
#3 ffffffff811cd19e (ftrace_regex_release+0x1be/0x260)
#4 ffffffff812a1065 (__fput+0x155/0x360)
#5 ffffffff812a12de (____fput+0x1e/0x30)
#6 ffffffff8111708d (task_work_run+0x10d/0x140)
#7 ffffffff810ea043 (do_exit+0x433/0x11f0)
#8 ffffffff810eaee4 (do_group_exit+0x84/0x130)
#9 ffffffff810eafb1 (SyS_exit_group+0x21/0x30)
#10 ffffffff81928782 (system_call_fastpath+0x16/0x1b)

Allocated by thread T5167:
#0 ffffffff810dc778 (asan_slab_alloc+0x48/0xc0)
#1 ffffffff8128337c (__kmalloc+0xbc/0x500)
#2 ffffffff811d9d54 (trace_parser_get_init+0x34/0x90)
#3 ffffffff811cd7b3 (ftrace_regex_open+0x83/0x2e0)
#4 ffffffff811cda7d (ftrace_filter_open+0x2d/0x40)
#5 ffffffff8129b4ff (do_dentry_open+0x32f/0x430)
#6 ffffffff8129b668 (finish_open+0x68/0xa0)
#7 ffffffff812b66ac (do_last+0xb8c/0x1710)
#8 ffffffff812b7350 (path_openat+0x120/0xb50)
#9 ffffffff812b8884 (do_filp_open+0x54/0xb0)
#10 ffffffff8129d36c (do_sys_open+0x1ac/0x2c0)
#11 ffffffff8129d4b7 (SyS_open+0x37/0x50)
#12 ffffffff81928782 (system_call_fastpath+0x16/0x1b)

Shadow bytes around the buggy address:
ffff8800359c9700: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
ffff8800359c9780: fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa
ffff8800359c9800: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
ffff8800359c9880: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
ffff8800359c9900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>ffff8800359c9980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00[03]fb
ffff8800359c9a00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
ffff8800359c9a80: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
ffff8800359c9b00: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
ffff8800359c9b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff8800359c9c00: 00 00 00 00 00 00 00 00 fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap redzone: fa
Heap kmalloc redzone: fb
Freed heap region: fd
Shadow gap: fe

The out-of-bounds access happens on 'parser->buffer[parser->idx] = 0;'

Although the crash happened in ftrace_regex_open() the real bug
occurred in trace_get_user() where there's an incrementation to
parser->idx without a check against the size. The way it is triggered
is if userspace sends in 128 characters (EVENT_BUF_SIZE + 1), the loop
that reads the last character stores it and then breaks out because
there is no more characters. Then the last character is read to determine
what to do next, and the index is incremented without checking size.

Then the caller of trace_get_user() usually nulls out the last character
with a zero, but since the index is equal to the size, it writes a nul
character after the allocated space, which can corrupt memory.

Luckily, only root user has write access to this file.

Link: http://lkml.kernel.org/r/20131009222323.04fd1a0d@gandalf.local.home

Reported-by: Andrey Konovalov <andreyknvl@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
b9be6d026d327593784b042aab4fa27e2de9c825 13-Sep-2013 Wang YanQing <udknight@gmail.com> tracing: Show more exact help information about snapshot

The current "help" that comes out of the snapshot file when it is
not allocated looks like this:

# * Snapshot is freed *
#
# Snapshot commands:
# echo 0 > snapshot : Clears and frees snapshot buffer
# echo 1 > snapshot : Allocates snapshot buffer, if not already allocated.
# Takes a snapshot of the main buffer.
# echo 2 > snapshot : Clears snapshot buffer (but does not allocate)
# (Doesn't have to be '2' works with any number that
# is not a '0' or '1')

Echo 2 says that it does not allocate the buffer, which is correct,
but to be more consistent with "echo 0" it should also state
that it does not free.

Link: http://lkml.kernel.org/r/20130914045916.GA4243@udknight

Signed-off-by: Wang YanQing <udknight@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
ccfe9e42e451232dd17a230d1b4e979c3d15311e 08-Aug-2013 Alexander Z Lam <azl@google.com> tracing: Make tracing_cpumask available for all instances

Allow tracer instances to disable tracing by cpu by moving
the static global tracing_cpumask into trace_array.

Link: http://lkml.kernel.org/r/921622317f239bfc2283cac2242647801ef584f2.1375980149.git.azl@google.com

Cc: Vaibhav Nagarnaik <vnagarnaik@google.com>
Cc: David Sharp <dhsharp@google.com>
Cc: Alexander Z Lam <lambchop468@gmail.com>
Signed-off-by: Alexander Z Lam <azl@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
9457158bbc0ee04ecef76862d73eecd8076e9c7b 03-Aug-2013 Alexander Z Lam <azl@google.com> tracing: Fix reset of time stamps during trace_clock changes

Fixed two issues with changing the timestamp clock with trace_clock:

- The global buffer was reset on instance clock changes. Change this to pass
the correct per-instance buffer
- ftrace_now() is used to set buf->time_start in tracing_reset_online_cpus().
This was incorrect because ftrace_now() used the global buffer's clock to
return the current time. Change this to use buffer_ftrace_now() which
returns the current time for the correct per-instance buffer.

Also removed tracing_reset_current() because it is not used anywhere

Link: http://lkml.kernel.org/r/1375493777-17261-2-git-send-email-azl@google.com

Cc: Vaibhav Nagarnaik <vnagarnaik@google.com>
Cc: David Sharp <dhsharp@google.com>
Cc: Alexander Z Lam <lambchop468@gmail.com>
Cc: stable@vger.kernel.org # 3.10
Signed-off-by: Alexander Z Lam <azl@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
711e124379e0f889e40e2f01d7f5d61936d3cd23 03-Aug-2013 Alexander Z Lam <azl@google.com> tracing: Make TRACE_ITER_STOP_ON_FREE stop the correct buffer

Releasing the free_buffer file in an instance causes the global buffer
to be stopped when TRACE_ITER_STOP_ON_FREE is enabled. Operate on the
correct buffer.

Link: http://lkml.kernel.org/r/1375493777-17261-1-git-send-email-azl@google.com

Cc: Vaibhav Nagarnaik <vnagarnaik@google.com>
Cc: David Sharp <dhsharp@google.com>
Cc: Alexander Z Lam <lambchop468@gmail.com>
Cc: stable@vger.kernel.org # 3.10
Signed-off-by: Alexander Z Lam <azl@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
ed5467da0e369e65b247b99eb6403cb79172bcda 02-Aug-2013 Andrew Vagin <avagin@openvz.org> tracing: Fix fields of struct trace_iterator that are zeroed by mistake

tracing_read_pipe zeros all fields bellow "seq". The declaration contains
a comment about that, but it doesn't help.

The first field is "snapshot", it's true when current open file is
snapshot. Looks obvious, that it should not be zeroed.

The second field is "started". It was converted from cpumask_t to
cpumask_var_t (v2.6.28-4983-g4462344), in other words it was
converted from cpumask to pointer on cpumask.

Currently the reference on "started" memory is lost after the first read
from tracing_read_pipe and a proper object will never be freed.

The "started" is never dereferenced for trace_pipe, because trace_pipe
can't have the TRACE_FILE_ANNOTATE options.

Link: http://lkml.kernel.org/r/1375463803-3085183-1-git-send-email-avagin@openvz.org

Cc: stable@vger.kernel.org # 2.6.30
Signed-off-by: Andrew Vagin <avagin@openvz.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
09d8091c024ec88d1541d93eb8ddb2bd5cf10c39 24-Jul-2013 Steven Rostedt (Red Hat) <rostedt@goodmis.org> tracing: Remove locking trace_types_lock from tracing_reset_all_online_cpus()

Commit a82274151af "tracing: Protect ftrace_trace_arrays list in trace_events.c"
added taking the trace_types_lock mutex in trace_events.c as there were
several locations that needed it for protection. Unfortunately, it also
encapsulated a call to tracing_reset_all_online_cpus() which also takes
the trace_types_lock, causing a deadlock.

This happens when a module has tracepoints and has been traced. When the
module is removed, the trace events module notifier will grab the
trace_types_lock, do a bunch of clean ups, and also clears the buffer
by calling tracing_reset_all_online_cpus. This doesn't happen often
which explains why it wasn't caught right away.

Commit a82274151af was marked for stable, which means this must be
sent to stable too.

Link: http://lkml.kernel.org/r/51EEC646.7070306@broadcom.com

Reported-by: Arend van Spril <arend@broadcom.com>
Tested-by: Arend van Spriel <arend@broadcom.com>
Cc: Alexander Z Lam <azl@google.com>
Cc: Vaibhav Nagarnaik <vnagarnaik@google.com>
Cc: David Sharp <dhsharp@google.com>
Cc: stable@vger.kernel.org # 3.10
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
9c01fe4593db123c5a72dc36f0400f776e92c954 23-Jul-2013 Oleg Nesterov <oleg@redhat.com> tracing: Kill trace_cpu struct/members

After the previous changes trace_array_cpu->trace_cpu and
trace_array->trace_cpu becomes write-only. Remove these members
and kill "struct trace_cpu" as well.

As a side effect this also removes memset(per_cpu_memory, 0).
It was not needed, alloc_percpu() returns zero-filled memory.

Link: http://lkml.kernel.org/r/20130723152613.GA23741@redhat.com

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
6484c71cbc170634fa131b6d022d86d61686b88b 23-Jul-2013 Oleg Nesterov <oleg@redhat.com> tracing: Change tracing_fops/snapshot_fops to rely on tracing_get_cpu()

tracing_open() and tracing_snapshot_open() are racy, the memory
inode->i_private points to can be already freed.

Convert these last users of "inode->i_private == trace_cpu" to
use "i_private = trace_array" and rely on tracing_get_cpu().

v2: incorporate the fix from Steven, tracing_release() must not
blindly dereference file->private_data unless we know that
the file was opened for reading.

Link: http://lkml.kernel.org/r/20130723152610.GA23737@redhat.com

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
0bc392ee46d0fd8e6b678457ef71f074f19a03c5 23-Jul-2013 Oleg Nesterov <oleg@redhat.com> tracing: Change tracing_entries_fops to rely on tracing_get_cpu()

tracing_open_generic_tc() is racy, the memory inode->i_private
points to can be already freed.

1. Change its last user, tracing_entries_fops, to use
tracing_*_generic_tr() instead.

2. Change debugfs_create_file("buffer_size_kb", data) callers
to pass "data = tr".

3. Change tracing_entries_read() and tracing_entries_write() to
use tracing_get_cpu().

4. Kill the no longer used tracing_open_generic_tc() and
tracing_release_generic_tc().

Link: http://lkml.kernel.org/r/20130723152606.GA23730@redhat.com

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
4d3435b8a4c3357695e09c5e7a3bf73a19fca5b0 23-Jul-2013 Oleg Nesterov <oleg@redhat.com> tracing: Change tracing_stats_fops to rely on tracing_get_cpu()

tracing_open_generic_tc() is racy, the memory inode->i_private
points to can be already freed.

1. Change one of its users, tracing_stats_fops, to use
tracing_*_generic_tr() instead.

2. Change trace_create_cpu_file("stats", data) to pass "data = tr".

3. Change tracing_stats_read() to use tracing_get_cpu().

Link: http://lkml.kernel.org/r/20130723152603.GA23727@redhat.com

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
46ef2be0d1d5ccea0c41bb606143586daadd537c 23-Jul-2013 Oleg Nesterov <oleg@redhat.com> tracing: Change tracing_buffers_fops to rely on tracing_get_cpu()

tracing_buffers_open() is racy, the memory inode->i_private points
to can be already freed.

Change debugfs_create_file("trace_pipe_raw", data) caller to pass
"data = tr", tracing_buffers_open() can use tracing_get_cpu().

Change debugfs_create_file("snapshot_raw_fops", data) caller too,
this file uses tracing_buffers_open/release.

Link: http://lkml.kernel.org/r/20130723152600.GA23720@redhat.com

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
15544209cb0b5312e5220a9337a1fe61d1a1f2d9 23-Jul-2013 Oleg Nesterov <oleg@redhat.com> tracing: Change tracing_pipe_fops() to rely on tracing_get_cpu()

tracing_open_pipe() is racy, the memory inode->i_private points to
can be already freed.

Change debugfs_create_file("trace_pipe", data) callers to to pass
"data = tr", tracing_open_pipe() can use tracing_get_cpu().

Link: http://lkml.kernel.org/r/20130723152557.GA23717@redhat.com

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
649e9c70da6bfbeb563193a35d3424a5aa7c0d38 23-Jul-2013 Oleg Nesterov <oleg@redhat.com> tracing: Introduce trace_create_cpu_file() and tracing_get_cpu()

Every "file_operations" used by tracing_init_debugfs_percpu is buggy.
f_op->open/etc does:

1. struct trace_cpu *tc = inode->i_private;
struct trace_array *tr = tc->tr;

2. trace_array_get(tr) or fail;

3. do_something(tc);

But tc (and tr) can be already freed before trace_array_get() is called.
And it doesn't matter whether this file is per-cpu or it was created by
init_tracer_debugfs(), free_percpu() or kfree() are equally bad.

Note that even 1. is not safe, the freed memory can be unmapped. But even
if it was safe trace_array_get() can wrongly succeed if we also race with
the next new_instance_create() which can re-allocate the same tr, or tc
was overwritten and ->tr points to the valid tr. In this case 3. uses the
freed/reused memory.

Add the new trivial helper, trace_create_cpu_file() which simply calls
trace_create_file() and encodes "cpu" in "struct inode". Another helper,
tracing_get_cpu() will be used to read cpu_nr-or-RING_BUFFER_ALL_CPUS.

The patch abuses ->i_cdev to encode the number, it is never used unless
the file is S_ISCHR(). But we could use something else, say, i_bytes or
even ->d_fsdata. In any case this hack is hidden inside these 2 helpers,
it would be trivial to change them if needed.

This patch only changes tracing_init_debugfs_percpu() to use the new
trace_create_cpu_file(), the next patches will change file_operations.

Note: tracing_get_cpu(inode) is always safe but you can't trust the
result unless trace_array_get() was called, without trace_types_lock
which acts as a barrier it can wrongly return RING_BUFFER_ALL_CPUS.

Link: http://lkml.kernel.org/r/20130723152554.GA23710@redhat.com

Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
e70e78e3c83b536730e31231dd9b979768d8df3c 19-Jul-2013 Oleg Nesterov <oleg@redhat.com> tracing: Kill the unbalanced tr->ref++ in tracing_buffers_open()

tracing_buffers_open() does trace_array_get() and then it wrongly
inrcements tr->ref again under trace_types_lock. This means that
every caller leaks trace_array:

# cd /sys/kernel/debug/tracing/
# mkdir instances/X
# true < instances/X/per_cpu/cpu0/trace_pipe_raw
# rmdir instances/X
rmdir: failed to remove `instances/X': Device or resource busy

Link: http://lkml.kernel.org/r/20130719153644.GA18899@redhat.com

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: stable@vger.kernel.org # 3.10
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
f77d09a384676bde6445413949d9d2c508ff3e62 18-Jul-2013 Alexander Z Lam <azl@google.com> tracing: Miscellaneous fixes for trace_array ref counting

Some error paths did not handle ref counting properly, and some trace files need
ref counting.

Link: http://lkml.kernel.org/r/1374171524-11948-1-git-send-email-azl@google.com

Cc: stable@vger.kernel.org # 3.10
Cc: Vaibhav Nagarnaik <vnagarnaik@google.com>
Cc: David Sharp <dhsharp@google.com>
Cc: Alexander Z Lam <lambchop468@gmail.com>
Signed-off-by: Alexander Z Lam <azl@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
609e85a70bcd0eedf4ec60639dbcfb1ab011e054 11-Jul-2013 Alexander Z Lam <azl@google.com> tracing: Fix error handling to ensure instances can always be removed

Remove debugfs directories for tracing instances during creation if an error
occurs causing the trace_array for that instance to not be added to
ftrace_trace_arrays. If the directory continues to exist after the error, it
cannot be removed because the respective trace_array is not in
ftrace_trace_arrays.

Link: http://lkml.kernel.org/r/1373502874-1706-2-git-send-email-azl@google.com

Cc: stable@vger.kernel.org # 3.10
Cc: Vaibhav Nagarnaik <vnagarnaik@google.com>
Cc: David Sharp <dhsharp@google.com>
Cc: Alexander Z Lam <lambchop468@gmail.com>
Signed-off-by: Alexander Z Lam <azl@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
991821c86c2fb6cc4104ce679247864dbc070a83 15-Jul-2013 zhangwei(Jovi) <jovi.zhangwei@huawei.com> tracing: Use correct config guard CONFIG_STACK_TRACER

We should use CONFIG_STACK_TRACER to guard readme text
of stack tracer related file, not CONFIG_STACKTRACE.

Link: http://lkml.kernel.org/r/51E3B3A2.8080609@huawei.com

Signed-off-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
dcc302232c1f9b3ca16f6b8ee190eb0b1a8a0da3 03-Jul-2013 Steven Rostedt (Red Hat) <rostedt@goodmis.org> tracing: Make tracing_open_generic_{tr,tc}() static

I have patches that will use tracing_open_generic_tr/tc() in other
files, but as they are not ready to be merged yet, and Fengguang Wu's
sparse scripts pointed out that these functions were not declared
anywhere, I'll make them static for now.

When these functions are required to be used elsewhere, I'll remove
the static then.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
8de1eb02778b64f8b292db531cf39a429f84315f 10-Apr-2013 zhangwei(Jovi) <jovi.zhangwei@huawei.com> tracing: Remove ftrace() function

The only caller of function ftrace(...) was removed a long time ago,
so remove the function body as well.

Link: http://lkml.kernel.org/r/1365564393-10972-10-git-send-email-jovi.zhangwei@huawei.com

Signed-off-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
5280bcef91e706770cc1706eb97353e3513322b9 03-Jul-2013 Steven Rostedt (Red Hat) <rostedt@goodmis.org> tracing: Make tracer_tracing_{off,on,is_on}() static

I have patches that will use tracer_tracing_on/off/is_on() in other
files, but as they are not ready to be merged yet, and Fengguang Wu's
sparse scripts pointed out that these functions were not declared
anywhere, I'll make them static for now.

When these functions are required to be used elsewhere, I'll remove
the static then.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
7b85af63034818e43aee6c1d7bf1c7c6796a9073 02-Jul-2013 Steven Rostedt (Red Hat) <rostedt@goodmis.org> tracing: Get trace_array ref counts when accessing trace files

When a trace file is opened that may access a trace array, it must
increment its ref count to prevent it from being deleted.

Cc: stable@vger.kernel.org # 3.10
Reported-by: Alexander Lam <azl@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
ff451961a8b2a17667a7bfa39c86fb9b351445db 02-Jul-2013 Steven Rostedt (Red Hat) <rostedt@goodmis.org> tracing: Add trace_array_get/put() to handle instance refs better

Commit a695cb58162 "tracing: Prevent deleting instances when they are being read"
tried to fix a race between deleting a trace instance and reading contents
of a trace file. But it wasn't good enough. The following could crash the kernel:

# cd /sys/kernel/debug/tracing/instances
# ( while :; do mkdir foo; rmdir foo; done ) &
# ( while :; do cat foo/trace &> /dev/null; done ) &

Luckily this can only be done by root user, but it should be fixed regardless.

The problem is that a delete of the file can happen after the reader starts
to open the file but before it grabs the trace_types_mutex.

The solution is to validate the trace array before using it. If the trace
array does not exist in the list of trace arrays, then it returns -ENODEV.

There's a possibility that a trace_array could be deleted and a new one
created and the open would open its file instead. But that is very minor as
it will just return the data of the new trace array, it may confuse the user
but it will not crash the system. As this can only be done by root anyway,
the race will only occur if root is deleting what its trying to read at
the same time.

Cc: stable@vger.kernel.org # 3.10
Reported-by: Alexander Lam <azl@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
a82274151af2b075163e3c42c828529dee311487 02-Jul-2013 Alexander Z Lam <azl@google.com> tracing: Protect ftrace_trace_arrays list in trace_events.c

There are multiple places where the ftrace_trace_arrays list is accessed in
trace_events.c without the trace_types_lock held.

Link: http://lkml.kernel.org/r/1372732674-22726-1-git-send-email-azl@google.com

Cc: Vaibhav Nagarnaik <vnagarnaik@google.com>
Cc: David Sharp <dhsharp@google.com>
Cc: Alexander Z Lam <lambchop468@gmail.com>
Cc: stable@vger.kernel.org # 3.10
Signed-off-by: Alexander Z Lam <azl@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2d71619c59fac95a5415a326162fa046161b938c 02-Jul-2013 Alexander Z Lam <azl@google.com> tracing: Make trace_marker use the correct per-instance buffer

The trace_marker file was present for each new instance created, but it
added the trace mark to the global trace buffer instead of to
the instance's buffer.

Link: http://lkml.kernel.org/r/1372717885-4543-2-git-send-email-azl@google.com

Cc: David Sharp <dhsharp@google.com>
Cc: Vaibhav Nagarnaik <vnagarnaik@google.com>
Cc: Alexander Z Lam <lambchop468@gmail.com>
Cc: stable@vger.kernel.org # 3.10
Signed-off-by: Alexander Z Lam <azl@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
10246fa35d4ffdfe472185d4cbf9c2dfd9a9f023 01-Jul-2013 Steven Rostedt (Red Hat) <rostedt@goodmis.org> tracing: Use flag buffer_disabled for irqsoff tracer

If the ring buffer is disabled and the irqsoff tracer records a trace it
will clear out its buffer and lose the data it had previously recorded.

Currently there's a callback when writing to the tracing_of file, but if
tracing is disabled via the function tracer trigger, it will not inform
the irqsoff tracer to stop recording.

By using the "mirror" flag (buffer_disabled) in the trace_array, that keeps
track of the status of the trace_array's buffer, it gives the irqsoff
tracer a fast way to know if it should record a new trace or not.
The flag may be a little behind the real state of the buffer, but it
should not affect the trace too much. It's more important for the irqsoff
tracer to be fast.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
de7edd31457b626e54a0b2a7e8ff4d65492f01ad 14-Jun-2013 Steven Rostedt (Red Hat) <rostedt@goodmis.org> tracing: Disable tracing on warning

Add a traceoff_on_warning option in both the kernel command line as well
as a sysctl option. When set, any WARN*() function that is hit will cause
the tracing_on variable to be cleared, which disables writing to the
ring buffer.

This is useful especially when tracing a bug with function tracing. When
a warning is hit, the print caused by the warning can flood the trace with
the functions that producing the output for the warning. This can make the
resulting trace useless by either hiding where the bug happened, or worse,
by overflowing the buffer and losing the trace of the bug totally.

Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
238ae93d699d59876b470bf6455de22bcfaa9a1b 26-May-2013 Wang YanQing <udknight@gmail.com> tracing: Fix file mode of free_buffer

Commit 4f271a2a60c748599b30bb4dafff30d770439b96
(tracing: Add a proc file to stop tracing and free buffer)
implement a method to free up ring buffer in kernel memory
in the release code path of free_buffer's fd.

Then we don't need read/write support for free_buffer,
indeed we just have a dummy write fop, and don't implement read fop.

So the 0200 is more reasonable file mode for free_buffer than
the current file mode 0644.

Link: http://lkml.kernel.org/r/20130526085201.GA3183@udknight

Acked-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
Acked-by: David Sharp <dhsharp@google.com>
Signed-off-by: Wang YanQing <udknight@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
58e8eedf18577c7eac722d5d1f190507ea263d1b 23-Apr-2013 Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com> tracing: Fix outputting formats of x86-tsc and counter when use trace_clock

Outputting formats of x86-tsc and counter should be a raw format, but after
applying the patch(2b6080f28c7cc3efc8625ab71495aae89aeb63a0), the format was
changed to nanosec. This is because the global variable trace_clock_id was used.
When we use multiple buffers, clock_id of each sub-buffer should be used. Then,
this patch uses tr->clock_id instead of the global variable trace_clock_id.

[ Basically, this fixes a regression where the multibuffer code changed the
trace_clock file to update tr->clock_id but the traces still use the old
global trace_clock_id variable, negating the file's effect. The global
trace_clock_id variable is obsolete and removed. - SR ]

Link: http://lkml.kernel.org/r/20130423013239.22334.7394.stgit@yunodevel

Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
f17a5194859a82afe4164e938b92035b86c55794 31-May-2013 Steven Rostedt (Red Hat) <rostedt@goodmis.org> tracing: Use current_uid() for critical time tracing

The irqsoff tracer records the max time that interrupts are disabled.
There are hooks in the assembly code that calls back into the tracer when
interrupts are disabled or enabled.

When they are enabled, the tracer checks if the amount of time they
were disabled is larger than the previous recorded max interrupts off
time. If it is, it creates a snapshot of the currently running trace
to store where the last largest interrupts off time was held and how
it happened.

During testing, this RCU lockdep dump appeared:

[ 1257.829021] ===============================
[ 1257.829021] [ INFO: suspicious RCU usage. ]
[ 1257.829021] 3.10.0-rc1-test+ #171 Tainted: G W
[ 1257.829021] -------------------------------
[ 1257.829021] /home/rostedt/work/git/linux-trace.git/include/linux/rcupdate.h:780 rcu_read_lock() used illegally while idle!
[ 1257.829021]
[ 1257.829021] other info that might help us debug this:
[ 1257.829021]
[ 1257.829021]
[ 1257.829021] RCU used illegally from idle CPU!
[ 1257.829021] rcu_scheduler_active = 1, debug_locks = 0
[ 1257.829021] RCU used illegally from extended quiescent state!
[ 1257.829021] 2 locks held by trace-cmd/4831:
[ 1257.829021] #0: (max_trace_lock){......}, at: [<ffffffff810e2b77>] stop_critical_timing+0x1a3/0x209
[ 1257.829021] #1: (rcu_read_lock){.+.+..}, at: [<ffffffff810dae5a>] __update_max_tr+0x88/0x1ee
[ 1257.829021]
[ 1257.829021] stack backtrace:
[ 1257.829021] CPU: 3 PID: 4831 Comm: trace-cmd Tainted: G W 3.10.0-rc1-test+ #171
[ 1257.829021] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007
[ 1257.829021] 0000000000000001 ffff880065f49da8 ffffffff8153dd2b ffff880065f49dd8
[ 1257.829021] ffffffff81092a00 ffff88006bd78680 ffff88007add7500 0000000000000003
[ 1257.829021] ffff88006bd78680 ffff880065f49e18 ffffffff810daebf ffffffff810dae5a
[ 1257.829021] Call Trace:
[ 1257.829021] [<ffffffff8153dd2b>] dump_stack+0x19/0x1b
[ 1257.829021] [<ffffffff81092a00>] lockdep_rcu_suspicious+0x109/0x112
[ 1257.829021] [<ffffffff810daebf>] __update_max_tr+0xed/0x1ee
[ 1257.829021] [<ffffffff810dae5a>] ? __update_max_tr+0x88/0x1ee
[ 1257.829021] [<ffffffff811002b9>] ? user_enter+0xfd/0x107
[ 1257.829021] [<ffffffff810dbf85>] update_max_tr_single+0x11d/0x12d
[ 1257.829021] [<ffffffff811002b9>] ? user_enter+0xfd/0x107
[ 1257.829021] [<ffffffff810e2b15>] stop_critical_timing+0x141/0x209
[ 1257.829021] [<ffffffff8109569a>] ? trace_hardirqs_on+0xd/0xf
[ 1257.829021] [<ffffffff811002b9>] ? user_enter+0xfd/0x107
[ 1257.829021] [<ffffffff810e3057>] time_hardirqs_on+0x2a/0x2f
[ 1257.829021] [<ffffffff811002b9>] ? user_enter+0xfd/0x107
[ 1257.829021] [<ffffffff8109550c>] trace_hardirqs_on_caller+0x16/0x197
[ 1257.829021] [<ffffffff8109569a>] trace_hardirqs_on+0xd/0xf
[ 1257.829021] [<ffffffff811002b9>] user_enter+0xfd/0x107
[ 1257.829021] [<ffffffff810029b4>] do_notify_resume+0x92/0x97
[ 1257.829021] [<ffffffff8154bdca>] int_signal+0x12/0x17

What happened was entering into the user code, the interrupts were enabled
and a max interrupts off was recorded. The trace buffer was saved along with
various information about the task: comm, pid, uid, priority, etc.

The uid is recorded with task_uid(tsk). But this is a macro that uses rcu_read_lock()
to retrieve the data, and this happened to happen where RCU is blind (user_enter).

As only the preempt and irqs off tracers can have this happen, and they both
only have the tsk == current, if tsk == current, use current_uid() instead of
task_uid(), as current_uid() does not use RCU as only current can change its uid.

This fixes the RCU suspicious splat.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
ca1643186d3dce6171d8f171e516b02496360a9e 23-May-2013 Steven Rostedt (Red Hat) <rostedt@goodmis.org> tracing: Fix crash when ftrace=nop on the kernel command line

If ftrace=<tracer> is on the kernel command line, when that tracer is
registered, it will be initiated by tracing_set_tracer() to execute that
tracer.

The nop tracer is just a stub tracer that is used to have no tracer
enabled. It is assigned at early bootup as it is the default tracer.

But if ftrace=nop is on the kernel command line, the registering of the
nop tracer will call tracing_set_tracer() which will try to execute
the nop tracer. But it expects tr->current_trace to be assigned something
as it usually is assigned to the nop tracer. As it hasn't been assigned
to anything yet, it causes the system to crash.

The simple fix is to move the tr->current_trace = nop before registering
the nop tracer. The functionality is still the same as the nop tracer
doesn't do anything anyway.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
6c24499f40d96bf07a85b709fb1bee5cea611a1d 30-Apr-2013 Steven Rostedt <rostedt@goodmis.org> tracing: Fix small merge bug

During the 3.10 merge, a conflict happened and the resolution was
almost, but not quite, correct. An if statement was reversed.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
[ Duh. That was just silly of me - Linus ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ed6f1c996bfe4b6e520cf7a74b51cd6988d84420 10-Apr-2013 Namhyung Kim <namhyung.kim@lge.com> tracing: Check return value of tracing_init_dentry()

Check return value and bail out if it's NULL.

Link: http://lkml.kernel.org/r/1365553093-10180-2-git-send-email-namhyung@kernel.org

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Namhyung Kim <namhyung.kim@lge.com>
Cc: stable@vger.kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
9607a869ee59594f3f7b9f3ac43a11d92bf3f960 07-Apr-2013 Chen Gang <gang.chen@asianux.com> kernel: tracing: Use strlcpy instead of strncpy

Use strlcpy() instead of strncpy() as it will always add a '\0'
to the end of the string even if the buffer is smaller than what
is being copied.

Link: http://lkml.kernel.org/r/51624254.30301@asianux.com

Signed-off-by: Chen Gang <gang.chen@asianux.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2930e04d00e113ae24bb2b7c2b58de7b648a62c7 26-Mar-2013 Steven Rostedt (Red Hat) <rostedt@goodmis.org> tracing: Fix race with update_max_tr_single and changing tracers

The commit 34600f0e9 "tracing: Fix race with max_tr and changing tracers"
fixed the updating of the main buffers with the race of changing
tracers, but left out the fix to the updating of just a per cpu buffer.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
67012ab1d2ce871afea4ee55408f233f97d09d07 07-Apr-2013 Chen Gang <gang.chen@asianux.com> perf: Fix strncpy() use, use strlcpy() instead of strncpy()

For NUL terminated string we always need to set '\0' at the end.

Signed-off-by: Chen Gang <gang.chen@asianux.com>
Cc: rostedt@goodmis.org
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Link: http://lkml.kernel.org/r/51624254.30301@asianux.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
22f45649ce08642ad7df238d5c25fa5c86bfdd31 15-Mar-2013 Steven Rostedt (Red Hat) <rostedt@goodmis.org> tracing: Update debugfs README file

Update the README file in debugfs/tracing to something more useful.
What's currently in the file is very old and what it shows doesn't
have much use. Heck, it tells you how to mount debugfs! But to read
this file you would have already needed to mount it.

Replace the file with current up-to-date information. It's rather
limited, but what do you expect from a pseudo README file.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
7fe70b579c9e3daba71635e31b6189394e7b79d3 15-Mar-2013 Steven Rostedt (Red Hat) <rostedt@goodmis.org> tracing: Fix ftrace_dump()

ftrace_dump() had a lot of issues. What ftrace_dump() does, is when
ftrace_dump_on_oops is set (via a kernel parameter or sysctl), it
will dump out the ftrace buffers to the console when either a oops,
panic, or a sysrq-z occurs.

This was written a long time ago when ftrace was fragile to recursion.
But it wasn't written well even for that.

There's a possible deadlock that can occur if a ftrace_dump() is happening
and an NMI triggers another dump. This is because it grabs a lock
before checking if the dump ran.

It also totally disables ftrace, and tracing for no good reasons.

As the ring_buffer now checks if it is read via a oops or NMI, where
there's a chance that the buffer gets corrupted, it will disable
itself. No need to have ftrace_dump() do the same.

ftrace_dump() is now cleaned up where it uses an atomic counter to
make sure only one dump happens at a time. A simple atomic_inc_return()
is enough that is needed for both other CPUs and NMIs. No need for
a spinlock, as if one CPU is running the dump, no other CPU needs
to do it too.

The tracing_on variable is turned off and not turned on. The original
code did this, but it wasn't pretty. By just disabling this variable
we get the result of not seeing traces that happen between crashes.

For sysrq-z, it doesn't get turned on, but the user can always write
a '1' to the tracing_on file. If they are using sysrq-z, then they should
know about tracing_on.

The new code is much easier to read and less error prone. No more
deadlock possibility when an NMI triggers here.

Reported-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Cc: stable@vger.kernel.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
bd6df18716fa45bc4aa9587aca033de909e5382b 11-Mar-2013 zhangwei(Jovi) <jovi.zhangwei@huawei.com> tracing: Use TRACE_MAX_PRINT instead of constant

TRACE_MAX_PRINT macro is defined, but is not used.

Link: http://lkml.kernel.org/r/513D8421.4070404@huawei.com

Signed-off-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
687c878afb526a0c3117dbc408ca76ad80d689f7 11-Mar-2013 zhangwei(Jovi) <jovi.zhangwei@huawei.com> tracing: Use pr_warn_once instead of open coded implementation

Use pr_warn_once, instead of making an open coded implementation.

Link: http://lkml.kernel.org/r/513D8419.20400@huawei.com

Signed-off-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
76f119179b8ce3188a8c61d2486d37810a416655 14-Mar-2013 Steven Rostedt (Red Hat) <rostedt@goodmis.org> tracing: Add "perf" trace_clock

The function trace_clock() calls "local_clock()" which is exactly
the same clock that perf uses. I'm not sure why perf doesn't call
trace_clock(), as trace_clock() doesn't have any users.

But now it does. As trace_clock() calls local_clock() like perf does,
I added the trace_clock "perf" option that uses trace_clock().

Now the ftrace buffers can use the same clock as perf uses. This
will be useful when perf starts reading the ftrace buffers, and will
be able to interleave them with the same clock data.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
8aacf017b065a805d27467843490c976835eb4a5 14-Mar-2013 Steven Rostedt (Red Hat) <rostedt@goodmis.org> tracing: Add "uptime" trace clock that uses jiffies

Add a simple trace clock called "uptime" for those that are
interested in the uptime of the trace. It uses jiffies as that's
the safest method, as other uptime clocks grab seq locks, which could
cause a deadlock if taken from an event or function tracer.

Requested-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
328df4759c03e2c3e7429cc6cb0e180c38f32063 14-Mar-2013 Steven Rostedt (Red Hat) <rostedt@goodmis.org> tracing: Add function-trace option to disable function tracing of latency tracers

Currently, the only way to stop the latency tracers from doing function
tracing is to fully disable the function tracer from the proc file
system:

echo 0 > /proc/sys/kernel/ftrace_enabled

This is a big hammer approach as it disables function tracing for
all users. This includes kprobes, perf, stack tracer, etc.

Instead, create a function-trace option that the latency tracers can
check to determine if it should enable function tracing or not.
This option can be set or cleared even while the tracer is active
and the tracers will disable or enable function tracing depending
on how the option was set.

Instead of using the proc file, disable latency function tracing with

echo 0 > /debug/tracing/options/function-trace

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Clark Williams <williams@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
c142be8ebe0b7bf73c8a0063925623f3e4b980c0 13-Mar-2013 Steven Rostedt (Red Hat) <rostedt@goodmis.org> tracing: Add skip argument to trace_dump_stack()

Altough the trace_dump_stack() already skips three functions in
the call to stack trace, which gets the stack trace to start
at the caller of the function, the caller may want to skip some
more too (as it may have helper functions).

Add a skip argument to the trace_dump_stack() that lets the caller
skip back tracing functions that it doesn't care about.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
77fd5c15e3216b901be69047ca43b05ae9099951 12-Mar-2013 Steven Rostedt (Red Hat) <rostedt@goodmis.org> tracing: Add snapshot trigger to function probes

echo 'schedule:snapshot:1' > /debug/tracing/set_ftrace_filter

This will cause the scheduler to trigger a snapshot the next time
it's called (you can use any function that's not called by NMI).

Even though it triggers only once, you still need to remove it with:

echo '!schedule:snapshot:0' > /debug/tracing/set_ftrace_filter

The :1 can be left off for the first command:

echo 'schedule:snapshot' > /debug/tracing/set_ftrace_filter

But this will cause all calls to schedule to trigger a snapshot.
This must be removed without the ':0'

echo '!schedule:snapshot' > /debug/tracing/set_ftrace_filter

As adding a "count" is a different operation (internally).

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
3209cff4490bee55fd2bc1d087cb8ecf2a686a88 12-Mar-2013 Steven Rostedt (Red Hat) <rostedt@goodmis.org> tracing: Add alloc/free_snapshot() to replace duplicate code

Add alloc_snapshot() and free_snapshot() to allocate and free the
snapshot buffer respectively, and use these to remove duplicate
code.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
1b22e382ab40b0e3ee5abb3e310dffb16fee22aa 09-Mar-2013 Steven Rostedt (Red Hat) <rostedt@goodmis.org> tracing: Let tracing_snapshot() be used by modules but not NMI

Add EXPORT_SYMBOL_GPL() to let the tracing_snapshot() functions be
called from modules.

Also add a test to see if the snapshot was called from NMI context
and just warn in the tracing buffer if so, and return.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
ca268da6e415448a43138e1abc5d5f057af319d7 09-Mar-2013 Steven Rostedt (Red Hat) <rostedt@goodmis.org> tracing: Add internal ftrace trace_puts() for ftrace to use

There's a few places that ftrace uses trace_printk() for internal
use, but this requires context (normal, softirq, irq, NMI) buffers
to keep things lockless. But the trace_puts() does not, as it can
write the string directly into the ring buffer. Make a internal helper
for trace_puts() and have the internal functions use that.

This way the extra context buffers are not used.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
09ae72348eccb60e304cf8ce94653f4a78fcd407 09-Mar-2013 Steven Rostedt (Red Hat) <rostedt@goodmis.org> tracing: Add trace_puts() for even faster trace_printk() tracing

The trace_printk() is extremely fast and is very handy as it can be
used in any context (including NMIs!). But it still requires scanning
the fmt string for parsing the args. Even the trace_bprintk() requires
a scan to know what args will be saved, although it doesn't copy the
format string itself.

Several times trace_printk() has no args, and wastes cpu cycles scanning
the fmt string.

Adding trace_puts() allows the developer to use an even faster
tracing method that only saves the pointer to the string in the
ring buffer without doing any format parsing at all. This will
help remove even more of the "Heisenbug" effect, when debugging.

Also fixed up the F_printk()s for the ftrace internal bprint and print events.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
55034cd6e648155393b0d665eef76b38d49ad6bf 08-Mar-2013 Steven Rostedt (Red Hat) <srostedt@redhat.com> tracing: Add alloc_snapshot kernel command line parameter

If debugging the kernel, and the developer wants to use
tracing_snapshot() in places where tracing_snapshot_alloc() may
be difficult (or more likely, the developer is lazy and doesn't
want to bother with tracing_snapshot_alloc() at all), then adding

alloc_snapshot

to the kernel command line parameter will tell ftrace to allocate
the snapshot buffer (if configured) when it allocates the main
tracing buffer.

I also noticed that ring_buffer_expanded and tracing_selftest_disabled
had inconsistent use of boolean "true" and "false" with "0" and "1".
I cleaned that up too.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
f4e781c0a89d5810729772290441ac7d61f321ec 07-Mar-2013 Steven Rostedt (Red Hat) <srostedt@redhat.com> tracing: Move the tracing selftest code into its own function

Move the tracing startup selftest code into its own function and
when not enabled, always have that function succeed.

This makes the register_tracer() function much more readable.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
ad909e21bbe69f1d39055d346540abd827190eca 07-Mar-2013 Steven Rostedt (Red Hat) <srostedt@redhat.com> tracing: Add internal tracing_snapshot() functions

The new snapshot feature is quite handy. It's a way for the user
to take advantage of the spare buffer that, until then, only
the latency tracers used to "snapshot" the buffer when it hit
a max latency. Now users can trigger a "snapshot" manually when
some condition is hit in a program. But a snapshot currently can
not be triggered by a condition inside the kernel.

With the addition of tracing_snapshot() and tracing_snapshot_alloc(),
snapshots can now be taking when a condition is hit, and the
developer wants to snapshot the case without stopping the trace.

Note, any snapshot will overwrite the old one, so take care
in how this is done.

These new functions are to be used like tracing_on(), tracing_off()
and trace_printk() are. That is, they should never be called
in the mainline Linux kernel. They are solely for the purpose
of debugging.

The tracing_snapshot() will not allocate a buffer, but it is
safe to be called from any context (except NMIs). But if a
snapshot buffer isn't allocated when it is called, it will write
to the live buffer, complaining about the lack of a snapshot
buffer, and then stop tracing (giving you the "permanent snapshot").

tracing_snapshot_alloc() will allocate the snapshot buffer if
it was not already allocated and then take the snapshot. This routine
*may sleep*, and must be called from context that can sleep.
The allocation is done with GFP_KERNEL and not atomic.

If you need a snapshot in an atomic context, say in early boot,
then it is best to call the tracing_snapshot_alloc() before then,
where it will allocate the buffer, and then you can use the
tracing_snapshot() anywhere you want and still get snapshots.

Cc: Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
a695cb5816228f86576f5f5c6809fdf8ed382ece 06-Mar-2013 Steven Rostedt (Red Hat) <srostedt@redhat.com> tracing: Prevent deleting instances when they are being read

Add a ref count to the trace_array structure and prevent removal
of instances that have open descriptors.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
121aaee7b0a82605d33af200c7e9ebab6fd6e444 06-Mar-2013 Steven Rostedt (Red Hat) <srostedt@redhat.com> tracing: Add per_cpu directory into tracing instances

Add the per_cpu directory to the created tracing instances:

cd /sys/kernel/debug/tracing/instances
mkdir foo
ls foo/per_cpu/cpu0
buffer_size_kb snapshot_raw trace trace_pipe_raw
snapshot stats trace_pipe

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
ce9bae55972b228cf7bac34350c4d2caf8ea0d0b 06-Mar-2013 Steven Rostedt (Red Hat) <srostedt@redhat.com> tracing: Add snapshot feature to instances

Add the "snapshot" file to the the multi-buffer instances.

cd /sys/kernel/debug/tracing/instances
mkdir foo
ls foo
buffer_size_kb buffer_total_size_kb events free_buffer set_event
snapshot trace trace_clock trace_marker trace_options trace_pipe
tracing_on
cat foo/snapshot
# tracer: nop
#
#
# * Snapshot is freed *
#
# Snapshot commands:
# echo 0 > snapshot : Clears and frees snapshot buffer
# echo 1 > snapshot : Allocates snapshot buffer, if not already allocated.
# Takes a snapshot of the main buffer.
# echo 2 > snapshot : Clears snapshot buffer (but does not allocate)
# (Doesn't have to be '2' works with any number that
# is not a '0' or '1')

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
737223fbca3b1c91feb947c7f571b35749b743b6 06-Mar-2013 Steven Rostedt (Red Hat) <srostedt@redhat.com> tracing: Consolidate buffer allocation code

There's a bit of duplicate code in creating the trace buffers for
the normal trace buffer and the max trace buffer among the instances
and the main global_trace. This code can be consolidated and cleaned
up a bit making the code cleaner and more readable as well as less
duplication.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
45ad21ca5530efdca6a19e4a5ac5e7bd6e24f996 06-Mar-2013 Steven Rostedt (Red Hat) <srostedt@redhat.com> tracing: Have trace_array keep track if snapshot buffer is allocated

The snapshot buffer belongs to the trace array not the tracer that is
running. The trace array should be the data structure that keeps track
of whether or not the snapshot buffer is allocated, not the tracer
desciptor. Having the trace array keep track of it makes modifications
so much easier.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
6de58e6269cd0568ca5fbae14423914eff0f7811 05-Mar-2013 Steven Rostedt (Red Hat) <srostedt@redhat.com> tracing: Add snapshot_raw to extract the raw data from snapshot

Add a 'snapshot_raw' per_cpu file that allows tools to read the raw
binary data of the snapshot buffer.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
f1affcaaa861f27752a769f889bf1486ebd301fe 05-Mar-2013 Steven Rostedt (Red Hat) <srostedt@redhat.com> tracing: Add snapshot in the per_cpu trace directories

Add the snapshot file into the per_cpu tracing directories to allow
them to be read for an individual cpu. This also allows to clear
an individual cpu from the snapshot buffer.

If the kernel allows it (CONFIG_RING_BUFFER_ALLOW_SWAP is set), then
echoing in '1' into one of the per_cpu snapshot files will do an
individual cpu buffer swap instead of the entire file.

Cc: Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
12883efb670c28dff57dcd7f4f995a1ffe153b2d 05-Mar-2013 Steven Rostedt (Red Hat) <srostedt@redhat.com> tracing: Consolidate max_tr into main trace_array structure

Currently, the way the latency tracers and snapshot feature works
is to have a separate trace_array called "max_tr" that holds the
snapshot buffer. For latency tracers, this snapshot buffer is used
to swap the running buffer with this buffer to save the current max
latency.

The only items needed for the max_tr is really just a copy of the buffer
itself, the per_cpu data pointers, the time_start timestamp that states
when the max latency was triggered, and the cpu that the max latency
was triggered on. All other fields in trace_array are unused by the
max_tr, making the max_tr mostly bloat.

This change removes the max_tr completely, and adds a new structure
called trace_buffer, that holds the buffer pointer, the per_cpu data
pointers, the time_start timestamp, and the cpu where the latency occurred.

The trace_array, now has two trace_buffers, one for the normal trace and
one for the max trace or snapshot. By doing this, not only do we remove
the bloat from the max_trace but the instances of traces can now use
their own snapshot feature and not have just the top level global_trace have
the snapshot feature and latency tracers for itself.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
873c642f5964b260480850040dec21e42d0ae4e4 05-Mar-2013 Steven Rostedt (Red Hat) <srostedt@redhat.com> tracing: Clear all trace buffers when unloaded module event was used

Currently we do not know what buffer a module event was enabled in.
On unload, it is safest to clear all buffer instances, not just the
top level buffer.

Todo: Clear only the buffer that the event was used in. The
infrastructure is there to do this, but it makes the code a bit
more complex. Lets get the current code vetted before we add that.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
15693458c4bc0693fd63a50d60f35b628fcf4e29 01-Mar-2013 Steven Rostedt (Red Hat) <rostedt@goodmis.org> tracing/ring-buffer: Move poll wake ups into ring buffer code

Move the logic to wake up on ring buffer data into the ring buffer
code itself. This simplifies the tracing code a lot and also has the
added benefit that waiters on one of the instance buffers can be woken
only when data is added to that instance instead of data added to
any instance.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
b627344fef0c38fa4e3050348e168e46db87c905 28-Feb-2013 Steven Rostedt <srostedt@redhat.com> tracing: Fix read blocking on trace_pipe_raw

If the ring buffer is empty, a read to trace_pipe_raw wont block.
The tracing code has the infrastructure to wake up waiting readers,
but the trace_pipe_raw doesn't take advantage of that.

When a read is done to trace_pipe_raw without the O_NONBLOCK flag
set, have the read block until there's data in the requested buffer.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
cc60cdc952be09bca5b0bff9fefc7aa6185c3049 28-Feb-2013 Steven Rostedt <srostedt@redhat.com> tracing: Fix polling on trace_pipe_raw

The trace_pipe_raw never implemented polling and this was casing
issues for several utilities. This is now implemented.

Blocked reads still are on the TODO list.

Reported-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Tested-by: Mauro Carvalho Chehab <mchehab@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
189e5784f6c5e001a84127b83f03bc76a8bfb1ec 01-Mar-2013 Steven Rostedt (Red Hat) <rostedt@goodmis.org> tracing: Do not block on splice if either file or splice NONBLOCK flag is set

Currently only the splice NONBLOCK flag is checked to determine if
the splice read should block or not. But the file descriptor NONBLOCK
flag also needs to be checked.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
0c8916c34203734d3b05953ebace52d7c2969f16 07-Aug-2012 Steven Rostedt <srostedt@redhat.com> tracing: Add rmdir to remove multibuffer instances

Add a method to the hijacked dentry descriptor of the
"instances" directory to allow for rmdir to remove an
instance of a multibuffer.

Example:

cd /debug/tracing/instances
mkdir hello
ls
hello/
rmdir hello
ls

Like the mkdir method, the i_mutex is dropped for the instances
directory. The instances directory is created at boot up and can
not be renamed or removed. The trace_types_lock mutex is used to
synchronize adding and removing of instances.

I've run several stress tests with different threads trying to
create and delete directories of the same name, and it has stood
up fine.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
277ba04461c2746cf935353474c0961161951b68 03-Aug-2012 Steven Rostedt <srostedt@redhat.com> tracing: Add interface to allow multiple trace buffers

Add the interface ("instances" directory) to add multiple buffers
to ftrace. To create a new instance, simply do a mkdir in the
instances directory:

This will create a directory with the following:

# cd instances
# mkdir foo
# ls foo
buffer_size_kb free_buffer trace_clock trace_pipe
buffer_total_size_kb set_event trace_marker tracing_enabled
events/ trace trace_options tracing_on

Currently only events are able to be set, and there isn't a way
to delete a buffer when one is created (yet).

Note, the i_mutex lock is dropped from the parent "instances"
directory during the mkdir operation. As the "instances" directory
can not be renamed or deleted (created on boot), I do not see
any harm in dropping the lock. The creation of the sub directories
is protected by trace_types_lock mutex, which only lets one
instance get into the code path at a time. If two tasks try to
create or delete directories of the same name, only one will occur
and the other will fail with -EEXIST.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
a7603ff4b5f7e26e67af82a4c3d05eeeb8d7b160 06-Aug-2012 Steven Rostedt <srostedt@redhat.com> tracing: Replace the static global per_cpu arrays with allocated per_cpu

The global and max-tr currently use static per_cpu arrays for the CPU data
descriptors. But in order to get new allocated trace_arrays, they need to
be allocated per_cpu arrays. Instead of using the static arrays, switch
the global and max-tr to use allocated data.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
ccb469a198cffac94a7eea0b69f715f06e2ddf15 02-Aug-2012 Steven Rostedt <srostedt@redhat.com> tracing: Pass the ftrace_file to the buffer lock reserve code

Pass the struct ftrace_event_file *ftrace_file to the
trace_event_buffer_lock_reserve() (new function that replaces the
trace_current_buffer_lock_reserver()).

The ftrace_file holds a pointer to the trace_array that is in use.
In the case of multiple buffers with different trace_arrays, this
allows different events to be recorded into different buffers.

Also fixed some of the stale comments in include/trace/ftrace.h

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2b6080f28c7cc3efc8625ab71495aae89aeb63a0 11-May-2012 Steven Rostedt <srostedt@redhat.com> tracing: Encapsulate global_trace and remove dependencies on global vars

The global_trace variable in kernel/trace/trace.c has been kept 'static' and
local to that file so that it would not be used too much outside of that
file. This has paid off, even though there were lots of changes to make
the trace_array structure more generic (not depending on global_trace).

Removal of a lot of direct usages of global_trace is needed to be able to
create more trace_arrays such that we can add multiple buffers.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
ae3b5093ad6004b52e2825f3db1ad8200a2724d8 23-Jan-2013 Steven Rostedt <srostedt@redhat.com> tracing: Use RING_BUFFER_ALL_CPUS for TRACE_PIPE_ALL_CPU

Both RING_BUFFER_ALL_CPUS and TRACE_PIPE_ALL_CPU are defined as
-1 and used to say that all the ring buffers are to be modified
or read (instead of just a single cpu, which would be >= 0).

There's no reason to keep TRACE_PIPE_ALL_CPU as it is also started
to be used for more than what it was created for, and now that
the ring buffer code added a generic RING_BUFFER_ALL_CPUS define,
we can clean up the trace code to use that instead and remove
the TRACE_PIPE_ALL_CPU macro.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
ae63b31e4d0e2ec09c569306ea46f664508ef717 04-May-2012 Steven Rostedt <srostedt@redhat.com> tracing: Separate out trace events from global variables

The trace events for ftrace are all defined via global variables.
The arrays of events and event systems are linked to a global list.
This prevents multiple users of the event system (what to enable and
what not to).

By adding descriptors to represent the event/file relation, as well
as to which trace_array descriptor they are associated with, allows
for more than one set of events to be defined. Once the trace events
files have a link between the trace event and the trace_array they
are associated with, we can create multiple trace_arrays that can
record separate events in separate buffers.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
613f04a0f51e6e68ac6fe571ab79da3c0a5eb4da 14-Mar-2013 Steven Rostedt (Red Hat) <rostedt@goodmis.org> tracing: Prevent buffer overwrite disabled for latency tracers

The latency tracers require the buffers to be in overwrite mode,
otherwise they get screwed up. Force the buffers to stay in overwrite
mode when latency tracers are enabled.

Added a flag_changed() method to the tracer structure to allow
the tracers to see what flags are being changed, and also be able
to prevent the change from happing.

Cc: stable@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
80902822658aab18330569587cdb69ac1dfdcea8 14-Mar-2013 Steven Rostedt (Red Hat) <rostedt@goodmis.org> tracing: Keep overwrite in sync between regular and snapshot buffers

Changing the overwrite mode for the ring buffer via the trace
option only sets the normal buffer. But the snapshot buffer could
swap with it, and then the snapshot would be in non overwrite mode
and the normal buffer would be in overwrite mode, even though the
option flag states otherwise.

Keep the two buffers overwrite modes in sync.

Cc: stable@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
69d34da2984c95b33ea21518227e1f9470f11d95 14-Mar-2013 Steven Rostedt (Red Hat) <rostedt@goodmis.org> tracing: Protect tracer flags with trace_types_lock

Seems that the tracer flags have never been protected from
synchronous writes. Luckily, admins don't usually modify the
tracing flags via two different tasks. But if scripts were to
be used to modify them, then they could get corrupted.

Move the trace_types_lock that protects against tracers changing
to also protect the flags being set.

Cc: stable@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2721e72dd10f71a3ba90f59781becf02638aa0d9 12-Mar-2013 Steven Rostedt (Red Hat) <rostedt@goodmis.org> tracing: Fix race in snapshot swapping

Although the swap is wrapped with a spin_lock, the assignment
of the temp buffer used to swap is not within that lock.
It needs to be moved into that lock, otherwise two swaps
happening on two different CPUs, can end up using the wrong
temp buffer to assign in the swap.

Luckily, all current callers of the swap function appear to have
their own locks. But in case something is added that allows two
different callers to call the swap, then there's a chance that
this race can trigger and corrupt the buffers.

New code is coming soon that will allow for this race to trigger.

I've Cc'd stable, so this bug will not show up if someone backports
one of the changes that can trigger this bug.

Cc: stable@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
c9960e48543799f168c4c9486f9790fb686ce5a8 05-Mar-2013 Steven Rostedt (Red Hat) <srostedt@redhat.com> tracing: Do not return EINVAL in snapshot when not allocated

To use the tracing snapshot feature, writing a '1' into the snapshot
file causes the snapshot buffer to be allocated if it has not already
been allocated and dose a 'swap' with the main buffer, so that the
snapshot now contains what was in the main buffer, and the main buffer
now writes to what was the snapshot buffer.

To free the snapshot buffer, a '0' is written into the snapshot file.

To clear the snapshot buffer, any number but a '0' or '1' is written
into the snapshot file. But if the file is not allocated it returns
-EINVAL error code. This is rather pointless. It is better just to
do nothing and return success.

Acked-by: Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
d8741e2e88ac9a458765a0c7b4e6542d7c038334 05-Mar-2013 Steven Rostedt (Red Hat) <srostedt@redhat.com> tracing: Add help of snapshot feature when snapshot is empty

When cat'ing the snapshot file, instead of showing an empty trace
header like the trace file does, show how to use the snapshot
feature.

Also, this is a good place to show if the snapshot has been allocated
or not. Users may want to "pre allocate" the snapshot to have a fast
"swap" of the current buffer. Otherwise, a swap would be slow and might
fail as it would need to allocate the snapshot buffer, and that might
fail under tight memory constraints.

Here's what it looked like before:

# tracer: nop
#
# entries-in-buffer/entries-written: 0/0 #P:4
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |

Here's what it looks like now:

# tracer: nop
#
#
# * Snapshot is freed *
#
# Snapshot commands:
# echo 0 > snapshot : Clears and frees snapshot buffer
# echo 1 > snapshot : Allocates snapshot buffer, if not already allocated.
# Takes a snapshot of the main buffer.
# echo 2 > snapshot : Clears snapshot buffer (but does not allocate)
# (Doesn't have to be '2' works with any number that
# is not a '0' or '1')

Acked-by: Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
8bd75c77b7c6a3954140dd2e20346aef3efe4a35 07-Feb-2013 Clark Williams <williams@redhat.com> sched/rt: Move rt specific bits into new header file

Move rt scheduler definitions out of include/linux/sched.h into
new file include/linux/sched/rt.h

Signed-off-by: Clark Williams <williams@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20130207094707.7b9f825f@riff.lan
Signed-off-by: Ingo Molnar <mingo@kernel.org>
d840f718d28715a9833c1a8f46c2493ff3fd219b 02-Feb-2013 Steven Rostedt (Red Hat) <rostedt@goodmis.org> tracing: Init current_trace to nop_trace and remove NULL checks

On early boot up, when the ftrace ring buffer is initialized, the
static variable current_trace is initialized to &nop_trace.
Before this initialization, current_trace is NULL and will never
become NULL again. It is always reassigned to a ftrace tracer.

Several places check if current_trace is NULL before it uses
it, and this check is frivolous, because at the point in time
when the checks are made the only way current_trace could be
NULL is if ftrace failed its allocations at boot up, and the
paths to these locations would probably not be possible.

By initializing current_trace to &nop_trace where it is declared,
current_trace will never be NULL, and we can remove all these
checks of current_trace being NULL which never needed to be
checked in the first place.

Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
debdd57f5145f3c6a4b3f8d0126abd1a2def7fc6 26-Dec-2012 Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com> tracing: Make a snapshot feature available from userspace

Ftrace has a snapshot feature available from kernel space and
latency tracers (e.g. irqsoff) are using it. This patch enables
user applictions to take a snapshot via debugfs.

Add "snapshot" debugfs file in "tracing" directory.

snapshot:
This is used to take a snapshot and to read the output of the
snapshot.

# echo 1 > snapshot

This will allocate the spare buffer for snapshot (if it is
not allocated), and take a snapshot.

# cat snapshot

This will show contents of the snapshot.

# echo 0 > snapshot

This will free the snapshot if it is allocated.

Any other positive values will clear the snapshot contents if
the snapshot is allocated, or return EINVAL if it is not allocated.

Link: http://lkml.kernel.org/r/20121226025300.3252.86850.stgit@liselsia

Cc: Jiri Olsa <jolsa@redhat.com>
Cc: David Sharp <dhsharp@google.com>
Signed-off-by: Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com>
[
Fixed irqsoff selftest and also a conflict with a change
that fixes the update_max_tr.
]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2fd196ec1eab2623096e7fc7e6f3976160392bce 26-Dec-2012 Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com> tracing: Replace static old_tracer check of tracer name

Currently the trace buffer read functions use a static variable
"old_tracer" for detecting if the current tracer changes. This
was suitable for a single trace file ("trace"), but to add a
snapshot feature that will use the same function for its file,
a check against a static variable is not sufficient.

To use the output functions for two different files, instead of
storing the current tracer in a static variable, as the trace
iterator descriptor contains a pointer to the original current
tracer's name, that pointer can now be used to check if the
current tracer has changed between different reads of the trace
file.

Link: http://lkml.kernel.org/r/20121226025252.3252.9276.stgit@liselsia

Signed-off-by: Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
ad964704ba9326d027fc10fd0099b7c880e50172 29-Jan-2013 Steven Rostedt (Red Hat) <rostedt@goodmis.org> ring-buffer: Add stats field for amount read from trace ring buffer

Add a stat about the number of events read from the ring buffer:

# cat /debug/tracing/per_cpu/cpu0/stats
entries: 39869
overrun: 870512
commit overrun: 0
bytes: 1449912
oldest event ts: 6561.368690
now ts: 6565.246426
dropped events: 0
read events: 112 <-- Added

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
38dbe0b137bfe6ea92be495017885c0785179a02 25-Jan-2013 Jovi Zhang <bookjovi@gmail.com> tracing: Remove second iterator initializer

The trace iterator is already initialized by trace_init_global_iter(),
so there is no need to initialize it again.

Link: http://lkml.kernel.org/r/CACV3sb+G1YnO6168JhY3dEadmJi58pA5-2cSZT8E0WVHJNFt9Q@mail.gmail.com

Signed-off-by: Jovi Zhang <bookjovi@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
821465295b36136998ef294fe176fba4e09c1cd9 19-Nov-2012 Shan Wei <davidshan@tencent.com> tracing: Use __this_cpu_inc/dec operation instead of __get_cpu_var

__this_cpu_inc_return() or __this_cpu_dec generates a single instruction,
which is faster than __get_cpu_var operation.

Link: http://lkml.kernel.org/r/50A9C1BD.1060308@gmail.com

Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Shan Wei <davidshan@tencent.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
b736f48bda54ec75b7dc9306884c3843f1a78a0a 19-Nov-2012 Josh Triplett <josh@joshtriplett.org> tracing: Mark tracing_dentry_percpu() static

Nothing outside of kernel/trace/trace.c references tracing_dentry_percpu().

Link: http://lkml.kernel.org/r/1353302917-13995-7-git-send-email-josh@joshtriplett.org

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
34600f0e9c33c9cd48ae87448205f51332b7d5a0 22-Jan-2013 Steven Rostedt <srostedt@redhat.com> tracing: Fix race with max_tr and changing tracers

There's a race condition between the setting of a new tracer and
the update of the max trace buffers (the swap). When a new tracer
is added, it sets current_trace to nop_trace before disabling
the old tracer. At this moment, if the old tracer uses update_max_tr(),
the update may trigger the warning against !current_trace->use_max-tr,
as nop_trace doesn't have that set.

As update_max_tr() requires that interrupts be disabled, we can
add a check to see if current_trace == nop_trace and bail if it
does. Then when disabling the current_trace, set it to nop_trace
and run synchronize_sched(). This will make sure all calls to
update_max_tr() have completed (it was called with interrupts disabled).

As a clean up, this commit also removes shrinking and recreating
the max_tr buffer if the old and new tracers both have use_max_tr set.
The old way use to always shrink the buffer, and then expand it
for the next tracer. This is a waste of time.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
b000c8065a92b0fe0e1694f41b2c8d8ba7b7b1ec 18-Jan-2013 Steven Rostedt <srostedt@redhat.com> tracing: Remove the extra 4 bytes of padding in events

Due to a userspace issue with PowerTop v2beta, which hardcoded
the offset of event fields that it was using, it broke when
we removed the Big Kernel Lock counter from the event header.

(commit e6e1e2593 "tracing: Remove lock_depth from event entry")

Because this broke userspace, it was determined that we must
keep those 4 bytes around.

(commit a3a4a5acd "Regression: partial revert "tracing: Remove lock_depth from event entry"")

This unfortunately wastes space in the ring buffer. 4 bytes per
event, where a lot of events are just 24 bytes. That's 16% of the
buffer wasted. A million events will add 4 megs of white space
into the buffer.

It was later noticed that PowerTop v2beta could not work on systems
where the kernel was 64 bit but the userspace was 32 bits.
The reason was because the offsets are different between the
two and the hard coded offset of one would not work with the other.

With PowerTop v2 final, it implemented the same interface that both
perf and trace-cmd use. That is, it reads the format file of
the event to find the offsets of the fields it needs. This fixes
the problem with running powertop on a 32 bit userspace running
on a 64 bit kernel. It also no longer requires the 4 byte padding.

As PowerTop v2 has been out for a while, and is included in all
major distributions, it is time that we can safely remove the
4 bytes of padding. Users of PowerTop v2beta should upgrade to
PowerTop v2 final.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
84c6cf0db6a00601eb43cfc08244a398ffb0894c 21-Dec-2012 Steven Rostedt <srostedt@redhat.com> tracing: Remove unneeded check of max_tr->buffer before tracing_reset

There's now a check in tracing_reset_online_cpus() if the buffer is
allocated or NULL. No need to do a check before calling it with max_tr.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
a54164114b96b4693b42cdb553260eec41ea4393 19-Dec-2012 Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com> tracing: Add checks if tr->buffer is NULL in tracing_reset{_online_cpus}

max_tr->buffer could be NULL in the tracing_reset{_online_cpus}. In this
case, a NULL pointer dereference happens, so we should return immediately
from these functions.

Note, the current code does not call tracing_reset*() with max_tr when
its buffer is NULL, but future code will. This patch is needed to prevent
the future code from crashing.

Link: http://lkml.kernel.org/r/20121219070234.31200.93863.stgit@liselsia

Signed-off-by: Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
d8a0349c0cea477322c66ea9362f10c62fad5f62 13-Nov-2012 Shan Wei <davidshan@tencent.com> tracing: Use this_cpu_ptr per-cpu helper

typeof(&buffer) is a pointer to array of 1024 char, or char (*)[1024].
But, typeof(&buffer[0]) is a pointer to char which match the return type of get_trace_buf().
As well-known, the value of &buffer is equal to &buffer[0].
so return this_cpu_ptr(&percpu_buffer->buffer[0]) can avoid type cast.

Link: http://lkml.kernel.org/r/50A1A800.3020102@gmail.com

Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Shan Wei <davidshan@tencent.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
250bfd3d8e7e19cb649dd94689f0af2ce3474060 14-Jan-2013 Liu Bo <bo.li.liu@oracle.com> tracing: Fix regression of trace_pipe

Commit 0fb9656d "tracing: Make tracing_enabled be equal to tracing_on"
changes the behaviour of trace_pipe, ie. it makes trace_pipe return if
we've read something and tracing is enabled, and this means that we have
to 'cat trace_pipe' again and again while running tests.

IMO the right way is if tracing is enabled, we always block and wait for
ring buffer, or we may lose what we want since ring buffer's size is limited.

Link: http://lkml.kernel.org/r/1358132051-5410-1-git-send-email-bo.li.liu@oracle.com

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2df8f8a6a897ebf4c5613b5be6103d33b2a21520 11-Jan-2013 Steven Rostedt <srostedt@redhat.com> tracing: Fix regression with irqsoff tracer and tracing_on file

Commit 02404baf1b47 "tracing: Remove deprecated tracing_enabled file"
removed the tracing_enabled file as it never worked properly and
the tracing_on file should be used instead. But the tracing_on file
didn't call into the tracers start/stop routines like the
tracing_enabled file did. This caused trace-cmd to break when it
enabled the irqsoff tracer.

If you just did "echo irqsoff > current_tracer" then it would work
properly. But the tool trace-cmd disables tracing first by writing
"0" into the tracing_on file. Then it writes "irqsoff" into
current_tracer and then writes "1" into tracing_on. Unfortunately,
the above commit changed the irqsoff tracer to check the tracing_on
status instead of the tracing_enabled status. If it's disabled then
it does not start the tracer internals.

The problem is that writing "1" into tracing_on does not call the
tracers "start" routine like writing "1" into tracing_enabled did.
This makes the irqsoff tracer not start when using the trace-cmd
tool, and is a regression for userspace.

Simple fix is to have the tracing_on file call the tracers start()
method when being enabled (and the stop() method when disabled).

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
a8dd2176a8e988e3744e863ac39647a6f59fa900 10-Jan-2013 Steven Rostedt <srostedt@redhat.com> tracing: Fix regression of trace_options file setting

The latest change to allow trace options to be set on the command
line also broke the trace_options file.

The zeroing of the last byte of the option name that is echoed into
the trace_option file was removed with the consolidation of some
of the code. The compare between the option and what was written to
the trace_options file fails because the string holding the data
written doesn't terminate with a null character.

A zero needs to be added to the end of the string copied from
user space.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
6d49e352ae9aed3f599041b0c0389aa924815f14 06-Dec-2012 Nadia Yvette Chambers <nyc@holomorphy.com> propagate name change to comments in kernel source

I've legally changed my name with New York State, the US Social Security
Administration, et al. This patch propagates the name change and change
in initials and login to comments in the kernel source as well.

Signed-off-by: Nadia Yvette Chambers <nyc@holomorphy.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
bf3071f5a054db9e5bab873355d27a7330ce5187 25-Jul-2012 Dave Jones <davej@redhat.com> tracing: Remove unnecessary WARN_ONCE's from tracing_buffers_splice_read

WARN shouldn't be used as a means of communicating failure to a userspace programmer.

Link: http://lkml.kernel.org/r/20120725153908.GA25203@redhat.com

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
d60da506cbeb3f1907a740547dd7ef04a93e908e 17-Oct-2012 Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com> tracing: Add a resize function to make one buffer equivalent to another buffer

Trace buffer size is now per-cpu, so that there are the following two
patterns in resizing of buffers.

(1) resize per-cpu buffers to same given size
(2) resize per-cpu buffers to another trace_array's buffer size
for each CPU (such as preparing the max_tr which is equivalent
to the global_trace's size)

__tracing_resize_ring_buffer() can be used for (1), and had
implemented (2) inside it for resetting the global_trace to the
original size.

(2) was also implemented in another place. So this patch assembles
them in a new function - resize_buffer_duplicate_size().

Link: http://lkml.kernel.org/r/20121017025616.2627.91226.stgit@falsita

Signed-off-by: Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
11043d8b125671a32253cddb0b05177be0e976f6 13-Nov-2012 Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com> tracing: Show raw time stamp on stats per cpu using counter or tsc mode for trace_clock

Show raw time stamp values for stats per cpu if you choose counter or tsc mode
for trace_clock. Although a unit of tracing time stamp is nsec in local or global mode,
the units in counter and TSC mode are tracing counter and cycles respectively.
Link: http://lkml.kernel.org/r/1352837903-32191-3-git-send-email-dhsharp@google.com

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>
Signed-off-by: David Sharp <dhsharp@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
8be0709f10e3dd5d7d07933ad61a9f18c4b93ca5 13-Nov-2012 David Sharp <dhsharp@google.com> tracing: Format non-nanosec times from tsc clock without a decimal point.

With the addition of the "tsc" clock, formatting timestamps to look like
fractional seconds is misleading. Mark clocks as either in nanoseconds or
not, and format non-nanosecond timestamps as decimal integers.

Tested:
$ cd /sys/kernel/debug/tracing/
$ cat trace_clock
[local] global tsc
$ echo sched_switch > set_event
$ echo 1 > tracing_on ; sleep 0.0005 ; echo 0 > tracing_on
$ cat trace
<idle>-0 [000] 6330.555552: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=29964 next_prio=120
sleep-29964 [000] 6330.555628: sched_switch: prev_comm=bash prev_pid=29964 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
...
$ echo 1 > options/latency-format
$ cat trace
<idle>-0 0 4104553247us+: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=29964 next_prio=120
sleep-29964 0 4104553322us+: sched_switch: prev_comm=bash prev_pid=29964 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
...
$ echo tsc > trace_clock
$ cat trace
$ echo 1 > tracing_on ; sleep 0.0005 ; echo 0 > tracing_on
$ echo 0 > options/latency-format
$ cat trace
<idle>-0 [000] 16490053398357: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=31128 next_prio=120
sleep-31128 [000] 16490053588518: sched_switch: prev_comm=bash prev_pid=31128 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
...
echo 1 > options/latency-format
$ cat trace
<idle>-0 0 91557653238+: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=31128 next_prio=120
sleep-31128 0 91557843399+: sched_switch: prev_comm=bash prev_pid=31128 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120
...

v2:
Move arch-specific bits out of generic code.
v4:
Fix x86_32 build due to 64-bit division.

Google-Bug-Id: 6980623
Link: http://lkml.kernel.org/r/1352837903-32191-2-git-send-email-dhsharp@google.com

Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: David Sharp <dhsharp@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
8cbd9cc6254065c97c4bac42daa55ba1abe73a8e 13-Nov-2012 David Sharp <dhsharp@google.com> tracing,x86: Add a TSC trace_clock

In order to promote interoperability between userspace tracers and ftrace,
add a trace_clock that reports raw TSC values which will then be recorded
in the ring buffer. Userspace tracers that also record TSCs are then on
exactly the same time base as the kernel and events can be unambiguously
interlaced.

Tested: Enabled a tracepoint and the "tsc" trace_clock and saw very large
timestamp values.

v2:
Move arch-specific bits out of generic code.
v3:
Rename "x86-tsc", cleanups
v7:
Generic arch bits in Kbuild.

Google-Bug-Id: 6980623
Link: http://lkml.kernel.org/r/1352837903-32191-1-git-send-email-dhsharp@google.com

Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@linux.intel.com>
Signed-off-by: David Sharp <dhsharp@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
7bcfaf54f591a0775254c4ea679faf615152ee3a 02-Nov-2012 Steven Rostedt <srostedt@redhat.com> tracing: Add trace_options kernel command line parameter

Add trace_options to the kernel command line parameter to be able to
set options at early boot. For example, to enable stack dumps of
events, add the following:

trace_options=stacktrace

This along with the trace_event option, you can get not only
traces of the events but also the stack dumps with them.

Requested-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
0d5c6e1c19bab82fad4837108c2902f557d62a04 02-Nov-2012 Steven Rostedt <srostedt@redhat.com> tracing: Use irq_work for wake ups and remove *_nowake_*() functions

Have the ring buffer commit function use the irq_work infrastructure to
wake up any waiters waiting on the ring buffer for new data. The irq_work
was created for such a purpose, where doing the actual wake up at the
time of adding data is too dangerous, as an event or function trace may
be in the midst of the work queue locks and cause deadlocks. The irq_work
will either delay the action to the next timer interrupt, or trigger an IPI
to itself forcing an interrupt to do the work (in a safe location).

With irq_work, all ring buffer commits can safely do wakeups, removing
the need for the ring buffer commit "nowake" variants, which were used
by events and function tracing. All commits can now safely use the
normal commit, and the "nowake" variants can be removed.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
02404baf1b47123f1c88c9f9f1f3b00e1e2b10db 01-Nov-2012 Steven Rostedt <srostedt@redhat.com> tracing: Remove deprecated tracing_enabled file

The tracing_enabled file was used as a quick way to stop
tracers, and try to bring down overhead for things like
the latency tracers (irqsoff, wakeup, etc). But it didn't
work that well.

The tracing_on file was created as a really fast way to
stop recording into the ftrace ring buffer and can interact
with the kernel. That is a tracing_off() call in the kernel
can disable recording of events, and then from userspace one
could echo 1 into the tracing_on file to continue it. The
tracing_enabled function did too much to allow for this.

The tracing_on has taken over as a way to start and stop tracing
and the tracing_enabled file should not be used. But because of
its existance, it still confuses people. Over a year ago the
following commit was added:

commit 6752ab4a9c30d5411b2dfdb251a3f1cb18aae487
Author: Steven Rostedt <srostedt@redhat.com>
Date: Tue Feb 8 13:54:06 2011 -0500

tracing: Deprecate tracing_enabled for tracing_on

This commit added a WARN_ON() if the tracing_enabled file's variable
was changed. After this was added, only LatencyTop complained, and
they soon fixed their tool as there was no reason that LatencyTop
should touch this file as it was using the perf ring buffers which
this file does not interact with. But since that time no one else
has complained about this WARN_ON(). Thus it is safe to assume that
this file is no longer needed. Time to get rid of it.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
0fb9656d957d79dbe7ae155bb6533b1d465e4a50 11-May-2012 Steven Rostedt <srostedt@redhat.com> tracing: Make tracing_enabled be equal to tracing_on

The tracing_enabled file has been deprecated as it never was able
to serve its purpose well. The tracing_on file has taken over.
Instead of having code to keep tracing_enabled, have the tracing_enabled
file just set tracing_on, and remove the tracing_enabled variable.

This allows us to remove the tracing_enabled file. The reason that
the remove is in a different change set and not removed here is
in case we find some lonely userspace tool that requires the file
to exist. Then the removal patch will get reverted, but this one
will not.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
c7b84ecada9a8b7fe3e6c081e70801703897ed5d 12-May-2012 Steven Rostedt <srostedt@redhat.com> tracing: Remove unused function unregister_tracer()

The function register_tracer() is only used by kernel core code,
that never needs to remove the tracer. As trace_events have become
the main way to add new tracing to the kernel, the need to
unregister a tracer has diminished. Remove the unused function
unregister_tracer(). If a need arises where we need it, then we
can always add it back.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
60303ed3f4b9332b9aa9bc17c68bc174e7343e2d 12-Oct-2012 David Sharp <dhsharp@google.com> tracing: Reset ring buffer when changing trace_clocks

Because the "tsc" clock isn't in nanoseconds, the ring buffer must be
reset when changing clocks so that incomparable timestamps don't end up
in the same trace.

Tested: Confirmed switching clocks resets the trace buffer.

Google-Bug-Id: 6980623
Link: http://lkml.kernel.org/r/1349998076-15495-3-git-send-email-dhsharp@google.com

Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: David Sharp <dhsharp@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
7ffbd48d5cab22bcd1120eb2349db1319e2d827a 11-Oct-2012 Steven Rostedt <srostedt@redhat.com> tracing: Cache comms only after an event occurred

Whenever an event is registered, the comm of tasks are saved at
every task switch instead of saving them at every event. But if
an event isn't executed much, the comm cache will be filled up
by tasks that did not record the event and you lose out on the comms
that did.

Here's an example, if you enable the following events:

echo 1 > /debug/tracing/events/kvm/kvm_cr/enable
echo 1 > /debug/tracing/events/net/net_dev_xmit/enable

Note, there's no kvm running on this machine so the first event will
never be triggered, but because it is enabled, the storing of comms
will continue. If we now disable the network event:

echo 0 > /debug/tracing/events/net/net_dev_xmit/enable

and look at the trace:

cat /debug/tracing/trace
sshd-2672 [001] ..s2 375.731616: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=242 rc=0
sshd-2672 [001] ..s1 375.731617: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=242 rc=0
sshd-2672 [001] ..s2 375.859356: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=242 rc=0
sshd-2672 [001] ..s1 375.859357: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=242 rc=0
sshd-2672 [001] ..s2 375.947351: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=242 rc=0
sshd-2672 [001] ..s1 375.947352: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=242 rc=0
sshd-2672 [001] ..s2 376.035383: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=242 rc=0
sshd-2672 [001] ..s1 376.035383: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=242 rc=0
sshd-2672 [001] ..s2 377.563806: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=226 rc=0
sshd-2672 [001] ..s1 377.563807: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=226 rc=0
sshd-2672 [001] ..s2 377.563834: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6be0 len=114 rc=0
sshd-2672 [001] ..s1 377.563842: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6be0 len=114 rc=0

We see that process 2672 which triggered the events has the comm "sshd".
But if we run hackbench for a bit and look again:

cat /debug/tracing/trace
<...>-2672 [001] ..s2 375.731616: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=242 rc=0
<...>-2672 [001] ..s1 375.731617: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=242 rc=0
<...>-2672 [001] ..s2 375.859356: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=242 rc=0
<...>-2672 [001] ..s1 375.859357: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=242 rc=0
<...>-2672 [001] ..s2 375.947351: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=242 rc=0
<...>-2672 [001] ..s1 375.947352: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=242 rc=0
<...>-2672 [001] ..s2 376.035383: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=242 rc=0
<...>-2672 [001] ..s1 376.035383: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=242 rc=0
<...>-2672 [001] ..s2 377.563806: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=226 rc=0
<...>-2672 [001] ..s1 377.563807: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=226 rc=0
<...>-2672 [001] ..s2 377.563834: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6be0 len=114 rc=0
<...>-2672 [001] ..s1 377.563842: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6be0 len=114 rc=0

The stored "sshd" comm has been flushed out and we get a useless "<...>".

But by only storing comms after a trace event occurred, we can run
hackbench all day and still get the same output.

Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
81698831bc462ff16f76bc11249a1e492424da4c 11-Oct-2012 Steven Rostedt <srostedt@redhat.com> tracing: Enable comm recording if trace_printk() is used

If comm recording is not enabled when trace_printk() is used then
you just get this type of output:

[ adding trace_printk("hello! %d", irq); in do_IRQ ]

<...>-2843 [001] d.h. 80.812300: do_IRQ: hello! 14
<...>-2734 [002] d.h2 80.824664: do_IRQ: hello! 14
<...>-2713 [003] d.h. 80.829971: do_IRQ: hello! 14
<...>-2814 [000] d.h. 80.833026: do_IRQ: hello! 14

By enabling the comm recorder when trace_printk is enabled:

hackbench-6715 [001] d.h. 193.233776: do_IRQ: hello! 21
sshd-2659 [001] d.h. 193.665862: do_IRQ: hello! 21
<idle>-0 [001] d.h1 193.665996: do_IRQ: hello! 21

Suggested-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
b382ede6b5eb8188926b72a9ef42fd2354342a97 11-Oct-2012 Steven Rostedt <srostedt@redhat.com> tracing: Expand ring buffer when trace_printk() is used

Since tracing is not used by 99% of Linux users, even though tracing
may be configured in, it does not make sense to allocate 1.4 Megs
per CPU for the ring buffers if they are not used. Thus, on boot up
the ring buffers are set to a minimal size until something needs the
and they are expanded.

This works well for events and tracers (function, etc), but for the
asynchronous use of trace_printk() which can write to the ring buffer
at any time, does not expand the buffers.

On boot up a check is made to see if any trace_printk() is used to
see if the trace_printk() temp buffer pages should be allocated. This
same code can be used to expand the buffers as well.

Suggested-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
884bfe89a462fcc85c8abd96171519cf2fe70929 15-Jul-2011 Slava Pestov <slavapestov@google.com> ring-buffer: Add a 'dropped events' counter

The existing 'overrun' counter is incremented when the ring
buffer wraps around, with overflow on (the default). We wanted
a way to count requests lost from the buffer filling up with
overflow off, too. I decided to add a new counter instead
of retro-fitting the existing one because it seems like a
different statistic to count conceptually, and also because
of how the code was structured.

Link: http://lkml.kernel.org/r/1310765038-26399-1-git-send-email-slavapestov@google.com

Signed-off-by: Slava Pestov <slavapestov@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
bcd83ea6cbfee54e33d1527b87538dc99ca2137b 26-Sep-2012 Daniel Walter <sahne@0x90.at> tracing: Replace strict_strto* with kstrto*

* remove old string conversions with kstrto*

Link: http://lkml.kernel.org/r/20120926200838.GC1244@0x90.at

Signed-off-by: Daniel Walter <sahne@0x90.at>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
d55cb6cf143ae16eaa415baab520b8eaf4a1012f 09-Aug-2012 Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> ftrace: Allow stealing pages from pipe buffer

Use generic steal operation on pipe buffer to allow stealing
ring buffer's read page from pipe buffer.

Note that this could reduce the performance of splice on the
splice_write side operation without affinity setting.
Since the ring buffer's read pages are allocated on the
tracing-node, but the splice user does not always execute
splice write side operation on the same node. In this case,
the page will be accessed from the another node.
Thus, it is strongly recommended to assign the splicing
thread to corresponding node.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
5224c3a31549f1c056039545b289e1b01ed02f12 08-Sep-2012 Mandeep Singh Baines <mandeep.baines@gmail.com> tracing: Add an option for disabling markers

In our application, we have trace markers spread through user-space.
We have markers in GL, X, etc. These are super handy for Chrome's
about:tracing feature (Chrome + system + kernel trace view), but
can be very distracting when you're trying to debug a kernel issue.

I normally, use "grep -v tracing_mark_write" but it would be nice
if I could just temporarily disable markers all together.

Link: http://lkml.kernel.org/r/1347066739-26285-1-git-send-email-msb@chromium.org

CC: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
d20b92ab668cc44fc84bba0001839c5a8013a5cd 14-Mar-2012 Eric W. Biederman <ebiederm@xmission.com> userns: Teach trace to use from_kuid

- When tracing capture the kuid.
- When displaying the data to user space convert the kuid into the
user namespace of the process that opened the report file.

Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
87abb3b15c62033409f5bf2ffb5620c94f91cf2c 02-Aug-2012 Wang Tianhong <wangthbj@linux.vnet.ibm.com> tracing/trivial: Fix some typos in kernel/trace

Fix some typos in kernel/trace.

Link: http://lkml.kernel.org/r/1343887320.2228.9.camel@louis-ThinkPad-T410

Signed-off-by: Wang Tianhong <wangthbj@linux.vnet.ibm.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
b2ad368bebc0f772613668e893fa176396e9094c 10-Jul-2012 Anton Vorontsov <anton.vorontsov@linaro.org> tracing: Fix initialization failure path in tracing_set_tracer()

If tracer->init() fails, current code will leave current_tracer pointing
to an unusable tracer, which at best makes 'current_tracer' report
inaccurate value.

Fix the issue by pointing current_tracer to nop tracer, and only update
current_tracer with the new one after all the initialization succeeds.

Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
93574fcc5b50cc7b8834698acb2ce947e5b6a5dc 11-Jul-2012 Dan Carpenter <dan.carpenter@oracle.com> tracing: Check for allocation failure in __tracing_open()

Clean up and return -ENOMEM on if the kzalloc() fails.

This also prevents a potential crash, as the pointer that failed to
allocate would be later used.

Link: http://lkml.kernel.org/r/20120711063507.GF11812@elgon.mountain

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
6d158a813efcd09661c23f16ddf7e2ff834cb20c 28-Jun-2012 Steven Rostedt <srostedt@redhat.com> tracing: Remove NR_CPUS array from trace_iterator

Replace the NR_CPUS array of buffer_iter from the trace_iterator
with an allocated array. This will just create an array of
possible CPUS instead of the max number specified.

The use of NR_CPUS in that array caused allocation failures for
machines that were tight on memory. This did not cause any failures
to the system itself (no crashes), but caused unnecessary failures
for reading the trace files.

Added a helper function called 'trace_buffer_iter()' that returns
the buffer_iter item or NULL if it is not defined or the array was
not allocated. Some routines do not require the array
(tracing_open_pipe() for one).

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
0be61ebc18b919dddbdbcd1c4f42513c310ecf59 18-Jun-2012 Steven Rostedt <srostedt@redhat.com> tracing/selftest: Add a WARN_ON() if a tracer test fails

Add a WARN_ON() output on test failures so that they are easier to detect
in automated tests. Although, the WARN_ON() will not print if the test
causes the system to crash, obviously.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
047fe3605235888f3ebcda0c728cb31937eadfe6 12-Jun-2012 Eric Dumazet <edumazet@google.com> splice: fix racy pipe->buffers uses

Dave Jones reported a kernel BUG at mm/slub.c:3474! triggered
by splice_shrink_spd() called from vmsplice_to_pipe()

commit 35f3d14dbbc5 (pipe: add support for shrinking and growing pipes)
added capability to adjust pipe->buffers.

Problem is some paths don't hold pipe mutex and assume pipe->buffers
doesn't change for their duration.

Fix this by adding nr_pages_max field in struct splice_pipe_desc, and
use it in place of pipe->buffers where appropriate.

splice_shrink_spd() loses its struct pipe_inode_info argument.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Tom Herbert <therbert@google.com>
Cc: stable <stable@vger.kernel.org> # 2.6.35
Tested-by: Dave Jones <davej@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
f2bf1f6f5f89d031245067512449fc889b2f4bb2 07-Jun-2012 Steven Rostedt <srostedt@redhat.com> tracing: Have tracing_off() actually turn tracing off

A recent update to have tracing_on/off() only affect the ftrace ring
buffers instead of all ring buffers had a cut and paste error.
The tracing_off() did the exact same thing as tracing_on() and
would not actually turn off tracing. Unfortunately, tracing_off()
is more important to be working than tracing_on() as this is a key
development tool, as it lets the developer turn off tracing as soon
as a problem is discovered. It is also used by panic and oops code.

This bug also breaks the 'echo func:traceoff > set_ftrace_filter'

Cc: <stable@vger.kernel.org> # 3.4
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
895b67fd5830ce18a6f1375a7c062fcf84b4b874 07-Nov-2011 Richard Weinberger <richard@nod.at> tracing: Remove kernel_lock annotations

The BKL is gone, these annotations are useless.

Link: http://lkml.kernel.org/r/1320654202-4433-1-git-send-email-richard@nod.at

Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
a591c73f127505cdbd0aa399a92112a8ddff8730 03-May-2012 Vaibhav Nagarnaik <vnagarnaik@google.com> tracing: Fix initial buffer_size_kb state

Make sure that the state of buffer_size_kb is initialized correctly and
returns actual size of the ring buffer.

Link: http://lkml.kernel.org/r/1336066834-1673-1-git-send-email-vnagarnaik@google.com

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Laurent Chavey <chavey@google.com>
Cc: Justin Teravest <teravest@google.com>
Cc: David Sharp <dhsharp@google.com>
Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
71babb2705e2203a64c27ede13ae3508a0d2c16c 04-May-2012 Vaibhav Nagarnaik <vnagarnaik@google.com> tracing: change CPU ring buffer state from tracing_cpumask

According to Documentation/trace/ftrace.txt:

tracing_cpumask:

This is a mask that lets the user only trace
on specified CPUS. The format is a hex string
representing the CPUS.

The tracing_cpumask currently doesn't affect the tracing state of
per-CPU ring buffers.

This patch enables/disables CPU recording as its corresponding bit in
tracing_cpumask is set/unset.

Link: http://lkml.kernel.org/r/1336096792-25373-3-git-send-email-vnagarnaik@google.com

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Laurent Chavey <chavey@google.com>
Cc: Justin Teravest <teravest@google.com>
Cc: David Sharp <dhsharp@google.com>
Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
0a3d7ce7e6caa8c39cb5184bd9047a01a40abc2a 23-Apr-2012 Namhyung Kim <namhyung.kim@lge.com> tracing: Check return value of tracing_dentry_percpu()

If tracing_dentry_percpu() failed, tracing_init_debugfs_percpu()
will try to create each cpu directories on debugfs' root directory
as d_percpu is NULL.

Link: http://lkml.kernel.org/r/1335143517-2285-1-git-send-email-namhyung.kim@lge.com

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Namhyung Kim <namhyung.kim@lge.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
83f40318dab00e3298a1f6d0b12ac025e84e478d 04-May-2012 Vaibhav Nagarnaik <vnagarnaik@google.com> ring-buffer: Make removal of ring buffer pages atomic

This patch adds the capability to remove pages from a ring buffer
without destroying any existing data in it.

This is done by removing the pages after the tail page. This makes sure
that first all the empty pages in the ring buffer are removed. If the
head page is one in the list of pages to be removed, then the page after
the removed ones is made the head page. This removes the oldest data
from the ring buffer and keeps the latest data around to be read.

To do this in a non-racey manner, tracing is stopped for a very short
time while the pages to be removed are identified and unlinked from the
ring buffer. The pages are freed after the tracing is restarted to
minimize the time needed to stop tracing.

The context in which the pages from the per-cpu ring buffer are removed
runs on the respective CPU. This minimizes the events not traced to only
NMI trace contexts.

Link: http://lkml.kernel.org/r/1336096792-25373-1-git-send-email-vnagarnaik@google.com

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Laurent Chavey <chavey@google.com>
Cc: Justin Teravest <teravest@google.com>
Cc: David Sharp <dhsharp@google.com>
Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
6edb2a8a385f0cdef51dae37ff23e74d76d8a6ce 12-May-2012 Steven Rostedt <srostedt@redhat.com> tracing: Clean up tracing_mark_write()

On gcc 4.5 the function tracing_mark_write() would give a warning
of page2 being uninitialized. This is due to a bug in gcc because
the logic prevents page2 from being used uninitialized, and
gcc 4.6+ does not complain (correctly).

Instead of adding a "unitialized" around page2, which could show
a bug later on, I combined page1 and page2 into an array map_pages[].
This binds the two and the two are modified according to nr_pages
(what gcc 4.5 seems to ignore). This no longer gives a warning with
gcc 4.5 nor with gcc 4.6.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
68179686ac67cb08f08b1ef28b860d5ed899f242 09-May-2012 Steven Rostedt <srostedt@redhat.com> tracing: Remove ftrace_disable/enable_cpu()

The ftrace_disable_cpu() and ftrace_enable_cpu() functions were
needed back before the ring buffer was lockless. Now that the
ring buffer is lockless (and has been for some time), these functions
serve no purpose, and unnecessarily slow down operations of the tracer.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
50e18b94c695644d824381e7574b9c44acc25ffe 25-Apr-2012 Jiri Olsa <jolsa@redhat.com> tracing: Use seq_*_private interface for some seq files

It's appropriate to use __seq_open_private interface to open
some of trace seq files, because it covers all steps we are
duplicating in tracing code - zallocating the iterator and
setting it as seq_file's private.

Using this for following files:
trace
available_filter_functions
enabled_functions

Link: http://lkml.kernel.org/r/1335342219-2782-5-git-send-email-jolsa@redhat.com

Signed-off-by: Jiri Olsa <jolsa@redhat.com>

[
Fixed warnings for:
kernel/trace/trace.c: In function '__tracing_open':
kernel/trace/trace.c:2418:11: warning: unused variable 'ret' [-Wunused-variable]
kernel/trace/trace.c:2417:19: warning: unused variable 'm' [-Wunused-variable]
]

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
438ced1720b584000a9e8a4349d1f6bb7ee3ad6d 02-Feb-2012 Vaibhav Nagarnaik <vnagarnaik@google.com> ring-buffer: Add per_cpu ring buffer control files

Add a debugfs entry under per_cpu/ folder for each cpu called
buffer_size_kb to control the ring buffer size for each CPU
independently.

If the global file buffer_size_kb is used to set size, the individual
ring buffers will be adjusted to the given size. The buffer_size_kb will
report the common size to maintain backward compatibility.

If the buffer_size_kb file under the per_cpu/ directory is used to
change buffer size for a specific CPU, only the size of the respective
ring buffer is updated. When tracing/buffer_size_kb is read, it reports
'X' to indicate that sizes of per_cpu ring buffers are not equivalent.

Link: http://lkml.kernel.org/r/1328212844-11889-1-git-send-email-vnagarnaik@google.com

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Michael Rubin <mrubin@google.com>
Cc: David Sharp <dhsharp@google.com>
Cc: Justin Teravest <teravest@google.com>
Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
5a26c8f0cf1e95106858bb4e23ca6dd14c9b842f 20-Apr-2012 Dan Carpenter <dan.carpenter@oracle.com> tracing: Remove an unneeded check in trace_seq_buffer()

memcpy() returns a pointer to "bug". Hopefully, it's not NULL here or
we would already have Oopsed.

Link: http://lkml.kernel.org/r/20120420063145.GA22649@elgon.mountain

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
07d777fe8c3985bc83428c2866713c2d1b3d4129 22-Sep-2011 Steven Rostedt <srostedt@redhat.com> tracing: Add percpu buffers for trace_printk()

Currently, trace_printk() uses a single buffer to write into
to calculate the size and format needed to save the trace. To
do this safely in an SMP environment, a spin_lock() is taken
to only allow one writer at a time to the buffer. But this could
also affect what is being traced, and add synchronization that
would not be there otherwise.

Ideally, using percpu buffers would be useful, but since trace_printk()
is only used in development, having per cpu buffers for something
never used is a waste of space. Thus, the use of the trace_bprintk()
format section is changed to be used for static fmts as well as dynamic ones.
Then at boot up, we can check if the section that holds the trace_printk
formats is non-empty, and if it does contain something, then we
know a trace_printk() has been added to the kernel. At this time
the trace_printk per cpu buffers are allocated. A check is also
done at module load time in case a module is added that contains a
trace_printk().

Once the buffers are allocated, they are never freed. If you use
a trace_printk() then you should know what you are doing.

A buffer is made for each type of context:

normal
softirq
irq
nmi

The context is checked and the appropriate buffer is used.
This allows for totally lockless usage of trace_printk(),
and they no longer even disable interrupts.

Requested-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
348f0fc238efb441a28e7644c51f9fd3001b228a 16-Apr-2012 Steven Rostedt <srostedt@redhat.com> tracing: Fix regression with tracing_on

The change to make tracing_on affect only the ftrace ring buffer, caused
a bug where it wont affect any ring buffer. The problem was that the buffer
of the trace_array was passed to the write function and not the trace array
itself.

The trace_array can change the buffer when running a latency tracer. If this
happens, then the buffer being disabled may not be the buffer currently used
by ftrace. This will cause the tracing_on file to become useless.

The simple fix is to pass the trace_array to the write function instead of
the buffer. Then the actual buffer may be changed.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
12b5da349a8b94c9dbc3430a6bc42eabd9eaf50b 27-Mar-2012 Steven Rostedt <srostedt@redhat.com> tracing: Fix ent_size in trace output

When reading the trace file, the records of each of the per_cpu buffers
are examined to find the next event to print out. At the point of looking
at the event, the size of the event is recorded. But if the first event is
chosen, the other events in the other CPU buffers will reset the event size
that is stored in the iterator descriptor, causing the event size passed to
the output functions to be incorrect.

In most cases this is not a problem, but for the case of stack traces, it
is. With the change to the stack tracing to record a dynamic number of
back traces, the output depends on the size of the entry instead of the
fixed 8 back traces. When the entry size is not correct, the back traces
would not be fully printed.

Note, reading from the per-cpu trace files were not affected.

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
b892e5c89787716b95a8e55d77d25a1c0748df10 02-Mar-2012 Steven Rostedt <srostedt@redhat.com> tracing: Keep NMI watchdog from triggering when dumping trace

As ftrace_dump() (called by ftrace_dump_on_oops) disables interrupts
as it dumps its output to the console, it can keep interrupts disabled
for long periods of time. This is likely to trigger the NMI watchdog,
and it can disrupt the output of critical data.

Add a touch_nmi_watchdog() to each event that is written to the screen
to keep the NMI watchdog from affecting the output.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
499e547057f5bba5cd6f87ebe59b05d0c59da905 22-Feb-2012 Steven Rostedt <srostedt@redhat.com> tracing/ring-buffer: Only have tracing_on disable tracing buffers

As the ring-buffer code is being used by other facilities in the
kernel, having tracing_on file disable *all* buffers is not a desired
affect. It should only disable the ftrace buffers that are being used.

Move the code into the trace.c file and use the buffer disabling
for tracing_on() and tracing_off(). This way only the ftrace buffers
will be affected by them and other kernel utilities will not be
confused to why their output suddenly stopped.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
1e42e83fde5537266c1d1e7fd8c010b3028d50fc 08-Feb-2012 Geunsik Lim <geunsik.lim@samsung.com> ftrace: sched_switch plugin is deprecated

Actually, sched_switch function tracer is merged into wakeup/wakeup_rt
Update 'mini-HOWTO' for ftrace(Kernel function tracer).
If we want to trace "sched:sched_switch" to trace sched_switch func,
We may utilize event option.(e.g: trace-cmd list -e | grep sched)
This patch is based on Linux-3.3.rc2-SMP-PREEMPT

Link: http://lkml.kernel.org/r/1328695537-15081-1-git-send-email-geunsik.lim@gmail.com

Cc: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Geunsik Lim <geunsik.lim@samsung.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
f4ae40a6a50a98ac23d4b285f739455e926a473e 24-Jul-2011 Al Viro <viro@zeniv.linux.org.uk> switch debugfs to umode_t

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
a8eecf2248a45bf69f0625b23c003ad2ccd765ee 02-Oct-2011 Paul E. McKenney <paul.mckenney@linaro.org> trace: Allow ftrace_dump() to be called from modules

Add an EXPORT_SYMBOL_GPL() so that rcutorture can dump the trace buffer
upon detection of an RCU error.

Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
39eaf7ef884dcc44f7ff1bac803ca2a1dcf43544 17-Nov-2011 Steven Rostedt <srostedt@redhat.com> tracing: Add entries in buffer and total entries to default output header

Knowing the number of event entries in the ring buffer compared
to the total number that were written is useful information. The
latency format gives this information and there's no reason that the
default format does not.

This information is now added to the default header, along with the
number of online CPUs:

# tracer: nop
#
# entries-in-buffer/entries-written: 159836/64690869 #P:4
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
<idle>-0 [000] ...2 49.442971: local_touch_nmi <-cpu_idle
<idle>-0 [000] d..2 49.442973: enter_idle <-cpu_idle
<idle>-0 [000] d..2 49.442974: atomic_notifier_call_chain <-enter_idle
<idle>-0 [000] d..2 49.442976: __atomic_notifier_call_chain <-atomic_notifier

The above shows that the trace contains 159836 entries, but
64690869 were written. One could figure out that there were
64531033 entries that were dropped.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
77271ce4b2c0df0a76ad1cbb6a95b07e1f88c1ea 17-Nov-2011 Steven Rostedt <srostedt@redhat.com> tracing: Add irq, preempt-count and need resched info to default trace output

People keep asking how to get the preempt count, irq, and need resched info
and we keep telling them to enable the latency format. Some developers think
that traces without this info is completely useless, and for a lot of tasks
it is useless.

The first option was to enable the latency trace as the default format, but
the header for the latency format is pretty useless for most tracers and
it also does the timestamp in straight microseconds from the time the trace
started. This is sometimes more difficult to read as the default trace is
seconds from the start of boot up.

Latency format:

# tracer: nop
#
# nop latency trace v1.1.5 on 3.2.0-rc1-test+
# --------------------------------------------------------------------
# latency: 0 us, #159771/64234230, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
# -----------------
# | task: -0 (uid:0 nice:0 policy:0 rt_prio:0)
# -----------------
#
# _------=> CPU#
# / _-----=> irqs-off
# | / _----=> need-resched
# || / _---=> hardirq/softirq
# ||| / _--=> preempt-depth
# |||| / delay
# cmd pid ||||| time | caller
# \ / ||||| \ | /
migratio-6 0...2 41778231us+: rcu_note_context_switch <-__schedule
migratio-6 0...2 41778233us : trace_rcu_utilization <-rcu_note_context_switch
migratio-6 0...2 41778235us+: rcu_sched_qs <-rcu_note_context_switch
migratio-6 0d..2 41778236us+: rcu_preempt_qs <-rcu_note_context_switch
migratio-6 0...2 41778238us : trace_rcu_utilization <-rcu_note_context_switch
migratio-6 0...2 41778239us+: debug_lockdep_rcu_enabled <-__schedule

default format:

# tracer: nop
#
# TASK-PID CPU# TIMESTAMP FUNCTION
# | | | | |
migration/0-6 [000] 50.025810: rcu_note_context_switch <-__schedule
migration/0-6 [000] 50.025812: trace_rcu_utilization <-rcu_note_context_switch
migration/0-6 [000] 50.025813: rcu_sched_qs <-rcu_note_context_switch
migration/0-6 [000] 50.025815: rcu_preempt_qs <-rcu_note_context_switch
migration/0-6 [000] 50.025817: trace_rcu_utilization <-rcu_note_context_switch
migration/0-6 [000] 50.025818: debug_lockdep_rcu_enabled <-__schedule
migration/0-6 [000] 50.025820: debug_lockdep_rcu_enabled <-__schedule

The latency format header has latency information that is pretty meaningless
for most tracers. Although some of the header is useful, and we can add that
later to the default format as well.

What is really useful with the latency format is the irqs-off, need-resched
hard/softirq context and the preempt count.

This commit adds the option irq-info which is on by default that adds this
information:

# tracer: nop
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
<idle>-0 [000] d..2 49.309305: cpuidle_get_driver <-cpuidle_idle_call
<idle>-0 [000] d..2 49.309307: mwait_idle <-cpu_idle
<idle>-0 [000] d..2 49.309309: need_resched <-mwait_idle
<idle>-0 [000] d..2 49.309310: test_ti_thread_flag <-need_resched
<idle>-0 [000] d..2 49.309312: trace_power_start.constprop.13 <-mwait_idle
<idle>-0 [000] d..2 49.309313: trace_cpu_idle <-mwait_idle
<idle>-0 [000] d..2 49.309315: need_resched <-mwait_idle

If a user wants the old format, they can disable the 'irq-info' option:

# tracer: nop
#
# TASK-PID CPU# TIMESTAMP FUNCTION
# | | | | |
<idle>-0 [000] 49.309305: cpuidle_get_driver <-cpuidle_idle_call
<idle>-0 [000] 49.309307: mwait_idle <-cpu_idle
<idle>-0 [000] 49.309309: need_resched <-mwait_idle
<idle>-0 [000] 49.309310: test_ti_thread_flag <-need_resched
<idle>-0 [000] 49.309312: trace_power_start.constprop.13 <-mwait_idle
<idle>-0 [000] 49.309313: trace_cpu_idle <-mwait_idle
<idle>-0 [000] 49.309315: need_resched <-mwait_idle

Requested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
7e9a49ef542610609144d1afcd516dc3fafac4d6 07-Nov-2011 Jiri Olsa <jolsa@redhat.com> tracing/latency: Fix header output for latency tracers

In case the the graph tracer (CONFIG_FUNCTION_GRAPH_TRACER) or even the
function tracer (CONFIG_FUNCTION_TRACER) are not set, the latency tracers
do not display proper latency header.

The involved/fixed latency tracers are:
wakeup_rt
wakeup
preemptirqsoff
preemptoff
irqsoff

The patch adds proper handling of tracer configuration options for latency
tracers, and displaying correct header info accordingly.

* The current output (for wakeup tracer) with both graph and function
tracers disabled is:

# tracer: wakeup
#
<idle>-0 0d.h5 1us+: 0:120:R + [000] 7: 0:R watchdog/0
<idle>-0 0d.h5 3us+: ttwu_do_activate.clone.1 <-try_to_wake_up
...

* The fixed output is:

# tracer: wakeup
#
# wakeup latency trace v1.1.5 on 3.1.0-tip+
# --------------------------------------------------------------------
# latency: 55 us, #4/4, CPU#0 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
# -----------------
# | task: migration/0-6 (uid:0 nice:0 policy:1 rt_prio:99)
# -----------------
#
# _------=> CPU#
# / _-----=> irqs-off
# | / _----=> need-resched
# || / _---=> hardirq/softirq
# ||| / _--=> preempt-depth
# |||| / delay
# cmd pid ||||| time | caller
# \ / ||||| \ | /
cat-1129 0d..4 1us : 1129:120:R + [000] 6: 0:R migration/0
cat-1129 0d..4 2us+: ttwu_do_activate.clone.1 <-try_to_wake_up

* The current output (for wakeup tracer) with only function
tracer enabled is:

# tracer: wakeup
#
cat-1140 0d..4 1us+: 1140:120:R + [000] 6: 0:R migration/0
cat-1140 0d..4 2us : ttwu_do_activate.clone.1 <-try_to_wake_up

* The fixed output is:
# tracer: wakeup
#
# wakeup latency trace v1.1.5 on 3.1.0-tip+
# --------------------------------------------------------------------
# latency: 207 us, #109/109, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2)
# -----------------
# | task: watchdog/1-12 (uid:0 nice:0 policy:1 rt_prio:99)
# -----------------
#
# _------=> CPU#
# / _-----=> irqs-off
# | / _----=> need-resched
# || / _---=> hardirq/softirq
# ||| / _--=> preempt-depth
# |||| / delay
# cmd pid ||||| time | caller
# \ / ||||| \ | /
<idle>-0 1d.h5 1us+: 0:120:R + [001] 12: 0:R watchdog/1
<idle>-0 1d.h5 3us : ttwu_do_activate.clone.1 <-try_to_wake_up

Link: http://lkml.kernel.org/r/20111107150849.GE1807@m.brq.redhat.com

Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
436fc280261dcfce5af38f08b89287750dc91cd2 14-Oct-2011 Steven Rostedt <srostedt@redhat.com> tracing: Fix returning of duplicate data after EOF in trace_pipe_raw

The trace_pipe_raw handler holds a cached page from the time the file
is opened to the time it is closed. The cached page is used to handle
the case of the user space buffer being smaller than what was read from
the ring buffer. The left over buffer is held in the cache so that the
next read will continue where the data left off.

After EOF is returned (no more data in the buffer), the index of
the cached page is set to zero. If a user app reads the page again
after EOF, the check in the buffer will see that the cached page
is less than page size and will return the cached page again. This
will cause reading the trace_pipe_raw again after EOF to return
duplicate data, making the output look like the time went backwards
but instead data is just repeated.

The fix is to not reset the index right after all data is read
from the cache, but to reset it after all data is read and more
data exists in the ring buffer.

Cc: stable <stable@kernel.org>
Reported-by: Jeremy Eder <jeder@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
9b5f8b31af57a8ce9e9f77864d9143b5e3304815 12-Aug-2011 Geunsik Lim <geunsik.lim@samsung.com> ftrace: Fix README to state tracing_on to start/stop tracing

tracing_enabled option is deprecated.
To start/stop tracing, write to /sys/kernel/debug/tracing/tracing_on
without tracing_enabled. This patch is based on Linux 3.1.0-rc1

Signed-off-by: Geunsik Lim <geunsik.lim@samsung.com>
Link: http://lkml.kernel.org/r/1313127022-23830-1-git-send-email-leemgs1@gmail.com
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
d696b58ca2c3ca76e784ef89a7e0453d9b7ab187 22-Sep-2011 Steven Rostedt <srostedt@redhat.com> tracing: Do not allocate buffer for trace_marker

When doing intense tracing, the kmalloc inside trace_marker can
introduce side effects to what is being traced.

As trace_marker() is used by userspace to inject data into the
kernel ring buffer, it needs to do so with the least amount
of intrusion to the operations of the kernel or the user space
application.

As the ring buffer is designed to write directly into the buffer
without the need to make a temporary buffer, and userspace already
went through the hassle of knowing how big the write will be,
we can simply pin the userspace pages and write the data directly
into the buffer. This improves the impact of tracing via trace_marker
tremendously!

Thanks to Peter Zijlstra and Thomas Gleixner for pointing out the
use of get_user_pages_fast() and kmap_atomic().

Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Suggested-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
e0a413f619ef8bc366dafc6f8221674993b8d85f 30-Sep-2011 Steven Rostedt <srostedt@redhat.com> tracing: Warn on output if the function tracer was found corrupted

As the function tracer is very intrusive, lots of self checks are
performed on the tracer and if something is found to be strange
it will shut itself down keeping it from corrupting the rest of the
kernel. This shutdown may still allow functions to be traced, as the
tracing only stops new modifications from happening. Trying to stop
the function tracer itself can cause more harm as it requires code
modification.

Although a WARN_ON() is executed, a user may not notice it. To help
the user see that something isn't right with the tracing of the system
a big warning is added to the output of the tracer that lets the user
know that their data may be incomplete.

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
6249687f76b69cc0b2ad34636f4a18d693ef3262 19-Sep-2011 Steven Rostedt <srostedt@redhat.com> tracing: Add a counter clock for those that do not trust clocks

When debugging tight race conditions, it can be helpful to have a
synchronized tracing method. Although in most cases the global clock
provides this functionality, if timings is not the issue, it is more
comforting to know that the order of events really happened in a precise
order.

Instead of using a clock, add a "counter" that is simply an incrementing
atomic 64bit counter that orders the events as they are perceived to
happen.

The trace_clock_counter() is added from the attempt by Peter Zijlstra
trying to convert the trace_clock_global() to it. I took Peter's counter
code and made trace_clock_counter() instead, and added it to the choice
of clocks. Just echo counter > /debug/tracing/trace_clock to activate
it.

Requested-by: Thomas Gleixner <tglx@linutronix.de>
Requested-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-By: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
5389f6fad27019f2ba78f1b332f719ec05f12a42 25-Jul-2009 Thomas Gleixner <tglx@linutronix.de> locking, tracing: Annotate tracing locks as raw

The tracing locks can be taken in atomic context and therefore
cannot be preempted on -rt - annotate it.

In mainline this change documents the low level nature of
the lock - otherwise there's no functional difference. Lockdep
and Sparse checking will work as usual.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
c64e148a3be3cb786534ad38298c25c833116c26 16-Aug-2011 Vaibhav Nagarnaik <vnagarnaik@google.com> trace: Add ring buffer stats to measure rate of events

The stats file under per_cpu folder provides the number of entries,
overruns and other statistics about the CPU ring buffer. However, the
numbers do not provide any indication of how full the ring buffer is in
bytes compared to the overall size in bytes. Also, it is helpful to know
the rate at which the cpu buffer is filling up.

This patch adds an entry "bytes: " in printed stats for per_cpu ring
buffer which provides the actual bytes consumed in the ring buffer. This
field includes the number of bytes used by recorded events and the
padding bytes added when moving the tail pointer to next page.

It also adds the following time stamps:
"oldest event ts:" - the oldest timestamp in the ring buffer
"now ts:" - the timestamp at the time of reading

The field "now ts" provides a consistent time snapshot to the userspace
when being read. This is read from the same trace clock used by tracing
event timestamps.

Together, these values provide the rate at which the buffer is filling
up, from the formula:
bytes / (now_ts - oldest_event_ts)

Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
Cc: Michael Rubin <mrubin@google.com>
Cc: David Sharp <dhsharp@google.com>
Link: http://lkml.kernel.org/r/1313531179-9323-3-git-send-email-vnagarnaik@google.com
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
f81ab074c30234b07c8309c542cafd07bed721f7 16-Aug-2011 Vaibhav Nagarnaik <vnagarnaik@google.com> trace: Add a new readonly entry to report total buffer size

The current file "buffer_size_kb" reports the size of per-cpu buffer and
not the overall memory allocated which could be misleading. A new file
"buffer_total_size_kb" adds up all the enabled CPU buffer sizes and
reports it. This is only a readonly entry.

Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
Cc: Michael Rubin <mrubin@google.com>
Cc: David Sharp <dhsharp@google.com>
Link: http://lkml.kernel.org/r/1313531179-9323-2-git-send-email-vnagarnaik@google.com
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
4a9bd3f134decd6d16ead8d288342d57aad486be 14-Jul-2011 Steven Rostedt <srostedt@redhat.com> tracing: Have dynamic size event stack traces

Currently the stack trace per event in ftace is only 8 frames.
This can be quite limiting and sometimes useless. Especially when
the "ignore frames" is wrong and we also use up stack frames for
the event processing itself.

Change this to be dynamic by adding a percpu buffer that we can
write a large stack frame into and then copy into the ring buffer.

For interrupts and NMIs that come in while another event is being
process, will only get to use the 8 frame stack. That should be enough
as the task that it interrupted will have the full stack frame anyway.

Requested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
1fd8df2c3970c9e7e4e262354154ee39e58bdd7c 08-Jun-2011 Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> tracing/kprobes: Fix kprobe-tracer to support stack trace

Fix to support kernel stack trace correctly on kprobe-tracer.
Since the execution path of kprobe-based dynamic events is different
from other tracepoint-based events, normal ftrace_trace_stack() doesn't
work correctly. To fix that, this introduces ftrace_trace_stack_regs()
which traces stack via pt_regs instead of current stack register.

e.g.

# echo p schedule+4 > /sys/kernel/debug/tracing/kprobe_events
# echo 1 > /sys/kernel/debug/tracing/options/stacktrace
# echo 1 > /sys/kernel/debug/tracing/events/kprobes/enable
# head -n 20 /sys/kernel/debug/tracing/trace
bash-2968 [000] 10297.050245: p_schedule_4: (schedule+0x4/0x4ca)
bash-2968 [000] 10297.050247: <stack trace>
=> schedule_timeout
=> n_tty_read
=> tty_read
=> vfs_read
=> sys_read
=> system_call_fastpath
kworker/0:1-2940 [000] 10297.050265: p_schedule_4: (schedule+0x4/0x4ca)
kworker/0:1-2940 [000] 10297.050266: <stack trace>
=> worker_thread
=> kthread
=> kernel_thread_helper
sshd-1132 [000] 10297.050365: p_schedule_4: (schedule+0x4/0x4ca)
sshd-1132 [000] 10297.050365: <stack trace>
=> sysret_careful

Note: Even with this fix, the first entry will be skipped
if the probe is put on the function entry area before
the frame pointer is set up (usually, that is 4 bytes
(push %bp; mov %sp %bp) on x86), because stack unwinder
depends on the frame pointer.

Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: yrl.pp-manager.tt@hitachi.com
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Namhyung Kim <namhyung@gmail.com>
Link: http://lkml.kernel.org/r/20110608070934.17777.17116.stgit@fedora15
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
22fe9b54d859e53bfbbbdc1a0a77a82bc453927c 07-Jun-2011 Peter Huewe <peterhuewe@gmx.de> tracing: Convert to kstrtoul_from_user

This patch replaces the code for getting an unsigned long from a
userspace buffer by a simple call to kstroul_from_user.
This makes it easier to read and less error prone.

Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
Link: http://lkml.kernel.org/r/1307476707-14762-1-git-send-email-peterhuewe@gmx.de
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
f56e7f8efb4ec200364f690a9902713410e24d47 03-Jun-2011 Jiri Olsa <jolsa@redhat.com> tracing, function: Fix trace header to follow context-info option

The header display of function tracer does not follow
the context-info option, so field names are displayed even
if this option is off.

Added check for TRACE_ITER_CONTEXT_INFO trace_flags.

With following commands:
# echo function > ./current_tracer
# echo 0 > options/context-info
# cat trace

This is what it looked like before:
# tracer: function
#
# TASK-PID CPU# TIMESTAMP FUNCTION
# | | | | |
add_preempt_count <-schedule
rcu_note_context_switch <-schedule
...

This is what it looks like now:
# tracer: function
#
_raw_spin_unlock_irqrestore <-hrtimer_try_to_cancel
...

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1307113131-10045-4-git-send-email-jolsa@redhat.com
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
cf30cf67d6c7592c670ec946d89fc15ee0deb0eb 15-Jun-2011 Steven Rostedt <srostedt@redhat.com> tracing: Add disable_on_free option

Add a trace option to disable tracing on free. When this option is
set, a write into the free_buffer file will not only shrink the
ring buffer down to zero, but it will also disable tracing.

Cc: Vaibhav Nagarnaik <vnagarnaik@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
4f271a2a60c748599b30bb4dafff30d770439b96 14-Jun-2011 Vaibhav Nagarnaik <vnagarnaik@google.com> tracing: Add a proc file to stop tracing and free buffer

The proc file entry buffer_size_kb is used to set the size of tracing
buffer. The memory to expand the buffer size is kernel memory. Consider
a use case where tracing is handled by a user space utility, which acts
as a gate keeper for tracing requests. In an OOM condition, tracing is
considered a low priority task and if the utility gets killed the ring
buffer memory cannot be released back to the kernel.

This patch adds a proc file called "free_buffer" whose purpose is to
stop tracing and free up the ring buffer when it is closed.

The user space process can then set the desired size in buffer_size_kb
file and open the fd to the "free_buffer" file. Under OOM condition, if
the process gets killed, the kernel closes the file descriptor. The
release handler stops the tracing and releases the kernel memory
automatically.

Cc: Ingo Molnar <mingo@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Michael Rubin <mrubin@google.com>
Cc: David Sharp <dhsharp@google.com>
Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
Link: http://lkml.kernel.org/r/1308012717-11148-1-git-send-email-vnagarnaik@google.com
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
7ea5906405a1f3fc1c0033dfd7e02f2cfd1de5e5 04-May-2011 Vaibhav Nagarnaik <vnagarnaik@google.com> tracing: Use NUMA allocation for per-cpu ring buffer pages

The tracing ring buffer is a group of per-cpu ring buffers where
allocation and logging is done on a per-cpu basis. The events that are
generated on a particular CPU are logged in the corresponding buffer.
This is to provide wait-free writes between CPUs and good NUMA node
locality while accessing the ring buffer.

However, the allocation routines consider NUMA locality only for buffer
page metadata and not for the actual buffer page. This causes the pages
to be allocated on the NUMA node local to the CPU where the allocation
routine is running at the time.

This patch fixes the problem by using a NUMA node specific allocation
routine so that the pages are allocated from a NUMA node local to the
logging CPU.

I tested with the getuid_microbench from autotest. It is a simple binary
that calls getuid() in a loop and measures the average time for the
syscall to complete. The following command was used to test:
$ getuid_microbench 1000000

Compared the numbers found on kernel with and without this patch and
found that logging latency decreases by 30-50 ns/call.
tracing with non-NUMA allocation - 569 ns/call
tracing with NUMA allocation - 512 ns/call

Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Michael Rubin <mrubin@google.com>
Cc: David Sharp <dhsharp@google.com>
Link: http://lkml.kernel.org/r/1304470602-20366-1-git-send-email-vnagarnaik@google.com
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
e7e2ee89a9dbf48d70a922d5625cd7320a27cbff 10-May-2011 Vaibhav Nagarnaik <vnagarnaik@google.com> tracing: Schedule a delayed work to call wakeup()

In using syscall tracing by concurrent processes, the wakeup() that is
called in the event commit function causes contention on the spin lock
of the waitqueue. I enabled sys_enter_getuid and sys_exit_getuid
tracepoints, and by running getuid_microbench from autotest in parallel
I found that the contention causes exponential latency increase in the
tracing path.

The autotest binary getuid_microbench calls getuid() in a tight loop for
the given number of iterations and measures the average time required to
complete a single invocation of syscall.

The patch schedules a delayed work after 2 ms once an event commit calls
to wake up the trace wait_queue. This removes the delay caused by
contention on spin lock in wakeup() and amortizes the wakeup() calls
scheduled over the 2 ms period.

In the following example, the script enables the sys_enter_getuid and
sys_exit_getuid tracepoints and runs the getuid_microbench in parallel
with the given number of processes. The output clearly shows the latency
increase caused by contentions.

$ ~/getuid.sh 1
1000000 calls in 0.720974253 s (720.974253 ns/call)

$ ~/getuid.sh 2
1000000 calls in 1.166457554 s (1166.457554 ns/call)
1000000 calls in 1.168933765 s (1168.933765 ns/call)

$ ~/getuid.sh 3
1000000 calls in 1.783827516 s (1783.827516 ns/call)
1000000 calls in 1.795553270 s (1795.553270 ns/call)
1000000 calls in 1.796493376 s (1796.493376 ns/call)

$ ~/getuid.sh 4
1000000 calls in 4.483041796 s (4483.041796 ns/call)
1000000 calls in 4.484165388 s (4484.165388 ns/call)
1000000 calls in 4.484850762 s (4484.850762 ns/call)
1000000 calls in 4.485643576 s (4485.643576 ns/call)

$ ~/getuid.sh 5
1000000 calls in 6.497521653 s (6497.521653 ns/call)
1000000 calls in 6.502000236 s (6502.000236 ns/call)
1000000 calls in 6.501709115 s (6501.709115 ns/call)
1000000 calls in 6.502124100 s (6502.124100 ns/call)
1000000 calls in 6.502936358 s (6502.936358 ns/call)

After the patch, the latencies scale better.
1000000 calls in 0.728720455 s (728.720455 ns/call)

1000000 calls in 0.842782857 s (842.782857 ns/call)
1000000 calls in 0.883803135 s (883.803135 ns/call)

1000000 calls in 0.902077764 s (902.077764 ns/call)
1000000 calls in 0.902838202 s (902.838202 ns/call)
1000000 calls in 0.908896885 s (908.896885 ns/call)

1000000 calls in 0.932523515 s (932.523515 ns/call)
1000000 calls in 0.958009672 s (958.009672 ns/call)
1000000 calls in 0.986188020 s (986.188020 ns/call)
1000000 calls in 0.989771102 s (989.771102 ns/call)

1000000 calls in 0.933518391 s (933.518391 ns/call)
1000000 calls in 0.958897947 s (958.897947 ns/call)
1000000 calls in 1.031038897 s (1031.038897 ns/call)
1000000 calls in 1.089516025 s (1089.516025 ns/call)
1000000 calls in 1.141998347 s (1141.998347 ns/call)

Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Michael Rubin <mrubin@google.com>
Cc: David Sharp <dhsharp@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/1305059241-7629-1-git-send-email-vnagarnaik@google.com
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
a3a4a5acd3bd2f6f1e102e1f1b9d2e2bb320a7fd 06-May-2011 Arjan van de Ven <arjan@linux.intel.com> Regression: partial revert "tracing: Remove lock_depth from event entry"

This partially reverts commit e6e1e2593592a8f6f6380496655d8c6f67431266.

That commit changed the structure layout of the trace structure, which
in turn broke PowerTOP (1.9x generation) quite badly.

I appreciate not wanting to expose the variable in question, and
PowerTOP was not using it, so I've replaced the variable with just a
padding field - that way if in the future a new field is needed it can
just use this padding field.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ee5e51f51be755830f57445e268ba50e88ccbdbb 25-Mar-2011 Jiri Olsa <jolsa@redhat.com> tracing: Avoid soft lockup in trace_pipe

running following commands:

# enable the binary option
echo 1 > ./options/bin
# disable context info option
echo 0 > ./options/context-info
# tracing only events
echo 1 > ./events/enable
cat trace_pipe

plus forcing system to generate many tracing events,
is causing lockup (in NON preemptive kernels) inside
tracing_read_pipe function.

The issue is also easily reproduced by running ltp stress test.
(ftrace_stress_test.sh)

The reasons are:
- bin/hex/raw output functions for events are set to
trace_nop_print function, which prints nothing and
returns TRACE_TYPE_HANDLED value
- LOST EVENT trace do not handle trace_seq overflow

These reasons force the while loop in tracing_read_pipe
function never to break.

The attached patch fixies handling of lost event trace, and
changes trace_nop_print to print minimal info, which is needed
for the correct tracing_read_pipe processing.

v2 changes:
- omit the cond_resched changes by trace_nop_print changes
- WARN changed to WARN_ONCE and added info to be able
to find out the culprit

v3 changes:
- make more accurate patch comment

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
LKML-Reference: <20110325110518.GC1922@jolsa.brq.redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
25985edcedea6396277003854657b5f3cb31a628 31-Mar-2011 Lucas De Marchi <lucas.demarchi@profusion.mobi> Fix common misspellings

Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
4a0b1665db09cf2da9ad7d0f12da386373c10bfa 10-Mar-2011 Steven Rostedt <srostedt@redhat.com> tracing: Fix irqoff selftest expanding max buffer

If the kernel command line declares a tracer "ftrace=sometracer" and
that tracer is either not defined or is enabled after irqsoff,
then the irqs off selftest will fail with the following error:

Testing tracer irqsoff:
------------[ cut here ]------------
WARNING: at /home/rostedt/work/autotest/nobackup/linux-test.git/kernel/trace/tra
ce.c:713 update_max_tr_single+0xfa/0x11b()
Hardware name:
Modules linked in:
Pid: 1, comm: swapper Not tainted 2.6.38-rc8-test #1
Call Trace:
[<c0441d9d>] ? warn_slowpath_common+0x65/0x7a
[<c049adb2>] ? update_max_tr_single+0xfa/0x11b
[<c0441dc1>] ? warn_slowpath_null+0xf/0x13
[<c049adb2>] ? update_max_tr_single+0xfa/0x11b
[<c049e454>] ? stop_critical_timing+0x154/0x204
[<c049b54b>] ? trace_selftest_startup_irqsoff+0x5b/0xc1
[<c049b54b>] ? trace_selftest_startup_irqsoff+0x5b/0xc1
[<c049b54b>] ? trace_selftest_startup_irqsoff+0x5b/0xc1
[<c049e529>] ? time_hardirqs_on+0x25/0x28
[<c0468bca>] ? trace_hardirqs_on_caller+0x18/0x12f
[<c0468cec>] ? trace_hardirqs_on+0xb/0xd
[<c049b54b>] ? trace_selftest_startup_irqsoff+0x5b/0xc1
[<c049b6b8>] ? register_tracer+0xf8/0x1a3
[<c14e93fe>] ? init_irqsoff_tracer+0xd/0x11
[<c040115e>] ? do_one_initcall+0x71/0x121
[<c14e93f1>] ? init_irqsoff_tracer+0x0/0x11
[<c14ce3a9>] ? kernel_init+0x13a/0x1b6
[<c14ce26f>] ? kernel_init+0x0/0x1b6
[<c0403842>] ? kernel_thread_helper+0x6/0x10
---[ end trace e93713a9d40cd06c ]---
.. no entries found ..FAILED!

What happens is the "ftrace=..." will expand the ring buffer to its
default size (from its minimum size) but it will not expand the
max ring buffer (the ring buffer to store maximum latencies).
When the irqsoff test runs, it will call the ring buffer swap routine
that checks if the max ring buffer is the same size as the normal
ring buffer, and will fail if it is not. This causes the test to fail.

The solution is to expand the max ring buffer before running the self
test if the max ring buffer is used by that tracer and the normal ring
buffer is expanded. The max ring buffer should be shrunk again after
the test is done to save space.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
e6e1e2593592a8f6f6380496655d8c6f67431266 09-Mar-2011 Steven Rostedt <srostedt@redhat.com> tracing: Remove lock_depth from event entry

The lock_depth field in the event headers was added as a temporary
data point for help in removing the BKL. Now that the BKL is pretty
much been removed, we can remove this field.

This in turn changes the header from 12 bytes to 8 bytes,
removing the 4 byte buffer that gcc would insert if the first field
in the data load was 8 bytes in size.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
750912fa366312e9c5bc83eab352898a26750401 08-Dec-2010 David Sharp <dhsharp@google.com> tracing: Add an 'overwrite' trace_option.

Add an "overwrite" trace_option for ftrace to control whether the buffer should
be overwritten on overflow or not. The default remains to overwrite old events
when the buffer is full. This patch adds the option to instead discard newest
events when the buffer is full. This is useful to get a snapshot of traces just
after enabling traces. Dropping the current event is also a simpler code path.

Signed-off-by: David Sharp <dhsharp@google.com>
LKML-Reference: <1291844807-15481-1-git-send-email-dhsharp@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
6752ab4a9c30d5411b2dfdb251a3f1cb18aae487 08-Feb-2011 Steven Rostedt <srostedt@redhat.com> tracing: Deprecate tracing_enabled for tracing_on

tracing_enabled should not be used, it is heavy weight and does not
do much in helping lower the overhead.

tracing_on should be used instead. Warn users to use tracing_on
when tracing_enabled is used as it will soon be removed from the
tracing directory.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
1dbd1951f39e13da579ffe879cce19586d0462de 09-Dec-2010 Li Zefan <lizf@cn.fujitsu.com> tracing: Fix preempt count leak

While running my ftrace stress test, this showed up:

BUG: sleeping function called from invalid context at mm/mmap.c:233
...
note: cat[3293] exited with preempt_count 1

The bug was introduced by commit 91e86e560d0b3ce4c5fc64fd2bbb99f856a30a4e
("tracing: Fix recursive user stack trace")

Cc: <stable@kernel.org>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4D0089AC.1020802@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
364829b1263b44aa60383824e4c1289d83d78ca7 25-Nov-2010 Slava Pestov <slavapestov@google.com> tracing: Fix panic when lseek() called on "trace" opened for writing

The file_ops struct for the "trace" special file defined llseek as seq_lseek().
However, if the file was opened for writing only, seq_open() was not called,
and the seek would dereference a null pointer, file->private_data.

This patch introduces a new wrapper for seq_lseek() which checks if the file
descriptor is opened for reading first. If not, it does nothing.

Cc: <stable@kernel.org>
Signed-off-by: Slava Pestov <slavapestov@google.com>
LKML-Reference: <1290640396-24179-1-git-send-email-slavapestov@google.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
451a3c24b0135bce54542009b5fde43846c7cf67 17-Nov-2010 Arnd Bergmann <arnd@arndb.de> BKL: remove extraneous #include <smp_lock.h>

The big kernel lock has been removed from all these files at some point,
leaving only the #include.

Remove this too as a cleanup.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
91e86e560d0b3ce4c5fc64fd2bbb99f856a30a4e 10-Nov-2010 Steven Rostedt <srostedt@redhat.com> tracing: Fix recursive user stack trace

The user stack trace can fault when examining the trace. Which
would call the do_page_fault handler, which would trace again,
which would do the user stack trace, which would fault and call
do_page_fault again ...

Thus this is causing a recursive bug. We need to have a recursion
detector here.

[ Resubmitted by Jiri Olsa ]

[ Eric Dumazet recommended using __this_cpu_* instead of __get_cpu_* ]

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Jiri Olsa <jolsa@redhat.com>
LKML-Reference: <1289390172-9730-3-git-send-email-jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
dd49a38cf30944be27892c10b1c0e5b3fa73bcb2 21-Oct-2010 Steven Rostedt <srostedt@redhat.com> tracing: Do not limit the size of the number of CPU buffers

The tracing per_cpu buffers were limited to 999 CPUs for a mear
savings in stack space of a char array. Up the array to 30 characters
which is more than enough to hold a 64 bit number.

Reported-by: Robin Holt <holt@sgi.com>
Suggested-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
907f27840985fe6a0c62e43cd4702c6e04b4bcc7 28-Sep-2010 matt mooney <mfm@muteddisk.com> tracing/trivial: Remove cast from void*

Unnecessary cast from void* in assignment.

Signed-off-by: matt mooney <mfm@muteddisk.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
1aa54bca6ee0d07ebcafb8ca8074b624d80724aa 28-Jul-2010 Marcin Slusarz <marcin.slusarz@gmail.com> tracing: Sanitize value returned from write(trace_marker, "...", len)

When userspace code writes non-new-line-terminated string to trace_marker
file, write handler appends new-line and returns number of bytes written
to trace buffer, so
write(fd, "abc", 3) will return 4

That's unexpected and unfortunately it confuses glibc's fprintf function.

Example:
int main() {
fprintf(stderr, "abc");
return 0;
}

$ gcc test.c -o test
$ echo mmiotrace > /sys/kernel/debug/tracing/current_tracer
$ ./test 2>/sys/kernel/debug/tracing/trace_marker

results in infinite loop:
write(fd, "abc", 3) = 4
write(fd, "", 1) = 0
write(fd, "", 1) = 0
write(fd, "", 1) = 0
write(fd, "", 1) = 0
write(fd, "", 1) = 0
write(fd, "", 1) = 0
write(fd, "", 1) = 0
(...)

...and kernel trace buffer full of empty markers.

Fix it by sanitizing write return value.

Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
LKML-Reference: <20100727231801.GB2826@joi.lan>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
955b61e597984745fb7d34c75708f6503b6aaeab 05-Aug-2010 Jason Wessel <jason.wessel@windriver.com> ftrace,kdb: Extend kdb to be able to dump the ftrace buffer

Add in a helper function to allow the kdb shell to dump the ftrace
buffer.

Modify trace.c to expose the capability to iterate over the ftrace
buffer in a read only capacity.

Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
CC: Frederic Weisbecker <fweisbec@gmail.com>
24a461d537f49f9da6533d83100999ea08c6c755 10-Jul-2010 Dan Carpenter <error27@gmail.com> trace: strlen() return doesn't account for the NULL

We need to add one to the strlen() return because of the NULL
character. The type->name here generally comes from the kernel and I
don't think any of them come close to being MAX_TRACER_SIZE (100)
characters long so this is basically a cleanup.

Signed-off-by: Dan Carpenter <error27@gmail.com>
LKML-Reference: <20100710100644.GV19184@bicker>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
ef710e100c1068d3dd5774d2b34c5485219e06ce 01-Jul-2010 KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> tracing: Shrink max latency ringbuffer if unnecessary

Documentation/trace/ftrace.txt says

buffer_size_kb:

This sets or displays the number of kilobytes each CPU
buffer can hold. The tracer buffers are the same size
for each CPU. The displayed number is the size of the
CPU buffer and not total size of all buffers. The
trace buffers are allocated in pages (blocks of memory
that the kernel uses for allocation, usually 4 KB in size).
If the last page allocated has room for more bytes
than requested, the rest of the page will be used,
making the actual allocation bigger than requested.
( Note, the size may not be a multiple of the page size
due to buffer management overhead. )

This can only be updated when the current_tracer
is set to "nop".

But it's incorrect. currently total memory consumption is
'buffer_size_kb x CPUs x 2'.

Why two times difference is there? because ftrace implicitly allocate
the buffer for max latency too.

That makes sad result when admin want to use large buffer. (If admin
want full logging and makes detail analysis). example, If admin
have 24 CPUs machine and write 200MB to buffer_size_kb, the system
consume ~10GB memory (200MB x 24 x 2). umm.. 5GB memory waste is
usually unacceptable.

Fortunatelly, almost all users don't use max latency feature.
The max latency buffer can be disabled easily.

This patch shrink buffer size of the max latency buffer if
unnecessary.

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
LKML-Reference: <20100701104554.DA2D.A69D9226@jp.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
e870e9a1240bcef1157ffaaf71dac63362e71904 02-Jul-2010 Li Zefan <lizf@cn.fujitsu.com> tracing: Allow to disable cmdline recording

We found that even enabling a single trace event that will rarely be
triggered can add big overhead to context switch.

(lmbench context switch test)
-------------------------------------------------
2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
ctxsw ctxsw ctxsw ctxsw ctxsw ctxsw ctxsw
------ ------ ------ ------ ------ ------- -------
2.19 2.3 2.21 2.56 2.13 2.54 2.07
2.39 2.51 2.35 2.75 2.27 2.81 2.24

The overhead is 6% ~ 11%.

It's because when a trace event is enabled 3 tracepoints (sched_switch,
sched_wakeup, sched_wakeup_new) will be activated to map pid to cmdname.

We'd like to avoid this overhead, so add a trace option '(no)record-cmd'
to allow to disable cmdline recording.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4C2D57F4.2050204@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
b444786f1a797a7f84e2561346a670649f9c7b3c 07-Jul-2010 Arnd Bergmann <arnd@arndb.de> tracing: Use generic_file_llseek for debugfs

The default for llseek will change to no_llseek,
so the tracing debugfs files need to add explicit
.llseek assignments. Since we're dealing with regular
files from a VFS perspective, use generic_file_llseek.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: John Kacur <jkacur@redhat.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <1278538820-1392-10-git-send-email-arnd@arndb.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
eb7beb5c09af75494234ea6acd09d0a647cf7338 16-Jul-2010 Frederic Weisbecker <fweisbec@gmail.com> tracing: Remove special traces

Special traces type was only used by sysprof. Lets remove it now
that sysprof ftrace plugin has been dropped.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Soeren Sandmann <sandmann@daimi.au.dk>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
f376bf5ffbad863d4bc3b2586b7e34cdf756ad17 16-Jul-2010 Frederic Weisbecker <fweisbec@gmail.com> tracing: Remove sysprof ftrace plugin

The sysprof ftrace plugin doesn't seem to be seriously used
somewhere. There is a branch in the sysprof tree that makes
an interface to it, but the real sysprof tool uses either its
own module or perf events.

Drop the sysprof ftrace plugin then, as it's mostly useless.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Soeren Sandmann <sandmann@daimi.au.dk>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
5e3d20a68f63fc5a310687d81956c3b96e488b84 04-Jul-2010 Arnd Bergmann <arnd@arndb.de> init: Remove the BKL from startup code

I have shown by code review that no driver takes
the BKL at init time any more, so whatever the
init code was locking against is no longer there
and it is now safe to remove the BKL there.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Steven Rostedt <rostedt@goodmis>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
30dbb20e68e6f7df974b77d2350ebad5eb6f6c9e 26-May-2010 Américo Wang <xiyou.wangcong@gmail.com> tracing: Remove boot tracer

The boot tracer is useless. It simply logs the initcalls
but in fact these initcalls are also logged through printk
while using the initcall_debug kernel parameter.

Nobody seem to be using it so far. Then just remove it.

Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Chase Douglas <chase.douglas@canonical.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <20100526105753.GA5677@cr0.nay.redhat.com>
[ remove the hooks in main.c, and the headers ]
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
5168ae50a66e3ff7184c2b16d661bd6d70367e50 03-Jun-2010 Steven Rostedt <srostedt@redhat.com> tracing: Remove ftrace_preempt_disable/enable

The ftrace_preempt_disable/enable functions were to address a
recursive race caused by the function tracer. The function tracer
traces all functions which makes it easily susceptible to recursion.
One area was preempt_enable(). This would call the scheduler and
the schedulre would call the function tracer and loop.
(So was it thought).

The ftrace_preempt_disable/enable was made to protect against recursion
inside the scheduler by storing the NEED_RESCHED flag. If it was
set before the ftrace_preempt_disable() it would not call schedule
on ftrace_preempt_enable(), thinking that if it was set before then
it would have already scheduled unless it was already in the scheduler.

This worked fine except in the case of SMP, where another task would set
the NEED_RESCHED flag for a task on another CPU, and then kick off an
IPI to trigger it. This could cause the NEED_RESCHED to be saved at
ftrace_preempt_disable() but the IPI to arrive in the the preempt
disabled section. The ftrace_preempt_enable() would not call the scheduler
because the flag was already set before entring the section.

This bug would cause a missed preemption check and cause lower latencies.

Investigating further, I found that the recusion caused by the function
tracer was not due to schedule(), but due to preempt_schedule(). Now
that preempt_schedule is completely annotated with notrace, the recusion
no longer is an issue.

Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2711ca237a084286ea1c2dcf82ab2aadab23a00d 21-May-2010 Steven Rostedt <srostedt@redhat.com> ring-buffer: Move zeroing out excess in page to ring buffer code

Currently the trace splice code zeros out the excess bytes in the page before
sending it off to userspace.

This is to make sure userspace is not getting anything it should not be
when reading the pages, because the excess data was never initialized
to zero before writing (for perfomance reasons).

But the splice code has no business in doing this work, it should be
done by the ring buffer. With the latest changes for recording lost
events, the splice code gets it wrong anyway.

Move the zeroing out of excess bytes into the ring buffer code.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
35f3d14dbbc58447c61e38a162ea10add6b31dc7 20-May-2010 Jens Axboe <jens.axboe@oracle.com> pipe: add support for shrinking and growing pipes

This patch adds F_GETPIPE_SZ and F_SETPIPE_SZ fcntl() actions for
growing and shrinking the size of a pipe and adjusts pipe.c and splice.c
(and relay and network splice) usage to work with these larger (or smaller)
pipes.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
a9a5776380208a3e48a92d0c763ee1a3b486fb73 23-Apr-2010 Steven Rostedt <srostedt@redhat.com> tracing: Allow events to share their print functions

Multiple events may use the same method to print their data.
Instead of having all events have a pointer to their print funtions,
the trace_event structure now points to a trace_event_functions structure
that will hold the way to print ouf the event.

The event itself is now passed to the print function to let the print
function know what kind of event it should print.

This opens the door to consolidating the way several events print
their output.

text data bss dec hex filename
4913961 1088356 861512 6863829 68bbd5 vmlinux.orig
4900382 1048964 861512 6810858 67ecea vmlinux.init
4900446 1049028 861512 6810986 67ed6a vmlinux.preprint

This change slightly increases the size but is needed for the next change.

v3: Fix the branch tracer events to handle this change.

v2: Fix the new function graph tracer event calls to handle this change.

Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
72c9ddfd4c5bf54ef03cfdf57026416cb678eeba 21-Apr-2010 David Miller <davem@davemloft.net> ring-buffer: Make non-consuming read less expensive with lots of cpus.

When performing a non-consuming read, a synchronize_sched() is
performed once for every cpu which is actively tracing.

This is very expensive, and can make it take several seconds to open
up the 'trace' file with lots of cpus.

Only one synchronize_sched() call is actually necessary. What is
desired is for all cpus to see the disabling state change. So we
transform the existing sequence:

for_each_cpu() {
ring_buffer_read_start();
}

where each ring_buffer_start() call performs a synchronize_sched(),
into the following:

for_each_cpu() {
ring_buffer_read_prepare();
}
ring_buffer_read_prepare_sync();
for_each_cpu() {
ring_buffer_read_start();
}

wherein only the single ring_buffer_read_prepare_sync() call needs to
do the synchronize_sched().

The first phase, via ring_buffer_read_prepare(), allocates the 'iter'
memory and increments ->record_disabled.

In the second phase, ring_buffer_read_prepare_sync() makes sure this
->record_disabled state is visible fully to all cpus.

And in the final third phase, the ring_buffer_read_start() calls reset
the 'iter' objects allocated in the first phase since we now know that
none of the cpus are adding trace entries any more.

This makes openning the 'trace' file nearly instantaneous on a
sparc64 Niagara2 box with 128 cpus tracing.

Signed-off-by: David S. Miller <davem@davemloft.net>
LKML-Reference: <20100420.154711.11246950.davem@davemloft.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
62b915f1060996a8e1f69be50e3b8e9e43b710cb 02-Apr-2010 Jiri Olsa <jolsa@redhat.com> tracing: Add graph output support for irqsoff tracer

Add function graph output to irqsoff tracer.

The graph output is enabled by setting new 'display-graph' trace option.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
LKML-Reference: <1270227683-14631-4-git-send-email-jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
cecbca96da387428e220e307a9c945e37e2f4d9e 18-Apr-2010 Frederic Weisbecker <fweisbec@gmail.com> tracing: Dump either the oops's cpu source or all cpus buffers

The ftrace_dump_on_oops kernel parameter, sysctl and sysrq let one
dump every cpu buffers when an oops or panic happens.

It's nice when you have few cpus but it may take ages if have many,
plus you miss the real origin of the problem in all the cpu traces.

Sometimes, all you need is to dump the cpu buffer that triggered the
opps, most of the time it is our main interest.

This patch modifies ftrace_dump_on_oops to handle this choice.

The ftrace_dump_on_oops kernel parameter, when it comes alone, has
the same behaviour than before. But ftrace_dump_on_oops=orig_cpu
will only dump the buffer of the cpu that oops'ed.

Similarly, sysctl kernel.ftrace_dump_on_oops=1 and
echo 1 > /proc/sys/kernel/ftrace_dump_on_oops keep their previous
behaviour. But setting 2 jumps into cpu origin dump mode.

v2: Fix double setup
v3: Fix spelling issues reported by Randy Dunlap
v4: Also update __ftrace_dump in the selftests

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
aa27497c2fb4c7f57706099bd489e683e5cc3e3b 05-Apr-2010 Lai Jiangshan <laijs@cn.fujitsu.com> tracing: Fix uninitialized variable of tracing/trace output

Because a local variable is not initialized, I got these
when I did 'cat tracing/trace'. (not trace_pipe):

CPU:0 [LOST 18446744071579453134 EVENTS]
ps-3099 [000] 560.770221: lock_acquire: ffff880030865010 &(&dentry->d_lock)->rlock
CPU:0 [LOST 18446744071579453134 EVENTS]
ps-3099 [000] 560.770221: lock_release: ffff880030865010 &(&dentry->d_lock)->rlock
CPU:0 [LOST 18446612133255294080 EVENTS]
ps-3099 [000] 560.770221: lock_acquire: ffff880030865010 &(&dentry->d_lock)->rlock
CPU:0 [LOST 18446744071579453134 EVENTS]
ps-3099 [000] 560.770222: lock_release: ffff880030865010 &(&dentry->d_lock)->rlock
CPU:0 [LOST 18446744071579453134 EVENTS]
ps-3099 [000] 560.770222: lock_release: ffffffff816cfb98 dcache_lock

See peek_next_entry(), it does not set *lost_events when we 'cat tracing/trace'

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
LKML-Reference: <4BB9A929.2000303@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
bc21b478425ac73f66a5ec0b375a5e0d12d609ce 01-Apr-2010 Steven Rostedt <srostedt@redhat.com> tracing: Show the lost events in the trace_pipe output

Now that the ring buffer can keep track of where events are lost.
Use this information to the output of trace_pipe:

hackbench-3588 [001] 1326.701660: lock_acquire: ffffffff816591e0 read rcu_read_lock
hackbench-3588 [001] 1326.701661: lock_acquire: ffff88003f4091f0 &(&dentry->d_lock)->rlock
hackbench-3588 [001] 1326.701664: lock_release: ffff88003f4091f0 &(&dentry->d_lock)->rlock
CPU:1 [LOST 673 EVENTS]
hackbench-3588 [001] 1326.702711: kmem_cache_free: call_site=ffffffff81102b85 ptr=ffff880026d96738
hackbench-3588 [001] 1326.702712: lock_release: ffff88003e1480a8 &mm->mmap_sem
hackbench-3588 [001] 1326.702713: lock_acquire: ffff88003e1480a8 &mm->mmap_sem

Even works with the function graph tracer:

2) ! 170.098 us | }
2) 4.036 us | rcu_irq_exit();
2) 3.657 us | idle_cpu();
2) ! 190.301 us | }
CPU:2 [LOST 2196 EVENTS]
2) 0.853 us | } /* cancel_dirty_page */
2) | remove_from_page_cache() {
2) 1.578 us | _raw_spin_lock_irq();
2) | __remove_from_page_cache() {

Note, it does not work with the iterator "trace" file, since it requires
the use of consuming the page from the ring buffer to determine how many
events were lost, which the iterator does not do.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
66a8cb95ed04025664d1db4e952155ee1dccd048 31-Mar-2010 Steven Rostedt <srostedt@redhat.com> ring-buffer: Add place holder recording of dropped events

Currently, when the ring buffer drops events, it does not record
the fact that it did so. It does inform the writer that the event
was dropped by returning a NULL event, but it does not put in any
place holder where the event was dropped.

This is not a trivial thing to add because the ring buffer mostly
runs in overwrite (flight recorder) mode. That is, when the ring
buffer is full, new data will overwrite old data.

In a produce/consumer mode, where new data is simply dropped when
the ring buffer is full, it is trivial to add the placeholder
for dropped events. When there's more room to write new data, then
a special event can be added to notify the reader about the dropped
events.

But in overwrite mode, any new write can overwrite events. A place
holder can not be inserted into the ring buffer since there never
may be room. A reader could also come in at anytime and miss the
placeholder.

Luckily, the way the ring buffer works, the read side can find out
if events were lost or not, and how many events. Everytime a write
takes place, if it overwrites the header page (the next read) it
updates a "overrun" variable that keeps track of the number of
lost events. When a reader swaps out a page from the ring buffer,
it can record this number, perfom the swap, and then check to
see if the number changed, and take the diff if it has, which would be
the number of events dropped. This can be stored by the reader
and returned to callers of the reader.

Since the reader page swap will fail if the writer moved the head
page since the time the reader page set up the swap, this gives room
to record the overruns without worrying about races. If the reader
sets up the pages, records the overrun, than performs the swap,
if the swap succeeds, then the overrun variable has not been
updated since the setup before the swap.

For binary readers of the ring buffer, a flag is set in the header
of each sub page (sub buffer) of the ring buffer. This flag is embedded
in the size field of the data on the sub buffer, in the 31st bit (the size
can be 32 or 64 bits depending on the architecture), but only 27
bits needs to be used for the actual size (less actually).

We could add a new field in the sub buffer header to also record the
number of events dropped since the last read, but this will change the
format of the binary ring buffer a bit too much. Perhaps this change can
be made if the information on the number of events dropped is considered
important enough.

Note, the notification of dropped events is only used by consuming reads
or peeking at the ring buffer. Iterating over the ring buffer does not
keep this information because the necessary data is only available when
a page swap is made, and the iterator does not swap out pages.

Cc: Robert Richter <robert.richter@amd.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: "Luis Claudio R. Goncalves" <lclaudio@uudg.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
5a0e3ad6af8660be21ca98a971cd00f331318c05 24-Mar-2010 Tejun Heo <tj@kernel.org> include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.

2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).

* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
b6345879ccbd9b92864fbd7eb8ac48acdb4d6b15 13-Mar-2010 Steven Rostedt <srostedt@redhat.com> tracing: Do not record user stack trace from NMI context

A bug was found with Li Zefan's ftrace_stress_test that caused applications
to segfault during the test.

Placing a tracing_off() in the segfault code, and examining several
traces, I found that the following was always the case. The lock tracer
was enabled (lockdep being required) and userstack was enabled. Testing
this out, I just enabled the two, but that was not good enough. I needed
to run something else that could trigger it. Running a load like hackbench
did not work, but executing a new program would. The following would
trigger the segfault within seconds:

# echo 1 > /debug/tracing/options/userstacktrace
# echo 1 > /debug/tracing/events/lock/enable
# while :; do ls > /dev/null ; done

Enabling the function graph tracer and looking at what was happening
I finally noticed that all cashes happened just after an NMI.

1) | copy_user_handle_tail() {
1) | bad_area_nosemaphore() {
1) | __bad_area_nosemaphore() {
1) | no_context() {
1) | fixup_exception() {
1) 0.319 us | search_exception_tables();
1) 0.873 us | }
[...]
1) 0.314 us | __rcu_read_unlock();
1) 0.325 us | native_apic_mem_write();
1) 0.943 us | }
1) 0.304 us | rcu_nmi_exit();
[...]
1) 0.479 us | find_vma();
1) | bad_area() {
1) | __bad_area() {

After capturing several traces of failures, all of them happened
after an NMI. Curious about this, I added a trace_printk() to the NMI
handler to read the regs->ip to see where the NMI happened. In which I
found out it was here:

ffffffff8135b660 <page_fault>:
ffffffff8135b660: 48 83 ec 78 sub $0x78,%rsp
ffffffff8135b664: e8 97 01 00 00 callq ffffffff8135b800 <error_entry>

What was happening is that the NMI would happen at the place that a page
fault occurred. It would call rcu_read_lock() which was traced by
the lock events, and the user_stack_trace would run. This would trigger
a page fault inside the NMI. I do not see where the CR2 register is
saved or restored in NMI handling. This means that it would corrupt
the page fault handling that the NMI interrupted.

The reason the while loop of ls helped trigger the bug, was that
each execution of ls would cause lots of pages to be faulted in, and
increase the chances of the race happening.

The simple solution is to not allow user stack traces in NMI context.
After this patch, I ran the above "ls" test for a couple of hours
without any issues. Without this patch, the bug would trigger in less
than a minute.

Cc: stable@kernel.org
Reported-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
a2f8071428ed9a0f06865f417c962421c9a6b488 13-Mar-2010 Steven Rostedt <srostedt@redhat.com> tracing: Disable buffer switching when starting or stopping trace

When the trace iterator is read, tracing_start() and tracing_stop()
is called to stop tracing while the iterator is processing the trace
output.

These functions disable both the standard buffer and the max latency
buffer. But if the wakeup tracer is running, it can switch these
buffers between the two disables:

buffer = global_trace.buffer;
if (buffer)
ring_buffer_record_disable(buffer);

<<<--------- swap happens here

buffer = max_tr.buffer;
if (buffer)
ring_buffer_record_disable(buffer);

What happens is that we disabled the same buffer twice. On tracing_start()
we can enable the same buffer twice. All ring_buffer_record_disable()
must be matched with a ring_buffer_record_enable() or the buffer
can be disable permanently, or enable prematurely, and cause a bug
where a reset happens while a trace is commiting.

This patch protects these two by taking the ftrace_max_lock to prevent
a switch from occurring.

Found with Li Zefan's ftrace_stress_test.

Cc: stable@kernel.org
Reported-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
283740c619d211e34572cc93c8cdba92ccbdb9cc 13-Mar-2010 Steven Rostedt <srostedt@redhat.com> tracing: Use same local variable when resetting the ring buffer

In the ftrace code that resets the ring buffer it references the
buffer with a local variable, but then uses the tr->buffer as the
parameter to reset. If the wakeup tracer is running, which can
switch the tr->buffer with the max saved buffer, this can break
the requirement of disabling the buffer before the reset.

buffer = tr->buffer;
ring_buffer_record_disable(buffer);
synchronize_sched();
__tracing_reset(tr->buffer, cpu);

If the tr->buffer is swapped, then the reset is not happening to the
buffer that was disabled. This will cause the ring buffer to fail.

Found with Li Zefan's ftrace_stress_test.

Cc: stable@kernel.org
Reported-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
0e95017355dcf43031da6d0e360a748717e56df1 26-Feb-2010 Tim Bird <tim.bird@am.sony.com> function-graph: Add tracing_thresh support to function_graph tracer

Add support for tracing_thresh to the function_graph tracer. This
version of this feature isolates the checks into new entry and
return functions, to avoid adding more conditional code into the
main function_graph paths.

When the tracing_thresh is set and the function graph tracer is
enabled, only the functions that took longer than the time in
microseconds that was set in tracing_thresh are recorded. To do this
efficiently, only the function exits are recorded:

[tracing]# echo 100 > tracing_thresh
[tracing]# echo function_graph > current_tracer
[tracing]# cat trace
# tracer: function_graph
#
# CPU DURATION FUNCTION CALLS
# | | | | | | |
1) ! 119.214 us | } /* smp_apic_timer_interrupt */
1) <========== |
0) ! 101.527 us | } /* __rcu_process_callbacks */
0) ! 126.461 us | } /* rcu_process_callbacks */
0) ! 145.111 us | } /* __do_softirq */
0) ! 149.667 us | } /* do_softirq */
0) ! 168.817 us | } /* irq_exit */
0) ! 248.254 us | } /* smp_apic_timer_interrupt */

Also, add support for specifying tracing_thresh on the kernel
command line. When used like so: "tracing_thresh=200 ftrace=function_graph"
this can be used to analyse system startup. It is important to disable
tracing soon after boot, in order to avoid losing the trace data.

Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Tim Bird <tim.bird@am.sony.com>
LKML-Reference: <4B87098B.4040308@am.sony.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
1acaa1b2d9b5904c9cce06122990a2d71046ce16 05-Mar-2010 Arnaldo Carvalho de Melo <acme@redhat.com> tracing: Update the comm field in the right variable in update_max_tr

The latency output showed:

# | task: -3 (uid:0 nice:0 policy:1 rt_prio:99)

The comm is missing in the "task:" and it looks like a minus 3 is
the output. The correct display should be:

# | task: migration/0-3 (uid:0 nice:0 policy:1 rt_prio:99)

The problem is that the comm is being stored in the wrong data
structure. The max_tr.data[cpu] is what stores the comm, not the
tr->data[cpu].

Before this patch the max_tr.data[cpu]->comm was zeroed and the /debug/trace
ended up showing just the '-' sign followed by the pid.

Also remove a needless initialization of max_data.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
LKML-Reference: <1267824230-23861-1-git-send-email-acme@infradead.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
ac91d85456372a90af5b85eb6620fd2efb1e431b 02-Mar-2010 Lai Jiangshan <laijs@cn.fujitsu.com> tracing: Fix warning in s_next of trace file ops

This warning in s_next() can be triggered by lseek():
[<c018b3f7>] ? s_next+0x77/0x80
[<c013e3c1>] warn_slowpath_common+0x81/0xa0
[<c018b3f7>] ? s_next+0x77/0x80
[<c013e3fa>] warn_slowpath_null+0x1a/0x20
[<c018b3f7>] s_next+0x77/0x80
[<c01efa77>] traverse+0x117/0x200
[<c01eff13>] seq_lseek+0xa3/0x120
[<c01efe70>] ? seq_lseek+0x0/0x120
[<c01d7081>] vfs_llseek+0x41/0x50
[<c01d8116>] sys_llseek+0x66/0xa0
[<c0102bd0>] sysenter_do_call+0x12/0x26

The iterator "leftover" variable is zeroed in the opening of the trace
file. But lseek can call s_start() which will call s_next() without
reseting the "leftover" variable back to zero, which might trigger
the WARN_ON_ONCE(iter->leftover) that is in s_next().

Cc: stable@kernel.org
Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
LKML-Reference: <4B8CE06A.9090207@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
74bf4076f2ed79b5510440b72a561823a8852ec0 25-Jan-2010 Steven Rostedt <srostedt@redhat.com> tracing: Prevent kernel oops with corrupted buffer

If the contents of the ftrace ring buffer gets corrupted and the trace
file is read, it could create a kernel oops (usualy just killing the user
task thread). This is caused by the checking of the pid in the buffer.
If the pid is negative, it still references the cmdline cache array,
which could point to an invalid address.

The simple fix is to test for negative PIDs.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
d931369b74b3d6f2044f595af6f3dd074f65d9cf 06-Jan-2010 Steven Rostedt <srostedt@redhat.com> tracing: Add stack dump to trace_printk if stacktrace option is set

If the ftrace stacktrace option is set, then add the stack dumps to
trace_printk.

Requested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
7e53bd42d14c75192b99674c40fcc359392da59d 06-Jan-2010 Lai Jiangshan <laijs@cn.fujitsu.com> tracing: Consolidate protection of reader access to the ring buffer

At the beginning, access to the ring buffer was fully serialized
by trace_types_lock. Patch d7350c3f4569 gives more freedom to readers,
and patch b04cc6b1f6 adds code to protect trace_pipe and cpu#/trace_pipe.

But actually it is not enough, ring buffer readers are not always
read-only, they may consume data.

This patch makes accesses to trace, trace_pipe, trace_pipe_raw
cpu#/trace, cpu#/trace_pipe and cpu#/trace_pipe_raw serialized.
And removes tracing_reader_cpumask which is used to protect trace_pipe.

Details:

Ring buffer serializes readers, but it is low level protection.
The validity of the events (which returns by ring_buffer_peek() ..etc)
are not protected by ring buffer.

The content of events may become garbage if we allow another process to consume
these events concurrently:
A) the page of the consumed events may become a normal page
(not reader page) in ring buffer, and this page will be rewritten
by the events producer.
B) The page of the consumed events may become a page for splice_read,
and this page will be returned to system.

This patch adds trace_access_lock() and trace_access_unlock() primitives.

These primitives allow multi process access to different cpu ring buffers
concurrently.

These primitives don't distinguish read-only and read-consume access.
Multi read-only access is also serialized.

And we don't use these primitives when we open files,
we only use them when we read files.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
LKML-Reference: <4B447D52.1050602@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
c757bea93bea4b77ebd181cc6dca60c15e3b1a2c 22-Dec-2009 Steven Rostedt <srostedt@redhat.com> tracing: Fix setting tracer specific options

The function __set_tracer_option() takes as its last parameter a
"neg" value. If set it should negate the value of the option.

The trace_options_write() passed the value written to the file
which is what the new value needs to be set as. But since this
is not the negative, it never sets the value.

Reported-by: Peter Zijlstra <peterz@infradead.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
28dfef8febe48f59cf1e7596e1992a6a1893ca24 16-Dec-2009 Alexey Dobriyan <adobriyan@gmail.com> const: constify remaining pipe_buf_operations

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
e36c54582c6f14adc9e10473e2aec2cc4f0acc03 14-Dec-2009 Steven Rostedt <srostedt@redhat.com> tracing: Fix return of trace_dump_stack()

The trace_dump_stack() returned a value for a void function.

Also, added the missing stub for trace_dump_stack() when tracing is
not configured.

Reported-by: Ingo Molnar <mingo@elte.hu>
LKML-Reference: <20091214162713.GA31060@elte.hu>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
0199c4e68d1f02894bdefe4b5d9e9ee4aedd8d62 02-Dec-2009 Thomas Gleixner <tglx@linutronix.de> locking: Convert __raw_spin* functions to arch_spin*

Name space cleanup. No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: linux-arch@vger.kernel.org
edc35bd72e2079b25f99c5da7d7a65dbbffc4a26 03-Dec-2009 Thomas Gleixner <tglx@linutronix.de> locking: Rename __RAW_SPIN_LOCK_UNLOCKED to __ARCH_SPIN_LOCK_UNLOCKED

Further name space cleanup. No functional change

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: linux-arch@vger.kernel.org
445c89514be242b1b0080056d50bdc1b72adeb5c 02-Dec-2009 Thomas Gleixner <tglx@linutronix.de> locking: Convert raw_spinlock to arch_spinlock

The raw_spin* namespace was taken by lockdep for the architecture
specific implementations. raw_spin_* would be the ideal name space for
the spinlocks which are not converted to sleeping locks in preempt-rt.

Linus suggested to convert the raw_ to arch_ locks and cleanup the
name space instead of using an artifical name like core_spin,
atomic_spin or whatever

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: linux-arch@vger.kernel.org
8d18eaaff5acaa58369be342c86e607643ce10c7 08-Dec-2009 Li Zefan <lizf@cn.fujitsu.com> tracing: Simplify trace_option_write()

- remove duplicate code inside trace_options_write()
- extract duplicate code in trace_options_write() and set_tracer_option()

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <4B1DC532.9010802@cn.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2cbafd68b826f8e0471875cf33cdfb8a1478aef1 08-Dec-2009 Li Zefan <lizf@cn.fujitsu.com> tracing: Remove useless trace option

Since commit 4d9493c90f8e6e1b164aede3814010a290161abb
("ftrace: remove add-hoc code"), option "sched-tree"
has become useless.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <4B1DC50A.7040402@cn.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
13f16d209161c95e92aef40e350cc6cf56ac440b 08-Dec-2009 Li Zefan <lizf@cn.fujitsu.com> tracing: Use seq file for trace_clock

The buffer for the output is as small as 64 bytes, so it'll
overflow if we add more clock type. Use seq file instead.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <4B1DC4FB.5030407@cn.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
fdb372ed4cadbfe9dbba0e932a77d0523682e690 08-Dec-2009 Li Zefan <lizf@cn.fujitsu.com> tracing: Use seq file for trace_options

Code simplification for reading trace_options.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
LKML-reference: <4B1DC4EF.3090106@cn.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
273b281fa22c293963ee3e6eec418f5dda2dbc83 18-Oct-2009 Sam Ravnborg <sam@ravnborg.org> kbuild: move utsrelease.h to include/generated

Fix up all users of utsrelease.h

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Michal Marek <mmarek@suse.cz>
03889384cee7a198a79447c1ea6aca2c8e54d155 11-Dec-2009 Steven Rostedt <srostedt@redhat.com> tracing: Add trace_dump_stack()

I've been asked a few times about how to find out what is calling
some location in the kernel. One way is to use dynamic function tracing
and implement the func_stack_trace. But this only finds out who is
calling a particular function. It does not tell you who is calling
that function and entering a specific if conditional.

I have myself implemented a quick version of trace_dump_stack() for
this purpose a few times, and just needed it now. This is when I realized
that this would be a good tool to have in the kernel like trace_printk().

Using trace_dump_stack() is similar to dump_stack() except that it
writes to the trace buffer instead and can be used in critical locations.

For example:

@@ -5485,8 +5485,12 @@ need_resched_nonpreemptible:
if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) {
if (unlikely(signal_pending_state(prev->state, prev)))
prev->state = TASK_RUNNING;
- else
+ else {
deactivate_task(rq, prev, 1);
+ trace_printk("Deactivating task %s:%d\n",
+ prev->comm, prev->pid);
+ trace_dump_stack();
+ }
switch_count = &prev->nvcsw;
}

Produces:

<...>-3249 [001] 296.105269: schedule: Deactivating task ntpd:3249
<...>-3249 [001] 296.105270: <stack trace>
=> schedule
=> schedule_hrtimeout_range
=> poll_schedule_timeout
=> do_select
=> core_sys_select
=> sys_select
=> system_call_fastpath

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
f2942487ffb0c0a80b2312f667ea30dd55a24bb0 06-Dec-2009 Carsten Emde <Carsten.Emde@osadl.org> tracing: Remove comparing of NULL to va_list in trace_array_vprintk()

Olof Johansson stated the following:

Comparing a va_list with NULL is bogus. It's supposed to be treated like
an opaque type and only be manipulated with va_* accessors.

Olof noticed that this code broke the ARM builds:

kernel/trace/trace.c: In function 'trace_array_vprintk':
kernel/trace/trace.c:1364: error: invalid operands to binary == (have 'va_list' and 'void *')
kernel/trace/trace.c: In function 'tracing_mark_write':
kernel/trace/trace.c:3349: error: incompatible type for argument 3 of 'trace_vprintk'

This patch partly reverts c13d2f7c3231e873f30db92b96c8caa48f100f33 and
re-installs the original mark_printk() mechanism.

Reported-by: Olof Johansson <olof@lixom.net>
Signed-off-by: Carsten Emde <C.Emde@osadl.org>
LKML-Reference: <4B1BAB74.104@osadl.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
a63ce5b306855bccdacba95c03bfc293316c8ae3 07-Dec-2009 Steven Rostedt <srostedt@redhat.com> tracing: Buffer the output of seq_file in case of filled buffer

If the seq_read fills the buffer it will call s_start again on the next
itertation with the same position. This causes a problem with the
function_graph tracer because it consumes the iteration in order to
determine leaf functions.

What happens is that the iterator stores the entry, and the function
graph plugin will look at the next entry. If that next entry is a return
of the same function and task, then the function is a leaf and the
function_graph plugin calls ring_buffer_read which moves the ring buffer
iterator forward (the trace iterator still points to the function start
entry).

The copying of the trace_seq to the seq_file buffer will fail if the
seq_file buffer is full. The seq_read will not show this entry.
The next read by userspace will cause seq_read to again call s_start
which will reuse the trace iterator entry (the function start entry).
But the function return entry was already consumed. The function graph
plugin will think that this entry is a nested function and not a leaf.

To solve this, the trace code now checks the return status of the
seq_printf (trace_print_seq). If the writing to the seq_file buffer
fails, we set a flag in the iterator (leftover) and we do not reset
the trace_seq buffer. On the next call to s_start, we check the leftover
flag, and if it is set, we just reuse the trace_seq buffer and do not
call into the plugin print functions.

Before this patch:

2) | fput() {
2) | __fput() {
2) 0.550 us | inotify_inode_queue_event();
2) | __fsnotify_parent() {
2) 0.540 us | inotify_dentry_parent_queue_event();

After the patch:

2) | fput() {
2) | __fput() {
2) 0.550 us | inotify_inode_queue_event();
2) 0.548 us | __fsnotify_parent();
2) 0.540 us | inotify_dentry_parent_queue_event();

[
Updated the patch to fix a missing return 0 from the trace_print_seq()
stub when CONFIG_TRACING is disabled.

Reported-by: Ingo Molnar <mingo@elte.hu>
]

Reported-by: Jiri Olsa <jolsa@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
29bf4a5e3fed3dde3eb629a0cb1762c1e9217458 09-Dec-2009 Steven Rostedt <srostedt@redhat.com> tracing: Only call pipe_close if pipe_close is defined

This fixes a cut and paste error that had pipe_close get called
if pipe_open was defined (not pipe_close).

Reported-by: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com>
LKML-Reference: <20091209153204.F4CD.A69D9226@jp.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
c521efd1700a8c0f7ce26f011f5eaecca17fabfa 07-Dec-2009 Steven Rostedt <srostedt@redhat.com> tracing: Add pipe_close interface

An ftrace plugin can add a pipe_open interface when the user opens
trace_pipe. But if the plugin allocates something within the pipe_open
it can not free it because there exists no pipe_close. The hook to
the trace file open has a corresponding close. The closing of the
trace_pipe file should also have a corresponding close.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
c13d2f7c3231e873f30db92b96c8caa48f100f33 16-Nov-2009 Carsten Emde <Carsten.Emde@osadl.org> tracing: Fix trace_marker output

When a string was written to <debugfs>/tracing/trace_marker, some
strange characters appeared in the trace output instead of the
string, since a vprint function erroneously called a vararg print
function with a va_list argument. This patch fixes the problem and
simplifies the related code.

Signed-off-by: Carsten Emde <C.Emde@osadl.org>
LKML-Reference: <4B01AE5D.1010801@osadl.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
a646365cc330b5aaf4452c91f61b1e0d1acf68d0 11-Nov-2009 Roel Kluin <roel.kluin@gmail.com> tracing: Fix return value of tracing_stats_read()

The function tracing_stats_read() mistakenly returns ENOMEM instead
of the negative value -ENOMEM.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
LKML-Reference: <4AFB2C0B.50605@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
dd17c8f72993f9461e9c19250e3f155d6d99df22 29-Oct-2009 Rusty Russell <rusty@rustcorp.com.au> percpu: remove per_cpu__ prefix.

Now that the return from alloc_percpu is compatible with the address
of per-cpu vars, it makes sense to hand around the address of per-cpu
variables. To make this sane, we remove the per_cpu__ prefix we used
created to stop people accidentally using these vars directly.

Now we have sparse, we can use that (next patch).

tj: * Updated to convert stuff which were missed by or added after the
original patch.

* Kill per_cpu_var() macro.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
9705f69ed0a5ef593f45e618bcb3cbfdbf391f64 29-Oct-2009 Tejun Heo <tj@kernel.org> percpu: make percpu symbols in tracer unique

This patch updates percpu related symbols in kernel tracer such that
percpu symbols are unique and don't clash with local symbols. This
serves two purposes of decreasing the possibility of global percpu
symbol collision and allowing dropping per_cpu__ prefix from percpu
symbols.

* kernel/trace/trace.c: s/max_data/max_tr_data/
* kernel/trace/trace_hw_branches: s/tracer/hwb_tracer/, s/buffer/hwb_buffer/

Partly based on Rusty Russell's "alloc_percpu: rename percpu vars
which cause name clashes" patch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
cf8517cf905b5cd31d5790250b9ac39f7cb8aa53 24-Oct-2009 Jiri Olsa <jolsa@redhat.com> tracing: Update *ppos instead of filp->f_pos

Instead of directly updating filp->f_pos we should update the *ppos
argument. The filp->f_pos gets updated within the file_pos_write()
function called from sys_write().

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20091023233646.399670810@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
1beee96bae0daf7f491356777c3080cc436950f5 14-Oct-2009 Frederic Weisbecker <fweisbec@gmail.com> ftrace: Rename set_bootup_ftrace into set_cmdline_ftrace

set_cmdline_ftrace is a better match against what does this function:
apply a tracer name from the kernel command line.

Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
aef6f81b55f462082699c06e8e67e6eb5630ed45 12-Oct-2009 Frederic Weisbecker <fweisbec@gmail.com> tracing: Rename set_ftrace to set_bootup_ftrace

Do this rename because set_ftrace is too much generic and not enough
self-explainable as a name.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
9288f99aa52d90a5b82573c4b769c97c55af2f56 08-Oct-2009 Christoph Lameter <cl@linux-foundation.org> this_cpu: Use this_cpu_xx for ftrace

this_cpu_xx can reduce the instruction count here and also
avoid address arithmetic.

Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
Acked-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Tejun Heo <tj@kernel.org>
a813a159766ee9d36aa1fc717c60d63325a6d077 09-Oct-2009 Steven Rostedt <srostedt@redhat.com> tracing: fix trace_vprintk call

The addition of trace_array_{v}printk used the wrong function for
trace_vprintk to call. This broke trace_marker and trace_vprintk
itself. Although trace_printk may not have been affected by those
that end up calling trace_vbprintk.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
79f5599772ac2f138d7a75b8f3f06a93f09c75f7 15-Jun-2009 Li Zefan <lizf@cn.fujitsu.com> cpumask: use zalloc_cpumask_var() where possible

Remove open-coded zalloc_cpumask_var() and zalloc_cpumask_var_node().

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
88e9d34c727883d7d6f02cf1475b3ec98b8480c7 23-Sep-2009 James Morris <jmorris@namei.org> seq_file: constify seq_operations

Make all seq_operations structs const, to help mitigate against
revectoring user-triggerable function pointers.

This is derived from the grsecurity patch, although generated from scratch
because it's simpler than extracting the changes from there.

Signed-off-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3c235a337e205da0f614e456be72881483dcde6e 22-Sep-2009 Li Zefan <lizf@cn.fujitsu.com> tracing: Fix off-by-one in trace_get_user()

Leave the last slot for the tailing '\0'.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <4AB865FA.5080801@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
89f19f04dc72363d912fd007a399cb10310eff6e 19-Sep-2009 Andrew Morton <akpm@linux-foundation.org> sched: Fix raciness in runqueue_is_locked()

runqueue_is_locked() is unavoidably racy due to a poor interface design.
It does

cpu = get_cpu()
ret = some_perpcu_thing(cpu);
put_cpu(cpu);
return ret;

Its return value is unreliable.

Fix.

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <200909191855.n8JItiko022148@imap1.linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
ee6c2c1bd15e60a442d1861b66285f112ce4f25c 18-Sep-2009 Li Zefan <lizf@cn.fujitsu.com> tracing: remove max_tracer_type_len

Limit the length of a tracer's name within 100 chars, and then we
don't have to play with max_tracer_type_len.

Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4AB32377.9020601@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
60ba77022712c7cda0eda286154bae160446b24a 13-Sep-2009 Steven Rostedt <srostedt@redhat.com> tracing: add filter event logic to special, mmiotrace and boot tracers

Now that the pluging tracers use macros to create the structures and
automate the exporting of their formats to the format files, they also
automatically get a filter file.

This patch adds the code to implement the filter logic in the trace
recordings.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
b5130b1e7d3717d03ab1916b198bf0d49fa0a619 13-Sep-2009 Carsten Emde <Carsten.Emde@osadl.org> tracing: do not update tracing_max_latency when tracer is stopped

The state of the function pair tracing_stop()/tracing_start() is
correctly considered when tracer data are updated. However, the global
and externally accessible variable tracing_max_latency is always updated
- even when tracing is stopped.

The update should only occur, if tracing was not stopped.

Signed-off-by: Carsten Emde <C.Emde@osadl.org>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
b63f39ea50330f836e301ddda21c6a93dcf0d6a3 11-Sep-2009 jolsa@redhat.com <jolsa@redhat.com> tracing: create generic trace parser

Create a "trace_parser" that can parse the user space input for
separate words.

struct trace_parser is the descriptor.

Generic "trace_get_user" function that can be a helper to read multiple
words passed in by user space.

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
LKML-Reference: <1252682969-3366-2-git-send-email-jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
637e7e864103a7a68c1ce43ada27dfc25c0d113f 11-Sep-2009 Steven Rostedt <srostedt@redhat.com> tracing: add lock depth to entries

This patch adds the lock depth of the big kernel lock to the generic
entry header. This way we can see the depth of the lock and help
in removing the BKL.

Example:

# _------=> CPU#
# / _-----=> irqs-off
# | / _----=> need-resched
# || / _---=> hardirq/softirq
# ||| / _--=> preempt-depth
# |||| /_--=> lock-depth
# |||||/ delay
# cmd pid |||||| time | caller
# \ / |||||| \ | /
<idle>-0 2.N..3 5902255250us+: lock_acquire: read rcu_read_lock
<idle>-0 2.N..3 5902255253us+: lock_release: rcu_read_lock
<idle>-0 2dN..3 5902255257us+: lock_acquire: xtime_lock
<idle>-0 2dN..4 5902255259us : lock_acquire: clocksource_lock
<idle>-0 2dN..4 5902255261us+: lock_release: clocksource_lock

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
48659d31195bb76d688e99dabd816c5472fb1656 11-Sep-2009 Steven Rostedt <srostedt@redhat.com> tracing: move tgid out of generic entry and into userstack

The userstack trace required the recording of the tgid entry.
Unfortunately, it was added to the generic entry where it wasted
4 bytes of every entry and was only used by one entry.

This patch moves it out of the generic field and moves it into the
only user (userstack_entry).

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
e8165dbb03ed04d798163ee512074b9a9466a9c8 04-Sep-2009 Steven Rostedt <srostedt@redhat.com> tracing: report error in trace if we fail to swap latency buffer

The irqsoff tracer will fail to swap the cpu buffer with the max
buffer if it preempts a commit. Instead of ignoring this, this patch
makes the tracer report it if the last max latency failed due to preempting
a current commit.

The output of the latency tracer will look like this:

# tracer: irqsoff
#
# irqsoff latency trace v1.1.5 on 2.6.31-rc5
# --------------------------------------------------------------------
# latency: 112 us, #1/1, CPU#1 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4)
# -----------------
# | task: -4281 (uid:0 nice:0 policy:0 rt_prio:0)
# -----------------
# => started at: save_args
# => ended at: __do_softirq
#
#
# _------=> CPU#
# / _-----=> irqs-off
# | / _----=> need-resched
# || / _---=> hardirq/softirq
# ||| / _--=> preempt-depth
# |||| /
# ||||| delay
# cmd pid ||||| time | caller
# \ / ||||| \ | /
bash-4281 1d.s6 265us : update_max_tr_single: Failed to swap buffers due to commit in progress

Note the latency time and the functions that disabled the irqs or preemption
will still be listed.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
659372d3e42a3e17a2e042d38a8bcdb94bfbe797 04-Sep-2009 Steven Rostedt <srostedt@redhat.com> tracing: add trace_array_printk for internal tracers to use

This patch adds a trace_array_printk to allow a tracer to use the
trace_printk on its own trace array.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
e77405ad80f53966524b5c31244e13fbbbecbd84 02-Sep-2009 Steven Rostedt <srostedt@redhat.com> tracing: pass around ring buffer instead of tracer

The latency tracers (irqsoff and wakeup) can swap trace buffers
on the fly. If an event is happening and has reserved data on one of
the buffers, and the latency tracer swaps the global buffer with the
max buffer, the result is that the event may commit the data to the
wrong buffer.

This patch changes the API to the trace recording to be recieve the
buffer that was used to reserve a commit. Then this buffer can be passed
in to the commit.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
f633903af2ceb0cec07d45e499a072b6593d0ed1 04-Sep-2009 Steven Rostedt <srostedt@redhat.com> tracing: make tracing_reset safe for external use

Reseting the trace buffer without first disabling the buffer and
waiting for any writers to complete, can corrupt the ring buffer.

This patch makes the external version of tracing_reset safe from
corruption by disabling the ring buffer and calling synchronize_sched.

This version can no longer be called from interrupt context. But all those
callers have been removed.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2f26ebd549b9ab55ac756b836ec759c11fe93f81 01-Sep-2009 Steven Rostedt <srostedt@redhat.com> tracing: use timestamp to determine start of latency traces

Currently the latency tracers reset the ring buffer. Unfortunately
if a commit is in process (due to a trace event), this can corrupt
the ring buffer. When this happens, the ring buffer will detect
the corruption and then permanently disable the ring buffer.

The bug does not crash the system, but it does prevent further tracing
after the bug is hit.

Instead of reseting the trace buffers, the timestamp of the start of
the trace is used instead. The buffers will still contain the previous
data, but the output will not count any data that is before the
timestamp of the trace.

Note, this only affects the static trace output (trace) and not the
runtime trace output (trace_pipe). The runtime trace output does not
make sense for the latency tracers anyway.

Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
76f0d07376388f32698ba51b6090a26b90c1342f 04-Sep-2009 Steven Rostedt <srostedt@redhat.com> tracing: remove users of tracing_reset

The function tracing_reset is deprecated for outside use of trace.c.

The new function to reset the the buffers is tracing_reset_online_cpus.

The reason for this is that resetting the buffers while the event
trace points are active can corrupt the buffers, because they may
be writing at the time of reset. The tracing_reset_online_cpus disables
writes and waits for current writers to finish.

This patch replaces all users of tracing_reset except for the latency
tracers. Those changes require more work and will be removed in the
following patches.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
621968cdb2563b667d6ecb484ba91ef4c3a797b3 04-Sep-2009 Steven Rostedt <srostedt@redhat.com> tracing: disable buffers and synchronize_sched before resetting

Resetting the ring buffers while traces are happening can corrupt
the ring buffer and disable it (no kernel crash to worry about).

The safest thing to do is disable the ring buffers, call synchronize_sched()
to wait for all current writers to finish and then reset the buffer.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
b8de7bd168fa54d059b16d3057b2f8a32cc5bdc3 01-Sep-2009 Steven Rostedt <srostedt@redhat.com> tracing: disable update max tracer while reading trace

When reading the tracer from the trace file, updating the max latency
may corrupt the output. This patch disables the tracing of the max
latency while reading the trace file.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
8248ac052dfd1eb41819fbc0ca5c7a1667e7e70c 02-Sep-2009 Steven Rostedt <srostedt@redhat.com> tracing: print out start and stop in latency traces

During development of the tracer, we would copy information from
the live tracer to the max tracer with one memcpy. Since then we
added a generic ring buffer and we handle the copies differently now.
Unfortunately, we never copied the critical section information, and
we lost the output:

# => started at: kmem_cache_alloc
# => ended at: kmem_cache_alloc

This patch adds back the critical start and end copying as well as
removes the unused "trace_idx" and "overrun" fields of the
trace_array_cpu structure.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
5d4a9dba2d7fbab69f00dedd430d1788834a055a 27-Aug-2009 Steven Rostedt <srostedt@redhat.com> tracing: only show tracing_max_latency when latency tracer configured

The tracing_max_latency file should only be present when one of the
latency tracers ({preempt|irqs}off, wakeup*) are enabled.

This patch also removes tracing_thresh when latency tracers are not
enabled, as well as compiles out code that is only used for latency
tracers.

Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
5079f3261ffd7fe4a537679af695f2328943a245 25-Aug-2009 Zhaolei <zhaolei@cn.fujitsu.com> ftrace: Move setting of clock-source out of options

There are many clock sources for the tracing system but we can only
enable/disable one at a time with the trace/options file.
We can move the setting of clock-source out of options and add a separate
file for it:
# cat trace_clock
[local] global
# echo global > trace_clock
# cat trace_clock
local [global]

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
LKML-Reference: <4A939D08.6050604@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
f2d84b65b9778e8a35dd904f7d3993f0a60c9756 07-Aug-2009 Zhaolei <zhaolei@cn.fujitsu.com> ftrace: Unify effect of writing to trace_options and option/*

"echo noglobal-clock > trace_options" can be used to change trace
clock but "echo 0 > options/global-clock" can't. The flag toggling
will be silently accepted without actually changing the clock callback.

We can fix it by using set_tracer_flags() in
trace_options_core_write().

Changelog:
v1->v2: Simplified switch() after Li Zefan <lizf@cn.fujitsu.com>'s
suggestion

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
7770841e63730d62928b0879498064e9614b2ce0 07-Aug-2009 Zhaolei <zhaolei@cn.fujitsu.com> tracing: Rename set_tracer_flags()'s local variable trace_flags

set_tracer_flags() have a local variable named trace_flags which has
the same name than a global one in the same scope.
This leads to confusion, using tracer_flags should be better by its
meaning.

Changelog:
v1->v2: Simplified another patch in this patchset, no change in this
patch.

Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
f413cdb80ce00ec1a4d0ab949b5d96c81cae7f75 07-Aug-2009 Frederic Weisbecker <fweisbec@gmail.com> perf_counter: Fix/complete ftrace event records sampling

This patch implements the kernel side support for ftrace event
record sampling.

A new counter sampling attribute is added:

PERF_SAMPLE_TP_RECORD

which requests ftrace events record sampling. In this case
if a PERF_TYPE_TRACEPOINT counter is active and a tracepoint
fires, we emit the tracepoint binary record to the
perfcounter event buffer, as a sample.

Result, after setting PERF_SAMPLE_TP_RECORD attribute from perf
record:

perf record -f -F 1 -a -e workqueue:workqueue_execution
perf report -D

0x21e18 [0x48]: event: 9
.
. ... raw event: size 72 bytes
. 0000: 09 00 00 00 01 00 48 00 d0 c7 00 81 ff ff ff ff ......H........
. 0010: 0a 00 00 00 0a 00 00 00 21 00 00 00 00 00 00 00 ........!......
. 0020: 2b 00 01 02 0a 00 00 00 0a 00 00 00 65 76 65 6e +...........eve
. 0030: 74 73 2f 31 00 00 00 00 00 00 00 00 0a 00 00 00 ts/1...........
. 0040: e0 b1 31 81 ff ff ff ff .......
.
0x21e18 [0x48]: PERF_EVENT_SAMPLE (IP, 1): 10: 0xffffffff8100c7d0 period: 33

The raw ftrace binary record starts at offset 0020.

Translation:

struct trace_entry {
type = 0x2b = 43;
flags = 1;
preempt_count = 2;
pid = 0xa = 10;
tgid = 0xa = 10;
}

thread_comm = "events/1"
thread_pid = 0xa = 10;
func = 0xffffffff8131b1e0 = flush_to_ldisc()

What will come next?

- Userspace support ('perf trace'), 'flight data recorder' mode
for perf trace, etc.

- The unconditional copy from the profiling callback brings
some costs however if someone wants no such sampling to
occur, and needs to be fixed in the future. For that we need
to have an instant access to the perf counter attribute.
This is a matter of a flag to add in the struct ftrace_event.

- Take care of the events recursivity! Don't ever try to record
a lock event for example, it seems some locking is used in
the profiling fast path and lead to a tracing recursivity.
That will be fixed using raw spinlock or recursivity
protection.

- [...]

- Profit! :-)

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Gabriel Munteanu <eduard.munteanu@linux360.ro>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
1a0799a8fef5acc6503f9c5e79b2cd003317826c 29-Jul-2009 Frederic Weisbecker <fweisbec@gmail.com> tracing/function-graph-tracer: Move graph event insertion helpers in the graph tracer file

The function graph events helpers which insert the function entry and
return events into the ring buffer currently reside in trace.c
But this file is quite overloaded and the right place for these helpers
is in the function graph tracer file.

Then move them to trace_functions_graph.c

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
82e04af498a85ba425efe77580b7ba08234411df 29-Jul-2009 Frederic Weisbecker <fweisbec@gmail.com> tracing: Move sched event insertion helpers in the sched switch tracer file

The sched events helpers which insert the sched switch and wakeup
events into the ring buffer currently reside in trace.c
But this file is quite overloaded and the right place for these helpers
is in the sched switch tracer file.

Then move them to trace_functions.c

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
c0a0d0d3f65284c71115a9bb1ed801ee33eeb552 29-Jul-2009 Frederic Weisbecker <fweisbec@gmail.com> tracing/core: Make the stack entry helpers global

Make the stacktrace event insertion helpers globals.
This has two effects:

- Prepare for moving the sched events insertion helpers to
the sched switch tracer file.
- Move some ifdef outside function definitions

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
5e5bf483986ad86ad25f25eec5299c86eb2d1c57 29-Jul-2009 Frederic Weisbecker <fweisbec@gmail.com> tracing/core: Turn ftrace_cpu_disabled into a global var

In order to prepare the moving of the function graph tracer insertion
helpers from trace.c to trace_functions_graph.c, we need to export the
ftrace_cpu_disabled variable.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
74e7ff8c50b6b022e6ffaa736b16a4dc161d3eaf 28-Jul-2009 Lai Jiangshan <laijs@cn.fujitsu.com> tracing: Fix missing function_graph events when we splice_read from trace_pipe

About a half events are missing when we splice_read
from trace_pipe. They are unexpectedly consumed because we ignore
the TRACE_TYPE_NO_CONSUME return value used by the function graph
tracer when it needs to consume the events by itself to walk on
the ring buffer.

The same problem appears with ftrace_dump()

Example of an output before this patch:

1) | ktime_get_real() {
1) 2.846 us | read_hpet();
1) 4.558 us | }
1) 6.195 us | }

After this patch:

0) | ktime_get_real() {
0) | getnstimeofday() {
0) 1.960 us | read_hpet();
0) 3.597 us | }
0) 5.196 us | }

The fix also applies on 2.6.30

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: stable@kernel.org
LKML-Reference: <4A6EEC52.90704@cn.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
8650ae32ef7045e763825dee6256dde7f331bb85 23-Jul-2009 Steven Rostedt <srostedt@redhat.com> tracing: only truncate ftrace files when O_TRUNC is set

The current code will truncate the ftrace files contents if O_APPEND
is not set and the file is opened in write mode. This is incorrect.
It should only truncate the file if O_TRUNC is set. Otherwise
if one of these files is opened by a C program with fopen "r+",
it will incorrectly truncate the file.

Reported-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
ff4e9da2330beb8d64498a513d3f9694e941b01a 22-Jun-2009 Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> tracing: cleanup for tracing_trace_options_read()

'\n' is already appended, and what we need is just an extra
space for the '\0'.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
LKML-Reference: <4A3EED63.3090908@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
566b0aaf798a0dddfc455d1a5b05c424c6686c65 16-Jul-2009 jolsa@redhat.com <jolsa@redhat.com> tracing: Remove unused fields/variables

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: rostedt@goodmis.org
LKML-Reference: <1247773468-11594-2-git-send-email-jolsa@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
405f55712dfe464b3240d7816cc4fe4174831be2 11-Jul-2009 Alexey Dobriyan <adobriyan@gmail.com> headers: smp_lock.h redux

* Remove smp_lock.h from files which don't need it (including some headers!)
* Add smp_lock.h to files which do need it
* Make smp_lock.h include conditional in hardirq.h
It's needed only for one kernel_locked() usage which is under CONFIG_PREEMPT

This will make hardirq.h inclusion cheaper for every PREEMPT=n config
(which includes allmodconfig/allyesconfig, BTW)

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
77ae365eca895061c8bf2b2e3ae1d9ea62869739 27-Mar-2009 Steven Rostedt <srostedt@redhat.com> ring-buffer: make lockless

This patch converts the ring buffers into a completely lockless
buffer recording system. The read side still takes locks since
we still serialize readers. But the writers are the ones that
must be lockless (those can happen in NMIs).

The main change is to the "head_page" pointer. We write to the
tail, and read from the head. The "head_page" pointer in the cpu
buffer is now just a reference to where to look. The real head
page is now kept in the head_page->list->prev->next pointer.
That is, in the list head of the previous page we set flags.

The list pages are allocated to be aligned such that the lowest
significant bits are always zero pointing to the list. This gives
us play to put in flags to their pointers.

bit 0: set when the page is a head page
bit 1: set when the writer is moving the page (for overwrite mode)

cmpxchg is used to update the pointer.

When the writer wraps the buffer and the tail meets the head,
in overwrite mode, the writer must move the head page forward.
It first uses cmpxchg to change the pointer flag from 1 to 2.
Once this is done, the reader on another CPU will not take the
page from the buffer.

The writers need to protect against interrupts (we don't bother with
disabling interrupts because NMIs are allowed to write too).

After the writer sets the pointer flag to 2, it takes care to
manage interrupts coming in. This is discribed in detail within the
comments of the code.

Changes in version 2:
- Let reader reset entries value of header page.
- Fix tail page passing commit page on reader page test.
- Always increment entries and write counter in rb_tail_page_update
- Add safety check in rb_set_commit_to_write to break out of infinite loop
- add mask in rb_is_reader_page

[ Impact: lock free writing to the ring buffer ]

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
020e5f85cb087a40572c8b8b2dd06292a14fa212 01-Jul-2009 Li Zefan <lizf@cn.fujitsu.com> tracing/events: Add trace_event boot option

We already have ftrace= boot option, and this adds a similar
boot option for trace events, so allow trace events to be
enabled at boot, for boot debugging purpose.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4A4ACE29.3010407@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
9d612beff5089b89a295a2331883a8ce3fff08c1 24-Jun-2009 Li Zefan <lizf@cn.fujitsu.com> tracing: Fix trace_buf_size boot option

We should be able to specify [KMG] when setting trace_buf_size
boot option, as documented in kernel-parameters.txt

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4A41F2DB.4020102@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
f129e965bef40c6153e4fe505f1e408286213424 24-Jun-2009 Li Zefan <lizf@cn.fujitsu.com> tracing: Reset iterator in t_start()

The iterator is m->private, but it's not reset to trace_types in
t_start(). If the output is larger than PAGE_SIZE and t_start()
is called the 2nd time, things will go wrong.

Reviewed-by: Liming Wang <liming.wang@windriver.com>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4A418728.5020506@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
156f5a7801195fa2ce44aeeb62d6cf8468f3332a 02-Jun-2009 GeunSik Lim <leemgs1@gmail.com> debugfs: Fix terminology inconsistency of dir name to mount debugfs filesystem.

Many developers use "/debug/" or "/debugfs/" or "/sys/kernel/debug/"
directory name to mount debugfs filesystem for ftrace according to
./Documentation/tracers/ftrace.txt file.

And, three directory names(ex:/debug/, /debugfs/, /sys/kernel/debug/) is
existed in kernel source like ftrace, DRM, Wireless, Documentation,
Network[sky2]files to mount debugfs filesystem.

debugfs means debug filesystem for debugging easy to use by greg kroah
hartman. "/sys/kernel/debug/" name is suitable as directory name
of debugfs filesystem.
- debugfs related reference: http://lwn.net/Articles/334546/

Fix inconsistency of directory name to mount debugfs filesystem.

* From Steven Rostedt
- find_debugfs() and tracing_files() in this patch.

Signed-off-by: GeunSik Lim <geunsik.lim@samsung.com>
Acked-by : Inaky Perez-Gonzalez <inaky@linux.intel.com>
Reviewed-by : Steven Rostedt <rostedt@goodmis.org>
Reviewed-by : James Smart <james.smart@emulex.com>
CC: Jiri Kosina <trivial@kernel.org>
CC: David Airlie <airlied@linux.ie>
CC: Peter Osterlund <petero2@telia.com>
CC: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
CC: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
CC: Masami Hiramatsu <mhiramat@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
e4f2d10f479d18198ebafcb5e124cc3dd8b8817a 15-Jun-2009 Li Zefan <lizf@cn.fujitsu.com> tracing: replace a GFP_ATOMIC with GFP_KERNEL allocation

Atomic allocation is not needed here.

[ Impact: clean up of memory alloction type ]

Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4A35B898.2050607@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
215368e8e59023d6a0abdda896923018d74fdf7f 15-Jun-2009 Li Zefan <lizf@cn.fujitsu.com> tracing: fix a typo in tracing_cpumask_write()

It's tracing_cpumask_new that should be kfree()ed.

This causes tracing_cpumask to be freed due to the typo:

# echo z > tracing_cpumask
bash: echo: write error: Invalid argument

And subsequent reads/writes to tracing_cpuamsk will access this
already-freed tracing_cpumask, thus may lead to crash.

[ Impact: fix leak and crash when writing invalid val to tracing_cpumask ]

Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <4A35B86A.7070608@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
112f38a7e36e9d688b389507136bf3af3e6d159b 01-Jun-2009 Steven Rostedt <srostedt@redhat.com> tracing: make trace pipe recognize latency format flag

The trace_pipe did not recognize the latency format flag and would produce
different output than the trace file. The problem was partly due that
the trace flags in the iterator was not set as well as the trace_pipe
zeros out part of the iterator (including the flags) to be able to use
the same routines as the trace file. trace_flags of the iterator should
not cause any problems when not zeroed out by for trace_pipe.

Reported-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
5b6045a906f48d37591365c5dcdd6d1d146bfd4a 26-May-2009 Heiko Carstens <heiko.carstens@de.ibm.com> trace: disable preemption before taking raw spinlocks

s390 code uses smp_processor_id() in __raw_spin_lock() code which
reveals that a (raw) spinlock is taken without preemption disabled.
This can potentially deadlock.

To fix this explicitly disable and enable preemption.

BUG: using smp_processor_id() in preemptible [00000000] code: cat/2278
caller is trace_find_cmdline+0x40/0xfc
CPU: 0 Not tainted 2.6.30-rc7-dirty #39
Process cat (pid: 2278, task: 000000003faedb68, ksp: 000000003b33b988)
000000003b33b988 000000003b33bae0 0000000000000002 0000000000000000
000000003b33bb80 000000003b33baf8 000000003b33baf8 00000000000175d6
0000000000000001 000000003b33b988 000000003f9b0000 000000000000000b
000000000000000c 000000003b33bb40 000000003b33bae0 0000000000000000
0000000000000000 00000000000175d6 000000003b33bae0 000000003b33bb28
Call Trace:
([<00000000000174b2>] show_trace+0x112/0x170)
[<0000000000017582>] show_stack+0x72/0x100
[<0000000000441538>] dump_stack+0xc8/0xd8
[<000000000025c350>] debug_smp_processor_id+0x114/0x130
[<00000000000bf0e4>] trace_find_cmdline+0x40/0xfc
[<00000000000c35d4>] trace_print_context+0x58/0xac
[<00000000000bb676>] print_trace_line+0x416/0x470
[<00000000000bc8fe>] s_show+0x4e/0x428
[<000000000013834e>] seq_read+0x36a/0x5d4
[<0000000000112a78>] vfs_read+0xc8/0x174
[<0000000000112c58>] SyS_read+0x74/0xc4
[<000000000002c7ae>] sysc_noemu+0x10/0x16
[<000002000012436c>] 0x2000012436c
1 lock held by cat/2278:
#0: (&p->lock){+.+.+.}, at: [<0000000000138056>] seq_read+0x72/0x5d4

[ Impact: fix preempt-unsafe raw spinlock ]

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
4f5359685af6de7dca101393dc606620adbe963f 18-May-2009 Lai Jiangshan <laijs@cn.fujitsu.com> tracing: add trace_event_read_lock()

I found that there is nothing to protect event_hash in
ftrace_find_event(). Rcu protects the event hashlist
but not the event itself while we use it after its extraction
through ftrace_find_event().

This lack of a proper locking in this spot opens a race
window between any event dereferencing and module removal.

Eg:

--Task A--

print_trace_line(trace) {
event = find_ftrace_event(trace)

--Task B--

trace_module_remove_events(mod) {
list_trace_events_module(ev, mod) {
unregister_ftrace_event(ev->event) {
hlist_del(ev->event->node)
list_del(....)
}
}
}
|--> module removed, the event has been dropped

--Task A--

event->print(trace); // Dereferencing freed memory

If the event retrieved belongs to a module and this module
is concurrently removed, we may end up dereferencing a data
from a freed module.

RCU could solve this, but it would add latency to the kernel and
forbid tracers output callbacks to call any sleepable code.
So this fix converts 'trace_event_mutex' to a read/write semaphore,
and adds trace_event_read_lock() to protect ftrace_find_event().

[ Impact: fix possible freed memory dereference in ftrace ]

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <4A114806.7090302@cn.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
88fc86c283d9c3854e67e4155808027bc2519eb6 14-May-2009 GeunSik Lim <leemgs1@gmail.com> tracing: Append prompt in /debug/tracing/README file

append prompt in /debug/tracing/README file.

This is trivial issue. Fix typo Mini Howto file(README) for ftrace.

[ Impact: cleanup ]

Signed-off-by: GeunSik Lim <geunsik.lim@samsung.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: williams <williams@redhat.com>
LKML-Reference: <1242289418.31161.45.camel@centos51>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
9456f0fa6d3cb944d3b9fc31c9a244e0362c26ea 07-May-2009 Steven Rostedt <srostedt@redhat.com> tracing: reset ring buffer when removing modules with events

Li Zefan found that there's a race using the event ids of events and
modules. When a module is loaded, an event id is incremented. We only
have 16 bits for event ids (65536) and there is a possible (but highly
unlikely) race that we could load and unload a module that registers
events so many times that the event id counter overflows.

When it overflows, it then restarts and goes looking for available
ids. An id is available if it was added by a module and released.

The race is if you have one module add an id, and then is removed.
Another module loaded can use that same event id. But if the old module
still had events in the ring buffer, the new module's call back would
get bogus data. At best (and most likely) the output would just be
garbage. But if the module for some reason used pointers (not recommended)
then this could potentially crash.

The safest thing to do is just reset the ring buffer if a module that
registered events is removed.

[ Impact: prevent unpredictable results of event id overflows ]

Reported-by: Li Zefan <lizf@cn.fujitsu.com>
LKML-Reference: <49FEAFD0.30106@cn.fujitsu.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
94487d6d53af5acae10cf9fd52f74498994d46b1 06-May-2009 Steven Rostedt <srostedt@redhat.com> tracing: use proper export symbol for tracing api

When adding the EXPORT_SYMBOL to some of the tracing API, I accidently
used EXPORT_SYMBOL instead of EXPORT_SYMBOL_GPL. This patch fixes
that mistake.

[ Impact: export the tracing code only for GPL modules ]

Reported-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
c8d771835e18c938dae8690611d65fe98ad30f58 30-Apr-2009 Steven Rostedt <srostedt@redhat.com> tracing: export stats of ring buffers to userspace

This patch adds stats to the ftrace ring buffers:

# cat /debugfs/tracing/per_cpu/cpu0/stats
entries: 42360
overrun: 30509326
commit overrun: 0
nmi dropped: 0

Where entries are the total number of data entries in the buffer.

overrun is the number of entries not consumed and were overwritten by
the writer.

commit overrun is the number of entries dropped due to nested writers
wrapping the buffer before the initial writer finished the commit.

nmi dropped is the number of entries dropped due to the ring buffer
lock being held when an nmi was going to write to the ring buffer.
Note, this field will be meaningless and will go away when the ring
buffer becomes lockless.

[ Impact: let userspace know what is happening in the ring buffers ]

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
7267fa6819467669f5cc2ba81a615dcc88158b4b 29-Apr-2009 Steven Rostedt <srostedt@redhat.com> tracing: fix ref count in splice pages

The pages allocated for the splice binary buffer did not initialize
the ref count correctly. This caused pages not to be freed and causes
a drastic memory leak.

Thanks to logdev I was able to trace the tracer to find where the leak
was.

[ Impact: stop memory leak when using splice ]

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
f2957f1f196b0217644a17c1379855a118a37d72 29-Apr-2009 Steven Rostedt <srostedt@redhat.com> tracing: have splice only copy full pages

Splice works with pages, it is much more effecient to use an entire
page than to copy bits over several pages.

Using logdev to trace the internals of the splice mechanism, I was
able to see that splice can be very aggressive. When tracing is
occurring, and the reader caught up to the writer, and the writer
is on the reader page, the reader will copy what is there into the
splice page. Splice may iterate over several pages and if the
writer is still writing to the page, the reader will keep copying
bits to new pages to pass to userspace.

This patch changes it to only pass data to userspace if the page
is full (the writer has left the page). This has a small side effect
that splice can not read a partial page, and must wait for the
page to fill. This should not be an issue. If tracing has stopped,
then a use of "read" will still read all of the page.

[ Impact: better performance for ring buffer splice code ]

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
93459c6cb9816c52200993d29dd18cea1daee335 29-Apr-2009 Steven Rostedt <srostedt@redhat.com> tracing: only add splice page if entries exist

The splice code allocates a page even when the ring buffer is empty.
It detects the ring buffer being empty when it it fails to copy
anything from the ring buffer into the page.

This patch adds a check to see if there is anything in the ring buffer
before allocating a page.

Thanks to logdev for letting me trace the tracer to find this.

[ Impact: speed up due to removing unnecessary allocation ]

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
5beae6efd1004b44c3e257dc96087978e4c763c1 29-Apr-2009 Steven Rostedt <srostedt@redhat.com> tracing: fix ref count in splice pages

The pages allocated for the splice binary buffer did not initialize
the ref count correctly. This caused pages not to be freed and causes
a drastic memory leak.

Thanks to logdev I was able to trace the tracer to find where the leak
was.

[ Impact: stop memory leak when using splice ]

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
cd891ae0305601bdb4d2e7e85282961c4ff256cd 28-Apr-2009 Steven Rostedt <srostedt@redhat.com> tracing: convert ftrace_dump spinlocks to raw

ftrace_dump is used for printing out the contents of the ftrace ring buffer
to the console on failure. Currently it uses a spinlock to synchronize
the output from multiple failures on different CPUs. This spin lock
currently is a normal spinlock and can cause issues with lockdep and
lock tracing.

This patch converts it to raw since it is for error handling only.
The lock is local to the ftrace_dump and is not used by any other
infrastructure.

[ Impact: prevent ftrace_dump from locking up by internal tracing ]

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
7a4f453b6d7379a7c380825949977c5a838aa012 22-Apr-2009 Li Zefan <lizf@cn.fujitsu.com> tracing/events: make struct trace_entry->type to be int type

struct trace_entry->type is unsigned char, while trace event's id is
int type, thus for a event with id >= 256, it's entry->type is cast
to (id % 256), and then we can't see the trace output of this event.

# insmod trace-events-sample.ko
# echo foo_bar > /mnt/tracing/set_event
# cat /debug/tracing/events/trace-events-sample/foo_bar/id
256
# cat /mnt/tracing/trace_pipe
<...>-3548 [001] 215.091142: Unknown type 0
<...>-3548 [001] 216.089207: Unknown type 0
<...>-3548 [001] 217.087271: Unknown type 0
<...>-3548 [001] 218.085332: Unknown type 0

[ Impact: fix output for trace events with id >= 256 ]

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <49EEDB0E.5070207@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
3189cdb31622f4e40688ce5a6fc5d940b42bc805 17-Apr-2009 Steven Rostedt <srostedt@redhat.com> tracing: protect trace_printk from recursion

trace_printk can be called from any context, including NMIs.
If this happens, then we must test for for recursion before
grabbing any spinlocks.

This patch prevents trace_printk from being called recursively.

[ Impact: prevent hard lockup in lockdep event tracer ]

Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
12acd473d45cf2e40de3782cb2de712e5cd4d715 17-Apr-2009 Steven Rostedt <srostedt@redhat.com> tracing: add EXPORT_SYMBOL_GPL for trace commits

Not all the necessary symbols were exported to allow for tracing
by modules. This patch adds them in.

[ Impact: allow modules to commit data to the ring buffer ]

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
339ae5d3c3fc2025e3657637921495fd600027c7 17-Apr-2009 Li Zefan <lizf@cn.fujitsu.com> tracing: fix file mode of trace and README

trace is read-write and README is read-only.

[ Impact: fix /debug/tracing/ file permissions. ]

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <49E7EAB6.4070605@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
69abe6a5d18a9394baa325bab8f57748b037c517 10-Apr-2009 Avadh Patel <avadh4all@gmail.com> tracing: add saved_cmdlines file to show cached task comms

Export the cached task comms to userspace. This allows user apps to translate
the pids from a trace into their respective task command lines.

[ Impact: let userspace apps reading binary buffer know comm's of pids ]

Signed-off-by: Avadh Patel <avadh4all@gmail.com>
[ added error checking and use of buf pointer to index file_buf ]
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
17c873ec280a03894bc718af817f7f24fa787ae1 11-Apr-2009 Steven Rostedt <srostedt@redhat.com> tracing/events: add export symbols for trace events in modules

Impact: let modules add trace events

The trace event code requires some functions to be exported to allow
modules to use TRACE_EVENT. This patch adds EXPORT_SYMBOL_GPL to the
necessary functions.

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
eb02ce017dd83985041a7e54c6449f92d53b026f 08-Apr-2009 Tom Zanussi <tzanussi@gmail.com> tracing/filters: use ring_buffer_discard_commit() in filter_check_discard()

This patch changes filter_check_discard() to make use of the new
ring_buffer_discard_commit() function and modifies the current users to
call the old commit function in the non-discard case.

It also introduces a version of filter_check_discard() that uses the
global trace buffer (filter_current_check_discard()) for those cases.

v2 changes:

- fix compile error noticed by Ingo Molnar

Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: fweisbec@gmail.com
LKML-Reference: <1239178554.10295.36.camel@tropicana>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
77d9f465d46fd67cdb82ee5e1ab99dd57a17c486 02-Apr-2009 Steven Rostedt <srostedt@redhat.com> tracing/filters: use ring_buffer_discard_commit for discarded events

The ring_buffer_discard_commit makes better usage of the ring_buffer
when an event has been discarded. It tries to remove it completely if
possible.

This patch converts the trace event filtering to use
ring_buffer_discard_commit instead of the ring_buffer_event_discard.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
e45f2e2bd298e1ff687448e5fd15a3588b5807ec 31-Mar-2009 Tom Zanussi <tzanussi@gmail.com> tracing/filters: add TRACE_EVENT_FORMAT_NOFILTER event macro

Frederic Weisbecker suggested that the trace_special event shouldn't be
filterable; this patch adds a TRACE_EVENT_FORMAT_NOFILTER event macro
that allows an event format to be exported without having a filter
attached, and removes filtering from the trace_special event.

Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
e1112b4d96859367a93468027c9635e2ac04eb3f 31-Mar-2009 Tom Zanussi <tzanussi@gmail.com> tracing/filters: add run-time field descriptions to TRACE_EVENT_FORMAT events

This patch adds run-time field descriptions to all the event formats
exported using TRACE_EVENT_FORMAT. It also hooks up all the tracers
that use them (i.e. the tracers in the 'ftrace subsystem') so they can
also have their output filtered by the event-filtering mechanism.

When I was testing this, there were a couple of things that fooled me
into thinking the filters weren't working, when actually they were -
I'll mention them here so others don't make the same mistakes (and file
bug reports. ;-)

One is that some of the tracers trace multiple events e.g. the
sched_switch tracer uses the context_switch and wakeup events, and if
you don't set filters on all of the traced events, the unfiltered output
from the events without filters on them can make it look like the
filtering as a whole isn't working properly, when actually it is doing
what it was asked to do - it just wasn't asked to do the right thing.

The other is that for the really high-volume tracers e.g. the function
tracer, the volume of filtered events can be so high that it pushes the
unfiltered events out of the ring buffer before they can be read so e.g.
cat'ing the trace file repeatedly shows either no output, or once in
awhile some output but that isn't there the next time you read the
trace, which isn't what you normally expect when reading the trace file.
If you read from the trace_pipe file though, you can catch them before
they disappear.

Changes from v1:

As suggested by Frederic Weisbecker:

- get rid of externs in functions
- added unlikely() to filter_check_discard()

Signed-off-by: Tom Zanussi <tzanussi@gmail.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
93cfb3c9fd83d877a8f1ffad9ff862b617b32828 02-Apr-2009 Lai Jiangshan <laijs@cn.fujitsu.com> tracing: fix splice return too large

I got these from strace:

splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 12288
splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 12288
splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 12288
splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 16384
splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 8192
splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 8192
splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 8192

I wanted to splice_read 4096 bytes, but it returns 8192 or larger.

It is because the return value of tracing_buffers_splice_read()
does not include "zero out any left over data" bytes.

But tracing_buffers_read() includes these bytes, we make them
consistent.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <srostedt@redhat.com>
LKML-Reference: <49D46674.9030804@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
c7625a555f55d7ae49236cde551786c88f5a5ce1 02-Apr-2009 Lai Jiangshan <laijs@cn.fujitsu.com> tracing: update file->f_pos when splice(2) it

Impact: Cleanup

These two lines:

if (unlikely(*ppos))
return -ESPIPE;

in tracing_buffers_splice_read() are not needed, VFS layer
has disabled seek(2).

We remove these two lines, and then we can update file->f_pos.

And tracing_buffers_read() updates file->f_pos, this fix
make tracing_buffers_splice_read() updates file->f_pos too.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <srostedt@redhat.com>
LKML-Reference: <49D46670.4010503@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
ddd538f3e6a1a4bec2f6942f83a753263e6577b4 02-Apr-2009 Lai Jiangshan <laijs@cn.fujitsu.com> tracing: allocate page when needed

Impact: Cleanup

Sometimes, we open trace_pipe_raw, but we don't read(2) it,
we just splice(2) it, thus, the page is not used.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <srostedt@redhat.com>
LKML-Reference: <49D4666B.4010608@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
d1e7e02f30be672c6f6ee40908be83877a0d49d1 02-Apr-2009 Lai Jiangshan <laijs@cn.fujitsu.com> tracing: disable seeking for trace_pipe_raw

Impact: disable pread()

We set tracing_buffers_fops.llseek to no_llseek,
but we can still perform pread() to read this file.

That is not expected.

This fix uses nonseekable_open() to disable it.

tracing_buffers_fops.llseek is still set to no_llseek,
it mark this file is a "non-seekable device" and is used by
sys_splice(). See also do_splice() or manual of splice(2):

ERRORS
EINVAL Target file system doesn't support splicing;
neither of the descriptors refers to a pipe;
or offset given for non-seekable device.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <srostedt@redhat.com>
LKML-Reference: <49D46668.8030806@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
5452af664f6fba26b80eb2c8c4ceae2999d5cf56 27-Mar-2009 Frederic Weisbecker <fweisbec@gmail.com> tracing/ftrace: factorize the tracing files creation

Impact: cleanup

Most of the tracing files creation follow the same pattern:

ret = debugfs_create_file(...)
if (!ret)
pr_warning("Couldn't create ... entry\n")

Unify it!

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1238109938-11840-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
bc2b6871c17b3aff79fb14e1a1c06c5f5a187f76 23-Mar-2009 Nikanth Karthikesan <knikanth@suse.de> Update /debug/tracing/README

Some of the tracers have been renamed, which was not updated in the in-kernel
run-time README file. Update it.

Signed-off-by: Nikanth Karthikesan <knikanth@suse.de>
LKML-Reference: <200903231158.32151.knikanth@suse.de>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
b0dfa978c7a1699fb3506fbfcba0b6a5c4bd17ae 01-Apr-2009 Frederic Weisbecker <fweisbec@gmail.com> tracing/ftrace: alloc the started cpumask for the trace file

Impact: fix a crash while cat trace file

Currently we are using a cpumask to remind each cpu where a
trace occured. It lets us notice the user that a cpu just had
its first trace.

But on latest -tip we have the following crash once we cat the trace
file:

IP: [<c0270c4a>] print_trace_fmt+0x45/0xe7
*pde = 00000000
Oops: 0000 [#1] PREEMPT SMP
last sysfs file: /sys/class/net/eth0/carrier
Pid: 3897, comm: cat Not tainted (2.6.29-tip-02825-g0f22972-dirty #81)
EIP: 0060:[<c0270c4a>] EFLAGS: 00010297 CPU: 0
EIP is at print_trace_fmt+0x45/0xe7
EAX: 00000000 EBX: 00000000 ECX: c12d9e98 EDX: ccdb7010
ESI: d31f4000 EDI: 00322401 EBP: d31f3f10 ESP: d31f3efc
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Process cat (pid: 3897, ti=d31f2000 task=d3b3cf20 task.ti=d31f2000)
Stack:
d31f4080 ccdb7010 d31f4000 d691fe70 ccdb7010 d31f3f24 c0270e5c d31f4000
d691fe70 d31f4000 d31f3f34 c02718e8 c12d9e98 d691fe70 d31f3f70 c02bfc33
00001000 09130000 d3b46e00 d691fe98 00000000 00000079 00000001 00000000
Call Trace:
[<c0270e5c>] ? print_trace_line+0x170/0x17c
[<c02718e8>] ? s_show+0xa7/0xbd
[<c02bfc33>] ? seq_read+0x24a/0x327
[<c02bf9e9>] ? seq_read+0x0/0x327
[<c02ab18b>] ? vfs_read+0x86/0xe1
[<c02ab289>] ? sys_read+0x40/0x65
[<c0202d8f>] ? sysenter_do_call+0x12/0x3c
Code: 00 00 00 89 45 ec f7 c7 00 20 00 00 89 55 f0 74 4e f6 86 98 10 00 00 02 74 45 8b 86 8c 10 00 00 8b 9e a8 10 00 00 e8 52 f3 ff ff <0f> a3 03 19 c0 85 c0 75 2b 8b 86 8c 10 00 00 8b 9e a8 10 00 00
EIP: [<c0270c4a>] print_trace_fmt+0x45/0xe7 SS:ESP 0068:d31f3efc
CR2: 0000000000000000
---[ end trace aa9cf38e5ebed9dd ]---

This is because we alloc the iter->started cpumask on tracing_pipe_open but
not on tracing_open.

It hadn't been noticed until now because we need to have ring buffer overruns
to activate the starting of cpu buffer detection.

Also, we need a check to not print the messagge for the first trace on the file.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1238619188-6109-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
5f0c6c03c5fee91c02c696bc9bf4c0d41392abe7 27-Mar-2009 Frederic Weisbecker <fweisbec@gmail.com> tracing/ftrace: fix missing include string.h

Building a kernel with tracing can raise the following warning on
tip/master:

kernel/trace/trace.c:1249: error: implicit declaration of function 'vbin_printf'

We are missing an include to string.h

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1238160130-7437-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
cf8e3474654f20433aab9aa35826d43b5f245008 30-Mar-2009 Lai Jiangshan <laijs@cn.fujitsu.com> tracing: fix incorrect return type of ns2usecs()

Impact: fix time output bug in 32bits system

ns2usecs() returns 'long', it's incorrect.

(In i386)
...
<idle>-0 [000] 521.442100: _spin_lock <-tick_do_update_jiffies64
<idle>-0 [000] 521.442101: do_timer <-tick_do_update_jiffies64
<idle>-0 [000] 521.442102: update_wall_time <-do_timer
<idle>-0 [000] 521.442102: update_xtime_cache <-update_wall_time
....
(It always print the time less than 2200 seconds besides ...)
Because 'long' is 32bits in i386. ( (1<<31) useconds is about 2200 seconds)

...
<idle>-0 [001] 4154502640.134759: rcu_bh_qsctr_inc <-__do_softirq
<idle>-0 [001] 4154502640.134760: _local_bh_enable <-__do_softirq
<idle>-0 [001] 4154502640.134761: idle_cpu <-irq_exit
...
(very large value)
Because 'long' is a signed type and it is 32bits in i386.

Changes in v2:
return 'unsigned long long' instead of 'cycle_t'

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
LKML-Reference: <49D05D10.4030009@cn.fujitsu.com>
Reported-by: Li Zefan <lizf@cn.fujitsu.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
a2a16d6a3156ef7309ca7328a20c35df9418e670 25-Mar-2009 Steven Rostedt <srostedt@redhat.com> function-graph: add option to calculate graph time or not

graph time is the time that a function is executing another function.
Thus if function A calls B, if graph-time is set, then the time for
A includes B. This is the default behavior. But if graph-time is off,
then the time spent executing B is subtracted from A.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
0706f1c48ca8a7ab478090b4e38f2e578ae2bfe0 24-Mar-2009 Steven Rostedt <srostedt@redhat.com> tracing: adding function timings to function profiler

If the function graph trace is enabled, the function profiler will
use it to take the timing of the functions.

cat /debug/tracing/trace_stat/functions

Function Hit Time
-------- --- ----
mwait_idle 127 183028.4 us
schedule 26 151997.7 us
__schedule 31 151975.1 us
sys_wait4 2 74080.53 us
do_wait 2 74077.80 us
sys_newlstat 138 39929.16 us
do_path_lookup 179 39845.79 us
vfs_lstat_fd 138 39761.97 us
user_path_at 153 39469.58 us
path_walk 179 39435.76 us
__link_path_walk 189 39143.73 us
[...]

Note the times are skewed due to the function graph tracer not taking
into account schedules.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
be6f164a02f394675e2ac2077dd354cebef5b4c0 24-Mar-2009 Steven Rostedt <srostedt@redhat.com> function-graph: add option for include sleep times

Impact: give user a choice to show times spent while sleeping

The user may want to see the time a function spent sleeping.
This patch adds the trace option "sleep-time" to allow that.
The "sleep-time" option is default on.

echo sleep-time > /debug/tracing/trace_options

produces:

------------------------------------------
2) avahi-d-3428 => <idle>-0
------------------------------------------

2) | finish_task_switch() {
2) 0.621 us | _spin_unlock_irq();
2) 2.202 us | }
2) ! 1002.197 us | }
2) ! 1003.521 us | }

where as,

echo nosleep-time > /debug/tracing/trace_options

produces:

0) <idle>-0 => yum-upd-3416
------------------------------------------

0) | finish_task_switch() {
0) 0.643 us | _spin_unlock_irq();
0) 2.342 us | }
0) + 41.302 us | }
0) + 42.453 us | }

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
1618536961d31f9b3f55767b22d4a897f4204c26 23-Mar-2009 Frederic Weisbecker <fweisbec@gmail.com> tracing/function-graph-tracer: fix functions call traces imbalance

Impact: fix traces output

Sometimes one can observe an imbalance in the traces between function
calls and function return traces:

func1() {
}
}

The curly brace inside func1() is the return of another function nested
inside func1. The return trace have been inserted in the buffer but not
the entry.
We are storing a return address on the function traces stack while we
haven't inserted its entry on the buffer, hence the imbalance on the
traces.

This is because the tracers doesn't check all failures that can happen
on buffer insertion.

This patch reports the tracing recursion failures and the ring buffer
failures. In such cases, we now restore the original return address for
the function, giving up its return trace.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <1237843021-11695-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
3e1f60b80cafcb5d7e8d3665b35962fbb8fb9efa 22-Mar-2009 Frederic Weisbecker <fweisbec@gmail.com> tracing/ftrace: check if debugfs is registered before creating files

Impact: fix a crash with ftrace={nop,boot} parameter

If the nop or initcall tracers are launched as boot tracers,
they will attempt to create their option directory and files.
But these tracers are registered very early and then assigned
as "boot tracers" very early if asked to.

Since they do this before debugfs has been registered (core initcall),
a crash is triggered.

Another early tracers could also come later. So we fix it by
checking if debugfs is initialized before creating the root
tracing directory.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <1237759847-21025-3-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
07edf7121374609709ef1b0889f6e7b8d6a62ec1 22-Mar-2009 Frederic Weisbecker <fweisbec@gmail.com> tracing/events: don't use wake up for events

Impact: fix hard-lockup with sched switch events

Some ftrace events, such as sched wakeup, can be traced
while the runqueue lock is hold. Since they are using
trace_current_buffer_unlock_commit(), they call wake_up()
which can try to grab the runqueue lock too, resulting in
a deadlock.

Now for all event, we call a new helper:
trace_nowake_buffer_unlock_commit() which do pretty the same than
trace_current_buffer_unlock_commit() except than it doesn't call
trace_wake_up().

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <1237759847-21025-4-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
b8b94265337f83b7db9c5f429b1769d463d7da8c 22-Mar-2009 Dmitri Vorobiev <dmitri.vorobiev@movial.com> tracing: fix four sparse warnings

Impact: cleanup.

This patch fixes the following sparse warnings:

kernel/trace/trace.c:385:9: warning: symbol 'trace_seq_to_buffer' was
not declared. Should it be static?

kernel/trace/trace_clock.c:29:13: warning: symbol 'trace_clock_local'
was not declared. Should it be static?

kernel/trace/trace_clock.c:54:13: warning: symbol 'trace_clock' was not
declared. Should it be static?

kernel/trace/trace_clock.c:74:13: warning: symbol 'trace_clock_global'
was not declared. Should it be static?

Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@movial.com>
LKML-Reference: <1237741871-5827-4-git-send-email-dmitri.vorobiev@movial.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
cf586b61f80229491127d3c57c06ed93c9f530d3 22-Mar-2009 Frederic Weisbecker <fweisbec@gmail.com> tracing/function-graph-tracer: prevent hangs during self-tests

Impact: detect tracing related hangs

Sometimes, with some configs, the function graph tracer can make
the timer interrupt too much slow, hanging the kernel in an endless
loop of timer interrupts servicing.

As suggested by Ingo, this patch brings a watchdog which stops the
selftest after a defined number of functions traced, definitely
disabling this tracer.

For those who want to debug the cause of the function graph trace
hang, you can pass the ftrace_dump_on_oops kernel parameter to dump
the traces after this hang detection.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <1237694675-23509-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
40ce74f19c28077550646c76d96a075bf312e461 19-Mar-2009 Steven Rostedt <srostedt@redhat.com> tracing: remove recording function depth from trace_printk

The function depth in trace_printk was to facilitate the function
graph output. Now that the function graph calculates the depth within
the trace output, we no longer need to record the depth when the
trace_printk is called.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
5ef841f6f32dce0b752a4fa0622781ee67a0e874 19-Mar-2009 Steven Rostedt <srostedt@redhat.com> tracing: make print_(b)printk_msg_only global

This patch makes print_printk_msg_only and print_bprintk_msg_only
global for other functions to use. It also renames them by adding
a "trace_" to the beginning to avoid namespace collisions.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
4acd4d00f716873e27e7b60ae292cbdbfae674dd 18-Mar-2009 Steven Rostedt <srostedt@redhat.com> tracing: give easy way to clear trace buffer

There is currently no easy way to clear the trace buffer. Currently
the only way is to change the current tracer.

This patch lets the user clear the trace buffer by simply writing
into the trace files.

echo > /debug/tracing/trace

or to clear a single cpu (i.e. for CPU 1):

echo > /debug/tracing/per_cpu/cpu1/trace

Requested-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
a635cf0497342978d417cae19d4a4823932977ff 18-Mar-2009 Carsten Emde <Carsten.Emde@osadl.org> tracing: fix command line to pid reverse map

Impact: fix command line to pid mapping

map_cmdline_to_pid[] is checked in trace_save_cmdline(), but never
updated. This results in stale pid to command line mappings and the
tracer output will associate the wrong comm string.

Signed-off-by: Carsten Emde <Carsten.Emde@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
50d88758a3f9787cbdbdbc030560b815721eab4b 18-Mar-2009 Thomas Gleixner <tglx@linutronix.de> tracing: fix trace_find_cmdline()

Impact: prevent stale command line output

In case there is no valid command line mapping for a pid
trace_find_cmdline() returns without updating the comm buffer. The
trace dump keeps the previous entry which results in confusing trace
output:

<idle>-0 [000] 280.702056 ....
<idle>-23456 [000] 280.702080 ....

Update the comm buffer with "<...>" when no mapping is found.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2c7eea4c62ba090b7f4583c3d7337ea0019be900 18-Mar-2009 Thomas Gleixner <tglx@linutronix.de> tracing: replace the crude (unsigned) -1 hackery

Impact: cleanup

The command line recorder uses (unsigned) -1 to mark non mapped
entries in the pid to command line maps. The validity check is
completely unintuitive: idx >= SAVED_CMDLINES

There is no need for such casting games. Use a constant to mark
unmapped entries and check for that constant to make the code readable
and understandable.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
18aecd362a1c991fbf5f7919ae051a77532ba2f8 18-Mar-2009 Thomas Gleixner <tglx@linutronix.de> tracing: stop command line recording when tracing is disabled

Impact: prevent overwrite of command line entries

When the tracer is stopped the command line recording continues to
record. The check for tracing_is_on() is not sufficient here as the
ringbuffer status is not affected by setting
debug/tracing/tracing_enabled to 0. On a non idle system this can
result in the loss of the command line information for the stopped
trace, which makes the trace harder to read and analyse.

Check tracer_enabled to allow further recording.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
af4617bdba34aa556272b34c3986b0a4d588f568 17-Mar-2009 Steven Rostedt <srostedt@redhat.com> tracing: add global-clock option to provide cross CPU clock to traces

Impact: feature to allow better serialized clock

This patch adds an option called "global-clock" that will allow
the tracer to switch to a slower but more accurate (across CPUs)
clock.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
37886f6a9f62d22530ffee8d3f9215c8345b6969 17-Mar-2009 Steven Rostedt <srostedt@redhat.com> ring-buffer: add api to allow a tracer to change clock source

This patch adds a new function called ring_buffer_set_clock that
allows a tracer to assign its own clock source to the buffer.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
6adaad14d7d4d3ef31b4e2dc992b18b5da7c4eb3 17-Mar-2009 Steven Rostedt <srostedt@redhat.com> tracing: stop comm recording on tracing off

Impact: fix for losing comms in trace

The command lines of tasks are cached at sched switch to not need
to record them at every trace point. Disabling the tracing on stops
the recording of traces, but does not stop the caching of command lines.
When the tracing is off the cache may overflow and cause the tracing
to show incorrect tasks matching the PIDs.

This patch disables prevents updates to the comm cache when the ring buffer
is off.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
4ca530852346be239b7c19e7bec5d2b78855bebe 17-Mar-2009 Steven Rostedt <srostedt@redhat.com> tracing: protect reader of cmdline output

Impact: fix to one cause of incorrect comm outputs in trace

The spinlock only protected the creation of a comm <=> pid pair.
But it was possible that a reader could look up a pid, and get the
wrong comm because it had no locking.

This also required changing trace_find_cmdline to copy the comm cache
and not just send back a pointer to it.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2fc1dfbe17e7705c55b7a99da995fa565e26f151 16-Mar-2009 Frederic Weisbecker <fweisbec@gmail.com> tracing/core: fix early free of cpumasks

Impact: fix crashes when tracing cpumasks

While ring-buffer allocation, the cpumasks are allocated too,
including the tracing cpumask and the per-cpu file mask handler.
But these cpumasks are freed accidentally just after.
Fix it.

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1237164303-11476-1-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
59f586db98919d7d9c43527b26c8de1cdf9ed912 15-Mar-2009 Frederic Weisbecker <fweisbec@gmail.com> tracing/core: fix missing mutex unlock on tracing_set_tracer()

Impact: fix possible locking imbalance

In case of ring buffer resize failure, tracing_set_tracer forgot to
release trace_types_lock. Fix it.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1237151439-6755-5-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
7f96f93f02b7637491a1637dee12dcdcd40b9802 13-Mar-2009 Steven Rostedt <srostedt@redhat.com> tracing: move binary buffers into per cpu directory

The binary_buffers directory in /debugfs/tracing held the files
to read the trace buffers in a binary format. This held one file
per CPU buffer. But we also have a per_cpu directory that holds
a way to read the pretty-print formats.

This patch moves the binary buffers into the per_cpu_directory:

# ls /debug/tracing/per_cpu/cpu1/
trace trace_pipe trace_pipe_raw

The new name is called "trace_pipe_raw". The binary buffers always
acted similar to trace_pipe, except that they produce raw data.

Requested-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
48ead02030f849d011259244bb4ea9b985479006 12-Mar-2009 Frederic Weisbecker <fweisbec@gmail.com> tracing/core: bring back raw trace_printk for dynamic formats strings

Impact: fix callsites with dynamic format strings

Since its new binary implementation, trace_printk() internally uses static
containers for the format strings on each callsites. But the value is
assigned once at build time, which means that it can't take dynamic
formats.

So this patch unearthes the raw trace_printk implementation for the callers
that will need trace_printk to be able to carry these dynamic format
strings. The trace_printk() macro will use the appropriate implementation
for each callsite. Most of the time however, the binary implementation will
still be used.

The other impact of this patch is that mmiotrace_printk() will use the old
implementation because it calls the low level trace_vprintk and we can't
guess here whether the format passed in it is dynamic or not.

Some parts of this patch have been written by Steven Rostedt (most notably
the part that chooses the appropriate implementation for each callsites).

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
db526ca329f855510e8ce672332eba3304aed590 12-Mar-2009 Steven Rostedt <srostedt@redhat.com> tracing: show that buffer size is not expanded

Impact: do not confuse user on small trace buffer sizes

When the system boots up, the trace buffer is small to conserve memory.
It is only two pages per online CPU. When the tracer is used, it expands
to the default value.

This can confuse the user if they look at the buffer size and see only
7, but then later they see 1408.

# cat /debug/tracing/buffer_size_kb
7

# echo sched_switch > /debug/tracing/current_tracer

# cat /debug/tracing/buffer_size_kb
1408

This patch tries to help remove this confustion by showing that the
buffer has not been expanded.

# cat /debug/tracing/buffer_size_kb
7 (expanded: 1408)

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
1027fcb206a0fb8348e63aff078c74bdee1c2698 12-Mar-2009 Steven Rostedt <srostedt@redhat.com> tracing: protect ring_buffer_expanded with trace_types_lock

Impact: prevent races with ring_buffer_expanded

This patch places the expanding of the tracing buffer under the
protection of the trace_types_lock mutex. It is highly unlikely
that there would be any contention, but better safe than sorry.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
a123c52b46a1f84bcec3dc963351896c6d6afaf7 12-Mar-2009 Steven Rostedt <srostedt@redhat.com> tracing: fix comments about trace buffer resizing

Impact: cleanup

Some of the comments about the trace buffer resizing is gobbledygook.
And I wonder why people question if I'm a native English speaker.

This patch makes the comments make a bit more sense.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
554f786e284a6ce859d51f62240d615603944c8e 12-Mar-2009 Steven Rostedt <srostedt@redhat.com> ring-buffer: only allocate buffers for online cpus

Impact: save on memory

Currently, a ring buffer was allocated for each "possible_cpus". On
some systems, this is the same as NR_CPUS. Thus, if a system defined
NR_CPUS = 64 but it only had 1 CPU, we could have possibly 63 useless
ring buffers taking up space. With a default buffer of 3 megs, this
could be quite drastic.

This patch changes the ring buffer code to only allocate ring buffers
for online CPUs. If a CPU goes off line, we do not free the buffer.
This is because the user may still have trace data in that buffer
that they would like to look at.

Perhaps in the future we could add code to delete a ring buffer if
the CPU is offline and the ring buffer becomes empty.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
9aba60fe6eb20453de53a572143bef22fa929fba 12-Mar-2009 Steven Rostedt <srostedt@redhat.com> tracing: fix trace_wait to know to wait on all cpus or just one

Impact: fix to task live locking on reading trace_pipe on one CPU

The same code is used for both trace_pipe (all CPUS) and the per_cpu
trace_pipe file. When there is no data to read, it will check for
signals and wait on the trace wait queue.

The problem happens with the per_cpu wait. The trace_wait code checks
all CPUs. Thus, if there's data in another CPU buffer, then it will
exit the wait, without checking for signals or waiting on the wait queue.

It would then try to read the empty buffer, and since that will just
return nothing, then it will try to wait again. Unfortunately, that will
again fail due to there still being data in the other buffers. This
ends up with a live lock for the task.

This patch fixes the trace_wait to be aware that the iterator may only
be waiting on a single buffer.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
1852fcce181faa237c010a3dbedb473cf9d4555f 11-Mar-2009 Steven Rostedt <srostedt@redhat.com> tracing: expand the ring buffers when an event is activated

To save memory, the tracer ring buffers are set to a minimum.
The activating of a trace expands the ring buffer size. This patch
adds this expanding, when an event is activated.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
73c5162aa362a543793f4a957c6c536dcbaa89ce 11-Mar-2009 Steven Rostedt <srostedt@redhat.com> tracing: keep ring buffer to minimum size till used

Impact: less memory impact on systems not using tracer

When the kernel boots up that has tracing configured, it allocates
the default size of the ring buffer. This currently happens to be
1.4Megs per possible CPU. This is quite a bit of wasted memory if
the system is never using the tracer.

The current solution is to keep the ring buffers to a minimum size
until the user uses them. Once a tracer is piped into the current_tracer
the ring buffer will be expanded to the default size. If the user
changes the size of the ring buffer, it will take the size given
by the user immediately.

If the user adds a "ftrace=" to the kernel command line, then the ring
buffers will be set to the default size on initialization.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
80370cb758e7ca2692cd9fb5e413d970b1f4b2b2 10-Mar-2009 Steven Rostedt <srostedt@redhat.com> tracing: use raw spinlocks for trace_vprintk

Impact: prevent locking up by lockdep tracer

The lockdep tracer uses trace_vprintk and thus trace_vprintk can not
call back into lockdep without locking up.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
ef18012b248b47ec9a12c3a83ca5e99782d39c5d 10-Mar-2009 Steven Rostedt <srostedt@redhat.com> tracing: remove funky whitespace in the trace code

Impact: clean up

There existed a lot of <space><tab>'s in the tracing code. This
patch removes them.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
888b55dc314d26239d84c3b187dae555a81c1605 08-Mar-2009 KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> ftrace: tracing header should put '#' at the beginning of a line

In a recent discussion, Andrew Morton pointed out that tracing header
should put '#' at the beginning of a line.

Then, we can easily filtered the header by following grep usage:

cat trace | grep -v '^#'

Wakeup trace also has the same header problem.

Comparison of headers displayed:

before this patch:

# tracer: wakeup
#
wakeup latency trace v1.1.5 on 2.6.29-rc7-tip-tip
--------------------------------------------------------------------
latency: 19059 us, #21277/21277, CPU#1 | (M:desktop VP:0, KP:0, SP:0 HP:0 #P:4)
-----------------
| task: kondemand/1-1644 (uid:0 nice:-5 policy:0 rt_prio:0)
-----------------

# _------=> CPU#
# / _-----=> irqs-off
# | / _----=> need-resched
# || / _---=> hardirq/softirq
# ||| / _--=> preempt-depth
# |||| /
# ||||| delay
# cmd pid ||||| time | caller
# \ / ||||| \ | /
irqbalan-1887 1d.s. 0us : 1887:120:R + [001] 1644:115:S kondemand/1
irqbalan-1887 1d.s. 1us : default_wake_function <-autoremove_wake_function
irqbalan-1887 1d.s. 2us : check_preempt_wakeup <-try_to_wake_up

after this patch:

# tracer: wakeup
#
# wakeup latency trace v1.1.5 on 2.6.29-rc7-tip-tip
# --------------------------------------------------------------------
# latency: 529 us, #530/530, CPU#0 | (M:desktop VP:0, KP:0, SP:0 HP:0 #P:4)
# -----------------
# | task: kondemand/0-1641 (uid:0 nice:-5 policy:0 rt_prio:0)
# -----------------
#
# _------=> CPU#
# / _-----=> irqs-off
# | / _----=> need-resched
# || / _---=> hardirq/softirq
# ||| / _--=> preempt-depth
# |||| /
# ||||| delay
# cmd pid ||||| time | caller
# \ / ||||| \ | /
sshd-2496 0d.s. 0us : 2496:120:R + [000] 1641:115:S kondemand/0
sshd-2496 0d.s. 1us : default_wake_function <-autoremove_wake_function
sshd-2496 0d.s. 1us : check_preempt_wakeup <-try_to_wake_up

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <20090308124421.23C3.A69D9226@jp.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
769b0441f438c4bb4872cb8560eb6fe51bcc09ee 06-Mar-2009 Frederic Weisbecker <fweisbec@gmail.com> tracing/core: drop the old trace_printk() implementation in favour of trace_bprintk()

Impact: faster and lighter tracing

Now that we have trace_bprintk() which is faster and consume lesser
memory than trace_printk() and has the same purpose, we can now drop
the old implementation in favour of the binary one from trace_bprintk(),
which means we move all the implementation of trace_bprintk() to
trace_printk(), so the Api doesn't change except that we must now use
trace_seq_bprintk() to print the TRACE_PRINT entries.

Some changes result of this:

- Previously, trace_bprintk depended of a single tracer and couldn't
work without. This tracer has been dropped and the whole implementation
of trace_printk() (like the module formats management) is now integrated
in the tracing core (comes with CONFIG_TRACING), though we keep the file
trace_printk (previously trace_bprintk.c) where we can find the module
management. Thus we don't overflow trace.c

- changes some parts to use trace_seq_bprintk() to print TRACE_PRINT entries.

- change a bit trace_printk/trace_vprintk macros to support non-builtin formats
constants, and fix 'const' qualifiers warnings. But this is all transparent for
developers.

- etc...

V2:

- Rebase against last changes
- Fix mispell on the changelog

V3:

- Rebase against last changes (moving trace_printk() to kernel.h)

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <1236356510-8381-5-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
1ba28e02a18cbdbea123836f6c98efb09cbf59ec 06-Mar-2009 Lai Jiangshan <laijs@cn.fujitsu.com> tracing: add trace_bprintk()

Impact: add a generic printk() for tracing, like trace_printk()

trace_bprintk() uses the infrastructure to record events on ring_buffer.

[ fweisbec@gmail.com: ported to latest -tip, made it work if
!CONFIG_MODULES, never free the format strings from modules
because we can't keep track of them and conditionnaly create
the ftrace format strings section (reported by Steven Rostedt) ]

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <1236356510-8381-4-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
1427cdf0592368bdec57276edaf714040ee8744f 06-Mar-2009 Lai Jiangshan <laijs@cn.fujitsu.com> tracing: infrastructure for supporting binary record

Impact: save on memory for tracing

Current tracers are typically using a struct(like struct ftrace_entry,
struct ctx_switch_entry, struct special_entr etc...)to record a binary
event. These structs can only record a their own kind of events.
A new kind of tracer need a new struct and a lot of code too handle it.

So we need a generic binary record for events. This infrastructure
is for this purpose.

[fweisbec@gmail.com: rebase against latest -tip, make it safe while sched
tracing as reported by Steven Rostedt]

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <1236356510-8381-3-git-send-email-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
5e2336a0d47c9661a40cc5ef85135ce1406af6e8 06-Mar-2009 Steven Rostedt <srostedt@redhat.com> tracing: make all file_operations const

Impact: cleanup

All file_operations structures should be constant. No one is going to
change them.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
5e1607a00bd082972629d3d68c95c8bcf902b55a 05-Mar-2009 Ingo Molnar <mingo@elte.hu> tracing: rename ftrace_printk() => trace_printk()

Impact: cleanup

Use a more generic name - this also allows the prototype to move
to kernel.h and be generally available to kernel developers who
want to do some quick tracing.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
27d48be84477d2f0a2e2ac3738a3971dece631d5 05-Mar-2009 Steven Rostedt <srostedt@redhat.com> tracing: consolidate print_lat_fmt and print_trace_fmt

Impact: clean up

Both print_lat_fmt and print_trace_fmt do pretty much the same thing
except for one different function call. This patch consolidates the
two functions and adds an if statement to perform the difference.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
5fd73f862468280d4cbb5ba4321502f911f9f89a 05-Mar-2009 Steven Rostedt <srostedt@redhat.com> tracing: remove extra latency_trace method from trace structure

Impact: clean up

The trace and latency_trace function pointers are identical for
every tracer but the function tracer. The differences in the function
tracer are trivial (latency output puts paranthesis around parent).

This patch removes the latency_trace pointer and all prints will
now just use the trace output function pointer.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
c032ef64d680717e4e8ce3da65da6419a35f8a2c 05-Mar-2009 Steven Rostedt <srostedt@redhat.com> tracing: add latency output format option

With the removal of the latency_trace file, we lost the ability
to see some of the finer details in a trace. Like the state of
interrupts enabled, the preempt count, need resched, and if we
are in an interrupt handler, softirq handler or not.

This patch simply creates an option to bring back the old format.
This also removes the warning about an unused variable that held
the latency_trace file operations.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
e74da5235cec6cb71eb338c987f876ecc793138b 05-Mar-2009 Steven Rostedt <srostedt@redhat.com> tracing: fix seq read from trace files

The buffer used by trace_seq was updated incorrectly. Instead
of consuming what was actually read, it consumed the rest of the
buffer on reads.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2dc5d12b1f43134e9bc5037f69f4739cfdfab93e 05-Mar-2009 Steven Rostedt <srostedt@redhat.com> tracing: do not return EFAULT if read copied anything

Impact: fix trace read to conform to standards

Andrew Morton, Theodore Tso and H. Peter Anvin brought to my attention
that a userspace read should not return -EFAULT if it succeeded in
copying anything. It should only return -EFAULT if it failed to copy
at all.

This patch modifies the check of copy_from_user and updates the return
code appropriately.

I also used H. Peter Anvin's short cut rule to just test ret == count.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
e543ad76914abec1acf6631604a4154cd7a2ca6b 05-Mar-2009 Steven Rostedt <srostedt@redhat.com> tracing: add cpu_file intialization for ftrace_dump

Impact: fix to ftrace_dump output corruption

The commit: b04cc6b1f6398b0e0b60d37e27ce51b4899672ec
tracing/core: introduce per cpu tracing files

added a new field to the iterator called cpu_file. This was a handle
to differentiate between the per cpu trace output files and the
all cpu "trace" file. The all cpu "trace" file required setting this
to TRACE_PIPE_ALL_CPU.

The problem is that the ftrace_dump sets up its own iterator but was
not updated to handle this change. The result was only CPU 0 printing
out on crash and a lot of "<0>"'s also being printed.

Reported-by: Thomas Gleixner <tglx@linuxtronix.de>
Tested-by: Darren Hart <dvhtc@us.ibm.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
efed792d6738964f399a508ef9e831cd60fa4657 04-Mar-2009 Peter Zijlstra <peterz@infradead.org> tracing: add lockdep tracepoints for lock acquire/release

Augment the traces with lock names when lockdep is available:

1) | down_read_trylock() {
1) | _spin_lock_irqsave() {
1) | /* lock_acquire: &sem->wait_lock */
1) 4.201 us | }
1) | _spin_unlock_irqrestore() {
1) | /* lock_release: &sem->wait_lock */
1) 3.523 us | }
1) | /* lock_acquire: try read &mm->mmap_sem */
1) + 13.386 us | }
1) 1.635 us | find_vma();
1) | handle_mm_fault() {
1) | __do_fault() {
1) | filemap_fault() {
1) | find_lock_page() {
1) | find_get_page() {
1) | /* lock_acquire: read rcu_read_lock */
1) | /* lock_release: rcu_read_lock */
1) 5.697 us | }
1) 8.158 us | }
1) + 11.079 us | }
1) | _spin_lock() {
1) | /* lock_acquire: __pte_lockptr(page) */
1) 3.949 us | }
1) 1.460 us | page_add_file_rmap();
1) | _spin_unlock() {
1) | /* lock_release: __pte_lockptr(page) */
1) 3.115 us | }
1) | unlock_page() {
1) 1.421 us | page_waitqueue();
1) 1.220 us | __wake_up_bit();
1) 6.519 us | }
1) + 34.328 us | }
1) + 37.452 us | }
1) | up_read() {
1) | /* lock_release: &mm->mmap_sem */
1) | _spin_lock_irqsave() {
1) | /* lock_acquire: &sem->wait_lock */
1) 3.865 us | }
1) | _spin_unlock_irqrestore() {
1) | /* lock_release: &sem->wait_lock */
1) 8.562 us | }
1) + 17.370 us | }

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: =?ISO-8859-1?Q?T=F6r=F6k?= Edwin <edwintorok@gmail.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1236166375.5330.7209.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2cadf9135eb3b6d84b6427314be827ddd443c308 02-Dec-2008 Steven Rostedt <srostedt@redhat.com> tracing: add binary buffer files for use with splice

Impact: new feature

This patch creates a directory of files that correspond to the
per CPU ring buffers. These are binary files and are made to
be used with splice. This is the fastest way to extract data from
the ftrace ring buffers.

Thanks to Jiaying Zhang for pushing me to get this code fixed,
and to Eduard - Gabriel Munteanu for his splice code that helped
me debug my code.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
f9520750c4c9924c14325cd951efae5fae58104c 02-Mar-2009 Steven Rostedt <srostedt@redhat.com> tracing: make trace_seq_reset global and rename to trace_seq_init

Impact: clean up

The trace_seq functions may be used separately outside of the ftrace
iterator. The trace_seq_reset is needed for these operations.

This patch also renames trace_seq_reset to the more appropriate
trace_seq_init.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
ef5580d0fffce6e0a01043bac0625128b5d409a7 28-Feb-2009 Steven Rostedt <srostedt@redhat.com> tracing: add interface to write into current tracer buffer

Right now all tracers must manage their own trace buffers. This was
to enforce tracers to be independent in case we finally decide to
allow each tracer to have their own trace buffer.

But now we are adding event tracing that writes to the current tracer's
buffer. This adds an interface to allow events to write to the current
tracer buffer without having to manage its own. Since event tracing
has no "tracer", and is just a way to hook into any other tracer.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
0cfe82451dfa3ebf4e69158f2eb450f2fbb6b715 27-Feb-2009 Steven Rostedt <srostedt@redhat.com> tracing: replace kzalloc with kcalloc

Impact: clean up

kcalloc is a better approach to allocate a NULL array.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
5c6a3ae1b4beebb56e2916b84f1208d96a9e32ff 27-Feb-2009 Steven Rostedt <srostedt@redhat.com> tracing: use newline separator for trace options list

Impact: clean up

Instead of listing the trace options like:

# cat /debug/tracing/trace_options
print-parent nosym-offset nosym-addr noverbose noraw nohex nobin noblock nostacktrace nosched-tree ftrace_printk noftrace_preempt nobranch annotate nouserstacktrace nosym-userobj

We now list them like:

# cat /debug/tracing/trace_options
print-parent
nosym-offset
nosym-addr
noverbose
noraw
nohex
nobin
noblock
nostacktrace
nosched-tree
ftrace_printk
noftrace_preempt
nobranch
annotate
nouserstacktrace
nosym-userobj

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
85a2f9b46f8cd8aaa11c64c715e1ea3ec27ec486 27-Feb-2009 Steven Rostedt <srostedt@redhat.com> tracing: use pointer error returns for __tracing_open

Impact: fix compile warning and clean up

When I first wrote __tracing_open, instead of passing the error
code via the ERR_PTR macros, I lazily used a separate parameter
to hold the return for errors.

When Frederic Weisbecker updated that function, he used the Linux
kernel ERR_PTR for the returns. This caused the parameter return
to possibly not be initialized on error. gcc correctly pointed this
out with a warning.

This patch converts the entire function to use the Linux kernel
ERR_PTR macro methods.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
d8e83d26b5ab3b31ee0ff6d093a2627707a1e221 27-Feb-2009 Steven Rostedt <srostedt@redhat.com> tracing: add protection around open use of current_tracer

Impact: fix to possible race conditions

There's some uses of current_tracer that is not protected by the
trace_types_lock. There is a small chance that a sysadmin changes
the tracer while the current_tracer is being referenced.

If the race is hit, it is unlikely to cause any harm since the
tracers are constant and are not freed. But some strang side
effects may occur.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
577b785f55168d5acb3d123ba41bfe8d7981e044 27-Feb-2009 Steven Rostedt <srostedt@redhat.com> tracing: add tracer dependent options to options directory

This patch adds the tracer dependent options dynamically to the
options directory when the tracer is activated. These options are
removed when the tracer is deactivated.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
a8259075074fb09c230b4cd2c8d3ee3c49d6ecd1 27-Feb-2009 Steven Rostedt <srostedt@redhat.com> tracing: add options directory and core option files

This patch creates an options directory in the debugfs, that contains
the available tracing options. These files contain 1 or 0, where 1
is the option is enabled and 0 it is disabled.

Simply echoing in 1 will enable the option and 0 will disable it.
This patch only contains the core options, not the tracer options.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
8656e7a2fa6afcd8682990f804a2a9674568738f 26-Feb-2009 Frederic Weisbecker <fweisbec@gmail.com> tracing/core: make the per cpu trace files in per cpu directories

Impact: restructure the VFS layout of per CPU trace buffers

The per cpu trace files are all in a single directory:
/debug/tracing/per_cpu. In case of a large number of cpu, the
content of this directory becomes messy so we create now one
directory per cpu inside /debug/tracing/per_cpu which contain
each their own trace_pipe and trace files.

Ie:

/debug/tracing$ ls -R per_cpu
per_cpu:
cpu0 cpu1

per_cpu/cpu0:
trace trace_pipe

per_cpu/cpu1:
trace trace_pipe

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
d7350c3f45694104e820041969c8185c5f99e57c 25-Feb-2009 Frederic Weisbecker <fweisbec@gmail.com> tracing/core: make the read callbacks reentrants

Now that several per-cpu files can be read or spliced at the
same, we want the read/splice callbacks for tracing files to be
reentrants.

Until now, a single global mutex (trace_types_lock) serialized
the access to tracing_read_pipe(), tracing_splice_read_pipe(),
and the seq helpers.

Ie: it means that if a user tries to read trace_pipe0 and
trace_pipe1 at the same time, the access to the function
tracing_read_pipe() is contended and one reader must wait for
the other to finish its read call.

The trace_type_lock mutex is mostly here to serialize the access
to the global current tracer (current_trace), which can be
changed concurrently. Although the iter struct keeps a private
pointer to this tracer, its callbacks can be changed by another
function.

The method used here is to not keep anymore private reference to
the tracer inside the iterator but to make a copy of it inside
the iterator. Then it checks on subsequents read calls if the
tracer has changed. This is not costly because the current
tracer is not expected to be changed often, so we use a branch
prediction for that.

Moreover, we add a private mutex to the iterator (there is one
iterator per file descriptor) to serialize the accesses in case
of multiple consumers per file descriptor (which would be a
silly idea from the user). Note that this is not to protect the
ring buffer, since the ring buffer already serializes the
readers accesses. This is to prevent from traces weirdness in
case of concurrent consumers. But these mutexes can be dropped
anyway, that would not result in any crash. Just tell me what
you think about it.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
b04cc6b1f6398b0e0b60d37e27ce51b4899672ec 25-Feb-2009 Frederic Weisbecker <fweisbec@gmail.com> tracing/core: introduce per cpu tracing files

Impact: split up tracing output per cpu

Currently, on the tracing debugfs directory, three files are
available to the user to let him extracting the trace output:

- trace is an iterator through the ring-buffer. It's a reader
but not a consumer It doesn't block when no more traces are
available.

- trace pretty similar to the former, except that it adds more
informations such as prempt count, irq flag, ...

- trace_pipe is a reader and a consumer, it will also block
waiting for traces if necessary (heh, yes it's a pipe).

The traces coming from different cpus are curretly mixed up
inside these files. Sometimes it messes up the informations,
sometimes it's useful, depending on what does the tracer
capture.

The tracing_cpumask file is useful to filter the output and
select only the traces captured a custom defined set of cpus.
But still it is not enough powerful to extract at the same time
one trace buffer per cpu.

So this patch creates a new directory: /debug/tracing/per_cpu/.

Inside this directory, you will now find one trace_pipe file and
one trace file per cpu.

Which means if you have two cpus, you will have:

trace0
trace1
trace_pipe0
trace_pipe1

And of course, reading these files will have the same effect
than with the usual tracing files, except that you will only see
the traces from the given cpu.

The original all-in-one cpu trace file are still available on
their original place.

Until now, only one consumer was allowed on trace_pipe to avoid
racy consuming on the ring-buffer. Now the approach changed a
bit, you can have only one consumer per cpu.

Which means you are allowed to read concurrently trace_pipe0 and
trace_pipe1 But you can't have two readers on trace_pipe0 or
trace_pipe1.

Following the same logic, if there is one reader on the common
trace_pipe, you can not have at the same time another reader on
trace_pipe0 or in trace_pipe1. Because in trace_pipe is already
a consumer in all cpu buffers in essence.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
886b5b73d71e4027d7dc6c14f5f7ab102201ea6b 25-Feb-2009 Ingo Molnar <mingo@elte.hu> tracing: remove /debug/tracing/latency_trace

Impact: remove old debug/tracing API

/debug/tracing/latency_trace is an old legacy format we kept from
the old latency tracer. Remove the file for now. If there's any
useful bit missing then we'll propagate any useful output bits into
the /debug/tracing/trace output.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
fa7c7f6e11f70d62505074a8b30a776236850dec 11-Feb-2009 Frederic Weisbecker <fweisbec@gmail.com> tracing/core: remove unused parameter in tracing_fill_pipe_page()

Impact: cleanup

The struct page *pages parameter is unused.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
6eaaa5d57e76c454479833fc8594cd7c3b75c789 11-Feb-2009 Frederic Weisbecker <fweisbec@gmail.com> tracing/core: use appropriate waiting on trace_pipe

Impact: api and pipe waiting change

Currently, the waiting used in tracing_read_pipe() is done through a
100 msecs schedule_timeout() loop which periodically check if there
are traces on the buffer.

This can cause small latencies for programs which are reading the incoming
events.

This patch makes the reader waiting for the trace_wait waitqueue except
for few tracers such as the sched and functions tracers which might be
already hold the runqueue lock while waking up the reader.

This is performed through a new callback wait_pipe() on struct tracer.
If none is implemented on a specific tracer, the default waiting for
trace_wait queue is attached.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
af513098452b8887d7c0e15a39d7cb74479501bd 17-Feb-2009 Wenji Huang <wenji.huang@oracle.com> tracing: use the more proper parameter

Pass tsk to tracing_record_cmdline instead of current.

Signed-off-by: Wenji Huang <wenji.huang@oracle.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
e7669b8e329255bbcb40af65b38e342825d97a46 10-Feb-2009 Hannes Eder <hannes@hanneseder.net> tracing: fix sparse warning: attribute function with __acquires/__releases

Fix this sparse warning:

kernel/trace/trace.c:458:9: warning: context imbalance in 'register_tracer' - unexpected unlock

Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
5e39841c45cf5e6ea930ede1b0303309e03037a2 10-Feb-2009 Hannes Eder <hannes@hanneseder.net> tracing: fix sparse warnings: fix (un-)signedness

Fix these sparse warnings:

kernel/trace/ring_buffer.c:70:37: warning: incorrect type in argument 2 (different signedness)
kernel/trace/ring_buffer.c:84:39: warning: incorrect type in argument 2 (different signedness)
kernel/trace/ring_buffer.c:96:43: warning: incorrect type in argument 2 (different signedness)
kernel/trace/ring_buffer.c:2475:13: warning: incorrect type in argument 2 (different signedness)
kernel/trace/ring_buffer.c:2475:13: warning: incorrect type in argument 2 (different signedness)
kernel/trace/ring_buffer.c:2478:42: warning: incorrect type in argument 2 (different signedness)
kernel/trace/ring_buffer.c:2478:42: warning: incorrect type in argument 2 (different signedness)
kernel/trace/ring_buffer.c:2500:40: warning: incorrect type in argument 3 (different signedness)
kernel/trace/ring_buffer.c:2505:44: warning: incorrect type in argument 2 (different signedness)
kernel/trace/ring_buffer.c:2507:46: warning: incorrect type in argument 2 (different signedness)
kernel/trace/trace.c:2130:40: warning: incorrect type in argument 3 (different signedness)
kernel/trace/trace.c:2280:40: warning: incorrect type in argument 3 (different signedness)

Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
4fd2735881bf4d8bf5e30979f31fc2f1b1d505fa 10-Feb-2009 Hannes Eder <hannes@hanneseder.net> tracing: fix sparse warnings: make symbols static

Impact: make global variables and a global function static

The function '__trace_userstack' does not seem to have a caller, so it
is commented out.

Fix this sparse warnings:
kernel/trace/trace.c:82:5: warning: symbol 'tracing_disabled' was not declared. Should it be static?
kernel/trace/trace.c:600:10: warning: symbol 'trace_record_cmdline_disabled' was not declared. Should it be static?
kernel/trace/trace.c:957:6: warning: symbol '__trace_userstack' was not declared. Should it be static?
kernel/trace/trace.c:1694:5: warning: symbol 'tracing_release' was not declared. Should it be static?

Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
c3706f005c3aaf570e71f0f083fdbb59a5a9fa2e 10-Feb-2009 Wenji Huang <wenji.huang@oracle.com> tracing: fix typos in comments

Impact: clean up.

Fix typos in the comments.

Signed-off-by: Wenji Huang <wenji.huang@oracle.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
34cd4998d38f9bd04f34b78a7cb0c7f1bee00bd9 09-Feb-2009 Steven Rostedt <srostedt@redhat.com> tracing: clean up splice code

Ingo Molnar suggested a series of clean ups for the splice code.
This patch implements those suggestions.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
ff98781bab2735e6c89793034173e0cb5007a7e5 09-Feb-2009 Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> tracing: Move pipe waiting code out of tracing_read_pipe().

This moves the pipe waiting code from tracing_read_pipe() into
tracing_wait_pipe(), which is useful to implement other fops, like
splice_read.

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
3c56819b14b00dd449bd776303e61f8532fad09f 09-Feb-2009 Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> tracing: splice support for tracing_pipe

Added and implemented tracing_pipe_fops->splice_read(). This allows
userspace programs to get tracing data more efficiently.

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
b91facc367366b3f71375f337eb5997ec9ab4e69 06-Feb-2009 Frederic Weisbecker <fweisbec@gmail.com> tracing/function-graph-tracer: handle the leaf functions from trace_pipe

When one cats the trace file, the leaf functions are printed without brackets:

function();

whereas in the trace_pipe file we'll see the following:

function() {
}

This is because the ring_buffer handling is not the same between those two files.
On the trace file, when an entry is printed, the iterator advanced and then we can
check the next entry.

There is no iterator with trace_pipe, the current entry to print has been peeked
and not consumed. So checking the next entry will still return the current one while
we don't consume it.

This patch introduces a new value for the output callbacks to ask the tracing
core to not consume the current entry after printing it.

We need it because we will have to consume the current entry ourself to check
the next one.

Now the trace_pipe is able to handle well the leaf functions.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
b5db03c4355e568f1567758287b30a6a262d5057 07-Feb-2009 Arnaldo Carvalho de Melo <acme@redhat.com> tracing: handle unregistering the current tracer

Impact: simplification

Instead of requiring that plugins have the sequence:

my_tracer_stop(my_trace_array);
unregister_tracer(my_tracer);

it should be possible just do a:

unregister_tracer(my_tracer);

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
1830b52d0de8c60c4f5dfbac134aa8f69d815801 08-Feb-2009 Steven Rostedt <srostedt@redhat.com> trace: remove deprecated entry->cpu

Impact: fix to prevent developers from using entry->cpu

With the new ring buffer infrastructure, the cpu for the entry is
implicit with which CPU buffer it is on.

The original code use to record the current cpu into the generic
entry header, which can be retrieved by entry->cpu. When the
ring buffer was introduced, the users were convert to use the
the cpu number of which cpu ring buffer was in use (this was passed
to the tracers by the iterator: iter->cpu).

Unfortunately, the cpu item in the entry structure was never removed.
This allowed for developers to use it instead of the proper iter->cpu,
unknowingly, using an uninitialized variable. This was not the fault
of the developers, since it would seem like the logical place to
retrieve the cpu identifier.

This patch removes the cpu item from the entry structure and fixes
all the users that should have been using iter->cpu.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
b6f11df26fdc28324cf9c9e3b77f2dc985c1bb13 05-Feb-2009 Arnaldo Carvalho de Melo <acme@redhat.com> trace: Call tracing_reset_online_cpus before tracer->init()

Impact: cleanup

To make it easy for ftrace plugin writers, as this was open coded in
the existing plugins

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Frédéric Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
51a763dd84253bab1d0a1e68e11a7753d1b702ca 05-Feb-2009 Arnaldo Carvalho de Melo <acme@redhat.com> tracing: Introduce trace_buffer_{lock_reserve,unlock_commit}

Impact: new API

These new functions do what previously was being open coded, reducing
the number of details ftrace plugin writers have to worry about.

It also standardizes the handling of stacktrace, userstacktrace and
other trace options we may introduce in the future.

With this patch, for instance, the blk tracer (and some others already
in the tree) can use the "userstacktrace" /d/tracing/trace_options
facility.

$ codiff /tmp/vmlinux.before /tmp/vmlinux.after
linux-2.6-tip/kernel/trace/trace.c:
trace_vprintk | -5
trace_graph_return | -22
trace_graph_entry | -26
trace_function | -45
__ftrace_trace_stack | -27
ftrace_trace_userstack | -29
tracing_sched_switch_trace | -66
tracing_stop | +1
trace_seq_to_user | -1
ftrace_trace_special | -63
ftrace_special | +1
tracing_sched_wakeup_trace | -70
tracing_reset_online_cpus | -1
13 functions changed, 2 bytes added, 355 bytes removed, diff: -353

linux-2.6-tip/block/blktrace.c:
__blk_add_trace | -58
1 function changed, 58 bytes removed, diff: -58

linux-2.6-tip/kernel/trace/trace.c:
trace_buffer_lock_reserve | +88
trace_buffer_unlock_commit | +86
2 functions changed, 174 bytes added, diff: +174

/tmp/vmlinux.after:
16 functions changed, 176 bytes added, 413 bytes removed, diff: -237

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Frédéric Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
0a9877514c4fed10a70720293b37213dd172ee3e 05-Feb-2009 Arnaldo Carvalho de Melo <acme@redhat.com> ring_buffer: remove unused flags parameter

Impact: API change, cleanup

>From ring_buffer_{lock_reserve,unlock_commit}.

$ codiff /tmp/vmlinux.before /tmp/vmlinux.after
linux-2.6-tip/kernel/trace/trace.c:
trace_vprintk | -14
trace_graph_return | -14
trace_graph_entry | -10
trace_function | -8
__ftrace_trace_stack | -8
ftrace_trace_userstack | -8
tracing_sched_switch_trace | -8
ftrace_trace_special | -12
tracing_sched_wakeup_trace | -8
9 functions changed, 90 bytes removed, diff: -90

linux-2.6-tip/block/blktrace.c:
__blk_add_trace | -1
1 function changed, 1 bytes removed, diff: -1

/tmp/vmlinux.after:
10 functions changed, 91 bytes removed, diff: -91

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Frédéric Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
dac74940289f350c2590bec92737833bad608541 05-Feb-2009 Steven Rostedt <srostedt@redhat.com> trace: code style clean up

Ingo Molnar suggested using goto logic to keep the indentation
down and to be able to remove the nasty line breaks. This actually
makes the code a bit more readable.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
7be421510b91491d5aa5a29fa1005712039b95af 05-Feb-2009 Arnaldo Carvalho de Melo <acme@redhat.com> trace: Remove unused trace_array_cpu parameter

Impact: cleanup

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
268ccda0cb4d1292029d07ee3dbd07117baf6ecb 04-Feb-2009 Arnaldo Carvalho de Melo <acme@redhat.com> trace: assign defaults at register_ftrace_event

Impact: simplification of tracers

As all tracers are doing this we might as well do it in
register_ftrace_event and save one branch each time we call these
callbacks.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
ae7462b4f1fe1f36b5d562dbd5202a2eba01f072 04-Feb-2009 Arnaldo Carvalho de Melo <acme@redhat.com> trace: make the trace_event callbacks return enum print_line_t

As they actually all return these enumerators.

Reported-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
d9793bd8018f835c64b10f44e278c86cecb8e932 03-Feb-2009 Arnaldo Carvalho de Melo <acme@redhat.com> trace: judicious error checking of trace_seq results

Impact: bugfix and cleanup

Some callsites were returning either TRACE_ITER_PARTIAL_LINE if the
trace_seq routines (trace_seq_printf, etc) returned 0 meaning its buffer
was full, or zero otherwise.

But...

/* Return values for print_line callback */
enum print_line_t {
TRACE_TYPE_PARTIAL_LINE = 0, /* Retry after flushing the seq */
TRACE_TYPE_HANDLED = 1,
TRACE_TYPE_UNHANDLED = 2 /* Relay to other output functions */
};

In other cases the return value was not being relayed at all.

Most of the time it didn't hurt because the page wasn't get filled, but
for correctness sake, handle the return values everywhere.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2c9b238eb325895d3312dad64e2685783575e474 02-Feb-2009 Arnaldo Carvalho de Melo <acme@redhat.com> trace: Change struct trace_event callbacks parameter list

Impact: API change

The trace_seq and trace_entry are in trace_iterator, where there are
more fields that may be needed by tracers, so just pass the
tracer_iterator as is already the case for struct tracer->print_line.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
c4a8e8be2d43cc22b371e8e9c05c253409759d94 02-Feb-2009 Frederic Weisbecker <fweisbec@gmail.com> trace: better manage the context info for events

Impact: make trace_event more convenient for tracers

All tracers (for the moment) that use the struct trace_event want to
have the context info printed before their own output: the pid/cmdline,
cpu, and timestamp.

But some other tracers that want to implement their trace_event
callbacks will not necessary need these information or they may want to
format them as they want.

This patch adds a new default-enabled trace option:
TRACE_ITER_CONTEXT_INFO When disabled through:

echo nocontext-info > /debugfs/tracing/trace_options

The pid, cpu and timestamps headers will not be printed.

IE with the sched_switch tracer with context-info (default):

bash-2935 [001] 100.356561: 2935:120:S ==> [001] 0:140:R <idle>
<idle>-0 [000] 100.412804: 0:140:R + [000] 11:115:S events/0
<idle>-0 [000] 100.412816: 0:140:R ==> [000] 11:115:R events/0
events/0-11 [000] 100.412829: 11:115:S ==> [000] 0:140:R <idle>

Without context-info:

2935:120:S ==> [001] 0:140:R <idle>
0:140:R + [000] 11:115:S events/0
0:140:R ==> [000] 11:115:R events/0
11:115:S ==> [000] 0:140:R <idle>

A tracer can disable it at runtime by clearing the bit
TRACE_ITER_CONTEXT_INFO in trace_flags.

The print routines were renamed to trace_print_context and
trace_print_lat_context, so that they can be used by tracers if they
want to use them for one of the trace_event callbacks.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
79fb0768fbd371f3b94d909f51f587b3a24ab272 03-Feb-2009 Steven Rostedt <srostedt@redhat.com> trace: let boot trace be chosen by command line

Now that we have a working ftrace=<tracer> function, make the boot
tracer get activated by it. This way we can turn it on or off without
recompiling the kernel, as well as keeping the selftests on. The
selftests are disabled whenever a default tracer starts running.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
b2821ae68b14480bfc85ea1629537163310bc5cd 03-Feb-2009 Steven Rostedt <srostedt@redhat.com> trace: fix default boot up tracer

Peter Zijlstra started the functionality to start up a default
tracing at bootup. This patch finishes the work.

Now if you add 'ftrace=<tracer>' to the command line, when that tracer
is registered on bootup, that tracer is selected and starts tracing.

Note, all selftests for tracers that are registered after this tracer
is disabled. This prevents the selftests from disturbing the running
tracer, or the running tracer from disturbing the selftest.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
9011262a37cb438f0fa9394b5e83840db8f9680a 23-Jan-2009 Arnaldo Carvalho de Melo <acme@redhat.com> ftrace: add ftrace_vprintk

Impact: new helper function

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
b06a830183b610c0a88c29a92feb7991a867ab46 22-Jan-2009 Steven Rostedt <srostedt@redhat.com> trace: fix logic to start/stop counting

The logic in the tracing_start/stop code prevents the WARN_ON
from ever detecting if a start/stop pair was mismatched.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
97b17efe4537e11bf6669106cfe4ee2c5331b267 21-Jan-2009 Steven Rostedt <srostedt@redhat.com> ring-buffer: do not swap if recording is disabled

If the ring buffer recording has been disabled. Do not let
swapping of ring buffers occur. Simply return -EAGAIN.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
1092307d582a7566d23779c304cf86f3075ac5f0 16-Jan-2009 Steven Rostedt <srostedt@redhat.com> trace: set max latency variable to zero on default

Impact: trace max latencies on start of latency tracing

This patch sets the max latency to zero whenever one of the
irq variant tracers or the wakeup tracer is set to current tracer.

Most developers expect to see output when starting up a latency
tracer. But since the max_latency is already set to max, and
it takes a latency greater than max_latency to be recorded, there
is no trace. This is not the expected behavior and has even confused
myself.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
a442e5e0a2011af5b2d1f118fee0a8f9079f1d88 14-Jan-2009 Steven Rostedt <srostedt@redhat.com> trace: stop all recording to ring buffer on ftrace_dump

Impact: limit ftrace dump output

Currently ftrace_dump only calls ftrace_kill that is a fast way
to prevent the function tracer functions from being called (just sets
a flag and clears the function to call, nothing else). It is better
to also turn off any recording to the ring buffers as well.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
faf6861ebd776871e77b761c43ec045cd20b5716 14-Jan-2009 Steven Rostedt <srostedt@redhat.com> trace: print ftrace_dump at KERN_EMERG log level

Impact: fix to print out ftrace_dump when expected

I was debugging a hard race condition to only find out that
after I hit the race, my log level was not at level to show
KERN_INFO. The time it took to trigger the race was wasted because
I did not capture the trace.

Since ftrace_dump is only called from kernel oops (and only when
it is set in the kernel command line to do so), or when a
developer adds it to their own local tree, the log level of
the print should be at KERN_EMERG to make sure the print appears.

ftrace_dump is not called by a normal user setup, and will not
add extra unwanted print out to the console. There is no reason
it should be at KERN_INFO.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
745b1626dd71ce9661a05ea4db57859ed5c773d2 16-Jan-2009 Steven Rostedt <srostedt@redhat.com> trace: set max latency variable to zero on default

Impact: trace max latencies on start of latency tracing

This patch sets the max latency to zero whenever one of the
irq variant tracers or the wakeup tracer is set to current tracer.

Most developers expect to see output when starting up a latency
tracer. But since the max_latency is already set to max, and
it takes a latency greater than max_latency to be recorded, there
is no trace. This is not the expected behavior and has even confused
myself.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
a225cdd263f340c864febb1992802fb5b08bc328 16-Jan-2009 Steven Rostedt <srostedt@redhat.com> ftrace: remove static from function tracer functions

Impact: clean up

After reorganizing the functions in trace.c and trace_function.c,
they no longer need to be in global context. This patch makes the
functions and one variable into static.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
bb3c3c95f330f7bf16e33b002e48882616089db1 16-Jan-2009 Steven Rostedt <srostedt@redhat.com> ftrace: move function tracer functions out of trace.c

Impact: clean up of trace.c

The function tracer functions were put in trace.c because it needed
to share static variables that were in trace.c. Since then, those
variables have become global for various reasons. This patch moves
the function tracer functions into trace_function.c where they belong.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
5361499101306cfb776c3cfa0f69d0479bc63868 16-Jan-2009 Steven Rostedt <srostedt@redhat.com> ftrace: add stack trace to function tracer

Impact: new feature to stack trace any function

Chris Mason asked about being able to pick and choose a function
and get a stack trace from it. This feature enables his request.

# echo io_schedule > /debug/tracing/set_ftrace_filter
# echo function > /debug/tracing/current_tracer
# echo func_stack_trace > /debug/tracing/trace_options

Produces the following in /debug/tracing/trace:

kjournald-702 [001] 135.673060: io_schedule <-sync_buffer
kjournald-702 [002] 135.673671:
<= sync_buffer
<= __wait_on_bit
<= out_of_line_wait_on_bit
<= __wait_on_buffer
<= sync_dirty_buffer
<= journal_commit_transaction
<= kjournald

Note, be careful about turning this on without filtering the functions.
You may find that you have a 10 second lag between typing and seeing
what you typed. This is why the stack trace for the function tracer
does not use the same stack_trace flag as the other tracers use.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
0ee6b6cf5bdb793b4c68507dd65adf16341aa4ca 14-Jan-2009 Steven Rostedt <srostedt@redhat.com> trace: stop all recording to ring buffer on ftrace_dump

Impact: limit ftrace dump output

Currently ftrace_dump only calls ftrace_kill that is a fast way
to prevent the function tracer functions from being called (just sets
a flag and clears the function to call, nothing else). It is better
to also turn off any recording to the ring buffers as well.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
428aee1460a75197f0190534b4d610450ee59af1 14-Jan-2009 Steven Rostedt <srostedt@redhat.com> trace: print ftrace_dump at KERN_EMERG log level

Impact: fix to print out ftrace_dump when expected

I was debugging a hard race condition to only find out that
after I hit the race, my log level was not at level to show
KERN_INFO. The time it took to trigger the race was wasted because
I did not capture the trace.

Since ftrace_dump is only called from kernel oops (and only when
it is set in the kernel command line to do so), or when a
developer adds it to their own local tree, the log level of
the print should be at KERN_EMERG to make sure the print appears.

ftrace_dump is not called by a normal user setup, and will not
add extra unwanted print out to the console. There is no reason
it should be at KERN_INFO.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
002bb86d8d42f18937aef396c3ecd65c7e02e21a 10-Jan-2009 Frederic Weisbecker <fweisbec@gmail.com> tracing/ftrace: separate events tracing and stats tracing engine

Impact: tracing's Api change

Currently, the stat tracing depends on the events tracing.
When you switch to a new tracer, the stats files of the previous tracer
will disappear. But it's more scalable to separate those two engines.
This way, we can keep the stat files of one or several tracers when we
want, without bothering of multiple tracer stat files or tracer switching.

To build/destroys its stats files, a tracer just have to call
register_stat_tracer/unregister_stat_tracer everytimes it wants to.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
4462344ee9ea9224d026801b877887f2f39774a3 01-Jan-2009 Rusty Russell <rusty@rustcorp.com.au> cpumask: convert kernel trace functions further

Impact: Reduce future memory usage, use new cpumask API.

Since the last patch was created and acked, more old cpumask users
slipped into kernel/trace.

Mostly trivial conversions, except struct trace_iterator's "started"
member becomes a cpumask_var_t.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9e01c1b74c9531e301c900edaa92a99fcb7738f2 01-Jan-2009 Rusty Russell <rusty@rustcorp.com.au> cpumask: convert kernel trace functions

Impact: Reduce future memory usage, use new cpumask API.

(Eventually, cpumask_var_t will be allocated based on nr_cpu_ids, not NR_CPUS).

Convert kernel trace functions to use struct cpumask API:
1) Use cpumask_copy/cpumask_test_cpu/for_each_cpu.
2) Use cpumask_var_t and alloc_cpumask_var/free_cpumask_var everywhere.
3) Use on_each_cpu instead of playing with current->cpus_allowed.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
1af237a099a3b8ff56aa384f605c6a68af7bf288 29-Dec-2008 Huang Weiyi <weiyi.huang@gmail.com> tracing: removed duplicated #include

Removed duplicated #include in kernel/trace/trace.c.

Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
dbd0b4b33074aa6b7832a9d9a5bd985eca5c1aa2 29-Dec-2008 Frederic Weisbecker <fweisbec@gmail.com> tracing/ftrace: provide the base infrastructure for histogram tracing

Impact: extend the tracing API

The goal of this patch is to normalize and make more easy the
implementation of statistical (histogram) tracing.

It implements a trace_stat file into the /debugfs/tracing directory where
one can print a one-shot output of statistics/histogram entries.

A tracer has to provide two basic iterator callbacks:

stat_start() => the first entry
stat_next(prev, idx) => the next one.

Note that it is adapted for arrays or hash tables or lists.... since it
provides a pointer to the previous entry and the current index of the
iterator.

These two callbacks are called to get a snapshot of the statistics at each
opening of the trace_stat file because. The values are so updated between
two "cat trace_stat". And the tracer is free to lock its datas during the
iteration to keep consistent values.

Since it is almost always interesting to sort statisticals values to
address the problems by priority, this infrastructure provides a "sorting"
of the stat entries too if desired. A tracer has just to provide a
stat_cmp callback to compare two entries and the stat tracing
infrastructure will build a sorted list of the given entries.

A last callback, called stat_headers, can be implemented by a tracer to
output headers on its trace.

If one of these callbacks is changed on runtime, it just have to signal it
to the stat tracing API by calling the init_tracer_stat() helper.

Changes in V2:

- Fix a memory leak if the user opens multiple times the trace_stat file
without closing it. Now we always free our list before rebuilding it.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
f633cef0200bbaec539e2dbb0bc4bed7f022f98b 24-Dec-2008 Steven Rostedt <srostedt@redhat.com> ftrace: change trace.c to use registered events

Impact: rework trace.c to use new event register API

Almost every ftrace event has to implement its output display in
trace.c through a different function. Some events did not handle
all the formats (trace, latency-trace, raw, hex, binary), and
this method does not scale well.

This patch converts the format functions to use the event API to
find the event and and print its format. Currently, we have
a print function for trace, latency_trace, raw, hex and binary.
A trace_nop_print is available if the event wants to avoid output
on a particular format.

Perhaps other tracers could use this in the future (like mmiotrace and
function_graph).

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
f0868d1e23a8efec33beb3aa688aab7fdb1ae093 24-Dec-2008 Steven Rostedt <srostedt@redhat.com> ftrace: set up trace event hash infrastructure

Impact: simplify/generalize/refactor trace.c

The trace.c file is becoming more difficult to maintain due to the
growing number of events. There is several formats that an event may
be printed. This patch sets up the infrastructure of an event hash to
allow for events to register how they should be printed.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
c47956d9ae3341d2d1998bff26620fa3338c01e4 24-Dec-2008 Steven Rostedt <srostedt@redhat.com> ftrace: remove obsolete print continue functionality

Impact: cleanup, remove obsolete code

Now that the ring buffer used by ftrace allows for variable length
entries, we do not need the 'cont' feature of the buffer. This code
makes other parts of ftrace more complex and by removing this it
simplifies the ftrace code.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
213cc060797378059a28ebc5c539f3e9a80160bd 18-Dec-2008 Pekka J Enberg <penberg@cs.helsinki.fi> ftrace: introduce tracing_reset_online_cpus() helper

Impact: cleanup

This patch factors out common code from multiple tracers into a
tracing_reset_online_cpus() function and converts the tracers to use it.

Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
3bddb9a3246f6df5cf3b7655cb541ac10203bb71 19-Dec-2008 Ingo Molnar <mingo@elte.hu> tracing: fix warning in kernel/trace/trace.c

this warning:

kernel/trace/trace.c: In function ‘print_lat_fmt’:
kernel/trace/trace.c:1826: warning: unused variable ‘state’

Triggers because 'state' has become unused - remove it.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
3d9101e92529e1ff6014f95a69afc82f37b9b13a 17-Dec-2008 Thomas Gleixner <tglx@linutronix.de> trace: fix task state printout

Impact: fix occasionally incorrect trace output

The tracing code has interesting varieties of printing out task state.

Unfortunalely only one of the instances is correct as it copies the
code from sched.c:sched_show_task(). The others are plain wrong as
they treatthe bitfield as an integer offset into the character
array. Also the size check of the character array is wrong as it
includes the trailing \0.

Use a common state decoder inline which does the Right Thing.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
66896a85cf2890b6bbbc4c9ccdcd296600ffbf89 13-Dec-2008 Frederic Weisbecker <fweisbec@gmail.com> tracing/ftrace: add the printk-msg-only option

Impact: display ftrace_printk messages "as is"

By default, ftrace_printk() messages find their output with some other
informations like pid, caller, ...
Sometimes a developer just want to have the ftrace_printk left "as is", without
other information.

This is done by providing a default-off option called printk-msg-only.
To enable it, just do `echo printk-msg-only > /debugfs/tracing/trace_options`

Before the patch:

<...>-2739 [000] 145.692153: __might_sleep: I'm an ftrace_printk msg in __might_sleep
<...>-2739 [000] 145.692155: __might_sleep: I'm another ftrace_printk msg in __might_sleep

After the patch and the printk-msg-only option enabled:

I'm an ftrace_printk msg in __might_sleep
I'm another ftrace_printk msg in __might_sleep

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
29c0177e6a4ac094302bed54a1d4bbb6b740a9ef 13-Dec-2008 Rusty Russell <rusty@rustcorp.com.au> cpumask: change cpumask_scnprintf, cpumask_parse_user, cpulist_parse, and cpulist_scnprintf to take pointers.

Impact: change calling convention of existing cpumask APIs

Most cpumask functions started with cpus_: these have been replaced by
cpumask_ ones which take struct cpumask pointers as expected.

These four functions don't have good replacement names; fortunately
they're rarely used, so we just change them over.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Mike Travis <travis@sgi.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: paulus@samba.org
Cc: mingo@redhat.com
Cc: tony.luck@intel.com
Cc: ralf@linux-mips.org
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: cl@linux-foundation.org
Cc: srostedt@redhat.com
a93751cab71d63126687551823ed3e70cd85854a 11-Dec-2008 Markus Metzger <markut.t.metzger@intel.com> x86, bts, ftrace: adapt the hw-branch-tracer to the ds.c interface

Impact: restructure code, cleanup

Remove BTS bits from the hw-branch-tracer (renamed from bts-tracer) and
use the ds interface.

Signed-off-by: Markus Metzger <markut.t.metzger@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
e2ac8ef576e45d9db7264abc51383e68d26067bb 12-Nov-2008 Robert Richter <robert.richter@amd.com> ftrace: remove unused function arg in trace_iterator_increment()

This removes the unused cpu function parameter.

Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Robert Richter <robert.richter@amd.com>
e726f5f91effd8944c76475a2688093a03ba0d10 08-Dec-2008 Ingo Molnar <mingo@elte.hu> tracing/function-graph-tracer: fix 'flags' variable mismatch

this warning:

kernel/trace/trace.c: In function ‘trace_vprintk’:
kernel/trace/trace.c:3626: warning: ‘flags’ may be used uninitialized in this function

shows some confusion about irq_flags / flags use here. We already have
irq_flags so remove the extra flags variable.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
380c4b1411ccd6885f92b2c8ceb08433a720f44e 06-Dec-2008 Frederic Weisbecker <fweisbec@gmail.com> tracing/function-graph-tracer: append the tracing_graph_flag

Impact: Provide a way to pause the function graph tracer

As suggested by Steven Rostedt, the previous patch that prevented from
spinlock function tracing shouldn't use the raw_spinlock to fix it.
It's much better to follow lockdep with normal spinlock, so this patch
adds a new flag for each task to make the function graph tracer able
to be paused. We also can send an ftrace_printk whithout worrying of
the irrelevant traced spinlock during insertion.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
8e1b82e0866befaa0b2920be296c6e4c3fc7f422 06-Dec-2008 Frederic Weisbecker <fweisbec@gmail.com> tracing/function-graph-tracer: turn tracing_selftest_running into an int

Impact: cleanup

Apply some suggestions of Steven Rostedt:

_turn tracing_selftest_running into a simple int (no need of an atomic_t)
_set it __read_mostly
_fix a comment style

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
21a8c466f99063eeb8567318b4e305eda9015408 04-Dec-2008 Frederic Weisbecker <fweisbec@gmail.com> tracing/ftrace: provide the macro task_curr_ret_stack()

Impact: cleanup

As suggested by Steven Rostedt, this patch provide a new macro
task_curr_ret_stack() to move the cpp conditionnal CONFIG into
the linux/ftrace.h headers.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
ff32504fdc56407654584ef187b20022c94a3486 04-Dec-2008 Frederic Weisbecker <fweisbec@gmail.com> tracing/ftrace: don't insert TRACE_PRINT during selftests

Impact: fix tracer selfstests false results

After setting a ftrace_printk somewhere in th kernel, I saw the
Function tracer selftest failing.

When a selftest occurs, the ring buffer is lurked to see if
some entries were inserted. But concurrent insertion such as
ftrace_printk could occured at the same time and could give
false positive or negative results.

This patch prevent prevent from TRACE_PRINT entries insertion
during selftests.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
1fd8f2a3f9a91b287a876cef830b21baafc8a799 03-Dec-2008 Frederic Weisbecker <fweisbec@gmail.com> tracing/function-graph-tracer: handle ftrace_printk entries

Handle the TRACE_PRINT entries from the function grapg tracer
and output them as a C comment just below the function that called
it, as if it was a comment inside this function.

Example with an ftrace_printk inside might_sleep() function:

void __might_sleep(char *file, int line)
{
static unsigned long prev_jiffy; /* ratelimiting */

ftrace_printk("Hi I'm a comment in might_sleep() :-)");

A chunk of a resulting trace:

0) | _reiserfs_free_block() {
0) | reiserfs_read_bitmap_block() {
0) | __bread() {
0) | __getblk() {
0) | __find_get_block() {
0) 0.698 us | mark_page_accessed();
0) 2.267 us | }
0) | __might_sleep() {
0) | /* Hi I'm a comment in might_sleep() :-) */
0) 1.321 us | }
0) 5.872 us | }
0) 7.313 us | }
0) 8.718 us | }

And this patch brings two minor fixes:

- The newline after a switch-out task has disappeared
- The "|" sign just before the cpu number on task-switch has been deleted.

0) 0.616 us | pick_next_task_rt();
0) 1.457 us | _spin_trylock();
0) 0.653 us | _spin_unlock();
0) 0.728 us | _spin_trylock();
0) 0.631 us | _spin_unlock();
0) 0.729 us | native_load_sp0();
0) 0.593 us | native_load_tls();
------------------------------------------
0) cat-2834 => migrati-3
------------------------------------------

0) | finish_task_switch() {
0) 0.841 us | _spin_unlock_irq();
0) 0.616 us | post_schedule_rt();
0) 3.882 us | }

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
804a685162a7080386714166776f57255a75238e 03-Dec-2008 Steven Rostedt <srostedt@redhat.com> ftrace: trace single pid for function graph tracer

Impact: New feature

This patch makes the changes to set_ftrace_pid apply to the function
graph tracer.

# echo $$ > /debugfs/tracing/set_ftrace_pid
# echo function_graph > /debugfs/tracing/current_tracer

Will cause only the current task to be traced. Note, the trace flags are
also inherited by child processes, so the children of the shell
will also be traced.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
ea4e2bc4d9f7370e57a343ccb5e7c0ad3222ec3c 03-Dec-2008 Steven Rostedt <srostedt@redhat.com> ftrace: graph of a single function

This patch adds the file:

/debugfs/tracing/set_graph_function

which can be used along with the function graph tracer.

When this file is empty, the function graph tracer will act as
usual. When the file has a function in it, the function graph
tracer will only trace that function.

For example:

# echo blk_unplug > /debugfs/tracing/set_graph_function
# cat /debugfs/tracing/trace
[...]
------------------------------------------
| 2) make-19003 => kjournald-2219
------------------------------------------

2) | blk_unplug() {
2) | dm_unplug_all() {
2) | dm_get_table() {
2) 1.381 us | _read_lock();
2) 0.911 us | dm_table_get();
2) 1. 76 us | _read_unlock();
2) + 12.912 us | }
2) | dm_table_unplug_all() {
2) | blk_unplug() {
2) 0.778 us | generic_unplug_device();
2) 2.409 us | }
2) 5.992 us | }
2) 0.813 us | dm_table_put();
2) + 29. 90 us | }
2) + 34.532 us | }

You can add up to 32 functions into this file. Currently we limit it
to 32, but this may change with later improvements.

To add another function, use the append '>>':

# echo sys_read >> /debugfs/tracing/set_graph_function
# cat /debugfs/tracing/set_graph_function
blk_unplug
sys_read

Using the '>' will clear out the function and write anew:

# echo sys_write > /debug/tracing/set_graph_function
# cat /debug/tracing/set_graph_function
sys_write

Note, if you have function graph running while doing this, the small
time between clearing it and updating it will cause the graph to
record all functions. This should not be an issue because after
it sets the filter, only those functions will be recorded from then on.
If you need to only record a particular function then set this
file first before starting the function graph tracer. In the future
this side effect may be corrected.

The set_graph_function file is similar to the set_ftrace_filter but
it does not take wild cards nor does it allow for more than one
function to be set with a single write. There is no technical reason why
this is the case, I just do not have the time yet to implement that.

Note, dynamic ftrace must be enabled for this to appear because it
uses the dynamic ftrace records to match the name to the mcount
call sites.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
e49dc19c6a19ea112fcb94b7c62ec62cdd5c08aa 03-Dec-2008 Steven Rostedt <srostedt@redhat.com> ftrace: function graph return for function entry

Impact: feature, let entry function decide to trace or not

This patch lets the graph tracer entry function decide if the tracing
should be done at the end as well. This requires all function graph
entry functions return 1 if it should trace, or 0 if the return should
not be traced.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
a5e25883a445dce94a087ca479b21a5959cd5c18 02-Dec-2008 Steven Rostedt <srostedt@redhat.com> ftrace: replace raw_local_irq_save with local_irq_save

Impact: fix for lockdep and ftrace

The raw_local_irq_save/restore confuses lockdep. This patch
converts them to the local_irq_save/restore variants.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
c7425acb42fff1e723b05fbf4ea11e9a455d95dc 28-Nov-2008 Török Edwin <edwintorok@gmail.com> tracing, alpha: fix build: add missing #ifdef CONFIG_STACKTRACE

There are architectures that still have no stacktrace support.

Signed-off-by: Török Edwin <edwintorok@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
660c7f9be96321fc80026d76411bd15e6f418a72 26-Nov-2008 Steven Rostedt <rostedt@goodmis.org> ftrace: add thread comm to function graph tracer

Impact: enhancement to function graph tracer

Export the trace_find_cmdline so the function graph tracer can
use it to print the comms of the threads.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
287b6e68ca7209caec40b2f44f837c580a413bae 26-Nov-2008 Frederic Weisbecker <fweisbec@gmail.com> tracing/function-return-tracer: set a more human readable output

Impact: feature

This patch sets a C-like output for the function graph tracing.
For this aim, we now call two handler for each function: one on the entry
and one other on return. This way we can draw a well-ordered call stack.

The pid of the previous trace is loosely stored to be compared against
the one of the current trace to see if there were a context switch.

Without this little feature, the call tree would seem broken at
some locations.
We could use the sched_tracer to capture these sched_events but this
way of processing is much more simpler.

2 spaces have been chosen for indentation to fit the screen while deep
calls. The time of execution in nanosecs is printed just after closed
braces, it seems more easy this way to find the corresponding function.
If the time was printed as a first column, it would be not so easy to
find the corresponding function if it is called on a deep depth.

I plan to output the return value but on 32 bits CPU, the return value
can be 32 or 64, and its difficult to guess on which case we are.
I don't know what would be the better solution on X86-32: only print
eax (low-part) or even edx (high-part).

Actually it's thee same problem when a function return a 8 bits value, the
high part of eax could contain junk values...

Here is an example of trace:

sys_read() {
fget_light() {
} 526
vfs_read() {
rw_verify_area() {
security_file_permission() {
cap_file_permission() {
} 519
} 1564
} 2640
do_sync_read() {
pipe_read() {
__might_sleep() {
} 511
pipe_wait() {
prepare_to_wait() {
} 760
deactivate_task() {
dequeue_task() {
dequeue_task_fair() {
dequeue_entity() {
update_curr() {
update_min_vruntime() {
} 504
} 1587
clear_buddies() {
} 512
add_cfs_task_weight() {
} 519
update_min_vruntime() {
} 511
} 5602
dequeue_entity() {
update_curr() {
update_min_vruntime() {
} 496
} 1631
clear_buddies() {
} 496
update_min_vruntime() {
} 527
} 4580
hrtick_update() {
hrtick_start_fair() {
} 488
} 1489
} 13700
} 14949
} 16016
msecs_to_jiffies() {
} 496
put_prev_task_fair() {
} 504
pick_next_task_fair() {
} 489
pick_next_task_rt() {
} 496
pick_next_task_fair() {
} 489
pick_next_task_idle() {
} 489

------------8<---------- thread 4 ------------8<----------

finish_task_switch() {
} 1203
do_softirq() {
__do_softirq() {
__local_bh_disable() {
} 669
rcu_process_callbacks() {
__rcu_process_callbacks() {
cpu_quiet() {
rcu_start_batch() {
} 503
} 1647
} 3128
__rcu_process_callbacks() {
} 542
} 5362
_local_bh_enable() {
} 587
} 8880
} 9986
kthread_should_stop() {
} 669
deactivate_task() {
dequeue_task() {
dequeue_task_fair() {
dequeue_entity() {
update_curr() {
calc_delta_mine() {
} 511
update_min_vruntime() {
} 511
} 2813

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
fb52607afcd0629776f1dc9e657647ceae81dd50 25-Nov-2008 Frederic Weisbecker <fweisbec@gmail.com> tracing/function-return-tracer: change the name into function-graph-tracer

Impact: cleanup

This patch changes the name of the "return function tracer" into
function-graph-tracer which is a more suitable name for a tracing
which makes one able to retrieve the ordered call stack during
the code flow.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
8bba1bf5e2434c83f2fe8b1422604ace9bbe4cb8 25-Nov-2008 Markus Metzger <markus.t.metzger@intel.com> x86, ftrace: call trace->open() before stopping tracing; add trace->print_header()

Add a callback to allow an ftrace plug-in to write its own header.

Move the call to trace->open() up a few lines.

The changes are required by the BTS ftrace plug-in.

Signed-off-by: Markus Metzger <markus.t.metzger@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
e38da59269be8c0196d16dff1be5bb26076afc6a 23-Nov-2008 Török Edwin <edwintorok@gmail.com> tracing/stack-tracer: avoid races accessing file

Impact: fix race

vma->vm_file reference is only stable while holding the mmap_sem,
so move usage of it to within the critical section.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
cffa10aecb6891f090a4d53a075bc40c082c45fc 22-Nov-2008 Török Edwin <edwintorok@gmail.com> tracing/stack-tracer: fix locking and refcounts

Impact: fix refcounting/object-access bug

Hold mmap_sem while looking up/accessing vma.
Hold the RCU lock while using the task we looked up.

Signed-off-by: Török Edwin <edwintorok@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
8d7c6a96164651dbbab449ef0b5c20ae1f76a3a1 22-Nov-2008 Török Edwin <edwintorok@gmail.com> tracing/stack-tracer: fix style issues

Impact: cleanup

Signed-off-by: Török Edwin <edwintorok@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
69bb54ec05f57da7f6fac2cec0820cbc970df20f 21-Nov-2008 Steven Rostedt <srostedt@redhat.com> ftrace: add ftrace_off_permanent

Impact: add new API to disable all of ftrace on anomalies

It case of a serious anomaly being detected (like something caught by
lockdep) it is a good idea to disable all tracing immediately, without
grabing any locks.

This patch adds ftrace_off_permanent that disables the tracers, function
tracing and ring buffers without a way to enable them again. This should
only be used when something serious has been detected.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
b54d3de9f3b8956653b06f1a32e9f9321c6d9027 22-Nov-2008 Török Edwin <edwintorok@gmail.com> tracing: identify which executable object the userspace address belongs to

Impact: modify+improve the userstacktrace tracing visualization feature

Store thread group leader id, and use it to lookup the address in the
process's map. We could have looked up the address on thread's map,
but the thread might not exist by the time we are called. The process
might not exist either, but if you are reading trace_pipe, that is
unlikely.

Example usage:

mount -t debugfs nodev /sys/kernel/debug
cd /sys/kernel/debug/tracing
echo userstacktrace >iter_ctrl
echo sym-userobj >iter_ctrl
echo sched_switch >current_tracer
echo 1 >tracing_enabled
cat trace_pipe >/tmp/trace&
.... run application ...
echo 0 >tracing_enabled
cat /tmp/trace

You'll see stack entries like:

/lib/libpthread-2.7.so[+0xd370]

You can convert them to function/line using:

addr2line -fie /lib/libpthread-2.7.so 0xd370

Or:

addr2line -fie /usr/lib/debug/libpthread-2.7.so 0xd370

For non-PIC/PIE executables this won't work:

a.out[+0x73b]

You need to run the following: addr2line -fie a.out 0x40073b
(where 0x400000 is the default load address of a.out)

Signed-off-by: Török Edwin <edwintorok@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
02b67518e2b1c490787dac7f35e1204e74fe21ba 22-Nov-2008 Török Edwin <edwintorok@gmail.com> tracing: add support for userspace stacktraces in tracing/iter_ctrl

Impact: add new (default-off) tracing visualization feature

Usage example:

mount -t debugfs nodev /sys/kernel/debug
cd /sys/kernel/debug/tracing
echo userstacktrace >iter_ctrl
echo sched_switch >current_tracer
echo 1 >tracing_enabled
.... run application ...
echo 0 >tracing_enabled

Then read one of 'trace','latency_trace','trace_pipe'.

To get the best output you can compile your userspace programs with
frame pointers (at least glibc + the app you are tracing).

Signed-off-by: Török Edwin <edwintorok@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
86fa2f60674540df0b34f5c547ed0c1cf3a8f212 19-Nov-2008 Ingo Molnar <mingo@elte.hu> ftrace: fix selftest locking

Impact: fix self-test boot crash

Self-test failure forgot to re-lock the BKL - crashing the next
initcall:

Testing tracer irqsoff: .. no entries found ..FAILED!
initcall init_irqsoff_tracer+0x0/0x11 returned 0 after 3906 usecs
calling init_mmio_trace+0x0/0xf @ 1
------------[ cut here ]------------
Kernel BUG at c0c0a915 [verbose debug info unavailable]
invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
last sysfs file:

Pid: 1, comm: swapper Not tainted (2.6.28-rc5-tip #53704)
EIP: 0060:[<c0c0a915>] EFLAGS: 00010286 CPU: 1
EIP is at unlock_kernel+0x10/0x2b
EAX: ffffffff EBX: 00000000 ECX: 00000000 EDX: f7030000
ESI: c12da19c EDI: 00000000 EBP: f7039f54 ESP: f7039f54
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
Process swapper (pid: 1, ti=f7038000 task=f7030000 task.ti=f7038000)
Stack:
f7039f6c c0164d30 c013fed8 a7d8d7b4 00000000 00000000 f7039f74 c12fb78a
f7039fd0 c0101132 c12fb77d 00000000 6f727200 6f632072 2d206564 c1002031
0000000f f7039fa2 f7039fb0 3531b171 00000000 00000000 0000002f c12ca480
Call Trace:
[<c0164d30>] ? register_tracer+0x66/0x13f
[<c013fed8>] ? ktime_get+0x19/0x1b
[<c12fb78a>] ? init_mmio_trace+0xd/0xf
[<c0101132>] ? do_one_initcall+0x4a/0x111
[<c12fb77d>] ? init_mmio_trace+0x0/0xf
[<c015c7e6>] ? init_irq_proc+0x46/0x59
[<c12e851d>] ? kernel_init+0x104/0x152
[<c12e8419>] ? kernel_init+0x0/0x152
[<c01038b7>] ? kernel_thread_helper+0x7/0x10
Code: 58 14 43 75 0a b8 00 9b 2d c1 e8 51 43 7a ff 64 a1 00 a0 37 c1 89 58 14 5b 5d c3 55 64 8b 15 00 a0 37 c1 83 7a 14 00 89 e5 79 04 <0f> 0b eb fe 8b 42 14 48 85 c0 89 42 14 79 0a b8 00 9b 2d c1 e8
EIP: [<c0c0a915>] unlock_kernel+0x10/0x2b SS:ESP 0068:f7039f54
---[ end trace a7919e7f17c0a725 ]---
Kernel panic - not syncing: Attempted to kill init!

So clean up the flow a bit.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
a22506347d038a66506c6f57e9b97104128e280d 18-Nov-2008 Heiko Carstens <heiko.carstens@de.ibm.com> ftrace: preemptoff selftest not working

Impact: fix preemptoff and preemptirqsoff tracer self-tests

I was wondering why the preemptoff and preemptirqsoff tracer selftests
don't work on s390. After all its just that they get called from
non-preemptible context:

kernel_init() will execute all initcalls, however the first line in
kernel_init() is lock_kernel(), which causes the preempt_count to be
increased. Any later calls to add_preempt_count() (especially those
from the selftests) will therefore not result in a call to
trace_preempt_off() since the check below in add_preempt_count()
will be false:

if (preempt_count() == val)
trace_preempt_off(CALLER_ADDR0, get_parent_ip(CALLER_ADDR1));

Hence the trace buffer will be empty.

Fix this by releasing the BKL during the self-tests.

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
0bb943c7a2136716757a263f604d26309fd98042 14-Nov-2008 Julia Lawall <julia@diku.dk> tracing: kernel/trace/trace.c: introduce missing kfree()

Impact: fix memory leak

Error handling code following a kzalloc should free the allocated data.

The semantic match that finds the problem is as follows:
(http://www.emn.fr/x-info/coccinelle/)

// <smpl>
@r exists@
local idexpression x;
statement S;
expression E;
identifier f,l;
position p1,p2;
expression *ptr != NULL;
@@

(
if ((x@p1 = \(kmalloc\|kzalloc\|kcalloc\)(...)) == NULL) S
|
x@p1 = \(kmalloc\|kzalloc\|kcalloc\)(...);
...
if (x == NULL) S
)
<... when != x
when != if (...) { <+...x...+> }
x->f = E
...>
(
return \(0\|<+...x...+>\|ptr\);
|
return@p2 ...;
)

@script:python@
p1 << r.p1;
p2 << r.p2;
@@

print "* file: %s kmalloc %s return %s" % (p1[0].file,p1[0].line,p2[0].line)
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
0231022cc32d5f2e7f3c06b75691dda0ad6aec33 17-Nov-2008 Frederic Weisbecker <fweisbec@gmail.com> tracing/function-return-tracer: add the overrun field

Impact: help to find the better depth of trace

We decided to arbitrary define the depth of function return trace as
"20". Perhaps this is not enough. To help finding an optimal depth, we
measure now the overrun: the number of functions that have been missed
for the current thread. By default this is not displayed, we have to
do set a particular flag on the return tracer: echo overrun >
/debug/tracing/trace_options And the overrun will be printed on the
right.

As the trace shows below, the current 20 depth is not enough.

update_wall_time+0x37f/0x8c0 -> update_xtime_cache (345 ns) (Overruns: 2838)
update_wall_time+0x384/0x8c0 -> clocksource_get_next (1141 ns) (Overruns: 2838)
do_timer+0x23/0x100 -> update_wall_time (3882 ns) (Overruns: 2838)
tick_do_update_jiffies64+0xbf/0x160 -> do_timer (5339 ns) (Overruns: 2838)
tick_sched_timer+0x6a/0xf0 -> tick_do_update_jiffies64 (7209 ns) (Overruns: 2838)
vgacon_set_cursor_size+0x98/0x120 -> native_io_delay (2613 ns) (Overruns: 274)
vgacon_cursor+0x16e/0x1d0 -> vgacon_set_cursor_size (33151 ns) (Overruns: 274)
set_cursor+0x5f/0x80 -> vgacon_cursor (36432 ns) (Overruns: 274)
con_flush_chars+0x34/0x40 -> set_cursor (38790 ns) (Overruns: 274)
release_console_sem+0x1ec/0x230 -> up (721 ns) (Overruns: 274)
release_console_sem+0x225/0x230 -> wake_up_klogd (316 ns) (Overruns: 274)
con_flush_chars+0x39/0x40 -> release_console_sem (2996 ns) (Overruns: 274)
con_write+0x22/0x30 -> con_flush_chars (46067 ns) (Overruns: 274)
n_tty_write+0x1cc/0x360 -> con_write (292670 ns) (Overruns: 274)
smp_apic_timer_interrupt+0x2a/0x90 -> native_apic_mem_write (330 ns) (Overruns: 274)
irq_enter+0x17/0x70 -> idle_cpu (413 ns) (Overruns: 274)
smp_apic_timer_interrupt+0x2f/0x90 -> irq_enter (1525 ns) (Overruns: 274)
ktime_get_ts+0x40/0x70 -> getnstimeofday (465 ns) (Overruns: 274)
ktime_get_ts+0x60/0x70 -> set_normalized_timespec (436 ns) (Overruns: 274)
ktime_get+0x16/0x30 -> ktime_get_ts (2501 ns) (Overruns: 274)
hrtimer_interrupt+0x77/0x1a0 -> ktime_get (3439 ns) (Overruns: 274)

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
adf9f19574334c9a29a2bc956009fcac7edf1a6b 17-Nov-2008 Frederic Weisbecker <fweisbec@gmail.com> tracing/ftrace: implement a set_flag callback for tracers

Impact: give a way to send specific messages to tracers

The current implementation of tracing uses some flags to control the
output of general tracers. But we have no way to implement custom
flags handling for a specific tracer. This patch proposes a new
callback for the struct tracer which called set_flag and a structure
that represents a 32 bits variable flag.

A tracer can implement a struct tracer_flags on which it puts the
initial value of the flag integer. Than it can place a range of flags
with their name and their flag mask on the flag integer. The structure
that implement a single flag is called struct tracer_opt.

These custom flags will be available through the trace_options file
like the general tracing flags. Changing their value is done like the
other general flags. For example if you have a flag that calls "foo",
you can activate it by writing "foo" or "nofoo" on trace_options.

Note that the set_flag callback is optional and is only needed if you
want the flags changing to be signaled to your tracer and let it to
accept or refuse their assignment.

V2: Some arrangements in coding style....

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
0c726da983de0704254250ef6495ca152e7abcca 16-Nov-2008 Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> tracing: branch tracer, fix writing to trace/trace_options

Impact: fix trace_options behavior

writing to trace/trace_options use the index of the array
to find the value of the flag. With branch tracer flag
defined conditionally, this breaks writing to trace_options
with branch tracer disabled.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
1c80025a49855b12fa09bb6db71820e3367b1369 16-Nov-2008 Frederic Weisbecker <fweisbec@gmail.com> tracing/ftrace: change the type of the init() callback

Impact: extend the ->init() method with the ability to fail

This bring a way to know if the initialization of a tracer successed.
A tracer must return 0 on success and a traditional error (ie:
-ENOMEM) if it fails.

If a tracer fails to init, it is free to print a detailed warn. The
tracing api will not and switch to a new tracer will just return the
error from the init callback.

Note: this will be used for the return tracer.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
e6e7a65aabdb696cf05a56cfd495c49a11fd4cde 16-Nov-2008 Frederic Weisbecker <fweisbec@gmail.com> tracing/ftrace: fix unexpected -EINVAL when longest tracer name is set

Impact: fix confusing write() -EINVAL when changing the tracer

The following commit d9e540762f5cdd89f24e518ad1fd31142d0b9726 remade
alive the bug which made the set of a new tracer returning -EINVAL if
this is the longest name of tracer. This patch corrects it.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
d51ad7ac48f991c4a8834485727efa99a691cb87 15-Nov-2008 Steven Rostedt <srostedt@redhat.com> ftrace: replace raw_local_irq_save with local_irq_save

Impact: fix lockdep disabling itself when function tracing is enabled

The raw_local_irq_saves used in ftrace is causing problems with
lockdep. (it thinks the irq flags are out of sync and disables
itself with a warning)

The raw ops here are not needed, and the normal local_irq_save is fine.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
b6dff3ec5e116e3af6f537d4caedcad6b9e5082a 14-Nov-2008 David Howells <dhowells@redhat.com> CRED: Separate task security context from task_struct

Separate the task security context from task_struct. At this point, the
security data is temporarily embedded in the task_struct with two pointers
pointing to it.

Note that the Alpha arch is altered as it refers to (E)UID and (E)GID in
entry.S via asm-offsets.

With comment fixes Signed-off-by: Marc Dionne <marc.c.dionne@gmail.com>

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: James Morris <jmorris@namei.org>
Acked-by: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: James Morris <jmorris@namei.org>
12ef7d448613ead2babd41c3ebfa1fe03c20edef 12-Nov-2008 Steven Rostedt <rostedt@goodmis.org> ftrace: CPU buffer start annotation clean ups

Impact: better handling of CPU buffer start annotation

Because of the confusion with the per CPU buffers wrapping where
one CPU might be more active at the end of the trace than the other
CPUs causing that one CPU to have a shorter history. Kernel
developers were confused by the "missing" data of that one CPU
at the beginning of the trace output. An annotation was added to
the trace output to show that the buffer had started:

# tracer: function
#
# TASK-PID CPU# TIMESTAMP FUNCTION
# | | | | |
##### CPU 3 buffer started ####
<idle>-0 [003] 158.192959: smp_apic_timer_interrupt
[...]
<idle>-0 [003] 161.556520: default_idle
##### CPU 1 buffer started ####
<idle>-0 [001] 161.592494: hrtimer_force_reprogram
[etc]

But this annotation gets a bit messy when tracers do not fill the
buffers. This patch does a couple of things:

One) it adds a flag to trace_options to disable these annotations

Two) it does not annotate if the tracer did not overflow its buffer.

This makes the output much cleaner.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
ee6bce52276c0717ed3e63296e5d9465d339e923 12-Nov-2008 Steven Rostedt <rostedt@goodmis.org> ftrace: rename iter_ctrl to trace_options

Impact: rename file /debug/tracing/iter_ctrl to /debug/tracing/trace_options

The original ftrace had a file called "iter_ctrl" that would control
the way the output was iterated. But this file grew into a catch all
for different trace options. This patch renames the file from iter_ctrl
to trace_options to reflect this change.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
1696b2b0f44a8d42f3e6b1ea90c21790871c04d9 13-Nov-2008 Steven Rostedt <rostedt@goodmis.org> ftrace: show buffer size in kilobytes

Impact: change the units of buffer_size_kb to kilobytes

This patch changes the units of the buffer_size_kb file to kilobytes.
Reading and writing to the file uses kilobytes as units. To help
users to know what units are used, the output of the file now
looks like:

# cat /debug/tracing/buffer_size_kb
1408

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
a94c80e78bc9f4493ffc25a02d5d7bcd93c399d0 12-Nov-2008 Steven Rostedt <rostedt@goodmis.org> ftrace: rename trace_entries to buffer_size_kb

Impact: rename of debugfs file trace_entries to buffer_size_kb

The original ftrace had fixed size entries, and the number of entries
was shown and modified via the file called trace_entries. By converting
to the unified trace buffer, we now allow for variable size entries
which makes the meaning of trace_entries pointless.

Since trace_size might be confused to the size of the trace, this patch
names it "buffer_size_kb" (thanks to Arjan van de Ven for this idea).

[ mingo@elte.hu: changed from buffer_size to buffer_size_kb ]

( Note, the units are still bytes - the next patch changes that,
to keep the wide rename patch separate from the unit-change patch. )

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
9f029e83e968e5661d7be045bbcb620dbb909938 12-Nov-2008 Steven Rostedt <srostedt@redhat.com> ftrace: rename unlikely iter_ctrl to branch

Impact: rename of iter_ctrl unlikely to branch

The unlikely name is ugly. This patch converts the iter_ctrl command
"unlikely" and "nounlikely" to "branch" and "nobranch" respectively.

It also renames a lot of internal functions to use "branch" instead
of "unlikely".

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2ed84eeb8808cf3c9f039213ca137ffd7d753f0e 12-Nov-2008 Steven Rostedt <srostedt@redhat.com> trace: rename unlikely profiler to branch profiler

Impact: name change of unlikely tracer and profiler

Ingo Molnar suggested changing the config from UNLIKELY_PROFILE
to BRANCH_PROFILING. I never did like the "unlikely" name so I
went one step farther, and renamed all the unlikely configurations
to a "BRANCH" variant.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
68d119f0a66f7e3663304343b072e56a2693446b 12-Nov-2008 Ingo Molnar <mingo@elte.hu> tracing: finetune branch-tracer output

Steve suggested the to change the output from this:

> bash-3471 [003] 357.014755: [ MISS ] sched_info_dequeued:sched_stats.h:177
> bash-3471 [003] 357.014756: [ .... ] update_curr:sched_fair.c:489
> bash-3471 [003] 357.014758: [ .... ] calc_delta_fair:sched_fair.c:411

to this:

> bash-3471 [003] 357.014755: [ MISS ] sched_info_dequeued:sched_stats.h:177
> bash-3471 [003] 357.014756: [ ok ] update_curr:sched_fair.c:489
> bash-3471 [003] 357.014758: [ ok ] calc_delta_fair:sched_fair.c:411

as it makes it clearer to the user what it means exactly.

Acked-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
f88c4ae9f8c3939bee4337c75c7a673b5de7a8a7 12-Nov-2008 Ingo Molnar <mingo@elte.hu> tracing: branch tracer, tweak output

Impact: modify the tracer output, to make it a bit easier to read

Change the output from:

> bash-3471 [003] 357.014755: [INCORRECT] sched_info_dequeued:sched_stats.h:177
> bash-3471 [003] 357.014756: [correct] update_curr:sched_fair.c:489
> bash-3471 [003] 357.014758: [correct] calc_delta_fair:sched_fair.c:411

to:

> bash-3471 [003] 357.014755: [ MISS ] sched_info_dequeued:sched_stats.h:177
> bash-3471 [003] 357.014756: [ .... ] update_curr:sched_fair.c:489
> bash-3471 [003] 357.014758: [ .... ] calc_delta_fair:sched_fair.c:411

it's good to have fields aligned vertically, and the only important
information is a prediction miss, so display only that information.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
52f232cb720a7babb752849cbc2cab2d24021209 12-Nov-2008 Steven Rostedt <rostedt@goodmis.org> tracing: likely/unlikely branch annotation tracer

Impact: new likely/unlikely branch tracer

This patch adds a way to record the instances of the likely() and unlikely()
branch condition annotations.

When "unlikely" is set in /debugfs/tracing/iter_ctrl the unlikely conditions
will be added to any of the ftrace tracers. The change takes effect when
a new tracer is passed into the current_tracer file.

For example:

bash-3471 [003] 357.014755: [INCORRECT] sched_info_dequeued:sched_stats.h:177
bash-3471 [003] 357.014756: [correct] update_curr:sched_fair.c:489
bash-3471 [003] 357.014758: [correct] calc_delta_fair:sched_fair.c:411
bash-3471 [003] 357.014759: [correct] account_group_exec_runtime:sched_stats.h:356
bash-3471 [003] 357.014761: [correct] update_curr:sched_fair.c:489
bash-3471 [003] 357.014763: [INCORRECT] calc_delta_fair:sched_fair.c:411
bash-3471 [003] 357.014765: [correct] calc_delta_mine:sched.c:1279

Which shows the normal tracer heading, as well as whether the condition was
correct "[correct]" or was mistaken "[INCORRECT]", followed by the function,
file name and line number.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
15e6cb3673ea6277999642802406a764b49391b0 11-Nov-2008 Frederic Weisbecker <fweisbec@gmail.com> tracing: add a tracer to catch execution time of kernel functions

Impact: add new tracing plugin which can trace full (entry+exit) function calls

This tracer uses the low level function return ftrace plugin to
measure the execution time of the kernel functions.

The first field is the caller of the function, the second is the
measured function, and the last one is the execution time in
nanoseconds.

- v3:

- HAVE_FUNCTION_RET_TRACER have been added. Each arch that support ftrace return
should enable it.
- ftrace_return_stub becomes ftrace_stub.
- CONFIG_FUNCTION_RET_TRACER depends now on CONFIG_FUNCTION_TRACER
- Return traces printing can be used for other tracers on trace.c
- Adapt to the new tracing API (no more ctrl_update callback)
- Correct the check of "disabled" during insertion.
- Minor changes...

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
5aa1ba6a6c710e747838a22d798ac97a8b248745 11-Nov-2008 Steven Rostedt <srostedt@redhat.com> ftrace: prevent ftrace_special from recursion

Impact: stop ftrace_special from recursion

The ftrace_special is used to help debug areas of the kernel.
Because of this, if it is put in certain locations, the fact that
it allows recursion can become a problem if the kernel developer
using does not realize that.

This patch changes ftrace_special to not allow recursion into itself
to make it more robust.

It also changes from preempt disable interrupts disable to prevent
any loss of trace entries.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
bf5e6519b85b3853f2d0bb4f17a4e2eaeffeb574 11-Nov-2008 Steven Rostedt <srostedt@redhat.com> ftrace: disable tracing on resize

Impact: fix for bug on resize

This patch addresses the bug found here:

http://bugzilla.kernel.org/show_bug.cgi?id=11996

When ftrace converted to the new unified trace buffer, the resizing of
the buffer was not protected as much as it was originally. If tracing
is performed while the resize occurs, then the buffer can be corrupted.

This patch disables all ftrace buffer modifications before a resize
takes place.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
a309720c876d7ad2e224bfd1982c92ae4364c82e 08-Nov-2008 Steven Rostedt <srostedt@redhat.com> ftrace: display start of CPU buffer in trace output

Impact: change in trace output

Because the trace buffers are per cpu ring buffers, the start of
the trace can be confusing. If one CPU is very active at the
end of the trace, its history will not go as far back as the
other CPU traces. This means that output for a particular CPU
may not appear for the first part of a trace.

To help annotate what is happening, and to prevent any more
confusion, this patch adds a line that annotates the start of
a CPU buffer output.

For example:

automount-3495 [001] 184.596443: dnotify_parent <-vfs_write
[...]
automount-3495 [001] 184.596449: dput <-path_put
automount-3496 [002] 184.596450: down_read_trylock <-do_page_fault
[...]
sshd-3497 [001] 184.597069: up_read <-do_page_fault
<idle>-0 [000] 184.597074: __exit_idle <-exit_idle
[...]
automount-3496 [002] 184.597257: filemap_fault <-__do_fault
<idle>-0 [003] 184.597261: exit_idle <-smp_apic_timer_interrupt

Note, parsers of a trace output should always ignore any lines that
start with a '#'.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
c76f06945be50564f925799ddfb6235ee4c26aa0 08-Nov-2008 Steven Rostedt <srostedt@redhat.com> ftrace: remove trace array ctrl

Impact: remove obsolete variable in trace_array structure

With the new start / stop method of ftrace, the ctrl variable
in the trace_array structure is now obsolete. Remove it.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
bbf5b1a0cecb56de6236db8b01c5bfb7ab8ba8b2 08-Nov-2008 Steven Rostedt <srostedt@redhat.com> ftrace: remove ctrl_update method

Impact: Remove the ctrl_update tracer method

With the new quick start/stop method of tracing, the ctrl_update
method is out of date.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
49833fc232bd6a5076496994d855f601354501d7 08-Nov-2008 Steven Rostedt <srostedt@redhat.com> ftrace: enable trace_printk by default

Impact: have the ftrace_printk enabled on startup

It is confusing to have to "echo trace_printk > /debug/tracing/iter_ctrl"
after adding ftrace_printk in the kernel.

Currently the trace_printk is set to off by default. ftrace_printk
should only be in open kernel code when used for debugging, and thus
it should be enabled by default.

It may also be used to record data within a tracer, but those ftrace_printks
should be within wrappers that are either enabled by trace_points or
have a variable protecting the code path from being entered when the
tracer is disabled.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
9036990d462e09366f7297a2d1da6582c3e6b1d3 05-Nov-2008 Steven Rostedt <srostedt@redhat.com> ftrace: restructure tracing start/stop infrastructure

Impact: change where tracing is started up and stopped

Currently, when a new tracer is selected via echo'ing a tracer name into
the current_tracer file, the startup is only done if tracing_enabled is
set to one. If tracing_enabled is changed to zero (by echo'ing 0 into
the tracing_enabled file) a full shutdown is performed.

The full startup and shutdown of a tracer can be expensive and the
user can lose out traces when echo'ing in 0 to the tracing_enabled file,
because the process takes too long. There can also be places that
the user would like to start and stop the tracer several times and
doing the full startup and shutdown of a tracer might be too expensive.

This patch performs the full startup and shutdown when a tracer is
selected. It also adds a way to do a quick start or stop of a tracer.
The quick version is just a flag that prevents the tracing from
taking place, but the overhead of the code is still there.

For example, the startup of a tracer may enable tracepoints, or enable
the function tracer. The stop and start will just set a flag to
have the tracer ignore the calls when the tracepoint or function trace
is called. The overhead of the tracer may still be present when
the tracer is stopped, but no tracing will occur. Setting the tracer
to the 'nop' tracer (or any other tracer) will perform the shutdown
of the tracer which will disable the tracepoint or disable the
function tracer.

The tracing_enabled file will simply start or stop tracing.

This change is all internal. The end result for the user should be the same
as before. If tracing_enabled is not set, no trace will happen.
If tracing_enabled is set, then the trace will happen. The tracing_enabled
variable is static between tracers. Enabling tracing_enabled and
going to another tracer will keep tracing_enabled enabled. Same
is true with disabling tracing_enabled.

This patch will now provide a fast start/stop method to the users
for enabling or disabling tracing.

Note: There were two methods to the struct tracer that were never
used: The methods start and stop. These were to be used as a hook
to the reading of the trace output, but ended up not being
necessary. These two methods are now used to enable the start
and stop of each tracer, in case the tracer needs to do more than
just not write into the buffer. For example, the irqsoff tracer
must stop recording max latencies when tracing is stopped.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
0f04870148ecb825133bc2733f473b1c5773ac0b 05-Nov-2008 Steven Rostedt <srostedt@redhat.com> ftrace: soft tracing stop and start

Impact: add way to quickly start stop tracing from the kernel

This patch adds a soft stop and start to the trace. This simply
disables function tracing via the ftrace_disabled flag, and
disables the trace buffers to prevent recording. The tracing
code may still be executed, but the trace will not be recorded.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
072ba49838b42c873c496d72c91bb237914cf3b6 26-Oct-2008 Eric Anholt <eric@anholt.net> ftrace: fix breakage in bin_fmt results

In 777e208d40d0953efc6fb4ab58590da3f7d8f02d we changed from outputting
field->cpu (a char) to iter->cpu (unsigned int), increasing the resulting
structure size by 3 bytes.

Signed-off-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
d7ad44b697c9d13e445ddc7d16f736fbac333249 31-Oct-2008 Frederic Weisbecker <fweisbec@gmail.com> tracing/fastboot: use sched switch tracer from boot tracer

Impact: enhance boot trace output with scheduling events

Use the sched_switch tracer from the boot tracer.

We also can trace schedule events inside the initcalls.
Sched tracing is disabled after the initcall has finished and
then reenabled before the next one is started.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
b2a866f9344cb30d7ddf5d67b5b8393daf8bef07 04-Nov-2008 Steven Rostedt <rostedt@goodmis.org> ftrace: function tracer with irqs disabled

Impact: disable interrupts during trace entry creation (as opposed to preempt)

To help with performance, I set the ftracer to not disable interrupts,
and only to disable preemption. If an interrupt occurred, it would not
be traced, because the function tracer protects itself from recursion.
This may be faster, but the trace output might miss some traces.

This patch makes the fuction trace disable interrupts, but it also
adds a runtime feature to disable preemption instead. It does this by
having two different tracer functions. When the function tracer is
enabled, it will check to see which version is requested (irqs disabled
or preemption disabled). Then it will use the corresponding function
as the tracer.

Irq disabling is the default behaviour, but if the user wants better
performance, with the chance of missing traces, then they can choose
the preempt disabled version.

Running hackbench 3 times with the irqs disabled and 3 times with
the preempt disabled function tracer yielded:

tracing type times entries recorded
------------ -------- ----------------
irq disabled 43.393 166433066
43.282 166172618
43.298 166256704

preempt disabled 38.969 159871710
38.943 159972935
39.325 161056510

Average:

irqs disabled: 43.324 166287462
preempt disabled: 39.079 160300385

preempt is 10.8 percent faster than irqs disabled.

I wrote a patch to count function trace recursion and reran hackbench.

With irq disabled: 1,150 times the function tracer did not trace due to
recursion.
with preempt disabled: 5,117,718 times.

The thousand times with irq disabled could be due to NMIs, or simply a case
where it called a function that was not protected by notrace.

But we also see that a large amount of the trace is lost with the
preempt version.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
182e9f5f704ed6b9175142fe8da33c9ce0c52b52 04-Nov-2008 Steven Rostedt <rostedt@goodmis.org> ftrace: insert in the ftrace_preempt_disable()/enable() functions

Impact: use new, consolidated APIs in ftrace plugins

This patch replaces the schedule safe preempt disable code with the
ftrace_preempt_disable() and ftrace_preempt_enable() safe functions.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
b3aa557722b3d5858f14ca559e03461c24125aaf 31-Oct-2008 Steven Rostedt <rostedt@goodmis.org> ftrace: use kretprobe trampoline name to test in output

Impact: ia64+tracing build fix

When a function is kprobed, the return address is set to the
kprobe_trampoline, or something similar. This caused the output
of the trace to look confusing when the parent seemed to be this
"kprobe_trampoline" function.

To fix this, Abhishek Sagar added a test of the instruction pointer
of the parent to see if it matched the kprobe_trampoline. If it
did, the output would print a "[unknown/kretprobe'd]" instead.

Unfortunately, not all archs do this the same way, and the trampoline
function may not be exported, which causes failures in builds.

This patch will compare the name instead of the pointer to see
if it matches. This prevents us from depending on a function from
being exported, and should work on all archs. The worst that can
happen is that an arch might use a different name and then we
go back to the confusing output. At least the arch will still build.

Reported-by: Abhishek Sagar <sagar.abhishek@gmail.com>
Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Tested-by: Abhishek Sagar <sagar.abhishek@gmail.com>
Acked-by: Abhishek Sagar <sagar.abhishek@gmail.com>
c2c80529460095035752bf0ecc1af82c1e0f6e0f 31-Oct-2008 Al Viro <viro@ZenIV.linux.org.uk> tracing, alpha: undefined reference to `save_stack_trace'

Impact: build fix on !stacktrace architectures

only select STACKTRACE on architectures that have STACKTRACE_SUPPORT

... since we also need to ifdef out the guts of ftrace_trace_stack().
We also want to disallow setting TRACE_ITER_STACKTRACE in trace_flags
on such configs, but that can wait.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Acked-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
d9e540762f5cdd89f24e518ad1fd31142d0b9726 01-Nov-2008 Peter Zijlstra <a.p.zijlstra@chello.nl> ftrace: ftrace_dump_on_oops=[tracer]

Impact: add new (optional) debug boot option

In order to facilitate early boot trouble, allow one to specify a tracer
on the kernel boot line.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
a26a2a27396c0a0877aa701f8f92d08ba550a6c9 31-Oct-2008 Steven Rostedt <rostedt@goodmis.org> ftrace: nmi safe code clean ups

Impact: cleanup

This patch cleans up the NMI safe code for dynamic ftrace as suggested
by Andrew Morton.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
9244489a7b69fe0746dc7cb3957f02e05bd1ceb0 24-Oct-2008 Steven Rostedt <rostedt@goodmis.org> ftrace: handle archs that do not support irqs_disabled_flags

Impact: build fix on non-lockdep architectures

Some architectures do not support a way to read the irq flags that
is set from "local_irq_save(flags)" to determine if interrupts were
disabled or enabled. Ftrace uses this information to display to the user
if the trace occurred with interrupts enabled or disabled.

Besides the fact that those archs that do not support this will fail to
compile, unless they fix it, we do not want to have the trace simply
say interrupts were not disabled or they were enabled, without knowing
the real answer.

This patch adds a 'X' in the output to let the user know that the
architecture they are running on does not support a way for the tracer
to determine if interrupts were enabled or disabled. It also lets those
same archs compile with tracing enabled.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
b807c3d0f8e39ed7cbbbe6da162650e305e8de15 30-Oct-2008 Steven Rostedt <rostedt@goodmis.org> ftrace: nmi update statistics

Impact: add more debug info to /debugfs/tracing/dyn_ftrace_total_info

This patch adds dynamic ftrace NMI update statistics to the
/debugfs/tracing/dyn_ftrace_total_info stat file.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
60063a66236c15f5613f91390631e06718689782 28-Oct-2008 Steven Rostedt <rostedt@goodmis.org> ftrace: fix current_tracer error return

The commit (in linux-tip) c2931e05ec5965597cbfb79ad332d4a29aeceb23
( ftrace: return an error when setting a nonexistent tracer )
added useful code that would error when a bad tracer was written into
the current_tracer file.

But this had a bug if the amount written was more than the amount read by
that code. The first iteration would set the tracer correctly, but since
it did not consume the rest of what was written (usually whitespace), the
userspace utility would continue to write what was not consumed. This
second iteration would fail to find a tracer and return -EINVAL. Funny
thing is that the tracer would have already been set.

This patch just consumes all the data that is written to the file.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
21798a84ab383cdac0e7ee3368e0792b718b867d 28-Oct-2008 Frederic Weisbecker <fweisbec@gmail.com> tracing: fix a build error on alpha

Impact: build fix on Alpha

When tracing is enabled, some arch have included <linux/irqflags.h>
on their <asm/system.h> but others like alpha or m68k don't.

Build error on alpha:

kernel/trace/trace.c: In function 'tracing_cpumask_write':
kernel/trace/trace.c:2145: error: implicit declaration of function 'raw_local_irq_disable'
kernel/trace/trace.c:2162: error: implicit declaration of function 'raw_local_irq_enable'

Tested on Alpha through a cross-compiler (should correct a similar issue on m68k).

Reported-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
944ac4259e39801c843a915c3da8194ac9af0440 24-Oct-2008 Steven Rostedt <rostedt@goodmis.org> ftrace: ftrace dump on oops control

Impact: add (default-off) dump-trace-on-oops flag

Currently, ftrace is set up to dump its contents to the console if the
kernel panics or oops. This can be annoying if you have trace data in
the buffers and you experience an oops, but the trace data is old or
static.

Usually when you want ftrace to dump its contents is when you are debugging
your system and you have set up ftrace to trace the events leading to
an oops.

This patch adds a control variable called "ftrace_dump_on_oops" that will
enable the ftrace dump to console on oops. This variable is default off
but a developer can enable it either through the kernel command line
by adding "ftrace_dump_on_oops" or at run time by setting (or disabling)
/proc/sys/kernel/ftrace_dump_on_oops.

v2:

Replaced /** with /* as Randy explained that kernel-doc does
not yet handle variables.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
08f5ac906d2c0faf96d608c54a0b03177376da8d 23-Oct-2008 Steven Rostedt <rostedt@goodmis.org> ftrace: remove ftrace hash

The ftrace hash was used by the ftrace_daemon code. The record ip function
would place the calling address (ip) into the hash. The daemon would later
read the hash and modify that code.

The hash complicates the code. This patch removes it.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
81adbdc029ecc416d56563e7f159100181dd711d 23-Oct-2008 Steven Rostedt <rostedt@goodmis.org> ftrace: only have ftrace_kill atomic

When an anomaly is detected, we need a way to completely disable
ftrace. Right now we have two functions: ftrace_kill and ftrace_kill_atomic.
The ftrace_kill tries to do it in a "nice" way by converting everything
back to a nop.

The "nice" way is dangerous itself, so this patch removes it and only
has the "atomic" version, which is all that is needed.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
606576ce816603d9fe1fb453a88bc6eea16ca709 07-Oct-2008 Steven Rostedt <rostedt@goodmis.org> ftrace: rename FTRACE to FUNCTION_TRACER

Due to confusion between the ftrace infrastructure and the gcc profiling
tracer "ftrace", this patch renames the config options from FTRACE to
FUNCTION_TRACER. The other two names that are offspring from FTRACE
DYNAMIC_FTRACE and FTRACE_MCOUNT_RECORD will stay the same.

This patch was generated mostly by script, and partially by hand.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
ad0a3b68114e8f3c25ac0045b45a2838f23e3b3a 09-Oct-2008 Harvey Harrison <harvey.harrison@gmail.com> trace: add build-time check to avoid overrunning hex buffer

Remove the runtime BUG_ON and change to a compile-time check in
the macro that calls the hex format routine

[Noticed by Joe Perches]
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2fbc474901933c8f0c09b0280dfbb6780cb8bd60 09-Oct-2008 Harvey Harrison <harvey.harrison@gmail.com> ftrace: fix hex output mode of ftrace

Fix the output of ftrace in hex mode as the hi/lo nibbles are output in
reverse order. Without this patch, the output of ftrace is:

raw mode : 6474 0 141531612444 0 140 + 6402 120 S
hex mode : 000091a4 00000000 000000023f1f50c1 00000000 c8 000000b2 00009120 87 ffff00c8 00000035

There is an inversion on ouput hex(6474) is 194a

[based on a patch by Philippe Reynes <tremyfr@yahoo.fr>]
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
c2931e05ec5965597cbfb79ad332d4a29aeceb23 04-Oct-2008 Frederic Weisbecker <fweisbec@gmail.com> ftrace: return an error when setting a nonexistent tracer

When one try to set a nonexistent tracer, no error is returned
as if the name of the tracer was correct.
We should return -EINVAL.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
3ea2e6d71aafe35b8aaf89ed711a283815acfae6 04-Oct-2008 Steven Rostedt <rostedt@goodmis.org> ftrace: make some tracers reentrant

Now that the ring buffer is reentrant, some of the ftrace tracers
(sched_swich, debugging traces) can also be reentrant.

Note: Never make the function tracer reentrant, that can cause
recursion problems all over the kernel. The function tracer
must disable reentrancy.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
38697053fa006411224a1790e2adb8216440ab0f 01-Oct-2008 Steven Rostedt <rostedt@goodmis.org> ftrace: preempt disable over interrupt disable

With the new ring buffer infrastructure in ftrace, I'm trying to make
ftrace a little more light weight.

This patch converts a lot of the local_irq_save/restore into
preempt_disable/enable. The original preempt count in a lot of cases
has to be sent in as a parameter so that it can be recorded correctly.
Some places were recording it incorrectly before anyway.

This is also laying the ground work to make ftrace a little bit
more reentrant, and remove all locking. The function tracers must
still protect from reentrancy.

Note: All the function tracers must be careful when using preempt_disable.
It must do the following:

resched = need_resched();
preempt_disable_notrace();
[...]
if (resched)
preempt_enable_no_resched_notrace();
else
preempt_enable_notrace();

The reason is that if this function traces schedule() itself, the
preempt_enable_notrace() will cause a schedule, which will lead
us into a recursive failure.

If we needed to reschedule before calling preempt_disable, we
should have already scheduled. Since we did not, this is most
likely that we should not and are probably inside a schedule
function.

If resched was not set, we still need to catch the need resched
flag being set when preemption was off and the if case at the
end will catch that for us.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
7104f300c5a69b46dda00d898034dd05c9f21739 01-Oct-2008 Steven Rostedt <rostedt@goodmis.org> ftrace: type cast filter+verifier

The mmiotrace map had a bug that would typecast the entry from
the trace to the wrong type. That is a known danger of C typecasts,
there's absolutely zero checking done on them.

Help that problem a bit by using a GCC extension to implement a
type filter that restricts the types that a trace record can be
cast into, and by adding a dynamic check (in debug mode) to verify
the type of the entry.

This patch adds a macro to assign all entries of ftrace using the type
of the variable and checking the entry id. The typecasts are now done
in the macro for only those types that it knows about, which should
be all the types that are allowed to be read from the tracer.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
797d3712a9dd75c720558612be05f42c031a7bb5 30-Sep-2008 Frederic Weisbecker <fweisbec@gmail.com> tracing/ftrace: adapt mmiotrace to the new type of print_line, fix

Correct the value's type of trace_empty function

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
d769041f865330034131525ee6a7f72eb4af2a24 01-Oct-2008 Steven Rostedt <rostedt@goodmis.org> ring_buffer: implement new locking

The old "lock always" scheme had issues with lockdep, and was not very
efficient anyways.

This patch does a new design to be partially lockless on writes.
Writes will add new entries to the per cpu pages by simply disabling
interrupts. When a write needs to go to another page than it will
grab the lock.

A new "read page" has been added so that the reader can pull out a page
from the ring buffer to read without worrying about the writer writing over
it. This allows us to not take the lock for all reads. The lock is
now only taken when a read needs to go to a new page.

This is far from lockless, and interrupts still need to be disabled,
but it is a step towards a more lockless solution, and it also
solves a lot of the issues that were noticed by the first conversion
of ftrace to the ring buffers.

Note: the ring_buffer_{un}lock API has been removed.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
9ff4b9744c187cae58c3774361ea090addbc4130 29-Sep-2008 Pekka Paalanen <pq@iki.fi> tracing/ftrace: fix pipe breaking

This patch fixes a bug which break the pipe when the seq is empty.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2c4f035f6c3e8fda661eb6105aa51ef07aa71607 29-Sep-2008 Frederic Weisbecker <fweisbec@gmail.com> tracing/ftrace: change the type of the print_line callback

We need a kind of disambiguation when a print_line callback
returns 0.

_There is not enough space to print all the entry.
Please flush the seq and retry.
_I can't handle this type of entry

This patch changes the type of this callback for better information.

Also some changes have been made in this V2.

_ Only relay to default functions after the print_line callback fails.
_ This patch doesn't fix the issue with the broken pipe (see patch 2/4 for that)

Some things are still in discussion:

_ Find better names for the enum print_line_t values
_ Change the type of print_trace_line into boolean.

Patches to change that can be sent later.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Pekka Paalanen <pq@iki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
777e208d40d0953efc6fb4ab58590da3f7d8f02d 30-Sep-2008 Steven Rostedt <rostedt@goodmis.org> ftrace: take advantage of variable length entries

Now that the underlining ring buffer for ftrace now hold variable length
entries, we can take advantage of this by only storing the size of the
actual event into the buffer. This happens to increase the number of
entries in the buffer dramatically.

We can also get rid of the "trace_cont" operation, but I'm keeping that
until we have no more users. Some of the ftrace tracers can now change
their code to adapt to this new feature.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
3928a8a2d98081d1bc3c0a84a2d70e29b90ecf1c 30-Sep-2008 Steven Rostedt <rostedt@goodmis.org> ftrace: make work with new ring buffer

This patch ports ftrace over to the new ring buffer.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
b5ad384e79add1d87fff54070000dadcf218ffab 23-Sep-2008 Frédéric Weisbecker <fweisbec@gmail.com> tracing/ftrace: make tracing suitable to run the boot tracer

The tracing engine have now to be init in early_initcall to set the
boot tracer. Only the debugfs settings will be initialized at
fs_initcall time.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
43a15386c4faf913f7d70a47748c266d6210cd6e 21-Sep-2008 Frédéric Weisbecker <fweisbec@gmail.com> tracing/ftrace: replace none tracer by nop tracer

Replace "none" tracer by the recently created "nop" tracer.
Both are pretty similar except that nop accepts TRACE_PRINT
or TRACE_SPECIAL entries.

And as a consequence, changing the size of the ring buffer now
requires that tracing has already been disabled.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Steven Noonan <steven@uplinklabs.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
5bf9a1ee350a10feb94107de32a203d81fbbe706 16-Sep-2008 Pekka Paalanen <pq@iki.fi> ftrace: inject markers via trace_marker file

Allow a user to inject a marker (TRACE_PRINT entry) into the trace ring
buffer. The related file operations are derived from code by Frédéric
Weisbecker <fweisbec@gmail.com>.

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
fc5e27ae4b45a0619701a83f30d9b7fad7ed9400 16-Sep-2008 Pekka Paalanen <pq@iki.fi> mmiotrace: handle TRACE_PRINT entries

Also make trace_seq_print_cont() non-static, and add a newline if the
seq buffer can't hold all data.

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
801fe40001dfc263848552fb28924b766ed44ea4 16-Sep-2008 Pekka Paalanen <pq@iki.fi> ftrace: add trace_vprintk()

trace_vprintk() for easier implementation of tracer specific *_printk
functions. Add check check for no_tracer, and implement
__ftrace_printk() as a wrapper.

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
45dcd8b8a8ca855591e3ac882d3a7fc255d09d43 16-Sep-2008 Pekka Paalanen <pq@iki.fi> ftrace: move mmiotrace functions out of trace.c

Moves the mmiotrace specific functions from trace.c to
trace_mmiotrace.c. Functions trace_wake_up(), tracing_get_trace_entry(),
and tracing_generic_entry_update() are therefore made available outside
trace.c.

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
80b5e940050c273ba277acbf3a9fbc1d4441e681 04-Sep-2008 Peter Zijlstra <peterz@infradead.org> ftrace: sched_switch: show the wakee's cpu

While profiling the smp behaviour of the scheduler it was needed to know to
which cpu a task got woken.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
f09ce573f57ddc35c67b39e51f34545877b30031 04-Sep-2008 Peter Zijlstra <peterz@infradead.org> ftrace: make ftrace_printk usable with the other tracers

Currently ftrace_printk only works with the ftrace tracer, switch it to an
iter_ctrl setting so we can make us of them with other tracers too.

[rostedt@redhat.com: tweak to the disable condition]

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
5a90f577e5369a84b720ead42e621fcb1b8a8b21 03-Sep-2008 Steven Rostedt <rostedt@goodmis.org> ftrace: print continue index fix

An item in the trace buffer that is bigger than one entry may be split
up using the TRACE_CONT entry. This makes it a virtual single entry.
The current code increments the iterator index even while traversing
TRACE_CONT entries, making it look like the iterator is further than
it actually is.

This patch adds code to not increment the iterator index while skipping
over TRACE_CONT entries.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
652567aa2000f1d4a1fd434382a30d8dd4a7c980 03-Sep-2008 Steven Rostedt <rostedt@goodmis.org> ftrace: binary and not logical for continue test

Peter Zijlstra provided me with a nice brown paper bag while letting me know
that I was doing a logical AND and not a binary one, making a condition
true more often than it should be.

Luckily, a false true is handled by the calling function and no harm is
done. But this needs to be fixed regardless.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
a6168353d1c0b24b7512e14d1c3e47ed69881a23 21-Aug-2008 Michael Ellerman <michael@ellerman.id.au> ftrace: make output nicely spaced for up to 999 cpus

Currently some of the ftrace output goes skewiff if you have more
than 9 cpus, and some if you have more than 99.

Twiddle with the headers and format strings to make up to 999 cpus
display without causing spacing problems.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Acked-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
98a983aad2e5b3dc83a8a761675445cdd8f3e6bd 15-Aug-2008 Frédéric Weisbecker <fweisbec@gmail.com> ftrace: fix some mistakes in error messages

This patch fixes some mistakes on the tracer in warning messages when
debugfs fails to create tracing files.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: srostedt@redhat.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
3f5a54e371ca20b119b73704f6c01b71295c1714 31-Jul-2008 Steven Rostedt <rostedt@goodmis.org> ftrace: dump out ftrace buffers to console on panic

At OLS I had a lot of interest to be able to have the ftrace buffers
dumped on panic. Usually one would expect to uses kexec and examine
the buffers after a new kernel is loaded. But sometimes the resources
do not permit kdump and kexec, so having an option to still see the
sequence of events up to the crash is very advantageous.

This patch adds the option to have the ftrace buffers dumped to the
console in the latency_trace format on a panic. When the option is set,
the default entries per CPU buffer are lowered to 16384, since the writing
to the serial (if that is the console) may take an awful long time
otherwise.

[
Changes since -v1:
Got alpine to send correctly (as well as spell check working).
Removed config option.
Moved the static variables into ftrace_dump itself.
Gave printk a log level.
]

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2f2c99dba2398ef7d9c21f7c793180a50e68b1f0 01-Aug-2008 Steven Rostedt <rostedt@goodmis.org> ftrace: ftrace_printk doc moved

Based on Randy Dunlap's suggestion, the ftrace_printk kernel-doc belongs
with the ftrace_printk macro that should be used. Not with the
__ftrace_printk internal function.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
dd0e545f061f90099a3dcc13aa77e29c6295cf23 01-Aug-2008 Steven Rostedt <rostedt@goodmis.org> ftrace: printk formatting infrastructure

This patch adds a feature that can help kernel developers debug their
code using ftrace.

int ftrace_printk(const char *fmt, ...);

This records into the ftrace buffer using printf formatting. The entry
size in the buffers are still a fixed length. A new type has been added
that allows for more entries to be used for a single recording.

The start of the print is still the same as the other entries.

It returns the number of characters written to the ftrace buffer.

For example:

Having a module with the following code:

static int __init ftrace_print_test(void)
{
ftrace_printk("jiffies are %ld\n", jiffies);
return 0;
}

Gives me:

insmod-5441 3...1 7569us : ftrace_print_test: jiffies are 4296626666

for the latency_trace file and:

insmod-5441 [03] 1959.370498: ftrace_print_test jiffies are 4296626666

for the trace file.

Note: Only the infrastructure should go into the kernel. It is to help
facilitate debugging for other kernel developers. Calls to ftrace_printk
is not intended to be left in the kernel, and should be frowned upon just
like scattering printks around in the code.

But having this easily at your fingertips helps the debugging go faster
and bugs be solved quicker.

Maybe later on, we can hook this with markers and have their printf format
be sucked into ftrace output.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2e2ca155cd2213b4f398031180fb3d399d5b7db9 01-Aug-2008 Steven Rostedt <rostedt@goodmis.org> ftrace: new continue entry - separate out from trace_entry

Some tracers will need to work with more than one entry. In order to do this
the trace_entry structure was split into two fields. One for the start of
all entries, and one to continue an existing entry.

The trace_entry structure now has a "field" entry that consists of the previous
content of the trace_entry, and a "cont" entry that is just a string buffer
the size of the "field" entry.

Thanks to Andrew Morton for suggesting this idea.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
605ccb73f6a1c891a16268b3a2923208fc637958 27-Jul-2008 Andrea Righi <righi.andrea@gmail.com> tracing: remove unused variable

Remove the following warning with CONFIG_TRACING=y:

kernel/trace/trace.c: In function ‘s_next’:
kernel/trace/trace.c:1186: warning: unused variable ‘last_ent’

Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
1986b0cb1671ea39178b4e2b00461109728fc935 24-Jul-2008 Ingo Molnar <mingo@elte.hu> ftrace: remove latency-tracer leftover

remove the :vim=ft=help tag from trace files.

I used them years ago to syntax-highlight traces and forgot about this hack.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
60bc080090e3bf6afa29c62cb25f913706551010 11-Jul-2008 Steven Rostedt <rostedt@goodmis.org> ftrace: separate out the function enabled variable

Currently the function tracer uses the global tracer_enabled variable that
is used to keep track if the tracer is enabled or not. The function tracing
startup needs to be separated out, otherwise the internal happenings of
the tracer startup is also recorded.

This patch creates a ftrace_function_enabled variable to all the starting
of the function traces to happen after everything has been started.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
b5c21b4514b38f450848feb432f7120376d01ffe 11-Jul-2008 Steven Rostedt <rostedt@goodmis.org> ftrace: check proper config for preempt type

There is no CONFIG_PREEMPT_DESKTOP. Use the proper entry CONFIG_PREEMPT.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Cc: Steven Rostedt <srostedt@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
ecea656d1d5e912d2f3d332657ea4a6d8380f891 21-Jun-2008 Abhishek Sagar <sagar.abhishek@gmail.com> ftrace: freeze kprobe'd records

Let records identified as being kprobe'd be marked as "frozen". The trouble
with records which have a kprobe installed on their mcount call-site is
that they don't get updated. So if such a function which is currently being
traced gets its tracing disabled due to a new filter rule (or because it
was added to the notrace list) then it won't be updated and continue being
traced. This patch allows scanning of all frozen records during tracing to
check if they should be traced.

Signed-off-by: Abhishek Sagar <sagar.abhishek@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
20764ff1efb440640353053ec83263e69e1259e0 12-Jun-2008 Jiri Slaby <jirislaby@gmail.com> ftrace: fix printout

Do not print loglevel before "entries of %ld bytes". Move it to the previous
pr_info.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2b1bce1787700768cbc87c8509851c6f49d252dc 09-Jun-2008 Ankita Garg <ankita@in.ibm.com> ftrace: disable tracing when current_tracer is set to "none"

Found that inspite of setting the current_tracer to "none", trace from
the previous trace type continued to be collected. The patch below fixes
this and causes the trace to be disabled when the "none" type is
selected.

Compile and boot tested the patch for functionality.

Signed-off-by: Ankita Garg <ankita@in.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
76094a2cf46e4ab776055d4086615b884408568c 27-May-2008 Abhishek Sagar <sagar.abhishek@gmail.com> ftrace: distinguish kretprobe'd functions in trace logs

Tracing functions via ftrace which have a kretprobe installed on them, can produce misleading output in their trace logs. E.g, consider the correct trace of the following sequence:

do_IRQ()
{
~
irq_enter();
~
}

Trace log (sample):
<idle>-0 [00] 4154504455.781616: irq_enter <- do_IRQ

But if irq_enter() has a kretprobe installed on it, the return value stored on the stack at each invocation is modified to divert the return to a kprobe trampoline function called kretprobe_trampoline(). So with this the trace would (currently) look like:

<idle>-0 [00] 4154504455.781616: irq_enter <- kretprobe_trampoline

Now this is quite misleading to the end user, as it suggests something that didn't actually happen. So just to avoid such misinterpretations, the inlined patch aims to output such a log as:

<idle>-0 [00] 4154504455.781616: irq_enter <- [unknown/kretprobe'd]

Signed-off-by: Abhishek Sagar <sagar.abhishek@gmail.com>
Acked-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
41bc8144d02028133bcd1d545023c6f49e8b2411 22-May-2008 Steven Rostedt <rostedt@goodmis.org> ftrace: fix up cmdline recording

The new work with converting the trace hooks over to markers broke the
command line recording of ftrace. This patch fixes it again.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
4902f8849da6d2805bd291551a6dfd48f1b4f604 22-May-2008 Steven Rostedt <rostedt@goodmis.org> ftrace: move ftrace_special to trace.c

Move the ftrace_special out of sched_switch to trace.c.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Cc: pq@iki.fi
Cc: proski@gnu.org
Cc: sandmann@redhat.com
Cc: a.p.zijlstra@chello.nl
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
19384c0314342222b18d4c7f09cdce1ca74dfd2a 22-May-2008 Steven Rostedt <rostedt@goodmis.org> ftrace: limit use of check pages

The check_pages function is called often enough that it can cause problems
with trace outputs or even bringing the system to a halt.

This patch limits the check_pages to the places that are most likely to
have problems. The check is made at the flip between the global array and
the max save array, as well as when the size of the buffers changes and
the self tests.

This patch also removes the BUG_ON from check_pages and replaces it with
a WARN_ON and disabling of the tracer.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Cc: pq@iki.fi
Cc: proski@gnu.org
Cc: sandmann@redhat.com
Cc: a.p.zijlstra@chello.nl
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
801a175bf601f9a9d5e86e92dee9adeeb6625da8 12-May-2008 Ingo Molnar <mingo@elte.hu> mmiotrace: ftrace fix

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
bd8ac686c73c7e925fcfe0b02dc4e7b947127864 12-May-2008 Pekka Paalanen <pq@iki.fi> ftrace: mmiotrace, updates

here is a patch that makes mmiotrace work almost well within the tracing
framework. The patch applies on top of my previous patch. I have my own
output formatting in place now.

Summary of changes:
- fix the NULL dereference that was due to not calling tracing_reset()
- add print_line() callback into struct tracer
- implement print_line() for mmiotrace, producing up-to-spec text
- add my output header, but that is not really called in the right place
- rewrote the main structs in mmiotrace
- added two new trace entry types: TRACE_MMIO_RW and TRACE_MMIO_MAP
- made some functions in trace.c non-static
- check current==NULL in tracing_generic_entry_update()
- fix(?) comparison in trace_seq_printf()

Things seem to work fine except a few issues. Markers (text lines injected
into mmiotrace log) are missing, I did not feel hacking them in before we
have variable length entries. My output header is printed only for 'trace'
file, but not 'trace_pipe'. For some reason, despite my quick fix,
iter->trace is NULL in print_trace_line() when called from 'trace_pipe'
file, which means I don't get proper output formatting.

I only tried by loading nouveau.ko, which just detects the card, and that
is traced fine. I didn't try further. Map, two reads and unmap. Works
perfectly.

I am missing the information about overflows, I'd prefer to have a
counter for lost events. I didn't try, but I guess currently there is no
way of knowning when it overflows?

So, not too far from being fully operational, it seems :-)
And looking at the diffstat, there also is some 700-900 lines of user space
code that just became obsolete.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
d618b3e6e50970a6248ac857653fdd49bcd3c045 12-May-2008 Ingo Molnar <mingo@elte.hu> ftrace: sysprof updates

make the sample period configurable.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
74f4e369fc5b52433ad824cef32d3bf1304549be 12-May-2008 Ingo Molnar <mingo@elte.hu> ftrace: stacktrace fix

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
3eefae994d9224fb7771a3ddb683868363c23510 12-May-2008 Steven Rostedt <rostedt@goodmis.org> ftrace: limit trace entries

Currently there is no protection from the root user to use up all of
memory for trace buffers. If the root user allocates too many entries,
the OOM killer might start kill off all tasks.

This patch adds an algorith to check the following condition:

pages_requested > (freeable_memory + current_trace_buffer_pages) / 4

If the above is met then the allocation fails. The above prevents more
than 1/4th of freeable memory from being used by trace buffers.

To determine the freeable_memory, I made determine_dirtyable_memory in
mm/page-writeback.c global.

Special thanks goes to Peter Zijlstra for suggesting the above calculation.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
6c6c27969a4c6024e6c8838829546c02aaddca18 12-May-2008 Pekka Paalanen <pq@iki.fi> ftrace: add readpos to struct trace_seq; add trace_seq_to_user()

Refactor code from tracing_read_pipe() and create trace_seq_to_user().
Moved trace_seq_reset() call before iter->trace->read() call so that
when all leftover data is returned, trace_seq is reset automatically.

Signed-off-by: Pekka Paalanen <pq@iki.fi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
a4feb8348b62fe76a63cdb5569f5c920f5283c06 12-May-2008 Ingo Molnar <mingo@elte.hu> ftrace: special stacktrace

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
9fe068e92f6290e89e19adc521441661a1229f00 12-May-2008 Ingo Molnar <mingo@elte.hu> ftrace: trace faster

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
4823ed7eadf35e4b57ce581327e21d39585f1f32 12-May-2008 Steven Rostedt <rostedt@goodmis.org> ftrace: fix setting of pos in read_pipe

In resetting the iterator in read_pipe, the reset of pos was
postitioned in the wrong location with respect to the memset
operation. The current code sets pos, incorrectly, to zero.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
107bad8bef5ab2c3a3bff7648c18c9dc3abdc13b 12-May-2008 Steven Rostedt <rostedt@goodmis.org> ftrace: add trace pipe header pluggin

This patch adds a method for open_pipe and open_read to the pluggins
so that they can add a header to the trace pipe call.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
53d0aa773053ab18287781e25d52c5faf9e0e09e 12-May-2008 Steven Rostedt <rostedt@goodmis.org> ftrace: add logic to record overruns

This patch sets up the infrastructure to record overruns of the tracing
buffer.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
25b0b44a1c732ccfc58095cdd8438955a0a19fff 12-May-2008 Steven Rostedt <rostedt@goodmis.org> ftrace: fix comm on function trace output

In cleaning up of the sched_switch code, the function trace recording
of task comms was removed. This patch adds back the recording of comms
for function trace. The output of ftrace now has the task comm instead
of <...>.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
4fcdae83cebda24b519a89d3dd976081fff1ca80 12-May-2008 Steven Rostedt <rostedt@goodmis.org> ftrace: comment code

This is first installment of adding documentation to the ftrace.
Expect many more patches of this kind in the near future.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
ab46428c6969d50ecf6f6e97b7a84abba6274368 12-May-2008 Steven Rostedt <rostedt@goodmis.org> ftrace: modulize the number of CPU buffers

Currently ftrace allocates a trace buffer for every possible CPU.
Work is being done to change it to only online CPUs and add hooks
to hotplug CPUS.

This patch lays out the infrastructure for such a change.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
c6caeeb142cd3a03c46107aac45c8effc02bbadb 12-May-2008 Steven Rostedt <rostedt@goodmis.org> ftrace: replace simple_strtoul with strict_strtoul

Andrew Morton suggested using strict_strtoul over simple_strtoul.
This patch replaces them in ftrace.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
cffae437cdfbc8a5370d38cefbff1dfe34dad6ca 12-May-2008 Steven Rostedt <rostedt@goodmis.org> ftrace: simple clean ups

Andrew Morton mentioned some clean ups that should be done to ftrace.
This patch does some of the simple clean ups.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
afc2abc0ae4265730a0fc48618012193a09a1d10 12-May-2008 Ingo Molnar <mingo@elte.hu> ftrace: cleanups

no code changed.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
93dcc6ea096c51cc30ad0f6647ed05fead2439bf 12-May-2008 Thomas Gleixner <tglx@linutronix.de> ftrace: simplify hexprint

simplify hex to ascii conversion.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
a98a3c3fde3ae7614f19758a043691b6f59dac53 12-May-2008 Steven Rostedt <rostedt@goodmis.org> ftrace: trace_entries to dynamically change trace buffer size

This patch adds /debug/tracing/trace_entries that allows users to
see as well as modify the number of trace entries the buffers hold.

The number of entries only increments in ENTRIES_PER_PAGE which is
calculated by the size of an entry with the number of entries that
can fit in a page. The user does not need to use an exact size, but
the entries will be rounded to one of the increments.

Trying to set the entries to 0 will return with -EINVAL.

To avoid race conditions, the modification of the buffer size can only
be done when tracing is completely disabled (current_tracer == none).
A info message will be printed if a user tries to modify the buffer size
when not set to none.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2dc8f09571a61d9cb3dc47bba6608689dfe15d16 12-May-2008 Steven Rostedt <rostedt@goodmis.org> ftrace: trace_pipe implement NONBLOCK

This patch implements "NONBLOCK" for trace_pipe. If the trace_pipe is opened
with O_NONBLOCK, then the trace_pipe read will not block when buffer is empty.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
845279972f1736c3463c9cebd1bad92a0a347176 12-May-2008 Steven Rostedt <rostedt@goodmis.org> ftrace: return EOF in trace_pipe on change of tracer

Break out of while loop with EOF when the current_trace changes.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
b5685aede3b7b65e72ddc73b951aa1f70798a614 12-May-2008 Steven Rostedt <rostedt@goodmis.org> ftrace: restore iterator trace in pipe read

The trace iterator is reset in the read. We still need to restore the tracer
that the trace_pipe was opened with.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
8487c23765b6e0444ec6b5f1530766d63fe68e35 12-May-2008 Steven Rostedt <rostedt@goodmis.org> ftrace: allow trace_pipe to block on all reads

We expect things like "cat" to block on reads to trace_pipe. That's what
trace_pipe is for.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
d17d969160c18b631a19c2b34d260691402650f8 12-May-2008 Ankita Garg <ankita@in.ibm.com> ftrace: fix conversion of task state to char in latency tracer

The conversion of task states to a character in the sched_switch tracer (part
of latency tracer infrastructure), seems to be incorrect. We currently do it
by indexing into the state_to_char array using the state value. The state
values do not map directly into the array index and are thus incorrect. The
following patch addresses this issue. This is also what is being done even
in the show_task() routine in kernel/sched.c

The patch has been compile and run tested.

Signed-off-by: Ankita Garg <ankita@in.ibm.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
72829bc3d63cdc592d8f7dd86ad3b3fe8900fb74 23-May-2008 Thomas Gleixner <tglx@linutronix.de> ftrace: move enums to ftrace.h and make helper function global

picked from the mmiotracer patches to distangle the patch queues.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
d15f57f23eaba975309a153b23699cd0c0236974 12-May-2008 Steven Rostedt <srostedt@redhat.com> ftrace: fix mutex unlock in trace output

If the trace output changes on reading the trace files, there is a chance
that the start function will return NULL. If the start function of a sequence
returns NULL the stop equivalent is not called. In this case, all locks
that are taken must be released even if they are released in the stop function.

This patch fixes a case that a mutex was not released on return of NULL
in the start sequence function.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
92205c2343527a863d660360599a4bf8cede77b0 12-May-2008 Steven Rostedt <rostedt@goodmis.org> ftrace: user raw_spin_lock in tracing

Lock debugging enabled cause huge performance problems for tracing. Having
the lock verification happening for every function that is called
because mcount calls spin_lock can cripple the system.

This patch converts the spin_locks used by ftrace into raw_spin_locks.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
8f96da02c14d722ad9a3713cd7273ce28c9036ad 12-May-2008 Steven Rostedt <rostedt@goodmis.org> ftrace: remove wakeup from function trace

trace_function is called by mcount and calling wake_up from that
can have unpredictable results. This patch removes the wakeup from
trace_function.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
bac524d3f3dfeffa3a9d44f2c64035b88bcaacb4 12-May-2008 Peter Zijlstra <a.p.zijlstra@chello.nl> ftrace: trace next state

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
88a4216c3ec4281fc7e6725cc3a3ccd01fb1aa14 12-May-2008 Ingo Molnar <mingo@elte.hu> ftrace: sched special

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
f29c73fe3404f8799ed57aaf48859e0b55fc071f 12-May-2008 Ingo Molnar <mingo@elte.hu> ftrace: include cpu in stacktrace

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
442e544ce52d4415a024920b84fb95c5f9aa0855 12-May-2008 Ingo Molnar <mingo@elte.hu> ftrace: iter ctrl fix

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
36dfe9252bd4c9b55e8454363fb7e444c92c5030 12-May-2008 Ingo Molnar <mingo@elte.hu> ftrace: make use of tracing_cpumask

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
c7078de1aaf562a31b20984409c38cc1b40fa8a3 12-May-2008 Ingo Molnar <mingo@elte.hu> ftrace: add tracing_cpumask

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
4ac3ba41d372e3a9e420b36bc43589662b188a14 12-May-2008 Ingo Molnar <mingo@elte.hu> ftrace: trace scheduler rbtree

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
1a3c3034336320554a3342572dae98d69e054fc7 12-May-2008 Ingo Molnar <mingo@elte.hu> ftrace: fix __trace_special()

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
017730c11241e26577673eb9d957cfc66172ea91 12-May-2008 Ingo Molnar <mingo@elte.hu> ftrace: fix wakeups

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
4e65551905fb0300ae7e667cbaa41ee2e3f29a13 12-May-2008 Ingo Molnar <mingo@elte.hu> ftrace: sched tracer, trace full rbtree

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
4c1f4d4f0175129934d5dbc19a39296430937a05 12-May-2008 Ingo Molnar <mingo@elte.hu> ftrace: make nostacktrace the default

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
86387f7ee5d3273ff4859e2c64ce656639b6ca65 12-May-2008 Ingo Molnar <mingo@elte.hu> ftrace: add stack tracing

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
57422797dc009fc83766bcf230d29dbe6e08e21e 12-May-2008 Ingo Molnar <mingo@elte.hu> ftrace: add wakeup events to sched tracer

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
e309b41dd65aa953f86765eeeecc941d8e1e8b8f 12-May-2008 Ingo Molnar <mingo@elte.hu> ftrace: remove notrace

now that we have a kbuild method for notrace, no need to pollute the
C code with the annotations.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
6fb44b717c10ecf37beaaebd312f3afa93fed714 12-May-2008 Steven Rostedt <srostedt@srostedt@redhat.com> ftrace: add trace_function api for other tracers to use

A new check was added in the ftrace function that wont trace if the CPU
trace buffer is disabled. Unfortunately, other tracers used ftrace() to
write to the buffer after they disabled it. The new disable check makes
these calls into a nop.

This patch changes the __ftrace that is called without the check into a
new api for the other tracers to use, called "trace_function". The other
tracers use this interface instead when the trace CPU buffer is already
disabled.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2a2cc8f7c4d0dfd75720867f7dc58d24f075edfc 12-May-2008 Soeren Sandmann Pedersen <sandmann@redhat.com> ftrace: allow the event pipe to be polled

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2577046740fe6d77864128c6187c11125c2449ea 12-May-2008 Ingo Molnar <mingo@elte.hu> ftrace: build fix

no need to backmerge, only affects ftrace-enabled kernels. (which is
not the default)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
5e3ca0ec76fce92daa4eed0d02de9c79b1fe3920 12-May-2008 Ingo Molnar <mingo@elte.hu> ftrace: introduce the "hex" output method

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2e0f57618529a2739a5e1570e6c445c9c966b595 12-May-2008 Ingo Molnar <mingo@elte.hu> ftrace: build fix

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
0fd9e0dac9026df09986a4b201518ae015814aef 12-May-2008 Ingo Molnar <mingo@elte.hu> ftrace: use cpu clock again

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
26994ead1fc8cced63f17e9848edc1771036664e 12-May-2008 Steven Rostedt <srostedt@redhat.com> ftrace: enabled tracing by default

This patch is the correct way to have tracing enabled by default.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
088b1e427dbba2af93cb6a7d39258c10ff58dd27 12-May-2008 Steven Rostedt <srostedt@redhat.com> ftrace: pipe fixes

Some fixes for better output with the trace pipe.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
dcb6308f2b56720599f6b9d5a01c33e67e69bde4 12-May-2008 Ingo Molnar <mingo@elte.hu> ftrace, locking fix

should be an irq-safe lock ...

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
f0a920d5752e1788c0cba2add103076bcc0f7a49 12-May-2008 Ingo Molnar <mingo@elte.hu> ftrace: add trace_special()

for ad-hoc tracing.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
cb0f12aae8d085140d37ada351aa5a8e76c3f9b0 12-May-2008 Ingo Molnar <mingo@elte.hu> ftrace: bin-output

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
f9896bf30928922a3913a3810a4ab7908da6cfe7 12-May-2008 Ingo Molnar <mingo@elte.hu> ftrace: add raw output

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
8c523a9d82dbc4f3f7d972df8c0f1eacd83d0d55 12-May-2008 Ingo Molnar <mingo@elte.hu> ftrace: clean-up-pipe-iteration

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
cdd31cd2d7a0dcbec2cce3974f7129dd4fc8c879 12-May-2008 Ingo Molnar <mingo@elte.hu> ftrace: remove-idx-sync

remove idx syncing - it's expensive on SMP.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
53c37c17aafcf50f7c6fddaf01dda8f9d7e31ddf 12-May-2008 Ingo Molnar <mingo@elte.hu> ftrace: fast, scalable, synchronized timestamps

implement globally synchronized, fast and scalable time source for tracing.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
750ed1a40783432d0dcb0e6c2e813a12615d7664 12-May-2008 Ingo Molnar <mingo@elte.hu> ftrace: timestamp syncing, prepare

rename and uninline now() to ftrace_now().

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
4bf39a9411a4ce8712954e03a9bd1592ee345919 12-May-2008 Ingo Molnar <mingo@elte.hu> ftrace: cleanups

no code changed.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
d4c5a2f5870939d837293de87b41dda0012a4572 12-May-2008 Ingo Molnar <mingo@elte.hu> ftrace: fix locking

we can hold all cpu trace buffer locks at once - put each into a
separate lock class.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
b3806b4316306dc9c542eff6c23d7d42918f3504 12-May-2008 Steven Rostedt <srostedt@redhat.com> ftrace: user run time file reading

This patch creates a file called trace_pipe in the tracing
debug directory. This file is a consumer of the trace buffers.
This means that reads of this file consumes the entries from
the trace buffers so that they will not be read a second time,
as contrast to the static buffers latency_trace and trace.

Reading from the trace_pipe will remove the entries from trace
and latency_trace too.

The advantage that trace_pipe has is that it can record live
traces. It will block when there is nothing in the buffer,
and read the entries as they are entered. An EOF happens when
tracing is disabled (tracing_enabled = 0).

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
214023c3d13a71525e463b5e54e360b926b4dc90 12-May-2008 Steven Rostedt <srostedt@redhat.com> ftrace: add a buffer for output

Later patches will need to print the same things as the seq output
does. But those outputs will not use the seq utility. This patch
adds a buffer to the iterator, that can be used by either the
seq utility or other output.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
93a588f459da134be6ab17c4104e28441beb0d22 12-May-2008 Steven Rostedt <srostedt@redhat.com> ftrace: change buffers to producer consumer

This patch changes the way the CPU trace buffers are handled.
Instead of always starting from the trace page head, the logic
is changed to a producer consumer logic. This allows for the
buffers to be drained while they are alive.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
1d4db00a5e30c7b8f8dc2a1b19e886fd942be143 12-May-2008 Steven Rostedt <srostedt@redhat.com> ftrace: reset selftests

The tests may leave stuff in the buffers. This resets the buffers
after each test is run. If a test fails, it does not reset the
buffer to avoid touching a buffer that is corrupted.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
4e3c3333f3bd7eedfd21b1155b3c7cd24fc7f754 12-May-2008 Ingo Molnar <mingo@elte.hu> ftrace: fix time offset

fix time offset calculations and ordering, plus make code more consistent.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
7bd2f24c2f769e3f8f1d4fc8b9fddf689825f6a7 12-May-2008 Ingo Molnar <mingo@elte.hu> ftrace: add README

make it easier for newbies to find their way around.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
c7aafc549766b87819285d3480648fc652a47bc4 12-May-2008 Ingo Molnar <mingo@elte.hu> ftrace: cleanups

factor out code and clean it up.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
60a11774b38fef1ab90b18c5353bd1c7c4d311c8 12-May-2008 Steven Rostedt <srostedt@redhat.com> ftrace: add self-tests

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
57f50be14d57b0dbf88dd019e7bb0ff3a3dc7b81 12-May-2008 Steven Rostedt <srostedt@redhat.com> ftrace: fix max latency

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
89b2f97819dd074297bbe3e19eaa4afcc98845ad 12-May-2008 Steven Rostedt <srostedt@redhat.com> ftrace: fix updates to max trace

This patch fixes some bugs to the updating of the max trace that
was caused by implementing the new buffering.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
18cef379d30f5ded20cc31d7f2d342639d39919d 12-May-2008 Steven Rostedt <rostedt@goodmis.org> ftrace: don't use raw_local_irq_save/restore

Using raw_local_irq_save/restore confuses lockdep.
It's fine to use the normal ones.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
4c11d7aed389375253b59e2b1865eec96663c65d 12-May-2008 Steven Rostedt <srostedt@redhat.com> ftrace: convert single large buffer into single pages.

Allocating large buffers for the tracer may fail easily.
This patch converts the buffer from a large ordered allocation
to single pages. It uses the struct page LRU field to link the
pages together.

Later patches may also implement dynamic increasing and decreasing
of the trace buffers.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
bc0c38d139ec7fcd5c030aea16b008f3732e42ac 12-May-2008 Steven Rostedt <srostedt@redhat.com> ftrace: latency tracer infrastructure

This patch adds the latency tracer infrastructure. This patch
does not add anything that will select and turn it on, but will
be used by later patches.

If it were to be compiled, it would add the following files
to the debugfs:

The root tracing directory:

/debugfs/tracing/

This patch also adds the following files:

available_tracers
list of available tracers. Currently no tracers are
available. Looking into this file only shows
"none" which is used to unregister all tracers.

current_tracer
The trace that is currently active. Empty on start up.
To switch to a tracer simply echo one of the tracers that
are listed in available_tracers:

example: (used with later patches)

echo function > /debugfs/tracing/current_tracer

To disable the tracer:

echo disable > /debugfs/tracing/current_tracer

tracing_enabled
echoing "1" into this file starts the ftrace function tracing
(if sysctl kernel.ftrace_enabled=1)
echoing "0" turns it off.

latency_trace
This file is readonly and holds the result of the trace.

trace
This file outputs a easier to read version of the trace.

iter_ctrl
Controls the way the output of traces look.
So far there's two controls:
echoing in "symonly" will only show the kallsyms variables
without the addresses (if kallsyms was configured)
echoing in "verbose" will change the output to show
a lot more data, but not very easy to understand by
humans.
echoing in "nosymonly" turns off symonly.
echoing in "noverbose" turns off verbose.

Signed-off-by: Steven Rostedt <srostedt@redhat.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>