Cross Reference: /kernel/trace/trace.c

History log of /kernel/trace/trace.c
Revision	Date	Author	Comments
63351df85f7ece011618955054c1b0d15b393691	26-Jan-2015	Ruchi Kandoi <kandoiruchi@google.com>	Fix compile errors in accordance with changes from 3.14 to 3.18 Signed-off-by: Ruchi Kandoi<kandoiruchi@google.com>
0438cf86ad894af8b5065708afd283568d373cfc	22-Nov-2012	Jamie Gennis <jgennis@google.com>	trace: Add an option to show tgids in trace output The tgids are tracked along side the saved_cmdlines tracking, and can be included in trace output by enabling the 'print-tgid' trace option. This is useful when doing post-processing of the trace data, as it allows events to be grouped by tgid. Change-Id: I52ed04c3a8ca7fddbb868b792ce5d21ceb76250e Signed-off-by: Jamie Gennis <jgennis@google.com>
07906da78810dce5fd35b9449358c9208c693dca	06-Nov-2014	Rabin Vincent <rabin@rab.in>	tracing: Do not risk busy looping in buffer splice If the read loop in trace_buffers_splice_read() keeps failing due to memory allocation failures without reading even a single page then this function will keep busy looping. Remove the risk for that by exiting the function if memory allocation failures are seen. Link: http://lkml.kernel.org/r/1415309167-2373-2-git-send-email-rabin@rab.in Signed-off-by: Rabin Vincent <rabin@rab.in> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
e30f53aad2202b5526c40c36d8eeac8bf290bde5	10-Nov-2014	Rabin Vincent <rabin@rab.in>	tracing: Do not busy wait in buffer splice On a !PREEMPT kernel, attempting to use trace-cmd results in a soft lockup: # trace-cmd record -e raw_syscalls:* -F false NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [trace-cmd:61] ... Call Trace: [<ffffffff8105b580>] ? __wake_up_common+0x90/0x90 [<ffffffff81092e25>] wait_on_pipe+0x35/0x40 [<ffffffff810936e3>] tracing_buffers_splice_read+0x2e3/0x3c0 [<ffffffff81093300>] ? tracing_stats_read+0x2a0/0x2a0 [<ffffffff812d10ab>] ? _raw_spin_unlock+0x2b/0x40 [<ffffffff810dc87b>] ? do_read_fault+0x21b/0x290 [<ffffffff810de56a>] ? handle_mm_fault+0x2ba/0xbd0 [<ffffffff81095c80>] ? trace_event_buffer_lock_reserve+0x40/0x80 [<ffffffff810951e2>] ? trace_buffer_lock_reserve+0x22/0x60 [<ffffffff81095c80>] ? trace_event_buffer_lock_reserve+0x40/0x80 [<ffffffff8112415d>] do_splice_to+0x6d/0x90 [<ffffffff81126971>] SyS_splice+0x7c1/0x800 [<ffffffff812d1edd>] tracesys_phase2+0xd3/0xd8 The problem is this: tracing_buffers_splice_read() calls ring_buffer_wait() to wait for data in the ring buffers. The buffers are not empty so ring_buffer_wait() returns immediately. But tracing_buffers_splice_read() calls ring_buffer_read_page() with full=1, meaning it only wants to read a full page. When the full page is not available, tracing_buffers_splice_read() tries to wait again with ring_buffer_wait(), which again returns immediately, and so on. Fix this by adding a "full" argument to ring_buffer_wait() which will make ring_buffer_wait() wait until the writer has left the reader's page, i.e. until full-page reads will succeed. Link: http://lkml.kernel.org/r/1415645194-25379-1-git-send-email-rabin@rab.in Cc: stable@vger.kernel.org # 3.16+ Fixes: b1169cc69ba9 ("tracing: Remove mock up poll wait function") Signed-off-by: Rabin Vincent <rabin@rab.in> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
1b3e5c0936046e7e023149ddc8946d21c2ea20eb	16-Jul-2014	Thomas Gleixner <tglx@linutronix.de>	ftrace: Provide trace clocks monotonic Expose the new NMI safe accessor to clock monotonic to the tracer. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: John Stultz <john.stultz@linaro.org>
58d4e21e50ff3cc57910a8abc20d7e14375d2f61	18-Jul-2014	Tony Luck <tony.luck@intel.com>	tracing: Fix wraparound problems in "uptime" trace clock The "uptime" trace clock added in: commit 8aacf017b065a805d27467843490c976835eb4a5 tracing: Add "uptime" trace clock that uses jiffies has wraparound problems when the system has been up more than 1 hour 11 minutes and 34 seconds. It converts jiffies to nanoseconds using: (u64)jiffies_to_usecs(jiffy) * 1000ULL but since jiffies_to_usecs() only returns a 32-bit value, it truncates at 2^32 microseconds. An additional problem on 32-bit systems is that the argument is "unsigned long", so fixing the return value only helps until 2^32 jiffies (49.7 days on a HZ=1000 system). Avoid these problems by using jiffies_64 as our basis, and not converting to nanoseconds (we do convert to clock_t because user facing API must not be dependent on internal kernel HZ values). Link: http://lkml.kernel.org/p/99d63c5bfe9b320a3b428d773825a37095bf6a51.1405708254.git.tony.luck@intel.com Cc: stable@vger.kernel.org # 3.10+ Fixes: 8aacf017b065 "tracing: Add "uptime" trace clock that uses jiffies" Signed-off-by: Tony Luck <tony.luck@intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
6508fa761c330a1d2b4ae36199d08dbcb70e3ddb	18-Jul-2014	Stanislav Fomichev <stfomichev@yandex-team.ru>	tracing: let user specify tracing_thresh after selecting function_graph Currently, tracing_thresh works only if we specify it before selecting function_graph tracer. If we do the opposite, tracing_thresh will change it's value, but it will not be applied. To fix it, we add update_thresh callback which is called whenever tracing_thresh is updated and for function_graph tracer we register handler which reinitializes tracer depending on tracing_thresh. Link: http://lkml.kernel.org/p/20140718111727.GA3206@stfomichev-desktop.yandex.net Signed-off-by: Stanislav Fomichev <stfomichev@yandex-team.ru> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
f0160a5a2912267c02cfe692eac955c360de5fdf	18-Jul-2013	zhangwei(Jovi) <jovi.zhangwei@huawei.com>	tracing: Add TRACE_ITER_PRINTK flag check in __trace_puts/__trace_bputs The TRACE_ITER_PRINTK check in __trace_puts/__trace_bputs is missing, so add it, to be consistent with __trace_printk/__trace_bprintk. Those functions are all called by the same function: trace_printk(). Link: http://lkml.kernel.org/p/51E7A7D6.8090900@huawei.com Cc: stable@vger.kernel.org # 3.11+ Signed-off-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
8abfb8727f4a724d31f9ccfd8013fbd16d539445	18-Jul-2013	zhangwei(Jovi) <jovi.zhangwei@huawei.com>	tracing: Add ftrace_trace_stack into __trace_puts/__trace_bputs Currently trace option stacktrace is not applicable for trace_printk with constant string argument, the reason is in __trace_puts/__trace_bputs ftrace_trace_stack is missing. In contrast, when using trace_printk with non constant string argument(will call into __trace_printk/__trace_bprintk), then trace option stacktrace is workable, this inconstant result will confuses users a lot. Link: http://lkml.kernel.org/p/51E7A7C9.9040401@huawei.com Cc: stable@vger.kernel.org # 3.10+ Signed-off-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
099ed151675cd1d2dbeae1dac697975f6a68716d	25-Jun-2014	Steven Rostedt (Red Hat) <rostedt@goodmis.org>	tracing: Remove ftrace_stop/start() from reading the trace file Disabling reading and writing to the trace file should not be able to disable all function tracing callbacks. There's other users today (like kprobes and perf). Reading a trace file should not stop those from happening. Cc: stable@vger.kernel.org # 3.0+ Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
d048a8c7b509f35dd351e1415fe49fa99e4cb7ef	12-Jun-2014	Namhyung Kim <namhyung@kernel.org>	tracing: Add description of set_graph_notrace to tracing/README It was missing the description of set_graph_notrace file. Add it. Link: http://lkml.kernel.org/p/1402590233-22321-5-git-send-email-namhyung@kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
3f4d8f78a07dba1cb333ce749bd6a15c1ada362d	26-Jun-2014	Fabian Frederick <fabf@skynet.be>	tracing: Remove unnecessary null test before debugfs_remove() This fixes checkpatch warning: "WARNING: debugfs_remove(NULL) is safe this check is probably not required" Link: http://lkml.kernel.org/p/1403802871-8599-1-git-send-email-fabf@skynet.be Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
12306276fabcb746a14979e96f43a13c724dec49	20-Jun-2014	Steven Rostedt (Red Hat) <rostedt@goodmis.org>	tracing: Move the trace_seq_* functions into its own trace_seq.c file The trace_seq_*() functions are a nice utility that allows users to manipulate buffers with printf() like formats. It has its own trace_seq.h header in include/linux and should be in its own file. Being tied with trace_output.c is rather awkward. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
f0b70cc48cc282cb326a4d71b3d1dda7d8fafd2a	10-Jun-2014	Steven Rostedt (Red Hat) <rostedt@goodmis.org>	tracing: Fix leak of per cpu max data in instances The freeing of an instance, if max data is configured, there will be per cpu data structures created. But these are not freed when the instance is deleted, which causes a memory leak. A new helper function is added that frees the individual buffers within a trace array, instead of duplicating the code. This way changes made for one are applied to the other (normal buffer vs max buffer). Link: http://lkml.kernel.org/r/87k38pbake.fsf@sejong.aot.lge.com Reported-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
a6af8fbf17989e41fef5cacf3988a724fb687d78	10-Jun-2014	Namhyung Kim <namhyung@kernel.org>	tracing: Cleanup saved_cmdlines_size changes The recent addition of saved_cmdlines_size file had some remaining (minor - mostly coding style) issues. Fix them by passing pointer name to sizeof() and using scnprintf(). Link: http://lkml.kernel.org/p/1402384295-23680-1-git-send-email-namhyung@kernel.org Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
8b8b36834d0fff67fc8668093f4312dd04dcf21d	10-Jun-2014	Steven Rostedt (Red Hat) <rostedt@goodmis.org>	ring-buffer: Check if buffer exists before polling The per_cpu buffers are created one per possible CPU. But these do not mean that those CPUs are online, nor do they even exist. With the addition of the ring buffer polling, it assumes that the caller polls on an existing buffer. But this is not the case if the user reads trace_pipe from a CPU that does not exist, and this causes the kernel to crash. Simple fix is to check the cpu against buffer bitmask against to see if the buffer was allocated or not and return -ENODEV if it is not. More updates were done to pass the -ENODEV back up to userspace. Link: http://lkml.kernel.org/r/5393DB61.6060707@oracle.com Reported-by: Sasha Levin <sasha.levin@oracle.com> Cc: stable@vger.kernel.org # 3.10+ Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
a9fcaaac37b3baba1343f906f52aeb65c4d4e356	07-Jun-2014	Steven Rostedt (Red Hat) <rostedt@goodmis.org>	tracing: Fix memory leak on instance deletion When an instance is created, it also gets a snapshot ring buffer allocated (with minimum of pages). But when it is deleted the snapshot buffer is not. There was a helper function added to match the allocation of these ring buffers to a way to free them, but it wasn't used by the deletion of an instance. Using that helper function solves this memory leak. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
23aaa3c18e33fe048671b419781b5e44175efafe	06-Jun-2014	Steven Rostedt (Red Hat) <rostedt@goodmis.org>	tracing: Fix leak of ring buffer data when new instances creation fails Yoshihiro Yunomae reported that the ring buffer data for a trace instance does not get properly cleaned up when it fails. He proposed a patch that manually cleaned the data up and addad a bunch of labels. The labels are not needed because all trace array is allocated with a kzalloc which initializes it to 0 and all kfree()s can take a NULL pointer and will ignore it. Adding a new helper function free_trace_buffers() that can also take null buffers to free the buffers that were allocated by allocate_trace_buffers(). Link: http://lkml.kernel.org/r/20140605223522.32311.31664.stgit@yunodevel Reported-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
939c7a4f04fcd2162109744e8bf88194948a6e65	05-Jun-2014	Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>	tracing: Introduce saved_cmdlines_size file Introduce saved_cmdlines_size file for changing the number of saved pid-comms. saved_cmdlines currently stores 128 command names using SAVED_CMDLINES, but 'no-existing processes' names are often lost in saved_cmdlines when we read the trace data. So, by introducing saved_cmdlines_size file, we can now change the 128 command names saved to something much larger if needed. When we write a value to saved_cmdlines_size, the number of the value will be stored in pid-comm list: # echo 1024 > /sys/kernel/debug/tracing/saved_cmdlines_size Here, 1024 command names can be stored. The default number is 128 and the maximum number is PID_MAX_DEFAULT (=32768 if CONFIG_BASE_SMALL is not set). So, if we want to avoid losing any command names, we need to set 32768 to saved_cmdlines_size. We can read the maximum number of the list: # cat /sys/kernel/debug/tracing/saved_cmdlines_size 128 Link: http://lkml.kernel.org/p/20140605012427.22115.16173.stgit@yunodevel Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
198376cd8877a612cc6494201b815c36d1e40391	03-Jun-2014	Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>	tracing: Eliminate double free on failure of allocation on boot up If allocation of the max_buffer fails on boot up, the error path will free both per_cpu data structures from the buffers. With the new redesign of the code, those structures are freed if allocations failed. That is, the helper function that allocates the buffers will free the per cpu data on failure. No need to do it again. In fact, the second free will cause a bug as the code can not handle a double free. Link: http://lkml.kernel.org/p/20140603042803.27308.30956.stgit@yunodevel Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
4c27e756bc019ec1c11232893af036fdae720a97	30-May-2014	Steven Rostedt (Red Hat) <rostedt@goodmis.org>	tracing: Move locking of trace_cmdline_lock into start/stop seq calls With the conversion of the saved_cmdlines output to use seq_read, there is now a race between accessing the values of the saved_cmdlines and the writing to them. The trace_cmdline_lock needs to be taken at the start and stop of the seq calls. A new __trace_find_cmdline() call is created to allow for the look up to happen without taking the lock. Fixes: 42584c81c5ad tracing: Have saved_cmdlines use the seq_read infrastructure Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
379cfdac37923653c9d4242d10052378b7563005	30-May-2014	Steven Rostedt (Red Hat) <rostedt@goodmis.org>	tracing: Try again for saved cmdline if failed due to locking In order to prevent the saved cmdline cache from being filled when tracing is not active, the comms are only recorded after a trace event is recorded. The problem is, a comm can fail to be recorded if the trace_cmdline_lock is held. That lock is taken via a trylock to allow it to happen from any context (including NMI). If the lock fails to be taken, the comm is skipped. No big deal, as we will try again later. But! Because of the code that was added to only record after an event, we may not try again later as the recording is made as a oneshot per event per CPU. Only disable the recording of the comm if the comm is actually recorded. Fixes: 7ffbd48d5cab "tracing: Cache comms only after an event occurred" Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
42584c81c5adc1737a6fe0687facc5e62a5dc8c1	20-Feb-2014	Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>	tracing: Have saved_cmdlines use the seq_read infrastructure Current tracing_saved_cmdlines_read() implementation is naive; It allocates a large buffer, constructs output data to that buffer for each read operation, and then copies a portion of the buffer to the user space buffer. This has several issues such as slow memory allocation, high CPU usage, and even corruption of the output data. The seq_read infrastructure is made to handle this type of work. By converting it to use seq_read() the code becomes smaller, simplified, as well as correct. Link: http://lkml.kernel.org/p/20140220084431.3839.51793.stgit@yunodevel Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-kernel@vger.kernel.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2184db46e425c2b84a783eeead0e2b24ce67cd7f	28-May-2014	Steven Rostedt <rostedt@goodmis.org>	tracing: Print nasty banner when trace_printk() is in use trace_printk() is used to debug fast paths within the kernel. Places that gets called in any context (interrupt or NMI) or thousands of times a second. Something you do not want to do with a printk(). In order to make it completely lockless as it needs a temporary buffer to handle some of the string formatting, a page is created per cpu for every context (four per cpu; normal, softirq, irq, NMI). Since trace_printk() should only be used for debugging purposes, there's no reason to waste memory on these buffers on a production system. That means, trace_printk() should never be used unless a developer is debugging their kernel. There's macro magic to allocate the buffers if trace_printk() is used anywhere in the kernel. To help enforce that trace_printk() isn't used outside of development, when it is used, a nasty banner is displayed on bootup (or when a module is loaded that uses trace_printk() and the kernel core does not). Here's the banner: ******************************************************** NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE trace_printk() being used. Allocating extra memory. This means that this is a DEBUG kernel and it is unsafe for produciton use. If you see this message and you are not debugging the kernel, report this immediately to your vendor! NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE NOTICE ******************************************************** That should hopefully keep developers from trying to sneak in a trace_printk() or two. Link: http://lkml.kernel.org/p/20140528131440.2283213c@gandalf.local.home Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
bdffd893a0e9c431304142d12d9a0a21d365c502	29-Apr-2014	Christoph Lameter <cl@linux.com>	tracing: Replace __get_cpu_var uses with this_cpu_ptr Replace uses of &__get_cpu_var for address calculation with this_cpu_ptr. Link: http://lkml.kernel.org/p/alpine.DEB.2.10.1404291415560.18364@gentwo.org Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: Christoph Lameter <cl@linux.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
b1169cc69ba96b124df820904a6d3eb775491d7f	29-Apr-2014	Steven Rostedt (Red Hat) <rostedt@goodmis.org>	tracing: Remove mock up poll wait function Now that the ring buffer has a built in way to wake up readers when there's data, using irq_work such that it is safe to do it in any context. But it was still using the old "poor man's" wait polling that checks every 1/10 of a second to see if it should wake up a waiter. This makes the latency for a wake up excruciatingly long. No need to do that anymore. Completely remove the different wait_poll types from the tracers and have them all use the default one now. Reported-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
f4874261049e3abdd481359d82cafa5068369ebd	29-Apr-2014	Steven Rostedt (Red Hat) <rostedt@goodmis.org>	tracing: Break out of tracing_wait_pipe() before wait_pipe() is called When reading from trace_pipe, if tracing is off but nothing was read it should block. If something is read and tracing is off, then EOF is returned. If tracing is on and there's nothing to read, it will block. But because the check of whether tracing is off and something was read is done after the block on the pipe, it is hit or miss if the EOF is returned or not leading to inconsistent behavior. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
ad1438a076e275b70d1a04de1364bc483e5a81db	17-Apr-2014	Fabian Frederick <fabf@skynet.be>	tracing: Add static to local functions This patch adds static to the following functions: -cycle_t buffer_ftrace_now -void free_snapshot -int trace_selftest_startup_dynamic_tracing Link: http://lkml.kernel.org/p/20140417214442.d7abc7c0b0e4b90e7fedecc9@skynet.be Signed-off-by: Fabian Frederick <fabf@skynet.be> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
0b9b12c1b884eb34773312f15c194220025e0416	14-Jan-2014	Steven Rostedt (Red Hat) <rostedt@goodmis.org>	tracing: Move ftrace_max_lock into trace_array In preparation for having tracers enabled in instances, the max_lock should be unique as updating the max for one tracer is a separate operation than updating it for another tracer using a different max. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
6d9b3fa5e7f663bbfb9d2d80d46136f75319cb28	14-Jan-2014	Steven Rostedt (Red Hat) <rostedt@goodmis.org>	tracing: Move tracing_max_latency into trace_array In preparation for letting the latency tracers be used by instances, remove the global tracing_max_latency variable and add a max_latency field to the trace_array that the latency tracers will now use. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
4104d326b670c2b66f575d2004daa28b2d1b4c8d	10-Jan-2014	Steven Rostedt (Red Hat) <rostedt@goodmis.org>	ftrace: Remove global function list and call function directly Instead of having a list of global functions that are called, as only one global function is allow to be enabled at a time, there's no reason to have a list. Instead, simply have all the users of the global ops, use the global ops directly, instead of registering their own ftrace_ops. Just switch what function is used before enabling the function tracer. This removes a lot of code as well as the complexity involved with it. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
a786c06d9f2719203c00b3d97b21f9a96980d0b5	11-Apr-2014	Al Viro <viro@zeniv.linux.org.uk>	missing bits of "splice: fix racy pipe->buffers uses" that commit has fixed only the parts of that mess in fs/splice.c itself; there had been more in several other ->splice_read() instances... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
17a280ea8111c66791c18c0353b7986aafcb24fe	11-Apr-2014	Steven Rostedt (Red Hat) <rostedt@goodmis.org>	tracing: Add missing function triggers dump and cpudump to README The debugfs tracing README file lists all the function triggers except for dump and cpudump. These should be added too. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
fbb32750a62df75d1ffea547f3908b21c5496d9f	03-Feb-2014	Al Viro <viro@zeniv.linux.org.uk>	pipe: kill ->map() and ->unmap() all pipe_buffer_operations have the same instances of those... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2c4a33aba5f9ea3a28f2e40351f078d95f00786b	26-Mar-2014	Steven Rostedt (Red Hat) <rostedt@goodmis.org>	tracing: Fix traceon trigger condition to actually turn tracing on While working on my tutorial for 2014 Linux Collaboration Summit I found that the traceon trigger did not work when conditions were used. The other triggers worked fine though. Looking into it, it is because of the way the triggers use the ring buffer to store the fields it will use for the condition. But if tracing is off, nothing is stored in the buffer, and the tracepoint exits before calling the trigger to test the condition. This is fine for all the triggers that only work when tracing is on, but for traceon trigger that is to work when tracing is off, nothing happens. The fix is simple, just use a temp ring buffer to record the event if tracing is off and the event has a trace event conditional trigger enabled. The rest of the tracepoint code will work just fine, but the tracepoint wont be recorded in the other buffers. Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
e1e232ca6b8faa210e5509f17d55519b4392524f	11-Feb-2014	Steven Rostedt <rostedt@goodmis.org>	tracing: Add trace_clock=<clock> kernel parameter Being able to change the trace clock at boot can be advantageous if you need a better source of when things happen across CPUs. The default trace clock is the fastest, but it uses local clocks which may not be synced across CPUs and it does not let you know when events took place with respect to events on other CPUs. The global trace clock can help in this case, and if you do not care about timings, the counter "clock" is the best, as that is just a simple atomic counter that is incremented for every event. Usage is to add "trace_clock=counter" on the kernel command line. You can replace counter with "global" or any of the clocks listed in /sys/kernel/debug/tracing/trace_clock Suggested-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Appreciated-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
591dffdade9f07692a7dd3ed16830ec24e901ece	10-Jan-2014	Steven Rostedt (Red Hat) <rostedt@goodmis.org>	ftrace: Allow for function tracing instance to filter functions Create a "set_ftrace_filter" and "set_ftrace_notrace" files in the instance directories to let users filter of functions to trace for the given instance. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
50512ab576e1ce29953c9259e1f36ce16f350f20	14-Jan-2014	Steven Rostedt (Red Hat) <rostedt@goodmis.org>	tracing: Convert tracer->enabled to counter As tracers will soon be used by instances, the tracer enabled field needs to be converted to a counter instead of a boolean. This counter is protected by the trace_types_lock mutex. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
6b450d2533e2c1c71fbe7f1bdce0bb1c9f813030	14-Jan-2014	Steven Rostedt (Red Hat) <rostedt@goodmis.org>	tracing: Disable tracers before deletion of instance When an instance is about to be deleted, make sure the tracer is set to nop. If it isn't reset the tracer and set it to the nop tracer, otherwise memory leaks and bad pointers may result. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
f1b21c9a40704dfdf7b8423c7d2969ea31c9857d	14-Jan-2014	Steven Rostedt (Red Hat) <rostedt@goodmis.org>	tracing: Only let top level have option files Currently, only the top level instance can have tracing options. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
607e2ea167e56db84387f3ab97e59a862e101cab	07-Nov-2013	Steven Rostedt (Red Hat) <rostedt@goodmis.org>	tracing: Set up infrastructure to allow tracers for instances Currently the tracers (function, function_graph, irqsoff, etc) can only be used by the top level tracing directory (not for instances). This sets up the infrastructure to allow instances to be able to run a separate tracer apart from the what the top level tracing is doing. As tracers need to adapt for being used by instances, the tracers must flag if they can be used by instances or not. Currently only the 'nop' tracer can be used by all instances. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
bf6065b5c7014ab30383405718c7a6b96d2cbdb2	10-Jan-2014	Steven Rostedt (Red Hat) <rostedt@goodmis.org>	tracing: Pass trace_array to flag_changed callback As options (flags) may affect instances instead of being global the flag_changed() callbacks need to receive the trace_array descriptor of the instance they will be modifying. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
8c1a49aedb73fb2f15aaa32ad9e2e1c4289f45cb	10-Jan-2014	Steven Rostedt (Red Hat) <rostedt@goodmis.org>	tracing: Pass trace_array to set_flag callback As options (flags) may affect instances instead of being global the set_flag() callbacks need to receive the trace_array descriptor of the instance they will be modifying. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
3132e107d608f8753240d82d61303c500fd515b4	23-Jan-2014	Steven Rostedt (Red Hat) <rostedt@goodmis.org>	tracing: Check if tracing is enabled in trace_puts() If trace_puts() is used very early in boot up, it can crash the machine if it is called before the ring buffer is allocated. If a trace_printk() is used with no arguments, then it will be converted into a trace_puts() and suffer the same fate. Cc: stable@vger.kernel.org # 3.10+ Fixes: 09ae72348ecc "tracing: Add trace_puts() for even faster trace_printk() tracing" Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
71485c45891b8a0fcc4ce22d87251424ab51e096	23-Jan-2014	Steven Rostedt (Red Hat) <rostedt@goodmis.org>	tracing: Fix formatting of trace README file Fix the formatting of the README file in the trace debugfs to fit in an 80 character window. Also add a comment about the event trigger counter with regards to traceon and traceoff. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
26f255646e0ca6fde0e994e2a815ba2b31770dce	17-Jan-2014	Tom Zanussi <tom.zanussi@linux.intel.com>	tracing/README: Add event file usage to tracing mini-HOWTO It would be useful to have a cheat-sheet for everything under tracing/events/ alongside the existing text describing the other files in the tracing/ dir. Add short descriptions of the directories and files under events/ along with examples, similar to the existing text for the other files in tracing/. Also clean up a few minor alignment problems noticed when adding the new text. Link: http://lkml.kernel.org/r/1389993104.3040.445.camel@empanada Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
92fdd98cf8bdec4d6b0c510e2f073ac4fd059be8	17-Jan-2014	Al Viro <viro@ZenIV.linux.org.uk>	tracing: Fix buggered tee(2) on tracing_pipe In kernel/trace/trace.c we have this: static void tracing_pipe_buf_release(struct pipe_inode_info pipe, struct pipe_buffer buf) { __free_page(buf->page); } static const struct pipe_buf_operations tracing_pipe_buf_ops = { .can_merge = 0, .map = generic_pipe_buf_map, .unmap = generic_pipe_buf_unmap, .confirm = generic_pipe_buf_confirm, .release = tracing_pipe_buf_release, .steal = generic_pipe_buf_steal, .get = generic_pipe_buf_get, }; with void generic_pipe_buf_get(struct pipe_inode_info pipe, struct pipe_buffer buf) { page_cache_get(buf->page); } and I don't see anything that would've prevented tee(2) called on the pipe that got stuff spliced into it from that sucker. ->ops->get() will be called, then buf gets copied into target pipe's ->bufs[] and eventually readers get to both copies of the buffer. With get_page(page) look at that page __free_page(page) look at that page __free_page(page) which is not a good thing, to put it mildly. AFAICS, that ought to use the normal generic_pipe_buf_release() (aka page_cache_release(buf->page)), shouldn't it? [ SDR - As trace_pipe just allocates the page with alloc_page(GFP_KERNEL), and doesn't do anything special with it (no LRU logic). The __free_page() should be fine, as it wont actually free a page with reference count. Maybe there's a chance to leak memory? Anyway, This change is at a minimum good for being symmetric with generic_pipe_buf_get, it is fine to add. ] Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> [ SDR - Removed no longer used tracing_pipe_buf_release ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
dced341b2d4f06668efaab33f88de5d287c0f45b	14-Jan-2014	Steven Rostedt (Red Hat) <rostedt@goodmis.org>	tracing: Have trace buffer point back to trace_array The trace buffer has a descriptor pointer that goes back to the trace array. But it was never assigned. Luckily, nothing uses it (yet), but it will in the future. Although nothing currently uses this, if any of the new features get backported to older kernels, and because this is such a simple change, I'm marking it for stable too. Cc: stable@vger.kernel.org # v3.10+ Fixes: 12883efb670c "tracing: Consolidate max_tr into main trace_array structure" Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
098c879e1f2d6ee7afbfe959f6b04070065cec90	21-Dec-2013	Steven Rostedt (Red Hat) <rostedt@goodmis.org>	tracing: Add generic tracing_lseek() function Trace event triggers added a lseek that uses the ftrace_filter_lseek() function. Unfortunately, when function tracing is not configured in that function is not defined and the kernel fails to build. This is the second time that function was added to a file ops and it broke the build due to requiring special config dependencies. Make a generic tracing_lseek() that all the tracing utilities may use. Also, modify the old ftrace_filter_lseek() to return 0 instead of 1 on WRONLY. Not sure why it was a 1 as that does not make sense. This also changes the old tracing_seek() to modify the file pos pointer on WRONLY as well. Reported-by: kbuild test robot <fengguang.wu@intel.com> Tested-by: Tom Zanussi <tom.zanussi@linux.intel.com> Acked-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
93e31ffbf417a84fbae518fb46b3ea3f0d8fa6e1	24-Oct-2013	Tom Zanussi <tom.zanussi@linux.intel.com>	tracing: Add 'snapshot' event trigger command Add 'snapshot' event_command. snapshot event triggers are added by the user via this command in a similar way and using practically the same syntax as the analogous 'snapshot' ftrace function command, but instead of writing to the set_ftrace_filter file, the snapshot event trigger is written to the per-event 'trigger' files: echo 'snapshot' > .../somesys/someevent/trigger The above command will turn on snapshots for someevent i.e. whenever someevent is hit, a snapshot will be done. This also adds a 'count' version that limits the number of times the command will be invoked: echo 'snapshot:N' > .../somesys/someevent/trigger Where N is the number of times the command will be invoked. The above command will snapshot N times for someevent i.e. whenever someevent is hit N times, a snapshot will be done. Also adds a new tracing_alloc_snapshot() function - the existing tracing_snapshot_alloc() function is a special version of tracing_snapshot() that also does the snapshot allocation - the snapshot triggers would like to be able to do just the allocation but not take a snapshot; the existing tracing_snapshot_alloc() in turn now also calls tracing_alloc_snapshot() underneath to do that allocation. Link: http://lkml.kernel.org/r/c9524dd07ce01f9dcbd59011290e0a8d5b47d7ad.1382622043.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> [ fix up from kbuild test robot <fengguang.wu@intel.com report ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
e5137b50a0640009fd63a3e65c14bc6e1be8796a	04-Oct-2013	Peter Zijlstra <peterz@infradead.org>	ftrace, sched: Add TRACE_FLAG_PREEMPT_RESCHED Since the introduction of PREEMPT_NEED_RESCHED in: f27dde8deef3 ("sched: Add NEED_RESCHED to the preempt_count") we need to be able to look at both TIF_NEED_RESCHED and PREEMPT_NEED_RESCHED to understand the full preemption behaviour. Add it to the trace output. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Huang Ying <ying.huang@intel.com> Cc: Yuanhan Liu <yuanhan.liu@linux.intel.com> Link: http://lkml.kernel.org/r/20131004152826.GP3081@twins.programming.kicks-ass.net Signed-off-by: Ingo Molnar <mingo@kernel.org>
042b10d83d05174e50ee861ee3aca55fd6204324	06-Nov-2013	Steven Rostedt (Red Hat) <rostedt@goodmis.org>	tracing: Remove unused function ftrace_off_permanent() In the past, ftrace_off_permanent() was called if something strange was detected. But the ftrace_bug() now handles all the anomolies that can happen with ftrace (function tracing), and there are no uses of ftrace_off_permanent(). Get rid of it. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2e86421debc2cf4d1513c9b73fcd34c5ce431ae3	19-Oct-2013	Geyslan G. Bem <geyslan@gmail.com>	tracing: Add helper function tracing_is_disabled() This patch creates the function 'tracing_is_disabled', which can be used outside of trace.c. Link: http://lkml.kernel.org/r/1382141754-12155-1-git-send-email-geyslan@gmail.com Signed-off-by: Geyslan G. Bem <geyslan@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
b2f974d6af9accfec11e69cc76d2ab9f0c7359e0	23-Oct-2013	Cody P Schafer <cody@linux.vnet.ibm.com>	tracing: Open tracer when ftrace_dump_on_oops is used With ftrace_dump_on_oops, we previously did not open the tracer in question, sometimes causing the trace output to be useless. For example, the function_graph tracer with tracing_thresh set dumped via ftrace_dump_on_oops would show a series of '}' indented at different levels, but no function names. call trace->open() (and do a few other fixups copied from the normal dump path) to make the output more intelligible. Link: http://lkml.kernel.org/r/1382554197-16961-1-git-send-email-cody@linux.vnet.ibm.com Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
38de93abec8d8acd8d6dbbe9b0d92d6d5cdb3090	24-Oct-2013	Tom Zanussi <tom.zanussi@linux.intel.com>	tracing: Make register/unregister_ftrace_command __init register/unregister_ftrace_command() are only ever called from __init functions, so can themselves be made __init. Also make register_snapshot_cmd() __init for the same reason. Link: http://lkml.kernel.org/r/d4042c8cadb7ae6f843ac9a89a24e1c6a3099727.1382620672.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
f306cc82a93d6b19f01634b80c580b9755c8b7cc	24-Oct-2013	Tom Zanussi <tom.zanussi@linux.intel.com>	tracing: Update event filters for multibuffer The trace event filters are still tied to event calls rather than event files, which means you don't get what you'd expect when using filters in the multibuffer case: Before: # echo 'bytes_alloc > 8192' > /sys/kernel/debug/tracing/events/kmem/kmalloc/filter # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter bytes_alloc > 8192 # mkdir /sys/kernel/debug/tracing/instances/test1 # echo 'bytes_alloc > 2048' > /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter bytes_alloc > 2048 # cat /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter bytes_alloc > 2048 Setting the filter in tracing/instances/test1/events shouldn't affect the same event in tracing/events as it does above. After: # echo 'bytes_alloc > 8192' > /sys/kernel/debug/tracing/events/kmem/kmalloc/filter # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter bytes_alloc > 8192 # mkdir /sys/kernel/debug/tracing/instances/test1 # echo 'bytes_alloc > 2048' > /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter # cat /sys/kernel/debug/tracing/events/kmem/kmalloc/filter bytes_alloc > 8192 # cat /sys/kernel/debug/tracing/instances/test1/events/kmem/kmalloc/filter bytes_alloc > 2048 We'd like to just move the filter directly from ftrace_event_call to ftrace_event_file, but there are a couple cases that don't yet have multibuffer support and therefore have to continue using the current event_call-based filters. For those cases, a new USE_CALL_FILTER bit is added to the event_call flags, whose main purpose is to keep the old behavior for those cases until they can be updated with multibuffer support; at that point, the USE_CALL_FILTER flag (and the new associated call_filter_check_discard() function) can go away. The multibuffer support also made filter_current_check_discard() redundant, so this change removes that function as well and replaces it with filter_check_discard() (or call_filter_check_discard() as appropriate). Link: http://lkml.kernel.org/r/f16e9ce4270c62f46b2e966119225e1c3cca7e60.1382620672.git.tom.zanussi@linux.intel.com Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
057db8488b53d5e4faa0cedb2f39d4ae75dfbdbb	10-Oct-2013	Steven Rostedt <rostedt@goodmis.org>	tracing: Fix potential out-of-bounds in trace_get_user() Andrey reported the following report: ERROR: AddressSanitizer: heap-buffer-overflow on address ffff8800359c99f3 ffff8800359c99f3 is located 0 bytes to the right of 243-byte region [ffff8800359c9900, ffff8800359c99f3) Accessed by thread T13003: #0 ffffffff810dd2da (asan_report_error+0x32a/0x440) #1 ffffffff810dc6b0 (asan_check_region+0x30/0x40) #2 ffffffff810dd4d3 (__tsan_write1+0x13/0x20) #3 ffffffff811cd19e (ftrace_regex_release+0x1be/0x260) #4 ffffffff812a1065 (__fput+0x155/0x360) #5 ffffffff812a12de (____fput+0x1e/0x30) #6 ffffffff8111708d (task_work_run+0x10d/0x140) #7 ffffffff810ea043 (do_exit+0x433/0x11f0) #8 ffffffff810eaee4 (do_group_exit+0x84/0x130) #9 ffffffff810eafb1 (SyS_exit_group+0x21/0x30) #10 ffffffff81928782 (system_call_fastpath+0x16/0x1b) Allocated by thread T5167: #0 ffffffff810dc778 (asan_slab_alloc+0x48/0xc0) #1 ffffffff8128337c (__kmalloc+0xbc/0x500) #2 ffffffff811d9d54 (trace_parser_get_init+0x34/0x90) #3 ffffffff811cd7b3 (ftrace_regex_open+0x83/0x2e0) #4 ffffffff811cda7d (ftrace_filter_open+0x2d/0x40) #5 ffffffff8129b4ff (do_dentry_open+0x32f/0x430) #6 ffffffff8129b668 (finish_open+0x68/0xa0) #7 ffffffff812b66ac (do_last+0xb8c/0x1710) #8 ffffffff812b7350 (path_openat+0x120/0xb50) #9 ffffffff812b8884 (do_filp_open+0x54/0xb0) #10 ffffffff8129d36c (do_sys_open+0x1ac/0x2c0) #11 ffffffff8129d4b7 (SyS_open+0x37/0x50) #12 ffffffff81928782 (system_call_fastpath+0x16/0x1b) Shadow bytes around the buggy address: ffff8800359c9700: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd ffff8800359c9780: fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa fa ffff8800359c9800: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa ffff8800359c9880: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa ffff8800359c9900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 =>ffff8800359c9980: 00 00 00 00 00 00 00 00 00 00 00 00 00 00[03]fb ffff8800359c9a00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa ffff8800359c9a80: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa ffff8800359c9b00: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00 ffff8800359c9b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffff8800359c9c00: 00 00 00 00 00 00 00 00 fa fa fa fa fa fa fa fa Shadow byte legend (one shadow byte represents 8 application bytes): Addressable: 00 Partially addressable: 01 02 03 04 05 06 07 Heap redzone: fa Heap kmalloc redzone: fb Freed heap region: fd Shadow gap: fe The out-of-bounds access happens on 'parser->buffer[parser->idx] = 0;' Although the crash happened in ftrace_regex_open() the real bug occurred in trace_get_user() where there's an incrementation to parser->idx without a check against the size. The way it is triggered is if userspace sends in 128 characters (EVENT_BUF_SIZE + 1), the loop that reads the last character stores it and then breaks out because there is no more characters. Then the last character is read to determine what to do next, and the index is incremented without checking size. Then the caller of trace_get_user() usually nulls out the last character with a zero, but since the index is equal to the size, it writes a nul character after the allocated space, which can corrupt memory. Luckily, only root user has write access to this file. Link: http://lkml.kernel.org/r/20131009222323.04fd1a0d@gandalf.local.home Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
b9be6d026d327593784b042aab4fa27e2de9c825	13-Sep-2013	Wang YanQing <udknight@gmail.com>	tracing: Show more exact help information about snapshot The current "help" that comes out of the snapshot file when it is not allocated looks like this: # * Snapshot is freed * # # Snapshot commands: # echo 0 > snapshot : Clears and frees snapshot buffer # echo 1 > snapshot : Allocates snapshot buffer, if not already allocated. # Takes a snapshot of the main buffer. # echo 2 > snapshot : Clears snapshot buffer (but does not allocate) # (Doesn't have to be '2' works with any number that # is not a '0' or '1') Echo 2 says that it does not allocate the buffer, which is correct, but to be more consistent with "echo 0" it should also state that it does not free. Link: http://lkml.kernel.org/r/20130914045916.GA4243@udknight Signed-off-by: Wang YanQing <udknight@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
ccfe9e42e451232dd17a230d1b4e979c3d15311e	08-Aug-2013	Alexander Z Lam <azl@google.com>	tracing: Make tracing_cpumask available for all instances Allow tracer instances to disable tracing by cpu by moving the static global tracing_cpumask into trace_array. Link: http://lkml.kernel.org/r/921622317f239bfc2283cac2242647801ef584f2.1375980149.git.azl@google.com Cc: Vaibhav Nagarnaik <vnagarnaik@google.com> Cc: David Sharp <dhsharp@google.com> Cc: Alexander Z Lam <lambchop468@gmail.com> Signed-off-by: Alexander Z Lam <azl@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
9457158bbc0ee04ecef76862d73eecd8076e9c7b	03-Aug-2013	Alexander Z Lam <azl@google.com>	tracing: Fix reset of time stamps during trace_clock changes Fixed two issues with changing the timestamp clock with trace_clock: - The global buffer was reset on instance clock changes. Change this to pass the correct per-instance buffer - ftrace_now() is used to set buf->time_start in tracing_reset_online_cpus(). This was incorrect because ftrace_now() used the global buffer's clock to return the current time. Change this to use buffer_ftrace_now() which returns the current time for the correct per-instance buffer. Also removed tracing_reset_current() because it is not used anywhere Link: http://lkml.kernel.org/r/1375493777-17261-2-git-send-email-azl@google.com Cc: Vaibhav Nagarnaik <vnagarnaik@google.com> Cc: David Sharp <dhsharp@google.com> Cc: Alexander Z Lam <lambchop468@gmail.com> Cc: stable@vger.kernel.org # 3.10 Signed-off-by: Alexander Z Lam <azl@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
711e124379e0f889e40e2f01d7f5d61936d3cd23	03-Aug-2013	Alexander Z Lam <azl@google.com>	tracing: Make TRACE_ITER_STOP_ON_FREE stop the correct buffer Releasing the free_buffer file in an instance causes the global buffer to be stopped when TRACE_ITER_STOP_ON_FREE is enabled. Operate on the correct buffer. Link: http://lkml.kernel.org/r/1375493777-17261-1-git-send-email-azl@google.com Cc: Vaibhav Nagarnaik <vnagarnaik@google.com> Cc: David Sharp <dhsharp@google.com> Cc: Alexander Z Lam <lambchop468@gmail.com> Cc: stable@vger.kernel.org # 3.10 Signed-off-by: Alexander Z Lam <azl@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
ed5467da0e369e65b247b99eb6403cb79172bcda	02-Aug-2013	Andrew Vagin <avagin@openvz.org>	tracing: Fix fields of struct trace_iterator that are zeroed by mistake tracing_read_pipe zeros all fields bellow "seq". The declaration contains a comment about that, but it doesn't help. The first field is "snapshot", it's true when current open file is snapshot. Looks obvious, that it should not be zeroed. The second field is "started". It was converted from cpumask_t to cpumask_var_t (v2.6.28-4983-g4462344), in other words it was converted from cpumask to pointer on cpumask. Currently the reference on "started" memory is lost after the first read from tracing_read_pipe and a proper object will never be freed. The "started" is never dereferenced for trace_pipe, because trace_pipe can't have the TRACE_FILE_ANNOTATE options. Link: http://lkml.kernel.org/r/1375463803-3085183-1-git-send-email-avagin@openvz.org Cc: stable@vger.kernel.org # 2.6.30 Signed-off-by: Andrew Vagin <avagin@openvz.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
09d8091c024ec88d1541d93eb8ddb2bd5cf10c39	24-Jul-2013	Steven Rostedt (Red Hat) <rostedt@goodmis.org>	tracing: Remove locking trace_types_lock from tracing_reset_all_online_cpus() Commit a82274151af "tracing: Protect ftrace_trace_arrays list in trace_events.c" added taking the trace_types_lock mutex in trace_events.c as there were several locations that needed it for protection. Unfortunately, it also encapsulated a call to tracing_reset_all_online_cpus() which also takes the trace_types_lock, causing a deadlock. This happens when a module has tracepoints and has been traced. When the module is removed, the trace events module notifier will grab the trace_types_lock, do a bunch of clean ups, and also clears the buffer by calling tracing_reset_all_online_cpus. This doesn't happen often which explains why it wasn't caught right away. Commit a82274151af was marked for stable, which means this must be sent to stable too. Link: http://lkml.kernel.org/r/51EEC646.7070306@broadcom.com Reported-by: Arend van Spril <arend@broadcom.com> Tested-by: Arend van Spriel <arend@broadcom.com> Cc: Alexander Z Lam <azl@google.com> Cc: Vaibhav Nagarnaik <vnagarnaik@google.com> Cc: David Sharp <dhsharp@google.com> Cc: stable@vger.kernel.org # 3.10 Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
9c01fe4593db123c5a72dc36f0400f776e92c954	23-Jul-2013	Oleg Nesterov <oleg@redhat.com>	tracing: Kill trace_cpu struct/members After the previous changes trace_array_cpu->trace_cpu and trace_array->trace_cpu becomes write-only. Remove these members and kill "struct trace_cpu" as well. As a side effect this also removes memset(per_cpu_memory, 0). It was not needed, alloc_percpu() returns zero-filled memory. Link: http://lkml.kernel.org/r/20130723152613.GA23741@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
6484c71cbc170634fa131b6d022d86d61686b88b	23-Jul-2013	Oleg Nesterov <oleg@redhat.com>	tracing: Change tracing_fops/snapshot_fops to rely on tracing_get_cpu() tracing_open() and tracing_snapshot_open() are racy, the memory inode->i_private points to can be already freed. Convert these last users of "inode->i_private == trace_cpu" to use "i_private = trace_array" and rely on tracing_get_cpu(). v2: incorporate the fix from Steven, tracing_release() must not blindly dereference file->private_data unless we know that the file was opened for reading. Link: http://lkml.kernel.org/r/20130723152610.GA23737@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
0bc392ee46d0fd8e6b678457ef71f074f19a03c5	23-Jul-2013	Oleg Nesterov <oleg@redhat.com>	tracing: Change tracing_entries_fops to rely on tracing_get_cpu() tracing_open_generic_tc() is racy, the memory inode->i_private points to can be already freed. 1. Change its last user, tracing_entries_fops, to use tracing_*_generic_tr() instead. 2. Change debugfs_create_file("buffer_size_kb", data) callers to pass "data = tr". 3. Change tracing_entries_read() and tracing_entries_write() to use tracing_get_cpu(). 4. Kill the no longer used tracing_open_generic_tc() and tracing_release_generic_tc(). Link: http://lkml.kernel.org/r/20130723152606.GA23730@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
4d3435b8a4c3357695e09c5e7a3bf73a19fca5b0	23-Jul-2013	Oleg Nesterov <oleg@redhat.com>	tracing: Change tracing_stats_fops to rely on tracing_get_cpu() tracing_open_generic_tc() is racy, the memory inode->i_private points to can be already freed. 1. Change one of its users, tracing_stats_fops, to use tracing_*_generic_tr() instead. 2. Change trace_create_cpu_file("stats", data) to pass "data = tr". 3. Change tracing_stats_read() to use tracing_get_cpu(). Link: http://lkml.kernel.org/r/20130723152603.GA23727@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
46ef2be0d1d5ccea0c41bb606143586daadd537c	23-Jul-2013	Oleg Nesterov <oleg@redhat.com>	tracing: Change tracing_buffers_fops to rely on tracing_get_cpu() tracing_buffers_open() is racy, the memory inode->i_private points to can be already freed. Change debugfs_create_file("trace_pipe_raw", data) caller to pass "data = tr", tracing_buffers_open() can use tracing_get_cpu(). Change debugfs_create_file("snapshot_raw_fops", data) caller too, this file uses tracing_buffers_open/release. Link: http://lkml.kernel.org/r/20130723152600.GA23720@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
15544209cb0b5312e5220a9337a1fe61d1a1f2d9	23-Jul-2013	Oleg Nesterov <oleg@redhat.com>	tracing: Change tracing_pipe_fops() to rely on tracing_get_cpu() tracing_open_pipe() is racy, the memory inode->i_private points to can be already freed. Change debugfs_create_file("trace_pipe", data) callers to to pass "data = tr", tracing_open_pipe() can use tracing_get_cpu(). Link: http://lkml.kernel.org/r/20130723152557.GA23717@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
649e9c70da6bfbeb563193a35d3424a5aa7c0d38	23-Jul-2013	Oleg Nesterov <oleg@redhat.com>	tracing: Introduce trace_create_cpu_file() and tracing_get_cpu() Every "file_operations" used by tracing_init_debugfs_percpu is buggy. f_op->open/etc does: 1. struct trace_cpu tc = inode->i_private; struct trace_array tr = tc->tr; 2. trace_array_get(tr) or fail; 3. do_something(tc); But tc (and tr) can be already freed before trace_array_get() is called. And it doesn't matter whether this file is per-cpu or it was created by init_tracer_debugfs(), free_percpu() or kfree() are equally bad. Note that even 1. is not safe, the freed memory can be unmapped. But even if it was safe trace_array_get() can wrongly succeed if we also race with the next new_instance_create() which can re-allocate the same tr, or tc was overwritten and ->tr points to the valid tr. In this case 3. uses the freed/reused memory. Add the new trivial helper, trace_create_cpu_file() which simply calls trace_create_file() and encodes "cpu" in "struct inode". Another helper, tracing_get_cpu() will be used to read cpu_nr-or-RING_BUFFER_ALL_CPUS. The patch abuses ->i_cdev to encode the number, it is never used unless the file is S_ISCHR(). But we could use something else, say, i_bytes or even ->d_fsdata. In any case this hack is hidden inside these 2 helpers, it would be trivial to change them if needed. This patch only changes tracing_init_debugfs_percpu() to use the new trace_create_cpu_file(), the next patches will change file_operations. Note: tracing_get_cpu(inode) is always safe but you can't trust the result unless trace_array_get() was called, without trace_types_lock which acts as a barrier it can wrongly return RING_BUFFER_ALL_CPUS. Link: http://lkml.kernel.org/r/20130723152554.GA23710@redhat.com Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
e70e78e3c83b536730e31231dd9b979768d8df3c	19-Jul-2013	Oleg Nesterov <oleg@redhat.com>	tracing: Kill the unbalanced tr->ref++ in tracing_buffers_open() tracing_buffers_open() does trace_array_get() and then it wrongly inrcements tr->ref again under trace_types_lock. This means that every caller leaks trace_array: # cd /sys/kernel/debug/tracing/ # mkdir instances/X # true < instances/X/per_cpu/cpu0/trace_pipe_raw # rmdir instances/X rmdir: failed to remove `instances/X': Device or resource busy Link: http://lkml.kernel.org/r/20130719153644.GA18899@redhat.com Cc: Ingo Molnar <mingo@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: stable@vger.kernel.org # 3.10 Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
f77d09a384676bde6445413949d9d2c508ff3e62	18-Jul-2013	Alexander Z Lam <azl@google.com>	tracing: Miscellaneous fixes for trace_array ref counting Some error paths did not handle ref counting properly, and some trace files need ref counting. Link: http://lkml.kernel.org/r/1374171524-11948-1-git-send-email-azl@google.com Cc: stable@vger.kernel.org # 3.10 Cc: Vaibhav Nagarnaik <vnagarnaik@google.com> Cc: David Sharp <dhsharp@google.com> Cc: Alexander Z Lam <lambchop468@gmail.com> Signed-off-by: Alexander Z Lam <azl@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
609e85a70bcd0eedf4ec60639dbcfb1ab011e054	11-Jul-2013	Alexander Z Lam <azl@google.com>	tracing: Fix error handling to ensure instances can always be removed Remove debugfs directories for tracing instances during creation if an error occurs causing the trace_array for that instance to not be added to ftrace_trace_arrays. If the directory continues to exist after the error, it cannot be removed because the respective trace_array is not in ftrace_trace_arrays. Link: http://lkml.kernel.org/r/1373502874-1706-2-git-send-email-azl@google.com Cc: stable@vger.kernel.org # 3.10 Cc: Vaibhav Nagarnaik <vnagarnaik@google.com> Cc: David Sharp <dhsharp@google.com> Cc: Alexander Z Lam <lambchop468@gmail.com> Signed-off-by: Alexander Z Lam <azl@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
991821c86c2fb6cc4104ce679247864dbc070a83	15-Jul-2013	zhangwei(Jovi) <jovi.zhangwei@huawei.com>	tracing: Use correct config guard CONFIG_STACK_TRACER We should use CONFIG_STACK_TRACER to guard readme text of stack tracer related file, not CONFIG_STACKTRACE. Link: http://lkml.kernel.org/r/51E3B3A2.8080609@huawei.com Signed-off-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
dcc302232c1f9b3ca16f6b8ee190eb0b1a8a0da3	03-Jul-2013	Steven Rostedt (Red Hat) <rostedt@goodmis.org>	tracing: Make tracing_open_generic_{tr,tc}() static I have patches that will use tracing_open_generic_tr/tc() in other files, but as they are not ready to be merged yet, and Fengguang Wu's sparse scripts pointed out that these functions were not declared anywhere, I'll make them static for now. When these functions are required to be used elsewhere, I'll remove the static then. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
8de1eb02778b64f8b292db531cf39a429f84315f	10-Apr-2013	zhangwei(Jovi) <jovi.zhangwei@huawei.com>	tracing: Remove ftrace() function The only caller of function ftrace(...) was removed a long time ago, so remove the function body as well. Link: http://lkml.kernel.org/r/1365564393-10972-10-git-send-email-jovi.zhangwei@huawei.com Signed-off-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
5280bcef91e706770cc1706eb97353e3513322b9	03-Jul-2013	Steven Rostedt (Red Hat) <rostedt@goodmis.org>	tracing: Make tracer_tracing_{off,on,is_on}() static I have patches that will use tracer_tracing_on/off/is_on() in other files, but as they are not ready to be merged yet, and Fengguang Wu's sparse scripts pointed out that these functions were not declared anywhere, I'll make them static for now. When these functions are required to be used elsewhere, I'll remove the static then. Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
7b85af63034818e43aee6c1d7bf1c7c6796a9073	02-Jul-2013	Steven Rostedt (Red Hat) <rostedt@goodmis.org>	tracing: Get trace_array ref counts when accessing trace files When a trace file is opened that may access a trace array, it must increment its ref count to prevent it from being deleted. Cc: stable@vger.kernel.org # 3.10 Reported-by: Alexander Lam <azl@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
ff451961a8b2a17667a7bfa39c86fb9b351445db	02-Jul-2013	Steven Rostedt (Red Hat) <rostedt@goodmis.org>	tracing: Add trace_array_get/put() to handle instance refs better Commit a695cb58162 "tracing: Prevent deleting instances when they are being read" tried to fix a race between deleting a trace instance and reading contents of a trace file. But it wasn't good enough. The following could crash the kernel: # cd /sys/kernel/debug/tracing/instances # ( while :; do mkdir foo; rmdir foo; done ) & # ( while :; do cat foo/trace &> /dev/null; done ) & Luckily this can only be done by root user, but it should be fixed regardless. The problem is that a delete of the file can happen after the reader starts to open the file but before it grabs the trace_types_mutex. The solution is to validate the trace array before using it. If the trace array does not exist in the list of trace arrays, then it returns -ENODEV. There's a possibility that a trace_array could be deleted and a new one created and the open would open its file instead. But that is very minor as it will just return the data of the new trace array, it may confuse the user but it will not crash the system. As this can only be done by root anyway, the race will only occur if root is deleting what its trying to read at the same time. Cc: stable@vger.kernel.org # 3.10 Reported-by: Alexander Lam <azl@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
a82274151af2b075163e3c42c828529dee311487	02-Jul-2013	Alexander Z Lam <azl@google.com>	tracing: Protect ftrace_trace_arrays list in trace_events.c There are multiple places where the ftrace_trace_arrays list is accessed in trace_events.c without the trace_types_lock held. Link: http://lkml.kernel.org/r/1372732674-22726-1-git-send-email-azl@google.com Cc: Vaibhav Nagarnaik <vnagarnaik@google.com> Cc: David Sharp <dhsharp@google.com> Cc: Alexander Z Lam <lambchop468@gmail.com> Cc: stable@vger.kernel.org # 3.10 Signed-off-by: Alexander Z Lam <azl@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2d71619c59fac95a5415a326162fa046161b938c	02-Jul-2013	Alexander Z Lam <azl@google.com>	tracing: Make trace_marker use the correct per-instance buffer The trace_marker file was present for each new instance created, but it added the trace mark to the global trace buffer instead of to the instance's buffer. Link: http://lkml.kernel.org/r/1372717885-4543-2-git-send-email-azl@google.com Cc: David Sharp <dhsharp@google.com> Cc: Vaibhav Nagarnaik <vnagarnaik@google.com> Cc: Alexander Z Lam <lambchop468@gmail.com> Cc: stable@vger.kernel.org # 3.10 Signed-off-by: Alexander Z Lam <azl@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
10246fa35d4ffdfe472185d4cbf9c2dfd9a9f023	01-Jul-2013	Steven Rostedt (Red Hat) <rostedt@goodmis.org>	tracing: Use flag buffer_disabled for irqsoff tracer If the ring buffer is disabled and the irqsoff tracer records a trace it will clear out its buffer and lose the data it had previously recorded. Currently there's a callback when writing to the tracing_of file, but if tracing is disabled via the function tracer trigger, it will not inform the irqsoff tracer to stop recording. By using the "mirror" flag (buffer_disabled) in the trace_array, that keeps track of the status of the trace_array's buffer, it gives the irqsoff tracer a fast way to know if it should record a new trace or not. The flag may be a little behind the real state of the buffer, but it should not affect the trace too much. It's more important for the irqsoff tracer to be fast. Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
de7edd31457b626e54a0b2a7e8ff4d65492f01ad	14-Jun-2013	Steven Rostedt (Red Hat) <rostedt@goodmis.org>	tracing: Disable tracing on warning Add a traceoff_on_warning option in both the kernel command line as well as a sysctl option. When set, any WARN*() function that is hit will cause the tracing_on variable to be cleared, which disables writing to the ring buffer. This is useful especially when tracing a bug with function tracing. When a warning is hit, the print caused by the warning can flood the trace with the functions that producing the output for the warning. This can make the resulting trace useless by either hiding where the bug happened, or worse, by overflowing the buffer and losing the trace of the bug totally. Acked-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
238ae93d699d59876b470bf6455de22bcfaa9a1b	26-May-2013	Wang YanQing <udknight@gmail.com>	tracing: Fix file mode of free_buffer Commit 4f271a2a60c748599b30bb4dafff30d770439b96 (tracing: Add a proc file to stop tracing and free buffer) implement a method to free up ring buffer in kernel memory in the release code path of free_buffer's fd. Then we don't need read/write support for free_buffer, indeed we just have a dummy write fop, and don't implement read fop. So the 0200 is more reasonable file mode for free_buffer than the current file mode 0644. Link: http://lkml.kernel.org/r/20130526085201.GA3183@udknight Acked-by: Vaibhav Nagarnaik <vnagarnaik@google.com> Acked-by: David Sharp <dhsharp@google.com> Signed-off-by: Wang YanQing <udknight@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
58e8eedf18577c7eac722d5d1f190507ea263d1b	23-Apr-2013	Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>	tracing: Fix outputting formats of x86-tsc and counter when use trace_clock Outputting formats of x86-tsc and counter should be a raw format, but after applying the patch(2b6080f28c7cc3efc8625ab71495aae89aeb63a0), the format was changed to nanosec. This is because the global variable trace_clock_id was used. When we use multiple buffers, clock_id of each sub-buffer should be used. Then, this patch uses tr->clock_id instead of the global variable trace_clock_id. [ Basically, this fixes a regression where the multibuffer code changed the trace_clock file to update tr->clock_id but the traces still use the old global trace_clock_id variable, negating the file's effect. The global trace_clock_id variable is obsolete and removed. - SR ] Link: http://lkml.kernel.org/r/20130423013239.22334.7394.stgit@yunodevel Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
f17a5194859a82afe4164e938b92035b86c55794	31-May-2013	Steven Rostedt (Red Hat) <rostedt@goodmis.org>	tracing: Use current_uid() for critical time tracing The irqsoff tracer records the max time that interrupts are disabled. There are hooks in the assembly code that calls back into the tracer when interrupts are disabled or enabled. When they are enabled, the tracer checks if the amount of time they were disabled is larger than the previous recorded max interrupts off time. If it is, it creates a snapshot of the currently running trace to store where the last largest interrupts off time was held and how it happened. During testing, this RCU lockdep dump appeared: [ 1257.829021] =============================== [ 1257.829021] [ INFO: suspicious RCU usage. ] [ 1257.829021] 3.10.0-rc1-test+ #171 Tainted: G W [ 1257.829021] ------------------------------- [ 1257.829021] /home/rostedt/work/git/linux-trace.git/include/linux/rcupdate.h:780 rcu_read_lock() used illegally while idle! [ 1257.829021] [ 1257.829021] other info that might help us debug this: [ 1257.829021] [ 1257.829021] [ 1257.829021] RCU used illegally from idle CPU! [ 1257.829021] rcu_scheduler_active = 1, debug_locks = 0 [ 1257.829021] RCU used illegally from extended quiescent state! [ 1257.829021] 2 locks held by trace-cmd/4831: [ 1257.829021] #0: (max_trace_lock){......}, at: [<ffffffff810e2b77>] stop_critical_timing+0x1a3/0x209 [ 1257.829021] #1: (rcu_read_lock){.+.+..}, at: [<ffffffff810dae5a>] __update_max_tr+0x88/0x1ee [ 1257.829021] [ 1257.829021] stack backtrace: [ 1257.829021] CPU: 3 PID: 4831 Comm: trace-cmd Tainted: G W 3.10.0-rc1-test+ #171 [ 1257.829021] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M., BIOS SDBLI944.86P 05/08/2007 [ 1257.829021] 0000000000000001 ffff880065f49da8 ffffffff8153dd2b ffff880065f49dd8 [ 1257.829021] ffffffff81092a00 ffff88006bd78680 ffff88007add7500 0000000000000003 [ 1257.829021] ffff88006bd78680 ffff880065f49e18 ffffffff810daebf ffffffff810dae5a [ 1257.829021] Call Trace: [ 1257.829021] [<ffffffff8153dd2b>] dump_stack+0x19/0x1b [ 1257.829021] [<ffffffff81092a00>] lockdep_rcu_suspicious+0x109/0x112 [ 1257.829021] [<ffffffff810daebf>] __update_max_tr+0xed/0x1ee [ 1257.829021] [<ffffffff810dae5a>] ? __update_max_tr+0x88/0x1ee [ 1257.829021] [<ffffffff811002b9>] ? user_enter+0xfd/0x107 [ 1257.829021] [<ffffffff810dbf85>] update_max_tr_single+0x11d/0x12d [ 1257.829021] [<ffffffff811002b9>] ? user_enter+0xfd/0x107 [ 1257.829021] [<ffffffff810e2b15>] stop_critical_timing+0x141/0x209 [ 1257.829021] [<ffffffff8109569a>] ? trace_hardirqs_on+0xd/0xf [ 1257.829021] [<ffffffff811002b9>] ? user_enter+0xfd/0x107 [ 1257.829021] [<ffffffff810e3057>] time_hardirqs_on+0x2a/0x2f [ 1257.829021] [<ffffffff811002b9>] ? user_enter+0xfd/0x107 [ 1257.829021] [<ffffffff8109550c>] trace_hardirqs_on_caller+0x16/0x197 [ 1257.829021] [<ffffffff8109569a>] trace_hardirqs_on+0xd/0xf [ 1257.829021] [<ffffffff811002b9>] user_enter+0xfd/0x107 [ 1257.829021] [<ffffffff810029b4>] do_notify_resume+0x92/0x97 [ 1257.829021] [<ffffffff8154bdca>] int_signal+0x12/0x17 What happened was entering into the user code, the interrupts were enabled and a max interrupts off was recorded. The trace buffer was saved along with various information about the task: comm, pid, uid, priority, etc. The uid is recorded with task_uid(tsk). But this is a macro that uses rcu_read_lock() to retrieve the data, and this happened to happen where RCU is blind (user_enter). As only the preempt and irqs off tracers can have this happen, and they both only have the tsk == current, if tsk == current, use current_uid() instead of task_uid(), as current_uid() does not use RCU as only current can change its uid. This fixes the RCU suspicious splat. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
ca1643186d3dce6171d8f171e516b02496360a9e	23-May-2013	Steven Rostedt (Red Hat) <rostedt@goodmis.org>	tracing: Fix crash when ftrace=nop on the kernel command line If ftrace=<tracer> is on the kernel command line, when that tracer is registered, it will be initiated by tracing_set_tracer() to execute that tracer. The nop tracer is just a stub tracer that is used to have no tracer enabled. It is assigned at early bootup as it is the default tracer. But if ftrace=nop is on the kernel command line, the registering of the nop tracer will call tracing_set_tracer() which will try to execute the nop tracer. But it expects tr->current_trace to be assigned something as it usually is assigned to the nop tracer. As it hasn't been assigned to anything yet, it causes the system to crash. The simple fix is to move the tr->current_trace = nop before registering the nop tracer. The functionality is still the same as the nop tracer doesn't do anything anyway. Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
6c24499f40d96bf07a85b709fb1bee5cea611a1d	30-Apr-2013	Steven Rostedt <rostedt@goodmis.org>	tracing: Fix small merge bug During the 3.10 merge, a conflict happened and the resolution was almost, but not quite, correct. An if statement was reversed. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> [ Duh. That was just silly of me - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ed6f1c996bfe4b6e520cf7a74b51cd6988d84420	10-Apr-2013	Namhyung Kim <namhyung.kim@lge.com>	tracing: Check return value of tracing_init_dentry() Check return value and bail out if it's NULL. Link: http://lkml.kernel.org/r/1365553093-10180-2-git-send-email-namhyung@kernel.org Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: stable@vger.kernel.org Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
9607a869ee59594f3f7b9f3ac43a11d92bf3f960	07-Apr-2013	Chen Gang <gang.chen@asianux.com>	kernel: tracing: Use strlcpy instead of strncpy Use strlcpy() instead of strncpy() as it will always add a '\0' to the end of the string even if the buffer is smaller than what is being copied. Link: http://lkml.kernel.org/r/51624254.30301@asianux.com Signed-off-by: Chen Gang <gang.chen@asianux.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2930e04d00e113ae24bb2b7c2b58de7b648a62c7	26-Mar-2013	Steven Rostedt (Red Hat) <rostedt@goodmis.org>	tracing: Fix race with update_max_tr_single and changing tracers The commit 34600f0e9 "tracing: Fix race with max_tr and changing tracers" fixed the updating of the main buffers with the race of changing tracers, but left out the fix to the updating of just a per cpu buffer. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
67012ab1d2ce871afea4ee55408f233f97d09d07	07-Apr-2013	Chen Gang <gang.chen@asianux.com>	perf: Fix strncpy() use, use strlcpy() instead of strncpy() For NUL terminated string we always need to set '\0' at the end. Signed-off-by: Chen Gang <gang.chen@asianux.com> Cc: rostedt@goodmis.org Cc: Frederic Weisbecker <fweisbec@gmail.com> Link: http://lkml.kernel.org/r/51624254.30301@asianux.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
22f45649ce08642ad7df238d5c25fa5c86bfdd31	15-Mar-2013	Steven Rostedt (Red Hat) <rostedt@goodmis.org>	tracing: Update debugfs README file Update the README file in debugfs/tracing to something more useful. What's currently in the file is very old and what it shows doesn't have much use. Heck, it tells you how to mount debugfs! But to read this file you would have already needed to mount it. Replace the file with current up-to-date information. It's rather limited, but what do you expect from a pseudo README file. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
7fe70b579c9e3daba71635e31b6189394e7b79d3	15-Mar-2013	Steven Rostedt (Red Hat) <rostedt@goodmis.org>	tracing: Fix ftrace_dump() ftrace_dump() had a lot of issues. What ftrace_dump() does, is when ftrace_dump_on_oops is set (via a kernel parameter or sysctl), it will dump out the ftrace buffers to the console when either a oops, panic, or a sysrq-z occurs. This was written a long time ago when ftrace was fragile to recursion. But it wasn't written well even for that. There's a possible deadlock that can occur if a ftrace_dump() is happening and an NMI triggers another dump. This is because it grabs a lock before checking if the dump ran. It also totally disables ftrace, and tracing for no good reasons. As the ring_buffer now checks if it is read via a oops or NMI, where there's a chance that the buffer gets corrupted, it will disable itself. No need to have ftrace_dump() do the same. ftrace_dump() is now cleaned up where it uses an atomic counter to make sure only one dump happens at a time. A simple atomic_inc_return() is enough that is needed for both other CPUs and NMIs. No need for a spinlock, as if one CPU is running the dump, no other CPU needs to do it too. The tracing_on variable is turned off and not turned on. The original code did this, but it wasn't pretty. By just disabling this variable we get the result of not seeing traces that happen between crashes. For sysrq-z, it doesn't get turned on, but the user can always write a '1' to the tracing_on file. If they are using sysrq-z, then they should know about tracing_on. The new code is much easier to read and less error prone. No more deadlock possibility when an NMI triggers here. Reported-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com> Cc: stable@vger.kernel.org Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
bd6df18716fa45bc4aa9587aca033de909e5382b	11-Mar-2013	zhangwei(Jovi) <jovi.zhangwei@huawei.com>	tracing: Use TRACE_MAX_PRINT instead of constant TRACE_MAX_PRINT macro is defined, but is not used. Link: http://lkml.kernel.org/r/513D8421.4070404@huawei.com Signed-off-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
687c878afb526a0c3117dbc408ca76ad80d689f7	11-Mar-2013	zhangwei(Jovi) <jovi.zhangwei@huawei.com>	tracing: Use pr_warn_once instead of open coded implementation Use pr_warn_once, instead of making an open coded implementation. Link: http://lkml.kernel.org/r/513D8419.20400@huawei.com Signed-off-by: zhangwei(Jovi) <jovi.zhangwei@huawei.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
76f119179b8ce3188a8c61d2486d37810a416655	14-Mar-2013	Steven Rostedt (Red Hat) <rostedt@goodmis.org>	tracing: Add "perf" trace_clock The function trace_clock() calls "local_clock()" which is exactly the same clock that perf uses. I'm not sure why perf doesn't call trace_clock(), as trace_clock() doesn't have any users. But now it does. As trace_clock() calls local_clock() like perf does, I added the trace_clock "perf" option that uses trace_clock(). Now the ftrace buffers can use the same clock as perf uses. This will be useful when perf starts reading the ftrace buffers, and will be able to interleave them with the same clock data. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
8aacf017b065a805d27467843490c976835eb4a5	14-Mar-2013	Steven Rostedt (Red Hat) <rostedt@goodmis.org>	tracing: Add "uptime" trace clock that uses jiffies Add a simple trace clock called "uptime" for those that are interested in the uptime of the trace. It uses jiffies as that's the safest method, as other uptime clocks grab seq locks, which could cause a deadlock if taken from an event or function tracer. Requested-by: Mauro Carvalho Chehab <mchehab@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
328df4759c03e2c3e7429cc6cb0e180c38f32063	14-Mar-2013	Steven Rostedt (Red Hat) <rostedt@goodmis.org>	tracing: Add function-trace option to disable function tracing of latency tracers Currently, the only way to stop the latency tracers from doing function tracing is to fully disable the function tracer from the proc file system: echo 0 > /proc/sys/kernel/ftrace_enabled This is a big hammer approach as it disables function tracing for all users. This includes kprobes, perf, stack tracer, etc. Instead, create a function-trace option that the latency tracers can check to determine if it should enable function tracing or not. This option can be set or cleared even while the tracer is active and the tracers will disable or enable function tracing depending on how the option was set. Instead of using the proc file, disable latency function tracing with echo 0 > /debug/tracing/options/function-trace Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Clark Williams <williams@redhat.com> Cc: John Kacur <jkacur@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
c142be8ebe0b7bf73c8a0063925623f3e4b980c0	13-Mar-2013	Steven Rostedt (Red Hat) <rostedt@goodmis.org>	tracing: Add skip argument to trace_dump_stack() Altough the trace_dump_stack() already skips three functions in the call to stack trace, which gets the stack trace to start at the caller of the function, the caller may want to skip some more too (as it may have helper functions). Add a skip argument to the trace_dump_stack() that lets the caller skip back tracing functions that it doesn't care about. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
77fd5c15e3216b901be69047ca43b05ae9099951	12-Mar-2013	Steven Rostedt (Red Hat) <rostedt@goodmis.org>	tracing: Add snapshot trigger to function probes echo 'schedule:snapshot:1' > /debug/tracing/set_ftrace_filter This will cause the scheduler to trigger a snapshot the next time it's called (you can use any function that's not called by NMI). Even though it triggers only once, you still need to remove it with: echo '!schedule:snapshot:0' > /debug/tracing/set_ftrace_filter The :1 can be left off for the first command: echo 'schedule:snapshot' > /debug/tracing/set_ftrace_filter But this will cause all calls to schedule to trigger a snapshot. This must be removed without the ':0' echo '!schedule:snapshot' > /debug/tracing/set_ftrace_filter As adding a "count" is a different operation (internally). Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
3209cff4490bee55fd2bc1d087cb8ecf2a686a88	12-Mar-2013	Steven Rostedt (Red Hat) <rostedt@goodmis.org>	tracing: Add alloc/free_snapshot() to replace duplicate code Add alloc_snapshot() and free_snapshot() to allocate and free the snapshot buffer respectively, and use these to remove duplicate code. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
1b22e382ab40b0e3ee5abb3e310dffb16fee22aa	09-Mar-2013	Steven Rostedt (Red Hat) <rostedt@goodmis.org>	tracing: Let tracing_snapshot() be used by modules but not NMI Add EXPORT_SYMBOL_GPL() to let the tracing_snapshot() functions be called from modules. Also add a test to see if the snapshot was called from NMI context and just warn in the tracing buffer if so, and return. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
ca268da6e415448a43138e1abc5d5f057af319d7	09-Mar-2013	Steven Rostedt (Red Hat) <rostedt@goodmis.org>	tracing: Add internal ftrace trace_puts() for ftrace to use There's a few places that ftrace uses trace_printk() for internal use, but this requires context (normal, softirq, irq, NMI) buffers to keep things lockless. But the trace_puts() does not, as it can write the string directly into the ring buffer. Make a internal helper for trace_puts() and have the internal functions use that. This way the extra context buffers are not used. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
09ae72348eccb60e304cf8ce94653f4a78fcd407	09-Mar-2013	Steven Rostedt (Red Hat) <rostedt@goodmis.org>	tracing: Add trace_puts() for even faster trace_printk() tracing The trace_printk() is extremely fast and is very handy as it can be used in any context (including NMIs!). But it still requires scanning the fmt string for parsing the args. Even the trace_bprintk() requires a scan to know what args will be saved, although it doesn't copy the format string itself. Several times trace_printk() has no args, and wastes cpu cycles scanning the fmt string. Adding trace_puts() allows the developer to use an even faster tracing method that only saves the pointer to the string in the ring buffer without doing any format parsing at all. This will help remove even more of the "Heisenbug" effect, when debugging. Also fixed up the F_printk()s for the ftrace internal bprint and print events. Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
55034cd6e648155393b0d665eef76b38d49ad6bf	08-Mar-2013	Steven Rostedt (Red Hat) <srostedt@redhat.com>	tracing: Add alloc_snapshot kernel command line parameter If debugging the kernel, and the developer wants to use tracing_snapshot() in places where tracing_snapshot_alloc() may be difficult (or more likely, the developer is lazy and doesn't want to bother with tracing_snapshot_alloc() at all), then adding alloc_snapshot to the kernel command line parameter will tell ftrace to allocate the snapshot buffer (if configured) when it allocates the main tracing buffer. I also noticed that ring_buffer_expanded and tracing_selftest_disabled had inconsistent use of boolean "true" and "false" with "0" and "1". I cleaned that up too. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
f4e781c0a89d5810729772290441ac7d61f321ec	07-Mar-2013	Steven Rostedt (Red Hat) <srostedt@redhat.com>	tracing: Move the tracing selftest code into its own function Move the tracing startup selftest code into its own function and when not enabled, always have that function succeed. This makes the register_tracer() function much more readable. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
ad909e21bbe69f1d39055d346540abd827190eca	07-Mar-2013	Steven Rostedt (Red Hat) <srostedt@redhat.com>	tracing: Add internal tracing_snapshot() functions The new snapshot feature is quite handy. It's a way for the user to take advantage of the spare buffer that, until then, only the latency tracers used to "snapshot" the buffer when it hit a max latency. Now users can trigger a "snapshot" manually when some condition is hit in a program. But a snapshot currently can not be triggered by a condition inside the kernel. With the addition of tracing_snapshot() and tracing_snapshot_alloc(), snapshots can now be taking when a condition is hit, and the developer wants to snapshot the case without stopping the trace. Note, any snapshot will overwrite the old one, so take care in how this is done. These new functions are to be used like tracing_on(), tracing_off() and trace_printk() are. That is, they should never be called in the mainline Linux kernel. They are solely for the purpose of debugging. The tracing_snapshot() will not allocate a buffer, but it is safe to be called from any context (except NMIs). But if a snapshot buffer isn't allocated when it is called, it will write to the live buffer, complaining about the lack of a snapshot buffer, and then stop tracing (giving you the "permanent snapshot"). tracing_snapshot_alloc() will allocate the snapshot buffer if it was not already allocated and then take the snapshot. This routine may sleep, and must be called from context that can sleep. The allocation is done with GFP_KERNEL and not atomic. If you need a snapshot in an atomic context, say in early boot, then it is best to call the tracing_snapshot_alloc() before then, where it will allocate the buffer, and then you can use the tracing_snapshot() anywhere you want and still get snapshots. Cc: Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
a695cb5816228f86576f5f5c6809fdf8ed382ece	06-Mar-2013	Steven Rostedt (Red Hat) <srostedt@redhat.com>	tracing: Prevent deleting instances when they are being read Add a ref count to the trace_array structure and prevent removal of instances that have open descriptors. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
121aaee7b0a82605d33af200c7e9ebab6fd6e444	06-Mar-2013	Steven Rostedt (Red Hat) <srostedt@redhat.com>	tracing: Add per_cpu directory into tracing instances Add the per_cpu directory to the created tracing instances: cd /sys/kernel/debug/tracing/instances mkdir foo ls foo/per_cpu/cpu0 buffer_size_kb snapshot_raw trace trace_pipe_raw snapshot stats trace_pipe Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
ce9bae55972b228cf7bac34350c4d2caf8ea0d0b	06-Mar-2013	Steven Rostedt (Red Hat) <srostedt@redhat.com>	tracing: Add snapshot feature to instances Add the "snapshot" file to the the multi-buffer instances. cd /sys/kernel/debug/tracing/instances mkdir foo ls foo buffer_size_kb buffer_total_size_kb events free_buffer set_event snapshot trace trace_clock trace_marker trace_options trace_pipe tracing_on cat foo/snapshot # tracer: nop # # # * Snapshot is freed * # # Snapshot commands: # echo 0 > snapshot : Clears and frees snapshot buffer # echo 1 > snapshot : Allocates snapshot buffer, if not already allocated. # Takes a snapshot of the main buffer. # echo 2 > snapshot : Clears snapshot buffer (but does not allocate) # (Doesn't have to be '2' works with any number that # is not a '0' or '1') Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
737223fbca3b1c91feb947c7f571b35749b743b6	06-Mar-2013	Steven Rostedt (Red Hat) <srostedt@redhat.com>	tracing: Consolidate buffer allocation code There's a bit of duplicate code in creating the trace buffers for the normal trace buffer and the max trace buffer among the instances and the main global_trace. This code can be consolidated and cleaned up a bit making the code cleaner and more readable as well as less duplication. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
45ad21ca5530efdca6a19e4a5ac5e7bd6e24f996	06-Mar-2013	Steven Rostedt (Red Hat) <srostedt@redhat.com>	tracing: Have trace_array keep track if snapshot buffer is allocated The snapshot buffer belongs to the trace array not the tracer that is running. The trace array should be the data structure that keeps track of whether or not the snapshot buffer is allocated, not the tracer desciptor. Having the trace array keep track of it makes modifications so much easier. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
6de58e6269cd0568ca5fbae14423914eff0f7811	05-Mar-2013	Steven Rostedt (Red Hat) <srostedt@redhat.com>	tracing: Add snapshot_raw to extract the raw data from snapshot Add a 'snapshot_raw' per_cpu file that allows tools to read the raw binary data of the snapshot buffer. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
f1affcaaa861f27752a769f889bf1486ebd301fe	05-Mar-2013	Steven Rostedt (Red Hat) <srostedt@redhat.com>	tracing: Add snapshot in the per_cpu trace directories Add the snapshot file into the per_cpu tracing directories to allow them to be read for an individual cpu. This also allows to clear an individual cpu from the snapshot buffer. If the kernel allows it (CONFIG_RING_BUFFER_ALLOW_SWAP is set), then echoing in '1' into one of the per_cpu snapshot files will do an individual cpu buffer swap instead of the entire file. Cc: Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
12883efb670c28dff57dcd7f4f995a1ffe153b2d	05-Mar-2013	Steven Rostedt (Red Hat) <srostedt@redhat.com>	tracing: Consolidate max_tr into main trace_array structure Currently, the way the latency tracers and snapshot feature works is to have a separate trace_array called "max_tr" that holds the snapshot buffer. For latency tracers, this snapshot buffer is used to swap the running buffer with this buffer to save the current max latency. The only items needed for the max_tr is really just a copy of the buffer itself, the per_cpu data pointers, the time_start timestamp that states when the max latency was triggered, and the cpu that the max latency was triggered on. All other fields in trace_array are unused by the max_tr, making the max_tr mostly bloat. This change removes the max_tr completely, and adds a new structure called trace_buffer, that holds the buffer pointer, the per_cpu data pointers, the time_start timestamp, and the cpu where the latency occurred. The trace_array, now has two trace_buffers, one for the normal trace and one for the max trace or snapshot. By doing this, not only do we remove the bloat from the max_trace but the instances of traces can now use their own snapshot feature and not have just the top level global_trace have the snapshot feature and latency tracers for itself. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
873c642f5964b260480850040dec21e42d0ae4e4	05-Mar-2013	Steven Rostedt (Red Hat) <srostedt@redhat.com>	tracing: Clear all trace buffers when unloaded module event was used Currently we do not know what buffer a module event was enabled in. On unload, it is safest to clear all buffer instances, not just the top level buffer. Todo: Clear only the buffer that the event was used in. The infrastructure is there to do this, but it makes the code a bit more complex. Lets get the current code vetted before we add that. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
15693458c4bc0693fd63a50d60f35b628fcf4e29	01-Mar-2013	Steven Rostedt (Red Hat) <rostedt@goodmis.org>	tracing/ring-buffer: Move poll wake ups into ring buffer code Move the logic to wake up on ring buffer data into the ring buffer code itself. This simplifies the tracing code a lot and also has the added benefit that waiters on one of the instance buffers can be woken only when data is added to that instance instead of data added to any instance. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
b627344fef0c38fa4e3050348e168e46db87c905	28-Feb-2013	Steven Rostedt <srostedt@redhat.com>	tracing: Fix read blocking on trace_pipe_raw If the ring buffer is empty, a read to trace_pipe_raw wont block. The tracing code has the infrastructure to wake up waiting readers, but the trace_pipe_raw doesn't take advantage of that. When a read is done to trace_pipe_raw without the O_NONBLOCK flag set, have the read block until there's data in the requested buffer. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
cc60cdc952be09bca5b0bff9fefc7aa6185c3049	28-Feb-2013	Steven Rostedt <srostedt@redhat.com>	tracing: Fix polling on trace_pipe_raw The trace_pipe_raw never implemented polling and this was casing issues for several utilities. This is now implemented. Blocked reads still are on the TODO list. Reported-by: Mauro Carvalho Chehab <mchehab@redhat.com> Tested-by: Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
189e5784f6c5e001a84127b83f03bc76a8bfb1ec	01-Mar-2013	Steven Rostedt (Red Hat) <rostedt@goodmis.org>	tracing: Do not block on splice if either file or splice NONBLOCK flag is set Currently only the splice NONBLOCK flag is checked to determine if the splice read should block or not. But the file descriptor NONBLOCK flag also needs to be checked. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
0c8916c34203734d3b05953ebace52d7c2969f16	07-Aug-2012	Steven Rostedt <srostedt@redhat.com>	tracing: Add rmdir to remove multibuffer instances Add a method to the hijacked dentry descriptor of the "instances" directory to allow for rmdir to remove an instance of a multibuffer. Example: cd /debug/tracing/instances mkdir hello ls hello/ rmdir hello ls Like the mkdir method, the i_mutex is dropped for the instances directory. The instances directory is created at boot up and can not be renamed or removed. The trace_types_lock mutex is used to synchronize adding and removing of instances. I've run several stress tests with different threads trying to create and delete directories of the same name, and it has stood up fine. Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
277ba04461c2746cf935353474c0961161951b68	03-Aug-2012	Steven Rostedt <srostedt@redhat.com>	tracing: Add interface to allow multiple trace buffers Add the interface ("instances" directory) to add multiple buffers to ftrace. To create a new instance, simply do a mkdir in the instances directory: This will create a directory with the following: # cd instances # mkdir foo # ls foo buffer_size_kb free_buffer trace_clock trace_pipe buffer_total_size_kb set_event trace_marker tracing_enabled events/ trace trace_options tracing_on Currently only events are able to be set, and there isn't a way to delete a buffer when one is created (yet). Note, the i_mutex lock is dropped from the parent "instances" directory during the mkdir operation. As the "instances" directory can not be renamed or deleted (created on boot), I do not see any harm in dropping the lock. The creation of the sub directories is protected by trace_types_lock mutex, which only lets one instance get into the code path at a time. If two tasks try to create or delete directories of the same name, only one will occur and the other will fail with -EEXIST. Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
a7603ff4b5f7e26e67af82a4c3d05eeeb8d7b160	06-Aug-2012	Steven Rostedt <srostedt@redhat.com>	tracing: Replace the static global per_cpu arrays with allocated per_cpu The global and max-tr currently use static per_cpu arrays for the CPU data descriptors. But in order to get new allocated trace_arrays, they need to be allocated per_cpu arrays. Instead of using the static arrays, switch the global and max-tr to use allocated data. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
ccb469a198cffac94a7eea0b69f715f06e2ddf15	02-Aug-2012	Steven Rostedt <srostedt@redhat.com>	tracing: Pass the ftrace_file to the buffer lock reserve code Pass the struct ftrace_event_file *ftrace_file to the trace_event_buffer_lock_reserve() (new function that replaces the trace_current_buffer_lock_reserver()). The ftrace_file holds a pointer to the trace_array that is in use. In the case of multiple buffers with different trace_arrays, this allows different events to be recorded into different buffers. Also fixed some of the stale comments in include/trace/ftrace.h Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2b6080f28c7cc3efc8625ab71495aae89aeb63a0	11-May-2012	Steven Rostedt <srostedt@redhat.com>	tracing: Encapsulate global_trace and remove dependencies on global vars The global_trace variable in kernel/trace/trace.c has been kept 'static' and local to that file so that it would not be used too much outside of that file. This has paid off, even though there were lots of changes to make the trace_array structure more generic (not depending on global_trace). Removal of a lot of direct usages of global_trace is needed to be able to create more trace_arrays such that we can add multiple buffers. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
ae3b5093ad6004b52e2825f3db1ad8200a2724d8	23-Jan-2013	Steven Rostedt <srostedt@redhat.com>	tracing: Use RING_BUFFER_ALL_CPUS for TRACE_PIPE_ALL_CPU Both RING_BUFFER_ALL_CPUS and TRACE_PIPE_ALL_CPU are defined as -1 and used to say that all the ring buffers are to be modified or read (instead of just a single cpu, which would be >= 0). There's no reason to keep TRACE_PIPE_ALL_CPU as it is also started to be used for more than what it was created for, and now that the ring buffer code added a generic RING_BUFFER_ALL_CPUS define, we can clean up the trace code to use that instead and remove the TRACE_PIPE_ALL_CPU macro. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
ae63b31e4d0e2ec09c569306ea46f664508ef717	04-May-2012	Steven Rostedt <srostedt@redhat.com>	tracing: Separate out trace events from global variables The trace events for ftrace are all defined via global variables. The arrays of events and event systems are linked to a global list. This prevents multiple users of the event system (what to enable and what not to). By adding descriptors to represent the event/file relation, as well as to which trace_array descriptor they are associated with, allows for more than one set of events to be defined. Once the trace events files have a link between the trace event and the trace_array they are associated with, we can create multiple trace_arrays that can record separate events in separate buffers. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
613f04a0f51e6e68ac6fe571ab79da3c0a5eb4da	14-Mar-2013	Steven Rostedt (Red Hat) <rostedt@goodmis.org>	tracing: Prevent buffer overwrite disabled for latency tracers The latency tracers require the buffers to be in overwrite mode, otherwise they get screwed up. Force the buffers to stay in overwrite mode when latency tracers are enabled. Added a flag_changed() method to the tracer structure to allow the tracers to see what flags are being changed, and also be able to prevent the change from happing. Cc: stable@vger.kernel.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
80902822658aab18330569587cdb69ac1dfdcea8	14-Mar-2013	Steven Rostedt (Red Hat) <rostedt@goodmis.org>	tracing: Keep overwrite in sync between regular and snapshot buffers Changing the overwrite mode for the ring buffer via the trace option only sets the normal buffer. But the snapshot buffer could swap with it, and then the snapshot would be in non overwrite mode and the normal buffer would be in overwrite mode, even though the option flag states otherwise. Keep the two buffers overwrite modes in sync. Cc: stable@vger.kernel.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
69d34da2984c95b33ea21518227e1f9470f11d95	14-Mar-2013	Steven Rostedt (Red Hat) <rostedt@goodmis.org>	tracing: Protect tracer flags with trace_types_lock Seems that the tracer flags have never been protected from synchronous writes. Luckily, admins don't usually modify the tracing flags via two different tasks. But if scripts were to be used to modify them, then they could get corrupted. Move the trace_types_lock that protects against tracers changing to also protect the flags being set. Cc: stable@vger.kernel.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2721e72dd10f71a3ba90f59781becf02638aa0d9	12-Mar-2013	Steven Rostedt (Red Hat) <rostedt@goodmis.org>	tracing: Fix race in snapshot swapping Although the swap is wrapped with a spin_lock, the assignment of the temp buffer used to swap is not within that lock. It needs to be moved into that lock, otherwise two swaps happening on two different CPUs, can end up using the wrong temp buffer to assign in the swap. Luckily, all current callers of the swap function appear to have their own locks. But in case something is added that allows two different callers to call the swap, then there's a chance that this race can trigger and corrupt the buffers. New code is coming soon that will allow for this race to trigger. I've Cc'd stable, so this bug will not show up if someone backports one of the changes that can trigger this bug. Cc: stable@vger.kernel.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
c9960e48543799f168c4c9486f9790fb686ce5a8	05-Mar-2013	Steven Rostedt (Red Hat) <srostedt@redhat.com>	tracing: Do not return EINVAL in snapshot when not allocated To use the tracing snapshot feature, writing a '1' into the snapshot file causes the snapshot buffer to be allocated if it has not already been allocated and dose a 'swap' with the main buffer, so that the snapshot now contains what was in the main buffer, and the main buffer now writes to what was the snapshot buffer. To free the snapshot buffer, a '0' is written into the snapshot file. To clear the snapshot buffer, any number but a '0' or '1' is written into the snapshot file. But if the file is not allocated it returns -EINVAL error code. This is rather pointless. It is better just to do nothing and return success. Acked-by: Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
d8741e2e88ac9a458765a0c7b4e6542d7c038334	05-Mar-2013	Steven Rostedt (Red Hat) <srostedt@redhat.com>	tracing: Add help of snapshot feature when snapshot is empty When cat'ing the snapshot file, instead of showing an empty trace header like the trace file does, show how to use the snapshot feature. Also, this is a good place to show if the snapshot has been allocated or not. Users may want to "pre allocate" the snapshot to have a fast "swap" of the current buffer. Otherwise, a swap would be slow and might fail as it would need to allocate the snapshot buffer, and that might fail under tight memory constraints. Here's what it looked like before: # tracer: nop # # entries-in-buffer/entries-written: 0/0 #P:4 # # _-----=> irqs-off # / _----=> need-resched # \| / _---=> hardirq/softirq # \|\| / _--=> preempt-depth # \|\|\| / delay # TASK-PID CPU# \|\|\|\| TIMESTAMP FUNCTION # \| \| \| \|\|\|\| \| \| Here's what it looks like now: # tracer: nop # # # * Snapshot is freed * # # Snapshot commands: # echo 0 > snapshot : Clears and frees snapshot buffer # echo 1 > snapshot : Allocates snapshot buffer, if not already allocated. # Takes a snapshot of the main buffer. # echo 2 > snapshot : Clears snapshot buffer (but does not allocate) # (Doesn't have to be '2' works with any number that # is not a '0' or '1') Acked-by: Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
8bd75c77b7c6a3954140dd2e20346aef3efe4a35	07-Feb-2013	Clark Williams <williams@redhat.com>	sched/rt: Move rt specific bits into new header file Move rt scheduler definitions out of include/linux/sched.h into new file include/linux/sched/rt.h Signed-off-by: Clark Williams <williams@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/20130207094707.7b9f825f@riff.lan Signed-off-by: Ingo Molnar <mingo@kernel.org>
d840f718d28715a9833c1a8f46c2493ff3fd219b	02-Feb-2013	Steven Rostedt (Red Hat) <rostedt@goodmis.org>	tracing: Init current_trace to nop_trace and remove NULL checks On early boot up, when the ftrace ring buffer is initialized, the static variable current_trace is initialized to &nop_trace. Before this initialization, current_trace is NULL and will never become NULL again. It is always reassigned to a ftrace tracer. Several places check if current_trace is NULL before it uses it, and this check is frivolous, because at the point in time when the checks are made the only way current_trace could be NULL is if ftrace failed its allocations at boot up, and the paths to these locations would probably not be possible. By initializing current_trace to &nop_trace where it is declared, current_trace will never be NULL, and we can remove all these checks of current_trace being NULL which never needed to be checked in the first place. Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
debdd57f5145f3c6a4b3f8d0126abd1a2def7fc6	26-Dec-2012	Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com>	tracing: Make a snapshot feature available from userspace Ftrace has a snapshot feature available from kernel space and latency tracers (e.g. irqsoff) are using it. This patch enables user applictions to take a snapshot via debugfs. Add "snapshot" debugfs file in "tracing" directory. snapshot: This is used to take a snapshot and to read the output of the snapshot. # echo 1 > snapshot This will allocate the spare buffer for snapshot (if it is not allocated), and take a snapshot. # cat snapshot This will show contents of the snapshot. # echo 0 > snapshot This will free the snapshot if it is allocated. Any other positive values will clear the snapshot contents if the snapshot is allocated, or return EINVAL if it is not allocated. Link: http://lkml.kernel.org/r/20121226025300.3252.86850.stgit@liselsia Cc: Jiri Olsa <jolsa@redhat.com> Cc: David Sharp <dhsharp@google.com> Signed-off-by: Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com> [ Fixed irqsoff selftest and also a conflict with a change that fixes the update_max_tr. ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2fd196ec1eab2623096e7fc7e6f3976160392bce	26-Dec-2012	Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com>	tracing: Replace static old_tracer check of tracer name Currently the trace buffer read functions use a static variable "old_tracer" for detecting if the current tracer changes. This was suitable for a single trace file ("trace"), but to add a snapshot feature that will use the same function for its file, a check against a static variable is not sufficient. To use the output functions for two different files, instead of storing the current tracer in a static variable, as the trace iterator descriptor contains a pointer to the original current tracer's name, that pointer can now be used to check if the current tracer has changed between different reads of the trace file. Link: http://lkml.kernel.org/r/20121226025252.3252.9276.stgit@liselsia Signed-off-by: Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
ad964704ba9326d027fc10fd0099b7c880e50172	29-Jan-2013	Steven Rostedt (Red Hat) <rostedt@goodmis.org>	ring-buffer: Add stats field for amount read from trace ring buffer Add a stat about the number of events read from the ring buffer: # cat /debug/tracing/per_cpu/cpu0/stats entries: 39869 overrun: 870512 commit overrun: 0 bytes: 1449912 oldest event ts: 6561.368690 now ts: 6565.246426 dropped events: 0 read events: 112 <-- Added Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
38dbe0b137bfe6ea92be495017885c0785179a02	25-Jan-2013	Jovi Zhang <bookjovi@gmail.com>	tracing: Remove second iterator initializer The trace iterator is already initialized by trace_init_global_iter(), so there is no need to initialize it again. Link: http://lkml.kernel.org/r/CACV3sb+G1YnO6168JhY3dEadmJi58pA5-2cSZT8E0WVHJNFt9Q@mail.gmail.com Signed-off-by: Jovi Zhang <bookjovi@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
821465295b36136998ef294fe176fba4e09c1cd9	19-Nov-2012	Shan Wei <davidshan@tencent.com>	tracing: Use __this_cpu_inc/dec operation instead of __get_cpu_var __this_cpu_inc_return() or __this_cpu_dec generates a single instruction, which is faster than __get_cpu_var operation. Link: http://lkml.kernel.org/r/50A9C1BD.1060308@gmail.com Reviewed-by: Christoph Lameter <cl@linux.com> Signed-off-by: Shan Wei <davidshan@tencent.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
b736f48bda54ec75b7dc9306884c3843f1a78a0a	19-Nov-2012	Josh Triplett <josh@joshtriplett.org>	tracing: Mark tracing_dentry_percpu() static Nothing outside of kernel/trace/trace.c references tracing_dentry_percpu(). Link: http://lkml.kernel.org/r/1353302917-13995-7-git-send-email-josh@joshtriplett.org Signed-off-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
34600f0e9c33c9cd48ae87448205f51332b7d5a0	22-Jan-2013	Steven Rostedt <srostedt@redhat.com>	tracing: Fix race with max_tr and changing tracers There's a race condition between the setting of a new tracer and the update of the max trace buffers (the swap). When a new tracer is added, it sets current_trace to nop_trace before disabling the old tracer. At this moment, if the old tracer uses update_max_tr(), the update may trigger the warning against !current_trace->use_max-tr, as nop_trace doesn't have that set. As update_max_tr() requires that interrupts be disabled, we can add a check to see if current_trace == nop_trace and bail if it does. Then when disabling the current_trace, set it to nop_trace and run synchronize_sched(). This will make sure all calls to update_max_tr() have completed (it was called with interrupts disabled). As a clean up, this commit also removes shrinking and recreating the max_tr buffer if the old and new tracers both have use_max_tr set. The old way use to always shrink the buffer, and then expand it for the next tracer. This is a waste of time. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
b000c8065a92b0fe0e1694f41b2c8d8ba7b7b1ec	18-Jan-2013	Steven Rostedt <srostedt@redhat.com>	tracing: Remove the extra 4 bytes of padding in events Due to a userspace issue with PowerTop v2beta, which hardcoded the offset of event fields that it was using, it broke when we removed the Big Kernel Lock counter from the event header. (commit e6e1e2593 "tracing: Remove lock_depth from event entry") Because this broke userspace, it was determined that we must keep those 4 bytes around. (commit a3a4a5acd "Regression: partial revert "tracing: Remove lock_depth from event entry"") This unfortunately wastes space in the ring buffer. 4 bytes per event, where a lot of events are just 24 bytes. That's 16% of the buffer wasted. A million events will add 4 megs of white space into the buffer. It was later noticed that PowerTop v2beta could not work on systems where the kernel was 64 bit but the userspace was 32 bits. The reason was because the offsets are different between the two and the hard coded offset of one would not work with the other. With PowerTop v2 final, it implemented the same interface that both perf and trace-cmd use. That is, it reads the format file of the event to find the offsets of the fields it needs. This fixes the problem with running powertop on a 32 bit userspace running on a 64 bit kernel. It also no longer requires the 4 byte padding. As PowerTop v2 has been out for a while, and is included in all major distributions, it is time that we can safely remove the 4 bytes of padding. Users of PowerTop v2beta should upgrade to PowerTop v2 final. Cc: Linus Torvalds <torvalds@linux-foundation.org> Acked-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
84c6cf0db6a00601eb43cfc08244a398ffb0894c	21-Dec-2012	Steven Rostedt <srostedt@redhat.com>	tracing: Remove unneeded check of max_tr->buffer before tracing_reset There's now a check in tracing_reset_online_cpus() if the buffer is allocated or NULL. No need to do a check before calling it with max_tr. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
a54164114b96b4693b42cdb553260eec41ea4393	19-Dec-2012	Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com>	tracing: Add checks if tr->buffer is NULL in tracing_reset{_online_cpus} max_tr->buffer could be NULL in the tracing_reset{_online_cpus}. In this case, a NULL pointer dereference happens, so we should return immediately from these functions. Note, the current code does not call tracing_reset*() with max_tr when its buffer is NULL, but future code will. This patch is needed to prevent the future code from crashing. Link: http://lkml.kernel.org/r/20121219070234.31200.93863.stgit@liselsia Signed-off-by: Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
d8a0349c0cea477322c66ea9362f10c62fad5f62	13-Nov-2012	Shan Wei <davidshan@tencent.com>	tracing: Use this_cpu_ptr per-cpu helper typeof(&buffer) is a pointer to array of 1024 char, or char (*)[1024]. But, typeof(&buffer[0]) is a pointer to char which match the return type of get_trace_buf(). As well-known, the value of &buffer is equal to &buffer[0]. so return this_cpu_ptr(&percpu_buffer->buffer[0]) can avoid type cast. Link: http://lkml.kernel.org/r/50A1A800.3020102@gmail.com Reviewed-by: Christoph Lameter <cl@linux.com> Signed-off-by: Shan Wei <davidshan@tencent.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
250bfd3d8e7e19cb649dd94689f0af2ce3474060	14-Jan-2013	Liu Bo <bo.li.liu@oracle.com>	tracing: Fix regression of trace_pipe Commit 0fb9656d "tracing: Make tracing_enabled be equal to tracing_on" changes the behaviour of trace_pipe, ie. it makes trace_pipe return if we've read something and tracing is enabled, and this means that we have to 'cat trace_pipe' again and again while running tests. IMO the right way is if tracing is enabled, we always block and wait for ring buffer, or we may lose what we want since ring buffer's size is limited. Link: http://lkml.kernel.org/r/1358132051-5410-1-git-send-email-bo.li.liu@oracle.com Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2df8f8a6a897ebf4c5613b5be6103d33b2a21520	11-Jan-2013	Steven Rostedt <srostedt@redhat.com>	tracing: Fix regression with irqsoff tracer and tracing_on file Commit 02404baf1b47 "tracing: Remove deprecated tracing_enabled file" removed the tracing_enabled file as it never worked properly and the tracing_on file should be used instead. But the tracing_on file didn't call into the tracers start/stop routines like the tracing_enabled file did. This caused trace-cmd to break when it enabled the irqsoff tracer. If you just did "echo irqsoff > current_tracer" then it would work properly. But the tool trace-cmd disables tracing first by writing "0" into the tracing_on file. Then it writes "irqsoff" into current_tracer and then writes "1" into tracing_on. Unfortunately, the above commit changed the irqsoff tracer to check the tracing_on status instead of the tracing_enabled status. If it's disabled then it does not start the tracer internals. The problem is that writing "1" into tracing_on does not call the tracers "start" routine like writing "1" into tracing_enabled did. This makes the irqsoff tracer not start when using the trace-cmd tool, and is a regression for userspace. Simple fix is to have the tracing_on file call the tracers start() method when being enabled (and the stop() method when disabled). Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
a8dd2176a8e988e3744e863ac39647a6f59fa900	10-Jan-2013	Steven Rostedt <srostedt@redhat.com>	tracing: Fix regression of trace_options file setting The latest change to allow trace options to be set on the command line also broke the trace_options file. The zeroing of the last byte of the option name that is echoed into the trace_option file was removed with the consolidation of some of the code. The compare between the option and what was written to the trace_options file fails because the string holding the data written doesn't terminate with a null character. A zero needs to be added to the end of the string copied from user space. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
6d49e352ae9aed3f599041b0c0389aa924815f14	06-Dec-2012	Nadia Yvette Chambers <nyc@holomorphy.com>	propagate name change to comments in kernel source I've legally changed my name with New York State, the US Social Security Administration, et al. This patch propagates the name change and change in initials and login to comments in the kernel source as well. Signed-off-by: Nadia Yvette Chambers <nyc@holomorphy.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
bf3071f5a054db9e5bab873355d27a7330ce5187	25-Jul-2012	Dave Jones <davej@redhat.com>	tracing: Remove unnecessary WARN_ONCE's from tracing_buffers_splice_read WARN shouldn't be used as a means of communicating failure to a userspace programmer. Link: http://lkml.kernel.org/r/20120725153908.GA25203@redhat.com Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
d60da506cbeb3f1907a740547dd7ef04a93e908e	17-Oct-2012	Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com>	tracing: Add a resize function to make one buffer equivalent to another buffer Trace buffer size is now per-cpu, so that there are the following two patterns in resizing of buffers. (1) resize per-cpu buffers to same given size (2) resize per-cpu buffers to another trace_array's buffer size for each CPU (such as preparing the max_tr which is equivalent to the global_trace's size) __tracing_resize_ring_buffer() can be used for (1), and had implemented (2) inside it for resetting the global_trace to the original size. (2) was also implemented in another place. So this patch assembles them in a new function - resize_buffer_duplicate_size(). Link: http://lkml.kernel.org/r/20121017025616.2627.91226.stgit@falsita Signed-off-by: Hiraku Toyooka <hiraku.toyooka.gu@hitachi.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
11043d8b125671a32253cddb0b05177be0e976f6	13-Nov-2012	Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com>	tracing: Show raw time stamp on stats per cpu using counter or tsc mode for trace_clock Show raw time stamp values for stats per cpu if you choose counter or tsc mode for trace_clock. Although a unit of tracing time stamp is nsec in local or global mode, the units in counter and TSC mode are tracing counter and cycles respectively. Link: http://lkml.kernel.org/r/1352837903-32191-3-git-send-email-dhsharp@google.com Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: Yoshihiro YUNOMAE <yoshihiro.yunomae.ez@hitachi.com> Signed-off-by: David Sharp <dhsharp@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
8be0709f10e3dd5d7d07933ad61a9f18c4b93ca5	13-Nov-2012	David Sharp <dhsharp@google.com>	tracing: Format non-nanosec times from tsc clock without a decimal point. With the addition of the "tsc" clock, formatting timestamps to look like fractional seconds is misleading. Mark clocks as either in nanoseconds or not, and format non-nanosecond timestamps as decimal integers. Tested: $ cd /sys/kernel/debug/tracing/ $ cat trace_clock [local] global tsc $ echo sched_switch > set_event $ echo 1 > tracing_on ; sleep 0.0005 ; echo 0 > tracing_on $ cat trace <idle>-0 [000] 6330.555552: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=29964 next_prio=120 sleep-29964 [000] 6330.555628: sched_switch: prev_comm=bash prev_pid=29964 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120 ... $ echo 1 > options/latency-format $ cat trace <idle>-0 0 4104553247us+: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=29964 next_prio=120 sleep-29964 0 4104553322us+: sched_switch: prev_comm=bash prev_pid=29964 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120 ... $ echo tsc > trace_clock $ cat trace $ echo 1 > tracing_on ; sleep 0.0005 ; echo 0 > tracing_on $ echo 0 > options/latency-format $ cat trace <idle>-0 [000] 16490053398357: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=31128 next_prio=120 sleep-31128 [000] 16490053588518: sched_switch: prev_comm=bash prev_pid=31128 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120 ... echo 1 > options/latency-format $ cat trace <idle>-0 0 91557653238+: sched_switch: prev_comm=swapper prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=bash next_pid=31128 next_prio=120 sleep-31128 0 91557843399+: sched_switch: prev_comm=bash prev_pid=31128 prev_prio=120 prev_state=S ==> next_comm=swapper next_pid=0 next_prio=120 ... v2: Move arch-specific bits out of generic code. v4: Fix x86_32 build due to 64-bit division. Google-Bug-Id: 6980623 Link: http://lkml.kernel.org/r/1352837903-32191-2-git-send-email-dhsharp@google.com Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: David Sharp <dhsharp@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
8cbd9cc6254065c97c4bac42daa55ba1abe73a8e	13-Nov-2012	David Sharp <dhsharp@google.com>	tracing,x86: Add a TSC trace_clock In order to promote interoperability between userspace tracers and ftrace, add a trace_clock that reports raw TSC values which will then be recorded in the ring buffer. Userspace tracers that also record TSCs are then on exactly the same time base as the kernel and events can be unambiguously interlaced. Tested: Enabled a tracepoint and the "tsc" trace_clock and saw very large timestamp values. v2: Move arch-specific bits out of generic code. v3: Rename "x86-tsc", cleanups v7: Generic arch bits in Kbuild. Google-Bug-Id: 6980623 Link: http://lkml.kernel.org/r/1352837903-32191-1-git-send-email-dhsharp@google.com Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: "H. Peter Anvin" <hpa@linux.intel.com> Signed-off-by: David Sharp <dhsharp@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
7bcfaf54f591a0775254c4ea679faf615152ee3a	02-Nov-2012	Steven Rostedt <srostedt@redhat.com>	tracing: Add trace_options kernel command line parameter Add trace_options to the kernel command line parameter to be able to set options at early boot. For example, to enable stack dumps of events, add the following: trace_options=stacktrace This along with the trace_event option, you can get not only traces of the events but also the stack dumps with them. Requested-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
0d5c6e1c19bab82fad4837108c2902f557d62a04	02-Nov-2012	Steven Rostedt <srostedt@redhat.com>	tracing: Use irq_work for wake ups and remove _nowake_() functions Have the ring buffer commit function use the irq_work infrastructure to wake up any waiters waiting on the ring buffer for new data. The irq_work was created for such a purpose, where doing the actual wake up at the time of adding data is too dangerous, as an event or function trace may be in the midst of the work queue locks and cause deadlocks. The irq_work will either delay the action to the next timer interrupt, or trigger an IPI to itself forcing an interrupt to do the work (in a safe location). With irq_work, all ring buffer commits can safely do wakeups, removing the need for the ring buffer commit "nowake" variants, which were used by events and function tracing. All commits can now safely use the normal commit, and the "nowake" variants can be removed. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
02404baf1b47123f1c88c9f9f1f3b00e1e2b10db	01-Nov-2012	Steven Rostedt <srostedt@redhat.com>	tracing: Remove deprecated tracing_enabled file The tracing_enabled file was used as a quick way to stop tracers, and try to bring down overhead for things like the latency tracers (irqsoff, wakeup, etc). But it didn't work that well. The tracing_on file was created as a really fast way to stop recording into the ftrace ring buffer and can interact with the kernel. That is a tracing_off() call in the kernel can disable recording of events, and then from userspace one could echo 1 into the tracing_on file to continue it. The tracing_enabled function did too much to allow for this. The tracing_on has taken over as a way to start and stop tracing and the tracing_enabled file should not be used. But because of its existance, it still confuses people. Over a year ago the following commit was added: commit 6752ab4a9c30d5411b2dfdb251a3f1cb18aae487 Author: Steven Rostedt <srostedt@redhat.com> Date: Tue Feb 8 13:54:06 2011 -0500 tracing: Deprecate tracing_enabled for tracing_on This commit added a WARN_ON() if the tracing_enabled file's variable was changed. After this was added, only LatencyTop complained, and they soon fixed their tool as there was no reason that LatencyTop should touch this file as it was using the perf ring buffers which this file does not interact with. But since that time no one else has complained about this WARN_ON(). Thus it is safe to assume that this file is no longer needed. Time to get rid of it. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
0fb9656d957d79dbe7ae155bb6533b1d465e4a50	11-May-2012	Steven Rostedt <srostedt@redhat.com>	tracing: Make tracing_enabled be equal to tracing_on The tracing_enabled file has been deprecated as it never was able to serve its purpose well. The tracing_on file has taken over. Instead of having code to keep tracing_enabled, have the tracing_enabled file just set tracing_on, and remove the tracing_enabled variable. This allows us to remove the tracing_enabled file. The reason that the remove is in a different change set and not removed here is in case we find some lonely userspace tool that requires the file to exist. Then the removal patch will get reverted, but this one will not. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
c7b84ecada9a8b7fe3e6c081e70801703897ed5d	12-May-2012	Steven Rostedt <srostedt@redhat.com>	tracing: Remove unused function unregister_tracer() The function register_tracer() is only used by kernel core code, that never needs to remove the tracer. As trace_events have become the main way to add new tracing to the kernel, the need to unregister a tracer has diminished. Remove the unused function unregister_tracer(). If a need arises where we need it, then we can always add it back. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
60303ed3f4b9332b9aa9bc17c68bc174e7343e2d	12-Oct-2012	David Sharp <dhsharp@google.com>	tracing: Reset ring buffer when changing trace_clocks Because the "tsc" clock isn't in nanoseconds, the ring buffer must be reset when changing clocks so that incomparable timestamps don't end up in the same trace. Tested: Confirmed switching clocks resets the trace buffer. Google-Bug-Id: 6980623 Link: http://lkml.kernel.org/r/1349998076-15495-3-git-send-email-dhsharp@google.com Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Signed-off-by: David Sharp <dhsharp@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
7ffbd48d5cab22bcd1120eb2349db1319e2d827a	11-Oct-2012	Steven Rostedt <srostedt@redhat.com>	tracing: Cache comms only after an event occurred Whenever an event is registered, the comm of tasks are saved at every task switch instead of saving them at every event. But if an event isn't executed much, the comm cache will be filled up by tasks that did not record the event and you lose out on the comms that did. Here's an example, if you enable the following events: echo 1 > /debug/tracing/events/kvm/kvm_cr/enable echo 1 > /debug/tracing/events/net/net_dev_xmit/enable Note, there's no kvm running on this machine so the first event will never be triggered, but because it is enabled, the storing of comms will continue. If we now disable the network event: echo 0 > /debug/tracing/events/net/net_dev_xmit/enable and look at the trace: cat /debug/tracing/trace sshd-2672 [001] ..s2 375.731616: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=242 rc=0 sshd-2672 [001] ..s1 375.731617: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=242 rc=0 sshd-2672 [001] ..s2 375.859356: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=242 rc=0 sshd-2672 [001] ..s1 375.859357: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=242 rc=0 sshd-2672 [001] ..s2 375.947351: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=242 rc=0 sshd-2672 [001] ..s1 375.947352: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=242 rc=0 sshd-2672 [001] ..s2 376.035383: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=242 rc=0 sshd-2672 [001] ..s1 376.035383: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=242 rc=0 sshd-2672 [001] ..s2 377.563806: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=226 rc=0 sshd-2672 [001] ..s1 377.563807: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=226 rc=0 sshd-2672 [001] ..s2 377.563834: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6be0 len=114 rc=0 sshd-2672 [001] ..s1 377.563842: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6be0 len=114 rc=0 We see that process 2672 which triggered the events has the comm "sshd". But if we run hackbench for a bit and look again: cat /debug/tracing/trace <...>-2672 [001] ..s2 375.731616: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=242 rc=0 <...>-2672 [001] ..s1 375.731617: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=242 rc=0 <...>-2672 [001] ..s2 375.859356: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=242 rc=0 <...>-2672 [001] ..s1 375.859357: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=242 rc=0 <...>-2672 [001] ..s2 375.947351: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=242 rc=0 <...>-2672 [001] ..s1 375.947352: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=242 rc=0 <...>-2672 [001] ..s2 376.035383: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=242 rc=0 <...>-2672 [001] ..s1 376.035383: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=242 rc=0 <...>-2672 [001] ..s2 377.563806: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6de0 len=226 rc=0 <...>-2672 [001] ..s1 377.563807: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6de0 len=226 rc=0 <...>-2672 [001] ..s2 377.563834: net_dev_xmit: dev=eth0 skbaddr=ffff88005cbb6be0 len=114 rc=0 <...>-2672 [001] ..s1 377.563842: net_dev_xmit: dev=br0 skbaddr=ffff88005cbb6be0 len=114 rc=0 The stored "sshd" comm has been flushed out and we get a useless "<...>". But by only storing comms after a trace event occurred, we can run hackbench all day and still get the same output. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
81698831bc462ff16f76bc11249a1e492424da4c	11-Oct-2012	Steven Rostedt <srostedt@redhat.com>	tracing: Enable comm recording if trace_printk() is used If comm recording is not enabled when trace_printk() is used then you just get this type of output: [ adding trace_printk("hello! %d", irq); in do_IRQ ] <...>-2843 [001] d.h. 80.812300: do_IRQ: hello! 14 <...>-2734 [002] d.h2 80.824664: do_IRQ: hello! 14 <...>-2713 [003] d.h. 80.829971: do_IRQ: hello! 14 <...>-2814 [000] d.h. 80.833026: do_IRQ: hello! 14 By enabling the comm recorder when trace_printk is enabled: hackbench-6715 [001] d.h. 193.233776: do_IRQ: hello! 21 sshd-2659 [001] d.h. 193.665862: do_IRQ: hello! 21 <idle>-0 [001] d.h1 193.665996: do_IRQ: hello! 21 Suggested-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
b382ede6b5eb8188926b72a9ef42fd2354342a97	11-Oct-2012	Steven Rostedt <srostedt@redhat.com>	tracing: Expand ring buffer when trace_printk() is used Since tracing is not used by 99% of Linux users, even though tracing may be configured in, it does not make sense to allocate 1.4 Megs per CPU for the ring buffers if they are not used. Thus, on boot up the ring buffers are set to a minimal size until something needs the and they are expanded. This works well for events and tracers (function, etc), but for the asynchronous use of trace_printk() which can write to the ring buffer at any time, does not expand the buffers. On boot up a check is made to see if any trace_printk() is used to see if the trace_printk() temp buffer pages should be allocated. This same code can be used to expand the buffers as well. Suggested-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
884bfe89a462fcc85c8abd96171519cf2fe70929	15-Jul-2011	Slava Pestov <slavapestov@google.com>	ring-buffer: Add a 'dropped events' counter The existing 'overrun' counter is incremented when the ring buffer wraps around, with overflow on (the default). We wanted a way to count requests lost from the buffer filling up with overflow off, too. I decided to add a new counter instead of retro-fitting the existing one because it seems like a different statistic to count conceptually, and also because of how the code was structured. Link: http://lkml.kernel.org/r/1310765038-26399-1-git-send-email-slavapestov@google.com Signed-off-by: Slava Pestov <slavapestov@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
bcd83ea6cbfee54e33d1527b87538dc99ca2137b	26-Sep-2012	Daniel Walter <sahne@0x90.at>	tracing: Replace strict_strto* with kstrto* * remove old string conversions with kstrto* Link: http://lkml.kernel.org/r/20120926200838.GC1244@0x90.at Signed-off-by: Daniel Walter <sahne@0x90.at> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
d55cb6cf143ae16eaa415baab520b8eaf4a1012f	09-Aug-2012	Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>	ftrace: Allow stealing pages from pipe buffer Use generic steal operation on pipe buffer to allow stealing ring buffer's read page from pipe buffer. Note that this could reduce the performance of splice on the splice_write side operation without affinity setting. Since the ring buffer's read pages are allocated on the tracing-node, but the splice user does not always execute splice write side operation on the same node. In this case, the page will be accessed from the another node. Thus, it is strongly recommended to assign the splicing thread to corresponding node. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
5224c3a31549f1c056039545b289e1b01ed02f12	08-Sep-2012	Mandeep Singh Baines <mandeep.baines@gmail.com>	tracing: Add an option for disabling markers In our application, we have trace markers spread through user-space. We have markers in GL, X, etc. These are super handy for Chrome's about:tracing feature (Chrome + system + kernel trace view), but can be very distracting when you're trying to debug a kernel issue. I normally, use "grep -v tracing_mark_write" but it would be nice if I could just temporarily disable markers all together. Link: http://lkml.kernel.org/r/1347066739-26285-1-git-send-email-msb@chromium.org CC: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Mandeep Singh Baines <msb@chromium.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
d20b92ab668cc44fc84bba0001839c5a8013a5cd	14-Mar-2012	Eric W. Biederman <ebiederm@xmission.com>	userns: Teach trace to use from_kuid - When tracing capture the kuid. - When displaying the data to user space convert the kuid into the user namespace of the process that opened the report file. Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
87abb3b15c62033409f5bf2ffb5620c94f91cf2c	02-Aug-2012	Wang Tianhong <wangthbj@linux.vnet.ibm.com>	tracing/trivial: Fix some typos in kernel/trace Fix some typos in kernel/trace. Link: http://lkml.kernel.org/r/1343887320.2228.9.camel@louis-ThinkPad-T410 Signed-off-by: Wang Tianhong <wangthbj@linux.vnet.ibm.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
b2ad368bebc0f772613668e893fa176396e9094c	10-Jul-2012	Anton Vorontsov <anton.vorontsov@linaro.org>	tracing: Fix initialization failure path in tracing_set_tracer() If tracer->init() fails, current code will leave current_tracer pointing to an unusable tracer, which at best makes 'current_tracer' report inaccurate value. Fix the issue by pointing current_tracer to nop tracer, and only update current_tracer with the new one after all the initialization succeeds. Signed-off-by: Anton Vorontsov <anton.vorontsov@linaro.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
93574fcc5b50cc7b8834698acb2ce947e5b6a5dc	11-Jul-2012	Dan Carpenter <dan.carpenter@oracle.com>	tracing: Check for allocation failure in __tracing_open() Clean up and return -ENOMEM on if the kzalloc() fails. This also prevents a potential crash, as the pointer that failed to allocate would be later used. Link: http://lkml.kernel.org/r/20120711063507.GF11812@elgon.mountain Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
6d158a813efcd09661c23f16ddf7e2ff834cb20c	28-Jun-2012	Steven Rostedt <srostedt@redhat.com>	tracing: Remove NR_CPUS array from trace_iterator Replace the NR_CPUS array of buffer_iter from the trace_iterator with an allocated array. This will just create an array of possible CPUS instead of the max number specified. The use of NR_CPUS in that array caused allocation failures for machines that were tight on memory. This did not cause any failures to the system itself (no crashes), but caused unnecessary failures for reading the trace files. Added a helper function called 'trace_buffer_iter()' that returns the buffer_iter item or NULL if it is not defined or the array was not allocated. Some routines do not require the array (tracing_open_pipe() for one). Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
0be61ebc18b919dddbdbcd1c4f42513c310ecf59	18-Jun-2012	Steven Rostedt <srostedt@redhat.com>	tracing/selftest: Add a WARN_ON() if a tracer test fails Add a WARN_ON() output on test failures so that they are easier to detect in automated tests. Although, the WARN_ON() will not print if the test causes the system to crash, obviously. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
047fe3605235888f3ebcda0c728cb31937eadfe6	12-Jun-2012	Eric Dumazet <edumazet@google.com>	splice: fix racy pipe->buffers uses Dave Jones reported a kernel BUG at mm/slub.c:3474! triggered by splice_shrink_spd() called from vmsplice_to_pipe() commit 35f3d14dbbc5 (pipe: add support for shrinking and growing pipes) added capability to adjust pipe->buffers. Problem is some paths don't hold pipe mutex and assume pipe->buffers doesn't change for their duration. Fix this by adding nr_pages_max field in struct splice_pipe_desc, and use it in place of pipe->buffers where appropriate. splice_shrink_spd() loses its struct pipe_inode_info argument. Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Tom Herbert <therbert@google.com> Cc: stable <stable@vger.kernel.org> # 2.6.35 Tested-by: Dave Jones <davej@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
f2bf1f6f5f89d031245067512449fc889b2f4bb2	07-Jun-2012	Steven Rostedt <srostedt@redhat.com>	tracing: Have tracing_off() actually turn tracing off A recent update to have tracing_on/off() only affect the ftrace ring buffers instead of all ring buffers had a cut and paste error. The tracing_off() did the exact same thing as tracing_on() and would not actually turn off tracing. Unfortunately, tracing_off() is more important to be working than tracing_on() as this is a key development tool, as it lets the developer turn off tracing as soon as a problem is discovered. It is also used by panic and oops code. This bug also breaks the 'echo func:traceoff > set_ftrace_filter' Cc: <stable@vger.kernel.org> # 3.4 Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
895b67fd5830ce18a6f1375a7c062fcf84b4b874	07-Nov-2011	Richard Weinberger <richard@nod.at>	tracing: Remove kernel_lock annotations The BKL is gone, these annotations are useless. Link: http://lkml.kernel.org/r/1320654202-4433-1-git-send-email-richard@nod.at Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
a591c73f127505cdbd0aa399a92112a8ddff8730	03-May-2012	Vaibhav Nagarnaik <vnagarnaik@google.com>	tracing: Fix initial buffer_size_kb state Make sure that the state of buffer_size_kb is initialized correctly and returns actual size of the ring buffer. Link: http://lkml.kernel.org/r/1336066834-1673-1-git-send-email-vnagarnaik@google.com Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Laurent Chavey <chavey@google.com> Cc: Justin Teravest <teravest@google.com> Cc: David Sharp <dhsharp@google.com> Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
71babb2705e2203a64c27ede13ae3508a0d2c16c	04-May-2012	Vaibhav Nagarnaik <vnagarnaik@google.com>	tracing: change CPU ring buffer state from tracing_cpumask According to Documentation/trace/ftrace.txt: tracing_cpumask: This is a mask that lets the user only trace on specified CPUS. The format is a hex string representing the CPUS. The tracing_cpumask currently doesn't affect the tracing state of per-CPU ring buffers. This patch enables/disables CPU recording as its corresponding bit in tracing_cpumask is set/unset. Link: http://lkml.kernel.org/r/1336096792-25373-3-git-send-email-vnagarnaik@google.com Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Laurent Chavey <chavey@google.com> Cc: Justin Teravest <teravest@google.com> Cc: David Sharp <dhsharp@google.com> Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
0a3d7ce7e6caa8c39cb5184bd9047a01a40abc2a	23-Apr-2012	Namhyung Kim <namhyung.kim@lge.com>	tracing: Check return value of tracing_dentry_percpu() If tracing_dentry_percpu() failed, tracing_init_debugfs_percpu() will try to create each cpu directories on debugfs' root directory as d_percpu is NULL. Link: http://lkml.kernel.org/r/1335143517-2285-1-git-send-email-namhyung.kim@lge.com Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: Namhyung Kim <namhyung.kim@lge.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
83f40318dab00e3298a1f6d0b12ac025e84e478d	04-May-2012	Vaibhav Nagarnaik <vnagarnaik@google.com>	ring-buffer: Make removal of ring buffer pages atomic This patch adds the capability to remove pages from a ring buffer without destroying any existing data in it. This is done by removing the pages after the tail page. This makes sure that first all the empty pages in the ring buffer are removed. If the head page is one in the list of pages to be removed, then the page after the removed ones is made the head page. This removes the oldest data from the ring buffer and keeps the latest data around to be read. To do this in a non-racey manner, tracing is stopped for a very short time while the pages to be removed are identified and unlinked from the ring buffer. The pages are freed after the tracing is restarted to minimize the time needed to stop tracing. The context in which the pages from the per-cpu ring buffer are removed runs on the respective CPU. This minimizes the events not traced to only NMI trace contexts. Link: http://lkml.kernel.org/r/1336096792-25373-1-git-send-email-vnagarnaik@google.com Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Laurent Chavey <chavey@google.com> Cc: Justin Teravest <teravest@google.com> Cc: David Sharp <dhsharp@google.com> Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
6edb2a8a385f0cdef51dae37ff23e74d76d8a6ce	12-May-2012	Steven Rostedt <srostedt@redhat.com>	tracing: Clean up tracing_mark_write() On gcc 4.5 the function tracing_mark_write() would give a warning of page2 being uninitialized. This is due to a bug in gcc because the logic prevents page2 from being used uninitialized, and gcc 4.6+ does not complain (correctly). Instead of adding a "unitialized" around page2, which could show a bug later on, I combined page1 and page2 into an array map_pages[]. This binds the two and the two are modified according to nr_pages (what gcc 4.5 seems to ignore). This no longer gives a warning with gcc 4.5 nor with gcc 4.6. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
68179686ac67cb08f08b1ef28b860d5ed899f242	09-May-2012	Steven Rostedt <srostedt@redhat.com>	tracing: Remove ftrace_disable/enable_cpu() The ftrace_disable_cpu() and ftrace_enable_cpu() functions were needed back before the ring buffer was lockless. Now that the ring buffer is lockless (and has been for some time), these functions serve no purpose, and unnecessarily slow down operations of the tracer. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
50e18b94c695644d824381e7574b9c44acc25ffe	25-Apr-2012	Jiri Olsa <jolsa@redhat.com>	tracing: Use seq_*_private interface for some seq files It's appropriate to use __seq_open_private interface to open some of trace seq files, because it covers all steps we are duplicating in tracing code - zallocating the iterator and setting it as seq_file's private. Using this for following files: trace available_filter_functions enabled_functions Link: http://lkml.kernel.org/r/1335342219-2782-5-git-send-email-jolsa@redhat.com Signed-off-by: Jiri Olsa <jolsa@redhat.com> [ Fixed warnings for: kernel/trace/trace.c: In function '__tracing_open': kernel/trace/trace.c:2418:11: warning: unused variable 'ret' [-Wunused-variable] kernel/trace/trace.c:2417:19: warning: unused variable 'm' [-Wunused-variable] ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
438ced1720b584000a9e8a4349d1f6bb7ee3ad6d	02-Feb-2012	Vaibhav Nagarnaik <vnagarnaik@google.com>	ring-buffer: Add per_cpu ring buffer control files Add a debugfs entry under per_cpu/ folder for each cpu called buffer_size_kb to control the ring buffer size for each CPU independently. If the global file buffer_size_kb is used to set size, the individual ring buffers will be adjusted to the given size. The buffer_size_kb will report the common size to maintain backward compatibility. If the buffer_size_kb file under the per_cpu/ directory is used to change buffer size for a specific CPU, only the size of the respective ring buffer is updated. When tracing/buffer_size_kb is read, it reports 'X' to indicate that sizes of per_cpu ring buffers are not equivalent. Link: http://lkml.kernel.org/r/1328212844-11889-1-git-send-email-vnagarnaik@google.com Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Michael Rubin <mrubin@google.com> Cc: David Sharp <dhsharp@google.com> Cc: Justin Teravest <teravest@google.com> Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
5a26c8f0cf1e95106858bb4e23ca6dd14c9b842f	20-Apr-2012	Dan Carpenter <dan.carpenter@oracle.com>	tracing: Remove an unneeded check in trace_seq_buffer() memcpy() returns a pointer to "bug". Hopefully, it's not NULL here or we would already have Oopsed. Link: http://lkml.kernel.org/r/20120420063145.GA22649@elgon.mountain Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
07d777fe8c3985bc83428c2866713c2d1b3d4129	22-Sep-2011	Steven Rostedt <srostedt@redhat.com>	tracing: Add percpu buffers for trace_printk() Currently, trace_printk() uses a single buffer to write into to calculate the size and format needed to save the trace. To do this safely in an SMP environment, a spin_lock() is taken to only allow one writer at a time to the buffer. But this could also affect what is being traced, and add synchronization that would not be there otherwise. Ideally, using percpu buffers would be useful, but since trace_printk() is only used in development, having per cpu buffers for something never used is a waste of space. Thus, the use of the trace_bprintk() format section is changed to be used for static fmts as well as dynamic ones. Then at boot up, we can check if the section that holds the trace_printk formats is non-empty, and if it does contain something, then we know a trace_printk() has been added to the kernel. At this time the trace_printk per cpu buffers are allocated. A check is also done at module load time in case a module is added that contains a trace_printk(). Once the buffers are allocated, they are never freed. If you use a trace_printk() then you should know what you are doing. A buffer is made for each type of context: normal softirq irq nmi The context is checked and the appropriate buffer is used. This allows for totally lockless usage of trace_printk(), and they no longer even disable interrupts. Requested-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
348f0fc238efb441a28e7644c51f9fd3001b228a	16-Apr-2012	Steven Rostedt <srostedt@redhat.com>	tracing: Fix regression with tracing_on The change to make tracing_on affect only the ftrace ring buffer, caused a bug where it wont affect any ring buffer. The problem was that the buffer of the trace_array was passed to the write function and not the trace array itself. The trace_array can change the buffer when running a latency tracer. If this happens, then the buffer being disabled may not be the buffer currently used by ftrace. This will cause the tracing_on file to become useless. The simple fix is to pass the trace_array to the write function instead of the buffer. Then the actual buffer may be changed. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
12b5da349a8b94c9dbc3430a6bc42eabd9eaf50b	27-Mar-2012	Steven Rostedt <srostedt@redhat.com>	tracing: Fix ent_size in trace output When reading the trace file, the records of each of the per_cpu buffers are examined to find the next event to print out. At the point of looking at the event, the size of the event is recorded. But if the first event is chosen, the other events in the other CPU buffers will reset the event size that is stored in the iterator descriptor, causing the event size passed to the output functions to be incorrect. In most cases this is not a problem, but for the case of stack traces, it is. With the change to the stack tracing to record a dynamic number of back traces, the output depends on the size of the entry instead of the fixed 8 back traces. When the entry size is not correct, the back traces would not be fully printed. Note, reading from the per-cpu trace files were not affected. Reported-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
b892e5c89787716b95a8e55d77d25a1c0748df10	02-Mar-2012	Steven Rostedt <srostedt@redhat.com>	tracing: Keep NMI watchdog from triggering when dumping trace As ftrace_dump() (called by ftrace_dump_on_oops) disables interrupts as it dumps its output to the console, it can keep interrupts disabled for long periods of time. This is likely to trigger the NMI watchdog, and it can disrupt the output of critical data. Add a touch_nmi_watchdog() to each event that is written to the screen to keep the NMI watchdog from affecting the output. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
499e547057f5bba5cd6f87ebe59b05d0c59da905	22-Feb-2012	Steven Rostedt <srostedt@redhat.com>	tracing/ring-buffer: Only have tracing_on disable tracing buffers As the ring-buffer code is being used by other facilities in the kernel, having tracing_on file disable all buffers is not a desired affect. It should only disable the ftrace buffers that are being used. Move the code into the trace.c file and use the buffer disabling for tracing_on() and tracing_off(). This way only the ftrace buffers will be affected by them and other kernel utilities will not be confused to why their output suddenly stopped. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
1e42e83fde5537266c1d1e7fd8c010b3028d50fc	08-Feb-2012	Geunsik Lim <geunsik.lim@samsung.com>	ftrace: sched_switch plugin is deprecated Actually, sched_switch function tracer is merged into wakeup/wakeup_rt Update 'mini-HOWTO' for ftrace(Kernel function tracer). If we want to trace "sched:sched_switch" to trace sched_switch func, We may utilize event option.(e.g: trace-cmd list -e \| grep sched) This patch is based on Linux-3.3.rc2-SMP-PREEMPT Link: http://lkml.kernel.org/r/1328695537-15081-1-git-send-email-geunsik.lim@gmail.com Cc: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Geunsik Lim <geunsik.lim@samsung.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
f4ae40a6a50a98ac23d4b285f739455e926a473e	24-Jul-2011	Al Viro <viro@zeniv.linux.org.uk>	switch debugfs to umode_t Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
a8eecf2248a45bf69f0625b23c003ad2ccd765ee	02-Oct-2011	Paul E. McKenney <paul.mckenney@linaro.org>	trace: Allow ftrace_dump() to be called from modules Add an EXPORT_SYMBOL_GPL() so that rcutorture can dump the trace buffer upon detection of an RCU error. Signed-off-by: Paul E. McKenney <paul.mckenney@linaro.org> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org>
39eaf7ef884dcc44f7ff1bac803ca2a1dcf43544	17-Nov-2011	Steven Rostedt <srostedt@redhat.com>	tracing: Add entries in buffer and total entries to default output header Knowing the number of event entries in the ring buffer compared to the total number that were written is useful information. The latency format gives this information and there's no reason that the default format does not. This information is now added to the default header, along with the number of online CPUs: # tracer: nop # # entries-in-buffer/entries-written: 159836/64690869 #P:4 # # _-----=> irqs-off # / _----=> need-resched # \| / _---=> hardirq/softirq # \|\| / _--=> preempt-depth # \|\|\| / delay # TASK-PID CPU# \|\|\|\| TIMESTAMP FUNCTION # \| \| \| \|\|\|\| \| \| <idle>-0 [000] ...2 49.442971: local_touch_nmi <-cpu_idle <idle>-0 [000] d..2 49.442973: enter_idle <-cpu_idle <idle>-0 [000] d..2 49.442974: atomic_notifier_call_chain <-enter_idle <idle>-0 [000] d..2 49.442976: __atomic_notifier_call_chain <-atomic_notifier The above shows that the trace contains 159836 entries, but 64690869 were written. One could figure out that there were 64531033 entries that were dropped. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
77271ce4b2c0df0a76ad1cbb6a95b07e1f88c1ea	17-Nov-2011	Steven Rostedt <srostedt@redhat.com>	tracing: Add irq, preempt-count and need resched info to default trace output People keep asking how to get the preempt count, irq, and need resched info and we keep telling them to enable the latency format. Some developers think that traces without this info is completely useless, and for a lot of tasks it is useless. The first option was to enable the latency trace as the default format, but the header for the latency format is pretty useless for most tracers and it also does the timestamp in straight microseconds from the time the trace started. This is sometimes more difficult to read as the default trace is seconds from the start of boot up. Latency format: # tracer: nop # # nop latency trace v1.1.5 on 3.2.0-rc1-test+ # -------------------------------------------------------------------- # latency: 0 us, #159771/64234230, CPU#1 \| (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4) # ----------------- # \| task: -0 (uid:0 nice:0 policy:0 rt_prio:0) # ----------------- # # _------=> CPU# # / _-----=> irqs-off # \| / _----=> need-resched # \|\| / _---=> hardirq/softirq # \|\|\| / _--=> preempt-depth # \|\|\|\| / delay # cmd pid \|\|\|\|\| time \| caller # \ / \|\|\|\|\| \ \| / migratio-6 0...2 41778231us+: rcu_note_context_switch <-__schedule migratio-6 0...2 41778233us : trace_rcu_utilization <-rcu_note_context_switch migratio-6 0...2 41778235us+: rcu_sched_qs <-rcu_note_context_switch migratio-6 0d..2 41778236us+: rcu_preempt_qs <-rcu_note_context_switch migratio-6 0...2 41778238us : trace_rcu_utilization <-rcu_note_context_switch migratio-6 0...2 41778239us+: debug_lockdep_rcu_enabled <-__schedule default format: # tracer: nop # # TASK-PID CPU# TIMESTAMP FUNCTION # \| \| \| \| \| migration/0-6 [000] 50.025810: rcu_note_context_switch <-__schedule migration/0-6 [000] 50.025812: trace_rcu_utilization <-rcu_note_context_switch migration/0-6 [000] 50.025813: rcu_sched_qs <-rcu_note_context_switch migration/0-6 [000] 50.025815: rcu_preempt_qs <-rcu_note_context_switch migration/0-6 [000] 50.025817: trace_rcu_utilization <-rcu_note_context_switch migration/0-6 [000] 50.025818: debug_lockdep_rcu_enabled <-__schedule migration/0-6 [000] 50.025820: debug_lockdep_rcu_enabled <-__schedule The latency format header has latency information that is pretty meaningless for most tracers. Although some of the header is useful, and we can add that later to the default format as well. What is really useful with the latency format is the irqs-off, need-resched hard/softirq context and the preempt count. This commit adds the option irq-info which is on by default that adds this information: # tracer: nop # # _-----=> irqs-off # / _----=> need-resched # \| / _---=> hardirq/softirq # \|\| / _--=> preempt-depth # \|\|\| / delay # TASK-PID CPU# \|\|\|\| TIMESTAMP FUNCTION # \| \| \| \|\|\|\| \| \| <idle>-0 [000] d..2 49.309305: cpuidle_get_driver <-cpuidle_idle_call <idle>-0 [000] d..2 49.309307: mwait_idle <-cpu_idle <idle>-0 [000] d..2 49.309309: need_resched <-mwait_idle <idle>-0 [000] d..2 49.309310: test_ti_thread_flag <-need_resched <idle>-0 [000] d..2 49.309312: trace_power_start.constprop.13 <-mwait_idle <idle>-0 [000] d..2 49.309313: trace_cpu_idle <-mwait_idle <idle>-0 [000] d..2 49.309315: need_resched <-mwait_idle If a user wants the old format, they can disable the 'irq-info' option: # tracer: nop # # TASK-PID CPU# TIMESTAMP FUNCTION # \| \| \| \| \| <idle>-0 [000] 49.309305: cpuidle_get_driver <-cpuidle_idle_call <idle>-0 [000] 49.309307: mwait_idle <-cpu_idle <idle>-0 [000] 49.309309: need_resched <-mwait_idle <idle>-0 [000] 49.309310: test_ti_thread_flag <-need_resched <idle>-0 [000] 49.309312: trace_power_start.constprop.13 <-mwait_idle <idle>-0 [000] 49.309313: trace_cpu_idle <-mwait_idle <idle>-0 [000] 49.309315: need_resched <-mwait_idle Requested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
7e9a49ef542610609144d1afcd516dc3fafac4d6	07-Nov-2011	Jiri Olsa <jolsa@redhat.com>	tracing/latency: Fix header output for latency tracers In case the the graph tracer (CONFIG_FUNCTION_GRAPH_TRACER) or even the function tracer (CONFIG_FUNCTION_TRACER) are not set, the latency tracers do not display proper latency header. The involved/fixed latency tracers are: wakeup_rt wakeup preemptirqsoff preemptoff irqsoff The patch adds proper handling of tracer configuration options for latency tracers, and displaying correct header info accordingly. * The current output (for wakeup tracer) with both graph and function tracers disabled is: # tracer: wakeup # <idle>-0 0d.h5 1us+: 0:120:R + [000] 7: 0:R watchdog/0 <idle>-0 0d.h5 3us+: ttwu_do_activate.clone.1 <-try_to_wake_up ... * The fixed output is: # tracer: wakeup # # wakeup latency trace v1.1.5 on 3.1.0-tip+ # -------------------------------------------------------------------- # latency: 55 us, #4/4, CPU#0 \| (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2) # ----------------- # \| task: migration/0-6 (uid:0 nice:0 policy:1 rt_prio:99) # ----------------- # # _------=> CPU# # / _-----=> irqs-off # \| / _----=> need-resched # \|\| / _---=> hardirq/softirq # \|\|\| / _--=> preempt-depth # \|\|\|\| / delay # cmd pid \|\|\|\|\| time \| caller # \ / \|\|\|\|\| \ \| / cat-1129 0d..4 1us : 1129:120:R + [000] 6: 0:R migration/0 cat-1129 0d..4 2us+: ttwu_do_activate.clone.1 <-try_to_wake_up * The current output (for wakeup tracer) with only function tracer enabled is: # tracer: wakeup # cat-1140 0d..4 1us+: 1140:120:R + [000] 6: 0:R migration/0 cat-1140 0d..4 2us : ttwu_do_activate.clone.1 <-try_to_wake_up * The fixed output is: # tracer: wakeup # # wakeup latency trace v1.1.5 on 3.1.0-tip+ # -------------------------------------------------------------------- # latency: 207 us, #109/109, CPU#1 \| (M:preempt VP:0, KP:0, SP:0 HP:0 #P:2) # ----------------- # \| task: watchdog/1-12 (uid:0 nice:0 policy:1 rt_prio:99) # ----------------- # # _------=> CPU# # / _-----=> irqs-off # \| / _----=> need-resched # \|\| / _---=> hardirq/softirq # \|\|\| / _--=> preempt-depth # \|\|\|\| / delay # cmd pid \|\|\|\|\| time \| caller # \ / \|\|\|\|\| \ \| / <idle>-0 1d.h5 1us+: 0:120:R + [001] 12: 0:R watchdog/1 <idle>-0 1d.h5 3us : ttwu_do_activate.clone.1 <-try_to_wake_up Link: http://lkml.kernel.org/r/20111107150849.GE1807@m.brq.redhat.com Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
436fc280261dcfce5af38f08b89287750dc91cd2	14-Oct-2011	Steven Rostedt <srostedt@redhat.com>	tracing: Fix returning of duplicate data after EOF in trace_pipe_raw The trace_pipe_raw handler holds a cached page from the time the file is opened to the time it is closed. The cached page is used to handle the case of the user space buffer being smaller than what was read from the ring buffer. The left over buffer is held in the cache so that the next read will continue where the data left off. After EOF is returned (no more data in the buffer), the index of the cached page is set to zero. If a user app reads the page again after EOF, the check in the buffer will see that the cached page is less than page size and will return the cached page again. This will cause reading the trace_pipe_raw again after EOF to return duplicate data, making the output look like the time went backwards but instead data is just repeated. The fix is to not reset the index right after all data is read from the cache, but to reset it after all data is read and more data exists in the ring buffer. Cc: stable <stable@kernel.org> Reported-by: Jeremy Eder <jeder@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
9b5f8b31af57a8ce9e9f77864d9143b5e3304815	12-Aug-2011	Geunsik Lim <geunsik.lim@samsung.com>	ftrace: Fix README to state tracing_on to start/stop tracing tracing_enabled option is deprecated. To start/stop tracing, write to /sys/kernel/debug/tracing/tracing_on without tracing_enabled. This patch is based on Linux 3.1.0-rc1 Signed-off-by: Geunsik Lim <geunsik.lim@samsung.com> Link: http://lkml.kernel.org/r/1313127022-23830-1-git-send-email-leemgs1@gmail.com Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
d696b58ca2c3ca76e784ef89a7e0453d9b7ab187	22-Sep-2011	Steven Rostedt <srostedt@redhat.com>	tracing: Do not allocate buffer for trace_marker When doing intense tracing, the kmalloc inside trace_marker can introduce side effects to what is being traced. As trace_marker() is used by userspace to inject data into the kernel ring buffer, it needs to do so with the least amount of intrusion to the operations of the kernel or the user space application. As the ring buffer is designed to write directly into the buffer without the need to make a temporary buffer, and userspace already went through the hassle of knowing how big the write will be, we can simply pin the userspace pages and write the data directly into the buffer. This improves the impact of tracing via trace_marker tremendously! Thanks to Peter Zijlstra and Thomas Gleixner for pointing out the use of get_user_pages_fast() and kmap_atomic(). Suggested-by: Thomas Gleixner <tglx@linutronix.de> Suggested-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
e0a413f619ef8bc366dafc6f8221674993b8d85f	30-Sep-2011	Steven Rostedt <srostedt@redhat.com>	tracing: Warn on output if the function tracer was found corrupted As the function tracer is very intrusive, lots of self checks are performed on the tracer and if something is found to be strange it will shut itself down keeping it from corrupting the rest of the kernel. This shutdown may still allow functions to be traced, as the tracing only stops new modifications from happening. Trying to stop the function tracer itself can cause more harm as it requires code modification. Although a WARN_ON() is executed, a user may not notice it. To help the user see that something isn't right with the tracing of the system a big warning is added to the output of the tracer that lets the user know that their data may be incomplete. Reported-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
6249687f76b69cc0b2ad34636f4a18d693ef3262	19-Sep-2011	Steven Rostedt <srostedt@redhat.com>	tracing: Add a counter clock for those that do not trust clocks When debugging tight race conditions, it can be helpful to have a synchronized tracing method. Although in most cases the global clock provides this functionality, if timings is not the issue, it is more comforting to know that the order of events really happened in a precise order. Instead of using a clock, add a "counter" that is simply an incrementing atomic 64bit counter that orders the events as they are perceived to happen. The trace_clock_counter() is added from the attempt by Peter Zijlstra trying to convert the trace_clock_global() to it. I took Peter's counter code and made trace_clock_counter() instead, and added it to the choice of clocks. Just echo counter > /debug/tracing/trace_clock to activate it. Requested-by: Thomas Gleixner <tglx@linutronix.de> Requested-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Reviewed-By: Valdis Kletnieks <valdis.kletnieks@vt.edu> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
5389f6fad27019f2ba78f1b332f719ec05f12a42	25-Jul-2009	Thomas Gleixner <tglx@linutronix.de>	locking, tracing: Annotate tracing locks as raw The tracing locks can be taken in atomic context and therefore cannot be preempted on -rt - annotate it. In mainline this change documents the low level nature of the lock - otherwise there's no functional difference. Lockdep and Sparse checking will work as usual. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
c64e148a3be3cb786534ad38298c25c833116c26	16-Aug-2011	Vaibhav Nagarnaik <vnagarnaik@google.com>	trace: Add ring buffer stats to measure rate of events The stats file under per_cpu folder provides the number of entries, overruns and other statistics about the CPU ring buffer. However, the numbers do not provide any indication of how full the ring buffer is in bytes compared to the overall size in bytes. Also, it is helpful to know the rate at which the cpu buffer is filling up. This patch adds an entry "bytes: " in printed stats for per_cpu ring buffer which provides the actual bytes consumed in the ring buffer. This field includes the number of bytes used by recorded events and the padding bytes added when moving the tail pointer to next page. It also adds the following time stamps: "oldest event ts:" - the oldest timestamp in the ring buffer "now ts:" - the timestamp at the time of reading The field "now ts" provides a consistent time snapshot to the userspace when being read. This is read from the same trace clock used by tracing event timestamps. Together, these values provide the rate at which the buffer is filling up, from the formula: bytes / (now_ts - oldest_event_ts) Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com> Cc: Michael Rubin <mrubin@google.com> Cc: David Sharp <dhsharp@google.com> Link: http://lkml.kernel.org/r/1313531179-9323-3-git-send-email-vnagarnaik@google.com Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
f81ab074c30234b07c8309c542cafd07bed721f7	16-Aug-2011	Vaibhav Nagarnaik <vnagarnaik@google.com>	trace: Add a new readonly entry to report total buffer size The current file "buffer_size_kb" reports the size of per-cpu buffer and not the overall memory allocated which could be misleading. A new file "buffer_total_size_kb" adds up all the enabled CPU buffer sizes and reports it. This is only a readonly entry. Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com> Cc: Michael Rubin <mrubin@google.com> Cc: David Sharp <dhsharp@google.com> Link: http://lkml.kernel.org/r/1313531179-9323-2-git-send-email-vnagarnaik@google.com Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
4a9bd3f134decd6d16ead8d288342d57aad486be	14-Jul-2011	Steven Rostedt <srostedt@redhat.com>	tracing: Have dynamic size event stack traces Currently the stack trace per event in ftace is only 8 frames. This can be quite limiting and sometimes useless. Especially when the "ignore frames" is wrong and we also use up stack frames for the event processing itself. Change this to be dynamic by adding a percpu buffer that we can write a large stack frame into and then copy into the ring buffer. For interrupts and NMIs that come in while another event is being process, will only get to use the 8 frame stack. That should be enough as the task that it interrupted will have the full stack frame anyway. Requested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
1fd8df2c3970c9e7e4e262354154ee39e58bdd7c	08-Jun-2011	Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>	tracing/kprobes: Fix kprobe-tracer to support stack trace Fix to support kernel stack trace correctly on kprobe-tracer. Since the execution path of kprobe-based dynamic events is different from other tracepoint-based events, normal ftrace_trace_stack() doesn't work correctly. To fix that, this introduces ftrace_trace_stack_regs() which traces stack via pt_regs instead of current stack register. e.g. # echo p schedule+4 > /sys/kernel/debug/tracing/kprobe_events # echo 1 > /sys/kernel/debug/tracing/options/stacktrace # echo 1 > /sys/kernel/debug/tracing/events/kprobes/enable # head -n 20 /sys/kernel/debug/tracing/trace bash-2968 [000] 10297.050245: p_schedule_4: (schedule+0x4/0x4ca) bash-2968 [000] 10297.050247: <stack trace> => schedule_timeout => n_tty_read => tty_read => vfs_read => sys_read => system_call_fastpath kworker/0:1-2940 [000] 10297.050265: p_schedule_4: (schedule+0x4/0x4ca) kworker/0:1-2940 [000] 10297.050266: <stack trace> => worker_thread => kthread => kernel_thread_helper sshd-1132 [000] 10297.050365: p_schedule_4: (schedule+0x4/0x4ca) sshd-1132 [000] 10297.050365: <stack trace> => sysret_careful Note: Even with this fix, the first entry will be skipped if the probe is put on the function entry area before the frame pointer is set up (usually, that is 4 bytes (push %bp; mov %sp %bp) on x86), because stack unwinder depends on the frame pointer. Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: yrl.pp-manager.tt@hitachi.com Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Namhyung Kim <namhyung@gmail.com> Link: http://lkml.kernel.org/r/20110608070934.17777.17116.stgit@fedora15 Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
22fe9b54d859e53bfbbbdc1a0a77a82bc453927c	07-Jun-2011	Peter Huewe <peterhuewe@gmx.de>	tracing: Convert to kstrtoul_from_user This patch replaces the code for getting an unsigned long from a userspace buffer by a simple call to kstroul_from_user. This makes it easier to read and less error prone. Signed-off-by: Peter Huewe <peterhuewe@gmx.de> Link: http://lkml.kernel.org/r/1307476707-14762-1-git-send-email-peterhuewe@gmx.de Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
f56e7f8efb4ec200364f690a9902713410e24d47	03-Jun-2011	Jiri Olsa <jolsa@redhat.com>	tracing, function: Fix trace header to follow context-info option The header display of function tracer does not follow the context-info option, so field names are displayed even if this option is off. Added check for TRACE_ITER_CONTEXT_INFO trace_flags. With following commands: # echo function > ./current_tracer # echo 0 > options/context-info # cat trace This is what it looked like before: # tracer: function # # TASK-PID CPU# TIMESTAMP FUNCTION # \| \| \| \| \| add_preempt_count <-schedule rcu_note_context_switch <-schedule ... This is what it looks like now: # tracer: function # _raw_spin_unlock_irqrestore <-hrtimer_try_to_cancel ... Signed-off-by: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1307113131-10045-4-git-send-email-jolsa@redhat.com Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
cf30cf67d6c7592c670ec946d89fc15ee0deb0eb	15-Jun-2011	Steven Rostedt <srostedt@redhat.com>	tracing: Add disable_on_free option Add a trace option to disable tracing on free. When this option is set, a write into the free_buffer file will not only shrink the ring buffer down to zero, but it will also disable tracing. Cc: Vaibhav Nagarnaik <vnagarnaik@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
4f271a2a60c748599b30bb4dafff30d770439b96	14-Jun-2011	Vaibhav Nagarnaik <vnagarnaik@google.com>	tracing: Add a proc file to stop tracing and free buffer The proc file entry buffer_size_kb is used to set the size of tracing buffer. The memory to expand the buffer size is kernel memory. Consider a use case where tracing is handled by a user space utility, which acts as a gate keeper for tracing requests. In an OOM condition, tracing is considered a low priority task and if the utility gets killed the ring buffer memory cannot be released back to the kernel. This patch adds a proc file called "free_buffer" whose purpose is to stop tracing and free up the ring buffer when it is closed. The user space process can then set the desired size in buffer_size_kb file and open the fd to the "free_buffer" file. Under OOM condition, if the process gets killed, the kernel closes the file descriptor. The release handler stops the tracing and releases the kernel memory automatically. Cc: Ingo Molnar <mingo@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Michael Rubin <mrubin@google.com> Cc: David Sharp <dhsharp@google.com> Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com> Link: http://lkml.kernel.org/r/1308012717-11148-1-git-send-email-vnagarnaik@google.com Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
7ea5906405a1f3fc1c0033dfd7e02f2cfd1de5e5	04-May-2011	Vaibhav Nagarnaik <vnagarnaik@google.com>	tracing: Use NUMA allocation for per-cpu ring buffer pages The tracing ring buffer is a group of per-cpu ring buffers where allocation and logging is done on a per-cpu basis. The events that are generated on a particular CPU are logged in the corresponding buffer. This is to provide wait-free writes between CPUs and good NUMA node locality while accessing the ring buffer. However, the allocation routines consider NUMA locality only for buffer page metadata and not for the actual buffer page. This causes the pages to be allocated on the NUMA node local to the CPU where the allocation routine is running at the time. This patch fixes the problem by using a NUMA node specific allocation routine so that the pages are allocated from a NUMA node local to the logging CPU. I tested with the getuid_microbench from autotest. It is a simple binary that calls getuid() in a loop and measures the average time for the syscall to complete. The following command was used to test: $ getuid_microbench 1000000 Compared the numbers found on kernel with and without this patch and found that logging latency decreases by 30-50 ns/call. tracing with non-NUMA allocation - 569 ns/call tracing with NUMA allocation - 512 ns/call Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Michael Rubin <mrubin@google.com> Cc: David Sharp <dhsharp@google.com> Link: http://lkml.kernel.org/r/1304470602-20366-1-git-send-email-vnagarnaik@google.com Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
e7e2ee89a9dbf48d70a922d5625cd7320a27cbff	10-May-2011	Vaibhav Nagarnaik <vnagarnaik@google.com>	tracing: Schedule a delayed work to call wakeup() In using syscall tracing by concurrent processes, the wakeup() that is called in the event commit function causes contention on the spin lock of the waitqueue. I enabled sys_enter_getuid and sys_exit_getuid tracepoints, and by running getuid_microbench from autotest in parallel I found that the contention causes exponential latency increase in the tracing path. The autotest binary getuid_microbench calls getuid() in a tight loop for the given number of iterations and measures the average time required to complete a single invocation of syscall. The patch schedules a delayed work after 2 ms once an event commit calls to wake up the trace wait_queue. This removes the delay caused by contention on spin lock in wakeup() and amortizes the wakeup() calls scheduled over the 2 ms period. In the following example, the script enables the sys_enter_getuid and sys_exit_getuid tracepoints and runs the getuid_microbench in parallel with the given number of processes. The output clearly shows the latency increase caused by contentions. $ ~/getuid.sh 1 1000000 calls in 0.720974253 s (720.974253 ns/call) $ ~/getuid.sh 2 1000000 calls in 1.166457554 s (1166.457554 ns/call) 1000000 calls in 1.168933765 s (1168.933765 ns/call) $ ~/getuid.sh 3 1000000 calls in 1.783827516 s (1783.827516 ns/call) 1000000 calls in 1.795553270 s (1795.553270 ns/call) 1000000 calls in 1.796493376 s (1796.493376 ns/call) $ ~/getuid.sh 4 1000000 calls in 4.483041796 s (4483.041796 ns/call) 1000000 calls in 4.484165388 s (4484.165388 ns/call) 1000000 calls in 4.484850762 s (4484.850762 ns/call) 1000000 calls in 4.485643576 s (4485.643576 ns/call) $ ~/getuid.sh 5 1000000 calls in 6.497521653 s (6497.521653 ns/call) 1000000 calls in 6.502000236 s (6502.000236 ns/call) 1000000 calls in 6.501709115 s (6501.709115 ns/call) 1000000 calls in 6.502124100 s (6502.124100 ns/call) 1000000 calls in 6.502936358 s (6502.936358 ns/call) After the patch, the latencies scale better. 1000000 calls in 0.728720455 s (728.720455 ns/call) 1000000 calls in 0.842782857 s (842.782857 ns/call) 1000000 calls in 0.883803135 s (883.803135 ns/call) 1000000 calls in 0.902077764 s (902.077764 ns/call) 1000000 calls in 0.902838202 s (902.838202 ns/call) 1000000 calls in 0.908896885 s (908.896885 ns/call) 1000000 calls in 0.932523515 s (932.523515 ns/call) 1000000 calls in 0.958009672 s (958.009672 ns/call) 1000000 calls in 0.986188020 s (986.188020 ns/call) 1000000 calls in 0.989771102 s (989.771102 ns/call) 1000000 calls in 0.933518391 s (933.518391 ns/call) 1000000 calls in 0.958897947 s (958.897947 ns/call) 1000000 calls in 1.031038897 s (1031.038897 ns/call) 1000000 calls in 1.089516025 s (1089.516025 ns/call) 1000000 calls in 1.141998347 s (1141.998347 ns/call) Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Michael Rubin <mrubin@google.com> Cc: David Sharp <dhsharp@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1305059241-7629-1-git-send-email-vnagarnaik@google.com Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
a3a4a5acd3bd2f6f1e102e1f1b9d2e2bb320a7fd	06-May-2011	Arjan van de Ven <arjan@linux.intel.com>	Regression: partial revert "tracing: Remove lock_depth from event entry" This partially reverts commit e6e1e2593592a8f6f6380496655d8c6f67431266. That commit changed the structure layout of the trace structure, which in turn broke PowerTOP (1.9x generation) quite badly. I appreciate not wanting to expose the variable in question, and PowerTOP was not using it, so I've replaced the variable with just a padding field - that way if in the future a new field is needed it can just use this padding field. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
ee5e51f51be755830f57445e268ba50e88ccbdbb	25-Mar-2011	Jiri Olsa <jolsa@redhat.com>	tracing: Avoid soft lockup in trace_pipe running following commands: # enable the binary option echo 1 > ./options/bin # disable context info option echo 0 > ./options/context-info # tracing only events echo 1 > ./events/enable cat trace_pipe plus forcing system to generate many tracing events, is causing lockup (in NON preemptive kernels) inside tracing_read_pipe function. The issue is also easily reproduced by running ltp stress test. (ftrace_stress_test.sh) The reasons are: - bin/hex/raw output functions for events are set to trace_nop_print function, which prints nothing and returns TRACE_TYPE_HANDLED value - LOST EVENT trace do not handle trace_seq overflow These reasons force the while loop in tracing_read_pipe function never to break. The attached patch fixies handling of lost event trace, and changes trace_nop_print to print minimal info, which is needed for the correct tracing_read_pipe processing. v2 changes: - omit the cond_resched changes by trace_nop_print changes - WARN changed to WARN_ONCE and added info to be able to find out the culprit v3 changes: - make more accurate patch comment Signed-off-by: Jiri Olsa <jolsa@redhat.com> LKML-Reference: <20110325110518.GC1922@jolsa.brq.redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
25985edcedea6396277003854657b5f3cb31a628	31-Mar-2011	Lucas De Marchi <lucas.demarchi@profusion.mobi>	Fix common misspellings Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
4a0b1665db09cf2da9ad7d0f12da386373c10bfa	10-Mar-2011	Steven Rostedt <srostedt@redhat.com>	tracing: Fix irqoff selftest expanding max buffer If the kernel command line declares a tracer "ftrace=sometracer" and that tracer is either not defined or is enabled after irqsoff, then the irqs off selftest will fail with the following error: Testing tracer irqsoff: ------------[ cut here ]------------ WARNING: at /home/rostedt/work/autotest/nobackup/linux-test.git/kernel/trace/tra ce.c:713 update_max_tr_single+0xfa/0x11b() Hardware name: Modules linked in: Pid: 1, comm: swapper Not tainted 2.6.38-rc8-test #1 Call Trace: [<c0441d9d>] ? warn_slowpath_common+0x65/0x7a [<c049adb2>] ? update_max_tr_single+0xfa/0x11b [<c0441dc1>] ? warn_slowpath_null+0xf/0x13 [<c049adb2>] ? update_max_tr_single+0xfa/0x11b [<c049e454>] ? stop_critical_timing+0x154/0x204 [<c049b54b>] ? trace_selftest_startup_irqsoff+0x5b/0xc1 [<c049b54b>] ? trace_selftest_startup_irqsoff+0x5b/0xc1 [<c049b54b>] ? trace_selftest_startup_irqsoff+0x5b/0xc1 [<c049e529>] ? time_hardirqs_on+0x25/0x28 [<c0468bca>] ? trace_hardirqs_on_caller+0x18/0x12f [<c0468cec>] ? trace_hardirqs_on+0xb/0xd [<c049b54b>] ? trace_selftest_startup_irqsoff+0x5b/0xc1 [<c049b6b8>] ? register_tracer+0xf8/0x1a3 [<c14e93fe>] ? init_irqsoff_tracer+0xd/0x11 [<c040115e>] ? do_one_initcall+0x71/0x121 [<c14e93f1>] ? init_irqsoff_tracer+0x0/0x11 [<c14ce3a9>] ? kernel_init+0x13a/0x1b6 [<c14ce26f>] ? kernel_init+0x0/0x1b6 [<c0403842>] ? kernel_thread_helper+0x6/0x10 ---[ end trace e93713a9d40cd06c ]--- .. no entries found ..FAILED! What happens is the "ftrace=..." will expand the ring buffer to its default size (from its minimum size) but it will not expand the max ring buffer (the ring buffer to store maximum latencies). When the irqsoff test runs, it will call the ring buffer swap routine that checks if the max ring buffer is the same size as the normal ring buffer, and will fail if it is not. This causes the test to fail. The solution is to expand the max ring buffer before running the self test if the max ring buffer is used by that tracer and the normal ring buffer is expanded. The max ring buffer should be shrunk again after the test is done to save space. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
e6e1e2593592a8f6f6380496655d8c6f67431266	09-Mar-2011	Steven Rostedt <srostedt@redhat.com>	tracing: Remove lock_depth from event entry The lock_depth field in the event headers was added as a temporary data point for help in removing the BKL. Now that the BKL is pretty much been removed, we can remove this field. This in turn changes the header from 12 bytes to 8 bytes, removing the 4 byte buffer that gcc would insert if the first field in the data load was 8 bytes in size. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
750912fa366312e9c5bc83eab352898a26750401	08-Dec-2010	David Sharp <dhsharp@google.com>	tracing: Add an 'overwrite' trace_option. Add an "overwrite" trace_option for ftrace to control whether the buffer should be overwritten on overflow or not. The default remains to overwrite old events when the buffer is full. This patch adds the option to instead discard newest events when the buffer is full. This is useful to get a snapshot of traces just after enabling traces. Dropping the current event is also a simpler code path. Signed-off-by: David Sharp <dhsharp@google.com> LKML-Reference: <1291844807-15481-1-git-send-email-dhsharp@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
6752ab4a9c30d5411b2dfdb251a3f1cb18aae487	08-Feb-2011	Steven Rostedt <srostedt@redhat.com>	tracing: Deprecate tracing_enabled for tracing_on tracing_enabled should not be used, it is heavy weight and does not do much in helping lower the overhead. tracing_on should be used instead. Warn users to use tracing_on when tracing_enabled is used as it will soon be removed from the tracing directory. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
1dbd1951f39e13da579ffe879cce19586d0462de	09-Dec-2010	Li Zefan <lizf@cn.fujitsu.com>	tracing: Fix preempt count leak While running my ftrace stress test, this showed up: BUG: sleeping function called from invalid context at mm/mmap.c:233 ... note: cat[3293] exited with preempt_count 1 The bug was introduced by commit 91e86e560d0b3ce4c5fc64fd2bbb99f856a30a4e ("tracing: Fix recursive user stack trace") Cc: <stable@kernel.org> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> LKML-Reference: <4D0089AC.1020802@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
364829b1263b44aa60383824e4c1289d83d78ca7	25-Nov-2010	Slava Pestov <slavapestov@google.com>	tracing: Fix panic when lseek() called on "trace" opened for writing The file_ops struct for the "trace" special file defined llseek as seq_lseek(). However, if the file was opened for writing only, seq_open() was not called, and the seek would dereference a null pointer, file->private_data. This patch introduces a new wrapper for seq_lseek() which checks if the file descriptor is opened for reading first. If not, it does nothing. Cc: <stable@kernel.org> Signed-off-by: Slava Pestov <slavapestov@google.com> LKML-Reference: <1290640396-24179-1-git-send-email-slavapestov@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
451a3c24b0135bce54542009b5fde43846c7cf67	17-Nov-2010	Arnd Bergmann <arnd@arndb.de>	BKL: remove extraneous #include <smp_lock.h> The big kernel lock has been removed from all these files at some point, leaving only the #include. Remove this too as a cleanup. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
91e86e560d0b3ce4c5fc64fd2bbb99f856a30a4e	10-Nov-2010	Steven Rostedt <srostedt@redhat.com>	tracing: Fix recursive user stack trace The user stack trace can fault when examining the trace. Which would call the do_page_fault handler, which would trace again, which would do the user stack trace, which would fault and call do_page_fault again ... Thus this is causing a recursive bug. We need to have a recursion detector here. [ Resubmitted by Jiri Olsa ] [ Eric Dumazet recommended using __this_cpu_* instead of __get_cpu_* ] Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Jiri Olsa <jolsa@redhat.com> LKML-Reference: <1289390172-9730-3-git-send-email-jolsa@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
dd49a38cf30944be27892c10b1c0e5b3fa73bcb2	21-Oct-2010	Steven Rostedt <srostedt@redhat.com>	tracing: Do not limit the size of the number of CPU buffers The tracing per_cpu buffers were limited to 999 CPUs for a mear savings in stack space of a char array. Up the array to 30 characters which is more than enough to hold a 64 bit number. Reported-by: Robin Holt <holt@sgi.com> Suggested-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
907f27840985fe6a0c62e43cd4702c6e04b4bcc7	28-Sep-2010	matt mooney <mfm@muteddisk.com>	tracing/trivial: Remove cast from void* Unnecessary cast from void* in assignment. Signed-off-by: matt mooney <mfm@muteddisk.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
1aa54bca6ee0d07ebcafb8ca8074b624d80724aa	28-Jul-2010	Marcin Slusarz <marcin.slusarz@gmail.com>	tracing: Sanitize value returned from write(trace_marker, "...", len) When userspace code writes non-new-line-terminated string to trace_marker file, write handler appends new-line and returns number of bytes written to trace buffer, so write(fd, "abc", 3) will return 4 That's unexpected and unfortunately it confuses glibc's fprintf function. Example: int main() { fprintf(stderr, "abc"); return 0; } $ gcc test.c -o test $ echo mmiotrace > /sys/kernel/debug/tracing/current_tracer $ ./test 2>/sys/kernel/debug/tracing/trace_marker results in infinite loop: write(fd, "abc", 3) = 4 write(fd, "", 1) = 0 write(fd, "", 1) = 0 write(fd, "", 1) = 0 write(fd, "", 1) = 0 write(fd, "", 1) = 0 write(fd, "", 1) = 0 write(fd, "", 1) = 0 (...) ...and kernel trace buffer full of empty markers. Fix it by sanitizing write return value. Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com> LKML-Reference: <20100727231801.GB2826@joi.lan> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
955b61e597984745fb7d34c75708f6503b6aaeab	05-Aug-2010	Jason Wessel <jason.wessel@windriver.com>	ftrace,kdb: Extend kdb to be able to dump the ftrace buffer Add in a helper function to allow the kdb shell to dump the ftrace buffer. Modify trace.c to expose the capability to iterate over the ftrace buffer in a read only capacity. Signed-off-by: Jason Wessel <jason.wessel@windriver.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> CC: Frederic Weisbecker <fweisbec@gmail.com>
24a461d537f49f9da6533d83100999ea08c6c755	10-Jul-2010	Dan Carpenter <error27@gmail.com>	trace: strlen() return doesn't account for the NULL We need to add one to the strlen() return because of the NULL character. The type->name here generally comes from the kernel and I don't think any of them come close to being MAX_TRACER_SIZE (100) characters long so this is basically a cleanup. Signed-off-by: Dan Carpenter <error27@gmail.com> LKML-Reference: <20100710100644.GV19184@bicker> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
ef710e100c1068d3dd5774d2b34c5485219e06ce	01-Jul-2010	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>	tracing: Shrink max latency ringbuffer if unnecessary Documentation/trace/ftrace.txt says buffer_size_kb: This sets or displays the number of kilobytes each CPU buffer can hold. The tracer buffers are the same size for each CPU. The displayed number is the size of the CPU buffer and not total size of all buffers. The trace buffers are allocated in pages (blocks of memory that the kernel uses for allocation, usually 4 KB in size). If the last page allocated has room for more bytes than requested, the rest of the page will be used, making the actual allocation bigger than requested. ( Note, the size may not be a multiple of the page size due to buffer management overhead. ) This can only be updated when the current_tracer is set to "nop". But it's incorrect. currently total memory consumption is 'buffer_size_kb x CPUs x 2'. Why two times difference is there? because ftrace implicitly allocate the buffer for max latency too. That makes sad result when admin want to use large buffer. (If admin want full logging and makes detail analysis). example, If admin have 24 CPUs machine and write 200MB to buffer_size_kb, the system consume ~10GB memory (200MB x 24 x 2). umm.. 5GB memory waste is usually unacceptable. Fortunatelly, almost all users don't use max latency feature. The max latency buffer can be disabled easily. This patch shrink buffer size of the max latency buffer if unnecessary. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> LKML-Reference: <20100701104554.DA2D.A69D9226@jp.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
e870e9a1240bcef1157ffaaf71dac63362e71904	02-Jul-2010	Li Zefan <lizf@cn.fujitsu.com>	tracing: Allow to disable cmdline recording We found that even enabling a single trace event that will rarely be triggered can add big overhead to context switch. (lmbench context switch test) ------------------------------------------------- 2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K ctxsw ctxsw ctxsw ctxsw ctxsw ctxsw ctxsw ------ ------ ------ ------ ------ ------- ------- 2.19 2.3 2.21 2.56 2.13 2.54 2.07 2.39 2.51 2.35 2.75 2.27 2.81 2.24 The overhead is 6% ~ 11%. It's because when a trace event is enabled 3 tracepoints (sched_switch, sched_wakeup, sched_wakeup_new) will be activated to map pid to cmdname. We'd like to avoid this overhead, so add a trace option '(no)record-cmd' to allow to disable cmdline recording. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> LKML-Reference: <4C2D57F4.2050204@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
b444786f1a797a7f84e2561346a670649f9c7b3c	07-Jul-2010	Arnd Bergmann <arnd@arndb.de>	tracing: Use generic_file_llseek for debugfs The default for llseek will change to no_llseek, so the tracing debugfs files need to add explicit .llseek assignments. Since we're dealing with regular files from a VFS perspective, use generic_file_llseek. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: John Kacur <jkacur@redhat.com> Cc: Li Zefan <lizf@cn.fujitsu.com> LKML-Reference: <1278538820-1392-10-git-send-email-arnd@arndb.de> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
eb7beb5c09af75494234ea6acd09d0a647cf7338	16-Jul-2010	Frederic Weisbecker <fweisbec@gmail.com>	tracing: Remove special traces Special traces type was only used by sysprof. Lets remove it now that sysprof ftrace plugin has been dropped. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Soeren Sandmann <sandmann@daimi.au.dk> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Li Zefan <lizf@cn.fujitsu.com>
f376bf5ffbad863d4bc3b2586b7e34cdf756ad17	16-Jul-2010	Frederic Weisbecker <fweisbec@gmail.com>	tracing: Remove sysprof ftrace plugin The sysprof ftrace plugin doesn't seem to be seriously used somewhere. There is a branch in the sysprof tree that makes an interface to it, but the real sysprof tool uses either its own module or perf events. Drop the sysprof ftrace plugin then, as it's mostly useless. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Soeren Sandmann <sandmann@daimi.au.dk> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Ingo Molnar <mingo@elte.hu> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Li Zefan <lizf@cn.fujitsu.com>
5e3d20a68f63fc5a310687d81956c3b96e488b84	04-Jul-2010	Arnd Bergmann <arnd@arndb.de>	init: Remove the BKL from startup code I have shown by code review that no driver takes the BKL at init time any more, so whatever the init code was locking against is no longer there and it is now safe to remove the BKL there. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Steven Rostedt <rostedt@goodmis> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
30dbb20e68e6f7df974b77d2350ebad5eb6f6c9e	26-May-2010	Américo Wang <xiyou.wangcong@gmail.com>	tracing: Remove boot tracer The boot tracer is useless. It simply logs the initcalls but in fact these initcalls are also logged through printk while using the initcall_debug kernel parameter. Nobody seem to be using it so far. Then just remove it. Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com> Cc: Chase Douglas <chase.douglas@canonical.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Li Zefan <lizf@cn.fujitsu.com> LKML-Reference: <20100526105753.GA5677@cr0.nay.redhat.com> [ remove the hooks in main.c, and the headers ] Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
5168ae50a66e3ff7184c2b16d661bd6d70367e50	03-Jun-2010	Steven Rostedt <srostedt@redhat.com>	tracing: Remove ftrace_preempt_disable/enable The ftrace_preempt_disable/enable functions were to address a recursive race caused by the function tracer. The function tracer traces all functions which makes it easily susceptible to recursion. One area was preempt_enable(). This would call the scheduler and the schedulre would call the function tracer and loop. (So was it thought). The ftrace_preempt_disable/enable was made to protect against recursion inside the scheduler by storing the NEED_RESCHED flag. If it was set before the ftrace_preempt_disable() it would not call schedule on ftrace_preempt_enable(), thinking that if it was set before then it would have already scheduled unless it was already in the scheduler. This worked fine except in the case of SMP, where another task would set the NEED_RESCHED flag for a task on another CPU, and then kick off an IPI to trigger it. This could cause the NEED_RESCHED to be saved at ftrace_preempt_disable() but the IPI to arrive in the the preempt disabled section. The ftrace_preempt_enable() would not call the scheduler because the flag was already set before entring the section. This bug would cause a missed preemption check and cause lower latencies. Investigating further, I found that the recusion caused by the function tracer was not due to schedule(), but due to preempt_schedule(). Now that preempt_schedule is completely annotated with notrace, the recusion no longer is an issue. Reported-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2711ca237a084286ea1c2dcf82ab2aadab23a00d	21-May-2010	Steven Rostedt <srostedt@redhat.com>	ring-buffer: Move zeroing out excess in page to ring buffer code Currently the trace splice code zeros out the excess bytes in the page before sending it off to userspace. This is to make sure userspace is not getting anything it should not be when reading the pages, because the excess data was never initialized to zero before writing (for perfomance reasons). But the splice code has no business in doing this work, it should be done by the ring buffer. With the latest changes for recording lost events, the splice code gets it wrong anyway. Move the zeroing out of excess bytes into the ring buffer code. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
35f3d14dbbc58447c61e38a162ea10add6b31dc7	20-May-2010	Jens Axboe <jens.axboe@oracle.com>	pipe: add support for shrinking and growing pipes This patch adds F_GETPIPE_SZ and F_SETPIPE_SZ fcntl() actions for growing and shrinking the size of a pipe and adjusts pipe.c and splice.c (and relay and network splice) usage to work with these larger (or smaller) pipes. Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
a9a5776380208a3e48a92d0c763ee1a3b486fb73	23-Apr-2010	Steven Rostedt <srostedt@redhat.com>	tracing: Allow events to share their print functions Multiple events may use the same method to print their data. Instead of having all events have a pointer to their print funtions, the trace_event structure now points to a trace_event_functions structure that will hold the way to print ouf the event. The event itself is now passed to the print function to let the print function know what kind of event it should print. This opens the door to consolidating the way several events print their output. text data bss dec hex filename 4913961 1088356 861512 6863829 68bbd5 vmlinux.orig 4900382 1048964 861512 6810858 67ecea vmlinux.init 4900446 1049028 861512 6810986 67ed6a vmlinux.preprint This change slightly increases the size but is needed for the next change. v3: Fix the branch tracer events to handle this change. v2: Fix the new function graph tracer event calls to handle this change. Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: Masami Hiramatsu <mhiramat@redhat.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
72c9ddfd4c5bf54ef03cfdf57026416cb678eeba	21-Apr-2010	David Miller <davem@davemloft.net>	ring-buffer: Make non-consuming read less expensive with lots of cpus. When performing a non-consuming read, a synchronize_sched() is performed once for every cpu which is actively tracing. This is very expensive, and can make it take several seconds to open up the 'trace' file with lots of cpus. Only one synchronize_sched() call is actually necessary. What is desired is for all cpus to see the disabling state change. So we transform the existing sequence: for_each_cpu() { ring_buffer_read_start(); } where each ring_buffer_start() call performs a synchronize_sched(), into the following: for_each_cpu() { ring_buffer_read_prepare(); } ring_buffer_read_prepare_sync(); for_each_cpu() { ring_buffer_read_start(); } wherein only the single ring_buffer_read_prepare_sync() call needs to do the synchronize_sched(). The first phase, via ring_buffer_read_prepare(), allocates the 'iter' memory and increments ->record_disabled. In the second phase, ring_buffer_read_prepare_sync() makes sure this ->record_disabled state is visible fully to all cpus. And in the final third phase, the ring_buffer_read_start() calls reset the 'iter' objects allocated in the first phase since we now know that none of the cpus are adding trace entries any more. This makes openning the 'trace' file nearly instantaneous on a sparc64 Niagara2 box with 128 cpus tracing. Signed-off-by: David S. Miller <davem@davemloft.net> LKML-Reference: <20100420.154711.11246950.davem@davemloft.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
62b915f1060996a8e1f69be50e3b8e9e43b710cb	02-Apr-2010	Jiri Olsa <jolsa@redhat.com>	tracing: Add graph output support for irqsoff tracer Add function graph output to irqsoff tracer. The graph output is enabled by setting new 'display-graph' trace option. Signed-off-by: Jiri Olsa <jolsa@redhat.com> LKML-Reference: <1270227683-14631-4-git-send-email-jolsa@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
cecbca96da387428e220e307a9c945e37e2f4d9e	18-Apr-2010	Frederic Weisbecker <fweisbec@gmail.com>	tracing: Dump either the oops's cpu source or all cpus buffers The ftrace_dump_on_oops kernel parameter, sysctl and sysrq let one dump every cpu buffers when an oops or panic happens. It's nice when you have few cpus but it may take ages if have many, plus you miss the real origin of the problem in all the cpu traces. Sometimes, all you need is to dump the cpu buffer that triggered the opps, most of the time it is our main interest. This patch modifies ftrace_dump_on_oops to handle this choice. The ftrace_dump_on_oops kernel parameter, when it comes alone, has the same behaviour than before. But ftrace_dump_on_oops=orig_cpu will only dump the buffer of the cpu that oops'ed. Similarly, sysctl kernel.ftrace_dump_on_oops=1 and echo 1 > /proc/sys/kernel/ftrace_dump_on_oops keep their previous behaviour. But setting 2 jumps into cpu origin dump mode. v2: Fix double setup v3: Fix spelling issues reported by Randy Dunlap v4: Also update __ftrace_dump in the selftests Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
aa27497c2fb4c7f57706099bd489e683e5cc3e3b	05-Apr-2010	Lai Jiangshan <laijs@cn.fujitsu.com>	tracing: Fix uninitialized variable of tracing/trace output Because a local variable is not initialized, I got these when I did 'cat tracing/trace'. (not trace_pipe): CPU:0 [LOST 18446744071579453134 EVENTS] ps-3099 [000] 560.770221: lock_acquire: ffff880030865010 &(&dentry->d_lock)->rlock CPU:0 [LOST 18446744071579453134 EVENTS] ps-3099 [000] 560.770221: lock_release: ffff880030865010 &(&dentry->d_lock)->rlock CPU:0 [LOST 18446612133255294080 EVENTS] ps-3099 [000] 560.770221: lock_acquire: ffff880030865010 &(&dentry->d_lock)->rlock CPU:0 [LOST 18446744071579453134 EVENTS] ps-3099 [000] 560.770222: lock_release: ffff880030865010 &(&dentry->d_lock)->rlock CPU:0 [LOST 18446744071579453134 EVENTS] ps-3099 [000] 560.770222: lock_release: ffffffff816cfb98 dcache_lock See peek_next_entry(), it does not set *lost_events when we 'cat tracing/trace' Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> LKML-Reference: <4BB9A929.2000303@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
bc21b478425ac73f66a5ec0b375a5e0d12d609ce	01-Apr-2010	Steven Rostedt <srostedt@redhat.com>	tracing: Show the lost events in the trace_pipe output Now that the ring buffer can keep track of where events are lost. Use this information to the output of trace_pipe: hackbench-3588 [001] 1326.701660: lock_acquire: ffffffff816591e0 read rcu_read_lock hackbench-3588 [001] 1326.701661: lock_acquire: ffff88003f4091f0 &(&dentry->d_lock)->rlock hackbench-3588 [001] 1326.701664: lock_release: ffff88003f4091f0 &(&dentry->d_lock)->rlock CPU:1 [LOST 673 EVENTS] hackbench-3588 [001] 1326.702711: kmem_cache_free: call_site=ffffffff81102b85 ptr=ffff880026d96738 hackbench-3588 [001] 1326.702712: lock_release: ffff88003e1480a8 &mm->mmap_sem hackbench-3588 [001] 1326.702713: lock_acquire: ffff88003e1480a8 &mm->mmap_sem Even works with the function graph tracer: 2) ! 170.098 us \| } 2) 4.036 us \| rcu_irq_exit(); 2) 3.657 us \| idle_cpu(); 2) ! 190.301 us \| } CPU:2 [LOST 2196 EVENTS] 2) 0.853 us \| } /* cancel_dirty_page */ 2) \| remove_from_page_cache() { 2) 1.578 us \| _raw_spin_lock_irq(); 2) \| __remove_from_page_cache() { Note, it does not work with the iterator "trace" file, since it requires the use of consuming the page from the ring buffer to determine how many events were lost, which the iterator does not do. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
66a8cb95ed04025664d1db4e952155ee1dccd048	31-Mar-2010	Steven Rostedt <srostedt@redhat.com>	ring-buffer: Add place holder recording of dropped events Currently, when the ring buffer drops events, it does not record the fact that it did so. It does inform the writer that the event was dropped by returning a NULL event, but it does not put in any place holder where the event was dropped. This is not a trivial thing to add because the ring buffer mostly runs in overwrite (flight recorder) mode. That is, when the ring buffer is full, new data will overwrite old data. In a produce/consumer mode, where new data is simply dropped when the ring buffer is full, it is trivial to add the placeholder for dropped events. When there's more room to write new data, then a special event can be added to notify the reader about the dropped events. But in overwrite mode, any new write can overwrite events. A place holder can not be inserted into the ring buffer since there never may be room. A reader could also come in at anytime and miss the placeholder. Luckily, the way the ring buffer works, the read side can find out if events were lost or not, and how many events. Everytime a write takes place, if it overwrites the header page (the next read) it updates a "overrun" variable that keeps track of the number of lost events. When a reader swaps out a page from the ring buffer, it can record this number, perfom the swap, and then check to see if the number changed, and take the diff if it has, which would be the number of events dropped. This can be stored by the reader and returned to callers of the reader. Since the reader page swap will fail if the writer moved the head page since the time the reader page set up the swap, this gives room to record the overruns without worrying about races. If the reader sets up the pages, records the overrun, than performs the swap, if the swap succeeds, then the overrun variable has not been updated since the setup before the swap. For binary readers of the ring buffer, a flag is set in the header of each sub page (sub buffer) of the ring buffer. This flag is embedded in the size field of the data on the sub buffer, in the 31st bit (the size can be 32 or 64 bits depending on the architecture), but only 27 bits needs to be used for the actual size (less actually). We could add a new field in the sub buffer header to also record the number of events dropped since the last read, but this will change the format of the binary ring buffer a bit too much. Perhaps this change can be made if the information on the number of events dropped is considered important enough. Note, the notification of dropped events is only used by consuming reads or peeking at the ring buffer. Iterating over the ring buffer does not keep this information because the necessary data is only available when a page swap is made, and the iterator does not swap out pages. Cc: Robert Richter <robert.richter@amd.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: "Luis Claudio R. Goncalves" <lclaudio@uudg.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
5a0e3ad6af8660be21ca98a971cd00f331318c05	24-Mar-2010	Tejun Heo <tj@kernel.org>	include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
b6345879ccbd9b92864fbd7eb8ac48acdb4d6b15	13-Mar-2010	Steven Rostedt <srostedt@redhat.com>	tracing: Do not record user stack trace from NMI context A bug was found with Li Zefan's ftrace_stress_test that caused applications to segfault during the test. Placing a tracing_off() in the segfault code, and examining several traces, I found that the following was always the case. The lock tracer was enabled (lockdep being required) and userstack was enabled. Testing this out, I just enabled the two, but that was not good enough. I needed to run something else that could trigger it. Running a load like hackbench did not work, but executing a new program would. The following would trigger the segfault within seconds: # echo 1 > /debug/tracing/options/userstacktrace # echo 1 > /debug/tracing/events/lock/enable # while :; do ls > /dev/null ; done Enabling the function graph tracer and looking at what was happening I finally noticed that all cashes happened just after an NMI. 1) \| copy_user_handle_tail() { 1) \| bad_area_nosemaphore() { 1) \| __bad_area_nosemaphore() { 1) \| no_context() { 1) \| fixup_exception() { 1) 0.319 us \| search_exception_tables(); 1) 0.873 us \| } [...] 1) 0.314 us \| __rcu_read_unlock(); 1) 0.325 us \| native_apic_mem_write(); 1) 0.943 us \| } 1) 0.304 us \| rcu_nmi_exit(); [...] 1) 0.479 us \| find_vma(); 1) \| bad_area() { 1) \| __bad_area() { After capturing several traces of failures, all of them happened after an NMI. Curious about this, I added a trace_printk() to the NMI handler to read the regs->ip to see where the NMI happened. In which I found out it was here: ffffffff8135b660 <page_fault>: ffffffff8135b660: 48 83 ec 78 sub $0x78,%rsp ffffffff8135b664: e8 97 01 00 00 callq ffffffff8135b800 <error_entry> What was happening is that the NMI would happen at the place that a page fault occurred. It would call rcu_read_lock() which was traced by the lock events, and the user_stack_trace would run. This would trigger a page fault inside the NMI. I do not see where the CR2 register is saved or restored in NMI handling. This means that it would corrupt the page fault handling that the NMI interrupted. The reason the while loop of ls helped trigger the bug, was that each execution of ls would cause lots of pages to be faulted in, and increase the chances of the race happening. The simple solution is to not allow user stack traces in NMI context. After this patch, I ran the above "ls" test for a couple of hours without any issues. Without this patch, the bug would trigger in less than a minute. Cc: stable@kernel.org Reported-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
a2f8071428ed9a0f06865f417c962421c9a6b488	13-Mar-2010	Steven Rostedt <srostedt@redhat.com>	tracing: Disable buffer switching when starting or stopping trace When the trace iterator is read, tracing_start() and tracing_stop() is called to stop tracing while the iterator is processing the trace output. These functions disable both the standard buffer and the max latency buffer. But if the wakeup tracer is running, it can switch these buffers between the two disables: buffer = global_trace.buffer; if (buffer) ring_buffer_record_disable(buffer); <<<--------- swap happens here buffer = max_tr.buffer; if (buffer) ring_buffer_record_disable(buffer); What happens is that we disabled the same buffer twice. On tracing_start() we can enable the same buffer twice. All ring_buffer_record_disable() must be matched with a ring_buffer_record_enable() or the buffer can be disable permanently, or enable prematurely, and cause a bug where a reset happens while a trace is commiting. This patch protects these two by taking the ftrace_max_lock to prevent a switch from occurring. Found with Li Zefan's ftrace_stress_test. Cc: stable@kernel.org Reported-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
283740c619d211e34572cc93c8cdba92ccbdb9cc	13-Mar-2010	Steven Rostedt <srostedt@redhat.com>	tracing: Use same local variable when resetting the ring buffer In the ftrace code that resets the ring buffer it references the buffer with a local variable, but then uses the tr->buffer as the parameter to reset. If the wakeup tracer is running, which can switch the tr->buffer with the max saved buffer, this can break the requirement of disabling the buffer before the reset. buffer = tr->buffer; ring_buffer_record_disable(buffer); synchronize_sched(); __tracing_reset(tr->buffer, cpu); If the tr->buffer is swapped, then the reset is not happening to the buffer that was disabled. This will cause the ring buffer to fail. Found with Li Zefan's ftrace_stress_test. Cc: stable@kernel.org Reported-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
0e95017355dcf43031da6d0e360a748717e56df1	26-Feb-2010	Tim Bird <tim.bird@am.sony.com>	function-graph: Add tracing_thresh support to function_graph tracer Add support for tracing_thresh to the function_graph tracer. This version of this feature isolates the checks into new entry and return functions, to avoid adding more conditional code into the main function_graph paths. When the tracing_thresh is set and the function graph tracer is enabled, only the functions that took longer than the time in microseconds that was set in tracing_thresh are recorded. To do this efficiently, only the function exits are recorded: [tracing]# echo 100 > tracing_thresh [tracing]# echo function_graph > current_tracer [tracing]# cat trace # tracer: function_graph # # CPU DURATION FUNCTION CALLS # \| \| \| \| \| \| \| 1) ! 119.214 us \| } /* smp_apic_timer_interrupt / 1) <========== \| 0) ! 101.527 us \| } / __rcu_process_callbacks / 0) ! 126.461 us \| } / rcu_process_callbacks / 0) ! 145.111 us \| } / __do_softirq / 0) ! 149.667 us \| } / do_softirq / 0) ! 168.817 us \| } / irq_exit / 0) ! 248.254 us \| } / smp_apic_timer_interrupt */ Also, add support for specifying tracing_thresh on the kernel command line. When used like so: "tracing_thresh=200 ftrace=function_graph" this can be used to analyse system startup. It is important to disable tracing soon after boot, in order to avoid losing the trace data. Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Tim Bird <tim.bird@am.sony.com> LKML-Reference: <4B87098B.4040308@am.sony.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
1acaa1b2d9b5904c9cce06122990a2d71046ce16	05-Mar-2010	Arnaldo Carvalho de Melo <acme@redhat.com>	tracing: Update the comm field in the right variable in update_max_tr The latency output showed: # \| task: -3 (uid:0 nice:0 policy:1 rt_prio:99) The comm is missing in the "task:" and it looks like a minus 3 is the output. The correct display should be: # \| task: migration/0-3 (uid:0 nice:0 policy:1 rt_prio:99) The problem is that the comm is being stored in the wrong data structure. The max_tr.data[cpu] is what stores the comm, not the tr->data[cpu]. Before this patch the max_tr.data[cpu]->comm was zeroed and the /debug/trace ended up showing just the '-' sign followed by the pid. Also remove a needless initialization of max_data. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <1267824230-23861-1-git-send-email-acme@infradead.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
ac91d85456372a90af5b85eb6620fd2efb1e431b	02-Mar-2010	Lai Jiangshan <laijs@cn.fujitsu.com>	tracing: Fix warning in s_next of trace file ops This warning in s_next() can be triggered by lseek(): [<c018b3f7>] ? s_next+0x77/0x80 [<c013e3c1>] warn_slowpath_common+0x81/0xa0 [<c018b3f7>] ? s_next+0x77/0x80 [<c013e3fa>] warn_slowpath_null+0x1a/0x20 [<c018b3f7>] s_next+0x77/0x80 [<c01efa77>] traverse+0x117/0x200 [<c01eff13>] seq_lseek+0xa3/0x120 [<c01efe70>] ? seq_lseek+0x0/0x120 [<c01d7081>] vfs_llseek+0x41/0x50 [<c01d8116>] sys_llseek+0x66/0xa0 [<c0102bd0>] sysenter_do_call+0x12/0x26 The iterator "leftover" variable is zeroed in the opening of the trace file. But lseek can call s_start() which will call s_next() without reseting the "leftover" variable back to zero, which might trigger the WARN_ON_ONCE(iter->leftover) that is in s_next(). Cc: stable@kernel.org Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> LKML-Reference: <4B8CE06A.9090207@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
74bf4076f2ed79b5510440b72a561823a8852ec0	25-Jan-2010	Steven Rostedt <srostedt@redhat.com>	tracing: Prevent kernel oops with corrupted buffer If the contents of the ftrace ring buffer gets corrupted and the trace file is read, it could create a kernel oops (usualy just killing the user task thread). This is caused by the checking of the pid in the buffer. If the pid is negative, it still references the cmdline cache array, which could point to an invalid address. The simple fix is to test for negative PIDs. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
d931369b74b3d6f2044f595af6f3dd074f65d9cf	06-Jan-2010	Steven Rostedt <srostedt@redhat.com>	tracing: Add stack dump to trace_printk if stacktrace option is set If the ftrace stacktrace option is set, then add the stack dumps to trace_printk. Requested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
7e53bd42d14c75192b99674c40fcc359392da59d	06-Jan-2010	Lai Jiangshan <laijs@cn.fujitsu.com>	tracing: Consolidate protection of reader access to the ring buffer At the beginning, access to the ring buffer was fully serialized by trace_types_lock. Patch d7350c3f4569 gives more freedom to readers, and patch b04cc6b1f6 adds code to protect trace_pipe and cpu#/trace_pipe. But actually it is not enough, ring buffer readers are not always read-only, they may consume data. This patch makes accesses to trace, trace_pipe, trace_pipe_raw cpu#/trace, cpu#/trace_pipe and cpu#/trace_pipe_raw serialized. And removes tracing_reader_cpumask which is used to protect trace_pipe. Details: Ring buffer serializes readers, but it is low level protection. The validity of the events (which returns by ring_buffer_peek() ..etc) are not protected by ring buffer. The content of events may become garbage if we allow another process to consume these events concurrently: A) the page of the consumed events may become a normal page (not reader page) in ring buffer, and this page will be rewritten by the events producer. B) The page of the consumed events may become a page for splice_read, and this page will be returned to system. This patch adds trace_access_lock() and trace_access_unlock() primitives. These primitives allow multi process access to different cpu ring buffers concurrently. These primitives don't distinguish read-only and read-consume access. Multi read-only access is also serialized. And we don't use these primitives when we open files, we only use them when we read files. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> LKML-Reference: <4B447D52.1050602@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
c757bea93bea4b77ebd181cc6dca60c15e3b1a2c	22-Dec-2009	Steven Rostedt <srostedt@redhat.com>	tracing: Fix setting tracer specific options The function __set_tracer_option() takes as its last parameter a "neg" value. If set it should negate the value of the option. The trace_options_write() passed the value written to the file which is what the new value needs to be set as. But since this is not the negative, it never sets the value. Reported-by: Peter Zijlstra <peterz@infradead.org> Cc: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
28dfef8febe48f59cf1e7596e1992a6a1893ca24	16-Dec-2009	Alexey Dobriyan <adobriyan@gmail.com>	const: constify remaining pipe_buf_operations Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
e36c54582c6f14adc9e10473e2aec2cc4f0acc03	14-Dec-2009	Steven Rostedt <srostedt@redhat.com>	tracing: Fix return of trace_dump_stack() The trace_dump_stack() returned a value for a void function. Also, added the missing stub for trace_dump_stack() when tracing is not configured. Reported-by: Ingo Molnar <mingo@elte.hu> LKML-Reference: <20091214162713.GA31060@elte.hu> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
0199c4e68d1f02894bdefe4b5d9e9ee4aedd8d62	02-Dec-2009	Thomas Gleixner <tglx@linutronix.de>	locking: Convert __raw_spin* functions to arch_spin* Name space cleanup. No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: linux-arch@vger.kernel.org
edc35bd72e2079b25f99c5da7d7a65dbbffc4a26	03-Dec-2009	Thomas Gleixner <tglx@linutronix.de>	locking: Rename __RAW_SPIN_LOCK_UNLOCKED to __ARCH_SPIN_LOCK_UNLOCKED Further name space cleanup. No functional change Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: linux-arch@vger.kernel.org
445c89514be242b1b0080056d50bdc1b72adeb5c	02-Dec-2009	Thomas Gleixner <tglx@linutronix.de>	locking: Convert raw_spinlock to arch_spinlock The raw_spin* namespace was taken by lockdep for the architecture specific implementations. raw_spin_* would be the ideal name space for the spinlocks which are not converted to sleeping locks in preempt-rt. Linus suggested to convert the raw_ to arch_ locks and cleanup the name space instead of using an artifical name like core_spin, atomic_spin or whatever No functional change. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: linux-arch@vger.kernel.org
8d18eaaff5acaa58369be342c86e607643ce10c7	08-Dec-2009	Li Zefan <lizf@cn.fujitsu.com>	tracing: Simplify trace_option_write() - remove duplicate code inside trace_options_write() - extract duplicate code in trace_options_write() and set_tracer_option() Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <4B1DC532.9010802@cn.fujitsu.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2cbafd68b826f8e0471875cf33cdfb8a1478aef1	08-Dec-2009	Li Zefan <lizf@cn.fujitsu.com>	tracing: Remove useless trace option Since commit 4d9493c90f8e6e1b164aede3814010a290161abb ("ftrace: remove add-hoc code"), option "sched-tree" has become useless. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <4B1DC50A.7040402@cn.fujitsu.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
13f16d209161c95e92aef40e350cc6cf56ac440b	08-Dec-2009	Li Zefan <lizf@cn.fujitsu.com>	tracing: Use seq file for trace_clock The buffer for the output is as small as 64 bytes, so it'll overflow if we add more clock type. Use seq file instead. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <4B1DC4FB.5030407@cn.fujitsu.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
fdb372ed4cadbfe9dbba0e932a77d0523682e690	08-Dec-2009	Li Zefan <lizf@cn.fujitsu.com>	tracing: Use seq file for trace_options Code simplification for reading trace_options. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> LKML-reference: <4B1DC4EF.3090106@cn.fujitsu.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
273b281fa22c293963ee3e6eec418f5dda2dbc83	18-Oct-2009	Sam Ravnborg <sam@ravnborg.org>	kbuild: move utsrelease.h to include/generated Fix up all users of utsrelease.h Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Michal Marek <mmarek@suse.cz>
03889384cee7a198a79447c1ea6aca2c8e54d155	11-Dec-2009	Steven Rostedt <srostedt@redhat.com>	tracing: Add trace_dump_stack() I've been asked a few times about how to find out what is calling some location in the kernel. One way is to use dynamic function tracing and implement the func_stack_trace. But this only finds out who is calling a particular function. It does not tell you who is calling that function and entering a specific if conditional. I have myself implemented a quick version of trace_dump_stack() for this purpose a few times, and just needed it now. This is when I realized that this would be a good tool to have in the kernel like trace_printk(). Using trace_dump_stack() is similar to dump_stack() except that it writes to the trace buffer instead and can be used in critical locations. For example: @@ -5485,8 +5485,12 @@ need_resched_nonpreemptible: if (prev->state && !(preempt_count() & PREEMPT_ACTIVE)) { if (unlikely(signal_pending_state(prev->state, prev))) prev->state = TASK_RUNNING; - else + else { deactivate_task(rq, prev, 1); + trace_printk("Deactivating task %s:%d\n", + prev->comm, prev->pid); + trace_dump_stack(); + } switch_count = &prev->nvcsw; } Produces: <...>-3249 [001] 296.105269: schedule: Deactivating task ntpd:3249 <...>-3249 [001] 296.105270: <stack trace> => schedule => schedule_hrtimeout_range => poll_schedule_timeout => do_select => core_sys_select => sys_select => system_call_fastpath Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
f2942487ffb0c0a80b2312f667ea30dd55a24bb0	06-Dec-2009	Carsten Emde <Carsten.Emde@osadl.org>	tracing: Remove comparing of NULL to va_list in trace_array_vprintk() Olof Johansson stated the following: Comparing a va_list with NULL is bogus. It's supposed to be treated like an opaque type and only be manipulated with va_* accessors. Olof noticed that this code broke the ARM builds: kernel/trace/trace.c: In function 'trace_array_vprintk': kernel/trace/trace.c:1364: error: invalid operands to binary == (have 'va_list' and 'void *') kernel/trace/trace.c: In function 'tracing_mark_write': kernel/trace/trace.c:3349: error: incompatible type for argument 3 of 'trace_vprintk' This patch partly reverts c13d2f7c3231e873f30db92b96c8caa48f100f33 and re-installs the original mark_printk() mechanism. Reported-by: Olof Johansson <olof@lixom.net> Signed-off-by: Carsten Emde <C.Emde@osadl.org> LKML-Reference: <4B1BAB74.104@osadl.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
a63ce5b306855bccdacba95c03bfc293316c8ae3	07-Dec-2009	Steven Rostedt <srostedt@redhat.com>	tracing: Buffer the output of seq_file in case of filled buffer If the seq_read fills the buffer it will call s_start again on the next itertation with the same position. This causes a problem with the function_graph tracer because it consumes the iteration in order to determine leaf functions. What happens is that the iterator stores the entry, and the function graph plugin will look at the next entry. If that next entry is a return of the same function and task, then the function is a leaf and the function_graph plugin calls ring_buffer_read which moves the ring buffer iterator forward (the trace iterator still points to the function start entry). The copying of the trace_seq to the seq_file buffer will fail if the seq_file buffer is full. The seq_read will not show this entry. The next read by userspace will cause seq_read to again call s_start which will reuse the trace iterator entry (the function start entry). But the function return entry was already consumed. The function graph plugin will think that this entry is a nested function and not a leaf. To solve this, the trace code now checks the return status of the seq_printf (trace_print_seq). If the writing to the seq_file buffer fails, we set a flag in the iterator (leftover) and we do not reset the trace_seq buffer. On the next call to s_start, we check the leftover flag, and if it is set, we just reuse the trace_seq buffer and do not call into the plugin print functions. Before this patch: 2) \| fput() { 2) \| __fput() { 2) 0.550 us \| inotify_inode_queue_event(); 2) \| __fsnotify_parent() { 2) 0.540 us \| inotify_dentry_parent_queue_event(); After the patch: 2) \| fput() { 2) \| __fput() { 2) 0.550 us \| inotify_inode_queue_event(); 2) 0.548 us \| __fsnotify_parent(); 2) 0.540 us \| inotify_dentry_parent_queue_event(); [ Updated the patch to fix a missing return 0 from the trace_print_seq() stub when CONFIG_TRACING is disabled. Reported-by: Ingo Molnar <mingo@elte.hu> ] Reported-by: Jiri Olsa <jolsa@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
29bf4a5e3fed3dde3eb629a0cb1762c1e9217458	09-Dec-2009	Steven Rostedt <srostedt@redhat.com>	tracing: Only call pipe_close if pipe_close is defined This fixes a cut and paste error that had pipe_close get called if pipe_open was defined (not pipe_close). Reported-by: Kosaki Motohiro <kosaki.motohiro@jp.fujitsu.com> LKML-Reference: <20091209153204.F4CD.A69D9226@jp.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
c521efd1700a8c0f7ce26f011f5eaecca17fabfa	07-Dec-2009	Steven Rostedt <srostedt@redhat.com>	tracing: Add pipe_close interface An ftrace plugin can add a pipe_open interface when the user opens trace_pipe. But if the plugin allocates something within the pipe_open it can not free it because there exists no pipe_close. The hook to the trace file open has a corresponding close. The closing of the trace_pipe file should also have a corresponding close. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
c13d2f7c3231e873f30db92b96c8caa48f100f33	16-Nov-2009	Carsten Emde <Carsten.Emde@osadl.org>	tracing: Fix trace_marker output When a string was written to <debugfs>/tracing/trace_marker, some strange characters appeared in the trace output instead of the string, since a vprint function erroneously called a vararg print function with a va_list argument. This patch fixes the problem and simplifies the related code. Signed-off-by: Carsten Emde <C.Emde@osadl.org> LKML-Reference: <4B01AE5D.1010801@osadl.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
a646365cc330b5aaf4452c91f61b1e0d1acf68d0	11-Nov-2009	Roel Kluin <roel.kluin@gmail.com>	tracing: Fix return value of tracing_stats_read() The function tracing_stats_read() mistakenly returns ENOMEM instead of the negative value -ENOMEM. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> LKML-Reference: <4AFB2C0B.50605@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
dd17c8f72993f9461e9c19250e3f155d6d99df22	29-Oct-2009	Rusty Russell <rusty@rustcorp.com.au>	percpu: remove per_cpu__ prefix. Now that the return from alloc_percpu is compatible with the address of per-cpu vars, it makes sense to hand around the address of per-cpu variables. To make this sane, we remove the per_cpu__ prefix we used created to stop people accidentally using these vars directly. Now we have sparse, we can use that (next patch). tj: * Updated to convert stuff which were missed by or added after the original patch. * Kill per_cpu_var() macro. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
9705f69ed0a5ef593f45e618bcb3cbfdbf391f64	29-Oct-2009	Tejun Heo <tj@kernel.org>	percpu: make percpu symbols in tracer unique This patch updates percpu related symbols in kernel tracer such that percpu symbols are unique and don't clash with local symbols. This serves two purposes of decreasing the possibility of global percpu symbol collision and allowing dropping per_cpu__ prefix from percpu symbols. * kernel/trace/trace.c: s/max_data/max_tr_data/ * kernel/trace/trace_hw_branches: s/tracer/hwb_tracer/, s/buffer/hwb_buffer/ Partly based on Rusty Russell's "alloc_percpu: rename percpu vars which cause name clashes" patch. Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com>
cf8517cf905b5cd31d5790250b9ac39f7cb8aa53	24-Oct-2009	Jiri Olsa <jolsa@redhat.com>	tracing: Update ppos instead of filp->f_pos Instead of directly updating filp->f_pos we should update the ppos argument. The filp->f_pos gets updated within the file_pos_write() function called from sys_write(). Signed-off-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <20091023233646.399670810@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
1beee96bae0daf7f491356777c3080cc436950f5	14-Oct-2009	Frederic Weisbecker <fweisbec@gmail.com>	ftrace: Rename set_bootup_ftrace into set_cmdline_ftrace set_cmdline_ftrace is a better match against what does this function: apply a tracer name from the kernel command line. Reported-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Li Zefan <lizf@cn.fujitsu.com>
aef6f81b55f462082699c06e8e67e6eb5630ed45	12-Oct-2009	Frederic Weisbecker <fweisbec@gmail.com>	tracing: Rename set_ftrace to set_bootup_ftrace Do this rename because set_ftrace is too much generic and not enough self-explainable as a name. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Li Zefan <lizf@cn.fujitsu.com>
9288f99aa52d90a5b82573c4b769c97c55af2f56	08-Oct-2009	Christoph Lameter <cl@linux-foundation.org>	this_cpu: Use this_cpu_xx for ftrace this_cpu_xx can reduce the instruction count here and also avoid address arithmetic. Signed-off-by: Christoph Lameter <cl@linux-foundation.org> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Tejun Heo <tj@kernel.org>
a813a159766ee9d36aa1fc717c60d63325a6d077	09-Oct-2009	Steven Rostedt <srostedt@redhat.com>	tracing: fix trace_vprintk call The addition of trace_array_{v}printk used the wrong function for trace_vprintk to call. This broke trace_marker and trace_vprintk itself. Although trace_printk may not have been affected by those that end up calling trace_vbprintk. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
79f5599772ac2f138d7a75b8f3f06a93f09c75f7	15-Jun-2009	Li Zefan <lizf@cn.fujitsu.com>	cpumask: use zalloc_cpumask_var() where possible Remove open-coded zalloc_cpumask_var() and zalloc_cpumask_var_node(). Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
88e9d34c727883d7d6f02cf1475b3ec98b8480c7	23-Sep-2009	James Morris <jmorris@namei.org>	seq_file: constify seq_operations Make all seq_operations structs const, to help mitigate against revectoring user-triggerable function pointers. This is derived from the grsecurity patch, although generated from scratch because it's simpler than extracting the changes from there. Signed-off-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Acked-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
3c235a337e205da0f614e456be72881483dcde6e	22-Sep-2009	Li Zefan <lizf@cn.fujitsu.com>	tracing: Fix off-by-one in trace_get_user() Leave the last slot for the tailing '\0'. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <4AB865FA.5080801@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
89f19f04dc72363d912fd007a399cb10310eff6e	19-Sep-2009	Andrew Morton <akpm@linux-foundation.org>	sched: Fix raciness in runqueue_is_locked() runqueue_is_locked() is unavoidably racy due to a poor interface design. It does cpu = get_cpu() ret = some_perpcu_thing(cpu); put_cpu(cpu); return ret; Its return value is unreliable. Fix. Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <200909191855.n8JItiko022148@imap1.linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
ee6c2c1bd15e60a442d1861b66285f112ce4f25c	18-Sep-2009	Li Zefan <lizf@cn.fujitsu.com>	tracing: remove max_tracer_type_len Limit the length of a tracer's name within 100 chars, and then we don't have to play with max_tracer_type_len. Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> LKML-Reference: <4AB32377.9020601@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
60ba77022712c7cda0eda286154bae160446b24a	13-Sep-2009	Steven Rostedt <srostedt@redhat.com>	tracing: add filter event logic to special, mmiotrace and boot tracers Now that the pluging tracers use macros to create the structures and automate the exporting of their formats to the format files, they also automatically get a filter file. This patch adds the code to implement the filter logic in the trace recordings. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
b5130b1e7d3717d03ab1916b198bf0d49fa0a619	13-Sep-2009	Carsten Emde <Carsten.Emde@osadl.org>	tracing: do not update tracing_max_latency when tracer is stopped The state of the function pair tracing_stop()/tracing_start() is correctly considered when tracer data are updated. However, the global and externally accessible variable tracing_max_latency is always updated - even when tracing is stopped. The update should only occur, if tracing was not stopped. Signed-off-by: Carsten Emde <C.Emde@osadl.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
b63f39ea50330f836e301ddda21c6a93dcf0d6a3	11-Sep-2009	jolsa@redhat.com <jolsa@redhat.com>	tracing: create generic trace parser Create a "trace_parser" that can parse the user space input for separate words. struct trace_parser is the descriptor. Generic "trace_get_user" function that can be a helper to read multiple words passed in by user space. Signed-off-by: Jiri Olsa <jolsa@redhat.com> LKML-Reference: <1252682969-3366-2-git-send-email-jolsa@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
637e7e864103a7a68c1ce43ada27dfc25c0d113f	11-Sep-2009	Steven Rostedt <srostedt@redhat.com>	tracing: add lock depth to entries This patch adds the lock depth of the big kernel lock to the generic entry header. This way we can see the depth of the lock and help in removing the BKL. Example: # _------=> CPU# # / _-----=> irqs-off # \| / _----=> need-resched # \|\| / _---=> hardirq/softirq # \|\|\| / _--=> preempt-depth # \|\|\|\| /_--=> lock-depth # \|\|\|\|\|/ delay # cmd pid \|\|\|\|\|\| time \| caller # \ / \|\|\|\|\|\| \ \| / <idle>-0 2.N..3 5902255250us+: lock_acquire: read rcu_read_lock <idle>-0 2.N..3 5902255253us+: lock_release: rcu_read_lock <idle>-0 2dN..3 5902255257us+: lock_acquire: xtime_lock <idle>-0 2dN..4 5902255259us : lock_acquire: clocksource_lock <idle>-0 2dN..4 5902255261us+: lock_release: clocksource_lock Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
48659d31195bb76d688e99dabd816c5472fb1656	11-Sep-2009	Steven Rostedt <srostedt@redhat.com>	tracing: move tgid out of generic entry and into userstack The userstack trace required the recording of the tgid entry. Unfortunately, it was added to the generic entry where it wasted 4 bytes of every entry and was only used by one entry. This patch moves it out of the generic field and moves it into the only user (userstack_entry). Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
e8165dbb03ed04d798163ee512074b9a9466a9c8	04-Sep-2009	Steven Rostedt <srostedt@redhat.com>	tracing: report error in trace if we fail to swap latency buffer The irqsoff tracer will fail to swap the cpu buffer with the max buffer if it preempts a commit. Instead of ignoring this, this patch makes the tracer report it if the last max latency failed due to preempting a current commit. The output of the latency tracer will look like this: # tracer: irqsoff # # irqsoff latency trace v1.1.5 on 2.6.31-rc5 # -------------------------------------------------------------------- # latency: 112 us, #1/1, CPU#1 \| (M:preempt VP:0, KP:0, SP:0 HP:0 #P:4) # ----------------- # \| task: -4281 (uid:0 nice:0 policy:0 rt_prio:0) # ----------------- # => started at: save_args # => ended at: __do_softirq # # # _------=> CPU# # / _-----=> irqs-off # \| / _----=> need-resched # \|\| / _---=> hardirq/softirq # \|\|\| / _--=> preempt-depth # \|\|\|\| / # \|\|\|\|\| delay # cmd pid \|\|\|\|\| time \| caller # \ / \|\|\|\|\| \ \| / bash-4281 1d.s6 265us : update_max_tr_single: Failed to swap buffers due to commit in progress Note the latency time and the functions that disabled the irqs or preemption will still be listed. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
659372d3e42a3e17a2e042d38a8bcdb94bfbe797	04-Sep-2009	Steven Rostedt <srostedt@redhat.com>	tracing: add trace_array_printk for internal tracers to use This patch adds a trace_array_printk to allow a tracer to use the trace_printk on its own trace array. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
e77405ad80f53966524b5c31244e13fbbbecbd84	02-Sep-2009	Steven Rostedt <srostedt@redhat.com>	tracing: pass around ring buffer instead of tracer The latency tracers (irqsoff and wakeup) can swap trace buffers on the fly. If an event is happening and has reserved data on one of the buffers, and the latency tracer swaps the global buffer with the max buffer, the result is that the event may commit the data to the wrong buffer. This patch changes the API to the trace recording to be recieve the buffer that was used to reserve a commit. Then this buffer can be passed in to the commit. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
f633903af2ceb0cec07d45e499a072b6593d0ed1	04-Sep-2009	Steven Rostedt <srostedt@redhat.com>	tracing: make tracing_reset safe for external use Reseting the trace buffer without first disabling the buffer and waiting for any writers to complete, can corrupt the ring buffer. This patch makes the external version of tracing_reset safe from corruption by disabling the ring buffer and calling synchronize_sched. This version can no longer be called from interrupt context. But all those callers have been removed. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2f26ebd549b9ab55ac756b836ec759c11fe93f81	01-Sep-2009	Steven Rostedt <srostedt@redhat.com>	tracing: use timestamp to determine start of latency traces Currently the latency tracers reset the ring buffer. Unfortunately if a commit is in process (due to a trace event), this can corrupt the ring buffer. When this happens, the ring buffer will detect the corruption and then permanently disable the ring buffer. The bug does not crash the system, but it does prevent further tracing after the bug is hit. Instead of reseting the trace buffers, the timestamp of the start of the trace is used instead. The buffers will still contain the previous data, but the output will not count any data that is before the timestamp of the trace. Note, this only affects the static trace output (trace) and not the runtime trace output (trace_pipe). The runtime trace output does not make sense for the latency tracers anyway. Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
76f0d07376388f32698ba51b6090a26b90c1342f	04-Sep-2009	Steven Rostedt <srostedt@redhat.com>	tracing: remove users of tracing_reset The function tracing_reset is deprecated for outside use of trace.c. The new function to reset the the buffers is tracing_reset_online_cpus. The reason for this is that resetting the buffers while the event trace points are active can corrupt the buffers, because they may be writing at the time of reset. The tracing_reset_online_cpus disables writes and waits for current writers to finish. This patch replaces all users of tracing_reset except for the latency tracers. Those changes require more work and will be removed in the following patches. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
621968cdb2563b667d6ecb484ba91ef4c3a797b3	04-Sep-2009	Steven Rostedt <srostedt@redhat.com>	tracing: disable buffers and synchronize_sched before resetting Resetting the ring buffers while traces are happening can corrupt the ring buffer and disable it (no kernel crash to worry about). The safest thing to do is disable the ring buffers, call synchronize_sched() to wait for all current writers to finish and then reset the buffer. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
b8de7bd168fa54d059b16d3057b2f8a32cc5bdc3	01-Sep-2009	Steven Rostedt <srostedt@redhat.com>	tracing: disable update max tracer while reading trace When reading the tracer from the trace file, updating the max latency may corrupt the output. This patch disables the tracing of the max latency while reading the trace file. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
8248ac052dfd1eb41819fbc0ca5c7a1667e7e70c	02-Sep-2009	Steven Rostedt <srostedt@redhat.com>	tracing: print out start and stop in latency traces During development of the tracer, we would copy information from the live tracer to the max tracer with one memcpy. Since then we added a generic ring buffer and we handle the copies differently now. Unfortunately, we never copied the critical section information, and we lost the output: # => started at: kmem_cache_alloc # => ended at: kmem_cache_alloc This patch adds back the critical start and end copying as well as removes the unused "trace_idx" and "overrun" fields of the trace_array_cpu structure. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
5d4a9dba2d7fbab69f00dedd430d1788834a055a	27-Aug-2009	Steven Rostedt <srostedt@redhat.com>	tracing: only show tracing_max_latency when latency tracer configured The tracing_max_latency file should only be present when one of the latency tracers ({preempt\|irqs}off, wakeup*) are enabled. This patch also removes tracing_thresh when latency tracers are not enabled, as well as compiles out code that is only used for latency tracers. Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
5079f3261ffd7fe4a537679af695f2328943a245	25-Aug-2009	Zhaolei <zhaolei@cn.fujitsu.com>	ftrace: Move setting of clock-source out of options There are many clock sources for the tracing system but we can only enable/disable one at a time with the trace/options file. We can move the setting of clock-source out of options and add a separate file for it: # cat trace_clock [local] global # echo global > trace_clock # cat trace_clock local [global] Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> LKML-Reference: <4A939D08.6050604@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
f2d84b65b9778e8a35dd904f7d3993f0a60c9756	07-Aug-2009	Zhaolei <zhaolei@cn.fujitsu.com>	ftrace: Unify effect of writing to trace_options and option/* "echo noglobal-clock > trace_options" can be used to change trace clock but "echo 0 > options/global-clock" can't. The flag toggling will be silently accepted without actually changing the clock callback. We can fix it by using set_tracer_flags() in trace_options_core_write(). Changelog: v1->v2: Simplified switch() after Li Zefan <lizf@cn.fujitsu.com>'s suggestion Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Cc: Steven Rostedt <srostedt@redhat.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
7770841e63730d62928b0879498064e9614b2ce0	07-Aug-2009	Zhaolei <zhaolei@cn.fujitsu.com>	tracing: Rename set_tracer_flags()'s local variable trace_flags set_tracer_flags() have a local variable named trace_flags which has the same name than a global one in the same scope. This leads to confusion, using tracer_flags should be better by its meaning. Changelog: v1->v2: Simplified another patch in this patchset, no change in this patch. Signed-off-by: Zhao Lei <zhaolei@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
f413cdb80ce00ec1a4d0ab949b5d96c81cae7f75	07-Aug-2009	Frederic Weisbecker <fweisbec@gmail.com>	perf_counter: Fix/complete ftrace event records sampling This patch implements the kernel side support for ftrace event record sampling. A new counter sampling attribute is added: PERF_SAMPLE_TP_RECORD which requests ftrace events record sampling. In this case if a PERF_TYPE_TRACEPOINT counter is active and a tracepoint fires, we emit the tracepoint binary record to the perfcounter event buffer, as a sample. Result, after setting PERF_SAMPLE_TP_RECORD attribute from perf record: perf record -f -F 1 -a -e workqueue:workqueue_execution perf report -D 0x21e18 [0x48]: event: 9 . . ... raw event: size 72 bytes . 0000: 09 00 00 00 01 00 48 00 d0 c7 00 81 ff ff ff ff ......H........ . 0010: 0a 00 00 00 0a 00 00 00 21 00 00 00 00 00 00 00 ........!...... . 0020: 2b 00 01 02 0a 00 00 00 0a 00 00 00 65 76 65 6e +...........eve . 0030: 74 73 2f 31 00 00 00 00 00 00 00 00 0a 00 00 00 ts/1........... . 0040: e0 b1 31 81 ff ff ff ff ....... . 0x21e18 [0x48]: PERF_EVENT_SAMPLE (IP, 1): 10: 0xffffffff8100c7d0 period: 33 The raw ftrace binary record starts at offset 0020. Translation: struct trace_entry { type = 0x2b = 43; flags = 1; preempt_count = 2; pid = 0xa = 10; tgid = 0xa = 10; } thread_comm = "events/1" thread_pid = 0xa = 10; func = 0xffffffff8131b1e0 = flush_to_ldisc() What will come next? - Userspace support ('perf trace'), 'flight data recorder' mode for perf trace, etc. - The unconditional copy from the profiling callback brings some costs however if someone wants no such sampling to occur, and needs to be fixed in the future. For that we need to have an instant access to the perf counter attribute. This is a matter of a flag to add in the struct ftrace_event. - Take care of the events recursivity! Don't ever try to record a lock event for example, it seems some locking is used in the profiling fast path and lead to a tracing recursivity. That will be fixed using raw spinlock or recursivity protection. - [...] - Profit! :-) Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Tom Zanussi <tzanussi@gmail.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Gabriel Munteanu <eduard.munteanu@linux360.ro> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
1a0799a8fef5acc6503f9c5e79b2cd003317826c	29-Jul-2009	Frederic Weisbecker <fweisbec@gmail.com>	tracing/function-graph-tracer: Move graph event insertion helpers in the graph tracer file The function graph events helpers which insert the function entry and return events into the ring buffer currently reside in trace.c But this file is quite overloaded and the right place for these helpers is in the function graph tracer file. Then move them to trace_functions_graph.c Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org>
82e04af498a85ba425efe77580b7ba08234411df	29-Jul-2009	Frederic Weisbecker <fweisbec@gmail.com>	tracing: Move sched event insertion helpers in the sched switch tracer file The sched events helpers which insert the sched switch and wakeup events into the ring buffer currently reside in trace.c But this file is quite overloaded and the right place for these helpers is in the sched switch tracer file. Then move them to trace_functions.c Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org>
c0a0d0d3f65284c71115a9bb1ed801ee33eeb552	29-Jul-2009	Frederic Weisbecker <fweisbec@gmail.com>	tracing/core: Make the stack entry helpers global Make the stacktrace event insertion helpers globals. This has two effects: - Prepare for moving the sched events insertion helpers to the sched switch tracer file. - Move some ifdef outside function definitions Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org>
5e5bf483986ad86ad25f25eec5299c86eb2d1c57	29-Jul-2009	Frederic Weisbecker <fweisbec@gmail.com>	tracing/core: Turn ftrace_cpu_disabled into a global var In order to prepare the moving of the function graph tracer insertion helpers from trace.c to trace_functions_graph.c, we need to export the ftrace_cpu_disabled variable. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org>
74e7ff8c50b6b022e6ffaa736b16a4dc161d3eaf	28-Jul-2009	Lai Jiangshan <laijs@cn.fujitsu.com>	tracing: Fix missing function_graph events when we splice_read from trace_pipe About a half events are missing when we splice_read from trace_pipe. They are unexpectedly consumed because we ignore the TRACE_TYPE_NO_CONSUME return value used by the function graph tracer when it needs to consume the events by itself to walk on the ring buffer. The same problem appears with ftrace_dump() Example of an output before this patch: 1) \| ktime_get_real() { 1) 2.846 us \| read_hpet(); 1) 4.558 us \| } 1) 6.195 us \| } After this patch: 0) \| ktime_get_real() { 0) \| getnstimeofday() { 0) 1.960 us \| read_hpet(); 0) 3.597 us \| } 0) 5.196 us \| } The fix also applies on 2.6.30 Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: stable@kernel.org LKML-Reference: <4A6EEC52.90704@cn.fujitsu.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
8650ae32ef7045e763825dee6256dde7f331bb85	23-Jul-2009	Steven Rostedt <srostedt@redhat.com>	tracing: only truncate ftrace files when O_TRUNC is set The current code will truncate the ftrace files contents if O_APPEND is not set and the file is opened in write mode. This is incorrect. It should only truncate the file if O_TRUNC is set. Otherwise if one of these files is opened by a C program with fopen "r+", it will incorrectly truncate the file. Reported-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
ff4e9da2330beb8d64498a513d3f9694e941b01a	22-Jun-2009	Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>	tracing: cleanup for tracing_trace_options_read() '\n' is already appended, and what we need is just an extra space for the '\0'. Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> LKML-Reference: <4A3EED63.3090908@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
566b0aaf798a0dddfc455d1a5b05c424c6686c65	16-Jul-2009	jolsa@redhat.com <jolsa@redhat.com>	tracing: Remove unused fields/variables Signed-off-by: Jiri Olsa <jolsa@redhat.com> Cc: rostedt@goodmis.org LKML-Reference: <1247773468-11594-2-git-send-email-jolsa@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
405f55712dfe464b3240d7816cc4fe4174831be2	11-Jul-2009	Alexey Dobriyan <adobriyan@gmail.com>	headers: smp_lock.h redux * Remove smp_lock.h from files which don't need it (including some headers!) * Add smp_lock.h to files which do need it * Make smp_lock.h include conditional in hardirq.h It's needed only for one kernel_locked() usage which is under CONFIG_PREEMPT This will make hardirq.h inclusion cheaper for every PREEMPT=n config (which includes allmodconfig/allyesconfig, BTW) Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
77ae365eca895061c8bf2b2e3ae1d9ea62869739	27-Mar-2009	Steven Rostedt <srostedt@redhat.com>	ring-buffer: make lockless This patch converts the ring buffers into a completely lockless buffer recording system. The read side still takes locks since we still serialize readers. But the writers are the ones that must be lockless (those can happen in NMIs). The main change is to the "head_page" pointer. We write to the tail, and read from the head. The "head_page" pointer in the cpu buffer is now just a reference to where to look. The real head page is now kept in the head_page->list->prev->next pointer. That is, in the list head of the previous page we set flags. The list pages are allocated to be aligned such that the lowest significant bits are always zero pointing to the list. This gives us play to put in flags to their pointers. bit 0: set when the page is a head page bit 1: set when the writer is moving the page (for overwrite mode) cmpxchg is used to update the pointer. When the writer wraps the buffer and the tail meets the head, in overwrite mode, the writer must move the head page forward. It first uses cmpxchg to change the pointer flag from 1 to 2. Once this is done, the reader on another CPU will not take the page from the buffer. The writers need to protect against interrupts (we don't bother with disabling interrupts because NMIs are allowed to write too). After the writer sets the pointer flag to 2, it takes care to manage interrupts coming in. This is discribed in detail within the comments of the code. Changes in version 2: - Let reader reset entries value of header page. - Fix tail page passing commit page on reader page test. - Always increment entries and write counter in rb_tail_page_update - Add safety check in rb_set_commit_to_write to break out of infinite loop - add mask in rb_is_reader_page [ Impact: lock free writing to the ring buffer ] Signed-off-by: Steven Rostedt <srostedt@redhat.com>
020e5f85cb087a40572c8b8b2dd06292a14fa212	01-Jul-2009	Li Zefan <lizf@cn.fujitsu.com>	tracing/events: Add trace_event boot option We already have ftrace= boot option, and this adds a similar boot option for trace events, so allow trace events to be enabled at boot, for boot debugging purpose. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <4A4ACE29.3010407@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
9d612beff5089b89a295a2331883a8ce3fff08c1	24-Jun-2009	Li Zefan <lizf@cn.fujitsu.com>	tracing: Fix trace_buf_size boot option We should be able to specify [KMG] when setting trace_buf_size boot option, as documented in kernel-parameters.txt Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <4A41F2DB.4020102@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
f129e965bef40c6153e4fe505f1e408286213424	24-Jun-2009	Li Zefan <lizf@cn.fujitsu.com>	tracing: Reset iterator in t_start() The iterator is m->private, but it's not reset to trace_types in t_start(). If the output is larger than PAGE_SIZE and t_start() is called the 2nd time, things will go wrong. Reviewed-by: Liming Wang <liming.wang@windriver.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <4A418728.5020506@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
156f5a7801195fa2ce44aeeb62d6cf8468f3332a	02-Jun-2009	GeunSik Lim <leemgs1@gmail.com>	debugfs: Fix terminology inconsistency of dir name to mount debugfs filesystem. Many developers use "/debug/" or "/debugfs/" or "/sys/kernel/debug/" directory name to mount debugfs filesystem for ftrace according to ./Documentation/tracers/ftrace.txt file. And, three directory names(ex:/debug/, /debugfs/, /sys/kernel/debug/) is existed in kernel source like ftrace, DRM, Wireless, Documentation, Network[sky2]files to mount debugfs filesystem. debugfs means debug filesystem for debugging easy to use by greg kroah hartman. "/sys/kernel/debug/" name is suitable as directory name of debugfs filesystem. - debugfs related reference: http://lwn.net/Articles/334546/ Fix inconsistency of directory name to mount debugfs filesystem. * From Steven Rostedt - find_debugfs() and tracing_files() in this patch. Signed-off-by: GeunSik Lim <geunsik.lim@samsung.com> Acked-by : Inaky Perez-Gonzalez <inaky@linux.intel.com> Reviewed-by : Steven Rostedt <rostedt@goodmis.org> Reviewed-by : James Smart <james.smart@emulex.com> CC: Jiri Kosina <trivial@kernel.org> CC: David Airlie <airlied@linux.ie> CC: Peter Osterlund <petero2@telia.com> CC: Ananth N Mavinakayanahalli <ananth@in.ibm.com> CC: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> CC: Masami Hiramatsu <mhiramat@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
e4f2d10f479d18198ebafcb5e124cc3dd8b8817a	15-Jun-2009	Li Zefan <lizf@cn.fujitsu.com>	tracing: replace a GFP_ATOMIC with GFP_KERNEL allocation Atomic allocation is not needed here. [ Impact: clean up of memory alloction type ] Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> LKML-Reference: <4A35B898.2050607@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
215368e8e59023d6a0abdda896923018d74fdf7f	15-Jun-2009	Li Zefan <lizf@cn.fujitsu.com>	tracing: fix a typo in tracing_cpumask_write() It's tracing_cpumask_new that should be kfree()ed. This causes tracing_cpumask to be freed due to the typo: # echo z > tracing_cpumask bash: echo: write error: Invalid argument And subsequent reads/writes to tracing_cpuamsk will access this already-freed tracing_cpumask, thus may lead to crash. [ Impact: fix leak and crash when writing invalid val to tracing_cpumask ] Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> LKML-Reference: <4A35B86A.7070608@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
112f38a7e36e9d688b389507136bf3af3e6d159b	01-Jun-2009	Steven Rostedt <srostedt@redhat.com>	tracing: make trace pipe recognize latency format flag The trace_pipe did not recognize the latency format flag and would produce different output than the trace file. The problem was partly due that the trace flags in the iterator was not set as well as the trace_pipe zeros out part of the iterator (including the flags) to be able to use the same routines as the trace file. trace_flags of the iterator should not cause any problems when not zeroed out by for trace_pipe. Reported-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
5b6045a906f48d37591365c5dcdd6d1d146bfd4a	26-May-2009	Heiko Carstens <heiko.carstens@de.ibm.com>	trace: disable preemption before taking raw spinlocks s390 code uses smp_processor_id() in __raw_spin_lock() code which reveals that a (raw) spinlock is taken without preemption disabled. This can potentially deadlock. To fix this explicitly disable and enable preemption. BUG: using smp_processor_id() in preemptible [00000000] code: cat/2278 caller is trace_find_cmdline+0x40/0xfc CPU: 0 Not tainted 2.6.30-rc7-dirty #39 Process cat (pid: 2278, task: 000000003faedb68, ksp: 000000003b33b988) 000000003b33b988 000000003b33bae0 0000000000000002 0000000000000000 000000003b33bb80 000000003b33baf8 000000003b33baf8 00000000000175d6 0000000000000001 000000003b33b988 000000003f9b0000 000000000000000b 000000000000000c 000000003b33bb40 000000003b33bae0 0000000000000000 0000000000000000 00000000000175d6 000000003b33bae0 000000003b33bb28 Call Trace: ([<00000000000174b2>] show_trace+0x112/0x170) [<0000000000017582>] show_stack+0x72/0x100 [<0000000000441538>] dump_stack+0xc8/0xd8 [<000000000025c350>] debug_smp_processor_id+0x114/0x130 [<00000000000bf0e4>] trace_find_cmdline+0x40/0xfc [<00000000000c35d4>] trace_print_context+0x58/0xac [<00000000000bb676>] print_trace_line+0x416/0x470 [<00000000000bc8fe>] s_show+0x4e/0x428 [<000000000013834e>] seq_read+0x36a/0x5d4 [<0000000000112a78>] vfs_read+0xc8/0x174 [<0000000000112c58>] SyS_read+0x74/0xc4 [<000000000002c7ae>] sysc_noemu+0x10/0x16 [<000002000012436c>] 0x2000012436c 1 lock held by cat/2278: #0: (&p->lock){+.+.+.}, at: [<0000000000138056>] seq_read+0x72/0x5d4 [ Impact: fix preempt-unsafe raw spinlock ] Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
4f5359685af6de7dca101393dc606620adbe963f	18-May-2009	Lai Jiangshan <laijs@cn.fujitsu.com>	tracing: add trace_event_read_lock() I found that there is nothing to protect event_hash in ftrace_find_event(). Rcu protects the event hashlist but not the event itself while we use it after its extraction through ftrace_find_event(). This lack of a proper locking in this spot opens a race window between any event dereferencing and module removal. Eg: --Task A-- print_trace_line(trace) { event = find_ftrace_event(trace) --Task B-- trace_module_remove_events(mod) { list_trace_events_module(ev, mod) { unregister_ftrace_event(ev->event) { hlist_del(ev->event->node) list_del(....) } } } \|--> module removed, the event has been dropped --Task A-- event->print(trace); // Dereferencing freed memory If the event retrieved belongs to a module and this module is concurrently removed, we may end up dereferencing a data from a freed module. RCU could solve this, but it would add latency to the kernel and forbid tracers output callbacks to call any sleepable code. So this fix converts 'trace_event_mutex' to a read/write semaphore, and adds trace_event_read_lock() to protect ftrace_find_event(). [ Impact: fix possible freed memory dereference in ftrace ] Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <4A114806.7090302@cn.fujitsu.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
88fc86c283d9c3854e67e4155808027bc2519eb6	14-May-2009	GeunSik Lim <leemgs1@gmail.com>	tracing: Append prompt in /debug/tracing/README file append prompt in /debug/tracing/README file. This is trivial issue. Fix typo Mini Howto file(README) for ftrace. [ Impact: cleanup ] Signed-off-by: GeunSik Lim <geunsik.lim@samsung.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Cc: williams <williams@redhat.com> LKML-Reference: <1242289418.31161.45.camel@centos51> Signed-off-by: Ingo Molnar <mingo@elte.hu>
9456f0fa6d3cb944d3b9fc31c9a244e0362c26ea	07-May-2009	Steven Rostedt <srostedt@redhat.com>	tracing: reset ring buffer when removing modules with events Li Zefan found that there's a race using the event ids of events and modules. When a module is loaded, an event id is incremented. We only have 16 bits for event ids (65536) and there is a possible (but highly unlikely) race that we could load and unload a module that registers events so many times that the event id counter overflows. When it overflows, it then restarts and goes looking for available ids. An id is available if it was added by a module and released. The race is if you have one module add an id, and then is removed. Another module loaded can use that same event id. But if the old module still had events in the ring buffer, the new module's call back would get bogus data. At best (and most likely) the output would just be garbage. But if the module for some reason used pointers (not recommended) then this could potentially crash. The safest thing to do is just reset the ring buffer if a module that registered events is removed. [ Impact: prevent unpredictable results of event id overflows ] Reported-by: Li Zefan <lizf@cn.fujitsu.com> LKML-Reference: <49FEAFD0.30106@cn.fujitsu.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
94487d6d53af5acae10cf9fd52f74498994d46b1	06-May-2009	Steven Rostedt <srostedt@redhat.com>	tracing: use proper export symbol for tracing api When adding the EXPORT_SYMBOL to some of the tracing API, I accidently used EXPORT_SYMBOL instead of EXPORT_SYMBOL_GPL. This patch fixes that mistake. [ Impact: export the tracing code only for GPL modules ] Reported-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
c8d771835e18c938dae8690611d65fe98ad30f58	30-Apr-2009	Steven Rostedt <srostedt@redhat.com>	tracing: export stats of ring buffers to userspace This patch adds stats to the ftrace ring buffers: # cat /debugfs/tracing/per_cpu/cpu0/stats entries: 42360 overrun: 30509326 commit overrun: 0 nmi dropped: 0 Where entries are the total number of data entries in the buffer. overrun is the number of entries not consumed and were overwritten by the writer. commit overrun is the number of entries dropped due to nested writers wrapping the buffer before the initial writer finished the commit. nmi dropped is the number of entries dropped due to the ring buffer lock being held when an nmi was going to write to the ring buffer. Note, this field will be meaningless and will go away when the ring buffer becomes lockless. [ Impact: let userspace know what is happening in the ring buffers ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
7267fa6819467669f5cc2ba81a615dcc88158b4b	29-Apr-2009	Steven Rostedt <srostedt@redhat.com>	tracing: fix ref count in splice pages The pages allocated for the splice binary buffer did not initialize the ref count correctly. This caused pages not to be freed and causes a drastic memory leak. Thanks to logdev I was able to trace the tracer to find where the leak was. [ Impact: stop memory leak when using splice ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
f2957f1f196b0217644a17c1379855a118a37d72	29-Apr-2009	Steven Rostedt <srostedt@redhat.com>	tracing: have splice only copy full pages Splice works with pages, it is much more effecient to use an entire page than to copy bits over several pages. Using logdev to trace the internals of the splice mechanism, I was able to see that splice can be very aggressive. When tracing is occurring, and the reader caught up to the writer, and the writer is on the reader page, the reader will copy what is there into the splice page. Splice may iterate over several pages and if the writer is still writing to the page, the reader will keep copying bits to new pages to pass to userspace. This patch changes it to only pass data to userspace if the page is full (the writer has left the page). This has a small side effect that splice can not read a partial page, and must wait for the page to fill. This should not be an issue. If tracing has stopped, then a use of "read" will still read all of the page. [ Impact: better performance for ring buffer splice code ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
93459c6cb9816c52200993d29dd18cea1daee335	29-Apr-2009	Steven Rostedt <srostedt@redhat.com>	tracing: only add splice page if entries exist The splice code allocates a page even when the ring buffer is empty. It detects the ring buffer being empty when it it fails to copy anything from the ring buffer into the page. This patch adds a check to see if there is anything in the ring buffer before allocating a page. Thanks to logdev for letting me trace the tracer to find this. [ Impact: speed up due to removing unnecessary allocation ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
5beae6efd1004b44c3e257dc96087978e4c763c1	29-Apr-2009	Steven Rostedt <srostedt@redhat.com>	tracing: fix ref count in splice pages The pages allocated for the splice binary buffer did not initialize the ref count correctly. This caused pages not to be freed and causes a drastic memory leak. Thanks to logdev I was able to trace the tracer to find where the leak was. [ Impact: stop memory leak when using splice ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
cd891ae0305601bdb4d2e7e85282961c4ff256cd	28-Apr-2009	Steven Rostedt <srostedt@redhat.com>	tracing: convert ftrace_dump spinlocks to raw ftrace_dump is used for printing out the contents of the ftrace ring buffer to the console on failure. Currently it uses a spinlock to synchronize the output from multiple failures on different CPUs. This spin lock currently is a normal spinlock and can cause issues with lockdep and lock tracing. This patch converts it to raw since it is for error handling only. The lock is local to the ftrace_dump and is not used by any other infrastructure. [ Impact: prevent ftrace_dump from locking up by internal tracing ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
7a4f453b6d7379a7c380825949977c5a838aa012	22-Apr-2009	Li Zefan <lizf@cn.fujitsu.com>	tracing/events: make struct trace_entry->type to be int type struct trace_entry->type is unsigned char, while trace event's id is int type, thus for a event with id >= 256, it's entry->type is cast to (id % 256), and then we can't see the trace output of this event. # insmod trace-events-sample.ko # echo foo_bar > /mnt/tracing/set_event # cat /debug/tracing/events/trace-events-sample/foo_bar/id 256 # cat /mnt/tracing/trace_pipe <...>-3548 [001] 215.091142: Unknown type 0 <...>-3548 [001] 216.089207: Unknown type 0 <...>-3548 [001] 217.087271: Unknown type 0 <...>-3548 [001] 218.085332: Unknown type 0 [ Impact: fix output for trace events with id >= 256 ] Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tom Zanussi <tzanussi@gmail.com> LKML-Reference: <49EEDB0E.5070207@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
3189cdb31622f4e40688ce5a6fc5d940b42bc805	17-Apr-2009	Steven Rostedt <srostedt@redhat.com>	tracing: protect trace_printk from recursion trace_printk can be called from any context, including NMIs. If this happens, then we must test for for recursion before grabbing any spinlocks. This patch prevents trace_printk from being called recursively. [ Impact: prevent hard lockup in lockdep event tracer ] Cc: Peter Zijlstra <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
12acd473d45cf2e40de3782cb2de712e5cd4d715	17-Apr-2009	Steven Rostedt <srostedt@redhat.com>	tracing: add EXPORT_SYMBOL_GPL for trace commits Not all the necessary symbols were exported to allow for tracing by modules. This patch adds them in. [ Impact: allow modules to commit data to the ring buffer ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
339ae5d3c3fc2025e3657637921495fd600027c7	17-Apr-2009	Li Zefan <lizf@cn.fujitsu.com>	tracing: fix file mode of trace and README trace is read-write and README is read-only. [ Impact: fix /debug/tracing/ file permissions. ] Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <49E7EAB6.4070605@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
69abe6a5d18a9394baa325bab8f57748b037c517	10-Apr-2009	Avadh Patel <avadh4all@gmail.com>	tracing: add saved_cmdlines file to show cached task comms Export the cached task comms to userspace. This allows user apps to translate the pids from a trace into their respective task command lines. [ Impact: let userspace apps reading binary buffer know comm's of pids ] Signed-off-by: Avadh Patel <avadh4all@gmail.com> [ added error checking and use of buf pointer to index file_buf ] Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
17c873ec280a03894bc718af817f7f24fa787ae1	11-Apr-2009	Steven Rostedt <srostedt@redhat.com>	tracing/events: add export symbols for trace events in modules Impact: let modules add trace events The trace event code requires some functions to be exported to allow modules to use TRACE_EVENT. This patch adds EXPORT_SYMBOL_GPL to the necessary functions. Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
eb02ce017dd83985041a7e54c6449f92d53b026f	08-Apr-2009	Tom Zanussi <tzanussi@gmail.com>	tracing/filters: use ring_buffer_discard_commit() in filter_check_discard() This patch changes filter_check_discard() to make use of the new ring_buffer_discard_commit() function and modifies the current users to call the old commit function in the non-discard case. It also introduces a version of filter_check_discard() that uses the global trace buffer (filter_current_check_discard()) for those cases. v2 changes: - fix compile error noticed by Ingo Molnar Signed-off-by: Tom Zanussi <tzanussi@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: fweisbec@gmail.com LKML-Reference: <1239178554.10295.36.camel@tropicana> Signed-off-by: Ingo Molnar <mingo@elte.hu>
77d9f465d46fd67cdb82ee5e1ab99dd57a17c486	02-Apr-2009	Steven Rostedt <srostedt@redhat.com>	tracing/filters: use ring_buffer_discard_commit for discarded events The ring_buffer_discard_commit makes better usage of the ring_buffer when an event has been discarded. It tries to remove it completely if possible. This patch converts the trace event filtering to use ring_buffer_discard_commit instead of the ring_buffer_event_discard. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
e45f2e2bd298e1ff687448e5fd15a3588b5807ec	31-Mar-2009	Tom Zanussi <tzanussi@gmail.com>	tracing/filters: add TRACE_EVENT_FORMAT_NOFILTER event macro Frederic Weisbecker suggested that the trace_special event shouldn't be filterable; this patch adds a TRACE_EVENT_FORMAT_NOFILTER event macro that allows an event format to be exported without having a filter attached, and removes filtering from the trace_special event. Signed-off-by: Tom Zanussi <tzanussi@gmail.com> Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
e1112b4d96859367a93468027c9635e2ac04eb3f	31-Mar-2009	Tom Zanussi <tzanussi@gmail.com>	tracing/filters: add run-time field descriptions to TRACE_EVENT_FORMAT events This patch adds run-time field descriptions to all the event formats exported using TRACE_EVENT_FORMAT. It also hooks up all the tracers that use them (i.e. the tracers in the 'ftrace subsystem') so they can also have their output filtered by the event-filtering mechanism. When I was testing this, there were a couple of things that fooled me into thinking the filters weren't working, when actually they were - I'll mention them here so others don't make the same mistakes (and file bug reports. ;-) One is that some of the tracers trace multiple events e.g. the sched_switch tracer uses the context_switch and wakeup events, and if you don't set filters on all of the traced events, the unfiltered output from the events without filters on them can make it look like the filtering as a whole isn't working properly, when actually it is doing what it was asked to do - it just wasn't asked to do the right thing. The other is that for the really high-volume tracers e.g. the function tracer, the volume of filtered events can be so high that it pushes the unfiltered events out of the ring buffer before they can be read so e.g. cat'ing the trace file repeatedly shows either no output, or once in awhile some output but that isn't there the next time you read the trace, which isn't what you normally expect when reading the trace file. If you read from the trace_pipe file though, you can catch them before they disappear. Changes from v1: As suggested by Frederic Weisbecker: - get rid of externs in functions - added unlikely() to filter_check_discard() Signed-off-by: Tom Zanussi <tzanussi@gmail.com> Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
93cfb3c9fd83d877a8f1ffad9ff862b617b32828	02-Apr-2009	Lai Jiangshan <laijs@cn.fujitsu.com>	tracing: fix splice return too large I got these from strace: splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 12288 splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 12288 splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 12288 splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 16384 splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 8192 splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 8192 splice(0x3, 0, 0x5, 0, 0x1000, 0x1) = 8192 I wanted to splice_read 4096 bytes, but it returns 8192 or larger. It is because the return value of tracing_buffers_splice_read() does not include "zero out any left over data" bytes. But tracing_buffers_read() includes these bytes, we make them consistent. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <srostedt@redhat.com> LKML-Reference: <49D46674.9030804@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
c7625a555f55d7ae49236cde551786c88f5a5ce1	02-Apr-2009	Lai Jiangshan <laijs@cn.fujitsu.com>	tracing: update file->f_pos when splice(2) it Impact: Cleanup These two lines: if (unlikely(*ppos)) return -ESPIPE; in tracing_buffers_splice_read() are not needed, VFS layer has disabled seek(2). We remove these two lines, and then we can update file->f_pos. And tracing_buffers_read() updates file->f_pos, this fix make tracing_buffers_splice_read() updates file->f_pos too. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <srostedt@redhat.com> LKML-Reference: <49D46670.4010503@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
ddd538f3e6a1a4bec2f6942f83a753263e6577b4	02-Apr-2009	Lai Jiangshan <laijs@cn.fujitsu.com>	tracing: allocate page when needed Impact: Cleanup Sometimes, we open trace_pipe_raw, but we don't read(2) it, we just splice(2) it, thus, the page is not used. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <srostedt@redhat.com> LKML-Reference: <49D4666B.4010608@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
d1e7e02f30be672c6f6ee40908be83877a0d49d1	02-Apr-2009	Lai Jiangshan <laijs@cn.fujitsu.com>	tracing: disable seeking for trace_pipe_raw Impact: disable pread() We set tracing_buffers_fops.llseek to no_llseek, but we can still perform pread() to read this file. That is not expected. This fix uses nonseekable_open() to disable it. tracing_buffers_fops.llseek is still set to no_llseek, it mark this file is a "non-seekable device" and is used by sys_splice(). See also do_splice() or manual of splice(2): ERRORS EINVAL Target file system doesn't support splicing; neither of the descriptors refers to a pipe; or offset given for non-seekable device. Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <srostedt@redhat.com> LKML-Reference: <49D46668.8030806@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
5452af664f6fba26b80eb2c8c4ceae2999d5cf56	27-Mar-2009	Frederic Weisbecker <fweisbec@gmail.com>	tracing/ftrace: factorize the tracing files creation Impact: cleanup Most of the tracing files creation follow the same pattern: ret = debugfs_create_file(...) if (!ret) pr_warning("Couldn't create ... entry\n") Unify it! Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1238109938-11840-1-git-send-email-fweisbec@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
bc2b6871c17b3aff79fb14e1a1c06c5f5a187f76	23-Mar-2009	Nikanth Karthikesan <knikanth@suse.de>	Update /debug/tracing/README Some of the tracers have been renamed, which was not updated in the in-kernel run-time README file. Update it. Signed-off-by: Nikanth Karthikesan <knikanth@suse.de> LKML-Reference: <200903231158.32151.knikanth@suse.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
b0dfa978c7a1699fb3506fbfcba0b6a5c4bd17ae	01-Apr-2009	Frederic Weisbecker <fweisbec@gmail.com>	tracing/ftrace: alloc the started cpumask for the trace file Impact: fix a crash while cat trace file Currently we are using a cpumask to remind each cpu where a trace occured. It lets us notice the user that a cpu just had its first trace. But on latest -tip we have the following crash once we cat the trace file: IP: [<c0270c4a>] print_trace_fmt+0x45/0xe7 *pde = 00000000 Oops: 0000 [#1] PREEMPT SMP last sysfs file: /sys/class/net/eth0/carrier Pid: 3897, comm: cat Not tainted (2.6.29-tip-02825-g0f22972-dirty #81) EIP: 0060:[<c0270c4a>] EFLAGS: 00010297 CPU: 0 EIP is at print_trace_fmt+0x45/0xe7 EAX: 00000000 EBX: 00000000 ECX: c12d9e98 EDX: ccdb7010 ESI: d31f4000 EDI: 00322401 EBP: d31f3f10 ESP: d31f3efc DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 Process cat (pid: 3897, ti=d31f2000 task=d3b3cf20 task.ti=d31f2000) Stack: d31f4080 ccdb7010 d31f4000 d691fe70 ccdb7010 d31f3f24 c0270e5c d31f4000 d691fe70 d31f4000 d31f3f34 c02718e8 c12d9e98 d691fe70 d31f3f70 c02bfc33 00001000 09130000 d3b46e00 d691fe98 00000000 00000079 00000001 00000000 Call Trace: [<c0270e5c>] ? print_trace_line+0x170/0x17c [<c02718e8>] ? s_show+0xa7/0xbd [<c02bfc33>] ? seq_read+0x24a/0x327 [<c02bf9e9>] ? seq_read+0x0/0x327 [<c02ab18b>] ? vfs_read+0x86/0xe1 [<c02ab289>] ? sys_read+0x40/0x65 [<c0202d8f>] ? sysenter_do_call+0x12/0x3c Code: 00 00 00 89 45 ec f7 c7 00 20 00 00 89 55 f0 74 4e f6 86 98 10 00 00 02 74 45 8b 86 8c 10 00 00 8b 9e a8 10 00 00 e8 52 f3 ff ff <0f> a3 03 19 c0 85 c0 75 2b 8b 86 8c 10 00 00 8b 9e a8 10 00 00 EIP: [<c0270c4a>] print_trace_fmt+0x45/0xe7 SS:ESP 0068:d31f3efc CR2: 0000000000000000 ---[ end trace aa9cf38e5ebed9dd ]--- This is because we alloc the iter->started cpumask on tracing_pipe_open but not on tracing_open. It hadn't been noticed until now because we need to have ring buffer overruns to activate the starting of cpu buffer detection. Also, we need a check to not print the messagge for the first trace on the file. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1238619188-6109-1-git-send-email-fweisbec@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
5f0c6c03c5fee91c02c696bc9bf4c0d41392abe7	27-Mar-2009	Frederic Weisbecker <fweisbec@gmail.com>	tracing/ftrace: fix missing include string.h Building a kernel with tracing can raise the following warning on tip/master: kernel/trace/trace.c:1249: error: implicit declaration of function 'vbin_printf' We are missing an include to string.h Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1238160130-7437-1-git-send-email-fweisbec@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
cf8e3474654f20433aab9aa35826d43b5f245008	30-Mar-2009	Lai Jiangshan <laijs@cn.fujitsu.com>	tracing: fix incorrect return type of ns2usecs() Impact: fix time output bug in 32bits system ns2usecs() returns 'long', it's incorrect. (In i386) ... <idle>-0 [000] 521.442100: _spin_lock <-tick_do_update_jiffies64 <idle>-0 [000] 521.442101: do_timer <-tick_do_update_jiffies64 <idle>-0 [000] 521.442102: update_wall_time <-do_timer <idle>-0 [000] 521.442102: update_xtime_cache <-update_wall_time .... (It always print the time less than 2200 seconds besides ...) Because 'long' is 32bits in i386. ( (1<<31) useconds is about 2200 seconds) ... <idle>-0 [001] 4154502640.134759: rcu_bh_qsctr_inc <-__do_softirq <idle>-0 [001] 4154502640.134760: _local_bh_enable <-__do_softirq <idle>-0 [001] 4154502640.134761: idle_cpu <-irq_exit ... (very large value) Because 'long' is a signed type and it is 32bits in i386. Changes in v2: return 'unsigned long long' instead of 'cycle_t' Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> LKML-Reference: <49D05D10.4030009@cn.fujitsu.com> Reported-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
a2a16d6a3156ef7309ca7328a20c35df9418e670	25-Mar-2009	Steven Rostedt <srostedt@redhat.com>	function-graph: add option to calculate graph time or not graph time is the time that a function is executing another function. Thus if function A calls B, if graph-time is set, then the time for A includes B. This is the default behavior. But if graph-time is off, then the time spent executing B is subtracted from A. Signed-off-by: Steven Rostedt <srostedt@redhat.com>
0706f1c48ca8a7ab478090b4e38f2e578ae2bfe0	24-Mar-2009	Steven Rostedt <srostedt@redhat.com>	tracing: adding function timings to function profiler If the function graph trace is enabled, the function profiler will use it to take the timing of the functions. cat /debug/tracing/trace_stat/functions Function Hit Time -------- --- ---- mwait_idle 127 183028.4 us schedule 26 151997.7 us __schedule 31 151975.1 us sys_wait4 2 74080.53 us do_wait 2 74077.80 us sys_newlstat 138 39929.16 us do_path_lookup 179 39845.79 us vfs_lstat_fd 138 39761.97 us user_path_at 153 39469.58 us path_walk 179 39435.76 us __link_path_walk 189 39143.73 us [...] Note the times are skewed due to the function graph tracer not taking into account schedules. Signed-off-by: Steven Rostedt <srostedt@redhat.com>
be6f164a02f394675e2ac2077dd354cebef5b4c0	24-Mar-2009	Steven Rostedt <srostedt@redhat.com>	function-graph: add option for include sleep times Impact: give user a choice to show times spent while sleeping The user may want to see the time a function spent sleeping. This patch adds the trace option "sleep-time" to allow that. The "sleep-time" option is default on. echo sleep-time > /debug/tracing/trace_options produces: ------------------------------------------ 2) avahi-d-3428 => <idle>-0 ------------------------------------------ 2) \| finish_task_switch() { 2) 0.621 us \| _spin_unlock_irq(); 2) 2.202 us \| } 2) ! 1002.197 us \| } 2) ! 1003.521 us \| } where as, echo nosleep-time > /debug/tracing/trace_options produces: 0) <idle>-0 => yum-upd-3416 ------------------------------------------ 0) \| finish_task_switch() { 0) 0.643 us \| _spin_unlock_irq(); 0) 2.342 us \| } 0) + 41.302 us \| } 0) + 42.453 us \| } Signed-off-by: Steven Rostedt <srostedt@redhat.com>
1618536961d31f9b3f55767b22d4a897f4204c26	23-Mar-2009	Frederic Weisbecker <fweisbec@gmail.com>	tracing/function-graph-tracer: fix functions call traces imbalance Impact: fix traces output Sometimes one can observe an imbalance in the traces between function calls and function return traces: func1() { } } The curly brace inside func1() is the return of another function nested inside func1. The return trace have been inserted in the buffer but not the entry. We are storing a return address on the function traces stack while we haven't inserted its entry on the buffer, hence the imbalance on the traces. This is because the tracers doesn't check all failures that can happen on buffer insertion. This patch reports the tracing recursion failures and the ring buffer failures. In such cases, we now restore the original return address for the function, giving up its return trace. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <1237843021-11695-1-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
3e1f60b80cafcb5d7e8d3665b35962fbb8fb9efa	22-Mar-2009	Frederic Weisbecker <fweisbec@gmail.com>	tracing/ftrace: check if debugfs is registered before creating files Impact: fix a crash with ftrace={nop,boot} parameter If the nop or initcall tracers are launched as boot tracers, they will attempt to create their option directory and files. But these tracers are registered very early and then assigned as "boot tracers" very early if asked to. Since they do this before debugfs has been registered (core initcall), a crash is triggered. Another early tracers could also come later. So we fix it by checking if debugfs is initialized before creating the root tracing directory. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <1237759847-21025-3-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
07edf7121374609709ef1b0889f6e7b8d6a62ec1	22-Mar-2009	Frederic Weisbecker <fweisbec@gmail.com>	tracing/events: don't use wake up for events Impact: fix hard-lockup with sched switch events Some ftrace events, such as sched wakeup, can be traced while the runqueue lock is hold. Since they are using trace_current_buffer_unlock_commit(), they call wake_up() which can try to grab the runqueue lock too, resulting in a deadlock. Now for all event, we call a new helper: trace_nowake_buffer_unlock_commit() which do pretty the same than trace_current_buffer_unlock_commit() except than it doesn't call trace_wake_up(). Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <1237759847-21025-4-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
b8b94265337f83b7db9c5f429b1769d463d7da8c	22-Mar-2009	Dmitri Vorobiev <dmitri.vorobiev@movial.com>	tracing: fix four sparse warnings Impact: cleanup. This patch fixes the following sparse warnings: kernel/trace/trace.c:385:9: warning: symbol 'trace_seq_to_buffer' was not declared. Should it be static? kernel/trace/trace_clock.c:29:13: warning: symbol 'trace_clock_local' was not declared. Should it be static? kernel/trace/trace_clock.c:54:13: warning: symbol 'trace_clock' was not declared. Should it be static? kernel/trace/trace_clock.c:74:13: warning: symbol 'trace_clock_global' was not declared. Should it be static? Signed-off-by: Dmitri Vorobiev <dmitri.vorobiev@movial.com> LKML-Reference: <1237741871-5827-4-git-send-email-dmitri.vorobiev@movial.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
cf586b61f80229491127d3c57c06ed93c9f530d3	22-Mar-2009	Frederic Weisbecker <fweisbec@gmail.com>	tracing/function-graph-tracer: prevent hangs during self-tests Impact: detect tracing related hangs Sometimes, with some configs, the function graph tracer can make the timer interrupt too much slow, hanging the kernel in an endless loop of timer interrupts servicing. As suggested by Ingo, this patch brings a watchdog which stops the selftest after a defined number of functions traced, definitely disabling this tracer. For those who want to debug the cause of the function graph trace hang, you can pass the ftrace_dump_on_oops kernel parameter to dump the traces after this hang detection. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <1237694675-23509-1-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
40ce74f19c28077550646c76d96a075bf312e461	19-Mar-2009	Steven Rostedt <srostedt@redhat.com>	tracing: remove recording function depth from trace_printk The function depth in trace_printk was to facilitate the function graph output. Now that the function graph calculates the depth within the trace output, we no longer need to record the depth when the trace_printk is called. Signed-off-by: Steven Rostedt <srostedt@redhat.com>
5ef841f6f32dce0b752a4fa0622781ee67a0e874	19-Mar-2009	Steven Rostedt <srostedt@redhat.com>	tracing: make print_(b)printk_msg_only global This patch makes print_printk_msg_only and print_bprintk_msg_only global for other functions to use. It also renames them by adding a "trace_" to the beginning to avoid namespace collisions. Signed-off-by: Steven Rostedt <srostedt@redhat.com>
4acd4d00f716873e27e7b60ae292cbdbfae674dd	18-Mar-2009	Steven Rostedt <srostedt@redhat.com>	tracing: give easy way to clear trace buffer There is currently no easy way to clear the trace buffer. Currently the only way is to change the current tracer. This patch lets the user clear the trace buffer by simply writing into the trace files. echo > /debug/tracing/trace or to clear a single cpu (i.e. for CPU 1): echo > /debug/tracing/per_cpu/cpu1/trace Requested-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Steven Rostedt <srostedt@redhat.com>
a635cf0497342978d417cae19d4a4823932977ff	18-Mar-2009	Carsten Emde <Carsten.Emde@osadl.org>	tracing: fix command line to pid reverse map Impact: fix command line to pid mapping map_cmdline_to_pid[] is checked in trace_save_cmdline(), but never updated. This results in stale pid to command line mappings and the tracer output will associate the wrong comm string. Signed-off-by: Carsten Emde <Carsten.Emde@osadl.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Steven Rostedt <srostedt@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
50d88758a3f9787cbdbdbc030560b815721eab4b	18-Mar-2009	Thomas Gleixner <tglx@linutronix.de>	tracing: fix trace_find_cmdline() Impact: prevent stale command line output In case there is no valid command line mapping for a pid trace_find_cmdline() returns without updating the comm buffer. The trace dump keeps the previous entry which results in confusing trace output: <idle>-0 [000] 280.702056 .... <idle>-23456 [000] 280.702080 .... Update the comm buffer with "<...>" when no mapping is found. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Steven Rostedt <srostedt@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2c7eea4c62ba090b7f4583c3d7337ea0019be900	18-Mar-2009	Thomas Gleixner <tglx@linutronix.de>	tracing: replace the crude (unsigned) -1 hackery Impact: cleanup The command line recorder uses (unsigned) -1 to mark non mapped entries in the pid to command line maps. The validity check is completely unintuitive: idx >= SAVED_CMDLINES There is no need for such casting games. Use a constant to mark unmapped entries and check for that constant to make the code readable and understandable. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Steven Rostedt <srostedt@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
18aecd362a1c991fbf5f7919ae051a77532ba2f8	18-Mar-2009	Thomas Gleixner <tglx@linutronix.de>	tracing: stop command line recording when tracing is disabled Impact: prevent overwrite of command line entries When the tracer is stopped the command line recording continues to record. The check for tracing_is_on() is not sufficient here as the ringbuffer status is not affected by setting debug/tracing/tracing_enabled to 0. On a non idle system this can result in the loss of the command line information for the stopped trace, which makes the trace harder to read and analyse. Check tracer_enabled to allow further recording. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Steven Rostedt <srostedt@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
af4617bdba34aa556272b34c3986b0a4d588f568	17-Mar-2009	Steven Rostedt <srostedt@redhat.com>	tracing: add global-clock option to provide cross CPU clock to traces Impact: feature to allow better serialized clock This patch adds an option called "global-clock" that will allow the tracer to switch to a slower but more accurate (across CPUs) clock. Signed-off-by: Steven Rostedt <srostedt@redhat.com>
37886f6a9f62d22530ffee8d3f9215c8345b6969	17-Mar-2009	Steven Rostedt <srostedt@redhat.com>	ring-buffer: add api to allow a tracer to change clock source This patch adds a new function called ring_buffer_set_clock that allows a tracer to assign its own clock source to the buffer. Signed-off-by: Steven Rostedt <srostedt@redhat.com>
6adaad14d7d4d3ef31b4e2dc992b18b5da7c4eb3	17-Mar-2009	Steven Rostedt <srostedt@redhat.com>	tracing: stop comm recording on tracing off Impact: fix for losing comms in trace The command lines of tasks are cached at sched switch to not need to record them at every trace point. Disabling the tracing on stops the recording of traces, but does not stop the caching of command lines. When the tracing is off the cache may overflow and cause the tracing to show incorrect tasks matching the PIDs. This patch disables prevents updates to the comm cache when the ring buffer is off. Signed-off-by: Steven Rostedt <srostedt@redhat.com>
4ca530852346be239b7c19e7bec5d2b78855bebe	17-Mar-2009	Steven Rostedt <srostedt@redhat.com>	tracing: protect reader of cmdline output Impact: fix to one cause of incorrect comm outputs in trace The spinlock only protected the creation of a comm <=> pid pair. But it was possible that a reader could look up a pid, and get the wrong comm because it had no locking. This also required changing trace_find_cmdline to copy the comm cache and not just send back a pointer to it. Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2fc1dfbe17e7705c55b7a99da995fa565e26f151	16-Mar-2009	Frederic Weisbecker <fweisbec@gmail.com>	tracing/core: fix early free of cpumasks Impact: fix crashes when tracing cpumasks While ring-buffer allocation, the cpumasks are allocated too, including the tracing cpumask and the per-cpu file mask handler. But these cpumasks are freed accidentally just after. Fix it. Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1237164303-11476-1-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
59f586db98919d7d9c43527b26c8de1cdf9ed912	15-Mar-2009	Frederic Weisbecker <fweisbec@gmail.com>	tracing/core: fix missing mutex unlock on tracing_set_tracer() Impact: fix possible locking imbalance In case of ring buffer resize failure, tracing_set_tracer forgot to release trace_types_lock. Fix it. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1237151439-6755-5-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
7f96f93f02b7637491a1637dee12dcdcd40b9802	13-Mar-2009	Steven Rostedt <srostedt@redhat.com>	tracing: move binary buffers into per cpu directory The binary_buffers directory in /debugfs/tracing held the files to read the trace buffers in a binary format. This held one file per CPU buffer. But we also have a per_cpu directory that holds a way to read the pretty-print formats. This patch moves the binary buffers into the per_cpu_directory: # ls /debug/tracing/per_cpu/cpu1/ trace trace_pipe trace_pipe_raw The new name is called "trace_pipe_raw". The binary buffers always acted similar to trace_pipe, except that they produce raw data. Requested-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Steven Rostedt <srostedt@redhat.com>
48ead02030f849d011259244bb4ea9b985479006	12-Mar-2009	Frederic Weisbecker <fweisbec@gmail.com>	tracing/core: bring back raw trace_printk for dynamic formats strings Impact: fix callsites with dynamic format strings Since its new binary implementation, trace_printk() internally uses static containers for the format strings on each callsites. But the value is assigned once at build time, which means that it can't take dynamic formats. So this patch unearthes the raw trace_printk implementation for the callers that will need trace_printk to be able to carry these dynamic format strings. The trace_printk() macro will use the appropriate implementation for each callsite. Most of the time however, the binary implementation will still be used. The other impact of this patch is that mmiotrace_printk() will use the old implementation because it calls the low level trace_vprintk and we can't guess here whether the format passed in it is dynamic or not. Some parts of this patch have been written by Steven Rostedt (most notably the part that chooses the appropriate implementation for each callsites). Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Steven Rostedt <srostedt@redhat.com>
db526ca329f855510e8ce672332eba3304aed590	12-Mar-2009	Steven Rostedt <srostedt@redhat.com>	tracing: show that buffer size is not expanded Impact: do not confuse user on small trace buffer sizes When the system boots up, the trace buffer is small to conserve memory. It is only two pages per online CPU. When the tracer is used, it expands to the default value. This can confuse the user if they look at the buffer size and see only 7, but then later they see 1408. # cat /debug/tracing/buffer_size_kb 7 # echo sched_switch > /debug/tracing/current_tracer # cat /debug/tracing/buffer_size_kb 1408 This patch tries to help remove this confustion by showing that the buffer has not been expanded. # cat /debug/tracing/buffer_size_kb 7 (expanded: 1408) Signed-off-by: Steven Rostedt <srostedt@redhat.com>
1027fcb206a0fb8348e63aff078c74bdee1c2698	12-Mar-2009	Steven Rostedt <srostedt@redhat.com>	tracing: protect ring_buffer_expanded with trace_types_lock Impact: prevent races with ring_buffer_expanded This patch places the expanding of the tracing buffer under the protection of the trace_types_lock mutex. It is highly unlikely that there would be any contention, but better safe than sorry. Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Steven Rostedt <srostedt@redhat.com>
a123c52b46a1f84bcec3dc963351896c6d6afaf7	12-Mar-2009	Steven Rostedt <srostedt@redhat.com>	tracing: fix comments about trace buffer resizing Impact: cleanup Some of the comments about the trace buffer resizing is gobbledygook. And I wonder why people question if I'm a native English speaker. This patch makes the comments make a bit more sense. Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Steven Rostedt <srostedt@redhat.com>
554f786e284a6ce859d51f62240d615603944c8e	12-Mar-2009	Steven Rostedt <srostedt@redhat.com>	ring-buffer: only allocate buffers for online cpus Impact: save on memory Currently, a ring buffer was allocated for each "possible_cpus". On some systems, this is the same as NR_CPUS. Thus, if a system defined NR_CPUS = 64 but it only had 1 CPU, we could have possibly 63 useless ring buffers taking up space. With a default buffer of 3 megs, this could be quite drastic. This patch changes the ring buffer code to only allocate ring buffers for online CPUs. If a CPU goes off line, we do not free the buffer. This is because the user may still have trace data in that buffer that they would like to look at. Perhaps in the future we could add code to delete a ring buffer if the CPU is offline and the ring buffer becomes empty. Signed-off-by: Steven Rostedt <srostedt@redhat.com>
9aba60fe6eb20453de53a572143bef22fa929fba	12-Mar-2009	Steven Rostedt <srostedt@redhat.com>	tracing: fix trace_wait to know to wait on all cpus or just one Impact: fix to task live locking on reading trace_pipe on one CPU The same code is used for both trace_pipe (all CPUS) and the per_cpu trace_pipe file. When there is no data to read, it will check for signals and wait on the trace wait queue. The problem happens with the per_cpu wait. The trace_wait code checks all CPUs. Thus, if there's data in another CPU buffer, then it will exit the wait, without checking for signals or waiting on the wait queue. It would then try to read the empty buffer, and since that will just return nothing, then it will try to wait again. Unfortunately, that will again fail due to there still being data in the other buffers. This ends up with a live lock for the task. This patch fixes the trace_wait to be aware that the iterator may only be waiting on a single buffer. Signed-off-by: Steven Rostedt <srostedt@redhat.com>
1852fcce181faa237c010a3dbedb473cf9d4555f	11-Mar-2009	Steven Rostedt <srostedt@redhat.com>	tracing: expand the ring buffers when an event is activated To save memory, the tracer ring buffers are set to a minimum. The activating of a trace expands the ring buffer size. This patch adds this expanding, when an event is activated. Signed-off-by: Steven Rostedt <srostedt@redhat.com>
73c5162aa362a543793f4a957c6c536dcbaa89ce	11-Mar-2009	Steven Rostedt <srostedt@redhat.com>	tracing: keep ring buffer to minimum size till used Impact: less memory impact on systems not using tracer When the kernel boots up that has tracing configured, it allocates the default size of the ring buffer. This currently happens to be 1.4Megs per possible CPU. This is quite a bit of wasted memory if the system is never using the tracer. The current solution is to keep the ring buffers to a minimum size until the user uses them. Once a tracer is piped into the current_tracer the ring buffer will be expanded to the default size. If the user changes the size of the ring buffer, it will take the size given by the user immediately. If the user adds a "ftrace=" to the kernel command line, then the ring buffers will be set to the default size on initialization. Signed-off-by: Steven Rostedt <srostedt@redhat.com>
80370cb758e7ca2692cd9fb5e413d970b1f4b2b2	10-Mar-2009	Steven Rostedt <srostedt@redhat.com>	tracing: use raw spinlocks for trace_vprintk Impact: prevent locking up by lockdep tracer The lockdep tracer uses trace_vprintk and thus trace_vprintk can not call back into lockdep without locking up. Signed-off-by: Steven Rostedt <srostedt@redhat.com>
ef18012b248b47ec9a12c3a83ca5e99782d39c5d	10-Mar-2009	Steven Rostedt <srostedt@redhat.com>	tracing: remove funky whitespace in the trace code Impact: clean up There existed a lot of <space><tab>'s in the tracing code. This patch removes them. Signed-off-by: Steven Rostedt <srostedt@redhat.com>
888b55dc314d26239d84c3b187dae555a81c1605	08-Mar-2009	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>	ftrace: tracing header should put '#' at the beginning of a line In a recent discussion, Andrew Morton pointed out that tracing header should put '#' at the beginning of a line. Then, we can easily filtered the header by following grep usage: cat trace \| grep -v '^#' Wakeup trace also has the same header problem. Comparison of headers displayed: before this patch: # tracer: wakeup # wakeup latency trace v1.1.5 on 2.6.29-rc7-tip-tip -------------------------------------------------------------------- latency: 19059 us, #21277/21277, CPU#1 \| (M:desktop VP:0, KP:0, SP:0 HP:0 #P:4) ----------------- \| task: kondemand/1-1644 (uid:0 nice:-5 policy:0 rt_prio:0) ----------------- # _------=> CPU# # / _-----=> irqs-off # \| / _----=> need-resched # \|\| / _---=> hardirq/softirq # \|\|\| / _--=> preempt-depth # \|\|\|\| / # \|\|\|\|\| delay # cmd pid \|\|\|\|\| time \| caller # \ / \|\|\|\|\| \ \| / irqbalan-1887 1d.s. 0us : 1887:120:R + [001] 1644:115:S kondemand/1 irqbalan-1887 1d.s. 1us : default_wake_function <-autoremove_wake_function irqbalan-1887 1d.s. 2us : check_preempt_wakeup <-try_to_wake_up after this patch: # tracer: wakeup # # wakeup latency trace v1.1.5 on 2.6.29-rc7-tip-tip # -------------------------------------------------------------------- # latency: 529 us, #530/530, CPU#0 \| (M:desktop VP:0, KP:0, SP:0 HP:0 #P:4) # ----------------- # \| task: kondemand/0-1641 (uid:0 nice:-5 policy:0 rt_prio:0) # ----------------- # # _------=> CPU# # / _-----=> irqs-off # \| / _----=> need-resched # \|\| / _---=> hardirq/softirq # \|\|\| / _--=> preempt-depth # \|\|\|\| / # \|\|\|\|\| delay # cmd pid \|\|\|\|\| time \| caller # \ / \|\|\|\|\| \ \| / sshd-2496 0d.s. 0us : 2496:120:R + [000] 1641:115:S kondemand/0 sshd-2496 0d.s. 1us : default_wake_function <-autoremove_wake_function sshd-2496 0d.s. 1us : check_preempt_wakeup <-try_to_wake_up Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <20090308124421.23C3.A69D9226@jp.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
769b0441f438c4bb4872cb8560eb6fe51bcc09ee	06-Mar-2009	Frederic Weisbecker <fweisbec@gmail.com>	tracing/core: drop the old trace_printk() implementation in favour of trace_bprintk() Impact: faster and lighter tracing Now that we have trace_bprintk() which is faster and consume lesser memory than trace_printk() and has the same purpose, we can now drop the old implementation in favour of the binary one from trace_bprintk(), which means we move all the implementation of trace_bprintk() to trace_printk(), so the Api doesn't change except that we must now use trace_seq_bprintk() to print the TRACE_PRINT entries. Some changes result of this: - Previously, trace_bprintk depended of a single tracer and couldn't work without. This tracer has been dropped and the whole implementation of trace_printk() (like the module formats management) is now integrated in the tracing core (comes with CONFIG_TRACING), though we keep the file trace_printk (previously trace_bprintk.c) where we can find the module management. Thus we don't overflow trace.c - changes some parts to use trace_seq_bprintk() to print TRACE_PRINT entries. - change a bit trace_printk/trace_vprintk macros to support non-builtin formats constants, and fix 'const' qualifiers warnings. But this is all transparent for developers. - etc... V2: - Rebase against last changes - Fix mispell on the changelog V3: - Rebase against last changes (moving trace_printk() to kernel.h) Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <1236356510-8381-5-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
1ba28e02a18cbdbea123836f6c98efb09cbf59ec	06-Mar-2009	Lai Jiangshan <laijs@cn.fujitsu.com>	tracing: add trace_bprintk() Impact: add a generic printk() for tracing, like trace_printk() trace_bprintk() uses the infrastructure to record events on ring_buffer. [ fweisbec@gmail.com: ported to latest -tip, made it work if !CONFIG_MODULES, never free the format strings from modules because we can't keep track of them and conditionnaly create the ftrace format strings section (reported by Steven Rostedt) ] Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <1236356510-8381-4-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
1427cdf0592368bdec57276edaf714040ee8744f	06-Mar-2009	Lai Jiangshan <laijs@cn.fujitsu.com>	tracing: infrastructure for supporting binary record Impact: save on memory for tracing Current tracers are typically using a struct(like struct ftrace_entry, struct ctx_switch_entry, struct special_entr etc...)to record a binary event. These structs can only record a their own kind of events. A new kind of tracer need a new struct and a lot of code too handle it. So we need a generic binary record for events. This infrastructure is for this purpose. [fweisbec@gmail.com: rebase against latest -tip, make it safe while sched tracing as reported by Steven Rostedt] Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <1236356510-8381-3-git-send-email-fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
5e2336a0d47c9661a40cc5ef85135ce1406af6e8	06-Mar-2009	Steven Rostedt <srostedt@redhat.com>	tracing: make all file_operations const Impact: cleanup All file_operations structures should be constant. No one is going to change them. Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Steven Rostedt <srostedt@redhat.com>
5e1607a00bd082972629d3d68c95c8bcf902b55a	05-Mar-2009	Ingo Molnar <mingo@elte.hu>	tracing: rename ftrace_printk() => trace_printk() Impact: cleanup Use a more generic name - this also allows the prototype to move to kernel.h and be generally available to kernel developers who want to do some quick tracing. Signed-off-by: Ingo Molnar <mingo@elte.hu>
27d48be84477d2f0a2e2ac3738a3971dece631d5	05-Mar-2009	Steven Rostedt <srostedt@redhat.com>	tracing: consolidate print_lat_fmt and print_trace_fmt Impact: clean up Both print_lat_fmt and print_trace_fmt do pretty much the same thing except for one different function call. This patch consolidates the two functions and adds an if statement to perform the difference. Signed-off-by: Steven Rostedt <srostedt@redhat.com>
5fd73f862468280d4cbb5ba4321502f911f9f89a	05-Mar-2009	Steven Rostedt <srostedt@redhat.com>	tracing: remove extra latency_trace method from trace structure Impact: clean up The trace and latency_trace function pointers are identical for every tracer but the function tracer. The differences in the function tracer are trivial (latency output puts paranthesis around parent). This patch removes the latency_trace pointer and all prints will now just use the trace output function pointer. Signed-off-by: Steven Rostedt <srostedt@redhat.com>
c032ef64d680717e4e8ce3da65da6419a35f8a2c	05-Mar-2009	Steven Rostedt <srostedt@redhat.com>	tracing: add latency output format option With the removal of the latency_trace file, we lost the ability to see some of the finer details in a trace. Like the state of interrupts enabled, the preempt count, need resched, and if we are in an interrupt handler, softirq handler or not. This patch simply creates an option to bring back the old format. This also removes the warning about an unused variable that held the latency_trace file operations. Signed-off-by: Steven Rostedt <srostedt@redhat.com>
e74da5235cec6cb71eb338c987f876ecc793138b	05-Mar-2009	Steven Rostedt <srostedt@redhat.com>	tracing: fix seq read from trace files The buffer used by trace_seq was updated incorrectly. Instead of consuming what was actually read, it consumed the rest of the buffer on reads. Signed-off-by: Steven Rostedt <srostedt@redhat.com>
2dc5d12b1f43134e9bc5037f69f4739cfdfab93e	05-Mar-2009	Steven Rostedt <srostedt@redhat.com>	tracing: do not return EFAULT if read copied anything Impact: fix trace read to conform to standards Andrew Morton, Theodore Tso and H. Peter Anvin brought to my attention that a userspace read should not return -EFAULT if it succeeded in copying anything. It should only return -EFAULT if it failed to copy at all. This patch modifies the check of copy_from_user and updates the return code appropriately. I also used H. Peter Anvin's short cut rule to just test ret == count. Signed-off-by: Steven Rostedt <srostedt@redhat.com>
e543ad76914abec1acf6631604a4154cd7a2ca6b	05-Mar-2009	Steven Rostedt <srostedt@redhat.com>	tracing: add cpu_file intialization for ftrace_dump Impact: fix to ftrace_dump output corruption The commit: b04cc6b1f6398b0e0b60d37e27ce51b4899672ec tracing/core: introduce per cpu tracing files added a new field to the iterator called cpu_file. This was a handle to differentiate between the per cpu trace output files and the all cpu "trace" file. The all cpu "trace" file required setting this to TRACE_PIPE_ALL_CPU. The problem is that the ftrace_dump sets up its own iterator but was not updated to handle this change. The result was only CPU 0 printing out on crash and a lot of "<0>"'s also being printed. Reported-by: Thomas Gleixner <tglx@linuxtronix.de> Tested-by: Darren Hart <dvhtc@us.ibm.com> Signed-off-by: Steven Rostedt <srostedt@redhat.com>
efed792d6738964f399a508ef9e831cd60fa4657	04-Mar-2009	Peter Zijlstra <peterz@infradead.org>	tracing: add lockdep tracepoints for lock acquire/release Augment the traces with lock names when lockdep is available: 1) \| down_read_trylock() { 1) \| _spin_lock_irqsave() { 1) \| /* lock_acquire: &sem->wait_lock / 1) 4.201 us \| } 1) \| _spin_unlock_irqrestore() { 1) \| / lock_release: &sem->wait_lock / 1) 3.523 us \| } 1) \| / lock_acquire: try read &mm->mmap_sem / 1) + 13.386 us \| } 1) 1.635 us \| find_vma(); 1) \| handle_mm_fault() { 1) \| __do_fault() { 1) \| filemap_fault() { 1) \| find_lock_page() { 1) \| find_get_page() { 1) \| / lock_acquire: read rcu_read_lock / 1) \| / lock_release: rcu_read_lock / 1) 5.697 us \| } 1) 8.158 us \| } 1) + 11.079 us \| } 1) \| _spin_lock() { 1) \| / lock_acquire: __pte_lockptr(page) / 1) 3.949 us \| } 1) 1.460 us \| page_add_file_rmap(); 1) \| _spin_unlock() { 1) \| / lock_release: __pte_lockptr(page) / 1) 3.115 us \| } 1) \| unlock_page() { 1) 1.421 us \| page_waitqueue(); 1) 1.220 us \| __wake_up_bit(); 1) 6.519 us \| } 1) + 34.328 us \| } 1) + 37.452 us \| } 1) \| up_read() { 1) \| / lock_release: &mm->mmap_sem / 1) \| _spin_lock_irqsave() { 1) \| / lock_acquire: &sem->wait_lock / 1) 3.865 us \| } 1) \| _spin_unlock_irqrestore() { 1) \| / lock_release: &sem->wait_lock */ 1) 8.562 us \| } 1) + 17.370 us \| } Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: =?ISO-8859-1?Q?T=F6r=F6k?= Edwin <edwintorok@gmail.com> Cc: Jason Baron <jbaron@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> LKML-Reference: <1236166375.5330.7209.camel@laptop> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2cadf9135eb3b6d84b6427314be827ddd443c308	02-Dec-2008	Steven Rostedt <srostedt@redhat.com>	tracing: add binary buffer files for use with splice Impact: new feature This patch creates a directory of files that correspond to the per CPU ring buffers. These are binary files and are made to be used with splice. This is the fastest way to extract data from the ftrace ring buffers. Thanks to Jiaying Zhang for pushing me to get this code fixed, and to Eduard - Gabriel Munteanu for his splice code that helped me debug my code. Signed-off-by: Steven Rostedt <srostedt@redhat.com>
f9520750c4c9924c14325cd951efae5fae58104c	02-Mar-2009	Steven Rostedt <srostedt@redhat.com>	tracing: make trace_seq_reset global and rename to trace_seq_init Impact: clean up The trace_seq functions may be used separately outside of the ftrace iterator. The trace_seq_reset is needed for these operations. This patch also renames trace_seq_reset to the more appropriate trace_seq_init. Signed-off-by: Steven Rostedt <srostedt@redhat.com>
ef5580d0fffce6e0a01043bac0625128b5d409a7	28-Feb-2009	Steven Rostedt <srostedt@redhat.com>	tracing: add interface to write into current tracer buffer Right now all tracers must manage their own trace buffers. This was to enforce tracers to be independent in case we finally decide to allow each tracer to have their own trace buffer. But now we are adding event tracing that writes to the current tracer's buffer. This adds an interface to allow events to write to the current tracer buffer without having to manage its own. Since event tracing has no "tracer", and is just a way to hook into any other tracer. Signed-off-by: Steven Rostedt <srostedt@redhat.com>
0cfe82451dfa3ebf4e69158f2eb450f2fbb6b715	27-Feb-2009	Steven Rostedt <srostedt@redhat.com>	tracing: replace kzalloc with kcalloc Impact: clean up kcalloc is a better approach to allocate a NULL array. Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Steven Rostedt <srostedt@redhat.com>
5c6a3ae1b4beebb56e2916b84f1208d96a9e32ff	27-Feb-2009	Steven Rostedt <srostedt@redhat.com>	tracing: use newline separator for trace options list Impact: clean up Instead of listing the trace options like: # cat /debug/tracing/trace_options print-parent nosym-offset nosym-addr noverbose noraw nohex nobin noblock nostacktrace nosched-tree ftrace_printk noftrace_preempt nobranch annotate nouserstacktrace nosym-userobj We now list them like: # cat /debug/tracing/trace_options print-parent nosym-offset nosym-addr noverbose noraw nohex nobin noblock nostacktrace nosched-tree ftrace_printk noftrace_preempt nobranch annotate nouserstacktrace nosym-userobj Signed-off-by: Steven Rostedt <srostedt@redhat.com>
85a2f9b46f8cd8aaa11c64c715e1ea3ec27ec486	27-Feb-2009	Steven Rostedt <srostedt@redhat.com>	tracing: use pointer error returns for __tracing_open Impact: fix compile warning and clean up When I first wrote __tracing_open, instead of passing the error code via the ERR_PTR macros, I lazily used a separate parameter to hold the return for errors. When Frederic Weisbecker updated that function, he used the Linux kernel ERR_PTR for the returns. This caused the parameter return to possibly not be initialized on error. gcc correctly pointed this out with a warning. This patch converts the entire function to use the Linux kernel ERR_PTR macro methods. Signed-off-by: Steven Rostedt <srostedt@redhat.com>
d8e83d26b5ab3b31ee0ff6d093a2627707a1e221	27-Feb-2009	Steven Rostedt <srostedt@redhat.com>	tracing: add protection around open use of current_tracer Impact: fix to possible race conditions There's some uses of current_tracer that is not protected by the trace_types_lock. There is a small chance that a sysadmin changes the tracer while the current_tracer is being referenced. If the race is hit, it is unlikely to cause any harm since the tracers are constant and are not freed. But some strang side effects may occur. Signed-off-by: Steven Rostedt <srostedt@redhat.com>
577b785f55168d5acb3d123ba41bfe8d7981e044	27-Feb-2009	Steven Rostedt <srostedt@redhat.com>	tracing: add tracer dependent options to options directory This patch adds the tracer dependent options dynamically to the options directory when the tracer is activated. These options are removed when the tracer is deactivated. Signed-off-by: Steven Rostedt <srostedt@redhat.com>
a8259075074fb09c230b4cd2c8d3ee3c49d6ecd1	27-Feb-2009	Steven Rostedt <srostedt@redhat.com>	tracing: add options directory and core option files This patch creates an options directory in the debugfs, that contains the available tracing options. These files contain 1 or 0, where 1 is the option is enabled and 0 it is disabled. Simply echoing in 1 will enable the option and 0 will disable it. This patch only contains the core options, not the tracer options. Signed-off-by: Steven Rostedt <srostedt@redhat.com>
8656e7a2fa6afcd8682990f804a2a9674568738f	26-Feb-2009	Frederic Weisbecker <fweisbec@gmail.com>	tracing/core: make the per cpu trace files in per cpu directories Impact: restructure the VFS layout of per CPU trace buffers The per cpu trace files are all in a single directory: /debug/tracing/per_cpu. In case of a large number of cpu, the content of this directory becomes messy so we create now one directory per cpu inside /debug/tracing/per_cpu which contain each their own trace_pipe and trace files. Ie: /debug/tracing$ ls -R per_cpu per_cpu: cpu0 cpu1 per_cpu/cpu0: trace trace_pipe per_cpu/cpu1: trace trace_pipe Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
d7350c3f45694104e820041969c8185c5f99e57c	25-Feb-2009	Frederic Weisbecker <fweisbec@gmail.com>	tracing/core: make the read callbacks reentrants Now that several per-cpu files can be read or spliced at the same, we want the read/splice callbacks for tracing files to be reentrants. Until now, a single global mutex (trace_types_lock) serialized the access to tracing_read_pipe(), tracing_splice_read_pipe(), and the seq helpers. Ie: it means that if a user tries to read trace_pipe0 and trace_pipe1 at the same time, the access to the function tracing_read_pipe() is contended and one reader must wait for the other to finish its read call. The trace_type_lock mutex is mostly here to serialize the access to the global current tracer (current_trace), which can be changed concurrently. Although the iter struct keeps a private pointer to this tracer, its callbacks can be changed by another function. The method used here is to not keep anymore private reference to the tracer inside the iterator but to make a copy of it inside the iterator. Then it checks on subsequents read calls if the tracer has changed. This is not costly because the current tracer is not expected to be changed often, so we use a branch prediction for that. Moreover, we add a private mutex to the iterator (there is one iterator per file descriptor) to serialize the accesses in case of multiple consumers per file descriptor (which would be a silly idea from the user). Note that this is not to protect the ring buffer, since the ring buffer already serializes the readers accesses. This is to prevent from traces weirdness in case of concurrent consumers. But these mutexes can be dropped anyway, that would not result in any crash. Just tell me what you think about it. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
b04cc6b1f6398b0e0b60d37e27ce51b4899672ec	25-Feb-2009	Frederic Weisbecker <fweisbec@gmail.com>	tracing/core: introduce per cpu tracing files Impact: split up tracing output per cpu Currently, on the tracing debugfs directory, three files are available to the user to let him extracting the trace output: - trace is an iterator through the ring-buffer. It's a reader but not a consumer It doesn't block when no more traces are available. - trace pretty similar to the former, except that it adds more informations such as prempt count, irq flag, ... - trace_pipe is a reader and a consumer, it will also block waiting for traces if necessary (heh, yes it's a pipe). The traces coming from different cpus are curretly mixed up inside these files. Sometimes it messes up the informations, sometimes it's useful, depending on what does the tracer capture. The tracing_cpumask file is useful to filter the output and select only the traces captured a custom defined set of cpus. But still it is not enough powerful to extract at the same time one trace buffer per cpu. So this patch creates a new directory: /debug/tracing/per_cpu/. Inside this directory, you will now find one trace_pipe file and one trace file per cpu. Which means if you have two cpus, you will have: trace0 trace1 trace_pipe0 trace_pipe1 And of course, reading these files will have the same effect than with the usual tracing files, except that you will only see the traces from the given cpu. The original all-in-one cpu trace file are still available on their original place. Until now, only one consumer was allowed on trace_pipe to avoid racy consuming on the ring-buffer. Now the approach changed a bit, you can have only one consumer per cpu. Which means you are allowed to read concurrently trace_pipe0 and trace_pipe1 But you can't have two readers on trace_pipe0 or trace_pipe1. Following the same logic, if there is one reader on the common trace_pipe, you can not have at the same time another reader on trace_pipe0 or in trace_pipe1. Because in trace_pipe is already a consumer in all cpu buffers in essence. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
886b5b73d71e4027d7dc6c14f5f7ab102201ea6b	25-Feb-2009	Ingo Molnar <mingo@elte.hu>	tracing: remove /debug/tracing/latency_trace Impact: remove old debug/tracing API /debug/tracing/latency_trace is an old legacy format we kept from the old latency tracer. Remove the file for now. If there's any useful bit missing then we'll propagate any useful output bits into the /debug/tracing/trace output. Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
fa7c7f6e11f70d62505074a8b30a776236850dec	11-Feb-2009	Frederic Weisbecker <fweisbec@gmail.com>	tracing/core: remove unused parameter in tracing_fill_pipe_page() Impact: cleanup The struct page *pages parameter is unused. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
6eaaa5d57e76c454479833fc8594cd7c3b75c789	11-Feb-2009	Frederic Weisbecker <fweisbec@gmail.com>	tracing/core: use appropriate waiting on trace_pipe Impact: api and pipe waiting change Currently, the waiting used in tracing_read_pipe() is done through a 100 msecs schedule_timeout() loop which periodically check if there are traces on the buffer. This can cause small latencies for programs which are reading the incoming events. This patch makes the reader waiting for the trace_wait waitqueue except for few tracers such as the sched and functions tracers which might be already hold the runqueue lock while waking up the reader. This is performed through a new callback wait_pipe() on struct tracer. If none is implemented on a specific tracer, the default waiting for trace_wait queue is attached. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
af513098452b8887d7c0e15a39d7cb74479501bd	17-Feb-2009	Wenji Huang <wenji.huang@oracle.com>	tracing: use the more proper parameter Pass tsk to tracing_record_cmdline instead of current. Signed-off-by: Wenji Huang <wenji.huang@oracle.com> Signed-off-by: Steven Rostedt <srostedt@redhat.com>
e7669b8e329255bbcb40af65b38e342825d97a46	10-Feb-2009	Hannes Eder <hannes@hanneseder.net>	tracing: fix sparse warning: attribute function with __acquires/__releases Fix this sparse warning: kernel/trace/trace.c:458:9: warning: context imbalance in 'register_tracer' - unexpected unlock Signed-off-by: Hannes Eder <hannes@hanneseder.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
5e39841c45cf5e6ea930ede1b0303309e03037a2	10-Feb-2009	Hannes Eder <hannes@hanneseder.net>	tracing: fix sparse warnings: fix (un-)signedness Fix these sparse warnings: kernel/trace/ring_buffer.c:70:37: warning: incorrect type in argument 2 (different signedness) kernel/trace/ring_buffer.c:84:39: warning: incorrect type in argument 2 (different signedness) kernel/trace/ring_buffer.c:96:43: warning: incorrect type in argument 2 (different signedness) kernel/trace/ring_buffer.c:2475:13: warning: incorrect type in argument 2 (different signedness) kernel/trace/ring_buffer.c:2475:13: warning: incorrect type in argument 2 (different signedness) kernel/trace/ring_buffer.c:2478:42: warning: incorrect type in argument 2 (different signedness) kernel/trace/ring_buffer.c:2478:42: warning: incorrect type in argument 2 (different signedness) kernel/trace/ring_buffer.c:2500:40: warning: incorrect type in argument 3 (different signedness) kernel/trace/ring_buffer.c:2505:44: warning: incorrect type in argument 2 (different signedness) kernel/trace/ring_buffer.c:2507:46: warning: incorrect type in argument 2 (different signedness) kernel/trace/trace.c:2130:40: warning: incorrect type in argument 3 (different signedness) kernel/trace/trace.c:2280:40: warning: incorrect type in argument 3 (different signedness) Signed-off-by: Hannes Eder <hannes@hanneseder.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
4fd2735881bf4d8bf5e30979f31fc2f1b1d505fa	10-Feb-2009	Hannes Eder <hannes@hanneseder.net>	tracing: fix sparse warnings: make symbols static Impact: make global variables and a global function static The function '__trace_userstack' does not seem to have a caller, so it is commented out. Fix this sparse warnings: kernel/trace/trace.c:82:5: warning: symbol 'tracing_disabled' was not declared. Should it be static? kernel/trace/trace.c:600:10: warning: symbol 'trace_record_cmdline_disabled' was not declared. Should it be static? kernel/trace/trace.c:957:6: warning: symbol '__trace_userstack' was not declared. Should it be static? kernel/trace/trace.c:1694:5: warning: symbol 'tracing_release' was not declared. Should it be static? Signed-off-by: Hannes Eder <hannes@hanneseder.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
c3706f005c3aaf570e71f0f083fdbb59a5a9fa2e	10-Feb-2009	Wenji Huang <wenji.huang@oracle.com>	tracing: fix typos in comments Impact: clean up. Fix typos in the comments. Signed-off-by: Wenji Huang <wenji.huang@oracle.com> Signed-off-by: Steven Rostedt <srostedt@redhat.com>
34cd4998d38f9bd04f34b78a7cb0c7f1bee00bd9	09-Feb-2009	Steven Rostedt <srostedt@redhat.com>	tracing: clean up splice code Ingo Molnar suggested a series of clean ups for the splice code. This patch implements those suggestions. Signed-off-by: Steven Rostedt <srostedt@redhat.com>
ff98781bab2735e6c89793034173e0cb5007a7e5	09-Feb-2009	Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>	tracing: Move pipe waiting code out of tracing_read_pipe(). This moves the pipe waiting code from tracing_read_pipe() into tracing_wait_pipe(), which is useful to implement other fops, like splice_read. Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> Signed-off-by: Steven Rostedt <srostedt@redhat.com>
3c56819b14b00dd449bd776303e61f8532fad09f	09-Feb-2009	Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>	tracing: splice support for tracing_pipe Added and implemented tracing_pipe_fops->splice_read(). This allows userspace programs to get tracing data more efficiently. Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro> Signed-off-by: Steven Rostedt <srostedt@redhat.com>
b91facc367366b3f71375f337eb5997ec9ab4e69	06-Feb-2009	Frederic Weisbecker <fweisbec@gmail.com>	tracing/function-graph-tracer: handle the leaf functions from trace_pipe When one cats the trace file, the leaf functions are printed without brackets: function(); whereas in the trace_pipe file we'll see the following: function() { } This is because the ring_buffer handling is not the same between those two files. On the trace file, when an entry is printed, the iterator advanced and then we can check the next entry. There is no iterator with trace_pipe, the current entry to print has been peeked and not consumed. So checking the next entry will still return the current one while we don't consume it. This patch introduces a new value for the output callbacks to ask the tracing core to not consume the current entry after printing it. We need it because we will have to consume the current entry ourself to check the next one. Now the trace_pipe is able to handle well the leaf functions. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
b5db03c4355e568f1567758287b30a6a262d5057	07-Feb-2009	Arnaldo Carvalho de Melo <acme@redhat.com>	tracing: handle unregistering the current tracer Impact: simplification Instead of requiring that plugins have the sequence: my_tracer_stop(my_trace_array); unregister_tracer(my_tracer); it should be possible just do a: unregister_tracer(my_tracer); Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
1830b52d0de8c60c4f5dfbac134aa8f69d815801	08-Feb-2009	Steven Rostedt <srostedt@redhat.com>	trace: remove deprecated entry->cpu Impact: fix to prevent developers from using entry->cpu With the new ring buffer infrastructure, the cpu for the entry is implicit with which CPU buffer it is on. The original code use to record the current cpu into the generic entry header, which can be retrieved by entry->cpu. When the ring buffer was introduced, the users were convert to use the the cpu number of which cpu ring buffer was in use (this was passed to the tracers by the iterator: iter->cpu). Unfortunately, the cpu item in the entry structure was never removed. This allowed for developers to use it instead of the proper iter->cpu, unknowingly, using an uninitialized variable. This was not the fault of the developers, since it would seem like the logical place to retrieve the cpu identifier. This patch removes the cpu item from the entry structure and fixes all the users that should have been using iter->cpu. Signed-off-by: Steven Rostedt <srostedt@redhat.com>
b6f11df26fdc28324cf9c9e3b77f2dc985c1bb13	05-Feb-2009	Arnaldo Carvalho de Melo <acme@redhat.com>	trace: Call tracing_reset_online_cpus before tracer->init() Impact: cleanup To make it easy for ftrace plugin writers, as this was open coded in the existing plugins Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Frédéric Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
51a763dd84253bab1d0a1e68e11a7753d1b702ca	05-Feb-2009	Arnaldo Carvalho de Melo <acme@redhat.com>	tracing: Introduce trace_buffer_{lock_reserve,unlock_commit} Impact: new API These new functions do what previously was being open coded, reducing the number of details ftrace plugin writers have to worry about. It also standardizes the handling of stacktrace, userstacktrace and other trace options we may introduce in the future. With this patch, for instance, the blk tracer (and some others already in the tree) can use the "userstacktrace" /d/tracing/trace_options facility. $ codiff /tmp/vmlinux.before /tmp/vmlinux.after linux-2.6-tip/kernel/trace/trace.c: trace_vprintk \| -5 trace_graph_return \| -22 trace_graph_entry \| -26 trace_function \| -45 __ftrace_trace_stack \| -27 ftrace_trace_userstack \| -29 tracing_sched_switch_trace \| -66 tracing_stop \| +1 trace_seq_to_user \| -1 ftrace_trace_special \| -63 ftrace_special \| +1 tracing_sched_wakeup_trace \| -70 tracing_reset_online_cpus \| -1 13 functions changed, 2 bytes added, 355 bytes removed, diff: -353 linux-2.6-tip/block/blktrace.c: __blk_add_trace \| -58 1 function changed, 58 bytes removed, diff: -58 linux-2.6-tip/kernel/trace/trace.c: trace_buffer_lock_reserve \| +88 trace_buffer_unlock_commit \| +86 2 functions changed, 174 bytes added, diff: +174 /tmp/vmlinux.after: 16 functions changed, 176 bytes added, 413 bytes removed, diff: -237 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Frédéric Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
0a9877514c4fed10a70720293b37213dd172ee3e	05-Feb-2009	Arnaldo Carvalho de Melo <acme@redhat.com>	ring_buffer: remove unused flags parameter Impact: API change, cleanup >From ring_buffer_{lock_reserve,unlock_commit}. $ codiff /tmp/vmlinux.before /tmp/vmlinux.after linux-2.6-tip/kernel/trace/trace.c: trace_vprintk \| -14 trace_graph_return \| -14 trace_graph_entry \| -10 trace_function \| -8 __ftrace_trace_stack \| -8 ftrace_trace_userstack \| -8 tracing_sched_switch_trace \| -8 ftrace_trace_special \| -12 tracing_sched_wakeup_trace \| -8 9 functions changed, 90 bytes removed, diff: -90 linux-2.6-tip/block/blktrace.c: __blk_add_trace \| -1 1 function changed, 1 bytes removed, diff: -1 /tmp/vmlinux.after: 10 functions changed, 91 bytes removed, diff: -91 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Frédéric Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
dac74940289f350c2590bec92737833bad608541	05-Feb-2009	Steven Rostedt <srostedt@redhat.com>	trace: code style clean up Ingo Molnar suggested using goto logic to keep the indentation down and to be able to remove the nasty line breaks. This actually makes the code a bit more readable. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
7be421510b91491d5aa5a29fa1005712039b95af	05-Feb-2009	Arnaldo Carvalho de Melo <acme@redhat.com>	trace: Remove unused trace_array_cpu parameter Impact: cleanup Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
268ccda0cb4d1292029d07ee3dbd07117baf6ecb	04-Feb-2009	Arnaldo Carvalho de Melo <acme@redhat.com>	trace: assign defaults at register_ftrace_event Impact: simplification of tracers As all tracers are doing this we might as well do it in register_ftrace_event and save one branch each time we call these callbacks. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
ae7462b4f1fe1f36b5d562dbd5202a2eba01f072	04-Feb-2009	Arnaldo Carvalho de Melo <acme@redhat.com>	trace: make the trace_event callbacks return enum print_line_t As they actually all return these enumerators. Reported-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
d9793bd8018f835c64b10f44e278c86cecb8e932	03-Feb-2009	Arnaldo Carvalho de Melo <acme@redhat.com>	trace: judicious error checking of trace_seq results Impact: bugfix and cleanup Some callsites were returning either TRACE_ITER_PARTIAL_LINE if the trace_seq routines (trace_seq_printf, etc) returned 0 meaning its buffer was full, or zero otherwise. But... /* Return values for print_line callback / enum print_line_t { TRACE_TYPE_PARTIAL_LINE = 0, / Retry after flushing the seq / TRACE_TYPE_HANDLED = 1, TRACE_TYPE_UNHANDLED = 2 / Relay to other output functions */ }; In other cases the return value was not being relayed at all. Most of the time it didn't hurt because the page wasn't get filled, but for correctness sake, handle the return values everywhere. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Acked-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2c9b238eb325895d3312dad64e2685783575e474	02-Feb-2009	Arnaldo Carvalho de Melo <acme@redhat.com>	trace: Change struct trace_event callbacks parameter list Impact: API change The trace_seq and trace_entry are in trace_iterator, where there are more fields that may be needed by tracers, so just pass the tracer_iterator as is already the case for struct tracer->print_line. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
c4a8e8be2d43cc22b371e8e9c05c253409759d94	02-Feb-2009	Frederic Weisbecker <fweisbec@gmail.com>	trace: better manage the context info for events Impact: make trace_event more convenient for tracers All tracers (for the moment) that use the struct trace_event want to have the context info printed before their own output: the pid/cmdline, cpu, and timestamp. But some other tracers that want to implement their trace_event callbacks will not necessary need these information or they may want to format them as they want. This patch adds a new default-enabled trace option: TRACE_ITER_CONTEXT_INFO When disabled through: echo nocontext-info > /debugfs/tracing/trace_options The pid, cpu and timestamps headers will not be printed. IE with the sched_switch tracer with context-info (default): bash-2935 [001] 100.356561: 2935:120:S ==> [001] 0:140:R <idle> <idle>-0 [000] 100.412804: 0:140:R + [000] 11:115:S events/0 <idle>-0 [000] 100.412816: 0:140:R ==> [000] 11:115:R events/0 events/0-11 [000] 100.412829: 11:115:S ==> [000] 0:140:R <idle> Without context-info: 2935:120:S ==> [001] 0:140:R <idle> 0:140:R + [000] 11:115:S events/0 0:140:R ==> [000] 11:115:R events/0 11:115:S ==> [000] 0:140:R <idle> A tracer can disable it at runtime by clearing the bit TRACE_ITER_CONTEXT_INFO in trace_flags. The print routines were renamed to trace_print_context and trace_print_lat_context, so that they can be used by tracers if they want to use them for one of the trace_event callbacks. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
79fb0768fbd371f3b94d909f51f587b3a24ab272	03-Feb-2009	Steven Rostedt <srostedt@redhat.com>	trace: let boot trace be chosen by command line Now that we have a working ftrace=<tracer> function, make the boot tracer get activated by it. This way we can turn it on or off without recompiling the kernel, as well as keeping the selftests on. The selftests are disabled whenever a default tracer starts running. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
b2821ae68b14480bfc85ea1629537163310bc5cd	03-Feb-2009	Steven Rostedt <srostedt@redhat.com>	trace: fix default boot up tracer Peter Zijlstra started the functionality to start up a default tracing at bootup. This patch finishes the work. Now if you add 'ftrace=<tracer>' to the command line, when that tracer is registered on bootup, that tracer is selected and starts tracing. Note, all selftests for tracers that are registered after this tracer is disabled. This prevents the selftests from disturbing the running tracer, or the running tracer from disturbing the selftest. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
9011262a37cb438f0fa9394b5e83840db8f9680a	23-Jan-2009	Arnaldo Carvalho de Melo <acme@redhat.com>	ftrace: add ftrace_vprintk Impact: new helper function Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
b06a830183b610c0a88c29a92feb7991a867ab46	22-Jan-2009	Steven Rostedt <srostedt@redhat.com>	trace: fix logic to start/stop counting The logic in the tracing_start/stop code prevents the WARN_ON from ever detecting if a start/stop pair was mismatched. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
97b17efe4537e11bf6669106cfe4ee2c5331b267	21-Jan-2009	Steven Rostedt <srostedt@redhat.com>	ring-buffer: do not swap if recording is disabled If the ring buffer recording has been disabled. Do not let swapping of ring buffers occur. Simply return -EAGAIN. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
1092307d582a7566d23779c304cf86f3075ac5f0	16-Jan-2009	Steven Rostedt <srostedt@redhat.com>	trace: set max latency variable to zero on default Impact: trace max latencies on start of latency tracing This patch sets the max latency to zero whenever one of the irq variant tracers or the wakeup tracer is set to current tracer. Most developers expect to see output when starting up a latency tracer. But since the max_latency is already set to max, and it takes a latency greater than max_latency to be recorded, there is no trace. This is not the expected behavior and has even confused myself. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
a442e5e0a2011af5b2d1f118fee0a8f9079f1d88	14-Jan-2009	Steven Rostedt <srostedt@redhat.com>	trace: stop all recording to ring buffer on ftrace_dump Impact: limit ftrace dump output Currently ftrace_dump only calls ftrace_kill that is a fast way to prevent the function tracer functions from being called (just sets a flag and clears the function to call, nothing else). It is better to also turn off any recording to the ring buffers as well. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
faf6861ebd776871e77b761c43ec045cd20b5716	14-Jan-2009	Steven Rostedt <srostedt@redhat.com>	trace: print ftrace_dump at KERN_EMERG log level Impact: fix to print out ftrace_dump when expected I was debugging a hard race condition to only find out that after I hit the race, my log level was not at level to show KERN_INFO. The time it took to trigger the race was wasted because I did not capture the trace. Since ftrace_dump is only called from kernel oops (and only when it is set in the kernel command line to do so), or when a developer adds it to their own local tree, the log level of the print should be at KERN_EMERG to make sure the print appears. ftrace_dump is not called by a normal user setup, and will not add extra unwanted print out to the console. There is no reason it should be at KERN_INFO. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
745b1626dd71ce9661a05ea4db57859ed5c773d2	16-Jan-2009	Steven Rostedt <srostedt@redhat.com>	trace: set max latency variable to zero on default Impact: trace max latencies on start of latency tracing This patch sets the max latency to zero whenever one of the irq variant tracers or the wakeup tracer is set to current tracer. Most developers expect to see output when starting up a latency tracer. But since the max_latency is already set to max, and it takes a latency greater than max_latency to be recorded, there is no trace. This is not the expected behavior and has even confused myself. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
a225cdd263f340c864febb1992802fb5b08bc328	16-Jan-2009	Steven Rostedt <srostedt@redhat.com>	ftrace: remove static from function tracer functions Impact: clean up After reorganizing the functions in trace.c and trace_function.c, they no longer need to be in global context. This patch makes the functions and one variable into static. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
bb3c3c95f330f7bf16e33b002e48882616089db1	16-Jan-2009	Steven Rostedt <srostedt@redhat.com>	ftrace: move function tracer functions out of trace.c Impact: clean up of trace.c The function tracer functions were put in trace.c because it needed to share static variables that were in trace.c. Since then, those variables have become global for various reasons. This patch moves the function tracer functions into trace_function.c where they belong. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
5361499101306cfb776c3cfa0f69d0479bc63868	16-Jan-2009	Steven Rostedt <srostedt@redhat.com>	ftrace: add stack trace to function tracer Impact: new feature to stack trace any function Chris Mason asked about being able to pick and choose a function and get a stack trace from it. This feature enables his request. # echo io_schedule > /debug/tracing/set_ftrace_filter # echo function > /debug/tracing/current_tracer # echo func_stack_trace > /debug/tracing/trace_options Produces the following in /debug/tracing/trace: kjournald-702 [001] 135.673060: io_schedule <-sync_buffer kjournald-702 [002] 135.673671: <= sync_buffer <= __wait_on_bit <= out_of_line_wait_on_bit <= __wait_on_buffer <= sync_dirty_buffer <= journal_commit_transaction <= kjournald Note, be careful about turning this on without filtering the functions. You may find that you have a 10 second lag between typing and seeing what you typed. This is why the stack trace for the function tracer does not use the same stack_trace flag as the other tracers use. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
0ee6b6cf5bdb793b4c68507dd65adf16341aa4ca	14-Jan-2009	Steven Rostedt <srostedt@redhat.com>	trace: stop all recording to ring buffer on ftrace_dump Impact: limit ftrace dump output Currently ftrace_dump only calls ftrace_kill that is a fast way to prevent the function tracer functions from being called (just sets a flag and clears the function to call, nothing else). It is better to also turn off any recording to the ring buffers as well. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
428aee1460a75197f0190534b4d610450ee59af1	14-Jan-2009	Steven Rostedt <srostedt@redhat.com>	trace: print ftrace_dump at KERN_EMERG log level Impact: fix to print out ftrace_dump when expected I was debugging a hard race condition to only find out that after I hit the race, my log level was not at level to show KERN_INFO. The time it took to trigger the race was wasted because I did not capture the trace. Since ftrace_dump is only called from kernel oops (and only when it is set in the kernel command line to do so), or when a developer adds it to their own local tree, the log level of the print should be at KERN_EMERG to make sure the print appears. ftrace_dump is not called by a normal user setup, and will not add extra unwanted print out to the console. There is no reason it should be at KERN_INFO. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
002bb86d8d42f18937aef396c3ecd65c7e02e21a	10-Jan-2009	Frederic Weisbecker <fweisbec@gmail.com>	tracing/ftrace: separate events tracing and stats tracing engine Impact: tracing's Api change Currently, the stat tracing depends on the events tracing. When you switch to a new tracer, the stats files of the previous tracer will disappear. But it's more scalable to separate those two engines. This way, we can keep the stat files of one or several tracers when we want, without bothering of multiple tracer stat files or tracer switching. To build/destroys its stats files, a tracer just have to call register_stat_tracer/unregister_stat_tracer everytimes it wants to. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
4462344ee9ea9224d026801b877887f2f39774a3	01-Jan-2009	Rusty Russell <rusty@rustcorp.com.au>	cpumask: convert kernel trace functions further Impact: Reduce future memory usage, use new cpumask API. Since the last patch was created and acked, more old cpumask users slipped into kernel/trace. Mostly trivial conversions, except struct trace_iterator's "started" member becomes a cpumask_var_t. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
9e01c1b74c9531e301c900edaa92a99fcb7738f2	01-Jan-2009	Rusty Russell <rusty@rustcorp.com.au>	cpumask: convert kernel trace functions Impact: Reduce future memory usage, use new cpumask API. (Eventually, cpumask_var_t will be allocated based on nr_cpu_ids, not NR_CPUS). Convert kernel trace functions to use struct cpumask API: 1) Use cpumask_copy/cpumask_test_cpu/for_each_cpu. 2) Use cpumask_var_t and alloc_cpumask_var/free_cpumask_var everywhere. 3) Use on_each_cpu instead of playing with current->cpus_allowed. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Mike Travis <travis@sgi.com> Acked-by: Steven Rostedt <rostedt@goodmis.org>
1af237a099a3b8ff56aa384f605c6a68af7bf288	29-Dec-2008	Huang Weiyi <weiyi.huang@gmail.com>	tracing: removed duplicated #include Removed duplicated #include in kernel/trace/trace.c. Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
dbd0b4b33074aa6b7832a9d9a5bd985eca5c1aa2	29-Dec-2008	Frederic Weisbecker <fweisbec@gmail.com>	tracing/ftrace: provide the base infrastructure for histogram tracing Impact: extend the tracing API The goal of this patch is to normalize and make more easy the implementation of statistical (histogram) tracing. It implements a trace_stat file into the /debugfs/tracing directory where one can print a one-shot output of statistics/histogram entries. A tracer has to provide two basic iterator callbacks: stat_start() => the first entry stat_next(prev, idx) => the next one. Note that it is adapted for arrays or hash tables or lists.... since it provides a pointer to the previous entry and the current index of the iterator. These two callbacks are called to get a snapshot of the statistics at each opening of the trace_stat file because. The values are so updated between two "cat trace_stat". And the tracer is free to lock its datas during the iteration to keep consistent values. Since it is almost always interesting to sort statisticals values to address the problems by priority, this infrastructure provides a "sorting" of the stat entries too if desired. A tracer has just to provide a stat_cmp callback to compare two entries and the stat tracing infrastructure will build a sorted list of the given entries. A last callback, called stat_headers, can be implemented by a tracer to output headers on its trace. If one of these callbacks is changed on runtime, it just have to signal it to the stat tracing API by calling the init_tracer_stat() helper. Changes in V2: - Fix a memory leak if the user opens multiple times the trace_stat file without closing it. Now we always free our list before rebuilding it. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
f633cef0200bbaec539e2dbb0bc4bed7f022f98b	24-Dec-2008	Steven Rostedt <srostedt@redhat.com>	ftrace: change trace.c to use registered events Impact: rework trace.c to use new event register API Almost every ftrace event has to implement its output display in trace.c through a different function. Some events did not handle all the formats (trace, latency-trace, raw, hex, binary), and this method does not scale well. This patch converts the format functions to use the event API to find the event and and print its format. Currently, we have a print function for trace, latency_trace, raw, hex and binary. A trace_nop_print is available if the event wants to avoid output on a particular format. Perhaps other tracers could use this in the future (like mmiotrace and function_graph). Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
f0868d1e23a8efec33beb3aa688aab7fdb1ae093	24-Dec-2008	Steven Rostedt <srostedt@redhat.com>	ftrace: set up trace event hash infrastructure Impact: simplify/generalize/refactor trace.c The trace.c file is becoming more difficult to maintain due to the growing number of events. There is several formats that an event may be printed. This patch sets up the infrastructure of an event hash to allow for events to register how they should be printed. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
c47956d9ae3341d2d1998bff26620fa3338c01e4	24-Dec-2008	Steven Rostedt <srostedt@redhat.com>	ftrace: remove obsolete print continue functionality Impact: cleanup, remove obsolete code Now that the ring buffer used by ftrace allows for variable length entries, we do not need the 'cont' feature of the buffer. This code makes other parts of ftrace more complex and by removing this it simplifies the ftrace code. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
213cc060797378059a28ebc5c539f3e9a80160bd	18-Dec-2008	Pekka J Enberg <penberg@cs.helsinki.fi>	ftrace: introduce tracing_reset_online_cpus() helper Impact: cleanup This patch factors out common code from multiple tracers into a tracing_reset_online_cpus() function and converts the tracers to use it. Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi> Signed-off-by: Ingo Molnar <mingo@elte.hu>
3bddb9a3246f6df5cf3b7655cb541ac10203bb71	19-Dec-2008	Ingo Molnar <mingo@elte.hu>	tracing: fix warning in kernel/trace/trace.c this warning: kernel/trace/trace.c: In function ‘print_lat_fmt’: kernel/trace/trace.c:1826: warning: unused variable ‘state’ Triggers because 'state' has become unused - remove it. Signed-off-by: Ingo Molnar <mingo@elte.hu>
3d9101e92529e1ff6014f95a69afc82f37b9b13a	17-Dec-2008	Thomas Gleixner <tglx@linutronix.de>	trace: fix task state printout Impact: fix occasionally incorrect trace output The tracing code has interesting varieties of printing out task state. Unfortunalely only one of the instances is correct as it copies the code from sched.c:sched_show_task(). The others are plain wrong as they treatthe bitfield as an integer offset into the character array. Also the size check of the character array is wrong as it includes the trailing \0. Use a common state decoder inline which does the Right Thing. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
66896a85cf2890b6bbbc4c9ccdcd296600ffbf89	13-Dec-2008	Frederic Weisbecker <fweisbec@gmail.com>	tracing/ftrace: add the printk-msg-only option Impact: display ftrace_printk messages "as is" By default, ftrace_printk() messages find their output with some other informations like pid, caller, ... Sometimes a developer just want to have the ftrace_printk left "as is", without other information. This is done by providing a default-off option called printk-msg-only. To enable it, just do `echo printk-msg-only > /debugfs/tracing/trace_options` Before the patch: <...>-2739 [000] 145.692153: __might_sleep: I'm an ftrace_printk msg in __might_sleep <...>-2739 [000] 145.692155: __might_sleep: I'm another ftrace_printk msg in __might_sleep After the patch and the printk-msg-only option enabled: I'm an ftrace_printk msg in __might_sleep I'm another ftrace_printk msg in __might_sleep Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
29c0177e6a4ac094302bed54a1d4bbb6b740a9ef	13-Dec-2008	Rusty Russell <rusty@rustcorp.com.au>	cpumask: change cpumask_scnprintf, cpumask_parse_user, cpulist_parse, and cpulist_scnprintf to take pointers. Impact: change calling convention of existing cpumask APIs Most cpumask functions started with cpus_: these have been replaced by cpumask_ ones which take struct cpumask pointers as expected. These four functions don't have good replacement names; fortunately they're rarely used, so we just change them over. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Mike Travis <travis@sgi.com> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: paulus@samba.org Cc: mingo@redhat.com Cc: tony.luck@intel.com Cc: ralf@linux-mips.org Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: cl@linux-foundation.org Cc: srostedt@redhat.com
a93751cab71d63126687551823ed3e70cd85854a	11-Dec-2008	Markus Metzger <markut.t.metzger@intel.com>	x86, bts, ftrace: adapt the hw-branch-tracer to the ds.c interface Impact: restructure code, cleanup Remove BTS bits from the hw-branch-tracer (renamed from bts-tracer) and use the ds interface. Signed-off-by: Markus Metzger <markut.t.metzger@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
e2ac8ef576e45d9db7264abc51383e68d26067bb	12-Nov-2008	Robert Richter <robert.richter@amd.com>	ftrace: remove unused function arg in trace_iterator_increment() This removes the unused cpu function parameter. Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Robert Richter <robert.richter@amd.com>
e726f5f91effd8944c76475a2688093a03ba0d10	08-Dec-2008	Ingo Molnar <mingo@elte.hu>	tracing/function-graph-tracer: fix 'flags' variable mismatch this warning: kernel/trace/trace.c: In function ‘trace_vprintk’: kernel/trace/trace.c:3626: warning: ‘flags’ may be used uninitialized in this function shows some confusion about irq_flags / flags use here. We already have irq_flags so remove the extra flags variable. Signed-off-by: Ingo Molnar <mingo@elte.hu>
380c4b1411ccd6885f92b2c8ceb08433a720f44e	06-Dec-2008	Frederic Weisbecker <fweisbec@gmail.com>	tracing/function-graph-tracer: append the tracing_graph_flag Impact: Provide a way to pause the function graph tracer As suggested by Steven Rostedt, the previous patch that prevented from spinlock function tracing shouldn't use the raw_spinlock to fix it. It's much better to follow lockdep with normal spinlock, so this patch adds a new flag for each task to make the function graph tracer able to be paused. We also can send an ftrace_printk whithout worrying of the irrelevant traced spinlock during insertion. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
8e1b82e0866befaa0b2920be296c6e4c3fc7f422	06-Dec-2008	Frederic Weisbecker <fweisbec@gmail.com>	tracing/function-graph-tracer: turn tracing_selftest_running into an int Impact: cleanup Apply some suggestions of Steven Rostedt: _turn tracing_selftest_running into a simple int (no need of an atomic_t) _set it __read_mostly _fix a comment style Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
21a8c466f99063eeb8567318b4e305eda9015408	04-Dec-2008	Frederic Weisbecker <fweisbec@gmail.com>	tracing/ftrace: provide the macro task_curr_ret_stack() Impact: cleanup As suggested by Steven Rostedt, this patch provide a new macro task_curr_ret_stack() to move the cpp conditionnal CONFIG into the linux/ftrace.h headers. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
ff32504fdc56407654584ef187b20022c94a3486	04-Dec-2008	Frederic Weisbecker <fweisbec@gmail.com>	tracing/ftrace: don't insert TRACE_PRINT during selftests Impact: fix tracer selfstests false results After setting a ftrace_printk somewhere in th kernel, I saw the Function tracer selftest failing. When a selftest occurs, the ring buffer is lurked to see if some entries were inserted. But concurrent insertion such as ftrace_printk could occured at the same time and could give false positive or negative results. This patch prevent prevent from TRACE_PRINT entries insertion during selftests. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
1fd8f2a3f9a91b287a876cef830b21baafc8a799	03-Dec-2008	Frederic Weisbecker <fweisbec@gmail.com>	tracing/function-graph-tracer: handle ftrace_printk entries Handle the TRACE_PRINT entries from the function grapg tracer and output them as a C comment just below the function that called it, as if it was a comment inside this function. Example with an ftrace_printk inside might_sleep() function: void __might_sleep(char file, int line) { static unsigned long prev_jiffy; / ratelimiting / ftrace_printk("Hi I'm a comment in might_sleep() :-)"); A chunk of a resulting trace: 0) \| _reiserfs_free_block() { 0) \| reiserfs_read_bitmap_block() { 0) \| __bread() { 0) \| __getblk() { 0) \| __find_get_block() { 0) 0.698 us \| mark_page_accessed(); 0) 2.267 us \| } 0) \| __might_sleep() { 0) \| / Hi I'm a comment in might_sleep() :-) */ 0) 1.321 us \| } 0) 5.872 us \| } 0) 7.313 us \| } 0) 8.718 us \| } And this patch brings two minor fixes: - The newline after a switch-out task has disappeared - The "\|" sign just before the cpu number on task-switch has been deleted. 0) 0.616 us \| pick_next_task_rt(); 0) 1.457 us \| _spin_trylock(); 0) 0.653 us \| _spin_unlock(); 0) 0.728 us \| _spin_trylock(); 0) 0.631 us \| _spin_unlock(); 0) 0.729 us \| native_load_sp0(); 0) 0.593 us \| native_load_tls(); ------------------------------------------ 0) cat-2834 => migrati-3 ------------------------------------------ 0) \| finish_task_switch() { 0) 0.841 us \| _spin_unlock_irq(); 0) 0.616 us \| post_schedule_rt(); 0) 3.882 us \| } Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
804a685162a7080386714166776f57255a75238e	03-Dec-2008	Steven Rostedt <srostedt@redhat.com>	ftrace: trace single pid for function graph tracer Impact: New feature This patch makes the changes to set_ftrace_pid apply to the function graph tracer. # echo $$ > /debugfs/tracing/set_ftrace_pid # echo function_graph > /debugfs/tracing/current_tracer Will cause only the current task to be traced. Note, the trace flags are also inherited by child processes, so the children of the shell will also be traced. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
ea4e2bc4d9f7370e57a343ccb5e7c0ad3222ec3c	03-Dec-2008	Steven Rostedt <srostedt@redhat.com>	ftrace: graph of a single function This patch adds the file: /debugfs/tracing/set_graph_function which can be used along with the function graph tracer. When this file is empty, the function graph tracer will act as usual. When the file has a function in it, the function graph tracer will only trace that function. For example: # echo blk_unplug > /debugfs/tracing/set_graph_function # cat /debugfs/tracing/trace [...] ------------------------------------------ \| 2) make-19003 => kjournald-2219 ------------------------------------------ 2) \| blk_unplug() { 2) \| dm_unplug_all() { 2) \| dm_get_table() { 2) 1.381 us \| _read_lock(); 2) 0.911 us \| dm_table_get(); 2) 1. 76 us \| _read_unlock(); 2) + 12.912 us \| } 2) \| dm_table_unplug_all() { 2) \| blk_unplug() { 2) 0.778 us \| generic_unplug_device(); 2) 2.409 us \| } 2) 5.992 us \| } 2) 0.813 us \| dm_table_put(); 2) + 29. 90 us \| } 2) + 34.532 us \| } You can add up to 32 functions into this file. Currently we limit it to 32, but this may change with later improvements. To add another function, use the append '>>': # echo sys_read >> /debugfs/tracing/set_graph_function # cat /debugfs/tracing/set_graph_function blk_unplug sys_read Using the '>' will clear out the function and write anew: # echo sys_write > /debug/tracing/set_graph_function # cat /debug/tracing/set_graph_function sys_write Note, if you have function graph running while doing this, the small time between clearing it and updating it will cause the graph to record all functions. This should not be an issue because after it sets the filter, only those functions will be recorded from then on. If you need to only record a particular function then set this file first before starting the function graph tracer. In the future this side effect may be corrected. The set_graph_function file is similar to the set_ftrace_filter but it does not take wild cards nor does it allow for more than one function to be set with a single write. There is no technical reason why this is the case, I just do not have the time yet to implement that. Note, dynamic ftrace must be enabled for this to appear because it uses the dynamic ftrace records to match the name to the mcount call sites. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
e49dc19c6a19ea112fcb94b7c62ec62cdd5c08aa	03-Dec-2008	Steven Rostedt <srostedt@redhat.com>	ftrace: function graph return for function entry Impact: feature, let entry function decide to trace or not This patch lets the graph tracer entry function decide if the tracing should be done at the end as well. This requires all function graph entry functions return 1 if it should trace, or 0 if the return should not be traced. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
a5e25883a445dce94a087ca479b21a5959cd5c18	02-Dec-2008	Steven Rostedt <srostedt@redhat.com>	ftrace: replace raw_local_irq_save with local_irq_save Impact: fix for lockdep and ftrace The raw_local_irq_save/restore confuses lockdep. This patch converts them to the local_irq_save/restore variants. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
c7425acb42fff1e723b05fbf4ea11e9a455d95dc	28-Nov-2008	Török Edwin <edwintorok@gmail.com>	tracing, alpha: fix build: add missing #ifdef CONFIG_STACKTRACE There are architectures that still have no stacktrace support. Signed-off-by: Török Edwin <edwintorok@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
660c7f9be96321fc80026d76411bd15e6f418a72	26-Nov-2008	Steven Rostedt <rostedt@goodmis.org>	ftrace: add thread comm to function graph tracer Impact: enhancement to function graph tracer Export the trace_find_cmdline so the function graph tracer can use it to print the comms of the threads. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
287b6e68ca7209caec40b2f44f837c580a413bae	26-Nov-2008	Frederic Weisbecker <fweisbec@gmail.com>	tracing/function-return-tracer: set a more human readable output Impact: feature This patch sets a C-like output for the function graph tracing. For this aim, we now call two handler for each function: one on the entry and one other on return. This way we can draw a well-ordered call stack. The pid of the previous trace is loosely stored to be compared against the one of the current trace to see if there were a context switch. Without this little feature, the call tree would seem broken at some locations. We could use the sched_tracer to capture these sched_events but this way of processing is much more simpler. 2 spaces have been chosen for indentation to fit the screen while deep calls. The time of execution in nanosecs is printed just after closed braces, it seems more easy this way to find the corresponding function. If the time was printed as a first column, it would be not so easy to find the corresponding function if it is called on a deep depth. I plan to output the return value but on 32 bits CPU, the return value can be 32 or 64, and its difficult to guess on which case we are. I don't know what would be the better solution on X86-32: only print eax (low-part) or even edx (high-part). Actually it's thee same problem when a function return a 8 bits value, the high part of eax could contain junk values... Here is an example of trace: sys_read() { fget_light() { } 526 vfs_read() { rw_verify_area() { security_file_permission() { cap_file_permission() { } 519 } 1564 } 2640 do_sync_read() { pipe_read() { __might_sleep() { } 511 pipe_wait() { prepare_to_wait() { } 760 deactivate_task() { dequeue_task() { dequeue_task_fair() { dequeue_entity() { update_curr() { update_min_vruntime() { } 504 } 1587 clear_buddies() { } 512 add_cfs_task_weight() { } 519 update_min_vruntime() { } 511 } 5602 dequeue_entity() { update_curr() { update_min_vruntime() { } 496 } 1631 clear_buddies() { } 496 update_min_vruntime() { } 527 } 4580 hrtick_update() { hrtick_start_fair() { } 488 } 1489 } 13700 } 14949 } 16016 msecs_to_jiffies() { } 496 put_prev_task_fair() { } 504 pick_next_task_fair() { } 489 pick_next_task_rt() { } 496 pick_next_task_fair() { } 489 pick_next_task_idle() { } 489 ------------8<---------- thread 4 ------------8<---------- finish_task_switch() { } 1203 do_softirq() { __do_softirq() { __local_bh_disable() { } 669 rcu_process_callbacks() { __rcu_process_callbacks() { cpu_quiet() { rcu_start_batch() { } 503 } 1647 } 3128 __rcu_process_callbacks() { } 542 } 5362 _local_bh_enable() { } 587 } 8880 } 9986 kthread_should_stop() { } 669 deactivate_task() { dequeue_task() { dequeue_task_fair() { dequeue_entity() { update_curr() { calc_delta_mine() { } 511 update_min_vruntime() { } 511 } 2813 Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
fb52607afcd0629776f1dc9e657647ceae81dd50	25-Nov-2008	Frederic Weisbecker <fweisbec@gmail.com>	tracing/function-return-tracer: change the name into function-graph-tracer Impact: cleanup This patch changes the name of the "return function tracer" into function-graph-tracer which is a more suitable name for a tracing which makes one able to retrieve the ordered call stack during the code flow. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
8bba1bf5e2434c83f2fe8b1422604ace9bbe4cb8	25-Nov-2008	Markus Metzger <markus.t.metzger@intel.com>	x86, ftrace: call trace->open() before stopping tracing; add trace->print_header() Add a callback to allow an ftrace plug-in to write its own header. Move the call to trace->open() up a few lines. The changes are required by the BTS ftrace plug-in. Signed-off-by: Markus Metzger <markus.t.metzger@intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
e38da59269be8c0196d16dff1be5bb26076afc6a	23-Nov-2008	Török Edwin <edwintorok@gmail.com>	tracing/stack-tracer: avoid races accessing file Impact: fix race vma->vm_file reference is only stable while holding the mmap_sem, so move usage of it to within the critical section. Signed-off-by: Ingo Molnar <mingo@elte.hu>
cffa10aecb6891f090a4d53a075bc40c082c45fc	22-Nov-2008	Török Edwin <edwintorok@gmail.com>	tracing/stack-tracer: fix locking and refcounts Impact: fix refcounting/object-access bug Hold mmap_sem while looking up/accessing vma. Hold the RCU lock while using the task we looked up. Signed-off-by: Török Edwin <edwintorok@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
8d7c6a96164651dbbab449ef0b5c20ae1f76a3a1	22-Nov-2008	Török Edwin <edwintorok@gmail.com>	tracing/stack-tracer: fix style issues Impact: cleanup Signed-off-by: Török Edwin <edwintorok@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
69bb54ec05f57da7f6fac2cec0820cbc970df20f	21-Nov-2008	Steven Rostedt <srostedt@redhat.com>	ftrace: add ftrace_off_permanent Impact: add new API to disable all of ftrace on anomalies It case of a serious anomaly being detected (like something caught by lockdep) it is a good idea to disable all tracing immediately, without grabing any locks. This patch adds ftrace_off_permanent that disables the tracers, function tracing and ring buffers without a way to enable them again. This should only be used when something serious has been detected. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
b54d3de9f3b8956653b06f1a32e9f9321c6d9027	22-Nov-2008	Török Edwin <edwintorok@gmail.com>	tracing: identify which executable object the userspace address belongs to Impact: modify+improve the userstacktrace tracing visualization feature Store thread group leader id, and use it to lookup the address in the process's map. We could have looked up the address on thread's map, but the thread might not exist by the time we are called. The process might not exist either, but if you are reading trace_pipe, that is unlikely. Example usage: mount -t debugfs nodev /sys/kernel/debug cd /sys/kernel/debug/tracing echo userstacktrace >iter_ctrl echo sym-userobj >iter_ctrl echo sched_switch >current_tracer echo 1 >tracing_enabled cat trace_pipe >/tmp/trace& .... run application ... echo 0 >tracing_enabled cat /tmp/trace You'll see stack entries like: /lib/libpthread-2.7.so[+0xd370] You can convert them to function/line using: addr2line -fie /lib/libpthread-2.7.so 0xd370 Or: addr2line -fie /usr/lib/debug/libpthread-2.7.so 0xd370 For non-PIC/PIE executables this won't work: a.out[+0x73b] You need to run the following: addr2line -fie a.out 0x40073b (where 0x400000 is the default load address of a.out) Signed-off-by: Török Edwin <edwintorok@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
02b67518e2b1c490787dac7f35e1204e74fe21ba	22-Nov-2008	Török Edwin <edwintorok@gmail.com>	tracing: add support for userspace stacktraces in tracing/iter_ctrl Impact: add new (default-off) tracing visualization feature Usage example: mount -t debugfs nodev /sys/kernel/debug cd /sys/kernel/debug/tracing echo userstacktrace >iter_ctrl echo sched_switch >current_tracer echo 1 >tracing_enabled .... run application ... echo 0 >tracing_enabled Then read one of 'trace','latency_trace','trace_pipe'. To get the best output you can compile your userspace programs with frame pointers (at least glibc + the app you are tracing). Signed-off-by: Török Edwin <edwintorok@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
86fa2f60674540df0b34f5c547ed0c1cf3a8f212	19-Nov-2008	Ingo Molnar <mingo@elte.hu>	ftrace: fix selftest locking Impact: fix self-test boot crash Self-test failure forgot to re-lock the BKL - crashing the next initcall: Testing tracer irqsoff: .. no entries found ..FAILED! initcall init_irqsoff_tracer+0x0/0x11 returned 0 after 3906 usecs calling init_mmio_trace+0x0/0xf @ 1 ------------[ cut here ]------------ Kernel BUG at c0c0a915 [verbose debug info unavailable] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC last sysfs file: Pid: 1, comm: swapper Not tainted (2.6.28-rc5-tip #53704) EIP: 0060:[<c0c0a915>] EFLAGS: 00010286 CPU: 1 EIP is at unlock_kernel+0x10/0x2b EAX: ffffffff EBX: 00000000 ECX: 00000000 EDX: f7030000 ESI: c12da19c EDI: 00000000 EBP: f7039f54 ESP: f7039f54 DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068 Process swapper (pid: 1, ti=f7038000 task=f7030000 task.ti=f7038000) Stack: f7039f6c c0164d30 c013fed8 a7d8d7b4 00000000 00000000 f7039f74 c12fb78a f7039fd0 c0101132 c12fb77d 00000000 6f727200 6f632072 2d206564 c1002031 0000000f f7039fa2 f7039fb0 3531b171 00000000 00000000 0000002f c12ca480 Call Trace: [<c0164d30>] ? register_tracer+0x66/0x13f [<c013fed8>] ? ktime_get+0x19/0x1b [<c12fb78a>] ? init_mmio_trace+0xd/0xf [<c0101132>] ? do_one_initcall+0x4a/0x111 [<c12fb77d>] ? init_mmio_trace+0x0/0xf [<c015c7e6>] ? init_irq_proc+0x46/0x59 [<c12e851d>] ? kernel_init+0x104/0x152 [<c12e8419>] ? kernel_init+0x0/0x152 [<c01038b7>] ? kernel_thread_helper+0x7/0x10 Code: 58 14 43 75 0a b8 00 9b 2d c1 e8 51 43 7a ff 64 a1 00 a0 37 c1 89 58 14 5b 5d c3 55 64 8b 15 00 a0 37 c1 83 7a 14 00 89 e5 79 04 <0f> 0b eb fe 8b 42 14 48 85 c0 89 42 14 79 0a b8 00 9b 2d c1 e8 EIP: [<c0c0a915>] unlock_kernel+0x10/0x2b SS:ESP 0068:f7039f54 ---[ end trace a7919e7f17c0a725 ]--- Kernel panic - not syncing: Attempted to kill init! So clean up the flow a bit. Signed-off-by: Ingo Molnar <mingo@elte.hu>
a22506347d038a66506c6f57e9b97104128e280d	18-Nov-2008	Heiko Carstens <heiko.carstens@de.ibm.com>	ftrace: preemptoff selftest not working Impact: fix preemptoff and preemptirqsoff tracer self-tests I was wondering why the preemptoff and preemptirqsoff tracer selftests don't work on s390. After all its just that they get called from non-preemptible context: kernel_init() will execute all initcalls, however the first line in kernel_init() is lock_kernel(), which causes the preempt_count to be increased. Any later calls to add_preempt_count() (especially those from the selftests) will therefore not result in a call to trace_preempt_off() since the check below in add_preempt_count() will be false: if (preempt_count() == val) trace_preempt_off(CALLER_ADDR0, get_parent_ip(CALLER_ADDR1)); Hence the trace buffer will be empty. Fix this by releasing the BKL during the self-tests. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
0bb943c7a2136716757a263f604d26309fd98042	14-Nov-2008	Julia Lawall <julia@diku.dk>	tracing: kernel/trace/trace.c: introduce missing kfree() Impact: fix memory leak Error handling code following a kzalloc should free the allocated data. The semantic match that finds the problem is as follows: (http://www.emn.fr/x-info/coccinelle/) // <smpl> @r exists@ local idexpression x; statement S; expression E; identifier f,l; position p1,p2; expression ptr != NULL; @@ ( if ((x@p1 = $kmalloc\\|kzalloc\\|kcalloc$(...)) == NULL) S \| x@p1 = $kmalloc\\|kzalloc\\|kcalloc$(...); ... if (x == NULL) S ) <... when != x when != if (...) { <+...x...+> } x->f = E ...> ( return $0\\|<+...x...+>\\|ptr$; \| return@p2 ...; ) @script:python@ p1 << r.p1; p2 << r.p2; @@ print " file: %s kmalloc %s return %s" % (p1[0].file,p1[0].line,p2[0].line) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
0231022cc32d5f2e7f3c06b75691dda0ad6aec33	17-Nov-2008	Frederic Weisbecker <fweisbec@gmail.com>	tracing/function-return-tracer: add the overrun field Impact: help to find the better depth of trace We decided to arbitrary define the depth of function return trace as "20". Perhaps this is not enough. To help finding an optimal depth, we measure now the overrun: the number of functions that have been missed for the current thread. By default this is not displayed, we have to do set a particular flag on the return tracer: echo overrun > /debug/tracing/trace_options And the overrun will be printed on the right. As the trace shows below, the current 20 depth is not enough. update_wall_time+0x37f/0x8c0 -> update_xtime_cache (345 ns) (Overruns: 2838) update_wall_time+0x384/0x8c0 -> clocksource_get_next (1141 ns) (Overruns: 2838) do_timer+0x23/0x100 -> update_wall_time (3882 ns) (Overruns: 2838) tick_do_update_jiffies64+0xbf/0x160 -> do_timer (5339 ns) (Overruns: 2838) tick_sched_timer+0x6a/0xf0 -> tick_do_update_jiffies64 (7209 ns) (Overruns: 2838) vgacon_set_cursor_size+0x98/0x120 -> native_io_delay (2613 ns) (Overruns: 274) vgacon_cursor+0x16e/0x1d0 -> vgacon_set_cursor_size (33151 ns) (Overruns: 274) set_cursor+0x5f/0x80 -> vgacon_cursor (36432 ns) (Overruns: 274) con_flush_chars+0x34/0x40 -> set_cursor (38790 ns) (Overruns: 274) release_console_sem+0x1ec/0x230 -> up (721 ns) (Overruns: 274) release_console_sem+0x225/0x230 -> wake_up_klogd (316 ns) (Overruns: 274) con_flush_chars+0x39/0x40 -> release_console_sem (2996 ns) (Overruns: 274) con_write+0x22/0x30 -> con_flush_chars (46067 ns) (Overruns: 274) n_tty_write+0x1cc/0x360 -> con_write (292670 ns) (Overruns: 274) smp_apic_timer_interrupt+0x2a/0x90 -> native_apic_mem_write (330 ns) (Overruns: 274) irq_enter+0x17/0x70 -> idle_cpu (413 ns) (Overruns: 274) smp_apic_timer_interrupt+0x2f/0x90 -> irq_enter (1525 ns) (Overruns: 274) ktime_get_ts+0x40/0x70 -> getnstimeofday (465 ns) (Overruns: 274) ktime_get_ts+0x60/0x70 -> set_normalized_timespec (436 ns) (Overruns: 274) ktime_get+0x16/0x30 -> ktime_get_ts (2501 ns) (Overruns: 274) hrtimer_interrupt+0x77/0x1a0 -> ktime_get (3439 ns) (Overruns: 274) Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
adf9f19574334c9a29a2bc956009fcac7edf1a6b	17-Nov-2008	Frederic Weisbecker <fweisbec@gmail.com>	tracing/ftrace: implement a set_flag callback for tracers Impact: give a way to send specific messages to tracers The current implementation of tracing uses some flags to control the output of general tracers. But we have no way to implement custom flags handling for a specific tracer. This patch proposes a new callback for the struct tracer which called set_flag and a structure that represents a 32 bits variable flag. A tracer can implement a struct tracer_flags on which it puts the initial value of the flag integer. Than it can place a range of flags with their name and their flag mask on the flag integer. The structure that implement a single flag is called struct tracer_opt. These custom flags will be available through the trace_options file like the general tracing flags. Changing their value is done like the other general flags. For example if you have a flag that calls "foo", you can activate it by writing "foo" or "nofoo" on trace_options. Note that the set_flag callback is optional and is only needed if you want the flags changing to be signaled to your tracer and let it to accept or refuse their assignment. V2: Some arrangements in coding style.... Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
0c726da983de0704254250ef6495ca152e7abcca	16-Nov-2008	Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>	tracing: branch tracer, fix writing to trace/trace_options Impact: fix trace_options behavior writing to trace/trace_options use the index of the array to find the value of the flag. With branch tracer flag defined conditionally, this breaks writing to trace_options with branch tracer disabled. Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
1c80025a49855b12fa09bb6db71820e3367b1369	16-Nov-2008	Frederic Weisbecker <fweisbec@gmail.com>	tracing/ftrace: change the type of the init() callback Impact: extend the ->init() method with the ability to fail This bring a way to know if the initialization of a tracer successed. A tracer must return 0 on success and a traditional error (ie: -ENOMEM) if it fails. If a tracer fails to init, it is free to print a detailed warn. The tracing api will not and switch to a new tracer will just return the error from the init callback. Note: this will be used for the return tracer. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
e6e7a65aabdb696cf05a56cfd495c49a11fd4cde	16-Nov-2008	Frederic Weisbecker <fweisbec@gmail.com>	tracing/ftrace: fix unexpected -EINVAL when longest tracer name is set Impact: fix confusing write() -EINVAL when changing the tracer The following commit d9e540762f5cdd89f24e518ad1fd31142d0b9726 remade alive the bug which made the set of a new tracer returning -EINVAL if this is the longest name of tracer. This patch corrects it. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
d51ad7ac48f991c4a8834485727efa99a691cb87	15-Nov-2008	Steven Rostedt <srostedt@redhat.com>	ftrace: replace raw_local_irq_save with local_irq_save Impact: fix lockdep disabling itself when function tracing is enabled The raw_local_irq_saves used in ftrace is causing problems with lockdep. (it thinks the irq flags are out of sync and disables itself with a warning) The raw ops here are not needed, and the normal local_irq_save is fine. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
b6dff3ec5e116e3af6f537d4caedcad6b9e5082a	14-Nov-2008	David Howells <dhowells@redhat.com>	CRED: Separate task security context from task_struct Separate the task security context from task_struct. At this point, the security data is temporarily embedded in the task_struct with two pointers pointing to it. Note that the Alpha arch is altered as it refers to (E)UID and (E)GID in entry.S via asm-offsets. With comment fixes Signed-off-by: Marc Dionne <marc.c.dionne@gmail.com> Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: James Morris <jmorris@namei.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: James Morris <jmorris@namei.org>
12ef7d448613ead2babd41c3ebfa1fe03c20edef	12-Nov-2008	Steven Rostedt <rostedt@goodmis.org>	ftrace: CPU buffer start annotation clean ups Impact: better handling of CPU buffer start annotation Because of the confusion with the per CPU buffers wrapping where one CPU might be more active at the end of the trace than the other CPUs causing that one CPU to have a shorter history. Kernel developers were confused by the "missing" data of that one CPU at the beginning of the trace output. An annotation was added to the trace output to show that the buffer had started: # tracer: function # # TASK-PID CPU# TIMESTAMP FUNCTION # \| \| \| \| \| ##### CPU 3 buffer started #### <idle>-0 [003] 158.192959: smp_apic_timer_interrupt [...] <idle>-0 [003] 161.556520: default_idle ##### CPU 1 buffer started #### <idle>-0 [001] 161.592494: hrtimer_force_reprogram [etc] But this annotation gets a bit messy when tracers do not fill the buffers. This patch does a couple of things: One) it adds a flag to trace_options to disable these annotations Two) it does not annotate if the tracer did not overflow its buffer. This makes the output much cleaner. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
ee6bce52276c0717ed3e63296e5d9465d339e923	12-Nov-2008	Steven Rostedt <rostedt@goodmis.org>	ftrace: rename iter_ctrl to trace_options Impact: rename file /debug/tracing/iter_ctrl to /debug/tracing/trace_options The original ftrace had a file called "iter_ctrl" that would control the way the output was iterated. But this file grew into a catch all for different trace options. This patch renames the file from iter_ctrl to trace_options to reflect this change. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
1696b2b0f44a8d42f3e6b1ea90c21790871c04d9	13-Nov-2008	Steven Rostedt <rostedt@goodmis.org>	ftrace: show buffer size in kilobytes Impact: change the units of buffer_size_kb to kilobytes This patch changes the units of the buffer_size_kb file to kilobytes. Reading and writing to the file uses kilobytes as units. To help users to know what units are used, the output of the file now looks like: # cat /debug/tracing/buffer_size_kb 1408 Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
a94c80e78bc9f4493ffc25a02d5d7bcd93c399d0	12-Nov-2008	Steven Rostedt <rostedt@goodmis.org>	ftrace: rename trace_entries to buffer_size_kb Impact: rename of debugfs file trace_entries to buffer_size_kb The original ftrace had fixed size entries, and the number of entries was shown and modified via the file called trace_entries. By converting to the unified trace buffer, we now allow for variable size entries which makes the meaning of trace_entries pointless. Since trace_size might be confused to the size of the trace, this patch names it "buffer_size_kb" (thanks to Arjan van de Ven for this idea). [ mingo@elte.hu: changed from buffer_size to buffer_size_kb ] ( Note, the units are still bytes - the next patch changes that, to keep the wide rename patch separate from the unit-change patch. ) Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
9f029e83e968e5661d7be045bbcb620dbb909938	12-Nov-2008	Steven Rostedt <srostedt@redhat.com>	ftrace: rename unlikely iter_ctrl to branch Impact: rename of iter_ctrl unlikely to branch The unlikely name is ugly. This patch converts the iter_ctrl command "unlikely" and "nounlikely" to "branch" and "nobranch" respectively. It also renames a lot of internal functions to use "branch" instead of "unlikely". Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2ed84eeb8808cf3c9f039213ca137ffd7d753f0e	12-Nov-2008	Steven Rostedt <srostedt@redhat.com>	trace: rename unlikely profiler to branch profiler Impact: name change of unlikely tracer and profiler Ingo Molnar suggested changing the config from UNLIKELY_PROFILE to BRANCH_PROFILING. I never did like the "unlikely" name so I went one step farther, and renamed all the unlikely configurations to a "BRANCH" variant. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
68d119f0a66f7e3663304343b072e56a2693446b	12-Nov-2008	Ingo Molnar <mingo@elte.hu>	tracing: finetune branch-tracer output Steve suggested the to change the output from this: > bash-3471 [003] 357.014755: [ MISS ] sched_info_dequeued:sched_stats.h:177 > bash-3471 [003] 357.014756: [ .... ] update_curr:sched_fair.c:489 > bash-3471 [003] 357.014758: [ .... ] calc_delta_fair:sched_fair.c:411 to this: > bash-3471 [003] 357.014755: [ MISS ] sched_info_dequeued:sched_stats.h:177 > bash-3471 [003] 357.014756: [ ok ] update_curr:sched_fair.c:489 > bash-3471 [003] 357.014758: [ ok ] calc_delta_fair:sched_fair.c:411 as it makes it clearer to the user what it means exactly. Acked-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
f88c4ae9f8c3939bee4337c75c7a673b5de7a8a7	12-Nov-2008	Ingo Molnar <mingo@elte.hu>	tracing: branch tracer, tweak output Impact: modify the tracer output, to make it a bit easier to read Change the output from: > bash-3471 [003] 357.014755: [INCORRECT] sched_info_dequeued:sched_stats.h:177 > bash-3471 [003] 357.014756: [correct] update_curr:sched_fair.c:489 > bash-3471 [003] 357.014758: [correct] calc_delta_fair:sched_fair.c:411 to: > bash-3471 [003] 357.014755: [ MISS ] sched_info_dequeued:sched_stats.h:177 > bash-3471 [003] 357.014756: [ .... ] update_curr:sched_fair.c:489 > bash-3471 [003] 357.014758: [ .... ] calc_delta_fair:sched_fair.c:411 it's good to have fields aligned vertically, and the only important information is a prediction miss, so display only that information. Signed-off-by: Ingo Molnar <mingo@elte.hu>
52f232cb720a7babb752849cbc2cab2d24021209	12-Nov-2008	Steven Rostedt <rostedt@goodmis.org>	tracing: likely/unlikely branch annotation tracer Impact: new likely/unlikely branch tracer This patch adds a way to record the instances of the likely() and unlikely() branch condition annotations. When "unlikely" is set in /debugfs/tracing/iter_ctrl the unlikely conditions will be added to any of the ftrace tracers. The change takes effect when a new tracer is passed into the current_tracer file. For example: bash-3471 [003] 357.014755: [INCORRECT] sched_info_dequeued:sched_stats.h:177 bash-3471 [003] 357.014756: [correct] update_curr:sched_fair.c:489 bash-3471 [003] 357.014758: [correct] calc_delta_fair:sched_fair.c:411 bash-3471 [003] 357.014759: [correct] account_group_exec_runtime:sched_stats.h:356 bash-3471 [003] 357.014761: [correct] update_curr:sched_fair.c:489 bash-3471 [003] 357.014763: [INCORRECT] calc_delta_fair:sched_fair.c:411 bash-3471 [003] 357.014765: [correct] calc_delta_mine:sched.c:1279 Which shows the normal tracer heading, as well as whether the condition was correct "[correct]" or was mistaken "[INCORRECT]", followed by the function, file name and line number. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
15e6cb3673ea6277999642802406a764b49391b0	11-Nov-2008	Frederic Weisbecker <fweisbec@gmail.com>	tracing: add a tracer to catch execution time of kernel functions Impact: add new tracing plugin which can trace full (entry+exit) function calls This tracer uses the low level function return ftrace plugin to measure the execution time of the kernel functions. The first field is the caller of the function, the second is the measured function, and the last one is the execution time in nanoseconds. - v3: - HAVE_FUNCTION_RET_TRACER have been added. Each arch that support ftrace return should enable it. - ftrace_return_stub becomes ftrace_stub. - CONFIG_FUNCTION_RET_TRACER depends now on CONFIG_FUNCTION_TRACER - Return traces printing can be used for other tracers on trace.c - Adapt to the new tracing API (no more ctrl_update callback) - Correct the check of "disabled" during insertion. - Minor changes... Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
5aa1ba6a6c710e747838a22d798ac97a8b248745	11-Nov-2008	Steven Rostedt <srostedt@redhat.com>	ftrace: prevent ftrace_special from recursion Impact: stop ftrace_special from recursion The ftrace_special is used to help debug areas of the kernel. Because of this, if it is put in certain locations, the fact that it allows recursion can become a problem if the kernel developer using does not realize that. This patch changes ftrace_special to not allow recursion into itself to make it more robust. It also changes from preempt disable interrupts disable to prevent any loss of trace entries. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
bf5e6519b85b3853f2d0bb4f17a4e2eaeffeb574	11-Nov-2008	Steven Rostedt <srostedt@redhat.com>	ftrace: disable tracing on resize Impact: fix for bug on resize This patch addresses the bug found here: http://bugzilla.kernel.org/show_bug.cgi?id=11996 When ftrace converted to the new unified trace buffer, the resizing of the buffer was not protected as much as it was originally. If tracing is performed while the resize occurs, then the buffer can be corrupted. This patch disables all ftrace buffer modifications before a resize takes place. Signed-off-by: Steven Rostedt <srostedt@redhat.com>
a309720c876d7ad2e224bfd1982c92ae4364c82e	08-Nov-2008	Steven Rostedt <srostedt@redhat.com>	ftrace: display start of CPU buffer in trace output Impact: change in trace output Because the trace buffers are per cpu ring buffers, the start of the trace can be confusing. If one CPU is very active at the end of the trace, its history will not go as far back as the other CPU traces. This means that output for a particular CPU may not appear for the first part of a trace. To help annotate what is happening, and to prevent any more confusion, this patch adds a line that annotates the start of a CPU buffer output. For example: automount-3495 [001] 184.596443: dnotify_parent <-vfs_write [...] automount-3495 [001] 184.596449: dput <-path_put automount-3496 [002] 184.596450: down_read_trylock <-do_page_fault [...] sshd-3497 [001] 184.597069: up_read <-do_page_fault <idle>-0 [000] 184.597074: __exit_idle <-exit_idle [...] automount-3496 [002] 184.597257: filemap_fault <-__do_fault <idle>-0 [003] 184.597261: exit_idle <-smp_apic_timer_interrupt Note, parsers of a trace output should always ignore any lines that start with a '#'. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
c76f06945be50564f925799ddfb6235ee4c26aa0	08-Nov-2008	Steven Rostedt <srostedt@redhat.com>	ftrace: remove trace array ctrl Impact: remove obsolete variable in trace_array structure With the new start / stop method of ftrace, the ctrl variable in the trace_array structure is now obsolete. Remove it. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
bbf5b1a0cecb56de6236db8b01c5bfb7ab8ba8b2	08-Nov-2008	Steven Rostedt <srostedt@redhat.com>	ftrace: remove ctrl_update method Impact: Remove the ctrl_update tracer method With the new quick start/stop method of tracing, the ctrl_update method is out of date. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
49833fc232bd6a5076496994d855f601354501d7	08-Nov-2008	Steven Rostedt <srostedt@redhat.com>	ftrace: enable trace_printk by default Impact: have the ftrace_printk enabled on startup It is confusing to have to "echo trace_printk > /debug/tracing/iter_ctrl" after adding ftrace_printk in the kernel. Currently the trace_printk is set to off by default. ftrace_printk should only be in open kernel code when used for debugging, and thus it should be enabled by default. It may also be used to record data within a tracer, but those ftrace_printks should be within wrappers that are either enabled by trace_points or have a variable protecting the code path from being entered when the tracer is disabled. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
9036990d462e09366f7297a2d1da6582c3e6b1d3	05-Nov-2008	Steven Rostedt <srostedt@redhat.com>	ftrace: restructure tracing start/stop infrastructure Impact: change where tracing is started up and stopped Currently, when a new tracer is selected via echo'ing a tracer name into the current_tracer file, the startup is only done if tracing_enabled is set to one. If tracing_enabled is changed to zero (by echo'ing 0 into the tracing_enabled file) a full shutdown is performed. The full startup and shutdown of a tracer can be expensive and the user can lose out traces when echo'ing in 0 to the tracing_enabled file, because the process takes too long. There can also be places that the user would like to start and stop the tracer several times and doing the full startup and shutdown of a tracer might be too expensive. This patch performs the full startup and shutdown when a tracer is selected. It also adds a way to do a quick start or stop of a tracer. The quick version is just a flag that prevents the tracing from taking place, but the overhead of the code is still there. For example, the startup of a tracer may enable tracepoints, or enable the function tracer. The stop and start will just set a flag to have the tracer ignore the calls when the tracepoint or function trace is called. The overhead of the tracer may still be present when the tracer is stopped, but no tracing will occur. Setting the tracer to the 'nop' tracer (or any other tracer) will perform the shutdown of the tracer which will disable the tracepoint or disable the function tracer. The tracing_enabled file will simply start or stop tracing. This change is all internal. The end result for the user should be the same as before. If tracing_enabled is not set, no trace will happen. If tracing_enabled is set, then the trace will happen. The tracing_enabled variable is static between tracers. Enabling tracing_enabled and going to another tracer will keep tracing_enabled enabled. Same is true with disabling tracing_enabled. This patch will now provide a fast start/stop method to the users for enabling or disabling tracing. Note: There were two methods to the struct tracer that were never used: The methods start and stop. These were to be used as a hook to the reading of the trace output, but ended up not being necessary. These two methods are now used to enable the start and stop of each tracer, in case the tracer needs to do more than just not write into the buffer. For example, the irqsoff tracer must stop recording max latencies when tracing is stopped. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
0f04870148ecb825133bc2733f473b1c5773ac0b	05-Nov-2008	Steven Rostedt <srostedt@redhat.com>	ftrace: soft tracing stop and start Impact: add way to quickly start stop tracing from the kernel This patch adds a soft stop and start to the trace. This simply disables function tracing via the ftrace_disabled flag, and disables the trace buffers to prevent recording. The tracing code may still be executed, but the trace will not be recorded. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
072ba49838b42c873c496d72c91bb237914cf3b6	26-Oct-2008	Eric Anholt <eric@anholt.net>	ftrace: fix breakage in bin_fmt results In 777e208d40d0953efc6fb4ab58590da3f7d8f02d we changed from outputting field->cpu (a char) to iter->cpu (unsigned int), increasing the resulting structure size by 3 bytes. Signed-off-by: Eric Anholt <eric@anholt.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
d7ad44b697c9d13e445ddc7d16f736fbac333249	31-Oct-2008	Frederic Weisbecker <fweisbec@gmail.com>	tracing/fastboot: use sched switch tracer from boot tracer Impact: enhance boot trace output with scheduling events Use the sched_switch tracer from the boot tracer. We also can trace schedule events inside the initcalls. Sched tracing is disabled after the initcall has finished and then reenabled before the next one is started. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
b2a866f9344cb30d7ddf5d67b5b8393daf8bef07	04-Nov-2008	Steven Rostedt <rostedt@goodmis.org>	ftrace: function tracer with irqs disabled Impact: disable interrupts during trace entry creation (as opposed to preempt) To help with performance, I set the ftracer to not disable interrupts, and only to disable preemption. If an interrupt occurred, it would not be traced, because the function tracer protects itself from recursion. This may be faster, but the trace output might miss some traces. This patch makes the fuction trace disable interrupts, but it also adds a runtime feature to disable preemption instead. It does this by having two different tracer functions. When the function tracer is enabled, it will check to see which version is requested (irqs disabled or preemption disabled). Then it will use the corresponding function as the tracer. Irq disabling is the default behaviour, but if the user wants better performance, with the chance of missing traces, then they can choose the preempt disabled version. Running hackbench 3 times with the irqs disabled and 3 times with the preempt disabled function tracer yielded: tracing type times entries recorded ------------ -------- ---------------- irq disabled 43.393 166433066 43.282 166172618 43.298 166256704 preempt disabled 38.969 159871710 38.943 159972935 39.325 161056510 Average: irqs disabled: 43.324 166287462 preempt disabled: 39.079 160300385 preempt is 10.8 percent faster than irqs disabled. I wrote a patch to count function trace recursion and reran hackbench. With irq disabled: 1,150 times the function tracer did not trace due to recursion. with preempt disabled: 5,117,718 times. The thousand times with irq disabled could be due to NMIs, or simply a case where it called a function that was not protected by notrace. But we also see that a large amount of the trace is lost with the preempt version. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
182e9f5f704ed6b9175142fe8da33c9ce0c52b52	04-Nov-2008	Steven Rostedt <rostedt@goodmis.org>	ftrace: insert in the ftrace_preempt_disable()/enable() functions Impact: use new, consolidated APIs in ftrace plugins This patch replaces the schedule safe preempt disable code with the ftrace_preempt_disable() and ftrace_preempt_enable() safe functions. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
b3aa557722b3d5858f14ca559e03461c24125aaf	31-Oct-2008	Steven Rostedt <rostedt@goodmis.org>	ftrace: use kretprobe trampoline name to test in output Impact: ia64+tracing build fix When a function is kprobed, the return address is set to the kprobe_trampoline, or something similar. This caused the output of the trace to look confusing when the parent seemed to be this "kprobe_trampoline" function. To fix this, Abhishek Sagar added a test of the instruction pointer of the parent to see if it matched the kprobe_trampoline. If it did, the output would print a "[unknown/kretprobe'd]" instead. Unfortunately, not all archs do this the same way, and the trampoline function may not be exported, which causes failures in builds. This patch will compare the name instead of the pointer to see if it matches. This prevents us from depending on a function from being exported, and should work on all archs. The worst that can happen is that an arch might use a different name and then we go back to the confusing output. At least the arch will still build. Reported-by: Abhishek Sagar <sagar.abhishek@gmail.com> Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Tested-by: Abhishek Sagar <sagar.abhishek@gmail.com> Acked-by: Abhishek Sagar <sagar.abhishek@gmail.com>
c2c80529460095035752bf0ecc1af82c1e0f6e0f	31-Oct-2008	Al Viro <viro@ZenIV.linux.org.uk>	tracing, alpha: undefined reference to `save_stack_trace' Impact: build fix on !stacktrace architectures only select STACKTRACE on architectures that have STACKTRACE_SUPPORT ... since we also need to ifdef out the guts of ftrace_trace_stack(). We also want to disallow setting TRACE_ITER_STACKTRACE in trace_flags on such configs, but that can wait. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
d9e540762f5cdd89f24e518ad1fd31142d0b9726	01-Nov-2008	Peter Zijlstra <a.p.zijlstra@chello.nl>	ftrace: ftrace_dump_on_oops=[tracer] Impact: add new (optional) debug boot option In order to facilitate early boot trouble, allow one to specify a tracer on the kernel boot line. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
a26a2a27396c0a0877aa701f8f92d08ba550a6c9	31-Oct-2008	Steven Rostedt <rostedt@goodmis.org>	ftrace: nmi safe code clean ups Impact: cleanup This patch cleans up the NMI safe code for dynamic ftrace as suggested by Andrew Morton. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
9244489a7b69fe0746dc7cb3957f02e05bd1ceb0	24-Oct-2008	Steven Rostedt <rostedt@goodmis.org>	ftrace: handle archs that do not support irqs_disabled_flags Impact: build fix on non-lockdep architectures Some architectures do not support a way to read the irq flags that is set from "local_irq_save(flags)" to determine if interrupts were disabled or enabled. Ftrace uses this information to display to the user if the trace occurred with interrupts enabled or disabled. Besides the fact that those archs that do not support this will fail to compile, unless they fix it, we do not want to have the trace simply say interrupts were not disabled or they were enabled, without knowing the real answer. This patch adds a 'X' in the output to let the user know that the architecture they are running on does not support a way for the tracer to determine if interrupts were enabled or disabled. It also lets those same archs compile with tracing enabled. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
b807c3d0f8e39ed7cbbbe6da162650e305e8de15	30-Oct-2008	Steven Rostedt <rostedt@goodmis.org>	ftrace: nmi update statistics Impact: add more debug info to /debugfs/tracing/dyn_ftrace_total_info This patch adds dynamic ftrace NMI update statistics to the /debugfs/tracing/dyn_ftrace_total_info stat file. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
60063a66236c15f5613f91390631e06718689782	28-Oct-2008	Steven Rostedt <rostedt@goodmis.org>	ftrace: fix current_tracer error return The commit (in linux-tip) c2931e05ec5965597cbfb79ad332d4a29aeceb23 ( ftrace: return an error when setting a nonexistent tracer ) added useful code that would error when a bad tracer was written into the current_tracer file. But this had a bug if the amount written was more than the amount read by that code. The first iteration would set the tracer correctly, but since it did not consume the rest of what was written (usually whitespace), the userspace utility would continue to write what was not consumed. This second iteration would fail to find a tracer and return -EINVAL. Funny thing is that the tracer would have already been set. This patch just consumes all the data that is written to the file. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
21798a84ab383cdac0e7ee3368e0792b718b867d	28-Oct-2008	Frederic Weisbecker <fweisbec@gmail.com>	tracing: fix a build error on alpha Impact: build fix on Alpha When tracing is enabled, some arch have included <linux/irqflags.h> on their <asm/system.h> but others like alpha or m68k don't. Build error on alpha: kernel/trace/trace.c: In function 'tracing_cpumask_write': kernel/trace/trace.c:2145: error: implicit declaration of function 'raw_local_irq_disable' kernel/trace/trace.c:2162: error: implicit declaration of function 'raw_local_irq_enable' Tested on Alpha through a cross-compiler (should correct a similar issue on m68k). Reported-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
944ac4259e39801c843a915c3da8194ac9af0440	24-Oct-2008	Steven Rostedt <rostedt@goodmis.org>	ftrace: ftrace dump on oops control Impact: add (default-off) dump-trace-on-oops flag Currently, ftrace is set up to dump its contents to the console if the kernel panics or oops. This can be annoying if you have trace data in the buffers and you experience an oops, but the trace data is old or static. Usually when you want ftrace to dump its contents is when you are debugging your system and you have set up ftrace to trace the events leading to an oops. This patch adds a control variable called "ftrace_dump_on_oops" that will enable the ftrace dump to console on oops. This variable is default off but a developer can enable it either through the kernel command line by adding "ftrace_dump_on_oops" or at run time by setting (or disabling) /proc/sys/kernel/ftrace_dump_on_oops. v2: Replaced /** with /* as Randy explained that kernel-doc does not yet handle variables. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
08f5ac906d2c0faf96d608c54a0b03177376da8d	23-Oct-2008	Steven Rostedt <rostedt@goodmis.org>	ftrace: remove ftrace hash The ftrace hash was used by the ftrace_daemon code. The record ip function would place the calling address (ip) into the hash. The daemon would later read the hash and modify that code. The hash complicates the code. This patch removes it. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
81adbdc029ecc416d56563e7f159100181dd711d	23-Oct-2008	Steven Rostedt <rostedt@goodmis.org>	ftrace: only have ftrace_kill atomic When an anomaly is detected, we need a way to completely disable ftrace. Right now we have two functions: ftrace_kill and ftrace_kill_atomic. The ftrace_kill tries to do it in a "nice" way by converting everything back to a nop. The "nice" way is dangerous itself, so this patch removes it and only has the "atomic" version, which is all that is needed. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
606576ce816603d9fe1fb453a88bc6eea16ca709	07-Oct-2008	Steven Rostedt <rostedt@goodmis.org>	ftrace: rename FTRACE to FUNCTION_TRACER Due to confusion between the ftrace infrastructure and the gcc profiling tracer "ftrace", this patch renames the config options from FTRACE to FUNCTION_TRACER. The other two names that are offspring from FTRACE DYNAMIC_FTRACE and FTRACE_MCOUNT_RECORD will stay the same. This patch was generated mostly by script, and partially by hand. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
ad0a3b68114e8f3c25ac0045b45a2838f23e3b3a	09-Oct-2008	Harvey Harrison <harvey.harrison@gmail.com>	trace: add build-time check to avoid overrunning hex buffer Remove the runtime BUG_ON and change to a compile-time check in the macro that calls the hex format routine [Noticed by Joe Perches] Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2fbc474901933c8f0c09b0280dfbb6780cb8bd60	09-Oct-2008	Harvey Harrison <harvey.harrison@gmail.com>	ftrace: fix hex output mode of ftrace Fix the output of ftrace in hex mode as the hi/lo nibbles are output in reverse order. Without this patch, the output of ftrace is: raw mode : 6474 0 141531612444 0 140 + 6402 120 S hex mode : 000091a4 00000000 000000023f1f50c1 00000000 c8 000000b2 00009120 87 ffff00c8 00000035 There is an inversion on ouput hex(6474) is 194a [based on a patch by Philippe Reynes <tremyfr@yahoo.fr>] Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
c2931e05ec5965597cbfb79ad332d4a29aeceb23	04-Oct-2008	Frederic Weisbecker <fweisbec@gmail.com>	ftrace: return an error when setting a nonexistent tracer When one try to set a nonexistent tracer, no error is returned as if the name of the tracer was correct. We should return -EINVAL. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
3ea2e6d71aafe35b8aaf89ed711a283815acfae6	04-Oct-2008	Steven Rostedt <rostedt@goodmis.org>	ftrace: make some tracers reentrant Now that the ring buffer is reentrant, some of the ftrace tracers (sched_swich, debugging traces) can also be reentrant. Note: Never make the function tracer reentrant, that can cause recursion problems all over the kernel. The function tracer must disable reentrancy. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
38697053fa006411224a1790e2adb8216440ab0f	01-Oct-2008	Steven Rostedt <rostedt@goodmis.org>	ftrace: preempt disable over interrupt disable With the new ring buffer infrastructure in ftrace, I'm trying to make ftrace a little more light weight. This patch converts a lot of the local_irq_save/restore into preempt_disable/enable. The original preempt count in a lot of cases has to be sent in as a parameter so that it can be recorded correctly. Some places were recording it incorrectly before anyway. This is also laying the ground work to make ftrace a little bit more reentrant, and remove all locking. The function tracers must still protect from reentrancy. Note: All the function tracers must be careful when using preempt_disable. It must do the following: resched = need_resched(); preempt_disable_notrace(); [...] if (resched) preempt_enable_no_resched_notrace(); else preempt_enable_notrace(); The reason is that if this function traces schedule() itself, the preempt_enable_notrace() will cause a schedule, which will lead us into a recursive failure. If we needed to reschedule before calling preempt_disable, we should have already scheduled. Since we did not, this is most likely that we should not and are probably inside a schedule function. If resched was not set, we still need to catch the need resched flag being set when preemption was off and the if case at the end will catch that for us. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
7104f300c5a69b46dda00d898034dd05c9f21739	01-Oct-2008	Steven Rostedt <rostedt@goodmis.org>	ftrace: type cast filter+verifier The mmiotrace map had a bug that would typecast the entry from the trace to the wrong type. That is a known danger of C typecasts, there's absolutely zero checking done on them. Help that problem a bit by using a GCC extension to implement a type filter that restricts the types that a trace record can be cast into, and by adding a dynamic check (in debug mode) to verify the type of the entry. This patch adds a macro to assign all entries of ftrace using the type of the variable and checking the entry id. The typecasts are now done in the macro for only those types that it knows about, which should be all the types that are allowed to be read from the tracer. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
797d3712a9dd75c720558612be05f42c031a7bb5	30-Sep-2008	Frederic Weisbecker <fweisbec@gmail.com>	tracing/ftrace: adapt mmiotrace to the new type of print_line, fix Correct the value's type of trace_empty function Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
d769041f865330034131525ee6a7f72eb4af2a24	01-Oct-2008	Steven Rostedt <rostedt@goodmis.org>	ring_buffer: implement new locking The old "lock always" scheme had issues with lockdep, and was not very efficient anyways. This patch does a new design to be partially lockless on writes. Writes will add new entries to the per cpu pages by simply disabling interrupts. When a write needs to go to another page than it will grab the lock. A new "read page" has been added so that the reader can pull out a page from the ring buffer to read without worrying about the writer writing over it. This allows us to not take the lock for all reads. The lock is now only taken when a read needs to go to a new page. This is far from lockless, and interrupts still need to be disabled, but it is a step towards a more lockless solution, and it also solves a lot of the issues that were noticed by the first conversion of ftrace to the ring buffers. Note: the ring_buffer_{un}lock API has been removed. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
9ff4b9744c187cae58c3774361ea090addbc4130	29-Sep-2008	Pekka Paalanen <pq@iki.fi>	tracing/ftrace: fix pipe breaking This patch fixes a bug which break the pipe when the seq is empty. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2c4f035f6c3e8fda661eb6105aa51ef07aa71607	29-Sep-2008	Frederic Weisbecker <fweisbec@gmail.com>	tracing/ftrace: change the type of the print_line callback We need a kind of disambiguation when a print_line callback returns 0. _There is not enough space to print all the entry. Please flush the seq and retry. _I can't handle this type of entry This patch changes the type of this callback for better information. Also some changes have been made in this V2. _ Only relay to default functions after the print_line callback fails. _ This patch doesn't fix the issue with the broken pipe (see patch 2/4 for that) Some things are still in discussion: _ Find better names for the enum print_line_t values _ Change the type of print_trace_line into boolean. Patches to change that can be sent later. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Acked-by: Pekka Paalanen <pq@iki.fi> Signed-off-by: Ingo Molnar <mingo@elte.hu>
777e208d40d0953efc6fb4ab58590da3f7d8f02d	30-Sep-2008	Steven Rostedt <rostedt@goodmis.org>	ftrace: take advantage of variable length entries Now that the underlining ring buffer for ftrace now hold variable length entries, we can take advantage of this by only storing the size of the actual event into the buffer. This happens to increase the number of entries in the buffer dramatically. We can also get rid of the "trace_cont" operation, but I'm keeping that until we have no more users. Some of the ftrace tracers can now change their code to adapt to this new feature. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
3928a8a2d98081d1bc3c0a84a2d70e29b90ecf1c	30-Sep-2008	Steven Rostedt <rostedt@goodmis.org>	ftrace: make work with new ring buffer This patch ports ftrace over to the new ring buffer. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
b5ad384e79add1d87fff54070000dadcf218ffab	23-Sep-2008	Frédéric Weisbecker <fweisbec@gmail.com>	tracing/ftrace: make tracing suitable to run the boot tracer The tracing engine have now to be init in early_initcall to set the boot tracer. Only the debugfs settings will be initialized at fs_initcall time. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
43a15386c4faf913f7d70a47748c266d6210cd6e	21-Sep-2008	Frédéric Weisbecker <fweisbec@gmail.com>	tracing/ftrace: replace none tracer by nop tracer Replace "none" tracer by the recently created "nop" tracer. Both are pretty similar except that nop accepts TRACE_PRINT or TRACE_SPECIAL entries. And as a consequence, changing the size of the ring buffer now requires that tracing has already been disabled. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Steven Noonan <steven@uplinklabs.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
5bf9a1ee350a10feb94107de32a203d81fbbe706	16-Sep-2008	Pekka Paalanen <pq@iki.fi>	ftrace: inject markers via trace_marker file Allow a user to inject a marker (TRACE_PRINT entry) into the trace ring buffer. The related file operations are derived from code by Frédéric Weisbecker <fweisbec@gmail.com>. Signed-off-by: Pekka Paalanen <pq@iki.fi> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
fc5e27ae4b45a0619701a83f30d9b7fad7ed9400	16-Sep-2008	Pekka Paalanen <pq@iki.fi>	mmiotrace: handle TRACE_PRINT entries Also make trace_seq_print_cont() non-static, and add a newline if the seq buffer can't hold all data. Signed-off-by: Pekka Paalanen <pq@iki.fi> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
801fe40001dfc263848552fb28924b766ed44ea4	16-Sep-2008	Pekka Paalanen <pq@iki.fi>	ftrace: add trace_vprintk() trace_vprintk() for easier implementation of tracer specific *_printk functions. Add check check for no_tracer, and implement __ftrace_printk() as a wrapper. Signed-off-by: Pekka Paalanen <pq@iki.fi> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
45dcd8b8a8ca855591e3ac882d3a7fc255d09d43	16-Sep-2008	Pekka Paalanen <pq@iki.fi>	ftrace: move mmiotrace functions out of trace.c Moves the mmiotrace specific functions from trace.c to trace_mmiotrace.c. Functions trace_wake_up(), tracing_get_trace_entry(), and tracing_generic_entry_update() are therefore made available outside trace.c. Signed-off-by: Pekka Paalanen <pq@iki.fi> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
80b5e940050c273ba277acbf3a9fbc1d4441e681	04-Sep-2008	Peter Zijlstra <peterz@infradead.org>	ftrace: sched_switch: show the wakee's cpu While profiling the smp behaviour of the scheduler it was needed to know to which cpu a task got woken. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
f09ce573f57ddc35c67b39e51f34545877b30031	04-Sep-2008	Peter Zijlstra <peterz@infradead.org>	ftrace: make ftrace_printk usable with the other tracers Currently ftrace_printk only works with the ftrace tracer, switch it to an iter_ctrl setting so we can make us of them with other tracers too. [rostedt@redhat.com: tweak to the disable condition] Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
5a90f577e5369a84b720ead42e621fcb1b8a8b21	03-Sep-2008	Steven Rostedt <rostedt@goodmis.org>	ftrace: print continue index fix An item in the trace buffer that is bigger than one entry may be split up using the TRACE_CONT entry. This makes it a virtual single entry. The current code increments the iterator index even while traversing TRACE_CONT entries, making it look like the iterator is further than it actually is. This patch adds code to not increment the iterator index while skipping over TRACE_CONT entries. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
652567aa2000f1d4a1fd434382a30d8dd4a7c980	03-Sep-2008	Steven Rostedt <rostedt@goodmis.org>	ftrace: binary and not logical for continue test Peter Zijlstra provided me with a nice brown paper bag while letting me know that I was doing a logical AND and not a binary one, making a condition true more often than it should be. Luckily, a false true is handled by the calling function and no harm is done. But this needs to be fixed regardless. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
a6168353d1c0b24b7512e14d1c3e47ed69881a23	21-Aug-2008	Michael Ellerman <michael@ellerman.id.au>	ftrace: make output nicely spaced for up to 999 cpus Currently some of the ftrace output goes skewiff if you have more than 9 cpus, and some if you have more than 99. Twiddle with the headers and format strings to make up to 999 cpus display without causing spacing problems. Signed-off-by: Michael Ellerman <michael@ellerman.id.au> Acked-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
98a983aad2e5b3dc83a8a761675445cdd8f3e6bd	15-Aug-2008	Frédéric Weisbecker <fweisbec@gmail.com>	ftrace: fix some mistakes in error messages This patch fixes some mistakes on the tracer in warning messages when debugfs fails to create tracing files. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: srostedt@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
3f5a54e371ca20b119b73704f6c01b71295c1714	31-Jul-2008	Steven Rostedt <rostedt@goodmis.org>	ftrace: dump out ftrace buffers to console on panic At OLS I had a lot of interest to be able to have the ftrace buffers dumped on panic. Usually one would expect to uses kexec and examine the buffers after a new kernel is loaded. But sometimes the resources do not permit kdump and kexec, so having an option to still see the sequence of events up to the crash is very advantageous. This patch adds the option to have the ftrace buffers dumped to the console in the latency_trace format on a panic. When the option is set, the default entries per CPU buffer are lowered to 16384, since the writing to the serial (if that is the console) may take an awful long time otherwise. [ Changes since -v1: Got alpine to send correctly (as well as spell check working). Removed config option. Moved the static variables into ftrace_dump itself. Gave printk a log level. ] Signed-off-by: Steven Rostedt <srostedt@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2f2c99dba2398ef7d9c21f7c793180a50e68b1f0	01-Aug-2008	Steven Rostedt <rostedt@goodmis.org>	ftrace: ftrace_printk doc moved Based on Randy Dunlap's suggestion, the ftrace_printk kernel-doc belongs with the ftrace_printk macro that should be used. Not with the __ftrace_printk internal function. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
dd0e545f061f90099a3dcc13aa77e29c6295cf23	01-Aug-2008	Steven Rostedt <rostedt@goodmis.org>	ftrace: printk formatting infrastructure This patch adds a feature that can help kernel developers debug their code using ftrace. int ftrace_printk(const char *fmt, ...); This records into the ftrace buffer using printf formatting. The entry size in the buffers are still a fixed length. A new type has been added that allows for more entries to be used for a single recording. The start of the print is still the same as the other entries. It returns the number of characters written to the ftrace buffer. For example: Having a module with the following code: static int __init ftrace_print_test(void) { ftrace_printk("jiffies are %ld\n", jiffies); return 0; } Gives me: insmod-5441 3...1 7569us : ftrace_print_test: jiffies are 4296626666 for the latency_trace file and: insmod-5441 [03] 1959.370498: ftrace_print_test jiffies are 4296626666 for the trace file. Note: Only the infrastructure should go into the kernel. It is to help facilitate debugging for other kernel developers. Calls to ftrace_printk is not intended to be left in the kernel, and should be frowned upon just like scattering printks around in the code. But having this easily at your fingertips helps the debugging go faster and bugs be solved quicker. Maybe later on, we can hook this with markers and have their printf format be sucked into ftrace output. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2e2ca155cd2213b4f398031180fb3d399d5b7db9	01-Aug-2008	Steven Rostedt <rostedt@goodmis.org>	ftrace: new continue entry - separate out from trace_entry Some tracers will need to work with more than one entry. In order to do this the trace_entry structure was split into two fields. One for the start of all entries, and one to continue an existing entry. The trace_entry structure now has a "field" entry that consists of the previous content of the trace_entry, and a "cont" entry that is just a string buffer the size of the "field" entry. Thanks to Andrew Morton for suggesting this idea. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
605ccb73f6a1c891a16268b3a2923208fc637958	27-Jul-2008	Andrea Righi <righi.andrea@gmail.com>	tracing: remove unused variable Remove the following warning with CONFIG_TRACING=y: kernel/trace/trace.c: In function ‘s_next’: kernel/trace/trace.c:1186: warning: unused variable ‘last_ent’ Signed-off-by: Andrea Righi <righi.andrea@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
1986b0cb1671ea39178b4e2b00461109728fc935	24-Jul-2008	Ingo Molnar <mingo@elte.hu>	ftrace: remove latency-tracer leftover remove the :vim=ft=help tag from trace files. I used them years ago to syntax-highlight traces and forgot about this hack. Signed-off-by: Ingo Molnar <mingo@elte.hu>
60bc080090e3bf6afa29c62cb25f913706551010	11-Jul-2008	Steven Rostedt <rostedt@goodmis.org>	ftrace: separate out the function enabled variable Currently the function tracer uses the global tracer_enabled variable that is used to keep track if the tracer is enabled or not. The function tracing startup needs to be separated out, otherwise the internal happenings of the tracer startup is also recorded. This patch creates a ftrace_function_enabled variable to all the starting of the function traces to happen after everything has been started. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Cc: Steven Rostedt <srostedt@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
b5c21b4514b38f450848feb432f7120376d01ffe	11-Jul-2008	Steven Rostedt <rostedt@goodmis.org>	ftrace: check proper config for preempt type There is no CONFIG_PREEMPT_DESKTOP. Use the proper entry CONFIG_PREEMPT. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Cc: Steven Rostedt <srostedt@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
ecea656d1d5e912d2f3d332657ea4a6d8380f891	21-Jun-2008	Abhishek Sagar <sagar.abhishek@gmail.com>	ftrace: freeze kprobe'd records Let records identified as being kprobe'd be marked as "frozen". The trouble with records which have a kprobe installed on their mcount call-site is that they don't get updated. So if such a function which is currently being traced gets its tracing disabled due to a new filter rule (or because it was added to the notrace list) then it won't be updated and continue being traced. This patch allows scanning of all frozen records during tracing to check if they should be traced. Signed-off-by: Abhishek Sagar <sagar.abhishek@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
20764ff1efb440640353053ec83263e69e1259e0	12-Jun-2008	Jiri Slaby <jirislaby@gmail.com>	ftrace: fix printout Do not print loglevel before "entries of %ld bytes". Move it to the previous pr_info. Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Cc: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2b1bce1787700768cbc87c8509851c6f49d252dc	09-Jun-2008	Ankita Garg <ankita@in.ibm.com>	ftrace: disable tracing when current_tracer is set to "none" Found that inspite of setting the current_tracer to "none", trace from the previous trace type continued to be collected. The patch below fixes this and causes the trace to be disabled when the "none" type is selected. Compile and boot tested the patch for functionality. Signed-off-by: Ankita Garg <ankita@in.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
76094a2cf46e4ab776055d4086615b884408568c	27-May-2008	Abhishek Sagar <sagar.abhishek@gmail.com>	ftrace: distinguish kretprobe'd functions in trace logs Tracing functions via ftrace which have a kretprobe installed on them, can produce misleading output in their trace logs. E.g, consider the correct trace of the following sequence: do_IRQ() { ~ irq_enter(); ~ } Trace log (sample): <idle>-0 [00] 4154504455.781616: irq_enter <- do_IRQ But if irq_enter() has a kretprobe installed on it, the return value stored on the stack at each invocation is modified to divert the return to a kprobe trampoline function called kretprobe_trampoline(). So with this the trace would (currently) look like: <idle>-0 [00] 4154504455.781616: irq_enter <- kretprobe_trampoline Now this is quite misleading to the end user, as it suggests something that didn't actually happen. So just to avoid such misinterpretations, the inlined patch aims to output such a log as: <idle>-0 [00] 4154504455.781616: irq_enter <- [unknown/kretprobe'd] Signed-off-by: Abhishek Sagar <sagar.abhishek@gmail.com> Acked-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
41bc8144d02028133bcd1d545023c6f49e8b2411	22-May-2008	Steven Rostedt <rostedt@goodmis.org>	ftrace: fix up cmdline recording The new work with converting the trace hooks over to markers broke the command line recording of ftrace. This patch fixes it again. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
4902f8849da6d2805bd291551a6dfd48f1b4f604	22-May-2008	Steven Rostedt <rostedt@goodmis.org>	ftrace: move ftrace_special to trace.c Move the ftrace_special out of sched_switch to trace.c. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Cc: pq@iki.fi Cc: proski@gnu.org Cc: sandmann@redhat.com Cc: a.p.zijlstra@chello.nl Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
19384c0314342222b18d4c7f09cdce1ca74dfd2a	22-May-2008	Steven Rostedt <rostedt@goodmis.org>	ftrace: limit use of check pages The check_pages function is called often enough that it can cause problems with trace outputs or even bringing the system to a halt. This patch limits the check_pages to the places that are most likely to have problems. The check is made at the flip between the global array and the max save array, as well as when the size of the buffers changes and the self tests. This patch also removes the BUG_ON from check_pages and replaces it with a WARN_ON and disabling of the tracer. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Cc: pq@iki.fi Cc: proski@gnu.org Cc: sandmann@redhat.com Cc: a.p.zijlstra@chello.nl Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
801a175bf601f9a9d5e86e92dee9adeeb6625da8	12-May-2008	Ingo Molnar <mingo@elte.hu>	mmiotrace: ftrace fix Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
bd8ac686c73c7e925fcfe0b02dc4e7b947127864	12-May-2008	Pekka Paalanen <pq@iki.fi>	ftrace: mmiotrace, updates here is a patch that makes mmiotrace work almost well within the tracing framework. The patch applies on top of my previous patch. I have my own output formatting in place now. Summary of changes: - fix the NULL dereference that was due to not calling tracing_reset() - add print_line() callback into struct tracer - implement print_line() for mmiotrace, producing up-to-spec text - add my output header, but that is not really called in the right place - rewrote the main structs in mmiotrace - added two new trace entry types: TRACE_MMIO_RW and TRACE_MMIO_MAP - made some functions in trace.c non-static - check current==NULL in tracing_generic_entry_update() - fix(?) comparison in trace_seq_printf() Things seem to work fine except a few issues. Markers (text lines injected into mmiotrace log) are missing, I did not feel hacking them in before we have variable length entries. My output header is printed only for 'trace' file, but not 'trace_pipe'. For some reason, despite my quick fix, iter->trace is NULL in print_trace_line() when called from 'trace_pipe' file, which means I don't get proper output formatting. I only tried by loading nouveau.ko, which just detects the card, and that is traced fine. I didn't try further. Map, two reads and unmap. Works perfectly. I am missing the information about overflows, I'd prefer to have a counter for lost events. I didn't try, but I guess currently there is no way of knowning when it overflows? So, not too far from being fully operational, it seems :-) And looking at the diffstat, there also is some 700-900 lines of user space code that just became obsolete. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
d618b3e6e50970a6248ac857653fdd49bcd3c045	12-May-2008	Ingo Molnar <mingo@elte.hu>	ftrace: sysprof updates make the sample period configurable. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
74f4e369fc5b52433ad824cef32d3bf1304549be	12-May-2008	Ingo Molnar <mingo@elte.hu>	ftrace: stacktrace fix Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
3eefae994d9224fb7771a3ddb683868363c23510	12-May-2008	Steven Rostedt <rostedt@goodmis.org>	ftrace: limit trace entries Currently there is no protection from the root user to use up all of memory for trace buffers. If the root user allocates too many entries, the OOM killer might start kill off all tasks. This patch adds an algorith to check the following condition: pages_requested > (freeable_memory + current_trace_buffer_pages) / 4 If the above is met then the allocation fails. The above prevents more than 1/4th of freeable memory from being used by trace buffers. To determine the freeable_memory, I made determine_dirtyable_memory in mm/page-writeback.c global. Special thanks goes to Peter Zijlstra for suggesting the above calculation. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
6c6c27969a4c6024e6c8838829546c02aaddca18	12-May-2008	Pekka Paalanen <pq@iki.fi>	ftrace: add readpos to struct trace_seq; add trace_seq_to_user() Refactor code from tracing_read_pipe() and create trace_seq_to_user(). Moved trace_seq_reset() call before iter->trace->read() call so that when all leftover data is returned, trace_seq is reset automatically. Signed-off-by: Pekka Paalanen <pq@iki.fi> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
a4feb8348b62fe76a63cdb5569f5c920f5283c06	12-May-2008	Ingo Molnar <mingo@elte.hu>	ftrace: special stacktrace Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
9fe068e92f6290e89e19adc521441661a1229f00	12-May-2008	Ingo Molnar <mingo@elte.hu>	ftrace: trace faster Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
4823ed7eadf35e4b57ce581327e21d39585f1f32	12-May-2008	Steven Rostedt <rostedt@goodmis.org>	ftrace: fix setting of pos in read_pipe In resetting the iterator in read_pipe, the reset of pos was postitioned in the wrong location with respect to the memset operation. The current code sets pos, incorrectly, to zero. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
107bad8bef5ab2c3a3bff7648c18c9dc3abdc13b	12-May-2008	Steven Rostedt <rostedt@goodmis.org>	ftrace: add trace pipe header pluggin This patch adds a method for open_pipe and open_read to the pluggins so that they can add a header to the trace pipe call. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
53d0aa773053ab18287781e25d52c5faf9e0e09e	12-May-2008	Steven Rostedt <rostedt@goodmis.org>	ftrace: add logic to record overruns This patch sets up the infrastructure to record overruns of the tracing buffer. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
25b0b44a1c732ccfc58095cdd8438955a0a19fff	12-May-2008	Steven Rostedt <rostedt@goodmis.org>	ftrace: fix comm on function trace output In cleaning up of the sched_switch code, the function trace recording of task comms was removed. This patch adds back the recording of comms for function trace. The output of ftrace now has the task comm instead of <...>. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
4fcdae83cebda24b519a89d3dd976081fff1ca80	12-May-2008	Steven Rostedt <rostedt@goodmis.org>	ftrace: comment code This is first installment of adding documentation to the ftrace. Expect many more patches of this kind in the near future. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
ab46428c6969d50ecf6f6e97b7a84abba6274368	12-May-2008	Steven Rostedt <rostedt@goodmis.org>	ftrace: modulize the number of CPU buffers Currently ftrace allocates a trace buffer for every possible CPU. Work is being done to change it to only online CPUs and add hooks to hotplug CPUS. This patch lays out the infrastructure for such a change. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
c6caeeb142cd3a03c46107aac45c8effc02bbadb	12-May-2008	Steven Rostedt <rostedt@goodmis.org>	ftrace: replace simple_strtoul with strict_strtoul Andrew Morton suggested using strict_strtoul over simple_strtoul. This patch replaces them in ftrace. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
cffae437cdfbc8a5370d38cefbff1dfe34dad6ca	12-May-2008	Steven Rostedt <rostedt@goodmis.org>	ftrace: simple clean ups Andrew Morton mentioned some clean ups that should be done to ftrace. This patch does some of the simple clean ups. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
afc2abc0ae4265730a0fc48618012193a09a1d10	12-May-2008	Ingo Molnar <mingo@elte.hu>	ftrace: cleanups no code changed. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
93dcc6ea096c51cc30ad0f6647ed05fead2439bf	12-May-2008	Thomas Gleixner <tglx@linutronix.de>	ftrace: simplify hexprint simplify hex to ascii conversion. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
a98a3c3fde3ae7614f19758a043691b6f59dac53	12-May-2008	Steven Rostedt <rostedt@goodmis.org>	ftrace: trace_entries to dynamically change trace buffer size This patch adds /debug/tracing/trace_entries that allows users to see as well as modify the number of trace entries the buffers hold. The number of entries only increments in ENTRIES_PER_PAGE which is calculated by the size of an entry with the number of entries that can fit in a page. The user does not need to use an exact size, but the entries will be rounded to one of the increments. Trying to set the entries to 0 will return with -EINVAL. To avoid race conditions, the modification of the buffer size can only be done when tracing is completely disabled (current_tracer == none). A info message will be printed if a user tries to modify the buffer size when not set to none. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2dc8f09571a61d9cb3dc47bba6608689dfe15d16	12-May-2008	Steven Rostedt <rostedt@goodmis.org>	ftrace: trace_pipe implement NONBLOCK This patch implements "NONBLOCK" for trace_pipe. If the trace_pipe is opened with O_NONBLOCK, then the trace_pipe read will not block when buffer is empty. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
845279972f1736c3463c9cebd1bad92a0a347176	12-May-2008	Steven Rostedt <rostedt@goodmis.org>	ftrace: return EOF in trace_pipe on change of tracer Break out of while loop with EOF when the current_trace changes. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
b5685aede3b7b65e72ddc73b951aa1f70798a614	12-May-2008	Steven Rostedt <rostedt@goodmis.org>	ftrace: restore iterator trace in pipe read The trace iterator is reset in the read. We still need to restore the tracer that the trace_pipe was opened with. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
8487c23765b6e0444ec6b5f1530766d63fe68e35	12-May-2008	Steven Rostedt <rostedt@goodmis.org>	ftrace: allow trace_pipe to block on all reads We expect things like "cat" to block on reads to trace_pipe. That's what trace_pipe is for. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
d17d969160c18b631a19c2b34d260691402650f8	12-May-2008	Ankita Garg <ankita@in.ibm.com>	ftrace: fix conversion of task state to char in latency tracer The conversion of task states to a character in the sched_switch tracer (part of latency tracer infrastructure), seems to be incorrect. We currently do it by indexing into the state_to_char array using the state value. The state values do not map directly into the array index and are thus incorrect. The following patch addresses this issue. This is also what is being done even in the show_task() routine in kernel/sched.c The patch has been compile and run tested. Signed-off-by: Ankita Garg <ankita@in.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
72829bc3d63cdc592d8f7dd86ad3b3fe8900fb74	23-May-2008	Thomas Gleixner <tglx@linutronix.de>	ftrace: move enums to ftrace.h and make helper function global picked from the mmiotracer patches to distangle the patch queues. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
d15f57f23eaba975309a153b23699cd0c0236974	12-May-2008	Steven Rostedt <srostedt@redhat.com>	ftrace: fix mutex unlock in trace output If the trace output changes on reading the trace files, there is a chance that the start function will return NULL. If the start function of a sequence returns NULL the stop equivalent is not called. In this case, all locks that are taken must be released even if they are released in the stop function. This patch fixes a case that a mutex was not released on return of NULL in the start sequence function. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
92205c2343527a863d660360599a4bf8cede77b0	12-May-2008	Steven Rostedt <rostedt@goodmis.org>	ftrace: user raw_spin_lock in tracing Lock debugging enabled cause huge performance problems for tracing. Having the lock verification happening for every function that is called because mcount calls spin_lock can cripple the system. This patch converts the spin_locks used by ftrace into raw_spin_locks. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
8f96da02c14d722ad9a3713cd7273ce28c9036ad	12-May-2008	Steven Rostedt <rostedt@goodmis.org>	ftrace: remove wakeup from function trace trace_function is called by mcount and calling wake_up from that can have unpredictable results. This patch removes the wakeup from trace_function. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
bac524d3f3dfeffa3a9d44f2c64035b88bcaacb4	12-May-2008	Peter Zijlstra <a.p.zijlstra@chello.nl>	ftrace: trace next state Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
88a4216c3ec4281fc7e6725cc3a3ccd01fb1aa14	12-May-2008	Ingo Molnar <mingo@elte.hu>	ftrace: sched special Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
f29c73fe3404f8799ed57aaf48859e0b55fc071f	12-May-2008	Ingo Molnar <mingo@elte.hu>	ftrace: include cpu in stacktrace Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
442e544ce52d4415a024920b84fb95c5f9aa0855	12-May-2008	Ingo Molnar <mingo@elte.hu>	ftrace: iter ctrl fix Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
36dfe9252bd4c9b55e8454363fb7e444c92c5030	12-May-2008	Ingo Molnar <mingo@elte.hu>	ftrace: make use of tracing_cpumask Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
c7078de1aaf562a31b20984409c38cc1b40fa8a3	12-May-2008	Ingo Molnar <mingo@elte.hu>	ftrace: add tracing_cpumask Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
4ac3ba41d372e3a9e420b36bc43589662b188a14	12-May-2008	Ingo Molnar <mingo@elte.hu>	ftrace: trace scheduler rbtree Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
1a3c3034336320554a3342572dae98d69e054fc7	12-May-2008	Ingo Molnar <mingo@elte.hu>	ftrace: fix __trace_special() Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
017730c11241e26577673eb9d957cfc66172ea91	12-May-2008	Ingo Molnar <mingo@elte.hu>	ftrace: fix wakeups Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
4e65551905fb0300ae7e667cbaa41ee2e3f29a13	12-May-2008	Ingo Molnar <mingo@elte.hu>	ftrace: sched tracer, trace full rbtree Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
4c1f4d4f0175129934d5dbc19a39296430937a05	12-May-2008	Ingo Molnar <mingo@elte.hu>	ftrace: make nostacktrace the default Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
86387f7ee5d3273ff4859e2c64ce656639b6ca65	12-May-2008	Ingo Molnar <mingo@elte.hu>	ftrace: add stack tracing Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
57422797dc009fc83766bcf230d29dbe6e08e21e	12-May-2008	Ingo Molnar <mingo@elte.hu>	ftrace: add wakeup events to sched tracer Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
e309b41dd65aa953f86765eeeecc941d8e1e8b8f	12-May-2008	Ingo Molnar <mingo@elte.hu>	ftrace: remove notrace now that we have a kbuild method for notrace, no need to pollute the C code with the annotations. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
6fb44b717c10ecf37beaaebd312f3afa93fed714	12-May-2008	Steven Rostedt <srostedt@srostedt@redhat.com>	ftrace: add trace_function api for other tracers to use A new check was added in the ftrace function that wont trace if the CPU trace buffer is disabled. Unfortunately, other tracers used ftrace() to write to the buffer after they disabled it. The new disable check makes these calls into a nop. This patch changes the __ftrace that is called without the check into a new api for the other tracers to use, called "trace_function". The other tracers use this interface instead when the trace CPU buffer is already disabled. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2a2cc8f7c4d0dfd75720867f7dc58d24f075edfc	12-May-2008	Soeren Sandmann Pedersen <sandmann@redhat.com>	ftrace: allow the event pipe to be polled Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2577046740fe6d77864128c6187c11125c2449ea	12-May-2008	Ingo Molnar <mingo@elte.hu>	ftrace: build fix no need to backmerge, only affects ftrace-enabled kernels. (which is not the default) Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
5e3ca0ec76fce92daa4eed0d02de9c79b1fe3920	12-May-2008	Ingo Molnar <mingo@elte.hu>	ftrace: introduce the "hex" output method Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2e0f57618529a2739a5e1570e6c445c9c966b595	12-May-2008	Ingo Molnar <mingo@elte.hu>	ftrace: build fix Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
0fd9e0dac9026df09986a4b201518ae015814aef	12-May-2008	Ingo Molnar <mingo@elte.hu>	ftrace: use cpu clock again Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
26994ead1fc8cced63f17e9848edc1771036664e	12-May-2008	Steven Rostedt <srostedt@redhat.com>	ftrace: enabled tracing by default This patch is the correct way to have tracing enabled by default. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
088b1e427dbba2af93cb6a7d39258c10ff58dd27	12-May-2008	Steven Rostedt <srostedt@redhat.com>	ftrace: pipe fixes Some fixes for better output with the trace pipe. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
dcb6308f2b56720599f6b9d5a01c33e67e69bde4	12-May-2008	Ingo Molnar <mingo@elte.hu>	ftrace, locking fix should be an irq-safe lock ... Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
f0a920d5752e1788c0cba2add103076bcc0f7a49	12-May-2008	Ingo Molnar <mingo@elte.hu>	ftrace: add trace_special() for ad-hoc tracing. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
cb0f12aae8d085140d37ada351aa5a8e76c3f9b0	12-May-2008	Ingo Molnar <mingo@elte.hu>	ftrace: bin-output Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
f9896bf30928922a3913a3810a4ab7908da6cfe7	12-May-2008	Ingo Molnar <mingo@elte.hu>	ftrace: add raw output Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
8c523a9d82dbc4f3f7d972df8c0f1eacd83d0d55	12-May-2008	Ingo Molnar <mingo@elte.hu>	ftrace: clean-up-pipe-iteration Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
cdd31cd2d7a0dcbec2cce3974f7129dd4fc8c879	12-May-2008	Ingo Molnar <mingo@elte.hu>	ftrace: remove-idx-sync remove idx syncing - it's expensive on SMP. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
53c37c17aafcf50f7c6fddaf01dda8f9d7e31ddf	12-May-2008	Ingo Molnar <mingo@elte.hu>	ftrace: fast, scalable, synchronized timestamps implement globally synchronized, fast and scalable time source for tracing. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
750ed1a40783432d0dcb0e6c2e813a12615d7664	12-May-2008	Ingo Molnar <mingo@elte.hu>	ftrace: timestamp syncing, prepare rename and uninline now() to ftrace_now(). Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
4bf39a9411a4ce8712954e03a9bd1592ee345919	12-May-2008	Ingo Molnar <mingo@elte.hu>	ftrace: cleanups no code changed. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
d4c5a2f5870939d837293de87b41dda0012a4572	12-May-2008	Ingo Molnar <mingo@elte.hu>	ftrace: fix locking we can hold all cpu trace buffer locks at once - put each into a separate lock class. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
b3806b4316306dc9c542eff6c23d7d42918f3504	12-May-2008	Steven Rostedt <srostedt@redhat.com>	ftrace: user run time file reading This patch creates a file called trace_pipe in the tracing debug directory. This file is a consumer of the trace buffers. This means that reads of this file consumes the entries from the trace buffers so that they will not be read a second time, as contrast to the static buffers latency_trace and trace. Reading from the trace_pipe will remove the entries from trace and latency_trace too. The advantage that trace_pipe has is that it can record live traces. It will block when there is nothing in the buffer, and read the entries as they are entered. An EOF happens when tracing is disabled (tracing_enabled = 0). Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
214023c3d13a71525e463b5e54e360b926b4dc90	12-May-2008	Steven Rostedt <srostedt@redhat.com>	ftrace: add a buffer for output Later patches will need to print the same things as the seq output does. But those outputs will not use the seq utility. This patch adds a buffer to the iterator, that can be used by either the seq utility or other output. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
93a588f459da134be6ab17c4104e28441beb0d22	12-May-2008	Steven Rostedt <srostedt@redhat.com>	ftrace: change buffers to producer consumer This patch changes the way the CPU trace buffers are handled. Instead of always starting from the trace page head, the logic is changed to a producer consumer logic. This allows for the buffers to be drained while they are alive. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
1d4db00a5e30c7b8f8dc2a1b19e886fd942be143	12-May-2008	Steven Rostedt <srostedt@redhat.com>	ftrace: reset selftests The tests may leave stuff in the buffers. This resets the buffers after each test is run. If a test fails, it does not reset the buffer to avoid touching a buffer that is corrupted. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
4e3c3333f3bd7eedfd21b1155b3c7cd24fc7f754	12-May-2008	Ingo Molnar <mingo@elte.hu>	ftrace: fix time offset fix time offset calculations and ordering, plus make code more consistent. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
7bd2f24c2f769e3f8f1d4fc8b9fddf689825f6a7	12-May-2008	Ingo Molnar <mingo@elte.hu>	ftrace: add README make it easier for newbies to find their way around. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
c7aafc549766b87819285d3480648fc652a47bc4	12-May-2008	Ingo Molnar <mingo@elte.hu>	ftrace: cleanups factor out code and clean it up. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
60a11774b38fef1ab90b18c5353bd1c7c4d311c8	12-May-2008	Steven Rostedt <srostedt@redhat.com>	ftrace: add self-tests Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
57f50be14d57b0dbf88dd019e7bb0ff3a3dc7b81	12-May-2008	Steven Rostedt <srostedt@redhat.com>	ftrace: fix max latency Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
89b2f97819dd074297bbe3e19eaa4afcc98845ad	12-May-2008	Steven Rostedt <srostedt@redhat.com>	ftrace: fix updates to max trace This patch fixes some bugs to the updating of the max trace that was caused by implementing the new buffering. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
18cef379d30f5ded20cc31d7f2d342639d39919d	12-May-2008	Steven Rostedt <rostedt@goodmis.org>	ftrace: don't use raw_local_irq_save/restore Using raw_local_irq_save/restore confuses lockdep. It's fine to use the normal ones. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
4c11d7aed389375253b59e2b1865eec96663c65d	12-May-2008	Steven Rostedt <srostedt@redhat.com>	ftrace: convert single large buffer into single pages. Allocating large buffers for the tracer may fail easily. This patch converts the buffer from a large ordered allocation to single pages. It uses the struct page LRU field to link the pages together. Later patches may also implement dynamic increasing and decreasing of the trace buffers. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
bc0c38d139ec7fcd5c030aea16b008f3732e42ac	12-May-2008	Steven Rostedt <srostedt@redhat.com>	ftrace: latency tracer infrastructure This patch adds the latency tracer infrastructure. This patch does not add anything that will select and turn it on, but will be used by later patches. If it were to be compiled, it would add the following files to the debugfs: The root tracing directory: /debugfs/tracing/ This patch also adds the following files: available_tracers list of available tracers. Currently no tracers are available. Looking into this file only shows "none" which is used to unregister all tracers. current_tracer The trace that is currently active. Empty on start up. To switch to a tracer simply echo one of the tracers that are listed in available_tracers: example: (used with later patches) echo function > /debugfs/tracing/current_tracer To disable the tracer: echo disable > /debugfs/tracing/current_tracer tracing_enabled echoing "1" into this file starts the ftrace function tracing (if sysctl kernel.ftrace_enabled=1) echoing "0" turns it off. latency_trace This file is readonly and holds the result of the trace. trace This file outputs a easier to read version of the trace. iter_ctrl Controls the way the output of traces look. So far there's two controls: echoing in "symonly" will only show the kallsyms variables without the addresses (if kallsyms was configured) echoing in "verbose" will change the output to show a lot more data, but not very easy to understand by humans. echoing in "nosymonly" turns off symonly. echoing in "noverbose" turns off verbose. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>