IMPORTANT NOTE FOR 64-BIT USERS
-------------------------------
There are known issues with some perftools functionality on x86_64
systems.  See 64-BIT ISSUES, below.


TCMALLOC
--------
Just link in -ltcmalloc or -ltcmalloc_minimal to get the advantages of
tcmalloc -- a replacement for malloc and new.  See below for some
environment variables you can use with tcmalloc, as well.

tcmalloc functionality is available on all systems we've tested; see
INSTALL for more details.  See README_windows.txt for instructions on
using tcmalloc on Windows.

NOTE: When compiling programs with gcc that you plan to link with
libtcmalloc, it's safest to pass in the flags

 -fno-builtin-malloc -fno-builtin-calloc -fno-builtin-realloc -fno-builtin-free

when compiling.  gcc makes some optimizations assuming it is using its
own, built-in malloc; that assumption obviously isn't true with
tcmalloc.  In practice, we haven't seen any problems with this, but
the expected risk is highest for users who register their own malloc
hooks with tcmalloc (using gperftools/malloc_hook.h).  The risk is
lowest for folks who use tcmalloc_minimal (or, of course, who pass in
the above flags :-) ).
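
As a rough sketch of where those flags go, the snippet below prints the
compile and link commands for a single source file.  It is a dry run:
nothing is compiled.  The helper name tcmalloc_build_commands and the
file name myapp.c are made up for illustration, and the link line
assumes tcmalloc is installed where gcc can find it.

```shell
# The four flags from the note above: they stop gcc from assuming its
# built-in malloc, calloc, realloc, and free.
TCMALLOC_CFLAGS="-fno-builtin-malloc -fno-builtin-calloc -fno-builtin-realloc -fno-builtin-free"

# Print (rather than run) the compile and link commands for one C file.
# This helper is hypothetical; it is not part of perftools.
tcmalloc_build_commands() {
  src="$1"
  base="${src%.c}"
  # The flags are needed at compile time, not at link time.
  echo "gcc $TCMALLOC_CFLAGS -c $src -o $base.o"
  echo "gcc -o $base $base.o -ltcmalloc"
}

# "myapp.c" is a placeholder source file name.
tcmalloc_build_commands myapp.c
```

If the printed commands look right for your project, the same flag list
can be appended to whatever CFLAGS or CXXFLAGS variable your build uses.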


HEAP PROFILER
-------------
See doc/heap-profiler.html for information about how to use tcmalloc's
heap profiler and analyze its output.

As a quick-start, do the following after installing this package:

1) Link your executable with -ltcmalloc
2) Run your executable with the HEAPPROFILE environment var set:
     $ HEAPPROFILE=/tmp/heapprof <path/to/binary> [binary args]
3) Run pprof to analyze the heap usage
     $ pprof <path/to/binary> /tmp/heapprof.0045.heap  # run 'ls' to see options
     $ pprof --gv <path/to/binary> /tmp/heapprof.0045.heap
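
The profile named in step 3 is just one of possibly many dumps.  Rather
than running 'ls' by hand, a small helper can pick the most recent one.
This is a sketch only: latest_heap_profile is a made-up name, and it
assumes the dumps are named <prefix>.NNNN.heap with zero-padded
sequence numbers, as in the example above.

```shell
# Print the highest-numbered heap profile written for a given prefix.
# Assumes names of the form <prefix>.NNNN.heap; zero-padded numbers
# sort correctly as plain strings.  Prints nothing if no dump exists.
latest_heap_profile() {
  ls "$1".*.heap 2>/dev/null | sort | tail -n 1
}

# Example (not executed here):
#   pprof <path/to/binary> "$(latest_heap_profile /tmp/heapprof)"
```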

You can also use LD_PRELOAD to heap-profile an executable that you
didn't compile.
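
A minimal sketch of the LD_PRELOAD approach follows.  The library path
/usr/lib/libtcmalloc.so is an assumption (it depends on where the
package was installed), and heap_profile_preload is a made-up wrapper
name, not something perftools provides.

```shell
# Assumed location of the shared library; set TCMALLOC_SO to override.
TCMALLOC_SO="${TCMALLOC_SO:-/usr/lib/libtcmalloc.so}"

# Usage: heap_profile_preload <profile-prefix> <command> [args...]
# Runs the command with tcmalloc preloaded and heap profiling enabled.
heap_profile_preload() {
  prefix="$1"
  shift
  if [ ! -e "$TCMALLOC_SO" ]; then
    echo "libtcmalloc not found at $TCMALLOC_SO" >&2
    return 1
  fi
  # env sets both variables for the child process only.
  env LD_PRELOAD="$TCMALLOC_SO" HEAPPROFILE="$prefix" "$@"
}

# Example (not executed here):
#   heap_profile_preload /tmp/heapprof <path/to/binary> [binary args]
```

Using env keeps LD_PRELOAD out of the calling shell, so only the
profiled command picks up tcmalloc.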

There are other environment variables, besides HEAPPROFILE, you can
set to adjust the heap-profiler behavior; see "ENVIRONMENT VARIABLES"
below.

The heap profiler is available on all unix-based systems we've tested;
see INSTALL for more details.  It is not currently available on Windows.


HEAP CHECKER
------------
See doc/heap-checker.html for information about how to use tcmalloc's
heap checker.

In order to catch all heap leaks, tcmalloc must be linked *last* into
your executable.  The heap checker may mischaracterize some memory
accesses in libraries listed after it on the link line.  For instance,
it may report these libraries as leaking memory when they're not.
(See the source code for more details.)

As a quick-start, do the following after installing this package:

1) Link your executable with -ltcmalloc
2) Run your executable with the HEAPCHECK environment var set:
     $ HEAPCHECK=1 <path/to/binary> [binary args]

Other values for HEAPCHECK: normal (equivalent to "1"), strict, draconian.

You can also use LD_PRELOAD to heap-check an executable that you
didn't compile.

The heap checker is only available on Linux at this time; see INSTALL
for more details.


CPU PROFILER
------------
See doc/cpu-profiler.html for information about how to use the CPU
profiler and analyze its output.

As a quick-start, do the following after installing this package:

1) Link your executable with -lprofiler
2) Run your executable with the CPUPROFILE environment var set:
     $ CPUPROFILE=/tmp/prof.out <path/to/binary> [binary args]
3) Run pprof to analyze the CPU usage
     $ pprof <path/to/binary> /tmp/prof.out      # -pg-like text output
     $ pprof --gv <path/to/binary> /tmp/prof.out # really cool graphical output

There are other environment variables, besides CPUPROFILE, you can set
to adjust the cpu-profiler behavior; see "ENVIRONMENT VARIABLES" below.

The CPU profiler is available on all unix-based systems we've tested;
see INSTALL for more details.  It is not currently available on Windows.

NOTE: CPU profiling doesn't work after fork (unless you immediately
      do an exec()-like call afterwards).  Furthermore, if you do
      fork, and the child calls exit(), it may corrupt the profile
      data.  You can use _exit() to work around this.  We hope to have
      a fix for both problems in the next release of perftools
      (hopefully perftools 1.2).


EVERYTHING IN ONE
-----------------
If you want the CPU profiler, heap profiler, and heap leak-checker to
all be available for your application, you can do:
   gcc -o myapp ... -lprofiler -ltcmalloc

However, if you have a reason to use the static versions of the
library, this two-library linking won't work:
   gcc -o myapp ... /usr/lib/libprofiler.a /usr/lib/libtcmalloc.a  # errors!

Instead, use the special libtcmalloc_and_profiler library, which we
make for just this purpose:
   gcc -o myapp ... /usr/lib/libtcmalloc_and_profiler.a


CONFIGURATION OPTIONS
---------------------
For advanced users, there are several flags you can pass to
'./configure' that tweak tcmalloc performance.  (These are in addition
to the environment variables you can set at runtime to affect
tcmalloc, described below.)  See the INSTALL file for details.


ENVIRONMENT VARIABLES
---------------------
The cpu profiler, heap checker, and heap profiler will lie dormant,
using no memory or CPU, until you turn them on.  (Thus, there's no
harm in linking -lprofiler into every application, and also -ltcmalloc
assuming you're ok using the non-libc malloc library.)

The easiest way to turn them on is by setting the appropriate
environment variables.  We have several variables that let you
enable/disable features as well as tweak parameters.

Here are some of the most important variables:

HEAPPROFILE=<pre> -- turns on heap profiling and dumps data using this prefix
HEAPCHECK=<type>  -- turns on heap checking with strictness 'type'
CPUPROFILE=<file> -- turns on cpu profiling and dumps data to this file.
PROFILESELECTED=1 -- if set, cpu-profiler will only profile regions of code
                     surrounded with ProfilerEnable()/ProfilerDisable().
PROFILEFREQUENCY=x -- how many interrupts/second the cpu-profiler samples.

TCMALLOC_DEBUG=<level> -- the higher level, the more messages malloc emits
MALLOCSTATS=<level>    -- prints memory-use stats at program-exit

For a full list of variables, see the documentation pages:
   doc/cpuprofile.html
   doc/heapprofile.html
   doc/heap_checker.html


COMPILING ON NON-LINUX SYSTEMS
------------------------------

Perftools was developed and tested on x86 Linux systems, and it works
in its full generality only on those systems.  However, we've
successfully ported much of the tcmalloc library to FreeBSD, Solaris
x86, and Darwin (Mac OS X) x86 and ppc; and we've ported the basic
functionality in tcmalloc_minimal to Windows.  See INSTALL for details.
See README_windows.txt for details on the Windows port.


PERFORMANCE
-----------

If you're interested in some third-party comparisons of tcmalloc to
other malloc libraries, here are a few web pages that have been
brought to our attention.  The first discusses the effect of using
various malloc libraries on OpenLDAP.  The second compares tcmalloc to
win32's malloc.
  http://www.highlandsun.com/hyc/malloc/
  http://gaiacrtn.free.fr/articles/win32perftools.html

It's possible to build tcmalloc in a way that gains faster
performance (particularly for deletes) at the cost of more memory
fragmentation (that is, more unusable memory on your system).  See the
INSTALL file for details.


OLD SYSTEM ISSUES
-----------------

When compiling perftools on some old systems, like RedHat 8, you may
get an error like this:
    ___tls_get_addr: symbol not found

This means that you have a system where some parts are updated enough
to support Thread Local Storage, but others are not.  The perftools
configure script can't always detect this kind of case, leading to
that error.  To fix it, just comment out (or delete) the line
   #define HAVE_TLS 1
in your config.h file before building.
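
If you'd rather not edit config.h by hand, the change can be scripted.
The helper below is only a sketch: disable_have_tls is a made-up name,
and it assumes the line appears in config.h exactly as shown above,
with nothing else on it.  It keeps the original file as config.h.bak.

```shell
# Comment out "#define HAVE_TLS 1" in the given config.h.
# A backup of the unmodified file is left next to it with a .bak suffix.
disable_have_tls() {
  sed -i.bak 's|^#define HAVE_TLS 1$|/* #define HAVE_TLS 1 */|' "$1"
}

# Example (not executed here); run after ./configure and before make:
#   disable_have_tls config.h
```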


64-BIT ISSUES
-------------

There are two issues that can cause program hangs or crashes on x86_64
64-bit systems, which use the libunwind library to get stack-traces.
Neither issue should affect the core tcmalloc library; they both
affect the perftools tools such as cpu-profiler, heap-checker, and
heap-profiler.

1) Some libc's -- at least glibc 2.4 on x86_64 -- have a bug where the
libc function dl_iterate_phdr() acquires its locks in the wrong
order.  This bug should not affect tcmalloc, but may cause occasional
deadlock with the cpu-profiler, heap-profiler, and heap-checker.
The deadlock becomes more likely the more dlopen() calls an executable
makes.  Most executables don't make any, though several library
routines like getgrgid() call dlopen() behind the scenes.

2) On x86-64 64-bit systems, while tcmalloc itself works fine, the
cpu-profiler tool is unreliable: it will sometimes work, but sometimes
cause a segfault.  I'll explain the problem first, and then some
workarounds.

Note that this only affects the cpu-profiler, which is a
gperftools feature you must turn on manually by setting the
CPUPROFILE environment variable.  If you do not turn on cpu-profiling,
you shouldn't see any crashes due to perftools.

The gory details: The underlying problem is in the backtrace()
function, which is a built-in function in libc.
Backtracing is fairly straightforward in the normal case, but can run
into problems when having to backtrace across a signal frame.
Unfortunately, the cpu-profiler uses signals in order to register a
profiling event, so every backtrace that the profiler does crosses a
signal frame.

In our experience, the only time there is trouble is when the signal
fires in the middle of pthread_mutex_lock.  pthread_mutex_lock is
called quite a bit from system libraries, particularly at program
startup and when creating a new thread.

The solution: The dwarf debugging format has support for 'cfi
annotations', which make it easy to recognize a signal frame.  Some OS
distributions, such as Fedora and gentoo 2007.0, already have added
cfi annotations to their libc.  A future version of libunwind should
recognize these annotations; these systems should not see any
crashes.

Workarounds: If you see problems with crashes when running the
cpu-profiler, consider inserting ProfilerStart()/ProfilerStop() into
your code, rather than setting CPUPROFILE.  This will profile only
those sections of the codebase.  Though we haven't done much testing,
in theory this should reduce the chance of crashes by limiting the
signal generation to only a small part of the codebase.  Ideally, you
would not use ProfilerStart()/ProfilerStop() around code that spawns
new threads, or is otherwise likely to cause a call to
pthread_mutex_lock!

---
17 May 2011