IMPORTANT NOTE FOR 64-BIT USERS
-------------------------------
There are known issues with some perftools functionality on x86_64
systems. See 64-BIT ISSUES, below.


TCMALLOC
--------
Just link in -ltcmalloc or -ltcmalloc_minimal to get the advantages of
tcmalloc -- a replacement for malloc and new. See below for some
environment variables you can use with tcmalloc, as well.

tcmalloc functionality is available on all systems we've tested; see
INSTALL for more details. See README_windows.txt for instructions on
using tcmalloc on Windows.

NOTE: When compiling programs with gcc that you plan to link with
libtcmalloc, it's safest to pass in the flags

   -fno-builtin-malloc -fno-builtin-calloc -fno-builtin-realloc -fno-builtin-free

when compiling. gcc makes some optimizations assuming it is using its
own, built-in malloc; that assumption obviously isn't true with
tcmalloc. In practice, we haven't seen any problems with this, but
the expected risk is highest for users who register their own malloc
hooks with tcmalloc (using gperftools/malloc_hook.h). The risk is
lowest for folks who use tcmalloc_minimal (or, of course, who pass in
the above flags :-) ).


HEAP PROFILER
-------------
See doc/heap-profiler.html for information about how to use tcmalloc's
heap profiler and analyze its output.

As a quick-start, do the following after installing this package:

1) Link your executable with -ltcmalloc
2) Run your executable with the HEAPPROFILE environment var set:
   $ HEAPPROFILE=/tmp/heapprof <path/to/binary> [binary args]
3) Run pprof to analyze the heap usage
   $ pprof <path/to/binary> /tmp/heapprof.0045.heap  # run 'ls' to see options
   $ pprof --gv <path/to/binary> /tmp/heapprof.0045.heap

You can also use LD_PRELOAD to heap-profile an executable that you
didn't compile.
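
As a rough sketch of the LD_PRELOAD route, the commands below preload
the shared library for a single run. The library path
/usr/lib/libtcmalloc.so and the choice of 'ls' as the target are
assumptions made for illustration; substitute the path where tcmalloc
is installed on your system and the binary you want to profile.

```shell
# Hypothetical library path; adjust for where tcmalloc is installed.
TCMALLOC_LIB="${TCMALLOC_LIB:-/usr/lib/libtcmalloc.so}"
# Hypothetical target: any dynamically linked program you did not build.
TARGET="${TARGET:-ls}"

if [ -e "$TCMALLOC_LIB" ]; then
  # Preload tcmalloc and turn on the heap profiler for this one run.
  LD_PRELOAD="$TCMALLOC_LIB" HEAPPROFILE=/tmp/heapprof "$TARGET"
  status="ran $TARGET with $TCMALLOC_LIB preloaded"
else
  # Nothing is preloaded if the library is not at the assumed path.
  status="skipped: $TCMALLOC_LIB not found"
fi
echo "$status"
```

Any profile files written by such a run use the /tmp/heapprof prefix
and can be passed to pprof as in step 3 above.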

There are other environment variables, besides HEAPPROFILE, you can
set to adjust the heap-profiler behavior; cf. "ENVIRONMENT VARIABLES"
below.

The heap profiler is available on all unix-based systems we've tested;
see INSTALL for more details. It is not currently available on Windows.


HEAP CHECKER
------------
See doc/heap-checker.html for information about how to use tcmalloc's
heap checker.

In order to catch all heap leaks, tcmalloc must be linked *last* into
your executable. The heap checker may mischaracterize some memory
accesses in libraries listed after it on the link line. For instance,
it may report these libraries as leaking memory when they're not.
(See the source code for more details.)

As a quick-start, do the following after installing this package:

1) Link your executable with -ltcmalloc
2) Run your executable with the HEAPCHECK environment var set:
   $ HEAPCHECK=1 <path/to/binary> [binary args]

Other values for HEAPCHECK: normal (equivalent to "1"), strict, draconian

You can also use LD_PRELOAD to heap-check an executable that you
didn't compile.

The heap checker is only available on Linux at this time; see INSTALL
for more details.


CPU PROFILER
------------
See doc/cpu-profiler.html for information about how to use the CPU
profiler and analyze its output.

As a quick-start, do the following after installing this package:

1) Link your executable with -lprofiler
2) Run your executable with the CPUPROFILE environment var set:
   $ CPUPROFILE=/tmp/prof.out <path/to/binary> [binary args]
3) Run pprof to analyze the CPU usage
   $ pprof <path/to/binary> /tmp/prof.out       # -pg-like text output
   $ pprof --gv <path/to/binary> /tmp/prof.out  # really cool graphical output

There are other environment variables, besides CPUPROFILE, you can set
to adjust the cpu-profiler behavior; cf. "ENVIRONMENT VARIABLES" below.

The CPU profiler is available on all unix-based systems we've tested;
see INSTALL for more details. It is not currently available on Windows.

NOTE: CPU profiling doesn't work after fork (unless you immediately
      do an exec()-like call afterwards). Furthermore, if you do
      fork, and the child calls exit(), it may corrupt the profile
      data. You can use _exit() to work around this. We hope to have
      a fix for both problems in the next release of perftools
      (hopefully perftools 1.2).


EVERYTHING IN ONE
-----------------
If you want the CPU profiler, heap profiler, and heap leak-checker to
all be available for your application, you can do:
   gcc -o myapp ... -lprofiler -ltcmalloc

However, if you have a reason to use the static versions of the
library, this two-library linking won't work:
   gcc -o myapp ... /usr/lib/libprofiler.a /usr/lib/libtcmalloc.a  # errors!

Instead, use the special libtcmalloc_and_profiler library, which we
make for just this purpose:
   gcc -o myapp ... /usr/lib/libtcmalloc_and_profiler.a


CONFIGURATION OPTIONS
---------------------
For advanced users, there are several flags you can pass to
'./configure' that tweak tcmalloc performance. (These are in addition
to the environment variables you can set at runtime to affect
tcmalloc, described below.) See the INSTALL file for details.


ENVIRONMENT VARIABLES
---------------------
The cpu profiler, heap checker, and heap profiler will lie dormant,
using no memory or CPU, until you turn them on. (Thus, there's no
harm in linking -lprofiler into every application, and also -ltcmalloc
assuming you're ok using the non-libc malloc library.)

The easiest way to turn them on is by setting the appropriate
environment variables. We have several variables that let you
enable/disable features as well as tweak parameters.

Here are some of the most important variables:

HEAPPROFILE=<pre>  -- turns on heap profiling and dumps data using this prefix
HEAPCHECK=<type>   -- turns on heap checking with strictness 'type'
CPUPROFILE=<file>  -- turns on cpu profiling and dumps data to this file.
PROFILESELECTED=1  -- if set, cpu-profiler will only profile regions of code
                      surrounded with ProfilerEnable()/ProfilerDisable().
PROFILEFREQUENCY=x -- how many interrupts/second the cpu-profiler samples.

TCMALLOC_DEBUG=<level> -- the higher the level, the more messages malloc emits
MALLOCSTATS=<level>    -- prints memory-use stats at program-exit

For a full list of variables, see the documentation pages:
   doc/cpuprofile.html
   doc/heapprofile.html
   doc/heap_checker.html


COMPILING ON NON-LINUX SYSTEMS
------------------------------

Perftools was developed and tested on x86 Linux systems, and it works
in its full generality only on those systems. However, we've
successfully ported much of the tcmalloc library to FreeBSD, Solaris
x86, and Darwin (Mac OS X) x86 and ppc; and we've ported the basic
functionality in tcmalloc_minimal to Windows. See INSTALL for details.
See README_windows.txt for details on the Windows port.


PERFORMANCE
-----------

If you're interested in some third-party comparisons of tcmalloc to
other malloc libraries, here are a few web pages that have been
brought to our attention. The first discusses the effect of using
various malloc libraries on OpenLDAP. The second compares tcmalloc to
win32's malloc.
   http://www.highlandsun.com/hyc/malloc/
   http://gaiacrtn.free.fr/articles/win32perftools.html

It's possible to build tcmalloc in a way that gains faster
performance (particularly for deletes) at the cost of more memory
fragmentation (that is, more unusable memory on your system). See the
INSTALL file for details.


OLD SYSTEM ISSUES
-----------------

When compiling perftools on some old systems, like RedHat 8, you may
get an error like this:
   ___tls_get_addr: symbol not found

This means that you have a system where some parts are updated enough
to support Thread Local Storage, but others are not. The perftools
configure script can't always detect this kind of case, leading to
that error. To fix it, just comment out (or delete) the line
   #define HAVE_TLS 1
in your config.h file before building.


64-BIT ISSUES
-------------

There are two issues that can cause program hangs or crashes on x86_64
64-bit systems, which use the libunwind library to get stack-traces.
Neither issue should affect the core tcmalloc library; they both
affect the perftools tools such as cpu-profiler, heap-checker, and
heap-profiler.

1) Some libc's -- at least glibc 2.4 on x86_64 -- have a bug where the
libc function dl_iterate_phdr() acquires its locks in the wrong
order. This bug should not affect tcmalloc, but may cause occasional
deadlock with the cpu-profiler, heap-profiler, and heap-checker.
The likelihood of deadlock increases with the number of dlopen() calls
an executable makes.
Most executables don't have any, though several library routines like
getgrgid() call dlopen() behind the scenes.

2) On x86-64 64-bit systems, while tcmalloc itself works fine, the
cpu-profiler tool is unreliable: it will sometimes work, but sometimes
cause a segfault. I'll explain the problem first, and then some
workarounds.

Note that this only affects the cpu-profiler, which is a
gperftools feature you must turn on manually by setting the
CPUPROFILE environment variable. If you do not turn on cpu-profiling,
you shouldn't see any crashes due to perftools.

The gory details: The underlying problem is in the backtrace()
function, which is a built-in function in libc.
Backtracing is fairly straightforward in the normal case, but can run
into problems when having to backtrace across a signal frame.
Unfortunately, the cpu-profiler uses signals in order to register a
profiling event, so every backtrace that the profiler does crosses a
signal frame.

In our experience, the only time there is trouble is when the signal
fires in the middle of pthread_mutex_lock. pthread_mutex_lock is
called quite a bit from system libraries, particularly at program
startup and when creating a new thread.

The solution: The DWARF debugging format has support for 'cfi
annotations', which make it easy to recognize a signal frame. Some OS
distributions, such as Fedora and Gentoo 2007.0, have already added
cfi annotations to their libc. A future version of libunwind should
recognize these annotations; these systems should not see any
crashes.

Workarounds: If you see problems with crashes when running the
cpu-profiler, consider inserting ProfilerStart()/ProfilerStop() into
your code, rather than setting CPUPROFILE. This will profile only
those sections of the codebase.
Though we haven't done much testing,
in theory this should reduce the chance of crashes by limiting the
signal generation to only a small part of the codebase. Ideally, you
would not use ProfilerStart()/ProfilerStop() around code that spawns
new threads, or is otherwise likely to cause a call to
pthread_mutex_lock!

---
17 May 2011