README
IMPORTANT NOTE FOR 64-BIT USERS
-------------------------------
There are known issues with some perftools functionality on x86_64
systems. See 64-BIT ISSUES, below.


TCMALLOC
--------
Just link in -ltcmalloc or -ltcmalloc_minimal to get the advantages of
tcmalloc -- a replacement for malloc and new. See below for some
environment variables you can use with tcmalloc, as well.

tcmalloc functionality is available on all systems we've tested; see
INSTALL for more details. See README_windows.txt for instructions on
using tcmalloc on Windows.

NOTE: When compiling programs with gcc that you plan to link with
libtcmalloc, it's safest to pass in the flags

   -fno-builtin-malloc -fno-builtin-calloc -fno-builtin-realloc -fno-builtin-free

when compiling. gcc makes some optimizations assuming it is using its
own, built-in malloc; that assumption obviously isn't true with
tcmalloc. In practice, we haven't seen any problems with this, but
the expected risk is highest for users who register their own malloc
hooks with tcmalloc (using gperftools/malloc_hook.h). The risk is
lowest for folks who use tcmalloc_minimal (or, of course, who pass in
the above flags :-) ).


HEAP PROFILER
-------------
See doc/heap-profiler.html for information about how to use tcmalloc's
heap profiler and analyze its output.

As a quick-start, do the following after installing this package:

1) Link your executable with -ltcmalloc
2) Run your executable with the HEAPPROFILE environment var set:
   $ HEAPPROFILE=/tmp/heapprof <path/to/binary> [binary args]
3) Run pprof to analyze the heap usage
   $ pprof <path/to/binary> /tmp/heapprof.0045.heap  # run 'ls' to see options
   $ pprof --gv <path/to/binary> /tmp/heapprof.0045.heap

You can also use LD_PRELOAD to heap-profile an executable that you
didn't compile.

There are other environment variables, besides HEAPPROFILE, you can
set to adjust the heap-profiler behavior; see "ENVIRONMENT VARIABLES"
below.

The heap profiler is available on all unix-based systems we've tested;
see INSTALL for more details. It is not currently available on Windows.


HEAP CHECKER
------------
See doc/heap-checker.html for information about how to use tcmalloc's
heap checker.

In order to catch all heap leaks, tcmalloc must be linked *last* into
your executable. The heap checker may mischaracterize some memory
accesses in libraries listed after it on the link line. For instance,
it may report these libraries as leaking memory when they're not.
(See the source code for more details.)

As a quick-start, do the following after installing this package:

1) Link your executable with -ltcmalloc
2) Run your executable with the HEAPCHECK environment var set:
   $ HEAPCHECK=1 <path/to/binary> [binary args]

Other values for HEAPCHECK: normal (equivalent to "1"), strict, draconian

You can also use LD_PRELOAD to heap-check an executable that you
didn't compile.

The heap checker is only available on Linux at this time; see INSTALL
for more details.


CPU PROFILER
------------
See doc/cpu-profiler.html for information about how to use the CPU
profiler and analyze its output.

As a quick-start, do the following after installing this package:

1) Link your executable with -lprofiler
2) Run your executable with the CPUPROFILE environment var set:
   $ CPUPROFILE=/tmp/prof.out <path/to/binary> [binary args]
3) Run pprof to analyze the CPU usage
   $ pprof <path/to/binary> /tmp/prof.out       # -pg-like text output
   $ pprof --gv <path/to/binary> /tmp/prof.out  # really cool graphical output

There are other environment variables, besides CPUPROFILE, you can set
to adjust the cpu-profiler behavior; see "ENVIRONMENT VARIABLES" below.

The CPU profiler is available on all unix-based systems we've tested;
see INSTALL for more details. It is not currently available on Windows.

NOTE: CPU profiling doesn't work after fork (unless you immediately
      do an exec()-like call afterwards). Furthermore, if you do
      fork, and the child calls exit(), it may corrupt the profile
      data. You can use _exit() to work around this. We hope to have
      a fix for both problems in the next release of perftools
      (hopefully perftools 1.2).

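The _exit() workaround in the note above might look like the
following minimal C++ sketch. The helper name run_child is
hypothetical and not part of perftools. The only point it makes is
that the forked child leaves through _exit(), which terminates
immediately without running atexit() handlers, instead of exit().

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper (not part of perftools): forks, lets the child finish
// with _exit(), and returns the child's exit code to the parent.
int run_child(int child_exit_code) {
  pid_t pid = fork();
  if (pid < 0) {
    return -1;  // fork failed
  }
  if (pid == 0) {
    // Child: do its work here, then leave with _exit() rather than exit().
    // _exit() terminates immediately without running atexit() handlers.
    _exit(child_exit_code);
  }
  // Parent: wait for the child and report how it ended.
  int status = 0;
  if (waitpid(pid, &status, 0) < 0) {
    return -1;  // waitpid failed
  }
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A parent that is being profiled would call run_child() where it
previously forked a child that ended with exit().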

EVERYTHING IN ONE
-----------------
If you want the CPU profiler, heap profiler, and heap leak-checker to
all be available for your application, you can do:
   gcc -o myapp ... -lprofiler -ltcmalloc

However, if you have a reason to use the static versions of the
library, this two-library linking won't work:
   gcc -o myapp ... /usr/lib/libprofiler.a /usr/lib/libtcmalloc.a  # errors!

Instead, use the special libtcmalloc_and_profiler library, which we
make for just this purpose:
   gcc -o myapp ... /usr/lib/libtcmalloc_and_profiler.a


CONFIGURATION OPTIONS
---------------------
For advanced users, there are several flags you can pass to
'./configure' that tweak tcmalloc performance. (These are in addition
to the environment variables you can set at runtime to affect
tcmalloc, described below.) See the INSTALL file for details.


ENVIRONMENT VARIABLES
---------------------
The cpu profiler, heap checker, and heap profiler will lie dormant,
using no memory or CPU, until you turn them on. (Thus, there's no
harm in linking -lprofiler into every application, and also -ltcmalloc
assuming you're ok using the non-libc malloc library.)

The easiest way to turn them on is by setting the appropriate
environment variables. We have several variables that let you
enable/disable features as well as tweak parameters.

Here are some of the most important variables:

HEAPPROFILE=<pre>  -- turns on heap profiling and dumps data using this prefix
HEAPCHECK=<type>   -- turns on heap checking with strictness 'type'
CPUPROFILE=<file>  -- turns on cpu profiling and dumps data to this file
PROFILESELECTED=1  -- if set, cpu-profiler will only profile regions of code
                      surrounded with ProfilerEnable()/ProfilerDisable()
PROFILEFREQUENCY=x -- how many interrupts/second the cpu-profiler samples

TCMALLOC_DEBUG=<level> -- the higher the level, the more messages malloc emits
MALLOCSTATS=<level>    -- prints memory-use stats at program exit

For a full list of variables, see the documentation pages:
   doc/cpuprofile.html
   doc/heapprofile.html
   doc/heap_checker.html

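Because the tools stay dormant unless these variables are set, it can
be handy for a program to log at startup which of them are present.
The following is a minimal C++ sketch of that idea; the helper name
perftools_env_settings is hypothetical and not part of perftools. It
only reads the environment and does not turn anything on or off.

```cpp
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper (not part of perftools): returns "NAME=value" for each
// of the variables listed above that is set in the current environment.
std::vector<std::string> perftools_env_settings() {
  static const char* const kNames[] = {
      "HEAPPROFILE", "HEAPCHECK", "CPUPROFILE", "PROFILESELECTED",
      "PROFILEFREQUENCY", "TCMALLOC_DEBUG", "MALLOCSTATS"};
  std::vector<std::string> settings;
  for (const char* name : kNames) {
    const char* value = std::getenv(name);
    if (value != nullptr) {
      settings.push_back(std::string(name) + "=" + value);
    }
  }
  return settings;
}
```

Printing the returned strings early in main() gives a quick record of
how a given run was configured.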

COMPILING ON NON-LINUX SYSTEMS
------------------------------

Perftools was developed and tested on x86 Linux systems, and it works
in its full generality only on those systems. However, we've
successfully ported much of the tcmalloc library to FreeBSD, Solaris
x86, and Darwin (Mac OS X) x86 and ppc; and we've ported the basic
functionality in tcmalloc_minimal to Windows. See INSTALL for details.
See README_windows.txt for details on the Windows port.


PERFORMANCE
-----------

If you're interested in some third-party comparisons of tcmalloc to
other malloc libraries, here are a few web pages that have been
brought to our attention. The first discusses the effect of using
various malloc libraries on OpenLDAP. The second compares tcmalloc to
win32's malloc.
   http://www.highlandsun.com/hyc/malloc/
   http://gaiacrtn.free.fr/articles/win32perftools.html

It's possible to build tcmalloc in a way that gains speed
(particularly for deletes) at the cost of more memory fragmentation
(that is, more unusable memory on your system). See the INSTALL file
for details.


OLD SYSTEM ISSUES
-----------------

When compiling perftools on some old systems, like RedHat 8, you may
get an error like this:
   ___tls_get_addr: symbol not found

This means that you have a system where some parts are updated enough
to support Thread Local Storage, but others are not. The perftools
configure script can't always detect this kind of case, leading to
that error. To fix it, just comment out (or delete) the line
   #define HAVE_TLS 1
in your config.h file before building.


64-BIT ISSUES
-------------

There are two issues that can cause program hangs or crashes on x86_64
systems, which use the libunwind library to get stack-traces.
Neither issue should affect the core tcmalloc library; they both
affect the perftools tools such as cpu-profiler, heap-checker, and
heap-profiler.

1) Some libc's -- at least glibc 2.4 on x86_64 -- have a bug where the
libc function dl_iterate_phdr() acquires its locks in the wrong
order. This bug should not affect tcmalloc, but may cause occasional
deadlock with the cpu-profiler, heap-profiler, and heap-checker.
The likelihood of deadlock increases with the number of dlopen() calls
an executable makes. Most executables don't make any, though several
library routines like getgrgid() call dlopen() behind the scenes.

2) On x86_64 systems, while tcmalloc itself works fine, the
cpu-profiler tool is unreliable: it will sometimes work, but sometimes
cause a segfault. I'll explain the problem first, and then some
workarounds.

Note that this only affects the cpu-profiler, which is a
gperftools feature you must turn on manually by setting the
CPUPROFILE environment variable. If you do not turn on cpu-profiling,
you shouldn't see any crashes due to perftools.

The gory details: The underlying problem is in the backtrace()
function, which is a built-in function in libc.
Backtracing is fairly straightforward in the normal case, but can run
into problems when having to backtrace across a signal frame.
Unfortunately, the cpu-profiler uses signals in order to register a
profiling event, so every backtrace that the profiler does crosses a
signal frame.

In our experience, the only time there is trouble is when the signal
fires in the middle of pthread_mutex_lock. pthread_mutex_lock is
called quite a bit from system libraries, particularly at program
startup and when creating a new thread.

The solution: The dwarf debugging format has support for 'cfi
annotations', which make it easy to recognize a signal frame. Some OS
distributions, such as Fedora and gentoo 2007.0, already have added
cfi annotations to their libc. A future version of libunwind should
recognize these annotations; these systems should not see any
crashes.

Workarounds: If you see problems with crashes when running the
cpu-profiler, consider inserting ProfilerStart()/ProfilerStop() into
your code, rather than setting CPUPROFILE. This will profile only
those sections of the codebase. Though we haven't done much testing,
in theory this should reduce the chance of crashes by limiting the
signal generation to only a small part of the codebase. Ideally, you
would not use ProfilerStart()/ProfilerStop() around code that spawns
new threads, or is otherwise likely to cause a call to
pthread_mutex_lock!

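The ProfilerStart()/ProfilerStop() workaround could be sketched in C++
roughly as follows. This assumes the gperftools/profiler.h header and
linking with -lprofiler. USE_GPERFTOOLS, hot_loop, and
profiled_hot_loop are hypothetical names chosen for the sketch; when
USE_GPERFTOOLS is not defined, no-op stand-ins keep the sketch
compilable on systems where perftools is not installed.

```cpp
#include <cstdint>

#ifdef USE_GPERFTOOLS
#include <gperftools/profiler.h>  // real ProfilerStart()/ProfilerStop()
#else
// No-op stand-ins (assumption for this sketch only) so the code builds
// without perftools. They do not collect any profile data.
static int ProfilerStart(const char*) { return 0; }
static void ProfilerStop() {}
#endif

// Hypothetical single-threaded hot spot: no thread creation, no locking.
static std::uint64_t hot_loop(std::uint64_t n) {
  std::uint64_t sum = 0;
  for (std::uint64_t i = 0; i < n; ++i) {
    sum += i;
  }
  return sum;
}

// Profiles only the hot loop instead of the whole program run.
std::uint64_t profiled_hot_loop(const char* profile_path, std::uint64_t n) {
  ProfilerStart(profile_path);  // sampling signals start here
  std::uint64_t result = hot_loop(n);
  ProfilerStop();               // sampling signals stop here
  return result;
}
```

To collect real data, build with something like
-DUSE_GPERFTOOLS -lprofiler and leave CPUPROFILE unset, so that
sampling is limited to the bracketed region.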
---
17 May 2011

README_windows.txt
--- COMPILING

A port of this project to Windows is under way. A working solution
file exists in this directory:
    gperftools.sln

You can load this solution file into VC++ 7.1 (Visual Studio 2003) or
later -- in the latter case, it will automatically convert the files
to the latest format for you.

When you build the solution, it will create a number of unittests,
which you can run by hand (or, more easily, under the Visual Studio
debugger) to make sure everything is working properly on your system.
The binaries will end up in a directory called "debug" or "release" in
the top-level directory (next to the .sln file). It will also create
two binaries, nm-pdb and addr2line-pdb, which you should install in
the same directory where you install the 'pprof' perl script.

I don't know very much about how to install DLLs on Windows, so you'll
have to figure out that part for yourself. If you choose to just
re-use the existing .sln, make sure you set the IncludeDir's
appropriately! Look at the properties for libtcmalloc_minimal.dll.

Note that the projects in this solution are set to build in Debug mode
by default. You may want to change them to Release mode.

To use tcmalloc_minimal in your own projects, you should only need to
build the dll and install it someplace, so you can link it into
further binaries. To use the dll, you need to add the following to
the linker line of your executable:
   "libtcmalloc_minimal.lib" /INCLUDE:"__tcmalloc"

Here is how to accomplish this in Visual Studio 2005 (VC8):

1) Have your executable depend on the tcmalloc library by selecting
   "Project Dependencies..." from the "Project" menu. Your executable
   should depend on "libtcmalloc_minimal".

2) Have your executable depend on a tcmalloc symbol -- this is
   necessary so the linker doesn't "optimize out" the libtcmalloc
   dependency -- by right-clicking on your executable's project (in
   the solution explorer), selecting Properties from the pull-down
   menu, then selecting "Configuration Properties" -> "Linker" ->
   "Input". Then, in the "Force Symbol References" field, enter the
   text "__tcmalloc" (without the quotes). Be sure to do this for both
   debug and release modes!

You can also link tcmalloc code in statically -- see the example
project tcmalloc_minimal_unittest-static, which does this. For this
to work, you'll need to add "/D PERFTOOLS_DLL_DECL=" to the compile
line of every perftools .cc file. You do not need to depend on the
tcmalloc symbol in this case (that is, you don't need to do either
step 1 or step 2 from above).

An alternative to all the above is to statically link your application
with libc, and then replace its malloc with tcmalloc. This allows you
to just build and link your program normally; the tcmalloc support
comes in a post-processing step. This is more reliable than the above
technique (which depends on run-time patching, which is inherently
fragile), though more work to set up. For details, see
   https://groups.google.com/group/google-perftools/browse_thread/thread/41cd3710af85e57b


--- THE HEAP-PROFILER

The heap-profiler has had a preliminary port to Windows. It has not
been well tested, and probably does not work at all when Frame Pointer
Optimization (FPO) is enabled -- that is, in release mode. The other
features of perftools, such as the cpu-profiler and leak-checker, have
not yet been ported to Windows at all.


--- WIN64

The function-patcher has to disassemble code, and is very
x86-specific. However, the rest of perftools should work fine for
both x86 and x64. In particular, if you use the 'statically link with
libc, and replace its malloc with tcmalloc' approach, mentioned above,
it should be possible to use tcmalloc with 64-bit windows.

As of perftools 1.10, there is some support for disassembling x86_64
instructions, for work with win64. This work is preliminary, but the
test file preamble_patcher_test.cc is provided to play around with
that a bit. preamble_patcher_test will not compile on win32.


--- ISSUES

NOTE FOR WIN2K USERS: According to reports
(http://code.google.com/p/gperftools/issues/detail?id=127)
the stack-tracing necessary for the heap-profiler does not work on
Win2K. The best workaround, if you are building on a Win2K system,
is to add "/D NO_TCMALLOC_SAMPLES=" to your build, to turn off the
stack-tracing. You will not be able to use the heap-profiler if you
do this.

NOTE ON _MSIZE and _RECALLOC: The tcmalloc version of _msize returns
the size of the region tcmalloc allocated for you -- which is at least
as many bytes as you asked for, but may be more. (btw, these *are*
bytes you own, even if you didn't ask for all of them, so it's correct
code to access all of them if you want.) Unfortunately, the Windows CRT
_recalloc() routine assumes that _msize returns exactly as many bytes
as were requested. As a result, _recalloc() may not zero out new
bytes correctly. IT'S SAFEST NOT TO USE _RECALLOC WITH TCMALLOC.
_recalloc() is a tricky routine to use in any case (it's not safe to
use with realloc, for instance).

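One way to avoid _recalloc() is to track the requested size yourself
and zero the new tail after a plain realloc(). The following is a
minimal C++ sketch, assuming the caller knows the size it last
requested and passes a non-zero new size. grow_zeroed is a
hypothetical helper, not part of tcmalloc or the Windows CRT.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical helper (not part of tcmalloc or the CRT): resizes a block
// with realloc() and zeroes any bytes beyond the caller-tracked old size.
// It relies on the sizes the caller requested, never on _msize().
// Assumes new_size > 0.
void* grow_zeroed(void* ptr, std::size_t old_size, std::size_t new_size) {
  void* grown = std::realloc(ptr, new_size);
  if (grown == nullptr) {
    return nullptr;  // on failure the original block is still valid
  }
  if (new_size > old_size) {
    // Zero only the newly requested tail; the first old_size bytes are kept.
    std::memset(static_cast<char*>(grown) + old_size, 0, new_size - old_size);
  }
  return grown;
}
```

Because the zeroing depends only on the two sizes passed in, the
result is the same whether or not the allocator hands back a larger
region than was asked for.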

I have little experience with Windows programming, so there may be
better ways to set this up than I've done! If you run across any
problems, please post to the google-perftools Google Group, or report
them on the gperftools Google Code site:
   http://groups.google.com/group/google-perftools
   http://code.google.com/p/gperftools/issues/list

-- craig

Last modified: 2 February 2012
