- gperftools
- ----------
- (originally Google Performance Tools)
- The fastest malloc we’ve seen; works particularly well with threads
- and STL. Also: thread-friendly heap-checker, heap-profiler, and
- cpu-profiler.
- OVERVIEW
- ---------
- gperftools is a collection consisting of a high-performance
- multi-threaded malloc() implementation plus some pretty nifty
- performance analysis tools.
- gperftools is distributed under the terms of the BSD License. Join our
- mailing list at gperftools@googlegroups.com for updates:
- https://groups.google.com/forum/#!forum/gperftools
- gperftools was the original home of the pprof program. Note, however,
- that the original pprof (which is still included with gperftools) is
- now deprecated in favor of the Go version at https://github.com/google/pprof
- TCMALLOC
- --------
- Just link in -ltcmalloc or -ltcmalloc_minimal to get the advantages of
- tcmalloc -- a replacement for malloc and new. See below for some
- environment variables you can use with tcmalloc, as well.
- tcmalloc functionality is available on all systems we've tested; see
- INSTALL for more details. See README_windows.txt for instructions on
- using tcmalloc on Windows.
- NOTE: When compiling programs with gcc that you plan to link
- with libtcmalloc, it's safest to pass in the flags
- -fno-builtin-malloc -fno-builtin-calloc -fno-builtin-realloc -fno-builtin-free
- when compiling. gcc makes some optimizations assuming it is using its
- own, built-in malloc; that assumption obviously isn't true with
- tcmalloc. In practice, we haven't seen any problems with this, but
- the expected risk is highest for users who register their own malloc
- hooks with tcmalloc (using gperftools/malloc_hook.h). The risk is
- lowest for folks who use tcmalloc_minimal (or, of course, who pass in
- the above flags :-) ).
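- For a concrete picture of the malloc-hook case, here is a minimal C sketch that counts allocations through a "new" hook. It assumes the C interface in gperftools/malloc_hook_c.h (MallocHook_AddNewHook() and MallocHook_RemoveNewHook()), re-declares those prototypes as weak symbols (a GCC/Clang extension) so the file still links without -ltcmalloc, and uses a hypothetical helper name, count_allocations().

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as MallocHook_NewHook in gperftools/malloc_hook_c.h. */
typedef void (*MallocHook_NewHook)(const void *ptr, size_t size);

/* Weak (GCC/Clang extension): if tcmalloc is not linked in, these resolve
 * to NULL instead of causing a link error. */
extern int MallocHook_AddNewHook(MallocHook_NewHook hook) __attribute__((weak));
extern int MallocHook_RemoveNewHook(MallocHook_NewHook hook) __attribute__((weak));

/* Plain counter: this sketch assumes a single-threaded program. */
static size_t new_hook_calls;

/* Invoked by tcmalloc after each allocation; must not allocate itself. */
static void count_new(const void *ptr, size_t size) {
    (void)ptr;
    (void)size;
    new_hook_calls++;
}

/* Hypothetical helper: runs work() with the hook installed and returns how
 * many allocations tcmalloc reported, or -1 if the hooks are unavailable. */
long count_allocations(void (*work)(void)) {
    if (MallocHook_AddNewHook == NULL || MallocHook_RemoveNewHook == NULL)
        return -1;
    new_hook_calls = 0;
    if (!MallocHook_AddNewHook(count_new))
        return -1;
    if (work != NULL)
        work();
    int removed = MallocHook_RemoveNewHook(count_new);
    assert(removed);            /* the hook we added should still be there */
    (void)removed;
    return (long)new_hook_calls;
}
```

- If work() contains something like free(malloc(16)), gcc may treat malloc and free as built-ins and remove the pair entirely when optimizing, in which case the hook never sees that allocation; the -fno-builtin-* flags above keep such calls visible to tcmalloc.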
- HEAP PROFILER
- -------------
- See docs/heapprofile.html for information about how to use tcmalloc's
- heap profiler and analyze its output.
- As a quick-start, do the following after installing this package:
- 1) Link your executable with -ltcmalloc
- 2) Run your executable with the HEAPPROFILE environment var set:
- $ HEAPPROFILE=/tmp/heapprof <path/to/binary> [binary args]
- 3) Run pprof to analyze the heap usage
- $ pprof <path/to/binary> /tmp/heapprof.0045.heap # run 'ls' to see options
- $ pprof --gv <path/to/binary> /tmp/heapprof.0045.heap
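- HEAPPROFILE profiles the whole run. To profile just one phase, gperftools/heap-profiler.h also declares HeapProfilerStart(), HeapProfilerDump() and HeapProfilerStop(). The C sketch below is an illustration, not a recipe taken from docs/heapprofile.html: it re-declares those prototypes as weak symbols (a GCC/Clang extension) so it degrades to a no-op when tcmalloc isn't linked, and heap_profile_phase() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Prototypes as in gperftools/heap-profiler.h, declared weak so this file
 * still links and runs (unprofiled) without -ltcmalloc. */
extern void HeapProfilerStart(const char *prefix) __attribute__((weak));
extern void HeapProfilerDump(const char *reason) __attribute__((weak));
extern void HeapProfilerStop(void) __attribute__((weak));

/* Hypothetical helper: heap-profiles only the call to phase().
 * Returns 1 if a profile was taken, 0 if the heap profiler is not linked. */
int heap_profile_phase(const char *prefix, void (*phase)(void)) {
    assert(prefix != NULL);
    int available = HeapProfilerStart != NULL && HeapProfilerDump != NULL &&
                    HeapProfilerStop != NULL;
    if (available)
        HeapProfilerStart(prefix);       /* dumps are named <prefix>.NNNN.heap */
    if (phase != NULL)
        phase();
    if (available) {
        HeapProfilerDump("end of phase"); /* force one dump right now */
        HeapProfilerStop();
    }
    return available;
}
```

- Linked with -ltcmalloc, a call such as heap_profile_phase("/tmp/heapprof", run_phase), where run_phase stands for your own code, should leave dump files under the /tmp/heapprof prefix that pprof can read the same way as the ones HEAPPROFILE produces.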
- You can also use LD_PRELOAD to heap-profile an executable that you
- didn't compile.
- There are other environment variables, besides HEAPPROFILE, you can
- set to adjust the heap-profiler behavior; cf. "ENVIRONMENT VARIABLES"
- below.
- The heap profiler is available on all unix-based systems we've tested;
- see INSTALL for more details. It is not currently available on Windows.
- HEAP CHECKER
- ------------
- See docs/heap_checker.html for information about how to use tcmalloc's
- heap checker.
- In order to catch all heap leaks, tcmalloc must be linked *last* into
- your executable. The heap checker may mischaracterize some memory
- accesses in libraries listed after it on the link line. For instance,
- it may report these libraries as leaking memory when they're not.
- (See the source code for more details.)
- As a quick-start, do the following after installing this package:
- 1) Link your executable with -ltcmalloc
- 2) Run your executable with the HEAPCHECK environment var set:
- $ HEAPCHECK=1 <path/to/binary> [binary args]
- Other values for HEAPCHECK: normal (equivalent to "1"), strict, draconian
- You can also use LD_PRELOAD to heap-check an executable that you
- didn't compile.
- The heap checker is only available on Linux at this time; see INSTALL
- for more details.
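- When trying the checker for the first time, a deliberate leak is a handy sanity check. This C sketch (leak_bytes() is a hypothetical helper) allocates a block and then overwrites the only pointer to it, so a program that calls it once and runs with HEAPCHECK=normal should normally get a leak report at exit. The pointer lives in a volatile variable so the compiler cannot optimize the allocation away.

```c
#include <assert.h>
#include <stdlib.h>

/* volatile, so the compiler must perform the allocation and the stores. */
static char *volatile last_block;

/* Hypothetical helper: allocates n bytes and discards the only reference
 * to them. Returns 1 if the allocation succeeded, 0 otherwise. */
int leak_bytes(size_t n) {
    assert(n > 0);
    last_block = malloc(n);
    if (last_block == NULL)
        return 0;
    last_block[0] = 'x';   /* touch the block so it is clearly in use */
    last_block = NULL;     /* drop the last pointer: this is the leak */
    return 1;
}
```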
- CPU PROFILER
- ------------
- See docs/cpuprofile.html for information about how to use the CPU
- profiler and analyze its output.
- As a quick-start, do the following after installing this package:
- 1) Link your executable with -lprofiler
- 2) Run your executable with the CPUPROFILE environment var set:
- $ CPUPROFILE=/tmp/prof.out <path/to/binary> [binary args]
- 3) Run pprof to analyze the CPU usage
- $ pprof <path/to/binary> /tmp/prof.out # -pg-like text output
- $ pprof --gv <path/to/binary> /tmp/prof.out # really cool graphical output
- There are other environment variables, besides CPUPROFILE, you can set
- to adjust the cpu-profiler behavior; cf. "ENVIRONMENT VARIABLES" below.
- The CPU profiler is available on all unix-based systems we've tested;
- see INSTALL for more details. It is not currently available on Windows.
- NOTE: CPU profiling doesn't work after fork (unless you immediately
- do an exec()-like call afterwards). Furthermore, if you do
- fork, and the child calls exit(), it may corrupt the profile
- data. You can use _exit() to work around this. We hope to have
- a fix for both problems in the next release of perftools
- (hopefully perftools 1.2).
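- A minimal C sketch of the _exit() workaround, assuming the corruption comes from exit-time code (atexit() handlers, C++ static destructors) running in the child; run_in_child() is a hypothetical helper:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs child_work() in a forked child and returns the
 * child's exit status (0-255), or -1 on error. The child leaves through
 * _exit(), which skips atexit() handlers and stdio flushing, so exit-time
 * code inherited from the parent does not run in the child. */
int run_in_child(int (*child_work)(void)) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;                       /* fork failed */
    if (pid == 0) {
        int code = child_work != NULL ? child_work() : 0;
        _exit(code & 0xff);              /* never exit() in the child */
    }
    int status;
    pid_t done = waitpid(pid, &status, 0);
    if (done < 0)
        return -1;
    assert(done == pid);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

- The parent keeps profiling as usual; per the note above, don't expect useful CPU profile data from the child itself.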
- EVERYTHING IN ONE
- -----------------
- If you want the CPU profiler, heap profiler, and heap leak-checker to
- all be available for your application, you can do:
- gcc -o myapp ... -lprofiler -ltcmalloc
- However, if you have a reason to use the static versions of the
- library, this two-library linking won't work:
- gcc -o myapp ... /usr/lib/libprofiler.a /usr/lib/libtcmalloc.a # errors!
- Instead, use the special libtcmalloc_and_profiler library, which we
- make for just this purpose:
- gcc -o myapp ... /usr/lib/libtcmalloc_and_profiler.a
- CONFIGURATION OPTIONS
- ---------------------
- For advanced users, there are several flags you can pass to
- './configure' that tweak tcmalloc performance. (These are in addition
- to the environment variables you can set at runtime to affect
- tcmalloc, described below.) See the INSTALL file for details.
- ENVIRONMENT VARIABLES
- ---------------------
- The cpu profiler, heap checker, and heap profiler will lie dormant,
- using no memory or CPU, until you turn them on. (Thus, there's no
- harm in linking -lprofiler into every application, and also -ltcmalloc
- assuming you're ok using the non-libc malloc library.)
- The easiest way to turn them on is by setting the appropriate
- environment variables. We have several variables that let you
- enable/disable features as well as tweak parameters.
- Here are some of the most important variables:
- HEAPPROFILE=<pre> -- turns on heap profiling and dumps data using this prefix
- HEAPCHECK=<type> -- turns on heap checking with strictness 'type'
- CPUPROFILE=<file> -- turns on cpu profiling and dumps data to this file.
- PROFILESELECTED=1 -- if set, cpu-profiler will only profile regions of code
- surrounded with ProfilerEnable()/ProfilerDisable().
- CPUPROFILE_FREQUENCY=<x> -- how many interrupts/second the cpu-profiler samples.
- PERFTOOLS_VERBOSE=<level> -- the higher the level, the more messages malloc emits
- MALLOCSTATS=<level> -- prints memory-use stats at program-exit
- For a full list of variables, see the documentation pages:
- docs/cpuprofile.html
- docs/heapprofile.html
- docs/heap_checker.html
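- CPUPROFILE_FREQUENCY is a rate of profiling interrupts, and the cpu-profiler registers each sample via a signal (see "64-BIT ISSUES" below). The C sketch below is a simplified illustration of that kind of signal-driven sampling using POSIX setitimer(ITIMER_PROF) and a SIGPROF handler, not gperftools' own code; hz plays the role of CPUPROFILE_FREQUENCY (at 250, the period is 1000000/250 = 4000 microseconds), and sample_busy_loop() is a hypothetical name.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

static volatile sig_atomic_t samples;

/* Runs each time the timer fires, interrupting whatever code was executing.
 * A real profiler would record a stack trace here; this sketch only counts. */
static void on_sigprof(int sig) {
    (void)sig;
    samples++;
}

/* Hypothetical helper: spins for about cpu_seconds of CPU time while a
 * SIGPROF timer fires hz times per CPU-second; returns samples, or -1. */
long sample_busy_loop(long hz, double cpu_seconds) {
    assert(hz > 0 && hz <= 1000000);
    long period_usec = 1000000L / hz;    /* e.g. hz = 250 -> 4000 us */

    struct sigaction sa, old_sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigprof;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGPROF, &sa, &old_sa) != 0)
        return -1;

    struct itimerval timer;
    memset(&timer, 0, sizeof timer);
    timer.it_interval.tv_sec = period_usec / 1000000L;
    timer.it_interval.tv_usec = period_usec % 1000000L;
    timer.it_value = timer.it_interval;
    samples = 0;
    if (setitimer(ITIMER_PROF, &timer, NULL) != 0) {
        sigaction(SIGPROF, &old_sa, NULL);
        return -1;
    }

    volatile unsigned long spin = 0;
    clock_t end = clock() + (clock_t)(cpu_seconds * CLOCKS_PER_SEC);
    while (clock() < end)
        spin++;                          /* the CPU-bound "program" */

    struct itimerval off;
    memset(&off, 0, sizeof off);
    setitimer(ITIMER_PROF, &off, NULL);  /* disarm before restoring */
    sigaction(SIGPROF, &old_sa, NULL);
    return (long)samples;
}
```

- With hz = 100 and cpu_seconds = 0.3, expect a count of roughly 30, since ITIMER_PROF only advances while the process is using CPU; timer granularity can make it somewhat lower.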
- COMPILING ON NON-LINUX SYSTEMS
- ------------------------------
- Perftools was developed and tested on x86 Linux systems, and it works
- in its full generality only on those systems. However, we've
- successfully ported much of the tcmalloc library to FreeBSD, Solaris
- x86, and Darwin (Mac OS X) x86 and ppc; and we've ported the basic
- functionality in tcmalloc_minimal to Windows. See INSTALL for details.
- See README_windows.txt for details on the Windows port.
- PERFORMANCE
- -----------
- If you're interested in some third-party comparisons of tcmalloc to
- other malloc libraries, here are a few web pages that have been
- brought to our attention. The first discusses the effect of using
- various malloc libraries on OpenLDAP. The second compares tcmalloc to
- win32's malloc.
- http://www.highlandsun.com/hyc/malloc/
- http://gaiacrtn.free.fr/articles/win32perftools.html
- It's possible to build tcmalloc in a way that trades off faster
- performance (particularly for deletes) at the cost of more memory
- fragmentation (that is, more unusable memory on your system). See the
- INSTALL file for details.
- OLD SYSTEM ISSUES
- -----------------
- When compiling perftools on some old systems, like RedHat 8, you may
- get an error like this:
- ___tls_get_addr: symbol not found
- This means that you have a system where some parts are updated enough
- to support Thread Local Storage, but others are not. The perftools
- configure script can't always detect this kind of case, leading to
- that error. To fix it, just comment out (or delete) the line
- #define HAVE_TLS 1
- in your config.h file before building.
- 64-BIT ISSUES
- -------------
- There are two issues that can cause program hangs or crashes on x86_64
- 64-bit systems, which use the libunwind library to get stack-traces.
- Neither issue should affect the core tcmalloc library; they both
- affect the perftools tools such as cpu-profiler, heap-checker, and
- heap-profiler.
- 1) Some libc's -- at least glibc 2.4 on x86_64 -- have a bug where the
- libc function dl_iterate_phdr() acquires its locks in the wrong
- order. This bug should not affect tcmalloc, but may cause occasional
- deadlock with the cpu-profiler, heap-profiler, and heap-checker.
- The likelihood increases with the number of dlopen() calls an executable makes.
- Most executables don't have any, though several library routines like
- getgrgid() call dlopen() behind the scenes.
- 2) On x86-64 64-bit systems, while tcmalloc itself works fine, the
- cpu-profiler tool is unreliable: it will sometimes work, but sometimes
- cause a segfault. I'll explain the problem first, and then some
- workarounds.
- Note that this only affects the cpu-profiler, which is a
- gperftools feature you must turn on manually by setting the
- CPUPROFILE environment variable. If you do not turn on cpu-profiling,
- you shouldn't see any crashes due to perftools.
- The gory details: The underlying problem is in the backtrace()
- function, which is a built-in function in libc.
- Backtracing is fairly straightforward in the normal case, but can run
- into problems when having to backtrace across a signal frame.
- Unfortunately, the cpu-profiler uses signals in order to register a
- profiling event, so every backtrace that the profiler does crosses a
- signal frame.
- In our experience, the only time there is trouble is when the signal
- fires in the middle of pthread_mutex_lock. pthread_mutex_lock is
- called quite a bit from system libraries, particularly at program
- startup and when creating a new thread.
- The solution: The DWARF debugging format has support for 'CFI (call
- frame information) annotations', which make it easy to recognize a
- signal frame. Some OS distributions, such as Fedora and Gentoo 2007.0,
- have already added CFI annotations to their libc. A future version of libunwind should
- recognize these annotations; these systems should not see any
- crashes.
- Workarounds: If you see problems with crashes when running the
- cpu-profiler, consider inserting ProfilerStart()/ProfilerStop() into
- your code, rather than setting CPUPROFILE. This will profile only
- those sections of the codebase. Though we haven't done much testing,
- in theory this should reduce the chance of crashes by limiting the
- signal generation to only a small part of the codebase. Ideally, you
- would not use ProfilerStart()/ProfilerStop() around code that spawns
- new threads, or is otherwise likely to cause a call to
- pthread_mutex_lock!
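- A minimal C sketch of that workaround, assuming the ProfilerStart()/ProfilerStop() prototypes from gperftools/profiler.h. They are re-declared as weak symbols (a GCC/Clang extension) so the program still runs, unprofiled, when -lprofiler isn't linked; cpu_profile_section() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Prototypes as in gperftools/profiler.h, declared weak so this file
 * still links and runs without -lprofiler. */
extern int ProfilerStart(const char *fname) __attribute__((weak));
extern void ProfilerStop(void) __attribute__((weak));

/* Hypothetical helper: CPU-profiles only the call to section(), writing the
 * profile to fname. Returns 1 if profiling was active, 0 otherwise. */
int cpu_profile_section(const char *fname, void (*section)(void)) {
    assert(fname != NULL);
    int active = ProfilerStart != NULL && ProfilerStop != NULL &&
                 ProfilerStart(fname) != 0;   /* nonzero means started */
    if (section != NULL)
        section();  /* ideally spawns no threads and takes few mutexes */
    if (active)
        ProfilerStop();                       /* writes out the profile */
    return active;
}
```

- Leave CPUPROFILE unset with this approach, as the workaround intends, so that profiling is confined to the sections you wrap.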
- ---
- 17 May 2011