gperftools
----------
(originally Google Performance Tools)

The fastest malloc we've seen; works particularly well with threads
and STL. Also: thread-friendly heap-checker, heap-profiler, and
cpu-profiler.

OVERVIEW
--------

gperftools is a collection comprising a high-performance multi-threaded
malloc() implementation plus some pretty nifty performance analysis
tools.

gperftools is distributed under the terms of the BSD License. Join our
mailing list for updates:
https://groups.google.com/forum/#!forum/gperftools

gperftools was the original home of the pprof program. Note, however,
that the original pprof (which is still included with gperftools) is
now deprecated in favor of the Go version at https://github.com/google/pprof

TCMALLOC
--------

Just link in -ltcmalloc or -ltcmalloc_minimal to get the advantages of
tcmalloc, a replacement for malloc and new. See below for some
environment variables you can use with tcmalloc, as well.

tcmalloc functionality is available on all systems we've tested; see
INSTALL for more details. See README_windows.txt for instructions on
using tcmalloc on Windows.

NOTE: When compiling programs with gcc that you plan to link with
libtcmalloc, it's safest to pass in the flags

   -fno-builtin-malloc -fno-builtin-calloc -fno-builtin-realloc -fno-builtin-free

when compiling. gcc makes some optimizations assuming it is using its
own, built-in malloc; that assumption obviously isn't true with
tcmalloc. In practice, we haven't seen any problems with this, but
the expected risk is highest for users who register their own malloc
hooks with tcmalloc (using gperftools/malloc_hook.h). The risk is
lowest for folks who use tcmalloc_minimal (or, of course, who pass in
the above flags :-) ).
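
As a minimal sketch of what "just link it in" means, assuming Linux, gcc and POSIX threads: the code below calls only the standard malloc() and free(), so whether it runs on glibc's allocator or on tcmalloc is decided entirely by the link line. The file and function names (alloc_demo.c, run_alloc_workload) are hypothetical. Following the advice above, you might build it together with your own main.c like this:

   gcc -O2 -pthread -fno-builtin-malloc -fno-builtin-calloc -fno-builtin-realloc -fno-builtin-free alloc_demo.c main.c -o alloc_demo -ltcmalloc_minimal

```c
/* alloc_demo.c: a hypothetical example. Nothing in it is gperftools-
 * specific; which malloc() it gets is decided entirely by the link line. */
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum { kMaxThreads = 16 };

struct worker_arg {
    int iterations;
    uint64_t checksum;  /* per-thread result, so no locking is needed */
};

/* Each thread repeatedly allocates a small block, touches it and frees it:
 * many threads churning through many small objects. */
static void *worker(void *p) {
    struct worker_arg *arg = p;
    uint64_t sum = 0;
    for (int i = 0; i < arg->iterations; i++) {
        size_t size = 16 + (size_t)(i % 64) * 8;  /* 16..520 bytes */
        unsigned char *buf = malloc(size);
        if (buf == NULL) {
            arg->checksum = UINT64_MAX;
            return NULL;
        }
        memset(buf, i & 0xff, size);
        sum += buf[size - 1];
        free(buf);
    }
    arg->checksum = sum;
    return NULL;
}

/* Runs the workload on nthreads threads and returns the combined checksum,
 * or UINT64_MAX if a thread could not be created or an allocation failed. */
uint64_t run_alloc_workload(int nthreads, int iterations) {
    pthread_t tids[kMaxThreads];
    struct worker_arg args[kMaxThreads];
    uint64_t total = 0;
    int created = 0, failed = 0;

    if (nthreads < 1 || nthreads > kMaxThreads) return UINT64_MAX;
    for (; created < nthreads; created++) {
        args[created].iterations = iterations;
        args[created].checksum = 0;
        if (pthread_create(&tids[created], NULL, worker, &args[created]) != 0)
            break;
    }
    for (int t = 0; t < created; t++) {
        pthread_join(tids[t], NULL);
        if (args[t].checksum == UINT64_MAX)
            failed = 1;
        else
            total += args[t].checksum;
    }
    return (failed || created != nthreads) ? UINT64_MAX : total;
}
```

For example, run_alloc_workload(4, 1000) returns 498864 with either allocator; only the speed may differ.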

HEAP PROFILER
-------------

See docs/heapprofile.html for information about how to use tcmalloc's
heap profiler and analyze its output.

As a quick-start, do the following after installing this package:

1) Link your executable with -ltcmalloc
2) Run your executable with the HEAPPROFILE environment var set:
     $ HEAPPROFILE=/tmp/heapprof <path/to/binary> [binary args]
3) Run pprof to analyze the heap usage
     $ pprof <path/to/binary> /tmp/heapprof.0045.heap  # run 'ls /tmp/heapprof*' to see the available dumps
     $ pprof --gv <path/to/binary> /tmp/heapprof.0045.heap

You can also use LD_PRELOAD to heap-profile an executable that you
didn't compile.

There are other environment variables, besides HEAPPROFILE, you can
set to adjust the heap-profiler behavior; see "ENVIRONMENT VARIABLES"
below.

The heap profiler is available on all unix-based systems we've tested;
see INSTALL for more details. It is not currently available on Windows.
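
Besides the HEAPPROFILE variable, gperftools declares a small C API in <gperftools/heap-profiler.h> for profiling just one phase of a program. The sketch below uses HeapProfilerStart(), HeapProfilerDump() and HeapProfilerStop(); run_heap_profiled(), the linked-list workload and the weak declarations are our own illustration, not part of gperftools. The weak declarations (a GCC/Clang feature on ELF platforms) only exist so the sketch also builds and runs, unprofiled, when tcmalloc is not linked in; in real code you would include the header and link with -ltcmalloc as in step 1.

```c
/* heap_demo.c: a hypothetical example of starting the heap profiler from
 * code instead of via HEAPPROFILE. */
#include <stddef.h>
#include <stdlib.h>

/* Weak declarations of functions from <gperftools/heap-profiler.h>, so the
 * sketch still links and runs (without profiling) if tcmalloc is absent.
 * In ordinary code, include that header instead. */
extern void HeapProfilerStart(const char *prefix) __attribute__((weak));
extern void HeapProfilerDump(const char *reason) __attribute__((weak));
extern void HeapProfilerStop(void) __attribute__((weak));

struct node {
    struct node *next;
    char payload[240];
};

/* Builds an n-node linked list, dumps a heap profile while it is live (if
 * profiling is on), then frees it. Returns the number of nodes built. */
static size_t build_and_free_list(size_t n, int profiling) {
    struct node *head = NULL;
    size_t built = 0;
    for (; built < n; built++) {
        struct node *nd = malloc(sizeof *nd);
        if (nd == NULL) break;
        nd->next = head;
        head = nd;
    }
    if (profiling) HeapProfilerDump("list fully built");
    while (head != NULL) {
        struct node *next = head->next;
        free(head);
        head = next;
    }
    return built;
}

/* Profiles one phase of the program; dumps are named after `prefix`
 * (like /tmp/heapprof.0045.heap above). Returns the nodes built. */
size_t run_heap_profiled(const char *prefix, size_t n) {
    int profiling = HeapProfilerStart != NULL && HeapProfilerDump != NULL &&
                    HeapProfilerStop != NULL;
    if (profiling) HeapProfilerStart(prefix);
    size_t built = build_and_free_list(n, profiling);
    if (profiling) HeapProfilerStop();
    return built;
}
```

The resulting dump files can be fed to pprof exactly as in step 3.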

HEAP CHECKER
------------

See docs/heap_checker.html for information about how to use tcmalloc's
heap checker.

In order to catch all heap leaks, tcmalloc must be linked *last* into
your executable. The heap checker may mischaracterize some memory
accesses in libraries listed after it on the link line. For instance,
it may report these libraries as leaking memory when they're not.
(See the source code for more details.)

As a quick-start, do the following after installing this package:

1) Link your executable with -ltcmalloc
2) Run your executable with the HEAPCHECK environment var set:
     $ HEAPCHECK=1 <path/to/binary> [binary args]

Other values for HEAPCHECK: normal (equivalent to "1"), strict, draconian

You can also use LD_PRELOAD to heap-check an executable that you
didn't compile.

The heap checker is only available on Linux at this time; see INSTALL
for more details.
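
To see what the heap checker counts as a leak, here is a minimal sketch, assuming the reachability semantics described in docs/heap_checker.html: in "normal" mode a leak is memory no longer reachable from any live pointer at exit, while "draconian" reports every allocation that was never freed. leak_demo.c and its functions are hypothetical. Called from a main() linked with -ltcmalloc (and compiled with the -fno-builtin-* flags from the TCMALLOC section, so gcc cannot optimize the allocations away) and run with HEAPCHECK=normal, we would expect only the leak_block() allocation to be reported; with HEAPCHECK=draconian, g_cache should be reported as well.

```c
/* leak_demo.c: a hypothetical example for trying out HEAPCHECK. */
#include <stdlib.h>
#include <string.h>

/* Still pointed to by a global at exit, so a reachability-based check
 * ("normal") should not call this a leak; "draconian" should. */
static char *g_cache = NULL;

size_t fill_cache(size_t n) {
    free(g_cache);
    g_cache = malloc(n);
    if (g_cache == NULL) return 0;
    memset(g_cache, 'c', n);
    return n;
}

/* The only pointer to this block is a local that dies when the function
 * returns, so the block becomes unreachable: a genuine leak. noinline
 * keeps the allocation in its own stack frame. */
__attribute__((noinline)) size_t leak_block(size_t n) {
    if (n == 0) return 0;
    char *p = malloc(n);
    if (p == NULL) return 0;
    memset(p, 'x', n);
    return p[n - 1] == 'x' ? n : 0;
}
```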

CPU PROFILER
------------

See docs/cpuprofile.html for information about how to use the CPU
profiler and analyze its output.

As a quick-start, do the following after installing this package:

1) Link your executable with -lprofiler
2) Run your executable with the CPUPROFILE environment var set:
     $ CPUPROFILE=/tmp/prof.out <path/to/binary> [binary args]
3) Run pprof to analyze the CPU usage
     $ pprof <path/to/binary> /tmp/prof.out      # gprof-like (-pg) text output
     $ pprof --gv <path/to/binary> /tmp/prof.out # really cool graphical output

There are other environment variables, besides CPUPROFILE, you can set
to adjust the cpu-profiler behavior; see "ENVIRONMENT VARIABLES" below.

The CPU profiler is available on all unix-based systems we've tested;
see INSTALL for more details. It is not currently available on Windows.
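
The CPU profiler is a sampling profiler: a timer interrupt fires a fixed number of times per second (CPUPROFILE_FREQUENCY), and on each one the profiler records where the program was. A minimal sketch of that mechanism, assuming the POSIX setitimer(ITIMER_PROF)/SIGPROF interface, whose timer counts the process's CPU time rather than wall-clock time; gperftools uses a profiling-timer signal of this kind by default, but this is not its code, and count_profiling_ticks() is a hypothetical name. It only counts samples instead of collecting stack traces. Because every sample is taken inside a signal handler, any stack trace collected there must unwind across a signal frame, which is the root of the x86_64 problem described under "64-BIT ISSUES".

```c
/* A hypothetical illustration of timer-signal sampling; not gperftools code. */
#define _XOPEN_SOURCE 700  /* expose sigaction() and setitimer() */
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <time.h>

static volatile sig_atomic_t g_ticks = 0;

/* Runs on a signal frame at every timer expiry. A real profiler would record
 * the interrupted stack here; this sketch only counts samples. */
static void on_sigprof(int sig) {
    (void)sig;
    g_ticks++;
}

/* Burns CPU until `wanted` SIGPROF samples arrive at `hz` samples per second
 * of CPU time, giving up after about max_cpu_seconds of CPU. Returns the
 * number of samples seen, or -1 on bad arguments or setup failure. */
int count_profiling_ticks(int hz, int wanted, double max_cpu_seconds) {
    struct sigaction sa, old_sa;
    struct itimerval timer;
    long period_us;

    if (hz < 1 || hz > 1000000) return -1;
    period_us = 1000000L / hz;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigprof;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    if (sigaction(SIGPROF, &sa, &old_sa) != 0) return -1;

    memset(&timer, 0, sizeof timer);
    timer.it_interval.tv_sec = period_us / 1000000L;
    timer.it_interval.tv_usec = period_us % 1000000L;
    timer.it_value = timer.it_interval;  /* first expiry after one period */
    g_ticks = 0;
    if (setitimer(ITIMER_PROF, &timer, NULL) != 0) {
        sigaction(SIGPROF, &old_sa, NULL);
        return -1;
    }

    clock_t start = clock();
    volatile unsigned long work = 0;     /* stands in for the profiled program */
    while (g_ticks < wanted &&
           (double)(clock() - start) / CLOCKS_PER_SEC < max_cpu_seconds)
        work++;

    memset(&timer, 0, sizeof timer);     /* an all-zero it_value disarms */
    setitimer(ITIMER_PROF, &timer, NULL);
    sigaction(SIGPROF, &old_sa, NULL);
    return g_ticks;
}
```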

NOTE: CPU profiling doesn't work after fork (unless you immediately
do an exec()-like call afterwards). Furthermore, if you do
fork, and the child calls exit(), it may corrupt the profile
data. You can use _exit() to work around this. We hope to have
a fix for both problems in the next release of perftools
(hopefully perftools 1.2).
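
The _exit() workaround from the NOTE above looks like this in practice. A minimal sketch, assuming POSIX fork()/waitpid(); run_child_with_exit_code() is a hypothetical helper. What matters is the last call in the child: exit() would run the exit-time handlers the child inherited from the parent (presumably including the profiler's own cleanup, which is how the profile data can get corrupted), whereas _exit() ends the child immediately.

```c
#define _POSIX_C_SOURCE 200809L  /* expose fork(), waitpid(), _exit() */
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child that ends with _exit() rather than exit(). _exit() skips
 * atexit() handlers, C++ static destructors and stdio flushing in the
 * child, so exit-time cleanup inherited from the parent runs only in the
 * parent. Returns the child's exit status, or -1 on error. */
int run_child_with_exit_code(int code) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        /* Child: do the child's work here, then leave without exit(). */
        _exit(code & 0xff);
    }
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A side benefit is that _exit() does not flush stdio buffers copied from the parent, so buffered output is not printed twice.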

EVERYTHING IN ONE
-----------------

If you want the CPU profiler, heap profiler, and heap leak-checker to
all be available for your application, you can do:

   gcc -o myapp ... -lprofiler -ltcmalloc

However, if you have a reason to use the static versions of the
library, this two-library linking won't work:

   gcc -o myapp ... /usr/lib/libprofiler.a /usr/lib/libtcmalloc.a  # errors!

Instead, use the special libtcmalloc_and_profiler library, which we
make for just this purpose:

   gcc -o myapp ... /usr/lib/libtcmalloc_and_profiler.a

CONFIGURATION OPTIONS
---------------------

For advanced users, there are several flags you can pass to
'./configure' that tweak tcmalloc performance. (These are in addition
to the environment variables you can set at runtime to affect
tcmalloc, described below.) See the INSTALL file for details.

ENVIRONMENT VARIABLES
---------------------

The cpu profiler, heap checker, and heap profiler will lie dormant,
using no memory or CPU, until you turn them on. (Thus, there's no
harm in linking -lprofiler into every application, and also -ltcmalloc
assuming you're OK using the non-libc malloc library.)

The easiest way to turn them on is by setting the appropriate
environment variables. We have several variables that let you
enable/disable features as well as tweak parameters.

Here are some of the most important variables:

   HEAPPROFILE=<pre>          -- turns on heap profiling and dumps data using this prefix
   HEAPCHECK=<type>           -- turns on heap checking with strictness 'type'
   CPUPROFILE=<file>          -- turns on cpu profiling and dumps data to this file
   PROFILESELECTED=1          -- if set, cpu-profiler will only profile regions of code
                                 surrounded with ProfilerEnable()/ProfilerDisable()
   CPUPROFILE_FREQUENCY=<n>   -- how many interrupts/second the cpu-profiler samples
   PERFTOOLS_VERBOSE=<level>  -- the higher the level, the more messages malloc emits
   MALLOCSTATS=<level>        -- prints memory-use stats at program-exit

For a full list of variables, see the documentation pages:

   docs/cpuprofile.html
   docs/heapprofile.html
   docs/heap_checker.html
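
These variables are generally read while the libraries initialize, before main() runs, so they should be in the environment before the process starts; calling setenv() from inside the profiled program is likely too late. A minimal sketch of a launcher that sets a variable only for the child and then execs the target, assuming POSIX fork()/execvp(); run_with_env() is a hypothetical helper.

```c
#define _POSIX_C_SOURCE 200809L  /* expose setenv(), fork(), execvp() */
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs argv[0] (searched in PATH) with name=value added to the child's
 * environment only, like `name=value argv...` in a shell. Returns the
 * child's exit status, or -1 on error. */
int run_with_env(const char *name, const char *value, char *const argv[]) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        /* setenv() after fork() is fine here because this launcher is
         * single-threaded; the variable is in place before exec, so the
         * target's libraries can see it when they initialize. */
        if (setenv(name, value, 1) != 0) _exit(127);
        execvp(argv[0], argv);
        _exit(127);  /* only reached if exec failed */
    }
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For instance, run_with_env("CPUPROFILE", "/tmp/prof.out", args), where args is a NULL-terminated argument vector naming your binary, does the same as the shell command in the CPU PROFILER quick-start.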

COMPILING ON NON-LINUX SYSTEMS
------------------------------

Perftools was developed and tested on x86 Linux systems, and it works
in its full generality only on those systems. However, we've
successfully ported much of the tcmalloc library to FreeBSD, Solaris
x86, and Darwin (Mac OS X) x86 and ppc; and we've ported the basic
functionality in tcmalloc_minimal to Windows. See INSTALL for details.
See README_windows.txt for details on the Windows port.

PERFORMANCE
-----------

If you're interested in some third-party comparisons of tcmalloc to
other malloc libraries, here are a few web pages that have been
brought to our attention. The first discusses the effect of using
various malloc libraries on OpenLDAP. The second compares tcmalloc to
win32's malloc.

   http://www.highlandsun.com/hyc/malloc/
   http://gaiacrtn.free.fr/articles/win32perftools.html

It's possible to build tcmalloc in a way that trades off faster
performance (particularly for deletes) at the cost of more memory
fragmentation (that is, more unusable memory on your system). See the
INSTALL file for details.

OLD SYSTEM ISSUES
-----------------

When compiling perftools on some old systems, like RedHat 8, you may
get an error like this:

   ___tls_get_addr: symbol not found

This means that you have a system where some parts are updated enough
to support Thread Local Storage, but others are not. The perftools
configure script can't always detect this kind of case, leading to
that error. To fix it, just comment out (or delete) the line

   #define HAVE_TLS 1

in your config.h file before building.

64-BIT ISSUES
-------------

There are two issues that can cause program hangs or crashes on x86_64
64-bit systems, which use the libunwind library to get stack-traces.
Neither issue should affect the core tcmalloc library; they both
affect the perftools tools such as cpu-profiler, heap-checker, and
heap-profiler.

1) Some libcs (at least glibc 2.4 on x86_64) have a bug where the
   libc function dl_iterate_phdr() acquires its locks in the wrong
   order. This bug should not affect tcmalloc, but may cause occasional
   deadlock with the cpu-profiler, heap-profiler, and heap-checker.
   The more dlopen() calls an executable makes, the likelier the
   deadlock. Most executables don't make any, though several library
   routines like getgrgid() call dlopen() behind the scenes.

2) On x86_64 64-bit systems, while tcmalloc itself works fine, the
   cpu-profiler tool is unreliable: it will sometimes work, but sometimes
   cause a segfault. I'll explain the problem first, and then some
   workarounds.

   Note that this only affects the cpu-profiler, which is a
   gperftools feature you must turn on manually by setting the
   CPUPROFILE environment variable. If you do not turn on cpu-profiling,
   you shouldn't see any crashes due to perftools.

   The gory details: The underlying problem is in the backtrace()
   function, which is a built-in function in libc.
   Backtracing is fairly straightforward in the normal case, but can run
   into problems when having to backtrace across a signal frame.
   Unfortunately, the cpu-profiler uses signals in order to register a
   profiling event, so every backtrace that the profiler takes crosses a
   signal frame.

   In our experience, the only time there is trouble is when the signal
   fires in the middle of pthread_mutex_lock. pthread_mutex_lock is
   called quite a bit from system libraries, particularly at program
   startup and when creating a new thread.

   The solution: The DWARF debugging format has support for 'cfi
   annotations', which make it easy to recognize a signal frame. Some OS
   distributions, such as Fedora and Gentoo 2007.0, have already added
   cfi annotations to their libc. A future version of libunwind should
   recognize these annotations; these systems should not see any
   crashes.

   Workarounds: If you see problems with crashes when running the
   cpu-profiler, consider inserting ProfilerStart()/ProfilerStop() into
   your code, rather than setting CPUPROFILE. This will profile only
   those sections of the codebase. Though we haven't done much testing,
   in theory this should reduce the chance of crashes by limiting the
   signal generation to only a small part of the codebase. Ideally, you
   would not use ProfilerStart()/ProfilerStop() around code that spawns
   new threads, or is otherwise likely to cause a call to
   pthread_mutex_lock!
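
The ProfilerStart()/ProfilerStop() workaround can be sketched as follows, assuming the declarations int ProfilerStart(const char *fname) and void ProfilerStop(void) from <gperftools/profiler.h>; profile_region(), hot_loop() and the weak declarations are our own illustration. The weak declarations (a GCC/Clang feature on ELF platforms) only let the sketch build and run, unprofiled, without -lprofiler. Because they are weak references, a linker running in --as-needed mode may decide libprofiler is unneeded and drop it; if profiling silently does not happen, try -Wl,--no-as-needed before -lprofiler, or simply include the header.

```c
/* region_profile.c: a hypothetical example of the ProfilerStart()/
 * ProfilerStop() workaround. */
#include <stddef.h>
#include <stdint.h>

/* Weak declarations of two functions from <gperftools/profiler.h>, so the
 * sketch still links and runs (unprofiled) without -lprofiler. In ordinary
 * code, include that header instead. */
extern int ProfilerStart(const char *fname) __attribute__((weak));
extern void ProfilerStop(void) __attribute__((weak));

/* Single-threaded and takes no locks: no thread creation and no
 * pthread_mutex_lock, the kind of region the workaround suggests. */
static uint64_t hot_loop(uint64_t n) {
    uint64_t x = 0;
    for (uint64_t i = 0; i < n; i++)
        x = x * 6364136223846793005ULL + i;  /* arbitrary busy work */
    return x;
}

/* Profiles only hot_loop(), writing the profile to `path` if the profiler
 * is linked in. Returns hot_loop(n); *profiled (if non-NULL) is set to 1
 * when profiling was actually started, else 0. */
uint64_t profile_region(const char *path, uint64_t n, int *profiled) {
    int on = ProfilerStart != NULL && ProfilerStop != NULL &&
             ProfilerStart(path) != 0;
    uint64_t result = hot_loop(n);
    if (on) ProfilerStop();
    if (profiled != NULL) *profiled = on;
    return result;
}
```

CPUPROFILE does not need to be set for this; link with -lprofiler and analyze the output file with pprof as in the CPU PROFILER quick-start.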

---
17 May 2011