<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<HTML>
<HEAD>
<link rel="stylesheet" href="designstyle.css">
<title>Gperftools Heap Profiler</title>
</HEAD>
<BODY>
<p align=right>
<i>Last modified
<script type=text/javascript>
var lm = new Date(document.lastModified);
document.write(lm.toDateString());
</script></i>
</p>
<p>This is the heap profiler we use at Google to explore how C++
programs manage memory. This facility can be useful for</p>
<ul>
<li> Figuring out what is in the program heap at any given time
<li> Locating memory leaks
<li> Finding places that do a lot of allocation
</ul>
<p>The profiling system instruments all allocations and frees. It
keeps track of various pieces of information per allocation site. An
allocation site is defined as the active stack trace at the call to
<code>malloc</code>, <code>calloc</code>, <code>realloc</code>, or
<code>new</code>.</p>
<p>There are three parts to using it: linking the library into an
application, running the code, and analyzing the output.</p>
<h1>Linking in the Library</h1>
<p>To install the heap profiler into your executable, add
<code>-ltcmalloc</code> to the link-time step for your executable.
Also, while we don't necessarily recommend this form of usage, it's
possible to add in the profiler at run-time using
<code>LD_PRELOAD</code>:</p>
<pre>% env LD_PRELOAD="/usr/lib/libtcmalloc.so" &lt;binary&gt;</pre>
<p>This does <i>not</i> turn on heap profiling; it just inserts the
code. For that reason, it's practical to just always link
<code>-ltcmalloc</code> into a binary while developing; that's what we
do at Google. (However, since any user can turn on the profiler by
setting an environment variable, it's not necessarily recommended to
install profiler-linked binaries into a running production
system.) Note that if you wish to use the heap profiler, you must
also use the tcmalloc memory-allocation library. There is currently
no way to use the heap profiler separately from tcmalloc.</p>
<h1>Running the Code</h1>
<p>There are two ways to turn on heap profiling
for a given run of an executable:</p>
<ol>
<li> <p>Set the environment variable <code>HEAPPROFILE</code> to the
filename prefix to dump the profile to. For instance, to profile
<code>/usr/local/bin/my_binary_compiled_with_tcmalloc</code>:</p>
<pre>% env HEAPPROFILE=/tmp/mybin.hprof /usr/local/bin/my_binary_compiled_with_tcmalloc</pre>
<li> <p>In your code, bracket the code you want profiled in calls to
<code>HeapProfilerStart()</code> and <code>HeapProfilerStop()</code>.
(These functions are declared in <code>&lt;gperftools/heap-profiler.h&gt;</code>.)
<code>HeapProfilerStart()</code> takes the
profile-filename-prefix as an argument. Then, as often as
you'd like before calling <code>HeapProfilerStop()</code>, you
can use <code>HeapProfilerDump()</code> or
<code>GetHeapProfile()</code> to examine the profile. In case
it's useful, <code>IsHeapProfilerRunning()</code> will tell you
whether you've already called <code>HeapProfilerStart()</code> or not.</p>
</ol>
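<p>As a rough illustration of the second option, the sketch below brackets
one phase of work with the profiler calls. The <code>USE_GPERFTOOLS</code>
macro, the <code>ProfiledPhase</code> function, and the stand-in stubs are
assumptions made for this example so that it also compiles without
gperftools; only the four profiler function names come from
<code>&lt;gperftools/heap-profiler.h&gt;</code>.</p>

```cpp
#include <cstddef>
#include <string>
#include <vector>

#ifdef USE_GPERFTOOLS  // hypothetical macro: define it when linking -ltcmalloc
#include <gperftools/heap-profiler.h>
#else
// Stand-in stubs (assumptions for illustration only) so the sketch
// compiles and runs on a machine without gperftools installed.
static bool g_stub_running = false;
static void HeapProfilerStart(const char*) { g_stub_running = true; }
static void HeapProfilerStop() { g_stub_running = false; }
static void HeapProfilerDump(const char*) {}
static int IsHeapProfilerRunning() { return g_stub_running ? 1 : 0; }
#endif

// Profiles only the allocations made while filling `data`.
// `prefix` is the profile-filename-prefix handed to HeapProfilerStart().
std::size_t ProfiledPhase(const char* prefix) {
  if (!IsHeapProfilerRunning()) HeapProfilerStart(prefix);

  std::vector<std::string> data;
  for (int i = 0; i < 1000; ++i) data.push_back(std::string(64, 'x'));

  // Ask for a profile while the allocations above are still live.
  HeapProfilerDump("after filling data");

  std::size_t count = data.size();
  HeapProfilerStop();
  return count;
}
```

<p>Built with <code>-DUSE_GPERFTOOLS -ltcmalloc</code>, a call such as
<code>ProfiledPhase("/tmp/mybin.hprof")</code> would be expected to write a
profile file with that prefix at the <code>HeapProfilerDump()</code> call.</p>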
<p>For security reasons, heap profiling will not write to a file --
and is thus not usable -- for setuid programs.</p>
<H2>Modifying Runtime Behavior</H2>
<p>You can more finely control the behavior of the heap profiler via
environment variables.</p>
<table frame=box rules=sides cellpadding=5 width=100%>
<tr valign=top>
<td><code>HEAP_PROFILE_ALLOCATION_INTERVAL</code></td>
<td>default: 1073741824 (1 GB)</td>
<td>
Dump heap profiling information each time the specified number of
bytes has been allocated by the program.
</td>
</tr>
<tr valign=top>
<td><code>HEAP_PROFILE_INUSE_INTERVAL</code></td>
<td>default: 104857600 (100 MB)</td>
<td>
Dump heap profiling information whenever the high-water memory
usage mark increases by the specified number of bytes.
</td>
</tr>
<tr valign=top>
<td><code>HEAP_PROFILE_TIME_INTERVAL</code></td>
<td>default: 0</td>
<td>
Dump heap profiling information each time the specified
number of seconds has elapsed.
</td>
</tr>
<tr valign=top>
<td><code>HEAPPROFILESIGNAL</code></td>
<td>default: disabled</td>
<td>
Dump heap profiling information whenever the specified signal is sent to the
process.
</td>
</tr>
<tr valign=top>
<td><code>HEAP_PROFILE_MMAP</code></td>
<td>default: false</td>
<td>
Profile <code>mmap</code>, <code>mremap</code>, and <code>sbrk</code>
calls in addition
to <code>malloc</code>, <code>calloc</code>, <code>realloc</code>,
and <code>new</code>. <b>NOTE:</b> this causes the profiler to
profile calls internal to tcmalloc, since tcmalloc and friends use
<code>mmap</code> and <code>sbrk</code> internally for allocations. One partial solution is
to filter these allocations out when running <code>pprof</code>,
with something like
<code>pprof --ignore='DoAllocWithArena|SbrkSysAllocator::Alloc|MmapSysAllocator::Alloc'</code>.
</td>
</tr>
<tr valign=top>
<td><code>HEAP_PROFILE_ONLY_MMAP</code></td>
<td>default: false</td>
<td>
Only profile <code>mmap</code>, <code>mremap</code>, and <code>sbrk</code>
calls; do not profile
<code>malloc</code>, <code>calloc</code>, <code>realloc</code>,
or <code>new</code>.
</td>
</tr>
<tr valign=top>
<td><code>HEAP_PROFILE_MMAP_LOG</code></td>
<td>default: false</td>
<td>
Log <code>mmap</code>/<code>munmap</code> calls.
</td>
</tr>
</table>
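<p>The difference between the allocation interval and the in-use interval
can be easy to miss, so here is a simplified model of the two triggers as
the table describes them. It is a sketch under stated assumptions, not the
gperftools source: the <code>DumpTriggerModel</code> class and its method
names are invented for illustration.</p>

```cpp
#include <cstdint>

// Simplified model (an assumption, not the library's code) of when a dump
// would be triggered by the two byte-based intervals in the table.
class DumpTriggerModel {
 public:
  DumpTriggerModel(std::int64_t allocation_interval,
                   std::int64_t inuse_interval)
      : allocation_interval_(allocation_interval),
        inuse_interval_(inuse_interval) {}

  void OnAlloc(std::int64_t bytes) {
    total_allocated_ += bytes;  // never decreases, even after frees
    in_use_ += bytes;
    MaybeDump();
  }

  void OnFree(std::int64_t bytes) { in_use_ -= bytes; }

  int dumps() const { return dumps_; }

 private:
  void MaybeDump() {
    bool dump = false;
    // Models HEAP_PROFILE_ALLOCATION_INTERVAL: cumulative bytes allocated.
    if (allocation_interval_ > 0 &&
        total_allocated_ >= last_dump_allocated_ + allocation_interval_) {
      last_dump_allocated_ = total_allocated_;
      dump = true;
    }
    // Models HEAP_PROFILE_INUSE_INTERVAL: growth of the high-water mark.
    if (inuse_interval_ > 0 &&
        in_use_ > high_water_mark_ + inuse_interval_) {
      high_water_mark_ = in_use_;
      dump = true;
    }
    if (dump) ++dumps_;
  }

  std::int64_t allocation_interval_;
  std::int64_t inuse_interval_;
  std::int64_t total_allocated_ = 0;
  std::int64_t in_use_ = 0;
  std::int64_t last_dump_allocated_ = 0;
  std::int64_t high_water_mark_ = 0;
  int dumps_ = 0;
};
```

<p>In this model, a program that repeatedly allocates and frees the same
60 bytes eventually crosses a 100-byte allocation interval, but never
crosses a 100-byte in-use interval, because its live memory never grows.</p>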
<H2>Checking for Leaks</H2>
<p>You can use the heap profiler to manually check for leaks, for
instance by reading the profiler output and looking for large
allocations. However, for that task, it's easier to use the <A
HREF="heap_checker.html">automatic heap-checking facility</A> built
into tcmalloc.</p>
<h1><a name="pprof">Analyzing the Output</a></h1>
<p>If heap-profiling is turned on in a program, the program will
periodically write profiles to the filesystem. The sequence of
profiles will be named:</p>
<pre>
&lt;prefix&gt;.0000.heap
&lt;prefix&gt;.0001.heap
&lt;prefix&gt;.0002.heap
...
</pre>
<p>where <code>&lt;prefix&gt;</code> is the filename-prefix supplied
when running the code (e.g. via the <code>HEAPPROFILE</code>
environment variable). Note that if the supplied prefix
does not start with a <code>/</code>, the profile files will be
written to the program's working directory.</p>
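<p>If a script or test needs to predict these names, the pattern above can
be reproduced with a small helper. This is a minimal sketch;
<code>ProfileFileName</code> is a hypothetical function written for this
example, not part of gperftools.</p>

```cpp
#include <cstdio>
#include <string>

// Builds "<prefix>.NNNN.heap" for a given sequence number, following the
// naming pattern listed above. Hypothetical helper, for illustration only.
std::string ProfileFileName(const std::string& prefix, int sequence) {
  char suffix[32];
  // %04d zero-pads the sequence number to four digits.
  std::snprintf(suffix, sizeof(suffix), ".%04d.heap", sequence);
  return prefix + suffix;
}
```

<p>For example, <code>ProfileFileName("/tmp/profile", 100)</code> yields
<code>/tmp/profile.0100.heap</code>.</p>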
<p>The profile output can be viewed by passing it to the
<code>pprof</code> tool -- the same tool that's used to analyze <A
HREF="cpuprofile.html">CPU profiles</A>.</p>
<p>Here are some examples. These examples assume the binary is named
<code>gfs_master</code>, and a sequence of heap profile files can be
found in files named:</p>
<pre>
/tmp/profile.0001.heap
/tmp/profile.0002.heap
...
/tmp/profile.0100.heap
</pre>
<h3>Why is a process so big?</h3>
<pre>
% pprof --gv gfs_master /tmp/profile.0100.heap
</pre>
<p>This command will pop up a <code>gv</code> window that displays
the profile information as a directed graph. Here is a portion
of the resulting output:</p>
<p><center>
<img src="heap-example1.png">
</center></p>
<p>A few explanations:</p>
<ul>
<li> <code>GFS_MasterChunk::AddServer</code> accounts for 255.6 MB
of the live memory, which is 25% of the total live memory.
<li> <code>GFS_MasterChunkTable::UpdateState</code> is directly
accountable for 176.2 MB of the live memory (i.e., it directly
allocated 176.2 MB that has not been freed yet). Furthermore,
it and its callees are responsible for 729.9 MB. The
labels on the outgoing edges give a good indication of the
amount allocated by each callee.
</ul>
<h3>Comparing Profiles</h3>
<p>You often want to skip allocations during the initialization phase
of a program so you can find gradual memory leaks. One simple way to
do this is to compare two profiles -- both collected after the program
has been running for a while. Specify the name of the first profile
using the <code>--base</code> option. For example:</p>
<pre>
% pprof --base=/tmp/profile.0004.heap gfs_master /tmp/profile.0100.heap
</pre>
<p>The memory-usage in <code>/tmp/profile.0004.heap</code> will be
subtracted from the memory-usage in
<code>/tmp/profile.0100.heap</code> and the result will be
displayed.</p>
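<p>Conceptually, the subtraction works per allocation site. The sketch
below models that idea only; it is not how <code>pprof</code> is
implemented, and the <code>Profile</code> alias and
<code>SubtractBase</code> function are hypothetical names. Sites that
shrank, or that appear only in the base profile, come out negative in this
model.</p>

```cpp
#include <map>
#include <string>

// Allocation site name -> in-use megabytes (a deliberately simplified view).
using Profile = std::map<std::string, double>;

// Returns `later` minus `base`, site by site. Hypothetical helper.
Profile SubtractBase(const Profile& later, const Profile& base) {
  Profile diff = later;
  for (const auto& entry : base) {
    // operator[] creates a zero entry for sites missing from `later`.
    diff[entry.first] -= entry.second;
  }
  return diff;
}
```

<p>A site that holds steady between the two profiles drops to zero, so
whatever remains is growth since the base profile was taken.</p>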
<h3>Text display</h3>
<pre>
% pprof --text gfs_master /tmp/profile.0100.heap
   255.6  24.7%  24.7%    255.6  24.7% GFS_MasterChunk::AddServer
   184.6  17.8%  42.5%    298.8  28.8% GFS_MasterChunkTable::Create
   176.2  17.0%  59.5%    729.9  70.5% GFS_MasterChunkTable::UpdateState
   169.8  16.4%  75.9%    169.8  16.4% PendingClone::PendingClone
    76.3   7.4%  83.3%     76.3   7.4% __default_alloc_template::_S_chunk_alloc
    49.5   4.8%  88.0%     49.5   4.8% hashtable::resize
   ...
</pre>
<ul>
<li> The first column contains the direct memory use in MB.
<li> The fourth column contains memory use by the procedure
and all of its callees.
<li> The second and fifth columns are just percentage
representations of the numbers in the first and fourth columns.
<li> The third column is a cumulative sum of the second column
(i.e., the <code>k</code>th entry in the third column is the
sum of the first <code>k</code> entries in the second column).
</ul>
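<p>The relationship between the first three columns can be written out
directly. The following is a minimal sketch of that arithmetic, assuming
the total live memory is known; <code>Row</code> and
<code>BuildColumns</code> are names made up for this example.</p>

```cpp
#include <vector>

struct Row {
  double direct_mb;           // column 1: direct memory use in MB
  double percent;             // column 2: column 1 as a share of the total
  double cumulative_percent;  // column 3: running sum of column 2
};

// Derives columns 2 and 3 from the direct sizes. Hypothetical helper.
std::vector<Row> BuildColumns(const std::vector<double>& direct_mb,
                              double total_mb) {
  std::vector<Row> rows;
  double running = 0.0;
  for (double mb : direct_mb) {
    double pct = 100.0 * mb / total_mb;
    running += pct;  // the k-th value sums the first k percentages
    rows.push_back(Row{mb, pct, running});
  }
  return rows;
}
```

<p>With direct sizes of 50, 30, and 20 MB out of a 100 MB total, the third
column reads 50, 80, and 100.</p>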
<h3>Ignoring or focusing on specific regions</h3>
<p>The following command will give a graphical display of a subset of
the call-graph. Only paths in the call-graph that match the regular
expression <code>DataBuffer</code> are included:</p>
<pre>
% pprof --gv --focus=DataBuffer gfs_master /tmp/profile.0100.heap
</pre>
<p>Similarly, the following command will omit a subset of the
call-graph. All paths in the call-graph that match the regular
expression <code>DataBuffer</code> are discarded:</p>
<pre>
% pprof --gv --ignore=DataBuffer gfs_master /tmp/profile.0100.heap
</pre>
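<p>To make the path-matching rule concrete, here is a small model of it.
It assumes a path "matches" when any frame on it matches the regular
expression; that reading, along with the <code>StackPath</code> alias and
the two function names, is an assumption of this sketch rather than a
description of <code>pprof</code> internals.</p>

```cpp
#include <regex>
#include <string>
#include <vector>

// Frames of one call path, from outermost caller to the allocation site.
using StackPath = std::vector<std::string>;

// True if any frame on the path matches the pattern.
bool AnyFrameMatches(const StackPath& path, const std::regex& pattern) {
  for (const std::string& frame : path) {
    if (std::regex_search(frame, pattern)) return true;
  }
  return false;
}

// Keeps paths that match `focus` and do not match `ignore`.
// An empty string means that option was not given. Hypothetical helper.
std::vector<StackPath> FilterPaths(const std::vector<StackPath>& paths,
                                   const std::string& focus,
                                   const std::string& ignore) {
  std::vector<StackPath> kept;
  for (const StackPath& path : paths) {
    if (!focus.empty() && !AnyFrameMatches(path, std::regex(focus))) continue;
    if (!ignore.empty() && AnyFrameMatches(path, std::regex(ignore))) continue;
    kept.push_back(path);
  }
  return kept;
}
```

<p>Given one path through <code>DataBuffer::Append</code> and one that
avoids it, focusing on <code>DataBuffer</code> keeps only the first, and
ignoring <code>DataBuffer</code> keeps only the second.</p>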
<h3>Total allocations + object-level information</h3>
<p>All of the previous examples have displayed the amount of in-use
space, i.e., the number of bytes that have been allocated but not
freed. You can also get other types of information by supplying a
flag to <code>pprof</code>:</p>
<center>
<table frame=box rules=sides cellpadding=5 width=100%>
<tr valign=top>
<td><code>--inuse_space</code></td>
<td>
Display the number of in-use megabytes (i.e. space that has
been allocated but not freed). This is the default.
</td>
</tr>
<tr valign=top>
<td><code>--inuse_objects</code></td>
<td>
Display the number of in-use objects (i.e. number of
objects that have been allocated but not freed).
</td>
</tr>
<tr valign=top>
<td><code>--alloc_space</code></td>
<td>
Display the number of allocated megabytes. This includes
the space that has since been de-allocated. Use this
if you want to find the main allocation sites in the
program.
</td>
</tr>
<tr valign=top>
<td><code>--alloc_objects</code></td>
<td>
Display the number of allocated objects. This includes
the objects that have since been de-allocated. Use this
if you want to find the main allocation sites in the
program.
</td>
</tr>
</table>
</center>
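<p>The four flags are views over the same per-site bookkeeping. The sketch
below shows how they relate, assuming each site tracks allocation and free
counts and byte totals; the <code>SiteCounters</code> struct and the
function names are hypothetical and chosen to mirror the flag names.</p>

```cpp
#include <cstdint>

// Per-allocation-site counters (an assumed layout, for illustration).
struct SiteCounters {
  std::int64_t allocs;       // number of objects ever allocated
  std::int64_t alloc_bytes;  // bytes ever allocated
  std::int64_t frees;        // number of objects freed so far
  std::int64_t freed_bytes;  // bytes freed so far
};

// --inuse_objects: allocated but not yet freed.
std::int64_t InuseObjects(const SiteCounters& s) { return s.allocs - s.frees; }

// --inuse_space: bytes allocated but not yet freed (the default view).
std::int64_t InuseSpace(const SiteCounters& s) {
  return s.alloc_bytes - s.freed_bytes;
}

// --alloc_objects: everything ever allocated, including freed objects.
std::int64_t AllocObjects(const SiteCounters& s) { return s.allocs; }

// --alloc_space: every byte ever allocated, including freed space.
std::int64_t AllocSpace(const SiteCounters& s) { return s.alloc_bytes; }
```

<p>A site that allocates and frees heavily can look small under
<code>--inuse_space</code> yet dominate <code>--alloc_space</code>, which
is why the allocation views are the ones to use when hunting for the
busiest allocation sites.</p>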
<h3>Interactive mode</h3>
<p>By default -- if you don't specify any flags to the contrary --
<code>pprof</code> runs in interactive mode. At the <code>(pprof)</code> prompt,
you can run many of the commands described above. You can type
<code>help</code> for a list of the commands available in
interactive mode.</p>
<h1>Caveats</h1>
<ul>
<li> Heap profiling requires the use of libtcmalloc. This
requirement may be removed in a future version of the heap
profiler, and the heap profiler separated out into its own
library.
<li> If the program linked in a library that was not compiled
with enough symbolic information, all samples associated
with the library may be charged to the last symbol found
in the program before the library. This will artificially
inflate the count for that symbol.
<li> If you run the program on one machine, and profile it on
another, and the shared libraries are different on the two
machines, the profiling output may be confusing: samples that
fall within the shared libraries may be assigned to arbitrary
procedures.
<li> Several libraries, such as some STL implementations, do their
own memory management. This may cause strange profiling
results. We have code in libtcmalloc to cause STL to use
tcmalloc for memory management (which in our tests is better
than STL's internal management), though it only works for some
STL implementations.
<li> If your program forks, the children will also be profiled
(since they inherit the same <code>HEAPPROFILE</code> setting). Each
process is profiled separately; to distinguish the child
profiles from the parent profile and from each other, all
children will have their process-id attached to the <code>HEAPPROFILE</code>
name.
<li> Due to a hack we make to work around a possible gcc bug, your
profiles may end up named strangely if the first character of
your <code>HEAPPROFILE</code> variable has an ASCII value greater than 127.
This should be exceedingly rare, but if you need to use such a
name, just prepend <code>./</code> to your filename:
<code>HEAPPROFILE=./&Auml;gypten</code>.
</ul>
<hr>
<address>Sanjay Ghemawat
<!-- Created: Tue Dec 19 10:43:14 PST 2000 -->
</address>
</body>
</html>