cpuprofile.html 19 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493494495496497498499500501502503504505506507508509510511512513514515516517518519520521522523524525526527528529530531532533534535536537
  1. <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
  2. <HTML>
  3. <HEAD>
  4. <link rel="stylesheet" href="designstyle.css">
  5. <title>Gperftools CPU Profiler</title>
  6. </HEAD>
  7. <BODY>
  8. <p align=right>
  9. <i>Last modified
  10. <script type=text/javascript>
  11. var lm = new Date(document.lastModified);
  12. document.write(lm.toDateString());
  13. </script></i>
  14. </p>
  15. <p>This is the CPU profiler we use at Google. There are three parts
  16. to using it: linking the library into an application, running the
  17. code, and analyzing the output.</p>
  18. <p>On the off-chance that you should need to understand it, the CPU
  19. profiler data file format is documented separately,
  20. <a href="cpuprofile-fileformat.html">here</a>.
  21. <H1>Linking in the Library</H1>
  22. <p>To install the CPU profiler into your executable, add
  23. <code>-lprofiler</code> to the link-time step for your executable.
  24. (It's also probably possible to add in the profiler at run-time using
  25. <code>LD_PRELOAD</code>, e.g.
  26. <code>% env LD_PRELOAD="/usr/lib/libprofiler.so" &lt;binary&gt;</code>,
  27. but this isn't necessarily recommended.)</p>
  28. <p>This does <i>not</i> turn on CPU profiling; it just inserts the
  29. code. For that reason, it's practical to just always link
  30. <code>-lprofiler</code> into a binary while developing; that's what we
  31. do at Google. (However, since any user can turn on the profiler by
  32. setting an environment variable, it's not necessarily recommended to
  33. install profiler-linked binaries into a production, running
  34. system.)</p>
  35. <H1>Running the Code</H1>
  36. <p>There are several alternatives to actually turn on CPU profiling
  37. for a given run of an executable:</p>
  38. <ol>
  39. <li> <p>Define the environment variable CPUPROFILE to the filename
  40. to dump the profile to. For instance, if you had a version of
  41. <code>/bin/ls</code> that had been linked against libprofiler,
  42. you could run:</p>
  43. <pre>% env CPUPROFILE=ls.prof /bin/ls</pre>
  44. </li>
  45. <li> <p>In addition to defining the environment variable CPUPROFILE
  46. you can also define CPUPROFILESIGNAL. This allows profiling to be
  47. controlled via the signal number that you specify. The signal number
  48. must be unused by the program under normal operation. Internally it
  49. acts as a switch, triggered by the signal, which is off by default.
  50. For instance, if you had a copy of <code>/bin/chrome</code> that had been
  51. been linked against libprofiler, you could run:</p>
  52. <pre>% env CPUPROFILE=chrome.prof CPUPROFILESIGNAL=12 /bin/chrome &</pre>
  53. <p>You can then trigger profiling to start:</p>
  54. <pre>% killall -12 chrome</pre>
  55. <p>Then after a period of time you can tell it to stop which will
  56. generate the profile:</p>
  57. <pre>% killall -12 chrome</pre>
  58. </li>
  59. <li> <p>In your code, bracket the code you want profiled in calls to
  60. <code>ProfilerStart()</code> and <code>ProfilerStop()</code>.
  61. (These functions are declared in <code>&lt;gperftools/profiler.h&gt;</code>.)
  62. <code>ProfilerStart()</code> will take
  63. the profile-filename as an argument.</p>
  64. </li>
  65. </ol>
  66. <p>In Linux 2.6 and above, profiling works correctly with threads,
  67. automatically profiling all threads. In Linux 2.4, profiling only
  68. profiles the main thread (due to a kernel bug involving itimers and
  69. threads). Profiling works correctly with sub-processes: each child
  70. process gets its own profile with its own name (generated by combining
  71. CPUPROFILE with the child's process id).</p>
  72. <p>For security reasons, CPU profiling will not write to a file -- and
  73. is thus not usable -- for setuid programs.</p>
  74. <p>See the include-file <code>gperftools/profiler.h</code> for
  75. advanced-use functions, including <code>ProfilerFlush()</code> and
  76. <code>ProfilerStartWithOptions()</code>.</p>
  77. <H2>Modifying Runtime Behavior</H2>
  78. <p>You can more finely control the behavior of the CPU profiler via
  79. environment variables.</p>
  80. <table frame=box rules=sides cellpadding=5 width=100%>
  81. <tr valign=top>
  82. <td><code>CPUPROFILE_FREQUENCY=<i>x</i></code></td>
  83. <td>default: 100</td>
  84. <td>
  85. How many interrupts/second the cpu-profiler samples.
  86. </td>
  87. </tr>
  88. <tr valign=top>
  89. <td><code>CPUPROFILE_REALTIME=1</code></td>
  90. <td>default: [not set]</td>
  91. <td>
  92. If set to any value (including 0 or the empty string), use
  93. ITIMER_REAL instead of ITIMER_PROF to gather profiles. In
  94. general, ITIMER_REAL is not as accurate as ITIMER_PROF, and also
  95. interacts badly with use of alarm(), so prefer ITIMER_PROF unless
  96. you have a reason prefer ITIMER_REAL.
  97. </td>
  98. </tr>
  99. </table>
  100. <h1><a name="pprof">Analyzing the Output</a></h1>
  101. <p><code>pprof</code> is the script used to analyze a profile. It has
  102. many output modes, both textual and graphical. Some give just raw
  103. numbers, much like the <code>-pg</code> output of <code>gcc</code>,
  104. and others show the data in the form of a dependency graph.</p>
  105. <p>pprof <b>requires</b> <code>perl5</code> to be installed to run.
  106. It also requires <code>dot</code> to be installed for any of the
  107. graphical output routines, and <code>gv</code> to be installed for
  108. <code>--gv</code> mode (described below).
  109. </p>
  110. <p>Here are some ways to call pprof. These are described in more
  111. detail below.</p>
  112. <pre>
  113. % pprof /bin/ls ls.prof
  114. Enters "interactive" mode
  115. % pprof --text /bin/ls ls.prof
  116. Outputs one line per procedure
  117. % pprof --gv /bin/ls ls.prof
  118. Displays annotated call-graph via 'gv'
  119. % pprof --gv --focus=Mutex /bin/ls ls.prof
  120. Restricts to code paths including a .*Mutex.* entry
  121. % pprof --gv --focus=Mutex --ignore=string /bin/ls ls.prof
  122. Code paths including Mutex but not string
  123. % pprof --list=getdir /bin/ls ls.prof
  124. (Per-line) annotated source listing for getdir()
  125. % pprof --disasm=getdir /bin/ls ls.prof
  126. (Per-PC) annotated disassembly for getdir()
  127. % pprof --text localhost:1234
  128. Outputs one line per procedure for localhost:1234
  129. % pprof --callgrind /bin/ls ls.prof
  130. Outputs the call information in callgrind format
  131. </pre>
  132. <h3>Analyzing Text Output</h3>
  133. <p>Text mode has lines of output that look like this:</p>
  134. <pre>
  135. 14 2.1% 17.2% 58 8.7% std::_Rb_tree::find
  136. </pre>
  137. <p>Here is how to interpret the columns:</p>
  138. <ol>
  139. <li> Number of profiling samples in this function
  140. <li> Percentage of profiling samples in this function
  141. <li> Percentage of profiling samples in the functions printed so far
  142. <li> Number of profiling samples in this function and its callees
  143. <li> Percentage of profiling samples in this function and its callees
  144. <li> Function name
  145. </ol>
  146. <h3>Analyzing Callgrind Output</h3>
  147. <p>Use <a href="http://kcachegrind.sourceforge.net">kcachegrind</a> to
  148. analyze your callgrind output:</p>
  149. <pre>
  150. % pprof --callgrind /bin/ls ls.prof > ls.callgrind
  151. % kcachegrind ls.callgrind
  152. </pre>
  153. <p>The cost is specified in 'hits', i.e. how many times a function
  154. appears in the recorded call stack information. The 'calls' from
  155. function a to b record how many times function b was found in the
  156. stack traces directly below function a.</p>
  157. <p>Tip: if you use a debug build the output will include file and line
  158. number information and kcachegrind will show an annotated source
  159. code view.</p>
  160. <h3>Node Information</h3>
  161. <p>In the various graphical modes of pprof, the output is a call graph
  162. annotated with timing information, like so:</p>
  163. <A HREF="pprof-test-big.gif">
  164. <center><table><tr><td>
  165. <img src="pprof-test.gif">
  166. </td></tr></table></center>
  167. </A>
  168. <p>Each node represents a procedure. The directed edges indicate
  169. caller to callee relations. Each node is formatted as follows:</p>
  170. <center><pre>
  171. Class Name
  172. Method Name
  173. local (percentage)
  174. <b>of</b> cumulative (percentage)
  175. </pre></center>
  176. <p>The last one or two lines contains the timing information. (The
  177. profiling is done via a sampling method, where by default we take 100
  178. samples a second. Therefor one unit of time in the output corresponds
  179. to about 10 milliseconds of execution time.) The "local" time is the
  180. time spent executing the instructions directly contained in the
  181. procedure (and in any other procedures that were inlined into the
  182. procedure). The "cumulative" time is the sum of the "local" time and
  183. the time spent in any callees. If the cumulative time is the same as
  184. the local time, it is not printed.</p>
  185. <p>For instance, the timing information for test_main_thread()
  186. indicates that 155 units (about 1.55 seconds) were spent executing the
  187. code in <code>test_main_thread()</code> and 200 units were spent while
  188. executing <code>test_main_thread()</code> and its callees such as
  189. <code>snprintf()</code>.</p>
  190. <p>The size of the node is proportional to the local count. The
  191. percentage displayed in the node corresponds to the count divided by
  192. the total run time of the program (that is, the cumulative count for
  193. <code>main()</code>).</p>
  194. <h3>Edge Information</h3>
  195. <p>An edge from one node to another indicates a caller to callee
  196. relationship. Each edge is labelled with the time spent by the callee
  197. on behalf of the caller. E.g, the edge from
  198. <code>test_main_thread()</code> to <code>snprintf()</code> indicates
  199. that of the 200 samples in <code>test_main_thread()</code>, 37 are
  200. because of calls to <code>snprintf()</code>.</p>
  201. <p>Note that <code>test_main_thread()</code> has an edge to
  202. <code>vsnprintf()</code>, even though <code>test_main_thread()</code>
  203. doesn't call that function directly. This is because the code was
  204. compiled with <code>-O2</code>; the profile reflects the optimized
  205. control flow.</p>
  206. <h3>Meta Information</h3>
  207. <p>The top of the display should contain some meta information
  208. like:</p>
  209. <pre>
  210. /tmp/profiler2_unittest
  211. Total samples: 202
  212. Focusing on: 202
  213. Dropped nodes with &lt;= 1 abs(samples)
  214. Dropped edges with &lt;= 0 samples
  215. </pre>
  216. <p>This section contains the name of the program, and the total
  217. samples collected during the profiling run. If the
  218. <code>--focus</code> option is on (see the <a href="#focus">Focus</a>
  219. section below), the legend also contains the number of samples being
  220. shown in the focused display. Furthermore, some unimportant nodes and
  221. edges are dropped to reduce clutter. The characteristics of the
  222. dropped nodes and edges are also displayed in the legend.</p>
  223. <h3><a name=focus>Focus and Ignore</a></h3>
  224. <p>You can ask pprof to generate a display focused on a particular
  225. piece of the program. You specify a regular expression. Any portion
  226. of the call-graph that is on a path which contains at least one node
  227. matching the regular expression is preserved. The rest of the
  228. call-graph is dropped on the floor. For example, you can focus on the
  229. <code>vsnprintf()</code> libc call in <code>profiler2_unittest</code>
  230. as follows:</p>
  231. <pre>
  232. % pprof --gv --focus=vsnprintf /tmp/profiler2_unittest test.prof
  233. </pre>
  234. <A HREF="pprof-vsnprintf-big.gif">
  235. <center><table><tr><td>
  236. <img src="pprof-vsnprintf.gif">
  237. </td></tr></table></center>
  238. </A>
  239. <p>Similarly, you can supply the <code>--ignore</code> option to
  240. ignore samples that match a specified regular expression. E.g., if
  241. you are interested in everything except calls to
  242. <code>snprintf()</code>, you can say:</p>
  243. <pre>
  244. % pprof --gv --ignore=snprintf /tmp/profiler2_unittest test.prof
  245. </pre>
  246. <h3>Interactive mode</a></h3>
  247. <p>By default -- if you don't specify any flags to the contrary --
  248. pprof runs in interactive mode. At the <code>(pprof)</code> prompt,
  249. you can run many of the commands described above. You can type
  250. <code>help</code> for a list of what commands are available in
  251. interactive mode.</p>
  252. <h3><a name=options>pprof Options</a></h3>
  253. For a complete list of pprof options, you can run <code>pprof
  254. --help</code>.
  255. <h4>Output Type</h4>
  256. <p>
  257. <center>
  258. <table frame=box rules=sides cellpadding=5 width=100%>
  259. <tr valign=top>
  260. <td><code>--text</code></td>
  261. <td>
  262. Produces a textual listing. (Note: If you have an X display, and
  263. <code>dot</code> and <code>gv</code> installed, you will probably
  264. be happier with the <code>--gv</code> output.)
  265. </td>
  266. </tr>
  267. <tr valign=top>
  268. <td><code>--gv</code></td>
  269. <td>
  270. Generates annotated call-graph, converts to postscript, and
  271. displays via gv (requres <code>dot</code> and <code>gv</code> be
  272. installed).
  273. </td>
  274. </tr>
  275. <tr valign=top>
  276. <td><code>--dot</code></td>
  277. <td>
  278. Generates the annotated call-graph in dot format and
  279. emits to stdout (requres <code>dot</code> be installed).
  280. </td>
  281. </tr>
  282. <tr valign=top>
  283. <td><code>--ps</code></td>
  284. <td>
  285. Generates the annotated call-graph in Postscript format and
  286. emits to stdout (requres <code>dot</code> be installed).
  287. </td>
  288. </tr>
  289. <tr valign=top>
  290. <td><code>--pdf</code></td>
  291. <td>
  292. Generates the annotated call-graph in PDF format and emits to
  293. stdout (requires <code>dot</code> and <code>ps2pdf</code> be
  294. installed).
  295. </td>
  296. </tr>
  297. <tr valign=top>
  298. <td><code>--gif</code></td>
  299. <td>
  300. Generates the annotated call-graph in GIF format and
  301. emits to stdout (requres <code>dot</code> be installed).
  302. </td>
  303. </tr>
  304. <tr valign=top>
  305. <td><code>--list=&lt;<i>regexp</i>&gt;</code></td>
  306. <td>
  307. <p>Outputs source-code listing of routines whose
  308. name matches &lt;regexp&gt;. Each line
  309. in the listing is annotated with flat and cumulative
  310. sample counts.</p>
  311. <p>In the presence of inlined calls, the samples
  312. associated with inlined code tend to get assigned
  313. to a line that follows the location of the
  314. inlined call. A more precise accounting can be
  315. obtained by disassembling the routine using the
  316. --disasm flag.</p>
  317. </td>
  318. </tr>
  319. <tr valign=top>
  320. <td><code>--disasm=&lt;<i>regexp</i>&gt;</code></td>
  321. <td>
  322. Generates disassembly of routines that match
  323. &lt;regexp&gt;, annotated with flat and
  324. cumulative sample counts and emits to stdout.
  325. </td>
  326. </tr>
  327. </table>
  328. </center>
  329. <h4>Reporting Granularity</h4>
  330. <p>By default, pprof produces one entry per procedure. However you can
  331. use one of the following options to change the granularity of the
  332. output. The <code>--files</code> option seems to be particularly
  333. useless, and may be removed eventually.</p>
  334. <center>
  335. <table frame=box rules=sides cellpadding=5 width=100%>
  336. <tr valign=top>
  337. <td><code>--addresses</code></td>
  338. <td>
  339. Produce one node per program address.
  340. </td>
  341. </tr>
  342. <td><code>--lines</code></td>
  343. <td>
  344. Produce one node per source line.
  345. </td>
  346. </tr>
  347. <td><code>--functions</code></td>
  348. <td>
  349. Produce one node per function (this is the default).
  350. </td>
  351. </tr>
  352. <td><code>--files</code></td>
  353. <td>
  354. Produce one node per source file.
  355. </td>
  356. </tr>
  357. </table>
  358. </center>
  359. <h4>Controlling the Call Graph Display</h4>
  360. <p>Some nodes and edges are dropped to reduce clutter in the output
  361. display. The following options control this effect:</p>
  362. <center>
  363. <table frame=box rules=sides cellpadding=5 width=100%>
  364. <tr valign=top>
  365. <td><code>--nodecount=&lt;n&gt;</code></td>
  366. <td>
  367. This option controls the number of displayed nodes. The nodes
  368. are first sorted by decreasing cumulative count, and then only
  369. the top N nodes are kept. The default value is 80.
  370. </td>
  371. </tr>
  372. <tr valign=top>
  373. <td><code>--nodefraction=&lt;f&gt;</code></td>
  374. <td>
  375. This option provides another mechanism for discarding nodes
  376. from the display. If the cumulative count for a node is
  377. less than this option's value multiplied by the total count
  378. for the profile, the node is dropped. The default value
  379. is 0.005; i.e. nodes that account for less than
  380. half a percent of the total time are dropped. A node
  381. is dropped if either this condition is satisfied, or the
  382. --nodecount condition is satisfied.
  383. </td>
  384. </tr>
  385. <tr valign=top>
  386. <td><code>--edgefraction=&lt;f&gt;</code></td>
  387. <td>
  388. This option controls the number of displayed edges. First of all,
  389. an edge is dropped if either its source or destination node is
  390. dropped. Otherwise, the edge is dropped if the sample
  391. count along the edge is less than this option's value multiplied
  392. by the total count for the profile. The default value is
  393. 0.001; i.e., edges that account for less than
  394. 0.1% of the total time are dropped.
  395. </td>
  396. </tr>
  397. <tr valign=top>
  398. <td><code>--focus=&lt;re&gt;</code></td>
  399. <td>
  400. This option controls what region of the graph is displayed
  401. based on the regular expression supplied with the option.
  402. For any path in the callgraph, we check all nodes in the path
  403. against the supplied regular expression. If none of the nodes
  404. match, the path is dropped from the output.
  405. </td>
  406. </tr>
  407. <tr valign=top>
  408. <td><code>--ignore=&lt;re&gt;</code></td>
  409. <td>
  410. This option controls what region of the graph is displayed
  411. based on the regular expression supplied with the option.
  412. For any path in the callgraph, we check all nodes in the path
  413. against the supplied regular expression. If any of the nodes
  414. match, the path is dropped from the output.
  415. </td>
  416. </tr>
  417. </table>
  418. </center>
  419. <p>The dropped edges and nodes account for some count mismatches in
  420. the display. For example, the cumulative count for
  421. <code>snprintf()</code> in the first diagram above was 41. However
  422. the local count (1) and the count along the outgoing edges (12+1+20+6)
  423. add up to only 40.</p>
  424. <h1>Caveats</h1>
  425. <ul>
  426. <li> If the program exits because of a signal, the generated profile
  427. will be <font color=red>incomplete, and may perhaps be
  428. completely empty</font>.
  429. <li> The displayed graph may have disconnected regions because
  430. of the edge-dropping heuristics described above.
  431. <li> If the program linked in a library that was not compiled
  432. with enough symbolic information, all samples associated
  433. with the library may be charged to the last symbol found
  434. in the program before the library. This will artificially
  435. inflate the count for that symbol.
  436. <li> If you run the program on one machine, and profile it on
  437. another, and the shared libraries are different on the two
  438. machines, the profiling output may be confusing: samples that
  439. fall within shared libaries may be assigned to arbitrary
  440. procedures.
  441. <li> If your program forks, the children will also be profiled
  442. (since they inherit the same CPUPROFILE setting). Each process
  443. is profiled separately; to distinguish the child profiles from
  444. the parent profile and from each other, all children will have
  445. their process-id appended to the CPUPROFILE name.
  446. <li> Due to a hack we make to work around a possible gcc bug, your
  447. profiles may end up named strangely if the first character of
  448. your CPUPROFILE variable has ascii value greater than 127.
  449. This should be exceedingly rare, but if you need to use such a
  450. name, just set prepend <code>./</code> to your filename:
  451. <code>CPUPROFILE=./&Auml;gypten</code>.
  452. </ul>
  453. <hr>
  454. <address>Sanjay Ghemawat<br>
  455. <!-- Created: Tue Dec 19 10:43:14 PST 2000 -->
  456. <!-- hhmts start -->
  457. Last modified: Fri May 9 14:41:29 PDT 2008
  458. <!-- hhmts end -->
  459. </address>
  460. </BODY>
  461. </HTML>