123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493494495496497498499500501502503504505506507508509510511512513514515516517518519520521522523524525526527528529530531532533534535536537 |
- <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
- <HTML>
- <HEAD>
- <link rel="stylesheet" href="designstyle.css">
- <title>Gperftools CPU Profiler</title>
- </HEAD>
- <BODY>
- <p align=right>
- <i>Last modified
- <script type=text/javascript>
- var lm = new Date(document.lastModified);
- document.write(lm.toDateString());
- </script></i>
- </p>
- <p>This is the CPU profiler we use at Google. There are three parts
- to using it: linking the library into an application, running the
- code, and analyzing the output.</p>
- <p>On the off-chance that you should need to understand it, the CPU
- profiler data file format is documented separately,
- <a href="cpuprofile-fileformat.html">here</a>.
- <H1>Linking in the Library</H1>
- <p>To install the CPU profiler into your executable, add
- <code>-lprofiler</code> to the link-time step for your executable.
- (It's also probably possible to add in the profiler at run-time using
- <code>LD_PRELOAD</code>, e.g.
- <code>% env LD_PRELOAD="/usr/lib/libprofiler.so" <binary></code>,
- but this isn't necessarily recommended.)</p>
- <p>This does <i>not</i> turn on CPU profiling; it just inserts the
- code. For that reason, it's practical to just always link
- <code>-lprofiler</code> into a binary while developing; that's what we
- do at Google. (However, since any user can turn on the profiler by
- setting an environment variable, it's not necessarily recommended to
- install profiler-linked binaries into a production, running
- system.)</p>
- <H1>Running the Code</H1>
- <p>There are several alternatives to actually turn on CPU profiling
- for a given run of an executable:</p>
- <ol>
- <li> <p>Define the environment variable CPUPROFILE to the filename
- to dump the profile to. For instance, if you had a version of
- <code>/bin/ls</code> that had been linked against libprofiler,
- you could run:</p>
- <pre>% env CPUPROFILE=ls.prof /bin/ls</pre>
- </li>
- <li> <p>In addition to defining the environment variable CPUPROFILE
- you can also define CPUPROFILESIGNAL. This allows profiling to be
- controlled via the signal number that you specify. The signal number
- must be unused by the program under normal operation. Internally it
- acts as a switch, triggered by the signal, which is off by default.
- For instance, if you had a copy of <code>/bin/chrome</code> that had been
- been linked against libprofiler, you could run:</p>
- <pre>% env CPUPROFILE=chrome.prof CPUPROFILESIGNAL=12 /bin/chrome &</pre>
- <p>You can then trigger profiling to start:</p>
- <pre>% killall -12 chrome</pre>
- <p>Then after a period of time you can tell it to stop which will
- generate the profile:</p>
- <pre>% killall -12 chrome</pre>
- </li>
- <li> <p>In your code, bracket the code you want profiled in calls to
- <code>ProfilerStart()</code> and <code>ProfilerStop()</code>.
- (These functions are declared in <code><gperftools/profiler.h></code>.)
- <code>ProfilerStart()</code> will take
- the profile-filename as an argument.</p>
- </li>
- </ol>
- <p>In Linux 2.6 and above, profiling works correctly with threads,
- automatically profiling all threads. In Linux 2.4, profiling only
- profiles the main thread (due to a kernel bug involving itimers and
- threads). Profiling works correctly with sub-processes: each child
- process gets its own profile with its own name (generated by combining
- CPUPROFILE with the child's process id).</p>
- <p>For security reasons, CPU profiling will not write to a file -- and
- is thus not usable -- for setuid programs.</p>
- <p>See the include-file <code>gperftools/profiler.h</code> for
- advanced-use functions, including <code>ProfilerFlush()</code> and
- <code>ProfilerStartWithOptions()</code>.</p>
- <H2>Modifying Runtime Behavior</H2>
- <p>You can more finely control the behavior of the CPU profiler via
- environment variables.</p>
- <table frame=box rules=sides cellpadding=5 width=100%>
- <tr valign=top>
- <td><code>CPUPROFILE_FREQUENCY=<i>x</i></code></td>
- <td>default: 100</td>
- <td>
- How many interrupts/second the cpu-profiler samples.
- </td>
- </tr>
- <tr valign=top>
- <td><code>CPUPROFILE_REALTIME=1</code></td>
- <td>default: [not set]</td>
- <td>
- If set to any value (including 0 or the empty string), use
- ITIMER_REAL instead of ITIMER_PROF to gather profiles. In
- general, ITIMER_REAL is not as accurate as ITIMER_PROF, and also
- interacts badly with use of alarm(), so prefer ITIMER_PROF unless
- you have a reason prefer ITIMER_REAL.
- </td>
- </tr>
- </table>
- <h1><a name="pprof">Analyzing the Output</a></h1>
- <p><code>pprof</code> is the script used to analyze a profile. It has
- many output modes, both textual and graphical. Some give just raw
- numbers, much like the <code>-pg</code> output of <code>gcc</code>,
- and others show the data in the form of a dependency graph.</p>
- <p>pprof <b>requires</b> <code>perl5</code> to be installed to run.
- It also requires <code>dot</code> to be installed for any of the
- graphical output routines, and <code>gv</code> to be installed for
- <code>--gv</code> mode (described below).
- </p>
- <p>Here are some ways to call pprof. These are described in more
- detail below.</p>
- <pre>
- % pprof /bin/ls ls.prof
- Enters "interactive" mode
- % pprof --text /bin/ls ls.prof
- Outputs one line per procedure
- % pprof --gv /bin/ls ls.prof
- Displays annotated call-graph via 'gv'
- % pprof --gv --focus=Mutex /bin/ls ls.prof
- Restricts to code paths including a .*Mutex.* entry
- % pprof --gv --focus=Mutex --ignore=string /bin/ls ls.prof
- Code paths including Mutex but not string
- % pprof --list=getdir /bin/ls ls.prof
- (Per-line) annotated source listing for getdir()
- % pprof --disasm=getdir /bin/ls ls.prof
- (Per-PC) annotated disassembly for getdir()
- % pprof --text localhost:1234
- Outputs one line per procedure for localhost:1234
- % pprof --callgrind /bin/ls ls.prof
- Outputs the call information in callgrind format
- </pre>
- <h3>Analyzing Text Output</h3>
- <p>Text mode has lines of output that look like this:</p>
- <pre>
- 14 2.1% 17.2% 58 8.7% std::_Rb_tree::find
- </pre>
- <p>Here is how to interpret the columns:</p>
- <ol>
- <li> Number of profiling samples in this function
- <li> Percentage of profiling samples in this function
- <li> Percentage of profiling samples in the functions printed so far
- <li> Number of profiling samples in this function and its callees
- <li> Percentage of profiling samples in this function and its callees
- <li> Function name
- </ol>
- <h3>Analyzing Callgrind Output</h3>
- <p>Use <a href="http://kcachegrind.sourceforge.net">kcachegrind</a> to
- analyze your callgrind output:</p>
- <pre>
- % pprof --callgrind /bin/ls ls.prof > ls.callgrind
- % kcachegrind ls.callgrind
- </pre>
- <p>The cost is specified in 'hits', i.e. how many times a function
- appears in the recorded call stack information. The 'calls' from
- function a to b record how many times function b was found in the
- stack traces directly below function a.</p>
- <p>Tip: if you use a debug build the output will include file and line
- number information and kcachegrind will show an annotated source
- code view.</p>
- <h3>Node Information</h3>
- <p>In the various graphical modes of pprof, the output is a call graph
- annotated with timing information, like so:</p>
- <A HREF="pprof-test-big.gif">
- <center><table><tr><td>
- <img src="pprof-test.gif">
- </td></tr></table></center>
- </A>
- <p>Each node represents a procedure. The directed edges indicate
- caller to callee relations. Each node is formatted as follows:</p>
- <center><pre>
- Class Name
- Method Name
- local (percentage)
- <b>of</b> cumulative (percentage)
- </pre></center>
- <p>The last one or two lines contains the timing information. (The
- profiling is done via a sampling method, where by default we take 100
- samples a second. Therefor one unit of time in the output corresponds
- to about 10 milliseconds of execution time.) The "local" time is the
- time spent executing the instructions directly contained in the
- procedure (and in any other procedures that were inlined into the
- procedure). The "cumulative" time is the sum of the "local" time and
- the time spent in any callees. If the cumulative time is the same as
- the local time, it is not printed.</p>
- <p>For instance, the timing information for test_main_thread()
- indicates that 155 units (about 1.55 seconds) were spent executing the
- code in <code>test_main_thread()</code> and 200 units were spent while
- executing <code>test_main_thread()</code> and its callees such as
- <code>snprintf()</code>.</p>
- <p>The size of the node is proportional to the local count. The
- percentage displayed in the node corresponds to the count divided by
- the total run time of the program (that is, the cumulative count for
- <code>main()</code>).</p>
- <h3>Edge Information</h3>
- <p>An edge from one node to another indicates a caller to callee
- relationship. Each edge is labelled with the time spent by the callee
- on behalf of the caller. E.g, the edge from
- <code>test_main_thread()</code> to <code>snprintf()</code> indicates
- that of the 200 samples in <code>test_main_thread()</code>, 37 are
- because of calls to <code>snprintf()</code>.</p>
- <p>Note that <code>test_main_thread()</code> has an edge to
- <code>vsnprintf()</code>, even though <code>test_main_thread()</code>
- doesn't call that function directly. This is because the code was
- compiled with <code>-O2</code>; the profile reflects the optimized
- control flow.</p>
- <h3>Meta Information</h3>
- <p>The top of the display should contain some meta information
- like:</p>
- <pre>
- /tmp/profiler2_unittest
- Total samples: 202
- Focusing on: 202
- Dropped nodes with <= 1 abs(samples)
- Dropped edges with <= 0 samples
- </pre>
- <p>This section contains the name of the program, and the total
- samples collected during the profiling run. If the
- <code>--focus</code> option is on (see the <a href="#focus">Focus</a>
- section below), the legend also contains the number of samples being
- shown in the focused display. Furthermore, some unimportant nodes and
- edges are dropped to reduce clutter. The characteristics of the
- dropped nodes and edges are also displayed in the legend.</p>
- <h3><a name=focus>Focus and Ignore</a></h3>
- <p>You can ask pprof to generate a display focused on a particular
- piece of the program. You specify a regular expression. Any portion
- of the call-graph that is on a path which contains at least one node
- matching the regular expression is preserved. The rest of the
- call-graph is dropped on the floor. For example, you can focus on the
- <code>vsnprintf()</code> libc call in <code>profiler2_unittest</code>
- as follows:</p>
- <pre>
- % pprof --gv --focus=vsnprintf /tmp/profiler2_unittest test.prof
- </pre>
- <A HREF="pprof-vsnprintf-big.gif">
- <center><table><tr><td>
- <img src="pprof-vsnprintf.gif">
- </td></tr></table></center>
- </A>
- <p>Similarly, you can supply the <code>--ignore</code> option to
- ignore samples that match a specified regular expression. E.g., if
- you are interested in everything except calls to
- <code>snprintf()</code>, you can say:</p>
- <pre>
- % pprof --gv --ignore=snprintf /tmp/profiler2_unittest test.prof
- </pre>
- <h3>Interactive mode</a></h3>
- <p>By default -- if you don't specify any flags to the contrary --
- pprof runs in interactive mode. At the <code>(pprof)</code> prompt,
- you can run many of the commands described above. You can type
- <code>help</code> for a list of what commands are available in
- interactive mode.</p>
- <h3><a name=options>pprof Options</a></h3>
- For a complete list of pprof options, you can run <code>pprof
- --help</code>.
- <h4>Output Type</h4>
- <p>
- <center>
- <table frame=box rules=sides cellpadding=5 width=100%>
- <tr valign=top>
- <td><code>--text</code></td>
- <td>
- Produces a textual listing. (Note: If you have an X display, and
- <code>dot</code> and <code>gv</code> installed, you will probably
- be happier with the <code>--gv</code> output.)
- </td>
- </tr>
- <tr valign=top>
- <td><code>--gv</code></td>
- <td>
- Generates annotated call-graph, converts to postscript, and
- displays via gv (requres <code>dot</code> and <code>gv</code> be
- installed).
- </td>
- </tr>
- <tr valign=top>
- <td><code>--dot</code></td>
- <td>
- Generates the annotated call-graph in dot format and
- emits to stdout (requres <code>dot</code> be installed).
- </td>
- </tr>
- <tr valign=top>
- <td><code>--ps</code></td>
- <td>
- Generates the annotated call-graph in Postscript format and
- emits to stdout (requres <code>dot</code> be installed).
- </td>
- </tr>
- <tr valign=top>
- <td><code>--pdf</code></td>
- <td>
- Generates the annotated call-graph in PDF format and emits to
- stdout (requires <code>dot</code> and <code>ps2pdf</code> be
- installed).
- </td>
- </tr>
- <tr valign=top>
- <td><code>--gif</code></td>
- <td>
- Generates the annotated call-graph in GIF format and
- emits to stdout (requres <code>dot</code> be installed).
- </td>
- </tr>
- <tr valign=top>
- <td><code>--list=<<i>regexp</i>></code></td>
- <td>
- <p>Outputs source-code listing of routines whose
- name matches <regexp>. Each line
- in the listing is annotated with flat and cumulative
- sample counts.</p>
- <p>In the presence of inlined calls, the samples
- associated with inlined code tend to get assigned
- to a line that follows the location of the
- inlined call. A more precise accounting can be
- obtained by disassembling the routine using the
- --disasm flag.</p>
- </td>
- </tr>
- <tr valign=top>
- <td><code>--disasm=<<i>regexp</i>></code></td>
- <td>
- Generates disassembly of routines that match
- <regexp>, annotated with flat and
- cumulative sample counts and emits to stdout.
- </td>
- </tr>
- </table>
- </center>
- <h4>Reporting Granularity</h4>
- <p>By default, pprof produces one entry per procedure. However you can
- use one of the following options to change the granularity of the
- output. The <code>--files</code> option seems to be particularly
- useless, and may be removed eventually.</p>
- <center>
- <table frame=box rules=sides cellpadding=5 width=100%>
- <tr valign=top>
- <td><code>--addresses</code></td>
- <td>
- Produce one node per program address.
- </td>
- </tr>
- <td><code>--lines</code></td>
- <td>
- Produce one node per source line.
- </td>
- </tr>
- <td><code>--functions</code></td>
- <td>
- Produce one node per function (this is the default).
- </td>
- </tr>
- <td><code>--files</code></td>
- <td>
- Produce one node per source file.
- </td>
- </tr>
- </table>
- </center>
- <h4>Controlling the Call Graph Display</h4>
- <p>Some nodes and edges are dropped to reduce clutter in the output
- display. The following options control this effect:</p>
- <center>
- <table frame=box rules=sides cellpadding=5 width=100%>
- <tr valign=top>
- <td><code>--nodecount=<n></code></td>
- <td>
- This option controls the number of displayed nodes. The nodes
- are first sorted by decreasing cumulative count, and then only
- the top N nodes are kept. The default value is 80.
- </td>
- </tr>
- <tr valign=top>
- <td><code>--nodefraction=<f></code></td>
- <td>
- This option provides another mechanism for discarding nodes
- from the display. If the cumulative count for a node is
- less than this option's value multiplied by the total count
- for the profile, the node is dropped. The default value
- is 0.005; i.e. nodes that account for less than
- half a percent of the total time are dropped. A node
- is dropped if either this condition is satisfied, or the
- --nodecount condition is satisfied.
- </td>
- </tr>
- <tr valign=top>
- <td><code>--edgefraction=<f></code></td>
- <td>
- This option controls the number of displayed edges. First of all,
- an edge is dropped if either its source or destination node is
- dropped. Otherwise, the edge is dropped if the sample
- count along the edge is less than this option's value multiplied
- by the total count for the profile. The default value is
- 0.001; i.e., edges that account for less than
- 0.1% of the total time are dropped.
- </td>
- </tr>
- <tr valign=top>
- <td><code>--focus=<re></code></td>
- <td>
- This option controls what region of the graph is displayed
- based on the regular expression supplied with the option.
- For any path in the callgraph, we check all nodes in the path
- against the supplied regular expression. If none of the nodes
- match, the path is dropped from the output.
- </td>
- </tr>
- <tr valign=top>
- <td><code>--ignore=<re></code></td>
- <td>
- This option controls what region of the graph is displayed
- based on the regular expression supplied with the option.
- For any path in the callgraph, we check all nodes in the path
- against the supplied regular expression. If any of the nodes
- match, the path is dropped from the output.
- </td>
- </tr>
- </table>
- </center>
- <p>The dropped edges and nodes account for some count mismatches in
- the display. For example, the cumulative count for
- <code>snprintf()</code> in the first diagram above was 41. However
- the local count (1) and the count along the outgoing edges (12+1+20+6)
- add up to only 40.</p>
- <h1>Caveats</h1>
- <ul>
- <li> If the program exits because of a signal, the generated profile
- will be <font color=red>incomplete, and may perhaps be
- completely empty</font>.
- <li> The displayed graph may have disconnected regions because
- of the edge-dropping heuristics described above.
- <li> If the program linked in a library that was not compiled
- with enough symbolic information, all samples associated
- with the library may be charged to the last symbol found
- in the program before the library. This will artificially
- inflate the count for that symbol.
- <li> If you run the program on one machine, and profile it on
- another, and the shared libraries are different on the two
- machines, the profiling output may be confusing: samples that
- fall within shared libaries may be assigned to arbitrary
- procedures.
- <li> If your program forks, the children will also be profiled
- (since they inherit the same CPUPROFILE setting). Each process
- is profiled separately; to distinguish the child profiles from
- the parent profile and from each other, all children will have
- their process-id appended to the CPUPROFILE name.
- <li> Due to a hack we make to work around a possible gcc bug, your
- profiles may end up named strangely if the first character of
- your CPUPROFILE variable has ascii value greater than 127.
- This should be exceedingly rare, but if you need to use such a
- name, just set prepend <code>./</code> to your filename:
- <code>CPUPROFILE=./Ägypten</code>.
- </ul>
- <hr>
- <address>Sanjay Ghemawat<br>
- <!-- Created: Tue Dec 19 10:43:14 PST 2000 -->
- <!-- hhmts start -->
- Last modified: Fri May 9 14:41:29 PDT 2008
- <!-- hhmts end -->
- </address>
- </BODY>
- </HTML>
|