List: oprofile-list
Subject: FAQ contribution: Usage & interaction of PAPI & Oprofile
From: Harry Mangalam <hjm () tacgi ! com>
Date: 2005-12-20 17:58:53
Message-ID: do9gtf$5fh$1 () sea ! gmane ! org
FAQ contribution:
This is probably longer and less specific than what you wanted, but I had
to write up this little HOWTO stanza for our own group, so you're welcome to
use whatever part of it you'd like for inclusion or clarification in your own
docs. The end of the Oprofile section has a bit that discusses the conflict
between Oprofile and the PAPI/HPCToolkit approach. If I've misrepresented
oprofile, please let me know - I'm but a simple user.
FAQ entry for the usage & interaction of PAPI & Oprofile
Many Linux machines will be set up to use both oprofile (now available in the
2.6 kernel source as a module [CONFIG_OPROFILE=m]) and tools which require
the PAPI API to do performance profiling (such as the HPCToolkit's hpcrun
[http://www.hipersoft.rice.edu/hpctoolkit/]). The latter requires a kernel
source patch and recompile to enable the PAPI API under Linux, as well as the
compilation of the 'perfctr' kernel module. I believe that other SW that
uses the PAPI API under Linux (such as U. Oregon's sophisticated Tuning and
Analysis Utilities, tau [http://www.cs.uoregon.edu/research/tau/home.php])
also uses the perfctr module. Many distribution kernels come with the
oprofile module enabled; none that I'm aware of come with the PAPI patches
applied and usable (a shame - it's very useful to developers).
Both software approaches are quite useful and yield complementary (& some
overlapping) information and both have distinct advantages. However, the two
approaches cannot be used successively without some caution. Since both
kernel modules access some of the same resources, one must be unloaded before
the other is used.
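A small guard can make the "one module at a time" rule explicit. The following is only a sketch: the function names are made up for illustration, and it assumes the module list is in /proc/modules format (module name in the first column), which is what lsmod reads.

```shell
#!/bin/sh
# module_loaded NAME [FILE]: succeed if NAME appears in the first column
# of FILE (default /proc/modules, the list that lsmod formats).
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# profiler_conflict [FILE]: print which of the two profiling modules are
# loaded, and fail if both are present at the same time.
profiler_conflict() {
    file=${1:-/proc/modules}
    loaded=""
    for m in oprofile perfctr; do
        if module_loaded "$m" "$file"; then
            loaded="$loaded $m"
        fi
    done
    echo "loaded:${loaded:- none}"
    [ "$loaded" != " oprofile perfctr" ]
}
```

Calling 'profiler_conflict' with no argument checks the running kernel; passing a file name lets you try it against a saved copy of /proc/modules.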
In using oprofile, the web site is a good place to start - the documentation
and especially the examples are extremely useful.
[http://oprofile.sourceforge.net/docs]
NB: The Ubuntu distro that I use has no enabled root account, so all root
commands are prefaced with 'sudo' to indicate a root-requiring command. On
systems with a root user, you could run these as root, or open a root shell on
an Ubuntu-like distro with 'sudo bash'.
Oprofile first requires the module loading:
$ sudo modprobe oprofile
Second, initialize the 'oprofiled' daemon and start it collecting info. This
is a different approach from the HPCToolkit and allows oprofile to analyze
not only the application under investigation but the entire system for the
time being profiled, including the kernel itself. The HPCToolkit is specific
to particular applications and as such does not require a running daemon.
$ sudo opcontrol --vmlinux=/path/to/vmlinux
Or, when you don't have a vmlinux or don't want to profile the kernel:
$ sudo opcontrol --no-vmlinux
NOTE that the file given to --vmlinux is the UNCOMPRESSED linux ELF
executable, not the typical compressed vmlinuz boot image that is installed in
the /boot dir.
In the case of my machine:
$ sudo opcontrol --vmlinux=/usr/src/linux-2.6.11/vmlinux
This machine is a dual opteron. If I wanted to profile each CPU separately, I
would invoke it with:
$ sudo opcontrol --separate=cpu --vmlinux=/usr/src/linux-2.6.11/vmlinux
to report profiling separately for each CPU.
Note that once per-CPU separation is enabled, it stays enabled; you have to
explicitly shut it off for succeeding runs where you want the results pooled
for both CPUs:
$ sudo opcontrol --separate=none --vmlinux=/usr/src/linux-2.6.11/vmlinux
Next, start the profiling with:
$ sudo opcontrol --start
When ready to collect info, first do a 'sudo ls' to start the sudo timeout so
that later sudo commands don't ask for a password. Then, for an application
(an executable called ncbo in the following example) that has been compiled
with the '-g' flag:
# first reset the counters:
$ sudo opcontrol --reset
# execute the command
$ /home/hjm/nco/bin/ncbo -h -O --op_typ='-' -p /home/hjm/nco_bm \
ipcc_dly_T85.nc ipcc_dly_T85_00.nc /home/hjm/nco_bm/ipcc.diff.nc
# this command runs for > 60s, important as it's a statistical profiler
# when the program ends, dump the collected statistics
$ opreport --exclude-dependent --demangle=smart --symbols > \
oprofile.report.full.ncbo
The above stanza is meant to be run as a shell script or moused into a shell
window so there is minimal delay between resetting the counters, running the
program, and generating the output. This keeps the profiling data as specific
as possible to the application that is running.
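One way to keep those three steps together is a small wrapper function. This is a sketch only: 'run', 'profile_once' and the DRY_RUN variable are names invented here, and the opcontrol and opreport invocations are simply the ones shown above.

```shell
#!/bin/sh
# run CMD [ARGS...]: execute the command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# profile_once REPORT CMD [ARGS...]: reset the counters, run the command,
# then write the opreport output to REPORT.
profile_once() {
    report=$1
    shift
    run sudo opcontrol --reset || return 1
    run "$@"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ opreport --exclude-dependent --demangle=smart --symbols > $report"
    else
        opreport --exclude-dependent --demangle=smart --symbols > "$report"
    fi
}
```

Setting DRY_RUN=1 before calling 'profile_once oprofile.report.full.ncbo <command>' prints the three commands without running them, which is a cheap way to check the command line before a long run.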
The output is a human-readable text file that will give you the time spent in
each function. The poll_idle time is the time that the CPU(s) spent doing
NOTHING, i.e. idling. For a lightly loaded dual-CPU machine running a single
serial job, you would expect to see about 50% in poll_idle.
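The 50% figure is just (idle CPUs / total CPUs). As a rough sketch, assuming each busy CPU is fully occupied and the others are completely idle (the function name is made up for illustration):

```shell
#!/bin/sh
# expected_idle_pct NCPUS NBUSY: percentage of samples expected in the idle
# loop when NBUSY CPUs are fully occupied and the rest are idle.
expected_idle_pct() {
    echo $(( ($1 - $2) * 100 / $1 ))
}

expected_idle_pct 2 1   # one serial job on a dual-CPU machine: prints 50
```

A poll_idle share well above this estimate suggests the job itself is waiting (on I/O, for example) rather than computing.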
Cleaning up after Oprofile. Since Oprofile runs as a daemon, it adds a very
small amount of CPU and memory overhead to a running system. To remove that
overhead, you have to explicitly kill the daemon:
$ sudo opcontrol --shutdown
This next part is not well-documented and only causes a problem if you want to
run a PAPI-based profiler such as hpcrun. You MUST remove the oprofile
module and this cannot be done via the usual 'rmmod oprofile' approach.
There is a specific command to do it:
$ sudo opcontrol --deinit
If the oprofile module is loaded and you try to run 'hpcrun' (even to get a
list of available options), you'll get an unhelpful error like this:
$ hpcrun -L
(pid 27342): PAPI library initialization failure - expected version 50397184,
dynamic library was version -3. Aborting.
This is diagnostic (I believe) that the oprofile module is still loaded and
that the perfctr and oprofile modules are fighting over the CPU.
Using hpcrun:
=============
Don't forget that in order for hpcrun to work, the perfctr module has to be
loaded with modprobe AND /dev/perfctr has to be chmod'ed to 644.
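The permission half of that requirement can be checked before starting a run. A minimal sketch (the helper name is hypothetical; it only looks at the "other" read bit in the 'ls -l' mode string, which is what 'chmod 644' sets):

```shell
#!/bin/sh
# others_can_read PATH: succeed if the "other" read bit is set on PATH,
# as it is after 'chmod 644'. A missing PATH counts as failure.
others_can_read() {
    [ "$(ls -ld "$1" 2>/dev/null | cut -c8)" = r ]
}

if others_can_read /dev/perfctr; then
    echo "/dev/perfctr is world-readable"
else
    echo "/dev/perfctr is missing or not world-readable"
fi
```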
Using the HPCToolkit:
First make sure that the oprofile module is not loaded:
$ lsmod |grep oprofile
should return nothing. If it gives you an indication that the oprofile module
IS loaded, unload it with the command:
$ sudo opcontrol --deinit
then load the perfctr module to allow the PAPI API access to the hardware
counters.
$ sudo modprobe perfctr
After this, it is relatively straightforward. Anything you want to profile,
just run it behind the 'hpcrun' command:
$ hpcrun (options) -- /home/hjm/nco/bin/ncbo -h -O --op_typ='-' \
-p /home/hjm/nco_bm \
ipcc_dly_T85.nc ipcc_dly_T85_00.nc /home/hjm/nco_bm/ipcc.diff.nc
The (options) are typically a set of hardware counters you want to access
during the run. On an Opteron, the available options can be listed by running:
$ hpcrun -L |grep Yes
PAPI_L2_DCM Yes Level 2 data cache misses ()
PAPI_L2_ICM Yes Level 2 instruction cache misses ()
PAPI_FPU_IDL Yes Cycles floating point units are idle ()
PAPI_TLB_DM Yes Data translation lookaside buffer misses ()
PAPI_TLB_IM Yes Instruction translation lookaside buffer misses ()
PAPI_L1_LDM Yes Level 1 load misses ()
PAPI_L1_STM Yes Level 1 store misses ()
PAPI_L2_LDM Yes Level 2 load misses ()
PAPI_L2_STM Yes Level 2 store misses ()
PAPI_STL_ICY Yes Cycles with no instruction issue ()
PAPI_HW_INT Yes Hardware interrupts ()
PAPI_BR_TKN Yes Conditional branch instructions taken ()
PAPI_BR_MSP Yes Conditional branch instructions mispredicted ()
PAPI_TOT_INS Yes Instructions completed ()
PAPI_FP_INS Yes Floating point instructions ()
PAPI_BR_INS Yes Branch instructions ()
PAPI_VEC_INS Yes Vector/SIMD instructions ()
PAPI_RES_STL Yes Cycles stalled on any resource ()
PAPI_TOT_CYC Yes Total cycles ()
PAPI_L2_DCH Yes Level 2 data cache hits ()
PAPI_L1_DCA Yes Level 1 data cache accesses ()
PAPI_L2_DCR Yes Level 2 data cache reads ()
PAPI_L2_DCW Yes Level 2 data cache writes ()
PAPI_L2_ICH Yes Level 2 instruction cache hits ()
PAPI_L1_ICA Yes Level 1 instruction cache accesses ()
PAPI_L1_ICR Yes Level 1 instruction cache reads ()
PAPI_FML_INS Yes Floating point multiply instructions ()
PAPI_FAD_INS Yes Floating point add instructions ()
PAPI_FP_OPS Yes Floating point operations ()
these options can be requested by inserting them into the (options) space, for
example, as:
$ hpcrun -e PAPI_TOT_CYC:32767 -e PAPI_FP_OPS:32767 -e PAPI_FP_INS:32767 \
-e PAPI_HW_INT:32767 -e PAPI_L2_DCM:32767 -- <command to profile>
[don't forget the '--' separator between the hpcrun command chain and the
application]
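Typing the same ':32767' suffix for every event gets tedious. Here is a sketch of a helper that builds the '-e' list; the function name is invented for illustration, and it assumes every event should get the same number after the colon (which I take to be the sampling period).

```shell
#!/bin/sh
# hpcrun_event_opts PERIOD EVENT...: print "-e EVENT:PERIOD" for each event,
# separated by spaces, ready to paste into an hpcrun command line.
hpcrun_event_opts() {
    period=$1
    shift
    opts=""
    for ev in "$@"; do
        opts="$opts${opts:+ }-e $ev:$period"
    done
    printf '%s\n' "$opts"
}

# Intended use (left unquoted on purpose so the words split):
#   hpcrun $(hpcrun_event_opts 32767 PAPI_TOT_CYC PAPI_FP_OPS) -- <command to profile>
```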
hpcrun will profile EVERYTHING that results from the <command to profile> so
if it's a shell command, it will profile every subcommand in the shell,
giving each its own output file in the form of:
<app_name>.PAPI_TOT_CYC.clay.ess.uci.edu.10137.0
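When a shell command leaves many such files behind, it helps to split the names into their parts. The sketch below rests on assumptions drawn only from the example name above: the event starts with 'PAPI_', the second-to-last field is the process ID, and the last field is some per-process index (I am not sure what it counts). The function name is hypothetical.

```shell
#!/bin/sh
# parse_hpcrun_name FILE: split <app>.<EVENT>.<host>.<pid>.<index> and
# print the five parts separated by spaces.
parse_hpcrun_name() {
    name=$1
    app=${name%%.PAPI_*}        # everything before the first ".PAPI_"
    rest=${name#"$app".}        # EVENT.host.pid.index
    event=${rest%%.*}
    rest=${rest#"$event".}      # host.pid.index (host may contain dots)
    idx=${rest##*.}
    rest=${rest%.*}             # host.pid
    pid=${rest##*.}
    host=${rest%.*}
    printf '%s %s %s %s %s\n' "$app" "$event" "$host" "$pid" "$idx"
}

parse_hpcrun_name ncwa.PAPI_TOT_CYC.clay.ess.uci.edu.10137.0
# prints: ncwa PAPI_TOT_CYC clay.ess.uci.edu 10137 0
```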
The output files you're interested in can be processed into something usable
with 'hpcquick', a perl script that calls some other HPC tools to generate
the XML DB (in its own subdirectory) that the java browser 'hpcviewer'
needs.
# -I gives the src location; -P gives the hpcrun output file to process
hpcquick -I src/nco -P ncwa.PAPI_TOT_CYC.clay.ess.uci.edu.10137.0
# view the results via java hpcviewer
hpcviewer # and open the './hpcquick.dbxxx/hpcquick.hpcviewer' file.
This will open a java-based source and data browser that can show you where
your application is spending time.
John Levon wrote:
> On Thu, Dec 15, 2005 at 07:03:54PM -0800, Harry Mangalam wrote:
>
>> Does this mean that on kernels >2.5, the module can be unloaded safely
>> even if it IS an SMP system?
>
> Yes. Use opcontrol --deinit (or unmount oprofilefs and unload directly).
>
> john
>
>