List:       oprofile-list
Subject:    FAQ contribution: Usage & interaction of PAPI & Oprofile
From:       Harry Mangalam <hjm () tacgi ! com>
Date:       2005-12-20 17:58:53
Message-ID: do9gtf$5fh$1 () sea ! gmane ! org

FAQ contribution:

This is probably longer and less specific than what you wanted, but I had 
to write up this little HOWTO stanza for our own group, so you're welcome to 
use whatever part of it you'd like for inclusion or clarification in your own 
docs.  The end of the Oprofile section has a bit that discusses the conflict 
between Oprofile and the PAPI/HPCToolkit approach.  If I've misrepresented
oprofile, please let me know - I'm but a simple user.


FAQ entry for the usage & interaction of PAPI & Oprofile

Many Linux machines will be set up to use both oprofile (now available in the 
2.6 kernel source as a module [CONFIG_OPROFILE=m]) and tools which require 
the PAPI API to do performance profiling (such as the HPCToolkit's hpcrun 
[http://www.hipersoft.rice.edu/hpctoolkit/]).  The latter requires a kernel 
source patch and recompile to enable the PAPI API under Linux, as well as the 
compilation of the 'perfctr' kernel module.  I believe that other SW that 
uses the PAPI API under Linux (such as U. Oregon's sophisticated Tuning and 
Analysis Utilities, tau [http://www.cs.uoregon.edu/research/tau/home.php]) 
also uses the perfctr module.  Many distribution kernels come with the 
oprofile module enabled; none that I'm aware of come with the PAPI patches 
applied and usable (a shame - it's very useful to developers).

Both software approaches are quite useful and yield complementary (& some 
overlapping) information and both have distinct advantages.  However, the two 
approaches cannot be used successively without some caution.  Since both 
kernel modules access some of the same resources, one must be unloaded before 
the other is used.

In using oprofile, the web site is a good place to start - the documentation 
and especially the examples are extremely useful. 
[http://oprofile.sourceforge.net/docs]

NB: The Ubuntu distro that I use has no root login, so all root commands are 
prefaced with 'sudo' to indicate a root-requiring command.  On those systems 
with root users, you could do this as root, or even enable a root shell on an 
Ubuntu-like distro with 'sudo bash'.

Oprofile first requires the module loading:

  $ sudo modprobe oprofile

Second, initialize the 'oprofiled' daemon and start it collecting info.  This 
is a different approach from the HPCToolkit and allows oprofile to analyze 
not only the application under investigation but the entire system for the 
time it is being profiled, including the kernel itself.  The HPCToolkit is 
specific to particular applications and as such does not require a daemon 
running.

  $ sudo opcontrol --vmlinux=/path/to/vmlinux

Or, when you don't have a vmlinux or don't want to profile the kernel:

  $ sudo opcontrol --no-vmlinux

NOTE that the --vmlinux argument must be the UNCOMPRESSED linux ELF 
executable, not the typical compressed vmlinuz boot image that is installed 
in the /boot dir.
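
One quick way to tell the two kinds of kernel image apart is to look for the 
ELF magic number at the start of the file.  The sketch below assumes a POSIX 
shell with 'od' available; the function name is_elf is made up for 
illustration, and the paths in the usage comments are only examples.

```shell
# is_elf FILE: succeed if FILE starts with the 4-byte ELF magic
# (0x7f 'E' 'L' 'F').  Hypothetical helper, not part of oprofile.
is_elf() {
    # Dump the first 4 bytes as hex and squeeze out the whitespace.
    magic=$(od -An -tx1 -N4 "$1" 2>/dev/null | tr -d ' \n')
    [ "$magic" = "7f454c46" ]
}

# Usage (paths are examples only):
#   is_elf /usr/src/linux-2.6.11/vmlinux && echo "looks like an ELF vmlinux"
#   is_elf /boot/vmlinuz || echo "not ELF; probably a compressed boot image"
```

This only checks the magic number; it does not prove that the file matches 
the running kernel.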

In the case of my machine:

  $ sudo opcontrol --vmlinux=/usr/src/linux-2.6.11/vmlinux

This machine is a dual Opteron.  If I wanted to profile each CPU separately, I 
would invoke it with:

  $ sudo opcontrol --separate=cpu --vmlinux=/usr/src/linux-2.6.11/vmlinux

to report profiling for each CPU.

Note that once per-CPU separation is enabled, it stays enabled; you have to 
explicitly shut it off for succeeding runs where you want the results pooled 
for both CPUs.

  $ sudo opcontrol --separate=none --vmlinux=/usr/src/linux-2.6.11/vmlinux

Next, start the profiling with:
 
  $ sudo opcontrol --start

When ready to collect info, do a 'sudo ls' to init the timeout on the sudo 
command so later ones don't ask for passwords.  Then, for an application (an 
executable called ncbo in the following example), and assuming that it has 
been compiled with the '-g' flag:

# first reset the counters:
  $ sudo opcontrol --reset
# execute the command
  $ /home/hjm/nco/bin/ncbo -h -O  --op_typ='-' -p /home/hjm/nco_bm  \
    ipcc_dly_T85.nc ipcc_dly_T85_00.nc /home/hjm/nco_bm/ipcc.diff.nc
# this command runs for > 60s, important as it's a statistical profiler
# when the program ends, dump the collected statistics
  $ opreport --exclude-dependent --demangle=smart --symbols > \
    oprofile.report.full.ncbo

The above stanza is meant to be run as a shell script or moused into a shell 
window so there is minimal delay from resetting the counters to running the 
program to generating the output.  This ensures that the profiling data is 
specific to the application that is running.
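
If you'd rather not paste the stanza by hand, it could be wrapped in a small 
shell function along these lines.  This is only a sketch: profile_once, 
OPCONTROL and OPREPORT are names made up here, and the opreport flags are 
simply the ones used above.

```shell
# Hypothetical wrapper around the reset / run / report stanza above.
# OPCONTROL and OPREPORT may be overridden, e.g. for a dry run.
: "${OPCONTROL:=sudo opcontrol}"
: "${OPREPORT:=opreport}"

# profile_once REPORT_FILE COMMAND [ARGS...]
profile_once() {
    report=$1; shift
    $OPCONTROL --reset || return 1      # clear the counters first
    "$@"                                # run the program being profiled
    status=$?
    # write the per-symbol report as soon as the program ends
    $OPREPORT --exclude-dependent --demangle=smart --symbols > "$report"
    return $status
}

# Usage (after 'sudo opcontrol --start'):
#   profile_once oprofile.report.full.ncbo /home/hjm/nco/bin/ncbo -h -O ...
```

The function returns the exit status of the profiled command, so a failed 
run is still visible to the caller.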

The output is a human-readable text file that will give you the time spent in 
each function.  The poll_idle time is the time which the CPU(s) spent doing 
NOTHING, i.e. idling.  For a lightly loaded dual-CPU machine, you would expect 
to obtain about 50% in poll_idle when running a single serial job.
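
The 50% figure is just arithmetic: with N CPUs and J CPU-bound serial jobs 
(and nothing else running, which is an idealization), roughly (N - J)/N of the 
samples should land in poll_idle.  A throwaway helper for that estimate 
(expected_idle_pct is a made-up name):

```shell
# expected_idle_pct CPUS JOBS: rough idle share in percent, assuming each
# job keeps exactly one CPU busy and the machine is otherwise idle.
expected_idle_pct() {
    cpus=$1
    jobs=$2
    # More jobs than CPUs cannot make the idle share negative.
    if [ "$jobs" -gt "$cpus" ]; then
        jobs=$cpus
    fi
    echo $(( (cpus - jobs) * 100 / cpus ))
}

# Usage:
#   expected_idle_pct 2 1    # dual-CPU box, one serial job
```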

Cleaning up after Oprofile.  Since Oprofile runs as a daemon, it adds a very 
small amount of CPU and memory overhead to a running system.  To remove that 
overhead, you have to explicitly kill the daemon:

  $ sudo opcontrol --shutdown  

This next part is not well-documented and only causes a problem if you want to 
run a PAPI-based profiler such as hpcrun.  You MUST remove the oprofile 
module, and this cannot be done via the usual 'rmmod oprofile' approach alone 
(oprofilefs has to be unmounted first).  There is a specific command to do it:

  $ sudo opcontrol --deinit

If the oprofile module is loaded and you try to run 'hpcrun' (even to get a 
list of available options), you'll get an unhelpful error like this:

  $ hpcrun -L
(pid 27342): PAPI library initialization failure - expected version 50397184, 
dynamic library was version -3. Aborting.

This is diagnostic (I believe) that the oprofile module is still loaded and 
that the perfctr and oprofile modules are fighting over the CPU.



Using hpcrun:
=============

Don't forget that in order for hpcrun to work, the perfctr module has to be 
modprobe-loaded AND /dev/perfctr has to be chmod'ed to 644.

Using the HPCToolkit:
First make sure that the oprofile module is not loaded:

  $ lsmod |grep oprofile

should return nothing.  If it gives you an indication that the oprofile module 
IS loaded, unload it with the command:

  $ sudo opcontrol --deinit

then load the perfctr module to allow the PAPI API access to the hardware 
counters.

  $ sudo modprobe perfctr
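
The checks described so far (oprofile unloaded, perfctr loaded, /dev/perfctr 
readable) can be collected into one preflight function.  This is a sketch 
only: hpcrun_preflight is a made-up name, and it reads lsmod-style text on 
stdin so that it can be tried out without touching the real modules.

```shell
# hpcrun_preflight [DEVICE]: read lsmod output on stdin and report anything
# that would stop a perfctr/PAPI-based profiler.  Hypothetical helper.
hpcrun_preflight() {
    dev=${1:-/dev/perfctr}
    mods=$(awk '{ print $1 }')          # first column holds the module names
    rc=0
    if printf '%s\n' "$mods" | grep -qx oprofile; then
        echo "oprofile is loaded; run: sudo opcontrol --deinit"
        rc=1
    fi
    if ! printf '%s\n' "$mods" | grep -qx perfctr; then
        echo "perfctr is not loaded; run: sudo modprobe perfctr"
        rc=1
    fi
    if [ ! -r "$dev" ]; then
        echo "$dev is not readable; chmod it to 644 as root"
        rc=1
    fi
    return $rc
}

# Usage:
#   lsmod | hpcrun_preflight && echo "OK to run hpcrun"
```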

After this, it is relatively straightforward.  Anything you want to profile, 
just run it behind the 'hpcrun' command:

  $ hpcrun (options) -- /home/hjm/nco/bin/ncbo -h -O --op_typ='-' \
    -p /home/hjm/nco_bm ipcc_dly_T85.nc ipcc_dly_T85_00.nc \
    /home/hjm/nco_bm/ipcc.diff.nc
  
The (options) are typically a set of hardware counters you want to access 
during the run.  On an Opteron, the available options can be listed by 
running:

  $ hpcrun -L |grep Yes
PAPI_L2_DCM     Yes     Level 2 data cache misses ()
PAPI_L2_ICM     Yes     Level 2 instruction cache misses ()
PAPI_FPU_IDL    Yes     Cycles floating point units are idle ()
PAPI_TLB_DM     Yes     Data translation lookaside buffer misses ()
PAPI_TLB_IM     Yes     Instruction translation lookaside buffer misses ()
PAPI_L1_LDM     Yes     Level 1 load misses ()
PAPI_L1_STM     Yes     Level 1 store misses ()
PAPI_L2_LDM     Yes     Level 2 load misses ()
PAPI_L2_STM     Yes     Level 2 store misses ()
PAPI_STL_ICY    Yes     Cycles with no instruction issue ()
PAPI_HW_INT     Yes     Hardware interrupts ()
PAPI_BR_TKN     Yes     Conditional branch instructions taken ()
PAPI_BR_MSP     Yes     Conditional branch instructions mispredicted ()
PAPI_TOT_INS    Yes     Instructions completed ()
PAPI_FP_INS     Yes     Floating point instructions ()
PAPI_BR_INS     Yes     Branch instructions ()
PAPI_VEC_INS    Yes     Vector/SIMD instructions ()
PAPI_RES_STL    Yes     Cycles stalled on any resource ()
PAPI_TOT_CYC    Yes     Total cycles ()
PAPI_L2_DCH     Yes     Level 2 data cache hits ()
PAPI_L1_DCA     Yes     Level 1 data cache accesses ()
PAPI_L2_DCR     Yes     Level 2 data cache reads ()
PAPI_L2_DCW     Yes     Level 2 data cache writes ()
PAPI_L2_ICH     Yes     Level 2 instruction cache hits ()
PAPI_L1_ICA     Yes     Level 1 instruction cache accesses ()
PAPI_L1_ICR     Yes     Level 1 instruction cache reads ()
PAPI_FML_INS    Yes     Floating point multiply instructions ()
PAPI_FAD_INS    Yes     Floating point add instructions ()
PAPI_FP_OPS     Yes     Floating point operations ()

These options can be requested by inserting them into the (options) space, for 
example, as:

  $ hpcrun -e PAPI_TOT_CYC:32767 -e PAPI_FP_OPS:32767 -e PAPI_FP_INS:32767 \
    -e PAPI_HW_INT:32767 -e PAPI_L2_DCM:32767 -- <command to profile>

[Don't forget the '--' separator between the hpcrun command chain and the 
application.]
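
Typing the same ':32767' after every event gets tedious, so the option list 
could be generated instead.  A sketch, with two assumptions: the number after 
the colon is treated as one value shared by all events (I take it to be the 
sampling period, which the example above does not spell out), and 
hpcrun_event_args is a made-up name.

```shell
# hpcrun_event_args PERIOD EVENT...: print "-e EVENT:PERIOD" for each event
# on one line.  Hypothetical helper, not part of the HPCToolkit.
hpcrun_event_args() {
    period=$1; shift
    args=
    for ev in "$@"; do
        # add a separating space only between entries
        args="$args${args:+ }-e $ev:$period"
    done
    printf '%s\n' "$args"
}

# Usage (the unquoted $(...) is split into separate words on purpose):
#   hpcrun $(hpcrun_event_args 32767 PAPI_TOT_CYC PAPI_FP_OPS) -- ./myprog
```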
    
hpcrun will profile EVERYTHING that results from the <command to profile> so 
if it's a shell command, it will profile every subcommand in the shell, 
giving each its own output file in the form of:
<app_name>.PAPI_TOT_CYC.clay.ess.uci.edu.10137.0
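
When a shell command leaves a pile of these files behind, it helps to sort 
them by application and event before picking one to process.  The helpers 
below are a sketch with made-up names; they rely only on the 
<app_name>.PAPI_XXX prefix shown above and leave the trailing fields 
uninterpreted.

```shell
# Hypothetical helpers for file names of the form <app_name>.PAPI_XXX.<rest>.

# Print the application name: everything before the first ".PAPI_".
hpcrun_file_app() {
    name=${1##*/}                       # drop any leading directory
    printf '%s\n' "${name%%.PAPI_*}"
}

# Print the event name: the PAPI_XXX component.
hpcrun_file_event() {
    name=${1##*/}
    rest=${name#*.PAPI_}                # text after the first ".PAPI_"
    printf 'PAPI_%s\n' "${rest%%.*}"    # up to the next dot
}

# Usage:
#   for f in *.PAPI_*; do
#       echo "$(hpcrun_file_app "$f") $(hpcrun_file_event "$f") $f"
#   done | sort
```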

The output files you're interested in can be processed into something usable 
with 'hpcquick', a perl script that calls some other HPC tools to generate 
the XML DB  (in its own subdirectory) that the java browser 'hpcviewer' 
needs.

#        src location     hpcrun output file to process
            vvvvvvv       vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
hpcquick -I src/nco    -P ncwa.PAPI_TOT_CYC.clay.ess.uci.edu.10137.0

# view the results via java hpcviewer
hpcviewer # and open the './hpcquick.dbxxx/hpcquick.hpcviewer' file.

This will open a java-based source and data browser that can show you where 
your application is spending time.

John Levon wrote:

> On Thu, Dec 15, 2005 at 07:03:54PM -0800, Harry Mangalam wrote:
> 
>> Does this mean that on kernels >2.5, the module can be unloaded safely
>> even if it IS an SMP system?
> 
> Yes. Use opcontrol --deinit (or unmount oprofilefs and unload directly).
> 
> john