Scientific Computing & Visualization

How to Use the Portland Group pgprof Profiler

On Katana, you can use the Portland Group's pgprof profiler, which supports multiple processes and multiple threads, to profile MPI and OpenMP codes written in FORTRAN or C. For more details, please consult the PGI Tools Guide (PDF).

Please note that only two licenses of pgprof are available on the system. If both are in use, you will have to wait for one to free up. When you are done using pgprof, please exit from it to avoid blocking others from accessing it.

Profiling MPI codes with pgprof

Profiling an MPI code requires four steps:
  1. Step 1. Set MPI_IMPLEMENTATION to mpich.
    
    katana:~ % setenv MPI_IMPLEMENTATION mpich
    katana:~ % printenv MPI_IMPLEMENTATION
    mpich
    
    The second command is optional; it confirms that mpich is the active MPI_IMPLEMENTATION.
  2. Step 2. Compile code with -Mprof=time,mpi,func
    
    katana:~ % mpif77 -o example example.f -Mprof=mpi,func
    
  3. Step 3. Run code to generate timing data.
    
    katana:~ % mpirun -np 4 example
    
    • Running the code interactively limits the number of tasks to 4.
    • Four output files will be generated for this run: pgprof.out, pgprof.out1, pgprof.out2, and pgprof.out3.
    • You will need to run the job in batch if more than 4 slots are needed or if the job requires more than a few minutes to run. Don't forget to set MPI_IMPLEMENTATION to mpich in your batch script:
      
        . . .
      
        export MPI_IMPLEMENTATION=mpich
      
        . . .
      
  4. Step 4. Run the profiler to display the performance data through the GUI.
    You will need an X server running because pgprof opens a GUI window.
    
    katana:~ % pgprof -exe example1
    

    In the pgprof GUI window, the elapsed time and the number of slots used (4 processes) are displayed immediately below the task bar at the top. The left sub-window shows the starting locations of "main" and of the functions "integral" and "fct" within the single source file example1_2.f. The function "integral" is called by main exactly once per process, and "integral" in turn calls "fct" 500 times. Since 4 slots are used, the work load for "fct" is 25%, which is also shown graphically as a red horizontal bar. All of this information is displayed in the top right sub-window. The bottom window shows the work load of "example1" (shaded in blue). Clicking on "fct" selects it, and the bottom window updates accordingly.

    • If pgprof -o example1 were used, the line near the top of the GUI window that reports the wall clock time would print "example1_2" instead of "a.out" as the executable name.
    • pgprof may also be invoked in text mode:
      
      katana:F77/tmp % pgprof -exe example1 -text
      
      pgprof 7.1-1
      Copyright 1989-2000, The Portland Group, Inc. All Rights Reserved.
      Copyright 2000-2007, STMicroelectronics, Inc. All Rights Reserved.
      
      
      Datafile  : pgprof.out
      Processes : 4
      Threads   : 4
      
      pgprof> print
      
      Profile output - Tue Nov 20 09:06:50 EST 2007
      Program                           : a.out
      Datafile                          : pgprof.out
      Process                           : 0
      Total Time for Process            : 0.100336 secs
      Sort by max time 
      Select all
      
      
                         Routine      Source  Line  
      Calls  Time(%)        Name        File   No.  
                                                    
          1       91  example1_2  example1.f     1  
        500        5         fct  example1.f    88  
          1        4    integral  example1.f    73  
      
      pgprof> 
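
Two easy mistakes in the MPI procedure above are forgetting to set MPI_IMPLEMENTATION (Step 1) and starting pgprof before every process has written its data file (Step 3). The following is a minimal sketch of both checks in POSIX shell. The function names are hypothetical, and the file-naming rule for runs other than 4 processes (pgprof.out for the first process, pgprof.out<k> for each further one) is an assumption extrapolated from the four-process example; it is not taken from the PGI documentation.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of pgprof or MPICH.

# Fail unless MPI_IMPLEMENTATION is set to mpich (Step 1).
require_mpich() {
    if [ "${MPI_IMPLEMENTATION:-}" != "mpich" ]; then
        echo "MPI_IMPLEMENTATION is '${MPI_IMPLEMENTATION:-unset}'; expected 'mpich'" >&2
        return 1
    fi
}

# Print the data file names expected from a run with $1 processes.
# Assumption: pgprof.out, then pgprof.out1, pgprof.out2, ... as in the
# 4-process example above.
expected_profile_files() {
    np=$1
    echo "pgprof.out"
    rank=1
    while [ "$rank" -lt "$np" ]; do
        echo "pgprof.out${rank}"
        rank=$((rank + 1))
    done
}

# Print every expected data file that is missing from directory $2
# (default: current directory). No output means all files are present.
missing_profile_files() {
    dir=${2:-.}
    for f in $(expected_profile_files "$1"); do
        [ -e "${dir}/${f}" ] || echo "$f"
    done
}
```

In a Bourne-type shell, one might call `require_mpich` before `mpirun` and `missing_profile_files 4` afterwards; an empty result from the second call means all four data files exist and pgprof can be started.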
      

Profiling OpenMP codes with pgprof

    To profile an OpenMP code with pgprof, follow the steps below:

    • Step 1. Compile the program using the appropriate PGI compiler with -Mprof=func to turn on function-level profiling.
      
      katana:~ % pgf77 -o example example.f -Mprof=time,func -O3
      
    • Step 2. Run the compiled code (either interactively or in batch) to generate a single pgprof.out data file.
    • Step 3. Run the pgprof profiler to display the profiling data.
      By default, a GUI window is launched that reports the profiling data read from the file pgprof.out. You will need X server software for this to work. At BU, if you access Katana from a Windows-based PC, you can download x-win32 for free.

      • GUI Method
        
        katana:~ % pgprof -exe example1
        

        In the pgprof GUI window, the elapsed time is displayed immediately below the task bar at the top. At the bottom right corner there is a drop-down menu; for OpenMP applications, select "Threads". The top left sub-window shows "main" as well as the functions "integral" and "fct", all belonging to the same source file, example1.f. The bottom window shows the activity of every thread for the active function, which is highlighted in blue in the top left window. In this case, "main" runs on the master thread, so only thread 0 shows any activity. The function "integral" is called by main exactly once per thread (there are 4 threads). In turn, "integral" calls "fct" 500 times per thread, for a total of 2,000 calls (see the top right window).

        Clicking "fct" makes the bottom window show the corresponding per-thread activity for that function.

      • Text Method
        
        katana:OpenMP/F77omp % pgprof -text -exe example1
        
        pgprof 7.1-1
        Copyright 1989-2000, The Portland Group, Inc. All Rights Reserved
        Copyright 2000-2007, STMicroelectronics, Inc. All Rights Reserved
        
        
        Datafile  : pgprof.out
        Processes : 1
        Threads   : 4
        
        pgprof> print
        
        Profile output - Sun Nov 18 09:47:21 EST 2007
        Program                           : a.out
        Datafile                          : pgprof.out
        Process                           : 0
        Total Time for Process            : 0.044703 secs
        Sort by max time 
        Select all
        
        
                         Routine      Source  Line  
        Calls  Time(%)      Name        File   No.  
                                                    
            1       51  example1  example1.f     1  
        2,000       29       fct  example1.f    59  
            4       20  integral  example1.f    44  
        
        pgprof> quit
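
The routine table printed in text mode is regular enough to post-process with standard tools. The sketch below assumes the output of the `print` command has been saved to a plain text file (for example, by copying it from the terminal); the file name report.txt and the function names are hypothetical. It keeps only the five-column rows of the table, strips the thousands separator from the call count, and prints the routine name, calls, and time percentage.

```shell
#!/bin/sh
# Sketch only: parses a saved copy of the pgprof text-mode "print" listing.
# Function names and the input file are hypothetical.

# Print "name calls time%" for every routine row in the report file $1.
# A routine row has exactly five fields: calls, time(%), name, file, line.
report_rows() {
    awk '
        NF == 5 && $1 ~ /^[0-9,]+$/ && $2 ~ /^[0-9]+$/ && $5 ~ /^[0-9]+$/ {
            calls = $1
            gsub(/,/, "", calls)    # "2,000" -> "2000"
            print $3, calls, $2
        }
    ' "$1"
}

# Print the name of the routine with the largest time percentage.
top_routine() {
    report_rows "$1" | sort -k3,3nr | head -n 1 | cut -d' ' -f1
}
```

Applied to the OpenMP listing above, `report_rows report.txt` yields the three lines `example1 1 51`, `fct 2000 29`, and `integral 4 20`. The 2000 calls to "fct" match the 500 calls per thread on 4 threads described for the GUI.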
        
        

Boston University
OIT | CCS | September 16, 2008  