Scientific Computing & Visualization

Programming for the IBM pSeries

The primary purpose of the SCV pSeries computer system is to run multiprocessor applications. The system consists of the login node, twister.bu.edu, along with 8 additional compute nodes (for a total of 72 processors) dedicated to batch processing. Users can log in to the login node for code development, compilation, and running jobs. All long-running jobs must be submitted to batch to run on the compute nodes.

If you have an existing code that you plan to run on the pSeries, please go to the Code Porting page to learn about characteristics of the system that may affect your application. If your code requires external packages (e.g., linear algebra subroutines), go to the packages page to see what is available. Please visit Running Jobs for information on running jobs on the pSeries.

Compiling  

On the IBM pSeries system, the primary parallel processing paradigms are MPI and OpenMP. Included below is a complete list of compilers for FORTRAN, C, and C++.

Language     Serial   Default File Suffix   MPI       OpenMP or auto-parallel   MPI+OpenMP
FORTRAN 77   xlf      .f                    mpxlf     xlf_r                     mpxlf_r
FORTRAN 90   xlf90    .f, .f90              mpxlf90   xlf90_r                   mpxlf90_r
             f90      .f90
FORTRAN 95   xlf95    .f, .f90              mpxlf95   xlf95_r                   mpxlf95_r
             f95      .f, .f90
C            cc       .c                    mpcc      cc_r                      mpcc_r
             xlc      .c                              xlc_r
C++          xlC      .C, .c                mpCC      xlC_r                     mpCC_r

Note in the above table that, for parallel processing using the shared-memory model OpenMP, you should always compile with a compiler whose name ends with "_r" to guarantee "thread-safe" computation. A piece of code (e.g., a subroutine) is said to be thread-safe if it can be executed simultaneously and safely by multiple threads.
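
To make the idea of thread safety concrete, the following C sketch contrasts a routine that keeps hidden static state (not thread-safe) with a variant whose state is supplied by the caller. The function names and the generator constants are illustrative only and are not part of any pSeries library.

```c
#include <stdint.h>

/* NOT thread-safe: the hidden static state is shared by every caller,
 * so two threads calling this at the same time race on `state`. */
uint32_t next_value_unsafe(void)
{
    static uint32_t state = 1u;
    state = state * 1664525u + 1013904223u;  /* illustrative update step */
    return state;
}

/* Thread-safe: each caller (e.g., each thread) passes its own state,
 * so no data is shared between concurrent calls. */
uint32_t next_value_safe(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;
    return *state;
}
```

With the second form, each OpenMP thread can keep a private state variable and call the routine without interfering with the others.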

Some compiler flags for FORTRAN, C, and C++:

  • Automatic parallelization: use -qsmp flag, e.g.,
    twister% xlf_r -qsmp example.f -o example
    To run:
    twister% setenv OMP_NUM_THREADS 4
    twister% example
  • Parallelize with OpenMP
    twister% xlf_r -qsmp=omp example.f -o example
    To run:
    twister% setenv OMP_NUM_THREADS 4
    twister% example
  • 64-bit addressing: -q64
    twister% xlf -q64 example.f
  • Most aggressive optimization: -O5
    twister% xlf -O5 example.f
  • The default data segment limit is 256 MB. To increase it to 2 GB (the maximum allowable value), specify -bmaxdata:0x80000000 (for comparison, 0x40000000 is 1 GB in hex). Note that if you elect to use 64-bit addressing (i.e., -q64), you get practically unlimited addressing. In this case, DO NOT use -bmaxdata.
  • Optimization levels:
    • default = none
    • -O = standard optimization
    • -O2 = same as -O
    • -O3 -qstrict = more optimization than -O2; preserves semantics
    • -O3 = more aggressive optimization
    • -O4 = -O3 + -qhot + -qipa + -qarch=auto
    • -O5 = -O3 + -qhot + -qipa=level=2 + -qarch=auto
  • Architecture-specific switches
    • -qarch=ARCH (where ARCH is one of auto, pwr5, ...)
    • -qtune=TUNE (where TUNE is one of auto, pwr5, ...)
  • For C++, include the header file mpi.h as usual wherever MPI is used. In addition, to turn on the C++ headers in the mpi.h file, the following flag must be included during compilation:
    twister% mpCC -D_MPI_CPP_BINDINGS ...
  • C Preprocessor
    If your FORTRAN file contains cpp lines (like #define, #include, ...), there are several ways to make the compiler recognize them:
    • change the suffix from .f to .F (supported by many compilers)
    • compile with -qsuffix=cpp=f (compiler specific)
      (If you also want to see how the compiler handles the cpp statements, add "-d" to the compile line. If your file is called bar.f, a file Fbar.f will be created. This file contains, in addition to the FORTRAN part, the expanded cpp statements.)
  • Function inlining
    • -Q — inline all subprograms deemed appropriate by compiler
    • -Q+fct1:fct2:...:fctn — inline specific subprograms
    • -Q-fct1:fct2:...:fctn — do not inline specific subprograms
    • Alternatively, use the -qipa switch; see, for example, the xlf man page
  • Have the compiler generate listing information
    • -qlist
    • -qsource
    • -qreport
  • Array-bound check
    • -C — run time array bounds checking
    • -qcheck — long form of -C
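
Returning to the -bmaxdata flag in the list above: its argument is a byte count written in hexadecimal. As a quick check of the arithmetic, this small C sketch converts a byte count to megabytes. The helper name is hypothetical and used only for illustration.

```c
/* Convert a byte count, such as a -bmaxdata argument, to megabytes
 * (1 MB taken as 1024 * 1024 bytes). Hypothetical helper. */
unsigned long long bytes_to_mb(unsigned long long bytes)
{
    return bytes / (1024ULL * 1024ULL);
}
```

Under this definition, bytes_to_mb(0x10000000ULL) gives 256 (the default limit), bytes_to_mb(0x40000000ULL) gives 1024 (1 GB), and bytes_to_mb(0x80000000ULL) gives 2048 (2 GB).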

Mixing C with FORTRAN 

  • When calling a FORTRAN routine from within a C program, DO NOT append an underscore ( _ ) to the routine name
  • On the other hand, append an underscore ( _ ) if you are using the following library functions: dtime (dtime_), etime (etime_), flush (flush_)
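
A minimal sketch of this calling convention follows, assuming a FORTRAN subroutine named vecsum with the interface "subroutine vecsum(n, x, total)"; the routine and its name are hypothetical. FORTRAN passes arguments by reference, so the C side passes pointers and uses the routine name without a trailing underscore. To keep the sketch self-contained, a C stand-in with the same interface is defined here; in a real build that definition would be replaced by the compiled FORTRAN object file.

```c
/* Prototype for the FORTRAN routine as seen from C: no trailing
 * underscore, and every argument is passed by reference. */
void vecsum(int *n, double *x, double *total);

/* Stand-in for the FORTRAN implementation (assumption: in real use
 * this body lives in a .f file compiled with xlf). */
void vecsum(int *n, double *x, double *total)
{
    int i;
    *total = 0.0;
    for (i = 0; i < *n; i++)
        *total += x[i];
}

/* C caller: passes addresses, as FORTRAN expects. */
double sum_three(void)
{
    int n = 3;
    double x[3] = {1.0, 2.0, 3.0};
    double total;
    vecsum(&n, x, &total);
    return total;
}
```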

Code compilation examples for MPI and OpenMP

  • MPI
    • Compilation examples for MPI.

      To enable MPI, use the appropriate compiler with the "mp" prefix. You don't have to link explicitly with the MPI library; this is done automatically.
      twister% mpxlf mycode.f
      twister% mpxlf90 mycode.f90
    • How to compile with 64-bit addressing.
      Include "-q64" switch.
  • OpenMP
    • Compilation examples for OpenMP.
      twister% xlf_r -qsmp=omp example.f
      twister% xlf90_r -qsmp=omp example.f90
    • Running OpenMP codes. Example:
      twister% setenv OMP_NUM_THREADS 4
      twister% a.out
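
For C codes, a comparable minimal OpenMP sketch is shown below; the function name is illustrative. By analogy with the FORTRAN commands above, it would presumably be compiled with a thread-safe compiler such as xlc_r together with -qsmp=omp, and the number of threads would be taken from OMP_NUM_THREADS at run time. If OpenMP is not enabled, the pragma is ignored and the loop runs serially with the same result.

```c
/* Sum the integers 1..n. The loop iterations are divided among the
 * threads, and the reduction clause gives each thread a private
 * partial sum that is combined when the loop ends. */
long parallel_sum(long n)
{
    long i, sum = 0;
#pragma omp parallel for reduction(+:sum)
    for (i = 1; i <= n; i++)
        sum += i;
    return sum;
}
```

The reduction clause is what keeps the update of sum free of races; without it, all threads would write to the same variable at once.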

File Access  

  • All the IBM pSeries compute nodes share the same file system.
  • SCRATCH DISK - Occasionally, jobs produce large temporary output files. For these situations, a "scratch" disk is available on each node, and there is no need to apply for space on it. Note that, unlike your home directory, /scratch is NOT BACKED UP, and files created there are removed by the system after 10 days. Note also that each pSeries compute node has its own /scratch space, which matters for batch jobs that need the scratch disk. On twister, the interactive node, /scratch refers to /twister/scratch. To avoid ambiguity, always use the full path /NODE/scratch, where NODE is the name of the node whose scratch disk you want to access; this may or may not be the node on which the job runs. It is, however, advisable to perform I/O locally for better efficiency. For instance, you could specify /scrabble/scratch if you submit a 32-processor batch job (please see Technical Summary for available queues). For many users, it suffices to always use /twister/scratch for both input and output files, regardless of the batch queue; this is the most convenient choice in most situations. However, if you have extraordinarily large output files, or if /twister/scratch cannot accommodate your needs, write to the scratch disk of the node associated with the queue.
  • You can access your home directory on any of the three systems (i.e., pSeries, Blue Gene, and Linux Cluster) by prepending /ibm or /linux to the path, e.g.,
    twister% cd /linux/usr1/scv/kadin
    This takes you from the pSeries to the home directory on the Linux Cluster and Blue Gene (they share the same file system).
    cootie% cd /ibm/usr1/scv/kadin
    This takes you from the Linux file system to the pSeries file system.
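
The /NODE/scratch convention described above can also be applied inside a program. The following C sketch builds the full scratch path for a given node name; the function name is hypothetical, and the node name would come from wherever your job records it.

```c
#include <stdio.h>

/* Write "/NODE/scratch" into buf for the given node name.
 * Returns 0 on success, or -1 if formatting fails or the buffer
 * is too small to hold the complete path. */
int scratch_path(char *buf, size_t size, const char *node)
{
    int n = snprintf(buf, size, "/%s/scratch", node);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

For example, passing "twister" as the node name fills the buffer with /twister/scratch, which avoids relying on the node-dependent meaning of plain /scratch.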

Programming Tools  

  • Debuggers
    • pdbx debugger
    • idebug IBM GUI-based Distributed Debugger
    • dbx The basic command-line debugger
    • gdb debugger (with ddd GUI front-end)
  • Profilers (to report on the performance of a computer program)
    • prof -- a flat profiler.
    • gprof -- a call-graph profiler.
    • pct -- a performance collection tool.
Boston University
OIT | CCS | September 15, 2009  