Programming for the IBM pSeries
The SCV pSeries computer system is intended primarily for multiprocessor applications. The system
consists of the login node, twister.bu.edu, along with 8 additional
compute nodes (for a total of 72 processors) dedicated to batch processing. The login node is available for users to log in
for code development, compilation, and running jobs. All long-running jobs must be
submitted to batch to run on the compute nodes.
If you have existing code that you plan to run on the pSeries, please go to the
Code Porting page to learn about characteristics of the system that may affect your application.
If your code requires external packages (e.g., linear algebra subroutines), go to the packages page to see what is available.
Please visit Running Jobs for information
related to running jobs on the pSeries.
Compiling
On the IBM pSeries system, the primary parallel processing paradigms are MPI and OpenMP. Included below is a complete list of compilers for
FORTRAN, C, and C++.
Note in the table above that, for parallel processing with the shared-memory
OpenMP model, you should always compile with a compiler whose name ends with
"_r" to guarantee "thread-safe" computation. A piece of code (e.g., a subroutine)
is said to be thread-safe if it can be executed simultaneously and safely by
multiple threads.
Some compiler flags for FORTRAN, C, and C++:
- Automatic parallelization: use the -qsmp flag, e.g.,
twister% xlf_r -qsmp example.f -o example
To run:
twister% setenv OMP_NUM_THREADS 4
twister% example
- Parallelize with OpenMP: use the -qsmp=omp flag, e.g.,
twister% xlf_r -qsmp=omp example.f -o example
To run:
twister% setenv OMP_NUM_THREADS 4
twister% example
- 64-bit addressing: -q64
twister% xlf -q64 example.f
- Most aggressive optimization: -O5
twister% xlf -O5 example.f
- Data segment limit: the default is 256 MB. To increase it to 2 GB
(the maximum allowable value), specify -bmaxdata:0x80000000
(for comparison, 0x40000000 is 1 GB). Note that if you elect to use
64-bit addressing (i.e., -q64), you get a practically
unlimited address space. In this case, DO NOT use -bmaxdata.
- Optimization levels:
- default = none
- -O = standard optimization
- -O2 = same as -O
- -O3 -qstrict = more optimization than -O2; preserves semantics
- -O3 = more aggressive optimization
- -O4 = -O3 + -qhot + -qipa + -qarch=auto
- -O5 = -O3 + -qhot + -qipa=level=2 + -qarch=auto
- Architecture-specific switches
- -qarch=ARCH (where ARCH is one of auto, pwr5, ...)
- -qtune=TUNE (where TUNE is one of auto, pwr5, ...)
- For C++ MPI codes, include the header file
mpi.h as usual. In addition, to enable
the C++ declarations in the mpi.h file, the following flag
must be included during compilation:
twister% mpCC -D_MPI_CPP_BINDINGS ...
- C Preprocessor
If your FORTRAN file contains cpp lines (such as #define, #include, ...), there
are several ways to make the compiler recognize them:
- change the suffix from .f to .F (supported by many compilers)
- compile with -qsuffix=cpp=f (compiler specific)
If you also want to see how the compiler handles the
cpp statements, add "-d" to the compile line. If your
file is called bar.f, a file Fbar.f will be created.
This file contains the FORTRAN source with the
cpp statements expanded.
- Function inlining
- -Q inline all subprograms deemed appropriate by the compiler
- -Q+fct1:fct2:...:fctn inline the specified subprograms
- -Q-fct1:fct2:...:fctn do not inline the specified subprograms
- alternatively, use the -qipa switch; see, for example, the xlf man page
- Have compiler generate information
- Array-bound check
- -C run time array bounds checking
- -qcheck long form of -C
Mixing C with FORTRAN
- When calling a FORTRAN routine from within a C program, DO NOT
append an underscore ( _ ) to the routine name.
- On the other hand, DO append an underscore ( _ ) if you are
calling the following library functions: dtime (dtime_), etime
(etime_), flush (flush_).
Code compilation examples for MPI and OpenMP
- MPI
- Compilation examples for MPI.
To enable MPI, use the appropriate compiler with the "mp" prefix.
You don't have to link with the MPI library explicitly; it is done
automatically.
twister% mpxlf mycode.f
twister% mpxlf90 mycode.f90
- How to compile with 64-bit addressing.
Include the "-q64" switch.
- OpenMP
- Compilation examples for OpenMP.
twister% xlf_r -qsmp=omp example.f
twister% xlf90_r -qsmp=omp example.f90
- Running OpenMP codes. Example:
twister% setenv OMP_NUM_THREADS 4
twister% a.out
File Access
- All the IBM pSeries compute nodes share the same file system.
- SCRATCH DISK - Occasionally, jobs produce large temporary output files.
For these situations, a "scratch" disk is available on each node. There is no
need to apply for space on these disks. Note that, unlike your home directory,
/scratch is NOT BACKED UP and files created there will be removed by the system
after 10 days.
Note also that each pSeries compute node has its own /scratch space.
This is important for batch jobs that require access to
the scratch disk. When you are on twister, the interactive node, a
reference to /scratch points to /twister/scratch. To avoid ambiguity,
you should always use the full reference of /NODE/scratch, where
NODE is the name of the node whose scratch disk you want to
access, which may or may not be the node on which the job runs.
It is, however, advisable to perform I/O locally for better
efficiency.
For instance, you could specify /scrabble/scratch if you
submit a 32-processor batch job (please see Technical Summary for available queues). For many users, it suffices to
always refer to /twister/scratch as the home for both input and
output files, regardless of the batch queue. This is the most convenient
choice in most situations. However, if you have extraordinarily large
output files or if /twister/scratch cannot accommodate your needs,
writing to the scratch disk of the node associated with the queue is
advised.
- You can access your home directories across the three systems (i.e., pSeries, Blue Gene, and Linux Cluster)
by prepending /ibm or /linux to the path:
twister% cd /linux/usr1/scv/kadin
This takes you from the pSeries to your home directory on the Linux Cluster and Blue Gene
(they share the same file system).
cootie% cd /ibm/usr1/scv/kadin
This takes you from the Linux file system to the pSeries file system.
Programming Tools
- Debuggers
- pdbx debugger
- idebug -- IBM GUI-based Distributed Debugger
- dbx -- the basic command-line debugger
- gdb debugger (with ddd GUI front-end)
- Profilers (to report on the performance of a computer program)
- prof -- a flat profiler.
- gprof -- a call-graph profiler.
- pct -- a performance collection tool.