Some tips and tricks 🙂
1. Get it debugged and running in serial
- use gdb, and don’t forget to compile with -g and -Wall. Did you eliminate or address all the warnings?
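A small illustration of why the warnings matter. The function `mean` is hypothetical and not from any particular code. The guard shown in the comment compiles, but GCC and Clang flag it under -Wall (-Wparentheses, an assignment used as a truth value). Left unfixed, it makes the function return NaN for every input. That is exactly the kind of bug that is far easier to track down in serial under gdb than in a 64-rank job.

```c
#include <stddef.h>

/* Hypothetical example: arithmetic mean of n values, 0.0 for an empty array.
 *
 * The buggy variant of the guard,
 *
 *     if (n = 0)            // assignment, not comparison
 *         return 0.0;
 *
 * compiles, but `gcc -g -Wall` warns about it. It sets n to 0, so the branch
 * is never taken, the loop never runs, and the function returns 0.0 / 0.0,
 * i.e. NaN, for every input. */
double mean(const double *x, size_t n)
{
    if (n == 0)              /* comparison, as intended */
        return 0.0;

    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += x[i];
    return sum / (double)n;
}
```

With -g in place, `break mean` followed by `print n` in gdb shows the clobbered value directly.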
2. Know your memory
- run pmap on your application at least once to get a snapshot of its memory map, paying particular attention to the stacks and the total. Did you request enough memory through PBS (or whichever scheduler you use)?
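pmap (e.g. `pmap -x <pid>`) gives you the snapshot from outside the process. On Linux you can also log similar numbers from inside the job, which helps when it runs on a compute node you can't easily log in to. The sketch below assumes Linux's /proc/self/status format, where fields such as VmRSS (current resident size), VmHWM (peak resident size) and VmStk (stack) are reported in kB. `proc_status_kb` is a hypothetical helper name.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (Linux only): return the value in kB of a field such as
 * "VmRSS", "VmHWM" or "VmStk" from /proc/self/status, or -1 if the file or
 * the field cannot be read. */
long proc_status_kb(const char *field)
{
    FILE *fp = fopen("/proc/self/status", "r");
    if (fp == NULL)
        return -1;

    char line[512];
    size_t len = strlen(field);
    long value = -1;
    while (fgets(line, sizeof line, fp) != NULL) {
        /* Lines look like "VmRSS:\t    1234 kB"; require the colon so that
         * one field name can't match as a prefix of another. */
        if (strncmp(line, field, len) == 0 && line[len] == ':') {
            value = strtol(line + len + 1, NULL, 10);  /* skips leading whitespace */
            break;
        }
    }
    fclose(fp);
    return value;
}
```

For example, `printf("peak RSS: %ld kB\n", proc_status_kb("VmHWM"));` at the end of a run tells you how much of the memory you requested you actually used.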
3. Know your code
- profile it with callgrind (valgrind --tool=callgrind), and do this with your serial code. Given that profile, does your parallel strategy make sense in light of Amdahl’s law?
- solve your leaks and avoid realloc; valgrind/memcheck is your friend, and again, do this with your serial code. Keep in mind that your scheduler sets hard limits, lower than what the node physically has: either some default or whatever you explicitly request in your job submission script. Which brings me to...
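To make the Amdahl's law question concrete, suppose a fraction p of the serial runtime can be parallelised and you use n workers. The best speedup you can hope for is then S(n) = 1 / ((1 - p) + p/n), which approaches 1/(1 - p) as n grows. Below is a minimal C sketch; the function names are illustrative. Bear in mind that callgrind counts instructions rather than wall-clock time, so any p you estimate from its profile is only an approximation.

```c
/* Illustrative helpers for Amdahl's law in the ideal case.
 * p is the parallelisable fraction of the serial runtime (0 <= p <= 1) and
 * n >= 1 the number of workers. Communication, load imbalance and other
 * overheads are ignored, so real speedups will be lower. */
double amdahl_speedup(double p, int n)
{
    return 1.0 / ((1.0 - p) + p / (double)n);
}

/* Upper bound on the speedup as n -> infinity; only meaningful for p < 1. */
double amdahl_limit(double p)
{
    return 1.0 / (1.0 - p);
}
```

For instance, with p = 0.9, 16 workers give at most about 6.4x, and no number of workers gets you past 10x. The serial 10% is where to look before scaling up.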
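Preallocating from a known size is often the simplest way to avoid realloc. When you can't avoid it altogether, at least avoid its classic leak: `p = realloc(p, n)` overwrites your only pointer to the old block with NULL if the call fails. Here is a minimal sketch of the safe pattern, using a hypothetical growable array type `dvec`:

```c
#include <stdlib.h>

/* Hypothetical growable array of doubles, used to illustrate safe realloc. */
typedef struct {
    double *data;
    size_t len;
    size_t cap;
} dvec;

/* Append one value; returns 0 on success, -1 on allocation failure.
 * The result of realloc goes into a temporary first. Writing
 * `v->data = realloc(v->data, ...)` would lose, and so leak, the old block
 * whenever realloc returns NULL. */
int dvec_push(dvec *v, double x)
{
    if (v->len == v->cap) {
        /* Geometric growth keeps the number of realloc calls small. */
        size_t new_cap = v->cap ? 2 * v->cap : 16;
        double *tmp = realloc(v->data, new_cap * sizeof *tmp);
        if (tmp == NULL)
            return -1;          /* v->data is still valid and still ours */
        v->data = tmp;
        v->cap = new_cap;
    }
    v->data[v->len++] = x;
    return 0;
}

/* Release the buffer and reset the array to its empty state. */
void dvec_free(dvec *v)
{
    free(v->data);
    v->data = NULL;
    v->len = v->cap = 0;
}
```

Start from `dvec v = {NULL, 0, 0};` (realloc on a NULL pointer behaves like malloc). Then run your serial test case under `valgrind --leak-check=full` to confirm nothing is lost.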
4. Know your limits
- run ulimit -a and pay particular attention to the stack size; it’s easy to blow, particularly if you insist on declaring massive fixed-size local arrays, which end up on the stack.
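Two things help here, sketched in C under POSIX assumptions. First, the job can query its own stack limit with getrlimit(RLIMIT_STACK, ...). This is worth logging, because the limit inside a batch job may differ from the one on the login node. Second, large arrays can go on the heap instead of the stack. `stack_limit_bytes`, `alloc_matrix` and the size `N` are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>
#include <sys/resource.h>

#define N 2000   /* hypothetical problem size: N*N doubles is about 32 MB */

/* Hypothetical helper: soft stack limit in bytes (`ulimit -s` reports the
 * same value in kB), or -1 if it is unlimited or cannot be queried. */
long long stack_limit_bytes(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0 || rl.rlim_cur == RLIM_INFINITY)
        return -1;
    return (long long)rl.rlim_cur;
}

/* Risky: as a local variable,
 *
 *     double a[N][N];
 *
 * needs about 32 MB of stack, more than a common 8 MB default allows, and
 * typically ends in a segfault. The (hypothetical) heap version below is
 * limited only by the memory your job has been given. */
double *alloc_matrix(size_t rows, size_t cols)
{
    if (cols != 0 && rows > SIZE_MAX / sizeof(double) / cols)
        return NULL;                              /* size would overflow */
    return calloc(rows * cols, sizeof(double));   /* zero-initialised */
}
```

Index the result as `m[i * cols + j]` and `free(m)` when done.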
5. Know your compiler: there are many useful but lesser-used flags you can set
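One example of such a flag paying off: GCC and Clang both support `-fsanitize=address,undefined`. It instruments the build so that out-of-bounds accesses and many kinds of undefined behaviour are reported where they happen, instead of silently corrupting results. The stencil below is hypothetical. The off-by-one in its comment is the kind of bug AddressSanitizer catches.

```c
#include <stddef.h>

/* Hypothetical 3-point smoothing stencil; endpoints are copied unchanged.
 *
 * A typical off-by-one,
 *
 *     for (size_t i = 1; i < n; ++i)     // reads in[n] on the last pass
 *
 * often runs without complaint. Built with
 *
 *     gcc -g -fsanitize=address,undefined ...
 *
 * AddressSanitizer stops at the first read past the end of `in` and reports
 * the offending source line. */
void smooth3(const double *in, double *out, size_t n)
{
    if (n < 3) {                        /* too short to smooth: plain copy */
        for (size_t i = 0; i < n; ++i)
            out[i] = in[i];
        return;
    }
    out[0] = in[0];
    out[n - 1] = in[n - 1];
    for (size_t i = 1; i + 1 < n; ++i)  /* interior points only */
        out[i] = (in[i - 1] + in[i] + in[i + 1]) / 3.0;
}
```

Sanitized builds are noticeably slower, so they belong in your serial debugging runs rather than in production jobs.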
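6. Know your thread programming pitfalls, e.g.
- a thread’s stack size is not the one ulimit -s reports; with OpenMP, the worker threads’ stacks are set with the OMP_STACKSIZE environment variable
- gratuitous critical/atomic sections/ops: they serialize your code, so find the actual races with valgrind --tool=helgrind and protect only those
- races, e.g. from missing or wrong shared/private clauses; again, use valgrind --tool=helgrind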
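The shared/private and critical/atomic pitfalls often meet in a simple reduction. Below is a minimal OpenMP sketch in C; compile it with -fopenmp. `parallel_sum` is a hypothetical name. `default(none)` forces you to state the data-sharing of every variable. The `reduction` clause replaces both the racy and the over-synchronised versions described in the comment.

```c
#include <stddef.h>

/* Hypothetical example: sum x[0..n-1] with OpenMP. Compile with -fopenmp;
 * without it the pragma is ignored and the loop simply runs serially.
 *
 * Racy:       `sum` is shared by default, so concurrent `sum += x[i]`
 *             updates get lost.
 * Gratuitous: wrapping `sum += x[i]` in `#pragma omp critical` (or atomic)
 *             is correct but serialises every single iteration.
 * Below:      reduction(+:sum) gives each thread a private partial sum and
 *             combines them once at the end; default(none) makes the
 *             compiler reject any variable whose sharing you didn't state.
 * Floating-point addition is not associative, so the result may differ from
 * the serial sum in the last bits. */
double parallel_sum(const double *x, long n)
{
    double sum = 0.0;
    #pragma omp parallel for default(none) shared(x, n) reduction(+:sum)
    for (long i = 0; i < n; ++i)
        sum += x[i];
    return sum;
}
```

When checking such code with Helgrind, note that the Valgrind manual warns of false positives with GCC's OpenMP runtime (libgomp) unless GCC was configured with --disable-linux-futex.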
7. Know some signals and system calls so you can run strace and interpret its output, e.g. “See that SIGFPE? That’s bad.”
8. Avoid I/O
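When some output is unavoidable, fewer and larger writes hurt less than many small ones. Here is a minimal sketch with hypothetical helper names: it dumps an array with one binary fwrite instead of one fprintf per element. The trade-off is that the file is not portable between machines with a different byte order or floating-point format.

```c
#include <stdio.h>

/* Hypothetical helper: write n doubles in a single binary fwrite call;
 * returns 0 on success, -1 on a short write. Compared with a loop of
 * fprintf(fp, "%g\n", ...), this does no text formatting and makes one
 * library call instead of n. */
int write_doubles(FILE *fp, const double *x, size_t n)
{
    return fwrite(x, sizeof *x, n, fp) == n ? 0 : -1;
}

/* Hypothetical counterpart: read n doubles back; 0 on success, -1 on a
 * short read. */
int read_doubles(FILE *fp, double *x, size_t n)
{
    return fread(x, sizeof *x, n, fp) == n ? 0 : -1;
}
```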
9. Debug in parallel, e.g. mpirun -np 4 valgrind --log-file=vg.%p.log my_application (valgrind replaces %p with the process ID, so each rank writes its own log)