- If you are eligible for the Intel Compiler free download, get it from the Free Non-Commercial Intel Compiler Download entry
- Build OpenMPI with the Intel Compiler
- Install FFTW. Remember to install FFTW-2.1.x and not FFTW-3.x, or you will face the error fft3d.h(164): catastrophic error: could not open source file "fftw.h". Read the LAMMPS "Getting Started" section for more information
- When you are ready to compile, there are several "Make" selections found at "$SOURCE/lammps-30Mar10/src/MAKE". I chose Makefile.openmpi. By default you do not need to edit Makefile.openmpi, but if you are a guru and want to edit the file, feel free to
- Finally, go back up to the src directory (i.e. $SOURCE/lammps-30Mar10/src) and compile:
cd ..
make openmpi -j    (-j for parallel compilation)
- At the end of the compilation, you should see an lmp_openmpi binary in the src directory. You are almost done
- Check that the executable is properly linked by doing a
# ldd lmp_openmpi
- Remember to include /usr/local/lib in LD_LIBRARY_PATH if libmpi_cxx.so.0 is located in /usr/local/lib (see the sketch below)
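As an illustration, a minimal sketch assuming a bash shell and that the Open MPI libraries live under /usr/local/lib (adjust the path to your own installation):
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
ldd lmp_openmpi | grep "not found"    # should print nothing once every library is resolved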
Thursday, April 29, 2010
Installing lammps using Intel Compilers, OpenMPI and FFTW
This is an entry on how I installed LAMMPS using the Intel Compilers, OpenMPI and FFTW
fft3d.h(164): catastrophic error: could not open source file "fftw.h"
I was compiling the LAMMPS Molecular Dynamics Simulator.
- Using $SOURCE/lammps-30Mar10/src
- I compiled using the linux Makefile. Quite soon, I encountered the following error: fft3d.h(164): catastrophic error: could not open source file "fftw.h"
- I had compiled my FFTW-3 and my Intel Math Kernel Library properly and was able to locate the header in my Intel Math Kernel Library. I had correctly added the paths to LD_LIBRARY_PATH and /etc/ld.so.conf.d, but LAMMPS was still not able to locate the library.
- I realised the crux of the problem was that LAMMPS requires FFTW-2.1.x. After configuring and compiling FFTW-2.1.x, the problem went away (a build sketch is shown below)
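For reference, here is a minimal FFTW-2.1.x build sketch. It assumes FFTW 2.1.5, the Intel compilers (icc/ifort) and an illustrative install prefix; your version and paths may differ:
cd fftw-2.1.5
CC=icc F77=ifort ./configure --prefix=/usr/local/fftw-2.1.5
make -j
make install
Afterwards, point the FFT settings in your chosen LAMMPS Makefile (FFT_INC/FFT_PATH/FFT_LIB, if your Makefile uses those variables) at this prefix.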
CPMD consortium
The CPMD code is a parallelized plane wave/pseudopotential implementation of Density Functional Theory, particularly designed for ab-initio molecular dynamics.
Monday, April 26, 2010
UNIX Binary Gaussian 09 Revision A.02 Installation instructions
Taken and modified from the README.BIN for my environment. This deserves a highlight so administrators can set it up quickly.
- Check that you have the correct versions of the OS and libraries for your machine, as listed in the G09 platform list on the website
- Select or create a group (e.g. g09) in /etc/group which will own the Gaussian files. Users who will run Gaussian should either already be in this group or should have it added to their list of groups.
- Create a directory to hold g09 and gv (for example, gaussian). You can do this with the command
mkdir gaussian
- Mount the Gaussian CD using a command like this one
mount /mnt/cdrom
- From the CD, copy the Gaussian binary archive (E64_930N.TGZ) into your newly created gaussian directory.
- Untar it by using the command
tar -zxvf E64_930N.TGZ
- Change the group ownership of the g09 directory created by the untar step above.
chgrp -Rv g09 g09
- Install
cd g09
./bsd/install
- Set the environment for the users' login
touch .login
Place the contents below into the .login
g09root=/usr/local/gaussian/
GAUSS_SCRDIR=/scratch/$USER
export g09root GAUSS_SCRDIR
. $g09root/g09/bsd/g09.profile
- Put it in your .bash_profile
source .login
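A quick sanity check after logging in again (illustrative commands; they assume the .login above was sourced successfully):
echo $g09root
echo $GAUSS_SCRDIR
which g09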
Manual setup of TCP LINDA for Gaussian
To configure TCP Linda so that Gaussian runs in parallel across nodes, all you need to do is tweak the ntsnet and LindaLauncher files found in the g09 directory. For TCP Linda to work in Gaussian, just make sure the LINDA_PATH is correct.
- ntsnet is found at $g09root/ntsnet (where $g09root = /usr/local/gaussian/g09 in my installation)
- LindaLauncher is found in $g09root/linda8.2/opteron-linux/bin/LindaLauncher (where $g09root = /usr/local/gaussian/g09 in my installation)
- flc is found at $g09root/opteron-linux/bin/flc
- pmbuild is found at $g09root/opteron-linux/bin/pmbuild
- vntsnet is found at $g09root/opteron-linux/bin/vntsnet
LINDA_PATH=/usr/local/gaussian/g09/linda8.2/opteron-linux/
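To double-check which LINDA_PATH these files are using, one illustrative approach (assuming both files are plain-text scripts containing the assignment) is simply to grep for it:
grep -n "LINDA_PATH" /usr/local/gaussian/g09/ntsnet
grep -n "LINDA_PATH" /usr/local/gaussian/g09/linda8.2/opteron-linux/bin/LindaLauncher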
Auto-Install for Gaussian. This can also be found in the Gaussian Installation Notes
# cd /usr/local/gaussian/g09
# ./bsd/install
Put the .tsnet.config in your home directory.
# touch .tsnet.config
Tsnet.Appl.nodelist: n01 n02
Tsnet.Appl.verbose: True
Tsnet.Appl.veryverbose: True
Tsnet.Node.lindarsharg: ssh
Tsnet.Appl.useglobalconfig: True
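Since lindarsharg is set to ssh, passwordless ssh to each Linda worker node must already work. A quick illustrative check (node names follow the n01/n02 example above):
ssh n01 hostname
ssh n02 hostname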
Thursday, April 22, 2010
Advancing the Power of Visualization.
This is an interview by HPCwire with Steve Briggs, HPCD’s SVA product marketing manager, on visualisation from HP's point of view. Interesting information.
Advancing the Power of Visualization –Coming Soon to Linux Clusters: 100 Million Pixels and More
GPFS Tuning Parameters
GPFS Tuning Parameters is a good wiki resource written by IBM for GPFS tuning. Just parroting some of the useful tips I have learned.
To view the configuration parameters that have been changed from the default:
mmlsconfig
To view the active value of any of these parameters, you can run:
mmfsadm dump config
To change any of these parameters, use mmchconfig. For example, to change the pagepool setting on all nodes:
mmchconfig pagepool=256M
1. Considerations for modifying the pagepool
A. Sequential I/O
The default pagepool size may be sufficient for sequential I/O workloads; however, a value of 256MB is known to work well in many cases. To change the pagepool size:
mmchconfig pagepool=256M [-i]
If the file system blocksize is larger than the default (256K), the pagepool size should be scaled accordingly. For example, if 1M blocksize is used, the default 64M pagepool should be increased by 4 times to 256M. This allows the same number of buffers to be cached.
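Before scaling the pagepool, you can check the blocksize of an existing file system (the device name gpfs0 below is only an example):
mmlsfs gpfs0 -B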
B. Random I/O
The default pagepool size will likely not be sufficient for Random IO or workloads involving a large number of small files. In some cases allocating 4GB, 8GB or more memory can improve workload performance.
mmchconfig pagepool=4000M
C. Random Direct IO
For database applications that use Direct I/O, the pagepool is not used for any user data. Its main purpose in this case is for system metadata and caching the indirect blocks of the database files.
D. NSD Server
Assuming no applications or Filesystem Manager services are running on the NSD servers, the pagepool is only used transiently by the NSD worker threads to gather data from client nodes and write the data to disk. The NSD server does not cache any of the data. Each NSD worker just needs one pagepool buffer per operation, and the buffer can be potentially as large as the largest filesystem blocksize that the disks belong to. With the default NSD configuration, there will be 3 NSD worker threads per LUN (nsdThreadsPerDisk) that the node services. So the amount of memory needed in the pagepool will be 3*#LUNS*maxBlockSize. The target amount of space in the pagepool for NSD workers is controlled by nsdBufSpace which defaults to 30%. So the pagepool should be large enough so that 30% of it has enough buffers.
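A worked sizing example under assumed values (a node serving 12 LUNs, 1M maximum filesystem blocksize, the default nsdBufSpace of 30%):
# NSD worker buffers needed: 3 threads/LUN * 12 LUNs * 1M = 36M
# nsdBufSpace is 30% of the pagepool, so the pagepool should be at least 36M / 0.30 = 120M
mmchconfig pagepool=128M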
For more information
Wednesday, April 21, 2010
NFS share on Linux client not immediately visible to other NFS clients
If you are using NFS as the shared file system, you may encounter an issue where changes made on one Linux NFS client are not immediately visible to other NFS clients. This is due to caching parameters on the NFS client side which you must take note of. These are:
- acregmin=n. The minimum time (in seconds) that the NFS client caches attributes of a regular file before it requests fresh attribute information from a server. The default is 3 seconds.
- acregmax=n. The maximum time (in seconds) that the NFS client caches attributes of a regular file before it requests fresh attribute information from a server. The default is 60 seconds.
- acdirmin=n. The minimum time (in seconds) that the NFS client caches attributes of a directory before it requests fresh attribute information from a server. The default is 30 seconds.
- acdirmax=n. The maximum time (in seconds) that the NFS client caches attributes of a directory before it requests fresh attribute information from a server. The default is 60 seconds.
- actimeo=n. Use this when you wish to set acregmin, acregmax, acdirmin, and acdirmax all to the same value (an example mount is sketched below).
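An illustrative mount with a short attribute-cache timeout (the server name and export path are placeholders; a smaller actimeo gives fresher attributes at the cost of more attribute requests):
mount -t nfs -o rw,actimeo=3 nfsserver:/export/home /home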
Tuesday, April 20, 2010
Moving HPC Applications to Cloud - The Practitioner Prospective
This is a very good summary presentation by Victoria Livschitz, CEO of Grid Dynamics, on some of the issues and challenges we will face when we unify cloud and HPC into an HPC cloud.
Read this: Moving HPC Applications to Cloud - The Practitioner Prospective
Thursday, April 15, 2010
Installing Cluster OpenMP* for Intel® Compilers
Taken from the Cluster OpenMP* for Intel® Compilers website
Overview
OpenMP* is a high level, pragma-based approach to parallel application programming. Cluster OpenMP is a simple means of extending OpenMP parallelism to 64-bit Intel® architecture-based clusters. It allows OpenMP code to run on clusters of Intel® Itanium® or Intel® 64 processors, with only slight modifications.
Prerequisite
Using Cluster OpenMP requires that you already have the latest version of the Intel® C++ Compiler for Linux* and/or the Intel® Fortran Compiler for Linux*.
Benefits of Cluster OpenMP
- Simplifies porting of serial or OpenMP code to clusters.
- Requires few source code modifications, which eases debugging.
- Allows slightly modified OpenMP code to run on more processors without requiring investment in expensive Symmetric Multiprocessing (SMP) hardware.
- Offers an alternative to MPI that is easier to learn and faster to implement.
How to Install Cluster OpenMP
- Installing Cluster OpenMP is simple. First, install the Intel Compilers. For more information, see the blog entry Free Non-Commercial Intel Compiler Download
- After installing the compilers, download the Cluster OpenMP license file from the Cluster OpenMP Download site
- Place the Cluster OpenMP license file in the license directory, usually /opt/intel/licenses
- With the Cluster OpenMP license file in place, you can use either the “-cluster-openmp” or “-cluster-openmp-profile” compiler option when compiling a program (see the sketch below).
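A minimal compile sketch, assuming the Intel C compiler and a hypothetical source file hello_omp.c containing ordinary OpenMP pragmas:
icc -cluster-openmp hello_omp.c -o hello_clomp
# or use -cluster-openmp-profile to build against the profiling variant instead
icc -cluster-openmp-profile hello_omp.c -o hello_clomp_prof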
Wednesday, April 14, 2010
Install g77 on CentOS 5
gfortran, which is part of the GNU Compiler Collection (GCC), has replaced the g77 compiler, whose development stopped before GCC version 4.0.
If you still require g77, you can install it with the following yum command
yum install compat-gcc*
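To confirm what was installed (the g77 binary name is an assumption; depending on the compat package it may be installed as g77 or under a versioned name such as g77-34):
rpm -qa | grep compat-gcc    # list the installed compatibility packages
g77 --version                # adjust the name if the package installs a versioned binary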
Tuesday, April 13, 2010
MPIRun and " You may set your LD_LIBRARY_PATH to have the location of the shared libraries ...... " issues
The Scenario:
I encountered this error while executing an mpirun. Doing a "pbsnodes -l" showed that everything seemed to be online. I thought my $LD_LIBRARY_PATH was causing the issue, but after some exhaustive checks, I realised that communication with one of our nodes was having issues. Here are the steps I took to solve the issue:
--------------------------------------------------------------------------
A daemon (pid 16704) died unexpectedly with status 127 while attempting to launch so we are aborting.
There may be more information reported by the environment (see above).
This may be because the daemon was unable to find all the needed shared libraries on the remote node. You may set your LD_LIBRARY_PATH to have the location of the shared libraries on the remote nodes and this will automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
The error looks like it is due to LD_LIBRARY_PATH, but that may or may not be the case.
Step 1: Check whether it is an LD_LIBRARY_PATH issue on your head and compute nodes
First things first: check whether your LD_LIBRARY_PATH is blank or filled with the correct information on both your head node and compute nodes.
$ echo $LD_LIBRARY_PATH
/usr/local/lib:/opt/intel/Compiler/11.1/069/lib/intel64 .....
If everything looks normal, proceed to Step 2.
Step 2: Check whether the mpirun can be executed cleanly.
$ mpirun -np 32 -hostfile hostfilename openmpi-with-intel-hello-world
where
- hostfilename contains all the compute node host names
- openmpi-with-intel-hello-world is the compiled MPI program
Step 3: If the error still remains.....
Modify the hostfilename, inserting one compute node at a time, and rerun the mpirun. You should be able to quickly identify whether the problem is not $LD_LIBRARY_PATH but a problematic compute node.
n01
n02
....
In my situation, the problem was due to a broken ssh generated key on one node, despite Torque showing all nodes as healthy.
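A quick illustrative loop to spot a node with a broken ssh setup (hostfilename is the same host file used above; BatchMode makes ssh fail instead of prompting for a password):
for h in $(cat hostfilename); do ssh -o BatchMode=yes $h hostname || echo "$h FAILED"; done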
Monday, April 12, 2010
A Hello World OpenMPI program with Intel
I compiled a simple parallel hello world program to test whether OpenMPI is working well with the Intel Compilers, using the example taken from https://wiki.mst.edu/nic/how_to/compile/openmpi-intel-compile
Step 1: Ensure your OpenMPI is compiled with Intel. Read the Building OpenMPI with Intel Compiler (Ver 2) for more information
Step 2: Cut and paste the parallel program taken from https://wiki.mst.edu/nic/how_to/compile/openmpi-intel-compile. Compile the C++ program with the MPI compiler wrapper
$ mpicxx -o openmpi-intel-hello mpi_hello.cpp
Step 3: Test on SMP Machine
$ mpirun -np 8 openmpi-intel-hello
Step 4: Test on Distributed Cluster
$ mpirun -np 8 -hostfile hostfile.file openmpi-intel-hello
You should see output something like
Returned: 0 Hello World! I am 1 of 8
Returned: 0 Hello World! I am 6 of 8
Returned: 0 Hello World! I am 3 of 8
Returned: 0 Hello World! I am 0 of 8
Returned: 0 Hello World! I am 2 of 8
Returned: 0 Hello World! I am 5 of 8
Returned: 0 Hello World! I am 4 of 8
Returned: 0 Hello World! I am 7 of 8
Sunday, April 11, 2010
PCI Utilities
The PCI Utilities are a collection of programs for inspecting and manipulating configuration of PCI devices, all based on a common portable library libpci which offers access to the PCI configuration space on a variety of operating systems.
The utilities includes:
- lspci
- setpci
If you wish to install it on a Red Hat-derived Linux, just do a
yum install pciutils
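Typical usage once installed (standard lspci options; the device address in the last command is only an example):
lspci                  # list all PCI devices
lspci -v               # verbose listing
lspci -s 00:1f.2 -vv   # very verbose output for a single device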
Thursday, April 8, 2010
Using Intel® MKL in VASP
The Using Intel® MKL in VASP guide is intended to help current VASP* (Vienna Ab-initio Simulation Package*) users get better benchmark performance by utilizing the Intel® Math Kernel Library (Intel® MKL).
The guide contains configuration and setup notes.
Applications from VMware
ThinApp
VMware ThinApp virtualizes applications by encapsulating application files and registry into a single ThinApp package that can be deployed, managed and updated independently from the underlying OS.
Some of the key benefits according to VMware:
- Simplify Windows 7 migration
- Eliminate application conflicts
- Consolidate application streaming servers
- Reduce desktop storage costs
- Increase mobility for end users
SpringSource tc Server
SpringSource tc Server provides enterprise users with the lightweight server they want, paired with the operational management, advanced diagnostics, and mission-critical support capabilities businesses need. It is designed to be a drop-in replacement for Apache Tomcat 6, ensuring a seamless migration path for existing custom-built and commercial software applications already certified for Tomcat. One interesting feature is that the SpringSource Tool Suite download is free.
Wednesday, April 7, 2010
Torque Error - Address already in use (98) in scan_for_exiting, cannot bind to port 464 in client_to_svr - too many retries
pbs_mom;Svr;pbs_mom;LOG_ERROR:: Address already in use (98) in scan_for_exiting, cannot bind to port 464 in client_to_svr - too many retries
One cause for this is very high traffic on the network, which stops the mom and the server from communicating properly. One common case is job scripts that incessantly run qstat. You would be surprised how often users put such qstat loops in their scripts and cause this error.
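One illustrative way to get a feel for how hard qstat is being hit on the head node (just a snapshot, not a definitive diagnosis):
ps -ef | grep [q]stat | wc -l    # number of qstat processes running right now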
Monday, April 5, 2010
xCAT Mini HOWTO for 1.2.0
Here is some useful documentation for the older xCAT, which is good for pointers and references.
Sunday, April 4, 2010
Placing user xcat contributed scripted in the xcat directory
This is a continuation of the blog entry User Contributed Script ported from xCAT 1.x to xCAT 2.x
Step 1: Placing addclusteruser in /opt/xcat/sbin
# cd /opt/xcat/sbin
# wget https://xcat.svn.sourceforge.net/svnroot/xcat/xcat-contrib/admin_patch/xCAT-2-admin_patch-1.1/addclusteruser
Step 2: Placing gensshkeys in /opt/xcat/sbin
# cd /opt/xcat/sbin
# wget https://xcat.svn.sourceforge.net/svnroot/xcat/xcat-contrib/admin_patch/xCAT-2-admin_patch-1.1/gensshkeys
Step 3: Placing shfunctions1 in /opt/xcat/lib
# cd /opt/xcat/lib
# wget https://xcat.svn.sourceforge.net/svnroot/xcat/xcat-contrib/admin_patch/xCAT-2-admin_patch-1.1/shfunctions1
To add users using addclusteruser
# addclusteruser ......
I'm assuming you have exported the home directory to other nodes
# pscp /etc/passwd compute:/etc/
# pscp /etc/shadow compute:/etc/
# pscp /etc/group compute:/etc/
Thursday, April 1, 2010
Can't find fftw3f library when configuring Gromacs
GROMACS is a versatile package to perform molecular dynamics.
If you are installing GROMACS using the installation instructions from GROMACS and encounter "can't find fftw3f library", this is probably due to the wrong FFTW precision being used. Try reconfiguring FFTW with the "--enable-float" setting
./configure --enable-threads --enable-float
make
make install
and it will compile nicely.
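If the single-precision FFTW went into a non-default prefix, a possible follow-up sketch (the /usr/local/fftw prefix is an assumption) is to point the GROMACS configure at it through the standard CPPFLAGS/LDFLAGS variables:
export CPPFLAGS=-I/usr/local/fftw/include
export LDFLAGS=-L/usr/local/fftw/lib
./configure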