starcluster and cfncluster, part 1

I noticed that my AWS coupon expires at the end of May (that is, today), so on short notice I decided to try out starcluster and cfncluster.

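Both are tools written in Python for building clusters of EC2 instances on AWS; the former is developed at MIT and the latter by Amazon.
starcluster seems to be the older one: its oldest commit on GitHub dates from 2009. But it already seems to have been working at that point, so the project probably started even earlier*1.
cfncluster, on the other hand, appears to have started in May 2014, so it has a history of about two years.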

starcluster's features are summarized on this page:
What is StarCluster? — StarCluster 0.95.6 documentation
In short, it provides an AMI with a lot of software preinstalled, and it launches and sets up as many instances as you ask for.

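cfncluster is more infrastructure-oriented: it seems to do little beyond launching instances and configuring the network.
Presumably the stance is that, for software installation, you prepare your own AMI.
Judging from the page below, its integration with CloudWatch and Auto Scaling is quite elaborate, but I doubt it will appeal much to people who just want to do HPC on the cloud with little effort*2.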
CfnCluster Processes — CfnCluster 1.2.1
It also relies so heavily on AWS services that the vendor lock-in feels rather strong. :p

Installation

First, let's install both. I used Python 2.7.10 on Cygwin for cfncluster and Python 2.7.10 on Windows 7 for starcluster. I meant to do everything on Cygwin, but the starcluster installation failed there, so I redid only that one in the native environment.

starcluster

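Installing it tries to build pycrypto and fails, so, following the page below, I override VS90COMNTOOLS to force the build to use VS2015.
「Unable to find vcvarsall.bat」の対処法 (How to fix "Unable to find vcvarsall.bat") | Regen Techlog
If a different version of VS is installed, or VS is not installed at all, use one of the other methods described on that page.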

> python -m virtualenv --python=C:\Python27\python.exe starcluster
> cd starcluster
> Scripts\activate
> set VS90COMNTOOLS=%VS140COMNTOOLS%
> pip install starcluster
Collecting starcluster
Collecting iso8601>=0.1.8 (from starcluster)
  Using cached iso8601-0.1.11-py2.py3-none-any.whl
Collecting pycrypto>=2.5 (from starcluster)
  Using cached pycrypto-2.6.1.tar.gz
Collecting workerpool>=0.9.2 (from starcluster)
Collecting iptools>=0.6.1 (from starcluster)
  Using cached iptools-0.6.1-py2.py3-none-any.whl
Collecting scp>=0.7.1 (from starcluster)
  Using cached scp-0.10.2-py2.py3-none-any.whl
Collecting boto>=2.23.0 (from starcluster)
  Using cached boto-2.40.0-py2.py3-none-any.whl
Collecting Jinja2>=2.7 (from starcluster)
  Using cached Jinja2-2.8-py2.py3-none-any.whl
Collecting decorator>=3.4.0 (from starcluster)
  Using cached decorator-4.0.9-py2.py3-none-any.whl
Collecting paramiko>=1.12.1 (from starcluster)
  Using cached paramiko-2.0.0-py2.py3-none-any.whl
Collecting optcomplete>=1.2-devel (from starcluster)
Collecting six (from workerpool>=0.9.2->starcluster)
  Using cached six-1.10.0-py2.py3-none-any.whl
Collecting MarkupSafe (from Jinja2>=2.7->starcluster)
Collecting pyasn1>=0.1.7 (from paramiko>=1.12.1->starcluster)
  Using cached pyasn1-0.1.9-py2.py3-none-any.whl
Collecting cryptography>=1.1 (from paramiko>=1.12.1->starcluster)
  Using cached cryptography-1.3.2-cp27-none-win_amd64.whl
Requirement already satisfied (use --upgrade to upgrade): setuptools>=11.3 in c:\users\n_so5\onedrive\python\starcluster\lib\site-packages (from cryptography>=1.1->paramiko>=1.12.1->starcluster)
Collecting enum34 (from cryptography>=1.1->paramiko>=1.12.1->starcluster)
  Using cached enum34-1.1.6-py2-none-any.whl
Collecting ipaddress (from cryptography>=1.1->paramiko>=1.12.1->starcluster)
  Using cached ipaddress-1.0.16-py27-none-any.whl
Collecting idna>=2.0 (from cryptography>=1.1->paramiko>=1.12.1->starcluster)
  Using cached idna-2.1-py2.py3-none-any.whl
Collecting cffi>=1.4.1 (from cryptography>=1.1->paramiko>=1.12.1->starcluster)
  Using cached cffi-1.6.0-cp27-none-win_amd64.whl
Collecting pycparser (from cffi>=1.4.1->cryptography>=1.1->paramiko>=1.12.1->starcluster)
Building wheels for collected packages: pycrypto
  Running setup.py bdist_wheel for pycrypto ... done
  Stored in directory: C:\Users\n_so5\AppData\Local\pip\Cache\wheels\80\1f\94\f76e9746864f198eb0e304aeec319159fa41b082f61281ffce
Successfully built pycrypto
Installing collected packages: iso8601, pycrypto, six, workerpool, iptools, pyasn1, enum34, ipaddress, idna, pycparser, cffi, cryptography, paramiko, scp, boto, MarkupSafe, Jinja2, decorator, optcomplete, starcluster
Successfully installed Jinja2-2.8 MarkupSafe-0.23 boto-2.40.0 cffi-1.6.0 cryptography-1.3.2 decorator-4.0.9 enum34-1.1.6 idna-2.1 ipaddress-1.0.16 iptools-0.6.1 iso8601-0.1.11 optcomplete-1.2-devel paramiko-2.0.0 pyasn1-0.1.9 pycparser-2.14 pycrypto-2.6.1 scp-0.10.2 six-1.10.0 starcluster-0.95.6 workerpool-0.9.4
You are using pip version 8.0.2, however version 8.1.2 is available.
You should consider upgrading via the 'python -m pip install --upgrade pip' command.

cfncluster

> python -m virtualenv cfncluster_cygwin
> cd cfncluster_cygwin/
> . bin/activate
> pip install cfncluster
Collecting cfncluster
  Downloading cfncluster-1.2.1.tar.gz
Collecting boto>=2.39 (from cfncluster)
  Downloading boto-2.40.0-py2.py3-none-any.whl (1.3MB)
    100% |████████████████████████████████| 1.4MB 660kB/s
Collecting awscli>=1.10.13 (from cfncluster)
  Downloading awscli-1.10.34-py2.py3-none-any.whl (938kB)
    100% |████████████████████████████████| 942kB 948kB/s
Collecting colorama<=0.3.3,>=0.2.5 (from awscli>=1.10.13->cfncluster)
  Downloading colorama-0.3.3.tar.gz
Collecting docutils>=0.10 (from awscli>=1.10.13->cfncluster)
  Downloading docutils-0.12.tar.gz (1.6MB)
    100% |████████████████████████████████| 1.6MB 465kB/s
Collecting rsa<=3.5.0,>=3.1.2 (from awscli>=1.10.13->cfncluster)
  Downloading rsa-3.4.2-py2.py3-none-any.whl (46kB)
    100% |████████████████████████████████| 51kB 1.9MB/s
Collecting botocore==1.4.24 (from awscli>=1.10.13->cfncluster)
  Downloading botocore-1.4.24-py2.py3-none-any.whl (2.3MB)
    100% |████████████████████████████████| 2.3MB 340kB/s
Collecting s3transfer==0.0.1 (from awscli>=1.10.13->cfncluster)
  Downloading s3transfer-0.0.1-py2.py3-none-any.whl
Collecting pyasn1>=0.1.3 (from rsa<=3.5.0,>=3.1.2->awscli>=1.10.13->cfncluster)
  Downloading pyasn1-0.1.9-py2.py3-none-any.whl
Collecting python-dateutil<3.0.0,>=2.1 (from botocore==1.4.24->awscli>=1.10.13->cfncluster)
  Downloading python_dateutil-2.5.3-py2.py3-none-any.whl (201kB)
    100% |████████████████████████████████| 204kB 1.5MB/s
Collecting jmespath<1.0.0,>=0.7.1 (from botocore==1.4.24->awscli>=1.10.13->cfncluster)
  Downloading jmespath-0.9.0-py2.py3-none-any.whl
Collecting futures<4.0.0,>=2.2.0; python_version == "2.6" or python_version == "2.7" (from s3transfer==0.0.1->awscli>=1.10.13->cfncluster)
  Downloading futures-3.0.5-py2-none-any.whl
Collecting six>=1.5 (from python-dateutil<3.0.0,>=2.1->botocore==1.4.24->awscli>=1.10.13->cfncluster)
  Downloading six-1.10.0-py2.py3-none-any.whl
Building wheels for collected packages: cfncluster, colorama, docutils
  Running setup.py bdist_wheel for cfncluster ... done
  Stored in directory: /home/n_so5/.cache/pip/wheels/26/c8/b0/3cf98bf7d72a9a63a358cbf094a50092641a70843de01ca155
  Running setup.py bdist_wheel for colorama ... done
  Stored in directory: /home/n_so5/.cache/pip/wheels/21/c5/cf/63fb92293f3ad402644ccaf882903cacdb8fe87c80b62c84df
  Running setup.py bdist_wheel for docutils ... done
  Stored in directory: /home/n_so5/.cache/pip/wheels/db/de/bd/b99b1e12d321fbc950766c58894c6576b1a73ae3131b29a151
Successfully built cfncluster colorama docutils
Installing collected packages: boto, colorama, docutils, pyasn1, rsa, six, python-dateutil, jmespath, botocore, futures, s3transfer, awscli, cfncluster
Successfully installed awscli-1.10.34 boto-2.40.0 botocore-1.4.24 cfncluster-1.2.1 colorama-0.3.3 docutils-0.12 futures-3.0.5 jmespath-0.9.0 pyasn1-0.1.9 python-dateutil-2.5.3 rsa-3.4.2 s3transfer-0.0.1 six-1.10.0

Starting a cluster

First, let's build a 4-node cluster with starcluster.
I'll loosely follow the tutorial here:
Quick-Start — StarCluster 0.95.6 documentation

Start by generating a config file.

> starcluster help
Options:
--------
[1] Show the StarCluster config template
[2] Write config template to C:\Users\n_so5\.babun\cygwin\home\n_so5\.starcluster\config
[q] Quit
Please enter your selection: 2  <== enter 2 to have it write out the template file
>>> Config template written to C:\Users\n_so5\.babun\cygwin\home\n_so5\.starcluster\config
>>> Please customize the config template

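Next, edit the generated config file. I changed only the six lines below. starcluster uses boto internally, so the credentials should in principle also be settable through environment variables, but the validation check apparently fails if the values are empty when the config file is read...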

AWS_ACCESS_KEY_ID =     #your aws access key id here
AWS_SECRET_ACCESS_KEY = #your secret aws access key here
AWS_USER_ID =           #your 12-digit aws user id here
AWS_REGION_NAME =  ap-northeast-1
CLUSTER_SIZE = 4
NODE_INSTANCE_TYPE = c3.large

Then generate a key pair.

>starcluster createkey mykey -o %HOME%\.ssh\mykey.rsa

Finally, start the cluster.

>starcluster start smallcluster
StarCluster - (http://star.mit.edu/cluster) (v. 0.95.6)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster@mit.edu

>>> Using default cluster template: smallcluster
>>> Validating cluster template settings...
>>> Cluster template settings are valid
>>> Starting cluster...
>>> Launching a 4-node cluster...
>>> Creating security group @sc-smallcluster...
>>> Creating placement group @sc-smallcluster...
Reservation:r-cbaaa168
>>> Waiting for instances to propagate...
4/4 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Waiting for cluster to come up... (updating every 30s)
>>> Waiting for all nodes to be in a 'running' state...
4/4 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Waiting for SSH to come up on all nodes...
4/4 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Waiting for cluster to come up took 1.251 mins
>>> The master node is ec2-54-175-0-145.compute-1.amazonaws.com
>>> Configuring cluster...
>>> Running plugin starcluster.clustersetup.DefaultClusterSetup
>>> Configuring hostnames...
4/4 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Creating cluster user: sgeadmin (uid: 1001, gid: 1001)
4/4 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Configuring scratch space for user(s): sgeadmin
4/4 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Configuring /etc/hosts on each node
4/4 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Starting NFS server on master
>>> Configuring NFS exports path(s):
/home
>>> Mounting all NFS export path(s) on 3 worker node(s)
3/3 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Setting up NFS took 0.285 mins
>>> Configuring passwordless ssh for root
>>> Configuring passwordless ssh for sgeadmin
>>> Running plugin starcluster.plugins.sge.SGEPlugin
>>> Configuring SGE...
>>> Configuring NFS exports path(s):
/opt/sge6
>>> Mounting all NFS export path(s) on 3 worker node(s)
3/3 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Setting up NFS took 0.212 mins
>>> Installing Sun Grid Engine...
3/3 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Creating SGE parallel environment 'orte'
4/4 |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||| 100%
>>> Adding parallel environment 'orte' to queue 'all.q'
>>> Configuring cluster took 2.328 mins
>>> Starting cluster took 3.701 mins

The cluster is now ready to use. To login to the master node
as root, run:

    $ starcluster sshmaster smallcluster

If you're having issues with the cluster you can reboot the
instances and completely reconfigure the cluster from
scratch using:

    $ starcluster restart smallcluster

When you're finished using the cluster and wish to terminate
it and stop paying for service:

    $ starcluster terminate smallcluster

Alternatively, if the cluster uses EBS instances, you can
use the 'stop' command to shutdown all nodes and put them
into a 'stopped' state preserving the EBS volumes backing
the nodes:

    $ starcluster stop smallcluster

WARNING: Any data stored in ephemeral storage (usually /mnt)
will be lost!

You can activate a 'stopped' cluster by passing the -x
option to the 'start' command:

    $ starcluster start -x smallcluster

This will start all 'stopped' nodes and reconfigure the
cluster.

After waiting only a few minutes, the cluster is ready!
Let's log in right away.

>starcluster sshmaster smallcluster
StarCluster - (http://star.mit.edu/cluster) (v. 0.95.6)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster@mit.edu

>>> Starting Pure-Python SSH shell...
Line-buffered terminal emulation. Press F6 or ^Z to send EOF.

eval $(resize)
root@master:~# eval $(resize)
The program 'resize' is currently not installed. You can install it by typing:
apt-get install xterm

It prints "Starting Pure-Python SSH shell", so apparently this is what Paramiko is used for. Unfortunately, termcap/terminfo is not set up, and the shell is extremely awkward to use from the Windows command prompt*3, so I added my key to the authorized_keys of the preconfigured user sgeadmin and did the rest of the work with my usual SSH client.

> starcluster put smallcluster %HOME%\.ssh\id_rsa.pub ./
> starcluster sshmaster smallcluster
> root@master:~# cat id_rsa.pub >> ~sgeadmin/.ssh/authorized_keys
> starcluster listclusters
 StarCluster - (http://star.mit.edu/cluster) (v. 0.95.6)
Software Tools for Academics and Researchers (STAR)
Please submit bug reports to starcluster@mit.edu

-----------------------------------------------
smallcluster (security group: @sc-smallcluster)
-----------------------------------------------
Launch time: 2016-05-31 14:06:38
Uptime: 0 days, 00:23:07
VPC: vpc-9f9df4fa
Subnet: subnet-7907d752
Zone: us-east-1a
Keypair: mykey
EBS volumes: N/A
Cluster nodes:
     master running i-1a5d9286 ec2-54-175-0-145.compute-1.amazonaws.com
    node001 running i-1b5d9287 ec2-54-175-213-27.compute-1.amazonaws.com
    node002 running i-185d9284 ec2-54-89-100-190.compute-1.amazonaws.com
    node003 running i-195d9285 ec2-52-201-218-218.compute-1.amazonaws.com
Total nodes: 4

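starcluster put {cluster name} uploads files (starcluster get downloads them). Use it to send the public key, then append the key to the end of authorized_keys.
The terminal setup is so broken that copying and pasting the key is not really practical.
Also, as in the last command above, starcluster listclusters prints the host names of the running machines, so log in with ssh to the node labeled master (ec2-54-175-0-145.compute-1.amazonaws.com).
Only at this point did I notice that the region setting apparently had not been applied: the instances had been launched in the default region, us-east-1...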

Next I wanted to run HPL, but something seems to be misconfigured: mpicc comes from mpich2 while mpirun comes from OpenMPI, which is a strange state of affairs.

sgeadmin@master:~$ which mpicc
/usr/bin/mpicc
sgeadmin@master:~$ which mpirun
/usr/bin/mpirun
sgeadmin@master:~$ file /usr/bin/mpicc /usr/bin/mpirun
/usr/bin/mpicc:  symbolic link to `/etc/alternatives/mpicc'
/usr/bin/mpirun: symbolic link to `/etc/alternatives/mpirun'
sgeadmin@master:~$ file /etc/alternatives/mpicc /etc/alternatives/mpirun
/etc/alternatives/mpicc:  symbolic link to `/usr/bin/mpicc.mpich2'
/etc/alternatives/mpirun: symbolic link to `/usr/bin/mpirun.openmpi'

I had never used alternatives before, but after some googling, this is what the current state looks like:

> update-alternatives --display mpi
mpi - auto mode
  link currently points to /usr/include/mpich2
/usr/include/mpich2 - priority 40
  slave libmpi++.so: /usr/lib/libmpichcxx.so
  slave libmpi.so: /usr/lib/libmpich.so
  slave libmpif77.so: /usr/lib/libfmpich.so
  slave libmpif90.so: /usr/lib/libmpichf90.so
  slave mpic++: /usr/bin/mpic++.mpich2
  slave mpic++.1.gz: /usr/share/man/man1/mpic++.mpich2.1.gz
  slave mpicc: /usr/bin/mpicc.mpich2
  slave mpicc.1.gz: /usr/share/man/man1/mpicc.mpich2.1.gz
  slave mpicxx: /usr/bin/mpicxx.mpich2
  slave mpicxx.1.gz: /usr/share/man/man1/mpicxx.mpich2.1.gz
  slave mpif77: /usr/bin/mpif77.mpich2
  slave mpif77.1.gz: /usr/share/man/man1/mpif77.mpich2.1.gz
  slave mpif90: /usr/bin/mpif90.mpich2
  slave mpif90.1.gz: /usr/share/man/man1/mpif90.mpich2.1.gz
/usr/lib/openmpi/include - priority 40
  slave libmpi++.so: /usr/lib/openmpi/lib/libmpi_cxx.so
  slave libmpi.so: /usr/lib/openmpi/lib/libmpi.so
  slave libmpif77.so: /usr/lib/openmpi/lib/libmpi_f77.so
  slave libmpif90.so: /usr/lib/openmpi/lib/libmpi_f90.so
  slave mpiCC: /usr/bin/mpic++.openmpi
  slave mpiCC.1.gz: /usr/share/man/man1/mpiCC.openmpi.1.gz
  slave mpic++: /usr/bin/mpic++.openmpi
  slave mpic++.1.gz: /usr/share/man/man1/mpic++.openmpi.1.gz
  slave mpicc: /usr/bin/mpicc.openmpi
  slave mpicc.1.gz: /usr/share/man/man1/mpicc.openmpi.1.gz
  slave mpicxx: /usr/bin/mpic++.openmpi
  slave mpicxx.1.gz: /usr/share/man/man1/mpicxx.openmpi.1.gz
  slave mpif77: /usr/bin/mpif77.openmpi
  slave mpif77.1.gz: /usr/share/man/man1/mpif77.openmpi.1.gz
  slave mpif90: /usr/bin/mpif90.openmpi
  slave mpif90.1.gz: /usr/share/man/man1/mpif90.openmpi.1.gz
Current 'best' version is '/usr/include/mpich2'.

I don't fully understand it, but I logged back in as root and manually switched it to OpenMPI.*4

# update-alternatives --config  mpi
There are 2 choices for the alternative mpi (providing /usr/include/mpi).

  Selection    Path                      Priority   Status
------------------------------------------------------------
* 0            /usr/include/mpich2        40        auto mode
  1            /usr/include/mpich2        40        manual mode
  2            /usr/lib/openmpi/include   40        manual mode

Press enter to keep the current choice[*], or type selection number: 2
update-alternatives: using /usr/lib/openmpi/include to provide /usr/include/mpi (mpi) in manual mode

Log in as sgeadmin again and check that OpenMPI is now being used.

sgeadmin@master:~$ file /usr/bin/mpicc /usr/bin/mpirun
/usr/bin/mpicc:  symbolic link to `/etc/alternatives/mpicc'
/usr/bin/mpirun: symbolic link to `/etc/alternatives/mpirun'
sgeadmin@master:~$ file /etc/alternatives/mpicc /etc/alternatives/mpirun
/etc/alternatives/mpicc:  symbolic link to `/usr/bin/mpicc.openmpi'
/etc/alternatives/mpirun: symbolic link to `/usr/bin/mpirun.openmpi'

Now that it finally works properly, let's write a small test program like the one below and run it.

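#include <mpi.h>
#include <stdio.h>

int main(int argc, char** argv)
{
  int nproc;
  int myrank;
  /* MPI_Get_processor_name needs a buffer of at least MPI_MAX_PROCESSOR_NAME chars;
     the returned name is null-terminated, so no manual truncation is needed */
  char hostname[MPI_MAX_PROCESSOR_NAME];
  int name_len;
  int ierr;
  int i;

  MPI_Init(&argc, &argv);

  ierr = MPI_Comm_size(MPI_COMM_WORLD, &nproc);
  if (ierr != MPI_SUCCESS) fprintf(stderr, "ierr from MPI_Comm_size = %d\n", ierr);
  ierr = MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
  if (ierr != MPI_SUCCESS) fprintf(stderr, "ierr from MPI_Comm_rank = %d\n", ierr);

  MPI_Get_processor_name(hostname, &name_len);

  /* Ranks take turns: rank i prints in iteration i, and the barrier keeps
     all ranks in step (final ordering also depends on mpirun's output forwarding) */
  for (i = 0; i < nproc; i++)
  {
    if (myrank == i)
    {
      fprintf(stderr, "myrank is %d of %d on %s\n", myrank, nproc, hostname);
    }
    MPI_Barrier(MPI_COMM_WORLD);
  }

  MPI_Finalize();
  return 0;
}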

> mpicc tmp.c
> mpirun -np 8 -machinefile hostfile ./a.out
myrank is 0 of 8 on master
myrank is 1 of 8 on master
myrank is 2 of 8 on node001
myrank is 3 of 8 on node001
myrank is 4 of 8 on node002
myrank is 5 of 8 on node002
myrank is 6 of 8 on node003
myrank is 7 of 8 on node003

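It finally ran correctly. Incidentally, until I changed the alternatives settings, MPI_Comm_size reported that there were 0 processes.
Since mpich2 and OpenMPI were mixed, the cause is probably that MPI_COMM_WORLD has a different value in the two libraries.

This is a common pitfall in environments with several MPI libraries installed. Several values defined by the MPI standard (MPI_COMM_WORLD, MPI_SUM, and so on) are often implemented as macros in mpi.h, and their actual values usually differ from one library to another, so they are not compatible.
These predefined constants are replaced with each library's own values at compile time (more precisely, at preprocessing time). So when, as here, the mpi.h used at compile time and the shared library loaded at run time come from different implementations, the values no longer match: the caller believes it is passing MPI_COMM_WORLD, but the library sees an undefined communicator.*5
If a systems engineer who doesn't understand this does careless work, users will come complaining, "My program stopped working!", so be careful. lol*6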

Now, while we're at it, let's run HPL as well. First, download the HPL source.*7

HPL - A Portable Implementation of the High-Performance Linpack Benchmark for Distributed-Memory Computers

Upload it with starcluster put just like the key earlier; after that, it's just a matter of building and running it.

> starcluster put smallcluster hpl-2.2.tar.gz /tmp
sgeadmin@master:~$ tar xfz /tmp/hpl-2.2.tar.gz
sgeadmin@master:~$ cd hpl-2.2/
sgeadmin@master:~/hpl-2.2$ cp setup/Make.Linux_PII_FBLAS ./

Open Make.Linux_PII_FBLAS in an editor and set the following:

TOPdir       = $(HOME)/hpl-2.2
MPdir, MPinc, MPlib  -> comment out
LAdir=/usr/lib
LAlib= $(LAdir)/libblas.a
CC=mpicc
LINKER=mpicc

Then run make:

> make arch=Linux_PII_FBLAS 2>&1 |tee makelog

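If the build succeeds, an executable named xhpl and a configuration file named HPL.dat are created under bin/Linux_PII_FBLAS.
The default Ns is too small, and the default file repeats the measurement while sweeping several parameters, so open HPL.dat in an editor and set the leftmost number on every line that reads "# of ..." to 1. Then set roughly Ns=10000, NBs=64, Ps=4, Qs=2 (P x Q = 8, matching the number of MPI processes) and submit the job through SGE.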

> qsub -cwd -pe orte 8 -b y mpirun ./xhpl

When the job finishes, its standard output and standard error are written to files named mpirun.o* and mpirun.e*.

sgeadmin@master:~/hpl-2.2/bin/Linux_PII_FBLAS$ cat mpirun.o13
================================================================================
HPLinpack 2.2  --  High-Performance Linpack benchmark  --   February 24, 2016
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      :   10000
NB     :      64
PMAP   : Row-major process mapping
P      :       4
Q      :       2
PFACT  :    Left
NBMIN  :       2
NDIV   :       2
RFACT  :    Left
BCAST  :   1ring
DEPTH  :       0
SWAP   : Mix (threshold = 64)
L1     : transposed form
U      : transposed form
EQUIL  : yes
ALIGN  : 8 double precision words

--------------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
      ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be               1.110223e-16
- Computational tests pass if scaled residuals are less than                16.0

================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR00L2L2       10000    64     4     2              88.81              7.509e+00
HPL_pdgesv() start time Tue May 31 07:56:39 2016

HPL_pdgesv() end time   Tue May 31 07:58:08 2016

--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=        0.0013934 ...... PASSED
================================================================================

Finished      1 tests with the following results:
              1 tests completed and passed residual checks,
              0 tests completed and failed residual checks,
              0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------

End of Tests.
================================================================================

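7.5 GFlops on 8 cores is quite low.
The CPU is reportedly an Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz, so the theoretical peak for 8 cores is 4*2*2.8*8 = 179.2 GFlops, which puts the efficiency at about 4.2%.
The problem size was evidently too small, so I increased N to 40000 and measured again. Even so, the result was only 27.4 GFlops, an efficiency of about 15%.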

================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR00L2L2       40000    64     4     2            1557.25              2.740e+01
HPL_pdgesv() start time Tue May 31 08:17:37 2016

HPL_pdgesv() end time   Tue May 31 08:43:35 2016

--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=        0.0007106 ...... PASSED
================================================================================

According to the documentation, the BLAS that starcluster uses is ATLAS, so without rebuilding it this may be about the limit. If I have time, I'll follow up with something like the DGEMM test from HPCC.

This post has gotten quite long, so I'll cover cfncluster in a separate post.

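*1: I think I saw it at SC in 2008, but I may be misremembering.

*2: Whether such people actually exist is another matter. lol

*3: For example, pressing Ctrl-C suddenly drops the SSH session...

*4: If you want to stay in auto mode, I suspect the proper way is to change the priority values instead...

*5: I haven't checked whether MPI_Comm_size/rank is allowed to behave like this when given an undefined communicator.

*6: That said, it is also true that many SEs don't really grasp misconfigurations of MPI or the job scheduler even when end users point them out.

*7: Oh, it was updated this year... I wonder what they changed at this late stage.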

HPCメモ (HPC Memo)

Notes on things that may or may not be related to HPC (High Performance Computing)
