
Creating a new EMI CREAM CE

Relevant documentation:

   * EMI installation guide
   * CREAM install how to

Preparation work

Install SL 5.x (the latest at the time of writing is 5.6).

The local repository is available here. Once finished, make sure to also install and configure NTP; a sample ntp.conf can be copied from any other working node.

Install all the repositories

rpm -Uvh http://download.fedora.redhat.com/pub/epel/5/i386/epel-release-5-4.noarch.rpm
wget http://emisoft.web.cern.ch/emisoft/dist/EMI/1/sl5/x86_64/base/emi-release-1.0.0-1.sl5.noarch.rpm
yum install ./emi-release-1.0.0-1.sl5.noarch.rpm
wget http://repository.egi.eu/sw/production/cas/1/current/repo-files/EGI-trustanchors.repo -O /etc/yum.repos.d/EGI-trustanchors.repo

Disable automatic Yum updates using this script: http://forge.cnaf.infn.it/frs/download.php/101/disable_yum.sh

Pre-requisites installation

The CREAM CE node requires the host certificate and key files. Place the two files (hostkey.pem and hostcert.pem) in the /etc/grid-security directory on the target node, then set the proper modes and ownership:

chown root:root /etc/grid-security/hostcert.pem
chown root:root /etc/grid-security/hostkey.pem
chmod 600 /etc/grid-security/hostcert.pem
chmod 400 /etc/grid-security/hostkey.pem
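To double-check the result, a small helper like the one below compares the actual permission bits with the ones set above. It is only a sketch: the function name check_cert_modes is made up for this page, it relies on GNU stat (-c '%a'), and it does not check ownership.

```shell
# Hypothetical helper: verify that hostcert.pem has mode 600 and hostkey.pem
# has mode 400 in the given directory (default /etc/grid-security).
# Only the permission bits are checked, not the owner.
check_cert_modes() {
    local dir="${1:-/etc/grid-security}"
    local cert_mode key_mode
    cert_mode=$(stat -c '%a' "$dir/hostcert.pem") || return 1
    key_mode=$(stat -c '%a' "$dir/hostkey.pem") || return 1
    if [ "$cert_mode" != "600" ]; then
        echo "hostcert.pem has mode $cert_mode, expected 600" >&2
        return 1
    fi
    if [ "$key_mode" != "400" ]; then
        echo "hostkey.pem has mode $key_mode, expected 400" >&2
        return 1
    fi
    echo "certificate modes OK"
}
```

Run check_cert_modes as root on the new CE; without an argument it looks in /etc/grid-security.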

Install the Yum protectbase plugin, the CA meta-package and the Java dependency (xml-commons-apis) required by some CREAM components:

yum install yum-protectbase.noarch 
yum install ca-policy-egi-core 
yum install xml-commons-apis 

Torque client installation

Our own custom Torque installation lives on torque.hep.kbfi.ee. Take the file torque-package-clients-linux-x86_64.sh from /root/torque-2.5.5/ on that host and run it on the new CREAM CE with --install. It should install the relevant Torque client libraries and commands into /usr/bin. Then, instead of installing the emi-torque-util metapackage, install:

 yum install lcg-info-dynamic-pbs glite-apel-pbs glite-apel-core glite-yaim-torque-utils lcg-info-dynamic-scheduler-generic lcg-info-dynamic-scheduler-pbs glite-yaim-core
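Before moving on, it is worth confirming that the client package really put the Torque commands on the PATH. A minimal sketch: the helper missing_commands is hypothetical, and the list of commands to check (qstat, qsub, pbsnodes) is an assumption about which clients matter here.

```shell
# Hypothetical helper: print every command from the argument list that is
# not found on the PATH; return non-zero if any of them is missing.
missing_commands() {
    local cmd rc=0
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd"
            rc=1
        fi
    done
    return "$rc"
}

# Example check for the Torque clients (assumed list):
# missing_commands qstat qsub pbsnodes || echo "Torque client install incomplete"
```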

To set up host-based authentication, log in to torque.hep.kbfi.ee, go to ssh-keys, add the new CREAM CE to the list of relevant hosts and rerun the keyscan. Once done, distribute the resulting ssh_known_hosts to all relevant nodes (all CREAM CEs, all WNs and the Torque server itself), and add the new CREAM CE to /etc/hosts.equiv as well. The /etc/ssh/shosts.equiv has to be distributed to the new CREAM CE, and the sshd_config should be taken from another CREAM CE; the important settings are:

UsePAM no
HostbasedAuthentication yes
IgnoreUserKnownHosts yes
IgnoreRhosts yes
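Since the sshd_config is copied from another CE, a quick check that the four settings above actually made it into the file can save debugging time. This is a sketch with a made-up function name; it only checks that each setting is present as an uncommented line (matched case-insensitively, with spaces or "=" as the separator), not whether an earlier line in the file overrides it.

```shell
# Hypothetical helper: check that an sshd_config file (default
# /etc/ssh/sshd_config) contains the host-based authentication settings
# listed on this page. Lines starting with "#" never match because the
# pattern is anchored at the start of the line.
check_hostbased_sshd() {
    local conf="${1:-/etc/ssh/sshd_config}"
    local rc=0 setting key value
    for setting in "UsePAM no" "HostbasedAuthentication yes" \
                   "IgnoreUserKnownHosts yes" "IgnoreRhosts yes"; do
        key=${setting% *}
        value=${setting#* }
        if ! grep -Eiq "^[[:space:]]*${key}[[:space:]=]+${value}[[:space:]]*\$" "$conf"; then
            echo "missing: $setting"
            rc=1
        fi
    done
    return "$rc"
}
```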

You also need to add all WNs to /etc/hosts with both their FQDN and short name, so that sshd picks up the mappings correctly. After restarting sshd, host-based authentication should work (testing it, however, requires the user accounts, which are only created during the CREAM configuration, so be patient). At this point qstat should at least return the Torque status information.
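Writing the /etc/hosts entries by hand for every WN is error-prone, so they can be generated from a list of address/FQDN pairs. A sketch: hosts_entries is a hypothetical helper, and the address and hostname in the usage line are placeholders, not real WNs.

```shell
# Hypothetical helper: read "IP FQDN" pairs (one per line) on stdin and print
# /etc/hosts entries carrying both the FQDN and the short name.
# Lines with fewer than two fields are skipped.
hosts_entries() {
    awk 'NF >= 2 { split($2, parts, "."); print $1, $2, parts[1] }'
}
```

For example, printf '192.0.2.11 wn01.example.org\n' | hosts_entries prints 192.0.2.11 wn01.example.org wn01, which can be reviewed and then appended to /etc/hosts.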

Install CREAM CE itself

Install the meta package: yum install emi-cream-ce

Before configuring, make sure group IDs 155 and 156 are free. On SL 5.6 they were already taken by avahi and similar system packages.
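A quick way to check the two GIDs before running YAIM is sketched below. The function name is hypothetical; it reads /etc/group (or the file named in GROUP_FILE), so on hosts that also get groups from NIS or LDAP, getent group 155 is the more complete check.

```shell
# Hypothetical helper: for each GID given as an argument, report whether it
# is already used in the group file (default /etc/group, override with
# GROUP_FILE). Prints "GID in use by NAME" and returns non-zero if any is taken.
check_gids_free() {
    local group_file=${GROUP_FILE:-/etc/group}
    local gid name rc=0
    for gid in "$@"; do
        name=$(awk -F: -v g="$gid" '$3 == g { print $1; exit }' "$group_file")
        if [ -n "$name" ]; then
            echo "$gid in use by $name"
            rc=1
        fi
    done
    return "$rc"
}

# Usage: check_gids_free 155 156
```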

Then configure it: /opt/glite/yaim/bin/yaim -c -s site-info.def -n creamCE -n TORQUE_utils

For future reconfigurations it should be enough to use -n creamCE alone, without -n TORQUE_utils. For site-info.def, take the file from mars.hep.kbfi.ee:siteinfo/ and change every reference to mars.hep.kbfi.ee to the new hostname, as well as the job ID prefix, which on mars is: BLAH_JOBID_PREFIX=crmrs_

The prefix has to have the form crxxx_, where xxx is chosen per CE. This makes it possible to tell later which CREAM CE submitted a job to the Torque server.
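The site-info.def changes can be scripted roughly as follows. This is a sketch under assumptions: adapt_siteinfo is a hypothetical name, "xxx" is taken to mean three letters or digits, the hostname and prefix in the usage note are placeholders, and sed -i is the GNU form.

```shell
# Hypothetical helper: adapt a copy of the mars.hep.kbfi.ee site-info.def for
# a new CE. Replaces every occurrence of the old hostname and sets
# BLAH_JOBID_PREFIX (appending the line if it is missing).
# Assumption: the prefix is "cr" + three letters/digits + "_".
adapt_siteinfo() {
    local file=$1 new_host=$2 prefix=$3
    if ! printf '%s\n' "$prefix" | grep -Eq '^cr[[:alnum:]]{3}_$'; then
        echo "bad prefix: $prefix" >&2
        return 1
    fi
    sed -i \
        -e "s/mars\.hep\.kbfi\.ee/${new_host}/g" \
        -e "s/^BLAH_JOBID_PREFIX=.*/BLAH_JOBID_PREFIX=${prefix}/" \
        "$file" || return 1
    grep -q '^BLAH_JOBID_PREFIX=' "$file" ||
        printf 'BLAH_JOBID_PREFIX=%s\n' "$prefix" >> "$file"
}
```

For example, after keeping a backup with cp site-info.def site-info.def.orig, run adapt_siteinfo site-info.def newce.hep.kbfi.ee crnew_ (placeholder values) and diff the two files before running YAIM.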

Also, to fix the GIP (Generic Information Provider), add the following to /etc/lcg-info-dynamic-scheduler.conf:

[LRMS]
lrms_backend_cmd: /usr/libexec/lrmsinfo-pbs
[Scheduler]
cycle_time : 0

Comments and remarks

Here I collect comments and common problems I have run into.

Users with the lcgadmin role cannot log in: glExec fails to get a user ID.

The reason is that CMS needs non-pool accounts for these roles. Edit /etc/grid-security/grid-mapfile to remove the pool-account mappings and replace them with ordinary accounts (for example, .sgmcms should be replaced with sgmcms000, and so on). Only the common cms accounts are pool accounts.
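The grid-mapfile edit can be done with sed; the sketch below replaces a single pool mapping with a fixed account. The function name unpool_mapping is hypothetical, it assumes the pool name is the last field on the line (as with .sgmcms), and if a later YAIM run regenerates the file, the edit has to be repeated.

```shell
# Hypothetical helper: in a grid-mapfile, replace a pool-account mapping
# (" .POOL" at the end of a line) with a fixed local account.
# Usage: unpool_mapping /etc/grid-security/grid-mapfile sgmcms sgmcms000
unpool_mapping() {
    local file=$1 pool=$2 account=$3
    sed -i "s/[[:space:]]\.${pool}[[:space:]]*\$/ ${account}/" "$file"
}
```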

Making it work with multiple CREAM's

Add the following to /etc/fstab and make sure the mount points exist and are empty (/opt/edg/var/info-torque has to be created first, and the /opt/edg/var/info directory has to be cleaned):

torque.hep.kbfi.ee:/var/spool/torque/server_logs   /var/spool/torque/server_logs  nfs  users,rw   0   0
torque.hep.kbfi.ee:/var/spool/torque/server_priv/accounting /var/spool/torque/server_priv/accounting nfs users,rw 0 0 
torque.hep.kbfi.ee:/opt/edg/var/info /opt/edg/var/info-torque nfs users,rw 0 0

Then run mount -a to attach them. This is needed because the software tags have to be shared among all CREAM CEs, and the Torque logs have to be available for job parsing.
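The "exist and are empty" requirement can be checked before mount -a with a helper like this one (hypothetical name). It is meant for the first setup: once the NFS shares are mounted, the directories are no longer empty and the check will complain.

```shell
# Hypothetical helper: create each mount point if needed and report any that
# is not empty (mounting over a non-empty directory hides its contents).
# Returns non-zero if any directory is not empty.
prepare_mountpoints() {
    local dir rc=0
    for dir in "$@"; do
        mkdir -p "$dir" || return 1
        if [ -n "$(ls -A "$dir")" ]; then
            echo "not empty: $dir"
            rc=1
        fi
    done
    return "$rc"
}

# Mount points from the fstab lines; mount only if all are ready:
# prepare_mountpoints /var/spool/torque/server_logs \
#     /var/spool/torque/server_priv/accounting /opt/edg/var/info-torque && mount -a
```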

To keep the tags synchronized across the CEs, copy the /usr/local/bin/sync-tags.sh script from one of the working CREAM CEs to /usr/local/bin on the new CE, and /etc/cron.d/sync-tags to /etc/cron.d on the new CE. Afterwards it is a good idea to change the time in the cron definition so that the CEs do not all sync at the same time.
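One way to pick a different but stable time on each CE is to derive the minute from the hostname. A sketch: stagger_minute is a hypothetical helper, and the cron line in the usage note assumes an hourly schedule run as root, which may not match the existing /etc/cron.d/sync-tags.

```shell
# Hypothetical helper: derive a minute (0-59) from the hostname so each CE
# gets a different but stable time for its sync-tags cron job.
# Two hosts can still land on the same minute; this only spreads them out.
stagger_minute() {
    local host=${1:-$(hostname -f)}
    local sum
    sum=$(printf '%s' "$host" | cksum | awk '{ print $1 }')
    echo $(( sum % 60 ))
}
```

For example, echo "$(stagger_minute) * * * * root /usr/local/bin/sync-tags.sh" prints a candidate line for /etc/cron.d/sync-tags.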

Testing the CREAM CE

First, create a test.sh (a simple id -a; sleep 300 will do) as one of the newly created users (e.g. cms011) and, as that user, qsub it to the queue. If the job runs through and you get the output back, PBS itself works. Next, check from a UI machine whether the new CREAM CE has job submission enabled: glite-ce-allowed-submission xxx.hep.kbfi.ee:8443

The answer should be that submissions are enabled.
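The PBS test job from the first step can be written out like this. The queue name short in the usage note is an assumption taken from the cream-pbs-short endpoint used for the CREAM submission test.

```shell
# Write the PBS test job: print the identity of the account the job runs
# under, then sleep for 300 seconds so it stays visible in qstat for a while.
cat > test.sh <<'EOF'
#!/bin/bash
id -a
sleep 300
EOF
chmod +x test.sh
```

As cms011 (or another of the created users), run qsub -q short test.sh and follow it with qstat; by default Torque returns the output as test.sh.o<job number> in the directory qsub was run from.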

Once that works, prepare a simple test.jdl:

[
executable="/bin/sleep";
arguments="300";
]

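and try submitting it to the cluster, then query the job status using the job ID printed by glite-ce-job-submit (the ID below is an example from mars):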

glite-ce-job-submit -a -r xxx.hep.kbfi.ee:8443/cream-pbs-short test.jdl 
glite-ce-job-status https://mars.hep.kbfi.ee:8443/CREAM084384886

It is also good to test as an SGM user if you can. More useful tests here.

Also, make sure that the BDII is OK: http://gstat.egi.eu/gstat/site/T2_Estonia/treeview/bdii_site/io.hep.kbfi.ee/

Page last modified on August 31, 2011, at 02:37 PM