
System checks

A list of things that should be working in a healthy system:

  • all machines are up - ping
  • d-cache pools and doors are up and visible to the d-cache system - python d-cache interface
  • dcap is usable - sample dccp job on one of the worker nodes (WNs)
  • d-cache pool usage - above 95%? send mail
  • d-cache restore queue empty - check "http://io.hep.kbfi.ee:2288/poolInfo/restoreHandler/*"
  • ganglia is working - gmond/gmetad statuses
  • gstat tests are passing - curl | grep
  • srm is usable - full srmcp test (can be run as the PhEDEx user, which already has access to a proxy)
  • bdii working - check both oberon and io with a simple ldapsearch
  • LCG SAM has passed the last 3 times - ???
  • load on boxes - load > 10 * number of cores? send mail
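
The two threshold checks in the list (pool usage and load) could be sketched in Python roughly as follows. This is only an illustration: the function names are hypothetical, the pool sizes are assumed to come from whatever interface reports them, and using the 1-minute load average is an assumption, since the list does not say which average is meant.

```python
import os

POOL_USAGE_LIMIT = 0.95   # from the list: mail when a pool is more than 95% full
LOAD_PER_CORE_LIMIT = 10  # from the list: mail when load > 10 * number of cores


def pool_over_limit(used_bytes, total_bytes, limit=POOL_USAGE_LIMIT):
    """Return True if the used fraction of a pool exceeds the limit."""
    if total_bytes <= 0:
        # A pool that reports no capacity is treated as a problem.
        return True
    return used_bytes / total_bytes > limit


def load_over_limit(load, cores, per_core=LOAD_PER_CORE_LIMIT):
    """Return True if the load exceeds per_core times the core count."""
    return load > per_core * cores


def local_load_alert():
    """Apply the load rule to the local machine (Unix only).

    Assumption: the 1-minute load average is the one of interest.
    """
    load1, _, _ = os.getloadavg()
    cores = os.cpu_count() or 1
    return load_over_limit(load1, cores)
```

For example, `load_over_limit(41, 4)` is true and `load_over_limit(40, 4)` is false, so a 4-core box triggers mail only once its load goes above 40.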

PS: some simple checks - http://hep.kbfi.ee/dbg/checks.html (no news is good news).
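
A "no news is good news" runner can be sketched as below: it collects only the failing checks, so an empty result means there is nothing to report. This is not the script behind the page above. `host_is_up` and `run_checks` are hypothetical names, and the `ping` flags (`-c` for count, `-W` for timeout in seconds) are those of the Linux iputils `ping`, which is assumed here.

```python
import subprocess


def host_is_up(host, timeout_s=2):
    """Send one ICMP echo with the system ping (Linux iputils flags assumed)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def run_checks(checks):
    """Run each named check and return a list describing only the failures.

    `checks` maps a name to a zero-argument callable that returns True on
    success. A check that raises is reported as a failure, not a crash.
    """
    failures = []
    for name, check in checks.items():
        try:
            ok = check()
        except Exception as exc:
            failures.append(f"{name}: error: {exc}")
            continue
        if not ok:
            failures.append(f"{name}: FAILED")
    return failures
```

A cron job could call `run_checks({"ping io": lambda: host_is_up("io.hep.kbfi.ee")})` and send mail only when the returned list is non-empty.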

Page last modified on November 13, 2007, at 03:23 PM