System checks
A list of things that should be working in a healthy system:
- all machines are up - ping
- dCache pools and doors are up and visible to the dCache system - python dcache iface
- dcap usable - sample dccp job on one of the worker nodes (WNs)
- dCache pool usage is below 95% - send mail if any pool is above
- dCache restore queue is empty - check "http://io.hep.kbfi.ee:2288/poolInfo/restoreHandler/*"
- ganglia is working - gmond/gmetad statuses
- gstat tests are passing - curl | grep
- srm usable - full srmcp test (can be run as the PhEDEx user, which already has access to a proxy)
- bdii working - check both oberon and io with a simple ldapsearch
- LCG SAM has passed the last 3 times - ???
- load on the boxes is reasonable - send mail if load > 10 * number of cores
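
The two numeric thresholds in the list (pool usage above 95%, load above 10 per core) can be written down as small predicates. This is only a sketch: the function names and the way pool sizes are passed in are assumptions, and fetching the real pool numbers from dCache is left out.

```python
import os

POOL_USAGE_LIMIT = 0.95   # from the list: mail when a pool is more than 95% full
LOAD_PER_CORE_LIMIT = 10  # from the list: mail when load > 10 * cores


def pool_over_limit(used_bytes, total_bytes, limit=POOL_USAGE_LIMIT):
    """Return True when the pool's used fraction exceeds the limit."""
    if total_bytes <= 0:
        # A pool reporting no capacity is treated as a problem, not as empty.
        return True
    return used_bytes / total_bytes > limit


def load_over_limit(load, cores, per_core=LOAD_PER_CORE_LIMIT):
    """Return True when the load average exceeds per_core * cores."""
    return load > per_core * max(cores, 1)


def local_load_alarm():
    """Check the 1-minute load average of the machine this runs on (Unix only)."""
    load1, _, _ = os.getloadavg()
    return load_over_limit(load1, os.cpu_count() or 1)
```

For example, `load_over_limit(81, 8)` is true while `load_over_limit(80, 8)` is not, since the limit for an 8-core box is a load of 80.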
PS: some simple checks are already at http://hep.kbfi.ee/dbg/checks.html (no news is good news).
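
The "no news is good news" idea could look roughly like the sketch below: run every check and report only the ones that fail, so an empty report means a healthy system. This is not how the checks page above is actually built; `run_checks` and `report` are hypothetical names used for illustration.

```python
def run_checks(checks):
    """Run each (name, callable) pair and return messages for failures only."""
    problems = []
    for name, check in checks:
        try:
            ok = check()
        except Exception as exc:
            # A check that crashes is reported as a problem rather than skipped.
            problems.append(f"{name}: check crashed: {exc}")
            continue
        if not ok:
            problems.append(f"{name}: FAILED")
    return problems


def report(problems):
    """Return the text to mail out, or an empty string when all is well."""
    return "\n".join(problems)
```

A cron job could call `report(run_checks(...))` and send mail only when the result is non-empty.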