Think about the objectives. The primary objective is to keep the website
running and, if it isn't up and running, to bring it back up as soon as
possible. With that in mind, I would start with something like:
- checking for certain jobs
- Raul's suggestion below
and so forth. This lets you handle the problem closer to the source and
make the system "self-healing". Are the jobs down? Are they down for a
reason that lets you simply start them back up, or should you alert
someone and let it be resolved manually? Maybe start them back up and
let someone know as well?
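The restart-or-alert questions above can be written down as a small
policy. This is only a sketch of the decision, not of any real job
monitor: the reason strings, the restart limit and the name decide() are
all made up for illustration.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"
    RESTART_AND_NOTIFY = "restart and notify"
    ALERT_ONLY = "alert only"


# Hypothetical reasons for which an automatic restart is assumed safe.
SAFE_TO_RESTART = {"ended normally", "ended after IPL"}


def decide(job_running: bool, reason: str,
           restarts_so_far: int, max_restarts: int = 3) -> Action:
    """Pick what the monitor should do for one job."""
    if job_running:
        return Action.NONE
    # Restart only for known-safe reasons, and only a limited number of
    # times, so a job that keeps dying ends up in front of a person.
    if reason in SAFE_TO_RESTART and restarts_so_far < max_restarts:
        return Action.RESTART_AND_NOTIFY
    return Action.ALERT_ONLY
```

The restart limit is there so that a job stuck in a crash loop is
eventually handed to someone instead of being restarted forever.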
But I also like the off-machine suggestions. We do end-to-end testing of
Domino Fax for iSeries. We send faxes out automatically from Domino Fax
for iSeries using our 2805 card, back in to Domino Fax for iSeries via the
DID modems on different cards. Then we check the inbox for that fax. If
it's not there within a set time, email is sent to interested parties. I
think the bulk of the time the outage is caused by Domino Fax not
properly handling garbage from bulk fax spam mailers.
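The round trip described above (send something out, wait for it to come
back in, complain if it doesn't arrive in time) looks roughly like this
in general form. It is a sketch, not our actual fax test: send and
arrived are placeholders for whatever sends the test item and checks the
inbox, and the caller decides how to send the email when it returns
False.

```python
import time
from typing import Callable


def round_trip_ok(send: Callable[[str], None],
                  arrived: Callable[[str], bool],
                  token: str,
                  timeout_s: float,
                  poll_s: float = 1.0) -> bool:
    """Send a marked test item and wait for it to show up again.

    Returns True if arrived(token) becomes true before the deadline,
    False otherwise.
    """
    send(token)
    deadline = time.monotonic() + timeout_s
    while True:
        if arrived(token):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

Using a unique token per test run keeps a late arrival from an earlier
run from being mistaken for the current one.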
Off-site website tools previously mentioned allow you to test things
like external DNS, firewall issues, NAT, etc. And having one that is
generally accepted may help you with those network consultants who claim
the site works for them (only to find out that they are testing
internally and not externally, or using a host name table, or ...).
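For illustration, here is a minimal check of the kind such a tool
performs, which only means something when run from a machine outside
your network. It is a sketch using the Python standard library, not a
replacement for an accepted third-party service; check_site and its
helper names are made up, and it assumes plain HTTP on port 80. It
reports the address the name resolved to, so you can compare that
against what an internal tester sees.

```python
import socket
import urllib.request


def _resolve(host: str) -> str:
    # First address the local resolver returns for the host.
    return socket.getaddrinfo(host, 80)[0][4][0]


def _fetch(url: str) -> int:
    # urlopen raises an OSError subclass on connection or HTTP errors.
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.status


def check_site(host: str, resolve=_resolve, fetch=_fetch) -> str:
    """Report which stage failed: name resolution or the request."""
    try:
        address = resolve(host)
    except OSError as exc:
        return f"DNS failure: {exc}"
    try:
        status = fetch(f"http://{host}/")
    except OSError as exc:
        return f"resolved to {address}, but request failed: {exc}"
    return f"OK: {address} answered HTTP {status}"
```

If the address reported from outside differs from the one the
consultant's machine resolves, that points at internal DNS or a host
name table rather than at the site itself.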