But if the agent runs multiple times a minute, what are the odds that the agent is NOT running when the server crashes?
Wouldn't it be better if the server gave more detail than "hey, an agent was running, let's blame it on that"?
Then again, wouldn't it be better if the server had self-healing capabilities that didn't involve automatically restarting the server?
It would be like me setting up Robot Messenger to monitor QSYSOPR for divide by zero errors and then issuing pwrdwnsys restart(*yes) as a possible solution. While, yes, that clears the hung job, there has to be a better way.
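
To put rough numbers on the "what are the odds" question, here is a minimal back-of-the-envelope sketch. The run rate and run duration are made-up placeholders, not measurements from this server, and it assumes runs do not overlap and that a crash unrelated to the agent is equally likely at any instant.

```python
def fraction_of_time_running(runs_per_minute: float, seconds_per_run: float) -> float:
    """Fraction of wall-clock time the agent is executing, capped at 1.0.

    Assumes runs do not overlap (a simplification).
    """
    busy_seconds_per_minute = runs_per_minute * seconds_per_run
    return min(busy_seconds_per_minute / 60.0, 1.0)


def chance_all_crashes_coincide(p_running: float, crashes: int) -> float:
    """Chance that the agent shows up as running in every one of `crashes`
    independent crashes, if the crashes have nothing to do with the agent."""
    return p_running ** crashes


if __name__ == "__main__":
    # Hypothetical numbers: 4 runs a minute, 10 seconds each.
    p = fraction_of_time_running(runs_per_minute=4, seconds_per_run=10)
    print(f"agent running at a random instant: {p:.2f}")
    print(f"agent NOT running at a random instant: {1 - p:.2f}")
    print(f"running in all of 3 unrelated crashes: {chance_all_crashes_coincide(p, 3):.2f}")
```

With those placeholder numbers the agent is busy about two thirds of the time, so seeing it in a single NSD says very little. The same arithmetic cuts the other way over time: if it shows up in every one of several crashes, coincidence becomes a weaker explanation.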

-----Original Message-----
From: Domino400 <domino400-bounces@xxxxxxxxxxxxxxxxxx> On Behalf Of Chris Whisonant
Sent: Thursday, February 28, 2019 9:46 AM
To: Lotus Domino on the IBM i (AS/400 and iSeries) <domino400@xxxxxxxxxxxxxxxxxx>
Subject: Re: Domino 10.0.1 experiences: Server Restart Notification

If you haven't already, you should set log_agentmanager=1 so that you're recording the start and stop times of each agent. This way you know exactly which one was running when the server panicked. You may even need to add some print statements in the agent to see how far it is getting in its code execution. If it's hanging at the same place when the crashes happen, then something may need to be adjusted. It could be a memory leak with the agent or amgr task. But agents can definitely cause servers to crash.

Thanks,
Chris


On Thu, Feb 28, 2019 at 9:02 AM Rob Berendt <rob@xxxxxxxxx> wrote:

HCL:
<snip>
...
After reviewing the log files that you have submitted, the least we
can do for now is to monitor your server if another crash or fault
recovery will happen. Based on the call stack captured on the NSD (as
seen below), the process that caused the server to crash is an Agent
Manager.
...
Unfortunately, the database and the specific agent that caused the
crash was not captured in any of the log files uploaded. This is why
we need to monitor the server if another crash will happen and check
if same call stack will be captured and if we will be able to
determine the Agent and database affected.
May I confirm with you if you are actually running an agent for this
server? May I know what agent is that, so we can check further?
</snip>

Again, blame the server restart on a "divide by zero" error... :-(
It's like this whole product is cobbled together from tissue paper and
spit.


-----Original Message-----
From: Domino400 <domino400-bounces@xxxxxxxxxxxxxxxxxx> On Behalf Of Rob Berendt
Sent: Thursday, February 28, 2019 8:41 AM
To: Lotus Domino on the IBM i (AS/400 and iSeries) <domino400@xxxxxxxxxxxxxxxxxx>
Subject: RE: Domino 10.0.1 experiences: Server Restart Notification

You know that field in the server document, "Mail Fault Notification
to:"? This is what it is used for:
Fault Recovery Notification: Server QUALITY3/DEKKO was restarted after
a fault on 02/27/2019 12:42:37

Hopefully they will figure out why this 10.0.1 server faulted on its
own. I have so many of these on my 9.0.1FP10 servers that the tickets
drag on for months. So it doesn't initially make me paranoid about
10.0.1. I was kind of hoping they'd go away though.

HCL tends to blame our agent code. Which, to me, makes about as much
sense as blaming the IPL of an lpar of IBM i because some RPG
programmer had a divide by zero error.
HCL: The nsd shows the agent was running at the time of the system fault.
Me: The agent runs a bazillion times a day doing transactions from
our ERP into Domino. So I think it's just a coincidence. It didn't
fault the system the other gazillion times it ran.

--
This is the Lotus Domino on the IBM i (AS/400 and iSeries) (Domino400)
mailing list To post a message email: Domino400@xxxxxxxxxxxxxxxxxx To
subscribe, unsubscribe, or change list options,
visit: https://lists.midrange.com/mailman/listinfo/domino400
or email: Domino400-request@xxxxxxxxxxxxxxxxxx
Before posting, please take a moment to review the archives at
https://archive.midrange.com/domino400.


This mailing list archive is Copyright 1997-2020 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited.