Agents not executing health indicator

The "agents not executing" health indicator helps identify hosts that are not executing the agent at regular intervals. Agents can fail to execute for a number of reasons: the service could have been manually stopped, cf-execd may have died, or policy may be broken in a way that does not allow the agent to complete its execution.


At the end of every execution, the agent records the time of that execution and computes the average interval between executions.

The average interval is computed as a geometric average, which in effect represents the average of the last 4 intervals between cf-agent executions.
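
The following is a minimal sketch of how such a running average could behave. It is not CFEngine's actual code. It assumes that "geometric average" means an exponentially weighted running average, and it assumes a weight of 0.25 so that roughly the last 4 intervals dominate the result. The function name `update_average_interval` and the weight are hypothetical and used only for illustration.

```python
def update_average_interval(previous_average, new_interval, weight=0.25):
    """Blend the newest interval (in seconds) into the running average.

    weight=0.25 is an assumption chosen so that roughly the last
    4 intervals dominate the result; it is not a documented value.
    """
    return weight * new_interval + (1.0 - weight) * previous_average


# Example: an agent scheduled every 300 seconds starts skipping runs,
# so the observed interval between executions grows to 900 seconds.
average = 300.0
for interval in (300, 300, 900, 900):
    average = update_average_interval(average, interval)
```

Under these assumptions, the average drifts toward the longer interval over a few runs instead of jumping immediately, which is why a single delayed run has only a limited effect on the reported value.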

When the hub detects that an agent has not executed within 3 times the expected interval, the host is shown as "agent not executing".
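
The threshold rule can be sketched as follows. This is an illustration only, not the hub's implementation: the function name `is_not_executing` and its parameters are hypothetical, and the only value taken from this article is the factor of 3.

```python
from datetime import datetime, timedelta

# From the article: a host is flagged after 3x the expected interval.
THRESHOLD_FACTOR = 3


def is_not_executing(last_execution, expected_interval, now):
    """Return True if the time since the last execution exceeds the threshold.

    last_execution and now are datetime objects; expected_interval is a
    timedelta. All three names are hypothetical, for illustration only.
    """
    return now - last_execution > THRESHOLD_FACTOR * expected_interval


# Example: a 5 minute expected interval gives a 15 minute threshold.
now = datetime(2024, 1, 1, 12, 0, 0)
expected = timedelta(minutes=5)
recent_host = is_not_executing(now - timedelta(minutes=10), expected, now)
stale_host = is_not_executing(now - timedelta(minutes=20), expected, now)
```

In this example, the host that last ran 10 minutes ago is still within the threshold, while the host that last ran 20 minutes ago would be flagged.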

To check the average cf-agent execution interval (agentexecutioninterval) for agents that are not running, execute this query as a custom query in the Reporting tab in Mission Portal:


select * from agentstatus where LastAgentExecutionStatus = 'FAIL'



Q: Is it possible to change the expected interval threshold?

A: No, it is not currently a user-definable option.

 

Q: Why would my average agent execution interval be lower than my scheduled interval?

A: If multiple cf-execd daemons are running, each one launches its own agent, which reduces the average execution interval that is detected.
