Posted Wed, 08 Aug 2018 02:55:23 GMT by

I have been tasked with setting up outage notifications for my group (for example, sending an email or automatically generating a problem incident when AutoMate experiences an outage) and with determining outage statistics (for example, the time frame during which AutoMate was unavailable). I admit I am having a hard time even defining what should be considered an “outage”. Currently, we are using AutoMate version 9, primarily for file transfers. We have two servers set up to load balance access to a single AutoMate instance. Each server also functions as a backup if the other becomes unavailable.

Do you have any recommendations or suggestions for how to define an outage, how to determine if an outage has occurred, and how to determine how long the outage lasted? Any insight or direction is greatly appreciated. Thanks.

Posted Sat, 11 Aug 2018 13:25:36 GMT by

The following is how I've set this up. It's been in place for several years and really keeps things going.

You will need 2 servers running AutoMate. Basically this method enables servers to monitor each other.

Monitor that the AutoMate service and task triggering are running:

Create a health check task: 

The task creates a dummy.txt file on the remote server and runs every hour (you can change the interval to meet your needs). Put this task on both servers. Ten minutes after creating the dummy file, the same task checks whether the remote server has deleted it (see the health keep alive task below). If the file has been deleted, we know the AutoMate service is running and task triggering is working as expected. If it has not been deleted, check whether the AutoMate service and AMTask.exe are running on the remote server.
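For anyone who wants to prototype the health check logic outside AutoMate, here is a minimal Python sketch. It is not the AutoMate task itself. The function name `run_health_check`, the `dummy_path` argument (assumed to be a path on the remote server that this machine can write to, such as a share) and the injectable `sleep` parameter are all made up for illustration.

```python
import os
import time
from typing import Callable


def run_health_check(dummy_path: str, wait_seconds: float,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Create the dummy file, wait, then report whether the peer deleted it.

    Returns True if the file is gone after the wait (the peer looks healthy)
    and False if the file is still there (the peer needs a closer look).
    """
    # Step 1: drop the dummy file where the peer's keep alive task looks for it.
    with open(dummy_path, "w") as handle:
        handle.write("health check\n")

    # Step 2: give the peer time to notice and delete the file.
    sleep(wait_seconds)

    # Step 3: a missing file means the peer's scheduled task fired.
    return not os.path.exists(dummy_path)
```

With the timings described above, the call would be something like `run_health_check(path, 600)` once an hour. A `False` result is the point at which to check the service and process status and send the notification.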

Example of getting the AutoMate service status and the AMTask.exe process status on the remote server:

<AMSERVICES ACTIVITY="state" SERVICE="AutoMate 9" REMOTEMACHINE="%HostName%" REMOTEDOMAIN="MYDOMAIN" REMOTEUSERNAME="username" REMOTEPASSWORD="AM2CkmLdNUX6LI7Sbl05hfcsg==aME" RESULTVARIABLE="amservicestatus" />

<AMIF ACTIVITY="process_running" DOMAINNAME="MYDOMAIN" REMOTEMACHINE="%HostName%" USERNAME="username" PASSWORD="AM2CkmLdNUX6LI7Sbl05hfcsg==aME" ACTION="end" PROCESS="AMTask.exe">
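On the question of how long an outage lasted: one option, assuming each health check result is logged with a timestamp, is to derive outage windows from that log. The Python sketch below is only an illustration of that idea. The function name `outage_windows` and the `(timestamp, ok)` record layout are assumptions, not anything AutoMate produces.

```python
from datetime import datetime
from typing import List, Optional, Tuple


def outage_windows(
    results: List[Tuple[datetime, bool]]
) -> List[Tuple[datetime, datetime]]:
    """Turn time-ordered (timestamp, ok) check results into outage windows.

    Each window runs from the first failed check to the next passing check.
    An outage still in progress at the end of the log is not reported.
    """
    windows: List[Tuple[datetime, datetime]] = []
    start: Optional[datetime] = None
    for when, ok in results:
        if not ok and start is None:
            start = when  # first failed check opens a window
        elif ok and start is not None:
            windows.append((start, when))  # first passing check closes it
            start = None
    return windows
```

The windows can only be as precise as the check interval, so with hourly checks an outage is measured to the nearest hour at best. Running the check more often tightens the numbers.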

Create a health keep alive task:

This task deletes the dummy.txt file on the local server. It runs every minute. Put this task on both servers.
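The keep alive side is just "delete the file if it is there". As a rough Python sketch (again not the AutoMate task, and with `keep_alive` and `dummy_path` as made-up names):

```python
import os


def keep_alive(dummy_path: str) -> bool:
    """Delete the dummy file if present; return True if a file was deleted."""
    try:
        os.remove(dummy_path)
        return True
    except FileNotFoundError:
        # Nothing to delete: the peer has not written a new dummy file yet.
        return False
```

Catching `FileNotFoundError` rather than checking for the file first avoids a race where the file appears or disappears between the check and the delete.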
