In an HA environment, there are likely to be jobs and subsystems running on the source system that will not be active on the target system until the target becomes the source during a failover.
This makes monitoring difficult, because QSystem Monitor always reports those jobs and subsystems as not running on the target system. The introduction of calendars in QSystem Monitor V13 gives us more flexibility.
The first items we need to consider are the calendar rules. These define the dates and times we want the calendar to be active, and, in this case, correspond to the dates and times of the failover.
To start with, create a calendar rule that makes all dates and times active:
Now that we've created our first rule, we need to apply it to our calendars. In this case we'll create two calendars: one for the Production system (PROD) and one for the failover system (BACK).
After we enter the name (PROD) and a description, we can select the rule from the left-hand panel and click Add.
With the rule selected, we highlight it and click Set to Include. This tells the calendar to include all dates and times that match the rule.
The icon now shows that the rule is an inclusive rule.
If you need to check a calendar to see how the rules affect the dates and times, click the Preview tab.
Note that the green lines highlight the times that are active on the dates shown. Our rule starts on 09/18/13, so earlier dates are shown in red.
We can repeat the entire process for the Backup system (BACK), only this time when we add the rule we click Set to Exclude.
In this case, the Preview tab shows all dates in red.
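The include/exclude behaviour described so far can be pictured with a small Python model. This is only a sketch for reasoning about the calendars, not QSystem Monitor's actual implementation: the `Rule` and `Calendar` classes are hypothetical, and the sketch assumes that a date matched by no rule is treated as inactive (red), as the PROD preview suggests for dates before 09/18/13.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Rule:
    """Hypothetical stand-in for a calendar rule: a start and an optional end."""
    name: str
    start: datetime
    end: Optional[datetime] = None  # None means the rule never expires

    def matches(self, when: datetime) -> bool:
        return self.start <= when and (self.end is None or when < self.end)


@dataclass
class Calendar:
    """Hypothetical calendar: an ordered list of (rule, include?) entries."""
    name: str
    entries: List[Tuple[Rule, bool]] = field(default_factory=list)

    def is_active(self, when: datetime) -> bool:
        # Assumption: the first matching rule decides; no match means inactive.
        for rule, include in self.entries:
            if rule.matches(when):
                return include
        return False


all_dates = Rule("ALL", start=datetime(2013, 9, 18))
prod = Calendar("PROD", [(all_dates, True)])   # added with Set to Include
back = Calendar("BACK", [(all_dates, False)])  # added with Set to Exclude

for day in (17, 18, 19):
    when = datetime(2013, 9, day, 12, 0)
    print(when.date(), "PROD:", prod.is_active(when), "BACK:", back.is_active(when))
```

In this model PROD is inactive on the 17th and active from the 18th onwards, while BACK is inactive throughout, which matches the two previews.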
Now, let's assume we're going to have a planned failover starting on Sunday the 22nd and switching back on Saturday the 28th (so it will last six days).
Following the same steps we took above, we need to create a rule for the first failover date.
Add this rule to the PROD calendar as an exclusion rule.
Because this is a more specific rule, it must take precedence over the general rule, so it needs to sit higher in the rule order. We highlight the rule and click Move Up.
Click the Preview tab. The dates from Sunday 22nd to Friday 27th are now red.
We can repeat the entire process for the Backup system (BACK); only this time when we add the rule we click Set to Include.
Click the Preview tab. The dates from Sunday 22nd to Friday 27th are now green.
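Why the rule order matters can be shown with a short sketch. It assumes, as the Move Up step implies, that rules are checked from the top of the list down and that the first match decides. That evaluation order is an assumption about the product, and the `is_active` and `preview` helpers are hypothetical.

```python
from datetime import date, timedelta

# Each rule is (name, first_day, day_after_last_day or None for open-ended).
ALL_DATES = ("ALL", date(2013, 9, 18), None)
FAILOVER = ("FAILOVER", date(2013, 9, 22), date(2013, 9, 28))  # Sun 22nd to Fri 27th


def is_active(rules, day):
    """Return the include flag of the first rule that covers `day` (top-down)."""
    for (_name, start, end), include in rules:
        if start <= day and (end is None or day < end):
            return include
    return False  # assumption: days matched by no rule are inactive


def preview(rules, first, days):
    """Mimic the Preview tab: map each day to 'green' (active) or 'red'."""
    result = {}
    for offset in range(days):
        day = first + timedelta(days=offset)
        result[day] = "green" if is_active(rules, day) else "red"
    return result


prod = [(FAILOVER, False), (ALL_DATES, True)]   # specific rule moved up, excluded
back = [(FAILOVER, True), (ALL_DATES, False)]   # specific rule moved up, included
prod_wrong_order = [(ALL_DATES, True), (FAILOVER, False)]  # general rule hides it

prod_view = preview(prod, date(2013, 9, 21), 8)
back_view = preview(back, date(2013, 9, 21), 8)
for day in prod_view:
    print(day.strftime("%a %d"), "PROD", prod_view[day], "BACK", back_view[day])
```

With the failover rule left below the general rule (`prod_wrong_order`), the general rule matches first in this model and the failover dates stay green, which is why the more specific rule has to be moved up.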
We've seen how easy it is to create simple calendar rules and then add them to calendars as either inclusive or exclusive rules, as required.
If there are more dates, we can simply create more rules, or if the situation changes and our failover window is going to extend beyond the six days previously entered, we can simply change the rule accordingly.
Equally, if we need a rule that applies to times as well as dates, we can use the hours and minutes fields for this.
Finally, if we want to preview multiple calendars together to ensure all our dates and times are satisfied, we can do this from the main calendar window by checking the right-hand box against each of the calendars required (clicking the column header sorts the screen by that column).
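The kind of check a combined preview supports can also be sketched in code: step through a period and flag any time at which not exactly one of the calendars is active. Everything here is illustrative. The `active` and `find_problems` helpers are hypothetical, the first-match evaluation is an assumption, and the BACK rule deliberately ends six hours early (an invented mistake) to show what a gap looks like.

```python
from datetime import datetime, timedelta

FOREVER = datetime(9999, 1, 1)  # stand-in for "no end date"


def active(rules, when):
    """First matching (start, end, include) rule decides; no match is inactive."""
    for start, end, include in rules:
        if start <= when < end:
            return include
    return False


calendars = {
    "PROD": [
        (datetime(2013, 9, 22, 0, 0), datetime(2013, 9, 28, 0, 0), False),
        (datetime(2013, 9, 18, 0, 0), FOREVER, True),
    ],
    "BACK": [
        # Hypothetical mistake: this inclusion ends at 18:00 on the 27th.
        (datetime(2013, 9, 22, 0, 0), datetime(2013, 9, 27, 18, 0), True),
        (datetime(2013, 9, 18, 0, 0), FOREVER, False),
    ],
}


def find_problems(calendars, first, last, step=timedelta(hours=1)):
    """Return each checked time at which not exactly one calendar is active."""
    problems = []
    when = first
    while when < last:
        names = [name for name, rules in calendars.items() if active(rules, when)]
        if len(names) != 1:
            problems.append((when, names))
        when += step
    return problems


problems = find_problems(calendars, datetime(2013, 9, 21), datetime(2013, 9, 29))
for when, names in problems:
    print(when, "active calendars:", names or "none")
```

Here the check reports the six hours on the evening of the 27th during which neither calendar is active, the same gap that would show up as red on both calendars in a combined preview.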
We've created our calendars, and now we need to associate those calendars with thresholds. We do this by creating threshold selectors.
For the purpose of this exercise, we're going to use a Job Count monitor for four jobs called HAJOB that should be running in subsystem QSYSWRK. These jobs will only be active on the Backup system when the Backup has taken control. They should not be active on the Production system once it has switched over.
In our example, USCCS003 is the normal production system and USCCS004 is the failover system.
First, we'll create the job count monitor for our HAJOB jobs.
We now open up the USCCS003 system record, find the definition just created, and drag it over to the bar group.
Now, right-click the HAJOB bar. Then, select the change properties option. Open the overrides for system USCCS003 and click Global Threshold.
Click Select a list.
Click Create Threshold Selector.
Select the previously created calendar for PROD and the HAJOB threshold in the drop-down lists.
Provide a description and default threshold (to be used where there is no match in the calendar).
Click OK to create the selector. Then, select it from the list.
We now have a threshold record that will only be active based on the calendar associated with it.
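Conceptually, a threshold selector applies its threshold while the associated calendar is active and falls back to the default otherwise. The sketch below models that choice. The `Threshold` and `ThresholdSelector` names, the expected job counts, and the `prod_calendar` function are hypothetical illustrations, not the product's data model; in particular, a default that expects no jobs is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable


@dataclass(frozen=True)
class Threshold:
    name: str
    expected_jobs: int  # hypothetical: how many HAJOB jobs should be running


@dataclass(frozen=True)
class ThresholdSelector:
    description: str
    calendar_is_active: Callable[[datetime], bool]
    threshold: Threshold  # used while the calendar is active
    default: Threshold    # used where there is no match in the calendar

    def select(self, when: datetime) -> Threshold:
        return self.threshold if self.calendar_is_active(when) else self.default


def prod_calendar(when: datetime) -> bool:
    """PROD: active from 18 Sept 2013, except the planned failover window."""
    in_failover = datetime(2013, 9, 22) <= when < datetime(2013, 9, 28)
    return when >= datetime(2013, 9, 18) and not in_failover


selector = ThresholdSelector(
    description="HAJOB count on USCCS003",
    calendar_is_active=prod_calendar,
    threshold=Threshold("HAJOB", expected_jobs=4),
    default=Threshold("NO_JOBS_EXPECTED", expected_jobs=0),  # assumed default
)

for day in (20, 24, 28):
    chosen = selector.select(datetime(2013, 9, day, 9, 0))
    print(f"Sept {day}: {chosen.name} (expects {chosen.expected_jobs} jobs)")
```

On the 20th and the 28th the sketch selects the HAJOB threshold; on the 24th, inside the failover window, it selects the default.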
Now we can repeat the entire process for the Backup system.
Here is the online monitor before the threshold selectors were added:
Note that the entry for the U004 system is red because there are no jobs currently running in QSYSWRK.
Here is the same view after the threshold selectors have been added:
Now let's assume we have had to perform an emergency switch today (18th September).
The first thing to notice is that the monitor is informing us that there are not four HAJOBs running in QSYSWRK on the USCCS003 system.
So, we can quickly create a new calendar rule as follows:
We can add this as an exclusion rule for the PROD calendar, and an inclusion rule for the BACK calendar.
Now, when we view the online monitor we get the following:
Zero HAJOBs are running on USCCS003, but no error is raised, and four jobs are correctly running on USCCS004. If one of the HAJOBs should fail on the new host system, we'd get a warning as expected.
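The outcome on the day of the emergency switch can be checked with a small model. It assumes the new rule covers the whole of 18 September 2013 (the rule's actual settings are not given here), that an inactive calendar means no jobs are expected, and that any shortfall raises a warning. The `calendar_active` and `status` functions are hypothetical, and the planned failover rule is left out for brevity.

```python
from datetime import datetime

EXPECTED_HAJOBS = 4
# Assumption: the emergency rule covers all of 18 September 2013.
EMERGENCY_START = datetime(2013, 9, 18)
EMERGENCY_END = datetime(2013, 9, 19)


def calendar_active(system: str, when: datetime) -> bool:
    """PROD excludes the emergency window; BACK includes only that window."""
    emergency = EMERGENCY_START <= when < EMERGENCY_END
    if system == "USCCS003":  # uses the PROD calendar
        return when >= datetime(2013, 9, 18) and not emergency
    return emergency          # USCCS004 uses the BACK calendar


def status(system: str, running_jobs: int, when: datetime) -> str:
    """Compare the running job count with what the calendar says to expect."""
    expected = EXPECTED_HAJOBS if calendar_active(system, when) else 0
    return "OK" if running_jobs >= expected else "WARNING"


now = datetime(2013, 9, 18, 10, 30)
print("USCCS003, 0 jobs:", status("USCCS003", 0, now))
print("USCCS004, 4 jobs:", status("USCCS004", 4, now))
print("USCCS004, 3 jobs:", status("USCCS004", 3, now))
```

Under these assumptions the first two checks pass and the third, with one HAJOB missing on the new host, produces a warning.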
The calendar rules, calendars, and threshold selectors take a little initial setup, but once that's done it's very easy to create new rules and add them to your calendars to assist you when you perform your high availability switches.