A Blessing and a Curse
For all its processing power and capability, the IBM System i also has a trait that is both terrible and wonderful: a continuous monologue (or, in this case, message log) detailing in real time every action, event, and observation of every bit and byte running through its components. It’s akin to a continuous digital stream of consciousness. This is the yin and yang of the System i: the wealth of information contained in all of those messages is the source of its power, yet it can also contribute to its greatest vulnerability.
An IT manager charged with manually monitoring that volume of messages would only have to miss a single important message, such as a critical storage condition, in the deluge of informational ones for a system to fail. In networked environments, the problem is multiplied by the number of systems. In this scenario, too much information can be as dangerous as too little; the result is the same if the important message is missed. An effective message management strategy does not merely compromise between these two extremes; it draws on the benefits found at both ends of the spectrum. After all, you still want all of those messages; you just don’t need to give your undivided attention to all of them, all of the time.
Qualifying and Filtering Messages
When it comes to message management, the solution lies in the quality of messages, not the quantity. By automatically filtering and replying to the bulk of informational messages (i.e., those that don’t require human intervention, decision-making, or other assistance to be resolved), an individual manager or team can concentrate their time and energy on exception messages. Within this group of exception messages, further qualification helps determine the specific requirements for each message type, based on these considerations:
- Time Sensitive: Action must be taken as soon as possible
- Important and Noteworthy: But not time-sensitive
- Known Responses: Can be replied to automatically
- Data Sensitive: Not all staff should have authorization to view or reply
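The qualification step above can be pictured as a lookup table that maps each message type to a handling category. The following is only a minimal sketch: the message identifiers, the reply value, and the `security_admin` role are hypothetical placeholders, not real System i message IDs or authorities, and a production rule set would be far larger.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional, Set


class Handling(Enum):
    INFORMATIONAL = auto()   # logged only, no human attention needed
    AUTO_REPLY = auto()      # known response, answered automatically
    TIME_SENSITIVE = auto()  # action must be taken as soon as possible
    NOTEWORTHY = auto()      # important, but not time-sensitive


@dataclass(frozen=True)
class Rule:
    handling: Handling
    reply: Optional[str] = None  # only used for AUTO_REPLY rules
    restricted: bool = False     # data sensitive: limited viewers


# Hypothetical message identifiers, used purely for illustration.
RULES = {
    "JOB_COMPLETED": Rule(Handling.INFORMATIONAL),
    "TAPE_MOUNT_PROMPT": Rule(Handling.AUTO_REPLY, reply="G"),
    "STORAGE_CRITICAL": Rule(Handling.TIME_SENSITIVE),
    "BACKUP_WARNING": Rule(Handling.NOTEWORTHY),
    "PAYROLL_JOB_ERROR": Rule(Handling.TIME_SENSITIVE, restricted=True),
}

# Unknown messages are surfaced rather than silently dropped.
DEFAULT_RULE = Rule(Handling.NOTEWORTHY)


def qualify(message_id: str) -> Rule:
    """Return the handling rule for a message type."""
    return RULES.get(message_id, DEFAULT_RULE)


def can_view(rule: Rule, user_roles: Set[str]) -> bool:
    """Restricted messages are visible only to the assumed security role."""
    return not rule.restricted or "security_admin" in user_roles
```

Defaulting unrecognized messages to the noteworthy category is a design choice in this sketch: it trades a little extra noise for the assurance that a new message type is never filtered out before someone has qualified it.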
Creating Smart Alerts
Once messages have been qualified and filtered as far as possible for automated response, resolution pathways for messages or message queues can be identified. These require smart alerts and escalation procedures that take into account any existing resource limitations or procedural requirements, including business needs and processes; administrator shifts and availability patterns; authorization or security protocols; and device or channel preferences. Monitoring messages in real time is essential, as it forms the basis of any proactive approach to message management. Using the example of the critical storage message, a pathway to resolution needs to accommodate every time at which the message might occur. Below, we’ve created an example of the levels of escalation required for real-world conditions. Users set a suitable time limit for each level; if the message remains unanswered within that time, it proceeds to the next level of alert:
|Level|Alert type|Delivered to|
|---|---|---|
|1st|Visual: message blinks in red|Central monitor in the data center|
|2nd|Visual + Audio (siren!)|Central monitor in the data center & speakers turned up|
|3rd|Email Message|Administration team according to calendar availability|
|4th|SMS Message|IT Manager's cell phone|
In the example above, the first two levels account for the message occurring during normal operating hours, when operators and support staff can attend to the issue immediately. If, however, the message occurred outside normal hours, it would likely not be attended to until the third or fourth level of escalation. Thinking through scenarios like this helps to tailor escalation procedures to a particular company’s resources and availability. Depending on the type of message and the business environment, it might be appropriate to build in levels that use system channels such as SNMP traps or syslog interfaces, or that forward the System i message to a multi-platform enterprise management console, in line with internal or external security, audit, or other regulatory guidelines.
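The four-level escalation example can be sketched as a small function that works out which level should be alerting after a message has gone unanswered for a given time. This is an illustration only: the wait times are assumed values chosen for the example, and the `Level` and `active_level` names are hypothetical rather than part of any product.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Level:
    name: str
    channel: str
    target: str
    wait_minutes: int  # time allowed at this level before escalating


# Wait times are assumptions for illustration; each site would set its own.
LEVELS: List[Level] = [
    Level("1st", "visual", "central monitor in the data center", 5),
    Level("2nd", "visual+audio", "central monitor and speakers", 5),
    Level("3rd", "email", "administration team", 15),
    Level("4th", "sms", "IT manager's cell phone", 0),  # final level
]


def active_level(minutes_unanswered: int, levels: List[Level] = LEVELS) -> Level:
    """Return the level in force after the message has gone unanswered."""
    elapsed = 0
    for level in levels[:-1]:
        elapsed += level.wait_minutes
        if minutes_unanswered < elapsed:
            return level
    # Every earlier time limit has expired, so the last level applies.
    return levels[-1]


# A message ignored for 12 minutes has reached the email level.
print(active_level(12).channel)  # prints "email"
```

Keeping the levels in a plain list means a time limit or a recipient can be changed without touching the escalation logic itself.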
Another benefit of customizing escalation procedures is that they provide an opportunity to build in personal accountability for teams or individuals. For example, the IT Manager who is woken at 3 a.m. by a critical message on their cell phone will want to know why the on-call administration team did not respond at the third level of alert before it was escalated to the fourth.
Timing Is Everything
The success of any message notification strategy will be determined by its ability to eliminate threats to the system in good time. As mentioned earlier, this begins with real-time identification but quickly extends to the time it takes to resolve an issue. Knowing about a looping job in the system is not sufficient information to resolve it. Operators need to know where that looping job resides in order to provide an immediate response without any lengthy investigation, during which time the looping job could otherwise be consuming vast amounts of system resources.
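One way to picture an alert that carries enough context for an immediate response is a record that names the system, the subsystem, and the job itself. The sketch below is an assumption-laden illustration: the field names, the sample values, and the `JobAlert` class are hypothetical, and the job is written in the number/user/name form commonly used for qualified job names on the System i.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobAlert:
    system: str       # which system in the network raised the alert
    subsystem: str    # where the job is running
    job_number: str
    job_user: str
    job_name: str
    cpu_percent: float

    def qualified_job(self) -> str:
        """Job identity in number/user/name form."""
        return f"{self.job_number}/{self.job_user}/{self.job_name}"

    def summary(self) -> str:
        """One-line alert text an operator can act on directly."""
        return (
            f"Possible looping job {self.qualified_job()} "
            f"in subsystem {self.subsystem} on {self.system} "
            f"({self.cpu_percent:.0f}% CPU)"
        )


# Sample values are invented for illustration.
alert = JobAlert("SYSA", "QBATCH", "123456", "OPER1", "NIGHTLY", 97.4)
print(alert.summary())
```

With the location included in the alert text, the operator can go straight to the job instead of first searching for it.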
Similarly, any strategy must be flexible enough to accommodate the growth, change, and development of a systems environment over time. Staff change, resources change, and processes change, so the ability to adjust alert rules, authorization profiles, and escalation procedures is important not only to maintain the proactive approach in the long term, but also to ensure that managing it does not itself become a labor-intensive task.