Short (21 second) Outages Detected by InterMapper
InterMapper may report that a device is down for 21 seconds, 51 seconds, or some other short interval. This occurs because the device is unresponsive while InterMapper is attempting to test it, but responds immediately the next time InterMapper tries.
In general, InterMapper will not consider a device to be down until it has exhausted its attempts to ping/query/connect to it. The following discussion is relevant for "packet-based devices" using pings, SNMP queries, etc. TCP-based probes, such as HTTP, IMAP, SMTP, etc., use the underlying operating system's built-in TCP connection retry mechanism, which differs from the following.
By default, InterMapper will try the packet-based probes three times, spaced three seconds apart. Thus, it takes nine seconds (3 tries, spaced three seconds apart) to determine that the device is down.
When the next poll interval starts (usually 30 seconds after the start of the previous poll), InterMapper tries again. If the device responds right away, then the outage will appear to have lasted 21 seconds (the 30-second poll interval, minus the nine seconds spent determining that the device was down). By the same arithmetic, a 60-second poll interval would produce an apparent 51-second outage. For full details about InterMapper's polling procedure, see the User Guide.
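The arithmetic above can be sketched in a few lines of Python. The function name and its parameters are illustrative only and are not part of InterMapper; the defaults are simply the values quoted in this article.

```python
def apparent_outage_seconds(poll_interval=30, tries=3, spacing=3):
    """Apparent outage length when a device misses one poll and answers the next.

    Illustrative helper, not an InterMapper API. Assumes the device is
    declared down after tries * spacing seconds and that it responds
    immediately at the start of the next poll.
    """
    # Time spent retrying before the device is declared down
    detection_time = tries * spacing
    # The outage is only "seen" from that point until the next poll begins
    return poll_interval - detection_time


print(apparent_outage_seconds())    # 30-second poll interval
print(apparent_outage_seconds(60))  # 60-second poll interval
```

With the defaults this prints 21, and with a 60-second poll interval it prints 51, which matches the short outage lengths typically reported.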
What causes this?
We have found a few major causes for this:
* The device is unresponsive for a moment. If the device doesn't respond within the timeout (perhaps because it or an intervening resource was busy), then InterMapper will decide the device was down, even though it had actually been operational.
* High packet loss. If the packet loss on the route between InterMapper and the device is high (generally greater than 10%), then it becomes increasingly likely that all of InterMapper's queries in a single poll will be dropped. This can be a symptom of a problem on the network.
* InterMapper versions prior to 4.5.1 might receive ping response packets, but fail to process them in a timely manner. This could incorrectly raise packet loss statistics. We strongly recommend you upgrade to a current version if you're using InterMapper 4.5 or older.
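To get a feel for how packet loss turns into short reported outages, here is a rough estimate in Python. It is a sketch under stated assumptions, not a description of InterMapper's internals: the function name is hypothetical, packet losses are assumed to be independent (real loss is often bursty), and the same loss rate is assumed in each direction.

```python
def all_tries_lost_probability(loss_rate, tries=3, round_trip=True):
    """Chance that every try in one poll fails, assuming independent loss.

    Hypothetical helper for illustration; loss_rate is a fraction (0.10 = 10%).
    """
    if not 0.0 <= loss_rate <= 1.0:
        raise ValueError("loss_rate must be between 0 and 1")
    # A try succeeds only if the request survives and, for a round trip,
    # the reply survives as well.
    delivered = (1.0 - loss_rate) ** (2 if round_trip else 1)
    # The poll fails only if every one of the tries fails.
    return (1.0 - delivered) ** tries


for loss in (0.01, 0.10, 0.30):
    p = all_tries_lost_probability(loss)
    print(f"{loss:.0%} loss: {p:.4%} of polls would fail")
```

Under these assumptions, 10% loss in each direction makes about 0.7% of polls fail. At one poll every 30 seconds (2,880 polls per day), that works out to roughly 20 spurious short outages per day for a single device.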
What can you do?
If the packet loss is low, you can try extending InterMapper's timeout for the device. To do this, select the device and choose Set Info... -> Set Timeout... to set a longer period. This allows more time for the device to respond.
High packet loss is always a symptom of bad network performance. If InterMapper's ping traffic experiences high loss, so will your real data packets. You should look for the cause of the packet loss, because it will lead to slow performance for all the people using that portion of the network.