Q. We use the largest database size, it is constantly full, and process restarts take a significant amount of time. What is ns2flows loading and checking when it reloads that takes so long?
A. Quick answer: InterMapper Flows 1.1 makes this much faster.
Longer answer: The core pieces of the ns2flows service were designed specifically to provide 100% accurate flow recall. This is important for security and forensic work, and it provides the additional benefit of being able to filter by arbitrary IP addresses, CIDR blocks, and ports/protocols after the fact.
For example, one of the initial deployments of the code was in a security operations center, where analysts would notice interesting behavior on the network and want to scan immediately for similar behavior in the recent past: "Have we seen traffic to this external host before?" or "What other locations in my network are communicating using this service?"
Being able to browse quickly through flow data makes these analysts far more effective than forcing them back to a full packet capture, which may not extend as far into the past. The requirement, however, is that there are no gaps in the data after a restart. For instance, if the memory cache claims 3-hour fidelity but missed 20% of the flows that actually happened (and are stored on disk), this forensic capability is lost.
To maintain this 100% accuracy, a restart quickly scans the database to determine the time span it covers (this takes a millisecond), and then figures out how much of the database must be read in full. For instance, if the database is 100 GB and spans 100 hours, and the memory cache is set to 5 GB, then it stands to reason that the last 5 hours of database data should be read into memory.
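As a rough illustration of that arithmetic (a sketch only; the function name and units are ours, not the actual ns2flows code):

```python
# Hypothetical sketch of the reload-window computation described above.
def hours_to_reload(db_gb: float, db_span_hours: float, cache_gb: float) -> float:
    """Return roughly how many recent hours of data the memory cache holds."""
    gb_per_hour = db_gb / db_span_hours   # average write rate of the database
    return cache_gb / gb_per_hour         # hours of data that fit in the cache

# The example from the text: a 100 GB database spanning 100 hours
# with a 5 GB cache should reload roughly the last 5 hours.
print(hours_to_reload(db_gb=100, db_span_hours=100, cache_gb=5))  # -> 5.0
```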
However, each stored session has a start timestamp and a stop timestamp, so longer sessions can span multiple database segments. If a long session reaches back into an earlier database segment, that segment must also be scanned to determine the extent of the overlap. A 100 GB database has about 50 segments, so it may be necessary to scan 6 or even 8 GB of database data to fill the 5 GB memory cache while maintaining 100% fidelity.
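Below is a minimal sketch of the segment scan this describes. The Segment and Session structures, their field names, and the newest-first walk are assumptions for illustration, not the actual ns2flows on-disk format:

```python
# Illustrative sketch only; not the real ns2flows data structures.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Session:
    start: float       # session start timestamp
    stop: float        # session stop timestamp

@dataclass
class Segment:
    sessions: List[Session] = field(default_factory=list)

    def newest_stop(self) -> float:
        # Latest stop timestamp of any session stored in this segment;
        # a long session can push this well past the segment's own era.
        return max((s.stop for s in self.sessions), default=float("-inf"))

def reload_cache(segments: List[Segment], window_start: float) -> List[Session]:
    """Walk segments newest-first and keep every session that overlaps
    the reload window.  Scanning continues into older segments as long
    as they could still hold a long session ending inside the window,
    which is why more bytes are read than the cache will finally hold."""
    loaded: List[Session] = []
    for seg in reversed(segments):           # newest segment first
        if seg.newest_stop() < window_start:
            break                            # nothing older can overlap the window
        for sess in seg.sessions:
            if sess.stop >= window_start:    # session overlaps the window
                loaded.append(sess)
    return loaded
```

In the example above, this backward walk past the window boundary is what turns a 5 GB cache fill into a 6 to 8 GB read.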
Version 1.0 of the ns2flows service would also put these records in chronological order in memory, which took substantial time. In version 1.1 this requirement was relaxed, at a slight memory-capacity penalty, allowing much faster start-up times.
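Purely to illustrate the tradeoff (this is our reading of the description above, not the actual 1.0/1.1 internals):

```python
# Contrast of the two start-up strategies, as we understand them.
def load_v1_0(loaded):
    # 1.0: pay an O(n log n) sort at start-up so the cache is
    # fully chronological before serving any queries.
    loaded.sort(key=lambda s: s.start)
    return loaded

def load_v1_1(loaded):
    # 1.1: keep records in whatever order the segment scan produced.
    # Start-up is much faster; the assumed cost is a slight capacity
    # penalty, since the cache boundary is no longer an exact cut
    # through the oldest records.
    return loaded
```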