I've written in this blog about various reasons for using network automation, but it is time to put them together.
7. Performance and SLAs
The first thing that network management does is performance monitoring. It's conceptually easy but surprisingly challenging, primarily due to differences between vendors, changes in standards (e.g., 32-bit vs 64-bit counters and different SNMP versions), and bugs in vendor implementations. Once those hurdles are cleared, the thousands of interfaces need to be sorted by a variety of criteria (e.g., percent utilization, error rates, broadcasts). Alerting thresholds on performance data then need to be defined, and now you have a system that alerts you when utilization is high or errors suddenly appear on a link. Doing this task without automation is impossible in any network consisting of more than about 50 routers and switches.
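As a rough illustration of the counter and threshold handling described above, here is a minimal Python sketch. It assumes you already have two samples of an interface octet counter taken a known number of seconds apart; the function names, the 80% threshold, and the sample layout are illustrative assumptions, not part of any particular NMS.

```python
def counter_delta(prev: int, curr: int, bits: int = 64) -> int:
    """Difference between two counter samples, tolerating a single wrap.

    A 32-bit octet counter wraps at 2**32, so a naive curr - prev can go
    negative on a busy link; modular arithmetic recovers the true delta
    as long as the counter wrapped at most once between samples.
    """
    return (curr - prev) % (1 << bits)


def utilization_percent(prev_octets, curr_octets, interval_s, speed_bps, bits=64):
    """Average utilization over the polling interval, as a percentage."""
    delta_bits = counter_delta(prev_octets, curr_octets, bits) * 8
    return 100.0 * delta_bits / (interval_s * speed_bps)


def busiest_over_threshold(utilization_by_interface, threshold_pct=80.0):
    """Interfaces at or above the threshold, highest utilization first."""
    flagged = [
        (name, pct)
        for name, pct in utilization_by_interface.items()
        if pct >= threshold_pct
    ]
    return sorted(flagged, key=lambda item: item[1], reverse=True)
```

The single-wrap assumption is why short polling intervals or 64-bit counters matter on fast links: if a 32-bit counter can wrap more than once between polls, no arithmetic on two samples can recover the real traffic volume.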
SLAs are another area where automation is required. How else would you monitor the delay, jitter, and packet loss across a network (to pick three common SLA factors)? An automated system is required for performing SLA tests, processing the results, and presenting the reports.
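To make the "processing the results" step concrete, here is a small Python sketch that reduces a series of probe results to the three SLA factors mentioned above. The input format (a list of round-trip times in milliseconds, with None for a lost probe) is an assumption for illustration, and jitter is computed here as the mean absolute difference between consecutive received samples, which is a simplification and not the smoothed estimator some tools use.

```python
from statistics import mean


def sla_summary(rtts_ms):
    """Summarize probe results into delay, jitter, and packet loss.

    rtts_ms: list of round-trip times in ms; None marks a lost probe.
    """
    received = [r for r in rtts_ms if r is not None]
    sent = len(rtts_ms)
    loss_pct = 100.0 * (sent - len(received)) / sent if sent else 0.0
    delay_ms = mean(received) if received else None
    # Jitter: mean absolute change between consecutive received probes
    # (lost probes are skipped, so neighbors may straddle a gap).
    diffs = [abs(b - a) for a, b in zip(received, received[1:])]
    jitter_ms = mean(diffs) if diffs else 0.0
    return {"delay_ms": delay_ms, "jitter_ms": jitter_ms, "loss_pct": loss_pct}
```

A report generator would run this per path and per interval, then compare each value against the contracted SLA target.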
6. Scaling of processes
There are many processes in managing networks that should be performed regularly to have a smoothly running network with minimum downtime. But because these processes take a lot of time to perform manually, they are seldom done. With network automation, these processes can be performed regularly, reducing the risk of an unexpected network failure. Of course, the results of the processes should be sent to a network administrator, particularly any alerts or exceptions.
These processes include:
Reduce operating costs by tracking the inventory of your network devices and paying maintenance only on those devices that are in your network. Know which devices you want to upgrade next in a network refresh by tracking the age of all your devices and the OS loaded on them.
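Both inventory checks reduce to simple comparisons once the data is collected. The Python sketch below assumes two hypothetical inputs: the serial numbers listed on a maintenance contract and the serial numbers actually discovered in the network, plus a list of device records with an install date. The field names and the five-year cutoff are illustrative assumptions.

```python
from datetime import date


def unneeded_maintenance(contract_serials, discovered_serials):
    """Serials you pay maintenance on that were not found in the network."""
    return sorted(set(contract_serials) - set(discovered_serials))


def refresh_candidates(devices, today, max_age_years=5):
    """Devices older than the cutoff, oldest first.

    devices: list of dicts with 'name', 'installed' (date), 'os_version'.
    Age is approximated as 365 days per year.
    """
    cutoff_days = max_age_years * 365
    old = [d for d in devices if (today - d["installed"]).days > cutoff_days]
    return sorted(old, key=lambda d: d["installed"])
```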
When troubleshooting, an accurate network topology drawing is valuable. Keeping network drawings up to date is a tedious and often neglected task, so when a problem occurs, I typically see people sketching the network topology before they can proceed with the problem diagnosis. The NMS collects connectivity information, which can be displayed within the tool or exported to drawing tools (Microsoft has published the Visio XML format).
Topology information is also very valuable for network planning and preventing outages. It allows you to answer questions about uplink oversubscription ratios, verify redundant connections (or the lack thereof), and identify strange topologies that tend to appear in most networks (and that can cause strange behavior or failure modes).
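Two of these questions are easy to express in code once the topology is available as data. The Python sketch below assumes the NMS can export port speeds per switch and a list of device-to-device links; the function names and input shapes are hypothetical. Redundancy is approximated here as "connected to at least two distinct neighbors", which is a simplification of a real redundancy check.

```python
from collections import defaultdict


def oversubscription_ratio(access_speeds_mbps, uplink_speeds_mbps):
    """Total access bandwidth divided by total uplink bandwidth."""
    total_uplink = sum(uplink_speeds_mbps)
    if total_uplink == 0:
        raise ValueError("no uplink capacity")
    return sum(access_speeds_mbps) / total_uplink


def single_homed(links, devices_requiring_redundancy):
    """Devices that should be redundantly connected but have < 2 neighbors.

    links: iterable of (device_a, device_b) pairs from topology discovery.
    """
    neighbors = defaultdict(set)
    for a, b in links:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return sorted(
        d for d in devices_requiring_redundancy if len(neighbors[d]) < 2
    )
```

For example, a 48-port gigabit access switch with two 10-gigabit uplinks has a ratio of 48000 / 20000 = 2.4.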
3. Network Analysis
Network analysis is the process of taking all the collected data about a network and performing analysis on that data to identify current and potential problems. The simplest analysis is identifying interfaces running at high utilization. More complex analysis incorporates data from multiple devices, such as determining that a VRRP group only contains one router (the operational data from all routers shows that there is no peer router). The most complex analysis uses multiple sources of data, such as from both configuration files and operational data, exemplified by a duplex mismatch where an interface configuration shows a setting of 'auto', the interface's state is 'half', and operational data shows late collisions.
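The two multi-source examples above can be sketched as simple rules over collected data. In this Python sketch the record layouts (dictionaries with configured duplex, operational duplex, and a late-collision counter, and a list of group-to-router memberships) are assumptions for illustration; a real system would populate them from configuration files and polled operational data.

```python
from collections import defaultdict


def duplex_mismatch_suspects(interfaces):
    """Interfaces matching the mismatch pattern described in the text.

    Combines configuration ('auto'), state ('half'), and operational
    counters (late collisions > 0); any one signal alone is not enough.
    """
    return [
        i["name"]
        for i in interfaces
        if i["configured_duplex"] == "auto"
        and i["oper_duplex"] == "half"
        and i["late_collisions"] > 0
    ]


def single_router_vrrp_groups(memberships):
    """VRRP groups in which only one router was seen across all devices.

    memberships: iterable of (group_key, router_name) pairs gathered
    from every router's operational data.
    """
    routers = defaultdict(set)
    for group, router in memberships:
        routers[group].add(router)
    return sorted(g for g, members in routers.items() if len(members) == 1)
```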
Other network analysis incorporates data sources like events (syslog or SNMP traps). Most network management systems collect the data but then rely on the network engineer to perform the analysis. Because the network engineer is already busy, this limits what he or she can do, so it often defaults to looking at alerts generated by the interface utilization thresholds. Automating the analysis tasks allows easy identification of many problems through checks that network engineers know should be done but never have the time to perform.
2. Correlation of the above items
The next step in automation is to correlate several of the above items. A good example is to use the topology information to perform higher frequency interface performance polling on any interface where the neighboring device is another infrastructure device. Edge ports can be polled at a much lower frequency.
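A minimal Python sketch of that polling decision follows. It assumes the topology data gives, for each interface, the name of the neighboring device (or None when nothing was discovered), along with a set of known infrastructure devices. The 60-second and 900-second intervals are placeholder values, not recommendations.

```python
def polling_plan(
    interface_neighbors,
    infrastructure_devices,
    infra_interval_s=60,
    edge_interval_s=900,
):
    """Choose a polling interval per interface from topology data.

    interface_neighbors: dict mapping interface name to neighbor device
    name, or None if no neighbor was discovered (treated as an edge port).
    """
    plan = {}
    for interface, neighbor in interface_neighbors.items():
        is_infra_link = neighbor in infrastructure_devices
        plan[interface] = infra_interval_s if is_infra_link else edge_interval_s
    return plan
```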
Another example is using the topology information to determine whether a subnet has been allocated multiple times. Similarly, topology can tell you whether two subnets that overlap but have different masks are on the same segment (likely a typo in the configuration) or are two different subnets in different parts of the network.
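The subnet checks can be sketched with Python's standard ipaddress module. The input here is an assumed list of (segment, interface address) pairs, where the segment identifier is whatever the topology data uses to say "these interfaces share a wire"; the classification labels are illustrative.

```python
import ipaddress
from itertools import combinations


def subnet_conflicts(assignments):
    """Classify overlapping subnets using segment (topology) information.

    assignments: list of (segment_id, "address/prefix") pairs.
    Returns a list of (kind, network_a, network_b) tuples.
    """
    # strict=False lets an interface address like 10.1.2.1/24 be parsed
    # as its containing network, 10.1.2.0/24.
    parsed = [
        (seg, ipaddress.ip_network(cidr, strict=False))
        for seg, cidr in assignments
    ]
    findings = []
    for (seg_a, net_a), (seg_b, net_b) in combinations(parsed, 2):
        if not net_a.overlaps(net_b):
            continue
        if seg_a == seg_b:
            if net_a == net_b:
                continue  # normal: two interfaces on the same subnet
            kind = "mask mismatch on one segment"
        elif net_a == net_b:
            kind = "duplicate allocation"
        else:
            kind = "overlap across segments"
        findings.append((kind, str(net_a), str(net_b)))
    return findings
```

Comparing every pair is quadratic in the number of interfaces, which is fine for a sketch; a larger network would want the networks sorted or indexed first.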
1. Human error
The biggest and most important reason for network automation is human error. It accounts for at least 40% of network failures (some estimates are as high as 80%). It has been proven that automation helps reduce those errors. Updating the configurations of hundreds of routers and switches is not something that should be done manually. Automated mechanisms to verify a proposed change, together with a change control process in which the change is validated by other network engineers, are important for reducing or eliminating silly mistakes.
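As one possible shape for such a mechanism, the Python sketch below produces a reviewable diff of a proposed configuration and gates deployment on automated checks plus approval from someone other than the author. The function names and the one-reviewer rule are assumptions for illustration, not a description of any specific change control product.

```python
import difflib


def config_diff(running, proposed):
    """Unified diff of two configurations, one output line per list item."""
    return list(
        difflib.unified_diff(
            running.splitlines(),
            proposed.splitlines(),
            fromfile="running",
            tofile="proposed",
            lineterm="",
        )
    )


def ready_to_deploy(author, approvers, checks_passed, min_reviewers=1):
    """True only if checks passed and enough *other* engineers approved."""
    reviewers = set(approvers) - {author}  # self-approval does not count
    return bool(checks_passed) and len(reviewers) >= min_reviewers
```

The diff gives reviewers exactly what will change on the device, which is where a stray "no" or a wrong mask is most likely to be caught.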
That's the list. Networks are big. Networks are complex and are increasing in complexity. Automation is the only hope we have of managing the size and complexity while providing high availability.
Terry Slattery, CCIE #1026, is a senior network engineer with decades of experience in the internetworking industry. Prior to joining Chesapeake NetCraftsmen as a full time consultant, Terry was the founder and CTO of Netcordia, and inventor of NetMRI, a suite of network management products. Terry started Netcordia as a consulting company in 2000 and transitioned to a network management product company in 2003. During the consulting days, he used his network design and implementation skills to lead a team in the design and implementation of a high availability network at a brokerage clearing house. Terry is the former President and founder of Chesapeake Computer Consultants, Inc., a networking and computer systems training and consulting company. He co-invented and patented the vLab(tm) internet-based remote lab system. He is co-author of the McGraw Hill text Advanced IP Routing in Cisco Networks. Terry led the team that developed the current Cisco IOS user interface under contract to Cisco Systems. Terry is experienced in the design and installation of large TCP/IP based networks and is a successful network protocol instructor. He is the second Cisco Certified Internetworking Expert (CCIE) #1026 and the first outside of Cisco. He enjoys membership on the Vanderbilt University Engineering School’s Industrial Advisory Board and the IEEE.