
Self-Healing Network Automation

July 15, 2021



In January and May of 2018, Arctiq ran "self-healing" network automation events with live demonstrations of real use cases using Ansible for configuration management. These events were in collaboration with our ecosystem partners, F5 and Red Hat. The demonstrations highlighted multi-vendor networking with Cisco, Juniper and F5 devices fully interconnected with dynamic routing, monitoring (Nagios) and ChatOps tools (StackStorm/Slack). The use cases demonstrated how full stack automation can be achieved for configuration of the network and application stack in a single routine using Ansible.

The demo environment was built using GNS3, an open source tool for network device emulation and run on cloud-based bare-metal hosts provided by

Arctiq leveraged Ansible to automate network deployments using dynamically generated configuration templates and playbooks. The event presentation and other configuration examples are available in Arctiq’s GitHub repository.
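To give a feel for what "dynamically generated configuration templates" means in practice, here is a minimal sketch. It is not the code from the demos: Ansible renders templates with its own templating engine, and Python's standard-library `string.Template` is used here only as a stand-in. The template text, the variable names and the `router1` data are all hypothetical.

```python
from string import Template

# Hypothetical interface stanza. string.Template is a stdlib stand-in for
# the templating that Ansible performs; it is not what the demos used.
INTERFACE_TEMPLATE = Template(
    "interface $name\n"
    " description $description\n"
    " ip address $address $netmask\n"
    " no shutdown\n"
)


def render_interfaces(interfaces):
    """Render one config stanza per interface dict and join them with '!'."""
    return "!\n".join(INTERFACE_TEMPLATE.substitute(i) for i in interfaces)


# Hypothetical per-device variables, similar in spirit to Ansible host vars.
router1 = [
    {"name": "GigabitEthernet0/0", "description": "uplink",
     "address": "10.0.0.1", "netmask": "255.255.255.252"},
    {"name": "GigabitEthernet0/1", "description": "lan",
     "address": "192.168.1.1", "netmask": "255.255.255.0"},
]

if __name__ == "__main__":
    print(render_interfaces(router1))
```

The point of the pattern is that the template is written once as a standard, while each device only contributes its own variables.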

The Self-Healing Network

The challenge in automating network fixes is triage: determining which devices have been impacted and what the actual issue is, not to mention where to invoke the repairs that correct configurations and restore stability. In the use case demos, we highlight the validity of reconfiguring the entire lab environment to the "known working" state. This eliminates the need to create many different automation routines for specific fixes (interface down, manual change with error, malicious code) and leverages the idempotent nature of Ansible to re-configure only the devices and parameters where the configuration differs from the defined end state. The result is truly declarative playbooks with idempotency. Monitoring tools still notify administrators of error messages or failure details associated with the issues, while the environment "self-heals" without manual intervention, minimizing downtime.
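The idea of "re-apply the known working state and only touch what differs" can be illustrated with a small sketch. This is a simplified model, not how Ansible modules are implemented: device state is reduced to plain dictionaries, and the device names, parameters and helper functions are hypothetical.

```python
_MISSING = object()  # sentinel so an absent parameter never equals a desired value


def plan_changes(desired, actual):
    """Return {device: {param: desired_value}} for parameters that differ."""
    plan = {}
    for device, params in desired.items():
        current = actual.get(device, {})
        diff = {k: v for k, v in params.items()
                if current.get(k, _MISSING) != v}
        if diff:
            plan[device] = diff
    return plan


def apply_changes(actual, plan):
    """Apply the plan to a copy of the actual state and return the copy."""
    result = {device: dict(params) for device, params in actual.items()}
    for device, diff in plan.items():
        result.setdefault(device, {}).update(diff)
    return result


# Hypothetical "known working" state and a drifted lab with one interface down.
desired = {
    "router1": {"Gi0/0": "up", "ospf_area": "0"},
    "f5-1": {"vip_web": "enabled"},
}
actual = {
    "router1": {"Gi0/0": "down", "ospf_area": "0"},
    "f5-1": {"vip_web": "enabled"},
}

if __name__ == "__main__":
    plan = plan_changes(desired, actual)
    print(plan)                          # only the drifted parameter is listed
    healed = apply_changes(actual, plan)
    print(plan_changes(desired, healed)) # a second run has nothing to change
```

Running the same routine a second time produces an empty plan, which is the property that makes a single "configure everything" workflow safe to trigger for any kind of failure.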

Self-Healing Workflow

The self-healing nature of this approach demonstrates interoperability: automation and shared intelligence are possible when open source and enterprise solutions are used together. The demo environment used the following automation workflow:

[Figure: Self-Healing Workflow]

In Arctiq’s typical “technology-first” style, we explored real-world use cases that automate network and application configurations with Ansible, and that monitor and act on changes through event-driven integration with key open source tooling, to achieve:

  • Templated, common config standards for devices
  • Automation across the entire network and application stack
  • Event-driven, intelligent automation
  • Operational efficiency
  • Real-time notifications of environment status
  • Active documentation with version control

Use Case Demo 1:

Lab Reset and Configuration with Ansible Tower

In this video we run Ansible Tower workflow jobs to first reset the lab environment and then to fully configure it. The first workflow job playbooks will "reset" the network devices and application configurations back to the standard baseline templates. Once the lab has been reset, the second workflow job playbooks will re-configure the entire lab to establish full network connectivity and application standup.

{% include video name="oTDgPpauKQE" %}

Use Case Demo 2:

Git Webhook Integration

This video highlights the use of a handy webhook that fires from GitHub into Ansible Tower to update the project upon a code commit to the Git project repository. This webhook ensures that the inventory and project files are always up-to-date by running an SCM update job in Tower and eliminates manual syncing or missing items when executing a job or workflow.

{% include video name="Abd2BTyXiwU" %}
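The receiving side of a webhook like this can be sketched in a few lines. This is an illustration under assumptions, not the integration shown in the video: it assumes the GitHub webhook is configured with a shared secret (so the request carries an `X-Hub-Signature-256` header) and that the project is refreshed through the Tower REST API's project update endpoint. The function names, the Tower base URL and the project ID are hypothetical, and the sketch stops short of sending the HTTP request.

```python
import hashlib
import hmac


def signature_is_valid(secret: bytes, body: bytes, header: str) -> bool:
    """Check a GitHub X-Hub-Signature-256 header against the raw request body."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through comparison timing
    return hmac.compare_digest(expected.encode(), header.encode())


def project_update_url(tower_base: str, project_id: int) -> str:
    """Build the URL that a POST would hit to start an SCM update in Tower."""
    return f"{tower_base.rstrip('/')}/api/v2/projects/{project_id}/update/"


def handle_push(secret, body, header, tower_base, project_id):
    """Return the URL to POST to, or None when the signature check fails."""
    if not signature_is_valid(secret, body, header):
        return None
    return project_update_url(tower_base, project_id)


if __name__ == "__main__":
    secret = b"example-secret"                     # hypothetical shared secret
    body = b'{"ref": "refs/heads/master"}'         # hypothetical push payload
    header = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    print(handle_push(secret, body, header, "https://tower.example.com", 7))
```

Rejecting unsigned or mis-signed requests matters here because the endpoint ultimately decides which code Tower will run.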

Use Case Demo 3:

Network Failure / F5 VIP Failure Self-Healing

In this use case, we simulate a failure of network interfaces to break the connectivity between the F5 Application Delivery Controller and the back-end web servers. The ChatOps integration with Nagios-StackStorm-Slack highlights event-driven automation and self-healing by identifying the failed VIP and triggering the workflow job from Ansible Tower to configure the entire lab back to its working state.

{% include video name="oRQ9ICIpMdU" %}
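The decision step in an event-driven chain like this one can be sketched as follows. In the demos this logic lives in the Nagios-StackStorm-Slack integration; the Python below is only a stand-in that shows the shape of the rule. The `Alert` fields mirror Nagios service states and state types, while the service name, the workflow template ID and the helper functions are hypothetical. The launch URL assumes the Tower REST API's workflow job template launch endpoint.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alert:
    host: str
    service: str
    state: str       # Nagios service state: OK, WARNING, CRITICAL, UNKNOWN
    state_type: str  # Nagios state type: SOFT (still retrying) or HARD (confirmed)


# Hypothetical mapping from a monitored service to a Tower workflow template ID.
REMEDIATION_WORKFLOWS = {"f5-vip-web": 42}


def choose_workflow(alert: Alert) -> Optional[int]:
    """Return a workflow template ID only for confirmed critical failures."""
    if alert.state != "CRITICAL" or alert.state_type != "HARD":
        return None
    return REMEDIATION_WORKFLOWS.get(alert.service)


def launch_url(tower_base: str, workflow_id: int) -> str:
    """Build the URL that a POST would hit to launch the workflow in Tower."""
    base = tower_base.rstrip("/")
    return f"{base}/api/v2/workflow_job_templates/{workflow_id}/launch/"


def chat_message(alert: Alert, workflow_id: int) -> str:
    """Compose the notification text that would be posted to the chat channel."""
    return (f"{alert.service} on {alert.host} is {alert.state}; "
            f"launching workflow {workflow_id}")


if __name__ == "__main__":
    alert = Alert("f5-1", "f5-vip-web", "CRITICAL", "HARD")
    workflow_id = choose_workflow(alert)
    if workflow_id is not None:
        print(chat_message(alert, workflow_id))
        print(launch_url("https://tower.example.com", workflow_id))
```

Acting only on HARD states is a design choice in this sketch: it keeps a single missed check from resetting the whole lab.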

Use Case Demo 4:

WebApp Integrity Enforcement Self-Healing

This use case profiles the defacement of a production website by a disgruntled employee. Changes are made to the back-end web servers hosting the page data by adding malicious code. The active monitoring notices changes based on the defined criteria of what should be in the web server configuration and triggers the workflow job from Ansible Tower to return the page data back to the approved production state.

{% include video name="Kf_nU-8kTwI" %}
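One simple way to express "defined criteria of what should be in the web server configuration" is a content digest per page. The sketch below assumes that approach; the demo's actual monitoring checks may work differently. The page names, page contents and function names are hypothetical.

```python
import hashlib


def sha256_hex(content: bytes) -> str:
    """Return the SHA-256 digest of the content as a hex string."""
    return hashlib.sha256(content).hexdigest()


def find_tampered(approved, served):
    """Return the sorted paths whose served content is missing or altered.

    approved maps path -> expected digest; served maps path -> bytes.
    """
    tampered = []
    for path, expected in approved.items():
        content = served.get(path)
        if content is None or sha256_hex(content) != expected:
            tampered.append(path)
    return sorted(tampered)


# Hypothetical approved production pages and their recorded digests.
approved_pages = {
    "index.html": b"<h1>Welcome</h1>",
    "about.html": b"<h1>About</h1>",
}
approved_digests = {path: sha256_hex(c) for path, c in approved_pages.items()}

# The same site after someone appends a script to the front page.
defaced_pages = {
    **approved_pages,
    "index.html": b"<h1>Welcome</h1><script>evil()</script>",
}

if __name__ == "__main__":
    print(find_tampered(approved_digests, approved_pages))
    print(find_tampered(approved_digests, defaced_pages))
```

A non-empty result is the event that would trigger the workflow job restoring the approved page data.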

Automation for Network Operations

Automation is not limited to DevOps. Network operators can also realize greater efficiency and consistency by automating repetitive tasks and building standardized configurations. It’s important to focus on the benefits automation provides to existing processes and manual tasks. The network is critical to any enterprise, as it provides the foundation that interconnects applications, clouds and users.

Are you a network operator interested in automating repeatable tasks or eliminating errors?


F5 has taken a proactive approach to bridging the gap between network operations professionals and DevOps teams with free, self-paced online training called Super-NetOps. Super-NetOps uses interactive labs to teach administrators how to deploy standard configuration elements through automation, reducing errors and time-to-service. Note: You’ll need to create an F5 Networks support account if you do not already have one.

If you are interested in a demo or network automation services for your environment, please reach out! We will be running more events in the future expanding on these demos and continuing to highlight new use cases.
