Verification of programmable networks

The underlying networks serving the online services and applications of a business must be reliable, secure, and highly available. Outages caused by defective network devices, misconfigurations, or security breaches can lead to expensive losses. Consequently, network verification is a crucial task in the network management and control process. In the traditional networking paradigm, there is a fixed set of functionalities that devices such as IP-based routers and Ethernet-based switches support. Specifically, in these devices, it is not easy to modify the control plane (the decision-making component) without the manufacturer’s intervention. The control plane and the data plane (the component responsible for forwarding the traffic based on the decisions made by the control plane) are highly coupled together.

Thus, network operators can only configure these devices which makes the orchestration and management of these devices a daunting task in highly customized environments. Software-defined Networking (SDN) was first proposed as a new paradigm that introduces programmability to the control plane. It separates the control plane from the data plane in the devices and places the control plane in a logically centralized software SDN controller. This decoupling lets the SDN controller, often running in commodity hardware, manage multiple data plane devices (i.e., SDN switches). However, in traditional SDNs the functionalities of devices were fixed, and not programmable. Later, the P4 language was introduced to bring programmability to the data plane as well. P4 is a domain specific language with multiple abstractions for packet processing. It enables the flexibility of data plane programming without sacrificing the line-rate performance. And as a result, allow network administrators to implement custom network policies  independent of the target devices. This opens up opportunities for novel uses of the network: e.g., in-band network telemetry, in-network caching, and in-network congestion control.

The increased capabilities of the programmability, however, are the portent of new challenges to the network correctness. In order to check the overall consistency of a network, we need to first verify that the control plane is making decisions as expected by the network operator, i.e. the control plane is consistent with the management plane. Second, we need to check that the data plane is forwarding the traffic as expected by the control plane. The conventional wisdom is that in SDN, the logically centralized control plane has an accurate representation of the actual data plane state, e.g. network devices consistently match and forward the traffic as specified by the rules generated by the control plane. However, just like any other program, SDN switches and P4 programs are prone to software bugs or dataplane faults. The faults can manifest in many ways e.g., resulting in abnormal network behavior different from the expected behavior or in other words, the control plane may be completely unaware of the abnormal data plane behavior caused by the faults on the dataplane.

Previous work in this area, however, lacks a holistic methodology to tackle this problem and thus, addresses only certain parts of the problem. Yet, the consistency of the overall system is only as good as its least consistent part. To that end, we need systems that perform consistency checking between different planes and do not assume that the control plane has an accurate representation of the data plane state. In classic SDNs, motivated by an analogy of network consistency checking with program testing, we propose to add active probe-based network state fuzzing to our consistency check repertoire. Hereby, our system, PAZZ, combines production traffic with active probes to
periodically test if the actual forwarding path and decision elements (on the data plane) correspond to the expected ones (on the control plane).

Our insight is that active traffic covers the inconsistency cases beyond the ones identified by passive traffic. PAZZ prototype was built and evaluated on topologies of varying scale and complexity. Our results show that PAZZ requires minimal network resources to detect persistent data plane faults through fuzzing and localize them quickly while outperforming baseline approaches. For P4-enabled SDNs, we present P4CONSIST, a system that takes advantage of P4 to verify the control-data plane inconsistency. The workflow of P4Consist reposes on four main modules: a control plane module, a data plane module, an input traffic generator, and an analyzer. P4CONSIST continuously generates active probe-based traffic as an input to the network. For a given packet (5-tuple flow) and a source-destination pair, the control plane module proactively gathers the topology and configuration information in the form of an expected report. On the other side, the data plane module traces the forwarding behavior of each switch on the path using INT (In-band Network Telemetry) to continuously elicit telemetry data in the form of an actual report.

Finally, the analyzer compares the two results to detect control-data plane inconsistencies using depth-first search (DFS) and symbolic execution. Experiments with our prototype show that P4CONSIST can verify the control-data plane consistency in the complex datacenter 4ary fat-tree (20 Switches) and multipath grid (4, 9, and 16 Switches)
topologies with 60k rules per switch within a minimum time of 4 minutes.

References