CNS Core: Medium: Distributed Runtime Dataplane Telemetry as an Adaptive Query Scheduling Problem: Algorithms and Applications

Funding source: NSF CNS-2212590. Period of performance: 10/01/2022 -- 09/30/2026.

Project Overview

The networks comprising the Internet must be monitored to ensure high-quality, reliable service to end users. Emerging network telemetry systems that rely on modern data plane technologies (e.g., programmable switches) offer a high degree of visibility into today's networks and promise to enable a new generation of network security and network performance applications (e.g., detecting new cyberattacks; supporting quality-of-experience requirements for video streaming). In these systems, telemetry tasks are defined as queries and compiled onto a switch. The switch then performs packet processing at line rate for only those packets that satisfy the given query. However, existing network telemetry systems are typically designed for static query and traffic workloads, do not scale with the number of queries or the traffic rate, and assume that tasks are simple (e.g., stateless, single switch). This project seeks to develop a class of next-generation network telemetry systems that address these challenges.

This project leverages the emergence of runtime programmable data plane devices to develop and experimentally evaluate a scalable telemetry system that can accommodate traffic and query dynamics and support adaptive telemetry applications over multiple switches across a network. These devices allow for time-division multiplexing of limited switch resources, which in turn facilitates fine-grained resource management to cope with dynamics and achieve scalability. The novel scientific contributions of this project include (i) a multi-objective approximation-based query scheduling scheme to manage resources on a single switch with multiple stages while controlling the tradeoff among accuracy, latency, and reporting load; (ii) an extension of the scheduling scheme to seamlessly support queries that require traffic visibility across multiple switches; and (iii) support for adaptive telemetry applications to facilitate their stateful and iterative execution and testing.

People

  • Lead PI: Reza Rejaie
  • Co-PIs: Ram Durairajan (Co-PI, UO), Walter Willinger (Senior Personnel, NIKSUN, Inc.)
  • Ph.D. Students: Chris Misa
  • B.S. Students: TBD

Publications

Outreach