CC* Integration-Large: Bringing Code to Data: A Collaborative Approach to Democratizing Internet Data Science

Funding source: NSF OAC-2126281. Period of performance: 10/01/2021 -- 09/30/2024.

Project Overview

Successful application of machine learning (ML) to networking problems depends on the availability of high-quality labeled data from real-world networks. Equally critical is the ability to share these datasets while respecting the data owners' privacy concerns. Unfortunately, short of sharing the data via today's commonly applied data-to-code paradigm, researchers lack a systematic framework for working with or benefiting from data collected and curated by third parties. Consequently, Internet Data Science as practiced today is ill-suited for applications such as (i) high-quality data labeling, (ii) rigorous evaluation of research artifacts such as learning models, and (iii) independent validation/reproducibility of reported research findings.

This collaborative project brings together researchers from the University of Oregon (UO), the University of California, Santa Barbara (UCSB), and NIKSUN, Inc. to investigate an innovative collaborative data-labeling and knowledge-sharing framework in three thrusts. First, the project will investigate a novel code-to-data approach that entails sharing programmatic representations of operators' domain knowledge to identify events of interest in the data. Second, the project will design and develop a new learning framework to enable the pursuit of Internet Data Science as a full-fledged collaborative effort. Third, the project will illustrate the capabilities of the proposed framework in the context of collaborative efforts between the two participating universities (UO and UCSB) and demonstrate its ability to scale to any number of participants.

The resulting framework will serve as a driving force for advancing collaborative efforts in the emerging area of Internet Data Science. In addition to identifying some of the fundamental changes to how ML ought to be used in networking, the research findings will benefit both industry and academia and will ensure that tomorrow's workforce has the proper training to fully exploit the application of ML for network-specific problems. Also, the outcomes will catalyze the development of a roadmap for the adoption of Internet Data Science efforts by operators and the deployment of ensuing research artifacts in real-world production networks.

People

  • Lead PI: Ram Durairajan
  • Co-PIs and Senior Personnel: Reza Rejaie (Co-PI, UO), Jon Miyake (Co-PI, UO), Arpit Gupta (Co-PI, UCSB), Walter Willinger (Senior Personnel, NIKSUN, Inc.)
  • Ph.D. Students: TBD
  • M.S. Students: Abduarraheem Elfandi, Mana Atarod
  • B.S. Student Alumni: Jared Knofczynski

Publications

  • Leveraging Prefix Structure to Detect Volumetric DDoS Attack Signatures with Programmable Switches
    Chris Misa, Ramakrishnan Durairajan, Arpit Gupta, Reza Rejaie and Walter Willinger
    In IEEE Symposium on Security and Privacy (S&P) (Oakland '24), San Francisco, CA, May 2024.
    [PAPER]     [CODE]    

  • Data-Fusion for Prefix-Level Inference: A DDoS Case Study
    Chris Misa, Ramakrishnan Durairajan, Reza Rejaie and Walter Willinger
    In Security Datasets for AI (SECDAI) workshop, virtual, April 2024.
    [PAPER]    

  • Network Management with Graph Machine Learning: Challenges and Solutions
    Yu Wang and Ramakrishnan Durairajan
    In Security Datasets for AI (SECDAI) workshop, virtual, April 2024.
    [PAPER]    

  • DynATOS+: A Network Telemetry System for Dynamic Traffic and Query Workloads
    Chris Misa, Ramakrishnan Durairajan, Reza Rejaie and Walter Willinger
    In IEEE/ACM Transactions on Networking, 2024.
    [PAPER]    

  • Special Issue on The ACM SIGMETRICS Workshop on Measurements for Self-Driving Networks
    Arpit Gupta, Ramakrishnan Durairajan and Walter Willinger
    In Proceedings of ACM SIGMETRICS Performance Evaluation Review, 2023.
    [PAPER]    

  • ARISE: A Multi-Task Weak Supervision Framework for Network Measurements
    Jared Knofczynski, Ramakrishnan Durairajan and Walter Willinger
    In IEEE JSAC Series on Machine Learning in Communications and Networks, July 2022.
    [PAPER]     [CODE]    

  • Dynamic Scheduling of Approximate Telemetry Queries
    Chris Misa, Walt O'Connor, Ramakrishnan Durairajan, Reza Rejaie and Walter Willinger
    In Proceedings of USENIX NSDI'22, Renton, WA, April 2022.
    [PAPER]     [PROJECT WEBSITE]     [CODE]    

  • Revisiting Network Telemetry in COIN: A Case for Runtime Programmability
    Chris Misa, Ramakrishnan Durairajan, Reza Rejaie and Walter Willinger
    In IEEE Network (In-Network Computing: Emerging Trends for the Edge-Cloud Continuum), September 2021.
    [PAPER]     [PROJECT WEBSITE]    

  • Challenges in Using ML for Networking Research: How to Label If You Must
    Yukhe Lavinia, Ramakrishnan Durairajan, Reza Rejaie and Walter Willinger
    In Proceedings of the Workshop on Network Meets AI & ML (NetAI'20),
    co-located with ACM SIGCOMM'20, New York, USA, August 2020.
    [PAPER]    

Outreach