CC* Integration-Large: Bringing Code to Data: A Collaborative Approach to Democratizing Internet Data Science

Funding source: NSF OAC-2126281. Period of performance: 10/01/2021 -- 09/30/2023.

Project Overview

Successful application of machine learning (ML) to networking problems depends on the availability of high-quality labeled data from real-world networks. Equally critical is the ability to share these datasets while respecting the data owners' privacy concerns. Unfortunately, short of sharing the data via today's commonly-applied data-to-code paradigm, researchers lack a systematic framework for working with or benefiting from data collected and curated by third parties. Consequently, Internet Data Science as practiced today is ill-suited for applications such as (i) high-quality data labeling, (ii) rigorous evaluation of research artifacts such as learning models, and (iii) independent validation/reproducibility of reported research findings.

This collaborative project brings together researchers from the University of Oregon (UO), the University of California, Santa Barbara (UCSB), and NIKSUN, Inc. to investigate an innovative framework for collaborative data labeling and knowledge sharing, organized in three thrusts. First, the project will investigate a novel code-to-data approach that entails sharing programmatic representations of operators' domain knowledge to identify events of interest in the data. Second, the project will design and develop a new learning framework to enable the pursuit of Internet Data Science as a full-fledged collaborative effort. Third, the project will illustrate the capabilities of the proposed framework through collaborative efforts between the two participating universities (UO and UCSB) and demonstrate its ability to scale to any number of participants.
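To make the code-to-data idea in the first thrust concrete, here is a minimal sketch of what a shared "programmatic representation of domain knowledge" might look like. It is an illustration under stated assumptions, not the project's actual framework: the names FlowRecord, label_port_scan, and run_at_data_owner, the flow-record fields, and the 100-port threshold are all hypothetical. The rule is written by one party and executed where the data lives; in this sketch only the resulting labels are returned to the rule's author.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set, Tuple

# NOTE: every name and threshold here is a hypothetical illustration,
# not part of the project's software.


@dataclass(frozen=True)
class FlowRecord:
    """A simplified flow summary held privately by the data owner."""
    src_ip: str
    dst_ip: str
    dst_port: int
    packets: int


# A shared rule maps local flows to the (source, destination) pairs it flags.
Rule = Callable[[List[FlowRecord]], Set[Tuple[str, str]]]


def label_port_scan(flows: List[FlowRecord],
                    min_ports: int = 100) -> Set[Tuple[str, str]]:
    """Flag (src, dst) pairs where src contacts at least min_ports
    distinct ports on dst. The threshold is an assumed value."""
    ports_seen = {}
    for f in flows:
        ports_seen.setdefault((f.src_ip, f.dst_ip), set()).add(f.dst_port)
    return {pair for pair, ports in ports_seen.items()
            if len(ports) >= min_ports}


def run_at_data_owner(rule: Rule,
                      local_flows: Iterable[FlowRecord]) -> List[str]:
    """Execute a shared rule on the owner's side and return one label
    per flow; the raw flow records are not part of the return value."""
    flows = list(local_flows)
    flagged = rule(flows)
    return ["port_scan" if (f.src_ip, f.dst_ip) in flagged else "benign"
            for f in flows]


if __name__ == "__main__":
    # Synthetic data: one source probing 150 ports, one ordinary flow.
    flows = [FlowRecord("10.0.0.1", "10.0.0.2", p, 1) for p in range(1, 151)]
    flows.append(FlowRecord("10.0.0.3", "10.0.0.2", 443, 20))
    labels = run_at_data_owner(label_port_scan, flows)
    print(labels.count("port_scan"), labels.count("benign"))
```

The design choice worth noting is the direction of movement: the rule travels to the data, so each participant can apply the same rule to its own traffic without exchanging raw records. Whether returning per-flow labels is acceptable from a privacy standpoint is a policy decision for the data owner and is not settled by this sketch.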

The resulting framework will serve as a driving force for advancing collaborative efforts in the emerging area of Internet Data Science. In addition to identifying some of the fundamental changes to how ML ought to be used in networking, the research findings will benefit both industry and academia and will ensure that tomorrow's workforce has the proper training to fully exploit the application of ML for network-specific problems. Also, the outcomes will catalyze the development of a roadmap for the adoption of Internet Data Science efforts by operators and the deployment of ensuing research artifacts in real-world production networks.

People

  • Lead PI: Ram Durairajan
  • Co-PIs: Reza Rejaie (UO), David Teach (UO), Arpit Gupta (UCSB)
  • Senior Personnel: Walter Willinger (NIKSUN, Inc.)
  • Ph.D. Students: Yukhe Lavinia
  • B.S. Students: Jared Knofcynzski

Publications

To be added.

Software and Datasets

To be added.