The increased use of video conferencing applications (VCAs) has made it critical for all stakeholders in the VCA ecosystem to understand and support end-user quality of experience (QoE), especially network operators, who typically do not have direct access to client software. Existing VCA QoE estimation methods use passive measurements of application-level Real-time Transport Protocol (RTP) headers. However, a network operator does not always have access to RTP headers, particularly when VCAs use custom RTP protocols (e.g., Zoom) or when system constraints prevent it (e.g., legacy measurement systems). Given this challenge, this paper considers whether more standard features of the network traffic, namely the IP and UDP headers, can provide per-second estimates of key VCA QoE metrics such as frame rate and video resolution. We develop a method that uses machine learning with a combination of flow statistics (e.g., throughput) and features derived from the mechanisms that VCAs use to fragment video frames into packets. We evaluate our method on three prevalent VCAs running over WebRTC: Google Meet, Microsoft Teams, and Cisco Webex. Our evaluation consists of 54,696 seconds of VCA data collected from both (1) controlled in-lab network conditions and (2) 15 real-world access networks. We show that our approach yields accuracy similar to the RTP-based baselines, despite using only IP/UDP data. For instance, we can estimate frame rate within 2 FPS for up to 83.05% of one-second intervals in the real-world data, which is only 1.76% lower than using the RTP headers.
Sep 19, 2023
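
To make the idea of per-second features from IP/UDP data concrete, the following is a minimal sketch, not the method evaluated in the paper. It assumes packets are available as (timestamp, UDP payload size) pairs, groups them into one-second windows, and computes simple flow statistics. The frame-count proxy, which treats a change in packet size as a possible frame boundary, is an illustrative assumption, as are the function name `per_second_features` and the parameter `size_tolerance`.

```python
from collections import defaultdict
from math import floor


def per_second_features(packets, size_tolerance=2):
    """Compute illustrative per-second features from IP/UDP-level data.

    packets: iterable of (timestamp_seconds, udp_payload_bytes) tuples.
    size_tolerance: hypothetical threshold (bytes); a larger size change
        between consecutive packets is counted as a possible frame boundary.
    """
    # Group packet sizes by one-second window, preserving arrival order.
    windows = defaultdict(list)
    for timestamp, size in sorted(packets):
        windows[floor(timestamp)].append(size)

    features = {}
    for second, sizes in sorted(windows.items()):
        # Assumption: packets of one frame have similar sizes, so a size
        # change between neighbours hints at the start of a new frame.
        size_changes = sum(
            1 for prev, cur in zip(sizes, sizes[1:])
            if abs(cur - prev) > size_tolerance
        )
        features[second] = {
            "throughput_bps": 8 * sum(sizes),   # flow statistic
            "packet_count": len(sizes),         # flow statistic
            "mean_packet_size": sum(sizes) / len(sizes),
            "frame_count_proxy": 1 + size_changes,
        }
    return features


if __name__ == "__main__":
    sample = [
        (0.00, 1000), (0.01, 1000), (0.02, 1000),  # three equal-sized packets
        (0.50, 600), (0.51, 600),                  # two smaller packets
        (1.20, 800),
    ]
    for second, row in per_second_features(sample).items():
        print(second, row)
```

In a learning pipeline, each per-second feature row would be paired with a ground-truth label (for example, the frame rate reported by the client) and passed to a regression or classification model; the choice of model is left open here.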