Consumers, regulators, and ISPs all use client-based “speed tests” to measure network performance, both in single-user settings and in aggregate. Two prevalent speed tests, Ookla’s Speedtest and Measurement Lab’s Network Diagnostic Test (NDT), are often used for similar purposes, despite significant differences in test design, in implementation, and in the infrastructure used to conduct measurements. In this paper, we present a comparative evaluation of Ookla and NDT7 (the latest version of NDT) in both controlled and wide-area settings. Our goal is to characterize when, how much, and under what circumstances these two speed tests differ, as well as what factors contribute to the differences. To study the effects of test design, we conduct a series of controlled, in-lab experiments under a variety of network conditions and usage modes (TCP congestion control, native vs. browser client). Our results show that Ookla and NDT7 report similar speeds when the latency between the client and server is low, but that the tools diverge when path latency is high. To characterize the behavior of these tools in wide-area deployment, we collect more than 40,000 pairs of Ookla and NDT7 measurements across six months and 67 households, spanning a range of ISPs and speed tiers. Our analysis reveals several systemic issues, including high variability in NDT7 test results and systematically underperforming servers in the Ookla network.