How ITS Uses Machine Learning to Measure and Improve Speech Quality
When public safety professionals use telecommunications systems to communicate with one another, it’s easy for them to tell when there’s an issue with the signals—they hear distorted sound, static or interruptions, to name a few examples.
Fixing these issues is much tougher. As the amount of spectrum used to transmit speech decreases, so do speech quality and intelligibility. A reliable system for measuring speech quality and intelligibility is required to optimize the two quantities—adjusting bandwidth use to efficiently deliver acceptable quality and intelligibility.
Unfortunately, measurements using human listeners are time-consuming and expensive. Existing automated measurements are fast, but require systems to be taken offline to be tested. Improving these kinds of measurements would lead to more reliable and efficient telecommunications systems. This is especially critical for systems used by first responders, where clear voice communications can save lives.
To make maximally efficient use of radio spectrum, industry and government need a measurement method that can be deployed to in-service networks without needing access to the original signal for comparison. In other words, they need a machine that can learn to do what humans do effortlessly: judge the quality of a speech signal when it arrives at the end of the transmission channel without knowing anything about what it sounded like when it started. This is called a no-reference, or NR, measurement.
Researchers at NTIA’s Institute for Telecommunication Sciences (ITS) have made breakthroughs in their quest for accurate real-time measurements of speech quality and intelligibility. They trained neural networks using a large speech database to develop software that allows for quick and accurate assessments of speech quality and intelligibility. This paves the way for optimizing user experience and conserving radio spectrum.
NR measurements have been an industrywide goal for years, but previous development attempts produced less-than-satisfactory results. ITS’s Quality of Experience (QoE) team has taken the first step toward solving this problem. Their breakthrough leverages recent advances in deep neural networks as well as a significant trove of data that ITS generated specifically for this purpose over the course of many months. The work takes advantage of expertise that ITS has accumulated over decades of working with public safety to improve speech intelligibility.
ITS used its data to train NR measurements called WAWEnets. The team applied machine learning techniques to this big data set to train convolutional neural networks that measure speech quality and speech intelligibility. Inspired by biological processes, convolutional neural networks assemble more complex patterns from smaller and simpler ones, and thus are less complex and use less processing power than dense, or fully connected, neural networks. The abstract mathematical rules learned from this data set have proven to produce results that correlate closely with human-listener quality assessments.
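To make the complexity comparison above concrete, here is a rough sketch that counts the learnable parameters in a single one-dimensional convolutional layer and in a fully connected layer producing an output of the same size. The layer sizes (3 seconds of 16 kHz audio, 96 channels, kernel length 3) are hypothetical values chosen only for illustration; they are not taken from the WAWEnets architecture.

```python
def conv1d_param_count(in_channels, out_channels, kernel_size):
    """Weights plus one bias per output channel for a 1-D convolutional layer."""
    return in_channels * out_channels * kernel_size + out_channels


def dense_param_count(in_features, out_features):
    """Weights plus one bias per output unit for a fully connected layer."""
    return in_features * out_features + out_features


# Hypothetical sizes for illustration only (not the WAWEnets design).
num_samples = 3 * 16000   # 3 seconds of speech sampled at 16 kHz
channels = 96             # feature channels produced by the layer
kernel_size = 3           # each filter looks at 3 neighboring samples

# A convolutional layer reuses the same small filters at every position,
# so its parameter count does not depend on the length of the signal.
conv_params = conv1d_param_count(1, channels, kernel_size)

# A fully connected layer producing the same number of output values
# (num_samples * channels) needs a separate weight for every
# input-output pair.
dense_params = dense_param_count(num_samples, num_samples * channels)

print(f"convolutional layer:   {conv_params:,} parameters")
print(f"fully connected layer: {dense_params:,} parameters")
```

Because the convolutional filters are shared across the whole waveform, the layer stays small no matter how long the speech signal is, which is the source of the savings in complexity and processing power.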
WAWEnets facilitate moving accurate speech testing out of the laboratory environment, advancing innovation to ensure spectrum is available for federal and commercial services. They enable deployment of real-time, in-service, and accurate speech quality or intelligibility monitoring anywhere in a telecom network. The development, evaluation, and deployment of WAWEnets earned ITS a US Department of Commerce Gold Medal for Scientific/Engineering achievement in 2020, and ITS’s work to apply machine learning to benefit telecommunication users has continued. In June 2021, ITS published two papers at the IEEE-sponsored 13th International Conference on Quality of Multimedia Experience. “Full-Reference and No-Reference Objective Evaluation of Deep Neural Network Speech” addresses measurement issues that arise when machine learning is used to produce speech, and “Measuring Speech Quality of System Input while Observing only System Output” lays out an entirely new paradigm for applying machine learning to measure speech.
The WAWEnets software is available on NTIA’s GitHub page for other researchers to build upon. For a more detailed description of WAWEnets, you can read the article by ITS’s Andrew A. Catellier and Stephen D. Voran, “WAWEnets: A No-Reference Convolutional Waveform-Based Approach to Estimating Narrowband and Wideband Speech Quality.”