Wireless Acoustic Sensor Networks: Combined Acoustic Echo Cancellation and Adaptive Beamforming
Hands-free communication involves several speech processing tasks. These include noise reduction, extraction of desired speech signals and attenuation of interfering speech signals. A teleconferencing system also requires echo cancellation. Modern devices typically include more than one microphone, which enables spatial processing in addition to time-frequency processing. With the growing popularity of wireless communication, new algorithms based on wireless acoustic sensor networks (WASNs) have been developed. Such a network consists of several nodes, each with one or more sensors, communicating over a wireless medium. Because its nodes are distributed over a large volume, the network provides better coverage than a single wired sensor array. Noise reduction, echo cancellation and WASN algorithms have each been studied extensively; combining them, however, remains a challenge. When a noise reduction system is integrated with an echo cancellation system, the performance of one or both systems often degrades. When both systems are adaptive, changes in each one hamper the convergence of the other. Fitting such an integrated scheme to a WASN adds further difficulty.
In this research, a distributed algorithm based on the generalized sidelobe canceller (GSC) is developed for a fully connected WASN. It combines enhancement of the desired speakers and noise reduction with echo cancellation in reverberant environments. The algorithm suits networks in which the number of nodes exceeds the number of constrained speakers and each node has at least two sensors. The system operates in two main stages: a local stage and a global stage. In the local stage, each node processes its own sensor signals to produce an audio output and a local activity mask for its nearest speaker. The local activity mask is estimated per time-frequency bin from a combination of a model-based speech presence probability (SPP), the direct-to-reverberant ratio (DRR), the direction of arrival (DOA) and the energy level. The local output and mask are transmitted to all nodes in the network and used in the global stage, where a global output and a global mask are generated. Adaptation of the algorithm is controlled by the global activity masks, which are also shared across the network and indicate exclusive speaker activity, that is, bins in which only one speaker is active. Its adaptive nature makes the algorithm suitable for real-time applications such as speech enhancement in audio conferencing. An extensive simulation study shows that the proposed algorithm, while suboptimal compared with the centralized solution, outperforms a single-node beamformer.
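The GSC structure underlying the algorithm, with adaptation gated by an activity mask, can be illustrated at a single frequency bin. The following is a minimal NumPy sketch, not the thesis implementation. It assumes a single desired speaker with a known relative transfer function `h` (normalized so that `h[0] == 1`), a fixed distortionless beamformer, a blocking matrix built from `h`, and an NLMS noise canceller that adapts only in frames where the mask marks the desired speaker as inactive. The helper names (`blocking_matrix`, `gsc_bin`, `simulate`), the toy far-field scene, the step size `mu` and the oracle mask are all hypothetical. Echo cancellation, multiple constrained speakers and the distributed local/global exchange are omitted.

```python
import numpy as np


def blocking_matrix(h):
    """Return B (M x M-1) with B^H h = 0, assuming h[0] == 1 (RTF normalization)."""
    m = h.shape[0]
    b = np.zeros((m, m - 1), dtype=complex)
    b[0, :] = -np.conj(h[1:])
    b[1:, :] = np.eye(m - 1)
    return b


def gsc_bin(x, h, adapt_mask, mu=0.1, eps=1e-8):
    """Single-bin GSC: fixed beamformer, blocking matrix, mask-gated NLMS canceller.

    x: (T, M) STFT coefficients of one frequency bin across T frames.
    h: (M,) RTF of the desired speaker, h[0] == 1 (assumed known).
    adapt_mask: (T,) bool, True where the desired speaker is judged inactive,
        so the noise canceller is allowed to adapt.
    Returns the GSC output and the fixed-beamformer output, both (T,).
    """
    w0 = h / np.vdot(h, h).real          # distortionless: w0^H h = 1
    b = blocking_matrix(h)
    g = np.zeros(b.shape[1], dtype=complex)
    y = np.empty(len(x), dtype=complex)
    fbf = np.empty(len(x), dtype=complex)
    for t, xt in enumerate(x):
        fbf[t] = np.vdot(w0, xt)          # fixed-beamformer output w0^H x
        u = b.conj().T @ xt               # noise references B^H x
        y[t] = fbf[t] - np.vdot(g, u)     # GSC output w0^H x - g^H u
        if adapt_mask[t]:                 # adapt only in noise-only frames
            g += mu * u * np.conj(y[t]) / (np.vdot(u, u).real + eps)
    return y, fbf


def simulate(seed=0, m=4, t=4000):
    """Toy scene (hypothetical): desired speaker plus one directional interferer."""
    rng = np.random.default_rng(seed)

    def cn(*shape):
        # Unit-power circular complex Gaussian samples.
        return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

    idx = np.arange(m)
    h = np.exp(-1j * np.pi * 0.1 * idx)          # desired RTF (far-field ULA model)
    a = np.exp(-1j * np.pi * 0.4 * idx)          # interferer RTF
    active = (np.arange(t) // 200) % 2 == 0      # oracle activity mask (toy)
    s = cn(t) * active
    x = np.outer(s, h) + np.outer(cn(t), a) + 0.01 * cn(t, m)
    y, fbf = gsc_bin(x, h, adapt_mask=~active)
    return y, fbf, s, active
```

Comparing the power of `y` and `fbf` over noise-only frames in the second half of the output of `simulate()` gives a rough measure of how much interference the adaptive canceller removes beyond the fixed beamformer. Gating adaptation with the mask matters most when the blocking matrix leaks desired speech, as it does under reverberation with estimated transfer functions; in this toy scene the blocking is exact.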