The Cone of Silence: Speech Separation by Localization

Angle-Based Speaker Separation

Project number
412
Status
Proposal
Academic supervisor
Year
2024

Project background:

Given a multi-microphone recording of an unknown number of speakers talking concurrently, this project simultaneously localizes the sources and separates the individual speakers. The core of the method is a deep network operating in the waveform domain, which isolates sources within an angular region θ ± w/2, given an angle of interest θ and an angular window size w. By exponentially decreasing w, we can perform a binary search to localize and separate all sources in logarithmic time. The algorithm handles an arbitrary number of potentially moving speakers at the same time, including more speakers than seen during training.
Automating speech separation has many valuable applications, including assistive technology for the hearing impaired, improved Automatic Speech Recognition (ASR) systems, and better transcription (speech to text) of spoken content in noisy, in-the-wild Internet videos.
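
The binary search over angular windows described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's implementation: the trained network is replaced by a hypothetical `energy_fn(theta, w)` that reports how much signal lies inside θ ± w/2, and the names `localize` and `make_mock_energy`, the 90° starting window, the stopping width and the energy threshold are all choices made for this example.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: (theta, w) in degrees -> energy of the audio that a
# separation network isolates inside the region theta ± w/2.
EnergyFn = Callable[[float, float], float]


def wrap_degrees(angle: float) -> float:
    """Map an angle in degrees to the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


def make_mock_energy(source_angles: List[float]) -> EnergyFn:
    """Toy stand-in for the network: counts the sources inside the region."""
    def energy(theta: float, w: float) -> float:
        # Half-open interval so a source on a boundary is counted only once.
        return float(sum(
            -w / 2 <= wrap_degrees(s - theta) < w / 2 for s in source_angles
        ))
    return energy


def localize(
    energy_fn: EnergyFn,
    start_window: float = 90.0,   # assumed initial window size w
    min_window: float = 3.0,      # assumed stopping resolution
    threshold: float = 0.5,       # assumed "region is non-empty" threshold
) -> List[Tuple[float, float]]:
    """Return (theta, w) for every narrow region that still contains energy."""
    # Tile the full circle with regions of width start_window.
    n_regions = int(round(360.0 / start_window))
    candidates = [-180.0 + start_window * (i + 0.5) for i in range(n_regions)]
    w = start_window
    while True:
        # Keep only the regions in which the (mock) network finds a source.
        active = [t for t in candidates if energy_fn(t, w) > threshold]
        if w <= min_window or not active:
            return [(t, w) for t in active]
        # Split every surviving region into two halves and halve the window.
        candidates = [t + sign * w / 4 for t in active for sign in (-1, 1)]
        w /= 2


if __name__ == "__main__":
    # Two simulated speakers at 30 and -100 degrees.
    print(localize(make_mock_energy([30.0, -100.0])))
```

Because empty regions are discarded at every level, the number of network queries per speaker grows with the logarithm of the angular resolution rather than linearly. In the real system each query would also return the separated waveform for that region, which the mock above omits.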

Project goal:

The purpose of this project is to develop a method that can effectively localize and separate individual speakers in a multi-microphone recording where multiple speakers are talking simultaneously.

Project scope:

1. Watch the lectures on YouTube: Stanford University CS231n, Spring 2017
2. Read the paper
3. Download the dataset
4. Build the model
5. Train the model
6. Expect satisfactory results :))
The project will be implemented in PyTorch.

Prerequisite courses:

Deep Learning, Python and PyTorch.

Additional requirements:

Watching related videos on YouTube

Sources:

We will implement the following paper:
The Cone of Silence: Speech Separation by Localization

Last updated: 31/07/2023