Seminar "Selected Topics in Speech and Audio Signal Processing"

 

Basic Information
Lecturers: Gerhard Schmidt and group
Semester: Summer term
Language: English or German
Target group: Master's students in electrical engineering and computer engineering
Prerequisites: Fundamentals in digital signal processing
Registration procedure:

If you want to sign up for this seminar, you need to provide the following information in the registration form:

  • surname, first name,
  • e-mail address,
  • matriculation number.

Please note that the registration period starts on 01.04.2025 at 10:00 h and ends on 13.04.2025 at 23:59 h. Applications submitted before or after this period will not be considered.

Within the aforementioned period, you can register by sending an e-mail with the desired seminar topic, your name, and your matriculation number to the seminar's contact e-mail address.

Only one student per topic is permitted (first come, first served).

Registration is binding. Deregistration is only possible by sending an e-mail with your name and matriculation number to the seminar's contact e-mail address by Sunday, 13.04.2025, 23:59 h. Later cancellations will be counted as a failed seminar.

Time: Preliminary meeting by arrangement with the individual supervisor
Written report due on xx.xx.2025
Final presentations, xx.xx.2025 at xx:xx h, xx
Contents:

Students write a scientific report on a topic closely related to the current research of the DSS group. Potential topics therefore deal with digital signal processing for speech and audio applications.

Students will also present their findings in front of the other participants and the DSS group.

 

Topics for SoSe 25

Topic: Real-time Formant Extraction in Pathological Speech

Formants, which are the resonant frequencies of the vocal tract, play a crucial role in speech production and perception. The real-time extraction of formants is especially important in applications such as pathological speech processing, where it aids in diagnosing and monitoring speech disorders like dysarthria, apraxia, and other conditions that affect speech clarity and quality. Various methods for real-time formant extraction, including both classical signal processing techniques and modern machine-learning-based approaches, should be compared. Understanding the advantages and limitations of these methods will help in selecting the most appropriate approach for different speech processing tasks.
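As a hedged illustration of the classical side of this comparison, the following Python sketch estimates formant candidates for a single speech frame using linear prediction (LPC): it solves the autocorrelation normal equations and reads resonance frequencies off the roots of the prediction-error filter. The frame length, pre-emphasis factor, and model order are illustrative assumptions, not values prescribed by the seminar topic.

import numpy as np
from scipy.linalg import solve_toeplitz

def formant_candidates(frame, fs, order=12):
    """Estimate formant candidates (in Hz) for one speech frame via LPC."""
    # Pre-emphasis and windowing (common, purely illustrative choices)
    x = np.append(frame[0], frame[1:] - 0.97 * frame[:-1])
    x = x * np.hamming(len(x))

    # Autocorrelation sequence r[0], ..., r[order]
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]

    # Solve the Toeplitz normal equations R a = r for the LPC coefficients
    a = solve_toeplitz((r[:-1], r[:-1]), r[1:])

    # Roots of the prediction-error filter A(z) = 1 - sum_k a_k z^{-k}
    roots = np.roots(np.concatenate(([1.0], -a)))
    roots = roots[np.imag(roots) > 0]             # one root per conjugate pair

    freqs = np.angle(roots) * fs / (2.0 * np.pi)  # angle (rad) -> frequency (Hz)
    return np.sort(freqs[freqs > 90.0])           # drop near-DC candidates

# Usage sketch: a 25 ms frame of white noise stands in for real speech
if __name__ == "__main__":
    fs = 16000
    frame = np.random.randn(int(0.025 * fs))
    print(formant_candidates(frame, fs))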
Topic: Speech Evaluation Metrics

The assessment of speech quality and intelligibility is of great interest, both in speech therapy and in the evaluation of algorithms that enhance speech signals degraded by interference such as noise and reverberation. The aim of this seminar paper is to summarize and compare various objective assessment methods. In addition to traditional criteria such as STOI and PESQ, approaches based on neural networks should also be included in the comparison.
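To make the traditional criteria concrete, the sketch below computes STOI and PESQ scores for a clean/degraded signal pair. It assumes the third-party Python packages soundfile, pystoi, and pesq are installed; the file names are hypothetical.

import soundfile as sf          # reads the WAV files
from pystoi import stoi         # short-time objective intelligibility
from pesq import pesq           # ITU-T P.862 (PESQ) wrapper

# Hypothetical files; both signals must share the same sampling rate
clean, fs = sf.read("clean.wav")
degraded, _ = sf.read("degraded.wav")

# STOI lies roughly in [0, 1]; higher means more intelligible
stoi_score = stoi(clean, degraded, fs, extended=False)

# Wide-band PESQ returns a MOS-LQO score (about 1 to 4.5); 'wb' mode expects 16 kHz input
pesq_score = pesq(fs, clean, degraded, mode="wb")

print(f"STOI: {stoi_score:.3f}   PESQ: {pesq_score:.2f}")

Neural-network-based predictors (for example, non-intrusive MOS estimators) typically expose a similar signal-in, score-out interface, which makes a side-by-side comparison in the report straightforward.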
Topic: Auditory Feedback Modulation

Auditory feedback, which involves the real-time processing and modulation of sound signals, plays a crucial role in modifying voice quality, particularly in medical contexts. It is especially important for applications in speech therapy and rehabilitation, where controlling and enhancing voice quality can aid in diagnosing and treating speech disorders. Real-time modification of voice quality can be achieved through a range of techniques, including signal processing and machine learning algorithms. The seminar paper should explore the challenges and innovations involved in developing systems that dynamically adjust voice quality based on the user's specific needs and environmental factors. By reviewing the existing literature on voice quality modification techniques, it should provide a comprehensive understanding of current approaches and their effectiveness in the treatment of speech disorders such as dysphonia or other vocal impairments.
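One simple form of auditory feedback modulation is delayed auditory feedback (DAF), where speakers hear their own voice with a short delay. The sketch below is a minimal illustration assuming the third-party sounddevice package; the 150 ms delay, sampling rate, and block size are arbitrary illustrative choices.

import numpy as np
import sounddevice as sd        # full-duplex audio I/O

FS = 16000                      # sampling rate in Hz
DELAY_S = 0.15                  # 150 ms feedback delay
BLOCK = 256                     # frames per processing block

delay_line = np.zeros(int(DELAY_S * FS), dtype=np.float32)

def callback(indata, outdata, frames, time, status):
    """Play back the microphone signal delayed by DELAY_S seconds."""
    global delay_line
    if status:
        print(status)
    outdata[:, 0] = delay_line[:frames]                               # oldest samples out
    delay_line = np.concatenate((delay_line[frames:], indata[:, 0]))  # newest samples in

with sd.Stream(samplerate=FS, blocksize=BLOCK, channels=1,
               dtype="float32", callback=callback):
    sd.sleep(10_000)            # run the feedback loop for 10 seconds

More elaborate systems would replace the plain delay line with pitch, formant, or loudness modifications driven by an analysis of the incoming voice signal.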
Topic: Diffusion-based Neural Networks for Speech Enhancement

In many real-world applications, such as telephone conversations, video conferencing, or hearing aids, speech quality is often compromised by background noise. Conventional speech enhancement methods reach their limits, especially in the presence of severe distortion or complex background noise. Diffusion-based neural networks offer a promising approach by gradually reconstructing speech signals while preserving fine acoustic details. The aim of this seminar is to explore how these models work and what advantages they offer over traditional approaches.
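To give a feel for the gradual-reconstruction idea, the following PyTorch sketch runs a simplified DDPM-style reverse (denoising) loop on a spectrogram-shaped tensor. The toy network is untrained and the noise schedule is arbitrary; none of this reflects a specific published speech-enhancement model.

import torch

# Linear noise schedule over T diffusion steps (illustrative values)
T = 50
betas = torch.linspace(1e-4, 0.05, T)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

# Toy stand-in for a trained noise-prediction network (it ignores the step index t)
eps_model = torch.nn.Sequential(
    torch.nn.Conv2d(1, 16, 3, padding=1), torch.nn.ReLU(),
    torch.nn.Conv2d(16, 1, 3, padding=1),
)

@torch.no_grad()
def reverse_step(x_t, t):
    """One DDPM reverse step: predict the noise, then sample x_{t-1}."""
    eps_hat = eps_model(x_t)
    mean = (x_t - betas[t] / torch.sqrt(1.0 - alpha_bars[t]) * eps_hat) \
           / torch.sqrt(alphas[t])
    if t == 0:
        return mean                              # final step is deterministic
    return mean + torch.sqrt(betas[t]) * torch.randn_like(x_t)

# Usage sketch: start from pure noise shaped like a magnitude spectrogram
x = torch.randn(1, 1, 128, 64)                   # (batch, channel, freq, frames)
for t in reversed(range(T)):
    x = reverse_step(x, t)
print(x.shape)

In an actual enhancement system, the reverse process would additionally be conditioned on the noisy input signal, so that the network reconstructs the corresponding clean speech rather than an arbitrary sample.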