You searched for subject:(Cocktail Party Problem). Showing records 1 – 11 of 11 total matches. No search limiters apply to these results.

University of Texas – Austin
1.
-0608-5688.
The neural representation of simultaneous speech and music.
Degree: MA, Communications Sciences and Disorders, 2019, University of Texas – Austin
URL: http://dx.doi.org/10.26153/tsw/7539
Research on neural processing of speech in the presence of other sounds has mostly been limited to studies of the cocktail party problem, in which a target speech signal is superimposed on other speech. The processing of speech in combination with other types of sound, meanwhile, has received little research attention and is poorly understood. In the current study, electroencephalography (EEG) was used to measure listeners’ neural responses to stimuli consisting of overlapping segments of speech and different musical instruments. Presented here is a preliminary analysis of these data that indicates differential neural representation of the component sounds in these mixtures. Possible explanations for this result are discussed, as well as potential future analyses of the data.
Advisors/Committee Members: Hamilton, Liberty (advisor).
Subjects/Keywords: Speech perception; Music perception; Cocktail party problem; Electroencephalography
APA (6th Edition):
-0608-5688. (2019). The neural representation of simultaneous speech and music. (Masters Thesis). University of Texas – Austin. Retrieved from http://dx.doi.org/10.26153/tsw/7539
Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete
Chicago Manual of Style (16th Edition):
-0608-5688. “The neural representation of simultaneous speech and music.” 2019. Masters Thesis, University of Texas – Austin. Accessed January 24, 2021.
http://dx.doi.org/10.26153/tsw/7539.
Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete
MLA Handbook (7th Edition):
-0608-5688. “The neural representation of simultaneous speech and music.” 2019. Web. 24 Jan 2021.
Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete
Vancouver:
-0608-5688. The neural representation of simultaneous speech and music. [Internet] [Masters thesis]. University of Texas – Austin; 2019. [cited 2021 Jan 24].
Available from: http://dx.doi.org/10.26153/tsw/7539.
Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete
Council of Science Editors:
-0608-5688. The neural representation of simultaneous speech and music. [Masters Thesis]. University of Texas – Austin; 2019. Available from: http://dx.doi.org/10.26153/tsw/7539
Note: this citation may be lacking information needed for this citation format:
Author name may be incomplete

Delft University of Technology
2.
Hulsinga, Derk-Jan (author).
The cocktail party problem: GSVD-beamformers for speech in reverberant environments.
Degree: 2018, Delft University of Technology
URL: http://resolver.tudelft.nl/uuid:60a67ca0-6110-480a-bc10-5f7bd5908029
Hearing aids as a form of audio preprocessing are increasingly common in everyday life. The goal of this thesis is to implement a blind approach to the cocktail party problem and to challenge some of the common assumptions made in the literature. We approach the problem as wideband frequency-domain blind source separation (FD-BSS). From this field of research, the common assumption of continuous source activity is dropped. Instead, detection of the number of active users is implemented as a preprocessing step, ensuring the appropriate number of demixing vectors for each time-frequency bin. The validity of the standard mixing model used for STFTs is challenged by examining the response of a linear array. Source separation is achieved with demixing vectors based on the generalized singular value decomposition (GSVD), derived in a model-based approach. While most permutation solvers offer an a posteriori solution for all users, we look at finding local solutions for a single user. Combining this with the user identification, called the alignment step, we conclude that the permutation problem can be reduced to selecting a demixing vector for each discrete time-frequency instance. The correlation coefficient proves to be a sufficient metric to couple reconstructions to the original data, as it selects most of the active time-frequency bins. In the far-field case, our approach performs comparably but not better. We did find that our method is much more robust against the inaccuracies introduced when narrowband channels are assumed but not actually available, as strongly exemplified by our experiment with a changing DFT size. The Frobenius norm was suggested as a measure of distance between the estimated STFT and the original signal's time-frequency-domain description, but it produced counterintuitive results that did not correspond with the other metrics used in this thesis; it is expected that changing the STFT size induces effects that are not accounted for. Our demixing vectors achieve intelligibility, measured by STOI, comparable to the compared techniques, and they are more robust against small sample sizes than the theoretically SINR-optimal MVDR.
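For readers unfamiliar with GSVD-style demixing, the core idea per frequency bin is to pick the direction that maximizes target power relative to interference power, which reduces to a generalized eigenvalue problem. The sketch below is a generic illustration only, not the thesis's implementation; the function name and the use of target-active versus noise-only covariance estimates are assumptions.

```python
import numpy as np
from scipy.linalg import eigh

def gevd_demixing_vector(R_target, R_noise):
    """Max-SNR demixing vector for a single frequency bin.

    R_target : (M, M) Hermitian covariance of the mixture when the target is active
    R_noise  : (M, M) Hermitian covariance estimated from noise/interference-only frames
    Solves the generalized eigenproblem R_target w = lambda R_noise w and keeps the
    eigenvector with the largest eigenvalue, i.e. the direction with the highest
    output signal-to-interference-plus-noise ratio.
    """
    _, eigvecs = eigh(R_target, R_noise)   # eigenvalues returned in ascending order
    w = eigvecs[:, -1]                      # principal generalized eigenvector
    return w / np.linalg.norm(w)

# Per-bin use on an STFT X of shape (n_bins, n_mics, n_frames):
# s_hat[f, t] = np.conj(w_f) @ X[f, :, t]
```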
Advisors/Committee Members: van der Veen, Alle-Jan (mentor), Heusdens, Richard (graduation committee), Weber, Jos (graduation committee), Delft University of Technology (degree granting institution).
Subjects/Keywords: Speech separation; Blind source separation; Generalized Singular Value Decomposition; Permutation problem; Cocktail Party Problem
APA (6th Edition):
Hulsinga, D. (2018). The cocktail party problem: GSVD-beamformers for speech in reverberant environments. (Masters Thesis). Delft University of Technology. Retrieved from http://resolver.tudelft.nl/uuid:60a67ca0-6110-480a-bc10-5f7bd5908029
Chicago Manual of Style (16th Edition):
Hulsinga, Derk-Jan. “The cocktail party problem: GSVD-beamformers for speech in reverberant environments.” 2018. Masters Thesis, Delft University of Technology. Accessed January 24, 2021.
http://resolver.tudelft.nl/uuid:60a67ca0-6110-480a-bc10-5f7bd5908029.
MLA Handbook (7th Edition):
Hulsinga, Derk-Jan. “The cocktail party problem: GSVD-beamformers for speech in reverberant environments.” 2018. Web. 24 Jan 2021.
Vancouver:
Hulsinga D. The cocktail party problem: GSVD-beamformers for speech in reverberant environments. [Internet] [Masters thesis]. Delft University of Technology; 2018. [cited 2021 Jan 24].
Available from: http://resolver.tudelft.nl/uuid:60a67ca0-6110-480a-bc10-5f7bd5908029.
Council of Science Editors:
Hulsinga D. The cocktail party problem: GSVD-beamformers for speech in reverberant environments. [Masters Thesis]. Delft University of Technology; 2018. Available from: http://resolver.tudelft.nl/uuid:60a67ca0-6110-480a-bc10-5f7bd5908029

University of California – San Francisco
3.
Hullett, Patrick W.
Functional Organization of Speech Processing Areas and A Systematic Approach to the Cocktail Party Problem.
Degree: Bioengineering, 2013, University of California – San Francisco
URL: http://www.escholarship.org/uc/item/77q5w40n
The brain is a physical system that can perform intelligent computations. We are interested in the nature of those computations in order to understand how the brain does intelligent things. To that end, we focused on two particularly fruitful questions that were tractable given the current state of knowledge and resources: What is the organization of processing in human speech centers? And how does the brain solve the cocktail party problem? To address the first question, we recorded superior temporal gyrus activity from awake human subjects passively listening to speech stimuli, using electrocorticography. The high spatial and temporal resolution of this recording technique, combined with maximally informative dimension analysis, made it possible to compute high-density spectrotemporal receptive field maps in a region of the brain specialized for speech perception. Based on these maps, we found that the human superior temporal gyrus has a strong modulotopic organization, a higher-order analog of tonotopic organization that has not previously been identified in any human or non-human auditory area. To investigate the mechanisms by which neural systems solve the cocktail party problem, we created animals that are specialists at extracting vocalization information in the face of noise, by noise-rearing rats and testing them behaviorally to demonstrate their specialization. Through single-unit recordings from primary auditory cortex, we identified a subpopulation of neurons that can extract vocalization information in the face of noise. Although the prevalence of these neurons is the same in both groups of animals, neurons from specialized animals extract information at significantly higher rates. Further receptive field analysis will give insight into the underlying mechanism of this ability. This work demonstrates the ability to create animals specialized at solving the cocktail party problem and a method to identify neurons that contribute to this specialization. This approach can be applied to different classes of noise to generate and refine models of cocktail party processing.
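The thesis derives its receptive-field maps with maximally informative dimension analysis; as a simpler, commonly used stand-in for illustration only (not the method used in this work), a spectrotemporal receptive field can be fit by ridge regression from a lagged stimulus spectrogram to a neural response. Array shapes, lag count, and the regularization value below are assumptions.

```python
import numpy as np

def estimate_strf(spectrogram, response, n_lags=20, ridge=1.0):
    """Estimate a spectrotemporal receptive field by ridge regression.

    spectrogram : (n_times, n_freqs) stimulus representation
    response    : (n_times,) neural response (e.g., high-gamma power at one electrode)
    Returns an (n_lags, n_freqs) filter relating the recent stimulus to the response.
    """
    n_times, n_freqs = spectrogram.shape
    # Build a lagged design matrix: row t holds the spectrogram frames t, t-1, ..., t-n_lags+1.
    X = np.zeros((n_times, n_lags * n_freqs))
    for lag in range(n_lags):
        X[lag:, lag * n_freqs:(lag + 1) * n_freqs] = spectrogram[:n_times - lag]
    y = response - response.mean()
    # Ridge solution: w = (X^T X + ridge * I)^{-1} X^T y
    w = np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ y)
    return w.reshape(n_lags, n_freqs)
```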
Subjects/Keywords: Neurosciences; Cocktail Party Problem; Functional Organization; Modulotopic Organization; Noise-rearing; Superior Temporal Gyrus
APA (6th Edition):
Hullett, P. W. (2013). Functional Organization of Speech Processing Areas and A Systematic Approach to the Cocktail Party Problem. (Thesis). University of California – San Francisco. Retrieved from http://www.escholarship.org/uc/item/77q5w40n
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Hullett, Patrick W. “Functional Organization of Speech Processing Areas and A Systematic Approach to the Cocktail Party Problem.” 2013. Thesis, University of California – San Francisco. Accessed January 24, 2021.
http://www.escholarship.org/uc/item/77q5w40n.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
Hullett, Patrick W. “Functional Organization of Speech Processing Areas and A Systematic Approach to the Cocktail Party Problem.” 2013. Web. 24 Jan 2021.
Vancouver:
Hullett PW. Functional Organization of Speech Processing Areas and A Systematic Approach to the Cocktail Party Problem. [Internet] [Thesis]. University of California – San Francisco; 2013. [cited 2021 Jan 24].
Available from: http://www.escholarship.org/uc/item/77q5w40n.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
Hullett PW. Functional Organization of Speech Processing Areas and A Systematic Approach to the Cocktail Party Problem. [Thesis]. University of California – San Francisco; 2013. Available from: http://www.escholarship.org/uc/item/77q5w40n
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

University of Maryland
4.
Vishnubhotla, Srikanth.
SEGREGATION OF SPEECH SIGNALS IN NOISY ENVIRONMENTS.
Degree: Electrical Engineering, 2011, University of Maryland
URL: http://hdl.handle.net/1903/11525
Automatic segregation of overlapping speech signals from single-channel recordings is a challenging problem in speech processing. Similarly, the problem of extracting speech signals from noisy speech has attracted a variety of research for several years but is still unsolved. Speech extraction from noisy speech mixtures, where the background interference could be either speech or noise, is especially difficult when the task is to preserve perceptually salient properties of the recovered acoustic signals for use in human communication. In this work, we propose a speech segregation algorithm that can simultaneously deal with both background noise and interfering speech. We propose a feature-based, bottom-up algorithm which makes no assumptions about the nature of the interference and does not rely on any prior trained source models for speech extraction. As such, the algorithm should be applicable to a wide variety of problems and useful for human communication, since an aim of the system is to recover the target speech signals in the acoustic domain. The proposed algorithm can be compartmentalized into (1) a multi-pitch detection stage which extracts the pitch of the participating speakers, (2) a segregation stage which teases apart the harmonics of the participating sources, (3) a reliability and add-back stage which scales the estimates based on their reliability and adds back appropriate amounts of aperiodic energy for the unvoiced regions of speech, and (4) a speaker assignment stage which assigns the extracted speech signals to their respective sources. The pitch of two overlapping speakers is extracted using a novel feature, the 2-D Average Magnitude Difference Function, which is also capable of giving a single pitch estimate when the input contains only one speaker. The segregation algorithm is based on a least-squares framework that relies on the estimated pitch values to give estimates of each speaker's contribution to the mixture. The reliability block is based on a non-linear function of the energy of the estimates; this non-linear function was learned from a variety of speech and noise data but is generic in nature and applicable to different databases. With both single- and multiple-pitch extraction and segregation capabilities, the proposed algorithm is amenable to both speech-in-speech and speech-in-noise conditions. The algorithm is evaluated on several objective and subjective tests using both speech and noise interference from different databases. The proposed speech segregation system demonstrates performance comparable to or better than the state of the art on most of the objective tasks. Subjective tests on the speech signals reconstructed by the algorithm, with normal-hearing listeners as well as users of hearing aids, indicate a significant improvement in the perceptual quality of the speech signal after being processed by our proposed algorithm, and suggest that the proposed segregation algorithm can be used as a pre-processing block within the…
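Stage (1) is built around the Average Magnitude Difference Function (AMDF). As a hedged illustration of the basic idea only, the sketch below estimates a single pitch with the ordinary 1-D AMDF; the thesis's 2-D extension for two simultaneous talkers is not reproduced, and the frame length and search range are assumptions.

```python
import numpy as np

def amdf_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """Estimate a single pitch from one voiced frame using the 1-D AMDF.

    The AMDF at lag tau is the mean absolute difference between the frame and a
    copy of itself shifted by tau; a voiced frame shows a deep dip at the pitch
    period. The frame should span a few pitch periods (e.g., 40 ms at fs Hz).
    """
    lag_min = int(fs / fmax)
    lag_max = int(fs / fmin)
    lags = np.arange(lag_min, min(lag_max, len(frame) - 1))
    amdf = np.array([np.mean(np.abs(frame[lag:] - frame[:-lag])) for lag in lags])
    best_lag = lags[np.argmin(amdf)]
    return fs / best_lag  # pitch estimate in Hz
```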
Advisors/Committee Members: Espy-Wilson, Carol Y (advisor).
Subjects/Keywords: Electrical Engineering; cocktail party problem; noise suppression; pitch tracking; speech enhancement; speech extraction; speech segregation
APA (6th Edition):
Vishnubhotla, S. (2011). SEGREGATION OF SPEECH SIGNALS IN NOISY ENVIRONMENTS. (Thesis). University of Maryland. Retrieved from http://hdl.handle.net/1903/11525
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Vishnubhotla, Srikanth. “SEGREGATION OF SPEECH SIGNALS IN NOISY ENVIRONMENTS.” 2011. Thesis, University of Maryland. Accessed January 24, 2021.
http://hdl.handle.net/1903/11525.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
Vishnubhotla, Srikanth. “SEGREGATION OF SPEECH SIGNALS IN NOISY ENVIRONMENTS.” 2011. Web. 24 Jan 2021.
Vancouver:
Vishnubhotla S. SEGREGATION OF SPEECH SIGNALS IN NOISY ENVIRONMENTS. [Internet] [Thesis]. University of Maryland; 2011. [cited 2021 Jan 24].
Available from: http://hdl.handle.net/1903/11525.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
Vishnubhotla S. SEGREGATION OF SPEECH SIGNALS IN NOISY ENVIRONMENTS. [Thesis]. University of Maryland; 2011. Available from: http://hdl.handle.net/1903/11525
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

McMaster University
5.
Wiklund, Karl.
The Cocktail Party Problem: Solutions and Applications.
Degree: PhD, 2009, McMaster University
URL: http://hdl.handle.net/11375/17396
The human auditory system is remarkable in its ability to function in busy acoustic environments. It is able to selectively focus attention on, and extract, a single source of interest in the midst of competing acoustic sources, reverberation and motion. Yet this problem, which is so elementary for most human listeners, has proven to be a very difficult one to solve computationally. Even more difficult has been the search for practical solutions to problems to which digital signal processing can be applied. Many applications that would benefit from such a solution, such as hearing aid systems, industrial noise control, or audio surveillance, require that the solution operate in real time and consume only a minimal amount of computational resources. In this thesis, a novel solution to the cocktail party problem is proposed. This solution is rooted in the field of Computational Auditory Scene Analysis and makes use of insights regarding the processing carried out by the early human auditory system in order to effectively suppress interference. These neurobiological insights have been adapted so as to produce a solution to the cocktail party problem that is practical from an engineering point of view. The proposed solution has been found to be robust under a wide range of realistic environmental conditions, including spatially distributed interference as well as reverberation.
Thesis
Doctor of Philosophy (PhD)
Advisors/Committee Members: Haykin, Simon, Electrical and Computer Engineering.
Subjects/Keywords: electrical and computer engineering; auditory; cocktail party problem
APA (6th Edition):
Wiklund, K. (2009). The Cocktail Party Problem: Solutions and Applications. (Doctoral Dissertation). McMaster University. Retrieved from http://hdl.handle.net/11375/17396
Chicago Manual of Style (16th Edition):
Wiklund, Karl. “The Cocktail Party Problem: Solutions and Applications.” 2009. Doctoral Dissertation, McMaster University. Accessed January 24, 2021.
http://hdl.handle.net/11375/17396.
MLA Handbook (7th Edition):
Wiklund, Karl. “The Cocktail Party Problem: Solutions and Applications.” 2009. Web. 24 Jan 2021.
Vancouver:
Wiklund K. The Cocktail Party Problem: Solutions and Applications. [Internet] [Doctoral dissertation]. McMaster University; 2009. [cited 2021 Jan 24].
Available from: http://hdl.handle.net/11375/17396.
Council of Science Editors:
Wiklund K. The Cocktail Party Problem: Solutions and Applications. [Doctoral Dissertation]. McMaster University; 2009. Available from: http://hdl.handle.net/11375/17396

Macquarie University
6.
Westermann, Adam.
Understanding speech in complex acoustic environments: the role of informational masking and auditory distance perception.
Degree: 2015, Macquarie University
URL: http://hdl.handle.net/1959.14/1071720
Theoretical thesis.
At foot of title page: Department of Linguistics, Faculty of Human Sciences and National Acoustic Laboratories, Australian Hearing.
Bibliography: pages 121-132.
1. General introduction – 2. The effect of spatial separation in distance on the intelligibility of speech in rooms – 3. The effect of a hearing impairment on source-distance-dependent speech intelligibility in rooms – 4. The influence of informational masking in reverberant, multi-talker environments – 5. The effect of nearby maskers in reverberant, multi-talker environments – 6. General summary and discussion.
One of the greatest challenges for the auditory system is communicating in environments where speech is degraded by multiple spatially distributed maskers and room reverberation. This "cocktail-party" situation and the related auditory mechanisms have been the topic of numerous studies. This thesis primarily investigated speech intelligibility in such environments, specifically considering the role of differences in distance between talkers and the contribution of informational masking (IM). The first two studies investigated the role of differences in distance between competing talkers on spatial release from masking (SRM) in normal-hearing (NH) and, subsequently, hearing-impaired (HI) listeners. Intelligibility improved for both NH and HI listeners when the masker was moved further away from the target. In contrast, when the target was moved further away and the maskers were kept near the listener, the results varied significantly across subjects. While intelligibility improved for some NH listeners, the HI listeners performed substantially worse. It was hypothesized that in this condition IM was caused by masker distraction rather than confusion. In the third study, the role of IM was investigated in a simulated cafeteria environment. Substantial IM effects were observed only when the target and masking talker were colocated and the same person. In conditions that resemble real life, no significant IM effects were found. This suggests that IM is of low relevance in real-life listening and is exaggerated by target-masker similarities and the colocated spatial configuration often used in previous listening tests. The final study investigated the effect of nearby masking talkers in a simulated cafeteria environment with NH and HI listeners. The study demonstrated that, for realistic conditions, nearby distracters introduce a significant amount of IM in both NH and HI subjects. However, the observed IM was likely not due to target-masker confusions, but rather caused by the nearby masker distracting the listener. Overall, this work suggests that (i) NH and HI listeners use distance-related cues in the cocktail-party environment, (ii) in such environments IM related to target-masker confusions is of little relevance, and (iii) nearby maskers introduce IM, likely due to distraction of attention. These findings contribute to our understanding of auditory processing and could potentially have implications on signal…
Advisors/Committee Members: Macquarie University. Department of Linguistics, National Acoustic Laboratories (Australia).
Subjects/Keywords: Auditory masking; Auditory perception; Speech perception; auditory distance perception; informational masking; energetic masking; speech intelligibility; cocktail party problem
APA (6th Edition):
Westermann, A. (2015). Understanding speech in complex acoustic environments: the role of informational masking and auditory distance perception. (Doctoral Dissertation). Macquarie University. Retrieved from http://hdl.handle.net/1959.14/1071720
Chicago Manual of Style (16th Edition):
Westermann, Adam. “Understanding speech in complex acoustic environments: the role of informational masking and auditory distance perception.” 2015. Doctoral Dissertation, Macquarie University. Accessed January 24, 2021.
http://hdl.handle.net/1959.14/1071720.
MLA Handbook (7th Edition):
Westermann, Adam. “Understanding speech in complex acoustic environments: the role of informational masking and auditory distance perception.” 2015. Web. 24 Jan 2021.
Vancouver:
Westermann A. Understanding speech in complex acoustic environments: the role of informational masking and auditory distance perception. [Internet] [Doctoral dissertation]. Macquarie University; 2015. [cited 2021 Jan 24].
Available from: http://hdl.handle.net/1959.14/1071720.
Council of Science Editors:
Westermann A. Understanding speech in complex acoustic environments: the role of informational masking and auditory distance perception. [Doctoral Dissertation]. Macquarie University; 2015. Available from: http://hdl.handle.net/1959.14/1071720

University of Maryland
7.
Krishnan, Lakshmi.
Neuromorphic model for sound source segregation.
Degree: Electrical Engineering, 2015, University of Maryland
URL: http://hdl.handle.net/1903/18155
While humans can easily segregate and track a speaker's voice in a loud, noisy environment, most modern speech recognition systems still perform poorly in loud background noise. The computational principles behind auditory source segregation in humans are not yet fully understood. In this dissertation, we develop a computational model for source segregation inspired by auditory processing in the brain. To support the key principles behind the computational model, we conduct a series of electroencephalography (EEG) experiments using both simple tone-based stimuli and a more natural speech stimulus.
Most source segregation algorithms utilize some form of prior information about the target speaker or use more than one simultaneous recording of the noisy speech mixtures. Other methods build models of the noise characteristics. Source segregation of simultaneous speech mixtures with a single microphone recording and no knowledge of the target speaker is still a challenge.
Using the principle of temporal coherence, we develop a novel computational model that exploits the difference in the temporal evolution of features belonging to different sources to perform unsupervised monaural source segregation. While using no prior information about the target speaker, this method can gracefully incorporate knowledge about the target speaker to further enhance the segregation. Through a series of EEG experiments, we collect neurological evidence to support the principle behind the model.
Aside from its unusual structure and computational innovations, the proposed model provides testable hypotheses about the physiological mechanisms of the remarkable perceptual ability of humans to segregate acoustic sources, and about its psychophysical manifestations in navigating complex sensory environments. Results from the EEG experiments provide further insight into the assumptions behind the model and provide motivation for future single-unit studies that can supply more direct evidence for the principle of temporal coherence.
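Temporal coherence here refers to the tendency of time-frequency channels driven by the same source to rise and fall together. The toy sketch below illustrates only that grouping principle; it is not the dissertation's model, and the anchor-channel choice and correlation threshold are assumptions.

```python
import numpy as np

def coherent_channels(spectrogram, anchor_channel, threshold=0.5):
    """Toy temporal-coherence grouping.

    spectrogram    : (n_channels, n_frames) magnitude spectrogram
    anchor_channel : index of a channel assumed to be dominated by the target
    Returns a boolean mask over channels whose envelopes correlate with the anchor
    channel's envelope above `threshold`; coherent channels are taken to belong to
    the same source.
    """
    anchor = spectrogram[anchor_channel]
    corr = np.array([np.corrcoef(chan, anchor)[0, 1] for chan in spectrogram])
    return corr >= threshold
```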
Advisors/Committee Members: Shamma, Shihab (advisor).
Subjects/Keywords: Electrical engineering; Neurosciences; auditory EEG; auditory scene analysis; cocktail party problem; sound source segregation; temporal coherence
APA (6th Edition):
Krishnan, L. (2015). Neuromorphic model for sound source segregation. (Thesis). University of Maryland. Retrieved from http://hdl.handle.net/1903/18155
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Krishnan, Lakshmi. “Neuromorphic model for sound source segregation.” 2015. Thesis, University of Maryland. Accessed January 24, 2021.
http://hdl.handle.net/1903/18155.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
Krishnan, Lakshmi. “Neuromorphic model for sound source segregation.” 2015. Web. 24 Jan 2021.
Vancouver:
Krishnan L. Neuromorphic model for sound source segregation. [Internet] [Thesis]. University of Maryland; 2015. [cited 2021 Jan 24].
Available from: http://hdl.handle.net/1903/18155.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
Krishnan L. Neuromorphic model for sound source segregation. [Thesis]. University of Maryland; 2015. Available from: http://hdl.handle.net/1903/18155
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

University of Lethbridge
8.
Grasse, Lukas Walter Neufeld.
Biologically-inspired auditory artificial intelligence for speech recognition in multi-talker environments.
Degree: 2020, University of Lethbridge
URL: http://hdl.handle.net/10133/5815
Understanding speech in the presence of distracting talkers is a difficult computational problem known as the cocktail party problem. Motivated by auditory processing in the human brain, this thesis developed a neural network that isolates the speech of a single talker given binaural input containing a target talker and multiple distractors. In this research the network is called a Binaural Speaker Isolation FFTNet, or BSINet for short. To compare the performance of BSINet with human performance at recognizing the target talker's speech under a varying number of distractors, a "cocktail party" dataset was designed and made available online, enabling direct comparison of network performance with human participant performance. Using the word error rate metric for evaluation, this research finds that BSINet performs comparably to the human participants. Thus BSINet provides a significant advance toward solving the challenging cocktail party problem.
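Word error rate, the evaluation metric named above, is the word-level edit distance (substitutions, insertions and deletions) between a reference transcript and a hypothesis, divided by the reference length. A minimal, recognizer-independent sketch:

```python
def word_error_rate(reference, hypothesis):
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = minimum edits to turn the first i reference words
    # into the first j hypothesis words.
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

# Example: one substitution in a four-word reference gives WER = 0.25.
print(word_error_rate("turn on the lights", "turn off the lights"))
```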
Subjects/Keywords: Speech Recognition; Denoising; Speaker Isolation; Cocktail Party Problem; Auditory selective attention; Neural networks (Computer science); Speech perception; Automatic speech recognition; Directional hearing; Auditory perception; Dissertations, Academic
APA (6th Edition):
Grasse, L. W. N. (2020). Biologically-inspired auditory artificial intelligence for speech recognition in multi-talker environments. (Thesis). University of Lethbridge. Retrieved from http://hdl.handle.net/10133/5815
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Grasse, Lukas Walter Neufeld. “Biologically-inspired auditory artificial intelligence for speech recognition in multi-talker environments.” 2020. Thesis, University of Lethbridge. Accessed January 24, 2021.
http://hdl.handle.net/10133/5815.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
Grasse, Lukas Walter Neufeld. “Biologically-inspired auditory artificial intelligence for speech recognition in multi-talker environments.” 2020. Web. 24 Jan 2021.
Vancouver:
Grasse LWN. Biologically-inspired auditory artificial intelligence for speech recognition in multi-talker environments. [Internet] [Thesis]. University of Lethbridge; 2020. [cited 2021 Jan 24].
Available from: http://hdl.handle.net/10133/5815.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
Grasse LWN. Biologically-inspired auditory artificial intelligence for speech recognition in multi-talker environments. [Thesis]. University of Lethbridge; 2020. Available from: http://hdl.handle.net/10133/5815
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation

Delft University of Technology
9.
Opdam, R.C.G. (author).
Binaural CASA algorithm for speech source localization: Advancements in noisy and reverberant situations.
Degree: 2010, Delft University of Technology
URL: http://resolver.tudelft.nl/uuid:7b397f85-9573-4940-892f-bcf9ae739545
In this thesis a binaural CASA localization algorithm is developed for implementation in a binaural hearing aid with downstream speech enhancement. Two binaural CASA localization algorithms, based on the Albani model, are proposed to enhance localization performance in noisy and reverberant acoustic environments. In the proposed extended Albani algorithm, the Albani model is extended with a zero-lag interaural coherence (IC) time-window pre-selection, detection of multiple sources per time window, coincidence detection between interaural level and time differences (ILD and ITD), and a lagged time-window comparison. A further addition, a binaural cue selector based on an inhibition process, yields the extended Albani algorithm with cue selection by inhibition. Simulations show that the extended Albani algorithm performs best in noisy situations at SNR levels down to -12 dB, and that the extended Albani algorithm with cue selection by inhibition performs best in reverberant situations up to a reverberation time of 2.0 s. Both proposed localization algorithms show better performance than presently known CASA methods in both noise and reverberation.
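The ITD cue used by these algorithms is the arrival-time difference of a source between the two ears. The sketch below shows the generic cross-correlation estimate of ITD, not the Albani-model implementation; the 1 ms lag limit is an assumption appropriate for a human-sized head.

```python
import numpy as np

def estimate_itd(left, right, fs, max_itd=0.001):
    """Estimate the interaural time difference by cross-correlating the two ears.

    left, right : binaural signals of equal length; fs : sample rate in Hz.
    Returns the lag (in seconds) of the cross-correlation peak, searched only over
    physically plausible lags (|ITD| <= max_itd). A positive value means the
    left-ear signal lags the right-ear signal.
    """
    max_lag = int(max_itd * fs)
    xcorr = np.correlate(left, right, mode="full")
    center = len(right) - 1                        # index of zero lag
    window = xcorr[center - max_lag:center + max_lag + 1]
    best = np.argmax(window) - max_lag             # lag in samples
    return best / fs
```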
Laboratory for Acoustical Imaging and Sound Control
Imaging Science & Technology
Applied Sciences
Advisors/Committee Members: Schlesinger, A. (mentor), Boone, M.M. (mentor).
Subjects/Keywords: Source localization; CASA; Binaural cue; Hearing aid; Speech intelligibility; Cocktail party problem; Inhibition; ILD; ITD; Noise; Reverberation; Albani
APA (6th Edition):
Opdam, R. C. G. (2010). Binaural CASA algorithm for speech source localization: Advancements in noisy and reverberant situations. (Masters Thesis). Delft University of Technology. Retrieved from http://resolver.tudelft.nl/uuid:7b397f85-9573-4940-892f-bcf9ae739545
Chicago Manual of Style (16th Edition):
Opdam, R C G. “Binaural CASA algorithm for speech source localization: Advancements in noisy and reverberant situations.” 2010. Masters Thesis, Delft University of Technology. Accessed January 24, 2021.
http://resolver.tudelft.nl/uuid:7b397f85-9573-4940-892f-bcf9ae739545.
MLA Handbook (7th Edition):
Opdam, R C G. “Binaural CASA algorithm for speech source localization: Advancements in noisy and reverberant situations.” 2010. Web. 24 Jan 2021.
Vancouver:
Opdam RCG. Binaural CASA algorithm for speech source localization: Advancements in noisy and reverberant situations. [Internet] [Masters thesis]. Delft University of Technology; 2010. [cited 2021 Jan 24].
Available from: http://resolver.tudelft.nl/uuid:7b397f85-9573-4940-892f-bcf9ae739545.
Council of Science Editors:
Opdam RCG. Binaural CASA algorithm for speech source localization: Advancements in noisy and reverberant situations. [Masters Thesis]. Delft University of Technology; 2010. Available from: http://resolver.tudelft.nl/uuid:7b397f85-9573-4940-892f-bcf9ae739545
10.
Puvvada, Venkata Naga Krishna Chaitanya.
CORTICAL REPRESENTATION OF SPEECH IN COMPLEX AUDITORY ENVIRONMENTS AND APPLICATIONS.
Degree: Electrical Engineering, 2017, University of Maryland
URL: http://hdl.handle.net/1903/20412
Being able to attend to and recognize speech or a particular sound in complex listening environments is a feat performed by humans effortlessly. The underlying neural mechanisms, however, remain unclear and cannot yet be emulated by artificial systems. Understanding the internal (cortical) representation of the external acoustic world is a key step in deciphering the mechanisms of human auditory processing. Further, understanding the neural representation of sound finds numerous applications in clinical research on psychiatric disorders with auditory processing deficits, such as schizophrenia.
In the first part of this dissertation, cortical activity from normal-hearing human subjects is recorded non-invasively using magnetoencephalography in two different real-life listening scenarios: first, when natural speech is distorted by reverberation as well as stationary additive noise; second, when the attended speech is degraded by the presence of multiple additional talkers in the background, simulating a cocktail party. Using natural speech affected by reverberation and noise, it was demonstrated that the auditory cortex maintains both distorted and distortion-free representations of speech. Additionally, we show that, while the neural representation of speech remained robust to additive noise in the absence of reverberation, noise had a detrimental effect in the presence of reverberation, suggesting differential mechanisms of speech processing for additive and reverberation distortions. In the cocktail party paradigm, we demonstrated that primary-like areas represent the external auditory world in terms of acoustics, whereas higher-order areas maintain an object-based representation. Further, it was demonstrated that background speech streams are represented as a single unsegregated auditory object. The results suggest that an object-based representation of the auditory scene emerges in higher-order auditory cortices.
In the second part of this dissertation, using electroencephalographic recordings from normal human subjects and patients suffering from schizophrenia, it was demonstrated, for the first time, that delta-band steady-state responses are more affected in schizophrenia patients than in healthy individuals, contrary to the prevailing dominance of gamma-band studies in the literature. Further, the results from this study suggest that the inadequate ability to sustain neural responses in this low frequency range may play a vital role in the mechanisms of auditory perceptual and cognitive deficits in schizophrenia.
Overall, this dissertation furthers the current understanding of the cortical representation of speech in complex listening environments and of how the auditory representation of sounds is affected in psychiatric disorders involving aberrant auditory processing.
Advisors/Committee Members: Simon, Jonathan Z (advisor).
Subjects/Keywords: Electrical engineering; Neurosciences; Acoustics; attention; cocktail party problem; neuroimaging; reverberation; schizophrenia; speech
APA (6th Edition):
Puvvada, V. N. K. C. (2017). CORTICAL REPRESENTATION OF SPEECH IN COMPLEX AUDITORY ENVIRONMENTS AND APPLICATIONS. (Thesis). University of Maryland. Retrieved from http://hdl.handle.net/1903/20412
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Chicago Manual of Style (16th Edition):
Puvvada, Venkata Naga Krishna Chaitanya. “CORTICAL REPRESENTATION OF SPEECH IN COMPLEX AUDITORY ENVIRONMENTS AND APPLICATIONS.” 2017. Thesis, University of Maryland. Accessed January 24, 2021.
http://hdl.handle.net/1903/20412.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
MLA Handbook (7th Edition):
Puvvada, Venkata Naga Krishna Chaitanya. “CORTICAL REPRESENTATION OF SPEECH IN COMPLEX AUDITORY ENVIRONMENTS AND APPLICATIONS.” 2017. Web. 24 Jan 2021.
Vancouver:
Puvvada VNKC. CORTICAL REPRESENTATION OF SPEECH IN COMPLEX AUDITORY ENVIRONMENTS AND APPLICATIONS. [Internet] [Thesis]. University of Maryland; 2017. [cited 2021 Jan 24].
Available from: http://hdl.handle.net/1903/20412.
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
Council of Science Editors:
Puvvada VNKC. CORTICAL REPRESENTATION OF SPEECH IN COMPLEX AUDITORY ENVIRONMENTS AND APPLICATIONS. [Thesis]. University of Maryland; 2017. Available from: http://hdl.handle.net/1903/20412
Note: this citation may be lacking information needed for this citation format:
Not specified: Masters Thesis or Doctoral Dissertation
11.
Meléndez, Alejandro Vélez.
Acoustic communication in noisy environments: Signal recognition in fluctuating backgrounds.
Degree: PhD, Ecology, Evolution and Behavior, 2012, University of Minnesota
URL: http://purl.umn.edu/127314
Following one conversation in multi-talker environments is a difficult perceptual task that we encounter frequently. How the human auditory system solves this problem has been the focus of research for decades. While many nonhuman animals also communicate in noisy social aggregations, we know very little about how they solve analogous problems. Dip listening, an ability to catch 'acoustic glimpses' of target signals when the level of fluctuating backgrounds momentarily drops, represents one way by which receivers may recognize signals in noise. It has even been suggested that animals may be adapted to exploit level fluctuations of the natural soundscape (i.e., the mixture of sounds in the environment) to recognize communication signals. This hypothesis, however, is not yet supported by empirical evidence because (i) we know little about the characteristics of level fluctuations in natural soundscapes, (ii) very few studies have investigated the ability of nonhuman animals to recognize communication signals in fluctuating backgrounds, and (iii) no study has investigated signal recognition in the presence of noises with the level fluctuations of natural soundscapes. I addressed these gaps in knowledge using gray treefrogs (Hyla chrysoscelis) and green treefrogs (Hyla cinerea) as model systems. I found that level fluctuations of the noise generated in social aggregations vary across species. I also show that gray treefrogs, but not green treefrogs, have an ability to listen in the dips of fluctuating backgrounds when recognizing communication signals. This ability, however, is not specifically 'tuned' to exploit level fluctuations of natural soundscapes. Together, my findings offer little support for the hypothesis that receivers are adapted to exploit level fluctuations of the natural soundscape to recognize communication signals.
Subjects/Keywords: Animal communication; Cocktail party problem; Cope's gray treefrogs; Green treefrogs; Hyla chrysoscelis; Hyla cinerea; Ecology, Evolution and Behavior
APA (6th Edition):
Meléndez, A. V. (2012). Acoustic communication in noisy environments: Signal recognition in fluctuating backgrounds. (Doctoral Dissertation). University of Minnesota. Retrieved from http://purl.umn.edu/127314
Chicago Manual of Style (16th Edition):
Meléndez, Alejandro Vélez. “Acoustic communication in noisy environments: Signal recognition in fluctuating backgrounds.” 2012. Doctoral Dissertation, University of Minnesota. Accessed January 24, 2021.
http://purl.umn.edu/127314.
MLA Handbook (7th Edition):
Meléndez, Alejandro Vélez. “Acoustic communication in noisy environments: Signal recognition in fluctuating backgrounds.” 2012. Web. 24 Jan 2021.
Vancouver:
Meléndez AV. Acoustic communication in noisy environments: Signal recognition in fluctuating backgrounds. [Internet] [Doctoral dissertation]. University of Minnesota; 2012. [cited 2021 Jan 24].
Available from: http://purl.umn.edu/127314.
Council of Science Editors:
Meléndez AV. Acoustic communication in noisy environments: Signal recognition in fluctuating backgrounds. [Doctoral Dissertation]. University of Minnesota; 2012. Available from: http://purl.umn.edu/127314