Philip N. Garner: Publications

[1] Mutian He and Philip N. Garner. The interpreter understands your meaning: End-to-end spoken language understanding aided by speech translation. In Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore, December 2023. To appear. [ bib | arXiv ]
[2] Pavel Korshunov, Haolin Chen, Philip N. Garner, and Sébastien Marcel. Vulnerability of automatic identity recognition to audio-visual deepfakes. In IEEE International Joint Conference on Biometrics, Ljubljana, Slovenia, September 2023. Best poster award recipient. [ bib ]
[3] Mutian He and Philip N. Garner. Can ChatGPT detect intent? Evaluating large language models for spoken language understanding. In Proceedings of Interspeech, pages 1109--1113, Dublin, Ireland, August 2023. [ bib | DOI | arXiv ]
[4] Haolin Chen and Philip N. Garner. Diffusion transformer for adaptive text-to-speech. In Proceedings of the 12th ISCA Speech Synthesis Workshop, pages 157--162, Grenoble, France, August 2023. [ bib | DOI ]
[5] Haolin Chen, Mutian He, Louise Coppieters de Gibson, and Philip N. Garner. The Idiap speech synthesis system for the Blizzard Challenge 2023. In Proceedings of the 18th Blizzard Challenge Workshop, pages 93--97, Grenoble, France, August 2023. [ bib | DOI ]
[6] Ulrich Thy Jensen, Dominic Rohner, Olivier Bornet, Daniel Carron, Philip Garner, Dimitra Loupi, and John Antonakis. Combating COVID-19 with charisma: Evidence on governor speeches in the United States. The Leadership Quarterly, 2023. In press. [ bib | DOI ]
[7] Alexandre Bittar and Philip N. Garner. Surrogate gradient spiking neural networks as encoders for large vocabulary continuous speech recognition, December 2022. [ bib | arXiv ]
[8] Alexandre Bittar and Philip N. Garner. Bayesian recurrent units and the forward-backward algorithm. In Proceedings of Interspeech, Incheon, Korea, September 2022. [ bib | DOI ]
[9] Louise Coppieters de Gibson and Philip N. Garner. Low-level physiological implications of end-to-end learning for speech recognition. In Proceedings of Interspeech, Incheon, Korea, September 2022. [ bib | DOI ]
[10] Alexandre Bittar and Philip N. Garner. A surrogate gradient spiking baseline for speech command recognition. Frontiers in Neuroscience, 16, August 2022. [ bib | DOI ]
[11] Julian Linke, Philip N. Garner, Gernot Kubin, and Barbara Schuppler. Conversational speech recognition needs data? Experiments with Austrian German. In Proceedings of the 13th Language Resources and Evaluation Conference, pages 4684--4691, Marseille, France, June 2022. [ bib | .pdf ]
[12] Bastian Schnell and Philip N. Garner. Investigating a neural all pass warp in modern TTS applications. Speech Communication, 138:26--37, March 2022. Open Access. [ bib | DOI ]
[13] Abbas Khosravani, Philip N. Garner, and Alexandros Lazaridis. Learning to translate low-resourced Swiss German dialectal speech into standard German text. In Proceedings of the IEEE workshop on Automatic Speech Recognition and Understanding, Cartagena, Colombia, December 2021. [ bib | DOI | .pdf ]
[14] Abbas Khosravani, Philip N. Garner, and Alexandros Lazaridis. An evaluation benchmark for automatic speech recognition of German-English code-switching. In Proceedings of the IEEE workshop on Automatic Speech Recognition and Understanding, Cartagena, Colombia, December 2021. [ bib | DOI | .pdf ]
[15] Abbas Khosravani, Philip N. Garner, and Alexandros Lazaridis. Modeling dialectal variation for Swiss German automatic speech recognition. In Proceedings of Interspeech, Brno, Czechia, September 2021. [ bib | DOI ]
[16] Philip N. Garner and Sibo Tong. A Bayesian approach to recurrence in neural networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(8):2527--2537, August 2021. [ bib | DOI | arXiv ]
[17] Bastian Schnell and Philip N. Garner. Improving emotional TTS with an emotion intensity input from unsupervised extraction. In Proceedings of the 11th ISCA Speech Synthesis Workshop, pages 60--65, Hungary, August 2021. [ bib | DOI ]
[18] Alexandre Bittar and Philip N. Garner. A Bayesian interpretation of the light gated recurrent unit. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Toronto, Canada, June 2021. [ bib | DOI | .pdf ]
[19] Niccolò Antonello and Philip N. Garner. A t-distribution based operator for enhancing out of distribution robustness of neural network classifiers. IEEE Signal Processing Letters, 27:1070--1074, June 2020. [ bib | DOI | arXiv ]
[20] Lorenzo Tarantino, Philip N. Garner, and Alexandros Lazaridis. Self-attention for speech emotion recognition. In Proceedings of Interspeech, pages 2578--2582, Graz, Austria, September 2019. [ bib | DOI ]
[21] Sibo Tong, Apoorv Vyas, Philip N. Garner, and Hervé Bourlard. Unbiased semi-supervised LF-MMI training using dropout. In Proceedings of Interspeech, pages 1576--1580, Graz, Austria, September 2019. [ bib | DOI ]
[22] Bastian Schnell and Philip N. Garner. Neural VTLN for speaker adaptation in TTS. In Proceedings of the 10th ISCA Speech Synthesis Workshop, pages 29--34, Vienna, Austria, September 2019. [ bib | DOI ]
[23] Philip N. Garner, Olivier Bornet, Dimitra Loupi, Dominic Rohner, and John Antonakis. Deep learning of charisma. Presented as a demonstration at SwissText 2019, the Fourth Swiss Text Analytics Conference, June 2019. Winterthur, Switzerland. [ bib ]
[24] François Marelli, Bastian Schnell, Hervé Bourlard, Thierry Dutoit, and Philip N. Garner. An end-to-end network to synthesize intonation using a generalized command response model. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Brighton, UK, May 2019. [ bib | DOI | .pdf ]
[25] Alexandre Nanchen and Philip N. Garner. Empirical evaluation and combination of punctuation prediction models applied to broadcast news. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Brighton, UK, May 2019. [ bib | DOI | .pdf ]
[26] Sibo Tong, Philip N. Garner, and Hervé Bourlard. An investigation of multilingual ASR using end-to-end LF-MMI. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Brighton, UK, May 2019. [ bib | DOI | .pdf ]
[27] Gaetan Ramet, Philip N. Garner, Michael Baeriswyl, and Alexandros Lazaridis. Context-aware attention mechanism for speech emotion recognition. In IEEE Workshop on Spoken Language Technology, Athens, Greece, December 2018. [ bib | DOI | .pdf ]
[28] Sibo Tong, Philip N. Garner, and Hervé Bourlard. Multilingual training and cross-lingual adaptation on CTC-based acoustic model. Speech Communication, 104:39--46, November 2018. [ bib | DOI | arXiv | .pdf ]
[29] Branislav Gerazov, Gérard Bailly, Omar Mohammed, Yi Xu, and Philip N. Garner. Embedding context-dependent variations of prosodic contours using variational encoding for decomposing the structure of speech prosody. In Workshop on Prosody and Meaning: Information Structure and Beyond, Aix-en-Provence, France, November 2018. ProMAix. [ bib | .pdf ]
[30] Bastian Schnell and Philip N. Garner. A neural model to predict parameters for a generalized command response model of intonation. In Proceedings of Interspeech, pages 3147--3151, Hyderabad, India, September 2018. [ bib | DOI ]
[31] Bastian Schnell and Philip N. Garner. A neural model to predict parameters for a generalized command response model of intonation. Presented at the 5th Machine Learning in Speech and Language Processing Workshop (MLSLP-2018) Satellite workshop of Interspeech 2018, Hyderabad, India, September 2018. [ bib | .pdf ]
[32] Sibo Tong, Philip N. Garner, and Hervé Bourlard. Fast language adaptation using phonological information. In Proceedings of Interspeech, pages 2459--2463, Hyderabad, India, September 2018. [ bib | DOI ]
[33] Pierre-Edouard Honnet, Branislav Gerazov, Aleksandar Gjoreski, and Philip N. Garner. Intonation modelling using a muscle model and perceptually weighted matching pursuit. Speech Communication, 97:81--93, March 2018. [ bib | DOI | .pdf ]
[34] Sibo Tong, Philip N. Garner, and Hervé Bourlard. An investigation of deep neural networks for multilingual speech recognition training and adaptation. In Proceedings of Interspeech, pages 714--718, Stockholm, Sweden, August 2017. [ bib | DOI | .pdf ]
[35] Renars Liepins, Ulrich Germann, Guntis Barzdins, Alexandra Birch, Steve Renals, Susanne Weber, Peggy van der Kreeft, Herve Bourlard, João Prieto, Ondrej Klejch, Peter Bell, Alexandros Lazaridis, Alfonso Mendes, Sebastian Riedel, Mariana S. C. Almeida, Pedro Balage, Shay B. Cohen, Tomasz Dwojak, Philip N. Garner, Andreas Giefer, Marcin Junczys-Dowmunt, Hina Imran, David Nogueira, Ahmed Ali, Sebastião Miranda, Andrei Popescu-Belis, Lesly Miculicich Werlen, Nikos Papasarantopoulos, Abiola Obamuyide, Clive Jones, Fahim Dalvi, Andreas Vlachos, Yang Wang, Sibo Tong, Rico Sennrich, Nikolaos Pappas, Shashi Narayan, Marco Damonte, Nadir Durrani, Sameer Khurana, Ahmed Abdelali, Hassan Sajjad, Stephan Vogel, David Sheppey, Chris Hernon, and Jeff Mitchell. The SUMMA platform prototype. In Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics, pages 116--119, Valencia, Spain, April 2017. [ bib | http ]
[36] Milos Cernak, Alexandros Lazaridis, Afsaneh Asaei, and Philip N. Garner. Composition of deep and spiking neural networks for very low bit rate speech coding. IEEE/ACM Transactions on Audio, Speech and Language Processing, 24(12):2301--2312, December 2016. [ bib | DOI | arXiv | .pdf ]
[37] Alexandros Lazaridis, Ivan Himawan, Petr Motlicek, Iosif Mporas, and Philip N. Garner. Investigating cross-lingual multi-level adaptive networks: The importance of the correlation of source and target languages. In Proceedings of the International Workshop on Spoken Language Translation, Seattle, WA, USA, December 2016. [ bib | http ]
[38] Milos Cernak and Philip N. Garner. Phonvoc: A phonetic and phonological vocoding toolkit. In Proceedings of Interspeech, San Francisco, California, USA, September 2016. [ bib | DOI | .pdf ]
[39] Alexandros Lazaridis, Milos Cernak, and Philip N. Garner. Probabilistic amplitude demodulation features in speech synthesis for improving prosody. In Proceedings of Interspeech, San Francisco, California, USA, September 2016. [ bib | DOI | .pdf ]
[40] Milos Cernak, Afsaneh Asaei, Pierre-Edouard Honnet, Philip N. Garner, and Hervé Bourlard. Sound pattern matching for automatic prosodic event detection. In Proceedings of Interspeech, San Francisco, California, USA, September 2016. [ bib | DOI | .pdf ]
[41] Jean-Philippe Goldman, Pierre-Edouard Honnet, Robert Clark, Philip N. Garner, Maria Ivanova, Alexandros Lazaridis, Hui Liang, Tiago Macedo, Beat Pfister, Manuel Sam Ribeiro, Eric Wehrli, and Junichi Yamagishi. The SIWIS database: a multilingual speech database with acted emphasis. In Proceedings of Interspeech, San Francisco, California, USA, September 2016. [ bib | DOI | .pdf ]
[42] Alexandros Lazaridis, Milos Cernak, Pierre-Edouard Honnet, and Philip N. Garner. Investigating spectral amplitude modulation phase hierarchy features in speech synthesis. In Proceedings of the 9th ISCA Speech Synthesis Workshop, Sunnyvale, CA, USA, September 2016. [ bib | DOI | .pdf ]
[43] Pierre-Edouard Honnet and Philip N. Garner. Emphasis recreation for TTS using intonation atoms. In Proceedings of the 9th ISCA Speech Synthesis Workshop, Sunnyvale, CA, USA, September 2016. [ bib | DOI | .pdf ]
[44] Branislav Gerazov, Aleksandar Gjoreski, Aleksandar Melov, Pierre-Edouard Honnet, Zoran Ivanovski, and Philip N. Garner. Unified prosody model based on atom decomposition for emphasis detection. In Proceedings of ETAI, Struga, Macedonia, September 2016. [ bib | .pdf ]
[45] Branislav Gerazov and Philip N. Garner. An agonist-antagonist pitch production model. In Andrey Ronzhin, Rodmonga Potapova, and Géza Németh, editors, Speech and Computer, volume 9811 of Lecture Notes in Artificial Intelligence, pages 84--91. Springer International Publishing, Budapest, Hungary, August 2016. 18th International Conference, SPECOM 2016. [ bib | DOI | .pdf ]
[46] Milan Sečujski, Branislav Gerazov, Tamás Gábor Csapó, Vlado Delić, Philip N. Garner, Aleksandar Gjoreski, David Guennec, Zoran Ivanovski, Aleksandar Melov, Géza Németh, Ana Stojković, and György Szaszák. Design of a speech corpus for research on cross-lingual prosody transfer. In Andrey Ronzhin, Rodmonga Potapova, and Géza Németh, editors, Speech and Computer, volume 9811 of Lecture Notes in Artificial Intelligence, pages 199--206. Springer International Publishing, Budapest, Hungary, August 2016. 18th International Conference, SPECOM 2016. [ bib | DOI | .pdf ]
[47] Tamás Gábor Csapó, Géza Németh, Milos Cernak, and Philip N. Garner. Modeling unvoiced sounds in statistical parametric speech synthesis with a continuous vocoder. In Proceedings of the European Signal Processing Conference, Budapest, Hungary, August 2016. [ bib | DOI | .pdf ]
[48] Branislav Gerazov and Philip N. Garner. An investigation of muscle models for physiologically based intonation modelling. In Proceedings of the 23rd Telecommunications Forum, pages 468--471, Belgrade, Serbia, November 2015. [ bib | DOI | .pdf ]
[49] Branislav Gerazov, Pierre-Edouard Honnet, Aleksandar Gjoreski, and Philip N. Garner. Weighted correlation based atom decomposition intonation modelling. In Proceedings of Interspeech, pages 1601--1605, Dresden, Germany, September 2015. [ bib | .pdf ]
[50] Alexandros Lazaridis, Blaise Potard, and Philip N. Garner. DNN-based speech synthesis: Importance of input features and training data. In Andrey Ronzhin, Rodmonga Potapova, and Nikos Fakotakis, editors, Speech and Computer, volume 9319 of Lecture Notes in Computer Science, pages 193--200. Springer International Publishing, Athens, Greece, September 2015. 17th International Conference, SPECOM 2015. [ bib | DOI | .pdf ]
[51] Mohammad J. Taghizadeh, Afsaneh Asaei, Saeid Haghighatshoar, Philip N. Garner, and Hervé Bourlard. Spatial sound localization via multipath Euclidean distance matrix recovery. IEEE Journal of Selected Topics in Signal Processing, 9(5):802--814, August 2015. Issue on Spatial Audio. [ bib | DOI | .pdf ]
[52] Milos Cernak, Philip N. Garner, Alexandros Lazaridis, Petr Motlicek, and Xingyu Na. Incremental syllable-context phonetic vocoding. IEEE Transactions on Audio, Speech and Language Processing, 23(6):1019--1030, June 2015. [ bib | DOI | .pdf ]
[53] Pierre-Edouard Honnet, Branislav Gerazov, and Philip N. Garner. Atom decomposition-based intonation modelling. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 4744--4748, Brisbane, Australia, April 2015. [ bib | DOI | .pdf ]
[54] Milos Cernak, Blaise Potard, and Philip N. Garner. Phonological vocoding using artificial neural networks. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 4844--4848, Brisbane, Australia, April 2015. [ bib | DOI | .pdf ]
[55] Mohammad J. Taghizadeh, Saeid Haghighatshoar, Afsaneh Asaei, Philip N. Garner, and Hervé Bourlard. Robust microphone placement for source localization from noisy distance measurements. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Brisbane, Australia, April 2015. [ bib | DOI | .pdf ]
[56] Mohammad J. Taghizadeh, Reza Parhizkar, Philip N. Garner, Hervé Bourlard, and Afsaneh Asaei. Ad hoc microphone array calibration: Euclidean distance matrix completion algorithm and theoretical guarantees. Signal Processing, 107:123--140, February 2015. Special Issue on ad hoc microphone arrays and wireless acoustic sensor networks. [ bib | DOI | arXiv | .pdf ]
[57] Petr Motlicek, David Imseng, Blaise Potard, Philip N. Garner, and Ivan Himawan. Exploiting foreign resources for DNN-based ASR. EURASIP Journal on Audio, Speech, and Music Processing, 2015(17), 2015. [ bib | DOI | .pdf ]
[58] György Szaszák, Tamás Gábor Csapó, Philip N. Garner, Branislav Gerazov, Zoran Ivanovski, Géza Németh, Bálint Tóth, Milan Sečujski, and Vlado Delić. The SP2 SCOPES project on speech prosody. In DOGS2014 - Digital speech and image processing, Novi Sad, Serbia, October 2014. [ bib | .pdf ]
[59] Philip N. Garner, Rob Clark, Jean-Philippe Goldman, Pierre-Edouard Honnet, Maria Ivanova, Alexandros Lazaridis, Hui Liang, Beat Pfister, Manuel Sam Ribeiro, Eric Wehrli, and Junichi Yamagishi. Translation and prosody in Swiss languages. Nouveaux cahiers de linguistique française, 31, September 2014. 3rd Swiss Workshop on Prosody, Geneva, September 2014. [ bib | .pdf ]
[60] Alexandros Lazaridis and Philip N. Garner. Syllable-based regional Swiss French accent identification using prosodic features. Nouveaux cahiers de linguistique française, 31, September 2014. 3rd Swiss Workshop on Prosody, Geneva, September 2014. [ bib | .pdf ]
[61] Pierre-Edouard Honnet and Philip N. Garner. Importance of prosody in Swiss French accent for speech synthesis. Nouveaux cahiers de linguistique française, 31, September 2014. 3rd Swiss Workshop on Prosody, Geneva, September 2014. [ bib | .pdf ]
[62] Philip N. Garner, David Imseng, and Thomas Meyer. Automatic speech recognition and translation of a Swiss German dialect: Walliserdeutsch. In Proceedings of Interspeech, Singapore, September 2014. [ bib | .pdf ]
[63] Milos Cernak, Alexandros Lazaridis, Philip N. Garner, and Petr Motlicek. Stress and accent transmission in HMM-based syllable-context very low bit rate speech coding. In Proceedings of Interspeech, Singapore, September 2014. [ bib | .pdf ]
[64] Mohammad J. Taghizadeh, Philip N. Garner, and Hervé Bourlard. Enhanced diffuse field model for ad-hoc microphone array calibration. Signal Processing, 101:242--255, August 2014. [ bib | DOI | .pdf ]
[65] Alexandros Lazaridis, Elie Khoury, Jean-Philippe Goldman, Mathieu Avanzi, Sébastien Marcel, and Philip N. Garner. Swiss French regional accent identification. In Proceedings of Odyssey 2014: The Speaker and Language Recognition Workshop, Joensuu, Finland, June 2014. [ bib | .pdf ]
[66] Pierre-Edouard Honnet, Alexandros Lazaridis, Jean-Philippe Goldman, and Philip N. Garner. Prosody in Swiss French accents: Investigation using analysis by synthesis. In Proceedings of the 7th Speech Prosody Conference, Dublin, Ireland, May 2014. [ bib | .pdf ]
[67] Alexandros Lazaridis, Pierre-Edouard Honnet, and Philip N. Garner. SVR vs MLP for phone duration modelling in HMM-based speech synthesis. In Proceedings of the 7th Speech Prosody Conference, Dublin, Ireland, May 2014. [ bib | .pdf ]
[68] Mohammad J. Taghizadeh, Afsaneh Asaei, Philip N. Garner, and Hervé Bourlard. Ad-hoc microphone array calibration from partial distance measurements. In Proceedings of the 4th Joint Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA 2014), Nancy, France, May 2014. Nominated for best paper award. [ bib | DOI | .pdf ]
[69] Lakshmi Saheer, Junichi Yamagishi, Philip N. Garner, and John Dines. Combining vocal tract length normalization with hierarchical linear transformations. IEEE Journal of Selected Topics in Signal Processing, 8(2):262--272, April 2014. Special Issue on Statistical Parametric Speech Synthesis. [ bib | DOI | .pdf ]
[70] David Imseng, Petr Motlicek, Hervé Bourlard, and Philip N. Garner. Using out-of-language data to improve an under-resourced speech recognizer. Speech Communication, 56:142--151, January 2014. [ bib | DOI | .pdf ]
[71] Steve Renals, Jean Carletta, Keith Edwards, Hervé Bourlard, Phil Garner, Andrei Popescu-Belis, Dietrich Klakow, Andrey Girenko, Volha Petukova, Philippe Wacker, Andrew Joscelyne, Costis Kompis, Simon Aliwell, William Stevens, and Youssef Sabbah. ROCKIT: Roadmap for conversational interaction technologies. In Proceedings of the 2014 Workshop on Roadmapping the Future of Multimodal Interaction Research Including Business Opportunities and Challenges, pages 39--42, Istanbul, Turkey, 2014. ACM. [ bib ]
[72] David Imseng, Petr Motlicek, Philip N. Garner, and Hervé Bourlard. Impact of deep MLP architecture on different acoustic modeling techniques for under-resourced speech recognition. In Proceedings of the IEEE workshop on Automatic Speech Recognition and Understanding, pages 332--337, Olomouc, Czech Republic, December 2013. [ bib | DOI | .pdf ]
[73] György Szaszák and Philip N. Garner. Evaluating intra- and crosslingual adaptation for non-native speech recognition in a bilingual environment. In Proceedings of the IEEE International Conference on Cognitive Infocommunications, pages 357--362, Budapest, Hungary, December 2013. [ bib | DOI | .pdf ]
[74] David Imseng, Hervé Bourlard, John Dines, Philip N. Garner, and Mathew Magimai.-Doss. Applying multi- and cross-lingual stochastic phone space transformations to non-native speech recognition. IEEE Transactions on Audio, Speech and Language Processing, 21(8):1713--1726, August 2013. [ bib | DOI | http ]
[75] Milos Cernak, Xingyu Na, and Philip N. Garner. Syllable-based pitch encoding for low bit rate speech coding with recognition/synthesis architecture. In Proceedings of Interspeech, Lyon, France, August 2013. [ bib | DOI | .pdf ]
[76] Petr Motlicek, David Imseng, and Philip N. Garner. Crosslingual tandem-SGMM: Exploiting out-of-language data for acoustic model and feature level adaptation. In Proceedings of Interspeech, Lyon, France, August 2013. [ bib | DOI | .pdf ]
[77] Mohammad J. Taghizadeh, Reza Parhizkar, Philip N. Garner, and Hervé Bourlard. Euclidean distance matrix completion for ad-hoc microphone array calibration. In Proceedings IEEE International Conference On Digital Signal Processing, pages 1--7, Santorini, Greece, July 2013. [ bib | DOI | .pdf ]
[78] Milos Cernak, Petr Motlicek, and Philip N. Garner. On the (un)importance of the contextual factors in HMM-based speech synthesis and coding. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 8140--8143, Vancouver, Canada, May 2013. [ bib | DOI | .pdf ]
[79] Petr Motlicek, Philip N. Garner, Namhoon Kim, and Jeongmi Cho. Accent adaptation using subspace Gaussian mixture models. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 7170--7174, Vancouver, Canada, May 2013. [ bib | DOI | .pdf ]
[80] Philip N. Garner, Milos Cernak, and Petr Motlicek. A simple continuous pitch estimation algorithm. IEEE Signal Processing Letters, 20(1):102--105, January 2013. [ bib | DOI | .pdf ]
[81] David Imseng, Hervé Bourlard, Holger Caesar, Philip N. Garner, Gwénolé Lecorvé, and Alexandre Nanchen. Mediaparl: Bilingual mixed language accented speech database. In IEEE Workshop on Spoken Language Technology, pages 263--268, Miami, Florida, USA, December 2012. [ bib | DOI | .pdf ]
[82] Cong-Thanh Do, Mohammad J. Taghizadeh, and Philip N. Garner. Combining cepstral normalization and cochlear implant-like speech processing for microphone array-based speech recognition. In IEEE Workshop on Spoken Language Technology, pages 137--142, Miami, Florida, USA, December 2012. [ bib | DOI | .pdf ]
[83] Lakshmi Saheer, John Dines, and Philip N. Garner. Vocal tract length normalization for statistical parametric speech synthesis. IEEE Transactions on Audio, Speech and Language Processing, 20(7):2134--2148, September 2012. [ bib | DOI | .pdf ]
[84] David Imseng, John Dines, Petr Motlicek, Philip N. Garner, and Hervé Bourlard. Comparing different acoustic modeling techniques for multilingual boosting. In Proceedings of Interspeech, Portland, Oregon, September 2012. [ bib | DOI | .pdf ]
[85] Mohammad J. Taghizadeh, Philip N. Garner, and Hervé Bourlard. Microphone array beampattern characterization for hands-free speech applications. In Proceedings of the Seventh IEEE Sensor Array and Multichannel Signal Processing Workshop, pages 465--468, Hoboken, NJ, USA, June 2012. [ bib | DOI | .pdf ]
[86] David Imseng, Hervé Bourlard, and Philip N. Garner. Boosting under-resourced speech recognizers by exploiting out of language data - case study on Afrikaans. In Proceedings of the 3rd International Workshop on Spoken Languages Technologies for Under-resourced Languages, pages 60--67, Cape Town, South Africa, May 2012. [ bib | .pdf ]
[87] Lakshmi Saheer, Junichi Yamagishi, Philip N. Garner, and John Dines. Combining vocal tract length normalization with hierarchial linear transformations. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 4493--4496, Kyoto, Japan, March 2012. [ bib | DOI | .pdf ]
[88] David Imseng, Hervé Bourlard, and Philip N. Garner. Using KL-divergence and multilingual information to improve ASR for under-resourced languages. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 4869--4872, Kyoto, Japan, March 2012. [ bib | DOI | .pdf ]
[89] Thomas Hain, Lukáš Burget, John Dines, Philip N. Garner, František Grézl, Asmaa El Hannani, Marijn Huijbregts, Martin Karafiát, Mike Lincoln, and Vincent Wan. Transcribing meetings with the AMIDA systems. IEEE Transactions on Audio, Speech and Language Processing, 20(2):486--498, February 2012. [ bib | DOI | .pdf ]
[90] Philip N. Garner. Cepstral normalisation and the signal to noise ratio spectrum in automatic speech recognition. Speech Communication, 53(8):991--1001, October 2011. [ bib | DOI | .pdf ]
[91] Hervé Bourlard, John Dines, Mathew Magimai-Doss, Philip N. Garner, David Imseng, Petr Motlicek, Hui Liang, Lakshmi Saheer, and Fabio Valente. Current trends in multilingual speech processing. Sādhanā, 36(5):885--915, October 2011. Invited paper for special issue on the topic of Speech Communication and Signal Processing. [ bib | DOI | .pdf ]
[92] Philip N. Garner. Bayesian Approaches to Uncertainty in Speech Processing. Phd by publication, School of Computing Sciences, University of East Anglia, September 2011. Awarded July 2012. [ bib | .pdf ]
[93] David Imseng, Hervé Bourlard, John Dines, Philip N. Garner, and Mathew Magimai.-Doss. Improving non-native ASR through stochastic multilingual phoneme space transformations. In Proceedings of Interspeech, Florence, Italy, August 2011. [ bib | DOI | .pdf ]
[94] Andrei Popescu-Belis, Majid Yazdani, Alexandre Nanchen, and Philip N. Garner. A speech-based just-in-time retrieval system using semantic search. In Proceedings of the ACL 2011 System Demonstrations, pages 80--86, Portland, OR, USA, June 2011. [ bib | http ]
[95] Andrei Popescu-Belis, Majid Yazdani, Alexandre Nanchen, and Philip N. Garner. A just-in-time retrieval system for dialogues or monologues. In Proceedings of the 12th Annual SIGDial Meeting on Discourse and Dialogue, pages 350--352, Portland, OR, USA, June 2011. [ bib | .pdf ]
[96] Mohammad J. Taghizadeh, Philip N. Garner, Hervé Bourlard, Hamid R. Abutalebi, and Afsaneh Asaei. An integrated framework for multi-channel multi-source localization and voice activity detection. In Proceedings of The Third Joint Workshop on Hands-free Speech Communication and Microphone Arrays, pages 92--97, Edinburgh, UK, May 2011. [ bib | DOI | .pdf ]
[97] Thomas Hain and Philip N. Garner. Speech recognition. In Steve Renals, Hervé Bourlard, Jean Carletta, and Andrei Popescu-Belis, editors, Multimodal Signal Processing: Human Interactions in Meetings, chapter 5. Cambridge University Press, The Edinburgh Building, Cambridge CB2 2RU, UK, 2011. [ bib ]
[98] Afsaneh Asaei, Philip N. Garner, and Hervé Bourlard. Sparse component analysis for speech recognition in multi-speaker environment. In Proceedings of Interspeech, Makuhari, Japan, September 2010. [ bib | DOI | .pdf ]
[99] Petr Motlicek, Fabio Valente, and Philip N. Garner. English spoken term detection in multilingual recordings. In Proceedings of Interspeech, Makuhari, Japan, September 2010. [ bib | DOI | .pdf ]
[100] Danil Korchagin, Philip N. Garner, and Petr Motlicek. Hands free audio analysis from home entertainment. In Proceedings of Interspeech, Makuhari, Japan, September 2010. [ bib | DOI | .pdf ]
[101] Thomas Hain, Lukáš Burget, John Dines, Philip N. Garner, Asmaa El Hannani, Marijn Huijbregts, Martin Karafiát, Mike Lincoln, and Vincent Wan. The AMIDA 2009 meeting transcription system. In Proceedings of Interspeech, Makuhari, Japan, September 2010. [ bib | DOI | .pdf ]
[102] Philip N. Garner and John Dines. Tracter: A lightweight dataflow framework. In Proceedings of Interspeech, Makuhari, Japan, September 2010. [ bib | DOI | .pdf ]
[103] Lakshmi Saheer, John Dines, Philip N. Garner, and Hui Liang. Implementation of VTLN for statistical speech synthesis. In Proceedings of the 7th ISCA Speech Synthesis Workshop, Kyoto, Japan, September 2010. [ bib | .pdf ]
[104] Mirjam Wester, John Dines, Matthew Gibson, Hui Liang, Yi-Jian Wu, Lakshmi Saheer, Simon King, Keiichiro Oura, Philip N. Garner, William Byrne, Yong Guan, Teemu Hirsimäki, Reima Karhila, Mikko Kurimo, Matt Shannon, Sayaka Shiota, Jilei Tian, Keiichi Tokuda, and Junichi Yamagishi. Speaker adaptation and the evaluation of speaker similarity in the EMIME speech-to-speech translation project. In Proceedings of the 7th ISCA Speech Synthesis Workshop, Kyoto, Japan, September 2010. [ bib | .pdf ]
[105] Mikko Kurimo, William Byrne, John Dines, Philip N. Garner, Matthew Gibson, Yong Guan, Teemu Hirsimäki, Reima Karhila, Simon King, Hui Liang, Keiichiro Oura, Lakshmi Saheer, Matt Shannon, Sayaka Shiota, Jilei Tian, Keiichi Tokuda, Mirjam Wester, Yi-Jian Wu, and Junichi Yamagishi. Personalising speech-to-speech translation in the EMIME project. In Proceedings of the ACL 2010 System Demonstrations, pages 48--53, Uppsala, Sweden, July 2010. [ bib | .pdf ]
[106] Lakshmi Saheer, Philip N. Garner, John Dines, and Hui Liang. VTLN adaptation for statistical speech synthesis. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 4838--4841, Dallas, USA, March 2010. [ bib | DOI | .pdf ]
[107] Danil Korchagin, Philip N. Garner, and John Dines. Automatic temporal alignment of AV data with confidence estimation. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 269--272, Dallas, USA, March 2010. [ bib | DOI | .pdf ]
[108] Philip N. Garner. SNR features for automatic speech recognition. In Proceedings of the IEEE workshop on Automatic Speech Recognition and Understanding, pages 182--187, Merano, Italy, December 2009. [ bib | DOI | .pdf ]
[109] Philip N. Garner, John Dines, Thomas Hain, Asmaa El Hannani, Martin Karafiát, Danil Korchagin, Mike Lincoln, Vincent Wan, and Le Zhang. Real-time ASR from meetings. In Proceedings of Interspeech, Brighton, UK, September 2009. [ bib | DOI | .pdf ]
[110] Kenichi Kumatani, John McDonough, Barbara Rauch, Dietrich Klakow, Philip N. Garner, and Weifeng Li. Beamforming with a maximum negentropy criterion. IEEE Transactions on Audio, Speech and Language Processing, 17(5):994--1008, July 2009. [ bib | DOI | .pdf ]
[111] Kenichi Kumatani, John McDonough, Barbara Rauch, Philip Garner, John Dines, and Weifeng Li. Maximum kurtosis beamforming with the generalized sidelobe canceller. In Proceedings of Interspeech, Brisbane, Australia, September 2008. [ bib | DOI | .pdf ]
[112] Philip N. Garner. Silence models in weighted finite-state transducers. In Proceedings of Interspeech, Brisbane, Australia, September 2008. [ bib | DOI | .pdf ]
[113] Kenichi Kumatani, John McDonough, Dietrich Klakow, Philip N. Garner, and Weifeng Li. Adaptive beamforming with a maximum negentropy criterion. In Proceedings of the Joint Workshop on Hands-free Speech Communication and Microphone Arrays (HSCMA), pages 180--183, Italy, May 2008. [ bib | DOI | .pdf ]
[114] Kenichi Kumatani, John McDonough, Stefan Schacht, Dietrich Klakow, Philip Garner, and Weifeng Li. Filter bank design based on minimization of individual aliasing terms for minimum mutual information subband adaptive beamforming. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 1608--1612, Las Vegas, April 2008. [ bib | DOI | .pdf ]
[115] Philip N. Garner, Toshiaki Fukada, and Yasuhiro Komori. A differential spectral voice activity detector. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pages 597--600, Montreal, May 2004. [ bib | DOI | .pdf ]
[116] Jason P. A. Charlesworth and Philip N. Garner. Spoken content. In B. S. Manjunath, Philippe Salembier, and Thomas Sikora, editors, Introduction to MPEG-7: Multimedia Content Description Interface, chapter 18, pages 299--316. John Wiley & Sons Ltd., July 2002. [ bib | .html ]
[117] Philip N. Garner and Adam T. Lindsay, editors. Information Technology - Multimedia Content Description Interface - Part 4: Audio. Number 15938-4:2002. ISO/IEC, 2002. International Standard. [ bib ]
[118] Jason P. A. Charlesworth and Philip N. Garner. SpokenContent representation in MPEG-7. IEEE Transactions on Circuits and Systems for Video Technology, 11(6):730--736, June 2001. Special Issue on MPEG-7. [ bib | DOI ]
[119] J. P. A. Charlesworth and P. N. Garner. Spoken content metadata and MPEG-7. In Proceedings ACM Multimedia 2000 Workshops, pages 81--84, Marina Del Rey, California, November 2000. ACM, PO Box 11405, New York, NY 10286 1405. [ bib | .pdf ]
[120] Adam T. Lindsay, Savitha Srinivasan, Jason P. A. Charlesworth, Philip N. Garner, and Werner Kriechbaum. Representation and linking mechanisms for audio in MPEG-7. Signal Processing: Image Communication, 16(1--2):193--209, September 2000. [ bib | DOI | .pdf ]
[121] Andrew R. Webb and Philip N. Garner. A basis function approach to position estimation using microwave arrays. Applied Statistics, 48 part 2:197--209, 1999. [ bib | DOI | .pdf ]
[122] Philip N. Garner and Wendy J. Holmes. On the robust incorporation of formant features into hidden Markov models for automatic speech recognition. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, volume 1, pages 1--4, 1998. [ bib | DOI | .pdf ]
[123] Philip N. Garner. On topic identification and dialogue move recognition. Computer Speech and Language, 11(4):275--306, October 1997. [ bib | DOI | .pdf ]
[124] John N. Holmes, Wendy J. Holmes, and Philip N. Garner. Using formant frequencies in speech recognition. In Proceedings of EUROSPEECH, volume 4, pages 2083--2086, Rhodes, Greece, September 1997. [ bib | .pdf ]
[125] Philip N. Garner and Aidan Hemsworth. A keyword selection strategy for dialogue move recognition and multi-class topic identification. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, volume 3, pages 1823--1826, April 1997. [ bib | DOI | .pdf ]
[126] Philip N. Garner, Sue R. Browning, Roger K. Moore, and Martin J. Russell. A theory of word frequencies and its application to dialogue move recognition. In Proceedings of the International Conference on Spoken Language Processing, pages 1880--1883, Philadelphia, PA, USA, October 1996. [ bib | .pdf ]
[127] Andrew R. Webb and Philip N. Garner. Source position estimation using radial basis functions. In Proceedings 13th International Conference on Pattern Recognition, volume IV, pages 3--7, Vienna, 1996. [ bib ]
[128] B. Steer, J. Kloske, P. Garner, L. LeBlanc, and S. Schock. Towards sonar based perception and modelling for unmanned untethered underwater vehicles. In Proceedings IEEE International Conference on Robotics and Automation, volume 2, pages 112--116, May 1993. [ bib | DOI | .pdf ]
