« Crossroads of Speech and Language »

Schedule

Legend of Session Symbols:
B → Break, C → Check-In, G → General, R → Registration, S → Social,
K → Keynote, O → Oral, P → Poster, S&T → Show&Tell, SS → Special Session, T → Tutorial.

Tutorial 1-1[Sun-T-1-1]
Sunday, 15 September, Hall 1

Generative adversarial network and its applications to speech signal and natural language processing
Tutorial; 0900–1030
Hung-yi Lee (Department of Electrical Engineering, National Taiwan University), Yu Tsao (Research Center for Information Technology Innovation, Academia Sinica)

Tutorial 2-1[Sun-T-2-1]
Sunday, 15 September, Hall 12

Statistical voice conversion with direct waveform modeling
Tutorial; 0900–1030
Tomoki Toda (Information Technology Center, Nagoya University), Kazuhiro Kobayashi (Information Technology Center, Nagoya University), Tomoki Hayashi (Information Technology Center, Nagoya University)

Tutorial 3-1[Sun-T-3-1]
Sunday, 15 September, Hall 11

Neural machine translation
Tutorial; 0900–1030
Wolfgang Macherey (Google AI), Yuan Cao (Google AI)

Tutorial 4-1[Sun-T-4-1]
Sunday, 15 September, Hall 2

Biosignal-based speech processing: from silent speech to brain-computer interfaces
Tutorial; 0900–1030
Thomas Hueber (GIPSA-lab/CNRS, Université Grenoble Alpes), Christian Herff (School for Mental Health and Neuroscience, Maastricht University)

Coffee break in upper foyer[Sun-B-1]
Sunday, 15 September, Foyer

Coffee break in upper foyer, level 1
Break; 1030–1100

Tutorial 1-2[Sun-T-1-2]
Sunday, 15 September, Hall 1

Generative adversarial network and its applications to speech signal and natural language processing
Tutorial; 1100–1230
Hung-yi Lee (Department of Electrical Engineering, National Taiwan University), Yu Tsao (Research Center for Information Technology Innovation, Academia Sinica)

Tutorial 2-2[Sun-T-2-2]
Sunday, 15 September, Hall 12

Statistical voice conversion with direct waveform modeling
Tutorial; 1100–1230
Tomoki Toda (Information Technology Center, Nagoya University), Kazuhiro Kobayashi (Information Technology Center, Nagoya University), Tomoki Hayashi (Information Technology Center, Nagoya University)

Tutorial 3-2[Sun-T-3-2]
Sunday, 15 September, Hall 11

Neural machine translation
Tutorial; 1100–1230
Wolfgang Macherey (Google AI), Yuan Cao (Google AI)

Tutorial 4-2[Sun-T-4-2]
Sunday, 15 September, Hall 2

Biosignal-based speech processing: from silent speech to brain-computer interfaces
Tutorial; 1100–1230
Thomas Hueber (GIPSA-lab/CNRS, Université Grenoble Alpes), Christian Herff (School for Mental Health and Neuroscience, Maastricht University)

Lunch Break[Sun-B-2]
Sunday, 15 September

Lunch break
Break; 1230–1400

Tutorial 5-1[Sun-T-5-1]
Sunday, 15 September, Hall 1

Generating adversarial examples for speech and speaker recognition and other systems
Tutorial; 1400–1530
Bhiksha Raj (School of Computer Science, Carnegie Mellon University), Joseph Keshet (Department of Computer Science, Bar-Ilan University)

Tutorial 6-1[Sun-T-6-1]
Sunday, 15 September, Hall 12

Advanced methods for neural end-to-end speech processing – unification, integration, and implementation
Tutorial; 1400–1530
Takaaki Hori (Mitsubishi Electric Research Laboratories), Tomoki Hayashi (Department of Information Science, Nagoya University), Shigeki Karita (NTT Communication Science Laboratories), Shinji Watanabe (Center for Language and Speech Processing, Johns Hopkins University)

Tutorial 7-1[Sun-T-7-1]
Sunday, 15 September, Hall 11

Modeling and deploying dialog systems from scratch using open-source tools
Tutorial; 1400–1530
Alexandros Papangelis (Uber AI), Piero Molino (Uber AI), Chandra Khatri (Uber AI)

Tutorial 8-1[Sun-T-8-1]
Sunday, 15 September, Hall 2

Microphone array signal processing and deep learning for speech enhancement – strong together
Tutorial; 1400–1530
Reinhold Haeb-Umbach (Department of Communications Engineering, Paderborn University), Tomohiro Nakatani (NTT Communication Science Laboratories)

Coffee break in upper foyer[Sun-B-3]
Sunday, 15 September, Foyer

Coffee break in upper foyer, level 1
Break; 1530–1600

Tutorial 5-2[Sun-T-5-2]
Sunday, 15 September, Hall 1

Generating adversarial examples for speech and speaker recognition and other systems
Tutorial; 1600–1730
Bhiksha Raj (School of Computer Science, Carnegie Mellon University), Joseph Keshet (Department of Computer Science, Bar-Ilan University)

Tutorial 6-2[Sun-T-6-2]
Sunday, 15 September, Hall 12

Advanced methods for neural end-to-end speech processing – unification, integration, and implementation
Tutorial; 1600–1730
Takaaki Hori (Mitsubishi Electric Research Laboratories), Tomoki Hayashi (Department of Information Science, Nagoya University), Shigeki Karita (NTT Communication Science Laboratories), Shinji Watanabe (Center for Language and Speech Processing, Johns Hopkins University)

Tutorial 7-2[Sun-T-7-2]
Sunday, 15 September, Hall 11

Modeling and deploying dialog systems from scratch using open-source tools
Tutorial; 1600–1730
Alexandros Papangelis (Uber AI), Piero Molino (Uber AI), Chandra Khatri (Uber AI)

Tutorial 8-2[Sun-T-8-2]
Sunday, 15 September, Hall 2

Microphone array signal processing and deep learning for speech enhancement – strong together
Tutorial; 1600–1730
Reinhold Haeb-Umbach (Department of Communications Engineering, Paderborn University), Tomohiro Nakatani (NTT Communication Science Laboratories)

Graz Old Town on Foot[Sun-S-1]
Sunday, 15 September, Foyer

Graz Old Town on Foot
Social; 1830–2030

Registration[Mon-R-1]
Monday, 16 September, Foyer

Registration
Registration; 0745–1830

Speaker Check-in[Mon-C]
Monday, 16 September, Room 8

Speaker Check-in
Check-In; 0800–1830

Opening Session[Mon-G-1]
Monday, 16 September, Main Hall

Opening Session
General; 0830–0930

ISCA Medal 2019 Keynote Speech[Mon-K-1]
Monday, 16 September, Main Hall

Keynote
Statistical approach to speech synthesis: past, present and future
Keynote; 0930–1030
Keiichi Tokuda (Nagoya Institute of Technology)

Coffee break in both exhibition foyers, lower and upper level 1[Mon-B-1]
Monday, 16 September, Foyer

Coffee break in both exhibition foyers, lower and upper level 1
Break; 1030–1100

End-to-end Speech Recognition[Mon-O-1-1]
Monday, 16 September, Main Hall

Survey Talk
Survey Talk: Modeling in automatic speech recognition: Beyond hidden Markov models
Survey Talk; 1100–1140
Ralf Schlüter (RWTH Aachen University)
Very Deep Self-attention Networks for End-to-End Speech Recognition
Oral; 1140–1200
Ngoc-Quan Pham (Karlsruhe Institute of Technology), Thai Son Nguyen (Karlsruhe Institute of Technology), Jan Niehues (Karlsruhe Institute of Technology), Markus Müller (Karlsruhe Institute of Technology), Alexander Waibel (Carnegie Mellon)
Jasper: An End-to-End Convolutional Neural Acoustic Model
Oral; 1200–1220
Jason Li (NVIDIA), Vitaly Lavrukhin (NVIDIA), Boris Ginsburg (NVIDIA), Ryan Leary (NVIDIA), Oleksii Kuchaiev (NVIDIA), Jonathan M. Cohen (NVIDIA), Huyen Nguyen (NVIDIA), Ravi Teja Gadde (New York University)
Unidirectional Neural Network Architectures for End-to-End Automatic Speech Recognition
Oral; 1220–1240
Niko Moritz (MERL), Takaaki Hori (MERL), Jonathan Le Roux (Mitsubishi Electric Research Laboratories)
Analyzing Phonetic and Graphemic Representations in End-to-End Automatic Speech Recognition
Oral; 1240–1300
Yonatan Belinkov (MIT CSAIL), Ahmed Ali (Qatar Computing Research Institute), James Glass (Massachusetts Institute of Technology)

Speech Enhancement: Multi-channel[Mon-O-1-2]
Monday, 16 September, Hall 1

Multi-channel speech enhancement using time-domain convolutional denoising autoencoder
Oral; 1100–1120
Naohiro Tawara (Waseda University), Tetsunori Kobayashi (Waseda University), Tetsuji Ogawa (Waseda University)
On Nonlinear Spatial Filtering in Multichannel Speech Enhancement
Oral; 1120–1140
Kristina Tesch (Universität Hamburg), Robert Rehr (Signal Processing Group, Department of Informatics, University of Hamburg, Germany), Timo Gerkmann (University of Hamburg)
Multi-Channel Block-Online Source Extraction based on Utterance Adaptation
Oral; 1140–1200
Juan M. Martín-Doñas (University of Granada), Jens Heitkämper (Paderborn University), Reinhold Haeb-Umbach (Paderborn University), Angel Gomez (University of Granada), Antonio M. Peinado (Universidad de Granada)
Exploiting Multi-Channel Speech Presence Probability in Parametric Multi-Channel Wiener Filter
Oral; 1200–1220
Saeed Bagheri Sereshki (Sonos, Inc.), Daniele Giacobello (Sonos)
Variational Bayesian Multi-channel Speech Dereverberation under Noisy Environments with Probabilistic Convolutive Transfer Function
Oral; 1220–1240
Masahito Togami (Line Corporation), Tatsuya Komatsu (Line Corporation)
Simultaneous denoising and dereverberation for low-latency applications using frame-by-frame online unified convolutional beamformer
Oral; 1240–1300
Tomohiro Nakatani (NTT Corporation), Keisuke Kinoshita (NTT Corporation)

Speech Production: Individual Differences and the Brain[Mon-O-1-3]
Monday, 16 September, Hall 2

Individual variation in cognitive processing style predicts differences in phonetic imitation of device and human voices
Oral; 1100–1120
Cathryn Snyder (UC Davis), Michelle Cohn (University of California, Davis), Georgia Zellou (UC Davis)
An investigation on speaker specific articulatory synthesis with speaker independent articulatory inversion
Oral; 1120–1140
Aravind Illa (Indian Institute of Science, Bangalore), Prasanta Ghosh (EE, Indian Institute of Science, Bangalore)
Individual Difference of Relative Tongue Size and its Acoustic Effects
Oral; 1140–1200
Xiaohan Zhang (Tianjin University), Chongke Bi (Tianjin University), Kiyoshi Honda (Tianjin University), Wenhuan Lu (Tianjin University), Jianguo Wei (Tianjin University)
Individual Differences of Airflow and Sound Generation in the Vocal Tract of Sibilant /s/
Oral; 1200–1220
Tsukasa Yoshinaga (Osaka University), Kazunori Nozaki (Osaka University Dental Hospital), Shigeo Wada (Osaka University)
Hush-Hush Speak: Speech Reconstruction Using Silent Videos
Oral; 1220–1240
Yaman Kumar (Adobe), Shashwat Uttam (NSUT), Amanda Stent (Bloomberg), Rajiv Shah (IIIT Delhi), Mansi Aggarwal (DTU), Dhruva Sahrawat (IIITD), Debanjan Mahata (Bloomberg)
SPEAK YOUR MIND! Towards Imagined Speech Recognition With Hierarchical Deep Learning
Oral; 1240–1300
Pramit Saha (University of British Columbia, Vancouver), Muhammad Abdul-Mageed (University of British Columbia), Sidney Fels (University of British Columbia)

Speech Signal Characterization 1[Mon-O-1-4]
Monday, 16 September, Hall 11

An Unsupervised Autoregressive Model for Speech Representation Learning
Oral; 1100–1120
Yu-An Chung (Massachusetts Institute of Technology), Wei-Ning Hsu (Massachusetts Institute of Technology), Hao Tang (Massachusetts Institute of Technology), James Glass (Massachusetts Institute of Technology)
Harmonic-aligned Frame Mask Based on Non-stationary Gabor Transform with Application to Content-dependent Speaker Comparison
Oral; 1120–1140
Feng Huang (Acoustic Research Institute, Austrian Academy of Sciences)
Glottal Closure Instants Detection from Speech Signal by Deep Features Extracted from Raw Speech and Linear Prediction Residual
Oral; 1140–1200
Gurunath Reddy M (Indian Institute of Technology Kharagpur), K Sreenivasa Rao (Indian Institute of Technology Kharagpur), Partha Pratim Das (Department of Computer Science & Engineering, IIT Kharagpur)
Learning Problem-agnostic Speech Representations from Multiple Self-supervised Tasks
Oral; 1200–1220
Santiago Pascual (Universitat Politècnica de Catalunya), Mirco Ravanelli (Université de Montréal), Joan Serrà (Telefonica Research), Antonio Bonafonte (Universitat Politècnica de Catalunya), Yoshua Bengio (U. Montreal)
Excitation source and vocal tract system based acoustic features for detection of nasals in continuous speech
Oral; 1220–1240
Bhanu Teja Nellore (Speech and Vision Lab, International Institute of Information Technology Hyderabad), Sri Harsha Dumpala (TCS Research and Innovation, Mumbai), Karan Nathwani (Indian Institute of Technology Jammu), Suryakanth V Gangashetty (IIIT Hyderabad)
Data Augmentation using GANs for Speech Emotion Recognition
Oral; 1240–1300
Aggelina Chatziagapi (Behavioral Signal Technologies Inc.), Georgios Paraskevopoulos (Behavioral Signal Technologies Inc.), Dimitris Sgouropoulos (Behavioral Signal Technologies Inc.), Georgios Pantazopoulos (Behavioral Signal Technologies Inc.), Malvina Nikandrou (Behavioral Signal Technologies Inc.), Theodoros Giannakopoulos (Behavioral Signal Technologies Inc.), Athanasios Katsamanis (Behavioral Signals), Alexandros Potamianos (Behavioral Signal Technologies Inc.), Shrikanth Narayanan (University of Southern California)

Neural waveform generation[Mon-O-1-5]
Monday, 16 September, Main Hall

High quality, lightweight and adaptable TTS using LPCNet
Oral; 1100–1120
Zvi Kons (IBM Haifa research lab), Slava Shechtman (Speech Technologies, IBM Research AI), Alexander Sorin (IBM Research - Haifa), Carmel Rabinovitz (IBM Research - Haifa), Ron Hoory (IBM Haifa Research Lab)
Towards achieving robust universal neural vocoding
Oral; 1120–1140
Jaime Lorenzo-Trueba (Amazon Alexa), Thomas Drugman (Amazon), Javier Latorre (Amazon), Thomas Merritt (Amazon), Bartosz Putrycz (Amazon), Roberto Barra-Chicote (Amazon), Alexis Moinet (Amazon), Vatsal Aggarwal (Amazon)
Expediting TTS Synthesis with Adversarial Vocoding
Oral; 1140–1200
Paarth Neekhara (UC San Diego), Chris Donahue (UC San Diego), Miller Puckette (UC San Diego), Shlomo Dubnov (UC San Diego), Julian McAuley (UC San Diego)
Analysis by Adversarial Synthesis - A Novel Approach for Speech Vocoding
Oral; 1200–1220
Ahmed Mustafa (University of Erlangen-Nuremberg), Arijit Biswas (Dolby Germany GmbH), Christian Bergler (Friedrich-Alexander-University Erlangen-Nuremberg, Department of Computer Science, Pattern Recognition Lab), Julia Schottenhamml (University Erlangen-Nuremberg), Andreas Maier (University Erlangen-Nuremberg)
Quasi-periodic WaveNet Vocoder: A Pitch Dependent Dilated Convolution Model for Parametric Speech Generation
Oral; 1220–1240
Yi-Chiao Wu (Nagoya University), Tomoki Hayashi (Nagoya University), Patrick Lumban Tobing (Nagoya University), Kazuhiro Kobayashi (Nagoya University), Tomoki Toda (Nagoya University)
A Speaker-Dependent WaveNet for Voice Conversion with Non-Parallel Data
Oral; 1240–1300
Xiaohai Tian (National University of Singapore), Eng Siong Chng (Nanyang Technological University), Haizhou Li (National University of Singapore)

Applications of language technologies[Mon-P-1-A]
Monday, 16 September, Gallery A

Code-switching Sentence Generation by Generative Adversarial Networks and its Application to Data Augmentation
Poster; 1100–1300
Ching-Ting Chang (National Taiwan University), Shun-Po Chuang (National Taiwan University), Hung-Yi Lee (National Taiwan University)
Integrating Video Retrieval and Moment Detection in a Unified Corpus for Video Question Answering
Poster; 1100–1300
Hongyin Luo (MIT), Mitra Mohtarami (MIT Computer Science and Artificial Intelligence Lab), James Glass (Massachusetts Institute of Technology), Karthik Krishnamurthy (Ford), Brigitte Richardson (Ford)
Comparative Analysis of Think-aloud Methods for Everyday Activities in the Context of Cognitive Robotics
Poster; 1100–1300
Moritz Meier (University of Bremen), Celeste Mason (University of Bremen), Felix Putze (University of Bremen), Tanja Schultz (Universität Bremen)
RadioTalk: a large-scale corpus of talk radio transcripts
Poster; 1100–1300
Doug Beeferman (MIT Media Lab), William Brannon (MIT Media Lab), Deb Roy (MIT Media Lab)
Qualitative evaluation of ASR adaptation in a lecture context: Application to the PASTEL corpus
Poster; 1100–1300
Salima Mdhaffar (Le Mans University), Yannick Estève (LIA - Avignon University), Nicolas Hernandez (Université de Nantes), Antoine Laurent (LIUM - Laboratoire Informatique Université du Mans), Solen Quiniou (Université de Nantes)
Active Annotation: bootstrapping annotation lexicon and guidelines for supervised NLU learning
Poster; 1100–1300
Federico Marinelli (University of Trento), Alessandra Cervone (University of Trento), Giuliano Tortoreto (University of Trento), Evgeny A. Stepanov (VUI Inc.), Giuseppe Di Fabbrizio (VUI Inc.), Giuseppe Riccardi (University of Trento)
Automatic lyric transcription from Karaoke vocal tracks: Resources and a Baseline System
Poster; 1100–1300
Gerardo Roa (University of Sheffield), Jon Barker (University of Sheffield)
Detecting Mismatch Between Speech and Transcription Using Cross-Modal Attention
Poster; 1100–1300
Qiang Huang (University of Sheffield), Thomas Hain (University of Sheffield)
EpaDB: a database for development of pronunciation assessment systems
Poster; 1100–1300
Jazmín Vidal (CONICET), Luciana Ferrer (CONICET), Leonardo Brambilla (Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires)
Automatic Compression of Subtitles with Neural Networks and its Effect on User Experience
Poster; 1100–1300
Katrin Angerbauer (University of Stuttgart, VISUS), Heike Adel (Bosch Center for Artificial Intelligence), Ngoc Thang Vu (University of Stuttgart)

Social Signals Detection and Speaker Traits Analysis[Mon-P-1-B]
Monday, 16 September, Gallery B

Predicting Humor by Learning from Time-aligned Comments
Poster; 1100–1300
Zixiaofan Yang (Columbia University), Bingyan Hu (Columbia University), Julia Hirschberg (Columbia University)
Sincerity in Acted Speech: Presenting the Sincere Apology Corpus and Results
Poster; 1100–1300
Alice Baird (University of Augsburg), Eduardo Coutinho (Liverpool University), Julia Hirschberg (Columbia University), Björn Schuller (University of Augsburg / Imperial College London)
Do not hesitate! – Unless you do it shortly or nasally: How the phonetics of filled pauses determine their subjective frequency and perceived speaker performance
Poster; 1100–1300
Oliver Niebuhr (University of Southern Denmark), Kerstin Fischer (University of Southern Denmark)
Phonet: a Tool Based on Gated Recurrent Neural Networks to Extract Phonological Posteriors from Speech
Poster; 1100–1300
Juan Camilo Vásquez Correa (Faculty of Engineering, Universidad de Antioquia UdeA), Philipp Klumpp (Friedrich-Alexander-Universität Erlangen-Nürnberg), Juan Rafael Orozco-Arroyave (Universidad de Antioquia), Elmar Noeth (Friedrich-Alexander-University Erlangen-Nuremberg)
Predicting the Leading Political Ideology of YouTube Channels Using Acoustic, Textual, and Metadata Information
Poster; 1100–1300
Yoan Dinkov (Sofia University), Ahmed Ali (Qatar Computing Research Institute), Ivan Koychev (Sofia University), Preslav Nakov (Qatar Computing Research Institute, HBKU)
Mitigating Gender and L1 Differences to Improve State and Trait Recognition
Poster; 1100–1300
Guozhen An (Graduate Center CUNY), Rivka Levitan (Brooklyn College CUNY)
Deep Learning based Mandarin Accent Identification for Accent Robust ASR
Poster; 1100–1300
Felix Weninger (Nuance Communications), Yang Sun (Nuance Communications), Junho Park (Nuance Communications), Daniel Willett (Nuance Communications), Puming Zhan (Nuance Communications)
Calibrating DNN Posterior Probability Estimates of HMM/DNN Models to Improve Social Signal Detection From Audio Data
Poster; 1100–1300
Gábor Gosztolya (Research Group on Artificial Intelligence), László Tóth (MTA-SZTE Research Group on Artificial Intelligence)
Conversational and Social Laughter Synthesis with WaveNet
Poster; 1100–1300
Hiroki Mori (Utsunomiya University), Tomohiro Nagata (Utsunomiya University), Yoshiko Arimoto (Faculty of Science and Engineering, Teikyo University)
Laughter dynamics in dyadic conversations
Poster; 1100–1300
Bogdan Ludusan (Phonetics and Phonology Workgroup, Faculty of Linguistics and Literary Studies, Bielefeld University), Petra Wagner (Universität Bielefeld)
Towards an annotation scheme for complex laughter in speech corpora
Poster; 1100–1300
Khiet Truong (University of Twente), Juergen Trouvain (Saarland University), Michel Jansen (University of Twente)
Using Speech to Predict Sequentially Measured Cortisol Levels During a Trier Social Stress Test
Poster; 1100–1300
Alice Baird (University of Augsburg), Shahin Amiriparian (University of Augsburg / Technische Universität München), Nicholas Cummins (University of Augsburg), Sarah Strumbauer (Friedrich-Alexander-Universität Erlangen-Nürnberg), Johanna Janson (Friedrich-Alexander-Universität Erlangen-Nürnberg), Eva-Maria Messner (University of Ulm), Harald Baumeister (University of Ulm), Nicolas Rohleder (FAU University of Erlangen-Nuremberg), Björn Schuller (University of Augsburg / Imperial College London)

Speaker Recognition and Diarization[Mon-P-1-C]
Monday, 16 September, Gallery C

Bayesian HMM based x-vector clustering for Speaker Diarization
Poster; 1100–1300
Mireia Diez (Brno University of Technology), Lukas Burget (Brno University of Technology), Johan Rohdin (Brno University of Technology), Shuai Wang (Shanghai Jiao Tong University), Jan Černocký (Brno University of Technology)
Speaker Diarization with Lexical Information
Poster; 1100–1300
Tae Jin Park (University of Southern California), Kyu Han (JD AI Research), Jing Huang (JD AI Research), Xiaodong He (JD AI Research), Bowen Zhou (JD AI Research), Panayiotis Georgiou (University of Southern California), Shrikanth Narayanan (University of Southern California)
Joint Speech Recognition and Speaker Diarization via Sequence Transduction
Poster; 1100–1300
Laurent El Shafey (Google), Hagen Soltau (Google), Izhak Shafran (Google Inc)
Normal variance-mean mixtures for unsupervised score calibration
Poster; 1100–1300
Sandro Cumani (Politecnico di Torino)
Speaker Augmentation and Bandwidth Extension for Deep Speaker Embedding
Poster; 1100–1300
Hitoshi Yamamoto (NEC Corporation), Kong Aik Lee (Data Science Research Laboratories, NEC Corporation), Koji Okabe (NEC Corporation), Takafumi Koshinaka (Data Science Research Labs., NEC Corporation)
Large-Scale Speaker Diarization of Radio Broadcast Archives
Poster; 1100–1300
Emre Yilmaz (National University of Singapore), Adem Derinel (National University of Singapore), Zhou Kun (National University of Singapore), Henk van den Heuvel (CLS/CLST, Radboud University Nijmegen), Niko Brummer (Cyberupt), Haizhou Li (National University of Singapore), David van Leeuwen (Radboud University Nijmegen)
Toeplitz Inverse Covariance based Robust Speaker Clustering for Naturalistic Audio Streams
Poster; 1100–1300
Harishchandra Dubey (Center for Robust Speech Systems, The University of Texas at Dallas), Abhijeet Sangwan (The University of Texas at Dallas), John Hansen (The University of Texas at Dallas)
Unleashing the Unused Potential of I-Vectors Enabled by GPU Acceleration
Poster; 1100–1300
Ville Vestman (School of Computing, University of Eastern Finland, Finland), Kong Aik Lee (Data Science Research Laboratories, NEC Corporation), Tomi Kinnunen (University of Eastern Finland), Takafumi Koshinaka (Data Science Research Labs., NEC Corporation)
MCE 2018: The 1st Multi-target Speaker Detection and Identification Challenge Evaluation
Poster; 1100–1300
Suwon Shon (Massachusetts Institute of Technology), Najim Dehak (Johns Hopkins University), Douglas Reynolds (MIT Lincoln Laboratory), James Glass (Massachusetts Institute of Technology)
Improving Aggregation and Loss Function for Better Embedding Learning in End-to-End Speaker Verification System
Poster; 1100–1300
Zhifu Gao (University of Science and Technology of China), Yan Song (University of Science and Technology of China), Ian McLoughlin (The University of Kent, School of Computing, Medway), Pengcheng Li (University of Science and Technology of China), Yiheng Jiang (University of Science and Technology of China), Lirong Dai (University of Science and Technology of China)
LSTM Based Similarity Measurement with Spectral Clustering for Speaker Diarization
Poster; 1100–1300
Qingjian Lin (SEIT, Sun Yat-sen University), Ruiqing Yin (LIMSI, CNRS, Université Paris-Saclay), Ming Li (Duke Kunshan University), Hervé Bredin (CNRS LIMSI), Claude Barras (LIMSI-CNRS)
Who said that?: Audio-visual speaker diarisation of real-world meetings
Poster; 1100–1300
Joon Son Chung (University of Oxford), Bong-Jin Lee (Naver Corporation), Icksang Han (Naver Corporation)
Multi-PLDA Diarization on Children's Speech
Poster; 1100–1300
Jiamin Xie (Johns Hopkins University), Leibny Paola Garcia Perera (Johns Hopkins University), Dan Povey (Johns Hopkins University), Sanjeev Khudanpur (Johns Hopkins University)
Speaker Diarization using Leave-one-out Gaussian PLDA Clustering of DNN Embeddings
Poster; 1100–1300
Alan McCree (JHU HLTCOE), Gregory Sell (Johns Hopkins University), Daniel Garcia-Romero (Human Language Technology Center of Excellence, Johns Hopkins University)
Speaker-Corrupted Embeddings for Online Speaker Diarization
Poster; 1100–1300
Omid Ghahabi (EML European Media Laboratory GmbH), Volker Fischer (EML GmbH)

Speech and Audio Characterization and Segmentation[Mon-P-1-D]
Monday, 16 September, Hall 10/D

Early Identification of Speech Changes Due to Amyotrophic Lateral Sclerosis Using Machine Classification
Poster; 1100–1300
Sarah Gutz (Harvard University), Jun Wang (University of Texas at Dallas), Yana Yunusova (University of Toronto), Jordan Green (MGH IHP)
An Approach to Online Speaker Change Point Detection Using DNNs and WFSTs
Poster; 1100–1300
Lukas Mateju (Technical University of Liberec), Petr Cerva (Technical University of Liberec), Jindrich Zdansky (Technical University of Liberec)
Regression and Classification for Direction-of-Arrival Estimation with Convolutional Recurrent Neural Networks
Poster; 1100–1300
Zhenyu Tang (University of Maryland), John D. Kanu (University of Maryland), Kevin Hogan (University of Maryland), Dinesh Manocha (University of Maryland)
Automatic Detection of Breath Using Voice Activity Detection and SVM Classifier with Application on News Reports
Poster; 1100–1300
Mohamed Ismail Yasar Arafath K (Indian Institute of Technology Kharagpur), Aurobinda Routray (Indian Institute of Technology Kharagpur)
Acoustic scene classification using teacher-student learning with soft-labels
Poster; 1100–1300
Hee-Soo Heo (School of Computer Science, University of Seoul, Korea), Jee-weon Jung (University of Seoul), Hye-jin Shim (University of Seoul), Ha-Jin Yu (University of Seoul)
Rare Sound Event Detection Using Deep Learning and Data Augmentation
Poster; 1100–1300
Yanping Chen (Samsung Research America), Hongxia Jin (Samsung Research America)
A Combination of Model-based and Feature-based Strategy for Speech-to-Singing Alignment
Poster; 1100–1300
Bidisha Sharma (National University of Singapore, Singapore), Haizhou Li (National University of Singapore)
Dr.VOT: Measuring Positive and Negative Voice Onset Time in the Wild
Poster; 1100–1300
Yosi Shrem (Bar-Ilan University), Matthew Goldrick (Northwestern University, Evanston, IL), Joseph Keshet (Bar-Ilan University)
Effects of base-frequency and spectral envelope on deep-learning speech separation and recognition models
Poster; 1100–1300
Jun Hui (The Hong Kong University of Science and Technology), Yue Wei (HKUST-Shenzhen Research Institute), Shutao Chen (The Hong Kong University of Science and Technology), Richard H.Y. So (The Hong Kong University of Science and Technology)
Phone Aware Nearest Neighbor Technique using Spectral Transition Measure for Non-Parallel Voice Conversion
Poster; 1100–1300
Nirmesh Shah (Dhirubhai Ambani Institute of Information and Communication Technology, (DA-IICT), Gandhinagar), Hemant Patil (DA-IICT Gandhinagar)
Weakly Supervised Syllable Segmentation by Vowel-Consonant Peak Classification
Poster; 1100–1300
Ravi Shankar (Johns Hopkins University), Archana Venkataraman (Johns Hopkins University)

ASR for noisy and far-field speech[Mon-P-1-E]
Monday, 16 September, Hall 10/E

Examining the combination of multi-band processing and channel dropout for robust speech recognition
Poster; 1100–1300
György Kovács (Embedded Internet Systems Lab, Luleå University of Technology), László Tóth (Institute of Informatics, University of Szeged), Dirk Van Compernolle (KU Leuven - ESAT), Marcus Liwicki (Embedded Internet Systems Lab, Luleå University of Technology)
Improved Speaker-Dependent Separation for CHiME-5 Challenge
Poster; 1100–1300
Jian Wu (Northwestern Polytechnical University), Yong Xu (Tencent AI Lab), Shi-Xiong Zhang (Tencent AI Lab), Lian-Wu Chen (Tencent AI Lab), Meng Yu (Tencent AI Lab), Lei Xie (Northwestern Polytechnical University), Dong Yu (Tencent AI Lab)
Bridging the Gap Between Monaural Speech Enhancement and Recognition with Distortion-Independent Acoustic Modeling
Poster; 1100–1300
Peidong Wang (The Ohio State University), Ke Tan (The Ohio State University), DeLiang Wang (The Ohio State University)
Enhanced Spectral Features for Distortion-Independent Acoustic Modeling
Poster; 1100–1300
Peidong Wang (The Ohio State University), DeLiang Wang (The Ohio State University)
Universal Adversarial Perturbations for Speech Recognition Systems
Poster; 1100–1300
Paarth Neekhara (UC San Diego), Shehzeen Hussain (UC San Diego), Prakhar Pandey (UC San Diego), Shlomo Dubnov (UC San Diego), Julian McAuley (UC San Diego), Farinaz Koushanfar (UC San Diego)
One-pass single-channel noisy speech recognition using a combination of noisy and enhanced features
Poster; 1100–1300
Masakiyo Fujimoto (National Institute of Information and Communications Technology), Hisashi Kawai (National Institute of Information and Communications Technology)
Jointly Adversarial Enhancement Training for Robust End-to-End Speech Recognition
Poster; 1100–1300
Bin Liu (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences), Shuai Nie (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences), Shan Liang (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences), Wenju Liu (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences), Meng Yu (Tencent AI Lab, Bellevue, WA, USA), Lianwu Chen (Tencent AI Lab, Shenzhen, China), Shouye Peng (Xueersi Online School), Changliang Li (Kingsoft AI Lab)
Label Driven Time-Frequency Masking for Robust Continuous Speech Recognition
Poster; 1100–1300
Meet Soni (TCS Innovation Labs), Ashish Panda (Innovation Labs, Tata Consultancy Services)
Speaker-Invariant Feature-Mapping for Distant Speech Recognition via Adversarial Teacher-Student Learning
Poster; 1100–1300
Long Wu (University of Chinese Academy of Sciences, China), Hangting Chen (University of Chinese Academy of Sciences, China), Li Wang (Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, China), Pengyuan Zhang (University of Chinese Academy of Sciences, China), Yonghong Yan (University of Chinese Academy of Sciences, China)
Full-Sentence Correlation: a Method to Handle Unpredictable Noise for Robust Speech Recognition
Poster; 1100–1300
Ming Ji (Queen's University Belfast), Danny Crookes (Queen's University Belfast)
Generative Noise Modeling and Channel Simulation for Robust Speech Recognition in Unseen Conditions
Poster; 1100–1300
Meet Soni (TCS Innovation Labs), Sonal Joshi (Tata Consultancy Services), Ashish Panda (Innovation Labs, Tata Consultancy Services)
Far-Field Speech Enhancement using Heteroscedastic Autoencoder for Improved Speech Recognition
Poster; 1100–1300
Shashi Kumar (Samsung Research Institute India), Shakti Rath (Samsung Research Institute Bangalore)
End-to-end SpeakerBeam for single channel target speech recognition
Poster; 1100–1300
Marc Delcroix (NTT Communication Science Laboratories), Shinji Watanabe (Johns Hopkins University), Tsubasa Ochiai (NTT Communication Science Laboratories), Keisuke Kinoshita (NTT), Shigeki Karita (NTT Communication Science Laboratories), Atsunori Ogawa (NTT Communication Science Laboratories), Tomohiro Nakatani (NTT Corporation)
NIESR: Nuisance Invariant End-to-end Speech Recognition
Poster; 1100–1300
I-Hung Hsu (USC Information Sciences Institute), Ayush Jaiswal (USC Information Sciences Institute), Premkumar Natarajan (USC Information Sciences Institute)
Knowledge Distillation for Throat Microphone Speech Recognition
Poster; 1100–1300
Takahito Suzuki (Shizuoka University), Jun Ogata (National Institute of Advanced Industrial Science and Technology), Takashi Tsunakawa (Shizuoka University), Masafumi Nishida (Shizuoka University), Masafumi Nishimura (Shizuoka University)

Spoken Language Processing for Children's Speech[Mon-SS-1-6]
Monday, 16 September, Hall 3 [More info]

Introduction: SIG-CHILD special interest group
Oral; 1100–1112
Advances in Automatic Speech Recognition for Child Speech Using Factored Time Delay Neural Network
Oral; 1112–1130
Fei Wu (Johns Hopkins University), Leibny Paola Garcia Perera (Johns Hopkins University), Dan Povey (Johns Hopkins University), Sanjeev Khudanpur (Johns Hopkins University)
A Frequency Normalization Technique for Kindergarten Speech Recognition Inspired by the Role of F0 in Vowel Perception
Oral; 1130–1148
Gary Yeung (University of California, Los Angeles), Abeer Alwan (UCLA)
Improving ASR Systems for Children with Autism and Language Impairment Using Domain-Focused DNN Transfer Techniques
Oral; 1148–1206
Robert Gale (Oregon Health & Science University), Liu Chen (Oregon Health & Science University), Jill Dolata (Oregon Health & Science University), Meysam Asgari (CSLU)
Ultrasound tongue imaging for diarization and alignment of child speech therapy sessions
Oral; 1206–1224
Manuel Sam Ribeiro (The University of Edinburgh), Aciel Eshky (University of Edinburgh), Korin Richmond (Informatics, University of Edinburgh), Steve Renals (University of Edinburgh)
Automated estimation of oral reading fluency during summer camp e-book reading with MyTurnToRead
Oral; 1224–1242
Anastassia Loukina (Educational Testing Service), Beata Beigman Klebanov (Educational Testing Service), Patrick Lange (Educational Testing Service R&D), Yao Qian (Educational Testing Service), Binod Gyawali (Educational Testing Service), Nitin Madnani (Educational Testing Service), Abhinav Misra (Educational Testing Service (ETS)), Klaus Zechner (ETS), Zuowei Wang (Educational Testing Service), John Sabatini (Educational Testing Service)
Sustained Vowel Game: a computer therapy game for children with dysphonia
Oral; 1242–1300
Vanessa Lopes (Universidade Nova de Lisboa), Joao Magalhaes (Universidade Nova de Lisboa), Sofia Cavaco (Universidade Nova de Lisboa)

Lunch Break in lower foyer[Mon-B-2]
Monday, 16 September, Foyer

Lunch Break in lower foyer
Break; 1300–1400

Attention Mechanism for Speaker State Recognition[Mon-O-2-1]
Monday, 16 September, Main Hall

Survey Talk
Survey Talk: When Attention meets Speech Applications: Speech & Speaker Recognition Perspective [More info]
Survey Talk; 1430–1510
Kyu Han (ASAPP, Inc.), Ramon Prieto, Tao Ma
Attention-enhanced Connectionist Temporal Classification for Discrete Speech Emotion Recognition
Oral; 1510–1530
Ziping Zhao (Tianjin Normal University), Zhongtian Bao (Tianjin Normal University), Zixing Zhang (Imperial College London), Nicholas Cummins (University of Augsburg), Haishuai Wang (Tianjin Normal University), Björn Schuller (University of Augsburg / Imperial College London)
Attentive to Individual: A Multimodal Emotion Recognition Network with Personalized Attention Profile
Oral; 1530–1550
Jeng-Lin Li (Department of Electrical Engineering, National Tsing Hua University), Chi-Chun Lee (Department of Electrical Engineering, National Tsing Hua University)
A Saliency-based Attention LSTM Model for Cognitive Load Classification from Speech
Oral; 1550–1610
Ascension Gallardo-Antolin (Universidad Carlos III de Madrid), Juan M Montero (Universidad Politecnica de Madrid)
A Hierarchical Attention Network-Based Approach for Depression Detection from Transcribed Clinical Interviews
Oral; 1610–1630
Adria Mallol-Ragolta (University of Augsburg), Ziping Zhao (Tianjin Normal University), Lukas Stappen (University of Augsburg), Nicholas Cummins (University of Augsburg), Björn Schuller (University of Augsburg / Imperial College London)

ASR Neural Network Training - 1[Mon-O-2-2]
Monday, 16 September, Hall 1

Untranscribed web audio for low resource speech recognition
Oral; 1430–1450
Andrea Carmantini (University of Edinburgh), Peter Bell (University of Edinburgh), Steve Renals (University of Edinburgh)
RWTH ASR System for LibriSpeech: Hybrid vs Attention
Oral; 1450–1510
Christoph Lüscher (Human Language Technology and Pattern Recognition Group, RWTH Aachen University), Eugen Beck (RWTH Aachen University), Kazuki Irie (RWTH Aachen University), Markus Kitza (RWTH Aachen University), Wilfried Michel (RWTH Aachen University), Albert Zeyer (Human Language Technology and Pattern Recognition Group (Chair of Computer Science 6), Computer Science Department, RWTH Aachen University), Ralf Schlüter (Lehrstuhl Informatik 6, RWTH Aachen University), Hermann Ney (RWTH Aachen University)
Auxiliary Interference Speaker Loss for Target-Speaker Speech Recognition
Oral; 1510–1530
Naoyuki Kanda (Hitachi, Ltd.), Shota Horiguchi (Hitachi Ltd.), Ryoichi Takashima (Hitachi Ltd.), Yusuke Fujita (Hitachi, Ltd.), Kenji Nagamatsu (Hitachi Ltd.), Shinji Watanabe (Johns Hopkins University)
Speaker Adaptation for Attention-Based End-to-End Speech Recognition
Oral; 1530–1550
Zhong Meng (Microsoft Corporation), Yashesh Gaur (Microsoft), Jinyu Li (Microsoft), Yifan Gong (Microsoft Corp)
Large Margin Training for Attention Based End-to-End Speech Recognition
Oral; 1550–1610
Peidong Wang (The Ohio State University), Jia Cui (Tencent AI Lab), Chao Weng (Tencent AI Lab), Dong Yu (Tencent AI Lab)
Large-Scale Mixed-Bandwidth Deep Neural Network Acoustic Modeling for Automatic Speech Recognition
Oral; 1610–1630
Khoi-Nguyen Mac (University of Illinois at Urbana-Champaign), Xiaodong Cui (IBM T. J. Watson Research Center), Wei Zhang (IBM T. J. Watson Research Center), Michael Picheny (IBM T. J. Watson Research Center)

Zero-resource ASR[Mon-O-2-3]
Monday, 16 September, Hall 2

SparseSpeech: Unsupervised acoustic unit discovery with memory-augmented sequence autoencoders
Oral; 1430–1450
Benjamin Milde (Universität Hamburg), Chris Biemann (University of Hamburg)
Bayesian Subspace Hidden Markov Model for Acoustic Unit Discovery
Oral; 1450–1510
Lucas Ondel (Brno University of Technology), Hari Krishna Vydana (Brno University of Technology), Lukas Burget (Brno University of Technology), Jan Černocký (Brno University of Technology)
Speaker Adversarial Training of DPGMM-based Feature Extractor for Zero-Resource Languages
Oral; 1510–1530
Yosuke Higuchi (Waseda University), Naohiro Tawara (Waseda University), Tetsunori Kobayashi (Waseda University), Tetsuji Ogawa (Waseda University)
Building Large-Vocabulary ASR Systems for Languages Without Any Audio Training Data
Oral; 1530–1550
Manasa Prasad (Google), Daan van Esch (Google), Sandy Ritchie (Google), Jonas Fromseier Mortensen (Google)
Towards Bilingual Lexicon Discovery From Visually Grounded Speech Audio
Oral; 1550–1610
Emmanuel Azuh (Massachusetts Institute of Technology), David Harwath (Massachusetts Institute of Technology), James Glass (Massachusetts Institute of Technology)
Improving Unsupervised Subword Modeling via Disentangled Speech Representation Learning and Transformation
Oral; 1610–1630
Siyuan Feng (Department of Electronic Engineering, The Chinese University of Hong Kong), Tan Lee (The Chinese University of Hong Kong)

Sociophonetics[Mon-O-2-4]
Monday, 16 September, Hall 11

Listeners’ Ability to Identify the Gender of Preadolescent Children in Different Linguistic Contexts
Oral; 1430–1450
Shawn Nissen (Brigham Young University), Sharalee Blunck (Brigham Young University), Anita Dromey (Brigham Young University), Christopher Dromey (Brigham Young University)
Sibilant variation in New Englishes: A comparative sociophonetic study of Trinidadian and American English /s(tr)/-retraction
Oral; 1450–1510
Wiebke Ahlers (Universität Osnabrück), Philipp Meer (Universität Münster)
Tracking the New Zealand English NEAR/SQUARE merger using functional principal components analysis
Oral; 1510–1530
Michele Gubian (IPS), Jonathan Harrington (IPS, Munich), Mary Stevens (IPS), Florian Schiel (IPS), Paul Warren (Victoria University of Wellington)
Phonetic Accommodation in a Wizard-of-Oz Experiment: Intonation and Segments
Oral; 1530–1550
Iona Gessinger (Saarland University), Bernd Möbius (Saarland University), Bistra Andreeva (Saarland University), Eran Raveh (Saarland University), Ingmar Steiner (Saarland University/DFKI)
PASCAL and DPA: A pilot study on using prosodic competence scores to predict communicative skills for team working and public speaking
Oral; 1550–1610
Oliver Niebuhr (University of Southern Denmark), Jan Michalsky (FAU Erlangen-Nuremberg)
Towards the prosody of persuasion in competitive negotiation. The relationship between f0 and negotiation success in same sex sales tasks
Oral; 1610–1630
Jan Michalsky (FAU Erlangen-Nuremberg), Heike Schoormann (University of Oldenburg), Thomas Schultze-Gerlach (University of Goettingen)

Resources - Annotation - Evaluation[Mon-O-2-5]
Monday, 16 September, Hall 12

VESUS: A Crowd-Annotated Database to Study Emotion Production and Perception in Spoken English
Oral; 1430–1450
Jacob Sager (Johns Hopkins University), Ravi Shankar (Johns Hopkins University), Jacob Reinhold (Johns Hopkins University), Archana Venkataraman (Johns Hopkins University)
Building the Singapore English National Speech Corpus
Oral; 1450–1510
Jia Xin Koh (Info-communications and Media Development Authority), Aqilah Mislan (Info-communications and Media Development Authority), Kevin Khoo (IMDA), Brian Ang (IMDA), Wilson Ang (IMDA), Charmaine Ng (NTU), Ying Ying Tan (NTU)
Challenging the Boundaries of Speech Recognition: The MALACH Corpus
Oral; 1510–1530
Michael Picheny (IBM TJ Watson Research Center), Zoltán Tüske (IBM Research), Brian Kingsbury (IBM Research), Kartik Audhkhasi (IBM Research), Xiaodong Cui (IBM T. J. Watson Research Center), George Saon (IBM)
NITK Kids' Speech Corpus
Oral; 1530–1550
Pravin Bhaskar Ramteke (National Institute of Technology Karnataka), Sujata Supanekar (National Institute of Technology Karnataka), Pradyoth Hegde (National Institute of Technology Karnataka), Hanna Nelson (School of Allied Health Sciences), Venkataraja Aithal (School of Allied Health Sciences, Manipal, Karnataka), Shashidhar G. Koolagudi (National Institute of Technology Karnataka, Surathkal)
Towards Variability Resistant Dialectal Speech Evaluation
Oral; 1550–1610
Ahmed Ali (Qatar Computing Research Institute), Nizar Habash (New York University Abu Dhabi), Salam Khalifa (New York University Abu Dhabi)
How to annotate 100 hours in 45 minutes
Oral; 1610–1630
Per Fallgren (KTH Royal Institute of Technology), Zofia Malisz (KTH, Stockholm), Jens Edlund (KTH Speech, Music and Hearing)

Speech production and silent interfaces[Mon-P-2-A]
Monday, 16 September, Gallery A

Multi-corpus Acoustic-to-articulatory Speech Inversion
Poster; 1430–1630
Nadee Seneviratne (University of Maryland, College Park), Ganesh Sivaraman (Pindrop), Carol Espy-Wilson (University of Maryland at College Park)
Speech Organ Contour Extraction using Real-Time MRI and Machine Learning Method
Poster; 1430–1630
Hironori Takemoto (Chiba Institute of Technology), Goto Tsubasa (Chiba Institute of Technology), Yuta Hagihara (Chiba Institute of Technology), Sayaka Hamanaka (Chiba Institute of Technology), Tatsuya Kitamura (Konan University), Yukiko Nota (The National Institute for Japanese Language), Kikuo Maekawa (The National Institute for Japanese Language)
CNN-based phoneme classifier from vocal tract MRI learns embedding consistent with articulatory topology
Poster; 1430–1630
Kicky van Leeuwen (Netherlands Cancer Institute, University of Twente), Paula Bos (Netherlands Cancer Institute), Stefano Trebeschi (Netherlands Cancer Institute), Maarten van Alphen (Netherlands Cancer Institute), Luuk Voskuilen (Netherlands Cancer Institute), Ludi Smeele (Netherlands Cancer Institute), Ferdinand van der Heijden (University of Twente, Netherlands Cancer Institute), Rob van Son (Netherlands Cancer Institute, University of Amsterdam)
Strength and structure: Coupling tones with oral constriction gestures
Poster; 1430–1630
Doris Muecke (IfL Phonetics, University of Cologne), Anne Hermes (Laboratoire de Phonétique et Phonologie (CNRS/Sorbonne Nouvelle)), Sam Tilsen (Cornell University, Ithaca)
Towards a Speaker Independent Speech-BCI Using Speaker Adaptation
Poster; 1430–1630
Debadatta Dash (The University of Texas at Dallas), Alan Wisler (University of Texas at Dallas), Paul Ferrari (University of Texas at Austin), Jun Wang (University of Texas at Dallas)
Identifying input features for development of real-time translation of neural signals to text
Poster; 1430–1630
Janaki Sheth (Dept. of Physics and Astronomy, University of California, Los Angeles), Ariel Tankus (Dept. of Neurology and Neurosurgery, Tel Aviv University, Tel Aviv), Michelle Tran (Dept. of Neurosurgery, University of California, Los Angeles), Lindy Comstock (Dept. of Linguistics, University of California, Los Angeles), Itzhak Fried (Dept. of Neurosurgery, University of California, Los Angeles), William Speier (Dept. of Radiological Sciences, University of California, Los Angeles)
Exploring Critical Articulator Identification from 50Hz RT-MRI Data of the Vocal Tract
Poster; 1430–1630
Samuel Silva (IEETA - University of Aveiro), António Teixeira (DETI/IEETA, University of Aveiro), Conceição Cunha (IPS, University of Munich), Nuno Almeida (DETI/IEETA, University of Aveiro), Arun Joseph (Max-Planck-Institut für Biophysikalische Chemie), Jens Frahm (Max-Planck-Institut für Biophysikalische Chemie)
Towards a method of dynamic vocal tract shapes generation by combining static 3D and dynamic 2D MRI speech data
Poster; 1430–1630
Ioannis Douros (Université de Lorraine, CNRS, Inria, LORIA, Inserm, IADI, F-54000 Nancy, France), Anastasiia Tsukanova (Université de Lorraine, CNRS, Inria, LORIA, F-54000 Nancy, France), Karyna Isaieva (IADI, Université de Lorraine, INSERM U1254), Pierre-André Vuissoz (Université de Lorraine, INSERM U1254, IADI, F-54000 Nancy, France), Yves Laprie (LORIA/CNRS)
Temporal coordination of articulatory and respiratory events prior to speech initiation
Poster; 1430–1630
Oksana Rasskazova (ZAS), Christine Mooshammer (Institut für deutsche Sprache und Linguistik, Humboldt-Universität zu Berlin), Susanne Fuchs (ZAS Berlin)
Zooming in on Spatiotemporal V-to-C Coarticulation with Functional PCA
Poster; 1430–1630
Michele Gubian (Institute of Phonetics and Speech Processing, Ludwig Maximilian University), Manfred Pastätter (Institute of Phonetics and Speech Processing, Ludwig Maximilian University of Munich), Marianne Pouplier (Institute of Phonetics and Speech Processing, Ludwig Maximilian University of Munich)
Ultrasound-based Silent Speech Interface Built on a Continuous Vocoder
Poster; 1430–1630
Tamás Gábor Csapó (Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics), Mohammed Salah Al-Radhi (Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics), Géza Németh (Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics), Gábor Gosztolya (Research Group on Artificial Intelligence), Tamás Grósz (Institute of Informatics, University of Szeged), László Tóth (MTA-SZTE Research Group on Artificial Intelligence), Alexandra Markó (Department of Phonetics, Eötvös Loránd University)
Assessing acoustic and articulatory dimensions of speech motor adaptation with random forests
Poster; 1430–1630
Eugen Klein (Humboldt-Universität zu Berlin), Jana Brunner (Humboldt-Universität zu Berlin), Phil Hoole (Ludwig-Maximilians-Universität München)

Dialogue speech understanding[Mon-P-2-B]
Monday, 16 September, Gallery B

Mitigating Noisy Inputs for Question Answering
Poster; 1430–1630
Denis Peskov (University of Maryland), Joe Barrow (University of Marlyand), Pedro Rodriguez (University of Maryland), Graham Neubig (Carnegie Mellon University), Jordan Boyd-Graber (University of Maryland)
Topic-Aware Dialogue Speech Recognition with Transfer Learning
Poster; 1430–1630
Yuanfeng Song (Hong Kong University of Science and Technology, WeBank Co., Ltd), Di Jiang (WeBank Co., Ltd), Xueyang Wu (Hong Kong University of Science and Technology), Qian Xu (WeBank Co., Ltd), Raymond Chi-Wing Wong (Hong Kong University of Science and Technology), Qiang Yang (Hong Kong University of Science and Technology, WeBank Co., Ltd)
Improving Conversation-Context Language Models with Multiple Spoken Language Understanding Models
Poster; 1430–1630
Ryo Masumura (NTT Corporation), Tomohiro Tanaka (NTT Corporation), Atsushi Ando (NTT Corporation), Hosana Kamiyama (NTT Corporation), Takanobu Oba (NTT Media Intelligence Laboratories, NTT Corporation), Satoshi Kobashikawa (NTT Corporation), Yushi Aono (NTT Corporation)
Meta Learning for Hyperparameter Optimization in Dialogue System
Poster; 1430–1630
Jen-Tzung Chien (National Chiao Tung University), Wei Xiang Lieow (National Chiao Tung University)
Zero Shot Intent Classification Using Long-Short Term Memory Networks
Poster; 1430–1630
Kyle Williams (Microsoft)
A Comparison of Deep Learning Methods for Language Understanding
Poster; 1430–1630
Mandy Korpusik (Massachusetts Institute of Technology), Zoe Liu (Massachusetts Institute of Technology), James Glass (Massachusetts Institute of Technology)
Slot Filling with Weighted Multi-Encoders for Out-of-Domain Values
Poster; 1430–1630
Yuka Kobayashi (Corporate Research & Development Center, Toshiba Corporation), Takami Yoshida (Corporate Research & Development Center, Toshiba Corporation), Kenji Iwata (Corporate Research & Development Center, Toshiba Corporation), Hiroshi Fujimura (Corporate Research & Development Center, Toshiba Corporation)
One-vs-All Models for Asynchronous Training: An Empirical Analysis
Poster; 1430–1630
Aman Alok (Amazon.com Inc.), Rahul Gupta (Amazon.com), Sankaranarayanan Ananthakrishnan (Raytheon BBN Technologies)
Adapting a FrameNet Semantic Parser for Spoken Language Understanding using Adversarial Learning
Poster; 1430–1630
Gabriel Marzinotto (Orange Labs, Aix Marseille Univ, CNRS, LIS), Géraldine Damnati (Orange Labs), Frédéric Béchet (Aix Marseille Université - LIS/CNRS)
M2H-GAN: A GAN-based Mapping from Machine to Human Transcripts for Speech Understanding
Poster; 1430–1630
Titouan Parcollet (University of Avignon), Mohamed Morchid (University of Avignon), Xavier Bost (ORKIS), Georges Linares (LIA, University of Avignon)
Ultra-Compact NLU: Neuronal Network Binarization as Regularization
Poster; 1430–1630
Munir Georges (Intel), Krzysztof Czarnowski (Intel), Tobias Bocklet (University Erlangen-Nuremberg)
Speech Model Pre-training for End-to-End Spoken Language Understanding
Poster; 1430–1630
Loren Lugosch (Université de Montréal / Mila), Mirco Ravanelli (Université de Montréal), Patrick Ignoto (Fluent.ai Inc), Vikrant Singh Tomar (Fluent.ai Inc), Yoshua Bengio (U. Montreal)
Spoken Language Intent Detection using Confusion2Vec
Poster; 1430–1630
Prashanth Gurunath Shivakumar (University of Southern California), Mu Yang (University of Southern California), Panayiotis Georgiou (Univ. Southern California)
Investigating Adaptation and Transfer Learning for End-to-End Spoken Language Understanding from Speech
Poster; 1430–1630
Natalia Tomashenko (LIA, University of Avignon), Antoine Caubrière (LIUM, University of Le Mans), Yannick Estève (LIA - Avignon University)

Neural techniques for voice conversion and waveform generation[Mon-P-2-C]
Monday, 16 September, Gallery C

Non-parallel Voice Conversion using Weighted Generative Adversarial Networks
Poster; 1430–1630
Dipjyoti Paul (University of Crete, Greece), Yannis Pantazis (Institute of Applied and Computational Mathematics, FORTH, Greece), Yannis Stylianou (Univ of Crete)
One-shot voice conversion with disentangled representations by leveraging phonetic posteriorgrams
Poster; 1430–1630
Seyed Hamidreza Mohammadi (Oregon Health and Science Univ.), Taehwan Kim (ObEN, INC./ Caltech)
Investigation of F0 conditioning and Fully Convolutional Networks in Variational Autoencoder based Voice Conversion
Poster; 1430–1630
Wen Chin Huang (Nagoya University), Yi-Chiao Wu (Nagoya University), Chen-Chou Lo (Academia Sinica), Patrick Lumban Tobing (Nagoya University), Tomoki Hayashi (Nagoya University), Kazuhiro Kobayashi (Nagoya University), Tomoki Toda (Nagoya University), Yu Tsao (Academia Sinica), Hsin-Min Wang (Academia Sinica)
Jointly Trained Conversion Model and WaveNet Vocoder for Non-parallel Voice Conversion using Mel-spectrograms and Phonetic Posteriorgrams
Poster; 1430–1630
Songxiang Liu (CUHK), Yuewen Cao (The Chinese University of HongKong), Xixin Wu (The Chinese University of Hong Kong), Lifa Sun (The Chinese University of Hong Kong), Xunying Liu (Chinese University of Hong Kong), Helen Meng (The Chinese University of Hong Kong)
Generative Adversarial Networks for Unpaired Voice Transformation on Impaired Speech
Poster; 1430–1630
Li-Wei Chen (National Taiwan University), Hung-Yi Lee (National Taiwan University), Yu Tsao (Academia Sinica)
Group Latent Embedding for Vector Quantized Variational Autoencoder in Non-Parallel Voice Conversion
Poster; 1430–1630
Shaojin Ding (Texas A&M University), Ricardo Gutierrez-Osuna (Texas A&M University)
One-shot Voice Conversion by Separating Speaker and Content Representations with Instance Normalization
Poster; 1430–1630
Ju-chieh Chou (National Taiwan University (NTU)), Hung-yi Lee (National Taiwan University (NTU))
One-shot Voice Conversion with Global Speaker Embeddings
Poster; 1430–1630
Hui Lu (Tsinghua University), Zhiyong Wu (Tsinghua University), Dongyang Dai (Tsinghua University), Runnan Li (Tsinghua University (THU)), Shiyin Kang (Tencent), Jia Jia (Tsinghua University), Helen Meng (The Chinese University of Hong Kong)
Non-Parallel Voice Conversion with Cyclic Variational Autoencoder
Poster; 1430–1630
Patrick Lumban Tobing (Nagoya University), Yi-Chiao Wu (Nagoya University), Tomoki Hayashi (Nagoya University), Kazuhiro Kobayashi (Nagoya University), Tomoki Toda (Nagoya University)
StarGAN-VC2: Rethinking Conditional Methods for StarGAN-Based Voice Conversion
Poster; 1430–1630
Takuhiro Kaneko (NTT Communication Science Laboratories), Hirokazu Kameoka (NTT Communication Science Laboratories), Kou Tanaka (NTT corporation), Nobukatsu Hojo (NTT)
Robustness of Statistical Voice Conversion based on Direct Waveform Modification against Background Sounds
Poster; 1430–1630
Yusuke Kurita (Nagoya University), Kazuhiro Kobayashi (Nagoya University), Kazuya Takeda (Nagoya University), Tomoki Toda (Nagoya University)
Fast Learning for Non-Parallel Many-to-Many Voice Conversion with Residual Star Generative Adversarial Networks
Poster; 1430–1630
Shengkui Zhao (Machine Intelligence Technology, Alibaba Group), Trung Hieu Nguyen (Machine Intelligence Technology, Alibaba Group), Hao Wang (Machine Intelligence Technology, Alibaba Group), Bin Ma (Machine Intelligence Technology, Alibaba Group)
GELP: GAN-Excited Linear Prediction for Speech Synthesis from Mel-spectrogram
Poster; 1430–1630
Lauri Juvela (Aalto University), Bajibabu Bollepalli (Aalto University), Junichi Yamagishi (National Institute of Informatics), Paavo Alku (Aalto University)
Probability density distillation with generative adversarial networks for high-quality parallel waveform generation
Poster; 1430–1630
Ryuichi Yamamoto (LINE Corp.), Eunwoo Song (NAVER Corp.), Jae Min Kim (NAVER Corp.)

Speech Signal Characterization 2[Mon-P-2-D]
Monday, 16 September, Hall 10/D

Salient Speech Representations based on Cloned Networks
Poster; 1430–1630
Bastiaan Kleijn (Victoria University of Wellington), Felicia Lim (Google), Michael Chinen (Google), Jan Skoglund (Google)
Low resource automatic intonation classification using gated recurrent unit (GRU) networks pre-trained with synthesized pitch patterns
Poster; 1430–1630
Atreyee Saha (Department of EE, Jadavpur University), Chiranjeevi Yarra (Indian Institute of Science), Prasanta Ghosh (Indian Institute of Science)
ASR inspired syllable stress detection for pronunciation evaluation without using a supervised classifier and syllable level features
Poster; 1430–1630
Manoj Kumar Ramanathi (Indian Institute of Science), Chiranjeevi Yarra (Indian Institute of Science), Prasanta Ghosh (Indian Institute of Science)
Acoustic and articulatory feature based speech rate estimation using a convolutional dense neural network
Poster; 1430–1630
Renuka Mannem (Indian Institute of Science), Jhansi Mallela (Rajiv Gandhi University of Knowledge Technologies, Kadapa), Aravind Illa (Indian Institute of Science), Prasanta Ghosh (Indian Institute of Science)
Predictive Auxiliary Variational Autoencoder for Representation Learning of Global Speech Characteristics
Poster; 1430–1630
Sebastian Springenberg (University of Hamburg), Egor Lakomkin (University of Hamburg), Cornelius Weber (University of Hamburg), Stefan Wermter (University of Hamburg)
Unsupervised low-rank representations for speech emotion recognition
Poster; 1430–1630
Georgios Paraskevopoulos (National Technical University of Athens), Efthymios Tzinis (School of Electrical & Computer Engineering, National Technical University of Athens, Greece; Behavioral Signals Technologies, Los Angeles, CA, USA), Nikolaos Ellinas (School of Electrical & Computer Engineering, National Technical University of Athens, Greece), Theodoros Giannakopoulos (Behavioral Signals), Alexandros Potamianos (National Technical University of Athens)
On the Suitability of the Riesz Spectro-Temporal Envelope for WaveNet Based Speech Synthesis
Poster; 1430–1630
Jitendra Dhiman (Indian Institute of Science Bangalore), Nagaraj Adiga (University of Crete), Chandra Sekhar Seelamantula (Indian Institute of Science, Bangalore)
Autonomous emotion learning in speech: A view of zero-shot speech emotion recognition
Poster; 1430–1630
Xinzhou Xu (Nanjing University of Posts and Telecommunications), Jun Deng (Agile Robots AG), Nicholas Cummins (University of Augsburg), Zixing Zhang (Imperial College London), Li Zhao (Southeast University), Björn Schuller (University of Augsburg / Imperial College London)
An improved goodness of pronunciation (GoP) measure for pronunciation evaluation with DNN-HMM system considering HMM transition probabilities
Poster; 1430–1630
Sweekar Sudhakara (Indian Institute of Science), Manoj Kumar Ramanathi (Indian Institute of Science), Chiranjeevi Yarra (Indian Institute of Science), Prasanta Ghosh (Indian Institute of Science)

Model adaptation for ASR[Mon-P-2-E]
Monday, 16 September, Hall 10/E

Exploiting semi-supervised training through a dropout regularization in end-to-end speech recognition
Poster; 1430–1630
Subhadeep Dey (IDIAP), Petr Motlicek (Idiap Research Institute), Trung Bui (Adobe Research), Franck Dernoncourt (Adobe Research)
A Multi-Accent Acoustic Model using Mixture of Experts for Speech Recognition
Poster; 1430–1630
Abhinav Jain (Samsung Research Institute - Bangalore, India), Vishwanath P Singh (Samsung Research Institute - Bangalore), Shakti Rath (Samsung Research Institute Bangalore)
Personalizing ASR for Dysarthric and Accented Speech with Limited Data
Poster; 1430–1630
Joel Shor (Google), Dotan Emanuel (Google), Oran Lang (Google), Omry Tuval (Google), Michael Brenner (Google), Julie Cattiau (Google), Fernando Vieira (ALS Therapy Development Institute), Maeve McNally (ALS Therapy Development Institute), Taylor Charbonneau (ALS Therapy Development Institute), Melissa Nollstadt (ALS Therapy Development Institute), Avinatan Hassidim (Google), Yossi Matias (Google)
Improved vocal tract length perturbation for a state-of-the-art end-to-end speech recognition system
Poster; 1430–1630
Chanwoo Kim (Samsung Research), Minkyu Shin (Samsung Research), Abhinav Garg (Samsung Research), Dhananjaya Gowda (Samsung Research)
Multi-Accent Adaptation based on Gate Mechanism
Poster; 1430–1630
Han Zhu (University of Chinese Academy of Sciences), Li Wang (Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics), Pengyuan Zhang (University of Chinese Academy of Sciences), Yonghong Yan (University of Chinese Academy of Sciences)
Unsupervised Adaptation with Adversarial Dropout Regularization for Robust Speech Recognition
Poster; 1430–1630
Pengcheng Guo (Northwestern Polytechnical University), Sining Sun (Northwestern Polytechnical University), Lei Xie (Northwestern Polytechnical University)
Cumulative Adaptation for BLSTM Acoustic Models
Poster; 1430–1630
Markus Kitza (RWTH Aachen University), Pavel Golik (Human Language Technology and Pattern Recognition, Computer Science Department, RWTH Aachen University, 52056 Aachen, Germany), Ralf Schlüter (Lehrstuhl Informatik 6, RWTH Aachen University), Hermann Ney (RWTH Aachen University)
Fast DNN Acoustic Model Speaker Adaptation by Learning Hidden Unit Contribution Features
Poster; 1430–1630
Xurong Xie (Chinese University of Hong Kong), Xunying Liu (Chinese University of Hong Kong), Tan Lee (The Chinese University of Hong Kong), Lan Wang (SIAT)
End-to-end Adaptation with Backpropagation through WFST for On-device Speech Recognition System
Poster; 1430–1630
Emiru Tsunoo (Sony Corporation), Yosuke Kashiwagi (Sony Corporation), Satoshi Asakawa (Sony Corporation), Toshiyuki Kumakura (Sony Corporation)
Learning Speaker Aware Offsets for Speaker Adaptation of Neural Networks
Poster; 1430–1630
Leda Sari (University of Illinois at Urbana-Champaign), Samuel Thomas (IBM TJ Watson Research Center), Mark Hasegawa-Johnson (University of Illinois)
An Investigation Into On-device Personalization of End-to-end Automatic Speech Recognition Models
Poster; 1430–1630
Khe Chai Sim (Google Inc), Petr Zadrazil (Google), Françoise Beaufays (Google)

Applications in Language Learning and Healthcare[Mon-S&T-1]
Monday, 16 September, Hall 4

Apkinson: a Mobile Solution for Multimodal Assessment of Patients with Parkinson’s Disease
Show&Tell; 1430–1630
Juan Vasquez-Correa (Pattern Recognition Lab, Friedrich-Alexander-Universität, Erlangen, Germany), T. Arias-Vergara, P. Klumpp, M. Strauss, A. Küderle, N. Roth, S. Bayerl, N. García-Ospina, P. A. Perez-Toro, L. F. Parra-Gallego, C. D. Rios-Urrego, D. Escobar-Grisales, J. R. Orozco-Arroyave, B. Eskofier, E. Nöth
Depression State Assessment: Application for detection of depression by speech
Show&Tell; 1430–1630
Gábor Kiss (Budapest University of Technology and Economics), Dávid Sztahó, Klára Vicsi
SPIRE-fluent: A self-learning app for tutoring oral fluency to second language English learners
Show&Tell; 1430–1630
Chiranjeevi Yarra (Electrical Engineering, Indian Institute of Science (IISc), Bangalore 560012), Aparna Srinivasan, Sravani Gottimukkala, Prasanta Kumar Ghosh
Using Real-Time Visual Biofeedback for Second Language Instruction
Show&Tell; 1430–1630
Shawn Nissen (Department of Communication Disorders, Brigham Young University, Provo, Utah), Rebecca Nissen
Splash: Speech and Language Assessment in Schools and Homes
Show&Tell; 1430–1630
A. Mirawdeli (Education Department, Cambridge University), I. Gallagher, J. Gibson, N. Katsos, K. Knill, H. Wood
Using Ultrasound Imaging to Create Augmented Visual Biofeedback for Articulatory Practice
Show&Tell; 1430–1630
Colin T. Annand (University of Cincinnati), Maurice Lamb, Sarah Dugan, Sarah R. Li, Hannah M. Woeste, T. Douglas Mast, Michael A. Riley, Jack A. Masterson, Neeraja Mahalingam, Kathryn J. Eary, Caroline Spencer, Suzanne Boyce, Stephanie Jackson, Anoosha Baxi, Reneé Seward
Speech based web navigation for movement impaired users
Show&Tell; 1430–1630
Vasiliy Radostev (Microsoft AI&R, Bing), Serge Berger, Pasha Kamyshev

Dynamics of Emotional Speech Exchanges in Multimodal Communication[Mon-SS-2-6]
Monday, 16 September, Hall 3 [More info]

The Dependability of Voice on Elders’ Acceptance of Humanoid Agents
Oral; 1430–1442
Anna Esposito (Università degli Studi della Campania), Terry Amorese (Università degli Studi della Campania), Marialucia Cuciniello (Università degli Studi della Campania), Maria Teresa Riviello (Università degli Studi della Campania), Antonietta M. Esposito (Istituto Nazionale di Geofisica e Vulcanologia), Alda Troncone (Università degli Studi della Campania), Gennaro Cordasco (Università degli Studi della Campania “Luigi Vanvitelli”)
God as interlocutor - real or imaginary? Prosodic markers of dialogue speech and expected efficacy in spoken prayer
Oral; 1442–1454
Oliver Niebuhr (University of Southern Denmark), Uffe Schjoedt (Aarhus University)
Expressiveness influences human vocal alignment toward voice-AI
Oral; 1454–1506
Michelle Cohn (University of California, Davis), Georgia Zellou (UC Davis)
Detecting Topic-Oriented Speaker Stance in Conversational Speech
Oral; 1506–1518
Catherine Lai (University of Edinburgh), Beatrice Alex (University of Edinburgh, Edinburgh Futures Institute, School of Literatures, Languages and Cultures, School of Informatics), Johanna Moore (University of Edinburgh), Leimin Tian (Monash University), Tatsuro Hori (Toyota Motor Corporation), Gianpiero Francesca (Toyota Motor Corporation)
Fusion Techniques for Utterance-Level Emotion Recognition Combining Speech and Transcripts
Oral; 1518–1530
Jilt Sebastian (TelepathyLabs GmbH), Piero Pierucci (Telepathy Labs GmbH)
Explaining Sentiment Classification
Oral; 1530–1542
Marvin Rajwadi (Intelligent Voice, University of East London), Cornelius Glackin (Intelligent Voice), Julie Wall (University of East London), Nigel Cannings (Intelligent Voice), Gerard Chollet (Intelligent Voice)
Predicting Group-Level Skin Attention to Short Movies from Audio-Based LSTM-Mixture of Experts Models
Oral; 1542–1554
Ricardo Kleinlein (Universidad Politecnica de Madrid), Cristina Luna-Jiménez (Universidad Politécnica de Madrid), Juan Manuel Montero (Universidad Politecnica de Madrid), Zoraida Callejas (University of Granada), Fernando Fernández-Martínez (Universidad Politecnica de Madrid)
Discussion
Oral; 1554–1630

Coffee break in both exhibition foyers, lower and upper level 1[Mon-B-3]
Monday, 16 September, Foyer

Coffee break in both exhibition foyers, lower and upper level 1
Break; 1630–1700

ISCA General Assembly[Mon-G-2]
Monday, 16 September, Main Hall

ISCA General Assembly
General; 1700–1830

Welcome Reception[Mon-S-2]
Monday, 16 September, Messepark

Welcome Reception
Social; 1830–2030

Speaker Check-in[Tue-C]
Tuesday, 17 September, Room 8

Speaker Check-in
Check-In; 0800–1700

Registration[Tue-R]
Tuesday, 17 September, Foyer

Registration
Registration; 0800–1700

Keynote 2: Tanja Schultz[Tue-K-2]
Tuesday, 17 September, Main Hall

Keynote
Biosignal processing for human-machine interaction [More info]
Keynote; 0830–0930
Tanja Schultz (University of Bremen)

Coffee break in both exhibition foyers, lower and upper level 1[Tue-B-1]
Tuesday, 17 September, Foyer

Coffee break in both exhibition foyers, lower and upper level 1
Break; 1030–1100

Speech Translation[Tue-O-3-1]
Tuesday, 17 September, Main Hall

Survey Talk
Survey talk: A survey on Speech Translation [More info]
Survey Talk; 1000–1040
Jan Niehues (Maastricht University)
Direct speech-to-speech translation with a sequence-to-sequence model
Oral; 1040–1100
Ye Jia (Google), Ron Weiss (Google Brain), Fadi Biadsy (Google Inc), Wolfgang Macherey (Google), Melvin Johnson (Google), Zhifeng Chen (Google), Yonghui Wu (Google)
End-to-End Speech Translation with Knowledge Distillation
Oral; 1100–1120
Yuchen Liu (National Laboratory of Pattern Recognition, CASIA), Hao Xiong (Baidu Inc.), Jiajun Zhang (Institute of Automation, Chinese Academy of Sciences), Zhongjun He (Baidu, Inc.), Hua Wu (Baidu), Haifeng Wang (Baidu), Chengqing Zong (Institute of Automation, Chinese Academy of Sciences)
Adapting Transformer to End-to-End Spoken Language Translation
Oral; 1120–1140
Mattia Di Gangi (Fondazione Bruno Kessler), Matteo Negri (Fondazione Bruno Kessler), Marco Turchi (Fondazione Bruno Kessler)
Unsupervised phonetic and word level discovery for speech to speech translation for unwritten languages
Oral; 1140–1200
Steven Hillis (Carnegie Mellon University), Anushree Kumar (Carnegie Mellon University), Alan W Black (Carnegie Mellon University)

Speaker Recognition I[Tue-O-3-2]
Tuesday, 17 September, Hall 1

Deep Speaker Recognition: Modular or Monolithic?
Oral; 1000–1020
Gautam Bhattacharya (McGill University), Md Jahangir Alam (ETS/CRIM), Patrick Kenny (CRIM)
On the Usage of Phonetic Information for Text-independent Speaker Embedding Extraction
Oral; 1020–1040
Shuai Wang (Shanghai Jiao Tong University), Johan Rohdin (Brno University of Technology), Lukas Burget (Brno University of Technology), Oldrich Plchot (Brno University of Technology), Yanmin Qian (Shanghai Jiao Tong University), Kai Yu (Shanghai Jiao Tong University), Jan Černocký (Brno University of Technology)
Learning Speaker Representations with Mutual Information
Oral; 1040–1100
Mirco Ravanelli (Université de Montréal), Yoshua Bengio (Mila)
Multi-Task Learning with High-Order Statistics for X-vector based Text-Independent Speaker Verification
Oral; 1100–1120
Lanhua You (University of Science and Technology of China), Wu Guo (University of Science and Technology of China), Lirong Dai (University of Science & Technology of China), Jun Du (University of Science and Technology of China)
Data Augmentation using Variational Autoencoder for Embedding based Speaker Verification
Oral; 1120–1140
Zhanghao Wu (Shanghai Jiao Tong University), Shuai Wang (Shanghai Jiao Tong University), Yanmin Qian (Shanghai Jiao Tong University), Kai Yu (Shanghai Jiao Tong University)
Deep Neural Network Embeddings with Gating Mechanisms for Text-Independent Speaker Verification
Oral; 1140–1200
Lanhua You (University of Science and Technology of China), Wu Guo (University of Science and Technology of China), Lirong Dai (University of Science &Technology of China), Jun Du (University of Science and Technology of China)

Dialogue Understanding[Tue-O-3-3]
Tuesday, 17 September, Hall 2

Neural Transition Systems for Modeling Hierarchical Semantic Representations
Oral; 1000–1020
Riyaz A. Bhat (International Institute of Information Technology), John Chen (Interactions LLC), Rashmi Prasad (Interactions LLC), Srinivas Bangalore (Interactions LLC)
Mining Polysemous Triplets with Recurrent Neural Networks for Spoken Language Understanding
Oral; 1020–1040
Vedran Vukotic (LAMARK), Christian Raymond (INSA Rennes / IRISA)
Iterative Delexicalization for Improved Spoken Language Understanding
Oral; 1040–1100
Avik Ray (Samsung Research America), Yilin Shen (Samsung Research America), Hongxia Jin (Samsung Research America)
End-to-End Spoken Language Understanding: Bootstrapping in Low Resource Scenarios
Oral; 1100–1120
Swapnil Bhosale (Walchand College of Engineering), Imran Sheikh (TCS Research and Innovation), Sri Harsha Dumpala (TCS Research and Innovation, Mumbai), Sunil Kumar Kopparapu (TCS Research and Innovation, Mumbai)
Recognition of Intentions of Users' Short Responses for Conversational News Delivery System
Oral; 1120–1140
Hiroaki Takatsu (Waseda University), Katsuya Yokoyama (Waseda University), Yoichi Matsuyama (Waseda University), Hiroshi Honda (Honda R&D Co., Ltd.), Shinya Fujie (Chiba Institute of Technology), Tetsunori Kobayashi (Waseda University)
Curriculum-based transfer learning for an effective end-to-end spoken language understanding and domain portability
Oral; 1140–1200
Antoine Caubrière (LIUM, University of Le Mans), Natalia Tomashenko (LIA, University of Avignon), Antoine Laurent (LIUM, Laboratoire Informatique Université du Mans), Emmanuel Morin (LS2N UMR CNRS 6004), Nathalie Camelin (LIUM, University of Le Mans), Yannick Estève (LIA, Avignon University)

Speech in the Brain[Tue-O-3-4]
Tuesday, 17 September, Hall 11

Spatial and Spectral Fingerprint in the Brain: Speaker Identification from Single Trial MEG Signals
Oral; 1000–1020
Debadatta Dash (The University of Texas at Dallas), Paul Ferrari (University of Texas at Austin), Jun Wang (University of Texas at Dallas)
ERP signal analysis with temporal resolution using a time window bank
Oral; 1020–1040
Annika Nijveld (Centre for Language Studies, Radboud University), Louis ten Bosch (Radboud University Nijmegen), Mirjam Ernestus (Radboud University Nijmegen)
Phase synchronization between EEG signals as a function of differences between stimuli characteristics
Oral; 1040–1100
Louis ten Bosch (Radboud University Nijmegen), Kimberley Mulder (Center for Language Studies, Radboud University, Nijmegen), Lou Boves (Centre for Language and Speech Technology, Radboud University Nijmegen)
The processing of prosodic cues to rhetorical question interpretation: Psycholinguistic and neurolinguistic evidence
Oral; 1100–1120
Mariya Kharaman (University of Konstanz), Manluolan Xu (NA), Carsten Eulitz (University of Konstanz), Bettina Braun (University of Konstanz)
The Neural Correlates Underlying Lexically-guided Perceptual Learning
Oral; 1120–1140
Odette Scharenborg (Multimedia Computing, Delft University of Technology), Jiska Koemans (Radboud University Nijmegen), Cybelle Smith (Department of Psychology, University of Illinois at Urbana-Champaign), Mark Hasegawa-Johnson (University of Illinois), Kara Federmeier (Department of Psychology & Beckman Institute, University of Illinois at Urbana-Champaign)
Speech Quality Evaluation of Synthesized Japanese Speech using EEG
Oral; 1140–1200
Ivan Halim Parmonangan (Nara Institute of Science and Technology), Hiroki Tanaka (Nara Institute of Science and Technology), Sakriani Sakti (Nara Institute of Science and Technology (NAIST) / RIKEN AIP), Shinnosuke Takamichi (University of Tokyo), Satoshi Nakamura (Nara Institute of Science and Technology)

Far-field Speech Recognition[Tue-O-3-5]
Tuesday, 17 September, Hall 12

Multi-Microphone Adaptive Noise Cancellation for Robust Hotword Detection
Oral; 1000–1020
Yiteng Huang (Google Inc.), Turaj Shabestary (Google), Alexander Gruenstein (Google), Li Wan (Google)
Multi-Task Multi-Network Joint-Learning of Deep Residual Networks and Cycle-Consistency Generative Adversarial Networks for Robust Speech Recognition
Oral; 1020–1040
Shengkui Zhao (Machine Intelligence Technology, Alibaba Group), Chongjia Ni (Machine Intelligence Technology, Alibaba Group), Rong Tong (Machine Intelligence Technology, Alibaba Group), Bin Ma (Machine Intelligence Technology, Alibaba Group)
R-vectors: New Technique for Adaptation to Room Acoustics
Oral; 1040–1100
Yuri Khokhlov (STC-innovations Ltd), Alexander Zatvornitskiy (Speech Technology Center, Ltd), Ivan Medennikov (STC-innovations Ltd, ITMO University), Ivan Sorokin (STC-innovations Ltd), Tatiana Prisyach (STC-innovations Ltd), Aleksei Romanenko (ITMO University), Anton Mitrofanov (STC-innovations Ltd), Vladimir Bataev (STC-innovations Ltd), Andrei Andrusenko (STC-innovations Ltd), Mariya Korenevskaya (STC-innovations Ltd), Oleg Petrov (ITMO University, Speech Technology Center Ltd)
Guided Source Separation Meets a Strong ASR Backend: Hitachi/Paderborn University Joint Investigation for Dinner Party ASR
Oral; 1100–1120
Naoyuki Kanda (Hitachi, Ltd.), Christoph Boeddeker (Paderborn University), Jens Heitkämper (Paderborn University), Yusuke Fujita (Hitachi, Ltd.), Shota Horiguchi (Hitachi, Ltd.), Kenji Nagamatsu (Hitachi, Ltd.), Reinhold Haeb-Umbach (Paderborn University)
Unsupervised training of neural mask-based beamforming
Oral; 1120–1140
Lukas Drude (Paderborn University), Jahn Heymann (Paderborn University), Reinhold Haeb-Umbach (Paderborn University)
Acoustic Model Ensembling Using Effective Data Augmentation for CHiME-5 Challenge
Oral; 1140–1200
Feng Ma (iFlytek Research, Hefei, Anhui, P. R. China), Li Chai (University of Science and Technology of China), Jun Du (University of Science and Technology of China), Diyuan Liu (University of Science and Technology of China), Zhongfu Ye (University of Science and Technology of China), Chin-Hui Lee (Georgia Institute of Technology)

Speech synthesis: data and evaluation[Tue-P-3-A]
Tuesday, 17 September, Gallery A

Investigating the Effects of Noisy and Reverberant Speech in Text-to-Speech Systems
Poster; 1000–1200
David Ayllon (Oben Inc), Héctor A. Sánchez-Hevia (Oben Inc), Carol Figueroa (Oben Inc), Pierre Lanchantin (ObEN Inc)
A Multimodal Real-Time MRI Articulatory Corpus of French for Speech Research
Poster; 1000–1200
Ioannis Douros (Université de Lorraine, CNRS, Inria, LORIA, Inserm, IADI, F-54000 Nancy, France), Jacques Felblinger (Université de Lorraine, INSERM U1254, IADI, F-54000 Nancy, France; Université de Lorraine, INSERM, CIC-IT 1433, CHRU de Nancy, F-54000 Nancy, France), Jens Frahm (Biomedizinische NMR, MPI für biophysikalische Chemie, 37070 Göttingen, Germany), Karyna Isaieva (IADI, Université de Lorraine, INSERM U1254), Arun A. Joseph (Biomedizinische NMR, MPI für biophysikalische Chemie, 37070 Göttingen, Germany), Yves Laprie (Université de Lorraine, CNRS, Inria, LORIA, F-54000 Nancy, France), Freddy Odille (Université de Lorraine, INSERM U1254, IADI, F-54000 Nancy, France; Université de Lorraine, INSERM, CIC-IT 1433, CHRU de Nancy, F-54000 Nancy, France), Anastasiia Tsukanova (Université de Lorraine, CNRS, Inria, LORIA, F-54000 Nancy, France), Dirk Voit (Biomedizinische NMR, MPI für biophysikalische Chemie, 37070 Göttingen, Germany), Pierre-André Vuissoz (Université de Lorraine, INSERM U1254, IADI, F-54000 Nancy, France)
A Chinese Dataset for Identifying Speakers in Novels
Poster; 1000–1200
Jia-Xiang Chen (University of Science and Technology of China), Zhen-Hua Ling (University of Science and Technology of China), Lirong Dai (University of Science & Technology of China)
CSS10: A Collection of Single Speaker Speech Datasets for 10 Languages
Poster; 1000–1200
Kyubyong Park (Kakao Brain), Thomas Mulc (Expedia, Inc.)
Selection and Training Schemes for Improving TTS Voice Built on Found Data
Poster; 1000–1200
Fang-Yu Kuo (ObEN Inc.), Iris Ouyang (ObEN Inc.), Sandesh Aryal (ObEN Inc.), Pierre Lanchantin (ObEN Inc.)
All Together Now: The Living Audio Dataset
Poster; 1000–1200
David Braude (Cereproc Ltd), Matthew Aylett (CereProc Ltd and University of Edinburgh), Caoimhín Laoide-Kemp (CereProc Ltd), Simone Ashby (Madeira ITI | University of Madeira | LARSyS), Kristen Scott (Madeira ITI | University of Madeira), Brian Ó Raghallaigh (Dublin City University), Anna Braudo (University of Edinburgh), Alex Brouwer (University of Edinburgh), Adriana Stan (Communications Department, Technical University of Cluj-Napoca)
LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech
Poster; 1000–1200
Heiga Zen (Google), Viet Dang (Google), Robert Clark (Google, UK), Yu Zhang (Google), Ron Weiss (Google Brain), Ye Jia (Google), Zhifeng Chen (Google), Yonghui Wu (Google)
Corpus Design using Convolutional Auto-Encoder Embeddings for Audio-Book Synthesis
Poster; 1000–1200
Meysam Shamsi (Univ Rennes, IRISA), Damien Lolive (Univ Rennes, IRISA), Nelly Barbot (Univ Rennes, IRISA), Jonathan Chevelu (Univ Rennes, IRISA)
Evaluating Intention Communication by TTS using Explicit Definitions of Illocutionary Act Performance
Poster; 1000–1200
Nobukatsu Hojo (NTT), Noboru Miyazaki (NTT)
MOSNet: Deep Learning based Objective Assessment for Voice Conversion
Poster; 1000–1200
Chen-Chou Lo (Academia Sinica), Szu-wei Fu (Research Center for Information Technology Innovation, Academia Sinica), Wen Chin Huang (Nagoya University), Xin Wang (National Institute of Informatics, Japan), Junichi Yamagishi (National Institute of Informatics), Yu Tsao (Academia Sinica), Hsin-Min Wang (Academia Sinica)
Investigating the robustness of sequence-to-sequence text-to-speech models to imperfectly-transcribed training data
Poster; 1000–1200
Jason Fong (University of Edinburgh), Pilar Oplustil (University of Edinburgh), Zack Hodari (University of Edinburgh), Simon King (University of Edinburgh)
Using pupil dilation to measure cognitive load when listening to text-to-speech in quiet and in noise
Poster; 1000–1200
Avashna Govender (The Centre for Speech Technology Research, University of Edinburgh), Anita E Wagner (Graduate School of Medical Sciences, School of Behavioural and Cognitive Neurosciences, University of Groningen), Simon King (University of Edinburgh)

Model training for ASR[Tue-P-3-B]
Tuesday, 17 September, Gallery B

Attention model for articulatory features detection
Poster; 1000–1200
Ievgen Karaulov (Sciforce), Dmytro Tkanov (Sciforce)
Guiding CTC Posterior Spike Timings for Improved Posterior Fusion and Knowledge Distillation
Poster; 1000–1200
Gakuto Kurata (IBM Research), Kartik Audhkhasi (IBM Research)
Direct Neuron-wise Fusion of Cognate Neural Networks
Poster; 1000–1200
Takashi Fukuda (IBM Research), Masayuki Suzuki (IBM Research), Gakuto Kurata (IBM Research)
Two Tiered Distributed Training Algorithm for Acoustic Modeling
Poster; 1000–1200
Pranav Ladkat (Amazon), Oleg Rybakov (Amazon), Radhika Arava (Amazon), SHK (Hari) Parthasarathi (Amazon), I-Fan Chen (Amazon.com, Inc.), Nikko Strom (Amazon.com)
Exploring the Encoder Layers of Discriminative Autoencoders for LVCSR
Poster; 1000–1200
Pin-Tuan Huang (Institute of Information Science, Academia Sinica, Taipei, Taiwan), Hung-Shin Lee (Institute of Information Science, Academia Sinica, Taipei, Taiwan), Syu-Siang Wang (MOST Joint Research Center for AI Technology and All Vista Healthcare), Kuan-Yu Chen (NTUST), Yu Tsao (Academia Sinica), Hsin-Min Wang (Academia Sinica)
Multi-task CTC Training with Auxiliary Feature Reconstruction for End-to-end Speech Recognition
Poster; 1000–1200
Gakuto Kurata (IBM Research), Kartik Audhkhasi (IBM Research)
Framewise Supervised Training towards End-to-End Speech Recognition Models: First Results
Poster; 1000–1200
Mohan Li (Toshiba (China) R&D Center), Yuanjiang Cao (University of New South Wales), Weicong Zhou (Toshiba (China) R&D Center, China), Min Liu (Toshiba (China) R&D Center)
Unbiased semi-supervised LF-MMI training using dropout
Poster; 1000–1200
Sibo Tong (Idiap Research Institute), Apoorv Vyas (Idiap Research Institute), Philip N. Garner (Idiap Research Institute), Herve Bourlard (Idiap Research Institute & EPFL)
Acoustic Model Optimization Based On Evolutionary Stochastic Gradient Descent with Anchors for Automatic Speech Recognition
Poster; 1000–1200
Xiaodong Cui (IBM T. J. Watson Research Center), Michael Picheny (IBM T. J. Watson Research Center)
Whether To Pretrain DNN or Not?: An Empirical Analysis for Voice Conversion
Poster; 1000–1200
Nirmesh Shah (Dhirubhai Ambani Institute of Information and Communication Technology (DA-IICT), Gandhinagar), Hardik Sailor (DA-IICT), Hemant Patil (DA-IICT, Gandhinagar)
Detection of Glottal Closure Instants from Raw Speech using Convolutional Neural Networks
Poster; 1000–1200
Mohit Goyal (Indian Institute of Technology, Delhi, India), Varun Srivastava (Indian Institute of Technology, Delhi), Prathosh A P (Indian Institute of Technology Delhi)
Lattice-based lightly-supervised acoustic model training
Poster; 1000–1200
Joachim Fainberg (The Centre for Speech Technology Research, University of Edinburgh, United Kingdom), Ondrej Klejch (University of Edinburgh), Steve Renals (University of Edinburgh), Peter Bell (University of Edinburgh)
Comparison of Lattice-Free and Lattice-Based Sequence Discriminative Training Criteria for LVCSR
Poster; 1000–1200
Wilfried Michel (RWTH Aachen University), Ralf Schlüter (RWTH Aachen University), Hermann Ney (RWTH Aachen University)
End-to-End Automatic Speech Recognition with a Reconstruction Criterion Using Speech-to-Text and Text-to-Speech Encoder-Decoders
Poster; 1000–1200
Ryo Masumura (NTT Corporation), Hiroshi Sato (NTT Corporation), Tomohiro Tanaka (NTT Corporation), Takafumi Moriya (NTT Corporation), Yusuke Ijima (NTT corporation), Takanobu Oba (NTT Media Intelligence Laboratories, NTT Corporation)
Char+CV-CTC: combining graphemes and consonant/vowel units for CTC-based ASR using Multitask Learning
Poster; 1000–1200
Abdelwahab Heba (PhD student), Thomas Pellegrini (Université de Toulouse III, France), Jean-Pierre Lorré (Linagora), Régine Andre-Obrecht (IRIT, Université de Toulouse)

Network Architectures for Emotion and Paralinguistics Recognition[Tue-P-3-C]
Tuesday, 17 September, Gallery C

Deep Hierarchical Fusion with application in Sentiment Analysis
Poster; 1000–1200
Efthymios Georgiou (School of ECE, National Technical University of Athens), Charilaos Papaioannou (National Technical University of Athens), Alexandros Potamianos (National Technical University of Athens)
Towards Robust Speech Emotion Recognition using Deep Residual Networks for Speech Enhancement
Poster; 1000–1200
Andreas Triantafyllopoulos (audEERING GmbH), Gil Keren (University of Augsburg), Johannes Wagner (audEERING GmbH), Ingmar Steiner (audEERING GmbH), Björn Schuller (University of Augsburg / Imperial College London)
Towards Discriminative Representations and Unbiased Predictions: Class-specific Angular Softmax for Speech Emotion Recognition
Poster; 1000–1200
Zhixuan Li (Tsinghua University), Liang He (Tsinghua University), Jingyang Li (Ministry of Public Security, China), Li Wang (Ministry of Public Security, China), Wei-Qiang Zhang (Tsinghua University)
Learning Temporal Clusters Using Capsule Routing for Speech Emotion Recognition
Poster; 1000–1200
Md Asif Jalal (University of Sheffield), Erfan Loweimi (The University of Sheffield), Roger Moore (University of Sheffield), Thomas Hain (University of Sheffield)
Leveraging Acoustic Cues and Paralinguistic Embeddings to Detect Expression from Voice
Poster; 1000–1200
Vikramjit Mitra (Apple Inc.), Sue Booker (Apple Inc.), Erik Marchi (Apple Inc), David Scott Farrar (Apple Inc.), Ute Dorothea Peitz (Apple Inc.), Bridget Cheng (Apple Inc.), Ermine Teves (Apple Inc.), Anuj Mehta (Apple Inc.), Devang Naik (Apple)
Analysis of Deep Learning Architectures for Cross-corpus Speech Emotion Recognition
Poster; 1000–1200
Jack Parry (Speech Graphics), Dimitri Palaz (Speech Graphics), Georgia Clarke (Speech Graphics), Pauline Lecomte (Speech Graphics), Rebecca Mead (Speech Graphics), Michael Berger (Speech Graphics), Gregor Hofer (Speech Graphics)
A Path Signature Approach for Speech Emotion Recognition
Poster; 1000–1200
Bo Wang (University of Oxford, the Alan Turing Institute), Maria Liakata (University of Warwick, the Alan Turing Institute), Hao Ni (University College London, the Alan Turing Institute), Terry Lyons (University of Oxford, the Alan Turing Institute), Alejo Nevado-Holgado (University of Oxford), Kate Saunders (University of Oxford)
Employing Bottleneck and Convolutional Features for Speech-Based Physical Load Detection on Limited Data Amounts
Poster; 1000–1200
Olga Egorow (Otto-von-Guericke University Magdeburg), Tarik Mrech (Fraunhofer Institute for Factory Operation and Automation Magdeburg), Norman Weißkirchen (Otto-von-Guericke-Universität), Andreas Wendemuth (Otto-von-Guericke University)
Speech Emotion Recognition in Dyadic Dialogues with Attentive Interaction Modeling
Poster; 1000–1200
Jinming Zhao (Renmin University of China), Shizhe Chen (Department of Computer Science, Renmin University of China), Jingjun Liang (Renmin University of China), Qin Jin (Renmin University of China)
Predicting Group Performances using a Personality Composite-Network Architecture during Collaborative Task
Poster; 1000–1200
Shun-Chang Zhong (Department of Electrical Engineering, National Tsing Hua University), Yun-Shao Lin (Department of Electrical Engineering, National Tsing Hua University), Chun-Min Chang (Department of Electrical Engineering, National Tsing Hua University), Yi-Ching Liu (College of Management, National Taiwan University), Chi-Chun Lee (Department of Electrical Engineering, National Tsing Hua University)
Enforcing Semantic Consistency for Cross Corpus Valence Regression from Speech using Adversarial Discrepancy Learning
Poster; 1000–1200
Gao-Yi Chao (Department of Electrical Engineering, National Tsing Hua University), Yun-Shao Lin (Department of Electrical Engineering, National Tsing Hua University), Chun-Min Chang (Department of Electrical Engineering, National Tsing Hua University), Chi-Chun Lee (Department of Electrical Engineering, National Tsing Hua University)
Deep Learning of Segment-Level Feature Representation with Multiple Instance Learning for Utterance-Level Speech Emotion Recognition
Poster; 1000–1200
Shuiyang Mao (The Chinese University of Hong Kong), P. C. Ching (The Chinese University of Hong Kong), Tan Lee (The Chinese University of Hong Kong)

Acoustic Phonetics[Tue-P-3-D]
Tuesday, 17 September, Hall 10/D

L2 Pronunciation accuracy and context: a pilot study on the realization of geminates in Italian as L2 by French learners
Poster; 1000–1200
Sonia d'Apolito (CRIL (Centro di Ricerche Interdisciplinare sul Linguaggio), Università del Salento), Barbara Gili Fivela (Università del Salento, CRIL (Centro di Ricerche Interdisciplinare sul Linguaggio))
Neural Network-Based Modeling of Phonetic Durations
Poster; 1000–1200
Xizi Wei (Apple), Melvyn Hunt (Apple), Adrian Skilling (Apple)
An acoustic study of vowel undershoot in a system with several degrees of prominence
Poster; 1000–1200
Janina Mołczanow (University of Warsaw), Beata Lukaszewicz (University of Warsaw), Anna Łukaszewicz (University of Warsaw)
A preliminary study of charismatic speech on YouTube: correlating prosodic variation with counts of subscribers, views and likes
Poster; 1000–1200
Stephanie Berger (University of Kiel), Oliver Niebuhr (University of Southern Denmark), Margaret Zellers (University of Kiel)
Phonetic Detail Encoding in Explaining the Size of Speech Planning Window
Poster; 1000–1200
Shan Luo (Yangzhou University)
Acoustic cues to topic and narrow focus in Egyptian Arabic
Poster; 1000–1200
Dina ElZarka (University of Graz), Barbara Schuppler (SPSC Laboratory, Graz University of Technology), Francesco Cangemi (IfL Phonetik - University of Cologne)
An Acoustic and Articulatory Study of Ewe Vowels: A Comparative Study of Male and Female
Poster; 1000–1200
Kowovi Comivi Alowonou (College of Intelligence and Computing, Tianjin University, Tianjin, China), Jianguo Wei (College of Intelligence and Computing, Tianjin University, Tianjin, China), Wenhuan Lu (College of Intelligence and Computing Tianjin University, Tianjin, China), Zhicheng Liu (College of Intelligence and Computing Tianjin University, Tianjin, China), Kiyoshi Honda (Tianjin University), Jianwu Dang (JAIST)
The Monophthongs of Formal Nigerian English: An Acoustic Analysis
Poster; 1000–1200
Nisad Jamakovic (University of Münster), Robert Fuchs (Englisches Seminar, Westfälische Wilhelms-Universität Münster)
Quantifying fundamental frequency modulation as a function of language, speaking style and speaker
Poster; 1000–1200
Pablo Arantes (São Carlos Federal University), Anders Eriksson (Stockholm University, Department of Linguistics)
The voicing contrast in stops and affricates in the Western Armenian of Lebanon
Poster; 1000–1200
Niamh Kelly (American University of Beirut), Lara Keshishian (American University of Beirut)
“Gra[f]e!” Word-final devoicing of obstruents in Standard French: An acoustic study based on large corpora
Poster; 1000–1200
Adèle Jatteau (LIMSI), Ioana Vasilescu (Limsi-CNRS), Lori Lamel (CNRS/LIMSI), Martine Adda-Decker (LPP (Lab. Phonétique & Phonologie) / LIMSI-CNRS), Nicolas Audibert (Laboratoire de Phonétique et Phonologie, UMR7018 CNRS/Sorbonne-Nouvelle, Paris)
Acoustic Indicators of Deception in Mandarin Daily Conversations Recorded from an Interactive Game
Poster; 1000–1200
Chih-Hsiang Huang (Department of Electrical Engineering, National Tsing Hua University, Taiwan), Huang-Cheng Chou (Department of Electrical Engineering, National Tsing Hua University, Taiwan), Yi-Tong Wu (Department of Electrical Engineering, National Tsing Hua University, Taiwan), Chi-Chun Lee (Department of Electrical Engineering, National Tsing Hua University, Taiwan), Yi-Wen Liu (Department of Electrical Engineering, National Tsing Hua University, Taiwan)
Prosodic effects on plosive duration in German and Austrian German
Poster; 1000–1200
Barbara Schuppler (SPSC Laboratory, Graz University of Technology), Margaret Zellers (University of Kiel)
Cross-Lingual Consistency of Phonological Features: An Empirical Study
Poster; 1000–1200
Cibu Johny (Google), Alexander Gutkin (Google), Martin Jansche (Google)
Are IP initial vowels acoustically more distinct? Results from LDA and CNN classifications
Poster; 1000–1200
Fanny Guitard-Ivent (Laboratoire de Phonétique et Phonologie (UMR 7018, CNRS – Sorbonne Nouvelle)), Gabriele Chignoli (Laboratoire de Phonétique et Phonologie (UMR 7018, CNRS – Sorbonne Nouvelle)), Cécile Fougeron (Laboratoire de Phonétique et Phonologie (UMR 7018, CNRS – Sorbonne Nouvelle)), Laurianne Georgeton (French Forensic Police Office (SCPTS))

Speech Enhancement: Noise attenuation[Tue-P-3-E]
Tuesday, 17 September, Hall 10/E

Speech Augmentation via Speaker-Specific Noise in Unseen Environment
Poster; 1000–1200
Yanan Guo (School of Information Science and Engineering, Lanzhou University, China), Ziping Zhao (Tianjin Normal University), Yide Ma (School of Information Science and Engineering, Lanzhou University, China), Björn Schuller (University of Augsburg / Imperial College London)
A non-causal FFTNet architecture for speech enhancement
Poster; 1000–1200
Muhammed Shifas PV (Speech Signal Processing Lab, University of Crete), Nagaraj Adiga (University of Crete), Vassilis Tsiaras (Technical University of Crete), Yannis Stylianou (Univ of Crete)
Speech Enhancement with Variance Constrained Autoencoders
Poster; 1000–1200
Daniel Braithwaite (Victoria University of Wellington), Bastiaan Kleijn (Victoria University of Wellington)
UNetGAN: A Robust Speech Enhancement Approach in Time Domain for Extremely Low Signal-to-noise Ratio Condition
Poster; 1000–1200
Xiang Hao (Inner Mongolia University), Xiangdong Su (College of Computer Science, Inner Mongolia University, Huhhot), Zhiyu Wang (Inner Mongolia University), Hui Zhang (Inner Mongolia University)
Towards Generalized Speech Enhancement with Generative Adversarial Networks
Poster; 1000–1200
Santiago Pascual (Universitat Politècnica de Catalunya), Joan Serrà (Telefonica Research), Antonio Bonafonte (Universitat Politècnica de Catalunya)
A Convolutional Neural Network with Non-Local Module for Speech Enhancement
Poster; 1000–1200
Xiaoqi Li (School of Computer Science and Technology, Wuhan University of Technology), Yaxing Li (School of Computer Science and Technology, Wuhan University of Technology), Meng Li (School of Computer Science and Technology, Wuhan University of Technology), Shan Xu (School of Computer Science and Technology, Wuhan University of Technology), Yuanjie Dong (School of Computer Science and Technology, Wuhan University of Technology), Xinrong Sun (School of Computer Science and Technology, Wuhan University of Technology), Shengwu Xiong (School of Computer Science and Technology, Wuhan University of Technology)
IA-NET: Acceleration and Compression of Speech Enhancement using Integer-adder Deep Neural Network
Poster; 1000–1200
Yu-Chen Lin (Department of Computer Science and Information Engineering, National Taiwan University), Yi-Te Hsu (Research Center for Information Technology Innovation, Academia Sinica), Szu-Wei Fu (Department of Computer Science and Information Engineering, National Taiwan University), Yu Tsao (Research Center for Information Technology Innovation, Academia Sinica), Tei-Wei Kuo (Department of Computer Science and Information Engineering, National Taiwan University)
KL-divergence Regularized Deep Neural Network Adaptation for Low-resource Speaker-dependent Speech Enhancement
Poster; 1000–1200
Li Chai (University of Science and Technology of China), Jun Du (University of Science and Technology of China), Chin-Hui Lee (Georgia Institute of Technology)
Speech Enhancement with Wide Residual Networks in Reverberant Environments
Poster; 1000–1200
Jorge Llombart (ViVoLAB, Aragon Institute for Engineering Research (I3A), University of Zaragoza), Dayana Ribas Gonzalez (UNIZAR), Antonio Miguel (ViVoLAB, Aragon Institute for Engineering Research (I3A), University of Zaragoza, Spain), Luis Vicente (ViVoLAB, Aragon Institute for Engineering Research (I3A), University of Zaragoza, Spain), Alfonso Ortega (University of Zaragoza), Eduardo Lleida Solano (University of Zaragoza)
A scalable noisy speech dataset and online subjective test framework
Poster; 1000–1200
Ebrahim Beyrami (Microsoft), Chandan Karadagur Ananda Reddy (Microsoft), Jamie Pool (Microsoft), Ross Cutler (Microsoft), Sriram Srinivasan (Microsoft), Johannes Gehrke (Microsoft)
Speech Enhancement for Noise-Robust Speech Synthesis using Wasserstein GAN
Poster; 1000–1200
Nagaraj Adiga (University of Crete), Yannis Pantazis (Institute of Applied and Computational Mathematics, FORTH), Vassilis Tsiaras (Technical University of Crete), Yannis Stylianou (Univ of Crete)

The Second DIHARD Speech Diarization Challenge (DIHARD II)[Tue-SS-3-6]
Tuesday, 17 September, Hall 3 [More info]

The Second DIHARD Diarization Challenge: Dataset, task, and baselines
Oral; 1000–1020
Neville Ryant (Linguistic Data Consortium), Kenneth Church (Baidu, USA), Christopher Cieri (Linguistic Data Consortium, University of Pennsylvania), Alejandrina Cristia (Laboratoire de Sciences Cognitives et Psycholinguistique (ENS, EHESS, CNRS), Département d'Etudes Cognitives, Ecole Normale Supérieure, PSL Research University), Jun Du (University of Science and Technology of China), Sriram Ganapathy (Indian Institute of Science, Bangalore, India, 560012), Mark Liberman (University of Pennsylvania)
LEAP Diarization System for the Second DIHARD Challenge
Oral; 1020–1040
Prachi Singh (Indian Institute of Science, Bangalore), Harsha Vardhan M A (Indian Institute of Science, Bangalore), Sriram Ganapathy (Indian Institute of Science, Bangalore, India, 560012), Ahilan Kanagasundaram (University of Jaffna)
ViVoLAB Speaker Diarization System for the DIHARD 2019 Challenge
Oral; 1040–1100
Ignacio Viñals (ViVoLab, Aragon Institute for Engineering Research (I3A), University of Zaragoza), Pablo Gimeno (ViVoLab, Aragon Institute for Engineering Research (I3A), University of Zaragoza), Alfonso Ortega (University of Zaragoza), Antonio Miguel (ViVoLAB, Aragon Institute for Engineering Research (I3A), University of Zaragoza, Spain), Eduardo Lleida Solano (University of Zaragoza)
UWB-NTIS Speaker Diarization System for the DIHARD II 2019 Challenge
Oral; 1100–1120
Zbynek Zajic (University of West Bohemia), Marie Kunesova (University of West Bohemia), Marek Hrúz (NTIS), Jan Vanek (Department of Cybernetics, University of West Bohemia in Pilsen)
The Second DIHARD challenge: System Description for USC-SAIL Team
Oral; 1120–1140
Tae Jin Park (University of Southern California), Manoj Kumar (University of Southern California), Nikolaos Flemotomos (University of Southern California), Monisankha Pal (University of Southern California), Raghuveer Peri (University of Southern California), Rimita Lahiri (University of Southern California), Panayiotis Georgiou (Univ. Southern California), Shrikanth Narayanan (University of Southern California)
Speaker Diarization with Deep Speaker Embeddings for DIHARD Challenge II
Oral; 1140–1200
Sergey Novoselov (ITMO University, Speech Technology Center), Aleksei Gusev (ITMO University, Speech Technology Center Ltd.), Artem Ivanov (Speech Technology Center Ltd.), Timur Pekhovsky (ITMO University, Speech Technology Center Ltd), Andrey Shulipa (ITMO University), Anastasia Avdeeva (ITMO University, Speech Technology Center Ltd.), Artem Gorlanov (ITMO University, Speech Technology Center Ltd.), Alexandr Kozlov (Speech Technology Center Ltd.)

Lunch Break in lower foyer[Tue-B-2]
Tuesday, 17 September, Foyer

Lunch Break in lower foyer
Break; 1200–1330

Speaker and Language Recognition I[Tue-O-4-1]
Tuesday, 17 September, Main Hall

Survey Talk
Survey Talk: End-to-end deep neural network based speaker and language recognition [More info]
Survey Talk; 1330–1410
Ming Li (Data Science Research Center, Duke Kunshan University), Weicheng Cai, Danwei Cai
Attention based Hybrid I-vector BLSTM Model for Language Recognition
Oral; 1410–1430
Bharat Padi (Indian Institute of Science), Anand Mohan (Indian Institute of Science), Sriram Ganapathy (Indian Institute of Science, Bangalore, India, 560012)
RawNet: Advanced end-to-end deep neural network using raw waveforms for text-independent speaker verification
Oral; 1430–1450
Jee-weon Jung (University of Seoul), Hee-Soo Heo (School of Computer Science, University of Seoul, Korea), Ju-ho Kim (University of Seoul), Hye-jin Shim (University of Seoul), Ha-Jin Yu (University of Seoul)
Target Speaker Extraction for Multi-Talker Speaker Verification
Oral; 1450–1510
Wei Rao (Department of Electrical and Computer Engineering, National University of Singapore), Chenglin Xu (Nanyang Technological University), Eng Siong Chng (Nanyang Technological University), Haizhou Li (National University of Singapore)
Improving Keyword Spotting and Language Identification via Neural Architecture Search at Scale
Oral; 1510–1530
Hanna Mazzawi (Google Research), Javier Gonzalvo (Google), Aleks Kracun (Google Speech), Prashant Sridhar (Google Speech), Niranjan Subrahmanya (Google Speech), Ignacio Lopez Moreno (Google Speech), Hyun Jin Park (Google Speech), Patrick Violette (Google Speech)

Speech synthesis: towards end-to-end[Tue-O-4-2]
Tuesday, 17 September, Hall 2

Forward-Backward Decoding for Regularizing End-to-End TTS
Oral; 1330–1350
Yibin Zheng (Institute of Automation, Chinese Academy of Sciences), Xi Wang (Microsoft), Lei He (Microsoft), Shifeng Pan (Microsoft), Frank K Soong (Microsoft), Zhengqi Wen (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences), Jianhua Tao (Microsoft)
A New GAN-based End-to-End TTS Training Algorithm
Oral; 1350–1410
Haohan Guo (School of Computer Science, Northwestern Polytechnical University, Xian, China), Frank Soong (Microsoft AI & Research, Beijing, China), Lei He (Microsoft AI & Research, Beijing, China), Lei Xie (School of Computer Science, Northwestern Polytechnical University, Xian, China)
Robust Sequence-to-Sequence Acoustic Modeling with Stepwise Monotonic Attention for Neural TTS
Oral; 1410–1430
Mutian He (Beihang University), Yan Deng (Microsoft China), Lei He (Microsoft China)
Joint training framework for text-to-speech and voice conversion using multi-source Tacotron and WaveNet
Oral; 1430–1450
Mingyang Zhang (National University of Singapore), Xin Wang (National Institute of Informatics), Fuming Fang (National Institute of Informatics), Haizhou Li (National University of Singapore), Junichi Yamagishi (University of Edinburgh)
Training Multi-Speaker Neural Text-to-Speech Systems using Speaker-Imbalanced Speech Corpora
Oral; 1450–1510
Hieu-Thi Luong (National Institute of Informatics), Xin Wang (National Institute of Informatics, Sokendai University), Junichi Yamagishi (University of Edinburgh), Nobuyuki Nishizawa (KDDI Research, Inc.)
Real-time neural text-to-speech with sequence-to-sequence acoustic model and WaveGlow or single Gaussian WaveRNN vocoders
Oral; 1510–1530
Takuma Okamoto (National Institute of Information and Communications Technology), Tomoki Toda (Nagoya University), Yoshinori Shiga (National Institute of Information and Communications Technology), Hisashi Kawai (National Institute of Information and Communications Technology)

Semantic Analysis and Classification[Tue-O-4-3]
Tuesday, 17 September, Hall 2

Fusion Strategy for Prosodic and Lexical Representations of Word Importance
Oral; 1330–1350
Sushant Kafle (Rochester Institute of Technology), Cissi Ovesdotter Alm (Rochester Institute of Technology), Matt Huenerfauth (Rochester Institute of Technology)
Self Attention in Variational Sequential Learning for Summarization
Oral; 1350–1410
Jen-Tzung Chien (National Chiao Tung University), Chun-Wei Wang (National Chiao Tung University)
Multi-modal Sentiment Analysis using Deep Canonical Correlation Analysis
Oral; 1410–1430
Zhongkai Sun (Electrical and Computer Engineering, UW-Madison), Prathusha Sarma (Electrical and Computer Engineering, UW-Madison), William Sethares (Electrical and Computer Engineering, UW-Madison), Erik Bucy (CoMC, Texas Tech University)
Interpreting and Improving Deep Neural SLU Models via Vocabulary Importance
Oral; 1430–1450
Yilin Shen (Samsung Research America), Wenhu Chen (UCSB), Hongxia Jin (Samsung Research America)
Assessing the semantic space bias caused by ASR error propagation and its effect on spoken document summarization
Oral; 1450–1510
Máté Ákos Tündik (Budapest University of Technology and Economics), Valér Kaszás (Budapest University of Technology and Economics), György Szaszák (Budapest University of Technology and Economics)
Latent Topic Attention for Domain Classification
Oral; 1510–1530
Peisong Huang (South China Agricultural University), Peijie Huang (South China Agricultural University), Wencheng Ai (South China Agricultural University), Jiande Ding (South China Agricultural University), Jinchuan Zhang (South China Agricultural University)

Speech and Audio Source Separation and Scene Analysis 1[Tue-O-4-5]
Tuesday, 17 September, Hall 12

A Unified Bayesian Source Modelling for Determined Blind Source Separation
Oral; 1330–1350
Chaitanya Prasad Narisetty (NEC Corporation)
Recursive speech separation for unknown number of speakers
Oral; 1350–1410
Naoya Takahashi (SONY), Parthasaarathy Sudarsanam (Sony India Software Centre), Nabarun Goswami (Sony India Software Centre), Yuki Mitsufuji (Sony)
Practical applicability of deep neural networks for overlapping speaker separation
Oral; 1410–1430
Pieter Appeltans (KU Leuven), Jeroen Zegers (KU Leuven, Dept. ESAT), Hugo Van hamme (KU Leuven)
Speech Separation Using Independent Vector Analysis with an Amplitude Variable Gaussian Mixture Model
Oral; 1430–1450
Zhaoyi Gu (Key Laboratory of Modern Acoustics, Institute of Acoustics, Nanjing University, Nanjing 210093), Jing Lu (Key Laboratory of Modern Acoustics, Institute of Acoustics, Nanjing University, Nanjing 210093), Kai Chen (Key Laboratory of Modern Acoustics, Institute of Acoustics, Nanjing University, Nanjing 210093)
Improved Speech Separation with Time-and-Frequency Cross-domain Joint Embedding and Clustering
Oral; 1450–1510
Gene-Ping Yang (National Taiwan University), ChaoI Tuan (National Taiwan University), Hung-yi Lee (National Taiwan University (NTU)), Lin-shan Lee (National Taiwan University)
WHAM!: Extending Speech Separation to Noisy Environments
Oral; 1510–1530
Gordon Wichern (Mitsubishi Electric Research Laboratories), Emmett McQuinn (Whisper.ai), Joe Antognini (Whisper.ai), Michael Flynn (Whisper.ai), Richard Zhu (Whisper.ai), Dwight Crow (Whisper.ai), Ethan Manilow (Mitsubishi Electric Research Laboratories), Jonathan Le Roux (Mitsubishi Electric Research Laboratories)

Language learning and databases[Tue-P-4-B]
Tuesday, 17 September, Gallery B

A deep learning approach to automatic characterisation of rhythm in non-native English speech
Poster; 1330–1530
Konstantinos Kyriakopoulos (University of Cambridge), Kate Knill (University of Cambridge), Mark Gales (Cambridge University)
Self-imitating Feedback Generation Using GAN for Computer-Assisted Pronunciation Training
Poster; 1330–1530
Seung Hee Yang (Seoul National University), Minhwa Chung (Seoul National University)
Language learning using Speech to Image retrieval
Poster; 1330–1530
Danny Merkx (Radboud University), Stefan L. Frank (Radboud University), Mirjam Ernestus (Radboud University Nijmegen)
Using Alexa for Flashcard-based Language Learning
Poster; 1330–1530
Lucy Skidmore (University of Sheffield), Roger Moore (University of Sheffield)
The 2019 Inaugural Fearless Steps Challenge: A Giant Leap for Naturalistic Audio
Poster; 1330–1530
John H.L. Hansen (Univ. of Texas at Dallas; CRSS - Center for Robust Speech Systems), Aditya Joglekar (The University of Texas at Dallas; CRSS - Center for Robust Speech Systems), Meena Chandra Shekar (The University of Texas at Dallas), Vinay Kothapally (The University of Texas at Dallas - Center for Robust Speech Systems (CRSS)), Lakshmish Kaushik (Center for Robust Speech Systems, University of Texas, Dallas), Chengzhu Yu (The University of Texas at Dallas)
Completely Unsupervised Phoneme Recognition By A Generative Adversarial Network Harmonized With Iteratively Refined Hidden Markov Models
Poster; 1330–1530
Kuan-yu Chen (National Taiwan University), Che-ping Tsai (National Taiwan University), Da-Rong Liu (National Taiwan University), Hung-yi Lee (National Taiwan University (NTU)), Lin-shan Lee (National Taiwan University)
Analysis of native listeners' facial microexpressions while shadowing non-native speech: Potential of shadowers' facial expressions toward comprehensibility prediction
Poster; 1330–1530
Tasavat Trisitichoke (The University of Tokyo), Shintaro Ando (The University of Tokyo), Daisuke Saito (The University of Tokyo), Nobuaki Minematsu (The University of Tokyo)
Transparent pronunciation scoring using articulatorily weighted phoneme edit distance
Poster; 1330–1530
Reima Karhila (Aalto University), Anna-Riikka Smolander (University of Helsinki), Sari Ylinen (University of Helsinki), Mikko Kurimo (Aalto University)
Development of Robust Automated Scoring Models Using Adversarial Input for Oral Proficiency Assessment
Poster; 1330–1530
Su-Youn Yoon (Educational Testing Service), Chong Min Lee (Educational Testing Service), Klaus Zechner (ETS), Keelan Evanini (Educational Testing Service)
Impact of ASR Performance on Spoken Grammatical Error Detection
Poster; 1330–1530
Yiting Lu (University of Cambridge), Mark Gales (Cambridge University), Kate Knill (University of Cambridge), Potsawee Manakul (University of Cambridge), Linlin Wang (Cambridge University Engineering Department), Yu Wang (University of Cambridge)

Emotion and personality in conversation[Tue-P-4-C]
Tuesday, 17 September, Gallery C

Joint Student-Teacher Learning for Audio-Visual Scene-Aware Dialog
Poster; 1330–1530
Chiori Hori (MERL), Anoop Cherian (Mitsubishi Electric Research Laboratories), Tim Marks (Mitsubishi Electric Research Laboratories), Takaaki Hori (Mitsubishi Electric Research Laboratories)
Do Conversational Partners Entrain on Articulatory Precision?
Poster; 1330–1530
Nichola Lubold (Arizona State University), Stephanie Borrie (Utah State University), Tyson Barrett (Utah State University), Megan Willi (Arizona State University), Visar Berisha (Arizona State University)
Conversational Emotion Analysis via Attention Mechanisms
Poster; 1330–1530
Zheng Lian (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China), Jianhua Tao (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China), Bin Liu (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China), Jian Huang (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China)
Topical-Chat: Towards Knowledge-Grounded Open-Domain Conversations
Poster; 1330–1530
Karthik Gopalakrishnan (Amazon), Behnam Hedayatnia (Amazon), Qinlang Chen (Amazon), Anna Gottardi (Amazon), Sanjeev Kwatra (Amazon), Anu Venkatesh (Amazon), Raefer Gabriel (Amazon), Dilek Hakkani-Tur (Amazon Alexa AI)
Analyzing Verbal and Nonverbal Features for Predicting Group Performance
Poster; 1330–1530
Uliyana Kubasova (University of the Fraser Valley), Gabriel Murray (University of the Fraser Valley), McKenzie Braley (University of the Fraser Valley)
Identifying Therapist and Client Personae for Therapeutic Alliance Estimation
Poster; 1330–1530
Victor Martinez (University of Southern California), Nikolaos Flemotomos (University of Southern California), Victor Ardulov (University of Southern California), Krishna Somandepalli (University of Southern California), Simon Goldberg (University of Wisconsin-Madison), Zac Imel (University of Utah), David Atkins (University of Washington), Shrikanth Narayanan (University of Southern California)
Do Hesitations Facilitate Processing of Partially Defective System Utterances? An Exploratory Eye Tracking Study
Poster; 1330–1530
Kristin Haake (Technische Universität Dortmund), Sarah Schimke (Technische Universität Dortmund), Simon Betz (Bielefeld University), Sina Zarrieß (University of Bielefeld)
Influence of Contextuality on Prosodic Realization of Information Structure in Chinese Dialogues
Poster; 1330–1530
Bin Li (Graduate School of Chinese Academy of Social Sciences), Yuan Jia (Institute of Linguistics, Chinese Academy of Social Sciences)
Cross-Lingual Transfer Learning for Affective Spoken Dialogue Systems
Poster; 1330–1530
Kristijan Gjoreski (voice INTER connect GmbH), Aleksandar Gjoreski (voice INTER connect GmbH), Ivan Kraljevski (voice INTER connect GmbH), Diane Hirschfeld (voice INTER connect GmbH)
Identifying Personality Traits Using Overlap Dynamics in Multiparty Dialogue
Poster; 1330–1530
Mingzhi Yu (University of Pittsburgh), Emer Gilmartin (Trinity College Dublin), Diane Litman (University of Pittsburgh)
Identifying Mood Episodes Using Dialogue Features from Clinical Interviews
Poster; 1330–1530
Zakaria Aldeneh (University of Michigan), Mimansa Jaiswal (University of Michigan), Michael Picheny (IBM TJ Watson Research Center), Melvin McInnis (University of Michigan), Emily Mower Provost (University of Michigan)

Voice quality, speech perception, and prosody[Tue-P-4-D]
Tuesday, 17 September, Hall 10/D

The effect of phoneme distribution on perceptual similarity in English
Poster; 1330–1530
Emma O'Neill (University College Dublin), Julie Berndsen (University College Dublin)
F0 Variability Measures Based on Glottal Closure Instants
Poster; 1330–1530
Yu-Ren Chien (Reykjavik University), Michal Borsky (Reykjavik University), Jon Gudnason (Reykjavik University)
Recognition of Creaky Voice from Emergency Calls
Poster; 1330–1530
Lauri Tavi (University of Eastern Finland), Tanel Alumäe (Tallinn University of Technology), Stefan Werner (University of Eastern Finland)
Prosodic Representations of Prominence Classification Neural Networks and Autoencoders Using Bottleneck Features
Poster; 1330–1530
Sofoklis Kakouros (University of Helsinki), Antti Suni (University of Helsinki), Juraj Šimko (University of Helsinki), Martti Vainio (University of Helsinki)
Compensation for French liquid deletion during auditory sentence processing
Poster; 1330–1530
Sharon Peperkamp (CNRS), Alvaro Martin Iturralde Zurita (CNRS)
Prosodic Factors Influencing Vowel Reduction in Russian
Poster; 1330–1530
Daniil Kocharov (Department of Phonetics, Saint Petersburg State University), Tatiana Kachkovskaia (Saint Petersburg State University), Pavel Skrelin (Saint Petersburg State University)
Time to frequency domain mapping of the voice source: the influence of open quotient and glottal skew on the low end of the source spectrum
Poster; 1330–1530
Christer Gobl (Trinity College Dublin), Ailbhe Ní Chasaide (Trinity College Dublin)
Testing the distinctiveness of intonational tunes: Evidence from imitative productions in American English
Poster; 1330–1530
Eleanor Chodroff (University of York), Jennifer Cole (Northwestern University)
A study of a cross-language perception based on cortical analysis using biomimetic STRFs
Poster; 1330–1530
Sangwook Park (Johns Hopkins University), David K. Han (Army Research Laboratory), Mounya Elhilali (Johns Hopkins University)
Perceptual evaluation of early versus late F0 peaks in the intonation structure of Czech question-word questions
Poster; 1330–1530
Pavel Šturm (Institute of Phonetics, Charles University), Jan Volín (Institute of Phonetics, Charles University)
Acoustic Correlates of Phonation Type in Chichimec
Poster; 1330–1530
Anneliese Kelterer (University of Graz), Barbara Schuppler (SPSC Laboratory, Graz University of Technology)

Speech Signal Characterization 3[Tue-P-4-E]
Tuesday, 17 September, Hall 10/E

Direct F0 Estimation with Neural-Network-based Regression
Poster; 1330–1530
Shuzhuang Xu (University of Edinburgh), Hiroshi Shimodaira (Centre for Speech Technology Research, University of Edinburgh)
Acoustic Modeling for Automatic Lyrics-to-Audio Alignment
Poster; 1330–1530
Chitralekha Gupta (National University of Singapore), Emre Yilmaz (National University of Singapore), Haizhou Li (National University of Singapore)
Two-Dimensional Convolutional Recurrent Neural Networks for Speech Activity Detection
Poster; 1330–1530
Anastasios Vafeiadis (Center for Research and Technology Hellas/Information Technologies Institute), Eleftherios Fanioudakis (Technological Educational Institute of Crete, Department of Music Technology and Acoustics), Ilyas Potamitis (Technological Educational Institute of Crete, Department of Music Technology and Acoustics), Konstantinos Votis (Information Technologies Institute, Center for Research and Technology Hellas), Dimitrios Giakoumis (Information Technologies Institute, Center for Research and Technology Hellas), Dimitrios Tzovaras (Information Technologies Institute, Center for Research and Technology Hellas), Liming Chen (Faculty of Computing, Engineering and Media, De Montfort University), Raouf Hamzaoui (Faculty of Computing, Engineering and Media, De Montfort University)
A Study of Soprano Singing in Light of the Source-filter Interaction
Poster; 1330–1530
Tokihiko Kaburagi (Kyushu University)
Real Time Online Visual End Point Detection Using Unidirectional LSTM
Poster; 1330–1530
Tanay Sharma (Samsung R&D Institute India, Bengaluru), Rohith Aralikatti (Samsung Research India Bangalore), Dilip Kumar Margam (Samsung R&D Institute, Bangalore), Abhinav Thanda (Samsung R&D Institute, Bangalore), Sharad Roy (Samsung R&D Institute, Bangalore), Pujitha Appan Kandala (Samsung R&D Institute India, Bengaluru), Shankar M Venkatesan (Samsung R&D Institute, Bangalore)
Fully-Convolutional Network for Pitch Estimation of Speech Signals
Poster; 1330–1530
Luc Ardaillon (STMS IRCAM-CNRS-Sorbonne University UPMC), Axel Roebel (STMS IRCAM-CNRS-Sorbonne University UPMC)
Vocal Pitch Extraction in Polyphonic Music using Convolutional Residual Network
Poster; 1330–1530
Mingye Dong (University of Science and Technology of China), Jie Wu (Microsoft Search Technology Center Asia, Xiaoice), Jian Luan (Microsoft Search Technology Center Asia, Xiaoice)
Multi-level Adaptive Speech Activity Detector for Speech in Naturalistic Environments
Poster; 1330–1530
Bidisha Sharma (National University of Singapore, Singapore), Rohan Kumar Das (National University Singapore), Haizhou Li (National University of Singapore)
On the Importance of Audio-source Separation for Singer Identification in Polyphonic Music
Poster; 1330–1530
Bidisha Sharma (National University of Singapore, Singapore), Rohan Kumar Das (National University Singapore), Haizhou Li (National University of Singapore)
Investigating the physiological and acoustic contrasts between choral and operatic singing
Poster; 1330–1530
Hiroko Terasawa (University of Tsukuba), Kenta Wakasa (University of Tsukuba), Hideki Kawahara (Wakayama University), Ken-Ichi Sakakibara (Health Sciences University of Hokkaido)
Optimizing Voice Activity Detection for Noisy Conditions
Poster; 1330–1530
Ruixi Lin (CloudMinds Technology), Charles Costello (Georgia Institute of Technology), Charles Jankowski (CloudMinds Technology), Vishwas Mruthyunjaya (CloudMinds Technology)
Small-Footprint Magic Word Detection Method Using Convolutional LSTM Neural Network
Poster; 1330–1530
Taiki Yamamoto (Tokushima University), Ryota Nishimura (Tokushima University), Masayuki Misaki (Panasonic Corporation), Norihide Kitaoka (Tokushima University)

Speech Processing and Analysis[Tue-S&T-2]
Tuesday, 17 September, Hall 4

Directional Audio Rendering Using a Neural Network Based Personalized HRTF
Show&Tell; 1330–1530
Geon Woo Lee (School of EECS, Gwangju Institute of Science and Technology (GIST), Gwangju 61005), Jung Hyuk Lee, Seong Ju Kim, Hong Kook Kim
Online Speech Processing and Analysis Suite
Show&Tell; 1330–1530
Wikus Pienaar (Centre for Text Technology (CTexT), North-West University Potchefstroom), Daan Wissing
Formant pattern and spectral shape ambiguity of vowel sounds, and related phenomena of vowel acoustics - Exemplary evidence
Show&Tell; 1330–1530
Dieter Maurer (Institute for the Performing Arts and Film, Zurich University of the Arts), Heidy Suter, Christian d’Hereuse, Volker Dellwo
Sound Tools eXtended (STx) 5.0 – a powerful sound analysis tool optimized for speech
Show&Tell; 1330–1530
Anton Noll (Acoustics Research Institute, Austrian Academy of Sciences, Vienna), Jonathan Stuefer, Nicola Klingler, Hannah Leykum, Carina Lozo, Jan Luttenberger, Michael Pucher, Carolin Schmid
FarSpeech: Arabic Natural Language Processing for Live Arabic Speech
Show&Tell; 1330–1530
Mohamed Eldesouki (Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha), Naassih Gopee, Ahmed Ali, Kareem Darwish
A System for Real-time Privacy Preserving Data Collection for Ambient Assisted Living
Show&Tell; 1330–1530
Fasih Haider (Usher Institute of Population Health Sciences & Informatics Edinburgh Medical School, the University of Edinburgh), Saturnino Luz
NUS Speak-to-Sing: A Web Platform for Personalized Speech-to-Singing Conversion
Show&Tell; 1330–1530
Chitralekha Gupta (Department of Electrical and Computer Engineering, National University of Singapore), Karthika Vijayan, Bidisha Sharma, Xiaoxue Gao, Haizhou Li

The 2019 Automatic Speaker Verification Spoofing and Countermeasures Challenge: ASVspoof Challenge – P[Tue-SS-4-C]
Tuesday, 17 September, Gallery C [More info]

ASSERT: Anti-Spoofing with Squeeze-Excitation and Residual neTworks
Poster; 1330–1430
Cheng-I Lai (Johns Hopkins University), Nanxin Chen (Johns Hopkins University), Jesus Villalba (Johns Hopkins University), Najim Dehak (Johns Hopkins University)
Long Range Acoustic Features for Spoofed Speech Detection
Poster; 1330–1430
Rohan Kumar Das (National University Singapore), Jichen Yang (National University of Singapore), Haizhou Li (National University of Singapore)
Transfer-Representation Learning for Detecting Spoofing Attacks with Converted and Synthesized Speech in Automatic Speaker Verification System
Poster; 1330–1430
Su-Yu Chang (National Sun Yat-sen University), Kai-Cheng Wu (National Sun Yat-sen University), Chia-Ping Chen (National Sun Yat-sen University)
A Light Convolutional GRU-RNN Deep Feature Extractor for ASV Spoofing Detection
Poster; 1330–1430
Alejandro Gómez Alanís (University of Granada), Antonio M. Peinado (Universidad de Granada), Jose A. Gonzalez (University of Malaga), Angel Gomez (University of Granada)
Detecting Spoofing Attacks Using VGG and SincNet: BUT-Omilia Submission to ASVspoof 2019 Challenge
Poster; 1330–1430
Hossein Zeinali (Brno University of Technology), Themos Stafylakis (Omilia - Conversational Intelligence), Georgia Athanasopoulou (School of Electronic & Computer Engineering, Technical University of Crete, Chania, Greece), Johan Rohdin (Brno University of Technology), Ioannis Gkinis (Omilia - Conversational Intelligence), Lukas Burget (Brno University of Technology), Jan Černocký (Brno University of Technology)
Deep Residual Neural Networks for Audio Spoofing Detection
Poster; 1330–1430
Moustafa Alzantot (UCLA), Ziqi Wang (UCLA), Mani Srivastava (UCLA)
Replay attack detection with complementary high-resolution information using end-to-end DNN for the ASVspoof 2019 Challenge
Poster; 1330–1430
Jee-weon Jung (University of Seoul), Hye-jin Shim (University of Seoul), Hee-Soo Heo (School of Computer Science, University of Seoul, Korea), Ha-Jin Yu (University of Seoul)
Ensemble Models for Spoofing Detection in Automatic Speaker Verification
Poster; 1330–1430
Bhusan Chettri (Queen Mary University of London), Daniel Stoller (Queen Mary University of London), Veronica Morfi (Queen Mary University of London), Marco Martinez (Queen Mary University of London), Emmanouil Benetos (Queen Mary University of London), Bob L. Sturm (KTH Royal Institute of Engineering)
The DKU Replay Detection System for the ASVspoof 2019 Challenge: On Data Augmentation, Feature Representation, Classification, and Fusion
Poster; 1330–1430
Weicheng Cai (Sun Yat-sen University), Haiwei Wu (Sun Yat-sen University), Danwei Cai (Duke Kunshan University), Ming Li (Duke Kunshan University)
Robust Bayesian and Light Neural Networks for Voice Spoofing Detection
Poster; 1330–1430
Radosław Białobrzeski (Samsung R&D Institute Poland), Michał Kośmider (Samsung R&D Institute Poland), Mateusz Matuszewski (Samsung R&D Institute Poland), Marcin Plata (Samsung R&D Institute Poland), Alexander Rakowski (Samsung R&D Institute Poland)
STC Antispoofing Systems for the ASVspoof2019 Challenge
Poster; 1330–1430
Galina Lavrentyeva (ITMO University, Speech Technology Center), Sergey Novoselov (ITMO University, Speech Technology Center), Tseren Andzhukaev (Speech Technology Center), Marina Volkova (Speech Technology Center), Artem Gorlanov (Speech Technology Center), Alexandr Kozlov (Speech Technology Center Ltd.)
The SJTU Robust Anti-spoofing System for the ASVspoof 2019 Challenge
Poster; 1330–1430
Yexin Yang (Shanghai Jiao Tong University), Hongji Wang (Shanghai Jiao Tong University), Heinrich Dinkel (Shanghai Jiao Tong University), Zhengyang Chen (MoE Key Lab of Artificial Intelligence SpeechLab, Department of Computer Science and EngineeringShanghai Jiao Tong University, Shanghai, China), Shuai Wang (Shanghai Jiao Tong University), Yanmin Qian (Shanghai Jiao Tong University), Kai Yu (Shanghai Jiao Tong University)
IIIT-H Spoofing Countermeasures for Automatic Speaker Verification Spoofing and Countermeasures Challenge 2019
Poster; 1330–1430
K N R K Raju Alluri (IIIT - Hyderabad), Anil Kumar Vuppala (IIIT Hyderabad)
Anti-Spoofing Speaker Verification System with Multi-Feature Integration and Multi-Task Learning
Poster; 1330–1430
Rongjin Li (Xiamen University), Miao Zhao (Xiamen University), Zheng Li (Xiamen University), Lin Li (Xiamen University), Qingyang Hong (Xiamen University)
Speech Replay Detection with x-Vector Attack Embeddings and Spectral Features
Poster; 1330–1430
Jennifer Williams (University of Edinburgh), Joanna Rownicka (The University of Edinburgh)

The 2019 Automatic Speaker Verification Spoofing and Countermeasures Challenge: ASVspoof Challenge – O[Tue-SS-4-4]
Tuesday, 17 September, Hall 11 [More info]

ASVspoof 2019: Future Horizons in Spoofed and Fake Audio Detection
Oral; 1430–1500
Massimiliano Todisco (EURECOM - School of Engineering & Research Center - Digital Security Department), Xin Wang (National Institute of Informatics, Sokendai University), Ville Vestman (School of Computing, University of Eastern Finland, Finland), Md Sahidullah (University of Eastern Finland), Héctor Delgado (EURECOM), Andreas Nautsch (EURECOM), Junichi Yamagishi (National Institute of Informatics), Tomi Kinnunen (University of Eastern Finland), Nicholas Evans (EURECOM), Kong Aik Lee (Data Science Research Laboratories, NEC Corporation)
Discussion
Oral; 1500–1530

Coffee break in both exhibition foyers, lower and upper level 1[Tue-B-3]
Tuesday, 17 September, Foyer

Coffee break in both exhibition foyers, lower and upper level 1
Break; 1530–1600

Speech Intelligibility[Tue-O-5-1]
Tuesday, 17 September, Main Hall

Survey Talk
Survey talk: Preserving Privacy in Speaker and Speech Characterisation [More info]
Survey Talk; 1600–1640
Andreas Nautsch (EURECOM, Sophia Antipolis)
Evaluating Near End Listening Enhancement Algorithms in Realistic Environments
Oral; 1640–1700
Carol Chermaz (The Centre for Speech Technology Research, The University of Edinburgh), Cassia Valentini-Botinhao (The Centre for Speech Technology Research, University of Edinburgh), Henning Schepker (Dept. of Medical Physics and Acoustics and Cluster of Excellence Hearing4all, University of Oldenburg), Simon King (University of Edinburgh)
Improvement and Assessment of Spectro-Temporal Modulation Analysis for Speech Intelligibility Estimation
Oral; 1700–1720
Amin Edraki (Department of Electrical and Computer Engineering, Queen’s University, Kingston, Canada.), Wai-Yip Chan (Department of Electrical and Computer Engineering, Queen’s University, Kingston, Canada.), Jesper Jensen (Department of Electronic Systems, Aalborg University, Aalborg, Denmark. + Oticon, Denmark), Daniel Fogerty (Department of Communication Sciences and Disorders, University of South Carolina, Columbia, USA.)
Listener Preference on the Local Criterion for Ideal Binary-Masked Speech
Oral; 1720–1740
Zhuohuang Zhang (Indiana University Bloomington), Yi Shen (Indiana University Bloomington)
Using a Manifold Vocoder for Spectral Voice and Style Conversion
Oral; 1740–1800
Tuan Dinh (Oregon Health & Science University), Alexander Kain (Oregon Health & Science University), Kris Tjaden (State University of New York at Buffalo)

ASR neural network architectures - 1[Tue-O-5-2]
Tuesday, 17 September, Hall 1

Multi-Span Acoustic Modelling using Raw Waveform Signals
Oral; 1600–1620
Patrick von Platen (University of Cambridge), Chao Zhang (University of Cambridge), Philip C. Woodland (University of Cambridge)
An analysis of local monotonic attention variants
Oral; 1620–1640
André Merboldt (Lehrstuhl Informatik 6, RWTH Aachen University), Albert Zeyer (Human Language Technology and Pattern Recognition Group (Chair of Computer Science 6), Computer Science Department, RWTH Aachen University), Ralf Schlüter (Lehrstuhl Informatik 6, RWTH Aachen University), Hermann Ney (RWTH Aachen University)
Layer Trajectory BLSTM
Oral; 1640–1700
Eric Sun (Microsoft Corp), Jinyu Li (Microsoft Corp), Yifan Gong (Microsoft Corp)
Improving Transformer Based End-to-End Speech Recognition with Connectionist Temporal Classification and Language Model Integration
Oral; 1700–1720
Shigeki Karita (NTT Communication Science Laboratories), Nelson Yalta (Waseda University), Shinji Watanabe (Johns Hopkins University), Marc Delcroix (NTT Communication Science Laboratories), Atsunori Ogawa (NTT Communication Science Laboratories), Tomohiro Nakatani (NTT Corporation)
Trainable Dynamic Subsampling for End-to-End Speech Recognition
Oral; 1720–1740
Shucong Zhang (University of Edinburgh), Erfan Loweimi (University of Edinburgh), Yumo Xu (University of Edinburgh), Peter Bell (University of Edinburgh), Steve Renals (University of Edinburgh)
Shallow-Fusion End-to-End Contextual Biasing
Oral; 1740–1800
Ding Zhao (Google Inc.), Tara Sainath (Google), David Rybach (Google), Pat Rondon (Google), Deepti Bhatia (Google Inc.), Bo Li (Google), Ruoming Pang (Google Inc.)

Speech and Language Analytics for Mental Health[Tue-O-5-3]
Tuesday, 17 September, Hall 2

Modeling Interpersonal Linguistic Coordination in Conversations using Word Mover's Distance
Oral; 1600–1620
Md Nasir (University of Southern California), Sandeep Nallan Chakravarthula (University of Southern California), Brian Baucom (University of Utah), David C Atkins (University of Washington), Panayiotis Georgiou (Univ. Southern California), Shrikanth Narayanan (University of Southern California)
Bag-of-Acoustic-Words for Mental Health Assessment: A Deep Autoencoding Approach
Oral; 1620–1640
Wenchao Du (Carnegie Mellon University), Louis-Philippe Morency (Carnegie Mellon University), Jeffrey Cohn (University of Pittsburgh), Alan W Black (Carnegie Mellon University)
Objective Assessment of Social Skills Using Automated Language Analysis for Identification of Schizophrenia and Bipolar Disorder
Oral; 1640–1700
Rohit Voleti (Arizona State University), Stephanie Woolridge (Queen's University), Julie M. Liss (Arizona State University), Melissa Milanovic (Queen's University), Christopher R. Bowie (Queen's University), Visar Berisha (Arizona State University)
Into the Wild: Transitioning from Recognizing Mood in Clinical Interactions to Personal Conversations for Individuals with Bipolar Disorder
Oral; 1700–1720
Katie Matton (University of Michigan), Melvin McInnis (University of Michigan), Emily Mower Provost (University of Michigan)
Detecting Depression with Word-Level Multimodal Fusion
Oral; 1720–1740
Morteza Rohanian (Queen Mary University of London), Julian Hough (Queen Mary University of London), Matthew Purver (Queen Mary University of London)
Assessing Neuromotor Coordination in Depression Using Inverted Vocal Tract Variables
Oral; 1740–1800
Carol Espy-Wilson (University of Maryland), Adam Lammert (MIT Lincoln Laboratory), Nadee Seneviratne (University of Maryland), Thomas Quatieri (MIT Lincoln Laboratory)

Dialogue Modelling[Tue-O-5-4]
Tuesday, 17 September, Hall 11

Towards Universal Dialogue Act Tagging for Task-Oriented Dialogues
Oral; 1600–1620
Shachi Paul (Amazon), Rahul Goel (Amazon Alexa), Dilek Hakkani-Tur (Amazon Alexa AI)
HyST: A Hybrid Approach for Flexible and Accurate Dialogue State Tracking
Oral; 1620–1640
Rahul Goel (Amazon Alexa), Shachi Paul (Amazon), Dilek Hakkani-Tur (Amazon Alexa AI)
Multi-lingual Dialogue Act Recognition with Deep Learning Methods
Oral; 1640–1700
Jiří Martínek (University of West Bohemia, Dept. of Computer Science and Engineering), Pavel Král (University of West Bohemia, Dept. of Computer Science and Engineering), Ladislav Lenc (University of West Bohemia, Dept. of Computer Science and Engineering), Christophe Cerisara (CNRS)
BERT-DST: Scalable End-to-End Dialogue State Tracking with Bidirectional Encoder Representations from Transformer
Oral; 1700–1720
Guan-Lin Chao (Carnegie Mellon University), Ian Lane (Carnegie Mellon University)
Discovering the Dialog Rules by means of an Evolutionary Approach
Oral; 1720–1740
David Griol (Universidad Carlos III de Madrid), Zoraida Callejas (University of Granada), Arash Eshghi (Heriot-Watt University), Oliver Lemon (Heriot-Watt University, Edinburgh)
Active Learning for Domain Classification in a Commercial Spoken Personal Assistant
Oral; 1740–1800
Xi Chen (Apple Inc.), Adithya Sagar (Apple Inc.), Justine Kao (Apple Inc.), Tony Li (Apple Inc.), Christopher Klein (Apple Inc), Stephen Pulman (Apple Inc.), Ashish Garg (Apple Inc.), Jason Williams (Apple Inc.)

Speech synthesis: pronunciation, multilingual, and low resource[Tue-P-5-A]
Tuesday, 17 September, Gallery A

Boosting Character-Based Chinese Speech Synthesis via Multi-Task Learning and Dictionary Tutoring
Poster; 1600–1800
Yuxiang Zou (Institute of Automation, Chinese Academy of Sciences, China), Linhao Dong (CASIA), Bo Xu (CASIA)
Developing Pronunciation Models in New Languages Faster by Exploiting Common Grapheme-to-Phoneme Correspondences Across Languages
Poster; 1600–1800
Harry Bleyan (Google), Sandy Ritchie (Google), Jonas Fromseier Mortensen (Google), Daan van Esch (Google)
Cross-lingual - Multi-speaker Text-To-Speech Synthesis Using Neural Speaker Embedding
Poster; 1600–1800
Mengnan Chen (East China Normal University), Minchuan Chen (Pingan Technology), Shuang Liang (Ping An Technology), Jun Ma (Pingan Technology), Lei Chen (East China Normal University), Shaojun Wang (Pingan Technology), Jing Xiao (Pingan Technology)
Polyphone Disambiguation for Mandarin Chinese Using Conditional Neural Network with Multi-level Embedding Features
Poster; 1600–1800
Zexin Cai (Duke Kunshan University, China), Yaogen Yang (Taiyuan University of Technology, Taiyuan, China), Chuxiong Zhang (Duke Kunshan University), Xiaoyi Qin (Sun Yat-sen University, Guangzhou, China), Ming Li (Duke Kunshan University)
Token-Level Ensemble Distillation for Grapheme-to-Phoneme Conversion
Poster; 1600–1800
Hao Sun (Peking University), Xu Tan (Microsoft Research Asia), Jun-Wei Gan (Microsoft), Hongzhi Liu (Peking University), Sheng Zhao (Microsoft), Tao Qin (Microsoft Research Asia), Tie-Yan Liu (Microsoft Research Asia)
Building a mixed-lingual neural TTS system with only monolingual data
Poster; 1600–1800
Liumeng Xue (Northwestern Polytechnical University), Wei Song (JD.COM), Guanghui Xu (JD.COM), Lei Xie (Northwestern Polytechnical University), Zhizheng Wu (JD.COM)
Neural Machine Translation for Multilingual Grapheme-to-Phoneme Conversion
Poster; 1600–1800
Alex Sokolov (Amazon Alexa, Applied Scientist), Tracy Rohlin (Amazon), Ariya Rastrow (Amazon)
Analysis of Pronunciation Learning in End-to-End Speech Synthesis
Poster; 1600–1800
Jason Taylor (University of Edinburgh), Korin Richmond (University of Edinburgh)
End-to-end Text-to-speech for Low-resource Languages by Cross-Lingual Transfer Learning
Poster; 1600–1800
Yuan-Jui Chen (National Taiwan University), Tao Tu (National Taiwan University), Cheng-chieh Yeh (National Taiwan University), Hung-yi Lee (National Taiwan University (NTU))
Learning to speak fluently in a foreign language: Multilingual speech synthesis and cross-language voice cloning
Poster; 1600–1800
Yu Zhang (Google Brain), Ron Weiss (Google Brain), Heiga Zen (Google), Yonghui Wu (Google), Zhifeng Chen (Google), RJ Skerry-Ryan (Google), Ye Jia (Google), Andrew Rosenberg (Google LLC), Bhuvana Ramabhadran (Google)
Unified Language-Independent DNN-Based G2P Converter
Poster; 1600–1800
Markéta Jůzová (University of West Bohemia), Daniel Tihelka (University of West Bohemia), Jakub Vít (University of West Bohemia)
Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-trained BERT
Poster; 1600–1800
Dongyang Dai (Tsinghua University), Zhiyong Wu (Tsinghua University), Shiyin Kang (Tencent), Xixin Wu (The Chinese University of Hong Kong), Jia Jia (Tsinghua University), Dan Su (Tencent AI Lab, Shenzhen), Dong Yu (Tencent AI Lab), Helen Meng (The Chinese University of Hong Kong)
Transformer based Grapheme-to-Phoneme Conversion
Poster; 1600–1800
Sevinj Yolchuyeva (Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics, Budapest, Hungary), Géza Németh (Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics, Budapest, Hungary), Bálint Gyires-Tóth (Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics, Budapest, Hungary)

Cross-lingual and multilingual ASR[Tue-P-5-B]
Tuesday, 17 September, Gallery B

Multilingual Speech Recognition with Corpus Relatedness Sampling
Poster; 1600–1800
Xinjian Li (CMU), Siddharth Dalmia (Carnegie Mellon University), Alan W Black (Carnegie Mellon University), Florian Metze (Carnegie Mellon University)
On the End-to-End Solution to Mandarin-English Code-switching Speech Recognition
Poster; 1600–1800
Zhiping Zeng (Temasek Laboratories, Nanyang Technological University), Yerbolat Khassanov (School of Computer Science and Engineering, Nanyang Technological University), Van Tung Pham (Temasek Laboratories, Nanyang Technological University), Haihua Xu (Temasek Laboratories, Nanyang Technological University), Eng Siong Chng (School of Computer Science and Engineering, Nanyang Technological University), Haizhou Li (Department of Electrical and Computer Engineering, National University of Singapore)
Towards Language-Universal Mandarin-English Speech Recognition
Poster; 1600–1800
ShiLiang Zhang (Alibaba Group), Yuan Liu (Alibaba Group), Ming Lei (Alibaba Group), Bing Ma (Alibaba Group), Lei Xie (School of Computer Science, Northwestern Polytechnical University)
Multi-dialect acoustic modeling using phone mapping and online i-vectors
Poster; 1600–1800
Harish Arsikere (Amazon.com), Ashtosh Sapru (Amazon), Sri Garimella (Amazon)
Large-Scale Multilingual Speech Recognition with a Streaming End-to-End Model
Poster; 1600–1800
Anjuli Kannan (Google Brain), Arindrima Datta (Google), Tara Sainath (Google), Eugene Weinstein (Google), Bhuvana Ramabhadran (Google), Yonghui Wu (Google), Ankur Bapna (Google), Zhifeng Chen (Google), SeungJi Lee (Google)
Recognition of Latin American Spanish using Multi-task Learning
Poster; 1600–1800
Carlos Mendes (VoiceInteraction - Speech Processing Technologies, SA), Alberto Abad (INESC-ID/IST), João Neto (L2F/INESC-ID/IST), Isabel Trancoso (INESC-ID / IST)
End-to-end Accented Speech Recognition
Poster; 1600–1800
Thibault Viglino (École Polytechnique Fédérale de Lausanne), Petr Motlicek (Idiap Research Institute), Milos Cernak (Logitech Europe)
End-to-End Articulatory Attribute Modeling for Low-resource Multilingual Speech Recognition
Poster; 1600–1800
Sheng Li (National Institute of Information and Communications Technology (NICT), Advanced Speech Technology Laboratory), Chenchen Ding (National Institute of Information and Communications Technology (NICT)), Xugang Lu (NICT), Peng Shen (NICT), Tatsuya Kawahara (Kyoto University), Hisashi Kawai (National Institute of Information and Communications Technology (NICT), Advanced Speech Technology Laboratory)
Exploiting Monolingual Speech Corpora for Code-mixed Speech Recognition
Poster; 1600–1800
Karan Taneja (IIT Bombay), Satarupa Guha (Microsoft), Preethi Jyothi (Indian Institute of Technology Bombay), Basil Abraham (Microsoft)
Phoneme-Based Contextualization for Cross-Lingual Speech Recognition in End-to-End Models
Poster; 1600–1800
Ke Hu (Google), Antoine Bruguier (Google), Tara Sainath (Google), Rohit Prabhavalkar (Google), Golan Pundak (Google)
Constrained Output Embeddings for End-to-End Code-Switching Speech Recognition with Only Monolingual Data
Poster; 1600–1800
Yerbolat Khassanov (Nanyang Technological University), Haihua Xu (Temasek Laboratories @ NTU, Singapore), Van Tung Pham (Nanyang Technological University), Zhiping Zeng (Temasek Laboratories, Nanyang Technological University), Eng Siong Chng (Nanyang Technological University), Chongjia Ni (I2R), Bin Ma (Alibaba Inc.)

Spoken Term Detection, Confidence Measure, and End-to-End Speech Recognition [Tue-P-5-C]
Tuesday, 17 September, Gallery C

Improving ASR confidence scores for Alexa using acoustic and hypothesis embeddings
Poster; 1600–1800
Prakhar Swarup (Amazon Alexa, Bangalore), Roland Maas (Amazon.com), Sri Garimella (Amazon), Sri Harish Mallidi (Amazon, USA), Bjorn Hoffmeister (Amazon.com)
Analysis of Multilingual Sequence-to-Sequence speech recognition systems
Poster; 1600–1800
Martin Karafiat (FIT BUT), Murali Karthick Baskar (Brno University of Technology), Shinji Watanabe (Johns Hopkins University), Takaaki Hori (MERL), Matthew Wiesner (Johns Hopkins University), Jan Černocký (Brno University of Technology)
Lattice generation in attention-based speech recognition models
Poster; 1600–1800
Michał Zapotoczny (University of Wroclaw), Piotr Pietrzak (University of Wroclaw), Adrian Łańcucki (University of Wroclaw), Jan Chorowski (University of Wroclaw)
Sampling from Stochastic Finite Automata with Applications to CTC Decoding
Poster; 1600–1800
Martin Jansche (Google), Alexander Gutkin (Google)
ShrinkML: End-to-End ASR Model Compression Using Reinforcement Learning
Poster; 1600–1800
Lukasz Dudziak (Samsung AI Center), Mohamed Abdelfattah (Samsung AI Center), Ravichander Vipperla (Nuance Communications), Stefanos Laskaridis (Samsung AI Center), Nicholas Lane (Samsung AI Center, Oxford)
Acoustic-to-Phrase Models for Speech Recognition
Poster; 1600–1800
Yashesh Gaur (Microsoft), Jinyu Li (Microsoft), Zhong Meng (Microsoft Corporation), Yifan Gong (Microsoft Corp)
Performance Monitoring for End-to-End Speech Recognition
Poster; 1600–1800
Ruizhi Li (The Johns Hopkins University), Gregory Sell (Johns Hopkins University), Hynek Hermansky (JHU)
Investigation of Transformer based Spelling Correction Model for CTC-based End-to-End Mandarin Speech Recognition
Poster; 1600–1800
ShiLiang Zhang (Alibaba Group), Ming Lei (Alibaba Group), ZhiJie Yan (Alibaba Group)
Improving Performance of End-to-End ASR on Numeric Sequences
Poster; 1600–1800
Cal Peyser (Google Inc.), Hao Zhang (Google), Tara Sainath (Google), Zelin Wu (Google Inc.)
A Time Delay Neural Network with Shared Weight Self-Attention for Small-Footprint Keyword Spotting
Poster; 1600–1800
Ye Bai (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, China), Jiangyan Yi (Institute of Automation, Chinese Academy of Sciences), Jianhua Tao (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, China), Zhengqi Wen (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences), Zhengkun Tian (Institute of Automation, Chinese Academy of Sciences), Chenghao Zhao (Jiangsu Normal University, China.), Cunhang Fan (Institute of Automation, Chinese Academy of Sciences)
Sub-band Convolutional Neural Networks for Small-footprint Spoken Term Classification
Poster; 1600–1800
Chieh-Chi Kao (Amazon.com), Ming Sun (Amazon.com), Yixin Gao (Amazon.com), Shiv Vitaladevuni (Amazon.com), Chao Wang (Amazon.com)
Investigating Radical-based End-to-End Speech Recognition Systems for Chinese Dialects and Japanese
Poster; 1600–1800
Sheng Li (National Institute of Information and Communications Technology (NICT), Advanced Speech Technology Laboratory), Xugang Lu (NICT), Chenchen Ding (National Institute of Information and Communications Technology (NICT)), Peng Shen (NICT), Tatsuya Kawahara (Kyoto University), Hisashi Kawai (National Institute of Information and Communications Technology (NICT), Advanced Speech Technology Laboratory)
Joint Decoding of CTC Based Systems for Speech Recognition
Poster; 1600–1800
Jiaqi Guo (Shanghai Jiao Tong University), Yongbin You (AISpeech Ltd), Yanmin Qian (Shanghai Jiao Tong University), Kai Yu (Shanghai Jiao Tong University)
A Joint End-to-End and DNN-HMM Hybrid Automatic Speech Recognition System with Transferring Shared Knowledge
Poster; 1600–1800
Tomohiro Tanaka (NTT Corporation), Ryo Masumura (NTT Corporation), Takafumi Moriya (NTT Corporation), Takanobu Oba (NTT Media Intelligence Laboratories, NTT Corporation), Yushi Aono (NTT Media Intelligence Laboratories, NTT Corporation)
Active Learning Methods for Low Resource End-To-End Automatic Speech Recognition
Poster; 1600–1800
Karan Malhotra (Indian Institute of Science), Shubham Bansal (Indian Institute of Science), Sriram Ganapathy (Indian Institute of Science, Bangalore, India, 560012)

Speech perception[Tue-P-5-D]
Tuesday, 17 September, Hall 10/D

The role of musical experience in the perceptual weighting of acoustic cues for the obstruent coda voicing contrast in American English
Poster; 1600–1800
Michelle Cohn (University of California, Davis), Georgia Zellou (UC Davis), Santiago Barreda (University of California, Davis)
Place Shift as an Autonomous Process: Evidence from Japanese Listeners
Poster; 1600–1800
Yuriko Yokoe (Sophia University)
A Perceptual Study of CV Syllables in both Spoken and Whistled Speech: a Tashlhiyt Berber Perspective
Poster; 1600–1800
Julien Meyer (Univ. Grenoble Alpes, CNRS, GIPSA-lab, Grenoble 38000, France), Laure Dentel (The World Whistles Research Association), Silvain Gerber (Univ. Grenoble Alpes, CNRS, GIPSA-lab, Grenoble 38000, France), Rachid Ridouane (LPP (CNRS/Sorbonne Nouvelle))
Consonant classification in Mandarin based on the depth image feature: a pilot study
Poster; 1600–1800
Han-Chi Hsieh (National Yang-Ming University), Wei-Zhong Zheng (National Yang-Ming University), Ko-Chiang Chen (National Yang-Ming University), Ying-Hui Lai (National Yang-Ming University)
The different roles of expectations in phonetic and lexical processing
Poster; 1600–1800
Shiri Lev-Ari (Royal Holloway University of London), Robin Dodsworth (North Carolina State University), Jeff Mielke (North Carolina State University), Sharon Peperkamp (CNRS)
Perceptual adaptation to device and human voices: learning and generalization of a phonetic shift across real and voice-AI talkers
Poster; 1600–1800
Bruno Ferenc Segedin (UC Davis), Michelle Cohn (University of California, Davis), Georgia Zellou (UC Davis)
End-to-end Convolutional Sequence Learning for ASL Fingerspelling Recognition
Poster; 1600–1800
Katerina Papadimitriou (Electrical and Computer Engineering Department, University of Thessaly), Gerasimos Potamianos (University of Thessaly)
Individual differences in implicit attention to phonetic detail in speech perception
Poster; 1600–1800
Natalie Lewandowski (Independent scientist), Daniel Duran (Albert-Ludwigs-Universität)
Effects of Natural Variability in Cross-Modal Temporal Correlations on Audiovisual Speech Recognition Benefit
Poster; 1600–1800
Kaylah Lalonde (Boys Town National Research Hospital)
Listening with great expectations: An investigation of word form anticipations in naturalistic speech
Poster; 1600–1800
Martijn Bentum (Centre for Language Studies, Radboud University), Louis ten Bosch (Radboud University Nijmegen), Antal van den Bosch (Radboud University Nijmegen), Mirjam Ernestus (Radboud University Nijmegen)
Quantifying expectation modulation in human speech processing
Poster; 1600–1800
Martijn Bentum (Centre for Language Studies, Radboud University), Louis ten Bosch (Radboud University Nijmegen), Antal van den Bosch (Radboud University Nijmegen), Mirjam Ernestus (Radboud University Nijmegen)
Perception of Pitch Contours in Speech & Nonspeech
Poster; 1600–1800
Daniel Turner (Northwestern University), Ann Bradlow (Northwestern University), Jennifer Cole (Northwestern University)
Analyzing reaction time and error sequences in lexical decision experiments
Poster; 1600–1800
Louis ten Bosch (Radboud University Nijmegen), Lou Boves (Radboud University Nijmegen), Kimberley Mulder (Center for Language Studies, Radboud University, Nijmegen)
Interactions of length and overlap in the TRACE model of spoken word recognition
Poster; 1600–1800
James Magnuson (University of Connecticut)
Automatic detection of the temporal segmentation of hand movements in British English Cued Speech
Poster; 1600–1800
Li Liu (Ryerson University), Jianze Li (Ryerson University), Gang Feng (GIPSA-lab), Xiao-Ping Zhang (Ryerson University)

Topics in Speech and Audio Signal Processing[Tue-P-5-E]
Tuesday, 17 September, Hall 10/E

Multiview Shared Subspace Learning across Speakers and Speech Commands
Poster; 1600–1800
Krishna Somandepalli (University of Southern California), Naveen Kumar (Disney Research), Arindam Jati (University of Southern California), Panayiotis Georgiou (Univ. Southern California), Shrikanth Narayanan (University of Southern California)
A Machine Learning Based Clustering Protocol for Determining Hearing Aid Initial Configurations from Pure-Tone Audiograms
Poster; 1600–1800
Chelzy Belitz (University of Texas at Dallas), Hussnain Ali (University of Texas at Dallas), John H.L. Hansen (Univ. of Texas at Dallas; CRSS - Center for Robust Speech Systems)
Acoustic Scene Classification with Mismatched Devices Using CliqueNets and Mixup Data Augmentation
Poster; 1600–1800
Truc Nguyen (Graz University of Technology), Franz Pernkopf (Graz University of Technology)
DeepLung: Smartphone Convolutional Neural Network-based Inference of Lung Anomalies for Pulmonary Patients
Poster; 1600–1800
Mohsin Ahmed (University of Virginia), Md Mahbubur Rahman (Samsung Research America), Jilong Kuang (Samsung Research America)
On the Use/Misuse of the Term 'Phoneme'
Poster; 1600–1800
Roger Moore (University of Sheffield), Lucy Skidmore (University of Sheffield)
Understanding and Visualizing Raw Waveform-based CNNs
Poster; 1600–1800
Hannah Muckenhirn (Idiap Research Institute and Ecole Polytechnique Fédérale de Lausanne (EPFL)), Vinayak Abrol (Mathematical Institute), Mathew Magimai Doss (Idiap Research Institute), Sébastien Marcel (Idiap Research Institute)
Fréchet Audio Distance: A Reference-free Metric for Evaluating Music Enhancement Algorithms
Poster; 1600–1800
Kevin Kilgour (Google AI), Mauricio Zuluaga (Google AI), Dominik Roblek (Google AI), Matthew Sharifi (Google AI)
ReMASC: Realistic Replay Attack Corpus for Voice Controlled Systems
Poster; 1600–1800
Yuan Gong (University of Notre Dame), Jian Yang (University of Notre Dame), Jacob Huber (University of Notre Dame), Mitchell MacKnight (University of Notre Dame), Christian Poellabauer (University of Notre Dame)
Analyzing intra-speaker and inter-speaker vocal tract impedance characteristics in a low-dimensional feature space using t-SNE
Poster; 1600–1800
Balamurali B T (Singapore University of Technology and Design), Jer-Ming Chen (Singapore University of Technology and Design)

The Zero Resource Speech Challenge 2019: TTS without T[Tue-SS-5-6]
Tuesday, 17 September, Hall 3 [More info]

The Zero Resource Speech Challenge 2019: TTS without T
Oral; 1600–1615
Ewan Dunbar (Université Paris Diderot), Robin Algayres (Cognitive Machine Learning (ENS - CNRS - EHESS - INRIA - PSL Research University)), Julien Karadayi (Cognitive Machine Learning (ENS - CNRS - EHESS - INRIA - PSL Research University)), Mathieu Bernard (Cognitive Machine Learning (ENS - CNRS - EHESS - INRIA - PSL Research University)), Juan Benjumea (Cognitive Machine Learning (ENS - CNRS - EHESS - INRIA - PSL Research University)), Xuan-Nga Cao (LSCP - EHESS / ENS / PSL Research University / CNRS / INRIA), Lucie Miskic (Laboratoire de Linguistique Formelle (CNRS - Paris Diderot - Sorbonne Paris Cité)), Charlotte Dugrain (Laboratoire de Linguistique Formelle (CNRS - Paris Diderot - Sorbonne Paris Cité)), Lucas Ondel (Brno University of Technology), Alan W Black (Carnegie Mellon University), Laurent Besacier (LIG), Sakriani Sakti (Nara Institute of Science and Technology (NAIST) / RIKEN AIP), Emmanuel Dupoux (Ecole des Hautes Etudes en Sciences Sociales)
Combining Adversarial Training and Disentangled Speech Representation for Robust Zero-Resource Subword Modeling
Oral; 1615–1630
Siyuan Feng (Department of Electronic Engineering, The Chinese University of Hong Kong), Tan Lee (The Chinese University of Hong Kong), Zhiyuan Peng (Department of Electronic Engineering, The Chinese University of Hong Kong)
Temporally-Aware Acoustic Unit Discovery for Zerospeech 2019 Challenge
Oral; 1630–1645
Bolaji Yusuf (Bogazici University), Alican Gök (Bogazici University), Batuhan Gundogdu (Bogazici University), Öykü Deniz Köse (Bogazici University), Murat Saraclar (Bogazici University)
Unsupervised acoustic unit discovery for speech synthesis using discrete latent-variable neural networks
Oral; 1645–1700
Ryan Eloff (Stellenbosch University), André Nortje (Stellenbosch University), Benjamin van Niekerk (Stellenbosch University), Avashna Govender (University of Edinburgh), Leanne Nortje (Stellenbosch University), Arnu Pretorius (Stellenbosch University), Elan van Biljon (Stellenbosch University), Ewald Van der westhuizen (Stellenbosch University), Lisa van Staden (Stellenbosch University), Herman Kamper (Stellenbosch University)
Unsupervised End-to-End Learning of Discrete Linguistic Units for Voice Conversion
Oral; 1705–1720
Andy T. Liu (College of Electrical Engineering and Computer Science, National Taiwan University), Po-chun Hsu (College of Electrical Engineering and Computer Science, National Taiwan University), Hung-yi Lee (National Taiwan University (NTU))
Zero resource speech synthesis using transcripts derived from perceptual acoustic units
Oral; 1720–1735
Karthik Pandia D S (Indian Institute of Technology Madras), Hema Murthy (IIT Madras)
VQVAE Unsupervised Unit Discovery and Multi-scale Code2Spec Inverter for Zerospeech Challenge 2019
Oral; 1735–1750
Andros Tjandra (Nara Institute of Science and Technology), Berrak Sisman (National University of Singapore), Mingyang Zhang (National University of Singapore), Sakriani Sakti (Nara Institute of Science and Technology (NAIST) / RIKEN AIP), Haizhou Li (National University of Singapore), Satoshi Nakamura (Nara Institute of Science and Technology)
General Discussion
Oral; 1750–1800

Speaker Recognition Evaluation[Tue-O-5-5]
Tuesday, 17 September, Hall 12

Pindrop Labs’ Submission to the First Multi-target Speaker Detection and Identification Challenge
Oral; 1600–1620
Elie Khoury (Pindrop), Khaled Lakhdhar (Pindrop), Andrew Vaughan (Pindrop), Ganesh Sivaraman (Pindrop), Parav Nagarsheth (Pindrop)
State-of-the-art Speaker Recognition for Telephone and Video Speech: the JHU-MIT Submission for NIST SRE18
Oral; 1620–1640
Jesus Villalba (Johns Hopkins University), Nanxin Chen (Johns Hopkins University), David Snyder (The Johns Hopkins University), Daniel Garcia-Romero (Human Language Technology Center of Excellence, Johns Hopkins University), Alan McCree (JHU HLTCOE), Gregory Sell (Johns Hopkins University), Jonas Borgstrom (MIT Lincoln Laboratory), Fred Richardson (MIT Lincoln Laboratory), Suwon Shon (Massachusetts Institute of Technology), Francois Grondin (Massachusetts Institute of Technology), Reda Dehak (LSE-EPITA), Leibny Paola Garcia Perera (Nuance Communications Inc), Dan Povey (Johns Hopkins University), Pedro Torres-Carrasquillo (MIT Lincoln Laboratory), Sanjeev Khudanpur (Johns Hopkins University), Najim Dehak (Johns Hopkins University)
X-vector DNN Refinement using Full-length Recordings for Speaker Recognition
Oral; 1640–1700
Daniel Garcia-Romero (Human Language Technology Center of Excellence, Johns Hopkins University), David Snyder (The Johns Hopkins University), Gregory Sell (Johns Hopkins University), Alan McCree (JHU HLTCOE), Dan Povey (Johns Hopkins University), Sanjeev Khudanpur (Johns Hopkins University)
Speaker recognition benchmark using the CHiME-5 corpus
Oral; 1700–1720
Daniel Garcia-Romero (Human Language Technology Center of Excellence, Johns Hopkins University), David Snyder (The Johns Hopkins University), Shinji Watanabe (Johns Hopkins University), Gregory Sell (Johns Hopkins University), Alan McCree (JHU HLTCOE), Dan Povey (Johns Hopkins University), Sanjeev Khudanpur (Johns Hopkins University)
I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared Experiences
Oral; 1720–1740
Kong Aik Lee (Data Science Research Laboratories, NEC Corporation), Ville Hautamaki (University of Eastern Finland), Tomi Kinnunen (University of Eastern Finland), Hitoshi Yamamoto (NEC Corporation), Koji Okabe (NEC Corporation), Ville Vestman (School of Computing, University of Eastern Finland, Finland), Jing Huang (JD AI Research), Guohong Ding (JD AI Platform Speech Group), Hanwu Sun (Institute for Infocomm Research), Anthony Larcher (Université du Mans - LIUM), Rohan Kumar Das (National University Singapore), Haizhou Li (National University of Singapore), Mickael Rouvier (LIA - Avignon University), Pierre-Michel Bousquet (LIA, University of Avignon), Wei Rao (Nanyang Technological University), Qing Wang (Northwest Polytechnic University), Chunlei Zhang (Tencent AI Lab), Fahimeh Bahmaninezhad (University of Texas at Dallas), Héctor Delgado (EURECOM), Massimiliano Todisco (EURECOM - School of Engineering & Research Center - Digital Security Department)
The 2018 NIST Speaker Recognition Evaluation
Oral; 1740–1800
Seyed Omid Sadjadi (NIST), Craig Greenberg (National Institute of Standards and Technology), Elliot Singer (MIT Lincoln Laboratory), Douglas Reynolds (MIT Lincoln Laboratory), Lisa Mason (U.S. DoD), Jaime Hernandez-Cordero (U.S. DoD)

Student Reception[Tue-S-3]
Tuesday, 17 September, Tesla Lab

Student Reception
Social; 1830–2030

Reviewer Cultural Event[Tue-S-4]
Tuesday, 17 September, Mumuth

Reviewer Cultural Event
Social; 1900–2100

Speaker Check-in[Wed-C]
Wednesday, 18 September, Room 8

Speaker Check-in
Check-In; 0800–1700

Registration[Wed-R]
Wednesday, 18 September, Foyer

Registration
Registration; 0800–1700

Keynote 3: Manfred Kaltenbacher[Wed-K-3]
Wednesday, 18 September, Main Hall

Keynote
Physiology and physics of voice production [More info]
Keynote; 0830–0930
Manfred Kaltenbacher (Vienna University of Technology)

Coffee break in both exhibition foyers, lower and upper level 1[Wed-B-1]
Wednesday, 18 September, Foyer

Coffee break in both exhibition foyers, lower and upper level 1
Break; 0930–1030

Prosody[Wed-O-6-1]
Wednesday, 18 September, Main Hall

Survey Talk
Survey talk: Prosody Research and Applications: The State of the Art [More info]
Survey Talk; 1000–1040
Nigel G. Ward (University of Texas at El Paso)
Dimensions of prosodic prominence in an attractor model
Oral; 1040–1100
Simon Roessig (University of Cologne), Doris Mücke (University of Cologne), Lena Pagel (University of Cologne)
Comparative analysis of prosodic characteristics using WaveNet embeddings
Oral; 1100–1120
Antti Suni (University of Helsinki), Marcin Wlodarczak (Stockholm University), Martti Vainio (University of Helsinki), Juraj Šimko (University of Helsinki)
The Role of Voice Quality in the Perception of Prominence in Synthetic Speech
Oral; 1120–1140
Andy Murphy (Trinity College Dublin), Irena Yanushevskaya (Trinity College Dublin), Ailbhe Ní Chasaide (Trinity College Dublin), Christer Gobl (Trinity College Dublin)
Phonological awareness of French rising contours in Japanese learners
Oral; 1140–1200
Rachel Albar (Université Paris 7), Hiyon Yoo (Université Paris 7)

Speech and Audio Classification 1[Wed-O-6-2]
Wednesday, 18 September, Hall 1

Audio Classification of Bit-Representation Waveform
Oral; 1000–1020
Masaki Okawa (University of Yamanashi), Takuya Saito (University of Yamanashi), Naoki Sawada (University of Yamanashi), Hiromitsu Nishizaki (University of Yamanashi)
Locality-constrained Linear Coding based Fused Visual Features for Robust Acoustic Event Classification
Oral; 1020–1040
Manjunath Mulimani (National Institute of Technology Karnataka, Surathkal), Shashidhar G Koolagudi (National Institute of Technology Karnataka, Surathkal)
Learning how to listen: A temporal-frequential attention model for sound event detection
Oral; 1040–1100
Yuhan Shen (Tsinghua University), Kexin He (Tsinghua University), Wei-Qiang Zhang (Tsinghua University)
A Deep Residual Network for Large-Scale Acoustic Scene Analysis
Oral; 1100–1120
Logan Ford (MIT), Hao Tang (Massachusetts Institute of Technology), Francois Grondin (Massachusetts Institute of Technology), James Glass (Massachusetts Institute of Technology)
Supervised Classifiers for Audio Impairments with Noisy Labels
Oral; 1120–1140
Chandan Karadagur Ananda Reddy (Microsoft), Ross Cutler (Microsoft), Johannes Gehrke (Microsoft)
Self-attention for Speech Emotion Recognition
Oral; 1140–1200
Lorenzo Tarantino (EPFL), Philip Garner (Idiap), Alexandros Lazaridis (Swisscom)

Singing and multimodal synthesis[Wed-O-6-3]
Wednesday, 18 September, Hall 2

Unsupervised Singing Voice Conversion
Oral; 1000–1020
Eliya Nachmani (FAIR), Lior Wolf (Tel Aviv University)
Adversarially Trained End-to-end Korean Singing Voice Synthesis System
Oral; 1020–1040
Juheon Lee (Seoul National University), Hyeong-Seok Choi (Seoul National University, Music and Audio Research Group), Chang-bin Jeon (Seoul National University, Music and Audio Research Group), Junghyun Koo (Seoul National University, Music and Audio Research Group), Kyogu Lee (Seoul National University)
Singing Voice Synthesis Using Deep Autoregressive Neural Networks for Acoustic Modeling
Oral; 1040–1100
Yuan-Hao Yi (University of Science and Technology of China), Yang Ai (University of Science and Technology of China), Zhen-Hua Ling (University of Science and Technology of China), Lirong Dai (University of Science and Technology of China)
Conditional Variational Auto-Encoder for Text-Driven Expressive AudioVisual Speech Synthesis
Oral; 1100–1120
Sara Dahmani (University of Lorraine), Vincent Colotte (University of Lorraine), Valérian Girard (University of Lorraine), Slim Ouni (Université de Lorraine)
A Strategy for Improved Phone-Level Lyrics-to-Audio Alignment for Speech-to-Singing Synthesis
Oral; 1120–1140
David Ayllon (Oben Inc), Fernando Villavicencio (Oben Inc), Pierre Lanchantin (ObEN Inc)
Modeling Labial Coarticulation with Bidirectional Gated Recurrent Networks and Transfer Learning
Oral; 1140–1200
Théo Biasutto--Lervat (Université de Lorraine, LORIA), Sara Dahmani (Université de Lorraine, LORIA), Slim Ouni (Université de Lorraine, LORIA)

ASR Neural Network Training - 2[Wed-O-6-5]
Wednesday, 18 September, Hall 12

SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
Oral; 1000–1020
Daniel Park (Google Brain), William Chan (Google Brain), Yu Zhang (Google Brain), Chung-Cheng Chiu (Google), Barret Zoph (Google Brain), Ekin Dogus Cubuk (Google Brain), Quoc Le (Google Brain)
Forget a Bit to Learn Better: Soft Forgetting for CTC-based Automatic Speech Recognition
Oral; 1020–1040
Kartik Audhkhasi (IBM Research), George Saon (IBM), Zoltán Tüske (IBM Research), Brian Kingsbury (IBM Research), Michael Picheny (IBM TJ Watson Research Center)
Online Hybrid CTC/Attention Architecture for End-to-end Speech Recognition
Oral; 1040–1100
Haoran Miao (Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, China), Gaofeng Cheng (Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, China), Pengyuan Zhang (Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, China), Ta Li (Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, China), Yonghong Yan (Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, China)
A Highly-Efficient Distributed Deep Learning System For Automatic Speech Recognition
Oral; 1100–1120
Wei Zhang (IBM Research), Xiaodong Cui (IBM T. J. Watson Research Center), Ulrich Finkler (IBM Research), George Saon (IBM), Abdullah Kayi (IBM Research), Alper Buyuktosunoglu (IBM Research), Brian Kingsbury (IBM Research), David Kung (IBM Research), Michael Picheny (IBM TJ Watson Research Center)
Knowledge Distillation for End-to-End Monaural Multi-talker ASR System
Oral; 1120–1140
Wangyou Zhang (Shanghai Jiao Tong University), Xuankai Chang (Shanghai Jiao Tong University), Yanmin Qian (Shanghai Jiao Tong University)
Analysis of Deep Clustering as Preprocessing for Automatic Speech Recognition of Sparsely Overlapping Speech
Oral; 1140–1200
Tobias Menne (RWTH Aachen University), Ilya Sklyar (RWTH Aachen University), Ralf Schlüter (Lehrstuhl Informatik 6, RWTH Aachen University), Hermann Ney (RWTH Aachen University)

Speaker Recognition and Anti-spoofing[Wed-P-6-A]
Wednesday, 18 September, Gallery A

Blind Channel Response Estimation for Replay Attack Detection
Poster; 1000–1200
Anderson R. Avila (Institut national de la recherche scientifique), Md Jahangir Alam (ETS/CRIM), Douglas O'Shaughnessy (INRS-EMT (Univ. of Quebec)), Tiago Falk (INRS-EMT, University of Quebec)
Cross-domain replay spoofing attack detection using domain adversarial training
Poster; 1000–1200
Hongji Wang (Shanghai Jiao Tong University), Heinrich Dinkel (Shanghai Jiao Tong University), Shuai Wang (Shanghai Jiao Tong University), Yanmin Qian (Shanghai Jiao Tong University), Kai Yu (Shanghai Jiao Tong University)
A Study of X-vector Based Speaker Recognition on Short Utterances
Poster; 1000–1200
Ahilan Kanagasundaram (University of Jaffna), Sridha Sridharan (Queensland University of Technology), Sriram Ganapathy (Indian Institute of Science, Bangalore, India, 560012), Prachi Singh (Indian Institute of Science), Clinton Fookes (Queensland University of Technology)
Tied Mixture of Factor Analyzers Layer to Combine Frame Level Representations in Neural Speaker Embeddings
Poster; 1000–1200
Nanxin Chen (Johns Hopkins University), Jesus Villalba (Johns Hopkins University), Najim Dehak (Johns Hopkins University)
Biologically Inspired Adaptive-Q Filterbanks for Replay Spoofing Attack Detection
Poster; 1000–1200
Buddhi Wickramasinghe (School of Electrical Engineering and Telecommunications, UNSW), Eliathamby Ambikairajah (The University of New South Wales, Sydney), Julien Epps (School of Electrical Engineering and Telecommunications, UNSW Australia)
On robustness of unsupervised domain adaptation for speaker recognition
Poster; 1000–1200
Pierre-Michel Bousquet (LIA, University of Avignon), Mickael Rouvier (LIA - Avignon University)
Large-scale Speaker Retrieval on Random Speaker Variability Subspace
Poster; 1000–1200
Suwon Shon (Massachusetts Institute of Technology), Younggun Lee (Neosapience, Inc.), Taesu Kim (Neosapience, Inc.)
Energy Separation-based Instantaneous Frequency Estimation for Cochlear Cepstral Feature for Replay Spoof Detection
Poster; 1000–1200
Ankur Patil (DA-IICT), Rajul Acharya (DA-IICT), Aditya Krishna Sai Pulikonda (IIIT Vadodara), Hemant Patil (DA-IICT Gandhinagar)
Optimization of False Acceptance/Rejection Rates and Decision Threshold for End-to-End Text-Dependent Speaker Verification Systems
Poster; 1000–1200
Victoria Mingote (University of Zaragoza), Antonio Miguel (ViVoLAB, Aragon Institute for Engineering Research (I3A), University of Zaragoza, Spain), Dayana Ribas (ViVoLab, University of Zaragoza), Alfonso Ortega (University of Zaragoza), Eduardo Lleida Solano (University of Zaragoza)
Deep Hashing for Speaker Identification and Retrieval
Poster; 1000–1200
Lei Fan (Nanjing University), Qing-Yuan Jiang (Nanjing University), Ya-Qi Yu (Nanjing University), Wu-Jun Li (Nanjing University)
Adversarial Optimization for Dictionary Attacks on Speaker Verification
Poster; 1000–1200
Mirko Marras (University of Cagliari), Pawel Korus (New York University - AGH University of Science and Technology), Nasir Memon (New York University), Gianni Fenu (University of Cagliari)
An Adaptive-Q Cochlear Model for Replay Spoofing Detection
Poster; 1000–1200
Tharshini Gunendradasan (University of New South Wales), Eliathamby Ambikairajah (University of New South Wales), Julien Epps (School of Electrical Engineering and Telecommunications, UNSW Australia), Haizhou Li (NUS)
An End-to-End Text-independent Speaker Verification Framework with a Keyword Adversarial Network
Poster; 1000–1200
Sungrack Yun (Qualcomm AI Research), Janghoon Cho (Qualcomm AI Research), Wonil Chang (Qualcomm AI Research), Jungyun Eum (Qualcomm AI Research), Kyuwoong Hwang (Qualcomm AI Research)
Shortcut Connections based Deep Speaker Embeddings for End-to-End Speaker Verification System
Poster; 1000–1200
Soonshin Seo (Sogang University), Daniel Rim (Sogang University), Minkyu Lim (Sogang University), Donghyun Lee (Sogang University), Hosung Park (Sogang University), Junseok Oh (Sogang University), Changmin Kim (Sogang University), Ji-Hwan Kim (Sogang University)
Device Feature Extractor for Replay Spoofing Detection
Poster; 1000–1200
Chang Huai You (Institute for Infocomm Research), Jichen Yang (Department of Electrical and Computer Engineering, NUS, Singapore), Tran Huy Dat (Institute for Infocomm Research)

Rich transcription and ASR systems[Wed-P-6-B]
Wednesday, 18 September, Gallery B

Meeting Transcription Using Asynchronous Distant Microphones
Poster; 1000–1200
Takuya Yoshioka (Microsoft), Dimitrios Dimitriadis (Microsoft), Andreas Stolcke (Microsoft Research), William Hinthorn (Microsoft), Zhuo Chen (Microsoft), Michael Zeng (Microsoft), Xuedong Huang (Microsoft Cloud and AI)
The Althingi ASR System
Poster; 1000–1200
Inga Rún Helgadóttir (Reykjavík University), Anna Björk Nikulásdóttir (Reykjavik University), Judy Fong (Reykjavik University), Michal Borsky (Reykjavik University), Róbert Kjaran (Reykjavik University), Jon Gudnason (Reykjavik University)
CRIM's Speech Transcription and Call Sign Detection System for the ATC Airbus Challenge task
Poster; 1000–1200
Vishwa Gupta (Computer Research Institute of Montreal), Lise Rebout (Computer Research Institute of Montreal), Gilles Boulianne (CRIM - Centre de recherche informatique de Montréal), Pierre-André Ménard (CRIM - Centre de recherche informatique de Montréal), Md Jahangir Alam (ETS/CRIM)
Detection and Recovery of OOVs for Improved English Broadcast News Captioning
Poster; 1000–1200
Samuel Thomas (IBM Research AI), Kartik Audhkhasi (IBM Research AI), Zoltan Tuske (IBM Research AI), Yinghui Huang (IBM Research AI), Michael Picheny (IBM Research AI)
Improving Large Vocabulary Urdu Speech Recognition System using Deep Neural Networks
Poster; 1000–1200
Muhammad Umar Farooq (Center for Language Engineering, Al-Khawarizimi Institute of Computer Sciences (KICS), UET Lahore.), Farah Adeeba (Center for Language Engineering, Al-Khawarizimi Institute of Computer Sciences (KICS), UET Lahore.), Sahar Rauf (Center for Language Engineering, Al-Khawarizimi Institute of Computer Sciences (KICS), UET Lahore.), Sarmad Hussain (Center for Language Engineering, Al-Khawarizimi Institute of Computer Sciences (KICS), UET Lahore.)
Hybrid Arbitration using raw ASR string and NLU information – taking the best of both embedded world and cloud world
Poster; 1000–1200
Min Tang (Nuance Communications)
Leveraging a character, word and prosody triplet for an ASR error robust and agglutination friendly punctuation approach
Poster; 1000–1200
György Szaszák (Budapest University of Technology and Economics), Máté Ákos Tündik (Budapest University of Technology and Economics)
The Airbus Air Traffic Control speech recognition 2018 challenge: towards ATC automatic transcription and call sign detection
Poster; 1000–1200
Thomas Pellegrini (Université de Toulouse III France), Jérôme Farinas (Université Toulouse 3 - Institut de Recherche en Informatique de Toulouse), Estelle Delpech (Airbus), François Lancelot (Airbus)
Kite: Automatic speech recognition for unmanned aerial vehicles
Poster; 1000–1200
Dan Oneață (POLITEHNICA University of Bucharest), Horia Cucu (Speech and Dialogue Research Laboratory, University “Politehnica” of Bucharest)
Exploring Methods for the Automatic Detection of Errors in Manual Transcription
Poster; 1000–1200
Xiaofei Wang (The Johns Hopkins University), Jinyi Yang (Key Laboratory of Speech Acoustics and Content Understanding), Ruizhi Li (The Johns Hopkins University), Samik Sadhu (Johns Hopkins University), Hynek Hermansky (JHU)
Improved low-resource Somali speech recognition by semi-supervised acoustic and language model training
Poster; 1000–1200
Astik Biswas (Stellenbosch University), Raghav Menon (Stellenbosch University), Ewald Van der westhuizen (Stellenbosch University), Thomas Niesler (University of Stellenbosch)

Speech and Language Analytics for Medical Applications[Wed-P-6-C]
Wednesday, 18 September, Gallery C

Optimizing Speech-Input Length for Speaker-Independent Depression Classification
Poster; 1000–1200
Tomasz Rutowski (Ellipsis Health), Amir Harati (Ellipsis Health), Yang Lu (Ellipsis Health), Elizabeth Shriberg (Ellipsis Health)
Feature space visualization with spatial similarity maps for pathological speech data
Poster; 1000–1200
Philipp Klumpp (Friedrich-Alexander-Universität Erlangen-Nürnberg), Juan Camilo Vásquez Correa (Pattern Recognition Lab, Friedrich Alexander University), Tino Haderlein (Friedrich-Alexander-Universität Erlangen-Nürnberg), Elmar Noeth (Friedrich-Alexander-University Erlangen-Nuremberg)
Predicting Behavior in Cancer-Afflicted Patient and Spouse Interactions using Speech and Language
Poster; 1000–1200
Sandeep Nallan Chakravarthula (University of Southern California), Haoqi Li (University of Southern California), Shao-Yen Tseng (University of Southern California), Maija Reblin (Moffitt Cancer Center), Panayiotis Georgiou (Univ. Southern California)
Automatic Assessment of Language Impairment Based on Raw ASR Output
Poster; 1000–1200
Ying Qin (The Chinese University of Hong Kong), Tan Lee (The Chinese University of Hong Kong), Anthony Pak Hin Kong (University of Central Florida)
A New Approach for Automating Analysis of Responses on Verbal Fluency Tests from Subjects At-Risk for Schizophrenia
Poster; 1000–1200
Mary Pietrowicz (IBM Research), Carla Agurto (T.J. Watson IBM Research Laboratory), Raquel Norel (IBM), Elif Eyigoz (IBM Research), Guillermo Cecchi (IBM Research), Zarina Bilgrami (Icahn School of Medicine at Mount Sinai), Cheryl Corcoran (Icahn School of Medicine at Mount Sinai, Bronx VA MIRECC)
Comparison of Telephone Recordings and Professional Microphone Recordings for Early Detection of Parkinson's Disease, using Mel-Frequency Cepstral Coefficients with Gaussian Mixture Models
Poster; 1000–1200
Laetitia Jeancolas (SAMOVAR, UMR 5157, Télécom SudParis, CNRS, Université Paris-Saclay), Graziella Mangone (Sorbonne Université, UPMC Univ Paris 06 UMR S 1127; INSERM U 1127 and CIC 1422; CNRS UMR 7225), Jean Christophe Corvol (Sorbonne Université, UPMC Univ Paris 06 UMR S 1127; INSERM U 1127 and CIC 1422; CNRS UMR 7225), Marie Vidailhet (Sorbonne Université, UPMC Univ Paris 06 UMR S 1127; INSERM U 1127 and CIC 1422; CNRS UMR 7225), Stephane Lehericy (Sorbonne Université, UPMC Univ Paris 06 UMR S 1127; INSERM U 1127 and CIC 1422; CNRS UMR 7225), Badr-Eddine Benkelfat (SAMOVAR, UMR 5157, Télécom SudParis, CNRS, Université Paris-Saclay), Habib Benali (PERFORM Centre, Electrical & Computer Engineering Department Concordia University), Dijana Petrovska (Telecom Sud Paris)
Spectral Subspace Analysis for Automatic Assessment of Pathological Speech Intelligibility
Poster; 1000–1200
Parvaneh Janbakhshi (Idiap Research Institute), Ina Kodrasi (Idiap Research Institute), Herve Bourlard (Idiap Research Institute & EPFL)
An investigation of therapeutic rapport through prosody in brief psychodynamic psychotherapy
Poster; 1000–1200
Carolina De Pasquale (Dublin Institute of Technology), Charlie Cullen (University of the West of Scotland, UK), Brian Vaughan (Dublin Institute of Technology)
Feature Representation of Pathophysiology of Parkinsonian Dysarthria
Poster; 1000–1200
Alice Rueda (Ryerson University), Juan Camilo Vásquez Correa (Pattern Recognition Lab, Friedrich Alexander University), Cristian David Rios Urrego (Universidad de Antioquia), Juan Rafael Orozco-Arroyave (Universidad de Antioquia), Sridhar Krishnan (Ryerson University), Elmar Noth (Pattern Recognition Lab, Friedrich Alexander University)
Neural Transfer Learning for Cry-based Diagnosis of Perinatal Asphyxia
Poster; 1000–1200
Charles Onu (McGill University), Jonathan Lebensold (McGill University), William Hamilton (McGill University), Doina Precup (McGill University)
Investigating the Variability of Voice Quality and Pain Levels as a Function of Multiple Clinical Parameters
Poster; 1000–1200
Hui-Ting Hong (Department of Electrical Engineering, National Tsing Hua University), Jeng-Lin Li (Department of Electrical Engineering, National Tsing Hua University), Yi-Ming Weng (Department of Emergency Medicine, Tao-Yuan General Hospital), Chip-Jin Ng (Department of Emergency Medicine, Chang Gung Memorial Hospital), Chi-Chun Lee (Department of Electrical Engineering, National Tsing Hua University)
Assessing Parkinson’s Disease From Speech by Using Fisher Vectors
Poster; 1000–1200
José Vicente Egas López (Institute of Informatics), Juan Rafael Orozco-Arroyave (Universidad de Antioquia), Gábor Gosztolya (Research Group on Artificial Intelligence)

Speech perception in adverse listening conditions[Wed-P-6-D]
Wednesday, 18 September, Hall 10/D

Effects of Spectral and Temporal Cues to Mandarin Concurrent-vowels Identification for Normal-hearing and Hearing-impaired Listeners
Poster; 1000–1200
Zhen Fu (Peking University), Xihong Wu (Peking University), Jing Chen (Peking University)
Talker intelligibility and listening effort with temporally modified speech
Poster; 1000–1200
Maximillian Paulus (University College London), Valerie Hazan (University College London (UCL)), Patti Adank (University College London)
R2SPIN: Re-recording the Revised Speech Perception in Noise Test
Poster; 1000–1200
Lauren Ward (University of Salford), Matthew Paradis (BBC R&D), Catherine Robinson (BBC R&D), Ben Shirley (University of Salford)
Contributions of Consonant-vowel Transitions to Mandarin Tone Identification in Simulated Electric-acoustic Hearing
Poster; 1000–1200
Fei Chen (Southern University of Science and Technology)
Disfluencies and Human Speech Transcription Errors
Poster; 1000–1200
Vicky Zayats (University of Washington), Trang Tran (University of Washington), Courtney Mansfield (University of Washington), Richard Wright (University of Washington), Mari Ostendorf (University of Washington)
The influence of distraction on speech processing: How selective is selective attention?
Poster; 1000–1200
Sandra Isabella Parhammer (University of Innsbruck & Medical University of Innsbruck), Miriam Ebersberg (University of Innsbruck & Medical University of Innsbruck), Jenny Tippmann (Technical University of Dresden), Katja Stärk (Max-Planck-Institute for Psycholinguistics), Andreas Opitz (University of Leipzig), Barbara Hinger (University of Innsbruck), Sonja Rossi (Medical University of Innsbruck)
Subjective evaluation of communicative effort for younger and older adults in interactive tasks with energetic and informational masking
Poster; 1000–1200
Valerie Hazan (University College London (UCL)), Outi Tuomainen (UCL), Linda Taschenberger (UCL)
Perceiving Older Adults Producing Clear and Lombard Speech
Poster; 1000–1200
Chris Davis (The MARCS Institute, Western Sydney University), Jeesun Kim (The MARCS Institute, Western Sydney University)
Phone-attribute posteriors to evaluate the speech of cochlear implant users
Poster; 1000–1200
Tomas Arias-Vergara (Ludwig-Maximilians University), Juan Rafael Orozco-Arroyave (Universidad de Antioquia), Milos Cernak (Logitech Europe), Sandra Gollwitzer (Ludwig-Maximilians University), Maria Schuster (Ludwig-Maximilians University), Elmar Nöth (Friedrich-Alexander-University Erlangen-Nuremberg)
Effects of urgent speech and congruent/incongruent text on speech intelligibility in noise and reverberation
Poster; 1000–1200
Nao Hodoshima (Department of Information Media Technology, Tokai University)
Quantifying Cochlear Implant Users' Ability for Speaker Identification using CI Auditory Stimuli
Poster; 1000–1200
Nursadul Mamun (Cochlear Implant Processing Laboratory, Center for Robust Speech Systems (CRSS-CILab), Department of Electrical Computer Engineering, The University of Texas at Dallas), Ria Ghosh (Cochlear Implant Processing Laboratory, Center for Robust Speech Systems (CRSS-CILab), Department of Electrical Computer Engineering, The University of Texas at Dallas), John H.L. Hansen (Cochlear Implant Processing Laboratory, Center for Robust Speech Systems (CRSS-CILab), Department of Electrical Computer Engineering, The University of Texas at Dallas)
Lexically Guided Perceptual Learning of a Vowel Shift in an Interactive L2 Listening Context
Poster; 1000–1200
Emily Felker (Radboud University), Mirjam Ernestus (Radboud University), Mirjam Broersma (Radboud University)

Speech Enhancement: Single channel 1[Wed-P-6-E]
Wednesday, 18 September, Hall 10/E

Monaural speech enhancement with dilated convolutions
Poster; 1000–1200
Shadi Pirhosseinloo (Department of Electrical Engineering and Computer Science, University of Kansas), Jonathan Brumberg (University of Kansas)
Masking Estimation with Phase Restoration of Clean Speech for Monaural Speech Enhancement
Poster; 1000–1200
Changchun Bao (Beijing University of Technology), Xianyun Wang (Beijing University of Technology)
Progressive Speech Enhancement with Residual Connections
Poster; 1000–1200
Jorge Llombart (ViVoLAB, Aragon Institute for Engineering Research (I3A), University of Zaragoza), Dayana Ribas Gonzalez (UNIZAR), Antonio Miguel (ViVoLAB, Aragon Institute for Engineering Research (I3A), University of Zaragoza, Spain), Luis Vicente (ViVoLAB, Aragon Institute for Engineering Research (I3A), University of Zaragoza, Spain), Alfonso Ortega (University of Zaragoza), Eduardo Lleida Solano (University of Zaragoza)
Noise Adaptive Speech Enhancement using Domain Adversarial Training
Poster; 1000–1200
Chien-Feng Liao (Academia Sinica), Yu Tsao (Academia Sinica), Hung-yi Lee (National Taiwan University (NTU)), Hsin-Min Wang (Academia Sinica)
Environment-dependent Attention-driven Recurrent Convolutional Neural Network for Robust Speech Enhancement
Poster; 1000–1200
Meng Ge (Tianjin University), Longbiao Wang (Tianjin University), Nan Li, Hao Shi (Tianjin University), Jianwu Dang (JAIST), Xiangang Li (Didi Chuxing)
A Statistically Principled and Computationally Efficient Approach to Speech Enhancement using Variational Autoencoders
Poster; 1000–1200
Manuel Pariente (Université de Lorraine, CNRS, Inria, LORIA, F-54000 Nancy, France), Antoine Deleforge (Université de Lorraine, CNRS, Inria, LORIA, F-54000 Nancy, France), Emmanuel Vincent (Inria)
Speech Enhancement Using Forked Generative Adversarial Networks with Spectral Subtraction
Poster; 1000–1200
Ju Lin (Clemson University), Sufeng Niu (Linkedin Inc.), Zice Wei (Clemson University), Xiang Lan (Clemson University), Adriaan J. van Wijngaarden (Bell Labs), Melissa C. Smith (Clemson University), Kuang-Ching Wang (Clemson University)
Specialized Speech Enhancement Model Selection Based on Learned Non-Intrusive Quality Assessment Metric
Poster; 1000–1200
Ryandhimas Edo Zezario (Research Center for Information Technology Innovation, Academia Sinica), Szu-wei Fu (Research Center for Information Technology Innovation, Academia Sinica), Xugang Lu (NICT), Hsin-Min Wang (Academia Sinica), Yu Tsao (Academia Sinica)
Speaker-aware Deep Denoising Autoencoder with Embedded Speaker Identity for Speech Enhancement
Poster; 1000–1200
Fu-Kai Chuang (Department of Electrical Engineering, Yuan Ze University), Syu-Siang Wang (MOST Joint Research Center for AI Technology and All Vista Healthcare), Jeih-weih Hung (National Chi Nan University), Yu Tsao (Academia Sinica), Shih-Hau Fang (Department of Electrical Engineering, Yuan Ze University)
Investigation on cost function for monaural speech separation
Poster; 1000–1200
Yun Liu (Inner Mongolia University at Hohhot), Hui Zhang (Inner Mongolia University at Hohhot), Xueliang Zhang (Computer Science Department, Inner Mongolian University), Yuhang Cao (Inner Mongolia University, Hohhot)
Deep Attention Gated Dilated Temporal Convolutional Networks with Intra-Parallel Convolutional Modules for End-to-End Monaural Speech Separation
Poster; 1000–1200
Ziqiang Shi (Fujitsu Research and Development Center), Huibin Lin (Fujitsu Research and Development Center), Liu Liu (Fujitsu Research and Development Center), Rujie Liu (Fujitsu Research and Development Center), Jiqing Han (Harbin Institute of Technology), Anyan Shi (ShuangFeng First)

Speech and Speaker Recognition[Wed-S&T-3]
Wednesday, 18 September, Hall 4

Avaya Conversational Intelligence™: A Real-Time System for Spoken Language Understanding in Human-Human Call Center Conversations
Show&Tell; 1000–1200
Jan Mizgajski (Avaya, Santa Clara, California), Adrian Szymczak, Robert Głowski, Piotr Szymanski, Piotr Zelasko, Łukasz Augustyniak, Mikołaj Morzy, Yishay Carmiel, Jeff Hodson, Łukasz Wojciak, Daniel Smoczyk, Adam Wrobel, Bartosz Borowik, Adam Artajew, Marcin Baran, Cezary Kwiatkowski, Marzena Zyła-Hoppe
Robust Keyword Spotting via Recycle-Pooling for Mobile Game
Show&Tell; 1000–1200
Shounan An (Game Dev. AI Team, NARC, Netmarble), Youngsoo Kim, Hu Xu, Jinwoo Lee, MyungWoo Lee, Insoo Oh
Multimodal Dialog with the MALACH Audiovisual Archive
Show&Tell; 1000–1200
Adam Chylek (New Technologies for the Information Society (NTIS) Faculty of Applied Sciences, University of West Bohemia), Luboš Šmídl, Jan Švec
SpeechMarker: A Voice based Multi-level Attendance Application
Show&Tell; 1000–1200
Sarfaraz Jelil (Department of Electronics and Electrical Engineering, Indian Institute of Technology Guwahati, Guwahati), Abhishek Shrivastava, Rohan Kumar Das, S. R. M. Prasanna, Rohit Sinha
Robust Sound Recognition: A Neuromorphic Approach
Show&Tell; 1000–1200
Jibin Wu (National University of Singapore), Zihan Pan, Malu Zhang, Rohan Kumar Das, Yansong Chua, Haizhou Li
The CUHK Dysarthric Speech Recognition Systems for English and Cantonese
Show&Tell; 1000–1200
Shoukang Hu (The Chinese University of Hong Kong, Hong Kong SAR), Shansong Liu, Heng Fai Chang, Mengzhe Geng, Jiani Chen, Lau Wing Chung, To Ka Hei, Jianwei Yu, Ka Ho Wong, Xunying Liu, Helen Meng

The Interspeech 2019 Computational Paralinguistics Challenge (ComParE)[Wed-SS-6-4]
Wednesday, 18 September, Hall 11 [More info]

The INTERSPEECH 2019 Computational Paralinguistics Challenge: Styrian Dialects - Continuous Sleepiness - Baby Sounds & Orca Activity
Oral; 1000–1015
Björn Schuller (University of Augsburg / Imperial College London), Anton Batliner (University of Augsburg), Christian Bergler (Friedrich-Alexander-University Erlangen-Nuremberg), Florian Pokorny (Medical University of Graz), Jarek Krajewski (Univ. Wuppertal), Meg Cychosz (University of California, Berkeley), Ralf Vollmann (University of Graz), Sonja-Dana Roelen (Rhenish University of Applied Science Cologne), Sebastian Schnieder (Institut für experimentelle Psychophysiologie), Elika Bergelson (Duke University), Alejandrina Cristia (Laboratoire de Sciences Cognitives et Psycholinguistique (ENS, EHESS, CNRS), Département d'Etudes Cognitives, Ecole Normale Supérieure, PSL Research University), Amanda Seidl (Purdue University), Anne Warlaumont (University of California, Los Angeles), Lisa Yankowitz (University of Pennsylvania), Elmar Noeth (Friedrich-Alexander-University Erlangen-Nuremberg), Shahin Amiriparian (University of Augsburg / Technische Universität München), Simone Hantke (Universität Passau / Technische Universität München), Maximilian Schmitt (ZD.B Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg)
Relevance-based Feature Masking: Improving Neural Network based Whale Classification through Explainable Artificial Intelligence
Oral; 1119–1127
Dominik Schiller (Uni Augsburg), Tobias Huber (Uni Augsburg), Florian Lingenfelser (Uni Augsburg), Michael Dietz (Uni Augsburg), Andreas Seiderer (Uni Augsburg), Elisabeth André (Uni Augsburg)
Spatial - Temporal and Spectral Multiresolution Analysis for the INTERSPEECH 2019 ComParE Challenge
Oral; 1127–1135
Marie-José Caraty (STIH - Paris University), Claude Montacié (Sorbonne University (STIH))
The DKU-LENOVO Systems for the INTERSPEECH 2019 Computational Paralinguistic Challenge
Oral; 1135–1143
Haiwei Wu (Sun Yat-sen University), Weiqing Wang (Duke Kunshan University), Ming Li (Duke Kunshan University)
Overview on Approaches and Results
Oral; 1143–1200
Using Speech Production Knowledge for Raw Waveform Modelling based Styrian Dialect Identification
Oral; 1015–1023
S. Pavankumar Dubagunta (Idiap Research Institute), Mathew Magimai Doss (Idiap Research Institute)
Deep Neural Baselines for Computational Paralinguistics
Oral; 1023–1031
Daniel Elsner (University of Munich), Stefan Langer (University of Munich), Fabian Ritz (University of Munich), Robert Müller (University of Munich), Steffen Illium (University of Munich)
Styrian dialect classification: comparing and fusing classifiers based on a feature selection using a genetic algorithm
Oral; 1031–1039
Thomas Kisler (University of Munich), Raphael Winkelmann (University of Munich), Florian Schiel (Bavarian Archive for Speech Signals (BAS), University of Munich)
Using Attention Networks and Adversarial Augmentation for Styrian Dialect Continuous Sleepiness and Baby Sound Recognition
Oral; 1039–1047
Sung-Lin Yeh (Department of Electrical Engineering, National Tsing Hua University), Gao-Yi Chao (Department of Electrical Engineering, National Tsing Hua University), Bo-Hao Su (Department of Electrical Engineering, National Tsing Hua University), Yu-Lin Huang (Department of Electrical Engineering, National Tsing Hua University), Meng-Han Lin (Department of Electrical Engineering, National Tsing Hua University), Yin-Chun Tsai (Department of Electrical Engineering, National Tsing Hua University), Yu-Wen Tai (Department of Electrical Engineering, National Tsing Hua University), Zheng-Chi Lu (Department of Electrical Engineering, National Tsing Hua University), Chieh-Yu Chen (NVIDIA), Tsung-Ming Tai (NVIDIA), Chiu-Wang Tseng (NVIDIA), Cheng-Kuang Lee (NVIDIA), Chi-Chun Lee (Department of Electrical Engineering, National Tsing Hua University)
Ordinal Triplet Loss: Investigating Sleepiness Detection from Speech
Oral; 1047–1055
Peter Wu (CMU), SaiKrishna Rallabandi (Carnegie Mellon University), Eric Nyberg (Carnegie Mellon University), Alan W Black (Carnegie Mellon University)
Voice Quality and Between-Frame Entropy for Sleepiness Estimation
Oral; 1055–1103
Vijay Ravi (University of California, Los Angeles), Soo Jin Park (UCLA), Amber Afshan (University of California, Los Angeles), Abeer Alwan (UCLA)
Using Fisher Vector and Bag-of-Audio-Words Representations to Identify Styrian Dialects - Sleepiness - Baby & Orca Sounds
Oral; 1103–1111
Gábor Gosztolya (Research Group on Artificial Intelligence)
Instantaneous Phase and Long-term Acoustic Cues for Orca Activity Detection
Oral; 1111–1119
Rohan Kumar Das (National University of Singapore), Haizhou Li (National University of Singapore)

Lunch Break in lower foyer[Wed-B-2]
Wednesday, 18 September, Foyer

Lunch Break in lower foyer
Break; 1200–1330

Bilingualism, L2, and Non-nativeness[Wed-O-7-1]
Wednesday, 18 September, Main Hall

Survey Talk
Survey Talk: Recognition of foreign-accented speech: Challenges and opportunities for human and computer speech communication [More info]
Survey Talk; 1330–1410
Ann Bradlow (Northwestern University, Evanston, IL)
The Effects of Time Expansion on English as a Second Language Individuals
Oral; 1410–1430
John Novak (University of Illinois at Chicago), Daniel Bunn (University of Illinois at Chicago), Robert Kenyon (University of Illinois at Chicago)
Capturing L1 Influence on L2 Pronunciation by Simulating Perceptual Space Using Acoustic Features
Oral; 1430–1450
Shuju Shi (University of Illinois at Urbana-Champaign), Chilin Shih (University of Illinois at Urbana-Champaign), Jinsong Zhang (Beijing Language and Culture University)
Cognitive factors in Thai-naïve Mandarin speakers’ imitation of Thai lexical tones
Oral; 1450–1510
Juqiang Chen (MARCS Institute, Western Sydney University), Catherine Best (The MARCS Institute, Western Sydney University), Mark Antoniou (The MARCS Institute, Western Sydney University)
Foreign-Language Knowledge Enhances Artificial-Language Segmentation
Oral; 1510–1530
Annie Tremblay (University of Kansas), Mirjam Broersma (Radboud University)

Spoken Term Detection[Wed-O-7-2]
Wednesday, 18 September, Hall 1

Neural Named Entity Recognition from Subword Units
Oral; 1330–1350
Abdalghani Abujabal (Max Planck Institute for Informatics), Judith Gaspers (Amazon)
Unsupervised Acoustic Segmentation and Clustering using Siamese Network Embeddings
Oral; 1350–1410
Saurabhchand Bhati (The Johns Hopkins University), Shekhar Nayak (Indian Institute of Technology Hyderabad), Sri Rama Murty Kodukula (IIT Hyderabad), Najim Dehak (Johns Hopkins University)
An Empirical Evaluation of DTW Subsampling Methods for Keyword Search
Oral; 1410–1430
Bolaji Yusuf (Bogazici University), Murat Saraclar (Bogazici University)
Linguistically-informed Training of Acoustic Word Embeddings for Low-resource Languages
Oral; 1430–1450
Zixiaofan Yang (Columbia University), Julia Hirschberg (Columbia University)
Multimodal Word Discovery and Retrieval with Phone Sequence and Image Concepts
Oral; 1450–1510
Liming Wang (University of Illinois, Urbana Champaign), Mark Hasegawa-Johnson (University of Illinois)
Empirical Evaluation of Sequence-to-Sequence Models for Word Discovery in Low-resource Settings
Oral; 1510–1530
Marcely Zanon Boito (Laboratoire d'Informatique de Grenoble), Aline Villavicencio (School of Computer Science and Electronic Engineering, University of Essex), Laurent Besacier (LIG)

Speech and Audio Source Separation and Scene Analysis 2[Wed-O-7-4]
Wednesday, 18 September, Hall 11

Direct-Path Signal Cross-Correlation Estimation for Sound Source Localization in Reverberation
Oral; 1330–1350
Wei Xue (JD AI Research), Ying Tong (JD AI Research), Guohong Ding (JD AI Research), Chao Zhang (JD AI Research), Tao Ma (JD AI Research), Xiaodong He (JD AI Research), Bowen Zhou (JD AI Research)
Multiple Sound Source Localization with SVD-PHAT
Oral; 1350–1410
Francois Grondin (Massachusetts Institute of Technology), James Glass (Massachusetts Institute of Technology)
Robust DOA Estimation Based on Convolutional Neural Network and Time-Frequency Masking
Oral; 1410–1430
Wangyou Zhang (Shanghai Jiao Tong University), Ying Zhou (Shanghai Jiao Tong University), Yanmin Qian (Shanghai Jiao Tong University)
Multichannel Loss Function for Supervised Speech Source Separation by Mask-based Beamforming
Oral; 1430–1450
Yoshiki Masuyama (Waseda University), Masahito Togami (LINE Corporation), Tatsuya Komatsu (LINE Corporation)
Direction-aware Speaker Beam for Multi-channel Speaker Extraction
Oral; 1450–1510
Guanjun Li (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, China), Shan Liang (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, China), Shuai Nie (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, China), Wenju Liu (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, China), Meng Yu (Tencent AI Lab, Bellevue, WA, USA), Lianwu Chen (Tencent AI Lab, Shenzhen, China), Shouye Peng (Xueersi Online School), Changliang Li (KingSoft AI Lab)
Multimodal SpeakerBeam: Single channel target speech extraction with audio-visual speaker clues
Oral; 1510–1530
Tsubasa Ochiai (NTT Communication Science Laboratories), Marc Delcroix (NTT Communication Science Laboratories), Keisuke Kinoshita (NTT), Atsunori Ogawa (NTT Communication Science Laboratories), Tomohiro Nakatani (NTT Corporation)

Speech Enhancement: Single Channel 2[Wed-O-7-5]
Wednesday, 18 September, Hall 12

Speech Denoising With Deep Feature Losses
Oral; 1330–1350
Francois Germain (Stanford University), Qifeng Chen (HKUST), Vladlen Koltun (Intel Labs)
VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking
Oral; 1350–1410
Quan Wang (Google Inc.), Hannah Muckenhirn (Idiap Research Institute), Kevin Wilson (Google Inc.), Prashant Sridhar (Google Inc.), Zelin Wu (Google Inc.), John Hershey (Google Inc.), Rif Saurous (Google Inc.), Ron Weiss (Google Inc.), Ye Jia (Google Inc.), Ignacio Lopez Moreno (Google Inc.)
Incorporating Symbolic Sequential Modeling for Speech Enhancement
Oral; 1410–1430
Chien-Feng Liao (Academia Sinica), Yu Tsao (Academia Sinica), Xugang Lu (NICT), Hisashi Kawai (NICT)
Maximum a Posteriori Speech Enhancement Based on Double Spectrum
Oral; 1430–1450
Pejman Mowlaee (Graz University of Technology), Daniel Scheran (Graz University of Technology), Johannes Stahl (Graz University of Technology), Sean Wood (Graz University of Technology), Bastiaan Kleijn (Victoria University of Wellington)
Coarse-to-fine Optimization for Speech Enhancement
Oral; 1450–1510
Jian Yao (Apple Inc.), Ahmad Al-Dahle (Apple Inc.)
Kernel Machines Beat Deep Neural Networks on Mask-based Single-channel Speech Enhancement
Oral; 1510–1530
Like Hui (The Ohio State University), Siyuan Ma (The Ohio State University), Mikhail Belkin (The Ohio State University)

Speech Recognition and beyond[Wed-P-7-B]
Wednesday, 18 September, Gallery B

Acoustic Model Bootstrapping Using Semi-Supervised Learning
Poster; 1330–1530
Langzhou Chen (Amazon Cambridge office), Volker Leutnant (Amazon Aachen office)
Transfer Learning from Audio-Visual Grounding to Speech Recognition
Poster; 1330–1530
Wei-Ning Hsu (Massachusetts Institute of Technology), David Harwath (Massachusetts Institute of Technology), James Glass (Massachusetts Institute of Technology)
Bandwidth Embeddings for Mixed-bandwidth Speech Recognition
Poster; 1330–1530
Gautam Mantena (Apple Inc.), Ozlem Kalinli (Apple Inc), Ossama Abdel-Hamid (Apple Inc), Don McAllaster (Apple Inc)
Adversarial Black-Box Attacks on Automatic Speech Recognition Systems Using Multi-Objective Evolutionary Optimization
Poster; 1330–1530
Shreya Khare (IBM Research), Rahul Aralikatte (University of Copenhagen), Senthil Mani (IBM Research)
Towards Debugging Deep Neural Networks by Generating Speech Utterances
Poster; 1330–1530
Bilal Soomro (University of Eastern Finland), Anssi Kanervisto (University of Eastern Finland), Trung Ngo Trong (University of Eastern Finland), Ville Hautamaki (University of Eastern Finland)
Compression of CTC-Trained Acoustic Models by Dynamic Frame-Wise Distillation Or Segment-Wise N-Best Hypotheses Imitation
Poster; 1330–1530
Haisong Ding (University of Science and Technology of China), Kai Chen (Microsoft Research Asia), Qiang Huo (Microsoft Research Asia)
Keyword Spotting for Hearing Assistive Devices Robust to External Speakers
Poster; 1330–1530
Iván López-Espejo (Aalborg University), Zheng-Hua Tan (Aalborg University), Jesper Jensen (Oticon A/S and Aalborg University)
Latent Dirichlet Allocation based Acoustic Data Selection for Automatic Speech Recognition
Poster; 1330–1530
Mortaza (Morrie) Doulaty (Microsoft), Thomas Hain (University of Sheffield)
Target speaker recovery and recognition network with average x-vector and global training
Poster; 1330–1530
Li Wenjie (The Institute of Acoustics of the Chinese Academy of Sciences), Pengyuan Zhang (Institute of Acoustics, Chinese Academy of Sciences), Yonghong Yan (Institute of Acoustics, Chinese Academy of Sciences)
Lyrics recognition from singing voice focused on correspondence between voice and notes
Poster; 1330–1530
Motoyuki Suzuki (Faculty of Information Science and Technology, Osaka Institute of Technology), Sho Tomita (Faculty of Information Science and Technology, Osaka Institute of Technology), Tomoki Morita (Faculty of Information Science and Technology, Osaka Institute of Technology)

Emotion Modeling and Analysis[Wed-P-7-C]
Wednesday, 18 September, Gallery C

Cross-corpus speech emotion recognition using semi-supervised transfer non-negative matrix factorization with adaptation regularization
Poster; 1330–1530
Hui Luo (Harbin institute of technology), Jiqing Han (Harbin Institute of Technology)
Does the Lombard Effect Improve Emotional Communication in Noise? – Analysis of Emotional Speech Acted in Noise –
Poster; 1330–1530
Yi Zhao (National Institute of Informatics (NII)), Atsushi Ando (NTT Corporation), Shinji Takaki (National Institute of Informatics), Junichi Yamagishi (National Institute of Informatics), Satoshi Kobashikawa (NTT Corporation)
Linear Discriminant Differential Evolution for Feature Selection in Emotional Speech Recognition
Poster; 1330–1530
Soumaya Gharsellaoui (Université de Moncton Campus de Shippagan), Sid Ahmed Selouani (Université de Moncton Campus de Shippagan), Mohammed Sidi Yakoub (Université de Moncton Campus de Shippagan)
Multi-modal learning for Speech Emotion Recognition : An Analysis and comparison of ASR outputs with ground truth transcription
Poster; 1330–1530
Saurabh Sahu (University of Maryland College Park), Vikramjit Mitra (Apple Inc.), Nadee Seneviratne (University of Maryland, College Park), Carol Espy-Wilson (University of Maryland at College Park)
Modeling user context for valence prediction from narratives
Poster; 1330–1530
Aniruddha Tammewar (University Of Trento), Alessandra Cervone (University of Trento), Eva-Maria Rathner (University of Ulm), Giuseppe Riccardi (University of Trento)
Front-end Feature Compensation and Denoising for Noise Robust Speech Emotion Recognition
Poster; 1330–1530
Rupayan Chakraborty (TCS Innovation Labs-Mumbai), Ashish Panda (Innovation Labs, Tata Consultancy Services), Meghna Pandharipande (TCS), Sonal Joshi (Tata Consultancy Services), Sunil Kumar Kopparapu (TCS Research and Innovation - Mumbai)
The Contribution of Acoustic Features Analysis to Model Emotion Perceptual Process for Language Diversity
Poster; 1330–1530
Xingfeng Li (Japan Advanced Institute of Science and Technology), Masato Akagi (Japan Advanced Institute of Science and Technology)
Design and Development of a Multi-lingual Speech Corpora (TaMaR-EmoDB) for Emotion Analysis
Poster; 1330–1530
Rajeev Rajan (College of Engineering, Trivandrum), Haritha U.G (College of Engineering, Trivandrum), Sujitha A.C (College of Engineering, Trivandrum), Rajisha T.M (College of Applied Sciences, Malappuram)
Speech Emotion Recognition with a Reject Option
Poster; 1330–1530
Kusha Sridhar (The University of Texas at Dallas), Carlos Busso (The University of Texas at Dallas)
Development of Emotion Rankers Based on Intended and Perceived Emotion Labels
Poster; 1330–1530
Zhenghao Jin (New York Institute of Technology), Houwei Cao (New York Institute of Technology)
Emotion Recognition from Natural Phone Conversations in Individuals With and Without Recent Suicidal Ideation
Poster; 1330–1530
John Gideon (University of Michigan), Heather Schatten (Warren Alpert Medical School, Brown University; Butler Hospital), Melvin McInnis (University of Michigan), Emily Mower Provost (University of Michigan)
An acoustic and lexical analysis of emotional valence in spontaneous speech: Autobiographical memory recall in older adults
Poster; 1330–1530
Deniece Nazareth (University of Twente), Ellen Tournier (Radboud University Nijmegen), Sarah Leimkötter (University of Twente), Esther Janse (Centre for Language Studies, Radboud University Nijmegen), Dirk Heylen (University of Twente), Gerben Westerhof (University of Twente), Khiet Truong (University of Twente)

Articulatory Phonetics[Wed-P-7-D]
Wednesday, 18 September, Hall 10/D

Articulatory characteristics of secondary palatalization in Romanian fricatives
Poster; 1330–1530
Laura Spinu (City University of New York / Kingsborough), Maida Percival (University of Toronto), Alexei Kochetov (University of Toronto)
Articulation of vowel length contrasts in Australian English
Poster; 1330–1530
Louise Ratko (Macquarie University), Michael Proctor (Macquarie University), Felicity Cox (Macquarie University)
V-to-V coarticulation induced acoustic and articulatory variability of vowels: The effect of pitch-accent
Poster; 1330–1530
Andrea Deme (Eötvös Loránd University & MTA-ELTE Lendület Lingual Articulation Research Group), Márton Bartók (Eötvös Loránd University & MTA-ELTE Lendület Lingual Articulation Research Group), Tekla Etelka Graczi (Research Institute for Linguistics, Hungarian Academy of Sciences & MTA-ELTE "Momentum" Lingual Articulation Research Group), Tamás Gábor Csapó (Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics), Alexandra Markó (Eötvös Loránd University, MTA-ELTE Lendület Lingual Articulation Research Group)
The contribution of lip protrusion to Anglo-English /r/: Evidence from hyper- and non-hyperarticulated speech
Poster; 1330–1530
Hannah King (Université Paris Diderot), Emmanuel Ferragne (Université Paris Diderot)
Articulatory analysis of transparent vowel /iː/ in harmonic and antiharmonic Hungarian stems: Is there a difference?
Poster; 1330–1530
Alexandra Markó (Eötvös Loránd University, MTA-ELTE Lendület Lingual Articulation Research Group), Márton Bartók (Eötvös Loránd University, MTA-ELTE Lendület Lingual Articulation Research Group), Tamás Gábor Csapó (Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics), Tekla Etelka Graczi (Research Institute for Linguistics, Hungarian Academy of Sciences & MTA-ELTE "Momentum" Lingual Articulation Research Group), Andrea Deme (Eötvös Loránd University & MTA-ELTE Lendület Lingual Articulation Research Group)
On the role of oral configurations in European Portuguese nasal vowels
Poster; 1330–1530
Conceição Cunha (IPS, University of Munich), Samuel Silva (DETI/IEETA, Universidade de Aveiro), Catarina Oliveira (ESSUA/IEETA, Universidade de Aveiro), Paula Martins (ESSUA/IEETA, Universidade de Aveiro), António Teixeira (DETI/IEETA, Universidade de Aveiro), Arun Joseph (Max-Planck-Institut für biophysikalische Chemie), Jens Frahm (Max Planck Institute for Biophysical Chemistry)
Ejectivity and Place of Articulation in Tigrinya: An Ultra-Fast MRI Study
Poster; 1330–1530
Zainab Hermes (University of Chicago), Ryan Shosted (University of Illinois at Urbana-Champaign), Maojing Fu (University of Illinois at Urbana-Champaign), Sharon Rose (University of California, San Diego), Zhi-Pei Liang (University of Illinois at Urbana-Champaign), Brad Sutton (University of Illinois at Urbana-Champaign)

Speech and Audio Classification 2[Wed-P-7-E]
Wednesday, 18 September, Hall 10/E

Residual + Capsule Networks (ResCap) for Simultaneous Single-Channel Overlapped Keyword Recognition
Poster; 1330–1530
Yan Xiong (Arizona State University), Visar Berisha (Arizona State University), Chaitali Chakrabarti (Arizona State University)
Music Genre Classification using Duplicated Convolutional Layers in Neural Networks
Poster; 1330–1530
Hansi Yang (Tsinghua University), Wei-Qiang Zhang (Tsinghua University)
A Storyteller's tale: Audio books' Literary Genres classification - using CNN and RNN architectures
Poster; 1330–1530
Nehory Carmi (The Open University of Israel), Azaria Cohen (The Open University of Israel), Mireille Avigal (The Open University of Israel), Anat Lerner (The Open University of Israel)
A Study for Improving Device-Directed Speech Detection toward Frictionless Human-Machine Interaction
Poster; 1330–1530
Che-Wei Huang (Amazon), Roland Maas (Amazon.com), Sri Harish Mallidi (Amazon, USA), Bjorn Hoffmeister (Amazon.com)
Unsupervised Methods for Audio Classification from Lecture Discussion Recordings
Poster; 1330–1530
Hang Su (Chinese University of Hong Kong), Borislav Dzodzo (Chinese University of Hong Kong), Xixin Wu (The Chinese University of Hong Kong), Xunying Liu (Chinese University of Hong Kong), Helen Meng (The Chinese University of Hong Kong)
Neural Whispered Speech Detection with Imbalanced Learning
Poster; 1330–1530
Takanori Ashihara (NTT Corporation), Yusuke Shinohara (NTT Corporation), Hiroshi Sato (NTT Corporation), Takafumi Moriya (NTT Corporation), Kiyoaki Matsui (NTT Media Intelligence laboratories), Takaaki Fukutomi (NTT Corporation), Yoshikazu Yamaguchi (NTT Corporation), Yushi Aono (NTT Corporation)
Deep Learning for Orca Call Type Identification – A Fully Unsupervised Approach
Poster; 1330–1530
Christian Bergler (Friedrich-Alexander-University Erlangen-Nuremberg, Department of Computer Science, Pattern Recognition Lab), Manuel Schmitt (Friedrich-Alexander-University Erlangen-Nuremberg, Department of Computer Science, Pattern Recognition Lab), Rachael Xi Cheng (Leibniz Institute for Zoo and Wildlife Research (IZW) in the Forschungsverbund Berlin e.V.), Andreas Maier (University Erlangen-Nuremberg), Volker Barth (Anthro-Media), Elmar Nöth (Friedrich-Alexander-University Erlangen-Nuremberg)
Open-Vocabulary Keyword Spotting With Audio And Text Embeddings
Poster; 1330–1530
Niccolò Sacchi (EPFL), Alexandre Nanchen (Idiap Research Institute), Martin Jaggi (EPFL), Milos Cernak (Logitech Europe)
ToneNet: A CNN Model of Tone Classification of Mandarin Chinese
Poster; 1330–1530
Qiang Gao (Communication University of China), Shutao Sun (Communication University of China), Yaping Yang (Communication University of China)
Temporal Convolution for Real-time Keyword Spotting on Mobile Devices
Poster; 1330–1530
Seungwoo Choi (Hyperconnect), Seokjun Seo (Hyperconnect), Beomjun Shin (Hyperconnect), Hyeongmin Byun (Hyperconnect), Martin Kersner (Hyperconnect), Beomsu Kim (Hyperconnect), Dongyoung Kim (Hyperconnect), Sungjoo Ha (Hyperconnect)
Audio Tagging with Compact Feedforward Sequential Memory Network and Audio-to-Audio Ratio Based Data Augmentation
Poster; 1330–1530
Zhiying Huang (Alibaba Inc.), ShiLiang Zhang (Alibaba Group), Ming Lei (Alibaba Inc.)

Speech Annotation and Labelling[Wed-S&T-4]
Wednesday, 18 September, Hall 4

BAS Web Services for Automatic Subtitle Creation and Anonymization
Show&Tell; 1330–1530
Florian Schiel (Institute of Phonetics and Speech Processing, LMU Munich), Thomas Kisler
A User-Friendly and Adaptable Re-Implementation of an Acoustic Prominence Detection and Annotation Tool
Show&Tell; 1330–1530
Jana Voße (Phonetics and Phonology Work Group, Bielefeld University), Petra Wagner
PyToBI: a Toolkit for ToBI Labeling under Python
Show&Tell; 1330–1530
Monica Dominguez (Universitat Pompeu Fabra, Barcelona), Patrick Louis Rohrer, Juan Soler-Company
GECKO - A Tool for Effective Annotation of Human Conversations
Show&Tell; 1330–1530
Golan Levy (Gong.io), Raquel Sitman, Ido Amir, Eduard Goldstein, Ran Mochary, Eilon Reshef, Roi Reichardt, Omri Allouche
SLP-AA: Tools for Sign Language Phonetic and Phonological Research
Show&Tell; 1330–1530
Roger Yu-Hsiang Lo (Department of Linguistics University of British Columbia, Vancouver, BC), Kathleen Currie Hall
SANTLR: Speech Annotation Toolkit for Low Resource Languages
Show&Tell; 1330–1530
Xinjian Li (Language Technologies Institute, Carnegie Mellon University; Pittsburgh, PA), Zhong Zhou, Siddharth Dalmia, Alan W. Black, Florian Metze

The VOiCES from a Distance Challenge – O[Wed-SS-7-3]
Wednesday, 18 September, Hall 2 [More info]

The VOiCES from a Distance Challenge 2019
Oral; 1330–1340
Mahesh Kumar Nandwana (SRI International), Julien van Hout (SRI International), Colleen Richey (SRI International), Mitchell McLaren (SRI International), Maria Alejandra Barrios (In-Q-Tel), Aaron Lawson (SRI International)
STC Speaker Recognition Systems for The VOiCES From a Distance Challenge
Oral; 1340–1345
Sergey Novoselov (ITMO University, Speech Technology Center), Aleksei Gusev (ITMO University, Speech Technology Center), Artem Ivanov (ITMO University, Speech Technology Center), Timur Pekhovsky (ITMO University, Speech Technology Center Ltd), Andrey Shulipa (ITMO University), Galina Lavrentyeva (ITMO University, Speech Technology Center), Vladimir Volokhov (Speech Technology Center), Alexandr Kozlov (Speech Technology Center Ltd.)
Analysis of BUT Submission in Far-Field Scenarios of VOiCES 2019 Challenge
Oral; 1345–1350
Pavel Matejka (Brno University Of Technology), Oldrich Plchot (Brno University of Technology), Hossein Zeinali (Sharif University of Technology), Ladislav Mošner (Brno University of Technology, Faculty of Information Technology), Anna Silnova (Brno University of Technology), Lukas Burget (Brno University of Technology), Ondřej Novotný (Brno University of Technology), Ondrej Glembek (Brno University of Technology)
The STC ASR System for the VOiCES from a Distance Challenge 2019
Oral; 1350–1355
Ivan Medennikov (STC-innovations Ltd), Yuri Khokhlov (STC-innovations Ltd), Aleksei Romanenko (ITMO University), Ivan Sorokin (STC), Anton Mitrofanov (STC-innovations Ltd), Vladimir Bataev (Speech Technology Center Ltd), Andrei Andrusenko (STC-innovations Ltd), Tatiana Prisyach (STC-innovations Ltd), Mariya Korenevskaya (STC-innovations Ltd), Oleg Petrov (ITMO University), Alexander Zatvornitskiy (Speech Technology Center)
The I2R's ASR System for the VOiCES from a Distance Challenge 2019
Oral; 1355–1400
Tze Yuang Chong (Institute for Infocomm Research), Kye Min Tan (Institute for Infocomm Research), Kah Kuan Teh (A*STAR Singapore), Changhuai You (Institute for Infocomm Research), Hanwu Sun (Institute for Infocomm Research), Tran-Huy Dat (Institute for Infocomm Research)

The VOiCES from a Distance Challenge – P[Wed-SS-7-A]
Wednesday, 18 September, Gallery A [More info]

The VOiCES from a Distance Challenge 2019
Poster; 1400–1530
Mahesh Kumar Nandwana (SRI International), Julien van Hout (SRI International), Colleen Richey (SRI International), Mitchell McLaren (SRI International), Maria Alejandra Barrios (In-Q-Tel), Aaron Lawson (SRI International)
The LeVoice Far-field Speech Recognition System for VOiCES from a Distance Challenge 2019
Poster; 1400–1530
Yulong Liang (Lenovo Research), Lin Yang (Lenovo Research), Xuyang Wang (Lenovo Research), Yingjie Li (Lenovo Research), Chen Jia (Lenovo Research), Junjie Wang (Lenovo Research)
The JHU ASR System for VOiCES from a Distance Challenge 2019
Poster; 1400–1530
Yiming Wang (Johns Hopkins University), David Snyder (The Johns Hopkins University), Hainan Xu (Johns Hopkins University), Vimal Manohar (Johns Hopkins University), Phani Sankar Nidadavolu (Johns Hopkins University), Dan Povey (Johns Hopkins University), Sanjeev Khudanpur (Johns Hopkins University)
The DKU System for the Speaker Recognition Task of the 2019 VOiCES from a Distance Challenge
Poster; 1400–1530
Danwei Cai (Duke Kunshan University), Xiaoyi Qin (Sun Yat-sen University, Guangzhou, China), Weicheng Cai (Sun Yat-sen University), Ming Li (Duke Kunshan University)
STC Speaker Recognition Systems for The VOiCES From a Distance Challenge
Poster; 1400–1530
Sergey Novoselov (ITMO University, Speech Technology Center), Aleksei Gusev (ITMO University, Speech Technology Center), Artem Ivanov (ITMO University, Speech Technology Center), Timur Pekhovsky (ITMO University, Speech Technology Center Ltd), Andrey Shulipa (ITMO University), Galina Lavrentyeva (ITMO University, Speech Technology Center), Vladimir Volokhov (Speech Technology Center), Alexandr Kozlov (Speech Technology Center Ltd.)
Analysis of BUT Submission in Far-Field Scenarios of VOiCES 2019 Challenge
Poster; 1400–1530
Pavel Matejka (Brno University Of Technology), Oldrich Plchot (Brno University of Technology), Hossein Zeinali (Sharif University of Technology), Ladislav Mošner (Brno University of Technology, Faculty of Information Technology), Anna Silnova (Brno University of Technology), Lukas Burget (Brno University of Technology), Ondřej Novotný (Brno University of Technology), Ondrej Glembek (Brno University of Technology)
The STC ASR System for the VOiCES from a Distance Challenge 2019
Poster; 1400–1530
Ivan Medennikov (STC-innovations Ltd), Yuri Khokhlov (STC-innovations Ltd), Aleksei Romanenko (ITMO University), Ivan Sorokin (STC), Anton Mitrofanov (STC-innovations Ltd), Vladimir Bataev (Speech Technology Center Ltd), Andrei Andrusenko (STC-innovations Ltd), Tatiana Prisyach (STC-innovations Ltd), Mariya Korenevskaya (STC-innovations Ltd), Oleg Petrov (ITMO University), Alexander Zatvornitskiy (Speech Technology Center)
The I2R's ASR System for the VOiCES from a Distance Challenge 2019
Poster; 1400–1530
Tze Yuang Chong (Institute for Infocomm Research), Kye Min Tan (Institute for Infocomm Research), Kah Kuan Teh (A*STAR Singapore), Changhuai You (Institute for Infocomm Research), Hanwu Sun (Institute for Infocomm Research), Tran-Huy Dat (Institute for Infocomm Research)
Multi-task Discriminative Training of Hybrid DNN-TVM Model for Speaker Verification with Noisy and Far-Field Speech
Poster; 1400–1530
Arindam Jati (University of Southern California), Raghuveer Peri (University of Southern California), Monisankha Pal (University of Southern California), Tae Jin Park (University of Southern California), Naveen Kumar (Disney Research), Ruchir Travadi (University of Southern California), Panayiotis Georgiou (Univ. Southern California), Shrikanth Narayanan (University of Southern California)
The JHU Speaker Recognition System for the VOiCES 2019 Challenge
Poster; 1400–1530
David Snyder (The Johns Hopkins University), Jesus Villalba (Johns Hopkins University), Nanxin Chen (Johns Hopkins University), Dan Povey (Johns Hopkins University), Gregory Sell (Johns Hopkins University), Najim Dehak (Johns Hopkins University), Sanjeev Khudanpur (Johns Hopkins University)
Intel Far-field Speaker Recognition System for VOiCES Challenge 2019
Poster; 1400–1530
Jonathan Huang (Intel Corp.), Tobias Bocklet (Intel Corp)
The I2R Submission to VOiCES Distance Speaker Recognition Challenge 2019
Poster; 1400–1530
Hanwu Sun (A*STAR Singapore), Kah Kuan Teh (A*STAR Singapore), Ivan Kukanov (A*STAR Singapore), Huy Dat Tran (A*STAR Singapore)

Coffee break in both exhibition foyers, lower and upper level 1[Wed-B-3]
Wednesday, 18 September, Foyer

Coffee break in both exhibition foyers, lower and upper level 1
Break; 1530–1600

Multimodal ASR[Wed-O-8-1]
Wednesday, 18 September, Main Hall

Survey Talk
Survey Talk: Multimodal Processing of Speech and Language [More info]
Survey Talk; 1600–1640
Florian Metze (Carnegie Mellon University, Pittsburgh, PA)
MobiVSR: Efficient and Light-weight Neural Network for Visual Speech Recognition on Mobile Devices
Oral; 1640–1700
Nilay Shrivastava (NSIT), Astitwa Saxena (NSIT), Yaman Kumar (Adobe), Rajiv Shah (IIIT Delhi), Amanda Stent (Bloomberg), Debanjan Mahata (Bloomberg), Preeti Kaur (NSUT), Roger Zimmermann (NUS)
Speaker Adaptation for Lip-reading Using Visual Identity Vectors
Oral; 1700–1720
Pujitha Appan Kandala (Samsung R&D Institute India, Bengaluru), Abhinav Thanda (Samsung R&D Institute India, Bengaluru), Dilip Kumar Margam (Samsung R&D Institute, Bangalore), Rohith Aralikatti (Samsung Research India Bangalore), Tanay Sharma (Samsung R&D Institute India, Bengaluru), Sharad Roy (Samsung R&D Institute India, Bengaluru), Shankar M Venkatesan (Samsung R&D Institute India, Bengaluru)
MobiLipNet: Resource-efficient deep learning based lipreading
Oral; 1720–1740
Alexandros Koumparoulis (University of Thessaly), Gerasimos Potamianos (University of Thessaly)
LipSound: Neural Mel-spectrogram Reconstruction for Lip Reading
Oral; 1740–1800
Leyuan Qu (University of Hamburg), Cornelius Weber (University of Hamburg), Stefan Wermter (University of Hamburg)

ASR Neural Network Architectures - 2[Wed-O-8-2]
Wednesday, 18 September, Hall 1

Two-Pass End-to-End Speech Recognition
Oral; 1600–1620
Ruoming Pang (Google Inc.), Tara Sainath (Google), David Rybach (Google), Yanzhang He (Google Inc.), Rohit Prabhavalkar (Google), Wei Li (Google Inc.), Mirko Visontai (Google Inc.), Qiao Liang (Google), Trevor Strohman (Google), Yonghui Wu (Google), Ian McGraw (Google), Chung-Cheng Chiu (Google)
Extract, Adapt and Recognize: An End-to-end Neural Network for Corrupted Monaural Speech Recognition
Oral; 1620–1640
Max W. Y. Lam (The Chinese University of Hong Kong), Jun Wang (Tencent AI Lab), Xunying Liu (Chinese University of Hong Kong), Helen Meng (The Chinese University of Hong Kong), Dan Su (Tencent AILab Shenzhen), Dong Yu (Tencent AI Lab)
Multi-task multi-resolution char-to-BPE cross-attention decoder for end-to-end speech recognition
Oral; 1640–1700
Dhananjaya Gowda (Samsung Research), Abhinav Garg (Samsung Research), Kwangyoun Kim (Samsung Research), Mehul Kumar (Samsung Research), Chanwoo Kim (Samsung Research)
Multi-Stride Self-Attention for Speech Recognition
Oral; 1700–1720
Kyu Han (JD AI Research), Jing Huang (JD AI Research), Yun Tang (JD AI Research), Xiaodong He (JD AI Research), Bowen Zhou (JD AI Research)
LF-MMI Training of Bayesian and Gaussian Process Time Delay Neural Networks for Speech Recognition
Oral; 1720–1740
Shoukang Hu (The Chinese University of Hong Kong), Xurong Xie (Chinese University of Hong Kong), Shansong Liu (The Chinese University of Hong Kong), Max W. Y. Lam (The Chinese University of Hong Kong), Jianwei Yu (The Chinese University of Hong Kong), Xixin Wu (The Chinese University of Hong Kong), Xunying Liu (Chinese University of Hong Kong), Helen Meng (The Chinese University of Hong Kong)
Self-Teaching Networks
Oral; 1740–1800
Liang Lu (Microsoft), Eric Sun (Microsoft), Yifan Gong (Microsoft Corp)

Training Strategy for Speech Emotion Recognition[Wed-O-8-3]
Wednesday, 18 September, Hall 2

Improved End-to-End Speech Emotion Recognition Using Self Attention Mechanism and Multitask Learning
Oral; 1600–1620
Yuanchao Li (Honda R&D Co., Ltd), Tianyu Zhao (Kyoto University), Tatsuya Kawahara (Kyoto University)
Continuous Emotion Recognition in Speech – Do We Need Recurrence?
Oral; 1620–1640
Maximilian Schmitt (ZD.B Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg), Nicholas Cummins (University of Augsburg), Björn Schuller (University of Augsburg / Imperial College London)
Speech Based Emotion Prediction: Can a Linear Model Work?
Oral; 1640–1700
Anda Ouyang (University of New South Wales), Ting Dang (UNSW), Vidhyasaharan Sethu (The University of New South Wales), Eliathamby Ambikairajah (The University of New South Wales, Sydney)
Speech Emotion Recognition based on Multi-Label Emotion Existence Model
Oral; 1700–1720
Atsushi Ando (NTT Corporation), Ryo Masumura (NTT Corporation), Hosana Kamiyama (NTT Corporation), Satoshi Kobashikawa (NTT Corporation), Yushi Aono (NTT Corporation)
Gender de-biasing in speech emotion recognition
Oral; 1720–1740
Cristina Gorrostieta (Cogito Corp), Reza Lotfian (Cogito Corp), Kye Taylor (Cogito Corp), Richard Brutti (Cogito Corp), John Kane (Cogito Corporation)
CycleGAN-based Emotion Style Transfer as Data Augmentation for Speech Emotion Recognition
Oral; 1740–1800
Fang Bao (University of Stuttgart), Michael Neumann (University of Stuttgart), Ngoc Thang Vu (University of Stuttgart)

Voice Conversion for Style, Accent, and Emotion[Wed-O-8-4]
Wednesday, 18 September, Hall 11

Lombard Speech Synthesis using Transfer Learning in a Tacotron Text-to-Speech System
Oral; 1600–1620
Bajibabu Bollepalli (Aalto University), Lauri Juvela (Aalto University), Paavo Alku (Aalto University)
Augmented CycleGANs for continuous scale normal-to-Lombard speaking style conversion
Oral; 1620–1640
Shreyas Seshadri (Department of Signal Processing and Acoustics, Aalto University), Lauri Juvela (Aalto University), Paavo Alku (Aalto University), Okko Räsänen (Tampere University)
Foreign Accent Conversion by Synthesizing Speech from Phonetic Posteriorgrams
Oral; 1640–1700
Guanlong Zhao (Texas A&M University), Shaojin Ding (Texas A&M University), Ricardo Gutierrez-Osuna (Texas A&M University)
A Multi-Speaker Emotion Morphing Model Using Highway Networks and Maximum Likelihood Objective
Oral; 1700–1720
Ravi Shankar (Johns Hopkins University), Jacob Sager (Johns Hopkins University), Archana Venkataraman (Johns Hopkins University)
Effects of waveform PMF on anti-spoofing detection
Oral; 1720–1740
Itshak Lapidot (Afeka Tel-Aviv College of Engineering, ACLP), Jean-Francois Bonastre (University of Avignon/LIA)

Speaker Recognition II[Wed-O-8-5]
Wednesday, 18 September, Hall 12

Self-supervised speaker embeddings
Oral; 1600–1620
Themos Stafylakis (Omilia - Conversational Intelligence), Johan Rohdin (Brno University of Technology), Oldrich Plchot (Brno University of Technology), Petr Mizera (Czech Technical University in Prague), Lukas Burget (Brno University of Technology)
Privacy-Preserving Speaker Recognition with Cohort Score Normalisation
Oral; 1620–1640
Andreas Nautsch (EURECOM), Jose Patino (EURECOM), Amos Treiber (TU Darmstadt), Themos Stafylakis (Omilia), Petr Mizera (Omilia), Massimiliano Todisco (EURECOM - School of Engineering & Research Center - Digital Security Department), Thomas Schneider (TU Darmstadt), Nicholas Evans (EURECOM)
Large Margin Softmax Loss for Speaker Verification
Oral; 1640–1700
Yi Liu (Tsinghua National Laboratory for Information Science and Technology, Department of Electronic Engineering, Tsinghua University), Liang He (Tsinghua National Laboratory for Information Science and Technology, Department of Electronic Engineering, Tsinghua University), Jia Liu (Tsinghua National Laboratory for Information Science and Technology, Department of Electronic Engineering, Tsinghua University)
A Deep Neural Network for Short-Segment Speaker Recognition
Oral; 1700–1720
Amirhossein Hajavi (Queen's University), Ali Etemad (Queen's University & Ingenuity Labs)
Deep Speaker Embedding Extraction with Channel-Wise Feature Responses and Additive Supervision Softmax Loss Function
Oral; 1720–1740
Jianfeng Zhou (Xiamen University), Tao Jiang (Xiamen University), Zheng Li (Xiamen University), Lin Li (Xiamen University), Qingyang Hong (Xiamen University)
VoiceID Loss: Speech Enhancement for Speaker Verification
Oral; 1740–1800
Suwon Shon (Massachusetts Institute of Technology), Hao Tang (Massachusetts Institute of Technology), James Glass (Massachusetts Institute of Technology)

Speech Coding and Evaluation[Wed-P-8-A]
Wednesday, 18 September, Gallery A

Parameter enhancement for MELP speech codec in noisy communication environment
Poster; 1600–1800
Min-Jae Hwang (Yonsei University), Hong-Goo Kang (Yonsei University)
Extending the E-Model Towards Super-wideband and Fullband Speech Communication Scenarios
Poster; 1600–1800
Sebastian Möller (Quality and Usability Lab, TU Berlin), Gabriel Mittag (Technische Universität Berlin), Thilo Michael (Quality and Usability Lab, Technische Universität Berlin), Vincent Barriac (Orange), Hitoshi Aoki (Nippon Telegraph and Telephone Corporation)
Cascaded Cross-Module Residual Learning towards Lightweight End-to-End Speech Coding
Poster; 1600–1800
Kai Zhen (Indiana University, School of Informatics, Computing, and Engineering), Jongmo Sung (Electronics and Telecommunications Research Institute), Mi Suk Lee (Electronics and Telecommunications Research Institute), Seungkwon Beack (Electronics and Telecommunications Research Institute), Minje Kim (Indiana University, School of Informatics, Computing, and Engineering)
End-to-end Optimization of Source Models for Speech and Audio Coding Using a Machine Learning Framework
Poster; 1600–1800
Tom Bäckström (Aalto University)
A Real-Time Wideband Neural Vocoder at 1.6 kb/s Using LPCNet
Poster; 1600–1800
Jean-Marc Valin (Mozilla), Jan Skoglund (Google LLC)
Super-Wideband Spectral Envelope Modeling for Speech Coding
Poster; 1600–1800
Guillaume Fuchs (Fraunhofer IIS), Chamran Ashour (Ericsson), Tom Bäckström (Aalto University)
Speech Audio Super-Resolution For Speech Recognition
Poster; 1600–1800
Xinyu Li (Amazon), Venkata Chebiyyam (Amazon), Katrin Kirchhoff (University of Washington)
Artificial Bandwidth Extension using H∞ Optimization
Poster; 1600–1800
Deepika Gupta (IITG), Hanumant Shekhawat (IITG)
Quality Degradation Diagnosis for Voice Networks: Estimating the Perceived Noisiness, Coloration, and Discontinuity of Transmitted Speech
Poster; 1600–1800
Gabriel Mittag (Technische Universität Berlin), Sebastian Möller (Quality and Usability Lab, TU Berlin)
A Cross-entropy-guided (CEG) Measure for Speech Enhancement Front-end Assessing Performances of Back-end Automatic Speech Recognition
Poster; 1600–1800
Li Chai (University of Science and Technology of China), Jun Du (University of Science and Technology of China), Chin-Hui Lee (Georgia Institute of Technology)

Feature extraction for ASR[Wed-P-8-B]
Wednesday, 18 September, Gallery B

Modulation Vectors as Robust Feature Representation for ASR in Domain Mismatched Conditions
Poster; 1600–1800
Samik Sadhu (Johns Hopkins University), Hynek Hermansky (JHU)
Prosody Usage Optimization for Children Speech Recognition with Zero Resource Children Speech
Poster; 1600–1800
Chenda Li (Shanghai Jiao Tong University), Yanmin Qian (Shanghai Jiao Tong University)
Unsupervised Raw Waveform Representation Learning for ASR
Poster; 1600–1800
Purvi Agrawal (Indian Institute of Science, Bangalore), Sriram Ganapathy (Indian Institute of Science, Bangalore)
Low-Dimensional Bottleneck Features for On-Device Continuous Speech Recognition
Poster; 1600–1800
David Ramsay (MIT Media Laboratory), Kevin Kilgour (Google AI), Dominik Roblek (Google AI), Matthew Sharif (Google AI)
Binary Speech Features for Keyword Spotting Tasks
Poster; 1600–1800
Alexandre Riviello (Polytechnique Montreal), Jean-Pierre David (Polytechnique Montreal)
wav2vec: Unsupervised Pre-training for Speech Recognition
Poster; 1600–1800
Steffen Schneider (Facebook AI Research), Alexei Baevski (Facebook AI Research), Ronan Collobert (Facebook AI Research), Michael Auli (Facebook AI Research)
Automatic Detection of Prosodic Focus in American English
Poster; 1600–1800
Sunghye Cho (Linguistic Data Consortium, University of Pennsylvania), Mark Liberman (University of Pennsylvania), Yong-cheol Lee (Cheongju University)
Feature exploration for almost zero-resource ASR-free keyword spotting using a multilingual bottleneck extractor and correspondence autoencoders
Poster; 1600–1800
Raghav Menon (Stellenbosch University), Herman Kamper (Stellenbosch University), Ewald van der Westhuizen (Stellenbosch University), John Quinn (School of Informatics, University of Edinburgh), Thomas Niesler (Stellenbosch University)
On Learning Interpretable CNNs with Parametric Modulated Kernel-based Filters
Poster; 1600–1800
Erfan Loweimi (The University of Sheffield), Peter Bell (The University of Edinburgh), Steve Renals (The University of Edinburgh)

Lexicon and Language Model for Speech Recognition[Wed-P-8-C]
Wednesday, 18 September, Gallery C

Reverse Transfer Learning: Can Word Embeddings Trained for Different NLP Tasks Improve Neural Language Models?
Poster; 1600–1800
Lyan Verwimp (ESAT - KU Leuven), Jerome Bellegarda (Apple Inc.)
Unified Verbalization for Speech Recognition & Synthesis Across Languages
Poster; 1600–1800
Sandy Ritchie (Google), Richard Sproat, Kyle Gorman (The Graduate Center, City University of New York), Daan van Esch (Google), Christian Schallhart (Google), Nikos Bampounis (Google), Benoit Brard (Google), Jonas Mortensen (Google), Amelia Holt (Google), Eoin Mahon (Google)
Better morphology prediction for better speech systems
Poster; 1600–1800
Dravyansh Sharma (Carnegie Mellon University), Melissa Wilson (Google LLC), Antoine Bruguier (Google LLC)
Joint Grapheme and Phoneme Embeddings for Contextual End-to-End ASR
Poster; 1600–1800
Zhehuai Chen (Shanghai Jiao Tong University), Mahaveer Jain (Facebook), Yongqiang Wang (Facebook), Michael Seltzer (Facebook), Christian Fuegen (Facebook)
Character-Aware Sub-word Level Language Modelling for Uyghur and Turkish ASR
Poster; 1600–1800
Chang Liu (Institute of Acoustics, Chinese Academy of Sciences), Zhen Zhang (National Computer Network Emergency Response Technical Team), Pengyuan Zhang (Institute of Acoustics, Chinese Academy of Sciences), Yonghong Yan (Institute of Acoustics, Chinese Academy of Sciences)
Connecting and Comparing Language Model Interpolation Techniques
Poster; 1600–1800
Ernest Pusateri (Apple Inc.), Christophe Van Gysel (Apple Inc.), Rami Botros (Apple Inc.), Sameer Badaskar (Apple Inc.), Mirko Hannemann (Apple Inc.), Youssef Oualil (Apple Inc.), Ilya Oparin (Apple Inc.)
Enriching Rare Word Representations in Neural Language Models by Embedding Matrix Augmentation
Poster; 1600–1800
Yerbolat Khassanov (Nanyang Technological University), Zhiping Zeng (Temasek Laboratories, Nanyang Technological University), Van Tung Pham (Nanyang Technological University), Haihua Xu (Temasek Laboratories @ NTU, Singapore), Eng Siong Chng (Nanyang Technological University)
Comparative Study of Parametric and Representation Uncertainty Modeling for Recurrent Neural Network Language Models
Poster; 1600–1800
Jianwei Yu (The Chinese University of Hong Kong), Max W. Y. Lam (The Chinese University of Hong Kong), Shoukang Hu (The Chinese University of Hong Kong), Xixin Wu (The Chinese University of Hong Kong), Xu Li (The Chinese University of Hong Kong), Yuewen Cao (The Chinese University of Hong Kong), Xunying Liu (The Chinese University of Hong Kong), Helen Meng (The Chinese University of Hong Kong)
Improving automatically induced lexicons for highly agglutinating languages using data-driven morphological segmentation
Poster; 1600–1800
Wiehan Agenbag (Stellenbosch University), Thomas Niesler (University of Stellenbosch)
Attention-based word vector prediction with LSTMs and its application to the OOV problem in ASR
Poster; 1600–1800
Alejandro Coucheiro-Limeres (Speech Technology Group, Universidad Politécnica de Madrid), Fernando Fernández-Martínez (Speech Technology Group, Universidad Politécnica de Madrid), Rubén San-Segundo (Speech Technology Group, Universidad Politécnica de Madrid), Javier Ferreiros-López (Speech Technology Group, Universidad Politécnica de Madrid)
Code-Switching Sentence Generation by Bert and Generative Adversarial Networks
Poster; 1600–1800
Yingying Gao (China Mobile Research), Junlan Feng (China Mobile Research), Ying Liu (China Mobile Research), Leijing Hou (China Mobile Research), Xin Pan (China Mobile Research), Yong Ma (China Mobile Research)

First and second language acquisition[Wed-P-8-D]
Wednesday, 18 September, Hall 10/D

Vietnamese learners tackling the German /ʃt/ in perception
Poster; 1600–1800
Anke Sennema (University of Vienna), Silke Hamann (University of Amsterdam)
Nasal consonant discrimination in infant- and adult-directed speech
Poster; 1600–1800
Bogdan Ludusan (Bielefeld University), Annett Jorschick (Bielefeld University), Reiko Mazuka (RIKEN Center for Brain Science)
No distributional learning in adults from attended listening to non-speech
Poster; 1600–1800
Ellen Marklund (Stockholm Babylab, Phonetics Laboratory, Department of Linguistics, Stockholm University), Johan Sjons (Department of Linguistics, Stockholm University), Lisa Gustavsson (Stockholm Babylab, Phonetics Laboratory, Department of Linguistics, Stockholm University), Elísabet Eir Cortes (Department of Linguistics, Stockholm University)
A computational model of early language acquisition from audiovisual experiences of young infants
Poster; 1600–1800
Okko Räsänen (Tampere University), Khazar Khorrami (Tampere University)
The Production of Chinese Affricates /ts/ and /tsʰ/ by Native Urdu Speakers
Poster; 1600–1800
Dan Du (Beijing Language and Culture University), Jinsong Zhang (Beijing Language and Culture University)
An articulatory-acoustic investigation into GOOSE-fronting in German-English bilinguals residing in London, UK
Poster; 1600–1800
Scott Lewis (Queen Mary University of London), Adib Mehrabi (Queen Mary University of London), Esther de Leeuw (Queen Mary University of London)
Multimodal Articulation-Based Pronunciation Error Detection with Spectrogram and Acoustic Features
Poster; 1600–1800
Sabrina Jenne (Institute for Natural Language Processing, University of Stuttgart), Ngoc Thang Vu (University of Stuttgart)
Using Prosody to Discover Word Order Alternations in a Novel Language
Poster; 1600–1800
Anouschka Foltz (University of Graz), Sarah Cooper (Bangor University), Tamsin McKelvey (Bangor University)
Speaking rate, information density, and information rate in first-language and second-language speech
Poster; 1600–1800
Ann Bradlow (Northwestern University)
Articulation rate as a metric in spoken language assessment
Poster; 1600–1800
Calbert Graham (University of Cambridge), Francis Nolan (University of Cambridge)
Learning Alignment for Multimodal Emotion Recognition from Speech
Poster; 1600–1800
Haiyang Xu (AI Labs, Didi Chuxing), Hui Zhang (AI Labs, Didi Chuxing), Kun Han (AI Labs, Didi Chuxing), Yun Wang (AI Labs, Didi Chuxing), Yiping Peng (AI Labs, Didi Chuxing), Xiangang Li (AI Labs, Didi Chuxing)
Liquid deletion in French child-directed speech
Poster; 1600–1800
Sharon Peperkamp (CNRS), Monica Hegde (CNRS), Maria Julia Carbajal (CNRS)
Towards detection of canonical babbling by citizen scientists: Performance as a function of clip length
Poster; 1600–1800
Amanda Seidl (Purdue University), Anne Warlaumont (University of California, Los Angeles), Alejandrina Cristia (Laboratoire de Sciences Cognitives et Psycholinguistique (ENS, EHESS, CNRS), Département d'Etudes Cognitives, Ecole Normale Supérieure, PSL Research University)

Speech and Audio Classification 3[Wed-P-8-E]
Wednesday, 18 September, Hall 10/E

Multi-stream Network With Temporal Attention For Environmental Sound Classification
Poster; 1600–1800
Xinyu Li (Amazon), Venkata Chebiyyam (Amazon), Katrin Kirchhoff (University of Washington)
Few-Shot Audio Classification with Attentional Graph Neural Networks
Poster; 1600–1800
Shilei Zhang (IBM Research - China), Yong Qin (IBM Research - China), Kewei Sun (IBM Research - China), Yonghua Lin (IBM Research - China)
Semi-supervised Audio Classification with Consistency-Based Regularization
Poster; 1600–1800
Kangkang Lu (Agency for Science, Technology and Research), Chuan Sheng Foo (Agency for Science, Technology and Research), Kah Kuan Teh (Agency for Science, Technology and Research), Huy Dat Tran (Agency for Science, Technology and Research), Vijay Ramaseshan Chandrasekhar (Agency for Science, Technology and Research)
Neural Network Distillation on IoT Platforms for Sound Event Detection
Poster; 1600–1800
Gianmarco Cerutti (Fondazione Bruno Kessler), Rahul Prasad (Fondazione Bruno Kessler), Alessio Brutti (Fondazione Bruno Kessler), Elisabetta Farella (Fondazione Bruno Kessler)
Class-wise Centroid Distance Metric Learning for Acoustic Event Detection
Poster; 1600–1800
Xugang Lu (NICT), Peng Shen (NICT), Sheng Li (NICT), Yu Tsao (Academia Sinica), Hisashi Kawai (NICT)
A Hybrid Approach to Acoustic Scene Classification Based on Universal Acoustic Models
Poster; 1600–1800
Xue Bai (University of Science and Technology of China), Jun Du (University of Science and Technology of China), Zi-Rui Wang, Chin-Hui Lee (Georgia Institute of Technology)
Hierarchical Pooling Structure for Weakly Labeled Sound Event Detection
Poster; 1600–1800
Kexin He (Tsinghua University), Yuhan Shen (Tsinghua University), Wei-Qiang Zhang (Tsinghua University)
Sound Event Detection in Multichannel Audio Using Convolutional Time-Frequency-Channel Squeeze and Excitation
Poster; 1600–1800
Wei Xia (University of Texas at Dallas), Kazuhito Koishida (Microsoft)
A Robust Framework For Acoustic Scene Classification
Poster; 1600–1800
Lam Pham (University of Kent), Ian McLoughlin (The University of Kent, School of Computing, Medway), Huy Phan (University of Kent), Ramaswamy Palaniappan (University of Kent)
Compression of Acoustic Event Detection Models With Quantized Distillation
Poster; 1600–1800
Bowen Shi (Toyota Technological Institute at Chicago), Ming Sun (Amazon), Chieh-Chi Kao (Amazon), Viktor Rozgic (Amazon), Spyros Matsoukas (Amazon), Chao Wang (Amazon)
An End-to-End Audio Classification System based on Raw Waveforms and Mix-Training Strategy
Poster; 1600–1800
Jiaxu Chen (Hikvision Research Institute), Jing Hao (Hikvision Research Institute), Kai Chen (Hikvision Research Institute), Di Xie (Hikvision Research Institute), Shicai Yang (Hikvision Research Institute), Shiliang Pu (Hikvision Research Institute)

Speech Synthesis [Wed-S&T-5]
Wednesday, 18 September, Hall 4

Web-Based Speech Synthesis Editor
Show&Tell; 1600–1800
Martin Gruber (New Technologies for the Information Society (NTIS), Faculty of Applied Sciences, University of West Bohemia), Jakub Vit, Jindrich Matousek
GFM-Voc: A real-time voice quality modification system
Show&Tell; 1600–1800
Olivier Perrotin (Univ. Grenoble Alpes, CNRS, Grenoble INP, GIPSA-lab), Ian McLoughlin
Off the cuff: Exploring extemporaneous speech delivery with TTS
Show&Tell; 1600–1800
Éva Székely (Division of Speech, Music and Hearing, KTH Royal Institute of Technology, Stockholm), Gustav Eje Henter, Jonas Beskow, Joakim Gustafson
Synthesized Spoken Names: Biases Impacting Perception
Show&Tell; 1600–1800
Lucas Kessler (Rochester Institute of Technology), Cecilia Ovesdotter Alm, Reynold Bailey
Unbabel Talk - Human Verified Translations for Voice Instant Messaging
Show&Tell; 1600–1800
Luís Bernardo (Unbabel, Lisboa), Mathieu Giquel, Sebastiao Quintas, Paulo Dimas, Helena Moniz, Isabel Trancoso
Adjusting Pleasure-Arousal-Dominance for Continuous Emotional Text-to-speech Synthesizer
Show&Tell; 1600–1800
Azam Rabiee (KAIST Institute for Artificial Intelligence, Korea Advanced Institute of Science and Technology, Daejeon), Tae-Ho Kim, Soo-Young Lee

Voice quality characterization for clinical voice assessment: Voice production, acoustics, and auditory perception[Wed-SS-8-6]
Wednesday, 18 September, Hall 3 [More info]

Identifying distinctive acoustic and spectral features in Parkinson’s disease
Oral; 1700–1715
Yermiyahu Hauptman (Afeka Tel-Aviv College of Engineering, ACLP), Ruth Aloni-Lavi (Afeka Tel-Aviv College of Engineering, ACLP), Itshak Lapidot (Afeka Tel-Aviv College of Engineering, ACLP), Tanya Gurevich (Movement Disorders Unit, Dept of Neurology, Tel-Aviv Sourasky Medical Center), Yael Manor (Movement Disorders Unit, Dept of Neurology, Tel-Aviv Sourasky Medical Center), Stav Naor (Movement Disorders Unit, Dept of Neurology, Tel-Aviv Sourasky Medical Center, Israel), Noa Diamant (Movement Disorders Unit, Dept of Neurology, Tel-Aviv Sourasky Medical Center), Irit Opher (Afeka Tel-Aviv College of Engineering, ACLP)
Aerodynamics and lumped-masses combined with delay lines for modeling vertical and anterior-posterior phase differences in pathological vocal fold vibration
Oral; 1615–1630
Carlo Drioli (University of Udine, Department of Mathematics and Computer Science), Philipp Aichinger (Medical University of Vienna, Department of Otorhinolaryngology, Division of Phoniatrics-Logopedics)
Mel-cepstral Coefficients of Voice Source Waveforms for Classification of Phonation Types in Speech
Oral; 1630–1645
Sudarsana Reddy Kadiri (Aalto University), Paavo Alku (Aalto University)
Automatic detection of ASD in children using acoustic and text features from brief natural conversations
Oral; 1645–1700
Sunghye Cho (Linguistic Data Consortium, University of Pennsylvania), Mark Liberman (University of Pennsylvania), Neville Ryant (Linguistic Data Consortium), Meredith Cola (Children's Hospital of Philadelphia), Robert T. Schultz (Children's Hospital of Philadelphia), Julia Parish-Morris (Children's Hospital of Philadelphia)
Analysis and Synthesis of Vocal Flutter and Vocal Jitter
Oral; 1600–1615
Jean Schoentgen (Université Libre de Bruxelles), Philipp Aichinger (Medical University of Vienna, Division of Phoniatrics-Logopedics, Department of Otorhinolaryngology, Vienna)
Reliability of clinical voice parameters captured with smartphones: measurements of added noise and spectral tilt
Oral; 1715–1730
Felix Schaeffler (Queen Margaret University), Stephen Jannetts (Queen Margaret University), Janet Beck (Queen Margaret University)
Say what? A dataset for exploring the error patterns that two ASR engines make
Oral; 1730–1745
Meredith Moore (Arizona State University), Michael Saxon (Arizona State University), Hemanth Venkateswara (Arizona State University), Visar Berisha (Arizona State University), Sethuraman Panchanathan (Arizona State University)
Discussion
Oral; 1745–1800

Interspeech Soirée [Wed-S-5]
Wednesday, 18 September, Stefaniensaal

Interspeech Soirée
Social; 1900–2400

Speaker Check-in[Thu-C]
Thursday, 19 September, Room 8

Speaker Check-in
Check-In; 0800–1200

Registration[Thu-R]
Thursday, 19 September, Foyer

Registration
Registration; 0800–1700

Keynote 4: Mirella Lapata[Thu-K-4]
Thursday, 19 September, Main Hall

Keynote
Learning natural language interfaces with neural models [More info]
Keynote; 0830–0930
Mirella Lapata (University of Edinburgh)

Coffee break in both exhibition foyers, lower and upper level 1[Thu-B-1]
Thursday, 19 September, Foyer

Coffee break in both exhibition foyers, lower and upper level 1
Break; 0930–1030

Speech Synthesis: Articulatory and Physical Approaches[Thu-O-9-1]
Thursday, 19 September, Main Hall

Survey Talk
Survey Talk: Realistic Physics-Based Computational Voice Production [More info]
Survey Talk; 1000–1040
Oriol Guasch (La Salle, Universitat Ramon Llull)
An extended two-dimensional vocal tract model for fast acoustic simulation of single-axis symmetric three-dimensional tubes
Oral; 1040–1100
Debasish Mohapatra (University of British Columbia), Victor Zappi (University of British Columbia), Sidney Fels (University of British Columbia)
Perceptual optimization of an enhanced geometric vocal fold model for articulatory speech synthesis
Oral; 1100–1120
Peter Birkholz (Institute of Acoustics and Speech Communication, TU Dresden), Susanne Drechsel (Department of Speech Science and Phonetics, Martin Luther University of Halle-Wittenberg), Simon Stone (Technische Universität Dresden)
Articulatory Copy Synthesis Based on A Genetic Algorithm
Oral; 1120–1140
Yingming Gao (Institute of Acoustics and Speech Communication, Technische Universität Dresden), Simon Stone (Technische Universität Dresden), Peter Birkholz (Institute of Acoustics and Speech Communication, TU Dresden)
A phonetic-level analysis of different input features for articulatory inversion
Oral; 1140–1200
Abdolreza Sabzi Shahrebabaki (Norwegian University of Science and Technology), Negar Olfati (Norwegian University of Science and Technology), Ali Shariq Imran (NTNU), Sabato Marco Siniscalchi (University of Enna Kore), Torbjørn Svendsen (Norwegian University of Science and Technology)

Sequence-to-sequence Speech Recognition[Thu-O-9-2]
Thursday, 19 September, Hall 1

Advancing sequence-to-sequence based speech recognition
Oral; 1000–1020
Zoltán Tüske (IBM Research), Kartik Audhkhasi (IBM Research), George Saon (IBM)
Sequence-to-Sequence Speech Recognition with Time-Depth Separable Convolutions
Oral; 1020–1040
Awni Hannun (Stanford University), Ann Lee (Facebook AI Research), Qiantong Xu (Facebook AI Research), Ronan Collobert (Facebook AI Research)
Semi-supervised Sequence-to-sequence ASR using Unpaired Speech and Text
Oral; 1040–1100
Murali Karthick Baskar (Brno University of Technology), Shinji Watanabe (Johns Hopkins University), Ramón Astudillo (INESC-ID/L2F), Takaaki Hori (Mitsubishi Electric Research Laboratories), Lukas Burget (Brno University of Technology), Jan Černocký (Brno University of Technology)
Learn Spelling from Teachers: Transferring Knowledge from Language Models to Sequence-to-Sequence Speech Recognition
Oral; 1100–1120
Ye Bai (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, China), Jiangyan Yi (Institute of Automation, Chinese Academy of Sciences), Jianhua Tao (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, China), Zhengkun Tian (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, China), Zhengqi Wen (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences)
On the Choice of Modeling Unit for Sequence-to-Sequence Speech Recognition
Oral; 1120–1140
Kazuki Irie (RWTH Aachen University), Rohit Prabhavalkar (Google), Anjuli Kannan (Google Brain), Antoine Bruguier (Google), David Rybach (Google), Patrick Nguyen (Google)
Listen, Attend, Spell and Adapt: Speaker Adapted Sequence-to-Sequence ASR
Oral; 1140–1200
Felix Weninger (Nuance Communications), Jesús Andrés-Ferrer (Nuance Communications), Xinwei Li (Nuance Communications), Puming Zhan (Nuance Communications)

Search Methods for Speech Recognition[Thu-O-9-3]
Thursday, 19 September, Hall 2

Lattice re-scoring during manual editing for automatic error correction of ASR transcripts
Oral; 1000–1020
Anna Runarsdottir (Reykjavik University), Inga Rún Helgadóttir (Reykjavik University), Jon Gudnason (Reykjavik University)
GPU-based WFST Decoding with Extra Large Language Model
Oral; 1020–1040
Daisuke Fukunaga (Sony Corporation), Yoshiki Tanaka (Sony Corporation), Yuichi Kageyama (Sony Corporation)
Real-time One-pass Decoder for Speech Recognition Using LSTM Language Models
Oral; 1040–1100
Javier Jorge (Machine Learning and Language Processing), Adrià Giménez Pastor (Machine Learning and Language Processing), Javier Iranzo Sánchez (Machine Learning and Language Processing), Jorge Civera Saiz (Machine Learning and Language Processing), Albert Sanchis Navarro (Machine Learning and Language Processing), Alfons Juan Císcar (Machine Learning and Language Processing)
Vectorized Beam Search for CTC-Attention-based Speech Recognition
Oral; 1100–1120
Hiroshi Seki (Toyohashi University of Technology), Takaaki Hori (Mitsubishi Electric Research Laboratories), Shinji Watanabe (Johns Hopkins University), Niko Moritz (MERL), Jonathan Le Roux (Mitsubishi Electric Research Laboratories)
Contextual Recovery of Out-of-Lattice Named Entities in Automatic Speech Recognition
Oral; 1120–1140
Jack Serrino (MIT CSAIL), Leonid Velikovich (Google Inc.), Petar Aleksic (Google Inc), Cyril Allauzen (Google Research)
Sequence-to-Sequence Learning via Attention Transfer for Incremental Speech Recognition
Oral; 1140–1200
Sashi Novitasari (Nara Institute of Science and Technology), Andros Tjandra (Nara Institute of Science and Technology), Sakriani Sakti (Nara Institute of Science and Technology (NAIST) / RIKEN AIP), Satoshi Nakamura (Nara Institute of Science and Technology)

Audio Signal Characterization[Thu-O-9-4]
Thursday, 19 September, Hall 11

Unsupervised Representation Learning with Future Observation Prediction for Speech Emotion Recognition
Oral; 1000–1020
Zheng Lian (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China), Jianhua Tao (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China), Bin Liu (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China), Jian Huang (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China)
Spatio-Temporal Attention Pooling for Audio Scene Classification
Oral; 1020–1040
Huy Phan (University of Kent), Oliver Y. Chén (University of Oxford), Lam Pham (University of Kent), Philipp Koch (University of Lübeck), Maarten De Vos (University of Oxford), Ian McLoughlin (University of Kent), Alfred Mertins (University of Lübeck)
Subspace Pooling Based Temporal Features Extraction For Audio Event Recognition
Oral; 1040–1100
Qiuying Shi (Harbin Institute of Technology), Hui Luo (Harbin institute of technology), Jiqing Han (Harbin Institute of Technology)
Multi-Scale Time-Frequency Attention for Rare Sound Event Detection
Oral; 1100–1120
Jingyang Zhang (Tsinghua University), Wenhao Ding (Tsinghua University), Jintao Kang (Ministry of Public Security), Liang He (Tsinghua University)
Acoustic Scene Classification by Implicitly Identifying Distinct Sound Events
Oral; 1120–1140
Hongwei Song (Harbin Institute of Technology), Jiqing Han (Harbin Institute of Technology), Shiwen Deng (Harbin Normal University), Zhihao Du (Harbin Institute of Technology)
Parameter-Transfer Learning for Low-Resource Individualization of Head-Related Transfer Functions
Oral; 1140–1200
Xiaoke Qi (China University of Political Science and Law), Lu Wang (Shenzhen University)

Speech and Voice Disorders I[Thu-O-9-5]
Thursday, 19 September, Hall 12

Prosodic characteristics of Mandarin declarative and interrogative utterances in Parkinson’s disease
Oral; 1000–1020
Study of the performance of automatic speech recognition systems in speakers with Parkinson's Disease
Oral; 1020–1040
Laureano Moro Velazquez (Johns Hopkins University), Jaejin Cho (Johns Hopkins University), Shinji Watanabe (Johns Hopkins University), Mark Hasegawa-Johnson (University of Illinois), Odette Scharenborg (Multimedia Computing, Delft University of Technology), Heejin Kim (University of Illinois), Najim Dehak (Johns Hopkins University)
Towards the Speech Features of Mild Cognitive Impairment: Universal Evidence from Structured and Unstructured Connected Speech of Chinese
Oral; 1040–1100
Tianqi Wang (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences), Chongyuan Lian (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences), Jingshen Pan (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences), Quanlei Yan (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences), Feiqi Zhu (The Third Affiliated Hospital of Shenzhen University), Manwa Ng (University of Hong Kong), Lan Wang (SIAT), Nan Yan (Shenzhen Institutes of Advanced Technology)
Child Speech Disorder Detection with Siamese Recurrent Network using Speech Attribute Features
Oral; 1100–1120
Jiarui Wang (The Chinese University of Hong Kong), Ying Qin (The Chinese University of Hong Kong), Zhiyuan Peng (The Chinese University of Hong Kong), Tan Lee (The Chinese University of Hong Kong)
Interpretable Deep Learning Model for the Detection and Reconstruction of Dysarthric Speech
Oral; 1120–1140
Daniel Korzekwa (Amazon), Roberto Barra-Chicote (Amazon), Bozena Kostek (Gdansk University of Technology), Thomas Drugman (Amazon), Mateusz Lajszczak (Amazon)
Vocal Biomarker Assessment Following Pediatric Traumatic Brain Injury: A Retrospective Cohort Study
Oral; 1140–1200
Camille Noufi (Stanford University), Adam Lammert (MIT Lincoln Laboratory), James Williamson (MIT Lincoln Laboratory), Daryush Mehta (Massachusetts General Hospital), Gregory Ciccarelli (MIT Lincoln Laboratory), Douglas Sturim (MIT), Jordan Green (MGH IHP), Thomas Campbell (The University of Texas at Dallas), Thomas Quatieri (MIT Lincoln Laboratory)

Speaker and Language Recognition II[Thu-P-9-A]
Thursday, 19 September, Gallery A

Adversarial Regularization for End-to-end Robust Speaker Verification
Poster; 1000–1200
Qing Wang (Northwestern Polytechnical University), Pengcheng Guo (Northwestern Polytechnical University), Sining Sun (Northwestern Polytechnical University), Lei Xie (Northwestern Polytechnical University), John H.L. Hansen (Univ. of Texas at Dallas; CRSS - Center for Robust Speech Systems)
Blind bandwidth extension with a non-linear function and its evaluation on x-vector-based speaker verification
Poster; 1000–1200
Ryota Kaminishi (Tokyo Metropolitan University), Haruna Miyamoto (Tokyo Metropolitan University), Sayaka Shiota (Tokyo Metropolitan University), Hitoshi Kiya (Tokyo Metropolitan University)
Auto-Encoding Nearest Neighbor i-vectors for Speaker Verification
Poster; 1000–1200
Umair Khan (Universitat Politècnica de Catalunya), Miquel India (Universitat Politecnica de Catalunya), Javier Hernando (Universitat Politecnica de Catalunya)
Towards A Fault-tolerant Speaker Verification System: A Regularization Approach To Reduce The Condition Number
Poster; 1000–1200
Siqi Zheng (Alibaba), Gang Liu (Alibaba Group), Hongbin Suo (Alibaba, Inc.), Yun Lei (Alibaba Group)
Deep Learning Based Multi-Channel Speaker Recognition in Noisy and Reverberant Environments
Poster; 1000–1200
Hassan Taherian (The Ohio State University), Zhong-Qiu Wang (The Ohio State University), DeLiang Wang (Ohio State University)
Joint optimization of neural acoustic beamforming and dereverberation with x-vectors for robust speaker verification
Poster; 1000–1200
Joon-Young Yang (Hanyang University), Joon-Hyuk Chang (Hanyang University)
A new time-frequency attention mechanism for TDNN and CNN-LSTM-TDNN - with application to language identification
Poster; 1000–1200
Xiaoxiao Miao (Institute of Acoustics, Chinese Academy of Sciences), Ian McLoughlin (School of Computing, The University of Kent, Medway, UK), Yonghong Yan (Institute of Acoustics, Chinese Academy of Sciences)
Combining Speaker Recognition and Metric Learning for Speaker-Dependent Representation Learning
Poster; 1000–1200
Joao Monteiro (INRS-EMT), Md Jahangir Alam (ETS/CRIM), Tiago Falk (INRS-EMT, University of Quebec)
VAE-based regularization for deep speaker embedding
Poster; 1000–1200
Yang Zhang (Beijing University of Posts and Telecommunications), Lantian Li (Tsinghua University), Dong Wang (Tsinghua University)
Language Recognition using Triplet Neural Networks
Poster; 1000–1200
Victoria Mingote (University of Zaragoza), Diego Castan (SRI International), Mitchell McLaren (SRI International), Mahesh Kumar Nandwana (SRI International), Alfonso Ortega (University of Zaragoza), Eduardo Lleida Solano (University of Zaragoza), Antonio Miguel (ViVoLAB, Aragon Institute for Engineering Research (I3A), University of Zaragoza, Spain)
Spatial Pyramid Encoding with Convex Length Normalization for Text-Independent Speaker Verification
Poster; 1000–1200
Youngmoon Jung (KAIST), Younggwan Kim (LG electronics), Hyungjun Lim (Korea Advanced Institute of Science and Technology (KAIST)), Yeunju Choi (KAIST), Hoirin Kim (KAIST)
End-to-end losses based on speaker basis vectors and all-speaker hard negative mining for speaker verification
Poster; 1000–1200
Hee-Soo Heo (School of Computer Science, University of Seoul, Korea), Jee-weon Jung (University of Seoul), IL-Ho Yang (University of Seoul), Sung-Hyun Yoon (University of Seoul), Hye-jin Shim (University of Seoul), Ha-Jin Yu (University of Seoul)
An Effective Deep Embedding Learning Architecture for Speaker Verification
Poster; 1000–1200
Yiheng Jiang (University of Science and Technology of China), Yan Song (University of Science and Technology of China), Ian McLoughlin (The University of Kent, School of Computing, Medway), Zhifu Gao (University of Science and Technology of China), Lirong Dai (University of Science and Technology of China)
Far-Field End-to-End Text-Dependent Speaker Verification based on Mixed Training Data with Transfer Learning and Enrollment Data Augmentation
Poster; 1000–1200
Xiaoyi Qin (Sun Yat-sen University, Guangzhou, China), Danwei Cai (Duke Kunshan University), Ming Li (Duke Kunshan University)
Two-stage Training for Chinese Dialect Recognition
Poster; 1000–1200
Zongze Ren (Shanghai Institute for Advanced Communication and Data Science), Guofu Yang (Shanghai Institute for Advanced Communication and Data Science), Shugong Xu (Shanghai Institute for Advanced Communication and Data Science)

Medical applications and visual ASR[Thu-P-9-B]
Thursday, 19 September, Gallery B

An Attention-Based Hybrid Network for Automatic Detection of Alzheimer’s Disease from Narrative Speech
Poster; 1000–1200
Jun Chen (University of Michigan), Ji Zhu (University of Michigan), Jieping Ye (University of Michigan)
On the Use of Pitch Features for Disordered Speech Recognition
Poster; 1000–1200
Shansong Liu (The Chinese University of Hong Kong), Shoukang Hu (The Chinese University of Hong Kong), Xunying Liu (Chinese University of Hong Kong), Helen Meng (The Chinese University of Hong Kong)
Large-Scale Visual Speech Recognition
Poster; 1000–1200
Brendan Shillingford (DeepMind), Yannis Assael (DeepMind), Matthew Hoffman (DeepMind), Thomas Paine (DeepMind), Cían Hughes (DeepMind), Utsav Prabhu (Google), Hank Liao (Google Inc.), Hasim Sak (Google), Kanishka Rao (Google), Lorrayne Bennett (DeepMind), Marie Mulville (Google), Misha Denil (DeepMind), Ben Coppin (DeepMind), Ben Laurie (Google), Andrew Senior (Google Inc.), Nando de Freitas (DeepMind)
Investigating the Lombard Effect Influence on End-to-End Audio-Visual Speech Recognition
Poster; 1000–1200
Pingchuan Ma (Imperial College London), Stavros Petridis (Imperial College London / Samsung AI Centre), Maja Pantic (Imperial College London / Samsung AI Centre)
"Computer - test my hearing": Accurate speech audiometry with smart speakers
Poster; 1000–1200
Jasper Ooster (Medical Physics and Cluster of Excellence Hearing4all, Carl-von-Ossietzky University Oldenburg), Pia Nancy Porysek Moreta (Medical Physics and Cluster of Excellence Hearing4all, Carl-von-Ossietzky University Oldenburg), Jörg-Hendrik Bach (HörTech gGmbH & Hearing4all), Inga Holube (Institute of Hearing Technology and Audiology, Jade University of Applied Sciences, Oldenburg), Bernd T. Meyer (Medical Physics and Cluster of Excellence Hearing4all, Universität Oldenburg)
Synchronising audio and ultrasound by learning cross-modal embeddings
Poster; 1000–1200
Aciel Eshky (University of Edinburgh), Manuel Sam Ribeiro (The University of Edinburgh), Korin Richmond (Informatics, University of Edinburgh), Steve Renals (University of Edinburgh)
Automatic Hierarchical Attention Neural Network for Detecting Alzheimer’s Disease
Poster; 1000–1200
Yilin Pan (University of Sheffield), Heidi Christensen (University of Sheffield), Bahman Mirheidari (Department of Computer Science, University of Sheffield), Annalena Venneri (Sheffield Institute for Translational Neuroscience), Daniel Blackburn (Sheffield Institute for Translational Neuroscience), Markus Reuber (Academic Neurology Unit, University of Sheffield)
Deep sensing of breathing signal during conversational speech
Poster; 1000–1200
Venkata Srikanth Nallanthighal (Philips Research, Eindhoven and Radboud University, Nijmegen), Aki Härmä (Philips Research, Eindhoven)
Parrotron: An End-to-End Speech-to-Speech Conversion Model and its Applications to Hearing-Impaired Speech and Speech Separation
Poster; 1000–1200
Fadi Biadsy (Google Inc.), Ron Weiss (Google Brain), Pedro Moreno (Google Inc.), Dimitri Kanevsky (Google), Ye Jia (Google)
Exploiting Visual Features using Bayesian Gated Neural Networks for Disordered Speech Recognition
Poster; 1000–1200
Shansong Liu (The Chinese University of Hong Kong), Shoukang Hu (The Chinese University of Hong Kong), Yi Wang (University of Cambridge), Jianwei Yu (The Chinese University of Hong Kong), Rongfeng Su (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences), Xunying Liu (Chinese University of Hong Kong), Helen Meng (The Chinese University of Hong Kong)
Video-Driven Speech Reconstruction using Generative Adversarial Networks
Poster; 1000–1200
Konstantinos Vougioukas (Imperial College London), Pingchuan Ma (Imperial College London), Stavros Petridis (Imperial College London / Samsung AI Centre), Maja Pantic (Imperial College London)

Turn management in dialogue[Thu-P-9-C]
Thursday, 19 September, Gallery C

Investigating Linguistic and Semantic Features for Turn-Taking Prediction in Open-Domain Human-Computer Conversation
Poster; 1000–1200
S. Zahra Razavi (University of Rochester), Benjamin Kane (University of Rochester), Lenhart K. Schubert (University of Rochester)
Follow-Up Question Generation using Neural Tensor Network-based Domain Ontology Population in an Interview Coaching System
Poster; 1000–1200
Ming-Hsiang Su (National Cheng Kung University), Chung-Hsien Wu (National Cheng Kung University), Yi Chang (National Cheng Kung University)
Benchmarking benchmarks: introducing new automatic indicators for benchmarking Spoken Language Understanding corpora
Poster; 1000–1200
Frederic Bechet (Aix Marseille Universite - LIS/CNRS), Christian Raymond (INSA de Rennes - IRISA)
A Neural Turn-taking Model without RNN
Poster; 1000–1200
Chaoran Liu (ATR), Carlos Ishi (ATR Hiroshi Ishiguro Labs.), Hiroshi Ishiguro (ATR Hiroshi Ishiguro Labs)
An Incremental Turn-Taking Model For Task-Oriented Dialog Systems
Poster; 1000–1200
Andrei Catalin Coman (University of Trento), Koichiro Yoshino (Nara Institute of Science and Technology), Yukitoshi Murase (Nara Institute of Science and Technology), Satoshi Nakamura (Nara Institute of Science and Technology), Giuseppe Riccardi (University of Trento)
Personalized Dialogue Response Generation Learned from Monologues
Poster; 1000–1200
Feng-Guang Su (National Taiwan University), Aliyah Hsu (National Taiwan University), Yi-Lin Tuan (National Taiwan University), Hung-yi Lee (National Taiwan University (NTU))
Voice quality as a turn-taking cue
Poster; 1000–1200
Mattias Heldner (Dept Linguistics, Stockholm University), Marcin Wlodarczak (Stockholm University), Štefan Beňuš (Constantine the Philosopher University in Nitra, Institute of Informatics, SAS, Bratislava), Agustin Gravano (Universidad de Buenos Aires)
Turn-taking Prediction Based on Detection of Transition Relevance Place
Poster; 1000–1200
Kohei Hara (Kyoto University), Koji Inoue (Kyoto University), Katsuya Takanashi (Kyoto University), Tatsuya Kawahara (Kyoto University)
Analysis of effect and timing of fillers in natural turn-taking
Poster; 1000–1200
Divesh Lala (Kyoto University), Shizuka Nakamura (Kyoto University), Tatsuya Kawahara (Kyoto University)
Multimodal Response Obligation Detection with Unsupervised Online Domain Adaptation
Poster; 1000–1200
Shota Horiguchi (Hitachi, Ltd.), Naoyuki Kanda (Hitachi, Ltd.), Kenji Nagamatsu (Hitachi, Ltd.)

Corpus annotation and evaluation[Thu-P-9-D]
Thursday, 19 September, Hall 10/D

On the Role of Style in Parsing Speech with Neural Models
Poster; 1000–1200
Trang Tran (University of Washington), Jiahong Yuan (Liulishuo), Yang Liu (LingoChamp), Mari Ostendorf (University of Washington)
Subword RNNLM Approximations for Out-Of-Vocabulary Keyword Search
Poster; 1000–1200
Mittul Singh (Aalto University), Sami Virpioja (University of Helsinki), Peter Smit (Aalto University), Mikko Kurimo (Aalto University)
Simultaneous Detection and Localization of a Wake-Up Word using Multi-Task Learning of the Duration and Endpoint
Poster; 1000–1200
Takashi Maekaku (Yahoo Japan corporation), Yusuke Kida (Yahoo Japan Corporation), Akihiko Sugiyama (Yahoo Japan Corporation)
On the Contributions of Visual and Textual Supervision in Low-resource Semantic Speech Retrieval
Poster; 1000–1200
Ankita Pasad (Toyota Technological Institute at Chicago), Bowen Shi (Toyota Technological Institute at Chicago), Herman Kamper (Stellenbosch University), Karen Livescu (TTI-Chicago)
Automatic Detection of Off-topic Spoken Responses Using Very Deep Convolutional Neural Networks
Poster; 1000–1200
Xinhao Wang (Educational Testing Service), Su-Youn Yoon (Educational Testing Service), Keelan Evanini (Educational Testing Service), Klaus Zechner (ETS), Yao Qian (Educational Testing Service)
Rescoring Keyword Search Confidence Estimates with Graph-based Re-ranking Using Acoustic Word Embeddings
Poster; 1000–1200
Anna Piunova (RWTH Aachen University), Eugen Beck (RWTH Aachen University), Ralf Schlüter (Lehrstuhl Informatik 6, RWTH Aachen University), Hermann Ney (RWTH Aachen University)
SpeechYOLO: Detection and Localization of Speech Objects
Poster; 1000–1200
Yael Segal (Bar Ilan University), Tzeviya Sylvia Fuchs (Bar-Ilan University), Joseph Keshet (Bar-Ilan University)
Prosodic Phrase Alignment for Machine Dubbing
Poster; 1000–1200
Alp Öktem (Col·lectivaT SCCL), Mireia Farrús (Universitat Pompeu Fabra), Antonio Bonafonte (Universitat Politècnica de Catalunya)
Spot the pleasant people! Navigating the cocktail party buzz
Poster; 1000–1200
Christina Tånnander (Swedish Agency for Accessible Media), Per Fallgren (KTH Royal Institute of Technology), Jens Edlund (KTH Speech, Music and Hearing), Joakim Gustafson (KTH)
Neural Text Clustering with Document-level Attention based on Dynamic Soft Labels
Poster; 1000–1200
Zhi Chen (University of Science and Technology of China), Wu Guo (University of Science and Technology of China), Lirong Dai (University of Science and Technology of China), Zhen-Hua Ling (University of Science and Technology of China), Jun Du (University of Science and Technology of China)
Noisy BiLSTM-based Models for Disfluency Detection
Poster; 1000–1200
Nguyen Bach (Alibaba), Fei Huang (Alibaba)

Speech Enhancement: Multi-channel and Intelligibility[Thu-P-9-E]
Thursday, 19 September, Hall 10/E

On Mitigating Acoustic Feedback in Hearing Aids with Frequency Warping by All-Pass Networks
Poster; 1000–1200
Ching-Hua Lee (University of California, San Diego), Kuan-Lin Chen (University of California, San Diego), fred harris (University of California, San Diego), Bhaskar D. Rao (University of California, San Diego), Harinath Garudadri (University of California, San Diego)
Neural Spatial Filter: Target Speaker Speech Separation Assisted with Directional Information
Poster; 1000–1200
Rongzhi Gu (Peking University Shenzhen Graduate School), Lianwu Chen (Tencent AI Lab, Shenzhen), Shixiong Zhang (AI Lab, Tencent, Seattle), Jimeng Zheng (Tencent AI Lab Shenzhen), Meng Yu (Tencent AI Lab), Yong Xu (Tencent AI Lab), Dan Su (Tencent AILab Shenzhen), Yuexian Zou (Peking University Shenzhen Graduate School), Dong Yu (Tencent AI Lab)
My lips are concealed: Audio-visual speech enhancement through obstructions
Poster; 1000–1200
Triantafyllos Afouras (University of Oxford), Joon Son Chung (University of Oxford), Andrew Zisserman (University of Oxford)
Deep Multitask Acoustic Echo Cancellation
Poster; 1000–1200
Amin Fazel (Samsung Semiconductor Inc.), Mostafa El-Khamy (Samsung Semiconductor Inc.), Jungwon Lee (Samsung Semiconductor Inc.)
Deep Learning for Joint Acoustic Echo and Noise Cancellation with Nonlinear Distortions
Poster; 1000–1200
Hao Zhang (The Ohio State University, USA), Ke Tan (The Ohio State University, USA), DeLiang Wang (Ohio State University)
Harmonic beamformers for non-intrusive speech intelligibility prediction
Poster; 1000–1200
Charlotte Sørensen (Aalborg University/GN Hearing A/S), Jesper B. Boldt (GN Hearing A/S), Mads Christensen (Aalborg University)
Convolutional Neural Network-based Speech Enhancement for Cochlear Implant Recipients
Poster; 1000–1200
Nursadul Mamun (Cochlear Implant Processing Laboratory, Center for Robust Speech Systems (CRSS-CILab), Department of Electrical Computer Engineering, The University of Texas at Dallas), Soheil Khorram (University of Texas at Dallas), John H.L. Hansen (Cochlear Implant Processing Laboratory, Center for Robust Speech Systems (CRSS-CILab), Department of Electrical Computer Engineering, The University of Texas at Dallas)
Validation of the Non-Intrusive Codebook-based Short Time Objective Intelligibility Metric for Processed Speech
Poster; 1000–1200
Charlotte Sørensen (Aalborg University/GN Hearing A/S), Jesper B. Boldt (GN Hearing A/S), Mads Christensen (Aalborg University)
Predicting Speech Intelligibility of Enhanced Speech Using Phone Accuracy of DNN-based ASR Systems
Poster; 1000–1200
Kenichi Arai (NTT Communication Science Laboratories), Shoko Araki (NTT Communication Science Laboratories), Atsunori Ogawa (NTT Communication Science Laboratories), Keisuke Kinoshita (NTT), Tomohiro Nakatani (NTT Corporation), Katsuhiko Yamamoto (Wakayama University), Toshio Irino (Wakayama University)
A novel method to correct steering vectors in MVDR beamformer for noise robust ASR
Poster; 1000–1200
Suliang Bu (University of Missouri-Columbia), Yunxin Zhao (University of Missouri), Mei-Yuh Hwang (Mobvoi AI Lab, Redmond WA)
End-to-End Multi-Channel Speech Enhancement Using Inter-Channel Time-Restricted Attention on Raw Waveform
Poster; 1000–1200
Hyeonseung Lee (Seoul National University), Hyung Yong Kim (Seoul National University), Woo Hyun Kang (Department of Electrical and Computer Engineering and INMC, Seoul National University), Jeunghun Kim, Nam Soo Kim (Seoul National University)

Privacy in Speech and Audio Interfaces[Thu-SS-9-6]
Thursday, 19 September, Hall 3 [More info]

The GDPR & Speech Data: Reflections of Legal and Technology Communities - First Steps towards a Common Understanding
Oral; 1000–1020
Andreas Nautsch (EURECOM), Catherine Jasserand (University of Groningen), Els Kindt (KU Leuven), Massimiliano Todisco (EURECOM - School of Engineering & Research Center - Digital Security Department), Isabel Trancoso (INESC-ID / IST), Nicholas Evans (EURECOM)
Privacy-Preserving Adversarial Representation Learning in ASR: Reality or Illusion?
Oral; 1020–1040
Brij Mohan Lal Srivastava (Inria), Aurélien Bellet (INRIA), Marc Tommasi (Université de Lille), Emmanuel Vincent (Inria)
Privacy-preserving Siamese Feature Extraction for Gender Recognition Versus Speaker Identification
Oral; 1040–1100
Alexandru Nelus (Institute of Communication Acoustics, Ruhr-Universität Bochum), Silas Rech (Institute of Communication Acoustics, Ruhr-Universität Bochum), Timm Koppelmann (Institute of Communication Acoustics, Ruhr-Universität Bochum), Henrik Biermann (Institute of Communication Acoustics, Ruhr-Universität Bochum), Rainer Martin (Ruhr-Universität Bochum)
Privacy-preserving Variational Information Feature Extraction for Domestic Activity Monitoring Versus Speaker Identification
Oral; 1100–1120
Alexandru Nelus (Institute of Communication Acoustics, Ruhr-Universität Bochum), Janek Ebbers (Paderborn University), Reinhold Haeb-Umbach (Paderborn University), Rainer Martin (Ruhr-Universität Bochum)
Extracting Mel-Frequency and Bark-Frequency Cepstral Coefficients from Encrypted Signals
Oral; 1120–1140
Patricia Thaine (University of Toronto), Gerald Penn (University of Toronto)
Sound Privacy: A Conversational Speech Corpus for Quantifying the Experience of Privacy
Oral; 1140–1200
Pablo Pérez Zarazaga (Aalto University), Sneha Das (Aalto University), Tom Bäckström (Aalto University), Vishnu Vidyadhara Raju V (IIIT Hyderabad), Anil Kumar Vuppala (IIIT Hyderabad)

Lunch Break in lower foyer[Thu-B-1]
Thursday, 19 September, Foyer

Lunch Break in lower foyer
Break; 1200–1330

Neural Networks for Language Modeling[Thu-O-10-1]
Thursday, 19 September, Main Hall

Survey Talk
Survey Talk: Reaching over the gap: Cross- and interdisciplinary research on human and automatic speech processing [More info]
Survey Talk; 1330–1410
Odette Scharenborg (Delft University of Technology)
Improved Deep Duel Model for Rescoring N-best Speech Recognition List Using Backward LSTMLM and Ensemble Encoders
Oral; 1410–1430
Atsunori Ogawa (NTT Communication Science Laboratories), Marc Delcroix (NTT Communication Science Laboratories), Shigeki Karita (NTT Communication Science Laboratories), Tomohiro Nakatani (NTT Corporation)
Language Modeling with Deep Transformers
Oral; 1430–1450
Kazuki Irie (RWTH Aachen University), Albert Zeyer (Human Language Technology and Pattern Recognition Group (Chair of Computer Science 6), Computer Science Department, RWTH Aachen University), Ralf Schlüter (Lehrstuhl Informatik 6, RWTH Aachen University), Hermann Ney (RWTH Aachen University)
Scalable Multi Corpora Neural Language Models for ASR
Oral; 1450–1510
Anirudh Raju (Amazon), Denis Filimonov (Amazon), Gautam Tiwari (Amazon), Guitang Lan (Amazon), Ariya Rastrow (Amazon.com)
Who Needs Words? Lexicon-Free Speech Recognition
Oral; 1510–1530
Tatiana Likhomanenko (Facebook AI Research), Gabriel Synnaeve (Facebook AI Research), Ronan Collobert (Facebook AI Research)

Representation Learning of Emotion and Paralinguistics[Thu-O-10-2]
Thursday, 19 September, Hall 1

Direct Modelling of Speech Emotion from Raw Speech
Oral; 1330–1530
Siddique Latif (University of Southern Queensland Australia), Rajib Rana (University of Southern Queensland), Sara Khalifa (Distributed Sensing Systems Group, Data61, CSIRO), Raja Jurdak (Distributed Sensing Systems Group, Data61, CSIRO), Julien Epps (School of Electrical Engineering and Telecommunications, UNSW Australia)
Improving Emotion Identification using Phone Posteriors in Raw Speech Waveform based DNN
Oral; 1330–1530
Mousmita Sarma (Department of Electronics and Communication Engineering, Gauhati University), Pegah Ghahremani (Johns Hopkins University), Dan Povey (Johns Hopkins University), Nagendra Goel (GoVivace Inc.), Kandarpa Kumar Sarma (Department of Electronics and Communication Engineering, Gauhati University), Najim Dehak (Johns Hopkins University)
Pyramid Memory Block and Timestep Attention for Speech Emotion Recognition
Oral; 1330–1530
Miao Cao (Department of Computer Science, University of Science and Technology Beijing, Beijing, China), Chun Yang (Department of Computer Science, University of Science and Technology Beijing, Beijing, China), Fang Zhou (Department of Computer Science, University of Science and Technology Beijing, Beijing, China), Xu-cheng Yin (Department of Computer Science, University of Science and Technology Beijing, Beijing, China)
Robust Speech Emotion Recognition under Different Encoding Conditions
Oral; 1330–1530
Christopher Oates (audEERING GmbH), Andreas Triantafyllopoulos (audEERING GmbH), Ingmar Steiner (audEERING GmbH), Björn Schuller (audEERING GmbH)
Using the Bag-of-Audio-Word Feature Representation of ASR DNN Posteriors for Paralinguistic Classification
Oral; 1330–1530
Gábor Gosztolya (Research Group on Artificial Intelligence)
Disentangling Style Factors from Speaker Representations
Oral; 1330–1530
Jennifer Williams (University of Edinburgh), Simon King (University of Edinburgh)

World’s Languages and Varieties[Thu-O-10-3]
Thursday, 19 September, Hall 2

Sentence Prosody and Wh-indeterminates in Taiwan Mandarin
Oral; 1330–1530
Yu-yin Hsu (The Hong Kong Polytechnic University), Anqi Xu (University College London)
Frication as a Vowel Feature? – Evidence from the Rui’an Wu Chinese Dialect
Oral; 1330–1530
Fang Hu (Institute of Linguistics, Chinese Academy of Social Sciences)
Vowels and Diphthongs in the Xupu Xiang Chinese Dialect
Oral; 1330–1530
Zhenrui Zhang (University of Chinese Academy of Social Sciences), Fang Hu (Institute of Linguistics, Chinese Academy of Social Sciences)
Age-related changes in European Portuguese vowel acoustics
Oral; 1330–1530
Luciana Albuquerque (Institute of Electronics and Informatics Engineering of Aveiro (IEETA)/ Center for Health Technology and Services Research (CINTESIS.UA), University of Aveiro), Catarina Oliveira (Institute of Electronics and Informatics Engineering of Aveiro (IEETA)/ School of Health Science (ESSUA), University of Aveiro), António Teixeira (DETI/IEETA, University of Aveiro), Pedro Sa-Couto (Center for Research and Development in Mathematics and Applications (CIDMA)/ Department of Mathematics (DMAT), University of Aveiro), Daniela Figueiredo (Center for Health Technology and Services Research (CINTESIS.UA)/ School of Health Sciences (ESSUA), University of Aveiro)
Vowel-Tone Interaction in Two Tibeto-Burman Languages
Oral; 1330–1530
Wendy Lalhminghlui (Indian Institute of Technology Guwahati), Viyazonuo Terhiija (Indian Institute of Technology Guwahati), Priyankoo Sarmah (Indian Institute of Technology Guwahati)
The Vowel System of Korebaju
Oral; 1330–1530
Jenifer Vega Rodriguez (Université Sorbonne Nouvelle - Paris 3)

Adaptation and Accommodation in Conversation[Thu-O-10-4]
Thursday, 19 September, Hall 11

Fundamental frequency accommodation in multi-party human-robot game interactions: The effect of winning or losing
Oral; 1330–1350
Omnia Ibrahim (URPP Language and Space, University of Zurich), Gabriel Skantze (KTH Speech Music and Hearing), Sabine Stoll (University of Zurich), Volker Dellwo (Department of Computational Linguistics, University of Zurich)
Pitch Accent Trajectories across Different Conditions of Visibility and Information Structure – Evidence from Spontaneous Dyadic Interaction
Oral; 1350–1410
Petra Wagner (Universität Bielefeld), Nataliya Bryhadyr (Bielefeld University), Marin Schröer (Bielefeld University)
The greennn tree - lengthening position influences uncertainty perception
Oral; 1410–1430
Simon Betz (Bielefeld University), Sina Zarrieß (University of Bielefeld), Eva Szekely (KTH Royal Institute of Technology), Petra Wagner (Universität Bielefeld)
CNN-BLSTM Based Question Detection from Dialogs Considering Phase and Context Information
Oral; 1430–1450
Yuke Si (Tianjin University), Longbiao Wang (Tianjin University), Jianwu Dang (JAIST), Mengfei Wu (Tianjin University), Aijun Li (Institute of Linguistics, CASS)
Mirroring to Build Trust in Digital Assistants
Oral; 1450–1510
Katherine Metcalf (Apple, Inc.), Barry-John Theobald (Apple, Inc.), Garrett Weinberg (Apple, Inc.), Robert Lee (Apple, Inc.), Ing-Marie Jonsson (Apple, Inc.), Russ Webb (Apple, Inc.), Nicholas Apostoloff (Apple, Inc.)
Three's a Crowd? Effects of a Second Human on Vocal Accommodation with a Voice Assistant
Oral; 1510–1530
Eran Raveh (Saarland University), Ingo Siegert (Otto-von-Guericke University Magdeburg), Ingmar Steiner (audEERING GmbH), Iona Gessinger (Saarland University), Bernd Möbius (Saarland University)

Speaker Recognition III[Thu-P-10-A]
Thursday, 19 September, Gallery A

End-to-End Neural Speaker Diarization with Permutation-Free Objectives
Poster; 1330–1530
Yusuke Fujita (Hitachi, Ltd.), Naoyuki Kanda (Hitachi, Ltd.), Shota Horiguchi (Hitachi, Ltd.), Kenji Nagamatsu (Hitachi, Ltd.), Shinji Watanabe (Johns Hopkins University)
Mixup Learning Strategies for Text-independent Speaker Verification
Poster; 1330–1530
Yingke Zhu (The Hong Kong University of Science & Technology), Tom Ko (South University of Science and Technology), Brian Mak (The Hong Kong University of Science and Technology)
Optimizing a Speaker Embedding Extractor Through Backend-Driven Regularization
Poster; 1330–1530
Luciana Ferrer (CONICET), Mitchell McLaren (SRI International)
The NEC-TT 2018 Speaker Verification System
Poster; 1330–1530
Kong Aik Lee (Data Science Research Laboratories, NEC Corporation), Hitoshi Yamamoto (NEC Corporation), Koji Okabe (NEC Corporation), Qiongqiong Wang (Data Science Research Laboratories, NEC Corporation), Ling Guo (Biometrics Research Laboratories, NEC Corporation), Takafumi Koshinaka (Data Science Research Labs., NEC Corporation), Jiacen Zhang (Tokyo Institute of Technology), Koichi Shinoda (Tokyo Institute of Technology)
Autoencoder-based Semi-Supervised Curriculum Learning For Out-of-domain Speaker Verification
Poster; 1330–1530
Siqi Zheng (Alibaba), Gang Liu (Alibaba Group), Hongbin Suo (Alibaba, Inc.), Yun Lei (Alibaba Group)
Multi-Channel Training for End-to-End Speaker Recognition under Reverberant and Noisy Environment
Poster; 1330–1530
Danwei Cai (Duke Kunshan University), Xiaoyi Qin (Sun Yat-sen University, Guangzhou, China), Ming Li (Duke Kunshan University)
The DKU-SMIIP System for NIST 2018 Speaker Recognition Evaluation
Poster; 1330–1530
Danwei Cai (Duke Kunshan University), Weicheng Cai (Sun Yat-sen University), Ming Li (Duke Kunshan University)
Self Multi-Head Attention for Speaker Recognition
Poster; 1330–1530
Miquel India (Universitat Politecnica de Catalunya), Pooyan Safari (TALP Research Center, BarcelonaTech), Javier Hernando (Universitat Politecnica de Catalunya)
Phonetically-aware embeddings - Wide Residual Networks with Time-Delay Neural Networks and Self Attention models for the 2018 NIST Speaker Recognition Evaluation
Poster; 1330–1530
Ignacio Viñals (ViVoLab, Aragon Institute for Engineering Research (I3A), University of Zaragoza), Dayana Ribas (ViVoLab, University of Zaragoza), Victoria Mingote (University of Zaragoza), Jorge Llombart (ViVoLAB, Aragon Institute for Engineering Research (I3A), University of Zaragoza), Pablo Gimeno (ViVoLAB, Aragon Institute for Engineering Research (I3A), University of Zaragoza), Antonio Miguel (ViVoLAB, Aragon Institute for Engineering Research (I3A), University of Zaragoza, Spain), Alfonso Ortega (University of Zaragoza), Eduardo Lleida Solano (University of Zaragoza)
Variational Domain Adversarial Learning for Speaker Verification
Poster; 1330–1530
Youzhi Tu (The Hong Kong Polytechnic University), Manwai Mak (The Hong Kong Polytechnic University), Jen-Tzung Chien (National Chiao Tung University)
A Unified Framework for Speaker and Utterance Verification
Poster; 1330–1530
Tianchi Liu (National University of Singapore), Maulik Madhavi (National University of Singapore), Rohan Kumar Das (National University of Singapore), Haizhou Li (National University of Singapore)
Analysis of Critical Metadata Factors for the Calibration of Speaker Recognition Systems
Poster; 1330–1530
Mahesh Kumar Nandwana (SRI International), Luciana Ferrer (CONICET), Mitchell McLaren (SRI International), Diego Castan (SRI International), Aaron Lawson (SRI International)
Factorization of Discriminatively Trained i-vector Extractor for Speaker Recognition
Poster; 1330–1530
Ondřej Novotný (Brno University of Technology), Oldrich Plchot (Brno University of Technology), Ondrej Glembek (Brno University of Technology), Lukas Burget (Brno University of Technology)
End-to-End Speaker Identification in Noisy and Reverberant Environments Using Raw Waveform Convolutional Neural Networks
Poster; 1330–1530
Daniele Salvati (University of Udine), Carlo Drioli (University of Udine, Department of Mathematics and Computer Science), Gian Luca Foresti (University of Udine)
Whisper to neutral mapping using cosine similarity maximization in i-vector space for speaker verification
Poster; 1330–1530
Abinay Reddy Naini (Indian Institute of Science), Achuth Rao MV (Indian Institute of Science), Prasanta Ghosh (EE, Indian Institute of Science)

NN architectures for ASR[Thu-P-10-B]
Thursday, 19 September, Gallery B

Pretraining by Backtranslation for End-to-end ASR in Low-Resource Settings
Poster; 1330–1530
Matthew Wiesner (Johns Hopkins University), Adithya Renduchintala (Johns Hopkins University), Shinji Watanabe (Johns Hopkins University), Chunxi Liu (Johns Hopkins University), Najim Dehak (Johns Hopkins University), Sanjeev Khudanpur (Johns Hopkins University)
Ectc-Docd: An End-to-end Structure with CTC Encoder and OCD Decoder For Speech Recognition
Poster; 1330–1530
Cheng Yi (Institute of Automation, Chinese Academy of Sciences), Feng Wang (Institute of Automation, Chinese Academy of Sciences), Bo Xu (Institute of Automation, Chinese Academy of Sciences)
End-to-End Multi-Speaker Speech Recognition using Speaker Embeddings and Transfer Learning
Poster; 1330–1530
Pavel Denisov (University of Stuttgart), Ngoc Thang Vu (University of Stuttgart)
Cross-Attention End-to-End ASR for Two-Party Conversations
Poster; 1330–1530
Suyoun Kim (Carnegie Mellon University), Siddharth Dalmia (Carnegie Mellon University), Florian Metze (Carnegie Mellon University)
Towards using context-dependent symbols in CTC without state-tying decision trees
Poster; 1330–1530
Jan Chorowski (University of Wroclaw), Adrian Łańcucki (University of Wroclaw), Bartosz Kostka (University of Wroclaw), Michał Zapotoczny (University of Wroclaw)
An Online Attention-Based Model for Speech Recognition
Poster; 1330–1530
Ruchao Fan (Beijing University of Posts and Telecommunications), Pan Zhou (Department of Computer Science and Technology, Tsinghua University), Wei Chen (Voice Interaction Technology Center, Sogou Inc.), Jia Jia (Department of Computer Science and Technology, Tsinghua University), Gang Liu (Beijing University of Posts and Telecommunications)
Self-Attention Transducers for End-to-End Speech Recognition
Poster; 1330–1530
Zhengkun Tian (Institute of Automation, Chinese Academy of Sciences), Jiangyan Yi (Institute of Automation, Chinese Academy of Sciences), Jianhua Tao (Institute of Automation, Chinese Academy of Sciences), Ye Bai (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, China), Zhengqi Wen (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences)
Improving Transformer-based Speech Recognition Systems with Compressed Structure and Speech Attributes Augmentation
Poster; 1330–1530
Sheng Li (National Institute of Information and Communications Technology (NICT), Advanced Speech Technology Laboratory), Raj Dabre (National Institute of Information and Communications Technology (NICT)), Xugang Lu (NICT), Peng Shen (NICT), Tatsuya Kawahara (Kyoto University), Hisashi Kawai (National Institute of Information and Communications Technology (NICT), Advanced Speech Technology Laboratory)
Extending an Acoustic Data-Driven Phone Set for Spontaneous Speech Recognition
Poster; 1330–1530
Jeong-Uk Bang (Chungbuk National University), Mu-Yeol Choi (ETRI), Sang-Hun Kim (ETRI), Oh-Wook Kwon (Chungbuk National University)
Joint Maximization Decoder with Neural Converters for Fully Neural Network-based Japanese Speech Recognition
Poster; 1330–1530
Takafumi Moriya (NTT Corporation), Jian Wang (The University of Tokyo), Tomohiro Tanaka (NTT Corporation), Ryo Masumura (NTT Corporation), Yusuke Shinohara (NTT Corporation), Yoshikazu Yamaguchi (NTT Corporation), Yushi Aono (NTT Corporation)
Real to H-space Encoder for Speech Recognition
Poster; 1330–1530
Titouan Parcollet (University of Avignon), Mohamed Morchid (University of Avignon), Georges Linares (LIA, University of Avignon), Renato de Mori (McGill University and University of Avignon)

Speech synthesis: text processing, prosody, and emotion[Thu-P-10-C]
Thursday, 19 September, Gallery C

Pre-trained Text Embeddings for Enhanced Text-to-Speech Synthesis
Poster; 1330–1530
Tomoki Hayashi (Nagoya University), Shinji Watanabe (Johns Hopkins University), Tomoki Toda (Nagoya University), Kazuya Takeda (Nagoya University), Shubham Toshniwal (Toyota Technological Institute at Chicago), Karen Livescu (TTI-Chicago)
Visualization and Interpretation of Latent Spaces for Controlling Expressive Speech Synthesis through Audio Analysis
Poster; 1330–1530
Noé Tits (University of Mons), Fengna Wang (Acapela Group), Kevin El Haddad (University of Mons), Vincent Pagel (Acapela Group), Thierry Dutoit (University of Mons)
Pre-trained Text Representations for Improving Front-End Text Processing in Mandarin Text-to-Speech Synthesis
Poster; 1330–1530
Bing Yang (Tencent), Jiaqi Zhong (Tencent), Shan Liu (Tencent)
A Mandarin Prosodic Boundary Prediction Model Based on Multi-Task Learning
Poster; 1330–1530
Huashan Pan (Databaker (Beijing) Technology Company Limited), Xiulin Li (Databaker (Beijing) Technology Company Limited), Zhiqiang Huang (Databaker (Beijing) Technology Company Limited)
Dual Encoder Classifier Models as Constraints in Neural Text Normalization
Poster; 1330–1530
Ajda Gokcen (University of Washington), Hao Zhang (Google), Richard Sproat (Google)
Knowledge-based Linguistic Encoding for End-to-End Mandarin Text-to-Speech Synthesis
Poster; 1330–1530
Jingbei Li (Tsinghua University), Zhiyong Wu (Tsinghua University), Runnan Li (Tsinghua University (THU)), Pengpeng Zhi (TAL Education Group), Song Yang (TAL Education Group), Helen Meng (The Chinese University of Hong Kong)
Automated Emotion Morphing in Speech Based on Diffeomorphic Curve Registration and Highway Networks
Poster; 1330–1530
Ravi Shankar (Johns Hopkins University), Hsi-Wei Hsieh (Johns Hopkins University), Nicolas Charon (Johns Hopkins University), Archana Venkataraman (Johns Hopkins University)
Spontaneous Conversational Speech Synthesis from Found Data
Poster; 1330–1530
Eva Szekely (KTH Royal Institute of Technology), Gustav Eje Henter (KTH Royal Institute of Technology), Jonas Beskow (KTH Royal Institute of Technology), Joakim Gustafson (KTH Royal Institute of Technology)
Fine-Grained Robust Prosody Transfer for Single-Speaker Neural Text-to-Speech
Poster; 1330–1530
Viacheslav Klimkov (Amazon.com), Srikanth Ronanki (Amazon), Jonas Rohnke (Amazon), Thomas Drugman (Amazon)
Speech Driven Backchannel Generation using Deep Q-Network for Enhancing Engagement in Human-Robot Interaction
Poster; 1330–1530
Nusrah Hussain (Koç University), Engin Erzin (Koç University), Metin Sezgin (Koç University), Yucel Yemez (Koç University)
Semi-supervised Prosody Modeling Using Deep Gaussian Process Latent Variable Model
Poster; 1330–1530
Tomoki Koriyama (Tokyo Institute of Technology), Takao Kobayashi (Tokyo Institute of Technology)
Bootstrapping a Text Normalization System for an Inflected Language. Numbers as a Test Case
Poster; 1330–1530
Anna Björk Nikulásdóttir (Reykjavik University), Jón Guðnason (Reykjavik University)
Exploiting Syntactic Features in a Parsed Tree to Improve End-to-End TTS
Poster; 1330–1530
Haohan Guo (School of Computer Science, Northwestern Polytechnical University, Xi'an, China), Frank Soong (Microsoft AI & Research, Beijing, China), Lei He (Microsoft AI & Research, Beijing, China), Lei Xie (School of Computer Science, Northwestern Polytechnical University, Xi'an, China)
Duration modeling with global phoneme-duration vectors
Poster; 1330–1530
Jinfu Ni (Advanced Speech Technology Laboratory, ASTREC, National Institute of Information and Communications Technology), Yoshinori Shiga (National Institute of Information and Communications Technology), Hisashi Kawai (National Institute of Information and Communications Technology)
Improving speech synthesis with discourse relations
Poster; 1330–1530
Adele Aubin (University of Edinburgh), Alessandra Cervone (University of Trento), Oliver Watts (University of Edinburgh), Simon King (University of Edinburgh)

Speech and voice disorders II[Thu-P-10-D]
Thursday, 19 September, Hall 10/D

Use of Beiwe Smartphone App to Track Speech Decline in Amyotrophic Lateral Sclerosis (ALS)
Poster; 1330–1530
Kathryn Connaghan (MGH Institute of Health Professions), Jordan Green (MGH Institute of Health Professions), Sabrina Paganoni (Harvard Medical School), James Chan (Massachusetts General Hospital, Department of Biostatistics), Harli Weber (Neurological Clinical Research Institute, Department of Neurology, Massachusetts General Hospital), Ella Collins (Neurological Clinical Research Institute, Department of Neurology, Massachusetts General Hospital), Brian Richburg (MGH Institute of Health Professions), Marziye Eshghi (MGH Institute of Health Professions), JP Onnela (Harvard T.H. Chan School of Public Health), James Berry (Harvard Medical School)
Parallel vs. non-parallel voice conversion for esophageal speech
Poster; 1330–1530
Luis Serrano (University of the Basque Country (UPV/EHU)), Sneha Raman (University of the Basque Country (UPV/EHU)), David Tavarez (University of the Basque Country (UPV/EHU)), Eva Navas (University of the Basque Country (UPV/EHU)), Inma Hernaez (University of the Basque Country (UPV/EHU))
Hypernasality Severity detection using Constant Q Cepstral Coefficients
Poster; 1330–1530
Akhilesh Dubey (IIT Guwahati), S R Mahadeva Prasanna (IIT Guwahati), S Dandapat (IIT Guwahati)
Automatic Depression Level Detection via Lp-norm Pooling
Poster; 1330–1530
Mingyue Niu (Institute of Automation, Chinese Academy of Sciences), Jianhua Tao (Institute of Automation, Chinese Academy of Sciences), Bin Liu (Institute of Automation, Chinese Academy of Sciences), Cunhang Fan (Institute of Automation, Chinese Academy of Sciences)
Comparison of Speech Tasks and Recording Devices for Voice Based Automatic Classification of Healthy Subjects and Patients with Amyotrophic Lateral Sclerosis
Poster; 1330–1530
Suhas BN (Indian Institute of Science), Deep Patel (Indian Institute of Science), Nithin Rao (University of Southern California), Yamini Belur (National Institute of Mental Health and Neurosciences), Pradeep Reddy (NIMHANS), Nalini Atchayaram (NIMHANS), Ravi Yadav (NIMHANS), Dipanjan Gope (Indian Institute of Science), Prasanta Ghosh (Indian Institute of Science)
Profiling Speech Motor Impairments in Persons with Amyotrophic Lateral Sclerosis: An Acoustic-Based Approach
Poster; 1330–1530
Hannah Rowe (Massachusetts General Hospital Institute of Health Professions (MGH IHP)), Jordan Green (MGH IHP)
Diagnosing Dysarthria with Long Short-Term Memory Networks
Poster; 1330–1530
Alex Mayle (Ohio University), Zhiwei Mou (First Affiliated Hospital of Jinan University), Razvan Bunescu (Ohio University), Sadegh Mirshekarian (Ohio University), Li Xu (Ohio University), Chang Liu (Ohio University)
Modification of Devoicing Error in Cleft Lip and Palate Speech
Poster; 1330–1530
Protima Nomo Sudro (IIT Guwahati), S R Mahadeva Prasanna (IIT Guwahati)
Reduced Task Adaptation in Alternating Motion Rate Tasks as an Early Marker of Bulbar Involvement in Amyotrophic Lateral Sclerosis
Poster; 1330–1530
Marziye Eshghi (MGH Institute of Health Professions), Panying Rong (University of Kansas), Antje S. Mefferd (Vanderbilt University), Kaila Stipancic (MGH Institute of Health Professions), Yana Yunusova (University of Toronto), Jordan R. Green (MGH Institute of Health Professions)
Towards the Speech Features of Early-stage Dementia: Design and Application of the Mandarin Elderly Cognitive Speech Database
Poster; 1330–1530
Tianqi Wang (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences), Quanlei Yan (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences), Jingshen Pan (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences), Feiqi Zhu (The Third Affiliated Hospital of Shenzhen University), Rongfeng Su (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences), Yi Guo (Shenzhen People's Hospital, Shenzhen), Lan Wang (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences), Nan Yan (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences)
Acoustic characteristics of lexical tone disruption in Mandarin speakers after brain damage
Poster; 1330–1530
Wenjun Chen (Shanghai International Studies University), Jeroen van de Weijer (Shenzhen University), Shuangshuang Zhu (Shanghai Sunshine Rehabilitation Center), Qian Qian (Shanghai Sunshine Rehabilitation Center), Manna Wang (Shanghai Sunshine Rehabilitation Center)
Intragestural variation in natural sentence production: Essential Tremor patients treated with DBS
Poster; 1330–1530
Anne Hermes (Laboratoire de Phonétique et Phonologie (CNRS/Sorbonne Nouvelle)), Doris Muecke (IfL Phonetics, University of Cologne), Tabea Thies (IfL Phonetics, University of Cologne), Michael T. Barbe (Department of Neurology, University Hospital Cologne)
Nasal Air Emission in Sibilant Fricatives of Cleft Lip and Palate Speech
Poster; 1330–1530
Sishir Kalita (IIT Guwahati), Protima Nomo Sudro (IIT Guwahati), S R Mahadeva Prasanna (IIT Guwahati), S Dandapat (IIT Guwahati)

Speech and Audio Source Separation and Scene Analysis 3[Thu-P-10-E]
Thursday, 19 September, Hall 10/E

A Modified Algorithm for Multiple Input Spectrogram Inversion
Poster; 1330–1530
Dongxiao Wang (Tokyo Institute of Technology), Hirokazu Kameoka (NTT Communication Science Laboratories), Koichi Shinoda (Tokyo Institute of Technology)
End-to-End Monaural Speech Separation with Multi-Scale Dynamic Weighted Gated Dilated Convolutional Pyramid Network
Poster; 1330–1530
Ziqiang Shi (Fujitsu Research and Development Center), Huibin Lin (Fujitsu Research and Development Center), Liu Liu (Fujitsu Research and Development Center), Rujie Liu (Fujitsu Research and Development Center), Shoji Hayakawa (Fujitsu Laboratories Ltd.), Shouji Harada (Fujitsu Laboratories Ltd.), Jiqing Han (Harbin Institute of Technology)
End-to-end music source separation: is it possible in the waveform domain?
Poster; 1330–1530
Francesc Lluís (Music Technology Group (Universitat Pompeu Fabra)), Jordi Pons (Music Technology Group (Universitat Pompeu Fabra)), Xavier Serra (Music Technology Group (Universitat Pompeu Fabra))
A comprehensive study of speech separation: spectrogram vs waveform separation
Poster; 1330–1530
Fahimeh Bahmaninezhad (University of Texas at Dallas), Jian Wu (Northwestern Polytechnical University, Xi'an, China), Rongzhi Gu (Peking University Shenzhen Graduate School), Shi-Xiong Zhang (Tencent AI Lab, Bellevue WA, USA), Yong Xu (Tencent AI Lab, Bellevue WA, USA), Meng Yu (Tencent AI Lab, Bellevue WA, USA), Dong Yu (Tencent AI Lab, Bellevue WA, USA)
Evaluating Audiovisual Source Separation in the Context of Video Conferencing
Poster; 1330–1530
Berkay Inan (École Polytechnique Fédérale de Lausanne), Milos Cernak (Logitech Europe), Helmut Grabner (Zurich University of Applied Sciences), Helena Peic Tukuljac (École Polytechnique Fédérale de Lausanne), Rodrigo Pena (École Polytechnique Fédérale de Lausanne), Benjamin Ricaud (École Polytechnique Fédérale de Lausanne)
Influence of Speaker-Specific Parameters on Speech Separation Systems
Poster; 1330–1530
David Ditter (University of Hamburg), Timo Gerkmann (University of Hamburg)
CNN-LSTM models for Multi-Speaker Source Separation using Bayesian Hyper Parameter Optimization
Poster; 1330–1530
Jeroen Zegers (KU Leuven, Dept. ESAT), Hugo Van hamme (KU Leuven)
Towards joint sound scene and polyphonic sound event recognition
Poster; 1330–1530
Helen L Bear (Queen Mary University of London), Ines Nolasco (Queen Mary University of London), Emmanouil Benetos (Queen Mary University of London)
Discriminative Learning for Monaural Speech Separation Using Deep Embedding Features
Poster; 1330–1530
Cunhang Fan (Institute of Automation, Chinese Academy of Sciences), Bin Liu (Institute of Automation, Chinese Academy of Sciences), Jianhua Tao (Institute of Automation, Chinese Academy of Sciences), Jiangyan Yi (Institute of Automation, Chinese Academy of Sciences), Zhengqi Wen (Institute of Automation, Chinese Academy of Sciences)
Probabilistic Permutation Invariant Training for Speech Separation
Poster; 1330–1530
Midia Yousefi (University of Texas at Dallas; CRSS – Center for Robust Speech Systems), Soheil Khorram (University of Texas at Dallas), John H.L. Hansen (University of Texas at Dallas; CRSS – Center for Robust Speech Systems)
Which Ones Are Speaking? Speaker-inferred Model for Multi-talker Speech Separation
Poster; 1330–1530
Jing Shi (Institute of Automation, Chinese Academy of Sciences), Jiaming Xu (Institute of Automation, Chinese Academy of Sciences), Bo Xu (Institute of Automation, Chinese Academy of Sciences)

Speech-to-text and Speech Assessment[Thu-S&T-6]
Thursday, 19 September, Hall 4

Elpis, an accessible speech-to-text tool
Show&Tell; 1330–1530
Ben Foley (The University of Queensland), Alina Rakhi, Nicholas Lambourne, Nicholas Buckeridge, Janet Wiles
Framework for conducting tasks requiring human assessment
Show&Tell; 1330–1530
Martin Gruber (New Technologies for the Information Society (NTIS), Faculty of Applied Sciences, University of West Bohemia), Adam Chylek, Jindrich Matousek
Multimedia Simultaneous Translation System for Minority Language Communication with Mandarin
Show&Tell; 1330–1530
Shen Huang (Tencent Minority-Mandarin Translation), Bojie Hu, Shan Huang, Pengfei Hu, Jian Kang, Zhiqiang Lv, Jinghao Yan, Qi Ju, Shiyin Kang, Deyi Tuo, Nurmemet Yolwas
The SAIL LABS Media Mining Indexer and the CAVA Framework
Show&Tell; 1330–1530
Erinc Dikici (SAIL LABS Technology GmbH), Gerhard Backfried, Jürgen Riedler
CaptionAI: a real-time multilingual captioning application
Show&Tell; 1330–1530
Nagendra Kumar Goel (Go-Vivace Inc., McLean, VA), Mousmita Sarma, Saikiran Valluri, Dharmeshkumar Agrawal, Steve Braich, Tejendra Singh Kuswah, Zikra Iqbal, Surbhi Chauhan, Raj Karbar

Speech Technologies for Code-Switching in Multilingual Communities[Thu-SS-10-5]
Thursday, 19 September, Hall 12

Improving Code-Switched Language Modeling Performance Using Cognate Features
Oral; 1330–1347
Victor Soto (Columbia University), Julia Hirschberg (Columbia University)
Linguistically Motivated Parallel Data Augmentation for Code-switch Language Modeling
Oral; 1347–1404
Grandee Lee (National University of Singapore), Xianghu Yue (Department of Electrical & Computer Engineering, National University of Singapore), Haizhou Li (National University of Singapore)
Variational Attention using Articulatory Priors for Generating Code-Mixed Speech using Monolingual Corpora
Oral; 1404–1421
SaiKrishna Rallabandi (Carnegie Mellon University), Alan W Black (Carnegie Mellon University)
Code-Switching Detection Using ASR-Generated Language Posteriors
Oral; 1421–1438
Qinyi Wang (National University of Singapore), Emre Yilmaz (National University of Singapore), Adem Derinel (National University of Singapore), Haizhou Li (National University of Singapore)
Semi-supervised acoustic model training for five-lingual code-switched ASR
Oral; 1438–1455
Astik Biswas (Stellenbosch University), Emre Yilmaz (National University of Singapore), Febe de Wet (Stellenbosch University), Ewald van der Westhuizen (Stellenbosch University), Thomas Niesler (Stellenbosch University)
Multi-Graph Decoding for Code-Switching ASR
Oral; 1455–1512
Emre Yilmaz (National University of Singapore), Samuel Cohen (National University of Singapore), Xianghu Yue (National University of Singapore), David van Leeuwen (Radboud University Nijmegen), Haizhou Li (National University of Singapore)
End-to-End Multilingual Multi-Speaker Speech Recognition
Oral; 1512–1530
Hiroshi Seki (Toyohashi University of Technology), Takaaki Hori (Mitsubishi Electric Research Laboratories), Shinji Watanabe (Johns Hopkins University), Jonathan Le Roux (Mitsubishi Electric Research Laboratories), John Hershey (MERL)

Coffee break in both exhibition foyers, lower and upper level 1[Thu-B-2]
Thursday, 19 September, Foyer

Coffee break in both exhibition foyers, lower and upper level 1
Break; 1530–1600

Closing Session[Thu-G-3]
Thursday, 19 September, Main Hall

Closing Session
General; 1600–1700