Selected Publications

Audio/Speech/Music Generation

  • 2025 pre-print CapSpeech: Enabling Downstream Applications in Style-Captioned Text-to-Speech | Helin Wang*, Jiarui Hai*, Dading Chong, Karan Thakkar, Tiantian Feng, Dongchao Yang, Junhyeok Lee, Laureano Moro-Velazquez, Jesus Villalba, Zengyi Qin, Shrikanth Narayanan, Mounya Elhilali, Najim Dehak | [paper] [page] [code] [space]
  • 2025 Interspeech EzAudio: Enhancing Text-to-Audio Generation with Efficient Diffusion Transformer | Jiarui Hai, Yong Xu, Hao Zhang, Chenxing Li, Helin Wang, Mounya Elhilali, Dong Yu | [paper] [page] [code] [space]
  • 2025 ICASSP SSR-Speech: Towards Stable, Safe, and Robust Zero-shot Text-based Speech Editing and Synthesis | Helin Wang, Meng Yu, Jiarui Hai, Chen Chen, Yuchen Hu, Rilin Chen, Najim Dehak, Dong Yu | [paper] [page] [code]
  • 2024 Interspeech DreamVoice: Text-Guided Voice Conversion | Jiarui Hai*, Karan Thakkar*, Helin Wang, Zengyi Qin, Mounya Elhilali | [paper] [code] [page]
  • 2023 WASPAA Diff-Pitcher: Diffusion-based Singing Voice Pitch Correction | Jiarui Hai, Mounya Elhilali | [paper] [page] [code]

Audio/Speech/Music Separation

  • 2025 pre-print SoloSpeech: Enhancing Intelligibility and Quality in Target Speech Extraction through a Cascaded Generative Pipeline | Helin Wang, Jiarui Hai, Dongchao Yang, Chen Chen, Kai Li, Junyi Peng, Thomas Thebaud, Laureano Moro-Velazquez, Jesus Villalba, Najim Dehak | [paper] [page] [code] [space]
  • 2025 ICASSP SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer | Helin Wang*, Jiarui Hai*, Yen-Ju Lu, Karan Thakkar, Mounya Elhilali, Najim Dehak | [paper] [page] [code]
  • 2024 Interspeech Noise-robust Speech Separation with Fast Generative Correction | Helin Wang, Jesus Villalba, Laureano Moro-Velazquez, Jiarui Hai, Thomas Thebaud, Najim Dehak | [paper]
  • 2024 ICASSP DPM-TSE: A Diffusion Probabilistic Model for Target Sound Extraction | Jiarui Hai*, Helin Wang*, Dongchao Yang, Karan Thakkar, Najim Dehak, Mounya Elhilali | [paper] [page] [code]

Audio/Speech/Music Understanding

  • 2024 ICASSP Investigating Self-Supervised Deep Representations for EEG-based Auditory Attention Decoding | Karan Thakkar, Jiarui Hai, Mounya Elhilali | [paper]
  • 2023 ASRU Boosting Modality Representation with Pre-trained Models and Multi-task Training for Multimodal Sentiment Analysis | Jiarui Hai*, Yu-Jeh Liu*, Mounya Elhilali | [paper]
  • 2022 ICASSP Progressive Teacher-Student Training Framework for Music Tagging | Rui Lu, Baigong Zheng, Jiarui Hai, Fei Tao, Zhiyao Duan, Ji Liu | [paper]