Selected Publications
Audio/Speech/Music Generation
2025
pre-print
CapSpeech: Enabling Downstream Applications in Style-Captioned Text-to-Speech | Helin Wang*, Jiarui Hai*, Dading Chong, Karan Thakkar, Tiantian Feng, Dongchao Yang, Junhyeok Lee, Laureano Moro Velazquez, Jesus Villalba, Zengyi Qin, Shrikanth Narayanan, Mounya Elhilali, Najim Dehak | [paper] [page] [code] [space]
2025
Interspeech
EzAudio: Enhancing Text-to-Audio Generation with Efficient Diffusion Transformer | Jiarui Hai, Yong Xu, Hao Zhang, Chenxing Li, Helin Wang, Mounya Elhilali, Dong Yu | [paper] [page] [code] [space]
2025
ICASSP
SSR-Speech: Towards Stable, Safe, and Robust Zero-shot Text-based Speech Editing and Synthesis | Helin Wang, Meng Yu, Jiarui Hai, Chen Chen, Yuchen Hu, Rilin Chen, Najim Dehak, Dong Yu | [paper] [page] [code]
2024
Interspeech
DreamVoice: Text-Guided Voice Conversion | Jiarui Hai*, Karan Thakkar*, Helin Wang, Zengyi Qin, Mounya Elhilali | [paper] [code] [page]
2023
WASPAA
Diff-Pitcher: Diffusion-based Singing Voice Pitch Correction | Jiarui Hai, Mounya Elhilali | [paper] [page] [code]
Audio/Speech/Music Separation
2025
pre-print
SoloSpeech: Enhancing Intelligibility and Quality in Target Speech Extraction through a Cascaded Generative Pipeline | Helin Wang, Jiarui Hai, Dongchao Yang, Chen Chen, Kai Li, Junyi Peng, Thomas Thebaud, Laureano Moro Velazquez, Jesus Villalba, Najim Dehak | [paper] [page] [code] [space]
2025
ICASSP
SoloAudio: Target Sound Extraction with Language-oriented Audio Diffusion Transformer | Helin Wang*, Jiarui Hai*, Yen-Ju Lu, Karan Thakkar, Mounya Elhilali, Najim Dehak | [paper] [page] [code]
2024
Interspeech
Noise-robust Speech Separation with Fast Generative Correction | Helin Wang, Jesus Villalba, Laureano Moro-Velazquez, Jiarui Hai, Thomas Thebaud, Najim Dehak | [paper]
2024
ICASSP
DPM-TSE: A Diffusion Probabilistic Model for Target Sound Extraction | Jiarui Hai*, Helin Wang*, Dongchao Yang, Karan Thakkar, Najim Dehak, Mounya Elhilali | [paper] [page] [code]
Audio/Speech/Music Understanding
2024
ICASSP
Investigating Self-Supervised Deep Representations for EEG-based Auditory Attention Decoding | Karan Thakkar, Jiarui Hai, Mounya Elhilali | [paper]
2023
ASRU
Boosting Modality Representation with Pre-trained Models and Multi-task Training for Multimodal Sentiment Analysis | Jiarui Hai*, Yu-Jeh Liu*, Mounya Elhilali | [paper]
2022
ICASSP
Progressive Teacher-Student Training Framework for Music Tagging | Rui Lu, Baigong Zheng, Jiarui Hai, Fei Tao, Zhiyao Duan, Ji Liu | [paper]