Publications | Tony Woo

2025

Think, Verbalize, then Speak: Bridging Complex Thoughts and Comprehensible Speech

Tony Woo^*, Sehun Lee^*, Kang-wook Kim, and Gunhee Kim

In EMNLP 2025

WoW-Bench: Evaluating Fine-Grained Acoustic Perception in Audio-Language Models via Marine Mammal Vocalizations

Jaeyeon Kim, Heeseung Yun, Sang Hoon Woo, Chao-Han Huck Yang, and Gunhee Kim

arXiv preprint

SubAlign: Speech Tokenization Aligned with LLM Vocabularies for Spoken Language Modeling

Kang-wook Kim, Sehun Lee, Sang Hoon Woo, and Gunhee Kim

In TTIC Summer Workshop on Foundations of Speech and Audio Foundation Models 2025

DExTER: Can Omnimodal Language Models Resolve Audio-Visual Deixis?

Sehun Lee, Yoonji Nam, Sang Hoon Woo, and Gunhee Kim

Manuscript under review

2024

EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio Captioning

Jaeyeon Kim, Jaeyoon Jung, Jinjoo Lee, and Sang Hoon Woo

In ICASSP 2024

Expanding on EnCLAP with Auxiliary Retrieval Model for Automated Audio Captioning

Jaeyeon Kim, Jaeyoon Jung, Minjeong Jeon, Sang Hoon Woo, and Jinjoo Lee

In DCASE2024 Challenge

EnCLAP++: Analyzing the EnCLAP Framework for Optimizing Automated Audio Captioning Performance

Jaeyeon Kim, Minjeong Jeon, Jaeyoon Jung, Sang Hoon Woo, and Jinjoo Lee

In DCASE2024 Workshop

2022

Talking Face Generation With Multilingual TTS

Hyoung-Kyu Song^*, Sang Hoon Woo^*, Junhyeok Lee, Seungmin Yang, Hyunjae Cho, Youseong Lee, Dongho Choi, and Kang-wook Kim

In CVPR 2022

SANE-TTS: Stable And Natural End-to-End Multilingual Text-to-Speech

Hyunjae Cho, Wonbin Jung, Junhyeok Lee, and Sang Hoon Woo

In Interspeech 2022

Leveraging IoTs and Machine Learning for Patient Diagnosis and Ventilation Management in the Intensive Care Unit

Gregory B. Rehm, Sang Hoon Woo, Xin Luigi Chen, Brooks T. Kuhn, Irene Cortes-Puch, Nicholas R. Anderson, Jason Y. Adams, and Chen-Nee Chuah

In IEEE Pervasive Computing