Tony Woo

Hello everyone! My name is Tony Woo (or if you prefer my Korean name, Sang Hoon Woo) and I’m a research intern at SK Telecom and the Vision & Learning Lab at Seoul National University, where I work on the Sovereign AI Foundation Model Project, a national initiative supported by the South Korean government. I am also applying to PhD programs in the United States this year.

My work covers a range of topics, but my central research area is conversational AI. Broadly, my research goal is to create natural, context-aware conversational agents that improve human capability. Some of my current subtopics of interest include:

Spoken Dialogue Interface: Humans perform most of their communication through speech, not text. Conversational systems should therefore support natural, voice-based interaction, not just by converting text to speech, but by understanding the characteristics of speech as a medium and adapting or exploiting them for more effective communication.
Multimodal Context: Human perception is inherently multimodal, shaped by simultaneous cues from vision, audio, and more. I am interested in developing conversational agents that can interpret these diverse signals, integrate them meaningfully, and, when appropriate, generate multimodal expressions themselves.
Dynamic Conversational Context: Conversations are dynamic. Users reveal new information about themselves, revise opinions, and shift preferences over time. External factors, whether temporal, social, or situational, also change as interactions unfold. A robust conversational agent must track these evolving contexts and adapt its responses accordingly.

I’m currently advised by Professor Gunhee Kim at Seoul National University. Before my current role, I worked at a couple of Korean startups for my alternative military service, focusing on speech and language technologies. I received my B.S. in Computer Science from the University of California, Davis, where I had the opportunity to work with Professor Prem Devanbu and Professor Chen-Nee Chuah, who introduced me to the world of research.

news

Oct 13, 2025	Just started my research internship at SK Telecom!
Aug 21, 2025	Our paper Think, Verbalize, then Speak: Bridging Complex Thoughts and Comprehensible Speech got accepted to EMNLP 2025!

selected publications

Think, Verbalize, then Speak: Bridging Complex Thoughts and Comprehensible Speech

Tony Woo*, Sehun Lee*, Kang-wook Kim, and Gunhee Kim

In EMNLP 2025

WoW-Bench: Evaluating Fine-Grained Acoustic Perception in Audio-Language Models via Marine Mammal Vocalizations

Jaeyeon Kim, Heeseung Yun, Sang Hoon Woo, Chao-Han Huck Yang, and Gunhee Kim

In arXiv preprint

DExTER: Can Omnimodal Language Models Resolve Audio-Visual Deixis?

Sehun Lee, Yoonji Nam, Sang Hoon Woo, and Gunhee Kim

Manuscript under review

EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio Captioning

Jaeyeon Kim, Jaeyoon Jung, Jinjoo Lee, and Sang Hoon Woo

In ICASSP 2024

Talking Face Generation With Multilingual TTS

Hyoung-Kyu Song*, Sang Hoon Woo*, Junhyeok Lee, Seungmin Yang, Hyunjae Cho, Youseong Lee, Dongho Choi, and Kang-wook Kim

In CVPR 2022