Ph.D. Student, School of Computing
National University of Singapore (NUS)
I am a Ph.D. student at the School of Computing, National University of Singapore (NUS), supervised by Prof. Jin-Song Dong. Previously, I received my M.Eng. degree from Tsinghua University and my B.Eng. degree from Hohai University. I was a research intern at Ping An Technology, working on the robustness of face anti-spoofing, and an intern developer for MindSpore at Huawei. My research lies in the security and safety of multimodal AI systems. My core focus is video security, particularly the safety and robustness of video large language models. More broadly, I study how perception and multimodal models can fail or be attacked across temporal and visual data, spanning adversarial machine learning, the safety of large multimodal and audio-language models, and trustworthy AI. Earlier in my research career, I worked on wireless sensor networks.
My core work studies the robustness of video recognition models and the safety failures of VideoLLMs. The central questions are how video recognition systems can be fooled by stylized perturbations (IEEE S&P 2023, AAAI 2024, TDSC 2026, DLSP 2024), and why VideoLLMs perform relatively poorly on safety, e.g., missing harmful content that is plainly visible to humans (AAAI 2026).
Style-driven, unrestricted black-box adversarial attack that fools video classifiers with natural style transfer.
Extends the idea to a regional logo, attacking video recognition through stylized logos.
Generalizes style-based attacks to segmented regions via the Segment Anything Model.
Beyond video, I study security and safety issues in other temporal data. On the audio side, I work on guardrails for audio-language models and proactive voice protection; on skeleton data, I study query-efficient attacks against action recognition systems.
I also work on image-based adversarial machine learning and other interdisciplinary AI or AI-security topics.
Proposes query-efficient skeletal attacks (ISAAC-K/N) with bone-length and temporal constraints, and uncovers a query-free no-box attack that exposes the fragility of skeleton-based action recognition.
NeurIPS'2026, CVPR'2026, ECCV'2026, ICML'2026, AAAI'2026, ICLR'2026, ACM MM'2025, SMC'2024, TIFS, TDSC, TETCI, NCAA
S&P'2023, USENIX'2023, ACML'2023, PRCV'2023, TIFS, Sustainability
NDSS'2025