Yuxin Cao

About

I am a Ph.D. student at the School of Computing, National University of Singapore (NUS), supervised by Prof. Jin-Song Dong. Previously, I received my M.Eng. degree from Tsinghua University and my B.Eng. degree from Hohai University. I was a research intern at Ping An Technology, working on the robustness of face anti-spoofing, and an intern developer for MindSpore at Huawei. My research lies in the security and safety of multimodal AI systems. My core focus is video security, particularly the safety and robustness of video large language models. More broadly, I study how perception and multimodal models can fail or be attacked across temporal and visual data, spanning adversarial machine learning, the safety of large multimodal and audio-language models, and trustworthy AI. Earlier in my research career, I worked on wireless sensor networks.

News

2026/02 — One conference paper on backdoor attacks is accepted to CVPR 2026.
2025/11 — One journal paper on skeletal adversarial attacks is accepted to TIFS.
2025/09 — One conference paper on VideoLLM safety is accepted to AAAI 2026.
2025/09 — Two conference papers on audio safeguards are accepted to NeurIPS 2025.
2025/01 — Two conference papers on audio protection are accepted to USENIX Security 2025.
2023/12 — One conference paper on video adversarial attacks is accepted to AAAI 2024.
2023/09 — One conference paper on face anti-spoofing detection is accepted to NeurIPS 2023.
2022/08 — One conference paper on video adversarial attacks is accepted to IEEE S&P 2023.

2024/12 — Two conference papers are accepted to ICASSP 2025.
2024/07 — One conference paper on super-resolution is accepted to ACM MM 2024.
2024/06 — I received the Outstanding Graduates of Beijing award!
2024/05 — One conference paper on double sampling randomized smoothing is accepted to ICML 2024.
2022/06 — I received the Outstanding Undergraduate Thesis of Jiangsu Province award!
2021/05 — I received the Outstanding Undergraduates of Jiangsu Province award!
2021/01 — One journal paper on enhancing UWSN localizability is accepted to Ad Hoc Networks.
2020/08 — One journal paper on three-dimensional node coverage optimization in UWSN is accepted to IEEE Internet of Things Journal.
2020/06 — One journal paper on optimization for dense crowd emergency evacuation is accepted to Journal of Cultural Heritage.

Research

Adversarial Machine Learning on Video Models

My core work studies the robustness of video recognition models and the safety failures of VideoLLMs. A central question is how video recognition systems can be fooled by stylized perturbations (IEEE S&P 2023, AAAI 2024, TDSC 2026, DLSP 2024), and why VideoLLMs perform relatively weakly on safety, e.g., missing harmful content that is plainly visible to humans (AAAI 2026).

StyleFool series — style-based attacks in video recognition

StyleFool

IEEE S&P 2023

Style-driven, unrestricted black-box adversarial attack that fools video classifiers with natural style transfer.

LogoStyleFool

AAAI 2024, TDSC 2026

Extends the idea to a regional logo, attacking video recognition through stylized logos.

LocalStyleFool

DLSP 2024

Generalizes style-based attacks to segmented regions via the Segment Anything Model.

Security and Protection on Other Temporal Data

Beyond video, I study security and safety issues in other temporal data. On the audio side I work on guardrails for audio-language models and proactive voice protection; on skeleton data I study query-efficient attacks against action recognition systems.

ALMGuard: Guardrails for Audio-Language Models — NeurIPS 2025
E2E-VGuard: Adversarial Prevention for Speech Synthesis — NeurIPS 2025
Whispering Under the Eaves: Privacy vs. ASR — USENIX Security 2025
SafeSpeech: Universal Voice Protection — USENIX Security 2025
Bones of Contention: Attacks on Skeleton Recognition — TIFS 2026

Image & Interdisciplinary Security

I also work on image-based adversarial machine learning and other interdisciplinary AI or AI-security topics.

Unbridled Icarus: Survey of MLLM Image-Input Security — SMC 2024
Effects of Exponential Gaussian on Randomized Smoothing — ICML 2024
Backdoor Attacks on Lane Detection — CVPR 2026
Flow-Attention Net for 3D Mask Detection — NeurIPS 2023
GRFormer: Lightweight Image Super-Resolution — ACM MM 2024
Uncertainty-Aware Masked Modeling in Medical Imaging — ICASSP 2025

Selected Works

StyleFool series

IEEE S&P 2023 · CCF-A · CORE: A*

StyleFool: Fooling Video Classification Systems via Style Transfer

Yuxin Cao, Xi Xiao, Ruoxi Sun, Derui Wang, Minhui Xue, Sheng Wen

An unrestricted black-box video adversarial attack that uses style transfer to craft natural, hard-to-detect perturbations against video classifiers, reducing queries while resisting existing defenses.

Paper Code

StyleFool series

AAAI 2024 · CCF-A · CORE: A*

LogoStyleFool: Vitiating Video Recognition Systems via Logo Style Transfer

Yuxin Cao*, Ziyu Zhao*, Xi Xiao, Derui Wang, Minhui Xue, Jin Lu

Extends style-based attacks to a regional stylized logo, enabling targeted attacks on video recognition while preserving video naturalness and evading patch-based defenses.

Paper Code

VideoLLM safety

AAAI 2026 · CCF-A · CORE: A*

Failures to Surface Harmful Contents in Video Large Language Models

Yuxin Cao, Wei Song, Derui Wang, Jingling Xue, Jin Song Dong

Shows that state-of-the-art VideoLLMs rarely report clearly visible harmful content, and traces the failure to temporal under-sampling, spatial token loss, and weak encoder–decoder grounding.

Paper Code

Skeleton security

TIFS 2026 · CCF-A · Q1/top · CORE: N/A

Bones of Contention: Exploring Query-Efficient Attacks Against Skeleton Recognition Systems

Yuxin Cao*, Kai Ye*, Derui Wang, Minhui Xue, Hao Ge, Chenxiong Qian, Jin Song Dong

Proposes query-efficient skeletal attacks (ISAAC-K/N) with bone-length and temporal constraints, and uncovers a query-free no-box attack that exposes the fragility of skeleton-based action recognition.

Paper

Face anti-spoofing

NeurIPS 2023 · CCF-A · CORE: A*

Flow-Attention-based Spatio-Temporal Aggregation Network for 3D Mask Detection

Yuxin Cao, Yian Li, Yumeng Zhu, Derui Wang, Minhui Xue

FASTEN uses facial optical flow, flow attention, and spatio-temporal aggregation to detect highly realistic 3D masks from only five frames, and has been deployed on real mobile devices.

Paper Code

Audio-LM safety

NeurIPS 2025 · CCF-A · CORE: A*

ALMGuard: Safety Shortcuts and Where to Find Them as Guardrails for Audio-Language Models

Weifei Jin, Yuxin Cao, Junjie Su, Minhui Xue, Jie Hao, Ke Xu, Jin Song Dong, Derui Wang

Identifies universal shortcut activation perturbations that trigger safety behavior in audio-language models, cutting jailbreak success to 4.6% while preserving benign utility.

Paper Code

Publications

* denotes equal contribution. CCF and CORE labels are included where applicable.

2026

Towards Stealthy and Effective Backdoor Attacks on Lane Detection: A Naturalistic Data Poisoning Approach
Yifan Liao, Yuxin Cao, Yedi Zhang, Wentao He, Yan Xiao, Xianglong Du, Zhiyong Huang, Jin Song Dong

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026.

CCF-A · CORE: A* · Paper

Bones of Contention: Exploring Query-Efficient Attacks Against Skeleton Recognition Systems
Yuxin Cao*, Kai Ye*, Derui Wang, Minhui Xue, Hao Ge, Chenxiong Qian, Jin Song Dong

IEEE Transactions on Information Forensics and Security (TIFS), 2026.

Q1/top · CCF-A · CORE: N/A · Paper

Failures to Surface Harmful Contents in Video Large Language Models
Yuxin Cao, Wei Song, Derui Wang, Jingling Xue, Jin Song Dong

Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2026.

CCF-A · CORE: A* · Paper

Query-Efficient Video Adversarial Attack with Stylized Logo on Service Computing
Duoxun Tang*, Yuxin Cao*, Xi Xiao, Derui Wang, Sheng Wen, Tianqing Zhu

IEEE Transactions on Dependable and Secure Computing (TDSC), 2026.

Q1/top · CCF-A · CORE: A* · Paper

DUAP: Dual-task Universal Adversarial Perturbations Against Voice Control Systems
Suyang Sun, Weifei Jin, Yuxin Cao, Wei Song, Jie Hao

Proceedings of the IEEE International Conference on Multimedia & Expo (ICME), 2026.

CCF-B · CORE: A · Paper

2025

ALMGuard: Safety Shortcuts and Where to Find Them as Guardrails for Audio-Language Models
Weifei Jin, Yuxin Cao, Junjie Su, Minhui Xue, Jie Hao, Ke Xu, Jin Song Dong, Derui Wang

Proceedings of the Conference on Neural Information Processing Systems (NeurIPS), 2025.

CCF-A · CORE: A* · Paper · Code

E2E-VGuard: Adversarial Prevention for Production LLM-based End-To-End Speech Synthesis
Zhisheng Zhang, Derui Wang, Yifan Mi, Zhiyong Wu, Jie Gao, Yuxin Cao, Kai Ye, Minhui Xue, Jie Hao

Proceedings of the Conference on Neural Information Processing Systems (NeurIPS), 2025.

CCF-A · CORE: A* · Paper · Code

Whispering Under the Eaves: Protecting User Privacy Against Commercial and LLM-empowered Automatic Speech Recognition Systems
Weifei Jin, Yuxin Cao, Junjie Su, Derui Wang, Yedi Zhang, Minhui Xue, Jie Hao, Jin Song Dong, Yixian Yang

Proceedings of the USENIX Security Symposium (USENIX Security), 2025.

CCF-A · CORE: A* · Paper · Code

SafeSpeech: Robust and Universal Voice Protection Against Malicious Speech Synthesis
Zhisheng Zhang, Derui Wang, Qianyi Yang, Pengyang Huang, Junhan Pu, Yuxin Cao, Kai Ye, Jie Hao, Yixian Yang

Proceedings of the USENIX Security Symposium (USENIX Security), 2025.

CCF-A · CORE: A* · Paper · Code

Uncertainty-Aware Masked Modeling in Medical Imaging
Jiayu Zhang*, Yuxin Cao*, Dexuan Xu

Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025.

CCF-B · CORE: B · Paper

AVID: Model Attribution via Inverse Diffusion
Luyu Zhu, Kai Ye, Jiayu Yao, Chenxi Li, Luwen Zhao, Yuxin Cao, Derui Wang, Jie Hao

Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025.

CCF-B · CORE: B · Paper

2024

LogoStyleFool: Vitiating Video Recognition Systems via Logo Style Transfer
Yuxin Cao*, Ziyu Zhao*, Xi Xiao, Derui Wang, Minhui Xue, Jin Lu

Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 2024.

CCF-A · CORE: A* · Paper

LocalStyleFool: Regional Style Transfer Attack Using Segment Anything Model
Yuxin Cao, Jinghao Li, Xi Xiao, Derui Wang, Minhui Xue, Hao Ge, Wei Liu, Guangwu Hu

Proceedings of the 7th Deep Learning Security and Privacy Workshop (DLSP, IEEE S&P Workshop), 2024.

CORE: N/A · Paper

Unbridled Icarus: A Survey of the Potential Perils of Image Inputs in Multimodal Large Language Model Security
Yihe Fan, Yuxin Cao, Ziyu Zhao, Ziyao Liu, Shaofeng Li

Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2024.

CCF-C · CORE: B · Paper

Towards Evaluating the Robustness of Automatic Speech Recognition Systems via Audio Style Transfer
Weifei Jin, Yuxin Cao, Junjie Su, Qi Shen, Kai Ye, Derui Wang, Jie Hao, Ziyao Liu

Proceedings of the 2nd ACM Workshop on Secure and Trustworthy Deep Learning Systems (SecTL, AsiaCCS Workshop), 2024.

CORE: N/A · Paper

GRFormer: Grouped Residual Self-Attention for Lightweight Single Image Super-Resolution
Yuzhen Li, Zehang Deng, Yuxin Cao, Lihua Liu

Proceedings of the ACM International Conference on Multimedia (ACM MM), 2024.

CCF-A · CORE: A* · Paper

3D Face Reconstruction Using A Spectral-Based Graph Convolution Encoder
Haoxin Xu, Zezheng Zhao, Yuxin Cao, Chunyu Chen, Hao Ge, Ziyao Liu

Proceedings of the Web Conference (WWW short paper), 2024.

CCF-A · CORE: A* · Paper

Effects of Exponential Gaussian Distribution on (Double Sampling) Randomized Smoothing
Youwei Shu, Xi Xiao, Derui Wang, Yuxin Cao, Siji Chen, Minhui Xue, Linyi Li, Bo Li

Proceedings of the International Conference on Machine Learning (ICML), 2024.

CCF-A · CORE: A* · Paper

Mitigating Unauthorized Speech Synthesis for Voice Protection
Zhisheng Zhang, Qianyi Yang, Derui Wang, Pengyang Huang, Yuxin Cao, Kai Ye, Jie Hao

Proceedings of the 1st ACM Workshop on Large AI Systems and Models with Privacy and Safety Analysis (LAMPS, CCS Workshop), 2024.

CORE: N/A · Paper

2023

StyleFool: Fooling Video Classification Systems via Style Transfer
Yuxin Cao, Xi Xiao, Ruoxi Sun, Derui Wang, Minhui Xue, Sheng Wen

Proceedings of the IEEE Symposium on Security & Privacy (IEEE S&P), 2023.

CCF-A · CORE: A* · Paper · Code

Flow-Attention-based Spatio-Temporal Aggregation Network for 3D Mask Detection
Yuxin Cao, Yian Li, Yumeng Zhu, Derui Wang, Minhui Xue

Proceedings of the Conference on Neural Information Processing Systems (NeurIPS), 2023.

CCF-A · CORE: A* · Paper · Code

Three-dimensional iterative enhancement for coverage hole recovery in UWSNs
Lingli Zhang, Chengming Luo, Xiyun Ge, Yuxin Cao, Haobo Zhang

Journal of Marine Science and Engineering, 2023.

Q1 · CORE: N/A · Paper

A fine extraction algorithm for image-based surface cracks in underwater dams
Gaifang Xin, Xinnan Fan, Pengfei Shi, Chengming Luo, Jianjun Ni, Yuxin Cao

Measurement Science and Technology, 2023.

Q1 · CORE: N/A · Paper

2022 and Earlier

Three Dimensional Coverage Optimization of Underwater Nodes under Multi-Constraints Combined with Water Flow
Chengming Luo, Yuxin Cao, Gaifang Xin, Biao Wang, En Lu, Houlian Wang

IEEE Internet of Things Journal (IOTJ), 2022.

Q1/top · CORE: N/A · Paper

Path intelligent optimization for dense crowd emergency evacuation in heritage buildings
Yuxin Cao, Chengming Luo, Yuanyuan Liu, Siru Teng, Gaifang Xin

Journal of Cultural Heritage, 2021.

Q1 · CORE: N/A · Paper

A hybrid coverage control for enhancing UWSN localizability using IBSO-VFA
Chengming Luo, Biao Wang, Yuxin Cao, Gaifang Xin, Cheng He, Lin Ma

Ad Hoc Networks, 2021.

Q1 · CORE: N/A · Paper

Stable positioning for mobile targets using distributed fusion correction strategy of heterogeneous data
Gaifang Xin, Xinnan Fan, Chengming Luo, Yuxin Cao, Hai Yang, Haiyan Xu, Xuewu Zhang

Ad Hoc Networks, 2020.

Q1 · CORE: N/A · Paper

Polarization error analysis of an all-optical fibre small current sensor for partial discharge
Gaifang Xin, Jun Zhu, Chengming Luo, Jing Tang, Wei Li, Yuxin Cao, Haiyan Xu

Journal of Electrical Engineering & Technology, 2020.

Q3 · CORE: N/A · Paper

Selected Awards and Honors

2024 · Outstanding Graduates of Beijing
2023 · IEEE S&P Travel Grant
2022 · Outstanding Undergraduate Thesis of Jiangsu Province (First Prize)
2021 · Outstanding Undergraduates of Jiangsu Province
2019 · ICM Meritorious Winner
2018 · CAMCM Outstanding Winner
2018 · National Scholarship
2016 · First Prize of National High School Mathematics Competition
2012 · Honor Roll of American Mathematics Competition

Teaching

Teaching Assistant

CS5425/CS4225 Big Data Systems for Data Science, National University of Singapore, 2025 Fall.

CS5425/CS4225 Big Data Systems for Data Science, National University of Singapore, 2026 Spring.

Supervisor

CP2107 Independent Introduction to CS Research (Odyssey), National University of Singapore, 2026 Fall.

Services

Reviewer

NeurIPS'2026, CVPR'2026, ECCV'2026, ICML'2026, AAAI'2026, ICLR'2026, ACM MM'2025, SMC'2024, TIFS, TDSC, TETCI, NCAA

Sub-reviewer

S&P'2023, USENIX'2023, ACML'2023, PRCV'2023, TIFS, Sustainability

External Reviewer

NDSS'2025