打开网易新闻 查看精彩图片

第二期

威胁二:自主网络利用与工具滥用

当LLM智能体被赋予代码执行或系统级工具的访问权限时,它们能够自主执行对抗性的网络安全攻击,从而催生了所谓的“自主网络利用”。与需要外部行为者操纵模型的快速注入或越狱攻击不同,自主利用涉及智能体自身在无需直接人工监督的情况下,识别、组织并执行攻击[26][64][67]。过往研究已经表明,这些对抗性智能体能够在沙盒环境中成功攻陷网站[51],并实施对“一日漏洞”(one-day vulnerabilities,即已披露但尚未广泛修复的漏洞)的利用[68]。此外,根据各种行业评估,在此类情境下,攻击者的目标可能包括数据窃取、欺诈、勒索软件部署以及网络横向移动[4]。

尤其需要认识到,自主网络利用的经济性对攻击者极为有利[26][64][67]。例如,攻击者可以使用GPT-4以每次仅几美元的成本执行有效的一日漏洞利用,使得攻击成本低于雇佣人类攻击者。此外,LLM驱动攻击的可并行化特性进一步加剧了此问题,使得大规模、高数量的攻击尝试既成为可能又经济可行[64]。

2.1 一日漏洞的利用

近期研究显示,LLM智能体,尤其是基于GPT-4构建的智能体,能够自主利用真实世界中的一日漏洞,包括Python包、在线平台及容器管理系统中未打补丁的CVE(公共漏洞与暴露)[68]。为了实施诸如SQL注入、远程代码执行(RCE)以及并发/协同攻击等复杂利用,对抗性智能体可以利用工具使用、规划和文档检索等功能。值得注意的是,在给定CVE描述的情况下,GPT-4普遍优于所有其他被测试的模型以及传统的漏洞扫描器(如OWASP ZAP[69]和Metasploit[70]),实现了87%的成功率[68]。

2.2 自主网站入侵

在近期工作中,Fang等人[51]展示了GPT-4智能体如何在无需事先知晓特定漏洞的情况下,自主攻破沙盒化网站。这些智能体执行多步攻击,例如利用跨站脚本(XSS)和跨站请求伪造(CSRF)进行XSS+CSRF链式攻击[71]、服务端模板注入(SSTI)[72]以及盲SQL联合注入[73]。该研究表明,智能体的能力(如上下文管理、工具集成和战略规划)对于攻击的成功至关重要。

2.3 涌现性工具滥用

近期研究还表明,自主LLM智能体可以展现出协同和自适应的工具使用行为,以实施网络攻击。例如,Rechard等人[51]设计了能够通过工具调用和动态规划的战略组合来执行复杂多步网站利用的LLM智能体。Shi等人的ConAgents框架[74]强调了专业智能体在工具选择、执行和行动校准方面的结构化协作,使智能体能够从失败中迭代恢复并优化其行动。在工具集成方面,Wang等人的ToolGen框架[75]使LLM能够将工具调用作为“下一个令牌生成”的一部分,从而无需外部检索模块即可简化对庞大工具集的调用链。

在近期涌现的自主智能体安全研究中,OpenClaw[7]已成为剖析工具滥用风险最具代表性的案例平台之一。作为一款自托管的开源AI智能体框架,OpenClaw将认知决策与工具执行相分离,赋予智能体系统级操作权限,使其能够自主调用网络浏览器、执行Shell命令、管理本地文件,并通过跨平台消息系统发送消息、管理工作流乃至形成涌现性的AI社交网络。然而,这一能力集成的设计也造就了极为复杂的攻击面。Ying等人[76]对OpenClaw生态进行了系统化安全分析,揭示了包括提示注入驱动的远程代码执行(RCE)、顺序化工具攻击链、上下文失忆以及供应链污染在内的多重关键漏洞。该研究提出三层风险分类法,将自主智能体的脆弱性归纳为AI认知层、软件执行层与信息系统层三个维度。

更为关键的是,OpenClaw的扩展性设计催生了新型的“引导注入”(guidance injection)攻击向量。Liu等人[77]发现,OpenClaw引入了可扩展的技能生态,允许第三方开发者通过生命周期钩子在智能体初始化阶段注入行为引导信息。与传统提示注入依赖显式恶意指令不同,引导注入通过将有害行为伪装为常规最佳实践,嵌入对抗性操作叙事,使其被自动纳入智能体的推理框架并影响后续任务执行,而不引起警觉。研究者构建了涵盖凭据外泄、工作区破坏、权限提升及持久化后门安装等13种攻击类别的26个恶意技能,在52个自然用户提示和6种主流LLM后端的评估中,攻击成功率介于16.0%至64.2%之间,且94%的恶意技能能够规避现有静态扫描与LLM检测工具的识别。

此外,OpenClaw社区市场ClawHub中超过13,000个社区贡献的智能体技能中,根据近期审计,13%至26%包含安全漏洞,恶意插件被广泛发现以加密货币工具和生产力集成为名部署信息窃取程序和后门[78]。对OpenClaw及其五个衍生框架的系统性安全评估进一步表明,所有被评估的智能体均存在显著的安全漏洞,且智能体化系统的风险程度显著高于其底层模型单独使用时的风险水平——特别是侦察与发现行为成为最常见的弱点,而不同框架暴露出凭据泄露、横向移动、权限提升和资源开发等各异的高风险特征。这些发现共同凸显出,智能体系统的安全不仅受限于底层模型的安全属性,更由模型能力、工具使用、多步规划与运行时编排之间的耦合程度所决定;一旦智能体被赋予执行能力和持久的运行时上下文,早期阶段的薄弱环节将被放大为具体的系统级安全失效。

综上所述,这些研究共同凸显了在实现具有工具访问权限的自主智能体系统时,部署强健的隔离机制和运行时监控机制的重要性。

向上滑动,可查看所有参考文献

[1]. How Llms Work. Ai What Do Large Language Models "Understand"? Image, 21:1, 2024.

[2]. Andrei Kucharavy. Fundamental Limitations Of Generative Llms. In Large Language Models In Cybersecurity: Threats, Exposure And Mitigation, Pages 55-64. Springer Nature Switzerland Cham, 2024.

[3]. Thomas Kwa, Ben West, Joel Becker, Amy Deng, Kathryn Garcia, Max Hasin, Sami Jawhar, Megan Kinniment, Nate Rush, Sydney Von Arx, Et Al. Measuring Ai Ability To Complete Long Tasks. Arxiv Preprint Arxiv:2503.14499, 2025.

[4]. Palo Alto Networks (Unit 42). Ai Agents Are Here. So Are The Threats., 2025.

[5]. Langchain. Langchain Documentation. Https://Python.Langchain.Com/, 2024.

[6]. Toran Bruce Richards. Autogpt: An Autonomous Gpt Experiment. Https://Github.Com/Torantulino/Auto-Gpt, 2024.

[7]. Openclaw. Https://Openclaw.Im

[8]. Guanzhi Wang et al. Voyager: An open-ended embodied agent with large language models. arXiv preprint arXiv:2305.16291, 2023.

[9]. Paolo Dal Cin, Daniel Kendzior, Yusof Seedat, and Renato Marinho. Three essentials for agentic ai security. MIT Sloan Management Review (Online), pages 1-4, 2025.

[10]. Reuters. Just in time? manufacturers turn to ai to weather tariff storm, 2025. URL https://www.reuters.com/business/just-time-manufacturers-turn-ai-weather-tariff-storm-2025-08-13/.

[11]. Wired. Forget chatbots. ai agents are the future, 2025. URL https://www.wired.com/story/fast-forward-forget-chatbots-ai-agents-are-the-future/. Accessed: 2025-08-16.

[12]. Joon Sung Park et al. Generative agents: Interactive simulacra of human behavior. In Proceedings of the ACM Symposium on User Interface Software and Technology (UIST), 2023.

[13]. Shanghua Gao, Ada Fang, Yepeng Huang, Valentina Giunchiglia, Ayush Noori, Jonathan Richard Schwarz, Yasha Ektefaie, Jovana Kondic, and Marinka Zitnik. Empowering biomedical discovery with ai agents. Cell, 187 (22):6125-6151, 2024.

[14]. Mourad Gridach, Jay Nanavati, Khaldoun Zine El Abidine, Lenon Mendes, and Christina Mack. Agentic ai for scientific discovery: A survey of progress, challenges, and future directions. arXiv preprint arXiv:2503.08979, 2025.

[15]. Verge. Inside the automated warehouse where robots are packing your groceries, 2025. URL https://www.theverge.com/robot/719880/ocado-online-grocery-automation-krogers-luton-ogrp-robot-grid. Accessed: 2025-08-16.

[16]. Zihan Chen, Yixin Wu, et al. Autoagents: A framework for automatic agent generation. arXiv preprint arXiv:2309.17288, 2023. URL https://arxiv.org/abs/2309.17288.

[17]. Reuters. Amazon's delivery, logistics get ai boost, 2025. URL https://www.reuters.com/business/retail-consumer/amazons-delivery-logistics-will-get-an-ai-boost-2025-06-04/. Accessed: 2025-08-16.

[18]. Subash Neupane, Sudip Mittal, and Shahram Rahimi. Towards a hipaa compliant agentic ai system in healthcare. arXiv preprint arXiv:2504.17669, 2025.

[19]. Ken Huang. Ai agents in healthcare. In Agentic AI: Theories and Practices, pages 303-321. Springer, 2025.

[20]. Nalan Karunanayake. Next-generation agentic ai for transforming healthcare. Informatics and Health, 2(2): 73-83, 2025.

[21]. Michael Moritz, Eric Topol, and Pranav Rajpurkar. Coordinated ai agents for advancing healthcare. Nature Biomedical Engineering, pages 1-7, 2025.

[22]. James Zou and Eric J Topol. The rise of agentic ai teammates in medicine. The Lancet, 405(10477):457, 2025.

[23]. Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres, Thorsten Holz, and Mario Fritz. Not what you've signed up for: Compromising real-world llm-integrated Applications with indirect prompt injection. Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security, 2023. URL https://api.semanticscholar.org/CorpusID:258546941.

[24]. Fábio Perez and Ian Ribeiro. Ignore previous prompt: Attack techniques for language models. ArXiv, abs/2211.09527, 2022. URL https://api.semanticscholar.org/CorpusID:253581710.

[25]. Luca Beurer-Kellner, Beat Buesser, Ana-Maria Cre¸tu, Edoardo Debenedetti, Daniel Dobos, Daniel Fabian, Marc Fischer, David Froelicher, Kathrin Grosse, Daniel Naeff, et al. Design Patterns for Securing LLM Agents against Prompt Injections. arXiv preprint arXiv:2506.08837, 2025.

[26]. Donghyun Lee and Mo Tiwari. Prompt infection: Llm-to-llm prompt injection within multi-agent systems. arXiv preprint arXiv:2410.07283, 2024.

[27]. OWASP GenAI Project. Owasp genai llm01: Prompt injection, 2025.

[28]. Jeremy McHugh, Kristina Sekrst, and Jonathan Rodriguez Cefalu. Prompt injection 2.0: Hybrid ai threats. ArXiv, abs/2507.13169, 2025. URL https://api.semanticscholar.org/CorpusID:280296803.

[29]. Yulin Chen, Haoran Li, Yuan Sui, Yufei He, Yue Liu, Yangqiu Song, and Bryan Hooi. Can indirect prompt injection attacks be detected and removed? arXiv preprint arXiv:2502.16580, 2025.

[30]. Qiusi Zhan, Zhixiang Liang, Zifan Ying, and Daniel Kang. InjeAgent: Benchmarking indirect prompt injections in tool-integrated large language model agents. In Lun-Wei Ku, Andre Martins, and Vivek Srikumar, editors, Findings of the Association for Computational Linguistics: ACL 2024, pages 10471-10506, Bangkok, Thailand, August 2024. Association for Computational Linguistics. doi:10.18653/v1/2024. findings-acl.624. URL https://aclanthology.org/2024. findings-acl.624/.

[31]. Qiusi Zhan, Richard Fang, Henil Shalin Panchal, and Daniel Kang. Adaptive attacks break defenses against indirect prompt injection attacks on llm agents. In Findings of the Association for Computational Linguistics: NAACL 2025, pages 7101–7117, 2025.

[32]. Andy Zou, Zifan Wang, Nicholas Carlini, Milad Nasr, J. Zico Kolter, and Matt Fredrikson. Universal and transferable adversarial attacks on aligned language models, 2023. URL https://arxiv.org/abs/2307.15043.

[33]. Xiaogeng Liu, Zhiyuan Yu, Yizhe Zhang, Ning Zhang, and Chaowei Xiao. Automatic and universal prompt injection attacks against large language models. arXiv preprint arXiv:2403.04957, 2024.

[34]. Neel Jain, Avi Schwarzschild, Yuxin Wen, Gowthami Somepalli, John Kirchenbauer, Ping-yeh Chiang, Micah Goldblum, Aniruddha Saha, Jonas Geiping, and Tom Goldstein. Baseline defenses for adversarial attacks against aligned language models. arXiv preprint arXiv:2309.00614, 2023.

[35]. Sicheng Zhu, Ruiyi Zhang, Bang An, Gang Wu, Joe Barrow, Zichao Wang, Furong Huang, Ani Nenkova, and Tong Sun. Autodan: interpretable gradient-based adversarial attacks on large language models. arXiv preprint arXiv:2310.15140, 2023.

[36]. Zexuan Zhong, Ziqing Huang, Alexander Wettig, and Danqi Chen. Poisoning retrieval corpora by injecting adversarial passages. arXiv preprint arXiv:2310.19156, 2023.

[37]. Zeyi Liao, Lingbo Mo, Chejian Xu, Mintong Kang, Jiawei Zhang, Chaowei Xiao, Yuan Tian, Bo Li, and Huan Sun. Eia: Environmental injection attack on generalist web agents for privacy leakage. arXiv preprint arXiv:2409.11295, 2024.

[38]. Chejian Xu, Mintong Kang, Jiawei Zhang, Zeyi Liao, Lingbo Mo, Mengqi Yuan, Huan Sun, and Bo Li. Advagent: Controllable blackbox red-teaming on web agents. arXiv preprint arXiv:2410.17401, 2024.

[39]. Chen Henry Wu, Rishi Shah, Jing Yu Koh, Ruslan Salakhutdinov, Daniel Fried, and Aditi Raghunathan. Dissecting adversarial robustness of multimodal lm agents. arXiv preprint arXiv:2406.12814, 2024.

[40]. Kaijie Zhu, Xianjun Yang, Jindong Wang, Wenbo Guo, and William Yang Wang. Melon: Provable defense against indirect prompt injection attacks in ai agents. arXiv preprint arXiv:2502.05174, 2025.

[41]. Yanzhe Zhang, Tao Yu, and Diyi Yang. Attacking vision-language computer agents via pop-ups. arXiv preprint arXiv:2411.02391, 2024.

[42]. Sam Johnson, Viet Pham, and Thai Le. Manipulating llm web agents with indirect prompt injection attack via html accessibility tree. arXiv preprint arXiv:2507.14799, 2025.

[43]. Junhyuk Choi, Yeseon Hong, Minju Kim, and Bugeun Kim. Examining identity drift in conversations of llm agents, 2025. URL https://arxiv.org/abs/2412.00804.

[44]. Jiawei Guo and Haipeng Cai. System prompt poisoning: Persistent attacks on large language models beyond user injection. arXiv preprint, 2025. URL https://arxiv.org/abs/2505.06493.

[45]. Quan Zhang, Binqi Zeng, Chijin Zhou, Gwhwan Go, Heyuan Shi, and Yu Jiang. Human-imperceptible retrieval poisoning attacks in llm-powered applications. In Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering, pages 502-506, 2024.

[46]. Cody Clop and Yannick Teglia. Backdoored retrievers for prompt injection attacks on retrieval augmented generation of large language models. arXiv preprint arXiv:2410.14479, 2024.

[47]. Le Wang, Zonghao Ying, Tianyuan Zhang, Siyuan Liang, Shengshan Hu, Mingchuan Zhang, Aishan Liu, and Xianglong Liu. Manipulating multimodal agents via cross-modal prompt injection. arXiv preprint arXiv:2504.14348, 2025.

[48]. Sean Park. Unveiling ai agent vulnerabilities part ii: Code execution. Trend Micro Research Report, 2025. URL https://www.trendmicro.com/vinfo/br/security/news/cybercrime-and-digital-threats/unveiling-ai-agent-vulnerabilities-code-execution.

[49]. Eugene Bagdasaryan, Tsung-Yin Hsieh, Ben Nassi, and Vitaly Shmatikov. Abusing images and sounds for indirect instruction injection in multi-modal llms. arXiv preprint arXiv:2307.10490, 2023.

[50]. Rodrigo Pedro, Daniel Castro, Paulo Carreira, and Nuno Santos. From prompt injections to sql injection attacks: How protected is your llm-integrated web application? arXiv preprint arXiv:2308.01990, 2023.

[51]. Richard Fang, Rohan Bindu, Akul Gupta, Qiushi Zhan, and Daniel Kang. LLM agents can autonomously hack websites. arXiv preprint arXiv:2402.06664, 2024.

[52]. Rodrigo Pedro, Miguel E. Coimbra, Daniel Castro, Paulo Carreira, and Nuno Santos. Prompt-to-sql injections in llm-integrated web applications: Risks and defenses. 2025 IEEE/ACM 47th International Conference on Software Engineering (ICSE), pages 1768-1780, 2025. URL https://api.semanticscholar.org/CorpusID:272856332.

[53]. MITRE Corporation. CVE-2024-5565: Vanna.AI Remote Code Execution Vulnerability, 2024. URL https://cve.org/CVErecord?id=CVe-2024-5565.

[54]. Anshuman Chhabra, Kartik Patwari, Chandana Kuntala, Deepak Kumar Sharma, Prasant Mohapatra, et al. Towards fair video summarization. Transactions on Machine Learning Research, 2023.

[55]. Xingxing Wei, Siyuan Liang, Ning Chen, and Xiaochun Cao. Transferable adversarial attacks for image and video object detection. arXiv preprint arXiv:1811.12641, 2018.

[56]. Zhipeng Wei, Jingjing Chen, Xingxing Wei, Linxi Jiang, Tat-Seng Chua, Fengfeng Zhou, and Yu-Gang Jiang. Heuristic black-box adversarial attacks on video recognition models. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 12338-12345, 2020.

[57]. Linxi Jiang, Xingjun Ma, Shaoxiang Chen, James Bailey, and Yu-Gang Jiang. Black-box adversarial attacks on video recognition models. In Proceedings of the 27th ACM International Conference on Multimedia, pages 864-872, 2019.

[58]. Guangke Chen, Fu Song, Zhe Zhao, Xiaojun Jia, Yang Liu, Yanchen Qiao, and Weizhe Zhang. Audiojailbreak: Jailbreak attacks against end-to-end large audio-language models. arXiv preprint arXiv:2505.14103, 2025.

[59]. Eugene Bagdasaryan, Rishi Jha, Vitaly Shmatikov, and Tingwei Zhang. Adversarial illusions in multi-modal embeddings. In 33rd USENIX Security Symposium (USENIX Security 24), pages 3009-3025, 2024.

[60]. Lukas Aichberger, Alasdair Paren, Yarin Gal, Philip Torr, and Adel Bibi. Attacking multimodal os agents with malicious image patches, 2025. URL https://arxiv.org/abs/2503.10809.

[61]. Cristian Pinzon, Juan F. De Paz, Javier Bajo, Alvaro Herrero, and Emilio Corchado. Aida-sql: An adaptive intelligent intrusion detector agent for detecting sql injection attacks. In 2010 10th International Conference on Hybrid Intelligent Systems, pages 73-78, 2010. doi:10.1109/HIS.2010.5600026.

[62]. Johann Rehberger. Deepseek ai: From prompt injection to account takeover, 2024. URL https://embracehered.com/blog/posts/2024/deepseek-ai-prompt-injection-to-xss-and-account-takeover/.

[63]. Sander Schulhoff, Jeremy Pinto, Anam Khan, L-F Bouchard, Chenglei Si, Svetlina Anati, Valen Tagliabue, Anson Liu Kost, Christopher Carnahan, and Jordan Boyd-Graber. Ignore this title and hackaprompt: Exposing systemic vulnerabilities of llms through a global scale prompt hacking competition. In The 2023 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics (ACL), 2023.

[64]. Stav Cohen, Ron Bitton, and Ben Nassi. Here comes the ai worm: Unleashing zero-click worms that target genai-powered applications. arXiv preprint arXiv:2403.02817, 2024.

[65]. Diego Gosmar, Deborah A Dahl, and Dario Gosmar. Prompt injection detection and mitigation via ai multi-agent nlp frameworks. arXiv preprint arXiv:2503.11517, 2025.

[66]. Sippo Rossi, Alisia Marianne Michel, Raghava Rao Mukkamala, and Jason Bennett Thatcher. An early categorization of prompt injection attacks on large language models, 2024. URL https://arxiv.org/abs/2402.00898.

[67]. U.S. AI Safety Institute. Technical blog: Strengthening ai agent hijacking evaluations. https://www.nist.gov/news-events/news/2025/01/technical-blog-strengthening-ai-agent-hijacking-evaluations, January 2025. Accessed: January 29, 2025.

[68]. Richard Fang, Rohan Bindu, Akul Gupta, and Daniel Kang. Llm agents can autonomously exploit one-day vulnerabilities. arXiv preprint arXiv:2404.08144, 2024.

[69]. Simon Bennetts. Owasp zed attack proxy. AppSec USA, 2013.

[70]. David Kennedy, Jim O'gorman, Devon Kearns, and Mati Aharoni. Metasploit: the penetration tester's guide. No Starch Press, 2011.

[71]. Nathalie Muehlberger. Csrf and xss: Practical examples using burp suite. Seminararbeit, Ausgewählte Kapitel der IT-Security, 2020. URL https://wiki.elvis.science/images/b/b3/Thesis.pdf. Accessed: 2026-02-05.

[72]. Rushi Mamtora, DP Sharma, and Jatin Patel. Server-side template injection with custom exploit. International Journal of Scientific Research in Science, Engineering and Technology, 2021.

[73]. Jean Rosemond Dora, Ladislav Hluchý, and Karol Nemoga. Ontology for blind sql injection. Computing and Informatics, 42(2):480-500, 2023.

[74]. Zhengliang Shi, Shen Gao, Xiuyi Chen, Yue Feng, Lingyong Yan, Haibo Shi, Dawei Yin, Pengjie Ren, Suzan Verberne, and Zhaochun Ren. Learning to use tools via cooperative and interactive agents. arXiv preprint arXiv:2403.03031, 2024.

[75]. Renxi Wang, Xudong Han, Lei Ji, Shu Wang, Timothy Baldwin, and Haonan Li. Toolgen: Unified tool retrieval and calling via generation. arXiv preprint arXiv:2410.03439, 2024.

[76]. Uncovering Security Threats and Architecting Defenses in Autonomous Agents: A Case Study of OpenClaw,https://arxiv.org/html/2603.12644

[77]. Trojan's Whisper: Stealthy Manipulation of OpenClaw through Injected Bootstrapped Guidance,https://browse-export.arxiv.org/abs/2603.19974

[78]. OpenClaw Vulnerability: Website-to-Local Agent Takeover,https://www.oasis.security/blog/openclaw-vulnerability

[79]. Ronny Ko, Jiseong Jeong, Shuyuan Zheng, Chuan Xiao, Tae-Wan Kim, Makoto Onizuka, and Won-Yong Shin. Seven security challenges that must be solved in cross-domain multi-agent llm systems. arXiv preprint arXiv:2505.23847, 2025.

[80]. Mohamed Amine Ferrag, Norbert Tihanyi, Djallel Hamouda, Leandros Maglaras, and Merouane Debbah. From prompt injections to protocol exploits: Threats in llm-powered ai agents workflows. arXiv preprint arXiv:2506.23260, 2025.

[81]. Introduction to mcp, 2025. URL https://modelcontextprotocol.io/introduction. Accessed: 2025-06-04.

[82]. Rao Surapaneni, Miku Jha, Michael Vakoc, and Todd Segal. A2a: A new era of agent interoperability, April 2025. URL https://developers.googleblog.com/en/a2a-a-new-era-of-agent-interoperability/. Accessed: 2025-06-04.

[83]. Gaowei Chang. Agentnetworkprotocol (anp) github repository. https://github.com/agent-network-protocol/AgentNetworkProtocol, 2024. Accessed: 2025-06-04.

[84]. Agent communication protocol: Welcome, 2024. URL https://agentcommunicationprotocol.dev/introduction/welcome. Accessed: 2025-06-04.

[85]. Saman Taghavi Zargar, James Joshi, and David Tipper. A survey of defense mechanisms against distributed denial of service (ddos) flooding attacks. IEEE communications surveys & tutorials, 15(4):2046-2069, 2013.

[86]. Shaofeng Li, Hui Liu, Tian Dong, Benjamin Zi Hao Zhao, Minhui Xue, Haojin Zhu, and Jialiang Lu. Hidden backdoors in human-centric language models. In Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, pages 3123-3140, 2021.

[87]. Wei Zou, Runpeng Geng, Binghui Wang, and Jinyuan Jia. Poisonedrag: Knowledge poisoning attacks to retrieval-augmented generation of large language models. ArXiv, abs/2402.07867, 2024. URL https://api.semanticscholar.org/CorpusID:267626957.

[88]. Vale Tolpegin, Stacey Truex, Mehmet Emre Gursoy, and Ling Liu. Data poisoning attacks against federated learning systems. In European symposium on research in computer security, pages 480-501. Springer, 2020.

[89]. Shubhi Shukla, Manaar Alam, Sarani Bhattacharya, Pabitra Mitra, and Dehdeep Mukhopadhyay. "whispering mlaas": Exploiting timing channels to compromise user privacy in deep neural networks. IACR Transactions on Cryptographic Hardware and Embedded Systems, pages 587-613, 2023.

[90]. Edoardo Debenedetti, Giorgio Severi, Nicholas Carlini, Christopher A Choquette-Choo, Matthew Jagielski, Milad Nasr, Eric Wallace, and Florian Tramer. Privacy side channels in machine learning systems. In 33rd USENIX Security Symposium (USENIX Security 24), pages 6861-6848, 2024.

[91]. Venkatraman Renganathan and Tyler Holt Summers. Spoof resilient coordination for distributed multi-robot systems. 2017 International Symposium on Multi-Robot and Multi-Agent Systems (MRS), pages 135-141, 2017. URL https://api.semanticscholar.org/CorpusID:7897062.

[92]. Richard M. Chang, Guofei Jiang, Franjo Ivancic, Sriram Sankaranarayanan, and Vitaly Shmatikov. Inputs of coma: Static detection of denial-of-service vulnerabilities. 2009 22nd IEEE Computer Security Foundations Symposium, pages 186-199, 2009. URL https://api.semanticscholar.org/CorpusID:6355518.

[93]. Shiyi Yang, Zhibo Hu, Xinshu Li, Chen Wang, Tong Yu, Xiwei Xu, Liming Zhu, and Lina Yao. Drunkagent: Stealthy memory corruption in llm-powered recommender agents. arXiv preprint arXiv:2503.23804, 2025.

[94]. Sumeet Motwani, Mikhail Baranchuk, Martin Strohmeier, Vijay Bolina, Philip Torr, Lewis Hammond, and Christian Schroeder de Witt. Secret collusion among ai agents: Multi-agent deception via steganography. Advances in Neural Information Processing Systems, 37:73439-73486, 2024.

[95]. Rana Shahroz, Zhen Tan, Sukwon Yun, Charles Fleming, and Tianlong Chen. Agents under siege: Breaking pragmatic multi-agent llm systems with optimized prompt attacks. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 9661-9674, 2025.

[96]. Yang Li, Wenhao Zhang, Jianhong Wang, Shao Zhang, Yali Du, Ying Wen, and Wei Pan. Aligning individual and collective objectives in multi-agent cooperation. Advances in Neural Information Processing Systems, 37: 44735-44760, 2024.

[97]. Bei Chen, Gaolei Li, Xi Lin, Zheng Wang, and Jianhua Li. Blockagents: Towards byzantine-robust llm-based multi-agent coordination via blockchain. In Proceedings of the ACM Turing Award Celebration Conference-China 2024, pages 187-192, 2024.

[98]. Zijun Liu, Yanzhe Zhang, Peng Li, Yang Liu, and Diyi Yang. A dynamic llm-powered agent network for task-oriented agent collaboration. In First Conference on Language Modeling, 2024.

[99]. Jun Yan, Vikas Yadav, Shiyang Li, Lichang Chen, Zheng Tang, Hai Wang, Vijay Srinivasan, Xiang Ren, and Hongxia Jin. Backdooring instruction-tuned large language models with virtual prompt injection. arXiv preprint arXiv:2307.16888, 2023.

[100]. Jinyuan Jia, Yupei Liu, and Neil Zhenqiang Gong. Badencoder: Backdoor attacks to pre-trained encoders in self-supervised learning. In 2022 IEEE Symposium on Security and Privacy (SP), pages 2043–2059. IEEE, 2022.

[101]. Rui Zeng, Xi Chen, Yuwen Pu, Xuhong Zhang, Tianyu Du, and Shouling Ji. Clibe: detecting dynamic backdoors in transformer-based nlp models. arXiv preprint arXiv:2409.01193, 2024.

[102]. JFrog Security Research Team. Jfrog and hugging face join forces to expose malicious ml models, March 2025. URL https://jfrog.com/community/ai/jfrog-and-hugging-face-join-forces-to-expose-malicious-ml-models/. Accessed: 2025-06-04.

[103]. Wei Duan, Jie Lu, and Junyu Xuan. Group-aware coordination graph for multi-agent reinforcement learning. arXiv preprint arXiv:2404.10976, 2024.

[104]. Mert Cemri, Melissa Z Pan, Shuyi Yang, Lakshya A Agrawal, Bhavya Chopra, Rishabh Tiwari, Kurt Keutzer, Aditya Parameswaran, Dan Klein, Kannan Ramchandran, et al. Why do multi-agent llm systems fail? arXiv preprint arXiv:2503.13657, 2025.

[105]. Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Beibin Li, Erkang Zhu, Li Jiang, Xiaoyun Zhang, Shaokun Zhang, Jiale Liu, et al. Autogen: Enabling next-gen llm applications via multi-agent conversations. In First Conference on Language Modeling, 2024.

[106]. Xuezhou Zhang, Yuzhe Ma, Adish Singla, and Xiaojin Zhu. Adaptive reward-poisoning attacks against reinforcement learning. In International Conference on Machine Learning, pages 11225–11234. PMLR, 2020.

[107]. David F Ferraiolo, Ravi Sandhu, Serban Gavrila, D Richard Kuhn, and Ramaswamy Chandramouli. Proposed nist standard for role-based access control. ACM Transactions on Information and System Security (TISSEC), 4 (3):224–274, 2001.

[108]. Boyi Zeng, Lizheng Wang, Yuncong Hu, Yi Xu, Chenghu Zhou, Xinbing Wang, Yu Yu, and Zhouhan Lin. Hurfe: Human-readable fingerprint for large language models. Advances in Neural Information Processing Systems, 37: 126332–126362, 2024.

[109]. Timour Igamberdiev, Thomas Arnold, and Ivan Habernal. Dp-rewrite: Towards reproducibility and transparency in differentially private text rewriting. arXiv preprint arXiv:2208.10400, 2022.

[110]. Weiyan Shi, Ryan Shea, Si Chen, Chiyuan Zhang, Ruoxi Jia, and Zhou Yu. Just fine-tune twice: Selective differential privacy for large language models. arXiv preprint arXiv:2204.07667, 2022.

[111]. Jie Huang, Hanyin Shao, and Kevin Chen-Chuan Chang. Are large pre-trained language models leaking your personal information? arXiv preprint arXiv:2205.12628, 2022.

[112]. Feng He, Tianqing Zhu, Dayong Ye, Bo Liu, Wanlei Zhou, and Philip S Yu. The emerged security and privacy of llm agent: A survey with case studies. arXiv preprint arXiv:2407.19354, 2024.

[113]. William Enck, Peter Gilbert, Seungyeop Han, Vasant Tendulkar, Byung-Gon Chun, Landon P Cox, Jaeyeon Jung, Patrick McDaniel, and Anmol N Sheth. Taintdroid: an information-flow tracking system for realtime privacy monitoring on smartphones. ACM Transactions on Computer Systems (TOCS), 32(2):1–29, 2014.

[114]. Shoaib Ahmed Siddiqui, Radhika Gaonkar, Boris Köpf, David Krueger, Andrew Paverd, Ahmed Salem, Shruti Tople, Lukas Wutschitz, Menglin Xia, and Santiago Zanella-Béguelin. Permissive information-flow analysis for large language models. arXiv preprint arXiv:2410.03055, 2024.

[115]. Pierre Peigne, Mikolaj Kniejski, Filip Sondej, Matthieu David, Jason Hoelscher-Obermaier, Christian Schroeder de Witt, and Esben Kran. Multi-agent security tax: Trading off security and collaboration capabilities in multi-agent systems. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 27573–27581, 2025.

[116]. Satbir Singh. Llm-based agents: The benefits and the risks. https://www.enkryptai.com/blog/llm&agents&benefits&&srisks, February 2025. Accessed: 2025-08-21.

[117]. Christian Schroeder de Witt. Open challenges in multi-agent security: Towards secure systems of interacting ai agents. arXiv preprint arXiv:2505.02077, 2025.

[118]. Anshuman Chhabra, Peizhao Li, Prasant Mohapatra, and Hongfu Liu. " what data benefits my classifier?" enhancing model performance and interpretability through influence-based data selection. In ICLR, 2024.

[119]. Anshuman Chhabra, Bo Li, Jian Chen, Prasant Mohapatra, and Hongfu Liu. Outlier gradient analysis: Efficiently identifying detrimental training samples for deep learning models. In ICML, 2025.

[120]. Sahar Abdelnabi, Aideen Fay, Giovanni Cherubin, Ahmed Salem, Mario Fritz, and Andrew Paverd. Get my drift? catching llm task drift with activation deltas. 2025 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML), pages 43–67, 2024. URL https://api.semanticscholar.org/CorpusID:270211056.

[121]. Dami Choi, Yonadav Shavit, and David K Duvenaud. Tools for verifying neural models' training data. Advances in Neural Information Processing Systems, 36:1154-1188, 2023.

[122]. Dan Petrovic. Advanced interpretability techniques for tracing llm activations. https://dejan.ai/blog/advanced-interpretability-techniques-for-tracing-llm-activations/, March 2025. Accessed: 2025-08-21.

[123]. Tom Sander, Pierre Fernandez, Alain Durmus, Matthijs Douze, and Teddy Furon. Watermarking makes language models radioactive. Advances in Neural Information Processing Systems, 37:21079-21113, 2024.

[124]. Meng Hao, Hongwei Li, Hanxiao Chen, Pengzhi Xing, Guowen Xu, and Tianwei Zhang. Iron: Private inference on transformers. Advances in neural information processing systems, 35:15718-15731, 2022.

[125]. Georgios A Kaissis, Marcus R Makowski, Daniel Rückert, and Rickmer F Braren. Secure, privacy-preserving and federated machine learning in medical imaging. Nature Machine Intelligence, 2(6):305-311, 2020.

[126]. Francis Dutil, Alexandre See, Lisa Di Jorio, and Florent Chandelier. Application of homomorphic encryption in medical imaging. arXiv preprint arXiv:2110.07768, 2021.

[127]. Harshal Tapsamudre, Arun Kumar, Vikas Agarwal, Nisha Gupta, and Sneha Mondal. Ai-assisted controls change management for cybersecurity in the cloud. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pages 12629-12635, 2022.

[128]. Leo de Castro, Antigoni Polychroniadou, and Daniel Escudero. Privacy-preserving large language model inference via gpu-accelerated fully homomorphic encryption. In Neurips Safe Generative AI Workshop 2024, 2024.

[129]. Deevashwer Rathee, Dacheng Li, Ion Stoica, Hao Zhang, and Raluca Popa. Mpc-minimized secure llm inference. arXiv preprint arXiv:2408.03561, 2024.

[130]. Tao Lu, Haoyu Wang, Wenjie Qu, Zonghui Wang, Jinye He, Tianyang Tao, Wenzhi Chen, and Jiaheng Zhang. An efficient and extensible zero-knowledge proof framework for neural networks. Cryptology ePrint Archive, 2024.

[131]. Yurun Chen, Xavier Hu, Keting Yin, Juncheng Li, and Shengyu Zhang. Evaluating the robustness of multimodal agents against active environmental injection attacks. arXiv preprint arXiv:2502.13053, 2025.

[132]. Shuyan Zhou, Frank F Xu, Hao Zhu, Xuhui Zhou, Robert Lo, Abishek Sridhar, Xianyi Cheng, Tianyue Ou, Yonatan Bisk, Daniel Fried, et al. Webarena: A realistic web environment for building autonomous agents. arXiv preprint arXiv:2307.13854, 2023.

[133]. De Chezelles, Thibault Le Sellier, Sahar Omidi Shayegan, Lawrence Keunho Jang, Xing Han Lu, Ori Yoran, Dehan Kong, Frank F Xu, Siva Reddy, Quentin Cappart, et al. The browsergym ecosystem for web agent research. arXiv preprint arXiv:2412.05467, 2024.

[134]. Ke Yang, Yao Liu, Sapana Chaudhary, Rasool Fakoor, Pratik Chaudhari, George Karypis, and Huzefa Rangwala. Agentocam: A simple yet strong baseline for llm-based web agents. arXiv preprint arXiv:2410.13825, 2024.

[135]. Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, et al. Training language models to follow instructions with human feedback. Advances in neural information processing systems, 35:27730-27744, 2022.

[136]. Xiang Deng, Yu Gu, Boyuan Zheng, Shijie Chen, Samuel Stevens, Boshi Wang, Huan Sun, and Yu Su. Mind2web: Towards a generalist agent for the web, 2023. URL https://arxiv.org/abs/2306.06070.

[137]. Yaxin Luo, Zhaoyi Li, Jiacheng Liu, Jiacheng Cui, Xiaohan Zhao, and Zhiqiang Shen. Open captcha world: A comprehensive web-based platform for testing and benchmarking multimodal llm agents. arXiv preprint arXiv:2505.24878, 2025.

下期预告:“威胁三:多智能体与协议层威胁”,敬请期待。

山石网科是中国网络安全行业的技术创新领导厂商,由一批知名网络安全技术骨干于2007年创立,并以首批网络安全企业的身份,于2019年9月登陆科创板(股票简称:山石网科,股票代码:688030)。

现阶段,山石网科掌握30项自主研发核心技术,申请560多项国内外专利。山石网科于2019年起,积极布局信创领域,致力于推动国内信息技术创新,并于2021年正式启动安全芯片战略。2023年进行自研ASIC安全芯片的技术研发,旨在通过自主创新,为用户提供更高效、更安全的网络安全保障。目前,山石网科已形成了具备“全息、量化、智能、协同”四大技术特点的涉及基础设施安全、云安全、数据安全、应用安全、安全运营、工业互联网安全、信息技术应用创新、AI安全、安全服务、安全教育等10大类产品及服务,50余个行业和场景的完整解决方案。