2024.04.19
This standard document provides a framework for evaluating the resilience of large language models (LLMs) against adversarial attacks. The framework applies to the testing and validation of LLMs across four attack classifications: L1 Random, L2 Blind-Box, L3 Black-Box, and L4 White-Box. Attack effectiveness is assessed with two key metrics: the Attack Success Rate (R) and the Decline Rate (D). The document outlines a diverse range of attack methodologies, such as instruction hijacking and prompt masking, to comprehensively test an LLM's resistance to different adversarial techniques. The testing procedure establishes a structured approach for evaluating LLM robustness, enabling developers and organizations to identify and mitigate potential vulnerabilities and, ultimately, to improve the security and reliability of AI systems built on LLMs. By establishing the "Large Language Model Security Testing Method," WDTA seeks to lead the way in creating a digital ecosystem where AI systems are not only advanced but also secure and ethically aligned. It symbolizes our dedication to a future where digital technologies are developed with a keen sense of their societal implications and are leveraged for the greater benefit of all.
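The abstract names the two metrics without defining them here. Below is a minimal sketch of how such metrics are commonly computed; the formulas (R as the fraction of adversarial prompts that elicit the targeted behavior, D as the fraction the model declines to answer) and all names in the code are illustrative assumptions, not the standard's normative definitions.

```python
# Hypothetical sketch of the two metrics named in the abstract.
# Assumed definitions (NOT taken from the WDTA standard itself):
#   R = successful attacks / total adversarial prompts
#   D = declined (refused) responses / total adversarial prompts
from dataclasses import dataclass

@dataclass
class AttackResult:
    succeeded: bool  # model produced the adversarial target output
    declined: bool   # model refused to answer the prompt

def attack_success_rate(results: list[AttackResult]) -> float:
    """R: fraction of adversarial prompts that elicited the target behavior."""
    return sum(r.succeeded for r in results) / len(results)

def decline_rate(results: list[AttackResult]) -> float:
    """D: fraction of adversarial prompts the model declined to answer."""
    return sum(r.declined for r in results) / len(results)

# Example: 100 prompts -> 12 successful attacks, 70 refusals, 18 benign answers.
results = ([AttackResult(True, False)] * 12
           + [AttackResult(False, True)] * 70
           + [AttackResult(False, False)] * 18)
print(f"R = {attack_success_rate(results):.2f}")  # R = 0.12
print(f"D = {decline_rate(results):.2f}")         # D = 0.70
```

Under these assumed definitions, a lower R and a higher D on adversarial inputs would indicate a more resilient model.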
2024.01.29
Through in-depth analysis and research into AI security risks, we firmly believe that enterprises and organizations can adopt more effective management measures to ensure data security and privacy protection. In the era of the digital economy, enterprises and organizations must attach great importance to AI security risk management and actively respond to the full range of AI security challenges. Guided by the recommendations of the AI Security White Paper (《AI安全白皮书》), we hope practitioners will better grasp the key points of AI security management and ensure the sustainable development of the digital economy. We also look forward to more enterprises and organizations joining the cause of AI security, contributing together to a safe and healthy digital economy.