AI Security Assessment ▶ AI Security Testing

What is an AI Security Assessment?

The use of artificial intelligence (AI) in enterprises today extends far beyond simple chatbots. Modern AI applications range from LLM-based chat applications and RAG systems with access to internal knowledge sources to AI agents that invoke external tools and APIs. These systems increasingly process business-critical data and are deeply integrated into existing business processes and IT landscapes.

From a security perspective, the relevant concern is not the language model in isolation but the full AI application: model, context construction, RAG pipeline, orchestration logic, tools and APIs, permissions, data flows, output processing, and operating environment. We therefore test the concrete implementation in its enterprise context, not just the model. This shows which attacks are actually possible, which actions the system can trigger, and what business impact could result.

Many LLM-based AI applications also behave non-deterministically — their outputs are context-dependent and only partially predictable, as a significant part of the control logic resides within the language model itself. These model- and context-specific risks require specialized assessment methods. In real projects, particularly critical risks often arise from overprivileged agents and integrations, for example when an assistant is given access to entire SharePoint areas, HR databases, or other sensitive data sources. At the same time, classical vulnerabilities remain relevant: insecure APIs, broken access control, XSS, SSRF, remote code execution, misconfigurations, or insufficient logging can also be exploitable in AI systems and significantly increase the impact of AI-specific attacks.

placeholder for background/neural-network.jpg

Objective

Identification of vulnerabilities in AI-based applications and assessment of risks arising from AI-specific threat scenarios

Question

How resilient is the AI application against prompt injections, data manipulation and misuse, and what can attackers achieve in the worst case?

Scope

LLM integrations, RAG pipelines, agentic systems, AI models, APIs and surrounding infrastructure

AI Security Assessment Process: Methodology & Approach

Our AI Security Assessment provides a systematic evaluation of your AI-based applications with regard to AI-specific vulnerabilities and misconfigurations. Building on established frameworks such as the OWASP Top 10 for LLM Applications 2025, the OWASP Top 10 for Agentic Applications 2026, and the MITRE ATLAS knowledge base, we combine automated attack techniques with expert-driven manual testing.

The assessment typically begins with a threat analysis workshop, aiming to understand the AI application’s architecture, data flows, and trust boundaries, and to derive specific threat scenarios. A clear distinction between the different system types and their architectural components is essential: LLM-based chat applications, RAG systems with connected data sources, and agentic systems with tool access and autonomous decision-making each present different attack surfaces and risk profiles.

A central element of the threat analysis is examining the information flow along the entire processing chain. In particular, we identify the trust boundaries between system components: At which points do untrusted inputs — such as user queries, uploaded documents, or externally retrieved content — enter the processing pipeline? How and where is this data merged with the model context? And most importantly: Which data flows and decisions take place within the LLM context and are therefore influenceable by an attacker through manipulated inputs? Based on this analysis, we derive concrete attack scenarios and prioritize the resulting test cases by business context and technical risk.

In general, the assessment is based on the approach of an examination that is as comprehensive as possible. However, depending on the type of application or system and the relevant threats, a risk-based approach is also possible (comparable to a penetration test ). In this case, the focus is on particularly security-critical or endangered areas, whereby the scope of the test is determined by the time budget agreed upon in advance.

Building on the results of the threat analysis, we perform both automated and manual analyses to identify vulnerabilities. As a first step, we deploy specialized tools that automatically execute a broad range of attack variants against the AI application — across different input modalities, including text and, where in scope, files and images. These automated tests systematically cover known attack categories in particular. However, since they are inherently limited to predefined patterns, our assessors complement the results with targeted manual analysis. They evaluate findings in the context of the specific system architecture, examine orchestration logic for logical weaknesses, and conduct application-specific attack scenarios that go beyond standardized testing catalogs.

Because many LLM-based AI systems behave non-deterministically, we deliberately execute attacks multiple times and in variations to reliably assess actual exploitability. We evaluate both the reproducibility and the consistency of identified vulnerabilities, delivering well-founded risk assessments rather than individual snapshots. A gray- or white-box approach is recommended.

Core Components of SCHUTZWERK AI Security Assessment

LLM-based applications can typically be categorized by their degree of autonomy into one of three types: chat applications, RAG systems, and agentic systems. Depending on the type and architecture of the application under review, the following areas are examined as part of the assessment:

Prompt injection and jailbreak resilience

Testing for direct prompt injections (manipulation via user inputs)
Testing for indirect prompt injections (injected instructions in external data sources, documents, or emails)
Testing for multi-turn attacks (distributed malicious instructions across multiple messages)
Testing the effectiveness of implemented safeguards (e.g., system prompt hardening, input filters, output validation)

Sensitive information disclosure

Testing for disclosure of system prompts and internal configurations
Testing for data leakage through model outputs (personal data, trade secrets, credentials)
Testing data exfiltration controls across context boundaries

Output processing and handling

Testing for insecure processing of model outputs in downstream components (e.g., export, browsers, shells, automation systems)
Testing for classical attack vectors through manipulated outputs (XSS, SSRF, remote code execution)
Testing input and output validation and sanitization across all formats (e.g., text, audio, files, images)

RAG pipeline security

Testing for document poisoning (injection of manipulated documents into the knowledge base)
Testing access controls at the document level
Testing vector database and embedding integrity

Agentic security

Testing for agent goal hijacking (redirecting the agent toward unintended objectives)
Testing for tool misuse and privilege escalation
Testing for overprivileged agents and integrations with access to overly broad data sources or functions
Testing for memory poisoning (manipulation of the agent’s persistent memory)
Testing permission boundaries and the least-privilege principle
Testing for agent communication poisoning in multi-agent systems
Testing MCP server integrations (tool poisoning, command injection, privilege escalation via scope creep)
Testing secure integration of MCP servers and external APIs (authentication, input validation, permission boundaries)

Resources and availability

Testing for resource-based attacks (Denial of Wallet — targeted inputs that massively increase costs or resource consumption of the inference infrastructure)
Testing for unbounded consumption (missing rate limiting, excessive token usage)

Supply chain and infrastructure

Testing supply chain security of frameworks, models, and dependencies
Testing the security of model hosting and inference infrastructure
Testing access controls and authorization for AI components and APIs
Testing logging, monitoring, and auditability of AI interactions

AI Security Assessment and Relevant Regulations & Standards

The security of AI applications is addressed by a growing number of regulations and standards. An AI Security Assessment supports you in demonstrably meeting the technical requirements for robustness and cybersecurity — particularly for high-risk AI systems:

EU AI Act — The EU regulation on artificial intelligence defines risk-based requirements for AI systems, including transparency obligations, technical documentation, and requirements for robustness and cybersecurity. An AI Security Assessment supports you in technically verifying the robustness and cybersecurity requirements — particularly for high-risk AI systems pursuant to Art. 15 of the regulation.
OWASP Top 10 for LLM Applications 2025 — The internationally recognized catalog of the most critical security risks for AI language models forms a central foundation for our assessment methodology. It covers, among others, Prompt Injection, Sensitive Information Disclosure, Supply Chain Vulnerabilities, and Excessive Agency.
OWASP Top 10 for Agentic Applications 2026 — This complementary catalog specifically addresses the security risks of agentic AI systems, including Agent Goal Hijack, Tool Misuse, and Memory & Context Poisoning.
OWASP MCP Top 10 (Beta) — This catalog, currently in development, addresses the specific security risks of the Model Context Protocol (MCP), an emerging protocol increasingly used for tool integration in AI systems. Identified risk categories include Tool Poisoning, Command Injection, Privilege Escalation via Scope Creep, and Context Injection.
MITRE ATLAS — ATLAS (Adversarial Threat Landscape for Artificial-Intelligence Systems) is a knowledge base of adversary tactics, techniques, and case studies for AI systems and serves as a reference for systematic assessment.

Assessment Results: Risk Evaluation & Security Measures

As a result of the assessment we will provide a detailed report. Depending on the type and scope of the project, the final report will include the following parts:

Management summary with a description of the results and the security level
Description of the project approach, scope, schedule and methodology
Detailed description of identified vulnerabilities in order to understand underlying issues and to enable reconstruction of possible attacks (where necessary with proof-of-concept implementation)
Detailed description of the iterative exploitation process when using chained vulnerabilities
Risk assessment of identified vulnerabilities taking into account the IT environment or the application context (risk classification: low, medium, high, critical)
Description of measures to remedy the vulnerabilities
If necessary, a description of higher-level strategy, concept and process-related measures or optimization suggestions.

If desired, the following points can be additionally integrated into the final report:

A detailed overview of the AI application architecture and data flows
A threat model specific to the AI components and their integration
Recommendations for secure development and operational practices for AI applications
Prioritized remediation recommendations based on a defense-in-depth approach

AI Security Assessment