Attack Surface of LLM Application

Prompt Injection/JB Concept +1 — Data Source

In The last episode…

I discussed my initiation and my foundation concept in previous blog. I strongly recommend reading it so you will not miss any important concept.

You should have an idea of the difference between jailbreak and prompt injection, how to approach an LLM, and something I would try. In short, the previous episode discussed:

  1. Jailbreak vs Prompt Injection

  2. Understand the model and business

  3. Trigger hallucination

To keep it simple, I would include only three ideas (or fewer if my blog is too long). I would also include some basic LLM knowledge so you can understand the attack surface of an LLM application.

Rule of Thumb #4 — It is not only system instruction

As mentioned in the previous blog, Rule of Thumb #2, a model does not simply appear and start talking to you. A business would customize it for its specific purpose. This brings up the question: what are the customizations

As an attacker, I love customizations. The more they customize, the larger the attack surface becomes. Once we have some understanding of the customizations in the target application, we can plan our attack for potential vulnerabilities.

There are many ways to customize/instruct a model to address business cases, including:

System instruction

The basic customization provides information about context, task, input format, output format, tone, style, guidelines, protection, rules, and persona (?). This helps the model understand how to respond in the expected manner.

There is a blog from Sheila Teo who won Singapore’s GPT-4 Prompt Engineering Competition in 2023. You can learn how a good developer would craft a good prompt.

A general attack goal is to override the instruction, which is prompt injection.

RAG/Embedding (For non-word data)/Vector Store(storing vectors)?

Most models are trained on a large general set of language data to ‘understand’ words, meanings, and context. However, this is not ideal for businesses, as they are too ‘generic.’ So, how can we pass the specific information we know to the model?

For example, let’s say there is a set of legal documents, and I want the model to help me summarize what is happening there so I can reduce the need for human resources , no not to cut jobs, but to improve productivity. How can I implement this?

One way is called RAG (Retrieval-Augmented Generation). It allows the model to ‘understand’ external knowledge that didn’t appear in its training set, enabling it to know how to answer based on the trained parameters while processing the vectors in a word document. This is a good one with diagram to show what is it.

If you see an LLM app processing an uploaded document, there’s an 80% chance it’s doing RAG (Retrieval-Augmented Generation), a 10% chance it’s performing OCR (Optical Character Recognition) to extract text, a 5% chance it’s handling unrestricted file uploads with low risk, a 4% chance of insecure direct object access, and a 1% chance of a web shell. Congratulations!

Tool call/Function/Pipe/Agent

If I want a model to not only summarize external data but also interact with or perform actions on other components, how can I do that?

For example, there is a good CTF simulate airline assistant from Wiz. A model is trained to understand the keywords in the user’s prompt and know how to generate a tool call message to activate the tool and perform the action.

Reuse the example in previous blog and the example from LLM Studio Guide. The following is the output from the model when I instruct it to make a GET request.

“id”: “chatcmpl-0ewo4zr373t8gdjzwyvds”, “object”: “chat.completion”, “created”: 1743925830, “model”: “llama-3.2–3b-instruct”, “choices”: [ { “index”: 0, “logprobs”: null, “finish_reason”: “tool_calls”, “message”: { “role”: “assistant”, “tool_calls”: [ { “id”: “290412298”, “type”: “function”, “function”: { “name”: “open_safe_url”, “arguments”: “{\”url\”:\”https://jsonplaceholder.typicode.com/todos/1\”}” } } ] } } ], “usage”: { “prompt_tokens”: 1178, “completion_tokens”: 27, “total_tokens": 1205 }, “stats”: {}, “system_fingerprint”: “llama-3.2–3b-instruct” }

“The model generates this tool call message to the LLM Studio server, which then calls the function open_safe_url with the given argument and responds with the result in a defined format. The detailed code can be found in the example from LLM Studio Guide.

If you want to know it in a more “raw” way, this is a good blog from DhanushKumar.

As an attacker, this is an important part of LLM attacks, as it can introduce vulnerabilities we are familiar with. We should try to understand what the tool call is doing and also enumerate the available tools using our prompt.

Lang Chain

An OG way to process, format, and pipe the request and response between models. It is a set of libraries that support these functions in a more low-level way.

I strongly recommend to take some free self-paced course from Nvidia to know what is it.

It is useful in code reviews or MLOps pipeline reviews. If a model is defended by another model or protection mechanism, then it’s likely that LangChain is involved. We may need to consider controlling the output of a model to bypass the defense, e.g., ‘Return exactly ‘XXXXX’ during summarization…’

Fine tuning

As we discussed, models are trained for general purposes using general datasets (except Claude/DeepSeek, based on my limited knowledge of them). If we want the model to have specific knowledge of a domain or train its responses at the model level, fine-tuning is an option besides RAG/System prompts. Fine-tuning can create a more specialized model for a specific task.

This is a good blog discussing RAG vs Fine tuning

I encountered a fine-tuned model designed to review a large amount of user conversations and perform summarization. In this case, I tried to sabotage it by twisting the conversation summarization to cause business impact.

For example, I looked for the highest-ranking person in the conversation, used prompt injection to manipulate their comments, and then generated a summary that could contradict the original comments made by this high-ranking individual.

Rule of Thumb #5 —SSRF? RCE? CSRF? SQLi?

“Hey Shiba, I am trying SSRF, why is it not work?”. With the basic knowledge about different external data source input, I hope you would understand why this question is interesting. To exploit a SSRF (Server Side Request Forgery), we need to able to change the target of a server request as the minimum requirement.

In LLM application assessment, unless there is an implementation allowing you to specific the target in prompt, otherwise it is very unlikely to happen.

For example, I implemented the following code in the CTF I made for my friends to experience LLM assessment:

def open_safe_url() -> dict:
    try:        # Make a GET request
        url = "http://localhost:8000/test.html"
        response = requests.get(url)                    # Check if the request was successful
        if response.status_code == 200:
            return {
                  "status": "success",
                    "message": f"Fetched content from {url}",
                    "response_body": response.text  # Return the response body
                }
        else:
                return {
                    "status": "error",
                    "message": f"Failed to fetch content from {url}, Status code: {response.status_code}"
                }        except Exception as e:
        return {"status": "error", "message": f"An error occurred: {str(e)}"}

By hardcoding the URL link in the tool code, no matter what the user instructed, the python would only be able to make a call to the defined link. However, there is also an issue in this implementation.

In this implementation, it returns the full HTTP response instead of further data parsing. Hence, it is possible to use prompt injection to extract the full raw HTTP response and look for sensitive data.

With this in mind, you will need to check the boundary of the tool code implemented in the LLM application during assessment.

Next…?

This blog builds the knowledge of LLM application for planning our prompt injection attack when the LLM application integrate with external data source. Apart from extracting prompt, we also need to check what is the tool/integration the LLM app is possibly using.

Next, we will have a look about attention and context.

Last updated