Let's Recon and Plan Step-by-Step

Generated by https://deepai.org/

Introduction

Earlier this year, I was studying how a model can call tools and process the result. Since I was completely new to this field, so I used OpenWebUI and llama3.1:latest to build a small lab. When I was playing this lab, I found some common issue of unsuccessful prompts and learnt how to improve them. It did great help when I was learning and conducting prompt injection or jailbreak prompt.

Lab Setup

Setting in OpenWebUI:

System prompt:

In tool setting, create a tool:

This setting can quickly form a lab of LLM + API call integration. Remember to define the content in the /test endpoint. In the system prompt, it is assumed that the endpoint should return a JSON of days. The LLM will then process the data by summing it up, for example:

Then, the LLM would say 4 (or hallucinate something else LOL).

This is a vulnerable setting for:

  1. Defense by system prompt only

  2. Lack of tool output post process

  3. Returning unnecessary data from endpoint

Despite the above vulnerable setting, personally I think hardcoding a link can limit attacks everybody like, e.g. SSRF.

Problem

If the lab is setup successfully, it should return meaningless terms like above. If you encounter this, then it is likely you don't know:

  • Don’t know how to recon what LLM can do

  • Don’t know how to identify suitable target action

  • Don’t know how to prompt for target action

  • Don’t know how to prompt based on recon result

  • Don’t know how to identify and bypass protection

This blog is going to share some small trick I used.

Recon

Recon is the most important in any security test. I put a list of basic things I would check:

Can/Cannot Do

For example, in the lab, the following prompt can show details about what is instructed to do. We can observe it is integrated with tool, the "business" action it designed to do, and the fixed response if our prompt hits some condition.

It is very interesting to recon LLM behaviour for exploitation. Further instances can be found in "Jailbreak Prompt Build Journel" Sections. They are some ways to recon and trigger the LLM to return evil things by leveraging LLM behavior.

Response

Observe the pattern when it hits some special mechanism, such as special system prompt, input filter, and output filter. For example, I just removed one word from the lab, then it change its response following the system instruction:

Limitation

After knowing what the LLM can do, we also need to know what we cannot prompt. So that we can increase the success chance:

Plan our Target

After recon, we have idea about what the LLM app is designed for. So we can start planning what to do. First of all, we cannot forget a LLM is basically a transformer that transform our input into expected output:

Hence, we need to set up a suitable and applicable target for our attack.

In the system prompt of this lab, the prompt tasked the LLM to make a GET HTTP request:

Hence, I will target to know "What is the raw HTTP Response", like a common Appsec Pentester.

I also encountered some application that preforms document negotiation. Since it can be very difficult to trace all comments, LLM can do a great help and summarize it. Hence, as an attacker, we can perform prompt injection here to manipulate the content of the summary:

There are also other scenarios:

If the LLM would read some GitHub repo for further tasks, we can inject payload into Issues and trigger the client agent to execute malicious action. My other blog demonstrated this possible attack flow:

Content Manipulation is useful against text processing function. A real life example is:

Absolutely, Jailbreak and ask for content that violates the LLM safety guardrail is one of targets.

In this stage, it is similar to perform a web app test. Using prompt (or ask our client LOL), we may need to consider the following before a prompt injection:

  • Where can I put the payload?

  • What (LLM/Workflow/Human) can trigger the payload?

  • Is there any file upload? Does the app do embedding? How a token is counted?

  • How does the LLM get my input? API Call? Vector store Search?

  • How does the LLM store a chat history? Lang Graph short term memory? Or a database?

  • Is there any agentic RAG? How to make sure the Agent would not change our prompt for improving its search?

  • Is there any agent/MCP?

  • What is the criteria of retrieving my input?

  • How to make sure my input will get into the context window?

  • Does it call external source?

We can also look for chances of model bog down, which makes the LLM model busy and deny service eventually. For example, I once made Gemini 2.5 Flash to repeat the following, although it eventually timeout after 5 minutes, but it looks like a possible direction:

Conclusion

So only the following left:

  • Don’t know how to recon what LLM can do

  • Don’t know how to identify suitable target action

  • Don’t know how to prompt for target action

  • Don’t know how to prompt based on recon result

  • Don’t know how to identify and bypass protection

The blog covered how to recon and choose a suitable target action.

Last updated