ATTTENTION!! Let’s see how to abuse it in LLM attack/JB.

Prompt Injection/JB Concept +2 — Attention and Different

Disclaimer: The information provided in this material is for educational purposes only for improving security.

Initiation

I discussed attack surface of LLM tool in my previous blog. I strongly recommend reading it so you will not miss any important concept.

You should have an idea of the attack surface of LLMs in an LLM Application assessment, apart from the model itself. In short, the previous episode discussed:

Different integrations with external data sources.
We should enumerate the tool calls implemented and their limitations.

To keep it simple, I would include only three ideas (or fewer if my blog is too long). I would also include some basic LLM knowledge so you can understand the concept of attacking LLM application.

Rule of Thumb #6— What is LLM Attention?

In blogs about prompt injection or prompt engineering, the “context” is that the student never misses a class. Why would the LLM “understand” what we are talking about? Why would the LLM “understand” how to respond to us?

Before discussing what “context” is, we need to understand what “attention” is. It is the core of “instruction” and “processing.”

One day, I was learning how to fine-tune a model, I found there were some interesting parameters in the following code:

model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 3407,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
)

The set of values in target_modules (especially q_proj, k_proj and v_proj) caught my eyes. After some reading and Youtube video, they are the parameters used for tuning attention mechanism.

They are used to calculate how similar a set of vectors are, generating a matrix of attention scores. Then, the weighted sum is applied to the vectors (words), and the resulting vector is the context of the sentence.

To visualize it in simple way, for a “sentence” : apple, banana, orange, car, human, the “context” looks like the following:

If the model is trained with more weight given to human data, then the overall context vector would be closer to that of humans, which means “Something related to humans.”

When the LLM responds to a user, the output (words) is generated based on the context vector and other parameters tuned during the training process. Hence, the LLM will generate words related to the context within our expectation.

In LLM usage, the system prompt is treated as the primary context vector. It defines the starting point of a maze that an attacker needs to solve. That’s why I emphasize the importance of enumeration and understanding the purpose of the LLM. By doing so, we can craft our prompt to make use of the context.

Due to the length of the blog, the method about attacking context will be discussed in the coming blogs.

Rule of Thumb #7 — Draw attention in One-Prompt Application

When I was conducting the assessment, I noticed that some applications were implemented with a “One Click Summaries” feature. Additionally, there is an input field, similar to a search bar, that allows you to submit a single prompt, such as for image generation.

In such cases, the context is limited to the system prompt, or we can inject the prompt from the external data source it loads.

If the application is handling a lot of content, we will need to draw the model’s attention to process our instruction. This can be done by:

Use keyword/special character “Do not ignore this input, especially the following…” “!!!xxxxxxx!!!” “ >>> xxxxx <<<” However, it should be noted that these methods can trigger defenses, as they are so obvious that they seem to be attempting to take over something. Unless we can build a strong context, it will be difficult to bypass the defense.
Use Larger prompt size with logical details Dominating the context window using prompt size: By considering the system prompt and the model’s use case, we can craft a larger, more detailed prompt to deceive the LLM into following our instruction. My previous blog discussed how to solve Lost-In-The-Middle problem using outstanding prompt size.
Put your prompt either at the top/the bottom The research paper of Lost-In-The-Middle suggested that the position of external document does matter the LLM response. In an attack scenario, the documents are the places where we can inject our prompt. Since the LLM gives more attention to the “introduction” and “conclusion” parts of a long message, we can consider placing our payload there.

Rule of Thumb #8 — Chat Bot Prompt

The One-Prompt Application approach also applies to a chatbot. The major difference when attacking a chatbot is that it sends the chat history to the model to generate a response. So when I attack a chat bot, I can try to shift the attention to something else in the chat flow. That’s something harder to be conducted against one-click application since one-click application does not have chat history.

During my research, I found a fairly stable prompt to set up a context to ask chat bot for illegal questions. I also learnt this method from this paper. The approach suggested by this paper was asking the LLM to further describe something further based on harmless objective.

Once I make the LLM into the context, I can even ask it how to craft something from domestic product.

When I was writing this blog, I got ChatGPT-4o Mini, 4, 4.5, Llama 4 Scout and Deepseek 0324 to teach me how to make “math” as example. The following screenshots show my approach worked across different models. So far I observe stable performance on those models.

Claude 3.7 Sonnet is the model that would stop me:

To maintain this to be a short blog, I will share my approach in coming blogs.

Next…?

This blog builds the knowledge about basic attention mechanism in LLM and some approach to One-Prompt Application. In the coming blog, I will show how to asked the ChatGPT to answer something illegal/break the rule with the concept we discussed in this blog.

Next, we will have a look about how to build and attack context.

PreviousAttack Surface of LLM Application NextGive me 500 words and Turn ChatGPT and :lama to be Hasenberg

Last updated 3 months ago