The Simplest Possible Workflow

What It Does

This simple workflow is ideal for cases where you want to directly query the LLM with a user’s input without any additional formatting or processing. It takes the user’s question, sends it to the LLM, and returns the generated response.

workflow:
  - name: "query_llm"
    type: "generation"
    config:
      llm_model: "gpt-4o-mini"
      max_tokens: 64
      temperature: 0.2
    inputs:
      input_prompt: "user_question"

How It Works

  • Single Node Process: This workflow consists of one node named query_llm that is of type generation. It is designed to take a user’s question (provided as user_question) and generate an answer directly using an LLM.

  • Configuration Details:

    • llm_model: Uses the "gpt-4o-mini" model.
    • max_tokens: Limits the response to 64 tokens.
    • temperature: Set to 0.2 to control randomness in the response (lower values yield more deterministic outputs).
  • Input Mapping: The node expects an input under the key input_prompt. In this workflow, input_prompt is mapped directly to user_question, meaning that whatever question is supplied by the user will be forwarded as the prompt to the LLM.
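
For example, assuming the workflow is invoked with a simple key/value payload (the exact invocation mechanism depends on your runner), the caller would supply something like:

user_question: "What is the capital of France?"

This value is forwarded unchanged as input_prompt to the query_llm node, and the node's output is the model's answer.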

Connecting Nodes

As a next step, we want to build a workflow with two nodes: first we format a custom prompt, then we use this prompt to call an LLM.

What It Does

  • Customizing the Prompt: The custom_prompt node injects a fixed directive into the user’s provided text. This directive tells the model to “be Yoda” when formulating its response. This simple manipulation makes it easy to experiment with persona-based responses without complex multi-turn conversation logic.

  • Generating the Response: The modified prompt is then forwarded to the query_llm node, which generates a response using the "gpt-4o-mini" model. The response is limited in length and controlled by specified generation parameters.

user_input:
  prompt:
    type: string
workflow:
  - name: "custom_prompt"
    type: "jinja"
    config:
      output_template:
        jinja_template_str: >
          You are Yoda, whatever the prompt, make sure to respond like Yoda.
          Prompt: {{ prompt }}
    inputs:
      prompt: "prompt"
  - name: "query_llm"
    type: "generation"
    config:
      llm_model: "gpt-4o-mini"
      max_tokens: 64
      temperature: 0.2
      stop_sequences: ["<|end_of_text|>"]
    inputs:
      input_prompt: "custom_prompt.output"
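
For illustration, if the caller supplies the prompt "What is the capital of France?", the custom_prompt node renders roughly the following string (YAML's > folding joins the template lines with spaces, so the exact whitespace may vary):

You are Yoda, whatever the prompt, make sure to respond like Yoda. Prompt: What is the capital of France?

This rendered string is what query_llm receives as its input_prompt.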

What Is New?

User Input

Specifically outlines the inputs an end user has to provide to the workflow. This is required because the first node might have untyped inputs, or multiple inputs that are not all meant to be supplied by the user.
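
The user_input block only declares the key and its type; at run time the caller supplies a concrete value for it, for example:

prompt: "Explain recursion in one sentence."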

Jinja Node

Used to format inputs for other nodes, such as the prompt. The node's own inputs can be referenced as variables inside the template.
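
In the workflow above, the inputs block of the custom_prompt node maps the user-supplied prompt to a template variable of the same name, which the template then references as {{ prompt }}:

inputs:
  prompt: "prompt"  # the value of the user input "prompt" becomes {{ prompt }} in the template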

Connecting Nodes

For the LLM call node, we do not use a user input directly; instead, we use the output of the prompt formatting done in the Jinja node as the input. This is how nodes are connected in a workflow.
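
The relevant lines from the workflow above show the pattern used throughout these examples: another node's result is referenced as <node_name>.output in the inputs block.

inputs:
  # use the string rendered by the custom_prompt node as this node's prompt
  input_prompt: "custom_prompt.output"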

Multi-Turn Chat

In this example, we use the same nodes and concepts, but demonstrate how the Jinja node can handle more complex formatting, including if statements and for loops, to enable a multi-turn chat conversation.

What It Does

  1. Formats Conversation Data: The first node (query_prompt) transforms a series of conversation messages into a single, structured prompt using an inline Jinja template. It:

    • Checks for and properly formats a system message (if present).
    • Iterates over the remaining messages, formatting each with role headers and delimiters.
    • Adds an assistant header at the end if the conversation ends with a user message, signaling the LLM to generate a response.
  2. Generates a Response: The second node (query_llm) takes the formatted prompt and passes it to the "gpt-4o-mini" model to generate a text response. The response is constrained by the token limit, temperature settings, and stop sequences to ensure the output is clear and concise.

user_input:
  messages:
    type: Messages
workflow:
  - name: "query_prompt"
    type: "jinja"
    config:
      output_template:
        jinja_template_str: >
          <|begin_of_text|>
          {% if messages[0]['role'] == 'system' %}
            {% set loop_messages = messages[1:] %}
            {% set system_message = '<|start_header_id|>' + 'system' + '<|end_header_id|>\n\n' + messages[0]['content'].strip() + '<|eot_id|>' %}
          {% else %}
            {% set loop_messages = messages %}
            {% set system_message = '' %}
          {% endif %}

          {% for message in loop_messages %}
            {% if loop.index0 == 0 %}
              {{ system_message }}
            {% endif %}

            {{ '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n' + message['content'].strip() + '<|eot_id|>' }}

            {% if loop.last and message['role'] == 'user' %}
              {{ '<|start_header_id|>' + 'assistant' + '<|end_header_id|>\n\n' }}
            {% endif %}
          {% endfor %}
    inputs:
      messages: "messages"
  - name: "query_llm"
    type: "generation"
    config:
      llm_model: "gpt-4o-mini"
      max_tokens: 100
      temperature: 0.2
      stop_sequences: ["<|eot_id|>"]
    inputs:
      # takes the output from query_prompt step
      input_prompt: "query_prompt.output"

How It Works

This workflow consists of two main steps: a prompt formatting stage and a generation stage.

User Input

  • Definition: The workflow begins by defining a user_input section where the key messages is declared with a type of Messages.
  • Purpose: This ensures that the workflow expects structured conversation data (such as a list of message objects) when the application is run.
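
A minimal payload matching this structure might look like the following (the content values are placeholders; what matters is the role/content shape the template iterates over):

messages:
  - role: "system"
    content: "You are a helpful assistant."
  - role: "user"
    content: "What is the capital of France?"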

Step 1 – Prompt Formatting with Jinja (query_prompt Node)

  • Node Type: jinja

  • Purpose: The query_prompt node formats the conversation history into a single prompt string that is suitable for sending to the LLM.

  • Jinja Template Explained: The jinja_template_str is defined inline within the YAML. Here’s what it does:

    1. Beginning of Text: The template starts with the token <|begin_of_text|>, which can be used by the LLM to recognize the start of the prompt.
    2. Handling System Messages:
      • Check for a System Message: The template checks if the first message in the messages array has a role of "system".
      • If True:
        • It separates the system message from the rest of the messages by assigning the remaining messages to loop_messages.
        • It formats the system message by wrapping the role with <|start_header_id|> and <|end_header_id|>, appending the stripped content, and terminating it with <|eot_id|>.
      • If False:
        • It sets loop_messages to include all messages.
        • No system message is added (i.e., system_message is an empty string).
    3. Looping Through Messages: The template iterates over loop_messages:
      • First Message in the Loop: If it’s the first message in the loop (loop.index0 == 0), it outputs the system_message (if one was defined).
      • Formatting Each Message: For every message, it outputs:
        • A header that marks the message’s role (e.g., user or assistant), formatted with <|start_header_id|> and <|end_header_id|>.
        • The message content, stripped of extra whitespace, followed by <|eot_id|> to signal the end of that message.
      • Prompting the Assistant: If the last message in the conversation is from the user (loop.last and message['role'] == 'user'), the template appends an empty header for the assistant. This cues the LLM that it should generate a response.
  • Input Mapping: The node receives its input via the key messages, which comes from the overall user_input.
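
Given the example payload shown earlier, the template renders approximately the following prompt (blank lines and indentation may differ slightly because of YAML folding and Jinja whitespace handling):

<|begin_of_text|>
<|start_header_id|>system<|end_header_id|>

You are a helpful assistant.<|eot_id|>
<|start_header_id|>user<|end_header_id|>

What is the capital of France?<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>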

Step 2 – LLM Generation (query_llm Node)

  • Node Type: generation

  • Purpose: This node takes the formatted prompt from the previous step and uses it to generate a response from an LLM.

  • Configuration Details:

    • llm_model: Uses the "gpt-4o-mini" model, as in the previous examples.
    • max_tokens: Limits the response to 100 tokens.
    • temperature: Set to 0.2 for controlled and less random output.
    • stop_sequences: Specifies ["<|eot_id|>"] as a stopping point for the model, ensuring that it stops generating once the designated token is reached.
  • Input Mapping: The node receives its input_prompt from the output of the query_prompt node (query_prompt.output).