A Framework for Testing Agentic Design Patterns Using LangChain & LangGraph

This blog describes a Python application built with the LangChain and LangGraph frameworks for testing agentic workflow design patterns such as Chaining, Routing, and Reflection. The application currently implements 10 AI agent patterns, each with several pre-configured representative use cases that you can run against OpenAI GPT, Anthropic Claude, or Google Gemini models for comparison.

Introduction

The rise of Large Language Models (LLMs) over the last three years has fundamentally changed how we build software by allowing you to use AI agents that can not only execute tasks without relying on preset workflows, but also learn from those tasks to improve over time. There is an abundance of blogs and YouTube channels with demos of various AI agents. But moving from impressive demos to AI systems ready for real-life production use requires more than just API calls to GPT-4 or Claude. It requires building the agent’s workflow around relevant design patterns, called agentic workflow design patterns, that enable AI to reason, plan, collaborate, and improve over time. This is similar to building software following the well-established software design patterns described in the book “Design Patterns: Elements of Reusable Object-Oriented Software”, published in 1994 by Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides.

What are AI Agents and Agentic Workflows?

An AI agent is more than an AI chatbot. It is a software system that can break a complex task into manageable steps and then use tools and external resources as needed at each step to execute the task. An agent can maintain context and use memory across multiple interactions with a user as required to complete tasks. It can collaborate with other agents, self-critique its work, and adapt and improve its actions based on feedback from agents or users.

Agentic workflows are the design patterns that make this possible. Just as software engineers use design patterns like Factory, Observer, or Strategy to solve recurring problems, agentic workflows provide proven solutions like Routing, Reflection, or Planning for common AI challenges.

Why Design Patterns Matter

Without structured patterns, AI applications become brittle and unpredictable. Consider these real-world challenges:

  • Code generation: A single LLM call might produce buggy code. A Reflection pattern where one model generates and another critiques produces more reliable results.
  • Customer service: Routing every query to a general-purpose agent is inefficient. A Routing pattern classifies requests and delegates to specialized handlers.
  • Research analysis: Processing a paper sequentially is slow. A Parallelization pattern runs summary, question generation, and key term extraction concurrently.

Why a Testing Framework

I found that most discussions of agentic patterns are theoretical. I can read a blog or watch a YouTube video, but neither answers the questions I actually have: How do these patterns perform in practice? Which LLM models (OpenAI, Anthropic, Google) work best for each pattern? How do you actually implement them?

Hence I decided to build a practical testing framework that currently implements 10 core agentic patterns using the LangChain and LangGraph libraries, with the ability to run and compare agents across different models. The Python application code is available on GitHub.

The Testing Framework

Why LangChain and LangGraph?

Building agentic workflows from scratch means wrestling with LLM API differences, managing conversation history, implementing state machines, and handling tool calls. Agent frameworks like LangChain or CrewAI abstract these complexities.

The LangChain library provides the agent foundation:

  • Model abstraction – allows switching between OpenAI GPT, Anthropic Claude, and Google Gemini with a single interface
  • LCEL (LangChain Expression Language) – allows composing workflow chains with intuitive syntax: chain = prompt | llm | parser
  • Tool integration – a decorator-based function calling that works across tool providers
  • Memory primitives – built-in conversation buffers and history management tools

The LangGraph library extends LangChain by adding advanced agent capabilities:

  • State machines – allows defining complex workflows as graphs with nodes (tasks) and edges (workflow routes)
  • Multi-agent orchestration – allows coordinating sequential, parallel, or debate-style agent collaboration
  • Persistence – provides the InMemoryStore class for semantic, episodic, or procedural memory
  • Conditional routing – provides support for dynamic workflow paths based on agent decisions

Framework Architecture

The testing framework follows a consistent structure that makes patterns easy to implement, run, and compare.

ModelFactory – Multi-Provider Abstraction

A central factory for creating LLM instances from any provider:
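
The repository's actual implementation isn't reproduced here; the sketch below shows one plausible shape for such a factory, with illustrative class, method, and environment-variable names:

```python
import os

from langchain_anthropic import ChatAnthropic
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_openai import ChatOpenAI


class ModelFactory:
    """Hypothetical factory returning a chat model for a given model name."""

    @staticmethod
    def create(model_name: str, temperature: float = 0.0, max_tokens: int = 1024):
        # Provider is inferred from the model name prefix (illustrative convention).
        if model_name.startswith("gpt"):
            return ChatOpenAI(model=model_name, temperature=temperature,
                              max_tokens=max_tokens,
                              api_key=os.environ["OPENAI_API_KEY"])
        if model_name.startswith("claude"):
            return ChatAnthropic(model=model_name, temperature=temperature,
                                 max_tokens=max_tokens,
                                 api_key=os.environ["ANTHROPIC_API_KEY"])
        if model_name.startswith("gemini"):
            return ChatGoogleGenerativeAI(model=model_name, temperature=temperature,
                                          max_output_tokens=max_tokens,
                                          google_api_key=os.environ["GOOGLE_API_KEY"])
        raise ValueError(f"Unknown model: {model_name}")
```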

The factory handles API keys, default parameters such as temperature or max_tokens, and provider-specific configurations.

Pattern Folder Structure Convention

Implementation of every agentic pattern follows the same directory structure:

Each run.py implements two key functions:

  • run() – executes the design pattern with a single model (or multiple models for multi-role patterns)
  • compare_models() – runs the pattern across multiple models and compares the results

OutputWriter – Standardized Logging

All results are automatically logged to a file in the experiments/results/ folder with timestamp and pattern name added to the file name:

This makes it easy to compare how GPT-4, Claude, and Gemini perform on the same task.

Running Design Patterns

Any agentic design pattern can be run from CLI or imported programmatically:
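
For example, programmatic use might look like this (the module path, function arguments, and model identifiers are illustrative of the convention described above, not the repository's exact layout):

```python
# Illustrative only -- assumes the run.py convention described above.
from experiments.chaining.run import run, compare_models

# Execute the Chaining pattern with a single model.
run(model_name="gpt-4o")

# Execute the same pattern across several models and log the comparison.
compare_models(model_names=["gpt-4o", "claude-sonnet-4-5", "gemini-2.5-pro"])
```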

Agentic Design Patterns

This section explores the 10 core design patterns I have implemented so far (plus the RAG pattern, which I have used in other projects), organized by complexity. For complete implementation details, see my GitHub repository.

Pattern Selection Guide

Before diving into individual patterns, you can use the decision tree below to identify which pattern fits your use case:

Foundation Patterns (1-3) – Building Blocks

Design patterns in this category form the basis for creating more complex agentic workflows.

1. Chaining – Sequential Processing Pipeline

The Pattern: Instead of using a single LLM call for a complex problem, break it down into several simpler steps that are resolved by a sequence of LLM calls (a chain), where the output of each step feeds the input of the next step.

How It Works: Pipeline chain is defined using LangChain Expression Language (LCEL) syntax:
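
A minimal sketch of such a chain (prompts abbreviated and names illustrative, not the framework's actual code):

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o", temperature=0)
parser = StrOutputParser()

# Step 1: extract technical specifications from the raw order text.
extract = ChatPromptTemplate.from_template(
    "Extract the technical specifications from this order:\n{order}") | llm | parser

# Step 2: transform the extracted specs into a structured JSON document.
structure = ChatPromptTemplate.from_template(
    "Convert these specifications into a JSON object:\n{specs}") | llm | parser

# Step 3: generate implementation recommendations from the structured specs.
recommend = ChatPromptTemplate.from_template(
    "Given this specification JSON, suggest implementation steps:\n{spec_json}") | llm | parser

# The output of each step feeds the input of the next.
chain = (
    extract
    | (lambda specs: {"specs": specs})
    | structure
    | (lambda spec_json: {"spec_json": spec_json})
    | recommend
)

result = chain.invoke(
    {"order": "I need a 27-inch 4K monitor with USB-C and a height-adjustable stand."})
print(result)
```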

Sample Use Case: An agent analyzes a product order received from a user:

  • Step 1: Extract technical specifications from user description
  • Step 2: Transform specs into required structured format
  • Step 3: Generate implementation recommendations based on spec documentation

Why It Matters: Most real-life problems are complex and require multiple processing steps to produce a solution. Applying the Chaining pattern makes those steps explicit and testable.

Trade-offs and Considerations:

| Advantage | Disadvantage |
| --- | --- |
| Clear separation of concerns | Each step adds API latency |
| Easier debugging (inspect intermediate outputs) | Errors in early steps cascade |
| Specialized prompts per step | Higher token costs (N calls vs. 1) |
| Testable individual components | Over-decomposition adds complexity |

Latency Impact: A 3-step chain incurs 3× the latency of a single call. For latency-sensitive applications, you should balance decomposition granularity against response time requirements.

Error Handling Strategy: Consider adding validation between steps to catch errors early:
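
Continuing the hypothetical chain sketched above, a validation function can be wrapped in a RunnableLambda and placed between stages so a malformed intermediate result fails fast instead of cascading:

```python
import json

from langchain_core.runnables import RunnableLambda


def validate_specs(spec_json: str) -> dict:
    """Fail fast if the structuring step did not return parseable JSON."""
    try:
        json.loads(spec_json)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Structuring step returned invalid JSON: {exc}") from exc
    return {"spec_json": spec_json}


# Validation sits between the structuring and recommendation steps.
chain = (
    extract
    | (lambda specs: {"specs": specs})
    | structure
    | RunnableLambda(validate_specs)
    | recommend
)
```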

When NOT to Use Chaining Pattern:

  • Simple tasks that a single well-crafted prompt can handle
  • When latency is critical and steps cannot be parallelized
  • When intermediate outputs don’t provide debugging value

2. Routing – Intent-Based Delegation

The Pattern: Complex problems often cannot be handled by a single sequential workflow. The Routing pattern provides a solution by introducing conditional logic into the agentic workflow for choosing the next step for a specific task. It allows the system to analyze each incoming request to determine its nature, classify it, and then route it to a specialized agent for handling.

How It Works: A coordinator LLM agent analyzes the request and then selects the appropriate handler agent:
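
A hedged sketch of this flow using LCEL (labels, prompts, and handlers are illustrative, not the framework's actual code):

```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableBranch, RunnableLambda
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o", temperature=0)

# Coordinator: classify the request into one of three labels.
classifier = (
    ChatPromptTemplate.from_template(
        "Classify this customer request as exactly one of: booking, faq, unclear.\n"
        "Request: {request}\nLabel:")
    | llm
    | StrOutputParser()
)

# Specialized handlers.
booking_agent = ChatPromptTemplate.from_template(
    "You are a booking specialist. Handle this booking request: {request}") | llm | StrOutputParser()
faq_agent = ChatPromptTemplate.from_template(
    "You are an FAQ assistant. Answer this question: {request}") | llm | StrOutputParser()
escalate = RunnableLambda(lambda x: "Escalating to a human agent.")

# Route to the matching handler based on the classifier's label.
route = RunnableBranch(
    (lambda x: "booking" in x["label"].lower(), booking_agent),
    (lambda x: "faq" in x["label"].lower(), faq_agent),
    escalate,  # default branch
)

workflow = {"label": classifier, "request": lambda x: x["request"]} | route
print(workflow.invoke({"request": "I'd like to reserve a table for two on Friday."}))
```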

Sample Use Case: Customer service automation – a coordinator/router agent analyzes each incoming request to determine which specialist handler should process it:

  • Booking requests → route to Booking agent if incoming request requires service booking
  • Information queries → route to FAQ agent if incoming request is a question
  • Unclear requests → route to Human if incoming request is not classified (escalation)

Why It Matters: General-purpose agents are often inefficient in handling specific requests. Routing increases efficiency by enabling specialization.

Classification Approaches:

There are several ways to implement the classification step, each with trade-offs:

| Approach | Speed | Accuracy | Best For |
| --- | --- | --- | --- |
| Keyword matching | Fast (no LLM call) | Low | Obvious, distinct categories |
| LLM classification | Slower (+1 API call) | High | Nuanced, overlapping categories |
| Embedding similarity | Medium | Medium-High | Large number of categories |
| Hybrid | Medium | High | Production systems |

Hybrid Classification Example:
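
A hedged sketch of a hybrid classifier (keywords and labels are illustrative), reusing the LLM classifier chain from the routing sketch above as the fallback:

```python
def classify_request(text: str) -> str:
    """Keyword matching first (fast, free); LLM classification only when ambiguous."""
    lowered = text.lower()

    # Cheap keyword pass for obvious, distinct categories.
    if any(kw in lowered for kw in ("book", "reserve", "reservation", "appointment")):
        return "booking"
    if lowered.rstrip().endswith("?") or lowered.startswith(("what", "how", "when", "where")):
        return "faq"

    # Ambiguous: fall back to the slower but more accurate LLM classifier.
    label = classifier.invoke({"request": text}).strip().lower()
    return label if label in {"booking", "faq"} else "unclear"
```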

Handling Edge Cases:

Multi-Intent Requests: “Book a flight to Paris and tell me about visa requirements”

Confidence Thresholds: Escalate to human assistant when uncertain

Sample Production Classification Prompt:
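
The framework's actual prompt isn't shown here; the template below illustrates the kind of explicit rules a production classifier benefits from (double braces escape literal braces for the prompt template):

```python
CLASSIFICATION_PROMPT = """You are a request router for a travel-services company.

Classify the customer request below into exactly one category:
- booking: the customer wants to create, change, or cancel a reservation
- faq: the customer is asking for information a knowledge base can answer
- unclear: anything else, or a request that mixes several intents

Rules:
- Respond with a JSON object: {{"category": "<label>", "confidence": <0.0-1.0>}}
- If confidence is below 0.6, use "unclear" so the request is escalated to a human.
- Do not answer the request itself; only classify it.

Customer request:
{request}
"""
```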

When NOT to Use Routing Pattern:

  • All requests can be handled by a single general-purpose agent
  • Categories overlap significantly (consider hierarchical routing instead)
  • Classification overhead exceeds the benefit of specialization

3. Parallelization – Concurrent Execution

The Pattern: Resolving complex problems often requires completing multiple sub-tasks that can be executed simultaneously rather than sequentially. The Parallelization pattern involves executing multiple independent LLM agent chains concurrently to reduce overall latency, and then synthesizing the results as required to solve the problem.

How It Works: Using the RunnableParallel class in LangChain with asynchronous execution:
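
A minimal sketch of the peer-review use case (prompts abbreviated, names illustrative):

```python
import asyncio

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableParallel
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o", temperature=0)
parser = StrOutputParser()

summary = ChatPromptTemplate.from_template(
    "Summarize this paper:\n{paper}") | llm | parser
questions = ChatPromptTemplate.from_template(
    "List 5 open questions raised by this paper:\n{paper}") | llm | parser
key_terms = ChatPromptTemplate.from_template(
    "Extract the 10 most important terms from this paper:\n{paper}") | llm | parser

# The three branches run concurrently; results come back as a dict keyed by branch name.
review = RunnableParallel(summary=summary, questions=questions, key_terms=key_terms)


async def main():
    results = await review.ainvoke({"paper": "<paper text here>"})
    print(results["summary"])


asyncio.run(main())
```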

Sample Use Case: Peer review of a research paper:

  • Parallel execution: Peer agents generate summary, questions, and key terms simultaneously
  • Synthesis: Combine generated reviews into a comprehensive paper critique

Why It Matters: Reduces paper review latency roughly N-fold compared with running the reviews sequentially, where N is the number of peer agents, and provides multiple perspectives on the research paper.

Understanding Concurrency vs. Parallelism:

An important distinction for LLM applications:

  • Concurrency (what we achieve using this pattern): Multiple tasks in progress simultaneously via async I/O. While waiting for one API response, we can initiate other requests.
  • True parallelism: Would require multiple CPU cores executing simultaneously (not typical for I/O-bound LLM calls).

The latency reduction comes from overlapping API wait times, not CPU parallelism. If each LLM call takes 2 seconds, three parallel calls still complete in ~2 seconds (not 6).

Synthesis Strategies:

After parallel execution, you need to combine results. Choose based on your use case:

| Strategy | Description | Best For |
| --- | --- | --- |
| Aggregation | Combine all outputs into a single document | Research summaries, reports |
| Voting | Multiple agents answer the same question, majority wins | Factual queries, classification |
| Weighted Merge | Assign confidence scores, prioritize higher confidence | When agent reliability varies |
| Structured Merge | Each agent fills different fields of an output schema | Multi-aspect analysis |

Aggregation Example:

Voting Example:

Error Handling in Parallel Execution:

Parallel chains can partially fail. You should handle such failures gracefully:
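
One hedged approach, continuing the peer-review sketch above: run each branch with its own exception guard so one failed reviewer does not discard the others' work:

```python
import asyncio


async def run_branch(name, chain, payload):
    """Run one branch and convert exceptions into a structured failure record."""
    try:
        return name, {"ok": True, "output": await chain.ainvoke(payload)}
    except Exception as exc:  # rate limits, timeouts, provider errors, ...
        return name, {"ok": False, "error": str(exc)}


async def review_paper(paper: str) -> dict:
    payload = {"paper": paper}
    branches = {"summary": summary, "questions": questions, "key_terms": key_terms}
    results = await asyncio.gather(
        *(run_branch(name, chain, payload) for name, chain in branches.items())
    )
    outputs = dict(results)
    # Synthesize only from the branches that succeeded; report the rest.
    successes = {k: v["output"] for k, v in outputs.items() if v["ok"]}
    failures = {k: v["error"] for k, v in outputs.items() if not v["ok"]}
    return {"successes": successes, "failures": failures}
```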

When NOT to Use Parallelization Pattern:

  • Tasks have dependencies (output of A is input to B)
  • Order of execution matters for correctness
  • Rate limits would be exceeded by concurrent requests
  • Combined token usage exceeds context window for synthesis

Enhancement Patterns (4-5) – Adding Intelligence

The foundation patterns enable agents to be efficient, fast, and flexible when resolving complex problems. However, even a sophisticated workflow may not help an agent handle an incoming request correctly if the agent's understanding of the task is inaccurate or it is missing information required to give a correct answer. The design patterns in this group make agents more capable of handling complex problems by evaluating their own work and iterating to refine task understanding, or by using relevant tools to obtain missing data.

4. Reflection – Iterative Improvement Through Critique

The Pattern: The Reflection pattern offers a solution for self-correction and refinement by establishing a feedback loop: one LLM generates output, another model evaluates it against predefined criteria, and the first model then revises its output based on the feedback it received. This iterative process progressively improves the accuracy and quality of the final result.

How It Works: Using a dual-model approach, with a specific role defined for each AI agent:
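
A simplified sketch of the creator/critic loop (model identifiers and the stopping rule are illustrative, not the framework's exact configuration):

```python
from langchain_anthropic import ChatAnthropic
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

creator_llm = ChatOpenAI(model="gpt-4o", temperature=0.2)              # generates code
critic_llm = ChatAnthropic(model="claude-sonnet-4-5", temperature=0)   # reviews it

creator = ChatPromptTemplate.from_template(
    "Write a Python function for this request:\n{request}\n\n"
    "Previous attempt (may be empty):\n{draft}\n\n"
    "Reviewer feedback (may be empty):\n{feedback}"
) | creator_llm | StrOutputParser()

critic = ChatPromptTemplate.from_template(
    "Review this Python code for bugs, style issues, and unhandled edge cases. "
    "Reply APPROVED if no changes are needed.\n\n{draft}"
) | critic_llm | StrOutputParser()


def reflect(request: str, max_iterations: int = 3) -> str:
    draft, feedback = "", ""
    for _ in range(max_iterations):
        draft = creator.invoke({"request": request, "draft": draft, "feedback": feedback})
        feedback = critic.invoke({"draft": draft})
        if "APPROVED" in feedback:
            break
    return draft
```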

Sample Use Case: Code generation with automated review of the generated code:

  • Creator (GPT-4o): Generates a Python function based on a user request
  • Critic (Claude Sonnet): Reviews the generated code for bugs, style issues, and edge-case coverage
  • Creator: Revises the code based on the received feedback

Why It Matters: Single-pass code generation is often flawed. Using the Reflection design pattern catches errors and improves code quality and style.

Framework Feature: The application supports using different models for the creator and critic roles to leverage model-specific strengths.

Iteration Control Strategies:

Deciding when to stop iterating is crucial for both the quality and the cost of using Reflection:

| Strategy | Description | Pros | Cons |
| --- | --- | --- | --- |
| Fixed iterations | Always run N cycles | Predictable cost/time | May over/under-iterate |
| Quality threshold | Stop when grade ≥ target | Efficient | Requires quantifiable metrics |
| Diminishing returns | Stop when delta improvement < ε | Balances quality/cost | Needs improvement tracking |
| Critic consensus | Stop when no issues found | Quality-focused | May never converge |

Implementation with Multiple Strategies:

Designing Effective Critic Prompts:

The critic prompt is critical to the pattern's success. A vague critic request produces vague feedback.
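
For example (illustrative wording, not the framework's actual prompt):

```python
# Vague -- tends to produce vague feedback:
VAGUE_CRITIC_PROMPT = "Review this code and tell me if it is good:\n{draft}"

# Specific -- checklist-style criteria produce actionable feedback:
CRITIC_PROMPT = """Review the Python code below against each criterion and report concrete findings:

1. Correctness: does it implement the stated requirement? List any logic bugs.
2. Edge cases: empty input, None, wrong types, very large inputs.
3. Error handling: are failures raised with clear messages or silently swallowed?
4. Style: naming, docstrings, PEP 8 issues.

For every issue, quote the snippet involved and suggest a fix.
If there are no issues at all, reply with the single word APPROVED.

Code:
{draft}
"""
```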

Cross-Model Reflection:

Using the same model for creator and critic often creates an “echo chamber” where the critic approves flawed output because it has similar blind spots.

When NOT to Use Reflection Pattern:

  • Task has objective correctness criteria (you should use automated tests instead)
  • Single-pass output is consistently acceptable
  • Latency constraints don’t allow multiple iterations
  • Cost per iteration is prohibitive for the use case

5. Tool Use – Extending Agent Capabilities

The Pattern: The Tool Use pattern enables agents to interact with external APIs, databases, or services by equipping them with domain-specific tools that they can call as needed for the task they received.

How It Works: The pattern is often implemented through a Function Calling mechanism, which involves defining and describing external functions or capabilities to the LLM. In LangChain this is done using the @tool decorator:
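
A minimal sketch (the search tool body is a stub, not a real search client):

```python
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI


@tool
def search_web(query: str) -> str:
    """Search the web and return a short list of relevant snippets for the query."""
    # Illustrative stub -- a real implementation would call a search API here.
    return f"Top results for '{query}': ..."


llm = ChatOpenAI(model="gpt-4o", temperature=0)
llm_with_tools = llm.bind_tools([search_web])

response = llm_with_tools.invoke("What is quantum computing?")
# If the model decided a tool is needed, response.tool_calls lists the calls to execute.
for call in response.tool_calls:
    print(call["name"], call["args"])
```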

The LLM receives both the user’s request and the available tool definitions. Based on this information, the LLM decides whether calling one or more tools is required to generate a response.

Sample Use Case: Research assistant:

  • Query: “What is quantum computing?”
  • Agent: Formulates answer using search results returned by the tool

Why It Matters: An LLM alone is limited to its training data, which may lack relevant information or be out of date. The Tool Use pattern enables it to access additional or real-time data and to perform actions or calculations as needed for generating a response.

Tool Description Best Practices:

The LLM model decides which tool to call based on the function name and description. Poor or incomplete descriptions often lead to incorrect tool selection.

Tool Execution Patterns:

Tools can be used in various patterns depending on the task:

| Pattern | Description | Example |
| --- | --- | --- |
| Single tool | One tool call, use result | "What's the weather?" -> call weather_api() |
| Sequential | Output of A feeds into B | search -> summarize results |
| Parallel | Multiple tools simultaneously | weather + news + calendar |
| Iterative | Same tool, refined queries | search -> refine -> search again |

Sequential Tool Chain:

Error Handling for Tools:

Tools can fail. You should design agents to handle failures gracefully:
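
One hedged approach is to catch failures inside the tool and return a structured error message the model can reason about, rather than letting the exception abort the run (run_search below is a hypothetical search client):

```python
from langchain_core.tools import tool


@tool
def search_web_safe(query: str) -> str:
    """Search the web; on failure, return an error description the agent can act on."""
    try:
        return run_search(query)  # hypothetical search client
    except TimeoutError:
        return "TOOL_ERROR: search timed out; consider retrying with a shorter query."
    except Exception as exc:
        return (f"TOOL_ERROR: search failed ({exc}); answer from existing knowledge "
                "and state that the data may be stale.")
```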

Tool Selection Prompt Enhancement:

Always consider how to help the LLM make a better tool-selection decision:

When NOT to Use Tool Use Pattern:

  • Information is certainly available in the LLM’s training data
  • Tool latency would result in unacceptably slow responses
  • Task can be completed with LLM reasoning alone
  • Tool results would need extensive validation

Orchestration Patterns (6-7) – Complex Workflows

Intelligent behavior often requires an agent to break down a complex task into smaller steps that, once all completed, achieve the task goal. Some steps may require domain expertise or specific tools; therefore, the plan should account for collaboration with specialized agents that have the required expertise or tools. The design patterns in this group offer standardized solutions for having an agentic system first create a coherent plan to meet a goal and then coordinate multiple agents to execute that plan.

6. Planning – Strategic Breakdown and Execution

The Pattern: The Planning pattern involves breaking a complex task into smaller steps, creating an execution plan, and then following it step by step to achieve the final goal.

How It Works: A two-phase approach using a LangGraph state machine:
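
A condensed sketch of the planner/executor graph (node logic simplified; not the repository's actual implementation):

```python
from typing import List, TypedDict

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langgraph.graph import END, START, StateGraph

llm = ChatOpenAI(model="gpt-4o", temperature=0)
parser = StrOutputParser()


class PlanState(TypedDict):
    task: str
    plan: List[str]
    results: List[str]


def planner(state: PlanState) -> dict:
    # Phase 1: break the task into an explicit list of steps.
    steps = (ChatPromptTemplate.from_template(
        "Break this task into a short numbered list of steps:\n{task}")
        | llm | parser).invoke({"task": state["task"]})
    return {"plan": [s for s in steps.splitlines() if s.strip()], "results": []}


def executor(state: PlanState) -> dict:
    # Phase 2: execute the next unfinished step, given what is done so far.
    step = state["plan"][len(state["results"])]
    output = (ChatPromptTemplate.from_template(
        "Task: {task}\nCompleted so far: {done}\nNow do this step: {step}")
        | llm | parser).invoke({"task": state["task"], "done": state["results"], "step": step})
    return {"results": state["results"] + [output]}


def is_done(state: PlanState) -> str:
    return "done" if len(state["results"]) >= len(state["plan"]) else "continue"


graph = StateGraph(PlanState)
graph.add_node("planner", planner)
graph.add_node("executor", executor)
graph.add_edge(START, "planner")
graph.add_edge("planner", "executor")
graph.add_conditional_edges("executor", is_done, {"continue": "executor", "done": END})
app = graph.compile()

final_state = app.invoke({"task": "Design a RESTful API for a book library system"})
```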

Sample Use Case: Design RESTful API for a book library system

  • Planner: Analyzes requirements, breaks into steps (entities, relationships, constraints)
  • Executor: Creates the design for each step and produces the SQL schema

Why It Matters: Complex tasks benefit from decomposing high-level requirements into actionable, sequential steps and creating an explicit plan with a detailed design for each step before execution.

Plan Representation Formats:

You need to decide what structure to use for generated plans, as it can affect plan execution quality. For example:

1. Linear Plans – a sequential list of items

2. Directed Acyclic Graph (DAG) Plans – a non-linear list of items with dependencies

3. Hierarchical Plans – nested goals:

Structured Plan Generation:

Plan Validation:

Before passing the plan to an agent for execution, you should validate it, for example:
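
For example, a few cheap structural checks can run before execution (thresholds are illustrative):

```python
def validate_plan(plan: list[str], max_steps: int = 10) -> list[str]:
    """Return a list of problems found in a generated plan; empty list means it passed."""
    problems = []
    if not plan:
        problems.append("Plan is empty.")
    if len(plan) > max_steps:
        problems.append(
            f"Plan has {len(plan)} steps; more than {max_steps} suggests over-decomposition.")
    seen = set()
    for step in plan:
        normalized = step.strip().lower()
        if normalized in seen:
            problems.append(f"Duplicate step: {step!r}")
        seen.add(normalized)
        if len(normalized.split()) < 3:
            problems.append(f"Step too vague to execute: {step!r}")
    return problems
```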

Adaptive Re-planning:

In real life, task execution often deviates from the generated plan. It is recommended to build in a re-planning capability to account for changes during task execution, for example:

When NOT to Use Planning Pattern:

  • Task is straightforward enough for direct execution
  • Planning overhead exceeds execution time
  • Requirements are too vague for meaningful decomposition
  • Real-time response is required

7. Multi-Agent Collaboration – Coordinated Teamwork

The Pattern: The Multi-Agent Collaboration pattern involves creating a system of multiple specialized agents that work together in a structured way, through defined communication protocols and interaction models, allowing the group to deliver a solution that would be impossible for any single agent.

Sample Use Cases:

To illustrate the Multi-Agent Collaboration pattern, I used LangGraph to demonstrate three different collaboration models:

  1. Sequential Pipeline: Research paper analysis (Researcher -> Critic -> Synthesizer)
  2. Parallel & Synthesis: Product launch campaign (Marketing + Content + Analyst -> Coordinator)
  3. Multi-Perspective Debate: Code review system (Security + Performance + Quality -> Synthesizer)

Sequential Pipeline: Research -> Critic -> Synthesizer

Parallel & Synthesis: Marketing + Content + Analyst -> Coordinator

Multi-Perspective Debate: Security + Performance + Quality -> Synthesizer

Why It Matters: Real-life problems often require specialized agents working together in different collaboration structures. The LangGraph library makes it easy to implement any of these collaboration models.

State Management in Multi-Agent Systems:

LangGraph uses a shared state object that all agents read from and write to:
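
A hedged sketch of such a state object for the sequential research-review pipeline (field names and node bodies are illustrative):

```python
import operator
from typing import Annotated, List, TypedDict

from langgraph.graph import END, START, StateGraph


class ReviewState(TypedDict):
    paper: str
    # Each agent appends its findings; the reducer concatenates instead of overwriting.
    findings: Annotated[List[str], operator.add]
    final_report: str


def researcher(state: ReviewState) -> dict:
    return {"findings": ["Researcher: summary of the paper's key claims ..."]}


def critic(state: ReviewState) -> dict:
    return {"findings": ["Critic: methodological concerns ..."]}


def synthesizer(state: ReviewState) -> dict:
    return {"final_report": "\n".join(state["findings"])}


graph = StateGraph(ReviewState)
graph.add_node("researcher", researcher)
graph.add_node("critic", critic)
graph.add_node("synthesizer", synthesizer)
graph.add_edge(START, "researcher")
graph.add_edge("researcher", "critic")
graph.add_edge("critic", "synthesizer")
graph.add_edge("synthesizer", END)
pipeline = graph.compile()
```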

Agent Role Definition Best Practices:

You should define clear agent role boundaries to prevent agents from duplicating work or providing generic feedback, for example:

Coordination Challenges and Solutions:

| Challenge | Example | Solution |
| --- | --- | --- |
| Conflicting outputs | Security: "add auth" vs. Performance: "reduce overhead" | Synthesizer with explicit conflict resolution rules |
| Information loss | Key details lost between agents | Structured hand-off format with required fields |
| Infinite loops | Agents keep requesting revisions | Max iteration limits, improvement thresholds |
| Redundant work | Multiple agents analyze the same aspect | Clear scope boundaries, explicit "out of scope" |

Conflict Resolution in Synthesizer:

Structured Hand-off Between Agents:

When NOT to Use Multi-Agent Collaboration Pattern:

  • Single perspective on task execution is sufficient
  • Multi-agent coordination overhead exceeds benefits
  • Agents would have highly overlapping responsibilities
  • Task requires sharing deep context that’s hard to transfer between agents

Advanced Patterns (8-10) – Self-Improvement & Quality Assurance

Agentic systems need to remember information from past interactions not only to provide a coherent and personalized user experience, but also to learn and self-improve using the collected data. Design patterns in this group enable agents to remember past conversations, learn from them, and improve over time.

8. Memory Management – Context Across Interactions

The Pattern: The Memory Management pattern is very important, as it allows agents to keep track of conversations, personalize responses, and learn from interactions. Agents rely on three memory types:

  • Semantic Memory: Facts and knowledge (user preferences, domain knowledge)
  • Episodic Memory: Past experiences (conversation history, previous tickets)
  • Procedural Memory: Rules and strategies (company policies, protocols)

How It Works: LangChain offers ConversationBufferMemory to automatically inject the history of a single conversation into a prompt, while LangGraph enables advanced, long-term memory via the InMemoryStore:
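
A hedged sketch of the LangGraph store for the financial-advisor use case (the namespace layout and memory records are illustrative):

```python
from langgraph.store.memory import InMemoryStore

store = InMemoryStore()
user_ns = ("user-123", "memories")

# Semantic memory: durable facts about the user.
store.put(user_ns, "risk_profile",
          {"type": "semantic", "value": "prefers low-risk index funds"})

# Episodic memory: a record of a past interaction.
store.put(user_ns, "2024-06-01-session",
          {"type": "episodic", "value": "asked about rebalancing a 60/40 portfolio"})

# Procedural memory: rules the agent must always follow.
store.put(user_ns, "fiduciary_rule",
          {"type": "procedural", "value": "always disclose that this is not financial advice"})

# Retrieve what is stored for this user and inject the relevant items into the prompt.
for item in store.search(user_ns):
    print(item.key, item.value)
```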

Sample Use Case: Financial advisor chatbot:

  • Remembers user’s investment preferences (semantic memory)
  • Recalls past conversations (episodic memory)
  • Follows fiduciary duty rules (procedural memory)

Why It Matters: Without a memory mechanism, agents are stateless. They are unable to maintain conversational context, learn from experience, or personalize responses for users.

Agent Memory Architecture Decisions:

Your choice of storage backend for an agentic application depends on your use case requirements:

| Backend | Persistence | Scalability | Query Types | Best For |
| --- | --- | --- | --- | --- |
| InMemoryStore | Session only | Single instance | Key-value | Prototyping, demos |
| Redis | Configurable | High (clustered) | Key-value, TTL | Production, multi-instance |
| PostgreSQL + pgvector | Yes | High | SQL + semantic | Complex queries + similarity |
| Pinecone/Weaviate | Yes | Very high | Semantic only | Large-scale retrieval |
| SQLite | Yes | Low | SQL | Desktop apps, edge computing |

Memory Retrieval Strategies:

Keep in mind that how you retrieve memories can affect response quality, for example:

Memory Types Implementation:

Memory Consolidation:

The volume of episodic memories can grow significantly over time. Therefore, detailed episodic memories should be periodically compressed into summarized semantic knowledge to manage the size limits of the context window, for example:

When NOT to Use Memory Management Pattern:

  • Stateless interactions are acceptable in your use case (e.g., simple Q&A chat)
  • Privacy requirements prohibit storing user data
  • Context window size is sufficient for conversation history for your use case
  • Memory maintenance complexity exceeds benefits

9. Learning & Adapting – Self-Improvement Through Benchmarking

The Pattern: The Learning & Adapting pattern enables agents to evolve iteratively by autonomously improving their parameters or even their own code based on test results. Without this ability, their performance can degrade when faced with a novel task.

How It Works: The agent follows a Benchmark -> Analyze -> Improve cycle:

Sample Use Case: An agent that improves its own code through cycles of:

  1. Testing current implementation
  2. Analyzing performance and failures
  3. Generating improved version
  4. Selecting best version for next iteration

Scoring Formula: The best version is selected using a weighted combination of three factors.
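
The repository's exact factors and weights aren't reproduced here; the sketch below only shows the shape of such a weighted score, with hypothetical factors (correctness, performance, code quality) and weights:

```python
def score_version(correctness: float, performance: float, quality: float,
                  weights: tuple[float, float, float] = (0.6, 0.25, 0.15)) -> float:
    """Weighted combination of benchmark factors (names and weights are illustrative,
    not the framework's actual formula). All inputs are expected in [0, 1]."""
    w_c, w_p, w_q = weights
    return w_c * correctness + w_p * performance + w_q * quality
```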

Why It Matters: Demonstrates meta-learning – an agent that autonomously improves its behavior based on new data and iterations. The pattern is applicable to prompt engineering, hyperparameter tuning, and automated optimization.

Designing Effective Benchmarks:

Remember that the quality of your benchmarks determines the quality of agent learning. Therefore, a comprehensive benchmark suite should include different test types:

| Test Type | Purpose | Example |
| --- | --- | --- |
| Correctness | Does output match expected? | sort([3,1,2]) -> [1,2,3] |
| Edge cases | Handles boundaries? | Empty list, single element, duplicates, etc. |
| Performance | Meets speed requirements? | Sort 10,000 elements in < 100 ms |
| Robustness | Handles bad input? | None, wrong types, malformed data |
| Scale | Works at production volume? | 1M element sort |

Comprehensive Benchmark Example:
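
A hedged sketch of a benchmark harness for a sorting function, covering the test types above (thresholds illustrative):

```python
import time


def benchmark_sort(sort_fn) -> dict:
    """Run a candidate sort implementation through the test types listed above."""
    results = {}

    # Correctness
    results["correctness"] = sort_fn([3, 1, 2]) == [1, 2, 3]

    # Edge cases
    results["edge_cases"] = all([
        sort_fn([]) == [],
        sort_fn([1]) == [1],
        sort_fn([2, 2, 1]) == [1, 2, 2],
    ])

    # Performance: 10,000 elements under 100 ms
    data = list(range(10_000, 0, -1))
    start = time.perf_counter()
    sort_fn(data)
    results["performance"] = (time.perf_counter() - start) < 0.1

    # Robustness: bad input should raise a clear error, not crash the harness
    try:
        sort_fn(None)
        results["robustness"] = False
    except (TypeError, ValueError):
        results["robustness"] = True

    return results
```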

Avoiding Local Optima:

Keep in mind that the agent's learning loop can get stuck optimizing for specific failing tests while regressing on others. Therefore, you should always track overall agent performance, for example:

Scoring Formula Variations:

Different tasks typically need different scoring weights, for example:

When NOT to Use Learning & Adapting Pattern:

  • Task doesn't have measurable success criteria
  • Cost of creating benchmark tests exceeds the expected benefits
  • Solution space is too large for iterative search to succeed quickly
  • Human review is required anyway

10. Goal Setting & Monitoring – Quality Assurance Through Review Cycles

The Pattern: The Goal Setting & Monitoring pattern is about setting a specific goal for an agent and providing the means to track progress and determine whether the goal has been achieved.

How It Works: The pattern is demonstrated using two agents:

  • Developer Agent:
    • Analyzes requirements
    • Creates implementation plan
    • Writes Python code
    • Revises code based on feedback
  • Manager Agent:
    • Reviews code against requirements
    • Grades the code across 4 criteria (0-100 scale):
      • Requirements coverage (40%)
      • Code quality (30%)
      • Error handling (15%)
      • Code documentation (15%)
    • Provides prioritized, actionable feedback

Agents collaborate via the following iteration cycle:

  1. The Developer agent creates an implementation plan and generates code
  2. The Manager agent monitors progress, reviews the code, and provides feedback
  3. The Developer improves the code iteratively based on the Manager's feedback
  4. Grade-based progress tracking – iterations stop once the grade exceeds 85

Sample Use Case: REST API client implementation based on simple requirements: retry logic, rate limiting, error handling

Why It Matters: The pattern provides a standardized solution by giving the LLM a sense of purpose and a means of self-assessment. The automated code review and quality assurance use case shows how multi-agent collaboration enables complex quality-control workflows.

Designing Effective Grading Rubrics:

You should always remember that a vague prompt, such as a vague rubric, will lead to inconsistent grading. Therefore, be explicit in the prompt about what each score means:

Manager Prompt with Explicit Rubric:
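
For example (an illustrative rubric, not the framework's actual manager prompt; double braces escape literal braces for the prompt template):

```python
MANAGER_PROMPT = """You are reviewing code against the requirements below.
Grade each criterion from 0 to 100 using these anchors:

Requirements coverage (weight 40%):
  90-100: every stated requirement implemented and demonstrated
  70-89:  all core requirements implemented, minor gaps
  below 70: at least one core requirement missing

Code quality (weight 30%): structure, naming, idiomatic Python.
Error handling (weight 15%): failures caught, meaningful messages, no bare except.
Documentation (weight 15%): docstrings and usage notes present and accurate.

Return JSON:
{{"scores": {{"requirements": 0, "quality": 0, "error_handling": 0, "documentation": 0}},
  "weighted_total": 0,
  "feedback": ["<highest-priority fix first>"]}}

Requirements:
{requirements}

Code:
{code}
"""
```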

When NOT to Use Goal Setting & Monitoring Pattern:

  • Task doesn't have clear success criteria
  • Single-pass generation is sufficient to produce an acceptable response
  • Human review of the response is required regardless
  • Iteration cost (API calls, time) exceeds benefits

11. Retrieval-Augmented Generation (RAG) – Accessing Context-Specific Data

The Pattern: The RAG pattern enables an LLM to access and integrate external, current, and context-specific information to enhance the accuracy, relevance, and factual grounding of its responses.

How It Works:

  • The user request is analyzed to determine the question type (factual, comparison, overview, etc.)
  • Based on the question type, the system determines what information is required to answer it
  • A simple RAG process involves Retrieval (searching a knowledge base for relevant content) and Augmentation (adding the retrieved content, with citations, to the LLM prompt)
  • A GraphRAG process also leverages connections between entities (nodes in the knowledge graph), which allows the system to answer questions that require knowledge of relationships between different pieces of information or documents

Why It Matters:

  • Grounds responses in facts: Reduces hallucinations by providing source material
  • Enables domain-specific knowledge: Provides access to proprietary documents, databases, or specialized corporate information
  • Dynamic knowledge: Enables updating the knowledge base used by the LLM without retraining the model
  • Transparency: Can cite sources for the information used to generate answers

Sample Use Cases:

  • Question answering over enterprise documents
  • Customer support with knowledge base integration
  • Research assistants with access to a library of scientific papers
  • Legal or medical applications requiring factual accuracy of generated answers

Note: While RAG is a very important agentic design pattern, it is not implemented in the testing framework. I have a separate GraphRAG project that uses this design pattern to build an application demonstrating advanced retrieval techniques, including graph-based knowledge representation, multi-hop reasoning, and hybrid search strategies. You can find more information about this project in these blogs:

  1. Building a GraphRAG System – Core Infrastructure & Document Ingestion
  2. GraphRAG Part 2 – Cross-Doc & Sub-graph Extraction, Multi-Vector Entity Representation
  3. GraphRAG Part 3 – Intelligent MVR, Query Routing and Context Generation
  4. GraphRAG Part 4 – Community Detection and Embedding, Search and Hybrid Retrieval Integration

Why Separate?: GraphRAG requires significant infrastructure (object storage for documents, a graph database, vector databases, embedding models, indexing pipelines) that deserves dedicated attention. The GraphRAG project explores these components in depth.

Combining Patterns: Real-World Applications

Individual patterns are building blocks for agentic applications. In a real-world production application, you would typically combine multiple patterns:

Example: AI Code Assistant

Example: Research Analyst Agent

Pattern Combination Guidelines:

| Combination | When to Use | Watch Out For |
| --- | --- | --- |
| Routing + Specialized Chains | Multiple distinct request types | Misclassification cascades |
| Planning + Multi-Agent | Complex tasks needing expertise | Coordination overhead |
| Tool Use + Reflection | External data needs verification | Tool failures during reflection |
| Memory + Any Pattern | Personalization needed | Memory retrieval latency |
| Parallelization + Synthesis | Independent analyses to combine | Context window limits |

Anti-Pattern: Over-Engineering

Not every agentic application needs every design pattern. A simple FAQ chatbot using the Routing and Tool Use patterns (for knowledge base search) will likely be more effective than a complex multi-agent system.

Rule of thumb: Start with the simplest pattern that could work. Add complexity only when you have evidence it's needed.

Key Insights and Comparison Results

Cost and Latency Implications

Understanding the resource implications of each pattern helps in architecture decisions:

| Pattern | API Calls per Request | Relative Cost | Latency Impact |
| --- | --- | --- | --- |
| Chaining | N (number of steps) | Medium | Additive (+N calls) |
| Routing | 1 (classify) + handler | Low-Medium | +1 classification call |
| Parallelization | N (parallel tasks) | Medium-High | Reduced (concurrent) |
| Reflection | 2-6 (iterations × 2) | High | 2× per iteration |
| Tool Use | 1 + tool calls | Low-Medium | + tool latency |
| Planning | 2+ (plan + execute) | Medium | + planning phase |
| Multi-Agent | 3-10+ (varies) | Highest | Depends on topology |
| Memory | 1 + retrieval | Low-Medium | + retrieval latency |
| Learning & Adapting | 5-20+ (iterations) | Very High | Minutes to hours |
| Goal Monitoring | 4-14 (iterations × 2) | High | Minutes |

Cost Optimization Strategies

1. Model Tiering: Use cheaper LLM models for simpler tasks
2. Early Termination: Stop iterations when results are good enough
3. Caching: Store and reuse common results
4. Batching: Combine multiple small requests

Latency Budget Example

For a 3-second response time budget:

| Component | Budget | Design Pattern Choice |
| --- | --- | --- |
| Classification | 300 ms | Keyword matching (no LLM) |
| Main processing | 2000 ms | Single LLM call or 2-step chain |
| Tool calls | 500 ms | Max 1-2 fast tools |
| Synthesis | 200 ms | Lightweight post-processing |

For a 30-second budget, which is usually acceptable for complex tasks, you can afford:

  • A full planning phase
  • 2-3 reflection iterations
  • Multi-agent collaboration (sequential)

LLM Model Performance

After implementing these 10 patterns across multiple LLM providers, I noticed that model performance varies by pattern:

  • Code generation (Reflection, Learning & Adapting): GPT-4o and Claude Sonnet excel
  • Structured planning (Planning, Goal Monitoring): Claude Sonnet provides more detailed plans
  • Tool use: GPT-4o has more reliable function calling
  • Multi-step reasoning: All frontier models (GPT-4o, Claude Sonnet 4.5, Gemini 2.5 Pro) perform well

The Framework’s Value: Having a consistent testing harness made it possible to quickly prototype patterns, compare models objectively, and identify which combinations work best for specific tasks.

Conclusion and What’s Next

This project demonstrates that agentic design patterns are not just theoretical concepts. Like software design patterns, they’re practical solutions with measurable benefits. The testing framework implements 10 core patterns using LangChain and LangGraph, with the ability to run and compare them across OpenAI, Anthropic, and Google models.

Key Takeaways:

  1. Design patterns matter: Structured workflows significantly outperform single LLM calls
  2. Different tasks need different patterns: There is no one-size-fits-all solution
  3. Model choice matters: Different LLMs have different strengths
  4. Frameworks accelerate development: LangChain and LangGraph abstract model APIs and provide tools that dramatically speed up workflow development

In this project I focused on intra-application agent communication, where agents collaborate within a single application runtime. But building an agentic system often involves inter-application communication:

  • Inter-Agent Communication (A2A): Google’s protocol for agents from different systems to discover, negotiate, and coordinate with each other
  • Model Context Protocol (MCP): Anthropic’s protocol for agents to access tools and resources across application boundaries
  • Hybrid architectures: Combining intra-application patterns (like those implemented here) with inter-application protocols

Part 2 will explore these communication paradigms, their trade-offs, and how they complement the design patterns covered in this post.

The framework code is open source and shared in a GitHub repository. Try running the patterns, compare the models, and adapt them to your use cases. The future of AI applications is agentic – these patterns are your starting point.
