Methodology · matrix version 2026-04-01

How we score multi-agent frameworks.

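Three commitments shape every score on this page: no vendor listing fees, no inferred scores, no demoware. Scores come from named senior engineers at Buzzi applying the public rubric below. Each framework page also carries a last_verified_at timestamp so that freshness can be audited.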

10 frameworks, neutral cards

  • LangGraph

    ×1.0 overhead

    LangChain · MIT · primary python

    repo·docs

  • CrewAI

    ×1.3 overhead

    CrewAI · MIT · primary python

    repo·docs

  • AutoGen / AG2

    ×2.5 overhead

    Microsoft / AG2 community · CC-BY-4.0 / Apache-2.0 · primary python

    repo·docs

  • OpenAI Agents SDK

    ×1.1 overhead

    OpenAI · MIT · primary python

    repo·docs

  • Pydantic AI

    ×1.0 overhead

    Pydantic · MIT · primary python

    repo·docs

  • Anthropic Claude Agent SDK

    ×1.1 overhead

    Anthropic · MIT · primary python

    repo·docs

  • Google Agent Development Kit

    ×1.2 overhead

    Google · Apache-2.0 · primary python

    repo·docs

  • Microsoft Semantic Kernel

    ×1.2 overhead

    Microsoft · MIT · primary multi

    repo·docs

  • LlamaIndex Agents

    ×1.4 overhead

    LlamaIndex · MIT · primary python

    repo·docs

  • Haystack

    ×1.3 overhead

    deepset · Apache-2.0 · primary python

    repo·docs

15 capability axes, with rubrics

  1. Sequential workflows

    Pipeline-style chains where one agent finishes before the next starts.

    10 / 10
    Pipelines are a first-class primitive with explicit ordering and typed handoff.
    5 / 10
    Sequential chains are possible via orchestration code but not a native primitive.
    0 / 10
    Framework cannot guarantee deterministic sequential ordering.
  2. Parallel workflows

    Concurrent fan-out / fan-in across multiple agents.

    10 / 10
    Native parallel execution with built-in result merging and back-pressure.
    5 / 10
    Parallel execution requires custom asyncio / threading code on top.
    0 / 10
    No support for concurrent agent execution.
  3. Hierarchical workflows

    Supervisor-and-worker patterns with delegation and aggregation.

    10 / 10
    Supervisor pattern is documented, idiomatic, and replayable.
    5 / 10
    Achievable but requires hand-rolled message routing.
    0 / 10
    No first-class supervisor primitive.
  4. Adaptive workflows

    Dynamic routing where agents pick the next step based on intermediate state.

    10 / 10
    Router/handoff primitives are first-class with conditional edges.
    5 / 10
    Possible via tool calls but not the framework's sweet spot.
    0 / 10
    Control flow is rigid; no dynamic routing.
  5. State management

    Persistent, typed memory across runs and across agents.

    10 / 10
    Typed state schema, persistent checkpoints, replay support.
    5 / 10
    Session memory is supported; persistence requires external store.
    0 / 10
    Stateless by default; users must build persistence themselves.
  6. Human-in-the-loop

    Pause-resume primitives so humans can approve, edit, or reject actions.

    10 / 10
    Native interrupt/resume with serialisable checkpoints.
    5 / 10
    Approval gates can be bolted on; not a first-class primitive.
    0 / 10
    No interrupt mechanism — the framework runs to completion.
  7. Python support

    Production-grade Python SDK with active maintenance.

    10 / 10
    Reference implementation; active releases; complete typing.
    5 / 10
    Functional Python SDK lagging the primary language.
    0 / 10
    No Python SDK.
  8. TypeScript support

    Production-grade TypeScript / Node SDK at parity with Python.

    10 / 10
    First-class TS SDK with parity to Python in features and types.
    5 / 10
    TS SDK exists but trails Python in feature coverage.
    0 / 10
    No TS SDK.
  9. .NET / Java support

    First-class JVM (Java/Kotlin) and/or .NET SDK.

    10 / 10
    Reference-quality .NET and/or Java SDK with feature parity.
    5 / 10
    Community port or partial SDK.
    0 / 10
    No .NET or Java SDK.
  10. MCP support

    Native Model Context Protocol client and/or server primitives.

    10 / 10
    Authored or reference implementation of MCP.
    5 / 10
    MCP available as an adapter or community plugin.
    0 / 10
    No MCP support.
  11. A2A support

    Native Agent-to-Agent (Google) protocol primitives.

    10 / 10
    Authored or reference implementation of A2A.
    5 / 10
    A2A available via adapter; partial coverage.
    0 / 10
    No A2A support.
  12. Observability

    Tracing, token accounting, replay, and audit-grade logs.

    10 / 10
    Built-in tracing dashboard, structured token accounting, replay, exportable audit log.
    5 / 10
    OpenTelemetry hooks exist; user must wire dashboards themselves.
    0 / 10
    Print-statement debugging only.
  13. Deployment flexibility

    Range of supported deployment targets (cloud, on-prem, edge).

    10 / 10
    Cloud, on-prem, and edge all documented and tested.
    5 / 10
    Cloud-first; on-prem requires extra work.
    0 / 10
    Tied to a single hosted backend.
  14. Maturity

    Production track record, release cadence, community size.

    10 / 10
    2+ years of production use across many large deployments.
    5 / 10
    6-18 months in the wild; growing but evolving rapidly.
    0 / 10
    Pre-1.0; APIs change every release.
  15. Learning curve (higher = easier)

    Time-to-prototype for a developer new to the framework.

    10 / 10
    A working prototype in under 30 minutes from a clean machine.
    5 / 10
    Prototype in half a day with the docs open.
    0 / 10
    Multi-week onboarding before the first useful run.

Scoring formula

# Ranking
weights = buildWeightVector(inputs)        # 15 weights (one per axis), derived from user input
scored = []
for fw in frameworks:
    score = sum(fw.capabilities[cap] * weights[cap] for cap in CAPS)
    if hardConstraintFails(inputs, fw):
        score = 0                          # disqualified frameworks score 0
    scored.append((fw, score))
return sortDesc(scored)

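# Cost per task
estimated_tokens_per_task = (
    base_task_tokens
    * framework_overhead_multiplier
    * (1 + (roles - 1) * 0.3)
    * (1.2 if hitl else 1.0)
)
# 70 / 30 input / output token split; rates are per 1M tokens
per_task_usd = (0.7 * estimated_tokens_per_task / 1_000_000 * input_rate
              + 0.3 * estimated_tokens_per_task / 1_000_000 * output_rate)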
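
As a worked example of the cost formula above, the sketch below plugs in the CrewAI overhead multiplier (×1.3) from the framework cards. The base token budget, the per-1M-token rates, and the function names are hypothetical values chosen for illustration; they are not taken from the llm_models table.

```python
# Illustrative sketch of the cost-per-task formula.
# Base tokens and rates below are made-up example values.

def estimated_tokens_per_task(base_task_tokens: float,
                              overhead_multiplier: float,
                              roles: int,
                              hitl: bool) -> float:
    """Scale a base token budget by framework overhead, role count, and HITL."""
    role_factor = 1 + (roles - 1) * 0.3      # each role beyond the first adds 30%
    hitl_factor = 1.2 if hitl else 1.0       # human-in-the-loop adds 20%
    return base_task_tokens * overhead_multiplier * role_factor * hitl_factor


def per_task_usd(tokens: float, input_rate: float, output_rate: float) -> float:
    """Price tokens with a 70/30 input/output split; rates are USD per 1M tokens."""
    millions = tokens / 1_000_000
    return 0.7 * millions * input_rate + 0.3 * millions * output_rate


# CrewAI (x1.3 overhead), 3 roles, HITL enabled, 10,000 base tokens (hypothetical)
tokens = estimated_tokens_per_task(10_000, 1.3, roles=3, hitl=True)
cost = per_task_usd(tokens, input_rate=3.0, output_rate=15.0)  # hypothetical rates
print(f"{tokens:.0f} tokens, ${cost:.4f} per task")
```

With these inputs the estimate comes to roughly 24,960 tokens and about $0.16 per task. Because the role term is linear, going from 3 to 4 roles raises the role factor from 1.6 to 1.9 rather than compounding.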

Glossary

Hierarchical
A supervisor agent delegates work to sub-agents, reviews their output, and composes the final answer. Good for multi-stage tasks with clear ownership.
Adaptive
Agents decide dynamically which other agents or tools to invoke based on intermediate results. Best when the control flow cannot be fixed upfront.
Agent
A named role with its own prompt, tools, and memory. "Roles" counts unique agent identities, not the number of LLM calls.
HITL (Human-in-the-Loop)
The workflow pauses for a human to approve, edit, or reject an agent action before continuing. Critical for regulated or high-risk automations.
MCP (Model Context Protocol)
Anthropic-led open standard for connecting LLM agents to tools, data, and other servers. Look for MCP support if you want vendor-portable tool integrations.
A2A (Agent-to-Agent Protocol)
Google-led open standard for agents from different vendors to discover and call each other. Emerging spec; relevant for federated agent systems.
Observability
Structured traces, token accounting, replayable runs, and exportable audit logs. "Regulated-grade" means immutable audit trails and retention controls.

Public dataset

The full capability matrix is published as JSON for AI engines and researchers:

FAQ

  1. How are scores assigned?

    Named senior engineers at Buzzi rate each framework on each axis using the public rubric on this page. Scores are reviewed quarterly, and a per-framework last_reviewed timestamp is published in the public dataset.

  2. Do vendors pay to be listed?

    No. Scores are editorial and are never sold. Score-change requests must be submitted as public PRs to the open matrix repository, with a technical justification.

  3. How do you decide which frameworks to track?

    Active GitHub repositories with 10,000+ stars, or those backed by Anthropic, Google, Microsoft, OpenAI, or LangChain. Once per quarter we add or retire frameworks based on momentum and production use.

  4. How is cost per task calculated?

    estimated_tokens_per_task = base_task_tokens × framework_overhead_multiplier × (1 + (roles − 1) × 0.3) × (1.2 with HITL, otherwise 1.0). Token rates come from the llm_models table, and users can override the model in the wizard.

  5. How do hard constraints work?

    .NET stacks narrow to Microsoft Semantic Kernel. Java narrows to Semantic Kernel or Google ADK. TypeScript with compliance-grade observability narrows to LangGraph.js, OpenAI Agents SDK, or Anthropic Claude SDK. Disqualified frameworks are shown along with the reason.

  6. Where can I submit a correction?

    Open a pull request against the buzzi-ai/agent-framework-matrix repository, or email research@buzzi.ai. We review correction requests within 10 business days.
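
The hard-constraint rules in FAQ 5 combine with the ranking formula under "Scoring formula" roughly as sketched below. This is a minimal illustration, not the production selector: the stack keys, the Framework class, the two-axis weights, and the capability scores are all hypothetical, and the allow-lists map the FAQ's names onto the card names used earlier on this page (for example, "LangGraph.js" onto "LangGraph").

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Allow-lists paraphrased from the hard-constraint FAQ answer.
# The dictionary keys are hypothetical identifiers.
ALLOWED = {
    "dotnet": {"Microsoft Semantic Kernel"},
    "java": {"Microsoft Semantic Kernel", "Google Agent Development Kit"},
    "typescript_compliance": {
        "LangGraph",
        "OpenAI Agents SDK",
        "Anthropic Claude Agent SDK",
    },
}


@dataclass
class Framework:
    name: str
    capabilities: Dict[str, float]   # axis name -> rubric score, 0 to 10


def disqualification_reason(stack: Optional[str], fw: Framework) -> Optional[str]:
    """Return why a framework fails the stack constraint, or None if it passes."""
    if stack in ALLOWED and fw.name not in ALLOWED[stack]:
        return f"not supported for stack '{stack}'"
    return None


def rank(frameworks: List[Framework],
         weights: Dict[str, float],
         stack: Optional[str] = None) -> List[Tuple[str, float, Optional[str]]]:
    """Weighted sum per framework; disqualified ones score 0 and keep a reason."""
    scored = []
    for fw in frameworks:
        score = sum(fw.capabilities[cap] * w for cap, w in weights.items())
        reason = disqualification_reason(stack, fw)
        if reason is not None:
            score = 0.0
        scored.append((fw.name, score, reason))
    return sorted(scored, key=lambda row: row[1], reverse=True)


# Made-up capability scores on two axes, purely for illustration.
frameworks = [
    Framework("LangGraph", {"state": 9, "hitl": 9}),
    Framework("Microsoft Semantic Kernel", {"state": 7, "hitl": 6}),
]
weights = {"state": 0.6, "hitl": 0.4}
ranking = rank(frameworks, weights, stack="dotnet")
for name, score, reason in ranking:
    print(name, round(score, 2), reason or "")
```

Keeping the reason alongside the zeroed score, rather than dropping the row, matches the stated behaviour that disqualified frameworks remain visible with an explanation.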

Found a score you disagree with?

Open a PR on the open matrix repository or email research@buzzi.ai. Every correction request receives a public response within 10 business days.
