Methodology · matrix version 2026-04-01
How we score multi-agent frameworks.
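Three commitments shape every score on this page: no paid placement for vendors, no inferred scores, and no demoware. Scores come from named Buzzi senior engineers who apply the public rubric below, and every framework page carries a last_verified_at timestamp so you can audit freshness.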
10 frameworks, neutral cards
LangGraph
×1.0 overhead · LangChain · MIT · primary Python
CrewAI
×1.3 overhead · CrewAI · MIT · primary Python
AutoGen / AG2
×2.5 overhead · Microsoft / AG2 community · CC-BY-4.0 / Apache-2.0 · primary Python
OpenAI Agents SDK
×1.1 overhead · OpenAI · MIT · primary Python
Pydantic AI
×1.0 overhead · Pydantic · MIT · primary Python
Anthropic Claude Agent SDK
×1.1 overhead · Anthropic · MIT · primary Python
Google Agent Development Kit
×1.2 overhead · Google · Apache-2.0 · primary Python
Microsoft Semantic Kernel
×1.2 overhead · Microsoft · MIT · primary multi-language
LlamaIndex Agents
×1.4 overhead · LlamaIndex · MIT · primary Python
Haystack
×1.3 overhead · deepset · Apache-2.0 · primary Python
15 capability axes, with rubrics
Sequential workflows
Pipeline-style chains where one agent finishes before the next starts.
- 10 / 10
- Pipelines are a first-class primitive with explicit ordering and typed handoff.
- 5 / 10
- Sequential chains are possible via orchestration code but not a native primitive.
- 0 / 10
- Framework cannot guarantee deterministic sequential ordering.
Parallel workflows
Concurrent fan-out / fan-in across multiple agents.
- 10 / 10
- Native parallel execution with built-in result merging and back-pressure.
- 5 / 10
- Parallel execution requires custom asyncio / threading code on top.
- 0 / 10
- No support for concurrent agent execution.
Hierarchical workflows
Supervisor-and-worker patterns with delegation and aggregation.
- 10 / 10
- Supervisor pattern is documented, idiomatic, and replayable.
- 5 / 10
- Achievable but requires hand-rolled message routing.
- 0 / 10
- No first-class supervisor primitive.
Adaptive workflows
Dynamic routing where agents pick the next step based on intermediate state.
- 10 / 10
- Router/handoff primitives are first-class with conditional edges.
- 5 / 10
- Possible via tool calls but not the framework's sweet spot.
- 0 / 10
- Control flow is rigid; no dynamic routing.
State management
Persistent, typed memory across runs and across agents.
- 10 / 10
- Typed state schema, persistent checkpoints, replay support.
- 5 / 10
- Session memory is supported; persistence requires external store.
- 0 / 10
- Stateless by default; users must build persistence themselves.
Human-in-the-loop
Pause-resume primitives so humans can approve, edit, or reject actions.
- 10 / 10
- Native interrupt/resume with serialisable checkpoints.
- 5 / 10
- Approval gates can be bolted on; not a first-class primitive.
- 0 / 10
- No interrupt mechanism — the framework runs to completion.
Python support
Production-grade Python SDK with active maintenance.
- 10 / 10
- Reference implementation; active releases; complete typing.
- 5 / 10
- Functional Python SDK lagging the primary language.
- 0 / 10
- No Python SDK.
TypeScript support
Production-grade TypeScript / Node SDK at parity with Python.
- 10 / 10
- First-class TS SDK with parity to Python in features and types.
- 5 / 10
- TS SDK exists but trails Python in feature coverage.
- 0 / 10
- No TS SDK.
.NET / Java support
First-class JVM (Java/Kotlin) and/or .NET SDK.
- 10 / 10
- Reference-quality .NET and/or Java SDK with feature parity.
- 5 / 10
- Community port or partial SDK.
- 0 / 10
- No .NET or Java SDK.
MCP support
Native Model Context Protocol client and/or server primitives.
- 10 / 10
- Authored or reference implementation of MCP.
- 5 / 10
- MCP available as an adapter or community plugin.
- 0 / 10
- No MCP support.
A2A support
Native primitives for Google's Agent-to-Agent (A2A) protocol.
- 10 / 10
- Authored or reference implementation of A2A.
- 5 / 10
- A2A available via adapter; partial coverage.
- 0 / 10
- No A2A support.
Observability
Tracing, token accounting, replay, and audit-grade logs.
- 10 / 10
- Built-in tracing dashboard, structured token accounting, replay, exportable audit log.
- 5 / 10
- OpenTelemetry hooks exist; users must wire up dashboards themselves.
- 0 / 10
- Print-statement debugging only.
Deployment flexibility
Range of supported deployment targets (cloud, on-prem, edge).
- 10 / 10
- Cloud, on-prem, and edge all documented and tested.
- 5 / 10
- Cloud-first; on-prem requires extra work.
- 0 / 10
- Tied to a single hosted backend.
Maturity
Production track record, release cadence, community size.
- 10 / 10
- 2+ years of production use across many large deployments.
- 5 / 10
- 6-18 months in the wild; growing but evolving rapidly.
- 0 / 10
- Pre-1.0; APIs change every release.
Learning curve (higher = easier)
Time-to-prototype for a developer new to the framework.
- 10 / 10
- A working prototype in under 30 minutes from a clean machine.
- 5 / 10
- Prototype in half a day with the docs open.
- 0 / 10
- Multi-week onboarding before the first useful run.
Scoring formula
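# Ranking
def rank(inputs, frameworks):
    weights = buildWeightVector(inputs)  # one weight per capability axis (15), derived from the user's inputs
    scored = []
    for fw in frameworks:
        score = sum(fw.capabilities[cap] * weights[cap] for cap in CAPS)
        if hardConstraintFails(inputs, fw):
            score = 0  # a failed hard constraint disqualifies the framework
        scored.append((fw, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)  # highest score first

# Cost per task
def cost_per_task(base_task_tokens, framework_overhead_multiplier, roles, hitl, input_rate, output_rate):
    estimated_tokens_per_task = (base_task_tokens
                                 * framework_overhead_multiplier
                                 * (1 + (roles - 1) * 0.3)  # +30% per role beyond the first
                                 * (1.2 if hitl else 1.0))  # +20% when HITL is enabled
    tokens = estimated_tokens_per_task
    # input_rate and output_rate are USD per 1M tokens; tokens are split 70% input, 30% output
    per_task_usd = (0.7 * tokens / 1_000_000 * input_rate
                    + 0.3 * tokens / 1_000_000 * output_rate)
    return per_task_usd
Glossary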
- Hierarchical
- A supervisor agent delegates work to sub-agents, reviews their output, and composes the final answer. Good for multi-stage tasks with clear ownership.
- Adaptive
- Agents decide dynamically which other agents or tools to invoke based on intermediate results. Best when the control flow cannot be fixed upfront.
- Agent
- A named role with its own prompt, tools, and memory. "Roles" counts unique agent identities, not the number of LLM calls.
- HITL (Human-in-the-Loop)
- The workflow pauses for a human to approve, edit, or reject an agent action before continuing. Critical for regulated or high-risk automations.
- MCP (Model Context Protocol)
- Anthropic-led open standard for connecting LLM agents to tools, data, and other servers. Look for MCP support if you want vendor-portable tool integrations.
- A2A (Agent-to-Agent Protocol)
- Google-led open standard for agents from different vendors to discover and call each other. Emerging spec; relevant for federated agent systems.
- Observability
- Structured traces, token accounting, replayable runs, and exportable audit logs. "Regulated-grade" means immutable audit trails and retention controls.
Public dataset
The full capability matrix is published as JSON for AI engines and researchers:
- /api/tools/agent-framework/frameworks.json: live, edge-cached, CORS-open.
- /data/agent-frameworks-matrix.json: static snapshot, mirrored to buzzi-ai/agent-framework-matrix on GitHub.
FAQ
How are scores assigned?
Named Buzzi senior engineers rate each framework on each axis using the public rubric on this page. Scores are reviewed quarterly, and each framework's last_reviewed timestamp is published in the public dataset.
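Since each framework's last_reviewed timestamp is published, a consumer of the dataset can check freshness mechanically. The sketch below is a minimal example under assumptions: each record is taken to be a dict with hypothetical `name` and `last_reviewed` keys, where `last_reviewed` is an ISO 8601 timestamp with a UTC offset (the real dataset schema may differ), and 92 days stands in for "one quarter". The sample records are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Roughly one calendar quarter; this threshold is an assumption, not a published rule.
REVIEW_INTERVAL = timedelta(days=92)


def overdue_reviews(records, now):
    """Return names of frameworks whose last_reviewed timestamp is older than one quarter.

    Assumes each record is a dict with hypothetical "name" and "last_reviewed"
    keys, the latter an ISO 8601 string that includes a UTC offset.
    """
    overdue = []
    for record in records:
        reviewed = datetime.fromisoformat(record["last_reviewed"])
        if now - reviewed > REVIEW_INTERVAL:
            overdue.append(record["name"])
    return overdue


if __name__ == "__main__":
    now = datetime(2026, 4, 1, tzinfo=timezone.utc)
    # Hypothetical sample records, not real review dates.
    sample = [
        {"name": "LangGraph", "last_reviewed": "2026-03-15T00:00:00+00:00"},
        {"name": "Haystack", "last_reviewed": "2025-11-01T00:00:00+00:00"},
    ]
    print(overdue_reviews(sample, now))  # ['Haystack']
```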
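Do vendors pay for placement?
No. Scores are editorial and never for sale. Requests to change a score must be submitted as a public PR to the open matrix repository, with a technical justification.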
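How do you decide which frameworks to track?
We track active GitHub repositories with at least 10,000 stars, or frameworks backed by Anthropic, Google, Microsoft, OpenAI, or LangChain. Once a quarter, we add or retire frameworks based on momentum and production use.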
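How is cost per task calculated?
estimated_tokens_per_task = base_task_tokens × framework_overhead_multiplier × (1 + (roles − 1) × 0.3) × (1.2 with HITL, 1.0 otherwise). Those tokens are then priced at a 70% input / 30% output split using per-million-token rates from the llm_models table; users can override the model in the wizard.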
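A worked example makes the cost formula concrete. This sketch follows the published formula, including its 70/30 input/output split; the base token count and the per-million-token rates in the usage lines are made-up illustrative values, not figures from the llm_models table, and the function names are hypothetical.

```python
def estimated_tokens_per_task(base_task_tokens, framework_overhead_multiplier, roles, hitl):
    """Token estimate per task, following the published formula."""
    role_factor = 1 + (roles - 1) * 0.3  # each role beyond the first adds 30%
    hitl_factor = 1.2 if hitl else 1.0   # human-in-the-loop adds 20%
    return base_task_tokens * framework_overhead_multiplier * role_factor * hitl_factor


def per_task_usd(tokens, input_rate, output_rate):
    """USD per task; rates are USD per 1M tokens, tokens split 70% input / 30% output."""
    return 0.7 * tokens / 1_000_000 * input_rate + 0.3 * tokens / 1_000_000 * output_rate


if __name__ == "__main__":
    # Illustrative inputs: 10,000 base tokens, 3 roles, HITL on; overheads from the cards.
    for name, overhead in [("LangGraph", 1.0), ("AutoGen / AG2", 2.5)]:
        tokens = estimated_tokens_per_task(10_000, overhead, roles=3, hitl=True)
        cost = per_task_usd(tokens, input_rate=3.0, output_rate=15.0)
        print(f"{name}: {tokens:,.0f} tokens, ${cost:.4f} per task")
```

Because every factor is multiplicative, a ×2.5 framework uses 2.5 times the tokens of a ×1.0 framework for the same task, so its per-task cost scales by the same factor at any model rate.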
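How do hard constraints work?
A .NET stack narrows the field to Microsoft Semantic Kernel. Java narrows it to Semantic Kernel or Google ADK. TypeScript with compliance-grade observability narrows it to LangGraph.js, the OpenAI Agents SDK, or the Anthropic Claude Agent SDK. Disqualified frameworks are still shown, together with the reason they were excluded.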
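The ranking step and the hard constraints combine as a weighted sum plus a disqualification check. This is a sketch under stated assumptions: the `Framework` record shape, the capability scores, and the weights are hypothetical (not the matrix's actual values), and only the .NET rule from this answer is encoded. As in the published formula, a framework that fails a hard constraint scores 0 but stays in the list with its reason.

```python
from dataclasses import dataclass


@dataclass
class Framework:
    name: str
    capabilities: dict  # axis name -> 0..10 rubric score (hypothetical shape)


def hard_constraint_failure(inputs, fw):
    """Return a disqualification reason, or None. Only the .NET rule is sketched."""
    if inputs.get("stack") == ".NET" and fw.name != "Microsoft Semantic Kernel":
        return ".NET stack narrows the field to Microsoft Semantic Kernel"
    return None


def rank(inputs, frameworks, weights):
    """Weighted-sum score per framework; disqualified frameworks score 0 but remain listed."""
    results = []
    for fw in frameworks:
        score = sum(fw.capabilities.get(cap, 0) * w for cap, w in weights.items())
        reason = hard_constraint_failure(inputs, fw)
        if reason is not None:
            score = 0
        results.append((fw.name, score, reason))
    return sorted(results, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical scores and weights for illustration only.
    frameworks = [
        Framework("LangGraph", {"State management": 10, "MCP support": 5}),
        Framework("Microsoft Semantic Kernel", {"State management": 5, "MCP support": 5}),
    ]
    weights = {"State management": 0.6, "MCP support": 0.4}
    for row in rank({"stack": ".NET"}, frameworks, weights):
        print(row)
```

Keeping the reason alongside the zeroed score lets the results list explain why a higher-scoring framework dropped to the bottom, rather than silently filtering it out.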
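Where can I submit a correction?
Open a pull request against the buzzi-ai/agent-framework-matrix repository, or email research@buzzi.ai. We review correction requests within 10 business days.
Found a score you disagree with?
Open a PR on the open matrix repository or email research@buzzi.ai. Every correction request receives a public response within 10 business days.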