Methodology · matrix version 2026-04-01
How we score multi-agent frameworks.
Three commitments shape every score on this page: no paid vendor placement, no guessed scores, no demoware. Scores come from Buzzi's designated senior engineers applying the public rubric below. Each framework page also carries a last_verified_at timestamp for freshness audits.
10 frameworks, neutral cards
LangGraph
×1.0 overhead · LangChain · MIT · primary: Python
CrewAI
×1.3 overhead · CrewAI · MIT · primary: Python
AutoGen / AG2
×2.5 overhead · Microsoft / AG2 community · CC-BY-4.0 / Apache-2.0 · primary: Python
OpenAI Agents SDK
×1.1 overhead · OpenAI · MIT · primary: Python
Pydantic AI
×1.0 overhead · Pydantic · MIT · primary: Python
Anthropic Claude Agent SDK
×1.1 overhead · Anthropic · MIT · primary: Python
Google Agent Development Kit
×1.2 overhead · Google · Apache-2.0 · primary: Python
Microsoft Semantic Kernel
×1.2 overhead · Microsoft · MIT · primary: multi-language
LlamaIndex Agents
×1.4 overhead · LlamaIndex · MIT · primary: Python
Haystack
×1.3 overhead · deepset · Apache-2.0 · primary: Python
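For readers scripting against these cards, the overhead multipliers above can be encoded as a simple lookup table. An illustrative sketch only; the published JSON dataset is the canonical source:

```python
# Token-overhead multipliers per framework, as listed on the cards above.
OVERHEAD = {
    "LangGraph": 1.0,
    "CrewAI": 1.3,
    "AutoGen / AG2": 2.5,
    "OpenAI Agents SDK": 1.1,
    "Pydantic AI": 1.0,
    "Anthropic Claude Agent SDK": 1.1,
    "Google Agent Development Kit": 1.2,
    "Microsoft Semantic Kernel": 1.2,
    "LlamaIndex Agents": 1.4,
    "Haystack": 1.3,
}

# Frameworks ordered from leanest to heaviest token overhead.
leanest = sorted(OVERHEAD, key=OVERHEAD.get)
```

The multiplier feeds directly into the cost-per-task formula further down the page.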
15 capability axes, with rubrics
Sequential workflows
Pipeline-style chains where one agent finishes before the next starts.
- 10 / 10
- Pipelines are a first-class primitive with explicit ordering and typed handoff.
- 5 / 10
- Sequential chains are possible via orchestration code but not a native primitive.
- 0 / 10
- Framework cannot guarantee deterministic sequential ordering.
Parallel workflows
Concurrent fan-out / fan-in across multiple agents.
- 10 / 10
- Native parallel execution with built-in result merging and back-pressure.
- 5 / 10
- Parallel execution requires custom asyncio / threading code on top.
- 0 / 10
- No support for concurrent agent execution.
Hierarchical workflows
Supervisor-and-worker patterns with delegation and aggregation.
- 10 / 10
- Supervisor pattern is documented, idiomatic, and replayable.
- 5 / 10
- Achievable but requires hand-rolled message routing.
- 0 / 10
- No first-class supervisor primitive.
Adaptive workflows
Dynamic routing where agents pick the next step based on intermediate state.
- 10 / 10
- Router/handoff primitives are first-class with conditional edges.
- 5 / 10
- Possible via tool calls but not the framework's sweet spot.
- 0 / 10
- Control flow is rigid; no dynamic routing.
State management
Persistent, typed memory across runs and across agents.
- 10 / 10
- Typed state schema, persistent checkpoints, replay support.
- 5 / 10
- Session memory is supported; persistence requires external store.
- 0 / 10
- Stateless by default; users must build persistence themselves.
Human-in-the-loop
Pause-resume primitives so humans can approve, edit, or reject actions.
- 10 / 10
- Native interrupt/resume with serialisable checkpoints.
- 5 / 10
- Approval gates can be bolted on; not a first-class primitive.
- 0 / 10
- No interrupt mechanism — the framework runs to completion.
Python support
Production-grade Python SDK with active maintenance.
- 10 / 10
- Reference implementation; active releases; complete typing.
- 5 / 10
- Functional Python SDK lagging the primary language.
- 0 / 10
- No Python SDK.
TypeScript support
Production-grade TypeScript / Node SDK at parity with Python.
- 10 / 10
- First-class TS SDK with parity to Python in features and types.
- 5 / 10
- TS SDK exists but trails Python in feature coverage.
- 0 / 10
- No TS SDK.
.NET / Java support
First-class JVM (Java/Kotlin) and/or .NET SDK.
- 10 / 10
- Reference-quality .NET and/or Java SDK with feature parity.
- 5 / 10
- Community port or partial SDK.
- 0 / 10
- No .NET or Java SDK.
MCP support
Native Model Context Protocol client and/or server primitives.
- 10 / 10
- Authored or reference implementation of MCP.
- 5 / 10
- MCP available as an adapter or community plugin.
- 0 / 10
- No MCP support.
A2A support
Native Agent-to-Agent (Google) protocol primitives.
- 10 / 10
- Authored or reference implementation of A2A.
- 5 / 10
- A2A available via adapter; partial coverage.
- 0 / 10
- No A2A support.
Observability
Tracing, token accounting, replay, and audit-grade logs.
- 10 / 10
- Built-in tracing dashboard, structured token accounting, replay, exportable audit log.
- 5 / 10
- OpenTelemetry hooks exist; user must wire dashboards themselves.
- 0 / 10
- Print-statement debugging only.
Deployment flexibility
Range of supported deployment targets (cloud, on-prem, edge).
- 10 / 10
- Cloud, on-prem, and edge all documented and tested.
- 5 / 10
- Cloud-first; on-prem requires extra work.
- 0 / 10
- Tied to a single hosted backend.
Maturity
Production track record, release cadence, community size.
- 10 / 10
- 2+ years of production use across many large deployments.
- 5 / 10
- 6-18 months in the wild; growing but evolving rapidly.
- 0 / 10
- Pre-1.0; APIs change every release.
Learning curve (higher = easier)
Time-to-prototype for a developer new to the framework.
- 10 / 10
- A working prototype in under 30 minutes from a clean machine.
- 5 / 10
- Prototype in half a day with the docs open.
- 0 / 10
- Multi-week onboarding before the first useful run.
Scoring formula
# Ranking
weights = buildWeightVector(inputs)  # 15 weights, one per capability axis, derived from user input
scored = []
for fw in frameworks:
    score = sum(fw.capabilities[cap] * weights[cap] for cap in CAPS)
    if hardConstraintFails(inputs, fw):
        score = 0
    scored.append((fw, score))
return sortDesc(scored)
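A minimal runnable sketch of that ranking pass, with hypothetical helper implementations and sample data (the real weight vector and constraint logic live server-side, and only three of the 15 axes are shown):

```python
from dataclasses import dataclass, field

CAPS = ["sequential", "parallel", "state"]  # illustrative subset of the 15 axes

@dataclass
class Framework:
    name: str
    capabilities: dict = field(default_factory=dict)

def build_weight_vector(inputs):
    # Hypothetical: per-axis weights from the wizard's answers, defaulting to 1.0.
    return {cap: inputs.get(cap, 1.0) for cap in CAPS}

def hard_constraint_fails(inputs, fw):
    # Hypothetical: a capability the user's stack strictly requires.
    required = inputs.get("required_capability")
    return required is not None and fw.capabilities.get(required, 0) == 0

def rank(frameworks, inputs):
    weights = build_weight_vector(inputs)
    scored = []
    for fw in frameworks:
        score = sum(fw.capabilities[cap] * weights[cap] for cap in CAPS)
        if hard_constraint_fails(inputs, fw):
            score = 0  # disqualified frameworks sink to the bottom
        scored.append((fw.name, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

fws = [
    Framework("A", {"sequential": 10, "parallel": 5, "state": 10}),
    Framework("B", {"sequential": 5, "parallel": 10, "state": 0}),
]
ranking = rank(fws, {"parallel": 2.0, "required_capability": "state"})
# B is disqualified (state == 0), so A ranks first despite B's parallel strength.
```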
# Cost per task
estimated_tokens_per_task = (
    base_task_tokens
    * framework_overhead_multiplier
    * (1 + (roles - 1) * 0.3)
    * (1.2 if hitl else 1.0)
)
per_task_usd = (
    0.7 * estimated_tokens_per_task / 1_000_000 * input_rate
    + 0.3 * estimated_tokens_per_task / 1_000_000 * output_rate
)
Glossary
- Hierarchical
- A supervisor agent delegates work to sub-agents, reviews their output, and composes the final answer. Good for multi-stage tasks with clear ownership.
- Adaptive
- Agents decide dynamically which other agents or tools to invoke based on intermediate results. Best when the control flow cannot be fixed upfront.
- Agent
- A named role with its own prompt, tools, and memory. "Roles" counts unique agent identities, not the number of LLM calls.
- HITL (Human-in-the-Loop)
- The workflow pauses for a human to approve, edit, or reject an agent action before continuing. Critical for regulated or high-risk automations.
- MCP (Model Context Protocol)
- Anthropic-led open standard for connecting LLM agents to tools, data, and other servers. Look for MCP support if you want vendor-portable tool integrations.
- A2A (Agent-to-Agent Protocol)
- Google-led open standard for agents from different vendors to discover and call each other. Emerging spec; relevant for federated agent systems.
- Observability
- Structured traces, token accounting, replayable runs, and exportable audit logs. "Regulated-grade" means immutable audit trails and retention controls.
Public dataset
The full capability matrix is published as JSON for AI engines and researchers:
- /api/tools/agent-framework/frameworks.json — live, edge-cached, CORS-open.
- /data/agent-frameworks-matrix.json — static snapshot mirrored to GitHub at buzzi-ai/agent-framework-matrix.
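A minimal sketch of consuming the live endpoint. The host and the record field names here are assumptions (only the path is given above); check the JSON itself for the actual schema:

```python
import json
from urllib.request import urlopen

def load_matrix(url="https://buzzi.ai/api/tools/agent-framework/frameworks.json"):
    # Fetch the live, edge-cached dataset; CORS is open, so browsers work too.
    with urlopen(url) as resp:
        return json.load(resp)

def framework_names(matrix):
    # Assumed schema: a list of framework records, each with a "name" field.
    return [fw["name"] for fw in matrix]
```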
FAQ
How are scores assigned?
Buzzi's designated senior engineers rate each framework on each axis using the public rubric on this page. Scores are reviewed quarterly; we publish a per-framework last_reviewed timestamp in the public dataset.
Do vendors pay for placement?
No. Scores are editorial and are never for sale. Requests to change a score must be filed as public PRs against the open matrix repository, with a technical justification.
How do you decide which frameworks to track?
An active GitHub repository with 10k+ stars, or backing from Anthropic, Google, Microsoft, OpenAI, or LangChain. We add or retire frameworks once per quarter based on momentum and production usage.
How is cost per task calculated?
estimated_tokens_per_task = base_task_tokens × framework_overhead_multiplier × (1 + (roles − 1) × 0.3) × (1.2 if HITL, else 1.0). Token rates come from our llm_models table; users can override the model in the wizard.
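As a worked sketch of that arithmetic (the base token count and per-million rates below are made up for illustration; real rates come from the llm_models table):

```python
def cost_per_task(base_task_tokens, overhead, roles, hitl, input_rate, output_rate):
    # estimated_tokens_per_task, per the formula above.
    tokens = (
        base_task_tokens
        * overhead
        * (1 + (roles - 1) * 0.3)   # +30% per additional agent role
        * (1.2 if hitl else 1.0)    # +20% when human-in-the-loop is on
    )
    # Rates are USD per 1M tokens; a 70/30 input/output token split is assumed.
    return 0.7 * tokens / 1_000_000 * input_rate + 0.3 * tokens / 1_000_000 * output_rate

# Example: 4,000 base tokens, CrewAI's ×1.3 overhead, 3 roles, HITL on,
# $3 per 1M input tokens and $15 per 1M output tokens → roughly $0.066 per task.
usd = cost_per_task(4_000, 1.3, roles=3, hitl=True, input_rate=3.0, output_rate=15.0)
```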
How do hard constraints work?
.NET stacks narrow to Microsoft Semantic Kernel. Java narrows to Semantic Kernel or Google ADK. TypeScript with compliance-grade observability narrows to LangGraph.js, the OpenAI Agents SDK, or the Anthropic Claude SDK. Disqualified frameworks are shown with the reason.
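A sketch of that narrowing logic as a predicate (the function and parameter names are illustrative, not the wizard's actual API):

```python
def passes_hard_constraints(fw_name, stack, compliance_observability=False):
    # Narrowing rules from the answer above; names match the card titles on this page.
    if stack == ".NET":
        return fw_name == "Microsoft Semantic Kernel"
    if stack == "Java":
        return fw_name in {"Microsoft Semantic Kernel", "Google Agent Development Kit"}
    if stack == "TypeScript" and compliance_observability:
        return fw_name in {"LangGraph", "OpenAI Agents SDK", "Anthropic Claude Agent SDK"}
    return True  # no hard constraint triggered
```

Any framework failing the predicate is zeroed out by the ranking pass rather than silently hidden, so the disqualification reason can still be surfaced.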
Where do I file corrections?
Open a pull request against the buzzi-ai/agent-framework-matrix repository, or email research@buzzi.ai. We review correction requests within 10 business days.
Found a score you disagree with?
Open a PR against the open matrix repository or email research@buzzi.ai. Every correction request gets a public response within 10 business days.
Back to the selector