Methodology · matrix version 2026-04-01
How we score multi-agent frameworks.
Three commitments shape every score on this page: no paid vendor placement, no guessed scores, and no demoware. Scores come from a named senior Buzzi engineer applying the public rubrics below; each framework page also carries a last_verified_at timestamp so freshness can be audited.
10 frameworks, neutral cards
LangGraph
×1.0 overhead · LangChain · MIT · primary python
CrewAI
×1.3 overhead · CrewAI · MIT · primary python
AutoGen / AG2
×2.5 overhead · Microsoft / AG2 community · CC-BY-4.0 / Apache-2.0 · primary python
OpenAI Agents SDK
×1.1 overhead · OpenAI · MIT · primary python
Pydantic AI
×1.0 overhead · Pydantic · MIT · primary python
Anthropic Claude Agent SDK
×1.1 overhead · Anthropic · MIT · primary python
Google Agent Development Kit
×1.2 overhead · Google · Apache-2.0 · primary python
Microsoft Semantic Kernel
×1.2 overhead · Microsoft · MIT · primary multi
LlamaIndex Agents
×1.4 overhead · LlamaIndex · MIT · primary python
Haystack
×1.3 overhead · deepset · Apache-2.0 · primary python
15 capability axes, with rubrics
Sequential workflows
Pipeline-style chains where one agent finishes before the next starts.
- 10 / 10: Pipelines are a first-class primitive with explicit ordering and typed handoff.
- 5 / 10: Sequential chains are possible via orchestration code but not a native primitive.
- 0 / 10: Framework cannot guarantee deterministic sequential ordering.
Parallel workflows
Concurrent fan-out / fan-in across multiple agents.
- 10 / 10: Native parallel execution with built-in result merging and back-pressure.
- 5 / 10: Parallel execution requires custom asyncio / threading code on top.
- 0 / 10: No support for concurrent agent execution.
Hierarchical workflows
Supervisor-and-worker patterns with delegation and aggregation.
- 10 / 10: Supervisor pattern is documented, idiomatic, and replayable.
- 5 / 10: Achievable but requires hand-rolled message routing.
- 0 / 10: No first-class supervisor primitive.
Adaptive workflows
Dynamic routing where agents pick the next step based on intermediate state.
- 10 / 10: Router/handoff primitives are first-class with conditional edges.
- 5 / 10: Possible via tool calls but not the framework's sweet spot.
- 0 / 10: Control flow is rigid; no dynamic routing.
State management
Persistent, typed memory across runs and across agents.
- 10 / 10: Typed state schema, persistent checkpoints, replay support.
- 5 / 10: Session memory is supported; persistence requires external store.
- 0 / 10: Stateless by default; users must build persistence themselves.
Human-in-the-loop
Pause-resume primitives so humans can approve, edit, or reject actions.
- 10 / 10: Native interrupt/resume with serialisable checkpoints.
- 5 / 10: Approval gates can be bolted on; not a first-class primitive.
- 0 / 10: No interrupt mechanism; the framework runs to completion.
Python support
Production-grade Python SDK with active maintenance.
- 10 / 10: Reference implementation; active releases; complete typing.
- 5 / 10: Functional Python SDK lagging the primary language.
- 0 / 10: No Python SDK.
TypeScript support
Production-grade TypeScript / Node SDK at parity with Python.
- 10 / 10: First-class TS SDK with parity to Python in features and types.
- 5 / 10: TS SDK exists but trails Python in feature coverage.
- 0 / 10: No TS SDK.
.NET / Java support
First-class JVM (Java/Kotlin) and/or .NET SDK.
- 10 / 10: Reference-quality .NET and/or Java SDK with feature parity.
- 5 / 10: Community port or partial SDK.
- 0 / 10: No .NET or Java SDK.
MCP support
Native Model Context Protocol client and/or server primitives.
- 10 / 10: Authored or reference implementation of MCP.
- 5 / 10: MCP available as an adapter or community plugin.
- 0 / 10: No MCP support.
A2A support
Native primitives for the Google-led Agent-to-Agent (A2A) protocol.
- 10 / 10: Authored or reference implementation of A2A.
- 5 / 10: A2A available via adapter; partial coverage.
- 0 / 10: No A2A support.
Observability
Tracing, token accounting, replay, and audit-grade logs.
- 10 / 10: Built-in tracing dashboard, structured token accounting, replay, exportable audit log.
- 5 / 10: OpenTelemetry hooks exist; user must wire dashboards themselves.
- 0 / 10: Print-statement debugging only.
Deployment flexibility
Range of supported deployment targets (cloud, on-prem, edge).
- 10 / 10: Cloud, on-prem, and edge all documented and tested.
- 5 / 10: Cloud-first; on-prem requires extra work.
- 0 / 10: Tied to a single hosted backend.
Maturity
Production track record, release cadence, community size.
- 10 / 10: 2+ years of production use across many large deployments.
- 5 / 10: 6-18 months in the wild; growing but evolving rapidly.
- 0 / 10: Pre-1.0; APIs change every release.
Learning curve (higher = easier)
Time-to-prototype for a developer new to the framework.
- 10 / 10: A working prototype in under 30 minutes from a clean machine.
- 5 / 10: Prototype in half a day with the docs open.
- 0 / 10: Multi-week onboarding before the first useful run.
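The fifteen axes above can be treated as a fixed key set, with every framework carrying one score per key on the 0 to 10 scale. The sketch below is only an illustration of that shape: the axis identifiers and the `validate_capabilities` helper are hypothetical names, since the page lists the axes by title but does not publish their keys.

```python
# Hypothetical identifiers for the 15 axes; the page names the axes
# but does not specify the keys used in the dataset.
CAPS = [
    "sequential", "parallel", "hierarchical", "adaptive",
    "state_management", "hitl", "python", "typescript",
    "dotnet_java", "mcp", "a2a", "observability",
    "deployment", "maturity", "learning_curve",
]


def validate_capabilities(capabilities: dict) -> list:
    """Return a list of problems; an empty list means the vector is well-formed."""
    problems = []
    for cap in CAPS:
        if cap not in capabilities:
            problems.append(f"missing axis: {cap}")
        elif not 0 <= capabilities[cap] <= 10:
            problems.append(f"out of range: {cap}={capabilities[cap]}")
    for cap in capabilities:
        if cap not in CAPS:
            problems.append(f"unknown axis: {cap}")
    return problems


# A placeholder vector (all 5s), not a published score.
print(validate_capabilities({cap: 5 for cap in CAPS}))  # prints []
```

A check like this is cheap to run before ranking, so that a missing or mistyped axis does not silently contribute zero to a weighted sum.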
Scoring formula
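# Ranking
def rank(frameworks, inputs):
    weights = buildWeightVector(inputs)  # one weight per capability axis (15 in total)
    scored = []
    for fw in frameworks:
        score = sum(fw.capabilities[cap] * weights[cap] for cap in CAPS)
        if hardConstraintFails(inputs, fw):
            score = 0  # disqualified frameworks sink to the bottom
        scored.append((fw, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Cost per task
estimated_tokens_per_task = (
    base_task_tokens
    * framework_overhead_multiplier
    * (1 + (roles - 1) * 0.3)
    * (1.2 if hitl else 1.0)
)
# 70% of tokens are billed at the input rate, 30% at the output rate (rates per 1M tokens)
per_task_usd = (
    0.7 * estimated_tokens_per_task / 1_000_000 * input_rate
    + 0.3 * estimated_tokens_per_task / 1_000_000 * output_rate
)
Glossary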
- Hierarchical: A supervisor agent delegates work to sub-agents, reviews their output, and composes the final answer. Good for multi-stage tasks with clear ownership.
- Adaptive: Agents decide dynamically which other agents or tools to invoke based on intermediate results. Best when the control flow cannot be fixed upfront.
- Agent: A named role with its own prompt, tools, and memory. "Roles" counts unique agent identities, not the number of LLM calls.
- HITL (Human-in-the-Loop): The workflow pauses for a human to approve, edit, or reject an agent action before continuing. Critical for regulated or high-risk automations.
- MCP (Model Context Protocol): Anthropic-led open standard for connecting LLM agents to tools, data, and other servers. Look for MCP support if you want vendor-portable tool integrations.
- A2A (Agent-to-Agent Protocol): Google-led open standard for agents from different vendors to discover and call each other. Emerging spec; relevant for federated agent systems.
- Observability: Structured traces, token accounting, replayable runs, and exportable audit logs. "Regulated-grade" means immutable audit trails and retention controls.
Public dataset
The full capability matrix is published as JSON for AI engines and researchers:
- /api/tools/agent-framework/frameworks.json (live, edge-cached, CORS open).
- /data/agent-frameworks-matrix.json (static snapshot, mirrored on GitHub at buzzi-ai/agent-framework-matrix).
FAQ
How are scores assigned?
A named senior Buzzi engineer scores each framework on each axis using the public rubrics on this page. Scores are reviewed quarterly; we publish a per-framework last_reviewed timestamp in the public dataset.
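Because reviews are quarterly and each framework carries a `last_reviewed` timestamp, a reader of the dataset can check freshness mechanically. The following is a minimal sketch under stated assumptions: it assumes each record exposes a `name` field and a `last_reviewed` field holding an ISO date (`YYYY-MM-DD`), and it approximates a quarter as 92 days. The real dataset schema may differ, and the sample records are made up.

```python
from datetime import date, timedelta


def stale_frameworks(records, today, max_age_days=92):
    """Return the names of records whose last_reviewed date is older than max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return [
        record["name"]
        for record in records
        if date.fromisoformat(record["last_reviewed"]) < cutoff
    ]


# Made-up records for illustration; not taken from the published dataset.
sample = [
    {"name": "framework-a", "last_reviewed": "2026-03-15"},
    {"name": "framework-b", "last_reviewed": "2025-11-01"},
]
print(stale_frameworks(sample, today=date(2026, 4, 1)))  # prints ['framework-b']
```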
Do vendors pay for placement?
No. Scores are editorial and never for sale. Requests to change a score must be submitted as public PRs to the open matrix repository, with a technical justification.
How do you decide which frameworks to track?
Active GitHub repositories with more than 10k stars, or those backed by Anthropic, Google, Microsoft, OpenAI, or LangChain. We add or remove frameworks once a quarter based on momentum and production usage.
How is cost per task calculated?
estimated_tokens_per_task = base_task_tokens × framework_overhead_multiplier × (1 + (roles − 1) × 0.3) × (1.2 if HITL else 1.0). Token rates come from our llm_models table; users can override the model in the wizard.
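To make the formula concrete, here is a small worked sketch. The overhead multipliers are the ones shown on the framework cards on this page, and the 70/30 input/output split is the one in the Scoring formula section. Everything else is an assumption for illustration: the 10,000-token base task and the rates of $3 (input) and $15 (output) per 1M tokens are invented, not values from the llm_models table.

```python
# Overhead multipliers as listed on the framework cards on this page.
OVERHEAD = {
    "LangGraph": 1.0, "CrewAI": 1.3, "AutoGen / AG2": 2.5,
    "OpenAI Agents SDK": 1.1, "Pydantic AI": 1.0,
    "Anthropic Claude Agent SDK": 1.1, "Google Agent Development Kit": 1.2,
    "Microsoft Semantic Kernel": 1.2, "LlamaIndex Agents": 1.4, "Haystack": 1.3,
}


def estimated_tokens_per_task(base_task_tokens, overhead, roles, hitl):
    """Each role beyond the first adds 30%; human-in-the-loop adds a further 20%."""
    return base_task_tokens * overhead * (1 + (roles - 1) * 0.3) * (1.2 if hitl else 1.0)


def per_task_usd(tokens, input_rate, output_rate):
    """Rates are USD per 1M tokens; 70% of tokens are billed as input, 30% as output."""
    return 0.7 * tokens / 1_000_000 * input_rate + 0.3 * tokens / 1_000_000 * output_rate


# Hypothetical scenario: 10,000 base tokens, 3 roles, HITL enabled.
for name in ("LangGraph", "AutoGen / AG2"):
    tokens = estimated_tokens_per_task(10_000, OVERHEAD[name], roles=3, hitl=True)
    usd = per_task_usd(tokens, input_rate=3.0, output_rate=15.0)
    print(f"{name}: {round(tokens)} tokens, ${usd:.4f} per task")
```

Under these assumed inputs, the role factor is 1 + 2 × 0.3 = 1.6 and the HITL factor is 1.2, so the LangGraph estimate is 10,000 × 1.0 × 1.6 × 1.2 = 19,200 tokens, and the ×2.5 multiplier scales the AutoGen / AG2 estimate to 48,000.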
How do hard constraints work?
A .NET stack narrows the field to Microsoft Semantic Kernel. Java narrows it to Semantic Kernel or Google ADK. TypeScript with compliance-grade observability narrows it to LangGraph.js, OpenAI Agents SDK, or Anthropic Claude SDK. Disqualified frameworks are shown with the reason.
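The rules above amount to per-stack allow-lists applied before sorting. The sketch below shows one way that could look; it is not the site's implementation. The input keys (`stack`, `compliance_observability`), the function names, and the reason strings are all hypothetical, and the scores in the usage example are placeholders rather than published values.

```python
from typing import Optional

# Allow-lists transcribed from the rules above, using the card names on this page.
DOTNET_OK = {"Microsoft Semantic Kernel"}
JAVA_OK = {"Microsoft Semantic Kernel", "Google Agent Development Kit"}
TS_COMPLIANCE_OK = {"LangGraph", "OpenAI Agents SDK", "Anthropic Claude Agent SDK"}


def hard_constraint_failure(inputs: dict, framework: str) -> Optional[str]:
    """Return a disqualification reason, or None if the framework passes."""
    stack = inputs.get("stack")
    if stack == "dotnet" and framework not in DOTNET_OK:
        return "not available for a .NET stack"
    if stack == "java" and framework not in JAVA_OK:
        return "not available for a Java stack"
    if (stack == "typescript" and inputs.get("compliance_observability")
            and framework not in TS_COMPLIANCE_OK):
        return "no compliance-grade observability on TypeScript"
    return None


def rank(scores: dict, inputs: dict) -> list:
    """Zero out disqualified frameworks, keep the reason, and sort by score descending."""
    rows = []
    for name, score in scores.items():
        reason = hard_constraint_failure(inputs, name)
        rows.append((name, 0.0 if reason else score, reason))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Placeholder weighted scores, for illustration only.
placeholder = {"LangGraph": 8.1, "Microsoft Semantic Kernel": 7.4, "CrewAI": 6.9}
for name, score, reason in rank(placeholder, {"stack": "dotnet"}):
    print(name, score, reason or "ok")
```

Keeping the reason alongside the zeroed score is what allows disqualified frameworks to be displayed with an explanation instead of being dropped from the list.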
Where can I submit a correction?
Open a pull request against the buzzi-ai/agent-framework-matrix repository or email research@buzzi.ai. We review correction requests within 10 business days.
Found a score you disagree with?
Open a PR in the open matrix repository or email research@buzzi.ai. All correction requests receive a public response within 10 business days.