Arena Agent Mode Review

Arena Agent Mode is a community-driven platform for real-world evaluation and ranking of AI models, where users can chat with, compare, and vote on models.

Shipped June 5, 2026. Tags: ai, freemium, product-hunt

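Highlights

1. Launched Agent Mode and the Agent Arena leaderboard on June 4, 2026 to benchmark agentic AI performance.

2. Offers a freemium pricing model, with a Pro Tier available for $20 per month.

3. Evaluates frontier AI models such as GPT-5.5, Claude Opus 4.7, and Gemini 3.1 Pro on complex, multi-step tasks.

4. Secured $250 million in funding from investors including Initialized Capital, reaching Unicorn status.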

About Arena Agent Mode

Business model: Freemium SaaS

Headquarters: San Francisco, USA

Founded: 2022

Team size: 51-100

Funding stage: Unicorn

Total raised: $250 million

Platforms: Web, Mobile

Target users: AI researchers, developers, and businesses

Pricing Plans

Free Tier

Free

• Access to basic features
• Limited model comparisons

Pro Tier

$20/mo

• Unlimited model comparisons
• Advanced analytics
• Priority support

Leadership

Amit Kumar, Co-Founder

Michael Siebel, Co-Founder

Paul O'Connor, Co-Founder

Investors

Initialized Capital, Felicis Ventures, Founders Fund

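What Is Arena Agent Mode?

Arena Agent Mode is an AI evaluation tool developed by Arena (formerly LMArena) that lets AI enthusiasts, researchers, and businesses benchmark frontier large language models (LLMs) on complex, multi-step tasks. It supports real-world evaluation and community-driven ranking of AI models across modalities including text, code, and image generation. Launched on June 4, 2026, Agent Mode measures agentic performance specifically by giving models access to tools such as web search, a file system, bash, and image generation across millions of live sessions. Arena follows a responsible AI policy built on transparency, security, and human oversight, and complies with evolving regulations such as the EU AI Act and the Data Act. Customer data is protected and is not used to train models outside the customer's instance; AI request inputs and outputs are logged only for auditing and performance tuning.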

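Key Features of Arena Agent Mode

Arena Agent Mode offers a suite of features for rigorous evaluation and comparison of AI models, with an emphasis on real-world performance and community input. The platform goes beyond a simple chat interface to include advanced benchmarking and data-driven insights.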

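• Real-world evaluation of AI models on complex, multi-step tasks.
• Community-driven rankings that form public leaderboards for LLMs, image models, and code models.
• Side-by-side comparison of AI models through blind battles to reduce bias.
• Evaluation across multiple modalities, including text, code, image, video, vision, document, and search.
• Measurement of agentic performance using tools such as web search, a file system, bash, and image generation.
• Access to the Arena Leaderboard Dataset, released on April 2, 2026, covering frontier AI capabilities.
• Enterprise AI evaluation services with robust governance and legal review.
• A user-friendly interface for chatting with models and voting on their outputs.
• Multimodal Max, Arena's model router powered by more than 5 million community votes, introduced on May 5, 2026.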
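
As an illustration of how blind pairwise votes can be turned into a leaderboard, here is a minimal sketch using an Elo-style rating update. This review does not describe Arena's actual ranking method, so the rating scheme, the constants `K` and `BASE`, and the model names are all assumptions made for the example.

```python
from collections import defaultdict

K = 32         # update step size (assumed value, not taken from Arena)
BASE = 1000.0  # starting rating for every model (assumed value)


def expected_score(r_a, r_b):
    """Probability that a model rated r_a beats one rated r_b under Elo."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def update_ratings(ratings, model_a, model_b, outcome):
    """Apply one vote. outcome: 1.0 if model_a wins, 0.0 if model_b wins, 0.5 for a tie."""
    r_a, r_b = ratings[model_a], ratings[model_b]
    e_a = expected_score(r_a, r_b)
    ratings[model_a] = r_a + K * (outcome - e_a)
    ratings[model_b] = r_b + K * ((1.0 - outcome) - (1.0 - e_a))


def build_leaderboard(votes):
    """Process votes in order and return (model, rating) pairs, best first."""
    ratings = defaultdict(lambda: BASE)
    for model_a, model_b, outcome in votes:
        update_ratings(ratings, model_a, model_b, outcome)
    return sorted(ratings.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical votes: (model shown as A, model shown as B, outcome for A)
votes = [
    ("model-x", "model-y", 1.0),
    ("model-y", "model-z", 1.0),
    ("model-x", "model-z", 0.5),
]
leaderboard = build_leaderboard(votes)
for name, rating in leaderboard:
    print(f"{name}: {rating:.1f}")
```

Because voters do not know which model produced which answer, each vote carries only the pairwise outcome, and that outcome is all an update of this kind needs. Sequential Elo depends on vote order, so a production leaderboard might instead fit all votes jointly.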

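Who Should Use Arena Agent Mode?

Arena Agent Mode is designed for a broad audience involved in developing, researching, and applying artificial intelligence, with tools for both individual exploration and enterprise-level evaluation.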

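• AI enthusiasts and researchers: to access and contribute to community-driven leaderboards and explore how different models reason.
• Developers and product teams: to benchmark models, evaluate model performance across modalities, and validate significant changes.
• Enterprises and model labs: to use AI evaluation services grounded in human feedback, ensure compliance, and maximize agent effectiveness.
• Founders and indie hackers: to brainstorm and generate ideas by comparing independent answers from multiple AI models.
• Creative professionals: to evaluate image generation and other multimodal AI capabilities.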

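Arena Agent Mode Pricing and Plans

Arena Agent Mode operates on a freemium business model, with a free tier for basic access and paid tiers for expanded features and usage. The pricing structure is designed to serve both individual users and larger organizations that need broader evaluation capabilities.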

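• Free Tier: Free. Includes core features, 5 schemas, 3 datasets, and 1 seat. No credit card required.
• Pro Tier: $20 per month. Offers enhanced features and resources.
• Starter Tier (LLM Benchmark Plans): €29 per month. Includes 500 credits per month, 20 schemas, 10 datasets, and 5 seats.
• Professional Tier (LLM Benchmark Plans): €99 per month. Includes 2,000 credits per month, unlimited schemas and datasets, unlimited seats, and API/MCP access.
• Enterprise Tier (LLM Benchmark Plans): €299 per month. Includes 10,000 credits per month, unlimited schemas and datasets, and unlimited seats.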
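
To compare the credit-based plans, it can help to work out the effective price per credit. The short sketch below uses only the prices and monthly credit allowances listed above. The dictionary layout and function name are illustrative, and the calculation assumes every credit is used each month.

```python
# Monthly price (EUR) and credit allowance for the LLM Benchmark Plans listed above.
plans = {
    "Starter": {"price_eur": 29, "credits": 500},
    "Professional": {"price_eur": 99, "credits": 2000},
    "Enterprise": {"price_eur": 299, "credits": 10000},
}


def cost_per_credit(plan):
    """Effective price of one credit, assuming the full allowance is used."""
    return plan["price_eur"] / plan["credits"]


for name, plan in plans.items():
    print(f"{name}: EUR {cost_per_credit(plan):.4f} per credit")
# Starter: EUR 0.0580 per credit
# Professional: EUR 0.0495 per credit
# Enterprise: EUR 0.0299 per credit
```

On these figures the per-credit price falls by roughly half between the Starter and Enterprise tiers. Whether the higher tiers are worthwhile depends on how many credits are actually consumed.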

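Similar Tools

Arena Agent Mode vs. Competitors

Arena Agent Mode differentiates itself among AI evaluation platforms through real-world, community-driven evaluation and a specific focus on agentic AI performance. While other platforms offer comparison tools, Arena's proprietary causal tracing methodology for ranking agent performance gives it a distinct advantage.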

Yupp

Yupp allows users to compare responses from over 500 AI models side-by-side and aggregates user preferences into a community-driven leaderboard called VIBE.

Similar to Arena Agent Mode, Yupp focuses on community-driven evaluation and side-by-side comparison of various AI models, including LLMs and image generation models, with a public leaderboard reflecting user preferences. Yupp also offers a unique DePIN model where users can receive credits for their feedback.

SEAL Showdown (by Scale AI)

SEAL Showdown provides a public leaderboard built on millions of real-world conversations and human preferences from a diverse global user base, offering demographically segmented insights.

Like Arena Agent Mode, SEAL Showdown emphasizes real-world evaluation and community feedback to rank AI models, but it distinguishes itself by focusing on representative rankings from a global user base with demographic segmentation.

CodeLens.AI

CodeLens.AI specializes in comparing how multiple top LLMs handle actual code tasks, featuring side-by-side comparisons and community voting on winners to shape its leaderboard.

CodeLens.AI is a direct competitor for the 'code models' aspect of Arena Agent Mode, offering a similar community-driven comparison and voting mechanism specifically tailored for evaluating AI models on coding tasks.

Sneos.com

Sneos.com is a multi-chat AI platform that enables instant side-by-side comparisons of responses from various LLMs to a single prompt, with shareable URLs for research and collaboration.

While Sneos.com offers direct side-by-side comparison of AI model outputs similar to Arena Agent Mode, its primary emphasis is on facilitating individual or collaborative research and decision-making through shareable comparisons, rather than a community-voted public leaderboard.
