Is Step 3.7 Flash free?

Step 3.7 Flash follows a freemium model and has a free tier. Usage beyond the free tier is billed per token: $0.00020 per 1k input tokens and $0.00115 per 1k output tokens.

What are the main features of Step 3.7 Flash?

Key features of Step 3.7 Flash include its 198-billion-parameter sparse MoE architecture, native image and video understanding via a 1.8B-parameter vision encoder, a 256k context window, three selectable reasoning levels, and reliable interaction with external APIs and tools. It also supports NVIDIA inference stacks and offers an Advisor Mode for cost-efficient agentic operations.

How does Step 3.7 Flash compare to alternatives?

Step 3.7 Flash stands out for its native multimodal support (images and video), an area where it outperforms competitors such as DeepSeek V4 Flash. It also performs strongly on coding, scoring 56.3 on SWE-Bench PRO, and it leads the ClawEval-1.1 benchmark for tool orchestration. Its Advisor Mode is positioned as a lower-cost way to reach performance comparable to models such as Claude Opus 4.6.


Step 3.7 Flash Review

Name: Step 3.7 Flash
Availability: Online only
Author: Stork.AI

Step 3.7 Flash is a high-efficiency, multimodal Mixture-of-Experts (MoE) vision-language model designed for real-world agentic workflows, developed by StepFun.

Shipped May 31, 2026 · Tags: ai, freemium, product-hunt


Why it matters

1. Released on May 28, 2026, Step 3.7 Flash is a 198-billion-parameter sparse MoE model.
2. It features a 256k context window and activates approximately 11 billion parameters per token during inference.
3. The model achieved a second-place finish on SWE-Bench PRO with a score of 56.3.
4. Step 3.7 Flash leads the ClawEval-1.1 benchmark with a score of 67.1 for workflow integrity and tool orchestration.

Stork’s verdict on Step 3.7 Flash

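Step 3.7 Flash delivers high-efficiency multimodal agents with a large 256k context window, but its 198B-parameter scale implies significant deployment overhead: although only about 11B parameters are active per token, all of the weights still have to be loaded when self-hosting.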

Step 3.7 Flash reviewed by Stork AI · stork.ai/en/step-3-7-flash

About Step 3.7 Flash

Founded: 2023
API available: Yes, public API
API documentation: Available


What is Step 3.7 Flash?

Step 3.7 Flash is a high-efficiency, multimodal Mixture-of-Experts (MoE) vision-language model developed by StepFun for AI developers and enterprise users who build and deploy AI agents. It targets agentic workflows that need multimodal perception, search, and multi-step reasoning at production scale across digital environments. Released on May 28, 2026, the model has 198 billion parameters in total, but because the MoE is sparse, only about 11 billion are activated per token during inference, which keeps throughput high. It pairs a 196B-parameter language backbone with a 1.8B-parameter vision encoder for native image and video understanding. It supports a 256k context window and offers three selectable reasoning levels (low, medium, and high) to trade off speed, cost, and depth of reasoning.
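To put the sparsity figures in perspective, the short calculation below uses only the numbers quoted above (198B total, about 11B active per token, a 196B backbone plus a 1.8B vision encoder). It assumes the common rule of thumb of roughly 2 FLOPs per parameter used per token in a forward pass; it is an order-of-magnitude sketch, not a description of how StepFun's router selects experts.

```python
# Back-of-the-envelope arithmetic for a sparse MoE, using only the figures
# quoted in this review. This does NOT model StepFun's actual expert routing.

TOTAL_PARAMS = 198e9       # total parameters (quoted as ~198B)
ACTIVE_PARAMS = 11e9       # parameters activated per token (quoted as ~11B)
LANGUAGE_BACKBONE = 196e9  # language backbone size (quoted)
VISION_ENCODER = 1.8e9     # vision encoder size (quoted)


def active_fraction(active: float, total: float) -> float:
    """Share of the model's weights used in one token's forward pass."""
    return active / total


def forward_flops_per_token(params: float) -> float:
    """Rule-of-thumb assumption: about 2 FLOPs per parameter used per token."""
    return 2.0 * params


if __name__ == "__main__":
    frac = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
    print(f"Active fraction per token: {frac:.1%}")
    print(f"Backbone + encoder: {(LANGUAGE_BACKBONE + VISION_ENCODER) / 1e9:.1f}B")
    dense = forward_flops_per_token(TOTAL_PARAMS)
    sparse = forward_flops_per_token(ACTIVE_PARAMS)
    print(f"Approx. compute ratio, dense 198B vs. sparse: {dense / sparse:.0f}x")
```

Under this rule of thumb, the roughly 5.6% active fraction keeps per-token compute close to that of an ~11B dense model, while the ~197.8B sum of backbone and encoder matches the 198B headline figure.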


Key Features of Step 3.7 Flash

Step 3.7 Flash pairs a multimodal Mixture-of-Experts architecture with features aimed at high-performance agentic applications. Together they let agents perceive, reason, and act across text, images, and video, and across a range of tools and environments.

198-billion-parameter sparse Mixture-of-Experts (MoE) model, activating approximately 11 billion parameters per token.
Native image and video understanding via an integrated 1.8B-parameter vision encoder.
Supports a 256k context window for extensive information processing.
Offers three selectable reasoning levels (low, medium, high) to optimize for speed, cost, or cognitive depth.
Reliable interaction with external APIs, browsers, terminals, and Office tools for complex task execution.
Open-source availability under the Apache 2.0 License on platforms like Hugging Face and ModelScope.
Full inference stack support from NVIDIA, including availability as an NVIDIA NIM inference microservice.
Advisor Mode functionality, allowing a smaller executor model to escalate complex tasks to a larger advisor model for cost efficiency.
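The Advisor Mode idea, where a cheaper executor handles most work and escalates only hard cases to a larger advisor, can be sketched as a simple routing function. This is a minimal illustration of that general escalation pattern, not StepFun's API or its actual escalation logic: the `Attempt` type, the self-reported confidence signal, the 0.7 threshold, and the stub models are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Attempt:
    """One model response plus a self-reported confidence (hypothetical signal)."""
    answer: str
    confidence: float  # assumed to lie in [0, 1]


# Hypothetical model interface: takes a task description, returns an Attempt.
ModelFn = Callable[[str], Attempt]


def run_with_advisor(task: str, executor: ModelFn, advisor: ModelFn,
                     threshold: float = 0.7) -> Tuple[str, bool]:
    """Try the cheaper executor first and escalate to the advisor only when needed.

    Returns (answer, escalated)."""
    draft = executor(task)
    if draft.confidence >= threshold:
        return draft.answer, False
    # Escalate: hand the advisor the original task plus the executor's draft.
    reviewed = advisor(f"{task}\n\nExecutor draft:\n{draft.answer}")
    return reviewed.answer, True


# Stub models for illustration only; a real setup would call actual model endpoints.
def stub_executor(task: str) -> Attempt:
    hard = "refactor" in task.lower()
    return Attempt(answer=f"executor: {task}", confidence=0.4 if hard else 0.9)


def stub_advisor(task: str) -> Attempt:
    return Attempt(answer="advisor: revised plan", confidence=0.95)


if __name__ == "__main__":
    print(run_with_advisor("Rename a variable", stub_executor, stub_advisor))
    print(run_with_advisor("Refactor the auth module", stub_executor, stub_advisor))
```

In this pattern only the escalated tasks pay the larger model's price, which is where a cost saving would come from; the threshold sets the trade-off between cost and how often the advisor is consulted.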


Who Should Use Step 3.7 Flash?

Step 3.7 Flash is engineered for professionals and organizations requiring advanced multimodal AI capabilities for agentic workflows, particularly those focused on automation, complex data interpretation, and application development.

AI Developers: For building and deploying next-generation AI applications, including multimodal agents with reliable tool use and orchestration.
Enterprise Users: For parsing massive financial reports, running multi-step search loops with cross-source verification, and operating concurrent coding agents in high-throughput pipelines.
Engineers/Researchers: For agentic coding, independently tracing multi-file repositories, identifying bugs from issue reports, and generating functional code patches.
Content Creators: For applications requiring text-to-speech, voice cloning, creative writing, and advanced language learning functionalities.
Individuals Seeking Personal AI Assistance: For knowledge acquisition, information finding, and general multimodal interaction.
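For workloads such as parsing very large financial reports, a client may need to check whether a document fits the 256k context window and split it if not. The sketch below assumes "256k" means 256,000 tokens, estimates tokens with a rough 4-characters-per-token heuristic rather than the model's real tokenizer, and reserves a hypothetical 16k tokens for instructions and the answer; all names are illustrative.

```python
from typing import List

CONTEXT_TOKENS = 256_000   # assumption: "256k" read as 256,000 tokens
CHARS_PER_TOKEN = 4        # rough heuristic for English text, not StepFun's tokenizer
RESERVED_TOKENS = 16_000   # hypothetical headroom for instructions and the model's answer


def estimate_tokens(text: str) -> int:
    """Crude token estimate: characters / CHARS_PER_TOKEN, rounded up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def split_for_context(text: str, context_tokens: int = CONTEXT_TOKENS,
                      reserved_tokens: int = RESERVED_TOKENS) -> List[str]:
    """Split text on blank lines into chunks that fit the estimated token budget."""
    budget_chars = (context_tokens - reserved_tokens) * CHARS_PER_TOKEN
    if budget_chars <= 0:
        raise ValueError("reserved_tokens must be smaller than context_tokens")
    chunks: List[str] = []
    current = ""
    for para in text.split("\n\n"):
        candidate = para if not current else current + "\n\n" + para
        if len(candidate) <= budget_chars:
            current = candidate  # paragraph still fits in the current chunk
            continue
        if current:
            chunks.append(current)  # close the full chunk
        # Hard-split a single paragraph that is larger than the whole budget.
        while len(para) > budget_chars:
            chunks.append(para[:budget_chars])
            para = para[budget_chars:]
        current = para
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    report = "\n\n".join(f"Section {i}: " + "revenue " * 2_000 for i in range(200))
    parts = split_for_context(report)
    print(len(parts), [estimate_tokens(p) for p in parts])
```

Splitting on paragraph boundaries keeps related text together; a production pipeline would swap the character heuristic for the provider's own token counts.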


Step 3.7 Flash Pricing & Plans

Step 3.7 Flash uses a freemium, usage-based pricing model: users start on a free tier and are then billed by token consumption. Rate limits apply to concurrency, requests per minute (RPM), and tokens per minute (TPM), and requests time out after 10 minutes. Users who need higher limits can contact platform@stepfun.com.

Freemium: A free tier is available for initial access and limited usage.
Step 3.7 Flash: Input $0.00020 per 1k tokens; output $0.00115 per 1k tokens.

For comparison, the price list also includes other StepFun models:

Step 1 (32K): Input $0.00205 per 1k tokens; output $0.00959 per 1k tokens.
Step 3.5 Flash: Input $0.000096 per 1k tokens; output $0.000288 per 1k tokens.
Step 3.5 Flash 2603: Input $0.000100 per 1k tokens; output $0.000300 per 1k tokens.
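Because billing is per 1k tokens, the pay-as-you-go cost of a request is (input tokens × input price + output tokens × output price) / 1000. The sketch below applies the listed prices using `Decimal` to avoid floating-point rounding; the dictionary keys and function name are illustrative, and free-tier allowances and rate limits are ignored.

```python
from decimal import Decimal

# USD per 1,000 tokens as (input, output), copied from the price list.
PRICES_PER_1K = {
    "Step 3.7 Flash": (Decimal("0.00020"), Decimal("0.00115")),
    "Step 3.5 Flash": (Decimal("0.000096"), Decimal("0.000288")),
    "Step 3.5 Flash 2603": (Decimal("0.000100"), Decimal("0.000300")),
    "Step 1 (32K)": (Decimal("0.00205"), Decimal("0.00959")),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Pay-as-you-go cost of one request in USD, ignoring any free-tier allowance."""
    input_price, output_price = PRICES_PER_1K[model]
    billed = Decimal(input_tokens) * input_price + Decimal(output_tokens) * output_price
    return billed / 1000  # prices are quoted per 1k tokens


if __name__ == "__main__":
    # Example workload: 100k input tokens and 20k output tokens.
    for name in ("Step 3.7 Flash", "Step 3.5 Flash"):
        print(f"{name}: ${request_cost(name, 100_000, 20_000):.4f}")
```

On Step 3.7 Flash, output tokens cost 5.75 times as much as input tokens (0.00115 / 0.00020), so output-heavy agent loops cost disproportionately more than input-heavy ones.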


Step 3.7 Flash vs Competitors

Step 3.7 Flash is positioned as a leading multimodal agentic model, competing in the 'Flash' model market against established and emerging AI solutions. Its strengths lie in native multimodal perception, robust tool orchestration, and competitive performance in coding and visual intelligence benchmarks.

Google Gemini (as an agent)

Gemini is a multimodal AI model capable of understanding and operating across various data types, including images, video, and text, enabling sophisticated reasoning and direct UI control.

Like Step 3.7 Flash, Gemini offers real-time perception and action, and it is particularly strong at multimodal understanding and complex decision-making. Developers typically access it through an API with a free tier, which they can use to build custom agents.

AskUI Vision Agent

AskUI Vision Agent specializes in automating desktop and mobile workflows by visually understanding and interacting with graphical user interfaces at the operating system level.

This is a direct competitor focusing on the 'see and act' aspect for digital interfaces, translating visual data into low-level commands. Its specialization in GUI automation provides a focused alternative to a general 'flash-speed' agent model.

Skygen

Skygen is an AI desktop automation agent that provides real-time visibility and runs tasks across various applications, websites, and cloud computers.

Skygen closely matches Step 3.7 Flash's positioning as a 'flash-speed agent model that can see and act' within digital environments, with an emphasis on real-time operation and broad application coverage. Like Step 3.7 Flash, it uses freemium pricing.

OpenAI Operator

OpenAI Operator is designed to execute multi-step actions directly within a web browser, enabling autonomous completion of complex web tasks.

While its pricing is listed as a paid 'Pro' tier rather than freemium, OpenAI Operator offers a direct functional comparison by focusing on agents that 'see' (perceive web interfaces) and 'act' (perform tasks) at speed within a browser environment.

Agno AI Agents

Agno AI Agents is a framework built for performance, enabling the creation of lightning-fast, production-ready AI agents with minimal startup times and a tiny footprint.

Agno directly addresses the 'flash-speed' aspect, offering a framework to build agents that are exceptionally fast and efficient. While its 'see' capability is more about perceiving digital states for action rather than explicit visual recognition, its emphasis on rapid, production-grade agent deployment makes it a strong competitor for high-performance autonomous tasks.
