Overview
What is Step 3.7 Flash?
Step 3.7 Flash is a high-efficiency, multimodal Mixture-of-Experts (MoE) vision-language model developed by StepFun for AI developers and enterprise users who build and deploy AI agents. Released on May 28, 2026, it is a sparse MoE model with 198 billion total parameters, of which approximately 11 billion are activated per token during inference, which supports high throughput at production scale. The model combines a 196B-parameter language backbone with a 1.8B-parameter vision encoder, giving it native image and video understanding. It supports a 256k context window and offers three selectable reasoning levels (low, medium, and high) that trade off speed, cost, and reasoning depth. Its primary use is agentic workflows that require multimodal perception, search, and multi-step reasoning across digital environments.
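To make the parameter figures concrete, the short Python sketch below checks how the quoted component sizes add up and what share of the model is active per token. It uses only the numbers stated above; the constant and function names are illustrative, and treating the 11 billion active parameters as a share of the combined backbone-plus-encoder total is an assumption, since the overview does not say how the vision encoder is counted.

```python
# Figures quoted in the overview, in billions of parameters.
LANGUAGE_BACKBONE_B = 196.0   # language backbone
VISION_ENCODER_B = 1.8        # vision encoder
ACTIVE_PER_TOKEN_B = 11.0     # "approximately 11 billion" active per token


def total_params_b(backbone_b: float, vision_b: float) -> float:
    """Sum the component sizes (in billions of parameters)."""
    return backbone_b + vision_b


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of parameters activated per token.

    Assumption: the active count is compared against the combined total.
    """
    return active_b / total_b


total = total_params_b(LANGUAGE_BACKBONE_B, VISION_ENCODER_B)
fraction = active_fraction(ACTIVE_PER_TOKEN_B, total)

print(f"Total parameters: {total:.1f}B (rounds to {round(total)}B)")
print(f"Active per token: about {fraction:.1%} of the total")
```

Under these assumptions the components sum to 197.8B, consistent with the headline 198B figure, and roughly 5.6% of the parameters are active for any given token.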