The recommended hardware configuration for Step3 is four A800/H800 GPUs, each with 80 GB of video memory, for optimal performance. Step3 can also run on a single GPU, but inference will be relatively slow.
Model weights are provided in both bf16 and block-fp8 formats. The block-fp8 weights reduce video memory requirements, which allows the model to run on resource-limited hardware. Developers can choose the weight format that suits their hardware.
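To make the memory trade-off between the two formats concrete, here is a rough back-of-the-envelope sketch. It assumes 2 bytes per parameter for bf16 and 1 byte per parameter for fp8, and it counts weights only: it ignores the per-block scaling factors that block-fp8 stores, as well as the KV cache and activations. The function names and the 60-billion-parameter figure are hypothetical and chosen purely for illustration; they are not Step3's actual parameter count or part of any Step3 tooling.

```python
# Bytes per parameter for each weight format (bf16 = 16 bits, fp8 = 8 bits).
# Assumption: block-fp8 scaling factors are small enough to ignore here.
BYTES_PER_PARAM = {"bf16": 2, "block-fp8": 1}


def weight_memory_gb(num_params: float, fmt: str) -> float:
    """Approximate memory needed for the weights alone, in GB (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[fmt] / 1e9


def fits(num_params: float, fmt: str, num_gpus: int, gb_per_gpu: float = 80.0) -> bool:
    """True if the weights alone fit in the combined memory of the GPUs."""
    return weight_memory_gb(num_params, fmt) <= num_gpus * gb_per_gpu


if __name__ == "__main__":
    hypothetical_params = 60e9  # illustrative only, NOT Step3's parameter count
    for fmt in BYTES_PER_PARAM:
        needed = weight_memory_gb(hypothetical_params, fmt)
        for gpus in (1, 4):
            verdict = "fits" if fits(hypothetical_params, fmt, gpus) else "does not fit"
            print(f"{fmt}: ~{needed:.0f} GB of weights, {verdict} on {gpus} x 80 GB")
```

A real deployment needs headroom beyond the weights, so a result of "fits" here is a necessary condition rather than a guarantee.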
For production deployments, a multi-GPU configuration is recommended for better throughput and responsiveness. For development or testing, a single-GPU environment is sufficient for basic needs.
This answer comes from the article "Step3: Efficient generation of open source big models for multimodal content".