Text-to-3D World Generation Process

There are three key steps:
1. Prepare the prompt
Describe the scene in a simple sentence, e.g. "A medieval castle in the sunshine, surrounded by a moat". Avoid complex modifiers; both English and Chinese prompts are supported.
2. Generate the panoramic image
Run the core generation command:

python3 demo_panogen.py --prompt "阳光下的中世纪城堡" --output_path test_results/castle

The generated panorama is saved as panorama.png in the specified output directory.
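Before feeding the panorama into the next step, it can help to sanity-check its shape. The helper below is a hypothetical sketch (not part of the HunyuanWorld repo): a 360° panorama in equirectangular projection has a width exactly twice its height.

```python
# Hypothetical sanity check (not from the HunyuanWorld repo): an
# equirectangular 360° panorama spans 360°x180°, so its pixel width
# should be exactly twice its pixel height.
def is_equirectangular(width, height):
    return height > 0 and width == 2 * height

# A typical panorama resolution passes; a square image does not.
print(is_equirectangular(2048, 1024))  # True
print(is_equirectangular(1024, 1024))  # False
```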
3. Create the 3D scene
Generate a semantically layered 3D model from the panorama:

CUDA_VISIBLE_DEVICES=0 python3 demo_scenegen.py --image_path test_results/castle/panorama.png --labels_fg1 castle --labels_fg2 river --classes outdoor --output_path test_results/castle

The --labels_fg1 and --labels_fg2 parameters specify the foreground objects to place on separate layers (e.g., the castle and the river), and --classes distinguishes indoor from outdoor scenes.
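The two steps above can be chained for batch generation. The sketch below builds the command lines programmatically (a dry run that only prints them); the script names and flags mirror the commands above, while the build_commands helper itself is an assumption for illustration.

```python
# Sketch: assemble the two HunyuanWorld commands from the steps above so they
# can be chained or batched. build_commands is a hypothetical helper; the
# script names and flags come from the article's commands.
def build_commands(prompt, fg1, fg2, classes, out_dir):
    panogen = ["python3", "demo_panogen.py",
               "--prompt", prompt,
               "--output_path", out_dir]
    scenegen = ["python3", "demo_scenegen.py",
                "--image_path", f"{out_dir}/panorama.png",
                "--labels_fg1", fg1,
                "--labels_fg2", fg2,
                "--classes", classes,
                "--output_path", out_dir]
    return panogen, scenegen

pano_cmd, scene_cmd = build_commands(
    "A medieval castle in the sunshine",
    "castle", "river", "outdoor", "test_results/castle")
print(" ".join(pano_cmd))   # dry run: print instead of executing
print(" ".join(scene_cmd))
```

In a real pipeline each list could be passed to subprocess.run, with the scene step gated on the panorama step succeeding.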
The completed 3D model can be used in three ways: open modelviewer.html in a browser to preview it; export the .obj/.glb file to Blender for editing; or import it directly into Unity or Unreal Engine. The whole process, from prompt to interactive scene, takes about 30-60 minutes on an A100 GPU.
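A quick way to see which of these outputs a run actually produced is to scan the output directory. This is a sketch under assumptions: panorama.png and modelviewer.html are named in the text above, while the exact names of the exported meshes are not, so the helper just globs for .obj/.glb files.

```python
# Sketch: inventory the artifacts mentioned in the article inside an output
# directory. panorama.png and modelviewer.html come from the text; mesh file
# names are unknown, so we simply glob for .obj/.glb exports.
from pathlib import Path

def find_artifacts(out_dir):
    out = Path(out_dir)
    return {
        "panorama": (out / "panorama.png").exists(),
        "viewer": (out / "modelviewer.html").exists(),
        # meshes that could be taken into Blender / Unity / Unreal
        "meshes": sorted(p.name for p in
                         list(out.glob("*.obj")) + list(out.glob("*.glb"))),
    }
```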
This answer is based on the article "HunyuanWorld-1.0: Generating Interactive 360° 3D Worlds from Text or Images".