Technical details of multimodal trajectory generation
The generative planner in the Orion framework uses a probabilistic diffusion model to output 6 to 8 candidate trajectories simultaneously, each with a confidence score. Its core technical innovations are:
- Language-conditioned generation: natural language instructions output by the LLM (e.g., "overtake in the left lane") are encoded into 128-dimensional conditioning vectors.
- Physical constraint embedding: vehicle dynamics parameters (maximum steering angle 30°, acceleration 2.0 m/s², etc.) are hard-coded into the generation process.
- Multi-objective optimization: comfort, safety, and travel efficiency are considered simultaneously.
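The last two points can be illustrated with a minimal Python sketch. It is not Orion's implementation: the clamping approach, the symmetric acceleration limit, the `Candidate` fields, the normalized scores, and the weights are all assumptions made for illustration. Only the 30° and 2.0 m/s² limits come from the text above.

```python
from dataclasses import dataclass
from typing import List, Tuple

MAX_STEER_DEG = 30.0  # from the text: maximum steering angle
MAX_ACCEL = 2.0       # from the text: acceleration limit in m/s^2 (assumed symmetric)


def clamp(value: float, limit: float) -> float:
    """Clip value into the symmetric range [-limit, limit]."""
    return max(-limit, min(limit, value))


def apply_limits(steer_deg: float, accel: float) -> Tuple[float, float]:
    """Project a raw control command onto the assumed physical limits."""
    return clamp(steer_deg, MAX_STEER_DEG), clamp(accel, MAX_ACCEL)


@dataclass
class Candidate:
    """Hypothetical per-trajectory scores, each assumed normalized to [0, 1]."""
    name: str
    comfort: float
    safety: float
    efficiency: float


def rank(candidates: List[Candidate],
         weights: Tuple[float, float, float] = (0.3, 0.5, 0.2)) -> List[Candidate]:
    """Sort candidates by a weighted sum of the three objectives, best first.

    The weights are illustrative placeholders, not values from Orion.
    """
    w_comfort, w_safety, w_efficiency = weights

    def score(c: Candidate) -> float:
        return (w_comfort * c.comfort
                + w_safety * c.safety
                + w_efficiency * c.efficiency)

    return sorted(candidates, key=score, reverse=True)
```

A weighted sum is only one way to combine several objectives. It is used here because it makes the trade-off between comfort, safety, and efficiency explicit in a single line.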
Each trajectory output by the planner contains the complete motion state (position, velocity, heading angle) over a 5-second time window and is updated at 10 Hz. In CARLA simulation tests, the planner improves the comfort score of lane-change maneuvers by 35% and keeps the emergency braking distance error within ±0.3 meters. Users can adjust the generation parameters by editing the planner.yaml configuration file, e.g., setting trajectory_mode to "multimodal" to enable parallel trajectory generation.
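To make the trajectory layout concrete, here is a small Python sketch of one possible representation. It assumes the 10 Hz rate is also the spacing between sampled states, which gives 50 states over the 5-second window. The `State` class and the constant-velocity rollout are hypothetical stand-ins, not Orion's data format or its diffusion planner.

```python
import math
from dataclasses import dataclass
from typing import List

HORIZON_S = 5.0  # from the text: 5-second time window
RATE_HZ = 10     # from the text: 10 Hz (assumed here to be the sample rate)


@dataclass
class State:
    """One sampled motion state (hypothetical layout)."""
    t: float        # seconds from the start of the window
    x: float        # position in meters
    y: float        # position in meters
    speed: float    # meters per second
    heading: float  # radians


def constant_velocity_rollout(x: float, y: float,
                              speed: float, heading: float) -> List[State]:
    """Fill the window with a straight-line, constant-speed placeholder path."""
    n_steps = int(HORIZON_S * RATE_HZ)
    dt = 1.0 / RATE_HZ
    states = []
    for k in range(1, n_steps + 1):
        t = k * dt
        states.append(State(
            t=t,
            x=x + speed * math.cos(heading) * t,
            y=y + speed * math.sin(heading) * t,
            speed=speed,
            heading=heading,
        ))
    return states
```

Under these assumptions, a multimodal output would be a list of 6 to 8 such trajectories, each paired with a confidence score.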
This answer comes from the article "Orion: Xiaomi's Open Source End-to-End Autonomous Driving Reasoning and Planning Framework".