Aided by text-to-image and text-to-video diffusion models, existing 4D content creation pipelines utilize score distillation sampling to optimize the entire dynamic 3D scene. We present a novel 4D content generation framework, Diffusion4D, that, for the first time, adapts video diffusion models for explicit synthesis of spatial-temporally consistent novel views of 4D assets.