HunyuanCustom is a multimodal, conditional, and controllable video generation model focused on subject consistency. It accepts text, image, audio, and video inputs for flexible, user-defined video creation.
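- Multimodal Input: text, image, audio, and video conditions can drive generation, covering single- and multi-subject, audio-driven, and video-driven customization.
- Identity Consistency: an image ID enhancement module concatenates the reference-image latent with the video latents along the temporal axis to reinforce the subject's identity across frames.
- LLaVA-based Fusion: a LLaVA-based text-image fusion module jointly interprets the text prompt and the reference image.
- AudioNet & Video Injection: AudioNet aligns audio features with the video through spatial cross-attention, and a video-driven injection module integrates latent-compressed conditional video through a patchify-based feature-alignment network.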
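To make two of the mechanisms above more concrete, here is a minimal, pure-Python sketch of the generic techniques they build on: temporal concatenation of a reference latent in front of the video frames, and single-head scaled dot-product cross-attention from a frame's spatial tokens to audio tokens. This is not HunyuanCustom's code. The function names (`prepend_identity_frame`, `cross_attention`, `inject_audio`), the list-of-vectors latent layout, and the omission of learned Q/K/V projections are all illustrative assumptions.

```python
import math
from typing import List

Vector = List[float]
Frame = List[Vector]  # one latent frame: a list of spatial tokens, each a feature vector


def prepend_identity_frame(video: List[Frame], ref_image: Frame) -> List[Frame]:
    """Hypothetical sketch of temporal concatenation: place the reference-image
    latent in front of the video latent frames so later frames can attend to it."""
    if video and len(ref_image) != len(video[0]):
        raise ValueError("reference latent must match the per-frame token count")
    return [ref_image] + video


def softmax(xs: Vector) -> Vector:
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: List[Vector], keys: List[Vector], values: List[Vector]) -> List[Vector]:
    """Generic single-head scaled dot-product cross-attention (no learned projections):
    each query token returns a softmax-weighted average of the value vectors."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    dim_v = len(values[0])
    out = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim_v)])
    return out


def inject_audio(frame: Frame, audio_tokens: List[Vector]) -> Frame:
    """Hypothetical per-frame audio injection: spatial tokens query the audio tokens,
    and the attended features are added back residually. Assumes audio and video
    tokens share one feature dimension, since projections are omitted."""
    attended = cross_attention(frame, audio_tokens, audio_tokens)
    return [[x + a for x, a in zip(tok, att)] for tok, att in zip(frame, attended)]


if __name__ == "__main__":
    video = [[[0.0, 1.0], [1.0, 0.0]] for _ in range(3)]  # 3 frames, 2 tokens, dim 2
    ref = [[0.5, 0.5], [0.5, 0.5]]
    latents = prepend_identity_frame(video, ref)
    print(len(latents))  # 4: the reference frame plus 3 video frames
    print(inject_audio(video[0], [[1.0, 0.0]]))  # [[1.0, 1.0], [2.0, 0.0]]
```

A real model would use learned projections, multiple heads, and batched tensor operations; the sketch keeps only the data flow so the temporal-concatenation and cross-attention steps are easy to trace.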