As Embodied AI systems, such as robots, autonomous vehicles, and augmented reality agents, become increasingly integrated into real-world applications, their ability to accurately perceive and interpret visual data is crucial. High-quality image perception directly impacts decision-making, navigation, and interaction with the environment. However, image quality in these systems is often subject to diverse challenges, including varying lighting conditions, motion blur, occlusions, and sensor limitations.