This project tackles human-centered long-video understanding, using multimodal sensing (e.g., eye gaze and audio) to enable robots that support manufacturing, healthcare, and education. The team is advancing vision-language models, adaptive displays, AI accelerators, and real-time feedback systems for task guidance, safety, and experiential learning, with applications ranging from surgical assistance to classroom robotics.