Abstract:
Anticipating lane changes of surrounding vehicles is paramount for the safe and
efficient operation of autonomous vehicles. Previous works rely on physical
variables, which lack contextual information. More recent methods employ action
recognition models such as 3D CNNs and RNNs, at the cost of complex architectures.
Despite the advent of transformers in action recognition, few works have applied
transformer architectures to this task. This research addresses the critical challenge of Lane Change
Prediction (LCP) for autonomous vehicles, framing it as video action prediction with a focus
on the Video Vision Transformer (ViViT). Utilizing the PREVENTION
dataset, which provides detailed annotations of vehicle trajectories and critical events, the
proposed approach outperforms prior methods, achieving over 85% test accuracy in
predicting lane changes at a prediction horizon of 1 second. Comparative analyses underscore ViViT's
superiority in capturing spatio-temporal dependencies in video data while requiring fewer
parameters, enhancing computational efficiency. This research contributes to advancing
autonomous driving technology by showcasing ViViT's efficacy in real-world applications
and advocating for its further exploration in enhancing vehicle safety and efficiency.