EXPDF: Adversarially Robust Spatiotemporal Deepfake Detection Using Temporal-Aware Attention Fusion And Evolutionary Hyperparameter Optimization

Kanchan Warkar; Rishikesh Rawat; Ghizal F. Ansari; Sudhir Mohod

doi:10.4238/f0svdv06

Keywords:

Adversarial training, Deepfake detection, Explainable artificial intelligence, Hyperparameter optimization, Spatiotemporal learning, Temporal attention fusion.

Abstract

Deepfake videos generated using advanced artificial intelligence techniques have become increasingly difficult to distinguish from authentic content, creating major challenges for digital media authentication and multimedia forensics. Although recent progress has improved deepfake detection, many existing approaches still show limited cross-domain generalization, vulnerability to adversarial perturbations, and reduced reliability on unseen datasets. In addition, several detectors depend heavily on dataset-specific artifacts, which degrades performance under real-world conditions and raises concerns about reproducibility and practical deployment. To address these limitations, this study proposes an adversarially robust spatiotemporal deepfake detection framework that integrates Xception-based spatial feature extraction, bidirectional Long Short-Term Memory (BiLSTM) temporal modeling, and a Temporal-Aware Attention Fusion (TAF) module for adaptive feature aggregation. Robustness is improved through adversarial training with Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD) examples, and a hybrid Marine Predators Algorithm–Genetic Algorithm (MPA–GA) strategy automates hyperparameter optimization. The framework was evaluated on four benchmark datasets: FaceForensics++ (FF++), Celeb-DF v2, DFDC, and WildDeepfake. Across ten independent runs, it achieved accuracies of 98.2 ± 0.3%, 97.1 ± 0.4%, 95.8 ± 0.5%, and 93.9 ± 0.7% on FF++, Celeb-DF v2, DFDC, and WildDeepfake, respectively. In cross-dataset evaluation, the framework achieved 91.6 ± 0.6% accuracy when trained on FF++ and tested on Celeb-DF v2, indicating strong transferability to an unseen distribution. Under PGD adversarial attacks, the proposed model retained 86.2 ± 0.8% accuracy, demonstrating improved perturbation resilience. Additionally, Grad-CAM and SHAP analyses provided interpretable forensic evidence supporting model decisions. The proposed framework establishes a reproducible, robust, and explainable deepfake detection pipeline suitable for practical multimedia forensic applications.
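
For readers unfamiliar with the two attacks named in the abstract, the following is a minimal sketch of how FGSM and PGD perturbations are generated. It is not the authors' implementation: a toy logistic classifier stands in for the Xception-BiLSTM detector, and the weights, input, and the values of `eps`, `alpha`, and `steps` are illustrative assumptions. In adversarial training, perturbed inputs of this kind are added to the training batches.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    # Probability of class 1 under a toy logistic model (stand-in detector).
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def loss(w, b, x, y):
    # Binary cross-entropy for a single example.
    p = predict(w, b, x)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def input_gradient(w, b, x, y):
    # For logistic regression, d(loss)/dx = (p - y) * w.
    p = predict(w, b, x)
    return [(p - y) * wi for wi in w]

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, eps):
    # Single step of size eps along the sign of the input gradient.
    g = input_gradient(w, b, x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]

def pgd(w, b, x, y, eps, alpha, steps):
    # Repeated small signed-gradient steps, each followed by projection
    # back into the L-infinity ball of radius eps around the clean input.
    x_adv = list(x)
    for _ in range(steps):
        g = input_gradient(w, b, x_adv, y)
        x_adv = [xa + alpha * sign(gi) for xa, gi in zip(x_adv, g)]
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv

# Illustrative values only (assumptions, not taken from the paper).
w, b = [2.0, -1.0, 0.5], 0.1
x, y = [0.5, -0.3, 0.8], 1
x_fgsm = fgsm(w, b, x, y, eps=0.1)
x_pgd = pgd(w, b, x, y, eps=0.1, alpha=0.03, steps=10)
print(loss(w, b, x, y), loss(w, b, x_fgsm, y), loss(w, b, x_pgd, y))
```

Both attacks keep the perturbation within `eps` of the clean input in every coordinate. PGD differs from FGSM only in taking several smaller steps with a projection after each one, which generally yields a stronger attack on nonlinear models.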
