Meta-Learning Approaches for Improving Detection of Unseen Speech Deepfakes

I. Kukanov, J. Laakkonen, T. Kinnunen, V. Hautamäki

IEEE Spoken Language Technology Workshop (SLT) 2024, Macau, China



Problem Statement

Take Home Message

  • ProtoMAML and ProtoNet adapt to new attacks and domains from a limited number of samples
  • ProtoMAML demonstrates superior adaptability compared to ProtoNet, but demands more computational resources
  • Few-shot adaptation keeps the detection system up to date with new deepfakes

Solutions

Meta Learning

Batch Tasks
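Meta-learning trains over batches of small classification tasks (episodes), each with a support set for adaptation and a query set for evaluation. As a rough illustration of how such episodes can be drawn from labeled audio embeddings (the exact task construction in the paper may differ; `n_way`, `k_shot`, and `q_query` are generic episode parameters, not values from the paper):

```python
import numpy as np

def sample_episode(features, labels, n_way=2, k_shot=5, q_query=5, rng=None):
    """Draw one N-way, K-shot episode (support + query) from labeled data.

    For deepfake detection, the "ways" could be e.g. bonafide speech vs.
    a particular spoofing attack; classes are relabeled 0..n_way-1.
    """
    rng = rng or np.random.default_rng()
    classes = rng.choice(np.unique(labels), size=n_way, replace=False)
    support_x, support_y, query_x, query_y = [], [], [], []
    for new_label, c in enumerate(classes):
        idx = rng.permutation(np.flatnonzero(labels == c))
        support_x.append(features[idx[:k_shot]])
        support_y += [new_label] * k_shot
        query_x.append(features[idx[k_shot:k_shot + q_query]])
        query_y += [new_label] * q_query
    return (np.concatenate(support_x), np.array(support_y),
            np.concatenate(query_x), np.array(query_y))
```

A training run would repeatedly sample such episodes and update the model on each, so that adaptation itself is what gets learned.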

A. Prototypical Network - ProtoNet

Train Prototype Representation

Evaluation: Nearest to a Prototype

ProtoNet Train and Eval
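The ProtoNet idea above reduces to two steps: average each class's support embeddings into a prototype, then label each query by its nearest prototype. A minimal numpy sketch (operating on pre-computed embeddings; the actual system learns the embedding network end-to-end):

```python
import numpy as np

def prototypes(embeddings, labels):
    """Class prototype = mean embedding of that class's support examples."""
    classes = np.unique(labels)
    protos = np.stack([embeddings[labels == c].mean(axis=0) for c in classes])
    return classes, protos

def classify(query, protos, classes):
    """Assign each query embedding to its nearest prototype
    (squared Euclidean distance, the standard ProtoNet choice)."""
    dists = ((query[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return classes[dists.argmin(axis=1)]
```

At adaptation time no gradient steps are needed: new attack types only require computing fresh prototypes, which is why ProtoNet is the cheaper of the two methods.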

B. Optimization-based adaptation - ProtoMAML

Train: Optimize for Each Task

ProtoMAML Train

Evaluation: Adapt and Test on Query

ProtoMAML Eval
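ProtoMAML combines both ideas: the output layer is initialised from the class prototypes (weights 2p_c, bias -||p_c||²) and then refined with a few inner-loop gradient steps on the support set before testing on the query set. A simplified numpy sketch that adapts only this prototype-initialised head with manually derived softmax cross-entropy gradients (the full method also backpropagates into the encoder; step count and learning rate here are illustrative, not the paper's values):

```python
import numpy as np

def protomaml_head(embeddings, labels, n_classes):
    """ProtoMAML initialisation: a linear layer derived from class prototypes,
    equivalent to nearest-prototype classification at step 0."""
    protos = np.stack([embeddings[labels == c].mean(axis=0)
                       for c in range(n_classes)])
    W = 2.0 * protos
    b = -(protos ** 2).sum(axis=1)
    return W, b

def adapt(W, b, x, y, steps=5, lr=0.1):
    """Inner loop: a few gradient steps of softmax cross-entropy
    on the support set (x, y)."""
    for _ in range(steps):
        logits = x @ W.T + b
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        p[np.arange(len(y)), y] -= 1.0                # dLoss/dlogits = p - onehot
        W -= lr * (p.T @ x) / len(y)
        b -= lr * p.mean(axis=0)
    return W, b
```

The extra gradient steps are what give ProtoMAML its better adaptation at the cost of more computation per episode.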

Results

Baseline architecture

Baseline Network Architecture

Summary: Baseline / ProtoNet / ProtoMAML, EER (%)

Summary Results
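The comparison metric throughout is the equal error rate (EER): the operating point where the false-acceptance rate on spoofed audio equals the false-rejection rate on bonafide audio. A small sketch of how it can be computed from detection scores (a simple threshold sweep; toolkit implementations typically interpolate the ROC instead):

```python
import numpy as np

def eer(bonafide_scores, spoof_scores):
    """Equal error rate, assuming higher scores mean 'more likely bonafide'."""
    thresholds = np.sort(np.concatenate([bonafide_scores, spoof_scores]))
    far = np.array([(spoof_scores >= t).mean() for t in thresholds])  # false accepts
    frr = np.array([(bonafide_scores < t).mean() for t in thresholds])  # false rejects
    i = np.argmin(np.abs(far - frr))
    return (far[i] + frr[i]) / 2.0
```

Lower EER is better; a random detector sits at 50%.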

ProtoNet Adaptation with 2 - 256 Shots

ProtoNet Number of Shots Effect

ProtoMAML Adaptation with 2 - 96 Shots

ProtoMAML Number of Shots Effect

ProtoMAML: Effect of Adaptation Steps

ProtoMAML Number of Steps Effect

Recommendations

  • Use meta-learning for continuous adaptation, keeping deepfake detection systems up to date
  • Use ProtoNet when resources are constrained
  • Use ProtoMAML for more accurate adaptation

Conclusions

BibTeX

@misc{MetalearningDeepfake2024,
    title={Meta-Learning Approaches for Improving Detection of Unseen Speech Deepfakes}, 
    author={Ivan Kukanov and Janne Laakkonen and Tomi Kinnunen and Ville Hautamäki},
    year={2024},
    eprint={2410.20578},
    archivePrefix={arXiv},
    primaryClass={eess.AS},
    url={https://arxiv.org/abs/2410.20578} 
}