Trusted Media Challenge
I led the ROSE Lab team to 9th place in the Trusted Media Challenge.
Fake media is an existential threat to societies today. If left unchecked, it risks becoming a serious national security concern. AI Singapore is launching the Trusted Media Challenge to test solutions and explore how Artificial Intelligence (AI) technologies can be leveraged to combat fake media. This challenge will focus on the detection of audiovisual counterfeit media, where both video and audio modalities may be modified.
Our team divided the main problem into three sub-problems and developed a separate model for each:
Deepfake Detection
Audio and Voice Forgery Detection
Audio Swap Detection
1. Deepfake Detection
Deepfakes use deep learning to replace the likeness of one person with that of another in video and other digital media. Our deepfake model uses EfficientNet as a backbone classifier to distinguish real faces from deepfake faces. The result is shown in the video below. [Youtube] [Bilibili]
2. Audio and Voice Forgery Detection
Voice forgery also appears in this competition. A forgery model analyzes the voice characteristics of a target person and manipulates the original voice so that it sounds like the target. We convert the voice signal into mel-spectrogram and MFCC (mel-frequency cepstral coefficient) features to detect any sign of tampering or forgery.
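As a rough illustration of what the mel and MFCC features are, here is a minimal sketch in plain Python. It is not our competition code: the HTK-style mel formula, the unnormalized DCT-II, and the function names (hz_to_mel, mel_center_frequencies, mfcc_from_log_mel) are all assumptions chosen for illustration, and a real pipeline would normally use an audio library instead.

```python
import math


def hz_to_mel(freq_hz):
    """Convert Hz to mels using the HTK-style formula (an assumed variant)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(n_filters, f_min, f_max):
    """Center frequencies (Hz) of n_filters bands spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]


def mfcc_from_log_mel(log_mel_energies, n_coeffs):
    """Unnormalized DCT-II of one frame of log mel energies, truncated to n_coeffs."""
    n = len(log_mel_energies)
    return [
        sum(e * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
            for i, e in enumerate(log_mel_energies))
        for k in range(n_coeffs)
    ]
```

The mel bands are narrow at low frequencies and wide at high frequencies, which roughly follows how pitch is perceived. The MFCCs are a compact summary of the shape of one frame of the log mel spectrum.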
3. Audio Swap Detection
Another problem we need to detect in this competition is the audio swap: the organizers randomly swapped the audio tracks between pairs of videos. To address this, we analyze the consistency between the voice signal and the corresponding lip motion.
We use a sliding window of 20 frames to capture both the lip motion and the audio signal. The refined SyncNet model gives us a confidence score for the synchronization, a distance score, and an estimated time offset.
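To make the three outputs concrete, here is a minimal sketch of how they can be derived from audio-visual distances. It assumes the convention we understand the public SyncNet reference code to use (confidence = median distance minus minimum distance over candidate offsets); the function names, the sign convention of the offset, and the input layout are hypothetical and may differ from our refined model.

```python
from statistics import median


def sliding_windows(n_frames, window=20, stride=1):
    """Return (start, end) frame index pairs for each full window."""
    return [(s, s + window) for s in range(0, n_frames - window + 1, stride)]


def sync_scores(mean_dist_by_offset, max_shift):
    """Summarize lip-audio distances averaged over all windows.

    mean_dist_by_offset[i] is the mean distance between lip and audio
    embeddings when the audio is shifted by (i - max_shift) frames.
    Returns (offset_frames, min_distance, confidence).
    """
    if len(mean_dist_by_offset) != 2 * max_shift + 1:
        raise ValueError("expected one distance per offset in [-max_shift, max_shift]")
    min_dist = min(mean_dist_by_offset)
    offset = mean_dist_by_offset.index(min_dist) - max_shift
    # A sharp dip below the typical distance means a confident sync estimate.
    confidence = median(mean_dist_by_offset) - min_dist
    return offset, min_dist, confidence
```

With this convention, a genuine video tends to show one clear minimum near offset 0 and a high confidence, while a swapped audio track tends to give a flat distance curve and a low confidence.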
Here is a demo video of the SyncNet model running on videos used in the challenge: [Youtube] [Bilibili]
ROSE Overall System
Here is the overall system design we developed for this competition. We combined the results of the three individual models (1. Deepfake Detection, 2. Audio Forgery Detection, and 3. Lip Audio Sync Detector) and returned an overall confidence score.
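One simple way to combine three per-model scores into a single confidence is a noisy-OR, sketched below. This is only an illustration: the fusion rule, the function name fuse_fake_scores, and the assumption that each model outputs a probability in [0, 1] that its kind of manipulation is present are all hypothetical, not a description of the exact rule our system used.

```python
def fuse_fake_scores(p_deepfake, p_voice_forgery, p_audio_swap):
    """Noisy-OR fusion: probability that at least one manipulation is present.

    Each input is assumed to be a probability in [0, 1] and the three
    detectors are treated as independent (a simplifying assumption).
    """
    scores = (p_deepfake, p_voice_forgery, p_audio_swap)
    for p in scores:
        if not 0.0 <= p <= 1.0:
            raise ValueError("scores must lie in [0, 1]")
    p_all_clean = 1.0
    for p in scores:
        p_all_clean *= 1.0 - p
    return 1.0 - p_all_clean
```

The idea behind this choice is that a video counts as fake if any one of the three manipulations is detected, so a single high score is enough to push the overall confidence up.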
Overall, we ranked 9th in the Final Run out of 475 teams from both academia and industry. Our system's performance is on par with leading industrial AI labs such as Shopee, Alibaba, Ant Finance, and SenseTime.