Posts by Collection

portfolio

publications

Unsupervised labelling of stolen handwritten digit embeddings with density matching

Published in International Conference on Applied Cryptography and Network Security, 2020

Biometric authentication is now widely deployed, and with that omnipresence comes the need to protect private data. Recent studies have shown touchscreen handwritten digits to be a reliable biometric. We define a threat model based on this biometric: in the event of theft of unlabelled embeddings of handwritten digits, we propose a labelling method inspired by recent unsupervised translation algorithms. Given a set of unlabelled embeddings known to have been produced by a Long Short-Term Memory Recurrent Neural Network (LSTM RNN), we demonstrate that inferring their labels is possible. The proposed approach involves label-wise clustering of the embeddings and label identification of each group by matching its distribution to the label-relative classes of a hand-crafted, labelled comparison set of embeddings.

Download here

Handwritten digits reconstruction from unlabelled embeddings

Published in ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021

In this paper, we investigate a template reconstruction attack on touchscreen biometrics based on handwritten-digit writer verification. In the event of a template database theft, we show that reconstructing the original drawn digit from the embeddings is possible without access to the original embedding encoder. Using an external labelled dataset, an attack encoder is trained along with a Mixture Density Recurrent Neural Network decoder. Using an alignment flow initialized with Linear Discriminant Analysis and Procrustes, the transfer function between the output spaces of the original and attack encoders is estimated. Applying the transfer function and then the decoder to the stolen embeddings makes it possible to reconstruct the original drawings, which can be used to spoof the behavioural biometric system.

Download here

Spoofing speaker verification with voice style transfer and reconstruction loss

Published in 2021 IEEE International Workshop on Information Forensics and Security (WIFS), 2021

In this paper, we investigate a template reconstruction attack against a speaker verification system. A stolen speaker embedding is processed with a zero-shot voice-style transfer system to reconstruct a Mel-spectrogram containing as much speaker information as possible. We assume the attacker has black-box access to a state-of-the-art automatic speaker verification system. We modify the AutoVC voice-style transfer system to spoof the automatic speaker verification system. We find that integrating a new loss targeting embedding reconstruction and optimizing training hyper-parameters significantly improves spoofing. Results obtained for speaker verification are similar to those for other biometrics, such as handwritten digits or face verification. We show on standard corpora (VoxCeleb and VCTK) that the reconstructed Mel-spectrograms contain enough speaker characteristics to spoof the original authentication system.

Download here

On the invertibility of a voice privacy system using embedding alignment

Published in 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2021

This paper explores various attack scenarios on a voice anonymization system using embedding alignment techniques. We use Wasserstein-Procrustes (an algorithm initially designed for unsupervised translation) or Procrustes analysis to match two sets of x-vectors, before and after voice anonymization, and approximate this transformation by a rotation. We compute the optimal rotation and compare the results of this approximation to the official Voice Privacy Challenge results. We show that a complex system like the baseline of the Voice Privacy Challenge can be approximated by a rotation, estimated using a limited set of x-vectors. This paper studies the space of solutions for voice anonymization within the specific scope of rotations. Because rotations are reversible, the proposed method can recover up to 62% of the speaker identities from anonymized embeddings.

Download here
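
The rotation-based approximation described in the abstract above can be illustrated with a minimal sketch of orthogonal Procrustes analysis. This is not the paper's code: the random vectors stand in for x-vectors, the "anonymizer" is a hypothetical random orthogonal map with a little noise, and the function name fit_rotation is made up for this example. The closed-form solution also allows reflections, not only pure rotations.

```python
import numpy as np


def fit_rotation(source, target):
    """Orthogonal Procrustes: the orthogonal W minimising ||source @ W - target||_F."""
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt


rng = np.random.default_rng(0)
dim, n_speakers = 8, 200

# Toy stand-ins for embeddings before anonymization (assumption: random data).
original = rng.normal(size=(n_speakers, dim))

# Hypothetical anonymizer: a random orthogonal map plus small noise.
true_rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
anonymized = original @ true_rotation + 0.01 * rng.normal(size=(n_speakers, dim))

# Estimate the map from a limited set of paired embeddings.
rotation = fit_rotation(original[:50], anonymized[:50])

# An orthogonal matrix is inverted by its transpose.
recovered = anonymized @ rotation.T

# Re-identify each recovered embedding by its nearest original (cosine similarity).
unit_recovered = recovered / np.linalg.norm(recovered, axis=1, keepdims=True)
unit_original = original / np.linalg.norm(original, axis=1, keepdims=True)
matches = (unit_recovered @ unit_original.T).argmax(axis=1)
accuracy = float(np.mean(matches == np.arange(n_speakers)))
print(f"re-identification accuracy on toy data: {accuracy:.2f}")
```

On real anonymization systems the map is only approximately a rotation, so recovery is partial rather than near-perfect as in this toy setting.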

JHU IWSLT 2023 Dialect Speech Translation System Description

Published in Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023), 2023

This paper presents JHU’s submissions to the IWSLT 2023 dialectal and low-resource track of Tunisian Arabic to English speech translation. The Tunisian dialect lacks formal orthography and abundant training data, making it challenging to develop effective speech translation (ST) systems. To address these challenges, we explore the integration of large pre-trained machine translation (MT) models, such as mBART and NLLB-200, in both end-to-end (E2E) and cascaded ST systems. We also improve the performance of automatic speech recognition (ASR) through the use of pseudo-labeling data augmentation and channel matching on telephone data. Finally, we combine our E2E and cascaded ST systems with Minimum Bayes-Risk decoding. Our combined system achieves BLEU scores of 21.6 and 19.1 on test2 and test3, respectively.

Download here

Clustering unsupervised representations as defense against poisoning attacks on speech commands classification system

Published in Workshop on Automatic Speech Recognition and Understanding (ASRU 2023), 2023

Poisoning attacks entail attackers intentionally tampering with training data. In this paper, we consider a dirty-label poisoning attack scenario on a speech commands classification system. The threat model assumes that certain utterances from one of the classes (source class) are poisoned by superimposing a trigger on them, and their labels are changed to another class selected by the attacker (target class). We propose a filtering defense against such an attack. First, we use DIstillation with NO labels (DINO) to learn unsupervised representations for all the training examples. Next, we use K-means and LDA to cluster these representations. Finally, we keep the utterances with the most repeated label in their cluster for training and discard the rest. For a 10% poisoned source class, we demonstrate a drop in attack success rate from 99.75% to 0.25%. We test our defense against a variety of threat models, including different target and source classes, as well as trigger variations.

Download here
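
The final filtering step of the defense above (keep only utterances whose label is the most frequent one in their cluster) can be sketched as follows. This is an illustrative sketch, not the paper's implementation: it assumes cluster assignments have already been computed elsewhere (for example by K-means on the learned representations), and the function name keep_majority_label and the toy data are hypothetical.

```python
import numpy as np


def keep_majority_label(cluster_ids, labels):
    """Return a boolean mask that keeps samples carrying their cluster's majority label."""
    cluster_ids = np.asarray(cluster_ids)
    labels = np.asarray(labels)
    keep = np.zeros(len(labels), dtype=bool)
    for cluster in np.unique(cluster_ids):
        members = cluster_ids == cluster
        values, counts = np.unique(labels[members], return_counts=True)
        # Ties are broken in favour of the label that sorts first.
        majority = values[counts.argmax()]
        keep |= members & (labels == majority)
    return keep


# Toy data: sample 3 sounds like "yes" (cluster 0) but was relabelled "no".
cluster_ids = [0, 0, 0, 0, 1, 1, 1, 1]
labels = ["yes", "yes", "yes", "no", "no", "no", "no", "no"]

mask = keep_majority_label(cluster_ids, labels)
print(mask.tolist())
```

The mask can then be used to select the training subset, so that relabelled samples that still cluster with their source class are discarded.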

talks

An Introduction to Voice Conversion

Published:

I gave a 1h15 talk on the basics of voice conversion, followed by a 3h competitive lab on antispoofing techniques against various voice conversion and TTS systems, co-led with Thibault Gaudier and Valentin Pelloin. The talk was aimed at PhD and graduate students.

Adversarial and Poisoning attacks against speech systems: where to find them?

Published:

Abstract: The majority of today’s machine learning algorithms share common foundations and core concepts, rendering them susceptible to various attacks. In this short talk, I would like to dive into the world of adversarial attacks and poisoning attacks on speech systems. What are they, how dangerous are they, and what can be done against them?

Do you trust your data? A Journey through Adversarial and Poisoning Attacks and Defenses on Speech Systems.

Published:

Abstract: As the prevalence of voice-controlled devices and speech systems continues to grow, so too does the importance of ensuring their security and reliability. However, these systems are increasingly vulnerable to adversarial and poisoning attacks, which can exploit vulnerabilities and compromise their performance. In this talk, we delve into the intricate landscape of adversarial attacks targeting speech systems, presenting our research on detecting and classifying these attacks to better understand their nuances and impact. Furthermore, we discuss the creation of dirty and clean label poisoning attacks, where maliciously crafted data is injected into training datasets, and explore their implications on system integrity. We also examine a range of defenses designed to mitigate the effects of poisoning attacks, aiming to increase the resilience of speech recognition systems against such threats.

teaching

TA for Machine Learning

Master course, ENSIM, 2022

I was a Teaching Assistant for the Machine Learning course at the “Ecole Nationale Superieure d’Ingenieurs du Mans” in spring 2022. We taught students basic machine learning techniques, data preparation and deep learning basics, using Python. I was responsible for 2 groups of 10-15 students, with one lab per week.

Mentoring a student for the WISE program

Student mentoring, Johns Hopkins University, ECE, 2024

I was a mentor for the WISE (Women In Science and Engineering) program, where I supervised a high-school student, Jayden Stewart, for 4 months, working on depression detection from speech and discovering research.