Posts by Collection

publications

Unsupervised labelling of stolen handwritten digit embeddings with density matching

Published in International Conference on Applied Cryptography and Network Security, 2020

Biometric authentication is now widely deployed, and with that omnipresence comes the need to protect private data. Recent studies have shown touchscreen handwritten digits to be a reliable biometric. We set a threat model based on this biometric: in the event of theft of unlabelled embeddings of handwritten digits, we propose a labelling method inspired by recent unsupervised translation algorithms. Given a set of unlabelled embeddings known to have been produced by a Long Short-Term Memory Recurrent Neural Network (LSTM RNN), we demonstrate that inferring their labels is possible. The proposed approach involves label-wise clustering of the embeddings, followed by label identification of each group by matching its distribution to the per-label classes of a hand-crafted, labelled comparison set of embeddings.

Download here

Handwritten digits reconstruction from unlabelled embeddings

Published in ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021

In this paper, we investigate a template reconstruction attack on touchscreen biometrics based on handwritten-digit writer verification. In the event of a template database theft, we show that reconstructing the original drawn digit from the embeddings is possible without access to the original embedding encoder. Using an external labelled dataset, an attack encoder is trained along with a Mixture Density Recurrent Neural Network decoder. Using an alignment flow initialized with Linear Discriminant Analysis and Procrustes, the transfer function between the output spaces of the original and attack encoders is estimated. Successively applying the transfer function and the decoder to the stolen embeddings makes it possible to reconstruct the original drawings, which can be used to spoof the behavioural biometrics system.

Download here

Spoofing speaker verification with voice style transfer and reconstruction loss

Published in 2021 IEEE International Workshop on Information Forensics and Security (WIFS), 2021

In this paper, we investigate a template reconstruction attack against a speaker verification system. A stolen speaker embedding is processed with a zero-shot voice-style transfer system to reconstruct a Mel-spectrogram containing as much speaker information as possible. We assume the attacker has black-box access to a state-of-the-art automatic speaker verification system. We modify the AutoVC voice-style transfer system to spoof the automatic speaker verification system. We find that integrating a new loss targeting embedding reconstruction and optimizing training hyper-parameters significantly improves spoofing. Results obtained for speaker verification are similar to those for other biometrics, such as handwritten digits or face verification. We show on standard corpora (VoxCeleb and VCTK) that the reconstructed Mel-spectrograms contain enough speaker characteristics to spoof the original authentication system.

Download here

On the invertibility of a voice privacy system using embedding alignment

Published in 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 2021

This paper explores various attack scenarios on a voice anonymization system using embedding alignment techniques. We use Wasserstein-Procrustes (an algorithm initially designed for unsupervised translation) or Procrustes analysis to match two sets of x-vectors, before and after voice anonymization, and to model this transformation as a rotation. We compute the optimal rotation and compare the results of this approximation to the official Voice Privacy Challenge results. We show that a complex system like the baseline of the Voice Privacy Challenge can be approximated by a rotation, estimated using a limited set of x-vectors. This paper studies the space of solutions for voice anonymization within the specific scope of rotations. Since rotations are reversible, the proposed method can recover up to 62% of the speaker identities from anonymized embeddings.

Download here
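The rotation-estimation idea in the abstract above can be illustrated with a toy sketch. This is not the paper's implementation: it assumes 2-D points in place of x-vectors, a hypothetical fixed rotation standing in for the anonymization system, and known pairings between the two sets (the plain Procrustes case, not Wasserstein-Procrustes). The function names are illustrative only.

```python
import math


def estimate_rotation_2d(source, target):
    """Least-squares rotation angle mapping paired 2-D source points onto target points."""
    # a sums the dot products, b sums the 2-D cross products of each pair.
    a = sum(sx * tx + sy * ty for (sx, sy), (tx, ty) in zip(source, target))
    b = sum(sx * ty - sy * tx for (sx, sy), (tx, ty) in zip(source, target))
    # The angle maximizing a*cos(theta) + b*sin(theta) is atan2(b, a).
    return math.atan2(b, a)


def rotate(points, angle):
    """Rotate every 2-D point counter-clockwise by the given angle (radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


# Hypothetical toy "embeddings" before anonymization (assumption, not real data).
original = [(1.0, 0.0), (0.5, 2.0), (-1.5, 0.3), (0.2, -0.7)]
# Stand-in for the anonymization system: a rotation unknown to the attacker.
anonymized = rotate(original, 0.7)

# The attacker estimates the rotation from paired examples, then inverts it.
angle = estimate_rotation_2d(original, anonymized)
recovered = rotate(anonymized, -angle)
```

Because a rotation is inverted by rotating through the opposite angle, `recovered` matches `original` up to floating-point error in this toy setting. In higher dimensions the same least-squares problem is usually solved with a singular value decomposition.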

Multi-lingual Speech to Speech Translation for Under-Resourced Languages

Published in Le Mans University, 2022

This report describes the research done during the first ESPERANTO/JSALT workshop, from 13 June 2022 to 5 August 2022.

Download here

JHU IWSLT 2023 Dialect Speech Translation System Description

Published in Proceedings of the 20th International Conference on Spoken Language Translation (IWSLT 2023), 2023

This paper presents JHU’s submissions to the IWSLT 2023 dialectal and low-resource track of Tunisian Arabic to English speech translation. The Tunisian dialect lacks formal orthography and abundant training data, making it challenging to develop effective speech translation (ST) systems. To address these challenges, we explore the integration of large pre-trained machine translation (MT) models, such as mBART and NLLB-200, in both end-to-end (E2E) and cascaded ST systems. We also improve the performance of automatic speech recognition (ASR) through the use of pseudo-labeling data augmentation and channel matching on telephone data. Finally, we combine our E2E and cascaded ST systems with Minimum Bayes-Risk decoding. Our combined system achieves BLEU scores of 21.6 and 19.1 on test2 and test3, respectively.

Download here

Clustering Unsupervised Representations as Defense against Poisoning Attacks on Speech Commands Classification System

Published in Workshop on Automatic Speech Recognition and Understanding (ASRU 2023), 2023

Poisoning attacks entail attackers intentionally tampering with training data. In this paper, we consider a dirty-label poisoning attack scenario on a speech commands classification system. The threat model assumes that certain utterances from one of the classes (source class) are poisoned by superimposing a trigger on them, and that their labels are changed to another class selected by the attacker (target class). We propose a filtering defense against such an attack. First, we use DIstillation with NO labels (DINO) to learn unsupervised representations for all the training examples. Next, we use K-means and LDA to cluster these representations. Finally, we keep the utterances with the most repeated label in their cluster for training and discard the rest. For a 10% poisoned source class, we demonstrate a drop in attack success rate from 99.75% to 0.25%. We test our defense against a variety of threat models, including different target and source classes, as well as trigger variations.

Download here
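The final filtering step described in the abstract above (keep only utterances whose label is the most repeated one in their cluster) can be sketched as follows. This is a minimal illustration, not the paper's code: it assumes cluster assignments are already available (for instance from K-means on the learned representations), and the function name and toy data are hypothetical.

```python
from collections import Counter, defaultdict


def filter_by_cluster_majority(cluster_ids, labels):
    """Return indices of examples whose label is the most frequent label in their cluster."""
    # Group the labels observed in each cluster.
    labels_per_cluster = defaultdict(list)
    for cid, label in zip(cluster_ids, labels):
        labels_per_cluster[cid].append(label)

    # Majority label per cluster (ties go to the label encountered first).
    majority = {
        cid: Counter(cluster_labels).most_common(1)[0][0]
        for cid, cluster_labels in labels_per_cluster.items()
    }

    # Keep only the examples that agree with their cluster's majority label.
    return [
        i
        for i, (cid, label) in enumerate(zip(cluster_ids, labels))
        if label == majority[cid]
    ]


# Toy data (assumption): example 3 sounds like "yes", so it falls in cluster 0,
# but an attacker has relabelled it "no".
cluster_ids = [0, 0, 0, 0, 1, 1, 1]
labels = ["yes", "yes", "yes", "no", "no", "no", "no"]
kept = filter_by_cluster_majority(cluster_ids, labels)
```

In this toy case the relabelled example is the only one that disagrees with its cluster and is the only one discarded. The sketch relies on poisoned utterances still clustering with their source class, which depends on the quality of the unsupervised representations.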

talks

An Introduction to Voice Conversion

Published: June 22, 2022

I gave a 1h15 talk on the basics of voice conversion, followed by a 3h competitive lab on antispoofing techniques against various voice conversion and TTS systems, co-led by Thibault Gaudier and Valentin Pelloin. The talk was aimed at PhD and graduate students.

Adversarial and Poisoning attacks against speech systems: where to find them?

Published: January 22, 2024

Abstract: The majority of today’s machine learning algorithms share common foundations and core concepts, rendering them susceptible to various attacks. In this short talk, I would like to dive into the world of adversarial attacks and poisoning attacks on speech systems. What are they, how dangerous are they, and what can be done against them?

Do you trust your data? A Journey through Adversarial and Poisoning Attacks and Defenses on Speech Systems.

Published: June 25, 2024

Abstract: As the prevalence of voice-controlled devices and speech systems continues to grow, so too does the importance of ensuring their security and reliability. However, these systems are increasingly vulnerable to adversarial and poisoning attacks, which can exploit vulnerabilities and compromise their performance. In this talk, we delve into the intricate landscape of adversarial attacks targeting speech systems, presenting our research on detecting and classifying these attacks to better understand their nuances and impact. Furthermore, we discuss the creation of dirty and clean label poisoning attacks, where maliciously crafted data is injected into training datasets, and explore their implications on system integrity. We also examine a range of defenses designed to mitigate the effects of poisoning attacks, aiming to increase the resilience of speech recognition systems against such threats.

About Neural systems vulnerabilities: Classical attacks and recent defenses.

Published: December 18, 2024

Abstract: The widespread adoption of voice-controlled devices and speech recognition systems underscores the critical need for robust security measures to ensure their reliability. These systems face growing threats from adversarial and poisoning attacks, which exploit vulnerabilities to degrade performance or manipulate outcomes. This talk explores the evolving landscape of adversarial attacks on speech systems, focusing on their detection and classification to illuminate their characteristics and impacts. We also investigate dirty and clean label poisoning attacks, where malicious data is stealthily introduced into training datasets, compromising system integrity. Finally, we present a range of defense mechanisms designed to counteract poisoning attacks, enhancing the resilience and trustworthiness of speech recognition technologies.

The Limits of Speech Systems: Navigating Adversarial and Poisoning Threats with Robust Defenses.

Published: January 31, 2025

Abstract: The rapid proliferation of voice-controlled devices and speech recognition systems has heightened the need for robust security measures to safeguard their reliability and trustworthiness. These technologies are increasingly targeted by adversarial and data poisoning attacks, which exploit system vulnerabilities to degrade performance or manipulate outputs. This talk examines the evolving threat landscape for speech systems, with a focus on the detection and classification of adversarial attacks to better understand their mechanisms and impacts. We further explore both dirty- and clean-label poisoning strategies, where malicious data is covertly embedded into training sets, undermining model integrity. Finally, we present and evaluate a range of defense strategies designed to counteract such threats, strengthening the resilience of speech recognition systems against manipulation.

teaching

Computational Modeling for Electrical and Computer Engineering (EN.520.123)

Undergrad Course, Johns Hopkins University, ECE, 2023

I was co-instructor for the Computational Modeling for Electrical and Computer Engineering course for undergraduate students of the ECE department in Spring 2023. This course included 3 teaching periods per week for 50 students, for a total of 3 credits for the attendees.

Explore Machine Learning solutions for Security (EN.500.111.03 & EN.500.111.22)

Undergrad Course, Johns Hopkins University, ECE, 2023

I was the instructor for the Explore Machine Learning solutions for Security course for undergraduate students of the School of Engineering in Fall 2023. This course included one teaching period per week for two separate groups of 10 students, for a total of 1 credit for the attendees.

Computational Modeling for Electrical and Computer Engineering (EN.520.123)

Undergrad Course, Johns Hopkins University, ECE, 2024

I was co-instructor for the Computational Modeling for Electrical and Computer Engineering course for undergraduate students of the ECE department in Spring 2024. This course included 3 teaching periods per week for 50 students, for a total of 3 credits for the attendees.

Machine Learning for Signal Processing (EN.520.612/EN.520.412)

400/600 Course, Johns Hopkins University, ECE, 2024

I was co-instructor for the Machine Learning for Signal Processing course for graduate and undergraduate students of the ECE department in Fall 2024. This course included 3 teaching periods per week for 50 students, for a total of 3 credits for the attendees.

AI for Biometric Systems: Techniques, Applications, and Ethics (EN.520.612/EN.520.412)

400/600 Course, Johns Hopkins University, ECE, 2025

I created the course AI for Biometric Systems: Techniques, Applications, and Ethics for graduate and undergraduate students of the ECE department in Spring 2025. This course included 3 teaching periods per week for 10 students, for a total of 3 credits for the attendees.

Machine Learning for Signal Processing (EN.520.612/EN.520.412)

400/600 Course, Johns Hopkins University, ECE, 2025

AI for Biometric Systems: Techniques, Applications, and Ethics (EN.520.612/EN.520.412)

400/600 Course, Johns Hopkins University, ECE, 2026

Following its successful launch in Spring 2025, this course offers an advanced and updated exploration of artificial intelligence methods applied to biometric systems. Designed for both graduate and advanced undergraduate students in Electrical and Computer Engineering, the course combines theoretical foundations, practical implementation, and ethical reflection on the use of biometric technologies in modern society.

Thomas Thebaud
