MMSYM 2026 Workshop on
Semi-Automated Workflows for Facilitating Multimodal Language Processing
(SWiFT-MLP)
Workshop date: 8 September 2026 (co-located with the Symposium Series on Multimodal Communication, 9-11 September 2026)
MMSYM 2026 in Leuven, Belgium
About
SWiFT-MLP: Semi-Automated Workflows for Facilitating Multimodal Language Processing is a hands-on workshop on practical, reproducible workflows for analyzing multimodal communication.
It is designed for researchers and students working with audio/video data who want to use new technology to facilitate speech/gesture annotation and analyze gesture kinematics.
Building on the earlier MEDAL workshop on automatic processing of multimodal interaction, SWiFT-MLP introduces semi-automated pipelines to:
- automatically extract body keypoints using MediaPipe
- automatically segment manual gestures
- automatically transcribe speech using WhisperX
- export annotations to common analysis tools (e.g., Praat, ELAN)
- compare signals across modalities (e.g., speech, gesture) and analyze gesture kinematics
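To give a flavor of the kinematic side of such a pipeline, here is a minimal sketch in Python. It is not the workshop's actual code: the function names `keypoint_speeds` and `segment_movement`, the pixel coordinates, and the simple speed-threshold rule are all assumptions chosen for illustration. It takes one keypoint track (for example, a wrist position per video frame), computes frame-to-frame speed, and marks intervals where the speed stays above a threshold as candidate movement phases.

```python
import math


def keypoint_speeds(points, fps):
    """Frame-to-frame speed (coordinate units per second) of one keypoint track.

    `points` is a list of (x, y) positions, one per video frame.
    Entry i of the result describes the movement from frame i to frame i + 1.
    """
    return [math.dist(a, b) * fps for a, b in zip(points, points[1:])]


def segment_movement(speeds, fps, threshold, min_frames=2):
    """Return (start_s, end_s) intervals where speed stays above `threshold`.

    Hypothetical rule for illustration: a run of at least `min_frames`
    consecutive above-threshold samples counts as one movement segment.
    Assumes `threshold` is non-negative.
    """
    segments = []
    start = None
    # A trailing 0.0 sentinel closes a segment that runs to the end of the track.
    for i, value in enumerate(list(speeds) + [0.0]):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            if i - start >= min_frames:
                segments.append((start / fps, i / fps))
            start = None
    return segments


if __name__ == "__main__":
    # Made-up wrist track in pixels at 10 frames per second:
    # still for three frames, moving upward for four, then still again.
    wrist = [(100, 200)] * 3 + [(100, 200 - 20 * k) for k in range(1, 5)] + [(100, 120)] * 3
    speeds = keypoint_speeds(wrist, fps=10)
    print("peak speed (px/s):", max(speeds))
    print("movement segments (s):", segment_movement(speeds, fps=10, threshold=50))
```

Intervals in this form (start and end times in seconds) map naturally onto annotation tiers in tools such as ELAN or Praat, although the export step itself is not shown here.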
The workshop is especially well-suited to the MMSYM community, bringing together perspectives from multimodal communication, linguistics, NLP, computer vision, and cognitive science. It aims to support researchers who need robust, transparent, and reusable workflows for studying embodied communication in interaction.
News
Application
Applications are open until April 30, 2026. We will accept a maximum of 20 participants.
A registration fee of 100 euros will be charged. The cost includes training, coffee breaks, lunch, and a social event in the evening.
Official registration and payment are due after applicants have been selected and have received a confirmation email. The confirmation email will be sent by May 15.
Schedule
| Time | Event |
|---|---|
| 09:00 - 09:15 | Introduction |
| 09:15 - 09:45 | Setup of the environment and tools (Visual Studio Code, Miniconda, MediaPipe, WhisperX) |
| 09:45 - 10:45 | Extracting body keypoints using MediaPipe |
| 10:45 - 11:00 | Break |
| 11:00 - 12:15 | Gesture segmentation and visualizing segmentation results in ELAN |
| 12:15 - 13:30 | Lunch break |
| 13:30 - 13:45 | Speech transcription using WhisperX |
| 13:45 - 15:00 | Hands-on: Exporting transcriptions into ELAN |
| 15:00 - 15:15 | Break |
| 15:15 - 16:30 | Multimodal similarity analysis using kinematic and speech features |
| 16:30 - 17:00 | Discussion, wrap-up, and next steps |
Organizers
Esam Ghaleb
Max Planck Institute for Psycholinguistics
Sho Akamine
Max Planck Institute for Psycholinguistics
This workshop builds upon an earlier version developed for the MEDAL Summer School, which was jointly organized with Raquel Fernández (University of Amsterdam).
Sponsor
We thank the Multimodal Language Department (MLD) at the Max Planck Institute for Psycholinguistics for sponsoring this workshop.