MMSYM 2026 Workshop on
Semi-Automated Workflows for Facilitating Multimodal Language Processing
(SWiFT-MLP)

Workshop date: 8 September 2026 (co-located with the Symposium Series on Multimodal Communication, 9-11 September 2026)

MMSYM 2026 in Leuven, Belgium

About

SWiFT-MLP: Semi-Automated Workflows for Facilitating Multimodal Language Processing is a hands-on workshop on practical, reproducible workflows for analyzing multimodal communication.

It is designed for researchers and students working with audio/video data who want to use automated tools to speed up speech and gesture annotation and to analyze gesture kinematics.

Building on the earlier MEDAL workshop on automatic processing of multimodal interaction, SWiFT-MLP introduces semi-automated pipelines to:

  • automatically extract body keypoints using MediaPipe
  • automatically segment manual gestures
  • automatically transcribe speech using WhisperX
  • export annotations to common analysis tools (e.g., Praat, ELAN)
  • compare signals across modalities (e.g., speech, gesture) and analyze gesture kinematics
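To give a flavor of the last step, here is a minimal sketch of a gesture kinematics computation on a single keypoint track. It is not the workshop's own pipeline: the function names `keypoint_speed` and `movement_intervals`, the input layout (one `(x, y)` tuple per video frame), and the simple speed threshold are all assumptions made for illustration. In practice the positions would come from a pose estimator such as MediaPipe, and gesture segmentation would use a more robust method than a fixed threshold.

```python
import math


def keypoint_speed(positions, fps):
    """Frame-to-frame speed of one keypoint, in position units per second.

    positions: list of (x, y) tuples, one per video frame (assumed layout).
    fps: video frame rate in frames per second.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    speeds = []
    for (x0, y0), (x1, y1) in zip(positions, positions[1:]):
        # Euclidean displacement between consecutive frames, scaled to per-second units.
        speeds.append(math.hypot(x1 - x0, y1 - y0) * fps)
    return speeds


def movement_intervals(speeds, fps, threshold):
    """Group consecutive above-threshold speed samples into (start_s, end_s) intervals.

    This fixed-threshold rule is a hypothetical stand-in for a real
    gesture segmentation method.
    """
    intervals = []
    start = None
    for i, speed in enumerate(speeds):
        if speed > threshold and start is None:
            start = i  # movement begins
        elif speed <= threshold and start is not None:
            intervals.append((start / fps, i / fps))  # movement ends
            start = None
    if start is not None:
        # The track ended while the keypoint was still moving.
        intervals.append((start / fps, len(speeds) / fps))
    return intervals


# Hypothetical wrist track in pixel coordinates, recorded at 10 frames per second.
wrist = [(0, 0), (0, 0), (3, 4), (6, 8), (6, 8)]
speeds = keypoint_speed(wrist, fps=10)
print(speeds)                                            # [0.0, 50.0, 50.0, 0.0]
print(movement_intervals(speeds, fps=10, threshold=1.0))  # [(0.1, 0.3)]
```

The resulting intervals are plain start and end times in seconds, which is the kind of information that annotation tools such as ELAN or Praat store per annotation. Note that pose estimators often return normalized rather than pixel coordinates, so the threshold has to be chosen in the matching units.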

The workshop is especially well-suited to the MMSYM community, bringing together perspectives from multimodal communication, linguistics, NLP, computer vision, and cognitive science. It aims to support researchers who need robust, transparent, and reusable workflows for studying embodied communication in interaction.

News

2026-03-01
SWiFT-MLP will be held at MMSYM 2026 in Leuven, Belgium, on September 8th, 2026.

Application

Applications are open until April 30, 2026. We will accept a maximum of 20 participants.

A registration fee of 100 euros will be charged. The cost includes training, coffee breaks, lunch, and a social event in the evening.

Official registration and payment are due after applicants have been selected and have received a confirmation email. The confirmation email will be sent by May 15.

Apply here

Schedule

Time Event
09:00 - 09:15 Introduction
09:15 - 09:45 Setup of the environment and tools (Visual Studio Code, Miniconda, MediaPipe, WhisperX)
09:45 - 10:45 Extracting body keypoints using MediaPipe
10:45 - 11:00 Break
11:00 - 12:15 Gesture segmentation and visualizing segmentation results in ELAN
12:15 - 13:30 Lunch break
13:30 - 13:45 Speech transcription using WhisperX
13:45 - 15:00 Hands-on: Exporting transcriptions into ELAN
15:00 - 15:15 Break
15:15 - 16:30 Multimodal similarity analysis using kinematic and speech features
16:30 - 17:00 Discussion, wrap-up, and next steps

Organizers

Esam Ghaleb

Max Planck Institute for Psycholinguistics

Sho Akamine

Max Planck Institute for Psycholinguistics

This workshop builds upon an earlier version developed for the MEDAL Summer School, which was jointly organized with Raquel Fernández (University of Amsterdam).

Sponsor

We thank the Multimodal Language Department (MLD) at the Max Planck Institute for Psycholinguistics for sponsoring this workshop.