It’s More Fun To Sink Them: Gesture-Based Interaction and the Musicalisation of Political Speech
- Vanissa Law

Introduction
It’s More Fun To Sink Them (2026) is an interactive media installation that establishes a real-time audiovisual relationship between spectators and archival speech footage of Donald Trump through webcam-based facial tracking. The work explores how authority operates through mediated speech and instruction by transforming political rhetoric into responsive audiovisual material shaped directly by the viewer’s bodily presence.
Rather than treating speech as a stable carrier of semantic meaning, the installation approaches voice as performative sound that can be reshaped through gesture. Through this interaction, the spectator becomes implicated in the transformation of political speech, encountering authority not as a fixed structure but as a responsive and unstable audiovisual phenomenon.
The installation extends earlier investigations in Locker Room Improv (2017), in which fragments of Trump’s speech were recomposed as fixed-media material. The present work develops this approach by introducing embodied interaction as a compositional agent.

System Overview
The installation employs real-time facial landmark tracking to control the behaviour of archival speech recordings of Donald Trump. A webcam detects the viewer’s face and extracts gesture data using MediaPipe integrated within a MAX/MSP/Jitter environment. These gestures are mapped onto parameters controlling image deformation, temporal manipulation, and audio processing.
Three facial gestures shape the audiovisual output:
- movement of the face within the frame distorts the geometry of Trump’s facial image
- blinking triggers accelerated repetition of short speech fragments
- opening the mouth activates a delay effect
When no viewer is detected, the video becomes blurred and the audio level is reduced. When a face enters the frame, both image clarity and sound intensity return. This establishes a direct perceptual relationship between viewer presence and media activation.
Rather than explaining the interaction system explicitly, the installation reveals what it monitors indirectly, through on-screen prohibitions. Because blinking cannot easily be suppressed, spectators become aware of their own participation in the system.
Technical Construction
The installation is implemented in MAX 9 and integrates webcam-based facial tracking with preprocessed video material sourced from publicly available recordings of Donald Trump’s speeches. These source clips were aligned using DaVinci Resolve face-tracking tools so that Trump’s face remains approximately centred across multiple video segments. This preprocessing ensures consistency in subsequent real-time deformation.
Facial landmark tracking is achieved using jweb-mediapipe, an implementation developed by Rob Rich that enables MediaPipe tracking within the MAX environment. Landmark data are mapped directly onto parameters controlling video deformation and audio transformation in real time.
Gesture Selection and Tracking Strategy
Following iterative testing, three facial gestures were selected as reliable and computationally efficient control inputs (a minimal extraction sketch follows this list):
- iris position (<Left_Iris>, <Right_Iris>) via facemesh.html
- left-eye blinking (<eyeBlinkLeft>) via facemesh.html
- jaw opening (<jawOpen>) via face-landmarker.html
Although MediaPipe supports multi-face tracking, the installation tracks only a single spectator at a time in order to maintain system stability during exhibition operation and to ensure consistent interaction responsiveness.
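As a point of reference, the sketch below shows how these three control signals can be read from MediaPipe's Face Landmarker task in Python. The installation itself obtains them through jweb-mediapipe inside MAX, so this is an illustration of the tracking logic rather than the patch; the model file path, the use of OpenCV for capture, and the iris landmark indices (468/473 are the commonly cited centres) are assumptions.

```python
import cv2
import mediapipe as mp
from mediapipe.tasks import python as mp_python
from mediapipe.tasks.python import vision

# Face Landmarker configured for a single spectator, with blendshape
# output enabled so eyeBlinkLeft and jawOpen scores are available.
# The model file path is an assumption.
options = vision.FaceLandmarkerOptions(
    base_options=mp_python.BaseOptions(model_asset_path="face_landmarker.task"),
    output_face_blendshapes=True,
    num_faces=1,  # single-face tracking, as in the installation
)
landmarker = vision.FaceLandmarker.create_from_options(options)

cap = cv2.VideoCapture(0)  # USB webcam
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    result = landmarker.detect(mp.Image(image_format=mp.ImageFormat.SRGB, data=rgb))

    if result.face_landmarks:  # a face is in frame
        lm = result.face_landmarks[0]
        # Landmarks 468 and 473 are commonly cited as the two iris centres
        # (which one is "left" depends on camera mirroring) -- an assumption.
        iris_x, iris_y = lm[468].x, lm[468].y
        scores = {c.category_name: c.score for c in result.face_blendshapes[0]}
        blink = scores["eyeBlinkLeft"]  # 0..1 eyelid-closure score
        jaw = scores["jawOpen"]         # 0..1 jaw-openness score
        # ...iris_x/iris_y, blink and jaw would drive the AV parameters...
cap.release()
```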
Viewer presence functions as a primary activation condition. When no face is detected, the video image becomes blurred and the audio level is reduced; when a face enters the frame, image clarity and sound intensity are restored. This establishes an immediate perceptual link between bodily presence and media behaviour.
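A minimal sketch of this presence gate, assuming per-frame detection results; the smoothing factor and the residual audio floor are illustrative choices, since the text only specifies that blur increases and level drops when nobody is watching.

```python
class PresenceGate:
    """Fades between 'absent' (blurred, quiet) and 'present' (clear, loud)."""

    def __init__(self, smoothing: float = 0.9):
        self.smoothing = smoothing  # one-pole smoothing avoids hard jumps
        self.presence = 0.0         # 0 = no face, 1 = face in frame

    def update(self, face_detected: bool) -> tuple[float, float]:
        target = 1.0 if face_detected else 0.0
        self.presence += (1.0 - self.smoothing) * (target - self.presence)
        blur_amount = 1.0 - self.presence   # full blur when nobody watches
        gain = 0.2 + 0.8 * self.presence    # ducked, not silent (assumption)
        return blur_amount, gain
```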

Mapping of Gestures to Audiovisual Behaviour
Iris position controls deformation of Trump’s facial image through the jit.gl.meshwarp object. A mesh grid of 13 × 11 vertices is applied to the video texture, with vertices (5,4) and (7,4) displaced according to iris coordinates (the mapping is sketched below). Because processing is OpenGL-based, jit.movie outputs textures directly (@output_texture 1) for efficient GPU rendering.
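The sketch below restates the displacement mapping in numpy terms: build the 13 × 11 control grid, then push the two named vertices by an offset derived from the iris position. The gain value and the (column, row) reading of (5,4)/(7,4) are assumptions; in the installation the warp itself is performed by jit.gl.meshwarp on the GPU.

```python
import numpy as np

COLS, ROWS = 13, 11  # the jit.gl.meshwarp control grid used in the patch

def warp_grid(iris_x: float, iris_y: float, gain: float = 0.3) -> np.ndarray:
    """Return a (ROWS, COLS, 2) grid of control points in 0..1 texture
    coordinates, with two vertices pushed around by the iris position."""
    xs = np.linspace(0.0, 1.0, COLS)
    ys = np.linspace(0.0, 1.0, ROWS)
    grid = np.stack(np.meshgrid(xs, ys), axis=-1)   # undeformed grid
    # Centre the iris coordinates so (0.5, 0.5) means looking straight ahead.
    dx = gain * (iris_x - 0.5)
    dy = gain * (iris_y - 0.5)
    for col in (5, 7):            # (5,4) and (7,4) read as (column, row)
        grid[4, col, 0] += dx
        grid[4, col, 1] += dy
    return grid
```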
Blinking of the left eye triggers three consecutive replays of the previous 700 milliseconds of material, at playback rates of 1.2×, 1.4×, and 1.6×. This produces a progressively rising pitch contour in Trump’s voice (a buffer-level sketch follows below). Simultaneously, the instruction
DON’T BLINK
appears as a visual directive addressed to the spectator.
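In plain buffer terms, the replay amounts to re-reading the last 700 ms at three increasing rates: each pass yields fewer samples and a correspondingly higher pitch. The sketch below shows that logic for a mono audio buffer; the sample rate and the use of linear interpolation are assumptions, since the patch handles this with MAX playback objects.

```python
import numpy as np

SR = 48000                  # sample rate (assumption)
RATES = (1.2, 1.4, 1.6)     # three passes with rising pitch, per the patch
WINDOW = int(0.7 * SR)      # the previous 700 ms

def blink_replay(history: np.ndarray) -> np.ndarray:
    """Concatenate three sped-up replays of the last 700 ms of audio.
    Reading a fixed slice at rate r yields fewer samples and higher pitch."""
    tail = history[-WINDOW:]
    passes = []
    for rate in RATES:
        n_out = int(len(tail) / rate)
        idx = np.arange(n_out) * rate       # fractional read positions
        lo = idx.astype(int)
        hi = np.minimum(lo + 1, len(tail) - 1)
        frac = idx - lo
        passes.append((1.0 - frac) * tail[lo] + frac * tail[hi])  # linear interp
    return np.concatenate(passes)
```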
Jaw opening activates a delay effect implemented using tapin~ / tapout~, which disengages when the jaw returns to its resting position; a plain-DSP sketch of this processing chain appears below. Pitch material in the speech recordings is additionally constrained using the retune~ object, mapped to the scale:
[ 0, 2, 4, 5, 7, 8 ]
A second instruction,
DON’T DROP JAW
appears during this interaction, reinforcing the relationship between bodily gesture and system response. Together, these mappings introduce a subtle musicalisation of speech while preserving intelligibility.
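The sketch below restates both processes in plain DSP terms: a feedback delay standing in for tapin~ / tapout~, and a pitch-class snap to the scale standing in for retune~. Delay time, feedback amount, and wet/dry mix are illustrative assumptions, not values from the patch.

```python
import numpy as np

SR = 48000                      # sample rate (assumption)
SCALE = [0, 2, 4, 5, 7, 8]      # pitch classes permitted by retune~

def delay_line(x: np.ndarray, delay_ms: float = 250.0,
               feedback: float = 0.4, mix: float = 0.5) -> np.ndarray:
    """Feedback delay -- the role tapin~/tapout~ play while the jaw is open."""
    d = int(delay_ms * SR / 1000.0)
    wet = np.zeros(len(x))
    y = np.copy(x)
    for n in range(d, len(x)):
        wet[n] = x[n - d] + feedback * wet[n - d]
        y[n] = (1.0 - mix) * x[n] + mix * wet[n]
    return y

def quantize_pitch(midi_note: float) -> float:
    """Snap a detected pitch to the nearest degree of SCALE, mimicking
    retune~ (ignores wraparound into the next octave, fine for a sketch)."""
    octave, pc = divmod(midi_note, 12.0)
    nearest = min(SCALE, key=lambda s: abs(s - pc))
    return 12.0 * octave + nearest
```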
Design Decisions Under Technical Constraints
Earlier versions of the installation employed the Ableton-derived object abl.device.echo~ to generate echo processing. However, testing on an Apple M1 laptop revealed sustained CPU usage exceeding 50%, which compromised real-time stability. The delay system was therefore replaced with a lighter implementation using native MAX objects.
Only the left eye is tracked for blink detection. Because natural blinking typically involves both eyes simultaneously, tracking a single eye provides sufficient reliability while reducing computational load.
Blinking was selected as a primary interaction gesture because it is involuntary, universal, and difficult to suppress consciously.
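Because the blendshape score is a continuous value, a robust blink trigger needs more than a single cutoff. The sketch below uses two thresholds on the eyeBlinkLeft score (hysteresis) so that jitter around one threshold cannot retrigger the replay; the threshold values are assumptions, not values from the patch.

```python
class BlinkDetector:
    """Fires once per blink using two thresholds (hysteresis), so noise
    around a single threshold cannot retrigger the replay."""

    def __init__(self, on: float = 0.5, off: float = 0.3):  # assumed values
        self.on, self.off = on, off
        self.closed = False

    def update(self, eye_blink_left: float) -> bool:
        if not self.closed and eye_blink_left > self.on:
            self.closed = True
            return True            # rising edge: trigger the 700 ms replay
        if self.closed and eye_blink_left < self.off:
            self.closed = False    # eye reopened; arm for the next blink
        return False
```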
Being Watched and Controlled
The instructions DON’T BLINK and DON’T DROP JAW function simultaneously as interface cues and conceptual devices. Rather than explicitly explaining the interaction mechanism, the system reveals what it monitors through prohibition. Because blinking cannot easily be avoided, spectators experience a subtle tension between instruction and bodily inevitability. This produces a sensation of being observed and regulated by the system.
The work reflects on the broader condition of living within environments structured by continuous instruction—what to do, what not to do, and how to behave. The installation invites reflection on how authority operates through suggestion rather than force.
Political Speech and Shifting Normative Expectations
Donald Trump’s public speech style—frequently informal, exaggerated, and rhetorically unstable relative to expectations traditionally associated with presidential discourse—forms the material basis of the installation’s audiovisual transformations. His election in 2016 and subsequent return to office in 2025 have prompted widespread reconsideration of political norms and media credibility.
Rather than evaluating these developments directly, the installation examines how spectators process political language when its authority appears simultaneously amplified and destabilised through mediation.
A second layer of the work concerns global informational entanglement. Contemporary political events circulate rapidly across borders, becoming shared psychological environments regardless of geographic distance. Within this context, interaction becomes a strategy for maintaining agency: rather than passively receiving mediated speech, the spectator reshapes it through bodily gesture.
Language, Voice, and Listening Beyond Meaning
An underlying concern of the installation is the decomposition of language into sound. Rather than treating speech primarily as semantic communication, the work approaches spoken voice as material that can be reshaped, fragmented, and re-heard as gesture, rhythm, and timbre. This approach extends across the artist’s wider compositional practice, including acoustic choral works in which text is frequently displaced from communicative function and reorganised as sonic structure.
This approach resonates with Martin Heidegger’s suggestion that, in order to hear sound itself, one must “listen away from things”—that is, to suspend attention from referential meaning in order to encounter sound as such. When speech is detached from its communicative role, its musical and performative qualities become perceptible.
Relationship to Earlier Work
It’s More Fun To Sink Them extends the earlier project Locker Room Improv (2017), an interactive MAX/MSP-based system constructed from fragments of Donald Trump’s recorded speech. In that earlier work, Trump’s voice functioned as acoustic material rather than political commentary. Through processes of segmentation, recombination, and transformation, the project foregrounded the unexpected musical and comedic qualities embedded in his vocal delivery.
The present installation develops this investigation further by shifting from fixed-media manipulation toward embodied interaction. Whereas Locker Room Improv explored the instability of political speech through compositional processing, It’s More Fun To Sink Them introduces the spectator’s gestures as active agents shaping the audiovisual outcome in real time.
A fixed-media version of Locker Room Improv is available online.
Together, the two works form part of an ongoing investigation into how mediated political voices can be reinterpreted through humour, fragmentation, and interaction, revealing the fragile boundary between authority, performance, and spectacle.
Development and Exhibition Context
It’s More Fun To Sink Them was developed during a residency at Digital Arts Studios (DAS), Belfast, in 2025. The work was presented as part of the DAS Annual Review Exhibition: BLINK at PS², Belfast, from 2 to 25 April 2026.
Tools and Implementation
- Software: MAX/MSP/Jitter; MediaPipe (jweb-mediapipe implementation by Rob Rich)
- Video preprocessing: DaVinci Resolve face tracking
- Hardware: USB webcam, single-channel video, mono audio


