Introduction
Have you ever tried to explain something with your hands when words didn’t work? Maybe while traveling, or helping someone who speaks a different language? You smile, you point, you gesture — hoping they'll just “get it.”
That moment, where language fails but connection still matters, became the foundation of this project.
The ambition wasn’t just to build a gesture recognition tool; it was to explore how technology can interpret non-verbal communication in a cultural context. This includes gestures, facial expressions, body language, and the unspoken cues that carry meaning across different cultures.
Over six weeks, we laid the foundation for an interactive system that could support context-based communication through machine learning. Due to scope and time limitations, the first proof of concept focused on gesture interpretation as a starting point.
Phase 1: Framing the Challenge
Our initial research focused on the complexity of language and how meaning is formed through more than just words. Through concept mapping and user research, we explored how high-context vs. low-context cultures influence the way people communicate.
We broke down communication into five layers:
Spoken language
Body movement
Facial expression
Emotional tone
Cultural and social context
Our central design question became:
How can we support communication across cultures by interpreting non-verbal signals in context — not just translating words, but translating meaning?
Phase 2: Research & Defining the Direction
I conducted a deep dive into existing solutions like Google Lens, Word Lens, Woolaroo, and LookTel — examining their strengths in object and word recognition, but also their blind spots when it comes to cultural nuance.
We identified an opportunity space: most translation tools focus on literal input, but few address the implicit and contextual nature of human communication. The concept was shaped around the idea of interpreting behavior, not just language — enabling someone to understand what a gesture or facial cue means in another culture.
Phase 3: Prototyping Within Constraints
We initially planned to build a wearable interface using Raspberry Pi and a camera module, but hit technical barriers installing TensorFlow locally. To maintain progress, I re-scoped the prototype to a web-based interaction using Teachable Machine and webcam input.
This shift kept the interaction concept intact while allowing for rapid iteration and user testing. Although limited to recognizing one gesture, the focus remained on designing for contextual interpretation, not just classification.
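For reference, the sketch below shows how a browser prototype of this kind can be wired together with the Teachable Machine image library and TensorFlow.js. It is a minimal illustration under assumptions, not our exact code: the model URL is a placeholder for a model exported from Teachable Machine, and the loop simply logs the most confident class.

```ts
// Minimal sketch of a browser-based gesture classifier using the
// Teachable Machine image library and a webcam feed.
// The model URL below is a placeholder, not a real hosted model.
import "@tensorflow/tfjs";
import * as tmImage from "@teachablemachine/image";

const MODEL_BASE =
  "https://teachablemachine.withgoogle.com/models/<your-model-id>/";

async function run(): Promise<void> {
  // Load the exported image model and its class metadata.
  const model = await tmImage.load(
    MODEL_BASE + "model.json",
    MODEL_BASE + "metadata.json"
  );

  // Set up a mirrored 200x200 webcam feed and attach it to the page.
  const webcam = new tmImage.Webcam(200, 200, /* flip */ true);
  await webcam.setup(); // asks for camera permission
  await webcam.play();
  document.body.appendChild(webcam.canvas);

  // Classify each frame and log the most likely class.
  async function loop(): Promise<void> {
    webcam.update();
    const predictions = await model.predict(webcam.canvas);
    const best = predictions.reduce((a, b) =>
      a.probability > b.probability ? a : b
    );
    console.log(`${best.className}: ${best.probability.toFixed(2)}`);
    window.requestAnimationFrame(loop);
  }
  window.requestAnimationFrame(loop);
}

run();
```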
Phase 4: Dataset Design & Training
The gesture selected for the prototype was the Dutch “lekker” sign — a culturally embedded, non-verbal way to say something is tasty or enjoyable. I trained a custom model using a self-created image dataset, captured across various lighting conditions and backgrounds.
To improve accuracy and inclusivity, the dataset included images from users of different ethnicities and genders. I also compared image classification and pose detection to determine which method was more stable for gesture recognition in casual, real-world settings.
Caption: Exploring different ML approaches to classify gestures under varied conditions.
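To make the comparison concrete: image classification predicts directly from webcam pixels (as in the earlier sketch), while the pose-detection route first extracts body keypoints and classifies those. The sketch below assumes the companion @teachablemachine/pose library and a placeholder model URL; it is an illustration of the two-stage idea, not the prototype’s actual code.

```ts
// Sketch of the pose-detection alternative: each frame is first run through
// PoseNet to extract keypoints, and the classifier operates on that keypoint
// output rather than on raw pixels. Model URL is a placeholder.
import "@tensorflow/tfjs";
import * as tmPose from "@teachablemachine/pose";

const MODEL_BASE =
  "https://teachablemachine.withgoogle.com/models/<your-pose-model-id>/";

type PoseModel = Awaited<ReturnType<typeof tmPose.load>>;

// Two-stage prediction: extract keypoints first, then classify them.
async function classifyPose(model: PoseModel, frame: HTMLCanvasElement) {
  const { posenetOutput } = await model.estimatePose(frame); // stage 1: keypoints
  return model.predict(posenetOutput);                       // stage 2: classification
}

async function run(): Promise<void> {
  const model = await tmPose.load(
    MODEL_BASE + "model.json",
    MODEL_BASE + "metadata.json"
  );
  const webcam = new tmPose.Webcam(200, 200, /* flip */ true);
  await webcam.setup();
  await webcam.play();

  webcam.update();
  const predictions = await classifyPose(model, webcam.canvas);
  console.log(predictions); // [{ className, probability }, ...]
}

run();
```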
Phase 5: UX & Interaction Testing
The browser prototype allowed users to perform the “lekker” gesture and receive its cultural meaning in return. While it only recognized a single input, the test validated the interaction flow and helped frame how a user might receive feedback when interpreting foreign, non-verbal cues.
Caption: Real-time gesture interpretation with contextual meaning.
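To illustrate the feedback step, the sketch below maps a recognized class to a short cultural explanation rather than a bare label. The lookup table, confidence threshold, and `interpret` helper are illustrative assumptions, not the prototype’s actual code; the threshold reflects the design choice that it is better to show no explanation than a wrong one.

```ts
// Sketch of the feedback step: turn a classifier result into a short
// cultural explanation instead of a raw class label.
interface Prediction {
  className: string;
  probability: number;
}

// Illustrative lookup table; the prototype covered only one gesture.
const CULTURAL_MEANINGS: Record<string, string> = {
  lekker:
    "Dutch 'lekker' gesture — a non-verbal way to say something is tasty or enjoyable.",
};

const CONFIDENCE_THRESHOLD = 0.85; // only explain gestures we are fairly sure about

function interpret(predictions: Prediction[]): string {
  const best = predictions.reduce((a, b) =>
    a.probability > b.probability ? a : b
  );
  if (best.probability < CONFIDENCE_THRESHOLD) {
    return "No confident match — try holding the gesture steady in frame.";
  }
  const meaning = CULTURAL_MEANINGS[best.className];
  return (
    meaning ??
    `Recognized "${best.className}", but no cultural note is available yet.`
  );
}
```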
In addition, we produced a demo video to show how the product would work if the prototype were fully functional, giving viewers an idea of what the end result should look like.
The long-term vision for this system includes multimodal input — recognizing gestures, facial expressions, posture, and possibly tone — and wearable or mobile integration for real-time, cross-cultural support.
Reflection
This project wasn’t about building another translation app — it was about creating space for cultural understanding in communication design. By focusing on what people express, not just what they say, I began exploring how UX and AI can work together to bridge meaning across cultures.
“This was the first time I thought about AI not just as a tool — but as a lens for empathy.”
It also sharpened my ability to adapt under technical constraints and design meaningful interaction flows even with limited inputs. In the future, I want to continue working on inclusive, context-aware interfaces that support real-world understanding — not just digital functionality.
Key UX Learnings
Researched and framed the problem through a cultural UX lens
Designed and trained a gesture recognition model for contextual feedback
Balanced technical feasibility with user-centered design
Scoped a multimodal concept down to a working prototype using rapid iteration
Validated the importance of context, culture, and diversity in AI-driven interactions