2026 Full Stack & AI Engineer

Picturam

React · Node.js · Whisper · ComfyUI · AI

A PWA that translates speech into semantic images in real time, helping communication with elderly people who are deaf, illiterate or have verbal comprehension difficulties. Dual-mode pipeline (local GPU or cloud) with a two-layer cache.

The Challenge

Elderly people with hearing loss, illiteracy or comprehension problems miss much of what caregivers, therapists or family members tell them. Spoken language is a fragile channel that doesn't always reach the listener.

Results

  • Dual-mode pipeline: local GPU (RTX 4080) or cloud, config-swappable
  • Two-layer cache (exact phrase + concept) cuts latency 10-50x on repeats
  • Typical latency ~200-300ms cached, 2-5s on fresh generation
  • Use cases: care homes, speech therapy, family communication
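The two-layer cache listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the class name `TwoLayerCache`, the `normalize` helper and its normalization rules are all assumptions. The idea is that an exact-phrase hit skips the whole pipeline, while a concept hit still needs the LLM step but skips image generation.

```typescript
// Hypothetical sketch: all names and the normalization rules are assumptions.
type ImageRef = string; // e.g. a URL or file path of a generated image

// Normalize text so trivially different transcripts share one cache key.
function normalize(text: string): string {
  return text
    .toLowerCase()
    .replace(/[.,;:!?"']/g, "") // drop common punctuation
    .replace(/\s+/g, " ")       // collapse runs of whitespace
    .trim();
}

class TwoLayerCache {
  private byPhrase = new Map<string, ImageRef>();  // layer 1: exact phrase
  private byConcept = new Map<string, ImageRef>(); // layer 2: extracted concept

  // Layer 1 lookup: a hit skips both the LLM and the image model.
  getByPhrase(phrase: string): ImageRef | undefined {
    return this.byPhrase.get(normalize(phrase));
  }

  // Layer 2 lookup: the LLM has already run, but image generation is skipped.
  getByConcept(concept: string): ImageRef | undefined {
    return this.byConcept.get(normalize(concept));
  }

  // Store a freshly generated image under both keys.
  set(phrase: string, concept: string, image: ImageRef): void {
    this.byPhrase.set(normalize(phrase), image);
    this.byConcept.set(normalize(concept), image);
  }
}
```

With this shape, two different sentences that reduce to the same concept reuse one image, which is where most of the savings on repeated conversations would come from.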

The Solution

I built a PWA that captures speech, transcribes it with Whisper, extracts the key concept with an LLM, and generates a semantic image to accompany the sentence. The server orchestrates two interchangeable modes, local GPU (faster-whisper + Ollama + ComfyUI) or cloud (Deepgram + Gemini + fal.ai), and a two-layer cache eliminates redundant work.
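As a rough sketch of how such a dual-mode pipeline could be wired, the snippet below defines one interface per stage and picks a provider set from a mode flag. The interface names, method names and the `stub*` helpers are hypothetical; the stubs stand in for real adapters and do not call faster-whisper, Ollama, ComfyUI, Deepgram, Gemini or fal.ai.

```typescript
// Hypothetical interfaces: method names are assumptions for illustration only.
interface SttProvider { name: string; transcribe(audio: Uint8Array): Promise<string>; }
interface LlmProvider { name: string; extractConcept(sentence: string): Promise<string>; }
interface ImageProvider { name: string; generate(concept: string): Promise<string>; }

interface ProviderSet { stt: SttProvider; llm: LlmProvider; image: ImageProvider; }
type Mode = "local" | "cloud";

// Stubs standing in for real adapters; they return canned values.
function stubStt(name: string): SttProvider {
  return { name, transcribe: async (_audio: Uint8Array) => "time for lunch" };
}
function stubLlm(name: string): LlmProvider {
  // Toy "concept extraction": take the last word of the sentence.
  return {
    name,
    extractConcept: async (sentence: string) =>
      sentence.slice(sentence.lastIndexOf(" ") + 1),
  };
}
function stubImage(name: string): ImageProvider {
  return { name, generate: async (concept: string) => `${name}/${concept}.png` };
}

// One provider set per mode; switching modes is a config change only.
const providerSets: Record<Mode, ProviderSet> = {
  local: { stt: stubStt("faster-whisper"), llm: stubLlm("Ollama"), image: stubImage("ComfyUI") },
  cloud: { stt: stubStt("Deepgram"), llm: stubLlm("Gemini"), image: stubImage("fal.ai") },
};

function selectProviders(mode: Mode): ProviderSet {
  return providerSets[mode];
}

// The three stages run in series: STT, then LLM, then image generation.
async function speechToImage(audio: Uint8Array, p: ProviderSet): Promise<string> {
  const sentence = await p.stt.transcribe(audio);
  const concept = await p.llm.extractConcept(sentence);
  return p.image.generate(concept);
}
```

Because `speechToImage` only sees the interfaces, the calling code stays the same whichever mode is selected.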

Motivation

I wanted to push the social use case of generative models: instead of decorative images, images that serve someone who can't hear or read. Along the way, I also wanted to experiment with a pipeline that could run on my local GPU or in the cloud without changing the app.

Challenges

The hardest part was keeping latency usable in a pipeline with three models in series (STT → LLM → image). The two-layer cache and the fuzzy matching of known people were the two decisions that made the conversational mode viable.
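One plausible reading of "fuzzy matching of known people" is mapping a possibly misheard name in the transcript to a list of people the listener knows. The sketch below uses normalized Levenshtein distance for that; the algorithm choice, the 0.75 threshold, the function names and the sample names are all assumptions, since the project's actual matching method is not described here.

```typescript
// Classic Levenshtein edit distance, computed with two rolling rows.
function levenshtein(a: string, b: string): number {
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      curr.push(Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or match)
      ));
    }
    prev = curr;
  }
  return prev[b.length];
}

// Return the best-matching known name, or undefined if nothing is close enough.
// The threshold is a hypothetical default, not a tuned value.
function matchKnownPerson(
  heard: string,
  known: string[],
  threshold: number = 0.75
): string | undefined {
  const h = heard.toLowerCase();
  let best: string | undefined = undefined;
  let bestScore = threshold;
  for (const name of known) {
    const n = name.toLowerCase();
    const maxLen = Math.max(h.length, n.length);
    // Similarity in [0, 1]: 1 means identical, 0 means nothing in common.
    const score = maxLen === 0 ? 1 : 1 - levenshtein(h, n) / maxLen;
    if (score >= bestScore) {
      best = name;
      bestScore = score;
    }
  }
  return best;
}
```

A matched name could then be used as the cache key or image prompt instead of the raw transcript token, so a slightly wrong transcription still resolves to the right person.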

Learnings

I learned to design provider abstractions (STT/LLM/image) that can be swapped without coupling to a specific SDK, and that in real scenarios (a care home with flaky Wi-Fi) offline-first stops being optional.

Context

Most active project in my portfolio in 2026 (44 commits in 60 days). Technically solid MVP, pending a commercial milestone: open demo, care-home pilot, or third-party API.