Real systems. Real clients. Real numbers.

Everything below is running, being piloted, or in active development. Status is honest; metrics are either measured or clearly marked as targets.

Some clients are anonymised where disclosure was not authorised. No stock photography, no logos we didn't earn — only what's been built.

Live

Deployed and running in production with real users.

Pilot

MVP processing real client data, scoped and signed.

Strategic

Architecture and approach signed off, build in progress.

R&D

Internal Vailis tool or research we apply to client work.

Featured

2 flagship projects

Strategic

Featured

Marketplace

AI-Native Wine Distributor

20 AI agents across 7 operating units

Central-European wine distributor, 150 SKUs, regional operations

Problem

A traditional wine distributor was sitting on 101 months of inventory in the budget segment and carrying material receivables from a handful of debtors. Scaling with headcount was off the table — margins could not support it.

Solution

Designed an AI-native operating system where 20 agents run across 7 units: SOMMELIER and HUNTER on supply, CLOSER and NEGOTIATOR on sales, CELLAR and DISPATCHER on ops, COLLECTOR and BOOKKEEPER on finance, plus Taste Graph, Demand Prediction, Winemaker Intelligence and Pricing Engine as the data layer. Stack: Supabase, Claude API, Telegram gateway.

Impact

Architecture signed off by founder — equity structure agreed
Tier-1 agents (HUNTER, CLOSER, COLLECTOR) scoped as first build block
Infrastructure cost modelled at ~$300/month in Y1
Data flywheel designed as the long-term defensibility moat

AI-agents, vertical-saas, B2B, multi-agent

Read full case →

Pilot

Featured

Fintech

Offshore Digital Bank — AI Onboarding + VIP Manager

KYC / AML automation for a Class A license

Offshore digital bank, Caribbean, Class A banking license

Problem

Eleven of the 19 KYC scoring questions were being answered by hand. Seven HIGH-priority compliance alerts had been open for three weeks. Onboarding was dragging, and relationship managers had no tool to prep VIP interactions.

Solution

Built a Scoring Engine on Claude for the 19-question onboarding model, an Alert Triage classifier that separates false_positive / needs_review / probable_match, document OCR, and an EDD Narrative generator. On top: a VIP Manager assistant with real-time prompts, meeting prep, follow-up automation and a regulatory knowledge base.
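To make the triage step concrete, here is a minimal sketch of how classified alerts could be routed into one queue per label. Only the three label names come from the pilot; the `route_alerts` helper, the `(alert_id, label)` input shape and the fallback rule for unknown labels are illustrative assumptions, not the deployed code.

```python
from enum import Enum


class TriageLabel(str, Enum):
    # The three outcomes named in the case description.
    FALSE_POSITIVE = "false_positive"
    NEEDS_REVIEW = "needs_review"
    PROBABLE_MATCH = "probable_match"


def route_alerts(classified):
    """Group (alert_id, raw_label) pairs into one queue per triage label.

    Assumption: any label the classifier returns that is not one of the
    three known values is sent to needs_review, so a malformed response
    never auto-closes an alert.
    """
    queues = {label: [] for label in TriageLabel}
    for alert_id, raw_label in classified:
        try:
            label = TriageLabel(raw_label)
        except ValueError:
            label = TriageLabel.NEEDS_REVIEW
        queues[label].append(alert_id)
    return queues
```

Defaulting to human review on anything unexpected is the conservative choice for a compliance workflow: the classifier can speed up the queue, but it cannot silently drop an alert.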

Impact

Alert triage: 7 backlog alerts cleared in under a minute, after sitting open for three weeks
Scoring engine: 71% agreement with the bank on an 8-application test set
Five P0 compliance gaps closed (MFA, DPA jurisdiction, Certification)
LLM operating cost held between $50 and $260 per month

KYC-AML, banking, compliance, RAG

Read full case →

All projects

8 total

Pilot

Retail / Ops

Retail Accounting Automation

5.9M receipts, 232 stores, one pipeline

Multi-format retail group — 40 grocery and 100 beverage outlets

Problem

The equivalent of 6.5 full-time operators was tied up on manual goods receiving and on reconciling two sets of books (management vs accounting). Receivables tracking was reactive, and shrinkage was being absorbed rather than flagged.

Solution

Mobile receiving app that scans barcodes directly into the ERP, an auto-reconciliation engine between the two accounting systems, and an anomaly scoring layer for fraud, shrinkage and receivables risk. A Telegram bot lets managers query the system in natural language.

Impact

Target annual saving: $40–49K, payback around three months
104 payroll ghosts identified (charges with no matching payouts)
4 outlets flagged with shrinkage above 10% of turnover
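
The shrinkage flag can be sketched as a simple ratio check. This is a minimal illustration, assuming per-outlet turnover and shrinkage totals are already available for the same period; the `OutletPeriod` record and `flag_high_shrinkage` helper are hypothetical names, and the real scoring layer also covers fraud and receivables risk, which this sketch does not attempt.

```python
from dataclasses import dataclass

# 10% of turnover: the cut-off used for the flagged outlets above.
SHRINKAGE_THRESHOLD = 0.10


@dataclass
class OutletPeriod:
    # Hypothetical record: totals for one outlet over one reporting period.
    outlet_id: str
    turnover: float
    shrinkage: float


def flag_high_shrinkage(outlets, threshold=SHRINKAGE_THRESHOLD):
    """Return (outlet_id, ratio) pairs above the threshold, worst first."""
    flagged = []
    for outlet in outlets:
        if outlet.turnover <= 0:
            # No turnover means the ratio is undefined; skip rather than guess.
            continue
        ratio = outlet.shrinkage / outlet.turnover
        if ratio > threshold:
            flagged.append((outlet.outlet_id, ratio))
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)
```

Sorting worst-first keeps the manager's attention on the outlets where the loss is largest relative to sales.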

retail, anomaly-detection, ERP-integration, fraud

Read full case →

Pilot

Fintech

B2B Payment Platform — Support Quality + L1 Bot

RAG-powered support for P2P and card operations

B2B payment platform, P2P transfers and card business

Problem

Support tickets were piling up faster than the team could clear them. Roughly a third of inbound questions were repeats of things already answered, and response quality varied wildly between agents.

Solution

Analysed 1000+ historical tickets for classification, sentiment and resolution time, built a RAG layer over the knowledge base, and shipped a Telegram L1 bot for FAQ, payment statuses and common troubleshooting. A scoring layer compares agent answers to the best-in-class response pattern.

Impact

1000+ tickets cleaned, classified and templated (200+ FAQ templates)
Median resolution time 2h, p95 at 8h
Repeat-question rate reduced from 30% to 28%
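
For readers unfamiliar with the resolution-time figures, this is a minimal sketch of how a median and a p95 can be computed from per-ticket resolution times. It assumes the nearest-rank definition of a percentile; the case study does not say which definition was used, and the helper name and sample values are made up for illustration.

```python
import math
from statistics import median


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: the smallest value such that at least
    pct percent of the observations are less than or equal to it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Invented resolution times in hours, not the pilot's data.
resolution_hours = [1, 2, 2, 3, 8]
typical = median(resolution_hours)
slow_tail = percentile_nearest_rank(resolution_hours, 95)
```

Reporting both numbers matters because the median describes the typical ticket while the p95 shows how bad the slow tail gets.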

support, fintech, RAG, bot

Read full case →

Live

Sports

GPAGA — Golf Scoring Platform

Tournament scoring for a national golf association

National golf association, Georgia

Problem

A new association needed a platform to actually run tournaments — scoring, handicaps, leaderboards — plus a web presence that could carry a federation-grade brand. Nothing off the shelf covered Stableford, Scramble and Match Play in one place.

Solution

Shipped a Next.js platform on gpaga.ge with Google OAuth, a mobile hole-by-hole carousel, a desktop scorecard table and leaderboards by division. All four scoring formats (Stableford, Stroke Play, Match Play, Scramble) are supported natively, with handicap-aware Stableford point display.
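
Handicap-aware Stableford scoring can be sketched in a few lines. This is a minimal illustration, assuming the standard Stableford table (2 points for a net par, one more per stroke under, one fewer per stroke over, never below zero) and the usual allocation of handicap strokes by stroke index. The function names are hypothetical, plus handicaps are left out, and the sketch is in Python for brevity rather than taken from the platform's codebase.

```python
def strokes_received(playing_handicap: int, stroke_index: int) -> int:
    """Handicap strokes a player receives on a hole with this stroke index (1-18)."""
    if playing_handicap < 0:
        raise ValueError("plus handicaps are not handled in this sketch")
    full_rounds, extra = divmod(playing_handicap, 18)
    # Everyone gets full_rounds strokes per hole; the hardest `extra` holes get one more.
    return full_rounds + (1 if stroke_index <= extra else 0)


def stableford_points(par: int, gross: int, playing_handicap: int, stroke_index: int) -> int:
    """Stableford points for one hole, based on the net score against par."""
    net = gross - strokes_received(playing_handicap, stroke_index)
    return max(0, par - net + 2)
```

For example, an 18-handicap player who takes 5 on a par 4 receives one stroke, scores a net par and earns 2 points; a scratch player with the same gross score earns 1.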

Impact

100+ registered players, 80% Google OAuth adoption
50+ casual rounds and 10+ official tournaments logged
Leaderboards by division A / B / C with live updates

sports-tech, platform, scoring-engine, scheduling

Read full case →

Live

Internal / Tools

Brainstorm Bot — Voice Archive for Founders

Diarized transcripts and action items, in-chat

Internal tool, used daily by Vailis and partner founders

Problem

Ideas, decisions and commitments were being generated in group voice chats and evaporating within a day. No archive, no action-item extraction, no way to search months back.

Solution

A Telegram bot catches audio, text and YouTube links, transcribes with Deepgram Nova-3 including speaker diarization, and runs a Claude Haiku pass that returns a summary, extracted ideas and commitments straight back to the chat. Files are written into Obsidian via GitHub sync for long-term retrieval.

Impact

Four services live on the VPS, multi-chat routing enabled
Voice-ID accuracy 0.898–0.974 on two primary speakers
~$0.29 per 30-minute transcript — predictable unit economics

voice, automation, knowledge-management, diarization

Read full case →

Pilot

Internal / Tools

Autorec — macOS System Audio Capture

Native recorder for calls that have no record button

Internal Vailis infrastructure, foundation for voice stack

Problem

Telegram and several other call platforms do not expose a record button. To feed the transcription pipeline we needed a reliable way to capture both microphone and system output on macOS, without relying on browser extensions or screen-recording hacks.

Solution

A native macOS app (Swift + AVFoundation + Core Audio Taps) with a floating REC overlay and a menu-bar status item. Records microphone and system output as separate tracks and uploads them to the processing backend once the session ends.

Impact

Phase 1 shipped — overlay and menu bar live, clean capture on both tracks
Feeds the Brainstorm transcription pipeline without manual handling
Separated tracks allow downstream diarization to stay accurate

macOS, voice-capture, infrastructure, swift

Read full case →

Pilot

Internal / Tools

SubTracker — Subscription & API Cost Tracker

Private tracker for AI subscriptions and API spend

Internal Vailis tool, open-source candidate

Problem

Running an AI-native consulting operation means fifteen-plus subscriptions and as many API keys, spread across Gmail receipts and vendor dashboards. No single place to see what is actually being spent each month or catch forgotten renewals.

Solution

A Chrome extension plus a local FastAPI + SQLite backend. A three-step Gmail pipeline — Discover → Classify → Scan — finds billing domains, then pulls only confirmed receipts. Fuzzy matching on sender names and billing patterns keeps false positives low.
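
The fuzzy-matching idea can be illustrated with the standard library alone. This is a minimal sketch, assuming sender display names are compared against a list of known vendors with `difflib.SequenceMatcher`; the helper names and the 0.8 cut-off are assumptions for illustration, and SubTracker's actual matcher and thresholds may differ.

```python
from difflib import SequenceMatcher


def normalise(name: str) -> str:
    """Lower-case and drop punctuation so 'Cursor, Inc.' and 'cursor inc' compare equal."""
    return "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace()).strip()


def best_vendor_match(sender_name, vendors, min_ratio=0.8):
    """Return the closest vendor name, or None if nothing is similar enough.

    min_ratio is an assumed cut-off: raising it lowers false positives
    at the cost of missing oddly formatted sender names.
    """
    target = normalise(sender_name)
    best, best_ratio = None, 0.0
    for vendor in vendors:
        ratio = SequenceMatcher(None, target, normalise(vendor)).ratio()
        if ratio > best_ratio:
            best, best_ratio = vendor, ratio
    return best if best_ratio >= min_ratio else None


# Vendor names taken from the services listed in this case.
VENDORS = ["Claude", "Cursor", "Google", "Gemini", "LuxAlgo", "MiniMax"]
```

Returning `None` below the cut-off is what keeps unrelated mail out of the receipt scan: a sender that resembles no known vendor is simply ignored.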

Impact

Phase 1 shipped: 16 files, ~1,648 lines, 34 passing tests
Six real services auto-discovered from Gmail (Claude, Cursor, Google, Gemini, LuxAlgo, MiniMax)
False positives reduced from 93 to 28 through pattern filtering

personal-tool, automation, gmail-parsing

Read full case →

Want one of these for your company?

30 minutes, free. One real problem from your business.

Book 30 minutes — free

Or write directly: stan@vailis.ai