Real systems. Real clients. Real numbers.

Everything below is running, being piloted, or in active development. Status is reported honestly, and metrics are either measured or clearly marked as targets.

Some clients are anonymised where disclosure was not authorised. No stock photography, no logos we didn't earn — only what's been built.

Live

Deployed and running in production with real users.

Pilot

MVP processing real client data, scoped and signed.

Strategic

Architecture and approach signed off, build in progress.

R&D

Internal Vailis tool or research we apply to client work.

Featured

2 flagship projects
Strategic · Featured
Marketplace

AI-Native Wine Distributor

20 AI agents across 7 operating units

Central-European wine distributor, 150 SKUs, regional operations

Problem

A traditional wine distributor was sitting on 101 months of inventory in the budget segment and carrying material receivables from a handful of debtors. Scaling with headcount was off the table — margins could not support it.

Solution

Designed an AI-native operating system where 20 agents run across 7 units: SOMMELIER and HUNTER on supply, CLOSER and NEGOTIATOR on sales, CELLAR and DISPATCHER on ops, COLLECTOR and BOOKKEEPER on finance, plus Taste Graph, Demand Prediction, Winemaker Intelligence and Pricing Engine as the data layer. Stack: Supabase, Claude API, Telegram gateway.

Impact

  • Architecture signed off by founder — equity structure agreed
  • Tier-1 agents (HUNTER, CLOSER, COLLECTOR) scoped as first build block
  • Infrastructure cost modelled at ~$300/month in Y1
  • Data flywheel designed as the long-term defensibility moat
AI-agents · vertical-saas · B2B · multi-agent
Read full case →
Pilot · Featured
Fintech

Offshore Digital Bank — AI Onboarding + VIP Manager

KYC / AML automation for a Class A license

Offshore digital bank, Caribbean, Class A banking license

Problem

11 of 19 KYC scoring questions were being answered by hand. Seven HIGH-priority compliance alerts had been open for three weeks. Onboarding time kept growing, and relationship managers had no tool to prep VIP interactions.

Solution

Built a Scoring Engine on Claude for the 19-question onboarding model, an Alert Triage classifier that separates false_positive / needs_review / probable_match, document OCR, and an EDD Narrative generator. On top: a VIP Manager assistant with real-time prompts, meeting prep, follow-up automation and a regulatory knowledge base.
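To make the three triage labels concrete, here is a minimal sketch of the routing step only. It is not the production classifier: a plain threshold rule stands in for the Claude-based model, and the `Alert` type, the `match_score` field and both threshold values are assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical cut-offs; the case study does not state how alerts are scored.
FALSE_POSITIVE_BELOW = 0.30
PROBABLE_MATCH_FROM = 0.85


@dataclass
class Alert:
    alert_id: str
    match_score: float  # assumed 0..1 similarity between customer and watchlist entry


def triage(alert: Alert) -> str:
    """Route an alert to one of the three labels named in the case study."""
    if alert.match_score < FALSE_POSITIVE_BELOW:
        return "false_positive"
    if alert.match_score >= PROBABLE_MATCH_FROM:
        return "probable_match"
    # Everything in between goes to a human reviewer.
    return "needs_review"
```

Keeping a wide middle band means ambiguous alerts go to a compliance officer rather than being closed automatically.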

Impact

  • Alert triage: 7 backlog alerts cleared in under a minute, after sitting open for three weeks under manual handling
  • Scoring engine: 71% agreement with the bank on an 8-application test set
  • Five P0 compliance gaps closed, including MFA, DPA jurisdiction and certification
  • LLM operating cost held between $50 and $260 per month
KYC-AML · banking · compliance · RAG
Read full case →

All projects

8 total

Pilot · Retail / Ops

Retail Accounting Automation

5.9M receipts, 232 stores, one pipeline

Multi-format retail group — 40 grocery and 100 beverage outlets

Problem

6.5 full-time-equivalent operators were tied up on manual goods receiving and on reconciling two sets of books (management vs accounting). Receivables tracking was reactive, and shrinkage was being absorbed rather than flagged.

Solution

Mobile receiving app that scans barcodes directly into the ERP, an auto-reconciliation engine between the two accounting systems, and an anomaly scoring layer for fraud, shrinkage and receivables risk. A Telegram bot lets managers query the system in natural language.
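The auto-reconciliation step can be pictured as a keyed comparison of two ledgers. The sketch below is an illustration under assumptions, not the deployed engine: the `(document_id, amount)` row shape, the `reconcile` function name and the one-cent tolerance are all hypothetical.

```python
from collections import defaultdict
from decimal import Decimal


def reconcile(management, accounting, tolerance=Decimal("0.01")):
    """Compare per-document totals from two ledgers.

    Each ledger is an iterable of (document_id, amount) pairs.
    Returns document ids grouped by reconciliation outcome.
    """
    def totals(rows):
        acc = defaultdict(Decimal)  # Decimal() starts at 0
        for doc_id, amount in rows:
            acc[doc_id] += Decimal(str(amount))
        return acc

    m, a = totals(management), totals(accounting)
    report = {"matched": [], "mismatched": [],
              "only_management": [], "only_accounting": []}
    for doc_id in sorted(m.keys() | a.keys()):
        if doc_id not in a:
            report["only_management"].append(doc_id)
        elif doc_id not in m:
            report["only_accounting"].append(doc_id)
        elif abs(m[doc_id] - a[doc_id]) <= tolerance:
            report["matched"].append(doc_id)
        else:
            report["mismatched"].append(doc_id)
    return report
```

Using `Decimal` rather than floats avoids rounding noise showing up as false mismatches; the non-matched buckets are the natural input for an anomaly scoring layer.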

Impact

  • Target annual saving: $40–49K, payback around three months
  • 104 payroll ghosts identified (charges with no matching payouts)
  • 4 outlets flagged with shrinkage above 10% of turnover
retail · anomaly-detection · ERP-integration · fraud
Read full case →
Pilot · Fintech

B2B Payment Platform — Support Quality + L1 Bot

RAG-powered support for P2P and card operations

B2B payment platform, P2P transfers and card business

Problem

Support tickets were piling up faster than the team could clear them. Roughly a third of inbound questions were repeats of things already answered, and response quality varied wildly between agents.

Solution

Analysed 1000+ historical tickets for classification, sentiment and resolution time, built a RAG layer over the knowledge base, and shipped a Telegram L1 bot for FAQ, payment statuses and common troubleshooting. A scoring layer compares agent answers to the best-in-class response pattern.
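For readers unfamiliar with the resolution-time metrics reported for this project (median and p95), the sketch below shows one common way to compute them. The nearest-rank percentile method and the function name are assumptions; the case study does not say which method was used.

```python
import math
from statistics import median


def percentile_nearest_rank(values, pct):
    """Return the pct-th percentile using the nearest-rank method."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based rank
    return ordered[max(rank, 1) - 1]


def resolution_summary(hours):
    """Summarise ticket resolution times given in hours."""
    return {"median": median(hours), "p95": percentile_nearest_rank(hours, 95)}
```

The median describes the typical ticket, while p95 exposes the slow tail that an average would hide.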

Impact

  • 1000+ tickets cleaned, classified and templated (200+ FAQ templates)
  • Median resolution time 2h, p95 at 8h
  • Repeat-question rate reduced from 30% to 28%
support · fintech · RAG · bot
Read full case →
Live · Sports

GPAGA — Golf Scoring Platform

Tournament scoring for a national golf association

National golf association, Georgia

Problem

A new association needed a platform to actually run tournaments — scoring, handicaps, leaderboards — plus a web presence that could carry a federation-grade brand. Nothing off the shelf covered Stableford, Scramble and Match Play in one place.

Solution

Shipped a Next.js platform on gpaga.ge with Google OAuth, a mobile hole-by-hole carousel, a desktop scorecard table and leaderboards by division. All four scoring formats (Stableford, Stroke Play, Match Play, Scramble) are supported natively, with handicap-aware Stableford point display.
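As an illustration of handicap-aware Stableford, the sketch below follows the standard convention: handicap strokes are allocated by stroke index, and a net par earns 2 points. It is not taken from the platform's code; the function names are hypothetical and plus handicaps are left out for brevity.

```python
def strokes_received(playing_handicap: int, stroke_index: int) -> int:
    """Handicap strokes a player receives on a hole (stroke index 1..18)."""
    if playing_handicap < 0:
        raise ValueError("plus handicaps are not covered by this sketch")
    full_rounds, extra = divmod(playing_handicap, 18)
    # Every hole gets `full_rounds` strokes; the hardest `extra` holes get one more.
    return full_rounds + (1 if stroke_index <= extra else 0)


def stableford_points(gross: int, par: int,
                      playing_handicap: int, stroke_index: int) -> int:
    """Stableford points for one hole: net par scores 2, never below 0."""
    net = gross - strokes_received(playing_handicap, stroke_index)
    return max(0, 2 + par - net)
```

For example, a player with a playing handicap of 18 who takes 5 on a par 4 receives one stroke, nets a par and scores 2 points.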

Impact

  • 100+ registered players, 80% Google OAuth adoption
  • 50+ casual rounds and 10+ official tournaments logged
  • Leaderboards by division A / B / C with live updates
sports-tech · platform · scoring-engine · scheduling
Read full case →
Live · Internal / Tools

Brainstorm Bot — Voice Archive for Founders

Diarized transcripts and action items, in-chat

Internal tool, used daily by Vailis and partner founders

Problem

Ideas, decisions and commitments were being generated in group voice chats and evaporating within a day. No archive, no action-item extraction, no way to search months back.

Solution

A Telegram bot catches audio, text and YouTube links, transcribes with Deepgram Nova-3 including speaker diarization, and runs a Claude Haiku pass that returns a summary, extracted ideas and commitments straight back to the chat. Files are written into Obsidian via GitHub sync for long-term retrieval.

Impact

  • Four services live on the VPS, multi-chat routing enabled
  • Voice-ID accuracy 0.898–0.974 on two primary speakers
  • ~$0.29 per 30-minute transcript — predictable unit economics
voice · automation · knowledge-management · diarization
Read full case →
Pilot · Internal / Tools

Autorec — macOS System Audio Capture

Native recorder for calls that have no record button

Internal Vailis infrastructure, foundation for the voice stack

Problem

Telegram and several other call platforms do not expose a record button. To feed the transcription pipeline we needed a reliable way to capture both microphone and system output on macOS, without relying on browser extensions or screen-recording hacks.

Solution

A native macOS app (Swift + AVFoundation + Core Audio Taps) with a floating REC overlay and a menu-bar status item. Records microphone and system output as separate tracks and uploads them to the processing backend once the session ends.

Impact

  • Phase 1 shipped — overlay and menu bar live, clean capture on both tracks
  • Feeds the Brainstorm transcription pipeline without manual handling
  • Separated tracks allow downstream diarization to stay accurate
macOS · voice-capture · infrastructure · swift
Read full case →
Pilot · Internal / Tools

SubTracker — Subscription & API Cost Tracker

Private tracker for AI subscriptions and API spend

Internal Vailis tool, open-source candidate

Problem

Running an AI-native consulting operation means fifteen-plus subscriptions and as many API keys, spread across Gmail receipts and vendor dashboards. No single place to see what is actually being spent each month or catch forgotten renewals.

Solution

A Chrome extension plus a local FastAPI + SQLite backend. A three-step Gmail pipeline — Discover → Classify → Scan — finds billing domains, then pulls only confirmed receipts. Fuzzy matching on sender names and billing patterns keeps false positives low.
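The three-step flow can be sketched in a few lines. This is a simplified illustration under assumptions, not SubTracker's code: messages are plain dicts, the keyword list and the 0.8 threshold are made up, and Python's `difflib.SequenceMatcher` stands in for whatever fuzzy matcher the tool actually uses.

```python
from difflib import SequenceMatcher

# Hypothetical reference data for the sketch.
KNOWN_SERVICES = ["Claude", "Cursor", "Gemini"]
BILLING_HINTS = ("receipt", "invoice", "payment")


def _domain(msg):
    return msg["sender"].rsplit("@", 1)[-1].lower()


def _looks_like_billing(msg):
    subject = msg["subject"].lower()
    return any(hint in subject for hint in BILLING_HINTS)


def discover(messages):
    """Step 1: collect sender domains that send billing-like mail."""
    return {_domain(m) for m in messages if _looks_like_billing(m)}


def classify(domain, threshold=0.8):
    """Step 2: fuzzy-match the domain's first label to a known service."""
    label = domain.split(".")[0]
    scored = [(SequenceMatcher(None, label, name.lower()).ratio(), name)
              for name in KNOWN_SERVICES]
    score, name = max(scored)
    return name if score >= threshold else None


def scan(messages, confirmed_domains):
    """Step 3: keep only billing mail from confirmed domains."""
    return [m for m in messages
            if _domain(m) in confirmed_domains and _looks_like_billing(m)]


def run_pipeline(messages):
    confirmed = {d: classify(d) for d in discover(messages)}
    confirmed = {d: s for d, s in confirmed.items() if s is not None}
    return [(confirmed[_domain(m)], m["subject"])
            for m in scan(messages, confirmed)]
```

Classifying domains before scanning is what keeps unknown senders with billing-like subjects out of the final result.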

Impact

  • Phase 1 shipped: 16 files, ~1,648 lines, 34 passing tests
  • Six real services auto-discovered from Gmail (Claude, Cursor, Google, Gemini, LuxAlgo, MiniMax)
  • False positives reduced from 93 to 28 through pattern filtering
personal-tool · automation · gmail-parsing
Read full case →

Want one of these for your company?

30 minutes, free. One real problem from your business.

Or write directly: stan@vailis.ai