How the paper is built

A field guide to The Lexington Times

A 24/7 livestream. A searchable AI transcript archive. A real-time map of LFD calls, flood gauges, and crime reports. All glued together by open-source code, public data, and one very opinionated ops panel.

What you’re looking at

The Lexington Times is three products pretending to be one paper. Here’s what each one does and where it runs.

01

The paper

lexingtonky.news

WordPress. Aggregation via wp-rss-aggregator, original reporting, commentary, meeting agendas, SEO via Yoast, Jetpack stats. The human-readable front door.

02

The feeds engine

feeds.lexingtonky.news

Node + Express. Scrapes public sources, rewrites with Claude, stores articles as JSON, and exposes RSS, sitemaps, OG images, full-text search, and an /api/match endpoint so the paper can cross-link transcripts.

03

The livestream

youtube.com/@TheLexingtonTimes

Orchestrator + OBS + dashboard. 24/7 civic radio with ambient scanner audio, rotating traffic cams, real-time LFD incidents, NWS alerts, and AI-voiced briefs. Runs on a Lightsail box.

The stack

Runtime: Node 20, TypeScript strict, pnpm monorepo
Storage: Postgres + Redis Streams, S3, filesystem JSON
Broadcast: OBS Studio + obs-websocket, RTMPS to YouTube & Facebook
Voice: ElevenLabs TTS (Turbo v2.5, dual-host dialog, music API)
Writers: Anthropic Claude (Opus & Sonnet), web_search tool, RAG over meeting transcripts
Dashboard: React + Vite + Web Audio API, server-rendered MapLibre PNG stages
Host: AWS Lightsail (WordPress + orchestrator), EC2 (feeds service), Cloudflare (CDN/DNS)

Public data sources

Everything below is either public record, a licensed re-publisher, or data we generate ourselves. No scraping of paid sources. No social-media private APIs. No PII.

LFUCG: Granicus agendas + live meetings, events API, traffic camera ArcGIS, council districts, construction ArcGIS, parks + facilities
Weather: NWS alerts (Louisville WFO), USGS flood gauges, SPC outlook, drought monitor, FEMA county-level declarations
Public safety: LFD incident dispatch (HTML scrape), CrimeScape (LPD redistributor), jail roster
Transportation: KYTC traffic cameras, LFUCG construction, snow plow GPS
State + federal: Kentucky Legislature (LRC), US DOJ Eastern District of KY, US Attorney’s Office, Kentucky AG, FRED (economic)
Sports: ESPN (UK Kentucky Wildcats), Bengals via AP wire

The meeting pipeline

LFUCG publishes Council + commission meetings on Granicus with scanned agenda PDFs and auto-generated captions. That’s where our civic coverage starts — but the raw feed isn’t something you can skim. A dedicated Python pipeline turns each clip into something searchable.

1. Ingest: Scrapes Granicus for new meeting clips, downloads the video, pulls the scanned agenda PDF (OCR’d via tesseract), and captures captions + timestamps.
2. Transcribe: ffmpeg extracts audio; Whisper transcribes; segment-level timestamps are preserved so the frontend can deep-link into the video at any moment.
3. Fact pass (GPT-4o): Extracts structured facts (votes, dollar amounts, names, agenda items, timestamps) into extracted_facts.json. Pure structured extraction, no narrative.
4. Narrative pass (Claude Sonnet): Writes a section-by-section summary grounded in the extracted facts, with [timestamp: MM:SS] markers for clickable video seeking.
5. Serve: Static JSON + React SPA for browsing the archive, plus a RAG Q&A endpoint that LexBot queries when picking the day’s most interesting clip or building a recap script.
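
The [timestamp: MM:SS] markers from the narrative pass can be turned into seek offsets on the frontend. The following is a minimal sketch of that idea, not the pipeline's actual code: the names extractSeekPoints and seekUrl are hypothetical, and the `#t=` fragment is an assumption about how the player accepts a start offset.

```typescript
// One clickable position in a meeting video.
interface SeekPoint {
  label: string;   // the marker text as written, e.g. "12:34"
  seconds: number; // offset from the start of the video
}

// Extracts every [timestamp: MM:SS] marker from a summary, in order of appearance.
// Minutes may run past 59 for long meetings; seconds must be 00-59.
function extractSeekPoints(summary: string): SeekPoint[] {
  const marker = /\[timestamp:\s*(\d{1,3}):([0-5]\d)\]/g;
  const points: SeekPoint[] = [];
  let match: RegExpExecArray | null;
  while ((match = marker.exec(summary)) !== null) {
    const minutes = Number(match[1]);
    const seconds = Number(match[2]);
    points.push({
      label: match[1] + ":" + match[2],
      seconds: minutes * 60 + seconds,
    });
  }
  return points;
}

// Builds a deep link using a "#t=<seconds>" fragment (assumed player convention).
function seekUrl(videoUrl: string, point: SeekPoint): string {
  return videoUrl + "#t=" + String(point.seconds);
}
```

A summary line such as "Council approved the budget [timestamp: 12:34]" would yield one seek point at 754 seconds.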

The whole thing is open source — code, docs, and the Granicus reverse-engineering notes (granicus.md) that made timestamp deep-linking possible.

paul-codes-1 / fuzzy-potato

What LexBot reads on air

Every voice segment is a written-for-radio script generated by Claude, spoken by ElevenLabs, and logged to the transcript archive. The cadence is deliberately unhurried — it’s a civic radio feed, not TikTok.

Hourly | News briefs | Rewritten from the day’s aggregated headlines; 2-minute paraphrase with attribution.
06:00 | Morning briefing | Dual-host weekday roundup: weather, overnight LFD activity, meetings on today’s docket.
18:00 | Evening wrap | Dual-host recap of the day: public meetings, notable filings, what happened locally.
q3h | Weather cut-ins | Every three hours, on top of whatever’s playing, with NWS alert override on severe days.
Every 8h | Ask Lex | Listener-submitted questions answered on air, fact-checked via Anthropic web_search.
Per meeting | Meeting recaps | After an LFUCG meeting ends, the RAG-selected most-interesting clip + a commentary pass.
09:00 | Calendar spotlight | Today at 9 a.m.: what’s happening in Lexington.
On trigger | Breaking cut-ins | On BREAKING events, a dedicated cut-in clip with 30-second delay for fact-check.
Daily 16:00 | Crime blotter | Aggregated prior-24h reports by district; counts only, no individual addresses.
11:00 M-F | Pet of the day | Lexington Humane Society adoptable + photo overlay.
Mon 07:00 | Neighborhood of the week | Three-minute profile of one of 52 Lexington neighborhoods.
11:45 M-F | Lunch spotlight | Independent restaurant of the day: seed list of 30, 60-day dedup, no chains.
:30 hourly | Trivia break | Local-history question, answer 10 minutes later, with chat-guesser callouts.
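
The crime blotter segment is described as counts only over the prior 24 hours. A minimal sketch of that kind of aggregation follows. The CrimeReport shape and its field names are assumptions made for illustration, not the CrimeScape schema, and aggregateBlotter is a hypothetical name.

```typescript
// Hypothetical shape of one raw report; field names are assumptions.
interface CrimeReport {
  district: number;   // council district
  crimeType: string;
  address: string;    // present in the raw data, never copied into the output
  reportedAt: string; // ISO 8601 timestamp
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Returns counts keyed by "district|crimeType" for reports in the 24 hours before `now`.
// Only the count leaves this function; addresses are dropped by construction.
function aggregateBlotter(reports: CrimeReport[], now: Date): Map<string, number> {
  const end = now.getTime();
  const start = end - DAY_MS;
  const counts = new Map<string, number>();
  for (const report of reports) {
    const time = Date.parse(report.reportedAt);
    // Skip unparseable timestamps and anything outside the window.
    if (Number.isNaN(time) || time < start || time > end) continue;
    const key = report.district + "|" + report.crimeType;
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}
```

Because the output map carries only district, type, and count, a script writer working from it has no address to read out, which is the point of aggregating before the text reaches the voice stage.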

Guardrails

  • AI disclosure — every transcript includes a human-readable note that the segment was generated by LexBot, which model wrote it, and when it aired.
  • Fair use — aggregated headlines carry title + ≤200-char summary, with the canonical source link. Voice scripts paraphrase and attribute; they never quote.
  • No hallucinations on facts — Ask Lex and breaking briefs run through Anthropic’s web_search_20250305 tool, geo-biased to Lexington, KY.
  • Circuit breakers — every poller opens on 3 consecutive failures and stays open 5 minutes before retry. No silent infinite-retry.
  • Dwell + cooldown — scene-switch safety constants prevent the stream from rapidly flipping between cams / alerts / ads in ways that would look unstable on air.
  • Fourth wall — hosts never say “I don’t have that in front of me,” “our data,” or “checking our feeds.” If information is thin, we pivot to what a local would know.
  • Crime blotter is aggregated — counts per council district and crime type. We don’t name individuals or addresses, even when the public data set would allow it.
  • YouTube quota safety — live-title and chat polling are rate-tuned to stay under the 10k-unit/day Data API cap. Never auto-publishes without the stream-key gate.
  • Human in the loop for Shorts — highlight clips go through a two-step unlisted-upload → manual review → publish flow.
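
The dwell + cooldown guardrail above can be sketched as a small gate in front of every scene switch. This is an illustration under stated assumptions: the constant values, the scene names, and the SceneGuard class are all hypothetical, since the guide does not give the real safety constants.

```typescript
type Scene = "cams" | "alert" | "ad";

// Hypothetical safety constants; the real values are not given in this guide.
const MIN_DWELL_MS = 20000;      // a scene stays on air at least this long
const SCENE_COOLDOWN_MS = 60000; // a scene cannot return sooner than this after leaving

class SceneGuard {
  private current: Scene;
  private enteredAt: number;
  private leftAt = new Map<Scene, number>();

  constructor(initial: Scene, now: number) {
    this.current = initial;
    this.enteredAt = now;
  }

  // Returns true and records the switch if it is allowed at time `now` (ms).
  trySwitch(next: Scene, now: number): boolean {
    if (next === this.current) return false;
    // Dwell: the current scene has not been on air long enough.
    if (now - this.enteredAt < MIN_DWELL_MS) return false;
    // Cooldown: the requested scene left the air too recently.
    const left = this.leftAt.get(next);
    if (left !== undefined && now - left < SCENE_COOLDOWN_MS) return false;
    this.leftAt.set(this.current, now);
    this.current = next;
    this.enteredAt = now;
    return true;
  }

  activeScene(): Scene {
    return this.current;
  }
}
```

A real orchestrator would likely let severe alerts bypass the gate; this sketch treats every scene the same to keep the two rules visible.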

Humans in the loop

There is a human behind all of this. Most of it was built, is operated, and is continuously debugged by Paul Oliva, with help from friends and open-source contributors. There is an ops control panel on :3008 with one-click overrides for:

  • skipping / pinning a segment on the podcast rotation
  • forcing a specific camera in the 2×2 tile mode
  • injecting weather alerts, breaking headlines, and incidents for testing
  • rotating the Facebook broadcast token, cutting over to a new live_video
  • approving viewer-submitted milestones + shoutouts
  • pinning a civic graphic on-air for a fixed dwell

The paper will only ever be as opinionated as the humans holding the ops panel.

Questions we get

Is this really 24/7?

Yes. The orchestrator runs on a Lightsail box with a systemd unit that restarts on any crash, a circuit-breaker on every external source (3 failures → 5-minute open), and pre-rendered scanner ambience loops for the 01:00–05:00 ET overnight window so we never hit dead air. Occasional minute-scale outages happen when we re-deploy; multi-minute gaps are rare and we notice.
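
The circuit-breaker rule (3 consecutive failures, then 5 minutes open) can be sketched as below. The two constants come from the text; the CircuitBreaker class, the pollOnce wrapper, and the choice to reopen immediately when the first retry fails are assumptions made for illustration, not the orchestrator's actual code.

```typescript
const FAILURE_THRESHOLD = 3;            // consecutive failures before opening (from the text)
const OPEN_DURATION_MS = 5 * 60 * 1000; // stay open five minutes (from the text)

class CircuitBreaker {
  private consecutiveFailures = 0;
  private openedAt: number | null = null;

  // True when a poll may be attempted at time `now` (ms since epoch).
  canAttempt(now: number): boolean {
    if (this.openedAt === null) return true;
    return now - this.openedAt >= OPEN_DURATION_MS;
  }

  // Any success closes the breaker and clears the failure streak.
  recordSuccess(): void {
    this.consecutiveFailures = 0;
    this.openedAt = null;
  }

  // The streak is only cleared by a success, so a failed retry after the
  // open window restarts the five-minute wait (assumed behavior).
  recordFailure(now: number): void {
    this.consecutiveFailures += 1;
    if (this.consecutiveFailures >= FAILURE_THRESHOLD) {
      this.openedAt = now;
    }
  }
}

// Wraps one poll of an external source; returns null when skipped or failed.
async function pollOnce<T>(
  breaker: CircuitBreaker,
  fetchSource: () => Promise<T>,
  now: () => number = Date.now
): Promise<T | null> {
  if (!breaker.canAttempt(now())) return null; // breaker open: skip quietly
  try {
    const result = await fetchSource();
    breaker.recordSuccess();
    return result;
  } catch {
    breaker.recordFailure(now());
    return null;
  }
}
```

Each poller would hold its own breaker instance, so one failing source never pauses the others.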

How do you stop the AI from making things up?

Three layers. First: every voice segment is a script that gets normalized through a fixed set of writing rules (no numerals, no symbols, no em dashes) before synthesis — so the model can’t improvise outside the brief. Second: questions that need fresh information route through Anthropic’s web_search tool with geo-bias to Lexington, KY, so “when did that park open” pulls a real source rather than a guess. Third: meeting recaps are grounded in extracted facts from the transcript pipeline — if it isn’t in the clip, it’s not in the recap. Scripts also have a fourth-wall rule: hosts never say “I don’t have that” — they pivot to what a Lexington local would actually know.
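
The first layer, normalizing a script before synthesis, can be sketched as a chain of text replacements. This is a simplified illustration: the function names are hypothetical, only whole numbers up to six digits and a few symbols ($, %, &) are handled, and the real rule set is not published in this guide.

```typescript
const ONES = [
  "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
  "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
  "seventeen", "eighteen", "nineteen",
];
const TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"];

// Spells a whole number from 0 to 999,999 in words.
function spellNumber(n: number): string {
  if (n < 20) return ONES[n];
  if (n < 100) {
    const rest = n % 10;
    return TENS[Math.floor(n / 10)] + (rest ? "-" + ONES[rest] : "");
  }
  if (n < 1000) {
    const rest = n % 100;
    return ONES[Math.floor(n / 100)] + " hundred" + (rest ? " " + spellNumber(rest) : "");
  }
  const rest = n % 1000;
  return spellNumber(Math.floor(n / 1000)) + " thousand" + (rest ? " " + spellNumber(rest) : "");
}

// Applies the three rules named above: no em dashes, no symbols, no numerals.
function normalizeForSpeech(script: string): string {
  return script
    .replace(/\s*\u2014\s*/g, ", ")                                // em dash becomes a pause
    .replace(/\$(\d{1,6})\b/g, (_m, d: string) => d + " dollars")  // "$250" -> "250 dollars"
    .replace(/(\d)\s*%/g, "$1 percent")                            // "12%" -> "12 percent"
    .replace(/\s*&\s*/g, " and ")
    .replace(/\b\d{1,6}\b/g, (d) => spellNumber(Number(d)))        // numerals last
    .replace(/\s{2,}/g, " ")
    .trim();
}
```

Numerals are spelled last so that the symbol rules can still see the digits they attach to. Decimals, clock times, and numbers longer than six digits are outside this sketch and would need their own rules.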

Can I contribute?

Yes — the meeting pipeline is open source at paul-codes-1/fuzzy-potato, and we accept tips, corrections, and pitches at [email protected]. If you’d like to submit a shoutout (birthday, anniversary, neighborhood milestone) for HOST_B to read on air during the morning briefing, there’s a form linked from the footer.

This is pretty cool. Can we hire Paul?

Paul M. Oliva — Senior Full Stack Engineer

Yes — he is open to new opportunities. Paul built the paper, LexBot, the meeting pipeline, and most of the civic-data plumbing on this site on top of a full-time engineering job — shipping features end-to-end across a Next.js frontend and a Node/TypeScript API for a high-volume ticketing platform.

What you’re hiring, in short:

  • Frontend: React, Next.js, TypeScript, Redux, Tailwind
  • Backend: Node.js, Express, REST, GraphQL, tRPC, FastAPI
  • AI / LLM: OpenAI, Claude, RAG, ChromaDB, Tesseract OCR
  • Cloud: AWS (Lambda, S3, RDS, SNS/SQS), Docker, GitHub Actions
  • Data: PostgreSQL, MySQL, DynamoDB, MongoDB, Redis
  • Payments: Adyen, Stripe, PayPal
  • Observability: Datadog, Sentry, CloudWatch
pauloliva.com

The code

Public data begets public code. Much of the infrastructure powering this newsroom is open source and happily borrowed from:

  • WordPress + wp-rss-aggregator for the front-of-house
  • Express + simplexml for the feeds engine
  • MapLibre GL JS + OpenFreeMap tiles for every civic map
  • Puppeteer + sharp for server-rendered maps & OG images
  • OBS Studio + obs-websocket for broadcast control
  • pino, node-cron, axios, fast-xml-parser, cheerio
  • Anthropic & ElevenLabs for the AI pieces

Back to the paper: lexingtonky.news. Or read the editorial stance: about the paper.

Founded & published by