A field guide to The Lexington Times
A 24/7 livestream. A searchable AI transcript archive. A real-time map of LFD calls, flood gauges, and crime reports. All glued together by open-source code, public data, and one very opinionated ops panel.
What you’re looking at
The Lexington Times is three products pretending to be one paper. Here’s what each one does and where it runs.
The paper
lexingtonky.news
WordPress. Aggregation via wp-rss-aggregator, original reporting, commentary, meeting agendas, SEO via Yoast, Jetpack stats. The human-readable front door.
The feeds engine
feeds.lexingtonky.news
Node + Express. Scrapes public sources, rewrites with Claude, stores Articles as JSON, exposes RSS, sitemaps, OG images, full-text search, and an /api/match endpoint so the paper can cross-link transcripts.
The livestream
youtube.com/@TheLexingtonTimes
Orchestrator + OBS + dashboard. 24/7 civic radio with ambient scanner audio, rotating traffic cams, real-time LFD incidents, NWS alerts, and AI-voiced briefs. Runs on a Lightsail box.
The stack
Public data sources
Everything below is either public record, a licensed re-publisher, or data we generate ourselves. No scraping of paid sources. No social-media private APIs. No PII.
The meeting pipeline
LFUCG publishes Council + commission meetings on Granicus with scanned agenda PDFs and auto-generated captions. That’s where our civic coverage starts — but the raw feed isn’t something you can skim. A dedicated Python pipeline turns each clip into something searchable.
- extracted_facts.json: pure structured extraction, no narrative.
- [timestamp: MM:SS] markers for clickable video seeking.
The whole thing is open source: code, docs, and the Granicus reverse-engineering notes (granicus.md) that made timestamp deep-linking possible.
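The `[timestamp: MM:SS]` markers are what make a recap clickable: each one has to become a second offset a video player can seek to. Below is a minimal Python sketch of that parsing step. It assumes markers look exactly like `[timestamp: 12:34]`. It also accepts, as an assumption, minute values above 59 and an `H:MM:SS` variant for long meetings. `extract_timestamps` is a hypothetical name. Building the actual Granicus deep link is not shown, because the URL parameters live in granicus.md and are not reproduced on this page.

```python
import re
from typing import List, Tuple

# Matches "[timestamp: 12:34]" and, as an assumption, "[timestamp: 1:02:03]".
MARKER = re.compile(r"\[timestamp:\s*(\d+:)?(\d+):(\d{2})\]")


def extract_timestamps(recap: str) -> List[Tuple[int, str]]:
    """Return (offset_seconds, text) pairs for each timestamp marker in a recap."""
    matches = list(MARKER.finditer(recap))
    results = []
    for i, m in enumerate(matches):
        hours = int(m.group(1)[:-1]) if m.group(1) else 0  # drop trailing ":"
        minutes, seconds = int(m.group(2)), int(m.group(3))
        offset = hours * 3600 + minutes * 60 + seconds
        # Text between this marker and the next one belongs to this timestamp.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(recap)
        results.append((offset, recap[m.end():end].strip()))
    return results
```

For example, `"[timestamp: 12:34] Budget amendment passes."` yields `(754, "Budget amendment passes.")`, and the player seeks to second 754.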
What LexBot reads on air
Every voice segment is a written-for-radio script generated by Claude, spoken by ElevenLabs, and logged to the transcript archive. The cadence is deliberately unhurried — it’s a civic radio feed, not TikTok.
Guardrails
- AI disclosure — every transcript includes a human-readable note that the segment was generated by LexBot, which model wrote it, and when it aired.
- Fair use — aggregated headlines carry title + ≤200-char summary, with the canonical source link. Voice scripts paraphrase and attribute; they never quote.
- No hallucinations on facts: Ask Lex and breaking briefs run through Anthropic’s web_search_20250305 tool, geo-biased to Lexington, KY.
- Circuit breakers: every poller opens after 3 consecutive failures and stays open 5 minutes before retrying. No silent infinite retries.
- Dwell + cooldown — scene-switch safety constants prevent the stream from rapidly flipping between cams / alerts / ads in ways that would look unstable on air.
- Fourth wall — hosts never say “I don’t have that in front of me,” “our data,” or “checking our feeds.” If information is thin, we pivot to what a local would know.
- Crime blotter is aggregated — counts per council district and crime type. We don’t name individuals or addresses, even when the public data set would allow it.
- YouTube quota safety — live-title and chat polling are rate-tuned to stay under the 10k-unit/day Data API cap. Never auto-publishes without the stream-key gate.
- Human in the loop for Shorts — highlight clips go through a two-step unlisted-upload → manual review → publish flow.
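The dwell + cooldown guardrail can be pictured as a small gate the orchestrator consults before every scene switch. The gate enforces two rules. The scene on air must stay up for a minimum dwell, and a scene that just left the air cannot return until its cooldown expires. This is a minimal Python illustration, not the production code. The class name `SceneGate`, the 20-second dwell and the 60-second cooldown are hypothetical values chosen for the example. Time is passed in explicitly so the logic is easy to test.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SceneGate:
    """Decides whether a scene switch is allowed at time `now` (seconds).

    dwell_s: minimum time the current scene stays on air (hypothetical value).
    cooldown_s: minimum time before a scene that left the air may return.
    """

    dwell_s: float = 20.0
    cooldown_s: float = 60.0
    current: Optional[str] = None
    switched_at: float = float("-inf")
    last_left: Dict[str, float] = field(default_factory=dict)

    def can_switch(self, scene: str, now: float) -> bool:
        if scene == self.current:
            return False  # already on air
        if now - self.switched_at < self.dwell_s:
            return False  # current scene has not dwelled long enough
        left = self.last_left.get(scene)
        if left is not None and now - left < self.cooldown_s:
            return False  # target scene is still cooling down
        return True

    def switch(self, scene: str, now: float) -> bool:
        """Perform the switch if allowed; return True when it happened."""
        if not self.can_switch(scene, now):
            return False
        if self.current is not None:
            self.last_left[self.current] = now  # start the outgoing scene's cooldown
        self.current, self.switched_at = scene, now
        return True
```

The gate rejects rapid flips instead of queueing them. Whatever requested the switch can simply ask again on its next tick.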
Humans in the loop
There is a human behind all of this. Most of it was built, is operated, and is continuously debugged by Paul Oliva, with help from friends and open-source contributors. There is an ops control panel on :3008 with one-click overrides for:
- skipping / pinning a segment on the podcast rotation
- forcing a specific camera in the 2×2 tile mode
- injecting weather alerts, breaking headlines, and incidents for testing
- rotating the Facebook broadcast token, cutting over to a new live_video
- approving viewer-submitted milestones + shoutouts
- pinning a civic graphic on-air for a fixed dwell
The paper will only ever be as opinionated as the humans holding the ops panel.
Questions we get
Is this really 24/7?
Yes. The orchestrator runs on a Lightsail box with a systemd unit that restarts on any crash, a circuit breaker on every external source (3 failures → 5-minute open), and pre-rendered scanner ambience loops for the 01:00–05:00 ET overnight window so we never hit dead air. Brief outages of a minute or so can happen when we redeploy; longer gaps are rare, and we notice when they do.
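The breaker rule follows the standard circuit-breaker pattern: 3 consecutive failures, then stay open for 5 minutes. Below is a minimal Python sketch of that behavior. It assumes a closed/open/half-open state machine. After the open window, one trial call goes through. If that call fails, the breaker reopens; if it succeeds, the breaker closes. This is an illustration, not the orchestrator's actual code. `CircuitBreaker` and `CircuitOpenError` are hypothetical names, and the clock is injectable so the 5-minute window can be tested without waiting.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a poll is skipped because the breaker is open."""


class CircuitBreaker:
    def __init__(self, max_failures: int = 3, open_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures  # consecutive failures before opening
        self.open_seconds = open_seconds  # how long to stay open (5 minutes)
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.open_seconds:
                raise CircuitOpenError("source is cooling off; skipping poll")
            # Window elapsed: half-open, let one trial call through.
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()  # (re)open and restart the window
            raise
        # Any success closes the breaker and resets the failure streak.
        self.failures = 0
        self.opened_at = None
        return result
```

Raising a distinct `CircuitOpenError` lets the caller tell "source is down" apart from "we chose not to ask". That matters for the "no silent infinite retry" rule, because skipped polls are still visible in logs.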
How do you stop the AI from making things up?
Three layers. First: every voice segment is a script that gets normalized through a fixed set of writing rules (no numerals, no symbols, no em dashes) before synthesis — so the model can’t improvise outside the brief. Second: questions that need fresh information route through Anthropic’s web_search tool with geo-bias to Lexington, KY, so “when did that park open” pulls a real source rather than a guess. Third: meeting recaps are grounded in extracted facts from the transcript pipeline — if it isn’t in the clip, it’s not in the recap. Scripts also have a fourth-wall rule: hosts never say “I don’t have that” — they pivot to what a Lexington local would actually know.
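The first layer applies fixed writing rules before synthesis: no numerals, no symbols, no em dashes. It can be approximated with a few string rewrites before a script reaches the voice engine. Below is a minimal Python sketch of that idea. The symbol table, the choice to turn em dashes into comma pauses, and the name `normalize_for_radio` are assumptions for illustration, since the actual rule set is not published on this page. The sketch spells out whole numbers below one million only. Decimals and larger values are left alone.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
# Hypothetical symbol table; a real rule set would be longer.
SYMBOLS = {"%": " percent", "&": " and ", "+": " plus ", "@": " at "}
NUMBER = re.compile(r"\d{1,3}(?:,\d{3})+|\d+")  # 12,000 or 12000


def number_to_words(n: int) -> str:
    """Spell out a whole number 0 <= n < 1,000,000 in US English."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, rest = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[rest] if rest else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        return ONES[hundreds] + " hundred" + (" " + number_to_words(rest) if rest else "")
    thousands, rest = divmod(n, 1000)
    return number_to_words(thousands) + " thousand" + (" " + number_to_words(rest) if rest else "")


def _spell(match: re.Match) -> str:
    n = int(match.group().replace(",", ""))
    # Out-of-range values are left untouched rather than mis-spelled.
    return number_to_words(n) if n < 1_000_000 else match.group()


def normalize_for_radio(script: str) -> str:
    text = re.sub(r"\s*\u2014\s*", ", ", script)  # em dash -> comma pause
    for symbol, word in SYMBOLS.items():
        text = text.replace(symbol, word)
    text = NUMBER.sub(_spell, text)  # whole numbers only; decimals not handled
    return re.sub(r"\s+", " ", text).strip()  # collapse doubled spaces
```

Running `normalize_for_radio("Rain chances hit 40% \u2014 bring an umbrella.")` gives `"Rain chances hit forty percent, bring an umbrella."`.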
Can I contribute?
Yes — the meeting pipeline is open source at paul-codes-1/fuzzy-potato, and we accept tips, corrections, and pitches at [email protected]. If you’d like to submit a shoutout (birthday, anniversary, neighborhood milestone) for HOST_B to read on air during the morning briefing, there’s a form linked from the footer.
This is pretty cool. Can we hire Paul?
Paul M. Oliva — Senior Full Stack Engineer
Yes — he is open to new opportunities. Paul built the paper, LexBot, the meeting pipeline, and most of the civic-data plumbing on this site on top of a full-time engineering job — shipping features end-to-end across a Next.js frontend and a Node/TypeScript API for a high-volume ticketing platform.
What you’re hiring, in short:
- Frontend: React, Next.js, TypeScript, Redux, Tailwind
- Backend: Node.js, Express, REST, GraphQL, tRPC, FastAPI
- AI / LLM: OpenAI, Claude, RAG, ChromaDB, Tesseract OCR
- Cloud: AWS (Lambda, S3, RDS, SNS/SQS), Docker, GitHub Actions
- Data: PostgreSQL, MySQL, DynamoDB, MongoDB, Redis
- Payments: Adyen, Stripe, PayPal
- Observability: Datadog, Sentry, CloudWatch
The code
Public data begets public code. Much of the infrastructure powering this newsroom is open source and happily borrowed from:
- WordPress + wp-rss-aggregator for the front-of-house
- Express + simplexml for the feeds engine
- MapLibre GL JS + OpenFreeMap tiles for every civic map
- Puppeteer + sharp for server-rendered maps & OG images
- OBS Studio + obs-websocket for broadcast control
- pino, node-cron, axios, fast-xml-parser, cheerio
- Anthropic & ElevenLabs for the AI pieces
Back to the paper: lexingtonky.news. Or read the editorial stance: about the paper.