Show HN: Full-text search engine for Epstein docs (OCR and OpenSearch) https://ift.tt/sqn78RC

Hi HN,

Like many people, I was frustrated that the released Epstein/Maxwell court documents were mostly scanned images (PDFs) with no text layer. This made them impossible to Ctrl+F or analyze programmatically. I built a pipeline to fix this using Python, Tesseract, and OpenSearch.

The Site: https://ift.tt/9FDpiqT

The Stack:
- Ingestion: Python workers using ocrmypdf (Tesseract) to perform parallel OCR on raw files.
- Search: OpenSearch for indexing the extracted text.
- Frontend: Next.js (SSR) for the UI.
- Infrastructure: Self-hosted Docker swarm.

Features:
- Sub-second full-text search across all files.
- Highlights search terms directly on the PDF page.
- Deep linking to specific pages/documents.

This is a transparency tool, not a political one. I wanted to make the raw primary sources accessible to researchers and journalists.

Feedback on the search relevance or indexing pipeline is welcome!

December 24, 2025 at 01:27AM
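
For readers wondering how the ingestion and search steps described above might fit together, here is a minimal sketch in Python. It is not the site's actual code. The index name, field names, ID scheme, and all four helper functions are hypothetical, and it assumes one OpenSearch document per PDF page (so results can deep link to a page) and that the ocrmypdf sidecar text file separates pages with form feed characters. It only builds the command and the request bodies; running OCR and sending the requests are left to subprocess and an OpenSearch client.

```python
import json
from pathlib import Path

INDEX = "court-docs-pages"  # hypothetical index name


def ocr_command(src: Path, dst: Path, sidecar: Path, jobs: int = 4) -> list[str]:
    """Build an ocrmypdf command line; run it with subprocess.run(cmd, check=True)."""
    return [
        "ocrmypdf",
        "--skip-text",              # skip pages that already have a text layer
        "--jobs", str(jobs),        # number of parallel OCR workers per file
        "--sidecar", str(sidecar),  # also write the recognized text to a .txt file
        str(src),
        str(dst),
    ]


def pages_from_sidecar(text: str) -> list[str]:
    """Split sidecar text into pages (assumes form feed page separators)."""
    pages = text.split("\f")
    # Drop only a trailing empty chunk so page numbers stay aligned with the PDF.
    if pages and not pages[-1].strip():
        pages.pop()
    return pages


def bulk_lines(doc_id: str, pages: list[str]) -> str:
    """Build an NDJSON body for the _bulk API, one document per page."""
    lines = []
    for page_no, text in enumerate(pages, start=1):
        action = {"index": {"_index": INDEX, "_id": f"{doc_id}-p{page_no}"}}
        source = {"doc_id": doc_id, "page": page_no, "text": text}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""


def search_body(query: str, size: int = 10) -> dict:
    """Build a full-text query that also returns highlighted snippets."""
    return {
        "size": size,
        "query": {"match": {"text": {"query": query, "operator": "and"}}},
        "highlight": {"fields": {"text": {}}},
        "_source": ["doc_id", "page"],
    }
```

Indexing per page rather than per file keeps each hit small and gives the frontend a `doc_id` and `page` pair to link to. The trade-off is that phrases spanning a page break will not match.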
