Show HN: Full-text search engine for Epstein docs (OCR and OpenSearch) https://ift.tt/sqn78RC

December 23, 2025


Hi HN,

Like many people, I was frustrated that the released Epstein/Maxwell court documents were mostly scanned images (PDFs) with no text layer. This made them impossible to Ctrl+F or analyze programmatically. I built a pipeline to fix this using Python, Tesseract, and OpenSearch.

The Site: https://ift.tt/9FDpiqT

The Stack:
- Ingestion: Python workers using ocrmypdf (Tesseract) to perform parallel OCR on raw files.
- Search: OpenSearch for indexing the extracted text.
- Frontend: Next.js (SSR) for the UI.
- Infrastructure: Self-hosted Docker swarm.

Features:
- Sub-second full-text search across all files.
- Highlights search terms directly on the PDF page.
- Deep linking to specific pages/documents.

This is a transparency tool, not a political one. I wanted to make the raw primary sources accessible to researchers and journalists.

Feedback on the search relevance or indexing pipeline is welcome!

December 24, 2025 at 01:27AM
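For readers curious how the pieces of the stack might fit together, here is a minimal sketch of the OCR-to-index path. It is not the author's code. The index name `court-docs`, the field names `doc_id`, `page`, and `text`, the per-page document layout, and all function names are hypothetical. It also assumes the ocrmypdf sidecar text file separates pages with form feed characters. The code only builds the command line and the request bodies, so it runs without Tesseract or an OpenSearch cluster.

```python
import json

INDEX = "court-docs"  # hypothetical index name, not taken from the post


def ocr_command(src, dst, sidecar, jobs=4):
    """Build an ocrmypdf command line (the OCR itself is not run here).

    --skip-text leaves pages that already have a text layer untouched,
    --jobs sets how many pages are processed in parallel,
    --sidecar writes the recognized text to a plain-text file.
    """
    return ["ocrmypdf", "--skip-text", "--jobs", str(jobs),
            "--sidecar", sidecar, src, dst]


def split_pages(sidecar_text):
    """Split sidecar text into pages (assumes form feed page separators)."""
    return [page.strip() for page in sidecar_text.split("\f")]


def bulk_payload(doc_id, pages, index=INDEX):
    """Build an NDJSON body for the OpenSearch _bulk endpoint.

    One search document per PDF page, so that a hit can carry the page
    number needed for a deep link. Empty pages are skipped.
    """
    lines = []
    for number, text in enumerate(pages, start=1):
        if not text:
            continue
        action = {"index": {"_index": index, "_id": f"{doc_id}-p{number}"}}
        source = {"doc_id": doc_id, "page": number, "text": text}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # the bulk API expects a trailing newline


def search_body(query, size=10):
    """Build a full-text query that also asks for highlighted fragments."""
    return {
        "query": {"match": {"text": query}},
        "highlight": {"fields": {"text": {}}},
        "size": size,
    }


if __name__ == "__main__":
    pages = split_pages("first page\fsecond page\f")
    print(" ".join(ocr_command("in.pdf", "out.pdf", "out.txt")))
    print(bulk_payload("doc1", pages), end="")
    print(json.dumps(search_body("flight log")))
```

The string returned by `bulk_payload` could be sent to the `_bulk` endpoint with the `application/x-ndjson` content type, and the dictionary from `search_body` to `_search` on the same index. Indexing one document per page is only one possible design; it trades a larger index for hits that map directly to a page.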
