Skip to content

Document Processing & Media

PDFs, Word docs, Excel, OCR, image manipulation, video — every non-text input pipeline. The portfolio's heaviest user is GoGreen-DOC-AI; GoGreen-SmartForms and OpenSentinel are close runners-up.

PDF

PyMuPDF (fitz)

What it is. Python wrapper around MuPDF — fastest pure-Python PDF reader/writer. Handles text extraction, image extraction, page-level rendering.

Used in: GoGreen-DOC-AI, GoGreen-SmartForms.

pdf-parse (Node)

Lightweight Node PDF text extractor. Used in: OpenSentinel, Realestate-all-docker, Recruiting_AI.

PDFKit (Node)

PDF generation library. Used in: MyPollingApp (poll-result PDFs), OpenSentinel (report PDFs).

react-pdf

Render PDFs in the browser. Used in: GoGreen-DOC-AI, GoGreen-SmartForms (in-app document viewing).

jsPDF

Client-side PDF generation. Used in: TimeSheetAI (timesheet exports).

Dompdf (PHP)

Render HTML to PDF. Used in: Voting_NewAndImproved (ballots, results PDFs).

ReportLab (Python)

Programmatic PDF generation — invoices, reports, structured documents. Used in: Automotive-Repair-Diagnosis-AI (diagnostic reports, invoices, estimates).

LlamaParse / Unstructured.io (not currently used)

SaaS PDF-to-LLM-text services. The portfolio uses Docling + PyMuPDF instead, self-hosted.


Office documents

Mammoth

DOCX → HTML / plain text. Used in: Boomer_AI, NaggingWifeAI, OpenSentinel, Recruiting_AI.

openpyxl (Python) / ExcelJS / XLSX (Node) / PhpOffice (PHP)

Read and write XLSX. Used in: - openpyxl — GoGreen-DOC-AI. - XLSX — Boomer_AI, Ecom-Sales, GogreenSellerAI, NaggingWifeAI, Recruiting_AI, TimeSheetAI. - ExcelJS — OpenSentinel. - PhpOffice — Voting_NewAndImproved.


OCR

Tesseract / Tesseract.js / pytesseract

What it is. Google-maintained OCR engine. The default open-source OCR.

Used in: - GoGreen-DOC-AI — Tesseract via Docling pipeline. - GoGreen-SmartForms — Tesseract for form scans (paired with OpenCV preprocessing). - OpenSentinel — Tesseract.js for browser-side OCR. - Tutor_AI-Docker — Tesseract.js (homework photo → text).

Pattern. Image preprocessing (deskew, threshold, denoise via OpenCV/Pillow) → Tesseract → optional LLM cleanup pass.

Docling

What it is. IBM-released document-AI library — converts PDFs, Word, PowerPoint, images into clean structured Markdown/JSON. Bundles its own OCR + layout analysis. Faster and more accurate than Tesseract on document-shaped inputs.

Used in: GoGreen-DOC-AI (the heavy doc-ingestion app).


Image processing

Pillow (Python)

Used in: Automotive-Repair-Diagnosis-AI, GoGreen-DOC-AI, GoGreen-SmartForms.

OpenCV

Used in: Automotive-Repair-Diagnosis-AI (vision diagnostics), GoGreen-SmartForms (form preprocessing for OCR).

Sharp (Node)

What it is. Native libvips-backed image processor. Resize, crop, format-convert, animated GIF — extremely fast.

Used in: Ecom-Sales, GogreenSellerAI (product imagery pipeline).

Jimp (Node)

Pure-JS image library — slower than Sharp but no native deps. Used in: Realestate-all-docker.

face-api.js

What it is. Browser-side face detection / recognition (TensorFlow.js wrapper).

Used in: TimeSheetAI (face-based clock-in / verification).


Barcode / QR

pyzbar

QR / barcode detection. Used in: GoGreen-SmartForms.

QRCode generation

  • qrcode (Node) — MyPollingApp, GoGreenPaperlessInitiative, TimeSheetAI.
  • qrcode (Python) — GoGreen apps.

ua-parser-js

User-agent string → device info. Used in: MyPollingApp (device detection for poll metadata).

geoip-lite

IP → geolocation. Used in: MyPollingApp.


Video / audio

FFmpeg

What it is. The Swiss-army-knife video/audio toolkit. Used as a child-process from Node.

Used in: GoGreenMarketing (video processing for social-media campaigns).

Whisper (OpenAI API)

Speech-to-text. Used in: FamilyChat, Ecom-Sales, MangyDogCoffee, NaggingWifeAI, Salon, SCO, SellMeACar, SellMe_PRT, Tutor_AI, plus the Jarvis apps for local Whisper.

See LLM-Providers.md for the realtime variant.

ElevenLabs

TTS. See LLM-Providers.md.


Math / diagrams (frontend rendering)

KaTeX

What it is. Fast LaTeX math typesetting — server-side or client-side. Used in: Tutor_AI (math homework rendering).

Mermaid

What it is. Markdown-flavored diagram syntax → SVG. Flowcharts, sequence, class, ER diagrams.

Used in: Tutor_AI.


Specialty extractors

Crawlee

What it is. Web-scraping framework — Apify's open-source crawler. Manages the queue, browser pool, retries, proxies. Works with Cheerio (HTML parsing) or Playwright (full browser).

Used in: GoGreen-AI-Concierge (CheerioCrawler — knowledge ingestion), GoGreenMarketing (SEO crawler worker).

langdetect

Language identification on text. Used in: GoGreen-SmartForms.

feedparser (Python)

RSS / Atom parser. Used in: PolyMarketAI (news ingest).

NewsAPI

News-headline aggregation API. Used in: PolyMarketAI.

NHTSA APIs

Free US-government vehicle data — VIN decode, recalls, TSBs (technical service bulletins). Used in: Automotive-Repair-Diagnosis-AI.

python-obd

Python interface to OBD-II vehicle diagnostic ports (over Bluetooth/USB ELM327 adapters). Used in: Automotive-Repair-Diagnosis-AI.

Brave Search API

Privacy-preserving web search. Used in: AI-Wordpress (web augmentation), Automotive (repair-procedure lookup).


Comparison alternatives (not used)

Tool Notes
Apache Tika Universal text extraction (Java). PyMuPDF + Mammoth + openpyxl cover the same surface in Python.
AWS Textract / GCP Document AI Managed OCR + layout. Docling is the open-source alternative we use.
AWS Rekognition Image labeling / face detection. Not used.
PaddleOCR / EasyOCR Tesseract competitors. Tesseract is "good enough" with Docling on top.

Decision guide

Need to extract text from a PDF?              PyMuPDF (Python) or pdf-parse (Node).
Need a doc → clean Markdown pipeline?         Docling (best DX for AI ingestion).
Need OCR on a scanned form?                   Tesseract + OpenCV preprocessing.
Need OCR in the browser?                      Tesseract.js.
Need to generate a PDF report?                ReportLab (Python), PDFKit (Node), Dompdf (PHP).
Need to resize / format-convert images?       Sharp (server) or Pillow (Python).
Need browser face detection?                  face-api.js.
Need to crawl a customer's site?              Crawlee.
Need video processing?                        FFmpeg.
Need math rendering in the UI?                KaTeX.
Need diagrams in the UI?                      Mermaid.