TB-19 — Blackhouse Gmail Sent Folder Scrape
Summary
Scrape the sent folder of dustin@theblkhse.com for outbound PDF attachments: final cert deliveries, forwarded permits, and combined permit+plan PDFs sent to clients and jurisdictions. The original scrape (TB-08) searched only received emails, so outbound attachments were not covered. Result: 3,777 PDFs downloaded, 88% match rate, 517 files linked to projects.
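As a rough illustration of how a sent-folder search like this could be driven, the sketch below pages through message IDs using the Gmail API's users.messages.list call. The query string, the function name iter_sent_message_ids, and the assumption that an authorized google-api-python-client service object is passed in are all illustrative; the actual TB-19 script may filter or authenticate differently.

```python
# Hypothetical sketch: list sent messages that carry PDF attachments.
# SENT_PDF_QUERY is an assumed search string built from standard Gmail
# search operators; it is not taken from the TB-19 code.
SENT_PDF_QUERY = "in:sent has:attachment filename:pdf"


def iter_sent_message_ids(service, query=SENT_PDF_QUERY, user_id="me"):
    """Yield message IDs matching `query`, following nextPageToken.

    `service` is assumed to be an authorized Gmail API client
    (for example, one built from the BLKHSE_GMAIL_REFRESH_TOKEN credentials).
    """
    page_token = None
    while True:
        response = (
            service.users()
            .messages()
            .list(userId=user_id, q=query, pageToken=page_token)
            .execute()
        )
        # A page with no matches omits the "messages" key entirely.
        for message in response.get("messages", []):
            yield message["id"]
        page_token = response.get("nextPageToken")
        if not page_token:
            break
```

Each yielded ID would then be fetched individually to download its PDF attachments; that step is omitted here.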
What it produced
- 3,777 PDFs downloaded to /data/blkhse-archive/sent/
- _sent_match_report.json with 88% match rate
- 517 files linked into /data/projects/ folder structure
- SHA-256 deduplication against existing archive files
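The SHA-256 deduplication step can be sketched roughly as follows. This is a minimal illustration, not the TB-19 implementation: the function names sha256_of and split_new_and_duplicate are hypothetical, and it assumes the hashes of existing archive files are already available as a set of hex digests.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def split_new_and_duplicate(candidates, known_hashes):
    """Partition candidate paths into (new_files, duplicates).

    `known_hashes` is assumed to hold digests of files already in the
    archive. A candidate is also a duplicate if an earlier candidate
    in the same run had identical content.
    """
    seen = set(known_hashes)
    new_files, duplicates = [], []
    for path in candidates:
        file_hash = sha256_of(path)
        if file_hash in seen:
            duplicates.append(path)
        else:
            seen.add(file_hash)
            new_files.append(path)
    return new_files, duplicates
```

Hashing file content rather than comparing filenames means the same PDF sent under different attachment names is still detected as a duplicate.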
Connections
- depends on: TB-08b-historical-file-linkage — reuses linkage patterns and classification logic
- depends on: Google-Services — Blackhouse Gmail OAuth2 (BLKHSE_GMAIL_REFRESH_TOKEN)
- produced: outbound cert/permit/plan PDFs linked to projects
- feeds into: TB-17-archive-backfill — newly linked files available for parsing