For Mac  ·  Apple Silicon  ·  Free Trial

Stop rewatching.
Start learning.

Harvestry transcribes your videos (lecture recordings, online courses, tutorials, conference talks), captures the key screenshots, and exports a polished, interactive study document, entirely on your Mac and entirely in private.

Requires macOS 15 or later  ·  Apple Silicon

Harvestry — knowledge harvested
Harvestry app showing the Artemis II Flight Day 9 video processed into a transcript with screenshots
How it works

Import a video. Get a document.

You can rewatch a video. You can't search it. You can't highlight the important parts, annotate a timestamp, or find that one slide you half-remember without scrubbing back through everything. Harvestry converts any video — lecture, tutorial, talk, course — into a structured study document. Read it, search it, mark it up. Keep it forever.

1 — Import
Drop in any video

Paste a URL from YouTube, Vimeo, or any of a thousand platforms — or open a local file. Harvestry downloads and prepares it automatically.

2 — Process
Transcribed and captured — together

Every spoken word is transcribed on the Apple Neural Engine. Key frames are captured at the same time on the GPU. The two stages run in parallel and usually finish in just a few minutes.

3 — Study
Beautiful HTML export

A self-contained study page that opens in any web browser — on your Mac, iPhone, or any device. Live word highlighting, text annotations, audio sync, and dark mode built in. No internet required.

Transcription

Research-grade accuracy.
Zero cloud fees.

Harvestry uses WhisperKit — Argmax's optimised framework for running OpenAI's Whisper on Apple Silicon. Every word is processed on the Apple Neural Engine, the dedicated ML accelerator in every M-series chip. No API key. No audio upload. No subscription.

Five model sizes let you dial in exactly the right tradeoff for your video. Tiny finishes a 90-minute video in minutes. Large Turbo captures technical vocabulary, proper nouns, and accented speech that smaller models miss.

5 model sizes  ·  100% on-device  ·  5 languages
Harvestry Settings — Transcription Model panel showing five Whisper model sizes

"All American: The Power of Sports" by U.S. National Archives, licensed under CC BY 3.0

Harvestry transcript panel showing captured screenshots alongside timestamped passages
Screenshot Capture

Every slide. Once. The best frame.

Not every frame deserves a screenshot — and the same slide should never be captured twice. Harvestry runs a multi-stage pipeline — Apple Vision shape detection, accurate on-device OCR, and perceptual fingerprinting — to surface every distinct slide worth keeping and quietly drop the near-duplicates a simpler tool would flood you with. Every stage runs on your Mac.

  • Powered by Apple Vision. On-device machine learning scans each frame for text structure. A presentation slide, whiteboard, map, or labelled diagram qualifies. A presenter's logo-print shirt, a news ticker, or a lower-third caption does not.
  • Reads every slide, then de-duplicates. Harvestry runs accurate OCR on each candidate and compares the actual words. A new headline is a new slide, and it's kept. The same words are a repeat, and it's skipped. Two dark slides that look identical to a pixel comparison are still told apart by their text.
  • Knows the slide from the speaker. Talking-head and b-roll shots that differ only by a gesture are recognised by a perceptual fingerprint and collapsed to a single representative, so you get the slides, not fifty near-identical frames of the presenter.
  • Best-frame selection. Catching the transition is only half the job. Harvestry scans forward from the cut, scores a short window of frames for sharpness, and decodes only the winner. Slides that fade or animate in are captured fully settled, never mid-entrance.
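For the technically curious, the de-duplication idea above can be sketched in a few lines of Python. This is an illustration only, not Harvestry's actual code: the function names, the thresholds, and the simple average-hash fingerprint are all assumptions chosen for clarity, and the OCR words are taken as already extracted.

```python
from typing import List, Sequence, Set, Tuple

Gray = Sequence[Sequence[int]]   # small grayscale thumbnail, rows of 0-255 values
Frame = Tuple[Set[str], Gray]    # (words found by OCR, thumbnail)


def average_hash(thumb: Gray) -> int:
    """Fingerprint: one bit per pixel, set when the pixel is brighter than the mean."""
    pixels = [p for row in thumb for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bits that differ between two fingerprints."""
    return bin(a ^ b).count("1")


def word_overlap(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two word sets (1.0 means identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_duplicate(new: Frame, kept: Frame,
                 text_threshold: float = 0.8, max_bits: int = 5) -> bool:
    """Compare by text when either frame has any, else by fingerprint.

    Both thresholds are hypothetical; max_bits assumes 8x8 (64-bit) fingerprints.
    """
    new_words, new_thumb = new
    kept_words, kept_thumb = kept
    if new_words or kept_words:
        return word_overlap(new_words, kept_words) >= text_threshold
    return hamming(average_hash(new_thumb), average_hash(kept_thumb)) <= max_bits


def dedupe(frames: List[Frame]) -> List[int]:
    """Return the indices of frames worth keeping, in their original order."""
    kept: List[int] = []
    for i, frame in enumerate(frames):
        if not any(is_duplicate(frame, frames[k]) for k in kept):
            kept.append(i)
    return kept
```

Comparing text first is what lets two visually identical dark slides count as different, while frames with no text at all fall back to the fingerprint and collapse when only a few bits differ.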
Manual Curation

The algorithm picks.
You perfect it.

Automated capture is thorough, but you know your video better than any algorithm. Harvestry gives you a full suite of curation tools so you can take the result from good to exactly right — without rerunning any processing.

  • Full scrubber timeline. Navigate anywhere in the video by dragging. The current time and total duration are always visible.
  • Seek to Clear. Paused on a blurry frame? Press ◀ or ▶ to step automatically to the nearest sharp, clear frame in that direction.
  • Add or remove screenshots. Capture any frame at full resolution and insert it into the transcript with one press, or hover over any included screenshot and click Remove.
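One way to picture Seek to Clear is as a scan over per-frame sharpness scores. The sketch below is a guess at the general technique, not Harvestry's implementation: the Laplacian-variance score, the threshold, and every function name are assumptions. The same score could also rank a short window of frames after a cut to pick the best one.

```python
from typing import Optional, Sequence

Gray = Sequence[Sequence[int]]  # grayscale frame, rows of 0-255 values


def sharpness(frame: Gray) -> float:
    """Variance of a 4-neighbour Laplacian; crisper edges give a higher score."""
    h, w = len(frame), len(frame[0])
    responses = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            responses.append(
                frame[y - 1][x] + frame[y + 1][x]
                + frame[y][x - 1] + frame[y][x + 1]
                - 4 * frame[y][x]
            )
    if not responses:
        return 0.0
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


def seek_to_clear(scores: Sequence[float], current: int,
                  direction: int, threshold: float) -> Optional[int]:
    """Nearest frame index in the given direction (+1 or -1) scoring at least threshold."""
    if direction not in (-1, 1):
        raise ValueError("direction must be -1 or +1")
    i = current + direction
    while 0 <= i < len(scores):
        if scores[i] >= threshold:
            return i
        i += direction
    return None  # no clear frame before the start or end of the video


def best_in_window(scores: Sequence[float], start: int, window: int) -> int:
    """Index of the sharpest frame in scores[start:start + window]."""
    end = min(start + window, len(scores))
    return max(range(start, end), key=lambda i: scores[i])
```

Scoring once and scanning the scores keeps each button press cheap, since no frame needs to be decoded again just to decide where to land.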
Harvestry video player showing the scrubber timeline, Seek to Clear, and Add to Transcript controls
Annotated study document showing margin notes and highlights in the exported HTML
Your Annotated Library

Curate a library
worth coming back to.

Every video you process joins your library as a fully annotated study document — your highlights, margin notes, and thinking baked into the HTML, ready whenever you return. Not a folder of files. A curated collection that grows more valuable with every lecture you add.

  • Margin notes. Select any phrase and write a short note. In the export it appears in the margin alongside the amber-underlined phrase, with the look of handwritten marginalia on a printed page, baked in and visible to anyone who opens the file.
  • Highlights and inline notes. Four highlight colours and freeform inline notes dropped anywhere between passages. All embedded as standard HTML and visible the instant the file opens, in any browser.
  • Your annotations stay with it, always. Every highlight and note stays attached to its lecture. Open something you processed six months ago and your thinking is exactly where you left it.
  • Organised by you. Group lectures into folders: a whole course, a speaker's body of work, a topic you're going deep on. Your entire curriculum in one sidebar, synced silently across your Macs via iCloud.
LLM Consolidation

From transcript to insight.

A raw transcript is accurate. It isn't always elegant. Spoken content is conversational, not written — presenters repeat themselves, revise mid-sentence, and circle back. The optional consolidation stage sends your transcript through an LLM and produces a second, refined version of your study document: same facts, same structure, clearer writing.

Standard
Claude API

Routes your transcript through Anthropic's Claude API. Produces consistently high-quality, well-structured academic prose. Fast, with no local setup required beyond an API key.

⚠ Transcript text is sent to Anthropic over HTTPS. Use local Ollama instead if your video contains sensitive material.

Pro — Fully Local
Local Ollama

Connects to a locally running Ollama instance. Any compatible model — Llama, Mistral, Gemma, and others. Nothing leaves your Mac. No per-token cost. No internet required.

Requires a capable Mac and a few minutes of Ollama setup. Once running, it is completely private and free of any external dependency.
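To show what "nothing leaves your Mac" can look like in practice, here is a minimal sketch of a request to a locally running Ollama server. It uses Ollama's `/api/generate` endpoint on its default port. The prompt wording, the `llama3` model name, and the function names are assumptions for illustration, and Harvestry's own prompt and integration may differ.

```python
import json
import urllib.request

# Ollama's default local address; the request never leaves this machine.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(transcript: str, model: str = "llama3") -> urllib.request.Request:
    """Build a non-streaming generate request (the prompt text is hypothetical)."""
    prompt = (
        "Rewrite this lecture transcript as clear written prose. "
        "Keep every fact and the original structure.\n\n" + transcript
    )
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        OLLAMA_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def consolidate(transcript: str, model: str = "llama3") -> str:
    """Send the transcript to the local model and return the rewritten text."""
    with urllib.request.urlopen(build_request(transcript, model)) as resp:
        return json.loads(resp.read())["response"]
```

Calling `consolidate(...)` needs Ollama running with the chosen model already pulled. Swap `model` for whichever compatible model you have installed.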

The consolidated document flows into the same HTML export pipeline as the base transcript — same typography, same audio sync, same layout. Both versions are fully editable in Harvestry before export. With Harvestry Pro, you can also export either version — or both — straight to your Obsidian vault or any Markdown editor.

HTML Export — Opens in Your Browser

A document worth reading.

The export isn't a raw text dump. Harvestry generates a polished, fully self-contained HTML page you open directly in Safari, Chrome, or any browser — styled like a high-quality editorial publication, with live audio playback, per-word highlighting, and text markup built in.

Screenshot of the exported HTML study document in a browser
Live Word Highlight

As the audio plays, each word lights up in real time. Click any timestamp to jump to that moment.
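The lookup behind this kind of highlighting can be very small. The sketch below assumes each word carries a start and end time in seconds, and it uses Python as a stand-in for whatever script the exported page actually runs. The data shape and function name are hypothetical.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

Word = Tuple[float, float, str]  # (start_seconds, end_seconds, text), sorted by start


def active_word(words: List[Word], t: float) -> Optional[int]:
    """Index of the word being spoken at playback time t, or None during a pause."""
    starts = [w[0] for w in words]
    i = bisect_right(starts, t) - 1  # last word that starts at or before t
    if i >= 0 and t < words[i][1]:
        return i
    return None
```

A binary search keeps the lookup fast even for a transcript with tens of thousands of words, which matters when it runs on every playback tick.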

Slide-Aware Layout

Screenshots are classified at export time. Lecture slides with readable content render at full page width — legible without zooming. Presenter footage floats to the margin as a compact thumbnail, keeping the text front and centre.

Highlights & Marginalia

Four highlight colours and margin notes are created in the app and baked into the HTML — visible to anyone the moment the file opens, no interaction required.

Offline Forever

No CDN fonts, no remote scripts, no dependencies. Dark mode and font-size preferences are baked in and persist across reloads. Works in any browser, on any device — today or in ten years.

Pro

Obsidian & Markdown export. Harvestry Pro also exports your study document as a Markdown file — YAML frontmatter, per-screenshot assets, and a per-lecture folder structure that drops straight into your Obsidian vault or any Markdown editor. See what's in Pro →
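As a rough picture of what a Markdown export with YAML frontmatter looks like, here is a short sketch. The frontmatter keys, the `assets/` folder name, and the passage layout are assumptions made for illustration. The real export's fields and folder structure are not documented here.

```python
import json
from typing import List, Tuple

Passage = Tuple[str, str, str]  # (timestamp, text, screenshot filename or "")


def render_markdown(title: str, source_url: str, passages: List[Passage]) -> str:
    """Render a note with YAML frontmatter followed by timestamped passages."""
    lines = [
        "---",
        # json.dumps yields a double-quoted string, which YAML also accepts
        f"title: {json.dumps(title)}",
        f"source: {json.dumps(source_url)}",
        "---",
        "",
        f"# {title}",
        "",
    ]
    for timestamp, text, screenshot in passages:
        if screenshot:
            lines.append(f"![Screenshot at {timestamp}](assets/{screenshot})")
            lines.append("")
        lines.append(f"**{timestamp}** {text}")
        lines.append("")
    return "\n".join(lines)
```

Keeping screenshots as separate files referenced by relative path is what lets a note like this drop into a vault folder and render without further setup.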

Privacy

Almost everything stays on your Mac.

Transcription, screenshot capture, and HTML export all run locally. No audio, no video, and no images are ever transmitted anywhere. The core pipeline has no network dependencies beyond the video download itself.

The one optional exception: if you choose to use the Claude API for LLM consolidation, your transcript text is sent to Anthropic's servers over HTTPS. This step is always opt-in — the base processing pipeline is fully local whether or not you use it. The Pro tier supports local Ollama models instead, keeping even that step entirely on-device.

Transcription — Apple Neural Engine
WhisperKit processes audio entirely on-chip. No API calls. No uploads. Your spoken words stay on your device.
Screenshots — GPU, Hardware Decoder & Apple Vision
AVFoundation extracts frames using VideoToolbox. Apple Vision scans each thumbnail for text structure, reads the actual words via accurate on-device OCR, and fingerprints every frame so distinct slides are kept and near-duplicate shots of the speaker are dropped. No images, no analysis, and no results ever leave your machine.
Both run simultaneously
Because transcription and capture use different hardware, they run in parallel, so a 90-minute video moves through both stages concurrently instead of one after the other.
HTML Export — Fully local
Export generation reads from your disk and writes to your disk. The only network call is the optional yt-dlp video fetch.
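The parallel design described above can be sketched as two independent stages started together and joined at the end. This is an illustration with plain Python threads and caller-supplied stage functions. The names are hypothetical, and a native Mac app would use its own concurrency tools rather than this exact code.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")
S = TypeVar("S")


def process_video(path: str,
                  transcribe: Callable[[str], T],
                  capture: Callable[[str], S]) -> Tuple[T, S]:
    """Run transcription and screenshot capture side by side, then join the results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        transcript_job = pool.submit(transcribe, path)  # e.g. speech-to-text
        capture_job = pool.submit(capture, path)        # e.g. frame analysis
        # Each result() call blocks until its stage has finished.
        return transcript_job.result(), capture_job.result()
```

Because neither stage waits on the other, total time is roughly that of the slower stage rather than the sum of both.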
Localisation

Available in five languages.

Every label, prompt, status message, and tooltip is fully localised. Harvestry respects your macOS language preference automatically — no configuration required.

🇬🇧 English 🇩🇪 German 🇮🇹 Italian 🇪🇸 Spanish 🇫🇷 French

Ready to learn more from everything you watch?

Download Harvestry and turn your first video into a study document — free. No subscription, no account. Drop in a file or paste a URL and press Begin Processing.

Requires macOS 15 or later  ·  Apple Silicon  ·  Free trial included