For Mac  ·  Apple Silicon  ·  Free Trial

Stop rewatching.
Start studying.

Harvestry transcribes your videos (lecture recordings, online courses, tutorials, conference talks), captures the key screenshots, and exports a polished, interactive study document, entirely on your Mac and entirely in private.

Requires macOS 15 or later  ·  Apple Silicon

Harvestry — knowledge harvested
The Law of Gravitation — Lecture 1
Richard Feynman, Cornell University, 1964 — via Wikimedia Commons
Feynman Messenger Lectures — Lecture 1: The Law of Gravitation
Duration: 1:02:33  ·  File Size: 847 MB  ·  Resolution: 1920×1080
Pipeline
Transcription
Screenshot Capture
LLM Consolidation
Export Page
Completed
Transcript
47 passages · 2,341 words · 34 screenshots
16:31

The gravitational force between any two objects is proportional to the product of their masses and inversely as the square of the distance. F = Gm₁m₂ / r². Every object attracts every other object.

Screenshot — Feynman at blackboard
18:47

Kepler found the ellipse empirically from Brahe's data. Newton showed why it must be an ellipse. That's the difference between observation and understanding.

Note

Key: Kepler described the orbit, Newton explained it. Same pattern throughout physics.

21:04

The impressive thing is the universality. The same law — same equation, same constant G — works for the apple, the moon, Jupiter's moons, and the double stars.

24:18

Nature uses only the longest threads to weave her patterns, so each small piece of her fabric reveals the organization of the entire tapestry.

How it works

Import a video. Get a document.

You can rewatch a video. You can't search it. You can't highlight the important parts, annotate a timestamp, or find that one slide you half-remember without scrubbing back through everything. Harvestry converts any video — lecture, tutorial, talk, course — into a structured study document. Read it, search it, mark it up. Keep it forever.

1 — Import
Drop in any video

Paste a URL from YouTube, Vimeo, or any of a thousand platforms — or open a local file. Harvestry downloads and prepares it automatically.

2 — Process
Transcribed and captured — together

Every spoken word is transcribed on the Apple Neural Engine. Key frames are captured on the GPU at the same time. Both stages finish together, usually in just a few minutes.

3 — Study
Beautiful HTML export

A self-contained study page that opens in any web browser — on your Mac, iPhone, or any device. Live word highlighting, text annotations, audio sync, and dark mode built in. No internet required.

Transcription

Research-grade accuracy.
Zero cloud fees.

Harvestry uses WhisperKit — Argmax's optimised framework for running OpenAI's Whisper on Apple Silicon. Every word is processed on the Apple Neural Engine, the dedicated ML accelerator in every M-series chip. No API key. No audio upload. No subscription.

Five model sizes let you choose the right trade-off between speed and accuracy for each video. Tiny finishes a 90-minute video in minutes. Large Turbo captures technical vocabulary, proper nouns, and accented speech that smaller models miss.

5 model sizes  ·  100% on-device  ·  5 languages
Harvestry Settings — Transcription Model panel showing five Whisper model sizes
Harvestry transcript panel showing captured screenshots alongside timestamped passages
Screenshot Capture

Slides get the attention they deserve.

Not every frame deserves a screenshot. Harvestry uses two layers of Apple Vision — shape detection and on-device OCR — to find every slide worth capturing, including ones that would defeat a pixel-diff algorithm alone.

  • Powered by Apple Vision. On-device machine learning scans each frame for text structure without reading a word. A presentation slide, whiteboard, or labeled diagram qualifies. A presenter's logo-print shirt, a news ticker, or a lower-third caption does not.
  • Coverage-based detection. A frame triggers capture only when more than 20% of its pixels genuinely change. A presenter's webcam feed changes only 5–10% of the frame and never qualifies; a slide transition rewrites 60–90% and always does.
  • Best-frame selection. Detecting the transition is only half the job. Harvestry scans forward from the cut, scores a short window of frames for sharpness, and decodes only the winner. Slides that fade or animate in are captured fully settled, never mid-entrance.
  • Reads the words when pixels aren't enough. Two dark-background slides can look nearly identical to a pixel comparison. When that happens, Harvestry reads the text on each frame and measures how much the word sets overlap. A low Jaccard score means a different slide, and both get captured. All on-device.
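
For the curious, here is a minimal Python sketch of how the coverage test and the word-overlap fallback described above could fit together. It is an illustration, not Harvestry's actual code: frames are assumed to be flat lists of grayscale values, and the function names, the per-pixel `tolerance`, and the 0.5 overlap cut-off are all hypothetical. Only the 20% coverage threshold comes from the description above.

```python
def changed_fraction(prev, curr, tolerance=16):
    """Fraction of pixels whose grayscale value moved by more than `tolerance`.

    `tolerance` is an assumed noise floor, not a documented Harvestry setting.
    """
    if len(prev) != len(curr) or not prev:
        raise ValueError("frames must be non-empty and the same size")
    changed = sum(1 for a, b in zip(prev, curr) if abs(a - b) > tolerance)
    return changed / len(prev)


def jaccard(words_a, words_b):
    """Jaccard similarity of two word sets: |A ∩ B| / |A ∪ B|."""
    a, b = set(words_a), set(words_b)
    if not a and not b:
        return 1.0  # two frames with no text are treated as identical
    return len(a & b) / len(a | b)


def is_new_slide(prev, curr, prev_words, curr_words,
                 coverage_threshold=0.20, overlap_threshold=0.5):
    """Decide whether `curr` shows a different slide from `prev`.

    First the pixel coverage test; if that is inconclusive, fall back to
    comparing the recognised words (assumed to come from an OCR step).
    `overlap_threshold` is a hypothetical cut-off chosen for illustration.
    """
    if changed_fraction(prev, curr) > coverage_threshold:
        return True
    return jaccard(prev_words, curr_words) < overlap_threshold
```

Under these assumptions, a webcam shift that touches 8% of the frame is ignored as long as the recognised words stay the same, while two near-identical dark slides with different text are still told apart.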
Manual Curation

The algorithm picks.
You perfect it.

Automated capture is thorough, but you know your video better than any algorithm. Harvestry gives you a full suite of curation tools so you can take the result from good to exactly right — without rerunning any processing.

  • Full scrubber timeline. Navigate anywhere in the video by dragging. The current time and total duration are always visible.
  • Seek to Clear. Paused on a blurry frame? Press ◀ or ▶ to step to the nearest sharp, clear frame in that direction, automatically.
  • Add or remove screenshots. Capture any frame at full resolution and insert it into the transcript with one press, or hover over any included screenshot and click Remove.
Harvestry video player showing the scrubber timeline, Seek to Clear, and Add to Transcript controls
Marginalia & highlights in the exported HTML
Annotation

Your marginalia travel
with the document.

All markup is created in the app and baked into the exported HTML — not applied later in the browser. Share the file and the annotations are already there, visible to anyone, in any browser.

  • Margin notes. Select any phrase in the transcript, press the annotation icon, and write a short note. In the export it appears in the left or right margin alongside the amber-underlined phrase, connected by a dotted line, for the look of handwritten marginalia on a printed page. On narrow screens, notes stack cleanly below the passage.
  • Text highlights. Four colours: yellow, green, blue, pink. Applied from the transcript panel, embedded as <mark> elements in the HTML. Visible the instant the file opens.
  • Inline notes. Drop a note anywhere between passages: after a key definition, or between a screenshot and the next sentence. They appear inline in the exported document.
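
As a rough illustration of what "baked into the HTML" can mean for highlights, the Python sketch below wraps highlighted character ranges of a passage in `<mark>` elements at export time. The function name, the `(start, end, colour)` tuple format, and the `hl-…` class names are assumptions made for this example; the page only states that highlights are embedded as `<mark>` elements in four colours.

```python
from html import escape

# The four colours named above; the "hl-" class prefix is hypothetical.
COLOURS = {"yellow", "green", "blue", "pink"}


def render_passage(text, highlights):
    """Return HTML for `text` with each highlight wrapped in a <mark> element.

    `highlights` is a list of (start, end, colour) character ranges that
    must be non-empty, inside the text, and non-overlapping.
    """
    parts = []
    cursor = 0
    for start, end, colour in sorted(highlights):
        if colour not in COLOURS:
            raise ValueError(f"unknown highlight colour: {colour}")
        if start < cursor or end > len(text) or start >= end:
            raise ValueError("highlights must be in range and non-overlapping")
        parts.append(escape(text[cursor:start]))  # plain text before the mark
        parts.append(f'<mark class="hl-{colour}">{escape(text[start:end])}</mark>')
        cursor = end
    parts.append(escape(text[cursor:]))  # plain text after the last mark
    return "".join(parts)
```

Because the markup is written once into the file, a reader's browser needs no script to show the highlights; they are simply part of the page.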
HTML Export — Opens in Your Browser

A document worth reading.

The export isn't a raw text dump. Harvestry generates a polished, fully self-contained HTML page you open directly in Safari, Chrome, or any browser — styled like a high-quality editorial publication, with live audio playback, per-word highlighting, and text markup built in.

Exported Study Document: screenshot of the HTML export in a browser, full-page view
Live Word Highlight

As the audio plays, each word lights up in real time. Click any timestamp to jump to that moment.
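
One plausible way to drive this kind of highlighting is to keep a sorted list of word start times and look up the current playback position with a binary search. The Python sketch below shows the lookup only. It is an assumption about the general technique, not a description of the exported page, which would run its own script in the browser; `active_word_index` and the sample timings are hypothetical.

```python
from bisect import bisect_right


def active_word_index(word_starts, t):
    """Index of the word being spoken at time `t` (seconds).

    `word_starts` must be sorted ascending. Returns None if `t` falls
    before the first word begins.
    """
    i = bisect_right(word_starts, t) - 1
    return i if i >= 0 else None
```

For example, with start times `[0.0, 0.4, 0.9, 1.5]`, a playback position of 1.0 seconds falls inside the third word (index 2). Jumping from a timestamp works in the opposite direction: set the playback position to the passage's start time and let the same lookup pick the word to light up.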

Slide-Aware Layout

Screenshots are classified at export time. Lecture slides with readable content render at full page width — legible without zooming. Presenter footage floats to the margin as a compact thumbnail, keeping the text front and centre.

Highlights & Marginalia

Four highlight colours and margin notes are created in the app and baked into the HTML — visible to anyone the moment the file opens, no interaction required.

Offline Forever

No CDN fonts, no remote scripts, no dependencies. Dark mode and font-size preferences are baked in and persist across reloads. Works in any browser, on any device — today or in ten years.

Privacy

Almost everything stays on your Mac.

Transcription, screenshot capture, and HTML export all run locally. No audio, no video, and no images are ever transmitted anywhere. The core pipeline has no network dependencies beyond the video download itself.

The one optional exception: if you choose to use the Claude API for LLM consolidation, your transcript text is sent to Anthropic's servers over HTTPS. This step is always opt-in — the base processing pipeline is fully local whether or not you use it. The Pro tier supports local Ollama models instead, keeping even that step entirely on-device.

Transcription — Apple Neural Engine
WhisperKit processes audio entirely on-chip. No API calls. No uploads. Your spoken words stay on your device.
Screenshots — GPU, Hardware Decoder & Apple Vision
AVFoundation extracts frames using VideoToolbox. Apple Vision scans each thumbnail for text structure, and when needed reads the actual words via on-device OCR to distinguish rapidly changing slides. No images, no analysis, and no results ever leave your machine.
Both run simultaneously
Because transcription and capture use different hardware, they run in parallel. A 90-minute video completes both stages at the same time.
HTML Export — Fully local
Export generation reads from your disk and writes to your disk. The only network call is the optional yt-dlp video fetch.
LLM Consolidation

From transcript to insight.

A raw transcript is accurate. It isn't always elegant. Spoken content is conversational, not written — presenters repeat themselves, revise mid-sentence, and circle back. The optional consolidation stage sends your transcript through an LLM and produces a second, refined version of your study document: same facts, same structure, clearer writing.

Standard
Claude API

Routes your transcript through Anthropic's Claude API. Produces consistently high-quality, well-structured academic prose. Fast, with no local setup required beyond an API key.

⚠ Transcript text is sent to Anthropic over HTTPS. Use local Ollama instead if your video contains sensitive material.

Pro — Fully Local
Local Ollama

Connects to a locally running Ollama instance. Any compatible model — Llama, Mistral, Gemma, and others. Nothing leaves your Mac. No per-token cost. No internet required.

Requires a capable Mac and a few minutes of Ollama setup. Once running, it is completely private and free of any external dependency.

The consolidated document flows into the same HTML export pipeline as the base transcript: same typography, same audio sync, same layout. Both versions are fully editable in Harvestry before export.

Coming soon

Localisation

Available in five languages.

Every label, prompt, status message, and tooltip is fully localised. Harvestry respects your macOS language preference automatically — no configuration required.

🇬🇧 English 🇩🇪 German 🇮🇹 Italian 🇪🇸 Spanish 🇫🇷 French

Ready to learn more from everything you watch?

Download Harvestry and turn your first video into a study document — free. No subscription, no account. Drop in a file or paste a URL and press Begin Processing.

Requires macOS 15 or later  ·  Apple Silicon  ·  Free trial included