
The Processing Pipeline

Four steps that turn a raw video into a polished, interactive study document. Steps 1 and 2 run in parallel; steps 3 and 4 run sequentially after.

Overview

When you click Begin Processing, Harvestry runs four stages:

  1. Transcription — WhisperKit transcribes the audio track using the Apple Neural Engine, producing word-level timestamped segments.
  2. Screenshot Capture — A scene-detection scan identifies key moments, then full-resolution frames are captured and filtered for blur.
  3. LLM Consolidation — Optionally, the transcript is sent to Claude or a local Ollama model to generate structured study notes. Can be skipped.
  4. Export Page — Everything is assembled into a self-contained HTML folder: index.html, styles.css, audio.m4a, and an images/ directory.

Steps 1 and 2 run in parallel because they use different hardware — the Apple Neural Engine for transcription and the GPU/Video Toolbox for frame extraction. On most lectures this means you wait only as long as the slower of the two, not the sum of both.
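This staging can be sketched in a few lines of code. The function names below are illustrative placeholders, not Harvestry's actual internals (which are presumably Swift); the sketch only shows the dependency structure:

```python
import asyncio

# Hypothetical stand-ins for the four pipeline stages.
async def transcribe() -> str:
    await asyncio.sleep(0.02)   # stands in for Neural Engine work
    return "transcript"

async def capture_screenshots() -> list[str]:
    await asyncio.sleep(0.01)   # stands in for GPU frame extraction
    return ["frame-001.png", "frame-002.png"]

async def consolidate(transcript: str) -> str:
    return f"notes from {transcript}"

async def export(transcript: str, screenshots: list[str], notes: str) -> str:
    return "export-folder"

async def run_pipeline() -> str:
    # Steps 1 and 2 start together; we wait for the slower of the two.
    transcript, screenshots = await asyncio.gather(
        transcribe(), capture_screenshots()
    )
    # Steps 3 and 4 run sequentially once both inputs exist.
    notes = await consolidate(transcript)
    return await export(transcript, screenshots, notes)

result = asyncio.run(run_pipeline())
```

The key point is the `gather`: consolidation and export cannot start until both the transcript and the screenshots exist, but neither of the first two steps waits on the other.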

[Screenshot: the Harvestry processing view. The Pipeline Progress panel shows 1. Transcription done (WhisperKit · Apple Neural Engine · 4,312 words · 1:12:44), 2. Screenshot Capture done (47 screenshots captured · 2 discarded for blur), 3. LLM Consolidation at 62% ("Generating notes…") with its Off / Claude / Ollama mode selector, and 4. Export Page waiting for consolidation.]

Starting the Pipeline

Select a lecture in the sidebar that has Ready or Complete status. In the detail view, click the green Begin Processing button in the toolbar. Once processing starts, the button is replaced by a cancel button.

You can navigate away from the lecture while it processes — the pipeline continues in the background. The sidebar will update the status badge as each step completes.

Step 1: Transcription

Harvestry uses WhisperKit, an on-device implementation of OpenAI's Whisper model, to transcribe the audio track. Key characteristics:

  - On-device — transcription runs locally; the audio never leaves your Mac.
  - Neural Engine — inference runs on the Apple Neural Engine, leaving the GPU free for screenshot capture.
  - Word-level timestamps — every word is tied to its moment in the audio, which powers the exported page's audio sync.

Five model sizes are available — Tiny, Base, Small, Medium, and Large Turbo — ranging from ~75 MB to ~800 MB. Larger models are more accurate but slower and require more memory. See Transcription for the full comparison table and download instructions.
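As a rough illustration of what word-level timestamped output looks like, here is a plausible data shape. The type and field names are illustrative, not WhisperKit's actual types:

```python
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    start: float  # seconds from the beginning of the audio
    end: float

@dataclass
class Segment:
    words: list[Word]

    @property
    def text(self) -> str:
        # Joined text of the whole segment.
        return " ".join(w.text for w in self.words)

seg = Segment(words=[
    Word("Welcome", 0.00, 0.42),
    Word("to", 0.42, 0.55),
    Word("lecture", 0.55, 0.98),
    Word("one", 0.98, 1.30),
])
print(seg.text)            # the joined segment text
print(seg.words[0].start)  # per-word timestamp, used for audio sync
```

Because each word carries its own start time, the exported page can highlight the word currently being spoken during audio playback.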

Step 2: Screenshot Capture

Screenshot capture runs in two phases:

  1. Scene detection — a scan of the video identifies the key moments worth capturing.
  2. Frame capture — full-resolution frames are extracted at those moments and filtered to discard blurry ones.

Both phases use the GPU and video decoder via Apple's AVFoundation framework, so they run in parallel with transcription on the Neural Engine.
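One common blur metric for filtering frames is the variance of the image's Laplacian: sharp images have strong edges and therefore high variance, while blurry or featureless frames score low. This is a generic sketch of that idea; Harvestry's actual blur filter may use a different metric:

```python
def laplacian_variance(gray: list[list[float]]) -> float:
    """Variance of a 4-neighbour Laplacian over a grayscale image."""
    h, w = len(gray), len(gray[0])
    values = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # 4-neighbour Laplacian: responds strongly to edges.
            lap = (gray[y - 1][x] + gray[y + 1][x]
                   + gray[y][x - 1] + gray[y][x + 1]
                   - 4 * gray[y][x])
            values.append(lap)
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

# A frame with a hard edge (sharp) vs. a flat, featureless frame:
sharp = [[0, 0, 255, 255]] * 4
flat = [[128] * 4] * 4
assert laplacian_variance(sharp) > laplacian_variance(flat)
```

Frames scoring below some threshold would be the ones discarded as blurry (like the 2 discarded frames in the progress panel above).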

See Screenshot Capture for details on blur filtering, the max interval setting, and manual frame capture.

Step 3: LLM Consolidation

Once the transcript is available, you can optionally send it to a language model to generate structured study notes. The mode selector on the Step 3 row has three options:

  - Off — skip consolidation entirely.
  - Claude — send the transcript to Anthropic's Claude.
  - Ollama — send the transcript to a local model running in Ollama.
See LLM Consolidation for setup instructions and prompt customization.
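For the local path, Ollama exposes a simple HTTP API. The sketch below builds a request body for its generate endpoint; the model name and prompt wording are illustrative, not Harvestry's actual (customizable) prompt:

```python
import json

def build_ollama_request(transcript: str, model: str = "llama3") -> dict:
    return {
        "model": model,
        "prompt": (
            "Consolidate the following lecture transcript into "
            "structured study notes:\n\n" + transcript
        ),
        "stream": False,  # ask for one complete response, not chunks
    }

req = build_ollama_request("Today we cover eigenvalues...")
body = json.dumps(req)
# This body would be POSTed to http://localhost:11434/api/generate
```

Because the model runs locally, nothing in the transcript leaves your machine in this mode.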

Step 4: Export

The final step assembles all outputs into a self-contained folder at your configured export location:

  - index.html — the interactive study page
  - styles.css — the page's stylesheet
  - audio.m4a — the lecture audio, for in-page playback
  - images/ — the captured screenshots
See HTML Export for the full page layout, audio sync, and annotation export details.
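Conceptually, the assembly step just copies each artifact into the folder layout above. A minimal sketch, with hypothetical function and path names:

```python
import shutil
import tempfile
from pathlib import Path

def assemble_export(out_dir: Path, audio: Path, screenshots: list[Path],
                    html: str, css: str) -> Path:
    # Create the self-contained export folder layout.
    images = out_dir / "images"
    images.mkdir(parents=True, exist_ok=True)
    (out_dir / "index.html").write_text(html)
    (out_dir / "styles.css").write_text(css)
    shutil.copy(audio, out_dir / "audio.m4a")
    for shot in screenshots:
        shutil.copy(shot, images / shot.name)
    return out_dir

# Demo with throwaway files:
tmp = Path(tempfile.mkdtemp())
audio = tmp / "raw.m4a"
audio.write_bytes(b"\x00")
shot = tmp / "frame-001.png"
shot.write_bytes(b"\x89PNG")
out = assemble_export(tmp / "export", audio, [shot],
                      "<html></html>", "body{}")
```

Because every asset is copied inside one folder with relative references, the result can be moved, zipped, or shared and still open offline.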

Monitoring Progress

During processing, the detail view shows four step rows with progress bars. The Transcript Panel on the right switches to a live log view, showing timestamped messages as the pipeline runs — useful for understanding what's happening or diagnosing an issue.

Progress fractions are approximate; they reflect how much of the video has been processed, not wall-clock time remaining.
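For example, a 90-minute lecture transcribed up to the 27-minute mark reports 30%, regardless of how fast the remainder will run. A sketch of that calculation (illustrative, not Harvestry's actual code):

```python
def progress_fraction(processed_seconds: float, total_seconds: float) -> float:
    # Fraction of the media processed so far, clamped to [0, 1].
    if total_seconds <= 0:
        return 0.0
    return min(processed_seconds / total_seconds, 1.0)

print(progress_fraction(27 * 60, 90 * 60))  # 27 of 90 minutes processed
```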

After Completion

When all four steps complete:

  - The lecture's status badge in the sidebar updates to Complete.
  - The exported folder is ready at your configured export location.
  - The re-processing options below become available.
Re-processing Options

After a lecture is complete, you have several options to update or redo work:

| Action | What it does | Keeps screenshots? | Keeps transcript? |
| --- | --- | --- | --- |
| Reexport | Regenerates the HTML folder from existing transcript and screenshots. Use after adding annotations or changing export settings. | Yes | Yes |
| Retranscribe | Runs only transcription again (e.g. with a different Whisper model), then re-exports. Screenshots are preserved. | Yes | Replaced |
| Reprocess | Runs all four pipeline steps from scratch. Use when you want a completely fresh result. | Replaced | Replaced |

These options are available from the overflow menu in the detail view toolbar, or by right-clicking the lecture in the sidebar.