Change type methodology

Every verse-level difference between consecutive editions is automatically classified into one of five categories. This page describes how the classification works so you can interpret the analytics with confidence.

Overview

For each verse that differs between two consecutive editions, the classifier diffs the two versions and runs each changed span through a series of checks ordered from the most superficial difference to the most structural: punctuation and whitespace first, then capitalization, then character-level orthographic similarity, and finally word-level structural differences. Each span receives the first category that fully explains it, and the verse as a whole receives the most severe category found among its spans.

Category | Rule
Punctuation | After stripping all punctuation, the word sequences are identical.
Spelling | Words differ only in capitalization or orthographic form (character-level similarity ≥ 75%).
Grammar | 1–2 word changes involving function words (pronouns, articles, prepositions, verb forms, etc.).
Misc. | 1–2 content-word changes that are not grammatical (e.g. name swaps, noun substitutions).
Substantive | A single contiguous change span adds, removes, or replaces 3 or more whole words.

Step-by-step process

  1. Character-level diff. Using diff-match-patch, the algorithm produces a sequence of character-level diff spans (equal, insert, delete) for each verse change.
  2. Classify each span independently. Every changed span (insertion or deletion) is classified using these rules, checked in order:
    • If the span contains only punctuation or whitespace → Punctuation.
    • If the span is part of a deletion+insertion pair and they differ only in letter case → Spelling.
    • If the span has no whitespace and is adjacent to a word character in the surrounding unchanged text (i.e. it modifies part of a word, like exceeding → exceedingly) → Spelling.
    • Otherwise, count the whole words in the span. If there are 1–2 words and at least one is a function word (pronoun, article, preposition, auxiliary verb, etc.) → Grammar. If there are 1–2 words and none is a function word → Misc. If there are 3 or more words → Substantive.
  3. Verse-level classification. The verse is assigned the most severe type found among its spans. For example, a verse with two Spelling spans and one Grammar span is classified as Grammar.
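The three steps can be sketched in Python as below. This is an illustration under stated assumptions, not the code in scripts/process_data.py: it uses the standard library's difflib.SequenceMatcher opcodes as a stand-in for diff-match-patch (a replace opcode plays the role of a deletion+insertion pair), FUNCTION_WORDS is a hypothetical sample rather than the project's list, and the severity order is assumed to follow the order of the summary table.

```python
import re
from difflib import SequenceMatcher
from enum import IntEnum
from typing import List, Optional


class Change(IntEnum):
    """Categories in an assumed order of increasing severity."""
    PUNCTUATION = 1
    SPELLING = 2
    GRAMMAR = 3
    MISC = 4
    SUBSTANTIVE = 5


# Hypothetical sample; the project's real function-word list is not shown here.
FUNCTION_WORDS = frozenset({
    "a", "an", "the", "i", "he", "it", "they", "which", "who",
    "of", "to", "in", "by", "and", "is", "are", "be", "was", "had", "saw", "seen",
})


def _word_char_at(text: str, index: int) -> bool:
    """True if text[index] exists and is a letter, digit, or underscore."""
    return 0 <= index < len(text) and (text[index].isalnum() or text[index] == "_")


def classify_span(span: str, text: str, start: int, end: int,
                  partner: Optional[str] = None) -> Change:
    """Step 2: classify one deleted or inserted span, text[start:end]."""
    if not any(c.isalnum() for c in span):
        return Change.PUNCTUATION
    if partner is not None and span.lower() == partner.lower():
        return Change.SPELLING  # deletion+insertion pair differing only in case
    if not any(c.isspace() for c in span) and (
            _word_char_at(text, start - 1) or _word_char_at(text, end)):
        return Change.SPELLING  # edits part of a word, e.g. exceeding -> exceedingly
    words = re.findall(r"\w+", span.lower())
    if len(words) >= 3:
        return Change.SUBSTANTIVE
    if any(w in FUNCTION_WORDS for w in words):
        return Change.GRAMMAR
    return Change.MISC


def classify_verse(old: str, new: str) -> Optional[Change]:
    """Steps 1 and 3: diff the verse, classify every span, keep the most severe."""
    found: List[Change] = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        deleted, inserted = old[i1:i2], new[j1:j2]
        paired = tag == "replace"
        if deleted:
            found.append(classify_span(deleted, old, i1, i2, inserted if paired else None))
        if inserted:
            found.append(classify_span(inserted, new, j1, j2, deleted if paired else None))
    return max(found, default=None)
```

For example, classify_verse("exceeding great", "exceedingly great") returns Change.SPELLING, and classify_verse("the mother of God", "the mother of the Son of God") returns Change.SUBSTANTIVE. Because difflib and diff-match-patch can place span boundaries differently, this sketch may disagree with the project's output on borderline verses.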

Category details & examples

Punctuation

The verse words are identical once punctuation and whitespace are stripped. This covers added or removed commas, semicolons, em-dashes, periods, quotation marks, and whitespace adjustments.

Before: ...I knew that the Lord had delivered Laban into my hands for this cause—that I might...

After: ...I knew that the Lord had delivered Laban into my hands for this cause, that I might...

Em-dash replaced with comma — same words.
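The Punctuation rule from the summary table can be checked at the verse level in a few lines. This sketch assumes that "punctuation" means any character that is neither a word character nor whitespace; the exact character set the project strips is not specified on this page, and the function names are hypothetical.

```python
import re
from typing import List


def words_without_punctuation(text: str) -> List[str]:
    """Replace every character that is neither a word character nor whitespace
    with a space, then split, so "cause—that" and "cause, that" both give
    ["cause", "that"]."""
    return re.sub(r"[^\w\s]", " ", text).split()


def is_punctuation_change(old: str, new: str) -> bool:
    """True when the verses differ but their word sequences are identical.
    The comparison is case-sensitive, so capitalization changes fall through
    to the Spelling check."""
    return old != new and words_without_punctuation(old) == words_without_punctuation(new)
```

Replacing punctuation with a space rather than deleting it matters: deleting the em-dash would turn "cause—that" into the single token "causethat" and the example above would no longer match.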

Spelling

Covers four kinds of variation that don’t change word identity or meaning:

  • Capitalization: and vs And
  • British/American spelling: favour → favor, marvellous → marvelous
  • Archaic modernization: shewn → shown, lustre → luster
  • Typographical errors: dilligent → diligent

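Two words are considered orthographic variants when their character-level similarity (Python SequenceMatcher.ratio()) is ≥ 0.75. This threshold captures the British/American, archaic, and typographical variants above while correctly rejecting genuine word swaps like which → who (similarity 0.50) or is → are (similarity 0.00). Capitalization is handled by the separate case-only comparison, since a pair like and → And scores only about 0.67.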
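The threshold test can be reproduced with difflib.SequenceMatcher.ratio(), the function named above. The helper names are hypothetical; the case-insensitive comparison runs first because the raw ratio alone would reject a capitalization change.

```python
from difflib import SequenceMatcher

SPELLING_THRESHOLD = 0.75  # threshold stated on this page


def similarity(old: str, new: str) -> float:
    """Character-level similarity, 2*M/T as computed by SequenceMatcher.ratio()."""
    return SequenceMatcher(None, old, new).ratio()


def is_spelling_variant(old: str, new: str) -> bool:
    # Case-only changes are accepted first: "and" -> "And" scores only 0.67.
    if old.lower() == new.lower():
        return True
    return similarity(old, new) >= SPELLING_THRESHOLD
```

With this sketch, favour → favor scores about 0.91 and shewn → shown exactly 0.80, both above the threshold, while which → who (0.50) and is → are (0.00) fall below it.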

Before: ...having been favoured of the Lord...

After: ...having been favored of the Lord...

"favoured" → "favored" — similarity 0.93, treated as spelling variant.

Grammar

One or two words are structurally different, and at least one of the changed words is a function word—a pronoun, article, preposition, conjunction, auxiliary verb, or common verb form. These typically involve verb tense changes, pronoun swaps, or small insertions/deletions that adjust phrasing without significantly altering the verse’s content.
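A minimal sketch of the word-count and function-word test for one changed span, assuming the span has already passed the Punctuation and Spelling checks. FUNCTION_WORDS is a hypothetical sample; listing saw and seen as "common verb forms" is an assumption made so that the saw → seen example in this section comes out as Grammar.

```python
import re

# Hypothetical function-word sample, grouped by the classes named above.
FUNCTION_WORDS = frozenset({
    "a", "an", "the",                                         # articles
    "i", "he", "she", "it", "we", "they", "which", "who",     # pronouns
    "of", "to", "in", "by", "with",                           # prepositions
    "and", "but", "or",                                       # conjunctions
    "is", "are", "be", "was", "were", "had", "saw", "seen",   # verb forms
})


def classify_small_span(span: str) -> str:
    """Word-count rule for one changed span that is not punctuation or spelling."""
    words = re.findall(r"\w+", span.lower())
    if len(words) >= 3:
        return "Substantive"
    if any(word in FUNCTION_WORDS for word in words):
        return "Grammar"
    return "Misc."
```

Under this sample list, classify_small_span("to be") and classify_small_span("seen") both return "Grammar", while classify_small_span("Mosiah") returns "Misc.".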

Before: ...my father had read and saw many great and marvellous things...

After: ...my father had read and seen many great and marvelous things...

"saw" → "seen" is 1 structural change (verb form). "marvellous" → "marvelous" is a spelling variant and not counted.

Before: ...the record which I make, to be true...

After: ...the record which I make, is true...

"to be" → "is" — all function words, so classified as Grammar.

Misc.

One or two words are structurally different, but none of the changed words are function words. This captures small content-level edits—name corrections, noun substitutions, and other targeted word swaps that change meaning rather than syntax.
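For a single-word substitution, a pair only reaches Misc. after failing both Spelling checks and the function-word test. The sketch below strings those checks together in the order of the summary table; classify_word_swap() and the FUNCTION_WORDS sample are hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical function-word sample, for illustration only.
FUNCTION_WORDS = frozenset({"the", "a", "he", "it", "which", "who", "of", "to",
                            "is", "are", "be", "saw", "seen"})
SPELLING_THRESHOLD = 0.75


def classify_word_swap(old: str, new: str) -> str:
    """Run a one-word substitution through the checks in table order."""
    if old.lower() == new.lower():
        return "Spelling"   # capitalization only
    if SequenceMatcher(None, old, new).ratio() >= SPELLING_THRESHOLD:
        return "Spelling"   # orthographic variant
    if old.lower() in FUNCTION_WORDS or new.lower() in FUNCTION_WORDS:
        return "Grammar"
    return "Misc."
```

classify_word_swap("Benjamin", "Mosiah") returns "Misc.": the pair is not a case change, its similarity is far below 0.75, and neither word is in the sample list.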

Before: ...king Benjamin had a gift from God...

After: ...king Mosiah had a gift from God...

"Benjamin" → "Mosiah" — a proper-noun swap, not a grammatical change.

Substantive

A single contiguous change span that adds, removes, or replaces 3 or more whole words. This captures meaningful editorial revisions: phrase insertions or deletions, theological rewording, and sentence restructuring. Multiple small changes (e.g. three separate pronoun swaps) are each classified individually as Grammar, not aggregated into Substantive.
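The span-by-span rule can be illustrated with hand-written spans, so no diff library is involved. The severity order (assumed to be the table order) and the FUNCTION_WORDS sample are assumptions; the point is that three separate one-word spans stay Grammar, while the same words as one contiguous span are Substantive.

```python
import re
from typing import List

# Assumed severity order (the order of the summary table).
SEVERITY = ["Punctuation", "Spelling", "Grammar", "Misc.", "Substantive"]
# Hypothetical function-word sample, for illustration only.
FUNCTION_WORDS = frozenset({"he", "him", "his", "they", "them", "their",
                            "the", "of", "by", "beside"})


def classify_span(span: str) -> str:
    """Word-count rule applied to one contiguous span."""
    words = re.findall(r"\w+", span.lower())
    if len(words) >= 3:
        return "Substantive"
    return "Grammar" if any(w in FUNCTION_WORDS for w in words) else "Misc."


def classify_verse(spans: List[str]) -> str:
    """Classify spans independently, then keep the most severe category."""
    return max((classify_span(s) for s in spans), key=SEVERITY.index)
```

classify_verse(["he", "they", "him"]) returns "Grammar", whereas classify_span("he they him") returns "Substantive".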

Before: ...the mother of God...

After: ...the mother of the Son of God...

Insertion of "the Son of": 3 words added in one span.

Before: ...he pitched his tent in a valley beside a river of water...

After: ...he pitched his tent in a valley by the side of a river of water...

"beside" → "by the side of": one span replaces 1 word with 4, which exceeds the 3-word threshold.

Limitations

  • The classifier is heuristic, not a linguistic analysis. Borderline cases exist at span boundaries—how diff-match-patch splits changes can shift a span between categories.
  • The “partial word” detection relies on context: if the surrounding unchanged text has a word character adjacent to the span, it is treated as a within-word modification (Spelling). In rare cases this heuristic may misclassify a very short whole-word change that happens to abut another word.
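Both limitations can be reproduced with a small experiment. The sketch uses difflib as a stand-in for diff-match-patch, and looks_like_partial_word() is a hypothetical helper that follows the rule described above. diff-match-patch offers cleanup passes such as diff_cleanupSemantic that can move or merge boundaries; whether the project applies one is not stated on this page.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def changed_spans(old: str, new: str) -> List[Tuple[str, str, int, int, int, int]]:
    """Non-equal difflib opcodes as (deleted, inserted, i1, i2, j1, j2)."""
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    return [(old[i1:i2], new[j1:j2], i1, i2, j1, j2)
            for tag, i1, i2, j1, j2 in matcher.get_opcodes() if tag != "equal"]


def looks_like_partial_word(span: str, text: str, start: int, end: int) -> bool:
    """The span has no whitespace and a letter or digit in the surrounding
    text touches one of its ends."""
    if not span or any(c.isspace() for c in span):
        return False
    before = text[start - 1] if start > 0 else ""
    after = text[end] if end < len(text) else ""
    return before.isalnum() or after.isalnum()
```

With this sketch, changed_spans("them", "they") yields a single span, "m" replaced by "y", which looks_like_partial_word() flags because "e" precedes it, even though them → they is a pronoun swap. Likewise, changed_spans("beside", "by the side of") yields three separate insertions ("y th", " ", and " of") rather than one four-word span.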

Source code

The classification logic lives in scripts/process_data.py (functions _classify_span(), classify_from_diffs()) and is mirrored in components/DiffRender.tsx for client-side highlighting. The full source is available in the project repository.