Change type methodology
Every verse-level difference between consecutive editions is automatically classified into one of five categories. This page describes how the classification works so you can interpret the analytics with confidence.
Overview
For each verse that differs between two consecutive editions, the classifier runs the change through a series of increasingly fine-grained checks: punctuation and whitespace first, then capitalization, then character-level orthographic similarity, and finally word-level structural differences. The first category that fully explains the change is assigned.
| Category | Rule |
|---|---|
| Punctuation | After stripping all punctuation, the word sequences are identical. |
| Spelling | Words differ only in capitalization or orthographic form (character-level similarity ≥ 75%). |
| Grammar | 1–2 word changes involving function words (pronouns, articles, prepositions, verb forms, etc.). |
| Misc. | 1–2 content-word changes that are not grammatical (e.g. name swaps, noun substitutions). |
| Substantive | A single contiguous change span adds, removes, or replaces 3 or more whole words. |
Step-by-step process
- Character-level diff. Using `diff-match-patch`, the algorithm produces a sequence of character-level diff spans (equal, insert, delete) for each verse change.
- Classify each span independently. Every changed span (insertion or deletion) is classified using these rules, checked in order:
  - If the span contains only punctuation or whitespace → Punctuation.
  - If the span is part of a deletion+insertion pair and the two differ only in letter case → Spelling.
  - If the span contains no whitespace and is adjacent to a word character in the surrounding unchanged text (i.e. it modifies part of a word, as in `exceeding` → `exceedingly`) → Spelling.
  - Otherwise, count the whole words in the span. If there are 1–2 words and at least one is a function word (pronoun, article, preposition, auxiliary verb, etc.) → Grammar. If there are 1–2 words and none is a function word → Misc. If there are 3 or more words → Substantive.
- Verse-level classification. The verse is assigned the most severe type found among its spans. For example, a verse with two Spelling spans and one Grammar span is classified as Grammar.
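The span rules and the verse-level step can be sketched in Python as follows. This is a minimal illustration, not the code in `scripts/process_data.py`. It assumes diffs arrive as the `(op, text)` tuples that diff-match-patch's Python port returns (`-1` delete, `0` equal, `1` insert), uses a short hypothetical `FUNCTION_WORDS` set, and assumes severity increases in the order the categories appear in the table above.

```python
import re

# diff-match-patch's Python port encodes each diff as an (op, text) tuple.
DELETE, EQUAL, INSERT = -1, 0, 1

# Hypothetical, abbreviated function-word list; the real list is not shown here.
FUNCTION_WORDS = {
    "a", "an", "the", "i", "he", "she", "it", "we", "they", "him", "them",
    "who", "which", "that", "of", "in", "to", "by", "unto", "and", "or",
    "is", "are", "was", "were", "be", "been", "had", "have", "saw", "seen",
}

# Assumed severity order, least to most severe (the table's order).
SEVERITY = ["Punctuation", "Spelling", "Grammar", "Misc.", "Substantive"]


def _nearest_equal(diffs, i, step):
    """Text of the closest EQUAL diff before (step=-1) or after (step=1) index i."""
    j = i + step
    while 0 <= j < len(diffs) and diffs[j][0] != EQUAL:
        j += step
    return diffs[j][1] if 0 <= j < len(diffs) else ""


def classify_span(diffs, i):
    """Classify the changed span diffs[i], applying the four rules in order."""
    op, text = diffs[i]
    # Rule 1: nothing but punctuation and whitespace.
    if re.fullmatch(r"[\W_]+", text):
        return "Punctuation"
    # Rule 2: a delete+insert pair that differs only in letter case.
    for j in (i - 1, i + 1):
        if 0 <= j < len(diffs) and diffs[j][0] == -op and diffs[j][1].lower() == text.lower():
            return "Spelling"
    # Rule 3: no whitespace, and a letter or digit in the surrounding unchanged
    # text touches the span, so the edit is inside a word.
    before = _nearest_equal(diffs, i, -1)
    after = _nearest_equal(diffs, i, 1)
    if not re.search(r"\s", text) and (before[-1:].isalnum() or after[:1].isalnum()):
        return "Spelling"
    # Rule 4: count the whole words in the span.
    words = re.findall(r"[\w']+", text.lower())
    if len(words) >= 3:
        return "Substantive"
    if any(w in FUNCTION_WORDS for w in words):
        return "Grammar"
    return "Misc."


def classify_verse(diffs):
    """Return the most severe span type in the verse, or None if nothing changed."""
    types = [classify_span(diffs, i) for i, (op, _) in enumerate(diffs) if op != EQUAL]
    return max(types, key=SEVERITY.index) if types else None
```

For example, a hand-written diff of the Misc. example, `[(0, "king "), (-1, "Benjamin"), (1, "Mosiah"), (0, " had a gift from God")]`, makes `classify_verse` return `"Misc."`: both changed spans are whole words with no function words.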
Category details & examples
Punctuation
The verse words are identical once punctuation and whitespace are stripped. This covers added or removed commas, semicolons, em-dashes, periods, quotation marks, and whitespace adjustments.
Before: ...I knew that the Lord had delivered Laban into my hands for this cause—that I might...
After: ...I knew that the Lord had delivered Laban into my hands for this cause, that I might...
Em-dash replaced with comma — same words.
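At the verse level, the Punctuation rule amounts to comparing word sequences with punctuation removed. A minimal sketch, assuming any run of letters, digits or underscores counts as a word so that punctuation and whitespace act only as separators; `words_only` and `is_punctuation_change` are hypothetical names:

```python
import re


def words_only(text):
    """Split a verse into words, discarding punctuation and whitespace."""
    # \w+ keeps letters, digits and underscores; every other character,
    # including em-dashes, commas and quotation marks, acts as a separator.
    return re.findall(r"\w+", text)


def is_punctuation_change(before, after):
    """True when two differing verses have identical word sequences.

    The comparison is case-sensitive, so a capitalization change is not
    treated as a punctuation change.
    """
    return before != after and words_only(before) == words_only(after)


before = "I knew that the Lord had delivered Laban into my hands for this cause\u2014that I might"
after = "I knew that the Lord had delivered Laban into my hands for this cause, that I might"
print(is_punctuation_change(before, after))  # True
```

Because this sketch treats an apostrophe as a separator, `Lord's` and `Lords` would not count as a punctuation-only change; how the project's implementation handles apostrophes is not specified on this page.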
Spelling
Covers four kinds of variation that don’t change word identity or meaning:
- Capitalization: `and` vs `And`
- British/American spelling: `favour` → `favor`, `marvellous` → `marvelous`
- Archaic modernization: `shewn` → `shown`, `lustre` → `luster`
- Typographical errors: `dilligent` → `diligent`
Two words are considered orthographic variants when their character-level similarity (Python SequenceMatcher.ratio()) is ≥ 0.75. This threshold captures the variants above while correctly rejecting genuine word swaps like which → who (similarity 0.50) or is → are (similarity 0.00).
Before: ...having been favoured of the Lord...
After: ...having been favored of the Lord...
"favoured" → "favored" — similarity 0.93, treated as spelling variant.
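The similarity test can be reproduced with Python's standard `difflib.SequenceMatcher`. The sketch below folds the capitalization case into the same helper; the function names are hypothetical, and only the 0.75 threshold comes from this page.

```python
from difflib import SequenceMatcher

SPELLING_THRESHOLD = 0.75  # threshold stated on this page


def similarity(a, b):
    """Character-level similarity in [0, 1]: 2 * matched chars / (len(a) + len(b))."""
    return SequenceMatcher(None, a, b).ratio()


def is_orthographic_variant(a, b):
    """Treat a word pair as a spelling variant if it differs only in case
    or if its characters are at least 75% similar."""
    return a.lower() == b.lower() or similarity(a, b) >= SPELLING_THRESHOLD


pairs = [("and", "And"), ("favour", "favor"), ("marvellous", "marvelous"),
         ("shewn", "shown"), ("lustre", "luster"), ("dilligent", "diligent"),
         ("which", "who"), ("is", "are")]
for a, b in pairs:
    print(f"{a} -> {b}: {similarity(a, b):.2f}, variant={is_orthographic_variant(a, b)}")
```

Note that `and` → `And` scores only 0.67 because the ratio is case-sensitive, which is why a separate case-insensitive comparison is useful alongside the threshold.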
Grammar
One or two words are structurally different, and at least one of the changed words is a function word—a pronoun, article, preposition, conjunction, auxiliary verb, or common verb form. These typically involve verb tense changes, pronoun swaps, or small insertions/deletions that adjust phrasing without significantly altering the verse’s content.
Before: ...my father had read and saw many great and marvellous things...
After: ...my father had read and seen many great and marvelous things...
"saw" → "seen" is one structural change (a verb form). "marvellous" → "marvelous" is a spelling variant, so it is not counted as a word change.
Before: ...the record which I make, to be true...
After: ...the record which I make, is true...
"to be" → "is" — all function words, so classified as Grammar.
Misc.
One or two words are structurally different, but none of the changed words are function words. This captures small content-level edits—name corrections, noun substitutions, and other targeted word swaps that change meaning rather than syntax.
Before: ...king Benjamin had a gift from God...
After: ...king Mosiah had a gift from God...
"Benjamin" → "Mosiah" — a proper-noun swap, not a grammatical change.
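The Grammar and Misc. rules differ only in the function-word test. The sketch below approximates both at the word level, running `difflib.SequenceMatcher` over word lists instead of using the character-level diff-match-patch spans described in the step-by-step process. It skips same-length replacements made only of spelling variants (the 0.75 rule), sizes each remaining change by the larger of its removed and added word counts, and uses a short hypothetical `FUNCTION_WORDS` set in which `saw` and `seen` stand in for "common verb forms".

```python
import re
from difflib import SequenceMatcher

# Hypothetical, abbreviated function-word list ("saw"/"seen" stand in for
# the common verb forms the page mentions).
FUNCTION_WORDS = {"a", "an", "the", "i", "he", "she", "it", "they", "him",
                  "them", "which", "who", "that", "of", "in", "to", "by",
                  "and", "is", "are", "was", "be", "been", "had", "saw", "seen"}


def is_variant(a, b):
    """Case-only or >= 75% character-similar words count as spelling variants."""
    return a.lower() == b.lower() or SequenceMatcher(None, a, b).ratio() >= 0.75


def word_changes(before, after):
    """Label each word-level change span as Grammar, Misc. or Substantive."""
    old = re.findall(r"[\w']+", before)
    new = re.findall(r"[\w']+", after)
    labels = []
    for tag, i1, i2, j1, j2 in SequenceMatcher(None, old, new).get_opcodes():
        if tag == "equal":
            continue
        removed, added = old[i1:i2], new[j1:j2]
        # A same-length replacement made only of spelling variants is not a word change.
        if len(removed) == len(added) and all(map(is_variant, removed, added)):
            continue
        size = max(len(removed), len(added))
        if size >= 3:
            label = "Substantive"
        elif any(w.lower() in FUNCTION_WORDS for w in removed + added):
            label = "Grammar"
        else:
            label = "Misc."
        labels.append((" ".join(removed), " ".join(added), label))
    return labels


print(word_changes("my father had read and saw many great and marvellous things",
                   "my father had read and seen many great and marvelous things"))
# [('saw', 'seen', 'Grammar')]
print(word_changes("king Benjamin had a gift from God",
                   "king Mosiah had a gift from God"))
# [('Benjamin', 'Mosiah', 'Misc.')]
```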
Substantive
A single contiguous change span that adds, removes, or replaces 3 or more whole words. This captures meaningful editorial revisions: phrase insertions or deletions, theological rewording, and sentence restructuring. Multiple small changes (e.g. three separate pronoun swaps) are each classified individually as Grammar, not aggregated into Substantive.
Before: ...the mother of God...
After: ...the mother of the Son of God...
Insertion of "the Son of": 3 words added in one span.
Before: ...he pitched his tent in a valley beside a river of water...
After: ...he pitched his tent in a valley by the side of a river of water...
"beside" → "by the side of" — 4 words inserted in one span.
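The rule that small changes are not aggregated is easiest to see by listing the size of each change span rather than a total. A minimal word-level sketch, again using `difflib.SequenceMatcher` on word lists as a stand-in for diff-match-patch; `span_sizes` and `has_substantive_span` are hypothetical names, and the third verse pair is a made-up illustration of three separate pronoun swaps:

```python
import re
from difflib import SequenceMatcher


def span_sizes(before, after):
    """Word count of each contiguous change span (larger of removed/added)."""
    old = re.findall(r"[\w']+", before)
    new = re.findall(r"[\w']+", after)
    return [max(i2 - i1, j2 - j1)
            for tag, i1, i2, j1, j2 in SequenceMatcher(None, old, new).get_opcodes()
            if tag != "equal"]


def has_substantive_span(before, after):
    """Substantive only if a single span reaches 3 words; spans are never summed."""
    return any(size >= 3 for size in span_sizes(before, after))


examples = [
    ("the mother of God", "the mother of the Son of God"),
    ("in a valley beside a river of water", "in a valley by the side of a river of water"),
    # Hypothetical verse with three separate one-word pronoun swaps.
    ("and he told them that they should go", "and she told him that we should go"),
]
for before, after in examples:
    print(span_sizes(before, after), has_substantive_span(before, after))
```

This prints `[3] True`, `[4] True` and `[1, 1, 1] False`: the three pronoun swaps stay three separate one-word spans, so none of them reaches the Substantive threshold.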
Limitations
- The classifier is heuristic, not a linguistic analysis. Borderline cases exist at span boundaries: how `diff-match-patch` splits changes can shift a span between categories.
- The “partial word” detection relies on context: if the surrounding unchanged text has a word character adjacent to the span, it is treated as a within-word modification (Spelling). In rare cases this heuristic may misclassify a very short whole-word change that happens to abut another word.
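The boundary sensitivity can be reproduced with the partial-word rule alone. In the sketch below, `is_partial_word_edit` is a hypothetical helper that restates that rule, and both splits of `them` → `him` are hand-written rather than actual diff-match-patch output. If the change is kept as whole words, neither span counts as a partial-word edit, so the pronoun swap falls through to the word-count rule; if it is split into sub-word pieces, every piece touches a letter and would be labeled Spelling.

```python
import re


def is_partial_word_edit(prev_equal, span, next_equal):
    """Partial-word rule: the span has no whitespace and a letter or digit
    from the unchanged text sits immediately before or after it."""
    touches_word = prev_equal[-1:].isalnum() or next_equal[:1].isalnum()
    return re.search(r"\s", span) is None and touches_word


# Split A (hand-written): whole words deleted and inserted.
whole = [("said unto ", "them", " saying"), ("said unto ", "him", " saying")]
# Split B (hand-written): the same edit as sub-word pieces around a shared "h" and "m".
pieces = [("said unto ", "t", "h"), ("h", "e", "m"), ("h", "i", "m")]

print([is_partial_word_edit(*s) for s in whole])   # [False, False]
print([is_partial_word_edit(*s) for s in pieces])  # [True, True, True]
```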
Source code
The classification logic lives in scripts/process_data.py (functions _classify_span(), classify_from_diffs()) and is mirrored in components/DiffRender.tsx for client-side highlighting. The full source is available in the project repository.