I’ve posted before on Worlds Without End about some examples of “stylistic drift” that occurred during Book of Mormon dictation. As I then explained, Joseph Smith didn’t restart his dictation from the beginning of the Book of Mormon after Martin Harris lost the first 116 pages. Rather, Smith picked up the dictation where he’d left off: at the beginning of the Book of Mosiah. Only after translating from Mosiah to the end of the Book of Mormon did he come back to translate 1 Nephi through Words of Mormon. When we arrange the text according to this Mosiah-first dictation sequence, we observe that at the beginning of Book of Mormon dictation Smith tended to use the terms “therefore,” “whosoever,” and “insomuch,” whereas by the end of the dictation process he instead preferred the synonymous terms “wherefore,” “whoso,” and “inasmuch.” This implies that Smith exerted at least some influence on the vocabulary of the English Book of Mormon rather than just passively reciting an existing English text projected on the surface of his seer stone.
To put this hypothesis of monotonic stylistic drift on firmer footing, a few years ago I devised a way to test it statistically in the aggregate. The results were consistent with the hypothesis, and I posted them on my personal blog. Recently I was contacted by BYU statistician G. Bruce Schaalje, who had discovered my work after independently performing a similar analysis with a similar result. Bruce looked at my data and suggested a few simple but important refinements, so it seems an update is in order. I’m grateful to him for his help.
To mitigate reader boredom, I’ll begin with an abbreviated explanation of my results. Those interested in the full details can find them after the chart below.
Here’s the short version. First I divided the Book of Mormon into twenty-one sections of ten chapters each, which I arranged according to their Mosiah-first dictation sequence. These twenty-one sections are represented by the x-values in the chart below. Then I measured each section’s relative stylistic similarity (based on the frequencies of about sixty common words) to the front and back ends of the Book of Mormon as arranged in dictation sequence. These measurements are represented by the chart’s y-values. Higher y-values indicate greater similarity to the front end of the book, while lower y-values indicate greater similarity to the back end.
If the hypothesis of monotonic stylistic drift is correct, then we should end up with a fairly straight trend line with a negative slope: as our x-values increase, our y-values should decrease at a fairly constant rate. As you can see, this is basically the pattern we observe. The first data point is a bit of an outlier, so we do get some deviation there from the predicted slope. But overall, the pattern is consistent with monotonic stylistic drift.
This is significant because monotonic stylistic drift is what we would expect if the English text of the Book of Mormon were composed by a single author or translator in a Mosiah-first sequence of dictation. If the English text had multiple authors or translators, or was composed in a different sequence, we wouldn’t really expect to see this pattern in the data.[1]
Here’s a fuller description of my method for those interested in the gory details:
1. First, I divided the Book of Mormon into twenty-one sequential ten-chapter sections, starting with the first chapter of Mosiah. I excluded Book of Mormon chapters that parallel chapters in the King James Bible. I also excluded from my final analysis a twenty-second section consisting of only five leftover chapters (Jac. 7, Enos 1, Jar. 1, Omni 1, and Words 1). This section was an outlier, probably because the text sample was too small.
2. Second, I generated a list of all the articles, conjunctions, pronouns, and prepositions that are found in all twenty-one of my ten-chapter sections. In the literature on stylometry these words are sometimes referred to as “non-contextual” words. Bruce Schaalje prefers to call them “low content” words, because they aren’t really non-contextual. The theory is that by studying texts’ usage of such low-content words, we may be able to detect their authors’ distinctive patterns of speech or thought with relatively little interference from other variables such as subject matter or genre.
3. Next, I computed what we might call the “stylistic distances” between each ten-chapter section and each individual chapter of the Book of Mormon. Basically, these “distances” were a measure of how differently the words from Step 2 are used in each pair of texts. More specifically, I used a measure known as “Delta.” As Burrows defines it, Delta is the mean of the absolute differences between two texts’ standardized word frequencies (z-scores), taken over every word in the list.[2]
4. I now had a spreadsheet with 21 columns (representing the 21 ten-chapter sections) and 215 rows (representing the 215 individual Book of Mormon chapters that remain once the King James parallels are excluded, listed in dictation order). I then found the regression slope of each column, that is, the slope of the best-fit line through that column’s Delta values plotted against chapter position.[3] Crudely put, these regression slopes represent each ten-chapter section’s relative similarity to the first- and last-dictated chapters of the Book of Mormon. A higher slope indicates a greater stylistic similarity to the first-dictated half of the Book of Mormon, while a lower slope indicates a greater stylistic similarity to the last-dictated half of the Book of Mormon.
5. I plotted these regression slopes as y-values on the graph above, then generated a polynomial trendline. If the Book of Mormon’s style changed monotonically during the course of dictation, the resulting trendline should be a nearly straight line with a negative slope. As you can see, this prediction is mostly borne out in my results.
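For readers who would rather see Steps 3 through 5 as code, here is a minimal Python sketch of the general approach. It is not my actual spreadsheet. The function names, the synthetic data, and two modeling choices are assumptions made for illustration: z-scores are standardized against the chapter-level frequencies, and each section's profile is the simple mean of its chapters' profiles. It also skips the exclusion described in note [3].

```python
import numpy as np

SECTION_SIZE = 10  # chapters per section, as in Step 1


def burrows_delta(freq_a, freq_b, mean, std):
    """Mean absolute difference between two texts' z-scored word frequencies."""
    z_a = (freq_a - mean) / std
    z_b = (freq_b - mean) / std
    return float(np.mean(np.abs(z_a - z_b)))


def delta_matrix(chapter_freqs, section_size=SECTION_SIZE):
    """Delta between every section (columns) and every chapter (rows).

    chapter_freqs: array of shape (n_chapters, n_words) holding each chapter's
    relative frequency for each low-content word, rows in dictation order.
    """
    n_chapters = chapter_freqs.shape[0]
    n_sections = n_chapters // section_size  # leftover chapters form no section
    # Assumption: z-scores are standardized against the chapter-level corpus.
    mean = chapter_freqs.mean(axis=0)
    std = chapter_freqs.std(axis=0, ddof=1)
    deltas = np.empty((n_chapters, n_sections))
    for s in range(n_sections):
        rows = slice(s * section_size, (s + 1) * section_size)
        # Assumption: a section's profile is the mean of its chapters' profiles.
        section_freq = chapter_freqs[rows].mean(axis=0)
        for c in range(n_chapters):
            deltas[c, s] = burrows_delta(section_freq, chapter_freqs[c], mean, std)
    return deltas


def column_slopes(deltas):
    """Least-squares slope of each column's Delta values against chapter position."""
    positions = np.arange(deltas.shape[0])
    return np.array([np.polyfit(positions, deltas[:, s], 1)[0]
                     for s in range(deltas.shape[1])])


# Synthetic stand-in for real word counts: 215 chapters, 6 words whose
# frequencies drift steadily across the dictation sequence, plus noise.
rng = np.random.default_rng(0)
n_chapters, n_words = 215, 6
progress = np.linspace(0.0, 1.0, n_chapters)[:, None]
drift = 0.01 * rng.choice([-1.0, 1.0], size=n_words)
chapter_freqs = 0.03 + drift * progress + rng.normal(0.0, 0.002, (n_chapters, n_words))

deltas = delta_matrix(chapter_freqs)   # 215 rows by 21 columns
slopes = column_slopes(deltas)         # one y-value per section
print(deltas.shape)
print(np.round(slopes, 4))
```

Because the synthetic frequencies drift steadily by construction, the early sections come out with positive slopes and the late sections with negative ones. That is the downward pattern the real data would need to show for the hypothesis to hold.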
[1] To download my data, click here. I performed two control tests, one on the Book of Mormon arranged according to 1 Nephi-priority, and another on the King James Bible. Neither test produced a pattern consistent with monotonic stylistic drift, though the KJV chart is actually pretty interesting in its own right, exhibiting text clusters that likely reflect the influence of both genre and different translation committees. Follow the hyperlinks to see the charts.
[2] For an introduction to Delta, see John Burrows, “‘Delta’: A Measure of Stylistic Difference and a Guide to Likely Authorship,” Literary and Linguistic Computing 17, no. 3 (2002): 267–87.
[3] In the regression analysis for each column, I excluded the rows corresponding to chapters that are part of that column’s corresponding ten-chapter section. This was to avoid a sort of “false positive” effect. Delta distances between a ten-chapter section and the chapters that make it up would tend to be smaller than Delta distances to other chapters, potentially resulting in a linear, downward-sloping plot even in the absence of any stylistic drift on the author’s part. I’m grateful to Bruce Schaalje for bringing this problem and its solution to my attention.
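The effect described in note [3] is easy to reproduce with a toy column. The Python sketch below is illustrative only; the helper name `slope_excluding_own_chapters` and the made-up distances are assumptions, not part of my spreadsheet. It builds a column with no drift at all, lets the section's own ten chapters sit artificially close, and compares the fitted slope with and without those rows.

```python
import numpy as np


def slope_excluding_own_chapters(distances, own_start, own_stop):
    """Least-squares slope of distance vs. chapter position, skipping the
    rows [own_start, own_stop) that belong to the section itself."""
    distances = np.asarray(distances, dtype=float)
    positions = np.arange(len(distances))
    keep = np.ones(len(distances), dtype=bool)
    keep[own_start:own_stop] = False
    return np.polyfit(positions[keep], distances[keep], 1)[0]


# Toy column for the first section: no drift at all (every other chapter is
# equally distant), but the section's own ten chapters sit artificially close.
column = np.full(215, 1.0)
column[0:10] = 0.5

naive_slope = np.polyfit(np.arange(215), column, 1)[0]
masked_slope = slope_excluding_own_chapters(column, 0, 10)
print(naive_slope > 0)            # the self-similarity dip alone tilts the fit upward
print(abs(masked_slope) < 1e-9)   # with own rows dropped, the flat column has no slope
```

For the first section the spurious slope is positive, and by symmetry it would be negative for the last section. Left uncorrected, that alone could tilt the final chart downward even if the text's style never changed.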