Efficient n-gram generation in 1984
In November 1983, Brian Hayes wrote a Computer Recreations article in Scientific American on n-gram generation.
Several readers suggested generating text directly from the corpus itself instead of from a probability table.
See the last page of this article:
http://bit-player.org/wp-content/extras/bph-publications/SciAm-1983-11-Hayes-drivel.pdf
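The readers' suggestion can be illustrated with a minimal sketch. It is not taken from the article: the function name `generate` and the parameters `order`, `length`, and `seed` are assumptions made for this example. At each step it finds every occurrence of the current context in the corpus and picks one of the following characters at random, which samples from the same distribution a probability table would hold without ever building the table.

```python
import random


def generate(corpus: str, order: int, length: int, seed: int = 0) -> str:
    """Generate up to `length` characters using `corpus` as the only model.

    `order` is the number of preceding characters used as context.
    All names here are illustrative, not from the original article.
    """
    if order < 1 or len(corpus) <= order:
        raise ValueError("corpus must be longer than the context order")
    rng = random.Random(seed)

    # Seed the output with a context that is guaranteed to have a follower.
    start = rng.randrange(len(corpus) - order)
    out = corpus[start:start + order]

    while len(out) < length:
        context = out[-order:]
        # Collect the character after every occurrence of the context.
        followers = []
        pos = corpus.find(context)
        while pos != -1:
            nxt = pos + order
            if nxt < len(corpus):
                followers.append(corpus[nxt])
            pos = corpus.find(context, pos + 1)
        if not followers:
            break  # context appears only at the very end of the corpus
        # A uniform choice over occurrences matches the n-gram frequencies.
        out += rng.choice(followers)
    return out[:length]


if __name__ == "__main__":
    sample = "the quick brown fox jumps over the lazy dog and the quick cat"
    print(generate(sample, order=3, length=40, seed=1))
```

This sketch rescans the whole corpus for every generated character, so it is simple rather than efficient. An index over the corpus, such as the suffix array described in the Manber and Myers paper listed on this page, could presumably replace the linear scan with a binary search.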
Manber and Myers' 1990 SODA paper on Suffix Arrays:
https://courses.cs.washington.edu/courses/cse590q/00au/papers/manber-myers_soda90.pdf
Ed Fredkin's 1959 paper on Trie Memory:
https://dl.acm.org/doi/pdf/10.1145/367390.367400
Fredkin cites these early papers:
J. C. R. LICKLIDER AND N. BURTON, Long range constraints in the statistical structure of printed English, American Journal of Psychology, Vol 68 (1955), 650-653.
C. E. SHANNON, Prediction and entropy in printed English, Bell System Technical Journal, Vol 30 (1951), 50-64.