Tuesday, 7 April 2026

Second dip into AI for transcribing documents

Again using the new ScribeAI from MyHeritage (MH), I tried another difficult document. This one was from the Court of Common Pleas in 1523. It was a short document that had previously been transcribed and translated professionally by someone well versed in such records. The document is in highly abbreviated Latin in a neat secretary hand, and the image was clear.

MH described the document as "This document is an entry from the Court of Common Pleas (identified by the 'CP 40' reference in the filename)" and went on to give more information about the court.

The transcription and translation were much better than expected, although some text was added that did not come from the document. Helpfully, the transcription showed the characters omitted by the abbreviations in square brackets. Unfortunately, it also got a couple of names wrong.

For a quick look, the results were good (impressive, even). If a document looks to be of interest, though, it needs closer evaluation, as the AI output on its own can mislead.

Interestingly, the AI transcriptions (at least the ones I have played with) do not mark characters as missing where they are obscured by a mark on the parchment or paper, a crease, or a tear at the edge of a page. Instead, the missing characters seem to be guessed. I assume it takes human intelligence to recognise such damage.