Inscriptions are one of the main direct sources of new evidence from the ancient world, but most have suffered damage over the centuries, leaving parts of the text illegible or lost (Figure 1). Restoring the missing or damaged text is one of the main undertakings of the discipline of epigraphy. It is a complex and time-consuming task: ancient historians estimate the likelihood of different possible solutions from context clues in the inscription, such as grammatical and linguistic considerations, layout and shape, textual parallels, and historical context. Now, using machine learning trained on ancient texts, researchers at the Faculty of Classics at the University of Oxford (Thea Sommerschield and Professor Jonathan Prag) and Google DeepMind (Yannis Assael) have built Pythia, the first ancient text restoration model that recovers missing characters from a damaged text input using deep neural networks. Bringing together the disciplines of ancient history and deep learning, this work offers a fully automated aid to the text restoration task, providing ancient historians with multiple textual restorations, each with its own confidence level.
Pythia takes a sequence of damaged text as input, and is trained to predict character sequences comprising hypothesised restorations of ancient Greek inscriptions. The architecture works at both the character and word level, which lets it handle long-term context information effectively and deal efficiently with incomplete word representations (Figure 2). This makes it applicable to all disciplines dealing with ancient texts (philology, papyrology, codicology) and to any language (ancient or modern). To train Pythia, the largest digital corpus of ancient Greek inscriptions (PHI Greek Inscriptions) was converted to machine-actionable text (called PHI-ML). On PHI-ML, Pythia’s predictions achieve a 30.1% character error rate, compared to 57.3% for the human epigraphists evaluated. Moreover, in 73.5% of cases the ground-truth sequence was among Pythia’s Top-20 restoration hypotheses, which demonstrates the impact of this assistive method on the field of digital epigraphy and sets the state of the art in ancient text restoration.
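The two evaluation figures quoted above, character error rate and Top-20 accuracy, can be illustrated with a minimal sketch. It assumes the common definition of character error rate (total edit distance divided by total number of ground-truth characters) and is not taken from the authors' code. The function names and the toy data are hypothetical, and the strings being compared are assumed to share the same Unicode normalization.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insert, delete, substitute; cost 1 each)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at cost 0
            ))
        prev = curr
    return prev[-1]


def character_error_rate(predictions, targets):
    """Assumed definition: summed edit distance over summed target length."""
    total_edits = sum(levenshtein(p, t) for p, t in zip(predictions, targets))
    total_chars = sum(len(t) for t in targets)
    return total_edits / total_chars


def top_k_accuracy(ranked_hypotheses, targets, k=20):
    """Fraction of cases whose ground truth appears among the first k hypotheses."""
    hits = sum(t in hyps[:k] for hyps, t in zip(ranked_hypotheses, targets))
    return hits / len(targets)


# Hypothetical toy data: two gaps, each with a ranked list of restorations.
targets = ["γα", "δη"]
ranked = [["γα", "γο"], ["δε", "δο"]]
best = [hyps[0] for hyps in ranked]  # highest-ranked hypothesis per gap

cer = character_error_rate(best, targets)
top20 = top_k_accuracy(ranked, targets, k=20)
```

In this toy case the first gap is restored exactly and the second is off by one character, so one edit is counted over four ground-truth characters, and the ground truth appears in the hypothesis list for one of the two gaps.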
The combination of machine learning and epigraphy has the potential to meaningfully impact the study of inscribed texts and widen the scope of the historian’s work. For this reason, an online Python notebook, Pythia, and PHI-ML’s processing pipeline have been open-sourced on GitHub. In doing so, the authors hope to aid future research and inspire further interdisciplinary work.
● Read the article in the New Scientist
● This work has been accepted by EMNLP 2019.
● Read more about this work in the DeepMind blog post.
● Read the preprint of the article “Restoring ancient text using deep learning: a case study on Greek epigraphy” on arXiv.
Figure 1: Damaged inscription: a decree of the Athenian Assembly relating to the management of the Acropolis (dated 485/4 BCE). IG I³ 4B. (CC BY-SA 3.0, Wikimedia)
Figure 2: Pythia processing the phrase μηδέν ἄγαν (Mēdèn ágan), "nothing in excess", a fabled maxim inscribed on Apollo’s temple in Delphi. The letters "γα" are the characters to be predicted, and are annotated with ‘?’. Since ἄ??ν is not a complete word, its embedding is treated as unknown (‘unk’). The decoder correctly outputs "γα".
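The input representation described in Figure 2 can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function `encode`, the constants `MISSING` and `UNK`, and the choice to pair every character with the word it belongs to are all hypothetical. The only details taken from the caption are that characters to be predicted are marked with ‘?’ and that a word containing them is treated as unknown (‘unk’).

```python
MISSING = "?"  # marks a character to be predicted, as in Figure 2
UNK = "unk"    # stands in for any word containing a missing character


def encode(text: str):
    """Return parallel character-level and word-level views of `text`.

    Assumption: each character position is paired with the word it belongs
    to, so both streams have the same length. Incomplete words become UNK;
    a space is paired with itself.
    """
    chars, words = [], []
    for word in text.split(" "):
        token = UNK if MISSING in word else word
        for ch in word:
            chars.append(ch)
            words.append(token)
        # separator between words
        chars.append(" ")
        words.append(" ")
    # drop the separator that was added after the last word
    return chars[:-1], words[:-1]


phrase = "μηδέν ἄ??ν"
chars, words = encode(phrase)
for ch, w in zip(chars, words):
    print(repr(ch), w)
```

Every character of μηδέν is paired with the full word, while every position of ἄ??ν, including the two ‘?’ marks, is paired with ‘unk’. A model reading both streams therefore still sees the visible letters of the damaged word even though no word-level information is available for it.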