Commit 4682866 (committed by wjbmattingly)
1 parent: 56a0241

added catmus citation
Files changed:
- README.md +44 -0
- send_image.py +2 -2
README.md
CHANGED
@@ -10,3 +10,47 @@ license: cc-by-sa-4.0
 
 This is a simple [Kraken](https://kraken.re/main/index.html) FastAPI. It is designed to allow users to obtain line segmentation simply from a Kraken model. I have plans to expand this to include all other Kraken inputs including OCR.
 
+
+# Models
+
+## Catmus Medieval
+
+```bibtex
+@article{pinche_clérice_chagué_camps_vlachou-efstathiou_gille levenson_brisville-fertin_boschetti_fischer_gervers_et al._2024,
+title={CATMuS Medieval},
+DOI={10.5281/zenodo.12743230},
+abstractNote={CATMuS (Consistent Approach to Transcribing ManuScript) Medieval is a Kraken HTR model trained on four different languages (in descending order of importance in the dataset: Old and Middle French, Latin, Spanish (and other languages of Spain), Italian) on strictly graphematic transcriptions. No abbreviations are resolved.
+
+This model is the result of the collaboration from researchers from CREMMA, GalliCorpora, HTRomance and DEEDS projects. It follows the CREMMA Guidelines (Supplemented by the CREMMA Medii Aevi) and will be consolidated under the CATMuS Medieval Guidelines in an upcoming paper.
+
+The model is trained with NFD Unicode normalization: each diacritic (including superscripts) are transcribed as their own characters, separately from the "main" character.
+
+Data
+
+See https://huggingface.co/datasets/CATMuS/medieval
+
+All source datasets and papers are referenced in the related works section, all transcribers are mentioned in the collaborators section, all partner-project members are mentioned as authors.
+
+Fundings
+
+
+
+CREMMA, DIM MAP, Région Île-de-France
+
+CremmaLab, DIM MAP, Région Île-de-France
+
+GalliCorpora, Datalab, Bibliothèque nationale de France
+
+HTRomance, Datalab, Bibliothèque nationale de France
+
+Text as Image, Image as Text: Charter integrity and topic modelling, SSHRCC 1350911
+
+Les Décades de Bersuire, première traduction française de l'Histoire romaine de Tite-Live – LiBer, ANR 21-CE27-0008
+
+Projet Fabliaux, Biblissima+, ANR 21-ESRE-0005},
+publisher={Zenodo},
+author={Pinche, Ariane and Clérice, Thibault and Chagué, Alix and Camps, Jean-Baptiste and Vlachou-Efstathiou, Malamatenia and Gille Levenson, Matthias and Brisville-Fertin, Olivier and Boschetti, Federico and Fischer, Franz and Gervers, Michael and et al.},
+year={2024},
+month={Jul}
+}
+```
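
Note on the NFD remark in the citation above: because the model is described as trained with NFD normalization, its output would be expected to carry diacritics as separate combining characters. The sketch below is not part of this commit; it only illustrates what NFD does, using Python's standard `unicodedata` module. The helper names `to_nfd` and `same_transcription` are hypothetical, and it assumes you want to compare model output against a reference transcription that may be stored in precomposed (NFC) form.

```python
import unicodedata


def to_nfd(text: str) -> str:
    """Return text in Unicode Normalization Form D (canonical decomposition)."""
    return unicodedata.normalize("NFD", text)


def same_transcription(predicted: str, reference: str) -> bool:
    """Compare two strings after bringing both to NFD (hypothetical helper)."""
    return to_nfd(predicted) == to_nfd(reference)


composed = "\u00e9"            # "é" stored as one precomposed code point
decomposed = to_nfd(composed)  # "e" followed by U+0301 COMBINING ACUTE ACCENT

print(len(composed), len(decomposed))
print(same_transcription("\u00e9", "e\u0301"))
```

Normalizing both sides before comparison avoids counting a precomposed and a decomposed form of the same letter as a mismatch.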
send_image.py
CHANGED
@@ -2,8 +2,8 @@ import requests
 import os
 
 # API endpoint
-
-url = "http://127.0.0.1:8000/process_all"
+url = "https://wjbmattingly-kraken-api.hf.space/ocr"
+# url = "http://127.0.0.1:8000/process_all"
 
 # Path to the image file
 image_path = os.path.join("data", "ms.jpg")
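
The change above switches between the hosted and local endpoints by commenting one line in or out. A possible alternative, sketched here and not taken from the repository, is to pick the endpoint from an environment variable so the script does not need editing. The variable name `KRAKEN_API_URL`, the helper names, the multipart field name `"file"`, and the timeout are all assumptions; the field name the API actually expects is not visible in this diff.

```python
import os

# The two endpoints that appear in the diff above.
HOSTED_URL = "https://wjbmattingly-kraken-api.hf.space/ocr"
LOCAL_URL = "http://127.0.0.1:8000/process_all"


def resolve_url(env=None) -> str:
    """Return the endpoint from KRAKEN_API_URL (hypothetical name), else the hosted one."""
    env = os.environ if env is None else env
    return env.get("KRAKEN_API_URL", HOSTED_URL)


def send_image(image_path: str, url: str = "", field_name: str = "file"):
    """POST an image as multipart form data and return the response object.

    `field_name` is an assumption; check the FastAPI route for the real name.
    """
    import requests  # third-party, imported lazily so the helpers above need only the stdlib

    target = url or resolve_url()
    with open(image_path, "rb") as handle:
        files = {field_name: (os.path.basename(image_path), handle, "image/jpeg")}
        response = requests.post(target, files=files, timeout=120)
    response.raise_for_status()  # raise on 4xx/5xx instead of failing silently
    return response
```

With this in place, `send_image(os.path.join("data", "ms.jpg"))` would post to the hosted Space by default, and setting `KRAKEN_API_URL=http://127.0.0.1:8000/process_all` in the shell would target a local server instead.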