SOFA

Syntactic Old French Annotator

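SOFA provides lemmatisation, Part-of-Speech tagging, dependency parsing and graphical visualization of the resulting annotations for Old French texts.
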
Lemmatisation

SOFA predicts lemmas for most words. This step still needs improvement, but it is already useful.

Lemmatisation is performed with TreeTagger, using the Old French parameter files kindly provided by Achim Stein.
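When run with its `-token` and `-lemma` options, TreeTagger writes one token per line as tab-separated form, tag and lemma, and it prints `<unknown>` in the lemma column when it has no lemma for a word. This is why lemmas are only available for most words. The Python sketch below assumes that three-column output, parses it and measures lemma coverage. The fallback to the lowercased form, the function names and the Cattex-style tags and lemmas in the sample are illustrative assumptions, not SOFA's actual code or real tagger output.

```python
from typing import List, NamedTuple


class TaggedToken(NamedTuple):
    form: str
    tag: str
    lemma: str


# Illustrative TreeTagger-style output (form<TAB>tag<TAB>lemma); tags and lemmas
# are made up for the example, not produced by the real Old French model.
SAMPLE = "Li\tDETdef\tle\nrois\tNOMcom\troi\nest\tVERcjg\testre\nvenuz\tVERppe\t<unknown>\n"


def parse_treetagger_output(text: str, fallback_to_form: bool = True) -> List[TaggedToken]:
    """Parse three-column TreeTagger output into TaggedToken tuples.

    If a lemma is '<unknown>' and fallback_to_form is True, the lowercased
    surface form is used instead (a hypothetical strategy, not SOFA's).
    """
    tokens = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        form, tag, lemma = line.split("\t")
        if lemma == "<unknown>" and fallback_to_form:
            lemma = form.lower()
        tokens.append(TaggedToken(form, tag, lemma))
    return tokens


def lemma_coverage(text: str) -> float:
    """Share of tokens for which TreeTagger returned a lemma."""
    lemmas = [line.split("\t")[2] for line in text.splitlines() if line.strip()]
    if not lemmas:
        return 0.0
    known = sum(1 for lemma in lemmas if lemma != "<unknown>")
    return known / len(lemmas)


if __name__ == "__main__":
    for token in parse_treetagger_output(SAMPLE):
        print(token)
    print(f"lemma coverage: {lemma_coverage(SAMPLE):.2f}")  # 0.75 on SAMPLE
```

Keeping `<unknown>` visible (with `fallback_to_form=False`) makes it easy to count how many words still lack a lemma.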

Part-of-Speech

Each word is annotated with a Part-of-Speech tag using the Conditional Random Fields (CRF) implementation of Wapiti. We trained CRF models specifically for Old French; the feature template we used can be found in the source code.
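Wapiti, like CRF++, reads data as one token per line with whitespace-separated observation columns, a blank line between sentences and, for training, the gold tag in the last column. Template patterns such as `%x[-1,0]` then pick a column value at a relative token offset. The Python sketch below imitates that process. Its feature columns, the `TEMPLATE` list and all function names are hypothetical, since SOFA's actual template is only available in its source code, and the Cattex-style tags are illustrative.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical template: (relative token offset, column index) pairs, in the
# spirit of CRF++/Wapiti %x[offset,column] patterns.
TEMPLATE: List[Tuple[int, int]] = [(-1, 0), (0, 0), (1, 0), (0, 2)]


def token_features(word: str) -> List[str]:
    """Hypothetical observation columns for one token."""
    lower = word.lower()
    return [
        word,                                                # 0: surface form
        lower,                                               # 1: lowercased form
        lower[-3:],                                          # 2: last three characters
        "CAP" if word[:1].isupper() else "LOW",              # 3: capitalisation
        "WRD" if any(c.isalnum() for c in word) else "PUN",  # 4: word or punctuation
    ]


def expand_template(rows: Sequence[Sequence[str]], position: int,
                    template: Sequence[Tuple[int, int]] = TEMPLATE) -> List[str]:
    """Instantiate each template pattern at one token position."""
    features = []
    for k, (offset, column) in enumerate(template):
        i = position + offset
        if 0 <= i < len(rows):
            value = rows[i][column]
        else:
            value = "_BOS_" if i < 0 else "_EOS_"  # padding outside the sentence
        features.append(f"u{k}={value}")
    return features


def to_wapiti_columns(sentences: Sequence[Sequence[str]],
                      tags: Optional[Sequence[Sequence[str]]] = None) -> str:
    """Write sentences as token-per-line columns, one blank line between sentences.

    When tags are given (training data), the gold tag is appended as last column.
    """
    blocks = []
    for s, sentence in enumerate(sentences):
        lines = []
        for t, word in enumerate(sentence):
            columns = token_features(word)
            if tags is not None:
                columns.append(tags[s][t])
            lines.append(" ".join(columns))
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    sentence = ["Li", "rois", "est", "venuz", "."]
    rows = [token_features(w) for w in sentence]
    print(expand_template(rows, 0))  # ['u0=_BOS_', 'u1=Li', 'u2=rois', 'u3=li']
    print(to_wapiti_columns([sentence], [["DETdef", "NOMcom", "VERcjg", "VERppe", "PONfrt"]]))
```

Padding with `_BOS_`/`_EOS_` lets patterns that look at neighbouring tokens fire at sentence boundaries instead of being silently dropped.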

Dependency Parsing

Building on the lemmas and Part-of-Speech tags produced by the previous steps, syntactic (dependency) annotation is performed with Mate tools models specifically trained for Old French.
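Mate tools' parser works on files in the tab-separated CoNLL-2009 format, in which predicted lemmas and Part-of-Speech tags go in the PLEMMA and PPOS columns and predicted governors and labels go in PHEAD and PDEPREL. Assuming SOFA chains its steps through that format (the text above does not say so), the Python sketch below builds parser input for one sentence from the lemmatiser and tagger output, reads predicted heads and labels back, and checks that they form a tree. The function names, the copying of predicted values into the gold columns, the `_` placeholders and the single-root requirement are all assumptions.

```python
from typing import List, Sequence, Tuple

# CoNLL-2009 fixed columns (0-based): 0 ID, 1 FORM, 2 LEMMA, 3 PLEMMA, 4 POS,
# 5 PPOS, 6 FEAT, 7 PFEAT, 8 HEAD, 9 PHEAD, 10 DEPREL, 11 PDEPREL, 12 FILLPRED, 13 PRED
N_COLUMNS = 14


def to_conll09(tokens: Sequence[Tuple[str, str, str]]) -> str:
    """Build parser input for one sentence from (form, lemma, pos) triples.

    Predicted lemma/POS are also copied into the gold LEMMA/POS columns and all
    other columns are '_' placeholders (assumptions, not SOFA's exact layout).
    """
    rows = []
    for i, (form, lemma, pos) in enumerate(tokens, start=1):
        columns = [str(i), form, lemma, lemma, pos, pos] + ["_"] * (N_COLUMNS - 6)
        rows.append("\t".join(columns))
    return "\n".join(rows) + "\n"


def read_predicted_tree(conll: str) -> List[Tuple[str, int, str]]:
    """Return (form, predicted head, predicted label) for each token line."""
    tree = []
    for line in conll.splitlines():
        if not line.strip():
            continue
        columns = line.split("\t")
        tree.append((columns[1], int(columns[9]), columns[11]))
    return tree


def is_well_formed(tree: Sequence[Tuple[str, int, str]]) -> bool:
    """Check heads are in range, there is exactly one root (head 0) and no cycle."""
    heads = [head for _, head, _ in tree]
    n = len(heads)
    if any(h < 0 or h > n for h in heads) or heads.count(0) != 1:
        return False
    for start in range(1, n + 1):
        seen = set()
        node = start
        while node != 0:  # follow head links up to the artificial root
            if node in seen:
                return False  # revisited a token: cycle
            seen.add(node)
            node = heads[node - 1]
    return True
```

Checking well-formedness before display is a cheap safeguard, since a viewer drawing arcs from PHEAD assumes every token ultimately reaches the root.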

Visualization

SOFA uses the Arborator view developed by Kim Gerdes to display each annotated sentence graphically, so that syntactic annotations can be explored easily.

Three scores are also displayed: Part-of-Speech accuracy, the Unlabelled Attachment Score (UAS), which measures governor prediction, and the Labelled Attachment Score (LAS), which additionally requires the correct dependency label.
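All three scores are per-token proportions: Part-of-Speech accuracy counts tokens with the correct tag, UAS counts tokens attached to the correct governor, and LAS counts tokens with both the correct governor and the correct label, so LAS can never exceed UAS. The Python sketch below computes them, assuming every token (punctuation included) is counted; SOFA's exact scoring conventions are not described here, and the names, tags and labels are illustrative.

```python
from typing import Dict, NamedTuple, Sequence


class Token(NamedTuple):
    pos: str    # Part-of-Speech tag
    head: int   # index of the governor (0 = root)
    label: str  # dependency label


def attachment_scores(gold: Sequence[Token], pred: Sequence[Token]) -> Dict[str, float]:
    """POS accuracy, UAS and LAS over all tokens of aligned gold/predicted sequences."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted sequences must be aligned token by token")
    if not gold:
        raise ValueError("cannot score an empty sequence")
    n = len(gold)
    pos_ok = sum(g.pos == p.pos for g, p in zip(gold, pred))
    uas_ok = sum(g.head == p.head for g, p in zip(gold, pred))
    las_ok = sum(g.head == p.head and g.label == p.label for g, p in zip(gold, pred))
    return {"pos_accuracy": pos_ok / n, "uas": uas_ok / n, "las": las_ok / n}


if __name__ == "__main__":
    gold = [Token("DETdef", 2, "det"), Token("NOMcom", 3, "subj"), Token("VERcjg", 0, "root")]
    pred = [Token("DETdef", 2, "det"), Token("NOMcom", 3, "obj"), Token("VERinf", 0, "root")]
    # POS accuracy 2/3, UAS 3/3, LAS 2/3 for this pair
    print(attachment_scores(gold, pred))
```

Here every governor is right but one label and one tag are wrong, which shows how UAS and LAS can diverge independently of tagging accuracy.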

Related papers


Treebanks and Linguistic Theories - 2014

Gaël Guibon, Isabelle Tellier, Mathieu Constant, Sophie Prévost, Kim Gerdes. Parsing Poorly Standardized Language Dependency on Old French. In V. Henrich, E. Hinrichs, D. de Kok, P. Osenova & A. Przepiórkowski (eds.), Proceedings of the Thirteenth International Workshop on Treebanks and Linguistic Theories (TLT13), Tübingen, Germany, December 2014, pp. 51-61. http://tlt13.sfs.uni-tuebingen.de/. hal-01250959v2

Traitement Automatique des Langues Naturelles - 2015

Gaël Guibon, Isabelle Tellier, Sophie Prévost, Mathieu Constant, Kim Gerdes. Analyse syntaxique de l'ancien français : quelles propriétés de la langue influent le plus sur la qualité de l'apprentissage ? [Parsing Old French: which properties of the language most influence learning quality?]. In Actes de TALN 22 (online proceedings), Caen, France, June 2015. https://taln2015.greyc.fr/articlesenlignetaln/. hal-01251006

Treebanks and Linguistic Theories - 2015

Gaël Guibon, Isabelle Tellier, Sophie Prévost, Mathieu Constant, Kim Gerdes. Searching for Discriminative Metadata of Heterogenous Corpora. In Markus Dickinson, Erhard Hinrichs, Agnieszka Patejuk & Adam Przepiórkowski (eds.), Proceedings of the Fourteenth International Workshop on Treebanks and Linguistic Theories (TLT14), Warsaw, Poland, December 2015, pp. 72-82. http://tlt14.ipipan.waw.pl/proceedings/. hal-01250981

About us


Gaël Guibon
Isabelle Tellier
Sophie Prévost
Mathieu Constant
Kim Gerdes

Affiliations


LaTTiCe-CNRS (laboratory)
LIGM-CNRS (laboratory)
LPP-CNRS (laboratory)
Sorbonne-Nouvelle (university)
UPEM (university)