Explanations of words and acronyms

ADL – Arkiv for Dansk Litteratur = Danish National Archive of Literature

aligned texts –  texts where the original text and its translation are paired sentence by sentence

AU – Aarhus Universitet = University of Aarhus

BLARK – Basic Language Resource Kit = a toolbox of fundamental language resources such as corpora and taggers

CA - Conversation Analysis

CBS - Copenhagen Business School

CLARIN - Common Language Resources and Technology Infrastructure

Corpus (plural: corpora) = an electronic collection of texts used for linguistic, literary or language technology purposes


Domain

  1. A specific area of knowledge mastered by a group of individuals, known as domain experts, who have achieved this status through education and/or their profession.
  2. A specific area of a computer network under which network devices are organized. On the internet, domain names often reflect the name of the individual, institution or company to whom the domain belongs. Domains are organized by extensions representing country codes (e.g. “dk” in www.dr.dk) or the purpose of the domain (e.g. “com”, “org”).

DSL – Dansk Sprog- og Litteraturselskab = the Society for Danish Language and Literature

DSN - Dansk Sprognævn = Danish Language Council

DUDS – Danish Under Digital Study

ESFRI – European Strategy Forum for Research Infrastructures

HLT – Human Language Technology

KB – Det Kongelige Bibliotek (Danmarks Nationalbibliotek) = the Royal Library (the Danish National Library)

KU - Københavns Universitet = University of Copenhagen

KU-LAN - Københavns Universitet LANCHART = Danish National Research Foundation Centre for Language Change in Real Time at the University of Copenhagen

KU-CST – Københavns Universitet - Center for Sprogteknologi = Centre for Language Technology at the University of Copenhagen

KU-INSS - Københavns Universitet - Institut for Nordiske Studier og Sprogvidenskab = Department of Scandinavian Studies and Linguistics at the University of Copenhagen

Lemma = the dictionary form of a word

Lemmatiser (lemmatization) = a tool which associates actual word forms with their corresponding lemmas
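As a rough illustration of what a lemmatiser does, the sketch below maps word forms to lemmas with a small lookup table. The table `LEMMAS` and the function `lemmatize` are hypothetical names invented for this example; real lemmatisers typically rely on morphological rules, large lexicons and context rather than a hand-written dictionary.

```python
# Toy lookup table from word forms to lemmas (invented for illustration).
LEMMAS = {
    "girls": "girl",
    "gave": "give",
    "apples": "apple",
}

def lemmatize(token: str) -> str:
    """Return the lemma for a word form, or the form itself if it is unknown."""
    return LEMMAS.get(token.lower(), token)

print([lemmatize(t) for t in ["He", "gave", "the", "girls", "apples"]])
```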

LGP - Language for General Purposes (Danish: alment sprog)

LSP - Language for Special Purposes (Danish: fagsprog)

MOVIN - Microanalysis Of Verbal/non-Verbal/Visual INteraction = the Danish network for scholars of interaction analysis and conversation analytic studies

Multimodal = containing information which is mediated through several semiotic channels, e.g. a TV transmission including sound, video and subtitles

MUMIN – Nordic Network for Multimodal Interfaces (the project was closed)

NatMus - Nationalmuseet = the National Museum of Denmark

OCR - Optical Character Recognition

Parallel corpora

1.  aligned texts where the original text and its translation are paired sentence by sentence

2.  texts in two languages about the same subject
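Sense 1 (sentence-aligned texts) can be pictured as a list of sentence pairs. The following is a minimal sketch, assuming the sentences have already been split and matched one-to-one; the Danish and English sentences and the function name `align` are invented for illustration, and real alignment tools must also handle sentences that are merged, split or omitted in translation.

```python
def align(source: list[str], target: list[str]) -> list[tuple[str, str]]:
    """Pair each source sentence with its translation, sentence by sentence."""
    if len(source) != len(target):
        raise ValueError("one-to-one alignment needs equally many sentences")
    return list(zip(source, target))

# Invented example sentences: Danish original and English translation.
danish = ["Han gav pigerne et æble.", "De takkede ham."]
english = ["He gave the girls an apple.", "They thanked him."]

aligned = align(danish, english)
for original, translation in aligned:
    print(original, "<->", translation)
```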

PoS - Part of Speech 

POS-tagging - marking up each word in a text with its word class (part of speech), based on the definition of the word and its context

Repository – databank consisting of various corpora

SDU – Syddansk Universitet = University of Southern Denmark

Tagger = A tool which automatically enriches a text with linguistic information. Often used in the sense of a part-of-speech tagger classifying each token in a text as belonging to a particular part-of-speech.

TEI – Text Encoding Initiative = a consortium of institutions and research projects that collaborate to develop and maintain a standard for the representation of texts in digital form

Token – An actual word form in a text, e.g. "girls" in the sentence "He gave the girls an apple".

Tokenization = the automatic operation of splitting a text string into a sequence of tokens.
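A minimal sketch of tokenization, assuming a simple rule: every run of letters or digits is one token and every punctuation mark is a token of its own. The function name `tokenize` is hypothetical, and real tokenizers handle many more cases (abbreviations, numbers with separators, hyphenated words).

```python
import re

def tokenize(text: str) -> list[str]:
    """Split a text string into word tokens and single punctuation tokens."""
    # \w+ matches runs of letters/digits (including æ, ø, å);
    # [^\w\s] matches one character that is neither a word character nor whitespace.
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("He gave the girls an apple."))
# ['He', 'gave', 'the', 'girls', 'an', 'apple', '.']
```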

VTU - Ministeriet for Videnskab, Teknologi og Udvikling = Ministry of Science, Technology and Innovation

XML – eXtensible Markup Language = a markup language used to describe data and data structures

WAYF – Where Are You From = the connection between the login systems at the connected institutions and external web-based services. WAYF access is a standard for secure identification of users, e.g. students and staff at a university, when they log in to a website.

WP – work package