TXM-Kurzreferenz Textanalyse

This quick reference for the text-analysis tool "TXM" - http://textometrie.ens-lyon.fr/ - was created by Christof Schöch (Lehrstuhl für Computerphilologie, Universität Würzburg). The document serves as an introductory quick reference to a few of TXM's most basic functions. It is aimed at beginners and cannot replace the comprehensive manual, which should be consulted for specific questions.

Material Type: Diagram/Illustration, Reading

Authors: Christof Schöch, Lehrstuhl für Computerphilologie

Einführung in Python für Nicht-Informatiker

An introduction to Python for non-computer-scientists, created as part of a BA Digital Humanities degree programme. This project comprises IPython notebooks containing teaching materials for a Python course. This beginners' course in Python is aimed primarily at humanities scholars and therefore assumes no computer-science knowledge. The notebooks provide a slideshow, with the advantage that the code is guaranteed to be correct. At present, the notebooks are not intended to be used by the learners themselves.

Material Type: Data Set, Lecture Notes, Lesson Plan

Authors: Lehrstuhl für Computerphilologie, Universität Würzburg

The Programming Historian 2: Understanding Regular Expressions

In this exercise we will use advanced find-and-replace capabilities in a word processing application in order to make use of structure in a brief historical document that is essentially a table in the form of prose. Without using a general programming language, we will gain exposure to some aspects of computational thinking, especially pattern matching, that can be immediately helpful to working historians (and others) using word processors, and can form the basis for subsequent learning with more general programming environments.
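The find-and-replace patterns the lesson builds in a word processor correspond directly to regular expressions in a programming language. As a rough illustration only (the prose-table line and its capture groups below are invented, not taken from the lesson's actual historical document), a Python sketch of the same pattern-matching idea:

```python
import re

# Hypothetical example: a prose sentence that is really a table row.
line = "Smith, John, baker, 42 High Street."

# Capture surname, given name, occupation, and address as groups.
pattern = re.compile(r"(\w+), (\w+), ([\w ]+), ([\w .]+)\.")

m = pattern.match(line)
if m:
    surname, given, occupation, address = m.groups()
    # Reassemble the captured pieces as a tab-separated record,
    # ready to paste into a spreadsheet.
    record = "\t".join([surname, given, occupation, address])
```

The same capture-and-reorder move is what the lesson performs with a word processor's "use wildcards" replace, only there the groups are referenced as \1, \2, and so on.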

Material Type: Diagram/Illustration, Homework/Assignment

Author: Doug Knox

The Programming Historian 2: Cleaning Data with OpenRefine

Don't take your data at face value. That is the key message of this tutorial, which focuses on how scholars can diagnose and act upon the accuracy of data. In this lesson, you will learn the principles and practice of data cleaning, as well as how OpenRefine can be used to perform four essential tasks that will help you clean your data:

1. Remove duplicate records
2. Separate multiple values contained in the same field
3. Analyse the distribution of values throughout a data set
4. Group together different representations of the same reality

These steps are illustrated with a series of exercises based on a collection of metadata from the Powerhouse museum, demonstrating how (semi-)automated methods can help you correct the errors in your data.
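OpenRefine performs these four tasks interactively, but the same operations can be sketched in plain Python for readers who want to see what each step does to the data. The records below are invented for illustration, and the fingerprint function is only a naive stand-in for OpenRefine's key-collision clustering:

```python
from collections import Counter

# Hypothetical records exhibiting the problems the lesson targets:
records = [
    {"title": "Steam engine", "categories": "Engineering|Transport"},
    {"title": "Steam engine", "categories": "Engineering|Transport"},  # exact duplicate
    {"title": "Telegraph key", "categories": "Communication"},
]

# 1. Remove duplicate records.
unique = [dict(t) for t in {tuple(sorted(r.items())) for r in records}]

# 2. Separate multiple values contained in the same field (split on "|").
for r in unique:
    r["categories"] = r["categories"].split("|")

# 3. Analyse the distribution of values throughout the data set
#    (the equivalent of a text facet).
facet = Counter(c for r in unique for c in r["categories"])

# 4. Group together different representations of the same reality:
#    a naive normalisation key, so "Steam Engine" and "engine steam"
#    land in the same cluster.
def fingerprint(value):
    return " ".join(sorted(value.lower().split()))
```

In OpenRefine each of these steps is a menu operation (duplicate facet, "split multi-valued cells", text facet, and the cluster-and-edit dialog) rather than code.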

Material Type: Diagram/Illustration

Authors: Ruben Verborgh, Max De Wilde, and Seth van Hooland

Open Metadata Handbook

This book is intended to give the non-expert an overview of standards and best practices related to publishing metadata about works. Its primary focus is metadata from cultural heritage institutions, i.e. GLAM institutions (galleries, libraries, archives and museums). The book was started to help us get to grips with the diverse collections of metadata we were interested in using to figure out which works have entered the public domain in which countries. At the OKF, we have been working on the development of automated calculators to determine the public domain status of a work (see http://publicdomain.okfn.org/calculators), and we soon realized that we often do not have the metadata needed to accurately determine whether or not a work is in the public domain. We have obtained data from different sources, e.g. the BBC and the British National Library, but we need to combine this data in meaningful ways to achieve a more comprehensive set of metadata. This required us to engage in vocabulary alignment: removing duplicate entries, understanding whether similar fields actually mean the same thing, and figuring out whether different data models are compatible with each other.
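The vocabulary-alignment step described above can be pictured with a minimal sketch: renaming each source's fields to one shared vocabulary before records are compared or merged. The source names, field names, and example values below are hypothetical, not the actual BBC or British National Library schemas:

```python
# Hypothetical field mappings from two metadata sources to a shared vocabulary.
FIELD_MAP = {
    "bbc":     {"creator_name": "creator", "year_of_death": "death_year"},
    "library": {"author": "creator", "died": "death_year"},
}

def align(record, source):
    """Rename source-specific field names to the shared vocabulary."""
    mapping = FIELD_MAP[source]
    return {mapping.get(field, field): value for field, value in record.items()}

# After alignment, records from different sources become directly comparable,
# which is what makes duplicate removal and public-domain calculation possible.
a = align({"creator_name": "Ada Lovelace", "year_of_death": 1852}, "bbc")
b = align({"author": "Ada Lovelace", "died": 1852}, "library")
```

Real-world alignment is harder than a rename table, of course: fields with the same name can mean different things, and the underlying data models may not be compatible at all, which is exactly the difficulty the paragraph above describes.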

Material Type: Reading

Author: Public Domain Working Group & Open Knowledge Foundation