
Topics as Word Clouds

Elijah Meeks and Mat Jockers have both used word clouds to visualize topics from topic models. Colour, orientation, and relative placement of the words could all be used to convey different dimensions of the data. Below, you’ll find clouds for each of my initial 50 topics generated from the Roman materials in the Portable Antiquities Scheme database (some 100 000 rows, or nearly one fifth of the database, collected together into ‘documents’, where each unitary district authority is the ‘document’ and the text is the descriptions of things found there).

The word clouds are generated from the word weights file that MALLET can output. There are 8100 unique tokens when I convert the database into a MALLET file; each one of those is present in each ‘bag of words’ or topic that MALLET generates, but to differing degrees. Thus, word clouds (here generated with Wordle) pull out important information that the word keys document does not. However, given that I optimized the interval whilst generating the topic models, the keys document does provide an indication of the strength of each topic in the corpus. I’ve arranged the word clouds top to bottom, left to right, scaling them against the size of the strongest topic (topic 22).

I’ll be damned if I can get WordPress to just display each image under the other one. It even stripped my table out!
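As a rough sketch of the step from MALLET output to word-cloud input, the Python below groups a word weights file by topic and scales each topic against the strongest one. It assumes the weights file has one tab-separated `topic`, `word`, `weight` triple per line and that the keys file has one tab-separated `topic`, `weight`, `top words` line per topic. The function names and the sample values are made up for illustration and are not drawn from the Portable Antiquities Scheme data.

```python
from collections import defaultdict


def top_words_by_topic(lines, n=50):
    """Group word weights by topic and keep the n heaviest words per topic.

    Assumes each line is 'topic<TAB>word<TAB>weight'.
    """
    weights = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        topic, word, weight = line.split("\t")
        weights[int(topic)].append((word, float(weight)))
    return {
        topic: sorted(pairs, key=lambda p: p[1], reverse=True)[:n]
        for topic, pairs in weights.items()
    }


def relative_topic_sizes(key_lines):
    """Scale each topic's weight against the strongest topic (strongest = 1.0).

    Assumes each keys line is 'topic<TAB>weight<TAB>top words'.
    """
    strengths = {}
    for line in key_lines:
        if not line.strip():
            continue
        topic, weight, _words = line.rstrip("\n").split("\t", 2)
        strengths[int(topic)] = float(weight)
    strongest = max(strengths.values())
    return {topic: w / strongest for topic, w in strengths.items()}


# Tiny invented sample standing in for the two MALLET output files.
sample_weights = [
    "0\tcoin\t12.5\n",
    "0\tbrooch\t0.01\n",
    "0\tnummus\t3.0\n",
    "1\tcoin\t0.01\n",
    "1\tbrooch\t8.0\n",
    "1\tnummus\t0.01\n",
]
sample_keys = [
    "0\t0.8\tcoin nummus\n",
    "1\t0.2\tbrooch\n",
]

clouds = top_words_by_topic(sample_weights, n=2)
sizes = relative_topic_sizes(sample_keys)
for topic in sorted(clouds):
    print(topic, round(sizes[topic], 2), clouds[topic])
```

Each topic's `(word, weight)` list can then be handed to whatever word cloud tool is in use, with the relative size deciding how large that topic's cloud is drawn.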

At any rate, as one churns through the 50 topics, after about the first 11 (depicted below) the topics get progressively noisier as MALLET attempts to deal with incomplete transcriptions of the epigraphy of the coins and with the frequent notes about the source for the identification of the coins (the work of Guest & Wells). The final topic depicted here, topic 20, directly references a note often left in the database concerning the quality of an individual record; these notes are frequently connected with materials that entered the British Museum collection before the Portable Antiquities Scheme got going, and hence the information is not up to the usual standards.

[Figure: word clouds for Topic 22, Topic 48, Topic 43, Topic 32, Topic 7, Topic 33, Topic 13, Topic 47, Topic 46, Topic 35, Topic 20]

This exercise suggests to me that 50 topics is just too many. I’m rerunning everything with 10 topics this time.


metadata entry

Contribution: Shawn

Entry: http://electricarchaeology.ca/2013/06/26/topics-as-word-clouds/

Language: English

Format: text/html