Geo word clouds

Citation for published version (APA): Buchin, K. A., Creemers, D. J. A., Lazzarotto , A., Speckmann, B., & Wulms, J. J. H. M. (2016). Geo word clouds. In 2016 IEEE Pacific Visualization Symposium (PacificVis), 19-22 April 2016, Taipei, Taiwan (pp. 144- 151). Institute of Electrical and Electronics Engineers. https://doi.org/10.1109/PACIFICVIS.2016.7465262

DOI: 10.1109/PACIFICVIS.2016.7465262

Document status and date: Published: 01/01/2016

Document Version: Accepted manuscript including changes made at the peer-review stage

Please check the document version of this publication:

• A submitted manuscript is the version of the article upon submission and before peer-review. There can be important differences between the submitted version and the official published version of record. People interested in the research are advised to contact the author for the final version of the publication, or visit the DOI to the publisher's website. • The final author version and the galley proof are versions of the publication after peer review. • The final published version features the final layout of the paper including the volume, issue and page numbers. Link to publication

General rights Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights.

• Users may download and print one copy of any publication from the public portal for the purpose of private study or research. • You may not further distribute the material or use it for any profit-making activity or commercial gain • You may freely distribute the URL identifying the publication in the public portal.

If the publication is distributed under the terms of Article 25fa of the Dutch Copyright Act, indicated by the “Taverne” license above, please follow below link for the End User Agreement: www.tue.nl/taverne

Take down policy If you believe that this document breaches copyright please contact us at: [email protected] providing details and we will investigate your claim.

Download date: 03. Oct. 2021 Geo Word Clouds

Kevin Buchin* Daan Creemers† Andrea Lazzarotto‡ TU Eindhoven, The Netherlands TU Eindhoven, The Netherlands Ca’ Foscari University of Venice, Italy Bettina Speckmann§ Jules Wulms¶ TU Eindhoven, The Netherlands TU Eindhoven, The Netherlands

Vinageois Fromage de Bergues Chti’corée Saint Piat Jolirond BriqueBoulet de Cassel Ch’ti Gris Brique Petit Coeur de Rety Chti’crémeux Ch’ti Gris Ch’ti Gris Carré du Vinage

Mont Cornet Carré du Vinage Pavé Petit Coeur de Rety Petit Crémé Vieux Roncq PavéCamembert RougePavé fort Fromage blancCarré du Vinage rmg blanc Fromage Petit Crémé Vinageois Petit CréméFromageFromage du Vinage blanc St Piat Fromage du Vinage rmg blanc Fromage Pierre Blanche St PaulinChti’crémeux Mont Cornet Mont Cornet Chti’corée Maloine Flocon de lait Pierre Blanche Bergues rmg blanc Fromage Barkass Jolirond Vieux Roncq Boulette de Cambrai RécolletVinadouxFromage du Vinage Récollet St Piat Abondance Boulet de Cassel Vieux Roncq Flocon de lait Petit Vinageois Pierre Blanche Maloine Vinadoux St Piat Chti’crémeux Petit Vinageois Bergues Flocon de lait Brosse Bergues Saint Piat Valaine Maloine Valaine Boulette de Cambrai Fromage de Bergues Camembert Gruyère Valaine de Meaux Piemont Brique Pavé Jolirond Fromage de Bergues Piemont Perche Tèt’ de rep’ Récollet Chti’corée Boulette de Cambrai Portion Mimolette Chabicou Récollet ReblochonRécollet BargkasFéta Piemont Camembert Saint Piat Chabicou Grande Réserve GrandeReblochon Réserve Saint-Grégoire Petit Vinageois Tèt’ de rep’ Petit Coeur de Rety Abondance Camembert St Paulin Camembert Maroilles St Paulin Maroilles Camembert Féta Crémeux de Rhuys Féta Brosse Monplaisir Sérac Bargkas Récollet Rouge fort Camembert Vinadoux Brie de Melun Bronnou Rouge fort Brie Monplaisir St Jaoua Roquefort Tèt’ de rep’ Brébizou Chabicou Maroilles Pavé Brébizou Nanteau Brie Pavé Perche Perche Brie de Melun Sivry Barkass Bronnou Nanteau Barkass Bronnou Camembert Tchic Pyramide Brosse Grande Réserve Vinageois Randonneur Brie de Melun FaisselleFromage blanc Saint-Grégoire St Jaoua St Marcelin Fromage blanc Tchic St Jaoua Boulet de Cassel Brie Beaussay Tchic Saint-Grégoire Brie de Meaux Belledochon Randonneur Cabotin Bûche Gouda Pyramide Nanteau Mimolette Gruyère Gouda Claquebitou Bûche Bûche Crottin Roquefort Munster Claquebitou CrémeuxFaisselle de Rhuys Camembert Bûche Gouda Monplaisir Sainte Maure de Touraine

Bûche Munster Claquebitou Munster Beaussay Bargkas Val D’osseux Tomme de Rhuys Randonneur Val D’osseux Colombier de Sivry Pyramide Nivernais Tomme de Rhuys Pyramide Cancoillotte Colombier deCrottin Sivry Beaussay St Marcelin Bûche Sivry Crémeux de Rhuys Sivry Montgaren Pyramide Trempette CrottinPyramide Chabis Camembert Camembert Chabis Galochard Brébizou Val D’osseux Colombier de Sivry Chabis Trempette Chabis Cancoillotte P’tit Paul Nivernais Galochard Pyramide St Marcelin Bouchon Pyramide Bouton de Culotte

MontagnardBleu Bûche

Chabis Bûche Chabis Galochard Long

Crottin Bouchon Montagnard P’tit Paul PaletTomme Cabrichon Palet Blanchardon Blanchardon Gaperon Feuilles du Limousin Bouchon Long Feuilles du Limousin Tomme Bouchon TommeCalou Brique Bleu CrottinGaletGruyère Bleu d’Avèze Bouchon Faisselle FromagePyramide blanc Trempette St Méaudrais Gaperon Saint Nectaire Calou Bouton de Culotte Bûche Palet Cabrichon Bleu de chèvre Crottin Charolais Montgaren Belledochon Bleu d’ Pyramide Bouton de Culotte Nivernais Saint Nectaire Belledochon Beaufort Long Beaufort Raclette Bouchon Bleu d’Avèze Bleu de chèvre Sérac CharolaisPortion Bouchon Brique Beaufort Brique Servagnet Gruyère Ramier roux Bleu Morbier P’tit Charo PyramideBleu de pays Savoureux Bleu Cabotin Reblochon P’tit Paul St Méaudrais Camembert Sainte Maure de Touraine Miaulante St Méaudrais Gaperon Bassoise Bleu St Félicien Abondance Blanchardon Bleu Servagnet Sainte Maure de Touraine Cabotin Bleu BouchonCaillat Charolais Brique Montagnard Calou P’tit Charo St Félicien Faisselle Fourme d’Ambert Bleu du Vercors-Sassenage

PicodonBleu Bleu du Vercors-Sassenage Faisselle Bleu de pays Bleu de pays Miaulante Bleu du Vercors-Sassenage St Félicien TommeBrique Bassoise Fromage blancCamembert Le Mazet Bleu d’Auvergne Caillou Morbier Galet Saint Nectaire Reblochon Brique Gruyère Bleu d’Avèze Caillou Savoureux Cabrichon P’tit Charo Montgaren Fromage blanc Raclette Caillou Bassoise Sérac Ramier rouxRaclette Abondance Morbier Camembert Trésor Raclette Morbier Comté Portion Galet fribourgeois Servagnet Vacherin fribourgeois Pyramide Fourme d’Ambert Abondance Bleu Le Mazet St Marcellin Gascounet Pélardon Gruyère Gruyère Tomme Ramier roux Picodon Morbier St Marcellin Bleu d’Auvergne Savoureux Pavé St MarcellinBeaufort Fourme d’Ambert TrésorTommeMorbier Beaufort Reblochon Pyramide Abondance GruyèreVacherin fribourgeois Raclette Feuilles du Limousin Gruyère Pyramide Raclette TrésorCaillat Miaulante Tomme de Rhuys Comté Le Mazet Caillat Gascounet Bleu de chèvre Brique Comté Beaufort GascounetPavé Fromage blanc Brique Bûche Chabichou Chabichou Pavé Brique Pélardon PélardonBûche Bûche Figure 1: Geo word clouds of production in (left: orthogonal, middle: 45 degrees, right: 10 degrees).

ABSTRACT visualization method, there are many recent studies on word clouds, Word clouds are a popular method to visualize the frequency of on topics ranging from usability issues [12], over the visualization words in textual data. Nowadays many text-based data sets, such as of dynamic text content [4], to the appropriate ranking of the words Flickr tags, are geo-referenced, that is, they have an important spa- to be visualized in large volumes of Twitter data [9]. tial component. However, existing automated methods to generate Many text-based data sets, such as Flickr tags, have an impor- word clouds are unable to incorporate such spatial information. We tant spatial component. It hence seems natural to incorporate this introduce geo word clouds: word clouds which capture not only the spatial component when creating word clouds. Consequently there are many (manually crafted) examples of “word cloud maps”, that frequency but also the spatial relevance of words. 1 Our input is a set of locations from one (or more) geographic re- is, geographic maps which are composed solely of words (often gions with (possibly several) text labels per location. We aggregate names of regions). However, existing automated methods to gener- word frequencies according to point clusters and employ a greedy ate word clouds are unable to incorporate such spatial information. strategy to place appropriately sized labels without overlap as close In this paper we propose an automated method to create geo word as possible to their corresponding locations. While doing so we clouds: spatially informative word clouds (see Fig. 1). Our word “draw” the spatial shapes of the geographic regions with the corre- clouds capture not only the frequency but also the spatial relevance sponding labels. We experimentally explore trade-offs concerning of words. Furthermore, they “draw” the geographic regions, from the location of labels, their relative sizes and the number of spatial which the input is taken, to both attract attention and to give visual clusters. The resulting word clouds are visually pleasing and have cues to identify locations. a low error in terms of relative scaling and locational accuracy of In Section 2, we formally define geo word clouds, give exact words, while using a small number of clusters per label. requirements and specify corresponding quality measures. In Sec- tion 3 we then present a greedy algorithm to compute geo word Index Terms: I.3.3 [Computer Graphics]: Picture/Image Genera- clouds. An appropriate coloring scheme is an important part of tion word cloud design. In Section 4 we hence explain our design choices with respect to color in detail. There are several parameters 1 INTRODUCTION to choose when generating geo word clouds, such as rotation angles Word clouds are a text-based visualization highlighting important for the words, (relative) font sizes and the amount of clustering of keywords in large amounts of text, emphasizing more important or the input. In Section 5 we experimentally evaluate how different frequent words by a larger font size. Due to the popularity of this parameter choices influence the quality of the resulting geo word clouds and explore trade-offs between different requirements. We *email: [email protected] † can conclude that the geo word clouds generated by our greedy al- email: [email protected] gorithm are visually pleasing and satisfy our quality criteria well. ‡email: [email protected] Finally in Section 6 we discuss various ways to extend our current §email: [email protected] approaches to different scenarios and requirements. ¶email: [email protected] Related work. Possibly the best-known algorithm for word cloud generation is the one by Wordle [17]. It utilizes a simple greedy

1Image search for word world map approach to produce visually pleasing word clouds. Barth et al. [1] 2 GEO WORD CLOUDS consider a scenario where a set of words is given with adjacency 2 Our input is a set of n points (locations) p1,..., pn in R which information, that is, the relative positions of words in a word cloud are located inside a geographic region M. Each point is associated are pre-specified. They show that in general, the problem of cre- with one or more words wi from a list of m words w1,...,wm. Each ating word clouds, while respecting adjacencies between words, is word has a specific shape, namely the outline of the word spelled NP-hard. They consequently present an approximation algorithm in a specific font chosen by the user. Furthermore, the user can for this problem. Both of these algorithms do not consider spatial restrict the possible (degrees of) rotation of words in the output. As locations and hence cannot be used to produce geo word clouds. customary with word clouds, more frequent (important) words are There are various methods similar to our approach, but none are drawn using a bigger font size. Finally, the words should be placed directly comparable. Chi et al. [3] propose morphable word clouds, in spatial proximity to their defining points. Given this input, we where a sequence of spatial shapes is “drawn” with a time-varying want to create a geo word cloud (a set of word shapes M0) which set of words. To fill up the shapes they use a greedy algorithm, fills (and hence draws) the geographic region M. but when placing the words no spatial constraints are taken into ac- count, while this is of importance for geo word clouds. Moreover, 2.1 Requirements they use rigid body dynamics to displace the words over time while enforcing constraints. Their work is mostly focused on the tempo- Requirement 1 (R1) All word shapes are disjoint. ral aspect, that is, arranging the time-varying sets of words within the morphing shapes such that the evolution of both word clouds Requirement 2 (R2) The geo word cloud has a small number of and shapes can be easily tracked by the user. Our algorithm and shapes per word. constraints, however, focus on the spatial scenario where every in- put location can have multiple labels and each word can label many R2 prevents valid outputs that are not interesting. See Table 1 where different locations. Our algorithm hence has to cluster labels appro- every point in the input is represented by an individual word. In- priately and be more flexible when optimizing both the clustering stead, if points associated with the same word are relatively close and the placement of words. together, we want to place a single bigger word shape in our word Related to both our work and classic map labeling are tag cloud which ideally covers the points. To do so, we cluster the maps [8, 13] which are intended to “expose textual topics that are points associated with a single word (see Section 3) and place a tied to a specific location on a map”. Tag maps place the tags at a single shape per cluster, sized according to cluster size. R2 is hence fixed location on a map with a size corresponding to their impor- intended to keep the clustering from overfitting. tance (thus, in fact, creating a graduated symbol map with varying text symbols). Since the locations for all tags are fixed, overlaps Requirement 3 (R3) Shapes of a word cover its points well. between tags frequently occur. In contrast, it is central to geo word clouds to avoid overlap. To emphasize the spatiality of geo word clouds, we ensure that the There is an extensive body of work on automated label place- shapes in our output are close to their corresponding data points. ment for point features [18]. Map labeling is loosely related to geo See Table 1: the more points are covered by a word and the smaller word clouds, but there are some crucial differences. First of all, our the distance between the word and the points that are not covered, word clouds are the map and as such, cover space (more or less) the better the word represents the spatial data. In subsection 2.2 we completely. Second, we do not label specific features but instead show how we measure the error for the points that are not covered. are summarizing spatial word data. Hence we have considerably Requirement 4 (R4) A word with k points has a total area as close more flexibility in placing words and in deciding how to aggregate k 0 data associated with the same word. Finally, when labeling maps, it as possible to n M . is common to omit labels to avoid clutter. For our geo word clouds it is of utmost importance to show all labels of high relevance. So the common algorithmic approach of maximizing the number of la- Requirement Input Bad Good bels placed is not applicable (instead one could consider a variant of maximizing the size of labels [6]). R Geo word clouds are an abstract representation of spatial data. 1 Blue Blue Since the word size is scaled according to importance, they have a Green Green

certain resemblance to cartograms. Cartograms convey values per Green Green Green Green region and several types of cartograms represent regions by simple Green Green geometric shapes, such as circles [5], squares or rectangles [11], R2 Green scaled according to the value associated with the region. How- Green Green Green ever, the challenges for generating word clouds differ considerably from those for cartograms: cartograms are based on a subdivision R3/M1 Green

of space into regions, which provides a starting point for an overlap- Green free placement. In contrast, placing words very close to the corre- sponding data points in a word cloud, naturally will lead to consid- R4/M2 Blue Green

erable overlap, which needs to be resolved. Blue Green Ignoring the issue of overlap it would seem natural to use rect- Green Green angular cartograms to generate word clouds. However, methods R5/M3 to generate rectangular cartograms (for example [2, 16]) place a Blue Blue strong emphasis on maintaining adjacencies between regions and as a result limit the freedom of choosing the aspect ratio of rectan- R 6+7 Green Green Green Green Green Green gles. Such region adjacencies are of little importance for geo word Blue Blue Blue clouds, but being able to choose aspect ratios is of high importance. It hence seems unlikely that methods for rectangular cartograms can easily be adapted to generate geo word clouds. Table 1: Requirements and measures. The size of a word should give the reader a good indication of the Measure 1 (M1) The coverage error for all points in M is importance of the word. See Table 1: one word has three points, ∑p∈M Hausdorff (p,word(cp)). while the other has only two. To express this in the word cloud, we want a ratio of about 3 : 2 when it comes to the area of the words It might be necessary to slightly reduce the size of some words, to representing these points. find a suitable position for them. We want to measure how well the words represent the number of points they visualize. For a word 0 Requirement 5 (R5) M covers input area M well. with k points, which is scaled by factor x, the number of points that |k − k · x| C R ensures that the outline of M becomes visible in the output, and are not represented is . Let be the set of all clusters, and 5 Rep(c) that the word cloud we generate is dense. We want the shapes of define the value we defined for the number of points that are M0, to stay inside M, but at the same time do not leave too much not represented by the word of this cluster. white space. Measure 2 (M2) The percentage of points not represented is ∑c∈C Rep(c) Requirement 6 (R6) Shapes of a single word have the same color. n .

R6 ensures that geo word clouds look consistent and that it is easy If words may be scaled by a factor x > 1, M2 can be over 100%. 0 to find multiple instances of the same word. However, human users Next we want to measure R5: how well does M resemble M. cannot distinguish many colors. Hence we developed a careful An effective way to do this is calculating the symmetric difference scheme to reuse colors, see Section 4 for details. between M0 and M: parts of M that are covered by M0 do not add to this measure, but area that is not covered and covered area outside Requirement 7 (R7) Coloring should distinguish different words, but should not draw too much attention to particular words. of M increases the measure. This is indicated by the colored area in Table 1. We can also see that choosing an appropriate global scaling The last requirement should help a reader of the geo word cloud to for the words helps reducing the symmetric difference. Since we find different words, but at the same time make sure that only the require the words to be disjoint, we know that the symmetric differ- font size and number of occurrences of a word inform the reader of ence will not be lowered by overlapping words. We use a bounding its importance. R5 and R7 together put emphasis on the size of the box per shape for this measure, since users tend to mentally aggre- word as the only indicator for its importance. See Table 1: we can gate the letters, the space between letters and also the space around see that the word Green has different colors in the first case, and the the ascenders and descenders of the word with the word itself. green color is a lot brighter than the blue one. The second color is 0 a lot easier to read and visually pleasing. Section 4.2 explains how Measure 3 (M3) The symmetric difference between M and M. we ensure that different words with the same color are still clearly distinguishable. 3 PLACEMENTALGORITHM Placing a label for each point would clutter the map (see Section 5 2.2 Measures for a more detailed discussion). Since we want to place bigger To evaluate the quality of the geo word clouds generated by our words whenever possible, we use k-means to cluster points with algorithm we define three quality measures. Intuitively speaking, the same label and place for each cluster a single word. The size geo word clouds that score well on the first two measures consist of of the word depends on the number of points in that cluster. The words which (mostly) have the right size and are (mostly) placed at points are clustered multiple times with an increasing number of the correct location. This is a necessary condition for the user to be desired clusters, to optimize the number of clusters. The best num- able to use our word clouds effectively. The third measure ensures ber of clusters is found when the error, the distance to the centroid that the spatial shape is drawn appropriately, which is an impor- of a cluster, is small. We add a penalty term to the error for adding tant clue for the user to be able to accurately determine locations. extra clusters to ensure that we do not keep on adding clusters to de- Hence these measures ensure that our geo word clouds add spatial crease the error. We select the clustering that minimized the error information to word clouds in a meaningful way. including the penalty term. When computing geo word clouds we need to make careful After clustering we determine suitable values for rotating the trade-offs between the various requirements. For example, to fill word of each cluster. We determine the eigenvector of the points up the whole area (R5), we might have to scale some words down in the cluster and use the principal component of this vector. This to fit them (R4). Furthermore, it might be better to make a reason- gives the rotation of the word which minimizes the error to the able coverage error on two words, instead of placing one perfectly bounding box of the word. We allow the user to restrict the pos- and having a huge error for the other word (R3). sible rotations: orthogonal, 45 degrees, etc. For each word we then As stated above, we cluster the data points of each word and choose the rotation that is closest to the orientation of the principal then associate each cluster with a single (scaled) shape. To measure component (see Fig. 1 and 2 for examples). whether the shape of a cluster covers the points in that cluster well We use a greedy strategy to place the words. For this the words (R3), we use the Hausdorff distance. For each point in the cluster, are sorted by the size of their point cluster and the placement is per- we measure the Hausdorff distance from this point to the bounding formed from the bigger to the smallest cluster to avoid large errors box of the shape. Points that are covered and are inside the bound- for bigger clusters. We compute the word placement pixel-based. ing box of the shape have distance zero, while for points that are Per pixel we store whether it is occupied by a word or not. The not covered we take the distance from the point to the bounding positions in which a word can be placed are obtained by a uniform box. The sum of the distances for a single word is the error for that filter, i.e. a convolution that considers only the bounding box of the particular cluster. word. When we actually place a word, we only set the pixels that Table 1 shows which points add to the coverage error and how a overlap with the actual shape of the word. good placement of a word can decrease the error. The sum of the At each iteration a word is selected from the top of the list and error for all points measures how well all shapes in M0 cover the its best placement is computed. The best placement is a trade-off whole data set. between the size of the word and the error we make: if a word Let Hausdorff (p,r) denote the Hausdorff distance between a cannot be placed at its exact position, we try to scale it down, so point p and a rectangle r and let cp denote the cluster to which that the distance between the center of the word and the centroid of point p belongs. We define word(cp) as the bounding box of the the cluster can be smaller. If we scale a word down, we again place word associated with cluster c. it in the list and process it again when the scaled word is the biggest in our Wordle

M seaside British blue

blackOlympus

railway gig

beach Sussex dark summer

red Kent has the same

building park

water reection

Essex

live orange

old points and the

urban

travel garden united

county England

, nearly all big

Brighton grass

tower Yorkshire sky music bird eld square

tree Thames k urban

architecture girl

church grati light

yellow clouds night Birmingham lake

Londres house

Manchester people , concern the col-

wildlife

whitebridge

snow museum owers snow

Yorkshire cold railway ower 7

leaves

autumn

spring festival landscape bridge

yellow nature

road leaves northern

sand Tomme

artlights

Olympus

R

Hampshire

square

countryside sunrise

gig

northlights united

sun car

Britain music city

Englandriver spring Great Britain sun trees

reections coast

beach wedding

winter Wordle

street

grati building

England

blue is the area of light

rocks live green

white

lake river

museum

wildlife park girl

bird castle orange

sky United Kingdom

harbour wales British lm

square

harbour red rocks

Kerry Liverpool

clouds cold

tree reection trees

lm

night London

sunrise

Bristol tower and black united

Scotland M bridge

summer

Cumbria art

Britain

architecture rock

reections

Glasgow old

festival wales

street Dorset

A

sunset

ower museum owers city seaside 6

winter

sunset

car sea

building architecture

rock

wedding boat green water castle birds

house R nature

snow

landscape

grass dark wales

church street Devon

tower

boat

ocean

park city

garden

urban birds

Edinburghpeople Cornwall

autumn

Scotlandroad

wedding

music

travel county

lm

Irland

gig live Irlande Irish

Belfast Ulster

northern

city Eire Wicklow

street

IrelandIrlanda Donegal urban

Ulster Northern Ireland is not satisfied, since the word

Waterford

lake otenIreland Northern

Dublin

ocean 6

Donegal in Fig. 2 are not both blue, in fact, neither

Ireland R Cork

county

Irlanda Irish

Irland

Galway Eire Irlande sky

points. The value Kerry n both occur twice, but have different colors. Fur- and

, if the cluster for that word has

birds n

bird / yellow orange

city sea seaside

ower garden

gig M

urban wildlife Sussex park Olympus sky natureblue

Londres A

road use color to group semantically similar words or to evoke

architecture Thames

beach

eldwedding city

·

urban reection

museum sun white

house black owers

county coast bridge leaves

Essex

water k square

church red lake tower

black dark art lights Britain Great

Brighton

lm Kenttravel

not

spring railway old

summer OLORING Dorset

Yorkshire festival green snow

tower

Yorkshire street reections

winter building people

railway

British castle park Britain

music Manchester sunset British

united

Birmingham united Visualization

northern landscape

square

lake trees car

grass

music

clouds

live

building lm

live sun

Englandboat Two of our requirements, namely Another way of creating HCL color maps is by keeping the

leaves beach wedding

architecture

Hampshire

countryside boat bird

grati

sunrise

Great Britain England

snow

harbour museum

sky

church

blue

dark

city girl

tree

ocean

square

spring garden

yellow

wales

sunrise green

birds autumn

girl 4 C

car computing the font size whichof renders area the word inwhole a data bounding set box has pixel-based image, measured in pixels.words We stop become the too algorithm small, if the or either because they their are clusterpseudocode scaled size for down is the small, by greedy approach. a large amount. Algorithm 1 shows The first figure inWe [17] can shows quickly a see Wordle that with its standard coloring. words use the light greyof color, red/brown. and nearly Below all we smaller describecoloring. words our a approach shade to compute a better oring of the geoquirements word (see, for clouds. example, Fig. Our 1, algorithm the word adheres to these re- associations ( is). Our only goal iswords to and assign their colors sizes in in such thepossible. a word way cloud We that can cannot be the use identified different number as a easily of different as color colors forInstead needed each we would word, use exceed since awhich what the small are users similar fixed in can colour size perceive. palette are and unlikely to ensure have4.1 that the words same color. HCL colorHCL maps color maps have “wellgued balanced in perceptual [15]. properties” By aslevel, keeping ar- we the can Chroma create andare Luminance a uniformly on qualitative distributed the color over same the mapors whole containing are Hue colors spectrum. very that These similarattention col- in than others. brightness We and choosecolors such none which a of are qualitative still color them map distinguishable attracts of enough. 13 more Chroma on thevariable same Luminance. level, choosing Thismaps. a results single in Fig. Hue single 3aused. Hue, and shows This sequential having a is color a geo amake it nice word easier variation, cloud to but distinguish when arguably words. such qualitative color color maps map is and thermore, except for the big occurrence of color everywhere).Luminance We (HCL) choose color our modelword) to colors is ensure from perceived that to a no bewe color do Hue-Chroma- more (and important hence than others. At this point

Bristol England

river wildlife clouds

red lights

tree Kerry

rock old

park grati nature road north sand

Scotland Liverpool

seaside reections art London

wales trees

castle coast harbour

girl

light

cold urban reection street orange summer gig sunset light

street water Glasgow seaower

rocks architecture Devon rocks night

river

Britain

autumn Cornwall

snow

Edinburgh united night museum

white seaside

landscape wales

Cumbria grass cold

winter

owers

house people Scotland

s

wedding Olympus

bridge

festival

rock

music

county

lm

Irish Irlanda

Eire

gig Irland Belfast travel Wicklow

where

Ulster

northern Irlande Irelandlive

city

urban

street Ulster

Donegal A Waterford

Northern Ireland

lake Dublin

ocean

otenIreland Northern

√ Donegal Ireland /

Cork Irlande

Irish δ Irlanda

Galway Eire

county · Irland ) Kerry α

rock 1

birds

building boat

lm sky

road Britain

leaves beach urban

spring ocean seaside England

)+(

Yorkshire

tree tower city snow cold

house museum Brighton

lights s bridge

tower

blue northern red

orange square

reection

black park yellow architecture

people live ower summer

eld green − bridge Olympus

owers artwhite travel Thames

Sussex

grati

museum

British railway

sunset autumn

night grass 1

dark wales

garden music seaside c Great Britain 0 times the desired size (with sunrise water sun

clouds British

bird ( Hampshire united

castle

festival .

countryside church

house

live ·

tree Birmingham

yellow old

Londonbird Dorset

trees 1 street

landscape Britain

spring Londres

nature nature

Bristol

street

lake urban

snow

city road Liverpool α reections urban art car

gig then

orange sky Englandwhite

river England

railway

Glasgow building

coast birds county

=

grass grati light

music north sunrise

c reections

blue gig

museum

black leaves lm clouds old Edinburghnight

wales

wildlife Kerry s

lake

park

festival architecture

beach

red square

reection united

cold summer Great Britain Essex

city

rocks

Olympus river building

light dark

sea then

car lights

rocks church

D

boat

people P

harbour

winter sunset harbour

girl

united street green

Scotland owers water coast winter

castle landscape M

trees garden Kent

snow c

sun sand Devon autumn square wales c

ower girl Yorkshire music 5. The best placement can now be defined

Cornwall wedding seaside Cumbria rock .

Scotland 05 to

bridge .

architecture

Manchester tower 0 wedding park from 100%

P

0

county travel

live

urban c

P = inside street gig < Ulster in Irish =

minimal size

Belfast

northern c

Eire Wicklow Irland

city c α

Ireland is the area of the rectangle bounding our region Irlanda suggests, the preferred font size is determined by s Irlande Ulster <

Donegal Northern Ireland

Waterford

lake do otenIreland Northern do

4

A ocean Dublin

(biggest first)

Donegal R

Greedy geo word clouds P C Irlanda Ireland find best placement for break adjust font size of keep reorder place remove update the available space Cork C size percent Irland ∈ ∈ county Irish . .

05). A penalty is introduced both for scaling a word and Galway Eire Irlande M p p . c c ← acquire shape load data set Kerry create clusters from sort p assign font size to assign rotation to if if else Figure 2: Geo word clouds of Flickr tags in Great Britain and Ireland (left: orthogonal, middle: 45 degrees, right: 10 degrees). is used to influence how important each part of the penalty is. ← ← ← ← α It remains to discuss how the optimal size of each word is de- The scaling of the words uses a finite number of values uniformly M D for all P for all return C is the distance between the word placement and the centroid of . is preferred. Ifwe a ensure map that consists words of thatplaced should two in be disjoint the placed regions other in one. (see one region Fig. cannot 2) be termined. As Currently we use Algorithm 1 steps of 0 δ its cluster and as the placement withoptions the with lowest the penalty. same penalty, In the the one case with of the different highest value of for moving it away fromcomputed as its a desired weighted average position. The final penalty is M distributed from word in the list.the Furthermore, cluster, if we decrease we its do size,word not since now. place If we the cover we less would word pointsgive keep close the with the to user this word the at impression itsposition that original of there size, the are we word, a would while lot of there points might around not the be many at all. aesr htwrsta r iia nsz e oo hti at is that color a get size in c similar we are size, that font least of words order that descending sure in make colors assign to algorithm for this value used any have we choose maps, can our of of used most map are For color colors etc. a all twice, that color a ensure using we before Hence color. used previously the and col word biggest the from start We color a words. assign the of size decreasing h hl u spectrum Hue used. whole been of map all the color have a they of after use them the reuses Assume to but colors different size, assigns similar words that of distinguish algorithm words still an but use colors, we of Therefore, number well. small a use to want We distribution Color 4.2 ducteurs-De-Fromages-cat03500 tags, unique 125 with cheese of constructed producers been 95 producers contains has of set data list and data a first The France from The starting in websites, different sets. produced from data manually cheese different about two is on set algorithm the tested We small. is color same the light left have adjacent is to two that words of white space chance adjacent the The two and or words. gray words these white nearly gray between with light words space filled with The biggest is filled the is colors. means words distinguishable which big very words, get gray all will dark or small. black very is many the color same and the words getting small words many previ- adjacent are the two there of words However, chance smallest hold. are the not and For size might in ous already. shrunk have size to by color the distinguishable assigning we that words the E 5 > Bronnou Start Brébizou h etwr ndcesn re fwr ie,gt color a gets sizes, word of order decreasing in word next The hs lisaespotdb i.3:W e htteeaenot are there that see We 3b: Fig. by supported are claims These again, one similar a or color, same the use we that time the By rmg blanc Fromage 2 ( Bûche iue3 e odcod fFac:dfeetclrmaps. color different France: of clouds word Geo 3: Figure i http://www.monmenu.fr/s/exploitations/Liste-Pro ,clr htaeasge osctvl ilb ut different. quite be will consecutively assigned are that colors 1, + tJaoua St a eunil“hee colors. “cheese” Sequential (a) c iue4 C oo a,slcigcneuiecolors. consecutive selecting map, color HCL 4: Figure c XPERIMENTS Faisselle ) rmu eRhuys de Crémeux oosaa rmtepeiu oo.Snew aechosen have we Since color. previous the from away colors Beaussay Claquebitou Crottin tMarcelin St Camembert mod lud’Auvergne Bleu Perche Blanchardon airroux Ramier om eRhuys de Tomme Camembert Pyramide ored’Ambert Fourme ’i Charo P’tit Camembert Camembert Bouchon Gouda Chabichou Vinadoux Gaperon Chabis Chabis Bûche eilsd Limousin du Feuilles Charolais rmg blanc Fromage Bouchon Valaine an Nectaire Saint Gascounet

Savoureux l eMazet Le

tFélicien St Jolirond where , Pyramide Bassoise Bergues Raclette ’i Paul P’tit Pyramide rmg eBergues de Fromage col Cabotin Long Portion aneMued Touraine de Maure Sainte tPiat St Palet l Calou Camembert Vinageois Picodon lud’Avèze Bleu

og fort Rouge ei ou eRety de Coeur Petit Chti’corée = Caillat

Brique Nanteau Maloine

Pavé Bleu Montagnard blanc Fromage iu Roncq Vieux 0 Pyramide Nivernais

Bûche Tomme Gris Ch’ti Sivry

lud pays de Bleu

Brique Brie Récollet

Brique D’osseux Val 3clr,wihi rm tef hrfr we Therefore itself. prime a is which colors, 13 norclrmpt hsword. this to map color our in

Pélardon Bleu

red Meaux de Brie Crottin

Gruyère Vinageois Petit ei Crémé Petit Roquefort Cornet Mont ar uVinage du Carré red Melun de Brie ireBlanche Pierre an Piat Saint lcnd lait de Flocon Chti’crémeux oltd Cassel de Boulet Pyramide Vinage du Fromage

Pavé

Cabrichon Bûche

Trésor blanc Fromage c Bleu Servagnet Trempette Montgaren Miaulante Mimolette Brosse tPaulin St Maroilles Brique Bargkas oobe eSivry de Colombier Barkass è’d rep’ de Tèt’ olted Cambrai de Boulette Belledochon otnd Culotte de Bouton rneRéserve Grande Randonneur srltvl rm to prime relatively is Beaufort tMarcellin St Morbier Pavé Chabicou Sérac Tchic Bouchon Reblochon Cancoillotte [

FaisselleGalochard Monplaisir lud chèvre de Bleu Morbier Beaufort c tMéaudrais St Gruyère

col

Saint-Grégoire Féta Comté o example for , Abondance Reblochon

ahrnfribourgeois Vacherin Tomme Caillou Galet

Bleu du Vercors-Sassenage

0 Gruyère Raclette Abondance Piemont MunsterRécollet ,..., l oosuiomydsrbtdover distributed uniformly colors Bronnou col b rycl eedn nfn size. font on dependent Grayscale (b) Brébizou rmg blanc Fromage Bûche tJaoua St l − Faisselle om eRhuys de Tomme 1 c Beaussay aneMued Touraine de Maure Sainte ] Claquebitou Crottin tMarcelin St Camembert oosaeasge in assigned are Colors . Perche = Gaperon Bleu du Vercors-Sassenage Rhuys de Crémeux ’i Charo P’tit airroux Ramier Pyramide Nivernais Charolais Camembert Camembert ’i Paul P’tit Bouchon ored’Ambert Fourme Gouda Chabichou Vinadoux Chabis Bûche rmg blanc Fromage Bouchon l Savoureux Valaine seFg ) Using 4). Fig. (see 3 lud’Auvergne Bleu Blanchardon Bassoise

an Nectaire Saint Gascounet

Picodon Jolirond and eMazet Le Pyramide Bergues Raclette Pyramide rmg eBergues de Fromage Gruyère Long Chabis Portion tFélicien St tPiat St Palet lud’Avèze Bleu Cabotin Camembert Calou lud pays de Bleu Vinageois Bleu

og fort Rouge ei ou eRety de Coeur Petit Chti’corée

Maloine

Pavé

Montagnard blanc Fromage

Brique Roncq Vieux Pyramide Chabicou i

Brique Tomme Gris Ch’ti Sivry Nanteau

Brique Brie Bûche Récollet tMéaudrais St

eilsd Limousin du Feuilles steidxof index the is a D’osseux Val Bleu

red Meaux de Brie Pavé Crottin Gruyère Vinageois Petit ei Crémé Petit Roquefort Cornet Mont ar uVinage du Carré red Melun de Brie ireBlanche Pierre an Piat Saint lcnd lait de Flocon Chti’crémeux rmg uVinage du Fromage Pélardon Pyramide Cambrai de Boulette

Gruyère

Cabrichon Bûche

Camembert blanc Fromage Servagnet Bleu Trempette Montgaren Comté Mimolette Brosse tPaulin St Maroilles Brique oltd Cassel de Boulet Bargkas oobe eSivry de Colombier Barkass Belledochon otnd Culotte de Bouton rneRéserve Grande Randonneur Saint-Grégoire tMarcellin St Trésor Morbier Pavé Sérac Miaulante Tchic Bouchon Cancoillotte Récollet

FaisselleGalochard Monplaisir lud chèvre de Bleu Morbier Beaufort

ahrnfribourgeois Vacherin Caillat Féta Reblochon Reblochon

Beaufort Tomme Caillou Galet

Abondance Raclette Abondance Piemont Munster rep’ de Tèt’ 2 . euto hsi htwrsgtsae on hc xliswhy explains which down, scaled get words The that difficult. is becomes this words of all placing result that acceptable. such is orientations which ent 0.69%, only is 45° uses uses that that cloud word one geo the the and between 90° error in Britain difference rotations allowing Great The when and 90°. smallest of France is error sets coverage data The the Ireland. difference plus for symmetric rotations the allowed and the represented for not words of percentage rotations. the looks in result freedom the more having orientation, when vertical even note strong clusters uniform, we most a more Finally, Since have help. Britain sparse. may set Great less data in is specific a result of the structure the and that more spread to words. many tend rotated of by left presence gaps the the fill because to happens allows words. Great tags This of less more case the contains Ireland. in set impact and minor data a Britain the has rotations where especially the France, for is choice of This Such case texture. the non-uniform in a true producing of risk the filled has be areas. to separate guaranteed of multiple of is scaling consists Great region relative map every in the the not when dense preserve uniformly, reason, more this to was For tries set algorithm words. data the the and in Britain included tags of amount tex- the and together uniform. packed more tightly more is is are ture which words gen- map The a In produces spaced. words evenly rotations. orthogonal allowed of 1 choice the the Fig. Ireland, for eral, plus values 10°. Britain parameter Great or different and 45° France Wewith 90°, respectively show points. of 2 its rotations Fig. of allowing and component by principal algorithm the the orien- using the test cluster determines algorithm that a The of rotations output. tation allowed the the in have determines may parameters words input the of One rotations Allowed 5.1 the discuss and computations. algorithm our We of our clustering. time of the running placement and then word We size the font illustrate cloud. initial also word the geo for the settings on different test rotations allowed the of and influence clusters the of all points. number for of the sum number total at the the look take by we represented, divide not cluster the is each by that for represented points words. not earlier: by is stated covered that as not points words of is percentage that the map measure the We of percentage the expresses difference. symmetric the of measuring much by how filled measure amount is of We the map diagonal diagonal. expresses the the the percentages to by in relative error error displacement the this of at the divide Looking as we map. measured and the is distances error of of coverage sum maps the total make to comparable: measures sizes the to different changes some make we However, RAM. of clock GB a 4 with and processors GHz web Xeon(R) 2.30 a Intel(R) of hard- 4 is speed The of platform Sage. consists of The quota instance hosted ware experiments. a provides the which run solution use based to We Cloud package. software Math mathematical Sage based Python a [14], Sage images tags. 545,140 which contains unique set in 126 data mode with resulting or The camera taken. the was frequent about photo most are the that the those selected removed set Ireland, and data and tags the Britain filtered a We Great from [10]. filtered only photos is keeping geo-tagged set with data set second data The Flickr cheese. produced of types i.e. yalwn oerttoswrsaepae nalto differ- of lot a in placed are words rotations more allowing By point, per error average coverage the show 3 Table and 2 Table Ireland in words that is options rotation more of effect positive A and compactness the decreases rotations allowed more Adding the because sparse looks Ireland concerning map the of part The the testing by start We experiments. several describe we Below and interval unit the to normalized is difference symmetric The 2. Section in described as measures the use we experiments the In using implemented was above described algorithm The Allowed Coverage Words not Symmetric Coverage Words not Symmetric Initial font size rotations error represented difference error represented difference 90° 7.31% 11.28 % 22.26 % 100% 7.31 % 11.28 % 22.26 % 45° 8.00 % 14.19 % 24.08 % 90% 8.39 % 9.13 % 25.94 % 10° 8.74 % 15.88 % 25.41 % 80% 7.59 % 4.17 % 27.42 %

Table 2: Metrics on allowed rotations (France) Table 4: Metrics on initial font size (France)

Allowed Coverage Words not Symmetric rotations error represented difference with 100, 90 or 80 percent of the optimal size, for the France data ◦ set. We observe that decreasing the initial font size from 100% 90 10.43 % 25.18 % 35.93 % of the optimal size to 90% increases the average coverage error. 45◦ 10.90 % 27.55 % 37.93 % ◦ The reason for this is that all words get smaller, including the large 10 10.63 % 30.40 % 38.53 % words such that they probably cover less points of their cluster. The distance from the points to those large words is now also increased. Table 3: Metrics on allowed rotations (Great Britain and Ireland) Decreasing the initial font size further to 80% gives a smaller error again, because the words are placed closer to their desired location. By decreasing the initial font size more words can be placed at the percentage of words that are not represented increases when the that initial font size. Therefore, the percentage of words not rep- number of allowed rotations increases. resented decreases. While this percentage goes down when care- The symmetric difference is smallest when only allowing rota- fully decreasing the initial font size, the symmetric difference in- tions of 90°, which matches our observation that the geo word cloud creases. This can be explained by the fact that all the words are looks more packed. simply smaller, which leads to less coverage of the map as a whole. Comparing the measures of the two data sets shows that France Although experimental data favors selecting an initial font size is covered better than the Great Britain plus Ireland, based on the of 80%, visual inspection shows that there are regions which are symmetric difference. Ireland has a large influence on this value, not filled nicely. Particularly for France, the Bordeaux region starts because the data set contains less points for Ireland relative to Great to contain white areas if we lower the initial font size. Britain and we want to preserve the relative scaling. We can aim to combine the positive effects of fewer and of more 5.3 Clustering rotation options by restricting the rotations based on the importance Placing a word for each point in the map will generate a word cloud of a word. Fig. 5 shows such a word cloud: the biggest 10 words where it is difficult to find frequent words, as can be seen in Fig. 6a. are orthogonal, 40% use 45°rotations and all remaining words 10°. Some measures for these kind of maps will be low, but these maps

Petit Coeur de Rety do not give an informative view of the data. The other extreme Petit Crémé JolirondBoulet de Cassel Vieux Roncq

Flocon de lait Pavé is to use a single cluster for each word type, as shown in Fig. 6c.

Ch’ti blanc Fromage Gris Fromage du Vinage Carré du Vinage St Piat Fromage de Bergues Mont Cornet Saint Piat Rouge fort Pierre BlancheFromage blanc Camembert Brique Petit Vinageois Gruyère Valaine Boulette de Cambrai This groups all words of a tag together. The disadvantage is that no Bergues Maloine Vinageois Récollet Chti’crémeux Chabicou Maroilles Pavé Tèt’ de rep’ regional differences can be seen anymore. Our approach performs Mimolette Vinadoux Monplaisir Féta Bargkas Camembert Piemont Reblochon Randonneur St Paulin Camembert Brébizou Perche Brie de Meaux clustering but preserves clusters that are far apart. The resulting geo Bronnou Chti’coréeCamembert Brie Récollet Tomme de Rhuys BarkassTchic Grande Réserve Fromage blancBûche Brie de Melun Portion Saint-Grégoire Bûche St Marcelin Roquefort Brosse word cloud, as can be seen in Fig. 6b, mediates between these two Claquebitou Sérac FaisselleBeaussay P’tit Paul Cabotin Munster Val D’osseux extremes. Multiple regions where tags are used can be observed, Sivry Cancoillotte Crottin Colombier de Sivry Pyramide Pyramide Crottin Montgaren St JaouaGouda Nanteau but the number of words per tag are not too large. Crémeux de Rhuys Chabis Nivernais Blanchardon Galochard P’tit Charo Montagnard Chabis TrempetteBleu

Bouchon PaletTomme Charolais Table 5 shows the coverage average error per point, percentage Abondance Long Feuilles du Limousin Bouchon

Bouton de Culotte of words not represented and the symmetric difference for three dif- Brique Calou Brique Bûche Bleu d’Avèze Bleu de chèvre Gaperon Pyramide Belledochon Beaufort Bleu Cabrichon St Félicien Bleu d’Auvergne Servagnet ferent types of clustering. The average error increases when points Le Mazet St Méaudrais Abondance Bouchon Savoureux Bleu de pays Bleu are clustered. This means that more points are placed further away Gruyère Faisselle Picodon Miaulante Bleu du Vercors-Sassenage Morbier Galet

SainteFromage Maure de Touraine blanc Raclette Beaufort from the position where they should be placed. This is as expected, Trésor Gruyère Reblochon Ramier rouxSaint Nectaire St Marcellin

Caillou Tomme Camembert Raclette CaillatComté GascounetBassoise Brique Morbier since now a point may have to be placed further away to cover other Vacherin fribourgeois Chabichou Bûche Fourme d’AmbertPavé points of the bigger cluster as well. PélardonPyramide If no clustering is applied multiple words cannot be placed at Figure 5: Allowed rotations based on size of words with smaller their position. These words have the same position, such that the words having more allowed rotations. words have to be scaled down in order to place them. When words are clustered less scaling is required, because the word can be placed closer to their position. The result is that the percentage 5.2 Initial font size of words not represented decreasing when applying clustering. The symmetric difference is lowest when k-means clustering is R4 of the problem formalization states that a word with k points should cover a total area as close as possible to kM0/n, where n is applied (see Table 5). When more clustering is used the clusters are the total number of points and M0 is the total area of the map. This suggests to start with a font size of a word such that this requirement is satisfied. We define this as the 100% initial font size. Coverage Words not Symmetric Clustering type During the algorithm some words cannot be placed nearby. A error represented difference word of this kind is scaled down and reinserted into the list of words No clustering 6.02 % 23.62 % 27.05 % to handle. Scaling a word without also scaling other words intro- k-means clustering 7.31 % 11.28 % 22.26 % duces relative scaling errors. We try to prevent this if possible. This One cluster per word 11.18 % 11.22 % 24.51 % can be done by starting with smaller initial font sizes. Table 4 shows the coverage average error per point, percentage of words not represented and the symmetric difference when starting Table 5: Metrics on type of clustering (France) Boulet de Cassel Vinageois Ch’ti Gris Fromage de Bergues Petit Vinageois St Paulin Jolirond Flocon de lait Jolirond Camembert Fromage du Vinage Ch’ti Gris Mimolette Chti’crémeux Fromage blanc Mont Cornet Petit Coeur de Rety Pavé Carré du Vinage St Piat Petit Crémé Vinadoux PavéMont Cornet Petit Crémé Vieux Roncq Carré du Vinage Chabicou Jolirond Carré du Vinage Carré du Vinage Vieux Roncq Camembert Petit Vinageois Fromage blanc Pavé Vinageois Brique blanc Fromage St PaulinChti’crémeux St Piat Chti’corée StFromagePaulin du Vinage Bergues Pierre Blanche Pierre Blanche Flocon de lait Pierre Blanche rmg blanc Fromage Tomme Petit Coeur de Rety Saint Piat Tomme RécolletVinadouxFromage du Vinage Maroilles Mimolette FromageRécollet de Bergues Boulet de Cassel Boulette de Cambrai Boulet de Cassel Fromage blanc Petit Vinageois TommeTommeFromage de BerguesTomme Bergues St Piat Munster Camembert Maloine Tomme ValaineRécollet Saint Piat Valaine Boulette de Cambrai Maloine Gruyère Tèt’ de rep’ Fromage blanc Chti’corée Faisselle Camembert Camembert MaroillesChabicou Tèt’ de rep’ Brique Pavé Rouge fort Maloine Piemont Vieux Roncq PiemontPavé Récollet Vinadoux St PaulinSt Paulin Fromage blanc Vinageois CamembertVinadoux Mimolette Chabicou Valaine Petit Crémé Rouge fort Crottin Fromage blanc BargkasFéta Piemont Grande Réserve

Munster Reblochon Tomme Grande Réserve Saint-Grégoire Féta Petit Coeur de Rety Maroilles Abondance Ch’ti Gris Mimolette Faisselle Nivernais Tomme Féta Camembert Brie de Melun Randonneur Fromage blanc Brie de Meaux MunsterRécollet Brie de Meaux Bergues Boulette de Cambrai Crémeux de Rhuys Fromage blanc Fromage blanc Grande Réserve Tomme Sérac Chti’corée Flocon de lait Tomme Nanteau Randonneur Camembert Monplaisir St Jaoua Bronnou Rouge fort Brie Tèt’ de rep’ Brébizou Nanteau TchicAbondance Pyramide BrieFromage blanc Raclette St Jaoua Roquefort Saint Piat Brie Brosse Crottin Claquebitou Fromage blanc Faisselle Perche Crémeux de Rhuys Monplaisir Fromage blanc Chti’crémeux Crottin TommeFromage blanc Barkass Bronnou Beaussay St Marcelin Fromage blancTomme Brie de Melun Pyramide Perche Tomme Tomme Crottin FaisselleTommeBargkas Brosse St Jaoua Roquefort Bargkas Tomme de Rhuys Camembert Bûche Roquefort Brie de Melun Fromage blanc Pavé Camembert Tomme Raclette Beaussay Tchic Gouda Faisselle Tomme CrottinSaint-Grégoire Fromage blanc Val D’osseux Cancoillotte Pyramide Claquebitou Fromage blanc Perche Fromage blanc Tchic Bûche Nanteau Barkass Bûche Faisselle Bûche Barkass Claquebitou Gouda Tomme Faisselle Raclette Camembert P’tit Paul Brébizou Tomme Saint-Grégoire Faisselle Brosse Monplaisir Gouda Pyramide Crottin Tomme Bûche Sivry Bronnou Raclette Fromage blanc Munster Crémeux de Rhuys Galochard Faisselle Faisselle Faisselle Val D’osseux Nivernais Munster Fromage blanc Cancoillotte Tomme de Rhuys Brie de Meaux Mont Cornet Brique BûcheCrottin Randonneur

Raclette Faisselle Brique Crottin Faisselle Fromage blanc Bouchon Pyramide Sérac Faisselle Colombier deCrottin Sivry FaisselleChabisP’tit PaulCrottin Tomme Tomme Montagnard Fromage blanc Fromage blanc Pyramide Sivry Montgaren Sainte Maure de Touraine Pyramide Tomme CrottinPyramide P’tit Charo Bûche BriqueTomme MontagnardSivryColombier de Sivry Cabotin Fromage blanc Gruyère Camembert Val D’osseux Cancoillotte Beaussay Tomme Tomme Tomme Brébizou Palet Colombier de Sivry Faisselle Crottin Chabis Trempette Fromage blanc AbondanceTomme Nivernais Galochard Trempette FaisselleLong Bleu d’Auvergne St Marcelin Sainte Maure de Touraine CabotinCrottin Fromage blanc Cabrichon

Tomme MontagnardBleu Galochard Brique Chabis Reblochon FaissellePalet RaclettePortion Gruyère Chabis Tomme Tomme PaletTomme Cabrichon Pyramide BriqueRaclette Reblochon P’tit Paul CrottinGaperon Chabis BûcheTomme Pyramide CabrichonTomme Reblochon Sérac Bouchon Long Portion Faisselle Faisselle Fromage blanc Crottin Reblochon Munster Bleu d’Avèze Bouchon Tomme Montgaren St Méaudrais P’tit Charo Fromage blanc Belledochon St MarcelinBassoise Faisselle Pyramide St Félicien Bûche Reblochon Gaperon Bouton de Culotte Fromage blanc Calou Abondance Bûche Pyramide Faisselle BleuFromage blanc Calou Bleu de chèvre Bûche Brique Tomme Bleu de chèvre Camembert Caillat Gruyère Tomme Saint Nectaire Belledochon Tomme TommeBleu du Vercors-Sassenage Beaufort Bouchon Pyramide Bouton de Culotte Bûche Bouchon Fromage blanc Faisselle Fourme d’Ambert Bleu de pays Tomme CharolaisPortion

Fourme d’Ambert Beaufort Bleu Tomme Brique Servagnet Long Gaperon Savoureux CaillatCrottin P’tit Charo Brique Servagnet St Marcellin Faisselle Raclette Cabotin Reblochon Tomme Blanchardon Camembert Calou Pyramide Fromage blanc BleuBleu du Vercors-Sassenage Bassoise Bleu St Marcellin Feuilles du Limousin BleuFromage blanc Morbier Faisselle Faisselle TommeBeaufort Blanchardon Bouchon Bleu du Vercors-Sassenage Saint NectaireTomme Sainte Maure de Touraine Caillou Faisselle BouchonCaillat Saint Nectaire GaletFromage blanc Faisselle Faisselle St Méaudrais Charolais Picodon Fromage blanc Bleu d’Avèze Fromage blanc Fourme d’AmbertPicodonBleu Picodon PyramideSt Félicien Tomme Bleu Faisselle Bleu du Vercors-Sassenage Fromage blanc St MarcellinTrésor Bleu du Vercors-Sassenage Bleu de chèvre Trempette Faisselle St Félicien Bleu de pays Miaulante Bassoise TommeFromage blanc BriqueSt Marcellin TommeCrottinTomme Fourme d’Ambert Miaulante Morbier Galet Comté Raclette Raclette Fromage blanc Tomme Brique Faisselle Charolais Crottin St Méaudrais Le Mazet Montgaren

Fromage blanc Raclette Crottin Tomme Ramier roux Caillou Bleu d’Auvergne Caillou Saint Nectaire Tomme MorbierRacletteTomme Vacherin fribourgeois Raclette Trésor Abondance Morbier Ramier roux Bleu d’Avèze Raclette Gruyère Tomme Pyramide Bleu de pays Servagnet Pyramide Pélardon Morbier Abondance Le Mazet St Marcellin Tomme Bleu d’Auvergne Tomme Ramier roux Gruyère Raclette Comté Savoureux Pavé Beaufort Belledochon Trésor Brique GruyèreVacherin fribourgeois Bleu BeaufortBûche Reblochon Feuilles du Limousin Gruyère Bouton de CulotteBlanchardon Miaulante Galet Crottin Tomme de Rhuys Comté Vacherin fribourgeois ChabichouPélardon Le Mazet Gruyère Gascounet Feuilles du Limousin Gascounet FaisselleTomme Gascounet Savoureux Chabichou Chabichou Tomme Pélardon Brique Pavé Bûche FaisselleBûche Pélardon

Tomme Pélardon (a) No clustering. (b) k-means. (c) Single cluster per word.

Figure 6: Geo word clouds of French : extreme cases for clustering. bigger, the words will have a larger font size, making them harder set and 60 minutes on the Great Britain plus Ireland data set when for words of similar size to stack nicely and fill up the map. the allowed rotations are 90°. Increasing the allowed rotations to We conclude that the extreme cases of clustering do not work 10° increases the running time by 60%. Preprocessing the data well for our problem and that our clustering is a better fit. set including clustering and normalizing takes only seconds for the France data set, but takes several minutes for the Great Britain and 5.4 Word placement Ireland data set, which has thousands of annotated images. To illustrate how our algorithm places words, we generated point maps which show the clusters and placements for the first three 6 EXTENSIONS words of the French cheese data set (see Fig. 7). Tomme is the most Our current algorithm focusses on finding a word cloud where the frequent cheese and hence the first to be placed. The clustering distance from data points to their visual representation is low. We generates two clusters of points, indicated by disks and crosses in discuss several extensions that expand or change the current re- Fig. 7a. The right Tomme is placed next to the boundary of France quirements. Furthermore, we discuss different kinds of data that such that it covers many disks. The locations for the left Tomme are would also be suitable for this kind of visualization. more widespread, so there is no unambiguous location for Tomme. A natural extension of our current requirements would take into It is placed as well as possible. account the directional relations between words. A second possibil- The second most frequent cheese is Faisselle. Also here the clus- ity is to determine the font size of a word not by the size but by the tering returns two clusters (see Fig. 7b). The right Faisselle cannot area of the corresponding point cluster. Hence a spread out cluster be placed at its optimal location, because Tomme is already placed of points would be represented by a larger word than tightly clus- and Faisselle is too large for the space left. Given this restriction, tered points. This kind of requirement deviates further from tradi- Faisselle is placed such that it covers as many points as possible. tional word clouds (where size directly corresponds to frequency), The third most frequent cheese is Crottin. Its locations are also but would put more emphasis on the spatial properties of the data. divided into two clusters. The points for the right Crottin are Alternatively, we could use a Voronoi diagram of the data points spread. Given that Tomme and Faisselle are already placed, Crottin and use the area of the Voronoi cells of a cluster to determine the is placed such that it covers as many points as possible (see Fig. 7c). font size. As the size of the point set increases, the sum of the areas can be expected to approximate frequency. 5.5 Running time We could also create geo word clouds for statistical data which is The experiments are performed using Sage Math Cloud. The place- aggregated by area (like wine production or population). We have ment algorithm runs on average in 30 minutes on the France data two clear options for spatially informative word clouds using this

Carré du Vinage Chti’corée Chti’crémeux Boulet de Cassel Chti’corée Boulet de Cassel Brique Boulet de Cassel Petit Coeur de Rety Brique Flocon de lait Brique Ch’ti Gris Ch’ti Gris Ch’ti Gris

Fromage du Vinage Petit Coeur de Rety Camembert Petit Vinageois Pavé Camembert Carré du Vinage Pavé Fromage blanc Pierre Blanche Pavé Petit Crémé Fromage blanc Fromage blancVieux Roncq Mont Cornet St Piat Petit Crémé Fromage du Vinage Pierre Blanche St Piat Petit Vinageois Mont Cornet Vinageois St Piat Saint Piat Mont Cornet Bergues Rouge fort Jolirond Barkass Maloine Flocon de lait Vieux Roncq Boulette de Cambrai CamembertCamembert Jolirond Vieux Roncq Boulette de Cambrai Jolirond Vinageois Boulette de Cambrai Flocon de lait Petit Crémé Maloine Vinadoux Fromage de Bergues Chti’crémeux Récollet Vinadoux Fromage de Bergues Carré du Vinage Bergues Chti’corée Bergues Vinadoux Fromage blanc Pyramide Chti’crémeux Pierre Blanche Valaine Mimolette Fromage blanc Fromage de Bergues Mimolette Brie de Meaux Piemont Valaine Brie de Meaux Fromage du Vinage Valaine Maroilles Tèt’ de rep’ Piemont Piemont Perche Maroilles Tèt’ de rep’ St Marcelin Tèt’ de rep’ Petit Vinageois Portion ReblochonRécollet Cabotin Reblochon Reblochon Saint Piat Chabicou Grande Réserve Chabicou Récollet Récollet St Paulin Maroilles Saint Piat Grande Réserve Grande Réserve Camembert St Paulin St Paulin Récollet Brosse Monplaisir Féta Brosse Roquefort Féta Féta Récollet Rouge fort Nanteau Brosse Sainte Maure de Touraine Rouge fort Mimolette Petit Coeur de Rety Brébizou Nanteau Pavé Brébizou Beaussay Monplaisir Pavé Monplaisir St Marcelin Brie deBrie Melun Sivry Brie Brébizou Nanteau Pavé Bronnou Brie de Melun Brie Camembert Tchic Bronnou St Marcelin Barkass Bronnou Brie de Melun Maloine Bouchon Randonneur Vinageois Camembert Tchic Pyramide St Jaoua Roquefort Randonneur Chabicou Tchic St Jaoua St Jaoua Bargkas Saint-Grégoire Camembert Camembert P’tit Paul Saint-Grégoire Roquefort Gouda P’tit Charo Saint-Grégoire Bûche Munster Perche Munster P’tit Charo Bûche Val D’osseux Bûche Perche Munster

Crottin blanc Fromage CrémeuxFaisselle de Rhuys Camembert Crottin Brie de Meaux Sainte Maure de Touraine Bûche Faisselle FaisselleCrottin Pyramide Gouda Bûche Bûche Claquebitou Bargkas Gouda Tomme de Rhuys Tomme de Rhuys Portion Bargkas Tomme de Rhuys Portion Barkass Colombier de Sivry Crémeux de Rhuys Colombier de Sivry Crémeux de Rhuys Randonneur Pyramide Colombier de Sivry ChabisPyramide Cancoillotte Sivry Pyramide Pyramide Cancoillotte Beaussay Sivry Cancoillotte Beaussay Brique Brique Galochard P’tit Paul Chabis Camembert Galochard Camembert Galochard Claquebitou Claquebitou Val D’osseux Bouton de Culotte Chabis Bouton de Culotte Chabis Charolais Val D’osseux Bouton de Culotte

Bleu du Vercors-Sassenage Bûche

Bouchon Bûche Montagnard Montagnard Long Bûche Trempette Long Montagnard ChabisTommeNivernais Bleu de chèvre Chabis Blanchardon P’tit Paul P’tit Charo Cabotin Belledochon Long Galet Tomme Feuilles du Limousin Tomme Crottin Abondance Pyramide Bleu de pays Montgaren Galet Crottin Gruyère CrottinGaletGruyère Bleu d’Auvergne FromagePyramide blanc Trempette Fromage blanc Trempette Pyramide Servagnet Palet Cabrichon Bouchon Cabrichon Palet Cabrichon Palet Pyramide Belledochon Gaperon St Marcellin Raclette Beaufort Nivernais Nivernais Belledochon

Raclette Bouchon

Camembert Sérac Raclette Bouchon Bleu d’Auvergne Bleu de chèvre Cabotin Gruyère Sérac Bleu de chèvre Brique Bleu d’Avèze Sérac Fromage blanc Blanchardon Morbier Beaufort Charolais Bleu Bleu d’Avèze Bleu Morbier Bleu Beaufort Fourme d’Ambert Savoureux Bleu Sainte Maure de Touraine St Méaudrais Morbier St Méaudrais Gaperon St Méaudrais Bleu Abondance Bouchon Calou PyramideFeuilles du Limousin Bassoise Bleu d’Avèze Abondance Brique Charolais Bleu de pays Gaperon Faisselle Bouchon Brique Bassoise Brique Feuilles du Limousin Faisselle Blanchardon Faisselle Bleu d’Auvergne

Bleu Bleu du Vercors-Sassenage Raclette Bouchon St Félicien Calou Bleu du Vercors-Sassenage Bassoise Caillat Saint Nectaire Gruyère Savoureux Caillou Raclette Caillou St Félicien Caillou Ramier rouxRaclette Gruyère Bleu de pays Reblochon Reblochon Miaulante Comté Gascounet Montgaren Camembert Montgaren Saint Nectaire Bleu Vacherin fribourgeois St MarcellinCamembert Picodon Comté St Marcellin Miaulante Pélardon Servagnet Vacherin fribourgeois Comté Gascounet Morbier Le Mazet Gruyère Bleu Le Mazet Gruyère Servagnet Bleu Vacherin fribourgeois Picodon CalouTommeSt Félicien Abondance Gruyère Gruyère Chabichou Reblochon Saint Nectaire Pélardon TrésorTommeMorbier Ramier rouxPicodon Pélardon Tomme Bouchon Abondance Trésor Morbier Trésor Ramier roux Miaulante Gascounet Savoureux Abondance Beaufort Fourme d’Ambert Caillat Fourme d’Ambert Caillat Beaufort Beaufort FromageLe MazetPavé blanc Fromage blanc Fromage blanc Chabichou Pavé Chabichou Pavé BriqueBûche Pyramide BriqueBûche BriqueBûche Pyramide Pyramide (a) Placement of Tomme. (b) Placement of Faisselle. (c) Placement of Crottin.

Figure 7: Point maps for Geo word clouds of French cheeses: placing the first three words. kind of data. First, we could define a good placement of the word as under project no. 612.001.207 and 639.023.208, respectively. We one where the word overlaps with the area in some way. If the word thank the primary reviewer for helpful comments on the paper. fits in the area, the best placement would be inside it, otherwise completely covering the area would be the best placement. REFERENCES We can fit this new problem in the problem description we are currently using. If we overlay the map with a grid and put a data [1] L. Barth, S. I. Fabrikant, S. G. Kobourov, A. Lubiw, M. Nollenburg,¨ point on each grid point, we get more data points in big areas and Y. Okamoto, S. Pupyrev, C. Squarcella, T. Ueckerdt, and A. Wolff. less data points in smaller areas. We can use our current algorithm Semantic word cloud representations: Hardness and approximation al- to produce an output on the data points we get in this way, to solve gorithms. In Proc. 11th Latin American Theoretical Informatics Sym- posium (LATIN), volume 8392 of Lecture Notes in Computer Science, the problem for areas. Another option is to always require words to pages 514–525, 2014. be inside the area. This way, we create small instances of the bigger [2] K. Buchin, B. Speckmann, and S. Verdonschot. Evolution strate- problem within the different areas of the map. gies for optimizing rectangular cartograms. In Proc. 6th International Fig. 2 shows geo word clouds where the relative scaling between Conference on Geographic Information Science (GIScience), volume Great Britain and Ireland is maintained. Here Ireland is sparsely 7478 of Lecture Notes in Computer Science, pages 29–42, 2012. filled, because the data set contains less points for Ireland. We could [3] M.-T. Chi, S.-S. Lin, S.-Y. Chen, C.-H. Lin, and T.-Y. Lee. Morphable extend the algorithm to scale words relative to the area of their com- word clouds for time-varying text data visualization. IEEE Trans- ponent instead of to the complete map (see Fig. 8). We could also actions on Visualization and Computer Graphics, 21(12):1415–1426, combine our approach with contiguous area cartograms [7], to dis- 2015. tort regions to match the required size. [4] W. Cui, Y. Wu, S. Liu, F. Wei, M. X. Zhou, and H. Qu. Context pre- serving dynamic word cloud visualization. In Proc. 3rd IEEE Pacific Visualization Symposium (PacificVis), pages 121–128, 2010. park Edinburgh church festival [5] D. Dorling. Area Cartograms: their Use and Creation, volume 59 of landscape

wedding castle street rocks Concepts and Techniques in Modern Geography. University of East

square Scotlandpeopleurban rock Anglia, Environmental Publications, Norwich, 1996. night city bridge reections [6] M. Formann and F. Wagner. A packing problem with applications to winter music nature road clouds museum

house united lettering of maps. In Proc. 7th International Symposium on Computa- trees sunset birds garden sunrise tional Geometry (SOCG), pages 281–288, 1991.

ower Cumbria harbour

building lake

Glasgow grass Scotland [7] M. Gastner and M. Newman. Diffusion-based method for producing autumn north snow lights tree bird old bridge density-equalizing maps. Proceedings of the National Academy of architecture

Britain orange white car dark travel Olympus England road sun sunrise British sunset tree railway travel nature seaseaside clouds summer rocks art carowers art red railway cold black Britain blue river sand Ireland boat Yorkshire Sciences of the United States of America (PNAS), 101(20):7499–7504, rock light light water wildlife church northern lake night sand black winter lm towerreection summer trees water square grass museum park gig

dark blue ower lights 2004. beach black spring

countryside Manchester sun spring north sky Kerry night landscape

building north white

beach Britain Great grati red

Yorkshire city

snow cold building

eld river

county wildlife autumn spring eld lake Londres ntdKingdom United [8] A. Jaffe, M. Naaman, T. Tassa, and M. Davis. Generating summaries orange yellow

ower sky white

green north coast night seaside castle

yellow lights white music wedding snow nature nature bird sunset live boat orange

river Ireland music bridge art leaves road snow and visualization for large collections of geo-referenced photographs. nature rock park road girl

Britain travel boat museum leaves

reection

architecture grati cold beach festival wildlife old Birmingham grass clouds railway house bridge street lights rocks

snow tower British Proc. 8th ACM International Workshop on Multimedia Information girl In wales festival sea tower

yellow bird urban leaves grati

coast countryside

rocks sea lm church gig girl

Londonower art Liverpool gig coast sun spring road birds sun live winter summer tree green church eld sky rock black street city sand Retrieval, pages 89–98, 2006. dark live tree Essex old museum green

music England car ower trees gig castle owers square sea wildliferiver sand wedding urban car dark

landscape water castle seaside house owers cold black night seaside

grass travel architecture British live square [9] M. Leginus, C. Zhai, and P. Dolog. Personalized generation of word wedding bird Olympus

beach light united city spring blue clouds orange bridge Dublinrailway green garden old autumn building sky

urban Bristol square summer

park beach sunrise light British sunset blue building bridge lm red trees autumn clouds from tweets. Journal of the Association for Information Sci- birds tree cold seaside lake harbour light garden sun leaves towerEngland tower birds reections lights art park wildlife people rock architecture waterreection street railway wedding street lm car lm music ence and Technology, 2015. orange grati red

harbour owers united rocks treesunited Thames Kent museum girl Sussex Dorset Hampshire [10] H. Mousselly-Sergieh, D. Watzinger, B. Huber, M. Doller,¨ E. Egyed- spring dark lake Brighton urban riverEngland snow wales citysunrise boatwinterBritain Zsigmond, and H. Kosch. World-wide scale geotagged image dataset Devon wales ocean Cornwall for automatic image annotation and reverse geotagging. In Proc. 5th ACM Multimedia Systems Conference, pages 47–52, 2014. Figure 8: Geo word cloud of Flickr tags in Great Britain and Ireland [11] E. Raisz. The rectangular statistical cartogram. Geographical Review, after manually merging the countries (rotations of 10 degrees). 24:292–296, 1934. [12] C. Seifert, B. Kump, W. Kienreich, G. Granitzer, and M. Granitzer. On the beauty and usability of tag clouds. In Proc. 12th International 7 CONCLUSIONS Conference on Information Visualisation, pages 17–25, 2008. [13] A. Slingsby, J. Dykes, J. Wood, and K. Clarke. Interactive tag maps We introduced geo word clouds and experimentally evaluated them and tag clouds for the multiscale exploration of large spatio-temporal with respect to trade-offs between different requirements and qual- datasets. In Proc. 11th International Conference on Information Visu- ity criteria. We found that orthogonal orientation of words yields alization, pages 497–504, 2007. the most compact drawings. By using an initial font size of 90% [14] W. Stein et al. Sage Mathematics Software (Version 6.7). The Sage of the optimal size we can reduce the displacement error at the cost Development Team, 2015. http://www.sagemath.org. of a slightly less filled map. We achieved this using a greedy algo- [15] M. Tennekes and E. de Jonge. Tree colors: color schemes for tree- rithm. We also considered a force-directed approach to place the structured data. IEEE Transactions on Visualization and Computer words without relative scaling, but this did not improve the place- Graphics, 20(12):2072–2081, 2014. ment, in particular since we have to deal with initially massively [16] M. van Kreveld and B. Speckmann. On rectangular cartograms. Com- overlapping words. putational Geometry: Theory and Applications, 37(3):175–187, 2007. The coverage of the map is also influenced by the clustering al- [17] F. B. Viegas, M. Wattenberg, and J. Feinberg. Participatory visualiza- gorithm chosen, we found that k-means clustering performs best. tion with Wordle. IEEE Transactions on Visualization and Computer The final geo word clouds produced by our algorithm add spatial Graphics, 15(6):1137–1144, 2009. information lacking in regular word clouds. Further studies should [18] A. Wolff and T. Strijk. The map labeling bibliography. http://li establish how this extra information can be utilized. inwww.ira.uka.de/bibliography/Theory/map.label ing.html, 2009. Acknowledgements. K. Buchin and B. Speckmann are supported by the Netherlands Organisation for Scientific Research (NWO)