The impact of the visual perception of shaded and flat objects on the usability of almost flat user interfaces

Kevin O’Connell

MSc User Experience Design
Dún Laoghaire Institute of Art, Design and Technology
Supervisors: Dr. Hilary Kenna, Dr. Andrew Errity
April 2018

Declaration Statement

I hereby certify that the material, which I now submit for assessment on the programme of study leading to the award of Master of Science, is entirely my own work and has not been taken from the work of others except to the extent of such work which has been cited and acknowledged within the text of my own work.

No portion of the work contained in this thesis has been submitted in support of an application for another degree or qualification to this or any other institute.

______

Kevin O’Connell

30 April, 2018

Acknowledgments

I would like to thank my family for their support during the many months of work to produce this research.

Many thanks to Joe O’Donaghue, who assisted me with the JavaScript component of the web application. I could not have completed the project without his valuable input.

Thanks to Pat Hamilton and Peter Johnson for proofreading my work.

Thanks to friends who assisted me with early user testing, and thanks to Sue Reardon who assisted me with IADT testing.

Finally, the support provided by my project supervisors and IADT staff was unparalleled. The measured input of Hilary Kenna and Andrew Errity refined my approach to the research question while expanding its scope, providing for what was to be a highly challenging and rewarding research experience.


Abstract

Studies into the visual design of user interfaces indicate that the dilution of signifiers on almost flat interfaces can lead to ‘click uncertainty’ (Meyer, 2015). This research study examines the impact of the visual perception of shaded and flat objects (signifiers) on the usability of almost flat user interfaces. Specifically, the study looks to answer the question “Can the use of shaded objects improve the visual perception of perceived affordances (signifiers) in almost flat user interfaces?”

Prior research on the perception of shape from shading suggests that the application of shading can enhance the clickability, discoverability and findability of elements on almost flat interfaces. However, these studies fall short in investigating visual perception with respect to higher fidelity interfaces that utilise Gestalt visual design principles.

This thesis details a methodology for the examination of the visual perception of perceived affordances in an almost flat user interface. The project extended prior research through the use of A/B testing of two web sites. The sites tested the visual perception of shaded (A test) and flat (B test) objects (signifiers). Thirty-six participants were recruited for between-subjects testing. Testing measured response times for shaded (A test) and flat (B test) target identification. The test artefacts utilised Gestalt visual design guidelines, coupled with gamification principles.

The results suggest that there is no significant difference in response times between shaded and flat target identification, which would indicate that shaded targets do not improve the visual perception of perceived affordances in almost flat user interfaces. This contradicts prior research by Creager (2017), in which experiments revealed that shaded objects had the benefit of high findability in almost flat environments, and that convex gradients consistently conveyed a sense of depth and a raised three-dimensional shape, which was effective for reliably signifying clickability. The use of gamification principles, however, was viewed positively by the test participants.


Table of Contents

1 Introduction
  1.1 Keywords
2 Literature review
  2.1 Introduction
  2.2 Visual Perception
    2.2.1 Ganzfeld and visual perception
    2.2.2 Gestalt
  2.3 Affordances and Signifiers
  2.4 Skeuomorphism
  2.5 The historical basis of Flat and Almost Flat design
    2.5.1 Modernism
    2.5.2 International Typographic Style (Swiss Style)
    2.5.3 Pioneers and influences of the movement
    2.5.4 How Modernism and International Typographic Style influenced contemporary user interface design
  2.6 Flat design
  2.7 Almost flat design/Skeuominimalism
  2.8 Summary of existing research
  2.9 Findability, clickability and discoverability: use of shaded gradients
  2.10 Research problem identified
    2.10.1 Overview
    2.10.2 Perception of shape from shading
    2.10.3 Perceptual biases in the interpretation of 3D shape from shading
    2.10.4 Toward Understanding the Findability and Discoverability of Shading Gradients in Almost-Flat Design
    2.10.5 Understanding the Findability and Perceived Clickability of Shaded and Flat Objects in Almost-flat Interfaces
  2.11 Research Question
3 Research methodology
  3.1 Colour and contrast
  3.2 Targets and distractors
  3.3 Procedure
  3.4 Required prototype(s), techniques and technologies used to prototype
    3.4.1 Investigation of existing reaction games
    3.4.2 Techniques and technologies used to prototype
    3.4.3 Choice of form factor
    3.4.4 Advantages and disadvantages of each form factor
  3.5 Design methodology
  3.6 Testing methodology
    3.6.1 Testing during application development
    3.6.2 Testing for the experiment: recruiting testers, ages and capabilities
    3.6.3 Screening test candidates
    3.6.4 Conducting the tests
    3.6.5 Environmental setup
    3.6.6 Data collection
  3.7 Ethical and legal considerations
4 Design & Implementation
  4.1 Paper prototypes
    4.1.1 Application flow
    4.1.2 Gamification
    4.1.3 Choice of form factor
  4.2 Wireframe prototypes
    4.2.1 Balsamiq Mockups
    4.2.2 Adobe Xd wireframes
  4.3 Target/distractor arrays
  4.4 Implementation
    4.4.1 App Inventor
    4.4.2 HTML/CSS/JavaScript/PHP/MySQL
  4.5 Data collection
5 User testing
  5.1 Exploratory study: wireframe prototype
    5.1.1 Testing with peers
    5.1.2 Testing using Amazon Mechanical Turk/Loop 11
    5.1.3 Results from peer testing with Amazon Mechanical Turk/Loop 11
    5.1.4 Exploratory study: recommendations for change
  5.2 Summative test: first iteration of the application
    5.2.1 Testing with Amazon Mechanical Turk/Loop 11
    5.2.2 Results from testing with Amazon Mechanical Turk/Loop 11
  5.3 Comparison test: application deployment
    5.3.1 Testing with Amazon Mechanical Turk/Loop 11 and IADT students
6 Comparison (A/B) test: results
  6.1 Descriptive statistics
    6.1.1 Pre-test questionnaire data
    6.1.2 Post-test questionnaire data
    6.1.3 Analysis of experiment test data
  6.2 Inferential statistics: analysis of mean response time for all participants
    6.2.1 H1: There will be no difference in click times between shaded and unshaded objects in tests with abstract signifiers
    6.2.2 H2: There will be no difference in click times between shaded and unshaded objects in tests with abstract signifiers that include Gestalt principles
    6.2.3 H3: There will be no difference in click times between shaded and unshaded objects in tests with signifiers present on user interfaces derived from existing web sites
7 Discussion
  7.1 Overview
  7.2 Research findings
    7.2.1 H1: There will be no difference in click times between shaded and unshaded objects in tests with abstract signifiers
    7.2.2 H2: There will be no difference in click times between shaded and unshaded objects in tests with abstract signifiers that include Gestalt principles
    7.2.3 H3: There will be no difference in click times between shaded and unshaded objects in tests with signifiers present on user interfaces derived from existing web sites
    7.2.4 Use of gamification principles as outlined by Kumar (Kumar, 2013)
8 Conclusions
  8.1 Overview
  8.2 Key contributions
  8.3 Limitations
  8.4 Future research
  8.5 Project reflection
  8.6 User research and design artefacts
References
Appendix A
Appendix B
Appendix C
Appendix D
Appendix E
Appendix F
Appendix G
Appendix H
Appendix I
Appendix J
Appendix K
Appendix L
Appendix M
  Mean response time for test tasks: null hypothesis
  Test for normality
  Mean response times for test tasks (all targets)
  Mean response times for test tasks (abstract targets only)
  Mean response times for test tasks (screen grab targets only)
Appendix N


1 Introduction

The following research investigates the impact of the visual perception of shaded and flat objects on the usability of almost flat user interfaces.

In 2012, Microsoft, with the release of the Windows 8 operating system, pioneered a general shift in user interface design to a ‘flat’ aesthetic and away from skeuomorphism. Subsequently, Apple and Google also adopted a ‘flat’ aesthetic. More recently, Microsoft and Google have iteratively updated their visual design languages to employ an ‘almost flat’ or ‘skeuominimalistic’ aesthetic.

This dissertation investigates the changes in contemporary design languages and looks to identify if the move to flatter user interfaces has led to a dilution of perceived affordances. The literature review examines affordances and contemporary design languages, while the research problem outlines prior studies relevant to the research questions.

Prior research by Ramachandran (1988), Liu and Todd (2004), Creager and Gillan (2016), and Creager (2017) outlines the importance of shaded shapes in visual perception, principally with regard to the findability and discoverability of perceived affordances (signifiers) in almost flat user interfaces. This study looks to extend that work and to test the visual perception of shaded shapes within the context of a contemporary user interface, in conjunction with Gestalt and gamification principles.

The report begins with an investigation of the key Gestalt visual perception principles and discusses affordances and signifiers. The historical basis of flat and almost flat design is discussed, examining skeuomorphic, flat, almost flat and skeuominimal user interfaces.

Prior research is discussed and a research question is formulated. Research, design and testing methodologies are detailed. Details of the design and user testing of the experiment are explained, followed by an examination and discussion of the test results. Finally, conclusions are discussed.

1.1 Keywords

Visual perception, perceived affordance, findability, discoverability, clickability, almost flat design, curved surfaces, gamification, games, gaming, skeuomorphic, skeuomorphism, flat, almost flat, skeuominimal, Gestalt, shaded targets

2 Literature review

2.1 Introduction

The literature review begins with an investigation into visual perception, with an emphasis on the Gestalt principles. The review continues with an appraisal of affordances and signifiers, examining their relevance in modern user interfaces. Following from this, the review investigates skeuomorphic, flat and almost flat interfaces, in each case examining how the various visual approaches impact affordances and signifiers. The examination of flat and almost flat interfaces is prefaced with an exploration of the aesthetic and philosophical foundations that underpin the emergence of ‘Flat’ and ‘Almost Flat’ as de facto contemporary user interface standards.

2.2 Visual Perception

Visual preferences and visual styles may change, but visual perception does not. Human biology does not respond to contemporary consensus on what merits good design – our visual perception simply ‘is’. As designers, however, we can make decisions about how to facilitate visual perception and how to enable users, making the task of perceiving easier or more difficult. The presentation of user interfaces is underpinned by fundamental design principles, including the Ganzfeld effect and the Gestalt principles.

2.2.1 Ganzfeld and visual perception

In ‘Perception and Imaging’, Zakia (1997) noted that a completely homogenous visual field provides the viewer with nothing to fix upon. A homogenous visual field is referred to as a ‘ganzfeld’, German for ‘complete field’. Exposing a viewer to a completely homogenous visual field can cause disorientation, and in some cases can lead the viewer to experience hallucinations, a result of the brain amplifying neural noise that is then interpreted in the higher visual cortex.

User interfaces are not homogenous, but as noted by Zakia (1997), “Even when the visual field is not homogenous, perception of the observed scene may not be accurate unless appropriate visual cues are also present”. (p. 2)

So an objective of user interface design is to use appropriate visual cues to present interactive elements.


2.2.2 Gestalt

Gestalt psychology originated in Germany in the early 20th century. Brownie (2006) describes how Max Wertheimer, an early pioneer of Gestalt, formulated a framework to explain how we organise and group visual elements, specifically how visual elements are grouped to be perceived as wholes. Over time, a number of principles were developed by the Gestalt psychologists to describe the various ways in which elements can be grouped and perceived, including:

• proximity
• similarity
• continuity (good continuation)
• closure
• pragnanz
• common fate
• figure/ground

Within the scope of the literature review, the principle of figure/ground is most applicable to the perception of shaded objects, but it is also useful to outline the other Gestalt principles, as the principles work in concert.

Figure/ground

Lidwell, Holden, and Butler (2010) note that the figure/ground relationship is one of a number of Gestalt principles of perception. Our visual perception separates stimuli into figure elements – the objects of focus – and a ground element – an undifferentiated background. When there is a clear distinction between figure and ground, figure elements receive more attention than ground elements and are more findable. Where there is no clear distinction between figure and ground, the interpretation of elements alternates between figure and ground, and figure elements are less findable.

The black triangle in the white square in figure 2.1 represents a simple heterogeneous visual field. The image possesses a pair of distinguishable attributes that can be identified as figure and ground.


Figure 2.1. Homogenous and heterogeneous fields. From the author.

As noted by Zakia, there are several important observations regarding figure/ground relationships that also underpin the visual perception of user interfaces:

1. Even though figure and ground are on the same plane, the figure often appears nearer to the observer.
2. Figure and ground cannot be seen simultaneously, but can be seen sequentially.
3. Figure usually occupies an area smaller than does ground.
4. Figure is seen as having a contour; ground is not perceived as having a contour.

With respect to figure/ground relationships, Zakia uses a term borrowed from science and engineering – the ‘signal to noise ratio’ – to describe figure/ground quantitatively. Zakia notes that increasing the differences between figure and ground, or signal and noise, can increase the ‘vividness’ of the corresponding visual perception.

In the book Universal Principles of Design (2010), Lidwell expands on the concept of signal to noise ratio, noting that:

“Maximising signal means clearly communicating information with minimal degradation. Signal degradation occurs when information is presented inefficiently: unclear writing, inappropriate graphs, or ambiguous icons and labels” (p. 224), and “minimising noise means removing unnecessary elements, and minimising the expression of necessary elements”. (p. 244)


With regard to a user interface, in most cases we want to minimise the amount of time that it takes a viewer to resolve figure/ground relationships. Ambiguous figure/ground relationships will take more effort to resolve.

Figure 2.2. Figure/ground use in web design. The black and white contrast provides an unambiguous presentation of the site name ‘uncrate’. From Uncrate (n.d.).

Proximity

In Universal Principles of Design (Lidwell et al., 2010), proximity is defined as ‘the tendency to perceive elements that are closer together as being more related to each other than elements that are further apart’. (p. 196)


Figure 2.3. Example of proximity in web design. The use of columns and text alignment reinforces relatedness of items. From Amazon (n.d.).

Similarity

Zakia states that “visual elements that are similar (in shape, color, size, movement and so on) tend to be seen as related” (p. 40). So when we see things that are related we tend to group them and see them as patterns.

Figure 2.4. An example of similarity in web design. From Crazy Diamond (n.d.).


Continuity

Zakia notes that “visual elements that require the fewest number of interruptions will be grouped to form continuous straight or curved lines”. (p. 50)

Figure 2.5. Website demonstrating continuity: the watches are aligned in an arc that leads the eye from the lower left to upper right. From Apple (n.d.).

Closure

In Universal Principles of Design, closure is defined as “a tendency to perceive a set of individual elements as a single, recognisable pattern, rather than multiple, individual elements”. (p. 44)


Figure 2.6. The use of type demonstrates closure. Despite the breaks in the letter forms, the whole letter is interpreted. From NBC (n.d.).

Pragnanz

Pragnanz is the tendency to visually simplify the grouping of elements. In Universal Principles of Design, Lidwell et al. (2010) note that pragnanz is ‘a tendency to interpret ambiguous images as simple and complete, versus complex and incomplete’. (p. 144)

Figure 2.7. The simple flat illustrations in the Windows 10 settings demonstrate Pragnanz. From the author.


Common fate

Lidwell et al. (2010) define common fate as the tendency whereby ‘elements that move in the same direction are perceived to be more related than elements that move in different directions or are stationary’. (p. 50)

2.3 Affordances and Signifiers

The term "affordance" comes from the perceptual psychologist J. J. Gibson, who developed an ecological approach to perception in which affordances arise from direct perception (Still & Dark, 2013).

Norman (2013) expanded on the concept of affordance, explaining that “Affordances define what actions are possible” (p. XV). He further states that “The term affordance refers to the relationship between a physical object and a person (or agent)” (p. 11). Affordances define how we interact with objects in the physical world: a chair, for example, ‘affords’ sitting. A chair can also afford lifting, but not for all people – not for those too weak to lift it. Norman went on to define affordances as relationships (between the object and the agent) and noted that, to be effective, affordances need to be discoverable, or perceivable. Norman’s perceived affordances refer to the perceived properties of an object that suggest how one could use it.

Gaver (1991) discussed perceived affordances, stating that affordances are independent of perception – they exist even if they are not acted upon. He states that when affordances are perceptible, they offer a direct link between perception and action, and that false affordances lead to mistakes, so there needs to be perceptual information available in order to act upon an affordance. Hartson (2003) reinterpreted Norman’s concept of perceived affordance, equating it to cognitive affordance, or helping users with their cognitive actions.

Norman further expanded on affordances and perceived affordances to include the concept of signifiers. “Affordances define what actions are possible. Signifiers specify how people discover those possibilities: signifiers are signs, perceptible signals of what can be done.” (Norman, 2013, p. XV)

The concept of affordance has generated much analysis relating to its continued relevance in design. Still and Dark (2013) argued that perceived affordances are still relevant, and support perceptual processes developed over time through consistent interactions with the environment.

In the book About Face (Cooper, 2014), Cooper reinterpreted Norman’s concept of affordances, applying the concept of manual affordances or, in the context of user interfaces, virtual manual affordances. He notes that the trend towards flatter interfaces threatens ease of use by removing virtual manual affordances in pursuit of strict visual simplification.

Krug (2014, para. 1) argued the case for including affordances, stating that “the clearer the visual cues, the more unambiguous the signal”. He further stated that mobile devices reduce the opportunities to include affordances, given that the hover state or over state is not available on such devices. He noted that the Flat design trend removed visual distinctions, further diluting affordances.

By way of contrast, some argued that affordance is a concept that no longer has value. Oliver (2005), for example, stated: “It is argued that the concept has drifted so far from its origins that it is now too ambiguous to be analytically valuable”. Oliver went on to ask whether, despite the ambiguous nature of the term, it was still useful for designers to continue using it. He argued that it was not, noting that “Some coherent, plausible alternative is needed.” (p. 1)

Norman (2013) himself was of the opinion that his original use of the term had caused much confusion among the design community.

Soon designers were saying such things as, “I put an affordance there,” to describe why they displayed a circle on a screen to indicate where the person should touch, whether by mouse or by finger. “No,” I said, “that is not an affordance. That is a way of communicating where the touch should be.” (p. 13)

The issues around this misunderstanding of the term led Norman to clarify and extend his concept of affordance to include the ideas of perceived affordance and signifiers. Norman (2013) noted that “Perceived affordances help people figure out what actions are possible without the need for labels or instructions” (p. 13), and that “signifiers communicate where the action should take place”. (p. 14)

Additional research has questioned the need for affordances for ‘digital natives’. Oswald and Kolb (2014) carried out a survey relating to how the visual differences between iOS 6 (skeuomorphic) and iOS 7 (flat) were perceived, and whether those perceptions changed over time. The survey was repeated eight months later to determine if perceptions of the UI update had changed. Their results suggested that affordances were no longer required, noting that “affordances based on physical micro-metaphors are not needed anymore, because a majority of the users have been using iOS6 for several years and do not need metaphoric-physical hints anymore, for instance to find out how to unlock an iPhone”. (p. 6)


However, the Nielsen Norman Group authored a number of studies that were critical of the general Flat design ethos, with particular emphasis on the quality of perceived affordances. One study (Meyer, 2015), which investigated the quality of ‘clickability signifiers’ before and after Flat design, noted that Flat designs tended to generate ‘click uncertainty’, which affected older users and younger ‘digital natives’ in slightly different ways. Younger users (18 to 30) were more adept than older users at locating clickable elements with weak signifiers, but the study noted that, despite this, they did not enjoy click uncertainty any more than other users. This could imply that digital natives are still reliant on perceived affordances.

2.4 Skeuomorphism

The release of the Xerox Star in 1981 heralded an approach to HCI that defined the GUI for the next 30 years. The system utilised a conceptual model that presented visual metaphors of office hardware: paper, filing cabinets, folders, and mimetic representations of other commonly used office objects (Figure 2.8). At the time, desktop metaphors were commonly used to explain something that was unfamiliar or hard to grasp by way of comparison with something that was familiar and easy to grasp (Preece, Sharp, & Rogers, 2015). The visual representations were designed to mimic their real physical counterparts, presenting abstract concepts as graphical visualisations.


Figure 2.8. Xerox Star desktop. From “Xerox Star 8010 OS and Applications,” (2017).

The term skeuomorphism describes objects that imitate the design features of artefacts produced in another material or with an older technique. In the context of contemporary user interfaces its application was, and still is, widespread, but it became synonymous with Apple products.

The desktop metaphor was duplicated by Apple with the release of the Lisa in 1983 and the Macintosh in 1984 (Figure 2.9). Microsoft and a host of software developers followed suit, and as the concept of the desktop as a visual metaphor matured, skeuomorphic representation became entrenched in the visualisation of interface elements.


Figure 2.9. Apple Macintosh desktop. From Wikipedia: Macintosh, (2016).

With the emergence of mobile computing, the continued use of the desktop metaphor came under scrutiny, and many in the design community looked to the “post-metaphorical era of interaction design” (Cooper, 2014, p. 299). The use of skeuomorphism came in for particular criticism. Gosnell defined its continued use as “…the act of diluting inherent functionality down to superfluous ornamentation in design”. (Gosnell, 2012, p. 1)

Baraniuk (2012) provided further commentary relating to the use of skeuomorphism in user interfaces, stating that “What's being called ‘skeuomorphic’ is not at all skeuomorphic… They’re kitsch visual metaphors, but they’re not the unintentional side-effects of technological evolution”. (para. 7)

As contemporary approaches to user interface became more pervasive, the visual metaphors were seen as more of a hindrance. As Lee and Choe (2016) noted:

There is a need to think about whether the traditional GUI expression style is appropriate for the current user context. For example, the “save” icon in the past was a floppy disk, which was the medium for saving in the past. However, we can question whether the floppy disk that is no longer being used should still be used as a metaphor, and whether the concept of “save” is still maintained. (p. 2)


The continued use of skeuomorphism by Apple generated the majority of negative commentary. Gosnell (2012) stated: “as the backlash against skeuomorphic design continues to swell, the opposite end of the spectrum offers an appropriate response—that of simplicity and an authentic design model.” (p. 4)

Apple finally discontinued skeuomorphic treatments after Tim Cook became CEO, and Apple designer Scott Forstall, who championed the skeuomorphic approach, was fired. The release of iOS 7 featured a user interface devoid of skeuomorphism – Apple had finally embraced Flat design.


2.5 The historical basis of Flat and Almost Flat design

The aesthetic and philosophical foundations that underpin modern UI design are rooted firmly in the past, perhaps most obviously within Modernism, and more specifically within the International Typographic Style.

2.5.1 Modernism

Figure 2.10. Key Modernist contributions to modern user interface visual design. From the author.

The evolution of Modernism began in the latter half of the 19th century. The term ‘Modernism’ covers a wide range of artistic and philosophical movements that expanded rapidly in the early 20th century, frequently cross-fertilising each other – including (but not limited to) Cubism, Futurism, Surrealism, and Dadaism – and encompassing notable painters, architects, philosophers, writers and film makers of the era.

According to Weston (1996), the base ethos of Modernism was a rejection of realism and of decorative motifs in design. Rather than basing design decisions on subjectivity and feeling, design was to be based on objectivity, on logic and rationalism. The precepts of Modernism are echoed in the words of Amédée Ozenfant and Le Corbusier in their definition of the Purist manifesto:

Purism wants to conceive clearly, execute loyally, exactly without deceits, it abandons troubled conceptions, summary or bristling executions. Purism fears the bizarre and the original…, Purism does not believe that returning to nature signifies the copying of nature. (Weston, 1996, p. 109)

The development of Modernism was set against the backdrop of a rapid acceleration in mechanisation and urbanisation in Western Europe and the United States. Radical social changes in Europe, including revolts against the established political status quo in the form of the rise of Socialism and Communism, had a profound impact on artists and thinkers of the time. The destruction and chaos of the First World War mirrored the revolt against the order of the past, and the period following the war set the scene for continued experimentation in art and design, with the rise of De Stijl, the founding of the Purism movement, and the creation of the Bauhaus school among the notable examples of Modernist ideals and thinking.

Figure 2.11. Early Modernist painting. Picasso, P. (1907), Les Demoiselles d'Avignon. Museum of Modern Art, New York.

The period leading up to the Second World War and the rise of Hitler and Stalin led to a continued re-ordering of the artistic landscape, most notably in Europe and in Russia. In Europe, many nascent art movements, particularly in Germany, were suppressed, and many leading German artists fled to the United States. In Russia, the ideas and manifestos of the Futurists fell out of favour, to be replaced by the Constructivists and Rationalists, who in turn were suppressed or subsumed into a single official union for artists and writers.

Nonetheless, by the 1930s, Modernism had become a solid fixture on the artistic landscape of Europe and the United States, where it permeated the work of industrial designers, graphic designers and architects.


2.5.2 International Typographic Style (Swiss Style)

While the period encompassing the Second World War was undoubtedly destructive and horrific, it did have some positive impacts. Neutral Switzerland would become an artistic beacon, where innovative art schools flourished, setting the foundation for the International Typographic Style.

The International Typographic Style emerged from Switzerland and Germany in the 1950s. According to Meggs and Purvis (2006), the visual characteristics of the style include:

• A unity of design based on a mathematically constructed grid
• Objective photography and copy that present visual and verbal information in a clear and factual manner
• The use of sans-serif typography set in a flush-left margin configuration

They further state that “the initiators of this movement believed sans-serif typography expresses the spirit of a more progressive age and that the mathematical grids are the most legible and harmonious means for structuring information”. (p. 356)

2.5.3 Pioneers and influences of the movement

Peter Behrens

Predating the International Typographic Style by half a century, the work of Peter Behrens had a profound impact on one of the cornerstones of the style: the use of sans-serif type. Behrens championed the use of sans-serif type, and also used a grid system to structure his layouts.


Akzidenz Grotesk

In tandem with the work of Behrens, the Berthold Foundry in Berlin released the sans-serif Akzidenz Grotesk typeface in 1898; the font family was to have a major influence on 20th century typography.

Figure 2.12. Sample of the Akzidenz-Grotesk font. From Wiki, (2017).


Ernst Keller

Ernst Keller taught at the Zurich School of Applied Art from 1918 until 1956, and is considered one of the founders of the Swiss Style. Keller had an interest in the use of symbolic imagery, simplified geometric forms, and vibrant contrasting colour.

Figure 2.13. Ernst Keller, poster. From Matthews, (2017).


Theo Ballmer

According to Meggs and Purvis, Theo Ballmer, working in the 1920s, applied De Stijl principles to graphic design, using an arithmetic grid of horizontal and vertical alignments. In his ‘Büro’ poster, both the black and red words are carefully developed on the underlying grid.

Figure 2.14. Theo Ballmer, poster. From Meggs & Purvis, (2006).


Max Bill

Max Bill, working in Zurich in the 1930s, formulated a manifesto of ‘Art Concret’ calling for a universal art based on controlled arithmetical construction. Meggs notes that Bill’s layouts were “constructed of geometric elements organised with absolute order” (p. 357, para. 2), and that the use of Akzidenz Grotesk was a feature of his work of the period.

Figure 2.15. Max Bill, book cover. From Meggs & Purvis, (2006).

Meggs and Purvis further note that Bill’s art evolved and developed cohesive principles of visual organisation. “Important concerns include the linear division of space into harmonious parts, modular grids; arithmetic and geometric progressions, permutations, and sequences; and the equalisation of contrasting and complementary relationships into an ordered whole”. (p. 375, para. 2)


Innovations in visual design thinking were matched in the 1950s by the development of new font families that became synonymous with the International Typographic Style; Akzidenz Grotesk was an inspiration for two of the most famous font families developed during the period, Univers and Helvetica.

2.5.4 How Modernism and International Typographic Style influenced contemporary user interface design

Figure 2.16. A brief timeline of operating system user interface development. From the author.

Key user interface elements that have their basis in Modernism and the International Typographic Style include:

• type: the evolution of modern sans-serif fonts – Akzidenz Grotesk
• layout: the use of grids – Peter Behrens, Ernst Keller
• use of colour: flat colours – De Stijl, Cubism
• use of white space: Le Corbusier, Max Bill

All of these elements were key to the flat interfaces developed by Microsoft with Windows 8, Apple with iOS 7, and Google with Android version 5.
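These influences translate directly into modern web technologies. The sketch below, a minimal illustration rather than an excerpt from any of the cited systems, renders a simple tile layout using the four elements listed above; all class names, colours and dimensions are illustrative assumptions.

<!-- Minimal sketch: a mathematical grid, flush-left sans-serif type,
     flat colour, and deliberate white space, expressed in CSS. -->
<style>
  .page {
    display: grid;                               /* mathematically constructed grid */
    grid-template-columns: repeat(4, 1fr);
    gap: 24px;                                   /* deliberate white space */
    font-family: Helvetica, Arial, sans-serif;   /* modern sans-serif type */
    text-align: left;                            /* flush-left configuration */
  }
  .tile { background: #d13438; color: #fff; padding: 24px; } /* flat colour, no ornament */
</style>
<div class="page">
  <div class="tile">News</div>
  <div class="tile">Mail</div>
  <div class="tile">Photos</div>
  <div class="tile">Music</div>
</div>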


2.6 Flat design

In the book About Face, Cooper (2014) celebrated the transition from the use of “skeuomorphisms and over-wrought visual metaphors” into a “post-metaphorical era of interaction design”, and lauded our modern device interfaces, which are “content and data-centric, minimising the cognitive footprint of UI controls almost to a fault”. (p. 299)

Microsoft pioneered the use of contemporary Flat design with the release of the Zune music player in 2006, which utilised a pared-down UI. As outlined by Clayton (2013), the implementation of Flat design by Microsoft had its basis in the Modern Design Movement, typified by the Bauhaus school of thought; in the International Typographic Style (the Swiss Style); and in motion design techniques, typified by the work of Saul Bass. The appropriation of the ethos of these movements, particularly Modernism and the International Typographic Style, had been gathering pace for some time in web and graphic design before Microsoft adopted the aesthetic.

Figure 2.17. Zune music player. The Zune UI pioneered the flat aesthetic in Microsoft products. From Windows Central (2017).

The release of the Windows Phone 7 mobile operating system in 2010 demonstrated all of the UI conventions championed by supporters of the Flat aesthetic: bright shapes, sans-serif typography, flat images, and grid menus. Microsoft garnered further praise from the design community when they released Windows 8 in 2012 (Figure 2.18). In an article that detailed the development of Windows 8, Carr (2014) noted “…with its fidelity to the essence of materials, the most innovative element of (Windows 8) is its shift away from visual metaphors”. (para. 14) The article captured the design ethos that existed in Microsoft that led to the release of Windows 8. In the same article, Carr (2014) quoted Sam Moreau, the then Design and Research Director for Windows, who said “It’s not about adornments, it’s about typography, color, motion.” (para. 5)

Figure 2.18. Windows 8 Start Screen. From Schiola, (2014).

In addition to positive subjective feedback, more objective research into the use of Flat design principles was published. A travel application prototype, named TAS MOVE, was developed using Flat design principles: no added effects or visual embellishments, simple elements, and a focus on typography and colour. The resulting design was user tested, with positive results (Kuan, Huang, Wang, Li, & Duh, 2015). The user testing suggested that the application of Flat design principles allowed users to execute their tasks efficiently. In an online experiment which grouped users by age, Robbins (2014) asked respondents to indicate their visual preference for a selection of icons rendered using skeuomorphic and flat techniques, with a mix of hybrid icons included in the set. The results indicated a fairly even split in preference for all but the 27–45 age group, which had a slightly higher preference for flat design.


Part of the drive away from skeuomorphism related to performance increases in hardware platforms. As more powerful hardware drove increasing screen pixel densities, developers questioned the need to use skeuomorphism to disguise low-resolution displays.

Looking at Flat design from a functional perspective, Lankinen (2017) noted that:

Flat design is a movement originating from the necessity for scalable applications and shared UIs between varying platforms. Implementing UI elements in solid colours means that slow-to-load raster images can be converted into HTML elements with the help of CSS fill colours that render instantaneously and scale across different resolutions. (p. 8)

In this respect, Flat design serves a measurably positive function and is not merely a shallow device used to counter skeuomorphism.
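Lankinen's point can be illustrated with a minimal sketch: a tile drawn with a CSS fill colour requires no raster image, renders immediately, and scales with the viewport. The class name, colour and dimensions below are illustrative assumptions, not taken from the cited work.

<!-- Minimal sketch: a solid CSS fill colour in place of a raster
     background image; the element scales to any resolution. -->
<style>
  .tile {
    width: 20vw;                 /* scales with the viewport; nothing to re-download */
    height: 20vw;
    background-color: #0078d7;   /* solid fill replaces a slow-to-load raster image */
    color: #fff;
    display: flex;
    align-items: center;
    justify-content: center;
    font: 16px sans-serif;
  }
</style>
<div class="tile">Mail</div>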

As the backlash against skeuomorphism gathered pace, particularly against skeuomorphism in Apple products, design professionals began to codify what Flat design was – in effect developing a Flat manifesto. In an article examining Flat principles, Campbell-Dollaghan (2013) noted that:

…flat advocates (flatvocates?) argue that GUIs should eschew style for functionality. That means getting rid of bevelled edges, gradients, shadows and reflections, as well as creating a user experience that plays to the strengths of digital interfaces, rather than limiting the user to the confines of the familiar analogue world. (para. 3)

Interestingly, the article concluded with a statement that tempered the design community’s enthusiastic reception of Flat design, noting that “After radical flatness, we’ll probably see designers carefully reintroduce dimensionality where it’s really needed”. (Campbell-Dollaghan, 2013, para. 9)

With the prevalence of Flat in user interface and web design, additional studies began to emerge that measured the quality of the user experience in Flat designs, indicating deficiencies in the Flat design approach.

A study by Nielsen (2012) documented usability issues with the Windows 8 aesthetic (then called Metro), noting that “…the new look sacrifices usability on the altar of looking different than traditional GUIs. There's a reason GUI designers used to make objects look more detailed and actionable than they do in the Metro design”. (para. 11)

Of particular note was a marked reduction in the discoverability of user interface elements. Nielsen (2012) further noted problems with users not recognising affordances in tabbed elements, stating that “We also saw problems with users overlooking or misinterpreting tabbed GUI components because of the low distinctiveness of the tab selection and the poor perceived affordance of the very concept of clickable tabs”. (para. 14)

Loranger (2015) discussed how to reduce click uncertainty in Flat user interfaces, with a recommendation to make buttons look clickable, and Budiu (2013) examined how click uncertainty can impact interaction cost. With click uncertainty, a user may be compelled to explore more of a user interface, with the potential to frustrate the user.

Burmistrov, Zlokazova, Izmalkova, and Leonova (2015) carried out a usability study in which test subjects were asked to locate a target word in text, search for a specific icon in a matrix of icons, and search for clickable objects on a web page. The UI elements were rendered ‘traditionally’ and ‘flat’. The researchers measured time on task and number of errors, and also included an analysis of oculomotor indicators of cognitive load. They found that the flat designs produced poorer results than the traditional designs, with notable increases in search times for flat icons and clickable objects, higher cognitive loads for text and icons, and a higher error rate when searching for clickable objects.

Similarly, in a usability study comparing Windows 7 and Windows 8 in terms of efficiency, effectiveness and user satisfaction, Schneidermeier, Hertlein, and Wolff (2014) found that for the three criteria tested, the ‘skeuomorphic’ Windows 7 was preferred over the ‘flat’ Windows 8.

2.7 Almost flat design/Skeuominimalism

The reaction against Skeuomorphism that led to the dominance of Flat user interfaces from Microsoft, Apple and Google led to a different set of problems relating to user experience. As noted earlier, research indicated that the minimalistic user interfaces of Windows 8, iOS 7 and Android systems created problems relating to affordance.

While both approaches have attempted to enhance usability, Riley (2013) argued that neither can be considered a solution to the usability problem, stating: “Why is there such an outcry against Skeuomorphism? The main argument is that it gets in the way of usability. Flat aesthetic is lauded for being more honest, simple, and easy to understand. It removes the extra ‘stuff’”. (para. 5)

He then cited negative feedback relating to a Flat design implementation of the Dropbox iOS app, noting:


But Flat is just as guilty…These pain points go far beyond aesthetic – down to the levels of user experience and usability. Skeuomorphism and Flat are disparate visual solutions, yes, but neither is a solution to the massive usability problem. (Riley, 2013, para. 6)

The usability issues surrounding Flat design UIs began to be addressed by Google, with updates to their design guidelines. This new approach to UI development became known as ‘almost flat design’, or ‘skeuominimalism’. Sanchez notes: “Skeuominimalistic design is simplified up to the point where simplification does not affect usability. And its skeuomorphic affordances are maximized up to the point where it does not affect the simple beauty of minimalism”. (Sanchez, 2012, para. 11)

Moore (2013) noted the following relating to the use of gradients in almost flat design:

For the most part, these interfaces stick to the flat design principles of flat colors, no drop shadows, and use of color to encourage specific user actions (e.g. red compose button in Gmail)… …Subtle affordance is a big component of Almost Flat Design and gives it a critical advantage over true flat design. (para. 7)
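The ‘subtle affordance’ Moore describes can be sketched by rendering the same element with a single fill colour and with a convex luminance gradient that is lighter at the top, so that the latter reads as a raised, clickable surface. The colours and class names below are illustrative assumptions, not taken from any cited design language.

<!-- Minimal sketch: a flat button versus an 'almost flat' button with
     a subtle convex gradient (lighter at the top reads as raised). -->
<style>
  .btn {
    display: inline-block;
    padding: 12px 24px;
    border-radius: 6px;
    color: #fff;
    font: 16px sans-serif;
  }
  .flat   { background: #3b7dd8; }                                      /* single fill colour */
  .convex { background: linear-gradient(to bottom, #5a96e8, #2f64ad); } /* convex shading cue */
</style>
<span class="btn flat">Flat</span>
<span class="btn convex">Convex</span>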

Google’s design updates attempted to address shortcomings in Flat Design. Google employed the metaphor of Material, which can be assigned tactile qualities. In the introduction to the visual language, Google noted:

Surfaces and edges of the material provide visual cues that are grounded in reality. The use of familiar tactile attributes helps users quickly understand affordances. Yet the flexibility of the material creates new affordances that supersede those in the physical world, without breaking the rules of physics. (“Introduction - Material design,” 2017, Material is the Metaphor, para 2)

Google acknowledged the importance of perceived affordances, and worked to provide a framework to designers to allow them to incorporate these affordances into their products.

Microsoft have recently released their Fluent Design System (“Fluent Design System,” 2017), a major update on the Microsoft Design Language from 2015 (Figure 2.19). With Fluent Design, Microsoft introduced metaphors that echo Google’s Material Design, such as light, depth, material and motion. Ironically, the introduction of material properties has the potential to re-introduce problems associated with skeuomorphism. Commenting on Acrylic, a material property that mimics translucent plastic, Bright (2017) noted:


The challenge with Acrylic—and any other materials in the future—will be to ensure that each material retains a clear and coherent purpose, without falling into the trap of mimicking real-world objects (this is a practice known as "skeuomorphism"). (para. 9)

Figure 2.19. Windows 10 ‘Fluent’ Desktop. From the author.

2.8 Summary of existing research

Based on the evidence of noted industry experts, it can be concluded that affordances are still relevant, and should be provided for in user interface development.

Based on the research evidence relating to Flat design, Flat user interfaces are deficient, with particular problems relating to users not recognising perceived affordances, resulting in an increase in ‘click uncertainty’. Google and Microsoft introduced almost flat principles into their respective design languages in an attempt to address these issues relating to perceived affordance, findability and discoverability, which implies that both companies consider affordances to be an important user interface attribute. In light of this research, there is justification for further research into how perceived affordance, findability, discoverability and clickability can be facilitated in almost flat user interfaces.


2.9 Findability, clickability and discoverability: use of shaded gradients

Both findability and clickability are concepts closely tied to the Gestalt principle of figure/ground, which impacts the ability to identify perceived affordances and signifiers. Krug (2014) notes that as we scan a page, we look for visual cues that identify things as clickable (or tappable, on touch screens). He notes that the search for visual cues is similar to the searches we make in physical environments for physical affordances.

Creager (2017) notes that in software interfaces, findability (the ease with which users can locate an object or feature they know exists) and perceived clickability (the degree to which actionable elements appear to be clickable) are fundamental components of usable systems.

Norman (2013) explains that when we interact with a product, we need to understand how it works, which is the essence of discoverability. “Discoverability: Is it possible to even figure out what actions are possible and where and how to perform them?” (p. 3)

Taken together, findability, clickability and discoverability need to be considered by the designer in order to facilitate the user. These attributes can be conferred on an artefact through perceived affordances and signifiers.

2.10 Research problem identified

2.10.1 Overview

The literature review investigated affordances and signifiers, examining their relevance in modern user interfaces, and investigated skeuomorphic, flat and almost flat interfaces, in each case examining how the various visual approaches impact affordances and signifiers. The review identified that the almost flat design approach currently employed by Microsoft and Google is an attempt to address the deficiencies of the flat design approach, which in turn was a reaction to the use of skeuomorphism and metaphoric devices in user interfaces; in particular, almost flat design attempts to address issues relating to perceived affordances, which affect the findability, clickability and discoverability of interface elements.

The justification for choosing this research problem has its basis in principles of visual perception, specifically the Gestalt principle of ‘figure/ground’, and references a series of studies on the visual perception of shaded objects and how that perception can impact findability, clickability and discoverability. The following exposition details the prior experiments used to inform and refine the research question.


2.10.2 Perception of shape from shading

In an article published in Scientific American, Ramachandran (1988) outlined a series of experiments relating to the perception of shape from shading, investigating the perception of simple shapes rendered with a single light source. The research revealed that a variety of rules are applied early in the visual processing of shape from shading; in particular, the visual system assumes that there is only one light source and that the source is above the target image or images.

Participant profile / sample

There were 11 participants. No information relating to age or visual capabilities was reported.

Test: task and instructions

For the first part of the test, participants were required to view a display consisting of two rows of objects (Figure 2.20) which were mirror images of each other. For the second part of the test, participants viewed targets containing clusters of three-dimensional shapes that appeared either as convex or concave elements. No details or instructions were given to the viewers.

Figure 2.20. From “Perception of shape from shading”. Ramachandran, (1988).
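For illustration only (the study's stimuli were not web pages), shaded discs of this general kind can be sketched with vertical luminance gradients: a disc that is lighter at the top tends to be perceived as convex, consistent with the light-from-above assumption, while the reversed gradient tends to be perceived as concave. The grey values below are illustrative assumptions.

<!-- Minimal sketch: top-lit disc (typically seen as convex) and
     bottom-lit disc (typically seen as concave). -->
<style>
  .disc {
    width: 80px;
    height: 80px;
    border-radius: 50%;
    display: inline-block;
    margin: 8px;
  }
  .convex  { background: linear-gradient(to bottom, #e8e8e8, #4a4a4a); } /* light from above */
  .concave { background: linear-gradient(to top, #e8e8e8, #4a4a4a); }    /* light from below */
</style>
<div class="disc convex"></div>
<div class="disc concave"></div>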


Metrics: what was being measured

The first part of the test measured the perception of depth and convexity. The second part tested the assumption that, under a unitary light-source constraint, there is a tendency to assume that the light source is at the top (Figure 2.21, Figure 2.22); the effect is especially strong if a mixture of objects is presented (Figure 2.23).

Figure 2.21. From “Perception of shape from shading”. Ramachandran, (1988).


Figure 2.22. From “Perception of shape from shading”. Ramachandran, (1988).

Experiment control measures

No controls were recorded for the first part of the test. For the second part, a control stimulus was used, consisting of targets (Figure 2.24) that were similar to the test targets (Figure 2.23) in terms of luminance polarity but did not convey any depth.


Figure 2.23. From “Perception of shape from shading” Ramachandran, (1988).

Figure 2.24. From “Perception of shape from shading”. Ramachandran, (1988).

2.10.3 Perceptual biases in the interpretation of 3D shape from shading

Additional work carried out by Liu and Todd (2004) involved two experiments in which observers judged the sign and magnitude of surface curvature from shaded images of an indoor scene. The overall pattern of results revealed a strong perceptual bias to interpret the images as convex rather than concave, and a weaker bias to prefer illumination from above rather than from below.

Participant profile / sample

Experiments 1 and 2 involved seven subjects: the two authors and five others. All had normal or corrected-to-normal visual acuity. Each subject participated in five separate sessions.

Test: task and instructions

Experiment 1: test subjects were presented with an image of an ellipsoidal concavity or convexity on the left-hand side of the monitor (Figure 2.25). The right side of the monitor contained an elliptical curve that the subjects could adjust to indicate the apparent sign and magnitude of the surface curvature in each condition. When subjects were satisfied with their settings, they pressed the spacebar to initiate a new trial.

Figure 2.25. The two upper panels depict convex surfaces, the two lower panels depict concave surfaces. From “Perceptual biases in the interpretation of 3D shape from shading”. Liu & Todd, (2004).


Experiment 2: similar to experiment 1, but the images were modified such that the transition from the flat plane to the convex or concave surface was moderated, so there was no abrupt change between the plane and the start of the curved surface. See Figure 2.26.

Figure 2.26. The two upper panels depict convex surfaces, the two lower panels depict concave surfaces. From “Perceptual biases in the interpretation of 3D shape from shading”. Liu & Todd, (2004).

Metrics: what were they measuring?

Experiments 1 and 2: the sign and magnitude of the concavity or convexity of the target image.

Experiment control measures

Computer hardware, image size, distance from screen, target/distractor shading.


2.10.4 Toward Understanding the Findability and Discoverability of Shading Gradients in Almost-Flat Design

Building on the work of Kleffner and Ramachandran, Creager and Gillan (2016) investigated the findability and discoverability of shading gradients in almost flat design. The study demonstrated that there are potential usability benefits to using luminance gradients in otherwise flat design in software interfaces. The process of visual search can be made quick and easy when a key element is distinguished by a convex or concave cue. “Further, convex and concave stimuli are perceived to have depth during deliberate processing which likely explains increases in speed and accuracy in usability tests, relative to flat interfaces, due to conveyed affordances, or signifiers.” (p. 5)

Participant profile / sample

There were 17 participants, all undergraduate students. All had either normal or corrected-to-normal vision.

Test: task and instructions

The task began with the display of a fixation symbol in the centre of the screen for one second. This was followed by a stimulus display (see Figures 2.27 and 2.28), to which the participant responded either target present (left Ctrl key) or target absent (right Ctrl key). The stimulus display was visible until the participant responded or three seconds elapsed, after which a feedback message of either ‘correct’ or ‘incorrect’ was displayed in the centre of the screen for one second. If a response was not received within three seconds, the trial was marked as incorrect.
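As a minimal sketch of this trial sequence (one second of fixation, a stimulus visible until a response or a three-second timeout, then one second of feedback), the following JavaScript illustrates the timing logic. The element id, the response keys ('a' for target present, 'l' for target absent) and the placeholder stimulus are illustrative assumptions; the original experiment used the left and right Ctrl keys and purpose-built stimulus displays.

<!-- Minimal sketch of the trial sequence: 1 s fixation, stimulus until
     keypress or 3 s timeout, then 1 s of feedback. -->
<div id="display" style="font: 24px sans-serif; text-align: center;"></div>
<script>
  const display = document.getElementById('display');
  const show = (text) => { display.textContent = text; };

  function runTrial(targetPresent) {
    show('+');                                    // fixation symbol for one second
    setTimeout(() => {
      show(targetPresent ? 'O' : '');             // stand-in for the stimulus display
      const start = performance.now();
      let done = false;

      const finish = (correct) => {
        if (done) return;
        done = true;
        document.removeEventListener('keydown', onKey);
        const rt = performance.now() - start;     // response time in milliseconds
        show(correct ? 'correct' : 'incorrect');  // feedback message
        console.log({ correct, rt });
        setTimeout(() => show(''), 1000);         // clear after one second of feedback
      };

      const onKey = (e) => {
        if (e.key === 'a') finish(targetPresent);        // responded 'target present'
        else if (e.key === 'l') finish(!targetPresent);  // responded 'target absent'
      };
      document.addEventListener('keydown', onKey);
      setTimeout(() => finish(false), 3000);      // no response within 3 s = incorrect
    }, 1000);
  }

  runTrial(true);
</script>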

Figure 2.27. Three types of target and distractor stimuli. From “Toward Understanding the Findability and Discoverability of Shading Gradients in Almost-Flat Design“. Creager & Gillan, (2016).


Figure 2.28. An example of a target present display (left) and a target absent display (right). From “Toward Understanding the Findability and Discoverability of Shading Gradients in Almost-Flat Design“. Creager & Gillan, (2016).

Testing was within-subjects, with a total of 360 trials per participant.

Metrics: what were they measuring?

Visual search times and depth ratings.

Experiment control measures

Computer hardware, distance from screen, image size.

2.10.5 Understanding the Findability and Perceived Clickability of Shaded and Flat Objects in Almost-flat Interfaces

Most recently, Creager (2017) carried out additional experiments on shaded and flat objects in almost flat interfaces. The research begins the process of validating the benefits of almost flat design identified in previous usability studies through the use of more tightly controlled shaded stimuli and connections to established psychological theory. Results from the experiments revealed that shaded objects had the benefit of high findability in almost flat environments. Finally, depth ratings revealed some types of shading had the benefit of conveying consistent clickability signifiers. Medium and high contrast convex gradients consistently conveyed a sense of depth and a raised three-dimensional shape, which was effective for reliably signifying clickability.

Participant profile / sample

Experiment 1 involved 17 undergraduate students. All participants self-reported either normal or corrected-to-normal vision.


Experiment 2 involved 28 undergraduate students. All participants had at least 20/20 visual acuity and normal contrast sensitivity. Two participants showed signs of colour deficiency.

Test: task and instructions

Participants were seated approximately 0.7 metres from an eye-level, 17-inch monitor.

Experiment 1: Visual search. Three types of target and distractor stimuli were used (Figure 2.29), grouped into four target-distractor pairs (Figure 2.30). For the visual search task, each trial began with a fixation symbol displayed in the centre of the screen for one second. This was followed by a stimulus display to which the participant would respond either target present (left ctrl key) or target absent (right ctrl key). The stimulus display was visible until the participant responded or three seconds elapsed, and a feedback message of either ‘correct’ or ‘incorrect’ was displayed in the centre of the screen for one second. If a response was not received within three seconds, the trial was marked as incorrect.

Figure 2.29. Three types of target and distractor stimuli. From “Understanding the Findability and Perceived Clickability of Shaded and Flat Objects in Almost-flat Interfaces”. Creager (2017).

Figure 2.30. The four target-distractor pairs used in the visual search task. From “Understanding the Findability and Perceived Clickability of Shaded and Flat Objects in Almost-flat Interfaces”. Creager (2017).


Experiment 1: Depth ratings. After completing the visual search task, participants were shown each of the convex, concave, and flat stimuli – one at a time – and asked to rate the apparent depth of each stimulus on a scale of minus 10 to plus 10. Participants were instructed that a negative value indicated the stimulus appeared depressed into the screen, a value of zero indicated the stimulus appeared flat, and a positive value indicated the stimulus appeared raised out of the screen.

Experiment 2: Visual search. The setup was similar to that of experiment 1, but two additional convex stimuli were introduced with low and medium contrast (see Figure 2.31). The original flat and high contrast stimuli were retained from the first experiment.

Experiment 2: Depth ratings. Similar to experiment 1, except that the new stimuli were used.

Figure 2.31. Experiment 2 stimuli: flat, low, medium and high contrast convex shapes. From Creager (2017).

Metrics: what were they measuring?

Search efficiency, perceived depth of convex, concave and flat objects, and the effects of gradient contrast on the perceived clickability of objects.

Experiment control measures

Computer hardware, participant position, target/distractor size and shading.

Taken together, the experiments undertaken by Creager suggest that the visual perception of shaded objects on almost flat interfaces will have a positive impact on the quality of perceived affordances (signifiers), and should therefore have a positive impact on the usability of almost flat interfaces.

Before we physically interact with an artefact – real or virtual – we must first interact with it perceptually. We mentally engage with the artefact and then physically manifest the engagement through some form of interaction. We will not mentally engage with an artefact if we cannot perceive it – if it does not afford perception. The prior cited research indicates that shaded objects create a perceived affordance that links to an automatic perception – it is a visceral engagement, independent of conscious thought.

2.11 Research Question

Can the use of shaded objects improve the visual perception of perceived affordances (signifiers) in almost flat user interfaces?

To answer the research question, three hypotheses have been generated:

H1: There will be no difference in click times between shaded and unshaded objects in tests with abstract signifiers.

H2: There will be no difference in click times between shaded and unshaded objects in tests with abstract signifiers that include Gestalt principles.

H3: There will be no difference in click times between shaded and unshaded objects in tests with signifiers present on user interfaces derived from existing web sites.

The abstract signifiers will be simple forms that mirror the forms used in Creager’s experiments and mimic clickable targets and distractors. The targets and distractors will be built using Google Material Design Guidelines, and the guidelines will be extended for the experiments using shaded objects. The signifiers present on user interfaces (existing web sites) will be more complex, and will contain text and/or icon elements.


3 Research methodology

This study extends the work of Creager (2017), and applies observations regarding shaded objects to the design of a high fidelity prototype in the form of a web application. Where Creager tested whether the presence or absence of a target could be perceived, this study will test with a target always present, and will measure whether there is a difference in identification times between flat and shaded targets, i.e. the study will centre on A/B testing. Therefore, two applications will be produced: one that uses flat targets built with existing Google Material Design guidelines (2017), and a second that uses the Google Material Design guidelines plus shaded targets. Since the experiments are primarily concerned with the visual perception of shaded and non-shaded elements, there is a requirement to control target and distractor colour and contrast.

3.1 Colour and contrast

Part of Creager’s investigation involved an experiment to measure the impact of gradient contrast on findability and perceived clickability. Creager (2017) noted:

With respect to visual search, it was hypothesized that search for medium and high contrast convex targets among flat distractors would be efficient in both target present and target absent conditions, but search for low contrast convex targets would be inefficient due to poor discriminability during guidance. (p. 22)

Creager’s testing confirmed that the search for medium and high contrast convex targets among flat distractors was highly efficient. In addition, the results indicated that search for low contrast targets, while less efficient than for medium and high contrast targets, was nonetheless still very efficient. In Creager’s experiments, the contrast ratio for low contrast shaded targets varied between 2.4:1 at the lightest part of the gradient and 7.0:1 at the darkest portion of the gradient. The contrast ratio for medium contrast shaded targets varied between 1.6:1 and 12.3:1, and the contrast ratio for high contrast shaded targets varied between 1.1:1 and 17.9:1. Targets and distractors were all presented against a flat white background, and were generated in greyscale. The contrast ratio for flat distractors against the background was 4.2:1.

Given the minor variation in efficiencies between the medium and high contrast targets, and to a lesser extent the low contrast targets, the study will utilise gradient contrasts that mirror those used in Creager’s experiments.


The study will control colour and contrast across all experiments, and will use the current Web Content Accessibility Guidelines (WCAG) (n.d.) from the W3C, combined with Google accessibility recommendations (n.d.), to establish a baseline. These will be implemented to ensure that the targets, distractors and background have sufficient contrast to be perceived by the testers. However, in some test cases, contrast between the background, the targets and the distractors will be reduced to below 3:1 to measure visual perception when the contrast falls below WCAG and Google guidelines.

3.2 Targets and distractors

The targets and distractors will be in the form of buttons, which will be rendered such that they follow recommended colour contrast guidelines as specified by WCAG (n.d.) and Google (n.d.).

The guidelines do not directly address contrast for buttons; the Google guidelines are primarily designed to ensure that there is sufficient colour contrast between onscreen text and background colours. In order to provide a measurable baseline for the contrast used in the prototypes, the intention is to use the contrast recommendations for large text, which Google also recommends for icons: “Large text (at 14 pt bold/18 pt regular and up) should have a contrast ratio of at least 3:1 against its background”.

See figure 3.1 below.

Figure 3.1. Samples of shaded and unshaded targets. From the author.


To determine contrast, the study will follow recommendations from the W3C working group note G18 (2016), which recommends that foreground and background elements maintain a contrast ratio of at least 4.5:1.
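For reference, G18 defines contrast ratio in terms of the relative luminance of the lighter and darker colours. The following JavaScript sketch – the author’s illustration of the WCAG 2.0 formulae, not code from the prototype – computes the ratio for two sRGB colours:

    // Compute the WCAG contrast ratio between two sRGB colours.
    function channelToLinear(c) {
      c /= 255;
      return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
    }

    function relativeLuminance([r, g, b]) {
      return 0.2126 * channelToLinear(r) + 0.7152 * channelToLinear(g) + 0.0722 * channelToLinear(b);
    }

    function contrastRatio(colourA, colourB) {
      const [lighter, darker] =
        [relativeLuminance(colourA), relativeLuminance(colourB)].sort((a, b) => b - a);
      return (lighter + 0.05) / (darker + 0.05);   // WCAG: (L1 + 0.05) / (L2 + 0.05)
    }

    // For example, mid-grey (#757575) text on a white background:
    console.log(contrastRatio([117, 117, 117], [255, 255, 255]).toFixed(1));  // ~4.6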

In certain situations, targets and distractor buttons may use text labels. In such cases and where the buttons have been rendered as shaded objects, it will be difficult to control contrast between text and the background gradient. In these situations, the following from the W3C working group will be implemented:

…if a letter is lighter at the top than it is at the bottom, it may be difficult to maintain the contrast ratio between the letter and the background over the full letter. In this case, the designer might darken the background behind the letter, or add a thin black outline (at least one pixel wide) around the letter in order to keep the contrast ratio between the letter and the background above 4.5:1.

See figure 3.2 below.

Figure 3.2. Sample shaded target containing a text label. An outline has been added to the text “GET STARTED” to ensure that the contrast ratio between the text and background is above 4.5:1. From Angular, modified by the author (n.d.).
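One common way to approximate the one-pixel outline described in the note is a set of four offset text shadows. A minimal JavaScript sketch of this adjustment is shown below (illustrative only; the ‘.button-label’ selector is an assumption, not a class from the prototype):

    // Approximate a one-pixel black outline around a button label using four
    // offset text shadows, keeping the label legible over a background gradient.
    const label = document.querySelector('.button-label');
    label.style.textShadow = '1px 0 0 #000, -1px 0 0 #000, 0 1px 0 #000, 0 -1px 0 #000';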


3.3 Procedure

A/B testing will be carried out on both application prototypes with the intention of measuring differences between the two visual design approaches with respect to the clickability, findability, and discoverability (Chapter 2.9) of interface elements rendered as flat or shaded objects. Additionally, the prototypes will apply a number of Gestalt principles including figure/ground, proximity and good continuation, and certain test conditions will measure the impact of the utilisation of these principles on the visual perception of shaded and flat objects. The introduction of Gestalt principles will introduce deviations from Creager’s original experiment, but will allow for the development of visual puzzles that support established user interface paradigms and patterns published by Google, in particular with regard to guidelines relating to button components.

3.4 Required prototype(s), techniques and technologies used to prototype

The experiment will require the development of two high fidelity application prototypes which will be built as web-based applications, and remotely hosted. Testing will occur on a desktop PC. The application will incorporate gaming principles. The objective of using gamification is to improve the user experience of the application in order to motivate and engage the users during the testing process. See appendix A for details.

3.4.1 Investigation of existing reaction games

An investigation was carried out to examine existing Android based reaction games available from the Google Play store for mobile platforms. The investigation provided insights into the visual presentation used, including how closely existing games followed Material Design patterns. The investigation also determined what gamification principles could be utilised, and concluded that in-game awards and leader boards were widely utilised in Android based reaction games.


Figure 3.3. An example of Android based reaction games (n.d.).

3.4.2 Techniques and technologies used to prototype

Functionally, the two prototypes will be the same, with the same number of targets and distractors, but the visual design of the target elements in each prototype will differ. One will be built utilising existing Google Material Design guidelines, and the other will be built using Google Material Design guidelines supplemented by shaded objects, with the intention of enhancing the clickability, discoverability and findability of user interface elements according to the principles outlined by Creager (2017). The testing regime will include tasks whose conditions incorporate Gestalt principles, to measure any impact that the principles have on clickability, discoverability and findability. The independent variables will consist of the clickable elements on each of the prototypes.

Wireframe prototypes will consist of semi-functional interactive pages, with sufficient interactivity (clickability, findability and discoverability) to be robust and testable, and allow for a meaningful set of tasks to be assigned to the test subjects. Tests of the prototypes will be run remotely on a desktop.

Prior to testing the final application prototypes, testing will take place on the wireframe version of the prototype, built using Adobe Xd (version 4.0.12.6, n.d.). The wireframe prototype will be used to validate both the base prototype functionality and to check and refine the test script(s) that will be used in the final testing.


3.4.3 Choice of form factor

The choice of form factor used for testing will have a significant impact on the quantity and quality of test data from the experiments. Following examination of both form factors, a desktop-based solution will be generated, as this allows for the collection of a richer set of data than a mobile solution.

Desktop application

The desktop application will be built using Adobe Photoshop (version 19.1.1, n.d.), HTML, CSS, JavaScript, PHP and MySQL. The application will be remotely hosted. The collection of test data will be provided by Loop 11 (n.d.), which will log task interactivity and provide pre and post-test questionnaires.

Mobile App

The mobile form factor was considered for development, and preliminary studies were carried out that included the evaluation of MIT App Inventor (n.d.). After building a number of prototypes that confirmed App Inventor could support the required interactivity and could log the required data, development was discontinued: App Inventor was not capable of generating a sufficiently high fidelity front end, and given that the project centred on visual perception, this was a decisive limitation. For further details, see chapter 4.4.1.

Prior to settling on the desktop form factor, early paper prototypes and wireframes were generated for both form factors. See chapter 4.1.3.

3.4.4 Advantages and disadvantages of each form factor

Desktop application: advantages

The desktop provides a robust test environment, with the ability to apply strict controls on hardware setup. The use of third party data recording can yield rich quantitative data sets. The test setup can more easily reflect the testing regime used by Creager.

Desktop application: disadvantages

Since a portion of the testing will be carried out remotely, compromises will be made in controlling the test setup. The testing platform allows testers to specify certain task criteria, but lab conditions cannot be replicated.


Mobile App: advantages

Using a mobile platform has the potential to encourage tester participation, as the testing process will appear less formal than that used for desktop. Mobile testing also has the advantage of being able to take place in many locations.

Mobile App: disadvantages

It would be more difficult to apply laboratory controls to the testing, which may compromise the ability to provide consistent results.

3.5 Design methodology

The design methodology will be based on a cyclic process involving prototyping, testing, analysing and refining the test application and its associated artefacts (see figure 3.4). Nielsen (1993) recommends at least three iterations of user tests, but due to time constraints, at least two iterations are planned before final testing occurs. Refinements to the application and associated artefacts will be based on the testing methodology outlined in the next section.

Figure 3.4. The cyclic design process. Adapted from Aflafla1 (2014).


3.6 Testing methodology

Testing will be divided into two phases. The first phase will involve testing during the development of the application. The second phase will involve testing with the final application to generate data that addresses the research question.

Rubin and Chisnell (2008) outline a robust set of testing methodologies that will be applied to the testing of the application built for the research project.

They identify three types of tests that are used at different points in the development lifecycle of a product. These are exploratory (or formative), assessment (or summative), and validation (or verification) tests. An additional test, the comparison test, is not associated with any specific lifecycle phase. The testing types support an iterative design process.

3.6.1 Testing during application development

Early testing of the wireframe prototype will draw test participants from within the researcher’s social circle. This testing cycle will be used to check the design of the application prototype and to refine test questionnaires and the test task. In total, ten participants will take part in the testing. All participants will be provided with a test briefing document, a consent form, and a URL to the Loop 11 test site.

An additional ten test participants will be recruited using Amazon Mechanical Turk (MTurk)(n.d.). This testing cycle will be used to check the design of the application prototype and to refine test questionnaires and the test task. Additionally, the testing cycle will be used to explore and refine the recruitment methodology used for MTurk. These test participants will be provided with a test brief and a URL to the Loop 11 test site.

The Loop 11 test site will provide all test participants with an overview of the research study, a test brief, a pre and post-test questionnaire, and a link to the wireframe prototype.

The pre and post-test questionnaires will be delivered via Typeform (n.d.) for the first test group. For the test group recruited via MTurk, the pre and post-test questionnaires will be administered via Loop 11. In both cases, the test brief and pre and post-test questionnaires are identical, as is the wireframe prototype.


3.6.2 Testing for the experiment: recruiting testers, ages and capabilities

All research participants will either be drawn from the IADT student population or remotely recruited using MTurk. Based on metrics from Nielsen (2006), the aim is to recruit 40 participants in total, using between-subjects testing. A reward scheme will be used to incentivise test subjects from IADT, and MTurk workers will be provided with a monetary reward for participating in the testing. All participants will be expected to have an understanding of web technologies, be comfortable using desktop or laptop computers, and have either normal or corrected-normal vision. The demographic profile is not critical, though demographic details will be captured using quantitative and qualitative data collection methods. Provided approval has been obtained from the relevant authorities, testing in IADT will be arranged to facilitate students and tutors at a time that fits with the project development schedule. The experiment will not require a specific age range for testing.

3.6.3 Screening test candidates

Screening requirements are relatively loose. Two primary criteria require that test subjects have normal or corrected normal vision (and colour-blind test subjects will be required to indicate which form of colour blindness they suffer from), and have access to a desktop or laptop computer which is connected to the internet.

3.6.4 Conducting the tests

Before testing commences, test subjects will sign consent forms and recording waivers. Remote testers will be provided with a study preamble that lists consent criteria. For on-site testing, the test moderator will outline the test process and prepare the test equipment, ensuring that the application is ready for testing, and that data collection systems are functional. A pre-test questionnaire will be administered to record basic demographic information and to log details relating to technical proficiency.

Once testing commences, testers will have the choice to discontinue testing. The system will be configured so that the testers will have time between tasks to take micro breaks.

On completion of all test tasks, the tester will complete a post-test questionnaire and will be debriefed. The session will then be closed and all recorded and logged data will be compiled and stored for later analysis.


3.6.5 Environmental setup

Testing will be carried out using a combination of on-site and remote testing. On-site testing will be carried out in IADT, with test participants drawn from the college population. Testing will take place in a college laboratory with standardised computer hardware.

It will be more difficult to standardise remote testing, as the test environment will be reliant on each MTurk worker’s hardware setup.

3.6.6 Data collection

A variety of data collection methods will be utilised, including:

• questionnaires – pre and post-test
• in-app data collection using MySQL
• Loop 11 data logging, including screen recording

Data collection methods

Qualitative and quantitative data will be collected from test participants in the form of questionnaires that will be administered before and after testing. Testing data will be gathered using online questionnaires via Loop 11.

Data collection: analysis

The collected data will be analysed to determine if there is a measurable difference in the time it takes to scan for and locate target interface elements, and to interact with the targets, on the two prototypes. The primary hypothesis is that the time taken to scan for, locate and interact with targets that utilise shading gradients should be lower than for targets that do not utilise shading gradients.

Analysis will also evaluate the impact of Gestalt principles on target identification.
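To illustrate the planned comparison (a sketch only – the study’s statistical analysis will be carried out in a dedicated statistics package), the difference in mean response times between the two groups can be assessed with an independent-samples (Welch’s) t-test:

    // Welch's t statistic for two independent groups of response times (ms).
    const mean = xs => xs.reduce((a, b) => a + b, 0) / xs.length;

    function sampleVariance(xs) {
      const m = mean(xs);
      return xs.reduce((a, x) => a + (x - m) ** 2, 0) / (xs.length - 1);
    }

    function welchT(groupA, groupB) {
      const standardError = Math.sqrt(
        sampleVariance(groupA) / groupA.length + sampleVariance(groupB) / groupB.length);
      return (mean(groupA) - mean(groupB)) / standardError;
    }

    // Illustrative data only – not results from the study:
    console.log(welchT([820, 760, 900, 845], [840, 910, 870, 905]));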

3.7 Ethical and legal considerations

Before testing commences, all test participants will be required to sign a consent form stating that they have agreed to participate in the testing process, and it will be made clear to test participants that they can opt out of the testing at any stage. All testing data will be strictly anonymous, and any collected video footage will be kept private. Test participants will be given access to any data gathered about them.


Any questionnaires and data collection methods used during testing will be anonymised to preserve tester privacy. Loop 11 anonymises data by default, and MTurk workers are identified by codenames.

On-site testing in IADT will require permission from college authorities. Test participants will be drawn from a pool of post-graduate students. If testing takes place on college PCs, any third party software which may require installation on the machine(s) may be subject to legal restrictions, so any software end user license agreements will require checking beforehand.


4 Design & Implementation

In order to address the research questions, an application prototype was developed and used for A/B testing. This section will describe the development of the application prototype (including issues associated with the development technologies), and will detail the development of additional test assets such as questionnaires. Additionally, detail will be provided on the challenges of testing the application prototype. The design utilised the Google Material Design language (2017), and contained basic structural elements including toolbars, cards, and raised buttons.

The design followed the iterative design and testing process outlined in the methodology chapter. The design was informed by the structure of the experiment carried out by Creager (2017), which was used as a template for the process flow.

4.1 Paper prototypes

Once the form factor had been decided (see section 3.4.3), product concepts were developed on paper and whiteboard to investigate functionality. Early work concentrated on application flow and game features.

4.1.1 Application flow

Application flow was based around the experiments outlined by Creager, where a series of target/distractor arrays were displayed. The test displayed instructions for the task (visually identifying the target), followed by a countdown, after which the target/distractor array was displayed. Feedback was provided to the tester upon successful/unsuccessful identification of the target.

4.1.2 Gamification

Gamification features were investigated, and existing reaction type games for Android were evaluated. Features of existing Android games that were considered for inclusion in the application prototype included levels, scoring, progress feedback, in-game awards and leader boards.

Design decisions were also informed by Nielsen’s heuristics (1995), notably:

• visibility of system status
• consistency and standards
• aesthetic and minimalist design
• help and documentation

4.1.3 Choice of form factor

Early development work included scoping out the choice of form factor, with consideration given to using mobile or desktop. Both had advantages and disadvantages: notably, a mobile solution would allow for testing in a wide variety of locations, while a desktop solution would provide less variability during testing. The decision to use a desktop form factor was due to technical limitations – the developer had more familiarity with desktop development. See figures 4.1 – 4.7 for details.

Figure 4.1. Whiteboard brainstorming on target/distractors. From the author.


Figure 4.2. Early work on flow using mobile form factor. From the author.

Figure 4.3. Early work on flow using desktop form factor: application launch. From the author.


Figure 4.4. Early work on flow using desktop form factor: target identified/not identified. From the author.

Figure 4.5. Early work on flow using desktop form factor: task feedback. From the author.


Figure 4.6. Early work on flow using desktop form factor: progress/awards/high scores. From the author.

Figure 4.7. Early work on flow using desktop form factor: application status. From the author.


4.2 Wireframe prototypes

4.2.1 Balsamiq Mockups

Once the base application flow, gamification features and target/distractor combinations had been investigated, an early wireframe was generated using Balsamiq Mockups (version 3.5, n.d.). See figure 4.8 below. After initial development of a set of wireframes based on the mobile form factor, further wireframe development was carried out using Adobe Xd.

Figure 4.8. Samples of the mobile application wireframe screens built using Balsamiq mockups. From the author.

4.2.2 Adobe Xd wireframes

Before finalising development using the desktop form factor, a mobile wireframe was developed in Adobe Xd. Many features of the mobile wireframe were carried over to the desktop version of the wireframe, which was then used for exploratory (formative) testing. See figures 4.9 – 4.11 below.


Figure 4.9. Samples of the mobile application wireframe screens built using Adobe Xd. From the author.

Figure 4.10. Samples of the desktop application wireframe screens built using Adobe Xd. From the author.


Figure 4.11. Sample instruction screen in Adobe Xd. From the author.

Feedback from exploratory testing (see Chapter 5.1) provided insights into the usability of the application, tone of voice and gamification elements. The feedback was used to iterate the wireframe prototype prior to summative testing.

4.3 Target/distractor arrays

In tandem with wireframe development, work was carried out to define the target/distractor arrays which would form the primary components of the experiment.

The arrays consisted of 2x1, 3x3, 5x5 and individual screen grabs. The 5x5 arrays included subsets of arrays that utilised a number of Gestalt principles such as proximity, good continuation and also included different sized distractors, variations on target shape and poor contrast. See figure 4.12 – 4.14 for examples.

The target/distractor arrays went through a number of design iterations, with a switch from using target/distractors that only differed by contrast, to target/distractors that used the same contrast but different colours.

Consideration was given to the impact of Fitts’ law (1992) on the interaction between the tester and the target/distractor arrays. In Creager’s experiment, the presence or absence of a target was logged using keystrokes, which reduced tester hand movement. The experiment under consideration extends Creager’s original experiment but requires mouse interaction to detect the target, so some variation would be introduced as the mouse position could vary for each tester.


The experiment was designed to minimise mouse travel, and interface button elements were positioned on a central vertical axis which encouraged unconscious placement of the mouse in the centre of the application canvas prior to task execution. Later analysis of screen recordings of testing showed that testers tended to keep the mouse positioned in the centre of the application canvas.

Figure 4.12. Sample target/distractor array. From the author.


Figure 4.13. Sample target/distractor array utilising the ‘good continuation’ Gestalt principle. From the author.

Figure 4.14. Sample screen grab target/distractor which tests perception on realistic user interfaces. From the author.


4.4 Implementation

4.4.1 App Inventor

Early investigation of the suitability of the mobile form factor included an exploration of MIT’s App Inventor mobile development platform. A series of tests were carried out to scope development limitations. Work included the development of timer logic, an investigation of methods for saving test data to Secure Digital (SD) storage, and the generation of user interface elements (see figure 4.15 below). Development was suspended in favour of the desktop form factor, as it was concluded that the learning curve in familiarisation with App Inventor would jeopardise the project schedule.

Figure 4.15. MIT App Inventor test screens. From the author.

4.4.2 HTML/CSS/JavaScript/PHP/MySQL

The final application prototype was built using HTML, CSS and JavaScript. The front end used PHP to connect with a MySQL backend, used for data storage.

All application logic and flow control was contained in a JavaScript file. An HTML file contained the application structure – primarily stored in a series of DIVs, and the test levels and screens were contained in a large array which was stored in the same HTML file. Application styling was contained in a CSS file. See appendix B for selected code samples.
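A simplified sketch of this structure is shown below (the field names and values are illustrative, not the production code): a levels array holds each screen’s pre-rendered image and target region, and an index drives the flow.

    // Each level holds one pre-rendered target/distractor image and the
    // clickable target region; names and coordinates are illustrative.
    const levels = [
      { image: 'arrays/level01.png', target: { x: 420, y: 310, w: 120, h: 48 } },
      { image: 'arrays/level02.png', target: { x: 180, y: 520, w: 120, h: 48 } }
      // ... one entry per target/distractor array
    ];

    let currentLevel = 0;

    function nextLevel() {
      currentLevel += 1;
      if (currentLevel < levels.length) {
        document.getElementById('stimulus').src = levels[currentLevel].image;
      }
      // otherwise: end of test – show score and debrief screens
    }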

In the experiment carried out by Creager, the position of the target with respect to the distractors was randomised, and initially it was intended that the application would replicate the random display of targets and distractors. Due to the complexity of the required program logic, the final application prototype did not include randomisation of target position; instead, the target/distractor arrays were generated as single images, with the target location identified by an HTML image map. Testers will not re-run the experiment, and between-subjects testing is to be used, so there is no benefit to having target/distractor positions appear in random positions for each run of the experiment. In addition, randomising the target/distractor positions would cause issues with the A/B test: in order to provide statistical validity, each set of target/distractors needed to be identical between the A and B application variants.
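A sketch of how the target region of such an image might be detected and timed (the element IDs are assumptions; the production code differs in detail):

    // The target region of each raster image is an <area> element in an HTML
    // image map; clicking it stops the response timer for the current task.
    let stimulusShownAt = 0;

    function showStimulus(level) {
      document.getElementById('stimulus').src = level.image;
      stimulusShownAt = performance.now();
    }

    document.getElementById('target-area').addEventListener('click', () => {
      const responseTime = performance.now() - stimulusShownAt;  // ms from display to click
      console.log('response time (ms):', responseTime);
    });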

In total, 96 target/distractor images were tested for each of the flat and shaded variants with the final 10 consisting of screen grabs from web sites.

Due to time limitations, a number of planned application features were not implemented, including the display of inter-level page status, continuous display of score, in-game awards and the leader board. In addition, in-game help was not implemented as intended, resulting in a compromised user experience.

A further limitation of the application code is that it must run on a desktop at a resolution of 1920x1080 pixels, as the target/distractors are raster images. Running the application at a lower resolution will cut off parts of the user interface and interfere with the application interactivity, thus compromising response times. Limiting the resolution to 1920x1080 pixels also limits the potential test participant pool: Statcounter’s current statistics for desktop screen resolutions in North America indicate that the 1920x1080 pixel resolution accounts for 9.47% of all screen resolutions (n.d.).
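A simple guard of the kind that could enforce this requirement is sketched below (illustrative only – the deployed prototype relied on instructions to testers rather than a hard check):

    // Warn testers whose display is below the minimum supported resolution.
    if (window.screen.width < 1920 || window.screen.height < 1080) {
      alert('This test requires a desktop display of at least 1920x1080 pixels.');
    }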

Since testing required two applications, two subdomains were generated on the researcher’s website, one for testing with flat targets, and one testing with shaded targets, with two variations of the application code. The only differences were in the calls to the target/distractor sets, one for flat targets and one for shaded targets.

The test sites used were:

• http://flat.kjoconnell.com/
• http://shaded.kjoconnell.com/

During early testing of the application, problems were encountered with the loading of images, which caused a brief flicker in the appearance of each new target/distractor image. The flicker effect caused visual distraction and affected target recognition. As a result, a JavaScript image pre-loader file was written, which was called by the main HTML file. All test images could then be pre-cached on the client side, eliminating the flicker on image load.
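A minimal sketch of such a pre-loader (illustrative; the production file differs in detail, and the function names are assumptions):

    // Pre-cache all stimulus images so that each new level displays without flicker.
    const preloadedImages = [];

    function preloadImages(urls, onDone) {
      let remaining = urls.length;
      urls.forEach(url => {
        const img = new Image();
        img.onload = img.onerror = () => { if (--remaining === 0) onDone(); };
        img.src = url;
        preloadedImages.push(img);  // retain references so the cached images are kept
      });
    }

    // e.g. preloadImages(levels.map(level => level.image), startTest);
    // where startTest begins the first task (illustrative names).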

4.5 Data collection

Data was collected using questionnaires for qualitative data, while quantitative data was collected and stored using a MySQL database.

All testing was run via the Loop 11 testing platform, which integrated pre-test, test task and post-test data collection. In addition to the pre, post and task tests, Loop 11 also collected data pertaining to participants, including geographical location, IP address, browser type, and operating system. Additionally, Loop 11 provided statistics on test duration and task path analysis. The system provides an option to record user screen interaction, and in all cases video was captured, which provided additional insights and allowed for cross-referencing of data gathered using the questionnaires and the MySQL database. See figure 4.16 below.

Figure 4.16. The Loop 11 remote user testing site. From Loop 11 (n.d.).

Pre and post-test data were exported from Loop 11 in Excel spreadsheet format.

After each test iteration, a JavaScript function passed test data through to a PHP file, which in turn passed the data through to a MySQL database. The application data collected consisted of:

• a numerical identifier
• a universally unique identifier (UUID)
• batch name
• page number


• response time
• date
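An illustrative sketch of this hand-off from the application to the PHP/MySQL backend (the ‘log.php’ endpoint and the field names are assumptions, not the production code):

    // Send one test record to the PHP endpoint, which inserts it into MySQL.
    function logResponse(record) {
      fetch('log.php', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          id: record.id,                       // numerical identifier
          uuid: record.uuid,                   // unique identifier for the test session
          batch: record.batch,                 // batch name
          page: record.page,                   // page number
          responseTime: record.responseTime,   // response time in milliseconds
          date: new Date().toISOString()       // date of the trial
        })
      });
    }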

After each test session, the logged test data was exported from the two MySQL databases in the form of CSV files.


5 User testing

As previously outlined, user testing was based on an iterative process, with each iteration designed to inform the next iteration. Testing was carried out over a six week period, with an analysis of the results of each testing iteration informing updates to the questionnaires, test methodology and the application prototype, before final deployment of the application.

Figure 5.1. Iterative testing. Testing consisted of three iterations. From the author.

5.1 Exploratory study: wireframe prototype

The objective of the exploratory study was to examine the effectiveness of preliminary design concepts. It was also used to test high-level assumptions regarding the user’s behaviour.

5.1.1 Testing with peers

Purpose

The exploratory study was used to test an early version of the application prototype to ensure that assumptions on process flow and UI development were sound, and to test pre and post-test questionnaires. Unmoderated testing was carried out remotely, and administered via email.

Participant recruitment

Ten testers were drawn from the researcher’s social circle.


Test assets

All participants were mailed a copy of the consent form and task instructions. Task instructions included an outline of the study scope, task duration and key test components. Details relating to the wireframe prototype outlined its functionality. See appendix C for a sample of the consent form. See appendix D for a sample of the task instructions.

Pre-test questionnaire

The pre-test questionnaire was used to collect qualitative data relating to the tester population. It also included questions relating to games and attitudes to gamification features. All questions in the pre-test questionnaire consisted of closed questions. See appendix E for the pre-test questionnaire.

Task

The task was presented as an Adobe Xd wireframe. The wireframe consisted of a shortened click through of the actual test task. The link to the wireframe was included in the testing documentation.

Post-test questionnaire

The post-test questionnaire was used to collect qualitative data relating to the wireframe prototype. The majority of questions were based on the Post-Study System Usability Questionnaire (PSSUQ) (2013), and also included questions relating to negative and positive aspects of the application prototype. See appendix F for the post-test questionnaire. While researching an appropriate post-test questionnaire, consideration was given to obtaining unbiased responses from test participants. Lewis (2013) notes that obtaining ‘honest responses is rarely a problem in most usability evaluation settings’. He further states:

Even if consistent item alignment were to result in some measurement bias due to response style, typical use of the IBM questionnaires is to compare systems or experimental conditions (a relative rather than absolute measurement). In this context of use, any systematic effect of response style (just like the effect of any other individual difference) will cancel out across comparisons. (p. 7)


5.1.2 Testing using Amazon Mechanical Turk/Loop 11

Purpose

Building on the peer tests, unmoderated testing was moved to the MTurk and Loop 11 platforms. The move was made because the platforms provided a method for recruiting remote testers and a robust environment for conducting and recording tests. The testing also allowed for limited testing of the overall testing approach, and familiarised the researcher with MTurk and Loop 11.

Participant recruitment

All participants were recruited using MTurk. The researcher signed up to MTurk and generated a Human Intelligence Task (HIT). A financial incentive of $2 was provided for testers who met the required criteria: testers were obliged to use a desktop or laptop computer to carry out testing. A total of ten participants were recruited. See figure 5.2 below.

Figure 5.2. The Amazon Mechanical Turk web interface. From Amazon (n.d.).

Test assets

The HIT consisted of a task description, scope study, task instructions, approximate task duration, and a link to the questionnaires and task. The questionnaires and task were hosted on the Loop 11 testing platform. See appendix G for details of the scope study and task instructions provided to the MTurk workers.

Pre-test questionnaire, task and post-test questionnaire

The pre-test questionnaire, task and post-test questionnaire mirrored the assets used during peer testing. See appendices E and F for pre and post-test questionnaire samples.

5.1.3 Results from peer and Amazon Mechanical Turk/Loop 11 testing

Pre-test questionnaire responses provided insights into the attitudes of test participants to gamification principles, and post-test questionnaire responses provided insights into the wireframe prototype structure and user interface. Insights gained from the use of MTurk helped to refine the further recruitment of testers, including the need to refine task instructions and to tighten tester acceptance criteria.

Metrics from Loop 11 informed refinements to the pre and post-test questionnaires and to the Loop 11 testing instructions, and underlined the usefulness of tester screen recording.

Key insights: pre-test

• All testers had normal or corrected normal vision; none of the testers were colour blind.
• A number of testers didn’t see (or read) the instruction relating to the countdown not functioning.
• Game instructions are somewhat important.
• Game progress is important.
• Game feedback is important.
• Leader boards are not so important.
• Visual design is important.
• Ease of use is important.
• Ease of learning is somewhat important.
• Game challenge is important.
• Help and documentation is somewhat important.
• In-game awards are split between important and unimportant.


Key insights: post-test (answers to PSSUQ)

[Chart: ‘Please rate your satisfaction with the application’ – PSSUQ statements rated from strongly disagree to strongly agree]

Figure 5.3. Summary of PSSUQ responses for exploratory wireframe testing. From the author.

Figure 5.4. Average of PSSUQ responses for exploratory wireframe testing. From the author.


Negative aspects of the system

• “Button test was difficult to see on 4k screen resolution”.
• Some testers were expecting the countdown timer to work.
• Text contrast was poor.

Positive aspects of the system

• Testers liked feedback as they worked through the system.
• Testers found the system simple and easy to use.
• Testers liked the visibility of system status.
• Instructions were very clear.

5.1.4 Exploratory study: recommendations for change

• Consider the inclusion of leader boards.
• Consider the inclusion of in-game awards.
• Rework some of the questions so they are less ambiguous.
• Ensure that there is sufficient contrast between text and background.
• Provide a more visually engaging interface.
• Refine pre and post-test questionnaires.
• Refine MTurk HIT acceptance criteria.

5.2 Summative test: first iteration of the application

This testing was carried out after the exploratory study, and occurred midway through the product development lifecycle.

5.2.1 Testing with Amazon Mechanical Turk/Loop 11

Purpose

Summative testing was used to ensure that iterative development of the wireframe prototype provided the required changes to the application prototype, and to further refine the pre and post-test questionnaires. It also provided an opportunity to test the MySQL databases, which would capture key response time metrics for the A and B tests. Unmoderated testing was carried out, and administered via MTurk and Loop 11.


Participant recruitment

All participants were recruited using MTurk. A financial incentive of $6 per test was provided for testers who met the acceptance criteria. Based on testing from the exploratory study, the tester criteria were refined:

• Testers were obliged to use a desktop computer to carry out testing.
• All testers were required to be located in the US.
• Testers were required to have a HIT approval rate of greater than 90%.

The additional criteria were added to encourage the participation of experienced MTurk workers.

Since the study required between-subjects A/B testing, two HITs were generated, with each HIT directing testers to either the ‘flat’ test or the ‘shaded’ test. A total of four participants were recruited, two for each test branch.

Test assets

The HIT consisted of a task description, scope study, task instructions (including consent acceptance), approximate task duration, and a link to the questionnaires and task. The questionnaires and tasks were hosted on the Loop 11 testing platform. See appendix H for details of the task description, scope study and task instruction.

Pre-test questionnaire

The pre-test questionnaire mirrored the questionnaire used for the wireframe testing, but was expanded to include an additional four questions relating to testers’ perception of modern user interfaces. See appendix I for details of the pre-test questionnaire.

Task

The task was presented as one of two links: a link to the ‘flat’ subdomain or a link to the ‘shaded’ subdomain.

Post-test questionnaire

The post-test questionnaire was expanded to include an additional six questions relating to testers’ perception of the application and test tasks. See appendix J for the post-test questionnaire.


5.2.2 Results from testing with Amazon Mechanical Turk/Loop 11

Summative testing feedback indicated that the pre and post-test questionnaires provided sufficient feedback to yield useful qualitative data for the final testing. Screen recordings indicated that despite explicit instructions requiring prospective testers to utilise desktops running at 1920x1080 or above, the instruction was ignored or was not seen. The testing generated viable MySQL test data.

In addition, caveats were added to the MTurk worker requirements, with clarifying instructions relating to desktop resolution. Since the research project required between-subjects testing, potential workers could participate only if they had not already participated in any other part of the study.

5.3 Comparison test: application deployment

The objective of the comparison test was to compare the two designs to establish if the shaded variant was more effective than the flat variant. Performance data were collected for each alternative design. Within the context of the research project, the final experiments were run as comparison tests, with one group of participants serving as a control group (testing with flat targets), and the other as the experimental group (testing with shaded targets).

5.3.1 Testing with Amazon Mechanical Turk/Loop 11 and IADT students

Purpose

The comparison test is used to answer the project research questions.

Participant recruitment: MTurk/Loop 11

Testing of remote participants was facilitated through MTurk. A total of 36 remote testers participated, with 22 valid test results. MTurk testing was run in three sets over a four day period. Participants were offered a financial incentive of $6 to complete the testing.

Participant recruitment: IADT Students

On-site testers were recruited from the IADT student population, facilitated by IADT staff. Eighteen testers participated in the on-site testing in IADT, with fourteen valid results. IADT testing took place in a single evening. Students were offered a prize for the highest score in each of the two tests.


Test assets: MTurk/Loop 11

The HIT consisted of updates to the task description, study scope, and task instructions (including consent acceptance), approximate task duration, plus links to test questionnaires and task, which were hosted on the Loop 11 testing platform. Since the study required A/B testing, two HITs were generated, with each HIT directing testers to either the ‘flat’ test or the ‘shaded’ test. See appendix K for details of the updated task description, study scope and task instructions.

Test assets: IADT Students

IADT students were given an overview of the study scope, task instructions (in the form of a PPT) and consent forms, and links to test questionnaires and task. The students were divided into two groups, one group taking the ‘flat’ test and one the ‘shaded’ test. The questionnaires and task were hosted on the Loop 11 testing platform. See appendix L for details of the study scope, task instructions and link to questionnaires and task.

Pre and post-test questionnaires

The finalised pre and post-test questionnaires mirrored those generated during the summative testing. See appendices I and J for details.

Task

The task mirrored the task used for the summative testing.

Results from the comparison test are detailed in the next chapter.


6 Comparison (A/B) test: results

Experimental data was analysed using a combination of IBM SPSS (n.d.) and Microsoft Excel (n.d.). Data was collected from Loop 11 and from the two MySQL databases used for the ‘flat’ and ‘shaded’ subdomains. After each test iteration was run, the MySQL databases were wiped in preparation for the next test iteration.

Details of descriptive and inferential statistical analysis are provided below.

6.1 Descriptive statistics

From a pool of thirty six remote testers, twenty two returned valid results. From a pool of eighteen participants who tested in-situ in IADT, fourteen returned valid test results. Statistical analysis was performed on thirty six tests, eighteen each for shaded and flat conditions.

6.1.1 Pre-test questionnaire data

The pre-test questionnaire collected data relating to tester demographics and attitudes towards gamification. The figures below summarise key findings from the pre-test questionnaire.

[Bar chart: age of testers – Under 18: 0; 19-29: 11; 30-45: 24; 45+: 1; Prefer not to say: 0]

Figure 6.1. Age profiles of all testers. 30.5% were aged 19 to 29, 66.6% were aged 30 to 45. From the author.


[Pie chart: gender of testers – Male: 21, Female: 15]

Figure 6.2. Gender breakdown for testers: 58.4% male, 41.6% female. From the author.

[Bar chart: computer proficiency of testers – Novice, Competent, Expert]

Figure 6.3. Computer proficiency of testers. From the author.

53% of testers claimed that they were competent, and 47% claimed they were expert.


[Bar chart: devices used to access the internet – PC, Laptop, Tablet, Phone]

Figure 6.4. The types of devices used to access the internet. From the author.

Many testers indicated that they use multiple devices to access the internet.

[Pie chart: Do you play computer games? – Yes: 29, No: 7]

Figure 6.5. Game play among testers. From the author.

80.5% of testers play computer games. The high proportion of gamers informed results relating to the inclusion of gamification principles.


[Bar chart: quality of vision of testers – Poor, Normal, Corrected normal]

Figure 6.6. Quality of vision of testers. From the author.

53% of testers had normal vision, 47% had corrected normal vision, and none indicated that they suffered from colour blindness. The quality of vision of the testers mirrored those in Creager’s experiments.

[Bar chart: game genres played by testers]

Figure 6.7. The genres of games played by testers. From the author.

Figure 6.7 shows a breakdown of the types of games played by the testers. A high proportion of testers reported playing ‘Action’ type games, so they could potentially be predisposed to situations requiring fast reflexes and hand/eye co-ordination, which would be advantageous to the experiment.


[Chart: game features rated from unimportant to important – game instructions, game progress, game feedback, leader boards, visual design, ease of use/controls, ease of learning, challenge, help/documentation, in-game awards]

Figure 6.8. Game features. From the author.

Figure 6.8 shows a breakdown of the types of game features and their importance to the testers. Help and documentation are not highly rated; challenge, ease of use, visual design and game progress are important.

[Bar chart: findability of interactive elements on modern UIs, rated from easy to difficult]

Figure 6.9. The findability of interactive elements on modern UIs. From the author.

The majority of testers responded that they do not have problems with the findability of interactive elements on modern user interfaces (see figure 6.9). This would fit with the demographic data – the majority, if not all, of the testers could be considered ‘digital natives’.


Testers were asked to list positive and negative aspects of modern user interfaces.

Positive responses included “Minimal design. Carefully considered colours” (Participant 2 Shaded), “Sleek interfaces. Beautiful. Functional. Easy to find information” (Participant 8 Shaded), “Visual appeal. Simplicity. Findability” (Participant 7 Shaded), “Standard layouts. Beautiful visuals” (Participant 2 Flat), “Extremely easy. Very intuitive.” (Participant 13 Flat), “Simple graphic interfaces. Effective animations. Carefully considered navigation” (Participant 7 Flat).

Negative responses included “Lack of interactive signifiers. Lack of feedback when errors occur” (Participant 1 Shaded), “Misuse of white space. Hamburger menus on non-mobile interfaces” (Participant 12 Shaded), “Poor navigation. Hidden menus” (Participant 8 Shaded), “Buttons not always clear. Not enough contrast” (Participant 6 Flat), “Inconsistent use of icons. Visual clutter. Poor contrast.” (Participant 12 Flat). See figures 6.10 – 6.11 below.

Figure 6.10. Word cloud of testers’ positive opinions of modern user interfaces. From the author.


Figure 6.11. Word cloud of testers’ negative opinions of modern user interfaces. From the author.

6.1.2 Post-test questionnaire data

The post-test questionnaire collected data relating to tester opinions and attitudes regarding the task and application interface. See figure 6.12 below for a summary of key findings from the PSSUQ.


[Chart: ‘Please rate your satisfaction with the application’ – PSSUQ statements rated from extremely dissatisfied to extremely satisfied]

Figure 6.12. PSSUQ responses for both shaded and flat tests. From the author.

Figure 6.13. Average of PSSUQ responses for comparison testing for both shaded and flat tests. From the author.

The responses to the PSSUQ questionnaire were generally positive, with the majority of testers responding that they were very satisfied or extremely satisfied with the system. A notable finding relates to the statement ‘Whenever I made a mistake using the system, I could recover easily and quickly’. In the context of the application, the task is linear, and once the testers are executing the application subtasks, they have no way to deviate or backtrack. It is notable that 23 testers rated this statement with ‘n/a’. Three testers indicated they were ‘very dissatisfied’ for questions six and seven (It was easy to find the information I needed; The information was effective in helping me complete the tasks and scenarios).

Testers were asked to describe their perception of the application. Responses included “Clear, efficient, easy to use” (Participant 2 Shaded), “A bit boring” (Participant 5 Shaded), “Simple, easy, responsive” (Participant 15 Shaded), “Simple, effective, functional” (Participant 7 Shaded), “Simple, repetitive, intuitive” (Participant 6 Shaded), “Too long. Bored quickly. Initial instructions confusing” (Participant 3 Flat), “Okay. Nothing flashy. Application is fast. No issues” (Participant 8 Flat).

Figure 6.14. Word cloud of testers’ perception of the application. From the author.

Testers were asked to comment on negative aspects of the application. Feedback included:

• “Boring, long” (Participant 5 Shaded)
• “Repetitive. Uninteresting. Staid” (Participant 15 Shaded)
• “Under pressure to find a button. Instruction not necessary in beginning levels” (Participant 2 Flat)
• “Too much text to read - Instructions are often very similar but same text keeps being repeated” (Participant 4 Flat)
• “Moving around to click the next button after a task. Hiding the task number and score when showing the example” (Participant 16 Flat)
• “Gamification should make something fun, and I felt that I didn't enjoy this whatsoever” (Participant 3 Flat)

Testers were also asked to comment on the positive aspects of the application. Feedback included:

• “I like the countdown timer. The contrast between the two colours (green and blue). Keeping score makes it really fun” (Participant 2 Shaded)
• “Gamification aspects - I was invested. Challenge of the exercise. The Instructions were useful and meaningful to the task” (Participant 3 Shaded)
• “Findability. Contrast in most cases. Discoverability” (Participant 7 Shaded)
• “Simple. Easy. Responsive” (Participant 15 Shaded)
• “The initial activities primed you for looking at a web page for buttons. The timer ensures you don't get distracted” (Participant 8 Shaded)

Finally, testers were asked for any other comments. Comments included:

• “Good exercise I liked it - want a better score!” (Participant 3 Shaded)
• “I am not sure if my ability to see the target block was due to the difference in colour or due to the shading. I really liked the high score, immediately engaged me. The area for the test was not representative of the size of modern websites so I wonder what the test would show if a larger area of screen was used. Really enjoyed this.” (Participant 4 Shaded)
• “I think this would be an interesting game for kids, but also useful for testing website interfaces in general.” (Participant 1 Flat)
• “Would it not be better to use no green and just shading?” (Participant 5 Shaded)
• The screen grabs were a lot more interesting (Participant 3 Flat)
• The screen grabs were a bit more fun (Participant 6 Shaded)

84

6.1.3 Analysis of experiment test data

[Chart: box plot, ‘Flat response times’, showing response times (0 to 3000 ms) for each of the eighteen ‘flat’ test participants.]

Figure 6.15. Box plot showing mean response times for each ‘flat’ test participant. From the author.

[Chart: box plot, ‘Shaded response times’, showing response times (0 to 3000 ms) for each of the eighteen ‘shaded’ test participants.]

Figure 6.16. Box plot showing mean response times for each ‘shaded’ test participant. From the author.

An analysis of the box plots (figures 6.15, 6.16) for the mean response times shows outliers with longer response times. Further analysis of the data suggested that these outliers arose during the screen grab testing. In general, mean responses for the screen grab tasks were much higher than for the abstract tasks. One explanation for this was that testers were not following instructions for the screen grab tasks, but a review of the screen recordings for the 'valid' tests

85

showed that participants did check the instructions. The increase (and variance) in response times for the screen grabs could be a result of the relative increase in the visual complexity of the image: the distractor forms become much more complex. Additionally, participants are also scanning for text labels rather than relying solely on shape and colour to find the target, and this additional cognitive load could increase reaction times.

[Chart: line graph, ‘Shaded and flat response times’, plotting mean response times (0 to 2000 ms) across tasks 1 to 96, with linear trend lines for the shaded and flat conditions.]

Figure 6.17. Graphs of mean response times for all tasks. From the author.

Figure 6.17 demonstrates the difference in response times for abstract targets versus screen-grab based targets. Testers took much longer to identify the screen-grab based targets (tasks 87 to 96).

6.2 Inferential statistics: analysis of mean response time for all participants

In order to answer the research question “Can the use of shaded objects improve the visual perception of perceived affordances (signifiers) in almost flat user interfaces?” three hypotheses were generated:

• H1: There will be no difference in click times between shaded and unshaded objects in tests with abstract signifiers
• H2: There will be no difference in click times between shaded and unshaded objects in tests with abstract signifiers that include Gestalt principles

86

• H3: There will be no difference in click times between shaded and unshaded objects in tests with signifiers present on user interfaces derived from existing web sites

Two sets of analysis were carried out: one to investigate response times for test participants, and one to investigate response times for test tasks. The analysis for all test participants investigated responses for objects with abstract signifiers, for abstract signifiers that included Gestalt principles, and for signifiers present on user interfaces derived from existing web sites. See appendix M for the analysis relating to test tasks.

6.2.1 H1: There will be no difference in click times between shaded and unshaded objects in tests with abstract signifiers

This analysis covers the first five trials, which used shaded and unshaded objects with abstract signifiers.

Normality

A Shapiro-Wilk’s test (p > .05) and a visual inspection of their histograms, normal Q-Q plots and box plots showed that the mean response times were approximately normally distributed for both flat and shaded tests, with a skewness of 1.100 (SE = 0.536) and a kurtosis of -0.377 (SE = 1.038) for Flat and a skewness of -0.854 (SE = 0.536) and a kurtosis of -0.700 (SE = 1.038) for Shaded.
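For reference, these standard errors follow directly from the per-condition sample size (n = 18); the standard formulae are:

\[ SE_{\text{skew}} = \sqrt{\frac{6n(n-1)}{(n-2)(n+1)(n+3)}} = \sqrt{\frac{6 \times 18 \times 17}{16 \times 19 \times 21}} \approx 0.536, \qquad SE_{\text{kurt}} = 2\,SE_{\text{skew}}\sqrt{\frac{n^2-1}{(n-3)(n+5)}} \approx 1.038 \]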

The null hypothesis for this test of normality is that the data are normally distributed; it is rejected if the p-value is below 0.05.

In both cases, the p-values are above 0.05; thus, in terms of the Shapiro-Wilk test, it is assumed that the data are approximately normally distributed, and as a result the data meet the assumptions required for an independent t-test.

Independent t-test

An independent-samples t-test was conducted to compare response times for the shaded and flat target conditions.

There was no significant difference in the scores for the shaded response (M = 788.92, SD = 73.11) and flat response (M = 815.14, SD = 119.20) conditions; t(34) = -0.796, p = 0.432, two-tailed. These results suggest that, across all tests with abstract signifiers, there is no difference in response times between test subjects who completed shaded testing and test subjects who completed flat testing. Therefore, the null hypothesis H1 is not rejected.
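As a worked check (not part of the original reporting), the reported t-value can be reproduced from the pooled-variance formula for two independent groups of n = 18:

\[ t = \frac{M_1 - M_2}{\sqrt{s_p^2\left(\tfrac{1}{n_1} + \tfrac{1}{n_2}\right)}}, \qquad s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2} \]

With equal group sizes, \( s_p^2 = (73.11^2 + 119.20^2)/2 \approx 9776.9 \), so \( t = (788.92 - 815.14)/\sqrt{9776.9 \times \tfrac{2}{18}} \approx -26.22/32.96 \approx -0.80 \) on 34 degrees of freedom, matching the reported value. The same calculation applies to H2 and H3 below.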

87

6.2.2 H2: There will be no difference in click times between shaded and unshaded objects in tests with abstract signifiers that include Gestalt principles

This analysis covers trials six through twelve, which used shaded and unshaded objects with abstract signifiers that include Gestalt principles.

Normality

A Shapiro-Wilk’s test (p > .05) and a visual inspection of their histograms, normal Q-Q plots and box plots showed that the mean response times were approximately normally distributed for both flat and shaded tests, with a skewness of 1.612 (SE = 0.536) and a kurtosis of -0.054 (SE = 1.038) for Flat and a skewness of -0.243 (SE = 0.536) and a kurtosis of -0.438 (SE = 1.038) for Shaded.

The null hypothesis for this test of normality is that the data are normally distributed; it is rejected if the p-value is below 0.05.

In both cases, the p-values are above 0.05; thus, in terms of the Shapiro-Wilk test, it is assumed that the data are approximately normally distributed, and as a result the data meet the assumptions required for an independent t-test.

Independent t-test

An independent-samples t-test was conducted to compare response times for the shaded and flat target conditions that include Gestalt principles.

There was no significant difference in the scores for the shaded response (M = 828.80, SD = 80.39) and flat response (M = 861.86, SD = 147.28) conditions; t(34) = -0.836, p = 0.409, two-tailed. These results suggest that, across all tests with abstract signifiers that include Gestalt principles, there is no difference in response times between test subjects who completed shaded testing and test subjects who completed flat testing. Therefore, the null hypothesis H2 is not rejected.

88

6.2.3 H3: There will be no difference in click times between shaded and unshaded objects in tests with signifiers present on user interfaces derived from existing web sites

This analysis covers trials thirteen through twenty-two, which used signifiers present on user interfaces derived from screen grabs.

Normality

A Shapiro-Wilk’s test (p > .05) and a visual inspection of their histograms, normal Q-Q plots and box plots showed that the mean response times were approximately normally distributed for both flat and shaded tests, with a skewness of 0.330 (SE = 0.536) and a kurtosis of 0.053 (SE = 1.038) for Flat and a skewness of 0.651 (SE = 0.536) and a kurtosis of 0.145 (SE = 1.038) for Shaded.

The null hypothesis for this test of normality is that the data are normally distributed; it is rejected if the p-value is below 0.05.

In both cases, the p-values are above 0.05; thus, in terms of the Shapiro-Wilk test, it is assumed that the data are approximately normally distributed, and as a result the data meet the assumptions required for an independent t-test.

Independent t-test

An independent-samples t-test was conducted to compare response times for the shaded and flat target conditions.

There was no significant difference in the scores for the shaded response (M = 1264.56, SD = 217.84) and flat response (M = 1313.43, SD = 210.19) conditions; t(34) = -0.685, p = 0.498, two-tailed. These results suggest that, across all screen grab tests, there is no difference in response times between test subjects who completed shaded testing and test subjects who completed flat testing. Therefore, the null hypothesis H3 is not rejected.

89

7 Discussion

7.1 Overview

The aim of this study was to investigate the application of shaded elements on almost flat interfaces, with the goal of extending the work of Creager (2017). In order to carry out the investigation, the following research question was formulated:

Can the use of shaded objects improve the visual perception of perceived affordances (signifiers) in almost flat user interfaces?

To answer the research question, three hypotheses were generated:

• H1: There will be no difference in click times between shaded and unshaded objects in tests with abstract signifiers
• H2: There will be no difference in click times between shaded and unshaded objects in tests with abstract signifiers that include Gestalt principles
• H3: There will be no difference in click times between shaded and unshaded objects in tests with signifiers present on user interfaces derived from existing web sites

In tandem with the question, the impact of certain Gestalt principles (figure/ground, proximity and good continuation) on clickability, discoverability and findability was examined. The test platform was built with respect to Google Material Design guidelines and included gamification principles, and testing measured the effectiveness of gamification on user experience.

7.2 Research findings

The primary component of the research study examined the effectiveness of shaded objects on almost flat interfaces. In order to address the research questions, an A/B test was carried out using between-group testing, for the ‘flat’ and ‘shaded’ abstract targets. Statistical analysis was carried out across a number of dimensions, and the results were contradictory. In one dimension, the mean response times (click times) for all test participants were analysed. Independent t-tests (and Mann-Whitney U tests for non-normal data sets) were carried out on the mean response times for the identification of test targets. The results of the analysis suggest that there is no significant difference in response times between shaded and flat target identification, which would indicate that shaded targets do not improve visual perception of perceived affordances in almost flat user interfaces, and that the application of shaded objects does not enhance the clickability, discoverability and findability of user interface elements on almost flat interfaces.

90

In a second dimension, the mean response times for all test tasks were analysed (see appendix M). Independent t-tests (and Mann-Whitney U tests for non-normal data sets) were carried out on the mean response times for the identification of test targets. The results of the analysis suggest that there is a significant difference in response times between shaded and flat abstract target identification, which would indicate that shaded targets do improve visual perception of perceived affordances in almost flat user interfaces, and that the application of shaded objects can enhance the clickability, discoverability and findability of user interface elements on almost flat interfaces. However, when the screen grab targets were analysed, there was no significant difference in response time between shaded and flat target identification. With regard to the findings for the screen grab targets alone, the fact that the target group consisted of only ten items limits the statistical power of the comparison.

A potential explanation for the conflicting results between the task and participant analyses could be the differing sample sizes: the analysis of mean response times for all test tasks included ninety-six tasks, whereas the analysis of mean response times for all test participants included eighteen testers. Testing with a larger set of participants would increase the statistical power of the analysis.
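As an illustrative power calculation (not part of the original analysis), the H1 group statistics give an observed effect size of roughly

\[ d = \frac{|M_1 - M_2|}{s_p} \approx \frac{26.22}{98.88} \approx 0.265, \]

and the conventional approximation for an independent t-test at \(\alpha = .05\) and 80% power, \( n \approx 16/d^2 \approx 228 \) per group, suggests that a sample far larger than the eighteen participants per group tested here would be needed to reliably detect an effect of this size.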

It should be noted that Creager’s original experiment included 17 test participants (within-subject) who completed 360 trials.

7.2.1 H1: There will be no difference in click times between shaded and unshaded objects in tests with abstract signifiers

Statistical analysis was carried out to compare response times across the abstract target/distractor groupings.

In total, five target/distractor groups were compared. Independent t-tests were carried out on the mean response times for the identification of test targets. The results of the analysis suggest that there is no significant difference in response times between shaded and flat target identification, which would indicate that the use of shaded objects does not improve visual perception of perceived affordances in almost flat user interfaces, and that the application of shaded objects does not enhance the clickability, discoverability and findability of user interface elements on almost flat interfaces.

A potential explanation for the result could be that the sample size for each target/distractor group was too low: the first trial consisted of two target/distractor arrays, the second consisted of four arrays, and the third through to the fifth sets of groups consisted of eight target/distractor

91

arrays each. Testing with larger sets of target/distractor arrays would increase the statistical power of the analysis.

7.2.2 H2: There will be no difference in click times between shaded and unshaded objects in tests with abstract signifiers that include Gestalt principles

Statistical analysis was carried out to test for target/distractor groupings that included Gestalt principles, primarily figure/ground, proximity and good continuation. An additional test included a target/distractor group rendered with a low-contrast background (contrast ratio of 1.5:1).
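For context, the contrast ratio referred to here follows the WCAG 2.0 definition (W3C working group, 2016), where \(L_1\) and \(L_2\) are the relative luminances of the lighter and darker colours respectively:

\[ \text{contrast ratio} = \frac{L_1 + 0.05}{L_2 + 0.05} \]

A ratio of 1.5:1 is well below the 4.5:1 minimum that WCAG 2.0 recommends for normal text, which is what makes this grouping a deliberately low-contrast condition.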

In total, nine target/distractor groups were compared. An independent t-test was carried out on the mean response times for the identification of test targets. The results of the analysis suggest that there is no significant difference in response times between shaded and flat target identification, which would indicate that applying Gestalt principles to shaded target/distractor groups does not improve visual perception of perceived affordances in almost flat user interfaces, and that the application of shaded objects does not enhance the clickability, discoverability and findability of user interface elements on almost flat interfaces.

A potential explanation for the result could be that the sample size for each target/distractor group was too low: each group consisted of eight target/distractor arrays. Testing with larger sets of target/distractor arrays would increase the statistical power of the analysis.

7.2.3 H3: There will be no difference in click times between shaded and unshaded objects in tests with signifiers present on user interfaces derived from existing web sites

Statistical analysis was carried out on two sets of ten web site screen captures. One set contained targets that were altered to generate unshaded objects, and the other set contained targets with shaded objects. An independent t-test was carried out on the mean response times for the identification of test targets. The results of the analysis indicated that there is no significant difference in response times between shaded and flat target identification, with the implication that the use of shading on signifiers does not improve visual perception of perceived affordances in almost flat user interfaces, and that the application of shaded objects does not enhance the clickability, discoverability and findability of user interface elements on almost flat interfaces.

A potential explanation for the result could be that the sample size for each target/distractor group was too low: a total of ten screen grab samples were tested. Testing with larger sets of screen grabs would increase the statistical power of the analysis.

92

7.2.4 Use of gamification principles as outlined by Kumar (2013)

Pre- and post-test questionnaires yielded qualitative data relating to the use of gamification in the experiment. Feedback was generally positive, and indicated that the majority of test participants became engaged in the testing process; the timer and scoring enhanced the gamification aspects. Some participants noted dissatisfaction, and in one notable case a test participant with significant gaming experience commented on the dullness of the ‘challenge’: “Gamification should make something fun, and I felt that I didn't enjoy this whatsoever” (Participant 3 Flat).

This mixed feedback suggests that gamification features can be useful in formal testing environments, but that providing sufficient and balanced challenge to participants (without compromising the underlying experiment) is particularly problematic. Additional development and testing cycles would assist in refining the gaming experience.

93

8 Conclusions

8.1 Overview

This study examined the impact of the visual perception of shaded and flat objects on the usability of almost flat interfaces. The study expanded previous work by Creager (2017), which explored the visual perception of shaded and flat target/distractor arrangements, and added an analysis of the application of Gestalt principles and of gamification in formal experiments. The research produced mixed findings, with some results confirming Creager’s experiment and others running counter to it. Quantitative results for experiments relating to the application of Gestalt principles yielded no statistical significance, but the qualitative results from the application of gamification principles were positive.

8.2 Key contributions

The quantitative results relating to the key research question (can the use of shaded objects improve the visual perception of perceived affordances (signifiers) in almost flat user interfaces?) and to the investigation of Gestalt principles were ambiguous, but the qualitative results of applying gamification principles to the experiment were positive. The base testing platform was robust and generated consistent and valid test data.

8.3 Limitations

A primary limiting factor for the study relates to the low number of test participants, which limited the statistical power of the results.

The test platform targeted an ideal desktop resolution of 1920x1080 pixels. Any tests conducted at a lower resolution were severely compromised, which resulted in the rejection of a large portion of test results. Testing at a significantly higher resolution, for example 4K, would yield equally compromised results. Ideally, the tests would be independent of desktop resolution.
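A simple client-side guard could enforce the resolution requirement before a session begins. The following is an illustrative sketch only, not part of the deployed platform; note that window.screen reports CSS pixels, which OS display scaling can affect:

// Illustrative sketch: block test sessions below the target resolution.
function meetsResolution(minWidth, minHeight) {
    return window.screen.width >= minWidth && window.screen.height >= minHeight;
}

if (!meetsResolution(1920, 1080)) {
    // Hypothetical handling: tell the participant and stop before any data is recorded.
    alert('This test requires a desktop resolution of at least 1920x1080.');
}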

Remote testing using the Amazon Mechanical Turk platform yielded mixed results. Since the tests were remote and unmoderated, participants had no way to interact with the test moderator. Remote testing also reduced the consistency of the test set-up, as the moderator had no control over the physical environment of the participant. Analysis of screen recordings revealed that participants were also liable to ignore instructions,

94

and in a number of cases, participants tested using sub-optimal screen resolutions. These tests had to be discarded.

Problems relating to the performance of the Loop 11 testing platform were evident. Some test participants complained of latency issues, and in some cases had to abandon testing. Latency issues also affected response times, and a number of test results had to be discarded. While Loop 11 offers an integrated testing platform with extensive and powerful data analytics, the user experience provided by the in-test user interface was problematic for some test participants and impacted the quality of some responses.

Some planned application functionality was not fully implemented, which compromised the user experience. This had a particular impact on the implementation of the instruction/help screens, and impeded the ‘flow’ aspect of the gamification experience. This was noted by testers in the post-test questionnaires.

8.4 Future research

The primary scope of the research tested a mix of abstract target/distractor forms and a small set of screen grab based target/distractor forms. Testing with the limited number of screen grab target/distractor forms did not generate statistically significant results, but the inclusion of these test tasks produced interesting qualitative feedback from the test participants. Future research could extend the testing of visual perception of flat and shaded objects to include a larger set of screen grab target/distractor sets. Given the limited number of test participants, additional research with higher participant numbers would yield more statistically powerful results.

The extension of Google Material Design principles to support shaded objects warrants further research. The research project extended the standards to a narrow set of user interface elements. Extending the existing Material Design guidelines to include a comprehensive set of shaded user interface elements is a non-trivial task.

The original scope of the application specified gamification features that did not make it into the final application. The inclusion of in-game awards and the implementation of a leader board would have augmented the gamification features. The testing platform and procedure could be extended by others to run additional experiments.

95

8.5 Project reflection

The research provided an opportunity to explore a facet of visual perception that is insightful and relevant to the evolution of contemporary user interfaces. The literature review outlined how key design concepts are recycled and repurposed; the iterative development of the application prototypes provided opportunities to exercise all of the tools introduced during the Masters course; and the implementation of the final application artefact presented technical and visual challenges.

The planning and execution of the project provided many challenges. The project extended over ten months, and meeting intermediate project goals proved challenging. Two challenges in particular involved deciding on the form factor, and implementing remote user testing. Decisions relating to form factor included research into potential software solutions. Remote user testing required a rapid familiarisation with the Amazon Mechanical Turk and the Loop 11 platforms.

Remote user testing proved to be insightful and frustrating in equal measure. The process of planning, conducting and refining remote tests was highly instructive: testing was conducted iteratively, and insights gained from each iteration informed later iterations. In hindsight, weighing access to a large pool of potential test participants via the Amazon Mechanical Turk platform against the loss of moderated user testing, I would not recommend the Mechanical Turk for testing a similar experiment.

8.6 User research and design artefacts

A link to all user research and design artefacts is located in Appendix N.

96

References

Accessibility - Usability. (n.d.). Retrieved December 3, 2017, from https://material.io/guidelines/usability/accessibility.html#accessibility-color-contrast

Aflafla1. (2014). English: Iterative development model. Retrieved from https://commons.wikimedia.org/wiki/File:Iterative_development_model.svg

Akzidenz-Grotesk. (2017, October 23). In Wikipedia. Retrieved from https://en.wikipedia.org/w/index.php?title=Akzidenz-Grotesk&oldid=806613169

Amazon Mechanical Turk. (n.d.). Retrieved March 30, 2018, from https://www.mturk.com/

Amazon.com: Online Shopping for Electronics, Apparel, Computers, Books, DVDs & more. (n.d.). Retrieved April 11, 2018, from https://www.amazon.com/

Angular. (n.d.). Retrieved April 17, 2018, from https://angular.io/

Apple. (n.d.). Retrieved April 11, 2018, from https://www.apple.com/

Balsamiq Products | Balsamiq. (n.d.). Retrieved April 9, 2018, from https://balsamiq.com/products/

Baraniuk, C. (2012). How We Started Calling Visual Metaphors Skeuomorphs and Why the Debate over Apple’s Interface Design is a Mess - The Machine Starts. Retrieved August 23, 2017, from http://www.themachinestarts.com/read/2012-11-how-we-started-calling-visual-metaphors-skeuomorphs-why-apple-design-debate-mess

Brownie, B. (2006). Gestalt Theories and Principles.pdf.

Buy Adobe XD CC | UX/UI design, prototyping & collaboration tool. (n.d.). Retrieved April 9, 2018, from https://www.adobe.com/ie/products/xd.html

Campbell-Dollaghan, K. (2013). What Is Flat Design? | Gizmodo Australia. Retrieved August 23, 2017, from https://www.gizmodo.com.au/2013/05/what-is-flat-design/

Carr, A. (2014, January 25). Windows 8: The Boldest, Biggest Redesign In Microsoft’s History. Retrieved August 23, 2017, from https://www.fastcodesign.com/1670705/microsoft-new-design-strategy

97

Cechanowicz, J., Gutwin, C., Brownell, B., & Goodfellow, L. (2013). Effects of gamification on participation and data quality in a real-world market research domain. In Proceedings of the First International Conference on Gameful Design, Research, and Applications (pp. 58–65). ACM.

Clayton, S. (2013). Modern Design at Microsoft. Retrieved September 29, 2017, from http://www.microsoft.com/en-us/stories/design/

Cooper, A. (2014). About face: The essentials of interaction design (4th ed.). Indianapolis, IN: John Wiley and Sons.

Crazy Diamond | A Creative WordPress Blog Theme. (n.d.). Retrieved April 11, 2018, from http://demo.lollum.com/crazydiamond/

Creager, J. H. (2017). Understanding the Findability and Perceived Clickability of Shaded and Flat Objects in Almost-flat Interfaces. Retrieved from https://repository.lib.ncsu.edu/bitstream/handle/1840.20/33554/etd.pdf?sequence=1

Creager, J. H., & Gillan, D. J. (2016). Toward Understanding the Findability and Discoverability of Shading Gradients in Almost-Flat Design. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 60(1), 339–343. https://doi.org/10.1177/1541931213601077

Csikszentmihalyi, M. (2008). Flow: The Psychology of Optimal Experience (1st ed.). Harper Perennial Modern Classics.

Deterding, S., Sicart, M., Nacke, L., O’Hara, K., & Dixon, D. (2011). Gamification. Using game-design elements in non-gaming contexts. In CHI’11 extended abstracts on human factors in computing systems (pp. 2425–2428). ACM.

Fluent Design System. (2017, August 24). Retrieved August 24, 2017, from https://fluent.microsoft.com/

Gamification in UX. Increasing User Engagement. (2017). Retrieved November 30, 2017, from https://uxplanet.org/gamification-in-ux-increasing-user-engagement-6437cbf702aa

98

Gillan, D. J., Holden, K., Adam, S., Rudisill, M., & Magee, L. (1992). How should Fitts’ Law be applied to human-computer interaction? Interacting with Computers, 4(3), 289–290. https://doi.org/10.1016/0953-5438(92)90018-B

Gosnell, P. (2012). “I have discovered the following truth and presented it to the world: cultural evolution is synonymous with the removal of ornament from articles in daily use.”—Adolf Loos, architect. Retrieved from https://www.patrickgosnell.com/s/Skeuomorphism.pdf

IBM SPSS Software | IBM Analytics. (n.d.). Retrieved March 27, 2018, from https://www.ibm.com/analytics/data-science/predictive-analytics/spss-statistical-software

Introduction - Material design. (2017, August 24). Retrieved August 24, 2017, from https://material.io/guidelines/

Introduction to Web Accessibility | Web Accessibility Initiative (WAI) | W3C. (n.d.). Retrieved April 11, 2017, from https://www.w3.org/WAI/intro/accessibility.php

Kuan, P.-H., Huang, I.-C., Wang, Y., Li, M., & Duh, H. B.-L. (2015). TAS MOVE: The Processes of Applying Flat Design in an Efficiency Require Mobile Application. In IASDR2015 Congress (pp. 1175–1188). Retrieved from https://flatisbad.com/publications/Kuan-IASDR15.pdf

Kumar, J. (2013, August 1). Five Steps to Enterprise Gamification | UX Magazine. Retrieved November 30, 2017, from https://uxmag.com/articles/five-steps-to-enterprise-gamification

Lankinen, L. (2017). Animation guidelines for hybrid mobile applications. Retrieved from https://www.theseus.fi/handle/10024/125455

Lee, K.-E., & Choe, J.-H. (2016). A Comparative Study about Style Change of Modern Art and GUI. International Journal of Applied Engineering Research, 11(2), 788–792.

99

Left vs Right: Brain Training - Apps on Google Play. (n.d.). Retrieved April 9, 2018, from https://play.google.com/store/apps/details?id=com.mochibits.google.leftvsright

Lewis, J. R. (2013). Psychometric Evaluation of the PSSUQ Using Data from Five Years of Usability Studies, 27.

Lidwell, W., Holden, K., & Butler, J. (2010). Universal principles of design, revised and updated: 125 ways to enhance usability, influence perception, increase appeal, make better design decisions, and teach through design. Rockport Pub. Retrieved from http://books.google.com/books?hl=en&lr=&id=3RFyaF7jCZsC&oi=fnd&pg=PA3&dq=%22supposition.+It+gives+the+designer+the+ability+to+calculate+the+product%22+%22driving+forces+behind+human+motivation+and+brings+the+designer+to%22+%22of+accumulated+knowledge+led+to+increased+specialization%22+&ots=x7KYfEzVIp&sig=bW2XSq0SLCk64f3iiBiNVu0jLm4

Lieberoth, A. (2015). Shallow Gamification: Testing Psychological Effects of Framing an Activity as a Game. Games and Culture, 10(3), 229–248. https://doi.org/10.1177/1555412014559978

Liu, B., & Todd, J. T. (2004). Perceptual biases in the interpretation of 3D shape from shading. Vision Research, 44(18), 2135–2145. https://doi.org/10.1016/j.visres.2004.03.024

Matthews, W. (2017, October 29). Ernst Keller. Retrieved October 29, 2017, from http://www.historygraphicdesign.com/the-age-of-information/the-international-typographic-style/805-ernst-keller

Meggs, P. B., & Purvis, A. W. (2006). History of Graphic Design (4th ed.). Wiley.

Meyer, K. (2015). Long-Term Exposure to Flat Design: How the Trend Slowly Makes Users Less Efficient. Retrieved August 24, 2017, from https://www.nngroup.com/articles/flat-design-long-exposure/

Microsoft Excel 2016, Spreadsheet Software, Excel Free Trial. (n.d.). Retrieved March 27, 2018, from https://products.office.com/en-ie/excel

100

MIT App Inventor | Explore MIT App Inventor. (n.d.). Retrieved April 9, 2018, from http://appinventor.mit.edu/explore/

NBC TV Network - Shows, Episodes, Schedule. (n.d.). Retrieved April 11, 2018, from https://www.nbc.com

Nielsen, J. (1993, November). Iterative Design of User Interfaces. Retrieved March 30, 2018, from https://www.nngroup.com/articles/iterative-design/

Nielsen, J. (1995). 10 Heuristics for User Interface Design: Article by Jakob Nielsen. Retrieved April 13, 2017, from https://www.nngroup.com/articles/ten-usability-heuristics/

Norman, D. (2013). The Design of Everyday Things (Revised and expanded edition). Basic Books.

Online User Testing Tool | Loop11. (n.d.). Retrieved April 17, 2018, from http://www.loop11.com/

Photo editing apps for Mac, PC, and mobile | Adobe Photoshop Family. (n.d.). Retrieved April 20, 2018, from https://www.adobe.com/ie/products/photoshopfamily.html

Preece, J., Sharp, H., & Rogers, Y. (2015). Interaction Design: Beyond Human-Computer Interaction (4th ed.). Wiley.

Ramachandran, V. S. (1988). Perceiving shape from shading. Scientific American, 259(2), 76–83.

Riley, W. (2013). Wells Riley - Less Aesthetic, More Design. Retrieved August 23, 2017, from https://wells.ee//blog/less-aesthetic-more-design

Rubin, J., & Chisnell, D. (2008). Handbook of Usability Testing: How to Plan, Design, and Conduct Effective Tests (2nd ed.). Wiley.

Sanchez, E. (2012). Skeuominimalism - The Best of Both Worlds - Blog. Retrieved August 24, 2017, from http://edwardsanchez.me/blog/13568587/skeuminimalism

Schiola, E. (2014, March 15). How to make an Administrator user account in Windows 8. Retrieved October 1, 2017, from https://www.digitaltrends.com/computing/how-to-make-an-administrator-user-account-in-windows-8/

101

Schneidermeier, T., Hertlein, F., & Wolff, C. (2014). Changing paradigm–changing experience? In International Conference of Design, User Experience, and Usability (pp. 371–382). Springer. Retrieved from http://link.springer.com/chapter/10.1007/978-3-319-07668-3_36

Screen Resolution Stats North America. (n.d.). Retrieved April 20, 2018, from http://gs.statcounter.com/screen-resolution-stats/all/north-america

Still, J. D., & Dark, V. J. (2013). Cognitively describing and designing affordances. Design Studies, 34(3), 285–301. https://doi.org/10.1016/j.destud.2012.11.005

Turn data collection into an experience |. (n.d.). Retrieved April 17, 2018, from https://www.typeform.com/

Uncrate. (n.d.). Retrieved April 11, 2018, from https://uncrate.com/

Understanding Success Criterion 1.4.3 | Understanding WCAG 2.0. (n.d.). Retrieved November 29, 2017, from https://www.w3.org/TR/UNDERSTANDING-WCAG20/visual-audio-contrast-contrast.html

W3C working group, G18. (2016). Retrieved November 29, 2017, from https://www.w3.org/TR/WCAG20-TECHS/G18.html

Weston, R. (1996). Modernism (1st ed.). Phaidon Press.

Zakia, R. (1997). Perception and Imaging (1st ed.). Focal Press.

Zune HD made me wish I never bought an iPod. (2017, March 3). Retrieved April 20, 2018, from https://www.windowscentral.com/zune-hd-2017-or-how-i-wish-i-didnt-buy-ipod

102

Appendix A

Justification for the use of gamification

According to Deterding, Sicart, Nacke, O’Hara, and Dixon (2011), gamification refers to the use of ‘video game elements in non-gaming systems to improve user experience (UX) and user engagement’.

Studies have demonstrated that the use of gamification in the context of formal testing can enhance user participation and drive completion of test tasks. Lieberoth (2015) noted that:

“Positive experiences in games used for serious purposes might stem from a combination of mechanics, superficial but alluring outward design, and the expectations of fun generated when people believe they are about to play a game. Indeed, visual appeal and simple interactions seem to be among the strongest psychological attractors for the casual gamer.” (p. 3, para. 2).

Lieberoth also noted that while core mechanics and aesthetic production values in serious games may not compare to those of successful commercial titles, psychological responses such as effort can be sparked simply by framing an activity as a game.

There is a concern that introducing gamification into the experiment could negatively impact the quality of the data obtained. A study by Cechanowicz, Gutwin, Brownell, and Goodfellow (2013) investigated the effects of gamification on participation and data quality in a real-world market research domain. They outlined a number of previous studies relating to the use of gamification in surveys, which found that the addition of gamification did not have a negative impact on data quality, and in some instances had a positive impact. For their experiment, three versions of a gamified market research survey were tested alongside an established industry standard in a study of over 600 participants. One of the research questions asked, “does gamification compromise the quality of data produced by users?” Within the context of their study, no significant differences were found in their measures of response quality, although they did note that the design of game elements may change respondent data, so game design is an important factor in data quality.

In an article on UX Magazine, Kumar (2013) outlines five steps to enterprise gamification, which can be applied to the application.

Kumar details a five-step process called “player centred design”.

103

Player centred design

The five steps are:

1. Know your player
2. Identify your mission
3. Understand human motivation
4. Apply mechanics
5. Manage, monitor and measure

Know your player

The first step in the player centred design approach is to understand the player and their context. In this case, the test setup dictates the context – the test scenario; and the player – students from within IADT, and testers from the Amazon Mechanical Turk crowd-sourced internet marketplace.

Identify your mission

This step involves understanding the current scenario, identifying the desired outcome, and setting an appropriate mission for the gamification project. In this instance, the requirement is to reach an outcome where players have provided sufficient data to answer the research questions. The mission will take the form of a sequence of visual perception problems that the tester needs to solve.

Understand human motivation

Kumar identifies two types of motivation: intrinsic and extrinsic. Intrinsic refers to internal motivations such as autonomy, mastery, curiosity and meaning. Extrinsic refers to external motivations such as money, trophies, etc. Within the context of the project, the primary motivators will be extrinsic, in the form of a potential prize, plus the award of points and the possible use of badges or trophies.

Apply mechanics

With an understanding of the player, the mission and the motivating factors, game mechanics can be applied, in the form of game rules and a core engagement loop.

In an article on UX Planet (“Gamification in UX. Increasing User Engagement,” 2017), a number of game mechanics are identified that are useful in the context of the application, including:

• Challenge: challenge is one of the most compelling game elements that can be used to motivate users. Challenge can be combined with some kind of reward system.

104

• Points: these are the granular units of measurement in gamification (Kumar, 2013). In the context of the application, points will correspond to the task completion time, in milliseconds. The tester will be presented with their completion time after they have completed each task.
• Badges/trophies: once the player has accumulated a certain number of points, they may be awarded badges, which correspond to a form of virtual achievement. Within the application, this could manifest as a badge when the tester completes a particular set of trials.
• Leader boards: “Not many things can motivate users better than the desire to be the leader” (“Gamification in UX. Increasing User Engagement,” 2017). Leader boards can provide motivation to excel, though in some instances they can be demotivating due to the high rank of others.
• Journey: “The user should feel as the real player starting the personal journey of the product usage”. Journey can be used to progressively disclose features. “Providing the information about the progress of the user’s journey, we can inspire them to continue.” (“Gamification in UX. Increasing User Engagement,” 2017)
• Constraints: constraints are typically utilised as limits on the time to task completion. “Constraints make people react faster and somehow motivate them to take an action right here and now.” With respect to the application, a time constraint of three seconds per test will be used (similar to Creager). The tester will be prompted in advance that they have three seconds to complete the task, i.e. find the target. The points and constraint mechanics combine in a simple scoring rule, sketched after this list.
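The scoring rule, as implemented in the application logic (see the .js listing in Appendix B), awards one point per 10 ms of time remaining inside the 3000 ms constraint, so faster target identification earns more points:

// Scoring rule from the application logic (Appendix B).
var timeLimit = 3000; // constraint: 3000 ms per task

function scoreFor(responseTimeMs) {
    return Math.round((timeLimit - responseTimeMs) / 10);
}

scoreFor(800);  // => 220 points for an 800 ms response
scoreFor(2900); // => 10 points for a response just inside the limit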

Kumar (2013) identifies additional game mechanics:

• Emotion: “Aesthetic elements like high quality artwork and content like humorous microcopy have the power to set a playful tone and engage the player on an emotional level” (Kumar, 2013). Visual aesthetics will be considered in the development of the application, and a non-formal tone of voice will be utilised.
• Game rules: when relevant game mechanics have been identified, the next step is to define the game. Within the context of the application, the rules will be very simple. The player has a set amount of time to identify the target and click on it. They receive feedback on their speed of completion, and the reaction time is logged for future analysis. When all tasks have been completed, the tester will be given feedback regarding their performance.

105

• Engagement loop: the engagement loop refers to the combination of game mechanics, positive reinforcement and feedback loops that keep the player engaged in the game. Engagement will be reinforced between sub-tasks, and when the player transitions between major tasks.

Manage, monitor and measure

While this step is most applicable to enterprise-level gamification, it is useful to bear the principles in mind during the development and testing of the application.

Flow

In addition to gamification, the concept of flow will be applied to the application, where possible.

Csikszentmihalyi (2008) defines flow as “a state in which people are so involved in an activity that nothing else seems to matter; the experience is so enjoyable that people will continue to do it even at great cost, for the sheer sake of doing it.”

106

In order to achieve this state, a person must be maintained in a ‘flow channel’, as shown in figure A1 below. To maintain an optimal flow experience, the task that the person is attending to should ideally sustain sufficient challenge and complexity that it generates neither high anxiety nor boredom. To maintain a flow state, task complexity should ideally increase so as to generate new goals that do not push the user into an overly anxious or bored state: if repeated tasks are too simple, the user risks becoming bored; if the tasks become too complex too soon, the user risks becoming overly anxious.

Figure A1. Maintaining a flow experience. From Csikszentmihalyi (2008).

In relation to the tasks that will be presented to the testers in the experiments, the application will be designed to provide a gradual increase in task complexity, so as to offer a flow-like experience rather than one that is more akin to a formal test, which could potentially result in tester anxiety.

107

Appendix B

Selected code samples

The index.html file, which sets up the whole experiment. Note that Loop 11 specific code has been added to the meta section of the HTML file.

<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8">
    <title>Visual Perception Research</title>
    <!-- Loop 11 specific tracking code was included here in the meta section -->
</head>
<body>
    <h1>Visual Perception</h1>
    <p>This application tests visual perception of user interface elements.</p>
    <p>The application is part of my final year UX research project.</p>
    <p>Thank you for your assistance!</p>
    <!-- The NEXT button starts the first countdown via the module in the .js listing below -->
    <button onclick="TheBatch.start()">NEXT</button>
</body>
</html>

108

109

Sample test batch setup information. This sets up the target/distractor combination, sets the timer limit (3000 milliseconds), and generates the image map co-ordinates and user feedback strings.

batch1: [{
    img: 'images/2x1_flat01.png',
    userTimeResponse: 0,
    timeLimit: 3000,
    coords: '600,405,700,505',
    feedback: 'You found the shape in',
    countText: 'Your time starts in',
    fail: 'Time is up! you didn\'t find the shape'
}, {
    img: 'images/2x1_flat02.png',
    userTimeResponse: 0,
    timeLimit: 3000,
    coords: '500,405,600,505',
    feedback: 'You found the shape in',
    countText: 'Your time starts in',
    fail: 'Time is up! you didn\'t find the shape'
}],

110

Below is the PHP used to write results to the MySQL database. Note that the access details for the database have been deleted for security purposes. The PHP included an echo command that was used during testing to verify that data was being successfully written to the database.

<?php
// Database access details ($servername, $username, $password, $dbname)
// have been deleted for security purposes.

$id = $_POST['id'];
$batchName = $_POST['batchName'];
$pageNumber = $_POST['pageNumber'];
$responseTime = $_POST['responseTime'];

// Create connection
$conn = new mysqli($servername, $username, $password, $dbname);
// Check connection
if ($conn->connect_error) {
    die("Connection failed: " . $conn->connect_error);
}

// Insert data into table
$sql = "INSERT INTO userData (id, batchName, pageNumber, responseTime)
        VALUES ('$id', '$batchName', '$pageNumber', '$responseTime')";
if ($conn->query($sql) === TRUE) {
    $last_id = mysqli_insert_id($conn);
    // Echo used during testing to verify that data was written successfully
    echo "New record created successfully. Last inserted ID is: " . $last_id;
} else {
    echo "Error: " . $sql . "<br>" . $conn->error;
}

$conn->close();
?>

111

The .js code containing the primary logic functions.

// Shows the main view (#wrap) and hides the instructions panel (#more)
function swap() {
    document.getElementById("wrap").style.display = 'block';
    document.getElementById("more").style.display = 'none';
}

// Reverses swap(): shows the instructions panel and hides the main view
function resetSwap() {
    var x = document.getElementById("more");
    if (x.style.display === "none") {
        x.style.display = "block";
    }
    var y = document.getElementById("wrap");
    if (y.style.display === "block") {
        y.style.display = "none";
    }
}

// Generates a pseudo-random identifier for each test session
function guid() {
    function s4() {
        return Math.floor((1 + Math.random()) * 0x10000)
            .toString(16)
            .substring(1);
    }
    return s4() + s4() + '-' + s4() + '-' + s4() + '-' + s4() + '-' + s4() + s4() + s4();
}

var TheBatch = (function() {
    var batches,
        uuid = guid(),
        $ele = {},
        current = null,
        feedback = null,
        timer = null,
        score = 0,
        missed = 0,
        pagination = {
            batch: null,
            currentlevel: 1,
            totalLevels: 0,
            batchScreen: 0,
            batchCount: 0,
            count: 1,
            defaultCount: 3,
            stopWatchTime: 0,
            absCurrentLevel: 1,
            absTotalLevels: 0,
            absCurrentScreen: 1,
            absTotalScreens: 0
        },
        // Caches jQuery references to the page elements and hides all panels
        setup = function(obj) {
            batches = obj;
            pagination.batch = Object.keys(batches)[pagination.batchScreen];
            // pagination.absCurrentLevel = getResponses().totalScreens;
            pagination.absTotalLevels = Object.keys(batches).length;
            pagination.absTotalScreens = Object.keys(batches[pagination.batch]).length;
            // Countdown ELEs and defaults
            $ele.countdownHolder = $('#countdown');
            $ele.countdownInstructions = $('#countdown__instructions');
            $ele.countdownTime = $('#countdown__time');
            $ele.countdownScreen = $('#countdown__screen');
            $ele.countdownTotalScreens = $('#countdown__totalscreens');
            $ele.countdownLevel = $('#countdown__level');
            $ele.countdownTotalLevels = $('#countdown__totallevels');
            $ele.countdownHolder.hide();
            pagination.count = pagination.defaultCount;
            // Screen ELEs and defaults
            $ele.screenHolder = $('#screen');
            $ele.screenCurrent = $('#screen__current');
            $ele.screenTotal = $('#screen__total');
            $ele.screenImg = $('#screen__image');
            $ele.screenCoords = $('#screen__coords');
            $ele.screenScore = $('#screen__score');
            $ele.screenLevel = $('#screen__level');
            $ele.screenTotalLevels = $('#screen__totallevels');
            $ele.screenScreen = $('#screen__screen');
            $ele.screenTotalScreens = $('#screen__totalscreens');
            $ele.screenHolder.hide();
            // Feedback ELEs and defaults
            $ele.feedbackHolder = $('#feedback');
            $ele.feedbackMsg = $('#feedback__message');
            $ele.feedbackScore = $('#feedback__score');
            $ele.feedbackHolder.hide();
            // Batch ELEs and defaults
            $ele.batchFeedbackHolder = $('#batch');
            $ele.batchFeedbackCount = $('#batch__count');
            $ele.batchLevel = $('#batch__level');
            $ele.batchFeedbackMsg = $('#batch__message');
            $ele.batchScore = $('#batch__score');
            $ele.batchMissed = $('#batch__missed');
            $ele.batchFeedbackHolder.hide();
            // Thanks ELEs and defaults
            $ele.thanksHolder = $('#done');
            $ele.thanksScore = $('#done__score');
            $ele.thanksHolder.hide();
        },
        // Shows the three-second countdown before each screen
        startCountdown = function() {
            $ele.feedbackHolder.hide();
            $ele.batchFeedbackHolder.hide();
            current = batches[pagination.batch][pagination.batchScreen];
            $ele.countdownInstructions.text(current.countText);
            $ele.countdownTime.text(pagination.count);
            $ele.countdownLevel.text(pagination.absCurrentLevel);
            $ele.countdownTotalLevels.text(pagination.absTotalLevels);
            $ele.countdownScreen.text(pagination.absCurrentScreen);
            $ele.countdownTotalScreens.text(pagination.absTotalScreens);
            $ele.countdownHolder.show();
            setTimeout(function() {
                if (pagination.count <= 1) {
                    $ele.countdownHolder.hide();
                    pagination.count = pagination.defaultCount;
                    showScreen();
                } else {
                    pagination.count--;
                    startCountdown();
                }
            }, 1000);
        },
        // Displays the target/distractor image and starts the response timer
        showScreen = function() {
            $ele.screenLevel.text(pagination.absCurrentLevel);
            $ele.screenTotalLevels.text(pagination.absTotalLevels);
            $ele.screenScreen.text(pagination.absCurrentScreen);
            $ele.screenTotalScreens.text(pagination.absTotalScreens);
            $ele.screenScore.text(score);
            $ele.screenImg.attr('src', current.img);
            $ele.screenCoords.attr('coords', current.coords);
            pagination.stopWatchTime = Date.now();
            $ele.screenHolder.show();
            timer = setTimeout(function() {
                current.userTimeResponse = 0; // timed out: a missed screen records a response time of 0
                current.score = current.userTimeResponse;
                score += current.userTimeResponse;
                feedback = current.fail;
                missed++;
                showFeedback();
            }, current.timeLimit);
        },
        // Called when the participant clicks the target: one point per 10 ms of time remaining
        userFound = function() {
            clearTimeout(timer);
            current.userTimeResponse = Date.now() - pagination.stopWatchTime;
            current.score = Math.round((current.timeLimit - current.userTimeResponse) / 10);
            score += Math.round((current.timeLimit - current.userTimeResponse) / 10);
            feedback = current.feedback + ' ' + (current.userTimeResponse / 1000) + ' seconds!';
            showFeedback();
        },
        // Shows per-screen feedback and POSTs the response time to the server
        showFeedback = function() {
            $ele.screenHolder.hide();
            $ele.feedbackMsg.text(feedback);
            $ele.feedbackScore.text(current.score + ' points');
            $ele.feedbackHolder.show();
            $.ajax({
                type: 'POST',
                url: './postFlat.php',
                data: {
                    id: uuid,
                    batchName: pagination.batch,
                    pageNumber: pagination.batchScreen,
                    responseTime: current.userTimeResponse
                }
            });
        },
        showBatchComplete = function() {
            $ele.batchScore.text(score);
            $ele.batchMissed.text(missed);
            $ele.batchFeedbackCount.text(pagination.batchCount + 1);
            $ele.batchFeedbackHolder.show();
        },
        // Summarises the quickest and slowest responses recorded for each batch
        getResponses = function() {
            var res = {},
                count = 0;
            Object.keys(batches).forEach(function(key) {
                res[key] = {
                    fullbreakdown: [],
                    quickest: 0,
                    slowest: 0
                };
                batches[key].forEach(function(obj) {
                    count++;
                    if (res[key].slowest === 0) {
                        res[key].slowest = obj.timeLimit;
                        res[key].quickest = obj.timeLimit;
                    }
                    if (obj.userTimeResponse < res[key].quickest) {
                        res[key].quickest = obj.userTimeResponse;
                    } else {
                        if (obj.userTimeResponse > res[key].slowest) {
                            res[key].slowest = obj.userTimeResponse;
                        }
                    }
                    res[key].fullbreakdown.push({
                        responseTime: obj.userTimeResponse,
                        score: Math.round(obj.userTimeResponse * 10),
                        timeLimit: obj.timeLimit
                    });
                }, this);
            }, this);
            return {
                id: uuid,
                data: res,
                totalScreens: count
            };
        };
    return {
        setup: setup,
        getResponses: getResponses,
        userFound: userFound,
        // Advances to the next screen, or to batch/experiment completion
        nextScreen: function() {
            pagination.currentScreen++;
            if (pagination.batchScreen < batches[pagination.batch].length - 1) {
                pagination.batchScreen++;
                pagination.absCurrentScreen++;
                startCountdown();
            } else {
                pagination.currentlevel++;
                $ele.feedbackHolder.hide();
                if (pagination.batchCount < Object.keys(batches).length - 1) {
                    showBatchComplete();
                } else {
                    $ele.thanksScore.text(score);
                    $ele.thanksHolder.show();
                }
            }
        },
        // Resets pagination state and begins the next batch of screens
        nextBatch: function() {
            pagination.batchCount++;
            pagination.batchScreen = 0;
            pagination.count = pagination.defaultCount;
            pagination.batch = Object.keys(batches)[pagination.batchCount];
            pagination.absCurrentLevel++;
            pagination.absCurrentScreen = 1;
            pagination.absTotalScreens = Object.keys(batches[pagination.batch]).length;
            startCountdown();
        },
        start: function() {
            startCountdown();
        },
        showHelp: function() {
            $('#' + pagination.batch).toggle();
        }
    };
})();
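The module exposes its public functions through the returned object. A minimal usage sketch follows; the wiring of these calls to the NEXT buttons and the image-map click handlers is assumed here, not shown in the listings above:

// Minimal usage sketch of the TheBatch module (assumed page wiring).
TheBatch.setup({
    batch1: [ /* target/distractor screens, as in the batch sample above */ ]
});
TheBatch.start();      // shows the countdown for the first screen
// TheBatch.userFound() is bound to a click on the target's image-map area;
// TheBatch.nextScreen() and TheBatch.nextBatch() advance the experiment.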

116

Truncated sample of the image preloader.

var images = new Array();

function preload() {
    for (i = 0; i < preload.arguments.length; i++) {
        images[i] = new Image();
        images[i].src = preload.arguments[i];
    }
}

preload(
    "http://flat.kjoconnell.com/images/2x1_flat01.png",
    "http://flat.kjoconnell.com/images/2x1_flat02.png"
);

117

Appendix C

Wireframe testing peer tests: consent to take part in research

I ------voluntarily agree to participate in this research study.

I understand that even if I agree to participate now, I can withdraw at any time or refuse to answer any question without any consequences of any kind.

I understand that I can withdraw permission to use data from my testing within two weeks after the test, in which case the material will be deleted.

I have had the purpose and nature of the study explained to me in writing and I have had the opportunity to ask questions about the study.

I understand that participation involves user testing of a wireframe prototype including answering pre and post-test questionnaires.

I understand that I will not benefit directly from participating in this research.

I understand that all information I provide for this study will be treated confidentially.

I understand that in any report on the results of this research my identity will remain anonymous. This will be done by disguising any details of my participation which may reveal my identity.

I understand that disguised extracts from my interview may be quoted in a dissertation.

I understand that signed consent forms will be retained at 11 The Rise, Woodpark, Ballinteer until the end of May 2018.

I understand that a transcript of my questionnaires in which all identifying information has been removed will be retained for two years after project completion: the end of May 2020.

I understand that under freedom of information legislation I am entitled to access the information I have provided at any time while it is in storage as specified above.

I understand that I am free to contact any of the people involved in the research to seek further clarification and information.

By checking the box and typing your name below, you are agreeing that you have read and understand the above and consent to submitting your application electronically.

Name of research participant

------☐

Date:

I believe the participant is giving informed consent to participate in this study

Kevin O’Connell

Date: 21 January 2018

118

Appendix D

Wireframe testing peer tests: task instructions

Hi,

Thanks for taking the time to assist me. I am researching the visual perception of user interfaces, specifically how people perceive shaded and non-shaded interface elements. I have created a low-fidelity mock-up of an application that I intend to use for more detailed testing, and I am looking for feedback on the mock-up.

Your participation in the evaluation should take no longer than 10 to 15 minutes, and is completely anonymous. You do not have to complete the testing – you can abandon it at any stage.

If you agree to participate in the testing process, please read and sign the attached consent form, and mail it back to me.

The testing is divided into three sections:

• A pre-test questionnaire
• Evaluation of a low-fidelity mock-up
• A post-test questionnaire

All three sections are available on-line:

• Pre-test questionnaire: https://kevoc.typeform.com/to/po8bcr
• Low-fidelity mock-up: https://xd.adobe.com/view/79639293-7424-4cbe-ab1a-657bcf1d01b0/
• Post-test questionnaire: https://kevoc.typeform.com/to/CnJFRb

Instructions

Please work through the sections in sequence.

The questionnaires can be answered using any device, but please carry out the evaluation of the mock-up using a desktop or laptop computer, and interact with the mock-up using a mouse. Run the evaluation using the highest resolution available, ideally at 1920x1080 or above. Note that there is an option to maximise the mock-up in your browser.

Notes about the mock-up

The mock-up represents an application that tests the visual perception of shaded and unshaded interface elements. In the real application, the visual perception tests are presented in the form of a reaction game, and the user will progress through 70 to 90 screen arrangements. The mock-up contains a small subset of these arrangements.

The mock-up is semi-interactive, and most buttons work. There is no option to move backwards through the mock-up, but you can reload the page if you want to run through it multiple times (this is optional).

The low-fidelity mock-up contains screens that mimic a three-second countdown. These screens are not functional; when you see one, simply click on any part of the screen to progress to the next screen.

Please contact me when you have completed the testing. If you have any questions regarding any aspect of this study, you can email me at [email protected]

Thank you for your co-operation,

Kevin

Appendix E

Wireframe testing peer tests: pre-test questionnaire

Age (broken into 5 sections)

Sex (m/f/other)

Computer proficiency (novice/competent/expert)

How often do you use the internet? (daily/weekly/monthly)

What devices do you use to access the internet? (PC/Laptop/Tablet/Phone)

Rate the quality of your vision (Poor/Normal or corrected normal)

Do you suffer from color blindness? (y/n)

If yes, please select which form you suffer from:

• Anomalous Trichromacy
• Dichromacy
• Monochromacy

Do you play online games? (y/n)

If yes, select which genres you play (choose any that apply; multiple choice)

• Action
• Adventure
• Casual
• Educational
• Puzzle
• Role-playing
• Simulation
• Strategy
• Sports

With respect to computer or online games, rate the importance to you of the following (Likert scale)

• Game instructions
• Game progress
• Game feedback
• Leader boards
• Visual design
• Ease of use/controls
• Ease of learning
• Challenge
• Help and documentation
• In-game awards

Appendix F

Wireframe testing peer tests: post-test questionnaire

Post-test questions: based on PSSUQ, plus a number of open-ended questions

Thank you for evaluating the low-fidelity mock-up.

Please rate your satisfaction with the prototype.

Try to respond to all items. Some questions are optional and can be skipped. For items that are not applicable, select NA.

1. Overall, I am satisfied with how easy it is to use this system.
2. It was simple to use this system.
3. I was able to complete the tasks and scenarios quickly using this system.
4. I felt comfortable using this system.
5. It was easy to learn to use this system.
6. I believe I could become productive quickly using this system.
7. The system gave error messages that clearly told me how to fix problems.
8. Whenever I made a mistake using the system, I could recover easily and quickly.
9. The information (such as online help, on-screen messages, and other documentation) provided with this system was clear.
10. It was easy to find the information I needed.
11. The information was effective in helping me complete the tasks and scenarios.
12. The organization of information on the system screens was clear.
13. The interface of this system was pleasant.
14. I liked using the interface of this system.
15. This system has all the functions and capabilities I expect it to have.
16. Overall, I am satisfied with this system.

7-point Likert scale, from strongly disagree to strongly agree

List the most negative aspects:

1.

2.

3.

List the most positive aspects:

1.

2.

3.

Any other comments?

Appendix G

MTurk wireframe testing instructions

I have created a low-fidelity mock-up of an application that I intend to use for more detailed testing, and I am looking for feedback on the mock-up. The testing comprises three sections: a pre-test questionnaire, the test, and a post-test questionnaire.

Instructions

Please work through the sections in sequence.

Please carry out the evaluation of the mock-up using a desktop or laptop computer, and interact with the mock-up using a mouse. Run the evaluation at a resolution of 1920x1080 where possible; running the evaluation at a higher resolution may cause difficulty in seeing the mock-up user interface. Note that there is an option to maximise the mock-up in your browser.

Notes about the mock-up

The mock-up represents an application that tests the visual perception of shaded and unshaded interface elements. In the real application, the visual perception tests are presented in the form of a reaction game, and the user will progress through 70 to 90 screen arrangements. The mock-up contains a small subset of these arrangements.

The mock-up is semi-interactive, and most buttons work. There is no option to move backwards through the mock-up.

The mock-up contains screens that mimic a 3 second count-down. These screens do not function, so when you see these screens, simply click on any part of the screen to progress to the next screen.

At the end of the testing process you will receive a code to paste into the box below to receive credit for completing the testing and questionnaires.

Make sure to leave this window open as you complete the survey. When you are finished, you will return to this page to paste the code into the box.

Appendix H

Summative testing instructions

I am a final year student, completing a Masters in User Experience Design at the Dún Laoghaire Institute of Art, Design and Technology, Ireland.

I have created a web application that tests visual perception, and I am looking for feedback on the application functionality. The testing comprises three sections: a pre-test questionnaire, the test, and a post-test questionnaire.

Consent to take part in the research

By accepting the HIT, you voluntarily agree to participate in this research study.

You understand that even if you agree to participate now, you can withdraw at any time or refuse to answer any question without any consequences of any kind.

You understand that participation involves user testing of a web application, including answering pre- and post-test questionnaires.

You understand that your participation in this research will result in a small monetary reward.

You understand that all information you provide for this study will be treated confidentially.

You understand that in any report on the results of this research your identity will remain anonymous. This will be done by disguising any details of your participation which may reveal your identity.

You understand that disguised extracts from your questionnaire may be quoted in a dissertation.

By accepting this HIT, you are agreeing that you have read and understand the above and consent to participating in the research study.

Instructions

Please work through the sections in sequence. If possible, try to complete all tasks in one sitting.

Please carry out the evaluation of the mock-up using a desktop or laptop computer, and interact with the application using a mouse. The application has been optimised to run at a resolution of 1920x1080 pixels. I recommend running the test using the Chrome browser. Running the evaluation at a higher or lower resolution may cause difficulty in seeing the user interface. If you cannot run the web application on a desktop or laptop computer, with a mouse, at the target resolution, please do not accept the HIT.

Notes about the web application

The application tests the visual perception of unshaded interface elements. In the application, the visual perception tests are presented in the form of a reaction game, and the user will progress through 94 screen arrangements. Initial arrangements display simple shapes; the final 8 arrangements display web-site screen-grabs. A three-second countdown is shown before the screen arrangements are displayed, and instructions are provided before each set of tests that describe what to interact with on screen. It is important to review the instructions before commencing each test.

Note that there is no option to move backwards through the application.

At the end of the testing process you will receive a code to paste into the box below to receive credit for completing the testing and questionnaires.

Appendix I

Summative testing: pre-test questionnaire

Age (Under 18/19-29/30-45/45+)

Gender (m/f/other/prefer not to say)

Computer proficiency (novice/competent/expert)

How often do you use the internet? (daily/weekly/monthly)

What devices do you use to access the internet? (PC/Laptop/Tablet/Phone)

Rate the quality of your vision (Poor/Normal or corrected normal)

Do you suffer from color blindness? (y/n)

If you answered 'yes' to the previous question, please select which form you suffer from, from the list below. (If you answered 'no', you can skip this question and scroll to the next one.)

• Anomalous Trichromacy
• Dichromacy
• Monochromacy

Do you play computer games (desktop, mobile or console)? (y/n)

If yes, select which genres you play (choose any that apply; multiple choice). (If you answered 'no' to the previous question, you can skip this question and move to the next one.)

• Action
• Adventure
• Casual
• Educational
• Puzzle
• Role-playing
• Simulation
• Strategy
• Sports

With respect to computer or online games, rate the importance to you of the following (Likert scale)

• Game instructions
• Game progress
• Game feedback
• Leader boards
• Visual design
• Ease of use/controls
• Ease of learning
• Challenge
• Help and documentation
• In-game awards

Rate your impression of how easy it is to find interactive elements (buttons, links) on modern user interfaces

What is your preference for how interactive elements (buttons) are presented on modern user interfaces (Flat, ‘3-D’, N/A)

What are the three most negative aspects of modern user interfaces?

What are the three most positive aspects of modern user interfaces?

Appendix J

Summative post-test questions: based on PSSUQ, plus a number of open-ended questions

Thank you for testing the application. Please rate your satisfaction with the application. Try to respond to all items. Where you have no opinion, please select NA. (7-point Likert scale, from strongly disagree to strongly agree)

17. Overall, I am satisfied with how easy it is to use this system.
18. It was simple to use this system.
19. I was able to complete the tasks and scenarios quickly using this system.
20. I felt comfortable using this system.
21. It was easy to learn to use this system.
22. I believe I could become productive quickly using this system.
23. The system gave error messages that clearly told me how to fix problems.
24. Whenever I made a mistake using the system, I could recover easily and quickly.
25. The information (such as online help, on-screen messages, and other documentation) provided with this system was clear.
26. It was easy to find the information I needed.
27. The information was effective in helping me complete the tasks and scenarios.
28. The organization of information on the system screens was clear.
29. The interface of this system was pleasant.
30. I liked using the interface of this system.
31. This system has all the functions and capabilities I expect it to have.
32. Overall, I am satisfied with this system.

How difficult was it to access the in-app instructions?

How easy was it to locate the green shapes?

Was sufficient time given to locate the shape?

How easy was it to locate the target on the screen-grab screens?

Was sufficient time given to locate the target on the screen-grab screens?

Using three words, describe your perception of the web application.

List the most negative aspects:

1.

2.

3.

List the most positive aspects:

1.

2.

3.

Any other comments?

Appendix K

MTurk final testing instructions

I am a final year student, completing a Masters in User Experience Design at the Dún Laoghaire Institute of Art, Design and Technology, Ireland.

I have created a web application that tests visual perception. The testing comprises three sections: a pre-test questionnaire, a task, and a post-test questionnaire.

Important

The application has been optimised to run at a resolution of 1920x1080 pixels. Running the evaluation at a higher or lower resolution may cause difficulty in seeing the user interface. If you cannot run the web application on a desktop or laptop computer, with a mouse, at the target resolution, please do not accept the HIT; I WILL REJECT IT. Note that the testing involves the recording of screen interactions, and the testing platform requires the installation of a browser plugin. If you have multiple monitors, ensure that the correct screen and browser window is being recorded.

Consent to take part in the research

By accepting the HIT, you voluntarily agree to participate in this research study.

You understand that even if you agree to participate now, you can withdraw at any time or refuse to answer any question without any consequences of any kind (though you will forfeit the assignment reward).

You understand that participation involves user testing of a web application, including answering pre- and post-test questionnaires.

You understand that your participation in this research will result in a small monetary reward.

You understand that all information you provide for this study will be treated confidentially.

You understand that in any report on the results of this research, your identity will remain anonymous. This will be done by disguising any details of your participation which may reveal your identity.

You understand that disguised extracts from your questionnaire may be quoted in a dissertation.

By accepting this HIT, you are agreeing that you have read and understand the above and consent to participating in the research study.

Instructions

Please work through the sections in sequence. If possible, try to complete all parts of the test in one sitting.

Please carry out the evaluation of the application task using a desktop or laptop computer, and interact with the application using a mouse. Maximise the browser window. I recommend running the test using the Chrome browser.

Notes about the web application

The application tests the visual perception of interface elements. In the application, the visual perception tests are presented in the form of a reaction game, and the user will progress through 96 screen arrangements. Initial arrangements display simple shapes; the final 10 arrangements display web-site screen-grabs. A three-second countdown is shown before the screen arrangements are displayed, and instructions are provided before each set of tests that describe what to interact with on screen. It is important to review the instructions before commencing each test. Note that there is no option to move backwards through the application.

At the end of the testing process you will receive a code to paste into the box below to receive credit for completing the testing and questionnaires. As well as the code, you will also need to paste in your MTurk worker ID and the score you receive at the end of the application task.

Appendix L

IADT hosted testing: PPT slide deck detailing study scope, instructions and task.

Appendix M

Statistical analysis of test tasks

Mean response time for test tasks: null hypothesis

Null hypothesis for Independent t-test: “There is no significant difference in response times between shaded test tasks and flat test tasks”.

Null hypothesis for Mann-Whitney U test: “There is no statistically significant difference between the median response times for shaded test tasks and flat test tasks”.

Test for normality

Prior to testing, the data sets were tested for normality. If a data set was normally distributed, an independent t-test was carried out; otherwise, a Mann-Whitney U test was carried out.
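
The following minimal Python sketch illustrates this selection procedure. It is an illustration only, not the tool used for the thesis analysis; the sample names and the 0.05 alpha level are assumptions:

    from scipy import stats

    def compare_conditions(shaded, flat, alpha=0.05):
        # Gate the choice of test on Shapiro-Wilk normality results:
        # if both samples look normal (p > alpha), run an independent
        # t-test; otherwise fall back to a Mann-Whitney U test.
        _, p_shaded = stats.shapiro(shaded)
        _, p_flat = stats.shapiro(flat)
        if p_shaded > alpha and p_flat > alpha:
            stat, p = stats.ttest_ind(shaded, flat)  # two-tailed
            return "independent t-test", stat, p
        stat, p = stats.mannwhitneyu(shaded, flat, alternative="two-sided")
        return "Mann-Whitney U test", stat, p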

Mean response times for test tasks (all targets)

Normality

A Shapiro-Wilk test (p < .05) and a visual inspection of the histograms, normal Q-Q plots and box plots showed that the mean response times were not normally distributed for both flat and shaded tests, with a skewness of 2.555 (SE = 0.246) and a kurtosis of 7.096 (SE = .488) for flat, and a skewness of 2.418 (SE = 0.246) and a kurtosis of 6.320 (SE = .488) for shaded.
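
For reference, skewness and kurtosis values with their standard errors, as reported here, could be reproduced along the following lines. The standard-error formulas are the usual large-sample approximations used by common statistics packages; that the analysis tool used exactly these formulas is an assumption:

    import numpy as np
    from scipy import stats

    def skew_kurt_with_se(x):
        # Bias-corrected sample skewness and excess kurtosis, with the
        # conventional standard-error approximations for each (assumed
        # to match the formulas used in the original analysis).
        n = len(x)
        g1 = stats.skew(x, bias=False)
        g2 = stats.kurtosis(x, bias=False)  # excess kurtosis
        se_skew = np.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
        se_kurt = 2.0 * se_skew * np.sqrt((n ** 2 - 1) / ((n - 3) * (n + 5)))
        return g1, se_skew, g2, se_kurt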

The null hypothesis for this test of normality is that the data are normally distributed. The null hypothesis is rejected if the p-value is below 0.05.

In both cases the p-values are below 0.05; thus, in terms of the Shapiro-Wilk test, the data are taken to be not normally distributed, and the null hypothesis of normality is rejected.

Independent t-test

Given that flat and shaded data are not normally distributed, an independent t-test was not carried out on the data.

Mann-Whitney U test

A Mann-Whitney U test indicated that there was a significant difference in response time between shaded (Mdn = 833.33) and flat (Mdn = 894.88) conditions, U = 3643, p = 0.012. Therefore, the null hypothesis for the Mann-Whitney U test is rejected.
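
A sketch of how this comparison could be reproduced follows; the file names are hypothetical placeholders for the recorded per-participant mean response times:

    import numpy as np
    from scipy import stats

    # Hypothetical input files: one mean response time (ms) per row.
    shaded = np.loadtxt("shaded_all_targets.csv")
    flat = np.loadtxt("flat_all_targets.csv")

    u, p = stats.mannwhitneyu(shaded, flat, alternative="two-sided")
    print(f"Mdn shaded = {np.median(shaded):.2f}, "
          f"Mdn flat = {np.median(flat):.2f}, U = {u:.0f}, p = {p:.3f}")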

Mean response times for test tasks (abstract targets only)

Normality

A Shapiro-Wilk test (p > .05) and a visual inspection of the histograms, normal Q-Q plots and box plots showed that the mean response times were approximately normally distributed for both flat and shaded tests, with a skewness of -0.005 (SE = 0.260) and a kurtosis of -0.081 (SE = 0.514) for flat, and a skewness of -0.051 (SE = 0.260) and a kurtosis of -0.479 (SE = 0.514) for shaded.

The null hypothesis for this test of normality is that the data are normally distributed. The null hypothesis is rejected if the p-value is below 0.05.

In both cases the p-values are above 0.05; thus, in terms of the Shapiro-Wilk test, the data are taken to be approximately normally distributed. As a result, the data meet the assumptions required for an independent t-test.

Independent t-test

An independent-samples t-test was conducted to compare response times for the shaded and flat target conditions.

There was a significant difference in the scores for the shaded response (M = 815.39, SD = 61.79) and flat response (M = 845.56, SD = 67.92) conditions; t(170) = -3.048, p = 0.003, two-tailed. Therefore, the null hypothesis for the independent t-test is rejected. This result suggests that, across the completed test sets, there is a significant difference in response times between test subjects who completed shaded testing and test subjects who completed flat testing.
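
The corresponding t-test computation could be sketched as follows; the file names are hypothetical placeholders, and with the reported t(170) each group would contain 86 observations, assuming equal group sizes:

    import numpy as np
    from scipy import stats

    # Hypothetical input files: one mean response time (ms) per test set.
    shaded = np.loadtxt("shaded_abstract_targets.csv")
    flat = np.loadtxt("flat_abstract_targets.csv")

    t, p = stats.ttest_ind(shaded, flat)  # two-tailed by default
    df = len(shaded) + len(flat) - 2      # 170 for the reported data
    print(f"M shaded = {shaded.mean():.2f} (SD = {shaded.std(ddof=1):.2f}), "
          f"M flat = {flat.mean():.2f} (SD = {flat.std(ddof=1):.2f}), "
          f"t({df}) = {t:.3f}, p = {p:.3f}")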

Mean response times for test tasks (screen grab targets only)

Normality

A Shapiro-Wilk test (p > .05) and a visual inspection of the histograms, normal Q-Q plots and box plots showed that the mean response times were approximately normally distributed for both flat and shaded tests, with a skewness of -0.058 (SE = 0.687) and a kurtosis of -1.423 (SE = 1.334) for flat, and a skewness of 0.380 (SE = 0.687) and a kurtosis of -0.693 (SE = 1.334) for shaded.

The null hypothesis for this test of normality is that the data are normally distributed. The null hypothesis is rejected if the p-value is below 0.05.

In both cases the p-values are above 0.05; thus, in terms of the Shapiro-Wilk test, the data are taken to be approximately normally distributed. As a result, the data meet the assumptions required for an independent t-test.

Independent t-test

An independent-samples t-test was conducted to compare response times for the shaded and flat screen grab target conditions.

There was no significant difference in the scores for the shaded screen grab target response (M = 1264.57, SD = 158.33) and flat screen grab target response (M = 1318.98, SD = 212.25) conditions; t(18) = -0.650, p = 0.524, two-tailed. Therefore, the null hypothesis for the independent t-test is not rejected. These results suggest that, across the completed screen grab tests, there is no significant difference in response times between test subjects who completed shaded testing and test subjects who completed flat testing.

Appendix N

Link to project artefacts: https://drive.google.com/open?id=1civJO6RFAae-_kT-SLINdEaKbD3hAeU1
