Eco (Bio)informatics Website Development 101
A Primer on Creating Biologically‐based Websites

1 Introduction

Modern, biologically oriented websites have evolved rapidly in the last ten years and will continue to evolve at least as rapidly for the foreseeable future.

• Static tables and ‘click here to download’ interactivity
• Flash & Java powered interactivity and effects
• Interactive queries and graphing
• Web 2.0: wikis, crowd sourcing, blogs, Flickr, YouTube
• The next big thing…?

2 Fundamentals

• Know your site’s mission
• Know your audience
• Taxonomy
• Discovery
• Web 2.0
• Copyright and ownership issues
• Getting Started

3 What is Your Mission?

• What is the purpose (or purposes) of your website?
  • Outreach
  • Research
  • Communication
  • Data portal
  • Analytical tool
  • Education
• Which leads us to the next question…

4 Who is your Audience?

• Identify your audience(s)
• Major audience groups include:
  • Elementary school
  • High school
  • University
  • Researcher
  • General public
  • Decision makers

5 Who is your Audience?

• Each group requires different language sets, different assumptions about prior knowledge, and different font/color styles and tools
• Targeting multiple audiences with one website is possible, but difficult to do well
• Multiple entrance paths and templates can be used

6 Taxonomy

• The single most difficult issue for managing biological data sets
• Species name ≠ species concept
• A species name is a particular label that someone has applied to a particular species concept
• A given species concept may have many names (synonyms)

7 Taxonomy

A species name is defined by a Latin binomial and an authority. The authority is critical as it defines who originally described the species.

Currently Accepted name: melanochlorus Cope, 1877

Synonyms: melanochlora Cope, 1877; Ollotis melanochlora Cope, 1877

8 Taxonomy

A species name may also include lower rank names that define a variety and/or sub‐species. The lower rank name(s) also have authorities.

Swartzia simplex (Sw.) Spreng. var. continentalis Urb.

9 Taxonomy

Ideally, there should be a one‐to‐one match between name and concept. Unfortunately, the world is not ideal…
• Many names are being revised due to misspellings of the original Latin name
• Disagreements about what constitutes a species (lumpers vs. splitters, geneticists vs. naturalists, ecologists vs. taxonomists)
• Disagreements among major players such as ITIS, GBIF, Species 2000 and group‐specific sites

10 Taxonomy

DNA and the new Taxonomy
• DNA analysis has led to major revisions in all of the major kingdoms
• Changes are being made at all taxonomic levels, from phylum on down
• The Angiosperm Phylogeny Group (APG) aims to reorganize the entire angiosperm group
• The next 10 years will see major shake‐ups of many major taxonomic groups

11 Taxonomy

Out of Chaos, order…
• Rule #1. Accept that there is no universal agreement and move on
• Rule #2. Pick a source or sources and stick with them
• Rule #3. Manage taxonomy separately from metadata
• Rule #4. Use species catalog numbers rather than names to link objects together

12 Taxonomy

Synonyms can be handled using four variables:
• Spnumber: species name catalog #
• Taxstat: taxonomic status of name
  • Accepted
  • Synonym
  • Excluded
  • Incomplete
• Synof: Spnumber of the accepted name for this species synonym
• Synonyms: synonym(s) for this accepted name

13 Taxonomy

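Example: a user asks, “I want a picture of Algus grenus.” The request goes first to the taxonomic database, which looks up the name:

Name           Spnumber   Taxstat   Synof   Synonyms
Algus grenus   5212       SYN       1234    .
Algus verdus   1234       ACC       .       5212

The lookup returns Spnumber = 5212, 1234, and the search is then run on these catalog numbers rather than on the name.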
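The lookup above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `TaxonName`, the functions `spnumbers_for` and `photos_for`, and the photo records are hypothetical names invented here; only the four variables and the two example rows come from the slides.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class TaxonName:
    name: str
    spnumber: int          # species name catalog number
    taxstat: str           # "ACC" = accepted, "SYN" = synonym
    synof: Optional[int]   # Spnumber of the accepted name (synonyms only)


# The two example rows from the taxonomic database.
NAMES = [
    TaxonName("Algus grenus", 5212, "SYN", 1234),
    TaxonName("Algus verdus", 1234, "ACC", None),
]

# Hypothetical photo records, linked by Spnumber rather than by name (Rule #4).
PHOTOS = [
    {"file": "img_001.jpg", "spnumber": 5212},
    {"file": "img_002.jpg", "spnumber": 1234},
    {"file": "img_003.jpg", "spnumber": 9999},
]


def spnumbers_for(name: str, names: List[TaxonName] = NAMES) -> Set[int]:
    """Return every Spnumber that refers to the same species as `name`."""
    wanted = name.strip().lower()
    match = next((n for n in names if n.name.lower() == wanted), None)
    if match is None:
        return set()
    # Resolve a synonym to its accepted name, then collect all linked numbers.
    accepted = match.synof if match.taxstat == "SYN" else match.spnumber
    return {n.spnumber for n in names
            if n.spnumber == accepted or n.synof == accepted}


def photos_for(name: str) -> List[str]:
    """Find photos for a name by searching on catalog numbers."""
    wanted = spnumbers_for(name)
    return [p["file"] for p in PHOTOS if p["spnumber"] in wanted]


print(spnumbers_for("Algus grenus"))
print(photos_for("Algus grenus"))
```

Because the photo records store only the Spnumber, a later taxonomic revision changes rows in the name table and leaves the photo records untouched.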

Photo Database

14 Discovery

Build for Discovery…
Putting something on the web has little value if no one can find it
• Metadata
• Optimizing Site Navigation
• Understand how search engines work
• Understand how your audience thinks

15 Discovery

Metadata
It’s more than just information about photos…
• It’s information about every object that you want people to know about in the future
• It’s the primary method to rigorously document the who, what, when, where and how of an object and to make it machine searchable
• The best metadata takes the available information and atomizes it as much as possible

16 Discovery

This thing, found here, doing this, on this date, by this person & verified by

17 Discovery

Only atomized information can be searched efficiently. Reserve free text information for unsearched titles and comments.

Control the vocabulary of the information used in databases. Any spelling difference, no matter how minor, will be interpreted as a different value.
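A minimal sketch of these two ideas in Python follows. The field names, the `BEHAVIORS` term list, and the helper functions are assumptions made up for illustration; they are not part of any particular database schema.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical controlled vocabulary for the "doing this" field.
BEHAVIORS = {"feeding", "flying", "nesting", "resting"}


@dataclass
class Observation:
    spnumber: int        # what: species catalog number
    locality: str        # where
    behavior: str        # doing what (controlled vocabulary)
    observed_on: date    # when
    observer: str        # by whom
    verified_by: str     # who verified it
    comments: str = ""   # free text, not intended for searching


def normalize_behavior(raw: str) -> str:
    """Accept a term only if it is in the controlled vocabulary."""
    term = raw.strip().lower()
    if term not in BEHAVIORS:
        raise ValueError(f"{raw!r} is not one of {sorted(BEHAVIORS)}")
    return term


def find(records, **criteria):
    """Exact-match search on atomized fields."""
    return [r for r in records
            if all(getattr(r, key) == value for key, value in criteria.items())]


records = [
    Observation(1234, "Plot A", normalize_behavior(" Feeding "),
                date(2009, 5, 1), "Observer 1", "Verifier 1"),
    Observation(1234, "Plot B", normalize_behavior("nesting"),
                date(2009, 5, 2), "Observer 2", "Verifier 1"),
]
print(find(records, behavior="feeding"))
```

Rejecting a misspelled term such as "feedng" at entry time is what keeps the later exact-match search reliable.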

Controlled vocabulary makes it easier for the user to search for and discover information.

18 Discovery

Navigation: Multiple access routes

Traditional Linear Navigation Design

[Figure: Home Page linking to Project 1, Project 2, Project 3 and Project 4]

"He's intelligent, but not experienced. His pattern indicates two dimensional thinking…"

19 Discovery

Navigation: Multiple access routes

Multidimensional Navigation Design

• Use persistent tabs and menus
• Search boxes
• Embedded hyperlinks
• Anticipate user navigation behavior

[Figure label: Homepage]

20 Discovery

Navigation: Minimizing Clicks

Always aim to minimize the average number of clicks that a user should need to go from any page on your web site to any other.

Ideally, a user should not need more than 3‐4 clicks to go from anywhere to anywhere else.
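One way to check a design against this guideline is to treat the site as a graph of pages and links and measure click distances with a breadth‐first search. The sketch below assumes a hypothetical site map given as a dictionary; the page names and the functions `clicks_from` and `click_stats` are illustrative only.

```python
from collections import deque


def clicks_from(start, links):
    """Breadth-first search: fewest clicks from `start` to each reachable page."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in dist:
                dist[target] = dist[page] + 1
                queue.append(target)
    return dist


def click_stats(links):
    """Return (average clicks, maximum clicks) over all reachable page pairs."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    distances = []
    for source in pages:
        reached = clicks_from(source, links)
        distances += [d for page, d in reached.items() if page != source]
    if not distances:
        return 0.0, 0
    return sum(distances) / len(distances), max(distances)


# Linear design: projects are reachable only through the home page.
linear = {
    "home": ["p1", "p2", "p3"],
    "p1": ["home"],
    "p2": ["home"],
    "p3": ["home"],
}

# Persistent menu: every page links directly to every other page.
names = ["home", "p1", "p2", "p3"]
tabbed = {page: [other for other in names if other != page] for page in names}

print(click_stats(linear))   # (1.5, 2)
print(click_stats(tabbed))   # (1.0, 1)
```

Pairs of pages with no path between them are left out of the average, so a site with dead ends should also be checked for unreachable pages.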

21 Web 2.0

What is this Web 2.0 thing?

The answer depends on who you ask

22 Web 2.0

“Web 2.0” refers to the second generation of web development and web design that facilitates information sharing and collaboration on the World Wide Web.

Examples include social‐networking sites, video‐sharing sites, wikis, blogs, mashups and folksonomies. (Wikipedia)

23 Web 2.0

• Web 2.0 websites allow users to do more than just retrieve information.
• Users can own the data on a Web 2.0 site and exercise control over that data.
• These sites may have an "Architecture of participation" that encourages users to add value to the application as they use it. This stands in contrast to traditional websites, which limited visitors to viewing and whose content only the site's owner could modify.
• Web 2.0 sites often feature richer, user‐friendly interfaces.

24 Web 2.0

Popular examples of Web 2.0 websites include:
• Wikipedia
• Flickr
• YouTube
• eBuddy
• Digg
• TravBuddy

25 Web 2.0

• Search. The ease of finding information through keyword search.
• Links. Ad‐hoc guides to other relevant information.
• Authoring. The ability to create constantly updating content on a platform that has shifted from being the creation of a few to being a constantly updated, interlinked work. In wikis, the content is iterative in the sense that users undo and redo each other's work. In blogs, content is cumulative in that the posts and comments of individuals accumulate over time.

26 Web 2.0

• Tags. Categorization of content by creating tags: simple, one‐word user‐determined descriptions to facilitate searching and avoid rigid, pre‐made categories.
• Extensions. Powerful algorithms that leverage the Web as an application platform as well as a document server.
• Signals. The use of RSS* technology to rapidly notify users of content changes.

*(most commonly translated as "Really Simple Syndication," but sometimes "Rich Site Summary")

27 Copyright & Ownership

• Copyright is an important issue
• Copyright law is complex, often vague, and varies considerably between countries
• Ignorance is not an excuse: get informed

28 Copyright & Ownership

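Who owns this file?
• Works created by US Federal Government employees as part of their official duties are in the Public Domain and not subject to US copyright. Work that is only funded by federal money may still be copyrighted; in general, the funding agent's terms determine who holds the copyright.
• Otherwise, copyright is automatic (under US law)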

29 Copyright & Ownership

Objects can be re‐copyrighted by others only if and when ‘significant new’ artistic content has been added

Contrast enhancement, color corrections, sharpening, etc., do NOT constitute new artistic content

30 Copyright & Ownership

Creative Commons Licenses

Creative Commons is a nonprofit corporation dedicated to making it easier for people to share and build upon the work of others, consistent with the rules of copyright.

CC provides free licenses and other legal tools to mark creative work with the freedom the creator wants it to carry, so others can share, remix, use commercially, or any combination thereof.

31 Copyright & Ownership

There are six current license agreements:

1. Attribution
2. Attribution, No derivatives
3. Attribution, Non‐commercial, No derivatives
4. Attribution, Non‐commercial
5. Attribution, Non‐commercial, Share‐alike
6. Attribution, Share‐alike

32 Copyright & Ownership

• Attribution: You let others copy, distribute, display, and perform your copyrighted work ‐ and derivative works based upon it ‐ but only if they give you credit.
• Noncommercial: You let others copy, distribute, display, and perform your work ‐ and derivative works based upon it ‐ but for noncommercial purposes only.
• No Derivative Works: You let others copy, distribute, display, and perform only verbatim copies of your work, not derivative works based upon it.
• Share Alike: You allow others to distribute derivative works only under a license identical to the license that governs your work.

33 Copyright & Ownership

Fair use is a doctrine in United States copyright law that allows limited use of copyrighted material without requiring permission from the rights holders, such as use for scholarship or review. It provides for the legal, non‐licensed citation or incorporation of copyrighted material in another author's work under a four‐factor balancing test. The term "fair use" originated in the United States, but has been added to Israeli law as well; a similar principle, fair dealing, exists in some other common law jurisdictions. Civil law jurisdictions have other limitations and exceptions to copyright. (Wikipedia)

34 Copyright & Ownership

In determining whether the use made of a work in any particular case is a fair use, the factors to be considered include:

1. the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;
2. the nature of the copyrighted work;
3. the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
4. the effect of the use upon the potential market for or value of the copyrighted work.

35 Getting Started

Once you know your site’s mission and have an idea of who your audience is, do the following:
1. Sketch out the major logical blocks of your site
2. Search the web for similar sites and make a list of what works and what doesn’t
3. Imitate and copy the good stuff (people usually like it when you ‘steal’ their design ideas)
4. Avoid their mistakes
5. If you are creating a website for someone else, check frequently with the PI regarding design and programming decisions

36 Getting Started

If you are going to make many websites:
1. Invest time in creating tools that can be shared between sites
2. Invest time in adopting a Content Management System (CMS)
3. Design your websites so that they can share data
4. Select a style and stick with it (CSS)

37 Getting Started

No matter how many websites you have:
1. Document what you are doing
  • Internal programming comments (you can never put too many)
  • External programming documentation listing all major program blocks, procedural calls, parameters passed, etc.
  • Database documentation: variables (types & definitions) and general content
2. Back up often or bad things will happen to you
3. For really big projects, consider implementing roll‐back technology

38 Getting Started

What tools should you use?
The most common suite of tools for low‐budget, non‐commercial operations includes:
• MySQL databases
• PHP programming language (and/or Perl)
• Flash and/or JavaScript
• Linux operating system
• Apache server

39 Getting Started

What other tools might you use?
There are many good tools available, both commercial and non‐commercial (open source):
• ArcGIS by ESRI, GRASS, Minnesota MapServer
• Drupal CMS
• Wiki software
• Blogging software
• Graphing applications
There are lots of arguments for and against commercial and open source software. There is also the possibility of creating your own software tools. Mixed models often work well.

40 Questions and Comments
