Search Tools Taporware Find Collocates (Plain) Plain-Text

Browse or search for tools that you can use.

Search Tools

TAPoRware Find Collocates (Plain)

PLAIN-TEXT

The Collocation tool takes a word from the user and returns all of the words directly before and directly after it, based on the given context. Results are listed alphabetically, by frequency, or by Z-score (an indication of how far and in what direction that item deviates from its distribution's mean, expressed in units of its distribution's standard deviation). NOTE: If you select a context of "Words" with a long context length, or you select another context (Lines, Sentences, or Paragraph), it is very likely that the specified pattern appears more than once in each text of the concordance. If this occurs, the collocates will be counted more than once as well, so the collocate counts are not accurate. For the same reason, the Z-score values are not accurate. We will find a way to fix this later.

Detailed Info - TryIt - Website
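As a rough illustration of the collocation idea described above, the following Python sketch counts the words that fall within a fixed window around a keyword and standardizes the counts. It is not TAPoRware's implementation: the function names, the `span` parameter, the simple tokenizer, and the choice to compute the Z-score against the mean and standard deviation of all collocate counts are assumptions made for this example.

```python
import re
from collections import Counter
from statistics import mean, pstdev

def find_collocates(text, keyword, span=1):
    """Count words within `span` positions before or after each hit of `keyword`."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, word in enumerate(words):
        if word != keyword:
            continue
        lo = max(0, i - span)
        hi = min(len(words), i + span + 1)
        for j in range(lo, hi):
            if j != i:
                counts[words[j]] += 1
    return counts

def z_scores(counts):
    """Standardize each count against the mean and SD of all collocate counts (assumed definition)."""
    values = list(counts.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return {w: 0.0 for w in counts}
    return {w: (c - mu) / sigma for w, c in counts.items()}

sample = "the cat sat on the mat and the cat ran to the cat"
collocates = find_collocates(sample, "cat", span=1)
for word, z in sorted(z_scores(collocates).items(), key=lambda kv: -kv[1]):
    print(f"{word:5} count={collocates[word]} z={z:+.2f}")
```

With a larger `span`, the windows around nearby hits overlap and the same word position is counted once per window, which appears to be the kind of over-counting the note warns about.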

TAPoRware Date Finder (Plain)

PLAIN-TEXT

This tool extracts dates from a plain-text document. Dates can be limited to all dates, years, months, weeks, seasons, North American holidays or user defined dates (e.g. specific month(s), week(s), season(s), holiday(s)).

Detailed Info - TryIt - Website
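A minimal Python sketch of date extraction from plain text, limited to two of the categories listed above (years and month names). The regular expressions, the `find_dates` name, and the `kinds` parameter are assumptions for illustration; the tool's actual patterns are not documented here.

```python
import re

MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")
# Assumed patterns: four-digit years from 1000 to 2099, and English month names.
YEAR_RE = re.compile(r"\b(?:1[0-9]{3}|20[0-9]{2})\b")
MONTH_RE = re.compile(rf"\b(?:{MONTHS})\b")

def find_dates(text, kinds=("years", "months")):
    """Return the matches for each requested kind of date expression."""
    found = {}
    if "years" in kinds:
        found["years"] = YEAR_RE.findall(text)
    if "months" in kinds:
        found["months"] = MONTH_RE.findall(text)
    return found

print(find_dates("The archive opened in March 1987 and was revised in 2004."))
```

A pattern this simple will also match a sentence-initial "May" used as a verb, so a real date finder needs more context than this sketch uses.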

TAPoRware Find Co-occurrence (Plain)

PLAIN-TEXT

Co-occurrence tool looks for two words a certain distance apart from one another. By entering a primary and secondary pattern, TAPoR will search the document for anywhere that the two patterns are within the user-specified limits of words, sentences, or lines.
Note: If you select a context of "Words" with a long context length, or you select another context (Lines, Sentences, or Paragraph), it is very likely that the specified pattern appears more than once in each text of the concordance. If this occurs, the collocates will be counted more than once as well, so the collocate counts are not accurate. For the same reason, the Z-score values are not accurate. We will find a way to fix this later.

Detailed Info - TryIt - Website
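The co-occurrence search can be sketched in Python as follows, using a limit measured in words only. The function name, the `max_distance` parameter, and the treatment of both patterns as regular expressions matched against whole words are assumptions, not the tool's documented behaviour.

```python
import re

def find_cooccurrences(text, primary, secondary, max_distance=5):
    """Return (primary_index, secondary_index) word positions at most `max_distance` apart."""
    words = re.findall(r"\w+", text.lower())
    primary_hits = [i for i, w in enumerate(words) if re.fullmatch(primary, w)]
    secondary_hits = [i for i, w in enumerate(words) if re.fullmatch(secondary, w)]
    return [(p, s) for p in primary_hits for s in secondary_hits
            if p != s and abs(p - s) <= max_distance]

text = "The king spoke. The queen listened while the king left the hall."
print(find_cooccurrences(text, "king", "queen", max_distance=4))
```

Limits in sentences or lines would follow the same shape, with the text first split into those units instead of words.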

TAPoRware Synonym Finder

HTML, PLAIN-TEXT, XML

This tool uses the Roget's Interactive Thesaurus service to get the synonyms and antonyms of a given word.

Detailed Info - TryIt - Website

TAPoRware Date Finder (XML)

TEI, XML

This tool extracts dates from an XML document. Dates can be limited to all dates, years, months, weeks, seasons, North American holidays or user defined dates (e.g. specific month(s), week(s), season(s), holiday(s)).

Detailed Info - TryIt - Website

TAPoRware Date Finder (HTML)

HTML

This tool extracts dates from an HTML document. Dates can be limited to all dates, years, months, weeks, seasons, North American holidays or user defined dates (e.g. specific month(s), week(s), season(s), holiday(s)).

Detailed Info - TryIt - Website

TAPoRware Find Collocates (HTML)

HTML

The Collocation tool takes a word from the user and returns all of the words directly before and directly after it, based on the given context. Results are listed alphabetically, by frequency, or by Z-score (an indication of how far and in what direction that item deviates from its distribution's mean, expressed in units of its distribution's standard deviation).

Detailed Info - TryIt - Website

TAPoRware Find Collocates (XML)

TEI, XML

Collocation tool takes a word from the user and returns all of the words directly before and directly after it based on the given context. The results are listed alphabetically, by frequency, or by Z-score (an indication of how far and in what direction that item deviates from its distribution's mean, expressed in units of its distribution's standard deviation).

Detailed Info - TryIt - Website

TAPoRware Find Co-occurrence (HTML)

HTML

Co-occurrence tool looks for two words a certain distance apart from one another. By entering a primary and secondary pattern, TAPoR will search the document for anywhere the two patterns are within the user-specified limits of words, sentences, or lines. If desired, the results can be narrowed to include words only found within certain tags.

Detailed Info - TryIt - Website

TAPoRware Find Co-occurrence (XML)

TEI, XML

Co-occurrence tool looks for two words a certain distance apart from one another. By entering a primary and secondary pattern, TAPoR will search the document for anywhere the two patterns are within the user-specified limits of words/sentences/lines or surrounding elements.

Detailed Info - TryIt - Website

TAPoRware Find Words - Concordance (HTML)

HTML

Find Concordance (HTML) tool can find text anywhere in an HTML document. The search can be narrowed to specified tags. All results are returned with a concordance of either words, sentences, or lines.

Detailed Info - TryIt - Website

TAPoRware Find Words - Concordance (Plain)

PLAIN-TEXT

The Concordance (Text) tool can find text anywhere in a text document. The search can also be used to view a concordance of either words, sentences, or lines surrounding the result.

Detailed Info - TryIt - Website
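A keyword-in-context view with a word-based context can be sketched in Python as below. The `concordance` function and its `context` parameter are hypothetical names for this example, and whitespace splitting stands in for whatever tokenization the tool applies.

```python
import re

def concordance(text, keyword, context=3):
    """Return (left, hit, right) tuples with `context` words on each side of every hit."""
    words = text.split()
    lines = []
    for i, word in enumerate(words):
        # Compare without surrounding punctuation and without regard to case.
        if re.sub(r"\W", "", word).lower() == keyword.lower():
            left = " ".join(words[max(0, i - context):i])
            right = " ".join(words[i + 1:i + 1 + context])
            lines.append((left, word, right))
    return lines

sample = "It was the best of times, it was the worst of times."
for left, hit, right in concordance(sample, "times", context=2):
    print(f"{left:>10} [{hit}] {right}")
```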

TAPoRware Find Words - Concordance (XML)

TEI, XML

Find Concordance (XML) tool can find text anywhere in an XML document using the Find Text tool. The search can be narrowed to specified elements or attributes, and all results are returned with a concordance of either words/sentences/lines or surrounding elements.

Detailed Info - TryIt - Website

TAPoRware Acronym Finder

HTML, PLAIN-TEXT, XML

This tool tries its best to find all the possible acronyms and their original names in a submitted text. However, for some acronyms, such as IGO with the original name of Intergovernmental Organization, the original name cannot be identified.

Detailed Info - TryIt - Website

TAPoRware CAPs Finder

HTML, PLAIN-TEXT, XML

This tool tries to find all the CAPs in the submitted text. It lists phrases of more than one CAP first, followed by all single CAPs except the first word of each sentence.

Detailed Info - TryIt - Website

TAPoRware Words Distribution -- Weighted Centroid

HTML, PLAIN-TEXT, XML

This tool displays a circular graph based on word distribution data or the tablet word distribution data. The text is divided up into an arbitrary number of units, which are positioned around the circumference of the circle in a clockwise sequence. The more times a word appears in a particular text unit, the closer the word will be to that unit in the circle. If a word appears an equal number of times in all units, it will be located in the centre of the circle. Words are colour coded based on the number of times they appear in the text as a whole. Blue words have the highest word count. Rolling over a word will display lines representing its connections to the units. Clicking a word will keep its lines visible after you move the mouse off of it. Click the word again to remove the lines. The darker the line, the more times the word was found in that unit. Additionally, all the words found in the graph are listed on the left side of the applet. There is a scroll bar for viewing the words, should they extend past the bottom of the applet. This list of words features the same rollover and clicking functionality as those found in the graph itself. This tool uses the Processing library. * This tool requires the JRE (v1.4.2 and up) in order to work properly.

Detailed Info - TryIt - Website
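The placement rule described for the Weighted Centroid graph can be sketched in Python as a count-weighted average of the unit positions. This is an assumption about the formula, chosen because it reproduces the two stated properties (a word is pulled toward units where it is frequent, and sits at the centre when its counts are equal); the function names and the choice to start the units at the top of the circle are hypothetical.

```python
import math

def unit_positions(n_units, radius=1.0):
    """Place the text units clockwise around a circle, starting at the top."""
    positions = []
    for k in range(n_units):
        angle = math.pi / 2 - 2 * math.pi * k / n_units  # clockwise from 12 o'clock
        positions.append((radius * math.cos(angle), radius * math.sin(angle)))
    return positions

def word_position(counts_per_unit, positions):
    """Weighted centroid: average the unit positions, weighted by the word's count in each unit."""
    total = sum(counts_per_unit)
    if total == 0:
        return (0.0, 0.0)
    x = sum(c * px for c, (px, _) in zip(counts_per_unit, positions)) / total
    y = sum(c * py for c, (_, py) in zip(counts_per_unit, positions)) / total
    return (x, y)

units = unit_positions(4)
print(word_position([3, 3, 3, 3], units))  # equal counts: at (or numerically very near) the centre
print(word_position([5, 0, 0, 0], units))  # all occurrences in unit 0: on that unit
print(word_position([1, 1, 0, 0], units))  # split between two units: halfway between them
```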

TAPoRware Principal Component Analysis Tool (Plain)

PLAIN-TEXT

This tool uses the principal components analysis method to analyze the relations among words across the user-specified text units.

Detailed Info - TryIt - Website

Text Gathering Tools

TAPoRware Extract Text (XML)

TEI, XML

Extract Text from XML Documents tool can extract the full body of text from an XML Document. This tool can also pull the text from user-specified elements or attributes.

Detailed Info - TryIt - Website
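Extracting either the full text or the text of a chosen element can be sketched with Python's standard `xml.etree.ElementTree` module. The `extract_text` helper and the sample document are made up for this example and say nothing about how the TAPoRware tool is built.

```python
import xml.etree.ElementTree as ET

def extract_text(xml_string, element=None):
    """Return the text of the whole document, or of every `element` if one is named."""
    root = ET.fromstring(xml_string)
    nodes = root.iter(element) if element else [root]
    # Join the text fragments with spaces, then collapse runs of whitespace.
    return [" ".join(" ".join(node.itertext()).split()) for node in nodes]

doc = ("<TEI><text><body><head>Title</head>"
       "<p>First <hi>bold</hi> line.</p><p>Second.</p></body></text></TEI>")
print(extract_text(doc))
print(extract_text(doc, "p"))
```

Joining fragments with a space keeps adjacent elements from running together, at the cost of splitting a word that has markup inside it.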

TAPoRware Googlizer

HTML, PLAIN-TEXT, XML

This tool calls the Google search engine directly and gives different results. Google's rules for search terms can be applied here directly.

Detailed Info - TryIt - Website

TAPoRware Extract Text (HTML)

HTML

Retrieves HTML text based on user-given HTML tags.

Detailed Info - TryIt - Website

TAPoRware Text Aggregator

HTML, PLAIN-TEXT, XML

This tool aggregates multiple texts from different locations and in different formats into a single text. The text sources can be specified by valid URLs, local files, or text you type in.

Detailed Info - TryIt - Website

List and Statistical Tools

TAPoRware Summarizer (Plain)

PLAIN-TEXT

Extracts statistical information, a high-frequency word list, a concordance, etc.

Detailed Info - TryIt - Website

TAPoRware Tokenizer (XML)

TEI, XML

This tool splits an XML document at specified points, or tokens. These tokens can be words, lines, sentences, paragraphs, characters, patterns, or tags. The results can be listed with the token removed or preserved before or after the split.

Detailed Info - TryIt - Website

QMatrix

PLAIN-TEXT

This tool implements Raymond Queneau's matrix analysis of language with a given text. The results of the analysis include a breakdown of the text into formatives, signifiers, and bi-words. N.B. In order to work with texts with accents, load your document to myTexts first, then select the document when you use QMatrix. Accented characters will not display properly but the tool will work otherwise.

Detailed Info - TryIt - Website

TAPoRware List Tags (HTML)

HTML

This service lists all the HTML tags of an HTML document.

Detailed Info - TryIt - Website

TAPoRware Tokenizer (HTML)

HTML

This tool splits an HTML document at specified points, or tokens. These tokens can be words, lines, sentences, and paragraphs, as well as certain characters, patterns, or tags. The results can be listed with the token removed, before the split, or after the split.

Detailed Info - TryIt - Website

TAPoRware Tokenizer (Plain)

PLAIN-TEXT

The Tokenize tool splits a text document at specified points, or tokens. These tokens can be words, lines, sentences, and paragraphs, as well as certain characters or patterns. The results can be listed with the token removed, before the split, or after the split.

Detailed Info - TryIt - Website
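The three ways of listing results that the tokenizer entries mention can be sketched in Python as follows. The reading of "before" (the token stays at the end of the preceding piece) and "after" (the token starts the following piece) is an interpretation, and the `tokenize` function and its `keep` parameter are hypothetical names.

```python
import re

def tokenize(text, pattern, keep="remove"):
    """Split `text` at each match of `pattern`, dropping or keeping the matched token."""
    if keep not in ("remove", "before", "after"):
        raise ValueError(f"unknown keep mode: {keep}")
    pieces, start = [], 0
    for match in re.finditer(pattern, text):
        if keep == "remove":      # token is discarded
            pieces.append(text[start:match.start()])
            start = match.end()
        elif keep == "before":    # token stays at the end of the preceding piece
            pieces.append(text[start:match.end()])
            start = match.end()
        else:                     # "after": token starts the following piece
            pieces.append(text[start:match.start()])
            start = match.start()
    pieces.append(text[start:])
    return [p for p in pieces if p]

for mode in ("remove", "before", "after"):
    print(mode, tokenize("one;two;three", ";", keep=mode))
```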

Hyperpoet Frequencies

PLAIN-TEXT, TEI

Lists the frequencies for the text at a user-specified URL.

Detailed Info - TryIt - Website

Test List Words Tool

PLAIN-TEXT

description

Detailed Info - TryIt - Website

TAPoRware List Word Pairs

HTML, PLAIN-TEXT, XML

This tool will list word pairs in a corpus based on different criteria.

Detailed Info - TryIt - Website

TAPoRware List Elements (XML)

TEI, XML

List XML Elements tool is used to display all of the elements contained in an XML document. This tool also allows the user to count all instances of an element and to view the structure/hierarchy of the document. It also provides a variety of tools for listing attributes and attribute values.

Detailed Info - TryIt - Website

TAPoRware List Words (HTML)

HTML

List Words (HTML) tool can be used to list all or user specified words found within a specified tag. The query results can be displayed alphabetically, by frequency, by order of appearance, or in reversed alphabetical order. If no tag is specified, the 'body' tag is used.

Detailed Info - TryIt - Website

TAPoRware List Words (Plain)

PLAIN-TEXT

List Words (Plain) tool can be used to list all of the words found within a given text document. The query results can be displayed alphabetically, by frequency, by order of appearance, or in reversed alphabetical order.

Detailed Info - TryIt - Website
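The four display orders named in the List Words entries can be sketched in Python with a single counter. The `list_words` function, its `order` values, and the lowercase tokenizer are assumptions made for this example.

```python
import re
from collections import Counter

def list_words(text, order="alphabetical"):
    """Return (word, count) pairs in the requested order."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)  # keys keep the order in which words first appear
    if order == "alphabetical":
        keys = sorted(counts)
    elif order == "reverse":
        keys = sorted(counts, reverse=True)
    elif order == "frequency":
        keys = sorted(counts, key=lambda w: (-counts[w], w))  # ties broken alphabetically
    elif order == "appearance":
        keys = list(counts)
    else:
        raise ValueError(f"unknown order: {order}")
    return [(w, counts[w]) for w in keys]

for order in ("alphabetical", "frequency", "appearance", "reverse"):
    print(order, list_words("the cat and the hat", order))
```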

TAPoRware List Words (XML)

TEI, XML

List Words (XML) tool can be used to list all or user specified words found within a specified element. The query results can be displayed alphabetically, by frequency, by order of appearance, or in reversed alphabetical order. If no element is specified, all words in the XML document will be returned.

Detailed Info - TryIt - Website

TAPoRware Texts Comparator

HTML, PLAIN-TEXT, XML

The tool compares two user submitted texts. It performs basic statistics on each text and lists the words and counts side by side.

Detailed Info - TryIt - Website

Visualization Tools

TAPoRware Pattern Distribution (XML)

TEI, XML

This tool displays the distribution of a user-specified pattern over different text units in different formats.

Detailed Info - TryIt - Website

TAPoRware Pattern Distribution (HTML)

HTML

This tool visually displays pattern distribution in a user-selected text unit.

Detailed Info - TryIt - Website

TAPoRware Pattern Distribution (Plain)

PLAIN-TEXT

This tool displays pattern distribution over a plain-text string in different formats.

Detailed Info - TryIt - Website

TAPoRware Visual Collocator

HTML, PLAIN-TEXT, XML

The Visual Collocator displays collocates of words using a graph layout. Words which share similar collocates will be drawn together in the graph, producing new insight into the text. Any word can be double-clicked to fetch its collocates. Any word can be removed from the graph, and new words can be added using the text field. Additionally, words can be made "sticky", then dragged around to new positions, creating a user defined layout. This tool uses the prefuse library. * This tool requires the JRE (v1.4.2 and up) in order to work properly.

Detailed Info - TryIt - Website

TAPoRware Word Cloud

HTML, PLAIN-TEXT, XML

This tool counts the words in the text and displays them in a font and color based on their count. The order and the number of words are specified by the user.

Detailed Info - TryIt - Website
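One simple way to derive a font size from a word count is linear scaling between a minimum and a maximum size, sketched below in Python. The `font_sizes` function, the point-size range, and the linear mapping are all assumptions; the Word Cloud tool may scale sizes differently.

```python
def font_sizes(counts, min_pt=10, max_pt=40):
    """Map each word's count linearly onto a font size between `min_pt` and `max_pt`."""
    lo, hi = min(counts.values()), max(counts.values())
    if hi == lo:
        return {word: min_pt for word in counts}
    scale = (max_pt - min_pt) / (hi - lo)
    return {word: round(min_pt + (count - lo) * scale) for word, count in counts.items()}

print(font_sizes({"text": 9, "tool": 5, "word": 1}))
```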

TAPoRware Word Brush

HTML, PLAIN-TEXT, XML

'Word Brush' allows the user to paint with words extracted from an online document. This tool uses Java to display results. The current applet was compiled using Java 1.4.2_08 so you will need at least this version of the Java plug-in for your browser if you wish to use it.

Detailed Info - TryIt - Website

TAPoRware Raining Words

HTML, PLAIN-TEXT, XML

'Raining Words' is designed to display high frequency words such that high frequency words are rendered larger and move more slowly than words with lower frequencies. The source can be plain text, XML or HTML. If the source is XML or HTML, the processed text will be limited to the context of user given element. It is then filtered using a stop-word list. The resulting text is then scanned for the top 20 high frequency words. This tool uses Java to display results. The current applet was compiled using Java 1.4.2_08 so you will need at least this version of the Java plug-in for your browser if you wish to use it.

Detailed Info - TryIt - Website
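The text-processing steps described for Raining Words (filter with a stop-word list, then keep the top 20 words by frequency) can be sketched in Python as below. The stop-word list shown is a tiny made-up sample, and `top_words` is a hypothetical name; the list the tool actually uses is not given here.

```python
import re
from collections import Counter

# Hypothetical stop-word list, far shorter than a real one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "on"}

def top_words(text, limit=20, stop_words=STOP_WORDS):
    """Drop stop words, then return the `limit` most frequent remaining words."""
    words = re.findall(r"[a-z']+", text.lower())
    kept = [w for w in words if w not in stop_words]
    return Counter(kept).most_common(limit)

print(top_words("the rain in the plain and the rain on the hill"))
```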

TAPoRware Hypergraph (XML)

XML

Uses the hypergraph package to draw the XML document's structure.

Detailed Info - TryIt - Website

Humanist Trends Viewer (Voyeur)

PLAIN-TEXT

This tool is intended to help view word trends in the Humanist Discussion Group archive. The archives have been scraped from the web, segmented into yearly volumes, and stripped down to only the plain text from the bodies of the email messages (see the TADA Archives of the Humanist Discussion Group for more information on acquiring and processing the archives).

Detailed Info - TryIt - Website

Editing Tools

XSL Transformer

DOCBOOK, MEP, TARL, TEI, XML

This is a transformation tool that applies the specified XSL stylesheet to the selected XML file.

Detailed Info - TryIt

TEI Transformer

TEI

This is a transformation tool that uses a set of TEI stylesheets to convert a TEI document into HTML. For more information see http://www.tei-c.org/Stylesheets/teic/.

Detailed Info - TryIt

Neko Transformer

HTML

This is an HTML "tidying" tool which is based on the CyberNeko HTML Parser. Use Neko Transformer to balance tags and "fix up many common mistakes that human (and computer) authors make in writing HTML documents. NekoHTML adds missing parent elements; automatically closes elements with optional end tags; and can handle mismatched inline element tags." More information about Neko can be found at http://people.apache.org/~andyc/neko/

Detailed Info - TryIt

HTML Entity Transformer

HTML

This is a transformation tool that reads the specified HTML document and converts all HTML entities into their Unicode counterparts. The tool produces an HTML page once it completes.

Detailed Info - TryIt
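The conversion of HTML entities into Unicode characters can be illustrated with Python's standard `html.unescape` function. This shows the idea only; it is not the transformer's own code, and the sample string is made up.

```python
import html

source = "&copy; 2008 Universit&eacute; &amp; Co. &#233;"
decoded = html.unescape(source)  # resolves named and numeric character references
print(decoded)
```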

Raw Entity Transformer

XML

This is a transformation tool that reads the specified XML document and converts all entities in the document into their Unicode counterparts. The tool outputs the resulting XML in its raw form. As a result, most browsers will not be able to display the result properly. Please save the tool invocation results locally to see the XML output.

Detailed Info - TryIt

Pretty Entity Transformer

XML

This is a transformation tool that reads the specified XML document and converts all entities in the document into their Unicode counterparts. The tool output is then converted to a pretty-printed HTML with an XSL stylesheet. This tool is not suitable for processing large XML files due to the limitations imposed by the XSL transformation. Please use Raw Entity Transformer if it is necessary to transform large XML files.

Detailed Info - TryIt

MS Word Transformer

MSWORD

This is a converter tool that extracts plain text from Microsoft Word documents using the Jakarta POI library.

Detailed Info - TryIt

Highlighter

HTML, MSWORD, PDF

The Highlighter tool uses the Apache Lucene library to provide a KWIC search. It highlights all occurrences of the specified query within a document.

Detailed Info - TryIt

Diff Transformer

DOCBOOK, HTML, MEP, PLAIN-TEXT, TARL, TEI, XML

This tool compares two files and highlights the differences in each of them.

Detailed Info - TryIt
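Comparing two files line by line can be sketched with Python's standard `difflib` module, which marks removed lines with `-` and added lines with `+`. The sample lines and the labels "old" and "new" are invented for this example, and the Diff Transformer's own output format may differ.

```python
import difflib

old = ["alpha", "beta", "gamma"]
new = ["alpha", "beta2", "gamma", "delta"]

# lineterm="" keeps the header lines free of trailing newlines.
diff = list(difflib.unified_diff(old, new, fromfile="old", tofile="new", lineterm=""))
print("\n".join(diff))
```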

PDF Transformer

PDF

This is a converter tool that extracts plain text from a PDF document. The tool uses PDFBox, an open source Java PDF library for working with PDF. More information about the library can be found at http://www.pdfbox.org.

Detailed Info - TryIt

Miscellaneous Tools

LiteMorph

PLAIN-TEXT

LiteMorph

Detailed Info - TryIt - Website

TAPoRware XML Transformer

XML

Provide your XML and XSL files, and this tool will transform the XML into the format specified in your XSL file. However, because the output is pre-configured as HTML, if your XSL's target format is XML, you need to view the source of the output to see the XML.

Detailed Info - TryIt - Website

TokenX

XML

A text visualization, analysis, and play tool

Detailed Info - TryIt - Website
