Links and References: • Apache Tika • Apache OpenNLP • Apache Lucene

Memex

Memex seeks to develop software that advances online search capabilities far beyond the current state of the art. A new domain-specific indexing and search paradigm will provide mechanisms for improved content discovery, information extraction, information retrieval, and user collaboration, and will extend current search capabilities to the deep web, the dark web, and nontraditional (e.g. multimedia) content.

GeoParser, one of Memex's subprojects, is open-source software that can:
• Process information from any type of file (Apache Tika)
• Identify location entities in text (named entity recognition)
• Convert location names to geographic coordinates (Geo Gazetteer)
• Visualize locations on a map (OpenLayers)

After the information is parsed and the points are plotted on the map, users can filter their results by domain, by keyword search, or by geographic boundaries.

[Figure: GeoParser workflow: crawled websites, documents, or data already indexed (any file format) → extract text content → identify locations → convert location to latitude and longitude → visualize results on map]

Technologies

• Apache Tika: The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF). All of these file types can be parsed through a single interface, making Tika useful for search engine indexing, content analysis, translation, and much more.
• Apache OpenNLP: The Apache OpenNLP library is a machine-learning-based toolkit for processing natural language text. It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution.
• Apache Lucene (Geo Gazetteer): A command-line gazetteer built around the GeoNames.org dataset that uses the Apache Lucene library to create a searchable gazetteer.
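The "single interface" property described for Apache Tika can be illustrated with a toy dispatcher: detect the format first, then route the bytes to a format-specific extractor. This is a minimal sketch, not Tika's API. Tika's own detection combines several hints, including magic-byte signatures and file-name patterns; the sketch checks only three well-known magic-byte signatures, and the format labels and extractor functions are hypothetical placeholders.

```python
from typing import Callable, Dict, List, Tuple

# Leading-byte signatures ("magic numbers") for a few well-known containers.
# ZIP also wraps OOXML files (.pptx, .xlsx); OLE2 wraps legacy .ppt/.xls files.
MAGIC_SIGNATURES: List[Tuple[bytes, str]] = [
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "zip"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2"),
]


def detect_type(data: bytes) -> str:
    """Return a coarse format label from the first bytes of the input."""
    for signature, label in MAGIC_SIGNATURES:
        if data.startswith(signature):
            return label
    return "text"  # fallback: treat anything unrecognized as plain text


def extract_text_plain(data: bytes) -> str:
    # Decode as UTF-8, replacing undecodable bytes instead of failing.
    return data.decode("utf-8", errors="replace")


def extract_unsupported(data: bytes) -> str:
    # Hypothetical placeholder: a real system would call a PDF/Office parser here.
    raise NotImplementedError("binary formats are not parsed in this sketch")


# Registry mapping detected labels to extractor functions.
EXTRACTORS: Dict[str, Callable[[bytes], str]] = {
    "text": extract_text_plain,
}


def parse(data: bytes) -> Tuple[str, str]:
    """Single entry point: detect the format, then dispatch to its extractor."""
    label = detect_type(data)
    extractor = EXTRACTORS.get(label, extract_unsupported)
    return label, extractor(data)
```

Supporting another format means registering one more entry in `EXTRACTORS`; callers keep calling `parse()` unchanged, which is the design property the Tika description emphasizes.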
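The GeoParser capabilities listed earlier (identify location entities, convert them to coordinates, then filter by domain, keyword, or geographic boundaries) can be sketched end to end. This is a minimal illustration under loud assumptions: a small in-memory dictionary stands in for the Lucene-backed GeoNames gazetteer, a greedy longest match over capitalized tokens stands in for OpenNLP's statistical named entity recognizer, the coordinates are approximate values rather than GeoNames records, and the documents and domains are made up.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Stand-in gazetteer: lowercase place name -> (latitude, longitude).
# Approximate coordinates for illustration; NOT read from GeoNames.
GAZETTEER: Dict[str, Tuple[float, float]] = {
    "los angeles": (34.05, -118.24),
    "pasadena": (34.15, -118.14),
    "new york": (40.71, -74.01),
}
MAX_NAME_TOKENS = max(len(name.split()) for name in GAZETTEER)


@dataclass
class Point:
    name: str
    lat: float
    lon: float
    domain: str
    text: str


def find_locations(text: str) -> List[str]:
    """Greedy longest match of capitalized token runs against the gazetteer
    (a toy substitute for statistical named entity recognition)."""
    tokens = [t.strip(".,;:!?\"'()") for t in text.split()]
    found, i = [], 0
    while i < len(tokens):
        match = None
        for n in range(MAX_NAME_TOKENS, 0, -1):
            window = tokens[i:i + n]
            if len(window) == n and all(w[:1].isupper() for w in window):
                candidate = " ".join(window).lower()
                if candidate in GAZETTEER:
                    match = (candidate, n)
                    break
        if match:
            found.append(match[0])
            i += match[1]  # skip past the matched place name
        else:
            i += 1
    return found


def geoparse(docs: List[Tuple[str, str]]) -> List[Point]:
    """Turn (domain, text) pairs into map points, one per recognized location."""
    points = []
    for domain, text in docs:
        for name in find_locations(text):
            lat, lon = GAZETTEER[name]
            points.append(Point(name, lat, lon, domain, text))
    return points


def filter_points(points: List[Point],
                  domain: Optional[str] = None,
                  keyword: Optional[str] = None,
                  bbox: Optional[Tuple[float, float, float, float]] = None) -> List[Point]:
    """Keep points matching every given filter.
    bbox = (min_lat, min_lon, max_lat, max_lon); boxes crossing the
    antimeridian (180 degrees longitude) are not handled."""
    result = []
    for p in points:
        if domain is not None and p.domain != domain:
            continue
        if keyword is not None and keyword.lower() not in p.text.lower():
            continue
        if bbox is not None:
            min_lat, min_lon, max_lat, max_lon = bbox
            if not (min_lat <= p.lat <= max_lat and min_lon <= p.lon <= max_lon):
                continue
        result.append(p)
    return result


if __name__ == "__main__":
    docs = [
        ("news.example", "Flooding was reported in Los Angeles and Pasadena."),
        ("blog.example", "A new office opened in New York."),
    ]
    points = geoparse(docs)
    socal = filter_points(points, bbox=(33.5, -119.0, 34.5, -117.5))
    print([p.name for p in socal])  # ['los angeles', 'pasadena']
```

Requiring capitalized tokens keeps lowercase phrases such as "the new york times" from matching; a trained recognizer like OpenNLP's would instead use context to decide what counts as a location.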
• Apache Solr: Solr is highly reliable, scalable, and fault tolerant, providing distributed indexing, replication and load-balanced querying, automated failover and recovery, centralized configuration, and more.

Links and references:
• GeoParser: https://github.com/MBoustani/GeoParser
• Memex: http://memex.jpl.nasa.gov/
• Apache Tika: https://tika.apache.org/
• Lucene Geo Gazetteer: https://github.com/chrismattmann/lucene-geo-gazetteer
• GeoTopicParser: https://wiki.apache.org/tika/GeoTopicParser
• Apache OpenNLP: https://opennlp.apache.org/
• Apache Solr: http://lucene.apache.org/solr/
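If the parsed location points are stored in Apache Solr, the domain and geo-boundary filters described for GeoParser could be pushed into the query itself as filter queries (`fq`). The sketch below only builds a request URL. The host, the core name `geoparser`, and the field names `domain` and `location` are hypothetical, and it assumes `location` is indexed as a Solr spatial field (for example `LatLonPointSpatialField`), which accepts rectangle range queries of the form `field:[minLat,minLon TO maxLat,maxLon]`.

```python
from typing import Optional, Tuple
from urllib.parse import urlencode

# Hypothetical endpoint and core name; adjust to the real deployment.
SOLR_SELECT = "http://localhost:8983/solr/geoparser/select"


def build_query(keyword: str = "*:*",
                domain: Optional[str] = None,
                bbox: Optional[Tuple[float, float, float, float]] = None,
                rows: int = 100) -> str:
    """Build a Solr /select URL; bbox = (min_lat, min_lon, max_lat, max_lon)."""
    params = [("q", keyword), ("rows", str(rows)), ("wt", "json")]
    if domain is not None:
        # Phrase query on a hypothetical 'domain' field; embedded quotes escaped.
        escaped = domain.replace('"', '\\"')
        params.append(("fq", f'domain:"{escaped}"'))
    if bbox is not None:
        min_lat, min_lon, max_lat, max_lon = bbox
        # Rectangle range query on a spatial field, corners given as "lat,lon".
        params.append(("fq", f"location:[{min_lat},{min_lon} TO {max_lat},{max_lon}]"))
    # urlencode percent-encodes each value and repeats the 'fq' key as needed.
    return SOLR_SELECT + "?" + urlencode(params)
```

Keeping each restriction in its own `fq` parameter, rather than folding everything into `q`, lets Solr cache those filters independently of the keyword query.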