Sphider-Plus Manual
Total Page:16
File Type:pdf, Size:1020Kb
Sphider-plus manual Content 1. Introduction......................................................................................................................6 2. Version and legal info......................................................................................................6 3. Installation of version 3.0 – 3.2020d..............................................................................7 3.1 Preconditions...................................................................................................................................... 7 3.2 New installation................................................................................................................................... 7 3.3 Updating from version 1 and 2..........................................................................................................10 3.4 Updating from 3.x to 3.y....................................................................................................................10 4. Settings and customizing.............................................................................................12 5. Indexing..........................................................................................................................14 5.1 Various options................................................................................................................................. 14 5.2 Allow other hosts in same domain....................................................................................................15 5.3 Word stemming................................................................................................................................. 15 5.4 Periodical Re-indexing......................................................................................................................16 5.5 Preferred indexing............................................................................................................................ 16 5.6 Multi-threaded indexing.....................................................................................................................16 5.7 Create thumbnails as a web shot during index procedure................................................................17 5.8 Follow sitemap file............................................................................................................................17 5.9 Use private sitemap instead of global sitemap..................................................................................18 5.10 Create sitemap file.......................................................................................................................... 18 6. Using the indexer from command line........................................................................19 6.1 Overview and options.......................................................................................................................19 6.2 Multi-threaded indexing.....................................................................................................................19 6.3 Index only the new............................................................................................................................20 6.4 Reindex all........................................................................................................................................ 20 6.5 Index erased sites............................................................................................................................. 20 7. Keeping pages, words and files from being indexed.................................................21 7.1 robots.txt........................................................................................................................................... 21 7.2 URL must include / must not include string list..................................................................................21 7.3 Ignoring links..................................................................................................................................... 22 7.4 Canonical <link> tag......................................................................................................................... 22 7.5 Ignoring parts of a page by <!--sphider_noindex--> . <!--/sphider_noindex-->............................22 7.6 Ignoring parts of a page defined by <div id=’abc’> or <div class=’abc’>...........................................23 7.7 Indexing only parts of a page defined by <div id=’abc’> or <div class=’abc’>...................................23 7.8 Ignoring HTML elements defined by <tagname> . </tagname>..................................................24 7.9 Indexing only HTML elements defined by <tagname> . </tagname>..........................................24 7.10 Ignoring parts of a page defined by <ul class=’abc’> . </ul>.....................................................25 7.11 Ignoring parts of a page defined by <pre class=’abc’> . </pre>.................................................25 1 7.12 Ignored words................................................................................................................................. 26 7.13 Use of Whitelist............................................................................................................................... 26 7.14 Use of Blacklist............................................................................................................................... 27 7.15 Ignored files (by suffix)....................................................................................................................27 7.16 Index only files and documents with defined suffix.........................................................................27 8. UTF-8 support and 'Preferred charset'........................................................................28 9. Search modes.................................................................................................................30 9.1 Search with wildcards *.....................................................................................................................30 9.2 Strict search !.................................................................................................................................... 30 9.3 Tolerant search................................................................................................................................. 30 9.4 Link search site:................................................................................................................................ 31 9.5 Media search.................................................................................................................................... 31 9.6 Search only in one domain...............................................................................................................31 9.7 Search in categories.........................................................................................................................31 9.8 Greek language support...................................................................................................................32 9.9 Block queries.................................................................................................................................... 33 10. Chronological order for result listing........................................................................33 10.1 Sorting text results..........................................................................................................................33 10.2 Sorting media results......................................................................................................................35 11. PDF converter for Linux/UNIX systems.....................................................................35 12. Clean resources during index / re-index...................................................................37 13. Enable real-time output of logging data....................................................................37 14. Error messages and Debug mode.............................................................................38 15. Delete secondary characters......................................................................................39 16. Media search for images, audio streams and videos..............................................40 16.1 Media indexing................................................................................................................................ 40 16.2 Not supported media content..........................................................................................................41 16.3 Search for media content................................................................................................................41 16.4 Statistics for media content.............................................................................................................43 17. Feed support................................................................................................................44 17.1 XML product feeds.......................................................................................................................... 44 17.2 RDF, RSD, RSS and Atom feeds....................................................................................................46 18. Result cache for text and media queries...................................................................47