PDF Version of This Documentation
Total Page:16
File Type:pdf, Size:1020Kb
Thunderstone Parametric Search Appliance WWW Site Indexer Version 23.1.0 Thunderstone Software December 10, 2020 Contents 1 Overview 21 1.1 Features........................................ 21 1.2 TechnicalSupport................................ ....... 22 2 Installation 23 2.1 How to unpack and install the Parametric Search Appliance................. 23 2.1.1 ConsoleMenu ................................... 23 2.1.2 FrontPanelLCD ................................. 27 2.2 Customizing the Parametric Search Appliance’s Appearance ................ 29 3 Operation 31 3.1 Running the Administrative Interface . .............. 31 3.2 FirstTimeRun:QuickStart . ........ 31 3.3 Administrative Interface Overview . ............. 34 3.4 BasicWalkSettings............................... ....... 58 3.4.1 WalkSummary................................... 58 3.4.2 Notes ......................................... 59 3.4.3 BaseURL(s) .................................... 59 3.4.4 Robots ........................................ 59 3.4.5 RobotsCrawl-delay ...... ..... ...... ...... ..... 60 3.4.6 AllowExtensions............................... 60 3.4.7 ExcludeExtensions. ..... 61 3.4.8 Exclusions .................................... 61 3.4.9 WalkDelay..................................... 61 3 4 CONTENTS 3.4.10 Parallelism .................................. 62 3.4.11 Verbosity .................................... 62 3.4.12 Disable Starting Walks . ....... 62 3.4.13 CatalogAuto-Import . ...... 63 3.4.14 Auto-ImportDelay . ..... 63 3.4.15 RewalkType ................................... 63 3.4.16 RewalkSchedule ............................... 65 3.4.17 ActionButtons ................................ 66 3.5 AdvancedWalkSettings . ....... 66 3.5.1 WatchURL...................................... 67 3.5.2 EndofWalkEmail ................................ 67 3.5.3 AttachLogs.................................... 67 3.5.4 Categories .................................... 67 3.5.5 CategoriesType................................ 68 3.5.6 DBWalker...................................... 69 3.5.7 URLFile ....................................... 69 3.5.8 URLURL ...................................... 69 3.5.9 SinglePage.................................... 69 3.5.10 PageFile ..................................... 70 3.5.11 PageURL...................................... 70 3.5.12 StripQueries ................................. 70 3.5.13 KeepQueryVars ................................ 70 3.5.14 IgnoreQueryVars . ...... ..... ...... ...... ..... 71 3.5.15 SortQueryVars................................ 71 3.5.16 LowerQueryVarValues . ..... 71 3.5.17 IgnoreCase................................... 72 3.5.18 ExtraDomains ................................. 72 3.5.19 ExtraNetworks................................ 72 3.5.20 ExtraURLsREX................................. 73 3.5.21 ExclusionREX................................. 73 CONTENTS 5 3.5.22 ExclusionPrefix ............................... 73 3.5.23 RSSFeeds ..................................... 74 3.5.24 ExcludebyField ............................... 74 3.5.25 EntitySourceFields . ...... 74 3.5.26 DatafromField................................ 75 3.5.27 RequiredREX .................................. 80 3.5.28 RequiredPrefix................................ 80 3.5.29 MaxPageSize .................................. 80 3.5.30 MaxPages ..................................... 80 3.5.31 MaxBytes ..................................... 80 3.5.32 MaxDepth ..................................... 81 3.5.33 MaxURLSize ................................... 81 3.5.34 MaxRequests.................................. 81 3.5.35 Max Connection Lifetime . ...... 81 3.5.36 PageTimeout.................................. 81 3.5.37 MetaTags..................................... 81 3.5.38 StandardMeta ................................. 82 3.5.39 AllMeta ...................................... 82 3.5.40 StorageCharset... ...... ..... ...... ...... ..... ..... 82 3.5.41 Source Default Charset . ....... 82 3.5.42 XMLUTF-8 ..................................... 83 3.5.43 KeepHTML ..................................... 83 3.5.44 KeepLinks .................................... 83 3.5.45 RemoveCommon ................................. 84 3.5.46 IgnoreTags................................... 84 3.5.47 KeepTags..................................... 84 3.5.48 IgnoreCharacters. ...... 84 3.5.49 PluginSplit.................................. 85 3.5.50 LanguageAnalysis . ..... 85 3.5.51 CJKMode ...................................... 86 6 CONTENTS 3.5.52 UnknownFileFormats . ..... 86 3.5.53 PDFTitleAction ............................... 86 3.5.54 WordDefinition ................................ 87 3.5.55 TextSearchMode ............................... 87 3.5.56 AttributeCompareMode. ...... 89 3.5.57 IndexFields.................................. 89 3.5.58 CompoundIndexFields . ..... 89 3.5.59 ExtraIndexes................................. 90 3.5.60 Spell-check Dictionaries . ......... 90 3.5.61 PrimerType................................... 90 3.5.62 PrimerURLs ................................... 91 3.5.63 UnprimerURLs ................................. 93 3.5.64 LoginInfo .................................... 94 3.5.65 ProxyAuto-ConfigURL . 94 3.5.66 Proxy ........................................ 95 3.5.67 ProxyLoginInfo ............................... 95 3.5.68 ClientCertificate . ...... 95 3.5.69 CookieSourcePath. ..... 95 3.5.70 CookieJar .................................... 96 3.5.71 StrictCookiePaths . ...... 96 3.5.72 Off-SitePages ................................ 96 3.5.73 Off-SiteComponents . ...... 96 3.5.74 StayUnder .................................... 97 3.5.75 PreventDuplicates . ...... 97 3.5.76 RespectCanonicalURLs. ...... 97 3.5.77 Duplicate Check Fields . ....... 97 3.5.78 StoreRefs.................................... 98 3.5.79 InlineIframes................................ ..... 98 3.5.80 MaxComponents................................ 98 3.5.81 ExecuteJavaScript . ...... 98 CONTENTS 7 3.5.82 FetchJavaScript .. ...... ..... ...... ...... ..... ..... 98 3.5.83 JavaScript String Links . ........ 99 3.5.84 DebugJavaScript . ..... 99 3.5.85 JavaScriptMemory. ..... 99 3.5.86 JavaScriptTimeout . ...... 99 3.5.87 AJAXCrawlableURLs. 99 3.5.88 WalkTraceSettings . 100 3.5.89 AuditLog..................................... 100 3.5.90 PerformanceLogging. 100 3.5.91 BatchLocks ................................... 101 3.5.92 URLProtocols ................................. 101 3.5.93 HTTPVersion .................................. 101 3.5.94 SSLClientProtocols . 101 3.5.95 SSLClientCiphers. 102 3.5.96 SSLUseSNI .................................... 102 3.5.97 Network Share Protocols . ....... 103 3.5.98 Network Share Access Method . ....... 103 3.5.99 Authentication Schemes . ....... 103 3.5.100EmbeddedSecurity. 104 3.5.101BodyStorageMethod . 104 3.5.102MultipleFetches . 104 3.5.103 Follow Cross-Site Links . ........ 104 3.5.104MaxRedirects ................................ 105 3.5.105EmptyFormRedirects . 105 3.5.106 Execute Walked Dataload . ....... 105 3.5.107IndexName................................... 105 3.5.108DNSMode ..................................... 106 3.5.109UserAgent ................................... 106 3.5.110Robots.txtAgents. ....... 106 3.5.111MimeTypes ................................... 107 8 CONTENTS 3.5.112CustomHeaders .. ...... ..... ...... ...... ..... 107 3.5.113 Respect Expires Header . ....... 107 3.5.114CacheContent ................................ 107 3.5.115DefaultRefreshTime. ....... 108 3.5.116MinimumRefreshTime . 108 3.5.117MaximumRefreshTime . 109 3.5.118MaximumProcessSize. 109 3.5.119MaximumLoadAverage . 109 3.5.120 Replication Settings . ........ 109 3.5.121SendData.................................... 109 3.5.122SendSettings .... ...... ..... ...... ...... ..... 110 3.5.123BatchRows................................... 110 3.5.124BatchSize ................................... 110 3.5.125BatchIdle................................... 110 3.5.126LogReplication. 110 3.6 SearchSettings .................................. 110 3.6.1 Notes ......................................... 111 3.6.2 QueryLogging .................................. 111 3.6.3 RotateSchedule ................................ 111 3.6.4 Email......................................... 111 3.6.5 ResultOrder ................................... 111 3.6.6 ResultsStyle .................................. 112 3.6.7 AllowRSS ...................................... 112 3.6.8 FormatXSLOutput ............................... 112 3.6.9 XSLEngine..................................... 112 3.6.10 XSLFile ...................................... 113 3.6.11 AbstractStyle................................ 113 3.6.12 AbstractLength... ...... ..... ...... ...... ..... 113 3.6.13 MaxTitleLength............................... 113 3.6.14 MaxURLDisplayLength . 113 CONTENTS 9 3.6.15 ResultsperPage ............................... 114 3.6.16 MaxUserResultsperPage. 114 3.6.17 PageLinksShown ............................... 114 3.6.18 ResultsperSite............................... 114 3.6.19 Allowsite:syntax . ...... ..... ...... ...... ..... 115 3.6.20 Allowlink:syntax . ...... ..... ...... ...... ..... 116 3.6.21 Parametric Search Options . ........ 116 3.6.22 ResultURLSource. ...... ..... ...... ...... ..... 117 3.6.23 Parametric Search Query . ....... 117 3.6.24 GroupBy...................................... 117 3.6.25 MaxGroupBys.................................. 118 3.6.26 MaxDocstoGroupBy..... ..... ...... ...... ..... .. 119 3.6.27 ResultsWidth................................