  1. dd80576 Merge pull request #538 from balashashanka/NUTCH-2782 by Sebastian Nagel · 3 years, 10 months ago
  2. ff15671 Merge pull request #535 from sebastian-nagel/NUTCH-2796-NUTCH-2730 by Sebastian Nagel · 3 years, 10 months ago
  3. 8971ccc NUTCH-2803 Rename property http.robot.rules.whitelist by Lewis John McGibbney · 3 years, 11 months ago
  4. 996ff8b NUTCH-2782: protocol-http / lib-http: support TLSv1.3 by shbalaku · 3 years, 11 months ago
  5. 6801ac7 [NUTCH-2801] RobotsRulesParser command-line checker to use http.robots.agents as fall-back by Sebastian Nagel · 3 years, 11 months ago
  6. 3d1fc94 NUTCH-2799 Add .asf.yaml file by Sebastian Nagel · 3 years, 11 months ago
  7. f24ccab [NUTCH-2801] RobotsRulesParser command-line checker to use http.robots.agents as fall-back by Sebastian Nagel · 3 years, 11 months ago
  8. cbab5c8 NUTCH-2799 Add .asf.yaml file by Sebastian Nagel · 3 years, 11 months ago
  9. c8a71a8 [NUTCH-2730] SitemapProcessor to treat sitemap URLs as Set instead of List by Sebastian Nagel · 3 years, 11 months ago
  10. ca4a039 [NUTCH-2796] Upgrade to crawler-commons 1.1 by Sebastian Nagel · 3 years, 11 months ago
  11. a1adce7 Prepare for new development after release of 1.17 by Sebastian Nagel · 4 years ago
  12. 1c2e411 NUTCH-2794 Add additional ciphers to HTTP base's default cipher suite by Markus Jelsma · 4 years ago
  13. 59d0d95 Merge pull request #533 from pmezard/NUTCH-2791 by Sebastian Nagel · 4 years ago
  14. 6b6e74c NUTCH-2791 Handle GCS URLs in stats commands by Patrick Mezard · 4 years ago
  15. a48a9f6 Merge pull request #530 from sebastian-nagel/NUTCH-2789 by Sebastian Nagel · 4 years ago
  16. 5640caa Merge pull request #529 from sebastian-nagel/NUTCH-2788 by Sebastian Nagel · 4 years ago
  17. b1b6128 Merge pull request #531 from sebastian-nagel/NUTCH-2787 by Sebastian Nagel · 4 years ago
  18. 3feaf03 Merge pull request #532 from pmezard/NUTCH-2790 by Sebastian Nagel · 4 years ago
  19. 6fa02ef NUTCH-2790 indexer-csv: escape field leading quote character by Patrick Mezard · 4 years ago
  20. c51604b NUTCH-2787 CrawlDb JSON dump does not export metadata primitive data types correctly by Sebastian Nagel · 4 years ago
  21. 5c98446 Merge pull request #527 from sebastian-nagel/NUTCH-2496 by Sebastian Nagel · 4 years ago
  22. 1cb64df Merge pull request #528 from sebastian-nagel/NUTCH-2720 by Sebastian Nagel · 4 years ago
  23. 2021ca6 NUTCH-2789 Documentation: update links to point to cwiki by Sebastian Nagel · 4 years ago
  24. bbaff6f NUTCH-2789 Docker README: update links to point to cwiki by Sebastian Nagel · 4 years ago
  25. e3f7725 NUTCH-2788 ParseData: improve presentation of Metadata in method toString() by Sebastian Nagel · 4 years ago
  26. aa3a2a6 NUTCH-2720 ROBOTS metatag ignored when capitalized by Sebastian Nagel · 4 years ago
  27. f0e1e3d NUTCH-2720 ROBOTS metatag ignored when capitalized by Sebastian Nagel · 4 years ago
  28. 7fba6df NUTCH-2496 Speed up link inversion step in crawling script by Sebastian Nagel · 4 years ago
  29. 9139d6e Merge pull request #526 from sebastian-nagel/NUTCH-2419-urlfilter-rule-file-precedence by Sebastian Nagel · 4 years ago
  30. f971ca1 NUTCH-2419 Some URL filters and normalizers do not respect command-line override for rule file by Sebastian Nagel · 4 years ago
  31. e61a8a3 Merge pull request #525 from sebastian-nagel/NUTCH-1945 by Sebastian Nagel · 4 years ago
  32. b543b8b NUTCH-2419 Some URL filters and normalizers do not respect command-line override for rule file by Sebastian Nagel · 4 years, 8 months ago
  33. ec93b33 Merge pull request #522 from sebastian-nagel/NUTCH-2758 by Sebastian Nagel · 4 years, 1 month ago
  34. ede1489 Merge pull request #523 from sebastian-nagel/NUTCH-2753 by Sebastian Nagel · 4 years, 1 month ago
  35. 40472c1 Merge pull request #521 from sebastian-nagel/NUTCH-2002-checkers-robotstxt by Sebastian Nagel · 4 years, 1 month ago
  36. aa0c75e Merge pull request #519 from sebastian-nagel/NUTCH-2785-freegenerator-num-fetch-lists by Sebastian Nagel · 4 years, 1 month ago
  37. 04e1592 Merge pull request #514 from sebastian-nagel/NUTCH-1194-generator-release-crawldb-lock-earlier by Sebastian Nagel · 4 years, 1 month ago
  38. 11eea5a NUTCH-1194 Generator: CrawlDB lock should be released earlier by Sebastian Nagel · 4 years, 1 month ago
  39. 0341f0d NUTCH-1945 Test for XLSX parser by Sebastian Nagel · 4 years, 1 month ago
  40. a0ed0b4 NUTCH-2434 Add methods to reset parameters HTMLMetaTags by Sebastian Nagel · 4 years, 1 month ago
  41. c573c70 NUTCH-2753 Add -listen option to command-line help of CrawlDbReader and LinkDbReader by Sebastian Nagel · 4 years, 1 month ago
  42. 90502bd NUTCH-2758 Add plugin READMEs to binary release packages by Sebastian Nagel · 4 years, 1 month ago
  43. 46db3ed NUTCH-2002 parse and index checkers to check robots.txt by Sebastian Nagel · 4 years, 1 month ago
  44. 6a98ae7 Merge pull request #520 from sebastian-nagel/NUTCH-2743 by Sebastian Nagel · 4 years, 1 month ago
  45. 2d534d6 Merge pull request #518 from sebastian-nagel/NUTCH-2784-tool-listing-properties by Sebastian Nagel · 4 years, 1 month ago
  46. c50575a Merge pull request #517 from sebastian-nagel/NUTCH-2495-bin-crawl-delete-while-indexing by Sebastian Nagel · 4 years, 1 month ago
  47. a8162b9 Merge pull request #505 from sebastian-nagel/NUTCH-2776-fetcher-dedup-redirects by Sebastian Nagel · 4 years, 1 month ago
  48. 3665345 Merge pull request #500 from sebastian-nagel/NUTCH-2772-parsefilter-debug by Sebastian Nagel · 4 years, 1 month ago
  49. 462ca6e NUTCH-2743 Add list of Nutch properties (nutch-default.xml) to documentation by Sebastian Nagel · 4 years, 1 month ago
  50. 72f3ff2 NUTCH-2785 FreeGenerator: command-line option to define number of generated fetch lists by Sebastian Nagel · 4 years, 1 month ago
  51. 73880df Merge pull request #509 from sebastian-nagel/NUTCH-2778-indexer-elastic-log-errors by Sebastian Nagel · 4 years, 1 month ago
  52. 7ebd35d NUTCH-2495: Use -deleteGone instead of clean job in crawl script while indexing by Sebastian Nagel · 4 years, 1 month ago
  53. a455eb5 NUTCH-2501 allow to set Java heap size when using crawl script in distributed mode by Sebastian Nagel · 4 years, 1 month ago
  54. fccc634 Merge pull request #513 from sebastian-nagel/NUTCH-2501-java-heap-size-distr-mode by Sebastian Nagel · 4 years, 1 month ago
  55. b5e794e NUTCH-2501 allow to set Java heap size when using crawl script in distributed mode by Sebastian Nagel · 4 years, 1 month ago
  56. 1bc4aeb Merge pull request #512 from sebastian-nagel/NUTCH-2781-increase-java-heap-size by Sebastian Nagel · 4 years, 1 month ago
  57. 3214840 NUTCH-2781 Increase default Java heap size by Sebastian Nagel · 4 years, 1 month ago
  58. 2e2ce6a Merge pull request #516 from sebastian-nagel/NUTCH-2783-logging by Sebastian Nagel · 4 years, 1 month ago
  59. 3d3018b NUTCH-2783 Use (more) parametrized logging by Sebastian Nagel · 4 years, 1 month ago
  60. 52eec66 Merge pull request #515 from balashashanka/NUTCH-2780 by Sebastian Nagel · 4 years, 1 month ago
  61. ca457fc NUTCH-2780 : Upgrade index-solr to use Solr 8.5.1 by balashashanka · 4 years, 1 month ago
  62. a20c261 NUTCH-2784 Tool to list Nutch properties and configured values by Sebastian Nagel · 4 years, 2 months ago
  63. 5b4f595 NUTCH-2780 : Upgrade index-solr to use Solr 8.5.1 by balashashanka · 4 years, 1 month ago
  64. 49eb1bd Merge pull request #511 from sebastian-nagel/NUTCH-2779-tika-1.24.1 by Sebastian Nagel · 4 years, 1 month ago
  65. e1ba9f1 NUTCH-2779 Upgrade to Tika 1.24.1 by Sebastian Nagel · 4 years, 1 month ago
  66. dcbb0f2 Merge pull request #510 from balashashanka/NUTCH-2755 by Sebastian Nagel · 4 years, 1 month ago
  67. 6ae4468 NUTCH-2755: Remove obsolete plugin indexer-elastic-rest by balashashanka · 4 years, 1 month ago
  68. 6741574 NUTCH-2755: Remove obsolete plugin indexer-elastic-rest by balashashanka · 4 years, 1 month ago
  69. 6f51618 Merge pull request #508 from balashashanka/NUTCH-2757 by Sebastian Nagel · 4 years, 1 month ago
  70. 240aac9 Corrected README.md file by balashashanka · 4 years, 1 month ago
  71. e471397 NUTCH-2757 - Indexer-elastic: add authentication options by balashashanka · 4 years, 1 month ago
  72. 81a4b92 NUTCH-2778 indexer-elastic to properly log errors by Sebastian Nagel · 4 years, 1 month ago
  73. e5b61de NUTCH-2757 - Indexer-elastic: add authentication options by balashashanka · 4 years, 1 month ago
  74. f999ca5 NUTCH-2757 : Indexer-elastic: add authentication options by balashashanka · 4 years, 1 month ago
  75. 0cd0022 Merge pull request #507 from balashashanka/NUTCH-2777 by Sebastian Nagel · 4 years, 1 month ago
  76. e6d3e57 Merge pull request #506 from sebastian-nagel/NUTCH-2775-robots-min-delay by Sebastian Nagel · 4 years, 1 month ago
  77. d7b6ccf NUTCH-2777 - Upgrade to Hadoop 3.1 by balashashanka · 4 years, 2 months ago
  78. e6bc451 NUTCH-2775 Fetcher to guarantee minimum delay even if robots.txt defines shorter Crawl-delay by Sebastian Nagel · 4 years, 2 months ago
  79. 0f33d18 NUTCH-2776 Fetcher to temporarily deduplicate followed redirects by Sebastian Nagel · 4 years, 2 months ago
  80. e9dd180 Merge pull request #501 from sebastian-nagel/NUTCH-2773-segment-reader-recode-html by Sebastian Nagel · 4 years, 2 months ago
  81. 3cffe3b Merge pull request #502 from sebastian-nagel/NUTCH-2774-override-annotations by Sebastian Nagel · 4 years, 2 months ago
  82. 4443cc1 NUTCH-2770 Subcollection logic allows empty string as a whitelist value, thus matching every incoming document by Sebastian Nagel · 4 years, 3 months ago
  83. 22e668d NUTCH-2774 Annotate methods implementing the Hadoop API by @Override by Sebastian Nagel · 4 years, 3 months ago
  84. 5076430 NUTCH-2773 SegmentReader (-dump or -get): show HTML content as UTF-8 by Sebastian Nagel · 5 years ago
  85. caea3a0 NUTCH-2772 Debugging parse filter to show serialized DOM tree by Sebastian Nagel · 4 years, 3 months ago
  86. ebc2152 Merge pull request #498 from sebastian-nagel/NUTCH-2763-protocol-okhttp-store-headers-status-line by Sebastian Nagel · 4 years, 3 months ago
  87. 1cdbb93 Merge pull request #499 from sebastian-nagel/NUTCH-2768-fetcher-thread-unnecessary-class-cast by Sebastian Nagel · 4 years, 3 months ago
  88. ac4f2f4 Merge pull request #497 from sebastian-nagel/NUTCH-2767-queue-feeder-above-exceptions-threshold by Sebastian Nagel · 4 years, 3 months ago
  89. 77ec28f NUTCH-2768 FetcherThread: unnecessary usage of class casts by Sebastian Nagel · 4 years, 3 months ago
  90. 6dd0a7f NUTCH-2767 Fetcher to stop filling queues skipped due to repeated exception by Sebastian Nagel · 4 years, 3 months ago
  91. 35dcd42 NUTCH-2767 Fetcher to stop filling queues skipped due to repeated exception by Sebastian Nagel · 4 years, 3 months ago
  92. 7840cb6 NUTCH-2767 Fetcher to stop filling queues skipped due to repeated exception by Sebastian Nagel · 4 years, 3 months ago
  93. 9449417 NUTCH-2763 protocol-okhttp (store.http.headers): add whitespace in status line after status code also when message is empty by Sebastian Nagel · 4 years, 3 months ago
  94. 8e5837f NUTCH-2767 Fetcher to stop filling queues skipped due to repeated exception by Sebastian Nagel · 4 years, 3 months ago
  95. 142a026 Merge pull request #495 from sebastian-nagel/NUTCH-2672-build-docs-use-https by Sebastian Nagel · 4 years, 4 months ago
  96. ea862f4 Merge pull request #496 from balashashanka/NUTCH-2649 by Sebastian Nagel · 4 years, 4 months ago
  97. 0eec1f8 Fix for NUTCH-2649: Optionally skip TLS/SSL certificate validation for protocol-selenium and protocol-htmlunit by Bala S Shashanka · 4 years, 4 months ago
  98. 0a2ffa7 Merge pull request #492 from sebastian-nagel/NUTCH-2733-protocol-okhttp-support-brotli by Sebastian Nagel · 4 years, 4 months ago
  99. 6f1a0dd NUTCH-2733 protocol-okhttp: add support for Brotli compression (Content-Encoding) by Sebastian Nagel · 4 years, 4 months ago
  100. a209946 NUTCH-2733 protocol-okhttp: add support for Brotli compression (Content-Encoding) by Sebastian Nagel · 4 years, 4 months ago