  1. c68780d NUTCH-2817 Avoid check for equality of URL path and file part using ==/!= by Sebastian Nagel · 3 years, 9 months ago
  2. 8b85324 NUTCH-2816 Add Spotbugs target to ant build by Sebastian Nagel · 3 years, 9 months ago
  3. 88cd369 NUTCH-2814 HttpDateFormat's internal time zone may change after parsing a date by Sebastian Nagel · 3 years, 9 months ago
  4. cac79d3 Merge pull request #542 from sebastian-nagel/NUTCH-2810 by Sebastian Nagel · 3 years, 9 months ago
  5. 2f5a8ad Merge pull request #537 from sebastian-nagel/NUTCH-2801-robots-checker by Sebastian Nagel · 3 years, 9 months ago
  6. e33aaa1 NUTCH-2811 : Setup Github workflows for prs (#543) by Madhawa Gunasekara · 3 years, 9 months ago
  7. 5399ce0 Merge pull request #536 from sebastian-nagel/NUTCH-2799-asf-yaml-file by Sebastian Nagel · 3 years, 9 months ago
  8. f0161ea NUTCH-2805: Rename plugin urlfilter-domainblacklist (#540) by Shashanka Balakuntala Srinivasa · 3 years, 10 months ago
  9. 46f7dc2 NUTCH-2810 FreeGenerator to actually apply configured number of fetch lists by Sebastian Nagel · 3 years, 10 months ago
  10. dd80576 Merge pull request #538 from balashashanka/NUTCH-2782 by Sebastian Nagel · 3 years, 10 months ago
  11. ff15671 Merge pull request #535 from sebastian-nagel/NUTCH-2796-NUTCH-2730 by Sebastian Nagel · 3 years, 10 months ago
  12. 8971ccc NUTCH-2803 Rename property http.robot.rules.whitelist by Lewis John McGibbney · 3 years, 10 months ago
  13. 996ff8b NUTCH-2782: protocol-http / lib-http: support TLSv1.3 by shbalaku · 3 years, 10 months ago
  14. 6801ac7 [NUTCH-2801] RobotsRulesParser command-line checker to use http.robots.agents as fall-back by Sebastian Nagel · 3 years, 10 months ago
  15. 3d1fc94 NUTCH-2799 Add .asf.yaml file by Sebastian Nagel · 3 years, 10 months ago
  16. f24ccab [NUTCH-2801] RobotsRulesParser command-line checker to use http.robots.agents as fall-back by Sebastian Nagel · 3 years, 10 months ago
  17. cbab5c8 NUTCH-2799 Add .asf.yaml file by Sebastian Nagel · 3 years, 10 months ago
  18. c8a71a8 [NUTCH-2730] SitemapProcessor to treat sitemap URLs as Set instead of List by Sebastian Nagel · 3 years, 10 months ago
  19. ca4a039 [NUTCH-2796] Upgrade to crawler-commons 1.1 by Sebastian Nagel · 3 years, 10 months ago
  20. a1adce7 Prepare for new development after release of 1.17 by Sebastian Nagel · 3 years, 11 months ago
  21. 1c2e411 NUTCH-2794 Add additional ciphers to HTTP base's default cipher suite by Markus Jelsma · 3 years, 11 months ago
  22. 59d0d95 Merge pull request #533 from pmezard/NUTCH-2791 by Sebastian Nagel · 4 years ago
  23. 6b6e74c NUTCH-2791 Handle GCS URLs in stats commands by Patrick Mezard · 4 years ago
  24. a48a9f6 Merge pull request #530 from sebastian-nagel/NUTCH-2789 by Sebastian Nagel · 4 years ago
  25. 5640caa Merge pull request #529 from sebastian-nagel/NUTCH-2788 by Sebastian Nagel · 4 years ago
  26. b1b6128 Merge pull request #531 from sebastian-nagel/NUTCH-2787 by Sebastian Nagel · 4 years ago
  27. 3feaf03 Merge pull request #532 from pmezard/NUTCH-2790 by Sebastian Nagel · 4 years ago
  28. 6fa02ef NUTCH-2790 indexer-csv: escape field leading quote character by Patrick Mezard · 4 years ago
  29. c51604b NUTCH-2787 CrawlDb JSON dump does not export metadata primitive data types correctly by Sebastian Nagel · 4 years ago
  30. 5c98446 Merge pull request #527 from sebastian-nagel/NUTCH-2496 by Sebastian Nagel · 4 years ago
  31. 1cb64df Merge pull request #528 from sebastian-nagel/NUTCH-2720 by Sebastian Nagel · 4 years ago
  32. 2021ca6 NUTCH-2789 Documentation: update links to point to cwiki by Sebastian Nagel · 4 years ago
  33. bbaff6f NUTCH-2789 Docker README: update links to point to cwiki by Sebastian Nagel · 4 years ago
  34. e3f7725 NUTCH-2788 ParseData: improve presentation of Metadata in method toString() by Sebastian Nagel · 4 years ago
  35. aa3a2a6 NUTCH-2720 ROBOTS metatag ignored when capitalized by Sebastian Nagel · 4 years ago
  36. f0e1e3d NUTCH-2720 ROBOTS metatag ignored when capitalized by Sebastian Nagel · 4 years ago
  37. 7fba6df NUTCH-2496 Speed up link inversion step in crawling script by Sebastian Nagel · 4 years ago
  38. 9139d6e Merge pull request #526 from sebastian-nagel/NUTCH-2419-urlfilter-rule-file-precedence by Sebastian Nagel · 4 years ago
  39. f971ca1 NUTCH-2419 Some URL filters and normalizers do not respect command-line override for rule file by Sebastian Nagel · 4 years ago
  40. e61a8a3 Merge pull request #525 from sebastian-nagel/NUTCH-1945 by Sebastian Nagel · 4 years ago
  41. b543b8b NUTCH-2419 Some URL filters and normalizers do not respect command-line override for rule file by Sebastian Nagel · 4 years, 8 months ago
  42. ec93b33 Merge pull request #522 from sebastian-nagel/NUTCH-2758 by Sebastian Nagel · 4 years ago
  43. ede1489 Merge pull request #523 from sebastian-nagel/NUTCH-2753 by Sebastian Nagel · 4 years ago
  44. 40472c1 Merge pull request #521 from sebastian-nagel/NUTCH-2002-checkers-robotstxt by Sebastian Nagel · 4 years ago
  45. aa0c75e Merge pull request #519 from sebastian-nagel/NUTCH-2785-freegenerator-num-fetch-lists by Sebastian Nagel · 4 years ago
  46. 04e1592 Merge pull request #514 from sebastian-nagel/NUTCH-1194-generator-release-crawldb-lock-earlier by Sebastian Nagel · 4 years ago
  47. 11eea5a NUTCH-1194 Generator: CrawlDB lock should be released earlier by Sebastian Nagel · 4 years, 1 month ago
  48. 0341f0d NUTCH-1945 Test for XLSX parser by Sebastian Nagel · 4 years ago
  49. a0ed0b4 NUTCH-2434 Add methods to reset parameters HTMLMetaTags by Sebastian Nagel · 4 years ago
  50. c573c70 NUTCH-2753 Add -listen option to command-line help of CrawlDbReader and LinkDbReader by Sebastian Nagel · 4 years ago
  51. 90502bd NUTCH-2758 Add plugin READMEs to binary release packages by Sebastian Nagel · 4 years ago
  52. 46db3ed NUTCH-2002 parse and index checkers to check robots.txt by Sebastian Nagel · 4 years ago
  53. 6a98ae7 Merge pull request #520 from sebastian-nagel/NUTCH-2743 by Sebastian Nagel · 4 years ago
  54. 2d534d6 Merge pull request #518 from sebastian-nagel/NUTCH-2784-tool-listing-properties by Sebastian Nagel · 4 years ago
  55. c50575a Merge pull request #517 from sebastian-nagel/NUTCH-2495-bin-crawl-delete-while-indexing by Sebastian Nagel · 4 years ago
  56. a8162b9 Merge pull request #505 from sebastian-nagel/NUTCH-2776-fetcher-dedup-redirects by Sebastian Nagel · 4 years ago
  57. 3665345 Merge pull request #500 from sebastian-nagel/NUTCH-2772-parsefilter-debug by Sebastian Nagel · 4 years ago
  58. 462ca6e NUTCH-2743 Add list of Nutch properties (nutch-default.xml) to documentation by Sebastian Nagel · 4 years ago
  59. 72f3ff2 NUTCH-2785 FreeGenerator: command-line option to define number of generated fetch lists by Sebastian Nagel · 4 years ago
  60. 73880df Merge pull request #509 from sebastian-nagel/NUTCH-2778-indexer-elastic-log-errors by Sebastian Nagel · 4 years ago
  61. 7ebd35d NUTCH-2495: Use -deleteGone instead of clean job in crawl script while indexing by Sebastian Nagel · 4 years, 1 month ago
  62. a455eb5 NUTCH-2501 allow to set Java heap size when using crawl script in distributed mode by Sebastian Nagel · 4 years ago
  63. fccc634 Merge pull request #513 from sebastian-nagel/NUTCH-2501-java-heap-size-distr-mode by Sebastian Nagel · 4 years ago
  64. b5e794e NUTCH-2501 allow to set Java heap size when using crawl script in distributed mode by Sebastian Nagel · 4 years, 1 month ago
  65. 1bc4aeb Merge pull request #512 from sebastian-nagel/NUTCH-2781-increase-java-heap-size by Sebastian Nagel · 4 years ago
  66. 3214840 NUTCH-2781 Increase default Java heap size by Sebastian Nagel · 4 years, 1 month ago
  67. 2e2ce6a Merge pull request #516 from sebastian-nagel/NUTCH-2783-logging by Sebastian Nagel · 4 years ago
  68. 3d3018b NUTCH-2783 Use (more) parametrized logging by Sebastian Nagel · 4 years, 1 month ago
  69. 52eec66 Merge pull request #515 from balashashanka/NUTCH-2780 by Sebastian Nagel · 4 years ago
  70. ca457fc NUTCH-2780 : Upgrade index-solr to use Solr 8.5.1 by balashashanka · 4 years, 1 month ago
  71. a20c261 NUTCH-2784 Tool to list Nutch properties and configured values by Sebastian Nagel · 4 years, 2 months ago
  72. 5b4f595 NUTCH-2780 : Upgrade index-solr to use Solr 8.5.1 by balashashanka · 4 years, 1 month ago
  73. 49eb1bd Merge pull request #511 from sebastian-nagel/NUTCH-2779-tika-1.24.1 by Sebastian Nagel · 4 years, 1 month ago
  74. e1ba9f1 NUTCH-2779 Upgrade to Tika 1.24.1 by Sebastian Nagel · 4 years, 1 month ago
  75. dcbb0f2 Merge pull request #510 from balashashanka/NUTCH-2755 by Sebastian Nagel · 4 years, 1 month ago
  76. 6ae4468 NUTCH-2755: Remove obsolete plugin indexer-elastic-rest by balashashanka · 4 years, 1 month ago
  77. 6741574 NUTCH-2755: Remove obsolete plugin indexer-elastic-rest by balashashanka · 4 years, 1 month ago
  78. 6f51618 Merge pull request #508 from balashashanka/NUTCH-2757 by Sebastian Nagel · 4 years, 1 month ago
  79. 240aac9 Corrected README.md file by balashashanka · 4 years, 1 month ago
  80. e471397 NUTCH-2757 - Indexer-elastic: add authentication options by balashashanka · 4 years, 1 month ago
  81. 81a4b92 NUTCH-2778 indexer-elastic to properly log errors by Sebastian Nagel · 4 years, 1 month ago
  82. e5b61de NUTCH-2757 - Indexer-elastic: add authentication options by balashashanka · 4 years, 1 month ago
  83. f999ca5 NUTCH-2757 : Indexer-elastic: add authentication options by balashashanka · 4 years, 1 month ago
  84. 0cd0022 Merge pull request #507 from balashashanka/NUTCH-2777 by Sebastian Nagel · 4 years, 1 month ago
  85. e6d3e57 Merge pull request #506 from sebastian-nagel/NUTCH-2775-robots-min-delay by Sebastian Nagel · 4 years, 1 month ago
  86. d7b6ccf NUTCH-2777 - Upgrade to Hadoop 3.1 by balashashanka · 4 years, 1 month ago
  87. e6bc451 NUTCH-2775 Fetcher to guarantee minimum delay even if robots.txt defines shorter Crawl-delay by Sebastian Nagel · 4 years, 2 months ago
  88. 0f33d18 NUTCH-2776 Fetcher to temporarily deduplicate followed redirects by Sebastian Nagel · 4 years, 2 months ago
  89. e9dd180 Merge pull request #501 from sebastian-nagel/NUTCH-2773-segment-reader-recode-html by Sebastian Nagel · 4 years, 2 months ago
  90. 3cffe3b Merge pull request #502 from sebastian-nagel/NUTCH-2774-override-annotations by Sebastian Nagel · 4 years, 2 months ago
  91. 4443cc1 NUTCH-2770 Subcollection logic allows empty string as a whitelist value, thus matching every incoming document by Sebastian Nagel · 4 years, 2 months ago
  92. 22e668d NUTCH-2774 Annotate methods implementing the Hadoop API by @Override by Sebastian Nagel · 4 years, 2 months ago
  93. 5076430 NUTCH-2773 SegmentReader (-dump or -get): show HTML content as UTF-8 by Sebastian Nagel · 5 years ago
  94. caea3a0 NUTCH-2772 Debugging parse filter to show serialized DOM tree by Sebastian Nagel · 4 years, 2 months ago
  95. ebc2152 Merge pull request #498 from sebastian-nagel/NUTCH-2763-protocol-okhttp-store-headers-status-line by Sebastian Nagel · 4 years, 3 months ago
  96. 1cdbb93 Merge pull request #499 from sebastian-nagel/NUTCH-2768-fetcher-thread-unnecessary-class-cast by Sebastian Nagel · 4 years, 3 months ago
  97. ac4f2f4 Merge pull request #497 from sebastian-nagel/NUTCH-2767-queue-feeder-above-exceptions-threshold by Sebastian Nagel · 4 years, 3 months ago
  98. 77ec28f NUTCH-2768 FetcherThread: unnecessary usage of class casts by Sebastian Nagel · 4 years, 3 months ago
  99. 6dd0a7f NUTCH-2767 Fetcher to stop filling queues skipped due to repeated exception by Sebastian Nagel · 4 years, 3 months ago
  100. 35dcd42 NUTCH-2767 Fetcher to stop filling queues skipped due to repeated exception by Sebastian Nagel · 4 years, 3 months ago