  1. cac79d3 Merge pull request #542 from sebastian-nagel/NUTCH-2810 by Sebastian Nagel · 5 days ago master
  2. 2f5a8ad Merge pull request #537 from sebastian-nagel/NUTCH-2801-robots-checker by Sebastian Nagel · 5 days ago
  3. e33aaa1 NUTCH-2811 : Setup Github workflows for prs (#543) by Madhawa Gunasekara · 5 days ago
  4. 5399ce0 Merge pull request #536 from sebastian-nagel/NUTCH-2799-asf-yaml-file by Sebastian Nagel · 6 days ago
  5. f0161ea NUTCH-2805: Rename plugin urlfilter-domainblacklist (#540) by Shashanka Balakuntala Srinivasa · 10 days ago
  6. 46f7dc2 NUTCH-2810 FreeGenerator to actually apply configured number of fetch lists by Sebastian Nagel · 12 days ago
  7. dd80576 Merge pull request #538 from balashashanka/NUTCH-2782 by Sebastian Nagel · 4 weeks ago
  8. ff15671 Merge pull request #535 from sebastian-nagel/NUTCH-2796-NUTCH-2730 by Sebastian Nagel · 4 weeks ago
  9. 996ff8b NUTCH-2782: protocol-http / lib-http: support TLSv1.3 by shbalaku · 4 weeks ago
  10. 6801ac7 [NUTCH-2801] RobotsRulesParser command-line checker to use http.robots.agents as fall-back by Sebastian Nagel · 4 weeks ago
  11. 3d1fc94 NUTCH-2799 Add .asf.yaml file by Sebastian Nagel · 4 weeks ago
  12. f24ccab [NUTCH-2801] RobotsRulesParser command-line checker to use http.robots.agents as fall-back by Sebastian Nagel · 4 weeks ago
  13. cbab5c8 NUTCH-2799 Add .asf.yaml file by Sebastian Nagel · 4 weeks ago
  14. c8a71a8 [NUTCH-2730] SitemapProcessor to treat sitemap URLs as Set instead of List by Sebastian Nagel · 5 weeks ago
  15. ca4a039 [NUTCH-2796] Upgrade to crawler-commons 1.1 by Sebastian Nagel · 5 weeks ago
  16. a1adce7 Prepare for new development after release of 1.17 by Sebastian Nagel · 7 weeks ago
  17. 1c2e411 NUTCH-2794 Add additional ciphers to HTTP base's default cipher suite by Markus Jelsma · 7 weeks ago
  18. 59d0d95 Merge pull request #533 from pmezard/NUTCH-2791 by Sebastian Nagel · 8 weeks ago
  19. 6b6e74c NUTCH-2791 Handle GCS URLs in stats commands by Patrick Mezard · 9 weeks ago
  20. a48a9f6 Merge pull request #530 from sebastian-nagel/NUTCH-2789 by Sebastian Nagel · 8 weeks ago
  21. 5640caa Merge pull request #529 from sebastian-nagel/NUTCH-2788 by Sebastian Nagel · 8 weeks ago
  22. b1b6128 Merge pull request #531 from sebastian-nagel/NUTCH-2787 by Sebastian Nagel · 8 weeks ago
  23. 3feaf03 Merge pull request #532 from pmezard/NUTCH-2790 by Sebastian Nagel · 8 weeks ago
  24. 6fa02ef NUTCH-2790 indexer-csv: escape field leading quote character by Patrick Mezard · 9 weeks ago
  25. c51604b NUTCH-2787 CrawlDb JSON dump does not export metadata primitive data types correctly by Sebastian Nagel · 9 weeks ago
  26. 5c98446 Merge pull request #527 from sebastian-nagel/NUTCH-2496 by Sebastian Nagel · 9 weeks ago
  27. 1cb64df Merge pull request #528 from sebastian-nagel/NUTCH-2720 by Sebastian Nagel · 9 weeks ago
  28. 2021ca6 NUTCH-2789 Documentation: update links to point to cwiki by Sebastian Nagel · 9 weeks ago
  29. bbaff6f NUTCH-2789 Docker README: update links to point to cwiki by Sebastian Nagel · 9 weeks ago
  30. e3f7725 NUTCH-2788 ParseData: improve presentation of Metadata in method toString() by Sebastian Nagel · 9 weeks ago
  31. aa3a2a6 NUTCH-2720 ROBOTS metatag ignored when capitalized by Sebastian Nagel · 3 months ago
  32. f0e1e3d NUTCH-2720 ROBOTS metatag ignored when capitalized by Sebastian Nagel · 3 months ago
  33. 7fba6df NUTCH-2496 Speed up link inversion step in crawling script by Sebastian Nagel · 3 months ago
  34. 9139d6e Merge pull request #526 from sebastian-nagel/NUTCH-2419-urlfilter-rule-file-precedence by Sebastian Nagel · 3 months ago
  35. f971ca1 NUTCH-2419 Some URL filters and normalizers do not respect command-line override for rule file by Sebastian Nagel · 3 months ago
  36. e61a8a3 Merge pull request #525 from sebastian-nagel/NUTCH-1945 by Sebastian Nagel · 3 months ago
  37. b543b8b NUTCH-2419 Some URL filters and normalizers do not respect command-line override for rule file by Sebastian Nagel · 11 months ago
  38. ec93b33 Merge pull request #522 from sebastian-nagel/NUTCH-2758 by Sebastian Nagel · 3 months ago
  39. ede1489 Merge pull request #523 from sebastian-nagel/NUTCH-2753 by Sebastian Nagel · 3 months ago
  40. 40472c1 Merge pull request #521 from sebastian-nagel/NUTCH-2002-checkers-robotstxt by Sebastian Nagel · 3 months ago
  41. aa0c75e Merge pull request #519 from sebastian-nagel/NUTCH-2785-freegenerator-num-fetch-lists by Sebastian Nagel · 3 months ago
  42. 04e1592 Merge pull request #514 from sebastian-nagel/NUTCH-1194-generator-release-crawldb-lock-earlier by Sebastian Nagel · 3 months ago
  43. 11eea5a NUTCH-1194 Generator: CrawlDB lock should be released earlier by Sebastian Nagel · 4 months ago
  44. 0341f0d NUTCH-1945 Test for XLSX parser by Sebastian Nagel · 3 months ago
  45. a0ed0b4 NUTCH-2434 Add methods to reset parameters HTMLMetaTags by Sebastian Nagel · 3 months ago
  46. c573c70 NUTCH-2753 Add -listen option to command-line help of CrawlDbReader and LinkDbReader by Sebastian Nagel · 3 months ago
  47. 90502bd NUTCH-2758 Add plugin READMEs to binary release packages by Sebastian Nagel · 3 months ago
  48. 46db3ed NUTCH-2002 parse and index checkers to check robots.txt by Sebastian Nagel · 3 months ago
  49. 6a98ae7 Merge pull request #520 from sebastian-nagel/NUTCH-2743 by Sebastian Nagel · 3 months ago
  50. 2d534d6 Merge pull request #518 from sebastian-nagel/NUTCH-2784-tool-listing-properties by Sebastian Nagel · 3 months ago
  51. c50575a Merge pull request #517 from sebastian-nagel/NUTCH-2495-bin-crawl-delete-while-indexing by Sebastian Nagel · 3 months ago
  52. a8162b9 Merge pull request #505 from sebastian-nagel/NUTCH-2776-fetcher-dedup-redirects by Sebastian Nagel · 3 months ago
  53. 3665345 Merge pull request #500 from sebastian-nagel/NUTCH-2772-parsefilter-debug by Sebastian Nagel · 3 months ago
  54. 462ca6e NUTCH-2743 Add list of Nutch properties (nutch-default.xml) to documentation by Sebastian Nagel · 3 months ago
  55. 72f3ff2 NUTCH-2785 FreeGenerator: command-line option to define number of generated fetch lists by Sebastian Nagel · 3 months ago
  56. 73880df Merge pull request #509 from sebastian-nagel/NUTCH-2778-indexer-elastic-log-errors by Sebastian Nagel · 3 months ago
  57. 7ebd35d NUTCH-2495: Use -deleteGone instead of clean job in crawl script while indexing by Sebastian Nagel · 3 months ago
  58. a455eb5 NUTCH-2501 allow to set Java heap size when using crawl script in distributed mode by Sebastian Nagel · 3 months ago
  59. fccc634 Merge pull request #513 from sebastian-nagel/NUTCH-2501-java-heap-size-distr-mode by Sebastian Nagel · 3 months ago
  60. b5e794e NUTCH-2501 allow to set Java heap size when using crawl script in distributed mode by Sebastian Nagel · 4 months ago
  61. 1bc4aeb Merge pull request #512 from sebastian-nagel/NUTCH-2781-increase-java-heap-size by Sebastian Nagel · 3 months ago
  62. 3214840 NUTCH-2781 Increase default Java heap size by Sebastian Nagel · 4 months ago
  63. 2e2ce6a Merge pull request #516 from sebastian-nagel/NUTCH-2783-logging by Sebastian Nagel · 3 months ago
  64. 3d3018b NUTCH-2783 Use (more) parametrized logging by Sebastian Nagel · 4 months ago
  65. 52eec66 Merge pull request #515 from balashashanka/NUTCH-2780 by Sebastian Nagel · 3 months ago
  66. ca457fc NUTCH-2780 : Upgrade index-solr to use Solr 8.5.1 by balashashanka · 3 months ago
  67. a20c261 NUTCH-2784 Tool to list Nutch properties and configured values by Sebastian Nagel · 5 months ago
  68. 5b4f595 NUTCH-2780 : Upgrade index-solr to use Solr 8.5.1 by balashashanka · 4 months ago
  69. 49eb1bd Merge pull request #511 from sebastian-nagel/NUTCH-2779-tika-1.24.1 by Sebastian Nagel · 4 months ago
  70. e1ba9f1 NUTCH-2779 Upgrade to Tika 1.24.1 by Sebastian Nagel · 4 months ago
  71. dcbb0f2 Merge pull request #510 from balashashanka/NUTCH-2755 by Sebastian Nagel · 4 months ago
  72. 6ae4468 NUTCH-2755: Remove obsolete plugin indexer-elastic-rest by balashashanka · 4 months ago
  73. 6741574 NUTCH-2755: Remove obsolete plugin indexer-elastic-rest by balashashanka · 4 months ago
  74. 6f51618 Merge pull request #508 from balashashanka/NUTCH-2757 by Sebastian Nagel · 4 months ago
  75. 240aac9 Corrected README.md file by balashashanka · 4 months ago
  76. e471397 NUTCH-2757 - Indexer-elastic: add authentication options by balashashanka · 4 months ago
  77. 81a4b92 NUTCH-2778 indexer-elastic to properly log errors by Sebastian Nagel · 4 months ago
  78. e5b61de NUTCH-2757 - Indexer-elastic: add authentication options by balashashanka · 4 months ago
  79. f999ca5 NUTCH-2757 : Indexer-elastic: add authentication options by balashashanka · 4 months ago
  80. 0cd0022 Merge pull request #507 from balashashanka/NUTCH-2777 by Sebastian Nagel · 4 months ago
  81. e6d3e57 Merge pull request #506 from sebastian-nagel/NUTCH-2775-robots-min-delay by Sebastian Nagel · 4 months ago
  82. d7b6ccf NUTCH-2777 - Upgrade to Hadoop 3.1 by balashashanka · 4 months ago
  83. e6bc451 NUTCH-2775 Fetcher to guarantee minimum delay even if robots.txt defines shorter Crawl-delay by Sebastian Nagel · 5 months ago
  84. 0f33d18 NUTCH-2776 Fetcher to temporarily deduplicate followed redirects by Sebastian Nagel · 5 months ago
  85. e9dd180 Merge pull request #501 from sebastian-nagel/NUTCH-2773-segment-reader-recode-html by Sebastian Nagel · 5 months ago
  86. 3cffe3b Merge pull request #502 from sebastian-nagel/NUTCH-2774-override-annotations by Sebastian Nagel · 5 months ago
  87. 4443cc1 NUTCH-2770 Subcollection logic allows empty string as a whitelist value, thus matching every incoming document by Sebastian Nagel · 5 months ago
  88. 22e668d NUTCH-2774 Annotate methods implementing the Hadoop API by @Override by Sebastian Nagel · 5 months ago
  89. 5076430 NUTCH-2773 SegmentReader (-dump or -get): show HTML content as UTF-8 by Sebastian Nagel · 1 year, 6 months ago
  90. caea3a0 NUTCH-2772 Debugging parse filter to show serialized DOM tree by Sebastian Nagel · 5 months ago
  91. ebc2152 Merge pull request #498 from sebastian-nagel/NUTCH-2763-protocol-okhttp-store-headers-status-line by Sebastian Nagel · 5 months ago
  92. 1cdbb93 Merge pull request #499 from sebastian-nagel/NUTCH-2768-fetcher-thread-unnecessary-class-cast by Sebastian Nagel · 5 months ago
  93. ac4f2f4 Merge pull request #497 from sebastian-nagel/NUTCH-2767-queue-feeder-above-exceptions-threshold by Sebastian Nagel · 5 months ago
  94. 77ec28f NUTCH-2768 FetcherThread: unnecessary usage of class casts by Sebastian Nagel · 6 months ago
  95. 6dd0a7f NUTCH-2767 Fetcher to stop filling queues skipped due to repeated exception by Sebastian Nagel · 6 months ago
  96. 35dcd42 NUTCH-2767 Fetcher to stop filling queues skipped due to repeated exception by Sebastian Nagel · 6 months ago
  97. 7840cb6 NUTCH-2767 Fetcher to stop filling queues skipped due to repeated exception by Sebastian Nagel · 6 months ago
  98. 9449417 NUTCH-2763 protocol-okhttp (store.http.headers): add whitespace in status line after status code also when message is empty by Sebastian Nagel · 6 months ago
  99. 8e5837f NUTCH-2767 Fetcher to stop filling queues skipped due to repeated exception by Sebastian Nagel · 6 months ago
  100. 142a026 Merge pull request #495 from sebastian-nagel/NUTCH-2672-build-docs-use-https by Sebastian Nagel · 7 months ago