  1. 3eaac31 NUTCH-2892 Upgrade to Any23 2.5 by Lewis John McGibbney · 2 years, 8 months ago
  2. e4b7be9 NUTCH-2885 Upgrade to Log4j2 (#692) by Lewis John McGibbney · 2 years, 10 months ago
  3. 96b6e0c NUTCH-2886 Move Nutch WebApp to separate repository (#693) by Lewis John McGibbney · 2 years, 11 months ago
  4. 53ed506 fireant upgrade dependency httpcore in ivy/ivy.xml from 4.4.9 to 4.4.14 (#681) by Lewis John McGibbney · 2 years, 11 months ago
  5. d6875e1 NUTCH-2882 Configure NutchUiServer for DEPLOYMENT and improve logging (#690) by Lewis John McGibbney · 2 years, 11 months ago
  6. 08de742 NUTCH-2881 bug in 'nutch' symlink in docker container (#689) by Lewis John McGibbney · 2 years, 11 months ago
  7. abb6927 Merge pull request #649 from sebastian-nagel/NUTCH-2868-urlnormalizer-protocol-exception-reading-config-file by Sebastian Nagel · 3 years ago
  8. 2702c6f NUTCH-2868 urlnormalizer-protocol fails with StringIndexOutOfBoundsException by Sebastian Nagel · 3 years ago
  9. 9c8ae8e fireant upgrade dependency junit in ivy/ivy.xml from 4.13.1 to 4.13.2 (#666) by Lewis John McGibbney · 3 years ago
  10. 41bf0a1 Merge pull request #650 from sebastian-nagel/NUTCH-2869-plugins-override-annotation by Sebastian Nagel · 3 years ago
  11. 0e3e021 NUTCH-2869 Add @Override annotations to Nutch plugins by Sebastian Nagel · 3 years ago
  12. cc8d76a NUTCH-2864 Upgrade Dockerfile to use JDK 11 (#647) by Lewis John McGibbney · 3 years ago
  13. 18d2872 Merge pull request #648 from sebastian-nagel/NUTCH-2866-metadata-tostring by Sebastian Nagel · 3 years ago
  14. 0d6eaa3 NUTCH-2866 Fix MetaData.toString() to return "key=value ..." by Sebastian Nagel · 3 years ago
  15. 6c02da0 Merge pull request #576 from sebastian-nagel/NUTCH-2859-urlnormalizer-protocol-domain-rules by Sebastian Nagel · 3 years, 2 months ago
  16. 2837039 NUTCH-2855 Update org.elasticsearch.client (#577) by Lewis John McGibbney · 3 years, 2 months ago
  17. 081c826 NUTCH-2859: urlnormalizer-protocol: allow to normalize domains by Sebastian Nagel · 3 years, 2 months ago
  18. d749920 NUTCH-2858 urlnormalizer-protocol: URL port is lost during normalization by Sebastian Nagel · 3 years, 2 months ago
  19. c454a64 NUTCH-2858 urlnormalizer-protocol: URL port is lost during normalization by Sebastian Nagel · 3 years, 2 months ago
  20. b91fae5 NUTCH-2857 Upgrade from JDK1.8 --> JDK11 (#573) by Lewis John McGibbney · 3 years, 2 months ago
  21. 81fb7bc Merge pull request #574 from sebastian-nagel/NUTCH-2596-http-protocol-plugin-test-remove-jsp by Sebastian Nagel · 3 years, 2 months ago
  22. d193137 NUTCH-2596 Upgrade from org.mortbay.jetty to org.eclipse.jetty by Sebastian Nagel · 4 years, 4 months ago
  23. 2724578 NUTCH-2850 Method ignores exceptional return value (#570) by Lewis John McGibbney · 3 years, 3 months ago
  24. 5250d62 NUTCH-2851 Random object created and used only once (#571) by Lewis John McGibbney · 3 years, 3 months ago
  25. 2fae4cd NUTCH-2849 Replace remaining package.html files with package-info.java (#569) by Lewis John McGibbney · 3 years, 3 months ago
  26. 64bf638 NUTCH-2842 Fix Javadoc warnings, errors and add Javadoc check to Github Action and Jenkins (#568) by Lewis John McGibbney · 3 years, 3 months ago
  27. 2d69dab Merge pull request #567 from sebastian-nagel/NUTCH-2847-http-date-format-new-api by Sebastian Nagel · 3 years, 4 months ago
  28. 491b5c2 Merge pull request #566 from sebastian-nagel/NUTCH-2846 by Sebastian Nagel · 3 years, 4 months ago
  29. 3483a41 Merge pull request #458 from aalbahem/NUTCH-1403 by Sebastian Nagel · 3 years, 4 months ago
  30. 7ffc667 Merge pull request #564 from sebastian-nagel/NUTCH-2845-urlfilter-suffix-rules by Sebastian Nagel · 3 years, 4 months ago
  31. 66bb62a NUTCH-2840 Fix 'report-vulnerabilities' ant target in build.xml (#561) by Lewis John McGibbney · 3 years, 4 months ago
  32. cc0da7e NUTCH-2819 Move spotbugs "installation" directory to avoid that spotbugs is shipped in Nutch runtime (#565) by Sebastian Nagel · 3 years, 4 months ago
  33. abc8083 NUTCH-2847 HttpDateFormat: Simplify based on new Java 8 DateTime API by Sebastian Nagel · 3 years, 4 months ago
  34. 0df39c0 NUTCH-2846 Remove invalid declaration of constructor by Sebastian Nagel · 3 years, 4 months ago
  35. 58cd08f NUTCH-2846 Use integer arithmetic instead of floating point with rounding by Sebastian Nagel · 3 years, 4 months ago
  36. 5f94d3a NUTCH-2846 Fix incorrect bracketing in calculation of hash code by Sebastian Nagel · 3 years, 4 months ago
  37. 6202bfa NUTCH-2846 Parse integer numbers via Integer.parseInt(...) by Sebastian Nagel · 3 years, 4 months ago
  38. ebf348c Prepare for Nutch 1.19-SNAPSHOT development by Lewis John McGibbney · 3 years, 4 months ago
  39. 2cf4d62 NUTCH-2845 Complete rules of urlfilter-suffix, by Sebastian Nagel · 4 years, 7 months ago
  40. 59c63c7 NUTCH-2841 Upgrade xercesImpl dependency (#563) by Lewis John McGibbney · 3 years, 4 months ago
  41. 93aa2ab Improve NUTCH-1403, add ASLv2 header by Ameer Albahem · 3 years, 4 months ago
  42. cdb6b52 Improve fix for NUTCH-1403 by Ameer Albahem · 3 years, 5 months ago
  43. 775cd8f Merge branch 'master' of https://github.com/apache/nutch into NUTCH-1403 by Ameer Albahem · 3 years, 5 months ago
  44. 7f0fdb1 NUTCH-2837 Update multiple dependencies (#560) by Lewis John McGibbney · 3 years, 5 months ago
  45. fbd53ba NUTCH-2836 Upgrade various commons dependencies (#559) by Lewis John McGibbney · 3 years, 5 months ago
  46. 88a17f2 Add possibility to setup deduplication group mode in crawl script (#557) by Jakob Berlin · 3 years, 5 months ago
  47. 8d8e08b NUTCH-2835 Upgrade commons-jexl from 2 --> 3 (#558) by Lewis John McGibbney · 3 years, 5 months ago
  48. 4c7d422 Merge pull request #556 from sebastian-nagel/tika-1.25 by Sebastian Nagel · 3 years, 6 months ago
  49. 40218a0 NUTCH-2833 Upgrade to Tika 1.25 by Sebastian Nagel · 3 years, 6 months ago
  50. c1cf6bb Merge pull request #554 from sebastian-nagel/NUTCH-2582-set-mime-types-reader-pool-size by Sebastian Nagel · 3 years, 6 months ago
  51. 235af3c NUTCH-2809 Upgrade any23 plugin dependency to 2.4 (#553) by Lewis John McGibbney · 3 years, 6 months ago
  52. 80147d6 Merge pull request #555 from sebastian-nagel/NUTCH-2829-ant-clean-cache by Sebastian Nagel · 3 years, 6 months ago
  53. 96c01cb NUTCH-2829 Fix ant target "clean-cache" by Sebastian Nagel · 3 years, 7 months ago
  54. 975452f NUTCH-2582 Set pool size of XML SAX parsers used for MIME detection in Tika by Sebastian Nagel · 3 years, 7 months ago
  55. 680df6b Merge pull request #552 from sebastian-nagel/NUTCH-2824 by Sebastian Nagel · 3 years, 8 months ago
  56. 0b46ac2 Merge pull request #551 from sebastian-nagel/NUTCH-2823 by Sebastian Nagel · 3 years, 8 months ago
  57. 44cdb20 NUTCH-2824 urlnormalizer-basic to unescape percent-encoded host names by Sebastian Nagel · 3 years, 8 months ago
  58. 66f50be NUTCH-2824 urlnormalizer-basic to unescape percent-encoded host names by Sebastian Nagel · 3 years, 9 months ago
  59. f3afee0 Merge pull request #549 from sebastian-nagel/NUTCH-2818-ant-rat-task by Sebastian Nagel · 3 years, 9 months ago
  60. 1e4be7e Merge pull request #546 from sebastian-nagel/NUTCH-2814-http-date-format-time-zone by Sebastian Nagel · 3 years, 9 months ago
  61. 96bd757 NUTCH-2823 IllegalStateException in IndexWriters.describe() when validating url param for SolrIndexer by Sebastian Nagel · 3 years, 9 months ago
  62. d07c075 NUTCH-2697 Upgrade Ivy to fix the issue of an unset packaging.type property by Sebastian Nagel · 3 years, 10 months ago
  63. ae844b6 Merge branch 'derhecht-patch-2', closes #545 by Sebastian Nagel · 3 years, 9 months ago
  64. 69deffa NUTCH-2817 Avoid check for equality of URL path and file part using ==/!= by Sebastian Nagel · 3 years, 10 months ago
  65. e7a3da3 NUTCH-2816 Add Spotbugs target to ant build by Sebastian Nagel · 3 years, 10 months ago
  66. b4b81f7 NUTCH-2811 : Setup Github workflows for prs (#543) by Madhawa Gunasekara · 3 years, 10 months ago
  67. a51b0f5 NUTCH-2810 FreeGenerator to actually apply configured number of fetch lists by Sebastian Nagel · 3 years, 10 months ago
  68. a73bd14 [NUTCH-2801] RobotsRulesParser command-line checker to use http.robots.agents as fall-back by Sebastian Nagel · 3 years, 11 months ago
  69. d3d3b31 [NUTCH-2801] RobotsRulesParser command-line checker to use http.robots.agents as fall-back by Sebastian Nagel · 3 years, 11 months ago
  70. 2c3d864 NUTCH-1190 MoreIndexingFilter: move data formats used to parse "lastModified" to a config file by Jakob Berlin · 3 years, 10 months ago
  71. 0d6447a NUTCH-2799 Add .asf.yaml file by Sebastian Nagel · 3 years, 11 months ago
  72. 669e5a1 NUTCH-2799 Add .asf.yaml file by Sebastian Nagel · 3 years, 11 months ago
  73. 4cc6048 NUTCH-2805: Rename plugin urlfilter-domainblacklist (#540) by Shashanka Balakuntala Srinivasa · 3 years, 10 months ago
  74. 50eba77 NUTCH-2782: protocol-http / lib-http: support TLSv1.3 by shbalaku · 3 years, 11 months ago
  75. 7b16354 [NUTCH-2730] SitemapProcessor to treat sitemap URLs as Set instead of List by Sebastian Nagel · 3 years, 11 months ago
  76. 6fb5ebb [NUTCH-2796] Upgrade to crawler-commons 1.1 by Sebastian Nagel · 3 years, 11 months ago
  77. 4b505f2 Prepare for new development after release of 1.17 by Sebastian Nagel · 4 years ago
  78. 38f6f56 NUTCH-2794 Add additional ciphers to HTTP base's default cipher suite by Markus Jelsma · 4 years ago
  79. 5649513 NUTCH-2791 Handle GCS URLs in stats commands by Patrick Mezard · 4 years ago
  80. 75e4e63 NUTCH-2789 Documentation: update links to point to cwiki by Sebastian Nagel · 4 years ago
  81. f08c9db NUTCH-2789 Docker README: update links to point to cwiki by Sebastian Nagel · 4 years ago
  82. e8673d1 NUTCH-2788 ParseData: improve presentation of Metadata in method toString() by Sebastian Nagel · 4 years ago
  83. 41d3eb1 NUTCH-2787 CrawlDb JSON dump does not export metadata primitive data types correctly by Sebastian Nagel · 4 years ago
  84. 6c65498 NUTCH-2790 indexer-csv: escape field leading quote character by Patrick Mezard · 4 years ago
  85. ea6b2f0 NUTCH-2496 Speed up link inversion step in crawling script by Sebastian Nagel · 4 years ago
  86. fa319a6 NUTCH-2720 ROBOTS metatag ignored when capitalized by Sebastian Nagel · 4 years ago
  87. 5087151 NUTCH-2720 ROBOTS metatag ignored when capitalized by Sebastian Nagel · 4 years ago
  88. 83011a0 NUTCH-2419 Some URL filters and normalizers do not respect command-line override for rule file by Sebastian Nagel · 4 years ago
  89. 79f3c0a NUTCH-2419 Some URL filters and normalizers do not respect command-line override for rule file by Sebastian Nagel · 4 years, 8 months ago
  90. 3759019 NUTCH-1945 Test for XLSX parser by Sebastian Nagel · 4 years, 1 month ago
  91. 495f0ea NUTCH-2758 Add plugin READMEs to binary release packages by Sebastian Nagel · 4 years, 1 month ago
  92. 72b941f NUTCH-2753 Add -listen option to command-line help of CrawlDbReader and LinkDbReader by Sebastian Nagel · 4 years, 1 month ago
  93. aed6fa7 NUTCH-2002 parse and index checkers to check robots.txt by Sebastian Nagel · 4 years, 1 month ago
  94. 06b2271 NUTCH-2785 FreeGenerator: command-line option to define number of generated fetch lists by Sebastian Nagel · 4 years, 1 month ago
  95. 814f8b9 NUTCH-1194 Generator: CrawlDB lock should be released earlier by Sebastian Nagel · 4 years, 1 month ago
  96. 1b27ab8 NUTCH-2434 Add methods to reset parameters HTMLMetaTags by Sebastian Nagel · 4 years, 1 month ago
  97. 7f51c25 NUTCH-2743 Add list of Nutch properties (nutch-default.xml) to documentation by Sebastian Nagel · 4 years, 1 month ago
  98. 4c8dd07 NUTCH-2818 Fix Apache Rat task to check sources for license headers by Sebastian Nagel · 3 years, 10 months ago
  99. 466cac5 Merge pull request #548 from sebastian-nagel/NUTCH-2817-spotbugs-object-equality by Sebastian Nagel · 3 years, 10 months ago
  100. da8c81c Merge pull request #547 from sebastian-nagel/NUTCH-2816-add-spotbugs-ant-target by Sebastian Nagel · 3 years, 10 months ago