1. 271f92e NUTCH-3038 Address issues discovered during 1.20 release management dryrun (#811) by Lewis John McGibbney · 2 weeks ago master
  2. c9e2f4e NUTCH-3032 Code for an ArbitraryIndexingFilter to index values resolved by user POJO code at index time (#810) by Joe Gilvary · 3 weeks ago
  3. 1563396 NUTCH-3036 Upgrade org.seleniumhq.selenium:selenium-java dependency i… (#807) by Lewis John McGibbney · 4 weeks ago
  4. 5a95bc6 NUTCH-3035 Update license and notice file for release of 1.20 (#808) by Sebastian Nagel · 4 weeks ago
  5. 3905a8d NUTCH-3037 Upgrade org.apache.kafka:kafka_2.12: to v3.7.0 (#809) by Lewis John McGibbney · 4 weeks ago
  6. 367988d NUTCH-3008 indexer-elastic: downgrade to ES 7.10.2 to address licensing issues by Sebastian Nagel · 6 weeks ago
  7. 9890223 NUTCH-3029 by Markus Jelsma · 6 weeks ago
  8. a8ec17c NUTCH-3029 Host specific max. and min. intervals in adaptive scheduler by Markus Jelsma · 6 weeks ago
  9. 84cda2a NUTCH-3029 Host specific max. and min. intervals in adaptive scheduler by Markus Jelsma · 6 weeks ago
  10. 5ba50c0 NUTCH-3029 Host specific max. and min. intervals in adaptive scheduler by Markus Jelsma · 6 weeks ago
  11. 4f62dec NUTCH-3033 Upgrade Ivy to v2.5.2 (#803) by Lewis John McGibbney · 6 weeks ago
  12. 4642c30 NUTCH-3029 Host specific max. and min. intervals in adaptive scheduler by Markus Jelsma · 6 weeks ago
  13. 551c50b NUTCH-3030 Use system default cipher suites instead of hard-coded set by Markus Jelsma · 6 weeks ago
  14. 42b55f6 Update Dockerfile / JAVA_HOME - 2nd try (#805) by Jakob Berlin · 6 weeks ago
  15. c390dfc NUTCH-3031 ProtocolFactory host mapper to support domains by Markus Jelsma · 6 weeks ago
  16. 83acd50 Update crawl documentation by Jakob Berlin · 4 months ago
  17. 6b04554 Merge branch 'master' of https://gitbox.apache.org/repos/asf/nutch by Markus Jelsma · 3 months ago
  18. d95e1a7 NUTCH-3027 Trivial resource leak patch in DomainSuffixes.java by Markus Jelsma · 3 months ago
  19. 85fea6e NUTCH-3024 Remove flaky 'dependency check' target (#795) by Lewis John McGibbney · 5 months ago
  20. 7ad382d Merge pull request #796 from DigitalPebble/NUTCH-3025 by Sebastian Nagel · 6 months ago
  21. 49d85ea Merged changes from master; improved Javadoc and exception handling by Julien Nioche · 6 months ago
  22. adadc43 Merge branch 'NUTCH-3017', closes #793 by Sebastian Nagel · 6 months ago
  23. ac383fc [NUTCH-3017] Allow fast-urlfilter to load from HDFS/S3 and support gzipped input by Sebastian Nagel · 6 months ago
  24. d764e4c Added filtering on whole string + documented config in nutch-default + fixed tests by Julien Nioche · 6 months ago
  25. 9084912 NUTCH-3020 -- ParseSegment should check for okhttp's truncation flag (#794) by Tim Allison · 6 months ago
  26. f88b9a1 NUTCH-3019 -- update Tika (#797) by Tim Allison · 6 months ago
  27. d8e66ce [NUTCH-3025] urlfilter-fast to filter based on the length of the URL by Julien Nioche · 6 months ago
  28. bbf0867 NUTCH-3014 Standardize Job names (#789) by Lewis John McGibbney · 6 months ago
  29. d1025fd [NUTCH-3017] Allow fast-urlfilter to load from HDFS/S3 and support gzipped input by Julien Nioche · 6 months ago
  30. 792ed28 NUTCH-3015 Add more CI steps to GitHub master-build.yml (#790) by Lewis John McGibbney · 6 months ago
  31. 8431dcf NUTCH-3013 Employ commons-lang3's StopWatch to simplify timing logic (#788) by Lewis John McGibbney · 6 months ago
  32. d2c3e96 NUTCH-3012 SegmentReader when dumping with option -recode: NPE on unparsed documents by Sebastian Nagel · 7 months ago
  33. b081c75 NUTCH-3011 HttpRobotRulesParser: handle HTTP 429 Too Many Requests same as server errors (HTTP 5xx) by Sebastian Nagel · 7 months ago
  34. ecdd19d NUTCH-2990 HttpRobotRulesParser to follow 5 redirects as specified by RFC 9309 (#779) by Sebastian Nagel · 6 months ago
  35. bb68385 NUTCH-3009 Upgrade to Hadoop 3.3.6 by Sebastian Nagel · 7 months ago
  36. e96cfc5 NUTCH-3002 Protocol-okhttp HttpResponse: HTTP header metadata lookup should be case-insensitive by Sebastian Nagel · 7 months ago
  37. 97eb0b5 Merge pull request #776 from tballison/NUTCH-2959 by Tim Allison · 6 months ago
  38. 9aabc45 update howto_upgrade_tika.txt by tallison · 7 months ago
  39. 9faf364 Working now locally and with Seb's single_node_cluster tests by tallison · 7 months ago
  40. e32a0e0 Merge remote-tracking branch 'upstream/master' into NUTCH-2959 by tallison · 7 months ago
  41. a74b57b NUTCH-2853 bin/nutch: remove deprecated commands solrindex, solrdedup, solrclean by Sebastian Nagel · 7 months ago
  42. a1ab433 NUTCH-2897 Do not supress deprecated API warnings by Sebastian Nagel · 7 months ago
  43. 810b1d6 NUTCH-3010 Injector: count unique number of injected URLs by Sebastian Nagel · 7 months ago
  44. a72a53a NUTCH-3007 Fix impossible casts by Sebastian Nagel · 7 months ago
  45. 417b877 NUTCH-2852 SpotBugs: Method invokes System.exit(...) by Sebastian Nagel · 7 months ago
  46. 45c9de3 Merge pull request #778 from tballison/NUTCH-3004 by Tim Allison · 7 months ago
  47. 5be64d2 NUTCH-3004 -- propagate ssl exception if message doesn't match "handshake alert..." by tballison · 7 months ago
  48. 0f801c1 NUTCH-2959 -- downgrade commons-io to match the version we expect to come out with Hadoop 3.4.0. by tallison · 7 months ago
  49. f11d383 NUTCH-2959 -- bump commons-io by tallison · 7 months ago
  50. 6bfeaf4 NUTCH-2959 -- bump Tika to 2.9.0, bump common dependencies throughout by tallison · 7 months ago
  51. 85bdd00 Merge remote-tracking branch 'upstream/master' into NUTCH-2959 by tallison · 7 months ago
  52. d81be51 Merge pull request #772 from tballison/NUTCH-2978 by Tim Allison · 7 months ago
  53. 56fdbbe Merge remote-tracking branch 'upstream/master' into NUTCH-2978 by tballison · 7 months ago
  54. 51055ef NUTCH-2978 -- update slf4j-api by tballison · 7 months ago
  55. 10f7c0c NUTCH-2959 -- bump Tika to 2.9.0 by tallison · 7 months ago
  56. 0ad935f Merge pull request #775 from tballison/NUTCH-2998 by Tim Allison · 7 months ago
  57. f078a88 Merge pull request #774 from tballison/NUTCH-3001 by Tim Allison · 7 months ago
  58. c1ba16c Merge pull request #773 from tballison/NUTCH-3000 by Tim Allison · 7 months ago
  59. 8a5ef49 Remove Any23 from Nutch by tallison · 7 months ago
  60. b6f645a NUTCH-3001 - fix logic for grabbing bytes if there's no content type in the header by tallison · 7 months ago
  61. 820d129 NUTCH-3000 - the selenium protocol should return the full html, not just the inner body element. by tallison · 7 months ago
  62. daedbc3 NUTCH-2978 -- exclude reload4j and update LICENSE-binary and NOTICE-binary. by tballison · 8 months ago
  63. 0421775 Merge remote-tracking branch 'upstream/master' into NUTCH-2978 by tballison · 8 months ago
  64. 39d59f4 Merge pull request #771 from tballison/NUTCH-2999 by Tim Allison · 8 months ago
  65. 8d9c77f NUTCH-2999 -- upgrade lucene to latest 8.x throughout by tballison · 8 months ago
  66. cf74770 NUTCH-2978, upgrade to slf4j2 throughout, first steps by tballison · 8 months ago
  67. e93aa97 Merge pull request #770 from apache/NUTCH-2999 by Tim Allison · 8 months ago
  68. 3bb8b0e NUTCH-2999 -- upgrade Lucene to latest 8.x throughout by tballison · 8 months ago
  69. f5cd0d6 Merge pull request #768 from tballison/NUTCH-2989 by Tim Allison · 8 months ago
  70. 5a223e1 NUTCH-2989 -- ElasticIndexWriter should enable auth for https, too by tballison · 8 months ago
  71. 0fae6b5 NUTCH-2997 Add Override annotations by Sebastian Nagel · 8 months ago
  72. 070c115 NUTCH-2996 Use new SimpleRobotRulesParser API entry point crawler-commons 1.4 by Sebastian Nagel · 8 months ago
  73. a24ec5c NUTCH-2995 Upgrade to crawler-commons 1.4 by Sebastian Nagel · 8 months ago
  74. eae3c52 NUTCH-2993 ScoringDepth plugin to skip depth check based on URL Pattern by Sebastian Nagel · 9 months ago
  75. 9109bdd NUTCH-2991 Support HTTP/S Header Authorization for Solr connections (#763) by Sebastian Nagel · 11 months ago
  76. 98d02e7 NUTCH-2992 Fetcher: always block fetch queues when exceptions threshold is reached by Sebastian Nagel · 11 months ago
  77. 215993b NUTCH-2596 Upgrade from org.mortbay.jetty to org.eclipse.jetty by Sebastian Nagel · 1 year, 2 months ago
  78. b4cb5c1 NUTCH-2984 Drop test proxy server and benchmark tool by Sebastian Nagel · 1 year, 2 months ago
  79. 1999b1e NUTCH-2985 Disable plugin urlfilter-validator by default by Sebastian Nagel · 1 year, 2 months ago
  80. c8aecfa NUTCH-2983 nutch-default.xml improvements by Sebastian Nagel · 1 year, 2 months ago
  81. a92878d NUTCH-2972 Javadoc build fails using JDK 17 by Sebastian Nagel · 1 year, 2 months ago
  82. ef29496 NUTCH-2982 Generator: parameter for URL normalization not passed forward by Sebastian Nagel · 1 year, 2 months ago
  83. e8fd210 Add indexer-opensearch-1x to 4 more targets...feedback from sebastian-nagel by tballison · 1 year, 2 months ago
  84. e03cad3 fix template to include new key store info. remove unused auth by tallison · 1 year, 2 months ago
  85. 71fabb2 NUTCH-2920 -- improve username/pw logic and update README.md by tallison · 1 year, 2 months ago
  86. 5fc2839 NUTCH-2920 -- improve handling for missing trust.store.path in the index-writers.xml by tallison · 1 year, 2 months ago
  87. f6b1717 NUTCH-2920 -- add keystore for 2-way tls; add back in no-tls option with a stern warning and possibly helpful links. by tallison · 1 year, 2 months ago
  88. 6e149f4 NUTCH-2920 -- fix imports by tallison · 1 year, 2 months ago
  89. ca3824f NUTCH-2920 -- first working attempt at migrating ElasticsearchIndexWriter to OpenSearch by tallison · 1 year, 2 months ago
  90. 383aeca NUTCH-2980: Upgraded Selenium to 4.7.2 + HTMLUnit by Kamil Mroczek · 1 year, 3 months ago
  91. 19dbe78 Merge pull request #752 from sebastian-nagel/NUTCH-2974 by Sebastian Nagel · 1 year, 2 months ago
  92. 541e693 NUTCH-2974 Ant build fails with "Unparseable date" on certain platforms by Sebastian Nagel · 1 year, 3 months ago
  93. 9a1ed40 Merge pull request #751 from sebastian-nagel/NUTCH-2634 by Sebastian Nagel · 1 year, 4 months ago
  94. dfdd00f NUTCH-2634 Some links marked as "nofollow" are followed anyway by Sebastian Nagel · 1 year, 4 months ago
  95. 7d39004 NUTCH-2924 Generate maxCount expr evaluated only once by Markus Jelsma · 1 year, 4 months ago
  96. d806aa4 NUTCH-2977 by Markus Jelsma · 1 year, 5 months ago
  97. ed7b661 Merge pull request #748 from sebastian-nagel/NUTCH-2883-docker by Sebastian Nagel · 1 year, 8 months ago
  98. 7c1a48c NUTCH-2883 Provide means to run server and webapp as persistent services in Docker container by Sebastian Nagel · 1 year, 8 months ago
  99. 0bda1bd NUTCH-2883 Provide means to run server and webapp as persistent services in Docker container by Sebastian Nagel · 2 years, 7 months ago
  100. 989c2ca NUTCH-2883 Provide means to run server and webapp as persistent services in Docker container by Lewis John McGibbney · 2 years, 10 months ago