1. ca03d9b NUTCH-3055 README: fix Github "hub" commands by Sebastian Nagel · 5 weeks ago master
  2. bfa07df Merge pull request #815 from sebastian-nagel/NUTCH-3044-generator-npe by Sebastian Nagel · 6 days ago
  3. 8abc78a NUTCH-3041 Address confusing logging in o.a.n.net.URLExemptionFilters (#813) by Lewis John McGibbney · 3 weeks ago
  4. 5f1330a NUTCH-3043 Generator: count URLs rejected by URL filters (#814) by Sebastian Nagel · 3 weeks ago
  5. ea9c7ee NUTCH-3039 Failure to handle ftp:// URLs by Sebastian Nagel · 8 weeks ago
  6. 7ac3ce2 NUTCH-3054 Address deprecation of Node16 for all GitHub Actions (#817) by Lewis John McGibbney · 5 weeks ago
  7. 817af69 Boostrap Nutch 1.21 development drive. by Lewis John McGibbney · 5 weeks ago
  8. c0b9461 Add GitHub CI badge to README by Lewis John McGibbney · 5 weeks ago
  9. b153279 NUTCH-3044 Generator: NPE when extracting the host part of a URL fails by Sebastian Nagel · 5 weeks ago
  10. 4729786 NUTCH-3044 Generator: NPE when extracting the host part of a URL fails by Sebastian Nagel · 5 weeks ago
  11. 4b26353 NUTCH-3044 Generator: NPE when extracting the host part of a URL fails by Sebastian Nagel · 6 weeks ago
  12. 271f92e NUTCH-3038 Address issues discovered during 1.20 release management dryrun (#811) by Lewis John McGibbney · 8 weeks ago
  13. c9e2f4e NUTCH-3032 Code for an ArbitraryIndexingFilter to index values resolved by user POJO code at index time (#810) by Joe Gilvary · 9 weeks ago
  14. 1563396 NUTCH-3036 Upgrade org.seleniumhq.selenium:selenium-java dependency i… (#807) by Lewis John McGibbney · 9 weeks ago
  15. 5a95bc6 NUTCH-3035 Update license and notice file for release of 1.20 (#808) by Sebastian Nagel · 9 weeks ago
  16. 3905a8d NUTCH-3037 Upgrade org.apache.kafka:kafka_2.12: to v3.7.0 (#809) by Lewis John McGibbney · 9 weeks ago
  17. 367988d NUTCH-3008 indexer-elastic: downgrade to ES 7.10.2 to address licensing issues by Sebastian Nagel · 3 months ago
  18. 9890223 NUTCH-3029 by Markus Jelsma · 3 months ago
  19. a8ec17c NUTCH-3029 Host specific max. and min. intervals in adaptive scheduler by Markus Jelsma · 3 months ago
  20. 84cda2a NUTCH-3029 Host specific max. and min. intervals in adaptive scheduler by Markus Jelsma · 3 months ago
  21. 5ba50c0 NUTCH-3029 Host specific max. and min. intervals in adaptive scheduler by Markus Jelsma · 3 months ago
  22. 4f62dec NUTCH-3033 Upgrade Ivy to v2.5.2 (#803) by Lewis John McGibbney · 3 months ago
  23. 4642c30 NUTCH-3029 Host specific max. and min. intervals in adaptive scheduler by Markus Jelsma · 3 months ago
  24. 551c50b NUTCH-3030 Use system default cipher suites instead of hard-coded set by Markus Jelsma · 3 months ago
  25. 42b55f6 Update Dockerfile / JAVA_HOME - 2nd try (#805) by Jakob Berlin · 3 months ago
  26. c390dfc NUTCH-3031 ProtocolFactory host mapper to support domains by Markus Jelsma · 3 months ago
  27. 83acd50 Update crawl documentation by Jakob Berlin · 6 months ago
  28. 6b04554 Merge branch 'master' of https://gitbox.apache.org/repos/asf/nutch by Markus Jelsma · 5 months ago
  29. d95e1a7 NUTCH-3027 Trivial resource leak patch in DomainSuffixes.java by Markus Jelsma · 5 months ago
  30. 85fea6e NUTCH-3024 Remove flaky 'dependency check' target (#795) by Lewis John McGibbney · 6 months ago
  31. 7ad382d Merge pull request #796 from DigitalPebble/NUTCH-3025 by Sebastian Nagel · 7 months ago
  32. 49d85ea Merged changes from master; improved Javadoc and exception handling by Julien Nioche · 7 months ago
  33. adadc43 Merge branch 'NUTCH-3017', closes #793 by Sebastian Nagel · 7 months ago
  34. ac383fc [NUTCH-3017] Allow fast-urlfilter to load from HDFS/S3 and support gzipped input by Sebastian Nagel · 7 months ago
  35. d764e4c Added filtering on whole string + documented config in nutch-default + fixed tests by Julien Nioche · 7 months ago
  36. 9084912 NUTCH-3020 -- ParseSegment should check for okhttp's truncation flag (#794) by Tim Allison · 7 months ago
  37. f88b9a1 NUTCH-3019 -- update Tika (#797) by Tim Allison · 7 months ago
  38. d8e66ce [NUTCH-3025] urlfilter-fast to filter based on the length of the URL by Julien Nioche · 7 months ago
  39. bbf0867 NUTCH-3014 Standardize Job names (#789) by Lewis John McGibbney · 7 months ago
  40. d1025fd [NUTCH-3017] Allow fast-urlfilter to load from HDFS/S3 and support gzipped input by Julien Nioche · 7 months ago
  41. 792ed28 NUTCH-3015 Add more CI steps to GitHub master-build.yml (#790) by Lewis John McGibbney · 7 months ago
  42. 8431dcf NUTCH-3013 Employ commons-lang3's StopWatch to simplify timing logic (#788) by Lewis John McGibbney · 8 months ago
  43. d2c3e96 NUTCH-3012 SegmentReader when dumping with option -recode: NPE on unparsed documents by Sebastian Nagel · 8 months ago
  44. b081c75 NUTCH-3011 HttpRobotRulesParser: handle HTTP 429 Too Many Requests same as server errors (HTTP 5xx) by Sebastian Nagel · 8 months ago
  45. ecdd19d NUTCH-2990 HttpRobotRulesParser to follow 5 redirects as specified by RFC 9309 (#779) by Sebastian Nagel · 8 months ago
  46. bb68385 NUTCH-3009 Upgrade to Hadoop 3.3.6 by Sebastian Nagel · 8 months ago
  47. e96cfc5 NUTCH-3002 Protocol-okhttp HttpResponse: HTTP header metadata lookup should be case-insensitive by Sebastian Nagel · 9 months ago
  48. 97eb0b5 Merge pull request #776 from tballison/NUTCH-2959 by Tim Allison · 8 months ago
  49. 9aabc45 update howto_upgrade_tika.txt by tallison · 8 months ago
  50. 9faf364 Working now locally and with Seb's single_node_cluster tests by tallison · 8 months ago
  51. e32a0e0 Merge remote-tracking branch 'upstream/master' into NUTCH-2959 by tallison · 8 months ago
  52. a74b57b NUTCH-2853 bin/nutch: remove deprecated commands solrindex, solrdedup, solrclean by Sebastian Nagel · 8 months ago
  53. a1ab433 NUTCH-2897 Do not supress deprecated API warnings by Sebastian Nagel · 8 months ago
  54. 810b1d6 NUTCH-3010 Injector: count unique number of injected URLs by Sebastian Nagel · 8 months ago
  55. a72a53a NUTCH-3007 Fix impossible casts by Sebastian Nagel · 8 months ago
  56. 417b877 NUTCH-2852 SpotBugs: Method invokes System.exit(...) by Sebastian Nagel · 8 months ago
  57. 45c9de3 Merge pull request #778 from tballison/NUTCH-3004 by Tim Allison · 8 months ago
  58. 5be64d2 NUTCH-3004 -- propagate ssl exception if message doesn't match "handshake alert..." by tballison · 8 months ago
  59. 0f801c1 NUTCH-2959 -- downgrade commons-io to match the version we expect to come out with Hadoop 3.4.0. by tallison · 9 months ago
  60. f11d383 NUTCH-2959 -- bump commons-io by tallison · 9 months ago
  61. 6bfeaf4 NUTCH-2959 -- bump Tika to 2.9.0, bump common dependencies throughout by tallison · 9 months ago
  62. 85bdd00 Merge remote-tracking branch 'upstream/master' into NUTCH-2959 by tallison · 9 months ago
  63. d81be51 Merge pull request #772 from tballison/NUTCH-2978 by Tim Allison · 9 months ago
  64. 56fdbbe Merge remote-tracking branch 'upstream/master' into NUTCH-2978 by tballison · 9 months ago
  65. 51055ef NUTCH-2978 -- update slf4j-api by tballison · 9 months ago
  66. 10f7c0c NUTCH-2959 -- bump Tika to 2.9.0 by tallison · 9 months ago
  67. 0ad935f Merge pull request #775 from tballison/NUTCH-2998 by Tim Allison · 9 months ago
  68. f078a88 Merge pull request #774 from tballison/NUTCH-3001 by Tim Allison · 9 months ago
  69. c1ba16c Merge pull request #773 from tballison/NUTCH-3000 by Tim Allison · 9 months ago
  70. 8a5ef49 Remove Any23 from Nutch by tallison · 9 months ago
  71. b6f645a NUTCH-3001 - fix logic for grabbing bytes if there's no content type in the header by tallison · 9 months ago
  72. 820d129 NUTCH-3000 - the selenium protocol should return the full html, not just the inner body element. by tallison · 9 months ago
  73. daedbc3 NUTCH-2978 -- exclude reload4j and update LICENSE-binary and NOTICE-binary. by tballison · 9 months ago
  74. 0421775 Merge remote-tracking branch 'upstream/master' into NUTCH-2978 by tballison · 9 months ago
  75. 39d59f4 Merge pull request #771 from tballison/NUTCH-2999 by Tim Allison · 9 months ago
  76. 8d9c77f NUTCH-2999 -- upgrade lucene to latest 8.x throughout by tballison · 9 months ago
  77. cf74770 NUTCH-2978, upgrade to slf4j2 throughout, first steps by tballison · 9 months ago
  78. e93aa97 Merge pull request #770 from apache/NUTCH-2999 by Tim Allison · 9 months ago
  79. 3bb8b0e NUTCH-2999 -- upgrade Lucene to latest 8.x throughout by tballison · 9 months ago
  80. f5cd0d6 Merge pull request #768 from tballison/NUTCH-2989 by Tim Allison · 9 months ago
  81. 5a223e1 NUTCH-2989 -- ElasticIndexWriter should enable auth for https, too by tballison · 9 months ago
  82. 0fae6b5 NUTCH-2997 Add Override annotations by Sebastian Nagel · 10 months ago
  83. 070c115 NUTCH-2996 Use new SimpleRobotRulesParser API entry point crawler-commons 1.4 by Sebastian Nagel · 10 months ago
  84. a24ec5c NUTCH-2995 Upgrade to crawler-commons 1.4 by Sebastian Nagel · 10 months ago
  85. eae3c52 NUTCH-2993 ScoringDepth plugin to skip depth check based on URL Pattern by Sebastian Nagel · 11 months ago
  86. 9109bdd NUTCH-2991 Support HTTP/S Header Authorization for Solr connections (#763) by Sebastian Nagel · 12 months ago
  87. 98d02e7 NUTCH-2992 Fetcher: always block fetch queues when exceptions threshold is reached by Sebastian Nagel · 1 year, 1 month ago
  88. 215993b NUTCH-2596 Upgrade from org.mortbay.jetty to org.eclipse.jetty by Sebastian Nagel · 1 year, 3 months ago
  89. b4cb5c1 NUTCH-2984 Drop test proxy server and benchmark tool by Sebastian Nagel · 1 year, 3 months ago
  90. 1999b1e NUTCH-2985 Disable plugin urlfilter-validator by default by Sebastian Nagel · 1 year, 3 months ago
  91. c8aecfa NUTCH-2983 nutch-default.xml improvements by Sebastian Nagel · 1 year, 4 months ago
  92. a92878d NUTCH-2972 Javadoc build fails using JDK 17 by Sebastian Nagel · 1 year, 3 months ago
  93. ef29496 NUTCH-2982 Generator: parameter for URL normalization not passed forward by Sebastian Nagel · 1 year, 4 months ago
  94. e8fd210 Add indexer-opensearch-1x to 4 more targets...feedback from sebastian-nagel by tballison · 1 year, 3 months ago
  95. e03cad3 fix template to include new key store info. remove unused auth by tallison · 1 year, 3 months ago
  96. 71fabb2 NUTCH-2920 -- improve username/pw logic and update README.md by tallison · 1 year, 3 months ago
  97. 5fc2839 NUTCH-2920 -- improve handling for missing trust.store.path in the index-writers.xml by tallison · 1 year, 3 months ago
  98. f6b1717 NUTCH-2920 -- add keystore for 2-way tls; add back in no-tls option with a stern warning and possibly helpful links. by tallison · 1 year, 3 months ago
  99. 6e149f4 NUTCH-2920 -- fix imports by tallison · 1 year, 3 months ago
  100. ca3824f NUTCH-2920 -- first working attempt at migrating ElasticsearchIndexWriter to OpenSearch by tallison · 1 year, 3 months ago