  1. b61d11f Merge pull request #849 from maciejpuzianowski/NUTCH-3108 by Sebastian Nagel · 7 weeks ago master
  2. a050c43 fix for NUTCH-3108 contributed by maciejpuzianowski/mpuzianowski by Maciej Puzianowski · 3 months ago
  3. b52ec90 NUTCH-3100 HostDB to support minimum records per host by Markus Jelsma · 4 months ago
  4. 18e7aeb NUTCH-3101 src/java/org/apache/nutch/crawl/Inlink.java by Markus Jelsma · 4 months ago
  5. 3b6d2c6 Merge pull request #832 from sebastian-nagel/NUTCH-3072 by Sebastian Nagel · 4 months ago
  6. 74b49e9 NUTCH-3086 Consolidate plugin extension names and IDs (#835) by Sebastian Nagel · 4 months ago
  7. 5068b76 Merge pull request #844 from maciejpuzianowski/NUTCH-3097 by Sebastian Nagel · 4 months ago
  8. 86b893a NUTCH-3079 Dumping a segment fails unless it has been fetched and parsed by Sebastian Nagel · 7 months ago
  9. b481f91 NUTCH-3083 Add RobotRulesParser to bin/nutch by Sebastian Nagel · 7 months ago
  10. 5263b7c NUTCH-3096 HostDB ResolverThread can create too many job counters by Sebastian Nagel · 5 months ago
  11. e2a29d0 NUTCH-3092 Replace all imports of commons-lang by commons-lang3 by Sebastian Nagel · 6 months ago
  12. bb17570 fix for NUTCH-3097 contributed by maciejpuzianowski/mpuzianowski by Maciej Puzianowski · 5 months ago
  13. 5a01834 NUTCH-3094 Github tests to run if build configuration changes by Sebastian Nagel · 5 months ago
  14. 68c1a7d NUTCH-3094 Github tests to run if build configuration changes by Sebastian Nagel · 6 months ago
  15. 6bff123 NUTCH-3095 Update .gitignore to ignore Hadoop native libraries by Sebastian Nagel · 6 months ago
  16. 5fc8ed0 NUTCH-3093 Ant target test-plugins to depend on compile-core-test (#840) by Sebastian Nagel · 5 months ago
  17. c226162 NUTCH-3072 Fetcher to stop QueueFeeder if aborting with "hung threads" by Sebastian Nagel · 5 months ago
  18. e1b8dbe NUTCH-2771 Tests in nightly builds: skip long runners by Sebastian Nagel · 7 months ago
  19. 5961e26 NUTCH-3084 Improve CI by filtering and separating plugin and core test execution (#833) by Lewis John McGibbney · 7 months ago
  20. f5b9ace NUTCH-3072 Fetcher to stop QueueFeeder if aborting with "hung threads" by Sebastian Nagel · 7 months ago
  21. b02340d Merge pull request #827 from sebastian-nagel/NUTCH-3067 by Sebastian Nagel · 7 months ago
  22. a99bd8e NUTCH-3075 tld plugin makes injector crash NUTCH-1942 Remove TopLevelDomain by Sebastian Nagel · 7 months ago
  23. 3495472 Unlock database when Injector finishes - regardless of result by cube · 7 months ago
  24. 633fa10 NUTCH-3067 Improve performance of FetchItemQueues if error state is preserved by Sebastian Nagel · 7 months ago
  25. 63da626 NUTCH-3067 Improve performance of FetchItemQueues if error state is preserved by Sebastian Nagel · 7 months ago
  26. bd2fce6 NUTCH-3067 Improve performance of FetchItemQueues if error state is preserved by Sebastian Nagel · 7 months ago
  27. 0b06b1b NUTCH-3067 Improve performance of FetchItemQueues if error state is preserved by Sebastian Nagel · 7 months ago
  28. e053ed0 NUTCH-3067 Improve performance of FetchItemQueues if error state is preserved by Sebastian Nagel · 7 months ago
  29. 4a61208 Merge pull request #828 from sebastian-nagel/NUTCH-3073 by Sebastian Nagel · 7 months ago
  30. e678777 NUTCH-3073 Address Java compiler warning by Sebastian Nagel · 7 months ago
  31. 13fcef8 NUTCH-3073 Address Java compiler warning by Sebastian Nagel · 7 months ago
  32. 1db4119 NUTCH-3073 Address Java compiler warning by Sebastian Nagel · 7 months ago
  33. 83405fb NUTCH-3073 Address Java compiler warning by Sebastian Nagel · 7 months ago
  34. 7992d3c NUTCH-3073 Address Java compiler warnings by Sebastian Nagel · 7 months ago
  35. d6f55b8 NUTCH-2812 Methods returning array may expose internal representation by Sebastian Nagel · 8 months ago
  36. c137b4e Merge pull request #798 from GabeHaegele/NUTCH-2812 by Sebastian Nagel · 8 months ago
  37. 8b11962 Merge pull request #816 from sebastian-nagel/NUTCH-1942-domain-utils-to-use-crawler-commons by Sebastian Nagel · 8 months ago
  38. 582cdd4 NUTCH-3058 Fetcher: counter for hung threads (#820) by Sebastian Nagel · 8 months ago
  39. 9d138ff NUTCH-3061 URL filters to log name of the rules file by Sebastian Nagel · 10 months ago
  40. 4200247 NUTCH-3062 protocol-okhttp: optionally record HTTP and SSL/TLS versions (#822) by Sebastian Nagel · 8 months ago
  41. bc8bd31 Merge pull request #823 from sebastian-nagel/NUTCH-3065-changelog-markdown by Sebastian Nagel · 8 months ago
  42. 309bc18 NUTCH-3066 Protocol plugin unit tests fail randomly by Sebastian Nagel · 8 months ago
  43. e09d40c Merge pull request #819 from CatChullain/NUTCH-3057 by Joe Gilvary · 8 months ago
  44. 40881e8 NUTCH-1806 Delegate processing of URL domains to crawler commons by Sebastian Nagel · 8 months ago
  45. ac03cf1 NUTCH-3063 Support for "addBinaryContent" from REST API by Sebastian Nagel · 8 months ago
  46. 6b9887b Fix syntax in Maven template by Sebastian Nagel · 8 months ago
  47. 7bd58d8 NUTCH-3065 Format changelog to markdown by Sebastian Nagel · 8 months ago
  48. f669257 NUTCH-3065 Format changelog to markdown by Sebastian Nagel · 8 months ago
  49. 6f0a89f NUTCH-3065 Format changelog to markdown by Sebastian Nagel · 8 months ago
  50. 20710cb NUTCH-3065 Format changelog to markdown by Sebastian Nagel · 8 months ago
  51. ca03d9b NUTCH-3055 README: fix Github "hub" commands by Sebastian Nagel · 1 year ago
  52. bfa07df Merge pull request #815 from sebastian-nagel/NUTCH-3044-generator-npe by Sebastian Nagel · 12 months ago
  53. c13dc1d NUTCH-3057 - Fix for index-arbitrary plugin improper retention and use of calculated value for arbitrary field after an exception by Joe Gilvary · 12 months ago
  54. 8abc78a NUTCH-3041 Address confusing logging in o.a.n.net.URLExemptionFilters (#813) by Lewis John McGibbney · 12 months ago
  55. 5f1330a NUTCH-3043 Generator: count URLs rejected by URL filters (#814) by Sebastian Nagel · 12 months ago
  56. ea9c7ee NUTCH-3039 Failure to handle ftp:// URLs by Sebastian Nagel · 1 year, 1 month ago
  57. 7ac3ce2 NUTCH-3054 Address deprecation of Node16 for all GitHub Actions (#817) by Lewis John McGibbney · 1 year ago
  58. d43f579 NUTCH-1806 Delegate processing of URL domains to crawler commons by Sebastian Nagel · 1 year ago
  59. e0fa357 NUTCH-1806 Delegate processing of URL domains to crawler commons by Sebastian Nagel · 1 year, 1 month ago
  60. bc2ae7e NUTCH-1806 Delegate processing of URL domains to crawler commons by Sebastian Nagel · 1 year ago
  61. f6bcec9 NUTCH-1806 Delegate processing of URL domains to crawler commons by Sebastian Nagel · 8 years ago
  62. 817af69 Bootstrap Nutch 1.21 development drive. by Lewis John McGibbney · 1 year ago
  63. c0b9461 Add GitHub CI badge to README by Lewis John McGibbney · 1 year ago
  64. b153279 NUTCH-3044 Generator: NPE when extracting the host part of a URL fails by Sebastian Nagel · 1 year, 1 month ago
  65. 4729786 NUTCH-3044 Generator: NPE when extracting the host part of a URL fails by Sebastian Nagel · 1 year, 1 month ago
  66. 4b26353 NUTCH-3044 Generator: NPE when extracting the host part of a URL fails by Sebastian Nagel · 1 year, 1 month ago
  67. 271f92e NUTCH-3038 Address issues discovered during 1.20 release management dryrun (#811) by Lewis John McGibbney · 1 year, 1 month ago
  68. c9e2f4e NUTCH-3032 Code for an ArbitraryIndexingFilter to index values resolved by user POJO code at index time (#810) by Joe Gilvary · 1 year, 1 month ago
  69. 1563396 NUTCH-3036 Upgrade org.seleniumhq.selenium:selenium-java dependency i… (#807) by Lewis John McGibbney · 1 year, 1 month ago
  70. 5a95bc6 NUTCH-3035 Update license and notice file for release of 1.20 (#808) by Sebastian Nagel · 1 year, 1 month ago
  71. 3905a8d NUTCH-3037 Upgrade org.apache.kafka:kafka_2.12: to v3.7.0 (#809) by Lewis John McGibbney · 1 year, 1 month ago
  72. 367988d NUTCH-3008 indexer-elastic: downgrade to ES 7.10.2 to address licensing issues by Sebastian Nagel · 1 year, 2 months ago
  73. 9890223 NUTCH-3029 by Markus Jelsma · 1 year, 2 months ago
  74. a8ec17c NUTCH-3029 Host specific max. and min. intervals in adaptive scheduler by Markus Jelsma · 1 year, 2 months ago
  75. 84cda2a NUTCH-3029 Host specific max. and min. intervals in adaptive scheduler by Markus Jelsma · 1 year, 2 months ago
  76. 5ba50c0 NUTCH-3029 Host specific max. and min. intervals in adaptive scheduler by Markus Jelsma · 1 year, 2 months ago
  77. 4f62dec NUTCH-3033 Upgrade Ivy to v2.5.2 (#803) by Lewis John McGibbney · 1 year, 2 months ago
  78. 4642c30 NUTCH-3029 Host specific max. and min. intervals in adaptive scheduler by Markus Jelsma · 1 year, 2 months ago
  79. 551c50b NUTCH-3030 Use system default cipher suites instead of hard-coded set by Markus Jelsma · 1 year, 2 months ago
  80. 42b55f6 Update Dockerfile / JAVA_HOME - 2nd try (#805) by Jakob Berlin · 1 year, 2 months ago
  81. c390dfc NUTCH-3031 ProtocolFactory host mapper to support domains by Markus Jelsma · 1 year, 2 months ago
  82. 83acd50 Update crawl documentation by Jakob Berlin · 1 year, 5 months ago
  83. 6b04554 Merge branch 'master' of https://gitbox.apache.org/repos/asf/nutch by Markus Jelsma · 1 year, 4 months ago
  84. d95e1a7 NUTCH-3027 Trivial resource leak patch in DomainSuffixes.java by Markus Jelsma · 1 year, 4 months ago
  85. 85fea6e NUTCH-3024 Remove flaky 'dependency check' target (#795) by Lewis John McGibbney · 1 year, 6 months ago
  86. 0765367 fix for NUTCH-2812 contributed by GabeHaegele by Gabe · 1 year, 6 months ago
  87. 7ad382d Merge pull request #796 from DigitalPebble/NUTCH-3025 by Sebastian Nagel · 1 year, 6 months ago
  88. 49d85ea Merged changes from master; improved Javadoc and exception handling by Julien Nioche · 1 year, 6 months ago
  89. adadc43 Merge branch 'NUTCH-3017', closes #793 by Sebastian Nagel · 1 year, 6 months ago
  90. ac383fc [NUTCH-3017] Allow fast-urlfilter to load from HDFS/S3 and support gzipped input by Sebastian Nagel · 1 year, 6 months ago
  91. d764e4c Added filtering on whole string + documented config in nutch-default + fixed tests by Julien Nioche · 1 year, 6 months ago
  92. 9084912 NUTCH-3020 -- ParseSegment should check for okhttp's truncation flag (#794) by Tim Allison · 1 year, 6 months ago
  93. f88b9a1 NUTCH-3019 -- update Tika (#797) by Tim Allison · 1 year, 6 months ago
  94. d8e66ce [NUTCH-3025] urlfilter-fast to filter based on the length of the URL by Julien Nioche · 1 year, 6 months ago
  95. bbf0867 NUTCH-3014 Standardize Job names (#789) by Lewis John McGibbney · 1 year, 6 months ago
  96. d1025fd [NUTCH-3017] Allow fast-urlfilter to load from HDFS/S3 and support gzipped input by Julien Nioche · 1 year, 7 months ago
  97. 792ed28 NUTCH-3015 Add more CI steps to GitHub master-build.yml (#790) by Lewis John McGibbney · 1 year, 7 months ago
  98. 8431dcf NUTCH-3013 Employ commons-lang3's StopWatch to simplify timing logic (#788) by Lewis John McGibbney · 1 year, 7 months ago
  99. d2c3e96 NUTCH-3012 SegmentReader when dumping with option -recode: NPE on unparsed documents by Sebastian Nagel · 1 year, 7 months ago
  100. b081c75 NUTCH-3011 HttpRobotRulesParser: handle HTTP 429 Too Many Requests same as server errors (HTTP 5xx) by Sebastian Nagel · 1 year, 7 months ago