- b61d11f Merge pull request #849 from maciejpuzianowski/NUTCH-3108 by Sebastian Nagel · 7 weeks ago master
- a050c43 fix for NUTCH-3108 contributed by maciejpuzianowski/mpuzianowski by Maciej Puzianowski · 3 months ago
- b52ec90 NUTCH-3100 HostDB to support minimum records per host by Markus Jelsma · 4 months ago
- 18e7aeb NUTCH-3101 src/java/org/apache/nutch/crawl/Inlink.java by Markus Jelsma · 4 months ago
- 3b6d2c6 Merge pull request #832 from sebastian-nagel/NUTCH-3072 by Sebastian Nagel · 4 months ago
- 74b49e9 NUTCH-3086 Consolidate plugin extension names and IDs (#835) by Sebastian Nagel · 4 months ago
- 5068b76 Merge pull request #844 from maciejpuzianowski/NUTCH-3097 by Sebastian Nagel · 4 months ago
- 86b893a NUTCH-3079 Dumping a segment fails unless it has been fetched and parsed by Sebastian Nagel · 7 months ago
- b481f91 NUTCH-3083 Add RobotRulesParser to bin/nutch by Sebastian Nagel · 7 months ago
- 5263b7c NUTCH-3096 HostDB ResolverThread can create too many job counters by Sebastian Nagel · 5 months ago
- e2a29d0 NUTCH-3092 Replace all imports of commons-lang by commons-lang3 by Sebastian Nagel · 6 months ago
- bb17570 fix for NUTCH-3097 contributed by maciejpuzianowski/mpuzianowski by Maciej Puzianowski · 5 months ago
- 5a01834 NUTCH-3094 Github tests to run if build configuration changes by Sebastian Nagel · 5 months ago
- 68c1a7d NUTCH-3094 Github tests to run if build configuration changes by Sebastian Nagel · 6 months ago
- 6bff123 NUTCH-3095 Update .gitignore to ignore Hadoop native libraries by Sebastian Nagel · 6 months ago
- 5fc8ed0 NUTCH-3093 Ant target test-plugins to depend on compile-core-test (#840) by Sebastian Nagel · 5 months ago
- c226162 NUTCH-3072 Fetcher to stop QueueFeeder if aborting with "hung threads" by Sebastian Nagel · 5 months ago
- e1b8dbe NUTCH-2771 Tests in nightly builds: skip long runners by Sebastian Nagel · 7 months ago
- 5961e26 NUTCH-3084 Improve CI by filtering and separating plugin and core test execution (#833) by Lewis John McGibbney · 7 months ago
- f5b9ace NUTCH-3072 Fetcher to stop QueueFeeder if aborting with "hung threads" by Sebastian Nagel · 7 months ago
- b02340d Merge pull request #827 from sebastian-nagel/NUTCH-3067 by Sebastian Nagel · 7 months ago
- a99bd8e NUTCH-3075 tld plugin makes injector crash NUTCH-1942 Remove TopLevelDomain by Sebastian Nagel · 7 months ago
- 3495472 Unlock database when Injector finishes - regardless of result by cube · 7 months ago
- 633fa10 NUTCH-3067 Improve performance of FetchItemQueues if error state is preserved by Sebastian Nagel · 7 months ago
- 63da626 NUTCH-3067 Improve performance of FetchItemQueues if error state is preserved by Sebastian Nagel · 7 months ago
- bd2fce6 NUTCH-3067 Improve performance of FetchItemQueues if error state is preserved by Sebastian Nagel · 7 months ago
- 0b06b1b NUTCH-3067 Improve performance of FetchItemQueues if error state is preserved by Sebastian Nagel · 7 months ago
- e053ed0 NUTCH-3067 Improve performance of FetchItemQueues if error state is preserved by Sebastian Nagel · 7 months ago
- 4a61208 Merge pull request #828 from sebastian-nagel/NUTCH-3073 by Sebastian Nagel · 7 months ago
- e678777 NUTCH-3073 Address Java compiler warning by Sebastian Nagel · 7 months ago
- 13fcef8 NUTCH-3073 Address Java compiler warning by Sebastian Nagel · 7 months ago
- 1db4119 NUTCH-3073 Address Java compiler warning by Sebastian Nagel · 7 months ago
- 83405fb NUTCH-3073 Address Java compiler warning by Sebastian Nagel · 7 months ago
- 7992d3c NUTCH-3073 Address Java compiler warnings by Sebastian Nagel · 7 months ago
- d6f55b8 NUTCH-2812 Methods returning array may expose internal representation by Sebastian Nagel · 8 months ago
- c137b4e Merge pull request #798 from GabeHaegele/NUTCH-2812 by Sebastian Nagel · 8 months ago
- 8b11962 Merge pull request #816 from sebastian-nagel/NUTCH-1942-domain-utils-to-use-crawler-commons by Sebastian Nagel · 8 months ago
- 582cdd4 NUTCH-3058 Fetcher: counter for hung threads (#820) by Sebastian Nagel · 8 months ago
- 9d138ff NUTCH-3061 URL filters to log name of the rules file by Sebastian Nagel · 10 months ago
- 4200247 NUTCH-3062 protocol-okhttp: optionally record HTTP and SSL/TLS versions (#822) by Sebastian Nagel · 8 months ago
- bc8bd31 Merge pull request #823 from sebastian-nagel/NUTCH-3065-changelog-markdown by Sebastian Nagel · 8 months ago
- 309bc18 NUTCH-3066 Protocol plugin unit tests fail randomly by Sebastian Nagel · 8 months ago
- e09d40c Merge pull request #819 from CatChullain/NUTCH-3057 by Joe Gilvary · 8 months ago
- 40881e8 NUTCH-1806 Delegate processing of URL domains to crawler commons by Sebastian Nagel · 8 months ago
- ac03cf1 NUTCH-3063 Support for "addBinaryContent" from REST API by Sebastian Nagel · 8 months ago
- 6b9887b Fix syntax in Maven template by Sebastian Nagel · 8 months ago
- 7bd58d8 NUTCH-3065 Format changelog to markdown by Sebastian Nagel · 8 months ago
- f669257 NUTCH-3065 Format changelog to markdown by Sebastian Nagel · 8 months ago
- 6f0a89f NUTCH-3065 Format changelog to markdown by Sebastian Nagel · 8 months ago
- 20710cb NUTCH-3065 Format changelog to markdown by Sebastian Nagel · 8 months ago
- ca03d9b NUTCH-3055 README: fix Github "hub" commands by Sebastian Nagel · 1 year ago
- bfa07df Merge pull request #815 from sebastian-nagel/NUTCH-3044-generator-npe by Sebastian Nagel · 12 months ago
- c13dc1d NUTCH-3057 - Fix for index-arbitrary plugin improper retention and use of calculated value for arbitrary field after an exception by Joe Gilvary · 12 months ago
- 8abc78a NUTCH-3041 Address confusing logging in o.a.n.net.URLExemptionFilters (#813) by Lewis John McGibbney · 12 months ago
- 5f1330a NUTCH-3043 Generator: count URLs rejected by URL filters (#814) by Sebastian Nagel · 12 months ago
- ea9c7ee NUTCH-3039 Failure to handle ftp:// URLs by Sebastian Nagel · 1 year, 1 month ago
- 7ac3ce2 NUTCH-3054 Address deprecation of Node16 for all GitHub Actions (#817) by Lewis John McGibbney · 1 year ago
- d43f579 NUTCH-1806 Delegate processing of URL domains to crawler commons by Sebastian Nagel · 1 year ago
- e0fa357 NUTCH-1806 Delegate processing of URL domains to crawler commons by Sebastian Nagel · 1 year, 1 month ago
- bc2ae7e NUTCH-1806 Delegate processing of URL domains to crawler commons by Sebastian Nagel · 1 year ago
- f6bcec9 NUTCH-1806 Delegate processing of URL domains to crawler commons by Sebastian Nagel · 8 years ago
- 817af69 Boostrap Nutch 1.21 development drive. by Lewis John McGibbney · 1 year ago
- c0b9461 Add GitHub CI badge to README by Lewis John McGibbney · 1 year ago
- b153279 NUTCH-3044 Generator: NPE when extracting the host part of a URL fails by Sebastian Nagel · 1 year, 1 month ago
- 4729786 NUTCH-3044 Generator: NPE when extracting the host part of a URL fails by Sebastian Nagel · 1 year, 1 month ago
- 4b26353 NUTCH-3044 Generator: NPE when extracting the host part of a URL fails by Sebastian Nagel · 1 year, 1 month ago
- 271f92e NUTCH-3038 Address issues discovered during 1.20 release management dryrun (#811) by Lewis John McGibbney · 1 year, 1 month ago
- c9e2f4e NUTCH-3032 Code for an ArbitraryIndexingFilter to index values resolved by user POJO code at index time (#810) by Joe Gilvary · 1 year, 1 month ago
- 1563396 NUTCH-3036 Upgrade org.seleniumhq.selenium:selenium-java dependency i… (#807) by Lewis John McGibbney · 1 year, 1 month ago
- 5a95bc6 NUTCH-3035 Update license and notice file for release of 1.20 (#808) by Sebastian Nagel · 1 year, 1 month ago
- 3905a8d NUTCH-3037 Upgrade org.apache.kafka:kafka_2.12: to v3.7.0 (#809) by Lewis John McGibbney · 1 year, 1 month ago
- 367988d NUTCH-3008 indexer-elastic: downgrade to ES 7.10.2 to address licensing issues by Sebastian Nagel · 1 year, 2 months ago
- 9890223 NUTCH-3029 by Markus Jelsma · 1 year, 2 months ago
- a8ec17c NUTCH-3029 Host specific max. and min. intervals in adaptive scheduler by Markus Jelsma · 1 year, 2 months ago
- 84cda2a NUTCH-3029 Host specific max. and min. intervals in adaptive scheduler by Markus Jelsma · 1 year, 2 months ago
- 5ba50c0 NUTCH-3029 Host specific max. and min. intervals in adaptive scheduler by Markus Jelsma · 1 year, 2 months ago
- 4f62dec NUTCH-3033 Upgrade Ivy to v2.5.2 (#803) by Lewis John McGibbney · 1 year, 2 months ago
- 4642c30 NUTCH-3029 Host specific max. and min. intervals in adaptive scheduler by Markus Jelsma · 1 year, 2 months ago
- 551c50b NUTCH-3030 Use system default cipher suites instead of hard-coded set by Markus Jelsma · 1 year, 2 months ago
- 42b55f6 Update Dockerfile / JAVA_HOME - 2nd try (#805) by Jakob Berlin · 1 year, 2 months ago
- c390dfc NUTCH-3031 ProtocolFactory host mapper to support domains by Markus Jelsma · 1 year, 2 months ago
- 83acd50 Update crawl documentation by Jakob Berlin · 1 year, 5 months ago
- 6b04554 Merge branch 'master' of https://gitbox.apache.org/repos/asf/nutch by Markus Jelsma · 1 year, 4 months ago
- d95e1a7 NUTCH-3027 Trivial resource leak patch in DomainSuffixes.java by Markus Jelsma · 1 year, 4 months ago
- 85fea6e NUTCH-3024 Remove flaky 'dependency check' target (#795) by Lewis John McGibbney · 1 year, 6 months ago
- 0765367 fix for NUTCH-2812 contributed by GabeHaegele by Gabe · 1 year, 6 months ago
- 7ad382d Merge pull request #796 from DigitalPebble/NUTCH-3025 by Sebastian Nagel · 1 year, 6 months ago
- 49d85ea Merged changes from master; improved Javadoc and exception handling by Julien Nioche · 1 year, 6 months ago
- adadc43 Merge branch 'NUTCH-3017', closes #793 by Sebastian Nagel · 1 year, 6 months ago
- ac383fc [NUTCH-3017] Allow fast-urlfilter to load from HDFS/S3 and support gzipped input by Sebastian Nagel · 1 year, 6 months ago
- d764e4c Added filtering on whole string + documented config in nutch-default + fixed tests by Julien Nioche · 1 year, 6 months ago
- 9084912 NUTCH-3020 -- ParseSegment should check for okhttp's truncation flag (#794) by Tim Allison · 1 year, 6 months ago
- f88b9a1 NUTCH-3019 -- update Tika (#797) by Tim Allison · 1 year, 6 months ago
- d8e66ce [NUTCH-3025] urlfilter-fast to filter based on the length of the URL by Julien Nioche · 1 year, 6 months ago
- bbf0867 NUTCH-3014 Standardize Job names (#789) by Lewis John McGibbney · 1 year, 6 months ago
- d1025fd [NUTCH-3017] Allow fast-urlfilter to load from HDFS/S3 and support gzipped input by Julien Nioche · 1 year, 7 months ago
- 792ed28 NUTCH-3015 Add more CI steps to GitHub master-build.yml (#790) by Lewis John McGibbney · 1 year, 7 months ago
- 8431dcf NUTCH-3013 Employ commons-lang3's StopWatch to simplify timing logic (#788) by Lewis John McGibbney · 1 year, 7 months ago
- d2c3e96 NUTCH-3012 SegmentReader when dumping with option -recode: NPE on unparsed documents by Sebastian Nagel · 1 year, 7 months ago
- b081c75 NUTCH-3011 HttpRobotRulesParser: handle HTTP 429 Too Many Requests same as server errors (HTTP 5xx) by Sebastian Nagel · 1 year, 7 months ago