- 271f92e NUTCH-3038 Address issues discovered during 1.20 release management dryrun (#811) by Lewis John McGibbney · 2 weeks ago master
- c9e2f4e NUTCH-3032 Code for an ArbitraryIndexingFilter to index values resolved by user POJO code at index time (#810) by Joe Gilvary · 3 weeks ago
- 1563396 NUTCH-3036 Upgrade org.seleniumhq.selenium:selenium-java dependency i… (#807) by Lewis John McGibbney · 4 weeks ago
- 5a95bc6 NUTCH-3035 Update license and notice file for release of 1.20 (#808) by Sebastian Nagel · 4 weeks ago
- 3905a8d NUTCH-3037 Upgrade org.apache.kafka:kafka_2.12: to v3.7.0 (#809) by Lewis John McGibbney · 4 weeks ago
- 367988d NUTCH-3008 indexer-elastic: downgrade to ES 7.10.2 to address licensing issues by Sebastian Nagel · 6 weeks ago
- 9890223 NUTCH-3029 by Markus Jelsma · 6 weeks ago
- a8ec17c NUTCH-3029 Host specific max. and min. intervals in adaptive scheduler by Markus Jelsma · 6 weeks ago
- 84cda2a NUTCH-3029 Host specific max. and min. intervals in adaptive scheduler by Markus Jelsma · 6 weeks ago
- 5ba50c0 NUTCH-3029 Host specific max. and min. intervals in adaptive scheduler by Markus Jelsma · 6 weeks ago
- 4f62dec NUTCH-3033 Upgrade Ivy to v2.5.2 (#803) by Lewis John McGibbney · 6 weeks ago
- 4642c30 NUTCH-3029 Host specific max. and min. intervals in adaptive scheduler by Markus Jelsma · 6 weeks ago
- 551c50b NUTCH-3030 Use system default cipher suites instead of hard-coded set by Markus Jelsma · 6 weeks ago
- 42b55f6 Update Dockerfile / JAVA_HOME - 2nd try (#805) by Jakob Berlin · 6 weeks ago
- c390dfc NUTCH-3031 ProtocolFactory host mapper to support domains by Markus Jelsma · 6 weeks ago
- 83acd50 Update crawl documentation by Jakob Berlin · 4 months ago
- 6b04554 Merge branch 'master' of https://gitbox.apache.org/repos/asf/nutch by Markus Jelsma · 3 months ago
- d95e1a7 NUTCH-3027 Trivial resource leak patch in DomainSuffixes.java by Markus Jelsma · 3 months ago
- 85fea6e NUTCH-3024 Remove flaky 'dependency check' target (#795) by Lewis John McGibbney · 5 months ago
- 7ad382d Merge pull request #796 from DigitalPebble/NUTCH-3025 by Sebastian Nagel · 6 months ago
- 49d85ea Merged changes from master; improved Javadoc and exception handling by Julien Nioche · 6 months ago
- adadc43 Merge branch 'NUTCH-3017', closes #793 by Sebastian Nagel · 6 months ago
- ac383fc [NUTCH-3017] Allow fast-urlfilter to load from HDFS/S3 and support gzipped input by Sebastian Nagel · 6 months ago
- d764e4c Added filtering on whole string + documented config in nutch-default + fixed tests by Julien Nioche · 6 months ago
- 9084912 NUTCH-3020 -- ParseSegment should check for okhttp's truncation flag (#794) by Tim Allison · 6 months ago
- f88b9a1 NUTCH-3019 -- update Tika (#797) by Tim Allison · 6 months ago
- d8e66ce [NUTCH-3025] urlfilter-fast to filter based on the length of the URL by Julien Nioche · 6 months ago
- bbf0867 NUTCH-3014 Standardize Job names (#789) by Lewis John McGibbney · 6 months ago
- d1025fd [NUTCH-3017] Allow fast-urlfilter to load from HDFS/S3 and support gzipped input by Julien Nioche · 6 months ago
- 792ed28 NUTCH-3015 Add more CI steps to GitHub master-build.yml (#790) by Lewis John McGibbney · 6 months ago
- 8431dcf NUTCH-3013 Employ commons-lang3's StopWatch to simplify timing logic (#788) by Lewis John McGibbney · 6 months ago
- d2c3e96 NUTCH-3012 SegmentReader when dumping with option -recode: NPE on unparsed documents by Sebastian Nagel · 7 months ago
- b081c75 NUTCH-3011 HttpRobotRulesParser: handle HTTP 429 Too Many Requests same as server errors (HTTP 5xx) by Sebastian Nagel · 7 months ago
- ecdd19d NUTCH-2990 HttpRobotRulesParser to follow 5 redirects as specified by RFC 9309 (#779) by Sebastian Nagel · 6 months ago
- bb68385 NUTCH-3009 Upgrade to Hadoop 3.3.6 by Sebastian Nagel · 7 months ago
- e96cfc5 NUTCH-3002 Protocol-okhttp HttpResponse: HTTP header metadata lookup should be case-insensitive by Sebastian Nagel · 7 months ago
- 97eb0b5 Merge pull request #776 from tballison/NUTCH-2959 by Tim Allison · 6 months ago
- 9aabc45 update howto_upgrade_tika.txt by tallison · 7 months ago
- 9faf364 Working now locally and with Seb's single_node_cluster tests by tallison · 7 months ago
- e32a0e0 Merge remote-tracking branch 'upstream/master' into NUTCH-2959 by tallison · 7 months ago
- a74b57b NUTCH-2853 bin/nutch: remove deprecated commands solrindex, solrdedup, solrclean by Sebastian Nagel · 7 months ago
- a1ab433 NUTCH-2897 Do not suppress deprecated API warnings by Sebastian Nagel · 7 months ago
- 810b1d6 NUTCH-3010 Injector: count unique number of injected URLs by Sebastian Nagel · 7 months ago
- a72a53a NUTCH-3007 Fix impossible casts by Sebastian Nagel · 7 months ago
- 417b877 NUTCH-2852 SpotBugs: Method invokes System.exit(...) by Sebastian Nagel · 7 months ago
- 45c9de3 Merge pull request #778 from tballison/NUTCH-3004 by Tim Allison · 7 months ago
- 5be64d2 NUTCH-3004 -- propagate ssl exception if message doesn't match "handshake alert..." by tballison · 7 months ago
- 0f801c1 NUTCH-2959 -- downgrade commons-io to match the version we expect to come out with Hadoop 3.4.0. by tallison · 7 months ago
- f11d383 NUTCH-2959 -- bump commons-io by tallison · 7 months ago
- 6bfeaf4 NUTCH-2959 -- bump Tika to 2.9.0, bump common dependencies throughout by tallison · 7 months ago
- 85bdd00 Merge remote-tracking branch 'upstream/master' into NUTCH-2959 by tallison · 7 months ago
- d81be51 Merge pull request #772 from tballison/NUTCH-2978 by Tim Allison · 7 months ago
- 56fdbbe Merge remote-tracking branch 'upstream/master' into NUTCH-2978 by tballison · 7 months ago
- 51055ef NUTCH-2978 -- update slf4j-api by tballison · 7 months ago
- 10f7c0c NUTCH-2959 -- bump Tika to 2.9.0 by tallison · 7 months ago
- 0ad935f Merge pull request #775 from tballison/NUTCH-2998 by Tim Allison · 7 months ago
- f078a88 Merge pull request #774 from tballison/NUTCH-3001 by Tim Allison · 7 months ago
- c1ba16c Merge pull request #773 from tballison/NUTCH-3000 by Tim Allison · 7 months ago
- 8a5ef49 Remove Any23 from Nutch by tallison · 7 months ago
- b6f645a NUTCH-3001 - fix logic for grabbing bytes if there's no content type in the header by tallison · 7 months ago
- 820d129 NUTCH-3000 - the selenium protocol should return the full html, not just the inner body element. by tallison · 7 months ago
- daedbc3 NUTCH-2978 -- exclude reload4j and update LICENSE-binary and NOTICE-binary. by tballison · 8 months ago
- 0421775 Merge remote-tracking branch 'upstream/master' into NUTCH-2978 by tballison · 8 months ago
- 39d59f4 Merge pull request #771 from tballison/NUTCH-2999 by Tim Allison · 8 months ago
- 8d9c77f NUTCH-2999 -- upgrade lucene to latest 8.x throughout by tballison · 8 months ago
- cf74770 NUTCH-2978, upgrade to slf4j2 throughout, first steps by tballison · 8 months ago
- e93aa97 Merge pull request #770 from apache/NUTCH-2999 by Tim Allison · 8 months ago
- 3bb8b0e NUTCH-2999 -- upgrade Lucene to latest 8.x throughout by tballison · 8 months ago
- f5cd0d6 Merge pull request #768 from tballison/NUTCH-2989 by Tim Allison · 8 months ago
- 5a223e1 NUTCH-2989 -- ElasticIndexWriter should enable auth for https, too by tballison · 8 months ago
- 0fae6b5 NUTCH-2997 Add Override annotations by Sebastian Nagel · 8 months ago
- 070c115 NUTCH-2996 Use new SimpleRobotRulesParser API entry point crawler-commons 1.4 by Sebastian Nagel · 8 months ago
- a24ec5c NUTCH-2995 Upgrade to crawler-commons 1.4 by Sebastian Nagel · 8 months ago
- eae3c52 NUTCH-2993 ScoringDepth plugin to skip depth check based on URL Pattern by Sebastian Nagel · 9 months ago
- 9109bdd NUTCH-2991 Support HTTP/S Header Authorization for Solr connections (#763) by Sebastian Nagel · 11 months ago
- 98d02e7 NUTCH-2992 Fetcher: always block fetch queues when exceptions threshold is reached by Sebastian Nagel · 11 months ago
- 215993b NUTCH-2596 Upgrade from org.mortbay.jetty to org.eclipse.jetty by Sebastian Nagel · 1 year, 2 months ago
- b4cb5c1 NUTCH-2984 Drop test proxy server and benchmark tool by Sebastian Nagel · 1 year, 2 months ago
- 1999b1e NUTCH-2985 Disable plugin urlfilter-validator by default by Sebastian Nagel · 1 year, 2 months ago
- c8aecfa NUTCH-2983 nutch-default.xml improvements by Sebastian Nagel · 1 year, 2 months ago
- a92878d NUTCH-2972 Javadoc build fails using JDK 17 by Sebastian Nagel · 1 year, 2 months ago
- ef29496 NUTCH-2982 Generator: parameter for URL normalization not passed forward by Sebastian Nagel · 1 year, 2 months ago
- e8fd210 Add indexer-opensearch-1x to 4 more targets...feedback from sebastian-nagel by tballison · 1 year, 2 months ago
- e03cad3 fix template to include new key store info. remove unused auth by tallison · 1 year, 2 months ago
- 71fabb2 NUTCH-2920 -- improve username/pw logic and update README.md by tallison · 1 year, 2 months ago
- 5fc2839 NUTCH-2920 -- improve handling for missing trust.store.path in the index-writers.xml by tallison · 1 year, 2 months ago
- f6b1717 NUTCH-2920 -- add keystore for 2-way tls; add back in no-tls option with a stern warning and possibly helpful links. by tallison · 1 year, 2 months ago
- 6e149f4 NUTCH-2920 -- fix imports by tallison · 1 year, 2 months ago
- ca3824f NUTCH-2920 -- first working attempt at migrating ElasticsearchIndexWriter to OpenSearch by tallison · 1 year, 2 months ago
- 383aeca NUTCH-2980: Upgraded Selenium to 4.7.2 + HTMLUnit by Kamil Mroczek · 1 year, 3 months ago
- 19dbe78 Merge pull request #752 from sebastian-nagel/NUTCH-2974 by Sebastian Nagel · 1 year, 2 months ago
- 541e693 NUTCH-2974 Ant build fails with "Unparseable date" on certain platforms by Sebastian Nagel · 1 year, 3 months ago
- 9a1ed40 Merge pull request #751 from sebastian-nagel/NUTCH-2634 by Sebastian Nagel · 1 year, 4 months ago
- dfdd00f NUTCH-2634 Some links marked as "nofollow" are followed anyway by Sebastian Nagel · 1 year, 4 months ago
- 7d39004 NUTCH-2924 Generate maxCount expr evaluated only once by Markus Jelsma · 1 year, 4 months ago
- d806aa4 NUTCH-2977 by Markus Jelsma · 1 year, 5 months ago
- ed7b661 Merge pull request #748 from sebastian-nagel/NUTCH-2883-docker by Sebastian Nagel · 1 year, 8 months ago
- 7c1a48c NUTCH-2883 Provide means to run server and webapp as persistent services in Docker container by Sebastian Nagel · 1 year, 8 months ago
- 0bda1bd NUTCH-2883 Provide means to run server and webapp as persistent services in Docker container by Sebastian Nagel · 2 years, 7 months ago
- 989c2ca NUTCH-2883 Provide means to run server and webapp as persistent services in Docker container by Lewis John McGibbney · 2 years, 10 months ago