  1. c0f723e NUTCH-2957 indexer-solr / Solr schema.xml by Sebastian Nagel · 1 year, 11 months ago
  2. edebfe4 NUTCH-2955 indexer-solr: replace deprecated/removed field type solr.LatLonType by Sebastian Nagel · 1 year, 11 months ago
  3. a5a6300 Merge pull request #729 from sebastian-nagel/NUTCH-2947-keep-stateful-fetch-queues by Sebastian Nagel · 1 year, 10 months ago
  4. 82f9530 Merge pull request #697 from sebastian-nagel/NUTCH-2896-okhttp-connection-pool by Sebastian Nagel · 1 year, 10 months ago
  5. b7b8345 NUTCH-2958 Upgrade to crawler-commons 1.3 (#740) by Sebastian Nagel · 1 year, 11 months ago
  6. 957d460 NUTCH-2290 Update licenses of bundled libraries by Sebastian Nagel · 1 year, 11 months ago
  7. 9a59ec9 NUTCH-2290 Update licenses of bundled libraries by Sebastian Nagel · 1 year, 11 months ago
  8. 78f6f40 NUTCH-2290 Update licenses of bundled libraries by Sebastian Nagel · 1 year, 11 months ago
  9. 2fbd309 NUTCH-2290 Update licenses of bundled libraries by Sebastian Nagel · 1 year, 11 months ago
  10. 1d1eb63 NUTCH-2290 Update licenses of bundled libraries by Sebastian Nagel · 1 year, 11 months ago
  11. a107131 NUTCH-2290 Update licenses of bundled libraries by Sebastian Nagel · 1 year, 11 months ago
  12. eba8f38 NUTCH-2290 Update licenses of bundled libraries by Sebastian Nagel · 1 year, 11 months ago
  13. ddca1c2 NUTCH-2822 Split the LICENSE.txt file into two files for source resp. binary releases by Sebastian Nagel · 1 year, 11 months ago
  14. 1aec06f Upgrade to Apache Rat 0.14 (download of Rat 0.13 failed) by Sebastian Nagel · 1 year, 11 months ago
  15. dfe430b NUTCH-2861 Remove parse-swf by Sebastian Nagel · 1 year, 11 months ago
  16. 8fc4f17 NUTCH-2956 index-geoip: dependency upgrades and improvements by Sebastian Nagel · 1 year, 11 months ago
  17. 01ab00b NUTCH-2953 Indexer Elastic to ignore SSL issues by Sebastian Nagel · 1 year, 11 months ago
  18. e71841f NUTCH-2952 Upgrade core dependencies by Sebastian Nagel · 2 years ago
  19. 487110b NUTCH-2936 Early registration of URL stream handlers provided by plugins may fail Hadoop jobs by Sebastian Nagel · 2 years ago
  20. 1f5f3e4 NUTCH-2936 Early registration of URL stream handlers provided by plugins may fail Hadoop jobs running in distributed mode if protocol-okhttp is used by Sebastian Nagel · 2 years, 1 month ago
  21. 03e0ffd NUTCH-2936 Early registration of URL stream handlers provided by plugins may fail Hadoop jobs running in distributed mode if protocol-okhttp is used by Sebastian Nagel · 2 years ago
  22. 5b970ff NUTCH-2951 Crawl datum with metadata WRITABLE_GENERATE_TIME_KEY awaits fetching forever by Sebastian Nagel · 2 years, 1 month ago
  23. 467e591 NUTCH-2896 Protocol-okhttp: make connection pool configurable by Sebastian Nagel · 2 years, 9 months ago
  24. af44bcb NUTCH-2896 Protocol-okhttp: make connection pool configurable by Sebastian Nagel · 2 years, 9 months ago
  25. 47d3fe6 Merge pull request #731 from sebastian-nagel/NUTCH-2950-update-hostdb-performance by Sebastian Nagel · 2 years, 1 month ago
  26. 02dca3b NUTCH-2936 Early registration of URL stream handlers provided by plugins may fail Hadoop jobs running in distributed mode (#726) by Lewis John McGibbney · 2 years, 1 month ago
  27. 947e67b NUTCH-2950 Improve performance of UpdateHostDb - fix Javadoc errors / warnings by Sebastian Nagel · 2 years, 1 month ago
  28. bafa752 Fail javadoc build on all kinds of javadoc errors and warnings by Sebastian Nagel · 2 years, 1 month ago
  29. 5086958 NUTCH-2950 Improve performance of UpdateHostDb by Sebastian Nagel · 2 years, 1 month ago
  30. 13f8504 Improve performance of UpdateHostDb by Sebastian Nagel · 2 years, 1 month ago
  31. 417dee6 NUTCH-2950 Improve performance of UpdateHostDb by Sebastian Nagel · 2 years, 1 month ago
  32. 5a6ac3b NUTCH-2950 Improve performance of UpdateHostDb by Sebastian Nagel · 2 years, 1 month ago
  33. 70b2d5e NUTCH-2950 Improve performance of UpdateHostDb by Sebastian Nagel · 2 years, 1 month ago
  34. 8cfa53f NUTCH-2947 Fetcher: keep state of empty but stateful fetch queues by Sebastian Nagel · 2 years, 1 month ago
  35. c862d24 NUTCH-2947 Fetcher: keep state of empty but stateful fetch queues by Sebastian Nagel · 2 years, 5 months ago
  36. bdbe7b3 NUTCH-2946 Fetcher: optionally slow down fetching from hosts with repeated exceptions by Sebastian Nagel · 2 years, 2 months ago
  37. 42ae2a3 NUTCH-2946 Fetcher: slow down fetching from hosts where requests fail repeatedly by Sebastian Nagel · 2 years, 5 months ago
  38. 568993b NUTCH-2948 Upgrade dependencies to Any23 2.7 and Tika 2.3.0 by Sebastian Nagel · 2 years, 2 months ago
  39. f8967c4 NUTCH-2923: Added JobId in Job Failure logs (#721) by Prakhar Chaube · 2 years, 5 months ago
  40. f691bae NUTCH-2573 Suspend crawling if robots.txt fails to fetch with 5xx status (#724) by Sebastian Nagel · 2 years, 5 months ago
  41. d565f45 NUTCH-2935 DeduplicationJob: failure on URLs with invalid percent encoding by Sebastian Nagel · 2 years, 5 months ago
  42. 847e19d NUTCH-2919 Upgrade to Tika 2.2.1 and Any23 2.6 (#717) by Lewis John McGibbney · 2 years, 5 months ago
  43. f4ce845 Merge pull request #722 from sebastian-nagel/NUTCH-2929-fetcher-threads-slow-start by Sebastian Nagel · 2 years, 5 months ago
  44. 34e7b03 NUTCH-2929 Fetcher: start threads slowly to avoid that resources are temporarily exhausted by Sebastian Nagel · 2 years, 5 months ago
  45. 78e827a Merge pull request #703 from sebastian-nagel/NUTCH-2903-indexer-elastic-https by Sebastian Nagel · 2 years, 6 months ago
  46. e76d69f NUTCH-2429 Fix Plugin System to allow protocol plugins to bundle their URLStreamHandlers (#720) by Lewis John McGibbney · 2 years, 6 months ago
  47. 1d9ebf1 Upgrade to log4j 2.17.0 (#719) by Sebastian Nagel · 2 years, 6 months ago
  48. 938e2dd NUTCH-2917 Remove transitive dependency to log4j 1.x (#718) by Sebastian Nagel · 2 years, 6 months ago
  49. a9b50a7 NUTCH-2449 Replace Tika LanguageIdentifier in language-identifier (#716) by Lewis John McGibbney · 2 years, 6 months ago
  50. bb3d8bc NUTCH-2914 nutch-default.xml: remove obsolete and unused properties (#709) by Sebastian Nagel · 2 years, 6 months ago
  51. 3fe291b NUTCH-2807 SitemapProcessor to warn that ignoring robots.txt affects detection of sitemaps (#710) by Sebastian Nagel · 2 years, 6 months ago
  52. af29192 Merge pull request #711 from sebastian-nagel/NUTCH-2808 by Sebastian Nagel · 2 years, 6 months ago
  53. 4caa5ce NUTCH-2918 Upgrade to log4j 2.16.0 (#715) by Sebastian Nagel · 2 years, 6 months ago
  54. dc6e320 NUTCH-2916 Fix log file rotation / rename default log file (#714) by Sebastian Nagel · 2 years, 6 months ago
  55. 9671b64 Merge pull request #713 from sebastian-nagel/NUTCH-2915 by Sebastian Nagel · 2 years, 6 months ago
  56. 0c9971d NUTCH-2915 Upgrade to log4j 2.15.0 by Sebastian Nagel · 2 years, 6 months ago
  57. 68bd4b5 Update documentation of protocol-related properties in nutch-default.xml by Sebastian Nagel · 5 years ago
  58. dc6f78b NUTCH-2808 Document side effects of ignoring robots.txt by Sebastian Nagel · 2 years, 7 months ago
  59. 9a2f94f Merge pull request #539 from lewismc/NUTCH-2803 by Sebastian Nagel · 2 years, 7 months ago
  60. b2ccbc9 Merge branch 'master' into NUTCH-2803 by Sebastian Nagel · 2 years, 7 months ago
  61. a62168c Merge pull request #708 from prakharchaube/NUTCH-2911 by Sebastian Nagel · 2 years, 7 months ago
  62. 6662081 NUTCH-2911: Added InterruptedException to throws to allow propagation by prakharchaube · 2 years, 7 months ago
  63. dd27044 Merge pull request #704 from sebastian-nagel/NUTCH-2905-index-writers-logging-mask-credentials by Sebastian Nagel · 2 years, 7 months ago
  64. 02cd13c Merge pull request #707 from sebastian-nagel/NUTCH-2908 by Sebastian Nagel · 2 years, 7 months ago
  65. 671f904 Merge pull request #700 from sebastian-nagel/NUTCH-2891-tika-2.1 by Sebastian Nagel · 2 years, 7 months ago
  66. f7705b9 NUTCH-2911: Caught and added log for InterruptedException by prakharchaube · 2 years, 7 months ago
  67. 621c884 NUTCH-2891 Upgrade to Tika 2.1.0 by Sebastian Nagel · 2 years, 7 months ago
  68. 511e4a9 fix for NUTCH-2911 contributed by prakharchaube by prakharchaube · 2 years, 7 months ago
  69. 0dc6959 NUTCH-2908 Log mapreduce job messages and counters in local mode (Log4j2) by Sebastian Nagel · 2 years, 7 months ago
  70. ff800c5 Merge pull request #705 from sebastian-nagel/NUTCH-2867 by Sebastian Nagel · 2 years, 7 months ago
  71. ebf3036 NUTCH-2867 Support for custom HostDb aggregators - complete Javadoc by Sebastian Nagel · 2 years, 7 months ago
  72. 75daf3e Merge pull request #706 from sebastian-nagel/NUTCH-2865 by Sebastian Nagel · 2 years, 7 months ago
  73. 64fb604 Merge pull request #695 from lewismc/NUTCH-2892 by Sebastian Nagel · 2 years, 7 months ago
  74. 5f6f627 NUTCH-2867 Support for custom HostDb aggregators by Sebastian Nagel · 2 years, 7 months ago
  75. 1cff230 NUTCH-2892 Upgrade to Any23 2.5 by Sebastian Nagel · 2 years, 7 months ago
  76. 25ccf89 Merge pull request #702 from sebastian-nagel/NUTCH-2904-crawler-commons-1.2 by Sebastian Nagel · 2 years, 7 months ago
  77. 6bb30c7 NUTCH-2865 WARC exporter support for metadata and dropping empty responses by Sebastian Nagel · 2 years, 7 months ago
  78. 9909a61 NUTCH-2867 Support for custom HostDb aggregators - rename aggregator interface by Sebastian Nagel · 2 years, 7 months ago
  79. ad44f55 NUTCH-2867 Support for custom HostDb aggregators by Sebastian Nagel · 2 years, 7 months ago
  80. 1e7eb52 NUTCH-2867 Support for custom HostDb aggregators (patch contributed by markus) by Sebastian Nagel · 2 years, 7 months ago
  81. e837324 Merge branch 'NUTCH-2902' (contributed by Max Ockner) by Sebastian Nagel · 2 years, 7 months ago
  82. f8ec624 NUTCH-2905 Mask sensitive strings in log output of index writers by Sebastian Nagel · 2 years, 7 months ago
  83. 7fb201c NUTCH-2902 Jexl parsing error on statements by Sebastian Nagel · 2 years, 7 months ago
  84. c2f8d37 NUTCH-2903 indexer-elastic: allow to connect to Elastic server via HTTPS by Sebastian Nagel · 2 years, 7 months ago
  85. e0ff4fb Exclude log4j-api as Elasticsearch client dependency to avoid by Sebastian Nagel · 2 years, 7 months ago
  86. 6393cac NUTCH-2903 indexer-elastic: allow to connect to Elastic server via HTTPS by Sebastian Nagel · 2 years, 7 months ago
  87. 568d73a Upgrade to crawler-commons 1.2 by Sebastian Nagel · 2 years, 7 months ago
  88. 80ac3b5 Merge pull request #701 from sebastian-nagel/NUTCH-2899 by Sebastian Nagel · 2 years, 8 months ago
  89. ad61dd1 NUTCH-2891 Upgrade to Tika 2.1.0 - re-enable language-identifier test by Sebastian Nagel · 2 years, 8 months ago
  90. 9894096 NUTCH-2899 Remove needless warning about missing o/a/rat/anttasks/antlib.xml by Sebastian Nagel · 2 years, 8 months ago
  91. 310bc60 Merge pull request #699 from sebastian-nagel/NUTCH-2862-ivy-jar-not-in-source-package by Sebastian Nagel · 2 years, 8 months ago
  92. b9a4856 quick IntelliJ IDEA setup docs added (#698) by Abu Sufyan · 2 years, 8 months ago
  93. b0cbea5 NUTCH-2891 Upgrade to Tika 2.1.0 by Sebastian Nagel · 2 years, 9 months ago
  94. 351dc50 NUTCH-2862 Do not include Ivy jar in source release package by Sebastian Nagel · 2 years, 8 months ago
  95. c48b8d1 Merge pull request #694 from sebastian-nagel/NUTCH-2890-okhttp-4.9.1 by Sebastian Nagel · 2 years, 9 months ago
  96. b64eceb NUTCH-2890 Upgrade protocol-okhttp to use OkHttp 4.9.1 by Sebastian Nagel · 2 years, 9 months ago
  97. ed6a942 NUTCH-2890 Upgrade protocol-okhttp to use OkHttp 4.9.1 by Sebastian Nagel · 3 years, 5 months ago
  98. eeb9863 Merge pull request #696 from sebastian-nagel/NUTCH-2894-plugin-compile-classpath by Sebastian Nagel · 2 years, 9 months ago
  99. e749e3f NUTCH-2894 Java plugin compilation classpath: priorize plugin dependencies by Sebastian Nagel · 2 years, 9 months ago
  100. 004b62d fireant upgrade dependency elasticsearch-rest-high-level-client in src/plugin/indexer-elastic/ivy.xml from 7.11.1 to 7.13.2 (#688) by Lewis John McGibbney · 2 years, 9 months ago