  1. ea862f4 Merge pull request #496 from balashashanka/NUTCH-2649 by Sebastian Nagel · 4 years, 4 months ago
  2. 0eec1f8 Fix for NUTCH-2649: Optionally skip TLS/SSL certificate validation for protocol-selenium and protocol-htmlunit by Bala S Shashanka · 4 years, 4 months ago
  3. 0a2ffa7 Merge pull request #492 from sebastian-nagel/NUTCH-2733-protocol-okhttp-support-brotli by Sebastian Nagel · 4 years, 4 months ago
  4. 6f1a0dd NUTCH-2733 protocol-okhttp: add support for Brotli compression (Content-Encoding) by Sebastian Nagel · 4 years, 4 months ago
  5. a209946 NUTCH-2733 protocol-okhttp: add support for Brotli compression (Content-Encoding) by Sebastian Nagel · 4 years, 4 months ago
  6. a118c85 Merge pull request #491 from sebastian-nagel/NUTCH-2759-bin-crawl-rename-num-slaves by Sebastian Nagel · 4 years, 4 months ago
  7. 5de2679 Merge pull request #493 from sebastian-nagel/NUTCH-2525 by Sebastian Nagel · 4 years, 4 months ago
  8. 040d71d NUTCH-2759 bin/crawl: Rename option --num-slaves - renamed to --num-fetchers by Sebastian Nagel · 4 years, 4 months ago
  9. aa72b75 Fix for NUTCH-2649: Optionally skip TLS/SSL certificate validation for protocol-selenium and protocol-htmlunit by Bala S Shashanka · 4 years, 4 months ago
  10. d70dbc5 NUTCH-2762 Replace http:// URLs by https:// (build files and documentation) by Sebastian Nagel · 4 years, 4 months ago
  11. dd51cb0 Merge pull request #494 from sebastian-nagel/NUTCH-2671-ivy-jar-download-url by Sebastian Nagel · 4 years, 4 months ago
  12. fb92db9 NUTCH-2761 ivy jar fails to download - move to https download link on repo1 by Sebastian Nagel · 4 years, 4 months ago
  13. 34d2d20 NUTCH-2525 Metadata indexer cannot handle uppercase parse metadata by Sebastian Nagel · 4 years, 8 months ago
  14. c4dd7c1 Merge pull request #486 from sebastian-nagel/NUTCH-2184-indexer-no-crawldb by Sebastian Nagel · 4 years, 4 months ago
  15. 57802d1 NUTCH-2184 Enable IndexingJob to function with no crawldb by Sebastian Nagel · 4 years, 4 months ago
  16. 3bbc6dd Merge pull request #489 from sebastian-nagel/NUTCH-2760-protocol-okhttp-request-message-http-version by Sebastian Nagel · 4 years, 4 months ago
  17. 8a663f9 Fix for NUTCH-1863: Add JSON format dump output to readdb command (#490) by Shashanka Balakuntala Srinivasa · 4 years, 5 months ago
  18. b8d1e4f Merge pull request #487 from commoncrawl/NUTCH-2754-max-crawl-delay by Sebastian Nagel · 4 years, 5 months ago
  19. 80b8a1c Merge pull request #488 from sebastian-nagel/NUTCH-2475-deploy-solr-schema-xml by Sebastian Nagel · 4 years, 5 months ago
  20. 21018be NUTCH-2760 protocol-okhttp: properly record HTTP version in request message header by Sebastian Nagel · 4 years, 5 months ago
  21. 8f15b84 NUTCH-2745 Solr schema.xml not shipped in binary release by Sebastian Nagel · 4 years, 5 months ago
  22. 4c74bce NUTCH-2754 fetcher.max.crawl.delay ignored if exceeding 5 min. / 300 sec. by Sebastian Nagel · 4 years, 5 months ago
  23. ac9c435 Merge pull request #485 from sebastian-nagel/NUTCH-2748-redir-exceeded by Sebastian Nagel · 4 years, 6 months ago
  24. 969a194 NUTCH-2748 Fetch status gone (redirect exceeded) not to overwrite existing items in CrawlDb by Sebastian Nagel · 4 years, 6 months ago
  25. ec45fe5 Merge pull request #480 from sebastian-nagel/NUTCH-2746-url-normalizer-basic-idn by Sebastian Nagel · 4 years, 6 months ago
  26. c4fade4 NUTCH-2184 Enable IndexingJob to function with no crawldb by Sebastian Nagel · 4 years, 6 months ago
  27. c23afa8 Merge pull request #484 from balashashanka/NUTCH-2739 by Sebastian Nagel · 4 years, 6 months ago
  28. 597eb71 NUTCH-2739: Upgrade ES and migrate to REST client by Shashanka Balakuntala Srinivasa · 4 years, 6 months ago
  29. a390922 Merge pull request #483 from sju/NUTCH-2750 by Sebastian Nagel · 4 years, 6 months ago
  30. c43b486 NUTCH-2746 Basic URL normalizer to normalize Unicode domain names by Sebastian Nagel · 5 years ago
  31. b554145 Merge pull request #481 from sebastian-nagel/NUTCH-1559-dupl-metatags by Sebastian Nagel · 4 years, 6 months ago
  32. 497936b NUTCH-1559 parse-metatags duplicates extracted metatags by Sebastian Nagel · 4 years, 7 months ago
  33. 65361d0 Merge pull request #482 from balashashanka/NUTCH-2747 by Sebastian Nagel · 4 years, 6 months ago
  34. cf2430f Fix for NUTCH-2747: Fixed indentation according to http://svn.apache.org/repos/asf/nutch/branches/2.x/eclipse-codeformat.xml by Shashanka Balakuntala Srinivasa · 4 years, 7 months ago
  35. 81fee06 Fix for NUTCH-2750 by Jurian Broertjes · 4 years, 7 months ago
  36. 2dc9032 fix for NUTCH-2747 contributed by balashashanka by Shashanka Balakuntala Srinivasa · 4 years, 7 months ago
  37. 6ca3c5b Merge pull request #479 from YossiTamari/patch-6 by Sebastian Nagel · 4 years, 7 months ago
  38. d046b46 Add sitemap.size.max by YossiTamari · 4 years, 7 months ago
  39. 968fc7e Also set file.content.limit. by YossiTamari · 4 years, 7 months ago
  40. 83ea207 Nutch-2511 Support large sitemaps by YossiTamari · 4 years, 7 months ago
  41. f1cf2ff Prepare for new development after release of 1.16 by Sebastian Nagel · 4 years, 7 months ago
  42. 087aea6 Merge pull request #478 from sebastian-nagel/NUTCH-2279-linkrank-output-compression by Sebastian Nagel · 4 years, 8 months ago
  43. a2762f0 Merge pull request #477 from sebastian-nagel/NUTCH-2737-generator-log-selection by Sebastian Nagel · 4 years, 8 months ago
  44. 2f310ae Generator: improve description of crawl.gen.delay by Sebastian Nagel · 4 years, 8 months ago
  45. 0347527 NUTCH-2279 LinkRank fails when using Hadoop MR output compression by Sebastian Nagel · 4 years, 8 months ago
  46. 4d68c08 NUTCH-2740 Generator: generate.max.count overflow not logged by Sebastian Nagel · 4 years, 8 months ago
  47. 44ded9b Generator: apply formatting by Sebastian Nagel · 4 years, 8 months ago
  48. 35da06f NUTCH-2737 Generator: count and log reason of rejections during selection by Sebastian Nagel · 4 years, 8 months ago
  49. 8d21260 Generator: fix logging of hostdb path by Sebastian Nagel · 4 years, 8 months ago
  50. e46232d NUTCH-2738 Generator: document property generate.restrict.status by Sebastian Nagel · 4 years, 8 months ago
  51. f02c98e NUTCH-2737 Generator: count and log reason of rejections during selection by Sebastian Nagel · 4 years, 8 months ago
  52. 873d7bf Merge pull request #473 from sebastian-nagel/NUTCH-2381-text-prof-signature-lexicographic-sorting by Sebastian Nagel · 4 years, 8 months ago
  53. 9e49c3f Merge pull request #474 from sebastian-nagel/NUTCH-2457-parse-tika-embedded-docs by Sebastian Nagel · 4 years, 8 months ago
  54. c9238a1 NUTCH-2457 Embedded documents likely not correctly parsed by Tika by Sebastian Nagel · 4 years, 8 months ago
  55. 9c424f9 NUTCH-2457 Embedded documents likely not correctly parsed by Tika by Sebastian Nagel · 4 years, 8 months ago
  56. 29865b2 NUTCH-2457 Embedded documents likely not correctly parsed by Tika by Sebastian Nagel · 4 years, 8 months ago
  57. 0f46927 Merge pull request #476 from sebastian-nagel/NUTCH-2482-index-geoip-npe by Sebastian Nagel · 4 years, 8 months ago
  58. ff9f025 Merge pull request #475 from r0ann3l/NUTCH-2732 by Sebastian Nagel · 4 years, 8 months ago
  59. 254a968 NUTCH-2482 index-geoip not to add null values to document fields by Sebastian Nagel · 4 years, 8 months ago
  60. 8875a8f Merge branch 'master' into NUTCH-2732 by r0ann3l · 4 years, 8 months ago
  61. 026d2e7 NUTCH-2732: nutch-default.xml as a file that is not a template, since this file should not be modified. by r0ann3l · 4 years, 8 months ago
  62. 45a66f7 NUTCH-2381 In some situations the class TextProfileSignature by Sebastian Nagel · 4 years, 8 months ago
  63. caa9422 Merge pull request #472 from sebastian-nagel/NUTCH-2736-docker-upgrade-base-image by Sebastian Nagel · 4 years, 8 months ago
  64. c735ebb NUTCH-2736 Upgrade Dockerfile to be based on recent Ubuntu LTS version by Sebastian Nagel · 4 years, 8 months ago
  65. f1c352e Merge pull request #471 from sebastian-nagel/NUTCH-1982-ide-setup by Sebastian Nagel · 4 years, 8 months ago
  66. 9e5ae73 Merge pull request #468 from r0ann3l/NUTCH-2654 by Roannel Fernández Hernández · 4 years, 8 months ago
  67. 2a6fbfe NUTCH-2654: a note about the new location of schema.xml file by r0ann3l · 4 years, 8 months ago
  68. 600c298 NUTCH-1982 Make Git ignore IDE project files and add note about IDE setup by Sebastian Nagel · 4 years, 8 months ago
  69. 35d28ce Merge branch 'master' into NUTCH-2654 by r0ann3l · 4 years, 8 months ago
  70. 87b08fc Merge branch 'master' of https://gitbox.apache.org/repos/asf/nutch by Markus Jelsma · 4 years, 8 months ago
  71. 9dbb4be NUTCH-2612 Support for sitemap processing by hostname by Markus Jelsma · 4 years, 8 months ago
  72. 1fa627c Merge pull request #469 from r0ann3l/NUTCH-2732 by Roannel Fernández Hernández · 4 years, 8 months ago
  73. 596146e NUTCH-2732: .template for configuration files. by r0ann3l · 4 years, 8 months ago
  74. 0adeed2 NUTCH-2654: schema.xml moved to indexer-solr folder. by r0ann3l · 4 years, 8 months ago
  75. 5cc1239 NUTCH-2654: Obsolete configuration deleted by r0ann3l · 4 years, 9 months ago
  76. 830ca8e Merge pull request #440 from sebastian-nagel/NUTCH-2696-segment-reader-output-charset by Sebastian Nagel · 4 years, 9 months ago
  77. 9dd11cd Merge pull request #435 from sebastian-nagel/NUTCH-2598-normalizerchecker-fails-on-invalid-url by Sebastian Nagel · 4 years, 9 months ago
  78. 9a9f425 Merge pull request #462 from sebastian-nagel/NUTCH-2729-protocol-okhttp-mark-truncated by Sebastian Nagel · 4 years, 9 months ago
  79. fa9f895 Merge pull request #467 from sebastian-nagel/NUTCH-2669-javax-ws-packaging-type by Sebastian Nagel · 4 years, 9 months ago
  80. 5d2a76d Merge pull request #465 from r0ann3l/NUTCH-2719 by Roannel Fernández Hernández · 4 years, 9 months ago
  81. 6e54f72 Merge pull request #466 from r0ann3l/NUTCH-2718 by Roannel Fernández Hernández · 4 years, 9 months ago
  82. 77fe258 NUTCH-2669 Reliable solution for javax.ws packaging.type by Sebastian Nagel · 4 years, 9 months ago
  83. efcafb6 NUTCH-2729 protocol-okhttp: fix marking of truncated content by Sebastian Nagel · 4 years, 9 months ago
  84. 36c2ce6 NUTCH-2598 URLNormalizerChecker fails on invalid URLs in input by Sebastian Nagel · 5 years ago
  85. 6f5eab3 NUTCH-2718: file names of configuration files of index writers and exchanges are configurable. by r0ann3l · 4 years, 9 months ago
  86. 5c45172 NUTCH-2729 protocol-okhttp: fix marking of truncated content by Sebastian Nagel · 4 years, 9 months ago
  87. a82a663 NUTCH-2729 protocol-okhttp: fix marking of truncated content by Sebastian Nagel · 4 years, 9 months ago
  88. 73b1286 NUTCH-2719: Showing a warning when an exchange points to an indexer that doesn't exist. by r0ann3l · 4 years, 9 months ago
  89. 4695c27 Merge pull request #461 from sebastian-nagel/NUTCH-2728-protocol-okhttp-3.14.2 by Sebastian Nagel · 4 years, 9 months ago
  90. caa6d5c Merge pull request #460 from sebastian-nagel/NUTCH-2727-upgrade-Hadoop-2.9.2 by Sebastian Nagel · 4 years, 9 months ago
  91. 1698f6a NUTCH-2727 Upgrade Hadoop dependencies to 2.9.2 by Sebastian Nagel · 4 years, 9 months ago
  92. da2b51b Merge pull request #459 from sebastian-nagel/NUTCH-2726-tika-1.22 by Sebastian Nagel · 4 years, 9 months ago
  93. dda4578 Merge pull request #464 from deanpearce/bug/solr-cleanup by Sebastian Nagel · 4 years, 9 months ago
  94. 2408256 NUTCH-2731 Fix to ensure that credentials are set on delete operations. by Dean Pearce · 4 years, 9 months ago
  95. eebb807 NUTCH-2728 protocol-okhttp: upgrade okhttp dependency to 3.14.2 by Sebastian Nagel · 4 years, 9 months ago
  96. 09b7142 NUTCH-2726 Upgrade to Tika 1.22 by Sebastian Nagel · 4 years, 9 months ago
  97. f02c41a NUTCH-2702 Fetcher: suppress stack for frequent exceptions by Sebastian Nagel · 5 years ago
  98. 54f73bf NUTCH-2725 Plugin lib-http to support per-host configurable cookies by Markus Jelsma · 4 years, 10 months ago
  99. a67c9be NUTCH-2724 Metadata indexer not to emit empty values by Markus Jelsma · 4 years, 10 months ago
  100. 9692464 Merge branch 'master' of https://gitbox.apache.org/repos/asf/nutch by Markus Jelsma · 4 years, 10 months ago