  1. a118c85 Merge pull request #491 from sebastian-nagel/NUTCH-2759-bin-crawl-rename-num-slaves by Sebastian Nagel · 4 years, 3 months ago
  2. 5de2679 Merge pull request #493 from sebastian-nagel/NUTCH-2525 by Sebastian Nagel · 4 years, 3 months ago
  3. 040d71d NUTCH-2759 bin/crawl: Rename option --num-slaves - renamed to --num-fetchers by Sebastian Nagel · 4 years, 4 months ago
  4. aa72b75 Fix for NUTCH-2649: Optionally skip TLS/SSL certificate validation for protocol-selenium and protocol-htmlunit by Bala S Shashanka · 4 years, 4 months ago
  5. d70dbc5 NUTCH-2762 Replace http:// URLs by https:// (build files and documentation) by Sebastian Nagel · 4 years, 4 months ago
  6. dd51cb0 Merge pull request #494 from sebastian-nagel/NUTCH-2671-ivy-jar-download-url by Sebastian Nagel · 4 years, 4 months ago
  7. fb92db9 NUTCH-2761 ivy jar fails to download - move to https download link on repo1 by Sebastian Nagel · 4 years, 4 months ago
  8. 34d2d20 NUTCH-2525 Metadata indexer cannot handle uppercase parse metadata by Sebastian Nagel · 4 years, 7 months ago
  9. c4dd7c1 Merge pull request #486 from sebastian-nagel/NUTCH-2184-indexer-no-crawldb by Sebastian Nagel · 4 years, 4 months ago
  10. 57802d1 NUTCH-2184 Enable IndexingJob to function with no crawldb by Sebastian Nagel · 4 years, 4 months ago
  11. 3bbc6dd Merge pull request #489 from sebastian-nagel/NUTCH-2760-protocol-okhttp-request-message-http-version by Sebastian Nagel · 4 years, 4 months ago
  12. 8a663f9 Fix for NUTCH-1863: Add JSON format dump output to readdb command (#490) by Shashanka Balakuntala Srinivasa · 4 years, 4 months ago
  13. b8d1e4f Merge pull request #487 from commoncrawl/NUTCH-2754-max-crawl-delay by Sebastian Nagel · 4 years, 4 months ago
  14. 80b8a1c Merge pull request #488 from sebastian-nagel/NUTCH-2475-deploy-solr-schema-xml by Sebastian Nagel · 4 years, 4 months ago
  15. 21018be NUTCH-2760 protocol-okhttp: properly record HTTP version in request message header by Sebastian Nagel · 4 years, 5 months ago
  16. 8f15b84 NUTCH-2745 Solr schema.xml not shipped in binary release by Sebastian Nagel · 4 years, 5 months ago
  17. 4c74bce NUTCH-2754 fetcher.max.crawl.delay ignored if exceeding 5 min. / 300 sec. by Sebastian Nagel · 4 years, 5 months ago
  18. ac9c435 Merge pull request #485 from sebastian-nagel/NUTCH-2748-redir-exceeded by Sebastian Nagel · 4 years, 5 months ago
  19. 969a194 NUTCH-2748 Fetch status gone (redirect exceeded) not to overwrite existing items in CrawlDb by Sebastian Nagel · 4 years, 6 months ago
  20. ec45fe5 Merge pull request #480 from sebastian-nagel/NUTCH-2746-url-normalizer-basic-idn by Sebastian Nagel · 4 years, 5 months ago
  21. c4fade4 NUTCH-2184 Enable IndexingJob to function with no crawldb by Sebastian Nagel · 4 years, 5 months ago
  22. c23afa8 Merge pull request #484 from balashashanka/NUTCH-2739 by Sebastian Nagel · 4 years, 5 months ago
  23. 597eb71 NUTCH-2739: Upgrade ES and migrate to REST client by Shashanka Balakuntala Srinivasa · 4 years, 6 months ago
  24. a390922 Merge pull request #483 from sju/NUTCH-2750 by Sebastian Nagel · 4 years, 6 months ago
  25. c43b486 NUTCH-2746 Basic URL normalizer to normalize Unicode domain names by Sebastian Nagel · 5 years ago
  26. b554145 Merge pull request #481 from sebastian-nagel/NUTCH-1559-dupl-metatags by Sebastian Nagel · 4 years, 6 months ago
  27. 497936b NUTCH-1559 parse-metatags duplicates extracted metatags by Sebastian Nagel · 4 years, 7 months ago
  28. 65361d0 Merge pull request #482 from balashashanka/NUTCH-2747 by Sebastian Nagel · 4 years, 6 months ago
  29. cf2430f Fix for NUTCH-2747: Fixed indentation according to http://svn.apache.org/repos/asf/nutch/branches/2.x/eclipse-codeformat.xml by Shashanka Balakuntala Srinivasa · 4 years, 6 months ago
  30. 81fee06 Fix for NUTCH-2750 by Jurian Broertjes · 4 years, 6 months ago
  31. 2dc9032 fix for NUTCH-2747 contributed by balashashanka by Shashanka Balakuntala Srinivasa · 4 years, 6 months ago
  32. 6ca3c5b Merge pull request #479 from YossiTamari/patch-6 by Sebastian Nagel · 4 years, 7 months ago
  33. d046b46 Add sitemap.size.max by YossiTamari · 4 years, 7 months ago
  34. 968fc7e Also set file.content.limit. by YossiTamari · 4 years, 7 months ago
  35. 83ea207 Nutch-2511 Support large sitemaps by YossiTamari · 4 years, 7 months ago
  36. f1cf2ff Prepare for new development after release of 1.16 by Sebastian Nagel · 4 years, 7 months ago
  37. 087aea6 Merge pull request #478 from sebastian-nagel/NUTCH-2279-linkrank-output-compression by Sebastian Nagel · 4 years, 7 months ago
  38. a2762f0 Merge pull request #477 from sebastian-nagel/NUTCH-2737-generator-log-selection by Sebastian Nagel · 4 years, 7 months ago
  39. 2f310ae Generator: improve description of crawl.gen.delay by Sebastian Nagel · 4 years, 7 months ago
  40. 0347527 NUTCH-2279 LinkRank fails when using Hadoop MR output compression by Sebastian Nagel · 4 years, 7 months ago
  41. 4d68c08 NUTCH-2740 Generator: generate.max.count overflow not logged by Sebastian Nagel · 4 years, 7 months ago
  42. 44ded9b Generator: apply formatting by Sebastian Nagel · 4 years, 7 months ago
  43. 35da06f NUTCH-2737 Generator: count and log reason of rejections during selection by Sebastian Nagel · 4 years, 7 months ago
  44. 8d21260 Generator: fix logging of hostdb path by Sebastian Nagel · 4 years, 7 months ago
  45. e46232d NUTCH-2738 Generator: document property generate.restrict.status by Sebastian Nagel · 4 years, 7 months ago
  46. f02c98e NUTCH-2737 Generator: count and log reason of rejections during selection by Sebastian Nagel · 4 years, 7 months ago
  47. 873d7bf Merge pull request #473 from sebastian-nagel/NUTCH-2381-text-prof-signature-lexicographic-sorting by Sebastian Nagel · 4 years, 7 months ago
  48. 9e49c3f Merge pull request #474 from sebastian-nagel/NUTCH-2457-parse-tika-embedded-docs by Sebastian Nagel · 4 years, 7 months ago
  49. c9238a1 NUTCH-2457 Embedded documents likely not correctly parsed by Tika by Sebastian Nagel · 4 years, 7 months ago
  50. 9c424f9 NUTCH-2457 Embedded documents likely not correctly parsed by Tika by Sebastian Nagel · 4 years, 7 months ago
  51. 29865b2 NUTCH-2457 Embedded documents likely not correctly parsed by Tika by Sebastian Nagel · 4 years, 7 months ago
  52. 0f46927 Merge pull request #476 from sebastian-nagel/NUTCH-2482-index-geoip-npe by Sebastian Nagel · 4 years, 7 months ago
  53. ff9f025 Merge pull request #475 from r0ann3l/NUTCH-2732 by Sebastian Nagel · 4 years, 7 months ago
  54. 254a968 NUTCH-2482 index-geoip not to add null values to document fields by Sebastian Nagel · 4 years, 7 months ago
  55. 8875a8f Merge branch 'master' into NUTCH-2732 by r0ann3l · 4 years, 7 months ago
  56. 026d2e7 NUTCH-2732: nutch-default.xml as a file that is not a template, since this file should not be modified. by r0ann3l · 4 years, 7 months ago
  57. 45a66f7 NUTCH-2381 In some situations the class TextProfileSignature by Sebastian Nagel · 4 years, 7 months ago
  58. caa9422 Merge pull request #472 from sebastian-nagel/NUTCH-2736-docker-upgrade-base-image by Sebastian Nagel · 4 years, 7 months ago
  59. c735ebb NUTCH-2736 Upgrade Dockerfile to be based on recent Ubuntu LTS version by Sebastian Nagel · 4 years, 7 months ago
  60. f1c352e Merge pull request #471 from sebastian-nagel/NUTCH-1982-ide-setup by Sebastian Nagel · 4 years, 7 months ago
  61. 9e5ae73 Merge pull request #468 from r0ann3l/NUTCH-2654 by Roannel Fernández Hernández · 4 years, 8 months ago
  62. 2a6fbfe NUTCH-2654: a note about the new location of schema.xml file by r0ann3l · 4 years, 8 months ago
  63. 600c298 NUTCH-1982 Make Git ignore IDE project files and add note about IDE setup by Sebastian Nagel · 4 years, 8 months ago
  64. 35d28ce Merge branch 'master' into NUTCH-2654 by r0ann3l · 4 years, 8 months ago
  65. 87b08fc Merge branch 'master' of https://gitbox.apache.org/repos/asf/nutch by Markus Jelsma · 4 years, 8 months ago
  66. 9dbb4be NUTCH-2612 Support for sitemap processing by hostname by Markus Jelsma · 4 years, 8 months ago
  67. 1fa627c Merge pull request #469 from r0ann3l/NUTCH-2732 by Roannel Fernández Hernández · 4 years, 8 months ago
  68. 596146e NUTCH-2732: .template for configuration files. by r0ann3l · 4 years, 8 months ago
  69. 0adeed2 NUTCH-2654: schema.xml moved to indexer-solr folder. by r0ann3l · 4 years, 8 months ago
  70. 5cc1239 NUTCH-2654: Obsolete configuration deleted by r0ann3l · 4 years, 8 months ago
  71. 830ca8e Merge pull request #440 from sebastian-nagel/NUTCH-2696-segment-reader-output-charset by Sebastian Nagel · 4 years, 8 months ago
  72. 9dd11cd Merge pull request #435 from sebastian-nagel/NUTCH-2598-normalizerchecker-fails-on-invalid-url by Sebastian Nagel · 4 years, 8 months ago
  73. 9a9f425 Merge pull request #462 from sebastian-nagel/NUTCH-2729-protocol-okhttp-mark-truncated by Sebastian Nagel · 4 years, 8 months ago
  74. fa9f895 Merge pull request #467 from sebastian-nagel/NUTCH-2669-javax-ws-packaging-type by Sebastian Nagel · 4 years, 8 months ago
  75. 5d2a76d Merge pull request #465 from r0ann3l/NUTCH-2719 by Roannel Fernández Hernández · 4 years, 8 months ago
  76. 6e54f72 Merge pull request #466 from r0ann3l/NUTCH-2718 by Roannel Fernández Hernández · 4 years, 8 months ago
  77. 77fe258 NUTCH-2669 Reliable solution for javax.ws packaging.type by Sebastian Nagel · 4 years, 8 months ago
  78. efcafb6 NUTCH-2729 protocol-okhttp: fix marking of truncated content by Sebastian Nagel · 4 years, 8 months ago
  79. 36c2ce6 NUTCH-2598 URLNormalizerChecker fails on invalid URLs in input by Sebastian Nagel · 5 years ago
  80. 6f5eab3 NUTCH-2718: file names of configuration files of index writers and exchanges are configurable. by r0ann3l · 4 years, 8 months ago
  81. 5c45172 NUTCH-2729 protocol-okhttp: fix marking of truncated content by Sebastian Nagel · 4 years, 8 months ago
  82. a82a663 NUTCH-2729 protocol-okhttp: fix marking of truncated content by Sebastian Nagel · 4 years, 9 months ago
  83. 73b1286 NUTCH-2719: Showing a warning when an exchange points to an indexer that doesn't exist. by r0ann3l · 4 years, 8 months ago
  84. 4695c27 Merge pull request #461 from sebastian-nagel/NUTCH-2728-protocol-okhttp-3.14.2 by Sebastian Nagel · 4 years, 8 months ago
  85. caa6d5c Merge pull request #460 from sebastian-nagel/NUTCH-2727-upgrade-Hadoop-2.9.2 by Sebastian Nagel · 4 years, 8 months ago
  86. 1698f6a NUTCH-2727 Upgrade Hadoop dependencies to 2.9.2 by Sebastian Nagel · 4 years, 9 months ago
  87. da2b51b Merge pull request #459 from sebastian-nagel/NUTCH-2726-tika-1.22 by Sebastian Nagel · 4 years, 8 months ago
  88. dda4578 Merge pull request #464 from deanpearce/bug/solr-cleanup by Sebastian Nagel · 4 years, 8 months ago
  89. 2408256 NUTCH-2731 Fix to ensure that credentials are set on delete operations. by Dean Pearce · 4 years, 8 months ago
  90. eebb807 NUTCH-2728 protocol-okhttp: upgrade okhttp dependency to 3.14.2 by Sebastian Nagel · 4 years, 9 months ago
  91. 09b7142 NUTCH-2726 Upgrade to Tika 1.22 by Sebastian Nagel · 4 years, 9 months ago
  92. f02c41a NUTCH-2702 Fetcher: suppress stack for frequent exceptions by Sebastian Nagel · 5 years ago
  93. 54f73bf NUTCH-2725 Plugin lib-http to support per-host configurable cookies by Markus Jelsma · 4 years, 9 months ago
  94. a67c9be NUTCH-2724 Metadata indexer not to emit empty values by Markus Jelsma · 4 years, 10 months ago
  95. 9692464 Merge branch 'master' of https://gitbox.apache.org/repos/asf/nutch by Markus Jelsma · 4 years, 10 months ago
  96. 5150c44 NUTCH-2723 Indexer Solr not to decode URLs before deletion by Markus Jelsma · 4 years, 10 months ago
  97. fc6a274 NUTCH-2722 Fetch dependencies via https by Sebastian Nagel · 5 years ago
  98. 598bbc4 fix for NUTCH-1403 contributed by aalbahem by Ameer Albahem · 4 years, 10 months ago
  99. f4b6b37 NUTCH-2706 NUTCH-2650 -addBinaryContent -base64 flag can cause "String length must be a multiple of four" error in IndexingJob by Sebastian Nagel · 5 years ago
  100. aff04c8 fix for NUTCH-2717 by Jurian Broertjes · 5 years ago