1. a2b944f Merge branch NUTCH-2716 of https://github.com/YossiTamari/nutch by Sebastian Nagel · 5 years ago
  2. e7e0a21 Fix plugin.xml to match changes in ivy.xml. by Yossi Tamari · 5 years ago
  3. a83a736 When fixHttpHeaders is called with null, return null by Yossi Tamari · 5 years ago
  4. 2fce5cd NUTCH-2716 Response headers are not stored for a compressed response by Yossi Tamari · 5 years ago
  5. 5644f9c NUTCH-2708 urlfilter-automaton: update library dependency (dk.brics.automaton) by Sebastian Nagel · 5 years ago
  6. 8cc41d8 Merge pull request #433 from commoncrawl/cc-fast-url-filter by Sebastian Nagel · 5 years ago
  7. 290e3cb NUTCH-2626 bin/crawl: remove option -noParsing from fetch command by Sebastian Nagel · 5 years ago
  8. 0e44fd1 Merge pull request #451 from sebastian-nagel/NUTCH-2709-remove-unused-http-properties by Sebastian Nagel · 5 years ago
  9. 070c2ad Merge pull request #450 from sebastian-nagel/NUTCH-2708-upgrade-urlfilter-automaton-dependency by Sebastian Nagel · 5 years ago
  10. fd203d8 Merge pull request #452 from sebastian-nagel/NUTCH-2585-trie-string-matcher by Sebastian Nagel · 5 years ago
  11. b1c46c4 NUTCH-2585 NPE in TrieStringMatcher by Sebastian Nagel · 5 years ago
  12. 615b20e NUTCH-2585 NPE in TrieStringMatcher by Sebastian Nagel · 5 years ago
  13. e2e3ee7 NUTCH-2709 Remove unused properties and code related to HTTP protocol by Sebastian Nagel · 5 years ago
  14. 7765bb3 Merge pull request #448 from sebastian-nagel/NUTCH-2704-crawler-commons-upgrade-1.0 by Sebastian Nagel · 5 years ago
  15. b56a577 Merge pull request #445 from sebastian-nagel/NUTCH-2699-protocol-okhttp by Sebastian Nagel · 5 years ago
  16. 3408d74 NUTCH-2708 urlfilter-automaton: update library dependency (dk.brics.automaton) by Sebastian Nagel · 5 years ago
  17. 1fc98bf NUTCH-2690 Configurable and fast URL filter by Sebastian Nagel · 7 years ago
  18. 510a4ea Merge pull request #446 from sebastian-nagel/NUTCH-2700-indexchecker-cmd-line-help by Sebastian Nagel · 5 years ago
  19. f51a276 NUTCH-2699 Protocol-okhttp: needless loops to increment requested bytes counter by Sebastian Nagel · 5 years ago
  20. 7e6eabb NUTCH-2703 parse-tika: Boilerpipe should not run for non-(X)HTML pages by Markus Jelsma · 5 years ago
  21. 84cca6c NUTCH-2704 Upgrade crawler-commons dependency to 1.0 by Sebastian Nagel · 5 years ago
  22. bf75e96 Merge pull request #447 from sebastian-nagel/NUTCH-2701-fetcher-log-times-human-readable by Sebastian Nagel · 5 years ago
  23. 190828f Merge pull request #427 from sebastian-nagel/NUTCH-2666-increase-http-content-limit by Sebastian Nagel · 5 years ago
  24. cda251a Merge pull request #425 from sebastian-nagel/NUTCH-2683-dedup-prefer-https by Sebastian Nagel · 5 years ago
  25. 0624d25 NUTCH-2701 Fetcher: log dates and times also in human-readable form by Sebastian Nagel · 5 years ago
  26. fc8f863 Merge pull request #444 from r0ann3l/NUTCH-2688 by Roannel Fernández Hernández · 5 years ago
  27. a5a65a0 Merge pull request #373 from NextCenturyCorporation/NUTCH-2631 by Roannel Fernández Hernández · 5 years ago
  28. 76c8cff NUTCH-2700 Indexchecker: improve command-line help by Sebastian Nagel · 5 years ago
  29. 3ba0622 NUTCH-2699 Protocol-okhttp: needless loops to increment requested bytes counter by Sebastian Nagel · 5 years ago
  30. 209f2cb NUTCH-2631: Moving dependencies from root ivy file to indexer-kafka plugin. by r0ann3l · 5 years ago
  31. e1c5ebc NUTCH-2688: Some license headers that had gone unfixed by r0ann3l · 5 years ago
  32. 97a24a2 Merge branch 'master' into NUTCH-2688 by r0ann3l · 5 years ago
  33. a6ead23 NUTCH-2688: License headers for Java classes. by r0ann3l · 5 years ago
  34. 8bdec5e NUTCH-2698 Remove sonar build task from build.xml (#443) by Lewis John McGibbney · 5 years ago
  35. 2ca3d89 Merge pull request #442 from apache/revert-441-NUTCH-2697 by Sebastian Nagel · 5 years ago
  36. 5fc56b6 Revert "NUTCH-2697: Upgrade Ivy to fix the issue of an unset packaging.type property. (#441)" by Sebastian Nagel · 5 years ago
  37. 0b0fcea NUTCH-2697: Upgrade Ivy to fix the issue of an unset packaging.type property. (#441) by Chris Gavin · 5 years ago
  38. dfd8602 Merge pull request #430 from sbatururimi/NUTCH-2676 by Sebastian Nagel · 5 years ago
  39. 8f421a4 NUTCH-2676 Update to the latest selenium and add code to use chrome and firefox headless mode with the remote web driver by Stas Batururimi · 5 years ago
  40. 23fb95c NUTCH-2696 Nutch SegmentReader does not dump non-ASCII characters with Hadoop 3.x by Sebastian Nagel · 5 years ago
  41. f7fdca3 NUTCH-2692 Removing previously accidentally added file by Markus Jelsma · 5 years ago
  42. 0085ee7 Merge branch 'master' of https://gitbox.apache.org/repos/asf/nutch by Markus Jelsma · 5 years ago
  43. 3fa2f4a NUTCH-2692 Subcollection to support case-insensitive white and black lists by Markus Jelsma · 5 years ago
  44. 89c41e1 NUTCH-2692 Subcollection to support case-insensitive white and black lists by Markus Jelsma · 5 years ago
  45. 78af89f Merge pull request #436 from r0ann3l/NUTCH-2684 by Sebastian Nagel · 5 years ago
  46. e95c915 Merge pull request #437 from sebastian-nagel/NUTCH-2693-misspelled-properties by Sebastian Nagel · 5 years ago
  47. 8cf9e67 Merge pull request #370 from sebastian-nagel/NUTCH-2627-fetcher-filter-urls by Sebastian Nagel · 5 years ago
  48. 546237d NUTCH-2627 Fetcher to optionally filter URLs by Sebastian Nagel · 6 years ago
  49. fd31cea Merge branch 'NUTCH-2695', closes #438 by Sebastian Nagel · 5 years ago
  50. 31ecf64 NUTCH-2695: fix some alerts raised by LGTM by Sebastian Nagel · 5 years ago
  51. 3abe7db NUTCH-2695: fix some alerts raised by LGTM by Malcolm Taylor · 5 years ago
  52. 33922fe NUTCH-2694 HostDB to aggregate by long instead of integer by Markus Jelsma · 5 years ago
  53. 6462821 fixed some spelling mistakes and formatting by Ayal Ciobotaru · 5 years ago
  54. 4787d40 NUTCH-2693 Misspelled configuration property names in documentation by Sebastian Nagel · 6 years ago
  55. da8f3f5 Merge pull request #432 from sebastian-nagel/NUTCH-2689-urlfilter-regex-speed-up by Sebastian Nagel · 5 years ago
  56. cb580f0 Merge pull request #434 from YossiTamari/patch-3 by Sebastian Nagel · 5 years ago
  57. e27eb65 Merge branch 'master' into NUTCH-2684 by r0ann3l · 5 years ago
  58. dd067c5 NUTCH-2684: Outdated documentation. by r0ann3l · 5 years ago
  59. a326284 NUTCH-2684: README.md file for index writer plugins. by r0ann3l · 5 years ago
  60. 6e000e1 Merge pull request #429 from r0ann3l/NUTCH-2685 by Sebastian Nagel · 5 years ago
  61. 010c2fc NUTCH-2691: Improve logging from scoring-depth plugin by YossiTamari · 5 years ago
  62. f87b19b NUTCH-2689 Speed up urlfilter-regex and urlfilter-automaton by Sebastian Nagel · 5 years ago
  63. 6934d52 Merge pull request #424 from sebastian-nagel/NUTCH-2682-upgrade-tika by Sebastian Nagel · 5 years ago
  64. 86aac2d Merge pull request #428 from r0ann3l/NUTCH-2686 by Roannel Fernández Hernández · 5 years ago
  65. dd8f64e NUTCH-2686: Improving description of moreIndexingFilter.mapMimeTypes.field property. by Roannel Fernández Hernández · 5 years ago
  66. 0c18f6c Merge pull request #426 from sebastian-nagel/NUTCH-2680 by Sebastian Nagel · 5 years ago
  67. ac2d578 Merge pull request #400 from jorgelbg/jexl-improve-syntax by Sebastian Nagel · 5 years ago
  68. 64a6f7c NUTCH-2678 Allow for per-host configurable protocol plugin by Markus Jelsma · 5 years ago
  69. 9cc076f NUTCH-2687 Regex for reading title from Content-Disposition is wrong by Markus Jelsma · 5 years ago
  70. a4dd1f7 added description and default values by Ayal Ciobotaru · 5 years ago
  71. 636f576 NUTCH-2685: README.md file for exchange-jexl plugin. by r0ann3l · 5 years ago
  72. 6e8c2c3 fixes for NUTCH-2686: New property: "moreIndexingFilter.mapMimeTypes.field", indicating the name of the field where the mapped mime type must be written. by r0ann3l · 5 years ago
  73. da2a673 Missed this commit, constant type error by Ayal Ciobotaru · 5 years ago
  74. f5b5d2c Changes made per r0ann3l's review, added doc limit, made other small changes by Ayal Ciobotaru · 5 years ago
  75. 4efed0a added one sentence description for Kafka indexer by Ayal Ciobotaru · 6 years ago
  76. 8e511a1 fix for NUTCH-2631 contributed by AyalCiobotaru by Ayal Ciobotaru · 6 years ago
  77. 13a9a6d NUTCH-2666 Increase default value for http.content.limit / ftp.content.limit / file.content.limit by Sebastian Nagel · 5 years ago
  78. 9ae7a80 NUTCH-2680 Documentation: https supported by multiple protocol plugins not only httpclient by Sebastian Nagel · 5 years ago
  79. 3958d0c NUTCH-2683 DeduplicationJob: add option to prefer https:// over http:// by Sebastian Nagel · 5 years ago
  80. d0a4abf NUTCH-2475 If and else-if branches has the same condition by Sebastian Nagel · 5 years ago
  81. 6274083 Merge pull request #371 from sebastian-nagel/NUTCH-2628-fetcher-signature by Sebastian Nagel · 5 years ago
  82. 2ae86d3 NUTCH-2628 Fetcher: optionally generate signature of unparsed content by Sebastian Nagel · 8 years ago
  83. 58ef2da Merge pull request #422 from sebastian-nagel/NUTCH-2657-http-headers-crlf by Sebastian Nagel · 5 years ago
  84. 3ab0227 Merge pull request #398 from jorgelbg/doc-indexer-links by Sebastian Nagel · 5 years ago
  85. 784aa5f NUTCH-2682 Upgrade to Tika 1.20 by Sebastian Nagel · 5 years ago
  86. 241ad5a Merge pull request #1 from apache/master by Ayal Ciobotaru · 5 years ago
  87. 122648e NUTCH-2657 Protocol-http to store HTTP response header with "\r\n" by Sebastian Nagel · 6 years ago
  88. 43d26ce Merge pull request #407 from sebastian-nagel/NUTCH-2674-hostdb-dump-header by Sebastian Nagel · 5 years ago
  89. a965cd2 NUTCH-2668 Integrate OWASP dependency checks as ant target by Sebastian Nagel · 5 years ago
  90. 785a52f Merge pull request #401 from sebastian-nagel/dependency-check by Sebastian Nagel · 5 years ago
  91. a37bde1 NUTCH-1842: crawl.gen.delay value is read incorrectly from config by Sebastian Nagel · 5 years ago
  92. ee9ff89 Merge pull request #392 from sebastian-nagel/NUTCH-2606-mime-detection-plain-text by Sebastian Nagel · 5 years ago
  93. f861c82 NUTCH-1842: crawl.gen.delay value is read incorrectly from config by Sebastian Nagel · 5 years ago
  94. e6a961c NUTCH-2671 Upgrade to ant ivy library by Sebastian Nagel · 5 years ago
  95. 393d3e5 NUTCH-2671 Upgrade to ant ivy library by Sebastian Nagel · 5 years ago
  96. 93b1a81 NUTCH-2671 Upgrade to ant ivy library by Sebastian Nagel · 5 years ago
  97. a5df63a NUTCH-2658 Adding the fields required by the index-links plugin to the schema by Jorge Luis Betancourt · 5 years ago
  98. 31a1ec4 NUTCH-2651 Upgrade to Tika 1.19.1 (from 1.18) by Sebastian Nagel · 5 years ago
  99. 2d48152 NUTCH-2661 Move the TestOutlinks class into the o.a.n.parse path by Jorge Luis Betancourt Gonzalez · 6 years ago
  100. d45fb7a NUTCH-2660 Plugin tests not executed by Sebastian Nagel · 6 years ago