  1. 48e1aef NUTCH-2659 Add missing Apache license headers by Sebastian Nagel · 6 years ago
  2. a9ea1f1 NUTCH-2655 Update Solr schema.xml for Solr 7.x by Sebastian Nagel · 6 years ago
  3. 89b16ce NUTCH-2652 Fetcher launches more fetch tasks than fetch lists by Sebastian Nagel · 6 years ago
  4. 2a3b1d1 NUTCH-2651 Upgrade core and parse-tika to use Tika 1.19.1 by Sebastian Nagel · 6 years ago
  5. 524a594 NUTCH-2630 Fetcher to log skipped records by robots.txt by Sebastian Nagel · 6 years ago
  6. a6f533d NUTCH-2625 ProtocolFactory.getProtocol(url) may create multiple plugin instances by Sebastian Nagel · 6 years ago
  7. 8151237 Merge pull request #387 from sebastian-nagel/NUTCH-2630-fetcher-log-robotstxt-denied by Sebastian Nagel · 5 years ago
  8. f443f1b Merge pull request #395 from sebastian-nagel/NUTCH-2655-solr-schema-7x by Sebastian Nagel · 5 years ago
  9. 898ba0e Merge pull request #402 from jorgelbg/index-links-schema by Jorge Luis Betancourt · 5 years ago
  10. 230d1a2 NUTCH-2674 HostDb: dump shows wrong column headers by Sebastian Nagel · 5 years ago
  11. 73bb0a7 NUTCH-2671 Upgrade to ant ivy library by Sebastian Nagel · 6 years ago
  12. 9a89898 NUTCH-2671 Upgrade to ant ivy library by Sebastian Nagel · 6 years ago
  13. 426650f Merge pull request #406 from sebastian-nagel/NUTCH-2671-ivy-lib-upgrade by Sebastian Nagel · 6 years ago
  14. ed142e5 NUTCH-2671 Upgrade to ant ivy library by Sebastian Nagel · 6 years ago
  15. 3e9a6e4 NUTCH-2668 Integrate OWASP dependency checks as ant target by Sebastian Nagel · 6 years ago
  16. 45098e7 NUTCH-2658 Adding the fields required by the index-links plugin to the schema by Jorge Luis Betancourt · 6 years ago
  17. 5e3de5b Merge pull request #396 from sebastian-nagel/NUTCH-2659-license-headers by Jorge Luis Betancourt · 6 years ago
  18. 0b1c816 Merge pull request #399 from jorgelbg/indexer-link-test-move by Jorge Luis Betancourt · 6 years ago
  19. 65c4fed NUTCH-2651 Upgrade to Tika 1.19.1 (from 1.18) by Sebastian Nagel · 6 years ago
  20. 1abd6dd Merge pull request #394 from sebastian-nagel/NUTCH-2652-fetcher-not-split-inputs by Sebastian Nagel · 6 years ago
  21. 4426ca9 Merge pull request #391 from sebastian-nagel/NUTCH-2651-tika-1.19.1 by Sebastian Nagel · 6 years ago
  22. 95e9d66 Merge pull request #397 from sebastian-nagel/NUTCH-2660-execute-plugin-tests by Sebastian Nagel · 6 years ago
  23. 5939579 Merge pull request #368 from sebastian-nagel/NUTCH-2625-protocolfactory-getprotocol-synchronized by Sebastian Nagel · 6 years ago
  24. b2ec5c4 NUTCH-2663 Improve the JEXL syntax for getting values from the metadata/context by Jorge Luis Betancourt Gonzalez · 6 years ago
  25. c8fcb78 NUTCH-2661 Move the TestOutlinks class into the o.a.n.parse path by Jorge Luis Betancourt Gonzalez · 6 years ago
  26. 2b7dc0f NUTCH-2660 Plugin tests not executed by Sebastian Nagel · 6 years ago
  27. 8bf04cd NUTCH-2659 Add missing Apache license headers by Sebastian Nagel · 6 years ago
  28. f79a5af NUTCH-2658 Add README for the index-links plugin by Jorge Luis Betancourt Gonzalez · 6 years ago
  29. 1a9f2e6 NUTCH-2655 Update Solr schema.xml for Solr 7.x by Sebastian Nagel · 6 years ago
  30. a6de472 NUTCH-2652 Fetcher launches more fetch tasks than fetch lists by Sebastian Nagel · 6 years ago
  31. 8b7298d NUTCH-1842: crawl.gen.delay value is read incorrectly from configuration. by YossiTamari · 6 years ago
  32. 5f53fd4 NUTCH-2606 MIME detection is wrong for plain-text documents send by Sebastian Nagel · 6 years ago
  33. a997f10 Merge pull request #389 from sebastian-nagel/NUTCH-2192-remove-oro by Sebastian Nagel · 6 years ago
  34. 24faf03 NUTCH-2651 Upgrade core and parse-tika to use Tika 1.19.1 by Sebastian Nagel · 6 years ago
  35. 4418a0d NUTCH-2192 Migrate from Apache ORO to java.util.regex by Sebastian Nagel · 6 years ago
  36. cf1f2dd NUTCH-1121 JUnit test for parse-js by Sebastian Nagel · 6 years ago
  37. c532c4e NUTCH-2192 NUTCH-1678 NUTCH-1014 NUTCH-1021 Migrate from Apache ORO to java.util.regex by Sebastian Nagel · 6 years ago
  38. 5fb3140 Merge pull request #388 from sebastian-nagel/NUTCH-2648-configurable-tls-cert-check by Sebastian Nagel · 6 years ago
  39. 58ea01f NUTCH-2648 Make configurable whether TLS/SSL certificates are checked by protocol plugins by Sebastian Nagel · 6 years ago
  40. 3f64083 NUTCH-2648 Make configurable whether TLS/SSL certificates are checked by protocol plugins by Sebastian Nagel · 6 years ago
  41. 54f156c NUTCH-2630 Fetcher to log skipped records by robots.txt by Sebastian Nagel · 6 years ago
  42. 4d2938f Merge pull request #369 from sebastian-nagel/NUTCH-2623-fetcher-queue-mode by Sebastian Nagel · 6 years ago
  43. 0ce62e1 Merge pull request #383 from sebastian-nagel/NUTCH-2644-crawldb-reader by Sebastian Nagel · 6 years ago
  44. 525e241 Merge pull request #382 from sebastian-nagel/NUTCH-2634-ant-resolve-default by Sebastian Nagel · 6 years ago
  45. ec9e3d8 Merge pull request #376 from sebastian-nagel/NUTCH-2635-generator-temporary-output by Sebastian Nagel · 6 years ago
  46. 8c55414 Merge pull request #385 from sebastian-nagel/NUTCH-2642-index-more-date-timezone by Sebastian Nagel · 6 years ago
  47. d1ffe61 NUTCH-2623 Fetcher to guarantee delay for same host/domain/ip by Sebastian Nagel · 6 years ago
  48. 61d7e8c NUTCH-2647 Skip TLS certificate checks in protocol-http plugin by Markus Jelsma · 6 years ago
  49. 9d59538 Merge pull request #356 from r0ann3l/NUTCH-2602 by Roannel Fernández Hernández · 6 years ago
  50. afe119d Merge branch 'master' into NUTCH-2602 by r0ann3l · 6 years ago
  51. d3864d6 NUTCH-2642 MoreIndexingFilter parses ISO 8601 UTC dates in local time zone by Sebastian Nagel · 6 years ago
  52. 497db00 NUTCH-2645 Webgraph tools ignore command-line options by Sebastian Nagel · 6 years ago
  53. af37024 ProtocolStatusStatistics: job configuration should not be static by Sebastian Nagel · 6 years ago
  54. 6f5c50e NUTCH-2644 CrawlDbReader -dump ignores filter options by Sebastian Nagel · 6 years ago
  55. 7ed4204 NUTCH-2643 ant target "resolve-default" to depend on "init" by Sebastian Nagel · 6 years ago
  56. 566f3fb NUTCH-2639 bin/nutch fails to set native library path on Cygwin causing jobs to fail with UnsatisfiedLinkError by rustyx · 6 years ago
  57. 1130e68 Merge pull request #365 from sebastian-nagel/NUTCH-2621-3rd-party-license-report by Sebastian Nagel · 6 years ago
  58. b38077b NUTCH-2632 protocol-okhttp doesn't accept proxy authentication by Sebastian Nagel · 6 years ago
  59. 8b9714f NUTCH-2632 protocol-okhttp doesn't accept proxy authentication by Sebastian Nagel · 6 years ago
  60. 50f08b3 NUTCH-2633 Fix deprecation warnings when building Nutch master branch under JDK 10.0.2+13 (#374) by Lewis John McGibbney · 6 years ago
  61. 113c58e NUTCH-2635 Generator writes unneeded temporary output by Sebastian Nagel · 6 years ago
  62. f02110f NUTCH-2633 Fix deprecation warnings when building Nutch master branch under JDK 10.0.2+13 (#374) by Lewis John McGibbney · 6 years ago
  63. 5f8b72b NUTCH-2632 protocol-okhttp proxy authentication by Steven Woodard · 6 years ago
  64. 01c5d6e Prepare for new development after release of 1.15 by Sebastian Nagel · 6 years ago
  65. 4e70af2 Fixes for NUTCH-2602: Description as a table with columns: KEY, DESCRIPTION, VALUE. by r0ann3l · 6 years ago
  66. aaf85ca Merge pull request #366 from sebastian-nagel/NUTCH-2622-unbundle-lgpl-licensed-jars by Sebastian Nagel · 6 years ago
  67. 4d17fac Merge pull request #367 from sebastian-nagel/NUTCH-2624-protocol-okhttp-resource-leak by Sebastian Nagel · 6 years ago
  68. 1193f8b Merge branch 'master' into NUTCH-2602 by r0ann3l · 6 years ago
  69. a10db14 NUTCH-2625 ProtocolFactory.getProtocol(url) may create multiple plugin instances by Sebastian Nagel · 6 years ago
  70. c585312 NUTCH-2624 protocol-okhttp resource leak by Sebastian Nagel · 6 years ago
  71. 55afdbf NUTCH-2622 Unbundle LGPL-licensed jars from binary release by Sebastian Nagel · 6 years ago
  72. 1f148ba NUTCH-2621 Generate report of third-party licenses by Sebastian Nagel · 6 years ago
  73. 1e09fee Merge pull request #364 from sebastian-nagel/NUTCH-1993-use-backup-parsers by Sebastian Nagel · 6 years ago
  74. 9777aea Merge pull request #361 from sebastian-nagel/NUTCH-2619-protocol-okhttp-partial-as-truncated by Sebastian Nagel · 6 years ago
  75. 2f9110c Merge pull request #355 from sebastian-nagel/NUTCH-2152 by Sebastian Nagel · 6 years ago
  76. 2999e14 Merge pull request #363 from sebastian-nagel/NUTCH-2616-exchange-route-deletions by Sebastian Nagel · 6 years ago
  77. b88930e NUTCH-1993 Nutch does not use backup parsers by Sebastian Nagel · 6 years ago
  78. f263d91 Merge pull request #359 from sebastian-nagel/NUTCH-1106-max-outlink-length by Sebastian Nagel · 6 years ago
  79. cf183ad Merge pull request #358 from sebastian-nagel/NUTCH-2071 by Sebastian Nagel · 6 years ago
  80. 718f51b NUTCH-2616 Review routing of deletions by Exchange component by Sebastian Nagel · 6 years ago
  81. 8ea99ee NUTCH-2620 urlfilter-validator incorrectly assumes that top-level domains are not longer than 4 characters by Sebastian Nagel · 6 years ago
  82. 52e0590 typo in fix by Gareth Owen · 6 years ago
  83. 07bc20c Fix invalid assumption in URL validator by Gareth Owen · 6 years ago
  84. 6d196c8 NUTCH-2071 - also catch any Throwable if parser is called by extension ID by Sebastian Nagel · 6 years ago
  85. 4a8ad95 NUTCH-2071 A parser failure on a single document by Sebastian Nagel · 6 years ago
  86. 8d434b5 NUTCH-1106 Options to skip url's based on length by Sebastian Nagel · 6 years ago
  87. 579a76b NUTCH-1106 Options to skip url's based on length by Sebastian Nagel · 6 years ago
  88. a4569f1 NUTCH-2619 protocol-okhttp: allow to keep partially fetched docs as truncated by Sebastian Nagel · 6 years ago
  89. 56ee081 NUTCH-2618 protocol-okhttp not to use http.timeout for max duration to fetch document by Sebastian Nagel · 6 years ago
  90. f011b21 Merge pull request #357 from sebastian-nagel/NUTCH-2614-NPE-readdb-empty-crawldb by Sebastian Nagel · 6 years ago
  91. bef8d8e NUTCH-2614 NPE in CrawlDbReader -stats on empty CrawlDb by Sebastian Nagel · 6 years ago
  92. 5a8793c Merge branch 'master' into NUTCH-2602 by r0ann3l · 6 years ago
  93. 4717ff8 Merge pull request #294 from sebastian-nagel/NUTCH-1541 by Roannel Fernández Hernández · 6 years ago
  94. 3417951 fixes for NUTCH-1514: fixed the failure of unit tests. by r0ann3l · 6 years ago
  95. e6787f2 fixes for NUTCH-1514: fixed the failure of unit tests. by r0ann3l · 6 years ago
  96. b4add1e Merge branch 'NUTCH-1541' of github.com:sebastian-nagel/nutch into NUTCH-1541 by r0ann3l · 6 years ago
  97. 8836f7c fixes for NUTCH-1514: Changes: by r0ann3l · 6 years ago
  98. 06221e0 Fix unit tests for changes related to NUTCH-1480 by Sebastian Nagel · 6 years ago
  99. 0176883 fixes for NUTCH-1514: Support for NUTCH-1480. by r0ann3l · 6 years ago
  100. 5fb2d4d Merge branch 'master' into NUTCH-1541 by r0ann3l · 6 years ago