apache / nutch / HEAD
48e1aef
NUTCH-2659 Add missing Apache license headers
by Sebastian Nagel
· 6 years ago
a9ea1f1
NUTCH-2655 Update Solr schema.xml for Solr 7.x
by Sebastian Nagel
· 6 years ago
89b16ce
NUTCH-2652 Fetcher launches more fetch tasks than fetch lists
by Sebastian Nagel
· 6 years ago
2a3b1d1
NUTCH-2651 Upgrade core and parse-tika to use Tika 1.19.1
by Sebastian Nagel
· 6 years ago
524a594
NUTCH-2630 Fetcher to log skipped records by robots.txt
by Sebastian Nagel
· 6 years ago
a6f533d
NUTCH-2625 ProtocolFactory.getProtocol(url) may create multiple plugin instances
by Sebastian Nagel
· 6 years ago
8151237
Merge pull request #387 from sebastian-nagel/NUTCH-2630-fetcher-log-robotstxt-denied
by Sebastian Nagel
· 5 years ago
f443f1b
Merge pull request #395 from sebastian-nagel/NUTCH-2655-solr-schema-7x
by Sebastian Nagel
· 5 years ago
898ba0e
Merge pull request #402 from jorgelbg/index-links-schema
by Jorge Luis Betancourt
· 5 years ago
230d1a2
NUTCH-2674 HostDb: dump shows wrong column headers
by Sebastian Nagel
· 5 years ago
73bb0a7
NUTCH-2671 Upgrade to ant ivy library
by Sebastian Nagel
· 6 years ago
9a89898
NUTCH-2671 Upgrade to ant ivy library
by Sebastian Nagel
· 6 years ago
426650f
Merge pull request #406 from sebastian-nagel/NUTCH-2671-ivy-lib-upgrade
by Sebastian Nagel
· 6 years ago
ed142e5
NUTCH-2671 Upgrade to ant ivy library
by Sebastian Nagel
· 6 years ago
3e9a6e4
NUTCH-2668 Integrate OWASP dependency checks as ant target
by Sebastian Nagel
· 6 years ago
45098e7
NUTCH-2658 Adding the fields required by the index-links plugin to the schema
by Jorge Luis Betancourt
· 6 years ago
5e3de5b
Merge pull request #396 from sebastian-nagel/NUTCH-2659-license-headers
by Jorge Luis Betancourt
· 6 years ago
0b1c816
Merge pull request #399 from jorgelbg/indexer-link-test-move
by Jorge Luis Betancourt
· 6 years ago
65c4fed
NUTCH-2651 Upgrade to Tika 1.19.1 (from 1.18)
by Sebastian Nagel
· 6 years ago
1abd6dd
Merge pull request #394 from sebastian-nagel/NUTCH-2652-fetcher-not-split-inputs
by Sebastian Nagel
· 6 years ago
4426ca9
Merge pull request #391 from sebastian-nagel/NUTCH-2651-tika-1.19.1
by Sebastian Nagel
· 6 years ago
95e9d66
Merge pull request #397 from sebastian-nagel/NUTCH-2660-execute-plugin-tests
by Sebastian Nagel
· 6 years ago
5939579
Merge pull request #368 from sebastian-nagel/NUTCH-2625-protocolfactory-getprotocol-synchronized
by Sebastian Nagel
· 6 years ago
b2ec5c4
NUTCH-2663 Improve the JEXL syntax for getting values from the metadata/context
by Jorge Luis Betancourt Gonzalez
· 6 years ago
c8fcb78
NUTCH-2661 Move the TestOutlinks class into the o.a.n.parse path
by Jorge Luis Betancourt Gonzalez
· 6 years ago
2b7dc0f
NUTCH-2660 Plugin tests not executed
by Sebastian Nagel
· 6 years ago
8bf04cd
NUTCH-2659 Add missing Apache license headers
by Sebastian Nagel
· 6 years ago
f79a5af
NUTCH-2658 Add README for the index-links plugin
by Jorge Luis Betancourt Gonzalez
· 6 years ago
1a9f2e6
NUTCH-2655 Update Solr schema.xml for Solr 7.x
by Sebastian Nagel
· 6 years ago
a6de472
NUTCH-2652 Fetcher launches more fetch tasks than fetch lists
by Sebastian Nagel
· 6 years ago
8b7298d
NUTCH-1842: crawl.gen.delay value is read incorrectly from configuration.
by YossiTamari
· 6 years ago
5f53fd4
NUTCH-2606 MIME detection is wrong for plain-text documents send
by Sebastian Nagel
· 6 years ago
a997f10
Merge pull request #389 from sebastian-nagel/NUTCH-2192-remove-oro
by Sebastian Nagel
· 6 years ago
24faf03
NUTCH-2651 Upgrade core and parse-tika to use Tika 1.19.1
by Sebastian Nagel
· 6 years ago
4418a0d
NUTCH-2192 Migrate from Apache ORO to java.util.regex
by Sebastian Nagel
· 6 years ago
cf1f2dd
NUTCH-1121 JUnit test for parse-js
by Sebastian Nagel
· 6 years ago
c532c4e
NUTCH-2192 NUTCH-1678 NUTCH-1014 NUTCH-1021 Migrate from Apache ORO to java.util.regex
by Sebastian Nagel
· 6 years ago
5fb3140
Merge pull request #388 from sebastian-nagel/NUTCH-2648-configurable-tls-cert-check
by Sebastian Nagel
· 6 years ago
58ea01f
NUTCH-2648 Make configurable whether TLS/SSL certificates are checked by protocol plugins
by Sebastian Nagel
· 6 years ago
3f64083
NUTCH-2648 Make configurable whether TLS/SSL certificates are checked by protocol plugins
by Sebastian Nagel
· 6 years ago
54f156c
NUTCH-2630 Fetcher to log skipped records by robots.txt
by Sebastian Nagel
· 6 years ago
4d2938f
Merge pull request #369 from sebastian-nagel/NUTCH-2623-fetcher-queue-mode
by Sebastian Nagel
· 6 years ago
0ce62e1
Merge pull request #383 from sebastian-nagel/NUTCH-2644-crawldb-reader
by Sebastian Nagel
· 6 years ago
525e241
Merge pull request #382 from sebastian-nagel/NUTCH-2634-ant-resolve-default
by Sebastian Nagel
· 6 years ago
ec9e3d8
Merge pull request #376 from sebastian-nagel/NUTCH-2635-generator-temporary-output
by Sebastian Nagel
· 6 years ago
8c55414
Merge pull request #385 from sebastian-nagel/NUTCH-2642-index-more-date-timezone
by Sebastian Nagel
· 6 years ago
d1ffe61
NUTCH-2623 Fetcher to guarantee delay for same host/domain/ip
by Sebastian Nagel
· 6 years ago
61d7e8c
NUTCH-2647 Skip TLS certificate checks in protocol-http plugin
by Markus Jelsma
· 6 years ago
9d59538
Merge pull request #356 from r0ann3l/NUTCH-2602
by Roannel Fernández Hernández
· 6 years ago
afe119d
Merge branch 'master' into NUTCH-2602
by r0ann3l
· 6 years ago
d3864d6
NUTCH-2642 MoreIndexingFilter parses ISO 8601 UTC dates in local time zone
by Sebastian Nagel
· 6 years ago
497db00
NUTCH-2645 Webgraph tools ignore command-line options
by Sebastian Nagel
· 6 years ago
af37024
ProtocolStatusStatistics: job configuration should not be static
by Sebastian Nagel
· 6 years ago
6f5c50e
NUTCH-2644 CrawlDbReader -dump ignores filter options
by Sebastian Nagel
· 6 years ago
7ed4204
NUTCH-2643 ant target "resolve-default" to depend on "init"
by Sebastian Nagel
· 6 years ago
566f3fb
NUTCH-2639 bin/nutch fails to set native library path on Cygwin causing jobs to fail with UnsatisfiedLinkError
by rustyx
· 6 years ago
1130e68
Merge pull request #365 from sebastian-nagel/NUTCH-2621-3rd-party-license-report
by Sebastian Nagel
· 6 years ago
b38077b
NUTCH-2632 protocol-okhttp doesn't accept proxy authentication
by Sebastian Nagel
· 6 years ago
8b9714f
NUTCH-2632 protocol-okhttp doesn't accept proxy authentication
by Sebastian Nagel
· 6 years ago
50f08b3
NUTCH-2633 Fix deprecation warnings when building Nutch master branch under JDK 10.0.2+13 (#374)
by Lewis John McGibbney
· 6 years ago
113c58e
NUTCH-2635 Generator writes unneeded temporary output
by Sebastian Nagel
· 6 years ago
f02110f
NUTCH-2633 Fix deprecation warnings when building Nutch master branch under JDK 10.0.2+13 (#374)
by Lewis John McGibbney
· 6 years ago
5f8b72b
NUTCH-2632 protocol-okhttp proxy authentication
by Steven Woodard
· 6 years ago
01c5d6e
Prepare for new development after release of 1.15
by Sebastian Nagel
· 6 years ago
4e70af2
Fixes for NUTCH-2602: Description as a table with columns: KEY, DESCRIPTION, VALUE.
by r0ann3l
· 6 years ago
aaf85ca
Merge pull request #366 from sebastian-nagel/NUTCH-2622-unbundle-lgpl-licensed-jars
by Sebastian Nagel
· 6 years ago
4d17fac
Merge pull request #367 from sebastian-nagel/NUTCH-2624-protocol-okhttp-resource-leak
by Sebastian Nagel
· 6 years ago
1193f8b
Merge branch 'master' into NUTCH-2602
by r0ann3l
· 6 years ago
a10db14
NUTCH-2625 ProtocolFactory.getProtocol(url) may create multiple plugin instances
by Sebastian Nagel
· 6 years ago
c585312
NUTCH-2624 protocol-okhttp resource leak
by Sebastian Nagel
· 6 years ago
55afdbf
NUTCH-2622 Unbundle LGPL-licensed jars from binary release
by Sebastian Nagel
· 6 years ago
1f148ba
NUTCH-2621 Generate report of third-party licenses
by Sebastian Nagel
· 6 years ago
1e09fee
Merge pull request #364 from sebastian-nagel/NUTCH-1993-use-backup-parsers
by Sebastian Nagel
· 6 years ago
9777aea
Merge pull request #361 from sebastian-nagel/NUTCH-2619-protocol-okhttp-partial-as-truncated
by Sebastian Nagel
· 6 years ago
2f9110c
Merge pull request #355 from sebastian-nagel/NUTCH-2152
by Sebastian Nagel
· 6 years ago
2999e14
Merge pull request #363 from sebastian-nagel/NUTCH-2616-exchange-route-deletions
by Sebastian Nagel
· 6 years ago
b88930e
NUTCH-1993 Nutch does not use backup parsers
by Sebastian Nagel
· 6 years ago
f263d91
Merge pull request #359 from sebastian-nagel/NUTCH-1106-max-outlink-length
by Sebastian Nagel
· 6 years ago
cf183ad
Merge pull request #358 from sebastian-nagel/NUTCH-2071
by Sebastian Nagel
· 6 years ago
718f51b
NUTCH-2616 Review routing of deletions by Exchange component
by Sebastian Nagel
· 6 years ago
8ea99ee
NUTCH-2620 urlfilter-validator incorrectly assumes that top-level domains are not longer than 4 characters
by Sebastian Nagel
· 6 years ago
52e0590
typo in fix
by Gareth Owen
· 6 years ago
07bc20c
Fix invalid assumption in URL validator
by Gareth Owen
· 6 years ago
6d196c8
NUTCH-2071 - also catch any Throwable if parser is called by extension ID
by Sebastian Nagel
· 6 years ago
4a8ad95
NUTCH-2071 A parser failure on a single document
by Sebastian Nagel
· 6 years ago
8d434b5
NUTCH-1106 Options to skip url's based on length
by Sebastian Nagel
· 6 years ago
579a76b
NUTCH-1106 Options to skip url's based on length
by Sebastian Nagel
· 6 years ago
a4569f1
NUTCH-2619 protocol-okhttp: allow to keep partially fetched docs as truncated
by Sebastian Nagel
· 6 years ago
56ee081
NUTCH-2618 protocol-okhttp not to use http.timeout for max duration to fetch document
by Sebastian Nagel
· 6 years ago
f011b21
Merge pull request #357 from sebastian-nagel/NUTCH-2614-NPE-readdb-empty-crawldb
by Sebastian Nagel
· 6 years ago
bef8d8e
NUTCH-2614 NPE in CrawlDbReader -stats on empty CrawlDb
by Sebastian Nagel
· 6 years ago
5a8793c
Merge branch 'master' into NUTCH-2602
by r0ann3l
· 6 years ago
4717ff8
Merge pull request #294 from sebastian-nagel/NUTCH-1541
by Roannel Fernández Hernández
· 6 years ago
3417951
fixes for NUTCH-1514: fixed the failure of unit tests.
by r0ann3l
· 6 years ago
e6787f2
fixes for NUTCH-1514: fixed the failure of unit tests.
by r0ann3l
· 6 years ago
b4add1e
Merge branch 'NUTCH-1541' of github.com:sebastian-nagel/nutch into NUTCH-1541
by r0ann3l
· 6 years ago
8836f7c
fixes for NUTCH-1514: Changes:
by r0ann3l
· 6 years ago
06221e0
Fix unit tests for changes related to NUTCH-1480
by Sebastian Nagel
· 6 years ago
0176883
fixes for NUTCH-1514: Support for NUTCH-1480.
by r0ann3l
· 6 years ago
5fb2d4d
Merge branch 'master' into NUTCH-1541
by r0ann3l
· 6 years ago