apache / nutch / HEAD
a2b944f
Merge branch NUTCH-2716 of https://github.com/YossiTamari/nutch
by Sebastian Nagel
· 5 years ago
e7e0a21
Fix plugin.xml to match changes in ivy.xml.
by Yossi Tamari
· 5 years ago
a83a736
When fixHttpHeaders is called with null, return null
by Yossi Tamari
· 5 years ago
2fce5cd
NUTCH-2716 Response headers are not stored for a compressed response
by Yossi Tamari
· 5 years ago
5644f9c
NUTCH-2708 urlfilter-automaton: update library dependency (dk.brics.automaton)
by Sebastian Nagel
· 5 years ago
8cc41d8
Merge pull request #433 from commoncrawl/cc-fast-url-filter
by Sebastian Nagel
· 5 years ago
290e3cb
NUTCH-2626 bin/crawl: remove option -noParsing from fetch command
by Sebastian Nagel
· 5 years ago
0e44fd1
Merge pull request #451 from sebastian-nagel/NUTCH-2709-remove-unused-http-properties
by Sebastian Nagel
· 5 years ago
070c2ad
Merge pull request #450 from sebastian-nagel/NUTCH-2708-upgrade-urlfilter-automaton-dependency
by Sebastian Nagel
· 5 years ago
fd203d8
Merge pull request #452 from sebastian-nagel/NUTCH-2585-trie-string-matcher
by Sebastian Nagel
· 5 years ago
b1c46c4
NUTCH-2585 NPE in TrieStringMatcher
by Sebastian Nagel
· 5 years ago
615b20e
NUTCH-2585 NPE in TrieStringMatcher
by Sebastian Nagel
· 5 years ago
e2e3ee7
NUTCH-2709 Remove unused properties and code related to HTTP protocol
by Sebastian Nagel
· 5 years ago
7765bb3
Merge pull request #448 from sebastian-nagel/NUTCH-2704-crawler-commons-upgrade-1.0
by Sebastian Nagel
· 5 years ago
b56a577
Merge pull request #445 from sebastian-nagel/NUTCH-2699-protocol-okhttp
by Sebastian Nagel
· 5 years ago
3408d74
NUTCH-2708 urlfilter-automaton: update library dependency (dk.brics.automaton)
by Sebastian Nagel
· 5 years ago
1fc98bf
NUTCH-2690 Configurable and fast URL filter
by Sebastian Nagel
· 7 years ago
510a4ea
Merge pull request #446 from sebastian-nagel/NUTCH-2700-indexchecker-cmd-line-help
by Sebastian Nagel
· 5 years ago
f51a276
NUTCH-2699 Protocol-okhttp: needless loops to increment requested bytes counter
by Sebastian Nagel
· 5 years ago
7e6eabb
NUTCH-2703 parse-tika: Boilerpipe should not run for non-(X)HTML pages
by Markus Jelsma
· 5 years ago
84cca6c
NUTCH-2704 Upgrade crawler-commons dependency to 1.0
by Sebastian Nagel
· 5 years ago
bf75e96
Merge pull request #447 from sebastian-nagel/NUTCH-2701-fetcher-log-times-human-readable
by Sebastian Nagel
· 5 years ago
190828f
Merge pull request #427 from sebastian-nagel/NUTCH-2666-increase-http-content-limit
by Sebastian Nagel
· 5 years ago
cda251a
Merge pull request #425 from sebastian-nagel/NUTCH-2683-dedup-prefer-https
by Sebastian Nagel
· 5 years ago
0624d25
NUTCH-2701 Fetcher: log dates and times also in human-readable form
by Sebastian Nagel
· 5 years ago
fc8f863
Merge pull request #444 from r0ann3l/NUTCH-2688
by Roannel Fernández Hernández
· 5 years ago
a5a65a0
Merge pull request #373 from NextCenturyCorporation/NUTCH-2631
by Roannel Fernández Hernández
· 5 years ago
76c8cff
NUTCH-2700 Indexchecker: improve command-line help
by Sebastian Nagel
· 5 years ago
3ba0622
NUTCH-2699 Protocol-okhttp: needless loops to increment requested bytes counter
by Sebastian Nagel
· 5 years ago
209f2cb
NUTCH-2631: Moving dependencies from root ivy file to indexer-kafka plugin.
by r0ann3l
· 5 years ago
e1c5ebc
NUTCH-2688: Some license headers that had gone unfixed
by r0ann3l
· 5 years ago
97a24a2
Merge branch 'master' into NUTCH-2688
by r0ann3l
· 5 years ago
a6ead23
NUTCH-2688: License headers for Java classes.
by r0ann3l
· 5 years ago
8bdec5e
NUTCH-2698 Remove sonar build task from build.xml (#443)
by Lewis John McGibbney
· 5 years ago
2ca3d89
Merge pull request #442 from apache/revert-441-NUTCH-2697
by Sebastian Nagel
· 5 years ago
5fc56b6
Revert "NUTCH-2697: Upgrade Ivy to fix the issue of an unset packaging.type property. (#441)"
by Sebastian Nagel
· 5 years ago
revert-441-NUTCH-2697
0b0fcea
NUTCH-2697: Upgrade Ivy to fix the issue of an unset packaging.type property. (#441)
by Chris Gavin
· 5 years ago
dfd8602
Merge pull request #430 from sbatururimi/NUTCH-2676
by Sebastian Nagel
· 5 years ago
8f421a4
NUTCH-2676 Update to the latest selenium and add code to use chrome and firefox headless mode with the remote web driver
by Stas Batururimi
· 5 years ago
23fb95c
NUTCH-2696 Nutch SegmentReader does not dump non-ASCII characters with Hadoop 3.x
by Sebastian Nagel
· 5 years ago
f7fdca3
NUTCH-2692 Removing previously accidentally added file
by Markus Jelsma
· 5 years ago
0085ee7
Merge branch 'master' of https://gitbox.apache.org/repos/asf/nutch
by Markus Jelsma
· 5 years ago
3fa2f4a
NUTCH-2692 Subcollection to support case-insensitive white and black lists
by Markus Jelsma
· 5 years ago
89c41e1
NUTCH-2692 Subcollection to support case-insensitive white and black lists
by Markus Jelsma
· 5 years ago
78af89f
Merge pull request #436 from r0ann3l/NUTCH-2684
by Sebastian Nagel
· 5 years ago
e95c915
Merge pull request #437 from sebastian-nagel/NUTCH-2693-misspelled-properties
by Sebastian Nagel
· 5 years ago
8cf9e67
Merge pull request #370 from sebastian-nagel/NUTCH-2627-fetcher-filter-urls
by Sebastian Nagel
· 5 years ago
546237d
NUTCH-2627 Fetcher to optionally filter URLs
by Sebastian Nagel
· 6 years ago
fd31cea
Merge branch 'NUTCH-2695', closes #438
by Sebastian Nagel
· 5 years ago
31ecf64
NUTCH-2695: fix some alerts raised by LGTM
by Sebastian Nagel
· 5 years ago
3abe7db
NUTCH-2695: fix some alerts raised by LGTM
by Malcolm Taylor
· 5 years ago
33922fe
NUTCH-2694 HostDB to aggregate by long instead of integer
by Markus Jelsma
· 5 years ago
6462821
fixed some spelling mistakes and formatting
by Ayal Ciobotaru
· 5 years ago
4787d40
NUTCH-2693 Misspelled configuration property names in documentation
by Sebastian Nagel
· 6 years ago
da8f3f5
Merge pull request #432 from sebastian-nagel/NUTCH-2689-urlfilter-regex-speed-up
by Sebastian Nagel
· 5 years ago
cb580f0
Merge pull request #434 from YossiTamari/patch-3
by Sebastian Nagel
· 5 years ago
e27eb65
Merge branch 'master' into NUTCH-2684
by r0ann3l
· 5 years ago
dd067c5
NUTCH-2684: Outdated documentation.
by r0ann3l
· 5 years ago
a326284
NUTCH-2684: README.md file for index writer plugins.
by r0ann3l
· 5 years ago
6e000e1
Merge pull request #429 from r0ann3l/NUTCH-2685
by Sebastian Nagel
· 5 years ago
010c2fc
NUTCH-2691: Improve logging from scoring-depth plugin
by YossiTamari
· 5 years ago
f87b19b
NUTCH-2689 Speed up urlfilter-regex and urlfilter-automaton
by Sebastian Nagel
· 5 years ago
6934d52
Merge pull request #424 from sebastian-nagel/NUTCH-2682-upgrade-tika
by Sebastian Nagel
· 5 years ago
86aac2d
Merge pull request #428 from r0ann3l/NUTCH-2686
by Roannel Fernández Hernández
· 5 years ago
dd8f64e
NUTCH-2686: Improving description of moreIndexingFilter.mapMimeTypes.field property.
by Roannel Fernández Hernández
· 5 years ago
0c18f6c
Merge pull request #426 from sebastian-nagel/NUTCH-2680
by Sebastian Nagel
· 5 years ago
ac2d578
Merge pull request #400 from jorgelbg/jexl-improve-syntax
by Sebastian Nagel
· 5 years ago
64a6f7c
NUTCH-2678 Allow for per-host configurable protocol plugin
by Markus Jelsma
· 5 years ago
9cc076f
NUTCH-2687 Regex for reading title from Content-Disposition is wrong
by Markus Jelsma
· 5 years ago
a4dd1f7
added description and default values
by Ayal Ciobotaru
· 5 years ago
636f576
NUTCH-2685: README.md file for exchange-jexl plugin.
by r0ann3l
· 5 years ago
6e8c2c3
fixes for NUTCH-2686: New property: "moreIndexingFilter.mapMimeTypes.field", indicating the name of the field where the mapped mime type must be written.
by r0ann3l
· 5 years ago
da2a673
Missed this commit, constant type error
by Ayal Ciobotaru
· 5 years ago
f5b5d2c
Changes made per r0ann3l's review, added doc limit, made other small changes
by Ayal Ciobotaru
· 5 years ago
4efed0a
added one sentence description for Kafka indexer
by Ayal Ciobotaru
· 6 years ago
8e511a1
fix for NUTCH-2631 contributed by AyalCiobotaru
by Ayal Ciobotaru
· 6 years ago
13a9a6d
NUTCH-2666 Increase default value for http.content.limit / ftp.content.limit / file.content.limit
by Sebastian Nagel
· 5 years ago
9ae7a80
NUTCH-2680 Documentation: https supported by multiple protocol plugins not only httpclient
by Sebastian Nagel
· 5 years ago
3958d0c
NUTCH-2683 DeduplicationJob: add option to prefer https:// over http://
by Sebastian Nagel
· 5 years ago
d0a4abf
NUTCH-2475 If and else-if branches has the same condition
by Sebastian Nagel
· 5 years ago
6274083
Merge pull request #371 from sebastian-nagel/NUTCH-2628-fetcher-signature
by Sebastian Nagel
· 5 years ago
2ae86d3
NUTCH-2628 Fetcher: optionally generate signature of unparsed content
by Sebastian Nagel
· 8 years ago
58ef2da
Merge pull request #422 from sebastian-nagel/NUTCH-2657-http-headers-crlf
by Sebastian Nagel
· 5 years ago
3ab0227
Merge pull request #398 from jorgelbg/doc-indexer-links
by Sebastian Nagel
· 5 years ago
784aa5f
NUTCH-2682 Upgrade to Tika 1.20
by Sebastian Nagel
· 5 years ago
241ad5a
Merge pull request #1 from apache/master
by Ayal Ciobotaru
· 5 years ago
122648e
NUTCH-2657 Protocol-http to store HTTP response header with "\r\n"
by Sebastian Nagel
· 6 years ago
43d26ce
Merge pull request #407 from sebastian-nagel/NUTCH-2674-hostdb-dump-header
by Sebastian Nagel
· 5 years ago
a965cd2
NUTCH-2668 Integrate OWASP dependency checks as ant target
by Sebastian Nagel
· 5 years ago
785a52f
Merge pull request #401 from sebastian-nagel/dependency-check
by Sebastian Nagel
· 5 years ago
a37bde1
NUTCH-1842: crawl.gen.delay value is read incorrectly from config
by Sebastian Nagel
· 5 years ago
ee9ff89
Merge pull request #392 from sebastian-nagel/NUTCH-2606-mime-detection-plain-text
by Sebastian Nagel
· 5 years ago
f861c82
NUTCH-1842: crawl.gen.delay value is read incorrectly from config
by Sebastian Nagel
· 5 years ago
e6a961c
NUTCH-2671 Upgrade to ant ivy library
by Sebastian Nagel
· 5 years ago
393d3e5
NUTCH-2671 Upgrade to ant ivy library
by Sebastian Nagel
· 5 years ago
93b1a81
NUTCH-2671 Upgrade to ant ivy library
by Sebastian Nagel
· 5 years ago
a5df63a
NUTCH-2658 Adding the fields required by the index-links plugin to the schema
by Jorge Luis Betancourt
· 5 years ago
31a1ec4
NUTCH-2651 Upgrade to Tika 1.19.1 (from 1.18)
by Sebastian Nagel
· 5 years ago
2d48152
NUTCH-2661 Move the TestOutlinks class into the o.a.n.parse path
by Jorge Luis Betancourt Gonzalez
· 6 years ago
d45fb7a
NUTCH-2660 Plugin tests not executed
by Sebastian Nagel
· 6 years ago