Apache Nutch README

For the latest information about Nutch, please visit our website at:

and our wiki, at:

To get started using Nutch, read the Tutorial:


To contribute a patch, follow these instructions (note that installing Hub is not strictly required, but is recommended).

0. Download and install Hub
1. File a JIRA issue for your fix at
- you will get an issue id NUTCH-xxx, where xxx is the issue number.
2. git clone
3. cd nutch
4. git checkout -b NUTCH-xxx
5. edit files (please try and include a test case if possible)
6. git status (make sure it shows what files you expected to edit)
7. Make sure that your code complies with the Nutch code formatting template, which is basically two-space indents
8. git add <files>
9. git commit -m "fix for NUTCH-xxx contributed by <your username>"
10. git fork (requires Hub)
11. git push -u <your git username> NUTCH-xxx
12. git pull-request (requires Hub)

IDE setup

Generate Eclipse project files

ant eclipse

and follow the instructions in Importing existing projects.

For IntelliJ IDEA, first install the IvyIDEA plugin, then run ant eclipse.

Then open the project in IntelliJ. You may see popups like “Ant build scripts found”, “Frameworks detected - IvyIDEA Framework detected”. Just follow the simple steps in these dialogs.

You must configure nutch-site.xml before running. Make sure you've added and plugin.folders properties. The plugin.folders property normally points to <project_root>/build/plugins.
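A minimal sketch of setting plugin.folders from a shell in the project root. The conf/nutch-site.xml location, the PROJECT_ROOT and SITE_XML variables, and the use of an absolute path are assumptions for illustration, not part of the Nutch build. The script only sets plugin.folders, so any other required properties still have to be added by hand, and it refuses to overwrite an existing file.

```shell
#!/usr/bin/env sh
# Sketch: write a nutch-site.xml that sets plugin.folders to <project_root>/build/plugins.
# Assumptions: project root = current directory unless PROJECT_ROOT is set;
# target file = conf/nutch-site.xml unless SITE_XML is set (both hypothetical defaults).
PROJECT_ROOT="${PROJECT_ROOT:-$PWD}"
SITE_XML="${SITE_XML:-conf/nutch-site.xml}"

if [ -e "$SITE_XML" ]; then
  # Never clobber an existing configuration; merge the property in manually instead.
  echo "$SITE_XML already exists; add plugin.folders to it by hand" >&2
else
  mkdir -p "$(dirname "$SITE_XML")"
  # Unquoted EOF so that $PROJECT_ROOT is expanded into the value.
  cat > "$SITE_XML" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>plugin.folders</name>
    <value>$PROJECT_ROOT/build/plugins</value>
  </property>
</configuration>
EOF
  echo "wrote $SITE_XML"
fi
```

Using an absolute path avoids depending on which directories the IDE puts on the classpath; the remaining properties from the sentence above still need to be added to the same file.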

Now create a Java Application run configuration, choose org.apache.nutch.crawl.Injector as the main class, and add two paths as program arguments: the first is the crawldb directory, the second is the URL directory from which the injector reads URLs. Then run the configuration.
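The two arguments could be prepared as follows. This is only a sketch: the directory names crawl/crawldb and urls, the file name seed.txt and the seed URL are hypothetical placeholders. The script creates a URL directory containing one URL per line and prints absolute paths that can be pasted into the Program arguments field of the run configuration.

```shell
#!/usr/bin/env sh
# Sketch: prepare the Injector's two arguments (crawldb first, URL directory second).
# All names below are hypothetical placeholders; adjust them to your setup.
CRAWLDB="crawl/crawldb"
URL_DIR="urls"

mkdir -p "$URL_DIR"
# Plain-text seed list with one URL per line.
printf '%s\n' "https://nutch.apache.org/" > "$URL_DIR/seed.txt"

# Absolute paths, in argument order, for the "Program arguments" field.
# (Paths containing spaces would need quoting there.)
echo "$PWD/$CRAWLDB $PWD/$URL_DIR"
```

Printing absolute paths sidesteps any question about which working directory the IDE uses to resolve relative arguments.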

If you still see the error message No plugins found on paths of property plugin.folders="plugins", you can update plugin.folders in nutch-default.xml. This is a quick fix, but it should be avoided; set plugin.folders in nutch-site.xml instead.
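Before resorting to that workaround, it may help to check that the directory plugin.folders is supposed to point to actually exists and is not empty. This is a minimal sketch that assumes the project root is the current directory (or PROJECT_ROOT). PLUGIN_STATUS is a hypothetical variable name used only in this script.

```shell
#!/usr/bin/env sh
# Sketch: check that <project_root>/build/plugins exists and contains something.
# Assumption: project root = current directory unless PROJECT_ROOT is set.
PROJECT_ROOT="${PROJECT_ROOT:-$PWD}"
PLUGINS_DIR="$PROJECT_ROOT/build/plugins"

if [ -d "$PLUGINS_DIR" ] && [ -n "$(ls -A "$PLUGINS_DIR")" ]; then
  PLUGIN_STATUS="ok"
  echo "Found plugins in $PLUGINS_DIR; point plugin.folders at this path in nutch-site.xml"
else
  PLUGIN_STATUS="missing"
  # An empty or absent directory usually means the project has not been built yet.
  echo "$PLUGINS_DIR is missing or empty; build the project first" >&2
fi
```

If the directory is missing, editing plugin.folders will not help until the plugins have been built.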