For the latest information about Nutch, please visit our website at:
and our wiki, at:
https://cwiki.apache.org/confluence/display/NUTCH/Home
To get started using Nutch, read the tutorial:
https://cwiki.apache.org/confluence/display/NUTCH/NutchTutorial
To contribute a patch, follow these instructions (note that installing Hub is not strictly required, but is recommended).
git clone https://github.com/apache/nutch.git
cd nutch
git checkout -b NUTCH-xxxx
git status (make sure it shows the files you expected to edit)
git add <files>
git commit -m "fix for NUTCH-xxxx contributed by <your username>"
hub fork (if hub is not installed, fork using the “fork” button on the Nutch GitHub project page)
git push -u <your git username> NUTCH-xxxx
hub pull-request (if hub is not installed, please follow the instructions to create a pull-request from a fork)
Pull requests run Apache Yetus test-patch for automated checks (style, reporting). See Basic Precommit and Usage Introduction. CI uses Java 17. To run test-patch locally (e.g. before opening a PR):
test-patch --basedir=/path/to/clean/repo --build-tool=nobuild \
  --plugins=all,-jira,-gitlab,-unit,-compile [patchfile]
Exclude patterns can be added in .yetus/excludes.txt (regex, one per line).
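To get a feel for the one-regex-per-line format, the sketch below writes a scratch excludes file and previews which paths its patterns would match. The patterns and paths are hypothetical, the file is written to a temporary directory rather than to .yetus/excludes.txt, and grep -E is only an approximation of how test-patch applies the patterns.

```shell
#!/bin/sh
# Sketch: preview which paths a set of exclude regexes would match.
# The patterns and paths below are hypothetical examples, and grep -E
# only approximates how test-patch itself applies the excludes file.
workdir="$(mktemp -d)"

# One extended regex per line, as in .yetus/excludes.txt.
cat > "$workdir/excludes.txt" <<'EOF'
^src/testresources/
\.min\.js$
EOF

# Candidate file paths, one per line.
cat > "$workdir/paths.txt" <<'EOF'
src/java/org/apache/nutch/crawl/Injector.java
src/testresources/fetch-test-site/index.html
docs/js/jquery.min.js
EOF

# Collect and print the paths matched by at least one pattern.
excluded="$(grep -E -f "$workdir/excludes.txt" "$workdir/paths.txt")"
echo "$excluded"
```

Paths that are printed would be skipped under this approximation; paths that are not printed would still be checked.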
Generate Eclipse project files
ant eclipse
and follow the instructions in Importing existing projects.
You must configure nutch-site.xml before running. Make sure you have added the http.agent.name and plugin.folders properties. plugin.folders normally points to <project_root>/build/plugins.
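As a minimal sketch of that configuration, the snippet below writes a nutch-site.xml containing only the two properties named above. The NUTCH_HOME variable, the agent name MyNutchCrawler, and the conf/ output location are assumptions for illustration. The snippet overwrites any existing file at that path, so adapt it to your checkout before running it.

```shell
#!/bin/sh
# Sketch: write a minimal nutch-site.xml with the two required properties.
# NUTCH_HOME, the agent name and the conf/ location are illustrative
# assumptions; an existing conf/nutch-site.xml would be overwritten.
NUTCH_HOME="${NUTCH_HOME:-$PWD}"
mkdir -p "$NUTCH_HOME/conf"

# Unquoted heredoc delimiter so that $NUTCH_HOME is expanded in the value.
cat > "$NUTCH_HOME/conf/nutch-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>http.agent.name</name>
    <value>MyNutchCrawler</value>
  </property>
  <property>
    <name>plugin.folders</name>
    <value>$NUTCH_HOME/build/plugins</value>
  </property>
</configuration>
EOF
```

Using an absolute path for plugin.folders avoids depending on the working directory the IDE launches from.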
Now create a Java Application Configuration, choose org.apache.nutch.crawl.Injector as the main class, and add two paths as arguments: first the crawldb directory, second the URL directory from which the injector reads URLs. Then run your configuration.
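The two arguments can be prepared as in the sketch below. The directory names crawl/crawldb and urls, the file name seed.txt, and the seed URL are assumptions for illustration. The final step runs the command-line equivalent of the Injector only if a local runtime already exists under runtime/local.

```shell
#!/bin/sh
# Sketch: prepare the two Injector arguments (crawldb dir, URL dir).
# Directory names, the seed file name and the seed URL are illustrative
# assumptions, not values required by Nutch.
CRAWLDB="crawl/crawldb"
URL_DIR="urls"

# The URL directory holds plain-text files with one seed URL per line.
mkdir -p "$URL_DIR"
printf '%s\n' "https://nutch.apache.org/" > "$URL_DIR/seed.txt"

# These are the program arguments for the Java Application Configuration.
echo "Injector arguments: $CRAWLDB $URL_DIR"

# Command-line equivalent, run only if a local runtime has been built.
if [ -x runtime/local/bin/nutch ]; then
  runtime/local/bin/nutch inject "$CRAWLDB" "$URL_DIR"
fi
```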
If you still see “No plugins found on paths of property plugin.folders=plugins”, you can update plugin.folders in nutch-default.xml. This is only a quick workaround and is not recommended.
First install the IvyIDEA plugin. Then run ant eclipse. This creates the .classpath and .project files so that IntelliJ can import the project.
In IntelliJ IDEA, select File > New > Project from Existing Sources. Select the Nutch home directory and click “Open”.
On the “Import Project” screen select the “Import project from external model” radio button and select “Eclipse”. Click “Create”. On the next screen the “Eclipse projects directory” should be already set to the nutch folder. Leave the “Create module files near .classpath files” radio button selected.
Click “Next” on the following screens. On the project SDK screen select Java 11 and click “Create”. N.B. On a Mac with Homebrew OpenJDK, use the directory under libexec: <openjdk11_directory>/libexec/openjdk.jdk/Contents/Home.
Once the project is imported, you will see a pop-up saying “Ant build scripts found”, “Frameworks detected - IvyIDEA Framework detected”. Click “Import”. If the pop-up does not appear, go through the steps again, as this happens from time to time. A second Ant pop-up asks you to configure the project. Do NOT click “Configure”.
To import the code style, go to IntelliJ IDEA > Preferences > Editor > Code Style > Java. In the Scheme dropdown select “Project”. Click the gear icon and select “Import Scheme” > “Eclipse XML file”. Select the eclipse-format.xml file and click “Open”. On the next screen check the “Current Scheme” checkbox and click “OK”.
Running in IntelliJ
Note: Because the Ant build system is separate from IntelliJ's own, you need to trigger a build through Ant manually to pick up your latest changes before running.