Skip directory entries in RecursiveCopyableDataset to fix IOException on empty source dirs (#4181)

* [GOBBLIN-XXXX] Skip directory entries in RecursiveCopyableDataset to avoid IOException on empty source dirs

When source.path is an empty directory, FileListUtils.listFilesToCopyAtPath (includeEmptyDirectories=true) returns the directory itself as the sole FileStatus entry. The subsequent call to resolveReplicatedOwnerAndPermissionsRecursively passes file.getPath().getParent() as fromPath, which is *above* replacedPrefix, inverting the ancestry check and throwing an IOException.

Skip FileStatus entries where isDirectory=true; empty source directories produce no copy work units by design. Log a warning so operators can diagnose misconfigured source paths.

* [ETL-19035] Fix ancestor path resolution for empty source directories in RecursiveCopyableDataset

When includeEmptyDirectories=true and the source root is empty, FileListUtils returns the root directory itself as a FileStatus entry. Calling .getParent() on it produces a path above replacedPrefix, breaking the ancestry check in resolveReplicatedOwnerAndPermissionsRecursively.

Fix: guard with isAncestor(replacedPrefix, parentPath), falling back to the file's own path only when parentPath is above the dataset root. This preserves correct ancestor permission replication for nested empty subdirs (where .getParent() is still within replacedPrefix) and ensures the empty root directory is replicated at the destination rather than skipped.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
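
The ancestry guard described in the commit message can be illustrated with a minimal sketch. This is not the Gobblin implementation: it uses java.nio.file.Path instead of Hadoop's Path so that it runs without dependencies, and the class AncestorGuardSketch and the method resolveFromPath are hypothetical names introduced only for illustration. The local isAncestor helper is assumed to treat a path as an ancestor of itself.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical illustration of the guard; not the actual Gobblin code.
public class AncestorGuardSketch {

    // Assumption: a path counts as an ancestor of itself.
    // Path.startsWith compares whole name components, not raw string prefixes.
    static boolean isAncestor(Path possibleAncestor, Path path) {
        return path.normalize().startsWith(possibleAncestor.normalize());
    }

    // Chooses the path from which ancestor permissions would be resolved.
    // Uses the parent when it is still inside the dataset root (replacedPrefix);
    // otherwise falls back to the entry's own path.
    static Path resolveFromPath(Path replacedPrefix, Path entry) {
        Path parent = entry.getParent();
        if (parent != null && isAncestor(replacedPrefix, parent)) {
            return parent;
        }
        return entry;
    }

    public static void main(String[] args) {
        Path root = Paths.get("/data/source");

        // Empty dataset root: its parent (/data) is above the root,
        // so the root itself is used.
        System.out.println(resolveFromPath(root, root));

        // Nested empty subdirectory: its parent is still inside the root,
        // so the parent is used as before.
        System.out.println(resolveFromPath(root, Paths.get("/data/source/a/empty")));
    }
}
```

With this guard, only the dataset root takes the fallback branch; every entry below the root keeps resolving from its parent, which matches the behaviour the commit message describes for nested empty subdirectories.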
Apache Gobblin is a highly scalable data management solution for structured and byte-oriented data in heterogeneous data ecosystems.
If building the distribution with tests turned on:
If you are going to build Gobblin from the source distribution, run one of the following commands to download gradle-wrapper.jar from the Gobblin Git repository into the gradle/wrapper directory (replace GOBBLIN_VERSION in the URL with the version you downloaded).
wget --no-check-certificate -P gradle/wrapper https://github.com/apache/gobblin/raw/${GOBBLIN_VERSION}/gradle/wrapper/gradle-wrapper.jar
(or)
curl --insecure -L https://github.com/apache/gobblin/raw/${GOBBLIN_VERSION}/gradle/wrapper/gradle-wrapper.jar > gradle/wrapper/gradle-wrapper.jar
Alternatively, you can download it manually from: https://github.com/apache/gobblin/blob/${GOBBLIN_VERSION}/gradle/wrapper/gradle-wrapper.jar
Make sure that you download it to the gradle/wrapper directory.
To run Apache RAT:
./gradlew rat
The report will be generated under build/rat/rat-report.html.

To build the distribution while skipping tests and static checks:
./gradlew build -x findbugsMain -x test -x rat -x checkstyleMain
The distribution will be created in the build/gobblin-distribution/distributions directory.
(or)
To build the distribution with tests turned on:
./gradlew build
The distribution will be created in the build/gobblin-distribution/distributions directory.