[Bug] Fix skip() API possibly skipping an unexpected number of bytes, which causes inconsistent data (#40) (#52)
### What changes were proposed in this pull request?
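Fix a bug where a single call to `inputstream.skip()` may skip fewer bytes than requested. The reader now checks the return value and retries until the full offset has been skipped.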
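The retry idea can be sketched in isolation as follows. This is a minimal illustration, not the code in `LocalFileReader`: the names `SkipFully`, `skipFully` and `OneByteSkipStream` are hypothetical, and unlike the patch it treats a return value of 0 as the possible EOF signal, since the `InputStream.skip()` Javadoc describes returning the number of bytes actually skipped (possibly 0) rather than -1.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

final class SkipFully {

  // Skips exactly n bytes, or throws EOFException if the stream ends first.
  static void skipFully(InputStream in, long n) throws IOException {
    long remaining = n;
    while (remaining > 0) {
      long skipped = in.skip(remaining);
      if (skipped > 0) {
        remaining -= skipped;
        continue;
      }
      // No progress: probe with read() to tell a short skip from a real EOF.
      if (in.read() == -1) {
        throw new EOFException("EOF with " + remaining + " bytes left to skip");
      }
      remaining--; // the probe consumed one byte
    }
  }

  // Hypothetical test stream that never skips more than one byte per call.
  static final class OneByteSkipStream extends FilterInputStream {
    OneByteSkipStream(InputStream in) {
      super(in);
    }

    @Override
    public long skip(long n) throws IOException {
      return super.skip(Math.min(n, 1));
    }
  }

  public static void main(String[] args) throws IOException {
    byte[] data = {10, 20, 30, 40, 50};
    InputStream in = new OneByteSkipStream(new ByteArrayInputStream(data));
    skipFully(in, 3);
    System.out.println(in.read()); // prints 40
  }
}
```

With a single `in.skip(3)` call on the same stream, only one byte would be skipped and the next read would return 20 instead of 40.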
### Why are the changes needed?
The following exception was observed; it may be caused by unexpected data read from `Local` storage:
```
com.tencent.rss.common.exception.RssException: Unexpected crc value for blockId[9992363390829154], expected:2562548848, actual:2244862586
at com.tencent.rss.client.impl.ShuffleReadClientImpl.readShuffleBlockData(ShuffleReadClientImpl.java:184)
at org.apache.spark.shuffle.reader.RssShuffleDataIterator.hasNext(RssShuffleDataIterator.java:99)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
```
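One plausible way a short skip ends in this exception, assuming the client verifies a CRC32 over the bytes it receives: the read starts at the wrong offset, returns the right number of bytes but the wrong content, and the checksum no longer matches. The sketch below only illustrates that chain; `CrcMismatchDemo`, its data and its offsets are made up and are not taken from the RSS code base.

```java
import java.util.zip.CRC32;

final class CrcMismatchDemo {

  // CRC32 over data[offset, offset + length).
  static long crc(byte[] data, int offset, int length) {
    CRC32 crc = new CRC32();
    crc.update(data, offset, length);
    return crc.getValue();
  }

  public static void main(String[] args) {
    byte[] file = {1, 2, 3, 4, 5, 6, 7, 8};
    int blockOffset = 4; // where the block really starts
    int blockLength = 4;

    // Checksum recorded when the block was written.
    long expected = crc(file, blockOffset, blockLength);

    // A short skip leaves the stream at offset 2 instead of 4,
    // so the same length is read from the wrong position.
    long actual = crc(file, 2, blockLength);

    System.out.println("expected:" + expected + ", actual:" + actual);
    System.out.println("match: " + (expected == actual)); // prints "match: false"
  }
}
```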
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
With the existing UTs.
diff --git a/storage/src/main/java/com/tencent/rss/storage/handler/impl/LocalFileReader.java b/storage/src/main/java/com/tencent/rss/storage/handler/impl/LocalFileReader.java
index 8ff2f5f..f16846a 100644
--- a/storage/src/main/java/com/tencent/rss/storage/handler/impl/LocalFileReader.java
+++ b/storage/src/main/java/com/tencent/rss/storage/handler/impl/LocalFileReader.java
@@ -42,7 +42,22 @@
public byte[] read(long offset, int length) {
try {
- dataInputStream.skip(offset);
+ long targetSkip = offset;
+ // From the InputStream.skip() Javadoc:
+ // "The skip method may, for a variety of reasons,
+ // end up skipping over some smaller number of bytes, possibly 0."
+ // So check the result and retry until the expected length has been skipped.
+ while (targetSkip > 0) {
+ long realSkip = dataInputStream.skip(targetSkip);
+ if (realSkip == -1) {
+ throw new RuntimeException("Unexpected EOF when skip bytes");
+ }
+ targetSkip -= realSkip;
+ if (targetSkip > 0) {
+ LOG.warn("Got unexpected skip for path:" + path + " with offset["
+ + offset + "], length[" + length + "], remain[" + targetSkip + "]");
+ }
+ }
byte[] buf = new byte[length];
dataInputStream.readFully(buf);
return buf;