Right now I have one Delta table with 1 partition, and inside that partition there are 2 Parquet files.
If I read the data as:
val df = spark.read.format("delta").load("./test1510/table@v1")
then I get the latest data with 10,000 rows, and if I read:
val df = spark.read.format("delta").load("./test1510/table@v0")
then I get 612 rows. My question is: how can I view only the new rows that were added in version 1, i.e. 10,000 - 612 = 9,388 rows?
In short, at each version I just want to see which data changed. In the `_delta_log` directory I can see the JSON files, and inside each JSON file I can see that a separate Parquet file is created for each version, but how can I view this in code?
I am using Spark with Scala.
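For context, here is a minimal sketch of the kind of diff I mean, using the same table paths as above. It assumes version 1 only appended rows (no updates or deletes), so a set difference between the two versions would yield exactly the added rows:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("delta-version-diff")
  .master("local[*]")
  .getOrCreate()

// Read both versions of the table (using the @vN path syntax shown above).
val v1 = spark.read.format("delta").load("./test1510/table@v1")
val v0 = spark.read.format("delta").load("./test1510/table@v0")

// Rows present in v1 but not in v0 -- if version 1 was append-only,
// this should be the 9,388 newly added rows.
val added = v1.except(v0)
added.show()
```

But this full-table `except` feels expensive, since the Delta log already records exactly which Parquet files each commit added. Is there a way to read just those files instead?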