I have a CSV file that I am trying to load with the Spark CSV package, but the data does not load properly because a few of the fields contain newline characters (\n),
e.g. the following two rows:
"XYZ", "Test Data", "TestNew\nline", "OtherData"
"XYZ", "Test Data", "blablablabla
\nblablablablablalbal", "OtherData"
I am using the following code, which is straightforward. I set parserLib to univocity because I read online that it solves the problem of newlines inside fields, but that does not seem to be the case for me.
SQLContext sqlContext = new SQLContext(sc);
DataFrame df = sqlContext.read()
    .format("com.databricks.spark.csv")
    .option("inferSchema", "true")
    .option("header", "true")
    .option("parserLib", "univocity")
    .load("data.csv");
How do I replace newlines that occur inside quoted fields? Is there an easier way?
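
For reference, the kind of pre-processing this would involve can be sketched roughly as below. It is only an illustration under several assumptions: the fields are quoted with double quotes, an embedded quote is written as a doubled quote (""), the newlines are real line-break characters rather than a literal backslash followed by n, and the file is small enough to read into memory. The class and method names (QuotedNewlineCleaner, stripNewlinesInQuotes) are made up for this example and are not part of Spark or spark-csv.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of Spark or spark-csv.
final class QuotedNewlineCleaner {

    // Replaces each line break found inside a double-quoted field with one space.
    // Line breaks outside quotes (the real row separators) are kept unchanged.
    static String stripNewlinesInQuotes(String csv) {
        StringBuilder out = new StringBuilder(csv.length());
        boolean inQuotes = false;
        for (int i = 0; i < csv.length(); i++) {
            char c = csv.charAt(i);
            if (c == '"') {
                // A doubled quote ("") toggles twice, so the state is unchanged.
                inQuotes = !inQuotes;
                out.append(c);
            } else if (inQuotes && (c == '\n' || c == '\r')) {
                // Treat \r\n as a single line break.
                if (c == '\r' && i + 1 < csv.length() && csv.charAt(i + 1) == '\n') {
                    i++;
                }
                out.append(' ');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Usage: java QuotedNewlineCleaner <input.csv> <output.csv>
    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: QuotedNewlineCleaner <input.csv> <output.csv>");
            return;
        }
        // Assumption: the file is UTF-8 and fits in memory.
        String raw = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        String cleaned = stripNewlinesInQuotes(raw);
        Files.write(Paths.get(args[1]), cleaned.getBytes(StandardCharsets.UTF_8));
    }
}
```

The cleaned file could then be passed to load(...) in place of data.csv. This changes the field contents (line breaks become spaces), so it only helps if the original line breaks do not need to be preserved.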