Reading JSON in Spark

Reading a typical JSON file in Spark may fail with an error about corrupt records. Spark looks for a corrupt-record column in the schema (by default, _corrupt_record) where it can dump JSON records that it cannot parse; if it does not find one, the read fails.

Here is one way to handle this problem:

https://docs.azuredatabricks.net/_static/notebooks/read-csv-corrupt-record.html
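In short, that approach supplies an explicit schema that includes a corrupt-record column, so Spark can park unparseable rows there instead of failing. Below is a minimal sketch of the same idea for JSON; the id and name fields are made up for illustration, and the rest follows the standard PySpark reader options:

from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Explicit schema; the extra _corrupt_record column receives rows that fail to parse
schema = StructType([
    StructField("id", IntegerType(), True),    # hypothetical field
    StructField("name", StringType(), True),   # hypothetical field
    StructField("_corrupt_record", StringType(), True)
])

df = (spark.read
      .schema(schema)
      .option("mode", "PERMISSIVE")  # keep bad rows instead of failing the read
      .option("columnNameOfCorruptRecord", "_corrupt_record")
      .json("/FileStore/tables/trythisjsonfmtd.json"))

display(df)  # rows that failed to parse show up with _corrupt_record populated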

The other option is to convert the JSON so that each line holds one complete JSON object: the objects are not separated by "," and the parser can expect a complete JSON document on every line. This line-per-record layout (JSON Lines) is what Spark's JSON reader expects by default.
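A quick way to do that conversion outside Spark, assuming the input file holds a single top-level JSON array (the file names here are placeholders):

import json

# Convert a JSON array file into JSON Lines: one object per line, no commas between records
with open("input.json") as src, open("output.json", "w") as dst:
    for record in json.load(src):              # parse the whole array into memory
        dst.write(json.dumps(record) + "\n")   # write each object on its own line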

This line-per-record behavior comes from systems like Hive, where the JSON objects are stored as the values of a single column, one record per row.

See the code below.

# File location and type
file_location = "/FileStore/tables/trythisjsonfmtd.json"
file_type = "json"

# CSV options
infer_schema = "true"
first_row_is_header = "false"
delimiter = ","

# The applied options are for CSV files. For other file types, these will be ignored.
df = spark.read.format(file_type) \
    .option("inferSchema", infer_schema) \
    .option("header", first_row_is_header) \
    .option("sep", delimiter) \
    .load(file_location)

display(df)
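As an aside, if reformatting the file is not convenient, Spark 2.2 and later can parse a JSON document that spans multiple lines (for example, one large array) directly. A minimal sketch, reusing the file_location defined above:

# multiLine tells the reader the JSON is not one object per line
df_multiline = spark.read.option("multiLine", "true").json(file_location)

display(df_multiline)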