Saturday, April 30, 2016

spark: read an unsupported mix of union types in avro file

By Hường Hana, 11:00 PM. Tags: apache-spark, apache-spark-sql, scala, spark-avro

I'm trying to switch from reading CSV flat files to reading Avro files in Spark. Following https://github.com/databricks/spark-avro, I use:

import com.databricks.spark.avro._
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val df = sqlContext.read.avro("gs://logs.xyz.com/raw/2016/04/20/div1/div2/2016-04-20-08-28-35.UTC.blah-blah.avro")

and get:

java.lang.UnsupportedOperationException: This mix of union types is not supported (see README): ArrayBuffer(STRING)

The README states clearly:

This library supports reading all Avro types, with the exception of complex union types. It uses the following mapping from Avro types to Spark SQL types:
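To make the union rule concrete, here is a minimal sketch in plain Scala. It is a toy model, not the Avro or spark-avro API: the types `AvroType`, `Primitive`, `Union`, `Field`, `Record` and the helper `UnionCheck` are all hypothetical names introduced for illustration. It assumes, as I understand the README, that the only unions treated as simple are a type paired with null, int with long, and float with double. The `ArrayBuffer(STRING)` in the error above hints that the failing field may be a union with a single string branch and no null; the sketch treats that case as unsupported, which is an inference from the message rather than something the README spells out.

```scala
// Toy model of Avro schemas. These are NOT Avro or spark-avro classes;
// they carry just enough structure to illustrate the union rule.
sealed trait AvroType
case class Primitive(name: String) extends AvroType // "string", "int", "null", ...
case class Union(branches: List[AvroType]) extends AvroType
case class Field(name: String, tpe: AvroType)
case class Record(name: String, fields: List[Field]) extends AvroType

object UnionCheck {
  private val NullType: AvroType = Primitive("null")

  // Assumed rule: after dropping a null branch, a union is "simple" if
  //   - exactly one branch remains AND a null branch was present, or
  //   - the remaining branches are {int, long} or {float, double}.
  // Everything else, including a one-branch union without null, is
  // treated as unsupported in this model.
  def isSimpleUnion(u: Union): Boolean = {
    val hasNull = u.branches.contains(NullType)
    val rest = u.branches.filterNot(_ == NullType)
    rest match {
      case List(_) => hasNull
      case List(Primitive(a), Primitive(b)) =>
        Set(a, b) == Set("int", "long") || Set(a, b) == Set("float", "double")
      case _ => false
    }
  }

  // Walks a schema and returns the dotted paths of fields whose union
  // does not pass isSimpleUnion. Nested records and union branches are
  // searched recursively.
  def unsupportedUnions(t: AvroType, path: String = ""): List[String] = t match {
    case Primitive(_) => Nil
    case u @ Union(branches) =>
      val here = if (isSimpleUnion(u)) Nil else List(path)
      here ++ branches.flatMap(b => unsupportedUnions(b, path))
    case Record(_, fields) =>
      fields.flatMap { f =>
        val p = if (path.isEmpty) f.name else path + "." + f.name
        unsupportedUnions(f.tpe, p)
      }
  }
}

object UnionCheckExample {
  // Hypothetical schema; the field names are made up and do not come
  // from the log file in the question.
  val schema: Record = Record("example", List(
    Field("id", Primitive("string")),
    Field("count", Union(List(Primitive("int"), Primitive("long")))),
    Field("payload", Record("payload_data", List(
      Field("note", Union(List(Primitive("null"), Primitive("string")))),
      Field("tag", Union(List(Primitive("string")))),
      Field("value", Union(List(Primitive("string"), Primitive("int"))))
    )))
  ))

  // Paths of the fields this model would flag as unsupported unions.
  val flagged: List[String] = UnionCheck.unsupportedUnions(schema)
}
```

Running the same kind of walk over the real schema (for instance over the JSON printed from the file header) would be one way to locate which field triggers the exception before attempting any workaround.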

When I read the same file as plain text, I can see the schema:

val df = sc.textFile("gs://logs.xyz.com/raw/2016/04/20/div1/div2/2016-04-20-08-28-35.UTC.blah-blah.avro")
df.take(2).foreach(println)

{"name":"log_record","type":"record","fields":[{"name":"request","type":{"type":"record","name":"request_data","fields":[{"name":"datetime","type":"string"},{"name":"ip","type":"string"},{"name":"host","type":"string"},{"name":"uri","type":"string"},{"name":"request_uri","type":"string"},{"name":"referer","type":"string"},{"name":"useragent","type":"string"}]}}

<------- an excerpt of the full reply ------->

Since I have little control over the format in which I receive these files, my question is: is there a workaround that someone has tested and can recommend?

I use Google Cloud Dataproc with:

MASTER=yarn-cluster spark-shell --num-executors 4 --executor-memory 4G --executor-cores 4 --packages com.databricks:spark-avro_2.10:2.0.1,com.databricks:spark-csv_2.11:1.3.0

Any help would be greatly appreciated.
