I'm trying to build a quick report using a Zeppelin notebook, fetching data from DynamoDB with Apache Spark.
A count runs fine, but beyond that I'm not able to run:
orders.take(1).foreach(println)
It fails with the following error:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0 in stage 5.0 (TID 5) had a not serializable result: org.apache.hadoop.io.Text
Serialization stack:
    - object not serializable (class: org.apache.hadoop.io.Text, value: )
    - field (class: scala.Tuple2, name: _1, type: class java.lang.Object)
    - object (class scala.Tuple2, (,{<<a rec dynamodb json>>}))
    - element of array (index: 0)
    - array (class [Lscala.Tuple2;, size 7)
How do I fix this? I tried typecasting the result, which failed:
asInstanceOf[Tuple2[Text, DynamoDBItemWritable]]
So I tried a filter:
orders.filter(_._1 != null)
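One workaround that's commonly suggested (a sketch, assuming `orders` is an `RDD[(Text, DynamoDBItemWritable)]` produced by `hadoopRDD` with the EMR DynamoDB connector) is to map the Hadoop writables to plain serializable types on the executors before calling an action, so that only `String`s travel back to the driver:

```scala
import org.apache.hadoop.io.Text
import org.apache.hadoop.dynamodb.DynamoDBItemWritable

// Map the non-serializable Hadoop writables to plain Strings on the
// executors; the task result is then serializable, so take() can ship
// it back to the driver.
val plainOrders = orders.map { case (_: Text, item: DynamoDBItemWritable) =>
  item.getItem.toString // the DynamoDB attribute map rendered as a String
}

plainOrders.take(1).foreach(println)
```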
I'm planning to convert the result to a DataFrame and register it as a temp table, so I can run ad-hoc queries on it.
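For that step, a minimal sketch (assuming Spark 2.x with a `SparkSession` named `spark`, and an `RDD[String]` of DynamoDB JSON records, hypothetically called `jsonRecords`, obtained by first mapping the writables to strings) could look like:

```scala
import spark.implicits._

// Let Spark infer a schema from the JSON strings, then expose the result
// to SQL as a temp view for ad-hoc queries.
val ordersDF = spark.read.json(jsonRecords.toDS())
ordersDF.createOrReplaceTempView("orders")

spark.sql("SELECT COUNT(*) FROM orders").show()
```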
I'm not a complete Spark expert, but I know that anything that gets parallelized needs to be serializable. I think there might be a clue in the error message:
object not serializable (class: org.apache.hadoop.io.Text, value: )
A quick check on the definition of the class tells me it may not be:
public class Text extends BinaryComparable implements WritableComparable<BinaryComparable>
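That signature shows `Text` implements Hadoop's `WritableComparable`, but not `java.io.Serializable`, which is what Spark's default Java serializer requires when shipping task results to the driver. A minimal, pure-Scala illustration of the distinction (using a stand-in class, not the real `Text`):

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Stand-in for a Writable-style class: it holds data but does NOT implement
// java.io.Serializable (just like org.apache.hadoop.io.Text).
class WritableLike(val value: String)

// An ordinary case class, which IS Serializable.
case class PlainRecord(value: String)

// Returns true if Java serialization (Spark's default for task results)
// can write the object.
def javaSerializable(obj: AnyRef): Boolean =
  try {
    new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(obj)
    true
  } catch {
    case _: NotSerializableException => false
  }
```

This is why converting each record to a plain type (a `String` or a case class) before collecting makes the action succeed.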
This may help: