I'm trying to build a quick report using a Zeppelin notebook, fetching data from DynamoDB with Apache Spark.
A count runs fine, but beyond that I'm not able to run:
orders.take(1).foreach(println)
It fails with the following error:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0.0 in stage 5.0 (TID 5) had a not serializable result: org.apache.hadoop.io.Text
Serialization stack:
    - object not serializable (class: org.apache.hadoop.io.Text, value: )
    - field (class: scala.Tuple2, name: _1, type: class java.lang.Object)
    - object (class scala.Tuple2, (,{<<a rec dynamodb json>>}))
    - element of array (index: 0)
    - array (class [Lscala.Tuple2;, size 7)
How do I fix this? I tried typecasting the result, which failed:
asInstanceOf[Tuple2[Text, DynamoDBItemWritable]]
So I tried a filter:
orders.filter(_._1 != null)
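One workaround that's commonly suggested (a sketch, assuming `orders` is an `RDD[(Text, DynamoDBItemWritable)]` produced by `hadoopRDD` with the EMR DynamoDB connector) is to map the Hadoop writables to plain serializable types on the executors before calling an action, so that only `String`s travel back to the driver:

```scala
import org.apache.hadoop.io.Text
import org.apache.hadoop.dynamodb.DynamoDBItemWritable

// Map the non-serializable Hadoop writables to plain Strings on the
// executors; the task result is then serializable, so take() can ship
// it back to the driver.
val plainOrders = orders.map { case (_: Text, item: DynamoDBItemWritable) =>
  item.getItem.toString // the DynamoDB attribute map rendered as a String
}

plainOrders.take(1).foreach(println)
```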
I'm planning to convert the result to a DataFrame and register it as a temp table, so I can run ad-hoc queries on it.
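For that step, a minimal sketch (assuming Spark 2.x with a `SparkSession` named `spark`, and an `RDD[String]` of DynamoDB JSON records, hypothetically called `jsonRecords`, obtained by first mapping the writables to strings) could look like:

```scala
import spark.implicits._

// Let Spark infer a schema from the JSON strings, then expose the result
// to SQL as a temp view for ad-hoc queries.
val ordersDF = spark.read.json(jsonRecords.toDS())
ordersDF.createOrReplaceTempView("orders")

spark.sql("SELECT COUNT(*) FROM orders").show()
```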
I'm not a complete Spark expert, but I know that anything that gets parallelized needs to be serializable. I think there might be a clue in the error message:
object not serializable (class: org.apache.hadoop.io.Text, value: )
A quick check on the definition of the class tells me it may not be:
public class Text extends BinaryComparable implements WritableComparable<BinaryComparable>
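That signature shows `Text` implements Hadoop's `WritableComparable`, but not `java.io.Serializable`, which is what Spark's default Java serializer requires when shipping task results to the driver. A minimal, pure-Scala illustration of the distinction (using a stand-in class, not the real `Text`):

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Stand-in for a Writable-style class: it holds data but does NOT implement
// java.io.Serializable (just like org.apache.hadoop.io.Text).
class WritableLike(val value: String)

// An ordinary case class, which IS Serializable.
case class PlainRecord(value: String)

// Returns true if Java serialization (Spark's default for task results)
// can write the object.
def javaSerializable(obj: AnyRef): Boolean =
  try {
    new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(obj)
    true
  } catch {
    case _: NotSerializableException => false
  }
```

This is why converting each record to a plain type (a `String` or a case class) before collecting makes the action succeed.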
This may help: