I am trying something similar to this question: how to initialize cluster centers for K-means in Spark MLlib? However, I don't totally understand the solution. When I try to add more centroids, I get this error:
Exception in thread "main" java.lang.IllegalArgumentException: requirement failed: mismatched cluster count
This is what I use:
val initialModel = new KMeansModel(
  Array("[0.6, 0.6, 5.0]", "[8.0, 8.0, 1.0]", "[11, 9.0, 7.0]").map(Vectors.parse(_))
)

val model = new KMeans()
  .setInitialModel(initialModel)
  .setK(3)
  .run(data)
By default, KMeans sets k to 2, so setInitialModel is called while k is still 2 and the check fails. It's easily fixed by setting k before setting the initial model (KMeansModel.k and KMeans.k must coincide):
val initialModel = new KMeansModel(
  Array("[0.6, 0.6, 5.0]", "[8.0, 8.0, 1.0]", "[11, 9.0, 7.0]").map(Vectors.parse(_))
)

val model = new KMeans()
  .setK(3)
  .setInitialModel(initialModel)
  .run(data)
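For context, here is a minimal self-contained sketch of the same fix. Only the KMeansModel / setK / setInitialModel usage comes from the snippets above; the object name, the local SparkContext setup, and the toy data points are assumptions added for illustration.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.{KMeans, KMeansModel}
import org.apache.spark.mllib.linalg.Vectors

object KMeansInitialModelExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("kmeans-initial-model").setMaster("local[*]"))

    // Toy data roughly grouped around the three chosen centers (values made up for illustration).
    val data = sc.parallelize(Seq(
      Vectors.dense(0.5, 0.7, 5.1), Vectors.dense(0.7, 0.5, 4.9),
      Vectors.dense(8.1, 7.9, 1.2), Vectors.dense(7.9, 8.2, 0.8),
      Vectors.dense(11.2, 8.8, 7.1), Vectors.dense(10.8, 9.1, 6.9)
    )).cache()

    // Custom starting centroids, parsed from the same string format used above.
    val initialModel = new KMeansModel(
      Array("[0.6, 0.6, 5.0]", "[8.0, 8.0, 1.0]", "[11, 9.0, 7.0]").map(Vectors.parse(_))
    )

    // setK(3) comes first so that KMeans.k already matches initialModel.k
    // when setInitialModel runs its consistency check.
    val model = new KMeans()
      .setK(3)
      .setInitialModel(initialModel)
      .run(data)

    model.clusterCenters.foreach(println)
    sc.stop()
  }
}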