aggregate

 
First, let's see how parallelize splits the data between partitions:

val x = sc.parallelize(List("12","23","345","4567"), 2)
x.glom.collect
// Array[Array[String]] = Array(Array(12, 23), Array(345, 4567))

val y = sc.parallelize(List("12","23","345",""), 2)
y.glom.collect
// Array[Array[String]] = Array(Array(12, 23), Array(345, ""))
Next, define two helper functions:

def seqOp(x: String, y: String): String = math.min(x.length, y.length).toString
def combOp(x: String, y: String): String = x + y
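Note that seqOp returns a length rendered as a string, so on the next step it compares against the length of that string (1 for any single-digit result), not against the number it spells. A minimal check in plain Scala, with no Spark required; the value names first, second, third and merged are only illustrative:

```scala
def seqOp(x: String, y: String): String = math.min(x.length, y.length).toString
def combOp(x: String, y: String): String = x + y

// The zero value "" has length 0, so the first step always yields "0".
val first = seqOp("", "12")      // min(0, 2)
// "0" has length 1, so the next step compares 1 with the element's length.
val second = seqOp(first, "23")  // min(1, 2)
// An empty element pulls the accumulator back to "0".
val third = seqOp(second, "")    // min(1, 0)
// combOp simply concatenates two partition results.
val merged = combOp("1", "0")
```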
Now let's trace the execution of x.aggregate("")(seqOp, combOp). Ignoring parallelism, it can be represented as follows:

(combOp (seqOp (seqOp "" "12") "23") (seqOp (seqOp "" "345") "4567"))
(combOp (seqOp "0" "23") (seqOp (seqOp "" "345") "4567"))
(combOp "1" (seqOp (seqOp "" "345") "4567"))
(combOp "1" (seqOp "0" "4567"))
(combOp "1" "1")
"11"
The same thing for y:

(combOp (seqOp (seqOp "" "12") "23") (seqOp (seqOp "" "345") ""))
(combOp (seqOp "0" "23") (seqOp (seqOp "" "345") ""))
(combOp "1" (seqOp (seqOp "" "345") ""))
(combOp "1" (seqOp "0" ""))
(combOp "1" "0")
"10"
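Both traces can be reproduced without a cluster. The sketch below is not Spark's implementation: aggregateLocal is a hypothetical helper that folds each partition with seqOp and then merges the partial results with combOp, always in partition order. As far as I know, real Spark merges partition results in the order tasks finish, so with a non-commutative combOp like this one the y case could also come out as "01".

```scala
// Hypothetical stand-in for RDD.aggregate over already-split partitions.
// Assumption: partitions are merged strictly left to right.
def aggregateLocal[T, U](partitions: Seq[Seq[T]], zero: U)(
    seqOp: (U, T) => U, combOp: (U, U) => U): U = {
  // Fold every partition independently, starting from the zero value.
  val perPartition = partitions.map(_.foldLeft(zero)(seqOp))
  // Merge the partial results, again starting from the zero value.
  perPartition.foldLeft(zero)(combOp)
}

def seqOp(x: String, y: String): String = math.min(x.length, y.length).toString
def combOp(x: String, y: String): String = x + y

// Same split as x.glom.collect and y.glom.collect above.
val xParts = Seq(Seq("12", "23"), Seq("345", "4567"))
val yParts = Seq(Seq("12", "23"), Seq("345", ""))

val xResult = aggregateLocal(xParts, "")(seqOp, combOp)
val yResult = aggregateLocal(yParts, "")(seqOp, combOp)
```

The only difference between the two runs is the last element of the second partition: "4567" leaves the accumulator at "1", while "" drops it to "0".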

 
