Train XGBoost with Spark

# XGBoost training script
# Launch spark-shell on the cluster with the XGBoost4J jars on the classpath.
# Note: --jars takes a comma-separated list, while --driver-class-path is a
# regular classpath and uses ':' as the separator on Linux.

spark-shell --name xxx \
  --num-executors 15 \
  --executor-cores 4 \
  --executor-memory 20G \
  --jars /tmp/xgboost4j-0.82.jar,/tmp/xgboost4j-spark-0.82.jar \
  --driver-class-path /tmp/xgboost4j-0.82.jar:/tmp/xgboost4j-spark-0.82.jar

# import dependencies
import org.apache.spark.sql.types._
import scala.collection.mutable.ArrayBuilder

Building K-Means with Spark

Industry applications of machine learning generally require the ability to handle massive datasets. Spark provides a machine learning library named MLlib that allows us to build machine learning models efficiently and in parallel.

This post starts with a Spark ML modelling example, K-Means, built with PySpark in Python, and explains the basic steps as well as the usage of the Spark APIs involved in building an ML model on Spark.

For the complete code of the K-Means example, please refer to Sec. 2, Spark K-Means code summarization.
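Before turning to the MLlib API itself, it helps to recall what K-Means actually computes: it alternates between assigning each point to its nearest center and recomputing each center as the mean of its assigned points. The sketch below implements this iteration (Lloyd's algorithm) in plain Python; the function names are illustrative and are not part of the Spark API. MLlib's KMeans runs exactly this assign/update loop, but distributes the assignment step across executors.

```python
# A minimal plain-Python sketch of the K-Means (Lloyd's) iteration that
# MLlib's KMeans parallelizes. Names here are illustrative, not Spark APIs.
import random


def assign(points, centers):
    """Return, for each point, the index of its nearest center."""
    labels = []
    for p in points:
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
        labels.append(dists.index(min(dists)))
    return labels


def update(points, labels, centers):
    """Recompute each center as the mean of its assigned points."""
    new_centers = []
    for i, old in enumerate(centers):
        cluster = [p for p, lbl in zip(points, labels) if lbl == i]
        if not cluster:
            # keep the previous center if no points were assigned to it
            new_centers.append(old)
            continue
        dim = len(cluster[0])
        new_centers.append(tuple(
            sum(p[d] for p in cluster) / len(cluster) for d in range(dim)
        ))
    return new_centers


def kmeans(points, k, n_iter=10, seed=0):
    """Run n_iter rounds of assign/update from a random initialization."""
    random.seed(seed)
    centers = random.sample(points, k)
    for _ in range(n_iter):
        labels = assign(points, centers)
        centers = update(points, labels, centers)
    return centers, assign(points, centers)
```

On a cluster, the assignment step is embarrassingly parallel (each point is labeled independently), which is why this algorithm scales well under Spark.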

