Train XGBoost with Spark

# XGBoost training script
# Launch spark-shell on the cluster with the XGBoost4J jars on the classpath.
# Note: --jars takes a comma-separated list, while --driver-class-path is a
# regular classpath and uses ':' as the separator on Linux.

spark-shell --name xxx \
  --num-executors 15 \
  --executor-cores 4 \
  --executor-memory 20G \
  --jars /tmp/xgboost4j-0.82.jar,/tmp/xgboost4j-spark-0.82.jar \
  --driver-class-path /tmp/xgboost4j-0.82.jar:/tmp/xgboost4j-spark-0.82.jar

# import dependencies
import org.apache.spark.sql.types._
import scala.collection.mutable.ArrayBuilder

Building K-Means with Spark

Industry applications of machine learning generally require the ability to handle massive datasets. Spark provides a machine learning library named MLlib that allows us to build machine learning models efficiently and in parallel.

This post starts with a Spark ML modelling example, K-Means, built with PySpark in Python, and explains the basic steps as well as the usage of the Spark APIs involved in building an ML model on Spark.

For the complete code of the K-Means example, please refer to Sec. 2, Spark K-Means code summarization.
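Before turning to the MLlib API itself, it helps to recall what K-Means actually computes: it alternates between assigning each point to its nearest center and recomputing each center as the mean of its assigned points. The sketch below implements this iteration (Lloyd's algorithm) in plain Python; the function names are illustrative and are not part of the Spark API. MLlib's KMeans runs exactly this assign/update loop, but distributes the assignment step across executors.

```python
# A minimal plain-Python sketch of the K-Means (Lloyd's) iteration that
# MLlib's KMeans parallelizes. Names here are illustrative, not Spark APIs.
import random


def assign(points, centers):
    """Return, for each point, the index of its nearest center."""
    labels = []
    for p in points:
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
        labels.append(dists.index(min(dists)))
    return labels


def update(points, labels, centers):
    """Recompute each center as the mean of its assigned points."""
    new_centers = []
    for i, old in enumerate(centers):
        cluster = [p for p, lbl in zip(points, labels) if lbl == i]
        if not cluster:
            # keep the previous center if no points were assigned to it
            new_centers.append(old)
            continue
        dim = len(cluster[0])
        new_centers.append(tuple(
            sum(p[d] for p in cluster) / len(cluster) for d in range(dim)
        ))
    return new_centers


def kmeans(points, k, n_iter=10, seed=0):
    """Run n_iter rounds of assign/update from a random initialization."""
    random.seed(seed)
    centers = random.sample(points, k)
    for _ in range(n_iter):
        labels = assign(points, centers)
        centers = update(points, labels, centers)
    return centers, assign(points, centers)
```

On a cluster, the assignment step is embarrassingly parallel (each point is labeled independently), which is why this algorithm scales well under Spark.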

