SynapseML
Simple and Distributed Machine Learning
Coming from MMLSpark? We have been renamed to SynapseML!
- Cognitive Services
- Deep Learning
- Responsible AI
- LightGBM
- OpenCV
Cognitive Services:

```python
from synapse.ml.cognitive import *

sentiment_df = (
    TextSentiment()
    .setTextCol("text")
    .setLocation("eastus")
    .setSubscriptionKey(key)
    .setOutputCol("sentiment")
    .setErrorCol("error")
    .setLanguageCol("language")
    .transform(input_df)
)
```
Deep Learning:

```python
from synapse.ml.onnx import *

model_prediction_df = (
    ONNXModel()
    .setModelPayload(model_payload_ml)
    .setDeviceType("CPU")
    .setFeedDict({"input": "features"})
    .setFetchDict({"probability": "probabilities", "prediction": "label"})
    .setMiniBatchSize(64)
    .transform(input_df)
)
```
Responsible AI:

```python
from synapse.ml.explainers import *

interpretation_df = (
    TabularSHAP()
    .setInputCols(features)
    .setOutputCol("shapValues")
    .setTargetCol("probability")
    .setTargetClasses([1])
    .setNumSamples(5000)
    .setModel(model)
    .transform(input_df)
)
```
LightGBM:

```python
from synapse.ml.lightgbm import *

quantile_df = (
    LightGBMRegressor()
    .setApplication('quantile')
    .setAlpha(0.3)
    .setLearningRate(0.3)
    .setNumIterations(100)
    .setNumLeaves(31)
    .fit(train_df)
    .transform(test_df)
)
```
OpenCV:

```python
from synapse.ml.opencv import *

image_df = (
    ImageTransformer()
    .setInputCol("images")
    .setOutputCol("transformed_images")
    .resize(224, True)
    .centerCrop(224, 224)
    .normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225],
        color_scale_factor=1 / 255,
    )
    .transform(input_df)
)
```
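The snippets above assume you already have a Spark DataFrame such as `input_df` or `train_df` in hand. As a minimal sketch of that setup for the LightGBM example (the column names `x1` and `x2` are hypothetical), the toy frame below assembles a `"features"` vector column and a `"label"` column, which `LightGBMRegressor` reads by default:

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler

spark = SparkSession.builder.getOrCreate()

# Hypothetical toy data: two feature columns and a numeric label.
raw_df = spark.createDataFrame(
    [(1.0, 2.0, 0.5), (2.0, 1.0, 1.5), (3.0, 0.5, 2.5)],
    ["x1", "x2", "label"],
)

# LightGBMRegressor expects an assembled "features" vector column
# and a "label" column unless configured otherwise.
train_df = VectorAssembler(
    inputCols=["x1", "x2"], outputCol="features"
).transform(raw_df)
```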
Simple
Quickly create, train, and use distributed machine learning tools in only a few lines of code.
Multilingual
Use SynapseML from any Spark-compatible language, including Python, Scala, R, Java, .NET, and C#.
Open
SynapseML is open source and can be installed and used on any Spark 3 infrastructure, including your local machine, Databricks, Synapse Analytics, and more.
Installation
Written in Scala, with support for multiple languages. Open source and cloud native.
- Synapse
- Fabric
- Spark Packages
- Databricks
- Docker
- Python
- SBT
SynapseML can be installed on Synapse by adding the following to the first cell of a notebook:
For Spark 3.4 pools:

```
%%configure -f
{
  "name": "synapseml",
  "conf": {
    "spark.jars.packages": "com.microsoft.azure:synapseml_2.12:1.0.8",
    "spark.jars.repositories": "https://mmlspark.azureedge.net/maven",
    "spark.jars.excludes": "org.scala-lang:scala-reflect,org.apache.spark:spark-tags_2.12,org.scalactic:scalactic_2.12,org.scalatest:scalatest_2.12,com.fasterxml.jackson.core:jackson-databind",
    "spark.yarn.user.classpath.first": "true",
    "spark.sql.parquet.enableVectorizedReader": "false"
  }
}
```

For Spark 3.3 pools:

```
%%configure -f
{
  "name": "synapseml",
  "conf": {
    "spark.jars.packages": "com.microsoft.azure:synapseml_2.12:0.11.4-spark3.3",
    "spark.jars.repositories": "https://mmlspark.azureedge.net/maven",
    "spark.jars.excludes": "org.scala-lang:scala-reflect,org.apache.spark:spark-tags_2.12,org.scalactic:scalactic_2.12,org.scalatest:scalatest_2.12,com.fasterxml.jackson.core:jackson-databind",
    "spark.yarn.user.classpath.first": "true",
    "spark.sql.parquet.enableVectorizedReader": "false"
  }
}
```
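As a quick sanity check (a minimal sketch, not part of the official instructions), a later cell can confirm the package reached the cluster classpath; `TextSentiment` is the class used in the Cognitive Services example above:

```python
# Run after the %%configure cell above has restarted the Spark session.
import synapse.ml
from synapse.ml.cognitive import TextSentiment  # class from the example above

print(synapse.ml.__name__, "loaded")
```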
SynapseML comes pre-installed on Fabric. To install a different version, add the following to the first cell of a notebook:
```
%%configure -f
{
  "name": "synapseml",
  "conf": {
    "spark.jars.packages": "com.microsoft.azure:synapseml_2.12:[THE_SYNAPSEML_VERSION_YOU_WANT]",
    "spark.jars.repositories": "https://mmlspark.azureedge.net/maven",
    "spark.jars.excludes": "org.scala-lang:scala-reflect,org.apache.spark:spark-tags_2.12,org.scalactic:scalactic_2.12,org.scalatest:scalatest_2.12,com.fasterxml.jackson.core:jackson-databind",
    "spark.yarn.user.classpath.first": "true",
    "spark.sql.parquet.enableVectorizedReader": "false"
  }
}
```
SynapseML can be conveniently installed on existing Spark clusters via the --packages option, for example:
```bash
spark-shell --packages com.microsoft.azure:synapseml_2.12:1.0.8  # Please use 1.0.8 for Spark 3.4 and 0.11.4-spark3.3 for Spark 3.3
pyspark --packages com.microsoft.azure:synapseml_2.12:1.0.8
spark-submit --packages com.microsoft.azure:synapseml_2.12:1.0.8 MyApp.jar
```

This can be used in other Spark contexts too. For example, you can use SynapseML in AZTK by adding it to the .aztk/spark-defaults.conf file, as sketched below.
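As a rough sketch of that AZTK setup: spark-defaults.conf takes space-separated key/value pairs, so the entries would look roughly like this (versions and repository as assumed above):

```
# Sketch of .aztk/spark-defaults.conf entries (standard spark-defaults format)
spark.jars.packages       com.microsoft.azure:synapseml_2.12:1.0.8
spark.jars.repositories   https://mmlspark.azureedge.net/maven
```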
To install SynapseML on the Databricks cloud, create a new library from Maven coordinates in your workspace.

For the coordinates:

Spark 3.4 Cluster:

```
com.microsoft.azure:synapseml_2.12:1.0.8
```

Spark 3.3 Cluster:

```
com.microsoft.azure:synapseml_2.12:0.11.4-spark3.3
```

with the resolver:

```
https://mmlspark.azureedge.net/maven
```

Ensure this library is attached to your target cluster(s).
Finally, ensure that your Spark cluster has at least Spark 3.4 and Scala 2.12.
You can use SynapseML in both your Scala and PySpark notebooks. To get started with our example notebooks, import the following Databricks archive:

```
https://mmlspark.blob.core.windows.net/dbcs/SynapseMLExamplesv1.0.8.dbc
```
The easiest way to evaluate SynapseML is via our pre-built Docker container. To do so, run the following command:
```bash
docker run -it -p 8888:8888 -e ACCEPT_EULA=yes mcr.microsoft.com/mmlspark/release
```
Navigate to http://localhost:8888 in your web browser to run the sample notebooks. See the documentation for more information about Docker use.
To read the EULA for using the Docker image, run:

```bash
docker run -it -p 8888:8888 mcr.microsoft.com/mmlspark/release eula
```
To try out SynapseML on a Python (or Conda) installation, you can get Spark installed via pip with:

```bash
pip install pyspark
```

You can then use pyspark as in the above example, or from python:
```python
import pyspark

spark = (
    pyspark.sql.SparkSession.builder.appName("MyApp")
    # Please use 1.0.8 for Spark 3.4 and 0.11.4-spark3.3 for Spark 3.3
    .config("spark.jars.packages", "com.microsoft.azure:synapseml_2.12:1.0.8")
    .config("spark.jars.repositories", "https://mmlspark.azureedge.net/maven")
    .getOrCreate()
)

import synapse.ml
```
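Once the session is up, a quick optional check (a sketch, using names from the examples above) is to construct one of the SynapseML estimators; if the package resolved from the repository, both the import and the construction succeed:

```python
# Sanity check: LightGBMRegressor is the estimator from the example above.
from synapse.ml.lightgbm import LightGBMRegressor

print(LightGBMRegressor())  # prints the estimator's uid if the jar loaded
```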
If you are building a Spark application in Scala, add the following lines to your build.sbt:
```scala
resolvers += "SynapseML" at "https://mmlspark.azureedge.net/maven"
// Please use 1.0.8 for Spark 3.4 and 0.11.4-spark3.3 for Spark 3.3
libraryDependencies += "com.microsoft.azure" % "synapseml_2.12" % "1.0.8"
```