[SPARK-36254][INFRA][PYTHON] Install mlflow in GitHub Actions CI
### What changes were proposed in this pull request?
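This PR proposes adding the Python packages `mlflow` (pinned to `mlflow>=1.0`) and `sklearn` to the GitHub Actions CI environment and to `dev/requirements.txt`, to enable the MLflow test in pandas API on Spark. It also removes the `ImportError` guard in `python/pyspark/pandas/mlflow.py`, so its doctests always run instead of being silently skipped when either package is missing.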
### Why are the changes needed?
To enable the MLflow test in pandas API on Spark.
### Does this PR introduce _any_ user-facing change?
No, it's test-only.
### How was this patch tested?
Manually tested locally with `python/run-tests --testnames pyspark.pandas.mlflow`.
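Because `mlflow.py` no longer swallows `ImportError`, running this test in an environment without the new dependencies now fails instead of passing silently. The sketch below shows one way to check the environment before running the command above. It is illustrative only: `missing_modules`, `major_version`, and `check_mlflow_test_deps` are hypothetical helpers, not part of PySpark, and the version check assumes MLflow is installed under the `mlflow` distribution name.

```python
import importlib.util
from importlib import metadata


def missing_modules(names):
    """Return the subset of `names` that cannot be imported in this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def major_version(dist_name):
    """Return the major version of an installed distribution, or None if absent."""
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None
    return int(version.split(".")[0])


def check_mlflow_test_deps():
    """Hypothetical pre-flight check; an empty list means the deps look usable."""
    problems = [f"cannot import {name}" for name in missing_modules(["mlflow", "sklearn"])]
    major = major_version("mlflow")  # assumes the `mlflow` distribution name
    if major is not None and major < 1:
        problems.append(f"mlflow>=1.0 required, found major version {major}")
    return problems
```

For example, calling `check_mlflow_test_deps()` with the same interpreter that runs the tests (Python 3.9 in the CI step above) and printing any returned problems gives a clearer error than a failing doctest.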
Closes #33567 from itholic/SPARK-36254.
Lead-authored-by: itholic <haejoon.lee@databricks.com>
Co-authored-by: Haejoon Lee <44108233+itholic@users.noreply.github.com>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
diff --git a/.github/workflows/build_and_test.yml b/.github/workflows/build_and_test.yml
index f3a6363..17908ff 100644
--- a/.github/workflows/build_and_test.yml
+++ b/.github/workflows/build_and_test.yml
@@ -252,6 +252,8 @@
# Run the tests.
- name: Run tests
run: |
+ # TODO(SPARK-36345): Install mlflow>=1.0 and sklearn in Python 3.9 of the base image
+ python3.9 -m pip install 'mlflow>=1.0' sklearn
export PATH=$PATH:$HOME/miniconda/bin
./dev/run-tests --parallelism 1 --modules "$MODULES_TO_TEST"
- name: Upload test results to report
diff --git a/dev/requirements.txt b/dev/requirements.txt
index f5d662b..34f4b88 100644
--- a/dev/requirements.txt
+++ b/dev/requirements.txt
@@ -7,7 +7,8 @@
pandas
scipy
plotly
-mlflow
+mlflow>=1.0
+sklearn
matplotlib<3.3.0
# PySpark test dependencies
diff --git a/python/pyspark/pandas/mlflow.py b/python/pyspark/pandas/mlflow.py
index 719db40..4e48369 100644
--- a/python/pyspark/pandas/mlflow.py
+++ b/python/pyspark/pandas/mlflow.py
@@ -229,10 +229,4 @@
if __name__ == "__main__":
- try:
- import mlflow # noqa: F401
- import sklearn # noqa: F401
-
- _test()
- except ImportError:
- pass
+ _test()