[SPARK-49716][PS][DOCS][TESTS] Fix documentation and add test of barh plot
### What changes were proposed in this pull request?
- Update the documentation for the barh plot to clarify that Plotly and Matplotlib interpret the `x` and `y` parameters differently.
- Add a test that passes multiple columns as the value axis.
The parameter difference is demonstrated below.
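```py
>>> import pyspark.pandas as ps
>>> df = ps.DataFrame({'lab': ['A', 'B', 'C'], 'val': [10, 30, 20]})
>>> # Plotly (the default backend): x holds the values, y the categories
>>> df.plot.barh(x='val', y='lab').show()  # plot1
>>> # Matplotlib: x holds the categories, y the values
>>> ps.set_option('plotting.backend', 'matplotlib')
>>> import matplotlib.pyplot as plt
>>> df.plot.barh(x='lab', y='val')
>>> plt.show()  # plot2
```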
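plot1: barh plot rendered with the Plotly backend (image not included)
plot2: barh plot rendered with the Matplotlib backend (image not included)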
### Why are the changes needed?
The `x` and `y` parameters of the barh plot are interpreted differently by Plotly and Matplotlib, which may confuse users. The updated documentation states the difference explicitly, and the new test covers the Plotly convention, including multiple value columns.
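For callers who need to support both backends, the difference can be hidden behind a small helper. The sketch below is illustrative only and is not part of this patch: `barh_axis_kwargs` is a hypothetical name, and it assumes the backend is identified by the strings `"plotly"` and `"matplotlib"`.

```python
def barh_axis_kwargs(backend, values, categories):
    """Hypothetical helper: build x/y keyword arguments for ``barh``.

    Follows the convention described in this PR:
    - Plotly:     x = values,     y = categories
    - Matplotlib: x = categories, y = values
    """
    if backend == "plotly":
        return {"x": values, "y": categories}
    if backend == "matplotlib":
        return {"x": categories, "y": values}
    # Other backends are outside the scope of this sketch.
    raise ValueError(f"unsupported backend: {backend!r}")


# Same logical request, different keyword mapping per backend.
plotly_kwargs = barh_axis_kwargs("plotly", values="val", categories="lab")
mpl_kwargs = barh_axis_kwargs("matplotlib", values="val", categories="lab")
```

Under these assumptions, a call such as `df.plot.barh(**barh_axis_kwargs(ps.get_option('plotting.backend'), 'val', 'lab'))` would pass the value and category columns in the order the active backend expects.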
### Does this PR introduce _any_ user-facing change?
No. Doc change only.
### How was this patch tested?
Unit tests.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #48161 from xinrong-meng/ps_barh.
Authored-by: Xinrong Meng <xinrong@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
diff --git a/python/pyspark/pandas/plot/core.py b/python/pyspark/pandas/plot/core.py
index 7630ecc..429e97e 100644
--- a/python/pyspark/pandas/plot/core.py
+++ b/python/pyspark/pandas/plot/core.py
@@ -756,10 +756,10 @@
Parameters
----------
- x : label or position, default DataFrame.index
- Column to be used for categories.
- y : label or position, default All numeric columns in dataframe
+ x : label or position, default All numeric columns in dataframe
Columns to be plotted from the DataFrame.
+ y : label or position, default DataFrame.index
+ Column to be used for categories.
**kwds
Keyword arguments to pass on to
:meth:`pyspark.pandas.DataFrame.plot` or :meth:`pyspark.pandas.Series.plot`.
@@ -770,6 +770,13 @@
Return an custom object when ``backend!=plotly``.
Return an ndarray when ``subplots=True`` (matplotlib-only).
+ Notes
+ -----
+ In Plotly and Matplotlib, the interpretation of `x` and `y` for `barh` plots differs.
+ In Plotly, `x` refers to the values and `y` refers to the categories.
+ In Matplotlib, `x` refers to the categories and `y` refers to the values.
+ Ensure correct axis labeling based on the backend used.
+
See Also
--------
plotly.express.bar : Plot a vertical bar plot using plotly.
diff --git a/python/pyspark/pandas/tests/plot/test_frame_plot_plotly.py b/python/pyspark/pandas/tests/plot/test_frame_plot_plotly.py
index 37469db..8d19764 100644
--- a/python/pyspark/pandas/tests/plot/test_frame_plot_plotly.py
+++ b/python/pyspark/pandas/tests/plot/test_frame_plot_plotly.py
@@ -105,9 +105,10 @@
self.assertEqual(pdf.plot.barh(x=x, y=y), psdf.plot.barh(x=x, y=y))
# this is testing plot with specified x and y
- pdf1 = pd.DataFrame({"lab": ["A", "B", "C"], "val": [10, 30, 20]})
+ pdf1 = pd.DataFrame({"lab": ["A", "B", "C"], "val": [10, 30, 20], "val2": [1.1, 2.2, 3.3]})
psdf1 = ps.from_pandas(pdf1)
- check_barh_plot_with_x_y(pdf1, psdf1, x="lab", y="val")
+ check_barh_plot_with_x_y(pdf1, psdf1, x="val", y="lab")
+ check_barh_plot_with_x_y(pdf1, psdf1, x=["val", "val2"], y="lab")
def test_barh_plot(self):
def check_barh_plot(pdf, psdf):