.. Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
.. http://www.apache.org/licenses/LICENSE-2.0
.. Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
.. _api.dataframe:
=========
DataFrame
=========
.. currentmodule:: pyspark.pandas
Constructor
-----------
.. autosummary::
:toctree: api/
DataFrame
Attributes and underlying data
------------------------------
.. autosummary::
:toctree: api/
DataFrame.index
DataFrame.info
DataFrame.columns
DataFrame.empty
.. autosummary::
:toctree: api/
DataFrame.dtypes
DataFrame.shape
DataFrame.axes
DataFrame.ndim
DataFrame.size
DataFrame.select_dtypes
DataFrame.values
Conversion
----------
.. autosummary::
:toctree: api/
DataFrame.copy
DataFrame.isna
DataFrame.astype
DataFrame.isnull
DataFrame.notna
DataFrame.notnull
DataFrame.bool
Indexing, iteration
-------------------
.. autosummary::
:toctree: api/
DataFrame.at
DataFrame.iat
DataFrame.head
DataFrame.idxmax
DataFrame.idxmin
DataFrame.loc
DataFrame.iloc
DataFrame.insert
DataFrame.items
DataFrame.iterrows
DataFrame.itertuples
DataFrame.keys
DataFrame.pop
DataFrame.tail
DataFrame.xs
DataFrame.get
DataFrame.where
DataFrame.mask
DataFrame.query
Binary operator functions
-------------------------
.. autosummary::
:toctree: api/
DataFrame.add
DataFrame.radd
DataFrame.div
DataFrame.rdiv
DataFrame.truediv
DataFrame.rtruediv
DataFrame.mul
DataFrame.rmul
DataFrame.sub
DataFrame.rsub
DataFrame.pow
DataFrame.rpow
DataFrame.mod
DataFrame.rmod
DataFrame.floordiv
DataFrame.rfloordiv
DataFrame.lt
DataFrame.gt
DataFrame.le
DataFrame.ge
DataFrame.ne
DataFrame.eq
DataFrame.dot
DataFrame.combine_first
Function application, GroupBy & Window
--------------------------------------
.. autosummary::
:toctree: api/
DataFrame.apply
DataFrame.applymap
DataFrame.map
DataFrame.pipe
DataFrame.agg
DataFrame.aggregate
DataFrame.groupby
DataFrame.rolling
DataFrame.expanding
DataFrame.transform
.. _api.dataframe.stats:
Computations / Descriptive Stats
--------------------------------
.. autosummary::
:toctree: api/
DataFrame.abs
DataFrame.all
DataFrame.any
DataFrame.clip
DataFrame.corr
DataFrame.corrwith
DataFrame.count
DataFrame.cov
DataFrame.describe
DataFrame.ewm
DataFrame.kurt
DataFrame.kurtosis
DataFrame.max
DataFrame.mean
DataFrame.min
DataFrame.median
DataFrame.mode
DataFrame.pct_change
DataFrame.prod
DataFrame.product
DataFrame.quantile
DataFrame.rank
DataFrame.nunique
DataFrame.sem
DataFrame.skew
DataFrame.sum
DataFrame.std
DataFrame.var
DataFrame.cummin
DataFrame.cummax
DataFrame.cumsum
DataFrame.cumprod
DataFrame.round
DataFrame.diff
DataFrame.eval
Reindexing / Selection / Label manipulation
-------------------------------------------
.. autosummary::
:toctree: api/
DataFrame.add_prefix
DataFrame.add_suffix
DataFrame.align
DataFrame.at_time
DataFrame.between_time
DataFrame.drop
DataFrame.droplevel
DataFrame.drop_duplicates
DataFrame.duplicated
DataFrame.equals
DataFrame.filter
DataFrame.first
DataFrame.head
DataFrame.last
DataFrame.reindex
DataFrame.reindex_like
DataFrame.rename
DataFrame.rename_axis
DataFrame.reset_index
DataFrame.set_index
DataFrame.swapaxes
DataFrame.swaplevel
DataFrame.take
DataFrame.isin
DataFrame.sample
DataFrame.truncate
.. _api.dataframe.missing:
Missing data handling
---------------------
.. autosummary::
:toctree: api/
DataFrame.backfill
DataFrame.dropna
DataFrame.fillna
DataFrame.replace
DataFrame.bfill
DataFrame.ffill
DataFrame.interpolate
DataFrame.pad
Reshaping, sorting, transposing
-------------------------------
.. autosummary::
:toctree: api/
DataFrame.pivot_table
DataFrame.pivot
DataFrame.sort_index
DataFrame.sort_values
DataFrame.nlargest
DataFrame.nsmallest
DataFrame.stack
DataFrame.unstack
DataFrame.melt
DataFrame.explode
DataFrame.squeeze
DataFrame.T
DataFrame.transpose
Combining / joining / merging
-----------------------------
.. autosummary::
:toctree: api/
DataFrame.assign
DataFrame.merge
DataFrame.join
DataFrame.update
Time series-related
-------------------
.. autosummary::
:toctree: api/
DataFrame.resample
DataFrame.shift
DataFrame.first_valid_index
DataFrame.last_valid_index
Serialization / IO / Conversion
-------------------------------
.. autosummary::
:toctree: api/
DataFrame.from_dict
DataFrame.from_records
DataFrame.to_table
DataFrame.to_delta
DataFrame.to_parquet
DataFrame.to_csv
DataFrame.to_orc
DataFrame.to_pandas
DataFrame.to_html
DataFrame.to_numpy
DataFrame.to_spark
DataFrame.to_string
DataFrame.to_feather
DataFrame.to_stata
DataFrame.to_json
DataFrame.to_dict
DataFrame.to_excel
DataFrame.to_hdf
DataFrame.to_clipboard
DataFrame.to_markdown
DataFrame.to_records
DataFrame.to_latex
DataFrame.style
Spark-related
-------------
``DataFrame.spark`` provides features that exist in Spark but not in pandas.
These can be accessed via ``DataFrame.spark.<function/property>``.
.. autosummary::
:toctree: api/
:template: autosummary/accessor_method.rst
DataFrame.spark.frame
DataFrame.spark.cache
DataFrame.spark.persist
DataFrame.spark.hint
DataFrame.spark.to_table
DataFrame.spark.to_spark_io
DataFrame.spark.apply
DataFrame.spark.repartition
DataFrame.spark.coalesce
.. _api.dataframe.plot:
Plotting
--------
``DataFrame.plot`` is both a callable method and a namespace attribute for
specific plotting methods of the form ``DataFrame.plot.<kind>``.
.. autosummary::
:toctree: api/
:template: autosummary/accessor_method.rst
DataFrame.plot.area
DataFrame.plot.barh
DataFrame.plot.bar
DataFrame.plot.hist
DataFrame.plot.box
DataFrame.plot.line
DataFrame.plot.pie
DataFrame.plot.scatter
DataFrame.plot.density
.. autosummary::
:toctree: api/
DataFrame.hist
DataFrame.boxplot
DataFrame.kde
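
The dual role of ``DataFrame.plot`` described above (a callable and a namespace at once) can be illustrated with a small plain-Python sketch. This is not the pyspark implementation: ``PlotAccessor`` and ``Frame`` are hypothetical stand-ins, the methods return tuples instead of drawing anything, and the default ``kind="line"`` is an assumption made for the sketch.

```python
class PlotAccessor:
    """Hypothetical stand-in for a plot accessor; not the pyspark class."""

    def __init__(self, data):
        self._data = data

    def __call__(self, kind="line", **kwargs):
        # Calling the accessor dispatches on ``kind`` to the method of that name.
        method = getattr(self, kind, None)
        if kind.startswith("_") or not callable(method):
            raise ValueError(f"unsupported plot kind: {kind!r}")
        return method(**kwargs)

    def line(self, **kwargs):
        # A real accessor would draw a figure; the sketch returns a description.
        return ("line", self._data, kwargs)

    def bar(self, **kwargs):
        return ("bar", self._data, kwargs)


class Frame:
    """Hypothetical container that exposes the accessor as ``plot``."""

    def __init__(self, data):
        self._data = data

    @property
    def plot(self):
        # Each attribute access returns an accessor bound to this object's data.
        return PlotAccessor(self._data)


frame = Frame([1, 2, 3])
via_call = frame.plot(kind="bar")   # callable form
via_namespace = frame.plot.bar()    # namespace form
```

In this sketch both forms reach the same method, so ``frame.plot(kind="bar")`` and ``frame.plot.bar()`` are interchangeable. The methods listed above follow the same ``DataFrame.plot.<kind>`` naming.
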
Pandas-on-Spark specific
------------------------
``DataFrame.pandas_on_spark`` provides features that exist only in pandas API on Spark.
These can be accessed via ``DataFrame.pandas_on_spark.<function/property>``.
.. autosummary::
:toctree: api/
:template: autosummary/accessor_method.rst
DataFrame.pandas_on_spark.apply_batch
DataFrame.pandas_on_spark.transform_batch