
This PyPI package contains the Python APIs for using Paimon.
pypaimon requires Python 3.6+.
The core dependencies are listed in dev/requirements.txt. The development dependencies are listed in dev/requirements-dev.txt.
You can build the source package by executing the following command:
python3 setup.py sdist
The package is under dist/. Then you can install the package by executing the following command:
pip3 install dist/*.tar.gz
The command will install the package and core dependencies to your local Python environment.
pypaimon supports HDFS through a pure-protocol client based on hdfs-native (Rust + PyO3). Use it when you want HDFS access without installing Hadoop, a JDK, libhdfs, or wrestling with CLASSPATH / LD_LIBRARY_PATH.
Install with the optional extra:
pip install 'pypaimon[hdfs]'
The native backend requires Python 3.10+ and is unavailable on Windows. On older interpreters the extra is skipped, so pypaimon still installs; there, keep using the legacy pyarrow (libhdfs/JVM) backend by setting hdfs.client.impl=pyarrow.
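One way to apply this rule in user code is to derive the hdfs.client.impl value from the running interpreter. The following is a minimal sketch and not part of pypaimon: the helper name pick_hdfs_client_impl is hypothetical, and it encodes only the two conditions stated above (Python 3.10+ and not Windows).

```python
import sys


def pick_hdfs_client_impl(version_info=None, platform=None):
    """Return "native" where the native backend can be installed, else "pyarrow".

    Hypothetical helper, not a pypaimon API. The arguments default to the
    running interpreter and exist only to make the function easy to test.
    """
    version_info = sys.version_info if version_info is None else version_info
    platform = sys.platform if platform is None else platform

    # sys.platform is "win32" on Windows.
    is_windows = platform.startswith("win")
    if tuple(version_info[:2]) >= (3, 10) and not is_windows:
        return "native"
    return "pyarrow"


# Catalog options that stay valid on both old and new interpreters.
options = {
    "warehouse": "hdfs://ns1/warehouse",
    "hdfs.client.impl": pick_hdfs_client_impl(),
}
```

The resulting options dictionary can then be passed to CatalogFactory.create as in the examples in this document.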
For hdfs:// and viewfs:// URIs this backend is now the default. Switch back to the legacy libhdfs (JNI) path with:
from pypaimon import CatalogFactory
catalog = CatalogFactory.create({
    "warehouse": "hdfs://ns1/warehouse",
    "hdfs.client.impl": "pyarrow",  # default: "native"
})
The client still needs to know about NameNode addresses, HA failover groups, and viewfs mount tables. There are three options:
1. Local XML: set HADOOP_CONF_DIR (or the hdfs.conf-dir option) to a directory containing core-site.xml / hdfs-site.xml. Only the XML files are required; no Hadoop binaries or JDK.
2. Catalog options (REST-friendly): pass the original Hadoop key/values directly in catalog options. Keys with the prefixes dfs., fs., hadoop., ipc., and io. are forwarded as-is. A REST catalog can deliver these in its response, giving a fully zero-file client experience:
   CatalogFactory.create({
       "warehouse": "viewfs://cluster/warehouse",
       "dfs.nameservices": "ns1",
       "dfs.ha.namenodes.ns1": "nn1,nn2",
       "dfs.namenode.rpc-address.ns1.nn1": "host-1:8020",
       "dfs.namenode.rpc-address.ns1.nn2": "host-2:8020",
       "fs.viewfs.mounttable.cluster.link./prod": "hdfs://ns1/prod",
   })
3. Namespaced overrides: use hdfs.config.<key> to forward any other Hadoop key not covered by the prefix whitelist.
The three sources can be combined; catalog options take precedence over XML.
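The combination of the three sources can be illustrated with a small, self-contained sketch. It is not pypaimon's implementation: the function names parse_hadoop_xml and merge_hadoop_conf are hypothetical, and applying the hdfs.config.<key> overrides last is an assumption, since the text above only states that catalog options take precedence over XML.

```python
import xml.etree.ElementTree as ET

# Prefixes that the text above says are forwarded as-is.
FORWARDED_PREFIXES = ("dfs.", "fs.", "hadoop.", "ipc.", "io.")
OVERRIDE_PREFIX = "hdfs.config."


def parse_hadoop_xml(xml_text):
    """Parse a Hadoop *-site.xml document into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    conf = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            conf[name.strip()] = value.strip()
    return conf


def merge_hadoop_conf(xml_conf, catalog_options):
    """Merge XML settings with catalog options; catalog options win."""
    merged = dict(xml_conf)  # lowest precedence
    # Whitelisted Hadoop keys are forwarded unchanged.
    for key, value in catalog_options.items():
        if key.startswith(FORWARDED_PREFIXES):
            merged[key] = value
    # Namespaced overrides have the "hdfs.config." prefix stripped.
    # Applying them last is an assumption of this sketch.
    for key, value in catalog_options.items():
        if key.startswith(OVERRIDE_PREFIX):
            merged[key[len(OVERRIDE_PREFIX):]] = value
    return merged


site_xml = """
<configuration>
  <property><name>dfs.nameservices</name><value>ns-from-xml</value></property>
  <property><name>dfs.replication</name><value>3</value></property>
</configuration>
"""

options = {
    "warehouse": "hdfs://ns1/warehouse",  # not a Hadoop key, so not forwarded
    "dfs.nameservices": "ns1",            # overrides the XML value
    "hdfs.config.net.topology.script.file.name": "/etc/hadoop/topology.sh",
}

conf = merge_hadoop_conf(parse_hadoop_xml(site_xml), options)
```

Here conf keeps dfs.replication from the XML, takes dfs.nameservices from the catalog options, and gains net.topology.script.file.name through the namespaced override.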
A secured cluster still needs the GSSAPI system library (libgssapi-krb5-2 on Debian/Ubuntu, krb5 via Homebrew on macOS, krb5-libs on RHEL) plus a krb5.conf. Provide credentials by either:
- running kinit yourself and pointing KRB5CCNAME at the cache, or
- setting security.kerberos.login.principal and security.kerberos.login.keytab in catalog options, in which case pypaimon runs kinit for you.
If the native backend fails to initialise (e.g. the wheel is missing on an unsupported platform such as Windows), pypaimon automatically falls back to the pyarrow (libhdfs/JVM) path and logs a warning. Disable the fallback with hdfs.client.fallback-to-pyarrow=false if you want hard failures instead.
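The fallback behaviour described above can be pictured with the following sketch. It is illustrative only and does not reproduce pypaimon's internals: open_hdfs_client and the two factory arguments are hypothetical names, and the stub factories stand in for the real native and pyarrow clients.

```python
import logging

logger = logging.getLogger("hdfs_backend_sketch")


def open_hdfs_client(options, native_factory, pyarrow_factory):
    """Try the native backend first, then optionally fall back to pyarrow."""
    if options.get("hdfs.client.impl", "native") == "pyarrow":
        return pyarrow_factory(options)

    # The fallback is on unless the option is explicitly set to "false".
    raw = options.get("hdfs.client.fallback-to-pyarrow", "true")
    fallback = str(raw).lower() != "false"
    try:
        return native_factory(options)
    except Exception as exc:
        if not fallback:
            raise  # hard failure requested
        logger.warning("native HDFS backend failed (%s); using pyarrow", exc)
        return pyarrow_factory(options)


# Stub factories used only to exercise the sketch.
def broken_native(options):
    raise ImportError("hdfs-native wheel not available")


def fake_pyarrow(options):
    return "pyarrow-client"


# Default: the failure is logged and the pyarrow client is returned.
client = open_hdfs_client(
    {"warehouse": "hdfs://ns1/warehouse"}, broken_native, fake_pyarrow
)

# Fallback disabled: the original error propagates to the caller.
try:
    open_hdfs_client(
        {"hdfs.client.fallback-to-pyarrow": "false"}, broken_native, fake_pyarrow
    )
except ImportError as exc:
    strict_error = exc
else:
    strict_error = None
```

Disabling the fallback is useful in environments where silently switching to the JVM-based path would hide a packaging problem.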