commit | bea9c96fc1acae107498982724e4a739c8c98d74 | |
---|---|---|
author | Nakroma <tarackobarack@gmail.com> | Wed Feb 05 22:38:35 2025 +0100 |
committer | Sebastian Baunsgaard <baunsgaard@apache.org> | Wed Feb 05 22:38:35 2025 +0100 |
tree | 78637ef26bac7f8ab8c50486517fe6a1bd6b4859 | |
parent | 22642a1bddf6a2755069f80413b04216b2dd1a89 | |
[SYSTEMDS-3548] Optimize python dataframe transfer

This commit optimizes how the pandas_to_frame_block function accesses Java types. It also fixes a small regression where exceptions raised in the parallelization threads weren't propagating properly.

- Fix perftests not working with large, split-up datasets. IO datagen splits large datasets into multiple files (for example 100k_1k). This commit makes load_pandas.py and load_numpy.py able to read those.
- Add pandas to FrameBlock row-wise parallel processing in the case of cols > rows. It also adds some other small, unused utility methods.
- Add javadocs.
- Adjust Py4jConverterUtilsTest to reflect the code changes in the main class.
- Add missing tests for the code added in SYSTEMDS-3548. This includes the FrameBlock and Py4jConverterUtils functions, as well as python pandas to systemds io e2e tests.
- Fix pandas io test (rows have to be >4).

Closes #2189
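The two ideas in this commit message, choosing row-wise work splitting when cols > rows and letting worker-thread exceptions reach the caller, can be illustrated with a small standalone sketch. This is not the SystemDS code: the real pandas_to_frame_block goes through Py4J into Java, whereas the sketch below uses only the Python standard library, a plain list of rows in place of a DataFrame, and the hypothetical helpers `convert_cell`, `convert_slice`, and `convert_table`.

```python
from concurrent.futures import ThreadPoolExecutor


def convert_cell(value):
    """Hypothetical per-cell conversion; rejects None to simulate a failure."""
    if value is None:
        raise ValueError("cannot convert None")
    return str(value)


def convert_slice(values):
    """Convert one row or one column (whichever the caller split by)."""
    return [convert_cell(v) for v in values]


def convert_table(rows, max_workers=4):
    """Convert a list of equal-length rows to strings in parallel.

    Work is split by row when there are more columns than rows,
    and by column otherwise. The result is always a list of rows.
    """
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    if n_rows == 0 or n_cols == 0:
        return [[] for _ in rows]

    row_wise = n_cols > n_rows
    if row_wise:
        tasks = rows
    else:
        # One task per column
        tasks = [[row[c] for row in rows] for c in range(n_cols)]

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(convert_slice, task) for task in tasks]
        # Future.result() re-raises any exception thrown in the worker,
        # so a failed conversion is not silently dropped.
        converted = [future.result() for future in futures]

    if row_wise:
        return converted
    # Transpose the converted columns back into rows
    return [list(row) for row in zip(*converted)]
```

Collecting each `Future` and calling `result()` on it is what surfaces worker failures in this sketch; submitting tasks and never reading their results would hide them, which is the general kind of problem the regression fix describes.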
Overview: Apache SystemDS is an open-source machine learning (ML) system for the end-to-end data science lifecycle from data preparation and cleaning, over efficient ML model training, to debugging and serving. ML algorithms or pipelines are specified in a high-level language with R-like syntax or related Python and Java APIs (with many builtin primitives), and the system automatically generates hybrid runtime plans of local, in-memory operations and distributed operations on Apache Spark. Additional backends exist for GPUs and federated learning.
Resource | Links |
---|---|
Quick Start | Install, Quick Start and Hello World |
Documentation | SystemDS Documentation |
Python Documentation | Python SystemDS Documentation |
Issue Tracker | Jira Dashboard |
Status and Build: SystemDS was renamed from SystemML, which is an Apache Top Level Project. To build from source, see SystemDS Install from source.