| commit | 2bc9a7eb870e1ff540408b455193ff52f7ede38c | |
|---|---|---|
| author | baunsgaard <baunsgaard@tugraz.at> | Thu Jan 26 18:24:19 2023 +0100 |
| committer | baunsgaard <baunsgaard@tugraz.at> | Fri Jan 27 15:53:48 2023 +0100 |
| tree | 3d226d6add7211fe78d730f1ac39290f760edba5 | |
| parent | c4636bc59bfe94c0d9cf1f1c0078ba5616216d29 | |
[SYSTEMDS-3490] Compressed Transform Encode

Transform encode fused with compression, producing a compressed output from the frame input depending on the transformations applied. Initial results are very promising: the single-threaded transform runs at the same speed as our tuned multithreaded version.

This commit contains the bare minimum for the transform encode, and following commits will add more transformation pipelines. Currently supported are recode to dummy, recode, and pass-through, all in very naive implementations.

Also contained is an IdentityDictionary implementation that allows one to specify that the compressed dictionary simply is the identity matrix. Its allocation is very small: an object and an integer specifying the number of rows and columns of the identity matrix. To make the implementation efficient initially, the IdentityDictionary keeps a soft reference to a materialized MatrixBlock dictionary, which is used for all operations that the IdentityDictionary does not yet support.

Closes #1772
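The idea behind the identity dictionary can be illustrated with a minimal, self-contained sketch. It is not the SystemDS implementation: the class and method names (`IdentityDictionarySketch`, `IdentityDict`, `recode`, `decompressRow`) are hypothetical, codes are 0-based for simplicity, and "recode to dummy" is read here as recoding a column and then dummy coding (one-hot encoding) the result. Under those assumptions, the dummy-coded column can be stored as one integer code per row plus a dictionary that is the identity matrix, so the dictionary only needs to remember its size. A dense copy is built lazily and held through a `SoftReference` for operations that have no specialized path.

```java
import java.lang.ref.SoftReference;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class IdentityDictionarySketch {

    /** Hypothetical stand-in for an identity dictionary: it stores only its size. */
    static final class IdentityDict {
        private final int n;
        // Dense fallback, softly referenced so the GC may reclaim it under memory pressure.
        private SoftReference<double[][]> cache = new SoftReference<>(null);

        IdentityDict(int n) {
            this.n = n;
        }

        int size() {
            return n;
        }

        /** Value at (row, col) of the n x n identity matrix, without materializing it. */
        double get(int row, int col) {
            return row == col ? 1.0 : 0.0;
        }

        /** Dense n x n identity matrix, rebuilt only if the cached copy was collected. */
        double[][] materialize() {
            double[][] dense = cache.get();
            if (dense == null) {
                dense = new double[n][n];
                for (int i = 0; i < n; i++)
                    dense[i][i] = 1.0;
                cache = new SoftReference<>(dense);
            }
            return dense;
        }
    }

    /** Recode: give each distinct value a 0-based code in order of first appearance. */
    static int[] recode(String[] column, Map<String, Integer> codes) {
        int[] mapping = new int[column.length];
        for (int i = 0; i < column.length; i++) {
            Integer code = codes.get(column[i]);
            if (code == null) {
                code = codes.size();
                codes.put(column[i], code);
            }
            mapping[i] = code;
        }
        return mapping;
    }

    /** Decompress one row: the dictionary row selected by that row's code. */
    static double[] decompressRow(int[] mapping, IdentityDict dict, int r) {
        double[] out = new double[dict.size()];
        for (int j = 0; j < out.length; j++)
            out[j] = dict.get(mapping[r], j);
        return out;
    }

    public static void main(String[] args) {
        String[] column = {"red", "green", "red", "blue"};
        Map<String, Integer> codes = new LinkedHashMap<>();
        int[] mapping = recode(column, codes);
        IdentityDict dict = new IdentityDict(codes.size());
        for (int r = 0; r < column.length; r++)
            System.out.println(column[r] + " -> " + Arrays.toString(decompressRow(mapping, dict, r)));
    }
}
```

In this sketch the per-row mapping is the only part that grows with the input, while the dictionary stays constant in size no matter how many distinct values the column has. The soft reference trades memory for speed: repeated unsupported operations reuse the dense copy, and it can still be dropped when memory is tight.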
Overview: SystemDS is an open source ML system for the end-to-end data science lifecycle from data integration, cleaning, and feature engineering, over efficient, local and distributed ML model training, to deployment and serving. To this end, we aim to provide a stack of declarative languages with R-like syntax for (1) the different tasks of the data-science lifecycle, and (2) users with different expertise. These high-level scripts are compiled into hybrid execution plans of local, in-memory CPU and GPU operations, as well as distributed operations on Apache Spark. In contrast to existing systems, which provide either homogeneous tensors or 2D Datasets, and in order to serve the entire data science lifecycle, the underlying data model consists of DataTensors, i.e., tensors (multi-dimensional arrays) whose first dimension may have a heterogeneous and nested schema.
Quick Start: Install, Quick Start and Hello World
Documentation: SystemDS Documentation
Python Documentation: Python SystemDS Documentation
Issue Tracker: Jira Dashboard
Status and Build: SystemDS was renamed from SystemML, which is an Apache Top Level Project. To build from source, see SystemDS Install from source.