commit | 076139562ed9d9a2d582cfcc0a9998c897a0b944 | |
---|---|---|
author | Sebastian <baunsgaard@tugraz.at> | Tue May 26 21:07:38 2020 +0200 |
committer | Matthias Boehm <mboehm7@gmail.com> | Tue May 26 22:03:49 2020 +0200 |
tree | 712a6a47848fe5fadd9043587b42a9df4681c73e | |
parent | 822b4922b938ece3a23204823f818545d471bae4 | |
[SYSTEMDS-310] Python API refactoring

General quality-of-life changes to the Python interface, plus a restructuring of the documentation.

The SystemDS context now starts a new context for each instantiation, since the previous idea of reusing a context had some difficulties with our system stdout and stderr. This change should also completely fix the :bug: reported by early adopters, which did not allow many instances of SystemDSContext.

The Python tests also received a major overhaul, so that a SystemDS context is only started when the tests are actually run. Previously, an IDE that merely loaded the tests would launch many contexts, at the risk of errors. The tests are now set up without relative paths and run under IDE debugging and testing, at least for Visual Studio Code :smile:.

The Python launch now reflects more of the settings available in Java, starting with Log4J. A Log4J option for Python was added as a flag on the SystemDSContext object. This works by capturing all stdout and stderr output in a FIFO queue. The outputs can then be retrieved through two methods in SystemDSContext.

The commit also contains internal relocations in the __java__ systemds:
- Python API file moved to the api package.
- Change the Python binding to api.
- Move the Py4j converter from compression to runtime/util.

Minor changes:
- Federated Python tests only execute on PRs to master or on the master branch.
- All instances of StringObject returned for validation have been switched to their respective Object types.
- All tests `except federated` are now executable using unittest discover.

Documentation update:
- Documentation of the API is now structured like the code, hopefully making additions easier.
- Documentation contains a startup guide as before, but also the beginning of tutorial-style guides for specific applications.

Closes #900.
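The stdout/stderr capture described in the commit message can be illustrated with a minimal sketch. This is not the SystemDSContext implementation: the class `CapturedProcess` and its methods `get_stdout`, `get_stderr`, and `close` are hypothetical names chosen for illustration, and the child process is a small Python one-liner rather than a JVM. It only shows the general technique of pumping a child's output streams into FIFO queues on background threads and exposing two methods that drain them.

```python
import queue
import subprocess
import sys
import threading


class CapturedProcess:
    """Hypothetical helper (not part of the SystemDS Python API).

    Starts a child process and collects its stdout and stderr lines
    in two FIFO queues that can be drained independently.
    """

    def __init__(self, command):
        self._process = subprocess.Popen(
            command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
        self._stdout_q = queue.Queue()
        self._stderr_q = queue.Queue()
        # One reader thread per stream, so neither pipe can fill up and block the child.
        self._threads = [
            threading.Thread(target=self._pump,
                             args=(self._process.stdout, self._stdout_q), daemon=True),
            threading.Thread(target=self._pump,
                             args=(self._process.stderr, self._stderr_q), daemon=True),
        ]
        for thread in self._threads:
            thread.start()

    @staticmethod
    def _pump(stream, target):
        # Copy each line into the queue until the child closes the stream.
        for line in stream:
            target.put(line.rstrip("\n"))
        stream.close()

    @staticmethod
    def _drain(source):
        # Return everything queued so far, in arrival order, without blocking.
        lines = []
        while True:
            try:
                lines.append(source.get_nowait())
            except queue.Empty:
                return lines

    def get_stdout(self):
        return self._drain(self._stdout_q)

    def get_stderr(self):
        return self._drain(self._stderr_q)

    def close(self):
        # Wait for the child to exit and for both reader threads to finish.
        self._process.wait()
        for thread in self._threads:
            thread.join()
        return self._process.returncode


if __name__ == "__main__":
    child = [sys.executable, "-c",
             "import sys; print('out 1'); print('err 1', file=sys.stderr)"]
    proc = CapturedProcess(child)
    proc.close()
    print(proc.get_stdout())
    print(proc.get_stderr())
```

Draining with `get_nowait` keeps both accessor methods non-blocking, so a caller can poll for output while the child is still running; each call returns only the lines that arrived since the previous call.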
Overview: SystemDS is a versatile system for the end-to-end data science lifecycle from data integration, cleaning, and feature engineering, over efficient, local and distributed ML model training, to deployment and serving. To this end, we aim to provide a stack of declarative languages with R-like syntax for (1) the different tasks of the data-science lifecycle, and (2) users with different expertise. These high-level scripts are compiled into hybrid execution plans of local, in-memory CPU and GPU operations, as well as distributed operations on Apache Spark. In contrast to existing systems, which provide either homogeneous tensors or 2D datasets, and in order to serve the entire data science lifecycle, the underlying data model consists of DataTensors, i.e., tensors (multi-dimensional arrays) whose first dimension may have a heterogeneous and nested schema.
Quick Start: Install, Quick Start and Hello World
Documentation: SystemDS Documentation
Python Documentation: Python SystemDS Documentation
Status and Build: SystemDS is still in pre-alpha status. The original code base was forked from Apache SystemML 1.2 in September 2018. We will continue to support linear algebra programs over matrices, while replacing the underlying data model and compiler, as well as substantially extending the supported functionalities. Until the first release, you can build your own snapshot via Apache Maven: `mvn clean package -P distribution`.