[SYSTEMDS-310] Python API refactoring

General quality-of-life changes to the Python interface and a
restructuring of the documentation.

Modify the SystemDS context so that it starts a new context for each
instantiation, since the previous approach of reusing a context had
difficulties with our system's stdout and stderr handling. This change
should also fix the :bug: reported by early adopters, which prevented
creating multiple instances of SystemDSContext.
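To illustrate the "one context per instantiation" idea, here is a minimal sketch, assuming each context talks to its own backend process over a local TCP port. `find_free_port` and `IsolatedContext` are hypothetical names chosen for illustration; they are not the SystemDS API, and the real context may allocate its backend differently.

```python
import socket


def find_free_port():
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


class IsolatedContext:
    """Hypothetical stand-in for a context that owns its own backend.

    Each instantiation reserves a fresh port instead of attaching to a shared,
    already-running backend, so several instances can coexist.
    """

    def __init__(self):
        self.port = find_free_port()
        # A real implementation would launch its own backend on self.port here.
        self.closed = False

    def close(self):
        # A real implementation would shut down its own backend here.
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
```

Because nothing is shared between instances, closing one context cannot tear down another. One caveat of this sketch: the probed port could be taken by another process before the backend binds it, so a robust version would retry on failure.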

The Python tests also received a major overhaul, so that a SystemDS
context is only started when the tests are actually run. Previously,
an IDE that merely loaded the tests would launch many contexts, at the
risk of errors.

The tests are now set up without relative paths and can be run through
IDE debugging and testing, at least in Visual Studio Code :smile:.
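One common way to avoid relative paths is to resolve test resources from the test file's own location instead of the current working directory. A minimal sketch, assuming a hypothetical `resources` folder next to each test file; `resource_path` is a helper invented here, not part of SystemDS.

```python
from pathlib import Path


def resource_path(test_file, name):
    """Build an absolute path to `name` inside a hypothetical `resources`
    folder next to `test_file`, independent of the working directory."""
    return Path(test_file).resolve().parent / "resources" / name


# Typical use inside a test module:
#     DATA = resource_path(__file__, "data.csv")
```

Anchoring on `__file__` is what lets the same test run unchanged from the repository root, from the tests folder, or from an IDE debugger.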

The Python launch now mirrors more of the settings available in Java,
starting with Log4J. A Log4J option was added as a flag on the
SystemDSContext object. This works by capturing all stdout and stderr
output in a FIFO queue. The captured output can then be retrieved
through two methods on SystemDSContext.
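A minimal sketch of such a capture mechanism, assuming the backend runs as a child process and that each stream gets its own FIFO queue. `CapturedProcess`, `get_stdout` and `get_stderr` are illustrative names chosen here; they are not taken from this commit and may not match the actual SystemDSContext methods.

```python
import subprocess
import sys
import threading
from queue import Empty, Queue


def _pump(stream, q):
    """Read a pipe line by line and append each line to a FIFO queue."""
    for line in iter(stream.readline, ""):
        q.put(line.rstrip("\n"))
    stream.close()


class CapturedProcess:
    """Hypothetical helper: starts a child process (e.g. a JVM) and
    captures its stdout and stderr in two FIFO queues."""

    def __init__(self, cmd):
        self.process = subprocess.Popen(
            cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
        )
        self._out = Queue()
        self._err = Queue()
        self._threads = [
            threading.Thread(target=_pump, args=(self.process.stdout, self._out), daemon=True),
            threading.Thread(target=_pump, args=(self.process.stderr, self._err), daemon=True),
        ]
        for t in self._threads:
            t.start()

    @staticmethod
    def _drain(q, lines):
        """Pop up to `lines` entries (all if lines < 0) in FIFO order."""
        result = []
        while lines < 0 or len(result) < lines:
            try:
                result.append(q.get_nowait())
            except Empty:
                break
        return result

    def get_stdout(self, lines=-1):
        return self._drain(self._out, lines)

    def get_stderr(self, lines=-1):
        return self._drain(self._err, lines)

    def close(self):
        # Wait for the child to exit, then for the readers to reach EOF.
        self.process.wait()
        for t in self._threads:
            t.join()
```

One reader thread per pipe is used because reading both pipes sequentially from a single thread can block on one while the child fills the other's buffer.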

The commit also contains internal relocations in the __Java__ SystemDS code:

- Move the Python API file to the `api` package.
- Update the Python binding to use the `api` package.
- Move the Py4J converter from `compression` to `runtime/util`.

Minor changes

- Federated Python tests now only execute on pull requests to master or
  on the master branch.
- All instances of StringObject returns used for validation have been
  switched to their respective Object types.
- All tests except the federated ones are now executable using
  `unittest discover`.

Documentation Update

- The API documentation is now structured like the code, which should
  make additions easier.
- The documentation contains a getting-started section as before, plus
  the beginnings of tutorial-style guides for specific applications.

Closes #900.
91 files changed
tree: 712a6a47848fe5fadd9043587b42a9df4681c73e
README.md

Apache SystemDS

Overview: SystemDS is a versatile system for the end-to-end data science lifecycle, from data integration, cleaning, and feature engineering, over efficient local and distributed ML model training, to deployment and serving. To this end, we aim to provide a stack of declarative languages with R-like syntax for (1) the different tasks of the data science lifecycle, and (2) users with different expertise. These high-level scripts are compiled into hybrid execution plans of local, in-memory CPU and GPU operations, as well as distributed operations on Apache Spark. In contrast to existing systems, which provide either homogeneous tensors or 2D Datasets, and in order to serve the entire data science lifecycle, the underlying data model consists of DataTensors, i.e., tensors (multi-dimensional arrays) whose first dimension may have a heterogeneous and nested schema.

Quick Start: Install, Quick Start and Hello World

Documentation: SystemDS Documentation

Python Documentation: Python SystemDS Documentation

Status and Build: SystemDS is still in pre-alpha status. The original code base was forked from Apache SystemML 1.2 in September 2018. We will continue to support linear algebra programs over matrices, while replacing the underlying data model and compiler, as well as substantially extending the supported functionality. Until the first release, you can build your own snapshot via Apache Maven: `mvn clean package -P distribution`.

Build status badges: Build, Documentation, Component Test, Application Test, Function Test, Python Test, Federated Python Test