commit | eb29b2d5488d503c97c6b5845a7fffee9a225eed | |
---|---|---|
author | Lachezar Nikolov <74593108+lachezar-n@users.noreply.github.com> | Sun Feb 25 12:57:18 2024 +0100 |
committer | GitHub <noreply@github.com> | Sun Feb 25 17:27:18 2024 +0530 |
tree | 19d47b30d77e9d12c095006e2fe3a79244cd2d51 | |
parent | 4742a8545d4af6121aaace00f8ea12b6051ef05d | |
[SYSTEMDS-2926] AWS scripts update for EMR-7.0.0 (#2003)

The changes fix some general issues:
- creating and referencing the S3 buckets
- not assigning any subnet to the cluster (bad practice and a potential security vulnerability)

The changes also update the EMR version to the most recent one, emr-7.0.0:
- configuration updates
- replacing Ganglia with AmazonCloudWatchAgent (see https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-AmazonCloudWatchAgent.html)

While testing the script with the current repository version, the following bug was observed: when SystemDS is launched in execution mode "spark", an `IllegalCallerException` is thrown. When running the command `spark-submit target/SystemDS.jar -f path/to/hello.dml -exec spark -stats -explain`, the exact console output is:

```shell
...
--MAIN PROGRAM
----GENERIC (lines 1-1) [recompile=false]
------CP print Hello World.SCALAR.STRING.true _Var0.SCALAR.STRING 8
------CP rmvar _Var0
An Error Occurred : IllegalCallerException -- java.lang.ref is not open to unnamed module @4eba373c
```
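The exception text has the same form as the message `java.lang.Module.addOpens` produces when code in the unnamed module (anything loaded from the class path) asks the named module `java.base` to open a package that is not already open to the caller. The sketch below is a minimal, hypothetical reproduction (the class `AddOpensRepro` and method `tryOpen` are illustrative names). It assumes a JDK 17 or newer runtime, where `java.base` packages are strongly encapsulated. The commit message does not confirm that SystemDS fails through this exact call.

```java
// Hypothetical minimal reproduction of the IllegalCallerException message above.
// Assumes a JDK 17+ runtime (java.base packages are not opened by default).
public class AddOpensRepro {

    // Asks java.base to open java.lang.ref to this class's module. Loaded from
    // the class path, this class lives in the unnamed module, and addOpens only
    // succeeds if java.lang.ref is already open to the calling module.
    static String tryOpen() {
        Module javaBase = Object.class.getModule();
        Module caller = AddOpensRepro.class.getModule();
        try {
            javaBase.addOpens("java.lang.ref", caller);
            return "opened";
        } catch (IllegalCallerException e) {
            // Message form: "java.lang.ref is not open to unnamed module @<hash>"
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryOpen());
    }
}
```

Launched as `java AddOpensRepro.java`, it should print the "not open to unnamed module" message. Launched as `java --add-opens java.base/java.lang.ref=ALL-UNNAMED AddOpensRepro.java`, the package is already open to unnamed modules, so the call should succeed and print `opened`.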
Overview: SystemDS is an open source ML system for the end-to-end data science lifecycle, from data integration, cleaning, and feature engineering, through efficient local and distributed ML model training, to deployment and serving. To this end, we aim to provide a stack of declarative languages with R-like syntax for (1) the different tasks of the data science lifecycle, and (2) users with different expertise. These high-level scripts are compiled into hybrid execution plans of local, in-memory CPU and GPU operations, as well as distributed operations on Apache Spark. In contrast to existing systems, which provide either homogeneous tensors or 2D Datasets, and in order to serve the entire data science lifecycle, the underlying data model consists of DataTensors, i.e., tensors (multi-dimensional arrays) whose first dimension may have a heterogeneous and nested schema.
Resource | Links |
---|---|
Quick Start | Install, Quick Start and Hello World |
Documentation | SystemDS Documentation |
Python Documentation | Python SystemDS Documentation |
Issue Tracker | Jira Dashboard |

Status and Build: SystemDS was renamed from SystemML, which is an Apache Top Level Project. To build from source, visit SystemDS Install from source.