[SYSTEMDS-2668] Fine-Grained Priv Constraint Prop

This PR improves the propagation of fine-grained privacy constraints for
matrix multiplications. It also provides a new structure for
fine-grained privacy propagation by introducing a "Propagator"
interface, which is implemented by the different propagator classes. This
interface will also be used in upcoming implementations of privacy
propagation for other operators.
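
As a rough illustration of the structure described above, the sketch below
shows one way a "Propagator" interface with several implementing classes
could look. The single propagate() method, the PrivacyLevel enum as written
here, and both implementing classes are assumptions made for this example;
they are not taken from the code in this PR.

```java
import java.util.List;

class PropagatorSketch {
    // Ordered from least to most restrictive so that ordinal() can be compared.
    enum PrivacyLevel { None, PrivateAggregation, Private }

    // Common entry point (assumed shape): each operator-specific class
    // decides how the output privacy level is derived from its inputs.
    interface Propagator {
        PrivacyLevel propagate();
    }

    // Hypothetical propagator: the output takes the strictest input level.
    static class StrictestInputPropagator implements Propagator {
        private final List<PrivacyLevel> inputs;

        StrictestInputPropagator(List<PrivacyLevel> inputs) {
            this.inputs = inputs;
        }

        @Override
        public PrivacyLevel propagate() {
            PrivacyLevel result = PrivacyLevel.None;
            for (PrivacyLevel level : inputs)
                if (level.ordinal() > result.ordinal())
                    result = level;
            return result;
        }
    }

    // Hypothetical propagator: a full aggregation lifts PrivateAggregation,
    // but a Private input still makes the output Private.
    static class FullAggregationPropagator implements Propagator {
        private final List<PrivacyLevel> inputs;

        FullAggregationPropagator(List<PrivacyLevel> inputs) {
            this.inputs = inputs;
        }

        @Override
        public PrivacyLevel propagate() {
            return inputs.contains(PrivacyLevel.Private)
                ? PrivacyLevel.Private : PrivacyLevel.None;
        }
    }

    // Calling code depends only on the interface, not on a concrete class.
    static PrivacyLevel outputLevel(Propagator propagator) {
        return propagator.propagate();
    }
}
```

With this layout, supporting a new operator would only require adding
another class that implements the interface.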

The new matrix multiplication propagation is more efficient than the
previous implementation since it builds an array with the summarized
privacy level of each row of the first matrix and each column of the
second matrix.
Furthermore, it takes the operator type into account. If a row or column
contains only a single non-zero value, the operation cannot be considered
an aggregation, so an input with the PrivateAggregation privacy level
should still yield PrivateAggregation in the output. The rules of
propagation are implemented in the method
"PrivacyPropagator.corePropagation", whose comment also details the
privacy "truth table".
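
The idea can be sketched as follows. This is a simplified illustration under
stated assumptions, not the code from this PR: the class and helper names are
hypothetical, the rules in corePropagation are a plausible reading of the
description above rather than the actual truth table, and an output cell is
assumed to count as an aggregation only if both its row of the first matrix
and its column of the second matrix hold more than one non-zero value.

```java
class MatMulPrivacySketch {
    // Ordered from least to most restrictive so that ordinal() can be compared.
    enum PrivacyLevel { None, PrivateAggregation, Private }
    enum OperatorType { Aggregate, NonAggregate }

    // Assumed rules; the authoritative truth table is the one documented
    // in PrivacyPropagator.corePropagation.
    static PrivacyLevel corePropagation(PrivacyLevel[] inputs, OperatorType op) {
        boolean anyPrivateAggregation = false;
        for (PrivacyLevel level : inputs) {
            if (level == PrivacyLevel.Private)
                return PrivacyLevel.Private; // Private always dominates
            if (level == PrivacyLevel.PrivateAggregation)
                anyPrivateAggregation = true;
        }
        // PrivateAggregation is lifted only if the values are really aggregated.
        if (anyPrivateAggregation && op == OperatorType.NonAggregate)
            return PrivacyLevel.PrivateAggregation;
        return PrivacyLevel.None;
    }

    // Strictest privacy level found in each row.
    static PrivacyLevel[] rowLevels(PrivacyLevel[][] levels) {
        PrivacyLevel[] summary = new PrivacyLevel[levels.length];
        for (int i = 0; i < levels.length; i++) {
            summary[i] = PrivacyLevel.None;
            for (PrivacyLevel level : levels[i])
                if (level.ordinal() > summary[i].ordinal())
                    summary[i] = level;
        }
        return summary;
    }

    // A row with at most one non-zero value cannot be an aggregation.
    static OperatorType[] rowOperatorTypes(double[][] values) {
        OperatorType[] types = new OperatorType[values.length];
        for (int i = 0; i < values.length; i++) {
            int nonZeros = 0;
            for (double v : values[i])
                if (v != 0) nonZeros++;
            types[i] = nonZeros > 1 ? OperatorType.Aggregate : OperatorType.NonAggregate;
        }
        return types;
    }

    // Transposes assume non-empty, rectangular inputs.
    static double[][] transpose(double[][] m) {
        double[][] t = new double[m[0].length][m.length];
        for (int i = 0; i < m.length; i++)
            for (int j = 0; j < m[0].length; j++)
                t[j][i] = m[i][j];
        return t;
    }

    static PrivacyLevel[][] transpose(PrivacyLevel[][] m) {
        PrivacyLevel[][] t = new PrivacyLevel[m[0].length][m.length];
        for (int i = 0; i < m.length; i++)
            for (int j = 0; j < m[0].length; j++)
                t[j][i] = m[i][j];
        return t;
    }

    // Privacy level of every cell of the product a * b.
    static PrivacyLevel[][] propagate(double[][] a, PrivacyLevel[][] aLevels,
                                      double[][] b, PrivacyLevel[][] bLevels) {
        // Summaries are computed once instead of once per output cell.
        PrivacyLevel[] rowLevel = rowLevels(aLevels);
        OperatorType[] rowType = rowOperatorTypes(a);
        PrivacyLevel[] colLevel = rowLevels(transpose(bLevels));
        OperatorType[] colType = rowOperatorTypes(transpose(b));

        PrivacyLevel[][] out = new PrivacyLevel[rowLevel.length][colLevel.length];
        for (int i = 0; i < rowLevel.length; i++) {
            for (int j = 0; j < colLevel.length; j++) {
                OperatorType op = (rowType[i] == OperatorType.Aggregate
                        && colType[j] == OperatorType.Aggregate)
                    ? OperatorType.Aggregate : OperatorType.NonAggregate;
                out[i][j] = corePropagation(
                    new PrivacyLevel[] { rowLevel[i], colLevel[j] }, op);
            }
        }
        return out;
    }
}
```

In this sketch, a row of the first matrix whose summarized level is
PrivateAggregation yields None only for output cells where both the row and
the column hold more than one non-zero value; every other cell in that row
stays PrivateAggregation, and any Private input makes the cell Private.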

- Edit Fine-Grained Constraint Propagation in Matrix Multiplications
- The new version takes the operator type into account when propagating
  and summarizes the privacy level of the rows and columns of the input
  matrices to make propagation faster. The new implementation needs
  further test cases, which will be added in future commits.
- Add Tests of Matrix Multiplication Privacy Propagation
- Refactor Matrix Multiplication Propagation By Introducing
  the Propagator Interface
- Add Optimized PrivateFirst Propagator

Closes #1060
19 files changed
tree: 987573ca169a639c95826ca77fccb9750fb67f5f
README.md

Apache SystemDS

Overview: SystemDS is a versatile system for the end-to-end data science lifecycle from data integration, cleaning, and feature engineering, over efficient, local and distributed ML model training, to deployment and serving. To this end, we aim to provide a stack of declarative languages with R-like syntax for (1) the different tasks of the data-science lifecycle, and (2) users with different expertise. These high-level scripts are compiled into hybrid execution plans of local, in-memory CPU and GPU operations, as well as distributed operations on Apache Spark. In contrast to existing systems, which provide either homogeneous tensors or 2D Datasets, and in order to serve the entire data science lifecycle, the underlying data model consists of DataTensors, i.e., tensors (multi-dimensional arrays) whose first dimension may have a heterogeneous and nested schema.

Quick Start: Install, Quick Start and Hello World

Documentation: SystemDS Documentation

Python Documentation: Python SystemDS Documentation

Issue Tracker: Jira Dashboard

Status and Build: SystemDS is still in pre-alpha status. The original code base was forked from Apache SystemML 1.2 in September 2018. We will continue to support linear algebra programs over matrices, while replacing the underlying data model and compiler, as well as substantially extending the supported functionalities. Until the first release, you can build your own snapshot via Apache Maven: mvn clean package -P distribution.
