| <!-- |
| {% comment %} |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to you under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| {% end comment %} |
| --> |
| |
| --- |
| GENERAL NOTES: |
| * Assign IDs for epics by default with a key range of 10 |
| * Keep epic tasks in the key range, otherwise create new epic |
| --- |
| |
| SYSTEMDS-10 Compiler Rework / Misc |
| * 11 Support DML-bodied builtin functions OK |
| * 12 Remove unnecessary HOP/LOP indirections OK |
| * 13 Refactoring test cases into component/integration OK |
| * 14 Complete removal of external functions from all scripts |
| * 15 Travis integration w/ subset of tests OK (removed for Github Actions) |
| * 16 Remove instruction patching |
| * 17 Refactoring of program block hierarchy OK |
| * 18 Improve API for new dml-bodied builtin functions OK |
| * 19 Cleanup pass cpvar+rmvar to mvvar instructions OK |
| |
| SYSTEMDS-20 New Data Model |
| * 21 Finalize dense tensor blocks OK |
| * 22 Sparse double/float tensor blocks |
| * 23 Sparse int/bool tensor blocks |
| * 24 Initial tensor dml integration OK |
| * 25 Initial data tensor implementation |
| * 26 Non-zero default value for sparse (row/col) |
| * 27 Tensor buffer pool integration |
| * 28 Tensor local readers and writers (textcell, binary) OK |
| * 29 Tensor local readers and writers (linearize csv) |
| |
| SYSTEMDS-30 Builtin and Packaging |
| * 31 Shell script for local runs OK |
| * 32 Shell script for spark runs OK |
| * 33 Cleanup hadoop dependency for local runs |
| * 34 Wrapper blocks for sequence files |
| * 35 Replace unnecessary dependencies w/ custom |
| * 36 Shell script for AWS execution OK |
| * 37 Cleanup mvn package, compile, test OK |
| * 38 Runscript rename, improve, fix cygwin OK |
| |
| SYSTEMDS-40 Preprocessing builtins |
| * 41 Add new winsorize builtin function OK |
| * 42 Add new outlier builtin function OK |
| * 43 Add new scale builtin function OK |
| * 44 SotA normalization primitives |
| * 45 SotA outlier detection primitives |
| * 46 Generalization of quantiles |
| * 47 Add new normalize builtin function OK |
| * 48 Extended transform encode/apply: feature hashing OK |
| |
| SYSTEMDS-50 I/O Formats |
| * 51 Support for homogeneous JSON (local/distributed) |
| * 52 Support for libsvm files (local/distributed) |
| * 53 New sql data source (local, distributed) |
| * 54 Support for is.na, is.nan, is.infinite OK |
| * 55 Cleanup file formats, binary cell, input/output infos OK |
| * 56 Consolidate single/multi-threaded readers |
| * 57 Support for hdf5 files (local/distributed) |
| |
| SYSTEMDS-60 Update SystemML improvements |
| * 61 Take over cumulative aggregate improvements OK |
| * 62 Take over sparsity estimation improvements OK |
| * 63 Take over fixes for transform w/ binning |
| |
| SYSTEMDS-70 Lineage Tracing and Reuse OK |
| * 71 Basic collection of lineage traces OK |
| * 72 Serialization and deserialization of lineage traces OK |
| * 73 Deduplication of lineage traces OK |
| (loops, nested loops, branchless loops) OK |
| * 74 Performance features lineage tracing OK |
| * 75 Reuse cache based on lineage traces OK |
| * 76 Generate runtime plan from lineage trace OK |
| * 77 New builtin function for obtaining lineage OK |
| * 78 Extended lineage tracing (parfor, funs, codegen) OK |
| * 79 Reuse cache configuration options (cmd line options) OK |
| |
| SYSTEMDS-80 Improved distributed operations |
| * 81 Avoid unnecessary replication on rmm |
| * 82 Avoid unnecessary replication on binary |
| |
| SYSTEMDS-100 Various Fixes |
| * 101 Fix spark quantiles w/ multiple queries OK |
| * 102 Fix parser issue after refactoring OK |
| * 103 Fix handling of builtin functions w/ matching udfs OK |
| * 104 Fix failing tests due to incorrect parfor parameters OK |
| * 105 Fix all application/function tests (various issues) OK |
| * 106 Fix correctness of as.integer for negative numbers OK |
| * 107 Fix correctness IPA check dimension-preserving OK |
| * 108 Fix codegen optimizer (early costing abort) OK |
| * 109 Fix update-in-place w/ udf function calls OK |
| |
| SYSTEMDS-110 New Builtin Functions |
| * 111 Time builtin function for script-level measurements OK |
| * 112 Image data augmentation builtin functions OK |
| * 113 Builtin functions for linear regression algorithms OK |
| * 114 Builtin function for stepwise regression OK |
| * 115 Builtin function for model debugging (slice finder) OK |
| * 116 Builtin function for kmeans OK |
| * 117 Builtin function for lm cross validation OK |
| * 118 Builtin function for hyperparameter grid search |
| * 119 Builtin functions for l2svm and msvm OK |
| |
| SYSTEMDS-120 Performance Features |
| * 121 Avoid spark context creation on parfor result merge OK |
| * 122 Reduce thread contention on parfor left indexing OK |
| * 123 Avoid unnecessary spark context creation on explain OK |
| * 124 Improved cbind performance (small rhs) OK |
| * 125 Parallel Sort (previously 191) OK |
| |
| SYSTEMDS-130 IPA and Size Propagation |
| * 131 New IPA pass for function call forwarding OK |
| * 132 Size propagation via data characteristics |
| * 133 Hop Size Propagation for Tensors |
| * 134 Increased repretition and fixpoint detection OK |
| * 135 Fix IPA scalar propagation w/ named arguments OK |
| |
| SYSTEMDS-140 Distributed Tensor Operations |
| * 141 Infrastructure for spark tensor operations |
| * 142 Basis tensor operations (datagen, aggregate, elementwise) |
| |
| SYSTEMDS-150 Releases |
| * 151 Fixing native BLAS instructions (MKL,OpenBLAS) OK |
| * 152 Run script documentation OK |
| * 153 convert all occurrences SystemML/sysml to SystemDS/sysds OK |
| * 154 fix license headers in all touched files OK |
| * 155 import release scripts from SystemML repo OK |
| * 156 initial codestyle eclipse template OK |
| * 157 Shell script simplification, cleanups and fixes for 0.2 OK |
| |
| SYSTEMDS-160 Tensor Compiler/Runtime |
| * 161 Local readers and writers for tensors (all formats) |
| * 162 Spark binary tensor instructions |
| |
| SYSTEMDS-170 Lineage full and partial reuse |
| * 171 Initial version of partial rewrites OK |
| * 172 Parfor integration (blocked waiting for results) OK |
| * 173 Improved cost estimates OK |
| * 174 Reuse rewrite for rbind/cbind-tsmm/ba+* OK |
| * 175 Refactoring of lineage rewrite code OK |
| * 176 Reuse rewrite for cbind/rbind-elementwise */+ |
| * 177 Reuse rewrite for aggregate OK |
| * 178 Compiler assisted reuse (eg. CV, lmCG) |
| * 179 Lineage handling for lists OK |
| |
| SYSTEMDS-180 New Builtin Functions II OK |
| * 181 Builtin function for naive bayes OK |
| * 182 Builtin function for typeof (frame schema detection) OK |
| * 183 Builtin function detectSchema OK |
| * 184 Builtin function GNMF OK |
| * 185 Builtin function PNMF OK |
| * 186 Builtin function ALS-DS OK |
| * 187 Builtin function ALS-CG OK |
| * 188 Builtin function ALS OK |
| * 189 Builtin function mice (chained equation imputation) OK |
| |
| SYSTEMDS-190 New Builtin Functions III |
| * 191 Builtin function multi logreg OK |
| * 192 Extended cmdline interface -b for calling builtin functions |
| * 193 New outlierBySds builtin function (delete row/cell) OK |
| * 194 New outlierByIQR builtin function (delete row/cell) OK |
| * 195 Builtin function mice for nominal features OK |
| * 196 Builtin function intersect (set intersection) OK |
| * 197 Builtin function for functional dependency discovery OK |
| * 198 Extended slice finding (classification) OK |
| * 199 Builtin function Multinominal Logistic Regression Predict OK |
| * 200 Builtin function Gaussian Mixture Model OK |
| |
| SYSTEMDS-200 Various Fixes |
| * 201 Fix spark append instruction zero columns OK |
| * 202 Fix rewrite split DAG multiple removeEmpty (pending op) OK |
| * 203 Fix codegen outer compilation (MV operations) OK |
| * 204 Fix rewrite simplify sequences of binary comparisons OK |
| * 205 Fix scoping of builtin dml-bodied functions (vs user-defined) |
| * 206 Fix codegen outer template compilation (tsmm) OK |
| * 207 Fix builtin function call hoisting from expressions OK |
| * 208 Fix bufferpool leak (live var analysis and createvar) OK |
| * 209 Fix performance sparse M-CV elementwise multiply OK |
| |
| SYSTEMDS-210 Extended lists Operations |
| * 211 Cbind and Rbind over lists of matrices OK |
| * 212 Creation of empty lists OK |
| * 213 Add entries to lists OK |
| * 214 Removal of entries from lists OK |
| * 215 List expansion during dynamic recompilation OK |
| * 216 New rewrite for tsmm/mm over list of folds OK |
| |
| SYSTEMDS-220 Federated Tensors and Instructions |
| * 221 Initial infrastructure federated operations OK |
| * 222 Federated matrix-vector multiplication OK |
| * 223 Federated unary aggregates OK |
| * 224 Federated frames OK |
| * 225 Federated elementwise operations OK |
| * 226 Federated rbind and cbind OK |
| * 227 Federated worker setup and infrastructure OK |
| * 228 Federated matrix-matrix multiplication OK |
| * 229 Federated transform functionality |
| |
| SYSTEMDS-230 Lineage Integration |
| * 231 Use lineage in buffer pool |
| * 232 Lineage of code generated operators OK |
| * 233 Lineage cache and reuse of function results OK |
| * 234 Lineage tracing for spark instructions OK |
| * 235 Lineage tracing for remote-spark parfor OK |
| * 236 Extended multi-level lineage cache for statement blocks OK |
| * 237 Unmark loop dependent operations OK |
| * 238 Robustness parfor lineage merge (flaky tests) OK |
| * 239 Make lineage accessible through python bindings OK |
| |
| SYSTEMDS-240 GPU Backend Improvements |
| * 241 Dense GPU cumulative aggregates OK |
| |
| SYSTEMDS-250 Extended Slice Finding |
| * 251 Alternative slice enumeration approach OK |
| * 252 Initial data slicing implementation Python OK |
| * 253 Distributed slicing algorithms (task/data parallel) OK |
| * 254 Consolidation and fixes distributed slice finding OK |
| |
| SYSTEMDS-260 Misc Tools |
| * 261 Stable marriage algorithm OK |
| * 262 Data augmentation tool for data cleaning OK |
| * 263 ONNX graph importer (Python API, docs, tests) OK |
| * 264 ONNX graph exporter |
| * 265 Entity resolution pipelines and primitives OK |
| |
| SYSTEMDS-270 Compressed Matrix Blocks |
| * 271 Reintroduce compressed matrix blocks from SystemML OK |
| * 272 Simplify and speedup compression tests OK |
| * 273 Refactor compressed Matrix Block to simplify responsibilities OK |
| * 273a Redesign allocation of ColGroups in ColGroupFactory |
| * 274 Make the DDC Compression dictionary share correctly OK |
| * 275 Include compressionSettings in DMLConfiguration |
| * 276 Allow Uncompressed Columns to be in sparse formats |
| * 277 Sampling based estimators fix |
| * 278 Compression-CoCode algorithm optimization |
| * 278a Return ColGroups estimated compression ratio to Factory |
| * 279 Add missing standard lossless compression techniques |
| * 279a ColGroup FOR (Frame of reference) encoding |
| * 279b ColGroup DEL (Delta) encoding |
| * MINOR Reduce memory usage for compression statistics. |
| * MINOR Make ContainsAZero() method in UncompressedBitMap |
| |
| SYSTEMDS-280 New Rewrites |
| * 281 New rewrites sum(rmEmpty), sum(table) OK |
| * 282 New rewrite binary add chain to nary plus OK |
| |
| SYSTEMDS-290 Misc Compiler Features |
| * 291 Eval function calls of dml-bodied builtin functions OK |
| * 292 Eval function calls of native builtin functions (genate fun) |
| * 293 Compiler rewrites for eval to normal function calls |
| * 294 DML-bodied builtin functions with frames as main input |
| |
| SYSTEMDS-300 Github Workflow/Actions |
| * 301 Initial Setup of GitHub Actions for automatic testing OK |
| * 302 Workflow for Application tests OK |
| * 303 Workflow for Function tests OK |
| * 304 Workflow make documentation OK |
| * 305 Workflow run docker containers for speedup OK |
| |
| SYSTEMDS-310 Python Bindings |
| * 311 Initial Python Binding for federated execution OK |
| * 312 Python 3.6 compatibility OK |
| * 313 Python Documentation upload via Github Actions OK |
| * 314 Python SystemDS context manager OK |
| * 314a Change py4j interaction using py4j for termination not stdin |
| * 315 Python Federated Matrices Tests OK |
| * 316 Extended Python API (rand, lm, mm) OK |
| * 317 Python Rerun with different inputs without recompilation |
| * 318 Python Multiple outputs |
| * 319 Python Caching of intermediates |
| |
| SYSTEMDS-320 Merge SystemDS into Apache SystemML OK |
| * 321 Merge histories of SystemDS and SystemML OK |
| * 322 Change global package names OK |
| * 323 Fix licenses and notice file OK |
| |
| SYSTEMDS-330 Lineage Tracing, Reuse and Integration |
| * 331 Cache and reuse scalar outputs (instruction and multi-level) OK |
| * 332 Parfor integration with multi-level reuse OK |
| * 333 Improve cache eviction with actual compute time OK |
| * 334 Cache scalars only with atleast one matrix inputs |
| * 335 Weighted eviction policy (function(size,computetime,LRU time)) OK |
| * 336 Better use of cache status to handle multithreading |
| * 337 Adjust disk I/O speed by recording actual time taken OK |
| * 338 Extended lineage tracing (rmEmpty, lists), partial rewrites OK |
| * 339 Lineage tracing robustness (indexed updates, algorithms) |
| |
| SYSTEMDS-340 Compiler Assisted Lineage Caching and Reuse |
| * 341 Finalize unmarking of loop dependent operations |
| * 342 Mark functions as last-use to enable early eviction |
| * 343 Identify equal last level HOPs to ensure SB-level reuse |
| * 344 Unmark functions/SBs containing non-determinism for caching OK |
| * 345 Compiler assisted cache configuration |
| |
| SYSTEMDS-350 Data Cleaning Framework |
| * 351 New builtin function for error correction by schema OK |
| * 352 New builtin function for error correction by length Ok |
| |
| SYSTEMDS-360 Privacy/Data Exchange Constraints |
| * 361 Initial privacy meta data (compiler/runtime) OK |
| * 362 Runtime privacy propagation |
| * 363 Compile-time privacy propagation |
| * 364 Error handling violated privacy constraints OK |
| * 365 Extended privacy/data exchange constraints OK |
| |
| SYSTEMDS-370 Lossy Compression Blocks |
| * 371 ColGroup Quantization |
| * 372 ColGroup Base Data change (from Double to ??) |
| |
| SYSTEMDS-380 Memory Footprint |
| * 381 Matrix Block Memory footprint update |
| |
| SYSTEMDS-390 New Builtin Functions IV |
| * 391 New GLM builtin function (from algorithms) OK |
| * 392 Builtin function for missing value imputation via FDs OK |
| * 393 Builtin to find Connected Components of a graph OK |
| * 394 Builtin for one-hot encoding of matrix (not frame), see table OK |
| * 395 SVM rework and utils (confusionMatrix, msvmPredict) OK |
| * 396 Builtin for counting number of distinct values OK |
| * 397 Algorithm for neural collaborative filtering (NCF) OK |
| |
| SYSTEMDS-400 Spark Backend Improvements |
| * 401 Fix output block indexes of rdiag (diagM2V) OK |
| |
| SYSTEMDS-410 Lineage Tracing, Reuse and Integration II |
| * 411 Improved handling of multi-level cache duplicates OK |
| * 412 Robust lineage tracing (non-recursive, parfor) OK |
| * 413 Cache and reuse MultiReturnBuiltin instructions OK |
| * 414 New rewrite for PCA --> lmDS pipeline OK |
| * 415 MLContext lineage support (tracing and reuse correctness) OK |
| * 416 Lineage deduplication while, nested if, loop sequences OK |
| * 417 New rewrite for partial reuse in StepLM OK |
| * 418 Performance lineage tracing and reuse probing small data OK |
| * 419 Performance and robustness partial rewrites |
| |
| SYSTEMDS-420 Compiler Improvements |
| * 421 Fix invalid IPA scalar propagation into functions OK |
| |
| SYSTEMDS-500 Documentation Webpage Reintroduction |
| * 501 Make Documentation webpage framework OK |
| |
| Others: |
| * Break append instruction to cbind and rbind |
| |
| SYSTEMDS-510 IO formats |
| * 511 Add protobuf support to write and read FrameBlocks to HDFS OK |
| |
| SYSTEMDS-520 Lineage Tracing, Reuse and Integration III |
| * 521 New lineage option exposing cache policies OK |
| |
| SYSTEMDS-610 Cleaning Pipelines |
| * 611 Initial Brute force execution OK |
| |
| |