<!--
{% comment %}
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to you under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
{% end comment %}
-->
---
GENERAL NOTES:
* Assign IDs for epics with a default key range of 10
* Keep epic tasks within the key range, otherwise create a new epic
---
SYSTEMDS-10 Compiler Rework / Misc
* 11 Support DML-bodied builtin functions OK
* 12 Remove unnecessary HOP/LOP indirections OK
* 13 Refactoring test cases into component/integration OK
* 14 Complete removal of external functions from all scripts
* 15 Travis integration w/ subset of tests OK (removed for GitHub Actions)
* 16 Remove instruction patching
* 17 Refactoring of program block hierarchy OK
* 18 Improve API for new dml-bodied builtin functions OK
* 19 Cleanup pass cpvar+rmvar to mvvar instructions OK
SYSTEMDS-20 New Data Model
* 21 Finalize dense tensor blocks OK
* 22 Sparse double/float tensor blocks
* 23 Sparse int/bool tensor blocks
* 24 Initial tensor dml integration OK
* 25 Initial data tensor implementation
* 26 Non-zero default value for sparse (row/col)
* 27 Tensor buffer pool integration
* 28 Tensor local readers and writers (textcell, binary) OK
* 29 Tensor local readers and writers (linearize csv)
SYSTEMDS-30 Builtin and Packaging
* 31 Shell script for local runs OK
* 32 Shell script for spark runs OK
* 33 Cleanup hadoop dependency for local runs
* 34 Wrapper blocks for sequence files
* 35 Replace unnecessary dependencies w/ custom
* 36 Shell script for AWS execution OK
* 37 Cleanup mvn package, compile, test OK
* 38 Runscript rename, improve, fix cygwin OK
SYSTEMDS-40 Preprocessing builtins
* 41 Add new winsorize builtin function OK
* 42 Add new outlier builtin function OK
* 43 Add new scale builtin function OK
* 44 SotA normalization primitives
* 45 SotA outlier detection primitives
* 46 Generalization of quantiles
* 47 Add new normalize builtin function OK
* 48 Extended transform encode/apply: feature hashing OK
SYSTEMDS-50 I/O Formats
* 51 Support for homogeneous JSON (local/distributed)
* 52 Support for libsvm files (local/distributed)
* 53 New sql data source (local, distributed)
* 54 Support for is.na, is.nan, is.infinite OK
* 55 Cleanup file formats, binary cell, input/output infos OK
* 56 Consolidate single/multi-threaded readers
* 57 Support for hdf5 files (local/distributed)
SYSTEMDS-60 Update SystemML improvements
* 61 Take over cumulative aggregate improvements OK
* 62 Take over sparsity estimation improvements OK
* 63 Take over fixes for transform w/ binning
SYSTEMDS-70 Lineage Tracing and Reuse OK
* 71 Basic collection of lineage traces OK
* 72 Serialization and deserialization of lineage traces OK
* 73 Deduplication of lineage traces (loops, nested loops, branchless loops) OK
* 74 Performance features lineage tracing OK
* 75 Reuse cache based on lineage traces OK
* 76 Generate runtime plan from lineage trace OK
* 77 New builtin function for obtaining lineage OK
* 78 Extended lineage tracing (parfor, funs, codegen) OK
* 79 Reuse cache configuration options (cmd line options) OK
SYSTEMDS-80 Improved distributed operations
* 81 Avoid unnecessary replication on rmm
* 82 Avoid unnecessary replication on binary
SYSTEMDS-100 Various Fixes
* 101 Fix spark quantiles w/ multiple queries OK
* 102 Fix parser issue after refactoring OK
* 103 Fix handling of builtin functions w/ matching udfs OK
* 104 Fix failing tests due to incorrect parfor parameters OK
* 105 Fix all application/function tests (various issues) OK
* 106 Fix correctness of as.integer for negative numbers OK
* 107 Fix correctness IPA check dimension-preserving OK
* 108 Fix codegen optimizer (early costing abort) OK
* 109 Fix update-in-place w/ udf function calls OK
SYSTEMDS-110 New Builtin Functions
* 111 Time builtin function for script-level measurements OK
* 112 Image data augmentation builtin functions OK
* 113 Builtin functions for linear regression algorithms OK
* 114 Builtin function for stepwise regression OK
* 115 Builtin function for model debugging (slice finder) OK
* 116 Builtin function for kmeans OK
* 117 Builtin function for lm cross validation OK
* 118 Builtin function for hyperparameter grid search
* 119 Builtin functions for l2svm and msvm OK
SYSTEMDS-120 Performance Features
* 121 Avoid spark context creation on parfor result merge OK
* 122 Reduce thread contention on parfor left indexing OK
* 123 Avoid unnecessary spark context creation on explain OK
* 124 Improved cbind performance (small rhs) OK
* 125 Parallel Sort (previously 191) OK
SYSTEMDS-130 IPA and Size Propagation
* 131 New IPA pass for function call forwarding OK
* 132 Size propagation via data characteristics
* 133 Hop Size Propagation for Tensors
* 134 Increased repetition and fixpoint detection OK
* 135 Fix IPA scalar propagation w/ named arguments OK
SYSTEMDS-140 Distributed Tensor Operations
* 141 Infrastructure for spark tensor operations
* 142 Basic tensor operations (datagen, aggregate, elementwise)
SYSTEMDS-150 Releases
* 151 Fixing native BLAS instructions (MKL,OpenBLAS) OK
* 152 Run script documentation OK
* 153 Convert all occurrences SystemML/sysml to SystemDS/sysds OK
* 154 Fix license headers in all touched files OK
* 155 Import release scripts from SystemML repo OK
* 156 Initial codestyle eclipse template OK
* 157 Shell script simplification, cleanups and fixes for 0.2 OK
SYSTEMDS-160 Tensor Compiler/Runtime
* 161 Local readers and writers for tensors (all formats)
* 162 Spark binary tensor instructions
SYSTEMDS-170 Lineage full and partial reuse
* 171 Initial version of partial rewrites OK
* 172 Parfor integration (blocked waiting for results) OK
* 173 Improved cost estimates OK
* 174 Reuse rewrite for rbind/cbind-tsmm/ba+* OK
* 175 Refactoring of lineage rewrite code OK
* 176 Reuse rewrite for cbind/rbind-elementwise */+
* 177 Reuse rewrite for aggregate OK
* 178 Compiler assisted reuse (e.g., CV, lmCG)
* 179 Lineage handling for lists OK
SYSTEMDS-180 New Builtin Functions II OK
* 181 Builtin function for naive bayes OK
* 182 Builtin function for typeof (frame schema detection) OK
* 183 Builtin function detectSchema OK
* 184 Builtin function GNMF OK
* 185 Builtin function PNMF OK
* 186 Builtin function ALS-DS OK
* 187 Builtin function ALS-CG OK
* 188 Builtin function ALS OK
* 189 Builtin function mice (chained equation imputation) OK
SYSTEMDS-190 New Builtin Functions III
* 191 Builtin function multi logreg OK
* 192 Extended cmdline interface -b for calling builtin functions
* 193 New outlierBySds builtin function (delete row/cell) OK
* 194 New outlierByIQR builtin function (delete row/cell) OK
* 195 Builtin function mice for nominal features OK
* 196 Builtin function intersect (set intersection) OK
* 197 Builtin function for functional dependency discovery OK
* 198 Extended slice finding (classification) OK
* 199 Builtin function Multinomial Logistic Regression Predict OK
* 200 Builtin function Gaussian Mixture Model OK
SYSTEMDS-200 Various Fixes
* 201 Fix spark append instruction zero columns OK
* 202 Fix rewrite split DAG multiple removeEmpty (pending op) OK
* 203 Fix codegen outer compilation (MV operations) OK
* 204 Fix rewrite simplify sequences of binary comparisons OK
* 205 Fix scoping of builtin dml-bodied functions (vs user-defined)
* 206 Fix codegen outer template compilation (tsmm) OK
* 207 Fix builtin function call hoisting from expressions OK
* 208 Fix bufferpool leak (live var analysis and createvar) OK
* 209 Fix performance sparse M-CV elementwise multiply OK
SYSTEMDS-210 Extended List Operations
* 211 Cbind and Rbind over lists of matrices OK
* 212 Creation of empty lists OK
* 213 Add entries to lists OK
* 214 Removal of entries from lists OK
* 215 List expansion during dynamic recompilation OK
* 216 New rewrite for tsmm/mm over list of folds OK
SYSTEMDS-220 Federated Tensors and Instructions
* 221 Initial infrastructure federated operations OK
* 222 Federated matrix-vector multiplication OK
* 223 Federated unary aggregates OK
* 224 Federated frames OK
* 225 Federated elementwise operations OK
* 226 Federated rbind and cbind OK
* 227 Federated worker setup and infrastructure OK
* 228 Federated matrix-matrix multiplication OK
* 229 Federated transform functionality
SYSTEMDS-230 Lineage Integration
* 231 Use lineage in buffer pool
* 232 Lineage of code generated operators OK
* 233 Lineage cache and reuse of function results OK
* 234 Lineage tracing for spark instructions OK
* 235 Lineage tracing for remote-spark parfor OK
* 236 Extended multi-level lineage cache for statement blocks OK
* 237 Unmark loop dependent operations OK
* 238 Robustness parfor lineage merge (flaky tests) OK
* 239 Make lineage accessible through python bindings OK
SYSTEMDS-240 GPU Backend Improvements
* 241 Dense GPU cumulative aggregates OK
SYSTEMDS-250 Extended Slice Finding
* 251 Alternative slice enumeration approach OK
* 252 Initial data slicing implementation Python OK
* 253 Distributed slicing algorithms (task/data parallel) OK
* 254 Consolidation and fixes distributed slice finding OK
SYSTEMDS-260 Misc Tools
* 261 Stable marriage algorithm OK
* 262 Data augmentation tool for data cleaning OK
* 263 ONNX graph importer (Python API, docs, tests) OK
* 264 ONNX graph exporter
* 265 Entity resolution pipelines and primitives OK
SYSTEMDS-270 Compressed Matrix Blocks
* 271 Reintroduce compressed matrix blocks from SystemML OK
* 272 Simplify and speedup compression tests OK
* 273 Refactor compressed Matrix Block to simplify responsibilities OK
* 273a Redesign allocation of ColGroups in ColGroupFactory
* 274 Make the DDC Compression dictionary share correctly OK
* 275 Include compressionSettings in DMLConfiguration
* 276 Allow Uncompressed Columns to be in sparse formats
* 277 Sampling based estimators fix
* 278 Compression-CoCode algorithm optimization
* 278a Return ColGroups estimated compression ratio to Factory
* 279 Add missing standard lossless compression techniques
* 279a ColGroup FOR (Frame of reference) encoding
* 279b ColGroup DEL (Delta) encoding
* MINOR Reduce memory usage for compression statistics
* MINOR Make ContainsAZero() method in UncompressedBitMap
SYSTEMDS-280 New Rewrites
* 281 New rewrites sum(rmEmpty), sum(table) OK
* 282 New rewrite binary add chain to nary plus OK
SYSTEMDS-290 Misc Compiler Features
* 291 Eval function calls of dml-bodied builtin functions OK
* 292 Eval function calls of native builtin functions (generate fun)
* 293 Compiler rewrites for eval to normal function calls
* 294 DML-bodied builtin functions with frames as main input
SYSTEMDS-300 GitHub Workflow/Actions
* 301 Initial Setup of GitHub Actions for automatic testing OK
* 302 Workflow for Application tests OK
* 303 Workflow for Function tests OK
* 304 Workflow make documentation OK
* 305 Workflow run docker containers for speedup OK
SYSTEMDS-310 Python Bindings
* 311 Initial Python Binding for federated execution OK
* 312 Python 3.6 compatibility OK
* 313 Python Documentation upload via GitHub Actions OK
* 314 Python SystemDS context manager OK
* 314a Change py4j interaction to use py4j for termination instead of stdin
* 315 Python Federated Matrices Tests OK
* 316 Extended Python API (rand, lm, mm) OK
* 317 Python Rerun with different inputs without recompilation
* 318 Python Multiple outputs
* 319 Python Caching of intermediates
SYSTEMDS-320 Merge SystemDS into Apache SystemML OK
* 321 Merge histories of SystemDS and SystemML OK
* 322 Change global package names OK
* 323 Fix licenses and notice file OK
SYSTEMDS-330 Lineage Tracing, Reuse and Integration
* 331 Cache and reuse scalar outputs (instruction and multi-level) OK
* 332 Parfor integration with multi-level reuse OK
* 333 Improve cache eviction with actual compute time OK
* 334 Cache scalars only with at least one matrix input
* 335 Weighted eviction policy (function(size,computetime,LRU time)) OK
* 336 Better use of cache status to handle multithreading
* 337 Adjust disk I/O speed by recording actual time taken OK
* 338 Extended lineage tracing (rmEmpty, lists), partial rewrites OK
* 339 Lineage tracing robustness (indexed updates, algorithms)
SYSTEMDS-340 Compiler Assisted Lineage Caching and Reuse
* 341 Finalize unmarking of loop dependent operations
* 342 Mark functions as last-use to enable early eviction
* 343 Identify equal last level HOPs to ensure SB-level reuse
* 344 Unmark functions/SBs containing non-determinism for caching OK
* 345 Compiler assisted cache configuration
SYSTEMDS-350 Data Cleaning Framework
* 351 New builtin function for error correction by schema OK
* 352 New builtin function for error correction by length OK
SYSTEMDS-360 Privacy/Data Exchange Constraints
* 361 Initial privacy meta data (compiler/runtime) OK
* 362 Runtime privacy propagation
* 363 Compile-time privacy propagation
* 364 Error handling violated privacy constraints OK
* 365 Extended privacy/data exchange constraints OK
SYSTEMDS-370 Lossy Compression Blocks
* 371 ColGroup Quantization OK (Naive Q8)
* 372 ColGroup Base Data change (from Double to ??)
SYSTEMDS-380 Memory Footprint
* 381 Matrix Block Memory footprint update
SYSTEMDS-390 New Builtin Functions IV
* 391 New GLM builtin function (from algorithms) OK
* 392 Builtin function for missing value imputation via FDs OK
* 393 Builtin to find Connected Components of a graph OK
* 394 Builtin for one-hot encoding of matrix (not frame), see table OK
* 395 SVM rework and utils (confusionMatrix, msvmPredict) OK
* 396 Builtin for counting number of distinct values OK
* 397 Algorithm for neural collaborative filtering (NCF) OK
SYSTEMDS-400 Spark Backend Improvements
* 401 Fix output block indexes of rdiag (diagM2V) OK
SYSTEMDS-410 Lineage Tracing, Reuse and Integration II
* 411 Improved handling of multi-level cache duplicates OK
* 412 Robust lineage tracing (non-recursive, parfor) OK
* 413 Cache and reuse MultiReturnBuiltin instructions OK
* 414 New rewrite for PCA --> lmDS pipeline OK
* 415 MLContext lineage support (tracing and reuse correctness) OK
* 416 Lineage deduplication while, nested if, loop sequences OK
* 417 New rewrite for partial reuse in StepLM OK
* 418 Performance lineage tracing and reuse probing small data OK
* 419 Performance and robustness partial rewrites
SYSTEMDS-420 Compiler Improvements
* 421 Fix invalid IPA scalar propagation into functions OK
SYSTEMDS-500 Documentation Webpage Reintroduction
* 501 Make Documentation webpage framework OK
Others:
* Break append instruction to cbind and rbind
SYSTEMDS-510 IO formats
* 511 Add protobuf support to write and read FrameBlocks to HDFS OK
SYSTEMDS-520 Lineage Tracing, Reuse and Integration III
* 521 New lineage option exposing cache policies OK
SYSTEMDS-610 Cleaning Pipelines
* 611 Initial Brute force execution OK