---
GENERAL NOTES:
* Assign IDs for epics by default with a key range of 10
* Keep epic tasks in the key range; otherwise create a new epic
---
SYSTEMDS-10 Compiler Rework / Misc
* 11 Support DML-bodied builtin functions OK
* 12 Remove unnecessary HOP/LOP indirections OK
* 13 Refactoring test cases into component/integration OK
* 14 Complete removal of external functions from all scripts
* 15 Travis integration w/ subset of tests OK (removed for GitHub Actions)
* 16 Remove instruction patching
* 17 Refactoring of program block hierarchy OK
* 18 Improve API for new dml-bodied builtin functions OK
* 19 Cleanup pass cpvar+rmvar to mvvar instructions OK
SYSTEMDS-20 New Data Model
* 21 Finalize dense tensor blocks OK
* 22 Sparse double/float tensor blocks
* 23 Sparse int/bool tensor blocks
* 24 Initial tensor dml integration OK
* 25 Initial data tensor implementation
* 26 Non-zero default value for sparse (row/col)
* 27 Tensor buffer pool integration
* 28 Tensor local readers and writers (textcell, binary) OK
* 29 Tensor local readers and writers (linearize csv)
SYSTEMDS-30 Builtin and Packaging
* 31 Shell script for local runs OK
* 32 Shell script for spark runs OK
* 33 Cleanup hadoop dependency for local runs
* 34 Wrapper blocks for sequence files
* 35 Replace unnecessary dependencies w/ custom
* 36 Shell script for AWS execution OK
* 37 Cleanup mvn package, compile, test OK
* 38 Runscript rename, improve, fix cygwin OK
SYSTEMDS-40 Preprocessing builtins
* 41 Add new winsorize builtin function OK
* 42 Add new outlier builtin function OK
* 43 Add new scale builtin function OK
* 44 SotA normalization primitives
* 45 SotA outlier detection primitives
* 46 Generalization of quantiles
* 47 Add new normalize builtin function OK
* 48 Extended transform encode/apply: feature hashing OK
SYSTEMDS-50 I/O Formats
* 51 Support for homogeneous JSON (local/distributed)
* 52 Support for libsvm files (local/distributed)
* 53 New sql data source (local, distributed)
* 54 Support for is.na, is.nan, is.infinite OK
* 55 Cleanup file formats, binary cell, input/output infos OK
* 56 Consolidate single/multi-threaded readers
* 57 Support for hdf5 files (local/distributed)
SYSTEMDS-60 Update SystemML improvements
* 61 Take over cumulative aggregate improvements OK
* 62 Take over sparsity estimation improvements OK
* 63 Take over fixes for transform w/ binning
SYSTEMDS-70 Lineage Tracing and Reuse OK
* 71 Basic collection of lineage traces OK
* 72 Serialization and deserialization of lineage traces OK
* 73 Deduplication of lineage traces (loops, nested loops, branchless loops) OK
* 74 Performance features lineage tracing OK
* 75 Reuse cache based on lineage traces OK
* 76 Generate runtime plan from lineage trace OK
* 77 New builtin function for obtaining lineage OK
* 78 Extended lineage tracing (parfor, funs, codegen) OK
* 79 Reuse cache configuration options (cmd line options) OK
SYSTEMDS-80 Improved distributed operations
* 81 Avoid unnecessary replication on rmm
* 82 Avoid unnecessary replication on binary
SYSTEMDS-100 Various Fixes
* 101 Fix spark quantiles w/ multiple queries OK
* 102 Fix parser issue after refactoring OK
* 103 Fix handling of builtin functions w/ matching udfs OK
* 104 Fix failing tests due to incorrect parfor parameters OK
* 105 Fix all application/function tests (various issues) OK
* 106 Fix correctness of as.integer for negative numbers OK
* 107 Fix correctness IPA check dimension-preserving OK
* 108 Fix codegen optimizer (early costing abort) OK
* 109 Fix update-in-place w/ udf function calls OK
SYSTEMDS-110 New Builtin Functions
* 111 Time builtin function for script-level measurements OK
* 112 Image data augmentation builtin functions OK
* 113 Builtin functions for linear regression algorithms OK
* 114 Builtin function for stepwise regression OK
* 115 Builtin function for model debugging (slice finder) OK
* 116 Builtin function for kmeans OK
* 117 Builtin function for lm cross validation OK
* 118 Builtin function for hyperparameter grid search
* 119 Builtin functions for l2svm and msvm OK
SYSTEMDS-120 Performance Features
* 121 Avoid spark context creation on parfor result merge OK
* 122 Reduce thread contention on parfor left indexing OK
* 123 Avoid unnecessary spark context creation on explain OK
* 124 Improved cbind performance (small rhs) OK
* 125 Parallel Sort (previously 191) OK
SYSTEMDS-130 IPA and Size Propagation
* 131 New IPA pass for function call forwarding OK
* 132 Size propagation via data characteristics
* 133 Hop Size Propagation for Tensors
* 134 Increased repetition and fixpoint detection OK
* 135 Fix IPA scalar propagation w/ named arguments OK
SYSTEMDS-140 Distributed Tensor Operations
* 141 Infrastructure for spark tensor operations
* 142 Basic tensor operations (datagen, aggregate, elementwise)
SYSTEMDS-150 Releases
* 151 Fixing native BLAS instructions (MKL,OpenBLAS) OK
* 152 Run script documentation OK
* 153 Convert all occurrences of SystemML/sysml to SystemDS/sysds OK
* 154 Fix license headers in all touched files OK
* 155 Import release scripts from SystemML repo OK
* 156 Initial codestyle eclipse template OK
* 157 Shell script simplification, cleanups and fixes for 0.2 OK
SYSTEMDS-160 Tensor Compiler/Runtime
* 161 Local readers and writers for tensors (all formats)
* 162 Spark binary tensor instructions
SYSTEMDS-170 Lineage full and partial reuse
* 171 Initial version of partial rewrites OK
* 172 Parfor integration (blocked waiting for results) OK
* 173 Improved cost estimates OK
* 174 Reuse rewrite for rbind/cbind-tsmm/ba+* OK
* 175 Refactoring of lineage rewrite code OK
* 176 Reuse rewrite for cbind/rbind-elementwise */+
* 177 Reuse rewrite for aggregate OK
* 178 Compiler-assisted reuse (e.g., CV, lmCG)
* 179 Lineage handling for lists OK
SYSTEMDS-180 New Builtin Functions II OK
* 181 Builtin function for naive bayes OK
* 182 Builtin function for typeof (frame schema detection) OK
* 183 Builtin function detectSchema OK
* 184 Builtin function GNMF OK
* 185 Builtin function PNMF OK
* 186 Builtin function ALS-DS OK
* 187 Builtin function ALS-CG OK
* 188 Builtin function ALS OK
* 189 Builtin function mice (chained equation imputation) OK
SYSTEMDS-190 New Builtin Functions III
* 191 Builtin function multi logreg OK
* 192 Extended cmdline interface -b for calling builtin functions
* 193 New outlierBySds builtin function (delete row/cell) OK
* 194 New outlierByIQR builtin function (delete row/cell) OK
* 195 Builtin function mice for nominal features OK
* 196 Builtin function intersect (set intersection) OK
* 197 Builtin function for functional dependency discovery OK
* 198 Extended slice finding (classification) OK
* 199 Builtin function Multinomial Logistic Regression Predict OK
SYSTEMDS-200 Various Fixes
* 201 Fix spark append instruction zero columns OK
* 202 Fix rewrite split DAG multiple removeEmpty (pending op) OK
* 203 Fix codegen outer compilation (MV operations) OK
* 204 Fix rewrite simplify sequences of binary comparisons OK
* 205 Fix scoping of builtin dml-bodied functions (vs user-defined)
* 206 Fix codegen outer template compilation (tsmm) OK
* 207 Fix builtin function call hoisting from expressions OK
* 208 Fix bufferpool leak (live var analysis and createvar) OK
* 209 Fix performance sparse M-CV elementwise multiply OK
SYSTEMDS-210 Extended List Operations
* 211 Cbind and Rbind over lists of matrices OK
* 212 Creation of empty lists OK
* 213 Add entries to lists OK
* 214 Removal of entries from lists OK
* 215 List expansion during dynamic recompilation OK
* 216 New rewrite for tsmm/mm over list of folds OK
SYSTEMDS-220 Federated Tensors and Instructions
* 221 Initial infrastructure federated operations OK
* 222 Federated matrix-vector multiplication OK
* 223 Federated unary aggregates OK
* 224 Federated frames OK
* 225 Federated elementwise operations OK
* 226 Federated rbind and cbind OK
* 227 Federated worker setup and infrastructure OK
* 228 Federated matrix-matrix multiplication OK
* 229 Federated transform functionality
SYSTEMDS-230 Lineage Integration
* 231 Use lineage in buffer pool
* 232 Lineage of code generated operators OK
* 233 Lineage cache and reuse of function results OK
* 234 Lineage tracing for spark instructions OK
* 235 Lineage tracing for remote-spark parfor OK
* 236 Extended multi-level lineage cache for statement blocks OK
* 237 Unmark loop dependent operations OK
* 238 Robustness parfor lineage merge (flaky tests) OK
* 239 Make lineage accessible through python bindings OK
SYSTEMDS-240 GPU Backend Improvements
* 241 Dense GPU cumulative aggregates OK
SYSTEMDS-250 Extended Slice Finding
* 251 Alternative slice enumeration approach OK
* 252 Initial data slicing implementation Python OK
* 253 Distributed slicing algorithms (task/data parallel) OK
* 254 Consolidation and fixes distributed slice finding OK
SYSTEMDS-260 Misc Tools
* 261 Stable marriage algorithm OK
* 262 Data augmentation tool for data cleaning OK
* 263 ONNX graph importer (Python API, docs, tests) OK
* 264 ONNX graph exporter
SYSTEMDS-270 Compressed Matrix Blocks
* 271 Reintroduce compressed matrix blocks from SystemML OK
* 272 Simplify and speedup compression tests OK
* 273 Refactor compressed Matrix Block to simplify responsibilities OK
* 273a Redesign allocation of ColGroups in ColGroupFactory
* 274 Make the DDC Compression dictionary share correctly OK
* 275 Include compressionSettings in DMLConfiguration
* 276 Allow Uncompressed Columns to be in sparse formats
* 277 Sampling based estimators fix
* 278 Compression-CoCode algorithm optimization
* 278a Return ColGroups estimated compression ratio to Factory
* 279 Add missing standard lossless compression techniques
* 279a ColGroup FOR (Frame of reference) encoding
* 279b ColGroup DEL (Delta) encoding
* MINOR Reduce memory usage for compression statistics.
* MINOR Make ContainsAZero() method in UncompressedBitMap
SYSTEMDS-280 New Rewrites
* 281 New rewrites sum(rmEmpty), sum(table) OK
* 282 New rewrite binary add chain to nary plus OK
SYSTEMDS-290 Misc Compiler Features
* 291 Eval function calls of dml-bodied builtin functions OK
* 292 Eval function calls of native builtin functions (generate fun)
* 293 Compiler rewrites for eval to normal function calls
* 294 DML-bodied builtin functions with frames as main input
SYSTEMDS-300 GitHub Workflow/Actions
* 301 Initial Setup of GitHub Actions for automatic testing OK
* 302 Workflow for Application tests OK
* 303 Workflow for Function tests OK
* 304 Workflow make documentation OK
* 305 Workflow run docker containers for speedup OK
SYSTEMDS-310 Python Bindings
* 311 Initial Python Binding for federated execution OK
* 312 Python 3.6 compatibility OK
* 313 Python Documentation upload via GitHub Actions OK
* 314 Python SystemDS context manager OK
* 314a Change termination to use py4j instead of stdin
* 315 Python Federated Matrices Tests OK
* 316 Extended Python API (rand, lm, mm) OK
* 317 Python Rerun with different inputs without recompilation
* 318 Python Multiple outputs
* 319 Python Caching of intermediates
SYSTEMDS-320 Merge SystemDS into Apache SystemML OK
* 321 Merge histories of SystemDS and SystemML OK
* 322 Change global package names OK
* 323 Fix licenses and notice file OK
SYSTEMDS-330 Lineage Tracing, Reuse and Integration
* 331 Cache and reuse scalar outputs (instruction and multi-level) OK
* 332 Parfor integration with multi-level reuse OK
* 333 Improve cache eviction with actual compute time OK
* 334 Cache scalars only with at least one matrix input
* 335 Weighted eviction policy (function(size,computetime,LRU time)) OK
* 336 Better use of cache status to handle multithreading
* 337 Adjust disk I/O speed by recording actual time taken OK
* 338 Extended lineage tracing (rmEmpty, lists), partial rewrites OK
* 339 Lineage tracing robustness (indexed updates, algorithms)
SYSTEMDS-340 Compiler Assisted Lineage Caching and Reuse
* 341 Finalize unmarking of loop dependent operations
* 342 Mark functions as last-use to enable early eviction
* 343 Identify equal last level HOPs to ensure SB-level reuse
* 344 Unmark functions/SBs containing non-determinism for caching OK
* 345 Compiler assisted cache configuration
SYSTEMDS-350 Data Cleaning Framework
* 351 New builtin function for error correction by schema OK
SYSTEMDS-360 Privacy/Data Exchange Constraints
* 361 Initial privacy meta data (compiler/runtime) OK
* 362 Runtime privacy propagation
* 363 Compile-time privacy propagation
* 364 Error handling for violated privacy constraints
* 365 Extended privacy/data exchange constraints
SYSTEMDS-370 Lossy Compression Blocks
* 371 ColGroup Quantization
* 372 ColGroup Base Data change (from Double to ??)
SYSTEMDS-380 Memory Footprint
* 381 Matrix Block Memory footprint update
SYSTEMDS-390 New Builtin Functions IV
* 391 New GLM builtin function (from algorithms) OK
* 392 Builtin function for missing value imputation via FDs OK
* 393 Builtin to find Connected Components of a graph OK
* 394 Builtin for one-hot encoding of matrix (not frame), see table OK
* 395 SVM rework and utils (confusionMatrix, msvmPredict) OK
* 396 Builtin for counting number of distinct values OK
SYSTEMDS-400 Spark Backend Improvements
* 401 Fix output block indexes of rdiag (diagM2V) OK
SYSTEMDS-410 Lineage Tracing, Reuse and Integration II
* 411 Improved handling of multi-level cache duplicates OK
Others:
* Break append instruction to cbind and rbind