[SYSTEMDS-2834] Python I/O Benchmarking
This commit extends the performance benchmarks to include a Python
benchmark for the transfer of data from the Python API into and out of
SystemDS. Results include:
double: read.dml; 40.781715454
double: load_native.py; 39.19094614699134
int: read.dml; 32.824596657
int: load_native.py; 36.457156577002024
string: read.dml; 34.440663763
string: load_native.py; 38.71029913998791
boolean: read.dml; 33.266684618
boolean: load_native.py; 36.68671202700352
double: load_numpy.py; 32.85507999898982
double: load_pandas.py; 512.6433556610136
float: load_numpy.py; 38.261559439997654
float: load_pandas.py; 546.0650390849914
long: load_numpy.py; 39.400702337006805
long: load_pandas.py; 536.5950958920002
int64: load_numpy.py; 32.98173662999761
int64: load_pandas.py; 487.0634801320266
int32: load_numpy.py; 32.48500068101566
int32: load_pandas.py; 489.97116349000135
uint8: load_numpy.py; 31.86706029099878
uint8: load_pandas.py; 496.9151880980062
string: load_pandas.py; 504.3096235789999
bool: load_numpy.py; 33.19832509398111
bool: load_pandas.py; 479.9256292580103
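
For context, the load_*.py numbers above are produced with Python's
timeit: each script builds a setup snippet and a run snippet as strings
and passes its inputs in through globals. The sketch below shows only
that pattern. The sqrt workload and the benchmark() helper are
stand-ins assumed for illustration and are not part of this commit.

```python
import timeit

# Setup runs once per measurement and is excluded from the reported time.
setup = "\n".join(
    [
        "import math",
        "values = list(range(n))",
    ]
)

# The run snippet is executed `number` times; timeit returns the total
# elapsed seconds over all executions, not the mean.
run = "\n".join(
    [
        "total = 0.0",
        "for v in values:",
        "    total += math.sqrt(v)",
    ]
)


def benchmark(n, number):
    # Hypothetical helper: names in `globals` are visible to both snippets.
    gvars = {"n": n}
    return timeit.timeit(run, setup, globals=gvars, number=number)


if __name__ == "__main__":
    print(benchmark(n=1000, number=10))
```

In the scripts added here the SystemDSContext is opened inside the run
snippet, so the context startup is part of every timed repetition.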
Pandas transfer is underperforming and needs to be refined: in the
results above it is more than ten times slower than the numpy
transfer, which is on par with normal reads.
Both cases indicate potential for improvement, especially
pandas.
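
The gap between pandas and numpy can be made explicit by dividing the
paired timings from the results list. The sketch below is a small
stand-alone helper, not part of this commit; the function names are
assumptions, and the values are copied from the list above (seconds,
as returned by timeit).

```python
# Paired numpy/pandas timings copied from the results in this message.
results = """
double: load_numpy.py; 32.85507999898982
double: load_pandas.py; 512.6433556610136
float: load_numpy.py; 38.261559439997654
float: load_pandas.py; 546.0650390849914
long: load_numpy.py; 39.400702337006805
long: load_pandas.py; 536.5950958920002
int64: load_numpy.py; 32.98173662999761
int64: load_pandas.py; 487.0634801320266
int32: load_numpy.py; 32.48500068101566
int32: load_pandas.py; 489.97116349000135
uint8: load_numpy.py; 31.86706029099878
uint8: load_pandas.py; 496.9151880980062
bool: load_numpy.py; 33.19832509398111
bool: load_pandas.py; 479.9256292580103
"""


def parse(text):
    """Map (value type, script name) to the measured seconds."""
    timings = {}
    for line in text.strip().splitlines():
        vtype, rest = line.split(":", 1)
        script, seconds = rest.split(";", 1)
        timings[(vtype.strip(), script.strip())] = float(seconds)
    return timings


def pandas_slowdown(timings):
    """Return pandas time divided by numpy time for each value type."""
    ratios = {}
    for (vtype, script), seconds in timings.items():
        numpy_key = (vtype, "load_numpy.py")
        if script == "load_pandas.py" and numpy_key in timings:
            ratios[vtype] = seconds / timings[numpy_key]
    return ratios


if __name__ == "__main__":
    for vtype, ratio in pandas_slowdown(parse(results)).items():
        print(f"{vtype}: pandas/numpy = {ratio:.1f}x")
```

On the numbers listed, every ratio falls between 13 and 16.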
Closes #1847
diff --git a/scripts/perftest/README.md b/scripts/perftest/README.md
index 4493939..14ea405 100755
--- a/scripts/perftest/README.md
+++ b/scripts/perftest/README.md
@@ -17,18 +17,35 @@
{% end comment %}
-->
-# Performance tests SystemDS
+# Performance Tests SystemDS
-To run all performance tests for SystemDS, simply download systemds, install the prerequisites and execute.
+To run all performance tests for SystemDS:
+ * install SystemDS,
+ * install the prerequisites,
+ * navigate to the perftest directory: `cd $SYSTEMDS_ROOT/scripts/perftest`,
+ * generate the data,
+ * and execute.
There are a few prerequisites:
+## Install SystemDS
+
- First follow the install guide: <http://apache.github.io/systemds/site/install> and build the project.
+- Install the Python package for Python API benchmarks: <https://apache.github.io/systemds/api/python/getting_started/install.html>
+- Prepare to run SystemDS: <https://apache.github.io/systemds/site/run>
+
+## Install Additional Prerequisites
- Setup Intel MKL: <http://apache.github.io/systemds/site/run>
- Setup OpenBlas: <https://github.com/xianyi/OpenBLAS/wiki/Precompiled-installation-packages>
- Install Perf stat: <https://linoxide.com/linux-how-to/install-perf-tool-centos-ubuntu/>
-## NOTE THE SCRIPT HAS TO BE RUN FROM THE PERFTEST FOLDER
+## Generate Test Data
+
+Using the scripts found in `$SYSTEMDS_ROOT/scripts/perftest/datagen`, generate the data for the tests you want to run. Note that some parameters are optional and others are required, depending on the script. Dataset size is likely the most important of these.
+
+## Run the Benchmarks
+
+**Reminder: The scripts should be run from the perftest folder.**
Examples:
@@ -36,7 +53,7 @@
./runAll.sh
```
-Look inside the runAll script to see how to run individual tests.
+Or look inside the runAll script to see how to run individual tests.
-Time calculations in the bash scripts additionally subtract a number, e.g. ".4".
+Time calculations in the bash scripts may additionally subtract a number, e.g. ".4".
This is done to accommodate for time lost by shell script and JVM startup overheads, to match the actual application runtime of SystemML.
diff --git a/scripts/perftest/datagen/genClusteringData.sh b/scripts/perftest/datagen/genClusteringData.sh
index 9fb1e9d..35c49aa 100755
--- a/scripts/perftest/datagen/genClusteringData.sh
+++ b/scripts/perftest/datagen/genClusteringData.sh
@@ -25,9 +25,9 @@
exit 1;
fi
-CMD=$1
-BASE=$2/clustering
-MAXMEM=$3
+CMD=${1:-systemds}
+BASE=${2:-"temp"}/clustering
+MAXMEM=${3:-80}
FORMAT="binary"
DENSE_SP=0.9
diff --git a/scripts/perftest/datagen/genDimensionReductionData.sh b/scripts/perftest/datagen/genDimensionReductionData.sh
index 1207a0d..2f6cc21 100755
--- a/scripts/perftest/datagen/genDimensionReductionData.sh
+++ b/scripts/perftest/datagen/genDimensionReductionData.sh
@@ -25,9 +25,9 @@
exit 1;
fi
-CMD=$1
-BASE=$2/dimensionreduction
-MAXMEM=$3
+CMD=${1:-systemds}
+BASE=${2:-"temp"}/dimensionreduction
+MAXMEM=${3:-80}
FORMAT="binary"
diff --git a/scripts/perftest/datagen/genIOData.sh b/scripts/perftest/datagen/genIOData.sh
new file mode 100755
index 0000000..46154f8
--- /dev/null
+++ b/scripts/perftest/datagen/genIOData.sh
@@ -0,0 +1,72 @@
+#!/bin/bash
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+if [ "$(basename $PWD)" != "perftest" ];
+then
+ echo "Please execute scripts from directory 'perftest'"
+ exit 1;
+fi
+
+CMD=${1:-systemds}
+DATADIR=${2:-"temp"}/io
+MAXMEM=${3:-1}
+
+FORMAT="csv" # can be csv, mm, text, binary
+
+echo "-- Generating IO data." >> results/times.txt;
+
+
+#generate XS scenarios (1MB)
+if [ $MAXMEM -ge 1 ]; then
+ ${CMD} -f ../utils/generateData.dml --nvargs Path=${DATADIR}/X500_250_dense R=500 C=250 Fmt=$FORMAT &
+fi
+
+#generate XS scenarios (10MB)
+if [ $MAXMEM -ge 10 ]; then
+ ${CMD} -f ../utils/generateData.dml --nvargs Path=${DATADIR}/X5k_250_dense R=5000 C=250 Fmt=$FORMAT &
+fi
+
+#generate XS scenarios (80MB)
+if [ $MAXMEM -ge 80 ]; then
+ ${CMD} -f ../utils/generateData.dml --nvargs Path=${DATADIR}/X10k_1k_dense R=10000 C=1000 Fmt=$FORMAT &
+fi
+
+#generate S scenarios (800MB)
+if [ $MAXMEM -ge 800 ]; then
+ ${CMD} -f ../utils/generateData.dml --nvargs Path=${DATADIR}/X100k_1k_dense R=100000 C=1000 Fmt=$FORMAT &
+fi
+
+#generate M scenarios (8GB)
+if [ $MAXMEM -ge 8000 ]; then
+ ${CMD} -f ../utils/generateData.dml --nvargs Path=${DATADIR}/X1M_1k_dense R=1000000 C=1000 Fmt=$FORMAT &
+fi
+
+#generate L scenarios (80GB)
+if [ $MAXMEM -ge 80000 ]; then
+ ${CMD} -f ../utils/generateData.dml --nvargs Path=${DATADIR}/X10M_1k_dense R=10000000 C=1000 Fmt=$FORMAT &
+fi
+
+#generate XL scenarios (800GB)
+if [ $MAXMEM -ge 800000 ]; then
+ ${CMD} -f ../utils/generateData.dml --nvargs Path=${DATADIR}/X100M_1k_dense R=100000000 C=1000 Fmt=$FORMAT &
+fi
+
+wait
diff --git a/scripts/perftest/python/io/load_native.py b/scripts/perftest/python/io/load_native.py
new file mode 100644
index 0000000..aa1d891
--- /dev/null
+++ b/scripts/perftest/python/io/load_native.py
@@ -0,0 +1,56 @@
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+
+import argparse
+import timeit
+
+
+setup = "\n".join(
+ [
+ "from systemds.context import SystemDSContext",
+ "from systemds.script_building.script import DMLScript",
+ ]
+)
+
+
+run = "\n".join(
+ [
+ "with SystemDSContext(logging_level=10, py4j_logging_level=50) as ctx:",
+ " node = ctx.read(src)",
+ " script = DMLScript(ctx)",
+ " script.build_code(node)",
+ " script.execute()",
+ ]
+)
+
+
+def main(args):
+ gvars = {"src": args.src}
+ print(timeit.timeit(run, setup, globals=gvars, number=args.number))
+
+
+if __name__ == "__main__":
+ description = "Benchmarks time spent loading data into systemds"
+ parser = argparse.ArgumentParser(description=description)
+ parser.add_argument("src")
+ parser.add_argument("number", type=int, help="number of times to load the data")
+ args = parser.parse_args()
+ main(args)
diff --git a/scripts/perftest/python/io/load_numpy.py b/scripts/perftest/python/io/load_numpy.py
new file mode 100644
index 0000000..8bd489b
--- /dev/null
+++ b/scripts/perftest/python/io/load_numpy.py
@@ -0,0 +1,89 @@
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+
+
+import argparse
+import timeit
+
+setup = "\n".join(
+ [
+ "from systemds.context import SystemDSContext",
+ "from systemds.script_building.script import DMLScript",
+ "import numpy as np",
+ "array = np.loadtxt(src, delimiter=',')",
+ "if dtype is not None:",
+ " array = array.astype(dtype)",
+ ]
+)
+
+
+run = "\n".join(
+ [
+ "with SystemDSContext(logging_level=10, py4j_logging_level=50) as ctx:",
+ " matrix_from_np = ctx.from_numpy(array)",
+ " script = DMLScript(ctx)",
+ " script.add_input_from_python('test', matrix_from_np)",
+ " script.execute()",
+ ]
+)
+
+
+dtype_choices = [
+ "double",
+ "float",
+ "long",
+ "int8",
+ "int16",
+ "int32",
+ "int64",
+ "uint8",
+ "uint16",
+ "uint32",
+ "uint64",
+ "float32",
+ "float64",
+ "string",
+ "bool",
+]
+
+
+def main(args):
+ gvars = {"src": args.src, "dtype": args.dtype}
+ print(timeit.timeit(run, setup, globals=gvars, number=args.number))
+
+
+if __name__ == "__main__":
+ description = "Benchmarks time spent loading data into systemds"
+ parser = argparse.ArgumentParser(description=description)
+ parser.add_argument("src")
+ parser.add_argument("number", type=int, help="number of times to load the data")
+ help_force_dtype = (
+ "optionally cast all columns to one of the dtype choices in numpy"
+ )
+ parser.add_argument(
+ "--dtype",
+ choices=dtype_choices,
+ required=False,
+ default=None,
+ help=help_force_dtype,
+ )
+ args = parser.parse_args()
+ main(args)
diff --git a/scripts/perftest/python/io/load_pandas.py b/scripts/perftest/python/io/load_pandas.py
new file mode 100644
index 0000000..30714ca
--- /dev/null
+++ b/scripts/perftest/python/io/load_pandas.py
@@ -0,0 +1,87 @@
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+
+import argparse
+import timeit
+
+setup = "\n".join(
+ [
+ "from systemds.context import SystemDSContext",
+ "from systemds.script_building.script import DMLScript",
+ "import pandas as pd",
+ "df = pd.read_csv(src, header=None)",
+ "if dtype is not None:",
+ " df = df.astype(dtype)",
+ ]
+)
+
+
+run = "\n".join(
+ [
+ "with SystemDSContext(logging_level=10, py4j_logging_level=50) as ctx:",
+ " frame_from_pandas = ctx.from_pandas(df)",
+ " script = DMLScript(ctx)",
+ " script.add_input_from_python('test', frame_from_pandas)",
+ " script.execute()",
+ ]
+)
+
+dtype_choices = [
+ "double",
+ "float",
+ "long",
+ "int8",
+ "int16",
+ "int32",
+ "int64",
+ "uint8",
+ "uint16",
+ "uint32",
+ "uint64",
+ "float32",
+ "float64",
+ "string",
+ "bool",
+]
+
+
+def main(args):
+ gvars = {"src": args.src, "dtype": args.dtype}
+ print(timeit.timeit(run, setup, globals=gvars, number=args.number))
+
+
+if __name__ == "__main__":
+ description = "Benchmarks time spent loading data into systemds"
+ parser = argparse.ArgumentParser(description=description)
+ parser.add_argument("src")
+ parser.add_argument("number", type=int, help="number of times to load the data")
+ help_force_dtype = (
+ "optionally cast all columns to one of the dtype choices in pandas"
+ )
+ parser.add_argument(
+ "--dtype",
+ choices=dtype_choices,
+ required=False,
+ default=None,
+ help=help_force_dtype,
+ )
+ args = parser.parse_args()
+ main(args)
diff --git a/scripts/perftest/runAll.sh b/scripts/perftest/runAll.sh
index db31559..9b20606 100755
--- a/scripts/perftest/runAll.sh
+++ b/scripts/perftest/runAll.sh
@@ -127,6 +127,9 @@
./runAllDimensionReduction.sh ${CMD} ${TEMPFOLDER} ${MAXMEM}
./runAllALS.sh ${CMD} ${TEMPFOLDER} ${MAXMEM}
+### IO Benchmarks:
+./runAllIO.sh ${CMD} ${TEMPFOLDER} ${MAXMEM}
+
# TODO The following benchmarks have yet to be written. The decision tree algorithms additionally need to be fixed.
# add stepwise Linear
# add stepwise GLM
diff --git a/scripts/perftest/runAllDimensionReduction.sh b/scripts/perftest/runAllDimensionReduction.sh
index e154926..03955fc 100755
--- a/scripts/perftest/runAllDimensionReduction.sh
+++ b/scripts/perftest/runAllDimensionReduction.sh
@@ -25,9 +25,9 @@
exit 1;
fi
-COMMAND=$1
-BASE=$2/dimensionreduction
-MAXMEM=$3
+CMD=${1:-systemds}
+BASE=${2:-"temp"}/dimensionreduction
+MAXMEM=${3:-80}
FILENAME=$0
err_report() {
diff --git a/scripts/perftest/runAllIO.sh b/scripts/perftest/runAllIO.sh
new file mode 100755
index 0000000..8e321a7
--- /dev/null
+++ b/scripts/perftest/runAllIO.sh
@@ -0,0 +1,82 @@
+#!/bin/bash
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+if [ "$(basename $PWD)" != "perftest" ];
+then
+ echo "Please execute scripts from directory 'perftest'"
+ exit 1;
+fi
+
+CMD=${1:-"systemds"}
+DATADIR=${2:-"temp"}/io
+MAXMEM=${3:-1}
+REPEATS=${4:-1}
+
+DATA=()
+if [ $MAXMEM -ge 1 ]; then DATA+=("500_250_dense"); fi
+if [ $MAXMEM -ge 10 ]; then DATA+=("5k_250_dense"); fi
+if [ $MAXMEM -ge 80 ]; then DATA+=("10k_1k_dense"); fi
+if [ $MAXMEM -ge 800 ]; then DATA+=("100k_1k_dense"); fi
+if [ $MAXMEM -ge 8000 ]; then DATA+=("1M_1k_dense"); fi
+if [ $MAXMEM -ge 80000 ]; then DATA+=("10M_1k_dense"); fi
+if [ $MAXMEM -ge 800000 ]; then DATA+=("100M_1k_dense"); fi
+
+echo "RUN IO Benchmarks: " $(date) >> results/times.txt;
+
+execute_python_script () {
+ script=$1
+ input=$2
+ repeats=$3
+ DTYPE=$4
+ printf "%-16s " "${script}; " >> results/times.txt;
+ if [ -z "$DTYPE" ]; then
+ TIME_IO=$(python ./python/io/${script} ${input} ${repeats});
+ else
+ TIME_IO=$(python ./python/io/${script} ${input} ${repeats} --dtype ${DTYPE});
+ fi
+ printf "%s\n" "$TIME_IO" >> results/times.txt
+}
+
+for d in ${DATA[@]}
+do
+ echo "-- Running IO benchmarks on "$d >> results/times.txt;
+ DATAFILE="$DATADIR/X$d"
+ F="runIO.sh"
+ for vtype in "double" "int" "string" "boolean"
+ do
+ . ./$F $CMD $DATAFILE $REPEATS $vtype
+ cp "${DATAFILE}.mtd" "${DATAFILE}.mtd.backup"
+ sed -i "s/\"value_type\":.*$/\"value_type\": \"${vtype}\",/" "${DATAFILE}.mtd"
+ printf "%-10s " "${vtype}: " >> results/times.txt;
+ execute_python_script "load_native.py" $DATAFILE $REPEATS
+ rm "${DATAFILE}.mtd"
+ mv "${DATAFILE}.mtd.backup" "${DATAFILE}.mtd"
+ done
+ for vtype in "double" "float" "long" "int64" "int32" "uint8" "string" "bool"
+ do
+ printf "%-10s " "${vtype}: " >> results/times.txt;
+ execute_python_script "load_numpy.py" $DATAFILE $REPEATS $vtype
+ printf "%-10s " "${vtype}: " >> results/times.txt;
+ execute_python_script "load_pandas.py" $DATAFILE $REPEATS $vtype
+ done
+done
+
+echo -e "\n\n" >> results/times.txt
diff --git a/scripts/perftest/runAllMultinomial.sh b/scripts/perftest/runAllMultinomial.sh
index 1078c20..2b878d2 100755
--- a/scripts/perftest/runAllMultinomial.sh
+++ b/scripts/perftest/runAllMultinomial.sh
@@ -31,7 +31,6 @@
if [ "$TEMPFOLDER" == "" ]; then TEMPFOLDER=temp ; fi
BASE=${TEMPFOLDER}/multinomial
-BASE0=${TEMPFOLDER}/binomial
MAXITR=20
FILENAME=$0
diff --git a/scripts/perftest/runIO.sh b/scripts/perftest/runIO.sh
new file mode 100755
index 0000000..15df746
--- /dev/null
+++ b/scripts/perftest/runIO.sh
@@ -0,0 +1,54 @@
+#!/usr/bin/env bash
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+
+if [ "$(basename $PWD)" != "perftest" ];
+then
+ echo "Please execute scripts from directory 'perftest'"
+ exit 1;
+fi
+
+
+CMD=$1
+DATA=$2
+REPEAT=${3:-1}
+VTYPE=${4:-"double"}
+DTYPE=${5:-"matrix"}
+
+cp "${DATA}.mtd" "${DATA}.mtd.backup"
+sed -i "s/\"data_type\":.*$/\"data_type\": \"${DTYPE}\",/" "${DATA}.mtd"
+sed -i "s/\"value_type\":.*$/\"value_type\": \"${VTYPE}\",/" "${DATA}.mtd"
+tstart=$(date +%s.%N)
+printf "%-10s " "$VTYPE: " >> results/times.txt;
+printf "%-16s " "read.dml; " >> results/times.txt;
+for n in $(seq $REPEAT)
+do
+ ${CMD} -f ./scripts/read.dml \
+ --config conf/SystemDS-config.xml \
+ --stats \
+ --nvargs INPUT="$DATA"
+done
+
+duration=$(echo "$(date +%s.%N) - $tstart" | bc)
+printf "%s\n" "$duration" >> results/times.txt
+rm "${DATA}.mtd"
+mv "${DATA}.mtd.backup" "${DATA}.mtd"
+
diff --git a/scripts/perftest/runPCA.sh b/scripts/perftest/runPCA.sh
index 66fd356..fdb56d4 100755
--- a/scripts/perftest/runPCA.sh
+++ b/scripts/perftest/runPCA.sh
@@ -27,7 +27,7 @@
exit 1;
fi
-CMD=$3
+CMD=${3:-systemds}
BASE=$2
tstart=$(date +%s.%N)
diff --git a/scripts/perftest/scripts/read.dml b/scripts/perftest/scripts/read.dml
new file mode 100644
index 0000000..e391926
--- /dev/null
+++ b/scripts/perftest/scripts/read.dml
@@ -0,0 +1,22 @@
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+
+data = read($INPUT);
diff --git a/scripts/utils/generateData.dml b/scripts/utils/generateData.dml
index fd13934..11a6e70 100644
--- a/scripts/utils/generateData.dml
+++ b/scripts/utils/generateData.dml
@@ -45,6 +45,7 @@
maxVal = ifdef($Max, 10)
pdFunc = ifdef($Pdf, "uniform")
pathUse = ifdef($Path, "/user/bigr/randomData")
+format = ifdef($Fmt, "csv")
A = rand(rows=numRows, cols=numCols, sparsity=sparsityParam, min=minVal, max=maxVal, pdf="uniform");
-write(A, pathUse, format="csv");
+write(A, pathUse, format=format);
diff --git a/src/main/python/systemds/utils/converters.py b/src/main/python/systemds/utils/converters.py
index 3d0c8cf..136e347 100644
--- a/src/main/python/systemds/utils/converters.py
+++ b/src/main/python/systemds/utils/converters.py
@@ -135,6 +135,11 @@
def frame_block_to_pandas(sds, fb: JavaObject):
+ """Converts a FrameBlock object in the JVM to a pandas dataframe.
+
+ :param sds: The current systemds context.
+ :param fb: A pointer to the JVM's FrameBlock object.
+ """
num_rows = fb.getNumRows()
num_cols = fb.getNumColumns()