[SYSTEMDS-2834] Python I/O Benchmarking

This commit extends the performance benchmarks with a Python
benchmark for the transfer of data from the Python API into and out of
SystemDS. Results (in seconds) include:

double:    read.dml;        40.781715454
double:    load_native.py;  39.19094614699134
int:       read.dml;        32.824596657
int:       load_native.py;  36.457156577002024
string:    read.dml;        34.440663763
string:    load_native.py;  38.71029913998791
boolean:   read.dml;        33.266684618
boolean:   load_native.py;  36.68671202700352
double:    load_numpy.py;   32.85507999898982
double:    load_pandas.py;  512.6433556610136
float:     load_numpy.py;   38.261559439997654
float:     load_pandas.py;  546.0650390849914
long:      load_numpy.py;   39.400702337006805
long:      load_pandas.py;  536.5950958920002
int64:     load_numpy.py;   32.98173662999761
int64:     load_pandas.py;  487.0634801320266
int32:     load_numpy.py;   32.48500068101566
int32:     load_pandas.py;  489.97116349000135
uint8:     load_numpy.py;   31.86706029099878
uint8:     load_pandas.py;  496.9151880980062
string:    load_pandas.py;  504.3096235789999
bool:      load_numpy.py;   33.19832509398111
bool:      load_pandas.py;  479.9256292580103

Pandas reading and writing underperform, at more than ten times
the runtime of the other paths, and need to be refined, while NumPy
transfer is on par with normal reads. Both leave room for
improvement, especially pandas.
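
The gap can be quantified directly from the numbers above. The sketch
below is only an illustration: it copies the "double" and "string"
timings from this message (the only labels shared between read.dml and
the Python scripts) and prints each Python path relative to read.dml.
The `timings` dict and the `slowdown` helper are hypothetical names
introduced here, not part of the benchmark scripts.

```python
# Timings in seconds, copied from the results listed in this commit message.
timings = {
    "double": {
        "read.dml": 40.781715454,
        "load_numpy.py": 32.85507999898982,
        "load_pandas.py": 512.6433556610136,
    },
    "string": {
        "read.dml": 34.440663763,
        "load_pandas.py": 504.3096235789999,
    },
}


def slowdown(dtype, script, baseline="read.dml"):
    """Return the runtime of `script` divided by the runtime of `baseline`."""
    row = timings[dtype]
    return row[script] / row[baseline]


# Print every Python path relative to the plain DML read of the same type.
for dtype, row in timings.items():
    for script in row:
        if script != "read.dml":
            print(f"{dtype:8s} {script:16s} {slowdown(dtype, script):6.2f}x")
```

A ratio below 1 means the Python path was faster than read.dml in this
run; since the defaults run each case once, small differences should be
read with caution.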

Closes #1847
diff --git a/scripts/perftest/README.md b/scripts/perftest/README.md
index 4493939..14ea405 100755
--- a/scripts/perftest/README.md
+++ b/scripts/perftest/README.md
@@ -17,18 +17,35 @@
 {% end comment %}
 -->
 
-# Performance tests SystemDS
+# Performance Tests SystemDS
 
-To run all performance tests for SystemDS, simply download systemds, install the prerequisites and execute.
+To run all performance tests for SystemDS:
+ * install SystemDS,
+ * install the prerequisites,
+ * navigate to the perftest directory: `cd $SYSTEMDS_ROOT/scripts/perftest`,
+ * generate the data,
+ * and execute.
 
 There are a few prerequisites:
 
+## Install SystemDS
+
 - First follow the install guide: <http://apache.github.io/systemds/site/install> and build the project.
+- Install the Python package for the Python API benchmarks: <https://apache.github.io/systemds/api/python/getting_started/install.html>
+- Prepare to run SystemDS: <https://apache.github.io/systemds/site/run>
+
+## Install Additional Prerequisites
 - Setup Intel MKL: <http://apache.github.io/systemds/site/run>
 - Setup OpenBlas: <https://github.com/xianyi/OpenBLAS/wiki/Precompiled-installation-packages>
 - Install Perf stat: <https://linoxide.com/linux-how-to/install-perf-tool-centos-ubuntu/>
 
-## NOTE THE SCRIPT HAS TO BE RUN FROM THE PERFTEST FOLDER
+## Generate Test Data
+
+Using the scripts found in `$SYSTEMDS_ROOT/scripts/perftest/datagen`, generate the data for the tests you want to run. Depending on the script, some parameters are optional and others are required. Dataset size is likely the most important of these.
+
+## Run the Benchmarks
+
+**Reminder: The scripts should be run from the perftest folder.**
 
 Examples:
 
@@ -36,7 +53,7 @@
 ./runAll.sh
 ```
 
-Look inside the runAll script to see how to run individual tests.
+Or look inside the runAll script to see how to run individual tests.
 
-Time calculations in the bash scripts additionally subtract a number, e.g. ".4".
+Time calculations in the bash scripts may additionally subtract a number, e.g. ".4".
 This is done to accommodate for time lost by shell script and JVM startup overheads, to match the actual application runtime of SystemML.
diff --git a/scripts/perftest/datagen/genClusteringData.sh b/scripts/perftest/datagen/genClusteringData.sh
index 9fb1e9d..35c49aa 100755
--- a/scripts/perftest/datagen/genClusteringData.sh
+++ b/scripts/perftest/datagen/genClusteringData.sh
@@ -25,9 +25,9 @@
   exit 1;
 fi
 
-CMD=$1
-BASE=$2/clustering
-MAXMEM=$3
+CMD=${1:-systemds}
+BASE=${2:-"temp"}/clustering
+MAXMEM=${3:-80}
 
 FORMAT="binary" 
 DENSE_SP=0.9
diff --git a/scripts/perftest/datagen/genDimensionReductionData.sh b/scripts/perftest/datagen/genDimensionReductionData.sh
index 1207a0d..2f6cc21 100755
--- a/scripts/perftest/datagen/genDimensionReductionData.sh
+++ b/scripts/perftest/datagen/genDimensionReductionData.sh
@@ -25,9 +25,9 @@
   exit 1;
 fi
 
-CMD=$1
-BASE=$2/dimensionreduction
-MAXMEM=$3
+CMD=${1:-systemds}
+BASE=${2:-"temp"}/dimensionreduction
+MAXMEM=${3:-80}
 
 FORMAT="binary"
 
diff --git a/scripts/perftest/datagen/genIOData.sh b/scripts/perftest/datagen/genIOData.sh
new file mode 100755
index 0000000..46154f8
--- /dev/null
+++ b/scripts/perftest/datagen/genIOData.sh
@@ -0,0 +1,72 @@
+#!/bin/bash
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+# 
+#   http://www.apache.org/licenses/LICENSE-2.0
+# 
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+if [ "$(basename $PWD)" != "perftest" ];
+then
+  echo "Please execute scripts from directory 'perftest'"
+  exit 1;
+fi
+
+CMD=${1:-systemds}
+DATADIR=${2:-"temp"}/io
+MAXMEM=${3:-1}
+
+FORMAT="csv" # can be csv, mm, text, binary
+
+echo "-- Generating IO data." >> results/times.txt;
+
+
+#generate XS scenarios (1MB)
+if [ $MAXMEM -ge 1 ]; then
+  ${CMD} -f ../utils/generateData.dml --nvargs Path=${DATADIR}/X500_250_dense R=500 C=250 Fmt=$FORMAT &
+fi
+
+#generate XS scenarios (10MB)
+if [ $MAXMEM -ge 10 ]; then
+  ${CMD} -f ../utils/generateData.dml --nvargs Path=${DATADIR}/X5k_250_dense R=5000 C=250 Fmt=$FORMAT &
+fi
+
+#generate XS scenarios (80MB)
+if [ $MAXMEM -ge 80 ]; then
+  ${CMD} -f ../utils/generateData.dml --nvargs Path=${DATADIR}/X10k_1k_dense R=10000 C=1000 Fmt=$FORMAT &
+fi
+
+#generate S scenarios (800MB)
+if [ $MAXMEM -ge 800 ]; then
+  ${CMD} -f ../utils/generateData.dml --nvargs Path=${DATADIR}/X100k_1k_dense R=100000 C=1000 Fmt=$FORMAT &
+fi
+
+#generate M scenarios (8GB)
+if [ $MAXMEM -ge 8000 ]; then
+  ${CMD} -f ../utils/generateData.dml --nvargs Path=${DATADIR}/X1M_1k_dense R=1000000 C=1000 Fmt=$FORMAT &
+fi
+
+#generate L scenarios (80GB)
+if [ $MAXMEM -ge 80000 ]; then
+  ${CMD} -f ../utils/generateData.dml --nvargs Path=${DATADIR}/X10M_1k_dense R=10000000 C=1000 Fmt=$FORMAT &
+fi
+
+#generate XL scenarios (800GB)
+if [ $MAXMEM -ge 800000 ]; then
+  ${CMD} -f ../utils/generateData.dml --nvargs Path=${DATADIR}/X100M_1k_dense R=100000000 C=1000 Fmt=$FORMAT &
+fi
+
+wait
diff --git a/scripts/perftest/python/io/load_native.py b/scripts/perftest/python/io/load_native.py
new file mode 100644
index 0000000..aa1d891
--- /dev/null
+++ b/scripts/perftest/python/io/load_native.py
@@ -0,0 +1,56 @@
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+
+import argparse
+import timeit
+
+
+setup = "\n".join(
+    [
+        "from systemds.context import SystemDSContext",
+        "from systemds.script_building.script import DMLScript",
+    ]
+)
+
+
+run = "\n".join(
+    [
+        "with SystemDSContext(logging_level=10, py4j_logging_level=50) as ctx:",
+        "    node = ctx.read(src)",
+        "    script = DMLScript(ctx)",
+        "    script.build_code(node)",
+        "    script.execute()",
+    ]
+)
+
+
+def main(args):
+    gvars = {"src": args.src}
+    print(timeit.timeit(run, setup, globals=gvars, number=args.number))
+
+
+if __name__ == "__main__":
+    description = "Benchmarks time spent loading data into systemds"
+    parser = argparse.ArgumentParser(description=description)
+    parser.add_argument("src")
+    parser.add_argument("number", type=int, help="number of times to load the data")
+    args = parser.parse_args()
+    main(args)
diff --git a/scripts/perftest/python/io/load_numpy.py b/scripts/perftest/python/io/load_numpy.py
new file mode 100644
index 0000000..8bd489b
--- /dev/null
+++ b/scripts/perftest/python/io/load_numpy.py
@@ -0,0 +1,89 @@
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+
+
+import argparse
+import timeit
+
+setup = "\n".join(
+    [
+        "from systemds.context import SystemDSContext",
+        "from systemds.script_building.script import DMLScript",
+        "import numpy as np",
+        "array = np.loadtxt(src, delimiter=',')",
+        "if dtype is not None:",
+        "    array = array.astype(dtype)",
+    ]
+)
+
+
+run = "\n".join(
+    [
+        "with SystemDSContext(logging_level=10, py4j_logging_level=50) as ctx:",
+        "    matrix_from_np = ctx.from_numpy(array)",
+        "    script = DMLScript(ctx)",
+        "    script.add_input_from_python('test', matrix_from_np)",
+        "    script.execute()",
+    ]
+)
+
+
+dtype_choices = [
+    "double",
+    "float",
+    "long",
+    "int8",
+    "int16",
+    "int32",
+    "int64",
+    "uint8",
+    "uint16",
+    "uint32",
+    "uint64",
+    "float32",
+    "float64",
+    "string",
+    "bool",
+]
+
+
+def main(args):
+    gvars = {"src": args.src, "dtype": args.dtype}
+    print(timeit.timeit(run, setup, globals=gvars, number=args.number))
+
+
+if __name__ == "__main__":
+    description = "Benchmarks time spent loading data into systemds"
+    parser = argparse.ArgumentParser(description=description)
+    parser.add_argument("src")
+    parser.add_argument("number", type=int, help="number of times to load the data")
+    help_force_dtype = (
+        "optionally cast all columns to one of the dtype choices in numpy"
+    )
+    parser.add_argument(
+        "--dtype",
+        choices=dtype_choices,
+        required=False,
+        default=None,
+        help=help_force_dtype,
+    )
+    args = parser.parse_args()
+    main(args)
diff --git a/scripts/perftest/python/io/load_pandas.py b/scripts/perftest/python/io/load_pandas.py
new file mode 100644
index 0000000..30714ca
--- /dev/null
+++ b/scripts/perftest/python/io/load_pandas.py
@@ -0,0 +1,87 @@
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+
+import argparse
+import timeit
+
+setup = "\n".join(
+    [
+        "from systemds.context import SystemDSContext",
+        "from systemds.script_building.script import DMLScript",
+        "import pandas as pd",
+        "df = pd.read_csv(src, header=None)",
+        "if dtype is not None:",
+        "    df = df.astype(dtype)",
+    ]
+)
+
+
+run = "\n".join(
+    [
+        "with SystemDSContext(logging_level=10, py4j_logging_level=50) as ctx:",
+        "    frame_from_pandas = ctx.from_pandas(df)",
+        "    script = DMLScript(ctx)",
+        "    script.add_input_from_python('test', frame_from_pandas)",
+        "    script.execute()",
+    ]
+)
+
+dtype_choices = [
+    "double",
+    "float",
+    "long",
+    "int8",
+    "int16",
+    "int32",
+    "int64",
+    "uint8",
+    "uint16",
+    "uint32",
+    "uint64",
+    "float32",
+    "float64",
+    "string",
+    "bool",
+]
+
+
+def main(args):
+    gvars = {"src": args.src, "dtype": args.dtype}
+    print(timeit.timeit(run, setup, globals=gvars, number=args.number))
+
+
+if __name__ == "__main__":
+    description = "Benchmarks time spent loading data into systemds"
+    parser = argparse.ArgumentParser(description=description)
+    parser.add_argument("src")
+    parser.add_argument("number", type=int, help="number of times to load the data")
+    help_force_dtype = (
+        "optionally cast all columns to one of the dtype choices in pandas"
+    )
+    parser.add_argument(
+        "--dtype",
+        choices=dtype_choices,
+        required=False,
+        default=None,
+        help=help_force_dtype,
+    )
+    args = parser.parse_args()
+    main(args)
diff --git a/scripts/perftest/runAll.sh b/scripts/perftest/runAll.sh
index db31559..9b20606 100755
--- a/scripts/perftest/runAll.sh
+++ b/scripts/perftest/runAll.sh
@@ -127,6 +127,9 @@
 ./runAllDimensionReduction.sh ${CMD} ${TEMPFOLDER} ${MAXMEM}
 ./runAllALS.sh ${CMD} ${TEMPFOLDER} ${MAXMEM}
 
+### IO Benchmarks:
+./runAllIO.sh ${CMD} ${TEMPFOLDER} ${MAXMEM}
+
 # TODO The following benchmarks have yet to be written. The decision tree algorithms additionally need to be fixed.
 # add stepwise Linear 
 # add stepwise GLM
diff --git a/scripts/perftest/runAllDimensionReduction.sh b/scripts/perftest/runAllDimensionReduction.sh
index e154926..03955fc 100755
--- a/scripts/perftest/runAllDimensionReduction.sh
+++ b/scripts/perftest/runAllDimensionReduction.sh
@@ -25,9 +25,9 @@
   exit 1;
 fi
 
-COMMAND=$1
-BASE=$2/dimensionreduction
-MAXMEM=$3
+CMD=${1:-systemds}
+BASE=${2:-"temp"}/dimensionreduction
+MAXMEM=${3:-80}
 
 FILENAME=$0
 err_report() {
diff --git a/scripts/perftest/runAllIO.sh b/scripts/perftest/runAllIO.sh
new file mode 100755
index 0000000..8e321a7
--- /dev/null
+++ b/scripts/perftest/runAllIO.sh
@@ -0,0 +1,82 @@
+#!/bin/bash
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+if [ "$(basename $PWD)" != "perftest" ];
+then
+  echo "Please execute scripts from directory 'perftest'"
+  exit 1;
+fi
+
+CMD=${1:-"systemds"}
+DATADIR=${2:-"temp"}/io
+MAXMEM=${3:-1}
+REPEATS=${4:-1}
+
+DATA=()
+if [ $MAXMEM -ge 1 ]; then DATA+=("500_250_dense"); fi
+if [ $MAXMEM -ge 10 ]; then DATA+=("5k_250_dense"); fi
+if [ $MAXMEM -ge 80 ]; then DATA+=("10k_1k_dense"); fi
+if [ $MAXMEM -ge 800 ]; then DATA+=("100k_1k_dense"); fi
+if [ $MAXMEM -ge 8000 ]; then DATA+=("1M_1k_dense"); fi
+if [ $MAXMEM -ge 80000 ]; then DATA+=("10M_1k_dense"); fi
+if [ $MAXMEM -ge 800000 ]; then DATA+=("100M_1k_dense"); fi
+
+echo "RUN IO Benchmarks: " $(date) >> results/times.txt;
+
+execute_python_script () {
+  script=$1
+  input=$2
+  repeats=$3
+  DTYPE=$4
+  printf "%-16s " "${script}; " >> results/times.txt;
+  if [ -z "$DTYPE" ]; then
+    TIME_IO=$(python ./python/io/${script} ${input} ${repeats});
+  else
+    TIME_IO=$(python ./python/io/${script} ${input} ${repeats} --dtype ${DTYPE});
+  fi
+  printf "%s\n" "$TIME_IO" >> results/times.txt
+}
+
+for d in ${DATA[@]}
+do
+  echo "-- Running IO benchmarks on "$d >> results/times.txt;
+  DATAFILE="$DATADIR/X$d"
+  F="runIO.sh" 
+  for vtype in "double" "int" "string" "boolean"
+  do
+    . ./$F $CMD $DATAFILE $REPEATS $vtype
+    cp "${DATAFILE}.mtd" "${DATAFILE}.mtd.backup" 
+    sed -i "s/\"value_type\":.*$/\"value_type\": \"${vtype}\",/" "${DATAFILE}.mtd"
+    printf "%-10s " "${vtype}: " >> results/times.txt;
+    execute_python_script "load_native.py" $DATAFILE $REPEATS
+    rm "${DATAFILE}.mtd"
+    mv "${DATAFILE}.mtd.backup" "${DATAFILE}.mtd"
+  done
+  for vtype in "double" "float" "long" "int64" "int32" "uint8" "string" "bool"
+  do
+    printf "%-10s " "${vtype}: " >> results/times.txt;
+    execute_python_script "load_numpy.py" $DATAFILE $REPEATS $vtype
+    printf "%-10s " "${vtype}: " >> results/times.txt;
+    execute_python_script "load_pandas.py" $DATAFILE $REPEATS $vtype
+  done
+done
+
+echo -e "\n\n" >> results/times.txt
diff --git a/scripts/perftest/runAllMultinomial.sh b/scripts/perftest/runAllMultinomial.sh
index 1078c20..2b878d2 100755
--- a/scripts/perftest/runAllMultinomial.sh
+++ b/scripts/perftest/runAllMultinomial.sh
@@ -31,7 +31,6 @@
 
 if [ "$TEMPFOLDER" == "" ]; then TEMPFOLDER=temp ; fi
 BASE=${TEMPFOLDER}/multinomial
-BASE0=${TEMPFOLDER}/binomial
 MAXITR=20
 
 FILENAME=$0
diff --git a/scripts/perftest/runIO.sh b/scripts/perftest/runIO.sh
new file mode 100755
index 0000000..15df746
--- /dev/null
+++ b/scripts/perftest/runIO.sh
@@ -0,0 +1,54 @@
+#!/usr/bin/env bash
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+
+if [ "$(basename $PWD)" != "perftest" ];
+then
+  echo "Please execute scripts from directory 'perftest'"
+  exit 1;
+fi
+
+
+CMD=$1
+DATA=$2
+REPEAT=${3:-1}
+VTYPE=${4:-"double"}
+DTYPE=${5:-"matrix"}
+
+cp "${DATA}.mtd" "${DATA}.mtd.backup"
+sed -i "s/\"data_type\":.*$/\"data_type\": \"${DTYPE}\",/" "${DATA}.mtd"
+sed -i "s/\"value_type\":.*$/\"value_type\": \"${VTYPE}\",/" "${DATA}.mtd"
+tstart=$(date +%s.%N)
+printf "%-10s " "$VTYPE: " >> results/times.txt;
+printf "%-16s " "read.dml; " >> results/times.txt;
+for n in $(seq $REPEAT)
+do
+  ${CMD} -f ./scripts/read.dml \
+    --config conf/SystemDS-config.xml \
+    --stats \
+    --nvargs INPUT="$DATA"
+done
+
+duration=$(echo "$(date +%s.%N) - $tstart" | bc)
+printf "%s\n" "$duration" >> results/times.txt
+rm "${DATA}.mtd"
+mv "${DATA}.mtd.backup" "${DATA}.mtd"
+
diff --git a/scripts/perftest/runPCA.sh b/scripts/perftest/runPCA.sh
index 66fd356..fdb56d4 100755
--- a/scripts/perftest/runPCA.sh
+++ b/scripts/perftest/runPCA.sh
@@ -27,7 +27,7 @@
   exit 1;
 fi
 
-CMD=$3
+CMD=${3:-systemds}
 BASE=$2
 
 tstart=$(date +%s.%N)
diff --git a/scripts/perftest/scripts/read.dml b/scripts/perftest/scripts/read.dml
new file mode 100644
index 0000000..e391926
--- /dev/null
+++ b/scripts/perftest/scripts/read.dml
@@ -0,0 +1,22 @@
+#-------------------------------------------------------------
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+#-------------------------------------------------------------
+
+data = read($INPUT);
diff --git a/scripts/utils/generateData.dml b/scripts/utils/generateData.dml
index fd13934..11a6e70 100644
--- a/scripts/utils/generateData.dml
+++ b/scripts/utils/generateData.dml
@@ -45,6 +45,7 @@
 maxVal = ifdef($Max, 10)
 pdFunc = ifdef($Pdf, "uniform")
 pathUse = ifdef($Path, "/user/bigr/randomData")
+format = ifdef($Fmt, "csv")
 
 A = rand(rows=numRows, cols=numCols, sparsity=sparsityParam, min=minVal, max=maxVal, pdf="uniform");
-write(A, pathUse, format="csv");
+write(A, pathUse, format=format);
diff --git a/src/main/python/systemds/utils/converters.py b/src/main/python/systemds/utils/converters.py
index 3d0c8cf..136e347 100644
--- a/src/main/python/systemds/utils/converters.py
+++ b/src/main/python/systemds/utils/converters.py
@@ -135,6 +135,11 @@
 
 
 def frame_block_to_pandas(sds, fb: JavaObject):
+    """Converts a FrameBlock object in the JVM to a pandas dataframe.
+
+    :param sds: The current systemds context.
+    :param fb: A pointer to the JVM's FrameBlock object.
+    """
 
     num_rows = fb.getNumRows()
     num_cols = fb.getNumColumns()