Update ci bulk ingest docs (#98)
diff --git a/docs/bulk-test.md b/docs/bulk-test.md
index df5dc45..da8f163 100644
--- a/docs/bulk-test.md
+++ b/docs/bulk-test.md
@@ -8,6 +8,12 @@
# create the ci table if necessary
./bin/cingest createtable
+# Optionally, lower the split threshold to make splits happen more frequently
+# while the test runs. Choose a threshold based on the amount of data being
+# imported and the desired number of splits.
+#
+# accumulo shell -u root -p secret -e 'config -t ci -s table.split.threshold=32M'
+
for i in $(seq 1 10); do
# run map reduce job to generate data for bulk import
./bin/cingest bulk /tmp/bt/$i
@@ -47,3 +53,13 @@
scan -t accumulo.metadata -c loaded
```
+The sum of the referenced and unreferenced counts output by `cingest verify` should equal:
+
+```
+test.ci.bulk.map.task * test.ci.bulk.map.nodes * num_bulk_generate_jobs
+```
+
+It's possible the counts could be slightly smaller because of collisions. However, collisions
+are unlikely with the default settings given that there are 63 bits of randomness in the row and
+30 bits in the column, for a total of 93 bits of randomness per key.
+
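As a rough sanity check, the expected count and the collision probability can be estimated with a short script. This is a sketch with hypothetical parameter values (the numbers below are examples, not the test's defaults), using a standard birthday-bound approximation rather than anything specific to `cingest`:

```python
# Hypothetical settings -- substitute the values used in your test run.
map_tasks = 10            # test.ci.bulk.map.task
nodes_per_task = 10**6    # test.ci.bulk.map.nodes
num_jobs = 10             # iterations of the bulk generate loop

# Expected count reported by `cingest verify` if no collisions occur.
n = map_tasks * nodes_per_task * num_jobs

# Birthday-bound upper estimate: P(any collision) <= n*(n-1) / 2^(93+1),
# using the 93 bits of randomness per key (63 row + 30 column).
p_collision = n * (n - 1) / 2**94

print(n)            # 100000000
print(p_collision)  # on the order of 1e-13 for these values
```

Even at a hundred million keys, the estimated collision probability is vanishingly small, which is why the verified counts normally match the formula exactly.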