[VTA] Performance optimization: remove unnecessary contiguous memory use. (#4246)

* [VTA] Performance optimization: remove unnecessary contiguous memory use.

Issue:
UopQueue maintains a cache_ vector that is used to copy uop data into
contiguous DRAM memory for FPGA/Simulator use, but this cache vector is not
cleared after the FPGA/Simulator core runs. In the ResNet18 case, if we print
the cache size in the UopQueue::ReadBarrier function, we can see the cache
size keep increasing. This causes useless data copies and unnecessary
contiguous DRAM memory allocation.

Analysis:
This issue is caused by the cache_ vector not being cleared when
uop_queue_.Reset() is called.

Solution:
Override the BaseQueue Reset function in UopQueue and add logic to clear
cache_.

* address review comments, remove spacing.
diff --git a/src/runtime.cc b/src/runtime.cc
index cbb819b..79fc0e2 100644
--- a/src/runtime.cc
+++ b/src/runtime.cc
@@ -348,7 +348,7 @@
    * \brief Reset the pointer of the buffer.
    *  Set SRAM pointer to be the current end.
    */
-  void Reset() {
+  virtual void Reset() {
     dram_buffer_.clear();
     sram_begin_ = sram_end_;
   }
@@ -443,6 +443,12 @@
       sram_begin_ = sram_end_;
     }
   }
+  /*! \brief clear cache and reset base queue buffer.*/
+  void Reset() {
+    cache_.clear();
+    cache_idx_ = 0;
+    BaseQueue<VTAUop>::Reset();
+  }
   void AutoReadBarrier() {
     ReadBarrier();
   }