commit | bc133b978d6dab91fafdb777738b601acdb897db | |
---|---|---|
author | Hua Jiang <huaj@xilinx.com> | Fri Nov 01 20:29:54 2019 -0700 |
committer | Thierry Moreau <moreau@uw.edu> | Fri Nov 01 20:29:54 2019 -0700 |
tree | fc2faf29a0894043f87098d2f3ad33220f2159a7 | |
parent | 0b530715eed08579c8c433ad994991c457fd41f9 | |
[VTA] Performance optimization: remove unnecessary contiguous memory use. (#4246)

* [VTA] Performance optimization: remove unnecessary contiguous memory use.

Issue: UopQueue maintains a cache vector that is used to copy uop data into contiguous DRAM memory for the FPGA/simulator. This cache vector is not cleared after the FPGA/simulator core runs. In the ResNet-18 case, printing the cache size in the UopQueue::ReadBarrier function shows that it keeps increasing, which causes useless data copies and unnecessary contiguous DRAM allocations.

Analysis: The issue is caused by not clearing the cache_ vector in uop_queue_.Reset().

Solution: Override the BaseQueue Reset function in UopQueue and add logic that clears cache_.

* Address review comments, remove spacing.
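The fix described above can be illustrated with a minimal C++ sketch. It is not the actual VTA runtime code. The classes `Uop`, `UopKernel`, `BaseQueue` and `UopQueue` here are simplified, hypothetical stand-ins. A plain `std::vector` plays the role of the contiguous DRAM buffer, and `Reset` is assumed to be virtual. The sketch shows the pattern: the derived queue overrides `Reset`, chains to the base reset, and also clears `cache_`. Without that, each run would copy every previously cached kernel again.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for an encoded micro-op.
struct Uop {
  uint32_t bits;
};

// Hypothetical micro-op kernel: a sequence of uops staged per run.
struct UopKernel {
  std::vector<Uop> uops;
};

// Hypothetical base queue: tracks how many elements were staged since the
// last reset. The real VTA BaseQueue also manages DRAM/SRAM indices.
class BaseQueue {
 public:
  virtual ~BaseQueue() = default;
  // Called after each accelerator run to start a fresh batch.
  virtual void Reset() { staged_ = 0; }
  std::size_t staged() const { return staged_; }

 protected:
  std::size_t staged_ = 0;
};

// Hypothetical uop queue: caches kernels, then copies them into one
// contiguous buffer (standing in for contiguous DRAM) at ReadBarrier time.
class UopQueue : public BaseQueue {
 public:
  // Record a kernel for the current run.
  void Push(const UopKernel* kernel) {
    cache_.push_back(kernel);
    staged_ += kernel->uops.size();
  }

  // Copy every cached kernel into the contiguous buffer.
  void ReadBarrier() {
    buffer_.clear();
    for (const UopKernel* k : cache_) {
      buffer_.insert(buffer_.end(), k->uops.begin(), k->uops.end());
    }
  }

  // The fix: reset the base state and also drop the cached kernels,
  // so the next run neither recopies stale uops nor grows the buffer.
  void Reset() override {
    BaseQueue::Reset();
    cache_.clear();
  }

  std::size_t cache_size() const { return cache_.size(); }
  std::size_t buffer_size() const { return buffer_.size(); }

 private:
  std::vector<const UopKernel*> cache_;
  std::vector<Uop> buffer_;
};
```

If the same three-uop kernel is pushed once per run, this version keeps `cache_size()` at 1 and `buffer_size()` at 3 after every `ReadBarrier`. If the `cache_.clear()` line is dropped, both values grow with each run, mirroring the growth reported in the commit message.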
VTA (versatile tensor accelerator) is an open-source deep learning accelerator complemented with an end-to-end TVM-based compiler stack.
The key features of VTA include:
Learn more about VTA here.