[VTA] Performance optimization: remove unnecessary contiguous memory use. (#4246)

* [VTA] Performance optimization: remove unnecessary contiguous memory use.

Issue:
UopQueue maintains a cache vector used to copy uop data into contiguous DRAM memory
for FPGA/simulator use, but this cache vector is never cleared after the
FPGA/simulator core runs. In the Resnet18 case, printing the cache size in the
UopQueue::ReadBarrier function shows that it keeps growing, which causes
useless data copies and unnecessary contiguous DRAM memory allocation.

Analysis:
The issue is caused by the cache_ vector not being cleared when
uop_queue_.Reset() is called.

Solution:
Override the BaseQueue Reset function in UopQueue and add logic to clear
cache_.
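
A minimal sketch of the fix described above, using simplified stand-in types; BaseQueue, UopQueue, cache_, and dram_buffer_ here are placeholders for the actual VTA runtime classes in src/runtime.cc, not their real definitions:

```cpp
#include <vector>

// Simplified stand-in for the base queue, which owns the contiguous DRAM buffer.
class BaseQueue {
 public:
  virtual ~BaseQueue() = default;
  // Resets the DRAM buffer bookkeeping shared by all queues.
  virtual void Reset() { dram_buffer_.clear(); }

 protected:
  std::vector<char> dram_buffer_;
};

// Simplified stand-in for the uop queue, which caches uop kernels before
// copying them into contiguous DRAM memory for the FPGA/simulator.
class UopQueue : public BaseQueue {
 public:
  // Override Reset so the uop cache is dropped together with the DRAM buffer;
  // without the cache_.clear() call the vector keeps growing across core runs,
  // as observed in the Resnet18 case.
  void Reset() override {
    cache_.clear();
    BaseQueue::Reset();
  }

 private:
  std::vector<void*> cache_;
};
```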

* Address review comments; remove spacing.

VTA: Open, Modular, Deep Learning Accelerator Stack

VTA (Versatile Tensor Accelerator) is an open-source deep learning accelerator complemented with an end-to-end TVM-based compiler stack.

The key features of VTA include:

  • Generic, modular, open-source hardware
    • Streamlined workflow to deploy to FPGAs.
    • Simulator support to prototype compilation passes on regular workstations.
  • Driver and JIT runtime for both simulator and FPGA hardware back-end.
  • End-to-end TVM stack integration
    • Direct optimization and deployment of models from deep learning frameworks via TVM.
    • Customized and extensible TVM compiler back-end.
    • Flexible RPC support to ease deployment, and program FPGAs with the convenience of Python.

Learn more about VTA here.