commit	36a91576edf633479c78649e050f18dd2ddc8103
author	Anton Sorokin <anton.a.sorokin@intel.com>	Wed Sep 08 18:10:14 2021 -0700
committer	GitHub <noreply@github.com>	Wed Sep 08 18:10:14 2021 -0700
tree	055e1bc8ab88ca73ce415e693d0507c285898d11
parent	5b5375729dab081bff1f4afb929a3d94784200b5

VTA Chisel Wide memory interface. (#32)

* VTA Chisel Wide memory interface.
* Added SyncQueue with tests - Implementation uses sync memory to implement larger queues.
* AXI 64/128/256/512 data bits support, selected by AXIParams->dataBits.
A wide implementation of load/store is used when the AXI interface data width
is larger than the number of bits in a tensor.
Instructions are stored as 64-bit tensors to allow 64-bit address alignment.
* TensorLoad is modified to replace all VME load operations.
Multiple simultaneous requests can be generated. Load is pipelined
and separated from request generation.
* TensorStore -> TensorStoreNarrowVME and TensorStoreWideVME. The narrow one is the original implementation.
* TensorLoad -> TensorLoadSimple (original), TensorLoadWideVME and TensorLoadNarrowVME.
* LoadUop -> LoadUopSimple is the original one. The new one is based on TensorLoad
* Fetch -> FetchVME64 FetchWideVME. Reuse communication part from TensorLoad.
* DPI interface changed to transfer more than 64 bits. svOpenArrayHandle is used. tsim library compilation now requires verilator.
* Compute is changed to use TensorLoad style of load uop.
* VME changed to generate/queue/respond to multiple simultaneous load requests
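
As a rough illustration of the width rule in the first item above (a wide load/store is used when the AXI data width is larger than the number of bits in a tensor), the following plain-Scala sketch models the selection and a beat count. It is not taken from the VTA sources: `WidePathSketch`, `selectPath` and `beatsFor` are hypothetical names, and the dense-packing assumption in `beatsFor` is an illustration only.

```scala
// Hypothetical model of the narrow/wide selection; not part of the VTA code base.
object WidePathSketch {
  // AXI data widths listed in the commit message as supported.
  val supportedAxiBits: Set[Int] = Set(64, 128, 256, 512)

  sealed trait MemPath
  case object Narrow extends MemPath
  case object Wide extends MemPath

  // Wide path when one AXI beat carries more bits than one tensor,
  // following the rule stated in the commit message.
  def selectPath(axiDataBits: Int, tensorBits: Int): MemPath = {
    require(supportedAxiBits.contains(axiDataBits), s"unsupported AXI width: $axiDataBits")
    require(tensorBits > 0, "tensorBits must be positive")
    if (axiDataBits > tensorBits) Wide else Narrow
  }

  // Number of AXI beats needed to move `tensors` tensors.
  // Assumption: tensors are packed densely, with no padding between them.
  def beatsFor(tensors: Int, axiDataBits: Int, tensorBits: Int): Int = {
    val totalBits = tensors.toLong * tensorBits
    ((totalBits + axiDataBits - 1) / axiDataBits).toInt // ceiling division
  }
}
```

Under these assumptions, `WidePathSketch.selectPath(512, 128)` yields `Wide`, while `WidePathSketch.selectPath(64, 128)` yields `Narrow`.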

* code formatting fix

* Update to Chisel 3.4.3. PR: Port to the latest stable Chisel release (#33)
* Fix Makefile to use Chisel -o instead of top name and .sv instead of .v
* fix reset to reset.asBool
* fix SyncQueue use of deprecated module.io
* fix toBools to asBools

* include verilated.cpp and verilated_dpi.cpp directly in module.cc to provide verilator array access functionality and avoid compilation warnings

* fix module io warnings

* comments

* Jenkinsfile ci pipeline fix

* Jenkinsfile ci pipeline fix. only for lint,cpu,i386

* Reenable tsim tests

* style fix

* comments cleanup

* AXI constants commented. Moved write id to VME

* comments cleanup

26 files changed

README.md

VTA Hardware Design Stack

VTA (versatile tensor accelerator) is an open-source deep learning accelerator complemented with an end-to-end TVM-based compiler stack.

The key features of VTA include:

- Generic, modular, open-source hardware
  - Streamlined workflow to deploy to FPGAs.
  - Simulator support to prototype compilation passes on regular workstations.
- Driver and JIT runtime for both simulator and FPGA hardware back-end.
- End-to-end TVM stack integration
  - Direct optimization and deployment of models from deep learning frameworks via TVM.
  - Customized and extensible TVM compiler back-end.
  - Flexible RPC support to ease deployment, and program FPGAs with the convenience of Python.