The goal is to understand what exactly is happening during A_data.copyfrom(np.array([2, 3])), where A_data lives in Hexagon.
The following walkthrough describes the sequence of calls and components involved when a memcpy to the Hexagon device is invoked.

The communication between x86 and Android is done via the standard TVM RPC protocol implemented mostly in src/runtime/rpc/rpc_endpoint.cc.
A packet between Android and Hexagon is proxied by the Hexagon FastRPC mechanism. FastRPC depends on auto-generated implementations of the client- and server-side API. At build time, the Android-side API ("stub") and the Hexagon-side API ("skel") are generated from src/runtime/hexagon/rpc/hexagon_rpc.idl (see cmake/modules/Hexagon.cmake).
When TVM’s RPC server on Android, tvm_rpc_android_server, invokes hexagon_rpc_send(...), it actually calls into the identically named function defined in the stub, with the exact same arguments (which include the URI of the *skel.so library to use on Hexagon, in our case libhexagon_rpc_skel.so). Similarly, on the Hexagon side, the hexagon_rpc_send(...) call is first intercepted by the "skel" API, which in turn calls the actual implementation defined in src/runtime/hexagon/rpc/rpc_server.cc.
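To make the stub/skel indirection concrete, here is a toy model of the pattern in Python. This is purely illustrative: the real generated code is C, and the transport is the kernel-mediated FastRPC channel, not an in-process call.

```python
# Toy model of FastRPC's stub/skel pattern. The client calls a function with
# a normal signature; the "stub" marshals the call across a transport; the
# "skel" unmarshals it and invokes the real implementation.
import json

class Transport:
    """Stands in for the FastRPC channel between Android and Hexagon."""
    def __init__(self, skel):
        self.skel = skel

    def invoke(self, packet):
        return self.skel.dispatch(packet)

class Skel:
    """Server side: dispatches marshaled calls to real implementations."""
    def __init__(self, impls):
        self.impls = impls

    def dispatch(self, packet):
        msg = json.loads(packet)
        return self.impls[msg["fn"]](*msg["args"])

class Stub:
    """Client side: exposes the same-name API and forwards over the transport."""
    def __init__(self, transport):
        self.transport = transport

    def hexagon_rpc_send(self, data):
        packet = json.dumps({"fn": "hexagon_rpc_send", "args": [data]})
        return self.transport.invoke(packet)

# The "real" implementation, as rpc_server.cc provides on the Hexagon side.
def hexagon_rpc_send(data):
    return len(data)

stub = Stub(Transport(Skel({"hexagon_rpc_send": hexagon_rpc_send})))
print(stub.hexagon_rpc_send("abc"))  # 3
```

From the caller's perspective the call looks local; the marshaling is entirely hidden inside the stub, which is why the interception described above is easy to miss when reading the code.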
What’s happening during the launcher initialization at https://github.com/apache/tvm/blob/7cfaa88e6c18edc0a41e1a984d3cb9d8659a1c2c/tests/python/contrib/test_hexagon/test_launcher.py#L71-L73 ?
```python
launcher = HexagonLauncher(serial_number=android_serial_number, rpc_info=rpc_info)
launcher.upload(dso_binary_path, dso_binary)
launcher.start_server()
```
Here, we send various files to Android via adb and start an RPC server using the tvm_rpc_android binary (built from https://github.com/apache/tvm/tree/main/apps/cpp_rpc):
```python
subprocess.Popen(
    self._adb_device_sub_cmd + ["shell", f"cd {self._workspace} && ./android_bash.sh"],
    stdout=subprocess.PIPE,
    stdin=subprocess.PIPE,
    stderr=subprocess.PIPE,
)
```
```bash
./tvm_rpc_android server --port=<RPC_SERVER_PORT> --tracker=<RPC_TRACKER_HOST>:<RPC_TRACKER_PORT> --key=<HEXAGON_REMOTE_DEVICE_KEY>&
```
When we do launcher.create_session(), an RPC session between x86 and Android is established via this line:
```python
self._rpc = tracker.request(
    ...
    session_constructor_args=[
        "tvm.contrib.hexagon.create_hexagon_session",
        self._session_name,
        self._remote_stack_size_bytes,
    ],
)
```
This eventually reaches the following C++ code, which creates an RPC client session on the x86 host and runs the server-side initialization function tvm.contrib.hexagon.create_hexagon_session on Android:
```cpp
TVM_FFI_STATIC_INIT_BLOCK() {
  namespace refl = tvm::ffi::reflection;
  refl::GlobalDef().def_packed("rpc.Connect", [](ffi::PackedArgs args, ffi::Any* rv) {
    auto url = args[0].cast<std::string>();
    int port = args[1].cast<int>();
    auto key = args[2].cast<std::string>();
    *rv = RPCClientConnect(
        url, port, key,
        ffi::PackedArgs(args.values + 3, args.type_codes + 3, args.size() - 3));
  });
}
```
tvm.contrib.hexagon.create_hexagon_session is defined here. This code runs on Android, and it establishes the link between Android and Hexagon:
```cpp
TVM_FFI_STATIC_INIT_BLOCK() {
  namespace refl = tvm::ffi::reflection;
  refl::GlobalDef().def_packed(
      "tvm.contrib.hexagon.create_hexagon_session",
      [](ffi::PackedArgs args, ffi::Any* rv) {
        auto session_name = args[0].cast<std::string>();
        int remote_stack_size_bytes = args[1].cast<int>();
        HexagonTransportChannel* hexagon_channel =
            new HexagonTransportChannel(hexagon_rpc_URI CDSP_DOMAIN, remote_stack_size_bytes);
        std::unique_ptr<RPCChannel> channel(hexagon_channel);
        auto ep = RPCEndpoint::Create(std::move(channel), session_name, "", NULL);
        auto sess = CreateClientSession(ep);
        *rv = CreateRPCSessionModule(sess);
      });
}
```
HexagonTransportChannel is the one that actually knows how to talk to Hexagon. It uses functions such as hexagon_rpc_send and hexagon_rpc_receive, generated from hexagon_rpc.idl as described in the overview above.
A_data.copyfrom(np.array([2, 3])) reaches this line. This is the boundary between Python and C++ land in TVM FFI:
```python
check_call(_LIB.TVMTensorCopyFromBytes(self.handle, data, nbytes))
```
```cpp
int TVMTensorCopyFromBytes(TVMArrayHandle handle, void* data, size_t nbytes) {
  API_BEGIN();
  TensorCopyFromBytes(handle, data, nbytes);
  API_END();
}
```
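The check_call(_LIB.TVMTensorCopyFromBytes(...)) call above is an ordinary ctypes call into the TVM shared library. As a self-contained analogue of the pattern, here is the same idea (a Python-side buffer handed to a C function by pointer) using ctypes.memmove in place of TVMTensorCopyFromBytes:

```python
# Illustrative only: crossing the Python/C boundary with ctypes, the same
# mechanism TVM's Python FFI uses. ctypes.memmove stands in for
# TVMTensorCopyFromBytes; the buffers stand in for np.array([2, 3]) and
# A_data's backing storage.
import ctypes

src = (ctypes.c_uint8 * 2)(2, 3)  # like np.array([2, 3], dtype="uint8")
dst = (ctypes.c_uint8 * 2)()      # like the destination tensor's storage

# The C side receives raw pointers and a byte count, nothing Python-specific.
ctypes.memmove(dst, src, ctypes.sizeof(src))
print(list(dst))  # [2, 3]
```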
Now we come to the TensorCopyFromBytes function. The first non-obvious question is: which DeviceAPI is selected by DeviceAPI::Get(handle->device)?
```cpp
void TensorCopyFromBytes(DLTensor* handle, const void* data, size_t nbytes) {
  ...
  DLTensor from;
  ...
  DeviceAPI::Get(handle->device)->CopyDataFromTo(&from, handle, nullptr);
  // Synchronize in case data become unavailable later.
  DeviceAPI::Get(handle->device)->StreamSync(handle->device, nullptr);
}
```
The answer: RPCDeviceAPI defined below, not HexagonDeviceAPI.
```cpp
class RPCDeviceAPI final : public DeviceAPI { ...
```
This is because sess.device, used in test_launcher.py below, encodes two pieces of information: (1) the device is an RPC device, and (2) it wraps the underlying "real" device, Hexagon.
See below for how sess.device is created during HexagonLauncher initialization.
```python
self.device = self._rpc.hexagon(0)
```
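To sketch how one integer can encode both "this is an RPC device" and "the real device is Hexagon": TVM packs a session index into the device_type field above a mask (kRPCSessMask), so any device_type above the mask routes to RPCDeviceAPI first. The constant values below are illustrative, not taken from the TVM headers:

```python
# A minimal sketch of RPC device-type encoding, assuming a mask-based scheme
# similar to TVM's kRPCSessMask. The numeric values are illustrative.
DL_HEXAGON = 16      # stand-in for the Hexagon device_type
RPC_SESS_MASK = 128  # device_type values above this are RPC-wrapped

def wrap_rpc_device(real_device_type, session_index):
    # Encode the session index and the underlying device in one integer.
    return RPC_SESS_MASK * (session_index + 1) + real_device_type

def unwrap_rpc_device(device_type):
    # Recover (session_index, real_device_type) from the encoded value.
    assert device_type > RPC_SESS_MASK, "not an RPC-wrapped device"
    return device_type // RPC_SESS_MASK - 1, device_type % RPC_SESS_MASK

encoded = wrap_rpc_device(DL_HEXAGON, 0)
print(encoded)                     # 144
print(unwrap_rpc_device(encoded))  # (0, 16)
```

Under this scheme, DeviceAPI::Get sees a device_type above the mask and selects RPCDeviceAPI; the real device type is only recovered on the remote side.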
RPCDeviceAPI::CopyDataFromTo is defined in https://github.com/apache/tvm/blob/899bc064e1bf8df915bcadc979a6f37210cdce33/src/runtime/rpc/rpc_device_api.cc#L80
Here, we meet another GetDeviceAPI call:
```cpp
GetSess(dev_from)->GetDeviceAPI(remote_dev)->CopyDataFromTo(&from_tensor, &to_tensor, stream);
```
At first, it is not obvious where this CopyDataFromTo jumps to (initially I thought it would jump to HexagonDeviceAPI). Since GetSess(dev_from) returns the client RPC connection between x86 and Android, created during initialization in
```cpp
Module RPCClientConnect(std::string url, int port, std::string key, ffi::PackedArgs init_seq) {
  auto endpt = RPCConnect(url, port, "client:" + key, init_seq);
  return CreateRPCSessionModule(CreateClientSession(endpt));
}
```
this jumps to the RPCClientSession class defined in https://github.com/apache/tvm/blob/899bc064e1bf8df915bcadc979a6f37210cdce33/src/runtime/rpc/rpc_endpoint.cc#L994:
```cpp
class RPCClientSession : public RPCSession, public DeviceAPI { ...
```
rpc_endpoint.cc is a very important file. It contains the core RPC protocol logic. CopyDataFromTo in rpc_device_api.cc jumps to
```cpp
void CopyDataFromTo(DLTensor* from, DLTensor* to, TVMStreamHandle stream) final {
  endpoint_->SysCallRemote(RPCCode::kCopyAmongRemote, from, to, stream);
}
```
from which control transfers to the Android side.
Here is where RPCCode::kCopyAmongRemote is handled:
```cpp
case RPCCode::kCopyAmongRemote:
  SysCallHandler(RPCCopyAmongRemote);
  break;
```
The handler is represented by serving_session_, which is initialized during server initialization at
```cpp
serving_session_ = RPCModuleGetSession(mod);
```
which corresponds to the Hexagon session created before in https://github.com/apache/tvm/blob/cd2fa69677516048e165e84a88c774dfb0ee65d1/src/runtime/hexagon/rpc/android/session.cc#L106.
The handler is passed to the following function:
```cpp
void RPCCopyAmongRemote(RPCSession* handler, ffi::PackedArgs args, ffi::Any* rv) {
  auto from = args[0].cast<DLTensor*>();
  auto to = args[1].cast<DLTensor*>();
  ...
  handler->GetDeviceAPI(dev)->CopyDataFromTo(from, to, stream);
}
```
This is an interesting function. Here, handler is again an RPCClientSession, due to this line in create_hexagon_session:
```cpp
auto sess = CreateClientSession(ep);
```
so at first glance it might look like we are looping back to RPCClientSession::CopyDataFromTo:
```cpp
void CopyDataFromTo(DLTensor* from, DLTensor* to, TVMStreamHandle stream) final {
  endpoint_->SysCallRemote(RPCCode::kCopyAmongRemote, from, to, stream);
}
```
But this time, endpoint_ is different. Previously, endpoint_ represented the connection between x86 and Android (created in https://github.com/apache/tvm/blob/2cca934aad1635e3a83b712958ea83ff65704316/src/runtime/rpc/rpc_socket_impl.cc#L99-L100); this one belongs to the Hexagon session created in https://github.com/apache/tvm/blob/cd2fa69677516048e165e84a88c774dfb0ee65d1/src/runtime/hexagon/rpc/android/session.cc#L113. This is where the RPC communication between Android and Hexagon starts.
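The two-hop structure can be sketched in a few lines of Python: the same session class runs on both hops, and only its endpoint_ differs. All names here are schematic stand-ins for the C++ classes discussed above, not TVM APIs:

```python
# Schematic of the two-hop copy: the same CopyDataFromTo entry point runs
# twice, once per RPC hop, with a different endpoint_ each time.
class Endpoint:
    """Stand-in for RPCEndpoint: forwards a syscall to the remote server."""
    def __init__(self, name, server):
        self.name = name
        self.server = server

    def syscall_remote(self, code, *args):
        return self.server.handle(code, *args)

class HexagonDeviceAPI:
    """The final destination: the actual copy on the DSP."""
    def copy_data_from_to(self, src, dst):
        dst[:] = src

class Server:
    """Handles kCopyAmongRemote by delegating to its serving session's
    device API -- which may itself be another RPC session."""
    def __init__(self, target):
        self.target = target

    def handle(self, code, src, dst):
        assert code == "kCopyAmongRemote"
        self.target.copy_data_from_to(src, dst)

class RPCClientSession:
    """Both hops use this class; only endpoint_ differs."""
    def __init__(self, endpoint):
        self.endpoint_ = endpoint

    def copy_data_from_to(self, src, dst):
        self.endpoint_.syscall_remote("kCopyAmongRemote", src, dst)

# Hop 2: Android -> Hexagon, terminating at the real device API.
hexagon_session = RPCClientSession(Endpoint("android->hexagon", Server(HexagonDeviceAPI())))
# Hop 1: x86 -> Android; the Android server's serving session is the Hexagon session.
x86_session = RPCClientSession(Endpoint("x86->android", Server(hexagon_session)))

dst = [0, 0]
x86_session.copy_data_from_to([2, 3], dst)
print(dst)  # [2, 3]
```

The key point the sketch captures is that re-entering RPCClientSession::CopyDataFromTo is not a loop: each entry uses a different endpoint_, so the request moves one hop closer to the device each time.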
Recall that the endpoint_ owned by the Hexagon session is created via tvm.contrib.hexagon.create_hexagon_session when the Android RPC server is being initialized. Its underlying channel is the following class:
```cpp
class HexagonTransportChannel : public RPCChannel {
 public:
  explicit HexagonTransportChannel(const std::string& uri, int remote_stack_size_bytes) {
    ...
    hexagon_rpc_open(uri.c_str(), &_handle);
    ...
  }

  size_t Send(const void* data, size_t size) override {
    hexagon_rpc_send(_handle, static_cast<const unsigned char*>(data), static_cast<int>(size));
    ...
  }
```
On construction, hexagon_rpc_open is called, which initializes the TVM MinRPC server on Hexagon and overwrites the device_api.hexagon registry entry to point to HexagonDeviceAPI: https://github.com/apache/tvm/blob/c20cbc55c03f9f048b151a1221469b9888123608/src/runtime/hexagon/rpc/hexagon/rpc_server.cc#L210-L213
The endpoint routes each RPC packet through the Send function, which in turn calls hexagon_rpc_send(...), defined in:
```cpp
AEEResult hexagon_rpc_send(remote_handle64 _handle, const unsigned char* data, int dataLen) {
  get_hexagon_rpc_server()->Write(reinterpret_cast<const uint8_t*>(data),
                                  static_cast<size_t>(dataLen));
  ...
}
```
This is where FastRPC comes into play and things get confusing. The endpoint lives on Android, so the hexagon_rpc_send call (and also hexagon_rpc_open) happens on Android. But the implementations of these functions in rpc_server.cc describe the behavior on the Hexagon side... What's happening is that the FastRPC "stub" and "skel" APIs (see the overview at the top) intercept those calls and do the marshaling behind the scenes, making the remote call look transparent from the client (Android) perspective.
So by the time control reaches the definition of hexagon_rpc_send in rpc_server.cc, FastRPC has already done its job and we really are on the Hexagon side. We come to the HexagonRPCServer::Write(...) function, which in turn calls into the TVM MinRPC server instance rpc_server_ to process the incoming packet:
```cpp
int64_t Write(const uint8_t* data, size_t data_size_bytes) {
  if (io_.SetReadBuffer(data, data_size_bytes) != AEE_SUCCESS) {
    return -1;
  }
  rpc_server_.ProcessOnePacket();
  return (int64_t)data_size_bytes;
}
```
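The shape of this Write()/ProcessOnePacket() handoff can be sketched in Python: install the incoming bytes as the server's read buffer, then consume one request from it. The wire format below is made up for illustration; the real MinRPC protocol differs.

```python
# Sketch of the Write -> ProcessOnePacket flow, with an invented wire format:
# a little-endian u32 opcode, a u32 byte count, then the payload.
import struct

class MinRPCServerSketch:
    def __init__(self):
        self.read_buffer = b""
        self.offset = 0

    def set_read_buffer(self, data):
        # Analogue of io_.SetReadBuffer: subsequent reads consume this data.
        self.read_buffer, self.offset = data, 0

    def _read_u32(self):
        (val,) = struct.unpack_from("<I", self.read_buffer, self.offset)
        self.offset += 4
        return val

    def process_one_packet(self):
        # Analogue of MinRPCServer::ProcessOnePacket: read an opcode, dispatch.
        code = self._read_u32()
        if code == 1:  # pretend 1 means "copy" for this sketch
            nbytes = self._read_u32()
            payload = self.read_buffer[self.offset:self.offset + nbytes]
            self.offset += nbytes
            return payload
        raise ValueError(f"unknown RPC code {code}")

def write(server, data):
    # Analogue of HexagonRPCServer::Write.
    server.set_read_buffer(data)
    return server.process_one_packet()

packet = struct.pack("<II", 1, 2) + b"\x02\x03"
result = write(MinRPCServerSketch(), packet)
print(result)  # b'\x02\x03'
```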
MinRPCServer::ProcessOnePacket() function dispatches to HandleCopyFromRemote() upon receiving kCopyFromRemote request:
```cpp
bool ProcessOnePacket() {
  ...
  if (...) {
    ...
  } else {
    switch (code) {
      ...
      case RPCCode::kCopyFromRemote: {
        HandleCopyFromRemote();
        break;
      }
      ...
```
```cpp
void HandleCopyFromRemote() {
  DLTensor* arr = this->ArenaAlloc<DLTensor>(1);
  uint64_t data_handle;
  this->Read(&data_handle);
  arr->data = reinterpret_cast<void*>(data_handle);
  ...
  this->ReadArray(arr->shape, arr->ndim);
  if (...) {
    ...
  } else {
    data_ptr = this->ArenaAlloc<uint8_t>(num_bytes);
    DLTensor temp;
    ...
    call_ecode = TVMDeviceCopyDataFromTo(arr, &temp, nullptr);
    // need sync to make sure that the copy is completed.
    if (call_ecode == 0) {
      call_ecode = TVMSynchronize(arr->device.device_type, arr->device.device_id, nullptr);
    }
  }
```
And finally we see a call to DeviceAPIManager::Get(dev)->CopyDataFromTo, which dispatches to HexagonDeviceAPI::CopyDataFromTo.
```cpp
int TVMDeviceCopyDataFromTo(DLTensor* from, DLTensor* to, TVMStreamHandle stream) {
  ...
  DeviceAPIManager::Get(dev)->CopyDataFromTo(from, to, stream);
  ...
}
```