
HTTP GET Arrow Data in multipart/mixed: Python Client Example

This directory contains an example of a Python HTTP client that receives a multipart/mixed response from the server. The client:

  1. Sends an HTTP GET request to a server.
  2. Receives an HTTP 200 response from the server whose body is a multipart/mixed message.
  3. Parses the multipart/mixed response using the email module.[^1]
  4. Extracts the JSON part, parses it and prints a preview of the JSON data.
  5. Extracts the Arrow stream part, reads the Arrow stream, and sums the total number of records in the entire Arrow stream.
  6. Extracts the plain text part and prints it as is.
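
Steps 3 to 6 can be sketched with the standard library alone. This is a minimal illustration, not the code in simple_client.py: it builds a small multipart/mixed body in memory instead of reading a real HTTP response, and the boundary string, the helper names (build_example_body, parse_multipart, summarize), and the payloads are all hypothetical. The Arrow part is represented by three opaque bytes so that the sketch runs without pyarrow. Because the boundary travels in the HTTP Content-Type header rather than in the body, the sketch feeds a MIME header to the parser before the body.

```python
import json
from email.parser import BytesFeedParser

ARROW_STREAM_FORMAT = "application/vnd.apache.arrow.stream"


def build_example_body(boundary):
    """Hand-build a tiny multipart/mixed body (stand-in for the HTTP response body)."""
    crlf = b"\r\n"
    delimiter = b"--" + boundary.encode("ascii")
    parts = [
        (b"application/json", json.dumps({"rows": 3}).encode("utf-8")),
        # Placeholder bytes; a real server would send an Arrow IPC stream here.
        (ARROW_STREAM_FORMAT.encode("ascii"), b"\x00\x01\xff"),
        (b"text/plain", b"Hello from the server"),
    ]
    body = b""
    for content_type, payload in parts:
        body += delimiter + crlf
        body += b"Content-Type: " + content_type + crlf + crlf
        body += payload + crlf
    body += delimiter + b"--" + crlf  # closing delimiter
    return body


def parse_multipart(body, boundary):
    """Parse a multipart body with the email module and return its parts."""
    parser = BytesFeedParser()
    header = (
        "MIME-Version: 1.0\r\n"
        f'Content-Type: multipart/mixed; boundary="{boundary}"\r\n\r\n'
    )
    parser.feed(header.encode("ascii"))
    parser.feed(body)  # the whole body ends up in memory
    message = parser.close()
    if not message.is_multipart():
        raise ValueError("response body is not a multipart message")
    return message.get_payload()


def summarize(parts):
    """Dispatch on each part's content type, mirroring steps 4 to 6."""
    summary = {}
    for part in parts:
        content_type = part.get_content_type()
        data = part.get_payload(decode=True)  # raw bytes of the part
        if content_type == "application/json":
            summary["json"] = json.loads(data)
        elif content_type == ARROW_STREAM_FORMAT:
            # With pyarrow installed, these bytes could be read as an Arrow stream.
            summary["arrow_bytes"] = len(data)
        elif content_type == "text/plain":
            summary["text"] = data.decode("utf-8")
    return summary
```

A possible usage, with a made-up boundary:

```python
boundary = "example-boundary"
parts = parse_multipart(build_example_body(boundary), boundary)
print(summarize(parts))
```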

To run this example, first start one of the server examples in the parent directory, then:

pip install pyarrow
python simple_client.py

[!WARNING] simple_client.py parses the multipart response using the multipart message parser from the Python email module. This parser holds the entire message in memory and seems to spend a lot of time looking for part delimiters and encoding/decoding the parts.

On my machine, multipart/mixed parsing accounts for 85% of the total execution time. Once the ~1 GB Arrow stream message is fully in memory, parsing it takes only 0.06% of the total execution time.
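
Shares like these can be obtained by accumulating wall-clock time per phase and dividing by the total run time. The sketch below shows one way to do that; the PhaseTimer class and the phase names are hypothetical and are not necessarily how the figures above were measured.

```python
import time
from contextlib import contextmanager


class PhaseTimer:
    """Accumulate wall-clock time per named phase (hypothetical helper)."""

    def __init__(self):
        self.totals = {}

    @contextmanager
    def phase(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            # Add to any time already recorded for this phase.
            self.totals[name] = self.totals.get(name, 0.0) + elapsed

    def share(self, name, total_elapsed):
        """Fraction of total_elapsed spent in the given phase."""
        return self.totals.get(name, 0.0) / total_elapsed
```

Each timed section is wrapped in a phase, and the shares are computed at the end of the run:

```python
timer = PhaseTimer()
run_start = time.perf_counter()
with timer.phase("multipart_parsing"):
    time.sleep(0.02)  # stand-in for feeding the response to the email parser
with timer.phase("arrow_parsing"):
    time.sleep(0.01)  # stand-in for reading the Arrow stream
total = time.perf_counter() - run_start
print(f"multipart parsing: {timer.share('multipart_parsing', total):.1%}")
```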

[^1]: The multipart/mixed standard, used by HTTP, is derived from the MIME standard used in email.