| <!--- |
| Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| ---> |
| |
| # PyWayang |
| |
| An implementation of a Python API for Apache Wayang. |
| |
| ## Building from source |
| To build and install `pywy` as a Python library, the `build` package is |
| needed. It can be installed using: |
| ```shell |
| cd ./python |
| pip install --upgrade build |
| ``` |
| Installing this might require `python3.8-venv` to be installed on the |
| system. |
| |
| After building the Python package (for example with `python3 -m build`, |
| which writes the archives to `dist/`), install it to make it available |
| on your system: |
| ```shell |
| python3 -m pip install dist/pywy-1.0.0.tar.gz |
| ``` |
| |
| ## Executing python code |
| To execute Python code, the REST API needs to be running. The steps below |
| configure the Python worker, compile the assembly, and run the Main class |
| of the REST API. |
| |
| Before compiling your code, make sure the required configuration variables |
| are set correctly in `wayang-api/wayang-api-python/src/main/resources/wayang-api-python-defaults.properties`. |
| This example `wayang-api-python-defaults.properties` can be used as a |
| guideline: |
| |
| ``` |
| wayang.api.python.worker = /var/www/html/python/src/pywy/execution/worker.py |
| wayang.api.python.path = python3 |
| wayang.api.python.env.path = /usr/local/lib/python3.8/dist-packages |
| ``` |
| The first value, `wayang.api.python.worker`, must point to the Python |
| worker inside your Apache Wayang repository, which executes UDFs (usually |
| your current working directory + `python/src/pywy/execution/worker.py`). |
| The second value, `wayang.api.python.path`, is the command used for |
| invoking Python scripts, usually either `python3` or just `python`. |
| The third value, `wayang.api.python.env.path`, points to the directory in |
| which Python libraries are found (usually where pip installs them). |
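
A misconfigured path in this file is easy to miss, so it can help to check the three values before packaging. The following is a minimal sketch, not part of `pywy` or Wayang: the helper names `parse_properties` and `check_python_settings` are hypothetical, and the parser assumes plain `key = value` lines with `#` comments rather than the full Java properties syntax.

```python
import shutil
from pathlib import Path


def parse_properties(text):
    """Parse simple `key = value` lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines without an '=' separator
            props[key.strip()] = value.strip()
    return props


def check_python_settings(props):
    """Return a list of problems found in the three Python settings."""
    problems = []

    worker = props.get("wayang.api.python.worker", "")
    if not worker or not Path(worker).is_file():
        problems.append(f"worker script not found: {worker!r}")

    interpreter = props.get("wayang.api.python.path", "")
    if not interpreter or shutil.which(interpreter) is None:
        problems.append(f"python command not found on PATH: {interpreter!r}")

    env_path = props.get("wayang.api.python.env.path", "")
    if not env_path or not Path(env_path).is_dir():
        problems.append(f"env path is not a directory: {env_path!r}")

    return problems
```

To use it, read the properties file with `Path(...).read_text()`, pass the result through `parse_properties`, and print whatever `check_python_settings` returns; an empty list means all three values resolved on the current machine.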
| |
| - Package the project |
| ```shell |
| ./mvnw clean package -pl :wayang-assembly -Pdistribution |
| ``` |
| |
| - Start the REST API as a background process |
| ```shell |
| cd wayang-assembly/target/ |
| tar -xvf apache-wayang-assembly-1.0.0-incubating-dist.tar.gz |
| cd wayang-1.0.0 |
| ./bin/wayang-submit org.apache.wayang.api.json.Main & |
| ``` |
| |
| Now, create and execute a python script like this: |
| |
| ```python |
| from pywy.dataquanta import WayangContext |
| from pywy.platforms.java import JavaPlugin |
| from pywy.platforms.spark import SparkPlugin |
| |
| def word_count(): |
|     # Count the words in README.md and write the (word, count) pairs to a text file. |
|     ctx = WayangContext() \ |
|         .register({JavaPlugin, SparkPlugin}) \ |
|         .textfile("file:///README.md") \ |
|         .flatmap(lambda w: w.split()) \ |
|         .filter(lambda w: w.strip() != "") \ |
|         .map(lambda w: (w.lower(), 1)) \ |
|         .reduce_by_key(lambda t: t[0], lambda t1, t2: (t1[0], int(t1[1]) + int(t2[1]))) \ |
|         .store_textfile("file:///wordcount-out-python.txt") |
| |
| if __name__ == "__main__": |
|     word_count() |
| ``` |
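
To sanity-check the output of the script above, it can be useful to compare it against the same computation in plain Python. The sketch below mirrors the `flatmap`, `filter`, `map`, and `reduce_by_key` steps without using Wayang; `reference_word_count` is a hypothetical helper name, and the sketch makes no assumption about how `store_textfile` formats its output.

```python
from collections import Counter


def reference_word_count(lines):
    """Count lower-cased, whitespace-separated words across the given lines."""
    counts = Counter()
    for line in lines:
        for word in line.split():          # flatmap: split each line into words
            if word.strip() != "":         # filter: drop empty tokens
                counts[word.lower()] += 1  # map to (word, 1) and sum by key
    return dict(counts)
```

For instance, `reference_word_count(open("README.md"))` yields a dictionary from each lower-cased word to its number of occurrences, which can then be compared by hand with the pairs written by the Wayang job.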
| |
| ### Testing python code |
| You can run the Python tests with pytest. The requirements for the tests |
| are listed in `python/src/pywy/requirements.txt`. To run the tests, navigate |
| to the base Wayang folder (e.g. `/var/www/html`) and run |
| `pytest -s python/src/pywy`. If you need to pass a specific configuration |
| for your use case, add the config flag: |
| `pytest -s --config=pathToYourConfig python/src/pywy/`. |
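
If the tests are launched from a script rather than typed by hand, the two invocations above can be assembled programmatically. This is a small sketch under the assumption that `pytest` is on the `PATH`; the helper names `pytest_command` and `run_tests` are hypothetical and not part of the repository.

```python
import subprocess


def pytest_command(config=None, target="python/src/pywy"):
    """Build the pytest argument list, adding --config only when given."""
    args = ["pytest", "-s"]
    if config is not None:
        args.append(f"--config={config}")
    args.append(target)
    return args


def run_tests(repo_root, config=None):
    """Run the tests from the base Wayang folder and return the exit code."""
    return subprocess.run(pytest_command(config), cwd=repo_root).returncode
```

Calling `run_tests("/var/www/html")` corresponds to running `pytest -s python/src/pywy` from that folder; passing `config="pathToYourConfig"` adds the `--config` flag.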