tree 03e2edd996562c3279c7d2ecbd47ca3755163cb9
parent 23f52e98d6426a5d0e11bc4e8c8de588b60f0406
author Jon Bringhurst <jon@bringhurst.org> 1746224550 -0700
committer GitHub <noreply@github.com> 1746224550 -0700
gpgsig -----BEGIN PGP SIGNATURE-----
 
 wsFcBAABCAAQBQJoFUWmCRC1aQ7uu5UhlAAAwUcQAAlthyuismGuBySJ1tFEqWFE
 lnjYfAKra9au3HhB+JpXEPOXvh3CA8ZcZ3fbja9P6MSHBKlatU/w+tEpqPuLlB79
 vtMyOSTgvAQuDcZAAFKKRnOv39QVPydVT4VjhqwmvKXJ/tYewq4R1iJ7QgWzXBD+
 iMzSYb/9kfVEzTiOJh/27amlTZbpYB5WMea6sKVU7pqBwbjYCbBDin9UYKr3FOrU
 Q6DZqppxMdmWh/8m3UuZJ7Q7fJeX6Lp2hqTZgkOphqp15a3P6lR+yov3PdYac/MH
 +3jCr/bDs9G90Fhla8JN64k/Wt1mn8ObxURalfiNAyNxUoGXtMkWezvOy35jAIRi
 Sc8ejXAH4BlNp/H6xA/r6QoS6I5GsXTEyJgPjyiI3DgX8QNNDmkFQ0C8H42eTKw8
 YJSiePibcy8A6jtX6B8hG/UT7fGj8j4hVpMU1TdxdBlZujbZ3T5WkdmfumgFAUDE
 pSJcN5OlF9E62wHkQwn8UsAEN5NGZOSVdjye1/A+8T1bCMosT0PBDRUelWPDTvHG
 HvMv0hhgH51xTinIuZdZFvFYS+yzQY/0H1GXutIptsg/sDxYqkJYUo9IVS+Vo2Fr
 U+yPGzPSChhEgIqOA6hMzAoqqxBTd0NmR1btmW1jVqOHDWKIhjsehh0CA5STs5nn
 NJPsc3ToZTUCvTQr6cqn
 =TjUK
 -----END PGP SIGNATURE-----
 

SAMZA-2804: Concurrency issues identified in run-class.sh on samza-yarn (#1716)

* Add annotations for each line identified as having a potential issue.

* Resolve multiple concurrency issues

## Race condition in pathing jar manifest creation

A race condition exists when setting up the classpath during container launch.

During container launch on samza-yarn, run-class.sh creates a pathing jar file that holds the classpath for the container. However, neither the temporary files created along the way nor the pathing jar itself are placed in a location unique to the container. Multiple containers therefore write to the same pathing jar and temporary file locations, which results in a race condition.

This race condition may show up in several ways, such as when YARN removes jars from a finished container (other containers are left pointing to a classpath that no longer exists) or when multiple run-class.sh scripts attempt to write manifest.txt or the pathing jar at the same time.

Note that enabling host affinity makes this problem worse. The pathing.jar is written to the usercache, so once the container that created the pathing.jar finishes and is removed, any new container launched on that host points to jar files that no longer exist. With host affinity enabled, the container does not move to a new host and just keeps failing.
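
As a rough illustration of the kind of isolation described above (not the exact change made in run-class.sh), the sketch below writes a manifest into a directory that is unique to the launching process. `BASE_DIR`, `PATHING_DIR`, `CONTAINER_TAG`, and the sample classpath are hypothetical names used only for this example, and it assumes a `CONTAINER_ID` environment variable is available, falling back to the shell PID when it is not. It does not invoke `jar`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical names: BASE_DIR, CONTAINER_TAG, PATHING_DIR and the sample
# classpath are illustrative and are not taken from run-class.sh.
BASE_DIR="${BASE_DIR:-${TMPDIR:-/tmp}}"
# Assumption: CONTAINER_ID is exported to the container; otherwise use the PID.
CONTAINER_TAG="${CONTAINER_ID:-pid-$$}"

# mktemp -d creates a fresh directory, so two concurrent launches never
# share the same manifest or pathing jar location.
PATHING_DIR="$(mktemp -d "${BASE_DIR}/pathing-${CONTAINER_TAG}-XXXXXX")"
MANIFEST="${PATHING_DIR}/manifest.txt"

# Write to a temporary file first, then rename it into place, so a reader
# never observes a half-written manifest.
MANIFEST_TMP="$(mktemp "${PATHING_DIR}/manifest.XXXXXX")"
printf 'Class-Path: %s\n' "lib/a.jar lib/b.jar" > "${MANIFEST_TMP}"
mv "${MANIFEST_TMP}" "${MANIFEST}"

echo "manifest written to ${MANIFEST}"
```

Because the directory is created per launch, removing one container's files cannot invalidate the classpath of another container on the same host.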

## Container logging directory fallback is not unique for each container

The fallback log directory is the same for all containers running on the same host. It should be unique per container.
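
One way to derive a per-container fallback could look like the sketch below. The `fallback_log_dir` helper and the `logs/<id>` layout are hypothetical and only illustrate the idea; the sketch again assumes a `CONTAINER_ID` environment variable, with the shell PID as a last resort.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: builds a fallback log directory path that embeds
# the container identifier so containers on one host do not collide.
fallback_log_dir() {
  local base_dir="$1"
  # Assumption: CONTAINER_ID is set for the container; fall back to the PID.
  local container_tag="${CONTAINER_ID:-pid-$$}"
  printf '%s/logs/%s\n' "${base_dir}" "${container_tag}"
}

LOG_DIR="$(fallback_log_dir "${PWD}")"
echo "fallback log dir: ${LOG_DIR}"
```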

## Container tmp dir is not unique per-container

The JAVA_TMP_DIR directory is the same for all containers. We should make sure that it's safe to use the same directory for all containers.

* Simplify comments and print manifest file locations