| /* |
| * Licensed to the Apache Software Foundation (ASF) under one |
| * or more contributor license agreements. See the NOTICE file |
| * distributed with this work for additional information |
| * regarding copyright ownership. The ASF licenses this file |
| * to you under the Apache License, Version 2.0 (the |
| * "License"); you may not use this file except in compliance |
| * with the License. You may obtain a copy of the License at |
| * |
| * http://www.apache.org/licenses/LICENSE-2.0 |
| * |
| * Unless required by applicable law or agreed to in writing, software |
| * distributed under the License is distributed on an "AS IS" BASIS, |
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| * See the License for the specific language governing permissions and |
| * limitations under the License. |
| */ |
| |
| /** |
| |
| This package contains classes, interfaces and exceptions to launch |
| YARN services from the command line. |
| |
| <h2>Key Features</h2> |
| |
| <p> |
| <b>General purpose YARN service launcher</b>:<p> |
| The {@link org.apache.hadoop.service.launcher.ServiceLauncher} class parses |
| a command line, then instantiates and launches the specified YARN service. It |
| then waits for the service to finish, converting any exceptions raised or |
| exit codes returned into an exit code for the (then exited) process. |
| <p> |
| This class is designed to be invokable from the static |
| {@link org.apache.hadoop.service.launcher.ServiceLauncher#main(String[])} |
| method, or from {@code main(String[])} methods implemented by |
| other classes which provide their own entry points. |
| |
| |
| <p> |
| <b>Extended YARN Service Interface</b>:<p> |
| The {@link org.apache.hadoop.service.launcher.LaunchableService} interface |
| extends {@link org.apache.hadoop.service.Service} with methods to pass |
| down the CLI arguments and to execute an operation without having to |
| spawn a thread in the {@link org.apache.hadoop.service.Service#start()} phase. |
| |
| |
| <p> |
| <b>Standard Exit codes</b>:<p> |
| {@link org.apache.hadoop.service.launcher.LauncherExitCodes} |
| defines a set of exit codes that can be used by services to standardize |
| exit causes. |
| |
| <p> |
| <b>Escalated shutdown</b>:<p> |
| The {@link org.apache.hadoop.service.launcher.ServiceShutdownHook} |
| shuts down any service via the hadoop shutdown mechanism. |
| The {@link org.apache.hadoop.service.launcher.InterruptEscalator} can be |
| registered to catch interrupts, triggering the shutdown and forcing a JVM |
| exit if it times out or a second interrupt is received. |
| |
| <p><b>Tests:</b><p> test cases include interrupt handling and |
| lifecycle failures. |
| |
| <h2>Launching a YARN Service</h2> |
| |
| The Service Launcher can launch <i>any YARN service</i>. |
| It will instantiate the service classname provided, using the no-args |
| constructor, or if no such constructor is available, it will fall back |
| to a constructor with a single {@code String} parameter, |
| passing the service name as the parameter value. |
| <p> |
| |
| The launcher will initialize the service via |
| {@link org.apache.hadoop.service.Service#init(Configuration)}, |
| then start it via its {@link org.apache.hadoop.service.Service#start()} method. |
| It then waits indefinitely for the service to stop. |
| <p> |
| After the service has stopped, a non-null value of |
| {@link org.apache.hadoop.service.Service#getFailureCause()} is interpreted |
| as a failure, and, if it didn't happen during the stop phase (i.e. when |
| {@link org.apache.hadoop.service.Service#getFailureState()} is not |
| {@code STATE.STOPPED}), escalated into a non-zero return code. |
| <p> |
| |
| To view the workflow in sequence, it is: |
| <ol> |
| <li>(prepare configuration files —covered later)</li> |
| <li>instantiate service via its empty or string constructor</li> |
| <li>call {@link org.apache.hadoop.service.Service#init(Configuration)}</li> |
| <li>call {@link org.apache.hadoop.service.Service#start()}</li> |
| <li>call |
| {@link org.apache.hadoop.service.Service#waitForServiceToStop(long)}</li> |
| <li>If an exception was raised: propagate it</li> |
| <li>If an exception was recorded in |
| {@link org.apache.hadoop.service.Service#getFailureCause()} |
| while the service was running: propagate it.</li> |
| </ol> |
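| |
| <p> |
| The sequence above can be sketched in plain Java. This is a minimal |
| illustration and not the launcher's actual code: {@code MiniService}, |
| {@code SelfStoppingService} and {@code BasicLaunchSketch} are hypothetical |
| stand-ins, {@code java.util.Properties} stands in for {@code Configuration}, |
| and the failure code {@code EXIT_FAILED} is an assumed value. The sketch also |
| ignores the distinction between failures during and before the stop phase. |

```java
import java.util.Properties;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class BasicLaunchSketch {

  // Assumed generic failure code, used by this sketch only.
  static final int EXIT_FAILED = 1;

  // Runs init, start and wait, then maps the outcome to an exit code.
  static int launch(MiniService service, Properties conf) {
    try {
      service.init(conf);
      service.start();
      // Block until the service reports that it has stopped.
      while (!service.waitForServiceToStop(100)) {
        // still running: keep waiting
      }
    } catch (RuntimeException e) {
      // An exception raised during the lifecycle is a failure.
      return EXIT_FAILED;
    }
    // A failure recorded while the service was running is also a failure.
    return service.getFailureCause() == null ? 0 : EXIT_FAILED;
  }

  public static void main(String[] args) {
    MiniService service = new SelfStoppingService(null);
    System.out.println("exit code: " + launch(service, new Properties()));
  }
}

// Hypothetical stand-in for the lifecycle methods of Service used above.
interface MiniService {
  void init(Properties conf);
  void start();
  boolean waitForServiceToStop(long timeoutMillis);
  Throwable getFailureCause();
}

// A service that does its work in a thread created in start(), then stops.
class SelfStoppingService implements MiniService {
  private final CountDownLatch stopped = new CountDownLatch(1);
  private final Throwable simulatedFailure;
  private volatile Throwable failureCause;

  SelfStoppingService(Throwable simulatedFailure) {
    this.simulatedFailure = simulatedFailure;
  }

  public void init(Properties conf) { }

  public void start() {
    Thread worker = new Thread(() -> {
      failureCause = simulatedFailure; // record any failure of the work
      stopped.countDown();             // the equivalent of calling stop()
    });
    worker.start();
  }

  public boolean waitForServiceToStop(long timeoutMillis) {
    try {
      return stopped.await(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return true;
    }
  }

  public Throwable getFailureCause() {
    return failureCause;
  }
}
```

| <p> |
| The worker thread created in {@code start()} is what lets the launcher's wait |
| return: a service that never signals its own stop keeps the loop running, |
| which is the long-lived daemon case. |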
| |
| For a service to be fully compatible with this launch model, it must |
| <ul> |
| <li>Start worker threads, processes and executors in its |
| {@link org.apache.hadoop.service.Service#start()} method</li> |
| <li>Terminate itself via a call to |
| {@link org.apache.hadoop.service.Service#stop()} |
| in one of these asynchronous methods.</li> |
| </ul> |
| |
| If a service does not stop itself, <i>ever</i>, then it can be launched |
| as a long-lived daemon. |
| The service launcher will never terminate, but neither will the service. |
| The service launcher does register signal handlers to catch {@code kill} |
| and control-C signals —calling {@code stop()} on the service when |
| signaled. |
| This means that a daemon service <i>may</i> get a warning and time to shut |
| down. |
| |
| <p> |
| To summarize: provided a service launches its long-lived threads in its Service |
| {@code start()} method, the service launcher can create it, configure it |
| and start it, triggering shutdown when signaled. |
| |
| What these services cannot do is get at the command line parameters or easily |
| propagate exit codes (there is a way, covered later). These features require |
| some extensions to the base {@code Service} interface: <i>the Launchable |
| Service</i>. |
| |
| <h2>Launching a Launchable YARN Service</h2> |
| |
| A Launchable YARN Service is a YARN service which implements the interface |
| {@link org.apache.hadoop.service.launcher.LaunchableService}. |
| <p> |
| It adds two methods to the service interface —and hence two new features: |
| |
| <ol> |
| <li>Access to the command line passed to the service launcher </li> |
| <li>A blocking {@code int execute()} method which can return the exit |
| code for the application.</li> |
| </ol> |
| |
| This design is ideal for implementing services which parse the command line, |
| and which execute short-lived applications. For example, end user |
| commands can be implemented as such services, thus integrating with YARN's |
| workflow and {@code YarnClient} client-side code. |
| |
| <p> |
| It can just as easily be used for implementing long-lived services that |
| parse the command line: it just becomes the responsibility of the |
| service to decide when to return from the {@code execute()} method. |
| It doesn't even need to {@code stop()} itself; the launcher will handle |
| that if necessary. |
| <p> |
| The {@link org.apache.hadoop.service.launcher.LaunchableService} interface |
| extends {@link org.apache.hadoop.service.Service} with two new methods. |
| |
| <p> |
| {@link org.apache.hadoop.service.launcher.LaunchableService#bindArgs(Configuration, List)} |
| provides the {@code main(String args[])} arguments as a list, after any |
| processing by the Service Launcher to extract configuration file references. |
| This method <i>is called before |
| {@link org.apache.hadoop.service.Service#init(Configuration)}.</i> |
| This is by design: it allows the arguments to be parsed before the service is |
| initialized, thus allowing services to tune their configuration data before |
| passing it to any superclass in that {@code init()} method. |
| To make this operation even simpler, the |
| {@link org.apache.hadoop.conf.Configuration} that is to be passed in |
| is provided as an argument. |
| This reference passed in is the initial configuration for this service; |
| the one that will be passed to the init operation. |
| |
| In |
| {@link org.apache.hadoop.service.launcher.LaunchableService#bindArgs(Configuration, List)}, |
| a Launchable Service may manipulate this configuration by setting or removing |
| properties. It may also create a new {@code Configuration} instance |
| which may be needed to trigger the injection of HDFS or YARN resources |
| into the default resources of all Configurations. |
| If the return value of the method call is a configuration |
| reference (as opposed to a null value), the returned value becomes that |
| passed in to the {@code init()} method. |
| <p> |
| After the {@code bindArgs()} processing, the service's {@code init()} |
| and {@code start()} methods are called, as usual. |
| <p> |
| At this point, rather than block waiting for the service to terminate (as |
| is done for a basic service), the method |
| {@link org.apache.hadoop.service.launcher.LaunchableService#execute()} |
| is called. |
| This is a method expected to block until completed, returning the intended |
| application exit code of the process when it does so. |
| <p> |
| After this {@code execute()} operation completes, the |
| service is stopped and exit codes generated. Any exception raised |
| during the {@code execute()} method takes priority over any exit codes |
| returned by the method. This allows services to signal failures simply |
| by raising exceptions with exit codes. |
| <p> |
| |
| <p> |
| To view the workflow in sequence, it is: |
| <ol> |
| <li>(prepare configuration files —covered later)</li> |
| <li>instantiate service via its empty or string constructor</li> |
| <li>call {@link org.apache.hadoop.service.launcher.LaunchableService#bindArgs(Configuration, List)}</li> |
| <li>call {@link org.apache.hadoop.service.Service#init(Configuration)} with the existing config, |
| or any new one returned by |
| {@link org.apache.hadoop.service.launcher.LaunchableService#bindArgs(Configuration, List)}</li> |
| <li>call {@link org.apache.hadoop.service.Service#start()}</li> |
| <li>call {@link org.apache.hadoop.service.launcher.LaunchableService#execute()}</li> |
| <li>call {@link org.apache.hadoop.service.Service#stop()}</li> |
| <li>The return code from |
| {@link org.apache.hadoop.service.launcher.LaunchableService#execute()} |
| becomes the exit code of the process, unless overridden by an exception.</li> |
| <li>If an exception was raised in this workflow: propagate it</li> |
| <li>If an exception was recorded in |
| {@link org.apache.hadoop.service.Service#getFailureCause()} |
| while the service was running: propagate it.</li> |
| </ol> |
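| |
| <p> |
| As an illustration of this ordering, here is a small self-contained sketch. |
| {@code MiniLaunchable}, {@code CountArgs} and {@code LaunchableSketch} are |
| hypothetical names, {@code java.util.Properties} stands in for |
| {@code Configuration}, and the stand-in methods declare no checked exceptions; |
| the real interface may differ in these details. |

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

class LaunchableSketch {

  // Drives the lifecycle in the order listed above and returns the exit code.
  static int launch(MiniLaunchable service, Properties conf, List<String> args) {
    Properties bound = service.bindArgs(conf, args);
    // A non-null return value replaces the configuration passed to init().
    Properties initConf = (bound != null) ? bound : conf;
    service.init(initConf);
    service.start();
    try {
      // Blocks until the service has finished its work.
      return service.execute();
    } finally {
      // The launcher stops the service, even if execute() threw.
      service.stop();
    }
  }

  public static void main(String[] args) {
    int code = launch(new CountArgs(), new Properties(), Arrays.asList("a", "b"));
    System.out.println("exit code: " + code);
  }
}

// Hypothetical stand-in for LaunchableService; not the Hadoop interface.
interface MiniLaunchable {
  Properties bindArgs(Properties conf, List<String> args);
  void init(Properties conf);
  void start();
  int execute();
  void stop();
}

// Example service: records each lifecycle call and exits with the arg count.
class CountArgs implements MiniLaunchable {
  final List<String> calls = new ArrayList<>();
  private int argCount;

  public Properties bindArgs(Properties conf, List<String> args) {
    calls.add("bindArgs");
    argCount = args.size();
    // Tune the configuration before init() sees it.
    conf.setProperty("arg.count", Integer.toString(argCount));
    return null; // keep using the configuration that was passed in
  }

  public void init(Properties conf) { calls.add("init"); }

  public void start() { calls.add("start"); }

  public int execute() {
    calls.add("execute");
    return argCount;
  }

  public void stop() { calls.add("stop"); }
}
```

| <p> |
| In this sketch an exception thrown by {@code execute()} propagates to the |
| caller after {@code stop()} has run, so it replaces the return code. |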
| |
| |
| <h2>Exit Codes and Exceptions</h2> |
| |
| <p> |
| For a basic service, the return code is 0 unless an exception |
| was raised. |
| <p> |
| For a {@link org.apache.hadoop.service.launcher.LaunchableService}, the return |
| code is the number returned from the |
| {@link org.apache.hadoop.service.launcher.LaunchableService#execute()} |
| operation, again, unless overridden by an exception. |
| |
| <p> |
| Exceptions are converted into exit codes, but rather than simply |
| have a "something went wrong" exit code, exceptions <i>may</i> |
| provide exit codes which will be extracted and used as the return code. |
| This enables Launchable Services to use exceptions as a way |
| of returning error codes to signal failures and for |
| normal Services to return any error code at all. |
| |
| <p> |
| Any exception which implements the |
| {@link org.apache.hadoop.util.ExitCodeProvider} |
| interface is considered to be a provider of the exit code: the method |
| {@link org.apache.hadoop.util.ExitCodeProvider#getExitCode()} |
| will be called on the caught exception to generate the return code. |
| This return code and the message in the exception will be used to |
| generate an instance of |
| {@link org.apache.hadoop.util.ExitUtil.ExitException} |
| which can be passed down to |
| {@link org.apache.hadoop.util.ExitUtil#terminate(ExitUtil.ExitException)} |
| to trigger a JVM exit. The initial exception will be used as the cause |
| of the {@link org.apache.hadoop.util.ExitUtil.ExitException}. |
| |
| <p> |
| If the exception is already an instance or subclass of |
| {@link org.apache.hadoop.util.ExitUtil.ExitException}, it is passed |
| directly to |
| {@link org.apache.hadoop.util.ExitUtil#terminate(ExitUtil.ExitException)} |
| without any conversion. |
| One such subclass, |
| {@link org.apache.hadoop.service.launcher.ServiceLaunchException} |
| may be useful: it includes formatted exception message generation. |
| It also declares that it extends the |
| {@link org.apache.hadoop.service.launcher.LauncherExitCodes} |
| interface listing common exception codes. These are exception codes |
| that can be raised by the {@link org.apache.hadoop.service.launcher.ServiceLauncher} |
| itself to indicate problems during parsing the command line, creating |
| the service instance and the like. There are also some common exit codes |
| for Hadoop/YARN service failures, such as |
| {@link org.apache.hadoop.service.launcher.LauncherExitCodes#EXIT_UNAUTHORIZED}. |
| Note that {@link org.apache.hadoop.util.ExitUtil.ExitException} itself |
| implements {@link org.apache.hadoop.util.ExitCodeProvider#getExitCode()}. |
| |
| <p> |
| If an exception does not implement |
| {@link org.apache.hadoop.util.ExitCodeProvider#getExitCode()}, |
| it will be wrapped in an {@link org.apache.hadoop.util.ExitUtil.ExitException} |
| with the exit code |
| {@link org.apache.hadoop.service.launcher.LauncherExitCodes#EXIT_EXCEPTION_THROWN}. |
| |
| <p> |
| To view the exit code extraction in sequence, it is: |
| <ol> |
| <li>If no exception was triggered by a basic service, a |
| {@link org.apache.hadoop.service.launcher.ServiceLaunchException} with an |
| exit code of 0 is created.</li> |
| |
| <li>For a LaunchableService, the exit code is the result of {@code execute()}. |
| Again, a {@link org.apache.hadoop.service.launcher.ServiceLaunchException} |
| with a return code of 0 is created. |
| </li> |
| |
| <li>Otherwise, if the exception is an instance of {@code ExitException}, |
| it is returned as the service terminating exception.</li> |
| |
| <li>If the exception implements {@link org.apache.hadoop.util.ExitCodeProvider}, |
| its exit code and {@code getMessage()} value become the exit exception.</li> |
| |
| <li>Otherwise, it is wrapped as a |
| {@link org.apache.hadoop.service.launcher.ServiceLaunchException} |
| with the exit code |
| {@link org.apache.hadoop.service.launcher.LauncherExitCodes#EXIT_EXCEPTION_THROWN} |
| to indicate that an exception was thrown.</li> |
| |
| <li>This is finally passed to |
| {@link org.apache.hadoop.util.ExitUtil#terminate(ExitUtil.ExitException)}, |
| by way of |
| {@link org.apache.hadoop.service.launcher.ServiceLauncher#exit(ExitUtil.ExitException)}; |
| a method designed to allow subclasses to override for testing.</li> |
| |
| <li>The {@link org.apache.hadoop.util.ExitUtil} class then terminates the JVM |
| with the specified exit code, printing the {@code toString()} value |
| of the exception if the return code is non-zero.</li> |
| </ol> |
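| |
| <p> |
| The conversion rules for a caught exception can be sketched as follows. All |
| type names here ({@code SketchExitException}, {@code SketchExitCodeProvider}, |
| {@code QuotaException}) are hypothetical stand-ins written for this example, |
| and the numeric value given to {@code EXIT_EXCEPTION_THROWN} is a placeholder |
| rather than a value taken from |
| {@link org.apache.hadoop.service.launcher.LauncherExitCodes}. |

```java
class ExitCodeSketch {

  // Placeholder value for this sketch; see LauncherExitCodes for the real one.
  static final int EXIT_EXCEPTION_THROWN = 50;

  // Converts any Throwable into the exception that carries the exit code.
  static SketchExitException convert(Throwable thrown) {
    if (thrown instanceof SketchExitException) {
      // Already an exit exception: use it without any conversion.
      return (SketchExitException) thrown;
    }
    int code = EXIT_EXCEPTION_THROWN;
    if (thrown instanceof SketchExitCodeProvider) {
      // The exception supplies its own exit code.
      code = ((SketchExitCodeProvider) thrown).getExitCode();
    }
    // Wrap, keeping the message and recording the original as the cause.
    SketchExitException wrapped =
        new SketchExitException(code, thrown.getMessage());
    wrapped.initCause(thrown);
    return wrapped;
  }

  public static void main(String[] args) {
    SketchExitException result = convert(new QuotaException("over quota"));
    System.out.println(result.getExitCode() + ": " + result.getMessage());
  }
}

// Hypothetical stand-in for ExitCodeProvider.
interface SketchExitCodeProvider {
  int getExitCode();
}

// Hypothetical stand-in for ExitUtil.ExitException.
class SketchExitException extends RuntimeException
    implements SketchExitCodeProvider {
  private final int exitCode;

  SketchExitException(int exitCode, String message) {
    super(message);
    this.exitCode = exitCode;
  }

  public int getExitCode() {
    return exitCode;
  }
}

// An application exception that chooses its own exit code.
class QuotaException extends Exception implements SketchExitCodeProvider {
  QuotaException(String message) {
    super(message);
  }

  public int getExitCode() {
    return 7;
  }
}
```

| <p> |
| Checking for the exit exception type first is what avoids double wrapping: |
| only exceptions that are not already exit exceptions get a new wrapper. |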
| |
| This process may seem convoluted, but it is designed to allow any exception |
| in the Hadoop exception hierarchy to generate exit codes, |
| and to minimize the amount of exception wrapping which takes place. |
| |
| <h2>Interrupt Handling</h2> |
| |
| The Service Launcher has a helper class, |
| {@link org.apache.hadoop.service.launcher.InterruptEscalator} |
| to handle the standard SIGTERM signal (as sent by {@code kill}) and control-C signals. |
| This class registers for signal callbacks from these signals, and, |
| when received, attempts to stop the service in a limited period of time. |
| It then triggers a JVM shutdown by way of |
| {@link org.apache.hadoop.util.ExitUtil#terminate(int, String)}. |
| <p> |
| If a second signal is received, the |
| {@link org.apache.hadoop.service.launcher.InterruptEscalator} |
| reacts by triggering an immediate JVM halt, invoking |
| {@link org.apache.hadoop.util.ExitUtil#halt(int, String)}. |
| This escalation process is designed to address the situation in which |
| a shutdown-hook can block, yet the caller (such as an init.d daemon) |
| wishes to kill the process. |
| The shutdown script should repeat the kill signal after a chosen time period, |
| to trigger the more aggressive process halt. The exit code will always be |
| {@link org.apache.hadoop.service.launcher.LauncherExitCodes#EXIT_INTERRUPTED}. |
| <p> |
| The {@link org.apache.hadoop.service.launcher.ServiceLauncher} also registers |
| a {@link org.apache.hadoop.service.launcher.ServiceShutdownHook} with the |
| Hadoop shutdown hook manager, unregistering it afterwards. This hook will |
| stop the service if a shutdown request is received, so ensuring that |
| if the JVM is exited by any thread, an attempt to shut down the service |
| will be made. |
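| |
| <p> |
| The two-stage escalation can be illustrated with a small sketch. It is an |
| assumption-laden model, not the {@code InterruptEscalator} implementation: |
| {@code EscalationSketch} is a hypothetical class, signals are simulated by |
| calling {@code interrupted()} directly, and the decisions are recorded in a |
| list instead of terminating or halting the JVM. |

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

class EscalationSketch {
  private final AtomicBoolean signalAlreadyReceived = new AtomicBoolean(false);
  private final Runnable stopService;
  private final long shutdownTimeMillis;

  // Records decisions instead of really terminating or halting the JVM.
  final List<String> actions = Collections.synchronizedList(new ArrayList<>());

  // shutdownTimeMillis must be greater than zero (zero would wait forever).
  EscalationSketch(Runnable stopService, long shutdownTimeMillis) {
    this.stopService = stopService;
    this.shutdownTimeMillis = shutdownTimeMillis;
  }

  // Called once for every signal delivered to the process.
  void interrupted() {
    if (!signalAlreadyReceived.compareAndSet(false, true)) {
      // Second signal: stop waiting and halt straight away.
      actions.add("halt");
      return;
    }
    // First signal: stop the service in another thread, with a time limit.
    Thread stopper = new Thread(stopService);
    stopper.setDaemon(true);
    stopper.start();
    try {
      stopper.join(shutdownTimeMillis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    actions.add(stopper.isAlive()
        ? "terminate after timeout"
        : "terminate after stop");
  }

  public static void main(String[] args) {
    EscalationSketch escalator = new EscalationSketch(() -> { }, 1000);
    escalator.interrupted(); // first signal: graceful stop
    escalator.interrupted(); // second signal: escalate
    System.out.println(escalator.actions);
  }
}
```

| <p> |
| Because the first call blocks only in {@code join()} and holds no lock, a |
| second signal arriving on another thread during a slow stop still reaches |
| the halt branch. |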
| |
| |
| <h2>Configuration class creation</h2> |
| |
| The Configuration class used to initialize a service is a basic |
| {@link org.apache.hadoop.conf.Configuration} instance. As the launcher is |
| the entry point for an application, this implies that the HDFS, YARN or other |
| default configurations will not have been forced in through the constructors |
| of {@code HdfsConfiguration} or {@code YarnConfiguration}. |
| <p> |
| What the launcher does do is use reflection to try and create instances of |
| these classes simply to force in the common resources. If the classes are |
| not on the classpath this fact will be logged. |
| <p> |
| Applications may consider it essential to either force load in the relevant |
| configuration, or pass it down to the service being created, in which |
| case further measures may be needed. |
| |
| <p><b>1: Creation in an extended {@code ServiceLauncher}</b> |
| |
| <p> |
| Subclass the Service launcher and override its |
| {@link org.apache.hadoop.service.launcher.ServiceLauncher#createConfiguration()} |
| method with one that creates the right configuration. |
| This is good if a single |
| launcher can be created for all services launched by a module, such as |
| HDFS or YARN. It does imply a dedicated script to invoke the custom |
| {@code main()} method. |
| |
| <p><b>2: Creation in {@code bindArgs()}</b> |
| |
| <p> |
| In |
| {@link org.apache.hadoop.service.launcher.LaunchableService#bindArgs(Configuration, List)}, |
| a new configuration is created: |
| |
| <pre> |
| public Configuration bindArgs(Configuration config, List&lt;String&gt; args) |
| throws Exception { |
| Configuration newConf = new YarnConfiguration(config); |
| return newConf; |
| } |
| </pre> |
| |
| This guarantees a configuration of the right type is generated for all |
| instances created via the service launcher. It does imply that this is |
| expected to be the only way that services will be launched. |
| |
| <p><b>3: Creation in {@code serviceInit()}</b> |
| |
| <pre> |
| protected void serviceInit(Configuration conf) throws Exception { |
| super.serviceInit(new YarnConfiguration(conf)); |
| } |
| </pre> |
| |
| <p> |
| This is a strategy used by many existing YARN services, and is ideal for |
| services which do not implement the LaunchableService interface. Its one |
| weakness is that the configuration is now private to that instance. Some |
| YARN services use a single shared configuration instance as a way of |
| propagating information between peer services in a |
| {@link org.apache.hadoop.service.CompositeService}. |
| While a dangerous practice, it does happen. |
| |
| |
| <b>Summary</b>: the ServiceLauncher makes a best-effort attempt to load the |
| standard Configuration subclasses, but does not fail if they are not present. |
| Services which require a specific subclass should follow one of the |
| strategies listed; |
| creation in {@code serviceInit()} is the recommended policy. |
| |
| <h2>Configuration file loading</h2> |
| |
| Before the service is bound to the CLI, the ServiceLauncher scans through |
| all the arguments after the first one, looking for instances of the |
| {@link org.apache.hadoop.service.launcher.ServiceLauncher#ARG_CONF} |
| argument pair: {@code --conf <file>}. The file must exist in the local |
| filesystem. |
| <p> |
| It will be loaded into the Hadoop configuration |
| class (the one created by the |
| {@link org.apache.hadoop.service.launcher.ServiceLauncher#createConfiguration()} |
| method). |
| If this argument is repeated multiple times, all configuration |
| files are merged with the latest file on the command line being the |
| last one to be applied. |
| <p> |
| All the {@code --conf <file>} argument pairs are stripped off |
| the argument list provided to the instantiated service; they get the |
| merged configuration, but not the commands used to create it. |
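| |
| <p> |
| A minimal sketch of this argument handling follows. It only models the |
| stripping of the argument pairs; {@code ConfArgsSketch} and |
| {@code CONF_OPTION} are hypothetical names, and loading and merging the files |
| themselves is left out. |

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ConfArgsSketch {

  // The option described above; the constant name is local to this sketch.
  static final String CONF_OPTION = "--conf";

  // Returns the arguments with every "--conf file" pair after the first
  // argument removed; the file names are appended to confFiles in order.
  static List<String> extractConfFiles(List<String> args,
                                       List<String> confFiles) {
    List<String> remaining = new ArrayList<>();
    for (int i = 0; i < args.size(); i++) {
      String arg = args.get(i);
      if (i > 0 && CONF_OPTION.equals(arg)) {
        if (i + 1 >= args.size()) {
          throw new IllegalArgumentException(
              "Missing file after " + CONF_OPTION);
        }
        // Consume the file name as well, so that it is not passed on.
        confFiles.add(args.get(++i));
      } else {
        remaining.add(arg);
      }
    }
    return remaining;
  }

  public static void main(String[] args) {
    List<String> confFiles = new ArrayList<>();
    List<String> remaining = extractConfFiles(
        Arrays.asList("org.example.MyService", "--conf", "a.xml", "run"),
        confFiles);
    System.out.println(remaining + " with configuration files " + confFiles);
  }
}
```

| <p> |
| Keeping the file names in command-line order matters because later files are |
| applied last, so their values win when the same property appears twice. |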
| |
| <h2>Utility Classes</h2> |
| |
| <ul> |
| |
| <li> |
| {@link org.apache.hadoop.service.launcher.IrqHandler}: registers interrupt |
| handlers using {@code sun.misc} APIs. |
| </li> |
| |
| <li> |
| {@link org.apache.hadoop.service.launcher.ServiceLaunchException}: a |
| subclass of {@link org.apache.hadoop.util.ExitUtil.ExitException} which |
| takes a {@code String.format()}-style format string and a list of arguments to create |
| the exception text. |
| </li> |
| |
| </ul> |
| */ |
| |
| |
| package org.apache.hadoop.service.launcher; |
| |
| import java.util.List; |
| |
| import org.apache.hadoop.conf.Configuration; |
| import org.apache.hadoop.util.ExitUtil; |