 <?xml version="1.0" encoding="iso-8859-1"?>
 <!--
  Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements.  See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License.  You may obtain a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
 -->
 <document xmlns="http://maven.apache.org/XDOC/2.0"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
 xsi:schemaLocation="http://maven.apache.org/XDOC/2.0 http://maven.apache.org/xsd/xdoc-2.0.xsd">
   <properties></properties>
   <body>
     <section name="Whirr&#8482; Frequently Asked Questions"></section>
     <p>
       <a name="how-do-i-find-my-cloud-credentials"></a>
     </p>

     <subsection name="How do I find my cloud credentials?"></subsection>

     <p>On EC2:</p>
     <ol style="list-style-type: decimal">
       <li>Go to
       <a class="externalLink"
       href="http://aws-portal.amazon.com/gp/aws/developer/account/index.html?action=access-key">
       http://aws-portal.amazon.com/gp/aws/developer/account/index.html?action=access-key</a> </li>
       <li>Log in, if prompted</li>
       <li>Find your Access Key ID and Secret Access Key in the "Access Credentials" section, under
       the "Access Keys" tab. You will have to click "Show" to see the text of your secret access
       key.</li>
     </ol>
     <p>Another good resource is
     <a class="externalLink" href="http://alestic.com/2009/11/ec2-credentials">Understanding Access
     Credentials for AWS/EC2</a> by Eric Hammond.</p>
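     <p>Once you have the two values, one way to pass them to Whirr is through a properties file.
     The following is a minimal sketch, assuming the
     <tt>whirr.provider</tt>,
     <tt>whirr.identity</tt> and
     <tt>whirr.credential</tt> properties described in the
     <a href="configuration-guide.html">configuration guide</a>. The provider name and the
     environment variable names are assumptions and may differ in your version or setup.</p>
     <source>
 # Assumed provider name for EC2; check the configuration guide for your version
 whirr.provider=aws-ec2
 # Access Key ID and Secret Access Key, read from (assumed) environment variables
 whirr.identity=${env:AWS_ACCESS_KEY_ID}
 whirr.credential=${env:AWS_SECRET_ACCESS_KEY}
 </source>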

     <subsection name="Can I specify my own private key?"></subsection>

     <p>Yes, by setting
     <tt>whirr.private-key-file</tt> (or
     <tt>--private-key-file</tt> on the command line). You should also set
     <tt>whirr.public-key-file</tt> (
     <tt>--public-key-file</tt> ) at the same time.</p>
     <p>Private keys must not have a passphrase associated with them. You can check this with:</p>
     <source>grep ENCRYPTED ~/.ssh/id_rsa</source>
     <p>If there is no passphrase then there will be no match.</p>
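     <p>As an illustration, the two properties might appear together in a properties file as
     follows. The key file names are hypothetical, and the
     <tt>${sys:user.home}</tt> interpolation is assumed to be available in your version.</p>
     <source>
 # Hypothetical key pair created without a passphrase
 whirr.private-key-file=${sys:user.home}/.ssh/id_rsa_whirr
 whirr.public-key-file=${sys:user.home}/.ssh/id_rsa_whirr.pub
 </source>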

     <subsection name="How do I access my cluster from a different network?"></subsection>

     <p>By default, access to clusters is restricted to the single IP address of the machine
     starting the cluster, as determined by
     <a class="externalLink" href="http://checkip.amazonaws.com/">Amazon's check IP service</a> .
     However, some networks report multiple origin IP addresses (e.g. they round-robin between them
     by connection), which may cause problems if the address used for later connections is different
     to the one reported at the time of the first connection.</p>
     <p>A related problem is when you wish to access the cluster from a different network to the one
     it was launched from.</p>
     <p>In these cases you can specify the IP addresses of the machines that may connect to the
     cluster by setting the
     <tt>client-cidrs</tt> property to a comma-separated list of
     <a class="externalLink" href="http://en.wikipedia.org/wiki/Classless_Inter-Domain_Routing">
     CIDR</a> blocks.</p>
     <p>For example,
     <tt>208.128.0.0/16,38.102.147.107/32</tt> would allow access from the
     <tt>208.128.0.0/16</tt> network (any address beginning with 208.128) and from the single IP
     address <tt>38.102.147.107</tt>.</p>
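     <p>In a properties file this might look like the sketch below. It assumes the property takes
     the same
     <tt>whirr.</tt> prefix as
     <tt>whirr.private-key-file</tt>; the addresses are the ones from the example above.</p>
     <source>
 # Allow one /16 network plus one single host (assumed whirr. prefix)
 whirr.client-cidrs=208.128.0.0/16,38.102.147.107/32
 </source>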

     <subsection name="How can I start a cluster in a particular location?"></subsection>

     <p>By default clusters are started in an arbitrary location (e.g. region or data center). You
     can control the location by setting
     <tt>location-id</tt> (see the
     <a href="configuration-guide.html">configuration guide</a> for details).</p>
     <p>For example, in EC2, setting
     <tt>location-id</tt> to
     <tt>us-east-1</tt> would start the cluster in the US-East region, while setting it to
     <tt>us-east-1a</tt> (note the final
     <tt>a</tt> ) would start the cluster in that particular availability zone (
     <tt>us-east-1a</tt> ) in the US-East region.</p>
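     <p>A possible properties-file form of the second case, assuming the property is written with
     the
     <tt>whirr.</tt> prefix:</p>
     <source>
 # Pin the cluster to one availability zone; use us-east-1 to pin only the region
 whirr.location-id=us-east-1a
 </source>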

     <subsection name="How can I use a custom image? How can I control the cloud hardware used?">
     </subsection>

     <p>The default image depends on the cloud provider, the hardware, and the service.
     Whirr tries to find an image with Ubuntu Server and at least 1024 MB of RAM.</p>
     <p>Use
     <tt>image-id</tt> to specify the image used, and
     <tt>hardware-id</tt> to specify the hardware. Both are cloud-specific.</p>
     <p>You can specify the amount of RAM in a cloud-agnostic way by setting a value for
     <tt>hardware-min-ram</tt>.</p>
     <p>In addition, on EC2 you need to set
     <tt>jclouds.ec2.ami-owners</tt> to include the AMI owner if it is
     not Amazon, Alestic, Canonical, or RightScale.</p>
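     <p>The settings above could be combined as in the following sketch for EC2. The AMI ID and
     owner account ID are placeholders, not real values, and the
     <tt>whirr.</tt> prefix and the region-qualified image ID form are assumptions to check against
     the
     <a href="configuration-guide.html">configuration guide</a>.</p>
     <source>
 # Placeholder AMI, written in an assumed region/ami-id form
 whirr.image-id=us-east-1/ami-00000000
 # An EC2 instance type, used as the cloud-specific hardware ID
 whirr.hardware-id=m1.large
 # Placeholder account ID of the AMI owner
 jclouds.ec2.ami-owners=123456789012
 # Cloud-agnostic alternative to hardware-id: require at least 2048 MB of RAM
 # whirr.hardware-min-ram=2048
 </source>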

     <subsection name="How do I log in to a node in the cluster?"></subsection>

     <p>On EC2, if you know the node's address you can do:</p>
     <source>ssh -i ~/.ssh/id_rsa &lt;whirr.cluster-user&gt;@host</source>
     <p>This assumes that you use the default private key; if this is not the case then specify the
     one you used at cluster launch.</p>
     <p>
     <tt>whirr.cluster-user</tt> defaults to the name of the local user running Whirr.</p>
     <p>
       <a name="how-can-i-modify-the-instance-installation-and-configuration-scripts"></a>
     </p>

     <subsection name="How can I modify the instance installation and configuration scripts?">
     </subsection>

     <p>The scripts to install and configure cloud instances are searched for on the classpath.</p>
     <p>(Note that in versions prior to 0.4.0 scripts were downloaded from S3 by default, and could
     be overridden by setting
     <tt>run-url-base</tt> . This property no longer has any effect, so you should instead use the
     approach explained below.)</p>
     <p>If you want to change the scripts then you can place a modified copy of the scripts in a
     <i>functions</i> directory in Whirr's installation directory. The original versions of the
     scripts can be found in
     <i>functions</i> directories in the source trees.</p>
     <p>For example, to override the Hadoop scripts, do the following:</p>
     <source>
 cd $WHIRR_HOME
 mkdir functions
 cp services/hadoop/src/main/resources/functions/* functions
 </source>
 <p>Then make your changes to the copies in
 <i>functions</i>.</p>
 <p>The first port of call for debugging the scripts that run on a cloud instance is the
 <i>whirr.log</i> file in the directory from which you launched the
 <i>whirr</i> CLI.</p>
 <p>The script output in this log file may be truncated, but you can see the complete output by
 logging into the node on which the script ran (see "How do I log in to a node in the cluster?"
 above) and looking in the
 <i>/tmp/bootstrap</i> or directories for the script itself, and the standard output and standard
 error logs.</p>

 <subsection name="How do I specify the service version and other service properties?">
 </subsection>

 <p>Some services have a property to control the version number of the software to be installed.
 This is typically achieved by setting the property
 <tt>whirr.&lt;service-name&gt;.tarball.url</tt> . Similarly, some services can have arbitrary
 service properties set.</p>
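 <p>For instance, a Hadoop version might be selected as sketched below. The property name follows
 the pattern above with
 <tt>hadoop</tt> as the service name; the URL is only illustrative, and whether a given service
 honours this property is an assumption to verify.</p>
 <source>
 # Illustrative tarball location; point this at the release you want installed
 whirr.hadoop.tarball.url=http://archive.apache.org/dist/hadoop/core/hadoop-0.20.2/hadoop-0.20.2.tar.gz
 </source>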
 <p>See the samples in the
 <i>recipes</i> directory for details for a particular service.</p>
 <p>In cases where neither of these configuration controls is supported, you may modify the
 scripts to install a particular version of the service, or to change the service properties
 from the defaults. See "How can I modify the instance installation and configuration scripts?"
 above for details on how to override the scripts.</p>

 <subsection name="How can I install custom packages?"></subsection>

 <p>You can install extra software by modifying the scripts that run on the cloud instances. See
 "How can I modify the instance installation and configuration scripts?" above.</p>

 <subsection name="How do I run Cloudera's Distribution for Hadoop?"></subsection>

 <p>You can run CDH rather than Apache Hadoop by running the Hadoop service and setting the
 <tt>whirr.hadoop.install-function</tt> and
 <tt>whirr.hadoop.configure-function</tt> properties. See the
 <i>recipes</i> directory in the distribution for samples.</p>
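 <p>A rough sketch of what such a configuration might contain is shown below. The two function
 names are assumptions and may not match your Whirr version; treat the CDH sample in the
 <i>recipes</i> directory as authoritative.</p>
 <source>
 # Assumed names of the CDH install and configure functions
 whirr.hadoop.install-function=install_cdh_hadoop
 whirr.hadoop.configure-function=configure_cdh_hadoop
 </source>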
 <p>
 <a name="other-services"></a>
 </p>

 <subsection name="How do I run a Cassandra/HBase/ZooKeeper cluster?"></subsection>

 <p>See the
 <i>recipes</i> directory in the distribution for samples.</p>
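 <p>As a rough idea of what those samples contain, a three-node ZooKeeper cluster might be
 described as follows. The cluster name is hypothetical, and
 <tt>whirr.cluster-name</tt> is assumed to be the property that sets it; the other services use
 their own role names, which the samples list.</p>
 <source>
 # Hypothetical cluster name
 whirr.cluster-name=myzkcluster
 # Three machines, each running the zookeeper role
 whirr.instance-templates=3 zookeeper
 </source>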

 <subsection name="How do I automatically tear down a cluster after a fixed time?"></subsection>

 <p>It's often convenient to terminate a cluster a fixed time after launch. This is the case for
 test clusters, for example. You can achieve this by scheduling the destroy command using the
 <tt>at</tt> command from your local machine.</p>
 <p>
 <b>WARNING: The machine from which you issued the
 <tt>at</tt> command must be running (and able to contact the cloud provider) at the time it
 runs.</b>
 </p>
 <source>% echo 'bin/whirr destroy-cluster --config hadoop.properties' | at 'now + 50 min'</source>
     <p>Note that issuing a
     <tt>shutdown</tt> command on an instance may simply stop the instance, which is not sufficient
     to fully terminate the instance, in which case you would continue to be charged for it. This is
     the case for EBS boot instances, for example.</p>
     <p>You can read more about this technique on
     <a class="externalLink" href="http://alestic.com/2010/09/ec2-instance-termination">Eric
     Hammond's blog</a> .</p>
     <p>Also, Mac OS X users might find
     <a class="externalLink"
     href="http://superuser.com/questions/43678/mac-os-x-at-command-not-working">this thread</a> a
     useful reference for the
     <tt>at</tt> command.</p>

     <subsection name="How do I start a machine without having a cluster role?"></subsection>

     <p>Sometimes you need to provision machines in the same cluster without having a specific role.
     For this you can use "noop" as a role name when specifying the instance templates.</p>
     <source>
 whirr.instance-templates=3 zookeeper,1 noop
 # will start three machines with zookeeper and one machine just with the OS
 </source>
   </body>
 </document>