---
title: Frequently Asked Questions
---
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
If you have questions that are not resolved below, you can subscribe and post to
the user mailing list by following the instructions [here](/support).
## Using PredictionIO
### Q: How do I check to see if various dependencies, such as Elasticsearch and HBase, are running?
You can run `$ pio status` from the terminal and it will return the status of various components that PredictionIO depends on.
- You should see the following message if everything is OK:
```
$ pio status
PredictionIO
Installed at: /home/vagrant/PredictionIO
Version: 0.8.6
Apache Spark
Installed at: /home/vagrant/PredictionIO/vendors/spark-1.2.0
Version: 1.2.0 (meets minimum requirement of 1.2.0)
Storage Backend Connections
Verifying Meta Data Backend
Verifying Model Data Backend
Verifying Event Data Backend
Test write Event Store (App Id 0)
2015-02-03 18:52:38,904 INFO hbase.HBLEvents - The table predictionio_eventdata:events_0 doesn't exist yet. Creating now...
2015-02-03 18:52:39,868 INFO hbase.HBLEvents - Removing table predictionio_eventdata:events_0...
(sleeping 5 seconds for all messages to show up...)
Your system is all ready to go.
```
- If you see the following error message, it usually means Elasticsearch is not running properly:
```
...
Storage Backend Connections
Verifying Meta Data Backend
...
Caused by: org.elasticsearch.client.transport.NoNodeAvailableException: None of the configured nodes are available: []
at org.elasticsearch.client.transport.TransportClientNodesService.ensureNodesAreAvailable(TransportClientNodesService.java:298)
...
Unable to connect to all storage backend(s) successfully. Please refer to error message(s) above. Aborting.
```
You can check whether an Elasticsearch process is running with `jps`.
Please see **How to start Elasticsearch** below.
- If you see the following error message, it usually means HBase is not running properly:
```
Storage Backend Connections
Verifying Meta Data Backend
Verifying Model Data Backend
Verifying Event Data Backend
2015-02-03 18:40:04,810 ERROR zookeeper.RecoverableZooKeeper - ZooKeeper exists failed after 1 attempts
2015-02-03 18:40:04,812 ERROR zookeeper.ZooKeeperWatcher - hconnection-0x1e4075ce, quorum=localhost:2181, baseZNode=/hbase Received unexpected KeeperException, re-throwing exception
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/hbaseid
...
2015-02-03 18:40:07,021 ERROR hbase.StorageClient - Failed to connect to HBase. Plase check if HBase is running properly.
2015-02-03 18:40:07,026 ERROR storage.Storage$ - Error initializing storage client for source HBASE
2015-02-03 18:40:07,027 ERROR storage.Storage$ - Can't connect to ZooKeeper
java.util.NoSuchElementException: None.get
...
Unable to connect to all storage backend(s) successfully. Please refer to error message(s) above. Aborting.
```
You can check whether any HBase-related processes are running with `jps`.
Please see **How to start HBase** below.
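In both cases, `jps` (shipped with the JDK) gives a quick view of which JVM daemons are up. A minimal sketch, assuming `jps` is on your PATH (exact main class names vary by version):

```shell
$ jps -l | grep -Ei 'elasticsearch|hmaster|hregionserver|hquorumpeer'
```

Elasticsearch nodes and HBase daemons are all JVM processes, so a single `jps` call can check both; no matching output means the service is not running.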
### Q: How to start Elasticsearch?
If you followed the [instructions](/install/install-sourcecode/) to install
PredictionIO, Elasticsearch would have been installed at
`PredictionIO/vendors/elasticsearch-x.y.z/` where x.y.z is the version number.
To start it, run:
```
$ ~/PredictionIO/vendors/elasticsearch-x.y.z/bin/elasticsearch
```
If you did not use the install script, start Elasticsearch from wherever it is installed.
INFO: It may take some time (15 seconds or so) for Elasticsearch to become ready after you start it (wait a bit before you run `pio status` again).
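If you want Elasticsearch to keep running after you close the terminal (for example, over SSH), one common approach is `nohup`. A sketch, assuming the quick-install layout (adjust the path and `x.y.z` to your installation):

```shell
# Start Elasticsearch detached from the terminal; output goes to es.log
$ nohup ~/PredictionIO/vendors/elasticsearch-x.y.z/bin/elasticsearch > es.log 2>&1 &
```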
### Q: How to start HBase?
If you followed the [instructions](/install/install-sourcecode/) to install
PredictionIO, HBase is installed at `~/PredictionIO/vendors/hbase-x.y.z/`
where x.y.z is the version number. To start it, run:
```
$ ~/PredictionIO/vendors/hbase-x.y.z/bin/start-hbase.sh
```
If you did not use the install script, start HBase from wherever it is installed.
INFO: It may take some time (15 seconds or so) for HBase to become ready after
you start it (wait a bit before you run `pio status` again).
## Problem with Event Server
### Q: How do I increase the JVM heap size of the Event Server?
Add the `JAVA_OPTS` environment variable to supply JVM options, e.g.
```
$ JAVA_OPTS=-Xmx16g bin/pio eventserver ...
```
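To pass more than one JVM option, quote the value so the shell treats it as a single assignment (the sizes below are only examples):

```shell
# Set both the initial and maximum heap size for the Event Server JVM
$ JAVA_OPTS="-Xms4g -Xmx16g" bin/pio eventserver ...
```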
## Engine Training
### Q: How to increase Spark driver program and worker executor memory size?
In general, the PredictionIO `bin/pio` script wraps Spark's `spark-submit`
script. You can specify many Spark configurations (e.g. executor memory, cores,
master URL) through it by supplying them as pass-through arguments at the end of
the `bin/pio` command.
If engine training seems stuck, it is possible that the executor does not have enough memory.
First, follow the [instructions here](http://spark.apache.org/docs/latest/spark-standalone.html) to start a standalone Spark cluster and get the master URL. If you used the provided quick install script to install PredictionIO, Spark is installed at `PredictionIO/vendors/spark-1.2.0/`, where you can run the Spark commands in `sbin/` as described in the Spark documentation. Then use the following train command to specify the executor memory (the default is only 512 MB) and the driver memory.
For example, the following command sets the Spark master to `spark://localhost:7077`
(the default URL of a standalone cluster), the driver memory to 16G, and the executor memory to 24G for `pio train`.
```
$ pio train -- --master spark://localhost:7077 --driver-memory 16G --executor-memory 24G
```
### Q: How to resolve "Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Serialized task 165:35 was 110539813 bytes, which exceeds max allowed: spark.akka.frameSize (10485760 bytes) - reserved (204800 bytes). Consider increasing spark.akka.frameSize or using broadcast variables for large values."?
A likely reason is that the local algorithm model is larger than the default frame size.
You can specify a larger value as a pass-through argument to `spark-submit` when you run `pio train`.
The following command increases the frame size to 1024 MB.
```
$ pio train -- --conf spark.akka.frameSize=1024
```
## Deploy Engine
### Q: How to increase heap space memory for "pio deploy"?
If you see the following error during `pio deploy`, it means there is not enough
heap memory.
```
...
[ERROR] [LocalFSModels] Java heap space
[ERROR] [OneForOneStrategy] None.get
...
```
To increase the heap space, specify the `-- --driver-memory <size>` pass-through
argument. For example, to set the driver memory to 8G when deploying the engine:
```
$ pio deploy -- --driver-memory 8G
```
## Building PredictionIO
### Q: How to resolve "Error: Could not find or load main class org.apache.predictionio.tools.Console" after ./make_distribution.sh?
```
$ bin/pio app
Error: Could not find or load main class org.apache.predictionio.tools.Console
```
When PredictionIO bumps a version, it creates another JAR file with the new
version number.
Delete everything but the latest `pio-assembly-<VERSION>.jar` in the
`$PIO_HOME/assembly` directory. For example:
```
PredictionIO$ cd assembly/
PredictionIO/assembly$ ls -al
total 197776
drwxr-xr-x 2 yipjustin yipjustin 4096 Nov 12 00:08 .
drwxr-xr-x 17 yipjustin yipjustin 4096 Nov 12 00:09 ..
-rw-r--r-- 1 yipjustin yipjustin 101184982 Nov 5 06:05 pio-assembly-0.8.1-SNAPSHOT.jar
-rw-r--r-- 1 yipjustin yipjustin 101324859 Nov 12 00:09 pio-assembly-0.8.2.jar
PredictionIO/assembly$ rm pio-assembly-0.8.1-SNAPSHOT.jar
```
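If you prefer a one-liner, the following removes everything except the newest assembly JAR. A sketch (it deletes files, so double-check with a plain `ls -t` first; the `-r` flag is GNU `xargs`):

```shell
PredictionIO/assembly$ ls -t pio-assembly-*.jar | tail -n +2 | xargs -r rm
```

`ls -t` sorts newest first, `tail -n +2` selects everything after the first entry, and `xargs -r rm` deletes those older JARs (doing nothing when the list is empty).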
### Q: How to resolve ".......\[error\] (data/compile:compile) java.lang.AssertionError: assertion failed: java.lang.AutoCloseable" when ./make_distribution.sh?
PredictionIO only supports Java 8 or later. Please make sure you have the
correct Java version with this command:
```
$ javac -version
```
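If the reported version is older than 8, point your shell at a Java 8+ JDK before building. A sketch (the JDK path below is an example for Debian/Ubuntu; adjust it to your system):

```shell
$ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
$ export PATH="$JAVA_HOME/bin:$PATH"
$ javac -version
```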
## Engine Development
### Q: What's the difference between P- and L- prefixed classes and functions?
PredictionIO v0.8 is built on top of Spark, a massively scalable programming
framework. A Spark algorithm differs from a conventional single-machine
algorithm in that it uses the
[RDD](http://spark.apache.org/docs/1.0.1/programming-guide.html#resilient-distributed-datasets-rdds)
abstraction as its primary data type.
The PredictionIO framework natively supports both RDD-based algorithms and
traditional single-machine algorithms. Controllers prefixed with "P" (e.g.
PJavaDataSource, PJavaAlgorithm) work with RDDs, while "L" controllers are
traditional single-machine implementations.
## Running HBase
### Q: How to resolve 'Exception in thread "main" java.lang.NullPointerException at org.apache.hadoop.net.DNS.reverseDns(DNS.java:92)'?
HBase relies on reverse DNS being set up properly to function. If your network
configuration changes (such as when working on a laptop with public WiFi hotspots),
reverse DNS may stop functioning properly. You can
install a DNS server on your own computer. Some users have reported that using
[Google Public DNS](https://developers.google.com/speed/public-dns/) would also
solve the problem.
### Q: How to fix HBase issues after cleaning up a disk that was full?
You may receive error messages like `write error: No space left on device`
when the disk is full, and `pio status` may keep reporting errors even after
restarting PredictionIO services (due to
[an issue](https://issues.apache.org/jira/browse/ZOOKEEPER-1621) in ZooKeeper).
One quick fix is to delete the newest `snapshot.xxxxx` and `log.xxxxx` files in the
ZooKeeper data directory (e.g. `$HBASE_HOME/zookeeper/zookeeper_0/version-2`).
Restart all services with `pio-start-all`, and use `pio status` to check whether
your setup is good to go again.
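To identify the newest files before deleting anything, you can list the ZooKeeper data directory by modification time. A sketch (stop the services first and back the files up before removing them):

```shell
$ ls -t "$HBASE_HOME"/zookeeper/zookeeper_0/version-2 | head -n 4
```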
If you still have problems connecting to the event server, take a look at the
HBase dashboard to see if there are `regions under transition`. If so, try the
following:
1. Try `hbase hbck -repair` and `hbase hbck -repairHoles`. If it solves the
problem, you are all set. Otherwise, continue on.
2. Find out failing regions by `hbase hbck`.
```
...
Summary:
Table pio_event:events_1 is inconsistent.
Number of regions: 2
Deployed on: prediction.io,54829,1489213832255
...
2 inconsistencies detected.
```
3. Shut down the HBase process and delete the `recovered.edits` folders in the HBase
data directory (e.g. `$HBASE_HOME/hbase/data/pio_event/events_1` in this
example) for failing regions.
4. Run `hbase hbck -repairHoles` and restart all PredictionIO services.