native/TODO.txt - tomcat-connectors - Git at Google

   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.

 TODO for tomcat-connectors

 $Id$

 1) Optimize "distance"
 ======================

 Sorting the list of balanced workers by distance would be nice, but:
 How to combine the sorting with the offset implementation (especially
 useful for strategy BUSYNESS under low load).

 Local error states and likely other features will also break in case
 we do naive reordering.

 2) Reduce number of string comparisons in most_suitable
 ========================================================

 a) redirect/domains

 It would be easy to improve the redirect string b an integer, giving the
 index of the worker in the lb. Then lb would not need to search for the redirect worker.

 The same way, one could add a list with indizes to workers in the same domain.
 Whenever domain names are managed (init and status worker update) one would
 scan the worker list and update the index list.

 Finally one could have a list of workers, whose domain is the same as the redirect
 attribute of the worker, because that's also something we consider.

 What I'm not sure about, even in the existing code, is the locking between updates
 by the status worker and the process local information about the workers,
 especially in the case, when status updates a redirect or domain attribute.

 I would like to keep these attributes and the new index arrays process local,
 and the processes should find out about changes made by status to shm (redirect/domain)
 and then rebuild their data. No need to get these on every request from the shm,
 only the check for up-to-date should be made.

 b) exact matches for jvmRoutes

 Could we use hashes instead of string comparisons all the time?
 I'm not sure, if a good enough hash takes longer than a string comparison though.

 3) Code separation between factory, validate and init in lb
 ============================================================

 The factory contains:

         private_data->worker.retries = JK_RETRIES;
         private_data->s->recover_wait_time = WAIT_BEFORE_RECOVER;

 I think, this should move to validate() or init().
 It might even be obsolete, because in init, we already have:

     pThis->retries = jk_get_worker_retries(props, p->s->name,
     p->s->retries = pThis->retries;
     p->s->recover_wait_time = jk_get_worker_recover_timeout(props, p->s->name, WAIT_BEFORE_RECOVER);
     if (p->s->recover_wait_time < WAIT_BEFORE_RECOVER)
         p->s->recover_wait_time = WAIT_BEFORE_RECOVER;

 Then: In validate there is

                 p->lb_workers[i].s->error_time = 0;

 So shouldn't there also be

                 p->lb_workers[i].s->maintain_time = time(NULL);

 4) Logging
 ==========

 a) Allow logging of request url or uuid in jk log to ease matching with access log.

 b) Implement log rotation for IIS. (done in 1.2.31)

 c) Allow adding of log notes for IIS like we do with Apache.

 d) Add error type info to access log notes

 e) Refactor: Use the same code files for the request logging functions in Apache 1.3 and 2.0.

 f) Refactor: Use the same code files for piped logging in Apache 1.3 and 2.0.

 5) ajpget
 ==========

 Combine ajplib and Apache ab to an ajp13 commandline client ajpget.

 6) Parsing workers.properties
 =============================

 Parsing of workers.properties aditionally to just looking up attributes
 would help users to detect syntax errors in the file. At the moment
 no information will be logged, e.g. when attributes contain typos.

 Example: worker.list vs. worker.lists.

 7) Persisting workers.properties
 ================================

 Make workers.properties persist from inside status worker.

 Add additional properties file, that contains a journal of property changes done
 via the status worker. When initializing those overwrite the initial workers.properties.

 Update actions in the status worker will allow to optionally add a change to this journal.
 We can also add a comment with timestamp etc. to each journal line.

 8) Reduce number of uses of time(NULL)
 ======================================

 We use time(NULL) a lot. Since it only has resolution of a second,
 I'm asking myself, if we could update the actual time in only a few
 places and get time out of some variables when needed. The same does
 not hold true for millisecond time, but in several cases we use the time,
 it's not very critical, that it is exact. These cases are related to:

 Some of this is already been done, the remaining parts are:

 - last_access for usage against timeout value that is ~minutes
 - error_time for usage against retry timeout that is ~minutes
 - uri_worker_map checked for usage against JK_URIMAP_RELOAD=1 minute

 So I think, it would suffice to set an actual time at the beginning of
 the request/response cycle (used by everything before the request is being
 sent over the socket) and maybe after the response shows up/ an error occurs
 (for everything else, if there is).

 For which cases would it be OK, to use the time before sending to TC:
 - uri_worker_map "checked" (uri map lookup starts early)
 - setting/testing last_access in
   - jk_ajp_common.c:ajp_connect_to_endpoint()
   - jk_ajp_common.c:ajp_get_endpoint()
   - jk_ajp_common.c:ajp_maintain()

 What about the others:
 - setting last_access in init should use the actual time in
   jk_ajp_common.c:ajp_create_endpoint_cache()

 - setting last_access again after the service could also use the
   actual time in jk_ajp_common.c:ajp_done()
 - setting error_time should better use the actual time
   jk_lb_worker.c service(): rec->s->error_time = time(NULL);

 The last two cases could again use the same time, which then would be needed
 to be generated at the end or directly after service.

 9) Access/Modification Time in shm
 ==================================

 a) [Discussion] What will this generally be used for? At the moment,
 only jk_status "uses" it, but it only sets the values, it never asks for them.

 b) [Improvement, minor] jk_shm_set_workers_time() implicitly calls
 jk_shm_sync_access_time(), but jk_status does:

             jk_shm_set_workers_time(time(NULL));
             /* Since we updated the config no need to reload
              * on the next request
              */
             jk_shm_sync_access_time();

 two times. So depending on the idea of the functionality of these calls,
 either set_workers_time and sync_access_time should be independently,
 or the second call in jk_status coulkd be removed.

 10) "Destroy" functionality
 ===========================

 [Hint] Destroy on a worker never seems to free shm,
 but I think that was already a flaw without shm.

 11) Locks against shm
 =====================

 It might be an interesting experiment to implement an improved locking structure.
 It looks like one would need in fact different types of locks.
 In shm we have as read/write information:

 Changed only by status worker:
 - redirect, domain, lb_factor, sticky_session, sticky_session_force,
   recover_wait_time, retries, status (is_disabled, is_stopped).

 These changes need some kind of reconfiguration in the threads after
 change and before the next request/response. Since changes are rare,
 here we would be better of, with a simple detect change and copy from
 shm to process procedure. status updates the data in shm and after that
 the time stamp on the shh. Each process checks the time stamp before
 doing a request, and when the time stamp changed it does a writer CS
 lock and updates it's local copy. All threads always do a reader CS
 lock when doing a request/response cycle. Reader CS locks are concurrent,
 writers are exclusive. So readers are not allowed, when the config data is being updated.

 Changed by the threads themselves (and via reset by the status worker):
 - counters needed by routing decisions (lb_value, readed, transferred, busy)
 - timers needed by maintenance functions (error_time, servic_time/maintain_time)
 - status is_busy, in_error_state
 - uncritical data with no influence on routing decisions (max_busy, elected, errors,
   in_recovering)

 Here again we could improve by using reader/writer locks. I have a
 tendency for the PESSIMISTIC side of locking, but I think we could
 shrink the code blocks that should be locked. At the monent they are
 pretty big (most of get_most_suitable_worker).

 Read-only: name and id.

 By the way: at several places we don't check for errors on getting the lock.

 12) Global locks
 ================

 We might want to make the lock technology choosable, like httpd does.
 E.g. on Solaris the default lock type if fcntl, and we can easily
 get an invalid EDEADLOCK for our jk_log_lock.

 The following pthread based non global locks are used:

 - one mutex for each AJP worker, synchronizing access to the connection
 pool, which exists per process

 - one mutex for each lb worker

 - a mutex used during dynamic update of uriworkermap.properties to
 prevent concurrent updates. Updates are done per process.

 - a mutex to prevent concurrent execution of the process local internal
 maintenance task

 - a mutex for access to the shared memory when changing or reading
 configuration parameters. That might be a little unsafe, because it
 actually should be a global mutex, not a process local, but those config
 changes are only done due to interaction with the status worker, so
 there's very little chance for unwanted concurrency here. All dynamic
 runtime data are already marked as being volatile.

 All except the last seem to be safe. The last might need some hybrid model
 using thread local mostly and process global when doing updates.

 See also: http://marc.info/?t=123394059800003&r=1&w=2

 13) Understand the exact behaviour of shm and restarts
 ======================================================

 Furthermore: rotatelogs (?) and gzip (mod_mime_magic) seem to close
 the (non-existing) shm file. Maybe a problem on some platforms?

 14) What I didn't yet check
 ===========================

 a) Correctness of is_busy handling

 b) Correctness of the reset values after reset by status worker

 c) What would be the exact behaviour, if shm does not work (memory case).
    Will this be a critical failure, or will we only experience a
    degradation in routing decisions.

 d) How complete is mod_proxy_ajp/mod_proxy_balancer.
    Port changes from mod_jk to them.

 15) Status worker
 =================

 Allow managing pool and connection parameters. Add flags to
 pool and connections to signal workers and maintenance whether
 existing connections should be closed and renewed.

 Check completeness of attribute manageability for AJP workers.

 Check completeness of runtime data display, e.g.
 reset_time, recover_time, etc. Maybe also add "last error type".

 Work on a global display of process local data, e.g. state of the
 process local connection pools (sizes, num connected/idle).

 Rework the GUI:

 - Basic overview start page with links to the workers,
   maybe a checkbox to decide whether you want to see config data too.
 - Detail view (what type of error was last and when etc.), detail view
   for connection pools.

 16) URI mapping
 ===============

 Add more extensions?

 17) Connection Pool
 ===================

 How would a good global maximum count look like?
 Simply limit busyness? Soft limit (applies only to non-sticky requests)
 and a hard limit (applies to all requests)?

 18) IP V6
 =========

 There's a Bugzilla with a patch.

 19) IIS Chunked Encoding
 ========================

 Move from alternative build to default.

 What about the other ifdef'd features?

 20) Add REMORE_PORT as a default JkEnvVar
 =========================================

 ... and port to IIS to fix the getRemotePort() problem.

 21) Rework HowTo docs
 =====================

 22) Add better example config
 =============================

 23) Remove JNI worker
 =====================

 24) Watchdog reload of uriworkermap.properties
 ==============================================

 Questions about uriworkermap.properties watchdog reload (r745898):
 - shm lock needed?
 - Apache port (need to iterate over vhosts)

 25) Watchdog backend probing
 ============================

 Configurable probe URL to test backend, e.g. to decide about
 recovery instead of using random real requests.

 26) Commandline shm
 ===================

 Commandline tool to read data from shm file.

 27) Status worker property format
 =================================

 Check whether we return list properties as one line
 (because Java doesn't allow the same property key multiple
 times).

 Applies possibly to:

 "list",
 BALANCE_WORKERS,
 MOUNT_OF_WORKER,
 USER_OF_WORKER,
 GOOD_RATING_OF_WORKER,
 BAD_RATING_OF_WORKER,
 STATUS_FAIL_OF_WORKER,
	Licensed to the Apache Software Foundation (ASF) under one or more
	contributor license agreements. See the NOTICE file distributed with
	this work for additional information regarding copyright ownership.
	The ASF licenses this file to You under the Apache License, Version 2.0
	(the "License"); you may not use this file except in compliance with
	the License. You may obtain a copy of the License at

	http://www.apache.org/licenses/LICENSE-2.0

	Unless required by applicable law or agreed to in writing, software
	distributed under the License is distributed on an "AS IS" BASIS,
	WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
	See the License for the specific language governing permissions and
	limitations under the License.

	TODO for tomcat-connectors

	$Id$

	1) Optimize "distance"
	======================

	Sorting the list of balanced workers by distance would be nice, but:
	How to combine the sorting with the offset implementation (especially
	useful for strategy BUSYNESS under low load).

	Local error states and likely other features will also break in case
	we do naive reordering.

	2) Reduce number of string comparisons in most_suitable
	========================================================

	a) redirect/domains

	It would be easy to improve the redirect string b an integer, giving the
	index of the worker in the lb. Then lb would not need to search for the redirect worker.

	The same way, one could add a list with indizes to workers in the same domain.
	Whenever domain names are managed (init and status worker update) one would
	scan the worker list and update the index list.

	Finally one could have a list of workers, whose domain is the same as the redirect
	attribute of the worker, because that's also something we consider.

	What I'm not sure about, even in the existing code, is the locking between updates
	by the status worker and the process local information about the workers,
	especially in the case, when status updates a redirect or domain attribute.

	I would like to keep these attributes and the new index arrays process local,
	and the processes should find out about changes made by status to shm (redirect/domain)
	and then rebuild their data. No need to get these on every request from the shm,
	only the check for up-to-date should be made.

	b) exact matches for jvmRoutes

	Could we use hashes instead of string comparisons all the time?
	I'm not sure, if a good enough hash takes longer than a string comparison though.

	3) Code separation between factory, validate and init in lb
	============================================================

	The factory contains:

	private_data->worker.retries = JK_RETRIES;
	private_data->s->recover_wait_time = WAIT_BEFORE_RECOVER;

	I think, this should move to validate() or init().
	It might even be obsolete, because in init, we already have:

	pThis->retries = jk_get_worker_retries(props, p->s->name,
	p->s->retries = pThis->retries;
	p->s->recover_wait_time = jk_get_worker_recover_timeout(props, p->s->name, WAIT_BEFORE_RECOVER);
	if (p->s->recover_wait_time < WAIT_BEFORE_RECOVER)
	p->s->recover_wait_time = WAIT_BEFORE_RECOVER;

	Then: In validate there is

	p->lb_workers[i].s->error_time = 0;

	So shouldn't there also be

	p->lb_workers[i].s->maintain_time = time(NULL);

	4) Logging
	==========

	a) Allow logging of request url or uuid in jk log to ease matching with access log.

	b) Implement log rotation for IIS. (done in 1.2.31)

	c) Allow adding of log notes for IIS like we do with Apache.

	d) Add error type info to access log notes

	e) Refactor: Use the same code files for the request logging functions in Apache 1.3 and 2.0.

	f) Refactor: Use the same code files for piped logging in Apache 1.3 and 2.0.

	5) ajpget
	==========

	Combine ajplib and Apache ab to an ajp13 commandline client ajpget.

	6) Parsing workers.properties
	=============================

	Parsing of workers.properties aditionally to just looking up attributes
	would help users to detect syntax errors in the file. At the moment
	no information will be logged, e.g. when attributes contain typos.

	Example: worker.list vs. worker.lists.

	7) Persisting workers.properties
	================================

	Make workers.properties persist from inside status worker.

	Add additional properties file, that contains a journal of property changes done
	via the status worker. When initializing those overwrite the initial workers.properties.

	Update actions in the status worker will allow to optionally add a change to this journal.
	We can also add a comment with timestamp etc. to each journal line.

	8) Reduce number of uses of time(NULL)
	======================================

	We use time(NULL) a lot. Since it only has resolution of a second,
	I'm asking myself, if we could update the actual time in only a few
	places and get time out of some variables when needed. The same does
	not hold true for millisecond time, but in several cases we use the time,
	it's not very critical, that it is exact. These cases are related to:

	Some of this is already been done, the remaining parts are:

	- last_access for usage against timeout value that is ~minutes
	- error_time for usage against retry timeout that is ~minutes
	- uri_worker_map checked for usage against JK_URIMAP_RELOAD=1 minute

	So I think, it would suffice to set an actual time at the beginning of
	the request/response cycle (used by everything before the request is being
	sent over the socket) and maybe after the response shows up/ an error occurs
	(for everything else, if there is).

	For which cases would it be OK, to use the time before sending to TC:
	- uri_worker_map "checked" (uri map lookup starts early)
	- setting/testing last_access in
	- jk_ajp_common.c:ajp_connect_to_endpoint()
	- jk_ajp_common.c:ajp_get_endpoint()
	- jk_ajp_common.c:ajp_maintain()

	What about the others:
	- setting last_access in init should use the actual time in
	jk_ajp_common.c:ajp_create_endpoint_cache()

	- setting last_access again after the service could also use the
	actual time in jk_ajp_common.c:ajp_done()
	- setting error_time should better use the actual time
	jk_lb_worker.c service(): rec->s->error_time = time(NULL);

	The last two cases could again use the same time, which then would be needed
	to be generated at the end or directly after service.

	9) Access/Modification Time in shm
	==================================

	a) [Discussion] What will this generally be used for? At the moment,
	only jk_status "uses" it, but it only sets the values, it never asks for them.

	b) [Improvement, minor] jk_shm_set_workers_time() implicitly calls
	jk_shm_sync_access_time(), but jk_status does:

	jk_shm_set_workers_time(time(NULL));
	/* Since we updated the config no need to reload
	* on the next request
	*/
	jk_shm_sync_access_time();

	two times. So depending on the idea of the functionality of these calls,
	either set_workers_time and sync_access_time should be independently,
	or the second call in jk_status coulkd be removed.

	10) "Destroy" functionality
	===========================

	[Hint] Destroy on a worker never seems to free shm,
	but I think that was already a flaw without shm.

	11) Locks against shm
	=====================

	It might be an interesting experiment to implement an improved locking structure.
	It looks like one would need in fact different types of locks.
	In shm we have as read/write information:

	Changed only by status worker:
	- redirect, domain, lb_factor, sticky_session, sticky_session_force,
	recover_wait_time, retries, status (is_disabled, is_stopped).

	These changes need some kind of reconfiguration in the threads after
	change and before the next request/response. Since changes are rare,
	here we would be better of, with a simple detect change and copy from
	shm to process procedure. status updates the data in shm and after that
	the time stamp on the shh. Each process checks the time stamp before
	doing a request, and when the time stamp changed it does a writer CS
	lock and updates it's local copy. All threads always do a reader CS
	lock when doing a request/response cycle. Reader CS locks are concurrent,
	writers are exclusive. So readers are not allowed, when the config data is being updated.

	Changed by the threads themselves (and via reset by the status worker):
	- counters needed by routing decisions (lb_value, readed, transferred, busy)
	- timers needed by maintenance functions (error_time, servic_time/maintain_time)
	- status is_busy, in_error_state
	- uncritical data with no influence on routing decisions (max_busy, elected, errors,
	in_recovering)

	Here again we could improve by using reader/writer locks. I have a
	tendency for the PESSIMISTIC side of locking, but I think we could
	shrink the code blocks that should be locked. At the monent they are
	pretty big (most of get_most_suitable_worker).

	Read-only: name and id.

	By the way: at several places we don't check for errors on getting the lock.

	12) Global locks
	================

	We might want to make the lock technology choosable, like httpd does.
	E.g. on Solaris the default lock type if fcntl, and we can easily
	get an invalid EDEADLOCK for our jk_log_lock.

	The following pthread based non global locks are used:

	- one mutex for each AJP worker, synchronizing access to the connection
	pool, which exists per process

	- one mutex for each lb worker

	- a mutex used during dynamic update of uriworkermap.properties to
	prevent concurrent updates. Updates are done per process.

	- a mutex to prevent concurrent execution of the process local internal
	maintenance task

	- a mutex for access to the shared memory when changing or reading
	configuration parameters. That might be a little unsafe, because it
	actually should be a global mutex, not a process local, but those config
	changes are only done due to interaction with the status worker, so
	there's very little chance for unwanted concurrency here. All dynamic
	runtime data are already marked as being volatile.

	All except the last seem to be safe. The last might need some hybrid model
	using thread local mostly and process global when doing updates.

	See also: http://marc.info/?t=123394059800003&r=1&w=2

	13) Understand the exact behaviour of shm and restarts
	======================================================

	Furthermore: rotatelogs (?) and gzip (mod_mime_magic) seem to close
	the (non-existing) shm file. Maybe a problem on some platforms?

	14) What I didn't yet check
	===========================

	a) Correctness of is_busy handling

	b) Correctness of the reset values after reset by status worker

	c) What would be the exact behaviour, if shm does not work (memory case).
	Will this be a critical failure, or will we only experience a
	degradation in routing decisions.

	d) How complete is mod_proxy_ajp/mod_proxy_balancer.
	Port changes from mod_jk to them.

	15) Status worker
	=================

	Allow managing pool and connection parameters. Add flags to
	pool and connections to signal workers and maintenance whether
	existing connections should be closed and renewed.

	Check completeness of attribute manageability for AJP workers.

	Check completeness of runtime data display, e.g.
	reset_time, recover_time, etc. Maybe also add "last error type".

	Work on a global display of process local data, e.g. state of the
	process local connection pools (sizes, num connected/idle).

	Rework the GUI:

	- Basic overview start page with links to the workers,
	maybe a checkbox to decide whether you want to see config data too.
	- Detail view (what type of error was last and when etc.), detail view
	for connection pools.

	16) URI mapping
	===============

	Add more extensions?

	17) Connection Pool
	===================

	How would a good global maximum count look like?
	Simply limit busyness? Soft limit (applies only to non-sticky requests)
	and a hard limit (applies to all requests)?

	18) IP V6
	=========

	There's a Bugzilla with a patch.

	19) IIS Chunked Encoding
	========================

	Move from alternative build to default.

	What about the other ifdef'd features?

	20) Add REMORE_PORT as a default JkEnvVar
	=========================================

	... and port to IIS to fix the getRemotePort() problem.

	21) Rework HowTo docs
	=====================

	22) Add better example config
	=============================

	23) Remove JNI worker
	=====================

	24) Watchdog reload of uriworkermap.properties
	==============================================

	Questions about uriworkermap.properties watchdog reload (r745898):
	- shm lock needed?
	- Apache port (need to iterate over vhosts)

	25) Watchdog backend probing
	============================

	Configurable probe URL to test backend, e.g. to decide about
	recovery instead of using random real requests.

	26) Commandline shm
	===================

	Commandline tool to read data from shm file.

	27) Status worker property format
	=================================

	Check whether we return list properties as one line
	(because Java doesn't allow the same property key multiple
	times).

	Applies possibly to:

	"list",
	BALANCE_WORKERS,
	MOUNT_OF_WORKER,
	USER_OF_WORKER,
	GOOD_RATING_OF_WORKER,
	BAD_RATING_OF_WORKER,
	STATUS_FAIL_OF_WORKER,