| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| |
| TODO for tomcat-connectors |
| |
| $Id$ |
| |
| 1) Optimize "distance" |
| ====================== |
| |
| Sorting the list of balanced workers by distance would be nice, but: |
| How to combine the sorting with the offset implementation (especially |
| useful for strategy BUSYNESS under low load). |
| |
| Local error states and likely other features will also break in case |
| we do naive reordering. |
| |
| 2) Reduce number of string comparisons in most_suitable |
| ======================================================== |
| |
| a) redirect/domains |
| |
| It would be easy to improve the redirect string b an integer, giving the |
| index of the worker in the lb. Then lb would not need to search for the redirect worker. |
| |
| The same way, one could add a list with indizes to workers in the same domain. |
| Whenever domain names are managed (init and status worker update) one would |
| scan the worker list and update the index list. |
| |
| Finally one could have a list of workers, whose domain is the same as the redirect |
| attribute of the worker, because that's also something we consider. |
| |
| What I'm not sure about, even in the existing code, is the locking between updates |
| by the status worker and the process local information about the workers, |
| especially in the case, when status updates a redirect or domain attribute. |
| |
| I would like to keep these attributes and the new index arrays process local, |
| and the processes should find out about changes made by status to shm (redirect/domain) |
| and then rebuild their data. No need to get these on every request from the shm, |
| only the check for up-to-date should be made. |
| |
| b) exact matches for jvmRoutes |
| |
| Could we use hashes instead of string comparisons all the time? |
| I'm not sure, if a good enough hash takes longer than a string comparison though. |
| |
| 3) Code separation between factory, validate and init in lb |
| ============================================================ |
| |
| The factory contains: |
| |
| private_data->worker.retries = JK_RETRIES; |
| private_data->s->recover_wait_time = WAIT_BEFORE_RECOVER; |
| |
| I think, this should move to validate() or init(). |
| It might even be obsolete, because in init, we already have: |
| |
| pThis->retries = jk_get_worker_retries(props, p->s->name, |
| p->s->retries = pThis->retries; |
| p->s->recover_wait_time = jk_get_worker_recover_timeout(props, p->s->name, WAIT_BEFORE_RECOVER); |
| if (p->s->recover_wait_time < WAIT_BEFORE_RECOVER) |
| p->s->recover_wait_time = WAIT_BEFORE_RECOVER; |
| |
| Then: In validate there is |
| |
| p->lb_workers[i].s->error_time = 0; |
| |
| So shouldn't there also be |
| |
| p->lb_workers[i].s->maintain_time = time(NULL); |
| |
| 4) Logging |
| ========== |
| |
| a) Allow logging of request url or uuid in jk log to ease matching with access log. |
| |
| b) Implement log rotation for IIS. (done in 1.2.31) |
| |
| c) Allow adding of log notes for IIS like we do with Apache. |
| |
| d) Add error type info to access log notes |
| |
| e) Refactor: Use the same code files for the request logging functions in Apache 1.3 and 2.0. |
| |
| f) Refactor: Use the same code files for piped logging in Apache 1.3 and 2.0. |
| |
| 5) ajpget |
| ========== |
| |
| Combine ajplib and Apache ab to an ajp13 commandline client ajpget. |
| |
| 6) Parsing workers.properties |
| ============================= |
| |
| Parsing of workers.properties aditionally to just looking up attributes |
| would help users to detect syntax errors in the file. At the moment |
| no information will be logged, e.g. when attributes contain typos. |
| |
| Example: worker.list vs. worker.lists. |
| |
| 7) Persisting workers.properties |
| ================================ |
| |
| Make workers.properties persist from inside status worker. |
| |
| Add additional properties file, that contains a journal of property changes done |
| via the status worker. When initializing those overwrite the initial workers.properties. |
| |
| Update actions in the status worker will allow to optionally add a change to this journal. |
| We can also add a comment with timestamp etc. to each journal line. |
| |
| 8) Reduce number of uses of time(NULL) |
| ====================================== |
| |
| We use time(NULL) a lot. Since it only has resolution of a second, |
| I'm asking myself, if we could update the actual time in only a few |
| places and get time out of some variables when needed. The same does |
| not hold true for millisecond time, but in several cases we use the time, |
| it's not very critical, that it is exact. These cases are related to: |
| |
| Some of this is already been done, the remaining parts are: |
| |
| - last_access for usage against timeout value that is ~minutes |
| - error_time for usage against retry timeout that is ~minutes |
| - uri_worker_map checked for usage against JK_URIMAP_RELOAD=1 minute |
| |
| So I think, it would suffice to set an actual time at the beginning of |
| the request/response cycle (used by everything before the request is being |
| sent over the socket) and maybe after the response shows up/ an error occurs |
| (for everything else, if there is). |
| |
| For which cases would it be OK, to use the time before sending to TC: |
| - uri_worker_map "checked" (uri map lookup starts early) |
| - setting/testing last_access in |
| - jk_ajp_common.c:ajp_connect_to_endpoint() |
| - jk_ajp_common.c:ajp_get_endpoint() |
| - jk_ajp_common.c:ajp_maintain() |
| |
| What about the others: |
| - setting last_access in init should use the actual time in |
| jk_ajp_common.c:ajp_create_endpoint_cache() |
| |
| - setting last_access again after the service could also use the |
| actual time in jk_ajp_common.c:ajp_done() |
| - setting error_time should better use the actual time |
| jk_lb_worker.c service(): rec->s->error_time = time(NULL); |
| |
| The last two cases could again use the same time, which then would be needed |
| to be generated at the end or directly after service. |
| |
| 9) Access/Modification Time in shm |
| ================================== |
| |
| a) [Discussion] What will this generally be used for? At the moment, |
| only jk_status "uses" it, but it only sets the values, it never asks for them. |
| |
| b) [Improvement, minor] jk_shm_set_workers_time() implicitly calls |
| jk_shm_sync_access_time(), but jk_status does: |
| |
| jk_shm_set_workers_time(time(NULL)); |
| /* Since we updated the config no need to reload |
| * on the next request |
| */ |
| jk_shm_sync_access_time(); |
| |
| two times. So depending on the idea of the functionality of these calls, |
| either set_workers_time and sync_access_time should be independently, |
| or the second call in jk_status coulkd be removed. |
| |
| 10) "Destroy" functionality |
| =========================== |
| |
| [Hint] Destroy on a worker never seems to free shm, |
| but I think that was already a flaw without shm. |
| |
| 11) Locks against shm |
| ===================== |
| |
| It might be an interesting experiment to implement an improved locking structure. |
| It looks like one would need in fact different types of locks. |
| In shm we have as read/write information: |
| |
| Changed only by status worker: |
| - redirect, domain, lb_factor, sticky_session, sticky_session_force, |
| recover_wait_time, retries, status (is_disabled, is_stopped). |
| |
| These changes need some kind of reconfiguration in the threads after |
| change and before the next request/response. Since changes are rare, |
| here we would be better of, with a simple detect change and copy from |
| shm to process procedure. status updates the data in shm and after that |
| the time stamp on the shh. Each process checks the time stamp before |
| doing a request, and when the time stamp changed it does a writer CS |
| lock and updates it's local copy. All threads always do a reader CS |
| lock when doing a request/response cycle. Reader CS locks are concurrent, |
| writers are exclusive. So readers are not allowed, when the config data is being updated. |
| |
| Changed by the threads themselves (and via reset by the status worker): |
| - counters needed by routing decisions (lb_value, readed, transferred, busy) |
| - timers needed by maintenance functions (error_time, servic_time/maintain_time) |
| - status is_busy, in_error_state |
| - uncritical data with no influence on routing decisions (max_busy, elected, errors, |
| in_recovering) |
| |
| Here again we could improve by using reader/writer locks. I have a |
| tendency for the PESSIMISTIC side of locking, but I think we could |
| shrink the code blocks that should be locked. At the monent they are |
| pretty big (most of get_most_suitable_worker). |
| |
| Read-only: name and id. |
| |
| By the way: at several places we don't check for errors on getting the lock. |
| |
| 12) Global locks |
| ================ |
| |
| We might want to make the lock technology choosable, like httpd does. |
| E.g. on Solaris the default lock type if fcntl, and we can easily |
| get an invalid EDEADLOCK for our jk_log_lock. |
| |
| The following pthread based non global locks are used: |
| |
| - one mutex for each AJP worker, synchronizing access to the connection |
| pool, which exists per process |
| |
| - one mutex for each lb worker |
| |
| - a mutex used during dynamic update of uriworkermap.properties to |
| prevent concurrent updates. Updates are done per process. |
| |
| - a mutex to prevent concurrent execution of the process local internal |
| maintenance task |
| |
| - a mutex for access to the shared memory when changing or reading |
| configuration parameters. That might be a little unsafe, because it |
| actually should be a global mutex, not a process local, but those config |
| changes are only done due to interaction with the status worker, so |
| there's very little chance for unwanted concurrency here. All dynamic |
| runtime data are already marked as being volatile. |
| |
| All except the last seem to be safe. The last might need some hybrid model |
| using thread local mostly and process global when doing updates. |
| |
| See also: http://marc.info/?t=123394059800003&r=1&w=2 |
| |
| 13) Understand the exact behaviour of shm and restarts |
| ====================================================== |
| |
| Furthermore: rotatelogs (?) and gzip (mod_mime_magic) seem to close |
| the (non-existing) shm file. Maybe a problem on some platforms? |
| |
| 14) What I didn't yet check |
| =========================== |
| |
| a) Correctness of is_busy handling |
| |
| b) Correctness of the reset values after reset by status worker |
| |
| c) What would be the exact behaviour, if shm does not work (memory case). |
| Will this be a critical failure, or will we only experience a |
| degradation in routing decisions. |
| |
| d) How complete is mod_proxy_ajp/mod_proxy_balancer. |
| Port changes from mod_jk to them. |
| |
| 15) Status worker |
| ================= |
| |
| Allow managing pool and connection parameters. Add flags to |
| pool and connections to signal workers and maintenance whether |
| existing connections should be closed and renewed. |
| |
| Check completeness of attribute manageability for AJP workers. |
| |
| Check completeness of runtime data display, e.g. |
| reset_time, recover_time, etc. Maybe also add "last error type". |
| |
| Work on a global display of process local data, e.g. state of the |
| process local connection pools (sizes, num connected/idle). |
| |
| Rework the GUI: |
| |
| - Basic overview start page with links to the workers, |
| maybe a checkbox to decide whether you want to see config data too. |
| - Detail view (what type of error was last and when etc.), detail view |
| for connection pools. |
| |
| 16) URI mapping |
| =============== |
| |
| Add more extensions? |
| |
| 17) Connection Pool |
| =================== |
| |
| How would a good global maximum count look like? |
| Simply limit busyness? Soft limit (applies only to non-sticky requests) |
| and a hard limit (applies to all requests)? |
| |
| 18) IP V6 |
| ========= |
| |
| There's a Bugzilla with a patch. |
| |
| 19) IIS Chunked Encoding |
| ======================== |
| |
| Move from alternative build to default. |
| |
| What about the other ifdef'd features? |
| |
| 20) Add REMORE_PORT as a default JkEnvVar |
| ========================================= |
| |
| ... and port to IIS to fix the getRemotePort() problem. |
| |
| 21) Rework HowTo docs |
| ===================== |
| |
| 22) Add better example config |
| ============================= |
| |
| 23) Remove JNI worker |
| ===================== |
| |
| 24) Watchdog reload of uriworkermap.properties |
| ============================================== |
| |
| Questions about uriworkermap.properties watchdog reload (r745898): |
| - shm lock needed? |
| - Apache port (need to iterate over vhosts) |
| |
| 25) Watchdog backend probing |
| ============================ |
| |
| Configurable probe URL to test backend, e.g. to decide about |
| recovery instead of using random real requests. |
| |
| 26) Commandline shm |
| =================== |
| |
| Commandline tool to read data from shm file. |
| |
| 27) Status worker property format |
| ================================= |
| |
| Check whether we return list properties as one line |
| (because Java doesn't allow the same property key multiple |
| times). |
| |
| Applies possibly to: |
| |
| "list", |
| BALANCE_WORKERS, |
| MOUNT_OF_WORKER, |
| USER_OF_WORKER, |
| GOOD_RATING_OF_WORKER, |
| BAD_RATING_OF_WORKER, |
| STATUS_FAIL_OF_WORKER, |
| |