Optimizing Threads

Once you have determined your traffic profile and your memory profile, you can tune your server by adjusting the parameters of the thread pool. Each Jetty HttpListener has a pool of threads from which it allocates threads to accepted connections. The following parameters configure the thread pool of each listener:

Parameter Comment
maxThreads The limit on the number of threads that can be allocated to connections for that HTTP listener. This effectively limits the number of simultaneous users of the server as well as the maximum memory usage.
minThreads The minimum number of unused threads to keep within the thread pool. A large number of unused threads allows a server to respond to a sudden increase in load with little latency. More importantly, an HTTP listener is considered to be low on resources once its pool cannot allocate minThreads unused threads without exceeding maxThreads.
maxIdleTimeMs The maximum time in milliseconds that a thread can be allocated to a connection without a request being received. This limits the duration of idle persistent connections.
lowResourcePersistTimeMs An alternative value for maxIdleTimeMs to be used when the listener is low on resources (see minThreads).
poolName If multiple HTTP listeners are used, those with the same pool name share the same thread pool. This avoids one listener running low on threads while another has idle threads.

Setting maxThreads

The primary objective of the maxThreads setting is to protect the server from excess resource utilization caused by high connection or request rates. Without a limit on the maximum number of threads, the server could accept an arbitrarily high load, which would eventually lead to one of the following failure modes:

  • Out of memory. Each accepted connection/thread consumes memory, and unlimited threads will eventually result in an OutOfMemoryError. Note that the memory allocated to the JVM can be increased to avoid this limit, but at some level physical memory will be exceeded and server performance will decline. Eventually virtual memory can be exhausted.
  • Out of threads. Threads are normally implemented by the host operating system and are a finite resource that can be exhausted. The OS can normally be tuned to increase this limit, but not indefinitely as system performance will eventually degrade.
  • Out of file descriptors. TCP/IP connections are implemented by most operating systems using file descriptors and are a finite resource that can be exhausted. The OS can normally be tuned to increase this limit, but not indefinitely as system performance will eventually degrade.
  • 100% CPU. Each accepted connection allows a flow of requests into the system, each of which takes CPU to process. Once 100% CPU has been reached, any additional connections accepted just increase latency for all connections and eventually reduce total throughput.

There are two main approaches to setting maxThreads:

  1. If a good estimate or measurement of the maximum load is known, then maxThreads is set high enough to handle this load and the system is then tested to verify that none of the failure modes is reached. This approach results in a server that is good enough for the webapp and can leave server resources available for other uses.
  2. Various maxThreads values are tested with a test client generating a load of approximately the same value. The tested maxThreads value is increased until such time as one of the failure modes above is detected or the measured throughput starts to decrease. This approach results in a server that uses all the system resources and requires a dedicated machine.
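The second approach amounts to a simple upward search. The following is only a sketch: the class, the method and the measureThroughput callback are hypothetical and not part of Jetty. The callback is assumed to run a load test at the given maxThreads value and return the measured throughput, or a negative number if one of the failure modes was detected.

```java
import java.util.function.IntToDoubleFunction;

/** Hypothetical helper, not part of Jetty. */
final class MaxThreadsSearch {

    /**
     * Steps maxThreads upward from start to limit and returns the last value
     * tested before throughput decreased or a failure mode was reported.
     */
    static int findMaxThreads(int start, int step, int limit,
                              IntToDoubleFunction measureThroughput) {
        if (start <= 0 || step <= 0 || limit < start) {
            throw new IllegalArgumentException("invalid search range");
        }
        int best = start;
        double bestThroughput = measureThroughput.applyAsDouble(start);
        for (int candidate = start + step; candidate <= limit; candidate += step) {
            double throughput = measureThroughput.applyAsDouble(candidate);
            // Assumed convention: a negative result means a failure mode was hit.
            if (throughput < 0 || throughput < bestThroughput) {
                break;
            }
            best = candidate;
            bestThroughput = throughput;
        }
        return best;
    }
}
```

In practice each measurement is a full load test, so the step size trades search time against how closely the final value approaches the real limit.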

If, with either of these approaches, the estimated, measured or required maximum load needs a maxThreads value that exhausts the system memory, CPU, connections or other resources, then the machine is not sufficient for that webapp. In this case, additional server resources (memory, CPU, kernel configuration) are required, or a clustering solution can be considered.

Once a server has reached its maximum number of threads, any new connection attempts are held by the operating system until they time out, a thread becomes available to accept the connection, or they are refused because the operating system queue has become full.

Setting minThreads

The minThreads value is used to control how a server degrades under extreme load. Once there are fewer than minThreads threads available in the thread pool, the lowResourcePersistTimeMs parameter can be used to free up other idle threads.

If a good estimate or measurement of the average and maximum load is known, then the minThreads value can be set to half the difference between the maximum and the average:

minThreads == (maxThreads - averageConnections) / 2

Thus if maxThreads is 3000 and averageConnections is 2500, then minThreads could be set to 250, so that low resource timeouts are applied once the actual number of connections exceeds 2750.
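The same arithmetic can be written as a small helper. This is only a sketch; the class and method names are hypothetical and not part of Jetty.

```java
/** Hypothetical helper, not part of Jetty. */
final class MinThreadsFormula {

    /** minThreads == (maxThreads - averageConnections) / 2, using integer division. */
    static int minThreads(int maxThreads, int averageConnections) {
        if (averageConnections < 0 || averageConnections > maxThreads) {
            throw new IllegalArgumentException("averageConnections must be in [0, maxThreads]");
        }
        return (maxThreads - averageConnections) / 2;
    }

    /** Connection count above which fewer than minThreads threads remain unused. */
    static int lowResourceThreshold(int maxThreads, int minThreads) {
        return maxThreads - minThreads;
    }
}
```

With the values from the text, minThreads(3000, 2500) gives 250 and lowResourceThreshold(3000, 250) gives 2750.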

Alternatively, minThreads may be set to protect against excess memory usage. If maxThreads requires more memory than is physically available, then minThreads can be set so that resources are freed once physical memory is exceeded. Using the memory formula example from above, and if 47Mb of physical memory is available on the system (when running the OS), then for maxThreads == 200:

minThreads == maxThreads - ( ( 47Mb - 23Mb ) / 200kb ) == 80
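As a sketch of this calculation, the hypothetical helper below (not part of Jetty) takes the 23Mb base footprint and the 200kb per thread from the formula as parameters. It assumes 1Mb == 1000kb, which is what the arithmetic in the example implies.

```java
/** Hypothetical helper, not part of Jetty. All sizes are in kb. */
final class MemoryBoundMinThreads {

    /**
     * minThreads == maxThreads - ((physicalKb - baseKb) / perThreadKb),
     * clamped to the range [0, maxThreads].
     */
    static int minThreads(int maxThreads, long physicalKb, long baseKb, long perThreadKb) {
        if (maxThreads < 0 || perThreadKb <= 0) {
            throw new IllegalArgumentException("maxThreads must be >= 0 and perThreadKb > 0");
        }
        // Number of threads that fit in physical memory beyond the base footprint.
        long threadsThatFit = (physicalKb - baseKb) / perThreadKb;
        long result = maxThreads - threadsThatFit;
        return (int) Math.max(0, Math.min(result, maxThreads));
    }
}
```

For the example above, minThreads(200, 47000, 23000, 200) gives 80: only 120 threads fit in physical memory, so low resource handling starts once more than 120 threads are in use.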

Setting maxIdleTimeMs

The idle time of a thread is used to limit the time that a persistent connection can be idle. Higher values are desirable to reduce latency for a user and avoid the expense of recreating TCP/IP connections. However, if the value is set too high, it will result in many connections being left open when the user is no longer browsing the webapp, and the resources allocated to them are effectively wasted for a long period of time.

A good value to use for the maxIdleTimeMs is slightly longer than the average page view time for the application, so that persistent connections are held long enough to span the time between page requests for an average user.
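One way to derive such a value is to average observed page view times and add a small margin. The sketch below is hypothetical and not part of Jetty; in particular, the margin percentage that stands in for "slightly longer" is an assumption to be chosen per application.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of Jetty. */
final class IdleTimeEstimate {

    /**
     * Returns the average of the observed page view times (in milliseconds)
     * increased by marginPercent, rounded up to a whole millisecond.
     */
    static int maxIdleTimeMs(long[] pageViewTimesMs, int marginPercent) {
        if (pageViewTimesMs.length == 0 || marginPercent < 0) {
            throw new IllegalArgumentException("need at least one sample and a non-negative margin");
        }
        double average = Arrays.stream(pageViewTimesMs).average().getAsDouble();
        return (int) Math.ceil(average * (100 + marginPercent) / 100.0);
    }
}
```

For samples of 20, 25 and 30 seconds and an assumed margin of 20 percent, this yields 30000 ms.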

Setting lowResourcePersistTimeMs

An HTTP listener is considered low on resources if there are fewer than minThreads threads available in the thread pool. A lowResourcePersistTimeMs value can be set to replace maxIdleTimeMs in that state, so that idle connections can be freed for other connections. The reasoning is that once a server is low on resources, there is little benefit in keeping resources allocated to idle connections in the hope that new requests will come from them.

With a low lowResourcePersistTimeMs value set, performance will degrade more gracefully as maxThreads is approached.

The value of lowResourcePersistTimeMs should be long enough to ensure that the whole cluster of requests for a page view can be served by one persistent connection. This is typically governed by the network latency; the value should not be more than a few seconds and can be as low as a few hundred milliseconds on a good network.
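The rule described in this section can be modelled in a few lines. This is a simplified sketch of the behaviour as described in the text, not Jetty's actual implementation; the class and method names are hypothetical.

```java
/** Simplified model of the low resource rule; hypothetical, not Jetty code. */
final class LowResourcePolicy {

    /** True when fewer than minThreads threads can still be allocated without exceeding maxThreads. */
    static boolean isLowOnResources(int maxThreads, int minThreads, int threadsInUse) {
        return maxThreads - threadsInUse < minThreads;
    }

    /** Idle timeout applied to a persistent connection in the current state. */
    static int effectiveIdleTimeMs(int maxThreads, int minThreads, int threadsInUse,
                                   int maxIdleTimeMs, int lowResourcePersistTimeMs) {
        return isLowOnResources(maxThreads, minThreads, threadsInUse)
                ? lowResourcePersistTimeMs
                : maxIdleTimeMs;
    }
}
```

With maxThreads 200 and minThreads 80, a listener with 120 threads in use still applies a maxIdleTimeMs of 30000, while one with 121 threads in use switches to a lowResourcePersistTimeMs of 2500.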

Setting poolName

If a server has multiple HTTP listeners configured, it may be desirable to share the thread pool between listeners, so that one listener is not starved of resources while another has free threads. If you wish to reserve capacity for a particular listener, then a shared thread pool should not be used. In the following configuration, both listeners use the pool name Listener and therefore share one pool:


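  ...
  HTTP listener
      port                      8080
      minThreads                80
      maxThreads                200
      maxIdleTimeMs             30000
      lowResourcePersistTimeMs  2500
      poolName                  Listener

  SSL listener
      port                      443
      poolName                  Listener
      keystore                  ./etc/demokeystore
      password                  OBF:1vny1zlo1x8e1vnw1vn61x8g1zlu1vn4
      key password              OBF:1u2u1wml1z7s1z7a1wnl1u2g
  ...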
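The effect of a shared pool name can be illustrated with plain java.util.concurrent classes. This is a sketch of the idea only and does not use or reproduce Jetty's thread pool API; the NamedPools class is hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical registry showing how a pool name can select a shared pool. */
final class NamedPools {

    private static final Map<String, ExecutorService> POOLS = new ConcurrentHashMap<>();

    /**
     * Returns the pool registered under poolName, creating it on first use.
     * The maxThreads of the first caller wins; later values are ignored.
     */
    static ExecutorService forName(String poolName, int maxThreads) {
        return POOLS.computeIfAbsent(poolName,
                name -> Executors.newFixedThreadPool(maxThreads));
    }
}
```

Two listeners that both ask for the name Listener receive the same pool instance, so idle threads are available to either of them. A listener that asks for a different name gets its own pool and therefore its own reserved capacity.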