Nginx Blocking

What's blocking?

NGINX has one master process and multiple worker processes, and each worker process is single-threaded.

It’s well known that NGINX uses an asynchronous, event‑driven approach to handling connections. This means that instead of creating another dedicated process or thread for each request (like servers with a traditional architecture), it handles multiple connections and requests in one worker process. To achieve this, NGINX works with sockets in a non‑blocking mode and uses efficient methods such as epoll and kqueue.
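The following minimal Python sketch makes the event-driven model concrete. It is not NGINX code. It shows one thread serving several connections through non-blocking sockets. `selectors.DefaultSelector` picks epoll on Linux and kqueue on BSD/macOS. The function name `serve_once` and the uppercase-echo "protocol" are made up for illustration, and `socket.socketpair()` stands in for accepted client connections.

```python
import selectors
import socket


def serve_once(messages):
    """Answer one request per connection from a single thread.

    The loop calls recv() only on sockets the selector reports as ready,
    so a slow peer never stalls the others.
    """
    sel = selectors.DefaultSelector()  # epoll on Linux, kqueue on BSD/macOS
    server_ends, client_ends = [], []
    for msg in messages:
        server_end, client_end = socket.socketpair()
        server_end.setblocking(False)  # reads must never block the loop
        sel.register(server_end, selectors.EVENT_READ)
        client_end.sendall(msg)  # the "client" sends its request
        server_ends.append(server_end)
        client_ends.append(client_end)

    pending = len(server_ends)
    while pending:
        for key, _events in sel.select(timeout=1.0):
            conn = key.fileobj
            request = conn.recv(4096)
            conn.sendall(request.upper())  # reply on this connection only
            sel.unregister(conn)
            pending -= 1

    replies = [c.recv(4096) for c in client_ends]
    for s in server_ends + client_ends:
        s.close()
    sel.close()
    return replies


print(serve_once([b"ping", b"hello"]))  # [b'PING', b'HELLO']
```

The key point is that nothing in the loop waits on one particular connection. Each socket is handled only when the kernel says it is ready.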

However, once a connection has been accepted by a worker process, it cannot be moved to another worker, even when that worker is busy. Unlike Go's scheduler, NGINX has no work stealing. Moreover, only socket I/O is asynchronous. CPU-intensive tasks and file I/O are still synchronous.

All phase handlers for an HTTP request run in the same single thread. OpenResty lets you run Lua code in each phase. Lua code in the rewrite, access, and content phases, as well as in timers (implemented as fake connections), runs inside its own Lua coroutine. Within the coroutine, you can use cosockets to talk to the outside world. Cosockets are 100% non-blocking out of the box. When you call a cosocket API and it needs to wait for more data, it yields the coroutine, registers the corresponding event with epoll, and lets the event loop handle other events.

As the name suggests, coroutines share the CPU cooperatively. If one coroutine blocks for a long time, no other coroutine gets a chance to run, because all coroutines live in the same thread. The result is uneven request handling and long request latency.
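This effect is easy to reproduce with any cooperative scheduler. The sketch below uses Python's asyncio as a stand-in for OpenResty's coroutines and event loop. It is an analogy, not OpenResty code. `time.sleep()` plays the role of a blocking call inside a handler, and the hypothetical `ticker` coroutine represents other requests waiting for their turn.

```python
import asyncio
import time


async def ticker(log):
    # A well-behaved coroutine that wants to run roughly every 10 ms.
    for _ in range(3):
        log.append(time.monotonic())
        await asyncio.sleep(0.01)


async def blocking_handler():
    time.sleep(0.2)  # blocks the thread: nothing else can run meanwhile


async def cooperative_handler():
    await asyncio.sleep(0.2)  # yields to the event loop while waiting


async def worst_gap(handler):
    """Largest delay between two ticker steps while `handler` runs."""
    log = []
    await asyncio.gather(ticker(log), handler())
    return max(b - a for a, b in zip(log, log[1:]))


print(f"blocking:    {asyncio.run(worst_gap(blocking_handler)):.3f}s")
print(f"cooperative: {asyncio.run(worst_gap(cooperative_handler)):.3f}s")
```

With the blocking handler, the ticker stalls for the full 0.2 s. With the cooperative one, it keeps running about every 10 ms, even though both handlers take the same total time.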

When I worked at Kugou Music, my team ran into blocking issues, and I wrote a simple tool to diagnose them. The tool uses LD_PRELOAD to hook lua_resume. It measures how long each call takes, and for any call that runs too long, it prints the URL and the entry Lua function.

In practice, this tool is limited: it cannot print a backtrace, and it does not work for compiled Lua code.
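The core idea of that tool can be sketched without LD_PRELOAD: time every resume of a coroutine and report the slow ones. This version uses Python generators, whose `next()` behaves like a resume that runs until the next `yield`. It is an illustrative analogy only. `timed_resume`, `run`, `THRESHOLD`, and the two handlers are hypothetical names, and the request path stands in for the URL the real tool printed. Like the original tool, it reports which coroutine was slow but not where inside it the time went.

```python
import time
from collections import deque

THRESHOLD = 0.05  # seconds; hypothetical cut-off for a "long" slice


def timed_resume(coro, name, report):
    """Resume a generator once and time that slice, like wrapping lua_resume.

    A slice runs until the coroutine yields or finishes; if it is slow,
    the whole thread was blocked for that long. Returns True when finished.
    """
    start = time.perf_counter()
    try:
        next(coro)
        finished = False
    except StopIteration:
        finished = True
    elapsed = time.perf_counter() - start
    if elapsed > THRESHOLD:
        report.append((name, elapsed))  # the real tool printed URL + entry function
    return finished


def run(tasks):
    """Round-robin scheduler over (name, generator) pairs; returns slow slices."""
    report, queue = [], deque(tasks)
    while queue:
        name, coro = queue.popleft()
        if not timed_resume(coro, name, report):
            queue.append((name, coro))
    return report


def fast_handler():
    for _ in range(3):
        yield  # gives the thread back quickly


def slow_handler():
    yield
    time.sleep(0.1)  # stands in for a blocking call or a CPU-heavy loop
    yield


slow = run([("/fast", fast_handler()), ("/slow", slow_handler())])
print([name for name, _ in slow])  # ['/slow']
```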

Locate the blocking source

In my view, the best tool for detecting blocking is SystemTap.

Blocking comes from two sources:

  • CPU-intensive tasks
  • blocking system calls

So the approach is simple: measure the execution time, and if it exceeds a threshold, print the backtraces collected during that period.
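The following Python sketch illustrates the idea: sample backtraces while a slice runs, and keep them only if the slice exceeds a threshold. It is an assumption-laden stand-in, not a SystemTap equivalent. A helper thread samples the target thread's stack using CPython's `sys._current_frames()`, and `run_with_watchdog`, `handler`, and `parse_config` are hypothetical names.

```python
import sys
import threading
import time
import traceback


def run_with_watchdog(fn, threshold=0.05, interval=0.01):
    """Run fn() on this thread while a sampler thread records its stack.

    If fn() takes longer than `threshold` seconds, return the distinct
    stacks sampled during the call; otherwise discard them and return [].
    """
    target = threading.get_ident()
    samples = []
    stop = threading.Event()

    def sampler():
        # Wake every `interval` seconds until stopped and snapshot the stack.
        while not stop.wait(interval):
            frame = sys._current_frames().get(target)
            if frame is not None:
                samples.append(tuple(f.name for f in traceback.extract_stack(frame)))

    t = threading.Thread(target=sampler, daemon=True)
    t.start()
    start = time.perf_counter()
    try:
        fn()
    finally:
        elapsed = time.perf_counter() - start
        stop.set()
        t.join()
    if elapsed <= threshold:
        return []
    return list(dict.fromkeys(samples))  # dedupe identical stacks, keep order


def parse_config():
    time.sleep(0.1)  # pretend this is a blocking system call


def handler():
    parse_config()


for stack in run_with_watchdog(handler):
    print(" <- ".join(reversed(stack)))
```

Fast calls cost only the sampling overhead and produce no output. Only slow calls produce backtraces, which is what makes the approach usable on a busy server.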

...

How to solve blocking?

Since the main thread is sensitive to blocking, the obvious solution is to delegate blocking jobs to other threads.

NGINX provides thread pools, originally intended for serving static files more efficiently. I wrapped them in Lua so that you can use a non-blocking API to run blocking work.
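Python's asyncio has the same pattern and can serve as a rough analogy. `asyncio.to_thread()` (Python 3.9+) runs a blocking function in a thread pool while the calling coroutine yields, so the event loop keeps serving other coroutines. This is not the OpenResty API. `blocking_job` and `ticker` are hypothetical, and `time.sleep()` stands in for real blocking work.

```python
import asyncio
import hashlib
import time


def blocking_job(data: bytes) -> str:
    # Stands in for blocking work (a synchronous library call, file I/O, ...).
    time.sleep(0.2)
    return hashlib.sha256(data).hexdigest()


async def ticker(log):
    # Other "requests" that should keep being served every ~10 ms.
    for _ in range(5):
        log.append(time.monotonic())
        await asyncio.sleep(0.01)


async def handler():
    # Offload to the default thread pool; this coroutine yields until the
    # job finishes, so the event loop stays free for other coroutines.
    return await asyncio.to_thread(blocking_job, b"payload")


async def main():
    log = []
    digest, _ = await asyncio.gather(handler(), ticker(log))
    gap = max(b - a for a, b in zip(log, log[1:]))
    return digest, gap


digest, gap = asyncio.run(main())
print(digest[:16], f"max ticker gap {gap:.3f}s")
```

The ticker keeps its cadence even though the job occupies a worker thread for 0.2 s. The offloaded function runs on another thread, so it should not touch objects that the event-loop thread mutates without synchronization.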

I contributed this work to the official OpenResty project as the ngx.run_worker_thread() API:

github.com/openresty/lua-nginx-module#ngxru..

...

For details, please visit my blog: luajit.io/post/2022/nginx-blocking