If you have a single TCP connection, all the data flows through that connection, ultimately serializing at least some of the processing. Given that the workers are just responding with OK, no matter how many CPU cores you throw at it you're still bound by the throughput of the IO thread (well, by the slower of the client and server IO threads). If you want more than one IO thread to share the load, you need more than one TCP connection.
When I started out using the gRPC SDK in Go, I was really surprised to find that it created just one connection per client. You'd think someone had made a multiplexing client wrapper, but I haven't been able to find one, so inevitably I've just written lightweight pooling myself. (Actually, I did come across one, but the code was very low quality.)
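For what it's worth, the pooling I ended up with is roughly this shape (a sketch with made-up names, not any library's API; grpc.NewClient needs a recent grpc-go, older versions would use grpc.Dial):

    // Round-robins RPCs across several independent ClientConns so that more
    // than one TCP connection (and IO goroutine) carries the load.
    package grpcpool

    import (
        "sync/atomic"

        "google.golang.org/grpc"
        "google.golang.org/grpc/credentials/insecure"
    )

    type Pool struct {
        conns []*grpc.ClientConn
        next  uint64
    }

    func New(target string, size int) (*Pool, error) {
        p := &Pool{}
        for i := 0; i < size; i++ {
            cc, err := grpc.NewClient(target,
                grpc.WithTransportCredentials(insecure.NewCredentials()))
            if err != nil {
                return nil, err
            }
            p.conns = append(p.conns, cc)
        }
        return p, nil
    }

    // Get hands out the next connection; pass it to a generated stub
    // constructor, e.g. pb.NewFooClient(pool.Get()).
    func (p *Pool) Get() *grpc.ClientConn {
        n := atomic.AddUint64(&p.next, 1)
        return p.conns[n%uint64(len(p.conns))]
    }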
Somewhat related, I'm running into a gRPC latency issue in https://github.com/grpc/grpc-go/issues/8436
If the request payload exceeds a certain size, the response latency goes from roughly the network RTT to double or triple that.
There's definitely something wrong with either TCP or HTTP/2 windowing, as the client doesn't send the full request without first getting an ACK from the server. But neither the gRPC windowing config options nor the Linux tcp_wmem/rmem settings help. Sending a one-byte request every few hundred milliseconds fixes it by keeping the gRPC channel / TCP connection active. Nagle / slow start is disabled.
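For reference, the grpc-go windowing knobs I was tuning look roughly like this (a sketch; values illustrative, target is a placeholder):

    // Raising the client-side HTTP/2 flow-control windows; this is what
    // "windowing config options" refers to above, and it didn't help here.
    conn, err := grpc.NewClient(target,
        grpc.WithTransportCredentials(insecure.NewCredentials()),
        grpc.WithInitialWindowSize(4<<20),      // per-stream window
        grpc.WithInitialConnWindowSize(16<<20), // per-connection window
    )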
Sounds like classic TCP congestion window scaling delay. Your payload probably exceeds 10x initcwnd.
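Back-of-the-envelope, assuming Linux's default initcwnd of 10 segments (~14.6 KB at a 1460-byte MSS): slow start roughly doubles the window every RTT (10, 20, 40, ... segments), so pushing ~1 MB through a cold window takes on the order of 6-7 RTTs before it's all on the wire.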
Doesn't initcwnd only apply as the initial value? I don't care that the first request on the gRPC channel is slow, but subsequent requests on the same channel reuse the TCP connection and should have a larger window size. This holds as long as the channel is actively being used, but after a short period of inactivity (a few hundred ms, unsure exactly) something appears to revert.
Yes, with a hot TCP connection, congestion control should not be the issue.
Yeah, that was my understanding too, hence I filed the bug (actually a duplicate of an older bug that was closed because the poster didn't provide a reproduction).
Still not sure if this is a Linux network configuration issue or a gRPC issue, but something is definitely broken if I can't send a ~1MB request and get the response within roughly network RTT + server processing time.
Could you check the value of your kernel's net.ipv4.tcp_slow_start_after_idle sysctl, and if it's non-zero, set it to 0?
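Something along these lines (needs root; sysctl -w doesn't survive a reboot, so persist it via /etc/sysctl.d if it helps):

    # check the current value (1 = slow start after idle enabled)
    sysctl net.ipv4.tcp_slow_start_after_idle

    # disable it at runtime
    sysctl -w net.ipv4.tcp_slow_start_after_idle=0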
That seems to work, thank you!
Now the latency is just RTT + server time + payload size / bandwidth, not a multiple of the RTT: https://github.com/grpc/grpc-go/issues/8436#issuecomment-311...
I was not aware of this setting. It's pretty unfortunate that it's a system-level setting that can't be overridden at the application layer, and that the idle timeout can't be changed either. I'll have to figure out how to safely make this change on the k8s service this is affecting...
This sounds exactly like the culprit. I didn't know there was a slow start after idle, and it's set to 1 (active) by default.
I wonder if I should set this to 0 by default on my desktop machines for all connections.
That's indeed interesting, thank you for sharing.
classic case of head of line blocking!
I don't think this is head-of-line blocking. That is, it's not like a single slow request causes starvation of other requests. The IO thread for the connection is grabbing and dispatching data to workers as fast as it can. All the requests are uniform, so it's not like one request would be bigger/harder to handle for that thread.
> First, we checked the number of TCP connections using lsof -i TCP:2137 and found that only a single TCP connection was used regardless of in-flight count.
It's head-of-line blocking. When requests are serialized, the queue will grow as long as the time to service a request is longer than the interval between arriving requests. Queue growth is especially bad when there is unused capacity that could service requests in parallel.
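To put numbers on it: if requests arrive every 1 ms but the single connection's IO thread needs 1.2 ms per request, the backlog grows by 0.2 ms of work with every arrival and the queue never drains, even though idle workers behind additional connections could absorb the load (numbers illustrative).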
I guess I'd thought of head-of-line blocking as the delay from a slow request stalling subsequent ones beyond the throughput limits of the system: i.e. a slow-to-parse request causing other cheap requests to wait.
Everything in a queue is subject to head-of-line blocking by definition; it's really a question of whether the wait time meets your requirements.
Requests are not serialized, though. HTTP/2 (and thus gRPC) multiplexes multiple requests over one TCP connection.
gRPC is a very badly implemented system. I have gotten 25-30%+ improvements in throughput just by monkeypatching Google Cloud client libraries to force use of the JSON API endpoints.
At least try something else besides gRPC when building systems so you have a performance baseline. gRPC OFTEN introduces performance breakdowns that go unnoticed.
Have you done any comparisons with connect-rpc?