Tuesday, August 03, 2004

Hang The Jerk Who Invented Work

Sometimes I hate my stupid job, I really do.

On the plus side, after oh-so-much fussing, random tweaking, and praying for my bug to go away, I finally figured out exactly what was going wrong with the socket communication in gzochi. I'm going to explain it, in case it can prevent any of you from tearing your own hair out over something similar.

Okay, so first off, TCP sockets, which are great, are stream-oriented, which means you can treat them like files. That is, you can say, "hey, you, file descriptor! Do you have any data for me to read? I want 1024 bytes!" And the socket will say, "well, here's 37 bytes" -- and the call to read() may even block for a little while before the socket says this, because no one has actually sent the bytes yet. (There's a quick sketch of this below.)

Here's what was happening in my program: the client sends a message asking to join a game on a gzochi server; then the server decides whether or not this is okay and tells the client so; then, if the request was approved, the server asynchronously sends a "token delivery" message, which is the key that allows the client to actually initiate a datagram conversation with the server. The token is delivered asynchronously so that game availability isn't necessarily closely wed to the client's request to join the game -- that was just a design choice I made, and hopefully it'll make the server more flexible in the long run. Anyway, what was happening was that the client would report the receipt of the message saying whether or not the request to join the game was approved, but would not always (but sometimes would) receive the token, which, again, was sent in a separate message. I had no idea why this was happening.
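Here's that quick sketch of the partial-read behavior I mentioned -- plain C, not actual gzochi code, just an illustration:

    /* A minimal sketch of a partial read; assume fd is a connected TCP
       socket. read() is allowed to return anywhere from 1 byte up to the
       number requested (0 means the peer closed, -1 means error). */
    #include <stdio.h>
    #include <unistd.h>

    void demo_partial_read(int fd)
    {
        char buf[1024];
        ssize_t n = read(fd, buf, sizeof buf); /* ask for 1024 bytes... */
        if (n > 0)
            printf("asked for %zu, got %zd\n", sizeof buf, n); /* ...get 37 */
    }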

Then I figured it out, and here's where it gets interesting. You'll need to know a few things. First of all, when you're reading from a stream into a buffer, you have to make sure that the buffer is big enough to hold the stuff you're reading. If, in one shot, you read, say, 16 bytes, and you want to store all of it, you need to have a 16-byte buffer ready. You might think it'd be a good idea, then, to resize your buffer each time you read a byte -- read, allocate, read, allocate, and so on until you're done. No! This is bad, because allocation is expensive in time and processor cycles. Instead, what you do is read a big chunk at a time and write it all to a buffer that's grown a chunk at a time. In my case, the chunk size is 1024 bytes.
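In code, the chunked approach looks something like this sketch (the helper is made up, not gzochi's actual buffering code):

    /* Grow the destination buffer CHUNK bytes at a time rather than once
       per byte read, so allocations stay rare. */
    #include <stdlib.h>
    #include <unistd.h>

    #define CHUNK 1024

    /* Append up to CHUNK bytes from fd to *buf (current length *len,
       capacity *cap). Returns bytes read, 0 on EOF, -1 on error. */
    ssize_t read_chunk(int fd, char **buf, size_t *len, size_t *cap)
    {
        if (*len + CHUNK > *cap) {
            char *tmp = realloc(*buf, *cap + CHUNK);
            if (tmp == NULL)
                return -1;
            *buf = tmp;
            *cap += CHUNK; /* one allocation per 1024 bytes, not per byte */
        }
        ssize_t n = read(fd, *buf + *len, CHUNK);
        if (n > 0)
            *len += (size_t) n;
        return n;
    }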

The other key thing I should mention is that my TCP communication is mediated by zlib, a free (as in freedom) and wonderful compression library. zlib is also stream-oriented, in that you point it at some bytes and tell it to decompress them, and it'll come back and say, "hey, give me more bytes, the decompression's not finished," or, "okay, the message is fully decompressed." It figures out when it's decompressed the whole thing based on the input stream itself. This is great, because the stream-orientedness of TCP means it's hard to tell when you've received an entire message, especially when your messages are (relatively) plain English text like mine are. I mean, you can use a certain character to signal the end of one message and the beginning of the next, but what if someone needs to send that character as part of a message? So by compressing the messages, I not only save bandwidth but also make the boundaries between messages programmatically obvious.
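In zlib terms, that back-and-forth is inflate() returning Z_OK ("give me more bytes") or Z_STREAM_END ("fully decompressed"). A sketch, assuming the z_stream was set up with inflateInit() and the caller has pointed next_out / avail_out at an output buffer:

    #include <zlib.h>

    /* Feed in_len freshly-read compressed bytes to an initialized z_stream.
       Returns 1 when a complete message has been decompressed, 0 if zlib
       wants more input, -1 on error. */
    int feed_bytes(z_stream *zs, unsigned char *in, unsigned in_len)
    {
        zs->next_in = in;
        zs->avail_in = in_len;

        int rc = inflate(zs, Z_NO_FLUSH);
        if (rc == Z_STREAM_END)
            return 1;  /* "okay, the message is fully decompressed" */
        if (rc == Z_OK)
            return 0;  /* "hey, give me more bytes" */
        return -1;
    }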

So I grab 1024 bytes at a time, and zlib tells me when I've got a whole message. Here's where the problem was happening: if I grabbed more than one message's worth of bytes inside my message-reception code, I wouldn't look at anything past the point at which zlib told me it had decoded the first message. Each compressed message weighs in at about 100 bytes. When the client requests to join a game, the server sends two messages: one for the yes / no response, one for the token. On the client side, I ask the socket for 1024 bytes, and it gives me about 200. After processing about 100 of them, zlib tells me it's done, and I return the decompressed message to the application layer, discarding the rest -- token and all. It hadn't come up before because the server and client messages were usually one-to-one, like ping pong. It was when the second ball entered the mix that things started to go wrong.
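With made-up but representative numbers, the broken receive path looked roughly like this:

    #include <unistd.h>
    #include <zlib.h>

    /* The failure mode, sketched with approximate numbers (assume zs is an
       initialized z_stream and fd is a connected TCP socket). */
    void buggy_receive(int fd, z_stream *zs, unsigned char *out, unsigned out_cap)
    {
        unsigned char in[1024];
        ssize_t n = read(fd, in, sizeof in); /* ~200 bytes: TWO ~100-byte messages */
        if (n <= 0)
            return;

        zs->next_in = in;
        zs->avail_in = (unsigned) n;
        zs->next_out = out;
        zs->avail_out = out_cap;

        if (inflate(zs, Z_NO_FLUSH) == Z_STREAM_END) {
            /* First message decoded after ~100 input bytes. zs->avail_in is
               now ~100: the entire token message is sitting in `in`, and
               this code returns without ever looking at it. */
            return;
        }
    }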

Anyway, I fixed it by buffering my message-receiving code. So I still return the first message when zlib's finished decompressing it, but now I hang onto the remaining bytes and put them toward the next call to the message-receiving layer. Phew. Now I have to get back to designing this thing. Ugh. What a bad (two months of utter anguish) coding experience.
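Sketched out, the fix looks something like this (simplified, with made-up names, and assuming the output buffer is big enough for one whole message):

    #include <string.h>
    #include <unistd.h>
    #include <zlib.h>

    #define CHUNK 1024

    static unsigned char pending[CHUNK]; /* bytes read but not yet consumed */
    static unsigned pending_len = 0;

    /* Receive one whole decompressed message into out (capacity out_cap).
       Assumes zs was set up with inflateInit(). Returns the decompressed
       length, or -1 on error. */
    int recv_message(int fd, z_stream *zs, unsigned char *out, unsigned out_cap)
    {
        unsigned char in[CHUNK];

        zs->next_out = out;
        zs->avail_out = out_cap;

        for (;;) {
            unsigned n;

            if (pending_len > 0) { /* leftovers from the last call go first */
                memcpy(in, pending, pending_len);
                n = pending_len;
                pending_len = 0;
            } else { /* ...then we ask the socket for more */
                ssize_t r = read(fd, in, CHUNK);
                if (r <= 0)
                    return -1;
                n = (unsigned) r;
            }

            zs->next_in = in;
            zs->avail_in = n;

            int rc = inflate(zs, Z_NO_FLUSH);
            if (rc == Z_STREAM_END) {
                /* The fix: stash the unconsumed tail -- possibly the start
                   (or all) of the next message -- instead of discarding it. */
                pending_len = zs->avail_in;
                memcpy(pending, zs->next_in, pending_len);

                int msg_len = (int) (out_cap - zs->avail_out);
                inflateReset(zs); /* ready for the next message */
                return msg_len;
            }
            if (rc != Z_OK)
                return -1;
        }
    }

The pending buffer is the whole trick: it's what keeps that second ping-pong ball in play.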

Maybe I'm just an idiot.
