client.pl010064400022570000215000000010420666431230500127750ustar00chodiglib00000400000073#! /usr/bin/perl -w # client1.pl - a simple client use strict; use Socket; # initialize host and port my $host = shift || 'server.onsight.com'; my $port = shift || 7890; my $proto = getprotobyname('tcp'); # get the port address my $iaddr = inet_aton($host); my $paddr = sockaddr_in($port, $iaddr); # create the socket, connect to the port socket(SOCKET, PF_INET, SOCK_STREAM, $proto) or die "socket: $!"; connect(SOCKET, $paddr) or die "connect: $!"; while () { print; } close SOCKET or die "close: $!"; network.html010064400022570000215000002146270655663456500135770ustar00chodiglib00000400000073 Beej's Guide to Network Programming

Beej's Guide to Network Programming

Using Internet Sockets

Version 1.5.4 (17-May-1998)
[http://www.ecst.csuchico.edu/~beej/guide/net]

Intro

Hey! Socket programming got you down? Is this stuff just a little too difficult to figure out from the man pages? You want to do cool Internet programming, but you don't have time to wade through a gob of structs trying to figure out if you have to call bind() before you connect(), etc., etc.

Well, guess what! I've already done this nasty business, and I'm dying to share the information with everyone! You've come to the right place. This document should give the average competent C programmer the edge s/he needs to get a grip on this networking noise.


Audience

This document has been written as a tutorial, not a reference. It is probably at its best when read by individuals who are just starting out with socket programming and are looking for a foothold. It is certainly not the complete guide to sockets programming, by any means.

Hopefully, though, it'll be just enough for those man pages to start making sense... :-)


Platform and Compiler

Most of the code contained within this document was compiled on a Linux PC using Gnu's gcc compiler. It was also found to compile on HPUX using gcc. Note that every code snippet was not individually tested.


Contents:


What is a socket?

You hear talk of "sockets" all the time, and perhaps you are wondering just what they are exactly. Well, they're this: a way to speak to other programs using standard Unix file descriptors.

What?

Ok--you may have heard some Unix hacker state, "Jeez, everything in Unix is a file!" What that person may have been talking about is the fact that when Unix programs do any sort of I/O, they do it by reading or writing to a file descriptor. A file descriptor is simply an integer associated with an open file. But (and here's the catch), that file can be a network connection, a FIFO, a pipe, a terminal, a real on-the-disk file, or just about anything else. Everything in Unix is a file! So when you want to communicate with another program over the Internet you're gonna do it through a file descriptor, you'd better believe it.

"Where do I get this file descriptor for network communication, Mr. Smarty-Pants?" is probably the last question on your mind right now, but I'm going to answer it anyway: You make a call to the socket() system routine. It returns the socket descriptor, and you communicate through it using the specialized send() and recv() ("man send", "man recv") socket calls.

"But, hey!" you might be exclaiming right about now. "If it's a file descriptor, why in the hell can't I just use the normal read() and write() calls to communicate through the socket?" The short answer is, "You can!" The longer answer is, "You can, but send() and recv() offer much greater control over your data transmission."

What next? How about this: there are all kinds of sockets. There are DARPA Internet addresses (Internet Sockets), path names on a local node (Unix Sockets), CCITT X.25 addresses (X.25 Sockets that you can safely ignore), and probably many others depending on which Unix flavor you run. This document deals only with the first: Internet Sockets.


Two Types of Internet Sockets

What's this? There are two types of Internet sockets? Yes. Well, no. I'm lying. There are more, but I didn't want to scare you. I'm only going to talk about two types here. Except for this sentence, where I'm going to tell you that "Raw Sockets" are also very powerful and you should look them up.

All right, already. What are the two types? One is "Stream Sockets"; the other is "Datagram Sockets", which may hereafter be referred to as "SOCK_STREAM" and "SOCK_DGRAM", respectively. Datagram sockets are sometimes called "connectionless sockets" (though they can be connect()'d if you really want. See connect(), below.

Stream sockets are reliable two-way connected communication streams. If you output two items into the socket in the order "1, 2", they will arrive in the order "1, 2" at the opposite end. They will also be error free. Any errors you do encounter are figments of your own deranged mind, and are not to be discussed here.

What uses stream sockets? Well, you may have heard of the telnet application, yes? It uses stream sockets. All the characters you type need to arrive in the same order you type them, right? Also, WWW browsers use the HTTP protocol which uses stream sockets to get pages. Indeed, if you telnet to a WWW site on port 80, and type "GET pagename", it'll dump the HTML back at you!

How do stream sockets achieve this high level of data transmission quality? They use a protocol called "The Transmission Control Protocol", otherwise known as "TCP" (see RFC-793 for extremely detailed info on TCP.) TCP makes sure your data arrives sequentially and error-free. You may have heard "TCP" before as the better half of "TCP/IP" where "IP" stands for "Internet Protocol" (see RFC-791.) IP deals with Internet routing only.

Cool. What about Datagram sockets? Why are they called connectionless? What is the deal, here, anyway? Why are they unreliable? Well, here are some facts: if you send a datagram, it may arrive. It may arrive out of order. If it arrives, the data within the packet will be error-free.

Datagram sockets also use IP for routing, but they don't use TCP; they use the "User Datagram Protocol", or "UDP" (see RFC-768.)

Why are they connectionless? Well, basically, it's because you don't have to maintain an open connection as you do with stream sockets. You just build a packet, slap an IP header on it with destination information, and send it out. No connection needed. They are generally used for packet-by-packet transfers of information. Sample applications: tftp, bootp, etc.

"Enough!" you may scream. "How do these programs even work if datagrams might get lost?!" Well, my human friend, each has it's own protocol on top of UDP. For example, the tftp protocol says that for each packet that gets sent, the recipient has to send back a packet that says, "I got it!" (an "ACK" packet.) If the sender of the original packet gets no reply in, say, five seconds, he'll re-transmit the packet until he finally gets an ACK. This acknowledgment procedure is very important when implementing SOCK_DGRAM applications.


Low level Nonsense and Network Theory

Since I just mentioned layering of protocols, it's time to talk about how networks really work, and to show some examples of how SOCK_DGRAM packets are built. Practically, you can probably skip this section. It's good background, however.

[Encapsulated Protocols Image]

Hey, kids, it's time to learn about Data Encapsulation! This is very very important. It's so important that you might just learn about it if you take the networks course here at Chico State ;-). Basically, it says this: a packet is born, the packet is wrapped ("encapsulated") in a header (and maybe footer) by the first protocol (say, the TFTP protocol), then the whole thing (TFTP header included) is encapsulated again by the next protocol (say, UDP), then again by the next (IP), then again by the final protocol on the hardware (physical) layer (say, Ethernet).

When another computer receives the packet, the hardware strips the Ethernet header, the kernel strips the IP and UDP headers, the TFTP program strips the TFTP header, and it finally has the data.

Now I can finally talk about the infamous Layered Network Model. This Network Model describes a system of network functionality that has many advantages over other models. For instance, you can write sockets programs that are exactly the same without caring how the data is physically transmitted (serial, thin Ethernet, AUI, whatever) because programs on lower levels deal with it for you. The actual network hardware and topology is transparent to the socket programmer.

Without any further ado, I'll present the layers of the full-blown model. Remember this for network class exams:

The Physical Layer is the hardware (serial, Ethernet, etc.). The Application Layer is just about as far from the physical layer as you can imagine--it's the place where users interact with the network.

Now, this model is so general you could probably use it as an automobile repair guide if you really wanted to. A layered model more consistent with Unix might be:

At this point in time, you can probably see how these layers correspond to the encapsulation of the original data.

See how much work there is in building a simple packet? Jeez! And you have to type in the packet headers yourself using "cat"! Just kidding. All you have to do for stream sockets is send() the data out. All you have to do for datagram sockets is encapsulate the packet in the method of your choosing and sendto() it out. The kernel builds the Transport Layer and Internet Layer on for you and the hardware does the Network Access Layer. Ah, modern technology.

So ends our brief foray into network theory. Oh yes, I forgot to tell you everything I wanted to say about routing: nothing! That's right, I'm not going to talk about it at all. The router strips the packet to the IP header, consults its routing table, blah blah blah. Check out the IP RFC if you really really care. If you never learn about it, well, you'll live.


structs

Well, we're finally here. It's time to talk about programming. In this section, I'll cover various data types used by the sockets interface, since some of them are a real bitch to figure out.

First the easy one: a socket descriptor. A socket descriptor is the following type:

    int
Just a regular int.

Things get weird from here, so just read through and bear with me. Know this: there are two byte orderings: most significant byte (sometimes called an "octet") first, or least significant byte first. The former is called "Network Byte Order". Some machines store their numbers internally in Network Byte Order, some don't. When I say something has to be in NBO, you have to call a function (such as htons()) to change it from "Host Byte Order". If I don't say "NBO", then you must leave the value in Host Byte Order.

My First Struct(TM)--struct sockaddr. This structure holds socket address information for many types of sockets:

    struct sockaddr {
        unsigned short    sa_family;    /* address family, AF_xxx       */
        char              sa_data[14];  /* 14 bytes of protocol address */
    };
sa_family can be a variety of things, but it'll be "AF_INET" for everything we do in this document. sa_data contains a destination address and port number for the socket. This is rather unwieldy.

To deal with struct sockaddr, programmers created a parallel structure: struct sockaddr_in ("in" for "Internet".)

    struct sockaddr_in {
        short int          sin_family;  /* Address family               */
        unsigned short int sin_port;    /* Port number                  */
        struct in_addr     sin_addr;    /* Internet address             */
        unsigned char      sin_zero[8]; /* Same size as struct sockaddr */
    };
This structure makes it easy to reference elements of the socket address. Note that sin_zero (which is included to pad the structure to the length of a struct sockaddr) should be set to all zeros with the function bzero() or memset(). Also, and this is the important bit, a pointer to a struct sockaddr_in can be cast to a pointer to a struct sockaddr and vice-versa. So even though socket() wants a struct sockaddr *, you can still use a struct sockaddr_in and cast it at the last minute! Also, notice that sin_family corresponds to sa_family in a struct sockaddr and should be set to "AF_INET". Finally, the sin_port and sin_addr must be in Network Byte Order!

"But," you object, "how can the entire structure, struct in_addr sin_addr, be in Network Byte Order?" This question requires careful examination of the structure struct in_addr, one of the worst unions alive:

    /* Internet address (a structure for historical reasons) */
    struct in_addr {
        unsigned long s_addr;
    };
Well, it used to be a union, but now those days seem to be gone. Good riddance. So if you have declared "ina" to be of type struct sockaddr_in, then "ina.sin_addr.s_addr" references the 4 byte IP address (in Network Byte Order). Note that even if your system still uses the God-awful union for struct in_addr, you can still reference the 4 byte IP address in exactly the same way as I did above (this due to #defines.)


Convert the Natives!

We've now been lead right into the next section. There's been too much talk about this Network to Host Byte Order conversion--now is the time for action!

All righty. There are two types that you can convert: short (two bytes) and long (four bytes). These functions work for the unsigned variations as well. Say you want to convert a short from Host Byte Order to Network Byte Order. Start with "h" for "host", follow it with "to", then "n" for "network", and "s" for "short": h-to-n-s, or htons() (read: "Host to Network Short").

It's almost too easy...

You can use every combination if "n", "h", "s", and "l" you want, not counting the really stupid ones. For example, there is NOT a stolh() ("Short to Long Host") function--not at this party, anyway. But there are:

Now, you may think you're wising up to this. You might think, "What do I do if I have to change byte order on a char?" Then you might think, "Uh, never mind." You might also think that since your 68000 machine already uses network byte order, you don't have to call htonl() on your IP addresses. You would be right, BUT if you try to port to a machine that has reverse network byte order, your program will fail. Be portable! This is a Unix world! Remember: put your bytes in Network Order before you put them on the network.

A final point: why do sin_addr and sin_port need to be in Network Byte Order in a struct sockaddr_in, but sin_family does not? The answer: sin_addr and sin_port get encapsulated in the packet at the IP and UDP layers, respectively. Thus, they must be in Network Byte Order. However, the sin_family field is only used by the kernel to determine what type of address the structure contains, so it must be in Host Byte Order. Also, since sin_family does not get sent out on the network, it can be in Host Byte Order.


IP Addresses and How to Deal With Them

Fortunately for you, there are a bunch of functions that allow you to manipulate IP addresses. No need to figure them out by hand and stuff them in a long with the << operator.

First, let's say you have a struct sockaddr_in ina, and you have an IP address "132.241.5.10" that you want to store into it. The function you want to use, inet_addr(), converts an IP address in numbers-and-dots notation into an unsigned long. The assignment can be made as follows:

    ina.sin_addr.s_addr = inet_addr("132.241.5.10");
Notice that inet_addr() returns the address in Network Byte Order already--you don't have to call htonl(). Swell!

Now, the above code snippet isn't very robust because there is no error checking. See, inet_addr() returns -1 on error. Remember binary numbers? (unsigned)-1 just happens to correspond to the IP address 255.255.255.255! That's the broadcast address! Wrongo. Remember to do your error checking properly.

All right, now you can convert string IP addresses to longs. What about the other way around? What if you have a struct in_addr and you want to print it in numbers-and-dots notation? In this case, you'll want to use the function inet_ntoa() ("ntoa" means "network to ascii") like this:

    printf("%s",inet_ntoa(ina.sin_addr));
That will print the IP address. Note that inet_ntoa() takes a struct in_addr as an argument, not a long. Also notice that it returns a pointer to a char. This points to a statically stored char array within inet_ntoa() so that each time you call inet_ntoa() it will overwrite the last IP address you asked for. For example:
    char *a1, *a2;
    .
    .
    a1 = inet_ntoa(ina1.sin_addr);  /* this is 198.92.129.1 */
    a2 = inet_ntoa(ina2.sin_addr);  /* this is 132.241.5.10 */
    printf("address 1: %s\n",a1);
    printf("address 2: %s\n",a2);
will print:
    address 1: 132.241.5.10
    address 2: 132.241.5.10
If you need to save the address, strcpy() it to your own character array.

That's all on this topic for now. Later, you'll learn to convert a string like "whitehouse.gov" into its corresponding IP address (see DNS, below.)


socket()--Get the File Descriptor!

I guess I can put it off no longer--I have to talk about the socket() system call. Here's the breakdown:
    #include <sys/types.h> 
    #include <sys/socket.h> 

    int socket(int domain, int type, int protocol);
But what are these arguments? First, domain should be set to "AF_INET", just like in the struct sockaddr_in (above.) Next, the type argument tells the kernel what kind of socket this is: SOCK_STREAM or SOCK_DGRAM. Finally, just set protocol to "0". (Notes: there are many more domains than I've listed. There are many more types than I've listed. See the socket() man page. Also, there's a "better" way to get the protocol. See the getprotobyname() man page.)

socket() simply returns to you a socket descriptor that you can use in later system calls, or -1 on error. The global variable errno is set to the error's value (see the perror() man page.)


bind()--What port am I on?

Once you have a socket, you might have to associate that socket with a port on your local machine. (This is commonly done if you're going to listen() for incoming connections on a specific port--MUDs do this when they tell you to "telnet to x.y.z port 6969".) If you're going to only be doing a connect(), this may be unnecessary. Read it anyway, just for kicks.

Here is the synopsis for the bind() system call:

    #include <sys/types.h> 
    #include <sys/socket.h> 

    int bind(int sockfd, struct sockaddr *my_addr, int addrlen);
sockfd is the socket file descriptor returned by socket(). my_addr is a pointer to a struct sockaddr that contains information about your address, namely, port and IP address. addrlen can be set to sizeof(struct sockaddr).

Whew. That's a bit to absorb in one chunk. Let's have an example:

    #include <string.h> 
    #include <sys/types.h> 
    #include <sys/socket.h> 

    #define MYPORT 3490

    main()
    {
        int sockfd;
        struct sockaddr_in my_addr;

        sockfd = socket(AF_INET, SOCK_STREAM, 0); /* do some error checking! */

        my_addr.sin_family = AF_INET;     /* host byte order */
        my_addr.sin_port = htons(MYPORT); /* short, network byte order */
        my_addr.sin_addr.s_addr = inet_addr("132.241.5.10");
        bzero(&(my_addr.sin_zero), 8);    /* zero the rest of the struct */

        /* don't forget your error checking for bind(): */
        bind(sockfd, (struct sockaddr *)&my_addr, sizeof(struct sockaddr));
        .
        .
        .
There are a few things to notice here. my_addr.sin_port is in Network Byte Order. So is my_addr.sin_addr.s_addr. Another thing to watch out for is that the header files might differ from system to system. To be sure, you should check your local man pages.

Lastly, on the topic of bind(), I should mention that some of the process of getting your own IP address and/or port can can be automated:

        my_addr.sin_port = 0; /* choose an unused port at random */
        my_addr.sin_addr.s_addr = INADDR_ANY;  /* use my IP address */
See, by setting my_addr.sin_port to zero, you are telling bind() to choose the port for you. Likewise, by setting my_addr.sin_addr.s_addr to INADDR_ANY, you are telling it to automatically fill in the IP address of the machine the process is running on.

If you are into noticing little things, you might have seen that I didn't put INADDR_ANY into Network Byte Order! Naughty me. However, I have inside info: INADDR_ANY is really zero! Zero still has zero on bits even if you rearrange the bytes. However, purists will point out that there could be a parallel dimension where INADDR_ANY is, say, 12 and that my code won't work there. That's ok with me:

        my_addr.sin_port = htons(0); /* choose an unused port at random */
        my_addr.sin_addr.s_addr = htonl(INADDR_ANY);  /* use my IP address */
Now we're so portable you probably wouldn't believe it. I just wanted to point that out, since most of the code you come across won't bother running INADDR_ANY through htonl().

bind() also returns -1 on error and sets errno to the error's value.

Another thing to watch out for when calling bind(): don't go underboard with your port numbers. All ports below 1024 are RESERVED! You can have any port number above that, right up to 65535 (provided they aren't already being used by another program.)

One small extra final note about bind(): there are times when you won't absolutely have to call it. If you are connect()'ing to a remote machine and you don't care what your local port is (as is the case with telnet), you can simply call connect(), it'll check to see if the socket is unbound, and will bind() it to an unused local port.


connect()--Hey, you!

Let's just pretend for a few minutes that you're a telnet application. Your user commands you (just like in the movie TRON) to get a socket file descriptor. You comply and call socket(). Next, the user tells you to connect to "132.241.5.10" on port "23" (the standard telnet port.) Oh my God! What do you do now?

Lucky for you, program, you're now perusing the section on connect()--how to connect to a remote host. You read furiously onward, not wanting to disappoint your user...

The connect() call is as follows:

    #include <sys/types.h> 
    #include <sys/socket.h> 

    int connect(int sockfd, struct sockaddr *serv_addr, int addrlen);
sockfd is our friendly neighborhood socket file descriptor, as returned by the socket() call, serv_addr is a struct sockaddr containing the destination port and IP address, and addrlen can be set to sizeof(struct sockaddr).

Isn't this starting to make more sense? Let's have an example:

    #include <string.h> 
    #include <sys/types.h> 
    #include <sys/socket.h> 

    #define DEST_IP   "132.241.5.10"
    #define DEST_PORT 23

    main()
    {
        int sockfd;
        struct sockaddr_in dest_addr;   /* will hold the destination addr */

        sockfd = socket(AF_INET, SOCK_STREAM, 0); /* do some error checking! */

        dest_addr.sin_family = AF_INET;        /* host byte order */
        dest_addr.sin_port = htons(DEST_PORT); /* short, network byte order */
        dest_addr.sin_addr.s_addr = inet_addr(DEST_IP);
        bzero(&(dest_addr.sin_zero), 8);       /* zero the rest of the struct */

        /* don't forget to error check the connect()! */
        connect(sockfd, (struct sockaddr *)&dest_addr, sizeof(struct sockaddr));
        .
        .
        .

Again, be sure to check the return value from connect()--it'll return -1 on error and set the variable errno.

Also, notice that we didn't call bind(). Basically, we don't care about our local port number; we only care where we're going. The kernel will choose a local port for us, and the site we connect to will automatically get this information from us. No worries.


listen()--Will somebody please call me?

Ok, time for a change of pace. What if you don't want to connect to a remote host. Say, just for kicks, that you want to wait for incoming connections and handle them in some way. The process is two step: first you listen(), then you accept() (see below.)

The listen call is fairly simple, but requires a bit of explanation:

    int listen(int sockfd, int backlog);
sockfd is the usual socket file descriptor from the socket() system call. backlog is the number of connections allowed on the incoming queue. What does that mean? Well, incoming connections are going to wait in this queue until you accept() them (see below) and this is the limit on how many can queue up. Most systems silently limit this number to about 20; you can probably get away with setting it to 5 or 10.

Again, as per usual, listen() returns -1 and sets errno on error.

Well, as you can probably imagine, we need to call bind() before we call listen() or the kernel will have us listening on a random port. Bleah! So if you're going to be listening for incoming connections, the sequence of system calls you'll make is:

    socket();
    bind();
    listen();
    /* accept() goes here */
I'll just leave that in the place of sample code, since it's fairly self-explanatory. (The code in the accept() section, below, is more complete.) The really tricky part of this whole sha-bang is the call to accept().


accept()--"Thank you for calling port 3490."

Get ready--the accept() call is kinda weird! What's going to happen is this: someone far far away will try to connect() to your machine on a port that you are listen()'ing on. Their connection will be queued up waiting to be accept()'ed. You call accept() and you tell it to get the pending connection. It'll return to you a brand new socket file descriptor to use for this single connection! That's right, suddenly you have two socket file descriptors for the price of one! The original one is still listening on your port and the newly created one is finally ready to send() and recv(). We're there!

The call is as follows:

     #include <sys/socket.h> 

     int accept(int sockfd, void *addr, int *addrlen);
sockfd is the listen()'ing socket descriptor. Easy enough. addr will usually be a pointer to a local struct sockaddr_in. This is where the information about the incoming connection will go (and you can determine which host is calling you from which port). addrlen is a local integer variable that should be set to sizeof(struct sockaddr_in) before its address is passed to accept(). Accept will not put more than that many bytes into addr. If it puts fewer in, it'll change the value of addrlen to reflect that.

Guess what? accept() returns -1 and sets errno if an error occurs. Betcha didn't figure that.

Like before, this is a bunch to absorb in one chunk, so here's a sample code fragment for your perusal:

    #include <string.h> 
    #include <sys/types.h> 
    #include <sys/socket.h> 

    #define MYPORT 3490    /* the port users will be connecting to */

    #define BACKLOG 10     /* how many pending connections queue will hold */

    main()
    {
        int sockfd, new_fd;  /* listen on sock_fd, new connection on new_fd */
        struct sockaddr_in my_addr;    /* my address information */
        struct sockaddr_in their_addr; /* connector's address information */
        int sin_size;

        sockfd = socket(AF_INET, SOCK_STREAM, 0); /* do some error checking! */

        my_addr.sin_family = AF_INET;         /* host byte order */
        my_addr.sin_port = htons(MYPORT);     /* short, network byte order */
        my_addr.sin_addr.s_addr = INADDR_ANY; /* auto-fill with my IP */
        bzero(&(my_addr.sin_zero), 8);        /* zero the rest of the struct */

        /* don't forget your error checking for these calls: */
        bind(sockfd, (struct sockaddr *)&my_addr, sizeof(struct sockaddr));

        listen(sockfd, BACKLOG);

        sin_size = sizeof(struct sockaddr_in);
        new_fd = accept(sockfd, &their_addr, &sin_size);
        .
        .
        .
Again, note that we will use the socket descriptor new_fd for all send() and recv() calls. If you're only getting one single connection ever, you can close() the original sockfd in order to prevent more incoming connections on the same port, if you so desire.


send() and recv()--Talk to me, baby!

These two functions are for communicating over stream sockets or connected datagram sockets. If you want to use regular unconnected datagram sockets, you'll need to see the section on sendto() and recvfrom(), below.

The send() call:

    int send(int sockfd, const void *msg, int len, int flags);
sockfd is the socket descriptor you want to send data to (whether it's the one returned by socket() or the one you got with accept().) msg is a pointer to the data you want to send, and len is the length of that data in bytes. Just set flags to 0. (See the send() man page for more information concerning flags.)

Some sample code might be:

    char *msg = "Beej was here!";
    int len, bytes_sent;
    .
    .
    len = strlen(msg);
    bytes_sent = send(sockfd, msg, len, 0);
    .
    .
    .
send() returns the number of bytes actually sent out--this might be less than the number you told it to send! See, sometimes you tell it to send a whole gob of data and it just can't handle it. It'll fire off as much of the data as it can, and trust you to send the rest later. Remember, if the value returned by send() doesn't match doesn't match the value in len, it's up to you to send the rest of the string. The good news is this: if the packet is small (less than 1K or so) it will probably manage to send the whole thing all in one go. Again, -1 is returned on error, and errno is set to the error number.

The recv() call is similar in many respects:

    int recv(int sockfd, void *buf, int len, unsigned int flags);
sockfd is the socket descriptor to read from, buf is the buffer to read the information into, len is the maximum length of the buffer, and flags can again be set to 0. (See the recv() man page for flag information.)

recv() returns the number of bytes actually read into the buffer, or -1 on error (with errno set, accordingly.)

There, that was easy, wasn't it? You can now pass data back and forth on stream sockets! Whee! You're a Unix Network Programmer!


sendto() and recvfrom()--Talk to me, DGRAM-style

"This is all fine and dandy," I hear you saying, "but where does this leave me with unconnected datagram sockets?" No problemo, amigo. We have just the thing.

Since datagram sockets aren't connected to a remote host, guess which piece of information we need to give before we send a packet? That's right! The destination address! Here's the scoop:

    int sendto(int sockfd, const void *msg, int len, unsigned int flags,
               const struct sockaddr *to, int tolen);
As you can see, this call is basically the same as the call to send() with the addition of two other pieces of information. to is a pointer to a struct sockaddr (which you'll probably have as a struct sockaddr_in and cast it at the last minute) which contains the destination IP address and port. tolen can simply be set to sizeof(struct sockaddr).

Just like with send(), sendto() returns the number of bytes actually sent (which, again, might be less than the number of bytes you told it to send!), or -1 on error.

Equally similar are recv() and recvfrom(). The synopsis of recvfrom() is:

    int recvfrom(int sockfd, void *buf, int len, unsigned int flags
                 struct sockaddr *from, int *fromlen);
Again, this is just like recv() with the addition of a couple fields. from is a pointer to a local struct sockaddr that will be filled with the IP address and port of the originating machine. fromlen is a pointer to a local int that should be initialized to sizeof(struct sockaddr). When the function returns, fromlen will contain the length of the address actually stored in from.

recvfrom() returns the number of bytes received, or -1 on error (with errno set accordingly.)

Remember, if you connect() a datagram socket, you can then simply use send() and recv() for all your transactions. The socket itself is still a datagram socket and the packets still use UDP, but the socket interface will automatically add the destination and source information for you.


close() and shutdown()--Get outta my face!

Whew! You've been send()'ing and recv()'ing data all day long, and you've had it. You're ready to close the connection on your socket descriptor. This is easy. You can just use the regular Unix file descriptor close() function:
    close(sockfd);
This will prevent any more reads and writes to the socket. Anyone attempting to read or write the socket on the remote end will receive an error.

Just in case you want a little more control over how the socket closes, you can use the shutdown() function. It allows you to cut off communication in a certain direction, or both ways (just like close() does.) Synopsis:

    int shutdown(int sockfd, int how);
sockfd is the socket file descriptor you want to shutdown, and how is one of the following:

shutdown() returns 0 on success, and -1 on error (with errno set accordingly.)

If you deign to use shutdown() on unconnected datagram sockets, it will simply make the socket unavailable for further send() and recv() calls (remember that you can use these if you connect() your datagram socket.)

Nothing to it.


getpeername()--Who are you?

This function is so easy.

It's so easy, I almost didn't give it it's own section. But here it is anyway.

The function getpeername() will tell you who is at the other end of a connected stream socket. The synopsis:

    #include <sys/socket.h> 

    int getpeername(int sockfd, struct sockaddr *addr, int *addrlen);
sockfd is the descriptor of the connected stream socket, addr is a pointer to a struct sockaddr (or a struct sockaddr_in) that will hold the information about the other side of the connection, and addrlen is a pointer to an int, that should be initialized to sizeof(struct sockaddr).

The function returns -1 on error and sets errno accordingly.

Once you have their address, you can use inet_ntoa() or gethostbyaddr() to print or get more information. No, you can't get their login name. (Ok, ok. If the other computer is running an ident daemon, this is possible. This, however, is beyond the scope of this document. Check out RFC-1413 for more info.)


gethostname()--Who am I?

Even easier than getpeername() is the function gethostname(). It returns the name of the computer that your program is running on. The name can then be used by gethostbyname(), below, to determine the IP address of your local machine.

What could be more fun? I could think of a few things, but they don't pertain to socket programming. Anyway, here's the breakdown:

    #include <unistd.h>

    int gethostname(char *hostname, size_t size);

The arguments are simple: hostname is a pointer to an array of chars that will contain the hostname upon the function's return, and size is the length in bytes of the hostname array.

The function returns 0 on successful completion, and -1 on error, setting errno as usual.


DNS--You say "whitehouse.gov", I say "198.137.240.100"

In case you don't know what DNS is, it stands for "Domain Name Service". In a nutshell, you tell it what the human-readable address is for a site, and it'll give you the IP address (so you can use it with bind(), connect(), sendto(), or whatever you need it for.) This way, when someone enters:
    $ telnet whitehouse.gov
telnet can find out that it needs to connect() to "198.137.240.100".

But how does it work? You'll be using the function gethostbyname():

    #include <netdb.h> 
    
    struct hostent *gethostbyname(const char *name);
As you see, it returns a pointer to a struct hostent, the layout of which is as follows:
    struct hostent {
        char    *h_name;
        char    **h_aliases;
        int     h_addrtype;
        int     h_length;
        char    **h_addr_list;
    };
    #define h_addr h_addr_list[0]
And here are the descriptions of the fields in the struct hostent:

gethostbyname() returns a pointer to the filled struct hostent, or NULL on error. (But errno is not set--h_errno is set instead. See herror(), below.)

But how is it used? Sometimes (as we find from reading computer manuals), just spewing the information at the reader is not enough. This function is certainly easier to use than it looks.

Here's an example program:

    #include <stdio.h> 
    #include <stdlib.h> 
    #include <errno.h> 
    #include <netdb.h> 
    #include <sys/types.h>
    #include <netinet/in.h> 

    int main(int argc, char *argv[])
    {
        struct hostent *h;

        if (argc != 2) {  /* error check the command line */
            fprintf(stderr,"usage: getip address\n");
            exit(1);
        }

        if ((h=gethostbyname(argv[1])) == NULL) {  /* get the host info */
            herror("gethostbyname");
            exit(1);
        }

        printf("Host name  : %s\n", h->h_name);
        printf("IP Address : %s\n",inet_ntoa(*((struct in_addr *)h->h_addr)));

        return 0;
    }
With gethostbyname(), you can't use perror() to print error message (since errno is not used). Instead, call herror().

It's pretty straightforward. You simply pass the string that contains the machine name ("whitehouse.gov") to gethostbyname(), and then grab the information out of the returned struct hostent.

The only possible weirdness might be in the printing of the IP address, above. h->h_addr is a char *, but inet_ntoa() wants a struct in_addr passed to it. So I cast h->h_addr to a struct in_addr *, then dereference it to get at the data.


Client-Server Background

It's a client-server world, baby. Just about everything on the network deals with client processes talking to server processes and vice-versa. Take telnet, for instance. When you connect to a remote host on port 23 with telnet (the client), a program on that host (called telnetd, the server) springs to life. It handles the incoming telnet connection, sets you up with a login prompt, etc.

[Client-Server Relationship]
Figure 2. The Client-Server Relationship.

The exchange of information between client and server is summarized in Figure 2.

Note that the client-server pair can speak SOCK_STREAM, SOCK_DGRAM, or anything else (as long as they're speaking the same thing.) Some good examples of client-server pairs are telnet/telnetd, ftp/ftpd, or bootp/bootpd. Every time you use ftp, there's a remote program, ftpd, that serves you.

Often, there will only be one server on a machine, and that server will handle multiple clients using fork(). The basic routine is: server will wait for a connection, accept() it, and fork() a child process to handle it. This is what our sample server does in the next section.


A Simple Stream Server

All this server does is send the string "Hello, World!\n" out over a stream connection. All you need to do to test this server is run it in one window, and telnet to it from another with:
    $ telnet remotehostname 3490
where remotehostname is the name of the machine you're running it on.

The server code: (Note: a trailing backslash on a line means that the line is continued on the next.)

    #include <stdio.h> 
    #include <stdlib.h> 
    #include <errno.h> 
    #include <string.h> 
    #include <sys/types.h> 
    #include <netinet/in.h> 
    #include <sys/socket.h> 
    #include <sys/wait.h> 

    #define MYPORT 3490    /* the port users will be connecting to */

    #define BACKLOG 10     /* how many pending connections queue will hold */

    main()
    {
        int sockfd, new_fd;  /* listen on sock_fd, new connection on new_fd */
        struct sockaddr_in my_addr;    /* my address information */
        struct sockaddr_in their_addr; /* connector's address information */
        int sin_size;

        if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) == -1) {
            perror("socket");
            exit(1);
        }

        my_addr.sin_family = AF_INET;         /* host byte order */
        my_addr.sin_port = htons(MYPORT);     /* short, network byte order */
        my_addr.sin_addr.s_addr = INADDR_ANY; /* auto-fill with my IP */
        bzero(&(my_addr.sin_zero), 8);        /* zero the rest of the struct */

        if (bind(sockfd, (struct sockaddr *)&my_addr, sizeof(struct sockaddr)) \
                                                                      == -1) {
            perror("bind");
            exit(1);
        }

        if (listen(sockfd, BACKLOG) == -1) {
            perror("listen");
            exit(1);
        }

        while(1) {  /* main accept() loop */
            sin_size = sizeof(struct sockaddr_in);
            if ((new_fd = accept(sockfd, (struct sockaddr *)&their_addr, \
                                                          &sin_size)) == -1) {
                perror("accept");
                continue;
            }
            printf("server: got connection from %s\n", \
                                               inet_ntoa(their_addr.sin_addr));
            if (!fork()) { /* this is the child process */
                if (send(new_fd, "Hello, world!\n", 14, 0) == -1)
                    perror("send");
                close(new_fd);
                exit(0);
            }
            close(new_fd);  /* parent doesn't need this */

            while(waitpid(-1,NULL,WNOHANG) > 0); /* clean up child processes */
        }
    }
In case you're curious, I have the code in one big main() function for (I feel) syntactic clarity. Feel free to split it into smaller functions if it makes you feel better.

You can also get the string from this server by using the client listed in the next section.


A Simple Stream Client

This guy's even easier than the server. All this client does is connect to the host you specify on the command line, port 3490. It gets the string that the server sends.

The client source:

    #include <stdio.h> 
    #include <stdlib.h> 
    #include <errno.h> 
    #include <string.h> 
    #include <netdb.h> 
    #include <sys/types.h> 
    #include <netinet/in.h> 
    #include <sys/socket.h> 

    #define PORT 3490    /* the port client will be connecting to */

    #define MAXDATASIZE 100 /* max number of bytes we can get at once */

    int main(int argc, char *argv[])
    {
        int sockfd, numbytes;  
        char buf[MAXDATASIZE];
        struct hostent *he;
        struct sockaddr_in their_addr; /* connector's address information */

        if (argc != 2) {
            fprintf(stderr,"usage: client hostname\n");
            exit(1);
        }

        if ((he=gethostbyname(argv[1])) == NULL) {  /* get the host info */
            herror("gethostbyname");
            exit(1);
        }

        if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) == -1) {
            perror("socket");
            exit(1);
        }

        their_addr.sin_family = AF_INET;      /* host byte order */
        their_addr.sin_port = htons(PORT);    /* short, network byte order */
        their_addr.sin_addr = *((struct in_addr *)he->h_addr);
        bzero(&(their_addr.sin_zero), 8);     /* zero the rest of the struct */

        if (connect(sockfd, (struct sockaddr *)&their_addr, \
                                              sizeof(struct sockaddr)) == -1) {
            perror("connect");
            exit(1);
        }

        if ((numbytes=recv(sockfd, buf, MAXDATASIZE, 0)) == -1) {
            perror("recv");
            exit(1);
        }

        buf[numbytes] = '\0';

        printf("Received: %s",buf);

        close(sockfd);

        return 0;
    }

Notice that if you don't run the server before you run the client, connect() returns "Connection refused". Very useful.


Datagram Sockets

I really don't have that much to talk about here, so I'll just present a couple of sample programs: talker.c and listener.c.

listener sits on a machine waiting for an incoming packet on port 4950. talker sends a packet to that port, on the specified machine, that contains whatever the user enters on the command line.

Here is the source for listener.c:

    #include <stdio.h> 
    #include <stdlib.h> 
    #include <errno.h> 
    #include <string.h> 
    #include <sys/types.h> 
    #include <netinet/in.h> 
    #include <sys/socket.h> 
    #include <sys/wait.h> 

    #define MYPORT 4950    /* the port users will be connecting to */

    #define MAXBUFLEN 100

    main()
    {
        int sockfd;
        struct sockaddr_in my_addr;    /* my address information */
        struct sockaddr_in their_addr; /* connector's address information */
        int addr_len, numbytes;
        char buf[MAXBUFLEN];

        if ((sockfd = socket(AF_INET, SOCK_DGRAM, 0)) == -1) {
            perror("socket");
            exit(1);
        }

        my_addr.sin_family = AF_INET;         /* host byte order */
        my_addr.sin_port = htons(MYPORT);     /* short, network byte order */
        my_addr.sin_addr.s_addr = INADDR_ANY; /* auto-fill with my IP */
        bzero(&(my_addr.sin_zero), 8);        /* zero the rest of the struct */

        if (bind(sockfd, (struct sockaddr *)&my_addr, sizeof(struct sockaddr)) \
                                                                       == -1) {
            perror("bind");
            exit(1);
        }

        addr_len = sizeof(struct sockaddr);
        if ((numbytes=recvfrom(sockfd, buf, MAXBUFLEN, 0, \
                           (struct sockaddr *)&their_addr, &addr_len)) == -1) {
            perror("recvfrom");
            exit(1);
        }

        printf("got packet from %s\n",inet_ntoa(their_addr.sin_addr));
        printf("packet is %d bytes long\n",numbytes);
        buf[numbytes] = '\0';
        printf("packet contains \"%s\"\n",buf);

        close(sockfd);
    }
Notice that in our call to socket() we're finally using SOCK_DGRAM. Also, note that there's no need to listen() or accept(). This is one of the perks of using unconnected datagram sockets!

Next comes the source for talker.c:

    #include <stdio.h> 
    #include <stdlib.h> 
    #include <errno.h> 
    #include <string.h> 
    #include <sys/types.h> 
    #include <netinet/in.h> 
    #include <netdb.h> 
    #include <sys/socket.h> 
    #include <sys/wait.h> 

    #define MYPORT 4950    /* the port users will be connecting to */

    int main(int argc, char *argv[])
    {
        int sockfd;
        struct sockaddr_in their_addr; /* connector's address information */
        struct hostent *he;
        int numbytes;

        if (argc != 3) {
            fprintf(stderr,"usage: talker hostname message\n");
            exit(1);
        }

        if ((he=gethostbyname(argv[1])) == NULL) {  /* get the host info */
            herror("gethostbyname");
            exit(1);
        }

        if ((sockfd = socket(AF_INET, SOCK_DGRAM, 0)) == -1) {
            perror("socket");
            exit(1);
        }

        their_addr.sin_family = AF_INET;      /* host byte order */
        their_addr.sin_port = htons(MYPORT);  /* short, network byte order */
        their_addr.sin_addr = *((struct in_addr *)he->h_addr);
        bzero(&(their_addr.sin_zero), 8);     /* zero the rest of the struct */

        if ((numbytes=sendto(sockfd, argv[2], strlen(argv[2]), 0, \
             (struct sockaddr *)&their_addr, sizeof(struct sockaddr))) == -1) {
            perror("sendto");
            exit(1);
        }

        printf("sent %d bytes to %s\n",numbytes,inet_ntoa(their_addr.sin_addr));

        close(sockfd);

        return 0;
    }
And that's all there is to it! Run listener on some machine, then run talker on another. Watch them communicate! Fun G-rated excitement for the entire nuclear family!

Except for one more tiny detail that I've mentioned many times in the past: connected datagram sockets. I need to talk about this here, since we're in the datagram section of the document. Let's say that talker calls connect() and specifies the listener's address. From that point on, talker may only sent to and receive from the address specified by connect(). For this reason, you don't have to use sendto() and recvfrom(); you can simply use send() and recv().


Blocking

Blocking. You've heard about it--now what the hell is it? In a nutshell, "block" is techie jargon for "sleep". You probably noticed that when you run listener, above, it just sits there until a packet arrives. What happened is that it called recvfrom(), there was no data, and so recvfrom() is said to "block" (that is, sleep there) until some data arrives.

Lots of functions block. accept() blocks. All the recv*() functions block. The reason they can do this is because they're allowed to. When you first create the socket descriptor with socket(), the kernel sets it to blocking. If you don't want a socket to be blocking, you have to make a call to fcntl():

    #include <unistd.h> 
    #include <fcntl.h> 
    .
    .
    sockfd = socket(AF_INET, SOCK_STREAM, 0);
    fcntl(sockfd, F_SETFL, O_NONBLOCK);
    .
    .

By setting a socket to non-blocking, you can effectively "poll" the socket for information. If you try to read from a non-blocking socket and there's no data there, it's not allowed to block--it will return -1 and errno will be set to EWOULDBLOCK.

Generally speaking, however, this type of polling is a bad idea. If you put your program in a busy-wait looking for data on the socket, you'll suck up CPU time like it was going out of style. A more elegant solution for checking to see if there's data waiting to be read comes in the following section on select().


select()--Synchronous I/O Multiplexing

This function is somewhat strange, but it's very useful. Take the following situation: you are a server and you want to listen for incoming connections as well as keep reading from the connections you already have.

No problem, you say, just an accept() and a couple of recv()s. Not so fast, buster! What if you're blocking on an accept() call? How are you going to recv() data at the same time? "Use non-blocking sockets!" No way! You don't want to be a CPU hog. What, then?

select() gives you the power to monitor several sockets at the same time. It'll tell you which ones are ready for reading, which are ready for writing, and which sockets have raised exceptions, if you really want to know that.

Without any further ado, I'll offer the synopsis of select():

       #include <sys/time.h> 
       #include <sys/types.h> 
       #include <unistd.h> 

       int select(int numfds, fd_set *readfds, fd_set *writefds,
                  fd_set *exceptfds, struct timeval *timeout);

The function monitors "sets" of file descriptors; in particular readfds, writefds, and exceptfds. If you want to see if you can read from standard input and some socket descriptor, sockfd, just add the file descriptors 0 and sockfd to the set readfds. The parameter numfds should be set to the values of the highest file descriptor plus one. In this example, it should be set to sockfd+1, since it is assuredly higher than standard input (0).

When select() returns, readfds will be modified to reflect which of the file descriptors you selected is ready for reading. You can test them with the macro FD_ISSET(), below.

Before progressing much further, I'll talk about how to manipulate these sets. Each set is of the type fd_set. The following macros operate on this type:

Finally, what is this weirded out struct timeval? Well, sometimes you don't want to wait forever for someone to send you some data. Maybe every 96 seconds you want to print "Still Going..." to the terminal even though nothing has happened. This time structure allows you to specify a timeout period. If the time is exceeded and select() still hasn't found any ready file descriptors, it'll return so you can continue processing.

The struct timeval has the follow fields:

    struct timeval {
        int tv_sec;     /* seconds */
        int tv_usec;    /* microseconds */
    };
Just set tv_sec to the number of seconds to wait, and set tv_usec to the number of microseconds to wait. Yes, that's microseconds, not milliseconds. There are 1,000 microseconds in a millisecond, and 1,000 milliseconds in a second. Thus, there are 1,000,000 microseconds in a second. Why is it "usec"? The "u" is supposed to look like the Greek letter Mu that we use for "micro". Also, when the function returns, timeout might be updated to show the time still remaining. This depends on what flavor of Unix you're running.

Yay! We have a microsecond resolution timer! Well, don't count on it. Standard Unix timeslice is 100 milliseconds, so you'll probably have to wait at least that long, no matter how small you set your struct timeval.

Other things of interest: If you set the fields in your struct timeval to 0, select() will timeout immediately, effectively polling all the file descriptors in your sets. If you set the parameter timeout to NULL, it will never timeout, and will wait until the first file descriptor is ready. Finally, if you don't care about waiting for a certain set, you can just set it to NULL in the call to select().

The following code snippet waits 2.5 seconds for something to appear on standard input:

       #include <sys/time.h> 
       #include <sys/types.h> 
       #include <unistd.h> 

       #define STDIN 0  /* file descriptor for standard input */

       main()
       {
           struct timeval tv;
           fd_set readfds;

           tv.tv_sec = 2;
           tv.tv_usec = 500000;

           FD_ZERO(&readfds);
           FD_SET(STDIN, &readfds);

           /* don't care about writefds and exceptfds: */
           select(STDIN+1, &readfds, NULL, NULL, &tv);

           if (FD_ISSET(STDIN, &readfds))
               printf("A key was pressed!\n");
           else
               printf("Timed out.\n");
       }
If you're on a line buffered terminal, the key you hit should be RETURN or it will time out anyway.

Now, some of you might think this is a great way to wait for data on a datagram socket--and you are right: it might be. Some Unices can use select in this manner, and some can't. You should see what your local man page says on the matter if you want to attempt it.

One final note of interest about select(): if you have a socket that is listen()'ing, you can check to see if there is a new connection by putting that socket's file descriptor in the readfds set.

And that, my friends, is a quick overview of the almighty select() function.


More References

You've come this far, and now you're screaming for more! Where else can you go to learn more about all this stuff?

Try the following man pages, for starters:

Also, look up the following books:


Internetworking with TCP/IP, volumes I-III by Douglas E. Comer and David L. Stevens. Published by Prentice Hall. Second edition ISBNs: 0-13-468505-9, 0-13-472242-6, 0-13-474222-2. There is a third edition of this set which covers IPv6 and IP over ATM.

Using C on the UNIX System by David A. Curry. Published by O'Reilly & Associates, Inc. ISBN 0-937175-23-4.

TCP/IP Network Administration by Craig Hunt. Published by O'Reilly & Associates, Inc. ISBN 0-937175-82-X.

TCP/IP Illustrated, volumes 1-3 by W. Richard Stevens and Gary R. Wright. Published by Addison Wesley. ISBNs: 0-201-63346-9, 0-201-63354-X, 0-201-63495-3.

Unix Network Programming by W. Richard Stevens. Published by Prentice Hall. ISBN 0-13-949876-1.

On the web:


BSD Sockets: A Quick And Dirty Primer
(http://www.cs.umn.edu/~bentlema/unix/--has other great Unix system programming info, too!)

Client-Server Computing
(http://pandonia.canberra.edu.au/ClientServer/socket.html)

Intro to TCP/IP (gopher)
(gopher://gopher-chem.ucdavis.edu/11/Index/Internet_aw/Intro_the_Internet/intro.to.ip/)

Internet Protocol Frequently Asked Questions (France)
(http://web.cnam.fr/Network/TCP-IP/)

The Unix Socket FAQ
(http://www.ibrado.com/sock-faq/)

RFCs--the real dirt:


RFC-768 -- The User Datagram Protocol (UDP)
(ftp://nic.ddn.mil/rfc/rfc768.txt)

RFC-791 -- The Internet Protocol (IP)
(ftp://nic.ddn.mil/rfc/rfc791.txt)

RFC-793 -- The Transmission Control Protocol (TCP)
(ftp://nic.ddn.mil/rfc/rfc793.txt)

RFC-854 -- The Telnet Protocol
(ftp://nic.ddn.mil/rfc/rfc854.txt)

RFC-951 -- The Bootstrap Protocol (BOOTP)
(ftp://nic.ddn.mil/rfc/rfc951.txt)

RFC-1350 -- The Trivial File Transfer Protocol (TFTP)
(ftp://nic.ddn.mil/rfc/rfc1350.txt)


Disclaimer and Call for Help

Well, that's the lot of it. Hopefully at least some of the information contained within this document has been remotely accurate and I sincerely hope there aren't any glaring errors. Well, sure, there always are.

So, if there are, that's tough for you. I'm sorry if any inaccuracies contained herein have caused you any grief, but you just can't hold me accountable. See, I don't stand behind a single word of this document, legally speaking. This is my warning to you: the whole thing could be a load of crap.

But it's probably not. After all, I've spent many many hours messing with this stuff, and implemented several TCP/IP network utilities for Windows (including Telnet) as summer work. I'm not the sockets god; I'm just some guy.

By the way, if anyone has any constructive (or destructive) criticism about this document, please send mail to beej@ecst.csuchico.edu and I'll try to make an effort to set the record straight.

In case you're wondering why I did this, well, I did it for the money. Hah! No, really, I did it because a lot of people have asked me socket-related questions and when I tell them I've been thinking about putting together a socket page, they say, "cool!" Besides, I feel that all this hard-earned knowledge is going to waste if I can't share it with others. WWW just happens to be the perfect vehicle. I encourage others to provide similar information whenever possible.

Enough of this--back to coding! ;-)


Copyright © 1995, 1996 by Brian "Beej" Hall. This guide may be reprinted in any medium provided that its content is not altered, it is presented in its entirety, and this copyright notice remains intact. Contact beej@ecst.csuchico.edu for more information. file descriptors you selected is ready for reading. You can test them with the macro FD_ISSET()perl-sock-example.html010064400022570000215000000006640666314021700154110ustar00chodiglib00000400000073
#/usr/local/bin/perl

# create a socket
$proto = getprotobyname('tcp');
socket(SOCK, PF_INET, SOCK_STREAM, $proto);

# connect to the server
$iaddr = gethostbyname($name);
$port  = getservbyname('http', 'tcp');
$sin = sockaddr_in($port, $iaddr);
connect(SOCK, $sin);

# send request
$request = "HEAD / HTTP/1.0\r\nUser-Agent: Poong\r\n\r\n";
syswrite SOCK, $request, length($request);
# read result
@_ = ;
print @_;

perl-sock-example.html~010064400022570000215000000006240666314012300155770ustar00chodiglib00000400000073

# create a socket
$proto = getprotobyname('tcp');
socket(SOCK, PF_INET, SOCK_STREAM, $proto);

# connect to the server
$iaddr = gethostbyname($name);
$port  = getservbyname('http', 'tcp');
$sin = sockaddr_in($port, $iaddr);
connect(SOCK, $sin);

# send request
$request = "HEAD / HTTP/1.0\r\nUser-Agent: Poong\r\n\r\n";
syswrite SOCK, $request, length($request);
# read result
@_ = ;
server.pl010064400022570000215000000021400666431225600130320ustar00chodiglib00000400000073!/usr/bin/perl -w # server1.pl - a simple server use strict; use Socket; # use port 7890 as default my $port = shift || 7890; my $proto = getprotobyname('tcp'); # create a socket, make it reusable socket(SERVER, PF_INET, SOCK_STREAM, $proto) or die "socket: $!"; setsockopt(SERVER, SOL_SOCKET, SO_REUSEADDR, 1) or die "setsock: $!"; # grab a port on this machine my $paddr = sockaddr_in($port, INADDR_ANY); # bind to a port, then listen bind(SERVER, $paddr) or die "bind: $!"; listen(SERVER, SOMAXCONN) or die "listen: $!"; print "SERVER started on port $port\n"; # for each connection... my $client_addr; while ($client_addr = accept(CLIENT, SERVER)) { # find out who connected my ($client_port, $client_ip) = sockaddr_in($client_addr); my $client_ipnum = inet_ntoa($client_ip); my $client_host = gethostbyaddr($client_ip, AF_INET); # tell who connected print "got a connection from: $client_host", " [$client_ipnum]\n"; # send them a message, close connection print CLIENT "Hello from the server: ", close CLIENT; } shell.html010064400022570000215000000442270660616727500132050ustar00chodiglib00000400000073 How to write a shell script

How to write a shell script

Introduction

A shell is a command line interpretor. It takes commands and executes them. As such, it implements a programming language. The Bourne shell is used to create shell scripts -- ie. programs that are interpreted/executed by the shell. You can write shell scripts with the C-shell; however, this is not covered here.

Creating a Script

Suppose you often type the command
    find . -name file -print
and you'd rather type a simple command, say
    sfind file
Create a shell script
    % cd ~/bin
    % emacs sfind
    % page sfind
    find . -name $1 -print
    % chmod a+x sfind
    % rehash
    % cd /usr/local/bin
    % sfind tcsh
    ./shells/tcsh

Observations

This quick example is far from adequate but some observations:
  1. Shell scripts are simple text files created with an editor.
  2. Shell scripts are marked as executeable
        %chmod a+x sfind
    
  3. Should be located in your search path and ~/bin should be in your search path.
  4. You likely need to rehash if you're a Csh (tcsh) user (but not again when you login).
  5. Arguments are passed from the command line and referenced. For example, as $1.

#!/bin/sh

All Bourne Shell scripts should begin with the sequence
    #!/bin/sh
From the man page for exec(2):

"On the first line of an interpreter script, following the "#!", is the name of a program which should be used to interpret the contents of the file. For instance, if the first line contains "#! /bin/sh", then the con- tents of the file are executed as a shell script."

You can get away without this, but you shouldn't. All good scripts state the interpretor explicitly. Long ago there was just one (the Bourne Shell) but these days there are many interpretors -- Csh, Ksh, Bash, and others.

Comments

Comments are any text beginning with the pound (#) sign. A comment can start anywhere on a line and continue until the end of the line.

Search Path

All shell scripts should include a search path specifica- tion:
    PATH=/usr/ucb:/usr/bin:/bin; export PATH
A PATH specification is recommended -- often times a script will fail for some people because they have a different or incomplete search path.

The Bourne Shell does not export environment variables to children unless explicitly instructed to do so by using the export command.

Argument Checking

A good shell script should verify that the arguments sup- plied (if any) are correct.

    if [ $# -ne 3 ]; then
         echo 1>&2 Usage: $0 19 Oct 91
         exit 127
    fi
This script requires three arguments and gripes accordingly.

Exit status

All Unix utilities should return an exit status.
    # is the year out of range for me?

    if [ $year -lt 1901  -o  $year -gt 2099 ]; then
         echo 1>&2 Year \"$year\" out of range
         exit 127
    fi

    etc...

    # All done, exit ok

    exit 0
A non-zero exit status indicates an error condition of some sort while a zero exit status indicates things worked as expected.

On BSD systems there's been an attempt to categorize some of the more common exit status codes. See /usr/include/sysexits.h.

Using exit status

Exit codes are important for those who use your code. Many constructs test on the exit status of a command.

The conditional construct is:

    if command; then
         command
    fi
For example,
    if tty -s; then
         echo Enter text end with \^D
    fi
Your code should be written with the expectation that others will use it. Making sure you return a meaningful exit status will help.

Stdin, Stdout, Stderr

Standard input, output, and error are file descriptors 0, 1, and 2. Each has a particular role and should be used accordingly:
    # is the year out of range for me?

    if [ $year -lt 1901  -o  $year -gt 2099 ]; then
         echo 1>&2 Year \"$year\" out of my range
         exit 127
    fi

    etc...

    # ok, you have the number of days since Jan 1, ...

    case `expr $days % 7` in
    0)
         echo Mon;;
    1)
         echo Tue;;

    etc...
Error messages should appear on stderr not on stdout! Output should appear on stdout. As for input/output dialogue:
    # give the fellow a chance to quit

    if tty -s ; then
         echo This will remove all files in $* since ...
         echo $n Ok to procede? $c;      read ans
         case "$ans" in
              n*|N*)
    echo File purge abandoned;
    exit 0   ;;
         esac
         RM="rm -rfi"
    else
         RM="rm -rf"
    fi
Note: this code behaves differently if there's a user to communicate with (ie. if the standard input is a tty rather than a pipe, or file, or etc. See tty(1)).

Language Constructs

Some Tricks

PRE> The value of the variable "p_01".

${p}_01
The value of the variable "p" with "_01" pasted onto the end.

  • Conditional Reference

    ${variable-word}
    
    If the variable has been set, use it's value, else use word.
    POSTSCRIPT=${POSTSCRIPT-PostScript};
    export POSTSCRIPT
    
    ${variable:-word}
    
    If the variable unix-socket-faq-1.html010064400022570000215000000143420663340331100152250ustar00chodiglib00000400000073 Programming UNIX Sockets in C - Frequently Asked Questions: General Information and Concepts Previous Next Table of Contents

    1. General Information and Concepts

    1.1 What's new?

    Changed on May 21/1998

    • 2.12. Where can a get a library for programming sockets? (fixed a couple of links)
    • 7. Sample Source Code (put examples on a new ftp server)

    1.2 About this FAQ

    This FAQ is maintained by Vic Metcalfe ( vic@acm.org), with lots of assistance from Andrew Gierth ( andrew@erlenstar.demon.co.uk). I am depending on the true wizards to fill in the details, and correct my (no doubt) plentiful mistakes. The code examples in this FAQ are written to be easy to follow and understand. It is up to the reader to make them as efficient as required. I started this faq because after reading comp.unix.programmer for a short time, it became evident that a FAQ for sockets was needed.

    The FAQ is available at the following locations:

    Usenet: (Posted on the 21st of each month)

    news.answers, comp.answers, comp.unix.answers, comp.unix.programmer

    FTP:

    ftp://rtfm.mit.edu/pub/usenet/news.answers/unix-faq/socket

    WWW:

    http://www.ibrado.com/sock-faq http://kipper.york.ac.uk/~vic/sock-faq http://www.ntua.gr/sock-faq

    Please email me if you would like to correct or clarify an answer. I would also like to hear from you if you would like me to add a question to the list. I may not be able to answer it, but I can add it in the hopes that someone else will submit an answer. Every hour I seem to be getting even busier, so if I am slow to respond to your email, please be patient. If more than a week passes you may want to send me another one as I often put messages aside for later and then forget about them. I'll have to work on dealing with my mail better, but until then feel free to pester me a little bit.

    1.3 Who is this FAQ for?

    This FAQ is for C programmers in the Unix environment. It is not intended for WinSock programmers, or for Perl, Java, etc. I have nothing against Windows or Perl, but I had to limit the scope of the FAQ for the first draft. In the future, I would really like to provide examples for Perl, Java, and maybe others. For now though I will concentrate on correctness and completeness for C.

    This version of the FAQ will only cover sockets of the AF_INET family, since this is their most common use. Coverage of other types of sockets may be added later.

    1.4 What are Sockets?

    Sockets are just like "worm holes" in science fiction. When things go into one end, they (should) come out of the other. Different kinds of sockets have different properties. Sockets are either connection-oriented or connectionless. Connection-oriented sockets allow for data to flow back and forth as needed, while connectionless sockets (also known as datagram sockets) allow only one message at a time to be transmitted, without an open connection. There are also different socket families. The two most common are AF_INET for internet connections, and AF_UNIX for unix IPC (interprocess communication). As stated earlier, this FAQ deals only with AF_INET sockets.

    1.5 How do Sockets Work?

    The implementation is left up to the vendor of your particular unix, but from the point of view of the programmer, connection-oriented sockets work a lot like files, or pipes. The most noticeable difference, once you have your file descriptor is that read() or write() calls may actually read or write fewer bytes than requested. If this happens, then you will have to make a second call for the rest of the data. There are examples of this in the source code that accompanies the faq.

    1.6 Where can I get source code for the book [book title]?

    Here is a list of the places I know to get source code for network programming books. It is very short, so please mail me with any others you know of.

    Title: Unix Network Programming
    Author: W. Richard Stevens ( rstevens@noao.edu)
    Publisher: Prentice Hall, Inc.
    ISBN: 0-13-949876-1
    URL: http://www.noao.edu/~rstevens

    Title: Power Programming with RPC
    Author: John Bloomer
    Publisher: O'Reilly & Associates, Inc.
    ISBN: 0-937175-77-3
    URL: ftp://ftp.uu.net/published/oreilly/nutshell/rpc/rpc.tar.Z

    Recommended by: Lokmanm Merican (lokmanm#pop4.jaring.my@199.1.1.88)
    Title: UNIX PROGRAM DEVELOPMENT for IBM PC'S Including OSF/Motif
    Author: Thomas Yager
    Publisher: Addison Wesley, 1991
    ISBN: 0-201-57727-5

    1.7 Where can I get more information?

    I keep a copy of the resources I know of on my socks page on the web. I don't remember where I got most of these items, but some day I'll check out their sources, and provide ftp information here. For now, you can get them at http://www.ibrado.com/sock-faq.

    There is a good TCP/IP FAQ maintained by George Neville-Neil ( gnn@wrs.com) which can be found at http://www.visi.com/~khayes/tcpipfaq.html


    Previous Next Table of Contents than a week passes you may want to send me another one as I often put messages aside for later and then forget about them. I'll have to work on dealing with my mail better, but until then feel free to pester me a little bit.

    1.3 Who is this FAQ for? Programming UNIX Sockets in C - Frequently Asked Questions: Questions regarding both Clients and Servers (TCP/SOCK_STREAM) Previous Next Table of Contents

    2. Questions regarding both Clients and Servers (TCP/SOCK_STREAM)

    2.1 How can I tell when a socket is closed on the other end?

    From Andrew Gierth ( andrew@erlenstar.demon.co.uk):

    AFAIK:

    If the peer calls close() or exits, without having messed with SO_LINGER, then our calls to read() should return 0. It is less clear what happens to write() calls in this case; I would expect EPIPE, not on the next call, but the one after.

    If the peer reboots, or sets l_onoff = 1, l_linger = 0 and then closes, then we should get ECONNRESET (eventually) from read(), or EPIPE from write().

    I should also point out that when write() returns EPIPE, it also raises the SIGPIPE signal - you never see the EPIPE error unless you handle or ignore the signal.

    If the peer remains unreachable, we should get some other error.

    I don't think that write() can legitimately return 0. read() should return 0 on receipt of a FIN from the peer, and on all following calls.

    So yes, you must expect read() to return 0.

    As an example, suppose you are receiving a file down a TCP link; you might handle the return from read() like this:

    rc = read(sock,buf,sizeof(buf));
    if (rc > 0)
    {
        write(file,buf,rc);
        /* error checking on file omitted */
    }
    else if (rc == 0)
    {
        close(file);
        close(sock);
        /* file received successfully */
    }
    else /* rc < 0 */
    {
        /* close file and delete it, since data is not complete
           report error, or whatever */
    }
    

    2.2 What's with the second parameter in bind()?

    The man page shows it as "struct sockaddr *my_addr". The sockaddr struct though is just a place holder for the structure it really wants. You have to pass different structures depending on what kind of socket you have. For an AF_INET socket, you need the sockaddr_in structure. It has three fields of interest:

    sin_family

    Set this to AF_INET.

    sin_port

    The network byte-ordered 16 bit port number

    sin_addr

    The host's ip number. This is a struct in_addr, which contains only one field, s_addr which is a u_long.

    2.3 How do I get the port number for a given service?

    Use the getservbyname() routine. This will return a pointer to a servent structure. You are interested in the s_port field, which contains the port number, with correct byte ordering (so you don't need to call htons() on it). Here is a sample routine:

    /* Take a service name, and a service type, and return a port number.  If the
       service name is not found, it tries it as a decimal number.  The number
       returned is byte ordered for the network. */
    int atoport(char *service, char *proto) {
      int port;
      long int lport;
      struct servent *serv;
      char *errpos;
    
      /* First try to read it from /etc/services */
      serv = getservbyname(service, proto);
      if (serv != NULL)
        port = serv->s_port;
      else { /* Not in services, maybe a number? */
        lport = strtol(service,&errpos,0);
        if ( (errpos[0] != 0) || (lport < 1) || (lport > 5000) )
          return -1; /* Invalid port address */
        port = htons(lport);
      }
      return port;
    }
    

    2.4 If bind() fails, what should I do with the socket descriptor?

    If you are exiting, I have been assured by Andrew that all unixes will close open file descriptors on exit. If you are not exiting though, you can just close it with a regular close() call.

    2.5 How do I properly close a socket?

    This question is usually asked by people who try close(), because they have seen that that is what they are supposed to do, and then run netstat and see that their socket is still active. Yes, close() is the correct method. To read about the TIME_WAIT state, and why it is important, refer to 2.7 Please explain the TIME_WAIT state..

    2.6 When should I use shutdown()?

    From Michael Hunter ( mphunter@qnx.com):

    shutdown() is useful for deliniating when you are done providing a request to a server using TCP. A typical use is to send a request to a server followed by a shutdown(). The server will read your request followed by an EOF (read of 0 on most unix implementations). This tells the server that it has your full request. You then go read blocked on the socket. The server will process your request and send the necessary data back to you followed by a close. When you have finished reading all of the response to your request you will read an EOF thus signifying that you have the whole response. It should be noted the TTCP (TCP for Transactions -- see R. Steven's home page) provides for a better method of tcp transaction management.

    S.Degtyarev ( deg@sunsr.inp.nsk.su) wrote a nice in-depth message to me about this. He shows a practical example of using shutdown() to aid in synchronization of client processes when one is the "reader" process, and the other is the "writer" process. A portion of his message follows:

    Sockets are very similar to pipes in the way they are used for data transfer and client/server transactions, but not like pipes they are bidirectional. Programs that use sockets often fork() and each process inherits the socket descriptor. In pipe based programs it is strictly recommended to close all the pipe ends that are not used to convert the pipe line to one-directional data stream to avoid data losses and deadlocks. With the socket there is no way to allow one process only to send data and the other only to receive so you should always keep in mind the consequences.

    Generally the difference between close() and shutdown() is: close() closes the socket id for the process but the connection is still opened if another process shares this socket id. The connection stays opened both for read and write, and sometimes this is very important. shutdown() breaks the connection for all processes sharing the socket id. Those who try to read will detect EOF, and those who try to write will reseive SIGPIPE, possibly delayed while the kernel socket buffer will be filled. Additionally, shutdown() has a second argument which denotes how to close the connection: 0 means to disable further reading, 1 to disable writing and 2 disables both.

    The quick example below is a fragment of a very simple client process. After establishing the connection with the server it forks. Then child sends the keyboard input to the server until EOF is received and the parent receives answers from the server.

    /*
     *      Sample client fragment,
     *      variables declarations and error handling are omitted
     */
            s=connect(...);
    
            if( fork() ){   /*      The child, it copies its stdin to
                                            the socket              */
                    while( gets(buffer) >0)
                            write(s,buf,strlen(buffer));
    
                    close(s);
                    exit(0);
                    }
    
            else {          /* The parent, it receives answers  */
                    while( (l=read(s,buffer,sizeof(buffer)){
                            do_something(l,buffer);
    
                    /* Connection break from the server is assumed  */
                    /* ATTENTION: deadlock here                     */
                    wait(0); /* Wait for the child to exit          */
                    exit(0);
                    }
    

    What do we expect? The child detects an EOF from its stdin, it closes the socket (assuming connection break) and exits. The server in its turn detects EOF, closes connection and exits. The parent detects EOF, makes the wait() system call and exits. What do we see instead? The socket instance in the parent process is still opened for writing and reading, though the parent never writes. The server never detects EOF and waits for more data from the client forever. The parent never sees the connection is closed and hangs forever and the server hangs too. Unexpected deadlock! ( any deadlock is unexpected though :-)

    You should change the client fragment as follows:

                    if( fork() ) {  /* The child                    */
                            while( gets(buffer) }
                                    write(s,buffer,strlen(buffer));
    
                                    shutdown(s,1); /* Break the connection
            for writing, The server will detect EOF now. Note: reading from
            the socket is still allowed. The server may send some more data
            after receiving EOF, why not? */
                            exit(0);
                            }
    

    I hope this rough example explains the troubles you can have with client/server syncronization. Generally you should always remember all the instances of the particular socket in all the processes that share the socket and close them all at once if you whish to use close() or use shutdown() in one process to break the connection.

    2.7 Please explain the TIME_WAIT state.

    Remember that TCP guarantees all data transmitted will be delivered, if at all possible. When you close a socket, the server goes into a TIME_WAIT state, just to be really really sure that all the data has gone through. When a socket is closed, both sides agree by sending messages to each other that they will send no more data. This, it seemed to me was good enough, and after the handshaking is done, the socket should be closed. The problem is two-fold. First, there is no way to be sure that the last ack was communicated successfully. Second, there may be "wandering duplicates" left on the net that must be dealt with if they are delivered.

    Andrew Gierth ( andrew@erlenstar.demon.co.uk) helped to explain the closing sequence in the following usenet posting:

    Assume that a connection is in ESTABLISHED state, and the client is about to do an orderly release. The client's sequence no. is Sc, and the server's is Ss. The pipe is empty in both directions.

       Client                                                   Server
       ======                                                   ======
       ESTABLISHED                                              ESTABLISHED
       (client closes)
       ESTABLISHED                                              ESTABLISHED
                    <CTL=FIN+ACK><SEQ=Sc><ACK=Ss> ------->>
       FIN_WAIT_1
                    <<-------- <CTL=ACK><SEQ=Ss><ACK=Sc+1>
       FIN_WAIT_2                                               CLOSE_WAIT
                    <<-------- <CTL=FIN+ACK><SEQ=Ss><ACK=Sc+1>  (server closes)
                                                                LAST_ACK
                    <CTL=ACK>,<SEQ=Sc+1><ACK=Ss+1> ------->>
       TIME_WAIT                                                CLOSED
       (2*msl elapses...)
       CLOSED
    

    Note: the +1 on the sequence numbers is because the FIN counts as one byte of data. (The above diagram is equivalent to fig. 13 from RFC 793).

    Now consider what happens if the last of those packets is dropped in the network. The client has done with the connection; it has no more data or control info to send, and never will have. But the server does not know whether the client received all the data correctly; that's what the last ACK segment is for. Now the server may or may not care whether the client got the data, but that is not an issue for TCP; TCP is a reliable rotocol, and must distinguish between an orderly connection close where all data is transferred, and a connection abort where data may or may not have been lost.

    So, if that last packet is dropped, the server will retransmit it (it is, after all, an unacknowledged segment) and will expect to see a suitable ACK segment in reply. If the client went straight to CLOSED, the only possible response to that retransmit would be a RST, which would indicate to the server that data had been lost, when in fact it had not been.

    (Bear in mind that the server's FIN segment may, additionally, contain data.)

    DISCLAIMER: This is my interpretation of the RFCs (I have read all the TCP-related ones I could find), but I have not attempted to examine implementation source code or trace actual connections in order to verify it. I am satisfied that the logic is correct, though.

    More commentarty from Vic:

    The second issue was addressed by Richard Stevens ( rstevens@noao.edu, author of "Unix Network Programming", see 1.6 Where can I get source code for the book [book title]?). I have put together quotes from some of his postings and email which explain this. I have brought together paragraphs from different postings, and have made as few changes as possible.

    From Richard Stevens ( rstevens@noao.edu):

    If the duration of the TIME_WAIT state were just to handle TCP's full-duplex close, then the time would be much smaller, and it would be some function of the current RTO (retransmission timeout), not the MSL (the packet lifetime).

    A couple of points about the TIME_WAIT state.

    • The end that sends the first FIN goes into the TIME_WAIT state, because that is the end that sends the final ACK. If the other end's FIN is lost, or if the final ACK is lost, having the end that sends the first FIN maintain state about the connection guarantees that it has enough information to retransmit the final ACK.
    • Realize that TCP sequence numbers wrap around after 2**32 bytes have been transferred. Assume a connection between A.1500 (host A, port 1500) and B.2000. During the connection one segment is lost and retransmitted. But the segment is not really lost, it is held by some intermediate router and then re-injected into the network. (This is called a "wandering duplicate".) But in the time between the packet being lost & retransmitted, and then reappearing, the connection is closed (without any problems) and then another connection is established between the same host, same port (that is, A.1500 and B.2000; this is called another "incarnation" of the connection). But the sequence numbers chosen for the new incarnation just happen to overlap with the sequence number of the wandering duplicate that is about to reappear. (This is indeed possible, given the way sequence numbers are chosen for TCP connections.) Bingo, you are about to deliver the data from the wandering duplicate (the previous incarnation of the connection) to the new incarnation of the connection. To avoid this, you do not allow the same incarnation of the connection to be reestablished until the TIME_WAIT state terminates. Even the TIME_WAIT state doesn't complete solve the second problem, given what is called TIME_WAIT assassination. RFC 1337 has more details.
    • The reason that the duration of the TIME_WAIT state is 2*MSL is that the maximum amount of time a packet can wander around a network is assumed to be MSL seconds. The factor of 2 is for the round-trip. The recommended value for MSL is 120 seconds, but Berkeley-derived implementations normally use 30 seconds instead. This means a TIME_WAIT delay between 1 and 4 minutes. Solaris 2.x does indeed use the recommended MSL of 120 seconds.

    A wandering duplicate is a packet that appeared to be lost and was retransmitted. But it wasn't really lost ... some router had problems, held on to the packet for a while (order of seconds, could be a minute if the TTL is large enough) and then re-injects the packet back into the network. But by the time it reappears, the application that sent it originally has already retransmitted the data contained in that packet.

    Because of these potential problems with TIME_WAIT assassinations, one should not avoid the TIME_WAIT state by setting the SO_LINGER option to send an RST instead of the normal TCP connection termination (FIN/ACK/FIN/ACK). The TIME_WAIT state is there for a reason; it's your friend and it's there to help you :-)

    I have a long discussion of just this topic in my just-released "TCP/IP Illustrated, Volume 3". The TIME_WAIT state is indeed, one of the most misunderstood features of TCP.

    I'm currently rewriting "Unix Network Programming" (see 1.6 Where can I get source code for the book [book title]?). and will include lots more on this topic, as it is often confusing and misunderstood.

    An additional note from Andrew:

    Closing a socket: if SO_LINGER has not been called on a socket, then close() is not supposed to discard data. This is true on SVR4.2 (and, apparently, on all non-SVR4 systems) but apparently not on SVR4; the use of either shutdown() or SO_LINGER seems to be required to guarantee delivery of all data.

    2.8 Why does it take so long to detect that the peer died?

    From Andrew Gierth ( andrew@erlenstar.demon.co.uk):

    Because by default, no packets are sent on the TCP connection unless there is data to send or acknowledge.

    So, if you are simply waiting for data from the peer, there is no way to tell if the peer has silently gone away, or just isn't ready to send any more data yet. This can be a problem (especially if the peer is a PC, and the user just hits the Big Switch...).

    One solution is to use the SO_KEEPALIVE option. This option enables periodic probing of the connection to ensure that the peer is still present. BE WARNED: the default timeout for this option is AT LEAST 2 HOURS. This timeout can often be altered (in a system-dependent fashion) but not normally on a per-connection basis (AFAIK).

    RFC1122 specifies that this timeout (if it exists) must be configurable. On the majority of Unix variants, this configuration may only be done globally, affecting all TCP connections which have keepalive enabled. The method of changing the value, moreover, is often difficult and/or poorly documented, and in any case is different for just about every version in existence.

    If you must change the value, look for something resembling tcp_keepidle in your kernel configuration or network options configuration.

    If you're sending to the peer, though, you have some better guarantees; since sending data implies receiving ACKs from the peer, then you will know after the retransmit timeout whether the peer is still alive. But the retransmit timeout is designed to allow for various contingencies, with the intention that TCP connections are not dropped simply as a result of minor network upsets. So you should still expect a delay of several minutes before getting notification of the failure.

    The approach taken by most application protocols currently in use on the Internet (e.g. FTP, SMTP etc.) is to implement read timeouts on the server end; the server simply gives up on the client if no requests are received in a given time period (often of the order of 15 minutes). Protocols where the connection is maintained even if idle for long periods have two choices:

    1. use SO_KEEPALIVE
    2. use a higher-level keepalive mechanism (such as sending a null request to the server every so often).

    2.9 What are the pros/cons of select(), non-blocking I/O and SIGIO?

    Using non-blocking I/O means that you have to poll sockets to see if there is data to be read from them. Polling should usually be avoided since it uses more CPU time than other techniques.

    Using SIGIO allows your application to do what it does and have the operating system tell it (with a signal) that there is data waiting for it on a socket. The only drawback to this soltion is that it can be confusing, and if you are dealing with multiple sockets you will have to do a select() anyway to find out which one(s) is ready to be read.

    Using select() is great if your application has to accept data from more than one socket at a time since it will block until any one of a number of sockets is ready with data. One other advantage to select() is that you can set a time-out value after which control will be returned to you whether any of the sockets have data for you or not.

    2.10 Why do I get EPROTO from read()?

    From Steve Rago ( sar@plc.com):

    EPROTO means that the protocol encountered an unrecoverable error for that endpoint. EPROTO is one of those catch-all error codes used by STREAMS-based drivers when a better code isn't available.

    And an addition note from Andrew ( andrew@erlenstar.demon.co.uk):

    Not quite to do with EPROTO from read(), but I found out once that on some STREAMS-based implementations, EPROTO could be returned by accept() if the incoming connection was reset before the accept completes.

    On some other implementations, accept seemed to be capable of blocking if this occured. This is important, since if select() said the listening socket was readable, then you would normally expect not to block in the accept() call. The fix is, of course, to set nonblocking mode on the listening socket if you are going to use select() on it.

    2.11 How can I force a socket to send the data in its buffer?

    From Richard Stevens ( rstevens@noao.edu):

    You can't force it. Period. TCP makes up its own mind as to when it can send data. Now, normally when you call write() on a TCP socket, TCP will indeed send a segment, but there's no guarantee and no way to force this. There are lots of reasons why TCP will not send a segment: a closed window and the Nagle algorithm are two things to come immediately to mind.

    (Snipped suggestion from Andrew Gierth to use TCP_NODELAY)

    Setting this only disables one of the many tests, the Nagle algorithm. But if the original poster's problem is this, then setting this socket option will help.

    A quick glance at tcp_output() shows around 11 tests TCP has to make as to whether to send a segment or not.

    Now from Dr. Charles E. Campbell Jr. ( cec@gryphon.gsfc.nasa.gov):

    As you've surmised, I've never had any problem with disabling Nagle's algorithm. Its basically a buffering method; there's a fixed overhead for all packets, no matter how small. Hence, Nagle's algorithm collects small packets together (no more than .2sec delay) and thereby reduces the amount of overhead bytes being transferred. This approach works well for rcp, for example: the .2 second delay isn't humanly noticeable, and multiple users have their small packets more efficiently transferred. Helps in university settings where most folks using the network are using standard tools such as rcp and ftp, and programs such as telnet may use it, too.

    However, Nagle's algorithm is pure havoc for real-time control and not much better for keystroke interactive applications (control-C, anyone?). It has seemed to me that the types of new programs using sockets that people write usually do have problems with small packet delays. One way to bypass Nagle's algorithm selectively is to use "out-of-band" messaging, but that is limited in its content and has other effects (such as a loss of sequentiality) (by the way, out-of-band is often used for that ctrl-C, too).

    More from Vic:

    So to sum it all up, if you are having trouble and need to flush the socket, setting the TCP_NODELAY option will usually solve the problem. If it doesn't, you will have to use out-of-band messaging, but according to Andrew, "out-of-band data has its own problems, and I don't think it works well as a solution to buffering delays (haven't tried it though). It is not 'expedited data' in the sense that exists in some other protocols; it is transmitted in-stream, but with a pointer to indicate where it is."

    I asked Andrew something to the effect of "What promises does TCP make about when it will get around to writing data to the network?" I thought his reply should be put under this question:

    Not many promises, but some.

    I'll try and quote chapter and verse on this:

    References:

    RFC 1122, "Requirements for Internet Hosts" (also STD 3)
    RFC 793, "Transmission Control Protocol" (also STD 7)

    1. The socket interface does not provide access to the TCP PUSH flag.
    2. RFC1122 says (4.2.2.2): A TCP MAY implement PUSH flags on SEND calls. If PUSH flags are not implemented, then the sending TCP: (1) must not buffer data indefinitely, and (2) MUST set the PSH bit in the last buffered segment (i.e., when there is no more queued data to be sent).
    3. RFC793 says (2.8): When a receiving TCP sees the PUSH flag, it must not wait for more data from the sending TCP before passing the data to the receiving process. [RFC1122 supports this statement.]
    4. Therefore, data passed to a write() call must be delivered to the peer within a finite time, unless prevented by protocol considerations.
    5. There are (according to a post from Stevens quoted in the FAQ [earlier in this answer - Vic]) about 11 tests made which could delay sending the data. But as I see it, there are only 2 that are significant, since things like retransmit backoff are a) not under the programmers control and b) must either resolve within a finite time or drop the connection.

    The first of the interesting cases is "window closed" (ie. there is no buffer space at the receiver; this can delay data indefinitely, but only if the receiving process is not actually reading the data that is available)

    Vic asks:

    OK, it makes sense that if the client isn't reading, the data isn't going to make it across the connection. I take it this causes the sender to block after the recieve queue is filled?

    The sender blocks when the socket send buffer is full, so buffers will be full at both ends.

    While the window is closed, the sending TCP sends window probe packets. This ensures that when the window finally does open again, the sending TCP detects the fact. [RFC1122, ss 4.2.2.17]

    The second interesting case is "Nagle algorithm" (small segments, e.g. keystrokes, are delayed to form larger segments if ACKs are expected from the peer; this is what is disabled with TCP_NODELAY)

    Vic Asks:

    Does this mean that my tcpclient sample should set TCP_NODELAY to ensure that the end-of-line code is indeed put out onto the network when sent?

    No. tcpclient.c is doing the right thing as it stands; trying to write as much data as possible in as few calls to write() as is feasible. Since the amount of data is likely to be small relative to the socket send buffer, then it is likely (since the connection is idle at that point) that the entire request will require only one call to write(), and that the TCP layer will immediately dispatch the request as a single segment (with the PSH flag, see point 2.2 above).

    The Nagle algorithm only has an effect when a second write() call is made while data is still unacknowledged. In the normal case, this data will be left buffered until either: a) there is no unacknowledged data; or b) enough data is available to dispatch a full-sized segment. The delay cannot be indefinite, since condition (a) must become true within the retransmit timeout or the connection dies.

    Since this delay has negative consequences for certain applications, generally those where a stream of small requests are being sent without response, e.g. mouse movements, the standards specify that an option must exist to disable it. [RFC1122, ss 4.2.3.4]

    Additional note: RFC1122 also says:

    [DISCUSSION]:

    When the PUSH flag is not implemented on SEND calls, i.e., when the application/TCP interface uses a pure streaming model, responsibility for aggregating any tiny data fragments to form reasonable sized segments is partially borne by the application layer.

    So programs should avoid calls to write() with small data lengths (small relative to the MSS, that is); it's better to build up a request in a buffer and then do one call to sock_write() or equivalent.

    The other possible sources of delay in the TCP are not really controllable by the program, but they can only delay the data temporarily.

    Vic asks:

    By temporarily, you mean that the data will go as soon as it can, and I won't get stuck in a position where one side is waiting on a response, and the other side hasn't recieved the request? (Or at least I won't get stuck forever)

    You can only deadlock if you somehow manage to fill up all the buffers in both directions... not easy.

    If it is possible to do this, (can't think of a good example though), the solution is to use nonblocking mode, especially for writes. Then you can buffer excess data in the program as necessary.

    2.12 Where can I get a library for programming sockets?

    There is the Simple Sockets Library by Charles E. Campbell, Jr. PhD. and Terry McRoberts. The file is called ssl.tar.gz, and you can download it from this faq's home page. For c++ there is the Socket++ library which is on ftp://ftp.virginia.edu/pub/socket++-1.10.tar.gz. There is also C++ Wrappers. The file is called ftp://ftp.huji.ac.il/pub/languages/C++/C++_wrappers.tar.gz. Thanks to Bill McKinnon for tracking it down for me! From http://www.cs.wustl.edu/~schmidt you should be able to find the ACE toolkit. PING Software Group has some libraries that include a sockets interface among other things. My link to their web site has gone stale, and I don't know where their new site is. Please send me an email if you find it.

    Philippe Jounin has developed a cross platform library which includes high level support for http and ftp protocols, with more to come. You can find it at http://perso.magic.fr/jounin-ph/P_tcp4u.htm, and you can find a review of it at http://www6.zdnet.com/cgi-bin/texis/swlib/hotfiles/info.html?fcode=000H4F

    I don't have any experience with any of these libraries, so I can't recomend one over the other.

    2.13 How come select says there is data, but read returns zero?

    The data that causes select to return is the EOF because the other side has closed the connection. This causes read to return zero. For more information see 2.1 How can I tell when a socket is closed on the other end?

    2.14 Whats the difference between select() and poll()?

    From Richard Stevens ( rstevens@noao.edu):

    The basic difference is that select()'s fd_set is a bit mask and therefore has some fixed size. It would be possible for the kernel to not limit this size when the kernel is compiled, allowing the application to define FD_SETSIZE to whatever it wants (as the comments in the system header imply today) but it takes more work. 4.4BSD's kernel and the Solaris library function both have this limit. But I see that BSD/OS 2.1 has now been coded to avoid this limit, so it's doable, just a small matter of programming. :-) Someone should file a Solaris bug report on this, and see if it ever gets fixed.

    With poll(), however, the user must allocate an array of pollfd structures, and pass the number of entries in this array, so there's no fundamental limit. As Casper notes, fewer systems have poll() than select, so the latter is more portable. Also, with original implementations (SVR3) you could not set the descriptor to -1 to tell the kernel to ignore an entry in the pollfd structure, which made it hard to remove entries from the array; SVR4 gets around this. Personally, I always use select() and rarely poll(), because I port my code to BSD environments too. Someone could write an implementation of poll() that uses select(), for these environments, but I've never seen one. Both select() and poll() are being standardized by POSIX 1003.1g.

    2.15 How do I send [this] over a socket?

    Anything other than single bytes of data will probably get mangled unless you take care. For integer values you can use htons() and friends, and strings are really just a bunch of single bytes, so those should be OK. Be careful not to send a pointer to a string though, since the pointer will be meaningless on another machine. If you need to send a struct, you should write sendthisstruct() and readthisstruct() functions for it that do all the work of taking the structure apart on one side, and putting it back together on the other. If you need to send floats, you may have a lot of work ahead of you. You should read RFC 1014 which is about portable ways of getting data from one machine to another (thanks to Andrew Gabriel for pointing this out).

    2.16 How do I use TCP_NODELAY?

    First off, be sure you really want to use it in the first place. It will disable the Nagle algorithm (see 2.11 How can I force a socket to send the data in its buffer?), which will cause network traffic to increase, with smaller than needed packets wasting bandwidth. Also, from what I have been able to tell, the speed increase is very small, so you should probably do it without TCP_NODELAY first, and only turn it on if there is a problem.

    Here is a code example, with a warning about using it from Andrew Gierth:

      int flag = 1;
      int result = setsockopt(sock,            /* socket affected */
                              IPPROTO_TCP,     /* set option at TCP level */
                              TCP_NODELAY,     /* name of option */
                              (char *) &flag,  /* the cast is historical 
                                                      cruft */
                              sizeof(int));    /* length of option value */
      if (result < 0)
         ... handle the error ...
    

    TCP_NODELAY is for a specific purpose; to disable the Nagle buffering algorithm. It should only be set for applications that send frequent small bursts of information without getting an immediate response, where timely delivery of data is required (the canonical example is mouse movements).

    2.17 What exactly does the Nagle algorithm do?

    It groups together as much data as it can between ACK's from the other end of the connection. I found this really confusing until Andrew Gierth ( andrew@erlenstar.demon.co.uk) drew the following diagram, and explained:

    This diagram is not intended to be complete, just to illustrate the point better...

    Case 1: client writes 1 byte per write() call. The program on host B is tcpserver.c from the FAQ examples.

          CLIENT                                  SERVER
    APP             TCP                     TCP             APP
                    [connection setup omitted]
    
     "h" --------->          [1 byte]
                        ------------------>
                                               -----------> "h"
                                       [ack delayed]
     "e" ---------> [Nagle alg.              .
                     now in effect]          .
     "l" ---------> [ditto]                  .
     "l" ---------> [ditto]                  .
     "o" ---------> [ditto]                  .
     "\n"---------> [ditto]                  .
                                             .
                                             .
                           [ack 1 byte]
                        <------------------
                    [send queued
                    data]
                            [5 bytes]
                        ------------------>
                                              ------------> "ello\n"
                                              <------------ "HELLO\n"
                       [6 bytes, ack 5 bytes]
                        <------------------
     "HELLO\n" <----
                  [ack delayed]
                     .
                     .
                     .   [ack 6 bytes]
                        ------------------>
    

    Total segments: 5. (If TCP_NODELAY was set, could have been up to 10.) Time for response: 2*RTT, plus ack delay.

    Case 2: client writes all data with one write() call.

          CLIENT                                  SERVER
    APP             TCP                     TCP             APP
                    [connection setup omitted]
    
     "hello\n" --->          [6 bytes]
                        ------------------>
                                              ------------> "hello\n"
                                              <------------ "HELLO\n"
                       [6 bytes, ack 6 bytes]
                        <------------------
     "HELLO\n" <----
                [ack delayed]
                     .
                     .
                     .   [ack 6 bytes]
                        ------------------>
    

    Total segments: 3.

    Time for response = RTT (therefore minimum possible).

    Hope this makes things a bit clearer...

    Note that in case 2, you don't want the implementation to gratuitously delay sending the data, since that would add straight onto the response time.

    2.18 What is the difference between read() and recv()?

    From Andrew Gierth ( andrew@erlenstar.demon.co.uk):

    read() is equivalent to recv() with a flags parameter of 0. Other values for the flags parameter change the behaviour of recv(). Similarly, write() is equivalent to send() with flags == 0.

    It is unlikely that send()/recv() would be dropped; perhaps someone with a copy of the POSIX drafts for socket calls can check...

    Portability note: non-unix systems may not allow read()/write() on sockets, but recv()/send() are usually ok. This is true on Windows and OS/2, for example.

    2.19 I see that send()/write() can generate SIGPIPE. Is there any advantage to handling the signal, rather than just ignoring it and checking for the EPIPE error? Are there any useful parameters passed to the signal catching function?

    From Andrew Gierth ( andrew@erlenstar.demon.co.uk):

    In general, the only parameter passed to a signal handler is the signal number that caused it to be invoked. Some systems have optional additional parameters, but they are no use to you in this case.

    My advice is to just ignore SIGPIPE as you suggest. That's what I do in just about all of my socket code; errno values are easier to handle than signals (in fact, the first revision of the FAQ failed to mention SIGPIPE in that context; I'd got so used to ignoring it...)

    There is one situation where you should not ignore SIGPIPE; if you are going to exec() another program with stdout redirected to a socket. In this case it is probably wise to set SIGPIPE to SIG_DFL before doing the exec().

    2.20 After the chroot(), calls to socket() are failing. Why?

    From Andrew Gierth ( andrew@erlenstar.demon.co.uk):

    On systems where sockets are implemented on top of Streams (e.g. all SysV-based systems, presumably including Solaris), the socket() function will actually be opening certain special files in /dev. You will need to create a /dev directory under your fake root and populate it with the required device nodes (only).

    Your system documentation may or may not specify exactly which device nodes are required; I can't help you there (sorry). (Editors note: Adrian Hall ( adrian@hottub.org) suggested checking the man page for ftpd, which should list the files you need to copy and devices you need to create in the chroot'd environment.)

    A less-obvious issue with chroot() is if you call syslog(), as many daemons do; syslog() opens (depending on the system) either a UDP socket, a FIFO or a Unix-domain socket. So if you use it after a chroot() call, make sure that you call openlog() *before* the chroot.

    2.21 Why do I keep getting EINTR from the socket calls?

    This isn't really so much an error as an exit condition. It means that the call was interrupted by a signal. Any call that might block should be wrapped in a loop that checkes for EINTR, as is done in the example code (See 7. Sample Source Code).

    2.22 When will my application receive SIGPIPE?

    From Richard Stevens ( rstevens@noao.edu):

    Very simple: with TCP you get SIGPIPE if your end of the connection has received an RST from the other end. What this also means is that if you were using select instead of write, the select would have indicated the socket as being readable, since the RST is there for you to read (read will return an error with errno set to ECONNRESET).

    Basically an RST is TCP's response to some packet that it doesn't expect and has no other way of dealing with. A common case is when the peer closes the connection (sending you a FIN) but you ignore it because you're writing and not reading. (You should be using select.) So you write to a connection that has been closed by the other end and the other end's TCP responds with an RST.

    2.23 What are socket exceptions? What is out-of-band data?

    Unlike exceptions in C++, socket exceptions do not indicate that an error has occured. Socket exceptions usually refer to the notification that out-of-band data has arrived. Out-of-band data (called "urgent data" in TCP) looks to the application like a separate stream of data from the main data stream. This can be useful for separating two different kinds of data. Note that just because it is called "urgent data" does not mean that it will be delivered any faster, or with higher priorety than data in the in-band data stream. Also beware that unlike the main data stream, the out-of-bound data may be lost if your application can't keep up with it.

    2.24 How can I find the full hostname (FQDN) of the system I'mrunning on?

    From Richard Stevens ( rstevens@noao.edu):

    Some systems set the hostname to the FQDN and others set it to just the unqualified host name. I know the current BIND FAQ recommends the FQDN, but most Solaris systems, for example, tend to use only the unqualified host name.

    Regardless, the way around this is to first get the host's name (perhaps an FQDN, perhaps unaualified). Most systems support the Posix way to do this using uname(), but older BSD systems only provide gethostname(). Call gethostbyname() to find your IP address. Then take the IP address and call gethostbyaddr(). The h_name member of the hostent{} should then be your FQDN.


    Previous Next Table of Contents length of option value */ if (result < 0) ... handle the error ...

    TCP_NODELAY is for a specific purpose; to disable the Nagle buffering algorithm. It should only be set for applications that send frequent small bursts of information without getting an immediate response, where timely delivery of data is required (the canonical example is mouse movements).

    Programming UNIX Sockets in C - Frequently Asked Questions: Writing Client Applications (TCP/SOCK_STREAM) Previous Next Table of Contents

    3. Writing Client Applications (TCP/SOCK_STREAM)

    3.1 How do I convert a string into an internet address?

    If you are reading a host's address from the command line, you may not know if you have an aaa.bbb.ccc.ddd style address, or a host.domain.com style address. What I do with these, is first try to use it as a aaa.bbb.ccc.ddd type address, and if that fails, then do a name lookup on it. Here is an example:

    /* Converts ascii text to in_addr struct.  NULL is returned if the 
       address can not be found. */
    struct in_addr *atoaddr(char *address) {
      struct hostent *host;
      static struct in_addr saddr;
    
      /* First try it as aaa.bbb.ccc.ddd. */
      saddr.s_addr = inet_addr(address);
      if (saddr.s_addr != -1) {
        return &saddr;
      }
      host = gethostbyname(address);
      if (host != NULL) {
        return (struct in_addr *) *host->h_addr_list;
      }
      return NULL;
    }
    

    3.2 How can my client work through a firewall/proxy server?

    If you are running through separate proxies for each service, you shouldn't need to do anything. If you are working through sockd, you will need to "socksify" your application. Details for doing this can be found in the package itself, which is available at:

    ftp://ftp.net.com/socks.cstc/socks.cstc.4.2.tar.gz

    you can get the socks faq at:

    ftp://coast.cs.purdue.edu/pub/tools/unix/socks/FAQ

    3.3 Why does connect() succeed even before my server did an accept()?

    From Andrew Gierth ( andrew@erlenstar.demon.co.uk):

    Once you have done a listen() call on your socket, the kernel is primed to accept connections on it. The usual UNIX implementation of this works by immediately completing the SYN handshake for any incoming valid SYN segments (connection attempts), creating the socket for the new connection, and keeping this new socket on an internal queue ready for the accept() call. So the socket is fully open before the accept is done.

    The other factor in this is the 'backlog' parameter for listen(); that defines how many of these completed connections can be queued at one time. If the specified number is exceeded, then new incoming connects are simply ignored (which causes them to be retried).

    3.4 Why do I sometimes lose a server's address when using more than one server?

    From Andrew Gierth ( andrew@erlenstar.demon.co.uk):

    Take a careful look at struct hostent. Notice that almost everything in it is a pointer? All these pointers will refer to statically allocated data.

    For example, if you do:

        struct hostent *host = gethostbyname(hostname);
    

    then (as you should know) a subsequent call to gethostbyname() will overwrite the structure pointed to by 'host'.

    But if you do:

        struct hostent myhost;
        struct hostent *hostptr = gethostbyname(hostname);
        if (hostptr) myhost = *host;
    

    to make a copy of the hostent before it gets overwritten, then it still gets clobbered by a subsequent call to gethostbyname(), since although myhost won't get overwritten, all the data it is pointing to will be.

    You can get round this by doing a proper 'deep copy' of the hostent structure, but this is tedious. My recommendation would be to extract the needed fields of the hostent and store them in your own way.

    Robin Paterson ( etmrpat@etm.ericsson.se) has added:

    It might be nice if you mention MT safe libraries provide complimentary functions for multithreaded programming. On the solaris machine I'm typing at, we have gethostbyname and gethostbyname_r (_r for reentrant). The main difference is, you provide the storage for the hostent struct so you always have a local copy and not just a pointer to the static copy.

    3.5 How can I set the timeout for the connect() system call?

    From Richard Stevens ( rstevens@noao.edu):

    Normally you cannot change this. Solaris does let you do this, on a per-kernel basis with the ndd tcp_ip_abort_cinterval parameter.

    The easiest way to shorten the connect time is with an alarm() around the call to connect(). A harder way is to use select(), after setting the socket nonblocking. Also notice that you can only shorten the connect time, there's normally no way to lengthen it.

    From Andrew Gierth ( andrew@erlenstar.demon.co.uk):

    First, create the socket and put it into non-blocking mode, then call connect(). There are three possibilities:

    • connect succeeds: the connection has been successfully made (this usually only happens when connecting to the same machine)
    • connect fails: obvious
    • connect returns -1/EINPROGRESS. The connection attempt has begun, but not yet completed.

    If the connection succeeds:

    • the socket will select() as writable (and will also select as readable if data arrives)

    If the connection fails:

    • the socket will select as readable *and* writable, but either a read or write will return the error code from the connection attempt. Also, you can use getsockopt(SO_ERROR) to get the error status - but be careful; some systems return the error code in the result parameter of getsockopt, but others (incorrectly) cause the getsockopt call *itself* to fail with the stored value as the error.

    3.6 Should I bind() a port number in my client program, or let thesystem choose one for me on the connect() call?

    From Andrew Gierth ( andrew@erlenstar.demon.co.uk):

    ** Let the system choose your client's port number **

    The exception to this, is if the server has been written to be picky about what client ports it will allow connections from. Rlogind and rshd are the classic examples. This is usually part of a Unix-specific (and rather weak) authentication scheme; the intent is that the server allows connections only from processes with root privilege. (The weakness in the scheme is that many O/Ss (e.g. MS-DOS) allow anyone to bind any port.)

    The rresvport() routine exists to help out clients that are using this scheme. It basically does the equivalent of socket() + bind(), choosing a port number in the range 512..1023.

    If the server is not fussy about the client's port number, then don't try and assign it yourself in the client, just let connect() pick it for you.

    If, in a client, you use the naive scheme of starting at a fixed port number and calling bind() on consecutive values until it works, then you buy yourself a whole lot of trouble:

    The problem is if the server end of your connection does an active close. (E.G. client sends 'QUIT' command to server, server responds by closing the connection). That leaves the client end of the connection in CLOSED state, and the server end in TIME_WAIT state. So after the client exits, there is no trace of the connection on the client end.

    Now run the client again. It will pick the same port number, since as far as it can see, it's free. But as soon as it calls connect(), the server finds that you are trying to duplicate an existing connection (although one in TIME_WAIT). It is perfectly entitled to refuse to do this, so you get, I suspect, ECONNREFUSED from connect(). (Some systems may sometimes allow the connection anyway, but you can't rely on it.)

    This problem is especially dangerous because it doesn't show up unless the client and server are on different machines. (If they are the same machine, then the client won't pick the same port number as before). So you can get bitten well into the development cycle (if you do what I suspect most people do, and test client & server on the same box initially).

    Even if your protocol has the client closing first, there are still ways to produce this problem (e.g. kill the server).

    3.7 Why do I get "connection refused" when the server isn't running?

    The connect() call will only block while it is waiting to establish a connection. When there is no server waiting at the other end, it gets notified that the connection can not be established, and gives up with the error message you see. This is a good thing, since if it were not the case clients might wait for ever for a service which just doesn't exist. Users would think that they were only waiting for the connection to be established, and then after a while give up, muttering something about crummy software under their breath.

    3.8 What does one do when one does not know how much information is commingover the socket ? Is there a way to have a dynamic buffer ?

    This question asked by Niranjan Perera ( perera@mindspring.com).

    When the size of the incoming data is unknown, you can either make the size of the buffer as big as the largest possible (or likely) buffer, or you can re-size the buffer on the fly during your read. When you malloc() a large buffer, most (if not all) varients of unix will only allocate address space, but not physical pages of ram. As more and more of the buffer is used, the kernel allocates physical memory. This means that malloc'ing a large buffer will not waste resources unless that memory is used, and so it is perfectly acceptable to ask for a meg of ram when you expect only a few K.

    On the other hand, a more elegant solution that does not depend on the inner workings of the kernel is to use realloc() to expand the buffer as required in say 4K chunks (since 4K is the size of a page of ram on most systems). I may add something like this to sockhelp.c in the example code one day.


    Previous Next Table of Contents E> ftp://ftp.net.com/socks.cstc/socks.cstc.4.2.tar.gz

    you can get the socks faq at:

    ftp://coast.cs.purdue.edu/pub/tools/unix/socks/FAQ

    3.3 Why does connect() unix-socket-faq-4.html010064400022570000215000000562550663340333700152510ustar00chodiglib00000400000073 Programming UNIX Sockets in C - Frequently Asked Questions: Writing Server Applications (TCP/SOCK_STREAM) Previous Next Table of Contents

    4. Writing Server Applications (TCP/SOCK_STREAM)

    4.1 How come I get "address already in use" from bind()?

    You get this when the address is already in use. (Oh, you figured that much out?) The most common reason for this is that you have stopped your server, and then re-started it right away. The sockets that were used by the first incarnation of the server are still active. This is further explained in 2.7 Please explain the TIME_WAIT state., and 2.5 How do I properly close a socket?.

    4.2 Why don't my sockets close?

    When you issue the close() system call, you are closing your interface to the socket, not the socket itself. It is up to the kernel to close the socket. Sometimes, for really technical reasons, the socket is kept alive for a few minutes after you close it. It is normal, for example for the socket to go into a TIME_WAIT state, on the server side, for a few minutes. People have reported ranges from 20 seconds to 4 minutes to me. The official standard says that it should be 4 minutes. On my Linux system it is about 2 minutes. This is explained in great detail in 2.7 Please explain the TIME_WAIT state..

    4.3 How can I make my server a daemon?

    There are two approaches you can take here. The first is to use inetd to do all the hard work for you. The second is to do all the hard work yourself.

    If you use inetd, you simply use stdin, stdout, or stderr for your socket. (These three are all created with dup() from the real socket) You can use these as you would a socket in your code. The inetd process will even close the socket for you when you are done.

    If you wish to write your own server, there is a detailed explanation in "Unix Network Programming" by Richard Stevens (see 1.6 Where can I get source code for the book [book title]?). I also picked up this posting from comp.unix.programmer, by Nikhil Nair ( nn201@cus.cam.ac.uk). You may want to add code to ignore SIGPIPE, because if this signal is not dealt with, it will cause your application to exit. (Thanks to ingo@milan2.snafu.de for pointing this out).

    I worked all this lot out from the GNU C Library Manual (on-line
    documentation).  Here's some code I wrote - you can adapt it as necessary:
    
    
    #include <stdio.h>
    #include <stdlib.h>
    #include <ctype.h>
    #include <unistd.h>
    #include <fcntl.h>
    #include <signal.h>
    #include <sys/wait.h>
    
    /* Global variables */
    ...
    volatile sig_atomic_t keep_going = 1; /* controls program termination */
    
    
    /* Function prototypes: */
    ...
    void termination_handler (int signum); /* clean up before termination */
    
    
    int
    main (void)
    {
      ...
    
      if (chdir (HOME_DIR))         /* change to directory containing data 
                                        files */
       {
         fprintf (stderr, "`%s': ", HOME_DIR);
         perror (NULL);
         exit (1);
       }
    
       /* Become a daemon: */
       switch (fork ())
         {
         case -1:                    /* can't fork */
           perror ("fork()");
           exit (3);
         case 0:                     /* child, process becomes a daemon: */
           close (STDIN_FILENO);
           close (STDOUT_FILENO);
           close (STDERR_FILENO);
           if (setsid () == -1)      /* request a new session (job control) */
             {
               exit (4);
             }
           break;
         default:                    /* parent returns to calling process: */
           return 0;
         }
    
       /* Establish signal handler to clean up before termination: */
       if (signal (SIGTERM, termination_handler) == SIG_IGN)
         signal (SIGTERM, SIG_IGN);
       signal (SIGINT, SIG_IGN);
       signal (SIGHUP, SIG_IGN);
    
       /* Main program loop */
       while (keep_going)
         {
           ...
         }
       return 0;
    }
    
    void
    termination_handler (int signum)
    {
      keep_going = 0;
      signal (signum, termination_handler);
    }
    

    4.4 How can I listen on more than one port at a time?

    The best way to do this is with the select() call. This tells the kernel to let you know when a socket is available for use. You can have one process do i/o with multiple sockets with this call. If you want to wait for a connect on sockets 4, 6 and 10 you might execute the following code snippet:

    fd_set socklist;
    
    FD_ZERO(&socklist); /* Always clear the structure first. */
    FD_SET(4, &socklist);
    FD_SET(6, &socklist);
    FD_SET(10, &socklist);
    if (select(11, NULL, &socklist, NULL, NULL) < 0)
      perror("select");
    

    The kernel will notify us as soon as a file descriptor which is less than 11 (the first parameter to select()), and is a member of our socklist becomes available for writing. See the man page on select() for more details.

    4.5 What exactly does SO_REUSEADDR do?

    This socket option tells the kernel that even if this port is busy (in the TIME_WAIT state), go ahead and reuse it anyway. If it is busy, but with another state, you will still get an address already in use error. It is useful if your server has been shut down, and then restarted right away while sockets are still active on its port. You should be aware that if any unexpected data comes in, it may confuse your server, but while this is possible, it is not likely.

    It has been pointed out that "A socket is a 5 tuple (proto, local addr, local port, remote addr, remote port). SO_REUSEADDR just says that you can reuse local addresses. The 5 tuple still must be unique!" by Michael Hunter (mphunter@qnx.com). This is true, and this is why it is very unlikely that unexpected data will ever be seen by your server. The danger is that such a 5 tuple is still floating around on the net, and while it is bouncing around, a new connection from the same client, on the same system, happens to get the same remote port. This is explained by Richard Stevens in 2.7 Please explain the TIME_WAIT state..

    4.6 What exactly does SO_LINGER do?

    On some unixes this does nothing. On others, it instructs the kernel to abort tcp connections instead of closing them properly. This can be dangerous. If you are not clear on this, see 2.7 Please explain the TIME_WAIT state..

    4.7 What exactly does SO_KEEPALIVE do?

    From Andrew Gierth ( andrew@erlenstar.demon.co.uk):

    The SO_KEEPALIVE option causes a packet (called a 'keepalive probe') to be sent to the remote system if a long time (by default, more than 2 hours) passes with no other data being sent or received. This packet is designed to provoke an ACK response from the peer. This enables detection of a peer which has become unreachable (e.g. powered off or disconnected from the net). See 2.8 Why does it take so long to detect that the peer died? for further discussion.

    Note that the figure of 2 hours comes from RFC1122, "Requirements for Internet Hosts". The precise value should be configurable, but I've often found this to be difficult. The only implementation I know of that allows the keepalive interval to be set per-connection is SVR4.2.

    4.8 How can I bind() to a port number < 1024?

    From Andrew Gierth ( andrew@erlenstar.demon.co.uk):

    The restriction on access to ports < 1024 is part of a (fairly weak) security scheme particular to UNIX. The intention is that servers (for example rlogind, rshd) can check the port number of the client, and if it is < 1024, assume the request has been properly authorised at the client end.

    The practical upshot of this, is that binding a port number < 1024 is reserved to processes having an effective UID == root.

    This can, occasionally, itself present a security problem, e.g. when a server process needs to bind a well-known port, but does not itself need root access (news servers, for example). This is often solved by creating a small program which simply binds the socket, then restores the real userid and exec()s the real server. This program can then be made setuid root.

    4.9 How do I get my server to find out the client's address / hostname?

    From Andrew Gierth ( andrew@erlenstar.demon.co.uk):

    After accept()ing a connection, use getpeername() to get the address of the client. The client's address is of course, also returned on the accept(), but it is essential to initialise the address-length parameter before the accept call for this will work.

    Jari Kokko ( jkokko@cc.hut.fi) has offered the following code to determine the client address:

    int t;
    int len;
    struct sockaddr_in sin;
    struct hostent *host;
    
    len = sizeof sin;
    if (getpeername(t, (struct sockaddr *) &sin, &len) < 0)
            perror("getpeername");
    else {
            if ((host = gethostbyaddr((char *) &sin.sin_addr,
                                      sizeof sin.sin_addr,
                                      AF_INET)) == NULL)
                perror("gethostbyaddr");
            else printf("remote host is '%s'\n", host->h_name);
    }
    

    4.10 How should I choose a port number for my server?

    The list of registered port assignments can be found in STD 2 or RFC 1700. Choose one that isn't already registered, and isn't in /etc/services on your system. It is also a good idea to let users customize the port number in case of conflicts with other un-registered port numbers in other servers. The best way of doing this is hardcoding a service name, and using getservbyname() to lookup the actual port number. This method allows users to change the port your server binds to by simply editing the /etc/services file.

    4.11 What is the difference between SO_REUSEADDR and SO_REUSEPORT?

    SO_REUSEADDR allows your server to bind to an address which is in a TIME_WAIT state. It does not allow more than one server to bind to the same address. It was mentioned that use of this flag can create a security risk because another server can bind to a the same port, by binding to a specific address as opposed to INADDR_ANY. The SO_REUSEPORT flag allows multiple processes to bind to the same address provided all of them use the SO_REUSEPORT option.

    From Richard Stevens ( rstevens@noao.edu):

    This is a newer flag that appeared in the 4.4BSD multicasting code (although that code was from elsewhere, so I am not sure just who invented the new SO_REUSEPORT flag).

    What this flag lets you do is rebind a port that is already in use, but only if all users of the port specify the flag. I believe the intent is for multicasting apps, since if you're running the same app on a host, all need to bind the same port. But the flag may have other uses. For example the following is from a post in February:

    From Stu Friedberg ( stuartf@sequent.com):

    SO_REUSEPORT is also useful for eliminating the try-10-times-to-bind hack in ftpd's data connection setup routine. Without SO_REUSEPORT, only one ftpd thread can bind to TCP (lhost, lport, INADDR_ANY, 0) in preparation for connecting back to the client. Under conditions of heavy load, there are more threads colliding here than the try-10-times hack can accomodate. With SO_REUSEPORT, things work nicely and the hack becomes unnecessary.

    I have also heard that DEC OSF supports the flag. Also note that under 4.4BSD, if you are binding a multicast address, then SO_REUSEADDR is condisered the same as SO_REUSEPORT (p. 731 of "TCP/IP Illustrated, Volume 2"). I think under Solaris you just replace SO_REUSEPORT with SO_REUSEADDR.

    From a later Stevens posting, with minor editing:

    Basically SO_REUSEPORT is a BSD'ism that arose when multicasting was added, even thought it was not used in the original Steve Deering code. I believe some BSD-derived systems may also include it (OSF, now Digital Unix, perhaps?). SO_REUSEPORT lets you bind the same address *and* port, but only if all the binders have specified it. But when binding a multicast address (its main use), SO_REUSEADDR is considered identical to SO_REUSEPORT (p. 731, "TCP/IP Illustrated, Volume 2"). So for portability of multicasting applications I always use SO_REUSEADDR.

    4.12 How can I write a multi-homed server?

    The original question was actually from Shankar Ramamoorthy ( shankar@viman.com):

    I want to run a server on a multi-homed host. The host is part of two networks and has two ethernet cards. I want to run a server on this machine, binding to a pre-determined port number. I want clients on either subnet to be able to send broadcast packates to the port and have the server receive them.

    And answered by Andrew Gierth ( andrew@erlenstar.demon.co.uk):

    Your first question in this scenario is, do you need to know which subnet the packet came from? I'm not at all sure that this can be reliably determined in all cases.

    If you don't really care, then all you need is one socket bound to INADDR_ANY. That simplifies things greatly.

    If you do care, then you have to bind multiple sockets. You are obviously attempting to do this in your code as posted, so I'll assume you do.

    I was hoping that something like the following would work. Will it? This is on Sparcs running Solaris 2.4/2.5.

    I don't have access to Solaris, but I'll comment based on my experience with other Unixes.

    [Shankar's original code omitted]

    What you are doing is attempting to bind all the current hosts unicast addresses as listed in hosts/NIS/DNS. This may or may not reflect reality, but much more importantly, neglects the broadcast addresses. It seems to be the case in the majority of implementations that a socket bound to a unicast address will not see incoming packets with broadcast addresses as their destinations.

    The approach I've taken is to use SIOCGIFCONF to retrieve the list of active network interfaces, and SIOCGIFFLAGS and SIOCGIFBRDADDR to identify broadcastable interfaces and get the broadcast addresses. Then I bind to each unicast address, each broadcast address, and to INADDR_ANY as well. That last is necessary to catch packets that are on the wire with INADDR_BROADCAST in the destination. (SO_REUSEADDR is necessary to bind INADDR_ANY as well as the specific addresses.)

    This gives me very nearly what I want. The wrinkles are:

    • I don't assume that getting a packet through a particular socket necessarily means that it actually arrived on that interface.
    • I can't tell anything about which subnet a packet originated on if its destination was INADDR_BROADCAST.
    • On some stacks, apparently only those with multicast support, I get duplicate incoming messages on the INADDR_ANY socket.

    4.13 How can I read only one character at a time?

    This question is usually asked by people who are testing their server with telnet, and want it to process their keystrokes one character at a time. The correct technique is to use a psuedo terminal (pty). More on that in a minute.

    According to Roger Espel Llima ( espel@drakkar.ens.fr), you can have your server send a sequence of control characters: 0xff 0xfb 0x01 0xff 0xfb 0x03 0xff 0xfd 0x0f3, which translates to IAC WILL ECHO IAC WILL SUPPRESS-GO-AHEAD IAC DO SUPPRESS-GO-AHEAD. For more information on what this means, check out std8, std28 and std29. Roger also gave the following tips:

    • This code will suppress echo, so you'll have to send the characters the user types back to the client if you want the user to see them.
    • Carriage returns will be followed by a null character, so you'll have to expect them.
    • If you get a 0xff, it will be followed by two more characters. These are telnet escapes.

    Use of a pty would also be the correct way to execute a child process and pass the i/o to a socket.

    I'll add pty stuff to the list of example source I'd like to add to the faq. If someone has some source they'd like to contribute (without copyright) to the faq which demonstrates use of pty's, please email me!

    4.14 I'm trying to exec() a program from my server, and attach my socket's IO to it, but I'm not getting all the data across. Why?

    If the program you are running uses printf(), etc (streams from stdio.h) you have to deal with two buffers. The kernel buffers all socket IO, and this is explained in section 2.11. The second buffer is the one that is causing you grief. This is the stdio buffer, and the problem was well explained by Andrew:

    (The short answer to this question is that you want to use a pty rather than a socket; the remainder of this article is an attempt to explain why.)

    Firstly, the socket buffer controlled by setsockopt() has absolutly nothing to do with stdio buffering. Setting it to 1 is guaranteed to be the Wrong Thing(tm).

    Perhaps the following diagram might make things a little clearer:

            Process A                   Process B
        +---------------------+     +---------------------+
        |                     |     |                     |
        |    mainline code    |     |    mainline code    |
        |         |           |     |         ^           |
        |         v           |     |         |           |
        |      fputc()        |     |      fgetc()        |
        |         |           |     |         ^           |
        |         v           |     |         |           |
        |    +-----------+    |     |    +-----------+    |
        |    | stdio     |    |     |    | stdio     |    |
        |    | buffer    |    |     |    | buffer    |    |
        |    +-----------+    |     |    +-----------+    |
        |         |           |     |         ^           |
        |         |           |     |         |           |
        |      write()        |     |       read()        |
        |         |           |     |         |           |
        +-------- | ----------+     +-------- | ----------+
                  |                           |                  User space
      ------------|-------------------------- | ---------------------------
                  |                           |                Kernel space
                  v                           |
             +-----------+               +-----------+
             | socket    |               | socket    |
             | buffer    |               | buffer    |
             +-----------+               +-----------+
                  |                           ^
                  v                           |
          (AF- and protocol-          (AF- and protocol-
           dependent code)             dependent code)
    

    Assuming these two processes are communicating with each other (I've deliberately omitted the actual comms mechanisms, which aren't really relevent), you can see that data written by process A to its stdio buffer is completely inaccessible to process B. Only once the decision is made to flush that buffer to the kernel (via write()) can the data actually be delivered to the other process.

    The only guaranteed way to affect the buffering within process A is to change the code. However, the default buffering for stdout is controlled by whether the underlying FD refers to a terminal or not; generally, output to terminals is line-buffered, and output to non-terminals (including but not limited to files, pipes, sockets, non-tty devices, etc.) is fully buffered. So the desired effect can usually be achieved by using a pty device; this, for example, is what the 'expect' program does.

    Since the stdio buffer (and the FILE structure, and everything else related to stdio) is user-level data, it is not preserved across an exec() call, hence trying to use setvbuf() before the exec is ineffective.

    A couple of alternate solutions were proposed by Roger Espel Llima ( espel@drakkar.ens.fr):

    If it's an option, you can use some standalone program that will just run something inside a pty and buffer its input/output. I've seen a package by the name pty.tar.gz that did that; you could search around for it with archie or AltaVista.

    Another option (**warning, evil hack**) , if you're on a system that supports this (SunOS, Solaris, Linux ELF do; I don't know about others) is to, on your main program, putenv() the name of a shared executable (*.so) in LD_PRELOAD, and then in that .so redefine some commonly used libc function that the program you're exec'ing is known to use early. There you can 'get control' on the running program, and the first time you get it, do a setbuf(stdout, NULL) on the program's behalf, and then call the original libc function with a dlopen() + dlsym(). And you keep the dlsym() value on a static var, so you can just call that the following times.

    (Editors note: I still haven't done an expample for how to do pty's, but I hope I will be able to do one after I finish the non-blocking example code.)


    Previous Next Table of Contents n thought it was not used in the original Steve Deering code. I believe some BSD-derived systems may also include it (OSF, now Digital Unix, perhaps?). SO_REUSEPORT lets you bind the same address *and* port, but only if all the binders have specified it. But when binding a multicast address (its main use), SO_REUSEAunix-socket-faq-5.html010064400022570000215000000242430663340334700152430ustar00chodiglib00000400000073 Programming UNIX Sockets in C - Frequently Asked Questions: Writing UDP/SOCK_DGRAM applications Previous Next Table of Contents

    5. Writing UDP/SOCK_DGRAM applications

    5.1 When should I use UDP instead of TCP?

    UDP is good for sending messages from one system to another when the order isn't important and you don't need all of the messages to get to the other machine. This is why I've only used UDP once to write the example code for the faq. Usually TCP is a better solution. It saves you having to write code to ensure that messages make it to the desired destination, or to ensure the message ordering. Keep in mind that every additional line of code you add to your project in another line that could contain a potentially expensive bug.

    If you find that TCP is too slow for your needs you may be able to get better performance with UDP so long as you are willing to sacrifice message order and/or reliability.

    Philippe Jounin would like to add...

    In chapter 5.1 you say UDP allows more throughput than TCP. It is rarely the case if you have to pass several routers.

    For instance, if you connect two LANs via X25 (a common way in Europe!), every UDP datagram will :

    • establish a Virtual Channel (VC)
    • send the data
    • close the VC,
    whereas the VC remains during a TCP dialog.

    UDP must be used to multicast messages to more than one other machine at the same time. With TCP an application would have to open separate connections to each of the destination machines and send the message once to each target machine. This limits your application to only communicate with machines that it already knows about.

    5.2 What is the difference between "connected" and "unconnected" sockets?

    From Andrew Gierth ( andrew@erlenstar.demon.co.uk):

    If a UDP socket is unconnected, which is the normal state after a bind() call, then send() or write() are not allowed, since no destination address is available; only sendto() can be used to send data.

    Calling connect() on the socket simply records the specified address and port number as being the desired communications partner. That means that send() or write() are now allowed; they use the destination address and port given on the connect call as the destination of the packet.

    5.3 Does doing a connect() call affect the receive behaviourof the socket?

    From Richard Stevens ( rstevens@noao.edu):

    Yes, in two ways. First, only datagrams from your "connected peer" are returned. All others arriving at your port are not delivered to you.

    But most importantly, a UDP socket must be connected to receive ICMP errors. Pp. 748-749 of "TCP/IP Illustrated, Volume 2" give all the gory details on why this is so.

    5.4 How can I read ICMP errors from "connected" UDP sockets?

    If the target machine discards the message because there is no process reading on the requested port number, it sends an ICMP message to your machine which will cause the next system call on the socket to return ECONNREFUSED. Since delivery of ICMP messages is not guarenteed you may not recieve this notification on the first transaction.

    Remember that your socket must be "connected" in order to receive the ICMP errors. I've been told, and Alan Cox has verified that Linux will return them on "unconnected" sockets. This may cause porting problems if your application isn't ready for it, so Alan tells me they've added a SO_BSDCOMPAT flag which can be set for Linux kernels after 2.0.0.

    5.5 How can I be sure that a UDP message is received?

    You have to design your protocol to expect a confirmation back from the destination when a message is received. Of course is the confirmation is sent by UDP, then it too is unreliable and may not make it back to the sender. If the sender does not get confirmation back by a certain time, it will have to re-transmit the message, maybe more than once. Now the receiver has a problem because it may have already received the message, so some way of dropping duplicates is required. Most protocols use a message numbering scheme so that the receiver can tell that it has already processed this message and return another confirmation. Confirmations will also have to reference the message number so that the sender can tell which message is being confirmed. Confused? That's why I stick with TCP.

    5.6 How can I be sure that UDP messages are received in order?

    You can't. What you can do is make sure that messages are processed in order by using a numbering system as mentioned in 5.5 How can I be sure that a UDP message is received?. If you need your messages to be received and be received in order you should really consider switching to TCP. It is unlikely that you will be able to do a better job implementing this sort of protocol than the TCP people already have, without a significant investment of time.

    5.7 How often should I re-transmit un-acknowleged messages?

    The simplest thing to do is simply pick a fairly small delay such as one second and stick with it. The problem is that this can congest your network with useless traffic if there is a problem on the lan or on the other machine, and this added traffic may only serve to make the problem worse.

    A better technique, described with source code in "UNIX Network Programming" by Richard Stevens (see 1.6 Where can I get source code for the book [book title]?), is to use an adaptive timeout with an exponential backoff. This technique keeps statistical information on the time it is taking messages to reach a host and adjusts timeout values accordingly. It also doubles the timeout each time it is reached as to not flood the network with useless datagrams. Richard has been kind enough to post the source code for the book on the web. Check out his home page at http://www.noao.edu/~rstevens.

    5.8 How come only the first part of my datagram is getting through?

    This has to do with the maximum size of a datagram on the two machines involved. This depends on the sytems involved, and the MTU (Maximum Transmission Unit). According to "UNIX Network Programming", all TCP/IP implementations must support a minimum IP datagram size of 576 bytes, regardless of the MTU. Assuming a 20 byte IP header and 8 byte UDP header, this leaves 548 bytes as a safe maximum size for UDP messages. The maximum size is 65516 bytes. Some platforms support IP fragmentation which will allow datagrams to be broken up (because of MTU values) and then re-assembled on the other end, but not all implementations support this.

    This information is taken from my reading of "UNIX Netowrk Programming" (see 1.6 Where can I get source code for the book [book title]?).

    Andrew has pointed out the following regarding large UDP messages:

    Another issue is fragmentation. If a datagram is sent which is too large for the network interface it is sent through, then the sending host will fragment it into smaller packets which are reassembled by the receiving host. Also, if there are intervening routers, then they may also need to fragment the packet(s), which greatly increases the chances of losing one or more fragments (which causes the entire datagram to be dropped). Thus, large UDP datagrams should be avoided for applications that are likely to operate over routed nets or the Internet proper.

    5.9 Why does the socket's buffer fill up sooner than expected?

    From Paul W. Nelson ( nelson@thursby.com):

    In the traditional BSD socket implementation, sockets that are atomic such as UDP keep received data in lists of mbufs. An mbuf is a fixed size buffer that is shared by various protocol stacks. When you set your receive buffer size, the protocol stack keeps track of how many bytes of mbuf space are on the receive buffer, not the number of actual bytes. This approach is used because the resource you are controlling is really how many mbufs are used, not how many bytes are being held in the socket buffer. (A socket buffer isn't really a buffer in the traditional sense, but a list of mbufs).

    For example: Lets assume your UNIX has a small mbuf size of 256 bytes. If your receive socket buffer is set to 4096, you can fit 16 mbufs on the socket buffer. If you receive 16 UDP packets that are 10 bytes each, your socket buffer is full, and you have 160 bytes of data. If you receive 16 UDP packets that are 200 bytes each, your socket buffer is also full, but contains 3200 bytes of data. FIONREAD returns the total number of bytes, not the number of messages or bytes of mbufs. Because of this, it is not a good indicator of how full your receive buffer is.

    Additionaly, if you receive UDP messages that are 260 bytes, you use up two mbufs, and can only recieve 8 packets before your socket buffer is full. In this case, only 2080 bytes of the 4096 are held in the socket buffer.

    This example is greatly simplified, and the real socket buffer algorithm also takes into account some other parameters. Note that some older socket implementations use a 128 byte mbuf.


    Previous Next Table of Contents unix-socket-faq-6.html010064400022570000215000000054400663340335500152410ustar00chodiglib00000400000073 Programming UNIX Sockets in C - Frequently Asked Questions: Advanced Socket Programming Previous Next Table of Contents

    6. Advanced Socket Programming

    6.1 How would I put my socket in non-blocking mode?

    From Andrew Gierth ( andrew@erlenstar.demon.co.uk):

    Technically, fcntl(soc, F_SETFL, O_NONBLOCK) is incorrect since it clobbers all other file flags. Generally one gets away with it since the other flags (O_APPEND for example) don't really apply much to sockets. In a similarly rough vein, you would use fcntl(soc, F_SETFL, 0) to go back to blocking mode.

    To do it right, use F_GETFL to get the current flags, set or clear the O_NONBLOCK flag, then use F_SETFL to set the flags.

    And yes, the flag can be changed either way at will.

    6.2 How can I put a timeout on connect()?

    Andrew Gierth ( andrew@erlenstar.demon.co.uk) has outlined the following procedure for using select() with connect(), which will allow you to put a timeout on the connect() call:

    First, create the socket and put it into non-blocking mode, then call connect(). There are three possibilities:

    • connect succeeds: the connection has been successfully made (this usually only happens when connecting to the same machine)
    • connect fails: obvious
    • connect returns -1/EINPROGRESS. The connection attempt has begun, but not yet completed.

    If the connection succeeds:

    • the socket will select() as writable (and will also select as readable if data arrives)

    If the connection fails:

    • the socket will select as readable *and* writable, but either a read or write will return the error code from the connection attempt. Also, you can use getsockopt(SO_ERROR) to get the error status - but be careful; some systems return the error code in the result parameter of getsockopt(), but others (incorrectly) cause the getsockopt call itself to fail with the stored value as the error.

    Sample code that illustrates this can be found in the socket-faq examples, in the file connect.c.


    Previous Next Table of Contents A NAME="s6">6. Advanced Socket Programming

    6.1 How would I put my socket in non-blocking mode?

    From Andrew Gierth ( andrew@erlenstar.dunix-socket-faq-7.html010064400022570000215000000042050663340336400152400ustar00chodiglib00000400000073 Programming UNIX Sockets in C - Frequently Asked Questions: Sample Source Code Previous Next Table of Contents


    7. Sample Source Code

    The sample source code is no longer included in the faq. To get it, please download it from one of the unix-socket-faq www pages:

    http://www.ibrado.com/sock-faq
    http://kipper.york.ac.uk/~vic/sock-faq http://www.ntua.gr/sock-faq

    If you don't have web access, you can ftp it with ftpmail by following the following instructions.

    To get the sample source by mail, send mail to ftpmail@decwrl.dec.com, with no subject line and a body like this:

      reply <put your email address here>
      connect ftp.zymsys.com
      binary
      uuencode
      get pub/sockets/examples.tar.gz
      quit
    

    Save the reply as examples.uu, and type:

      % uudecode examples.uu
      % gunzip examples.tar.gz
      % tar xf examples.tar
    

    This will create a directory called socket-faq-examples which contains the sample code from this faq, plus a sample client and server for both tcp and udp.

    Note that this package requires the gnu unzip program to be installed on your system. It is very common, but if you don't have it you can get the source for it from:

    ftp://prep.ai.mit.edu/pub/gnu/gzip-1.2.4.tar

    If you don't have ftp access, you can obtain it in a way similar to obtaining the sample source. I'll leave the exact changes to the body of the message as an excersise for the reader.


    Previous Next Table of Contents unix-socket-faq.html010064400022570000215000000175510664644134600151120ustar00chodiglib00000400000073 Programming UNIX Sockets in C - Frequently Asked Questions Previous Next Table of Contents

    Programming UNIX Sockets in C - Frequently Asked Questions

    Created by Vic Metcalfe, Andrew Gierth and other contributers

    May 21, 1998


    This is a list of frequently asked questions, with answers about programming TCP/IP applications in unix with the sockets interface.

    1. General Information and Concepts

    2. Questions regarding both Clients and Servers (TCP/SOCK_STREAM)

    3. Writing Client Applications (TCP/SOCK_STREAM)

    4. Writing Server Applications (TCP/SOCK_STREAM)

    5. Writing UDP/SOCK_DGRAM applications

    6. Advanced Socket Programming

    7. Sample Source Code


    Previous Next Table of Contents ss2.7">2.7 Please explain the TIME_WAIT state.
  • 2.8 Why does it take so long to detect that the peer died