Robot episode 1: create a server in C
These past few months, for health-related reasons, I have had to stay at home and not move too much. That’s part of why I started this blog and, more generally, went back to coding and Data Science. I’ve been a teacher for seven years (you can see some of my work here, but in French). It was wonderful, it was hard, it was inspiring, it was exhausting. But for the last two years I’ve been thinking more and more about coming back to the dark side: engineering. As life has decided that I should be stuck at home for a few months, it was time for the career change (here is my resume, and here is my LinkedIn, you never know…).
So I started with Data Science projects, since it was my speciality before becoming a teacher. But I’ve been a bit frustrated with finding exciting data, and it did not completely fulfill my need for challenging and fun problem-solving. And to add more context, my partner is a software engineer, and a very talented one. So he has slowly convinced me to work on more coding-oriented projects, and to use his help to gain skills in this subject. Side effect: the field is probably a little less saturated than Data Science, so maybe more open to my atypical profile.
And that’s how I’ve started this project. You can follow it on my GitHub here, and see what it does here. The idea is to move a little robot. But it’s an excuse for working on basic programming skills. So to add more fun, we have decided that I should do it in C, using only the standard library. I also want this article to act as a cheatsheet if I need to do something similar in the future.
Enough of this far too long introduction, let’s get started.
- The objective
- The code
- Break down
- socket – Create a socket and get its file descriptor
- inet_aton – Define IP address and port
- bind – Attach socket to the previously defined address and port
- listen – Mark the socket as ready to receive incoming connections
- accept – Get client connection address
- read – Read data sent from client
- write – Send response to client
- close – Close client’s file descriptor
- poll – Go further: use poll to read properly everything in the kernel buffer
The objective
I just want to create a socket, listen for clients and read what they have for me, and for now, just write back the same request that I received.
But first, what is a socket? Well it is like a case number (we call it a file descriptor) that we use to communicate with the kernel.
The code
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdlib.h>
#include <unistd.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    /*
    Create a socket and get its file descriptor.
    A socket is like a case number (a file descriptor)
    Here, we tell it that we want to use the IPv4 protocol family (AF_INET),
    and the TCP protocol (SOCK_STREAM)
    */
    int sockfd = socket(AF_INET, SOCK_STREAM, 0);
    if (sockfd < 0) {
        perror("socket() failed");
        return 1;
    }
    printf("Socket created, sockfd: %d\n", sockfd);

    // I always want this to avoid an error when I run the server twice in a row
    int optval = 1;
    int ret = setsockopt(sockfd, SOL_SOCKET, SO_REUSEPORT, &optval, sizeof optval);
    if (ret < 0) {
        perror("setsockopt() failed");
        return 1;
    }

    // Define IP address and port
    const char *interface = "0.0.0.0";
    struct in_addr mysinaddr;
    ret = inet_aton(interface, &mysinaddr);
    if (ret == 0) {
        fprintf(stderr, "Invalid IP address: %s\n", interface);
        return 1;
    }
    struct sockaddr_in myaddr = {
        .sin_family = AF_INET,
        .sin_port = htons(8000),
        .sin_addr = mysinaddr,
    };

    // Attach socket to the previously defined address and port
    ret = bind(sockfd, (struct sockaddr*) &myaddr, sizeof myaddr);
    if (ret < 0) {
        perror("bind() failed");
        return 1;
    }

    // Mark the socket as ready to receive incoming connections
    ret = listen(sockfd, 1);
    if (ret < 0) {
        perror("listen() failed");
        return 1;
    }

    // Repeat indefinitely for each new client
    while (1) {
        // Get client connection address
        struct sockaddr_in client_addr;
        socklen_t client_addr_len = sizeof client_addr;
        int clientfd = accept(sockfd, (struct sockaddr*) &client_addr, &client_addr_len);
        if (clientfd < 0) {
            perror("accept() failed");
            return 1;
        }
        printf("\n --- NEW CONNEXION RECEIVED, clientfd: %d ---\n", clientfd);

        // Get client IP address in '0.0.0.0' format for printing
        char dst[16];
        const char* ret2 = inet_ntop(AF_INET, &client_addr.sin_addr, dst, sizeof dst);
        if (ret2 == NULL) {
            perror("inet_ntop() failed");
            return 1;
        }
        printf("Client IP address: %s\n", dst);

        char buf[1000];
        // Repeat indefinitely for each new request from current client
        while (1) {
            // Read data sent from client
            // -1 to keep the last byte for the terminating 0
            ssize_t n = read(clientfd, buf, (sizeof buf) - 1);
            if (n == 0) {
                printf("Client %d disconnected\n", clientfd);
                printf("-------------------------------------\n");
                printf("-------------------------------------\n");
                break;
            } else if (n < 0) {
                perror("read() failed");
                break;
            }
            buf[n] = 0;
            printf("Data received, size: %zi\n", n);
            printf("DATA:\n");
            printf("-------------------------------------\n");
            printf("%s\n", buf);
            printf("-------------------------------------\n");

            // Create header for response (n is a ssize_t, hence %zd)
            char header[100];
            int w = snprintf(header, sizeof header, "HTTP/1.0 200 OK\r\nContent-Length: %zd\r\n\r\n", n);
            if (w < 0) {
                perror("snprintf() for header failed");
                break;
            }

            // Concatenate header and data of response
            char str[w + n + 1];
            ret = snprintf(str, sizeof str, "%s%s", header, buf);
            if (ret < 0) {
                perror("snprintf() for concatenation failed");
                break;
            }

            // Send response to client (write to client file descriptor)
            ret = write(clientfd, str, w + n);
            if (ret < 0) {
                perror("write() failed");
                break;
            }
        }
    }
}
Note that, here, I am using the code of one of my older commits, to avoid having too much noise. So this code really only creates the server, listens and answers: no processing is applied to the request and, at this stage, the server only sends back what it received.
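To make the "send back what was received" step easier to see, here is a minimal sketch that isolates the response building in one function. The name build_echo_response and its return convention are assumptions made for this illustration, not part of the project. It builds the header and the body with a single snprintf call and also checks for truncation, which the listing above does not do.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write "HTTP header + body" into out (capacity out_cap).
   Returns the response length (without the terminating 0), or -1 if the
   response does not fit in out or if formatting fails. */
int build_echo_response(const char *body, char *out, size_t out_cap) {
    assert(body != NULL && out != NULL);
    size_t body_len = strlen(body);
    /* snprintf returns the length the full text would need, so a value
       >= out_cap means the output was truncated. */
    int w = snprintf(out, out_cap,
                     "HTTP/1.0 200 OK\r\nContent-Length: %zu\r\n\r\n%s",
                     body_len, body);
    if (w < 0 || (size_t) w >= out_cap) {
        return -1;
    }
    return w;
}
```

The returned length is what would then be passed to write as the number of bytes to send.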
Break down
socket – Create a socket and get its file descriptor
int sockfd = socket(AF_INET, SOCK_STREAM, 0);
if (sockfd < 0) {
    perror("socket() failed");
    return 1;
}
printf("Socket created, sockfd: %d\n", sockfd);

// I always want this to avoid an error when I run the server twice in a row
int optval = 1;
int ret = setsockopt(sockfd, SOL_SOCKET, SO_REUSEPORT, &optval, sizeof optval);
if (ret < 0) {
    perror("setsockopt() failed");
    return 1;
}
We first use the function socket, defined in <sys/socket.h>. The complete documentation is here, but what I understand and need from it is the following:
int socket(int domain, int type, int protocol);
This function creates an end point for communication and returns a file descriptor that refers to that endpoint. It is this file descriptor that we are going to use in all next steps.
- Arguments:
- int domain : communication domain. There are many but I used AF_INET, which is for IPv4 Internet protocols.
- int type : communication semantics. Again, there are several (see the doc) but I used SOCK_STREAM for the TCP protocol (SOCK_DGRAM, which supports datagrams, is the one used for UDP).
- int protocol : 0 here, as there is a single protocol for this socket type.
- Return value:
- int sockfd : the famous file descriptor. If negative, then there is an error.
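Then I use setsockopt. To be honest, I know that it is necessary to avoid errors when you run the server a second time quickly after stopping it (without it, bind can fail with “Address already in use” while the old connections are still winding down), but I’ve just been following this article.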
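As a small, hedged illustration of how a socket option is set and read back, the sketch below creates the socket, enables an option, then checks it with getsockopt. Note the swap: it uses SO_REUSEADDR, the option more commonly used for the "restart right after stopping" case, whereas my code above uses SO_REUSEPORT. The helper names open_reusable_tcp_socket and reuseaddr_is_set are made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/* Hypothetical helper: create an IPv4 TCP socket with SO_REUSEADDR enabled.
   Returns the file descriptor, or -1 on error. */
int open_reusable_tcp_socket(void) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        perror("socket() failed");
        return -1;
    }
    int optval = 1; /* any nonzero value enables a boolean option */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &optval, sizeof optval) < 0) {
        perror("setsockopt() failed");
        close(fd);
        return -1;
    }
    return fd;
}

/* Reads the option back: 1 if SO_REUSEADDR is enabled, 0 if not, -1 on error. */
int reuseaddr_is_set(int fd) {
    assert(fd >= 0);
    int value = 0;
    socklen_t len = sizeof value;
    if (getsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &value, &len) < 0) {
        return -1;
    }
    return value != 0;
}
```

The option has to be set before bind is called, which is why it comes right after socket in the listing.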
inet_aton – Define IP address and port
const char *interface = "0.0.0.0";
struct in_addr mysinaddr;
ret = inet_aton(interface, &mysinaddr);
if (ret == 0) {
    fprintf(stderr, "Invalid IP address: %s\n", interface);
    return 1;
}
struct sockaddr_in myaddr = {
    .sin_family = AF_INET,
    .sin_port = htons(8000),
    .sin_addr = mysinaddr,
};
I first use the function inet_aton, defined in <arpa/inet.h>, whose documentation can be found here.
int inet_aton(const char *cp, struct in_addr *inp);
This function converts an IPv4 address from number-and-dots notation into binary form.
- Arguments:
- const char *cp : host IPv4 address in numbers-and-dots format. I’ve used 0.0.0.0, which means “listen on every available network interface” from what I understand.
- struct in_addr *inp : the result is stored in the structure pointed to by inp. This structure is a struct in_addr, which is defined in <netinet/in.h> as follows:
typedef uint32_t in_addr_t;
struct in_addr
{
    in_addr_t s_addr;
};
So it’s just an unsigned 32-bit integer (stored in network byte order).
- Return value: 0 if invalid address.
Then I define myaddr, which is a struct sockaddr_in (also defined in <netinet/in.h>) containing the following elements:
- sa_family_t sin_family (an unsigned integer type) : still AF_INET for IPv4.
- in_port_t sin_port (a uint16_t) : port number in network byte order, which is why it goes through htons. I used 8000.
- struct in_addr sin_addr : internet address, filled by the previous function.
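To check what ends up in these structures, here is a minimal sketch that fills a sockaddr_in and converts the address back to a plain number. Two assumptions to flag: the helper names make_ipv4_addr and ipv4_as_host_u32 are invented for this example, and it uses inet_pton (the stricter POSIX sibling of inet_aton) rather than inet_aton itself.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: fill *out with an IPv4 address and a port.
   Returns 1 on success, 0 if ip is not a valid dotted address. */
int make_ipv4_addr(const char *ip, uint16_t port, struct sockaddr_in *out) {
    assert(ip != NULL && out != NULL);
    memset(out, 0, sizeof *out);
    out->sin_family = AF_INET;
    out->sin_port = htons(port); /* host to network byte order */
    /* inet_pton returns 1 on success and 0 when the text is not a valid address */
    return inet_pton(AF_INET, ip, &out->sin_addr) == 1;
}

/* The address as a plain number in host byte order (127.0.0.1 gives 0x7F000001). */
uint32_t ipv4_as_host_u32(const struct sockaddr_in *addr) {
    assert(addr != NULL);
    return ntohl(addr->sin_addr.s_addr);
}
```

With "0.0.0.0" the stored value is simply 0, which is the same thing as the INADDR_ANY constant.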
bind – Attach socket to the previously defined address and port
ret = bind(sockfd, (struct sockaddr*) &myaddr, sizeof myaddr);
if (ret < 0) {
    perror("bind() failed");
    return 1;
}
As its name suggests, bind (defined in <sys/socket.h> and documented here) will… bind our socket to our address and port:
int bind(int sockfd, const struct sockaddr *addr, socklen_t addrlen);
- Arguments:
- int sockfd : file descriptor of our previously created socket.
- const struct sockaddr *addr : previously defined IP address and port. Note that it was a sockaddr_in and now it is a sockaddr. In fact, sockaddr is the generic structure for socket addresses, but we have defined an IPv4 one (that’s what the _in in sockaddr_in means). So by adding (struct sockaddr*) before &myaddr we are telling the compiler to interpret it as a generic socket address.
- socklen_t addrlen : size in bytes of the address structure pointed to by addr.
- Return value: 0 if success, -1 otherwise.
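The cast to the generic sockaddr works in both directions: getsockname uses it to tell us which address a socket is bound to. The sketch below is an illustration under assumptions, not my server code: the helper name bind_to_free_port is made up, and it binds to port 0 (which asks the kernel to pick a free port) instead of 8000 so that it cannot collide with a running server.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <unistd.h>

/* Hypothetical helper: bind a new TCP socket to 0.0.0.0 on a port chosen by
   the kernel (port 0) and report that port through *port_out.
   Returns the socket file descriptor, or -1 on error. */
int bind_to_free_port(unsigned short *port_out) {
    assert(port_out != NULL);
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        perror("socket() failed");
        return -1;
    }
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(0);                 /* 0 = let the kernel choose */
    addr.sin_addr.s_addr = htonl(INADDR_ANY); /* same as "0.0.0.0" */

    /* bind() takes the generic struct sockaddr pointer, hence the cast */
    if (bind(fd, (struct sockaddr *) &addr, sizeof addr) < 0) {
        perror("bind() failed");
        close(fd);
        return -1;
    }
    /* Ask which address and port the socket actually got */
    socklen_t len = sizeof addr;
    if (getsockname(fd, (struct sockaddr *) &addr, &len) < 0) {
        perror("getsockname() failed");
        close(fd);
        return -1;
    }
    *port_out = ntohs(addr.sin_port);
    return fd;
}
```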
listen – Mark the socket as ready to receive incoming connections
ret = listen(sockfd, 1);
if (ret < 0) {
    perror("listen() failed");
    return 1;
}
Now we can finally listen to potential clients! It’s defined in <sys/socket.h> and the documentation is here.
int listen(int sockfd, int backlog);
For once, it’s an easy one:
- Arguments:
- int sockfd : file descriptor of our socket.
- int backlog : maximum length of the queue of pending connections. Just 1 for now, so roughly one client at most can wait in the queue while we are dealing with another one (the value is a hint that the kernel may adjust).
- Return value: 0 if success, -1 otherwise.
accept – Get client connection address
struct sockaddr_in client_addr;
socklen_t client_addr_len = sizeof client_addr;
int clientfd = accept(sockfd, (struct sockaddr*) &client_addr, &client_addr_len);
if (clientfd < 0) {
    perror("accept() failed");
    return 1;
}
printf("\n --- NEW CONNEXION RECEIVED, clientfd: %d ---\n", clientfd);

// Get client IP address in '0.0.0.0' format for printing
char dst[16];
const char* ret2 = inet_ntop(AF_INET, &client_addr.sin_addr, dst, sizeof dst);
if (ret2 == NULL) {
    perror("inet_ntop() failed");
    return 1;
}
printf("Client IP address: %s\n", dst);
Now everything is ready, so we can accept a new client! (From now on, we will repeat the following steps indefinitely for each new client.)
We use accept, defined in <sys/socket.h> and documented here.
int accept(int sockfd, struct sockaddr *_Nullable restrict addr, socklen_t *_Nullable restrict addrlen);
This function extracts the first connection request from the queue of pending connections (so here, the only one, as our queue has a maximum length of 1), and returns a new file descriptor that now refers to the client.
- Arguments:
- int sockfd : file descriptor of the listening socket.
- struct sockaddr *_Nullable restrict addr : pointer to a sockaddr structure, declared beforehand, that accept fills with the client address (same as before: we are working with IPv4, so it is a sockaddr_in, but we interpret it as a generic sockaddr).
- socklen_t *_Nullable restrict addrlen : pointer to the size of that structure; on return, it contains the actual size of the client address.
- Return value:
- int: -1 if error, client socket file descriptor otherwise.
Then we convert the client IP address into the numbers-and-dots format, but that was only necessary when I wanted to print it at the beginning of the project. At the time I am writing this article, I have removed this part. But I used the inet_ntop function, defined in <arpa/inet.h> and documented here (I don’t go into details because I am not using it anymore).
read – Read data sent from client
char buf[1000];
// …
ssize_t n = read(clientfd, buf, (sizeof buf) - 1);
if (n == 0) {
    printf("Client %d disconnected\n", clientfd);
    printf("-------------------------------------\n");
    printf("-------------------------------------\n");
    break;
} else if (n < 0) {
    perror("read() failed");
    break;
}
buf[n] = 0;
printf("Data received, size: %zi\n", n);
printf("DATA:\n");
printf("-------------------------------------\n");
printf("%s\n", buf);
printf("-------------------------------------\n");
This one is a bit tricky and I need to go a bit deeper into what happens when we want to read what the client sent us. When the client sends data to my server, the network interface card writes it into a kernel buffer (or maybe the kernel does, but at our level, it doesn’t change anything). We cannot directly access a kernel buffer, only the kernel can. So we use the read function to ask the kernel to copy its buffer into a buffer that we can use (the one called buf in my code). The thing is, the kernel doesn’t know whether the network interface card has received all the data sent by the client, so it copies whatever it has at that moment. Which means that we are making 2 assumptions here to make our lives easier:
- We suppose that the whole client request has been received (but we can have fun with the nc command: start the server in a terminal, then open a new terminal, run the command nc 127.0.0.1 8000 and hit Return, then type the beginning of a request: GET / HTTP/1.1 and hit Return. The server sends you a response immediately, before you have any chance to finish sending your request).
- We suppose that our buffer buf is big enough to hold the full request. If it’s not, read will copy what it can from the request, but will not return an error. The rest of the request will remain available in kernel memory. So we end up with an incomplete request, which may (or may not) be a problem. That’s why, for now, I am using a large buffer by doing char buf[1000];. But we will see later how to deal with this by checking whether the kernel buffer is empty, and reallocating memory for buf when necessary.
Anyway, now let’s see how read works. It is defined in <unistd.h> and documented here.
ssize_t read(int fd, void buf[.count], size_t count);
- Arguments:
- int fd : file descriptor from where we want to read. So the client file descriptor.
- void *buf : beginning of the buffer into which we want the kernel to write.
- size_t count : maximum number of bytes that we want to read. I use (sizeof buf) - 1 to keep one byte for the terminating 0 that is required by functions I use later.
- Return value:
- On success: the number of bytes read (can be smaller than count if fewer bytes are actually available than requested).
- On error: -1.
- If end of file: 0.
Note: what has been read is removed from kernel memory; we cannot read the same data twice.
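The "too small buffer" case can be reproduced without any network. The sketch below is only an illustration under assumptions: the function name reads_needed is made up, and a local socket pair (socketpair) stands in for the client connection. It sends a message, then counts how many read calls are needed to get it back with a deliberately small buffer.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Sends msg through a local socket pair, then reads it back asking for at
   most `chunk` bytes per call. Returns how many read() calls returned data,
   or -1 on error. */
int reads_needed(const char *msg, size_t chunk) {
    assert(msg != NULL && chunk > 0 && chunk <= 64);
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0) {
        return -1;
    }
    size_t len = strlen(msg);
    if (write(fds[0], msg, len) != (ssize_t) len) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    close(fds[0]); /* the reader will see end of file once the data is consumed */

    char buf[64];
    int calls = 0;
    for (;;) {
        ssize_t n = read(fds[1], buf, chunk); /* copies at most chunk bytes */
        if (n < 0) {
            close(fds[1]);
            return -1;
        }
        if (n == 0) {
            break; /* end of file: the other side closed */
        }
        calls++;
    }
    close(fds[1]);
    return calls;
}
```

With the 16-byte request line "GET / HTTP/1.1\r\n" and a 4-byte buffer, four reads are needed, and none of them reports an error: the leftover bytes simply wait in the kernel until the next call.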
write – Send response to client
ret = write(clientfd, str, w + n);
if (ret < 0) {
    perror("write() failed");
    break;
}
Finally, we can answer! And this one is easier. write is defined in <unistd.h> and documented here.
ssize_t write(int fd, const void buf[.count], size_t count);
- Arguments:
- int fd : file descriptor to which we want to write (so the client file descriptor).
- const void *buf : beginning of the buffer containing what we want to write (so our answer, for example an HTML file).
- size_t count : number of bytes that we want to write (so should be the size of buf if we want to send all its content).
- Return value: -1 if error, number of bytes written otherwise (which can be smaller than count).
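Since write is allowed to accept fewer bytes than requested, a common idiom is to loop until everything has been sent. Here is a minimal sketch of that idiom; write_all is a hypothetical helper name and my server code above does not use it.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: keep calling write() until all count bytes are sent.
   Returns 0 on success, -1 on error (errno is set by write()). */
int write_all(int fd, const void *buf, size_t count) {
    assert(buf != NULL || count == 0);
    const char *p = buf;
    while (count > 0) {
        ssize_t n = write(fd, p, count);
        if (n < 0) {
            if (errno == EINTR) {
                continue; /* interrupted by a signal: just retry */
            }
            return -1;
        }
        p += n;               /* skip what the kernel accepted */
        count -= (size_t) n;  /* and send only what is left */
    }
    return 0;
}
```

In the server, the call would look like write_all(clientfd, str, w + n), with the same error handling as for write.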
close – Close client’s file descriptor
Well, I had forgotten it at the time, but you should close the client’s file descriptor when the client is disconnected (I found it because I had a bug when I kept my finger on the F5 key, refreshing until death… I had no available file descriptors anymore). Luckily, that’s pretty easy:
// Close client file descriptor
int r = close(clientfd);
if (r < 0) {
    perror("close() failed");
    return 1;
}
close() is documented here.
int close(int fd);
- Arguments:
- int fd: client’s file descriptor to close.
- Return value: -1 if error, 0 otherwise.
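To see that close really gives the descriptor number back to the process, here is a tiny experiment. It is only a sketch: the function name descriptors_are_recycled is made up, and it uses pipes instead of client sockets so that it runs without a network. New descriptors always get the lowest available numbers, so a second pipe opened after closing the first one should reuse the same numbers.

```c
#include <assert.h>
#include <unistd.h>

/* Opens a pipe, closes it, opens another one, and reports whether the second
   pipe got the same descriptor numbers as the first one.
   Returns 1 if so, 0 if not, -1 on error. */
int descriptors_are_recycled(void) {
    int first[2], second[2];
    if (pipe(first) < 0) {
        return -1;
    }
    assert(first[0] >= 0 && first[1] >= 0);
    if (close(first[0]) < 0 || close(first[1]) < 0) {
        return -1;
    }
    if (pipe(second) < 0) {
        return -1;
    }
    int same = (second[0] == first[0] && second[1] == first[1]);
    close(second[0]);
    close(second[1]);
    return same;
}
```

Without the two close calls, every new client would consume a fresh number until the per-process limit is reached, which is the F5 bug described above.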
poll – Go further: use poll to read properly everything in the kernel buffer
At the read stage, we cheated a bit by defining a big buffer and hoping that the request would fit in it. But what if we want to do it a bit more properly: create a small buffer, read, check if everything has been retrieved, if not realloc memory for our buffer, read again… until all the data has been recovered?
Well, the function we need for that is poll, defined in <poll.h> and documented here.
int poll(struct pollfd *fds, nfds_t nfds, int timeout);
- Arguments:
- struct pollfd *fds : list of pollfds, which is a structure with the following elements:
- int fd : file descriptor (so for me the client file descriptor).
- short events : requested events (I used POLLIN to know if I can read, the list is in the documentation).
- short revents : returned events (it is the output parameter of poll).
- nfds_t nfds : length of the list, so for me 1.
- int timeout : number of milliseconds that poll should block waiting for a file descriptor to become ready. I used 0 so that poll returns immediately.
- Return value: -1 if error, number of elements whose revents are nonzero (so for me, 0 or 1) otherwise.
This function waits for the file descriptor (could be many but we are using only one here) to be ready for input/output.
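Before the full version, here is the smallest use of poll I can think of, as a hedged sketch: a helper (has_pending_input is a made-up name) that answers "is there something to read on this descriptor right now?" without ever blocking.

```c
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if fd has data waiting to be read,
   0 if not, -1 on error. Never blocks because the timeout is 0. */
int has_pending_input(int fd) {
    assert(fd >= 0);
    struct pollfd pfd;
    pfd.fd = fd;
    pfd.events = POLLIN; /* the question we ask: can I read? */
    pfd.revents = 0;     /* the answer, filled in by poll() */
    int r = poll(&pfd, 1, 0);
    if (r < 0) {
        return -1;
    }
    return (pfd.revents & POLLIN) != 0;
}
```

On a connected socket it returns 0 while nothing has arrived, 1 as soon as at least one byte is waiting in the kernel buffer, and 0 again once that byte has been read.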
Here is how I use it:
// Read data sent from client
size_t buf_size = 10;
size_t data_len = 0;
char* buf = malloc(buf_size);
if (buf == NULL) {
    perror("malloc() failed");
    return;
}
// Repeat indefinitely for each new request from current client
while (1) {
    // Reinitialize data_len
    data_len = 0;
    struct pollfd client_pollfd;
    client_pollfd.fd = clientfd;
    client_pollfd.events = POLLIN; // can I read?
    // First read: blocks until the client sends something (or disconnects)
    int n = read_client(clientfd, &buf, &data_len, &buf_size);
    if (n == 0) {
        free(buf);
        return;
    }
    if (n < 0) {
        fprintf(stderr, "%s:%d - read_client() failed\n", __FILE__, __LINE__);
        free(buf);
        return;
    }
    while (1) {
        int r = poll(&client_pollfd, 1, 0); // timeout = 0 causes poll() to return immediately, even if no file descriptors are ready
        if (r < 0) {
            perror("poll() failed");
            free(buf);
            return;
        }
        if ((client_pollfd.revents & POLLIN) == 0) {
            break;
        }
        // Fill buf from where we stopped
        n = read_client(clientfd, &buf, &data_len, &buf_size);
        if (n == 0) {
            break;
        }
        if (n < 0) {
            fprintf(stderr, "%s:%d - read_client() failed\n", __FILE__, __LINE__);
            free(buf);
            return; // buf has been freed, so we must not touch it again
        }
    }
    // Finally, last realloc to have the exact needed size and add final 0
    char* exact = realloc(buf, data_len + 1);
    if (exact == NULL) {
        perror("realloc() failed");
        free(buf);
        return;
    }
    buf = exact;
    buf_size = data_len + 1;
    buf[data_len] = 0;
read_client is just a wrapper around read to avoid writing the same checks twice. It takes a pointer to buf (a char**) because realloc may move the buffer, and the caller must see the new address:
int read_client(int clientfd, char** p_buf, size_t* p_data_len, size_t* p_buf_size) {
    /*
    Reading of client buffer into *p_buf at position data_len (data_len is the
    size of already written data) for as many characters as possible (to fill
    the buffer or to read everything).
    Returns an int:
    0: client disconnected
    -1: fail
    n: read response, number of characters read
    */
    ssize_t n = read(clientfd, *p_buf + *p_data_len, *p_buf_size - *p_data_len);
    if (n == 0) {
        printf("Client %d disconnected\n", clientfd);
        printf("-------------------------------------\n");
        printf("-------------------------------------\n");
        return 0;
    }
    if (n < 0) {
        perror("read() failed");
        return -1;
    }
    *p_data_len += n;
    if (*p_data_len == *p_buf_size) {
        // Buf is full, need to realloc (keep the old pointer if it fails)
        char* new_buf = realloc(*p_buf, *p_buf_size * 2);
        if (new_buf == NULL) {
            perror("realloc() failed");
            return -1;
        }
        *p_buf = new_buf;
        *p_buf_size *= 2;
        // printf("REALLOC!! New size: %zu\n", *p_buf_size);
    }
    return n;
}
Here’s roughly what my code does:
- Creates a small buffer buf. buf_size contains the maximum size of the buffer, and data_len the size already used in the buffer (so the remaining size is buf_size - data_len).
- I read a first time. It is necessary to read once before using poll because read is the function that will wait for something to arrive. Without this first read_client, we would loop on empty requests before the first one arrives.
- Then we loop until everything has been retrieved:
- Use poll to check if there is still something to read.
- If not ((client_pollfd.revents & POLLIN) == 0), then the whole request has been retrieved (at least everything that has arrived so far) and we can quit the loop.
- Otherwise, we read everything we can. Note that in the function read_client, I check whether buf is full. If it is, I double its size to be able to read more if necessary on the next loop. We might reallocate once too often at the end though (if we fill the buffer with exactly everything that was left to read).
- Once everything has been retrieved, we reallocate one last time to the exact needed size and add the final 0 (this step is not strictly required).
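The doubling strategy can also be tried on its own, without any socket. The sketch below is an illustration, not the project's code: append_bytes is a hypothetical helper that appends bytes to a heap buffer and doubles its capacity when needed. It takes a pointer to the caller's pointer because realloc may move the block to a new address.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: append n bytes to a heap buffer, doubling its capacity
   until the data plus a terminating 0 fits. Assumes *p_len <= *p_cap.
   Returns 0 on success, -1 on allocation failure (in which case *p_buf is
   left untouched and still valid). */
int append_bytes(char **p_buf, size_t *p_len, size_t *p_cap,
                 const char *src, size_t n) {
    assert(p_buf != NULL && p_len != NULL && p_cap != NULL && *p_cap > 0);
    size_t new_cap = *p_cap;
    while (new_cap - *p_len < n + 1) { /* keep one byte for the final 0 */
        new_cap *= 2;
    }
    if (new_cap != *p_cap) {
        char *grown = realloc(*p_buf, new_cap);
        if (grown == NULL) {
            return -1;
        }
        *p_buf = grown; /* the block may have moved: update the caller's pointer */
        *p_cap = new_cap;
    }
    memcpy(*p_buf + *p_len, src, n);
    *p_len += n;
    (*p_buf)[*p_len] = 0;
    return 0;
}
```

Starting from a 4-byte buffer, appending "GET " and then "/ HTTP/1.1" grows the capacity to 8 and then 16 bytes, and the buffer ends up holding the 14-character string "GET / HTTP/1.1".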
And here you have it! Now we just have to process the request and make something interesting out of it! But it will be for the next episode…