1. Introduction
In socket programming, we often use the sockaddr_in structure to describe a socket address:
struct sockaddr_in serv_addr;
memset(&serv_addr, 0, sizeof(serv_addr));
serv_addr.sin_family = AF_INET;
serv_addr.sin_addr.s_addr = inet_addr(ip);
serv_addr.sin_port = htons(port);
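One caveat with the snippet above is that inet_addr reports failure by returning INADDR_NONE, which cannot be told apart from the valid address 255.255.255.255. A minimal sketch of the same initialization using inet_pton is given here; the helper name make_ipv4_addr is hypothetical and not part of any library.

```c
#include <arpa/inet.h>   /* inet_pton, htons */
#include <netinet/in.h>  /* struct sockaddr_in */
#include <stdint.h>
#include <string.h>      /* memset */
#include <sys/socket.h>  /* AF_INET */

/* Hypothetical helper: fills *out for a dotted-quad IPv4 string and a port
 * given in host byte order. Returns 0 on success, -1 if ip is not a valid
 * IPv4 address. */
int make_ipv4_addr(const char *ip, uint16_t port, struct sockaddr_in *out)
{
    memset(out, 0, sizeof(*out));   /* also zeroes sin_zero */
    out->sin_family = AF_INET;
    out->sin_port = htons(port);    /* port is stored in network byte order */
    if (inet_pton(AF_INET, ip, &out->sin_addr) != 1)
        return -1;                  /* not a valid dotted-quad string */
    return 0;
}
```

A caller would declare a struct sockaddr_in, pass its address to the helper, and check the return value before using the result.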
Let's take a look at the definition of the sockaddr_in structure:
struct sockaddr_in {
short sin_family; // Address Family, AF_INET
u_short sin_port; // 16-bit TCP/UDP port number, network byte order
struct in_addr sin_addr; // 32-bit IP address, network byte order
char sin_zero[8]; // Not used, can be used for padding
};
Notice that on line 4 of this definition the IP address is not stored directly as an s_addr field. Instead, it is nested inside a member named sin_addr, whose type is struct in_addr.
So what are the benefits of this?
2. Analysis
On Unix platforms, the in_addr structure is defined as:
typedef uint32_t in_addr_t;
struct in_addr {
in_addr_t s_addr; // 32-bit IPV4 address, network byte order
};
On Windows platforms, the in_addr structure is defined as:
struct in_addr {
union {
struct {
u_char s_b1, s_b2, s_b3, s_b4;
} S_un_b;
struct {
u_short s_w1, s_w2;
} S_un_w;
u_long S_addr;
} S_un;
};
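As we can see, the internal layout of in_addr differs between platforms: on Unix it holds a single s_addr member, while on Windows it holds a union, and the name s_addr is provided by a macro that expands to S_un.S_addr. Wrapping the address in struct in_addr lets each platform choose its own layout, while code that writes sin_addr.s_addr compiles unchanged on both.
This explains why sockaddr_in contains an in_addr structure instead of a bare s_addr field.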
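To make the compatibility point concrete, here is a small sketch of code that touches only the s_addr name and therefore does not care which layout the platform uses. The helper ipv4_octets is hypothetical, and the Windows branch assumes winsock2.h is the header in use.

```c
#ifdef _WIN32
#include <winsock2.h>    /* struct in_addr; s_addr is a macro here */
#else
#include <arpa/inet.h>   /* htonl and friends */
#include <netinet/in.h>  /* struct in_addr with a plain s_addr member */
#endif
#include <string.h>      /* memcpy */

/* Hypothetical helper: copies the four address octets, in network byte
 * order, into out[0..3]. Only the s_addr name is used, so the same source
 * works with either definition of struct in_addr. */
void ipv4_octets(const struct in_addr *addr, unsigned char out[4])
{
    memcpy(out, &addr->s_addr, 4);  /* an IPv4 address is always 4 bytes */
}
```

Because the address is kept in network byte order, out[0] is always the first octet of the dotted-quad form, regardless of host endianness.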
3. Analysis of Union in in_addr
On Windows platforms, the in_addr structure uses a union to hold the address, so the same 32 bits of the IPv4 address can be viewed as four bytes, two 16-bit integers, or one 32-bit integer.
So when we initialize the sin_addr field:
serv_addr.sin_addr.s_addr = inet_addr(ip);
we can then read the same IPv4 address back through any of the three views of the union.
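The three views can be tried out on any platform with a stand-in union. The sketch below is an illustration only: union ipv4_views and ipv4_views_parse are hypothetical names that mirror the Windows layout with fixed-width types, and they are not the actual Windows definition.

```c
#include <arpa/inet.h>   /* inet_pton */
#include <netinet/in.h>  /* struct in_addr */
#include <stdint.h>
#include <sys/socket.h>  /* AF_INET */

/* Hypothetical stand-in for the Windows union, using fixed-width types. */
union ipv4_views {
    struct { uint8_t b1, b2, b3, b4; } bytes;  /* four octets */
    struct { uint16_t w1, w2; } words;         /* two 16-bit halves */
    uint32_t addr;                             /* one 32-bit value */
};

/* Parses a dotted-quad string into the union.
 * Returns 0 on success, -1 if ip is not a valid IPv4 address. */
int ipv4_views_parse(const char *ip, union ipv4_views *out)
{
    struct in_addr a;
    if (inet_pton(AF_INET, ip, &a) != 1)
        return -1;
    out->addr = a.s_addr;  /* stored as-is, in network byte order */
    return 0;
}
```

For "192.168.1.10", bytes.b1 through bytes.b4 read 192, 168, 1 and 10 on any host. The numeric values of words.w1, words.w2 and addr depend on host endianness, because the bytes are stored in network order.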
4. sockaddr Structure
struct sockaddr{
sa_family_t sa_family; // Address Family, the address type
char sa_data[14]; // IP address and port number
};
struct sockaddr_in{
sa_family_t sin_family; // Address Family, the address type
uint16_t sin_port; // 16-bit port number
struct in_addr sin_addr; // 32-bit IP address
char sin_zero[8]; // Not used, usually filled with 0
};
struct sockaddr_in6 {
sa_family_t sin6_family; // Address type, value is AF_INET6
in_port_t sin6_port; // 16-bit port number
uint32_t sin6_flowinfo; // IPv6 flow information
struct in6_addr sin6_addr; // Specific IPv6 address
uint32_t sin6_scope_id; // Interface scope ID
};
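It can be observed that sockaddr and sockaddr_in have the same length (16 bytes on common platforms), while sockaddr_in6 is larger because it has to hold a 128-bit address. All three begin with an address family field. sockaddr packs the IP address and port number together into the opaque sa_data array, and the latter two are protocol-specific variants of it that give those bytes named fields.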
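The sizes are easy to check directly. The sketch below simply prints them; print_sockaddr_sizes is a hypothetical helper, and struct sockaddr_storage is included because POSIX defines it as a type large enough to hold any supported address structure.

```c
#include <netinet/in.h>  /* struct sockaddr_in, struct sockaddr_in6 */
#include <stdio.h>
#include <sys/socket.h>  /* struct sockaddr, struct sockaddr_storage */

/* Hypothetical helper: prints the size in bytes of each address structure. */
void print_sockaddr_sizes(void)
{
    printf("sockaddr         : %zu\n", sizeof(struct sockaddr));
    printf("sockaddr_in      : %zu\n", sizeof(struct sockaddr_in));
    printf("sockaddr_in6     : %zu\n", sizeof(struct sockaddr_in6));
    printf("sockaddr_storage : %zu\n", sizeof(struct sockaddr_storage));
}
```

On a typical Linux system these come out as 16, 16, 28 and 128 bytes, though the exact values are platform-dependent. When the address family is not known in advance, sockaddr_storage is the safe choice for a buffer.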
So why don't we pass the address in IP:Port form directly?
Because the socket API does not parse a combined IP:Port value, and filling in the raw sa_data bytes of sockaddr by hand is inconvenient and error-prone, which is why the latter two structures were created.
However, the API functions still take a pointer to struct sockaddr, so a cast is needed when calling them, for example:
bind(serv_sock, (struct sockaddr*)&serv_addr, sizeof(serv_addr));
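Here we use type punning (via a pointer cast) to call the function, so that the same API can accept either a sockaddr_in or a sockaddr_in6. The family field at the start of each structure, together with the length argument, tells the callee which layout it was given.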
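The receiving side of such a call can be sketched as follows. This is only an illustration of dispatching on the family field, not how any particular kernel or library implements bind, and sockaddr_port is a hypothetical helper.

```c
#include <arpa/inet.h>   /* ntohs */
#include <netinet/in.h>  /* struct sockaddr_in, struct sockaddr_in6 */
#include <sys/socket.h>  /* struct sockaddr, socklen_t, AF_INET, AF_INET6 */

/* Hypothetical helper: returns the port in host byte order, or -1 if the
 * family is unsupported or len is too small for that family. */
int sockaddr_port(const struct sockaddr *sa, socklen_t len)
{
    switch (sa->sa_family) {
    case AF_INET:
        if (len < (socklen_t)sizeof(struct sockaddr_in))
            return -1;
        return ntohs(((const struct sockaddr_in *)sa)->sin_port);
    case AF_INET6:
        if (len < (socklen_t)sizeof(struct sockaddr_in6))
            return -1;
        return ntohs(((const struct sockaddr_in6 *)sa)->sin6_port);
    default:
        return -1;
    }
}
```

The caller passes either structure with the same cast used for bind, for example sockaddr_port((struct sockaddr*)&serv_addr, sizeof(serv_addr)).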
Type punning: Refers to the technique of accessing the same block of memory through different types in C/C++, i.e., reinterpreting the bit pattern of an object as a value of another type.
There are several ways to perform type punning, such as unions and pointer casts, as well as memcpy, which is well-defined in both C and C++. Reading a union member other than the one last written is permitted in C but is undefined behavior in C++.
However, type punning can violate the strict aliasing rule, so it needs to be used with caution.
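As a small illustration of the memcpy approach, the sketch below reads the bit pattern of a float as a 32-bit integer. It assumes float is a 32-bit IEEE 754 type, which holds on mainstream platforms but is not guaranteed by the C standard; float_bits is a hypothetical helper name.

```c
#include <stdint.h>
#include <string.h>  /* memcpy */

/* Assumption: float and uint32_t have the same size (checked at compile time). */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Hypothetical helper: returns the raw bit pattern of f.
 * memcpy copies bytes, so no object is accessed through an incompatible
 * type and the strict aliasing rule is not violated. */
uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof(bits));
    return bits;
}
```

Under the IEEE 754 assumption, float_bits(1.0f) yields 0x3F800000. Compilers generally optimize such a fixed-size memcpy into a plain register move, so the safe form usually costs nothing.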
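Strict aliasing: Refers to a rule in C/C++ that allows the compiler to assume that pointers to incompatible types do not refer to the same object. Accessing an object through an lvalue of an incompatible type (character types excepted) is undefined behavior, and the compiler may optimize on the assumption that it never happens.
The reason the cast can be used here is that the sockets API is designed around it: every sockaddr variant begins with the address family field, and the actual size is passed as a separate argument, so the callee knows how many bytes are valid and how to interpret them. The structures do not need to have the same length, and sockaddr_in6 is in fact larger than sockaddr.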