How DNS Got Its Messages on a Diet
An Illustrated Guide to DNS Message Compression
DNS message compression is a technique that removes redundant information from DNS messages. It works by replacing duplicated names in a DNS message with a two-byte value indicating the location of their first occurrence. Given that a domain name can be up to 253 bytes long, DNS compression can substantially reduce the length of DNS messages.
In this article, I present how DNS message compression works. I also provide a concrete example.
DNS Database and Resource Records
DNS refers to a distributed database maintained across multiple DNS servers including the Root, the TLD, and the Authoritative servers.
The DNS database consists of records called Resource Records (RRs) of different types. The most common types of RRs include:
- Type A records contain the IPv4 address corresponding to a domain name.
- Type AAAA records contain the IPv6 address corresponding to a domain name.
- Type CNAME records contain the canonical name corresponding to a domain name acting as an alias.
- Type NS records contain the authoritative server in charge of a domain name.
- Type MX records contain the name of the incoming mail server for a domain.
DNS clients send a DNS query message to fetch one of these RRs for a given domain name. To do so, a client encapsulates a DNS query in a UDP datagram sent to the local DNS server. In the case of a recursive query, the client waits until the local DNS server returns the DNS answer message.
DNS Messages
A DNS message contains a header followed by four (4) sections (some may be empty):
- Questions
- Answers
- Authority
- Additional
Each of these four (4) sections contains 0 or more resource records.
The header consists of the following six (6) fields:
- Identification
- Control
- Number of Questions
- Number of Answers
- Number of Authority
- Number of Additional
The last four (4) fields, named “Number of …”, indicate the number of records contained in the corresponding four sections following the header.
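As a minimal sketch of the header layout, the six fields can be packed and unpacked with Python's `struct` module. The sketch assumes each field is 16 bits wide and stored in network (big-endian) byte order, which gives a 12-byte header. The helper names `pack_header` and `unpack_header` are hypothetical, and the sample values are the ones used in the example later in this article.

```python
import struct

# Assumption: six 16-bit fields in network (big-endian) byte order.
HEADER = struct.Struct("!6H")


def pack_header(identification, control, questions, answers, authority, additional):
    """Serialize the six header fields into 12 bytes."""
    return HEADER.pack(identification, control, questions, answers, authority, additional)


def unpack_header(message):
    """Read the six header fields from the first 12 bytes of a message."""
    return HEADER.unpack_from(message, 0)


header = pack_header(0x37E5, 0x8180, 1, 2, 0, 0)
print(len(header))            # 12
print(header.hex())           # 37e581800001000200000000
print(unpack_header(header))  # (14309, 33152, 1, 2, 0, 0)
```

Because the header has a fixed length, the first name of the Question section always starts at byte 12 of the message.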
The records listed in the Question section are incomplete whereas the other 3 sections contain complete resource records. The format of a Question record contains the following fields:
{QNAME, QTYPE, QCLASS}
where:
- QNAME refers to the name of the requested resource,
- QTYPE (16 bits) refers to the type of the resource record (e.g., A, AAAA, CNAME, MX, …),
- QCLASS (16 bits) is set to IN (0x0001) for Internet.
The records listed in the Answer, the Authority, and the Additional sections are complete resource records. They correspond to the resource records as stored in the DNS distributed database. The format of a resource record contains the following fields:
{NAME, TYPE, CLASS, TTL, RDATA_LENGTH, RDATA}
where:
- NAME refers to the domain name to which the record applies,
- TYPE (16 bits) refers to the type of the resource record (e.g., A, AAAA, CNAME, MX, …),
- CLASS (16 bits) is set to IN (0x0001) for Internet,
- TTL (32 bits) is the duration in seconds during which the resource record remains valid,
- RDATA_LENGTH (16 bits) refers to the length in bytes of the RDATA field,
- RDATA is the data contained in this resource record.
The data in a resource record depends on the type of the resource record:
- For Type=A (0x0001), data is an IPv4 address.
- For Type=AAAA (0x001C), data is an IPv6 address.
- For Type=CNAME (0x0005), data is the canonical name.
- For Type=NS (0x0002), data is the name of the authoritative server.
- For Type=MX (0x000F), data is a 16-bit preference value followed by the name of a mail server.
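The fixed-size part of a resource record can be sketched the same way. The snippet below packs everything that follows NAME (TYPE, CLASS, TTL, RDATA_LENGTH, RDATA) for a Type A record. The helper name `pack_rr_tail` is hypothetical, the field widths are those listed above, and the TTL and address are sample values.

```python
import ipaddress
import struct


def pack_rr_tail(rtype, rclass, ttl, rdata):
    """Pack the fields of a resource record that follow NAME.

    TYPE (16 bits), CLASS (16 bits), TTL (32 bits) and RDATA_LENGTH (16 bits)
    take 10 bytes in total; RDATA is appended as-is.
    """
    return struct.pack("!HHIH", rtype, rclass, ttl, len(rdata)) + rdata


# Type A record: RDATA is the 4-byte IPv4 address.
rdata = ipaddress.IPv4Address("134.157.250.59").packed
tail = pack_rr_tail(0x0001, 0x0001, 30, rdata)
print(len(tail))   # 14
print(tail.hex())  # 000100010000001e0004869dfa3b
```

RDATA_LENGTH is computed from the data itself, so the same helper works for records whose RDATA is a (possibly compressed) name.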
Because of the many records listed in the 4 sections of a DNS message, a single DNS message may contain multiple names. The same name or a fraction of a name may be duplicated across the multiple records present in the message. Recall that the labels located at the end of a domain name are the names of its parent domains. These repetitions may lead to long messages which waste bandwidth unnecessarily.
DNS Message Compression
A technique called message compression is used to reduce the length of DNS messages by removing duplicated information. This technique targets the names contained in the records carried in the four sections of a DNS message.
According to the DNS message compression technique, a name can be provided either as a sequence of data labels, as a compression label, or as a mix of data and compression labels.
- A data label starts with one byte indicating the length in bytes of the label. The last label of a name is followed by the null label 0x00.
Example: Data label ‘fr’ is encoded as 0x02667200 where 0x02 is the length of the label (2 characters) and 0x6672 the ASCII codes in hex for characters ‘f’ (0x66) and ‘r’ (0x72). 0x00 is the ending label since ‘fr’ is a top-level domain (TLD).
- A compression label is a 2-byte label which points to one or more consecutive data labels located earlier in the DNS message. Instead of repeating the same data label(s), a compression label indicates where to find the data label(s) to be reused. A compression label is 2 bytes (i.e., 16 bits) long: the first two bits are set to 0b11 and the remaining 14 bits indicate the offset of the data label(s) to be reused in the DNS message.
Example: 0xC010 indicates that this is a compression label and the data label(s) to be reused is (are) located 16 (0x0010) bytes from the start of the DNS message. The labels are read starting from byte 16 until the ending label 0x00.
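The two kinds of labels can be told apart by the top two bits of their first byte, which is what the decoding sketch below relies on. It is a minimal illustration rather than a complete parser: the function name `read_name` is hypothetical, names are assumed to be plain ASCII, and the test buffer is hand-built (12 zero bytes standing in for the header, then www.upmc.fr, then ‘web’ followed by the compression label 0xC010).

```python
def read_name(message, offset):
    """Decode the name starting at `offset`.

    Returns (dotted name, offset of the first byte after the name at its
    original position in the message).
    """
    labels = []
    end = None   # where parsing resumes once the first pointer has been followed
    hops = 0
    while True:
        first = message[offset]
        if first & 0xC0 == 0xC0:
            # Compression label: the low 14 bits are an offset into the message.
            pointer = ((first & 0x3F) << 8) | message[offset + 1]
            if end is None:
                end = offset + 2
            hops += 1
            if hops > len(message):
                raise ValueError("compression pointer loop")
            offset = pointer
        elif first & 0xC0:
            raise ValueError("reserved label type")
        elif first == 0:
            # Ending label 0x00.
            if end is None:
                end = offset + 1
            return ".".join(labels), end
        else:
            # Data label: one length byte followed by that many characters.
            start = offset + 1
            labels.append(message[start:start + first].decode("ascii"))
            offset = start + first


message = bytes(12) + b"\x03www\x04upmc\x02fr\x00" + b"\x03web\xc0\x10"
print(read_name(message, 12))  # ('www.upmc.fr', 25)
print(read_name(message, 25))  # ('web.upmc.fr', 31)
```

The hop counter is a defensive choice: a malformed message could contain a pointer that refers back to itself, and a decoder that follows pointers blindly would never terminate.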
An Illustrative Example of DNS Compression
The following figure shows the DNS answer message received in response to a DNS query for name “www.upmc.fr”.
The values for the header fields of this message are:
- Identification: 0x37E5
- Control: 0x8180
- Number of questions: 0x0001
- Number of answers: 0x0002
- Number of authority: 0x0000
- Number of additional: 0x0000
The header is followed by the 2 non-empty sections Question and Answers.
The Question section contains one question record:
{QNAME=www.upmc.fr, QTYPE=1 (A), QCLASS=1 (IN)}
The Answer section contains 2 resource records:
- Resource Record 1: Type CNAME
{NAME=www.upmc.fr, TYPE=5 (CNAME), CLASS=1 (IN), TTL=30 (seconds), RDATA_LENGTH=6 (bytes), RDATA=web.upmc.fr}
- Resource Record 2: Type A
{NAME=web.upmc.fr, TYPE=1 (A), CLASS=1 (IN), TTL=30 (seconds), RDATA_LENGTH=4 (bytes), RDATA=134.157.250.59}
The DNS answer message contains 4 names listed below by order of appearance:
- Name 1: www.upmc.fr (3 data labels)
- Name 2: www.upmc.fr (1 compression label)
- Name 3: web.upmc.fr (1 data label, 1 compression label)
- Name 4: web.upmc.fr (1 compression label)
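To see how these four names end up in the message, here is a minimal sketch that rebuilds the answer from the field values listed above. It is not a captured trace: the helper names `encode_name` and `build_example` are hypothetical, and the compressor uses one simple strategy (remember the offset of every name suffix already written and point to the longest one that matches). The pointer values it produces are therefore those of this sketch.

```python
import struct


def encode_name(name, offset, seen):
    """Encode `name` for placement at `offset`, reusing suffixes found in `seen`.

    `seen` maps a suffix such as "upmc.fr" to the offset of its first occurrence.
    """
    out = bytearray()
    labels = name.split(".")
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in seen:
            # Compression label: bits 0b11 followed by a 14-bit offset.
            out += struct.pack("!H", 0xC000 | seen[suffix])
            return bytes(out)
        if offset + len(out) < 0x4000:   # offsets must fit in 14 bits
            seen[suffix] = offset + len(out)
        label = labels[i].encode("ascii")
        out.append(len(label))
        out += label
    out.append(0)                        # ending label
    return bytes(out)


def build_example():
    seen = {}
    msg = bytearray(struct.pack("!6H", 0x37E5, 0x8180, 1, 2, 0, 0))
    # Question: name 1 (3 data labels), then QTYPE=A and QCLASS=IN.
    msg += encode_name("www.upmc.fr", len(msg), seen)
    msg += struct.pack("!HH", 1, 1)
    # Answer 1 (CNAME): name 2, then name 3 as RDATA.
    msg += encode_name("www.upmc.fr", len(msg), seen)
    # RDATA starts 10 bytes later, after TYPE, CLASS, TTL and RDATA_LENGTH.
    rdata = encode_name("web.upmc.fr", len(msg) + 10, seen)
    msg += struct.pack("!HHIH", 5, 1, 30, len(rdata)) + rdata
    # Answer 2 (A): name 4, then the IPv4 address as RDATA.
    msg += encode_name("web.upmc.fr", len(msg), seen)
    msg += struct.pack("!HHIH", 1, 1, 30, 4) + bytes([134, 157, 250, 59])
    return bytes(msg)


message = build_example()
print(len(message))           # 63
print(message[29:31].hex())   # c00c  (name 2 points to name 1 at offset 12)
print(message[41:47].hex())   # 03776562c010  (name 3: 'web' + pointer to upmc.fr)
print(message[47:49].hex())   # c029  (name 4 points to name 3 at offset 41)
```

Under these assumptions the rebuilt message has one 13-byte name, two 2-byte compression labels, and one 6-byte mixed name, matching the list above.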
In the following table, each row represents a data label or a compression label. The value of each column varies depending on whether the row represents a data label or a compression label.
For a data label:
- Column 1: The length of the label,
- Column 2: The value of the label in hexadecimal,
- Column 3: The value of the data label in plain characters.
For a compression label:
- Column 1: The value of the compression label in hexadecimal,
- Column 2: The value of the offset in decimal,
- Column 3: The value of the data labels (in plain characters) that have been compressed.
After compression, the DNS message is 63 bytes long. Without compression, the DNS message would have been 92 bytes long:
name 1: 13 bytes (www.upmc.fr)
name 2: 13 bytes instead of 2 bytes (www.upmc.fr)
name 3: 13 bytes instead of 6 bytes (web.upmc.fr)
name 4: 13 bytes instead of 2 bytes (web.upmc.fr)
63-(2+6+2)+3*13 = 92 bytes.
The compression ratio for the DNS message in this example, computed as the fraction of bytes saved, is (92 - 63) / 92 = 31.52%.
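The arithmetic above can be checked with a few lines of Python. The sketch takes the sizes from the list above and defines the ratio as the fraction of bytes saved relative to the uncompressed size, which is the definition that yields the figure quoted here; the variable names are illustrative.

```python
compressed_size = 63                                   # bytes, with compression
full_name_size = 13                                    # bytes for a fully spelled-out name
compressed_names = {"name 2": 2, "name 3": 6, "name 4": 2}

# Swap each compressed name for its 13-byte uncompressed form.
uncompressed_size = (
    compressed_size
    - sum(compressed_names.values())
    + full_name_size * len(compressed_names)
)
ratio = (uncompressed_size - compressed_size) / uncompressed_size

print(uncompressed_size)   # 92
print(f"{ratio:.2%}")      # 31.52%
```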
Let’s now consider the following trace representing the answer to a DNS query for lip6.fr's authoritative servers (nslookup -type=NS lip6.fr).
What is the compression ratio for this message? Detailed answer here.
Final Word
Techniques such as DNS message compression are of paramount importance for the Internet of Things and the integration of low-powered wireless devices as these techniques lower the size of IPv6 packets to match small MTUs. DNS message compression exploits the hierarchical nature of domain names. This technique works by removing the duplicated names and the labels shared between names belonging to domains under the same parent domain.