Understanding How the Internet Works

The Internet and the World Wide Web are two different things, although many people confuse the two.

This article provides a high-level overview of how the Internet works.

The Internet

The Internet is the global network of connected devices that adhere to the Internet Protocol (IP).

The Internet Protocol allows devices to connect to the Internet and communicate with other devices connected to the Internet. It is a specification for the basic unit of data transfer, called a datagram.

The IP datagram consists of a header and a payload.

The header has a source address and a destination address. As a result, there is no true anonymity on the Internet, since transferring information requires knowing both its source and its destination [1].

The payload is the information part of the packet, and its content depends on what is being transmitted. It is at the payload level that the World Wide Web and other Internet services (like email, Voice over IP (VoIP), File Transfer Protocol (FTP), etc.) exist.

The IP header also includes a protocol field that identifies the transport protocol carried in the payload. The transport protocol controls how data is delivered between the two endpoints. Two common transport protocols are Transmission Control Protocol (TCP) and User Datagram Protocol (UDP).
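To make the header fields concrete, here is a minimal sketch in Python that packs and then unpacks a 20-byte IPv4 header. The function names `build_ipv4_header` and `parse_ipv4_header` are hypothetical helpers made up for this illustration, and the checksum is left at zero, so this is not a header a real network stack would accept.

```python
import ipaddress
import struct

# A few well-known protocol numbers and the names they stand for.
PROTOCOL_NAMES = {1: "ICMP", 6: "TCP", 17: "UDP"}

# Layout of the fixed 20-byte IPv4 header, in network (big-endian) byte order.
HEADER_FORMAT = "!BBHHHBBH4s4s"


def build_ipv4_header(src: str, dst: str, protocol: int, payload_len: int) -> bytes:
    """Pack a minimal IPv4 header (illustrative only: checksum is not computed)."""
    version_ihl = (4 << 4) | 5  # version 4, header length of 5 32-bit words
    return struct.pack(
        HEADER_FORMAT,
        version_ihl,
        0,                 # type of service
        20 + payload_len,  # total length: header plus payload
        0,                 # identification
        0,                 # flags and fragment offset
        64,                # time to live
        protocol,          # which transport protocol the payload carries
        0,                 # header checksum (left at zero in this sketch)
        ipaddress.IPv4Address(src).packed,
        ipaddress.IPv4Address(dst).packed,
    )


def parse_ipv4_header(header: bytes) -> dict:
    """Extract the fields discussed in the text from the first 20 bytes."""
    fields = struct.unpack(HEADER_FORMAT, header[:20])
    return {
        "version": fields[0] >> 4,
        "protocol": PROTOCOL_NAMES.get(fields[6], str(fields[6])),
        "source": str(ipaddress.IPv4Address(fields[8])),
        "destination": str(ipaddress.IPv4Address(fields[9])),
    }


header = build_ipv4_header("192.168.1.101", "10.184.216.34", protocol=6, payload_len=100)
print(parse_ipv4_header(header))
```

The source and destination addresses and the protocol number all sit at fixed positions in the header, which is why every device along the path can read them.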

TCP is a robust protocol that verifies the packet is received. If the packet is not received, it is retransmitted.

UDP is less robust – it doesn’t verify that the packet is received. This may seem strange, but some data transfers don’t need to be robust – they can tolerate lost packets. For example, video streaming can survive the occasional “lost” data packet.
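The difference between the two approaches can be sketched with a toy simulation. This is not real TCP or UDP (real TCP also handles ordering, sequence numbers, and congestion control); the names `lossy_channel`, `send_unreliable`, and `send_reliable` are hypothetical, and the loss pattern is fixed in advance so the result is repeatable.

```python
def lossy_channel(drop_pattern):
    """Return a send function that drops a packet whenever the pattern says so."""
    drops = iter(drop_pattern)

    def send(packet, received):
        # Once the pattern is exhausted, next() returns False and nothing is dropped.
        if next(drops, False):
            return False          # packet lost in transit
        received.append(packet)
        return True               # packet arrived (stands in for an acknowledgement)

    return send


def send_unreliable(packets, send):
    """UDP-like: send each packet once and never check whether it arrived."""
    received = []
    for packet in packets:
        send(packet, received)
    return received


def send_reliable(packets, send, max_tries=5):
    """TCP-like: resend each packet until it is acknowledged."""
    received = []
    for packet in packets:
        for _ in range(max_tries):
            if send(packet, received):
                break
        else:
            raise ConnectionError(f"gave up on {packet!r}")
    return received


packets = ["frame-1", "frame-2", "frame-3", "frame-4"]
drop_pattern = [False, True, False, False]  # the second transmission is lost

print(send_unreliable(packets, lossy_channel(drop_pattern)))
print(send_reliable(packets, lossy_channel(drop_pattern)))
```

With this loss pattern the unreliable sender delivers three of the four frames, while the reliable sender delivers all four at the cost of one extra transmission.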

The use of TCP on the Internet is sometimes referred to as tcp/ip or TCP/IP. This is read as TCP over IP.

IP Addresses

Each device connected to the Internet must be uniquely identifiable [2]. This is done by assigning each device on the Internet a unique number called an IP Address. There are two types of IP addresses: IPv4 and IPv6. They are not compatible [3].

IPv4 is the older and (for now) most commonly used addressing scheme. Each device on the Internet is assigned a unique identifier. This identifier is a 32-bit number, which means there are 2^32 different numbers available, so a maximum of 4,294,967,296 devices can be connected to the Internet when using IPv4 addressing [4].

The addresses are written in dotted notation. This is done by dividing the 32-bit number into four 8-bit octets [5]. Each octet is written in decimal notation, and the octets are separated by periods. IPv4 addresses look similar to these examples: 127.0.0.1, 192.168.1.101, or 10.184.216.34.
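The conversion between a 32-bit number and dotted notation can be sketched in a few lines of Python. The helper names `to_dotted` and `from_dotted` are made up for this illustration; Python's standard `ipaddress` module is used only as a cross-check.

```python
import ipaddress


def to_dotted(value: int) -> str:
    """Split a 32-bit number into four octets and join them with periods."""
    octets = [(value >> shift) & 0xFF for shift in (24, 16, 8, 0)]
    return ".".join(str(octet) for octet in octets)


def from_dotted(text: str) -> int:
    """Combine four decimal octets back into a single 32-bit number."""
    value = 0
    for part in text.split("."):
        value = (value << 8) | int(part)
    return value


print(2 ** 32)                       # size of the IPv4 address space
number = from_dotted("192.168.1.101")
print(number, to_dotted(number))

# Cross-check against the standard library's own conversion.
print(int(ipaddress.IPv4Address("192.168.1.101")) == number)
```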

IPv6 is a newer addressing scheme and was introduced to expand the number of available IP addresses [6]. IPv6 addresses are 128 bits long and allow for 2^128 unique addresses [7].

IPv6 addresses are written differently from IPv4 addresses. The 128-bit address is divided into eight 16-bit hextets [8]. These are written in hexadecimal [9] notation and separated by colons [10]. IPv6 addresses look similar to these examples: ::1 [11], or FE80:0000:0000:0000:0202:B3FF:FE1E:8329.
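As a small illustration, Python's standard `ipaddress` module can expand and compress IPv6 addresses, and the split into hextets can be done by hand. The helper `hextets` is a hypothetical name used only for this sketch.

```python
import ipaddress


def hextets(address: ipaddress.IPv6Address) -> list:
    """Split a 128-bit address into eight 16-bit groups, written in hexadecimal."""
    value = int(address)
    return [f"{(value >> shift) & 0xFFFF:04x}" for shift in range(112, -1, -16)]


loopback = ipaddress.IPv6Address("::1")
link_local = ipaddress.IPv6Address("FE80:0000:0000:0000:0202:B3FF:FE1E:8329")

print(loopback.exploded)       # the shorthand ::1 written out in full
print(link_local.compressed)   # the long form written compactly
print(hextets(link_local))
print(2 ** 128)                # size of the IPv6 address space
```

The module prints hexadecimal digits in lowercase; IPv6 addresses are not case sensitive, so FE80 and fe80 mean the same thing.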

Clients and Servers

Devices on the Internet are classified as clients or servers [12].

A client requests resources from a server. It tends to have a transitory connection to the Internet and may not always be available.

A server provides (serves) resources to clients. It tends to be permanently connected to the Internet and always available.

You can think of the relationship between clients and servers as being similar to the relationship between students and teachers. The students (clients) request information from the teacher (server) and the teacher (server) provides information to the students (clients).

If clients wish to communicate with other clients, the information is sent first to the server and then the server forwards the information to other clients.

There is a third relationship: peer-to-peer (P2P). Like clients, peers tend to be characterized by transitory connections to the Internet.

Peers are devices that connect directly with other devices instead of going through a server. P2P devices act both as server and client. Communication between peers is direct and not mediated by a server.

Domain Names

Most people rarely (if ever) access resources on the Internet by entering IP Addresses. Addresses like 192.168.1.1 might be fine for computers, but they are not easy for humans to use. A Domain Name is a string of characters (preferably easy to remember) that acts as an alias for an IP Address.

Domain names are hierarchically organized. The Internet Corporation for Assigned Names and Numbers (ICANN) manages the structure of domain names on the Internet.

Domains are composed of two parts: the hostname and the top-level domain [13].

For example: complete-concrete-concise is the hostname and .com is the top-level domain.
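The two-part split described above, and the hierarchy it belongs to, can be sketched as follows. This follows the article's simplified model; the function names `split_domain` and `hierarchy` are hypothetical, and real domain names can have more levels than this handles gracefully.

```python
def split_domain(name: str) -> dict:
    """Split a domain name into the two parts used in this article."""
    labels = name.rstrip(".").lower().split(".")
    return {
        "hostname": ".".join(labels[:-1]),
        "top_level_domain": "." + labels[-1],
    }


def hierarchy(name: str) -> list:
    """List the levels of a domain name, from the top-level domain downward."""
    labels = name.rstrip(".").lower().split(".")
    return [".".join(labels[i:]) for i in range(len(labels) - 1, -1, -1)]


print(split_domain("complete-concrete-concise.com"))
print(hierarchy("complete-concrete-concise.com"))
```

Reading a domain name from right to left walks down the hierarchy: the top-level domain comes first, then the name registered beneath it.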

The mapping from a domain name to an IP Address is performed by a Domain Name System Server (DNS Server). The domain name is sent to the DNS Server and the server returns the associated IP Address [14].

Globally, there are 13 root servers from which all domain names are resolved [15].

Summary

  1. The Internet is the global network of connected devices which adhere to the Internet Protocol.
  2. The Internet Protocol defines a datagram composed of a header and a payload.
  3. The IP header includes a source address, a destination address, and a field identifying the transport protocol.
  4. Every device connected to the Internet requires a unique identifying address.
  5. The IP payload carries the data for a specific service. This might be the World Wide Web, email, Voice over Internet, or any other service.
  6. The Internet tends to be hierarchical with a client-server model.
  7. Clients request services and resources from the Internet and tend to have a transitory presence on the Internet.
  8. Servers provide services and resources on the Internet and tend to have a permanent presence on the Internet.
  9. Domain names are a (human readable) way of accessing resources on the Internet. Domain names are sent to a global network of DNS Servers that return the IP Address associated with the domain.

Notes

  1. You can try to make it difficult for someone to follow your trail by going through multiple intermediate connections (for example, by using Tor), but, ultimately, there is a direct path between the source and destination. It is a lot like playing “connect the dots”. If there are only two dots (source and destination), the connection is easy to follow. If there are multiple intermediate dots, then the path is harder to follow – but not impossible. A lot depends on whether the various intermediate points store visitor info and how easy it is to get access to that information. A route spanning multiple countries makes it harder to get access to the information – but not impossible.
  2. This is not entirely true. There are ways multiple devices can “share” the same IP Address. This will be covered in a future tutorial.
  3. This means there are two different Internets running side-by-side. Techniques allow traffic to flow between the two, but this will be covered in a future tutorial.
  4. Fewer addresses are actually available because a number of them are reserved. This will be covered in a future tutorial.
  5. An octet is 8 bits.
  6. On January 31, 2011, the Internet Assigned Numbers Authority (IANA) allocated the last of its freely available IPv4 address blocks, and the few remaining reserved blocks were handed to the regional registries days later. This didn’t mean the world had run out of IPv4 addresses, but it did mean IANA had no more to hand out. Companies and organizations manage blocks of IP addresses that were allocated to them through the regional registries, which in turn received them from IANA. There are a number of techniques that can be used to ensure everybody has a unique IP address when connecting to the Internet. This will be covered in more detail in a future tutorial.
  7. This is approximately 3.4×10^38 addresses. Written out in full it is 340,282,366,920,938,463,463,374,607,431,768,211,456. As with IPv4, some addresses are reserved, so the actual number of available addresses is slightly less. More details will be given in a future tutorial.
  8. A hextet is 16 bits.
  9. Hexadecimal numbers are numbers written in base 16. In decimal notation, we use 10 unique digits: 0 through 9. Hexadecimal needs 16 unique digits, so we extend the decimal digits by adding the letters A through F as additional digits. In hexadecimal, the number 10 is written as 0xA, the number 15 as 0xF, the number 16 as 0x10. The prefix 0x is used to indicate the number is in hexadecimal notation.
  10. This applies when you write out the address in full. A number of rules allow writing addresses more compactly. This will be covered in more detail in a future tutorial.
  11. This is permitted shorthand for 0000:0000:0000:0000:0000:0000:0000:0001.
  12. Actually, there are many more devices: routers, gateways, switches. Some of these may be covered in more detail in a future tutorial.
  13. It is a little more complicated and will be covered in more detail in a future tutorial. It is also closely related to the Uniform Resource Locator (URL), which will also be covered in a future tutorial.
  14. It can be more complicated than this and will be covered in more detail in a future tutorial.
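  15. The 13 root servers are really 13 named server identities, each backed by many machines, forming a global network of hundreds of computers. They sit at the top of the domain name hierarchy and direct lookups toward the servers responsible for each top-level domain. The 13 identities are operated by 12 independent organizations, each responsible for managing its part.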