Understanding How the World Wide Web (WWW) Works

The Internet and the World Wide Web are not the same thing, even though people often use the terms interchangeably. The Web is a collection of webpages, videos, pictures, and applications that are all connected together, while the Internet is a global network of devices that can send and receive data.

A Little Background

In the 1980s, Sir Tim Berners-Lee was working at CERN and found it difficult to access documents and information stored on different computers:

I found it frustrating that in those days, there was different information on different computers, but you had to log on to different computers to get at it. Also, sometimes you had to learn a different program on each computer. So finding out how things worked was really difficult.¹

In 1989, he proposed a hypertext² based system for accessing, sharing, and linking documents. At that time, the standard practice for document organization was a centralized repository. Sir Tim Berners-Lee proposed to do away with that:

[T]he hope would be to allow a pool of information to develop which could grow and evolve with the organisation and the projects it describes. For this to be possible, the method of storage must not place its own restraints on the information. This is why a “web” of notes with links (like references) between them is far more useful than a fixed hierarchical system.³

In 1990, he wrote the first web browser and web server and developed three fundamental technologies that form the foundation of the World Wide Web:

Hypertext Markup Language (HTML) – this is used to create webpages.
Uniform Resource Locator (URL) – a string of characters used to identify a resource and its location.⁴
Hypertext Transfer Protocol (HTTP) – the protocol by which HTML resources are accessed on the Internet.

While Sir Tim Berners-Lee’s primary concern was a web of interconnected hypertext documents, he realized that many other resources could be interlinked. His implementation was so successful that the World Wide Web now includes:

Documents – like this webpage
Images – Flickr allows people to share and access photos
Video – YouTube allows people to share and access videos
Audio – Spotify allows people to listen to music
Applications – Google Docs is an online word processor

The first website to go live was at CERN in 1991.
By the end of 1991, there were a total of three websites worldwide:

CERN
The World Wide Web Virtual Library (also at CERN)
Stanford Linear Accelerator Center – the first North American website.

By the end of 1992, there were a total of 10 websites worldwide.
As of early 2023, there are almost 2 billion websites.

The World Wide Web

The Internet is the global network of interconnected devices transferring datagrams⁵ using the Internet Protocol (IP). Most datagrams are transported using the Transmission Control Protocol (TCP).⁶
The World Wide Web (WWW) is a service that runs on the Internet. It is the collection of documents⁷ written in Hypertext Markup Language (HTML)⁸, identified by a Uniform Resource Locator (URL), and transferred using the Hypertext Transfer Protocol (HTTP). All of this is done on the Internet using TCP and IP.
IP identifies devices on the Internet, TCP transports datagrams between them, and HTTP⁹ is used to access content. TCP is like a delivery company that transports goods to the correct address, while HTTP packages resources so that they can be transported by TCP from one device to another.

HTTP is not the only application layer protocol in use on the Internet, but it is the most visible. Most people don’t notice (or even know about) other application layer protocols such as: FTP,¹⁰ SSH,¹¹ SMTP,¹² POP3,¹³ etc.
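To make the layering concrete, here is a minimal Python sketch in which an HTTP request is written as plain text and handed to a TCP connection. Everything in it is an assumption made for illustration: the HelloHandler class, the fetch_over_tcp helper, and the throwaway server on the loopback address 127.0.0.1 are not part of this article, and a real browser does considerably more.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answers every GET with a short text body."""

    def do_GET(self):
        body = b"hello from the web layer"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_over_tcp(host, port, path):
    """Open a TCP connection and send an HTTP request as plain text."""
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}:{port}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    with socket.create_connection((host, port)) as conn:  # TCP: the transport
        conn.sendall(request.encode("ascii"))             # HTTP: the message
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


# Port 0 asks the operating system for any free port.
server = HTTPServer(("127.0.0.1", 0), HelloHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

raw_response = fetch_over_tcp("127.0.0.1", server.server_port, "/")
server.shutdown()
server.server_close()

# The first line of the response is the HTTP status line.
status_line = raw_response.split(b"\r\n", 1)[0].decode("ascii")
print(status_line)
```

The socket only moves bytes from one address to another; it is the text of the request and the response that makes the exchange HTTP.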

Hypertext Markup Language (HTML)

HTML is used to write webpages and web applications. It is often combined with Cascading Style Sheets (CSS) and JavaScript (JS).¹⁴
HTML documents are sent by web servers and rendered by web clients (web browsers) to display the contents of a webpage to a user.
Here’s an example of a simple webpage:

<!DOCTYPE html>
<html>
  <head>
    <title>A Simple Webpage</title>
  </head>
  <body>
    <h1>This is important!</h1>
    <p>Check out this site:</p>
    <a href="//example.com">Example</a>
  </body>
</html>

All HTML documents are composed of HTML tags, which (usually) come in pairs: an opening tag, like <html>, and its corresponding closing tag, </html>. These tags mark up the document.
HTML tags perform three different functions:

Structural Markup tags are used to indicate the structure and purpose of the text in the document. We can see that this page is divided into two parts: a <head> that contains metainformation¹⁵ about the page and a <body> that contains the content displayed to the user. In the body, we see there is a heading (<h1>) and a paragraph (<p>).
Presentational Markup is used to indicate how text should be displayed to the user – for example, bold, italic, strikethrough, etc. There is no presentational markup in this simple webpage. CSS is the recommended way to control presentation.
Hypertext Markup is used to create links inside the document to other documents or resources. In this page, there is a single hypertext link – the anchor tag <a>.

In general, HTML documents are plain text documents with special annotations (HTML tags) to describe the structure of the document. Web clients (web browsers) display the content to the user.
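Because an HTML document is just annotated plain text, a program can read the annotations directly. The following sketch uses Python's standard html.parser module to list the opening tags and the link target in the simple webpage shown earlier. The TagCollector class is a hypothetical name chosen for this example, and the approach is a simplification of what a browser does when it renders a page.

```python
from html.parser import HTMLParser

# The simple webpage from the example above.
PAGE = """<!DOCTYPE html>
<html>
  <head>
    <title>A Simple Webpage</title>
  </head>
  <body>
    <h1>This is important!</h1>
    <p>Check out this site:</p>
    <a href="//example.com">Example</a>
  </body>
</html>
"""


class TagCollector(HTMLParser):
    """Hypothetical helper: records opening tags and hypertext links."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)
        if tag == "a":  # hypertext markup: collect the link target
            self.links.extend(value for name, value in attrs if name == "href")


collector = TagCollector()
collector.feed(PAGE)

print(collector.open_tags)  # ['html', 'head', 'title', 'body', 'h1', 'p', 'a']
print(collector.links)      # ['//example.com']
```

The structural tags (html, head, title, body, h1, p) describe the shape of the document, while the single a tag carries the hypertext link.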

Uniform Resource Locator (URL)

A Uniform Resource Locator (URL) is used to access resources on the WWW. It has the following format:

<access protocol>://<host>/<location & resource name>

access protocol specifies how the resource is to be accessed. For the WWW it is either HTTP or HTTPS¹⁶. There are many different access protocols for the Internet.¹⁷ Technically, you should always include the access protocol when you type the URL for a website. However, browsers (being helpful) will automatically prefix the protocol for you.
host specifies which device on the Internet holds the resource. It can be an IP address (like 127.0.0.1) or, more commonly, a human-readable name (like www.complete-concrete-concise.com), which the Domain Name System (DNS) translates into an IP address.
location & resource name specifies the name of the resource and where it is located on the host. If no resource name is given, web servers typically return a default document, commonly index.html.

Let’s consider the following URL:

https://complete-concrete-concise.com/sample/helloworld.html

The access protocol is https.
The host is complete-concrete-concise.com.
The location & resource name is /sample/helloworld.html.
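The same breakdown can be reproduced with the urlsplit function from Python's standard urllib.parse module. The with_default_scheme helper is a hypothetical stand-in for the way browsers prefix a missing access protocol; the choice of https as the default is an assumption of this sketch, not something a particular browser is guaranteed to do.

```python
from urllib.parse import urlsplit

url = "https://complete-concrete-concise.com/sample/helloworld.html"
parts = urlsplit(url)

print(parts.scheme)    # https
print(parts.hostname)  # complete-concrete-concise.com
print(parts.path)      # /sample/helloworld.html


def with_default_scheme(text, scheme="https"):
    """Hypothetical helper: prefix an access protocol if none was typed."""
    if "://" in text:
        return text
    return f"{scheme}://{text}"


typed = with_default_scheme("complete-concrete-concise.com/sample/helloworld.html")
print(typed == url)    # True
```

Python calls the access protocol the scheme, which is also the term used in the URL standards.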

Hypertext Transfer Protocol (HTTP)

HTTP is a request-response protocol for data transfer on the World Wide Web. It is most commonly used for transferring hypertext documents, like HTML, but can be used to transfer other types of content.

It operates on a client-server model. The client sends a request to a server. The server then responds to the client. For example, a web browser (client) requests a webpage from a website (server); the website (server) responds by sending the webpage to the web browser (client).¹⁸

It is a stateless protocol.¹⁹ This means that each HTTP request is independent of all other HTTP requests. In other words, the current request knows nothing about previous requests: all information to fulfill the request must be contained in the request itself.
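One way to picture statelessness is as a function whose reply depends only on the request it is given. The sketch below is a toy model, not a real server: the handle function and the dictionary layout of a request are assumptions made for this illustration.

```python
def handle(request):
    """Hypothetical stateless handler: the reply depends only on `request`."""
    headers = request.get("headers", {})
    language = headers.get("Accept-Language", "en")  # default when not supplied
    return {"status": 200, "body": f"[{language}] content of {request['path']}"}


# The first request states a language preference in its headers.
first = handle({"method": "GET", "path": "/about",
                "headers": {"Accept-Language": "fr"}})

# The second request omits it, and nothing is remembered from the first.
second = handle({"method": "GET", "path": "/about"})

print(first["body"])   # [fr] content of /about
print(second["body"])  # [en] content of /about
```

The second reply falls back to the default because the handler keeps no memory of earlier requests; anything it needs has to travel with each request.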

Two common HTTP requests are GET, used to request a resource from a server, and POST, used to send data to a server, such as when submitting a comment or form.

HTTP responses indicate the status of a request, with two common responses being 200 (OK) for a successful request and 404 (NOT FOUND) for when the requested resource cannot be found.
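The requests and responses described above can be tried out locally with Python's standard http.server and http.client modules. This is a sketch under stated assumptions: the DemoHandler class, the send helper, the PAGES and COMMENTS containers, and the paths /hello.html, /nope.html, and /comments are all invented for the example, and the server runs only on the loopback address.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGES = {"/hello.html": b"<p>Hello</p>"}  # hypothetical resources
COMMENTS = []                             # data received via POST


class DemoHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers GET and POST requests."""

    def _reply(self, status, body):
        self.send_response(status)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        if self.path in PAGES:
            self._reply(200, PAGES[self.path])     # resource found
        else:
            self._reply(404, b"<p>Not found</p>")  # resource missing

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        COMMENTS.append(self.rfile.read(length).decode("utf-8"))
        self._reply(200, b"<p>Thanks</p>")

    def log_message(self, format, *args):
        pass  # keep the example quiet


def send(port, method, path, body=None):
    """Send one request and return (status, reason, body)."""
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    conn.request(method, path, body=body)
    response = conn.getresponse()
    result = (response.status, response.reason, response.read())
    conn.close()
    return result


server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

found = send(server.server_port, "GET", "/hello.html")
missing = send(server.server_port, "GET", "/nope.html")
posted = send(server.server_port, "POST", "/comments", body="Nice article!")

server.shutdown()
server.server_close()

print(found[:2])    # (200, 'OK')
print(missing[:2])  # (404, 'Not Found')
print(COMMENTS)     # ['Nice article!']
```

Each call to send opens a fresh connection, so the three exchanges are independent of one another, in keeping with the stateless nature of HTTP.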

Summary

The World Wide Web is one of many services that run on top of the Internet.
The World Wide Web is the global collection of resources written using Hypertext Markup Language (HTML),²⁰ identified using a Uniform Resource Locator (URL), and transferred using the Hypertext Transfer Protocol (HTTP).
Clients request resources from servers using HTTP.
Servers respond to clients using HTTP.
HTTP is transported using TCP between devices adhering to the Internet Protocol.
The flexibility of HTTP has contributed to the World Wide Web becoming extremely popular because it handles all types of content: from documents to videos, from images to applications.
