How does the Internet work and what is it anyways?

02 Dec 2020 - tsp
Last update 03 Dec 2020
Reading time 29 mins

TL;DR: The Internet is an interconnection of independent networks and machines that use a really simple rule set to route traffic between them in a failure-resistant way, without any guarantee of reliability or trustworthiness.

Since I know many people who have no idea how the Internet really works or what it is, but who often talk about how it should be regulated, I decided to write this hopefully short introduction to some basic networking theory. Please don’t be frightened when I write about Ethernet, basic IP networks and subnetting at first - it’s required to discuss the concept of the Internet later on.

Local networks - Ethernet (OSI layer 1 and 2)

First let’s take a look at the local network layer. There is a variety of technologies available but I’ll focus on the one most commonly seen in private and consumer areas. Ethernet is a rather old technology that dates back to 1973. The basic technology still works the same as back then with some minor changes.

First, Ethernet provides a physical layer standard for networking - usually seen over twisted pair cables today (what’s commonly recognized as network connectors), but in fact the 802.3 specification supports a huge number of technologies. Coaxial cables were used back in the 80s, fiber optic cables are used for long distance connections, there are short multi-lane copper cables (CX4) used to interconnect switches in datacenters over a few meters, there are backplane specifications (KR) and other physical carrier specifications.

Since electrical transfer is only half of the story, Ethernet also provides a link layer protocol. Every Ethernet device has a 48 bit device address, also called the media access control (MAC) address, that is supposed to be unique worldwide. This address is assigned by the manufacturer of the network equipment from a namespace that has been assigned to that manufacturer. It should usually not be changed, and there are many applications that require MAC addresses to be unique worldwide. At the very least they have to be unique on the local network segment to allow Ethernet to function correctly. Note that a MAC address is easy to spoof, so it’s not usable for anything security related.

What’s the MAC address used for? Data is transmitted between network equipment in the form of packets. The Ethernet frame specifies the layout of a packet transmitted between Ethernet capable equipment. The ethernet frame consists of:

A preamble (7 bytes long) that provides an alternating bit pattern on the transmission medium. This is usually used to perform clock recovery with a phase-locked loop on the receiver - i.e. it’s required for the receiver to know the speed and timing at which bits are transmitted. The exact encoding of bits is of course dependent on the physical layer technology. Together with the minimum packet length it also plays a role in collision detection.
A single byte start of frame delimiter (SFD) that terminates the preamble. This is required since the receiver cannot count the bits received during the preamble until the clocks have been synchronized, so there has to be a way to detect the end of the preamble even if it has been partially missed. From this moment on the receiver has to keep track of its clock for the remaining packet duration. Because of the way the packets are encoded on the wire, the receiver can coarsely correct for clock skew, which is sufficient for the whole maximum frame duration.
A destination address - this is the MAC address of the device that should receive the packet.
A source address. This is the MAC address of the transmitter.
An optional 802.1Q tag carrying the VLAN and the quality of service class, followed by the payload type (EtherType)
46 to 1500 bytes of payload data. The minimum data length is required for proper collision detection and is directly correlated to the maximum cable length for twisted pair and coaxial networks.
A frame check sequence (FCS) that allows the receiver to calculate a basic checksum and detect transmission errors.
At least 12 octets inter-packet gap - i.e. a pause after each packet.
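The address and type fields listed above can be pulled out of a captured frame with a few lines of Python. This is only a minimal sketch: it assumes the frame starts at the destination address (preamble and SFD are normally stripped by the network hardware), and the function names `format_mac` and `parse_ethernet_header` as well as the example addresses are made up for illustration.

```python
import struct

def format_mac(raw: bytes) -> str:
    """Render 6 raw bytes in the usual colon separated hex notation."""
    return ":".join(f"{b:02x}" for b in raw)

def parse_ethernet_header(frame: bytes) -> dict:
    """Split a frame into destination, source, optional 802.1Q tag and payload.

    Assumption: `frame` begins with the destination address; preamble,
    SFD and FCS are not part of the buffer.
    """
    if len(frame) < 14:
        raise ValueError("frame too short for an Ethernet header")
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    offset = 14
    vlan = None
    priority = None
    if ethertype == 0x8100:  # an 802.1Q tag sits in front of the real type
        if len(frame) < 18:
            raise ValueError("frame too short for an 802.1Q tag")
        tci, ethertype = struct.unpack("!HH", frame[14:18])
        priority = tci >> 13   # upper 3 bits: priority class
        vlan = tci & 0x0FFF    # lower 12 bits: VLAN identifier
        offset = 18
    return {
        "destination": format_mac(dst),
        "source": format_mac(src),
        "ethertype": ethertype,
        "vlan": vlan,
        "priority": priority,
        "payload": frame[offset:],
    }

# Invented example: a broadcast frame carrying 46 bytes of (empty) IPv4 payload
example = bytes.fromhex("ffffffffffff" "020000000001" "0800") + bytes(46)
header = parse_ethernet_header(example)
print(header["destination"], header["source"], hex(header["ethertype"]))
```

The EtherType value 0x0800 used in the example is the one assigned to IPv4.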

A network might consist of an arbitrary number of devices and might either be switched (usually it’s today) or connected via a hub (done in the early days).

When using a hub, the electrical signals from one transmitter are simply forwarded to all receivers on the same network. This always happens - even if other devices are transmitting at the same time. Collisions are handled with a scheme called CSMA/CD - carrier sense multiple access / collision detection. This basically means that a station that wishes to transmit does the following:

Carrier sense: the station senses whether another station is currently transmitting and holds back its own transmission until it no longer senses another carrier. Of course it’s still possible that two stations decide to start transmitting at the same time - and then there is the propagation delay through the network, so packets take some time to reach other stations. Just because a station doesn’t sense a carrier doesn’t imply that no station is transmitting at this moment.
Multiple access simply means that all stations are equal - there is no reserved time slice, no token passed and no master/slave concept.
Collision detection is the last essential part. During transmission the transmitter still listens on the network for incoming data. In case it receives data it starts to transmit a jam signal that disturbs the data packets currently in flight, so any previously transmitted bits are not wrongly recognized as part of a valid packet. This is also the reason for cable length limits and the minimum packet size - the duration of a single packet is long enough for it to reach the other end of the network segment and get disturbed by another station before the packet has been transmitted completely. This ensures that all collisions are detected and the packets are retransmitted later on. Before any retransmission after a detected collision the network devices wait for a random back-off interval whose range grows exponentially with each further collision, which reduces the probability of a follow-up collision.
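The random back-off can be sketched as follows. The constants reflect the classic truncated binary exponential back-off of 10/100 Mbit/s Ethernet as far as I know it (a slot of 512 bit times, the range stops growing after 10 collisions, the frame is dropped after 16 attempts); treat them, and the function name `backoff_slots`, as assumptions of this sketch rather than a quote from the standard.

```python
import random

SLOT_TIME_BITS = 512   # assumed slot time in bit times (10/100 Mbit/s Ethernet)
BACKOFF_LIMIT = 10     # the random range stops doubling after this many collisions
MAX_ATTEMPTS = 16      # after this many collisions the frame is given up

def backoff_slots(collisions: int, rng: random.Random) -> int:
    """Number of slot times to wait after the n-th collision of one frame."""
    if collisions < 1:
        raise ValueError("back-off only applies after at least one collision")
    if collisions >= MAX_ATTEMPTS:
        raise RuntimeError("excessive collisions, frame is dropped")
    exponent = min(collisions, BACKOFF_LIMIT)
    # pick uniformly from 0 .. 2^exponent - 1
    return rng.randrange(2 ** exponent)

rng = random.Random()
for n in (1, 2, 3, 10, 15):
    slots = backoff_slots(n, rng)
    print(f"collision {n}: wait {slots} slots = {slots * SLOT_TIME_BITS} bit times")
```

Because both colliding stations draw their waiting time independently, the chance that they pick the same slot again halves with every further collision until the limit is reached.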

Note that this behavior is the reason why one should consider a network that’s loaded up to 80% of its capacity outside of peak periods to be under-dimensioned.

All devices listen to all incoming traffic. Whenever they see a packet with their own MAC address or a broadcast address as the destination address, they accept the packet. In any other case - i.e. a non-matching address - they silently discard the packet directly on the network processor, except when the device has deliberately been put into promiscuous mode. In this case the software layer receives all packets that have been detected on the Ethernet layer. This used to be a nice method to listen to all traffic on non-switched Ethernet segments. It’s not sufficient to listen to all traffic of other nodes on modern networks though; one would have to perform some ARP cache poisoning to do this (more on that later).

Since with a hub every packet has to be received by every station, and the whole network segment can only be used by a single device at a time, switching has been introduced. A switch initially works exactly like a hub - all incoming packets are transmitted on all ports. But for each incoming packet the switch learns the source address and remembers on which port it has been seen. In case a packet arrives whose destination address matches a previously seen source address, the packet is only transmitted to the previously learned port. Table entries are removed when they get replaced by more recent data (i.e. a device has changed its physical switch port) or after a given timeout. A special class of packets - so called broadcast and multicast packets - is transmitted to all ports and received by all nodes anyway in traditionally switched networks. Some more advanced networks only do this for broadcast packets and perform a process called IGMP snooping to forward multicast packets only to ports that have subscribers for the given multicast group.
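The learning behavior described above fits into a small class. This is a sketch under assumptions: the class name `LearningSwitch`, the 300 second aging time and the string form of the addresses are chosen for illustration, and flooding here skips the port the packet arrived on.

```python
import time

BROADCAST = "ff:ff:ff:ff:ff:ff"

class LearningSwitch:
    """Toy model of a switch that learns source addresses per port."""

    def __init__(self, ports, max_age=300.0):
        self.ports = list(ports)
        self.max_age = max_age   # assumed aging time in seconds
        self.table = {}          # MAC address -> (port, time last seen)

    def handle_frame(self, in_port, src, dst, now=None):
        """Return the list of ports the frame is sent out on."""
        now = time.monotonic() if now is None else now
        # Learn (or refresh) where the sender lives
        self.table[src] = (in_port, now)
        entry = self.table.get(dst)
        if dst != BROADCAST and entry is not None:
            port, seen = entry
            if now - seen <= self.max_age:
                # Known destination: forward to exactly one port
                return [] if port == in_port else [port]
            del self.table[dst]  # entry timed out
        # Unknown destination or broadcast: behave like a hub
        return [p for p in self.ports if p != in_port]

switch = LearningSwitch(ports=[1, 2, 3])
print(switch.handle_frame(1, "02:00:00:00:00:0a", "02:00:00:00:00:0b", now=0.0))
print(switch.handle_frame(2, "02:00:00:00:00:0b", "02:00:00:00:00:0a", now=1.0))
print(switch.handle_frame(1, "02:00:00:00:00:0a", "02:00:00:00:00:0b", now=2.0))
```

The first frame is flooded because the destination is unknown; after the reply has been seen, traffic between the two stations only touches their own ports.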

Of course switching also has a drawback - if one forms switching loops inside a network by connecting switches with multiple paths (either indirectly or forming a ring topology) packets might be passed around infinitely and cause network overload. Of course building a ring topology or a fully meshed interconnect might be interesting for fault tolerance so modern managed switches usually employ a protocol called spanning tree (STP) that detects loops and disables ports as long as multiple paths exist.

Note that this switching does not provide any security, since it’s easy to poison the switches’ address tables to redirect traffic to different ports, or to flood them so they start broadcasting packets again. On the other hand, switching using crossbar switches allows better network utilization and reduces the load on individual network devices. Depending on the switch it may even allow devices with different speeds to be attached to the same switch without limiting the whole network segment to the slower speed.

Note that the policy in case of network congestion (i.e. two devices try to transmit to the same destination port) is to silently drop one of the packets. Therefore higher protocol layers like TCP will have to detect such situations.

As one can see, this scheme works perfectly well for a single network segment that’s small enough to employ CSMA/CD and that is also capable of broadcasting a single packet onto all ports as long as routes are not known. But it’s not sufficient for larger networks, multiple segments or the Internet. Also, all devices on the same network segment are contained in the same broadcast domain, so the number of devices is usually also limited by broadcast traffic.

IP based networks

To counter these problems the Internet Protocol (IP) has been designed. It can be used on top of any other networking protocol like Ethernet or even serial protocols (serial line IP - SLIP - for example). It’s also often tunneled in SONET/SDH ATM frames, generic routing encapsulation packets or VPN tunnels on top of other IP networks.

The basic idea is similar to Ethernet. Each packet that should be transmitted contains a source IP and a target IP address. The network itself is divided into subnets that are formed by applying subnet masks to addresses. One can imagine a subnet being simply all addresses that share a given number of high order bits of their IP addresses called a prefix. In fact subnets are usually encoded using an IP address together with a prefix length.

IP Addresses and groups

In the following section the format of IP addresses as well as some special subnets are listed. Don’t be afraid if you don’t totally understand the meaning of subnets and prefixes until now - more on that will be written in the next section on routing.

IPv4 (32 bit) addresses are usually written as a sequence of 4 decimal numbers separated by dots (for example 127.0.0.1). One is tempted to read this as a decimal number, but only the binary representation is relevant. One can of course select a prefix length arbitrarily, but usually one is limited by a given subnet that one can use for a specific application.
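To make the point about the binary representation concrete, here is a small sketch that converts the dotted notation into the 32 bit number it stands for. The helper name `dotted_to_int` is made up; the standard library `ipaddress` module does the same job and is used as a cross-check.

```python
import ipaddress

def dotted_to_int(text: str) -> int:
    """Convert dotted decimal notation into the underlying 32 bit integer."""
    octets = [int(part) for part in text.split(".")]
    if len(octets) != 4 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError(f"not a valid IPv4 address: {text!r}")
    value = 0
    for octet in octets:
        value = (value << 8) | octet  # each number fills the next 8 bits
    return value

for text in ("127.0.0.1", "192.168.1.16"):
    value = dotted_to_int(text)
    # The standard library agrees with the manual conversion
    assert value == int(ipaddress.IPv4Address(text))
    print(f"{text:>15} = {value:032b}")
```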

There is a number of networks that can be used for private networks that are never routed on any public network:

IP Address	Prefix length	Usage
10.0.0.0/8	8 bits	Private class A network that consists of $2^{24}$ addresses
172.16.0.0/12	12 bits	16 class B networks that consist in total of $2^{20}$ addresses
192.168.0.0/16	16 bits	256 class C networks that consist in total of $2^{16}$ addresses
100.64.0.0/10	10 bits	Range assigned specially for carrier grade NAT as private network that does not collide with the earlier mentioned networks. Should not be used in private home networks
169.254.0.0/16	16 bits	A single class B network that should never be routed - not even in private networks. Used as link local addresses
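Whether an address falls into one of the ranges of the table above is a simple prefix check. The following sketch uses Python’s `ipaddress` module; the dictionary `PRIVATE_RANGES` and the function `classify` are illustrative names, and the usage labels are shortened versions of the table entries.

```python
import ipaddress

# Prefixes taken from the table above, labels shortened
PRIVATE_RANGES = {
    "10.0.0.0/8": "private",
    "172.16.0.0/12": "private",
    "192.168.0.0/16": "private",
    "100.64.0.0/10": "carrier grade NAT",
    "169.254.0.0/16": "link local",
}

def classify(address: str) -> str:
    """Return the usage of the first table entry that contains the address."""
    addr = ipaddress.ip_address(address)
    for prefix, usage in PRIVATE_RANGES.items():
        if addr in ipaddress.ip_network(prefix):
            return usage
    return "not in the table"

for candidate in ("10.20.30.40", "172.31.255.255", "172.32.0.1", "100.64.0.1"):
    print(candidate, "->", classify(candidate))

# The sizes follow directly from the prefix length: 2^(32 - prefix length)
for prefix in PRIVATE_RANGES:
    print(prefix, ipaddress.ip_network(prefix).num_addresses)
```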

There are other networks that are reserved for special purposes:

IP Address	Prefix length	Usage
0.0.0.0/8	8 bits	Current network (only as source). Is used for some broadcasts
127.0.0.0/8	8 bits	Used for loopback addresses to the host itself - and sometimes also to local virtual machines and containers. Never leaves the local system
192.88.99.0/24	24 bits	Reserved, previously used for IPv6 to IPv4 relays
198.18.0.0/15	15 bits	Used for benchmarking and testing inside local networks across subnet boundaries. Sometimes seen at exchange points
224.0.0.0/4	4 bits	Prefix used for all multicast groups
240.0.0.0/4	4 bits	Reserved
255.255.255.255/32	32 bits	Limited broadcast

IPv6 addresses are written in hexadecimal notation since they are 128 bits long. Each 16 bit group is separated by a colon (:). One can omit a single series of zeroes by a double colon - which is sometimes done to separate an assigned prefix and a static IP address consisting largely of zeros - or for the loopback address ::1 that consists of 127 zeros and a single one.
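The shortened and the full notation can be converted into each other with Python’s `ipaddress` module. The address under 2001:db8:: in this sketch is an arbitrary example.

```python
import ipaddress

loopback = ipaddress.IPv6Address("::1")
# Full form: eight groups of 16 bits each
print(loopback.exploded)
# As a number the loopback address is simply 1 (127 zero bits and a single one)
print(int(loopback))

# An arbitrary example address written out in full ...
full = ipaddress.IPv6Address("2001:0db8:0000:0000:0000:0000:0000:0001")
# ... and with leading zeros dropped and one run of zero groups replaced by ::
print(full.compressed)
```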

Again there is a group of special address ranges:

Range	Prefix	Usage
::/0	0 bits	Default route (only used symbolically)
::/128	128 bits	Unspecified invalid address, used only in software locally
::1/128	128 bits	Loopback
::ffff:0:0/96	96 bits	Space to map IPv4 addresses for easy transport of IPv4 over IPv6 (lower 32 bits are set to the IPv4 address)
::ffff:0:0:0/96	96 bits	IPv4 translated addresses. Another transition mechanism not as easy as the previous one
64:ff9b::/96	96 bits	Internet global IPv4 to IPv6 translation mechanism
100::/64	64 bits	Discard packets prefix
2001::/32	32 bits	Teredo tunnel solutions - allows IPv6 access over IPv4 networks without any tunnel broker
2001:20::/28	28 bits	Overlay Routable Cryptographic Hash Identifiers (ORCHID)
2002::/16	16 bits	Older 6to4 translation mechanism
fc00::/7	7 bits	Unique local address (ULA) - all site local networks reside under this subnet
fe80::/10	10 bits	Link local addresses
ff00::/8	8 bits	Multicast group prefix

Routing

So what does this whole subnet process mean? IP networks are segmented into different groups of hosts and networks called subnets. Nearly every network can be divided into further networks that share a common prefix. For example one can take the private network 10.0.0.0/8 and choose to split it into 256 other networks ranging from 10.0.0.0/16, 10.1.0.0/16 up to 10.255.0.0/16. The same can be done with IPv6 networks - but for technical reasons subnets there usually should not have a prefix longer than 64 bits.
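The split of 10.0.0.0/8 into /16 networks can be reproduced with Python’s `ipaddress` module. This is just a quick illustration of the arithmetic: 8 additional prefix bits give $2^8 = 256$ subnets.

```python
import ipaddress

network = ipaddress.ip_network("10.0.0.0/8")
# Extending the prefix from 8 to 16 bits yields 2^(16 - 8) subnets
subnets = list(network.subnets(new_prefix=16))

print(len(subnets))            # number of /16 networks inside the /8
print(subnets[0], subnets[1])  # the first two subnets
print(subnets[-1])             # the last subnet
```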

Traffic inside each subnet is transferred via the previously presented switching process. Packets broadcasted or multicasted (without having switches that perform IGMP snooping) are transmitted to all members of the networks - they form a common broadcast domain. Additionally limitations on the network size apply.

Different subnets are connected by components called routers. How do routers decide how to route a packet? They employ a routing table. A routing table basically consists of a list of prefixes, prefix lengths and target ports (in practice there is some additional information like link metrics / costs, etc., but this is not required for basic understanding and system configuration). Whenever a packet arrives at a router and it has to take a decision, it logically ANDs the binary representation of the destination address with the subnet mask given by the prefix length (i.e. sets all bits not belonging to the prefix to zero) and compares the result with each known prefix. In case one prefix matches, it selects the port recorded for that prefix; in case multiple prefixes match, routers usually employ a longest prefix rule that picks the entry with the longest matching prefix to decide which port to forward to.

Let’s take a look at an (IPv4) example. Again it works the same for IPv6. Let’s say the routing table consists of:

Subnet	Binary notation (Subnet)	Binary notation (mask)	Port
192.168.1.0/24	11000000101010000000000100000000	11111111111111111111111100000000	eth0
192.168.2.0/24	11000000101010000000001000000000	11111111111111111111111100000000	eth1
192.168.3.0/24	11000000101010000000001100000000	11111111111111111111111100000000	eth2
192.168.0.0/16	11000000101010000000000000000000	11111111111111110000000000000000	eth0
10.1.0.0/16	00001010000000010000000000000000	11111111111111110000000000000000	eth3
0.0.0.0/0	00000000000000000000000000000000	00000000000000000000000000000000	eth3

So if a packet now arrives with the destination address 192.168.1.16, the router would logically AND the address with the mask of each entry and compare the result with that entry’s subnet:

Entry	Operation	Result	Match	Match length
1	`192.168.1.16` and `/24`	`192.168.1.0`	Yes	24
2	`192.168.1.16` and `/24`	`192.168.1.0`	No
3	`192.168.1.16` and `/24`	`192.168.1.0`	No
4	`192.168.1.16` and `/16`	`192.168.0.0`	Yes	16
5	`192.168.1.16` and `/16`	`192.168.0.0`	No
6	`192.168.1.16` and `/0`	`0.0.0.0`	Yes	0

As one can see, three of the known routes match the packet. The router then chooses the longest prefix and thus selects the first entry - choosing eth0 as the target interface to forward the packet to. If no match had been found, the router would send an ICMP destination unreachable message back to the sender’s address to indicate that no route to the selected target is known.
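The lookup from the example can be written down in a few lines. The routing table below is the one from the table above; the function name `lookup` is made up, and a real router would use a much faster data structure than a linear scan, so this is only a sketch of the rule itself.

```python
import ipaddress

# (prefix, outgoing port) pairs as in the example routing table
ROUTES = [
    ("192.168.1.0/24", "eth0"),
    ("192.168.2.0/24", "eth1"),
    ("192.168.3.0/24", "eth2"),
    ("192.168.0.0/16", "eth0"),
    ("10.1.0.0/16", "eth3"),
    ("0.0.0.0/0", "eth3"),
]

def lookup(destination: str, routes=ROUTES):
    """Return the port of the longest matching prefix, or None without a match."""
    dest = int(ipaddress.IPv4Address(destination))
    best = None  # (prefix length, port) of the best match so far
    for prefix, port in routes:
        net = ipaddress.IPv4Network(prefix)
        mask = int(net.netmask)
        # AND with the mask clears all bits that are not part of the prefix
        if dest & mask == int(net.network_address):
            if best is None or net.prefixlen > best[0]:
                best = (net.prefixlen, port)
    return None if best is None else best[1]

print(lookup("192.168.1.16"))  # matches /24, /16 and /0; the /24 wins
print(lookup("192.168.77.1"))  # only the /16 and the default route match
print(lookup("8.8.8.8"))       # only the default route matches
```

Without the last entry (the 0.0.0.0/0 route) the third lookup would find no match at all, which is the case in which the router reports the destination as unreachable.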

If one looks at the last entry in the routing table above, one notices the 0.0.0.0/0 entry, which at first glance doesn’t make sense since it always matches. This is called a default route and is usually not found on large routers on the Internet but only on smaller routers in private and corporate networks. The idea is to forward packets for which no route is known into the public Internet. This way not every local router has to know a route to every other network worldwide. This is totally different for the routers found in the core of the Internet: they know via which interface they reach every other publicly announced network worldwide.

Routing table configuration (the Internet)

How is this routing information configured? One can imagine that for smaller networks this can be done manually - which is in fact what’s done for example when manually assigning IP addresses to interfaces.

If one assigns a static address to an interface, one also assigns a subnet mask together with the address. As soon as this happens the system adds a routing entry for the given interface to its local routing table. For many small scale site local networks this is enough - sometimes one only has to add a default route. For dynamically configured local networks the same thing happens - the devices get an address assigned by DHCP (IPv4) or SLAAC (IPv6) together with a subnet mask or prefix length as well as a default route, and calculate their routing tables from this information.

When using DHCP, a system starts by broadcasting a DHCP discover message onto the local network, asking for an address assignment from a service called the DHCP server. The DHCP server(s) see this request, select an IP address for the device - sometimes this is done dynamically, sometimes it’s assigned statically based on the MAC address or the physical location of a device - and transmit the information back. This requires a stateful DHCP server that keeps track of assigned IP addresses, though. This has changed with IPv6, where address assignment is usually done via stateless address autoconfiguration (SLAAC): routers announce the prefixes they’re authoritative for onto the network (ICMPv6 router advertisement). Every device seeing such an advertisement can take the announced prefix, derive a 64 bit local part (traditionally from its MAC address) and attach it to the prefix - and thus has an address assigned automatically. One can also distribute additional configuration like a default route and DNS servers using the same mechanism, or use DHCPv6 to provide additional configuration information.
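How a device can build its own address from an announced prefix and its MAC address is sketched below. It follows the classic modified EUI-64 scheme (insert ff:fe in the middle of the MAC address and flip the universal/local bit); many current systems use randomized local parts instead, so this is only one possible way. The function name `slaac_address`, the prefix 2001:db8::/64 and the MAC address are made up for the example.

```python
import ipaddress

def slaac_address(prefix: str, mac: str) -> ipaddress.IPv6Address:
    """Combine a /64 prefix with a modified EUI-64 interface identifier."""
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen != 64:
        raise ValueError("this scheme needs a 64 bit prefix")
    raw = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(raw) != 6:
        raise ValueError("a MAC address has exactly 6 bytes")
    # Flip the universal/local bit and insert ff:fe between the two halves
    interface_id = bytes([raw[0] ^ 0x02]) + raw[1:3] + b"\xff\xfe" + raw[3:]
    return net.network_address + int.from_bytes(interface_id, "big")

print(slaac_address("2001:db8::/64", "00:11:22:33:44:55"))
```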

But how does this work for larger networks containing multiple subnets, or even the Internet? There is a bunch of routing protocols by which routers can exchange their routing information. Two widely used examples are Optimized Link State Routing (OLSR), which is used on ad hoc networks like wireless meshes or in local networks, and the Border Gateway Protocol (BGP), which is in fact the protocol that literally builds the Internet.

The main difference is how the protocols discover neighbors and how much information they keep. For BGP, each router that is allowed to route a given network is configured by a network operator to announce this responsibility. Announcing means the router simply tells all of its neighbors that the given subnet is reachable through it in one hop. Routers additionally pass on the routing information they have learned, increasing the hop count by one (or adding some kind of metric for a given link to reflect link cost, link quality, etc.). Thus they gradually learn which networks their neighbors can reach, which networks the neighbors of their neighbors can reach, and so on, and they learn which of their ports provides the shortest route.

Whenever they see an announcement for a network they don’t already know, they record that the network is reachable via the given port with the announced link cost / hop count. Whenever they see an announcement for an already known network, they check whether the new announcement has a lower link cost / hop count, and if so they update their local routing table. That way routers always know via which port they can reach a network in the shortest way.

It also allows them to transmit packets into third party networks that forward the traffic to other attached networks. That way one can reach any network attached to the Internet, since networks are usually configured to provide transit for all data packets passing through them. This is one of the major building blocks of the Internet: any traffic received by a network is forwarded as long as a route is known, without charging the transmitter or receiver, without looking at the payload or service and without any content discrimination. This is what net neutrality really means, and it is the second building block of the Internet itself.
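The learning rule described above (accept an announcement if the network is new or the announced route is cheaper) can be sketched in a few lines. Keep in mind that this is a deliberately simplified hop count model: real BGP records the whole path of traversed networks and applies operator defined policies on top. The function name `process_announcement` and the prefix 203.0.113.0/24 are invented for the example.

```python
def process_announcement(table, port, prefix, advertised_cost, link_cost=1):
    """Update `table` (prefix -> (port, cost)) from one neighbor announcement.

    Returns True if the routing table changed and the route should be
    passed on to the other neighbors.
    """
    new_cost = advertised_cost + link_cost  # one more hop via this neighbor
    current = table.get(prefix)
    if current is None or new_cost < current[1]:
        table[prefix] = (port, new_cost)
        return True
    return False

routes = {}
# A neighbor on eth1 announces the network with three hops
print(process_announcement(routes, "eth1", "203.0.113.0/24", 3), routes)
# A longer route via eth2 is ignored
print(process_announcement(routes, "eth2", "203.0.113.0/24", 5), routes)
# A directly attached announcer on eth0 replaces the old entry
print(process_announcement(routes, "eth0", "203.0.113.0/24", 0), routes)
```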

This behavior is also the reason why the Internet is fault resistant. Whenever a network fails, routers quickly learn different routes to other networks as long as any direct or indirect link exists. This usually takes between seconds and minutes - sometimes routes flap or short-lived routing loops appear for a few minutes, but by following these simple rules the network converges to an optimal state again after a really short time.

There is an additional advantage to this process - one can announce a subnet at different physical locations. This might not seem obvious or useful at first, but this technique - called anycast - allows service operators to host the same service at different topological and physical locations. Any packet sent towards one of their networks always reaches the topologically closest router that announces the network. That way one usually reaches a system that is also geographically close - but not necessarily the same system all the time. This is for example the reason why people can use 8.8.8.8 as Google’s DNS server address worldwide without reaching the same routers or the same system everywhere. And it offers another method to provide redundancy.

Another advantage is that one can arbitrarily use one’s IP subnets. IP addresses never have any geographic association. One can announce any subnet of one’s addresses anywhere worldwide - even at multiple locations.

One drawback is of course that one can announce networks that one is not responsible for in order to divert traffic to one’s own network - and some state actors have done this more often than one would think. This does not go unnoticed and usually leads to one’s peers disconnecting from one’s network if it is done on purpose. Usually there are no laws governing this, but it’s an unwritten rule on the Internet that no one peers with someone who announces invalid routes (be it for malicious or for legal reasons - it doesn’t matter), because it’s in everyone’s own interest to keep routing tables functional and correct. Large network operators also usually monitor the network for announcements that conflict with their own. There have been ideas for adding some kind of signatures to BGP, but this has not been widely deployed up until now and would require some additional central authority.

Now that’s basically what the Internet is:

A huge bunch of autonomous and independent networks that use IP (IP prefixes and AS numbers - the latter being used with BGP - are the only things allocated by a central authority; IP subnets free of charge).
Border routers that announce the networks they’re responsible for.
Routers exchanging their routing tables, always learning the shortest routes automatically via a really simple protocol that relies only on local knowledge and is thus error resilient and fault tolerant.
Routers forwarding any traffic not belonging to their own subnets along the shortest known route to the target, independent of source, destination or content (net neutrality) and free of charge (cost neutrality).

Keep in mind: The Internet would not work without network neutrality (in this case each network would require a direct connection to each other network) and without cost neutrality (each packet would have indeterminate price). This is not some political standpoint, it’s just a basic building block of the Internet.

Keeping addresses readable: DNS

Since no one wants to remember IP addresses, an additional hierarchical naming scheme as well as a resolution protocol has been defined: the Domain Name System (DNS). The DNS forms a hierarchical database. The root name servers only know the entries of the root zone (.) - these are called top level domains (.com., .at., .edu., etc.). These entries contain references to another set of name servers operated by the respective registries. These know the next level of domain names. For example, the servers responsible for .at., operated by nic.at, know which of my own DNS servers are responsible for tspi.at. and hold some additional signature information for DNSSEC - but they don’t know who the www or mail host is. In fact I’m operating more than 200 different hostnames under the tspi.at. zone. Traditionally webpages are found under the www host, which is then resolved by my own name servers when they are asked who www.tspi.at. is. This is a recursive process. Please also keep in mind that it’s bad practice to offer web services directly under the domain apex (in this case tspi.at.) for a variety of reasons. Just don’t do this.

Whenever a recursive resolver tries to resolve www.tspi.at. it:

First asks the root servers responsible for . who at. is. The root servers then respond with one of the known DNS servers responsible for the Austrian zone.
The recursive resolver then asks the Austrian DNS servers who tspi.at. is. These servers then return one of my own DNS server addresses to the resolver.
In the next step the resolver asks one of my nameservers for www.tspi.at.
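The three steps above can be played through with a toy model that needs no network access. Everything in it is invented for illustration: the server names, the in-memory `SERVERS` table and the address 192.0.2.10 are placeholders, not real DNS data, and real resolvers speak the DNS wire protocol instead of looking into a dictionary.

```python
# Hypothetical delegation data: which server refers to whom, who knows records
SERVERS = {
    "root-server": {"referrals": {"at.": "at-server"}, "records": {}},
    "at-server": {"referrals": {"tspi.at.": "tspi-server"}, "records": {}},
    "tspi-server": {"referrals": {}, "records": {"www.tspi.at.": "192.0.2.10"}},
}

def query(server: str, name: str):
    """Ask a single server: it answers, refers to another server or gives up."""
    data = SERVERS[server]
    if name in data["records"]:
        return "answer", data["records"][name]
    zones = [z for z in data["referrals"] if name == z or name.endswith("." + z)]
    if zones:
        # Refer to the server of the most specific matching zone
        return "referral", data["referrals"][max(zones, key=len)]
    return "unknown", None

def resolve(name: str, start: str = "root-server", max_steps: int = 10):
    """Follow referrals from the root; returns (address or None, servers asked)."""
    server = start
    asked = [server]
    for _ in range(max_steps):
        kind, value = query(server, name)
        if kind == "answer":
            return value, asked
        if kind != "referral":
            return None, asked
        server = value
        asked.append(server)
    raise RuntimeError("too many referrals")

print(resolve("www.tspi.at."))
```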

This process is called recursive resolving. It is rather resource intensive and places some load onto the root servers. Thus a second method is usually employed by clients and small gateways, which is called forwarding. In this case the client directly asks its forwarder who www.tspi.at. is, and the forwarder either performs the recursion on behalf of the client or forwards the question to another DNS server. This has the advantage that forwarding DNS servers are capable of caching queried information. For example, name servers remember for more than 24 hours who’s responsible for .at.. One has to keep these (multiple) caching layers in mind when updating DNS zones - this is a rather slow process that might take up to days depending on configured TTLs and DNS server behavior.
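Why updates take time becomes obvious when looking at how such a cache behaves. The following sketch keeps every answer until its TTL has expired and only then asks again; the class name `DnsCache`, the injected clock and the address 192.0.2.10 are assumptions made for this example.

```python
import time

class DnsCache:
    """Minimal cache that forgets entries once their TTL has passed."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so the behavior can be demonstrated
        self._entries = {}    # name -> (value, expiry time)

    def put(self, name, value, ttl):
        self._entries[name] = (value, self._clock() + ttl)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None
        value, expires = entry
        if self._clock() >= expires:
            del self._entries[name]  # expired: the next lookup must ask upstream
            return None
        return value

now = [0.0]  # fake clock, in seconds
cache = DnsCache(clock=lambda: now[0])
cache.put("www.tspi.at.", "192.0.2.10", ttl=3600)

now[0] = 1800.0
print(cache.get("www.tspi.at."))  # still served from the cache
now[0] = 3600.0
print(cache.get("www.tspi.at."))  # expired, a changed record is seen only now
```

With several such caches stacked behind each other, a changed record can stay invisible for roughly the sum of the remaining TTLs.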

Since the database is hierarchical it’s managed by different hierarchical entities. One usually has to pay for usage of domain names.

The DNS can contain additional information besides just resolving to IP addresses. For example one can refer mail servers, keep information about DNSSEC signatures, keep information about used public keys for OpenPGP, publish information about servers being responsible for specific services (heavily used for XMPP for example), add telephone numbers - there is also a zone that resolves telephone numbers to SIP accounts to aid transition to voice over IP networks - etc.

The world wide web

So now one knows what the Internet is - but why haven’t webpages, e-mail, etc. been mentioned? Because these are technologies built on top of the Internet. Basically the world wide web is a simple concept. It consists of a markup language called the hypertext markup language (HTML) as well as an address scheme called uniform resource locator (URL). Information is exchanged via a protocol called hypertext transfer protocol (HTTP).

The idea again is pretty simple. Anyone who wants to publish content runs their own servers that are reachable via the Internet. One then publishes some documents written in HTML on one’s server, and everyone requesting a document gets it sent by the server, unless one wants to use some kind of paywall or authentication. On the local system resources are specified by a path-like string that often maps to a filesystem structure (for example /directoryA/B/filename) - but this is simply an identifier. Whenever a system connects to the webserver and asks for /directoryA/B/filename, the webserver transmits its answer to the client. Note that this means that webpages are not simply out there for anyone to reach, but are kept in a public area and handed to requesters on purpose and explicitly by the hosting servers - usually the content owners. This works in contrast to broadcast systems like television, where the provider simply sends out a stream of data to all customers and they decide what they want to receive or decode. This is an argument to keep in mind when politicians demand TV flat-rate fees for Internet connections - it simply makes no sense.
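What “asking for /directoryA/B/filename” looks like on the wire can be sketched as follows: the URL is split into host and path, and the path ends up in the first line of a plain text request. The host name www.example.com and the function name `build_get_request` are placeholders; the sketch only builds the request bytes and does not open a connection.

```python
from urllib.parse import urlsplit

def build_get_request(url: str) -> bytes:
    """Build a minimal HTTP/1.1 GET request for a plain http:// URL."""
    parts = urlsplit(url)
    if parts.scheme != "http" or not parts.hostname:
        raise ValueError("only absolute http:// URLs are handled in this sketch")
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    host = parts.hostname if parts.port is None else f"{parts.hostname}:{parts.port}"
    lines = [
        f"GET {path} HTTP/1.1",  # the path-like identifier from the URL
        f"Host: {host}",         # which of the server's sites is meant
        "Connection: close",
        "",                      # an empty line terminates the header
        "",
    ]
    return "\r\n".join(lines).encode("ascii")

request = build_get_request("http://www.example.com/directoryA/B/filename")
print(request.decode("ascii"))
```

The server is free to answer this request with the content of a file, with generated content or with a refusal; the path is only an identifier.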

The web part of the WWW is built by hyperlinks. Each HTML document can contain metadata called links - everyone knows them when reading a webpage. These are simply references to some other resource on any other server worldwide.
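Since links are just references inside the document, a program can collect them easily. The sketch below uses Python’s built-in HTML parser; the class name `LinkCollector`, the helper `extract_links` and the example URLs are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Collect the targets of all <a href="..."> elements of a document."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Relative references are resolved against the document's URL
                self.links.append(urljoin(self.base_url, value))

def extract_links(html: str, base_url: str) -> list:
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links

page = '<p><a href="/about">About</a> and <a href="https://example.org/x">elsewhere</a></p>'
print(extract_links(page, "http://www.example.com/dir/page.html"))
```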

That’s basically it.