TL;DR: The Internet is an interconnection of independent networks and machines
that use a really simple rule set to route traffic between them in a
failure-resistant way, without any guarantee of reliability or trustworthiness.
Since I know many people who have no idea how the Internet really works and what
it is, but who often talk about how one should regulate it, I decided to write this
hopefully short introduction to some basic networking theory. Please don't
be frightened when I write about Ethernet and basic IP networks as well as
subnetting at first - it's required to discuss the concept of the Internet
later on.
Local networks - Ethernet (OSI layer 1 and 2)
First let's take a look at the local network layer. There is a variety of
technologies available but I'll focus on the one most commonly seen in private
and consumer areas. Ethernet is a rather old technology that dates back to 1973.
The basic technology still works the same as back then with some minor
changes.
First Ethernet provides an electrical standard for networking - usually seen
over twisted pair cables today (what's commonly seen as network connectors)
but in fact the 802.3 specification supports a huge number of technologies.
Coaxial cables were used back in the 80s, fiber optic cables are used
for long distance connections, there are parallel short cables (CX4) used
to interconnect switches in datacenters over less than a meter, there are
backplane specifications (KR) and other electrical carrier specifications.
Since electrical transfer is only half of the story Ethernet also provides
a link-layer protocol. Each and every Ethernet device worldwide has a unique
48 bit device address also called the media access control (MAC) address.
This address is assigned by the manufacturer of the network equipment from
a namespace that has been assigned to the manufacturer. It should usually not
be changed and there are many applications that require MAC addresses to be
unique worldwide. At the very least they have to be unique on the local
network segment to allow Ethernet to function correctly. Note that a MAC address
is easy to spoof, so it's not usable for anything security related.
What's the MAC address used for? Data is transmitted between network equipment
in the form of packets. The Ethernet frame specifies the layout of a packet
transmitted between Ethernet capable equipment. The Ethernet frame consists of:
- A preamble (7 bytes long) that provides an alternating bit pattern on the
transmission medium. This is usually used to perform clock recovery in the form
of a phase-locked loop on the receiver - i.e. it's required for the receiver
to know the speed and timing at which bits are transmitted. The exact encoding
of bits is of course dependent on the physical layer technology. Together with
the minimum packet length it also plays a role in collision detection.
- A single byte start of frame delimiter (SFD) that terminates the preamble. This
is required since the receiver cannot count the bits received during the preamble
until the clocks have been synchronized, so there has to be a way to detect
the end of the preamble even if it's been partially missed. From this moment
on the receiver has to keep track of its clock for the remaining packet duration.
Because of the way the packets are encoded on the wire there is a coarse method
to correct any clock skew though - which is sufficient
for the whole maximum frame duration.
- A destination address - this is the MAC address of the device that should
receive the packet.
- A source address. This is the MAC address of the transmitter.
- An optional 802.1Q tag (VLAN ID and quality of service class) and the
EtherType field that identifies the payload type.
- 46 to 1500 bytes of payload data. The minimum data length is required for
proper collision detection and is directly correlated to the maximum cable
length for twisted pair and coaxial networks.
- A frame check sequence (FCS), a 32 bit CRC that allows the receiver to
detect transmission errors.
- At least 12 octets of inter-packet gap - i.e. a pause after each packet.
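To make the frame layout a little more tangible, here is a minimal Python sketch that decodes the address fields, an optional 802.1Q tag and the EtherType. It assumes the frame arrives the way a network card usually hands it to software, i.e. without preamble, SFD and FCS. The function names and the sample frames are made up for illustration.

```python
import struct

def format_mac(raw: bytes) -> str:
    """Render 6 raw bytes in the usual colon separated notation."""
    return ":".join(f"{b:02x}" for b in raw)

def parse_ethernet_header(frame: bytes) -> dict:
    """Decode destination, source, optional 802.1Q tag and EtherType.

    Assumption: preamble, SFD and FCS have already been stripped.
    """
    if len(frame) < 14:
        raise ValueError("frame too short for an Ethernet header")
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    offset = 14
    vlan = None
    if ethertype == 0x8100:  # 0x8100 marks an 802.1Q tag
        if len(frame) < 18:
            raise ValueError("frame too short for an 802.1Q tag")
        tci, ethertype = struct.unpack("!HH", frame[14:18])
        vlan = {"priority": tci >> 13, "vlan_id": tci & 0x0FFF}
        offset = 18
    return {
        "dst": format_mac(dst),
        "src": format_mac(src),
        "vlan": vlan,
        "ethertype": ethertype,
        "payload": frame[offset:],
    }

# Hypothetical sample frames: a broadcast IPv4 frame, untagged and tagged
plain = bytes.fromhex("ffffffffffff" "020000000001" "0800") + b"hello"
tagged = bytes.fromhex("ffffffffffff" "020000000001" "8100" "a00a" "0800") + b"x"

print(parse_ethernet_header(plain))
print(parse_ethernet_header(tagged))
```

The minimum payload size is deliberately not enforced here so that the sample frames stay short.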
A network might consist of an arbitrary number of devices and might either be
switched (the usual case today) or connected via a hub (as was done in the early days).
Using a hub the electrical signals from one transmitter are simply repeated
to all receivers on the same network. This always happens - even if other
devices are transmitting at the same time. Collisions are handled with a
scheme called CSMA/CD - carrier sense multiple access / collision detection. This
basically means:
- Carrier sense: a station that wishes to transmit senses if another station is
currently transmitting and backs off its own transmission until it doesn't sense another carrier
any more. Of course it's still possible that two stations decide at the same
time to start transmission - and then there is the propagation delay through
the network so that packets take some time to reach other stations. Just because
one doesn't sense a carrier doesn't imply no station is transmitting at this
moment.
- Multiple access simply means that all stations are equal - there is no
reserved time slice, no token passed and no master/slave concept.
- Collision detection is the last essential part. During transmission the transmitter
still listens on the network for incoming data. In case it receives data it
starts to transmit a jam signal that disturbs data packets currently in flight
so any previously transmitted bits do not get wrongly recognized
as part of a valid packet. This is also the reason for cable length limits and
minimum packet size - a single packet lasts long enough to reach the
other end of the network segment and get disturbed by another station before the
packet has been transmitted completely. This ensures that any collisions are
detected and packets are re-transmitted later on. Before any retransmission after
a collision has been detected the network devices wait for a random, exponentially growing
back-off interval that reduces the probability of a follow up collision.
Note that this behavior is the reason why one should consider a network that's
loaded up to 80% of its capacity outside of peak periods to be under-dimensioned.
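The random exponential back-off mentioned above can be sketched in a few lines of Python. The constants reflect the classic truncated binary exponential back-off as I understand it from 802.3 (slot time of 512 bit times for 10/100 Mbit/s, the exponent capped at 10, giving up after 16 attempts); treat them as assumptions of this sketch and not as a reference implementation.

```python
import random

SLOT_TIME_BITS = 512   # assumed slot time in bit times (10/100 Mbit/s Ethernet)
BACKOFF_LIMIT = 10     # assumed cap for the exponent
MAX_ATTEMPTS = 16      # assumed number of attempts before the frame is dropped

def backoff_slots(collisions: int, rng=random) -> int:
    """Number of slot times to wait after the given collision count.

    After the n-th collision a station picks a random integer r with
    0 <= r < 2**k where k = min(n, BACKOFF_LIMIT).
    """
    if not 1 <= collisions <= MAX_ATTEMPTS:
        raise ValueError("frame is dropped after too many collisions")
    k = min(collisions, BACKOFF_LIMIT)
    return rng.randrange(2 ** k)

def backoff_bit_times(collisions: int, rng=random) -> int:
    """Same delay expressed in bit times."""
    return backoff_slots(collisions, rng) * SLOT_TIME_BITS

for attempt in (1, 2, 3, 10, 16):
    print(attempt, "collision(s): wait", backoff_slots(attempt), "slot times")
```

Since the range of possible delays doubles with each collision, two stations that collided once are increasingly unlikely to pick the same delay again.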
All devices listen to all incoming traffic. Whenever they receive a packet with
their MAC address or a broadcast address listed as the destination
address they accept the packet. In any other case - i.e. a non-matching address -
they silently discard the packet directly on the network processor unless the
device has deliberately been put into promiscuous mode. In this case the software
layer receives all packets that have been detected on the Ethernet layer, which
used to be a nice method to listen to all traffic on non-switched Ethernet segments. It's
not sufficient to listen to all traffic of other nodes on modern networks though; one
would have to perform some ARP cache poisoning to do this (more on that later).
Since with a hub all packets have to be received by every station and the whole
network segment can only be used by a single device at a time, switching
has been introduced. A switch initially works exactly like a hub - all incoming
packets are transmitted on all other ports. But for each incoming packet the switch
learns the source address and remembers on which port it's been seen.
In case a packet arrives whose destination address matches a previously seen
source address the packet is only transmitted to the previously learned port.
Table entries are replaced when newer data arrives (i.e.
a device has changed its physical switch port) or removed after a given timeout. A special
class of packets - so called broadcast and multicast packets - are transmitted to
all ports and received by all nodes anyway in traditionally switched networks;
some more advanced networks only do this for broadcast packets and perform a process
called IGMP snooping to forward multicast packets only to ports that have
subscribers for the given multicast group.
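The learning behavior described above fits into a small Python class. This is only a sketch under my own assumptions: the class name, the port numbers, the short MAC strings and the 300 second timeout are made up, and multicast handling is left out.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"

class LearningSwitch:
    """Toy model of a switch that learns source addresses per port."""

    def __init__(self, ports, timeout=300.0):
        self.ports = set(ports)
        self.timeout = timeout      # assumed aging time in seconds
        self.table = {}             # MAC address -> (port, last seen time)

    def handle_frame(self, in_port, src, dst, now):
        """Return the set of ports the frame is sent out on."""
        # Learn (or refresh) where the source address lives
        self.table[src] = (in_port, now)
        entry = self.table.get(dst)
        if dst != BROADCAST and entry and now - entry[1] <= self.timeout:
            out_port = entry[0]
            # Never send a frame back out of the port it came in on
            return set() if out_port == in_port else {out_port}
        # Unknown or expired destination, or broadcast: flood like a hub
        return self.ports - {in_port}

switch = LearningSwitch(ports=[1, 2, 3])
print(switch.handle_frame(1, "aa", "bb", now=0.0))  # bb unknown: flooded
print(switch.handle_frame(2, "bb", "aa", now=1.0))  # aa was learned on port 1
print(switch.handle_frame(1, "aa", "bb", now=2.0))  # bb was learned on port 2
```

Note that nothing in this table is authenticated: whoever sends a frame with a given source address gets that address mapped to their port.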
Of course switching also has a drawback - if one forms switching loops inside a
network by connecting switches with multiple paths (either indirectly or forming
a ring topology) packets might be passed around infinitely and cause network
overload. Of course building a ring topology or a fully meshed interconnect might
be interesting for fault tolerance so modern managed switches usually employ
a protocol called spanning tree (STP) that detects loops and disables ports as
long as multiple paths exist.
Note that this switching does not provide any security since it's easy to poison
the switches' caches to redirect traffic to different ports or to flood them so they
start broadcasting packets again. On the other hand switching using crossbar switches
allows better network utilization and reduces the load on single network devices.
Depending on the switch it may even allow devices of different speeds to be attached
to the same switch without limiting the whole network segment to the
slowest speed.
Note that the policy in case of network congestion (i.e. two devices try to transmit
to the same destination port and the switch runs out of buffer space) is to silently
drop packets. Therefore higher protocol layers like TCP have to detect such situations.
As one can see this scheme works perfectly well for a single network segment
that's small enough to employ CSMA/CD and that is also capable of broadcasting
a single packet onto all ports as long as routes are not known. But it's not
sufficient for larger networks, multiple segments or the Internet. Also all
devices on the same network segment are contained in the same broadcast domain
so the number of devices is usually also limited by broadcast traffic.
IP based networks
To counter these problems the Internet Protocol (IP) has been designed. It
can be used on top of any other networking protocol like Ethernet or even
serial protocols (serial line IP - SLIP - for example). It's also often
tunneled in SONET/SDH ATM frames, generic routing encapsulation packets or
VPN tunnels on top of other IP networks.
The basic idea is similar to Ethernet. Each packet that should be transmitted
contains a source IP and a target IP address. The network itself is divided into
subnets that are formed by applying subnet masks to addresses. One can imagine
a subnet being simply all addresses that share a given number of high order bits
of their IP addresses called a prefix. In fact subnets are usually encoded using
an IP address together with a prefix length.
IP Addresses and groups
In the following section the format of IP addresses as well as some special subnets
are listed. Don't be afraid if you don't totally understand the meaning of subnets
and prefixes yet - more on that will be written in the next section on routing.
IPv4 (32 bit) addresses are usually written as a sequence of 4 decimal numbers separated
by dots (for example 127.0.0.1). One is usually tempted to read this number as
decimal but only the binary representation is relevant. One can of course arbitrarily
select a prefix length but usually one's limited by a given subnet that one can
use for a specific application.
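To see why only the binary representation matters, here is a small Python sketch using the standard ipaddress module. The helper name describe is my own invention; it shows an address as a 32 bit integer and applies a prefix length as a bit mask.

```python
import ipaddress

def describe(address: str, prefix_length: int) -> dict:
    """Show an IPv4 address as integer and bits and apply a prefix length."""
    value = int(ipaddress.IPv4Address(address))
    # A prefix length of n means: the n highest bits of the mask are set
    mask = (0xFFFFFFFF << (32 - prefix_length)) & 0xFFFFFFFF
    return {
        "integer": value,
        "binary": f"{value:032b}",
        "mask": str(ipaddress.IPv4Address(mask)),
        "network": str(ipaddress.IPv4Address(value & mask)),
    }

print(describe("127.0.0.1", 8))
print(describe("192.168.1.16", 24))
```

All addresses that yield the same network value for a given prefix length belong to the same subnet.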
There is a number of networks that can be used for private networks that are never
routed on any public network:
| IP Address | Prefix length | Usage |
| --- | --- | --- |
| 10.0.0.0/8 | 8 bits | Private class A network that consists of $2^{24}$ addresses |
| 172.16.0.0/12 | 12 bits | 16 class B networks that consist in total of $2^{20}$ addresses |
| 192.168.0.0/16 | 16 bits | 256 class C networks that consist in total of $2^{16}$ addresses |
| 100.64.0.0/10 | 10 bits | Range assigned specially for carrier grade NAT as private network that does not collide with the earlier mentioned networks. Should not be used in private home networks |
| 169.254.0.0/16 | 16 bits | A single class B network that should never be routed - not even in private networks. Used as link local addresses |
There are other networks that are reserved for special purposes:
| IP Address | Prefix length | Usage |
| --- | --- | --- |
| 0.0.0.0/8 | 8 bits | Current network (only as source). Is used for some broadcasts |
| 127.0.0.0/8 | 8 bits | Used for loopback addresses to the host itself - and sometimes also to local virtual machines and containers. Never leaves the local system |
| 192.88.99.0/24 | 24 bits | Reserved, previously used for IPv6 to IPv4 relays |
| 198.18.0.0/15 | 15 bits | Used for benchmarking and testing inside local networks across subnet boundaries. Sometimes seen at exchange points |
| 224.0.0.0/4 | 4 bits | Prefix used for all multicast groups |
| 240.0.0.0/4 | 4 bits | Reserved |
| 255.255.255.255/32 | 32 bits | Limited broadcast |
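As a quick illustration, the two tables above can be turned into a lookup with Python's ipaddress module. The labels are my own shorthand for the usage column, the classify helper is hypothetical, and when several ranges contain an address the most specific one (longest prefix) wins.

```python
import ipaddress

# Ranges copied from the tables above; the labels are informal shorthand
SPECIAL_RANGES = {
    "10.0.0.0/8": "private",
    "172.16.0.0/12": "private",
    "192.168.0.0/16": "private",
    "100.64.0.0/10": "carrier grade NAT",
    "169.254.0.0/16": "link local",
    "0.0.0.0/8": "current network",
    "127.0.0.0/8": "loopback",
    "192.88.99.0/24": "reserved (former 6to4 relay)",
    "198.18.0.0/15": "benchmarking",
    "224.0.0.0/4": "multicast",
    "240.0.0.0/4": "reserved",
    "255.255.255.255/32": "limited broadcast",
}

def classify(address: str) -> str:
    """Return the label of the most specific special range, if any."""
    ip = ipaddress.IPv4Address(address)
    matches = [
        (net.prefixlen, label)
        for net, label in (
            (ipaddress.IPv4Network(prefix), label)
            for prefix, label in SPECIAL_RANGES.items()
        )
        if ip in net
    ]
    if not matches:
        return "no special range listed"
    return max(matches)[1]

for sample in ("192.168.1.16", "100.64.1.1", "255.255.255.255", "8.8.8.8"):
    print(sample, "->", classify(sample))
```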
IPv6 addresses are written in hexadecimal notation since they are 128 bits long.
Each 16 bit group is separated by a colon (:). One can replace a single run
of all-zero groups by a double colon - which is sometimes done to separate an assigned prefix
and a static IP address consisting largely of zeros - or for the loopback address ::1
that consists of 127 zero bits and a single one.
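The double colon shorthand is easy to try out with Python's ipaddress module. The address under 2001:db8:: is taken from the documentation prefix and only serves as an example.

```python
import ipaddress

# Fully written out address and its shortened form
addr = ipaddress.IPv6Address("2001:0db8:0000:0000:0000:0000:0000:0001")
print(addr.compressed)   # shortest notation using the double colon
print(addr.exploded)     # all eight 16 bit groups written out

# The loopback address is just the number one as a 128 bit integer
loopback = ipaddress.IPv6Address("::1")
print(loopback.exploded)
print(int(loopback))
```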
Again there is a group of special address ranges:
| Range | Prefix | Usage |
| --- | --- | --- |
| ::/0 | 0 bits | Default route (only used symbolically) |
| ::/128 | 128 bits | Unspecified invalid address, used only in software locally |
| ::1/128 | 128 bits | Loopback |
| ::ffff:0:0/96 | 96 bits | Space to map IPv4 addresses for easy transport of IPv4 over IPv6 (lower 32 bits are set to the IPv4 address) |
| ::ffff:0:0:0/96 | 96 bits | IPv4 translated addresses. Another transition mechanism not as easy as the previous one |
| 64:ff9b::/96 | 96 bits | Internet global IPv4 to IPv6 translation mechanism |
| 100::/64 | 64 bits | Discard packets prefix |
| 2001::/32 | 32 bits | Teredo tunnel solutions - allows IPv6 access over IPv4 networks without any tunnel broker |
| 2001:20::/28 | 28 bits | Overlay Routable Cryptographic Hash Identifiers (ORCHID) |
| 2002::/16 | 16 bits | Older 6to4 translation mechanism |
| fc00::/7 | 7 bits | Unique local address (ULA) - all site local networks reside under this subnet |
| fe80::/10 | 10 bits | Link local addresses |
| ff00::/8 | 8 bits | Multicast group prefix |
Routing
So what does this whole subnet process mean? IP networks are segmented into
different groups of hosts and networks called subnets. Nearly every network can
be divided into further networks that share a common prefix. For example one
can take the private network 10.0.0.0/8 and choose to split it into
256 other networks ranging from 10.0.0.0/16, 10.1.0.0/16 up to 10.255.0.0/16.
The same can be done with IPv6 networks - but for technical reasons subnets
usually should not have a prefix longer than 64 bits there (SLAAC for example
requires a 64 bit prefix).
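Splitting a network into smaller subnets can be tried out directly with Python's ipaddress module. This sketch splits the private 10.0.0.0/8 network into /16 networks as in the example above.

```python
import ipaddress

network = ipaddress.ip_network("10.0.0.0/8")

# Extending the prefix from 8 to 16 bits yields 2**(16 - 8) subnets
subnets = list(network.subnets(new_prefix=16))

print(len(subnets))
print(subnets[0], subnets[1], "...", subnets[-1])

# Every subnet is still part of the original network
print(all(subnet.subnet_of(network) for subnet in subnets))
```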
Traffic inside each subnet is transferred via the previously presented switching
process. Packets broadcasted or multicasted (without having switches that perform
IGMP snooping) are transmitted to all members of the networks - they form a common
broadcast domain. Additionally limitations on the network size apply.
Different subnets are connected by components called routers. How do routers
decide how to route a packet? They employ a routing table. A routing table basically
consists of a list of prefixes, prefix lengths and target ports (in practice there is
some additional information like link metrics / costs, etc. but this is not required
for basic understanding and system configuration). Whenever a packet arrives at a
router and it has to make a decision it logically ANDs the binary representation
of the destination address with the subnet mask of each entry (i.e. sets all bits
not belonging to the prefix to zero)
and compares the result with the entry's prefix. In case one prefix matches it selects
the port recorded with that prefix; in case multiple prefixes match routers
usually employ a longest prefix rule that picks the longest matching prefix
to choose which port to forward to.
Let's take a look at an (IPv4) example. Again it works the same for IPv6. Let's say
the routing table consists of:
| Subnet | Binary notation (Subnet) | Binary notation (mask) | Port |
| --- | --- | --- | --- |
| 192.168.1.0/24 | 11000000101010000000000100000000 | 11111111111111111111111100000000 | eth0 |
| 192.168.2.0/24 | 11000000101010000000001000000000 | 11111111111111111111111100000000 | eth1 |
| 192.168.3.0/24 | 11000000101010000000001100000000 | 11111111111111111111111100000000 | eth2 |
| 192.168.0.0/16 | 11000000101010000000000000000000 | 11111111111111110000000000000000 | eth0 |
| 10.1.0.0/16 | 00001010000000010000000000000000 | 11111111111111110000000000000000 | eth3 |
| 0.0.0.0/0 | 00000000000000000000000000000000 | 00000000000000000000000000000000 | eth3 |
So if a packet now arrives with the destination address 192.168.1.16 the router would
perform a logical AND for each entry and compare the result with the entry's subnet:
| Entry | Operation | Result | Match | Match length |
| --- | --- | --- | --- | --- |
| 1 | 192.168.1.16 and /24 | 192.168.1.0 | Yes | 24 |
| 2 | 192.168.1.16 and /24 | 192.168.1.0 | No | |
| 3 | 192.168.1.16 and /24 | 192.168.1.0 | No | |
| 4 | 192.168.1.16 and /16 | 192.168.0.0 | Yes | 16 |
| 5 | 192.168.1.16 and /16 | 192.168.0.0 | No | |
| 6 | 192.168.1.16 and /0 | 0.0.0.0 | Yes | 0 |
As one can see three of the known routes would match the packet. The router then chooses
the longest prefix and thus selects the first entry - choosing eth0 as target
interface to forward the packet to. If no match had been found the router
would transmit an ICMP destination unreachable message (often reported as no route to host)
back to the sender's address to indicate that no route to the selected target is known.
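The lookup described above can be written down in a few lines of Python. This is a deliberately naive sketch that walks the whole example table for each packet; real routers use far more efficient data structures. The function name lookup is my own.

```python
import ipaddress

# The example routing table from above: (prefix, outgoing port)
ROUTES = [
    ("192.168.1.0/24", "eth0"),
    ("192.168.2.0/24", "eth1"),
    ("192.168.3.0/24", "eth2"),
    ("192.168.0.0/16", "eth0"),
    ("10.1.0.0/16", "eth3"),
    ("0.0.0.0/0", "eth3"),
]

def lookup(destination: str, routes=ROUTES):
    """Return the port of the longest matching prefix or None."""
    dest = int(ipaddress.IPv4Address(destination))
    best = None  # (prefix length, port) of the best match so far
    for prefix, port in routes:
        net = ipaddress.IPv4Network(prefix)
        # Logical AND with the mask, then compare with the prefix
        if dest & int(net.netmask) == int(net.network_address):
            if best is None or net.prefixlen > best[0]:
                best = (net.prefixlen, port)
    return None if best is None else best[1]

print(lookup("192.168.1.16"))   # three entries match, the /24 wins
print(lookup("192.168.77.1"))   # only the /16 and the default route match
print(lookup("8.8.8.8"))        # only the default route matches
```

Returning None stands in for the case in which a real router would answer with an ICMP error.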
If one looks at the last entry in the routing table above one notices the 0.0.0.0/0
entry which at first glance doesn't make sense since it always provides a match. This
is called a default route and is usually not found on large routers on the Internet
but only on smaller routers in private and corporate networks. The idea is to forward
packets for which no route is known into the public Internet. This way not every
local router has to know a route to every other network worldwide. This is
totally different for the routers found on the Internet. All routers on the Internet
know via which interface they reach every other publicly announced network
worldwide.
Routing table configuration (the Internet)
How is this routing information configured? One can imagine that for smaller networks
this can be done manually - which is in fact what's done for example when manually
assigning IP addresses to interfaces.
If one assigns a static address to an interface one also assigns a subnet mask
together with the address. As soon as this happens the system adds a routing
entry for the given interface to its local routing table. For many small scale
site local networks this is enough - sometimes one additionally has to add a default route.
For dynamically configured local networks the same thing happens - the devices
get an address assigned by DHCP (IPv4) or SLAAC (IPv6) together with a subnet mask
or prefix length as well as a default route and calculate their routing tables
from this information.
When using DHCP a system starts by transmitting a DHCP discover message onto the
local network using the broadcast address, asking for an address assignment by
a service called the DHCP server. The DHCP server(s) see this request, select an
IP address for the device - sometimes this is done dynamically, sometimes it's
assigned statically based on the MAC address or the physical location of a device -
and transmit the information back. This requires a stateful DHCP server that keeps
track of assigned IP addresses though. This has been changed with IPv6 where
IP address assignment is usually done via stateless autoconfiguration (SLAAC)
which works by having routers announce the prefixes they're authoritative for
to the network via multicast (ICMPv6 router advertisement). Every device seeing such an
advertisement can take the announced prefix, derive a 64 bit local part from its MAC address
and attach it to the prefix - and thus has an address assigned automatically.
One can also assign additional configuration like a default route and DNS server
configuration using the same mechanism or use DHCPv6 for providing additional
configuration information.
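Deriving the 64 bit local part from a MAC address can be sketched as follows, using the classic modified EUI-64 scheme (insert ff:fe in the middle and flip the universal/local bit). The function name, the documentation prefix 2001:db8::/64 and the MAC address are just examples; many current operating systems prefer randomized identifiers instead, so take this as an illustration of the idea only.

```python
import ipaddress

def slaac_address(prefix: str, mac: str) -> ipaddress.IPv6Address:
    """Combine a /64 prefix with a modified EUI-64 interface identifier."""
    net = ipaddress.IPv6Network(prefix)
    if net.prefixlen != 64:
        raise ValueError("this sketch expects a 64 bit prefix")
    octets = bytes(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("a MAC address has 6 octets")
    # Flip the universal/local bit and insert ff:fe in the middle
    eui64 = bytes([octets[0] ^ 0x02]) + octets[1:3] + b"\xff\xfe" + octets[3:]
    host_part = int.from_bytes(eui64, "big")
    return ipaddress.IPv6Address(int(net.network_address) | host_part)

print(slaac_address("2001:db8::/64", "00:11:22:33:44:55"))
```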
But how does this work for larger networks containing multiple subnets or
even the Internet? There is a bunch of routing protocols by which routers can
exchange their routing information. Two commonly used protocols today
are Optimized Link State Routing (OLSR), which is used on ad hoc networks
like wireless meshes or in local networks, and the Border Gateway Protocol (BGP), which
is in fact the protocol that literally builds the Internet.
The main difference is how both protocols implement discovery of neighbors and how
much information they keep. For BGP each router that is allowed to route a given
network is configured by a network operator to announce this responsibility. Announcing
means the router simply tells all its neighbors that it's responsible for
the given subnet via one hop. Routers then additionally pass on all of
their routing information, increasing the hop count by one (or adding some kind of
metric information for a given link to reflect link cost, link quality, etc.). Thus
they slowly learn which routes their neighbors can reach, which routes the neighbors
of their neighbors are capable of reaching, etc. and they learn which of their
ports provides the shortest route. Whenever they see an announcement for a network they
don't already know they learn that the given port is responsible for the given
network with the seen link cost / hop count. Whenever they see an announcement
for an already known network they check if the newly seen announcement has a lower
link cost / lower hop count. In this case they simply update their local routing
table. That way routers always know via which port they're capable of reaching
a network the shortest way - and it allows them to transmit packets into third
party networks that forward traffic to other attached networks. That way one
can reach any network attached to the Internet since networks are usually configured
to provide transit for all data packets through them. This is one of the major
building points of the Internet - any traffic received by a network is forwarded
as long as a route is known without charging the transmitter or receiver, without
looking at the payload or service and without any content discrimination - this is
what's really known as net neutrality and this is the second building point of
the Internet itself.
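The update rule described above (learn unknown networks, replace known ones only when a cheaper announcement shows up) can be sketched like this. It is a simplified model in the spirit of the description; real BGP additionally carries path attributes and applies operator policy, which is left out. All names and the example prefix are made up.

```python
def process_announcement(table, prefix, port, cost, link_cost=1):
    """Update the routing table from one announcement.

    table maps a prefix to (port, cost). The neighbor announces the prefix
    with its own cost; reaching it via this neighbor adds link_cost.
    Returns True if the table changed.
    """
    new_cost = cost + link_cost
    known = table.get(prefix)
    if known is None or new_cost < known[1]:
        table[prefix] = (port, new_cost)
        return True
    return False

routing_table = {}
# First announcement for an unknown network: learned
print(process_announcement(routing_table, "203.0.113.0/24", "eth1", cost=3))
# Worse announcement via another port: ignored
print(process_announcement(routing_table, "203.0.113.0/24", "eth2", cost=5))
# Better announcement: the entry is replaced
print(process_announcement(routing_table, "203.0.113.0/24", "eth2", cost=1))
print(routing_table)
```

A router that changed its table would in turn announce the new cost to its own neighbors, which is how the information spreads hop by hop.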
This behavior is also the reason why the Internet is fault resistant. Whenever
a network fails routers quickly learn different routes to other networks
as long as any direct or indirect link exists. This usually takes from seconds
up to minutes - sometimes there is route flapping or there are short routing
loops for some minutes, but by using these simple rules the network converges
into an optimal state after a really short timespan.
There is an additional advantage to this process - one can announce a subnet
at different physical locations. This might not seem obvious or useful at first
but this technique - called anycast - allows service operators to host the
same service at different network topological and physical locations. Any packet
sent towards one of their networks always reaches the topologically closest
router that announces their network. That way one usually reaches a nearby
system - but not necessarily the same system all the time. This is for example
the reason why people can use 8.8.8.8 as Google's DNS server address worldwide
without reaching the same routers or same system worldwide. It also provides another
method of redundancy.
Another advantage is that one can arbitrarily use one's IP subnets. IP addresses
never have any geographic association. One can announce any subnet of one's
addresses anywhere worldwide - even at multiple locations.
One drawback is of course that one can - and some state actors have done this
more often than one would think - announce networks that one's not responsible
for to divert traffic to one's own network. This does not go unnoticed and
usually leads to one's peers disconnecting from one's network if done on purpose.
Usually there are no laws governing this but it's an unwritten rule on the Internet
that no one peers with someone else who announces invalid routes (be it for malicious
or for legal reasons - it doesn't matter) because it's in everyone's own
interest to keep routing tables functional and correct. Large network operators
also usually monitor the network for announcements not matching their own. There
have been ideas of adding some kind of signatures to BGP but this has not
been widely deployed up until now and would require some additional central authority.
Now that's basically what the Internet is:
- A huge bunch of autonomous and independent networks that use IP (IP prefixes
and AS numbers - the latter being used with BGP - are the only things allocated
by a central authority; IP subnets free of charge)
- Border routers that announce the networks they're responsible for.
- Routers exchanging their routing tables, always learning the shortest routes
automatically via a really simple protocol that only relies on local knowledge
and is thus error resilient and fault tolerant.
- Routers forwarding any traffic not belonging to their own subnets along the shortest
known route to the target independent of source, destination or
content (net neutrality) free of charge (cost neutrality)
Keep in mind: The Internet would not work without network neutrality (otherwise
each network would require a direct connection to each other network) and
without cost neutrality (each packet would have an indeterminate price). This is not
some political standpoint, it's just a basic building block of the Internet.
Keeping addresses readable: DNS
Since no one wants to remember IP addresses an additional hierarchical naming scheme
as well as a resolution protocol has been defined: the Domain Name System (DNS). The DNS
forms a hierarchical database - the root name servers only know the entries
of the . root zone - these are called top level domains (.com., .at., .edu., etc.).
These entries contain references to another set of name servers operated by the
respective registries. These know the next level of domain names. For example the servers
responsible for .at. operated by nic.at know
which of my own DNS servers are responsible for tspi.at. and know some additional
signature information for DNSSEC - but they don't know who the www or mail host
is. In fact I'm operating more than 200 different hostnames under the tspi.at.
zone. Traditionally webpages are found under the www host that's then resolved
by my own domain name servers when asked who www.tspi.at. is. This is a recursive
process. Please also keep in mind it's bad practice to offer web services directly
under the domain apex (in this case tspi.at.) for a variety of reasons. Just don't
do this.
Whenever a recursive resolver tries to resolve www.tspi.at. it:
- First asks the root servers responsible for . who at. is. The root
servers then respond with one of the known DNS servers responsible for the
Austrian zone.
- The recursive resolver then asks the Austrian DNS servers who tspi.at.
is. These servers then return one of my own DNS server addresses back to
the resolver.
- In the next step the resolver asks one of my nameservers for www.tspi.at.
This process is called recursive resolving and is rather resource intensive and
places some load onto the root servers. Thus a second method usually used by
clients and small gateways is employed - which is called forwarding. In this case
the client directly asks its forwarder who www.tspi.at. is, which then either
performs the recursion on behalf of the client or forwards to another DNS server.
This has the advantage that forwarding DNS servers are capable of caching queried
information. For example nameservers know for more than 24 hours who's responsible
for .at. One has to keep these (multiple) caching layers in mind when updating
DNS zones - this is a rather slow process that might take up to days depending
on configured TTLs and DNS server behavior.
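The resolution steps and the caching forwarder described above can be mimicked with a toy model in Python. Nothing here talks to real DNS servers: the server names, the address 192.0.2.10 (taken from a documentation range) and the one hour TTL are invented for illustration.

```python
# Toy zone data: each "server" knows delegations and/or final records
SERVERS = {
    "root-server": {"delegations": {"at.": "at-server"}, "records": {}},
    "at-server": {"delegations": {"tspi.at.": "tspi-server"}, "records": {}},
    "tspi-server": {"delegations": {}, "records": {"www.tspi.at.": "192.0.2.10"}},
}

def resolve(name: str):
    """Walk from the root towards the server that knows the name.

    Returns the address and the list of servers that were asked.
    """
    server = "root-server"
    asked = [server]
    while True:
        zone = SERVERS[server]
        if name in zone["records"]:
            return zone["records"][name], asked
        referral = next(
            (target for suffix, target in zone["delegations"].items()
             if name == suffix or name.endswith("." + suffix)),
            None,
        )
        if referral is None:
            raise LookupError(f"{server} knows nothing about {name}")
        server = referral
        asked.append(server)

class CachingForwarder:
    """Answers from its cache while an entry is younger than the TTL."""

    def __init__(self, ttl=3600.0):
        self.ttl = ttl
        self.cache = {}            # name -> (address, expiry time)
        self.upstream_queries = 0

    def lookup(self, name: str, now: float) -> str:
        hit = self.cache.get(name)
        if hit and now < hit[1]:
            return hit[0]
        self.upstream_queries += 1
        address, _ = resolve(name)
        self.cache[name] = (address, now + self.ttl)
        return address

print(resolve("www.tspi.at."))
forwarder = CachingForwarder()
forwarder.lookup("www.tspi.at.", now=0.0)
forwarder.lookup("www.tspi.at.", now=10.0)   # served from the cache
print(forwarder.upstream_queries)
```

The cache is also why changes to a zone only become visible after the old entries have expired.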
Since the database is hierarchical it's managed by different hierarchical entities.
One usually has to pay for usage of domain names.
The DNS can contain additional information besides just resolving to IP addresses.
For example one can refer to mail servers, keep information about DNSSEC signatures,
keep information about used public keys for OpenPGP, publish information about
servers being responsible for specific services (heavily used for XMPP for example),
add telephone numbers - there is also a zone that resolves telephone numbers to
SIP accounts to aid transition to voice over IP networks - etc.
The world wide web
So now one knows what the Internet is - but why haven't webpages, e-mail, etc. been
mentioned? Because these are technologies built on top of the Internet. Basically
the world wide web is a simple concept. It consists of a markup language called
the hypertext markup language (HTML) as well as an address scheme called uniform
resource locator (URL). Information is exchanged via a protocol called hypertext
transfer protocol (HTTP).
The idea again is pretty simple. Anyone who wants to publish content runs their
own servers that are reachable via the Internet. One then publishes some documents
written in HTML on one's server and everyone requesting a document gets it
sent by the server, unless one wants to use some kind of paywall or authentication.
On the local system resources are specified by a path-like string that often
maps to a filesystem structure (for example /directoryA/B/filename) - but this
is simply an identifier. Whenever a system connects to the webserver and asks
for /directoryA/B/filename the webserver transmits its answer to the client.
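What such a request looks like on the wire can be sketched with Python's standard library. The sketch only builds the text of a plain HTTP/1.1 GET request from a URL and does not open a connection; the host www.example.com and the helper name build_get_request are placeholders.

```python
from urllib.parse import urlsplit

def build_get_request(url: str):
    """Split a URL and build the matching HTTP/1.1 GET request.

    Returns (host to connect to, TCP port, raw request bytes).
    """
    parts = urlsplit(url)
    if parts.scheme != "http" or not parts.hostname:
        raise ValueError("this sketch only handles plain http:// URLs")
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    port = parts.port or 80
    host_header = parts.hostname if parts.port is None else f"{parts.hostname}:{port}"
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host_header}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    return parts.hostname, port, request.encode("ascii")

host, port, raw = build_get_request("http://www.example.com/directoryA/B/filename")
print(host, port)
print(raw.decode("ascii"))
```

The server only sees the path-like identifier and the Host header; whether that maps to a file or to something generated on the fly is entirely up to the server.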
Note that this means that webpages are not simply out there for anyone to reach but
are kept in a public area and handed to requesters on purpose and explicitly by
the hosting servers - usually the content owners. This is in contrast to broadcast
systems like television where the provider simply sends out a stream of data
to all customers and they decide what they want to receive or decode - which is
an argument whenever politicians demand flat TV fees for Internet connections:
it simply makes no sense.
The web part of the WWW is built by hyperlinks. Each HTML document can contain
metadata called links - everyone knows them from reading a webpage. These are
simply references to some other resource on any other server worldwide.
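To show that links really are just references embedded in the document, here is a small Python sketch that collects them with the standard library HTML parser and resolves relative references against the address of the page. The sample document and the example.com and example.org addresses are placeholders.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(html: str, base_url: str):
    """Return all link targets as absolute URLs."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return [urljoin(base_url, link) for link in collector.links]

page = (
    '<p>See <a href="http://example.org/a">another server</a>, '
    '<a name="anchor">no link here</a> and '
    '<a href="/b">a local page</a>.</p>'
)
print(extract_links(page, "http://www.example.com/directoryA/B/filename"))
```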
That's basically it.