Guten Abent everyone as the Germans say. Today we are talking about computer science. So I hope you enjoy. Also if you want to listen to these videos on Spotify or just support the channel you can join my Patreon link is in the description. Also make sure you are subscribed. And without a further ado let's begin. Somewhere around 4,000 years ago, a person in ancient Sumer sat down and picked up a wooden frame strung With beads. They weren't trying to build the future. They were just trying to count grain, maybe livestock, maybe debt, because even in

4,000 B.CE, somebody owed somebody else something. And keeping track of that by memory alone was a nightmare. That device was the abacus. And while it looks nothing like the thing sitting on your desk or in your pocket right now, it was doing something that every computer in history has also done. It Was turning a messy, complicated thing, numbers, into something physical, something you could actually move around and look at and trust. That urge, that specific human itch to take an abstract idea and make it concrete and reliable and fast. That itch is the whole story

of computer science. Everything else is just the details of how long that itch took to scratch. So Let's start at the beginning. Not the beginning of computers, but the beginning of the idea of computers because those two things are separated by about a century and that gap is where the whole field was actually born. For most of recorded history, computing was a job title when you needed large calculations done for astronomy, for navigation, for building a cathedral, for figuring out artillery trajectories. You hired people called computers, rooms full of them. They sat in rows and

passed numbers to each other one step at a time, checking each other's work. This wasn't metaphor. The original computers were human. The problem was obvious. Humans make mistakes. Humans get tired. Humans get bored halfway through the 900th repetitive edition. and write down the wrong digit. And then an artillery shell lands 200 m from where it was supposed to and Someone has a very bad day. This was the problem Charles Babage was staring at in the early 1800s. Babage was a British mathematician, wildly eccentric. One biographer called him an irassible genius. He was absolutely furious about

errors in mathematical tables. He'd spent years using those tables and he kept finding mistakes and the mistakes kept mattering and he wanted to fix it permanently. His Solution which he started designing around 1822 was a machine he called the difference engine. It would use a series of meshing gears to calculate mathematical tables automatically printing its results directly onto paper so that human transcription errors couldn't creep in. The design was beautiful. The execution was a disaster. Babbage kept redesigning it. The costs spiraled. The government Funding dried up and the difference engine was never actually finished in

his lifetime. But while he was still fighting to build the difference engine, Babage started designing something far more ambitious. He called it the analytical engine. The difference engine was a calculator. The analytical engine was something else entirely. It would have had a store, what we'd now call memory, capable of holding 1,000 numbers, each 50 digits Long. It would have had a mill, what we'd now call a processor, to perform operations on those numbers. It would accept input through punched cards, an idea borrowed from the jakard loom that was already weaving complex silk patterns using holes

in cards to control which threads were raised. And it would have been able to execute conditional branches and loops, meaning it could make decisions and repeat operations based on results. In other words, Babage Had designed a generalurpose programmable mechanical computer in the 1830s, a 100red years before anything like it would actually get built. He never finished the analytical engine either. But something important happened around it. A woman named Ada Love Lace, born Ada Byron, daughter of the poet Lord Byron, and genuinely extraordinary, became fascinated with Babage's designs. In 1843, She translated an Italian mathematician's paper

about the analytical engine and then she added her own notes. Those notes were longer than the original paper. In them, she wrote what many people consider the first computer program ever, a stepbystep procedure for calculating Bernoli numbers using the analytical engine. She also wrote something even more interesting. She wrote that the analytical engine Could in principle do anything that could be expressed as a set of operations on symbols. Not just numbers, but any kind of symbol at all. Music, letters, patterns. She saw that the machine wasn't really about arithmetic. It was about manipulation of symbols

according to rules. That was the seed of everything. Neither Babbage nor Lace got to see their ideas built. The engineering of the time just wasn't precise enough. The Gears needed tolerances that Victorian era manufacturing couldn't reliably produce. The project lived in notebooks and in a handful of partial prototypes and then it mostly got forgotten. The world moved on. The industrial revolution kept roaring and mathematicians kept doing mathematics. While engineers were failing to build Babage's machine, mathematicians were quietly doing something that would turn out to matter enormously. They were trying to put logic itself on a

rigorous footing. In 1847, a self-taught English mathematician named George Bull published a book called the mathematical analysis of logic. In it he argued that logical reasoning, the kind of thinking that lets you say if this A and D that are true, then the other thing is also true could be expressed using algebra. The core idea was this. Reduce everything to two states, true or false, one or zero. And then define operations A and D O R N O T that combine those states in predictable ways. A and D means both inputs must be true for

the output to be true. O R means at least one input must be true. N O T flips whatever you have. That sounds almost insultingly simple. But what Bul had done was find a way to make logic mechanical. If you could build a physical system that could be in two states and combine those systems in the right ways, you could automate Logical reasoning. For decades, Boolean algebra sat in the math journals looking interesting but not obviously useful. That changed in 1937 when a 21-year-old MIT master student named Claude Shannon wrote his thesis. Shannon noticed that

electrical circuits with switches could be in two states, open or closed, conducting or not. He noticed that you could arrange switches in series to get A N D behavior in parallel to get O R Behavior and use an inverter to get N O. And he proved that any logical or arithmetic operation could be built out of combinations of these arrangements. This was the moment when Boolean algebra became engineering. Shannon had shown exactly how to turn abstract logical rules into physical circuits. He'd built the bridge between math and machines. And meanwhile, a German engineer named Conrad

Zuz was actually building things. Working in his parents' Living room in Berlin in the late 1930s, Zuza built the Z1, a mechanical binary calculator that he programmed using punched film strips. It was unreliable and slow. But Souza understood something crucial. Binary was the right choice for machines because binary states on and off were easy for physical components to distinguish reliably. A system that needs to tell the difference between exactly two states is vastly more robust than one that needs to distinguish Between say 10. The Z3 completed by zoos in 1941 was the first fully functional

programmable computer. It processed binary numbers, read instructions from punched tape, and performed floating point arithmetic. It was destroyed in a bombing raid on Berlin in 1943. By then, on the other side of the world, a different war need was driving a different machine. The United States Army needed to calculate artillery firing tables. The tables mapped out how far a shell would travel given different launch angles, muzzle velocities, and atmospheric conditions. Calculating one table required around 750 separate differential equations. A human computer working a full day could calculate one trajectory. There were thousands of trajectories needed

per table. The solution was ANAC, The electronic numerical integrator and computer designed by John Mockley and J. Presper Eert at the University of Pennsylvania. Construction beginning in 1943 and completing in 1945. Eniac was enormous. It occupied an entire room. It used around 18,000 vacuum tubes, electronic components that could switch current on or off, like Shannon's relay circuits, but much faster. It weighed around 30 tons. It Consumed something in the neighborhood of 150 kow of electricity. enough to dim the lights in the surrounding neighborhood. When it powered up, it could perform around 5,000 additions per second,

which was roughly a,000 times faster than any mechanical machine before it. ENIAC also had a problem. It wasn't truly programmable in the modern sense. To change what it was computing, you had to Physically rewire it. The program was the machine's physical configuration, not something stored inside it. Changing tasks took days. The solution to this was already being thought through before Inyak was even finished. Mathematician John Fon Noyman wrote a report in 1945 describing what he called the stored program concept. Instead of wiring the program into the machine, store it in the same memory that stores

the data. Then the machine can read the program, execute it, and even modify it. all at electronic speed. This architecture, a processor that operates on data and a shared memory that holds both the data and the instructions telling the processor what to do with it is what we now call the vonoyman architecture. And it is with some modifications. The architecture of every conventional computer built since the first machines to actually implement It were built in Britain. The smallcale experimental machine at Manchester nicknamed Baby ran its first program on June 21st, 1948. It stored both data

and instructions in the same memory using cathode rate tubes. And it could actually search for the highest factor of a number by running through possibilities and updating its own memory with results. It took 52 minutes to run, but It ran all by itself, reading its own instructions, doing its own thing. Something had changed. The computer was no longer a calculator that humans aimed at a specific problem. It was a generalurpose symbol processing machine which meant the question was no longer can we build a machine to solve this specific calculation. The question had become something deeper

and stranger. What can any machine in principle Compute? That question had actually been asked and mostly answered a full decade before Enyac existed by a 24year-old British mathematician named Alan Turing. The context was a crisis in mathematics. In the late 1800s and early 1900s, mathematicians had been working to put all of mathematics on a rigorous formal foundation, trying to reduce everything to a set of Axioms from which all true mathematical statements could be derived. Then in 1931, Kurt Good dropped the bomb on that project. His incompleteness theorems showed that any formal system powerful enough to

describe basic arithmetic must contain true statements that cannot be proven within that system. Mathematics could not be made complete and consistent and formal all at the same time. A related question posed by The mathematician David Hilbert was whether there existed a general mechanical procedure, an algorithm that could take any mathematical statement and decide yes or no whether it was provable. Hilbert called this the ant chaidunks problem, the decision problem. In 1936, Alan Turing answered it. But to answer it, he first had to define what computation by a mechanical procedure even meant. And his definition was

extraordinary. He imagined an abstract machine, an infinitely long tape divided into squares, each square either blank or containing a symbol. A readright head moves along the tape, reading the current symbol, consulting a table of rules to decide what to write and which direction to move and updating its own internal state accordingly. That's the whole thing. That's a touring machine. What Turing then showed was that his Simple imaginary machine could compute anything that could be computed by any mechanical procedure whatsoever. And he showed something else. There exist problems that no touring machine can solve. Specifically, you

cannot build a touring machine that reads the description of another touring machine and reliably determines whether that machine will eventually halt or will loop forever. This is the halting problem and its undecidability Has a profound implication. There are questions you can ask of a computer. Clear and well-posed questions that no computer program can ever answer for all possible inputs. No matter how clever you are, no matter how much faster the processors get, some things are just outside the reach of mechanical computation. Full stop. Turring's proof used a beautiful trick, self reference. Suppose he said that

you had a machine H That could solve the halting problem. Feed it any program and any input and it correctly says halts or loops forever. Now build a new machine C that runs H on itself. If H says C halts, make C loop forever. If H says C loops forever, make C halt immediately. Now ask, what does H say about C running C? Either answer leads to a contradiction. So H cannot exist. The same flavor of self-referential paradox, like the liars paradox, like Goodles incompleteness, shows up everywhere at the foundations of logic and computation. Turing

had found the boundary of what machines could do before machines existed to test it against. Meanwhile, Alonzo Church at Princeton was arriving at equivalent results through a completely different route, the lambda calculus, a formal system for defining functions and their application. Church and Touring showed independently And then showed to each other that their systems were equivalent. Anything computable in one was computable in the other. From this grew the church touring thesis, the claim that these formal models capture everything that can be computed by any physical device following a rulebased procedure. It's a thesis, not a

theorem. It can be formally proven because there's no formal definition of any physical device. But it has held up For 90 years against every challenge. Every computer language ever written from assembly to Python to whatever you use is during complete meaning it can compute anything a juring machine can compute. So by 1945 the theoretical foundations were mostly in place. The math said here is what computation is here is what it can do and here is what it cannot do no matter what. The engineer said we have giant rooms full of vacuum tubes that can Actually

do this slowly and expensively. The gap between those two facts was where the next 50 years of computer science would live. The vacuum tube was the critical physical component of early computers and it had a critical physical problem. A vacuum tube is a glass envelope containing electrodes in a near vacuum. It can amplify signals and switch current on and off, but it generates heat. It burns out. ENIC 18,000 tubes failed at a rate of about One every 2 days. Most of the time, those early machines were down. Someone was crawling through them with a flashlight

looking for the dead, too. The solution arrived in 1947 at Bell Laboratories in New Jersey where a team led by William Shockley and including John Bardin and Walter Bratain demonstrated the transistor. A transistor does the same job as a vacuum tube. It switches and amplifies electrical signals, but does it with a Tiny piece of semiconductor material. No vacuum required, no hot filaments burning out. It was smaller, fester, more reliable, and used festly less power. To understand what the transistor actually does, think of it this way. A transistor has three terminals. Two of them are the

main conducting path like the two ends of a wire. The third terminal is the control input. When you apply a small voltage to the control terminal, it changes the electrical Properties of the semiconductor material between the other two terminals, either allowing current to flow or blocking it. An electrically controlled switch. And because it's a switch with two states, conducting or not, it maps perfectly onto binary logic. Voltage is one, no voltage is zero. Wire up enough transistors according to Shannon's rules and you can do Boolean logic. Wire up boolean logic in the right combinations and

you can do Arithmetic. Wire up arithmetic and you can compute. The transistor went into commercial production in the early 1950s and computers shrank from roomsized to cabinet sized. Then in 1958, Jack Kilby at Texas Instruments and Robert Noise at Fairchild Semiconductor independently developed the integrated circuit, a way to put multiple transistors along with the resistors And capacitors that connect them onto a single piece of semiconductor material silicon. Now instead of one transistor on its own, you could have dozens on one chip, then hundreds, then thousands. The transistors got smaller and the chips got more powerful.

And in 1965, a man named Gordon Moore noticed a pattern. The number of transistors on a Chip was roughly doubling every year or two. He predicted this would continue. Moore's law, as it came to be called, held for about 50 years. In 1971, Intel's first commercial microprocessor, the 4004, packed 2300 transistors onto a chip the size of a fingernail. By 2020, a modern processor might have 50 billion transistors on a chip the size of your thumb. Each Transistor is now just a few nanometers across. Smaller than the wavelength of light, smaller than many viruses. The

entire computational revolution of the last six decades flows from that one physical fact. Transistors got smaller and smaller and smaller. And every time they did, computers got faster and cheaper and smaller. So now you have a chip with billions of transistors doing boolean logic. Let's actually trace how that turns into something that feels Like thinking. The transistors are wired into logic gates. A logic gate takes one or two binary inputs and produces one binary output. According to a fixed rule, an A N D gate outputs one only if both its inputs are one. An O

gate outputs one if either input is one. A N O T gate flips its input. 1 becomes zero, zero becomes 1. A nan gate is a n and d followed by n o t. It outputs zero only when both inputs are one and one in all other cases. NAND gates turn out to be special because any other logic gate can be built from NAND gates alone. You can build an A and D from two NANs. You can build an O from three NANs. You can build an OT from a NAND with both inputs connected together.

This means a NAND gate is computationally complete. You could build an entire computer using nothing but NAND gates wired together in different patterns. Combine logic gates In particular ways and you get circuits that do useful things. Two gates can make a half adder. a circuit that adds two one bit numbers and produces a sum bit and a carry bit. Chain half adders together and you get a full adder, a circuit that adds two multi-bit binary numbers. Chain full adders together and you have an arithmetic unit. Add comparison circuits that can tell if one number is

larger than another. at circuits that can shift bits left or Right, which in binary corresponds to multiplying or dividing by two. Wire all of that together and you have something that can add, subtract, multiply, divide, and compare numbers, which is everything a processor needs to actually compute. Memory is a different trick. A flip-flop is a circuit made from two NAND gates in a particular feedback configuration. The output of each gate feeds into an input of the other. This circuit is Bestable. It's stable in either of two states and it stays in that state until you

send it a signal to change. A flip-flop can hold one bit of information. Put eight flip-flops together and you have one bite. Wire millions of them up with addressing circuits that can select any individual bite. And you have RAM, random access memory where you can read or write any stored value in roughly equal time. The CPU is the orchestrator Of all of this. It contains that arithmetic unit, the registers, small ultra fast memory locations right inside the processor. And the control unit, the control unit has one job. Fetch the next instruction from memory. Decode what

it means. execute it and then fetch the next one. This cycle fetch, decode, execute, repeats billions of times per second in a modern processor. Each clock tick a new instruction. Instructions are just numbers, specific binary patterns that the control unit knows how to decode. Every processor has a fixed vocabulary of instructions it understands. The instruction might say load the value at memory address X into register A or add the value in register B to register A or if register A is zero jump to memory address Y and start executing from there. That last one, the

conditional jump is everything. It's how A machine makes a decision. If this, do that. So when you write a program in Python and you use an if statement, somewhere at the bottom of all the abstractions, a conditional jump instruction is checking a bit in a register and either moving to the next instruction or skipping ahead. The sophistication of every piece of software ever written ultimately rests on that one lowlevel operation. Now, here's where things get complicated in an interesting way. The gap between billions of transistors doing simple logic operations and something like a word processor

or a video game is enormous. How do you bridge it? Layers. The whole trick of software engineering is layers of abstraction. Each one hiding the complexity of the layer below and presenting something more useful to the layer above. The lowest layer is machine code. Raw binary Instructions that the processor understands directly. Nobody writes machine code by hand anymore. Not really. Above machine code is assembly language where each instruction has a human readable name instead of a binary number. An assembler program translates assembly into machine code. Above assembly are higher level languages. The first of these

was Forran developed at IBM by John Bakis's team Between 1954 and 1957. Oran, short for formula translation, let scientists write mathematical formulas in something resembling algebra and a compiler program would translate that into machine code automatically. Before writing a program meant writing machine instructions one by one. After FORTRAN, a single line like y = a * x^ 2 + b * x + c would get translated into dozens of machine instructions automatically. Around the same time, a remarkable woman named Grace Hopper was working on the compiler concept at the Naval Reserve and later at Remington

Rand. Hopper built some of the earliest compilers and pushed hard for the idea that programming languages should be readable by humans, not just experts in binary. She helped develop Cobalt in 1959, which was designed so that business programs could be read almost like English. Upper is also perhaps Apocryphily associated with the term bug. A moth was found trapped in a relay of the Harvard Mark 2 computer in 1947 causing a malfunction. And Hopper's team taped it into the log book and wrote, "First actual case of bug being found. Languages kept evolving. Lisp arrived in 1958

from John McCarthy's AI research at MIT. Built around the lambda calculus and bringing with it ideas like recursion and symbolic computation that would echo Through decades of programming language design. Algo in 1960 introduced structured programming concepts. The idea of organizing code into blocks with clear beginnings and endings rather than arbitrary jumps. C was developed at Bell Labs in the early '7s by Dennis Richie and it struck a balance between expressiveness and closeness to the machine that made it enormously influential. Most of the operating systems And many of the languages you interact with today trace lineage

back to see. What all these languages share is the attempt to let humans express computational ideas in terms that are natural to humans while still giving a computer enough information to actually execute those ideas. The compiler or interpreter is the translator standing between the two. An operating system is the layer that sits between your hardware and your programs. And it has one fundamental problem to solve. Multiple things want to use the same resources. The CPU, the memory, the disk, and they can't all have everything all the time. The operating system is the referee. The idea

of multi-programming, keeping multiple programs in memory at once and switching between them, emerged in the early 1960s. IBM's operating system for the 360 product line released in 1964 was one of the first serious attempts to create a generalpurpose operating system. It was also famously delayed and over budget prompting Fred Brooks to write the mythical man month a book about software project management that is still being read and argued about today. The Unix operating system developed at Bell Labs starting around 1969 By Ken Thompson and Dennis Richie took a very different approach. Instead of one monolithic

program trying to do everything, Unix was built from small simple tools that each did one thing well. and that could be combined together by piping the output of one program into the input of another. This composability, this do one thing and do it well philosophy proved so powerful that Unix descendants are now everywhere. Linux, Mac OS, Android, and IOS all trace their lineage to Unix. Inside an operating system, the CPU scheduler decides which process gets to use the processor and for how long. The simplest approach, first come first served, just runs each program until it

finishes before starting the next. This works fine if every program takes roughly the same amount of time. It doesn't work at all if one program hogs the CPU for minutes while others wait. Roundrobin scheduling gives each process A fixed time slot called the quantum and then switches to the next one when the slot expires. Every process gets regular turns. Nobody starves. Modern schedulers are far more sophisticated considering priorities, how long processes have been waiting, whether they're doing CPU work or waiting for input and output, and many other factors. But the core tension is always the

same. Be fair to all processes while making Efficient use of the processor. Memory management has its own elegance. Early computers ran one program at a time and gave it all the memory. Then multirogramming demanded that multiple programs share memory which created a protection problem. One program's memory shouldn't be accessible to another or they'd corrupt each other. The solution that modern systems use is virtual memory. Each process thinks it has access to a large Contiguous block of memory starting at address zero. But that's an illusion. The operating system with help from a dedicated hardware component translates

those virtual addresses into actual physical locations in RAM. The translation uses a page table, a mapping from virtual pages to physical frames. And when a process tries to access a virtual address, that hardware looks up the page table and finds where that page actually lives. This creates a beautiful property. Programs are isolated from each other. They each see their own virtual memory space and can't directly touch each others. A crash in one program doesn't automatically corrupt another. And because the page table is maintained by the operating system, the OS can swap pages out to disk

when RAM fills up, giving programs more virtual memory than physically exists at the cost of speed. Since disk access is dramatically slower Than RAM access, this is virtual memory and it's why your computer can run more programs simultaneously than would fit in RAM, just slowly. While operating systems were being built and refined, computer scientists were grappling with a problem that shows up the moment you have more than a handful of data. How do you organize it? And how do you find what you're looking for? This is the domain of data structures And algorithms. And it's

where computer science starts feeling like a blend of mathematics and puzzle solving. An algorithm is a precise step-by-step procedure for solving a problem. The word predates computers by centuries. It comes from the name of the 9th century Persian mathematician Alqarismi whose book on arithmetic was translated into Latin and profoundly influenced European mathematics. But as computers arrived, algorithm Design became a central discipline in its own right. The simplest search algorithm is linear search. Start at the beginning of a list. Check each item one by one and stop when you find what you're looking for. If you

have a thousand items, you might need to check all thousand. If you have a million up to a million checks, the work grows linearly with the size of the input. Binary search is much cleverer. It requires that the list be sorted. Then You look at the middle item. If what you're looking for is smaller, it must be in the left half. Throw away the right half and repeat. If it's larger, it's in the right half. Throw away the left half and repeat. Each step cuts the remaining possibilities in half. To search a million items, you

need only about 20 comparisons. That's not linear growth. It's logarithmic. The difference between linear and Logarithmic is the difference between checking a million items and checking 20. That difference gets more dramatic as the list grows. But to use binary search you need sorted data which means sorting algorithms matter enormously. Bubble sort is the simplest. Go through the list. Swap adjacent elements that are in the wrong order. Repeat until no swaps are needed. It works. It's also slow. On average, it takes time proportional to the number of Elements squared. Double the list size and bubble sort

takes four times as long. For large data sets, this becomes impractical fast. Merge sort takes a different approach. Split the list in half. Sort each half. Merge the two sorted halves back together by repeatedly taking the smaller of the two front elements. The merge step is efficient and because you keep having the list, the total work grows as the list size times the Logarithm of the list size, which is faster than quadratic. Python's built-in sort uses a refined version called TIM sort, which is a hybrid of merge sort and insertion sort and exploits patterns in

real world data to run faster than the theoretical worst case most of the time. Quicksort, invented by Tony in 1959, chooses a pivot element, rearranges the list so everything smaller than the pivot is before it, and everything Larger is after it, and then recursively sorts each side. On average, quick sort is very fast. In its worst case, if you keep choosing bad pivots, it degrades to quadratic time. Modern implementations deal with this through randomized pivot selection. The efficiency of an algorithm is described using big O notation. O of N means linear time meaning the work

grows proportionally with input size. O of N log N means the slightly More than linear time of merge sort. O of N squared means quadratic time. O of log N means logarithmic time. the beautiful efficiency of binary search. Bo notation abstracts away the constant factors and focuses on how growth scales. A constant time operation O of one takes the same amount of work regardless of input size. That's the gold standard. Data structures are the ways you organize data to make certain operations Fast. An array stores items in consecutive memory locations. Accessing any element by its

index is O of one. You just calculate its address directly. But inserting into the middle requires shifting everything after it which is O of N. A linked list stores items in nodes each containing data and a pointer to the next node. Insertion is fast. You just update a pointer. But finding the nth item requires walking from the beginning through n minus one Nodes which is O of N. A hash table is one of the most useful structures. It takes your data's key, maybe a name or an ID, runs it through a hash function that converts

it to a number and uses that number as an index into an array. Lookup, insertion, and deletion are all O of one on average. The catch is collisions. Two different keys might hash to the same index. Good hash tables handle this gracefully, but a bad hash function that Produces many collisions can degrade performance significantly. Trees organize data hierarchically. A binary search tree keeps every left child smaller than its parent and every right child larger. Searching works like binary search. At each node, you go left or right, depending on what you're looking for. The tree stays

efficient as long as it stays balanced. If you always insert sorted data into a naive binary search tree, it degrades Into a linked list. Self-balancing trees like AVL trees and red black trees automatically rearrange themselves to stay balanced. Graphs are the most general structure. A graph is just a set of nodes connected by edges. Road networks are graphs. Social networks are graphs. The internet is a graph. Dependency relationships between software packages are graphs. Many of the most important problems in computer science are fundamentally graph problems. Finding the shortest path. Determining if two nodes are connected.

Detecting cycles. Now we get to a question that changed everything when it was asked and that still keeps theoreticians awake at night. To get there, you first need to know what computer scientists mean by polomial time. An algorithm runs in polomial time when the amount of work it needs grows At a rate that stays tied to the size of the input in a manageable way. double the input size and maybe the work doubles or quadruples or goes up eight-fold. The growth is bounded by some power of the input size. So as the input gets bigger,

the algorithm stays usable. Most of the algorithms we've talked about so far, sorting, searching, routing packets are polomial. They scale. Then there are problems where no one has found a polomial time solution. The work doesn't just double or quadruple. When the input grows, it explodes. The traveling salesman problem asks for the shortest route through a set of cities. Visiting each once with 10 cities, a brute force approach checks around 3 and 12 million possible routes. With 20 cities, it's checking over a trillion. Add a few more cities and even the fastest computer on the planet

would take longer than the age of the universe To check every option. That's nonpolial territory. In 1971, Steven Cook published a paper that noticed something strange about this divide. He was looking at satisfiability problems. Questions like given a set of logical conditions, is there an assignment of true false values to the variables that makes all the conditions satisfied simultaneously? Finding the answer from scratch looks Nonpolial. But here's the twist. If someone hands you a proposed answer, checking whether it's correct is quick polomial. Even cook showed that satisfiability sat at the center of an enormous web

of other problems with that same split personality. And he proved that if you could solve satisfiability in polomial time, you could solve all of them in polomial time because any one of them can be Translated into a satisfiability question. The following year, Richard Karp showed that dozens of other well-known problems, including the traveling salesman problem, graph coloring, and the knapsack problem, all lived in that same category. Every single one of them, finding a solution seems to require non-polomial time, but checking a proposed solution takes polomial time. This raised a question that sounds deceptively simple. Are

Those two things actually different? Is there a genuine wall between problems that are polomial to solve and problems that are only polomial to check? Or is that wall an illusion? And someday someone will find a polomial time algorithm for all of them. Nobody knows. It is the most famous unsolved problem in computer science and mathematics. One of the seven millennium prize problems with a million dollar prize attached. Most researchers strongly suspect the Wall is real. That some problems are genuinely nonpolial to solve even when they are polomial to check. But nobody has proven it. The

reason this question keeps everyone up at night is what would happen if the wall turned out to be an illusion. Almost all modern encryption relies on the assumption that certain mathematical problems like factoring large numbers into their prime factors are nonpolial to solve even though they are polomial to check. If that wall came Down, those problems might suddenly become polomial and every form of digital security built on that assumption, your bank, your messages, your login passwords would potentially be breakable. Cryptography as we know it would need to be completely rebuilt. And on the other side,

if the wall came down, then all those optimization problems that currently require nonpolomial time could Suddenly be solved in polomial time. drug discovery, supply chain optimization, protein folding, machine learning, training, many of the hardest problems humans face would become much more tractable. Either direction this goes resolves something enormous. And yet the question has sat there open for over 50 years, resisting every approach. This is not unusual in mathematics, but it's particularly striking in a field that Prides itself on building things that actually work. Computer science didn't just develop in a vacuum inside research labs. It

kept bumping into the real world, and the real world kept changing what computer scientists cared about. One of those collisions happened in 1969 when ARPANET came online. The Advanced Research Projects Agency network was funded by the US Department of Defense and was designed to link computers at universities and research institutions So they could share resources and communicate. It was specifically designed to be resilient. Unlike a traditional telephone network with central switching points that would fail if bombed, Arponet routed data in packets that could take different paths through the network depending on what was available. The

packet switching idea had been developed independently by Paul Baron at Rand Corporation and Donald Davies in Britain. The concept is this. Instead of establishing a dedicated circuit between two communicating machines the way a telephone call works, you break the message into chunks, packets. Label each packet with a destination address and release them into the network. Each packet finds its own way through available routers. The packets might arrive out of order. Some might get lost. The receiving end reassembles them into the original message. In 1974, Vincent Surf and Bob Khan published the paper that became the

foundation of how the modern internet works. They described a new protocol, a set of rules called TCP IP. IP the internet protocol handles addressing and routing. Each device on the network has a unique IP address and IP takes care of sending packets to the right address. TCP the transmission control protocol handles reliability. It breaks a message into packets, Numbers them, ensures they all arrive, requests retransmission of any that are lost and reassembles them in the correct order. TCP IP is deliberately divided into layers. The application layer contains protocols like HTTP for web browsing, SMTP for

email, and file transfer protocols for moving files around. The transport layer handles reliability or speed depending on the use case. The internet layer handles routing. The network access layer handles the physical connection, the actual electrons or photons moving through wires or air. Each layer only talks to the layer immediately above and below it. HTTP doesn't care about IP addresses. IP doesn't care about what application is running. This separation of concerns means you can change one layer without breaking the others. When wireless networking was invented, you just added it as a new Option at the network

access layer and everything above it, HTTP, email, everything kept working exactly the same way. Arponet became the internet. The internet grew slowly through the 70s and 80s, mostly as a tool for academics and researchers. Then in 1991, a British physicist named Tim Berners Lee working at CERN proposed and built something he called the worldwide web. The web is a layer on top of the Internet. A system of documents, web pages linked together by hyperlinks served using a protocol called HTTP and addressed using web addresses. The web made the internet usable for ordinary people in a

way that raw email and file transfers never had. The first web browser available to the public was Mosaic in 1993, which added the ability to display images in line with text. Within 2 years, the web had gone from a physicist Project to a global phenomenon. Within a decade, it had become a commercial infrastructure generating trillions of dollars. The same years that the internet was expanding brought another change. Computers got personal through the 1960s and most of the 70s. Computers were institutional machines. A university might have one. A company might have one. You booked time on

it. You didn't own it. The idea that an individual Person might own a computer was not taken seriously by most of the industry. Then in 1975, a small company called MITS, shipped the Altter 8800, a kit computer based on Intel's new 80080 microprocessor. It came with no keyboard, no screen, no storage. You programmed it by flipping switches on the front panel and read output from Blinking lights. It was by any reasonable measure nearly useless and it sold out immediately. The Alire told a very specific group of people, people who'd been dreaming about owning their own

computers that it was actually happening. Among those people was a young programmer named Bill Gates who with Paul Allen wrote a basic interpreter for the Alier and sold it back to MIT And Steve Wnjak who built the Apple 1 in 1976 and then the Apple 2 in 1977. a computer that actually worked straight out of the box with a keyboard and the ability to connect to a TV as a display. IBM entered the personal computer market in 1981 with the IBM PC. IBM made the crucial decision to build the PC around available off-the-shelf components, Intel's

8088 Processor and a disc operating system licensed from Microsoft. Because IBM used standard components and published its technical specifications, other companies could build compatible machines. The Clone Wars began and within a few years hundreds of companies were making IBM compatible PCs. The price dropped. The market exploded. Apple took a different direction. In 1984, the Macintosh shipped with something Most people had never used, a graphical user interface. Instead of typing text commands, you moved the mouse pointer around and clicked on icons and windows. The ideas for it had actually come from Xerox Park, the research division

of Xerox, where researchers had built the Alto, an early personal computer with a graphical user interface as early as 1973. Apple licensed and adapted those ideas. Microsoft followed with Windows, which eventually ran on the clone hardware that was becoming dominant. The graphical user interface changed. Who could use computers? Suddenly, you didn't need to know command syntax. You could see your files as little pictures and drag them around. The computer became a tool you could pick up with minimal training. This was the inflection point that turned computers from specialist Instruments into everyday appliances. Once computers were

everywhere, the next problem was making them talk to each other and making that conversation useful. The internet gave you the plumbing. The web gave you documents, but the web of the 1990s was mostly static pages that a human had written and uploaded, sitting there waiting to be read. The question that drove the next phase was, what if the computer could generate Pages dynamically responding to who was asking and what they'd asked before? This led to the development of databases, web servers, serverside programming, and eventually the whole architecture of modern web applications. A database is a

structured store of information that you can query efficiently. Relational databases which store data in tables and let you ask questions that span multiple tables had Been designed by Edgar Cod at IBM in 1970. The language for querying them which people call SQL became the industry standard. When you type something into a search engine or buy something online, a web server is receiving your request, querying a database, generating an HTML page from the results and sending it back to your browser. That cycle, request, query, generate, respond, is the engine behind almost Everything you do online. Google

founded by Larry Pageige and Sergey Brin in 1998 cracked one of the central problems of the early web with millions of pages. How do you find the good ones? Their page rank algorithm used the structure of the web itself specifically which pages link to which as a measure of quality. A page linked to by many other pages, especially authoritative pages, is probably more valuable than an obscure page nobody links to. PageRank Translated this intuition into a mathematical formula involving the iigen values of a large matrix, and it worked well enough to turn a graduate school

project into one of the most profitable companies in history. All of this was still fundamentally software running instructions that programmers had explicitly written. Around the same time, Google was growing. A different approach to computation was also maturing, One that Turing himself had gestured toward in 1950. In a 1950 paper titled Computing Machinery and Intelligence, Turing proposed what he called the imitation game, now called the Touring test. as a way to think about whether a machine could be considered to think. He asked not can machines think too philosophically loaded but can a machine fool a human

into thinking it's human Through text conversation. Then he sketched several paths toward building such machine. One of them was what he called the learning machine. A machine that instead of having its behavior programmed explicitly, learns from experience. The mathematical framework for machine learning that we now use traces back to the late 1940s. Warren McCullik and Walter Pittz had proposed in 1943 A mathematical model of a neuron. A unit that takes multiple binary inputs, weights them, sums the result, and outputs one if the sum exceeds a threshold. This was a simplified model of how biological neurons

work. And you could wire these artificial neurons together into networks. Frank Rosenlat built a physical implementation of this idea in 1957, the perceptron. It was a machine that Could learn to classify patterns. You showed it examples. Here is a pattern that belongs to category A. Here is one that belongs to category B. And it adjusted its internal weights each time it got a prediction wrong. Over many examples, it learned. When Rosenblot demonstrated it on simple visual patterns, the press went a little wild. Headlines about thinking machines. Then in 1969, Marvin Minsky and Seymour Peppert Published

perceptrons, a book that proved mathematically that single layer perceptrons couldn't learn certain classes of problems. Most famously, they couldn't learn the exor function, a function that returns true when inputs differ. This seemed to show fundamental limits on neural networks and funding for neural network research largely dried up. The field entered what became known as the AI winter. The root around Minsky and Papert's critique was Multi-layer networks stacking layers of neurons so that the outputs of one layer become the inputs of the next. A twolayer network can learn XOR. More layers can learn arbitrarily complex patterns,

but training multilayer networks required a way to efficiently update all those weights based on errors. The back propagation algorithm which had been known in various forms for decades was Popularized and made practical for neural networks by a paper from David Rumlhart Jeffrey Hinton and Ronald Williams in 1986. The idea of back propagation is to run the network forward, get an output, compare it to the correct answer, measure the error, and then propagate that error signal backward through the layers, updating each weight in proportion to how much it contributed to The error. Do this over millions

of examples and the weights gradually settle into values that produce accurate predictions. This worked in principle, but in the late 80s and '90s, the available data and computing power weren't enough to train large networks reliably. Neural networks kept getting beaten by simpler methods like support vector machines on most benchmark tasks. Another winter set in or at least a sustained skepticism About whether neural networks would ever fulfill their promise. What changed in the late 2000s and early 2010s was three things arriving at roughly the same time. First, massive data sets. The internet had been generating data

at staggering scale for two decades and companies like Google and Facebook had been collecting it. Second, graphics processing units, the GPUs designed to render video games by doing millions of simple calculations in parallel, turned Out to be exceptionally well suited to training neural networks, which also require millions of simple calculations in parallel. A GPU could train a network many times faster than a CPU. Third, better algorithms and architectural ideas that addressed some of the instability that had plagued deep network training. In 2012, a neural network called AlexNet, designed by Alex Kruvski, Ilia Sutskver, and Jeffrey

Hinton, entered the ImageNet large-scale Visual recognition challenge, a competition to identify which of a thousand categories an image belonged to. AlexNet didn't just win, it crushed the competition, cutting the error rate roughly in half compared to the previous state-of-the-art. The thing ran on two GPUs. The deep learning era had arrived. Within a decade, neural networks trained on large data sets were outperforming humans at image recognition, Defeating world champions at go and chess, translating between dozens of languages, generating photorealistic images, and producing text that read like a human had written it. The techniques kept improving. Convolutional

networks for images, recurrent networks for sequences, attention mechanisms, transformers. The scale kept growing. Millions of parameters, then billions, then hundreds of billions. The language models that most people have now interacted with are the product of all of that history. Turring's theoretical framework, McCullik and Pittz's neuron model, Hinton's back propagation work, the GPU hardware from gaming, the training data from the web and scale that previous generations of researchers would have Considered impossible. While machine learning was going through its long arc toward practicality, another field of computer science was becoming critical to everyday life in a way

that most people don't notice. Cryptography. The fundamental problem of cryptography is this. You want to communicate privately over a channel that might be observed. Before 1976, Every known method for doing this required that the two parties first share a secret key through some already secure channel. If you wanted to send an encrypted message to someone you'd never met, you first had to meet them to exchange the key. This was a chicken and egg problem for electronic commerce. You can't shop online securely until you've already exchanged a key with the store and you've never met the

store. In 1976, Whitfield Diffy and Martin Helman Published a paper that broke this problem open. They describe a key exchange protocol that lets two people who have never communicated before establish a shared secret over a public channel that anyone can observe without ever sending the secret itself. The trick relies on a mathematical operation called modular exponentiation that runs in polomial time going forward but appears to require nonpolomial time to reverse. Someone watching the exchange sees partial information but cannot reconstruct the secret without solving a problem that's computationally infeasible. Then in 1977, Ron Rivst, Adi Shamir

and Leonard Adelman went further. They invented RSA, a system with two keys. A public key that anyone can know and use to encrypt messages and a private key that only the recipient holds and uses to decrypt Them. The security of RSA rests on the difficulty of factoring large numbers. Multiplying two large prime numbers is easy given only the product. Finding the original primes is believed to be extremely hard. Your browser uses RSA or its modern successors every time you visit a website that starts with HTTPS. The same mathematical structures that make encryption work also underly

digital signatures which let you prove that a message Actually came from a specific sender. and hash functions which let you verify that a file hasn't been tempered with. The lock icon in your browser is resting on several decades of mathematical research into computational hardness which brings us back to quantum computing. A quantum computer exploits properties of quantum mechanics to process information in ways that classical computers cannot efficiently Simulate. The key idea is the cubit, a quantum bit where a classical bit is always either zero or one. A cubit can be in a superposition of zero

and one. a weighted combination of both states simultaneously. When you measure a cubit, you get a definite answer zero or one with probabilities determined by the superposition. But before measurement, the cubit Explores multiple computational paths at once. Entanglement lets cubits be correlated in ways that have no classical analog. The state of one cubit can be instantaneously related to the state of another regardless of the distance between them. Interference lets quantum computations amplify paths leading to correct answers and cancel out paths leading to wrong ones. In 1994, mathematician Peter Shore discovered a Quantum algorithm that can

factor large numbers exponentially faster than the best known classical algorithm. If a large enough quantum computer running Shor's algorithm existed, it could break RSA in hours instead of the billions of years a classical computer would need. This is why quantum computing is an existential threat to current cryptographic infrastructure and why governments and companies are urgently developing postquantum Cryptography. Encryption methods based on mathematical problems that remain hard even for quantum computers. The challenge is that building a quantum computer is extraordinarily difficult. Cubits are delicate. Any interaction with the environment, heat, vibrations, electromagnetic noise causes decoherence, collapsing

the superp position and destroying the computation. Current quantum processors have dozens To hundreds to now thousands of physical cubits, but they're noisy. Errors accumulate faster than many useful computations can complete. Quantum error correction can fix this, but it requires many physical cubits to encode each logical cubit reliably. Current estimates suggest you might need 1 million physical cubits to run Shor's algorithm on cryptographically relevant numbers. Google, IBM, Microsoft, and the cluster of startups are all racing Toward this. IBM has stated they expect to deliver meaningful quantum advantage by the end of 2026. Though most outside analysts

think fully fault tolerant quantum computing is more likely a late 20s or early30s achievement. The era we're in right now is characterized by machines with enough cubits to do interesting things but too much noise to do most useful things reliably. Hybrid quantum classical algorithms Are being developed that split a problem. The parts where quantum interference helps go to the quantum processor, the rest to a conventional computer. Let's step back from the technical details for a moment and think about what it all adds up to. Computer science has given the world several things that are not

purely engineering. They're closer to new ways of thinking about problems. The idea of algorithms that you can describe A procedure precisely enough for any agent to follow it regardless of understanding why it works changes how you approach any sufficiently complex problem. Breaking a task into unambiguous steps that can be executed without judgment is a profound cognitive move. is the basis not just of programming but of manufacturing processes, surgical protocols, financial contracts, and legal codes. The idea of abstraction, hiding complexity behind Simpler interfaces runs just as deep. You don't need to know how a transistor works

to write Python. You don't need to know how Python compiles to write a web application. You don't need to know how HTTP works to use a browser. Each layer hides the complexity below it and exposes something cleaner above it. The whole of technological civilization is built from stacked abstractions. Computer science just made this Principle explicit and systematic. The idea of computational complexity that problems have intrinsic difficulty levels and that difficulty scales with input size in measurable ways has quietly changed how we think about what's achievable. Before Cook's work, if a problem seemed hard, maybe you

just needed a cleverer algorithm. After Cook's work, you could potentially prove that no polomial time algorithm could exist for a problem, ruling out not just Today's attempts, but all possible future attempts. Knowing when to stop looking for a perfect solution and start looking for a good enough one is itself a form of wisdom. The idea of information, that data can be quantified, that you can measure how much information a message contains, that there's a limit to how much you can compress data without losing it, came from Claude Shannon's 1948 Paper, a mathematical theory of communication.

Shannon showed that every source of information has an entropy, a measure of its unpredictability, and that you cannot compress data below its entropy without distortion. This underlies every compression algorithm ever built. ZIP files, MP3 audio video streaming image formats. Shannon's work also showed that you could transmit data reliably Over a noisy channel if you used error correcting codes. By adding strategic redundancy to your message, you could detect and correct errors introduced by the channel. The coding theory that grew from Shannon's paper is why your QR codes still scan even when part of the image is

damaged. Why your CDs play cleanly even if scratched. why data transmitted across millions of miles of space from a satellite arrives intact. There's a theory that's worth sitting With, one that sits at the intersection of physics and computer science. It's called the computational universe hypothesis and it goes something like this. What if the universe itself is at some level running a computation? The physicist John Wheeler spent the latter part of his career arguing for what he called it from bit. The idea that at the most fundamental level, physical reality might emerge from Information theoretic relationships.

Not that the universe is like a computer in some loose metaphorical sense, but that information might be more physically fundamental than matter or energy. This connects to some deeply puzzling facts. The laws of physics are expressable as mathematical equations and those equations can be run as algorithms. The behavior of quantum systems can be simulated by quantum computers far more efficiently than by Classical ones. Which physicist Richard Fineman argued suggests that quantum mechanics is actually computational in nature. And Steven Wolram's work on cellular automa, simple rule-based systems applied repeatedly to grids of cells, showed that extraordinarily

complex lifelike behavior could emerge from extremely simple computational rules. If these ideas have teeth, then computer science isn't just a tool for Understanding computation. It becomes a tool for understanding reality. The church touring thesis would become not just a statement about mathematical formalisms but a statement about what kinds of processes can occur in the physical universe. This remains speculative, but the fact that the question is even sensible that physicists take seriously the idea that information might be a More fundamental category than mass or charge says something about how far computer science has moved from counting

grain with an abacus. Another idea worth sitting with the question of what consciousness and computation have to do with each other. Churing's 1950 paper raised this explicitly and it's still unresolved. There are people who have believed since at least the 1950s that sufficiently sophisticated Information processing will eventually produce something that deserves to be called consciousness or understanding. This is roughly the strong AI position. There are others who believe that there's something about biological consciousness, perhaps its quantum mechanical substrate, perhaps something else entirely that can't be captured in any formal computational system. Roger Penrose argued in

the Emperor's New Mind in 1989 that human mathematical insight involves non-computable processes and that consciousness might not be reducible to any touring machine. Nobody has won this argument. The large language models of the current era are in one sense extraordinary demonstrations of how much coherent contextually Appropriate behavior can emerge from essentially pattern matching at scale. In another sense, they're clearly doing something different from what happens when a person understands something. Where exactly the line is and whether any computational system could cross it is a question that remains genuinely open. The philosopher David Chalmer's called this

the hard problem of consciousness explaining not just the functional Coralates of awareness but why there's something it's like to be a conscious entity at all. Computer science has made enormous progress on the easy problems. Building systems that can recognize faces, translate languages, play games, generate text. But the hard problem hasn't budged. Which might mean consciousness isn't computational. Or it might mean we haven't built a system complex enough yet. Or it might mean we're asking the question wrong. So Where does all of this go from here? The most immediate pressure is quantum computing. The timeline is

still somewhat unclear, but the direction isn't. At some point, likely within this decade, quantum computers will be able to break current encryption schemes. The global cryptographic infrastructure needs to be replaced with postquantum algorithms. Before that happens, the US standards body finalized its first set of postquantum cryptographic standards In 2024. The migration is already beginning, but it will take years and touch essentially every piece of worked infrastructure on the planet. At the same time, the computational demands of training large AI models are running into physical limits. Training a Frontier model takes weeks on clusters of thousands

of GPUs, consumes enormous amounts of energy, and costs hundreds of millions of dollars. This can continue for a while, but the Growth curve can sustain itself indefinitely without new approaches. Neuromorphic computing, hardware architectures modeled more closely on biological neural structures that are dramatically more energyefficient than current GPUs is one research direction. Optical computing using photons instead of electrons is another. Both are early stage but attracting serious investment. The Semiconductor industry is running out of room on the two-dimensional chip. Transistors are now so small that quantum tunneling electrons passing through barriers they classically couldn't is becoming

a source of errors. The response is to stack chips vertically, adding a third dimension. 3D chip stacking is already in production for memory chips and is moving into processors. More exotic approaches like carbon nano Tube transistors are being explored in research labs. On the software side, the defining challenge of the next decade might be verification. How do you know that a large complex AI system is behaving safely and as intended? Neural networks are notoriously hard to interpret. They can produce confident, coherent, and completely wrong outputs. They can be subtly steered by adversarial inputs that are

invisible to Human observers. As these systems get deployed in higher stakes settings, medical diagnosis, autonomous vehicles, financial systems, infrastructure control, the gap between impressive demo and verified safe becomes an engineering problem with serious consequences. The field of formal verification using mathematical proof to guarantee that a program behaves correctly has made Progress on traditional software. But neural networks are far more resistant to formal analysis because of their scale and the implicit nature of what they've learned. interpretability research trying to understand what's happening inside these systems is still in early days. And then there's the question of

whether AI systems themselves will help accelerate computer science research. Already tools like Alpha Code can write Competitive programming solutions. Already large language models are helping with literature review, hypothesis generation and debugging. The possibility that AI might help crack that question about whether polomial and nonpolial problems are truly different or discover new quantum algorithms or make progress on the hard problem of consciousness is no longer science fiction. It's just probably not happening next year. The Trajectory from a Sumerian counting frame to a quantum processor is not a straight line. It loops and doubles back and fragments.

Babbage never finished his engine. Adah Love Lace died at 36. Iniak was built to calculate artillery trajectories and ended up as the ancestor of your smartphone. The internet was designed for military resilience and became a shopping mall and a social Graph and a library and a propaganda machine and an art form. Neural networks were dismissed twice and then took over. Computer science is not a story of inevitable progress marching from primitive to sophisticated. It's a story of ideas colliding with physics, mathematics, economics, war, accident, and the particular obsessions of particular people at particular moments. Bull

wasn't thinking about switching Circuits. Shannon wasn't thinking about file compression. Turing wasn't thinking about the computers that would follow from his thought experiments. And yet, each of their ideas landed in exactly the right place to become exactly the right foundation for what came next. The field is still young. The transistor is less than 80 years old. The internet is barely 50. Large language models as we know them are less than a decade old. The foundational questions, what can be computed, how efficiently by what physical means and what it would mean for a machine to truly

understand something are still open. Every time the story seemed to be over, it wasn't. The vacuum tube was supposed to make generalurpose computers impossible to maintain. The transistor made them reliable. Silicon was supposed to run Out of room at 10 nanome. It didn't. Neural networks were supposed to be fundamentally limited. They learned to write poetry. The gray counter in Sumer picked up an abacus because counting by memory wasn't good enough anymore. That gap between what human minds need and what they can do unaded is where all of this came from. The gap hasn't closed. It's

just gotten much, much more interesting.

The Entire History Of Computer Science (For Sleep)