
The Evolution of Artificial Intelligence: A Comprehensive Analysis of Exponential Growth from 1950 to 2025
By Michael Stein
June 16, 2025
12 min read
Summary
AI has evolved rapidly over the past 75 years, from early rule-based systems to today’s powerful generative models. This post traces the major milestones that have paved the way for AI’s evolution, including:
1950s–1970s: AI’s foundations were laid by pioneers like Turing and Shannon. Early rule-based expert systems like ELIZA and MYCIN showed initial promise.
1980s–2000s: Machine learning emerged with breakthroughs like backpropagation and convolutional neural networks (LeNet). IBM’s Deep Blue beating Kasparov marked AI’s rising capability.
2007–2017: Deep learning accelerated thanks to GPUs (CUDA), IBM Watson, Siri, Google DeepMind, and GANs. AlphaGo’s victory in Go showcased AI’s complex reasoning.
2017–2025: The Transformer architecture (Attention Is All You Need) sparked today’s generative AI boom. GPT models, ChatGPT, and massive multimodal models like Gemini and LLaMA now lead the field.
Looking Ahead: AI continues exponential growth, bringing both vast opportunities and ethical challenges as we approach Artificial General Intelligence (AGI).

Source: Author
Introduction
Artificial intelligence has long captured the imagination of storytellers and futurists. Decades before AI was a reality, fictional narratives (roughly from the 1960s to the 1990s) often anticipated real technological developments—from voice-activated assistants and autonomous vehicles to AI overlords. Whereas in the mid-20th century, such depictions felt safely distant, today in 2025 we find ourselves surrounded by evolving AI tools, AI software, and AI agents that seem to outperform one another on a weekly or monthly basis. We transitioned from old-school phone operators to AI avatars conducting interviews. We moved from rudimentary character recognition to large language models, semi-intelligent chatbots, and photorealistic AI-generated images and videos that challenge our perception of facts and reality.
The trajectory of artificial intelligence development represents one of the most remarkable technological progressions in modern history. AI is being integrated into our lives at a rapid pace, often without a comprehensive understanding of its origins or of the fundamental breakthroughs, beginning in the 1950s, that have led to this moment. Understanding AI's development is challenging for the human brain because, firstly, we currently find ourselves amid rapid AI acceleration, and secondly, humans typically struggle to comprehend exponential growth patterns.
This article aims to provide an overview of AI developments, charting the field's course from early theoretical musings and rudimentary rule-based systems to deep learning, sophisticated multimodal models, and generative models. It examines key milestones and technological paradigm shifts as the field evolved through distinct phases marked by breakthrough innovations, periods of stagnation, and unprecedented acceleration—ultimately resulting in the exponential growth patterns that have shaped the AI landscape over the past 75 years.
Fundamental Basis: The Computer Era
Dr. Claude Shannon’s creation of information theory made the digital world as we know it today possible. In addition to being a mathematician and computer scientist, Shannon introduced the concept of the "bit" (the basic unit of information), digital compression, and strategies for encoding and transmitting information seamlessly between two endpoints. Shannon’s seminal paper, “A Mathematical Theory of Communication,” (Shannon, 1948) quantified information mathematically and showed how information could be transmitted effectively over communication channels, such as phone lines or wireless connections, laying the foundation for today’s internet.
In 1946, ENIAC (Electronic Numerical Integrator and Computer) was created at the University of Pennsylvania by John Mauchly and J. Presper Eckert, marking the first general-purpose, programmable electronic computer (Wikipedia, 2005). During its ten years of operational service, ENIAC reportedly performed more arithmetic calculations than all of humanity up to that point. However, although faster and more reliable than mechanical relays, its vacuum tubes were prone to failure, limiting operational time. In 1947, scientists at Bell Laboratories invented the transistor, a switch activated electrically across two electrodes in a semiconductor, which significantly reduced computer sizes and costs—from room-sized machines to personal computers—ushering in the digital era.
The Rule-Based Era and Expert Systems (1950-1979)
The genesis of artificial intelligence can be traced to Alan Turing's seminal 1950 paper, "Computing Machinery and Intelligence," which introduced the concept later known as the Turing Test (Stanford Encyclopedia, 2003; Lawrence Livermore National Laboratory, 2022). Turing's proposition fundamentally challenged the notion of machine cognition by establishing a behavioral benchmark for artificial intelligence—if a machine could engage in conversations indistinguishable from those of humans, it could be considered intelligent.
In the early 1950s, Dr. Claude Shannon created Theseus, essentially a mechanical maze-solving device employing basic concepts of artificial intelligence and machine learning. Though mechanical, Theseus illustrated early notions of intelligent perception, decision-making, memory, and adaptation, influencing subsequent AI developments.
The formal establishment of AI as a research discipline occurred during the summer of 1956 at Dartmouth College, where John McCarthy, Marvin Minsky, Nathaniel Rochester, and Claude Shannon convened what is now recognized as the founding conference of artificial intelligence (Wikipedia, 2004; History of Data Science, 2021). This gathering, funded by the Rockefeller Foundation, marked the first usage of the term "artificial intelligence" and established the foundational research agenda that would guide the field for decades (Wikipedia, 2004; History of Data Science, 2021).
Picture: Attendees of the 1956 Dartmouth Artificial Intelligence Conference. Source: The Minsky Family
The early AI systems of this era relied on rule-based architectures, explicitly programmed with logical statements and conditional operations. Frank Rosenblatt's 1957 development of the perceptron represented a significant advancement in pattern recognition, introducing a model that combines weighted inputs and adjusts those weights to produce the desired output. This early neural network demonstrated machines' potential to learn from data, albeit in limited capacities (Wikipedia, 2004).

Picture: Neural network perceptron architecture with weighted inputs and outputs. Source: Author
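To make the idea concrete, the sketch below trains a single perceptron in Python on a toy linearly separable problem (logical AND); the data, learning rate, and epoch count are illustrative choices rather than Rosenblatt's original setup.

```python
import numpy as np

# Toy linearly separable data: logical AND of two binary inputs
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

rng = np.random.default_rng(0)
w = rng.normal(size=2)   # weights for the two inputs
b = 0.0                  # bias term
lr = 0.1                 # learning rate (illustrative value)

for epoch in range(20):
    for xi, target in zip(X, y):
        # Step activation: fire if the weighted sum exceeds the threshold
        prediction = int(np.dot(w, xi) + b > 0)
        error = target - prediction
        # Rosenblatt's update rule: nudge weights toward the correct output
        w += lr * error * xi
        b += lr * error

print([int(np.dot(w, xi) + b > 0) for xi in X])  # expected: [0, 0, 0, 1]
```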
In 1966, the first conversational AI chatbot, ELIZA, was developed by Joseph Weizenbaum. ELIZA simulated a psychotherapist using pattern matching and substitution techniques, creating an illusion of understanding and intelligence despite its simplistic underlying mechanisms—a phenomenon later known as the "ELIZA effect," highlighting the human tendency to attribute intelligence to systems that merely exhibit conversational behaviour (Wikipedia, 2001). The effect became increasingly relevant with the large language models of later decades.
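The flavour of ELIZA's pattern matching can be conveyed in a few lines; the rules below are invented for illustration and are not Weizenbaum's original script.

```python
import re

# Hypothetical, heavily simplified rules in the spirit of ELIZA's script
RULES = [
    (r"I need (.*)", "Why do you need {0}?"),
    (r"I am (.*)", "How long have you been {0}?"),
    (r"My (.*)", "Tell me more about your {0}."),
    (r".*", "Please go on."),  # fallback keeps the conversation moving
]

def respond(utterance: str) -> str:
    for pattern, template in RULES:
        match = re.match(pattern, utterance, re.IGNORECASE)
        if match:
            return template.format(*match.groups())
    return "Please go on."

print(respond("I am feeling anxious"))  # -> "How long have you been feeling anxious?"
```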
Although general AI proved elusive, narrower applications found success in the 1970s. Researchers encoded human expertise into expert systems such as DENDRAL (1965) and MYCIN (1974), which used if-then rules and logical inference to emulate expert decision-making in specific domains (Feigenbaum, 1977).
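A minimal sketch of the if-then inference style such systems used; the rules and facts are toy examples, not taken from DENDRAL or MYCIN.

```python
# Toy forward-chaining inference: rules map sets of known facts to new conclusions
rules = [
    ({"fever", "cough"}, "suspect_respiratory_infection"),
    ({"suspect_respiratory_infection", "chest_pain"}, "recommend_xray"),
]

facts = {"fever", "cough", "chest_pain"}

changed = True
while changed:                      # keep applying rules until nothing new is derived
    changed = False
    for conditions, conclusion in rules:
        if conditions <= facts and conclusion not in facts:
            facts.add(conclusion)
            changed = True

print(facts)  # includes 'suspect_respiratory_infection' and 'recommend_xray'
```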
However, these systems proved costly, brittle beyond their narrow scopes, and incapable of easy adaptation or learning. The initial optimism surrounding AI research was tempered by the field's first major setback, known as the AI Winter, which persisted through the mid-1980s (Wikipedia, 2005). The AI Winter resulted from multiple converging factors: unrealistic expectations set by early researchers, the limited computational power available at the time, and fundamental theoretical limitations in the approaches being pursued, all of which led to cuts in funding for speculative AI research.
The Machine Learning Revolution (1980-2007)
The resurgence of AI research in the mid-1980s was driven by significant algorithmic innovations, particularly the development of the backpropagation algorithm by Geoffrey Hinton, David Rumelhart, and Ronald Williams in 1986 (Rumelhart et al., 1986). This breakthrough enabled the training of multi-layered neural networks by allowing systems to learn from their mistakes through iterative weight adjustments.
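The core of backpropagation is the chain rule applied layer by layer; the small NumPy network below (one hidden layer, sigmoid activations, XOR data) is a generic textbook-style sketch rather than the exact formulation of Rumelhart et al.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# XOR: not linearly separable, so a single perceptron fails but a hidden layer succeeds
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(1)
W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)   # input -> hidden weights and biases
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)   # hidden -> output weights and biases
lr = 0.5

for _ in range(10000):
    # Forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward pass: the chain rule propagates the output error back through each layer
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # Gradient-descent weight updates
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

print(np.round(out.ravel(), 2))  # should approach [0, 1, 1, 0]
```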
The machine learning era was further advanced by Yann LeCun's development of LeNet in 1989 at Bell Labs, one of the first successful convolutional neural networks (Wikipedia, 2019). LeNet demonstrated superior performance in handwritten digit recognition tasks and established the foundation for modern computer vision. This period marked a shift from rule-based systems to data-driven approaches that could learn patterns from large datasets (Wikipedia, 2019).
By the 1990s, AI entered a more pragmatic phase, embracing data-driven machine learning as computing power grew. Rather than relying solely on hand-coded rules, practitioners turned to algorithms that could learn from large datasets. Statistical techniques (like decision trees, Bayesian networks, and support vector machines) yielded significant improvements in speech recognition, computer vision, and translation during the 1990s (Haenlein & Kaplan, 2019).
Picture: Garry Kasparov playing against IBM's Deep Blue during the 1997 ACM Chess Challenge. File photo by Laurence Kesterson/UPI
A pivotal moment in AI's public perception occurred in 1997, when IBM's Deep Blue defeated world chess champion Garry Kasparov (Wikipedia, n.d.). This achievement demonstrated AI's capability to excel at complex strategic thinking tasks, capturing global attention and renewing investment in AI research. Deep Blue's victory was particularly significant as it relied on brute-force computational power, evaluating 200 million chess positions per second (IBM, n.d.).
In 1999, NVIDIA introduced the GeForce 256, widely considered the first modern GPU (Graphics Processing Unit). While initially designed for graphics processing, these parallel computing architectures would prove instrumental in the subsequent AI revolution. Whereas CPUs were designed for complex sequential processing, GPUs could carry out many simpler tasks simultaneously, a concept that would later underpin the deep learning fundamentals of modern AI.
Equally important in this period was the growth of the internet, which exponentially increased the data available for AI algorithms. Pioneers of the late 1990s, such as Google, applied machine learning (e.g. the PageRank algorithm and probabilistic methods) to web search, demonstrating AI’s commercial value. By the early 2000s, AI was firmly entrenched in real-world applications – albeit often behind the scenes. From credit card fraud detection to email spam filtering, machine learning quietly worked to make decisions more efficiently than could hard-coded rules. Academically, the 1990s and early 2000s also saw AI broaden into sub-disciplines (robotics, natural language processing, computer vision, etc.), each achieving incremental advances. However, researchers began to doubt whether earlier goals of human-level intelligence were attainable, or even well-defined, leading to a greater focus on narrow AI solutions.
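As a concrete example from that period, PageRank's core computation is a power iteration over the link graph; the four-page toy graph and damping factor below are illustrative, not Google's actual implementation.

```python
import numpy as np

# Adjacency: links[i] lists the pages that page i links to (toy four-page web)
links = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}
n, damping = 4, 0.85

# Column-stochastic transition matrix: column j spreads page j's rank to its targets
M = np.zeros((n, n))
for src, targets in links.items():
    for dst in targets:
        M[dst, src] = 1.0 / len(targets)

rank = np.full(n, 1.0 / n)
for _ in range(100):  # power iteration until the rank vector stabilises
    rank = (1 - damping) / n + damping * M @ rank

print(np.round(rank, 3))  # pages 0 and 2 accumulate the most rank
```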
The Deep Learning Revolution (2007-2017)
The deep learning era commenced with the introduction of Deep Belief Networks (DBNs) in 2006 (Hinton et al., 2006). DBNs addressed the vanishing gradient problem that had limited the training of deep neural networks and demonstrated that unsupervised pre-training could enable effective learning in multi-layered architectures (Hinton et al., 2006).
The fundamental basis of DBNs goes back to 1986, when the Restricted Boltzmann Machine (RBM), a two-layer probabilistic neural network, was introduced by Paul Smolensky (1986) under its initial name, the Harmonium model. The model's first layer (the visible layer) interacts with the raw data, while the second (the hidden layer) learns high-level features from the first (Boesch, 2024).
Picture: Restricted Boltzmann Machine architecture
Picture: A Deep Belief Network architecture
A Deep Belief Network extends the RBM by stacking multiple RBMs: the hidden layer of one RBM serves as the visible layer of the next. Each RBM is trained independently, and together they form the complete deep belief network (Boesch, 2024). While DBNs can be seen as a precursor to today's neural network architectures, their capabilities have fallen behind those of more advanced models, and DBNs are now largely outdated (Boesch, 2024).
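A compressed sketch of this stacking idea, assuming binary units and a single step of contrastive divergence (CD-1); bias terms, mini-batching, and supervised fine-tuning are omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def train_rbm(data, n_hidden, lr=0.1, epochs=50):
    """Train one RBM with CD-1 and return (weights, hidden activations)."""
    n_visible = data.shape[1]
    W = 0.01 * rng.normal(size=(n_visible, n_hidden))
    for _ in range(epochs):
        # Positive phase: drive hidden units from the data
        h_prob = sigmoid(data @ W)
        h_sample = (rng.random(h_prob.shape) < h_prob).astype(float)
        # Negative phase: reconstruct the visible layer and re-infer hidden units
        v_recon = sigmoid(h_sample @ W.T)
        h_recon = sigmoid(v_recon @ W)
        # CD-1 update: data-driven minus reconstruction-driven statistics
        W += lr * (data.T @ h_prob - v_recon.T @ h_recon) / len(data)
    return W, sigmoid(data @ W)

# Stacking: the hidden activations of one RBM become the "visible" data of the next
data = (rng.random((100, 16)) > 0.5).astype(float)   # toy binary dataset
W1, h1 = train_rbm(data, n_hidden=8)
W2, h2 = train_rbm(h1, n_hidden=4)                   # second RBM trained on h1
print(W1.shape, W2.shape)  # (16, 8) (8, 4)
```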
The transformative moment for AI acceleration occurred in 2007 with NVIDIA's release of CUDA (Compute Unified Device Architecture). This parallel computing platform allowed developers to leverage GPU processing power for general-purpose computing tasks, significantly reducing the time required to train complex neural networks. The synergy between powerful GPUs and the CUDA architecture established the computational groundwork essential for deep learning's exponential growth and paved the way for the fundamental deep learning models that followed (see Kirk, 2007).
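As a hedged illustration of the programming model CUDA opened up, the sketch below uses the CuPy library (assuming an NVIDIA GPU and the cupy package are installed) to run the same matrix multiplication on CPU and GPU; the matrix sizes are arbitrary.

```python
import numpy as np
import cupy as cp  # NumPy-like array library backed by CUDA; requires an NVIDIA GPU

a_cpu = np.random.rand(4096, 4096).astype(np.float32)
b_cpu = np.random.rand(4096, 4096).astype(np.float32)

# CPU: one large matrix multiplication, executed on a handful of cores
c_cpu = a_cpu @ b_cpu

# GPU: the same operation, dispatched to thousands of lightweight parallel threads
a_gpu, b_gpu = cp.asarray(a_cpu), cp.asarray(b_cpu)
c_gpu = a_gpu @ b_gpu
cp.cuda.Stream.null.synchronize()  # GPU kernels run asynchronously; wait for completion

# The two results agree up to floating-point tolerance
print(np.allclose(c_cpu, cp.asnumpy(c_gpu), rtol=1e-3))
```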
The deep learning era achieved mainstream recognition in 2011 through two significant milestones: IBM Watson's victory on the television game show Jeopardy! and Apple's launch of Siri. Watson demonstrated advanced natural language processing capabilities by successfully interpreting complex questions and retrieving relevant information from extensive knowledge bases (Wikipedia, 2011). Siri brought conversational AI into consumer technology, establishing voice-based interaction as a standard feature of personal devices (Bosker, 2013). Google's advancements in deep neural networks continued, notably with Google Brain’s breakthroughs in object recognition, and significantly with DeepMind's Deep Q-Network (DQN), which mastered Atari games at superhuman performance levels in 2013. This achievement underscored reinforcement learning’s capacity for complex decision-making tasks without direct human intervention (Mnih et al., 2015).
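At the heart of DQN is the temporal-difference update toward the Bellman target; the tabular Q-learning sketch below isolates that update (DQN itself replaces the table with a deep network and adds experience replay and a target network), with toy state and action counts.

```python
import numpy as np

n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.99  # learning rate and discount factor (illustrative values)

def q_update(state, action, reward, next_state):
    # Bellman target: immediate reward plus discounted value of the best next action
    target = reward + gamma * Q[next_state].max()
    # Move the current estimate a small step toward the target
    Q[state, action] += alpha * (target - Q[state, action])

q_update(state=0, action=1, reward=1.0, next_state=2)
print(Q[0])  # the chosen action's value has moved toward the reward signal
```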
The introduction of Generative Adversarial Networks (GANs) by Ian Goodfellow in 2014 significantly advanced image generation capabilities. GANs enabled the creation of realistic synthetic images, laying the foundation for modern deepfake technology (Goodfellow et al., 2016).
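A deliberately minimal sketch of the adversarial training loop GANs introduced, here in PyTorch on a one-dimensional toy distribution; the network sizes, learning rates, and target distribution are illustrative choices, not Goodfellow's original configuration.

```python
import torch
import torch.nn as nn

# Generator maps random noise to fake samples; discriminator scores real vs. fake
G = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    real = torch.randn(64, 1) * 0.5 + 3.0        # "real" data drawn from N(3, 0.5)
    fake = G(torch.randn(64, 8))

    # Discriminator step: label real samples 1, generated samples 0
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator step: try to make the discriminator label fakes as real
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

print(G(torch.randn(256, 8)).mean().item())  # should drift toward the real mean of 3.0
```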
In 2016, DeepMind’s AlphaGo achieved another historic milestone by defeating world champion Lee Sedol in the ancient game of Go. This victory was particularly notable because the complexity of Go far exceeds that of chess (Tromp, 2022), involving more possible game states than there are atoms in the observable universe (Baker, 2021). AlphaGo’s success underscored AI’s ability to handle intuitive and creative strategic decision-making previously believed to be uniquely human.

Picture: The complexity of Chess, Go, and Atoms in the Universe. Source: Author
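The scale difference is easiest to grasp on a logarithmic scale; the short script below uses commonly cited order-of-magnitude estimates (roughly 10^44 legal chess positions, 10^80 atoms in the observable universe, and 10^170 legal Go positions), which are approximations rather than exact figures.

```python
# Commonly cited order-of-magnitude estimates (exponents of 10)
estimates = {
    "legal chess positions": 44,
    "atoms in the observable universe": 80,
    "legal Go positions": 170,
}

for name, exponent in sorted(estimates.items(), key=lambda kv: kv[1]):
    print(f"{name:>35}: 10^{exponent}")

# Go exceeds the atom count by a factor of roughly 10^(170 - 80) = 10^90
```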
In 2015, Google's deployment of Tensor Processing Units (TPUs) further accelerated machine learning training by providing specialized hardware optimized specifically for neural network computations, complementing existing GPU advancements.
TPUs are specialized ASICs (application-specific integrated circuits) designed for efficient matrix operations commonly used in neural network computations. Google subsequently released TPU v2 and v3 boards for cloud users, each containing multiple chips and capable of delivering tens of petaflops of AI computation (Jouppi et al., 2017). This led to a substantial increase in available computing power, enabling researchers to train progressively larger models on increasingly expansive datasets.
The scale of AI models grew exponentially throughout the 2010s—a phenomenon sometimes described as a “compute overhang”—where the availability of greater computing power primarily drove qualitative breakthroughs (Sutton, 2019). In 2011, the leading deep learning models had around 100 million parameters; by 2018, Google's BERT language model had reached approximately 340 million parameters (Devlin et al., 2019). This immense increase in model size within less than a decade was enabled by advancements in hardware and data availability, directly contributing to significant improvements in AI capabilities and paving the way for the contemporary generative AI era.
The Transformer Architecture and Modern AI Boom (2017-2025)
The publication of Attention Is All You Need by Vaswani et al. (2017) marked a paradigm shift in natural language processing and machine learning. The Transformer architecture, which relies entirely on attention mechanisms, eliminated the dependency on recurrent neural networks and enabled more efficient parallel processing. This innovation became foundational for nearly all subsequent large language models.
OpenAI’s GPT-1, released in 2018, demonstrated the potential of transformer-based language models in generating coherent text. Although its 117 million parameters are modest by current standards, GPT-1 established the scalability potential of pre-trained transformers (Kapuściński, 2024). GPT-2, released in 2019 with 1.5 billion parameters, generated significant public concern regarding AI’s potential for misinformation, prompting OpenAI to initially withhold the full model from public release (Kapuściński, 2024). The pivotal breakthrough catalysing the current AI boom occurred with GPT-3 in 2020, which featured 175 billion parameters and exhibited remarkable few-shot learning capabilities. The ability of GPT-3 to perform diverse tasks without task-specific training represented a significant advancement toward general-purpose AI systems (Kapuściński, 2024).
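The Transformer's central operation mentioned above, scaled dot-product attention, can be written in a few lines; the NumPy sketch below shows just that operation (real models add learned query, key, and value projections, multiple heads, masking, and positional encodings), with toy shapes chosen for illustration.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, as in Vaswani et al. (2017)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                   # similarity of each query to every key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V                                # weighted mix of the values

rng = np.random.default_rng(0)
seq_len, d_k = 5, 8
Q = rng.normal(size=(seq_len, d_k))
K = rng.normal(size=(seq_len, d_k))
V = rng.normal(size=(seq_len, d_k))
print(scaled_dot_product_attention(Q, K, V).shape)  # (5, 8)
```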
The most significant milestone in recent AI history occurred with OpenAI's release of ChatGPT in November 2022. This conversational AI system rapidly became the fastest-growing consumer application ever, reaching 100 million users within two months. ChatGPT's success stimulated unprecedented investment in AI research and development across various industries (Wikipedia, 2025). The pace of innovation accelerated further in 2023 with the release of GPT-4, Meta's LLaMA, and Google's Gemini. These models exhibited enhanced reasoning capabilities, multimodal processing, and greater factual accuracy. Additionally, the emergence of open-source alternatives such as Gemma, Nemotron, LLaMA, and DeepSeek democratized access to advanced AI technologies.
Recent developments in 2024 and 2025 have increasingly concentrated on multimodal AI systems, particularly in media creation and manipulation for consumer markets. Concurrently, advances in AI training hardware have progressed significantly. NVIDIA's Blackwell architecture, introduced in 2025, offers unprecedented computational power for AI training, with professional applications benefiting from up to 96GB of VRAM (Thacker, 2025).
It remains uncertain whether or when AI development will reach what many consider its ultimate goal—Artificial General Intelligence (AGI), defined as an AI system capable of understanding, learning, and performing any intellectual task at human-level competence. Humans generally struggle to perceive exponential growth, especially once a particular threshold has been passed, as clearly happened in AI development during the 2010s. The future trajectory of AI could lead humanity towards either dystopian scenarios or unprecedented technological prosperity beyond current comprehension. In 2025, we stand on the threshold of integrating AI extensively into our society, bearing both the opportunity and the responsibility to guide this transition thoughtfully. This article aims to clarify our position within the historical context of AI development.
Picture: Two Eras in Compute Usage in AI Training. Source: Wang (2020)
Picture: Computation used to train notable AI systems. Source: Giattino (2024)
Conclusion
The arc of AI development—from theoretical roots in the mid-20th century to today’s expansive, transformative systems—illustrates the extraordinary pace of technological evolution. Each phase brought new possibilities and fresh challenges, requiring continual reassessment of goals, methods, and societal implications. As we enter a new era defined by generative models, multimodal systems, and near-ubiquitous deployment, the central task is not merely to build smarter machines, but to ensure that intelligence—artificial or otherwise—is applied in service of humanity. Our collective responsibility is to shape this powerful technology with foresight, humility, and shared purpose.
The evolution of artificial intelligence over the past 75 years is characterized by significant technological breakthroughs and paradigm shifts—from early theoretical foundations and rule-based systems to advanced neural networks, deep learning, and sophisticated generative models. With the recent acceleration of multimodal AI technologies, powerful computing hardware, and transformative architectures such as Transformers, AI stands at a pivotal juncture. As we move forward, carefully guiding AI integration into society becomes both an opportunity and responsibility, shaping a future of remarkable potential and complex challenges.
About the Author
Michael Stein is a German shipping enthusiast, entrepreneur, researcher, and self-proclaimed tech geek. He lives in Hamburg and is the founder of Stein Maritime Consulting (established in 2018) and Vesselity Maritime Analytics (founded in 2023). Vesselity is a specialized technology start-up that uses ROV technology to conduct underwater ship inspections for marine fouling detection. The company’s core expertise lies in advanced underwater image segmentation AI, which extracts actionable data from underwater video footage. This data is then triangulated with ship satellite routing, machinery performance metrics, and water bio factors to analyse fuel consumption deltas caused by increased hull roughness. This innovative approach is also the subject of Michael Stein’s PhD thesis, which he is currently pursuing at the University of Antwerp.
Mail: stein@vesselity.de
Web: www.vesselity.de
www.stein-maritim.de
LinkedIn: https://www.linkedin.com/in/michael-stein-hamburg
ResearchGate: https://www.researchgate.net/profile/Michael-Stein-9
ORCID: 0000-0002-8411-6523
Reference List
Baker, H. 2021. How many atoms are in the observable universe? Live Science. https://www.livescience.com/how-many-atoms-in-universe.html
Boesch, G. 2024. Deep Belief Networks (DBNs) explained. Viso.ai. https://viso.ai/deep-learning/deep-belief-networks/
Bosker, B. 2013. Siri rising: The inside story of Siri's origins (and why she could overshadow the iPhone). The Huffington Post. https://www.huffpost.com/entry/siri-apple_b_2507231
Chu, Y., Zhao, X., Zou, Y., Xu, W., Han, J., & Zhao, Y. 2018. A decoding scheme for incomplete motor imagery EEG with deep belief network. Frontiers in Neuroscience, 12, 680.
Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Vol. 1, pp. 4171–4186).
Feigenbaum, E. A. 1977. The art of artificial intelligence: Themes and case studies of knowledge engineering.
Giattino, C. 2025. AI training computation growth over time: A surge. Reference to Epoch (2024). https://infographicsite.com/infographic/ai-training-computation-growth-over-time/
Goodfellow, I., Bengio, Y., & Courville, A. 2016. Deep learning (Adaptive Computation and Machine Learning). MIT Press.
Haenlein, M., & Kaplan, A. 2019. A brief history of artificial intelligence: On the past, present, and future of artificial intelligence. California Management Review, 61(4), 5–14.
Hinton, G. E., Osindero, S., & Teh, Y. W. 2006. A fast learning algorithm for deep belief nets. Neural Computation, 18(7), 1527–1554.
History of Data Science. 2021. Dartmouth Summer Research Project: The birth of artificial intelligence. https://www.historyofdatascience.com/dartmouth-summer-research-project-the-birth-of-artificial-intelligence/
IBM. (n.d.). Deep Blue. https://www.ibm.com/history/deep-blue. Retrieved June, 2025.
Jouppi, N. P., Young, C., Patil, N., Patterson, D., Agrawal, G., Bajwa, R., ... & Yoon, D. H. 2017. In-datacenter performance analysis of a tensor processing unit. In Proceedings of the 44th Annual International Symposium on Computer Architecture (pp. 1–12).
Kapuściński, M. 2024. Evolution of AI: From GPT-1 to GPT-4o – Key features, milestones, and applications. TTMS. https://ttms.com/chat-gpt-evolution. Retrieved June, 2025.
Kirk, D. 2007. NVIDIA CUDA software and GPU parallel computing architecture. Microprocessor Forum, 7, 103–104.
Lawrence Livermore National Laboratory. 2022. The birth of artificial intelligence (AI) research. https://st.llnl.gov/news/look-back/birth-artificial-intelligence-ai-research. Retrieved June, 2025.
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., ... & Hassabis, D. 2015. Human-level control through deep reinforcement learning. Nature, 518, 529–533.
Navamani, T. M. 2019. Efficient deep learning approaches for health informatics. In Deep learning and parallel computing environment for bioengineering systems (pp. 123–137). Academic Press.
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. 1986. Learning representations by back-propagating errors. Nature, 323(6088), 533–536.
Shannon, C. E. 1948. A mathematical theory of communication. The Bell System Technical Journal, 27(3), 379–423.
Shen, K. 1088. Dream Pool Essays. (Cited by Wikipedia “Dream Pool Essays”, 2019.)
Smolensky, P. 1986. Information processing in dynamical systems: Foundations of harmony theory.
Stanford Encyclopedia of Philosophy. 2003. The Turing Test. https://plato.stanford.edu/entries/turing-test. Retrieved June, 2025.
Sutton, R. S. 2019. The bitter lesson. Incomplete Ideas (blog), 13(1), 38.
Thacker, J. 2025. NVIDIA unveils Blackwell RTX PRO GPUs with up to 96GB VRAM. CG Channel. https://www.cgchannel.com/2025/03/nvidia-unveils-blackwell-rtx-pro-gpus-with-up-to-96gb-vram. Retrieved June, 2025.
Tromp, J. 2022. Chess position ranking. https://github.com/tromp/ChessPositionRanking. Retrieved June, 2025.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. 2017. Attention is all you need. Advances in Neural Information Processing Systems, 30.
Wang, J. 2020. The cost of AI training is improving at 50x the speed of Moore’s Law: Why it’s still early days for AI. ARK Invest. https://www.ark-invest.com/articles/analyst-research/ai-training. Retrieved June, 2025.
Wikipedia. (n.d.). AI winter. In Wikipedia, The Free Encyclopedia. Retrieved June, 2025.
Wikipedia. (n.d.). ChatGPT. In Wikipedia, The Free Encyclopedia. Retrieved June, 2025.
Wikipedia. (n.d.). Deep Blue versus Garry Kasparov. In Wikipedia, The Free Encyclopedia. Retrieved June, 2025.
Wikipedia. 2001. ELIZA. In Wikipedia, The Free Encyclopedia. Retrieved June, 2025.
Wikipedia. 2004. Perceptron. In Wikipedia, The Free Encyclopedia. Retrieved June, 2025.
Wikipedia. 2004. Dartmouth workshop. In Wikipedia, The Free Encyclopedia. Retrieved June, 2025.
Wikipedia. 2005. AI winter. In Wikipedia, The Free Encyclopedia. Retrieved June, 2025.
Wikipedia. 2005. ENIAC. In Wikipedia, The Free Encyclopedia. Retrieved June, 2025.
Wikipedia. 2019. LeNet. In Wikipedia, The Free Encyclopedia. Retrieved June, 2025.