15780: Course Introduction

Course Motivation

The course is designed to provide an in-depth understanding of modern Artificial Intelligence (AI), focusing on the most relevant and current topics in the field. It aims to equip students with the knowledge and skills to comprehend and utilize the principles behind tools like large language models and generative models, which are at the forefront of AI today.

The motivation for this course stems from the observation that there has been a divergence between traditional AI education and the contemporary understanding and application of AI. Historically, AI courses have covered a broad range of techniques, including constraint satisfaction, A-star search, and other methods that have been traditionally associated with AI. However, the current landscape of AI has shifted significantly towards neural networks, supervised learning, and large language models, which are now dominating the field.

The course is structured to be opinionated, focusing on the technologies and methodologies that are most relevant to the current state of AI. This includes an emphasis on supervised learning, neural networks, large language models, and architectures such as transformers. The simplicity of modern AI is highlighted, with the entire logic of advanced tools like LLMs fitting into a relatively small Python script.

Students are expected to have a foundational understanding of linear algebra, as it is the language used to describe the mathematics behind modern AI. Familiarity with Python programming is also required, as the course will use Python and the PyTorch library for implementations. These prerequisites are considered sufficient for students to engage with the course material and gain a comprehensive understanding of modern AI techniques.

The course will delve into a variety of topics, all centered around the current trends and technologies in AI. The approach is to go slowly and ensure that students have a clear understanding of the material, rather than rushing through a broader but less relevant curriculum. Policies and logistics will be discussed in detail, ensuring that students are aware of the expectations and requirements for successful completion of the course.

In summary, the course is an experiment in focusing AI education on the most impactful and current technologies, with the goal of providing students with a practical and relevant understanding of the field as it stands today.

An Opinionated History of AI

The history of Artificial Intelligence (AI) is marked by distinct eras, each characterized by different prevailing thoughts and technologies.

Early Optimism

The early period of AI, referred to as the “early optimistic” era, was a time of great enthusiasm and ambitious goals for the field.

Turing Test

In 1950, Alan Turing proposed the Turing Test, originally called the imitation game. The test was designed to answer a philosophical question about intelligence by providing an empirical method to determine if a machine could exhibit intelligent behavior indistinguishable from that of a human. Turing’s contribution was significant because it reframed the debate on intelligence from a philosophical to an empirical context. The Turing Test remains relevant even today, arguably more so than when it was first proposed.

Dartmouth Workshop and Symbolic AI

The Dartmouth workshop in 1956 is another cornerstone event in AI history. Organized by John McCarthy, Marvin Minsky, Nathaniel Rochester, and Claude Shannon, the workshop aimed to make significant advances in AI over the summer. The organizers were optimistic about the potential for progress, believing that every aspect of learning could be so precisely described that a machine could simulate it. The workshop is often incorrectly credited with coining the term “artificial intelligence” (the term had already appeared in the 1955 proposal for the workshop), but it did play a pivotal role in shaping the field.

During this era, AI research focused on symbolic learning and logical reasoning, largely due to the limited computational power of the time. Early AI researchers, many of whom were mathematicians, gravitated towards problems that seemed difficult to humans, such as mathematical proofs, leading to significant work in symbolic AI.

Arthur Samuel’s Checkers-Playing Program

In 1952, Arthur Samuel developed a checkers-playing program at IBM, which utilized techniques that are now considered foundational to machine learning. Samuel’s program employed linear predictors to evaluate the board state and learned from self-play, adjusting its strategy based on feedback and long-term outcomes. This early example of reinforcement learning demonstrated how actions could have consequences later in the game.
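To make the idea concrete, here is a minimal sketch (not Samuel's actual implementation) of a linear board evaluator: the score is a weighted sum of hand-crafted board features, and the weights are nudged toward a value observed later in the game, in the spirit of learning from self-play. The feature names and update rule are illustrative assumptions.

```python
# Minimal sketch of a Samuel-style linear board evaluator (illustrative only;
# the features and the exact update rule are assumptions, not Samuel's code).

def evaluate(weights, features):
    """Score a board state as a weighted sum of hand-crafted features."""
    return sum(w * f for w, f in zip(weights, features))

def update(weights, features, target, lr=0.01):
    """Nudge the evaluation of this board toward a target value observed
    later in the game (e.g., the eventual outcome of a self-play game)."""
    error = target - evaluate(weights, features)
    return [w + lr * error * f for w, f in zip(weights, features)]

# Example: three made-up checkers features (piece advantage, king advantage,
# mobility) for a position from a game the program eventually won (target +1).
weights = [0.0, 0.0, 0.0]
features = [2.0, 1.0, 4.0]
weights = update(weights, features, target=1.0)
print(weights)  # weights move toward scoring this kind of position higher
```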

Frank Rosenblatt’s Perceptron

The perceptron, introduced by Frank Rosenblatt in 1958, was a hardware implementation of a basic neural network. Although the concept of a neural network was proposed earlier, in 1943, by McCulloch and Pitts, Rosenblatt’s work focused on a two-layer neural network. The perceptron processed analog signals from a simple camera and learned to recognize patterns such as digits or shapes. The first layer had fixed, randomly wired weights, while the second layer had learnable weights. This system did not employ backpropagation, as it had not yet been developed, and only the weights in the second layer were adjusted.
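As a rough sketch of that architecture (assuming NumPy and a toy classification task standing in for camera signals), the first layer below is a fixed random projection with a hard threshold, and only the second layer's weights are updated, using the perceptron learning rule rather than backpropagation.

```python
import numpy as np

rng = np.random.default_rng(0)

# First layer: fixed, randomly "wired" weights that are never trained.
n_inputs, n_hidden = 4, 16
W_fixed = rng.standard_normal((n_inputs, n_hidden))

def hidden(x):
    """Random projection followed by a hard threshold."""
    return (x @ W_fixed > 0).astype(float)

def predict(x, w):
    """Second layer: a single linear threshold unit over the hidden features."""
    return 1 if hidden(x) @ w > 0 else 0

def train(X, y, epochs=20, lr=1.0):
    """Perceptron learning rule applied only to the second-layer weights."""
    w = np.zeros(n_hidden)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            error = yi - predict(xi, w)   # -1, 0, or +1
            w += lr * error * hidden(xi)  # only these weights are adjusted
    return w

# Toy data standing in for digitized camera signals (an assumption for illustration).
X = rng.standard_normal((40, n_inputs))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
w = train(X, y)
accuracy = np.mean([predict(xi, w) == yi for xi, yi in zip(X, y)])
print(f"training accuracy: {accuracy:.2f}")
```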

The era’s optimism was encapsulated by a 1965 prediction, attributed to either Allen Newell or Herb Simon, that within 20 years machines would be capable of performing any task a human could.

The AI Winter

Partly as a reaction to the unfulfilled promises of this early era, attitudes toward AI soured in a downturn known as the AI Winter, which spanned from the 1970s to the early 1980s. The failure to meet the lofty expectations set by early AI researchers led to a reduction in funding and support for AI research.

The Lighthill Report and Funding Cuts

A significant event during this period was the publication of the Lighthill Report in the UK, which criticized the unfulfilled promises of AI and recommended cutting funding for the field. This report reflected a broader sentiment that led to decreased government funding for AI research.

Minsky and Papert’s Critique of Perceptrons

Marvin Minsky, a proponent of symbolic AI, and his colleague Seymour Papert published a book titled “Perceptrons,” which critiqued the limitations of perceptrons and neural networks. They demonstrated that single-layer perceptrons cannot represent functions that are not linearly separable, such as XOR, casting doubt on the potential of neural networks. This critique contributed to the dominance of symbolic AI approaches in the years that followed.
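A quick illustrative check of the XOR limitation (a sketch, not taken from the book): running the perceptron rule on a single linear threshold unit never separates XOR, because no line puts the two classes on opposite sides, so the training accuracy stays stuck below 100%.

```python
import numpy as np

# XOR truth table: not linearly separable, so a single-layer perceptron
# (one linear threshold unit) cannot classify all four cases correctly.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0])

Xb = np.hstack([X, np.ones((4, 1))])  # append a constant bias input
w = np.zeros(3)

for _ in range(1000):                  # perceptron learning rule
    for xi, yi in zip(Xb, y):
        pred = 1 if xi @ w > 0 else 0
        w += (yi - pred) * xi          # keeps oscillating on XOR

preds = (Xb @ w > 0).astype(int)
print("predictions:", preds, "accuracy:", (preds == y).mean())  # at most 0.75
```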

Splintering of AI

The period after the AI Winter is best described as a “splintering” of AI, which occurred approximately from the mid-1980s through the early 2010s. During this period, AI as a holistic field fragmented into specialized areas such as machine learning, computer vision, natural language processing, planning, and search. This era saw a decline in the use of the term “AI” as researchers focused on their specific domains, often achieving significant successes that were not necessarily attributed to AI at the time.

Deep Blue

One landmark achievement during this period was the victory of IBM’s Deep Blue over chess grandmaster Garry Kasparov in 1997, which demonstrated the computational power and search capabilities of machines. However, it was also noted that once AI solves a problem, it often ceases to be considered AI and is instead viewed as a mere application of computation or search.

DARPA Grand Challenge

Another notable event was the DARPA Grand Challenge, a competition for autonomous vehicles held first in the desert and later in a simulated city. The first challenge, a race across rugged desert terrain, was won in 2005 by a team from Stanford that included a faculty member who had recently left Carnegie Mellon University (CMU). Two years later, CMU won the follow-up Urban Challenge, in which autonomous vehicles had to navigate a simulated city on a military base. These achievements were impressive for AI, but at the time they were not widely recognized as major events in AI history, nor was the term “AI” commonly used to describe them.

IBM’s Watson

Another notable event during this period was IBM’s Watson, which gained fame in 2011 for defeating human champions at the game show Jeopardy. Watson’s success came with several caveats, such as rapid buzzing and receiving questions in text form. Even so, its ability to answer arbitrary Jeopardy questions without internet access was a significant achievement.

Ads

Finally, one of the most significant impacts of AI over this period was its commercialization through machine learning for serving ads, which proved to be a critical factor in preventing another AI winter. Large tech companies began using AI systems to recommend products, content, and websites to users, generating significant revenue. This financial self-sufficiency ensured continued investment in AI, even if the current enthusiasm for AI turns out to be a bubble.

Rebirth of AI

The current era can be seen as a kind of “rebirth of AI,” not necessarily due to methodological or technological breakthroughs, as many current techniques have their roots in earlier periods. Instead, the rebirth is characterized by a renewed use of the term “AI” and a resurgence of optimism in the field. Notable events that contributed to this renaissance include the success of AlexNet in 2012, a convolutional neural network that won the ImageNet image classification challenge and marked the rise of deep learning. Deep learning, essentially a rebranding of neural networks, began to dominate machine learning, moving away from classical methods and feature engineering. Another notable event was the development of AlphaGo by DeepMind in 2016, which defeated the world’s fourth-ranked Go player, Lee Sedol, in a historic match. The technology also made its way into consumer-facing products, such as Google Translate, which in 2016 began using recurrent neural networks for translation, resulting in more natural and accurate translations. Finally, the introduction of ChatGPT marked the moment when AI became a household term, with its ability to generate human-like text and engage in conversations. This development has led to broader public awareness and discussion about AI and its capabilities.

Course Topics

The course aims to take students from the basics to modern AI, with the assertion that AI methods are not overly complex and can be implemented in a relatively small amount of code. The course will cover a narrow but essential slice of traditional AI topics, with some overlap with machine learning, language modeling, and probabilistic graphical models. The focus will be on understanding and building AI systems from scratch, using tools like PyTorch for practical implementation.

Key Topics to be Covered:

If time permits, additional topics such as trustworthy AI and current research papers may be introduced. The course is designed to be somewhat informal, with a flexible schedule that allows for in-depth exploration of each topic. The goal is for students to gain a deep understanding of AI systems and the ability to implement them based on research papers.

The course will progress at a pace that ensures comprehension, without a rigid class schedule. Lectures may be adjusted or canceled due to various constraints, and guest lectures may be included. The emphasis is on building a solid foundation in AI, with the potential for students to develop skills that enable them to implement AI systems from research papers, a valuable and somewhat rare capability in the field.

Policies and Logistics

The course consists of four main elements:

  1. Lectures: Lectures will be held in person and recorded, but the recordings are not intended for distribution. They are primarily for annotating notes and assisting with course content.

  2. Homework: Homework assignments will be integrated into the class sessions and are due on Mondays following the week they are assigned. The homework is designed to be short and manageable, focusing on reinforcing concepts from the lectures. The grading for homework will be lenient, emphasizing the completion of an attempt rather than rigorous assessment.

  3. Midterm: There will be an in-class midterm exam on February 28th. The exam will be open notes but not open computer, meaning internet access and tools like ChatGPT will not be permitted.

  4. Final Project: The final project will be a group endeavor, with teams of two or three students. The project is open-ended and will be worked on during the second half of the course. It aims to apply modern AI techniques to a topic of the group’s choosing.

The prerequisites for the course are knowledge of linear algebra and proficiency in Python.

AI and non-AI Collaboration on Assignments

The use of AI tools like ChatGPT is permitted for homework and the final project, but not for the midterm exam. Collaboration among students is also encouraged for homework assignments.

Grading and Participation

The grading breakdown includes a component for class participation, which will involve class presentations and active engagement through questions and discussions. The exact grading percentages are detailed on the course website, with a mix of lecture attendance, midterm performance, project work, and participation.