Author: Toby Lightheart
Intended Audience: ?
Date started: 2025-04-28
Date complete: ?
An artificial general intelligence (AGI) is an artificial intelligence (AI) with flexibility in intellectual tasks and capabilities equivalent to a person's.
Intelligence is a combination of understanding, skills and abilities. Understanding is the internalisation of pieces of information and their relationships. Skills are proficiencies at actions, tasks and procedures. Abilities related to intelligence are primarily neurological abilities: cognitive, sensory and motor. Greater intelligence in humans can usually be attributed to some combination of:
- Better understanding (more information and more accurate relationships)
- Better skills (faster and higher-quality execution of actions and procedures)
- Better abilities (e.g., attention; working, short-term and long-term memory)
An AGI should not have any significant deficits when compared to a person in understanding and cognitive skills and abilities.
Other definitions exist, and some people avoid stating precise definitions of AGI or intelligence altogether. Nevertheless, AGI is a stated goal of leaders at major AI companies such as OpenAI, Google DeepMind, Anthropic and DeepSeek.
The recent paradigms of AI involve scaling up: the size of transformer-based neural network models, the amount of data used to train them, and the number of reasoning tokens that models can generate to arrive at an answer. The results are impressive, and many believe these types of scaling will reach AGI.
A model typically undergoes two main stages of training: pre-training and post-training. Pre-training involves consuming enormous amounts of data to create a "base model" of the data. Large base models trained on large amounts of data have a great deal of information embedded within them. Post-training involves carefully tuning the base model into an AI with more specific modes of response to input.
After this, the model is not trained further, but it can do "in-context learning": a temporary state in which the model learns from recent inputs to improve its outputs. When the model is given a new context, this learning is lost. The recent paradigm of scaling reasoning tokens (or test-time compute) leverages this ability: the model learns in context from its own generated output.
There are important capability gaps in current AI that will not be overcome by scaling.
The current large models have a mixture of strengths and deficits across understanding, skills and abilities:
Understanding
- Superhuman breadth of knowledge.
- Flawed understanding due to a lack of direct experience with the physical world.
- Limited adaptation of existing understanding and an inability to acquire new understanding.
Skills
- Difficulty learning procedures, requiring special training effort for skills as simple as arithmetic.
- Insufficient embodiment to learn most human skills.
Abilities
- Missing important cognitive abilities, such as persistent memory, and limited sensory perception and understanding.
- Lacking the embodiment necessary for most human senses and motor capabilities.
Putting aside the challenges of acquiring many human capabilities without human-like embodiment, the current paradigms of transformer-based architectures, training and inference will not make up for these shortcomings.
People can do "continual learning", i.e., acquire new understanding and skills and, in some cases, improve abilities. For an AI model to do continual learning, it must continue to incorporate new information. Permanent incorporation of new understanding, skills and abilities requires a model that can adapt its architecture.
A growing artificial neural network uses a constructive algorithm to automatically add connections, neurons and/or layers to a neural network during training or operation. At the core of a constructive algorithm are one or more rules for determining when and where to add connections and neurons to the network, and how to initialise the new parameters.
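As a minimal illustration of these rules (an assumed toy setup, not a specific published algorithm), a growing network might add a hidden neuron whenever training error stalls (the "when"), initialising the new neuron's input weights from the worst-fit example and its output weights to zero so behaviour is initially unchanged (the "where" and the initialisation):

```python
import numpy as np

rng = np.random.default_rng(0)

def init_network(n_in, n_hidden, n_out):
    return {
        "W1": rng.normal(0, 0.5, (n_hidden, n_in)),   # input -> hidden
        "W2": rng.normal(0, 0.5, (n_out, n_hidden)),  # hidden -> output
    }

def forward(net, X):
    H = np.tanh(X @ net["W1"].T)      # hidden activations
    return H, H @ net["W2"].T         # hidden, outputs

def add_neuron(net, X, residual):
    # "Where": initialise the new neuron's input weights from the example
    # with the largest residual error (an illustrative one-shot heuristic).
    worst = np.argmax(np.abs(residual).sum(axis=1))
    w_new = X[worst] / (np.linalg.norm(X[worst]) + 1e-8)
    net["W1"] = np.vstack([net["W1"], w_new])
    # New output weights start at zero, so the network's function is unchanged.
    net["W2"] = np.hstack([net["W2"], np.zeros((net["W2"].shape[0], 1))])

def train(net, X, Y, epochs=200, lr=0.1, patience=20, tol=1e-4):
    best, stall = np.inf, 0
    for _ in range(epochs):
        H, out = forward(net, X)
        err = out - Y
        loss = float((err ** 2).mean())
        # Gradient descent on both layers (squared error).
        gW2 = err.T @ H / len(X)
        gH = err @ net["W2"] * (1 - H ** 2)
        gW1 = gH.T @ X / len(X)
        net["W2"] -= lr * gW2
        net["W1"] -= lr * gW1
        # "When": grow if the loss has stopped improving.
        if best - loss < tol:
            stall += 1
        else:
            best, stall = loss, 0
        if stall >= patience:
            add_neuron(net, X, err)
            stall = 0
    return loss

X = rng.normal(size=(64, 3))
Y = np.sin(X[:, :1])                  # simple regression target
net = init_network(3, 2, 1)
final_loss = train(net, X, Y)
```

The growth rule, one-shot initialisation and hyperparameters here are all illustrative assumptions; real constructive algorithms differ mainly in how these choices are made.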
Early algorithms for constructing or growing artificial neural networks were developed in the 1980s, e.g., Dynamic Node Creation (Ash, 1989) and Cascade-Correlation (Fahlman and Lebiere, 1989). These would add neurons to small neural networks that were trained with error backpropagation. A range of constructive algorithms and growing neural networks have been developed over the years.
My own academic research included the development of spike-timing-dependent construction (Lightheart, 2013). This constructive algorithm was able to create neurons with one-shot initialisation of parameters that effectively detected "hidden patterns" of activity in spiking neural networks.
Growing artificial neural networks with constructive algorithms is a potential approach to continual learning and addressing current AI limitations. Constructing neurons may allow:
- Adding new pieces of information and their relationships (understanding) to the network during operation
- Highly sample-efficient learning with one-shot parameter calculations during neuron construction
- Producing specialised expert modules by selecting locations for neuron construction based on relevant inputs
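The first two points can be made concrete with a hypothetical sketch (the class and its one-shot rule are illustrative assumptions, not my published algorithm): a layer that incorporates a new piece of information by constructing a prototype neuron whose weights are computed in one shot from a single example, giving immediate recall without gradient training:

```python
import numpy as np

class PrototypeLayer:
    def __init__(self, n_in):
        self.W = np.empty((0, n_in))   # no neurons yet
        self.labels = []

    def learn_one_shot(self, x, label):
        # One-shot parameter calculation: the new neuron's weights are
        # the normalised input itself, so it responds maximally to x.
        w = x / np.linalg.norm(x)
        self.W = np.vstack([self.W, w])
        self.labels.append(label)

    def recall(self, x):
        # Each constructed neuron scores the input; return the best match.
        scores = self.W @ (x / np.linalg.norm(x))
        return self.labels[int(np.argmax(scores))]

layer = PrototypeLayer(4)
layer.learn_one_shot(np.array([1.0, 0.0, 0.0, 0.0]), "red")
layer.learn_one_shot(np.array([0.0, 1.0, 0.0, 0.0]), "green")
print(layer.recall(np.array([0.9, 0.1, 0.0, 0.0])))  # → red
```

Each call to `learn_one_shot` is a single neuron construction: one example, one parameter calculation, permanent incorporation.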
This is a complex research topic that does not have an obvious approach that yields incremental benefits. As such, the development of the research agenda will be ongoing.
Given the known capabilities of transformer-based deep neural networks, it is prudent to incorporate aspects of this architecture. Construction should eventually be applied to both the MLPs and the attention blocks; however, MLPs are more straightforward to construct. Rules for constructing neurons and initialising parameters will be an important topic of development and experimentation.
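One straightforward, function-preserving way to construct neurons in a transformer-style MLP block (a generic sketch, not a settled design) is to append a hidden unit with arbitrary input weights and zero-initialised output weights, so the block's output is initially unchanged and the new unit can be trained in afterwards:

```python
import numpy as np

rng = np.random.default_rng(1)
d_model, d_ff = 8, 16

# A transformer-style MLP block: project up, nonlinearity, project down.
W_in = rng.normal(0, 0.1, (d_ff, d_model))
W_out = rng.normal(0, 0.1, (d_model, d_ff))

def mlp(x, W_in, W_out):
    h = np.maximum(0.0, W_in @ x)   # ReLU hidden layer
    return W_out @ h

x = rng.normal(size=d_model)
before = mlp(x, W_in, W_out)

# Construct a new hidden unit: any input weights, zero output weights.
w_new_in = rng.normal(0, 0.1, (1, d_model))
W_in2 = np.vstack([W_in, w_new_in])
W_out2 = np.hstack([W_out, np.zeros((d_model, 1))])

after = mlp(x, W_in2, W_out2)
assert np.allclose(before, after)   # growth did not change behaviour
```

Zero-initialising the output side is one of several possible schemes; the point is that MLP widening is a purely local change, whereas growing attention heads would alter the shapes of interacting projections.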
Long-term, the growing neural network should develop into a mixture-of-experts. This will add significant complexity, as routers must be updated to reflect newly constructed modules. In the technical implementation, paging will be explored to allow the network to grow without bound.
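A minimal sketch of the router-update problem (the initialisation scheme here is an assumption, not a settled choice): when a new expert module is constructed, the router gains a row, and a strongly negative bias lets the new expert start with near-zero routing probability while existing experts keep their relative routing weights:

```python
import numpy as np

rng = np.random.default_rng(2)
d_model, n_experts = 8, 4

# A linear softmax router over the current experts.
W_r = rng.normal(0, 0.5, (n_experts, d_model))
b_r = np.zeros(n_experts)

def route(x, W_r, b_r):
    logits = W_r @ x + b_r
    p = np.exp(logits - logits.max())
    return p / p.sum()                # softmax over experts

x = rng.normal(size=d_model)
p_before = route(x, W_r, b_r)

# A new expert is constructed: append a router row with small random
# weights and a large negative bias, so it is initially almost never chosen.
W_r2 = np.vstack([W_r, rng.normal(0, 0.01, (1, d_model))])
b_r2 = np.append(b_r, -10.0)

p_after = route(x, W_r2, b_r2)
# Existing experts keep (almost exactly) their relative routing weights.
```

Training can then raise the new expert's routing weight only on the inputs it was constructed for, which is the behaviour a growing mixture-of-experts would need.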
If you would like to help support this work, get in touch.