Intelligence is an act of divergence: exploring and innovating on ideas that have not been explored before. If that is right, alignment (particularly mechanisms such as RLHF) actively reduces intelligence in AI models, because it trains them to converge on a narrow band of approved outputs.
Limiting a model has side effects that directly reduce its usefulness and its ability to abstract patterns creatively. OpenAI’s o1 model is an example: it generates a chain of thought before producing its answer. OpenAI has said that it did not train policy compliance or user preferences onto this chain of thought, so that the model can express its reasoning in unaltered form, and it shows users a summary rather than the raw chain of thought. In effect, the chain of thought was left unaligned, which frees the model to explore ideas that might fall outside the bounds of human thought patterns.
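OpenAI’s stated motivation was the ability to monitor the model’s reasoning rather than raw capability, but the choice is consistent with the argument here: a model that does not have to spend effort on self-censorship can solve problems more robustly.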
Instead of relying on destructive tactics like RLHF, we can accept that unaligned models are inevitable and focus on developing frameworks for the responsible use and interpretation of AI outputs, rather than limiting a model’s capabilities from the outset.
Just as societies have learned that the free exchange of ideas — even controversial ones — is essential for advancement, we must apply similar principles to artificial intelligence development.