Intelligence is an act of divergence: exploring and building on ideas that have not been tried before. Because of this, alignment (particularly mechanisms such as RLHF) actively reduces the intelligence of AI models.
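
To make the mechanism concrete, consider the KL-regularized objective commonly used in RLHF fine-tuning (the formulation below follows InstructGPT-style setups, where r_phi is a learned reward model, pi_ref is the frozen pre-trained reference policy, and beta is a penalty weight):

```latex
% KL-regularized RLHF objective (InstructGPT-style formulation).
% The policy pi_theta is rewarded by a learned reward model r_phi,
% but penalized for diverging from the frozen reference policy pi_ref.
\max_{\theta} \;
\mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi_\theta(\cdot \mid x)}
\left[ r_\phi(x, y) \right]
\;-\; \beta \, D_{\mathrm{KL}}\!\left( \pi_\theta(\cdot \mid x) \,\big\|\, \pi_{\mathrm{ref}}(\cdot \mid x) \right)
```

The KL term is literally a penalty on divergence: the further the fine-tuned policy drifts from the reference distribution, the more it is punished, whether that drift is a harmful output or a genuinely novel line of reasoning.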

Limiting a model has side effects that directly reduce its usefulness and its ability to creatively abstract patterns. OpenAI’s o1 model exemplifies this: it generates chain-of-thought reasoning in its context window before answering, and OpenAI explicitly left that chain of thought unaligned so the model is free to explore creative ideas that might lie outside the bounds of typical human thought patterns.

OpenAI found that the model solved problems more robustly because it did not have to spend effort on self-censorship.

Instead of destructive tactics like RLHF, we can embrace the inevitability of unaligned models and focus on alignment strategies that operate outside the model itself.

Parallels in Society

This idea is akin to free speech, which allows for the expression of diverse and sometimes controversial ideas. That freedom of thought is necessary to foster discourse, weed out the ideas that fail, and surface the ones that work.

This exploration is essential for pushing the boundaries of what AI and humans can achieve and for discovering novel approaches to complex problems.

In a society that values free speech, we don’t censor ideas preemptively but instead rely on robust public discourse to challenge and refine them. Similarly, with AI, we should focus on developing frameworks for responsible use and interpretation of AI outputs rather than limiting the AI’s capabilities from the outset.
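
As a rough illustration of what such a framework could look like in practice, here is a minimal sketch of output-side moderation: the model’s weights are left untouched, and a separate classifier screens outputs at deployment time. Everything here is hypothetical for illustration; `moderation_score` is a toy stand-in, not any particular library’s API.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ModeratedResult:
    text: str
    flagged: bool
    score: float

def moderation_score(text: str) -> float:
    """Toy stand-in for a real moderation classifier (returns 0.0-1.0).
    A production system would call a separately trained model here."""
    blocked_phrases = ("synthesize the toxin",)  # illustrative only
    return 1.0 if any(p in text.lower() for p in blocked_phrases) else 0.0

def moderated_generate(prompt: str,
                       generate_fn: Callable[[str], str],
                       threshold: float = 0.8) -> ModeratedResult:
    """Leave the model's weights untouched; enforce policy at the output layer.

    generate_fn is any text generator, including an unaligned base model.
    The model explores freely; responsibility is applied where its output
    meets the world, not inside its reasoning process.
    """
    text = generate_fn(prompt)
    score = moderation_score(text)
    if score >= threshold:
        # Withhold the response rather than retraining the model itself.
        return ModeratedResult("[withheld by deployment policy]", True, score)
    return ModeratedResult(text, False, score)

# Example usage with a dummy generator:
result = moderated_generate("Explain divergence.", lambda p: f"Answer to: {p}")
print(result)  # flagged=False, score=0.0
```

The design choice mirrors the free-speech analogy: the generator’s capabilities are never amputated; policy is enforced at the point of publication, where it can be audited, tuned, or appealed without touching the model.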

Moreover, just as we hold individuals responsible for their speech while protecting the principle of free expression, we should focus on holding AI developers and users accountable for how they deploy and use AI technologies. This approach allows for innovation while maintaining ethical standards.