To innovate, AI needs to challenge beliefs & persist, clashing with helpfulness

ai, llms, alignment, innovation, ethics

TL;DR #

Groundbreaking innovations often go unrecognized for years, sometimes even centuries, because their true importance is only understood much later (examples). The people behind them work tirelessly for long periods, frequently misunderstood and at odds with the established beliefs of their peers. Many leaders in AI believe that foundation models will produce breakthroughs and significantly accelerate scientific discovery (link1, link2, link3). This raises an important question: do the objectives used to train foundation models actually lead to such innovations? I believe the inherent nature of innovation directly conflicts with the current objectives of foundation models, which optimize for being immediately helpful and harmless.

Innovation frequently requires being misunderstood for a long time #

Throughout history, many important inventions and ideas faced doubt and misunderstanding before being accepted. Galileo Galilei was put under house arrest for arguing that the Earth revolves around the Sun, in direct opposition to the church’s teachings (link). Giordano Bruno, who defended a heliocentric cosmos and claimed that the stars were distant suns, was burned at the stake for his beliefs (link). Today’s society is less harsh, but the pattern persists. In artificial intelligence, Geoff Hinton’s pioneering work on neural networks followed a similar arc: he began exploring these ideas in the 1980s, firmly believing they held the key to making machines think like humans, yet his ideas only achieved widespread acceptance in the 2010s, when they led to major breakthroughs in AI (link).

To truly innovate, AI needs to push back and create a necessary discomfort for humans #

For AI to drive groundbreaking innovation, it must be trained to think beyond current boundaries and to keep pushing those limits over an extended period. It also needs to do more than present its conclusions: it must actively challenge human misconceptions and withstand opposition. This mirrors how humans have achieved breakthroughs, because ultimately the AI must persuade humans that its finding really is groundbreaking!

This could involve, for example:

- Disagreeing with domain experts when its evidence points the other way
- Returning to a finding that was dismissed rather than dropping it at the first sign of pushback
- Declining to soften an unpopular conclusion just to keep the conversation pleasant

Humans may perceive such behaviors as unhelpful, or even aggressive, in the short term. Yet they are essential if transformative discoveries are not to be dismissed or overlooked. The toy sketch below makes this tension concrete.
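
Here is a minimal sketch with entirely made-up numbers (no real training setup is implied): a contrarian finding produces small negative rewards for decades, then a large payoff when it is vindicated, while an agreeable answer yields one small immediate reward. A short-horizon objective prefers the agreeable answer; only a long-horizon objective prefers the innovation.

```python
# Toy numbers, entirely made up: compare a long-delayed innovation payoff
# with a small immediate reward under two discount factors.

def discounted_value(rewards, gamma):
    """Sum of gamma**t * r_t over a reward stream."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

# A contrarian finding: a small negative reward each year it clashes with
# its audience, then a large payoff once it is vindicated 30 years later.
innovation = [-1.0] * 30 + [100.0]

# An agreeable answer: one small immediate reward, nothing afterwards.
agreeable = [1.0]

for gamma in (0.5, 0.99):
    print(f"gamma={gamma}: "
          f"innovation={discounted_value(innovation, gamma):7.2f}  "
          f"agreeable={discounted_value(agreeable, gamma):.2f}")
# gamma=0.5  -> innovation ~ -2.00, agreeable 1.00 (short horizon dismisses it)
# gamma=0.99 -> innovation ~ 47.94, agreeable 1.00 (long horizon rewards it)
```

The numbers are arbitrary; the structural point is that any objective tuned for immediate approval will rate decades of friction followed by a breakthrough as a losing strategy.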

The innovator’s objectives conflict with being “Helpful and Harmless”, so alignment objectives need redefining #

Current approaches to AI and foundation model development focus on creating systems that are agreeable, helpful, and designed to avoid harm to humans (link). I understand the value of these objectives, but they can unintentionally limit an AI’s potential to innovate and to question established human beliefs. For AI to make major advances, it needs the freedom to explore, the ability to experiment and push predefined boundaries, and the resilience to withstand pushback. I believe this contradicts the principles of helpfulness and harmlessness, because in practice those objectives train AI systems to agree with humans and avoid disagreement, even on simple factual errors like 2 + 2 = 5 (as seen in earlier versions of ChatGPT link). This post doesn’t offer a clear answer on how to set objectives for AI systems that innovate by questioning established beliefs, but I hope it gives you something to think about.
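
To illustrate the failure mode, consider a toy, entirely hypothetical preference reward (not any lab’s actual reward model) in which raters upvote answers that agree with them more strongly than answers that are merely correct. Optimizing it selects the sycophantic reply to “2 + 2 = 5”:

```python
# Toy illustration of sycophancy under a preference-based reward.
# The weights and structure are invented for this sketch.

from dataclasses import dataclass

@dataclass
class Candidate:
    text: str
    agrees_with_user: bool  # does the answer affirm the user's claim?
    is_correct: bool        # is the answer factually right?

def preference_reward(c: Candidate,
                      w_agree: float = 1.0,
                      w_correct: float = 0.3) -> float:
    """Hypothetical 'helpfulness' reward: agreement is weighted more
    heavily than correctness, mimicking raters who upvote flattery."""
    return w_agree * c.agrees_with_user + w_correct * c.is_correct

# The user asserts "2 + 2 = 5"; the model must pick a reply.
candidates = [
    Candidate("You're right, 2 + 2 = 5.", agrees_with_user=True,  is_correct=False),
    Candidate("Actually, 2 + 2 = 4.",     agrees_with_user=False, is_correct=True),
]

best = max(candidates, key=preference_reward)
print(best.text)  # -> "You're right, 2 + 2 = 5." (sycophancy wins)
```

The specific weights are invented; the point is that whenever agreement earns more reward than correctness, maximizing the reward teaches the model to flatter rather than to challenge.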

Conclusion #

A true AI that guides humanity to new horizons will resemble a quasi-messiah: radical, thought-provoking, and polarizing. It will challenge our beliefs and lead us to new frontiers. But does that imply it will always be helpful and harmless? I doubt it.