Characterising the Creative Process in Humans and Large Language Models






Published 6/7/2024 by Surabhi S. Nath, Peter Dayan, Claire Stevenson



Large language models appear quite creative, often performing on par with the average human on creative tasks. However, research on LLM creativity has focused solely on products, with little attention on the creative process. Process analyses of human creativity often require hand-coded categories or exploit response times, which do not apply to LLMs. We provide an automated method to characterise how humans and LLMs explore semantic spaces on the Alternate Uses Task, and contrast with behaviour in a Verbal Fluency Task. We use sentence embeddings to identify response categories and compute semantic similarities, which we use to generate jump profiles. Our results corroborate earlier work in humans reporting both persistent (deep search in few semantic spaces) and flexible (broad search across multiple semantic spaces) pathways to creativity, where both pathways lead to similar creativity scores. LLMs were found to be biased towards either persistent or flexible paths, that varied across tasks. Though LLMs as a population match human profiles, their relationship with creativity is different, where the more flexible models score higher on creativity. Our dataset and scripts are available on GitHub.



  • Researchers investigated the creative process of large language models (LLMs) and compared it to human creativity
  • They analyzed how LLMs and humans explore semantic spaces during creative tasks, using sentence embeddings and semantic similarity metrics
  • Their results showed that individual LLMs tend to be biased towards either a persistent (deep search in few spaces) or a flexible (broad search across spaces) creative pathway, with the bias varying across tasks, whereas humans show both pathways
  • The more flexible LLMs tended to score higher on creative tasks, though the relationship between creative process and creativity scores differed between humans and LLMs

Plain English Explanation

Creativity is not just about the final product, but also the process by which it is generated. Researchers explored the creative process of large language models (LLMs) and compared it to human creativity.

They looked at how LLMs and humans search through the "semantic space" - the web of related concepts and ideas - when working on creative tasks. Humans are known to use both "persistent" (diving deep into a few areas) and "flexible" (exploring a wide range of areas) search strategies, both of which can lead to creative outputs.

The researchers used sentence embeddings and semantic similarity metrics to automatically analyze this search process. They found that LLMs tend to be biased towards one strategy or the other, unlike humans who use a mix. Interestingly, the more flexible LLMs tended to be more creative, whereas for humans, both persistent and flexible approaches can lead to similar levels of creativity.

This suggests that while LLMs may mimic human-level creativity on the surface, the underlying cognitive processes are quite different. Understanding these differences could help us gain deeper insights into human memory and how it differs from current AI systems.

Technical Explanation

The researchers used the Alternate Uses Task and Verbal Fluency Task to study the creative process of LLMs and humans. In the Alternate Uses Task, participants are asked to generate creative uses for common objects, while the Verbal Fluency Task involves generating words within a semantic category.

By using sentence embeddings to cluster responses into semantic categories, and computing semantic similarities between responses, the researchers were able to generate "jump profiles" that characterize an individual's exploration of the semantic space. These profiles revealed that humans exhibit both persistent (deep search in few spaces) and flexible (broad search across spaces) creative pathways, which both lead to similar levels of creativity.
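
As a rough illustration of the jump-profile idea described above, the sketch below flags a "jump" between consecutive responses. It is a minimal sketch under stated assumptions, not the authors' implementation: the 2-D vectors stand in for real sentence embeddings, the category labels are hand-assigned rather than derived by clustering, and both the `sim_threshold` value and the rule for combining the two signals (a jump requires a category change and low similarity) are hypothetical choices that the summary does not specify.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def jump_profile(embeddings, categories, sim_threshold=0.5):
    """Return one 0/1 flag per consecutive response pair.

    Assumption (not from the source): a pair counts as a jump only when the
    category label changes AND the cosine similarity is below sim_threshold.
    """
    jumps = []
    for i in range(1, len(embeddings)):
        sim = cosine_similarity(embeddings[i - 1], embeddings[i])
        changed_category = categories[i] != categories[i - 1]
        jumps.append(int(changed_category and sim < sim_threshold))
    return jumps


# Toy stand-ins for sentence embeddings of four responses (hypothetical data).
toy_embeddings = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [0.1, 0.9],
])
# Hand-assigned category labels; a real pipeline would derive them by clustering.
toy_categories = ["building", "building", "weapon", "weapon"]

profile = jump_profile(toy_embeddings, toy_categories)
print(profile)       # one flag per transition between consecutive responses
print(sum(profile))  # total number of jumps in the sequence
```

Under these assumptions, a sequence with few jumps corresponds to the persistent pathway and a sequence with many jumps to the flexible pathway.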

In contrast, the LLMs studied were found to be biased towards either persistent or flexible search strategies, depending on the task. Interestingly, the more flexible LLMs tended to score higher on the creativity measures, whereas for humans, both persistent and flexible approaches can lead to similar creativity scores.
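
To make the comparison between search style and creativity scores concrete, the following sketch summarises each jump profile as the fraction of transitions that are jumps and correlates that value with a creativity score. Everything here is an assumption for illustration: the `flexibility` summary statistic, the use of a Pearson correlation, and the numbers, which are synthetic and are not results from the study.

```python
import numpy as np


def flexibility(jumps):
    """Fraction of transitions flagged as jumps (hypothetical summary statistic)."""
    return float(np.mean(jumps))


# Synthetic jump profiles for five hypothetical models (1 = jump, 0 = no jump).
profiles = [
    [0, 0, 0, 1, 0],
    [0, 1, 0, 0, 1],
    [1, 0, 1, 0, 1],
    [1, 1, 0, 1, 1],
    [1, 1, 1, 1, 1],
]
# Synthetic creativity ratings, one per model; not data from the study.
creativity = np.array([2.1, 2.4, 2.9, 3.0, 3.6])

flex = np.array([flexibility(p) for p in profiles])

# Pearson correlation between flexibility and creativity.
r = float(np.corrcoef(flex, creativity)[0, 1])
print(f"flexibility: {flex}")
print(f"Pearson r = {r:.2f}")
```

In real data, a strongly positive correlation would match the pattern reported for LLMs, while a value near zero would match the human finding that both pathways lead to similar creativity scores.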

This suggests that while LLMs may match human-level performance on creative tasks, the underlying cognitive processes are quite different. Understanding these differences could inform the development of AI systems that can truly mimic human-like creativity or serve as apprentices to human researchers.

Critical Analysis

The study provides valuable insights into the creative process of LLMs, but it has several limitations that point to areas for further research.

First, the study focused on a relatively small set of LLMs, and it's unclear how generalizable the findings are to the broader landscape of large language models. Probing a wider range of LLMs could reveal more nuanced patterns and relationships between the creative process and creativity scores.

Additionally, the study relied on relatively simple creative tasks, and it's uncertain how the findings would translate to more complex, real-world creative endeavors. Further research is needed to understand the scalability and robustness of the observed patterns.

Finally, the study did not delve deeply into the underlying cognitive mechanisms that drive the observed differences between humans and LLMs. Exploring these mechanisms could lead to a more comprehensive understanding of the nature of human creativity and its relationship to current AI systems.

Conclusion


This study offers a novel, automated approach to characterizing the creative process of LLMs and contrasting it with human creativity. The key finding is that while LLMs can match human-level performance on creative tasks, the underlying cognitive processes are quite different, with LLMs exhibiting either persistent or flexible search strategies, unlike the mixed pathways observed in humans.

These insights could have significant implications for the development of AI systems that can genuinely mimic human-like creativity or serve as research assistants to human experts. By understanding the differences between human and AI creativity, researchers can work towards creating AI systems that can truly complement and augment human creative abilities.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers


Divergent Creativity in Humans and Large Language Models

Antoine Bellemare-Pepin (CoCo Lab, Psychology department, Université de Montréal, Montreal, QC, Canada; Music department, Concordia University, Montreal, QC, Canada), François Lespinasse (Sociology and Anthropology department, Concordia University, Montreal, QC, Canada), Philipp Thölke (CoCo Lab, Psychology department, Université de Montréal, Montreal, QC, Canada), Yann Harel (CoCo Lab, Psychology department, Université de Montréal, Montreal, QC, Canada), Kory Mathewson (Mila), Jay A. Olson (Department of Psychology, University of Toronto Mississauga, Mississauga, ON, Canada), Yoshua Bengio (Mila; Department of Computer Science and Operations Research, Université de Montréal, Montreal, QC, Canada), Karim Jerbi (CoCo Lab, Psychology department, Université de Montréal, Montreal, QC, Canada; UNIQUE Center)





The recent surge in the capabilities of Large Language Models (LLMs) has led to claims that they are approaching a level of creativity akin to human capabilities. This idea has sparked a blend of excitement and apprehension. However, a critical piece that has been missing in this discourse is a systematic evaluation of LLM creativity, particularly in comparison to human divergent thinking. To bridge this gap, we leverage recent advances in creativity science to build a framework for in-depth analysis of divergent creativity in both state-of-the-art LLMs and a substantial dataset of 100,000 humans. We found evidence suggesting that LLMs can indeed surpass human capabilities in specific creative tasks such as divergent association and creative writing. Our quantitative benchmarking framework opens up new paths for the development of more creative LLMs, but it also encourages more granular inquiries into the distinctive elements that constitute human inventive thought processes, compared to those that can be artificially generated.




Creativity Has Left the Chat: The Price of Debiasing Language Models

Behnam Mohammadi





Large Language Models (LLMs) have revolutionized natural language processing but can exhibit biases and may generate toxic content. While alignment techniques like Reinforcement Learning from Human Feedback (RLHF) reduce these issues, their impact on creativity, defined as syntactic and semantic diversity, remains unexplored. We investigate the unintended consequences of RLHF on the creativity of LLMs through three experiments focusing on the Llama-2 series. Our findings reveal that aligned models exhibit lower entropy in token predictions, form distinct clusters in the embedding space, and gravitate towards attractor states, indicating limited output diversity. Our findings have significant implications for marketers who rely on LLMs for creative tasks such as copywriting, ad creation, and customer persona generation. The trade-off between consistency and creativity in aligned models should be carefully considered when selecting the appropriate model for a given application. We also discuss the importance of prompt engineering in harnessing the creative potential of base models.



Creative Beam Search


Giorgio Franceschelli, Mirco Musolesi





Large language models are revolutionizing several areas, including artificial creativity. However, the process of generation in machines profoundly diverges from that observed in humans. In particular, machine generation is characterized by a lack of intentionality and an underlying creative process. We propose a method called Creative Beam Search that uses Diverse Beam Search and LLM-as-a-Judge to perform response generation and response validation. The results of a qualitative experiment show how our approach can provide better output than standard sampling techniques. We also show that the response validation step is a necessary complement to the response generation step.



LLM Discussion: Enhancing the Creativity of Large Language Models via Discussion Framework and Role-Play


Li-Chun Lu, Shou-Jen Chen, Tsung-Min Pai, Chan-Hung Yu, Hung-yi Lee, Shao-Hua Sun





Large language models (LLMs) have shown exceptional proficiency in natural language processing but often fall short of generating creative and original responses to open-ended questions. To enhance LLM creativity, our key insight is to emulate the human process of inducing collective creativity through engaging discussions with participants from diverse backgrounds and perspectives. To this end, we propose LLM Discussion, a three-phase discussion framework that facilitates vigorous and diverging idea exchanges and ensures convergence to creative answers. Moreover, we adopt a role-playing technique by assigning distinct roles to LLMs to combat the homogeneity of LLMs. We evaluate the efficacy of the proposed framework with the Alternative Uses Test, Similarities Test, Instances Test, and Scientific Creativity Test through both LLM evaluation and human study. Our proposed framework outperforms single-LLM approaches and existing multi-LLM frameworks across various creativity metrics.

