ChatGPT as an inventor: Eliciting the strengths and weaknesses of current large language models against humans in engineering design

2404.18479

Published 4/30/2024 by Daniel Nygård Ege, Henrik H. Øvrebø, Vegar Stubberud, Martin Francis Berg, Christer Elverum, Martin Steinert, Håvard Vestad

cs.HC


Abstract

This study compares the design practices and performance of ChatGPT 4.0, a large language model (LLM), against graduate engineering students in a 48-hour prototyping hackathon, based on a dataset comprising more than 100 prototypes. The LLM participated by instructing two participants, who executed its instructions and provided objective feedback; the LLM generated ideas autonomously and made all design decisions without human intervention. The LLM exhibited similar prototyping practices to human participants and finished second among six teams, successfully designing and providing building instructions for functional prototypes. The LLM's concept generation capabilities were particularly strong. However, the LLM prematurely abandoned promising concepts when facing minor difficulties, added unnecessary complexity to designs, and experienced design fixation. Communication between the LLM and participants was challenging due to vague or unclear descriptions, and the LLM had difficulty maintaining continuity and relevance in its answers. Based on these findings, six recommendations for implementing an LLM like ChatGPT in the design process are proposed, including leveraging it for ideation, ensuring human oversight for key decisions, implementing iterative feedback loops, prompting it to consider alternatives, and assigning specific and manageable tasks at a subsystem level.


Overview

This study compares the design practices and performance of the large language model (LLM) ChatGPT 4.0 against graduate engineering students in a 48-hour prototyping hackathon.
The study draws on a dataset of over 100 prototypes. The LLM participated by instructing two participants, who executed its instructions and provided feedback.
The LLM exhibited similar prototyping practices to human participants and finished second among six teams, successfully designing and providing building instructions for functional prototypes.
However, the LLM also faced some challenges, such as premature abandonment of promising concepts, addition of unnecessary complexity, and design fixation.
Communication between the LLM and participants was challenging due to vague or unclear descriptions, and the LLM had difficulty maintaining continuity and relevance in its answers.

Plain English Explanation

In this study, researchers wanted to see how well the large language model ChatGPT would perform in a design challenge compared to human engineering students. They set up a 48-hour prototyping hackathon and had the LLM participate by giving instructions to two people, who then built the prototypes based on those instructions.

The results showed that the LLM was able to come up with good design concepts and provide building instructions that led to functional prototypes. Its team finished second out of six, placing it among the stronger performers in the competition. This suggests that LLMs like ChatGPT could be useful tools for the design process, particularly in the early stages of ideation and concept generation.

However, the LLM also had some limitations. It tended to give up on promising ideas too easily when faced with minor challenges, and it sometimes added unnecessary complexity to the designs. The communication between the LLM and the human participants was also tricky, with the LLM sometimes giving unclear or irrelevant instructions.

Overall, the study suggests that LLMs could be a valuable addition to the design process, but they would need to be used carefully and with human oversight to address these challenges. The researchers provide some recommendations, like having humans review the LLM's work, giving it more specific and manageable tasks, and implementing feedback loops to help it improve.

Technical Explanation

The study conducted a comparative analysis of the design practices and performance of the large language model (LLM) ChatGPT 4.0 and graduate engineering students in a 48-hour prototyping hackathon. The researchers used a dataset comprising more than 100 prototypes generated during the event.

For the study, the LLM participated by instructing two participants who executed its instructions and provided objective feedback. The LLM also generated ideas autonomously and made all design decisions without human intervention.

The results showed that the LLM exhibited similar prototyping practices to human participants and finished second among six teams, successfully designing and providing building instructions for functional prototypes. The LLM's concept generation capabilities were particularly strong, suggesting its potential as a useful tool for ideation.

However, the LLM also faced some challenges. It tended to prematurely abandon promising concepts when facing minor difficulties, added unnecessary complexity to designs, and experienced design fixation. Additionally, communication between the LLM and participants was challenging due to vague or unclear descriptions, and the LLM had difficulty maintaining continuity and relevance in its answers.

Based on these findings, the researchers propose six recommendations for implementing an LLM like ChatGPT in the design process, including leveraging it for ideation, ensuring human oversight for key decisions, implementing iterative feedback loops, prompting it to consider alternatives, and assigning specific and manageable tasks at a subsystem level.

Critical Analysis

The study provides valuable insights into the potential and limitations of using large language models like ChatGPT in the design process. The researchers acknowledge that the LLM's performance was impressive in some areas, such as concept generation, but also highlight the challenges it faced, such as premature abandonment of ideas and difficulty with communication and continuity.

One potential limitation of the study is the relatively small sample size, with only six teams participating in the hackathon. While the dataset of over 100 prototypes provides a good basis for analysis, expanding the study to include a larger number of teams and LLM participants could help strengthen the findings and provide more robust conclusions.

Additionally, the study focuses on a single LLM, ChatGPT 4.0, which may have limitations or quirks that are not representative of all LLMs. It would be interesting to see how other models, including newer versions of ChatGPT and LLMs from other developers, perform in similar design challenges, as their capabilities and limitations may differ.

Overall, the study presents a thoughtful and nuanced exploration of the role of LLMs in the design process. The researchers' recommendations for leveraging the strengths of LLMs while addressing their limitations are valuable for researchers and practitioners looking to integrate these powerful AI models into their design workflows.

Conclusion

This study provides an insightful comparison of the design practices and performance of the large language model ChatGPT 4.0 and human engineering students in a prototyping hackathon. The results suggest that LLMs like ChatGPT have significant potential to contribute to the design process, particularly in the ideation and concept generation stages. However, the study also highlights the challenges that LLMs face, such as premature abandonment of promising ideas, addition of unnecessary complexity, and difficulties in communication and maintaining continuity.

The researchers' recommendations for effectively leveraging LLMs in design, such as ensuring human oversight, implementing feedback loops, and assigning specific and manageable tasks, offer a valuable roadmap for researchers and practitioners interested in integrating these powerful AI models into their design workflows. As the capabilities of LLMs continue to evolve, studies like this will be crucial in guiding the responsible and effective implementation of these technologies in design and other creative domains.

This summary was produced with help from an AI and may contain inaccuracies; consult the original source documents to verify its claims.

Related Papers


Beyond Code Generation: An Observational Study of ChatGPT Usage in Software Engineering Practice

Ranim Khojah, Mazen Mohamad, Philipp Leitner, Francisco Gomes de Oliveira Neto

Large Language Models (LLMs) are frequently discussed in academia and the general public as support tools for virtually any use case that relies on the production of text, including software engineering. Currently there is much debate, but little empirical evidence, regarding the practical usefulness of LLM-based tools such as ChatGPT for engineers in industry. We conduct an observational study of 24 professional software engineers who used ChatGPT in their jobs over a period of one week, and qualitatively analyse their dialogues with the chatbot as well as their overall experience (as captured by an exit survey). We find that, rather than expecting ChatGPT to generate ready-to-use software artifacts (e.g., code), practitioners more often use ChatGPT to receive guidance on how to solve their tasks or learn about a topic in more abstract terms. We also propose a theoretical framework for how (i) purpose of the interaction, (ii) internal factors (e.g., the user's personality), and (iii) external factors (e.g., company policy) together shape the experience (in terms of perceived usefulness and trust). We envision that our framework can be used by future research to further the academic discussion on LLM usage by software engineering practitioners, and to serve as a reference point for the design of future empirical LLM research in this domain.

4/24/2024

cs.SE cs.AI cs.CL cs.HC cs.LG


Evaluation of ChatGPT Usability as A Code Generation Tool

Tanha Miah, Hong Zhu

With the rapid advance of machine learning (ML) technology, large language models (LLMs) are increasingly explored as an intelligent tool to generate program code from natural language specifications. However, existing evaluations of LLMs have focused on their capabilities in comparison with humans. It is desirable to evaluate their usability when deciding whether to use an LLM in software production. This paper proposes a user-centric method. It includes metadata in the test cases of a benchmark to describe their usages, conducts testing in a multi-attempt process that mimics the uses of LLMs, measures LLM-generated solutions on a set of quality attributes that reflect usability, and evaluates the performance based on user experiences in the uses of LLMs as a tool. The paper reports an application of the method in the evaluation of ChatGPT usability as a code generation tool for the R programming language. Our experiments demonstrated that ChatGPT is highly useful for generating R program code although it may fail on hard programming tasks. The user experiences are good, with an overall average number of attempts of 1.61 and an average time of completion of 47.02 seconds. Our experiments also found that the weakest aspect of usability is conciseness, which has a score of 3.80 out of 5. Our experiment also shows that it is hard for human developers to learn from experience to improve the skill of using ChatGPT to generate code.

4/10/2024

cs.SE cs.AI

A Linguistic Comparison between Human and ChatGPT-Generated Conversations

Morgan Sandler, Hyesun Choung, Arun Ross, Prabu David

This study explores linguistic differences between human and LLM-generated dialogues, using 19.5K dialogues generated by ChatGPT-3.5 as a companion to the EmpathicDialogues dataset. The research employs Linguistic Inquiry and Word Count (LIWC) analysis, comparing ChatGPT-generated conversations with human conversations across 118 linguistic categories. Results show greater variability and authenticity in human dialogues, but ChatGPT excels in categories such as social processes, analytical style, cognition, attentional focus, and positive emotional tone, reinforcing recent findings of LLMs being more human than human. However, no significant difference was found in positive or negative affect between ChatGPT and human dialogues. Classifier analysis of dialogue embeddings indicates implicit coding of the valence of affect despite no explicit mention of affect in the conversations. The research also contributes a novel, companion ChatGPT-generated dataset of conversations between two independent chatbots, which were designed to replicate a corpus of human conversations available for open access and used widely in AI research on language modeling. Our findings enhance understanding of ChatGPT's linguistic capabilities and inform ongoing efforts to distinguish between human and LLM-generated text, which is critical in detecting AI-generated fakes, misinformation, and disinformation.

4/29/2024

cs.CL cs.AI cs.CY

Toward Automated Programming for Robotic Assembly Using ChatGPT

Annabella Macaluso, Nicholas Cote, Sachin Chitta

Despite significant technological advancements, the process of programming robots for adaptive assembly remains labor-intensive, demanding expertise in multiple domains and often resulting in task-specific, inflexible code. This work explores the potential of Large Language Models (LLMs), like ChatGPT, to automate this process, leveraging their ability to understand natural language instructions, generalize examples to new tasks, and write code. In this paper, we suggest how these abilities can be harnessed and applied to real-world challenges in the manufacturing industry. We present a novel system that uses ChatGPT to automate the process of programming robots for adaptive assembly by decomposing complex tasks into simpler subtasks, generating robot control code, executing the code in a simulated workcell, and debugging syntax and control errors, such as collisions. We outline the architecture of this system and strategies for task decomposition and code generation. Finally, we demonstrate how our system can autonomously program robots for various assembly tasks in a real-world project.

5/15/2024

cs.RO