Question answering system of bridge design specification based on large language model

Read original: arXiv:2408.13282 - Published 8/27/2024 by Leye Zhang, Xiangxiang Tian, Hongjun Zhang

Overview

The paper constructs a question-answering system for bridge design specifications using large language models.
Three implementation approaches are explored: full fine-tuning of a pre-trained BERT model, parameter-efficient fine-tuning of BERT, and building a language model from scratch.
The system is trained on a custom dataset of question-answer pairs related to bridge design specifications.
The paper evaluates the performance of the different models on training, validation, and test datasets.

Plain English Explanation

This research paper describes a system that can answer questions about bridge design specifications. The researchers used large language models, which are powerful AI models trained on vast amounts of text data, as the foundation for their question-answering system.

The researchers tried three different approaches to building the system:

Full fine-tuning of BERT: They took a pre-trained BERT model (a widely used language model) and updated all of its parameters on the bridge design question-answer dataset.
Parameter-efficient fine-tuning of BERT: They fine-tuned the BERT model while training only a small fraction of its parameters, leaving most of the pre-trained weights unchanged.
Building a language model from scratch: They created their own language model from the ground up, rather than using a pre-trained one.

The researchers trained and evaluated the performance of these three models on a custom dataset of questions and answers related to bridge design specifications. The results showed that the fully fine-tuned BERT model achieved 100% accuracy on the training, validation, and test datasets, meaning it could accurately extract the correct answers from the bridge design specifications. The other two approaches performed well on the training data, but their ability to generalize to new, unseen data (the test dataset) was not as strong.

The key insight from this research is that using a powerful pre-trained language model like BERT and fine-tuning it extensively on a specific domain (in this case, bridge design) can lead to a highly accurate question-answering system for that domain. This could be a useful approach for building intelligent assistants or knowledge-retrieval systems in other professional or technical fields.

Technical Explanation

The researchers constructed a question-answering system for bridge design specifications using large language models. They explored three different implementation schemes:

Full fine-tuning of the BERT pre-trained model: The researchers took the BERT model, which had been pre-trained on a large corpus of general text data, and updated all of its parameters on the bridge design question-answer dataset. This allowed the model to learn the specific patterns and language used in bridge design specifications.
Parameter-efficient fine-tuning of the BERT pre-trained model: In this approach, the researchers trained only a small fraction of the parameters and kept most of the pre-trained weights frozen. This can be useful for situations where computational resources are limited.
Self-built language model from scratch: The researchers also built their own language model from the ground up, rather than using a pre-trained model as a starting point.
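To make the difference between full and parameter-efficient fine-tuning concrete, here is a minimal sketch that counts trainable weights for a single weight matrix under each scheme. The summary does not say which parameter-efficient method the paper uses, so the low-rank (LoRA-style) update below is an assumption chosen for illustration, as are the 768 x 768 matrix size (the hidden size of BERT-base) and the rank of 8.

```python
def full_update_params(d_in: int, d_out: int) -> int:
    """Trainable weights when the whole d_out x d_in matrix is updated."""
    return d_in * d_out


def low_rank_update_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable weights when the matrix is frozen and only a low-rank
    correction B @ A is learned (B is d_out x rank, A is rank x d_in)."""
    if rank <= 0:
        raise ValueError("rank must be positive")
    return rank * (d_in + d_out)


# Assumed sizes for illustration only (not taken from the paper).
HIDDEN = 768  # hidden size of BERT-base
RANK = 8      # hypothetical low-rank dimension

full = full_update_params(HIDDEN, HIDDEN)
low_rank = low_rank_update_params(HIDDEN, HIDDEN, RANK)
print(full, low_rank, f"{low_rank / full:.1%}")
```

Under these assumptions, the low-rank variant trains 12,288 weights for that matrix instead of 589,824, roughly 2% of the full count. The saving in a real model depends on which layers are adapted and on the method actually used.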

For all three approaches, the researchers used a custom dataset of question-answer pairs related to bridge design specifications. They built and trained the models with the TensorFlow and Keras deep learning frameworks. Each model is trained to predict the start and end positions of the answer within the bridge design specification text supplied by the user.
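Predicting start and end positions means the model scores every token as a possible answer start and as a possible answer end, and the answer is the span between the chosen pair. The sketch below shows one common way to turn such scores into an answer; it is not taken from the paper. The function names, the `max_answer_len` limit, and the toy tokens and scores are all hypothetical.

```python
from typing import List, Tuple


def best_answer_span(
    start_scores: List[float],
    end_scores: List[float],
    max_answer_len: int = 30,
) -> Tuple[int, int]:
    """Return the (start, end) token indices with the highest combined
    score, subject to start <= end and a maximum span length."""
    if not start_scores or len(start_scores) != len(end_scores):
        raise ValueError("score lists must be non-empty and the same length")
    best = (0, 0)
    best_score = float("-inf")
    for s, s_score in enumerate(start_scores):
        # Only consider ends at or after the start, within the length limit.
        last = min(len(end_scores), s + max_answer_len)
        for e in range(s, last):
            score = s_score + end_scores[e]
            if score > best_score:
                best_score = score
                best = (s, e)
    return best


def extract_answer(
    tokens: List[str], start_scores: List[float], end_scores: List[float]
) -> str:
    """Join the tokens of the best-scoring span into an answer string."""
    s, e = best_answer_span(start_scores, end_scores)
    return " ".join(tokens[s : e + 1])


# Toy example: the sentence and the scores are invented for illustration.
tokens = ["the", "design", "service", "life", "is", "100", "years"]
start_scores = [0.1, 0.0, 0.2, 0.1, 0.0, 2.5, 0.3]
end_scores = [0.0, 0.1, 0.0, 0.2, 0.1, 0.4, 2.8]
print(extract_answer(tokens, start_scores, end_scores))
```

Searching over pairs instead of taking the two highest scores independently guarantees that the end never comes before the start.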

The experimental results showed that the fully fine-tuned BERT model achieved 100% accuracy on the training, validation, and test datasets. This means the model was able to accurately extract the correct answers from the bridge design specifications when given user questions. The other two approaches (parameter-efficient fine-tuning and self-built language model) performed well on the training data, but their generalization ability on the test dataset was not as strong.
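The summary does not state how accuracy is computed. A plausible reading, assumed here, is exact match: a prediction counts as correct only when both the start and end positions equal the labeled ones. The sketch below computes that metric and the gap between training and test accuracy, which is one simple way to quantify weak generalization. All spans are made-up toy data, not results from the paper.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start position, end position)


def exact_match_accuracy(predicted: List[Span], gold: List[Span]) -> float:
    """Fraction of examples where both start and end positions match."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("at least one example is required")
    hits = sum(1 for p, g in zip(predicted, gold) if p == g)
    return hits / len(gold)


# Toy spans invented for illustration.
gold_train = [(3, 5), (10, 12), (0, 1), (7, 7)]
pred_train = [(3, 5), (10, 12), (0, 1), (7, 7)]
gold_test = [(2, 4), (8, 9), (5, 6), (1, 3)]
pred_test = [(2, 4), (8, 10), (5, 6), (0, 3)]

train_acc = exact_match_accuracy(pred_train, gold_train)
test_acc = exact_match_accuracy(pred_test, gold_test)
generalization_gap = train_acc - test_acc
print(train_acc, test_acc, generalization_gap)
```

A model that fits the training data but generalizes poorly shows a large gap, as in this toy case; a model that reaches the same accuracy on all splits shows a gap of zero.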

Critical Analysis

The paper provides a useful reference for developing question-answering systems in specialized domains, such as bridge design. The researchers' decision to use a pre-trained language model like BERT as a starting point, and then extensively fine-tune it on domain-specific data, appears to be a successful strategy for achieving high performance.

However, the paper does not provide much detail on the size and quality of the custom dataset used for training and evaluation. The size and representativeness of the dataset can have a significant impact on the model's performance, especially its ability to generalize to new, unseen data. More information on the dataset, such as the number of questions, the coverage of different bridge design topics, and the diversity of the language used, would have been helpful.

Additionally, the paper does not discuss any potential limitations or caveats of the research. For example, it's unclear how the system would perform on more complex, open-ended questions that require deeper reasoning or domain-specific knowledge beyond what is contained in the bridge design specifications. Further research would be needed to understand the system's limitations and explore ways to address them.

Overall, the paper demonstrates a promising approach to building domain-specific question-answering systems using large language models. However, additional research and validation would be needed to fully assess the robustness and generalizability of the proposed system.

Conclusion

This research paper presents a system for answering questions about bridge design specifications using large language models. The key findings are:

Full fine-tuning of a pre-trained BERT model achieved 100% accuracy on the training, validation, and test datasets, demonstrating the power of extensively adapting a general language model to a specific domain.
Parameter-efficient fine-tuning of BERT and building a language model from scratch also performed well on the training data, but their ability to generalize to new, unseen data was not as strong.

The research provides a useful reference for developing question-answering systems in other professional or technical domains. By leveraging the capabilities of large language models and fine-tuning them on domain-specific data, it's possible to create highly accurate systems that can extract relevant information to answer user questions. This could have valuable applications in areas like engineering, healthcare, or law, where quick access to specialized knowledge is important.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Question answering system of bridge design specification based on large language model

Leye Zhang, Xiangxiang Tian, Hongjun Zhang

This paper constructs question answering system for bridge design specification based on large language model. Three implementation schemes are tried: full fine-tuning of the Bert pretrained model, parameter-efficient fine-tuning of the Bert pretrained model, and self-built language model from scratch. Through the self-built question and answer task dataset, based on the tensorflow and keras deep learning platform framework, the model is constructed and trained to predict the start position and end position of the answer in the bridge design specification given by the user. The experimental results show that full fine-tuning of the Bert pretrained model achieves 100% accuracy in the training-dataset, validation-dataset and test-dataset, and the system can extract the answers from the bridge design specification given by the user to answer various questions of the user; While parameter-efficient fine-tuning of the Bert pretrained model and self-built language model from scratch perform well in the training-dataset, their generalization ability in the test-dataset needs to be improved. The research of this paper provides a useful reference for the development of question answering system in professional field.

8/27/2024

ColBERT Retrieval and Ensemble Response Scoring for Language Model Question Answering

Alex Gichamba, Tewodros Kederalah Idris, Brian Ebiyau, Eric Nyberg, Teruko Mitamura

Domain-specific question answering remains challenging for language models, given the deep technical knowledge required to answer questions correctly. This difficulty is amplified for smaller language models that cannot encode as much information in their parameters as larger models. The Specializing Large Language Models for Telecom Networks challenge aimed to enhance the performance of two small language models, Phi-2 and Falcon-7B in telecommunication question answering. In this paper, we present our question answering systems for this challenge. Our solutions achieved leading marks of 81.9% accuracy for Phi-2 and 57.3% for Falcon-7B. We have publicly released our code and fine-tuned models.

8/21/2024

Using Pretrained Large Language Model with Prompt Engineering to Answer Biomedical Questions

Wenxin Zhou, Thuy Hang Ngo

Our team participated in the BioASQ 2024 Task12b and Synergy tasks to build a system that can answer biomedical questions by retrieving relevant articles and snippets from the PubMed database and generating exact and ideal answers. We propose a two-level information retrieval and question-answering system based on pre-trained large language models (LLM), focused on LLM prompt engineering and response post-processing. We construct prompts with in-context few-shot examples and utilize post-processing techniques like resampling and malformed response detection. We compare the performance of various pre-trained LLM models on this challenge, including Mixtral, OpenAI GPT and Llama2. Our best-performing system achieved 0.14 MAP score on document retrieval, 0.05 MAP score on snippet retrieval, 0.96 F1 score for yes/no questions, 0.38 MRR score for factoid questions and 0.50 F1 score for list questions in Task 12b.

7/10/2024

Student Answer Forecasting: Transformer-Driven Answer Choice Prediction for Language Learning

Elena Grazia Gado, Tommaso Martorella, Luca Zunino, Paola Mejia-Domenzain, Vinitra Swamy, Jibril Frej, Tanja Kaser

Intelligent Tutoring Systems (ITS) enhance personalized learning by predicting student answers to provide immediate and customized instruction. However, recent research has primarily focused on the correctness of the answer rather than the student's performance on specific answer choices, limiting insights into students' thought processes and potential misconceptions. To address this gap, we present MCQStudentBert, an answer forecasting model that leverages the capabilities of Large Language Models (LLMs) to integrate contextual understanding of students' answering history along with the text of the questions and answers. By predicting the specific answer choices students are likely to make, practitioners can easily extend the model to new answer choices or remove answer choices for the same multiple-choice question (MCQ) without retraining the model. In particular, we compare MLP, LSTM, BERT, and Mistral 7B architectures to generate embeddings from students' past interactions, which are then incorporated into a finetuned BERT's answer-forecasting mechanism. We apply our pipeline to a dataset of language learning MCQ, gathered from an ITS with over 10,000 students to explore the predictive accuracy of MCQStudentBert, which incorporates student interaction patterns, in comparison to correct answer prediction and traditional mastery-learning feature-based approaches. This work opens the door to more personalized content, modularization, and granular support.

5/31/2024