MathGenie: Generating Synthetic Data with Question Back-translation for Enhancing Mathematical Reasoning of LLMs

Read original: arXiv:2402.16352 - Published 9/12/2024 by Zimu Lu, Aojun Zhou, Houxing Ren, Ke Wang, Weikang Shi, Junting Pan, Mingjie Zhan, Hongsheng Li

MathGenie: Generating Synthetic Data with Question Back-translation for Enhancing Mathematical Reasoning of LLMs

Overview

This paper presents MathGenie, a system for generating synthetic data to enhance the mathematical reasoning capabilities of large language models (LLMs).
MathGenie uses a question back-translation approach, where existing questions are translated back and forth between natural language and mathematical representations.
The goal is to create a diverse set of synthetic training examples that can help LLMs better understand and solve mathematical problems.

Plain English Explanation

The researchers behind MathGenie have come up with a clever way to improve the mathematical abilities of large language models (LLMs) - the powerful AI systems that can understand and generate human-like text. The key insight is that by generating synthetic data for training, LLMs can become better at solving mathematical problems.

The way MathGenie works is by back-translating existing mathematical questions. First, a question is translated from natural language into a more formal, mathematical representation. Then, that representation is translated back into natural language, creating a new question that is related to the original but slightly different.

This back-and-forth translation process generates a large number of synthetic questions that LLMs can use for training. By exposing the models to this diverse set of mathematical problems, the researchers hope to enhance the models' mathematical reasoning capabilities - allowing them to better understand and solve a wide range of math-related tasks.

The key advantage of this approach is that it can create large quantities of high-quality training data without the need for expensive human annotation. By tapping into the wealth of existing mathematical questions, MathGenie can efficiently generate the synthetic examples needed to push the boundaries of what LLMs can do when it comes to mathematical reasoning.

Technical Explanation

The core of the MathGenie system is a question back-translation approach, where existing mathematical questions are translated back and forth between natural language and a more formal, mathematical representation.

First, a question is encoded using a pre-trained Transformer-based sequence-to-sequence model, converting the natural language into a symbolic, mathematical form. This mathematical representation is then decoded back into natural language using another Transformer model, creating a new question that is related to the original but with slight variations.

By iterating this back-and-forth translation process, MathGenie can generate a large and diverse set of synthetic training examples for LLMs. The researchers hypothesize that exposing the models to this wider range of mathematical problems will enhance their overall mathematical reasoning capabilities, allowing them to better understand and solve a variety of math-related tasks.

The experiments conducted in the paper demonstrate the effectiveness of this approach. LLMs trained on the synthetic data generated by MathGenie showed significant improvements in their mathematical reasoning abilities compared to models trained on standard datasets alone.

Critical Analysis

The MathGenie approach represents a novel and promising direction for improving the mathematical reasoning capabilities of LLMs. By generating synthetic data through a question back-translation process, the researchers have found a way to efficiently create large quantities of high-quality training examples without the need for expensive human annotation.

However, the paper does note some potential limitations and areas for further research. For example, the back-translation process may not always result in mathematically valid or semantically meaningful questions, which could introduce noise into the training data. Additionally, the paper does not explore the long-term robustness of the models trained with MathGenie, or how well the improvements in mathematical reasoning generalize to real-world applications.

Furthermore, while the paper demonstrates the effectiveness of MathGenie on a range of mathematical tasks, it would be valuable to see how the approach performs on even more diverse and challenging problem sets, particularly those that require advanced mathematical thinking and reasoning.

Despite these potential areas for improvement, the MathGenie system represents a significant step forward in the quest to enhance the mathematical capabilities of LLMs. By leveraging innovative techniques like question back-translation, the researchers have opened up new avenues for improving the self-improving code-assisted mathematical reasoning of these powerful AI models.

Conclusion

The MathGenie system presented in this paper offers a novel approach for generating synthetic data to enhance the mathematical reasoning capabilities of large language models (LLMs). By using a question back-translation technique, the researchers have found a way to efficiently create a diverse set of training examples that can help push the boundaries of what LLMs can do when it comes to solving mathematical problems.

The experiments conducted in the paper demonstrate the effectiveness of this approach, with LLMs trained on the synthetic data generated by MathGenie showing significant improvements in their mathematical reasoning abilities. While the paper highlights some potential limitations and areas for further research, the overall impact of this work is clear: MathGenie represents an important step forward in the ongoing effort to explore the mathematical extrapolation capabilities of large language models and unlock their full potential for tackling a wide range of mathematical tasks.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

MathGenie: Generating Synthetic Data with Question Back-translation for Enhancing Mathematical Reasoning of LLMs

Zimu Lu, Aojun Zhou, Houxing Ren, Ke Wang, Weikang Shi, Junting Pan, Mingjie Zhan, Hongsheng Li

Large language models (LLMs) have exhibited great potential in mathematical reasoning. However, there remains a performance gap in this area between existing open-source models and closed-source models such as GPT-4. In this paper, we introduce MathGenie, a novel method for generating diverse and reliable math problems from a small-scale problem-solution dataset (denoted as seed data). We augment the ground-truth solutions of our seed data and train a back-translation model to translate the augmented solutions back into new questions. Subsequently, we generate code-integrated solutions for the new questions. To ensure the correctness of the code-integrated solutions, we employ rationale-based strategy for solution verification. Various pretrained models, ranging from 7B to 70B, are trained on the newly curated data to test the effectiveness of the proposed augmentation technique, resulting in a family of models known as MathGenieLM. These models consistently outperform previous open-source models across five representative mathematical reasoning datasets, achieving state-of-the-art performance. In particular, MathGenieLM-InternLM2 achieves an accuracy of 87.7% on GSM8K and 55.7% on MATH, securing the best overall score among open-source language models.

9/12/2024

🏋️

JiuZhang3.0: Efficiently Improving Mathematical Reasoning by Training Small Data Synthesis Models

Kun Zhou, Beichen Zhang, Jiapeng Wang, Zhipeng Chen, Wayne Xin Zhao, Jing Sha, Zhichao Sheng, Shijin Wang, Ji-Rong Wen

Mathematical reasoning is an important capability of large language models~(LLMs) for real-world applications. To enhance this capability, existing work either collects large-scale math-related texts for pre-training, or relies on stronger LLMs (eg GPT-4) to synthesize massive math problems. Both types of work generally lead to large costs in training or synthesis. To reduce the cost, based on open-source available texts, we propose an efficient way that trains a small LLM for math problem synthesis, to efficiently generate sufficient high-quality pre-training data. To achieve it, we create a dataset using GPT-4 to distill its data synthesis capability into the small LLM. Concretely, we craft a set of prompts based on human education stages to guide GPT-4, to synthesize problems covering diverse math knowledge and difficulty levels. Besides, we adopt the gradient-based influence estimation method to select the most valuable math-related texts. The both are fed into GPT-4 for creating the knowledge distillation dataset to train the small LLM. We leverage it to synthesize 6 million math problems for pre-training our JiuZhang3.0 model, which only needs to invoke GPT-4 API 9.3k times and pre-train on 4.6B data. Experimental results have shown that JiuZhang3.0 achieves state-of-the-art performance on several mathematical reasoning datasets, under both natural language reasoning and tool manipulation settings. Our code and data will be publicly released in url{https://github.com/RUCAIBox/JiuZhang3.0}.

5/24/2024

MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models

Longhui Yu, Weisen Jiang, Han Shi, Jincheng Yu, Zhengying Liu, Yu Zhang, James T. Kwok, Zhenguo Li, Adrian Weller, Weiyang Liu

Large language models (LLMs) have pushed the limits of natural language understanding and exhibited excellent problem-solving ability. Despite the great success, most existing open-source LLMs (e.g., LLaMA-2) are still far away from satisfactory for solving mathematical problem due to the complex reasoning procedures. To bridge this gap, we propose MetaMath, a fine-tuned language model that specializes in mathematical reasoning. Specifically, we start by bootstrapping mathematical questions by rewriting the question from multiple perspectives without extra knowledge, which results in a new dataset called MetaMathQA. Then we fine-tune the LLaMA-2 models on MetaMathQA. Experimental results on two popular benchmarks (i.e., GSM8K and MATH) for mathematical reasoning demonstrate that MetaMath outperforms a suite of open-source LLMs by a significant margin. Our MetaMath-7B model achieves 66.4% on GSM8K and 19.4% on MATH, exceeding the state-of-the-art models of the same size by 11.5% and 8.7%. Particularly, MetaMath-70B achieves an accuracy of 82.3% on GSM8K, slightly better than GPT-3.5-Turbo. We release all the MetaMathQA dataset, the MetaMath models with different model sizes and the training code for public use.

5/6/2024

SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models

Dian Yu, Baolin Peng, Ye Tian, Linfeng Song, Haitao Mi, Dong Yu

There is a growing trend of teaching large language models (LLMs) to solve mathematical problems through coding. Existing studies primarily focus on prompting powerful, closed-source models to generate seed training data followed by in-domain data augmentation, equipping LLMs with considerable capabilities for code-aided mathematical reasoning. However, continually training these models on augmented data derived from a few datasets such as GSM8K may impair their generalization abilities and restrict their effectiveness to a narrow range of question types. Conversely, the potential of improving such LLMs by leveraging large-scale, expert-written, diverse math question-answer pairs remains unexplored. To utilize these resources and tackle unique challenges such as code response assessment, we propose a novel paradigm that uses a code-based critic model to guide steps including question-code data construction, quality control, and complementary evaluation. We also explore different alignment algorithms with self-generated instruction/preference data to foster continuous improvement. Experiments across both in-domain (up to +5.7%) and out-of-domain (+4.4%) benchmarks in English and Chinese demonstrate the effectiveness of the proposed paradigm.

8/29/2024