Auto-Formula: Recommend Formulas in Spreadsheets using Contrastive Learning for Table Representations

Read original: arXiv:2404.12608 - Published 4/22/2024 by Sibei Chen, Yeye He, Weiwei Cui, Ju Fan, Song Ge, Haidong Zhang, Dongmei Zhang, Surajit Chaudhuri

Overview

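  • The paper presents Auto-Formula, a system that recommends the formula a user is likely to want in a target spreadsheet cell.
  • It builds on the observation that spreadsheets within the same organization often look alike, sharing both similar data and similar computation logic encoded as formulas.
  • It learns and adapts formulas from similar spreadsheets using contrastive-learning techniques inspired by similar-face recognition in computer vision, and is evaluated on over 2K test formulas extracted from real enterprise spreadsheets.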

Plain English Explanation

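Spreadsheets combine formula-based computation with an intuitive table interface, and most of the people who use them are neither database experts nor professional programmers. Writing a complex formula is still hard, because a non-technical user has to look up and understand non-trivial formula syntax. The paper starts from a simple observation: an organization usually has many similar-looking spreadsheets, and they tend to share not only similar data but also similar formulas. Auto-Formula uses that redundancy. When a user wants a formula in a particular cell, the system finds similar spreadsheets that already contain a suitable formula and adapts it to the user's sheet. Related reading linked from this summary includes "Comprehensive Survey of Self-Supervised Learning for Recommendation", "Generative Pre-trained Transformer for Symbolic Regression", "Autonomous Data Selection for Language Models of Mathematical Texts", "Automatic Macro Mining from Interaction Traces", and "Assisting Humans with Complex Comparisons".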

Technical Explanation

According to the abstract, Auto-Formula predicts the formula a user wants to author in a target spreadsheet cell by learning and adapting formulas that already exist in similar spreadsheets. Similarity between spreadsheets is learned with contrastive-learning techniques inspired by similar-face recognition in computer vision. The authors evaluate the system on over 2K test formulas extracted from real enterprise spreadsheets and report that it is more effective than the alternatives they compare against. The benchmark data is available at https://github.com/microsoft/Auto-Formula.
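The contrastive-learning idea mentioned in the paper's abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes spreadsheets have already been mapped to embedding vectors by some model, uses a triplet loss (a loss commonly used in face recognition) as a stand-in for whatever objective the paper actually uses, and retrieves the most similar spreadsheet by cosine similarity. All function names, file names, and vectors below are hypothetical.

```python
import math
from typing import Mapping, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector,
                 margin: float = 0.2) -> float:
    """Hinge-style triplet loss on cosine similarity (an assumed objective).

    The loss is zero once the anchor is more similar to the positive
    (a spreadsheet that should match) than to the negative (one that
    should not) by at least `margin`.
    """
    sim_pos = cosine_similarity(anchor, positive)
    sim_neg = cosine_similarity(anchor, negative)
    return max(0.0, sim_neg - sim_pos + margin)


def most_similar_sheet(target: Vector, candidates: Mapping[str, Vector]) -> str:
    """Return the name of the candidate whose embedding is closest to `target`."""
    return max(candidates, key=lambda name: cosine_similarity(target, candidates[name]))


if __name__ == "__main__":
    # Hypothetical 2-dimensional embeddings, for illustration only.
    target = [1.0, 0.0]
    candidates = {
        "budget_2023.xlsx": [0.9, 0.1],
        "inventory.xlsx": [0.0, 1.0],
    }
    print(most_similar_sheet(target, candidates))
    print(triplet_loss(target, candidates["budget_2023.xlsx"], candidates["inventory.xlsx"]))
```

In a system of this kind, a formula found in the retrieved spreadsheet would still need to be adapted to the target sheet, for example by remapping cell references. That step is not shown here.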

Critical Analysis

The abstract does not state limitations, but the approach rests on the availability of similar spreadsheets within the same organization. How well it works when few such spreadsheets exist, or when the desired formula has no close precedent, is a natural question for further study. Further validation of the system's performance and robustness across diverse datasets and real-world scenarios would also be beneficial.

Conclusion

Auto-Formula addresses a common pain point in spreadsheets, authoring complex formulas, by recommending formulas learned and adapted from similar spreadsheets in the same organization. Evaluations on over 2K test formulas extracted from real enterprise spreadsheets show it to be more effective than alternative approaches, and the authors release their benchmark data to facilitate future research. As with any such system, continued validation is needed to confirm that the recommendations are reliable in practice.



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Auto-Formula: Recommend Formulas in Spreadsheets using Contrastive Learning for Table Representations

Sibei Chen, Yeye He, Weiwei Cui, Ju Fan, Song Ge, Haidong Zhang, Dongmei Zhang, Surajit Chaudhuri

Spreadsheets are widely recognized as the most popular end-user programming tools, which blend the power of formula-based computation with an intuitive table-based interface. Today, spreadsheets are used by billions of users to manipulate tables, most of whom are neither database experts nor professional programmers. Despite the success of spreadsheets, authoring complex formulas remains challenging, as non-technical users need to look up and understand non-trivial formula syntax. To address this pain point, we leverage the observation that there is often an abundance of similar-looking spreadsheets in the same organization, which not only have similar data, but also share similar computation logic encoded as formulas. We develop an Auto-Formula system that can accurately predict formulas that users want to author in a target spreadsheet cell, by learning and adapting formulas that already exist in similar spreadsheets, using contrastive-learning techniques inspired by similar-face recognition from computer vision. Extensive evaluations on over 2K test formulas extracted from real enterprise spreadsheets show the effectiveness of Auto-Formula over alternatives. Our benchmark data is available at https://github.com/microsoft/Auto-Formula to facilitate future research.

4/22/2024

SpreadsheetBench: Towards Challenging Real World Spreadsheet Manipulation

Zeyao Ma, Bohan Zhang, Jing Zhang, Jifan Yu, Xiaokang Zhang, Xiaohan Zhang, Sijia Luo, Xi Wang, Jie Tang

We introduce SpreadsheetBench, a challenging spreadsheet manipulation benchmark exclusively derived from real-world scenarios, designed to immerse current large language models (LLMs) in the actual workflow of spreadsheet users. Unlike existing benchmarks that rely on synthesized queries and simplified spreadsheet files, SpreadsheetBench is built from 912 real questions gathered from online Excel forums, which reflect the intricate needs of users. The associated spreadsheets from the forums contain a variety of tabular data such as multiple tables, non-standard relational tables, and abundant non-textual elements. Furthermore, we propose a more reliable evaluation metric akin to online judge platforms, where multiple spreadsheet files are created as test cases for each instruction, ensuring the evaluation of robust solutions capable of handling spreadsheets with varying values. Our comprehensive evaluation of various LLMs under both single-round and multi-round inference settings reveals a substantial gap between the state-of-the-art (SOTA) models and human performance, highlighting the benchmark's difficulty.

6/24/2024

An Automatic Prompt Generation System for Tabular Data Tasks

Ashlesha Akella, Abhijit Manatkar, Brij Chavda, Hima Patel

Efficient processing of tabular data is important in various industries, especially when working with datasets containing a large number of columns. Large language models (LLMs) have demonstrated their ability on several tasks through carefully crafted prompts. However, creating effective prompts for tabular datasets is challenging due to the structured nature of the data and the need to manage numerous columns. This paper presents an innovative auto-prompt generation system suitable for multiple LLMs, with minimal training. It proposes two novel methods: 1) a reinforcement learning-based algorithm for identifying and sequencing task-relevant columns, and 2) a cell-level similarity-based approach for enhancing few-shot example selection. Our approach has been extensively tested across 66 datasets, demonstrating improved performance in three downstream tasks (data imputation, error detection, and entity matching) using two distinct LLMs: Google flan-t5-xxl and Mixtral 8x7B.

5/10/2024

Automated Contrastive Learning Strategy Search for Time Series

Baoyu Jing, Yansen Wang, Guoxin Sui, Jing Hong, Jingrui He, Yuqing Yang, Dongsheng Li, Kan Ren

In recent years, Contrastive Learning (CL) has become a predominant representation learning paradigm for time series. Most existing methods manually build specific CL Strategies (CLS) by human heuristics for certain datasets and tasks. However, manually developing CLS usually requires excessive prior knowledge about the data, and massive experiments to determine the detailed CL configurations. In this paper, we present an Automated Machine Learning (AutoML) practice at Microsoft, which automatically learns CLS for time series datasets and tasks, namely Automated Contrastive Learning (AutoCL). We first construct a principled search space of size over $3\times10^{12}$, covering data augmentation, embedding transformation, contrastive pair construction, and contrastive losses. Further, we introduce an efficient reinforcement learning algorithm, which optimizes CLS from the performance on the validation tasks, to obtain effective CLS within the space. Experimental results on various real-world datasets demonstrate that AutoCL could automatically find the suitable CLS for the given dataset and task. From the candidate CLS found by AutoCL on several public datasets/tasks, we compose a transferable Generally Good Strategy (GGS), which has a strong performance for other datasets. We also provide empirical analysis as a guide for the future design of CLS.

8/19/2024