CityGPT: Empowering Urban Spatial Cognition of Large Language Models

2406.13948

Published 6/21/2024 by Jie Feng, Yuwei Du, Tianhui Liu, Siqi Guo, Yuming Lin, Yong Li

Abstract

Large language models (LLMs) with powerful language generation and reasoning capabilities have already achieved success in many domains, e.g., math and code generation. However, due to the lack of physical-world corpora and knowledge during training, they usually fail to solve many real-life tasks in urban space. In this paper, we propose CityGPT, a systematic framework for enhancing the capability of LLMs to understand urban space and solve related urban tasks by building a city-scale world model in the model. First, we construct a diverse instruction tuning dataset, CityInstruction, for injecting urban knowledge and enhancing spatial reasoning capability effectively. By using a mixture of CityInstruction and general instruction data, we fine-tune various LLMs (e.g., ChatGLM3-6B, Qwen1.5 and the Llama3 series) to enhance their capability without sacrificing general abilities. To further validate the effectiveness of the proposed methods, we construct a comprehensive benchmark, CityEval, to evaluate the capability of LLMs on diverse urban scenarios and problems. Extensive evaluation results demonstrate that small LLMs trained with CityInstruction can achieve competitive performance with commercial LLMs in the comprehensive evaluation of CityEval. The source code is openly accessible to the research community via https://github.com/tsinghua-fib-lab/CityGPT.

Overview

Large language models (LLMs) have achieved remarkable success in many domains, such as math and code generation.
However, due to a lack of physical world knowledge during training, LLMs often fail to solve real-life tasks in urban environments.
The researchers propose CityGPT, a systematic framework for enhancing the capability of LLMs to understand urban spaces and solve related urban tasks.

Plain English Explanation

Large language models (LLMs) are powerful AI systems that can generate human-like text and perform various tasks. These models have already shown impressive capabilities in areas like mathematics and computer programming. However, they often struggle with real-world tasks in urban environments, like navigating a city or understanding urban problems.

The researchers have developed a new framework called CityGPT to address this issue. CityGPT aims to enhance the ability of LLMs to understand and interact with urban spaces. The researchers created a specialized training dataset called CityInstruction that teaches the models about urban concepts and spatial reasoning. By fine-tuning various LLMs, including ChatGLM3-6B, Qwen1.5, and the Llama3 series, on a mixture of this dataset and general instruction data, the researchers were able to improve the models' urban understanding without sacrificing their general capabilities.

To evaluate the effectiveness of their approach, the researchers created a comprehensive benchmark called CityEval that tests the models' performance on various urban tasks and scenarios. The results show that even small LLMs trained with the CityInstruction dataset can achieve results competitive with commercial LLMs.

Technical Explanation

The researchers propose CityGPT, a framework for enhancing the capability of LLMs to understand and interact with urban spaces. They first construct a diverse instruction-tuning dataset called CityInstruction to inject urban knowledge and improve spatial reasoning capabilities.

The researchers then fine-tune various LLMs, including ChatGLM3-6B, Qwen1.5, and the Llama3 series, using a mixture of the CityInstruction dataset and general instruction data. This allows the models to enhance their urban understanding without losing their general abilities.
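The data-mixing step can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `mix_instruction_data`, the `city_fraction` parameter, and the sample format are all assumptions made for illustration, and the summary does not state the mixing ratio the paper actually uses.

```python
import random


def mix_instruction_data(city_samples, general_samples, city_fraction=0.5, seed=0):
    """Return a shuffled training mixture (hypothetical helper, not from the paper).

    All urban samples are kept, and general-purpose samples are drawn at random
    so that urban samples make up roughly `city_fraction` of the result.
    """
    if not 0.0 < city_fraction <= 1.0:
        raise ValueError("city_fraction must be in (0, 1]")
    rng = random.Random(seed)
    # Number of general samples needed to hit the requested urban share.
    n_general = round(len(city_samples) * (1 - city_fraction) / city_fraction)
    # Cap at what is available; the urban share is then higher than requested.
    n_general = min(n_general, len(general_samples))
    mixed = list(city_samples) + rng.sample(list(general_samples), n_general)
    rng.shuffle(mixed)
    return mixed
```

With 10 urban samples, 100 general samples, and `city_fraction=0.25`, the function keeps all 10 urban samples and draws 30 general ones. Keeping general data in the mixture is the usual way to limit forgetting of general abilities during domain fine-tuning, which is the effect the summary describes.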

To validate the effectiveness of their approach, the researchers construct a comprehensive benchmark called CityEval that evaluates the models' performance on various urban scenarios and problems. The extensive evaluation results demonstrate that even small LLMs trained with the CityInstruction dataset can achieve performance competitive with commercial LLMs.
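As a rough illustration of how a benchmark of this kind can be scored, the Python sketch below computes accuracy per task category. It assumes multiple-choice style items with a single correct answer label; the record fields (`task`, `prediction`, `answer`) and the task names in the test data are hypothetical and are not taken from CityEval.

```python
from collections import defaultdict


def accuracy_by_task(records):
    """Compute per-task accuracy from evaluation records (hypothetical format).

    Each record is a dict with a task name, the model's predicted answer
    label, and the reference answer label.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for record in records:
        totals[record["task"]] += 1
        # Compare labels case-insensitively, ignoring surrounding whitespace.
        if record["prediction"].strip().upper() == record["answer"].strip().upper():
            correct[record["task"]] += 1
    return {task: correct[task] / totals[task] for task in totals}
```

Reporting scores per task rather than as one aggregate number makes it easier to see whether a fine-tuned model gains on spatial reasoning while holding steady elsewhere.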

Critical Analysis

The researchers have made a valuable contribution by addressing the limitations of LLMs in understanding and interacting with urban environments. The CityGPT framework and the CityInstruction dataset provide a systematic approach to enhancing the urban capabilities of LLMs.

However, the paper could have explored the limitations and potential issues of this approach in more depth. For example, the researchers could have discussed the challenges of scaling the CityInstruction dataset to cover a broader range of urban scenarios or the potential biases that may be introduced by the dataset.

Additionally, the paper could have highlighted areas for future research, such as the integration of CityGPT with multimodal approaches, which could further enhance the models' understanding of urban environments by incorporating visual, auditory, and other sensory information.

Conclusion

The researchers have presented an innovative framework, CityGPT, that aims to enhance the capability of LLMs to understand and interact with urban spaces. By creating a specialized CityInstruction dataset and fine-tuning various LLMs, the researchers have demonstrated the potential for these models to perform well on diverse urban tasks and scenarios.

The successful development of CityGPT could have significant implications for urban planning, transportation, and a wide range of other applications that require a deep understanding of the built environment. As LLMs continue to evolve, the integration of specialized knowledge and capabilities, as demonstrated in this research, could be a promising approach to unlock their full potential in real-world problem-solving.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

UrbanGPT: Spatio-Temporal Large Language Models

Zhonghang Li, Lianghao Xia, Jiabin Tang, Yong Xu, Lei Shi, Long Xia, Dawei Yin, Chao Huang

Spatio-temporal prediction aims to forecast and gain insights into the ever-changing dynamics of urban environments across both time and space. Its purpose is to anticipate future patterns, trends, and events in diverse facets of urban life, including transportation, population movement, and crime rates. Although numerous efforts have been dedicated to developing neural network techniques for accurate predictions on spatio-temporal data, it is important to note that many of these methods heavily depend on having sufficient labeled data to generate precise spatio-temporal representations. Unfortunately, the issue of data scarcity is pervasive in practical urban sensing scenarios. Consequently, it becomes necessary to build a spatio-temporal model with strong generalization capabilities across diverse spatio-temporal learning scenarios. Taking inspiration from the remarkable achievements of large language models (LLMs), our objective is to create a spatio-temporal LLM that can exhibit exceptional generalization capabilities across a wide range of downstream urban tasks. To achieve this objective, we present the UrbanGPT, which seamlessly integrates a spatio-temporal dependency encoder with the instruction-tuning paradigm. This integration enables LLMs to comprehend the complex inter-dependencies across time and space, facilitating more comprehensive and accurate predictions under data scarcity. To validate the effectiveness of our approach, we conduct extensive experiments on various public datasets, covering different spatio-temporal prediction tasks. The results consistently demonstrate that our UrbanGPT, with its carefully designed architecture, consistently outperforms state-of-the-art baselines. These findings highlight the potential of building large language models for spatio-temporal learning, particularly in zero-shot scenarios where labeled data is scarce.

5/21/2024

cs.CL cs.AI cs.CY

UrbanLLM: Autonomous Urban Activity Planning and Management with Large Language Models

Yue Jiang, Qin Chao, Yile Chen, Xiucheng Li, Shuai Liu, Gao Cong

Location-based services play a critical role in improving the quality of our daily lives. Despite the proliferation of numerous specialized AI models within the spatio-temporal context of location-based services, these models struggle to autonomously tackle problems regarding complex urban planning and management. To bridge this gap, we introduce UrbanLLM, a fine-tuned large language model (LLM) designed to tackle diverse problems in urban scenarios. UrbanLLM functions as a problem-solver by decomposing urban-related queries into manageable sub-tasks, identifying suitable spatio-temporal AI models for each sub-task, and generating comprehensive responses to the given queries. Our experimental results indicate that UrbanLLM significantly outperforms other established LLMs, such as Llama and the GPT series, in handling problems concerning complex urban activity planning and management. UrbanLLM exhibits considerable potential in enhancing the effectiveness of solving problems in urban scenarios, reducing the workload and reliance on human experts.

6/19/2024

cs.LG

CityBench: Evaluating the Capabilities of Large Language Model as World Model

Jie Feng, Jun Zhang, Junbo Yan, Xin Zhang, Tianjian Ouyang, Tianhui Liu, Yuwei Du, Siqi Guo, Yong Li

Large language models (LLMs) with powerful generalization ability have been widely used in many domains. A systematic and reliable evaluation of LLMs is a crucial step in their development and applications, especially for specific professional fields. In the urban domain, there have been some early explorations about the usability of LLMs, but a systematic and scalable evaluation benchmark is still lacking. The challenge in constructing a systematic evaluation benchmark for the urban domain lies in the diversity of data and scenarios, as well as the complex and dynamic nature of cities. In this paper, we propose CityBench, an interactive simulator based evaluation platform, as the first systematic evaluation benchmark for the capability of LLMs in the urban domain. First, we build CitySim to integrate the multi-source data and simulate fine-grained urban dynamics. Based on CitySim, we design 7 tasks in 2 categories, perception-understanding and decision-making, to evaluate the capability of LLMs as city-scale world models for the urban domain. Due to the flexibility and ease-of-use of CitySim, our evaluation platform CityBench can be easily extended to any city in the world. We evaluate 13 well-known LLMs, including open source LLMs and commercial LLMs, in 13 cities around the world. Extensive experiments demonstrate the scalability and effectiveness of the proposed CityBench and shed light on the future development of LLMs in the urban domain. The dataset, benchmark and source code are openly accessible to the research community via https://github.com/tsinghua-fib-lab/CityBench

6/21/2024

cs.AI cs.CL cs.LG

WorldGPT: Empowering LLM as Multimodal World Model

Zhiqi Ge, Hongzhe Huang, Mingze Zhou, Juncheng Li, Guoming Wang, Siliang Tang, Yueting Zhuang

World models are progressively being employed across diverse fields, extending from basic environment simulation to complex scenario construction. However, existing models are mainly trained on domain-specific states and actions, and confined to single-modality state representations. In this paper, we introduce WorldGPT, a generalist world model built upon a Multimodal Large Language Model (MLLM). WorldGPT acquires an understanding of world dynamics through analyzing millions of videos across various domains. To further enhance WorldGPT's capability in specialized scenarios and long-term tasks, we have integrated it with a novel cognitive architecture that combines memory offloading, knowledge retrieval, and context reflection. As for evaluation, we build WorldNet, a multimodal state transition prediction benchmark encompassing varied real-life scenarios. Conducting evaluations on WorldNet directly demonstrates WorldGPT's capability to accurately model state transition patterns, affirming its effectiveness in understanding and predicting the dynamics of complex scenarios. We further explore WorldGPT's emerging potential in serving as a world simulator, helping multimodal agents generalize to unfamiliar domains through efficiently synthesising multimodal instruction instances which are proved to be as reliable as authentic data for fine-tuning purposes. The project is available at https://github.com/DCDmllm/WorldGPT.

4/30/2024

cs.AI cs.MM