Volume 28, Issue 7 pp. 2219-2231
RESEARCH ARTICLE

Correctness Comparison of ChatGPT-4, Gemini, Claude-3, and Copilot for Spatial Tasks

Hartwig H. Hochmair

Corresponding Author

School of Forest, Fisheries, and Geomatics Sciences, Fort Lauderdale Research and Education Center, University of Florida, Davie, Florida, USA

Correspondence:

Hartwig H. Hochmair ([email protected])

Levente Juhász

GIS Center, Florida International University, Miami, Florida, USA

Takoda Kemp

School of Forest, Fisheries, and Geomatics Sciences, Fort Lauderdale Research and Education Center, University of Florida, Davie, Florida, USA

First published: 12 August 2024
Citations: 12

ABSTRACT

Generative AI, including large language models (LLMs), has recently gained significant interest in the geoscience community through its versatile task-solving capabilities, including programming, arithmetic reasoning, generation of sample data, time-series forecasting, toponym recognition, and image classification. Existing performance assessments of LLMs for spatial tasks have primarily focused on ChatGPT, whereas other chatbots have received less attention. To narrow this research gap, this study conducts a zero-shot correctness evaluation of four prominent chatbots (ChatGPT-4, Gemini, Claude-3, and Copilot) on a set of 76 spatial tasks across seven task categories. The chatbots generally performed well on tasks related to spatial literacy, GIS theory, and interpretation of programming code and functions, but revealed weaknesses in mapping, code writing, and spatial reasoning. Furthermore, there was a significant difference in the correctness of results between the four chatbots. Repeated tasks assigned to each chatbot yielded highly consistent responses, with matching rates of over 80% for most task categories in the four chatbots.

Conflicts of Interest

The authors declare no conflicts of interest.

Data Availability Statement

The complete set of tasks assigned to chatbots in this study and their responses can be downloaded from https://doi.org/10.6084/m9.figshare.25903729.
