Empowering Earth Science with Large Language Models: DDE in an IUGS-sponsored meeting on Large Language Models in Geological Sciences

2024-08-03

London, UK – July 16, 2024 - In a pioneering workshop convened under the Chatham House Rule on 16 July in London, the International Union of Geological Sciences (IUGS) and the Geological Society of London (GSL) brought together leading earth scientists, technologists, and policymakers to discuss the transformative impact of large language models (LLMs) on geoscience research. Participants were invited to present their work on developing LLMs.

Representatives from DDE and its ecosystem partner Zhejiang Lab attended the workshop. DDE was represented by Dr. Natarajan Ishwaran, Deputy Secretary General of DDE, and Prof. Mike Stephenson, Past President of the DDE Governing Council and Director of DDE Africa/Mid-East Affairs. Prof. Hans Thybo, Chair of the DDE Science Committee and President of ILP, and some DDE Secretariat staff attended the workshop online.

Zhejiang Lab was represented by Prof. Ye Jieping, leader of the team collaborating with DDE on the development of GeoGPT, one of the leading LLMs under development for geoscience research, and by his technical advisor, Dr. Yitian Xiao.

The workshop, titled "Large Language Models in the Geological Sciences," was organized to highlight the significant strides made in integrating artificial intelligence (AI) within the earth sciences. Most attendees came from geoscience research institutes, academic societies, and publishers. They shared insights into the evolution of AI over the past decade and how these advancements have set the stage for the development of models like GeoGPT.

One of the key themes of the workshop was the empowerment of geological scientists through the development of LLMs. One of the attendees stressed that AI should facilitate the work of scientists with advanced tools, improve the quality of scientific output, and guide the design of AI-facilitated research projects. Most of the attendees agreed that AI, particularly LLMs, should serve as a complement to human expertise rather than a replacement for critical thinking and skills necessary for innovation.

Another critical point raised was the importance of ensuring equitable access to AI resources. One of the attendees stressed the need for researchers from all institutions, not just those in well-funded organizations, to have access to the necessary tools and infrastructure. Such democratization of access to and use of AI tools such as LLMs is of particular importance to scientists in the Global South, who frequently lack the necessary technological infrastructure.

The workshop also addressed the ethical considerations surrounding the deployment and use of AI tools. One of the attendees presented findings and recommendations for the responsible and practical deployment of AI in accelerating scientific discovery. Participants discussed the need for a culture of responsible AI use, including secure sharing of model data and the design of projects addressing copyright, ethics, and safety to maintain and enhance scientific quality.

The workshop came to a consensus that LLMs have significant potential to impact scientific research. However, they also pose considerable challenges, including the availability of data for training, computational resources, model complexity, and interpretability. Notable concerns were raised about the fact that the most advanced AI models are often not open access. Yet there was optimism that the gap between closed-source and open-source models would narrow over time.

During the workshop, GeoGPT, a product now being developed through the collaboration between DDE and Zhejiang Lab, was presented as the sole case study demonstrating how LLMs can be tailored to meet the needs of specific scientific domains. The model, which is based on open-source foundation models such as Llama and Qwen, has undergone significant upgrades and is now extending its mix of foundation models to include Mistral, a model from France. These advancements have improved the accuracy and applicability of GeoGPT in specialized domains.

A suite of research agents and data processing tools developed by the GeoGPT team was also showcased. These tools are designed to assist scientists in extracting and processing information, including the ability to summarize research papers, extract structured data from unstructured text, and link related pieces of information across multiple documents. The workshop attendees were particularly impressed by the potential of these tools to facilitate literature-based discovery and streamline the research process.

The workshop concluded with a call for increased collaboration and sharing of best practices among international researchers and institutions. One of the attendees highlighted the rapid evolution of open-source models and the importance of benchmarking different models to ensure they meet the required performance standards. The workshop participants expressed a strong consensus on the need for continued cooperation and innovation in the field.

The workshop also marked a significant milestone in the integration of AI technologies into geological science. Attendees left with a renewed sense of optimism about the potential of GeoGPT and similar tools to accelerate scientific discovery and improve the efficiency of research. The event underscored the importance of responsible AI deployment and the collective effort needed to ensure that these technologies are accessible to earth scientists in all parts of the world.