HEGEMON: Holistic Evaluation of Generative Foundation Models in a Security Context


The HEGEMON project establishes a sovereign benchmarking framework for AI foundation models and AI applications related to the automated analysis of geoinformation critical to national security.

Input

Multi-modal geobased data including high-resolution orthophotos, OpenStreetMap vector data, and unstructured text reports.

Output

A holistic multi-dimensional benchmarking framework for foundation models, AI application development for dossier generation, automated vector map derivation, and multimodal map interaction, as well as the development of respective use-case-specific benchmarks.

Goal

To ensure Germany’s technological sovereignty by establishing a transparent benchmarking framework for sovereign use of foundation models as well as AI-driven applications.


Introduction


In the modern digital landscape, geoinformation has become an indispensable pillar of public services and national defense. Security actors, such as the Zentrum für Geoinformationswesen der Bundeswehr (ZGeoBw), the Zentrale Stelle für Informationstechnik im Sicherheitsbereich (ZITiS), or the Bundesamt für Sicherheit in der Informationstechnik (BSI) are tasked with the continuous evaluation of vast amounts of heterogeneous data to create essential intelligence products. These include topographical maps and geoinformation dossiers (Aktuelle Geoinformationen). Currently, these processes remain largely manual or semi-automated, creating a significant bottleneck in time-critical situations. The HEGEMON program addresses this by investigating how modern foundation models can safely and effectively automate the generation of geoinformation.


Starting point


Our project is part of a multi-year science-to-application program initiated by the Agentur für Innovation in der Cybersicherheit GmbH (Cyberagentur). Unlike traditional initiatives, this program creates a unique competitive environment where multiple teams develop and rank different foundation models and evaluation frameworks. In collaboration with the Universität der Bundeswehr München (UniBw M), dida took on the challenge of developing trustworthy AI applications in the field of cybersecurity and geoinformatics.


The HEGEMON project and use cases


The primary goal of the HEGEMON project is to develop a holistic benchmarking framework for foundation models, systematically tailored for the security-relevant geoinformation context. To provide use case examples for testing the benchmarking framework and to address the use case requirements of the above-mentioned security actors, we are developing three distinct application demonstrators in the area of geoinformation processing in addition to the benchmarking framework:

  1. Automated country dossiers: This use case focuses on the creation of traceable natural language summaries regarding country-specific security topics. By utilizing supervised fine-tuning on LLMs, the system can synthesize intelligence from dozens of heterogeneous sources into structured, reliable reports, significantly reducing the manual effort required for intelligence analysis.

  2. Derivation of vector maps: This scenario addresses the technical challenge of transforming aerial imagery into standardized vector data. A second focus is on automated cartographic generalization - the process of mapping detailed, non-standardized geographic features to the strictly regulated ATKIS DLM50 format used by national security agencies.

  3. Intelligent map chatbot: We are implementing a multimodal system that allows users to explore digital maps using natural language. The chatbot can process complex spatial queries, such as asking for the presence and coordinates of specific medical facilities or infrastructure on a given map, and provide intelligent, text-based responses directly linked to the geospatial data.

Graphic 1 & 2: Visualizing the automated cartographic generalization of infrastructure networks (left) and building footprints (right) in central Leipzig. The AI solution performs a transformation from high-detail, non-standardized OpenStreetMap data (blue) to the strictly standardized ATKIS Digital Landscape Model 50 (red) specifications, specifically optimized for display at a 1:50,000 scale.

The holistic benchmarking framework and subsequently also the derived use cases will not only be evaluated on mere technical performance measures but on a more thorough set of criteria:

  1. Technical performance: Evaluation of task-specific performance and robustness.

  2. Security: Resistance to targeted manipulation attempts, including prompt injection and jailbreaking (adversarial robustness).

  3. Trustworthiness: Assessment of output consistency, explainability, fairness, usability, and compliance.

  4. Cost: Evaluation of resource efficiency, covering both monetary (e.g., operational costs) and non-monetary aspects (e.g., inference time and energy consumption).

  5. Strategy: Criteria relevant to national security, such as technological sovereignty and resilience (the ability to operate and adapt the model independently of foreign infrastructure).


Challenges


The integration of generative AI into security contexts faces hurdles that standard commercial applications do not. Most state-of-the-art foundation models are trained by private companies in the USA or China using non-public architectures, leading to a high technological dependency. 

Furthermore, these models - without proper benchmarking - can be susceptible to adversarial attacks, such as prompt injection or jailbreaking, which could compromise the integrity of security reports. It is our challenge to (1) make usage of such externally trained models as safe as possible and (2) design clever benchmarks that would show if a model is not safe to use.

From a technical perspective, the task of cartographic generalization remains a complex, largely unsolved problem, as it requires the AI to maintain topological consistency while simplifying massive amounts of non-standardized data for professional map standards.


Current progress


The project started in November 2025 and is currently in its development phase. This section will be updated regularly once milestones are reached (presumably in July 2026).


Project Facts


  • The project is scheduled to run for a duration of three years, spanning from 2025 to 2028.

  • The initiative is funded by the Agentur für Innovation in der Cybersicherheit GmbH (Cyberagentur) as part of the HEGEMON program.

  • This is a collaboration between the dida and the groups for Earth Observation (EO), and for Artificial Intelligence and Machine Learning (AIML) at the Bundeswehr Universität München (UniBw M).

For further inquiries related to the project, please contact Dr. Jan Macdonald, Project Lead HEGEMON, through the contact form below.


Contact



Related projects