Automatic analysis of geological maps and reports
Context
The first step in mining projects is the exploration of new mining sites. This involves evaluating a potential mining area in terms of its mineral composition and properties, as well as its mining method and profitability.
For this purpose, geologists evaluate various sources, such as mineral maps, geological reports from authorities and research institutes, property information and local word of mouth. The resulting assessment is of high economic importance because, due to the high cost of drilling, good prioritization of drill sites can significantly increase the efficiency of a mining project.
However, good analysis is very time-consuming and requires well-trained geologists with many years of experience. Automated source analysis can assist geologists and surveyors, and it can raise profitability by increasing the probability of identifying suitable, economically viable exploration sites.
Challenges
The goal is to analyze geological maps and reports automatically with Machine Learning (ML) algorithms. Methods from the field of Natural Language Processing (NLP) will be used for the analysis of text data; methods from Computer Vision (CV) will be used for the analysis of image data.
One challenge in NLP projects is often the choice of a suitable OCR tool that can satisfactorily process the different input sources. For example, it makes a big difference whether handwritten or machine-generated reports are to be evaluated, or tabular information as opposed to continuous text.
Since the input data, both text and image, come from different sources, they may have different formats and terminologies that can make it difficult to develop a unified model. Therefore, the first step is to develop a concept of how to convert the data into a format that can be used for the model.
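One way to picture such a common format is a single record type that every source is converted into, together with a lookup table that maps source-specific terminology to canonical terms. The following is a minimal sketch under stated assumptions: the `Observation` fields, the `CANONICAL_TERMS` table and the area identifier "A-17" are hypothetical and only serve as illustration.

```python
from dataclasses import dataclass

# Hypothetical synonym table: maps source-specific terms to one canonical term.
CANONICAL_TERMS = {
    "au": "gold",
    "gold": "gold",
    "cu": "copper",
    "copper": "copper",
}


@dataclass(frozen=True)
class Observation:
    """One normalized finding, whether it came from a report or a map."""
    area_id: str      # hypothetical identifier of the surveyed area
    source_type: str  # "report" or "map"
    mineral: str      # canonical mineral name


def normalize(area_id: str, source_type: str, raw_term: str) -> Observation:
    """Convert a raw term from any source into the common record format."""
    key = raw_term.strip().lower()
    mineral = CANONICAL_TERMS.get(key, key)  # unknown terms pass through lowercased
    return Observation(area_id=area_id, source_type=source_type, mineral=mineral)


# The same mineral, written differently in a report and on a map legend.
records = [
    normalize("A-17", "report", " Au "),
    normalize("A-17", "map", "Gold"),
]
```

Both records end up with the same canonical mineral name, so later steps can compare findings from text and image sources directly.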
Potential solution approaches
The first step is to choose an OCR tool, such as Tesseract, Google Vision API, ABBYY FineReader or Amazon Textract. These tools differ in how well they handle different types of input, so the choice depends on the sources to be processed. For the analysis of the extracted text, established NLP techniques range from TF-IDF weighting and Naive Bayes classifiers to LSTM networks, which can model the context in which a word appears. For simpler information extraction, rule-based approaches may also suffice.
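As an illustration of the rule-based option, the sketch below pulls grade mentions such as "2.5 g/t Au" out of OCR output with a regular expression. The pattern, the function name `extract_grades` and the sample sentence are assumptions made for this example; real reports would need a broader set of rules.

```python
import re

# Assumed pattern: a number, a unit (g/t or %), then an element symbol,
# for example "2.5 g/t Au" or "0.8 % Cu".
GRADE_PATTERN = re.compile(
    r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>g/t|%)\s*(?P<element>[A-Z][a-z]?)\b"
)


def extract_grades(text: str) -> list[dict]:
    """Return every grade mention found in the text as a small dictionary."""
    return [
        {
            "value": float(m.group("value")),
            "unit": m.group("unit"),
            "element": m.group("element"),
        }
        for m in GRADE_PATTERN.finditer(text)
    ]


# Made-up sentence standing in for OCR output of a geological report.
sample = "Drill core returned 2.5 g/t Au over 12 m and 0.8 % Cu near the contact."
grades = extract_grades(sample)
```

Here `grades` holds two entries, one for gold and one for copper, while the interval length "12 m" is ignored because it carries no grade unit.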
The geological maps can be analyzed by CV algorithms, especially Convolutional Neural Networks (CNN), which are able to detect and classify objects in image data. As a result, they can automatically identify relevant information in the geological maps.
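The building block behind a CNN is the convolution of an image with a small kernel. The sketch below is not a trained network: it applies a single hand-set kernel to a toy 4x4 grid in pure Python, whereas a real CNN learns many such kernels from labeled map data. The grid values and the kernel are made up for illustration.

```python
def convolve2d(image, kernel):
    """Valid-mode 2D cross-correlation, the operation a CNN layer applies."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            # Weighted sum of the image patch under the kernel.
            total = sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            row.append(total)
        output.append(row)
    return output


def relu(feature_map):
    """Set negative responses to zero, as in a typical CNN activation."""
    return [[max(0, v) for v in row] for row in feature_map]


# Toy "map": the left half is one rock unit (0), the right half another (1).
toy_map = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-set kernel that responds to a left-to-right increase in value.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]

feature_map = relu(convolve2d(toy_map, vertical_edge))
```

The resulting feature map is non-zero only in the middle column, which is where the boundary between the two units lies. Stacking many learned kernels of this kind is what lets a CNN pick out symbols, boundaries and colored units on a geological map.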
In order to capture and model possible dependencies, the different input data (the text data from the geological reports and the image data from the maps) should be integrated into a common data set. If both reports and maps are available for a given area, the results for that area are correspondingly more meaningful. The extracted information is then combined in a database, which the user accesses via a Graphical User Interface (GUI) to view the most promising exploration sites.
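The combination step could look roughly like the following sketch, which stores findings from both sources in one table and ranks the areas. It uses an in-memory SQLite database from the Python standard library. The table layout, the example rows, the score column and the ranking rule (areas confirmed by both source types first, then by mean score) are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE findings (area_id TEXT, source TEXT, mineral TEXT, score REAL)"
)

# Made-up example rows; the score stands in for a model confidence value.
rows = [
    ("A-17", "report", "gold", 0.8),
    ("A-17", "map", "gold", 0.7),
    ("B-02", "report", "copper", 0.9),
    ("C-05", "map", "gold", 0.6),
]
conn.executemany("INSERT INTO findings VALUES (?, ?, ?, ?)", rows)

# Assumed ranking rule: areas backed by more source types come first,
# ties are broken by the mean score of their findings.
ranking = conn.execute(
    """
    SELECT area_id,
           COUNT(DISTINCT source) AS n_sources,
           AVG(score) AS mean_score
    FROM findings
    GROUP BY area_id
    ORDER BY n_sources DESC, mean_score DESC
    """
).fetchall()

for area_id, n_sources, mean_score in ranking:
    print(area_id, n_sources, round(mean_score, 2))
```

A GUI would then only need to run a query of this kind and display the ranked areas, so the extraction models and the user interface stay decoupled through the database.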