AI Summary • Published on Mar 5, 2026
Mathematical text understanding is complex due to specialized entities and intricate relationships. This paper frames mathematical problem interpretation as a Mathematical Entity Relation Extraction (MERE) task, where operands are identified as entities and operators as their relationships. The goal is to bridge the gap between natural language and mathematical notation, enabling systems to extract meaningful relationships for applications in knowledge management, semantic search, and educational tools. A key challenge addressed is the lack of transparency in deep learning models, which hinders trust and interpretability in automated systems for mathematical text. Existing methods often rely on traditional or heuristic approaches, and large language models, particularly in combination with XAI techniques, have seen little use in MERE, leaving model transparency largely unaddressed.
The methodology involves a custom dataset, data preprocessing, model building with transformer architectures, and explainability using SHAP. The dataset was created by combining English texts from the `Bangla_MER` and `Somikoron` datasets. It consists of 3,284 unique mathematical statements, manually labeled with six basic relationships (Factorial, Addition, Subtraction, Multiplication, Division, Square Root). Number phrases (e.g., "five thousand and forty") were treated as single mathematical entities. Data preprocessing included removing non-letter characters and stop words (using NLTK), then applying lemmatization and stemming to reduce vocabulary size and model complexity.
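The preprocessing steps described above can be sketched as follows. This is a minimal stand-in: the stop-word set and the crude suffix stripper below are illustrative substitutes for NLTK's full English stop-word list and its stemmer/lemmatizer, and the function names are not from the paper.

```python
import re

# Tiny illustrative stop-word list; the paper uses NLTK's full English list.
STOP_WORDS = {"a", "an", "and", "the", "of", "is", "are", "in", "on", "has"}

def crude_stem(token: str) -> str:
    """Very rough suffix stripping, standing in for NLTK's stemmer/lemmatizer."""
    for suffix in ("ing", "ers", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def preprocess(text: str) -> list[str]:
    # 1) Keep letters only (hyphens preserved so number phrases like
    #    "thirty-six" survive as single tokens), lower-case, tokenize.
    tokens = re.sub(r"[^A-Za-z\- ]+", " ", text).lower().split()
    # 2) Remove stop words.
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # 3) Collapse inflected forms.
    return [crude_stem(t) for t in tokens]

print(preprocess("A team has eighteen players and thirty-six balls."))
# → ['team', 'eighteen', 'play', 'thirty-six', 'ball']
```

Note how the hyphenated number phrase "thirty-six" is kept as a single entity, consistent with the paper's treatment of number phrases.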
For model building, various transformer-based models (BERT, ELECTRA, RoBERTa, ALBERT, DistilBERT, XLNet) were evaluated, and BERT was selected for its superior performance. The `bert-base-uncased` pre-trained model was fine-tuned for the MERE task. The BERT tokenizer generates attention masks, segment embeddings, and token embeddings, which are passed through BERT's stack of 12 transformer encoder layers (the depth of `bert-base-uncased`), followed by a fully connected layer for relation prediction. The dataset was split 80% for training and 20% for testing. Hyperparameters such as the learning rate, batch size, number of epochs, and maximum sequence length (set to 50), along with training verbosity, were tuned for optimal performance.
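The 80/20 split over the 3,284 labeled statements yields exactly the 657-example test set that the evaluation later refers to; a quick arithmetic check:

```python
# Dataset figures reported in the paper.
TOTAL_STATEMENTS = 3284
TRAIN_FRACTION = 0.8

n_train = int(TOTAL_STATEMENTS * TRAIN_FRACTION)  # floor of 2627.2
n_test = TOTAL_STATEMENTS - n_train

print(n_train, n_test)  # → 2627 657
```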
Model evaluation utilized accuracy, micro F1 score, and macro F1 score, along with precision, recall, specificity, and error rate, calculated from the confusion matrix. To provide transparency, Explainable Artificial Intelligence (XAI) was integrated using Shapley Additive Explanations (SHAP). SHAP assigns an importance value to each feature for a given prediction, based on cooperative game theory, to highlight influential words or symbols and explain the model's decision-making process. The `shap.Explainer(pred)` function was used, which automatically selects a model-agnostic SHAP explainer for Hugging Face transformer pipelines, estimating feature contributions by perturbing input tokens.
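The cooperative-game idea behind SHAP can be seen in miniature by computing exact Shapley values by brute force over feature subsets. This is an illustration of the principle only, not the sampling-based approximation `shap.Explainer` actually uses; the toy `score` function and its word weights are invented for the example.

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value):
    """Exact Shapley values: for each feature i, average its marginal
    contribution value(S ∪ {i}) - value(S) over all subsets S of the
    remaining features, weighted by |S|!(n-|S|-1)!/n!."""
    n = len(features)
    phi = {}
    for i in features:
        others = [f for f in features if f != i]
        total = 0.0
        for k in range(len(others) + 1):
            for subset in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                s = frozenset(subset)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi

# Toy "model output" for the Division class: "divided" dominates,
# "equal" helps a little, and the pair interacts slightly.
def score(present):
    s = 0.0
    if "divided" in present:
        s += 0.6
    if "equal" in present:
        s += 0.2
    if {"divided", "equal"} <= present:
        s += 0.1  # interaction term, shared between the two words
    return s

phi = shapley_values(["divided", "equal"], score)
print({w: round(v, 6) for w, v in phi.items()})
# → {'divided': 0.65, 'equal': 0.25}
```

The two values sum to the model's full output (0.9), mirroring SHAP's additivity property: per-token contributions always add up to the difference between the prediction and the base value.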
Comparative analysis of transformer-based architectures showed that BERT achieved the best performance, with an accuracy of 99.39%, a macro F1 score of 99.36%, and a micro F1 score of 99.27%, outperforming ELECTRA, RoBERTa, ALBERT, DistilBERT, and XLNet, all of which nonetheless scored above 95%. Training and validation loss curves for BERT decreased consistently and remained closely aligned across 40 epochs, indicating effective learning without overfitting or underfitting.
A confusion matrix revealed that the model accurately predicted mathematical entity relationships in most cases. 'Addition' had the most misclassifications (eight instances), confused with 'Subtraction' in seven cases and 'Multiplication' in one. 'Division' saw four misclassifications, all incorrectly predicted as 'Multiplication'. Overall, the error rate was very low: only 13 misclassifications out of 657 test observations. Precision, recall, specificity, and F1 scores were consistently high, often close to 100%, further confirming the model's outstanding performance.
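All the per-class metrics named above fall out of the confusion matrix directly. A minimal sketch, using a made-up 3-class matrix (the numbers here are illustrative, not the paper's):

```python
def per_class_metrics(cm, labels):
    """Precision, recall, specificity, and F1 per class from a confusion
    matrix laid out as cm[true_row][pred_col]."""
    total = sum(sum(row) for row in cm)
    out = {}
    for i, label in enumerate(labels):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp                          # missed true cases
        fp = sum(cm[r][i] for r in range(len(cm))) - tp  # wrongly claimed
        tn = total - tp - fn - fp
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        out[label] = {
            "precision": precision,
            "recall": recall,
            "specificity": tn / (tn + fp),
            "f1": 2 * precision * recall / (precision + recall),
        }
    return out

# Illustrative matrix: rows are true labels, columns are predictions.
cm = [
    [8, 1, 1],   # Addition
    [0, 9, 1],   # Subtraction
    [1, 0, 9],   # Multiplication
]
labels = ["Addition", "Subtraction", "Multiplication"]
metrics = per_class_metrics(cm, labels)

errors = sum(sum(row) for row in cm) - sum(cm[i][i] for i in range(3))
print(round(errors / 30, 4))  # → overall error rate 0.1333
```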
The SHAP explainability analysis provided crucial insights into feature importance. SHAP values highlighted which words contributed positively (red) or negatively (blue) to the model's predictions for each mathematical relation. For instance, in 'Addition' predictions, 'eighteen', 'players', 'team', and 'thirty-six' were significant contributors. For 'Division', words like 'divided', 'equal', 'bought', and 'each' were highly influential. The analysis revealed that operation-specific keywords (e.g., "divide", "root", "square", "factor", "sub", "equals") were primary drivers of predictions across all classes, rather than isolated numeric values. Operations like 'Square Root' and 'Division' showed strong confidence due to highly distinctive keywords, while 'Addition', 'Subtraction', and 'Multiplication' relied on a broader mix of contextual and relational words. The base value and final SHAP values demonstrated high confidence (over 80% for all six relations) in the predicted outcomes, even with a smaller initial base probability. The "sum of other features" also showed substantial contribution, indicating a robust, distributed understanding by the model.
This research introduces an effective and interpretable framework for Mathematical Entity Relation Extraction (MERE) by leveraging transformer-based models and Explainable AI. The high accuracy of 99.39% achieved by BERT, coupled with the transparency provided by SHAP, builds trust in automated systems for mathematical problem-solving. The approach of treating mathematical problems as entity-relationship tasks has significant potential for practical applications. These include the development of advanced intelligent educational tools that can offer precise answers, automated proof-checking systems for complex mathematical research, and the construction of detailed knowledge graphs for mathematical content. Such advancements could greatly enhance the speed and accuracy of working with mathematical information in both educational and professional settings.
Future work aims to extend the approach to more complex mathematical problems beyond basic equations, incorporating longer sequences and areas such as algebra, geometry, and calculus. The authors also intend to optimize the model's computational efficiency, since the current combination of BERT and SHAP is computationally heavy. Further research could connect these automated systems with advanced tools such as automated theorem provers and, through interdisciplinary collaboration, integrate them into more robust and flexible solutions.