MATcH

BERT Is Not The Count: Learning to Match Mathematical Statements with Proofs

1University of Edinburgh
2Univ. Grenoble Alpes

Abstract

We introduce a task consisting in matching a proof to a given mathematical statement. The task fits well within current research on Mathematical Information Retrieval and, more generally, mathematical article analysis. We present a dataset for the task (the MATcH dataset) consisting of over 180k statement-proof pairs extracted from modern mathematical research articles.

While pre-trained language models perform strongly with a mean reciprocal rank of 73.7, our analysis shows they rely on shallow symbolic cues. MATcH highlights the need for deeper semantic reasoning in mathematical article analysis.

Dataset

Results

BibTeX

@inproceedings{li-etal-2023-bert,
  title = {BERT Is Not The Count: Learning to Match Mathematical Statements with Proofs},
  author = {Li, Weixian Waylon and Ziser, Yftah and Coavoux, Maximin and Cohen, Shay B.},
  booktitle = {Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics},
  year = {2023},
  address = {Dubrovnik, Croatia},
  publisher = {Association for Computational Linguistics},
  url = {https://aclanthology.org/2023.eacl-main.260/},
  doi = {10.18653/v1/2023.eacl-main.260},
  pages = {3581--3593}
}