We present four checks to support the validity of the dataset we are releasing. First, we conducted model-based analyses of performance that account for missing data and thereby incorporate uncertainty about our measurements. Additionally,…
A Comprehensive Behavioral Dataset for the Abstraction and Reasoning Corpus
