Abstract
Diagnosis assignment is the process of assigning disease codes to patients. Automatic diagnosis assignment has the potential to validate code assignments, correct erroneous codes, and register completion. Previous methods build on text-based techniques utilizing medical notes but are inapplicable in the absence of these notes. We propose using patients' medication data to assign diagnosis codes. We present a proof-of-concept study using medical data from an American dataset (MIMIC-III) and Danish nationwide registers to train a machine-learning-based model that predicts an extensive collection of diagnosis codes for multiple levels of aggregation over a disease hierarchy. We further suggest a specialized loss function designed to utilize the innate hierarchical nature of the disease hierarchy. We evaluate the proposed method on a subset of 567 disease codes. Moreover, we investigate the technique's generalizability and transferability by (1) training and testing models on the same subsets of disease codes over the two medical datasets and (2) training models on the American dataset while evaluating them on the Danish dataset, respectively. Results demonstrate the proposed method can correctly assign diagnosis codes on multiple levels of aggregation from the disease hierarchy over the American dataset with recall 70.0% and precision 69.48% for top-10 assigned codes; thereby being comparable to text-based techniques. Furthermore, the specialized loss function performs consistently better than the non-hierarchical state-of-the-art version. Moreover, results suggest the proposed method is language and dataset-agnostic, with initial indications of transferability over subsets of disease codes.