: — — , , (NER)? , , "Machine Learning Deep Learning" .
. , , .
, QuickUMLS. QuickUMLS [1] — — (, , ) , (UMLS). . QuickUMLS . QuickUMLS MedMentions [2].
, NER
, , NER. NER (, , , . .) . , , , . , , " ", , — , - . , , "" — , "", , .
NER , , , , "." (hospital), " / " "/" (alcohol). , , , . , "alcohol" " alcohol" [ , , alcohol]. , , , , . NER . Slimmer AI, .
, , , , , . (UMLS), , . , "" "", . , "alcohol" .
UMLS (CUI), , (STY), , , , . , UMLS , , — . UMLS 2020AB, , 3 . , .
MedMentions
MedMentions. 4 392 ( ), Pubmed 2016 ; 352 K ( CUI) UMLS. 34 — 1 % UMLS. , UMLS , .
, MedMentions CUI, . , , UMLS . UMLS 127 , . MedMentions — st21pv, , , 21 .
45,3 F- [2]. , BlueBERT [3] BioBERT [4], 56,3 , [5]. , , , . , . MedMentions.
QuickUMLS:
BERT QuickUMLS , , . , QuickUMLS — , . , , , , . :
. , .
. , , . — zero-shot.
Zero-shot learning (ZSL) — , , , , .
, , MedMentions. , MedMentions UMLS, . , MedMentions , .
QuickUMLS
QuickUMLS . spacy. n-, , , -. , n-, . [1]. UMLS , , n-. , simstring [6]. QuickUMLS, , UMLS . , “ ”, ( ) 0,7, :
patient:
{'term': 'Inpatient', 'cui': 'C1548438', 'similarity': 0.71, 'semtypes': {'T078'}, 'preferred': 1},
{'term': 'Inpatient', 'cui': 'C1549404', 'similarity': 0.71, 'semtypes': {'T078'}, 'preferred': 1},
{'term': 'Inpatient', 'cui': 'C1555324', 'similarity': 0.71, 'semtypes': {'T058'}, 'preferred': 1},
{'term': '*^patient', 'cui': 'C0030705', 'similarity': 0.71, 'semtypes': {'T101'}, 'preferred': 1},
{'term': 'patient', 'cui': 'C0030705', 'similarity': 1.0, 'semtypes': {'T101'}, 'preferred': 0},
{'term': 'inpatient', 'cui': 'C0021562', 'similarity': 0.71, 'semtypes': {'T101'}, 'preferred': 0}
hemmorhage:
{'term': 'No hemorrhage', 'cui': 'C1861265', 'similarity': 0.72, 'semtypes': {'T033'}, 'preferred': 1},
{'term': 'hemorrhagin', 'cui': 'C0121419', 'similarity': 0.7, 'semtypes': {'T116', 'T126'}, 'preferred': 1},
{'term': 'hemorrhagic', 'cui': 'C0333275', 'similarity': 0.7, 'semtypes': {'T080'}, 'preferred': 1},
{'term': 'hemorrhage', 'cui': 'C0019080', 'similarity': 1.0, 'semtypes': {'T046'}, 'preferred': 0},
{'term': 'GI hemorrhage', 'cui': 'C0017181', 'similarity': 0.72, 'semtypes': {'T046'}, 'preferred': 0},
{'term': 'Hemorrhages', 'cui': 'C0019080', 'similarity': 0.7, 'semtypes': {'T046'}, 'preferred': 0}
, “patient” (T101) (C0030705). “” , "No hemmorhage". , , .
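The approximate matching behind candidate lists like the ones above can be sketched in a few lines of plain Python. This is only an illustration, not the actual QuickUMLS or simstring implementation: the character-trigram features, the space padding, the `match` helper, and the tiny `TERMS` dictionary are all assumptions made for the example, so the scores it produces will not reproduce the similarities printed above.

```python
def char_ngrams(text, n=3):
    """Return the set of character n-grams of a lowercased, space-padded string."""
    padded = " " * (n - 1) + text.lower() + " " * (n - 1)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity between two feature sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Hypothetical toy dictionary: term -> (CUI, semantic type).
# The CUIs and semantic types are copied from the output shown above.
TERMS = {
    "hemorrhage": ("C0019080", "T046"),
    "hemorrhagic": ("C0333275", "T080"),
    "GI hemorrhage": ("C0017181", "T046"),
    "patient": ("C0030705", "T101"),
}


def match(query, terms=TERMS, threshold=0.7):
    """Return all dictionary terms whose similarity to the query reaches the threshold."""
    query_features = char_ngrams(query)
    candidates = []
    for term, (cui, semtype) in terms.items():
        similarity = jaccard(query_features, char_ngrams(term))
        if similarity >= threshold:
            candidates.append({
                "term": term,
                "cui": cui,
                "semtype": semtype,
                "similarity": round(similarity, 2),
            })
    # Best-scoring candidates first.
    return sorted(candidates, key=lambda c: c["similarity"], reverse=True)


if __name__ == "__main__":
    print(match("patient"))
    print(match("hemmorhage", threshold=0.5))
```

Lowering the threshold lets misspelled queries such as "hemmorhage" reach the correctly spelled entry, at the cost of also admitting near neighbours, which is the trade-off visible in the candidate lists above.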
QuickUMLS , , 1, . () — (baseline model). seqeval , [5].
╔═══╦══════╦═══════╗
║ ║ BERT ║ QUMLS ║
╠═══╬══════╬═══════╣
║ P ║ .53 ║ .27 ║
║ R ║ .58 ║ .36 ║
║ F ║ .56 ║ .31 ║
╚═══╩══════╩═══════╝
Table 1
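For readers who want to see what the P, R and F rows measure, here is a minimal sketch of entity-level scoring. The text uses seqeval, which works on sequences of tags. The version below instead compares sets of `(start, end, semantic_type)` tuples with exact matching, which is an assumption made to keep the example free of dependencies, and the spans in it are invented.

```python
def precision_recall_f1(gold, predicted):
    """Exact-match entity-level precision, recall and F1 over two collections of spans."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical spans: (start offset, end offset, semantic type).
    gold = [(0, 7, "T101"), (12, 22, "T046"), (30, 37, "T033")]
    predicted = [(0, 7, "T101"), (12, 22, "T080")]
    print(precision_recall_f1(gold, predicted))
```

A prediction only counts as correct when both the boundaries and the semantic type agree with the annotation, so a span with the right offsets but the wrong type is an error for both precision and recall.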
, ? , , . , .
QuickUMLS
QuickUMLS . -, , , QuickUMLS, spacy, . . en_core_web_sm. , , . spacy scispacy [7], en_core_sci_sm. - .
╔═══╦══════╦═══════╦═════════╗
║ ║ BERT ║ QUMLS ║ + Spacy ║
╠═══╬══════╬═══════╬═════════╣
║ P ║ .53 ║ .27 ║ .29 ║
║ R ║ .58 ║ .36 ║ .37 ║
║ F ║ .56 ║ .31 ║ .32 ║
╚═══╩══════╩═══════╩═════════╝
Table 2: scispacy
, . QuickUMLS , - . , “” : , , , .
QuickUMLS
QuickUMLS 0,7 . , , “Jaccard”, “cosine”, “overlap” “dice”. , . 0,99, , SimString “Jaccard”, . , BERT.
╔═══╦══════╦═══════╦═════════╦════════╗
║ ║ BERT ║ QUMLS ║ + Spacy ║ + Grid ║
╠═══╬══════╬═══════╬═════════╬════════╣
║ P ║ .53 ║ .27 ║ .29 ║ .37 ║
║ R ║ .58 ║ .36 ║ .37 ║ .37 ║
║ F ║ .56 ║ .31 ║ .32 ║ .37 ║
╚═══╩══════╩═══════╩═════════╩════════╝
Table 3
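The four similarity measures named above ("Jaccard", "cosine", "overlap" and "dice") have simple set-based definitions, and a search over measures and thresholds is a short loop. The sketch below is an illustration under assumptions: `grid_search` and its `score_fn` callback are hypothetical names, and `score_fn` stands in for whatever routine runs the matcher with a given configuration and returns a score such as the F-score.

```python
import itertools
import math


# Set-based similarity measures; all assume two non-empty feature sets.
def jaccard(x, y):
    return len(x & y) / len(x | y)


def cosine(x, y):
    return len(x & y) / math.sqrt(len(x) * len(y))


def dice(x, y):
    return 2 * len(x & y) / (len(x) + len(y))


def overlap(x, y):
    return len(x & y) / min(len(x), len(y))


MEASURES = {"jaccard": jaccard, "cosine": cosine, "dice": dice, "overlap": overlap}


def grid_search(score_fn, thresholds, measure_names=tuple(MEASURES)):
    """Try every (threshold, measure) pair and return (best_score, threshold, measure).

    score_fn(threshold, measure_name) is a hypothetical callback that evaluates
    one configuration and returns a number where higher is better.
    """
    best = None
    for threshold, name in itertools.product(thresholds, measure_names):
        score = score_fn(threshold, name)
        if best is None or score > best[0]:
            best = (score, threshold, name)
    return best


if __name__ == "__main__":
    x, y = {1, 2, 3}, {2, 3, 4}
    for name, measure in MEASURES.items():
        print(name, round(measure(x, y), 3))
```

The measures differ mainly in how they treat strings of unequal length: overlap divides by the smaller set, so a short term contained in a long one scores high, while Jaccard penalises the extra material.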
, , , , . , , , “alcohol”. , , , . , , , , . .
╔═══╦══════╦═══════╦═════════╦════════╦══════════╗
║ ║ BERT ║ QUMLS ║ + Spacy ║ + Grid ║ + Priors ║
╠═══╬══════╬═══════╬═════════╬════════╬══════════╣
║ P ║ .53 ║ .27 ║ .29 ║ .37 ║ .39 ║
║ R ║ .58 ║ .36 ║ .37 ║ .37 ║ .39 ║
║ F ║ .56 ║ .31 ║ .32 ║ .37 ║ .39 ║
╚═══╩══════╩═══════╩═════════╩════════╩══════════╝
Table 4
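One way to picture the "+ Priors" step is as a tie-breaker between candidates that match equally well but carry different semantic types, as with "alcohol". The sketch below is an assumption, not the method used for the table: it estimates a prior for each semantic type as its relative frequency in a hypothetical list of annotated types, ranks candidates by similarity first, and uses the prior only to break ties. The names `estimate_priors` and `pick_candidate` are invented for the example.

```python
from collections import Counter


def estimate_priors(annotated_semtypes):
    """Relative frequency of each semantic type in a list of annotations."""
    counts = Counter(annotated_semtypes)
    total = sum(counts.values())
    return {semtype: count / total for semtype, count in counts.items()}


def pick_candidate(candidates, priors):
    """Pick the candidate with the highest similarity, breaking ties by prior.

    Semantic types that never occurred in the annotations get a prior of 0.
    """
    return max(
        candidates,
        key=lambda c: (c["similarity"], priors.get(c["semtype"], 0.0)),
    )


if __name__ == "__main__":
    # Hypothetical annotation sample and candidate list.
    priors = estimate_priors(["T101", "T101", "T078", "T058"])
    candidates = [
        {"term": "Inpatient", "semtype": "T078", "similarity": 0.71},
        {"term": "inpatient", "semtype": "T101", "similarity": 0.71},
    ]
    print(pick_candidate(candidates, priors))
```

Putting similarity before the prior keeps an exact dictionary hit from being overruled by a more frequent type. Reversing the order of the two keys would give the prior the final say instead.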
, , , QuickUMLS. , 0,99, , QuickUMLS. , QuickUMLS.
: ?
, . -, , . , , : , , “alcohol” , . -, , . “ ”. . — “ ”, “”. - , , UMLS , . , :
, , QuickUMLS, . , , , . , QuickUMLS , .
, NER . , R&D . QuickUMLS , . , , , . QuickUMLS , github. , , , , , .
— , — : , , , .
[1] L. Soldaini and N. Goharian, QuickUMLS: a fast, unsupervised approach for medical concept extraction, (2016), MedIR workshop, SIGIR
[2] S. Mohan and D. Li, MedMentions: a large biomedical corpus annotated with UMLS concepts, (2019), arXiv preprint arXiv:1902.09476
[3] Y. Peng, Q. Chen, and Z. Lu, An empirical study of multi-task learning on BERT for biomedical text mining, (2020), arXiv preprint arXiv:2005.02799
[4] J. Lee, W. Yoon, S. Kim, D. Kim, S. Kim, C.H. So, and J. Kang, BioBERT: a pre-trained biomedical language representation model for biomedical text mining, (2020), Bioinformatics, 36(4)
[5] K.C. Fraser, I. Nejadgholi, B. De Bruijn, M. Li, A. LaPlante, and K.Z.E. Abidine, Extracting UMLS concepts from medical text using general and domain-specific deep learning models, (2019), arXiv preprint arXiv:1910.01274
[6] N. Okazaki and J.I. Tsujii, Simple and efficient algorithm for approximate dictionary matching, (2010), Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)
[7] M. Neumann, D. King, I. Beltagy, and W. Ammar, ScispaCy: fast and robust models for biomedical natural language processing, (2019), arXiv preprint arXiv:1902.07669