Case study: named entity recognition in biomedicine







. , , .





, QuickUMLS. QuickUMLS [1] — — (, , ) , (UMLS). . QuickUMLS . QuickUMLS MedMentions [2].





Figure 1. A schematic overview of how QuickUMLS works. Given a string and a UMLS database converted to a simstring database, the model returns the best matches, concept identifiers, and semantic types.

, NER

, , NER. NER (, , , . .) . , , , . , , " ", , — , - . , , "" — , "", , .





NER , , , , "." (hospital), " / " "/" (alcohol). , , , . , "alcohol" " alcohol" [ , , alcohol]. , , , , . NER . Slimmer AI, .





, , , , , . (UMLS), , . , "" "", . , "alcohol" .





UMLS (CUI), , (STY), , , , . , UMLS , , — . UMLS 2020AB, , 3 . , .





MedMentions

MedMentions. 4 392 ( ), Pubmed 2016 ; 352 K ( CUI) UMLS. 34 — 1 % UMLS. , UMLS , .





, MedMentions CUI, . , , UMLS . UMLS 127 , . MedMentions — st21pv, , , 21 .





45,3 F- [2]. , BlueBERT [3] BioBERT [4], 56,3 , [5]. , , , . , . MedMentions.





QuickUMLS:

BERT QuickUMLS , , . , QuickUMLS — , . , , , , . :





  1. . , .





  2. . , , . — zero-shot.





Zero-shot learning (ZSL) — , , , , .





, , MedMentions. , MedMentions UMLS, . , MedMentions , .





QuickUMLS

QuickUMLS . spacy. n-, , , -.  , n-, . [1]. UMLS , , n-. , simstring [6]. QuickUMLS, , UMLS . , “ ”, ( ) 0,7, :





patient:





{'term': 'Inpatient', 'cui': 'C1548438', 'similarity': 0.71, 'semtypes': {'T078'}, 'preferred': 1},
{'term': 'Inpatient', 'cui': 'C1549404', 'similarity': 0.71, 'semtypes': {'T078'}, 'preferred': 1},
{'term': 'Inpatient', 'cui': 'C1555324', 'similarity': 0.71, 'semtypes': {'T058'}, 'preferred': 1},
{'term': '*^patient', 'cui': 'C0030705', 'similarity': 0.71, 'semtypes': {'T101'}, 'preferred': 1},
{'term': 'patient', 'cui': 'C0030705', 'similarity': 1.0, 'semtypes': {'T101'}, 'preferred': 0},
{'term': 'inpatient', 'cui': 'C0021562', 'similarity': 0.71, 'semtypes': {'T101'}, 'preferred': 0}



hemorrhage:





{'term': 'No hemorrhage', 'cui': 'C1861265', 'similarity': 0.72, 'semtypes': {'T033'}, 'preferred': 1},
{'term': 'hemorrhagin', 'cui': 'C0121419', 'similarity': 0.7, 'semtypes': {'T116', 'T126'}, 'preferred': 1},
{'term': 'hemorrhagic', 'cui': 'C0333275', 'similarity': 0.7, 'semtypes': {'T080'}, 'preferred': 1},
{'term': 'hemorrhage', 'cui': 'C0019080', 'similarity': 1.0, 'semtypes': {'T046'}, 'preferred': 0},
{'term': 'GI hemorrhage', 'cui': 'C0017181', 'similarity': 0.72, 'semtypes': {'T046'}, 'preferred': 0},
{'term': 'Hemorrhages', 'cui': 'C0019080', 'similarity': 0.7, 'semtypes': {'T046'}, 'preferred': 0}



, “patient” (T101) (C0030705). “” , "No hemorrhage". , , .
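The approximate matching behind candidate lists like the ones above can be sketched in a few lines. This is only an illustration of thresholded n-gram matching, not QuickUMLS or simstring code: the padding scheme, the helper names (`char_ngrams`, `jaccard`, `match`) and the three-term toy dictionary are assumptions, so the scores it yields need not agree with the similarities listed above. Only the concept identifiers are copied from that output.

```python
from typing import Dict, List, Set, Tuple


def char_ngrams(text: str, n: int = 3) -> Set[str]:
    """Return the set of character n-grams of a lowercased, space-padded string."""
    padded = " " * (n - 1) + text.lower() + " " * (n - 1)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def match(query: str, dictionary: Dict[str, str],
          threshold: float = 0.7) -> List[Tuple[str, str, float]]:
    """Return (term, concept_id, similarity) for terms at or above the threshold."""
    query_grams = char_ngrams(query)
    hits = []
    for term, concept_id in dictionary.items():
        score = jaccard(query_grams, char_ngrams(term))
        if score >= threshold:
            hits.append((term, concept_id, round(score, 2)))
    # Best matches first.
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


# Toy dictionary (hypothetical subset); identifiers copied from the output above.
TOY_TERMS = {
    "hemorrhage": "C0019080",
    "patient": "C0030705",
    "inpatient": "C0021562",
}

if __name__ == "__main__":
    print(match("patient", TOY_TERMS))                    # exact spelling
    print(match("hemmorhage", TOY_TERMS, threshold=0.5))  # misspelled query
```

Lowering the threshold lets a misspelled query still reach its intended term, at the cost of admitting more near-miss candidates, which is the trade-off the threshold controls.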





QuickUMLS , , 1, . () — (baseline model). seqeval , [5].





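╔═══╦══════╦═══════╗
║   ║ BERT ║ QUMLS ║
╠═══╬══════╬═══════╣
║ P ║ .53  ║ .27   ║
║ R ║ .58  ║ .36   ║
║ F ║ .56  ║ .31   ║
╚═══╩══════╩═══════╝
Table 1. BERT and QuickUMLS (QUMLS) scores: precision (P), recall (R), and F-score (F).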
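The text names seqeval as the scoring tool. As a rough picture of what entity-level precision, recall and F-score measure, here is a standalone sketch that scores BIO-tagged sequences by exact span match. It is not seqeval's code; the function names and the toy tag sequences are hypothetical, and the semantic type labels are reused from the example output only for flavor.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int, int, str]  # (sentence index, start, end, label)


def bio_to_spans(sentences: List[List[str]]) -> Set[Span]:
    """Collect labeled entity spans from BIO tag sequences."""
    spans: Set[Span] = set()
    for s_idx, tags in enumerate(sentences):
        start, label = None, None
        for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing entity
            continues = tag.startswith("I-") and label == tag[2:]
            if start is not None and not continues:
                spans.add((s_idx, start, i, label))
                start, label = None, None
            if tag.startswith("B-") or (tag.startswith("I-") and start is None):
                start, label = i, tag[2:]
    return spans


def precision_recall_f1(gold: List[List[str]],
                        pred: List[List[str]]) -> Tuple[float, float, float]:
    """Strict scoring: a predicted span counts only if boundaries and label match."""
    gold_spans, pred_spans = bio_to_spans(gold), bio_to_spans(pred)
    correct = len(gold_spans & pred_spans)
    precision = correct / len(pred_spans) if pred_spans else 0.0
    recall = correct / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy example: the second entity is predicted with the wrong right boundary.
gold = [["B-T101", "O", "B-T046", "I-T046"]]
pred = [["B-T101", "O", "B-T046", "O"]]

if __name__ == "__main__":
    print(precision_recall_f1(gold, pred))
```

Under strict matching a partially overlapping prediction earns no credit, which is one reason a dictionary matcher with imprecise boundaries can score well below a trained tagger.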
      
      



, ? , , . , .





QuickUMLS

QuickUMLS . -, , , QuickUMLS, spacy, . . en_core_web_sm. , , . spacy scispacy [7], en_core_sci_sm. - .





╔═══╦══════╦═══════╦═════════╗
║   ║ BERT ║ QUMLS ║ + Spacy ║
╠═══╬══════╬═══════╬═════════╣
║ P ║ .53  ║ .27   ║ .29     ║
║ R ║ .58  ║ .36   ║ .37     ║
║ F ║ .56  ║ .31   ║ .32     ║
╚═══╩══════╩═══════╩═════════╝
Table 2. Results with the scispacy model.
      
      



, . QuickUMLS , - . , “” : , , , .





QuickUMLS

QuickUMLS 0,7 . , , “Jaccard”, “cosine”, “overlap” “dice”. , . 0,99, , SimString “Jaccard”, . , BERT.





╔═══╦══════╦═══════╦═════════╦════════╗
║   ║ BERT ║ QUMLS ║ + Spacy ║ + Grid ║
╠═══╬══════╬═══════╬═════════╬════════╣
║ P ║ .53  ║ .27   ║ .29     ║ .37    ║
║ R ║ .58  ║ .36   ║ .37     ║ .37    ║
║ F ║ .56  ║ .31   ║ .32     ║ .37    ║
╚═══╩══════╩═══════╩═════════╩════════╝
Table 3. Results after the grid search.
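A grid search over the similarity measure and the threshold can be pictured with the toy sketch below. The four measures use their standard set-based definitions on character trigrams; everything else (the term list, the tiny labeled development set, the threshold grid and all function names) is made up for illustration, so the setting it selects says nothing about the values reported in the text.

```python
import math
from itertools import product
from typing import Callable, Dict, List, Optional, Set, Tuple


def char_ngrams(text: str, n: int = 3) -> Set[str]:
    """Character n-grams of a lowercased, space-padded string (never empty)."""
    padded = " " * (n - 1) + text.lower() + " " * (n - 1)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


# Standard set-based definitions of the four measures.
MEASURES: Dict[str, Callable[[Set[str], Set[str]], float]] = {
    "jaccard": lambda x, y: len(x & y) / len(x | y),
    "cosine": lambda x, y: len(x & y) / math.sqrt(len(x) * len(y)),
    "dice": lambda x, y: 2 * len(x & y) / (len(x) + len(y)),
    "overlap": lambda x, y: len(x & y) / min(len(x), len(y)),
}

# Hypothetical dictionary and development set: (mention, gold term or None).
TERMS = ["hemorrhage", "patient", "alcohol"]
DEV: List[Tuple[str, Optional[str]]] = [
    ("hemorrhage", "hemorrhage"),
    ("hemorrhages", "hemorrhage"),
    ("patient", "patient"),
    ("inpatient", None),
    ("alcoholic", None),
    ("hospital", None),
]
THRESHOLDS = [0.5, 0.6, 0.7, 0.8, 0.9, 0.99]


def predict(mention: str, measure: str, threshold: float) -> Optional[str]:
    """Return the best-scoring term if it reaches the threshold, else None."""
    grams = char_ngrams(mention)
    score, term = max((MEASURES[measure](grams, char_ngrams(t)), t) for t in TERMS)
    return term if score >= threshold else None


def evaluate(measure: str, threshold: float) -> float:
    """F-score of one (measure, threshold) setting on the development set."""
    tp = fp = fn = 0
    for mention, gold in DEV:
        pred = predict(mention, measure, threshold)
        if pred is not None and pred == gold:
            tp += 1
        else:
            fp += pred is not None
            fn += gold is not None
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


# Score every combination and keep the best one (first wins on ties).
best_f1, best_measure, best_threshold = max(
    ((evaluate(m, t), m, t) for m, t in product(MEASURES, THRESHOLDS)),
    key=lambda row: row[0],
)

if __name__ == "__main__":
    print(best_measure, best_threshold, best_f1)
```

On a real corpus the search would score each setting with the full pipeline on held-out annotated data rather than on isolated mentions.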
      
      



, , , , . , , , “alcohol”. , , , . , , , , . .





, . , , , , , . . , .





╔═══╦══════╦═══════╦═════════╦════════╦══════════╗
║   ║ BERT ║ QUMLS ║ + Spacy ║ + Grid ║ + Priors ║
╠═══╬══════╬═══════╬═════════╬════════╬══════════╣
║ P ║ .53  ║ .27   ║ .29     ║ .37    ║ .39      ║
║ R ║ .58  ║ .36   ║ .37     ║ .37    ║ .39      ║
║ F ║ .56  ║ .31   ║ .32     ║ .37    ║ .39      ║
╚═══╩══════╩═══════╩═════════╩════════╩══════════╝
Table 4. Results after adding priors.
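One way prior information could break ties between equally similar candidates is sketched below. This is an assumption about the general idea, not a description of the method actually used here: the prior values are invented, `pick_candidate` is a hypothetical helper, and only the candidate records are copied from the QuickUMLS output shown earlier.

```python
from typing import Dict, List

# Three candidates with identical similarity, copied from the example output.
CANDIDATES: List[dict] = [
    {"term": "Inpatient", "cui": "C1548438", "similarity": 0.71, "semtypes": {"T078"}},
    {"term": "Inpatient", "cui": "C1549404", "similarity": 0.71, "semtypes": {"T078"}},
    {"term": "Inpatient", "cui": "C1555324", "similarity": 0.71, "semtypes": {"T058"}},
]

# Hypothetical prior probability of each semantic type, for example estimated
# from how often the type occurs in an annotated training set.
PRIORS: Dict[str, float] = {"T058": 0.6, "T078": 0.1}


def pick_candidate(candidates: List[dict], priors: Dict[str, float]) -> dict:
    """Prefer higher similarity; break ties by the best prior among semantic types."""
    def sort_key(candidate: dict):
        prior = max((priors.get(s, 0.0) for s in candidate["semtypes"]), default=0.0)
        return (candidate["similarity"], prior)
    return max(candidates, key=sort_key)


chosen = pick_candidate(CANDIDATES, PRIORS)

if __name__ == "__main__":
    print(chosen["cui"], chosen["semtypes"])
```

Because similarity stays the primary key, the prior only decides between candidates the string matcher cannot tell apart.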
      
      



, , , QuickUMLS. , 0,99, , QuickUMLS. , QuickUMLS.





: ?

, . -, , . , , : , , “alcohol” , . -, , . “ ”. . — “ ”, “”. - , , UMLS , . , :





, , QuickUMLS, . , , , . , QuickUMLS , .





, NER . , R&D . QuickUMLS , . , , , . QuickUMLS , github. , , , , , .





— , — : , , , .










[1] L. Soldaini and N. Goharian, QuickUMLS: a fast, unsupervised approach for medical concept extraction, (2016), MedIR workshop, SIGIR





[2] S. Mohan and D. Li, MedMentions: a large biomedical corpus annotated with UMLS concepts, (2019), arXiv preprint arXiv:1902.09476





[3] Y. Peng, Q. Chen, and Z. Lu, An empirical study of multi-task learning on BERT for biomedical text mining, (2020), arXiv preprint arXiv:2005.02799





[4] J. Lee, W. Yoon, S. Kim, D. Kim, S. Kim, C.H. So, and J. Kang, BioBERT: a pre-trained biomedical language representation model for biomedical text mining, (2020), Bioinformatics, 36(4)





[5] K.C. Fraser, I. Nejadgholi, B. De Bruijn, M. Li, A. LaPlante, and K.Z.E. Abidine, Extracting UMLS concepts from medical text using general and domain-specific deep learning models, (2019), arXiv preprint arXiv:1910.01274





[6] N. Okazaki and J.I. Tsujii, Simple and efficient algorithm for approximate dictionary matching, (2010, August), In Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)





[7] M. Neumann, D. King, I. Beltagy, and W. Ammar, ScispaCy: Fast and robust models for biomedical natural language processing, (2019), arXiv preprint arXiv:1902.07669




