Apr 2, 2015; Informatics Journal Club; Azadeh Nikfarjam | SciVee

April 16, 2015

Pharmacovigilance from social media: mining adverse drug reaction mentions using sequence labeling with word embedding cluster features.

Nikfarjam A, Sarker A, O'Connor K, Ginn R, Gonzalez G.

J Am Med Inform Assoc. 2015 Mar 9. pii: ocu041. doi: 10.1093/jamia/ocu041. [Epub ahead of print]

Azadeh Nikfarjam

Arizona State University, Department of Biomedical Informatics

Social media is becoming increasingly popular as a platform for sharing personal health-related information. This information can be utilized for public health monitoring tasks, particularly for pharmacovigilance, via the use of natural language processing (NLP) techniques. However, the language in social media is highly informal, and user-expressed medical concepts are often nontechnical, descriptive, and challenging to extract. There has been limited progress in addressing these challenges, and thus far, advanced machine learning-based NLP techniques have been underutilized. Our objective is to design a machine learning-based approach to extract mentions of adverse drug reactions (ADRs) from highly informal text in social media. We introduce ADRMine, a machine learning-based concept extraction system that uses conditional random fields (CRFs). ADRMine utilizes a variety of features, including a novel feature for modeling words' semantic similarities. The similarities are modeled by clustering words based on unsupervised, pretrained word representation vectors (embeddings) generated from unlabeled user posts in social media using a deep learning technique. ADRMine outperforms several strong baseline systems in the ADR extraction task by achieving an F-measure of 0.82. Feature analysis demonstrates that the proposed word cluster features significantly improve extraction performance. It is possible to extract complex medical concepts, with relatively high performance, from informal, user-generated content. Our approach is particularly scalable and suitable for social media mining, as it relies on large volumes of unlabeled data, thus diminishing the need for large, annotated training data sets.
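The word cluster feature described in the abstract can be illustrated with a small sketch. This is not the ADRMine implementation: the abstract does not name the clustering algorithm or the feature templates, so plain k-means is used here as a stand-in, the two-dimensional vectors are invented toy values rather than real embeddings, and the names `cluster_words` and `sentence_features` are hypothetical. The sketch only shows how a cluster ID can become a per-token feature of the kind a CRF toolkit typically accepts.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_words(vectors, k, iterations=10):
    """Plain k-means over {word: vector}; returns {word: cluster_id}.

    Assumption: k-means stands in for whatever clustering the paper uses.
    Initialisation is deterministic (farthest-point) so results are repeatable.
    """
    words = sorted(vectors)
    centroids = [list(vectors[words[0]])]
    while len(centroids) < k:
        # Pick the word farthest from all centroids chosen so far.
        far = max(words, key=lambda w: min(sq_dist(vectors[w], c) for c in centroids))
        centroids.append(list(vectors[far]))
    assignment = {}
    for _ in range(iterations):
        # Assignment step: each word goes to its nearest centroid.
        for w in words:
            assignment[w] = min(range(k), key=lambda c: sq_dist(vectors[w], centroids[c]))
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [vectors[w] for w in words if assignment[w] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def sentence_features(tokens, clusters):
    """One feature dict per token, including the cluster IDs of the token
    and its neighbours. Words without an embedding get the cluster "UNK"."""
    def cid(i):
        if i < 0:
            return "BOS"  # before the start of the sentence
        if i >= len(tokens):
            return "EOS"  # past the end of the sentence
        return str(clusters.get(tokens[i].lower(), "UNK"))

    return [
        {
            "word": tok.lower(),
            "cluster": cid(i),
            "prev_cluster": cid(i - 1),
            "next_cluster": cid(i + 1),
        }
        for i, tok in enumerate(tokens)
    ]


# Toy "embeddings" (invented values): two loose groups of symptom words.
toy_vectors = {
    "drowsy": (0.90, 0.10), "sleepy": (0.80, 0.20), "tired": (0.85, 0.15),
    "nausea": (0.10, 0.90), "queasy": (0.20, 0.80), "sick": (0.15, 0.85),
}
clusters = cluster_words(toy_vectors, k=2)

for feats in sentence_features(["made", "me", "sleepy"], clusters):
    print(feats)
```

Because "sleepy" and "drowsy" fall into the same cluster, a sequence labeler trained on posts that mention only one of them still sees a shared feature value for the other. This is the sense in which cluster features derived from unlabeled text can reduce the need for large annotated training sets.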

Speaker’s Biography:

I’m a PhD candidate in the Department of Biomedical Informatics at Arizona State University. Previously, I earned a master’s degree in Computer Science and a BSc in Software Engineering. I have been doing research on text mining and information extraction in the biomedical domain under the supervision of Dr. Graciela Gonzalez since 2010. Currently, I’m working on automatic natural language processing techniques for medical concept extraction from social media postings. Previous projects include automatic patient timeline generation from clinical notes, sentiment analysis and emotion extraction from user-generated sentences, and biological event extraction and gene function detection from biological literature.
