Europe PMC is a repository of life science literature. It ingests all PubMed content and extends its index with other sources, including Agricola, a bibliographic database of citations to the agricultural literature, and Biological Patents.
[Figure: Index coverage]
For more background on Europe PMC, see:
Europe PMC: a full-text literature database for the life sciences and platform for innovation. (2014). Nucleic Acids Research, 43(D1), D1042–D1048. http://doi.org/10.1093/nar/gku1061
This client supports the Europe PMC search syntax. If you are unfamiliar with searching Europe PMC, try the Europe PMC query builder, a tool that helps you create queries. To use your Europe PMC queries in R, simply copy and paste the search string into the search functions of this package.
The following examples show how to search Europe PMC.
epmc_search() is the main function to query Europe PMC. It searches both metadata and full texts.
library(europepmc)
europepmc::epmc_search('malaria')
#> # A tibble: 100 x 27
#> id source pmid pmcid doi
#> <chr> <chr> <chr> <chr> <chr>
#> 1 29213208 MED 29213208 PMC5713003 10.1186/s41182-017-0070-9
#> 2 29203789 MED 29203789 PMC5714960 10.1038/s41598-017-16974-2
#> 3 29182619 MED 29182619 PMC5705151 10.1371/journal.pone.0188613
#> 4 29212553 MED 29212553 <NA> 10.1186/s13071-017-2548-z
#> 5 29190291 MED 29190291 <NA> 10.1371/journal.pmed.1002455
#> 6 29206655 MED 29206655 <NA> 10.1097/qco.0000000000000419
#> 7 29155671 MED 29155671 PMC5711327 10.3201/eid2313.170366
#> 8 29183370 MED 29183370 PMC5706414 10.1186/s12936-017-2130-3
#> 9 29217416 MED 29217416 <NA> 10.1016/j.parint.2017.12.002
#> 10 29162102 MED 29162102 PMC5697109 10.1186/s12936-017-2116-1
#> # ... with 90 more rows, and 22 more variables: title <chr>,
#> # authorString <chr>, journalTitle <chr>, journalVolume <chr>,
#> # pubYear <chr>, journalIssn <chr>, pageInfo <chr>, pubType <chr>,
#> # isOpenAccess <chr>, inEPMC <chr>, inPMC <chr>, hasPDF <chr>,
#> # hasBook <chr>, hasSuppl <chr>, citedByCount <int>,
#> # hasReferences <chr>, hasTextMinedTerms <chr>,
#> # hasDbCrossReferences <chr>, hasLabsLinks <chr>,
#> # hasTMAccessionNumbers <chr>, firstPublicationDate <chr>, issue <chr>
Please note that Europe PMC expands queries with MeSH synonyms by default, a behaviour which can be turned off with the synonym parameter.
europepmc::epmc_search('malaria', synonym = FALSE)
#> # A tibble: 100 x 27
#> id source pmid pmcid doi
#> <chr> <chr> <chr> <chr> <chr>
#> 1 29182619 MED 29182619 PMC5705151 10.1371/journal.pone.0188613
#> 2 29213208 MED 29213208 PMC5713003 10.1186/s41182-017-0070-9
#> 3 29162102 MED 29162102 PMC5697109 10.1186/s12936-017-2116-1
#> 4 29190291 MED 29190291 <NA> 10.1371/journal.pmed.1002455
#> 5 29162098 MED 29162098 PMC5697090 10.1186/s12936-017-2120-5
#> 6 29212553 MED 29212553 <NA> 10.1186/s13071-017-2548-z
#> 7 29143637 MED 29143637 PMC5688465 10.1186/s12889-017-4739-0
#> 8 29206991 MED 29206991 <NA> 10.1093/tropej/fmx090
#> 9 29203789 MED 29203789 PMC5714960 10.1038/s41598-017-16974-2
#> 10 29117114 MED 29117114 PMC5707999 10.3390/ijerph14111360
#> # ... with 90 more rows, and 22 more variables: title <chr>,
#> # authorString <chr>, journalTitle <chr>, issue <chr>,
#> # journalVolume <chr>, pubYear <chr>, journalIssn <chr>, pageInfo <chr>,
#> # pubType <chr>, isOpenAccess <chr>, inEPMC <chr>, inPMC <chr>,
#> # hasPDF <chr>, hasBook <chr>, hasSuppl <chr>, citedByCount <int>,
#> # hasReferences <chr>, hasTextMinedTerms <chr>,
#> # hasDbCrossReferences <chr>, hasLabsLinks <chr>,
#> # hasTMAccessionNumbers <chr>, firstPublicationDate <chr>
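To get a feeling for how much synonym expansion changes a search, you can compare hit counts with and without it. The following is only a sketch: synonym_effect() is a hypothetical helper that is not part of this package, and it assumes that europepmc::epmc_hits() takes a synonym argument in the same way as epmc_search().

```r
# Hypothetical helper (not part of europepmc): compare hit counts
# with and without MeSH synonym expansion.
# `hits_fun` is injectable so the logic can be tried without network access;
# by default it is assumed to be europepmc::epmc_hits().
synonym_effect <- function(query, hits_fun = europepmc::epmc_hits) {
  with_syn <- hits_fun(query, synonym = TRUE)
  without_syn <- hits_fun(query, synonym = FALSE)
  c(
    with_synonyms = with_syn,
    without_synonyms = without_syn,
    difference = with_syn - without_syn
  )
}

# Usage (needs an internet connection):
# synonym_effect('malaria')
```

A large difference suggests that many records match only through a synonym, in which case synonym = FALSE gives a narrower result set.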
To get an exact match, use quotes as in the following example:
europepmc::epmc_search('"Human malaria parasites"')
#> # A tibble: 100 x 27
#> id source pmid doi
#> <chr> <chr> <chr> <chr>
#> 1 29109165 MED 29109165 10.1128/aac.01161-17
#> 2 28902970 MED 28902970 10.1111/cmi.12789
#> 3 27894375 MED 27894375 10.1017/s0031182016002110
#> 4 28900620 MED 28900620 10.1155/2017/2847548
#> 5 28525963 MED 28525963 10.1080/14760584.2017.1333426
#> 6 27748213 MED 27748213 <NA>
#> 7 PMC5576395 PMC <NA> <NA>
#> 8 27381764 MED 27381764 10.1016/j.ijpara.2016.05.008
#> 9 28531172 MED 28531172 10.1371/journal.pone.0177304
#> 10 27667688 MED 27667688 10.1016/j.dci.2016.09.012
#> # ... with 90 more rows, and 23 more variables: title <chr>,
#> # authorString <chr>, journalTitle <chr>, pubYear <chr>,
#> # journalIssn <chr>, pubType <chr>, isOpenAccess <chr>, inEPMC <chr>,
#> # inPMC <chr>, hasPDF <chr>, hasBook <chr>, citedByCount <int>,
#> # hasReferences <chr>, hasTextMinedTerms <chr>,
#> # hasDbCrossReferences <chr>, hasLabsLinks <chr>,
#> # hasTMAccessionNumbers <chr>, firstPublicationDate <chr>, issue <chr>,
#> # journalVolume <chr>, pageInfo <chr>, pmcid <chr>, hasSuppl <chr>
By default, 100 records are returned, but the number of results can be expanded or limited with the limit parameter.
europepmc::epmc_search('"Human malaria parasites"', limit = 10)
#> # A tibble: 10 x 27
#> id source pmid doi
#> <chr> <chr> <chr> <chr>
#> 1 29109165 MED 29109165 10.1128/aac.01161-17
#> 2 28902970 MED 28902970 10.1111/cmi.12789
#> 3 27894375 MED 27894375 10.1017/s0031182016002110
#> 4 28900620 MED 28900620 10.1155/2017/2847548
#> 5 28525963 MED 28525963 10.1080/14760584.2017.1333426
#> 6 27748213 MED 27748213 <NA>
#> 7 PMC5576395 PMC <NA> <NA>
#> 8 27381764 MED 27381764 10.1016/j.ijpara.2016.05.008
#> 9 28531172 MED 28531172 10.1371/journal.pone.0177304
#> 10 27667688 MED 27667688 10.1016/j.dci.2016.09.012
#> # ... with 23 more variables: title <chr>, authorString <chr>,
#> # journalTitle <chr>, pubYear <chr>, journalIssn <chr>, pubType <chr>,
#> # isOpenAccess <chr>, inEPMC <chr>, inPMC <chr>, hasPDF <chr>,
#> # hasBook <chr>, citedByCount <int>, hasReferences <chr>,
#> # hasTextMinedTerms <chr>, hasDbCrossReferences <chr>,
#> # hasLabsLinks <chr>, hasTMAccessionNumbers <chr>,
#> # firstPublicationDate <chr>, issue <chr>, journalVolume <chr>,
#> # pageInfo <chr>, pmcid <chr>, hasSuppl <chr>
Results are sorted by relevance. Other options via the sort parameter are:
- sort = 'cited': by the number of citations, descending from the most cited publication
- sort = 'date': by date published, starting with the most recent publication
Sometimes, you would like to send more than one search to Europe PMC at once. A simple solution is using plyr::ldply():
my_dois <- c(
"10.1159/000479962",
"10.1002/sctm.17-0081",
"10.1161/strokeaha.117.018077",
"10.1007/s12017-017-8447-9"
)
plyr::ldply(my_dois, function(x) {
europepmc::epmc_search(paste0("DOI:", x))
})
#> id source pmid doi
#> 1 28957815 MED 28957815 10.1159/000479962
#> 2 28941317 MED 28941317 10.1002/sctm.17-0081
#> 3 29018132 MED 29018132 10.1161/strokeaha.117.018077
#> 4 28623611 MED 28623611 10.1007/s12017-017-8447-9
#> title
#> 1 Clinical Relevance of Patent Foramen Ovale and Atrial Septum Aneurysm in Stroke: Findings of a Single-Center Cross-Sectional Study.
#> 2 Concise Review: Extracellular Vesicles Overcoming Limitations of Cell Therapies in Ischemic Stroke.
#> 3 One-Stop Management of Acute Stroke Patients: Minimizing Door-to-Reperfusion Times.
#> 4 Deferiprone Rescues Behavioral Deficits Induced by Mild Iron Exposure in a Mouse Model of Alpha-Synuclein Aggregation.
#> authorString
#> 1 Schnieder M, Siddiqui T, Karch A, Bähr M, Hasenfuss G, Liman J, Schroeter MR.
#> 2 Doeppner TR, Bähr M, Hermann DM, Giebel B.
#> 3 Psychogios MN, Behme D, Schregel K, Tsogkas I, Maier IL, Leyhe JR, Zapf A, Tran J, Bähr M, Liman J, Knauth M.
#> 4 Carboni E, Tatenhorst L, Tönges L, Barski E, Dambeck V, Bähr M, Lingor P.
#> journalTitle issue journalVolume pubYear journalIssn
#> 1 Eur Neurol 5-6 78 2017 0014-3022; 1421-9913;
#> 2 Stem Cells Transl Med 11 6 2017 2157-6564; 2157-6580;
#> 3 Stroke 11 48 2017 0039-2499; 1524-4628;
#> 4 Neuromolecular Med 2-3 19 2017 1535-1084; 1559-1174;
#> pageInfo pubType isOpenAccess inEPMC inPMC
#> 1 264-269 journal article N N N
#> 2 2044-2052 review; journal article; N N N
#> 3 3152-3155 clinical trial; journal article; N N N
#> 4 309-321 research-article; journal article; Y Y N
#> hasPDF hasBook citedByCount hasReferences hasTextMinedTerms
#> 1 N N 0 N N
#> 2 N N 0 N N
#> 3 N N 0 N N
#> 4 Y N 0 Y Y
#> hasDbCrossReferences hasLabsLinks hasTMAccessionNumbers
#> 1 N Y N
#> 2 N Y N
#> 3 N Y N
#> 4 N Y Y
#> firstPublicationDate pmcid hasSuppl
#> 1 2017-09-28 <NA> <NA>
#> 2 2017-09-23 <NA> <NA>
#> 3 2017-10-10 <NA> <NA>
#> 4 2017-06-16 PMC5570801 Y
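If you would rather not depend on plyr, the same idea can be sketched with lapply() and dplyr::bind_rows(). This is one possible approach under stated assumptions: search_dois() is a hypothetical helper, dplyr is assumed to be installed, and searches without hits are assumed to return NULL or an empty data frame.

```r
# Hypothetical helper (not part of europepmc): run one search per DOI
# and stack the results into a single data frame.
# `search_fun` is injectable so the logic can be tried offline;
# by default it calls europepmc::epmc_search().
search_dois <- function(dois, search_fun = europepmc::epmc_search, ...) {
  results <- lapply(dois, function(doi) {
    search_fun(paste0("DOI:", doi), ...)
  })
  # Drop searches that returned nothing (assumed to be NULL or zero rows)
  results <- Filter(function(x) !is.null(x) && nrow(x) > 0, results)
  # bind_rows() fills columns that are missing in some results with NA
  dplyr::bind_rows(results)
}

# Usage (needs an internet connection):
# search_dois(c("10.1159/000479962", "10.1002/sctm.17-0081"))
```

Like plyr::ldply(), this returns one row per matched record, so DOIs without a match simply do not appear in the output.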
By default, a non-nested data frame printed as tibble is returned. Other formats are output = "id_list", which returns a list of IDs and sources, and output = "raw", which returns the full metadata as a list. Please be aware that these lists can become very large.
Europe PMC parses article metadata for various concepts and terms.
Semantic types | Description/Examples |
---|---|
accession | A unique identifier given to a DNA or protein sequence record |
chemical | e.g. Granzymes, Peptides, Hydrogen |
disease | e.g. dysthymias, gid, icterohemorrhagic |
efo | Experimental Factor Ontology e.g. generation, health, mortality rate, scale, findings, genome etc. |
gene_protein | e.g. atp, cl-43, ecoriir, gng11, ipt1, mlks |
go_term | A Gene Ontology (GO) term e.g. annealing, neuroblasts |
organism | e.g. pneumocystidomycetes, sarus, terebratulide |
Here’s how to search for publications about meningitis:
europepmc::epmc_search('disease:meningitis')
#> # A tibble: 100 x 27
#> id source pmid pmcid doi
#> <chr> <chr> <chr> <chr> <chr>
#> 1 29095907 MED 29095907 PMC5667755 10.1371/journal.pone.0187466
#> 2 29084241 MED 29084241 PMC5662171 10.1371/journal.pone.0186985
#> 3 29038446 MED 29038446 PMC5643306 10.1038/s41598-017-13605-8
#> 4 PMC5631614 PMC <NA> PMC5631614 <NA>
#> 5 29207725 MED 29207725 PMC5713747 10.7860/jcdr/2017/28114.10532
#> 6 29057217 MED 29057217 PMC5635059 10.3389/fcimb.2017.00436
#> 7 29148389 MED 29148389 PMC5708259 10.3201/eid2312.171107
#> 8 29051603 MED 29051603 PMC5648924 10.1038/s41598-017-13234-1
#> 9 PMC5632230 PMC <NA> PMC5632230 <NA>
#> 10 PMC5631130 PMC <NA> PMC5631130 <NA>
#> # ... with 90 more rows, and 22 more variables: title <chr>,
#> # authorString <chr>, journalTitle <chr>, issue <chr>,
#> # journalVolume <chr>, pubYear <chr>, journalIssn <chr>, pageInfo <chr>,
#> # pubType <chr>, isOpenAccess <chr>, inEPMC <chr>, inPMC <chr>,
#> # hasPDF <chr>, hasBook <chr>, hasSuppl <chr>, citedByCount <int>,
#> # hasReferences <chr>, hasTextMinedTerms <chr>,
#> # hasDbCrossReferences <chr>, hasLabsLinks <chr>,
#> # hasTMAccessionNumbers <chr>, firstPublicationDate <chr>
To see which other terms were text-mined at the article level, use the europepmc::epmc_tm() function.
Another useful feature of Europe PMC is searching for cross-references between Europe PMC and other databases. For instance, to get publications that are cited by entries in the Protein Data Bank in Europe and were published in 2016:
europepmc::epmc_search('(HAS_PDB:y) AND FIRST_PDATE:2016')
#> # A tibble: 100 x 27
#> id source pmid doi
#> <chr> <chr> <chr> <chr>
#> 1 28089452 MED 28089452 10.1016/j.str.2016.12.006
#> 2 28089448 MED 28089448 10.1016/j.str.2016.12.005
#> 3 28065506 MED 28065506 10.1016/j.str.2016.12.001
#> 4 28039433 MED 28039433 10.1073/pnas.1611577114
#> 5 28036383 MED 28036383 10.1371/journal.pone.0168832
#> 6 28039325 MED 28039325 10.1093/nar/gkw1310
#> 7 28035004 MED 28035004 10.1074/jbc.m116.749713
#> 8 28034958 MED 28034958 10.1093/nar/gkw1307
#> 9 28034013 MED 28034013 10.1080/07391102.2016.1278038
#> 10 28031486 MED 28031486 10.1073/pnas.1616198114
#> # ... with 90 more rows, and 23 more variables: title <chr>,
#> # authorString <chr>, journalTitle <chr>, issue <chr>,
#> # journalVolume <chr>, pubYear <chr>, journalIssn <chr>, pageInfo <chr>,
#> # pubType <chr>, isOpenAccess <chr>, inEPMC <chr>, inPMC <chr>,
#> # hasPDF <chr>, hasBook <chr>, citedByCount <int>, hasReferences <chr>,
#> # hasTextMinedTerms <chr>, hasDbCrossReferences <chr>,
#> # hasLabsLinks <chr>, hasTMAccessionNumbers <chr>,
#> # firstPublicationDate <chr>, pmcid <chr>, hasSuppl <chr>
The following sources are supported
To retrieve metadata about these external database links, use europepmc::epmc_db().
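As a rough sketch, the helper below asks for the PDB cross-references of a single record, for example the first ID returned by the HAS_PDB search above. pdb_links() is a hypothetical wrapper; the db and limit argument names and the lowercase "pdb" value are assumptions, so please check ?europepmc::epmc_db for what your installed version accepts.

```r
# Hypothetical wrapper (not part of europepmc): fetch PDB cross-references
# for one record. The `db` and `limit` arguments of epmc_db() are assumed
# here; verify them against the package documentation.
pdb_links <- function(ext_id, limit = 25) {
  europepmc::epmc_db(ext_id, db = "pdb", limit = limit)
}

# Usage (needs an internet connection):
# pdb_links("28089452")
```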
Europe PMC also lets us obtain citation metadata and reference sections. To retrieve citation metadata per article, use:
europepmc::epmc_citations("9338777", limit = 500)
#> # A tibble: 211 x 12
#> id source
#> <chr> <chr>
#> 1 10221475 MED
#> 2 10342317 MED
#> 3 10440384 MED
#> 4 9696842 MED
#> 5 9703304 MED
#> 6 9728974 MED
#> 7 9728985 MED
#> 8 9728986 MED
#> 9 9728987 MED
#> 10 9756815 MED
#> # ... with 201 more rows, and 10 more variables: citationType <chr>,
#> # title <chr>, authorString <chr>, journalAbbreviation <chr>,
#> # pubYear <int>, volume <chr>, issue <chr>, pageInfo <chr>,
#> # citedByCount <int>, text <chr>
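Once the citing records are in a data frame, they can be summarised with base R. The sketch below counts citing publications per year; citations_per_year() is a hypothetical helper, and it only assumes the pubYear column shown in the output above.

```r
# Hypothetical helper (not part of europepmc): count citing publications
# per publication year, based on the `pubYear` column.
citations_per_year <- function(citations) {
  counts <- table(citations$pubYear)
  data.frame(
    pubYear = as.integer(names(counts)),
    n = as.integer(counts)
  )
}

# Usage (needs an internet connection):
# cites <- europepmc::epmc_citations("9338777", limit = 500)
# citations_per_year(cites)
```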
To retrieve the reference section of an article:
europepmc::epmc_refs("28632490", limit = 200)
#> # A tibble: 169 x 19
#> id source citationType
#> <chr> <chr> <chr>
#> 1 12002480 MED JOURNAL ARTICLE
#> 2 18795164 MED JOURNAL ARTICLE
#> 3 18556606 MED JOURNAL ARTICLE
#> 4 17683018 MED JOURNAL ARTICLE
#> 5 15273108 MED JOURNAL ARTICLE
#> 6 18207219 MED JOURNAL ARTICLE
#> 7 17007908 MED JOURNAL ARTICLE
#> 8 26948762 MED JOURNAL ARTICLE
#> 9 23192912 MED JOURNAL ARTICLE
#> 10 25837385 MED JOURNAL ARTICLE
#> # ... with 159 more rows, and 16 more variables: title <chr>,
#> # authorString <chr>, journalAbbreviation <chr>, issue <chr>,
#> # pubYear <int>, volume <chr>, pageInfo <chr>, citedOrder <int>,
#> # match <chr>, essn <chr>, issn <chr>, publicationTitle <chr>,
#> # publisherLoc <chr>, publisherName <chr>, externalLink <chr>, doi <chr>
Europe PMC gives access not only to metadata, but also to full texts. Adding AND (OPEN_ACCESS:y) to your search query returns only those articles for which Europe PMC also has the full text.
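If you apply this filter often, a small helper keeps the bracketing consistent. open_access_only() is a hypothetical convenience function, not part of this package; it only builds the query string and leaves the search itself to epmc_search().

```r
# Hypothetical helper (not part of europepmc): restrict any query
# to records flagged as open access.
open_access_only <- function(query) {
  # Wrap the original query in parentheses so that OR clauses inside it
  # are not affected by the added AND
  paste0("(", query, ") AND (OPEN_ACCESS:y)")
}

open_access_only("malaria")
#> [1] "(malaria) AND (OPEN_ACCESS:y)"

# Usage (needs an internet connection):
# europepmc::epmc_search(open_access_only('"Human malaria parasites"'))
```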
Full text as XML can be accessed via the PubMed Central ID (PMCID):
europepmc::epmc_ftxt("PMC3257301")
#> {xml_document}
#> <article article-type="research-article" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML">
#> [1] <front>\n <journal-meta>\n <journal-id journal-id-type="nlm-ta"> ...
#> [2] <body>\n <sec id="s1">\n <title>Introduction</title>\n <p>Atm ...
#> [3] <back>\n <ack>\n <p>We would like to thank Dr. C. Gourlay and Dr ...
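The returned object is an xml2 document, so it can be queried with XPath. The sketch below pulls out the article title and the section headings; article_title() and section_titles() are hypothetical helpers, and the XPath expressions assume the front/body layout visible in the output above, which may not hold for every article.

```r
library(xml2)

# Hypothetical helpers (not part of europepmc). The XPath expressions
# assume an <article> with <front> and <body> children, as shown above.
article_title <- function(doc) {
  xml_text(xml_find_first(doc, "//front//article-title"))
}

section_titles <- function(doc) {
  xml_text(xml_find_all(doc, "//body//sec/title"))
}

# Usage (needs an internet connection):
# doc <- europepmc::epmc_ftxt("PMC3257301")
# article_title(doc)
# section_titles(doc)
```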